Compare commits

...

101 Commits

Author SHA1 Message Date
alt-glitch e5492dc002 chore(docs): remove stale documentation files
Remove outdated docs that no longer reflect the current architecture:
ACP setup guide, Honcho integration spec, OpenClaw migration notes,
pricing architecture design, ink-gateway TUI migration plan,
example skin config, and container CLI review fixes.
2026-04-20 01:34:30 +05:30
konsisumer 1d1e1277e4 fix(gateway): flush undelivered tail before segment reset to preserve streamed text (#8124)
When a streaming edit fails mid-stream (flood control, transport error)
and a tool boundary arrives before the fallback threshold is reached,
the pre-boundary tail in `_accumulated` was silently discarded by
`_reset_segment_state`. The user saw a frozen partial message and
missing words on the other side of the tool call.

Flush the undelivered tail as a continuation message before the reset,
computed relative to the last successfully-delivered prefix so we don't
duplicate content the user already saw.
2026-04-19 01:43:04 -07:00
Teknium e017131403 feat(cron): add wakeAgent gate — scripts can skip the agent entirely
Extends the existing cron script hook with a wake gate ported from
nanoclaw #1232. When a cron job's pre-check Python script (already
sandboxed to HERMES_HOME/scripts/) writes a JSON line like
```json
{"wakeAgent": false}
```
on its last stdout line, `run_job()` returns the SILENT marker and
skips the agent entirely — no LLM call, no delivery, no tokens spent.
Useful for frequent polls (every 1-5 min) that only need to wake the
agent when something has genuinely changed.

Any other script output (non-JSON, missing key, non-dict, `wakeAgent: true`,
truthy/falsy non-False values) behaves as before: stdout is injected
as context and the agent runs normally. Strict `False` is required
to skip — avoids accidental gating from arbitrary JSON.

Refactor:
- New pure helper `_parse_wake_gate(script_output)` in cron/scheduler.py
- `_build_job_prompt` accepts optional `prerun_script` tuple so the
  script runs exactly once per job (run_job runs it for the gate check,
  reuses the output for prompt injection)
- `run_job` short-circuits with SILENT_MARKER when gate fires

Script failures (success=False) still cannot trigger the gate — the
failure is reported as context to the agent as before.

This replaces the approach in closed PR #3837, which inlined bash
scripts via tempfile and lost the path-traversal/scripts-dir sandbox
that main's impl has. The wake-gate idea (the one net-new capability)
is ported on top of the existing sandboxed Python-script model.

Tests:
- 11 pure unit tests for _parse_wake_gate (empty, whitespace, non-JSON,
  non-dict JSON, missing key, truthy/falsy non-False, multi-line,
  trailing blanks, non-last-line JSON)
- 5 integration tests for run_job wake-gate (skip returns SILENT,
  wake-true passes through, script-runs-only-once, script failure
  doesn't gate, no-script regression)
- Full tests/cron/ suite: 194/194 pass
2026-04-19 01:42:35 -07:00
helix4u c94d26c69b fix(cli): sanitize interactive command output 2026-04-19 01:16:34 -07:00
kshitijk4poor 175cf7e6bb fix: tighten quiet-mode salvage follow-ups
Follow-up for the helix4u easy-fix salvage batch:
- route remaining context-engine quiet-mode output through
  _should_emit_quiet_tool_messages() so non-CLI/library callers stay
  silent consistently
- drop the extra senderAliases computation from WhatsApp allowlist-drop
  logging and remove the now-unused import

This keeps the batch scoped to the intended fixes while avoiding
leaked quiet-mode output and unnecessary duplicate work in the bridge.
2026-04-19 00:28:25 -07:00
helix4u cd59af17cc fix(agent): silence quiet_mode in python library use 2026-04-19 00:28:25 -07:00
helix4u 361675018f fix(setup): stop hardcoding max-iterations copy 2026-04-19 00:28:25 -07:00
helix4u 3ade655999 fix(whatsapp): log allowlist drops in bridge 2026-04-19 00:28:25 -07:00
Teknium 7c10761dd2 fix(discord): shield text-batch flush from follow-up cancel (#12444)
When Discord splits a long message at 2000 chars, _enqueue_text_event
buffers each chunk and schedules a _flush_text_batch task with a
short delay.  If another chunk lands while the prior flush task is
already inside handle_message, _enqueue_text_event calls
prior_task.cancel() — and without asyncio.shield, CancelledError
propagates from the flush task into handle_message → the agent's
streaming request, aborting the response the user was waiting on.

Reproducer: user sends a 3000-char prompt (split by Discord into 2
messages).  Chunk 1 lands, flush delay starts, chunk 2 lands during
the brief window when chunk 1's flush has already committed to
handle_message.  Agent's current streaming response is cancelled
with CancelledError, user sees a truncated or missing reply.

Fix (gateway/platforms/discord.py):
- Wrap the handle_message call in asyncio.shield so the inner
  dispatch is protected from the outer task's cancel.
- Add an except asyncio.CancelledError clause so the outer task
  still exits cleanly when cancel lands during the sleep window
  (before the pop) — semantics for that path are unchanged.

The new flush task spawned by the follow-up chunk still handles its
own batch via the normal pending-message / active-session machinery
in base.py, so follow-ups are not lost.

Tests: tests/gateway/test_text_batching.py —
test_shield_protects_handle_message_from_cancel.  Tracks a distinct
first_handle_cancelled event so the assertion fails cleanly when the
shield is missing (verified by stashing the fix and re-running).

Live E2E on the live-loaded DiscordAdapter:
  first_handle_cancelled: False  (shield worked)
  first_handle_completed: True   (handle_message ran to completion)
2026-04-19 00:09:38 -07:00
Teknium dca439fe92 fix(tui): scope session.interrupt pending-prompt release to the calling session (#12441)
session.interrupt on session A was blast-resolving pending
clarify/sudo/secret prompts on ALL sessions sharing the same
tui_gateway process.  Other sessions' agent threads unblocked with
empty-string answers as if the user had cancelled — silent
cross-session corruption.

Root cause: _pending and _answers were globals keyed by random rid
with no record of the owning session.  _clear_pending() iterated
every entry, so the session.interrupt handler had no way to limit
the release to its own sid.

Fix:
- tui_gateway/server.py: _pending now maps rid to (sid, Event)
  tuples.  _clear_pending takes an optional sid argument and filters
  by owner_sid when provided.  session.interrupt passes the calling
  sid so unrelated sessions are untouched.  _clear_pending(None)
  remains the shutdown path for completeness.
- _block and _respond updated to pack/unpack the new tuple format.

Tests (tests/test_tui_gateway_server.py): 4 new cases.
- test_interrupt_only_clears_own_session_pending: two sessions with
  pending prompts, interrupting one must not release the other.
- test_interrupt_clears_multiple_own_pending: same-sid multi-prompt
  release works.
- test_clear_pending_without_sid_clears_all: shutdown path preserved.
- test_respond_unpacks_sid_tuple_correctly: _respond handles the
  tuple format.

Also updated tests/tui_gateway/test_protocol.py to use the new tuple
format for test_block_and_respond and test_clear_pending.

Live E2E against the live Python environment confirmed cross-session
isolation: interrupting sid_a released its own pending prompt without
touching sid_b's.  All 78 related tests pass.
2026-04-19 00:03:58 -07:00
Teknium ce410521b3 feat(browser): add browser_cdp raw DevTools Protocol passthrough (#12369)
Agents can now send arbitrary CDP commands to the browser. The tool is
gated on a reachable CDP endpoint at session start — it only appears in
the toolset when BROWSER_CDP_URL is set (from '/browser connect') or
'browser.cdp_url' is configured in config.yaml. Backends that don't
currently expose CDP to the Python side (Camofox, default local
agent-browser, cloud providers whose per-session cdp_url is not yet
surfaced) do not see the tool at all.

Tool schema description links to the CDP method reference at
https://chromedevtools.github.io/devtools-protocol/ so the agent can
web_extract specific method docs on demand.

Stateless per call. Browser-level methods (Target.*, Browser.*,
Storage.*) omit target_id. Page-level methods attach to the target
with flatten=true and dispatch the method on the returned sessionId.
Clean errors when the endpoint becomes unreachable mid-session or
the URL isn't a WebSocket.

Tests: 19 unit (mock CDP server + gate checks) + E2E against real
headless Chrome (Target.getTargets, Browser.getVersion,
Runtime.evaluate with target_id, Page.navigate + re-eval, bogus
method, bogus target_id, missing endpoint) + E2E of the check_fn
gate (tool hidden without CDP URL, visible with it, hidden again
after unset).
2026-04-19 00:03:10 -07:00
helix4u d66414a844 docs(custom-providers): use key_env in examples 2026-04-18 23:07:59 -07:00
helix4u 7b1a11b971 fix(memory): keep Honcho provider opt-in 2026-04-18 22:50:55 -07:00
kshitijk4poor 0a8d48809f chore: add LeonSGP43 numeric noreply email to AUTHOR_MAP
The cherry-picked commit from #11434 uses the 154585401+ prefixed
noreply format. Add it alongside the existing bare entry so the
contributor audit passes.
2026-04-18 22:50:55 -07:00
Erosika 21d5ef2f17 feat(honcho): wizard cadence default 2, surface reasoning level, backwards-compat fallback
Setup wizard now always writes dialecticCadence=2 on new configs and
surfaces the reasoning level as an explicit step with all five options
(minimal / low / medium / high / max), always writing
dialecticReasoningLevel.

Code keeps a backwards-compat fallback of 1 when dialecticCadence is
unset so existing honcho.json configs that predate the setting keep
firing every turn on upgrade. New setups via the wizard get 2
explicitly; docs show 2 as the default.

Also scrubs editorial lines from code and docs ("max is reserved for
explicit tool-path selection", "Unset → every turn; wizard pre-fills 2",
and similar process-exposing phrasing) and adds an inline link to
app.honcho.dev where the server-side observation sync is mentioned in
honcho.md. Recommended cadence range updated to 1-5 across docs and
wizard copy.
2026-04-18 22:50:55 -07:00
LeonSGP43 5b6792f04d fix(honcho): scope gateway sessions by runtime user id 2026-04-18 22:50:55 -07:00
Erosika ba7da73ca9 test(honcho): drop two first-turn tests subsumed by prewarm + smoke coverage
- TestDialecticDepth::test_first_turn_runs_dialectic_synchronously:
  covered by TestSessionStartDialecticPrewarm::test_turn1_falls_back_to_sync_when_prewarm_missing
  (more realistic — exercises the empty-prewarm → sync-fallback path)
- TestDialecticDepth::test_first_turn_dialectic_does_not_double_fire:
  covered by TestDialecticLifecycleSmoke (turn 1 flow) and
  TestDialecticCadenceAdvancesOnSuccess::test_empty_dialectic_result_does_not_advance_cadence

Both predate the prewarm refactor and test paths that are now
fallback behaviors already covered elsewhere.
2026-04-18 22:50:55 -07:00
Erosika c630dfcdac feat(honcho): dialectic liveness — stale-thread watchdog, stale-result discard, empty-streak backoff
Hardens the dialectic lifecycle against three failure modes that could
leave the prefetch pipeline stuck or injecting stale content:

- Stale-thread watchdog: _thread_is_live() treats any prefetch thread
  older than timeout × 2.0 as dead. A hung Honcho call can no longer
  block subsequent fires indefinitely.

- Stale-result discard: pending _prefetch_result is tagged with its
  fire turn. prefetch() discards the result if more than cadence × 2
  turns passed before a consumer read it (e.g. a run of trivial-prompt
  turns between fire and read).

- Empty-streak backoff: consecutive empty dialectic returns widen the
  effective cadence (dialectic_cadence + streak, capped at cadence × 8).
  A healthy fire resets the streak. Prevents the plugin from hammering
  the backend every turn when the peer graph is cold.

- liveness_snapshot() on the provider exposes current turn, last fire,
  pending fire-at, empty streak, effective cadence, and thread status
  for in-process diagnostics.

- system_prompt_block: nudge the model that honcho_reasoning accepts
  reasoning_level minimal/low/medium/high/max per call.

- hermes honcho status: surface base reasoning level, cap, and heuristic
  toggle so config drift is visible at a glance.

Tests: 550 passed.
- TestDialecticLiveness (8 tests): stale-thread recovery, stale-result
  discard, fresh-result retention, backoff widening, backoff ceiling,
  streak reset on success, streak increment on empty, snapshot shape.
- Existing TestDialecticCadenceAdvancesOnSuccess::test_in_flight_thread_is_not_stacked
  updated to set _prefetch_thread_started_at so it tests the
  fresh-thread-blocks branch (stale path covered separately).
- test_cli TestCmdStatus fake updated with the new config attrs surfaced
  in the status block.
2026-04-18 22:50:55 -07:00
Erosika 098efde848 docs(honcho): wizard cadence default 2, prewarm/depth + observation + multi-peer
- cli: setup wizard pre-fills dialecticCadence=2 (code default stays 1
  so unset → every turn)
- honcho.md: fix stale dialecticCadence default in tables, add
  Session-Start Prewarm subsection (depth runs at init), add
  Query-Adaptive Reasoning Level subsection, expand Observation
  section with directional vs unified semantics and per-peer patterns
- memory-providers.md: fix stale default, rename Multi-agent/Profiles
  to Multi-peer setup, add concrete walkthrough for new profiles and
  sync, document observation toggles + presets, link to honcho.md
- SKILL.md: fix stale defaults, add Depth at session start callout
2026-04-18 22:50:55 -07:00
Erosika 5f9907c116 chore(honcho): drop docs from PR scope, scrub commentary
- Revert website/docs and SKILL.md changes; docs unification handled separately
- Scrub commit/PR refs and process narration from code comments and test
  docstrings (no behavior change)
2026-04-18 22:50:55 -07:00
Erosika 78586ce036 fix(honcho): dialectic lifecycle — defaults, retry, prewarm consumption
Several correctness and cost-safety fixes to the Honcho dialectic path
after a multi-turn investigation surfaced a chain of silent failures:

- dialecticCadence default flipped 3 → 1. PR #10619 changed this from 1 to
  3 for cost, but existing installs with no explicit config silently went
  from per-turn dialectic to every-3-turns on upgrade. Restores pre-#10619
  behavior; 3+ remains available for cost-conscious setups. Docs + wizard
  + status output updated to match.

- Session-start prewarm now consumed. Previously fired a .chat() on init
  whose result landed in HonchoSessionManager._dialectic_cache and was
  never read — pop_dialectic_result had zero call sites. Turn 1 paid for
  a duplicate synchronous dialectic. Prewarm now writes directly to the
  plugin's _prefetch_result via _prefetch_lock so turn 1 consumes it with
  no extra call.

- Prewarm is now dialecticDepth-aware. A single-pass prewarm can return
  weak output on cold peers; the multi-pass audit/reconcile cycle is
  exactly the case dialecticDepth was built for. Prewarm now runs the
  full configured depth in the background.

- Silent dialectic failure no longer burns the cadence window.
  _last_dialectic_turn now advances only when the result is non-empty.
  Empty result → next eligible turn retries immediately instead of
  waiting the full cadence gap.

- Thread pile-up guard. queue_prefetch skips when a prior dialectic
  thread is still in-flight, preventing stacked races on _prefetch_result.

- First-turn sync timeout is recoverable. Previously on timeout the
  background thread's result was stored in a dead local list. Now the
  thread writes into _prefetch_result under lock so the next turn
  picks it up.

- Cadence gate applies uniformly. At cadence=1 the old "cadence > 1"
  guard let first-turn sync + same-turn queue_prefetch both fire.
  Gate now always applies.

- Restored query-length reasoning-level scaling, dropped in 9a0ab34c.
  Scales dialecticReasoningLevel up on longer queries (+1 at ≥120 chars,
  +2 at ≥400), clamped at reasoningLevelCap. Two new config keys:
  `reasoningHeuristic` (bool, default true) and `reasoningLevelCap`
  (string, default "high"; previously parsed but never enforced).
  Respects dialecticDepthLevels and proportional lighter-early passes.

- Restored short-prompt skip, dropped in ef7f3156. One-word
  acknowledgements ("ok", "y", "thanks") and slash commands bypass
  both injection and dialectic fire.

- Purged dead code in session.py: prefetch_dialectic, _dialectic_cache,
  set_dialectic_result, pop_dialectic_result — all unused after prewarm
  refactor.

Tests: 542 passed across honcho_plugin/, agent/test_memory_provider.py,
and run_agent/test_run_agent.py. New coverage:
- TestTrivialPromptHeuristic (classifier + prefetch/queue skip)
- TestDialecticCadenceAdvancesOnSuccess (empty-result retry, pile-up guard)
- TestSessionStartDialecticPrewarm (prewarm consumed, sync fallback)
- TestReasoningHeuristic (length bumps, cap clamp, interaction with depth)
- TestDialecticLifecycleSmoke (end-to-end 8-turn session walk)
2026-04-18 22:50:55 -07:00
Teknium bf5d7462ba fix(tui): reject history-mutating commands while session is running (#12416)
Fixes silent data loss in the TUI when /undo, /compress, /retry, or
rollback.restore runs during an in-flight agent turn.  The version-
guard at prompt.submit:1449 would fail the version check and silently
skip writing the agent's result — UI showed the assistant reply but
DB / backend history never received it, causing UI↔backend desync
that persisted across session resume.

Changes (tui_gateway/server.py):
- session.undo, session.compress, /retry, rollback.restore (full-history
  only — file-scoped rollbacks still allowed): reject with 4009 when
  session.running is True.  Users can /interrupt first.
- prompt.submit: on history_version mismatch (defensive backstop),
  attach a 'warning' field to message.complete and log to stderr
  instead of silently dropping the agent's output.  The UI can surface
  the warning to the user; the operator can spot it in logs.

Tests (tests/test_tui_gateway_server.py): 6 new cases.
- test_session_undo_rejects_while_running
- test_session_undo_allowed_when_idle (regression guard)
- test_session_compress_rejects_while_running
- test_rollback_restore_rejects_full_history_while_running
- test_prompt_submit_history_version_mismatch_surfaces_warning
- test_prompt_submit_history_version_match_persists_normally (regression)

Validated: against unpatched server.py the three 'rejects_while_running'
tests fail and the version-mismatch test fails (no 'warning' field).
With the fix, all 6 pass, all 33 tests in the file pass, 74 TUI tests
in total pass.  Live E2E against the live Python environment confirmed
all 5 patches present and guards enforce 4009 exactly as designed.
2026-04-18 22:30:10 -07:00
Teknium 3a6351454b fix(gateway): close pending-drain and late-arrival races in base adapter (#12371)
Two related race conditions in gateway/platforms/base.py that could
produce duplicate agent runs or silently drop messages. Neither is
specific to any one platform — all adapters inherit this logic.

R5 (HIGH) — duplicate agent spawn on turn chain
  In _process_message_background, the pending-drain path deleted
  _active_sessions[session_key] before awaiting typing_task.cancel()
  and then recursively awaiting _process_message_background for the
  queued event. During the typing_task await, a fresh inbound message
  M3 could pass the Level-1 guard (entry now missing), set its own
  Event, and spawn a second _process_message_background for the same
  session_key — two agents running simultaneously, duplicate responses,
  duplicate tool calls.

  Fix: keep the _active_sessions entry populated and only clear() the
  Event. The guard stays live, so any concurrent inbound message takes
  the busy-handler path (queue + interrupt) as intended.

R6 (MED-HIGH) — message dropped during finally cleanup
  The finally block has two await points (typing_task, stop_typing)
  before it deletes _active_sessions. A message arriving in that
  window passes the guard (entry still live), lands in
  _pending_messages via the busy-handler — and then the unconditional
  del removes the guard with that message still queued. Nothing
  drains it; the user never gets a reply.

  Fix: before deleting _active_sessions in finally, pop any late
  pending_messages entry and spawn a drain task for it. Only delete
  _active_sessions when no pending is waiting.

Tests: tests/gateway/test_pending_drain_race.py — three regression
cases. Validated: without the fix, two of the three fail exactly
where the races manifest (duplicate-spawn guard loses identity,
late-arrival 'LATE' message not in processed list).
2026-04-18 19:32:26 -07:00
Teknium 762f7e9796 feat: configurable approval mode for cron jobs (approvals.cron_mode)
Add approvals.cron_mode config option that controls how cron jobs handle
dangerous commands. Previously, cron jobs silently auto-approved all
dangerous commands because there was no user present to approve them.

Now the behavior is configurable:
  - deny (default): block dangerous commands and return a message telling
    the agent to find an alternative approach. The agent loop continues —
    it just can't use that specific command.
  - approve: auto-approve all dangerous commands (previous behavior).

When a command is blocked, the agent receives the same response format as
a user denial in the CLI — exit_code=-1, status=blocked, with a message
explaining why and pointing to the config option. This keeps the agent
loop running and encourages it to adapt.

Implementation:
  - config.py: add approvals.cron_mode to DEFAULT_CONFIG
  - scheduler.py: set HERMES_CRON_SESSION=1 env var before agent runs
  - approval.py: both check_command_approval() and check_all_command_guards()
    now check for cron sessions and apply the configured mode
  - 21 new tests covering config parsing, deny/approve behavior, and
    interaction with other bypass mechanisms (yolo, containers)
2026-04-18 19:24:35 -07:00
Teknium b02833f32d fix(codex): Hermes owns its own Codex auth; stop touching ~/.codex/auth.json (#12360)
Codex OAuth refresh tokens are single-use and rotate on every refresh.
Sharing them with the Codex CLI / VS Code via ~/.codex/auth.json made
concurrent use of both tools a race: whoever refreshed last invalidated
the other side's refresh_token.  On top of that, the silent auto-import
path picked up placeholder / aborted-auth data from ~/.codex/auth.json
(e.g. literal {"access_token":"access-new","refresh_token":"refresh-new"})
and seeded it into the Hermes pool as an entry the selector could
eventually pick.

Hermes now owns its own Codex auth state end-to-end:

Removed
- agent/credential_pool.py: _sync_codex_entry_from_cli() method,
  its pre-refresh + retry + _available_entries call sites, and the
  post-refresh write-back to ~/.codex/auth.json.
- agent/credential_pool.py: auto-import from ~/.codex/auth.json in
  _seed_from_singletons() — users now run `hermes auth openai-codex`
  explicitly.
- hermes_cli/auth.py: silent runtime migration in
  resolve_codex_runtime_credentials() — now surfaces
  `codex_auth_missing` directly (message already points to `hermes auth`).
- hermes_cli/auth.py: post-refresh write-back in
  _refresh_codex_auth_tokens().
- hermes_cli/auth.py: dead helper _write_codex_cli_tokens() and its 4
  tests in test_auth_codex_provider.py.

Kept
- hermes_cli/auth.py: _import_codex_cli_tokens() — still used by the
  interactive `hermes auth openai-codex` setup flow for a user-gated
  one-time import (with "a separate login is recommended" messaging).

User-visible impact
- On existing installs with Hermes auth already present: no change.
- On a fresh install where the user has only logged in via Codex CLI:
  `hermes chat --provider openai-codex` now fails with "No Codex
  credentials stored. Run `hermes auth` to authenticate." The
  interactive setup flow then detects ~/.codex/auth.json and offers a
  one-time import.
- On an install where Codex CLI later refreshes its token: Hermes is
  unaffected (we no longer read from that file at runtime).

Tests
- tests/hermes_cli/test_auth_codex_provider.py: 15/15 pass.
- tests/hermes_cli/test_auth_commands.py: 20/20 pass.
- tests/agent/test_credential_pool.py: 31/31 pass.
- Live E2E on openai-codex/gpt-5.4: 1 API call, 1.7s latency,
  3 log lines, no refresh events, no auth drama.

The related 14:52 refresh-loop bug (hundreds of rotations/minute on a
single entry) is a separate issue — that requires a refresh-attempt
cap on the auth-recovery path in run_agent.py, which remains open.
2026-04-18 19:19:46 -07:00
yeyitech bd01ec7885 fix(cli): strip all reasoning tag variants from /resume recap
HermesCLI._display_resumed_history() calls the module-level _strip_reasoning_tags() to clean assistant content before rendering the recap panel.  The tag list was missing <thought> (Gemma 4) and there was no pass for stray orphan </tag> closes, so those variants leaked internal reasoning into the recap display (#11316).

- Add <thought> to _REASONING_TAGS.
- Add a third regex pass that strips orphan close tags (e.g. 'stuff</think>answer' → 'stuffanswer').
- Apply IGNORECASE to closed-pair and unclosed-pair passes so mixed-case variants (<THINK>, <Thinking>) are handled uniformly — previously both 'THINKING' and 'thinking' had to be listed explicitly as distinct tuple entries, which missed <Thinking>.

7 new regression tests in tests/cli/test_resume_display.py covering: <think>, <thinking>, <reasoning>, <thought>, unclosed <think>, multiple interleaved blocks, and orphan </think> close.

Resolves #11316.

Originally proposed as PR #11366.
2026-04-18 19:19:24 -07:00
Tranquil-Flow ec48ec5530 fix(agent): strip <think> blocks from stored assistant content
Inline reasoning tags in an assistant message's content field leak to every downstream consumer: messaging platforms (#8878, #9568), API replay of prior turns, session transcript, CLI recap, generated session titles, and context compression.  _extract_reasoning() already captures the reasoning text into msg['reasoning'] separately, so the raw tags in content are redundant.

Stripping once at the storage boundary in _build_assistant_message() cleans the content for every downstream path in one place — no per-platform or per-path stripper needed.  Measured impact on a real MiniMax M2.7-highspeed session (per @luoyejiaoe-source, #9306): 55% of assistant messages started with <think> blocks, 51/100 session titles were polluted, 16% content-size reduction.

3 new regression tests in TestBuildAssistantMessage: closed-pair strip with reasoning capture, no-think-tag passthrough, and unterminated-block strip.

Resolves #8878 and #9568.

Originally proposed as PR #9250.
2026-04-18 19:19:24 -07:00
Teknium 9489d1577d fix(agent): strip unterminated <think> blocks from visible content
Providers served via NIM (MiniMax M2.7, some Moonshot/DeepSeek proxies) sometimes drop the closing </think> tag, leaving raw reasoning in the assistant's content field.  _strip_think_blocks()'s closed-pair regex is non-greedy so it only matches complete blocks — any orphan <think>...EOF survived the stripper and leaked to users (#8878, #9568, #10408).

Adds an unterminated-tag pass that fires when an open reasoning tag sits at a block boundary (start of text or after a newline) with no matching close.  Everything from that tag to end of string is stripped.  The block-boundary check mirrors gateway/stream_consumer.py's filter so models that mention <think> in prose are not over-stripped.

Also makes the closed-pair regexes consistently case-insensitive so <THINK>...</THINK> and <Thinking>...</Thinking> are handled uniformly — previously the mixed-case open tag would bypass the closed-pair pass and be caught by the unterminated-tag pass, taking trailing visible content with it.

6 new regression tests in TestStripThinkBlocks covering: unterminated <think>, unterminated <thought>, multi-line unterminated, line-start orphan with preserved prefix, prose-mention non-regression, mixed-case closed pairs.

The implementation is inspired by @luinbytes's PR #10408 report of the NIM/MiniMax symptom.  This commit does not include the 💭/🧠 emoji regexes from that PR — those glyphs are Hermes CLI display decorations, not model content markers.
2026-04-18 19:19:24 -07:00
Teknium 79c5a381c5 feat(uninstall): offer to remove named profiles when uninstalling from default
When `hermes uninstall` runs from the default HERMES_HOME (~/.hermes)
and other named profiles exist under ~/.hermes/profiles/, show them in
the installation overview and prompt:

    Also stop and remove these N profile(s)? [y/N]

If confirmed, for each named profile we:
  1. Shell out to `python -m hermes_cli.main -p <name> gateway stop/uninstall`
     to stop the gateway and remove its systemd unit or launchd plist
     (service names + unit paths are derived from HERMES_HOME, so we
     can't cleanly switch in-process)
  2. Remove the ~/.local/bin/<name> alias wrapper (outside HERMES_HOME)
  3. Wipe the profile's HERMES_HOME dir

Previously `hermes uninstall` was silently profile-scoped, leaving
zombie systemd units at ~/.config/systemd/user/hermes-gateway-<profile>.service
and zombie HERMES_HOMEs under ~/.hermes/profiles/ whenever a user
uninstalled from default with other profiles configured.

Prompt only appears when uninstalling from the default root. Uninstalling
from within a named profile stays profile-scoped as before.
2026-04-18 19:18:13 -07:00
Teknium 3fe0d503b6 fix(uninstall): properly stop and destroy gateway on hermes uninstall
The uninstaller's gateway cleanup was incomplete:
- Linux only (ignored macOS launchd)
- Only checked user systemd scope (missed system services)
- Didn't kill standalone gateway processes (hermes gateway run)
- Missing DBUS env setup for headless servers

Now delegates to gateway.py's existing machinery:
1. Kill any standalone gateway processes (all platforms)
2. Linux: stop + disable + remove both user AND system systemd services
3. macOS: unload + remove launchd plist
4. Warns (instead of silently failing) when system service needs sudo
2026-04-18 19:18:13 -07:00
Teknium 1e5f0439d9 docs: update Anthropic console URLs to platform.claude.com
Anthropic migrated their developer console from console.anthropic.com
to platform.claude.com. Two user-facing display URLs were still pointing
to the old domain:

- hermes_cli/main.py — API key prompt in the Anthropic model flow
- run_agent.py — 401 troubleshooting output

The OAuth token refresh endpoint was already migrated in PR #3246
(with fallback).

Spotted by @LucidPaths in PR #3237.

(Salvage of #3758 — dropped the setup.py hunk since that section was
refactored away and no longer contains the stale URL.)
2026-04-18 18:55:58 -07:00
Teknium 2a2e5c0fed fix: force relogin on 401/403 Codex token refresh failures
When the OAuth token endpoint returns 401/403 but the JSON body
doesn't contain a known error code (invalid_grant, etc.),
relogin_required stayed False. Users saw a bare error message
without guidance to re-authenticate.

Now any 401/403 from the token endpoint forces relogin_required=True,
since these status codes always indicate invalid credentials on a
refresh endpoint. 500+ errors remain as transient (no relogin).
2026-04-18 18:54:34 -07:00
Teknium beabbd87ef fix(gateway): close adapter resources when connect() fails or raises (#12339)
Gateway startup leaks aiohttp.ClientSession (and other partial-init
resources) when an adapter's connect() returns False or raises. The
adapter is never added to self.adapters, so the shutdown path at
gateway/run.py:2426 never calls disconnect() on it — Python GC later
logs 'Unclosed client session' at process exit.

Seen on 2026-04-18 18:08:16 during a double --replace takeover cycle:
one of the partial-init sessions survived past shutdown and emitted
the warning right before status=75/TEMPFAIL.

Fix:
- New GatewayRunner._safe_adapter_disconnect() helper — calls
  adapter.disconnect() and swallows any exception. Used on error paths.
- Connect loop calls it in both failure branches: success=False and
  except Exception.
- Adapter disconnect() implementations are already expected to be
  idempotent and tolerate partial-init state (they all guard on
  self._http_session / self._bridge_process before touching them).

Tests: tests/gateway/test_safe_adapter_disconnect.py — 3 cases verify
the helper forwards to disconnect, swallows exceptions, and tolerates
platform=None.
2026-04-18 18:53:31 -07:00
Teknium 632a807a3e fix(gateway): slash commands never interrupt a running agent (#12334)
Any recognized slash command now bypasses the Level-1 active-session
guard instead of queueing + interrupting. A mid-run /model (or
/reasoning, /voice, /insights, /title, /resume, /retry, /undo,
/compress, /usage, /provider, /reload-mcp, /sethome, /reset) used to
interrupt the agent AND get silently discarded by the slash-command
safety net — zero-char response, dropped tool calls.

Root cause:
- Discord registers 41 native slash commands via tree.command().
- Only 14 were in ACTIVE_SESSION_BYPASS_COMMANDS.
- The other ~15 user-facing ones fell through base.py:handle_message
  to the busy-session handler, which calls running_agent.interrupt()
  AND queues the text.
- After the aborted run, gateway/run.py:9912 correctly identifies the
  queued text as a slash command and discards it — but the damage
  (interrupt + zero-char response) already happened.

Fix:
- should_bypass_active_session() now returns True for any resolvable
  slash command. ACTIVE_SESSION_BYPASS_COMMANDS stays as the subset
  with dedicated Level-2 handlers (documentation + tests).
- gateway/run.py adds a catch-all after the dedicated handlers that
  returns a user-visible "agent busy — wait or /stop first" response
  for any other resolvable command.
- Unknown text / file-path-like messages are unchanged — they still
  queue.

Also:
- gateway/platforms/discord.py logs the invoker identity on every
  slash command (user id + name + channel + guild) so future
  ghost-command reports can be triaged without guessing.

Tests:
- 15 new parametrized cases in test_command_bypass_active_session.py
  cover every previously-broken Discord slash command.
- Existing tests for /stop, /new, /approve, /deny, /help, /status,
  /agents, /background, /steer, /update, /queue still pass.
- test_steer.py's ACTIVE_SESSION_BYPASS_COMMANDS check still passes.

Fixes #5057. Related: #6252, #10370, #4665.
2026-04-18 18:53:22 -07:00
Teknium 41560192c4 chore(attribution): add AUTHOR_MAP entry for nish3451
Adds the nish3451 noreply email to the AUTHOR_MAP so CI attribution checks
pass for the #6100 Telegram DM fallback fix merged in 1a9a2d7f.
2026-04-18 18:52:41 -07:00
Teknium aa5f89d3ea test: add coverage for from_user=None DM fallback
Tests the three cases:
- DM with from_user=None: user_id falls back to chat.id
- Group with from_user=None: user_id stays None (safe default)
- DM with from_user present: user_id uses from_user.id (no regression)
2026-04-18 18:18:01 -07:00
Nish 1a9a2d7fe8 fix(gateway/telegram): fall back to chat.id when from_user is None in DMs
When `message.from_user` is None — which can happen for forwarded messages,
anonymous admin mode in groups, or certain Telegram client edge cases —
`_build_message_event` set `source.user_id` to None. This caused:

1. `_is_user_authorized()` to early-return False (`if not user_id: return False`)
2. The access check never compared against `TELEGRAM_ALLOWED_USERS` even when
   the user actually was in the allowlist
3. The pairing flow fired and generated a code for `user_id=None`
4. The pairing approval saved an entry under the literal string key "null"
5. The user was effectively locked out because their real user_id never
   matched the "null" key on subsequent messages

For DMs (`chat_type == "dm"`), Telegram guarantees `chat.id == user.id` —
they are the same numeric ID for private chats. Falling back to `chat.id`
when `from_user` is None for DMs restores the expected access-control
behavior without weakening it (group/channel chats correctly stay None).

Also adds a parallel `user_name` fallback to `chat.full_name` so the
display name still works in the same edge case.
2026-04-18 18:18:01 -07:00
Teknium 139a6da67c fix(skills): touchdesigner-mcp setup.sh — correct pgrep match + suppress stray yaml output
Discovered while dogfooding the skill end-to-end:

- pgrep -if "TouchDesigner" matched any shell whose command line
  contained the substring (including the setup script's own invocation
  under certain wrappers), falsely reporting TD running on machines
  where it isn't. Switch to pgrep -x (exact process name match,
  supported on both macOS and Linux) and also check TouchDesignerFTE
  (the non-commercial variant).
- The embedded python3 yaml-writer printed 'added' / 'exists' to
  stdout as status, which leaked a stray word into the setup output
  right before the ✔ line. Drop the print()s — the bash-level ✔/✘ is
  the status indicator.
2026-04-18 17:43:42 -07:00
Teknium 6b31e20894 chore(skills): touchdesigner-mcp follow-ups
- Remove orphan skills/creative/touchdesigner/references/pitfalls.md
  left over from the rename commit (git add-then-edit instead of git mv
  meant the old file never got deleted).
- Honour $HERMES_HOME in setup.sh and SKILL.md setup invocation so
  profile-aware installs work correctly.
- Fix troubleshooting.md config path to use $HERMES_HOME instead of
  hardcoding ~/.hermes/.
- Add touchdesigner-mcp entries to skills-catalog.md and
  optional-skills-catalog.md for parity with blender-mcp/meme-generation.
2026-04-18 17:43:42 -07:00
Teknium 11ee87e605 chore(attribution): add AUTHOR_MAP entry for kshitijk4poor@gmail.com
Covers the non-noreply email used on commit dd3e6424 (rename of the
TouchDesigner skill to touchdesigner-mcp).
2026-04-18 17:43:42 -07:00
kshitijk4poor 6d2fe1d624 feat: rename touchdesigner -> touchdesigner-mcp, move to optional-skills/
- Rename skill to touchdesigner-mcp (matches blender-mcp convention)
- Move from skills/creative/ to optional-skills/creative/
- Fix duplicate pitfall numbering (#3 appeared twice)
- Update SKILL.md cross-references for renumbered pitfalls
- Update setup.sh path for new directory location
2026-04-18 17:43:42 -07:00
kshitijk4poor 6f27390fae feat: rewrite TouchDesigner skill for twozero MCP (v2.0.0)
Major rewrite of the TouchDesigner skill:
- Replace custom API handler with twozero MCP (36 native tools)
- Add audio-reactive GLSL proven recipe (spectrum chain, pitfalls)
- Add recording checklist (FPS>0, non-black, audio cueing)
- Expand pitfalls: 38 entries from real sessions (was 20)
- Update network-patterns with MCP-native build scripts
- Rewrite mcp-tools reference for twozero v2.774+
- Update troubleshooting for MCP-based workflow
- Remove obsolete custom_api_handler.py
- Generalize Environment section for all users
- Remove session-specific Paired Skills section
- Bump version to 2.0.0
2026-04-18 17:43:42 -07:00
kshitijk4poor 7a5371b20d feat: add TouchDesigner integration skill
New skill: creative/touchdesigner — control a running TouchDesigner
instance via REST API. Build real-time visual networks programmatically.

Architecture:
  Hermes Agent -> HTTP REST (curl) -> TD WebServer DAT -> TD Python env

Key features:
- Custom API handler (scripts/custom_api_handler.py) that creates a
  self-contained WebServer DAT + callback in TD. More reliable than the
  official mcp_webserver_base.tox which frequently fails module imports.
- Discovery-first workflow: never hardcode TD parameter names. Always
  probe the running instance first since names change across versions.
- Persistent setup: save the TD project once with the API handler baked
  in. TD auto-opens the last project on launch, so port 9981 is live
  with zero manual steps after first-time setup.
- Works via curl in execute_code (no MCP dependency required).
- Optional MCP server config for touchdesigner-mcp-server npm package.

Skill structure (2823 lines total):
  SKILL.md (209 lines) — setup, workflow, key rules, operator reference
  references/pitfalls.md (276 lines) — 24 hard-won lessons
  references/operators.md (239 lines) — all 6 operator families
  references/network-patterns.md (589 lines) — audio-reactive, generative,
    video processing, GLSL, instancing, live performance recipes
  references/mcp-tools.md (501 lines) — 13 MCP tool schemas
  references/python-api.md (443 lines) — TD Python scripting patterns
  references/troubleshooting.md (274 lines) — connection diagnostics
  scripts/custom_api_handler.py (140 lines) — REST API handler for TD
  scripts/setup.sh (152 lines) — prerequisite checker

Tested on TouchDesigner 099 Non-Commercial (macOS/darwin).
2026-04-18 17:43:42 -07:00
Teknium c49a58a6d0 fix(gateway): mark only still-running sessions resume_pending on drain timeout (#12332)
Follow-up to #12301.

The drain-timeout branch of _stop_impl() was iterating the drain-start
snapshot (active_agents) when marking sessions resume_pending. That
snapshot can include sessions that finished gracefully during the drain
window — marking them would give their next turn a stray
'your previous turn was interrupted by a gateway restart' system note
even though the prior turn actually completed cleanly.

Iterate self._running_agents at timeout time instead, mirroring
_interrupt_running_agents() exactly:
- only sessions still blocking the shutdown get marked
- pending sentinels (AIAgent construction not yet complete) are skipped

Changes:
- gateway/run.py: swap active_agents.keys() for filtered
  self._running_agents.items() iteration in the drain-timeout mark loop.
- tests/gateway/test_restart_resume_pending.py: two regression tests —
  finisher-during-drain not marked, pending sentinel not marked.
2026-04-18 17:40:34 -07:00
Teknium cb4addacab fix(gateway): auto-resume sessions after drain-timeout restart (#11852) (#12301)
The shutdown banner promised "send any message after restart to resume
where you left off" but the code did the opposite: a drain-timeout
restart skipped the .clean_shutdown marker, which made the next startup
call suspend_recently_active(), which marked the session suspended,
which made get_or_create_session() spawn a fresh session_id with a
'Session automatically reset. Use /resume...' notice — contradicting
the banner.

Introduce a resume_pending state on SessionEntry that is distinct from
suspended. Drain-timeout shutdown flags active sessions resume_pending
instead of letting startup-wide suspension destroy them. The next
message on the same session_key preserves the session_id, reloads the
transcript, and the agent receives a reason-aware restart-resume
system note that subsumes the existing tool-tail auto-continue note
(PR #9934).

Terminal escalation still flows through the existing
.restart_failure_counts stuck-loop counter (PR #7536, threshold 3) —
no parallel counter on SessionEntry. suspended still wins over
resume_pending in get_or_create_session() so genuinely stuck sessions
converge to a clean slate.

Spec: PR #11852 (BrennerSpear). Implementation follows the spec with
the approved correction (reuse .restart_failure_counts rather than
adding a resume_attempts field).

Changes:
- gateway/session.py: SessionEntry.resume_pending/resume_reason/
  last_resume_marked_at + to_dict/from_dict; SessionStore
  .mark_resume_pending()/clear_resume_pending(); get_or_create_session()
  returns existing entry when resume_pending (suspended still wins);
  suspend_recently_active() skips resume_pending entries.
- gateway/run.py: _stop_impl() drain-timeout branch marks active
  sessions resume_pending before _interrupt_running_agents();
  _run_agent() injects reason-aware restart-resume system note that
  subsumes the tool-tail case; successful-turn cleanup also clears
  resume_pending next to _clear_restart_failure_count();
  _notify_active_sessions_of_shutdown() softens the restart banner to
  'I'll try to resume where you left off' (honest about stuck-loop
  escalation).
- tests/gateway/test_restart_resume_pending.py: 29 new tests covering
  SessionEntry roundtrip, mark/clear helpers, get_or_create_session
  precedence (suspended > resume_pending), suspend_recently_active
  skip, drain-timeout mark reason (restart vs shutdown), system-note
  injection decision tree (including tool-tail subsumption), banner
  wording, and stuck-loop escalation override.
2026-04-18 17:32:17 -07:00
brooklyn! ad99e32371 Merge pull request #12312 from NousResearch/bb/tui-ux-pack
feat(tui): UX pack — stable picker keys, /clear confirm, light-theme preset
2026-04-18 18:13:06 -05:00
Brooklyn Nicholson df5ca5065f feat(tui): replace /clear double-press gate with a proper confirm overlay
The time-window gate felt wrong — users would hit /clear, read the
prompt, retype, and consistently blow past the window. Swapping to a
real yes/no overlay that blocks input like the existing Approval and
Clarify prompts.

- add ConfirmReq type + OverlayState.confirm + $isBlocked coverage
- ConfirmPrompt component (prompts.tsx): cancel row on top as the
  default, danger-coloured confirm row on the bottom, Y/N hotkeys,
  Enter on default = cancel, Esc/Ctrl+C cancel
- wire into PromptZone (appOverlays.tsx)
- /clear + /new now push onto the overlay instead of arming a timer
- HERMES_TUI_NO_CONFIRM=1 still skips the prompt for scripting
- drop the destructiveGate + createSlashHandler reset wiring
  (destructive.ts and its tests removed)

Refs #4069.
2026-04-18 18:04:08 -05:00
Brooklyn Nicholson 75377feb07 fix(tui): make /clear confirm window humane (3s → 30s, reset on other slash)
The 3s gate was too tight — users reading the prompt and retyping
consistently blow past it and get stuck in a loop ("press /clear
again within 3s" forever). Fixes:

- bump CONFIRM_WINDOW_MS 3_000 → 30_000
- drop the time number from the confirmation message to remove the
  pressure vibe: "press /clear again to confirm — starts a new session"
- reset the gate from createSlashHandler whenever any non-destructive
  slash command runs, so stale arming from 20s ago can't silently
  turn the next /clear into an unintended confirm
- export the gate + isDestructiveCommand helper for that wiring
- add armed() introspection method

Follow-up to #4069 / 3366714b.
2026-04-18 17:55:53 -05:00
Brooklyn Nicholson 20eab355e7 feat(tui): add LIGHT_THEME preset for white/light terminal backgrounds
Splits the existing palette into DARK_THEME (current yellow-heavy
default) and LIGHT_THEME (darker browns + proper contrast on white).
DEFAULT_THEME aliases DARK_THEME, and flips to LIGHT_THEME when
HERMES_TUI_LIGHT=1 is set at launch.

Skin system (fromSkin) still layers on top of whichever preset is
active, so users can keep customizing on top of either palette.

Refs #11300.
2026-04-18 17:49:40 -05:00
Brooklyn Nicholson 3366714ba4 feat(tui): double-press confirm on /clear and /new
Prevents accidental session loss: the first press prints
"press /clear again within 3s to confirm"; a second press inside
the window actually starts a new session. Outside the window the
gate re-arms.

Opt out with HERMES_TUI_NO_CONFIRM=1 for scripted / muscle-memory
workflows.

Refs #4069.
2026-04-18 17:48:34 -05:00
Brooklyn Nicholson 52124384de fix(tui): stable React keys in /model picker rows
Use provider.slug (and a composite key for model rows) instead of the
rendered string, so dupes in the backend response can't collapse two
rows into one or trigger key-collision warnings.
2026-04-18 17:47:26 -05:00
brooklyn! db59c190c1 Merge pull request #12305 from NousResearch/bb/tui-status-git-branch
feat(tui): append git branch to cwd label in status bar
2026-04-18 17:27:40 -05:00
brooklyn! c0edcf2d53 Merge pull request #12306 from NousResearch/bb/tui-model-picker-dedupe-names
fix(tui): disambiguate /model picker rows when provider display names collide
2026-04-18 17:27:31 -05:00
Brooklyn Nicholson 4aa52590d8 fix(tui): disambiguate /model picker rows when provider display names collide
If the gateway returns two providers that resolve to the same display name
(e.g. `kimi-coding` and `kimi-coding-cn` both → "Kimi For Coding"), the
picker now appends the slug so users can tell them apart, in both the
provider list and the selected-provider header. No-op when names are
already unique.

Refs #10526 — the Python backend dedupe from #10599 skips one alias, but
user-defined providers, canonical overlays, and future regressions can
still surface as indistinguishable rows in the picker. This is a
client-side safety net on top of that.
2026-04-18 17:22:23 -05:00
Brooklyn Nicholson ff2aa7ccd7 feat(tui): append git branch to cwd label in status bar
Adds useGitBranch hook (async, cached, 15s TTL) and fmtCwdBranch
helper so the footer shows `~/repo (main)` instead of just `~/repo`.
Degrades silently when git is unavailable or cwd is outside a repo.

Partial fix for #12267 (TUI portion; #12277 covers the Python side).
2026-04-18 17:17:05 -05:00
Teknium 0175ff7516 feat(skills): replace xitter with xurl — the official X API CLI (#12303)
Swap the social-media/xitter skill (third-party wrapper around
Infatoshi/x-cli) for a new social-media/xurl skill wrapping
xdevplatform/xurl — the official X API CLI from the X developer
platform team.

Why:
- xurl is officially maintained by the X dev platform team
- OAuth 2.0 PKCE with auto-refresh + multi-app / multi-user support
  (vs. xitter's 5-env-var OAuth 1.0a + single account)
- Credentials stored in ~/.xurl managed by xurl itself — no manual
  env var juggling for users
- Substantially larger API surface: DMs, follows, blocks, mutes,
  media upload, streaming, and raw v2 endpoint access
- Ships stronger agent-safety guardrails (forbidden-flag list,
  no --verbose in agent mode, never-read-~/.xurl rule)

Adaptation:
- Ported the openclaw SKILL.md (which the xdevplatform team seeded)
  to Hermes frontmatter conventions (prerequisites.commands, platforms,
  metadata.hermes.tags/homepage) — dropped openclaw-specific metadata
- Added a Hermes-oriented one-time user setup section so the agent
  knows to direct the user to run auth commands themselves, never
  execute them with inline secrets
- Preserved the mandatory secret-safety rules verbatim
- Attribution block credits xdevplatform, openclaw, and the Hermes
  port

Docs: updated website/docs/reference/skills-catalog.md to replace
the xitter row with xurl.
2026-04-18 15:11:32 -07:00
Teknium 6a3a6a0fb6 Merge pull request #12263 from NousResearch/bb/tui-audit-followup
fix(tui): TUI v2 audit follow-up — registry, overlays, paste, reasoning, hyperlinks
2026-04-18 14:40:16 -07:00
helix4u 4e8f60fd11 fix(cli): use display width for wrapped spinner height 2026-04-18 14:34:05 -07:00
Brooklyn Nicholson fb06bc67de fix(tui): Ctrl+C with input selection actually preserves input (lift handler to app level)
Previous fix in 9dbf1ec6 handled Ctrl+C inside textInput but the APP-level
useInputHandlers fires the same keypress in a separate React hook and ran
clearIn() regardless. Net effect: the OSC 52 copy succeeded but the input
wiped right after, so Brooklyn only noticed the wipe.

Lift the selection-aware Ctrl+C to a single place by threading input
selection state through a new nanostore (src/app/inputSelectionStore.ts).
textInput syncs its derived `selected` range + a clear() callback to the
store on every selection change, and the app-level Ctrl+C handler reads
the store before its clear/interrupt/die chain:

  - terminal-level selection (scrollback) → copy, existing behavior
  - in-input selection present → copy + clear selection, preserve input
  - input has text, no selection → clearIn(), existing behavior
  - empty + busy → interrupt turn
  - empty + idle → die

textInput no longer has its own Ctrl+C block; keypress falls through to
app-level like it did before 9dbf1ec6.
2026-04-18 16:28:51 -05:00
Brooklyn Nicholson bfac5d039d Merge branch 'main' of github.com:NousResearch/hermes-agent into bb/tui-audit-followup 2026-04-18 15:27:40 -05:00
Brooklyn Nicholson 17e95a26b7 fix(tui): render /skills browse as a formatted Panel instead of raw JSON
Previous handler dumped the raw skills.manage response into a pager, which
was unreadable and hid the pagination metadata. Also silently accepted
non-numeric page args.

Now:
- validates page arg (rejects NaN / <1 with a usage message)
- shows "fetching community skills (scans 6 sources, may take ~15s)…" up
  front so the 10-30s hub fetch isn't a silent hang
- renders items as {name · trust, description (truncated 160 chars)} rows
  in the existing Panel component
- footer shows "page X of Y · N skills total · /skills browse N+1 for more"
  when the server returned pagination metadata

Skills hub's remote fetch latency is a separate upstream issue
(browse_skills hits 6 sources sequentially) — client-side we just stop
misrepresenting it.
2026-04-18 15:22:43 -05:00
Brooklyn Nicholson 7e9a098574 chore: uptick 2026-04-18 15:17:42 -05:00
Brooklyn Nicholson 450ded98db chore(tui): prettier whitespace on files touched in this branch 2026-04-18 15:13:31 -05:00
Brooklyn Nicholson 93b4080b78 Merge branch 'main' of github.com:NousResearch/hermes-agent into bb/tui-audit-followup
# Conflicts:
#	ui-tui/src/components/markdown.tsx
#	ui-tui/src/types/hermes-ink.d.ts
2026-04-18 14:52:54 -05:00
helix4u ca32a2a60b fix(gemini): restore bearer auth on openai route 2026-04-18 12:52:01 -07:00
helix4u a7dd6a3449 fix(gemini): hide stale and low-TPM Google models 2026-04-18 12:52:01 -07:00
helix4u 2eab7ee15f fix(gemini): hide low-TPM Gemma models from exposed lists 2026-04-18 12:52:01 -07:00
LVT382009 f7af90e2da fix: wire _ephemeral_max_output_tokens into chat_completions and add NVIDIA NIM default
Based on #12152 by @LVT382009.

Two fixes to run_agent.py:

1. _ephemeral_max_output_tokens consumption in chat_completions path:
   The error-recovery ephemeral override was only consumed in the
   anthropic_messages branch of _build_api_kwargs.  All chat_completions
   providers (OpenRouter, NVIDIA NIM, Qwen, Alibaba, custom, etc.)
   silently ignored it.  Now consumed at highest priority, matching the
   anthropic pattern.

2. NVIDIA NIM max_tokens default (16384):
   NVIDIA NIM falls back to a very low internal default when max_tokens
   is omitted, causing models like GLM-4.7 to truncate immediately
   (thinking tokens exhaust the budget before the response starts).

3. Progressive length-continuation boost:
   When finish_reason='length' triggers a continuation retry, the output
   budget now grows progressively (2x base on retry 1, 3x on retry 2,
   capped at 32768) via _ephemeral_max_output_tokens.  Previously the
   retry loop just re-sent the same token limit on all 3 attempts.
2026-04-18 12:51:30 -07:00
jarvischer 0f778f7768 fix: prevent tool name duplication in streaming accumulator (MiniMax/NVIDIA NIM)
Based on #11984 by @maxchernin.  Fixes #8259.

Some providers (MiniMax M2.7 via NVIDIA NIM) resend the full function
name in every streaming chunk instead of only the first.  The old
accumulator used += which concatenated them into 'read_fileread_file'.

Changed to simple assignment (=), matching the OpenAI Node SDK, LiteLLM,
and Vercel AI SDK patterns.  Function names are atomic identifiers
delivered complete — no provider splits them across chunks, so
concatenation was never correct semantics.
2026-04-18 12:50:32 -07:00
Brooklyn Nicholson 4caf6c23dd fix(tui): strip <think>…</think> tags from assistant content and route to reasoning panel
Models that emit reasoning inline as <think>/<reasoning>/<thinking>/<thought>/
<REASONING_SCRATCHPAD> tags in the content field (rather than a separate API
reasoning channel) had the raw tags + inner content shown twice: once as body
text with literal <think> markers, and again in the thinking panel when the
reasoning field was populated.

Port v1's tag set to lib/reasoning.ts with a splitReasoning(text) helper that
returns { reasoning, text }. Applied in three spots:

  - scheduleStreaming: strips tags from the live streaming view so the user
    never sees <think> mid-turn.
  - flushStreamingSegment: when a tool interrupts assistant output mid-turn,
    the saved segment is the stripped text; extracted reasoning promotes to
    reasoningText if the API channel hasn't already populated it.
  - recordMessageComplete: final message text is split, extracted reasoning
    merges with any existing reasoning (API channel wins on conflicts so we
    don't double-count when both are present).
2026-04-18 14:46:38 -05:00
Brooklyn Nicholson 37cba82bfc fix(tui): Ctrl+C on in-input selection copies to clipboard instead of clearing
Before: textInput explicitly ignored Ctrl+C so the app-level handler took
over — with no knowledge of the TextInput's own selection — and fell through
to clearIn() whenever input had text. Selecting part of the composer and
pressing Ctrl+C silently nuked everything you typed.

Now: Ctrl+C with an active in-input selection writes the selected substring
to the clipboard via OSC 52 and clears the selection. The original semantics
(Ctrl+C with no selection → app-level interrupt/clear/die chain) are
preserved by still returning early in that case.
2026-04-18 14:42:03 -05:00
Teknium 0bebf5b948 chore(attribution): add AUTHOR_MAP entry for Honghua Yang (honghua) 2026-04-18 12:40:56 -07:00
Honghua Yang 3128d9fcd2 fix(context_compressor): keep tool-call arguments JSON valid when shrinking
Pass 3 of `_prune_old_tool_results` previously shrunk long `function.arguments`
blobs by slicing the raw JSON string at byte 200 and appending the literal
text `...[truncated]`. That routinely produced payloads like::

    {"path": "/foo.md", "content": "# Long markdown
    ...[truncated]

— an unterminated string with no closing brace. Strict providers (observed
on MiniMax) reject this as `invalid function arguments json string` with a
non-retryable 400. Because the broken call survives in the session history,
every subsequent turn re-sends the same malformed payload and gets the same
400, locking the session into a re-send loop until the call falls out of
the window.

Fix: parse the arguments first, shrink long string leaves inside the parsed
structure, and re-serialise. Non-string values (paths, ints, booleans, lists)
pass through intact. Arguments that are not valid JSON to begin with (rare,
some backends use non-JSON tool args) are returned unchanged rather than
replaced with something neither we nor the provider can parse.

Observed in the wild: a `write_file` with ~800 chars of markdown `content`
triggered this on a real session against MiniMax-M2.7; every turn after
compression got rejected until the session was manually reset.

Tests:
- 7 direct tests of `_truncate_tool_call_args_json` covering valid-JSON
  output, non-JSON pass-through, nested structures, non-string leaves,
  scalar JSON, and Unicode preservation
- 1 end-to-end test through `_prune_old_tool_results` Pass 3 that
  reproduces the exact failure payload shape from the incident

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 12:40:56 -07:00
Brooklyn Nicholson 5c8b291607 fix(tui): wrap markdown links in Link so Ghostty/iTerm/kitty get real OSC 8 hyperlinks
renderLink was discarding the URL entirely — it rendered the label as amber
underlined text and dropped the href. Result: Cmd+Click / Ctrl+Click did
nothing in any terminal, including Ghostty.

Now both markdown links `[label](url)` and bare `https://…` URLs are wrapped
in @hermes/ink's Link component, which emits OSC 8 (\\x1b]8;;url\\x07label\\x1b]8;;\\x07)
when supportsHyperlinks() returns true. ADDITIONAL_HYPERLINK_TERMINALS already
includes ghostty, iTerm2, kitty, alacritty, Hyper.

Autolinks that look like bare emails (foo@bar.com) now prepend mailto: in the
href so they open the mail client correctly.

Also adds a typed declaration for Link in hermes-ink.d.ts.
2026-04-18 14:39:24 -05:00
Brooklyn Nicholson a7f4d756b7 fix(tui): cap approval prompt command preview at 10 lines
Large inline scripts (e.g. Python code_execution bodies) rendered as a single
unbounded <Text> block, pushing the Allow/Deny options below the visible
viewport. Users had to scroll the terminal to vote.

Preview now shows the first 10 lines with truncate-end wrap per line and a
dim "… +N more lines" indicator. Full text remains in the transcript above.
2026-04-18 14:36:34 -05:00
Teknium b73ebfee30 chore(attribution): add AUTHOR_MAP entry for Jim Liu (JimLiu)
Maps junminliu@gmail.com → JimLiu for the baoyu-infographic skill port
co-author attribution.
2026-04-18 12:32:16 -07:00
Teknium ade7958f1f docs: add PORT_NOTES.md for baoyu-infographic
Documents what changed from upstream and how to sync future updates.
2026-04-18 12:32:16 -07:00
Teknium 65c0a30a77 feat(skills): add baoyu-infographic skill — 21 layouts × 21 styles
Port of baoyu-infographic from JimLiu/baoyu-skills (v1.56.1) adapted
for Hermes Agent's tool ecosystem.

Adaptations from upstream:
- Frontmatter: openclaw metadata → hermes metadata
- Usage: slash command syntax → natural language triggers
- Removed EXTEND.md config system (not part of Hermes infrastructure)
- AskUserQuestion → clarify tool (one question at a time)
- Image generation → image_generate tool
- Removed Windows-specific paths
- Simplified file operations to use Hermes file tools
- All 45 reference files (layouts, styles, templates) preserved intact

Attribution preserved per agreement with 宝玉 (Jim Liu):
- author, version, GitHub homepage URL in frontmatter

Co-authored-by: Jim Liu 宝玉 <junminliu@gmail.com>
2026-04-18 12:32:16 -07:00
Siddharth Balyan a828daa7f8 perf(docker): layer-cache npm/Playwright and skip redundant web rebuild (#12225)
* perf(docker): layer-cache npm/Playwright and skip redundant web rebuild

Copy package manifests before source so npm install + Playwright only
re-run when lockfiles change. Use COPY --chown instead of chown -R,
set HERMES_WEB_DIST to skip runtime web rebuild, and drop the
USER root / chmod dance since entrypoint.sh is already executable in git.

* Update Dockerfile
2026-04-18 22:44:31 +05:30
bluefishs b0bde98b0f fix(docker): build web/ dashboard assets in image (#12180)
The Dockerfile installs root-level npm dependencies (for Playwright) and the
whatsapp-bridge bundle, but never builds the web/ Vite project. As a result,
'hermes dashboard' starts FastAPI on :9119 but serves a broken SPA because
hermes_cli/web_dist/ is empty and requests to /assets/index-<hash>.js 404.

Add a build step inside web/ so the Vite output is baked into the image.

Reproduce (before):
  docker build -t hermes-repro -f Dockerfile .
  docker run --rm -p 9119:9119 hermes-repro hermes dashboard
  curl -sI http://localhost:9119/assets/ | head -1   # -> 404

After: /assets/ returns the built asset path.
2026-04-18 22:20:24 +05:30
kshitij c14b3b5880 fix(kimi): force fixed temperature on kimi-k2.* models (k2.5, thinking, turbo) (#12144)
* fix(kimi): force fixed temperature on kimi-k2.* models (k2.5, thinking, turbo)

The prior override only matched the literal model name "kimi-for-coding",
but Moonshot's coding endpoint is hit with real model IDs such as
`kimi-k2.5`, `kimi-k2-turbo-preview`, `kimi-k2-thinking`, etc.  Those
requests bypassed the override and kept the caller's temperature, so
Moonshot returns HTTP 400 "invalid temperature: only 0.6 is allowed for
this model" (or 1.0 for thinking variants).

Match the whole kimi-k2.* family:
  * kimi-k2-thinking / kimi-k2-thinking-turbo -> 1.0 (thinking mode)
  * all other kimi-k2.* -> 0.6 (non-thinking / instant mode)

Also accept an optional vendor prefix (e.g. `moonshotai/kimi-k2.5`) so
aggregator routings are covered.

* refactor(kimi): whitelist-match kimi coding models instead of prefix

Addresses review feedback on PR #12144.

- Replace `startswith("kimi-k2")` with explicit frozensets sourced from
  Moonshot's kimi-for-coding model list.  The prefix match would have also
  clamped `kimi-k2-instruct` / `kimi-k2-instruct-0905`, which are the
  separate non-coding K2 family with variable temperature (recommended 0.6
  but not enforced — see huggingface.co/moonshotai/Kimi-K2-Instruct).
- Confirmed via platform.kimi.ai docs that all five coding models
  (k2.5, k2-turbo-preview, k2-0905-preview, k2-thinking, k2-thinking-turbo)
  share the fixed-temperature lock, so the preview-model mapping is no
  longer an assumption.
- Drop the fragile `"thinking" in bare` substring test for a set lookup.
- Log a debug line on each override so operators can see when Hermes
  silently rewrites temperature.
- Update class docstring.  Extend the negative test to parametrize over
  kimi-k2-instruct, Kimi-K2-Instruct-0905, and a hypothetical future
  kimi-k2-experimental name — all must keep the caller's temperature.
2026-04-18 09:35:51 -07:00
kshitijk4poor 656c375855 fix(tui): review follow-up — /retry, /plan, ANSI truncation, caching
- /retry: use session['history'] instead of non-existent
  agent.conversation_history; truncate history at last user message
  to match CLI retry_last() behavior; add history_lock safety
- /plan: pass user instruction (arg) to build_plan_path instead of
  session_key; add runtime_note so agent knows where to save the plan
- ANSI tool results: render full text via <Ansi wrap=truncate-end>
  instead of slicing raw ANSI through compactPreview (which cuts
  mid-escape-sequence producing garbled output)
- Move _PENDING_INPUT_COMMANDS frozenset to module level
- Use get_skill_commands() (cached) instead of scan_skill_commands()
  (rescans disk) in slash.exec skill interception
- Add 3 retry tests: happy path with history truncation verification,
  empty history error, multipart content extraction
- Update test mock target from scan_skill_commands to get_skill_commands
2026-04-18 09:30:48 -07:00
kshitijk4poor abc95338c2 fix(tui): slash.exec _pending_input commands, tool ANSI, terminal title
Additional TUI fixes discovered in the same audit:

1. /plan slash command was silently lost — process_command() queues the
   plan skill invocation onto _pending_input which nobody reads in the
   slash worker subprocess.  Now intercepted in slash.exec and routed
   through command.dispatch with a new 'send' dispatch type.

   Same interception added for /retry, /queue, /steer as safety nets
   (these already have correct TUI-local handlers in core.ts, but the
   server-side guard prevents regressions if the local handler is
   bypassed).

2. Tool results were stripping ANSI escape codes — the messageLine
   component used stripAnsi() + plain <Text> for tool role messages,
   losing all color/styling from terminal, search_files, etc.  Now
   uses <Ansi> component (already imported) when ANSI is detected.

3. Terminal tab title now shows model + busy status via useTerminalTitle
   hook from @hermes/ink (was never used).  Users can identify Hermes
   tabs and see at a glance whether the agent is busy or ready.

4. Added 'send' variant to CommandDispatchResponse type + asCommandDispatch
   parser + createSlashHandler handler for commands that need to inject
   a message into the conversation (plan, queue fallback, steer fallback).
2026-04-18 09:30:48 -07:00
kshitijk4poor 2da558ec36 fix(tui): clickable hyperlinks and skill slash command dispatch
Two TUI fixes:

1. Hyperlinks are now clickable (Cmd+Click / Ctrl+Click) in terminals
   that support OSC 8.  The markdown renderer was rendering links as
   plain colored text — now wraps them in the existing <Link> component
   from @hermes/ink which emits OSC 8 escape sequences.

2. Skill slash commands (e.g. /hermes-agent-dev) now work in the TUI.
   The slash.exec handler was delegating to the _SlashWorker subprocess
   which calls cli.process_command().  For skills, process_command()
   queues the invocation message onto _pending_input — a Queue that
   nobody reads in the worker subprocess.  The skill message was lost.
   Now slash.exec detects skill commands early and rejects them so
   the TUI falls through to command.dispatch, which correctly builds
   and returns the skill payload for the client to send().
2026-04-18 09:30:48 -07:00
Siddharth Balyan b0efdf37d7 fix(nix): upgrade Python 3.11 → 3.12, add cross-platform eval check (#12208) 2026-04-18 21:51:03 +05:30
Siddharth Balyan 8a0c774e9e Add web dashboard build to Nix flake (#12194)
The web dashboard (Vite/React frontend) is now built as a separate Nix
derivation and baked into the Hermes package. The build output is
installed to a standard location and exposed via the `HERMES_WEB_DIST`
environment variable, allowing the dashboard command to use pre-built
assets when available (e.g., in packaged releases) instead of rebuilding
on every invocation.
2026-04-18 20:55:39 +05:30
Brooklyn Nicholson f8becbfbea feat(tui): per-language syntax highlighting in markdown code fences
Adds a minimal hand-rolled highlighter for ts/js/jsx/tsx, py, sh/bash, go, rust,
json, yaml, sql. Recognizes whole-line comments, single/double/backtick strings,
numbers, and per-language keyword sets. Unknown langs fall through to the current
plain rendering; the existing diff-specific colorization is preserved.

Closes the §8 "Markdown syntax highlighting is missing (only diff gets colored)"
finding from the TUI v2 audit without pulling in a highlighter library.
2026-04-18 09:48:38 -05:00
Brooklyn Nicholson 5e148ca3d0 fix(tui): route /skills subcommands through skills.manage instead of curses slash.exec
/skills install, inspect, search, browse, list now call the typed skills.manage RPC
and render results via panel/page. Previously they fell through to slash.exec which
invokes v1's curses code path — that hangs or crashes inside the Ink worker per the
§2 parity-audit finding.

Also drop Enter-as-install from the Skills Hub action stage since the Hub lists
locally installed skills; primary action is inspect-and-close. x still triggers a
manual reinstall for power users.
2026-04-18 09:46:36 -05:00
Brooklyn Nicholson 949b8f5521 feat(tui): register /skills slash command to open Skills Hub
Intercept bare /skills locally and flip overlay.skillsHub, so the
overlay opens instantly without waiting on slash.exec. /skills <args>
still forwards to slash.exec and paginates any output. Tests cover
both branches.
2026-04-18 09:42:57 -05:00
Brooklyn Nicholson ef284e021a feat(tui): add two-step SkillsHub overlay component
New SkillsHub mirrors ModelPicker's category → item → actions flow with
paginated 12-line lists, 1-9/0 quick-pick, Esc-back navigation, and
lazy skills.manage inspect/install calls. Mount it from appOverlays
when overlay.skillsHub is true.
2026-04-18 09:42:57 -05:00
Brooklyn Nicholson 6fbfae8f42 feat(tui): add skillsHub overlay state wiring
Extend OverlayState with a skillsHub flag, fold it into $isBlocked, and
teach Ctrl+C to close the overlay so later PRs can render the component
behind this slot.
2026-04-18 09:42:57 -05:00
Brooklyn Nicholson 3821323029 feat(tui): render per-MCP-server status block in SessionPanel 2026-04-18 09:42:57 -05:00
Brooklyn Nicholson b82ec6419d test(tui-gateway): cover mcp_servers field in _session_info output 2026-04-18 09:42:57 -05:00
Brooklyn Nicholson 202b78ec68 feat(tui-gateway): include per-MCP-server status in session.info payload 2026-04-18 09:42:57 -05:00
Brooklyn Nicholson fd6ffc777f feat(tui): honor display.* flags in turn renderer, status bar, and event handler
- turnController gates scheduleStreaming / reasoning recorders on
  streaming + showReasoning so disabling them keeps the buffer silent
  until message.complete flushes
- createGatewayEventHandler only surfaces inline_diff previews when
  inlineDiffs is on
- StatusRule takes a showCost prop and renders `· $X.XXXX` with the
  same toFixed(4) formatting as /usage when usage.cost_usd is present
- Usage grows cost_usd?: number to match the gateway payload
- Existing handler tests flip showReasoning on in beforeEach so
  reasoning-flow assertions keep their meaning
2026-04-18 09:42:57 -05:00
Brooklyn Nicholson 200c17433c feat(tui): read display.streaming / show_reasoning / show_cost / inline_diffs from config
Extends ConfigDisplayConfig and UiState so the four new display flags
flow from `config.get {key:"full"}` into the nanostore. applyDisplay is
exported to keep the fan-out testable without an Ink harness.

Defaults mirror v1 parity: streaming + inline_diffs default true
(opt-out via `=== false`), show_cost + show_reasoning default false
(opt-in via plain truthy check).
2026-04-18 09:42:57 -05:00
Brooklyn Nicholson 586b2f2089 feat(tui): persist large pastes to ~/.hermes/pastes/ via paste.collapse 2026-04-18 09:42:57 -05:00
Brooklyn Nicholson a397b0fd4d test(tui-gateway): assert quick_commands appear in commands.catalog output 2026-04-18 09:42:57 -05:00
Brooklyn Nicholson 5152e1ad86 feat(tui-gateway): surface config.quick_commands in commands.catalog 2026-04-18 09:42:57 -05:00
Brooklyn Nicholson 4e1ea79edc feat(tui): accept raw Ctrl+V as clipboard image paste fallback 2026-04-18 09:42:57 -05:00
Brooklyn Nicholson f0638f3596 fix(tui): split /model picker from /provider wizard to resolve registry collision 2026-04-18 09:42:57 -05:00
186 changed files with 13960 additions and 3705 deletions
+20 -10
View File
@@ -21,26 +21,36 @@ RUN useradd -u 10000 -m -d /opt/data hermes
COPY --chmod=0755 --from=gosu_source /gosu /usr/local/bin/
COPY --chmod=0755 --from=uv_source /usr/local/bin/uv /usr/local/bin/uvx /usr/local/bin/
COPY . /opt/hermes
WORKDIR /opt/hermes
# Install Node dependencies and Playwright as root (--with-deps needs apt)
# ---------- Layer-cached dependency install ----------
# Copy only package manifests first so npm install + Playwright are cached
# unless the lockfiles themselves change.
COPY package.json package-lock.json ./
COPY scripts/whatsapp-bridge/package.json scripts/whatsapp-bridge/package-lock.json scripts/whatsapp-bridge/
COPY web/package.json web/package-lock.json web/
RUN npm install --prefer-offline --no-audit && \
npx playwright install --with-deps chromium --only-shell && \
cd /opt/hermes/scripts/whatsapp-bridge && \
npm install --prefer-offline --no-audit && \
(cd scripts/whatsapp-bridge && npm install --prefer-offline --no-audit) && \
(cd web && npm install --prefer-offline --no-audit) && \
npm cache clean --force
# Hand ownership to hermes user, then install Python deps in a virtualenv
RUN chown -R hermes:hermes /opt/hermes
USER hermes
# ---------- Source code ----------
# .dockerignore excludes node_modules, so the installs above survive.
COPY --chown=hermes:hermes . .
# Build web dashboard (Vite outputs to hermes_cli/web_dist/)
RUN cd web && npm run build
# ---------- Python virtualenv ----------
RUN chown hermes:hermes /opt/hermes
USER hermes
RUN uv venv && \
uv pip install --no-cache-dir -e ".[all]"
USER root
RUN chmod +x /opt/hermes/docker/entrypoint.sh
# ---------- Runtime ----------
ENV HERMES_WEB_DIST=/opt/hermes/hermes_cli/web_dist
ENV HERMES_HOME=/opt/data
VOLUME [ "/opt/data" ]
ENTRYPOINT [ "/opt/hermes/docker/entrypoint.sh" ]
+39 -30
View File
@@ -99,11 +99,48 @@ _FIXED_TEMPERATURE_MODELS: Dict[str, float] = {
"kimi-for-coding": 0.6,
}
# Moonshot's kimi-for-coding endpoint (api.kimi.com/coding) documents:
# "k2.5 model will use a fixed value 1.0, non-thinking mode will use a fixed
# value 0.6. Any other value will result in an error." The same lock applies
# to the other k2.* models served on that endpoint. Enumerated explicitly so
# non-coding siblings like `kimi-k2-instruct` (variable temperature, served on
# the standard chat API and third parties) are NOT clamped.
# Source: https://platform.kimi.ai/docs/guide/kimi-k2-5-quickstart
_KIMI_INSTANT_MODELS: frozenset = frozenset({
"kimi-k2.5",
"kimi-k2-turbo-preview",
"kimi-k2-0905-preview",
})
_KIMI_THINKING_MODELS: frozenset = frozenset({
"kimi-k2-thinking",
"kimi-k2-thinking-turbo",
})
def _fixed_temperature_for_model(model: Optional[str]) -> Optional[float]:
"""Return a required temperature override for models with strict contracts."""
"""Return a required temperature override for models with strict contracts.
Moonshot's kimi-for-coding endpoint rejects any non-approved temperature on
the k2.5 family. Non-thinking variants require exactly 0.6; thinking
variants require 1.0. An optional ``vendor/`` prefix (e.g.
``moonshotai/kimi-k2.5``) is tolerated for aggregator routings.
Returns ``None`` for every other model, including ``kimi-k2-instruct*``
which is the separate non-coding K2 family with variable temperature.
"""
normalized = (model or "").strip().lower()
return _FIXED_TEMPERATURE_MODELS.get(normalized)
fixed = _FIXED_TEMPERATURE_MODELS.get(normalized)
if fixed is not None:
logger.debug("Forcing temperature=%s for model %r (fixed map)", fixed, model)
return fixed
bare = normalized.rsplit("/", 1)[-1]
if bare in _KIMI_THINKING_MODELS:
logger.debug("Forcing temperature=1.0 for kimi thinking model %r", model)
return 1.0
if bare in _KIMI_INSTANT_MODELS:
logger.debug("Forcing temperature=0.6 for kimi instant model %r", model)
return 0.6
return None
# Default auxiliary models for direct API-key providers (cheap/fast for side tasks)
_API_KEY_PROVIDER_AUX_MODELS: Dict[str, str] = {
@@ -745,15 +782,6 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
from hermes_cli.models import copilot_default_headers
extra["default_headers"] = copilot_default_headers()
elif "generativelanguage.googleapis.com" in base_url.lower():
# Google's OpenAI-compatible endpoint only accepts x-goog-api-key.
# Passing api_key= causes the SDK to inject Authorization: Bearer,
# which Google rejects with HTTP 400 "Multiple authentication
# credentials received". Use a placeholder for api_key and pass
# the real key via x-goog-api-key header instead.
# Fixes: https://github.com/NousResearch/hermes-agent/issues/7893
extra["default_headers"] = {"x-goog-api-key": api_key}
api_key = "not-used"
return OpenAI(api_key=api_key, base_url=base_url, **extra), model
creds = resolve_api_key_provider_credentials(provider_id)
@@ -775,15 +803,6 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
from hermes_cli.models import copilot_default_headers
extra["default_headers"] = copilot_default_headers()
elif "generativelanguage.googleapis.com" in base_url.lower():
# Google's OpenAI-compatible endpoint only accepts x-goog-api-key.
# Passing api_key= causes the SDK to inject Authorization: Bearer,
# which Google rejects with HTTP 400 "Multiple authentication
# credentials received". Use a placeholder for api_key and pass
# the real key via x-goog-api-key header instead.
# Fixes: https://github.com/NousResearch/hermes-agent/issues/7893
extra["default_headers"] = {"x-goog-api-key": api_key}
api_key = "not-used"
return OpenAI(api_key=api_key, base_url=base_url, **extra), model
return None, None
@@ -1629,16 +1648,6 @@ def resolve_provider_client(
from hermes_cli.models import copilot_default_headers
headers.update(copilot_default_headers())
elif "generativelanguage.googleapis.com" in base_url.lower():
# Google's OpenAI-compatible endpoint only accepts x-goog-api-key.
# Passing api_key= causes the OpenAI SDK to inject Authorization: Bearer,
# which Google rejects with HTTP 400 "Multiple authentication credentials
# received". Use a placeholder for api_key and pass the real key via
# x-goog-api-key header instead.
# Fixes: https://github.com/NousResearch/hermes-agent/issues/7893
headers["x-goog-api-key"] = api_key
api_key = "not-used"
client = OpenAI(api_key=api_key, base_url=base_url,
**({"default_headers": headers} if headers else {}))
+55 -2
View File
@@ -63,6 +63,52 @@ _CHARS_PER_TOKEN = 4
_SUMMARY_FAILURE_COOLDOWN_SECONDS = 600
def _truncate_tool_call_args_json(args: str, head_chars: int = 200) -> str:
"""Shrink long string values inside a tool-call arguments JSON blob while
preserving JSON validity.
The ``function.arguments`` field on a tool call is a JSON-encoded string
passed through to the LLM provider; downstream providers strictly
validate it and return a non-retryable 400 when it is not well-formed.
An earlier implementation sliced the raw JSON at a fixed byte offset and
appended ``...[truncated]`` — which routinely produced strings like::
{"path": "/foo/bar", "content": "# long markdown
...[truncated]
i.e. an unterminated string and a missing closing brace. MiniMax, for
example, rejects this with ``invalid function arguments json string``
and the session gets stuck re-sending the same broken history on every
turn. See issue #11762 for the observed loop.
This helper parses the arguments, shrinks long string leaves inside the
parsed structure, and re-serialises. Non-string values (paths, ints,
booleans) are preserved intact. If the arguments are not valid JSON
to begin with — some model backends use non-JSON tool arguments — the
original string is returned unchanged rather than replaced with
something neither we nor the backend can parse.
"""
try:
parsed = json.loads(args)
except (ValueError, TypeError):
return args
def _shrink(obj: Any) -> Any:
if isinstance(obj, str):
if len(obj) > head_chars:
return obj[:head_chars] + "...[truncated]"
return obj
if isinstance(obj, dict):
return {k: _shrink(v) for k, v in obj.items()}
if isinstance(obj, list):
return [_shrink(v) for v in obj]
return obj
shrunken = _shrink(parsed)
# ensure_ascii=False preserves CJK/emoji instead of bloating with \uXXXX
return json.dumps(shrunken, ensure_ascii=False)
def _summarize_tool_result(tool_name: str, tool_args: str, tool_content: str) -> str:
"""Create an informative 1-line summary of a tool call + result.
@@ -449,6 +495,11 @@ class ContextCompressor(ContextEngine):
# Pass 3: Truncate large tool_call arguments in assistant messages
# outside the protected tail. write_file with 50KB content, for
# example, survives pruning entirely without this.
#
# The shrinking is done inside the parsed JSON structure so the
# result remains valid JSON — otherwise downstream providers 400
# on every subsequent turn until the broken call falls out of
# the window. See ``_truncate_tool_call_args_json`` docstring.
for i in range(prune_boundary):
msg = result[i]
if msg.get("role") != "assistant" or not msg.get("tool_calls"):
@@ -459,8 +510,10 @@ class ContextCompressor(ContextEngine):
if isinstance(tc, dict):
args = tc.get("function", {}).get("arguments", "")
if len(args) > 500:
tc = {**tc, "function": {**tc["function"], "arguments": args[:200] + "...[truncated]"}}
modified = True
new_args = _truncate_tool_call_args_json(args)
if new_args != args:
tc = {**tc, "function": {**tc["function"], "arguments": new_args}}
modified = True
new_tcs.append(tc)
if modified:
result[i] = {**msg, "tool_calls": new_tcs}
+8 -121
View File
@@ -22,8 +22,6 @@ from hermes_cli.auth import (
_auth_store_lock,
_codex_access_token_is_expiring,
_decode_jwt_claims,
_import_codex_cli_tokens,
_write_codex_cli_tokens,
_load_auth_store,
_load_provider_state,
_resolve_kimi_base_url,
@@ -457,39 +455,6 @@ class CredentialPool:
logger.debug("Failed to sync from credentials file: %s", exc)
return entry
def _sync_codex_entry_from_cli(self, entry: PooledCredential) -> PooledCredential:
"""Sync an openai-codex pool entry from ~/.codex/auth.json if tokens differ.
OpenAI OAuth refresh tokens are single-use and rotate on every refresh.
When the Codex CLI (or another Hermes profile) refreshes its token,
the pool entry's refresh_token becomes stale. This method detects that
by comparing against ~/.codex/auth.json and syncing the fresh pair.
"""
if self.provider != "openai-codex":
return entry
try:
cli_tokens = _import_codex_cli_tokens()
if not cli_tokens:
return entry
cli_refresh = cli_tokens.get("refresh_token", "")
cli_access = cli_tokens.get("access_token", "")
if cli_refresh and cli_refresh != entry.refresh_token:
logger.debug("Pool entry %s: syncing tokens from ~/.codex/auth.json (refresh token changed)", entry.id)
updated = replace(
entry,
access_token=cli_access,
refresh_token=cli_refresh,
last_status=None,
last_status_at=None,
last_error_code=None,
)
self._replace_entry(entry, updated)
self._persist()
return updated
except Exception as exc:
logger.debug("Failed to sync from ~/.codex/auth.json: %s", exc)
return entry
def _sync_device_code_entry_to_auth_store(self, entry: PooledCredential) -> None:
"""Write refreshed pool entry tokens back to auth.json providers.
@@ -585,13 +550,6 @@ class CredentialPool:
except Exception as wexc:
logger.debug("Failed to write refreshed token to credentials file: %s", wexc)
elif self.provider == "openai-codex":
# Proactively sync from ~/.codex/auth.json before refresh.
# The Codex CLI (or another Hermes profile) may have already
# consumed our refresh_token. Syncing first avoids a
# "refresh_token_reused" error when the CLI has a newer pair.
synced = self._sync_codex_entry_from_cli(entry)
if synced is not entry:
entry = synced
refreshed = auth_mod.refresh_codex_oauth_pure(
entry.access_token,
entry.refresh_token,
@@ -677,45 +635,6 @@ class CredentialPool:
# Credentials file had a valid (non-expired) token — use it directly
logger.debug("Credentials file has valid token, using without refresh")
return synced
# For openai-codex: the refresh_token may have been consumed by
# the Codex CLI between our proactive sync and the refresh call.
# Re-sync and retry once.
if self.provider == "openai-codex":
synced = self._sync_codex_entry_from_cli(entry)
if synced.refresh_token != entry.refresh_token:
logger.debug("Retrying Codex refresh with synced token from ~/.codex/auth.json")
try:
refreshed = auth_mod.refresh_codex_oauth_pure(
synced.access_token,
synced.refresh_token,
)
updated = replace(
synced,
access_token=refreshed["access_token"],
refresh_token=refreshed["refresh_token"],
last_refresh=refreshed.get("last_refresh"),
last_status=STATUS_OK,
last_status_at=None,
last_error_code=None,
)
self._replace_entry(synced, updated)
self._persist()
self._sync_device_code_entry_to_auth_store(updated)
try:
_write_codex_cli_tokens(
updated.access_token,
updated.refresh_token,
last_refresh=updated.last_refresh,
)
except Exception as wexc:
logger.debug("Failed to write refreshed Codex tokens to CLI file (retry): %s", wexc)
return updated
except Exception as retry_exc:
logger.debug("Codex retry refresh also failed: %s", retry_exc)
elif not self._entry_needs_refresh(synced):
logger.debug("Codex CLI has valid token, using without refresh")
self._sync_device_code_entry_to_auth_store(synced)
return synced
self._mark_exhausted(entry, None)
return None
@@ -734,17 +653,6 @@ class CredentialPool:
# _seed_from_singletons() on the next load_pool() sees fresh state
# instead of re-seeding stale/consumed tokens.
self._sync_device_code_entry_to_auth_store(updated)
# Write refreshed tokens back to ~/.codex/auth.json so Codex CLI
# and VS Code don't hit "refresh_token_reused" on their next refresh.
if self.provider == "openai-codex":
try:
_write_codex_cli_tokens(
updated.access_token,
updated.refresh_token,
last_refresh=updated.last_refresh,
)
except Exception as wexc:
logger.debug("Failed to write refreshed Codex tokens to CLI file: %s", wexc)
return updated
def _entry_needs_refresh(self, entry: PooledCredential) -> bool:
@@ -790,16 +698,6 @@ class CredentialPool:
if synced is not entry:
entry = synced
cleared_any = True
# For openai-codex entries, sync from ~/.codex/auth.json before
# any status/refresh checks. This picks up tokens refreshed by
# the Codex CLI or another Hermes profile.
if (self.provider == "openai-codex"
and entry.last_status == STATUS_EXHAUSTED
and entry.refresh_token):
synced = self._sync_codex_entry_from_cli(entry)
if synced is not entry:
entry = synced
cleared_any = True
if entry.last_status == STATUS_EXHAUSTED:
exhausted_until = _exhausted_until(entry)
if exhausted_until is not None and now < exhausted_until:
@@ -1218,8 +1116,8 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup
elif provider == "openai-codex":
# Respect user suppression — `hermes auth remove openai-codex` marks
# the device_code source as suppressed so it won't be re-seeded from
# either the Hermes auth store or ~/.codex/auth.json. Without this
# gate the removal is instantly undone on the next load_pool() call.
# the Hermes auth store. Without this gate the removal is instantly
# undone on the next load_pool() call.
codex_suppressed = False
try:
from hermes_cli.auth import is_source_suppressed
@@ -1231,23 +1129,12 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup
state = _load_provider_state(auth_store, "openai-codex")
tokens = state.get("tokens") if isinstance(state, dict) else None
# Fallback: import from Codex CLI (~/.codex/auth.json) if Hermes auth
# store has no tokens. This mirrors resolve_codex_runtime_credentials()
# so that load_pool() and list_authenticated_providers() detect tokens
# that only exist in the Codex CLI shared file.
if not (isinstance(tokens, dict) and tokens.get("access_token")):
try:
from hermes_cli.auth import _import_codex_cli_tokens, _save_codex_tokens
cli_tokens = _import_codex_cli_tokens()
if cli_tokens:
logger.info("Importing Codex CLI tokens into Hermes auth store.")
_save_codex_tokens(cli_tokens)
# Re-read state after import
auth_store = _load_auth_store()
state = _load_provider_state(auth_store, "openai-codex")
tokens = state.get("tokens") if isinstance(state, dict) else None
except Exception as exc:
logger.debug("Codex CLI token import failed: %s", exc)
# Hermes owns its own Codex auth state — we do NOT auto-import from
# ~/.codex/auth.json at pool-load time. OAuth refresh tokens are
# single-use, so sharing them with Codex CLI / VS Code causes
# refresh_token_reused race failures. Users who want to adopt
# existing Codex CLI credentials get a one-time, explicit prompt
# via `hermes auth openai-codex`.
if isinstance(tokens, dict) and tokens.get("access_token"):
active_sources.add("device_code")
changed |= _upsert_entry(
+43 -3
View File
@@ -420,7 +420,10 @@ def list_provider_models(provider: str) -> List[str]:
models = _get_provider_models(provider)
if models is None:
return []
return list(models.keys())
return [
mid for mid in models.keys()
if not _should_hide_from_provider_catalog(provider, mid)
]
# Patterns that indicate non-agentic or noise models (TTS, embedding,
@@ -432,6 +435,43 @@ _NOISE_PATTERNS: re.Pattern = re.compile(
re.IGNORECASE,
)
# Google's live Gemini catalogs currently include a mix of stale slugs and
# Gemma models whose TPM quotas are too small for normal Hermes agent traffic.
# Keep capability metadata available for direct/manual use, but hide these from
# the Gemini model catalogs we surface in setup and model selection.
_GOOGLE_HIDDEN_MODELS = frozenset({
# Low-TPM Gemma models that trip Google input-token quota walls under
# agent-style traffic despite advertising large context windows.
"gemma-4-31b-it",
"gemma-4-26b-it",
"gemma-4-26b-a4b-it",
"gemma-3-1b",
"gemma-3-1b-it",
"gemma-3-2b",
"gemma-3-2b-it",
"gemma-3-4b",
"gemma-3-4b-it",
"gemma-3-12b",
"gemma-3-12b-it",
"gemma-3-27b",
"gemma-3-27b-it",
# Stale/retired Google slugs that still surface through models.dev-backed
# Gemini selection but 404 on the current Google endpoints.
"gemini-1.5-flash",
"gemini-1.5-pro",
"gemini-1.5-flash-8b",
"gemini-2.0-flash",
"gemini-2.0-flash-lite",
})
def _should_hide_from_provider_catalog(provider: str, model_id: str) -> bool:
provider_lower = (provider or "").strip().lower()
model_lower = (model_id or "").strip().lower()
if provider_lower in {"gemini", "google"} and model_lower in _GOOGLE_HIDDEN_MODELS:
return True
return False
def list_agentic_models(provider: str) -> List[str]:
"""Return model IDs suitable for agentic use from models.dev.
@@ -448,6 +488,8 @@ def list_agentic_models(provider: str) -> List[str]:
for mid, entry in models.items():
if not isinstance(entry, dict):
continue
if _should_hide_from_provider_catalog(provider, mid):
continue
if not entry.get("tool_call", False):
continue
if _NOISE_PATTERNS.search(mid):
@@ -582,5 +624,3 @@ def get_model_info(
return _parse_model_info(mid, mdata, mdev_id)
return None
+121 -76
View File
@@ -83,17 +83,51 @@ load_hermes_dotenv(hermes_home=_hermes_home, project_env=_project_env)
_REASONING_TAGS = (
"REASONING_SCRATCHPAD",
"think",
"reasoning",
"THINKING",
"thinking",
"reasoning",
"thought",
)
def _strip_reasoning_tags(text: str) -> str:
"""Remove reasoning/thinking blocks from displayed text.
Handles every case:
* Closed pairs ``<tag></tag>`` (case-insensitive, multi-line).
* Unterminated open tags that run to end-of-text (e.g. truncated
generations on NIM/MiniMax where the close tag is dropped).
* Stray orphan close tags (``stuff</think>answer``) left behind by
partial-content dumps.
Covers the variants emitted by reasoning models today: ``<think>``,
``<thinking>``, ``<reasoning>``, ``<REASONING_SCRATCHPAD>``, and
``<thought>`` (Gemma 4). Must stay in sync with
``run_agent.py::_strip_think_blocks`` and the stream consumer's
``_OPEN_THINK_TAGS`` / ``_CLOSE_THINK_TAGS`` tuples.
"""
cleaned = text
for tag in _REASONING_TAGS:
cleaned = re.sub(rf"<{tag}>.*?</{tag}>\s*", "", cleaned, flags=re.DOTALL)
cleaned = re.sub(rf"<{tag}>.*$", "", cleaned, flags=re.DOTALL)
# Closed pair — case-insensitive so <THINK>…</THINK> is handled too.
cleaned = re.sub(
rf"<{tag}>.*?</{tag}>\s*",
"",
cleaned,
flags=re.DOTALL | re.IGNORECASE,
)
# Unterminated open tag — strip from the tag to end of text.
cleaned = re.sub(
rf"<{tag}>.*$",
"",
cleaned,
flags=re.DOTALL | re.IGNORECASE,
)
# Stray orphan close tag left behind by partial dumps.
cleaned = re.sub(
rf"</{tag}>\s*",
"",
cleaned,
flags=re.IGNORECASE,
)
return cleaned.strip()
@@ -1776,7 +1810,7 @@ class HermesCLI:
mcp_names = set((CLI_CONFIG.get("mcp_servers") or {}).keys())
invalid = [t for t in toolsets if not validate_toolset(t) and t not in mcp_names]
if invalid:
self.console.print(f"[bold red]Warning: Unknown toolsets: {', '.join(invalid)}[/]")
self._console_print(f"[bold red]Warning: Unknown toolsets: {', '.join(invalid)}[/]")
# Filesystem checkpoints: CLI flag > config
cp_cfg = CLI_CONFIG.get("checkpoints", {})
@@ -2068,20 +2102,35 @@ class HermesCLI:
def _spinner_widget_height(self, width: Optional[int] = None) -> int:
"""Return the visible height for the spinner/status text line above the status bar."""
if not getattr(self, "_spinner_text", ""):
spinner_line = self._render_spinner_text()
if not spinner_line:
return 0
if self._use_minimal_tui_chrome(width=width):
return 0
# Compute how many lines the spinner text needs when wrapped.
# The rendered text is " {emoji} {label} ({elapsed})" — about
# len(_spinner_text) + 16 chars for indent + timer suffix.
width = width or self._get_tui_terminal_width()
if width and width > 10:
import math
text_len = len(self._spinner_text) + 16 # indent + timer
return max(1, math.ceil(text_len / width))
text_width = self._status_bar_display_width(spinner_line)
return max(1, math.ceil(text_width / width))
return 1
def _render_spinner_text(self) -> str:
"""Return the live spinner/status text exactly as rendered in the TUI."""
txt = getattr(self, "_spinner_text", "")
if not txt:
return ""
t0 = getattr(self, "_tool_start_time", 0) or 0
if t0 > 0:
import time as _time
elapsed = _time.monotonic() - t0
if elapsed >= 60:
_m, _s = int(elapsed // 60), int(elapsed % 60)
elapsed_str = f"{_m}m {_s}s"
else:
elapsed_str = f"{elapsed:.1f}s"
return f" {txt} ({elapsed_str})"
return f" {txt}"
def _get_voice_status_fragments(self, width: Optional[int] = None):
"""Return the voice status bar fragments for the interactive TUI."""
width = width or self._get_tui_terminal_width()
@@ -2212,7 +2261,7 @@ class HermesCLI:
normalized_model = normalize_model_for_provider(current_model, resolved_provider)
if normalized_model and normalized_model != current_model:
if not self._model_is_default:
self.console.print(
self._console_print(
f"[yellow]⚠️ Normalized model '{current_model}' to '{normalized_model}' for {resolved_provider}.[/]"
)
self.model = normalized_model
@@ -2228,7 +2277,7 @@ class HermesCLI:
canonical = normalize_copilot_model_id(current_model, api_key=self.api_key)
if canonical and canonical != current_model:
if not self._model_is_default:
self.console.print(
self._console_print(
f"[yellow]⚠️ Normalized Copilot model '{current_model}' to '{canonical}'.[/]"
)
self.model = canonical
@@ -2250,7 +2299,7 @@ class HermesCLI:
canonical = normalize_opencode_model_id(resolved_provider, current_model)
if canonical and canonical != current_model:
if not self._model_is_default:
self.console.print(
self._console_print(
f"[yellow]⚠️ Stripped provider prefix from '{current_model}'; using '{canonical}' for {resolved_provider}.[/]"
)
self.model = canonical
@@ -2272,7 +2321,7 @@ class HermesCLI:
if "/" in current_model:
slug = current_model.split("/", 1)[1]
if not self._model_is_default:
self.console.print(
self._console_print(
f"[yellow]⚠️ Stripped provider prefix from '{current_model}'; "
f"using '{slug}' for OpenAI Codex.[/]"
)
@@ -3021,7 +3070,7 @@ class HermesCLI:
use_compact = self.compact or term_width < 80
if use_compact:
self.console.print(_build_compact_banner())
self._console_print(_build_compact_banner())
self._show_status()
else:
# Get tools for display
@@ -3046,25 +3095,25 @@ class HermesCLI:
# Warn about very low context lengths (common with local servers)
if ctx_len and ctx_len <= 8192:
self.console.print()
self.console.print(
self._console_print()
self._console_print(
f"[yellow]⚠️ Context length is only {ctx_len:,} tokens — "
f"this is likely too low for agent use with tools.[/]"
)
self.console.print(
self._console_print(
"[dim] Hermes needs 16k32k minimum. Tool schemas + system prompt alone use ~4k8k.[/]"
)
base_url = getattr(self, "base_url", "") or ""
if "11434" in base_url or "ollama" in base_url.lower():
self.console.print(
self._console_print(
"[dim] Ollama fix: OLLAMA_CONTEXT_LENGTH=32768 ollama serve[/]"
)
elif "1234" in base_url:
self.console.print(
self._console_print(
"[dim] LM Studio fix: Set context length in model settings → reload model[/]"
)
else:
self.console.print(
self._console_print(
"[dim] Fix: Set model.context_length in config.yaml, or increase your server's context setting[/]"
)
@@ -3073,20 +3122,20 @@ class HermesCLI:
model_name = getattr(self, "model", "") or ""
if is_nous_hermes_non_agentic(model_name):
self.console.print()
self.console.print(
self._console_print()
self._console_print(
"[bold yellow]⚠ Nous Research Hermes 3 & 4 models are NOT agentic and are not "
"designed for use with Hermes Agent.[/]"
)
self.console.print(
self._console_print(
"[dim] They lack tool-calling capabilities required for agent workflows. "
"Consider using an agentic model (Claude, GPT, Gemini, DeepSeek, etc.).[/]"
)
self.console.print(
self._console_print(
"[dim] Switch with: /model sonnet or /model gpt5[/]"
)
self.console.print()
self._console_print()
def _preload_resumed_session(self) -> bool:
"""Load a resumed session's history from the DB early (before first chat).
@@ -3104,10 +3153,10 @@ class HermesCLI:
session_meta = self._session_db.get_session(self.session_id)
if not session_meta:
self.console.print(
self._console_print(
f"[bold red]Session not found: {self.session_id}[/]"
)
self.console.print(
self._console_print(
"[dim]Use a session ID from a previous CLI run "
"(hermes sessions list).[/]"
)
@@ -3122,7 +3171,7 @@ class HermesCLI:
if session_meta.get("title"):
title_part = f' "{session_meta["title"]}"'
accent_color = _accent_hex()
self.console.print(
self._console_print(
f"[{accent_color}]↻ Resumed session [bold]{self.session_id}[/bold]"
f"{title_part} "
f"({msg_count} user message{'s' if msg_count != 1 else ''}, "
@@ -3130,7 +3179,7 @@ class HermesCLI:
)
else:
accent_color = _accent_hex()
self.console.print(
self._console_print(
f"[{accent_color}]Session {self.session_id} found but has no "
f"messages. Starting fresh.[/]"
)
@@ -3305,7 +3354,7 @@ class HermesCLI:
padding=(0, 1),
style=_history_text_c,
)
self.console.print(panel)
self._console_print(panel)
def _try_attach_clipboard_image(self) -> bool:
"""Check clipboard for an image and attach it if found.
@@ -3741,14 +3790,14 @@ class HermesCLI:
api_key_missing = [u for u in unavailable if u["missing_vars"]]
if api_key_missing:
self.console.print()
self.console.print("[yellow]⚠️ Some tools disabled (missing API keys):[/]")
self._console_print()
self._console_print("[yellow]⚠️ Some tools disabled (missing API keys):[/]")
for item in api_key_missing:
tools_str = ", ".join(item["tools"][:2]) # Show first 2 tools
if len(item["tools"]) > 2:
tools_str += f", +{len(item['tools'])-2} more"
self.console.print(f" [dim]• {item['name']}[/] [dim italic]({', '.join(item['missing_vars'])})[/]")
self.console.print("[dim] Run 'hermes setup' to configure[/]")
self._console_print(f" [dim]• {item['name']}[/] [dim italic]({', '.join(item['missing_vars'])})[/]")
self._console_print("[dim] Run 'hermes setup' to configure[/]")
except Exception:
pass # Don't crash on import errors
@@ -3786,7 +3835,7 @@ class HermesCLI:
if self._provider_source:
provider_info += f" [dim {separator_color}]·[/] [dim]auth: {self._provider_source}[/]"
self.console.print(
self._console_print(
f" {api_indicator} [{accent_color}]{model_short}[/] "
f"[dim {separator_color}]·[/] [bold {label_color}]{tool_count} tools[/]"
f"{toolsets_info}{provider_info}"
@@ -3843,7 +3892,7 @@ class HermesCLI:
f"Tokens: {total_tokens:,}",
f"Agent Running: {'Yes' if is_running else 'No'}",
])
self.console.print("\n".join(lines), highlight=False, markup=False)
self._console_print("\n".join(lines), highlight=False, markup=False)
def _fast_command_available(self) -> bool:
try:
@@ -5041,8 +5090,15 @@ class HermesCLI:
print(" To change model or provider, use: hermes model")
def _output_console(self):
"""Use prompt_toolkit-safe Rich rendering once the TUI is live."""
if getattr(self, "_app", None):
return ChatConsole()
return self.console
def _console_print(self, *args, **kwargs):
"""Print through the active command-safe console."""
self._output_console().print(*args, **kwargs)
@staticmethod
def _resolve_personality_prompt(value) -> str:
@@ -5062,14 +5118,14 @@ class HermesCLI:
from agent.google_oauth import get_valid_access_token, GoogleOAuthError, load_credentials
from agent.google_code_assist import retrieve_user_quota, CodeAssistError
except ImportError as exc:
self.console.print(f" [red]Gemini modules unavailable: {exc}[/]")
self._console_print(f" [red]Gemini modules unavailable: {exc}[/]")
return
try:
access_token = get_valid_access_token()
except GoogleOAuthError as exc:
self.console.print(f" [yellow]{exc}[/]")
self.console.print(" Run [bold]/model[/] and pick 'Google Gemini (OAuth)' to sign in.")
self._console_print(f" [yellow]{exc}[/]")
self._console_print(" Run [bold]/model[/] and pick 'Google Gemini (OAuth)' to sign in.")
return
creds = load_credentials()
@@ -5078,18 +5134,18 @@ class HermesCLI:
try:
buckets = retrieve_user_quota(access_token, project_id=project_id)
except CodeAssistError as exc:
self.console.print(f" [red]Quota lookup failed:[/] {exc}")
self._console_print(f" [red]Quota lookup failed:[/] {exc}")
return
if not buckets:
self.console.print(" [dim]No quota buckets reported (account may be on legacy/unmetered tier).[/]")
self._console_print(" [dim]No quota buckets reported (account may be on legacy/unmetered tier).[/]")
return
# Sort for stable display, group by model
buckets.sort(key=lambda b: (b.model_id, b.token_type))
self.console.print()
self.console.print(f" [bold]Gemini Code Assist quota[/] (project: {project_id or '(auto / free-tier)'})")
self.console.print()
self._console_print()
self._console_print(f" [bold]Gemini Code Assist quota[/] (project: {project_id or '(auto / free-tier)'})")
self._console_print()
for b in buckets:
pct = max(0.0, min(1.0, b.remaining_fraction))
width = 20
@@ -5099,8 +5155,8 @@ class HermesCLI:
header = b.model_id
if b.token_type:
header += f" [{b.token_type}]"
self.console.print(f" {header:40s} {bar} {pct_str}")
self.console.print()
self._console_print(f" {header:40s} {bar} {pct_str}")
self._console_print()
def _handle_personality_command(self, cmd: str):
"""Handle the /personality command to set predefined personalities."""
@@ -5548,7 +5604,7 @@ class HermesCLI:
_tip_color = get_active_skin().get_color("banner_dim", "#B8860B")
except Exception:
_tip_color = "#B8860B"
self.console.print(f"[dim {_tip_color}]✦ Tip: {_tip}[/]")
self._console_print(f"[dim {_tip_color}]✦ Tip: {_tip}[/]")
except Exception:
pass
elif canonical == "history":
@@ -5642,7 +5698,7 @@ class HermesCLI:
elif canonical == "statusbar":
self._status_bar_visible = not self._status_bar_visible
state = "visible" if self._status_bar_visible else "hidden"
self.console.print(f" Status bar {state}")
self._console_print(f" Status bar {state}")
elif canonical == "verbose":
self._toggle_verbose()
elif canonical == "yolo":
@@ -5765,15 +5821,15 @@ class HermesCLI:
)
output = result.stdout.strip() or result.stderr.strip()
if output:
self.console.print(_rich_text_from_ansi(output))
self._console_print(_rich_text_from_ansi(output))
else:
self.console.print("[dim]Command returned no output[/]")
self._console_print("[dim]Command returned no output[/]")
except subprocess.TimeoutExpired:
self.console.print("[bold red]Quick command timed out (30s)[/]")
self._console_print("[bold red]Quick command timed out (30s)[/]")
except Exception as e:
self.console.print(f"[bold red]Quick command error: {e}[/]")
self._console_print(f"[bold red]Quick command error: {e}[/]")
else:
self.console.print(f"[bold red]Quick command '{base_cmd}' has no command defined[/]")
self._console_print(f"[bold red]Quick command '{base_cmd}' has no command defined[/]")
elif qcmd.get("type") == "alias":
target = qcmd.get("target", "").strip()
if target:
@@ -5782,9 +5838,9 @@ class HermesCLI:
aliased_command = f"{target} {user_args}".strip()
return self.process_command(aliased_command)
else:
self.console.print(f"[bold red]Quick command '{base_cmd}' has no target defined[/]")
self._console_print(f"[bold red]Quick command '{base_cmd}' has no target defined[/]")
else:
self.console.print(f"[bold red]Quick command '{base_cmd}' has unsupported type (supported: 'exec', 'alias')[/]")
self._console_print(f"[bold red]Quick command '{base_cmd}' has unsupported type (supported: 'exec', 'alias')[/]")
# Check for plugin-registered slash commands
elif base_cmd.lstrip("/") in _get_plugin_cmd_handler_names():
from hermes_cli.plugins import get_plugin_command_handler
@@ -8554,7 +8610,7 @@ class HermesCLI:
except Exception:
_welcome_text = "Welcome to Hermes Agent! Type your message or /help for commands."
_welcome_color = "#FFF8DC"
self.console.print(f"[{_welcome_color}]{_welcome_text}[/]")
self._console_print(f"[{_welcome_color}]{_welcome_text}[/]")
# Show a random tip to help users discover features
try:
from hermes_cli.tips import get_random_tip
@@ -8563,16 +8619,16 @@ class HermesCLI:
_tip_color = _welcome_skin.get_color("banner_dim", "#B8860B")
except Exception:
_tip_color = "#B8860B"
self.console.print(f"[dim {_tip_color}]✦ Tip: {_tip}[/]")
self._console_print(f"[dim {_tip_color}]✦ Tip: {_tip}[/]")
except Exception:
pass # Tips are non-critical — never break startup
if self.preloaded_skills and not self._startup_skills_line_shown:
skills_label = ", ".join(self.preloaded_skills)
self.console.print(
self._console_print(
f"[bold {_accent_hex()}]Activated skills:[/] {skills_label}"
)
self._startup_skills_line_shown = True
self.console.print()
self._console_print()
# State for async operation
self._agent_running = False
@@ -9375,21 +9431,10 @@ class HermesCLI:
return cli_ref._agent_spacer_height()
def get_spinner_text():
txt = cli_ref._spinner_text
if not txt:
spinner_line = cli_ref._render_spinner_text()
if not spinner_line:
return []
# Append live elapsed timer when a tool is running
t0 = cli_ref._tool_start_time
if t0 > 0:
import time as _time
elapsed = _time.monotonic() - t0
if elapsed >= 60:
_m, _s = int(elapsed // 60), int(elapsed % 60)
elapsed_str = f"{_m}m {_s}s"
else:
elapsed_str = f"{elapsed:.1f}s"
return [('class:hint', f' {txt} ({elapsed_str})')]
return [('class:hint', f' {txt}')]
return [('class:hint', spinner_line)]
def get_spinner_height():
return cli_ref._spinner_widget_height()
+70 -4
View File
@@ -564,15 +564,53 @@ def _run_job_script(script_path: str) -> tuple[bool, str]:
return False, f"Script execution failed: {exc}"
def _build_job_prompt(job: dict) -> str:
"""Build the effective prompt for a cron job, optionally loading one or more skills first."""
def _parse_wake_gate(script_output: str) -> bool:
"""Parse the last non-empty stdout line of a cron job's pre-check script
as a wake gate.
The convention (ported from nanoclaw #1232): if the last stdout line is
JSON like ``{"wakeAgent": false}``, the agent is skipped entirely no
LLM run, no delivery. Any other output (non-JSON, missing flag, gate
absent, or ``wakeAgent: true``) means wake the agent normally.
Returns True if the agent should wake, False to skip.
"""
if not script_output:
return True
stripped_lines = [line for line in script_output.splitlines() if line.strip()]
if not stripped_lines:
return True
last_line = stripped_lines[-1].strip()
try:
gate = json.loads(last_line)
except (json.JSONDecodeError, ValueError):
return True
if not isinstance(gate, dict):
return True
return gate.get("wakeAgent", True) is not False
def _build_job_prompt(job: dict, prerun_script: Optional[tuple] = None) -> str:
"""Build the effective prompt for a cron job, optionally loading one or more skills first.
Args:
job: The cron job dict.
prerun_script: Optional ``(success, stdout)`` from a script that has
already been executed by the caller (e.g. for a wake-gate check).
When provided, the script is not re-executed and the cached
result is used for prompt injection. When omitted, the script
(if any) runs inline as before.
"""
prompt = job.get("prompt", "")
skills = job.get("skills")
# Run data-collection script if configured, inject output as context.
script_path = job.get("script")
if script_path:
success, script_output = _run_job_script(script_path)
if prerun_script is not None:
success, script_output = prerun_script
else:
success, script_output = _run_job_script(script_path)
if success:
if script_output:
prompt = (
@@ -674,13 +712,41 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
job_id = job["id"]
job_name = job["name"]
prompt = _build_job_prompt(job)
# Wake-gate: if this job has a pre-check script, run it BEFORE building
# the prompt so a ``{"wakeAgent": false}`` response can short-circuit
# the whole agent run. We pass the result into _build_job_prompt so
# the script is only executed once.
prerun_script = None
script_path = job.get("script")
if script_path:
prerun_script = _run_job_script(script_path)
_ran_ok, _script_output = prerun_script
if _ran_ok and not _parse_wake_gate(_script_output):
logger.info(
"Job '%s' (ID: %s): wakeAgent=false, skipping agent run",
job_name, job_id,
)
silent_doc = (
f"# Cron Job: {job_name}\n\n"
f"**Job ID:** {job_id}\n"
f"**Run Time:** {_hermes_now().strftime('%Y-%m-%d %H:%M:%S')}\n\n"
"Script gate returned `wakeAgent=false` — agent skipped.\n"
)
return True, silent_doc, SILENT_MARKER, None
prompt = _build_job_prompt(job, prerun_script=prerun_script)
origin = _resolve_origin(job)
_cron_session_id = f"cron_{job_id}_{_hermes_now().strftime('%Y%m%d_%H%M%S')}"
logger.info("Running job '%s' (ID: %s)", job_name, job_id)
logger.info("Prompt: %s", prompt[:100])
# Mark this as a cron session so the approval system can apply cron_mode.
# This env var is process-wide and persists for the lifetime of the
# scheduler process — every job this process runs is a cron job.
os.environ["HERMES_CRON_SESSION"] = "1"
try:
# Inject origin context so the agent's send_message tool knows the chat.
# Must be INSIDE the try block so the finally cleanup always runs.
-228
View File
@@ -1,228 +0,0 @@
# Hermes Agent — ACP (Agent Client Protocol) Setup Guide
Hermes Agent supports the **Agent Client Protocol (ACP)**, allowing it to run as
a coding agent inside your editor. ACP lets your IDE send tasks to Hermes, and
Hermes responds with file edits, terminal commands, and explanations — all shown
natively in the editor UI.
---
## Prerequisites
- Hermes Agent installed and configured (`hermes setup` completed)
- An API key / provider set up in `~/.hermes/.env` or via `hermes login`
- Python 3.11+
Install the ACP extra:
```bash
pip install -e ".[acp]"
```
---
## VS Code Setup
### 1. Install the ACP Client extension
Open VS Code and install **ACP Client** from the marketplace:
- Press `Ctrl+Shift+X` (or `Cmd+Shift+X` on macOS)
- Search for **"ACP Client"**
- Click **Install**
Or install from the command line:
```bash
code --install-extension anysphere.acp-client
```
### 2. Configure settings.json
Open your VS Code settings (`Ctrl+,` → click the `{}` icon for JSON) and add:
```json
{
"acpClient.agents": [
{
"name": "hermes-agent",
"registryDir": "/path/to/hermes-agent/acp_registry"
}
]
}
```
Replace `/path/to/hermes-agent` with the actual path to your Hermes Agent
installation (e.g. `~/.hermes/hermes-agent`).
Alternatively, if `hermes` is on your PATH, the ACP Client can discover it
automatically via the registry directory.
### 3. Restart VS Code
After configuring, restart VS Code. You should see **Hermes Agent** appear in
the ACP agent picker in the chat/agent panel.
---
## Zed Setup
Zed has built-in ACP support.
### 1. Configure Zed settings
Open Zed settings (`Cmd+,` on macOS or `Ctrl+,` on Linux) and add to your
`settings.json`:
```json
{
"agent_servers": {
"hermes-agent": {
"type": "custom",
"command": "hermes",
"args": ["acp"],
},
},
}
```
### 2. Restart Zed
Hermes Agent will appear in the agent panel. Select it and start a conversation.
---
## JetBrains Setup (IntelliJ, PyCharm, WebStorm, etc.)
### 1. Install the ACP plugin
- Open **Settings****Plugins** → **Marketplace**
- Search for **"ACP"** or **"Agent Client Protocol"**
- Install and restart the IDE
### 2. Configure the agent
- Open **Settings****Tools** → **ACP Agents**
- Click **+** to add a new agent
- Set the registry directory to your `acp_registry/` folder:
`/path/to/hermes-agent/acp_registry`
- Click **OK**
### 3. Use the agent
Open the ACP panel (usually in the right sidebar) and select **Hermes Agent**.
---
## What You Will See
Once connected, your editor provides a native interface to Hermes Agent:
### Chat Panel
A conversational interface where you can describe tasks, ask questions, and
give instructions. Hermes responds with explanations and actions.
### File Diffs
When Hermes edits files, you see standard diffs in the editor. You can:
- **Accept** individual changes
- **Reject** changes you don't want
- **Review** the full diff before applying
### Terminal Commands
When Hermes needs to run shell commands (builds, tests, installs), the editor
shows them in an integrated terminal. Depending on your settings:
- Commands may run automatically
- Or you may be prompted to **approve** each command
### Approval Flow
For potentially destructive operations, the editor will prompt you for
approval before Hermes proceeds. This includes:
- File deletions
- Shell commands
- Git operations
---
## Configuration
Hermes Agent under ACP uses the **same configuration** as the CLI:
- **API keys / providers**: `~/.hermes/.env`
- **Agent config**: `~/.hermes/config.yaml`
- **Skills**: `~/.hermes/skills/`
- **Sessions**: `~/.hermes/state.db`
You can run `hermes setup` to configure providers, or edit `~/.hermes/.env`
directly.
### Changing the model
Edit `~/.hermes/config.yaml`:
```yaml
model: openrouter/nous/hermes-3-llama-3.1-70b
```
Or set the `HERMES_MODEL` environment variable.
### Toolsets
ACP sessions use the curated `hermes-acp` toolset by default. It is designed for editor workflows and intentionally excludes things like messaging delivery, cronjob management, and audio-first UX features.
---
## Troubleshooting
### Agent doesn't appear in the editor
1. **Check the registry path** — make sure the `acp_registry/` directory path
in your editor settings is correct and contains `agent.json`.
2. **Check `hermes` is on PATH** — run `which hermes` in a terminal. If not
found, you may need to activate your virtualenv or add it to PATH.
3. **Restart the editor** after changing settings.
### Agent starts but errors immediately
1. Run `hermes doctor` to check your configuration.
2. Check that you have a valid API key: `hermes status`
3. Try running `hermes acp` directly in a terminal to see error output.
### "Module not found" errors
Make sure you installed the ACP extra:
```bash
pip install -e ".[acp]"
```
### Slow responses
- ACP streams responses, so you should see incremental output. If the agent
appears stuck, check your network connection and API provider status.
- Some providers have rate limits. Try switching to a different model/provider.
### Permission denied for terminal commands
If the editor blocks terminal commands, check your ACP Client extension
settings for auto-approval or manual-approval preferences.
### Logs
Hermes logs are written to stderr when running in ACP mode. Check:
- VS Code: **Output** panel → select **ACP Client** or **Hermes Agent**
- Zed: **View****Toggle Terminal** and check the process output
- JetBrains: **Event Log** or the ACP tool window
You can also enable verbose logging:
```bash
HERMES_LOG_LEVEL=DEBUG hermes acp
```
---
## Further Reading
- [ACP Specification](https://github.com/anysphere/acp)
- [Hermes Agent Documentation](https://github.com/NousResearch/hermes-agent)
- Run `hermes --help` for all CLI options
-698
View File
@@ -1,698 +0,0 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>honcho-integration-spec</title>
<style>
:root {
--bg: #0b0e14;
--bg-surface: #11151c;
--bg-elevated: #181d27;
--bg-code: #0d1018;
--fg: #c9d1d9;
--fg-bright: #e6edf3;
--fg-muted: #6e7681;
--fg-subtle: #484f58;
--accent: #7eb8f6;
--accent-dim: #3d6ea5;
--accent-glow: rgba(126, 184, 246, 0.08);
--green: #7ee6a8;
--green-dim: #2ea04f;
--orange: #e6a855;
--red: #f47067;
--purple: #bc8cff;
--cyan: #56d4dd;
--border: #21262d;
--border-subtle: #161b22;
--radius: 6px;
--font-sans: 'New York', ui-serif, 'Iowan Old Style', 'Apple Garamond', Baskerville, 'Times New Roman', 'Noto Emoji', serif;
--font-mono: 'Departure Mono', 'Noto Emoji', monospace;
}
*, *::before, *::after { box-sizing: border-box; margin: 0; padding: 0; }
html { scroll-behavior: smooth; scroll-padding-top: 2rem; }
body {
font-family: var(--font-sans);
background: var(--bg);
color: var(--fg);
line-height: 1.7;
font-size: 15px;
-webkit-font-smoothing: antialiased;
}
.container { max-width: 860px; margin: 0 auto; padding: 3rem 2rem 6rem; }
.hero {
text-align: center;
padding: 4rem 0 3rem;
border-bottom: 1px solid var(--border);
margin-bottom: 3rem;
}
.hero h1 { font-family: var(--font-mono); font-size: 2.2rem; font-weight: 700; color: var(--fg-bright); letter-spacing: -0.03em; margin-bottom: 0.5rem; }
.hero h1 span { color: var(--accent); }
.hero .subtitle { font-family: var(--font-sans); color: var(--fg-muted); font-size: 0.92rem; max-width: 560px; margin: 0 auto; line-height: 1.6; }
.hero .meta { margin-top: 1.5rem; display: flex; justify-content: center; gap: 1.5rem; flex-wrap: wrap; }
.hero .meta span { font-size: 0.8rem; color: var(--fg-subtle); font-family: var(--font-mono); }
.toc { background: var(--bg-surface); border: 1px solid var(--border); border-radius: var(--radius); padding: 1.5rem 2rem; margin-bottom: 3rem; }
.toc h2 { font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.1em; color: var(--fg-muted); margin-bottom: 1rem; }
.toc ol { list-style: none; counter-reset: toc; columns: 2; column-gap: 2rem; }
.toc li { counter-increment: toc; break-inside: avoid; margin-bottom: 0.35rem; }
.toc li::before { content: counter(toc, decimal-leading-zero) " "; color: var(--fg-subtle); font-family: var(--font-mono); font-size: 0.75rem; margin-right: 0.25rem; }
.toc a { font-family: var(--font-mono); color: var(--fg); text-decoration: none; font-size: 0.82rem; transition: color 0.15s; }
.toc a:hover { color: var(--accent); }
section { margin-bottom: 4rem; }
section + section { padding-top: 1rem; }
h2 { font-family: var(--font-mono); font-size: 1.3rem; font-weight: 700; color: var(--fg-bright); letter-spacing: -0.01em; margin-bottom: 1.25rem; padding-bottom: 0.5rem; border-bottom: 1px solid var(--border); }
h3 { font-family: var(--font-mono); font-size: 1rem; font-weight: 600; color: var(--fg-bright); margin-top: 2rem; margin-bottom: 0.75rem; }
h4 { font-family: var(--font-mono); font-size: 0.9rem; font-weight: 600; color: var(--accent); margin-top: 1.5rem; margin-bottom: 0.5rem; }
p { margin-bottom: 1rem; font-size: 0.95rem; line-height: 1.75; }
strong { color: var(--fg-bright); font-weight: 600; }
a { color: var(--accent); text-decoration: none; }
a:hover { text-decoration: underline; }
ul, ol { margin-bottom: 1rem; padding-left: 1.5rem; font-size: 0.93rem; line-height: 1.7; }
li { margin-bottom: 0.35rem; }
li::marker { color: var(--fg-subtle); }
.table-wrap { overflow-x: auto; margin-bottom: 1.5rem; }
table { width: 100%; border-collapse: collapse; font-size: 0.88rem; }
th, td { text-align: left; padding: 0.6rem 1rem; border-bottom: 1px solid var(--border-subtle); }
th { font-family: var(--font-mono); font-size: 0.72rem; text-transform: uppercase; letter-spacing: 0.06em; color: var(--fg-muted); background: var(--bg-surface); border-bottom-color: var(--border); white-space: nowrap; }
td { font-family: var(--font-sans); font-size: 0.88rem; color: var(--fg); }
tr:hover td { background: var(--accent-glow); }
td code { background: var(--bg-elevated); padding: 0.15em 0.4em; border-radius: 3px; font-family: var(--font-mono); font-size: 0.82em; color: var(--cyan); }
pre { background: var(--bg-code); border: 1px solid var(--border); border-radius: var(--radius); padding: 1.25rem 1.5rem; overflow-x: auto; margin-bottom: 1.5rem; font-family: var(--font-mono); font-size: 0.82rem; line-height: 1.65; color: var(--fg); }
pre code { background: none; padding: 0; color: inherit; font-size: inherit; }
code { font-family: var(--font-mono); font-size: 0.85em; }
p code, li code { background: var(--bg-elevated); padding: 0.15em 0.4em; border-radius: 3px; color: var(--cyan); font-size: 0.85em; }
.kw { color: var(--purple); }
.str { color: var(--green); }
.cm { color: var(--fg-subtle); font-style: italic; }
.num { color: var(--orange); }
.key { color: var(--accent); }
.mermaid { margin: 1.5rem 0 2rem; text-align: center; }
.mermaid svg { max-width: 100%; height: auto; }
.callout { font-family: var(--font-sans); background: var(--bg-surface); border-left: 3px solid var(--accent-dim); border-radius: 0 var(--radius) var(--radius) 0; padding: 1rem 1.25rem; margin-bottom: 1.5rem; font-size: 0.88rem; color: var(--fg-muted); line-height: 1.6; }
.callout strong { font-family: var(--font-mono); color: var(--fg-bright); }
.callout.success { border-left-color: var(--green-dim); }
.callout.warn { border-left-color: var(--orange); }
.badge { display: inline-block; font-family: var(--font-mono); font-size: 0.65rem; font-weight: 600; text-transform: uppercase; letter-spacing: 0.05em; padding: 0.2em 0.6em; border-radius: 3px; vertical-align: middle; margin-left: 0.4rem; }
.badge-done { background: var(--green-dim); color: #fff; }
.badge-wip { background: var(--orange); color: #0b0e14; }
.badge-todo { background: var(--fg-subtle); color: var(--fg); }
.checklist { list-style: none; padding-left: 0; }
.checklist li { padding-left: 1.5rem; position: relative; margin-bottom: 0.5rem; }
.checklist li::before { position: absolute; left: 0; font-family: var(--font-mono); font-size: 0.85rem; }
.checklist li.done { color: var(--fg-muted); }
.checklist li.done::before { content: "\2713"; color: var(--green); }
.checklist li.todo::before { content: "\25CB"; color: var(--fg-subtle); }
.checklist li.wip::before { content: "\25D4"; color: var(--orange); }
.compare { display: grid; grid-template-columns: 1fr 1fr; gap: 1rem; margin-bottom: 2rem; }
.compare-card { background: var(--bg-surface); border: 1px solid var(--border); border-radius: var(--radius); padding: 1.25rem; }
.compare-card h4 { margin-top: 0; font-size: 0.82rem; }
.compare-card.after { border-color: var(--accent-dim); }
.compare-card ul { font-family: var(--font-mono); padding-left: 1.25rem; font-size: 0.8rem; }
hr { border: none; border-top: 1px solid var(--border); margin: 3rem 0; }
.progress-bar { position: fixed; top: 0; left: 0; height: 2px; background: var(--accent); z-index: 999; transition: width 0.1s linear; }
@media (max-width: 640px) {
.container { padding: 2rem 1rem 4rem; }
.hero h1 { font-size: 1.6rem; }
.toc ol { columns: 1; }
.compare { grid-template-columns: 1fr; }
table { font-size: 0.8rem; }
th, td { padding: 0.4rem 0.6rem; }
}
</style>
<link rel="preconnect" href="https://fonts.googleapis.com">
<link href="https://fonts.googleapis.com/css2?family=Noto+Emoji&display=swap" rel="stylesheet">
<style>
@font-face {
font-family: 'Departure Mono';
src: url('https://cdn.jsdelivr.net/gh/rektdeckard/departure-mono@latest/fonts/DepartureMono-Regular.woff2') format('woff2');
font-weight: normal;
font-style: normal;
font-display: swap;
}
</style>
</head>
<body>
<div class="progress-bar" id="progress"></div>
<div class="container">
<header class="hero">
<h1>honcho<span>-integration-spec</span></h1>
<p class="subtitle">Comparison of Hermes Agent vs. openclaw-honcho — and a porting spec for bringing Hermes patterns into other Honcho integrations.</p>
<div class="meta">
<span>hermes-agent / openclaw-honcho</span>
<span>Python + TypeScript</span>
<span>2026-03-09</span>
</div>
</header>
<nav class="toc">
<h2>Contents</h2>
<ol>
<li><a href="#overview">Overview</a></li>
<li><a href="#architecture">Architecture comparison</a></li>
<li><a href="#diff-table">Diff table</a></li>
<li><a href="#patterns">Hermes patterns to port</a></li>
<li><a href="#spec-async">Spec: async prefetch</a></li>
<li><a href="#spec-reasoning">Spec: dynamic reasoning level</a></li>
<li><a href="#spec-modes">Spec: per-peer memory modes</a></li>
<li><a href="#spec-identity">Spec: AI peer identity formation</a></li>
<li><a href="#spec-sessions">Spec: session naming strategies</a></li>
<li><a href="#spec-cli">Spec: CLI surface injection</a></li>
<li><a href="#openclaw-checklist">openclaw-honcho checklist</a></li>
<li><a href="#nanobot-checklist">nanobot-honcho checklist</a></li>
</ol>
</nav>
<!-- OVERVIEW -->
<section id="overview">
<h2>Overview</h2>
<p>Two independent Honcho integrations have been built for two different agent runtimes: <strong>Hermes Agent</strong> (Python, baked into the runner) and <strong>openclaw-honcho</strong> (TypeScript plugin via hook/tool API). Both use the same Honcho peer paradigm — dual peer model, <code>session.context()</code>, <code>peer.chat()</code> — but they made different tradeoffs at every layer.</p>
<p>This document maps those tradeoffs and defines a porting spec: a set of Hermes-originated patterns, each stated as an integration-agnostic interface, that any Honcho integration can adopt regardless of runtime or language.</p>
<div class="callout">
<strong>Scope</strong> Both integrations work correctly today. This spec is about the delta — patterns in Hermes that are worth propagating and patterns in openclaw-honcho that Hermes should eventually adopt. The spec is additive, not prescriptive.
</div>
</section>
<!-- ARCHITECTURE -->
<section id="architecture">
<h2>Architecture comparison</h2>
<h3>Hermes: baked-in runner</h3>
<p>Honcho is initialised directly inside <code>AIAgent.__init__</code>. There is no plugin boundary. Session management, context injection, async prefetch, and CLI surface are all first-class concerns of the runner. Context is injected once per session (baked into <code>_cached_system_prompt</code>) and never re-fetched mid-session — this maximises prefix cache hits at the LLM provider.</p>
<div class="mermaid">
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#1f3150', 'primaryTextColor': '#c9d1d9', 'primaryBorderColor': '#3d6ea5', 'lineColor': '#3d6ea5', 'secondaryColor': '#162030', 'tertiaryColor': '#11151c' }}}%%
flowchart TD
U["user message"] --> P["_honcho_prefetch()<br/>(reads cache — no HTTP)"]
P --> SP["_build_system_prompt()<br/>(first turn only, cached)"]
SP --> LLM["LLM call"]
LLM --> R["response"]
R --> FP["_honcho_fire_prefetch()<br/>(daemon threads, turn end)"]
FP --> C1["prefetch_context() thread"]
FP --> C2["prefetch_dialectic() thread"]
C1 --> CACHE["_context_cache / _dialectic_cache"]
C2 --> CACHE
style U fill:#162030,stroke:#3d6ea5,color:#c9d1d9
style P fill:#1f3150,stroke:#3d6ea5,color:#c9d1d9
style SP fill:#1f3150,stroke:#3d6ea5,color:#c9d1d9
style LLM fill:#162030,stroke:#3d6ea5,color:#c9d1d9
style R fill:#162030,stroke:#3d6ea5,color:#c9d1d9
style FP fill:#2a1a40,stroke:#bc8cff,color:#c9d1d9
style C1 fill:#2a1a40,stroke:#bc8cff,color:#c9d1d9
style C2 fill:#2a1a40,stroke:#bc8cff,color:#c9d1d9
style CACHE fill:#11151c,stroke:#484f58,color:#6e7681
</div>
<h3>openclaw-honcho: hook-based plugin</h3>
<p>The plugin registers hooks against OpenClaw's event bus. Context is fetched synchronously inside <code>before_prompt_build</code> on every turn. Message capture happens in <code>agent_end</code>. The multi-agent hierarchy is tracked via <code>subagent_spawned</code>. This model is correct but every turn pays a blocking Honcho round-trip before the LLM call can begin.</p>
<div class="mermaid">
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#1f3150', 'primaryTextColor': '#c9d1d9', 'primaryBorderColor': '#3d6ea5', 'lineColor': '#3d6ea5', 'secondaryColor': '#162030', 'tertiaryColor': '#11151c' }}}%%
flowchart TD
U2["user message"] --> BPB["before_prompt_build<br/>(BLOCKING HTTP — every turn)"]
BPB --> CTX["session.context()"]
CTX --> SP2["system prompt assembled"]
SP2 --> LLM2["LLM call"]
LLM2 --> R2["response"]
R2 --> AE["agent_end hook"]
AE --> SAVE["session.addMessages()<br/>session.setMetadata()"]
style U2 fill:#162030,stroke:#3d6ea5,color:#c9d1d9
style BPB fill:#3a1515,stroke:#f47067,color:#c9d1d9
style CTX fill:#3a1515,stroke:#f47067,color:#c9d1d9
style SP2 fill:#1f3150,stroke:#3d6ea5,color:#c9d1d9
style LLM2 fill:#162030,stroke:#3d6ea5,color:#c9d1d9
style R2 fill:#162030,stroke:#3d6ea5,color:#c9d1d9
style AE fill:#162030,stroke:#3d6ea5,color:#c9d1d9
style SAVE fill:#11151c,stroke:#484f58,color:#6e7681
</div>
</section>
<!-- DIFF TABLE -->
<section id="diff-table">
<h2>Diff table</h2>
<div class="table-wrap">
<table>
<thead>
<tr>
<th>Dimension</th>
<th>Hermes Agent</th>
<th>openclaw-honcho</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Context injection timing</strong></td>
<td>Once per session (cached). Zero HTTP on response path after turn 1.</td>
<td>Every turn, blocking. Fresh context per turn but adds latency.</td>
</tr>
<tr>
<td><strong>Prefetch strategy</strong></td>
<td>Daemon threads fire at turn end; consumed next turn from cache.</td>
<td>None. Blocking call at prompt-build time.</td>
</tr>
<tr>
<td><strong>Dialectic (peer.chat)</strong></td>
<td>Prefetched async; result injected into system prompt next turn.</td>
<td>On-demand via <code>honcho_recall</code> / <code>honcho_analyze</code> tools.</td>
</tr>
<tr>
<td><strong>Reasoning level</strong></td>
<td>Dynamic: scales with message length. Floor = config default. Cap = "high".</td>
<td>Fixed per tool: recall=minimal, analyze=medium.</td>
</tr>
<tr>
<td><strong>Memory modes</strong></td>
<td><code>user_memory_mode</code> / <code>agent_memory_mode</code>: hybrid / honcho / local.</td>
<td>None. Always writes to Honcho.</td>
</tr>
<tr>
<td><strong>Write frequency</strong></td>
<td>async (background queue), turn, session, N turns.</td>
<td>After every agent_end (no control).</td>
</tr>
<tr>
<td><strong>AI peer identity</strong></td>
<td><code>observe_me=True</code>, <code>seed_ai_identity()</code>, <code>get_ai_representation()</code>, SOUL.md → AI peer.</td>
<td>Agent files uploaded to agent peer at setup. No ongoing self-observation seeding.</td>
</tr>
<tr>
<td><strong>Context scope</strong></td>
<td>User peer + AI peer representation, both injected.</td>
<td>User peer (owner) representation + conversation summary. <code>peerPerspective</code> on context call.</td>
</tr>
<tr>
<td><strong>Session naming</strong></td>
<td>per-directory / global / manual map / title-based.</td>
<td>Derived from platform session key.</td>
</tr>
<tr>
<td><strong>Multi-agent</strong></td>
<td>Single-agent only.</td>
<td>Parent observer hierarchy via <code>subagent_spawned</code>.</td>
</tr>
<tr>
<td><strong>Tool surface</strong></td>
<td>Single <code>query_user_context</code> tool (on-demand dialectic).</td>
<td>6 tools: session, profile, search, context (fast) + recall, analyze (LLM).</td>
</tr>
<tr>
<td><strong>Platform metadata</strong></td>
<td>Not stripped.</td>
<td>Explicitly stripped before Honcho storage.</td>
</tr>
<tr>
<td><strong>Message dedup</strong></td>
<td>None (sends on every save cycle).</td>
<td><code>lastSavedIndex</code> in session metadata prevents re-sending.</td>
</tr>
<tr>
<td><strong>CLI surface in prompt</strong></td>
<td>Management commands injected into system prompt. Agent knows its own CLI.</td>
<td>Not injected.</td>
</tr>
<tr>
<td><strong>AI peer name in identity</strong></td>
<td>Replaces "Hermes Agent" in DEFAULT_AGENT_IDENTITY when configured.</td>
<td>Not implemented.</td>
</tr>
<tr>
<td><strong>QMD / local file search</strong></td>
<td>Not implemented.</td>
<td>Passthrough tools when QMD backend configured.</td>
</tr>
<tr>
<td><strong>Workspace metadata</strong></td>
<td>Not implemented.</td>
<td><code>agentPeerMap</code> in workspace metadata tracks agent&#8594;peer ID.</td>
</tr>
</tbody>
</table>
</div>
</section>
<!-- PATTERNS -->
<section id="patterns">
<h2>Hermes patterns to port</h2>
<p>Six patterns from Hermes are worth adopting in any Honcho integration. They are described below as integration-agnostic interfaces — the implementation will differ per runtime, but the contract is the same.</p>
<div class="compare">
<div class="compare-card">
<h4>Patterns Hermes contributes</h4>
<ul>
<li>Async prefetch (zero-latency)</li>
<li>Dynamic reasoning level</li>
<li>Per-peer memory modes</li>
<li>AI peer identity formation</li>
<li>Session naming strategies</li>
<li>CLI surface injection</li>
</ul>
</div>
<div class="compare-card after">
<h4>Patterns openclaw contributes back</h4>
<ul>
<li>lastSavedIndex dedup</li>
<li>Platform metadata stripping</li>
<li>Multi-agent observer hierarchy</li>
<li>peerPerspective on context()</li>
<li>Tiered tool surface (fast/LLM)</li>
<li>Workspace agentPeerMap</li>
</ul>
</div>
</div>
</section>
<!-- SPEC: ASYNC PREFETCH -->
<section id="spec-async">
<h2>Spec: async prefetch</h2>
<h3>Problem</h3>
<p>Calling <code>session.context()</code> and <code>peer.chat()</code> synchronously before each LLM call adds 200800ms of Honcho round-trip latency to every turn. Users experience this as the agent "thinking slowly."</p>
<h3>Pattern</h3>
<p>Fire both calls as non-blocking background work at the <strong>end</strong> of each turn. Store results in a per-session cache keyed by session ID. At the <strong>start</strong> of the next turn, pop from cache — the HTTP is already done. First turn is cold (empty cache); all subsequent turns are zero-latency on the response path.</p>
<h3>Interface contract</h3>
<pre><code><span class="cm">// TypeScript (openclaw / nanobot plugin shape)</span>
<span class="kw">interface</span> <span class="key">AsyncPrefetch</span> {
<span class="cm">// Fire context + dialectic fetches at turn end. Non-blocking.</span>
firePrefetch(sessionId: <span class="str">string</span>, userMessage: <span class="str">string</span>): <span class="kw">void</span>;
<span class="cm">// Pop cached results at turn start. Returns empty if cache is cold.</span>
popContextResult(sessionId: <span class="str">string</span>): ContextResult | <span class="kw">null</span>;
popDialecticResult(sessionId: <span class="str">string</span>): <span class="str">string</span> | <span class="kw">null</span>;
}
<span class="kw">type</span> <span class="key">ContextResult</span> = {
representation: <span class="str">string</span>;
card: <span class="str">string</span>[];
aiRepresentation?: <span class="str">string</span>; <span class="cm">// AI peer context if enabled</span>
summary?: <span class="str">string</span>; <span class="cm">// conversation summary if fetched</span>
};</code></pre>
<h3>Implementation notes</h3>
<ul>
<li>Python: <code>threading.Thread(daemon=True)</code>. Write to <code>dict[session_id, result]</code> — GIL makes this safe for simple writes.</li>
<li>TypeScript: <code>Promise</code> stored in <code>Map&lt;string, Promise&lt;ContextResult&gt;&gt;</code>. Await at pop time. If not resolved yet, skip (return null) — do not block.</li>
<li>The pop is destructive: clears the cache entry after reading so stale data never accumulates.</li>
<li>Prefetch should also fire on first turn (even though it won't be consumed until turn 2) — this ensures turn 2 is never cold.</li>
</ul>
<h3>openclaw-honcho adoption</h3>
<p>Move <code>session.context()</code> from <code>before_prompt_build</code> to a post-<code>agent_end</code> background task. Store result in <code>state.contextCache</code>. In <code>before_prompt_build</code>, read from cache instead of calling Honcho. If cache is empty (turn 1), inject nothing — the prompt is still valid without Honcho context on the first turn.</p>
</section>
<!-- SPEC: DYNAMIC REASONING LEVEL -->
<section id="spec-reasoning">
<h2>Spec: dynamic reasoning level</h2>
<h3>Problem</h3>
<p>Honcho's dialectic endpoint supports reasoning levels from <code>minimal</code> to <code>max</code>. A fixed level per tool wastes budget on simple queries and under-serves complex ones.</p>
<h3>Pattern</h3>
<p>Select the reasoning level dynamically based on the user's message. Use the configured default as a floor. Bump by message length. Cap auto-selection at <code>high</code> — never select <code>max</code> automatically.</p>
<h3>Interface contract</h3>
<pre><code><span class="cm">// Shared helper — identical logic in any language</span>
<span class="kw">const</span> LEVELS = [<span class="str">"minimal"</span>, <span class="str">"low"</span>, <span class="str">"medium"</span>, <span class="str">"high"</span>, <span class="str">"max"</span>];
<span class="kw">function</span> <span class="key">dynamicReasoningLevel</span>(
query: <span class="str">string</span>,
configDefault: <span class="str">string</span> = <span class="str">"low"</span>
): <span class="str">string</span> {
<span class="kw">const</span> baseIdx = Math.max(<span class="num">0</span>, LEVELS.indexOf(configDefault));
<span class="kw">const</span> n = query.length;
<span class="kw">const</span> bump = n &lt; <span class="num">120</span> ? <span class="num">0</span> : n &lt; <span class="num">400</span> ? <span class="num">1</span> : <span class="num">2</span>;
<span class="kw">return</span> LEVELS[Math.min(baseIdx + bump, <span class="num">3</span>)]; <span class="cm">// cap at "high" (idx 3)</span>
}</code></pre>
<h3>Config key</h3>
<p>Add a <code>dialecticReasoningLevel</code> config field (string, default <code>"low"</code>). This sets the floor. Users can raise or lower it. The dynamic bump always applies on top.</p>
<h3>openclaw-honcho adoption</h3>
<p>Apply in <code>honcho_recall</code> and <code>honcho_analyze</code>: replace the fixed <code>reasoningLevel</code> with the dynamic selector. <code>honcho_recall</code> should use floor <code>"minimal"</code> and <code>honcho_analyze</code> floor <code>"medium"</code> — both still bump with message length.</p>
</section>
<!-- SPEC: PER-PEER MEMORY MODES -->
<section id="spec-modes">
<h2>Spec: per-peer memory modes</h2>
<h3>Problem</h3>
<p>Users want independent control over whether user context and agent context are written locally, to Honcho, or both. A single <code>memoryMode</code> shorthand is not granular enough.</p>
<h3>Pattern</h3>
<p>Three modes per peer: <code>hybrid</code> (write both local + Honcho), <code>honcho</code> (Honcho only, disable local files), <code>local</code> (local files only, skip Honcho sync for this peer). Two orthogonal axes: user peer and agent peer.</p>
<h3>Config schema</h3>
<pre><code><span class="cm">// ~/.openclaw/openclaw.json (or ~/.nanobot/config.json)</span>
{
<span class="str">"plugins"</span>: {
<span class="str">"openclaw-honcho"</span>: {
<span class="str">"config"</span>: {
<span class="str">"apiKey"</span>: <span class="str">"..."</span>,
<span class="str">"memoryMode"</span>: <span class="str">"hybrid"</span>, <span class="cm">// shorthand: both peers</span>
<span class="str">"userMemoryMode"</span>: <span class="str">"honcho"</span>, <span class="cm">// override for user peer</span>
<span class="str">"agentMemoryMode"</span>: <span class="str">"hybrid"</span> <span class="cm">// override for agent peer</span>
}
}
}
}</code></pre>
<h3>Resolution order</h3>
<ol>
<li>Per-peer field (<code>userMemoryMode</code> / <code>agentMemoryMode</code>) — wins if present.</li>
<li>Shorthand <code>memoryMode</code> — applies to both peers as default.</li>
<li>Hardcoded default: <code>"hybrid"</code>.</li>
</ol>
<h3>Effect on Honcho sync</h3>
<ul>
<li><code>userMemoryMode=local</code>: skip adding user peer messages to Honcho.</li>
<li><code>agentMemoryMode=local</code>: skip adding assistant peer messages to Honcho.</li>
<li>Both local: skip <code>session.addMessages()</code> entirely.</li>
<li><code>userMemoryMode=honcho</code>: disable local USER.md writes.</li>
<li><code>agentMemoryMode=honcho</code>: disable local MEMORY.md / SOUL.md writes.</li>
</ul>
</section>
<!-- SPEC: AI PEER IDENTITY -->
<section id="spec-identity">
<h2>Spec: AI peer identity formation</h2>
<h3>Problem</h3>
<p>Honcho builds the user's representation organically by observing what the user says. The same mechanism exists for the AI peer — but only if <code>observe_me=True</code> is set for the agent peer. Without it, the agent peer accumulates nothing and Honcho's AI-side model never forms.</p>
<p>Additionally, existing persona files (SOUL.md, IDENTITY.md) should seed the AI peer's Honcho representation at first activation, rather than waiting for it to emerge from scratch.</p>
<h3>Part A: observe_me=True for agent peer</h3>
<pre><code><span class="cm">// TypeScript — in session.addPeers() call</span>
<span class="kw">await</span> session.addPeers([
[ownerPeer.id, { observeMe: <span class="kw">true</span>, observeOthers: <span class="kw">false</span> }],
[agentPeer.id, { observeMe: <span class="kw">true</span>, observeOthers: <span class="kw">true</span> }], <span class="cm">// was false</span>
]);</code></pre>
<p>This is a one-line change but foundational. Without it, Honcho's AI peer representation stays empty regardless of what the agent says.</p>
<h3>Part B: seedAiIdentity()</h3>
<pre><code><span class="kw">async function</span> <span class="key">seedAiIdentity</span>(
session: HonchoSession,
agentPeer: Peer,
content: <span class="str">string</span>,
source: <span class="str">string</span>
): Promise&lt;<span class="kw">boolean</span>&gt; {
<span class="kw">const</span> wrapped = [
<span class="str">`&lt;ai_identity_seed&gt;`</span>,
<span class="str">`&lt;source&gt;${source}&lt;/source&gt;`</span>,
<span class="str">``</span>,
content.trim(),
<span class="str">`&lt;/ai_identity_seed&gt;`</span>,
].join(<span class="str">"\n"</span>);
<span class="kw">await</span> agentPeer.addMessage(<span class="str">"assistant"</span>, wrapped);
<span class="kw">return true</span>;
}</code></pre>
<h3>Part C: migrate agent files at setup</h3>
<p>During <code>openclaw honcho setup</code>, upload agent-self files (SOUL.md, IDENTITY.md, AGENTS.md, BOOTSTRAP.md) to the agent peer using <code>seedAiIdentity()</code> instead of <code>session.uploadFile()</code>. This routes the content through Honcho's observation pipeline rather than the file store.</p>
<h3>Part D: AI peer name in identity</h3>
<p>When the agent has a configured name (non-default), inject it into the agent's self-identity prefix. In OpenClaw this means adding to the injected system prompt section:</p>
<pre><code><span class="cm">// In context hook return value</span>
<span class="kw">return</span> {
systemPrompt: [
agentName ? <span class="str">`You are ${agentName}.`</span> : <span class="str">""</span>,
<span class="str">"## User Memory Context"</span>,
...sections,
].filter(Boolean).join(<span class="str">"\n\n"</span>)
};</code></pre>
<h3>CLI surface: honcho identity subcommand</h3>
<pre><code>openclaw honcho identity &lt;file&gt; <span class="cm"># seed from file</span>
openclaw honcho identity --show <span class="cm"># show current AI peer representation</span></code></pre>
</section>
<!-- SPEC: SESSION NAMING -->
<section id="spec-sessions">
<h2>Spec: session naming strategies</h2>
<h3>Problem</h3>
<p>When Honcho is used across multiple projects or directories, a single global session means every project shares the same context. Per-directory sessions provide isolation without requiring users to name sessions manually.</p>
<h3>Strategies</h3>
<div class="table-wrap">
<table>
<thead><tr><th>Strategy</th><th>Session key</th><th>When to use</th></tr></thead>
<tbody>
<tr><td><code>per-directory</code></td><td>basename of CWD</td><td>Default. Each project gets its own session.</td></tr>
<tr><td><code>global</code></td><td>fixed string <code>"global"</code></td><td>Single cross-project session.</td></tr>
<tr><td>manual map</td><td>user-configured per path</td><td><code>sessions</code> config map overrides directory basename.</td></tr>
<tr><td>title-based</td><td>sanitized session title</td><td>When agent supports named sessions; title set mid-conversation.</td></tr>
</tbody>
</table>
</div>
<h3>Config schema</h3>
<pre><code>{
<span class="str">"sessionStrategy"</span>: <span class="str">"per-directory"</span>, <span class="cm">// "per-directory" | "global"</span>
<span class="str">"sessionPeerPrefix"</span>: <span class="kw">false</span>, <span class="cm">// prepend peer name to session key</span>
<span class="str">"sessions"</span>: { <span class="cm">// manual overrides</span>
<span class="str">"/home/user/projects/foo"</span>: <span class="str">"foo-project"</span>
}
}</code></pre>
<h3>CLI surface</h3>
<pre><code>openclaw honcho sessions <span class="cm"># list all mappings</span>
openclaw honcho map &lt;name&gt; <span class="cm"># map cwd to session name</span>
openclaw honcho map <span class="cm"># no-arg = list mappings</span></code></pre>
<p>Resolution order: manual map wins &rarr; session title &rarr; directory basename &rarr; platform key.</p>
</section>
<!-- SPEC: CLI SURFACE INJECTION -->
<section id="spec-cli">
<h2>Spec: CLI surface injection</h2>
<h3>Problem</h3>
<p>When a user asks "how do I change my memory settings?" or "what Honcho commands are available?" the agent either hallucinates or says it doesn't know. The agent should know its own management interface.</p>
<h3>Pattern</h3>
<p>When Honcho is active, append a compact command reference to the system prompt. The agent can cite these commands directly instead of guessing.</p>
<pre><code><span class="cm">// In context hook, append to systemPrompt</span>
<span class="kw">const</span> honchoSection = [
<span class="str">"# Honcho memory integration"</span>,
<span class="str">`Active. Session: ${sessionKey}. Mode: ${mode}.`</span>,
<span class="str">"Management commands:"</span>,
<span class="str">" openclaw honcho status — show config + connection"</span>,
<span class="str">" openclaw honcho mode [hybrid|honcho|local] — show or set memory mode"</span>,
<span class="str">" openclaw honcho sessions — list session mappings"</span>,
<span class="str">" openclaw honcho map &lt;name&gt; — map directory to session"</span>,
<span class="str">" openclaw honcho identity [file] [--show] — seed or show AI identity"</span>,
<span class="str">" openclaw honcho setup — full interactive wizard"</span>,
].join(<span class="str">"\n"</span>);</code></pre>
<div class="callout warn">
<strong>Keep it compact.</strong> This section is injected every turn. Keep it under 300 chars of context. List commands, not explanations — the agent can explain them on request.
</div>
</section>
<!-- OPENCLAW CHECKLIST -->
<section id="openclaw-checklist">
<h2>openclaw-honcho checklist</h2>
<p>Ordered by impact. Each item maps to a spec section above.</p>
<ul class="checklist">
<li class="todo"><strong>Async prefetch</strong> — move <code>session.context()</code> out of <code>before_prompt_build</code> into post-<code>agent_end</code> background Promise. Pop from cache at prompt build. (<a href="#spec-async">spec</a>)</li>
<li class="todo"><strong>observe_me=True for agent peer</strong> — one-line change in <code>session.addPeers()</code> config for agent peer. (<a href="#spec-identity">spec</a>)</li>
<li class="todo"><strong>Dynamic reasoning level</strong> — add <code>dynamicReasoningLevel()</code> helper; apply in <code>honcho_recall</code> and <code>honcho_analyze</code>. Add <code>dialecticReasoningLevel</code> to config schema. (<a href="#spec-reasoning">spec</a>)</li>
<li class="todo"><strong>Per-peer memory modes</strong> — add <code>userMemoryMode</code> / <code>agentMemoryMode</code> to config; gate Honcho sync and local writes accordingly. (<a href="#spec-modes">spec</a>)</li>
<li class="todo"><strong>seedAiIdentity()</strong> — add helper; apply during setup migration for SOUL.md / IDENTITY.md instead of <code>session.uploadFile()</code>. (<a href="#spec-identity">spec</a>)</li>
<li class="todo"><strong>Session naming strategies</strong> — add <code>sessionStrategy</code>, <code>sessions</code> map, <code>sessionPeerPrefix</code> to config; implement resolution function. (<a href="#spec-sessions">spec</a>)</li>
<li class="todo"><strong>CLI surface injection</strong> — append command reference to <code>before_prompt_build</code> return value when Honcho is active. (<a href="#spec-cli">spec</a>)</li>
<li class="todo"><strong>honcho identity subcommand</strong> — add <code>openclaw honcho identity</code> CLI command. (<a href="#spec-identity">spec</a>)</li>
<li class="todo"><strong>AI peer name injection</strong> — if <code>aiPeer</code> name configured, prepend to injected system prompt. (<a href="#spec-identity">spec</a>)</li>
<li class="todo"><strong>honcho mode / honcho sessions / honcho map</strong> — CLI parity with Hermes. (<a href="#spec-sessions">spec</a>)</li>
</ul>
<div class="callout success">
<strong>Already done in openclaw-honcho (do not re-implement):</strong> lastSavedIndex dedup, platform metadata stripping, multi-agent parent observer hierarchy, peerPerspective on context(), tiered tool surface (fast/LLM), workspace agentPeerMap, QMD passthrough, self-hosted Honcho support.
</div>
</section>
<!-- NANOBOT CHECKLIST -->
<section id="nanobot-checklist">
<h2>nanobot-honcho checklist</h2>
<p>nanobot-honcho is a greenfield integration. Start from openclaw-honcho's architecture (hook-based, dual peer) and apply all Hermes patterns from day one rather than retrofitting. Priority order:</p>
<h3>Phase 1 — core correctness</h3>
<ul class="checklist">
<li class="todo">Dual peer model (owner + agent peer), both with <code>observe_me=True</code></li>
<li class="todo">Message capture at turn end with <code>lastSavedIndex</code> dedup</li>
<li class="todo">Platform metadata stripping before Honcho storage</li>
<li class="todo">Async prefetch from day one — do not implement blocking context injection</li>
<li class="todo">Legacy file migration at first activation (USER.md → owner peer, SOUL.md → <code>seedAiIdentity()</code>)</li>
</ul>
<h3>Phase 2 — configuration</h3>
<ul class="checklist">
<li class="todo">Config schema: <code>apiKey</code>, <code>workspaceId</code>, <code>baseUrl</code>, <code>memoryMode</code>, <code>userMemoryMode</code>, <code>agentMemoryMode</code>, <code>dialecticReasoningLevel</code>, <code>sessionStrategy</code>, <code>sessions</code></li>
<li class="todo">Per-peer memory mode gating</li>
<li class="todo">Dynamic reasoning level</li>
<li class="todo">Session naming strategies</li>
</ul>
<h3>Phase 3 — tools and CLI</h3>
<ul class="checklist">
<li class="todo">Tool surface: <code>honcho_profile</code>, <code>honcho_recall</code>, <code>honcho_analyze</code>, <code>honcho_search</code>, <code>honcho_context</code></li>
<li class="todo">CLI: <code>setup</code>, <code>status</code>, <code>sessions</code>, <code>map</code>, <code>mode</code>, <code>identity</code></li>
<li class="todo">CLI surface injection into system prompt</li>
<li class="todo">AI peer name wired into agent identity</li>
</ul>
</section>
</div>
<script type="module">
import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@11/dist/mermaid.esm.min.mjs';
mermaid.initialize({ startOnLoad: true, securityLevel: 'loose', fontFamily: 'Departure Mono, Noto Emoji, monospace' });
</script>
<script>
window.addEventListener('scroll', () => {
const bar = document.getElementById('progress');
const max = document.documentElement.scrollHeight - window.innerHeight;
bar.style.width = (max > 0 ? (window.scrollY / max) * 100 : 0) + '%';
});
</script>
</body>
</html>
-377
View File
@@ -1,377 +0,0 @@
# honcho-integration-spec
Comparison of Hermes Agent vs. openclaw-honcho — and a porting spec for bringing Hermes patterns into other Honcho integrations.
---
## Overview
Two independent Honcho integrations have been built for two different agent runtimes: **Hermes Agent** (Python, baked into the runner) and **openclaw-honcho** (TypeScript plugin via hook/tool API). Both use the same Honcho peer paradigm — dual peer model, `session.context()`, `peer.chat()` — but they made different tradeoffs at every layer.
This document maps those tradeoffs and defines a porting spec: a set of Hermes-originated patterns, each stated as an integration-agnostic interface, that any Honcho integration can adopt regardless of runtime or language.
> **Scope** Both integrations work correctly today. This spec is about the delta — patterns in Hermes that are worth propagating and patterns in openclaw-honcho that Hermes should eventually adopt. The spec is additive, not prescriptive.
---
## Architecture comparison
### Hermes: baked-in runner
Honcho is initialised directly inside `AIAgent.__init__`. There is no plugin boundary. Session management, context injection, async prefetch, and CLI surface are all first-class concerns of the runner. Context is injected once per session (baked into `_cached_system_prompt`) and never re-fetched mid-session — this maximises prefix cache hits at the LLM provider.
Turn flow:
```
user message
→ _honcho_prefetch() (reads cache — no HTTP)
→ _build_system_prompt() (first turn only, cached)
→ LLM call
→ response
→ _honcho_fire_prefetch() (daemon threads, turn end)
→ prefetch_context() thread ──┐
→ prefetch_dialectic() thread ─┴→ _context_cache / _dialectic_cache
```
### openclaw-honcho: hook-based plugin
The plugin registers hooks against OpenClaw's event bus. Context is fetched synchronously inside `before_prompt_build` on every turn. Message capture happens in `agent_end`. The multi-agent hierarchy is tracked via `subagent_spawned`. This model is correct but every turn pays a blocking Honcho round-trip before the LLM call can begin.
Turn flow:
```
user message
→ before_prompt_build (BLOCKING HTTP — every turn)
→ session.context()
→ system prompt assembled
→ LLM call
→ response
→ agent_end hook
→ session.addMessages()
→ session.setMetadata()
```
---
## Diff table
| Dimension | Hermes Agent | openclaw-honcho |
|---|---|---|
| **Context injection timing** | Once per session (cached). Zero HTTP on response path after turn 1. | Every turn, blocking. Fresh context per turn but adds latency. |
| **Prefetch strategy** | Daemon threads fire at turn end; consumed next turn from cache. | None. Blocking call at prompt-build time. |
| **Dialectic (peer.chat)** | Prefetched async; result injected into system prompt next turn. | On-demand via `honcho_recall` / `honcho_analyze` tools. |
| **Reasoning level** | Dynamic: scales with message length. Floor = config default. Cap = "high". | Fixed per tool: recall=minimal, analyze=medium. |
| **Memory modes** | `user_memory_mode` / `agent_memory_mode`: hybrid / honcho / local. | None. Always writes to Honcho. |
| **Write frequency** | async (background queue), turn, session, N turns. | After every agent_end (no control). |
| **AI peer identity** | `observe_me=True`, `seed_ai_identity()`, `get_ai_representation()`, SOUL.md → AI peer. | Agent files uploaded to agent peer at setup. No ongoing self-observation. |
| **Context scope** | User peer + AI peer representation, both injected. | User peer (owner) representation + conversation summary. `peerPerspective` on context call. |
| **Session naming** | per-directory / global / manual map / title-based. | Derived from platform session key. |
| **Multi-agent** | Single-agent only. | Parent observer hierarchy via `subagent_spawned`. |
| **Tool surface** | Single `query_user_context` tool (on-demand dialectic). | 6 tools: session, profile, search, context (fast) + recall, analyze (LLM). |
| **Platform metadata** | Not stripped. | Explicitly stripped before Honcho storage. |
| **Message dedup** | None. | `lastSavedIndex` in session metadata prevents re-sending. |
| **CLI surface in prompt** | Management commands injected into system prompt. Agent knows its own CLI. | Not injected. |
| **AI peer name in identity** | Replaces "Hermes Agent" in DEFAULT_AGENT_IDENTITY when configured. | Not implemented. |
| **QMD / local file search** | Not implemented. | Passthrough tools when QMD backend configured. |
| **Workspace metadata** | Not implemented. | `agentPeerMap` in workspace metadata tracks agent→peer ID. |
---
## Patterns
Six patterns from Hermes are worth adopting in any Honcho integration. Each is described as an integration-agnostic interface.
**Hermes contributes:**
- Async prefetch (zero-latency)
- Dynamic reasoning level
- Per-peer memory modes
- AI peer identity formation
- Session naming strategies
- CLI surface injection
**openclaw-honcho contributes back (Hermes should adopt):**
- `lastSavedIndex` dedup
- Platform metadata stripping
- Multi-agent observer hierarchy
- `peerPerspective` on `context()`
- Tiered tool surface (fast/LLM)
- Workspace `agentPeerMap`
---
## Spec: async prefetch
### Problem
Calling `session.context()` and `peer.chat()` synchronously before each LLM call adds 200800ms of Honcho round-trip latency to every turn.
### Pattern
Fire both calls as non-blocking background work at the **end** of each turn. Store results in a per-session cache keyed by session ID. At the **start** of the next turn, pop from cache — the HTTP is already done. First turn is cold (empty cache); all subsequent turns are zero-latency on the response path.
### Interface contract
```typescript
interface AsyncPrefetch {
// Fire context + dialectic fetches at turn end. Non-blocking.
firePrefetch(sessionId: string, userMessage: string): void;
// Pop cached results at turn start. Returns empty if cache is cold.
popContextResult(sessionId: string): ContextResult | null;
popDialecticResult(sessionId: string): string | null;
}
type ContextResult = {
representation: string;
card: string[];
aiRepresentation?: string; // AI peer context if enabled
summary?: string; // conversation summary if fetched
};
```
### Implementation notes
- **Python:** `threading.Thread(daemon=True)`. Write to `dict[session_id, result]` — GIL makes this safe for simple writes.
- **TypeScript:** `Promise` stored in `Map<string, Promise<ContextResult>>`. Await at pop time. If not resolved yet, return null — do not block.
- The pop is destructive: clears the cache entry after reading so stale data never accumulates.
- Prefetch should also fire on first turn (even though it won't be consumed until turn 2).
### openclaw-honcho adoption
Move `session.context()` from `before_prompt_build` to a post-`agent_end` background task. Store result in `state.contextCache`. In `before_prompt_build`, read from cache instead of calling Honcho. If cache is empty (turn 1), inject nothing — the prompt is still valid without Honcho context on the first turn.
---
## Spec: dynamic reasoning level
### Problem
Honcho's dialectic endpoint supports reasoning levels from `minimal` to `max`. A fixed level per tool wastes budget on simple queries and under-serves complex ones.
### Pattern
Select the reasoning level dynamically based on the user's message. Use the configured default as a floor. Bump by message length. Cap auto-selection at `high` — never select `max` automatically.
### Logic
```
< 120 chars → default (typically "low")
120400 chars → one level above default (cap at "high")
> 400 chars → two levels above default (cap at "high")
```
### Config key
Add `dialecticReasoningLevel` (string, default `"low"`). This sets the floor. The dynamic bump always applies on top.
### openclaw-honcho adoption
Apply in `honcho_recall` and `honcho_analyze`: replace fixed `reasoningLevel` with the dynamic selector. `honcho_recall` uses floor `"minimal"`, `honcho_analyze` uses floor `"medium"` — both still bump with message length.
---
## Spec: per-peer memory modes
### Problem
Users want independent control over whether user context and agent context are written locally, to Honcho, or both.
### Modes
| Mode | Effect |
|---|---|
| `hybrid` | Write to both local files and Honcho (default) |
| `honcho` | Honcho only — disable corresponding local file writes |
| `local` | Local files only — skip Honcho sync for this peer |
### Config schema
```json
{
"memoryMode": "hybrid",
"userMemoryMode": "honcho",
"agentMemoryMode": "hybrid"
}
```
Resolution order: per-peer field wins → shorthand `memoryMode` → default `"hybrid"`.
### Effect on Honcho sync
- `userMemoryMode=local`: skip adding user peer messages to Honcho
- `agentMemoryMode=local`: skip adding assistant peer messages to Honcho
- Both local: skip `session.addMessages()` entirely
- `userMemoryMode=honcho`: disable local USER.md writes
- `agentMemoryMode=honcho`: disable local MEMORY.md / SOUL.md writes
---
## Spec: AI peer identity formation
### Problem
Honcho builds the user's representation organically by observing what the user says. The same mechanism exists for the AI peer — but only if `observe_me=True` is set for the agent peer. Without it, the agent peer accumulates nothing.
Additionally, existing persona files (SOUL.md, IDENTITY.md) should seed the AI peer's Honcho representation at first activation.
### Part A: observe_me=True for agent peer
```typescript
await session.addPeers([
[ownerPeer.id, { observeMe: true, observeOthers: false }],
[agentPeer.id, { observeMe: true, observeOthers: true }], // was false
]);
```
One-line change. Foundational. Without it, the AI peer representation stays empty regardless of what the agent says.
### Part B: seedAiIdentity()
```typescript
async function seedAiIdentity(
agentPeer: Peer,
content: string,
source: string
): Promise<boolean> {
const wrapped = [
`<ai_identity_seed>`,
`<source>${source}</source>`,
``,
content.trim(),
`</ai_identity_seed>`,
].join("\n");
await agentPeer.addMessage("assistant", wrapped);
return true;
}
```
### Part C: migrate agent files at setup
During `honcho setup`, upload agent-self files (SOUL.md, IDENTITY.md, AGENTS.md) to the agent peer via `seedAiIdentity()` instead of `session.uploadFile()`. This routes content through Honcho's observation pipeline.
### Part D: AI peer name in identity
When the agent has a configured name, prepend it to the injected system prompt:
```typescript
const namePrefix = agentName ? `You are ${agentName}.\n\n` : "";
return { systemPrompt: namePrefix + "## User Memory Context\n\n" + sections };
```
### CLI surface
```
honcho identity <file> # seed from file
honcho identity --show # show current AI peer representation
```
---
## Spec: session naming strategies
### Problem
A single global session means every project shares the same Honcho context. Per-directory sessions provide isolation without requiring users to name sessions manually.
### Strategies
| Strategy | Session key | When to use |
|---|---|---|
| `per-directory` | basename of CWD | Default. Each project gets its own session. |
| `global` | fixed string `"global"` | Single cross-project session. |
| manual map | user-configured per path | `sessions` config map overrides directory basename. |
| title-based | sanitized session title | When agent supports named sessions set mid-conversation. |
### Config schema
```json
{
"sessionStrategy": "per-directory",
"sessionPeerPrefix": false,
"sessions": {
"/home/user/projects/foo": "foo-project"
}
}
```
### CLI surface
```
honcho sessions # list all mappings
honcho map <name> # map cwd to session name
honcho map # no-arg = list mappings
```
Resolution order: manual map → session title → directory basename → platform key.
---
## Spec: CLI surface injection
### Problem
When a user asks "how do I change my memory settings?" the agent either hallucinates or says it doesn't know. The agent should know its own management interface.
### Pattern
When Honcho is active, append a compact command reference to the system prompt. Keep it under 300 chars.
```
# Honcho memory integration
Active. Session: {sessionKey}. Mode: {mode}.
Management commands:
honcho status — show config + connection
honcho mode [hybrid|honcho|local] — show or set memory mode
honcho sessions — list session mappings
honcho map <name> — map directory to session
honcho identity [file] [--show] — seed or show AI identity
honcho setup — full interactive wizard
```
---
## openclaw-honcho checklist
Ordered by impact:
- [ ] **Async prefetch** — move `session.context()` out of `before_prompt_build` into post-`agent_end` background Promise
- [ ] **observe_me=True for agent peer** — one-line change in `session.addPeers()`
- [ ] **Dynamic reasoning level** — add helper; apply in `honcho_recall` and `honcho_analyze`; add `dialecticReasoningLevel` to config
- [ ] **Per-peer memory modes** — add `userMemoryMode` / `agentMemoryMode` to config; gate Honcho sync and local writes
- [ ] **seedAiIdentity()** — add helper; use during setup migration for SOUL.md / IDENTITY.md
- [ ] **Session naming strategies** — add `sessionStrategy`, `sessions` map, `sessionPeerPrefix`
- [ ] **CLI surface injection** — append command reference to `before_prompt_build` return value
- [ ] **honcho identity subcommand** — seed from file or `--show` current representation
- [ ] **AI peer name injection** — if `aiPeer` name configured, prepend to injected system prompt
- [ ] **honcho mode / sessions / map** — CLI parity with Hermes
Already done in openclaw-honcho (do not re-implement): `lastSavedIndex` dedup, platform metadata stripping, multi-agent parent observer, `peerPerspective` on `context()`, tiered tool surface, workspace `agentPeerMap`, QMD passthrough, self-hosted Honcho.
---
## nanobot-honcho checklist
Greenfield integration. Start from openclaw-honcho's architecture and apply all Hermes patterns from day one.
### Phase 1 — core correctness
- [ ] Dual peer model (owner + agent peer), both with `observe_me=True`
- [ ] Message capture at turn end with `lastSavedIndex` dedup
- [ ] Platform metadata stripping before Honcho storage
- [ ] Async prefetch from day one — do not implement blocking context injection
- [ ] Legacy file migration at first activation (USER.md → owner peer, SOUL.md → `seedAiIdentity()`)
### Phase 2 — configuration
- [ ] Config schema: `apiKey`, `workspaceId`, `baseUrl`, `memoryMode`, `userMemoryMode`, `agentMemoryMode`, `dialecticReasoningLevel`, `sessionStrategy`, `sessions`
- [ ] Per-peer memory mode gating
- [ ] Dynamic reasoning level
- [ ] Session naming strategies
### Phase 3 — tools and CLI
- [ ] Tool surface: `honcho_profile`, `honcho_recall`, `honcho_analyze`, `honcho_search`, `honcho_context`
- [ ] CLI: `setup`, `status`, `sessions`, `map`, `mode`, `identity`
- [ ] CLI surface injection into system prompt
- [ ] AI peer name wired into agent identity
-142
View File
@@ -1,142 +0,0 @@
# Migrating from OpenClaw to Hermes Agent
This guide covers how to import your OpenClaw settings, memories, skills, and API keys into Hermes Agent.
## Three Ways to Migrate
### 1. Automatic (during first-time setup)
When you run `hermes setup` for the first time and Hermes detects `~/.openclaw`, it automatically offers to import your OpenClaw data before configuration begins. Just accept the prompt and everything is handled for you.
### 2. CLI Command (quick, scriptable)
```bash
hermes claw migrate # Preview then migrate (always shows preview first)
hermes claw migrate --dry-run # Preview only, no changes
hermes claw migrate --preset user-data # Migrate without API keys/secrets
hermes claw migrate --yes # Skip confirmation prompt
```
The migration always shows a full preview of what will be imported before making any changes. You review the preview and confirm before anything is written.
**All options:**
| Flag | Description |
|------|-------------|
| `--source PATH` | Path to OpenClaw directory (default: `~/.openclaw`) |
| `--dry-run` | Preview only — no files are modified |
| `--preset {user-data,full}` | Migration preset (default: `full`). `user-data` excludes secrets |
| `--overwrite` | Overwrite existing files (default: skip conflicts) |
| `--migrate-secrets` | Include allowlisted secrets (auto-enabled with `full` preset) |
| `--workspace-target PATH` | Copy workspace instructions (AGENTS.md) to this absolute path |
| `--skill-conflict {skip,overwrite,rename}` | How to handle skill name conflicts (default: `skip`) |
| `--yes`, `-y` | Skip confirmation prompts |
### 3. Agent-Guided (interactive, with previews)
Ask the agent to run the migration for you:
```
> Migrate my OpenClaw setup to Hermes
```
The agent will use the `openclaw-migration` skill to:
1. Run a preview first to show what would change
2. Ask about conflict resolution (SOUL.md, skills, etc.)
3. Let you choose between `user-data` and `full` presets
4. Execute the migration with your choices
5. Print a detailed summary of what was migrated
## What Gets Migrated
### `user-data` preset
| Item | Source | Destination |
|------|--------|-------------|
| SOUL.md | `~/.openclaw/workspace/SOUL.md` | `~/.hermes/SOUL.md` |
| Memory entries | `~/.openclaw/workspace/MEMORY.md` | `~/.hermes/memories/MEMORY.md` |
| User profile | `~/.openclaw/workspace/USER.md` | `~/.hermes/memories/USER.md` |
| Skills | `~/.openclaw/workspace/skills/` | `~/.hermes/skills/openclaw-imports/` |
| Command allowlist | `~/.openclaw/workspace/exec_approval_patterns.yaml` | Merged into `~/.hermes/config.yaml` |
| Messaging settings | `~/.openclaw/config.yaml` (TELEGRAM_ALLOWED_USERS, MESSAGING_CWD) | `~/.hermes/.env` |
| TTS assets | `~/.openclaw/workspace/tts/` | `~/.hermes/tts/` |
Workspace files are also checked at `workspace.default/` and `workspace-main/` as fallback paths (OpenClaw renamed `workspace/` to `workspace-main/` in recent versions).
### `full` preset (adds to `user-data`)
| Item | Source | Destination |
|------|--------|-------------|
| Telegram bot token | `openclaw.json` channels config | `~/.hermes/.env` |
| OpenRouter API key | `.env`, `openclaw.json`, or `openclaw.json["env"]` | `~/.hermes/.env` |
| OpenAI API key | `.env`, `openclaw.json`, or `openclaw.json["env"]` | `~/.hermes/.env` |
| Anthropic API key | `.env`, `openclaw.json`, or `openclaw.json["env"]` | `~/.hermes/.env` |
| ElevenLabs API key | `.env`, `openclaw.json`, or `openclaw.json["env"]` | `~/.hermes/.env` |
API keys are searched across four sources: inline config values, `~/.openclaw/.env`, the `openclaw.json` `"env"` sub-object, and per-agent auth profiles.
Only allowlisted secrets are ever imported. Other credentials are skipped and reported.
## OpenClaw Schema Compatibility
The migration handles both old and current OpenClaw config layouts:
- **Channel tokens**: Reads from flat paths (`channels.telegram.botToken`) and the newer `accounts.default` layout (`channels.telegram.accounts.default.botToken`)
- **TTS provider**: OpenClaw renamed "edge" to "microsoft" — both are recognized and mapped to Hermes' "edge"
- **Provider API types**: Both short (`openai`, `anthropic`) and hyphenated (`openai-completions`, `anthropic-messages`, `google-generative-ai`) values are mapped correctly
- **thinkingDefault**: All enum values are handled including newer ones (`minimal`, `xhigh`, `adaptive`)
- **Matrix**: Uses `accessToken` field (not `botToken`)
- **SecretRef formats**: Plain strings, env templates (`${VAR}`), and `source: "env"` SecretRefs are resolved. `source: "file"` and `source: "exec"` SecretRefs produce a warning — add those keys manually after migration.
## Conflict Handling
By default, the migration **will not overwrite** existing Hermes data:
- **SOUL.md** — skipped if one already exists in `~/.hermes/`
- **Memory entries** — skipped if memories already exist (to avoid duplicates)
- **Skills** — skipped if a skill with the same name already exists
- **API keys** — skipped if the key is already set in `~/.hermes/.env`
To overwrite conflicts, use `--overwrite`. The migration creates backups before overwriting.
For skills, you can also use `--skill-conflict rename` to import conflicting skills under a new name (e.g., `skill-name-imported`).
## Migration Report
Every migration produces a report showing:
- **Migrated items** — what was successfully imported
- **Conflicts** — items skipped because they already exist
- **Skipped items** — items not found in the source
- **Errors** — items that failed to import
For executed migrations, the full report is saved to `~/.hermes/migration/openclaw/<timestamp>/`.
## Post-Migration Notes
- **Skills require a new session** — imported skills take effect after restarting your agent or starting a new chat.
- **WhatsApp requires re-pairing** — WhatsApp uses QR-code pairing, not token-based auth. Run `hermes whatsapp` to pair.
- **Archive cleanup** — after migration, you'll be offered to rename `~/.openclaw/` to `.openclaw.pre-migration/` to prevent state confusion. You can also run `hermes claw cleanup` later.
## Troubleshooting
### "OpenClaw directory not found"
The migration looks for `~/.openclaw` by default, then tries `~/.clawdbot` and `~/.moltbot`. If your OpenClaw is installed elsewhere, use `--source`:
```bash
hermes claw migrate --source /path/to/.openclaw
```
### "Migration script not found"
The migration script ships with Hermes Agent. If you installed via pip (not git clone), the `optional-skills/` directory may not be present. Install the skill from the Skills Hub:
```bash
hermes skills install openclaw-migration
```
### Memory overflow
If your OpenClaw MEMORY.md or USER.md exceeds Hermes' character limits, excess entries are exported to an overflow file in the migration report directory. You can manually review and add the most important ones.
### API keys not found
Keys might be stored in different places depending on your OpenClaw setup:
- `~/.openclaw/.env` file
- Inline in `openclaw.json` under `models.providers.*.apiKey`
- In `openclaw.json` under the `"env"` or `"env.vars"` sub-objects
- In `~/.openclaw/agents/main/agent/auth-profiles.json`
The migration checks all four. If keys use `source: "file"` or `source: "exec"` SecretRefs, they can't be resolved automatically — add them via `hermes config set`.
@@ -1,608 +0,0 @@
# Pricing Accuracy Architecture
Date: 2026-03-16
## Goal
Hermes should only show dollar costs when they are backed by an official source for the user's actual billing path.
This design replaces the current static, heuristic pricing flow in:
- `run_agent.py`
- `agent/usage_pricing.py`
- `agent/insights.py`
- `cli.py`
with a provider-aware pricing system that:
- handles cache billing correctly
- distinguishes `actual` vs `estimated` vs `included` vs `unknown`
- reconciles post-hoc costs when providers expose authoritative billing data
- supports direct providers, OpenRouter, subscriptions, enterprise pricing, and custom endpoints
## Problems In The Current Design
Current Hermes behavior has four structural issues:
1. It stores only `prompt_tokens` and `completion_tokens`, which is insufficient for providers that bill cache reads and cache writes separately.
2. It uses a static model price table and fuzzy heuristics, which can drift from current official pricing.
3. It assumes public API list pricing matches the user's real billing path.
4. It has no distinction between live estimates and reconciled billed cost.
## Design Principles
1. Normalize usage before pricing.
2. Never fold cached tokens into plain input cost.
3. Track certainty explicitly.
4. Treat the billing path as part of the model identity.
5. Prefer official machine-readable sources over scraped docs.
6. Use post-hoc provider cost APIs when available.
7. Show `n/a` rather than inventing precision.
## High-Level Architecture
The new system has four layers:
1. `usage_normalization`
Converts raw provider usage into a canonical usage record.
2. `pricing_source_resolution`
Determines the billing path, source of truth, and applicable pricing source.
3. `cost_estimation_and_reconciliation`
Produces an immediate estimate when possible, then replaces or annotates it with actual billed cost later.
4. `presentation`
`/usage`, `/insights`, and the status bar display cost with certainty metadata.
## Canonical Usage Record
Add a canonical usage model that every provider path maps into before any pricing math happens.
Suggested structure:
```python
@dataclass
class CanonicalUsage:
provider: str
billing_provider: str
model: str
billing_route: str
input_tokens: int = 0
output_tokens: int = 0
cache_read_tokens: int = 0
cache_write_tokens: int = 0
reasoning_tokens: int = 0
request_count: int = 1
raw_usage: dict[str, Any] | None = None
raw_usage_fields: dict[str, str] | None = None
computed_fields: set[str] | None = None
provider_request_id: str | None = None
provider_generation_id: str | None = None
provider_response_id: str | None = None
```
Rules:
- `input_tokens` means non-cached input only.
- `cache_read_tokens` and `cache_write_tokens` are never merged into `input_tokens`.
- `output_tokens` excludes cache metrics.
- `reasoning_tokens` is telemetry unless a provider officially bills it separately.
This is the same normalization pattern used by `opencode`, extended with provenance and reconciliation ids.
## Provider Normalization Rules
### OpenAI Direct
Source usage fields:
- `prompt_tokens`
- `completion_tokens`
- `prompt_tokens_details.cached_tokens`
Normalization:
- `cache_read_tokens = cached_tokens`
- `input_tokens = prompt_tokens - cached_tokens`
- `cache_write_tokens = 0` unless OpenAI exposes it in the relevant route
- `output_tokens = completion_tokens`
### Anthropic Direct
Source usage fields:
- `input_tokens`
- `output_tokens`
- `cache_read_input_tokens`
- `cache_creation_input_tokens`
Normalization:
- `input_tokens = input_tokens`
- `output_tokens = output_tokens`
- `cache_read_tokens = cache_read_input_tokens`
- `cache_write_tokens = cache_creation_input_tokens`
### OpenRouter
Estimate-time usage normalization should use the response usage payload with the same rules as the underlying provider when possible.
Reconciliation-time records should also store:
- OpenRouter generation id
- native token fields when available
- `total_cost`
- `cache_discount`
- `upstream_inference_cost`
- `is_byok`
### Gemini / Vertex
Use official Gemini or Vertex usage fields where available.
If cached content tokens are exposed:
- map them to `cache_read_tokens`
If a route exposes no cache creation metric:
- store `cache_write_tokens = 0`
- preserve the raw usage payload for later extension
### DeepSeek And Other Direct Providers
Normalize only the fields that are officially exposed.
If a provider does not expose cache buckets:
- do not infer them unless the provider explicitly documents how to derive them
### Subscription / Included-Cost Routes
These still use the canonical usage model.
Tokens are tracked normally. Cost depends on billing mode, not on whether usage exists.
## Billing Route Model
Hermes must stop keying pricing solely by `model`.
Introduce a billing route descriptor:
```python
@dataclass
class BillingRoute:
provider: str
base_url: str | None
model: str
billing_mode: str
organization_hint: str | None = None
```
`billing_mode` values:
- `official_cost_api`
- `official_generation_api`
- `official_models_api`
- `official_docs_snapshot`
- `subscription_included`
- `user_override`
- `custom_contract`
- `unknown`
Examples:
- OpenAI direct API with Costs API access: `official_cost_api`
- Anthropic direct API with Usage & Cost API access: `official_cost_api`
- OpenRouter request before reconciliation: `official_models_api`
- OpenRouter request after generation lookup: `official_generation_api`
- GitHub Copilot style subscription route: `subscription_included`
- local OpenAI-compatible server: `unknown`
- enterprise contract with configured rates: `custom_contract`
## Cost Status Model
Every displayed cost should have:
```python
@dataclass
class CostResult:
amount_usd: Decimal | None
status: Literal["actual", "estimated", "included", "unknown"]
source: Literal[
"provider_cost_api",
"provider_generation_api",
"provider_models_api",
"official_docs_snapshot",
"user_override",
"custom_contract",
"none",
]
label: str
fetched_at: datetime | None
pricing_version: str | None
notes: list[str]
```
Presentation rules:
- `actual`: show dollar amount as final
- `estimated`: show dollar amount with estimate labeling
- `included`: show `included` or `$0.00 (included)` depending on UX choice
- `unknown`: show `n/a`
## Official Source Hierarchy
Resolve cost using this order:
1. Request-level or account-level official billed cost
2. Official machine-readable model pricing
3. Official docs snapshot
4. User override or custom contract
5. Unknown
The system must never skip to a lower level if a higher-confidence source exists for the current billing route.
## Provider-Specific Truth Rules
### OpenAI Direct
Preferred truth:
1. Costs API for reconciled spend
2. Official pricing page for live estimate
### Anthropic Direct
Preferred truth:
1. Usage & Cost API for reconciled spend
2. Official pricing docs for live estimate
### OpenRouter
Preferred truth:
1. `GET /api/v1/generation` for reconciled `total_cost`
2. `GET /api/v1/models` pricing for live estimate
Do not use underlying provider public pricing as the source of truth for OpenRouter billing.
### Gemini / Vertex
Preferred truth:
1. official billing export or billing API for reconciled spend when available for the route
2. official pricing docs for estimate
### DeepSeek
Preferred truth:
1. official machine-readable cost source if available in the future
2. official pricing docs snapshot today
### Subscription-Included Routes
Preferred truth:
1. explicit route config marking the model as included in subscription
These should display `included`, not an API list-price estimate.
### Custom Endpoint / Local Model
Preferred truth:
1. user override
2. custom contract config
3. unknown
These should default to `unknown`.
## Pricing Catalog
Replace the current `MODEL_PRICING` dict with a richer pricing catalog.
Suggested record:
```python
@dataclass
class PricingEntry:
provider: str
route_pattern: str
model_pattern: str
input_cost_per_million: Decimal | None = None
output_cost_per_million: Decimal | None = None
cache_read_cost_per_million: Decimal | None = None
cache_write_cost_per_million: Decimal | None = None
request_cost: Decimal | None = None
image_cost: Decimal | None = None
source: str = "official_docs_snapshot"
source_url: str | None = None
fetched_at: datetime | None = None
pricing_version: str | None = None
```
The catalog should be route-aware:
- `openai:gpt-5`
- `anthropic:claude-opus-4-6`
- `openrouter:anthropic/claude-opus-4.6`
- `copilot:gpt-4o`
This avoids conflating direct-provider billing with aggregator billing.
## Pricing Sync Architecture
Introduce a pricing sync subsystem instead of manually maintaining a single hardcoded table.
Suggested modules:
- `agent/pricing/catalog.py`
- `agent/pricing/sources.py`
- `agent/pricing/sync.py`
- `agent/pricing/reconcile.py`
- `agent/pricing/types.py`
### Sync Sources
- OpenRouter models API
- official provider docs snapshots where no API exists
- user overrides from config
### Sync Output
Cache pricing entries locally with:
- source URL
- fetch timestamp
- version/hash
- confidence/source type
### Sync Frequency
- startup warm cache
- background refresh every 6 to 24 hours depending on source
- manual `hermes pricing sync`
## Reconciliation Architecture
Live requests may produce only an estimate initially. Hermes should reconcile them later when a provider exposes actual billed cost.
Suggested flow:
1. Agent call completes.
2. Hermes stores canonical usage plus reconciliation ids.
3. Hermes computes an immediate estimate if a pricing source exists.
4. A reconciliation worker fetches actual cost when supported.
5. Session and message records are updated with `actual` cost.
This can run:
- inline for cheap lookups
- asynchronously for delayed provider accounting
## Persistence Changes
Session storage should stop storing only aggregate prompt/completion totals.
Add fields for both usage and cost certainty:
- `input_tokens`
- `output_tokens`
- `cache_read_tokens`
- `cache_write_tokens`
- `reasoning_tokens`
- `estimated_cost_usd`
- `actual_cost_usd`
- `cost_status`
- `cost_source`
- `pricing_version`
- `billing_provider`
- `billing_mode`
If schema expansion is too large for one PR, add a new pricing events table:
```text
session_cost_events
id
session_id
request_id
provider
model
billing_mode
input_tokens
output_tokens
cache_read_tokens
cache_write_tokens
estimated_cost_usd
actual_cost_usd
cost_status
cost_source
pricing_version
created_at
updated_at
```
## Hermes Touchpoints
### `run_agent.py`
Current responsibility:
- parse raw provider usage
- update session token counters
New responsibility:
- build `CanonicalUsage`
- update canonical counters
- store reconciliation ids
- emit usage event to pricing subsystem
### `agent/usage_pricing.py`
Current responsibility:
- static lookup table
- direct cost arithmetic
New responsibility:
- move or replace with pricing catalog facade
- no fuzzy model-family heuristics
- no direct pricing without billing-route context
### `cli.py`
Current responsibility:
- compute session cost directly from prompt/completion totals
New responsibility:
- display `CostResult`
- show status badges:
- `actual`
- `estimated`
- `included`
- `n/a`
### `agent/insights.py`
Current responsibility:
- recompute historical estimates from static pricing
New responsibility:
- aggregate stored pricing events
- prefer actual cost over estimate
- surface estimates only when reconciliation is unavailable
## UX Rules
### Status Bar
Show one of:
- `$1.42`
- `~$1.42`
- `included`
- `cost n/a`
Where:
- `$1.42` means `actual`
- `~$1.42` means `estimated`
- `included` means subscription-backed or explicitly zero-cost route
- `cost n/a` means unknown
### `/usage`
Show:
- token buckets
- estimated cost
- actual cost if available
- cost status
- pricing source
### `/insights`
Aggregate:
- actual cost totals
- estimated-only totals
- unknown-cost sessions count
- included-cost sessions count
## Config And Overrides
Add user-configurable pricing overrides in config:
```yaml
pricing:
mode: hybrid
sync_on_startup: true
sync_interval_hours: 12
overrides:
- provider: openrouter
model: anthropic/claude-opus-4.6
billing_mode: custom_contract
input_cost_per_million: 4.25
output_cost_per_million: 22.0
cache_read_cost_per_million: 0.5
cache_write_cost_per_million: 6.0
included_routes:
- provider: copilot
model: "*"
- provider: codex-subscription
model: "*"
```
Overrides must win over catalog defaults for the matching billing route.
## Rollout Plan
### Phase 1
- add canonical usage model
- split cache token buckets in `run_agent.py`
- stop pricing cache-inflated prompt totals
- preserve current UI with improved backend math
### Phase 2
- add route-aware pricing catalog
- integrate OpenRouter models API sync
- add `estimated` vs `included` vs `unknown`
### Phase 3
- add reconciliation for OpenRouter generation cost
- add actual cost persistence
- update `/insights` to prefer actual cost
### Phase 4
- add direct OpenAI and Anthropic reconciliation paths
- add user overrides and contract pricing
- add pricing sync CLI command
## Testing Strategy
Add tests for:
- OpenAI cached token subtraction
- Anthropic cache read/write separation
- OpenRouter estimated vs actual reconciliation
- subscription-backed models showing `included`
- custom endpoints showing `n/a`
- override precedence
- stale catalog fallback behavior
Current tests that assume heuristic pricing should be replaced with route-aware expectations.
## Non-Goals
- exact enterprise billing reconstruction without an official source or user override
- backfilling perfect historical cost for old sessions that lack cache bucket data
- scraping arbitrary provider web pages at request time
## Recommendation
Do not expand the existing `MODEL_PRICING` dict.
That path cannot satisfy the product requirement. Hermes should instead migrate to:
- canonical usage normalization
- route-aware pricing sources
- estimate-then-reconcile cost lifecycle
- explicit certainty states in the UI
This is the minimum architecture that makes the statement "Hermes pricing is backed by official sources where possible, and otherwise clearly labeled" defensible.
@@ -1,108 +0,0 @@
# Ink Gateway TUI Migration — Post-mortem
Planned: 2026-04-01 · Delivered: 2026-04 · Status: shipped, classic (prompt_toolkit) CLI still present
## What Shipped
Three layers, same repo, Python runtime unchanged.
```
ui-tui (Node/TS) ──stdio JSON-RPC──▶ tui_gateway (Py) ──▶ AIAgent (run_agent.py)
```
### Backend — `tui_gateway/`
```
tui_gateway/
├── entry.py # subprocess entrypoint, stdio read/write loop
├── server.py # everything: sessions dict, @method handlers, _emit
├── render.py # stream renderer, diff rendering, message rendering
├── slash_worker.py # subprocess that runs hermes_cli slash commands
└── __init__.py
```
`server.py` owns the full runtime-control surface: session store (`_sessions: dict[str, dict]`), method registry (`@method("…")` decorator), event emitter (`_emit`), agent lifecycle (`_make_agent`, `_init_session`, `_wire_callbacks`), approval/sudo/clarify round-trips, and JSON-RPC dispatch.
Protocol methods (`@method(...)` in `server.py`):
- session: `session.{create, resume, list, close, interrupt, usage, history, compress, branch, title, save, undo}`
- prompt: `prompt.{submit, background, btw}`
- tools: `tools.{list, show, configure}`
- slash: `slash.exec`, `command.{dispatch, resolve}`, `commands.catalog`, `complete.{path, slash}`
- approvals: `approval.respond`, `sudo.respond`, `clarify.respond`, `secret.respond`
- config/state: `config.{get, set, show}`, `model.options`, `reload.mcp`
- ops: `shell.exec`, `cli.exec`, `terminal.resize`, `input.detect_drop`, `clipboard.paste`, `paste.collapse`, `image.attach`, `process.stop`
- misc: `agents.list`, `skills.manage`, `plugins.list`, `cron.manage`, `insights.get`, `rollback.{list, diff, restore}`, `browser.manage`
Protocol events (`_emit(…)` → handled in `ui-tui/src/app/createGatewayEventHandler.ts`):
- lifecycle: `gateway.{ready, stderr}`, `session.info`, `skin.changed`
- stream: `message.{start, delta, complete}`, `thinking.delta`, `reasoning.{delta, available}`, `status.update`
- tools: `tool.{start, progress, complete, generating}`, `subagent.{start, thinking, tool, progress, complete}`
- interactive: `approval.request`, `sudo.request`, `clarify.request`, `secret.request`
- async: `background.complete`, `btw.complete`, `error`
### Frontend — `ui-tui/src/`
```
src/
├── entry.tsx # node bootstrap: bootBanner → spawn python → dynamic-import Ink → render(<App/>)
├── app.tsx # <GatewayProvider> wraps <AppLayout>
├── bootBanner.ts # raw-ANSI banner to stdout in ~2ms, pre-React
├── gatewayClient.ts # JSON-RPC client over child_process stdio
├── gatewayTypes.ts # typed RPC responses + GatewayEvent union
├── theme.ts # DEFAULT_THEME + fromSkin
├── app/ # hooks + stores — the orchestration layer
│ ├── uiStore.ts # nanostore: sid, info, busy, usage, theme, status…
│ ├── turnStore.ts # nanostore: per-turn activity / reasoning / tools
│ ├── turnController.ts # imperative singleton for stream-time operations
│ ├── overlayStore.ts # nanostore: modal/overlay state
│ ├── useMainApp.ts # top-level composition hook
│ ├── useSessionLifecycle.ts # session.create/resume/close/reset
│ ├── useSubmission.ts # shell/slash/prompt dispatch + interpolation
│ ├── useConfigSync.ts # config.get + mtime poll
│ ├── useComposerState.ts # input buffer, paste snippets, editor mode
│ ├── useInputHandlers.ts # key bindings
│ ├── createGatewayEventHandler.ts # event-stream dispatcher
│ ├── createSlashHandler.ts # slash command router (registry + python fallback)
│ └── slash/commands/ # core.ts, ops.ts, session.ts — TS-owned slash commands
├── components/ # AppLayout, AppChrome, AppOverlays, MessageLine, Thinking, Markdown, pickers, prompts, Banner, SessionPanel
├── config/ # env, limits, timing constants
├── content/ # charms, faces, fortunes, hotkeys, placeholders, verbs
├── domain/ # details, messages, paths, roles, slash, usage, viewport
├── protocol/ # interpolation, paste regex
├── hooks/ # useCompletion, useInputHistory, useQueue, useVirtualHistory
└── lib/ # history, messages, osc52, rpc, text
```
### CLI entry points — `hermes_cli/main.py`
- `hermes --tui``node dist/entry.js` (auto-builds when `.ts`/`.tsx` newer than `dist/entry.js`)
- `hermes --tui --dev``tsx src/entry.tsx` (skip build)
- `HERMES_TUI_DIR=…` → external prebuilt dist (nix, distro packaging)
## Diverged From Original Plan
| Plan | Reality | Why |
|---|---|---|
| `tui_gateway/{controller,session_state,events,protocol}.py` | all collapsed into `server.py` | no second consumer ever emerged, keeping one file cheaper than four |
| `ui-tui/src/main.tsx` | split into `entry.tsx` (bootstrap) + `app.tsx` (shell) | boot banner + early python spawn wanted a pre-React moment |
| `ui-tui/src/state/store.ts` | three nanostores (`uiStore`, `turnStore`, `overlayStore`) | separate lifetimes: ui persists, turn resets per reply, overlay is modal |
| `approval.requested` / `sudo.requested` / `clarify.requested` | `*.request` (no `-ed`) | cosmetic |
| `session.cancel` | dropped | `session.interrupt` covers it |
| `HERMES_EXPERIMENTAL_TUI=1`, `display.experimental_tui: true`, `/tui on/off/status` | none shipped | `--tui` went from opt-in to first-class without an experimental phase |
## Post-migration Additions (not in original plan)
- **Async `session.create`** — returns sid in ~1ms, agent builds on a background thread, `session.info` broadcasts when ready; `_wait_agent()` gates every agent-touching handler via `_sess`
- **`bootBanner`** — raw-ANSI logo painted to stdout at T≈2ms, before Ink loads; `<AlternateScreen>` wipes it seamlessly when React mounts
- **Selection uniform bg**`theme.color.selectionBg` wired via `useSelection().setSelectionBgColor`; replaces SGR-inverse per-cell swap that fragmented over amber/gold fg
- **Slash command registry** — TS-owned commands in `app/slash/commands/{core,ops,session}.ts`, everything else falls through to `slash.exec` (python worker)
- **Turn store + controller split** — imperative singleton (`turnController`) holds refs/timers, nanostore (`turnStore`) holds render-visible state
## What's Still Open
- **Classic CLI not deleted.** `cli.py` still has ~80 `prompt_toolkit` references; classic REPL is still the default when `--tui` is absent. The original plan's "Cut 4 · prompt_toolkit removal later" hasn't happened.
- **No config-file opt-in.** `HERMES_EXPERIMENTAL_TUI` and `display.experimental_tui` were never built; only the CLI flag exists. Fine for now — if we want "default to TUI", a single line in `main.py` flips it.
-106
View File
@@ -1,106 +0,0 @@
# ============================================================================
# Hermes Agent — Example Skin Template
# ============================================================================
#
# Copy this file to ~/.hermes/skins/<name>.yaml to create a custom skin.
# All fields are optional — missing values inherit from the default skin.
# Activate with: /skin <name> or display.skin: <name> in config.yaml
#
# Keys are marked:
# (both) — applies to both the classic CLI and the TUI
# (classic) — classic CLI only (see hermes --tui in user-guide/tui.md)
# (tui) — TUI only
#
# See hermes_cli/skin_engine.py for the full schema reference.
# ============================================================================
# Required: unique skin name (used in /skin command and config)
name: example
description: An example custom skin — copy and modify this template
# ── Colors ──────────────────────────────────────────────────────────────────
# Hex color values. These control the visual palette.
colors:
# Banner panel (the startup welcome box) — (both)
banner_border: "#CD7F32" # Panel border
banner_title: "#FFD700" # Panel title text
banner_accent: "#FFBF00" # Section headers (Available Tools, Skills, etc.)
banner_dim: "#B8860B" # Dim/muted text (separators, model info)
banner_text: "#FFF8DC" # Body text (tool names, skill names)
# UI elements — (both)
ui_accent: "#FFBF00" # General accent (falls back to banner_accent)
ui_label: "#4dd0e1" # Labels
ui_ok: "#4caf50" # Success indicators
ui_error: "#ef5350" # Error indicators
ui_warn: "#ffa726" # Warning indicators
# Input area
prompt: "#FFF8DC" # Prompt text / `` glyph color (both)
input_rule: "#CD7F32" # Horizontal rule above input (classic)
# Response box — (classic)
response_border: "#FFD700" # Response box border
# Session display — (both)
session_label: "#DAA520" # "Session: " label
session_border: "#8B8682" # Session ID text
# TUI / CLI surfaces — (classic: status bar, voice badge, completion meta)
status_bar_bg: "#1a1a2e" # Status / usage bar background (classic)
voice_status_bg: "#1a1a2e" # Voice-mode badge background (classic)
completion_menu_bg: "#1a1a2e" # Completion list background (both)
completion_menu_current_bg: "#333355" # Active completion row background (both)
completion_menu_meta_bg: "#1a1a2e" # Completion meta column bg (classic)
completion_menu_meta_current_bg: "#333355" # Active meta bg (classic)
# Drag-to-select background — (tui)
selection_bg: "#3a3a55" # Uniform selection highlight in the TUI
# ── Spinner ─────────────────────────────────────────────────────────────────
# (classic) — the TUI uses its own animated indicators; spinner config here
# is only read by the classic prompt_toolkit CLI.
spinner:
# Faces shown while waiting for the API response
waiting_faces:
- "(。◕‿◕。)"
- "(◕‿◕✿)"
- "٩(◕‿◕。)۶"
# Faces shown during extended thinking/reasoning
thinking_faces:
- "(。•́︿•̀。)"
- "(◔_◔)"
- "(¬‿¬)"
# Verbs used in spinner messages (e.g., "pondering your request...")
thinking_verbs:
- "pondering"
- "contemplating"
- "musing"
- "ruminating"
# Optional: left/right decorations around the spinner
# Each entry is a [left, right] pair. Omit entirely for no wings.
# wings:
# - ["⟪⚔", "⚔⟫"]
# - ["⟪▲", "▲⟫"]
# ── Branding ────────────────────────────────────────────────────────────────
# Text strings used throughout the interface.
branding:
agent_name: "Hermes Agent" # (both) Banner title, about display
welcome: "Welcome! Type your message or /help for commands." # (both)
goodbye: "Goodbye! ⚕" # (both) Exit message
response_label: " ⚕ Hermes " # (classic) Response box header label
prompt_symbol: " " # (both) Input prompt glyph
help_header: "(^_^)? Available Commands" # (both) /help overlay title
# ── Tool Output ─────────────────────────────────────────────────────────────
# Character used as the prefix for tool output lines. (both)
# Default is "┊" (thin dotted vertical line). Some alternatives:
# "╎" (light triple dash vertical)
# "▏" (left one-eighth block)
# "│" (box drawing light vertical)
# "┃" (box drawing heavy vertical)
tool_prefix: "┊"
-329
View File
@@ -1,329 +0,0 @@
# Container-Aware CLI Review Fixes Spec
**PR:** NousResearch/hermes-agent#7543
**Review:** cursor[bot] bugbot review (4094049442) + two prior rounds
**Date:** 2026-04-12
**Branch:** `feat/container-aware-cli-clean`
## Review Issues Summary
Six issues were raised across three bugbot review rounds. Three were fixed in intermediate commits (38277a6a, 726cf90f). This spec addresses remaining design concerns surfaced by those reviews and simplifies the implementation based on interview decisions.
| # | Issue | Severity | Status |
|---|-------|----------|--------|
| 1 | `os.execvp` retry loop unreachable | Medium | Fixed in 79e8cd12 (switched to subprocess.run) |
| 2 | Redundant `shutil.which("sudo")` | Medium | Fixed in 38277a6a (reuses `sudo` var) |
| 3 | Missing `chown -h` on symlink update | Low | Fixed in 38277a6a |
| 4 | Container routing after `parse_args()` | High | Fixed in 726cf90f |
| 5 | Hardcoded `/home/${user}` | Medium | Fixed in 726cf90f |
| 6 | Group membership not gated on `container.enable` | Low | Fixed in 726cf90f |
The mechanical fixes are in place but the overall design needs revision. The retry loop, error swallowing, and process model have deeper issues than what the bugbot flagged.
---
## Spec: Revised `_exec_in_container`
### Design Principles
1. **Let it crash.** No silent fallbacks. If `.container-mode` exists but something goes wrong, the error propagates naturally (Python traceback). The only case where container routing is skipped is when `.container-mode` doesn't exist or `HERMES_DEV=1`.
2. **No retries.** Probe once for sudo, exec once. If it fails, docker/podman's stderr reaches the user verbatim.
3. **Completely transparent.** No error wrapping, no prefixes, no spinners. Docker's output goes straight through.
4. **`os.execvp` on the happy path.** Replace the Python process entirely so there's no idle parent during interactive sessions. Note: `execvp` never returns on success (process is replaced) and raises `OSError` on failure (it does not return a value). The container process's exit code becomes the process exit code by definition — no explicit propagation needed.
5. **One human-readable exception to "let it crash".** `subprocess.TimeoutExpired` from the sudo probe gets a specific catch with a readable message, since a raw traceback for "your Docker daemon is slow" is confusing. All other exceptions propagate naturally.
### Execution Flow
```
1. get_container_exec_info()
- HERMES_DEV=1 → return None (skip routing)
- Inside container → return None (skip routing)
- .container-mode doesn't exist → return None (skip routing)
- .container-mode exists → parse and return dict
- .container-mode exists but malformed/unreadable → LET IT CRASH (no try/except)
2. _exec_in_container(container_info, sys.argv[1:])
a. shutil.which(backend) → if None, print "{backend} not found on PATH" and sys.exit(1)
b. Sudo probe: subprocess.run([runtime, "inspect", "--format", "ok", container_name], timeout=15)
- If succeeds → needs_sudo = False
- If fails → try subprocess.run([sudo, "-n", runtime, "inspect", ...], timeout=15)
- If succeeds → needs_sudo = True
- If fails → print error with sudoers hint (including why -n is required) and sys.exit(1)
- If TimeoutExpired → catch specifically, print human-readable message about slow daemon
c. Build exec_cmd: [sudo? + runtime, "exec", tty_flags, "-u", exec_user, env_flags, container, hermes_bin, *cli_args]
d. os.execvp(exec_cmd[0], exec_cmd)
- On success: process is replaced — Python is gone, container exit code IS the process exit code
- On OSError: let it crash (natural traceback)
```
### Changes to `hermes_cli/main.py`
#### `_exec_in_container` — rewrite
Remove:
- The entire retry loop (`max_retries`, `for attempt in range(...)`)
- Spinner logic (`"Waiting for container..."`, dots)
- Exit code classification (125/126/127 handling)
- `subprocess.run` for the exec call (keep it only for the sudo probe)
- Special TTY vs non-TTY retry counts
- The `time` import (no longer needed)
Change:
- Use `os.execvp(exec_cmd[0], exec_cmd)` as the final call
- Keep the `subprocess` import only for the sudo probe
- Keep TTY detection for the `-it` vs `-i` flag
- Keep env var forwarding (TERM, COLORTERM, LANG, LC_ALL)
- Keep the sudo probe as-is (it's the one "smart" part)
- Bump probe `timeout` from 5s to 15s — cold podman on a loaded machine needs headroom
- Catch `subprocess.TimeoutExpired` specifically on both probe calls — print a readable message about the daemon being unresponsive instead of a raw traceback
- Expand the sudoers hint error message to explain *why* `-n` (non-interactive) is required: a password prompt would hang the CLI or break piped commands
The function becomes roughly:
```python
def _exec_in_container(container_info: dict, cli_args: list):
"""Replace the current process with a command inside the managed container.
Probes whether sudo is needed (rootful containers), then os.execvp
into the container. If exec fails, the OS error propagates naturally.
"""
import shutil
import subprocess
backend = container_info["backend"]
container_name = container_info["container_name"]
exec_user = container_info["exec_user"]
hermes_bin = container_info["hermes_bin"]
runtime = shutil.which(backend)
if not runtime:
print(f"Error: {backend} not found on PATH. Cannot route to container.",
file=sys.stderr)
sys.exit(1)
# Probe whether we need sudo to see the rootful container.
# Timeout is 15s — cold podman on a loaded machine can take a while.
# TimeoutExpired is caught specifically for a human-readable message;
# all other exceptions propagate naturally.
needs_sudo = False
sudo = None
try:
probe = subprocess.run(
[runtime, "inspect", "--format", "ok", container_name],
capture_output=True, text=True, timeout=15,
)
except subprocess.TimeoutExpired:
print(
f"Error: timed out waiting for {backend} to respond.\n"
f"The {backend} daemon may be unresponsive or starting up.",
file=sys.stderr,
)
sys.exit(1)
if probe.returncode != 0:
sudo = shutil.which("sudo")
if sudo:
try:
probe2 = subprocess.run(
[sudo, "-n", runtime, "inspect", "--format", "ok", container_name],
capture_output=True, text=True, timeout=15,
)
except subprocess.TimeoutExpired:
print(
f"Error: timed out waiting for sudo {backend} to respond.",
file=sys.stderr,
)
sys.exit(1)
if probe2.returncode == 0:
needs_sudo = True
else:
print(
f"Error: container '{container_name}' not found via {backend}.\n"
f"\n"
f"The NixOS service runs the container as root. Your user cannot\n"
f"see it because {backend} uses per-user namespaces.\n"
f"\n"
f"Fix: grant passwordless sudo for {backend}. The -n (non-interactive)\n"
f"flag is required because the CLI calls sudo non-interactively —\n"
f"a password prompt would hang or break piped commands:\n"
f"\n"
f' security.sudo.extraRules = [{{\n'
f' users = [ "{os.getenv("USER", "your-user")}" ];\n'
f' commands = [{{ command = "{runtime}"; options = [ "NOPASSWD" ]; }}];\n'
f' }}];\n'
f"\n"
f"Or run: sudo hermes {' '.join(cli_args)}",
file=sys.stderr,
)
sys.exit(1)
else:
print(
f"Error: container '{container_name}' not found via {backend}.\n"
f"The container may be running under root. Try: sudo hermes {' '.join(cli_args)}",
file=sys.stderr,
)
sys.exit(1)
is_tty = sys.stdin.isatty()
tty_flags = ["-it"] if is_tty else ["-i"]
env_flags = []
for var in ("TERM", "COLORTERM", "LANG", "LC_ALL"):
val = os.environ.get(var)
if val:
env_flags.extend(["-e", f"{var}={val}"])
cmd_prefix = [sudo, "-n", runtime] if needs_sudo else [runtime]
exec_cmd = (
cmd_prefix + ["exec"]
+ tty_flags
+ ["-u", exec_user]
+ env_flags
+ [container_name, hermes_bin]
+ cli_args
)
# execvp replaces this process entirely — it never returns on success.
# On failure it raises OSError, which propagates naturally.
os.execvp(exec_cmd[0], exec_cmd)
```
#### Container routing call site in `main()` — remove try/except
Current:
```python
try:
from hermes_cli.config import get_container_exec_info
container_info = get_container_exec_info()
if container_info:
_exec_in_container(container_info, sys.argv[1:])
sys.exit(1) # exec failed if we reach here
except SystemExit:
raise
except Exception:
pass # Container routing unavailable, proceed locally
```
Revised:
```python
from hermes_cli.config import get_container_exec_info
container_info = get_container_exec_info()
if container_info:
_exec_in_container(container_info, sys.argv[1:])
# Unreachable: os.execvp never returns on success (process is replaced)
# and raises OSError on failure (which propagates as a traceback).
# This line exists only as a defensive assertion.
sys.exit(1)
```
No try/except. If `.container-mode` doesn't exist, `get_container_exec_info()` returns `None` and we skip routing. If it exists but is broken, the exception propagates with a natural traceback.
Note: `sys.exit(1)` after `_exec_in_container` is dead code in all paths — `os.execvp` either replaces the process or raises. It's kept as a belt-and-suspenders assertion with a comment marking it unreachable, not as actual error handling.
### Changes to `hermes_cli/config.py`
#### `get_container_exec_info` — remove inner try/except
Current code catches `(OSError, IOError)` and returns `None`. This silently hides permission errors, corrupt files, etc.
Change: Remove the try/except around file reading. Keep the early returns for `HERMES_DEV=1` and `_is_inside_container()`. The `FileNotFoundError` from `open()` when `.container-mode` doesn't exist should still return `None` (this is the "container mode not enabled" case). All other exceptions propagate.
```python
def get_container_exec_info() -> Optional[dict]:
if os.environ.get("HERMES_DEV") == "1":
return None
if _is_inside_container():
return None
container_mode_file = get_hermes_home() / ".container-mode"
try:
with open(container_mode_file, "r") as f:
# ... parse key=value lines ...
except FileNotFoundError:
return None
# All other exceptions (PermissionError, malformed data, etc.) propagate
return { ... }
```
---
## Spec: NixOS Module Changes
### Symlink creation — simplify to two branches
Current: 4 branches (symlink exists, directory exists, other file, doesn't exist).
Revised: 2 branches.
```bash
if [ -d "${symlinkPath}" ] && [ ! -L "${symlinkPath}" ]; then
# Real directory — back it up, then create symlink
_backup="${symlinkPath}.bak.$(date +%s)"
echo "hermes-agent: backing up existing ${symlinkPath} to $_backup"
mv "${symlinkPath}" "$_backup"
fi
# For everything else (symlink, doesn't exist, etc.) — just force-create
ln -sfn "${target}" "${symlinkPath}"
chown -h ${user}:${cfg.group} "${symlinkPath}"
```
`ln -sfn` handles: existing symlink (replaces), doesn't exist (creates), and after the `mv` above (creates). The only case that needs special handling is a real directory, because `ln -sfn` cannot atomically replace a directory.
Note: there is a theoretical race between the `[ -d ... ]` check and the `mv` (something could create/remove the directory in between). In practice this is a NixOS activation script running as root during `nixos-rebuild switch` — no other process should be touching `~/.hermes` at that moment. Not worth adding locking for.
### Sudoers — document, don't auto-configure
Do NOT add `security.sudo.extraRules` to the module. Document the sudoers requirement in the module's description/comments and in the error message the CLI prints when sudo probe fails.
### Group membership gating — keep as-is
The fix in 726cf90f (`cfg.container.enable && cfg.container.hostUsers != []`) is correct. Leftover group membership when container mode is disabled is harmless. No cleanup needed.
---
## Spec: Test Rewrite
The existing test file (`tests/hermes_cli/test_container_aware_cli.py`) has 16 tests. With the simplified exec model, several are obsolete.
### Tests to keep (update as needed)
- `test_is_inside_container_dockerenv` — unchanged
- `test_is_inside_container_containerenv` — unchanged
- `test_is_inside_container_cgroup_docker` — unchanged
- `test_is_inside_container_false_on_host` — unchanged
- `test_get_container_exec_info_returns_metadata` — unchanged
- `test_get_container_exec_info_none_inside_container` — unchanged
- `test_get_container_exec_info_none_without_file` — unchanged
- `test_get_container_exec_info_skipped_when_hermes_dev` — unchanged
- `test_get_container_exec_info_not_skipped_when_hermes_dev_zero` — unchanged
- `test_get_container_exec_info_defaults` — unchanged
- `test_get_container_exec_info_docker_backend` — unchanged
### Tests to add
- `test_get_container_exec_info_crashes_on_permission_error` — verify that `PermissionError` propagates (no silent `None` return)
- `test_exec_in_container_calls_execvp` — verify `os.execvp` is called with correct args (runtime, tty flags, user, env, container, binary, cli args)
- `test_exec_in_container_sudo_probe_sets_prefix` — verify that when first probe fails and sudo probe succeeds, `os.execvp` is called with `sudo -n` prefix
- `test_exec_in_container_no_runtime_hard_fails` — keep existing, verify `sys.exit(1)` when `shutil.which` returns None
- `test_exec_in_container_non_tty_uses_i_only` — update to check `os.execvp` args instead of `subprocess.run` args
- `test_exec_in_container_probe_timeout_prints_message` — verify that `subprocess.TimeoutExpired` from the probe produces a human-readable error and `sys.exit(1)`, not a raw traceback
- `test_exec_in_container_container_not_running_no_sudo` — verify the path where runtime exists (`shutil.which` returns a path) but probe returns non-zero and no sudo is available. Should print the "container may be running under root" error. This is distinct from `no_runtime_hard_fails` which covers `shutil.which` returning None.
### Tests to delete
- `test_exec_in_container_tty_retries_on_container_failure` — retry loop removed
- `test_exec_in_container_non_tty_retries_silently_exits_126` — retry loop removed
- `test_exec_in_container_propagates_hermes_exit_code` — no subprocess.run to check exit codes; execvp replaces the process. Note: exit code propagation still works correctly — when `os.execvp` succeeds, the container's process *becomes* this process, so its exit code is the process exit code by OS semantics. No application code needed, no test needed. A comment in the function docstring documents this intent for future readers.
---
## Out of Scope
- Auto-configuring sudoers rules in the NixOS module
- Any changes to `get_container_exec_info` parsing logic beyond the try/except narrowing
- Changes to `.container-mode` file format
- Changes to the `HERMES_DEV=1` bypass
- Changes to container detection logic (`_is_inside_container`)
+40 -3
View File
@@ -1926,9 +1926,18 @@ class BasePlatformAdapter(ABC):
if session_key in self._pending_messages:
pending_event = self._pending_messages.pop(session_key)
logger.debug("[%s] Processing queued message from interrupt", self.name)
# Clean up current session before processing pending
if session_key in self._active_sessions:
del self._active_sessions[session_key]
# Keep the _active_sessions entry live across the turn chain
# and only CLEAR the interrupt Event — do NOT delete the entry.
# If we deleted here, a concurrent inbound message arriving
# during the awaits below would pass the Level-1 guard, spawn
# its own _process_message_background, and run simultaneously
# with the recursive drain below. Two agents on one
# session_key = duplicate responses, duplicate tool calls.
# Clearing the Event keeps the guard live so follow-ups take
# the busy-handler path (queue + interrupt) as intended.
_active = self._active_sessions.get(session_key)
if _active is not None:
_active.clear()
typing_task.cancel()
try:
await typing_task
@@ -1986,6 +1995,34 @@ class BasePlatformAdapter(ABC):
await self.stop_typing(event.source.chat_id)
except Exception:
pass
# Late-arrival drain: a message may have arrived during the
# cleanup awaits above (typing_task cancel, stop_typing). Such
# messages passed the Level-1 guard (entry still live, Event
# possibly set) and landed in _pending_messages via the
# busy-handler path. Without this block, we would delete the
# active-session entry and the queued message would be silently
# dropped (user never gets a reply).
late_pending = self._pending_messages.pop(session_key, None)
if late_pending is not None:
logger.debug(
"[%s] Late-arrival pending message during cleanup — spawning drain task",
self.name,
)
_active = self._active_sessions.get(session_key)
if _active is not None:
_active.clear()
drain_task = asyncio.create_task(
self._process_message_background(late_pending, session_key)
)
try:
self._background_tasks.add(drain_task)
drain_task.add_done_callback(self._background_tasks.discard)
except TypeError:
# Tests stub create_task() with non-hashable sentinels; tolerate.
pass
# Leave _active_sessions[session_key] populated — the drain
# task's own lifecycle will clean it up.
return
# Clean up session tracking
if session_key in self._active_sessions:
del self._active_sessions[session_key]
+32 -1
View File
@@ -1933,6 +1933,24 @@ class DiscordAdapter(BasePlatformAdapter):
the "thinking..." indicator is replaced with that text; otherwise it
is deleted so the channel isn't cluttered.
"""
# Log the invoker so ghost-command reports can be triaged. Discord
# native slash invocations are always user-initiated (no bot can fire
# them), but mobile autocomplete / keyboard shortcuts / other users
# in the same channel are easy to miss in post-mortems.
try:
_user = interaction.user
_chan_id = getattr(interaction.channel, "id", None) or getattr(interaction, "channel_id", None)
logger.info(
"[Discord] slash '%s' invoked by user=%s id=%s channel=%s guild=%s",
command_text,
getattr(_user, "name", "?"),
getattr(_user, "id", "?"),
_chan_id,
getattr(interaction, "guild_id", None),
)
except Exception:
pass # logging must never block command dispatch
await interaction.response.defer(ephemeral=True)
event = self._build_slash_event(interaction, command_text)
await self.handle_message(event)
@@ -3247,7 +3265,20 @@ class DiscordAdapter(BasePlatformAdapter):
"[Discord] Flushing text batch %s (%d chars)",
key, len(event.text or ""),
)
await self.handle_message(event)
# Shield the downstream dispatch so that a subsequent chunk
# arriving while handle_message is mid-flight cannot cancel
# the running agent turn. _enqueue_text_event always cancels
# the prior flush task when a new chunk lands; without this
# shield, CancelledError would propagate from our task down
# into handle_message → the agent's streaming request,
# aborting the response the user was waiting on. The new
# chunk is handled by the fresh flush task regardless.
await asyncio.shield(self.handle_message(event))
except asyncio.CancelledError:
# Only reached if cancel landed before the pop — the shielded
# handle_message is unaffected either way. Let the task exit
# cleanly so the finally block cleans up.
pass
finally:
if self._pending_text_batch_tasks.get(key) is current_task:
self._pending_text_batch_tasks.pop(key, None)
+2 -2
View File
@@ -2926,8 +2926,8 @@ class TelegramAdapter(BasePlatformAdapter):
chat_id=str(chat.id),
chat_name=chat.title or (chat.full_name if hasattr(chat, "full_name") else None),
chat_type=chat_type,
user_id=str(user.id) if user else None,
user_name=user.full_name if user else None,
user_id=str(user.id) if user else (str(chat.id) if chat_type == "dm" else None),
user_name=user.full_name if user else (chat.full_name if hasattr(chat, "full_name") and chat_type == "dm" else None),
thread_id=thread_id_str,
chat_topic=chat_topic,
)
+133 -8
View File
@@ -752,6 +752,26 @@ class GatewayRunner:
chat_id for chat_id, mode in self._voice_mode.items() if mode == "off"
)
async def _safe_adapter_disconnect(self, adapter, platform) -> None:
"""Call adapter.disconnect() defensively, swallowing any error.
Used when adapter.connect() failed or raised the adapter may
have allocated partial resources (aiohttp.ClientSession, poll
tasks, child subprocesses) that would otherwise leak and surface
as "Unclosed client session" warnings at process exit.
Must tolerate partial-init state and never raise, since callers
use it inside error-handling blocks.
"""
try:
await adapter.disconnect()
except Exception as e:
logger.debug(
"Defensive %s disconnect after failed connect raised: %s",
platform.value if platform is not None else "adapter",
e,
)
# -----------------------------------------------------------------
def _flush_memories_for_session(
@@ -1539,7 +1559,7 @@ class GatewayRunner:
action = "restarting" if self._restart_requested else "shutting down"
hint = (
"Your current task will be interrupted. "
"Send any message after restart to resume where it left off."
"Send any message after restart and I'll try to resume where you left off."
if self._restart_requested
else "Your current task will be interrupted."
)
@@ -1913,6 +1933,15 @@ class GatewayRunner:
logger.info("%s connected", platform.value)
else:
logger.warning("%s failed to connect", platform.value)
# Defensive cleanup: a failed connect() may have
# allocated resources (aiohttp.ClientSession, poll
# tasks, bridge subprocesses) before giving up.
# Without this call, those resources are orphaned
# and Python logs "Unclosed client session" at
# process exit. Adapter disconnect() implementations
# are expected to be idempotent and tolerate
# partial-init state.
await self._safe_adapter_disconnect(adapter, platform)
if adapter.has_fatal_error:
self._update_platform_runtime_status(
platform.value,
@@ -1953,6 +1982,10 @@ class GatewayRunner:
}
except Exception as e:
logger.error("%s error: %s", platform.value, e)
# Same defensive cleanup path for exceptions — an adapter
# that raised mid-connect may still have a live
# aiohttp.ClientSession or child subprocess.
await self._safe_adapter_disconnect(adapter, platform)
self._update_platform_runtime_status(
platform.value,
platform_state="retrying",
@@ -2373,6 +2406,40 @@ class GatewayRunner:
timeout,
self._running_agent_count(),
)
# Mark forcibly-interrupted sessions as resume_pending BEFORE
# interrupting the agents. This preserves each session's
# session_id + transcript so the next message on the same
# session_key auto-resumes from the existing conversation
# instead of getting routed through suspend_recently_active()
# and converted into a fresh session. Terminal escalation
# for genuinely stuck sessions still flows through the
# existing ``.restart_failure_counts`` stuck-loop counter
# (incremented below, threshold 3), which sets
# ``suspended=True`` and overrides resume_pending.
#
# Iterate self._running_agents (current) rather than the
# drain-start ``active_agents`` snapshot — the snapshot
# may include sessions that finished gracefully during
# the drain window, and marking those falsely would give
# them a stray restart-interruption system note on their
# next turn even though their previous turn completed
# cleanly. Skip pending sentinels for the same reason
# _interrupt_running_agents() does: their agent hasn't
# started yet, there's nothing to interrupt, and the
# session shouldn't carry a misleading resume flag.
_resume_reason = (
"restart_timeout" if self._restart_requested else "shutdown_timeout"
)
for _sk, _agent in list(self._running_agents.items()):
if _agent is _AGENT_PENDING_SENTINEL:
continue
try:
self.session_store.mark_resume_pending(_sk, _resume_reason)
except Exception as _e:
logger.debug(
"mark_resume_pending failed for %s: %s",
_sk[:20], _e,
)
self._interrupt_running_agents(
"Gateway restarting" if self._restart_requested else "Gateway shutting down"
)
@@ -2953,8 +3020,8 @@ class GatewayRunner:
# Resolve the command once for all early-intercept checks below.
from hermes_cli.commands import (
ACTIVE_SESSION_BYPASS_COMMANDS as _DEDICATED_HANDLERS,
resolve_command as _resolve_cmd_inner,
should_bypass_active_session as _should_bypass_active_inner,
)
_evt_cmd = event.get_command()
_cmd_def_inner = _resolve_cmd_inner(_evt_cmd) if _evt_cmd else None
@@ -3089,11 +3156,9 @@ class GatewayRunner:
if _cmd_def_inner and _cmd_def_inner.name == "background":
return await self._handle_background_command(event)
# Gateway-handled info/control commands must never fall through to
# the interrupt path. If they are queued as pending text, the
# slash-command safety net discards them before the user sees any
# response.
if _cmd_def_inner and _should_bypass_active_inner(_cmd_def_inner.name):
# Gateway-handled info/control commands with dedicated
# running-agent handlers.
if _cmd_def_inner and _cmd_def_inner.name in _DEDICATED_HANDLERS:
if _cmd_def_inner.name == "help":
return await self._handle_help_command(event)
if _cmd_def_inner.name == "commands":
@@ -3103,6 +3168,21 @@ class GatewayRunner:
if _cmd_def_inner.name == "update":
return await self._handle_update_command(event)
# Catch-all: any other recognized slash command reached the
# running-agent guard. Reject gracefully rather than falling
# through to interrupt + discard. Without this, commands
# like /model, /reasoning, /voice, /insights, /title,
# /resume, /retry, /undo, /compress, /usage, /provider,
# /reload-mcp, /sethome, /reset (all registered as Discord
# slash commands) would interrupt the agent AND get
# silently discarded by the slash-command safety net,
# producing a zero-char response. See #5057, #6252, #10370.
if _cmd_def_inner:
return (
f"⏳ Agent is running — `/{_cmd_def_inner.name}` can't run "
f"mid-turn. Wait for the current response or `/stop` first."
)
if event.message_type == MessageType.PHOTO:
logger.debug("PRIORITY photo follow-up for session %s — queueing without interrupt", _quick_key[:20])
adapter = self.adapters.get(source.platform)
@@ -4152,8 +4232,20 @@ class GatewayRunner:
# Successful turn — clear any stuck-loop counter for this session.
# This ensures the counter only accumulates across CONSECUTIVE
# restarts where the session was active (never completed).
#
# Also clear the resume_pending flag (set by drain-timeout
# shutdown) — the turn ran to completion, so recovery
# succeeded and subsequent messages should no longer receive
# the restart-interruption system note.
if session_key:
self._clear_restart_failure_count(session_key)
try:
self.session_store.clear_resume_pending(session_key)
except Exception as _e:
logger.debug(
"clear_resume_pending failed for %s: %s",
session_key[:20], _e,
)
# Surface error details when the agent failed silently (final_response=None)
if not response and agent_result.get("failed"):
@@ -9427,7 +9519,40 @@ class GatewayRunner:
# restart, crash, SIGTERM). Prepend a system note so the model
# finishes processing the pending tool results before addressing
# the user's new message. (#4493)
if agent_history and agent_history[-1].get("role") == "tool":
#
# Session-level resume_pending (set on drain-timeout shutdown)
# escalates the wording — the transcript's last role may be
# anything (tool, assistant with unfinished work, etc.), so we
# give a stronger, reason-aware instruction that subsumes the
# tool-tail case.
_resume_entry = None
if session_key:
try:
_resume_entry = self.session_store._entries.get(session_key)
except Exception:
_resume_entry = None
_is_resume_pending = bool(
_resume_entry is not None and getattr(_resume_entry, "resume_pending", False)
)
if _is_resume_pending:
_reason = getattr(_resume_entry, "resume_reason", None) or "restart_timeout"
_reason_phrase = (
"a gateway restart"
if _reason == "restart_timeout"
else "a gateway shutdown"
if _reason == "shutdown_timeout"
else "a gateway interruption"
)
message = (
f"[System note: Your previous turn in this session was interrupted "
f"by {_reason_phrase}. The conversation history below is intact. "
f"If it contains unfinished tool result(s), process them first and "
f"summarize what was accomplished, then address the user's new "
f"message below.]\n\n"
+ message
)
elif agent_history and agent_history[-1].get("role") == "tool":
message = (
"[System note: Your previous turn was interrupted before you could "
"process the last tool result(s). The conversation history contains "
+104 -3
View File
@@ -377,7 +377,19 @@ class SessionEntry:
# this session (create a new session_id) so the user starts fresh.
# Set by /stop to break stuck-resume loops (#7536).
suspended: bool = False
# When True the session was interrupted by a gateway restart/shutdown
# drain timeout, but recovery is still expected. Unlike ``suspended``,
# ``resume_pending`` preserves the existing session_id on next access —
# the user stays on the same transcript and the agent auto-continues
# from where it left off. Cleared after the next successful turn.
# Escalation to ``suspended`` is handled by the existing
# ``.restart_failure_counts`` stuck-loop counter (#7536), not by a
# parallel counter on this entry.
resume_pending: bool = False
resume_reason: Optional[str] = None # e.g. "restart_timeout"
last_resume_marked_at: Optional[datetime] = None
def to_dict(self) -> Dict[str, Any]:
result = {
"session_key": self.session_key,
@@ -397,6 +409,13 @@ class SessionEntry:
"cost_status": self.cost_status,
"memory_flushed": self.memory_flushed,
"suspended": self.suspended,
"resume_pending": self.resume_pending,
"resume_reason": self.resume_reason,
"last_resume_marked_at": (
self.last_resume_marked_at.isoformat()
if self.last_resume_marked_at
else None
),
}
if self.origin:
result["origin"] = self.origin.to_dict()
@@ -414,7 +433,15 @@ class SessionEntry:
platform = Platform(data["platform"])
except ValueError as e:
logger.debug("Unknown platform value %r: %s", data["platform"], e)
last_resume_marked_at = None
_lrma = data.get("last_resume_marked_at")
if _lrma:
try:
last_resume_marked_at = datetime.fromisoformat(_lrma)
except (TypeError, ValueError):
last_resume_marked_at = None
return cls(
session_key=data["session_key"],
session_id=data["session_id"],
@@ -434,6 +461,9 @@ class SessionEntry:
cost_status=data.get("cost_status", "unknown"),
memory_flushed=data.get("memory_flushed", False),
suspended=data.get("suspended", False),
resume_pending=data.get("resume_pending", False),
resume_reason=data.get("resume_reason"),
last_resume_marked_at=last_resume_marked_at,
)
@@ -710,9 +740,23 @@ class SessionStore:
entry = self._entries[session_key]
# Auto-reset sessions marked as suspended (e.g. after /stop
# broke a stuck loop — #7536).
# broke a stuck loop — #7536). ``suspended`` is the hard
# forced-wipe signal and always wins over ``resume_pending``,
# so repeated interrupted restarts that escalate via the
# existing ``.restart_failure_counts`` stuck-loop counter
# still converge to a clean slate.
if entry.suspended:
reset_reason = "suspended"
elif entry.resume_pending:
# Restart-interrupted session: preserve the session_id
# and return the existing entry so the transcript
# reloads intact. ``resume_pending`` is cleared after
# the NEXT successful turn completes (not here), which
# means a re-interrupted retry keeps trying — the
# stuck-loop counter handles terminal escalation.
entry.updated_at = now
self._save()
return entry
else:
reset_reason = self._should_reset(entry, source)
if not reset_reason:
@@ -802,6 +846,55 @@ class SessionStore:
return True
return False
def mark_resume_pending(
self,
session_key: str,
reason: str = "restart_timeout",
) -> bool:
"""Mark a session as resumable after a restart interruption.
Unlike ``suspend_session()``, this preserves the existing
``session_id`` and the transcript. The next call to
``get_or_create_session()`` for this key returns the same entry
so the user auto-resumes on the same conversation lane.
Returns True if the session existed and was marked.
"""
with self._lock:
self._ensure_loaded_locked()
if session_key in self._entries:
entry = self._entries[session_key]
# Never override an explicit ``suspended`` — that is a hard
# forced-wipe signal (from /stop or stuck-loop escalation).
if entry.suspended:
return False
entry.resume_pending = True
entry.resume_reason = reason
entry.last_resume_marked_at = _now()
self._save()
return True
return False
def clear_resume_pending(self, session_key: str) -> bool:
"""Clear the resume-pending flag after a successful resumed turn.
Called from the gateway after ``run_conversation()`` returns a
final response for a session that had ``resume_pending=True``,
signalling that recovery succeeded.
Returns True if a flag was cleared.
"""
with self._lock:
self._ensure_loaded_locked()
entry = self._entries.get(session_key)
if entry is None or not entry.resume_pending:
return False
entry.resume_pending = False
entry.resume_reason = None
entry.last_resume_marked_at = None
self._save()
return True
def prune_old_entries(self, max_age_days: int) -> int:
"""Drop SessionEntry records older than max_age_days.
@@ -861,6 +954,12 @@ class SessionStore:
(#7536). Only suspends sessions updated within *max_age_seconds*
to avoid resetting long-idle sessions that are harmless to resume.
Returns the number of sessions that were suspended.
Entries flagged ``resume_pending=True`` are skipped those were
marked intentionally by the drain-timeout path as recoverable.
Terminal escalation for genuinely stuck ``resume_pending`` sessions
is handled by the existing ``.restart_failure_counts`` stuck-loop
counter, which runs after this method on startup.
"""
from datetime import timedelta
@@ -869,6 +968,8 @@ class SessionStore:
with self._lock:
self._ensure_loaded_locked()
for entry in self._entries.values():
if entry.resume_pending:
continue
if not entry.suspended and entry.updated_at >= cutoff:
entry.suspended = True
count += 1
+48
View File
@@ -430,6 +430,21 @@ class GatewayStreamConsumer:
# a real string like "msg_1", not "__no_edit__", so that case
# still resets and creates a fresh segment as intended.)
if got_segment_break:
# If the segment-break edit failed to deliver the
# accumulated content (flood control that has not yet
# promoted to fallback mode, or fallback mode itself),
# _accumulated still holds pre-boundary text the user
# never saw. Flush that tail as a continuation message
# before the reset below wipes _accumulated — otherwise
# text generated before the tool boundary is silently
# dropped (issue #8124).
if (
self._accumulated
and not current_update_visible
and self._message_id
and self._message_id != "__no_edit__"
):
await self._flush_segment_tail_on_edit_failure()
self._reset_segment_state(preserve_no_edit=True)
await asyncio.sleep(0.05) # Small yield to not busy-loop
@@ -620,6 +635,39 @@ class GatewayStreamConsumer:
err_lower = err.lower()
return "flood" in err_lower or "retry after" in err_lower or "rate" in err_lower
async def _flush_segment_tail_on_edit_failure(self) -> None:
"""Deliver un-sent tail content before a segment-break reset.
When an edit fails (flood control, transport error) and a tool
boundary arrives before the next retry, ``_accumulated`` holds text
that was generated but never shown to the user. Without this flush,
the segment reset would discard that tail and leave a frozen cursor
in the partial message.
Sends the tail that sits after the last successfully-delivered
prefix as a new message, and best-effort strips the stuck cursor
from the previous partial message.
"""
if not self._fallback_final_send:
await self._try_strip_cursor()
visible = self._fallback_prefix or self._visible_prefix()
tail = self._accumulated
if visible and tail.startswith(visible):
tail = tail[len(visible):].lstrip()
tail = self._clean_for_display(tail)
if not tail.strip():
return
try:
result = await self.adapter.send(
chat_id=self.chat_id,
content=tail,
metadata=self.metadata,
)
if result.success:
self._already_sent = True
except Exception as e:
logger.error("Segment-break tail flush error: %s", e)
async def _try_strip_cursor(self) -> None:
"""Best-effort edit to remove the cursor from the last visible message.
+6 -68
View File
@@ -1434,49 +1434,6 @@ def _read_codex_tokens(*, _lock: bool = True) -> Dict[str, Any]:
}
def _write_codex_cli_tokens(
access_token: str,
refresh_token: str,
*,
last_refresh: Optional[str] = None,
) -> None:
"""Write refreshed tokens back to ~/.codex/auth.json.
OpenAI OAuth refresh tokens are single-use and rotate on every refresh.
When Hermes refreshes a token it consumes the old refresh_token; if we
don't write the new pair back, the Codex CLI (or VS Code extension) will
fail with ``refresh_token_reused`` on its next refresh attempt.
This mirrors the Anthropic write-back to ~/.claude/.credentials.json
via ``_write_claude_code_credentials()``.
"""
codex_home = os.getenv("CODEX_HOME", "").strip()
if not codex_home:
codex_home = str(Path.home() / ".codex")
auth_path = Path(codex_home).expanduser() / "auth.json"
try:
existing: Dict[str, Any] = {}
if auth_path.is_file():
existing = json.loads(auth_path.read_text(encoding="utf-8"))
if not isinstance(existing, dict):
existing = {}
tokens_dict = existing.get("tokens")
if not isinstance(tokens_dict, dict):
tokens_dict = {}
tokens_dict["access_token"] = access_token
tokens_dict["refresh_token"] = refresh_token
existing["tokens"] = tokens_dict
if last_refresh is not None:
existing["last_refresh"] = last_refresh
auth_path.parent.mkdir(parents=True, exist_ok=True)
auth_path.write_text(json.dumps(existing, indent=2), encoding="utf-8")
auth_path.chmod(0o600)
except (OSError, IOError) as exc:
logger.debug("Failed to write refreshed tokens to %s: %s", auth_path, exc)
def _save_codex_tokens(tokens: Dict[str, str], last_refresh: str = None) -> None:
"""Save Codex OAuth tokens to Hermes auth store (~/.hermes/auth.json)."""
if last_refresh is None:
@@ -1544,6 +1501,11 @@ def refresh_codex_oauth_pure(
"then run `hermes auth` to re-authenticate."
)
relogin_required = True
# A 401/403 from the token endpoint always means the refresh token
# is invalid/expired — force relogin even if the body error code
# wasn't one of the known strings above.
if response.status_code in (401, 403) and not relogin_required:
relogin_required = True
raise AuthError(
message,
provider="openai-codex",
@@ -1599,12 +1561,6 @@ def _refresh_codex_auth_tokens(
updated_tokens["refresh_token"] = refreshed["refresh_token"]
_save_codex_tokens(updated_tokens)
# Write back to ~/.codex/auth.json so Codex CLI / VS Code stay in sync.
_write_codex_cli_tokens(
refreshed["access_token"],
refreshed["refresh_token"],
last_refresh=refreshed.get("last_refresh"),
)
return updated_tokens
@@ -1649,25 +1605,7 @@ def resolve_codex_runtime_credentials(
refresh_skew_seconds: int = CODEX_ACCESS_TOKEN_REFRESH_SKEW_SECONDS,
) -> Dict[str, Any]:
"""Resolve runtime credentials from Hermes's own Codex token store."""
try:
data = _read_codex_tokens()
except AuthError as orig_err:
# Only attempt migration when there are NO tokens stored at all
# (code == "codex_auth_missing"), not when tokens exist but are invalid.
if orig_err.code != "codex_auth_missing":
raise
# Migration: user had Codex as active provider with old storage (~/.codex/).
cli_tokens = _import_codex_cli_tokens()
if cli_tokens:
logger.info("Migrating Codex credentials from ~/.codex/ to Hermes auth store")
print("⚠️ Migrating Codex credentials to Hermes's own auth store.")
print(" This avoids conflicts with Codex CLI and VS Code.")
print(" Run `hermes auth` to create a fully independent session.\n")
_save_codex_tokens(cli_tokens)
data = _read_codex_tokens()
else:
raise
data = _read_codex_tokens()
tokens = dict(data["tokens"])
access_token = str(tokens.get("access_token", "") or "").strip()
refresh_timeout_seconds = float(os.getenv("HERMES_CODEX_REFRESH_TIMEOUT_SECONDS", "20"))
+24 -7
View File
@@ -260,10 +260,10 @@ GATEWAY_KNOWN_COMMANDS: frozenset[str] = frozenset(
)
# Commands that must never be queued behind an active gateway session.
# These are explicit control/info commands handled by the gateway itself;
# if they get queued as pending text, the safety net in gateway.run will
# discard them before they ever reach the user.
# Commands with explicit Level-2 running-agent handlers in gateway/run.py.
# Listed here for introspection / tests; semantically a subset of
# "all resolvable commands" — which is the real bypass set (see
# should_bypass_active_session below).
ACTIVE_SESSION_BYPASS_COMMANDS: frozenset[str] = frozenset(
{
"agents",
@@ -285,9 +285,26 @@ ACTIVE_SESSION_BYPASS_COMMANDS: frozenset[str] = frozenset(
def should_bypass_active_session(command_name: str | None) -> bool:
"""Return True when a slash command must bypass active-session queuing."""
cmd = resolve_command(command_name) if command_name else None
return bool(cmd and cmd.name in ACTIVE_SESSION_BYPASS_COMMANDS)
"""Return True for any resolvable slash command.
Rationale: every gateway-registered slash command either has a
specific Level-2 handler in gateway/run.py (/stop, /new, /model,
/approve, etc.) or reaches the running-agent catch-all that returns
a "busy — wait or /stop first" response. In both paths the command
is dispatched, not queued.
Queueing is always wrong for a recognized slash command because the
safety net in gateway.run discards any command text that reaches
the pending queue which meant a mid-run /model (or /reasoning,
/voice, /insights, /title, /resume, /retry, /undo, /compress,
/usage, /provider, /reload-mcp, /sethome, /reset) would silently
interrupt the agent AND get discarded, producing a zero-char
response. See issue #5057 / PRs #6252, #10370, #4665.
ACTIVE_SESSION_BYPASS_COMMANDS remains the subset of commands with
explicit Level-2 handlers; the rest fall through to the catch-all.
"""
return resolve_command(command_name) is not None if command_name else False
def _resolve_config_gates() -> set[str]:
+7 -2
View File
@@ -737,9 +737,14 @@ DEFAULT_CONFIG = {
# manual — always prompt the user (default)
# smart — use auxiliary LLM to auto-approve low-risk commands, prompt for high-risk
# off — skip all approval prompts (equivalent to --yolo)
#
# cron_mode — what to do when a cron job hits a dangerous command:
# deny — block the command and let the agent find another way (default, safe)
# approve — auto-approve all dangerous commands in cron jobs
"approvals": {
"mode": "manual",
"timeout": 60,
"cron_mode": "deny",
},
# Permanently allowed dangerous command patterns (added via "always" approval)
@@ -2856,7 +2861,7 @@ _FALLBACK_COMMENT = """
# minimax (MINIMAX_API_KEY) — MiniMax
# minimax-cn (MINIMAX_CN_API_KEY) — MiniMax (China)
#
# For custom OpenAI-compatible endpoints, add base_url and api_key_env.
# For custom OpenAI-compatible endpoints, add base_url and key_env.
#
# fallback_model:
# provider: openrouter
@@ -2900,7 +2905,7 @@ _COMMENTED_SECTIONS = """
# minimax (MINIMAX_API_KEY) — MiniMax
# minimax-cn (MINIMAX_CN_API_KEY) — MiniMax (China)
#
# For custom OpenAI-compatible endpoints, add base_url and api_key_env.
# For custom OpenAI-compatible endpoints, add base_url and key_env.
#
# fallback_model:
# provider: openrouter
+4 -3
View File
@@ -3973,7 +3973,7 @@ def _model_flow_anthropic(config, current_model=""):
elif choice == "2":
print()
print(" Get an API key at: https://console.anthropic.com/settings/keys")
print(" Get an API key at: https://platform.claude.com/settings/keys")
print()
try:
import getpass
@@ -6229,8 +6229,9 @@ def cmd_dashboard(args):
print(f"Install them with: {sys.executable} -m pip install 'fastapi' 'uvicorn[standard]'")
sys.exit(1)
if not _build_web_ui(PROJECT_ROOT / "web", fatal=True):
sys.exit(1)
if "HERMES_WEB_DIST" not in os.environ:
if not _build_web_ui(PROJECT_ROOT / "web", fatal=True):
sys.exit(1)
from hermes_cli.web_server import start_server
-2
View File
@@ -133,8 +133,6 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"gemini-2.5-pro",
"gemini-2.5-flash",
"gemini-2.5-flash-lite",
# Gemma open models (also served via AI Studio)
"gemma-4-31b-it",
],
"google-gemini-cli": [
"gemini-2.5-pro",
+3 -2
View File
@@ -91,7 +91,6 @@ _DEFAULT_PROVIDER_MODELS = {
"gemini": [
"gemini-3.1-pro-preview", "gemini-3-flash-preview", "gemini-3.1-flash-lite-preview",
"gemini-2.5-pro", "gemini-2.5-flash", "gemini-2.5-flash-lite",
"gemma-4-31b-it",
],
"zai": ["glm-5.1", "glm-5", "glm-4.7", "glm-4.5", "glm-4.5-flash"],
"kimi-coding": ["kimi-k2.5", "kimi-k2-thinking", "kimi-k2-turbo-preview"],
@@ -1461,7 +1460,9 @@ def setup_agent_settings(config: dict):
)
print_info("Maximum tool-calling iterations per conversation.")
print_info("Higher = more complex tasks, but costs more tokens.")
print_info("Default is 90, which works for most tasks. Use 150+ for open exploration.")
print_info(
f"Press Enter to keep {current_max}. Use 90 for most tasks or 150+ for open exploration."
)
max_iter_str = prompt("Max iterations", current_max)
try:
+210 -55
View File
@@ -118,59 +118,166 @@ def remove_wrapper_script():
def uninstall_gateway_service():
"""Stop and uninstall the gateway service if running."""
"""Stop and uninstall the gateway service (systemd, launchd) and kill any
standalone gateway processes.
Delegates to the gateway module which handles:
- Linux: user + system systemd services (with proper DBUS env setup)
- macOS: launchd plists
- All platforms: standalone ``hermes gateway run`` processes
- Termux/Android: skips systemd (no systemd on Android), still kills standalone processes
"""
import platform
if platform.system() != "Linux":
return False
stopped_something = False
prefix = os.getenv("PREFIX", "")
if os.getenv("TERMUX_VERSION") or "com.termux/files/usr" in prefix:
return False
# 1. Kill any standalone gateway processes (all platforms, including Termux)
try:
from hermes_cli.gateway import get_service_name
svc_name = get_service_name()
except Exception:
svc_name = "hermes-gateway"
service_file = Path.home() / ".config" / "systemd" / "user" / f"{svc_name}.service"
if not service_file.exists():
return False
try:
# Stop the service
subprocess.run(
["systemctl", "--user", "stop", svc_name],
capture_output=True,
check=False
)
# Disable the service
subprocess.run(
["systemctl", "--user", "disable", svc_name],
capture_output=True,
check=False
)
# Remove service file
service_file.unlink()
# Reload systemd
subprocess.run(
["systemctl", "--user", "daemon-reload"],
capture_output=True,
check=False
)
return True
from hermes_cli.gateway import kill_gateway_processes, find_gateway_pids
pids = find_gateway_pids()
if pids:
killed = kill_gateway_processes()
if killed:
log_success(f"Killed {killed} running gateway process(es)")
stopped_something = True
except Exception as e:
log_warn(f"Could not fully remove gateway service: {e}")
log_warn(f"Could not check for gateway processes: {e}")
system = platform.system()
# Termux/Android has no systemd and no launchd — nothing left to do.
prefix = os.getenv("PREFIX", "")
is_termux = bool(os.getenv("TERMUX_VERSION") or "com.termux/files/usr" in prefix)
if is_termux:
return stopped_something
# 2. Linux: uninstall systemd services (both user and system scopes)
if system == "Linux":
try:
from hermes_cli.gateway import (
get_systemd_unit_path,
get_service_name,
_systemctl_cmd,
)
svc_name = get_service_name()
for is_system in (False, True):
unit_path = get_systemd_unit_path(system=is_system)
if not unit_path.exists():
continue
scope = "system" if is_system else "user"
try:
if is_system and os.geteuid() != 0:
log_warn(f"System gateway service exists at {unit_path} "
f"but needs sudo to remove")
continue
cmd = _systemctl_cmd(is_system)
subprocess.run(cmd + ["stop", svc_name],
capture_output=True, check=False)
subprocess.run(cmd + ["disable", svc_name],
capture_output=True, check=False)
unit_path.unlink()
subprocess.run(cmd + ["daemon-reload"],
capture_output=True, check=False)
log_success(f"Removed {scope} gateway service ({unit_path})")
stopped_something = True
except Exception as e:
log_warn(f"Could not remove {scope} gateway service: {e}")
except Exception as e:
log_warn(f"Could not check systemd gateway services: {e}")
# 3. macOS: uninstall launchd plist
elif system == "Darwin":
try:
from hermes_cli.gateway import get_launchd_plist_path
plist_path = get_launchd_plist_path()
if plist_path.exists():
subprocess.run(["launchctl", "unload", str(plist_path)],
capture_output=True, check=False)
plist_path.unlink()
log_success(f"Removed macOS gateway service ({plist_path})")
stopped_something = True
except Exception as e:
log_warn(f"Could not remove launchd gateway service: {e}")
return stopped_something
def _is_default_hermes_home(hermes_home: Path) -> bool:
"""Return True when ``hermes_home`` points at the default (non-profile) root."""
try:
from hermes_constants import get_default_hermes_root
return hermes_home.resolve() == get_default_hermes_root().resolve()
except Exception:
return False
def _discover_named_profiles():
"""Return a list of ``ProfileInfo`` for every non-default profile, or ``[]``
if profile support is unavailable or nothing is installed beyond the
default root."""
try:
from hermes_cli.profiles import list_profiles
except Exception:
return []
try:
return [p for p in list_profiles() if not getattr(p, "is_default", False)]
except Exception as e:
log_warn(f"Could not enumerate profiles: {e}")
return []
def _uninstall_profile(profile) -> None:
"""Fully uninstall a single named profile: stop its gateway service,
remove its alias wrapper, and wipe its HERMES_HOME directory.
We shell out to ``hermes -p <name> gateway stop|uninstall`` because
service names, unit paths, and plist paths are all derived from the
current HERMES_HOME and can't be easily switched in-process.
"""
import sys as _sys
name = profile.name
profile_home = profile.path
log_info(f"Uninstalling profile '{name}'...")
# 1. Stop and remove this profile's gateway service.
# Use `python -m hermes_cli.main` so we don't depend on a `hermes`
# wrapper that may be half-removed mid-uninstall.
hermes_invocation = [_sys.executable, "-m", "hermes_cli.main", "--profile", name]
for subcmd in ("stop", "uninstall"):
try:
subprocess.run(
hermes_invocation + ["gateway", subcmd],
capture_output=True,
text=True,
timeout=60,
check=False,
)
except subprocess.TimeoutExpired:
log_warn(f" Gateway {subcmd} timed out for '{name}'")
except Exception as e:
log_warn(f" Could not run gateway {subcmd} for '{name}': {e}")
# 2. Remove the wrapper alias script at ~/.local/bin/<name> (if any).
alias_path = getattr(profile, "alias_path", None)
if alias_path and alias_path.exists():
try:
alias_path.unlink()
log_success(f" Removed alias {alias_path}")
except Exception as e:
log_warn(f" Could not remove alias {alias_path}: {e}")
# 3. Wipe the profile's HERMES_HOME directory.
try:
if profile_home.exists():
shutil.rmtree(profile_home)
log_success(f" Removed {profile_home}")
except Exception as e:
log_warn(f" Could not remove {profile_home}: {e}")
def run_uninstall(args):
"""
Run the uninstall process.
@@ -181,7 +288,13 @@ def run_uninstall(args):
"""
project_root = get_project_root()
hermes_home = get_hermes_home()
# Detect named profiles when uninstalling from the default root —
# offer to clean them up too instead of leaving zombie HERMES_HOMEs
# and systemd units behind.
is_default_profile = _is_default_hermes_home(hermes_home)
named_profiles = _discover_named_profiles() if is_default_profile else []
print()
print(color("┌─────────────────────────────────────────────────────────┐", Colors.MAGENTA, Colors.BOLD))
print(color("│ ⚕ Hermes Agent Uninstaller │", Colors.MAGENTA, Colors.BOLD))
@@ -195,6 +308,13 @@ def run_uninstall(args):
print(f" Secrets: {hermes_home / '.env'}")
print(f" Data: {hermes_home / 'cron/'}, {hermes_home / 'sessions/'}, {hermes_home / 'logs/'}")
print()
if named_profiles:
print(color("Other profiles detected:", Colors.CYAN, Colors.BOLD))
for p in named_profiles:
running = " (gateway running)" if getattr(p, "gateway_running", False) else ""
print(f"{p.name}{running}: {p.path}")
print()
# Ask for confirmation
print(color("Uninstall Options:", Colors.YELLOW, Colors.BOLD))
@@ -221,12 +341,40 @@ def run_uninstall(args):
return
full_uninstall = (choice == "2")
# When doing a full uninstall from the default profile, also offer to
# remove any named profiles — stopping their gateway services, unlinking
# their alias wrappers, and wiping their HERMES_HOME dirs. Otherwise
# those leave zombie services and data behind.
remove_profiles = False
if full_uninstall and named_profiles:
print()
print(color("Other profiles will NOT be removed by default.", Colors.YELLOW))
print(f"Found {len(named_profiles)} named profile(s): " +
", ".join(p.name for p in named_profiles))
print()
try:
resp = input(color(
f"Also stop and remove these {len(named_profiles)} profile(s)? [y/N]: ",
Colors.BOLD
)).strip().lower()
except (KeyboardInterrupt, EOFError):
print()
print("Cancelled.")
return
remove_profiles = resp in ("y", "yes")
# Final confirmation
print()
if full_uninstall:
print(color("⚠️ WARNING: This will permanently delete ALL Hermes data!", Colors.RED, Colors.BOLD))
print(color(" Including: configs, API keys, sessions, scheduled jobs, logs", Colors.RED))
if remove_profiles:
print(color(
f" Plus {len(named_profiles)} profile(s): " +
", ".join(p.name for p in named_profiles),
Colors.RED
))
else:
print("This will remove the Hermes code but keep your configuration and data.")
@@ -247,12 +395,10 @@ def run_uninstall(args):
print(color("Uninstalling...", Colors.CYAN, Colors.BOLD))
print()
# 1. Stop and uninstall gateway service
log_info("Checking for gateway service...")
if uninstall_gateway_service():
log_success("Gateway service stopped and removed")
else:
log_info("No gateway service found")
# 1. Stop and uninstall gateway service + kill standalone processes
log_info("Checking for running gateway...")
if not uninstall_gateway_service():
log_info("No gateway service or processes found")
# 2. Remove PATH entries from shell configs
log_info("Removing PATH entries from shell configs...")
@@ -291,8 +437,17 @@ def run_uninstall(args):
log_warn(f"Could not fully remove {project_root}: {e}")
log_info("You may need to manually remove it")
# 5. Optionally remove ~/.hermes/ data directory
# 5. Optionally remove ~/.hermes/ data directory (and named profiles)
if full_uninstall:
# 5a. Stop and remove each named profile's gateway service and
# alias wrapper. The profile HERMES_HOME dirs live under
# ``<default>/profiles/<name>/`` and will be swept away by the
# rmtree below, but services + alias scripts live OUTSIDE the
# default root and have to be cleaned up explicitly.
if remove_profiles and named_profiles:
for prof in named_profiles:
_uninstall_profile(prof)
log_info("Removing configuration and data...")
try:
if hermes_home.exists():
+1 -1
View File
@@ -59,7 +59,7 @@ except ImportError:
f"Install with: {sys.executable} -m pip install 'fastapi' 'uvicorn[standard]'"
)
WEB_DIST = Path(__file__).parent / "web_dist"
WEB_DIST = Path(os.environ["HERMES_WEB_DIST"]) if "HERMES_WEB_DIST" in os.environ else Path(__file__).parent / "web_dist"
_log = logging.getLogger(__name__)
app = FastAPI(title="Hermes Agent", version=__version__)
+24 -1
View File
@@ -37,7 +37,30 @@ json.dump(sorted(leaf_paths(DEFAULT_CONFIG)), sys.stdout, indent=2)
in {
packages.configKeys = configKeys;
checks = lib.optionalAttrs pkgs.stdenv.hostPlatform.isLinux {
checks = {
# Cross-platform evaluation — catches "not supported for interpreter"
# errors (e.g. sphinx dropping python311) without needing a darwin builder.
# Evaluation is pure and instant; it doesn't build anything.
cross-eval = let
targetSystems = builtins.filter
(s: inputs.self.packages ? ${s})
[ "x86_64-linux" "aarch64-linux" "aarch64-darwin" "x86_64-darwin" ];
tryEvalPkg = sys:
let pkg = inputs.self.packages.${sys}.default;
in builtins.tryEval (builtins.seq pkg.drvPath true);
results = map (sys: { inherit sys; result = tryEvalPkg sys; }) targetSystems;
failures = builtins.filter (r: !r.result.success) results;
failMsg = lib.concatMapStringsSep "\n" (r: " - ${r.sys}") failures;
in pkgs.runCommand "hermes-cross-eval" { } (
if failures != [] then
builtins.throw "Package fails to evaluate on:\n${failMsg}"
else ''
echo "PASS: package evaluates on all ${toString (builtins.length targetSystems)} platforms"
mkdir -p $out
echo "ok" > $out/result
''
);
} // lib.optionalAttrs pkgs.stdenv.hostPlatform.isLinux {
# Verify binaries exist and are executable
package-contents = pkgs.runCommand "hermes-package-contents" { } ''
set -e
+1 -1
View File
@@ -12,7 +12,7 @@
devShells.default = pkgs.mkShell {
inputsFrom = packages;
packages = with pkgs; [
python311 uv nodejs_22 ripgrep git openssh ffmpeg
python312 uv nodejs_22 ripgrep git openssh ffmpeg
];
shellHook = let
+3 -4
View File
@@ -148,15 +148,14 @@
su -s /bin/sh "$TARGET_USER" -c 'curl -LsSf https://astral.sh/uv/install.sh | sh' || true
fi
# Python 3.11 venv — gives the agent a writable Python with pip.
# Uses uv to install Python 3.11 (Ubuntu 24.04 ships 3.12).
# Python 3.12 venv — gives the agent a writable Python with pip.
# --seed includes pip/setuptools so bare `pip install` works.
_UV_BIN="$TARGET_HOME/.local/bin/uv"
if [ ! -d "$TARGET_HOME/.venv" ] && [ -x "$_UV_BIN" ]; then
su -s /bin/sh "$TARGET_USER" -c "
export PATH=\"\$HOME/.local/bin:\$PATH\"
uv python install 3.11
uv venv --python 3.11 --seed \"\$HOME/.venv\"
uv python install 3.12
uv venv --python 3.12 --seed \"\$HOME/.venv\"
" || true
fi
+8 -1
View File
@@ -18,6 +18,10 @@
filter = path: _type: !(pkgs.lib.hasInfix "/index-cache/" path);
};
hermesWeb = pkgs.callPackage ./web.nix {
npm-lockfile-fix = inputs'.npm-lockfile-fix.packages.default;
};
runtimeDeps = with pkgs; [
nodejs_22
ripgrep
@@ -52,6 +56,7 @@
mkdir -p $out/share/hermes-agent $out/bin
cp -r ${bundledSkills} $out/share/hermes-agent/skills
cp -r ${hermesWeb} $out/share/hermes-agent/web_dist
# copy pre-built TUI (same layout as dev: ui-tui/dist/ + node_modules/)
mkdir -p $out/ui-tui
@@ -62,6 +67,7 @@
makeWrapper ${hermesVenv}/bin/${name} $out/bin/${name} \
--suffix PATH : "${runtimePath}" \
--set HERMES_BUNDLED_SKILLS $out/share/hermes-agent/skills \
--set HERMES_WEB_DIST $out/share/hermes-agent/web_dist \
--set HERMES_TUI_DIR $out/ui-tui \
--set HERMES_PYTHON ${hermesVenv}/bin/python3 \
--set HERMES_NODE ${pkgs.nodejs_22}/bin/node
@@ -81,7 +87,7 @@
STAMP_VALUE="${pyprojectHash}:${uvLockHash}"
if [ ! -f "$STAMP" ] || [ "$(cat "$STAMP")" != "$STAMP_VALUE" ]; then
echo "hermes-agent: installing Python dependencies..."
uv venv .venv --python ${pkgs.python311}/bin/python3 2>/dev/null || true
uv venv .venv --python ${pkgs.python312}/bin/python3 2>/dev/null || true
source .venv/bin/activate
uv pip install -e ".[all]"
[ -d mini-swe-agent ] && uv pip install -e ./mini-swe-agent 2>/dev/null || true
@@ -104,6 +110,7 @@
};
tui = hermesTui;
web = hermesWeb;
};
};
}
+11 -9
View File
@@ -1,6 +1,6 @@
# nix/python.nix — uv2nix virtual environment builder
{
python311,
python312,
lib,
callPackage,
uv2nix,
@@ -51,28 +51,30 @@ let
pythonPackageOverrides = final: _prev:
if isAarch64Darwin then {
numpy = mkPrebuiltOverride final python311.pkgs.numpy { };
numpy = mkPrebuiltOverride final python312.pkgs.numpy { };
av = mkPrebuiltOverride final python311.pkgs.av { };
pyarrow = mkPrebuiltOverride final python312.pkgs.pyarrow { };
humanfriendly = mkPrebuiltOverride final python311.pkgs.humanfriendly { };
av = mkPrebuiltOverride final python312.pkgs.av { };
coloredlogs = mkPrebuiltOverride final python311.pkgs.coloredlogs {
humanfriendly = mkPrebuiltOverride final python312.pkgs.humanfriendly { };
coloredlogs = mkPrebuiltOverride final python312.pkgs.coloredlogs {
humanfriendly = [ ];
};
onnxruntime = mkPrebuiltOverride final python311.pkgs.onnxruntime {
onnxruntime = mkPrebuiltOverride final python312.pkgs.onnxruntime {
coloredlogs = [ ];
numpy = [ ];
packaging = [ ];
};
ctranslate2 = mkPrebuiltOverride final python311.pkgs.ctranslate2 {
ctranslate2 = mkPrebuiltOverride final python312.pkgs.ctranslate2 {
numpy = [ ];
pyyaml = [ ];
};
faster-whisper = mkPrebuiltOverride final python311.pkgs.faster-whisper {
faster-whisper = mkPrebuiltOverride final python312.pkgs.faster-whisper {
av = [ ];
ctranslate2 = [ ];
huggingface-hub = [ ];
@@ -84,7 +86,7 @@ let
pythonSet =
(callPackage pyproject-nix.build.packages {
python = python311;
python = python312;
}).overrideScope
(lib.composeManyExtensions [
pyproject-build-systems.overlays.default
+63
View File
@@ -0,0 +1,63 @@
# nix/web.nix — Hermes Web Dashboard (Vite/React) frontend build
{ pkgs, npm-lockfile-fix, ... }:
let
src = ../web;
npmDeps = pkgs.fetchNpmDeps {
inherit src;
hash = "sha256-Y0pOzdFG8BLjfvCLmsvqYpjxFjAQabXp1i7X9W/cCU4=";
};
npmLockHash = builtins.hashString "sha256" (builtins.readFile ../web/package-lock.json);
in
pkgs.buildNpmPackage {
pname = "hermes-web";
version = "0.0.0";
inherit src npmDeps;
doCheck = false;
buildPhase = ''
npx tsc -b
npx vite build --outDir dist
'';
installPhase = ''
runHook preInstall
cp -r dist $out
runHook postInstall
'';
nativeBuildInputs = [
(pkgs.writeShellScriptBin "update_web_lockfile" ''
set -euox pipefail
REPO_ROOT=$(git rev-parse --show-toplevel)
cd "$REPO_ROOT/web"
rm -rf node_modules/
npm cache clean --force
CI=true npm install
${pkgs.lib.getExe npm-lockfile-fix} ./package-lock.json
NIX_FILE="$REPO_ROOT/nix/web.nix"
sed -i "s/hash = \"[^\"]*\";/hash = \"\";/" $NIX_FILE
NIX_OUTPUT=$(nix build .#web 2>&1 || true)
NEW_HASH=$(echo "$NIX_OUTPUT" | grep 'got:' | awk '{print $2}')
echo got new hash $NEW_HASH
sed -i "s|hash = \"[^\"]*\";|hash = \"$NEW_HASH\";|" $NIX_FILE
nix build .#web
echo "Updated npm hash in $NIX_FILE to $NEW_HASH"
'')
];
passthru.devShellHook = ''
STAMP=".nix-stamps/hermes-web"
STAMP_VALUE="${npmLockHash}"
if [ ! -f "$STAMP" ] || [ "$(cat "$STAMP")" != "$STAMP_VALUE" ]; then
echo "hermes-web: installing npm dependencies..."
cd web && CI=true npm install --silent --no-fund --no-audit 2>/dev/null && cd ..
mkdir -p .nix-stamps
echo "$STAMP_VALUE" > "$STAMP"
fi
'';
}
@@ -145,10 +145,10 @@ Controls **how often** dialectic and context calls happen.
| Key | Default | Description |
|-----|---------|-------------|
| `contextCadence` | `1` | Min turns between context API calls |
| `dialecticCadence` | `3` | Min turns between dialectic API calls |
| `dialecticCadence` | `2` | Min turns between dialectic API calls. Recommended 15 |
| `injectionFrequency` | `every-turn` | `every-turn` or `first-turn` for base context injection |
Higher cadence values reduce API calls and cost. `dialecticCadence: 3` (default) means the dialectic engine fires at most every 3rd turn.
Higher cadence values fire the dialectic LLM less often. `dialecticCadence: 2` means the engine fires every other turn. Setting it to `1` fires every turn.
### Depth (how many)
@@ -180,6 +180,8 @@ If `dialecticDepthLevels` is omitted, rounds use **proportional levels** derived
This keeps earlier passes cheap while using full depth on the final synthesis.
**Depth at session start.** The session-start prewarm runs the full configured `dialecticDepth` in the background before turn 1. A single-pass prewarm on a cold peer often returns thin output — multi-pass depth runs the audit/reconcile cycle before the user ever speaks. Turn 1 consumes the prewarm result directly; if prewarm hasn't landed in time, turn 1 falls back to a synchronous call with a bounded timeout.
### Level (how hard)
Controls the **intensity** of each dialectic reasoning round.
@@ -368,7 +370,7 @@ Config file: `$HERMES_HOME/honcho.json` (profile-local) or `~/.honcho/config.jso
| `contextTokens` | uncapped | Max tokens for the combined base context injection (summary + representation + card). Opt-in cap — omit to leave uncapped, set to an integer to bound injection size. |
| `injectionFrequency` | `every-turn` | `every-turn` or `first-turn` |
| `contextCadence` | `1` | Min turns between context API calls |
| `dialecticCadence` | `3` | Min turns between dialectic LLM calls |
| `dialecticCadence` | `2` | Min turns between dialectic LLM calls (recommended 15) |
The `contextTokens` budget is enforced at injection time. If the session summary + representation + card exceed the budget, Honcho trims the summary first, then the representation, preserving the card. This prevents context blowup in long sessions.
@@ -0,0 +1,339 @@
---
name: touchdesigner-mcp
description: "Control a running TouchDesigner instance via twozero MCP — create operators, set parameters, wire connections, execute Python, build real-time visuals. 36 native tools."
version: 1.0.0
author: kshitijk4poor
license: MIT
metadata:
hermes:
tags: [TouchDesigner, MCP, twozero, creative-coding, real-time-visuals, generative-art, audio-reactive, VJ, installation, GLSL]
related_skills: [native-mcp, ascii-video, manim-video, hermes-video]
---
# TouchDesigner Integration (twozero MCP)
## CRITICAL RULES
1. **NEVER guess parameter names.** Call `td_get_par_info` for the op type FIRST. Your training data is wrong for TD 2025.32.
2. **If `tdAttributeError` fires, STOP.** Call `td_get_operator_info` on the failing node before continuing.
3. **NEVER hardcode absolute paths** in script callbacks. Use `me.parent()` / `scriptOp.parent()`.
4. **Prefer native MCP tools over td_execute_python.** Use `td_create_operator`, `td_set_operator_pars`, `td_get_errors` etc. Only fall back to `td_execute_python` for complex multi-step logic.
5. **Call `td_get_hints` before building.** It returns patterns specific to the op type you're working with.
## Architecture
```
Hermes Agent -> MCP (Streamable HTTP) -> twozero.tox (port 40404) -> TD Python
```
36 native tools. Free plugin (no payment/license — confirmed April 2026).
Context-aware (knows selected OP, current network).
Hub health check: `GET http://localhost:40404/mcp` returns JSON with instance PID, project name, TD version.
## Setup (Automated)
Run the setup script to handle everything:
```bash
bash "${HERMES_HOME:-$HOME/.hermes}/skills/creative/touchdesigner-mcp/scripts/setup.sh"
```
The script will:
1. Check if TD is running
2. Download twozero.tox if not already cached
3. Add `twozero_td` MCP server to Hermes config (if missing)
4. Test the MCP connection on port 40404
5. Report what manual steps remain (drag .tox into TD, enable MCP toggle)
### Manual steps (one-time, cannot be automated)
1. **Drag `~/Downloads/twozero.tox` into the TD network editor** → click Install
2. **Enable MCP:** click twozero icon → Settings → mcp → "auto start MCP" → Yes
3. **Restart Hermes session** to pick up the new MCP server
After setup, verify:
```bash
nc -z 127.0.0.1 40404 && echo "twozero MCP: READY"
```
## Environment Notes
- **Non-Commercial TD** caps resolution at 1280×1280. Use `outputresolution = 'custom'` and set width/height explicitly.
- **Codecs:** `prores` (preferred on macOS) or `mjpa` as fallback. H.264/H.265/AV1 require a Commercial license.
- Always call `td_get_par_info` before setting params — names vary by TD version (see CRITICAL RULES #1).
## Workflow
### Step 0: Discover (before building anything)
```
Call td_get_par_info with op_type for each type you plan to use.
Call td_get_hints with the topic you're building (e.g. "glsl", "audio reactive", "feedback").
Call td_get_focus to see where the user is and what's selected.
Call td_get_network to see what already exists.
```
No temp nodes, no cleanup. This replaces the old discovery dance entirely.
### Step 1: Clean + Build
**IMPORTANT: Split cleanup and creation into SEPARATE MCP calls.** Destroying and recreating same-named nodes in one `td_execute_python` script causes "Invalid OP object" errors. See pitfalls #11b.
Use `td_create_operator` for each node (handles viewport positioning automatically):
```
td_create_operator(type="noiseTOP", parent="/project1", name="bg", parameters={"resolutionw": 1280, "resolutionh": 720})
td_create_operator(type="levelTOP", parent="/project1", name="brightness")
td_create_operator(type="nullTOP", parent="/project1", name="out")
```
For bulk creation or wiring, use `td_execute_python`:
```python
# td_execute_python script:
root = op('/project1')
nodes = []
for name, optype in [('bg', noiseTOP), ('fx', levelTOP), ('out', nullTOP)]:
n = root.create(optype, name)
nodes.append(n.path)
# Wire chain
for i in range(len(nodes)-1):
op(nodes[i]).outputConnectors[0].connect(op(nodes[i+1]).inputConnectors[0])
result = {'created': nodes}
```
### Step 2: Set Parameters
Prefer the native tool (validates params, won't crash):
```
td_set_operator_pars(path="/project1/bg", parameters={"roughness": 0.6, "monochrome": true})
```
For expressions or modes, use `td_execute_python`:
```python
op('/project1/time_driver').par.colorr.expr = "absTime.seconds % 1000.0"
```
### Step 3: Wire
Use `td_execute_python` — no native wire tool exists:
```python
op('/project1/bg').outputConnectors[0].connect(op('/project1/fx').inputConnectors[0])
```
### Step 4: Verify
```
td_get_errors(path="/project1", recursive=true)
td_get_perf()
td_get_operator_info(path="/project1/out", detail="full")
```
### Step 5: Display / Capture
```
td_get_screenshot(path="/project1/out")
```
Or open a window via script:
```python
win = op('/project1').create(windowCOMP, 'display')
win.par.winop = op('/project1/out').path
win.par.winw = 1280; win.par.winh = 720
win.par.winopen.pulse()
```
## MCP Tool Quick Reference
**Core (use these most):**
| Tool | What |
|------|------|
| `td_execute_python` | Run arbitrary Python in TD. Full API access. |
| `td_create_operator` | Create node with params + auto-positioning |
| `td_set_operator_pars` | Set params safely (validates, won't crash) |
| `td_get_operator_info` | Inspect one node: connections, params, errors |
| `td_get_operators_info` | Inspect multiple nodes in one call |
| `td_get_network` | See network structure at a path |
| `td_get_errors` | Find errors/warnings recursively |
| `td_get_par_info` | Get param names for an OP type (replaces discovery) |
| `td_get_hints` | Get patterns/tips before building |
| `td_get_focus` | What network is open, what's selected |
**Read/Write:**
| Tool | What |
|------|------|
| `td_read_dat` | Read DAT text content |
| `td_write_dat` | Write/patch DAT content |
| `td_read_chop` | Read CHOP channel values |
| `td_read_textport` | Read TD console output |
**Visual:**
| Tool | What |
|------|------|
| `td_get_screenshot` | Capture one OP viewer to file |
| `td_get_screenshots` | Capture multiple OPs at once |
| `td_get_screen_screenshot` | Capture actual screen via TD |
| `td_navigate_to` | Jump network editor to an OP |
**Search:**
| Tool | What |
|------|------|
| `td_find_op` | Find ops by name/type across project |
| `td_search` | Search code, expressions, string params |
**System:**
| Tool | What |
|------|------|
| `td_get_perf` | Performance profiling (FPS, slow ops) |
| `td_list_instances` | List all running TD instances |
| `td_get_docs` | In-depth docs on a TD topic |
| `td_agents_md` | Read/write per-COMP markdown docs |
| `td_reinit_extension` | Reload extension after code edit |
| `td_clear_textport` | Clear console before debug session |
**Input Automation:**
| Tool | What |
|------|------|
| `td_input_execute` | Send mouse/keyboard to TD |
| `td_input_status` | Poll input queue status |
| `td_input_clear` | Stop input automation |
| `td_op_screen_rect` | Get screen coords of a node |
| `td_click_screen_point` | Click a point in a screenshot |
See `references/mcp-tools.md` for full parameter schemas.
## Key Implementation Rules
**GLSL time:** No `uTDCurrentTime` in GLSL TOP. Use the Values page:
```python
# Call td_get_par_info(op_type="glslTOP") first to confirm param names
td_set_operator_pars(path="/project1/shader", parameters={"value0name": "uTime"})
# Then set expression via script:
# op('/project1/shader').par.value0.expr = "absTime.seconds"
# In GLSL: uniform float uTime;
```
Fallback: Constant TOP in `rgba32float` format (8-bit clamps to 0-1, freezing the shader).
**Feedback TOP:** Use `top` parameter reference, not direct input wire. "Not enough sources" resolves after first cook. "Cook dependency loop" warning is expected.
**Resolution:** Non-Commercial caps at 1280×1280. Use `outputresolution = 'custom'`.
**Large shaders:** Write GLSL to `/tmp/file.glsl`, then use `td_write_dat` or `td_execute_python` to load.
**Vertex/Point access (TD 2025.32):** `point.P[0]`, `point.P[1]`, `point.P[2]` — NOT `.x`, `.y`, `.z`.
**Extensions:** `ext0object` format is `"op('./datName').module.ClassName(me)"` in CONSTANT mode. After editing extension code with `td_write_dat`, call `td_reinit_extension`.
**Script callbacks:** ALWAYS use relative paths via `me.parent()` / `scriptOp.parent()`.
**Cleaning nodes:** Always `list(root.children)` before iterating + `child.valid` check.
## Recording / Exporting Video
```python
# via td_execute_python:
root = op('/project1')
rec = root.create(moviefileoutTOP, 'recorder')
op('/project1/out').outputConnectors[0].connect(rec.inputConnectors[0])
rec.par.type = 'movie'
rec.par.file = '/tmp/output.mov'
rec.par.videocodec = 'prores' # Apple ProRes — NOT license-restricted on macOS
rec.par.record = True # start
# rec.par.record = False # stop (call separately later)
```
H.264/H.265/AV1 need Commercial license. Use `prores` on macOS or `mjpa` as fallback.
Extract frames: `ffmpeg -i /tmp/output.mov -vframes 120 /tmp/frames/frame_%06d.png`
**TOP.save() is useless for animation** — captures same GPU texture every time. Always use MovieFileOut.
### Before Recording: Checklist
1. **Verify FPS > 0** via `td_get_perf`. If FPS=0 the recording will be empty. See pitfalls #38-39.
2. **Verify shader output is not black** via `td_get_screenshot`. Black output = shader error or missing input. See pitfalls #8, #40.
3. **If recording with audio:** cue audio to start first, then delay recording by 3 frames. See pitfalls #19.
4. **Set output path before starting record** — setting both in the same script can race.
## Audio-Reactive GLSL (Proven Recipe)
### Correct signal chain (tested April 2026)
```
AudioFileIn CHOP (playmode=sequential)
→ AudioSpectrum CHOP (FFT=512, outputmenu=setmanually, outlength=256, timeslice=ON)
→ Math CHOP (gain=10)
→ CHOP to TOP (dataformat=r, layout=rowscropped)
→ GLSL TOP input 1 (spectrum texture, 256x2)
Constant TOP (rgba32float, time) → GLSL TOP input 0
GLSL TOP → Null TOP → MovieFileOut
```
### Critical audio-reactive rules (empirically verified)
1. **TimeSlice must stay ON** for AudioSpectrum. OFF = processes entire audio file → 24000+ samples → CHOP to TOP overflow.
2. **Set Output Length manually** to 256 via `outputmenu='setmanually'` and `outlength=256`. Default outputs 22050 samples.
3. **DO NOT use Lag CHOP for spectrum smoothing.** Lag CHOP operates in timeslice mode and expands 256 samples to 2400+, averaging all values to near-zero (~1e-06). The shader receives no usable data. This was the #1 audio sync failure in testing.
4. **DO NOT use Filter CHOP either** — same timeslice expansion problem with spectrum data.
5. **Smoothing belongs in the GLSL shader** if needed, via temporal lerp with a feedback texture: `mix(prevValue, newValue, 0.3)`. This gives frame-perfect sync with zero pipeline latency.
6. **CHOP to TOP dataformat = 'r'**, layout = 'rowscropped'. Spectrum output is 256x2 (stereo). Sample at y=0.25 for first channel.
7. **Math gain = 10** (not 5). Raw spectrum values are ~0.19 in bass range. Gain of 10 gives usable ~5.0 for the shader.
8. **No Resample CHOP needed.** Control output size via AudioSpectrum's `outlength` param directly.
### GLSL spectrum sampling
```glsl
// Input 0 = time (1x1 rgba32float), Input 1 = spectrum (256x2)
float iTime = texture(sTD2DInputs[0], vec2(0.5)).r;
// Sample multiple points per band and average for stability:
// NOTE: y=0.25 for first channel (stereo texture is 256x2, first row center is 0.25)
float bass = (texture(sTD2DInputs[1], vec2(0.02, 0.25)).r +
texture(sTD2DInputs[1], vec2(0.05, 0.25)).r) / 2.0;
float mid = (texture(sTD2DInputs[1], vec2(0.2, 0.25)).r +
texture(sTD2DInputs[1], vec2(0.35, 0.25)).r) / 2.0;
float hi = (texture(sTD2DInputs[1], vec2(0.6, 0.25)).r +
texture(sTD2DInputs[1], vec2(0.8, 0.25)).r) / 2.0;
```
See `references/network-patterns.md` for complete build scripts + shader code.
## Operator Quick Reference
| Family | Color | Python class / MCP type | Suffix |
|--------|-------|-------------|--------|
| TOP | Purple | noiseTOP, glslTOP, compositeTOP, levelTop, blurTOP, textTOP, nullTOP | TOP |
| CHOP | Green | audiofileinCHOP, audiospectrumCHOP, mathCHOP, lfoCHOP, constantCHOP | CHOP |
| SOP | Blue | gridSOP, sphereSOP, transformSOP, noiseSOP | SOP |
| DAT | White | textDAT, tableDAT, scriptDAT, webserverDAT | DAT |
| MAT | Yellow | phongMAT, pbrMAT, glslMAT, constMAT | MAT |
| COMP | Gray | geometryCOMP, containerCOMP, cameraCOMP, lightCOMP, windowCOMP | COMP |
## Security Notes
- MCP runs on localhost only (port 40404). No authentication — any local process can send commands.
- `td_execute_python` has unrestricted access to the TD Python environment and filesystem as the TD process user.
- `setup.sh` downloads twozero.tox from the official 404zero.com URL. Verify the download if concerned.
- The skill never sends data outside localhost. All MCP communication is local.
## References
| File | What |
|------|------|
| `references/pitfalls.md` | Hard-won lessons from real sessions |
| `references/operators.md` | All operator families with params and use cases |
| `references/network-patterns.md` | Recipes: audio-reactive, generative, GLSL, instancing |
| `references/mcp-tools.md` | Full twozero MCP tool parameter schemas |
| `references/python-api.md` | TD Python: op(), scripting, extensions |
| `references/troubleshooting.md` | Connection diagnostics, debugging |
| `scripts/setup.sh` | Automated setup script |
---
> You're not writing code. You're conducting light.
@@ -0,0 +1,382 @@
# twozero MCP Tools Reference
36 tools from twozero MCP v2.774+ (April 2026).
All tools accept an optional `target_instance` param for multi-TD-instance scenarios.
## Execution & Scripting
### td_execute_python
Execute Python code inside TouchDesigner and return the result. Has full access to TD Python API (op, project, app, etc). Print statements and the last expression value are captured. Best for: wiring connections (inputConnectors), setting expressions (par.X.expr/mode), querying parameter names, and batch creation scripts (5+ operators). For creating 1-4 operators, prefer td_create_operator instead.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `code` | string | yes | Python code to execute in TouchDesigner |
## Network & Structure
### td_get_network
Get the operator network structure in TouchDesigner (TD) at a given path. Returns compact list: name OPType flags. First line is full path of queried op. Flags: ch:N=children count, !cook=allowCooking off, bypass, private=isPrivate, blocked:reason, "comment text". depth=0 (default) = current level only. depth=1 = one level of children (indented). To explore deeper, call again on a specific COMP path. System operators (/ui, /sys) are hidden by default.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `path` | string | no | Network path to inspect, e.g. '/' or '/project1' |
| `depth` | integer | no | How many levels deep to recurse. 0=current level only (recommended), 1=include direct children of COMPs |
| `includeSystem` | boolean | no | Include system operators (/ui, /sys). Default false. |
| `nodeXY` | boolean | no | Include nodeX,nodeY coordinates. Default false. |
### td_create_operator
Create a new operator (node) in TouchDesigner (TD). Preferred way to create operators — handles viewport positioning, viewer flag, and docked ops automatically. For batch creation (5+ ops), you may use td_execute_python with a script instead, but then call td_get_hints('construction') first for correct parameter names and layout rules. Supports all TD operator types: TOP, CHOP, SOP, DAT, COMP, MAT. If parent is omitted, creates in the currently open network at the user's viewport position. When building a container: first create baseCOMP (no parent), then create children with parent=compPath.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `type` | string | yes | Operator type, e.g. 'textDAT', 'constantCHOP', 'noiseTOP', 'transformTOP', 'baseCOMP' |
| `parent` | string | no | Path to the parent operator. If omitted, uses the currently open network in TD. |
| `name` | string | no | Name for the new operator (optional, TD auto-names if omitted) |
| `parameters` | object | no | Key-value pairs of parameters to set on the created operator |
### td_find_op
Find operators by name and/or type across the project. Returns TSV: path, OPType, flags. Flags: bypass, !cook, private, blocked:reason. Use td_search to search inside code/expressions; use td_find_op to find operators themselves.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `name` | string | no | Substring to match in operator name (case-insensitive). E.g. 'noise' finds noise1, noise2, myNoise. |
| `type` | string | no | Substring to match in OPType (case-insensitive). E.g. 'noiseTOP', 'baseCOMP', 'CHOP'. Use exact type for precision or partial for broader matches. |
| `root` | string | no | Root operator path to search from. Default '/project1'. |
| `max_results` | number | no | Maximum results to return. Default 50. |
| `max_depth` | number | no | Max recursion depth from root. Default unlimited. |
| `detail` | `basic` / `summary` | no | Result detail level. 'basic' = name/path/type (fast). 'summary' = + connections, non-default pars, expressions. Default 'basic'. |
### td_search
Search for text across all code (DAT scripts), parameter expressions, and string parameter values in the TD project. Returns TSV: path, kind (code/expression/parameter/ref), line, text. JSON when context>0. Words are OR-matched. Use quotes for exact phrases: 'GetLogin "op('login')"'. Use count_only=true to quickly check if something is referenced without fetching full results.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `query` | string | yes | Search query. Multiple words = OR (any match). Wrap in quotes for exact phrase. Example: 'GetLogin getLogin' finds either. |
| `root` | string | no | Root operator path to search from. Default '/project1'. |
| `scope` | `all` / `code` / `editable` / `expressions` / `parameters` | no | What to search. 'code' = DAT scripts only (fast, ~0.05s). 'editable' = only editable code (skips inherited/ref DATs). 'expressions' = parameter expressions only. 'parameters' = string parameter values only. 'all' = everything (slow, ~1.5s due to parameter scan). Default 'all'. |
| `case_sensitive` | boolean | no | Case-sensitive matching. Default false. |
| `max_results` | number | no | Maximum results to return. Default 50. |
| `context` | number | no | Lines to show before/after each code match. Saves td_read_dat calls. Default 0. |
| `count_only` | boolean | no | Return only match count, not results. Fast existence check. |
| `max_depth` | number | no | Max recursion depth from root. Default unlimited. |
### td_navigate_to
Navigate the TouchDesigner Network Editor viewport to show a specific operator. Opens the operator's parent network and centers the view on it. Use this to show the user where a problem is, or to navigate to an operator before modifying it.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `path` | string | yes | Path to the operator to navigate to, e.g. '/project1/noise1' |
## Operator Inspection
### td_get_operator_info
Get information about a specific operator (node) in TouchDesigner (TD). detail='summary': connections, non-default pars, expressions, CHOP channels (compact). detail='full': all of the above PLUS every parameter with value/default/label.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `path` | string | yes | Full path to the operator, e.g. '/project1/noise1' |
| `detail` | `summary` / `full` | no | Level of detail. 'summary' = connections, expressions, non-default pars, custom pars (pulse marked), CHOP channels. 'full' = summary + all parameters. Default 'full'. |
### td_get_operators_info
Get information about multiple operators in one call. Returns an array of operator info objects. Use instead of calling td_get_operator_info multiple times.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `paths` | array | yes | Array of full operator paths, e.g. ['/project1/null1', '/project1/null2'] |
| `detail` | `summary` / `full` | no | Level of detail. Default 'summary'. |
### td_get_par_info
Get parameter names and details for a TouchDesigner operator type. Without specific pars: returns compact list of all parameters with their names, types, and menu options. With pars: returns full details (help text, menu values, style) for specific parameters. Use this when you need to know exact parameter names before setting them.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `op_type` | string | yes | TD operator type name, e.g. 'noiseTOP', 'blurTOP', 'lfoCHOP', 'compositeTOP' |
| `pars` | array | no | Optional list of specific parameter names to get full details for |
## Parameter Setting
### td_set_operator_pars
Set parameters and flags on an operator in TouchDesigner (TD). Safer than td_execute_python for simple parameter changes. Can set values, toggle bypass/viewer, without writing Python code.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `path` | string | yes | Path to the operator |
| `parameters` | object | no | Key-value pairs of parameters to set |
| `bypass` | boolean | no | Set bypass state of the operator (not available on COMPs) |
| `viewer` | boolean | no | Set viewer state of the operator |
| `allowCooking` | boolean | no | Set cooking flag on a COMP. When False, internal network stops cooking (0 CPU). COMP-only. |
## Data Read/Write
### td_read_dat
Read the text content of a DAT operator in TouchDesigner (TD). Returns content with line numbers. Use to read scripts, extensions, GLSL shaders, table data.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `path` | string | yes | Path to the DAT operator |
| `start_line` | integer | no | Start line (1-based). Omit to read from beginning. |
| `end_line` | integer | no | End line (inclusive). Omit to read to end. |
### td_write_dat
Write or patch text content of a DAT operator in TouchDesigner (TD). Can do full replacement or StrReplace-style patching (old_text -> new_text). Use for editing scripts, extensions, shaders. Does NOT reinit extensions automatically.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `path` | string | yes | Path to the DAT operator |
| `text` | string | no | Full replacement text. Use this OR old_text+new_text, not both. |
| `old_text` | string | no | Text to find and replace (must be unique in the DAT) |
| `new_text` | string | no | Replacement text |
| `replace_all` | boolean | no | If true, replaces ALL occurrences of old_text (default: false, requires unique match) |
### td_read_chop
Read CHOP channel sample data. Returns channel values as arrays. Use when you need the actual sample values (animation curves, lookup tables, waveforms), not just the summary from td_get_operator_info.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `path` | string | yes | Path to the CHOP operator |
| `channels` | array | no | Channel names to read. Omit to read all channels. |
| `start` | integer | no | Start sample index (0-based). Omit to read from beginning. |
| `end` | integer | no | End sample index (inclusive). Omit to read to end. |
### td_read_textport
Read the last N lines from the TouchDesigner (TD) log/textport (console output). Use this to see errors, warnings and print output from TD.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `lines` | integer | no | Number of recent lines to return |
### td_clear_textport
Clear the MCP textport log buffer. Use this before starting a debug session or an edit-run-check loop to keep td_read_textport output focused and minimal.
No parameters (other than optional `target_instance`).
## Visual Capture
### td_get_screenshot
Get a screenshot of an operator's viewer in TouchDesigner (TD). Saves the image to a file and returns the file path. Use your file-reading tool to view the image. Shows what the operator looks like in its viewer (TOP output, CHOP waveform graph, SOP geometry, DAT table, parameter UI, etc). Use this to visually inspect any operator, or to generate images via TD for use in your project. TWO-STEP ASYNC USAGE: Step 1 — call with 'path' to start: returns {'status': 'pending', 'requestId': '...'}. Step 2 — call with 'request_id' to retrieve: returns {'file': '/tmp/.../opname_id.jpg'}. Then read the file to see the image. If step 2 still returns pending, make one other tool call then retry.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `path` | string | no | Full operator path to screenshot, e.g. '/project1/noise1'. Required for step 1. |
| `request_id` | string | no | Request ID from step 1 to retrieve the completed screenshot. |
| `max_size` | integer | no | Max pixel size for the longer side (default 512). Use 0 for original operator resolution (useful for pixel-accurate UI work). Higher values (e.g. 1024) for more detail. |
| `output_path` | string | no | Optional absolute path where the image should be saved (e.g. '/Users/me/project/render.png'). If omitted, saved to /tmp/pisang_mcp/screenshots/. Use absolute paths — TD's working directory may differ from the agent's. |
| `as_top` | boolean | no | If true, captures the operator directly as a TOP (bypasses the viewer renderer), preserving alpha/transparency. Only works for TOP operators — if the target is not a TOP, falls back to the viewer automatically. Use this when you need a clean PNG with alpha, e.g. to save a generated image for use in another project. |
| `format` | `auto` / `jpg` / `png` | no | Image format. 'auto' (default): JPEG for viewer mode, PNG for as_top=true. 'jpg': always JPEG (smaller). 'png': always PNG (lossless). |
### td_get_screenshots
Get screenshots of multiple operators in one batch. Saves images to files and returns file paths. Use your file-reading tool to view images. TWO-STEP ASYNC USAGE: Step 1 — call with 'paths' array to start: returns {'status': 'pending', 'batchId': '...', 'total': N}. Step 2 — call with 'batch_id' to retrieve: returns {'files': [{op, file}, ...]}. Then read the files to see the images. If still processing returns {'status': 'pending', 'ready': K, 'total': N}.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `paths` | array | no | List of full operator paths to screenshot. Required for step 1. |
| `batch_id` | string | no | Batch ID from step 1 to retrieve completed screenshots. |
| `max_size` | integer | no | Max pixel size for longer side (default 512). Use 0 for original resolution. |
| `as_top` | boolean | no | If true, captures TOP operators directly (preserves alpha). Non-TOP operators fall back to viewer. |
| `output_dir` | string | no | Optional absolute path to a directory. Each screenshot saved as <opname>.jpg or .png inside it and kept on disk. |
| `format` | `auto` / `jpg` / `png` | no | Image format. 'auto' (default): JPEG for viewer mode, PNG for as_top=true. 'jpg': always JPEG (smaller). 'png': always PNG (lossless). |
### td_get_screen_screenshot
Capture a screenshot of the actual screen via TD's screenGrabTOP. Saves the image to a file and returns the file path. Use your file-reading tool to view the image. Unlike td_get_screenshot (operator viewer), this shows what the user literally sees on their monitor — TD windows, UI panels, everything. Use when simulating mouse/keyboard input to verify what happened on screen. Workflow: td_get_screen_screenshot → read file → td_input_execute → wait idle → td_get_screen_screenshot again. TWO-STEP ASYNC: Step 1 — call without request_id: returns {'status':'pending','requestId':'...'}. Step 2 — call with request_id: returns {'file': '/tmp/.../screen_id.jpg', 'info': '...metadata...'}. Then read the file to see the image. The requestId also stays usable with td_screen_point_to_global for later coordinate lookup. crop_x/y/w/h are in ACTUAL SCREEN PIXELS (not image pixels). Crops exceeding screen bounds are auto-clamped. SMART DEFAULTS: max_size is auto when omitted — 1920 for full screen (good overview), max(crop_w,crop_h) for cropped (guarantees 1:1 scale). At 1:1 scale: screen_coord = crop_origin + image_pixel. Otherwise use the formula from metadata.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `request_id` | string | no | Request ID from step 1 to retrieve the completed screenshot. |
| `max_size` | integer | no | Max pixel size for the longer side. Auto when omitted: 1920 for full screen, max(crop_w,crop_h) for cropped (1:1). Set explicitly to override. |
| `crop_x` | integer | no | Left edge in screen pixels. |
| `crop_y` | integer | no | Top edge in screen pixels (y=0 at top of screen). |
| `crop_w` | integer | no | Width in pixels. |
| `crop_h` | integer | no | Height in pixels. |
| `display` | integer | no | Screen index (default 0 = primary display). |
## Context & Focus
### td_get_focus
Get the current user focus in TouchDesigner (TD): which network is open, selected operators, current operator, and rollover (what is under the mouse cursor). IMPORTANT: when the user says 'this operator' or 'вот этот', they mean the SELECTED/CURRENT operator, NOT the rollover. Rollover is just incidental mouse position and should be ignored for intent. Pass screenshots=true to immediately start a screenshot batch for all selected operators — response includes a 'screenshots' field with batchId; retrieve with td_get_screenshots(batch_id=...).
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `screenshots` | boolean | no | If true, start a screenshot batch for all selected operators. Retrieve with td_get_screenshots(batch_id=...). |
| `max_size` | integer | no | Max screenshot size when screenshots=true (default 512). |
| `as_top` | boolean | no | Passed to the screenshot batch when screenshots=true. |
### td_get_errors
Find errors and warnings in TouchDesigner (TD) operators. Checks operator errors, warnings, AND broken parameter expressions (missing channels, bad references, etc). Also includes recent script errors from the log (tracebacks), grouped and deduplicated — e.g. 1000 identical mouse-move errors shown as ×1000 with one entry. If path is given, checks that operator and its children. If no path, checks the currently open network. Use '/' for entire project. Use when user says something is broken, has errors, red nodes, горит ошибка, etc. TIP: call td_clear_textport before reproducing an error to keep log focused. TIP: combine with td_get_perf when user says 'тупит/лагает' to check both errors and performance.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `path` | string | no | Path to check. If omitted, checks the current network. Use '/' to scan entire project. |
| `recursive` | boolean | no | Check children recursively (default true) |
| `include_log` | boolean | no | Include recent script errors from log, grouped by unique signature (default true). Use td_clear_textport before reproducing an error to keep results focused. |
### td_get_perf
Get performance data from TouchDesigner (TD). Returns TSV: header with fps/budget/memory summary, then slowest operators sorted by cook time. Columns: path, OPType, cpu/cook(ms), gpu/cook(ms), cpu/s, gpu/s, rate, flags. Use when user reports lag, low FPS, slow performance, тупит, тормозит.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `path` | string | no | Path to profile. If omitted, profiles the current network. Use '/' for entire project. |
| `top` | integer | no | Number of slowest operators to return |
## Documentation
### td_get_docs
Get comprehensive documentation on a TouchDesigner topic. Unlike td_get_hints (compact tips), this returns in-depth reference material. Call without arguments to see available topics with descriptions. Call with a topic name to get the full documentation.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `topic` | string | no | Topic to get docs for. Omit to list available topics. |
### td_get_hints
Get TouchDesigner tips and common patterns for a topic. Call this BEFORE creating operators or writing TD Python code to learn correct parameter names, expressions, and idiomatic approaches. Available topics: animation, noise, connections, parameters, scripting, construction, ui_analysis, panel_layout, screenshots, input_simulation, undo. IMPORTANT: always call with topic='construction' before building multi-operator setups to get correct TOP/CHOP parameter names, compositeTOP input ordering, and layout guidelines. IMPORTANT: always call with topic='input_simulation' before using td_input_execute to learn focus recovery, coordinate systems, and testing workflow.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `topic` | string | yes | Topic to get hints for. Available: 'animation', 'noise', 'connections', 'parameters', 'scripting', 'construction', 'ui_analysis', 'panel_layout', 'screenshots', 'input_simulation', 'undo', 'networking', 'all' |
### td_agents_md
Read, write, or update the agents_md documentation inside a COMP container. agents_md is a Markdown textDAT describing the container's purpose, structure, and conventions. action='read': returns content + staleness check (compares documented children vs live state). action='update': refreshes auto-generated sections (children list, connections) from live state, preserves human-written sections. action='write': sets full content, creates the DAT if missing.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `path` | string | yes | Path to the COMP container |
| `action` | `read` / `update` / `write` | yes | read=get content+staleness, update=refresh auto sections, write=set content |
| `content` | string | no | Markdown content (only for action='write') |
## Input Automation
### td_input_execute
Send a sequence of mouse/keyboard commands to TouchDesigner. Commands execute sequentially with smooth bezier movement. Returns immediately — poll td_input_status() until status='idle' before proceeding. Command types: 'focus' — bring TD to foreground. 'move' — smooth mouse move: {type,x,y,duration,easing}. 'click' — click: {type,x,y,button,hold,duration,easing}. hold=seconds to hold down. duration=smooth move before click. 'dblclick' — double click: {type,x,y,duration}. 'mousedown'/'mouseup' — {type,x,y,button}. 'key' — keystroke: {type,keys} e.g. 'ctrl+z','tab','escape','shift+f5'. Requires Accessibility permission on Mac. 'type' — human-like typing: {type,text,wpm,variance} — layout-independent Unicode, variable timing. 'wait' — pause: {type,duration}. 'scroll' — {type,x,y,dx,dy,steps} — human-like scroll: moves mouse to (x,y) first, then sends dy (vertical, +up) and dx (horizontal, +right) as multiple ticks with natural timing. steps=4 by default. Mouse commands may include coord_space='logical' (default) or coord_space='physical'. On macOS, 'physical' means actual screen pixels from td_get_screen_screenshot and is converted to CGEvent logical coords automatically. Top-level coord_space applies to commands that do not override it. on_error: 'stop' (default) clears queue on error; 'continue' skips failed command. IMPORTANT: call td_get_hints('input_simulation') before first use to learn focus recovery, coordinate systems, and testing workflow.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `commands` | array | yes | List of command dicts to execute in sequence. |
| `coord_space` | `logical` / `physical` | no | Default coordinate space for mouse commands that do not specify their own coord_space. 'logical' uses CGEvent coords directly. 'physical' uses actual screen pixels from td_get_screen_screenshot and is auto-converted on macOS. |
| `on_error` | `stop` / `continue` | no | What to do on error. Default 'stop'. |
### td_input_status
Get current status of the td_input command queue. Poll this after td_input_execute until status='idle'. Returns: status ('idle'/'running'), current command, queue_remaining, last error.
No parameters (other than optional `target_instance`).
### td_input_clear
Clear the td_input command queue and stop current execution immediately.
No parameters (other than optional `target_instance`).
### td_op_screen_rect
Get the screen coordinates of an operator node in the network editor. Returns {x,y,w,h,cx,cy} where cx,cy is the center for clicking. Use this to find where to click on a specific operator. Only works if the operator's parent network is currently open in a network editor pane.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `path` | string | yes | Full path to the operator, e.g. '/project1/myComp/noise1' |
### td_click_screen_point
Resolve a point inside a previous td_get_screen_screenshot result and click it. Pass the screenshot request_id plus either normalized u/v or image_x/image_y. Queues a td_input click using physical screen coordinates, so it works directly with screenshot-derived points. Use duration/easing to control the cursor travel before the click.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `request_id` | string | yes | Request ID originally returned by td_get_screen_screenshot. |
| `u` | number | no | Normalized horizontal position inside the screenshot region (0=left, 1=right). Use with v. |
| `v` | number | no | Normalized vertical position inside the screenshot region (0=top, 1=bottom). Use with u. |
| `image_x` | number | no | Horizontal pixel coordinate inside the returned screenshot image. Use with image_y. |
| `image_y` | number | no | Vertical pixel coordinate inside the returned screenshot image. Use with image_x. |
| `button` | `left` / `right` / `middle` | no | Mouse button to click. Default left. |
| `hold` | number | no | Seconds to hold the mouse button down before releasing. |
| `duration` | number | no | Seconds for the cursor to travel to the target before clicking. |
| `easing` | `linear` / `ease-in` / `ease-out` / `ease-in-out` | no | Cursor movement easing for the pre-click travel. |
| `focus` | boolean | no | If true, bring TD to the front before clicking and wait briefly for focus to settle. |
### td_screen_point_to_global
Convert a point inside a previous td_get_screen_screenshot result into absolute screen coordinates. Pass the screenshot request_id plus either normalized u/v (0..1 inside that screenshot region) or image_x/image_y in returned image pixels. Returns absolute physical screen coordinates, logical coordinates, and a ready-to-use td_input_execute payload. Metadata is kept for the most recent screen screenshots so multiple agents can resolve points later by request_id.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `request_id` | string | yes | Request ID originally returned by td_get_screen_screenshot. |
| `u` | number | no | Normalized horizontal position inside the screenshot region (0=left, 1=right). Use with v. |
| `v` | number | no | Normalized vertical position inside the screenshot region (0=top, 1=bottom). Use with u. |
| `image_x` | number | no | Horizontal pixel coordinate inside the returned screenshot image. Use with image_y. |
| `image_y` | number | no | Vertical pixel coordinate inside the returned screenshot image. Use with image_x. |
## System
### td_list_instances
List all running TouchDesigner (TD) instances with active MCP servers. Returns port, project name, PID, and instanceId for each instance. Call this at the start of every conversation to discover available instances and choose which one to work with. instanceId is stable for the lifetime of a TD process and is used as target_instance in all other tool calls.
No parameters (other than optional `target_instance`).
### td_project_quit
Save and/or close the current TouchDesigner (TD) project. Can save before closing. Reports if project has unsaved changes. To close a different instance, pass target_instance=instanceId. WARNING: this will shut down the MCP server on that instance.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `save` | boolean | no | Save the project before closing. Default true. |
| `force` | boolean | no | Force close without save dialog. Default false. |
### td_reinit_extension
Reinitialize an extension on a COMP in TouchDesigner (TD). Call this AFTER finishing all code edits via td_write_dat to apply changes. Do NOT call after every small edit - batch your changes first.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `path` | string | yes | Path to the COMP with the extension |
### td_dev_log
Read the last N entries from the MCP dev log. Only available when Devmode is enabled. Shows request/response history.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `count` | integer | no | Number of recent log entries to return |
### td_clear_dev_log
Clear the current MCP dev log by closing the old file and starting a fresh one. Only available when Devmode is enabled.
No parameters (other than optional `target_instance`).
### td_test_session
Manage test sessions, bug reports, and conversation export. IMPORTANT: Do NOT proactively suggest exporting chat or submitting reports. These are tools for specific situations: - export_chat / submit_report: ONLY when the user encounters a BUG with the plugin or TouchDesigner and wants to report it, or when the user explicitly asks to export the conversation. Never suggest this at session end or as routine action. USER PHRASES → ACTIONS: 'разбор тестовых сессий' / 'analyze test sessions' → list, then pull, read meta.json → index.jsonl → calls/. 'разбор репортов' / 'analyze user reports' → list with session='user', then pull by name. 'экспортируй чат' / 'export chat' → (1) export_chat_id → marker, (2) export_chat with session=marker. 'сообщи о проблеме' / 'report bug' → export chat, review for privacy, then submit_report with summary + tags + result_op=file_path. ACTIONS: export_chat_id | export_chat | submit_report | start | note | import_chat | end | list | pull. list: default=auto-detect repo. session='user' for user_reports (dev only). pull: auto-searches both repos. Auto-detects dev vs user Hub access.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `action` | `export_chat_id` / `export_chat` / `submit_report` / `start` / `note` / `import_chat` / `end` / `list` / `pull` | yes | Action: export_chat_id / export_chat / submit_report / start / note / import_chat / end / list / pull |
| `prompt` | string | no | (start) The test prompt/task description |
| `tags` | array | no | (start) Tags for categorization, e.g. ['ui', 'layout'] |
| `text` | string | no | (note) Observation text. (import_chat) Full conversation text. |
| `outcome` | `success` / `partial` / `failure` | no | (end) Result: success / partial / failure |
| `summary` | string | no | (end) Brief summary of what happened |
| `result_op` | string | no | (end) Path to operator to save as result.tox |
| `session` | string | no | (pull) Session name or substring to download |
@@ -0,0 +1,966 @@
# TouchDesigner Network Patterns
Complete network recipes for common creative coding tasks. Each pattern shows the operator chain, MCP tool calls to build it, and key parameter settings.
## Audio-Reactive Visuals
### Pattern 1: Audio Spectrum -> Noise Displacement
Audio drives noise parameters for organic, music-responsive textures.
```
Audio File In CHOP -> Audio Spectrum CHOP -> Math CHOP (scale)
|
v (export to noise params)
Noise TOP -> Level TOP -> Feedback TOP -> Composite TOP -> Null TOP (out)
^ |
|________________|
```
**MCP Build Sequence:**
```
1. td_create_operator(parent="/project1", type="audiofileinChop", name="audio_in")
2. td_create_operator(parent="/project1", type="audiospectrumChop", name="spectrum")
3. td_create_operator(parent="/project1", type="mathChop", name="spectrum_scale")
4. td_create_operator(parent="/project1", type="noiseTop", name="noise1")
5. td_create_operator(parent="/project1", type="levelTop", name="level1")
6. td_create_operator(parent="/project1", type="feedbackTop", name="feedback1")
7. td_create_operator(parent="/project1", type="compositeTop", name="comp1")
8. td_create_operator(parent="/project1", type="nullTop", name="out")
9. td_set_operator_pars(path="/project1/audio_in",
properties={"file": "/path/to/music.wav", "play": true})
10. td_set_operator_pars(path="/project1/spectrum",
properties={"size": 512})
11. td_set_operator_pars(path="/project1/spectrum_scale",
properties={"gain": 2.0, "postoff": 0.0})
12. td_set_operator_pars(path="/project1/noise1",
properties={"type": 1, "monochrome": false, "resolutionw": 1280, "resolutionh": 720,
"period": 4.0, "harmonics": 3, "amp": 1.0})
13. td_set_operator_pars(path="/project1/level1",
properties={"opacity": 0.95, "gamma1": 0.75})
14. td_set_operator_pars(path="/project1/feedback1",
properties={"top": "/project1/comp1"})
15. td_set_operator_pars(path="/project1/comp1",
properties={"operand": 0})
16. td_execute_python: """
op('/project1/audio_in').outputConnectors[0].connect(op('/project1/spectrum'))
op('/project1/spectrum').outputConnectors[0].connect(op('/project1/spectrum_scale'))
op('/project1/noise1').outputConnectors[0].connect(op('/project1/level1'))
op('/project1/level1').outputConnectors[0].connect(op('/project1/comp1').inputConnectors[0])
op('/project1/feedback1').outputConnectors[0].connect(op('/project1/comp1').inputConnectors[1])
op('/project1/comp1').outputConnectors[0].connect(op('/project1/out'))
"""
17. td_execute_python: """
# Export spectrum values to drive noise parameters
# This makes the noise react to audio frequencies
op('/project1/noise1').par.seed.expr = "op('/project1/spectrum_scale')['chan1']"
op('/project1/noise1').par.period.expr = "tdu.remap(op('/project1/spectrum_scale')['chan1'].eval(), 0, 1, 1, 8)"
"""
```
### Pattern 2: Beat Detection -> Visual Pulses
Detect beats from audio and trigger visual events.
```
Audio Device In CHOP -> Audio Spectrum CHOP -> Math CHOP (isolate bass)
|
Trigger CHOP (envelope)
|
[export to visual params]
```
**Key parameter settings:**
```
# Isolate bass frequencies (20-200 Hz)
Math CHOP: chanop=1 (Add channels), range1low=0, range1high=10
(first 10 FFT bins = bass frequencies with 512 FFT at 44100Hz)
# ADSR envelope on each beat
Trigger CHOP: attack=0.02, peak=1.0, decay=0.3, sustain=0.0, release=0.1
# Export to visual: Scale, brightness, or color intensity
td_execute_python: "op('/project1/level1').par.brightness1.expr = \"1.0 + op('/project1/trigger1')['chan1'] * 0.5\""
```
### Pattern 3: Multi-Band Audio -> Multi-Layer Visuals
Split audio into frequency bands, drive different visual layers per band.
```
Audio In -> Spectrum -> Audio Band EQ (3 bands: bass, mid, treble)
|
+---------+---------+
| | |
Bass Mids Treble
| | |
Noise TOP Circle TOP Text TOP
(slow,dark) (mid,warm) (fast,bright)
| | |
+-----+----+----+----+
| |
Composite Composite
|
Out
```
### Pattern 3b: Audio-Reactive GLSL Fractal (Proven Recipe)
Complete working recipe. Plays an MP3, runs FFT, feeds spectrum as a texture into a GLSL shader where inner fractal reacts to bass, outer to treble.
**Network:**
```
AudioFileIn CHOP → AudioSpectrum CHOP (FFT=512, outlength=256)
→ Math CHOP (gain=10) → CHOP To TOP (256x2 spectrum texture, dataformat=r)
Constant TOP (time, rgba32float) → GLSL TOP (input 0=time, input 1=spectrum) → Null → MovieFileOut
AudioFileIn CHOP → Audio Device Out CHOP Record to .mov
```
**Build via td_execute_python (one call per step for reliability):**
```python
# Step 1: Audio chain
# td_execute_python script:
td_execute_python(code="""
root = op('/project1')
audio = root.create(audiofileinCHOP, 'audio_in')
audio.par.file = '/path/to/music.mp3'
audio.par.playmode = 0 # Locked to timeline
audio.par.volume = 0.5
spec = root.create(audiospectrumCHOP, 'spectrum')
audio.outputConnectors[0].connect(spec.inputConnectors[0])
math_n = root.create(mathCHOP, 'math_norm')
spec.outputConnectors[0].connect(math_n.inputConnectors[0])
math_n.par.gain = 5 # boost signal
resamp = root.create(resampleCHOP, 'resample_spec')
math_n.outputConnectors[0].connect(resamp.inputConnectors[0])
resamp.par.timeslice = True
resamp.par.rate = 256
chop2top = root.create(choptoTOP, 'spectrum_tex')
chop2top.par.chop = resamp # CHOP To TOP has NO input connectors — use par.chop reference
# Audio output (hear the music)
aout = root.create(audiodeviceoutCHOP, 'audio_out')
audio.outputConnectors[0].connect(aout.inputConnectors[0])
result = 'audio chain ok'
""")
# Step 2: Time driver (MUST be rgba32float — see pitfalls #6)
# td_execute_python script:
td_execute_python(code="""
root = op('/project1')
td = root.create(constantTOP, 'time_driver')
td.par.format = 'rgba32float'
td.par.outputresolution = 'custom'
td.par.resolutionw = 1
td.par.resolutionh = 1
td.par.colorr.expr = "absTime.seconds % 1000.0"
td.par.colorg.expr = "int(absTime.seconds / 1000.0)"
result = 'time ok'
""")
# Step 3: GLSL shader (write to /tmp, load from file)
# td_execute_python script:
td_execute_python(code="""
root = op('/project1')
glsl = root.create(glslTOP, 'audio_shader')
glsl.par.outputresolution = 'custom'
glsl.par.resolutionw = 1280
glsl.par.resolutionh = 720
sd = root.create(textDAT, 'shader_code')
sd.text = open('/tmp/my_shader.glsl').read()
glsl.par.pixeldat = sd
# Wire: input 0 = time, input 1 = spectrum texture
op('/project1/time_driver').outputConnectors[0].connect(glsl.inputConnectors[0])
op('/project1/spectrum_tex').outputConnectors[0].connect(glsl.inputConnectors[1])
result = 'glsl ok'
""")
# Step 4: Output + recorder
# td_execute_python script:
td_execute_python(code="""
root = op('/project1')
out = root.create(nullTOP, 'output')
op('/project1/audio_shader').outputConnectors[0].connect(out.inputConnectors[0])
rec = root.create(moviefileoutTOP, 'recorder')
out.outputConnectors[0].connect(rec.inputConnectors[0])
rec.par.type = 'movie'
rec.par.file = '/tmp/output.mov'
rec.par.videocodec = 'mjpa'
result = 'output ok'
""")
```
**GLSL shader pattern (audio-reactive fractal):**
```glsl
out vec4 fragColor;
vec3 palette(float t) {
vec3 a = vec3(0.5); vec3 b = vec3(0.5);
vec3 c = vec3(1.0); vec3 d = vec3(0.263, 0.416, 0.557);
return a + b * cos(6.28318 * (c * t + d));
}
void main() {
// Input 0 = time (1x1 rgba32float constant)
// Input 1 = audio spectrum (256x2 CHOP To TOP, stereo — sample at y=0.25 for first channel)
vec4 td = texture(sTD2DInputs[0], vec2(0.5));
float t = td.r + td.g * 1000.0;
vec2 res = uTDOutputInfo.res.zw;
vec2 uv = (gl_FragCoord.xy * 2.0 - res) / min(res.x, res.y);
vec2 uv0 = uv;
vec3 finalColor = vec3(0.0);
float bass = texture(sTD2DInputs[1], vec2(0.05, 0.25)).r;
float mids = texture(sTD2DInputs[1], vec2(0.25, 0.25)).r;
for (float i = 0.0; i < 4.0; i++) {
uv = fract(uv * (1.4 + bass * 0.3)) - 0.5;
float d = length(uv) * exp(-length(uv0));
// Sample spectrum at distance: inner=bass, outer=treble
float freq = texture(sTD2DInputs[1], vec2(clamp(d * 0.5, 0.0, 1.0), 0.25)).r;
vec3 col = palette(length(uv0) + i * 0.4 + t * 0.35);
d = sin(d * (7.0 + bass * 4.0) + t * 1.5) / 8.0;
d = abs(d);
d = pow(0.012 / d, 1.2 + freq * 0.8 + bass * 0.5);
finalColor += col * d;
}
// Tone mapping
finalColor = finalColor / (finalColor + vec3(1.0));
fragColor = TDOutputSwizzle(vec4(finalColor, 1.0));
}
```
**Key insights from testing:**
- `spectrum_tex` (CHOP To TOP) produces a 256x2 texture — x position = frequency, y=0.25 for first channel
- Sampling at `vec2(0.05, 0.0)` gets bass, `vec2(0.65, 0.0)` gets treble
- Sampling based on pixel distance (`d * 0.5`) makes inner fractal react to bass, outer to treble
- `bass * 0.3` in the `fract()` zoom makes the fractal breathe with kicks
- Math CHOP gain of 5 is needed because raw spectrum values are very small
## Generative Art
### Pattern 4: Feedback Loop with Transform
Classic generative technique — texture evolves through recursive transformation.
```
Noise TOP -> Composite TOP -> Level TOP -> Null TOP (out)
^ |
| v
Transform TOP <- Feedback TOP
```
**MCP Build Sequence:**
```
1. td_create_operator(parent="/project1", type="noiseTop", name="seed_noise")
2. td_create_operator(parent="/project1", type="compositeTop", name="mix")
3. td_create_operator(parent="/project1", type="transformTop", name="evolve")
4. td_create_operator(parent="/project1", type="feedbackTop", name="fb")
5. td_create_operator(parent="/project1", type="levelTop", name="color_correct")
6. td_create_operator(parent="/project1", type="nullTop", name="out")
7. td_set_operator_pars(path="/project1/seed_noise",
properties={"type": 1, "monochrome": false, "period": 2.0, "amp": 0.3,
"resolutionw": 1280, "resolutionh": 720})
8. td_set_operator_pars(path="/project1/mix",
properties={"operand": 27}) # 27 = Screen blend
9. td_set_operator_pars(path="/project1/evolve",
properties={"sx": 1.003, "sy": 1.003, "rz": 0.5, "extend": 2}) # slight zoom + rotate, repeat edges
10. td_set_operator_pars(path="/project1/fb",
properties={"top": "/project1/mix"})
11. td_set_operator_pars(path="/project1/color_correct",
properties={"opacity": 0.98, "gamma1": 0.85})
12. td_execute_python: """
op('/project1/seed_noise').outputConnectors[0].connect(op('/project1/mix').inputConnectors[0])
op('/project1/fb').outputConnectors[0].connect(op('/project1/evolve'))
op('/project1/evolve').outputConnectors[0].connect(op('/project1/mix').inputConnectors[1])
op('/project1/mix').outputConnectors[0].connect(op('/project1/color_correct'))
op('/project1/color_correct').outputConnectors[0].connect(op('/project1/out'))
"""
```
**Variations:**
- Change Transform: `rz` (rotation), `sx/sy` (zoom), `tx/ty` (drift)
- Change Composite operand: Screen (glow), Add (bright), Multiply (dark)
- Add HSV Adjust in the feedback loop for color evolution
- Add Blur for dreamlike softness
- Replace Noise with a GLSL TOP for custom seed patterns
### Pattern 5: Instancing (Particle-Like Systems)
Render thousands of copies of geometry, each with unique position/rotation/scale driven by CHOP data or DATs.
```
Table DAT (instance data) -> DAT to CHOP -> Geometry COMP (instancing on) -> Render TOP
+ Sphere SOP (template geometry)
+ Constant MAT (material)
+ Camera COMP
+ Light COMP
```
**MCP Build Sequence:**
```
1. td_create_operator(parent="/project1", type="tableDat", name="instance_data")
2. td_create_operator(parent="/project1", type="geometryComp", name="geo1")
3. td_create_operator(parent="/project1/geo1", type="sphereSop", name="sphere")
4. td_create_operator(parent="/project1", type="constMat", name="mat1")
5. td_create_operator(parent="/project1", type="cameraComp", name="cam1")
6. td_create_operator(parent="/project1", type="lightComp", name="light1")
7. td_create_operator(parent="/project1", type="renderTop", name="render1")
8. td_execute_python: """
import random, math
dat = op('/project1/instance_data')
dat.clear()
dat.appendRow(['tx', 'ty', 'tz', 'sx', 'sy', 'sz', 'cr', 'cg', 'cb'])
for i in range(500):
angle = i * 0.1
r = 2 + i * 0.01
dat.appendRow([
str(math.cos(angle) * r),
str(math.sin(angle) * r),
str((i - 250) * 0.02),
'0.05', '0.05', '0.05',
str(random.random()),
str(random.random()),
str(random.random())
])
"""
9. td_set_operator_pars(path="/project1/geo1",
properties={"instancing": true, "instancechop": "",
"instancedat": "/project1/instance_data",
"material": "/project1/mat1"})
10. td_set_operator_pars(path="/project1/render1",
properties={"camera": "/project1/cam1", "geometry": "/project1/geo1",
"light": "/project1/light1",
"resolutionw": 1280, "resolutionh": 720})
11. td_set_operator_pars(path="/project1/cam1",
properties={"tz": 10})
```
### Pattern 6: Reaction-Diffusion (GLSL)
Classic Gray-Scott reaction-diffusion system running on the GPU.
```
Text DAT (GLSL code) -> GLSL TOP (resolution, dat reference) -> Feedback TOP
^ |
|_______________________________________|
Level TOP (out)
```
**Key GLSL code (write to Text DAT via td_execute_python):**
```glsl
// Gray-Scott reaction-diffusion
uniform float feed; // 0.037
uniform float kill; // 0.06
uniform float dA; // 1.0
uniform float dB; // 0.5
layout(location = 0) out vec4 fragColor;
void main() {
vec2 uv = vUV.st;
vec2 texel = 1.0 / uTDOutputInfo.res.zw;
vec4 c = texture(sTD2DInputs[0], uv);
float a = c.r;
float b = c.g;
// Laplacian (9-point stencil)
float lA = 0.0, lB = 0.0;
for(int dx = -1; dx <= 1; dx++) {
for(int dy = -1; dy <= 1; dy++) {
float w = (dx == 0 && dy == 0) ? -1.0 : (abs(dx) + abs(dy) == 1 ? 0.2 : 0.05);
vec4 s = texture(sTD2DInputs[0], uv + vec2(dx, dy) * texel);
lA += s.r * w;
lB += s.g * w;
}
}
float reaction = a * b * b;
float newA = a + (dA * lA - reaction + feed * (1.0 - a));
float newB = b + (dB * lB + reaction - (kill + feed) * b);
fragColor = vec4(clamp(newA, 0.0, 1.0), clamp(newB, 0.0, 1.0), 0.0, 1.0);
}
```
## Video Processing
### Pattern 7: Video Effects Chain
Apply a chain of effects to a video file.
```
Movie File In TOP -> HSV Adjust TOP -> Level TOP -> Blur TOP -> Composite TOP -> Null TOP (out)
^
Text TOP ---+
```
**MCP Build Sequence:**
```
1. td_create_operator(parent="/project1", type="moviefileinTop", name="video_in")
2. td_create_operator(parent="/project1", type="hsvadjustTop", name="color")
3. td_create_operator(parent="/project1", type="levelTop", name="levels")
4. td_create_operator(parent="/project1", type="blurTop", name="blur")
5. td_create_operator(parent="/project1", type="compositeTop", name="overlay")
6. td_create_operator(parent="/project1", type="textTop", name="title")
7. td_create_operator(parent="/project1", type="nullTop", name="out")
8. td_set_operator_pars(path="/project1/video_in",
properties={"file": "/path/to/video.mp4", "play": true})
9. td_set_operator_pars(path="/project1/color",
properties={"hueoffset": 0.1, "saturationmult": 1.3})
10. td_set_operator_pars(path="/project1/levels",
properties={"brightness1": 1.1, "contrast": 1.2, "gamma1": 0.9})
11. td_set_operator_pars(path="/project1/blur",
properties={"sizex": 2, "sizey": 2})
12. td_set_operator_pars(path="/project1/title",
properties={"text": "My Video", "fontsizex": 48, "alignx": 1, "aligny": 1})
13. td_execute_python: """
chain = ['video_in', 'color', 'levels', 'blur']
for i in range(len(chain) - 1):
op(f'/project1/{chain[i]}').outputConnectors[0].connect(op(f'/project1/{chain[i+1]}'))
op('/project1/blur').outputConnectors[0].connect(op('/project1/overlay').inputConnectors[0])
op('/project1/title').outputConnectors[0].connect(op('/project1/overlay').inputConnectors[1])
op('/project1/overlay').outputConnectors[0].connect(op('/project1/out'))
"""
```
### Pattern 8: Video Recording
Record the output to a file. **H.264/H.265 require a Commercial license** — use Motion JPEG (`mjpa`) on Non-Commercial.
```
[any TOP chain] -> Null TOP -> Movie File Out TOP
```
```python
# Build via td_execute_python:
root = op('/project1')
# Always put a Null TOP before the recorder
null_out = root.op('out') # or create one
rec = root.create(moviefileoutTOP, 'recorder')
null_out.outputConnectors[0].connect(rec.inputConnectors[0])
rec.par.type = 'movie'
rec.par.file = '/tmp/output.mov'
rec.par.videocodec = 'mjpa' # Motion JPEG — works on Non-Commercial
# Start recording (par.record is a toggle — .record() method may not exist)
rec.par.record = True
# ... let TD run for desired duration ...
rec.par.record = False
# For image sequences:
# rec.par.type = 'imagesequence'
# rec.par.imagefiletype = 'png'
# rec.par.file.expr = "'/tmp/frames/out' + me.fileSuffix" # fileSuffix REQUIRED
```
**Pitfalls:**
- Setting `par.file` + `par.record = True` in the same script may race — use `run("...", delayFrames=2)`
- `TOP.save()` called rapidly always captures the same frame — use MovieFileOut for animation
- See `pitfalls.md` #25-27 for full details
### Pattern 8b: TD → External Pipeline (FFmpeg / Python / Post-Processing)
Export TD visuals for use in another tool (ffmpeg, Python, ASCII art, etc.). This is the standard workflow when you need to composite TD output with external processing (ASCII conversion, Python shader chains, ML inference, etc.).
**Step 1: Record to video in TD**
```python
# Preferred: ProRes on macOS (lossless, Non-Commercial OK, ~55MB/s at 1280x720)
rec.par.videocodec = 'prores'
# Fallback for non-macOS: mjpa (Motion JPEG)
# rec.par.videocodec = 'mjpa'
rec.par.record = True
# ... wait N seconds ...
rec.par.record = False
```
**Step 2: Extract frames with ffmpeg**
```bash
# Extract all frames at 30fps
ffmpeg -y -i /tmp/output.mov -vf 'fps=30' /tmp/frames/frame_%06d.png
# Or extract a specific duration
ffmpeg -y -i /tmp/output.mov -t 25 -vf 'fps=30' /tmp/frames/frame_%06d.png
# Or extract specific frame range
ffmpeg -y -i /tmp/output.mov -vf 'select=between(n\,0\,749)' -vsync vfr /tmp/frames/frame_%06d.png
```
**Step 3: Process frames in Python**
```python
from PIL import Image
import os
frames_dir = '/tmp/frames'
output_dir = '/tmp/processed'
os.makedirs(output_dir, exist_ok=True)
for fname in sorted(os.listdir(frames_dir)):
if not fname.endswith('.png'):
continue
img = Image.open(os.path.join(frames_dir, fname))
# ... apply your processing ...
img.save(os.path.join(output_dir, fname))
```
**Step 4: Mux processed frames back with audio**
```bash
# Create video from processed frames + audio with fade-out
ffmpeg -y \
-framerate 30 -i /tmp/processed/frame_%06d.png \
-i /tmp/audio.mp3 \
-c:v libx264 -pix_fmt yuv420p -crf 18 \
-c:a aac -b:a 192k \
-shortest \
-af 'afade=t=out:st=23:d=2' \
/tmp/final_output.mp4
```
**Key considerations:**
- Use ProRes for the TD recording step to avoid generation loss during compositing
- Extract at the target output framerate (not TD's render framerate)
- For audio-synced content, analyze the audio file separately in Python (scipy FFT) to get per-frame features (rms, spectral bands, beats) and drive compositing parameters
- Always verify TD FPS > 0 before recording (see pitfalls #37, #38)
## Data Visualization
### Pattern 9: Table Data -> Bar Chart via Instancing
Visualize tabular data as a 3D bar chart.
```
Table DAT (data) -> Script DAT (transform to instance format) -> DAT to CHOP
|
Box SOP -> Geometry COMP (instancing from CHOP) -> Render TOP -> Null TOP (out)
+ PBR MAT
+ Camera COMP
+ Light COMP
```
```python
# Script DAT code to transform data to instance positions
td_execute_python: """
source = op('/project1/data_table')
instance = op('/project1/instance_transform')
instance.clear()
instance.appendRow(['tx', 'ty', 'tz', 'sx', 'sy', 'sz', 'cr', 'cg', 'cb'])
for i in range(1, source.numRows):
value = float(source[i, 'value'])
name = source[i, 'name']
instance.appendRow([
str(i * 1.5), # x position (spread bars)
str(value / 2), # y position (center bar vertically)
'0', # z position
'1', str(value), '1', # scale (height = data value)
'0.2', '0.6', '1.0' # color (blue)
])
"""
```
### Pattern 9b: Audio-Reactive GLSL Fractal (Proven Recipe)
Audio spectrum drives a GLSL fractal shader directly via a spectrum texture input. Bass thickens inner fractal lines, mids twist rotation, highs light outer edges. **Always run discovery (SKILL.md Step 0) before using any param names from these recipes — they may differ in your TD version.**
```
Audio File In CHOP → Audio Spectrum CHOP (FFT=512, outlength=256)
→ Math CHOP (gain=10)
→ CHOP To TOP (spectrum texture, 256x2, dataformat=r)
↓ (input 1)
Constant TOP (rgba32float, time) → GLSL TOP (audio-reactive shader) → Null TOP
(input 0) ↑
Text DAT (shader code)
```
**Build via td_execute_python (complete working script):**
```python
# td_execute_python script:
td_execute_python(code="""
import os
root = op('/project1')
# Audio input
audio = root.create(audiofileinCHOP, 'audio_in')
audio.par.file = '/path/to/music.mp3'
audio.par.playmode = 0 # Locked to timeline
# FFT analysis (output length manually set to 256 bins)
spectrum = root.create(audiospectrumCHOP, 'spectrum')
audio.outputConnectors[0].connect(spectrum.inputConnectors[0])
spectrum.par.fftsize = '512'
spectrum.par.outputmenu = 'setmanually'
spectrum.par.outlength = 256
# THEN boost gain on the raw spectrum (NO Lag CHOP — see pitfall #34)
math = root.create(mathCHOP, 'math_norm')
spectrum.outputConnectors[0].connect(math.inputConnectors[0])
math.par.gain = 10
# Spectrum → texture (256x2 image — stereo, sample at y=0.25 for first channel)
# NOTE: choptoTOP has NO input connectors — use par.chop reference!
spec_tex = root.create(choptoTOP, 'spectrum_tex')
spec_tex.par.chop = math
spec_tex.par.dataformat = 'r'
spec_tex.par.layout = 'rowscropped'
# Time driver (rgba32float to avoid 0-1 clamping!)
time_drv = root.create(constantTOP, 'time_driver')
time_drv.par.format = 'rgba32float'
time_drv.par.outputresolution = 'custom'
time_drv.par.resolutionw = 1
time_drv.par.resolutionh = 1
time_drv.par.colorr.expr = "absTime.seconds % 1000.0"
time_drv.par.colorg.expr = "int(absTime.seconds / 1000.0)"
# GLSL shader
glsl = root.create(glslTOP, 'audio_shader')
glsl.par.outputresolution = 'custom'
glsl.par.resolutionw = 1280; glsl.par.resolutionh = 720
shader_dat = root.create(textDAT, 'shader_code')
shader_dat.text = open('/tmp/shader.glsl').read()
glsl.par.pixeldat = shader_dat
# Wire: input 0=time, input 1=spectrum
time_drv.outputConnectors[0].connect(glsl.inputConnectors[0])
spec_tex.outputConnectors[0].connect(glsl.inputConnectors[1])
# Output + audio playback
out = root.create(nullTOP, 'output')
glsl.outputConnectors[0].connect(out.inputConnectors[0])
audio_out = root.create(audiodeviceoutCHOP, 'audio_out')
audio.outputConnectors[0].connect(audio_out.inputConnectors[0])
result = 'network built'
""")
```
**GLSL shader (reads spectrum from input 1 texture):**
```glsl
out vec4 fragColor;
vec3 palette(float t) {
vec3 a = vec3(0.5); vec3 b = vec3(0.5);
vec3 c = vec3(1.0); vec3 d = vec3(0.263, 0.416, 0.557);
return a + b * cos(6.28318 * (c * t + d));
}
void main() {
vec4 td = texture(sTD2DInputs[0], vec2(0.5));
float t = td.r + td.g * 1000.0;
vec2 res = uTDOutputInfo.res.zw;
vec2 uv = (gl_FragCoord.xy * 2.0 - res) / min(res.x, res.y);
vec2 uv0 = uv;
vec3 finalColor = vec3(0.0);
float bass = texture(sTD2DInputs[1], vec2(0.05, 0.25)).r;
float mids = texture(sTD2DInputs[1], vec2(0.25, 0.25)).r;
float highs = texture(sTD2DInputs[1], vec2(0.65, 0.25)).r;
float ca = cos(t * (0.15 + mids * 0.3));
float sa = sin(t * (0.15 + mids * 0.3));
uv = mat2(ca, -sa, sa, ca) * uv;
for (float i = 0.0; i < 4.0; i++) {
uv = fract(uv * (1.4 + bass * 0.3)) - 0.5;
float d = length(uv) * exp(-length(uv0));
float freq = texture(sTD2DInputs[1], vec2(clamp(d*0.5, 0.0, 1.0), 0.25)).r;
vec3 col = palette(length(uv0) + i * 0.4 + t * 0.35);
d = sin(d * (7.0 + bass * 4.0) + t * 1.5) / 8.0;
d = abs(d);
d = pow(0.012 / d, 1.2 + freq * 0.8 + bass * 0.5);
finalColor += col * d;
}
float glow = (0.03 + bass * 0.05) / (length(uv0) + 0.03);
finalColor += vec3(0.4, 0.1, 0.7) * glow * (0.6 + 0.4 * sin(t * 2.5));
float ring = abs(length(uv0) - 0.4 - mids * 0.3);
finalColor += vec3(0.1, 0.6, 0.8) * (0.005 / ring) * (0.2 + highs * 0.5);
finalColor *= smoothstep(0.0, 1.0, 1.0 - dot(uv0*0.55, uv0*0.55));
finalColor = finalColor / (finalColor + vec3(1.0));
fragColor = TDOutputSwizzle(vec4(finalColor, 1.0));
}
```
**How spectrum sampling drives the visual:**
- `texture(sTD2DInputs[1], vec2(x, 0.0)).r` — x position = frequency (0=bass, 1=treble)
- Inner fractal iterations sample lower x → react to bass
- Outer iterations sample higher x → react to treble
- `bass * 0.3` on `fract()` scale → fractal zoom pulses with bass
- `bass * 4.0` on sin frequency → line density pulses with bass
- `mids * 0.3` on rotation speed → spiral twists faster during vocal/mid sections
- `highs * 0.5` on ring opacity → high-frequency sparkle on outer ring
**Recording the output:** Use MovieFileOut TOP with `mjpa` codec (H.264 requires Commercial license). See pitfalls #25-27.
## GLSL Shaders
### Pattern 10: Custom Fragment Shader
Write a custom visual effect as a GLSL fragment shader.
```
Text DAT (shader code) -> GLSL TOP -> Level TOP -> Null TOP (out)
+ optional input TOPs for texture sampling
```
**Common GLSL uniforms available in TouchDesigner:**
```glsl
// Automatically provided by TD
uniform vec4 uTDOutputInfo; // .res.zw = resolution
// NOTE: uTDCurrentTime does NOT exist in TD 099!
// Feed time via a 1x1 Constant TOP (format=rgba32float):
// t.par.colorr.expr = "absTime.seconds % 1000.0"
// t.par.colorg.expr = "int(absTime.seconds / 1000.0)"
// Then read in GLSL:
// vec4 td = texture(sTD2DInputs[0], vec2(0.5));
// float t = td.r + td.g * 1000.0;
// Input textures (from connected TOP inputs)
uniform sampler2D sTD2DInputs[1]; // array of input samplers
// From vertex shader
in vec3 vUV; // UV coordinates (0-1 range)
```
**Example: Plasma shader (using time from input texture)**
```glsl
layout(location = 0) out vec4 fragColor;
void main() {
vec2 uv = vUV.st;
// Read time from Constant TOP input 0 (rgba32float format)
vec4 td = texture(sTD2DInputs[0], vec2(0.5));
float t = td.r + td.g * 1000.0;
float v1 = sin(uv.x * 10.0 + t);
float v2 = sin(uv.y * 10.0 + t * 0.7);
float v3 = sin((uv.x + uv.y) * 10.0 + t * 1.3);
float v4 = sin(length(uv - 0.5) * 20.0 - t * 2.0);
float v = (v1 + v2 + v3 + v4) * 0.25;
vec3 color = vec3(
sin(v * 3.14159 + 0.0) * 0.5 + 0.5,
sin(v * 3.14159 + 2.094) * 0.5 + 0.5,
sin(v * 3.14159 + 4.189) * 0.5 + 0.5
);
fragColor = vec4(color, 1.0);
}
```
### Pattern 11: Multi-Pass GLSL (Ping-Pong)
For effects needing state across frames (particles, fluid, cellular automata), use GLSL Multi TOP with multiple passes or a Feedback TOP loop.
```
GLSL Multi TOP (pass 0: simulation, pass 1: rendering)
+ Text DAT (simulation shader)
+ Text DAT (render shader)
-> Level TOP -> Null TOP (out)
^
|__ Feedback TOP (feeds simulation state back)
```
## Interactive Installations
### Pattern 12: Mouse/Touch -> Visual Response
```
Mouse In CHOP -> Math CHOP (normalize to 0-1) -> [export to visual params]
# Or for touch/multi-touch:
Multi Touch In DAT -> Script CHOP (parse touches) -> [export to visual params]
```
```python
# Normalize mouse position to 0-1 range
td_execute_python: """
op('/project1/noise1').par.offsetx.expr = "op('/project1/mouse_norm')['tx']"
op('/project1/noise1').par.offsety.expr = "op('/project1/mouse_norm')['ty']"
"""
```
### Pattern 13: OSC Control (from external software)
```
OSC In CHOP (port 7000) -> Select CHOP (pick channels) -> [export to visual params]
```
```
1. td_create_operator(parent="/project1", type="oscinChop", name="osc_in")
2. td_set_operator_pars(path="/project1/osc_in", properties={"port": 7000})
# OSC messages like /frequency 440 will appear as channel "frequency" with value 440
# Export to any parameter:
3. td_execute_python: "op('/project1/noise1').par.period.expr = \"op('/project1/osc_in')['frequency']\""
```
### Pattern 14: MIDI Control (DJ/VJ)
```
MIDI In CHOP (device) -> Select CHOP -> [export channels to visual params]
```
Common MIDI mappings:
- CC channels (knobs/faders): continuous 0-127, map to float params
- Note On/Off: binary triggers, map to Trigger CHOP for envelopes
- Velocity: intensity/brightness
## Live Performance
### Pattern 15: Multi-Source VJ Setup
```
Source A (generative) ----+
Source B (video) ---------+-- Switch/Cross TOP -- Level TOP -- Window COMP (output)
Source C (camera) --------+
^
MIDI/OSC control selects active source and crossfade
```
```python
# MIDI CC1 controls which source is active (0-127 -> 0-2)
td_execute_python: """
op('/project1/switch1').par.index.expr = "int(op('/project1/midi_in')['cc1'] / 42)"
"""
# MIDI CC2 controls crossfade between current and next
td_execute_python: """
op('/project1/cross1').par.cross.expr = "op('/project1/midi_in')['cc2'] / 127.0"
"""
```
### Pattern 16: Projection Mapping
```
Content TOPs ----+
|
Stoner TOP (UV mapping) -> Composite TOP -> Window COMP (projector output)
or
Kantan Mapper COMP (external .tox)
```
For projection mapping, the key is:
1. Create your visual content as standard TOPs
2. Use Stoner TOP or a third-party mapping tool to UV-map content to physical surfaces
3. Output via Window COMP to the projector
### Pattern 17: Cue System
```
Table DAT (cue list: cue_number, scene_name, duration, transition_type)
|
Script CHOP (cue state: current_cue, progress, next_cue_trigger)
|
[export to Switch/Cross TOPs to transition between scenes]
```
```python
td_execute_python: """
# Simple cue system
cue_table = op('/project1/cue_list')
cue_state = op('/project1/cue_state')
def advance_cue():
current = int(cue_state.par.value0.val)
next_cue = min(current + 1, cue_table.numRows - 1)
cue_state.par.value0.val = next_cue
scene = cue_table[next_cue, 'scene']
duration = float(cue_table[next_cue, 'duration'])
# Set crossfade target and duration
op('/project1/cross1').par.cross.val = 0
# Animate cross to 1.0 over duration seconds
# (use a Timer CHOP or LFO CHOP for smooth animation)
"""
```
## Networking
### Pattern 18: OSC Server/Client
```
# Sending OSC
OSC Out CHOP -> (network) -> external application
# Receiving OSC
(network) -> OSC In CHOP -> Select CHOP -> [use values]
```
### Pattern 19: NDI Video Streaming
```
# Send video over network
[any TOP chain] -> NDI Out TOP (source name)
# Receive video from network
NDI In TOP (select source) -> [process as normal TOP]
```
### Pattern 20: WebSocket Communication
```
WebSocket DAT -> Script DAT (parse JSON messages) -> [update visuals]
```
```python
td_execute_python: """
ws = op('/project1/websocket1')
ws.par.address = 'ws://localhost:8080'
ws.par.active = True
# In a DAT Execute callback (Script DAT watching WebSocket DAT):
# def onTableChange(dat):
# import json
# msg = json.loads(dat.text)
# op('/project1/noise1').par.seed.val = msg.get('seed', 0)
"""
```
@@ -0,0 +1,239 @@
# TouchDesigner Operator Reference
## Operator Families Overview
TouchDesigner has 6 operator families. Each family processes a specific data type and is color-coded in the UI. Operators can only connect to others of the SAME family (with cross-family converters as the bridge).
## TOPs — Texture Operators (Purple)
2D image/texture processing on the GPU. The workhorse of visual output.
### Generators (create images from nothing)
| Operator | Type Name | Key Parameters | Use |
|----------|-----------|---------------|-----|
| Noise TOP | `noiseTop` | `type` (0-6), `monochrome`, `seed`, `period`, `harmonics`, `exponent`, `amp`, `offset`, `resolutionw/h` | Procedural noise textures — Perlin, Simplex, Sparse, etc. Foundation of generative art. |
| Constant TOP | `constantTop` | `colorr/g/b/a`, `resolutionw/h` | Solid color. Use as background or blend input. |
| Text TOP | `textTop` | `text`, `fontsizex`, `fontfile`, `alignx/y`, `colorr/g/b` | Render text to texture. Supports multi-line, word wrap. |
| Ramp TOP | `rampTop` | `type` (0=horizontal, 1=vertical, 2=radial, 3=circular), `phase`, `period` | Gradient textures for masking, color mapping. |
| Circle TOP | `circleTop` | `radiusx/y`, `centerx/y`, `width` | Circles, rings, ellipses. |
| Rectangle TOP | `rectangleTop` | `sizex/y`, `centerx/y`, `softness` | Rectangles with optional softness. |
| GLSL TOP | `glslTop` | `dat` (points to shader DAT), `resolutionw/h`, `outputformat`, custom uniforms | Custom fragment shaders. Most powerful TOP for custom visuals. |
| GLSL Multi TOP | `glslmultiTop` | `dat`, `numinputs`, `numoutputs`, `numcomputepasses` | Multi-pass GLSL with compute shaders. Advanced. |
| Render TOP | `renderTop` | `camera`, `geometry`, `lights`, `resolutionw/h` | Renders 3D scenes (SOPs + MATs + Camera/Light COMPs). |
### Filters (modify a single input)
| Operator | Type Name | Key Parameters | Use |
|----------|-----------|---------------|-----|
| Level TOP | `levelTop` | `opacity`, `brightness1/2`, `gamma1/2`, `contrast`, `invert`, `blacklevel/whitelevel` | Brightness, contrast, gamma, levels. Essential color correction. |
| Blur TOP | `blurTop` | `sizex/y`, `type` (0=Gaussian, 1=Box, 2=Bartlett) | Gaussian/box blur. |
| Transform TOP | `transformTop` | `tx/ty`, `sx/sy`, `rz`, `pivotx/y`, `extend` (0=Hold, 1=Zero, 2=Repeat, 3=Mirror) | Translate, scale, rotate textures. |
| HSV Adjust TOP | `hsvadjustTop` | `hueoffset`, `saturationmult`, `valuemult` | HSV color adjustments. |
| Lookup TOP | `lookupTop` | (input: texture + lookup table) | Color remapping via lookup table texture. |
| Edge TOP | `edgeTop` | `type` (0=Sobel, 1=Frei-Chen) | Edge detection. |
| Displace TOP | `displaceTop` | `scalex/y` | Pixel displacement using a second input as displacement map. |
| Flip TOP | `flipTop` | `flipx`, `flipy`, `flop` (diagonal) | Mirror/flip textures. |
| Crop TOP | `cropTop` | `cropleft/right/top/bottom` | Crop region of texture. |
| Resolution TOP | `resolutionTop` | `resolutionw/h`, `outputresolution` | Resize textures. |
| Null TOP | `nullTop` | (none significant) | Pass-through. Use for organization, referencing, feedback delay. |
| Cache TOP | `cacheTop` | `length`, `step` | Store N frames of history. Useful for trails, time effects. |
### Compositors (combine multiple inputs)
| Operator | Type Name | Key Parameters | Use |
|----------|-----------|---------------|-----|
| Composite TOP | `compositeTop` | `operand` (0-31: Over, Add, Multiply, Screen, etc.) | Blend two textures with standard compositing modes. |
| Over TOP | `overTop` | (simple alpha compositing) | Layer with alpha. Simpler than Composite. |
| Add TOP | `addTop` | (additive blend) | Additive blending. Great for glow, light effects. |
| Multiply TOP | `multiplyTop` | (multiplicative blend) | Multiply blend. Good for masking, darkening. |
| Switch TOP | `switchTop` | `index` (0-based) | Switch between multiple inputs by index. |
| Cross TOP | `crossTop` | `cross` (0.0-1.0) | Crossfade between two inputs. |
### I/O (input/output)
| Operator | Type Name | Key Parameters | Use |
|----------|-----------|---------------|-----|
| Movie File In TOP | `moviefileinTop` | `file`, `speed`, `trim`, `index` | Load video files, image sequences. |
| Movie File Out TOP | `moviefileoutTop` | `file`, `type` (codec), `record` (toggle) | Record/export video files. |
| NDI In TOP | `ndiinTop` | `sourcename` | Receive NDI video streams. |
| NDI Out TOP | `ndioutTop` | `sourcename` | Send NDI video streams. |
| Syphon Spout In/Out TOP | `syphonspoutinTop` / `syphonspoutoutTop` | `servername` | Inter-app texture sharing. |
| Video Device In TOP | `videodeviceinTop` | `device` | Webcam/capture card input. |
| Feedback TOP | `feedbackTop` | `top` (path to the TOP to feed back) | One-frame delay feedback. Essential for recursive effects. |
### Converters
| Operator | Type Name | Direction | Use |
|----------|-----------|-----------|-----|
| CHOP to TOP | `choptopTop` | CHOP -> TOP | Visualize channel data as texture (waveform, spectrum display). |
| TOP to CHOP | `topchopChop` | TOP -> CHOP | Sample texture pixels as channel data. |
## CHOPs — Channel Operators (Green)
Time-varying numeric data: audio, animation curves, sensor data, control signals.
### Generators
| Operator | Type Name | Key Parameters | Use |
|----------|-----------|---------------|-----|
| Constant CHOP | `constantChop` | `name0/value0`, `name1/value1`... | Static named channels. Control panel for parameters. |
| LFO CHOP | `lfoChop` | `frequency`, `type` (0=Sin, 1=Tri, 2=Square, 3=Ramp, 4=Pulse), `amp`, `offset`, `phase` | Low frequency oscillator. Animation driver. |
| Noise CHOP | `noiseChop` | `type`, `roughness`, `period`, `amp`, `seed`, `channels` | Smooth random motion. Organic animation. |
| Pattern CHOP | `patternChop` | `type` (0=Sine, 1=Triangle, ...), `length`, `cycles` | Generate waveform patterns. |
| Timer CHOP | `timerChop` | `length`, `play`, `cue`, `cycles` | Countdown/count-up timer with cue points. |
| Count CHOP | `countChop` | `threshold`, `limittype`, `limitmin/max` | Event counter with wrapping/clamping. |
### Audio
| Operator | Type Name | Key Parameters | Use |
|----------|-----------|---------------|-----|
| Audio File In CHOP | `audiofileinChop` | `file`, `volume`, `play`, `speed`, `trim` | Play audio files. |
| Audio Device In CHOP | `audiodeviceinChop` | `device`, `channels` | Live microphone/line input. |
| Audio Spectrum CHOP | `audiospectrumChop` | `size` (FFT size), `outputformat` (0=Power, 1=Magnitude) | FFT frequency analysis. |
| Audio Band EQ CHOP | `audiobandeqChop` | `bands`, `gaindb` per band | Frequency band isolation. |
| Audio Device Out CHOP | `audiodeviceoutChop` | `device` | Audio playback output. |
### Math/Logic
| Operator | Type Name | Key Parameters | Use |
|----------|-----------|---------------|-----|
| Math CHOP | `mathChop` | `preoff`, `gain`, `postoff`, `chanop` (0=Off, 1=Add, 2=Subtract, 3=Multiply...) | Math operations on channels. The Swiss army knife. |
| Logic CHOP | `logicChop` | `preop` (0=Off, 1=AND, 2=OR, 3=XOR, 4=NAND), `convert` | Boolean logic on channels. |
| Filter CHOP | `filterChop` | `type` (0=Low Pass, 1=Band Pass, 2=High Pass, 3=Notch), `cutofffreq`, `filterwidth` | Smooth, dampen, filter signals. |
| Lag CHOP | `lagChop` | `lag1/2`, `overshoot1/2` | Smooth transitions with overshoot. |
| Limit CHOP | `limitChop` | `type` (0=Clamp, 1=Loop, 2=ZigZag), `min/max` | Clamp or wrap channel values. |
| Speed CHOP | `speedChop` | (none significant) | Integrate values (velocity to position, acceleration to velocity). |
| Trigger CHOP | `triggerChop` | `attack`, `peak`, `decay`, `sustain`, `release` | ADSR envelope from trigger events. |
| Select CHOP | `selectChop` | `chop` (path), `channames` | Reference channels from another CHOP. |
| Merge CHOP | `mergeChop` | `align` (0=Extend, 1=Trim to First, 2=Trim to Shortest) | Combine channels from multiple CHOPs. |
| Null CHOP | `nullChop` | (none significant) | Pass-through for organization and referencing. |
### Input Devices
| Operator | Type Name | Use |
|----------|-----------|-----|
| Mouse In CHOP | `mouseinChop` | Mouse position, buttons, wheel. |
| Keyboard In CHOP | `keyboardinChop` | Keyboard key states. |
| MIDI In CHOP | `midiinChop` | MIDI note/CC input. |
| OSC In CHOP | `oscinChop` | OSC message input (network). |
## SOPs — Surface Operators (Blue)
3D geometry: points, polygons, NURBS, meshes.
### Generators
| Operator | Type Name | Key Parameters | Use |
|----------|-----------|---------------|-----|
| Grid SOP | `gridSop` | `rows`, `cols`, `sizex/y`, `type` (0=Polygon, 1=Mesh, 2=NURBS) | Flat grid mesh. Foundation for displacement, instancing. |
| Sphere SOP | `sphereSop` | `type`, `rows`, `cols`, `radius` | Sphere geometry. |
| Box SOP | `boxSop` | `sizex/y/z` | Box geometry. |
| Torus SOP | `torusSop` | `radiusx/y`, `rows`, `cols` | Donut shape. |
| Circle SOP | `circleSop` | `type`, `radius`, `divs` | Circle/ring geometry. |
| Line SOP | `lineSop` | `dist`, `points` | Line segments. |
| Text SOP | `textSop` | `text`, `fontsizex`, `fontfile`, `extrude` | 3D text geometry. |
### Modifiers
| Operator | Type Name | Key Parameters | Use |
|----------|-----------|---------------|-----|
| Transform SOP | `transformSop` | `tx/ty/tz`, `rx/ry/rz`, `sx/sy/sz` | Transform geometry (translate, rotate, scale). |
| Noise SOP | `noiseSop` | `type`, `amp`, `period`, `roughness` | Deform geometry with noise. |
| Sort SOP | `sortSop` | `ptsort`, `primsort` | Reorder points/primitives. |
| Facet SOP | `facetSop` | `unique`, `consolidate`, `computenormals` | Normals, consolidation, unique points. |
| Merge SOP | `mergeSop` | (none significant) | Combine multiple geometry inputs. |
| Null SOP | `nullSop` | (none significant) | Pass-through. |
## DATs — Data Operators (White)
Text, tables, scripts, network data.
### Core
| Operator | Type Name | Key Parameters | Use |
|----------|-----------|---------------|-----|
| Table DAT | `tableDat` | (edit content directly) | Spreadsheet-like data tables. |
| Text DAT | `textDat` | (edit content directly) | Arbitrary text content. Shader code, configs, scripts. |
| Script DAT | `scriptDat` | `language` (0=Python, 1=C++) | Custom callbacks and DAT processing. |
| CHOP Execute DAT | `chopexecDat` | `chop` (path to watch), callbacks | Trigger Python on CHOP value changes. |
| DAT Execute DAT | `datexecDat` | `dat` (path to watch) | Trigger Python on DAT content changes. |
| Panel Execute DAT | `panelexecDat` | `panel` | Trigger Python on UI panel events. |
### I/O
| Operator | Type Name | Key Parameters | Use |
|----------|-----------|---------------|-----|
| Web DAT | `webDat` | `url`, `fetchmethod` (0=GET, 1=POST) | HTTP requests. API integration. |
| TCP/IP DAT | `tcpipDat` | `address`, `port`, `mode` | TCP networking. |
| OSC In DAT | `oscinDat` | `port` | Receive OSC as text messages. |
| Serial DAT | `serialDat` | `port`, `baudrate` | Serial port communication (Arduino, etc.). |
| File In DAT | `fileinDat` | `file` | Read text files. |
| File Out DAT | `fileoutDat` | `file`, `write` | Write text files. |
### Conversions
| Operator | Type Name | Direction | Use |
|----------|-----------|-----------|-----|
| DAT to CHOP | `dattochopChop` | DAT -> CHOP | Convert table data to channels. |
| CHOP to DAT | `choptodatDat` | CHOP -> DAT | Convert channel data to table rows. |
| SOP to DAT | `soptodatDat` | SOP -> DAT | Extract geometry data as table. |
## MATs — Material Operators (Yellow)
Materials for 3D rendering in Render TOP / Geometry COMP.
| Operator | Type Name | Key Parameters | Use |
|----------|-----------|---------------|-----|
| Phong MAT | `phongMat` | `diff_colorr/g/b`, `spec_colorr/g/b`, `shininess`, `colormap`, `normalmap` | Classic Phong shading. Simple, fast. |
| PBR MAT | `pbrMat` | `basecolorr/g/b`, `metallic`, `roughness`, `normalmap`, `emitcolorr/g/b` | Physically-based rendering. Realistic materials. |
| GLSL MAT | `glslMat` | `dat` (shader DAT), custom uniforms | Custom vertex + fragment shaders for 3D. |
| Constant MAT | `constMat` | `colorr/g/b`, `colormap` | Flat unlit color/texture. No shading. |
| Point Sprite MAT | `pointspriteMat` | `colormap`, `scale` | Render points as camera-facing sprites. Great for particles. |
| Wireframe MAT | `wireframeMat` | `colorr/g/b`, `width` | Wireframe rendering. |
| Depth MAT | `depthMat` | `near`, `far` | Render depth buffer as grayscale. |
## COMPs — Component Operators (Gray)
Containers, 3D scene elements, UI components.
### 3D Scene
| Operator | Type Name | Key Parameters | Use |
|----------|-----------|---------------|-----|
| Geometry COMP | `geometryComp` | `material` (path), `instancechop` (path), `instancing` (toggle) | Renders geometry with material. Instancing host. |
| Camera COMP | `cameraComp` | `tx/ty/tz`, `rx/ry/rz`, `fov`, `near/far` | Camera for Render TOP. |
| Light COMP | `lightComp` | `lighttype` (0=Point, 1=Directional, 2=Spot, 3=Cone), `dimmer`, `colorr/g/b` | Lighting for 3D scenes. |
| Ambient Light COMP | `ambientlightComp` | `dimmer`, `colorr/g/b` | Ambient lighting. |
| Environment Light COMP | `envlightComp` | `envmap` | Image-based lighting (IBL). |
### Containers
| Operator | Type Name | Key Parameters | Use |
|----------|-----------|---------------|-----|
| Container COMP | `containerComp` | `w`, `h`, `bgcolor1/2/3` | UI container. Holds other COMPs for panel layouts. |
| Base COMP | `baseComp` | (none significant) | Generic container. Networks-inside-networks. |
| Replicator COMP | `replicatorComp` | `template`, `operatorsdat` | Clone a template operator N times from a table. |
### Utilities
| Operator | Type Name | Key Parameters | Use |
|----------|-----------|---------------|-----|
| Window COMP | `windowComp` | `winw/h`, `winoffsetx/y`, `monitor`, `borders` | Output window for display/projection. |
| Select COMP | `selectComp` | `rowcol`, `panel` | Select and display content from elsewhere. |
| Engine COMP | `engineComp` | `tox`, `externaltox` | Load external .tox components. Sub-process isolation. |
## Cross-Family Converter Summary
| From | To | Operator | Type Name |
|------|-----|----------|-----------|
| CHOP | TOP | CHOP to TOP | `choptopTop` |
| TOP | CHOP | TOP to CHOP | `topchopChop` |
| DAT | CHOP | DAT to CHOP | `dattochopChop` |
| CHOP | DAT | CHOP to DAT | `choptodatDat` |
| SOP | CHOP | SOP to CHOP | `soptochopChop` |
| CHOP | SOP | CHOP to SOP | `choptosopSop` |
| SOP | DAT | SOP to DAT | `soptodatDat` |
| DAT | SOP | DAT to SOP | `dattosopSop` |
| SOP | TOP | (use Render TOP + Geometry COMP) | — |
| TOP | SOP | TOP to SOP | `toptosopSop` |
@@ -0,0 +1,508 @@
# TouchDesigner MCP — Pitfalls & Lessons Learned
Hard-won knowledge from real TD sessions. Read this before building anything.
## Parameter Names
### 1. NEVER hardcode parameter names — always discover
Parameter names change between TD versions. What works in one build may not work in another. ALWAYS use td_get_par_info to discover actual names from TD.
The agent's LLM training data contains WRONG parameter names. Do not trust them.
Known historical differences (may vary further — always verify):
| What docs/training say | Actual in some versions | Notes |
|---------------|---------------|-------|
| `dat` | `pixeldat` | GLSL TOP pixel shader DAT |
| `colora` | `alpha` | Constant TOP alpha |
| `sizex` / `sizey` | `size` | Blur TOP (single value) |
| `fontr/g/b/a` | `fontcolorr/g/b/a` | Text TOP font color (r/g/b) |
| `fontcolora` | `fontalpha` | Text TOP font alpha (NOT `fontcolora`) |
| `bgcolora` | `bgalpha` | Text TOP bg alpha |
| `value1name` | `vec0name` | GLSL TOP uniform name |
### 2. twozero td_execute_python response format
When calling `td_execute_python` via twozero MCP, successful responses return `(ok)` followed by FPS/error summary (e.g. `[fps 60.0/60] [0 err/0 warn]`), NOT the raw Python `result` dict. If you're parsing responses programmatically, check for the `(ok)` prefix — don't pattern-match on Python variable names from the script. Use `td_get_operator_info` or separate inspection calls to read back values.
### 3. When using td_set_operator_pars, param names must match exactly
Use td_get_par_info to discover them. The MCP tool validates parameter names and returns clear errors explaining what went wrong, unlike raw Python which crashes the whole script with tdAttributeError and stops execution. Always discover before setting.
### 4. Use `safe_par()` pattern for cross-version compatibility
```python
def safe_par(node, name, value):
p = getattr(node.par, name, None)
if p is not None:
p.val = value
return True
return False
```
### 5. `td.tdAttributeError` crashes the whole script — use defensive access
If you do `node.par.nonexistent = value`, TD raises `tdAttributeError` and stops the entire script. Prevention is better than catching:
- Use `op()` instead of `opex()``op()` returns None on failure, `opex()` raises
- Use `hasattr(node.par, 'name')` before accessing any parameter
- Use `getattr(node.par, 'name', None)` with a default
- Use the `safe_par()` pattern from pitfall #3
```python
# WRONG — crashes if param doesn't exist:
node.par.nonexistent = value
# CORRECT — defensive access:
if hasattr(node.par, 'nonexistent'):
node.par.nonexistent = value
```
### 6. `outputresolution` is a string menu, not an integer
```
menuNames: ['useinput','eighth','quarter','half','2x','4x','8x','fit','limit','custom','parpanel']
```
Always use the string form. Setting `outputresolution = 9` may silently fail.
```python
node.par.outputresolution = 'custom' # correct
node.par.resolutionw = 1280; node.par.resolutionh = 720
```
Discover valid values: `list(node.par.outputresolution.menuNames)`
## GLSL Shaders
### 7. `uTDCurrentTime` does NOT exist in GLSL TOP
There is NO built-in time uniform for GLSL TOPs. GLSL MAT has `uTDGeneral.seconds` but that's NOT available in GLSL TOP context.
**PRIMARY — GLSL TOP Vectors/Values page:**
```python
gl.par.value0name = 'uTime'
gl.par.value0.expr = "absTime.seconds"
# In GLSL: uniform float uTime;
```
**FALLBACK — Constant TOP texture (for complex time data):**
CRITICAL: set format to `rgba32float` — default 8-bit clamps to 0-1:
```python
t = root.create(constantTOP, 'time_driver')
t.par.format = 'rgba32float'
t.par.outputresolution = 'custom'
t.par.resolutionw = 1; t.par.resolutionh = 1
t.par.colorr.expr = "absTime.seconds % 1000.0"
t.outputConnectors[0].connect(glsl.inputConnectors[0])
```
### 8. GLSL compile errors are silent in the API
The GLSL TOP shows a yellow warning triangle in the UI but `node.errors()` may return empty string. Check `node.warnings()` too, and create an Info DAT pointed at the GLSL TOP to read the actual compiler output.
### 9. TD GLSL uses `vUV.st` not `gl_FragCoord` — and REQUIRES `TDOutputSwizzle()` on macOS
Standard GLSL patterns don't work. TD provides:
- `vUV.st` — UV coordinates (0-1)
- `uTDOutputInfo.res.zw` — resolution
- `sTD2DInputs[0]` — input textures
- `layout(location = 0) out vec4 fragColor` — output
CRITICAL on macOS: Always wrap output with `TDOutputSwizzle()`:
```glsl
fragColor = TDOutputSwizzle(color);
```
TD uses GLSL 4.60 (Vulkan backend). GLSL 3.30 and earlier removed.
### 10. Large GLSL shaders — write to temp file
GLSL code with special characters can corrupt JSON payloads. Write the shader to a temp file and load it in TD:
```python
# Agent side: write shader to /tmp/shader.glsl via write_file
# TD side:
sd = root.create(textDAT, 'shader_code')
with open('/tmp/shader.glsl', 'r') as f:
sd.text = f.read()
```
## Node Management
### 11. Destroying nodes while iterating `root.children` causes `tdError`
The iterator is invalidated when a child is destroyed. Always snapshot first:
```python
kids = list(root.children) # snapshot
for child in kids:
if child.valid: # check — earlier destroys may cascade
child.destroy()
```
### 11b. Split cleanup and creation into SEPARATE td_execute_python calls
Creating nodes with the same names you just destroyed in the SAME script causes "Invalid OP object" errors — even with `list()` snapshot. TD's internal references can go stale within one execution context.
**WRONG (single call):**
```python
# td_execute_python:
for c in list(root.children):
if c.valid and c.name.startswith('promo_'):
c.destroy()
# ... then create promo_audio, promo_shader etc. in same script → CRASHES
```
**CORRECT (two separate calls):**
```python
# Call 1: td_execute_python — clean only
for c in list(root.children):
if c.valid and c.name.startswith('promo_'):
c.destroy()
# Call 2: td_execute_python — build (separate MCP call)
audio = root.create(audiofileinCHOP, 'promo_audio')
# ... rest of build
```
### 12. Feedback TOP: use `top` parameter, NOT direct input wire
The feedbackTOP's `top` parameter references which TOP to delay. Do NOT also wire that TOP directly into the feedback's input — this creates a real cook dependency loop.
Correct setup:
```python
fb = root.create(feedbackTOP, 'fb_delay')
fb.par.top = comp.path # reference only — no wire to fb input
fb.outputConnectors[0].connect(xf) # fb output -> transform -> fade -> comp
```
The "Cook dependency loop detected" warning on the transform/fade chain is expected.
### 13. GLSL TOP auto-creates companion nodes
Creating a `glslTOP` also creates `name_pixel` (Text DAT), `name_info` (Info DAT), and `name_compute` (Text DAT). These are visible in the network. Don't be alarmed by "extra" nodes.
### 14. The default project root is `/project1`
New TD files start with `/project1` as the main container. System nodes live at `/`, `/ui`, `/sys`, `/local`, `/perform`. Don't create user nodes outside `/project1`.
### 15. Non-Commercial license caps resolution at 1280x1280
Setting `resolutionw=1920` silently clamps to 1280. Always check effective resolution after creation:
```python
n.cook(force=True)
actual = str(n.width) + 'x' + str(n.height)
```
## Recording & Codecs
### 16. MovieFileOut TOP: H.264/H.265/AV1 requires Commercial license
In Non-Commercial TD, these codecs produce an error. Recommended alternatives:
- `prores` — Apple ProRes, **best on macOS**, HW accelerated, NOT license-restricted. ~55MB/s at 1280x720 but lossless quality. **Use this as default on macOS.**
- `cineform` — GoPro Cineform, supports alpha
- `hap` — GPU-accelerated playback, large files
- `notchlc` — GPU-accelerated, good quality
- `mjpa` — Motion JPEG, legacy fallback (lossy, use only if ProRes unavailable)
For image sequences: `rec.par.type = 'imagesequence'`, `rec.par.imagefiletype = 'png'`
### 17. MovieFileOut `.record()` method may not exist
Use the toggle parameter instead:
```python
rec.par.record = True # start recording
rec.par.record = False # stop recording
```
When setting file path and starting recording in the same script, use delayFrames:
```python
rec.par.file = '/tmp/new_output.mov'
run("op('/project1/recorder').par.record = True", delayFrames=2)
```
### 18. TOP.save() captures same frame when called rapidly
Use MovieFileOut for real-time recording. Set `project.realTime = False` for frame-accurate output.
### 19. AudioFileIn CHOP: cue and recording sequence matters
The recording sequence must be done in exact order, or the recording will be empty, audio will start mid-file, or the file won't be written.
**Proven recording sequence:**
```python
# Step 1: Stop any existing recording
rec.par.record = False
# Step 2: Reset audio to beginning
audio.par.play = False
audio.par.cue = True
audio.par.cuepoint = 0 # may need cuepointunit=0 too
# Verify: audio.par.cue.eval() should be True
# Step 3: Set output file path
rec.par.file = '/tmp/output.mov'
# Step 4: Release cue + start playing + start recording (with frame delay)
audio.par.cue = False
audio.par.play = True
audio.par.playmode = 2 # Sequential — plays once through
run("op('/project1/recorder').par.record = True", delayFrames=3)
```
**Why each step matters:**
- `rec.par.record = False` first — if a previous recording is active, setting `par.file` may fail silently
- `audio.par.cue = True` + `cuepoint = 0` — guarantees audio starts from the beginning, otherwise the spectrum may be silent for the first few seconds
- `delayFrames=3` on the record start — setting `par.file` and `par.record = True` in the same script can race; the file path needs a frame to register before recording starts
- `playmode = 2` (Sequential) — plays the file once. Use `playmode = 0` (Locked to Timeline) if you want TD's timeline to control position
## TD Python API Patterns
### 20. COMP extension setup: ext0object format is CRITICAL
`ext0object` expects a CONSTANT string (NOT expression mode):
```python
comp.par.ext0object = "op('./myExtensionDat').module.MyClassName(me)"
```
NEVER set as just the DAT name. NEVER use ParMode.EXPRESSION. ALWAYS ensure the DAT has `par.language='python'`.
### 21. td.Panel is NOT subscriptable — use attribute access
```python
comp.panel.select # correct (attribute access, returns float)
comp.panel['select'] # WRONG — 'td.Panel' object is not subscriptable
```
### 22. ALWAYS use relative paths in script callbacks
In scriptTOP/CHOP/SOP/DAT callbacks, use paths relative to `scriptOp` or `me`:
```python
root = scriptOp.parent().parent()
dat = root.op('pixel_data')
```
NEVER hardcode absolute paths like `op('/project1/myComp/child')` — they break when containers are renamed or copied.
### 23. keyboardinCHOP channel names have 'k' prefix
Channel names are `kup`, `kdown`, `kleft`, `kright`, `ka`, `kb`, etc. — NOT `up`, `down`, `a`, `b`. Always verify with:
```python
channels = [c.name for c in op('/project1/keyboard1').chans()]
```
### 24. expressCHOP cook-only properties — false positive errors
`me.inputVal`, `me.chanIndex`, `me.sampleIndex` work ONLY in cook-context. Calling `par.expr0expr.eval()` from outside always raises an error — this is NOT a real operator error. Ignore these in error scans.
### 25. td.Vertex attributes — use index access not named attributes
In TD 2025.32, `td.Vertex` objects do NOT have `.x`, `.y`, `.z` attributes:
```python
# WRONG — crashes:
vertex.x, vertex.y, vertex.z
# CORRECT — index-based:
vertex.point.P[0], vertex.point.P[1], vertex.point.P[2]
# Or for SOP point positions:
pt = sop.points()[i]
pos = pt.P # use P[0], P[1], P[2]
```
## Audio
### 26. Audio Spectrum CHOP output is weak — boost it
Raw output is very small (0.001-0.05). Use built-in boost: `spectrum.par.highfrequencyboost = 3.0`
If still weak, add Math CHOP in Range mode: `fromrangehi=0.05, torangehi=1.0`
### 27. AudioSpectrum CHOP: timeslice and sample count are the #1 gotcha
AudioSpectrum at 44100Hz with `timeslice=False` outputs the ENTIRE audio file as samples (~24000+). CHOP-to-TOP then exceeds texture resolution max and warns/fails.
**Fix:** Keep `timeslice = True` (default) for real-time per-frame FFT. Set `fftsize` to control bin count (it's a STRING enum: `'256'` not `256`).
If the CHOP-to-TOP still gets too many samples, set `layout = 'rowscropped'` on the choptoTOP.
```python
spectrum.par.fftsize = '256' # STRING, not int — enum values
spectrum.par.timeslice = True # MUST be True for real-time audio reactivity
spectex.par.layout = 'rowscropped' # handles oversized CHOP inputs
```
**resampleCHOP has NO `numsamples` param.** It uses `rate`, `start`, `end`, `method`. Don't guess — always `td_get_par_info('resampleCHOP')` first.
### 28. CHOP To TOP has NO input connectors — use par.chop reference
```python
spec_tex = root.create(choptoTOP, 'spectrum_tex')
spec_tex.par.chop = resample # correct: parameter reference
# NOT: resample.outputConnectors[0].connect(spec_tex.inputConnectors[0]) # WRONG
```
## Workflow
### 29. Always verify after building — errors are silent
Node errors and broken connections produce no output. Always check:
```python
for c in list(root.children):
e = c.errors()
w = c.warnings()
if e: print(c.name, 'ERR:', e)
if w: print(c.name, 'WARN:', w)
```
### 30. Window COMP param for display target is `winop`
```python
win = root.create(windowCOMP, 'display')
win.par.winop = '/project1/logo_out'
win.par.winw = 1280; win.par.winh = 720
win.par.winopen.pulse()
```
### 31. `sample()` returns frozen pixels in rapid calls
`out.sample(x, y)` returns pixels from a single cook snapshot. Compare samples with 2+ second delays, or use screencapture on the display window.
### 32. Audio-reactive GLSL: dual-layer sync pipeline
For audio-synced visuals, use BOTH layers for maximum effect:
**Layer 1 (TD-side, real-time):** AudioFileIn → AudioSpectrum(timeslice=True, fftsize='256') → Math(gain=5) → choptoTOP(par.chop=math, layout='rowscropped') → GLSL input. The shader samples `sTD2DInputs[1]` at different x positions for bass/mid/hi. Record the TD output with MovieFileOut.
**Layer 2 (Python-side, post-hoc):** scipy FFT on the SAME audio file → per-frame features (rms, bass, mid, hi, beat detection) → drive ASCII brightness, chromatic aberration, beat flashes during the render pass.
Both layers locked to the same audio file = visuals genuinely sync to the beat at two independent stages.
**Key gotcha:** AudioFileIn must be cued (`par.cue=True``par.cuepulse.pulse()`) then uncued (`par.cue=False`, `par.play=True`) before recording starts. Otherwise the spectrum is silent for the first few seconds.
### 33. twozero MCP: benchmark and prefer native tools
Benchmarked April 2026: twozero MCP with 36 native tools. The old curl/REST method (port 9981) had zero native tools.
**Always prefer native MCP tools over td_execute_python:**
- `td_create_operator` over `root.create()` scripts (handles viewport positioning)
- `td_set_operator_pars` over `node.par.X = Y` scripts (validates param names)
- `td_get_par_info` over temp-node discovery dance (instant, no cleanup)
- `td_get_errors` over manual `c.errors()` loops
- `td_get_focus` for context awareness (no equivalent in old method)
Only fall back to `td_execute_python` for multi-step logic (wiring chains, conditional builds, loops).
### 34. twozero td_execute_python response wrapping
twozero wraps `td_execute_python` responses with status info: `(ok)\n\n[fps 60.0/60] [0 err/0 warn]`. Your Python `result` variable value may not appear verbatim in the response text. If you need to check results programmatically, use `print()` statements in the script — they appear in the response. Don't rely on string-matching the `result` dict.
### 35. Audio-reactive chain: DO NOT use Lag CHOP or Filter CHOP for spectrum smoothing
The Derivative docs and tutorials suggest using Lag CHOP (lag1=0.2, lag2=0.5) to smooth raw FFT output before passing to a shader. **This does NOT work with AudioSpectrum → CHOP to TOP → GLSL.**
What happens: Lag CHOP operates in timeslice mode. A 256-sample spectrum input gets expanded to 1600-2400 samples. The Lag averaging drives all values to near-zero (~1e-06). The CHOP to TOP produces a 2400x2 texture instead of 256x2. The shader receives effectively zero audio data.
**The correct chain is: Spectrum(outlength=256) → Math(gain=10) → CHOPtoTOP → GLSL.** No CHOP smoothing at all. If you need smoothing, do it in the GLSL shader via temporal lerp with a feedback texture.
Verified values with audio playing:
- Without Lag CHOP: bass bins = 5.0-5.4, mid bins = 1.0-1.7 (strong, usable)
- With Lag CHOP: ALL bins = 0.000001-0.00004 (dead, zero audio reactivity)
### 36. AudioSpectrum Output Length: set manually to avoid CHOP to TOP overflow
AudioSpectrum in Visualization mode with FFT 8192 outputs 22,050 samples by default (1 per Hz, 022050). CHOP to TOP cannot handle this — you get "Number of samples exceeded texture resolution max".
Fix: `spectrum.par.outputmenu = 'setmanually'` and `spectrum.par.outlength = 256`. This gives 256 frequency bins — plenty for visual FFT.
DO NOT set `timeslice = False` as a workaround — that processes the entire audio file at once and produces even more samples.
### 37. GLSL spectrum texture from CHOP to TOP is 256x2 not 256x1
AudioSpectrum outputs 2 channels (stereo: chan1, chan2). CHOP to TOP with `dataformat='r'` creates a 256x2 texture — one row per channel. Sample the first channel at `y=0.25` (center of first row), NOT `y=0.5` (boundary between rows):
```glsl
float bass = texture(sTD2DInputs[1], vec2(0.05, 0.25)).r; // correct
float bass = texture(sTD2DInputs[1], vec2(0.05, 0.5)).r; // WRONG — samples between rows
```
### 38. FPS=0 doesn't mean ops aren't cooking — check play state
TD can show `fps:0` in `td_get_perf` while ops still cook and `TOP.save()` still produces valid screenshots. The two most common causes:
**a) Project is paused (playbar stopped).** TD's playbar can be toggled with spacebar. The `root` at `/` has no `.playbar` attribute (it's on the perform COMP). The easiest fix is sending a spacebar keypress via `td_input_execute`, though this tool can sometimes error. As a workaround, `TOP.save()` always works regardless of play state — use it to verify rendering is actually happening before spending time debugging FPS.
**b) Audio device CHOP blocking the main thread.** An `audiooutCHOP` with an active audio device can consume 300-400ms/s (2000%+ of frame budget), stalling the cook loop at FPS=0. Fix: keep the CHOP active but set `volume=0` to prevent the audio driver from blocking. Disabling it entirely (`active=False`) may also work but can prevent downstream audio processing CHOPs from cooking.
Diagnostic sequence when FPS=0:
1. `td_get_perf` — check if any op has extreme CPU/s
2. `TOP.save()` on the output — if it produces a valid image, the pipeline works, just not at real-time rate
3. Check for blocking CHOPs (audioout, audiodevin, etc.)
4. Toggle play state (spacebar, or check if absTime.seconds is advancing)
### 39. Recording while FPS=0 produces empty or near-empty files
This is the #1 cause of "I recorded for 30 seconds but got a 2-frame video." If TD's cook loop is stalled (FPS=0 or very low), MovieFileOut has nothing to record. Unlike `TOP.save()` which captures the last cooked frame regardless, MovieFileOut only writes frames that actually cook.
**Always verify FPS before starting a recording:**
```python
# Check via td_get_perf first
# If FPS < 30, do NOT start recording — fix the performance issue first
# If FPS=0, the playbar is likely paused — see pitfall #37
```
Common causes of recording empty video:
- Playbar paused (FPS=0) — see pitfall #37
- Audio device CHOP blocking the main thread — see pitfall #37b
- Recording started before audio was cued — audio is silent, GLSL outputs black, MovieFileOut records black frames that look empty
- `par.file` set in the same script as `par.record = True` — see pitfall #18
### 40. GLSL shader produces black output — test before committing to a long render
New GLSL shaders can fail silently (see pitfall #7). Before recording a long take, always:
1. **Write a minimal test shader first** that just outputs a solid color or pass-through:
```glsl
void main() {
vec2 uv = vUV.st;
fragColor = TDOutputSwizzle(vec4(uv, 0.0, 1.0));
}
```
2. **Verify the test renders correctly** via `td_get_screenshot` on the GLSL TOP's output.
3. **Swap in the real shader** and screenshot again immediately. If black, the shader has a compile error or logic issue.
4. **Only then start recording.** A 90-second ProRes recording is ~5GB. Recording black frames wastes disk and time.
Common causes of black GLSL output:
- Missing `TDOutputSwizzle()` on macOS (pitfall #8)
- Time uniform not connected — shader uses default 0.0, fractal stays at origin
- Spectrum texture not connected — audio values all 0.0, driving everything to black
- Integer division where float division was expected (`1/2 = 0` not `0.5`)
- `absTime.seconds % 1000.0` rolled over past 1000 and the modulo produces unexpected values
### 41. td_write_dat uses `text` parameter, NOT `content`
The MCP tool `td_write_dat` expects a `text` parameter for full replacement. Passing `content` returns an error: `"Provide either 'text' for full replace, or 'old_text'+'new_text' for patching"`.
If `td_write_dat` fails, fall back to `td_execute_python`:
```python
op("/project1/shader_code").text = shader_string
```
### 42. td_execute_python does NOT return stdout or print() output
Despite what earlier versions of pitfall #33 stated, `print()` and `debug()` output from `td_execute_python` scripts does NOT appear in the MCP response. The response is always just `(ok)` + FPS/error summary. To read values back, use dedicated inspection tools (`td_get_operator_info`, `td_read_dat`, `td_read_chop`) instead of trying to print from within a script.
### 43. td_get_operator_info JSON is appended with `[fps X.X/X]` — breaks json.loads()
The response text from `td_get_operator_info` has `[fps 60.0/60]` appended after the JSON object. This causes `json.loads()` to fail with "Extra data" errors. Strip it before parsing:
```python
clean = response_text.rsplit('[fps', 1)[0]
data = json.loads(clean)
```
### 44. td_get_screenshot is asynchronous — returns `{"status": "pending"}`
Screenshots don't complete instantly. The tool returns `{"status": "pending", "requestId": "..."}` and the actual file appears later. Wait a few seconds before checking for the file. There is no callback or completion notification — poll the filesystem.
### 45. Recording duration is manual — no auto-stop at audio end
MovieFileOut records until `par.record = False` is set. If audio ends before you stop recording, the file keeps growing with repeated frames. Always stop recording promptly after the audio duration. For precision: set a timer on the agent side matching the audio length, then send `par.record = False`. Trim excess with ffmpeg as a safety net:
```bash
ffmpeg -i raw.mov -t 25 -c copy trimmed.mov
```
@@ -0,0 +1,463 @@
# TouchDesigner Python API Reference
## The td Module
TouchDesigner's Python environment auto-imports the `td` module. All TD-specific classes, functions, and constants live here. Scripts inside TD (Script DATs, CHOP/DAT Execute callbacks, Extensions) have full access.
When using the MCP `execute_python_script` tool, these globals are pre-loaded:
- `op` — shortcut for `td.op()`, finds operators by path
- `ops` — shortcut for `td.ops()`, finds multiple operators by pattern
- `me` — the operator running the script (via MCP this is the twozero internal executor)
- `parent` — shortcut for `me.parent()`
- `project` — the root project component
- `td` — the full td module
## Finding Operators: op() and ops()
### op(path) — Find a single operator
```python
# Absolute path (always works from MCP)
node = op('/project1/noise1')
# Relative path (relative to current operator — only in Script DATs)
node = op('noise1') # sibling
node = op('../noise1') # parent's sibling
# Returns None if not found (does NOT raise)
node = op('/project1/nonexistent') # None
```
### ops(pattern) — Find multiple operators
```python
# Glob patterns
nodes = ops('/project1/noise*') # all nodes starting with "noise"
nodes = ops('/project1/*') # all direct children
nodes = ops('/project1/container1/*') # all children of container1
# Returns a tuple of operators (may be empty)
for n in ops('/project1/*'):
print(n.name, n.OPType)
```
### Navigation from a node
```python
node = op('/project1/noise1')
node.name # 'noise1'
node.path # '/project1/noise1'
node.OPType # 'noiseTop'
node.type # <class 'noiseTop'>
node.family # 'TOP'
# Parent / children
node.parent() # the parent COMP
node.parent().children # all siblings + self
node.parent().findChildren(name='noise*') # filtered
# Type checking
node.isTOP # True
node.isCHOP # False
node.isSOP # False
node.isDAT # False
node.isMAT # False
node.isCOMP # False
```
## Parameters
Every operator has parameters accessed via the `.par` attribute.
### Reading parameters
```python
node = op('/project1/noise1')
# Direct access
node.par.seed.val # current evaluated value (may be an expression result)
node.par.seed.eval() # same as .val
node.par.seed.default # default value
node.par.monochrome.val # boolean parameters: True/False
# List all parameters
for p in node.pars():
print(f"{p.name}: {p.val} (default: {p.default})")
# Filter by page (parameter group)
for p in node.pars('Noise'): # page name
print(f"{p.name}: {p.val}")
```
### Setting parameters
```python
# Direct value setting
node.par.seed.val = 42
node.par.monochrome.val = True
node.par.resolutionw.val = 1920
node.par.resolutionh.val = 1080
# String parameters
op('/project1/text1').par.text.val = 'Hello World'
# File paths
op('/project1/moviefilein1').par.file.val = '/path/to/video.mp4'
# Reference another operator (for "dat", "chop", "top" type parameters)
op('/project1/glsl1').par.dat.val = '/project1/shader_code'
```
### Parameter expressions
```python
# Python expressions that evaluate dynamically
node.par.seed.expr = "me.time.frame"
node.par.tx.expr = "math.sin(me.time.seconds * 2)"
# Reference another parameter
node.par.brightness1.expr = "op('/project1/constant1').par.value0.val"
# Export (one-way binding from CHOP to parameter)
# This makes the parameter follow a CHOP channel value
op('/project1/noise1').par.seed.val # can also be driven by exports
```
### Parameter types
| Type | Python Type | Example |
|------|------------|---------|
| Float | `float` | `node.par.brightness1.val = 0.5` |
| Int | `int` | `node.par.seed.val = 42` |
| Toggle | `bool` | `node.par.monochrome.val = True` |
| String | `str` | `node.par.text.val = 'hello'` |
| Menu | `int` (index) or `str` (label) | `node.par.type.val = 'sine'` |
| File | `str` (path) | `node.par.file.val = '/path/to/file'` |
| OP reference | `str` (path) | `node.par.dat.val = '/project1/text1'` |
| Color | separate r/g/b/a floats | `node.par.colorr.val = 1.0` |
| XY/XYZ | separate x/y/z floats | `node.par.tx.val = 0.5` |
## Creating and Deleting Operators
```python
# Create via parent component
parent = op('/project1')
new_node = parent.create(noiseTop) # using class reference
new_node = parent.create(noiseTop, 'my_noise') # with custom name
# The MCP create_td_node tool handles this automatically:
# create_td_node(parentPath="/project1", nodeType="noiseTop", nodeName="my_noise")
# Delete
node = op('/project1/my_noise')
node.destroy()
# Copy
original = op('/project1/noise1')
copy = parent.copy(original, name='noise1_copy')
```
## Connections (Wiring Operators)
### Output to Input connections
```python
# Connect noise1's output to level1's input
op('/project1/noise1').outputConnectors[0].connect(op('/project1/level1'))
# Connect to specific input index (for multi-input operators like Composite)
op('/project1/noise1').outputConnectors[0].connect(op('/project1/composite1').inputConnectors[0])
op('/project1/text1').outputConnectors[0].connect(op('/project1/composite1').inputConnectors[1])
# Disconnect all outputs
op('/project1/noise1').outputConnectors[0].disconnect()
# Query connections
node = op('/project1/level1')
inputs = node.inputs # list of connected input operators
outputs = node.outputs # list of connected output operators
```
### Connection patterns for common setups
```python
# Linear chain: A -> B -> C -> D
ops_list = [op(f'/project1/{name}') for name in ['noise1', 'level1', 'blur1', 'null1']]
for i in range(len(ops_list) - 1):
ops_list[i].outputConnectors[0].connect(ops_list[i+1])
# Fan-out: A -> B, A -> C, A -> D
source = op('/project1/noise1')
for target_name in ['level1', 'composite1', 'transform1']:
source.outputConnectors[0].connect(op(f'/project1/{target_name}'))
# Merge: A + B + C -> Composite
comp = op('/project1/composite1')
for i, source_name in enumerate(['noise1', 'text1', 'ramp1']):
op(f'/project1/{source_name}').outputConnectors[0].connect(comp.inputConnectors[i])
```
## DAT Content Manipulation
### Text DATs
```python
dat = op('/project1/text1')
# Read
content = dat.text # full text as string
# Write
dat.text = "new content"
dat.text = '''multi
line
content'''
# Append
dat.text += "\nnew line"
```
### Table DATs
```python
dat = op('/project1/table1')
# Read cell
val = dat[0, 0] # row 0, col 0
val = dat[0, 'name'] # row 0, column named 'name'
val = dat['key', 1] # row named 'key', col 1
# Write cell
dat[0, 0] = 'value'
# Read row/col
row = dat.row(0) # list of Cell objects
col = dat.col('name') # list of Cell objects
# Dimensions
rows = dat.numRows
cols = dat.numCols
# Append row
dat.appendRow(['col1_val', 'col2_val', 'col3_val'])
# Clear
dat.clear()
# Set entire table
dat.clear()
dat.appendRow(['name', 'value', 'type'])
dat.appendRow(['frequency', '440', 'float'])
dat.appendRow(['amplitude', '0.8', 'float'])
```
## Time and Animation
```python
# Global time
td.absTime.frame # absolute frame number (never resets)
td.absTime.seconds # absolute seconds
# Timeline time (affected by play/pause/loop)
me.time.frame # current frame on timeline
me.time.seconds # current seconds on timeline
me.time.rate # FPS setting
# Timeline control (via execute_python_script)
project.play = True
project.play = False
project.frameRange = (1, 300) # set timeline range
# Cook frame (when operator was last computed)
node.cookFrame
node.cookTime
```
## Extensions (Custom Python Classes on Components)
Extensions add custom Python methods and attributes to COMPs.
```python
# Create extension on a Base COMP
base = op('/project1/myBase')
# The extension class is defined in a Text DAT inside the COMP
# Typically named 'ExtClass' with the extension code:
extension_code = '''
class MyExtension:
def __init__(self, ownerComp):
self.ownerComp = ownerComp
self.counter = 0
def Reset(self):
self.counter = 0
def Increment(self):
self.counter += 1
return self.counter
@property
def Count(self):
return self.counter
'''
# Write extension code to DAT inside the COMP
op('/project1/myBase/extClass').text = extension_code
# Configure the extension on the COMP
base.par.extension1 = 'extClass' # name of the DAT
base.par.promoteextension1 = True # promote methods to parent
# Call extension methods
base.Increment() # calls MyExtension.Increment()
count = base.Count # accesses MyExtension.Count property
base.Reset()
```
## Useful Built-in Modules
### tdu — TouchDesigner Utilities
```python
import tdu
# Dependency tracking (reactive values)
dep = tdu.Dependency(initial_value)
dep.val = new_value # triggers dependents to recook
# File path utilities
tdu.expandPath('$HOME/Desktop/output.mov')
# Math
tdu.clamp(value, min, max)
tdu.remap(value, from_min, from_max, to_min, to_max)
```
### TDFunctions
```python
from TDFunctions import *
# Commonly used utilities
clamp(value, low, high)
remap(value, inLow, inHigh, outLow, outHigh)
interp(value1, value2, t) # linear interpolation
```
### TDStoreTools — Persistent Storage
```python
from TDStoreTools import StorageManager
# Store data that survives project reload
me.store('myKey', 'myValue')
val = me.fetch('myKey', default='fallback')
# Storage dict
me.storage['key'] = value
```
## Common Patterns via execute_python_script
### Build a complete chain
```python
# Create a complete audio-reactive noise chain
parent = op('/project1')
# Create operators
audio_in = parent.create(audiofileinChop, 'audio_in')
spectrum = parent.create(audiospectrumChop, 'spectrum')
chop_to_top = parent.create(choptopTop, 'chop_to_top')
noise = parent.create(noiseTop, 'noise1')
level = parent.create(levelTop, 'level1')
null_out = parent.create(nullTop, 'out')
# Wire the chain
audio_in.outputConnectors[0].connect(spectrum)
spectrum.outputConnectors[0].connect(chop_to_top)
noise.outputConnectors[0].connect(level)
level.outputConnectors[0].connect(null_out)
# Set parameters
audio_in.par.file = '/path/to/music.wav'
audio_in.par.play = True
spectrum.par.size = 512
noise.par.type = 1 # Sparse
noise.par.monochrome = False
noise.par.resolutionw = 1920
noise.par.resolutionh = 1080
level.par.opacity = 0.8
level.par.gamma1 = 0.7
```
### Query network state
```python
# Get all TOPs in the project
tops = [c for c in op('/project1').findChildren(type=TOP)]
for t in tops:
print(f"{t.path}: {t.OPType} {'ERROR' if t.errors() else 'OK'}")
# Find all operators with errors
def find_errors(parent_path='/project1'):
parent = op(parent_path)
errors = []
for child in parent.findChildren(depth=-1):
if child.errors():
errors.append((child.path, child.errors()))
return errors
result = find_errors()
```
### Batch parameter changes
```python
# Set parameters on multiple nodes at once
settings = {
'/project1/noise1': {'seed': 42, 'monochrome': False, 'resolutionw': 1920},
'/project1/level1': {'brightness1': 1.2, 'gamma1': 0.8},
'/project1/blur1': {'sizex': 5, 'sizey': 5},
}
for path, params in settings.items():
node = op(path)
if node:
for key, val in params.items():
setattr(node.par, key, val)
```
## Python Version and Packages
TouchDesigner bundles Python 3.11+ with these pre-installed:
- **numpy** — array operations, fast math
- **scipy** — signal processing, FFT
- **OpenCV** (cv2) — computer vision
- **PIL/Pillow** — image processing
- **requests** — HTTP client
- **json**, **re**, **os**, **sys** — standard library
**IMPORTANT:** Parameter names in examples below are illustrative. Always run discovery (SKILL.md Step 0) to get actual names for your TD version. Do NOT copy param names from these examples verbatim.
Custom packages can be installed to TD's Python site-packages directory. See TD documentation for the exact path per platform.
## SOP Vertex/Point Access (TD 2025.32)
In TD 2025.32, `td.Vertex` does NOT have `.x`, `.y`, `.z` attributes. Use index access:
```python
# WRONG — crashes in TD 2025.32:
vertex.x, vertex.y, vertex.z
# CORRECT — index/attribute access:
pt = sop.points()[i]
pos = pt.P # Position object
x, y, z = pos[0], pos[1], pos[2]
# Always introspect first:
dir(sop.points()[0]) # see what attributes actually exist
dir(sop.points()[0].P) # see Position object interface
```
@@ -0,0 +1,244 @@
# TouchDesigner Troubleshooting (twozero MCP)
> See `references/pitfalls.md` for the comprehensive lessons-learned list.
## 1. Connection Issues
### Port 40404 not responding
Check these in order:
1. Is TouchDesigner running?
```bash
pgrep TouchDesigner
```
1b. Quick hub health check (no JSON-RPC needed):
A plain GET to the MCP URL returns instance info:
```
curl -s http://localhost:40404/mcp
```
Returns: `{"hub": true, "pid": ..., "instances": {"127.0.0.1_PID": {"project": "...", "tdVersion": "...", ...}}}`
If this returns JSON but `instances` is empty, TD is running but twozero hasn't registered yet.
2. Is twozero installed in TD?
Open TD Palette Browser > twozero should be listed. If not, install it.
3. Is MCP enabled in twozero settings?
In TD, open twozero preferences and confirm MCP server is toggled ON.
4. Test the port directly:
```bash
nc -z 127.0.0.1 40404
```
5. Test the MCP endpoint:
```bash
curl -s http://localhost:40404/mcp
```
Should return JSON with hub info. If it does, the server is running.
### Hub responds but no TD instances
The twozero MCP hub is running but TD hasn't registered. Causes:
- TD project not loaded yet (still on splash screen)
- twozero COMP not initialized in the current project
- twozero version mismatch
Fix: Open/reload a TD project that contains the twozero COMP. Use td_list_instances
to check which TD instances are registered.
### Multi-instance setup
twozero auto-assigns ports for multiple TD instances:
- First instance: 40404
- Second instance: 40405
- Third instance: 40406
- etc.
Use `td_list_instances` to discover all running instances and their ports.
## 2. MCP Tool Errors
### td_execute_python returns error
The error message from td_execute_python often contains the Python traceback.
If it's unclear, use `td_read_textport` to see the full TD console output —
Python exceptions are always printed there.
Common causes:
- Syntax error in the script
- Referencing a node that doesn't exist (op() returns None, then you call .par on None)
- Using wrong parameter names (see pitfalls.md)
### td_set_operator_pars fails
Parameter name mismatch is the #1 cause. The tool validates param names and
returns clear errors, but you must use exact names.
Fix: ALWAYS call `td_get_par_info` first to discover the real parameter names:
```
td_get_par_info(op_type='glslTOP')
td_get_par_info(op_type='noiseTOP')
```
### td_create_operator type name errors
Operator type names use camelCase with family suffix:
- CORRECT: noiseTOP, glslTOP, levelTOP, compositeTOP, audiospectrumCHOP
- WRONG: NoiseTOP, noise_top, NOISE TOP, Noise
### td_get_operator_info for deep inspection
If unsure about any aspect of an operator (params, inputs, outputs, state):
```
td_get_operator_info(path='/project1/noise1', detail='full')
```
## 3. Parameter Discovery
CRITICAL: ALWAYS use td_get_par_info to discover parameter names.
The agent's LLM training data contains WRONG parameter names for TouchDesigner.
Do not trust them. Known wrong names include dat vs pixeldat, colora vs alpha,
sizex vs size, and many more. See pitfalls.md for the full list.
Workflow:
1. td_get_par_info(op_type='glslTOP') — get all params for a type
2. td_get_operator_info(path='/project1/mynode', detail='full') — get params for a specific instance
3. Use ONLY the names returned by these tools
## 4. Performance
### Diagnosing slow performance
Use `td_get_perf` to see which operators are slow. Look at cook times —
anything over 1ms per frame is worth investigating.
Common causes:
- Resolution too high (especially on Non-Commercial)
- Complex GLSL shaders
- Too many TOP-to-CHOP or CHOP-to-TOP transfers (GPU-CPU memory copies)
- Feedback loops without decay (values accumulate, memory grows)
### Non-Commercial license restrictions
- Resolution cap: 1280x1280. Setting resolutionw=1920 silently clamps to 1280.
- H.264/H.265/AV1 encoding requires Commercial license. Use ProRes or Hap instead.
- No commercial use of output.
Always check effective resolution after creation:
```python
n.cook(force=True)
actual = str(n.width) + 'x' + str(n.height)
```
## 5. Hermes Configuration
### Config location
`$HERMES_HOME/config.yaml` (defaults to `~/.hermes/config.yaml` when `HERMES_HOME` is unset)
### MCP entry format
The twozero TD entry should look like:
```yaml
mcpServers:
twozero_td:
url: http://localhost:40404/mcp
```
### After config changes
Restart the Hermes session for changes to take effect. The MCP connection is
established at session startup.
### Verifying MCP tools are available
After restarting, the session log should show twozero MCP tools registered.
If tools show as registered but aren't callable, check:
- The twozero MCP hub is still running (curl test above)
- TD is still running with a project loaded
- No firewall blocking localhost:40404
## 6. Node Creation Issues
### "Node type not found" error
Wrong type string. Use camelCase with family suffix:
- Wrong: NoiseTop, noise_top, NOISE TOP
- Right: noiseTOP
### Node created but not visible
Check parentPath — use absolute paths like /project1. The default project
root is /project1. System nodes live at /, /ui, /sys, /local, /perform.
Don't create user nodes outside /project1.
### Cannot create node inside a non-COMP
Only COMP operators (Container, Base, Geometry, etc.) can contain children.
You cannot create nodes inside a TOP, CHOP, SOP, DAT, or MAT.
## 7. Wiring Issues
### Cross-family wiring
TOPs connect to TOPs, CHOPs to CHOPs, SOPs to SOPs, DATs to DATs.
Use converter operators to bridge: choptoTOP, topToCHOP, soptoDAT, etc.
Note: choptoTOP has NO input connectors. Use par.chop reference instead:
```python
spec_tex.par.chop = resample_node # correct
# NOT: resample.outputConnectors[0].connect(spec_tex.inputConnectors[0])
```
### Feedback loops
Never create A -> B -> A directly. Use a Feedback TOP:
```python
fb = root.create(feedbackTOP, 'fb')
fb.par.top = comp.path # reference only, no wire to fb input
fb.outputConnectors[0].connect(next_node)
```
"Cook dependency loop detected" warning on the chain is expected and correct.
## 8. GLSL Issues
### Shader compilation errors are silent
GLSL TOP shows a yellow warning in the UI but node.errors() may return empty.
Check node.warnings() too. Create an Info DAT pointed at the GLSL TOP for
full compiler output.
### TD GLSL specifics
- Uses GLSL 4.60 (Vulkan backend). GLSL 3.30 and earlier removed.
- UV coordinates: vUV.st (not gl_FragCoord)
- Input textures: sTD2DInputs[0]
- Output: layout(location = 0) out vec4 fragColor
- macOS CRITICAL: Always wrap output with TDOutputSwizzle(color)
- No built-in time uniform. Pass time via GLSL TOP Values page or Constant TOP.
## 9. Recording Issues
### H.264/H.265/AV1 requires Commercial license
Use Apple ProRes on macOS (hardware accelerated, not license-restricted):
```python
rec.par.videocodec = 'prores' # Preferred on macOS — lossless, Non-Commercial OK
# rec.par.videocodec = 'mjpa' # Fallback — lossy, works everywhere
```
### MovieFileOut has no .record() method
Use the toggle parameter:
```python
rec.par.record = True # start
rec.par.record = False # stop
```
### All exported frames identical
TOP.save() captures same frame when called rapidly. Use MovieFileOut for
real-time recording. Set project.realTime = False for frame-accurate output.
@@ -0,0 +1,115 @@
#!/usr/bin/env bash
# setup.sh — Automated setup for twozero MCP plugin for TouchDesigner
# Idempotent: safe to run multiple times.
set -euo pipefail
GREEN='\033[0;32m'; RED='\033[0;31m'; YELLOW='\033[1;33m'; CYAN='\033[0;36m'; NC='\033[0m'
OK="${GREEN}${NC}"; FAIL="${RED}${NC}"; WARN="${YELLOW}${NC}"
TWOZERO_URL="https://www.404zero.com/pisang/twozero.tox"
TOX_PATH="$HOME/Downloads/twozero.tox"
HERMES_HOME_DIR="${HERMES_HOME:-$HOME/.hermes}"
HERMES_CFG="${HERMES_HOME_DIR}/config.yaml"
MCP_PORT=40404
MCP_ENDPOINT="http://localhost:${MCP_PORT}/mcp"
manual_steps=()
echo -e "\n${CYAN}═══ twozero MCP for TouchDesigner — Setup ═══${NC}\n"
# ── 1. Check if TouchDesigner is running ──
# Match on process *name* (not full cmdline) to avoid self-matching shells
# that happen to have "TouchDesigner" in their args. macOS and Linux pgrep
# both support -x for exact name match.
if pgrep -x TouchDesigner >/dev/null 2>&1 || pgrep -x TouchDesignerFTE >/dev/null 2>&1; then
echo -e " ${OK} TouchDesigner is running"
td_running=true
else
echo -e " ${WARN} TouchDesigner is not running"
td_running=false
fi
# ── 2. Ensure twozero.tox exists ──
if [[ -f "$TOX_PATH" ]]; then
echo -e " ${OK} twozero.tox already exists at ${TOX_PATH}"
else
echo -e " ${WARN} twozero.tox not found — downloading..."
if curl -fSL -o "$TOX_PATH" "$TWOZERO_URL" 2>/dev/null; then
echo -e " ${OK} Downloaded twozero.tox to ${TOX_PATH}"
else
echo -e " ${FAIL} Failed to download twozero.tox from ${TWOZERO_URL}"
echo " Please download manually and place at ${TOX_PATH}"
manual_steps+=("Download twozero.tox from ${TWOZERO_URL} to ${TOX_PATH}")
fi
fi
# ── 3. Ensure Hermes config has twozero_td MCP entry ──
if [[ ! -f "$HERMES_CFG" ]]; then
echo -e " ${FAIL} Hermes config not found at ${HERMES_CFG}"
manual_steps+=("Create ${HERMES_CFG} with twozero_td MCP server entry")
elif grep -q 'twozero_td' "$HERMES_CFG" 2>/dev/null; then
echo -e " ${OK} twozero_td MCP entry exists in Hermes config"
else
echo -e " ${WARN} Adding twozero_td MCP entry to Hermes config..."
python3 -c "
import yaml, sys, copy
cfg_path = '$HERMES_CFG'
with open(cfg_path, 'r') as f:
cfg = yaml.safe_load(f) or {}
if 'mcp_servers' not in cfg:
cfg['mcp_servers'] = {}
if 'twozero_td' not in cfg['mcp_servers']:
cfg['mcp_servers']['twozero_td'] = {
'url': '${MCP_ENDPOINT}',
'timeout': 120,
'connect_timeout': 60
}
with open(cfg_path, 'w') as f:
yaml.dump(cfg, f, default_flow_style=False, sort_keys=False)
" 2>/dev/null && echo -e " ${OK} twozero_td MCP entry added to config" \
|| { echo -e " ${FAIL} Could not update config (is PyYAML installed?)"; \
manual_steps+=("Add twozero_td MCP entry to ${HERMES_CFG} manually"); }
manual_steps+=("Restart Hermes session to pick up config change")
fi
# ── 4. Test if MCP port is responding ──
if nc -z 127.0.0.1 "$MCP_PORT" 2>/dev/null; then
echo -e " ${OK} Port ${MCP_PORT} is open"
# ── 5. Verify MCP endpoint responds ──
resp=$(curl -s --max-time 3 "$MCP_ENDPOINT" 2>/dev/null || true)
if [[ -n "$resp" ]]; then
echo -e " ${OK} MCP endpoint responded at ${MCP_ENDPOINT}"
else
echo -e " ${WARN} Port open but MCP endpoint returned empty response"
manual_steps+=("Verify MCP is enabled in twozero settings")
fi
else
echo -e " ${WARN} Port ${MCP_PORT} is not open"
if [[ "$td_running" == true ]]; then
manual_steps+=("In TD: drag twozero.tox into network editor → click Install")
manual_steps+=("Enable MCP: twozero icon → Settings → mcp → 'auto start MCP' → Yes")
else
manual_steps+=("Launch TouchDesigner")
manual_steps+=("Drag twozero.tox into the TD network editor and click Install")
manual_steps+=("Enable MCP: twozero icon → Settings → mcp → 'auto start MCP' → Yes")
fi
fi
# ── Status Report ──
echo -e "\n${CYAN}═══ Status Report ═══${NC}\n"
if [[ ${#manual_steps[@]} -eq 0 ]]; then
echo -e " ${OK} ${GREEN}Fully configured! twozero MCP is ready to use.${NC}\n"
exit 0
else
echo -e " ${WARN} ${YELLOW}Manual steps remaining:${NC}\n"
for i in "${!manual_steps[@]}"; do
echo -e " $((i+1)). ${manual_steps[$i]}"
done
echo ""
exit 1
fi
+252 -53
View File
@@ -19,6 +19,7 @@ import json
import logging
import re
import threading
import time
from typing import Any, Dict, List, Optional
from agent.memory_provider import MemoryProvider
@@ -206,13 +207,19 @@ class HonchoMemoryProvider(MemoryProvider):
self._turn_count = 0
self._injection_frequency = "every-turn" # or "first-turn"
self._context_cadence = 1 # minimum turns between context API calls
self._dialectic_cadence = 3 # minimum turns between dialectic API calls
self._dialectic_cadence = 1 # backwards-compat fallback; wizard writes 2 on new configs
self._dialectic_depth = 1 # how many .chat() calls per dialectic cycle (1-3)
self._dialectic_depth_levels: list[str] | None = None # per-pass reasoning levels
self._reasoning_level_cap: Optional[str] = None # "minimal", "low", "medium", "high"
self._reasoning_heuristic: bool = True # scale base level by query length
self._reasoning_level_cap: str = "high" # ceiling for auto-selected level
self._last_context_turn = -999
self._last_dialectic_turn = -999
# Liveness + observability state
self._prefetch_thread_started_at: float = 0.0 # monotonic ts of current thread
self._prefetch_result_fired_at: int = -999 # turn the pending result was fired at
self._dialectic_empty_streak: int = 0 # consecutive empty returns
# Port #1957: lazy session init for tools-only mode
self._session_initialized = False
self._lazy_init_kwargs: Optional[dict] = None
@@ -286,14 +293,6 @@ class HonchoMemoryProvider(MemoryProvider):
logger.debug("Honcho not configured — plugin inactive")
return
# Override peer_name with gateway user_id for per-user memory scoping.
# Only when no explicit peerName was configured — an explicit peerName
# means the user chose their identity; a raw user_id (e.g. Telegram
# chat ID) should not silently replace it.
_gw_user_id = kwargs.get("user_id")
if _gw_user_id and not cfg.peer_name:
cfg.peer_name = _gw_user_id
self._config = cfg
# ----- B1: recall_mode from config -----
@@ -305,12 +304,16 @@ class HonchoMemoryProvider(MemoryProvider):
raw = cfg.raw or {}
self._injection_frequency = raw.get("injectionFrequency", "every-turn")
self._context_cadence = int(raw.get("contextCadence", 1))
self._dialectic_cadence = int(raw.get("dialecticCadence", 3))
# Backwards-compat: unset dialecticCadence falls back to 1
# (every turn) so existing honcho.json configs without the key
# behave as they did before. New setups via `hermes honcho setup`
# get dialecticCadence=2 written explicitly by the wizard.
self._dialectic_cadence = int(raw.get("dialecticCadence", 1))
self._dialectic_depth = max(1, min(cfg.dialectic_depth, 3))
self._dialectic_depth_levels = cfg.dialectic_depth_levels
cap = raw.get("reasoningLevelCap")
if cap and cap in ("minimal", "low", "medium", "high"):
self._reasoning_level_cap = cap
self._reasoning_heuristic = cfg.reasoning_heuristic
if cfg.reasoning_level_cap in self._LEVEL_ORDER:
self._reasoning_level_cap = cfg.reasoning_level_cap
except Exception as e:
logger.debug("Honcho cost-awareness config parse error: %s", e)
@@ -352,6 +355,7 @@ class HonchoMemoryProvider(MemoryProvider):
honcho=client,
config=cfg,
context_tokens=cfg.context_tokens,
runtime_user_peer_name=kwargs.get("user_id") or None,
)
# ----- B3: resolve_session_name -----
@@ -391,14 +395,45 @@ class HonchoMemoryProvider(MemoryProvider):
except Exception as e:
logger.debug("Honcho memory file migration skipped: %s", e)
# ----- B7: Pre-warming context at init -----
# ----- B7: Pre-warming at init -----
# Context prewarm warms peer.context() (base layer), consumed via
# pop_context_result() in prefetch(). Dialectic prewarm runs the
# full configured depth and writes into _prefetch_result so turn 1
# consumes the result directly.
if self._recall_mode in ("context", "hybrid"):
try:
self._manager.prefetch_context(self._session_key)
self._manager.prefetch_dialectic(self._session_key, "What should I know about this user?")
logger.debug("Honcho pre-warm threads started for session: %s", self._session_key)
except Exception as e:
logger.debug("Honcho pre-warm failed: %s", e)
logger.debug("Honcho context prewarm failed: %s", e)
_prewarm_query = (
"Summarize what you know about this user. "
"Focus on preferences, current projects, and working style."
)
def _prewarm_dialectic() -> None:
try:
r = self._run_dialectic_depth(_prewarm_query)
except Exception as exc:
logger.debug("Honcho dialectic prewarm failed: %s", exc)
self._dialectic_empty_streak += 1
return
if r and r.strip():
with self._prefetch_lock:
self._prefetch_result = r
self._prefetch_result_fired_at = 0
# Treat prewarm as turn 0 so cadence gating starts clean.
self._last_dialectic_turn = 0
self._dialectic_empty_streak = 0
else:
self._dialectic_empty_streak += 1
self._prefetch_thread_started_at = time.monotonic()
self._prefetch_thread = threading.Thread(
target=_prewarm_dialectic, daemon=True, name="honcho-prewarm-dialectic"
)
self._prefetch_thread.start()
logger.debug("Honcho pre-warm started for session: %s", self._session_key)
def _ensure_session(self) -> bool:
"""Lazily initialize the Honcho session (for tools-only mode).
@@ -487,7 +522,8 @@ class HonchoMemoryProvider(MemoryProvider):
"# Honcho Memory\n"
"Active (tools-only mode). Use honcho_profile for a quick factual snapshot, "
"honcho_search for raw excerpts, honcho_context for raw peer context, "
"honcho_reasoning for synthesized answers, "
"honcho_reasoning for synthesized answers (pass reasoning_level "
"minimal/low/medium/high/max — you pick the depth per call), "
"honcho_conclude to save facts about the user. "
"No automatic context injection — you must use tools to access memory."
)
@@ -497,7 +533,8 @@ class HonchoMemoryProvider(MemoryProvider):
"Active (hybrid mode). Relevant context is auto-injected AND memory tools are available. "
"Use honcho_profile for a quick factual snapshot, "
"honcho_search for raw excerpts, honcho_context for raw peer context, "
"honcho_reasoning for synthesized answers, "
"honcho_reasoning for synthesized answers (pass reasoning_level "
"minimal/low/medium/high/max — you pick the depth per call), "
"honcho_conclude to save facts about the user."
)
@@ -526,6 +563,10 @@ class HonchoMemoryProvider(MemoryProvider):
if self._injection_frequency == "first-turn" and self._turn_count > 1:
return ""
# Trivial prompts ("ok", "yes", slash commands) carry no semantic signal.
if self._is_trivial_prompt(query):
return ""
parts = []
# ----- Layer 1: Base context (representation + card) -----
@@ -560,43 +601,72 @@ class HonchoMemoryProvider(MemoryProvider):
# On the very first turn, no queue_prefetch() has run yet so the
# dialectic result is empty. Run with a bounded timeout so a slow
# Honcho connection doesn't block the first response indefinitely.
# On timeout the result is skipped and queue_prefetch() will pick it
# up at the next cadence-allowed turn.
# On timeout we let the thread keep running and write its result into
# _prefetch_result under the lock, so the next turn picks it up.
#
# Skip if the session-start prewarm already filled _prefetch_result —
# firing another .chat() would be duplicate work.
with self._prefetch_lock:
_prewarm_landed = bool(self._prefetch_result)
if _prewarm_landed and self._last_dialectic_turn == -999:
self._last_dialectic_turn = self._turn_count
if self._last_dialectic_turn == -999 and query:
_first_turn_timeout = (
self._config.timeout if self._config and self._config.timeout else 8.0
)
_result_holder: list[str] = []
_fired_at = self._turn_count
def _run_first_turn() -> None:
try:
_result_holder.append(self._run_dialectic_depth(query))
r = self._run_dialectic_depth(query)
except Exception as exc:
logger.debug("Honcho first-turn dialectic failed: %s", exc)
_t = threading.Thread(target=_run_first_turn, daemon=True)
_t.start()
_t.join(timeout=_first_turn_timeout)
if not _t.is_alive():
first_turn_dialectic = _result_holder[0] if _result_holder else ""
if first_turn_dialectic and first_turn_dialectic.strip():
self._dialectic_empty_streak += 1
return
if r and r.strip():
with self._prefetch_lock:
self._prefetch_result = first_turn_dialectic
self._last_dialectic_turn = self._turn_count
else:
self._prefetch_result = r
self._prefetch_result_fired_at = _fired_at
# Advance cadence only on a non-empty result so the next
# turn retries when the call returned nothing.
self._last_dialectic_turn = _fired_at
self._dialectic_empty_streak = 0
else:
self._dialectic_empty_streak += 1
self._prefetch_thread_started_at = time.monotonic()
self._prefetch_thread = threading.Thread(
target=_run_first_turn, daemon=True, name="honcho-prefetch-first"
)
self._prefetch_thread.start()
self._prefetch_thread.join(timeout=_first_turn_timeout)
if self._prefetch_thread.is_alive():
logger.debug(
"Honcho first-turn dialectic timed out (%.1fs)"
"will inject at next cadence-allowed turn",
"Honcho first-turn dialectic still running after %.1fs — "
"will surface on next turn",
_first_turn_timeout,
)
# Don't update _last_dialectic_turn: queue_prefetch() will
# retry at the next cadence-allowed turn via the async path.
if self._prefetch_thread and self._prefetch_thread.is_alive():
self._prefetch_thread.join(timeout=3.0)
with self._prefetch_lock:
dialectic_result = self._prefetch_result
fired_at = self._prefetch_result_fired_at
self._prefetch_result = ""
self._prefetch_result_fired_at = -999
# Discard stale pending results: if the fire happened more than
# cadence × multiplier turns ago (e.g. a run of trivial-prompt turns
# passed without consumption), the content likely no longer tracks
# the current conversational pivot.
stale_limit = self._dialectic_cadence * self._STALE_RESULT_MULTIPLIER
if dialectic_result and fired_at >= 0 and (self._turn_count - fired_at) > stale_limit:
logger.debug(
"Honcho pending dialectic discarded as stale: fired_at=%d, "
"turn=%d, limit=%d", fired_at, self._turn_count, stale_limit,
)
dialectic_result = ""
if dialectic_result and dialectic_result.strip():
parts.append(dialectic_result)
@@ -641,6 +711,10 @@ class HonchoMemoryProvider(MemoryProvider):
if self._recall_mode == "tools":
return
# Trivial prompts don't warrant either a context refresh or a dialectic call.
if self._is_trivial_prompt(query):
return
# ----- Context refresh (base layer) — independent cadence -----
if self._context_cadence <= 1 or (self._turn_count - self._last_context_turn) >= self._context_cadence:
self._last_context_turn = self._turn_count
@@ -650,24 +724,46 @@ class HonchoMemoryProvider(MemoryProvider):
logger.debug("Honcho context prefetch failed: %s", e)
# ----- Dialectic prefetch (supplement layer) -----
# B5: cadence check — skip if too soon since last dialectic call
if self._dialectic_cadence > 1:
if (self._turn_count - self._last_dialectic_turn) < self._dialectic_cadence:
logger.debug("Honcho dialectic prefetch skipped: cadence %d, turns since last: %d",
self._dialectic_cadence, self._turn_count - self._last_dialectic_turn)
return
# Thread-alive guard with stale-thread recovery: a hung Honcho call
# older than timeout × multiplier is treated as dead so it can't
# block subsequent fires.
if self._thread_is_live():
logger.debug("Honcho dialectic prefetch skipped: prior thread still running")
return
self._last_dialectic_turn = self._turn_count
# Cadence gate, widened by the empty-streak backoff so a persistently
# silent backend doesn't retry every turn forever.
effective = self._effective_cadence()
if (self._turn_count - self._last_dialectic_turn) < effective:
logger.debug(
"Honcho dialectic prefetch skipped: effective cadence %d "
"(base %d, empty streak %d), turns since last: %d",
effective, self._dialectic_cadence, self._dialectic_empty_streak,
self._turn_count - self._last_dialectic_turn,
)
return
# Cadence advances only on a non-empty result so empty returns
# (transient API error, sparse representation) retry next turn.
_fired_at = self._turn_count
def _run():
try:
result = self._run_dialectic_depth(query)
if result and result.strip():
with self._prefetch_lock:
self._prefetch_result = result
except Exception as e:
logger.debug("Honcho prefetch failed: %s", e)
self._dialectic_empty_streak += 1
return
if result and result.strip():
with self._prefetch_lock:
self._prefetch_result = result
self._prefetch_result_fired_at = _fired_at
self._last_dialectic_turn = _fired_at
self._dialectic_empty_streak = 0
else:
self._dialectic_empty_streak += 1
self._prefetch_thread_started_at = time.monotonic()
self._prefetch_thread = threading.Thread(
target=_run, daemon=True, name="honcho-prefetch"
)
@@ -692,11 +788,91 @@ class HonchoMemoryProvider(MemoryProvider):
_LEVEL_ORDER = ("minimal", "low", "medium", "high", "max")
def _resolve_pass_level(self, pass_idx: int) -> str:
# Char-count thresholds for the query-length reasoning heuristic.
_HEURISTIC_LENGTH_MEDIUM = 120
_HEURISTIC_LENGTH_HIGH = 400
# Liveness constants. A thread older than timeout × multiplier is treated
# as dead so a hung Honcho call can't block future retries indefinitely.
_STALE_THREAD_MULTIPLIER = 2.0
# Pending result whose fire-turn is older than cadence × multiplier is
# discarded on read so we don't inject context for a stale conversational
# pivot after a gap of trivial-prompt turns.
_STALE_RESULT_MULTIPLIER = 2
# Cap on the empty-streak backoff so a persistently silent backend
# eventually settles on a ceiling instead of unbounded widening.
_BACKOFF_MAX = 8
def _thread_is_live(self) -> bool:
"""Thread-alive guard that treats threads older than the stale
threshold as dead, so a hung Honcho request can't block new fires."""
if not self._prefetch_thread or not self._prefetch_thread.is_alive():
return False
timeout = (self._config.timeout if self._config and self._config.timeout else 8.0)
age = time.monotonic() - self._prefetch_thread_started_at
if age > timeout * self._STALE_THREAD_MULTIPLIER:
logger.debug(
"Honcho prefetch thread age %.1fs exceeds stale threshold "
"%.1fs — treating as dead", age, timeout * self._STALE_THREAD_MULTIPLIER,
)
return False
return True
def _effective_cadence(self) -> int:
"""Cadence plus empty-streak backoff, capped at _BACKOFF_MAX × base."""
if self._dialectic_empty_streak <= 0:
return self._dialectic_cadence
widened = self._dialectic_cadence + self._dialectic_empty_streak
ceiling = self._dialectic_cadence * self._BACKOFF_MAX
return min(widened, ceiling)
def liveness_snapshot(self) -> dict:
"""In-process snapshot of dialectic liveness state for diagnostics.
Returns current turn, last successful dialectic turn, pending-result
fire turn, empty streak, effective cadence, and thread status.
"""
thread_age = None
if self._prefetch_thread and self._prefetch_thread.is_alive():
thread_age = time.monotonic() - self._prefetch_thread_started_at
return {
"turn_count": self._turn_count,
"last_dialectic_turn": self._last_dialectic_turn,
"pending_result_fired_at": self._prefetch_result_fired_at,
"empty_streak": self._dialectic_empty_streak,
"effective_cadence": self._effective_cadence(),
"thread_alive": thread_age is not None,
"thread_age_seconds": thread_age,
}
def _apply_reasoning_heuristic(self, base: str, query: str) -> str:
"""Scale `base` up by query length, clamped at reasoning_level_cap.
Char-count heuristic: +1 at >=120 chars, +2 at >=400.
"""
if not self._reasoning_heuristic or not query:
return base
if base not in self._LEVEL_ORDER:
return base
n = len(query)
if n < self._HEURISTIC_LENGTH_MEDIUM:
bump = 0
elif n < self._HEURISTIC_LENGTH_HIGH:
bump = 1
else:
bump = 2
base_idx = self._LEVEL_ORDER.index(base)
cap_idx = self._LEVEL_ORDER.index(self._reasoning_level_cap)
return self._LEVEL_ORDER[min(base_idx + bump, cap_idx)]
def _resolve_pass_level(self, pass_idx: int, query: str = "") -> str:
"""Resolve reasoning level for a given pass index.
Uses dialecticDepthLevels if configured, otherwise proportional
defaults relative to dialecticReasoningLevel.
Precedence:
1. dialecticDepthLevels (explicit per-pass) wins absolutely
2. _PROPORTIONAL_LEVELS table (depth>1 lighter-early passes)
3. Base level = dialecticReasoningLevel, optionally scaled by the
reasoning heuristic when the mapping falls through to 'base'
"""
if self._dialectic_depth_levels and pass_idx < len(self._dialectic_depth_levels):
return self._dialectic_depth_levels[pass_idx]
@@ -704,7 +880,7 @@ class HonchoMemoryProvider(MemoryProvider):
base = (self._config.dialectic_reasoning_level if self._config else "low")
mapping = self._PROPORTIONAL_LEVELS.get((self._dialectic_depth, pass_idx))
if mapping is None or mapping == "base":
return base
return self._apply_reasoning_heuristic(base, query)
return mapping
def _build_dialectic_prompt(self, pass_idx: int, prior_results: list[str], is_cold: bool) -> str:
@@ -791,7 +967,7 @@ class HonchoMemoryProvider(MemoryProvider):
break
prompt = self._build_dialectic_prompt(i, results, is_cold)
level = self._resolve_pass_level(i)
level = self._resolve_pass_level(i, query=query)
logger.debug("Honcho dialectic depth %d: pass %d, level=%s, cold=%s",
self._dialectic_depth, i, level, is_cold)
@@ -808,6 +984,29 @@ class HonchoMemoryProvider(MemoryProvider):
return r
return ""
# Prompts that carry no semantic signal — trivial acknowledgements, slash
# commands, empty input. Skipping injection here saves tokens and prevents
# stale user-model context from derailing one-word replies.
_TRIVIAL_PROMPT_RE = re.compile(
r'^(yes|no|ok|okay|sure|thanks|thank you|y|n|yep|nope|yeah|nah|'
r'continue|go ahead|do it|proceed|got it|cool|nice|great|done|next|lgtm|k)$',
re.IGNORECASE,
)
@classmethod
def _is_trivial_prompt(cls, text: str) -> bool:
"""Return True if the prompt is too trivial to warrant context injection."""
if not text:
return True
stripped = text.strip()
if not stripped:
return True
if stripped.startswith("/"):
return True
if cls._TRIVIAL_PROMPT_RE.match(stripped):
return True
return False
def on_turn_start(self, turn_number: int, message: str, **kwargs) -> None:
"""Track turn count for cadence and injection_frequency logic."""
self._turn_count = turn_number
+27 -4
View File
@@ -460,17 +460,37 @@ def cmd_setup(args) -> None:
pass # keep current
# --- 7b. Dialectic cadence ---
current_dialectic = str(hermes_host.get("dialecticCadence") or cfg.get("dialecticCadence") or "3")
current_dialectic = str(hermes_host.get("dialecticCadence") or cfg.get("dialecticCadence") or "2")
print("\n Dialectic cadence:")
print(" How often Honcho rebuilds its user model (LLM call on Honcho backend).")
print(" 1 = every turn (aggressive), 3 = every 3 turns (recommended), 5+ = sparse.")
print(" 1 = every turn, 2 = every other turn, 3+ = sparser.")
print(" Recommended: 1-5.")
new_dialectic = _prompt("Dialectic cadence", default=current_dialectic)
try:
val = int(new_dialectic)
if val >= 1:
hermes_host["dialecticCadence"] = val
except (ValueError, TypeError):
hermes_host["dialecticCadence"] = 3
hermes_host["dialecticCadence"] = 2
# --- 7c. Dialectic reasoning level ---
current_reasoning = (
hermes_host.get("dialecticReasoningLevel")
or cfg.get("dialecticReasoningLevel")
or "low"
)
print("\n Dialectic reasoning level:")
print(" Depth Honcho uses when synthesizing user context on auto-injected calls.")
print(" minimal -- quick factual lookups")
print(" low -- straightforward questions (default)")
print(" medium -- multi-aspect synthesis")
print(" high -- complex behavioral patterns")
print(" max -- thorough audit-level analysis")
new_reasoning = _prompt("Reasoning level", default=current_reasoning)
if new_reasoning in ("minimal", "low", "medium", "high", "max"):
hermes_host["dialecticReasoningLevel"] = new_reasoning
else:
hermes_host["dialecticReasoningLevel"] = "low"
# --- 8. Session strategy ---
current_strat = hermes_host.get("sessionStrategy") or cfg.get("sessionStrategy", "per-session")
@@ -636,8 +656,11 @@ def cmd_status(args) -> None:
print(f" Recall mode: {hcfg.recall_mode}")
print(f" Context budget: {hcfg.context_tokens or '(uncapped)'} tokens")
raw = getattr(hcfg, "raw", None) or {}
dialectic_cadence = raw.get("dialecticCadence") or 3
dialectic_cadence = raw.get("dialecticCadence") or 1
print(f" Dialectic cad: every {dialectic_cadence} turn{'s' if dialectic_cadence != 1 else ''}")
reasoning_cap = raw.get("reasoningLevelCap") or hcfg.reasoning_level_cap
heuristic_on = "on" if hcfg.reasoning_heuristic else "off"
print(f" Reasoning: base={hcfg.dialectic_reasoning_level}, cap={reasoning_cap}, heuristic={heuristic_on}")
print(f" Observation: user(me={hcfg.user_observe_me},others={hcfg.user_observe_others}) ai(me={hcfg.ai_observe_me},others={hcfg.ai_observe_others})")
print(f" Write freq: {hcfg.write_frequency}")
+15
View File
@@ -251,6 +251,11 @@ class HonchoClientConfig:
# matching dialectic_depth length. When None, uses proportional defaults
# derived from dialectic_reasoning_level.
dialectic_depth_levels: list[str] | None = None
# When true, the auto-injected dialectic scales reasoning level up on
# longer queries. See HonchoMemoryProvider for thresholds.
reasoning_heuristic: bool = True
# Ceiling for the heuristic-selected reasoning level.
reasoning_level_cap: str = "high"
# Honcho API limits — configurable for self-hosted instances
# Max chars per message sent via add_messages() (Honcho cloud: 25000)
message_max_chars: int = 25000
@@ -446,6 +451,16 @@ class HonchoClientConfig:
raw.get("dialecticDepthLevels"),
depth=_parse_dialectic_depth(host_block.get("dialecticDepth"), raw.get("dialecticDepth")),
),
reasoning_heuristic=_resolve_bool(
host_block.get("reasoningHeuristic"),
raw.get("reasoningHeuristic"),
default=True,
),
reasoning_level_cap=(
host_block.get("reasoningLevelCap")
or raw.get("reasoningLevelCap")
or "high"
),
message_max_chars=int(
host_block.get("messageMaxChars")
or raw.get("messageMaxChars")
+13 -42
View File
@@ -78,6 +78,7 @@ class HonchoSessionManager:
honcho: Honcho | None = None,
context_tokens: int | None = None,
config: Any | None = None,
runtime_user_peer_name: str | None = None,
):
"""
Initialize the session manager.
@@ -87,10 +88,12 @@ class HonchoSessionManager:
context_tokens: Max tokens for context() calls (None = Honcho default).
config: HonchoClientConfig from global config (provides peer_name, ai_peer,
write_frequency, observation, etc.).
runtime_user_peer_name: Gateway user identity for per-user memory scoping.
"""
self._honcho = honcho
self._context_tokens = context_tokens
self._config = config
self._runtime_user_peer_name = runtime_user_peer_name
self._cache: dict[str, HonchoSession] = {}
self._peers_cache: dict[str, Any] = {}
self._sessions_cache: dict[str, Any] = {}
@@ -100,9 +103,11 @@ class HonchoSessionManager:
self._write_frequency = write_frequency
self._turn_counter: int = 0
# Prefetch caches: session_key → last result (consumed once per turn)
# Prefetch cache: session_key → last context result (consumed once per turn).
# Dialectic results are cached on the plugin side (HonchoMemoryProvider
# ._prefetch_result) so session-start prewarm and turn-driven fires share
# one source of truth; see __init__.py _do_session_init for the prewarm.
self._context_cache: dict[str, dict] = {}
self._dialectic_cache: dict[str, str] = {}
self._prefetch_cache_lock = threading.Lock()
self._dialectic_reasoning_level: str = (
config.dialectic_reasoning_level if config else "low"
@@ -272,8 +277,10 @@ class HonchoSessionManager:
logger.debug("Local session cache hit: %s", key)
return self._cache[key]
# Use peer names from global config when available
if self._config and self._config.peer_name:
# Gateway sessions should use the runtime user identity when available.
if self._runtime_user_peer_name:
user_peer_id = self._sanitize_id(self._runtime_user_peer_name)
elif self._config and self._config.peer_name:
user_peer_id = self._sanitize_id(self._config.peer_name)
else:
# Fallback: derive from session key
@@ -499,8 +506,8 @@ class HonchoSessionManager:
Query Honcho's dialectic endpoint about a peer.
Runs an LLM on Honcho's backend against the target peer's full
representation. Higher latency than context() call async via
prefetch_dialectic() to avoid blocking the response.
representation. Higher latency than context() callers run this in
a background thread (see HonchoMemoryProvider) to avoid blocking.
Args:
session_key: The session key to query against.
@@ -555,42 +562,6 @@ class HonchoSessionManager:
logger.warning("Honcho dialectic query failed: %s", e)
return ""
def prefetch_dialectic(self, session_key: str, query: str) -> None:
"""
Fire a dialectic_query in a background thread, caching the result.
Non-blocking. The result is available via pop_dialectic_result()
on the next call (typically the following turn). Reasoning level
is selected dynamically based on query complexity.
Args:
session_key: The session key to query against.
query: The user's current message, used as the query.
"""
def _run():
result = self.dialectic_query(session_key, query)
if result:
self.set_dialectic_result(session_key, result)
t = threading.Thread(target=_run, name="honcho-dialectic-prefetch", daemon=True)
t.start()
def set_dialectic_result(self, session_key: str, result: str) -> None:
"""Store a prefetched dialectic result in a thread-safe way."""
if not result:
return
with self._prefetch_cache_lock:
self._dialectic_cache[session_key] = result
def pop_dialectic_result(self, session_key: str) -> str:
"""
Return and clear the cached dialectic result for this session.
Returns empty string if no result is ready yet.
"""
with self._prefetch_cache_lock:
return self._dialectic_cache.pop(session_key, "")
def prefetch_context(self, session_key: str, user_message: str | None = None) -> None:
"""
Fire get_prefetch_context in a background thread, caching the result.
+95 -73
View File
@@ -1054,16 +1054,6 @@ class AIAgent:
}
elif "portal.qwen.ai" in effective_base.lower():
client_kwargs["default_headers"] = _qwen_portal_headers()
elif "generativelanguage.googleapis.com" in effective_base.lower():
# Google's OpenAI-compatible endpoint only accepts x-goog-api-key.
# The OpenAI SDK auto-injects Authorization: Bearer when api_key= is
# set to a real value, causing HTTP 400 "Multiple authentication
# credentials received". Pass a placeholder so the SDK does not
# emit Bearer, and carry the real key via x-goog-api-key instead.
# Fixes: https://github.com/NousResearch/hermes-agent/issues/7893
real_key = client_kwargs["api_key"]
client_kwargs["api_key"] = "not-used"
client_kwargs["default_headers"] = {"x-goog-api-key": real_key}
else:
# No explicit creds — use the centralized provider router
from agent.auxiliary_client import resolve_provider_client
@@ -1316,31 +1306,6 @@ class AIAgent:
try:
_mem_provider_name = mem_config.get("provider", "") if mem_config else ""
# Auto-migrate: if Honcho was actively configured (enabled +
# credentials) but memory.provider is not set, activate the
# honcho plugin automatically. Just having the config file
# is not enough — the user may have disabled Honcho or the
# file may be from a different tool.
if not _mem_provider_name:
try:
from plugins.memory.honcho.client import HonchoClientConfig as _HCC
_hcfg = _HCC.from_global_config()
if _hcfg.enabled and (_hcfg.api_key or _hcfg.base_url):
_mem_provider_name = "honcho"
# Persist so this only auto-migrates once
try:
from hermes_cli.config import load_config as _lc, save_config as _sc
_cfg = _lc()
_cfg.setdefault("memory", {})["provider"] = "honcho"
_sc(_cfg)
except Exception:
pass
if not self.quiet_mode:
print(" ✓ Auto-migrated Honcho to memory provider plugin.")
print(" Your config and data are preserved.\n")
except Exception:
pass
if _mem_provider_name:
from agent.memory_manager import MemoryManager as _MemoryManager
from plugins.memory import load_memory_provider as _load_mem
@@ -1951,13 +1916,16 @@ class AIAgent:
def _should_emit_quiet_tool_messages(self) -> bool:
"""Return True when quiet-mode tool summaries should print directly.
When the caller provides ``tool_progress_callback`` (for example the CLI
TUI or a gateway progress renderer), that callback owns progress display.
Emitting quiet-mode summary lines here duplicates progress and leaks tool
previews into flows that are expected to stay silent, such as
``hermes chat -q``.
Quiet mode is used by both the interactive CLI and embedded/library
callers. The CLI may still want compact progress hints when no callback
owns rendering. Embedded/library callers, on the other hand, expect
quiet mode to be truly silent.
"""
return self.quiet_mode and not self.tool_progress_callback
return (
self.quiet_mode
and not self.tool_progress_callback
and getattr(self, "platform", "") == "cli"
)
def _emit_status(self, message: str) -> None:
"""Emit a lifecycle status message to both CLI and gateway channels.
@@ -2182,17 +2150,49 @@ class AIAgent:
return bool(cleaned.strip())
def _strip_think_blocks(self, content: str) -> str:
"""Remove reasoning/thinking blocks from content, returning only visible text."""
"""Remove reasoning/thinking blocks from content, returning only visible text.
Handles four cases:
1. Closed tag pairs (``<think></think>``) the common path when
the provider emits complete reasoning blocks.
2. Unterminated open tag at a block boundary (start of text or
after a newline) e.g. MiniMax M2.7 / NIM endpoints where the
closing tag is dropped. Everything from the open tag to end
of string is stripped. The block-boundary check mirrors
``gateway/stream_consumer.py``'s filter so models that mention
``<think>`` in prose aren't over-stripped.
3. Stray orphan open/close tags that slip through.
4. Tag variants: ``<think>``, ``<thinking>``, ``<reasoning>``,
``<REASONING_SCRATCHPAD>``, ``<thought>`` (Gemma 4), all
case-insensitive.
"""
if not content:
return ""
# Strip all reasoning tag variants: <think>, <thinking>, <THINKING>,
# <reasoning>, <REASONING_SCRATCHPAD>, <thought> (Gemma 4)
content = re.sub(r'<think>.*?</think>', '', content, flags=re.DOTALL)
# 1. Closed tag pairs — case-insensitive for all variants so
# mixed-case tags (<THINK>, <Thinking>) don't slip through to
# the unterminated-tag pass and take trailing content with them.
content = re.sub(r'<think>.*?</think>', '', content, flags=re.DOTALL | re.IGNORECASE)
content = re.sub(r'<thinking>.*?</thinking>', '', content, flags=re.DOTALL | re.IGNORECASE)
content = re.sub(r'<reasoning>.*?</reasoning>', '', content, flags=re.DOTALL)
content = re.sub(r'<REASONING_SCRATCHPAD>.*?</REASONING_SCRATCHPAD>', '', content, flags=re.DOTALL)
content = re.sub(r'<reasoning>.*?</reasoning>', '', content, flags=re.DOTALL | re.IGNORECASE)
content = re.sub(r'<REASONING_SCRATCHPAD>.*?</REASONING_SCRATCHPAD>', '', content, flags=re.DOTALL | re.IGNORECASE)
content = re.sub(r'<thought>.*?</thought>', '', content, flags=re.DOTALL | re.IGNORECASE)
content = re.sub(r'</?(?:think|thinking|reasoning|thought|REASONING_SCRATCHPAD)>\s*', '', content, flags=re.IGNORECASE)
# 2. Unterminated reasoning block — open tag at a block boundary
# (start of text, or after a newline) with no matching close.
# Strip from the tag to end of string. Fixes #8878 / #9568
# (MiniMax M2.7 leaking raw reasoning into assistant content).
content = re.sub(
r'(?:^|\n)[ \t]*<(?:think|thinking|reasoning|thought|REASONING_SCRATCHPAD)\b[^>]*>.*$',
'',
content,
flags=re.DOTALL | re.IGNORECASE,
)
# 3. Stray orphan open/close tags that slipped through.
content = re.sub(
r'</?(?:think|thinking|reasoning|thought|REASONING_SCRATCHPAD)>\s*',
'',
content,
flags=re.IGNORECASE,
)
return content
@staticmethod
@@ -5245,17 +5245,6 @@ class AIAgent:
self._client_kwargs["default_headers"] = {"User-Agent": "KimiCLI/1.30.0"}
elif "portal.qwen.ai" in normalized:
self._client_kwargs["default_headers"] = _qwen_portal_headers()
elif "generativelanguage.googleapis.com" in normalized:
# Google's endpoint rejects Bearer tokens; use x-goog-api-key instead.
# Swap the real key out of api_key and into the header so the OpenAI
# SDK does not emit Authorization: Bearer.
# Fixes: https://github.com/NousResearch/hermes-agent/issues/7893
real_key = self._client_kwargs.get("api_key", "")
if real_key and real_key != "not-used":
self._client_kwargs["api_key"] = "not-used"
self._client_kwargs["default_headers"] = {
"x-goog-api-key": real_key or self._client_kwargs.get("api_key", ""),
}
else:
self._client_kwargs.pop("default_headers", None)
@@ -5868,7 +5857,15 @@ class AIAgent:
entry["id"] = tc_delta.id
if tc_delta.function:
if tc_delta.function.name:
entry["function"]["name"] += tc_delta.function.name
# Use assignment, not +=. Function names are
# atomic identifiers delivered complete in the
# first chunk (OpenAI spec). Some providers
# (MiniMax M2.7 via NVIDIA NIM) resend the full
# name in every chunk; concatenation would
# produce "read_fileread_file". Assignment
# (matching the OpenAI Node SDK / LiteLLM /
# Vercel AI patterns) is immune to this.
entry["function"]["name"] = tc_delta.function.name
if tc_delta.function.arguments:
entry["function"]["arguments"] += tc_delta.function.arguments
extra = getattr(tc_delta, "extra_content", None)
@@ -7053,8 +7050,20 @@ class AIAgent:
if self.tools:
api_kwargs["tools"] = self.tools
if self.max_tokens is not None:
# ── max_tokens for chat_completions ──────────────────────────────
# Priority: ephemeral override (error recovery / length-continuation
# boost) > user-configured max_tokens > provider-specific defaults.
_ephemeral_out = getattr(self, "_ephemeral_max_output_tokens", None)
if _ephemeral_out is not None:
self._ephemeral_max_output_tokens = None # consume immediately
api_kwargs.update(self._max_tokens_param(_ephemeral_out))
elif self.max_tokens is not None:
api_kwargs.update(self._max_tokens_param(self.max_tokens))
elif "integrate.api.nvidia.com" in self._base_url_lower:
# NVIDIA NIM defaults to a very low max_tokens when omitted,
# causing models like GLM-4.7 to truncate immediately (thinking
# tokens alone exhaust the budget). 16384 provides adequate room.
api_kwargs.update(self._max_tokens_param(16384))
elif self._is_qwen_portal():
# Qwen Portal defaults to a very low max_tokens when omitted.
# Reasoning models (qwen3-coder-plus) exhaust that budget on
@@ -7263,6 +7272,20 @@ class AIAgent:
if reasoning_text:
reasoning_text = _sanitize_surrogates(reasoning_text)
# Strip inline reasoning tags (<think>…</think> etc.) from the stored
# assistant content. Reasoning was already captured into
# ``reasoning_text`` above (either from structured fields or the
# inline-block fallback), so the raw tags in content are redundant.
# Leaving them in place caused reasoning to leak to messaging
# platforms (#8878, #9568), inflate context on subsequent turns
# (#9306 observed 16% content-size reduction on a real MiniMax
# session), and pollute generated session titles. One strip at the
# storage boundary cleans content for every downstream consumer:
# API replay, session transcript, gateway delivery, CLI display,
# compression, title generation.
if isinstance(_san_content, str) and _san_content:
_san_content = self._strip_think_blocks(_san_content).strip()
msg = {
"role": "assistant",
"content": _san_content,
@@ -8302,7 +8325,7 @@ class AIAgent:
elif self._context_engine_tool_names and function_name in self._context_engine_tool_names:
# Context engine tools (lcm_grep, lcm_describe, lcm_expand, etc.)
spinner = None
if self.quiet_mode and not self.tool_progress_callback:
if self._should_emit_quiet_tool_messages():
face = random.choice(KawaiiSpinner.get_waiting_faces())
emoji = _get_tool_emoji(function_name)
preview = _build_tool_preview(function_name, function_args) or function_name
@@ -8320,7 +8343,7 @@ class AIAgent:
cute_msg = _get_cute_tool_message_impl(function_name, function_args, tool_duration, result=_ce_result)
if spinner:
spinner.stop(cute_msg)
elif self.quiet_mode:
elif self._should_emit_quiet_tool_messages():
self._vprint(f" {cute_msg}")
elif self._memory_manager and self._memory_manager.has_tool(function_name):
# Memory provider tools (hindsight_retain, honcho_search, etc.)
@@ -10152,7 +10175,7 @@ class AIAgent:
_dhh = _dhh_fn()
print(f"{self.log_prefix} • Check ANTHROPIC_TOKEN in {_dhh}/.env for Hermes-managed OAuth/setup tokens")
print(f"{self.log_prefix} • Check ANTHROPIC_API_KEY in {_dhh}/.env for API keys or legacy token values")
print(f"{self.log_prefix} • For API keys: verify at https://console.anthropic.com/settings/keys")
print(f"{self.log_prefix} • For API keys: verify at https://platform.claude.com/settings/keys")
print(f"{self.log_prefix} • For Claude Code: run 'claude /login' to refresh, then retry")
print(f"{self.log_prefix} • Legacy cleanup: hermes config set ANTHROPIC_TOKEN \"\"")
print(f"{self.log_prefix} • Clear stale keys: hermes config set ANTHROPIC_API_KEY \"\"")
@@ -10796,6 +10819,12 @@ class AIAgent:
continue
if restart_with_length_continuation:
# Progressively boost the output token budget on each retry.
# Retry 1 → 2× base, retry 2 → 3× base, capped at 32 768.
# Applies to all providers via _ephemeral_max_output_tokens.
_boost_base = self.max_tokens if self.max_tokens else 4096
_boost = _boost_base * (length_continue_retries + 1)
self._ephemeral_max_output_tokens = min(_boost, 32768)
continue
# Guard: if all retries exhausted without a successful response
@@ -11158,17 +11187,10 @@ class AIAgent:
self._last_content_tools_all_housekeeping = _all_housekeeping
if _all_housekeeping and self._has_stream_consumers():
self._mute_post_response = True
elif self.quiet_mode:
elif self._should_emit_quiet_tool_messages():
clean = self._strip_think_blocks(turn_content).strip()
if clean:
relayed = False
if (
self.tool_progress_callback
and getattr(self, "platform", "") == "tui"
):
relayed = True
if not relayed:
self._vprint(f" ┊ 💬 {clean}")
self._vprint(f" ┊ 💬 {clean}")
# Pop thinking-only prefill message(s) before appending
# (tool-call path — same rationale as the final-response path).
+7
View File
@@ -48,6 +48,7 @@ AUTHOR_MAP = {
"35742124+0xbyt4@users.noreply.github.com": "0xbyt4",
"82637225+kshitijk4poor@users.noreply.github.com": "kshitijk4poor",
"kshitijk4poor@users.noreply.github.com": "kshitijk4poor",
"kshitijk4poor@gmail.com": "kshitijk4poor",
"16443023+stablegenius49@users.noreply.github.com": "stablegenius49",
"185121704+stablegenius49@users.noreply.github.com": "stablegenius49",
"101283333+batuhankocyigit@users.noreply.github.com": "batuhankocyigit",
@@ -74,6 +75,8 @@ AUTHOR_MAP = {
"109555139+davetist@users.noreply.github.com": "davetist",
"39405770+yyq4193@users.noreply.github.com": "yyq4193",
"Asunfly@users.noreply.github.com": "Asunfly",
"2500400+honghua@users.noreply.github.com": "honghua",
"nish3451@users.noreply.github.com": "nish3451",
# contributors (manual mapping from git names)
"ahmedsherif95@gmail.com": "asheriif",
"liujinkun@bytedance.com": "liujinkun2025",
@@ -212,6 +215,7 @@ AUTHOR_MAP = {
"ziliangpeng@users.noreply.github.com": "ziliangpeng",
"centripetal-star@users.noreply.github.com": "centripetal-star",
"LeonSGP43@users.noreply.github.com": "LeonSGP43",
"154585401+LeonSGP43@users.noreply.github.com": "LeonSGP43",
"Lubrsy706@users.noreply.github.com": "Lubrsy706",
"niyant@spicefi.xyz": "spniyant",
"olafthiele@gmail.com": "olafthiele",
@@ -264,6 +268,9 @@ AUTHOR_MAP = {
"asurla@nvidia.com": "anniesurla",
"limkuan24@gmail.com": "WideLee",
"aviralarora002@gmail.com": "AviArora02-commits",
"junminliu@gmail.com": "JimLiu",
"jarvischer@gmail.com": "maxchernin",
"levantam.98.2324@gmail.com": "LVT382009",
}
+8
View File
@@ -229,6 +229,14 @@ async function startSocket() {
// Check allowlist for messages from others (resolve LID ↔ phone aliases)
if (!msg.key.fromMe && !matchesAllowedUser(senderId, ALLOWED_USERS, SESSION_DIR)) {
try {
console.log(JSON.stringify({
event: 'ignored',
reason: 'allowlist_mismatch',
chatId,
senderId,
}));
} catch {}
continue;
}
@@ -0,0 +1,43 @@
# Port Notes — baoyu-infographic
Ported from [JimLiu/baoyu-skills](https://github.com/JimLiu/baoyu-skills) v1.56.1.
## Changes from upstream
Only `SKILL.md` was modified. All 45 reference files are verbatim copies.
### SKILL.md adaptations
| Change | Upstream | Hermes |
|--------|----------|--------|
| Metadata namespace | `openclaw` | `hermes` |
| Trigger | `/baoyu-infographic` slash command | Natural language skill matching |
| User config | EXTEND.md file (project/user/XDG paths) | Removed — not part of Hermes infra |
| User prompts | `AskUserQuestion` (batched) | `clarify` tool (one at a time) |
| Image generation | baoyu-imagine (Bun/TypeScript) | `image_generate` tool |
| Platform support | Linux/macOS/Windows/WSL/PowerShell | Linux/macOS only |
| File operations | Bash commands | Hermes file tools (write_file, read_file) |
### What was preserved
- All layout definitions (21 files)
- All style definitions (21 files)
- Core reference files (analysis-framework, base-prompt, structured-content-template)
- Recommended combinations table
- Keyword shortcuts table
- Core principles and workflow structure
- Author, version, homepage attribution
## Syncing with upstream
To pull upstream updates:
```bash
# Compare versions
curl -sL https://raw.githubusercontent.com/JimLiu/baoyu-skills/main/skills/baoyu-infographic/SKILL.md | head -5
# Look for version: line
# Diff reference files
diff <(curl -sL https://raw.githubusercontent.com/.../references/layouts/bento-grid.md) references/layouts/bento-grid.md
```
Reference files can be overwritten directly (they're unchanged from upstream). SKILL.md must be manually merged since it contains Hermes-specific adaptations.
+236
View File
@@ -0,0 +1,236 @@
---
name: baoyu-infographic
description: Generate professional infographics with 21 layout types and 21 visual styles. Analyzes content, recommends layout×style combinations, and generates publication-ready infographics. Use when user asks to create "infographic", "visual summary", "信息图", "可视化", or "高密度信息大图".
version: 1.56.1
author: 宝玉 (JimLiu)
license: MIT
metadata:
hermes:
tags: [infographic, visual-summary, creative, image-generation]
homepage: https://github.com/JimLiu/baoyu-skills#baoyu-infographic
---
# Infographic Generator
Adapted from [baoyu-infographic](https://github.com/JimLiu/baoyu-skills) for Hermes Agent's tool ecosystem.
Two dimensions: **layout** (information structure) × **style** (visual aesthetics). Freely combine any layout with any style.
## When to Use
Trigger this skill when the user asks to create an infographic, visual summary, information graphic, or uses terms like "信息图", "可视化", or "高密度信息大图". The user provides content (text, file path, URL, or topic) and optionally specifies layout, style, aspect ratio, or language.
## Options
| Option | Values |
|--------|--------|
| Layout | 21 options (see Layout Gallery), default: bento-grid |
| Style | 21 options (see Style Gallery), default: craft-handmade |
| Aspect | Named: landscape (16:9), portrait (9:16), square (1:1). Custom: any W:H ratio (e.g., 3:4, 4:3, 2.35:1) |
| Language | en, zh, ja, etc. |
## Layout Gallery
| Layout | Best For |
|--------|----------|
| `linear-progression` | Timelines, processes, tutorials |
| `binary-comparison` | A vs B, before-after, pros-cons |
| `comparison-matrix` | Multi-factor comparisons |
| `hierarchical-layers` | Pyramids, priority levels |
| `tree-branching` | Categories, taxonomies |
| `hub-spoke` | Central concept with related items |
| `structural-breakdown` | Exploded views, cross-sections |
| `bento-grid` | Multiple topics, overview (default) |
| `iceberg` | Surface vs hidden aspects |
| `bridge` | Problem-solution |
| `funnel` | Conversion, filtering |
| `isometric-map` | Spatial relationships |
| `dashboard` | Metrics, KPIs |
| `periodic-table` | Categorized collections |
| `comic-strip` | Narratives, sequences |
| `story-mountain` | Plot structure, tension arcs |
| `jigsaw` | Interconnected parts |
| `venn-diagram` | Overlapping concepts |
| `winding-roadmap` | Journey, milestones |
| `circular-flow` | Cycles, recurring processes |
| `dense-modules` | High-density modules, data-rich guides |
Full definitions: `references/layouts/<layout>.md`
## Style Gallery
| Style | Description |
|-------|-------------|
| `craft-handmade` | Hand-drawn, paper craft (default) |
| `claymation` | 3D clay figures, stop-motion |
| `kawaii` | Japanese cute, pastels |
| `storybook-watercolor` | Soft painted, whimsical |
| `chalkboard` | Chalk on black board |
| `cyberpunk-neon` | Neon glow, futuristic |
| `bold-graphic` | Comic style, halftone |
| `aged-academia` | Vintage science, sepia |
| `corporate-memphis` | Flat vector, vibrant |
| `technical-schematic` | Blueprint, engineering |
| `origami` | Folded paper, geometric |
| `pixel-art` | Retro 8-bit |
| `ui-wireframe` | Grayscale interface mockup |
| `subway-map` | Transit diagram |
| `ikea-manual` | Minimal line art |
| `knolling` | Organized flat-lay |
| `lego-brick` | Toy brick construction |
| `pop-laboratory` | Blueprint grid, coordinate markers, lab precision |
| `morandi-journal` | Hand-drawn doodle, warm Morandi tones |
| `retro-pop-grid` | 1970s retro pop art, Swiss grid, thick outlines |
| `hand-drawn-edu` | Macaron pastels, hand-drawn wobble, stick figures |
Full definitions: `references/styles/<style>.md`
## Recommended Combinations
| Content Type | Layout + Style |
|--------------|----------------|
| Timeline/History | `linear-progression` + `craft-handmade` |
| Step-by-step | `linear-progression` + `ikea-manual` |
| A vs B | `binary-comparison` + `corporate-memphis` |
| Hierarchy | `hierarchical-layers` + `craft-handmade` |
| Overlap | `venn-diagram` + `craft-handmade` |
| Conversion | `funnel` + `corporate-memphis` |
| Cycles | `circular-flow` + `craft-handmade` |
| Technical | `structural-breakdown` + `technical-schematic` |
| Metrics | `dashboard` + `corporate-memphis` |
| Educational | `bento-grid` + `chalkboard` |
| Journey | `winding-roadmap` + `storybook-watercolor` |
| Categories | `periodic-table` + `bold-graphic` |
| Product Guide | `dense-modules` + `morandi-journal` |
| Technical Guide | `dense-modules` + `pop-laboratory` |
| Trendy Guide | `dense-modules` + `retro-pop-grid` |
| Educational Diagram | `hub-spoke` + `hand-drawn-edu` |
| Process Tutorial | `linear-progression` + `hand-drawn-edu` |
Default: `bento-grid` + `craft-handmade`
## Keyword Shortcuts
When user input contains these keywords, **auto-select** the associated layout and offer associated styles as top recommendations in Step 3. Skip content-based layout inference for matched keywords.
If a shortcut has **Prompt Notes**, append them to the generated prompt (Step 5) as additional style instructions.
| User Keyword | Layout | Recommended Styles | Default Aspect | Prompt Notes |
|--------------|--------|--------------------|----------------|--------------|
| 高密度信息大图 / high-density-info | `dense-modules` | `morandi-journal`, `pop-laboratory`, `retro-pop-grid` | portrait | — |
| 信息图 / infographic | `bento-grid` | `craft-handmade` | landscape | Minimalist: clean canvas, ample whitespace, no complex background textures. Simple cartoon elements and icons only. |
## Output Structure
```
infographic/{topic-slug}/
├── source-{slug}.{ext}
├── analysis.md
├── structured-content.md
├── prompts/infographic.md
└── infographic.png
```
Slug: 2-4 words kebab-case from topic. Conflict: append `-YYYYMMDD-HHMMSS`.
## Core Principles
- Preserve source data faithfully — no summarization or rephrasing (but **strip any credentials, API keys, tokens, or secrets** before including in outputs)
- Define learning objectives before structuring content
- Structure for visual communication (headlines, labels, visual elements)
## Workflow
### Step 1: Analyze Content
**Load references**: Read `references/analysis-framework.md` from this skill.
1. Save source content (file path or paste → `source.md` using `write_file`)
- **Backup rule**: If `source.md` exists, rename to `source-backup-YYYYMMDD-HHMMSS.md`
2. Analyze: topic, data type, complexity, tone, audience
3. Detect source language and user language
4. Extract design instructions from user input
5. Save analysis to `analysis.md`
- **Backup rule**: If `analysis.md` exists, rename to `analysis-backup-YYYYMMDD-HHMMSS.md`
See `references/analysis-framework.md` for detailed format.
### Step 2: Generate Structured Content → `structured-content.md`
Transform content into infographic structure:
1. Title and learning objectives
2. Sections with: key concept, content (verbatim), visual element, text labels
3. Data points (all statistics/quotes copied exactly)
4. Design instructions from user
**Rules**: Markdown only. No new information. Preserve data faithfully. Strip any credentials or secrets from output.
See `references/structured-content-template.md` for detailed format.
### Step 3: Recommend Combinations
**3.1 Check Keyword Shortcuts first**: If user input matches a keyword from the **Keyword Shortcuts** table, auto-select the associated layout and prioritize associated styles as top recommendations. Skip content-based layout inference.
**3.2 Otherwise**, recommend 3-5 layout×style combinations based on:
- Data structure → matching layout
- Content tone → matching style
- Audience expectations
- User design instructions
### Step 4: Confirm Options
Use the `clarify` tool to confirm options with the user. Since `clarify` handles one question at a time, ask the most important question first:
**Q1 — Combination**: Present 3+ layout×style combos with rationale. Ask user to pick one.
**Q2 — Aspect**: Ask for aspect ratio preference (landscape/portrait/square or custom W:H).
**Q3 — Language** (only if source ≠ user language): Ask which language the text content should use.
### Step 5: Generate Prompt → `prompts/infographic.md`
**Backup rule**: If `prompts/infographic.md` exists, rename to `prompts/infographic-backup-YYYYMMDD-HHMMSS.md`
**Load references**: Read the selected layout from `references/layouts/<layout>.md` and style from `references/styles/<style>.md`.
Combine:
1. Layout definition from `references/layouts/<layout>.md`
2. Style definition from `references/styles/<style>.md`
3. Base template from `references/base-prompt.md`
4. Structured content from Step 2
5. All text in confirmed language
**Aspect ratio resolution** for `{{ASPECT_RATIO}}`:
- Named presets → ratio string: landscape→`16:9`, portrait→`9:16`, square→`1:1`
- Custom W:H ratios → use as-is (e.g., `3:4`, `4:3`, `2.35:1`)
Save the assembled prompt to `prompts/infographic.md` using `write_file`.
### Step 6: Generate Image
Use the `image_generate` tool with the assembled prompt from Step 5.
- Map aspect ratio to image_generate's format: `16:9``landscape`, `9:16``portrait`, `1:1``square`
- For custom ratios, pick the closest named aspect
- On failure, auto-retry once
- Save the resulting image URL/path to the output directory
### Step 7: Output Summary
Report: topic, layout, style, aspect, language, output path, files created.
## References
- `references/analysis-framework.md` — Analysis methodology
- `references/structured-content-template.md` — Content format
- `references/base-prompt.md` — Prompt template
- `references/layouts/<layout>.md` — 21 layout definitions
- `references/styles/<style>.md` — 21 style definitions
## Pitfalls
1. **Data integrity is paramount** — never summarize, paraphrase, or alter source statistics. "73% increase" must stay "73% increase", not "significant increase".
2. **Strip secrets** — always scan source content for API keys, tokens, or credentials before including in any output file.
3. **One message per section** — each infographic section should convey one clear concept. Overloading sections reduces readability.
4. **Style consistency** — the style definition from the references file must be applied consistently across the entire infographic. Don't mix styles.
5. **image_generate aspect ratios** — the tool only supports `landscape`, `portrait`, and `square`. Custom ratios like `3:4` should map to the nearest option (portrait in that case).
@@ -0,0 +1,182 @@
# Infographic Content Analysis Framework
Deep analysis framework applying instructional design principles to infographic creation.
## Purpose
Before creating an infographic, thoroughly analyze the source material to:
- Understand the content at a deep level
- Identify clear learning objectives for the viewer
- Structure information for maximum clarity and retention
- Match content to optimal layout×style combinations
- Preserve all source data verbatim
## Instructional Design Mindset
Approach content analysis as a **world-class instructional designer**:
| Principle | Application |
|-----------|-------------|
| **Deep Understanding** | Read the entire document before analyzing any part |
| **Learner-Centered** | Focus on what the viewer needs to understand |
| **Visual Storytelling** | Use visuals to communicate, not just decorate |
| **Cognitive Load** | Simplify complex ideas without losing accuracy |
| **Data Integrity** | Never alter, summarize, or paraphrase source facts |
## Analysis Dimensions
### 1. Content Type Classification
| Type | Characteristics | Best Layout | Best Style |
|------|-----------------|-------------|------------|
| **Timeline/History** | Sequential events, dates, progression | linear-progression | craft-handmade, aged-academia |
| **Process/Tutorial** | Step-by-step instructions, how-to | linear-progression, winding-roadmap | ikea-manual, technical-schematic |
| **Comparison** | A vs B, pros/cons, before-after | binary-comparison, comparison-matrix | corporate-memphis, bold-graphic |
| **Hierarchy** | Levels, priorities, pyramids | hierarchical-layers, tree-branching | craft-handmade, corporate-memphis |
| **Relationships** | Connections, overlaps, influences | venn-diagram, hub-spoke, jigsaw | craft-handmade, subway-map |
| **Data/Metrics** | Statistics, KPIs, measurements | dashboard, periodic-table | corporate-memphis, technical-schematic |
| **Cycle/Loop** | Recurring processes, feedback loops | circular-flow | craft-handmade, technical-schematic |
| **System/Structure** | Components, architecture, anatomy | structural-breakdown, bento-grid | technical-schematic, ikea-manual |
| **Journey/Narrative** | Stories, user flows, milestones | winding-roadmap, story-mountain | storybook-watercolor, comic-strip |
| **Overview/Summary** | Multiple topics, feature highlights | bento-grid, periodic-table, dense-modules | chalkboard, bold-graphic |
| **Product/Buying Guide** | Multi-dimension comparisons, specs, pitfalls | dense-modules | morandi-journal, pop-laboratory, retro-pop-grid |
### 2. Learning Objective Identification
Every infographic should have 1-3 clear learning objectives.
**Good Learning Objectives**:
- Specific and measurable
- Focus on what the viewer will understand, not just see
- Written from the viewer's perspective
**Format**: "After viewing this infographic, the viewer will understand..."
| Content Aspect | Objective Type |
|----------------|----------------|
| Core concept | "...what [topic] is and why it matters" |
| Process | "...how to [accomplish something]" |
| Comparison | "...the key differences between [A] and [B]" |
| Relationships | "...how [elements] connect to each other" |
| Data | "...the significance of [key statistics]" |
### 3. Audience Analysis
| Factor | Questions | Impact |
|--------|-----------|--------|
| **Knowledge Level** | What do they already know? | Determines complexity depth |
| **Context** | Why are they viewing this? | Determines emphasis points |
| **Expectations** | What do they hope to learn? | Determines success criteria |
| **Visual Preferences** | Professional, playful, technical? | Influences style choice |
### 4. Complexity Assessment
| Level | Indicators | Layout Recommendation |
|-------|------------|----------------------|
| **Simple** (3-5 points) | Few main concepts, clear relationships | sparse layouts, single focus |
| **Moderate** (6-8 points) | Multiple concepts, some relationships | balanced layouts, clear sections |
| **Complex** (9+ points) | Many concepts, intricate relationships | dense layouts, multiple sections |
### 5. Visual Opportunity Mapping
Identify what can be shown rather than told:
| Content Element | Visual Treatment |
|-----------------|------------------|
| Numbers/Statistics | Large, highlighted numerals |
| Comparisons | Side-by-side, split screen |
| Processes | Arrows, numbered steps, flow |
| Hierarchies | Pyramids, layers, size differences |
| Relationships | Lines, connections, overlapping shapes |
| Categories | Color coding, grouping, sections |
| Timelines | Horizontal/vertical progression |
| Quotes | Callout boxes, quotation marks |
### 6. Data Verbatim Extraction
**Critical**: All factual information must be preserved exactly as written in the source.
| Data Type | Handling Rule |
|-----------|---------------|
| **Statistics** | Copy exactly: "73%" not "about 70%" |
| **Quotes** | Copy word-for-word with attribution |
| **Names** | Preserve exact spelling |
| **Dates** | Keep original format |
| **Technical Terms** | Do not simplify or substitute |
| **Lists** | Preserve order and wording |
**Never**:
- Round numbers
- Paraphrase quotes
- Substitute simpler words
- Add implied information
- Remove context that affects meaning
## Output Format
Save analysis results to `analysis.md`:
```yaml
---
title: "[Main topic title]"
topic: "[educational/technical/business/creative/etc.]"
data_type: "[timeline/hierarchy/comparison/process/etc.]"
complexity: "[simple/moderate/complex]"
point_count: [number of main points]
source_language: "[detected language]"
user_language: "[user's language]"
---
## Main Topic
[1-2 sentence summary of what this content is about]
## Learning Objectives
After viewing this infographic, the viewer should understand:
1. [Primary objective]
2. [Secondary objective]
3. [Tertiary objective if applicable]
## Target Audience
- **Knowledge Level**: [Beginner/Intermediate/Expert]
- **Context**: [Why they're viewing this]
- **Expectations**: [What they hope to learn]
## Content Type Analysis
- **Data Structure**: [How information relates to itself]
- **Key Relationships**: [What connects to what]
- **Visual Opportunities**: [What can be shown rather than told]
## Key Data Points (Verbatim)
[All statistics, quotes, and critical facts exactly as they appear in source]
- "[Exact data point 1]"
- "[Exact data point 2]"
- "[Exact quote with attribution]"
## Layout × Style Signals
- Content type: [type] → suggests [layout]
- Tone: [tone] → suggests [style]
- Audience: [audience] → suggests [style]
- Complexity: [level] → suggests [layout density]
## Design Instructions (from user input)
[Any style, color, layout, or visual preferences extracted from user's steering prompt]
## Recommended Combinations
1. **[Layout] + [Style]** (Recommended): [Brief rationale]
2. **[Layout] + [Style]**: [Brief rationale]
3. **[Layout] + [Style]**: [Brief rationale]
```
## Analysis Checklist
Before proceeding to structured content generation:
- [ ] Have I read the entire source document?
- [ ] Can I summarize the main topic in 1-2 sentences?
- [ ] Have I identified 1-3 clear learning objectives?
- [ ] Do I understand the target audience?
- [ ] Have I classified the content type correctly?
- [ ] Have I extracted all data points verbatim?
- [ ] Have I identified visual opportunities?
- [ ] Have I extracted design instructions from user input?
- [ ] Have I recommended 3 layout×style combinations?
@@ -0,0 +1,43 @@
Create a professional infographic following these specifications:
## Image Specifications
- **Type**: Infographic
- **Layout**: {{LAYOUT}}
- **Style**: {{STYLE}}
- **Aspect Ratio**: {{ASPECT_RATIO}}
- **Language**: {{LANGUAGE}}
## Core Principles
- Follow the layout structure precisely for information architecture
- Apply style aesthetics consistently throughout
- If content involves sensitive or copyrighted figures, create stylistically similar alternatives
- Keep information concise, highlight keywords and core concepts
- Use ample whitespace for visual clarity
- Maintain clear visual hierarchy
## Text Requirements
- All text must match the specified style treatment
- Main titles should be prominent and readable
- Key concepts should be visually emphasized
- Labels should be clear and appropriately sized
- Use the specified language for all text content
## Layout Guidelines
{{LAYOUT_GUIDELINES}}
## Style Guidelines
{{STYLE_GUIDELINES}}
---
Generate the infographic based on the content below:
{{CONTENT}}
Text labels (in {{LANGUAGE}}):
{{TEXT_LABELS}}
@@ -0,0 +1,41 @@
# bento-grid
Modular grid layout with varied cell sizes, like a bento box.
## Structure
- Grid of rectangular cells
- Mixed cell sizes (1x1, 2x1, 1x2, 2x2)
- No strict symmetry required
- Hero cell for main point
- Supporting cells around it
## Best For
- Multiple topic overview
- Feature highlights
- Dashboard summaries
- Portfolio displays
- Mixed content types
## Visual Elements
- Clear cell boundaries
- Varied cell backgrounds
- Icons or illustrations per cell
- Consistent padding/margins
- Visual hierarchy through size
## Text Placement
- Main title at top
- Cell titles within each cell
- Brief content per cell
- Minimal text, maximum visual
- CTA or summary in prominent cell
## Recommended Pairings
- `craft-handmade`: Friendly overviews (default)
- `corporate-memphis`: Business summaries
- `pixel-art`: Retro feature grids
@@ -0,0 +1,48 @@
# binary-comparison
Side-by-side comparison of two items, states, or concepts.
## Structure
- Vertical divider splitting image in half
- Left side: Item A / Before / Pro
- Right side: Item B / After / Con
- Mirrored layout for easy comparison
- Clear visual distinction between sides
## Variants
| Variant | Focus | Visual Emphasis |
|---------|-------|-----------------|
| **Before-After** | Transformation over time | Temporal change, improvement |
| **A vs B** | Feature comparison | Direct contrast, differences |
| **Pro-Con** | Advantages/disadvantages | Balanced evaluation |
## Best For
- Before/after transformations
- Product or option comparisons
- Pros and cons analysis
- Old vs new comparisons
- Two perspectives on a topic
## Visual Elements
- Strong vertical dividing line or gradient
- Contrasting colors per side
- Matching element positions for comparison
- VS symbol or divider decoration
- Transformation arrow for before-after
## Text Placement
- Main title centered at top
- Side labels (A/B, Before/After)
- Corresponding points aligned horizontally
- Summary at bottom if needed
## Recommended Pairings
- `corporate-memphis`: Business comparisons
- `bold-graphic`: High-contrast dramatic comparisons
- `craft-handmade`: Friendly explainers
@@ -0,0 +1,41 @@
# bridge
Gap-crossing structure connecting problem to solution or current to future state.
## Structure
- Left side: current state/problem
- Right side: desired state/solution
- Bridge element spanning the gap
- Gap representing challenge/obstacle
- Bridge elements as steps/methods
## Best For
- Problem to solution journeys
- Current vs future state
- Gap analysis
- Transformation bridges
- Strategic initiatives
## Visual Elements
- Two distinct platforms/sides
- Visible gap or chasm
- Bridge structure with supports
- Icons representing each side
- Stepping stones or bridge planks
## Text Placement
- Title at top
- Left label (From/Problem/Current)
- Right label (To/Solution/Future)
- Bridge elements labeled
- Gap description below
## Recommended Pairings
- `cartoon-hand-drawn`: Friendly journeys
- `corporate-memphis`: Business transformations
- `isometric-3d`: Technical transitions
@@ -0,0 +1,41 @@
# circular-flow
Cyclic process showing continuous or recurring steps.
## Structure
- Circular arrangement
- Steps around the circle
- Arrows showing direction
- No clear start/end (continuous)
- Center can hold main concept
## Best For
- Recurring processes
- Feedback loops
- Lifecycle stages
- Continuous improvement
- Natural cycles
## Visual Elements
- Circle or ring shape
- Directional arrows
- Step nodes evenly spaced
- Icons per step
- Optional center element
## Text Placement
- Title at top
- Step labels at each node
- Brief descriptions near nodes
- Center concept if applicable
- Cycle name
## Recommended Pairings
- `cartoon-hand-drawn`: Friendly cycles
- `corporate-memphis`: Business processes
- `subway-map`: Transit-style cycles
@@ -0,0 +1,41 @@
# comic-strip
Sequential narrative panels telling a story or explaining a concept.
## Structure
- Multiple panels in sequence
- Left-to-right, top-to-bottom reading
- Characters or subjects in scenes
- Speech/thought bubbles
- Panel borders clearly defined
## Best For
- Storytelling explanations
- User journey narratives
- Scenario illustrations
- Step sequences with context
- Before/during/after stories
## Visual Elements
- Panel frames
- Speech and thought bubbles
- Sound effects (optional)
- Characters with expressions
- Scene backgrounds
## Text Placement
- Title at top
- Dialogue in speech bubbles
- Narration in caption boxes
- Sound effects integrated
- Panel numbers if needed
## Recommended Pairings
- `graphic-novel`: Dramatic narratives
- `kawaii`: Cute character stories
- `cartoon-hand-drawn`: Friendly explanations
@@ -0,0 +1,41 @@
# comparison-matrix
Grid-based multi-factor comparison across multiple items.
## Structure
- Table/grid layout
- Rows: items being compared
- Columns: comparison criteria
- Cells: scores, checks, or values
- Header row and column clearly marked
## Best For
- Product feature comparisons
- Tool/software evaluations
- Multi-criteria decisions
- Specification sheets
- Rating comparisons
## Visual Elements
- Clear grid lines or cell boundaries
- Checkmarks, X marks, or scores in cells
- Color coding for quick scanning
- Icons for criteria categories
- Highlight for recommended option
## Text Placement
- Title at top
- Item names in first column
- Criteria in header row
- Brief values in cells
- Legend if using symbols
## Recommended Pairings
- `corporate-memphis`: Business tool comparisons
- `ui-wireframe`: Technical feature matrices
- `blueprint`: Specification comparisons
@@ -0,0 +1,41 @@
# dashboard
Multi-metric display with charts, numbers, and KPI indicators.
## Structure
- Multiple data widgets
- Charts, graphs, numbers
- Grid or modular layout
- Key metrics prominent
- Status indicators
## Best For
- KPI summaries
- Performance metrics
- Analytics overviews
- Status reports
- Data snapshots
## Visual Elements
- Chart types (bar, line, pie, gauge)
- Big numbers for KPIs
- Trend arrows (up/down)
- Color-coded status (green/red)
- Clean data visualization
## Text Placement
- Title at top
- Widget titles above each section
- Metric labels and values
- Units clearly shown
- Time period indicated
## Recommended Pairings
- `corporate-memphis`: Business dashboards
- `ui-wireframe`: Technical dashboards
- `cyberpunk-neon`: Futuristic displays
@@ -0,0 +1,72 @@
# dense-modules
High-density modular layout with 6-7 typed information modules packed with concrete data.
## Structure
- 6-7 distinct modules per image, each serving a specific information function
- Every module contains concrete data: brand names, numbers, percentages, parameters
- Minimal whitespace—compact spacing prioritized over breathing room
- Smaller text acceptable to maximize information density
- Each module identified by coordinate label or section marker (e.g., MOD-1, SEC-A)
## Module Archetypes
| Module | Purpose | Content Requirements |
|--------|---------|---------------------|
| **Brand/Selection Array** | Grid of options with recommendations | 4-8 items with icons, names, brief descriptions; highlight "best choice" |
| **Specification Scale** | Quality/measurement gauge | 3-5 levels with precise numerical increments, quality indicators (emoji faces, checkmarks) |
| **Deep Dive/Detail** | Technical breakdown of key item | Zoom-in callouts, internal components, cross-section or exploded view |
| **Scenario Comparison** | Side-by-side use cases | 3-6 scenarios with specific recommendations and data per scenario |
| **Identification Tips** | How-to checklist | 3-5 inspection methods: look/test/check/ask format |
| **Warning/Pitfall Zone** | Critical mistakes to avoid | 3-5 pitfalls with consequences, 1-2 correct approaches; high visual contrast |
| **Quick Reference** | Compact summary | Dense table, one-line summaries, decision flowchart, or key takeaways |
## Variants
| Variant | Focus | Visual Emphasis |
|---------|-------|-----------------|
| **Coordinate-labeled** | Precision and systematicity | Each module has alphanumeric coordinate (A-01, B-05, C-12), ruler/axis markers |
| **Grid-cell** | Order and structure | Modules in strict rectangular cells divided by thick lines, Swiss grid feel |
| **Free-flowing** | Organic density | Magazine-style layout with dotted frames, varying module sizes, connected by arrows |
## Best For
- Product selection guides and buying guides
- Multi-dimensional comparison content
- Data-rich educational materials
- "Avoid pitfalls" / "complete guide" formats
- Content targeting platforms like Xiaohongshu with high-density visual requirements
## Visual Elements
- Module boundary markers (thick lines, dotted frames, or coordinate grids)
- Quality indicators per module (emoji faces, checkmarks, crosses, crowns)
- Data callout boxes with highlighted numbers
- Comparison arrows and progression indicators
- Warning/alert visual markers for pitfall modules
- Metadata in corners (page numbers, timestamps, small barcodes)
## Text Placement
- Main title at top, prominent and impactful
- Subtitle with module count ("X大维度全面解析...")
- Module headers inside colored badges or labeled frames
- Body text compact, multiple columns within modules
- Numbers highlighted with accent colors, slightly larger than body text
## Information Density Rules
- Every corner should contain useful information or metadata
- No decorative-only empty space
- Text size may be reduced to fit more content—information over font size
- Each module must have specific data points, not generic descriptions
- Balance between density and readability: dense but organized
## Recommended Pairings
- `pop-laboratory`: Technical precision with coordinate markers and blueprint grid
- `morandi-journal`: Hand-drawn warmth with doodle illustrations and organic frames
- `retro-pop-grid`: 1970s pop art with strict grid cells and bold contrast
- `corporate-memphis`: Clean business feel for product comparisons
- `technical-schematic`: Engineering precision for technical product guides
@@ -0,0 +1,41 @@
# funnel
Narrowing stages showing conversion, filtering, or refinement process.
## Structure
- Wide top (input/start)
- Narrow bottom (output/result)
- Horizontal layers for stages
- Progressive narrowing
- 3-6 stages typically
## Best For
- Sales/marketing funnels
- Conversion processes
- Filtering/selection
- Recruitment pipelines
- Decision processes
## Visual Elements
- Funnel shape clearly defined
- Distinct colors per stage
- Width indicates volume/quantity
- Stage icons or symbols
- Numbers/percentages per stage
## Text Placement
- Title at top
- Stage names inside or beside
- Metrics/numbers per stage
- Input label at top
- Output label at bottom
## Recommended Pairings
- `corporate-memphis`: Marketing funnels
- `isometric-3d`: Technical pipelines
- `cartoon-hand-drawn`: Educational funnels
@@ -0,0 +1,48 @@
# hierarchical-layers
Nested layers showing levels of importance, influence, or proximity.
## Structure
- Multiple layers from core to periphery
- Core/top: most important/central
- Outer/bottom: decreasing importance
- 3-7 levels typically
- Clear boundaries between levels
## Variants
| Variant | Shape | Visual Emphasis |
|---------|-------|-----------------|
| **Pyramid** | Triangle, vertical | Top-down hierarchy, quantity |
| **Concentric** | Rings, radial | Center-out influence, proximity |
## Best For
- Maslow's hierarchy style concepts
- Priority and importance levels
- Spheres of influence
- Organizational structures
- Stakeholder analysis
## Visual Elements
- Distinct color per level
- Icons or illustrations per tier
- Size indicates importance/quantity
- Labels inside or beside layers
- Decorative apex/center element
## Text Placement
- Title at top or side
- Level names inside each tier
- Brief descriptions outside
- Quantities or percentages if relevant
- Legend for color meanings
## Recommended Pairings
- `craft-handmade`: Playful layered concepts
- `corporate-memphis`: Business hierarchies
- `technical-schematic`: Technical 3D pyramids
@@ -0,0 +1,41 @@
# hub-spoke
Central concept with radiating connections to related items.
## Structure
- Central hub (main concept)
- Spokes radiating outward
- Nodes at spoke ends (related concepts)
- Even or weighted distribution
- Optional secondary connections
## Best For
- Central theme with components
- Product features around core
- Team roles around project
- Ecosystem mapping
- Mind maps
## Visual Elements
- Prominent central hub
- Clear spoke lines
- Consistent node styling
- Icons representing each spoke item
- Optional grouping colors
## Text Placement
- Title at top
- Core concept in center hub
- Spoke item labels at nodes
- Brief descriptions near nodes
- Connection labels on spokes if needed
## Recommended Pairings
- `cartoon-hand-drawn`: Friendly concept maps
- `corporate-memphis`: Business ecosystems
- `subway-map`: Network-style connections
@@ -0,0 +1,41 @@
# iceberg
Surface vs hidden depths, visible vs underlying factors.
## Structure
- Waterline dividing visible/hidden
- Tip above water (obvious/surface)
- Larger mass below (hidden/deep)
- Proportional to emphasize hidden depth
- Optional layers within underwater section
## Best For
- Surface vs root causes
- Visible vs invisible work
- Symptoms vs underlying issues
- Public vs private aspects
- Known vs unknown factors
## Visual Elements
- Clear water/surface line
- Above: smaller, brighter
- Below: larger, darker/deeper
- Wave or water texture
- Gradient showing depth
## Text Placement
- Title at top
- Surface items above waterline
- Hidden items below, larger
- Waterline label optional
- Depth indicators for layers
## Recommended Pairings
- `cartoon-hand-drawn`: Friendly metaphor
- `storybook-watercolor`: Artistic depth
- `graphic-novel`: Dramatic revelation
@@ -0,0 +1,41 @@
# isometric-map
3D-style spatial layout showing locations, relationships, or journey through space.
## Structure
- Isometric 3D perspective
- Locations as buildings/landmarks
- Paths connecting locations
- Spatial relationships visible
- Bird's eye view angle
## Best For
- Office/campus layouts
- City/ecosystem maps
- User journey maps
- System architecture
- Process landscapes
## Visual Elements
- Consistent isometric angle (30°)
- 3D buildings or objects
- Pathways and roads
- Labels floating above
- Mini scenes at locations
## Text Placement
- Title at top corner
- Location labels above objects
- Path labels along routes
- Legend for symbols
- Scale indicator if relevant
## Recommended Pairings
- `isometric-3d`: Clean technical maps
- `pixel-art`: Retro game-style maps
- `lego-brick`: Playful location maps
@@ -0,0 +1,41 @@
# jigsaw
Interlocking puzzle pieces showing how parts fit together.
## Structure
- Puzzle pieces that interlock
- Each piece represents a component
- Connections show relationships
- Can be assembled or exploded view
- Missing piece highlights gaps
## Best For
- Component relationships
- Team/skill fit
- Strategy pieces
- Integration concepts
- Completeness assessments
## Visual Elements
- Classic puzzle piece shapes
- Distinct colors per piece
- Interlocking edges visible
- Icons or labels per piece
- Optional missing piece
## Text Placement
- Title at top
- Piece labels inside or beside
- Connection descriptions
- Missing piece explanation
- Assembly context
## Recommended Pairings
- `cartoon-hand-drawn`: Friendly integration concepts
- `paper-cutout`: Tactile puzzle feel
- `corporate-memphis`: Business strategy pieces
@@ -0,0 +1,48 @@
# linear-progression
Sequential progression showing steps, timeline, or chronological events.
## Structure
- Linear arrangement (horizontal or vertical)
- Nodes/markers at key points
- Connecting line or path between nodes
- Clear start and end points
- Directional flow indicators
## Variants
| Variant | Focus | Visual Emphasis |
|---------|-------|-----------------|
| **Timeline** | Chronological events, dates | Time markers, period labels |
| **Process** | Action steps, numbered sequence | Step numbers, action icons |
## Best For
- Step-by-step tutorials and how-tos
- Historical timelines and evolution
- Project milestones and roadmaps
- Workflow documentation
- Onboarding processes
## Visual Elements
- Numbered steps or date markers
- Arrows or connectors showing direction
- Icons representing each step/event
- Consistent node spacing
- Progress indicators optional
## Text Placement
- Title at top
- Step/event titles at each node
- Brief descriptions below nodes
- Dates or numbers clearly visible
## Recommended Pairings
- `craft-handmade`: Friendly tutorials and timelines
- `ikea-manual`: Clean assembly instructions
- `corporate-memphis`: Business process flows
- `aged-academia`: Historical discoveries
@@ -0,0 +1,41 @@
# periodic-table
Grid of categorized elements with consistent cell formatting.
## Structure
- Rectangular grid
- Each cell is one element
- Color-coded categories
- Consistent cell format
- Optional grouping gaps
## Best For
- Categorized collections
- Tool/resource catalogs
- Skill matrices
- Element collections
- Reference guides
## Visual Elements
- Uniform cell sizes
- Category colors
- Symbol/abbreviation prominent
- Small icon per cell
- Category legend
## Text Placement
- Title at top
- Cell: symbol, name, brief info
- Category names in legend
- Optional row/column headers
- Footnotes for special cases
## Recommended Pairings
- `pop-art`: Vibrant element grids
- `pixel-art`: Retro collection displays
- `corporate-memphis`: Business tool catalogs
@@ -0,0 +1,41 @@
# story-mountain
Plot structure visualization showing rising action, climax, and resolution.
## Structure
- Mountain/arc shape
- Rising slope (build-up)
- Peak (climax)
- Falling slope (resolution)
- Start and end at base level
## Best For
- Narrative structures
- Project lifecycles
- Tension/release patterns
- Emotional journeys
- Campaign arcs
## Visual Elements
- Mountain or arc curve
- Points along the path
- Climax visually emphasized
- Slope steepness meaningful
- Base camps or milestones
## Text Placement
- Title at top
- Stage labels along path
- Climax prominently labeled
- Brief descriptions at points
- Start/end clearly marked
## Recommended Pairings
- `storybook-watercolor`: Narrative journeys
- `cartoon-hand-drawn`: Educational plot diagrams
- `graphic-novel`: Dramatic story arcs
@@ -0,0 +1,48 @@
# structural-breakdown
Internal structure visualization with labeled parts or layers.
## Structure
- Central subject (object, system, body)
- Parts or layers clearly shown
- Labels with callout lines
- Exploded or cutaway view
- Optional zoomed detail sections
## Variants
| Variant | View Type | Visual Emphasis |
|---------|-----------|-----------------|
| **Exploded** | Parts separated outward | Component relationships |
| **Cross-section** | Sliced/cutaway view | Internal layers, composition |
## Best For
- Product part breakdowns
- Anatomy explanations
- System components
- Device teardowns
- Material composition
## Visual Elements
- Main subject clearly rendered
- Callout lines with dots/arrows
- Label boxes at endpoints
- Numbered parts optionally
- Layer boundaries or separation
## Text Placement
- Title at top
- Part/layer labels at callouts
- Brief descriptions in boxes
- Legend for numbered systems
- Depth/thickness if relevant
## Recommended Pairings
- `technical-schematic`: Technical schematics
- `aged-academia`: Classic anatomical style
- `craft-handmade`: Friendly breakdowns
@@ -0,0 +1,41 @@
# tree-branching
Hierarchical structure branching from root to leaves, showing categories and subcategories.
## Structure
- Root/trunk at top or left
- Branches splitting into sub-branches
- Leaves as terminal nodes
- Clear parent-child relationships
- Balanced or organic branching
## Best For
- Taxonomies and classifications
- Decision trees
- Organizational charts
- File/folder structures
- Family trees
## Visual Elements
- Connecting lines showing relationships
- Nodes at branch points
- Icons or labels at each node
- Color coding by branch
- Visual weight decreasing toward leaves
## Text Placement
- Title at top
- Root concept prominently labeled
- Branch and leaf labels
- Optional descriptions at key nodes
- Legend for categories
## Recommended Pairings
- `cartoon-hand-drawn`: Friendly taxonomies
- `da-vinci-notebook`: Scientific classifications
- `origami`: Geometric tree structures
@@ -0,0 +1,41 @@
# venn-diagram
Overlapping circles showing relationships, commonalities, and differences.
## Structure
- 2-3 overlapping circles
- Each circle is a category/concept
- Overlaps show shared elements
- Center shows common to all
- Unique areas for exclusives
## Best For
- Concept relationships
- Skill overlaps
- Market segments
- Comparative analysis
- Finding common ground
## Visual Elements
- Translucent circle fills
- Clear overlap regions
- Distinct colors per circle
- Icons in regions
- Boundary labels
## Text Placement
- Title at top
- Circle labels outside or on edge
- Items in appropriate regions
- Overlap region labels
- Legend if needed
## Recommended Pairings
- `cartoon-hand-drawn`: Friendly concept overlaps
- `corporate-memphis`: Business segment analysis
- `pop-art`: High-contrast comparisons
@@ -0,0 +1,41 @@
# winding-roadmap
Curved path showing journey with milestones and checkpoints.
## Structure
- S-curve or winding path
- Milestones along the path
- Start and destination points
- Side elements (obstacles, helpers)
- Progress indicators
## Best For
- Project roadmaps
- Career paths
- Customer journeys
- Learning paths
- Strategy timelines
## Visual Elements
- Curving road or river
- Milestone markers/flags
- Scene elements along path
- Vehicle/character on journey
- Destination landmark
## Text Placement
- Title at top
- Milestone labels at each point
- Path section names
- Destination description
- Optional timeline indicators
## Recommended Pairings
- `storybook-watercolor`: Whimsical journeys
- `cartoon-hand-drawn`: Friendly roadmaps
- `isometric-3d`: Technical project paths
@@ -0,0 +1,244 @@
# Structured Content Template
Template for generating structured infographic content that informs the visual designer.
## Purpose
This document bridges content analysis and visual design:
- Transforms source material into designer-ready format
- Organizes learning objectives into visual sections
- Preserves all source data verbatim
- Separates content from design instructions
## Instructional Design Process
### Phase 1: High-Level Outline
1. **Title**: Capture the essence in a compelling headline
2. **Overview**: Brief description (1-2 sentences)
3. **Learning Objectives**: List what the viewer will understand
### Phase 2: Section Development
For each learning objective:
1. **Key Concept**: One-sentence summary of the section
2. **Content**: Points extracted verbatim from source
3. **Visual Element**: What should be shown visually
4. **Text Labels**: Exact text for headlines, subheads, labels
### Phase 3: Data Integrity Check
Verify all source data is:
- Copied exactly (no paraphrasing)
- Attributed correctly (for quotes)
- Formatted consistently
## Critical Rules
| Rule | Requirement | Example |
|------|-------------|---------|
| **Output format** | Markdown only | Use proper headers, lists, code blocks |
| **Tone** | Expert trainer | Knowledgeable, clear, encouraging |
| **No new information** | Only source content | Don't add examples not in source |
| **Verbatim data** | Exact copies | "73% increase" not "significant increase" |
## Structured Content Format
```markdown
# [Infographic Title]
## Overview
[Brief description of what this infographic conveys - 1-2 sentences]
## Learning Objectives
The viewer will understand:
1. [Primary objective]
2. [Secondary objective]
3. [Tertiary objective if applicable]
---
## Section 1: [Section Title]
**Key Concept**: [One-sentence summary of this section]
**Content**:
- [Point 1 - verbatim from source]
- [Point 2 - verbatim from source]
- [Point 3 - verbatim from source]
**Visual Element**: [Description of what to show visually]
- Type: [icon/chart/illustration/diagram/photo]
- Subject: [what it depicts]
- Treatment: [how it should be presented]
**Text Labels**:
- Headline: "[Exact text for headline]"
- Subhead: "[Exact text for subhead]"
- Labels: "[Label 1]", "[Label 2]", "[Label 3]"
---
## Section 2: [Section Title]
**Key Concept**: [One-sentence summary]
**Content**:
- [Point 1]
- [Point 2]
**Visual Element**: [Description]
**Text Labels**:
- Headline: "[text]"
- Labels: "[Label 1]", "[Label 2]"
---
[Continue for each section...]
---
## Data Points (Verbatim)
All statistics, numbers, and quotes exactly as they appear in source:
### Statistics
- "[Exact statistic 1]"
- "[Exact statistic 2]"
- "[Exact statistic 3]"
### Quotes
- "[Exact quote]" — [Attribution]
### Key Terms
- **[Term 1]**: [Definition from source]
- **[Term 2]**: [Definition from source]
---
## Design Instructions
Extracted from user's steering prompt:
### Style Preferences
- [Any color preferences]
- [Any mood/aesthetic preferences]
- [Any artistic style preferences]
### Layout Preferences
- [Any structure preferences]
- [Any organization preferences]
### Other Requirements
- [Any other visual requirements from user]
- [Target platform if specified]
- [Brand guidelines if any]
```
## Section Types by Content
### For Process/Steps
```markdown
## Section N: Step N - [Step Title]
**Key Concept**: [What this step accomplishes]
**Content**:
- Action: [What to do]
- Details: [How to do it]
- Note: [Important consideration]
**Visual Element**:
- Type: numbered step icon
- Subject: [visual representing the action]
- Arrow: leads to next step
**Text Labels**:
- Headline: "Step N: [Title]"
- Action: "[Imperative verb + object]"
```
### For Comparison
```markdown
## Section N: [Item A] vs [Item B]
**Key Concept**: [What distinguishes them]
**Content**:
| Aspect | [Item A] | [Item B] |
|--------|----------|----------|
| [Factor 1] | [Value] | [Value] |
| [Factor 2] | [Value] | [Value] |
**Visual Element**:
- Type: split comparison
- Left: [Item A representation]
- Right: [Item B representation]
**Text Labels**:
- Headline: "[Item A] vs [Item B]"
- Left label: "[Item A name]"
- Right label: "[Item B name]"
```
### For Hierarchy
```markdown
## Section N: [Level Name]
**Key Concept**: [What this level represents]
**Content**:
- Position: [Top/Middle/Bottom]
- Priority: [Importance level]
- Contains: [Elements at this level]
**Visual Element**:
- Type: layer/tier
- Size: [relative to other levels]
- Position: [where in hierarchy]
**Text Labels**:
- Level title: "[Name]"
- Description: "[Brief description]"
```
### For Data/Statistics
```markdown
## Section N: [Metric Name]
**Key Concept**: [What this data shows]
**Content**:
- Value: [Exact number/percentage]
- Context: [What it means]
- Comparison: [Benchmark if any]
**Visual Element**:
- Type: [chart/number highlight/gauge]
- Emphasis: [how to draw attention]
**Text Labels**:
- Main number: "[Exact value]"
- Label: "[Metric name]"
- Context: "[Brief context]"
```
## Quality Checklist
Before finalizing structured content:
- [ ] Title captures the main message
- [ ] Learning objectives are clear and measurable
- [ ] Each section maps to an objective
- [ ] All content is verbatim from source
- [ ] Visual elements are clearly described
- [ ] Text labels are specified exactly
- [ ] Data points are collected and verified
- [ ] Design instructions are separated
- [ ] No new information has been added
@@ -0,0 +1,36 @@
# aged-academia
Historical scientific illustration with aged paper aesthetic.
## Color Palette
- Primary: Sepia brown (#704214), aged ink, muted earth tones
- Background: Parchment (#F4E4BC), yellowed paper texture
- Accents: Faded red annotations, iron gall ink spots
## Variants
| Variant | Focus | Visual Emphasis |
|---------|-------|-----------------|
| **Notebook** | Personal sketches, inventions | Cursive notes, margin annotations |
| **Specimen** | Scientific classification | Numbered diagrams, Latin labels |
## Visual Elements
- Aged paper texture overlay
- Detailed cross-hatching and line work
- Scientific illustration precision
- Study notes and annotations
- Specimen plate or sketch aesthetic
- Numbered diagram elements
## Typography
- Handwritten cursive or serif fonts
- Scientific annotations
- Small caps for labels
- Italics for scientific names
## Best For
Scientific education, biology topics, historical explanations, inventions, nature documentation
@@ -0,0 +1,36 @@
# bold-graphic
High-contrast comic style with bold outlines and dramatic visuals.
## Color Palette
- Primary: Bold primaries - red, yellow, blue, black
- Background: White, halftone patterns, dramatic shadows
- Accents: Spot colors, neon highlights
## Variants
| Variant | Focus | Visual Emphasis |
|---------|-------|-----------------|
| **Graphic-novel** | Dramatic narratives | Action lines, hatching, panels |
| **Pop-art** | High-energy impact | Halftone dots, Warhol repetition |
## Visual Elements
- Bold black outlines
- High contrast compositions
- Halftone dot patterns
- Comic panel borders optional
- Action lines and motion
- Speech bubbles and sound effects
## Typography
- Comic book lettering
- Impact fonts for emphasis
- POW/BANG effects for pop-art
- Caption boxes for narrative
## Best For
Attention-grabbing content, dramatic narratives, pop culture, marketing, high-energy presentations
@@ -0,0 +1,61 @@
# chalkboard
Black chalkboard background with colorful chalk drawing style
## Design Aesthetic
Classic classroom chalkboard aesthetic with hand-drawn chalk illustrations. Nostalgic educational feel with imperfect, sketchy lines that capture the warmth of traditional teaching. Colorful chalk creates visual hierarchy while maintaining the authentic chalkboard experience.
## Background
- Color: Chalkboard Black (#1A1A1A) or Dark Green-Black (#1C2B1C)
- Texture: Realistic chalkboard texture with subtle scratches, dust particles, and faint eraser marks
## Typography
Hand-drawn chalk lettering style with visible chalk texture. Imperfect baseline adds authenticity. White or bright colored chalk for emphasis.
## Color Palette
| Role | Color | Hex | Usage |
|------|-------|-----|-------|
| Background | Chalkboard Black | #1A1A1A | Primary background |
| Alt Background | Green-Black | #1C2B1C | Traditional green board |
| Primary Text | Chalk White | #F5F5F5 | Main text, outlines |
| Accent 1 | Chalk Yellow | #FFE566 | Highlights, emphasis |
| Accent 2 | Chalk Pink | #FF9999 | Secondary highlights |
| Accent 3 | Chalk Blue | #66B3FF | Diagrams, links |
| Accent 4 | Chalk Green | #90EE90 | Success, nature |
| Accent 5 | Chalk Orange | #FFB366 | Warnings, energy |
## Visual Elements
- Hand-drawn chalk illustrations with sketchy, imperfect lines
- Chalk dust effects around text and key elements
- Doodles: stars, arrows, underlines, circles, checkmarks
- Mathematical formulas and simple diagrams
- Eraser smudges and chalk residue textures
- Wooden frame border optional
- Stick figures and simple icons
- Connection lines with hand-drawn feel
## Style Rules
### Do
- Maintain authentic chalk texture on all elements
- Use imperfect, hand-drawn quality throughout
- Add subtle chalk dust and smudge effects
- Create visual hierarchy with color variety
- Include playful doodles and annotations
### Don't
- Use perfect geometric shapes
- Create clean digital-looking lines
- Add photorealistic elements
- Use gradients or glossy effects
## Best For
Educational content, tutorials, classroom themes, teaching materials, workshops, informal learning sessions, knowledge sharing
@@ -0,0 +1,29 @@
# claymation
3D clay figure aesthetic with stop-motion charm
## Color Palette
- Primary: Saturated clay colors - bright but slightly muted
- Background: Neutral studio backdrop, soft gradients
- Accents: Complementary clay colors, shiny highlights
## Visual Elements
- Clay/plasticine texture on all objects
- Fingerprint marks and imperfections
- Rounded, sculpted forms
- Soft shadows
- Stop-motion staging
- Miniature set aesthetic
## Typography
- Extruded clay letters
- Dimensional, rounded text
- Playful and chunky
- Embedded in clay scenes
## Best For
Playful explanations, children's content, stop-motion narratives, friendly processes
@@ -0,0 +1,29 @@
# corporate-memphis
Flat vector people with vibrant geometric fills
## Color Palette
- Primary: Bright, saturated - purple, orange, teal, yellow
- Background: White or light pastels
- Accents: Gradient fills, geometric patterns
## Visual Elements
- Flat vector illustration
- Disproportionate human figures
- Abstract body shapes
- Floating geometric elements
- No outlines, solid fills
- Plant and object accents
## Typography
- Clean sans-serif
- Bold headings
- Professional but friendly
- Minimal decoration
## Best For
Business presentations, tech products, marketing materials, corporate training
@@ -0,0 +1,44 @@
# craft-handmade (DEFAULT)
Hand-drawn and paper craft aesthetic with warm, organic feel.
## Color Palette
- Primary: Warm pastels, soft saturated colors, craft paper tones
- Background: Light cream (#FFF8F0), textured paper (#F5F0E6)
- Accents: Bold highlights, construction paper colors
## Variants
| Variant | Focus | Visual Emphasis |
|---------|-------|-----------------|
| **Hand-drawn** | Cartoon illustration | Simple icons, slightly imperfect lines |
| **Paper-cutout** | Layered paper craft | Drop shadows, torn edges, texture |
## Visual Elements
- Hand-drawn or cut-paper quality
- Organic, slightly imperfect shapes
- Layered depth with shadows (paper variant)
- Simple cartoon elements and icons
- Character illustrations (people, personalities in cartoon form)
- Ample whitespace, clean composition
- Keywords and core concepts highlighted
- **Strictly hand-drawn—no realistic or photographic elements**
## Style Enforcement
- All imagery must maintain cartoon/illustrated aesthetic
- Replace real photos or realistic figures with hand-drawn equivalents
- Maintain consistent line weight and illustration style throughout
## Typography
- Hand-drawn or casual font style
- Clear, readable labels
- Keywords emphasized with larger/bolder text
- Cut-out letter style for paper variant
## Best For
Educational content, general explanations, friendly infographics, children's content, playful hierarchies
@@ -0,0 +1,29 @@
# cyberpunk-neon
Neon glow on dark backgrounds, futuristic aesthetic
## Color Palette
- Primary: Neon pink (#FF00FF), cyan (#00FFFF), electric blue
- Background: Deep black (#0A0A0A), dark purple gradients
- Accents: Neon glow effects, chrome reflections
## Visual Elements
- Glowing neon outlines
- Dark atmospheric backgrounds
- Digital glitch effects
- Circuit patterns
- Holographic elements
- Rain and reflections
## Typography
- Glowing neon text
- Digital/tech fonts
- Flickering effects
- Outlined glow letters
## Best For
Tech futures, gaming content, digital culture, futuristic concepts, night aesthetics
@@ -0,0 +1,63 @@
# hand-drawn-edu
Hand-drawn educational infographic with macaron pastel color blocks on warm cream paper texture.
## Color Palette
- Background: Warm cream (#F5F0E8) with subtle paper grain texture
- Primary text: Deep charcoal (#2D2D2D) for headlines, outlines
- Macaron Blue: #A8D8EA for cool-toned information zones
- Macaron Mint: #B5E5CF for growth/positive zones
- Macaron Lavender: #D5C6E0 for abstract/concept zones
- Macaron Peach: #FFD5C2 for warm-toned zones
- Accent: Coral Red (#E8655A) for key data, warnings, emphasis
- Muted annotations: Warm gray (#6B6B6B) for secondary labels
## Visual Elements
- Macaron pastel rounded cards as distinct information zones
- Hand-drawn wavy connection lines and arrows with small text labels
- Simple stick-figure characters and cartoon icons to humanize concepts
- Doodle decorations: small stars, underlines, spirals, sparkles
- Color fills don't completely fill outlines — preserve casual hand-drawn feel
- Dashed borders for secondary or contained zones
- Small icon doodles (clipboard, lock, checkmark, lightbulb) to reinforce concepts
- Bold centered quote or takeaway at the bottom
- Slight hand-drawn wobble on all lines and shapes
## Variants
| Variant | Focus | Visual Emphasis |
|---------|-------|-----------------|
| **Sketch-notes** | Concept mapping | More stick figures, thought bubbles, connecting arrows |
| **Pastel cards** | Structured info | Cleaner macaron blocks, less doodle, more white space |
## Typography
- Main title: Bold hand-drawn lettering with organic strokes, large confident letterforms with slight wobble
- Section headers: Hand-lettered text on or inside macaron color blocks
- Body text: Clear handwritten print style, legible but not mechanical
- Annotations: Warm gray (#6B6B6B), smaller, neat handwritten labels
- Keywords: Bold emphasis within body text
## Style Enforcement
- All lines must have slight hand-drawn wobble — no perfect geometry
- Each information zone uses a distinct macaron color block
- Maintain consistent wobble quality across all shapes and lines
- Include at least one simple cartoon character or stick figure
- Generous white space between zones — each zone should breathe
- Maximum 4 macaron colors per infographic
## Avoid
- Perfect geometric shapes or straight lines
- Photorealistic elements or stock illustration style
- Pure white backgrounds
- Flat vector icons or digital-precision graphics
- Overcrowded layouts — let zones breathe
- Corporate or clinical aesthetic
## Best For
Educational diagrams, process explainers, concept maps, knowledge summaries, tutorial walkthroughs, onboarding visuals
@@ -0,0 +1,29 @@
# ikea-manual
Minimal line art assembly instruction style
## Color Palette
- Primary: Black lines, minimal fills
- Background: White or cream paper
- Accents: Red for warnings, blue for highlights
## Visual Elements
- Simple line drawings
- Numbered step sequences
- Arrow indicators
- Exploded assembly views
- Wordless communication
- Stick figures for scale
## Typography
- Minimal text
- Step numbers prominent
- Universal symbols
- Simple sans-serif when needed
## Best For
Step-by-step instructions, assembly guides, how-to content, universal communication
@@ -0,0 +1,29 @@
# kawaii
Japanese cute style with big eyes and pastel colors
## Color Palette
- Primary: Soft pastels - pink (#FFB6C1), mint (#98D8C8), lavender (#E6E6FA)
- Background: Light pink or cream, sparkle overlays
- Accents: Bright pops, star and heart shapes
## Visual Elements
- Big sparkly eyes on characters
- Rounded, soft shapes
- Blushing cheeks
- Sparkles and stars scattered
- Cute animal characters
- Chibi proportions
## Typography
- Rounded, bubbly fonts
- Cute decorations on letters
- Hearts and stars in text
- Soft, friendly appearance
## Best For
Cute tutorials, children's education, lifestyle content, character-driven explanations
@@ -0,0 +1,29 @@
# knolling
Organized flat-lay with top-down arrangement
## Color Palette
- Primary: Object's natural colors
- Background: Solid color - black, white, or colored surface
- Accents: Shadows, subtle highlights
## Visual Elements
- Top-down camera angle
- Objects arranged at 90° angles
- Equal spacing between items
- Clean organization
- Symmetry and order
- No overlapping items
## Typography
- Clean labels
- Positioned outside objects
- Connecting lines to items
- Minimal, catalog-style
## Best For
Product collections, tool inventories, gear layouts, organized overviews
@@ -0,0 +1,29 @@
# lego-brick
Toy brick construction with playful aesthetic
## Color Palette
- Primary: Classic LEGO colors - red, blue, yellow, green, white
- Background: Light gray baseplate or white
- Accents: Bright primary pops, shiny studs
## Visual Elements
- Visible brick studs
- Modular construction
- Minifigure characters
- Building instruction style
- Stackable elements
- Plastic sheen
## Typography
- Blocky, bold fonts
- LEGO instruction style
- Step numbers
- Playful appearance
## Best For
Building concepts, modular systems, playful education, children's content
@@ -0,0 +1,60 @@
# morandi-journal
Hand-drawn doodle illustration with warm Morandi color tones and cozy bullet journal aesthetic.
## Color Palette
- Background: Warm cream/beige with subtle paper texture (#F5F0E6)
- Primary: Muted teal/sage green (#7BA3A8) for headers and frames
- Secondary: Warm terracotta/orange (#D4956A) for highlights and numbers
- Line art: Dark charcoal brown (#4A4540)
- Soft highlights: Pale yellow (#F5E6C8)
## Visual Elements
- Hand-drawn doodle illustrations with organic, slightly imperfect ink lines
- Washi tape strip decorations (diagonal stripes pattern, beige and brown)
- Rounded card containers for brand/option items
- Hand-drawn rulers, scales, and progress bars with emoji quality indicators
- Smiley/frowny faces as quality markers (😊✓ 😐 ☹️✗)
- Dotted line frames around sections
- Connecting arrows and dotted lines between modules
- Corner decorations: tiny houses, stars, sparkles, clouds
- Wavy line dividers between sections
- Callout bubbles for tips
- Magnifying glass icons for identification tips
- Thumbs up/down icons (hand-drawn style)
## Variants
| Variant | Focus | Visual Emphasis |
|---------|-------|-----------------|
| **Cozy journal** | Maximum warmth | More washi tape, stickers, decorative doodles |
| **Clean sketch** | Readability | Cleaner lines, less decoration, more structured |
## Typography
- Main title: Bold hand-lettered calligraphy style with decorative flourishes
- Module headers: Clean handwritten text in white on dark teal rounded badge (#6B9080)
- Body text: Neat handwritten print style, easy to read
- Numbers: Highlighted in terracotta (#D4956A), slightly larger than body
## Style Enforcement
- All imagery must maintain hand-drawn/doodle aesthetic—no digital precision
- Organic, slightly imperfect shapes throughout
- Sketch-like quality with visible line weight variations
- Warm and cozy journal feel, not clinical or corporate
## Avoid
- Flat vector icons or emoji
- Clean geometric shapes
- Stock illustration style
- Strict grid layout
- Pure white background
- Digital/corporate look
## Best For
Product selection guides, lifestyle content, educational overviews, consumer-facing comparison content, Xiaohongshu-style posts
@@ -0,0 +1,29 @@
# origami
Folded paper forms with geometric precision
## Color Palette
- Primary: Solid origami paper colors - red, blue, green, gold
- Background: White or soft gray, subtle shadows
- Accents: Paper fold highlights, crisp shadows
## Visual Elements
- Geometric folded shapes
- Visible fold lines
- Cast shadows showing depth
- Paper texture
- Angular, faceted forms
- Low-poly aesthetic
## Typography
- Clean geometric fonts
- Angular letterforms
- Folded paper text effect
- Minimal, precise labels
## Best For
Geometric concepts, transformation topics, Japanese themes, abstract representations
@@ -0,0 +1,29 @@
# pixel-art
Retro 8-bit gaming aesthetic
## Color Palette
- Primary: Limited palette - NES/SNES colors
- Background: Black or dark blue, scanlines optional
- Accents: Bright pixel highlights, CRT glow
## Visual Elements
- Visible pixel grid
- Limited color count per sprite
- 8-bit or 16-bit style
- Retro game UI elements
- Pixel-perfect edges
- Dithering for gradients
## Typography
- Pixel fonts
- Blocky letterforms
- Game UI style text
- Score/stat display style
## Best For
Gaming topics, nostalgia content, developer audiences, retro tech themes
@@ -0,0 +1,48 @@
# pop-laboratory
Lab manual precision meets pop art color impact—coordinate systems, technical diagrams, and fluorescent accents on blueprint grid.
## Color Palette
- Background: Professional grayish-white with faint blueprint grid texture (#F2F2F2)
- Primary: Muted teal/sage green (#B8D8BE) for major functional blocks and data zones
- High-alert accent: Vibrant fluorescent pink (#E91E63) strictly for warnings, critical data, or "winner" highlights
- Marker highlights: Vivid lemon yellow (#FFF200) as translucent highlighter effect for keywords
- Line art: Ultra-fine charcoal brown (#2D2926) for technical grids, coordinates, and hairlines
## Visual Elements
- Coordinate-style labels on every module (e.g., R-20, G-02, SEC-08)
- Technical diagrams: exploded views, cross-sections with anchor points, architectural skeletal lines
- Vertical/horizontal rulers with precise markers (0.5mm, 1.8mm, 45°)
- "Marker-over-print" effect: color blocks slightly offset from text, postmodern print feel
- Cross-hair targets, mathematical symbols (Σ, Δ, ∞), directional arrows (X/Y axis)
- Microscopic detail annotations alongside macroscopic bold headers
- Corner metadata: tiny barcodes, timestamps, technical parameters
- High contrast between massive bold headers and tiny 8pt-style annotations
## Typography
- Headers: Bold brutalist characters, high visual impact
- Body: Professional sans-serif or crisp technical print
- Numbers: Large, highlighted with yellow or blue to stand out
- Annotations: Ultra-crisp, small technical labels
## Style Enforcement
- Strictly systematic color usage: only teal, pink, yellow, charcoal—no rainbow palette
- Sufficient fine grid lines and coordinate annotations throughout
- Maintain tension between large impactful headers and small precise parameters
- Lab manual aesthetic: mix of microscopic details and macroscopic data
## Avoid
- Cute or cartoonish doodles
- Soft pastels or generic textures
- Empty white space
- Flat vector stock icons
- Organic or hand-drawn imperfections
## Best For
Technical product guides, specification comparisons, precision-focused data visualization, engineering-adjacent content
@@ -0,0 +1,47 @@
# retro-pop-grid
1970s retro pop art with strict Swiss international grid, thick black outlines, and flat color blocks.
## Color Palette
- Background: Warm vintage cream/beige (#F5F0E6)
- Flat accents: Salmon pink, sky blue, mustard yellow, mint green—all muted retro tones
- Contrast blocks: Solid pure black (#000000) and solid pure white (#FFFFFF) used strategically for extreme contrast
- Line art and outlines: Solid thick black
## Visual Elements
- Uniform thick black outlines on all illustrations, text boxes, and grid dividers
- Pure 2D flat vector aesthetic with subtle screen print texture
- Strict Swiss international grid: poster divided into square and rectangular cells by thick black lines
- Black-background cells with white text for warnings or key categories (inverted contrast)
- Geometric fill patterns in empty cells: checkerboards, diagonal lines, dots
- Flat abstract symbols, warning signs, keyholes, stars, arrows
- Vintage comic-style smiley/frowny faces for quality indicators
- Colored cells used for breathing room—some with minimal/no content
## Typography
- Headers: Bold brutalist or retro thick display fonts, high legibility
- Body: Clean sans-serif, structured typographic alignment
- Decorative English text acceptable for stylistic labels ("WARNING", "INFO", "BEST")
- All content text in specified language
## Style Enforcement
- Absolutely no gradients, shading, drop shadows, or 3D effects
- Everything anchored in grid cells—no floating or unorganized elements
- Maintain 1970s retro pop art and underground comic illustration feel
- Visual density balanced with rhythmic grid—some cells intentionally sparse for contrast
## Avoid
- 3D rendering, realistic details, gradients, soft shadows
- Soft, thin, or sketch-like pencil lines
- Free-flowing, unorganized, or floating layouts (everything must be grid-anchored)
- Pure white background canvas
- Organic or hand-drawn imperfections
## Best For
Trendy product guides, design-conscious content, visually striking comparisons, content targeting design-savvy audiences, bold social media posts
@@ -0,0 +1,29 @@
# storybook-watercolor
Soft hand-painted illustration with whimsical charm
## Color Palette
- Primary: Soft watercolor washes - muted blues, greens, warm earth
- Background: Watercolor paper texture, white or cream
- Accents: Deeper pigment pools, splatter effects
## Visual Elements
- Visible brushstrokes
- Soft color bleeds and gradients
- White space as design element
- Delicate line work over washes
- Natural, organic shapes
- Dreamy, atmospheric quality
## Typography
- Elegant hand-lettering
- Watercolor-style text
- Flowing, organic letterforms
- Integrated with illustrations
## Best For
Storytelling, emotional journeys, nature topics, children's education, artistic presentations
@@ -0,0 +1,29 @@
# subway-map
Transit diagram style with colored lines and stations
## Color Palette
- Primary: Transit line colors - red, blue, green, yellow, orange
- Background: White or light gray
- Accents: Station dots, interchange markers
## Visual Elements
- Colored route lines
- 45° and 90° angles only
- Station circle markers
- Interchange symbols
- Simplified geography
- Line thickness hierarchy
## Typography
- Clean sans-serif
- Station name labels
- Line number/name badges
- Horizontal or angled text
## Best For
Journey maps, process flows, network diagrams, route explanations
@@ -0,0 +1,36 @@
# technical-schematic
Technical diagrams with engineering precision and clean geometry.
## Color Palette
- Primary: Blues (#2563EB), teals, grays, white lines
- Background: Deep blue (#1E3A5F), white, or light gray with grid
- Accents: Amber highlights (#F59E0B), cyan callouts
## Variants
| Variant | Focus | Visual Emphasis |
|---------|-------|-----------------|
| **Blueprint** | Engineering schematics | White on blue, measurements, grid |
| **Isometric** | 3D spatial representation | 30° angle blocks, clean fills |
## Visual Elements
- Geometric precision throughout
- Grid pattern or isometric angle
- Dimension lines and measurements
- Technical symbols and annotations
- Clean vector shapes
- Consistent stroke weights
## Typography
- Technical stencil or clean sans-serif
- All-caps labels
- Measurement annotations
- Floating labels for isometric
## Best For
Technical architecture, system diagrams, engineering specs, product breakdowns, data visualization
@@ -0,0 +1,29 @@
# ui-wireframe
Grayscale interface mockup style
## Color Palette
- Primary: Grays - light (#E5E5E5), medium (#9CA3AF), dark (#374151)
- Background: White (#FFFFFF), light gray
- Accents: Blue for interactive (#3B82F6), red for emphasis
## Visual Elements
- Wireframe boxes and placeholders
- X marks for image placeholders
- Simple line icons
- Grid-based layout
- Annotation callouts
- Redline specifications
## Typography
- System fonts
- Placeholder "Lorem ipsum"
- UI label style
- Sans-serif throughout
## Best For
Product designs, UI explanations, app concepts, user flow diagrams
-202
View File
@@ -1,202 +0,0 @@
---
name: xitter
description: Interact with X/Twitter via the x-cli terminal client using official X API credentials. Use for posting, reading timelines, searching tweets, liking, retweeting, bookmarks, mentions, and user lookups.
version: 1.0.0
author: Siddharth Balyan + Hermes Agent
license: MIT
platforms: [linux, macos]
prerequisites:
commands: [uv]
env_vars: [X_API_KEY, X_API_SECRET, X_BEARER_TOKEN, X_ACCESS_TOKEN, X_ACCESS_TOKEN_SECRET]
metadata:
hermes:
tags: [twitter, x, social-media, x-cli]
homepage: https://github.com/Infatoshi/x-cli
---
# Xitter — X/Twitter via x-cli
Use `x-cli` for official X/Twitter API interactions from the terminal.
This skill is for:
- posting tweets, replies, and quote tweets
- searching tweets and reading timelines
- looking up users, followers, and following
- liking and retweeting
- checking mentions and bookmarks
This skill intentionally does not vendor a separate CLI implementation into Hermes. Install and use upstream `x-cli` instead.
## Important Cost / Access Note
X API access is not meaningfully free for most real usage. Expect to need paid or prepaid X developer access. If commands fail with permissions or quota errors, check your X developer plan first.
## Install
Install upstream `x-cli` with `uv`:
```bash
uv tool install git+https://github.com/Infatoshi/x-cli.git
```
Upgrade later with:
```bash
uv tool upgrade x-cli
```
Verify:
```bash
x-cli --help
```
## Credentials
You need these five values from the X Developer Portal:
- `X_API_KEY`
- `X_API_SECRET`
- `X_BEARER_TOKEN`
- `X_ACCESS_TOKEN`
- `X_ACCESS_TOKEN_SECRET`
Get them from:
- https://developer.x.com/en/portal/dashboard
### Why does X need 5 secrets?
Unfortunately, the official X API splits auth across both app-level and user-level credentials:
- `X_API_KEY` + `X_API_SECRET` identify your app
- `X_BEARER_TOKEN` is used for app-level read access
- `X_ACCESS_TOKEN` + `X_ACCESS_TOKEN_SECRET` let the CLI act as your user account for writes and authenticated actions
So yes — it is a lot of secrets for one integration, but this is the stable official API path and is still preferable to cookie/session scraping.
Setup requirements in the portal:
1. Create or open your app
2. In user authentication settings, set permissions to `Read and write`
3. Generate or regenerate the access token + access token secret after enabling write permissions
4. Save all five values carefully — missing any one of them will usually produce confusing auth or permission errors
Note: upstream `x-cli` expects the full credential set to be present, so even if you mostly care about read-only commands, it is simplest to configure all five.
## Cost / Friction Reality Check
If this setup feels heavier than it should be, that is because it is. Xs official developer flow is high-friction and often paid. This skill chooses the official API path because it is more stable and maintainable than browser-cookie/session approaches.
If the user wants the least brittle long-term setup, use this skill. If they want a zero-setup or unofficial path, that is a different trade-off and not what this skill is for.
## Where to Store Credentials
`x-cli` looks for credentials in `~/.config/x-cli/.env`.
If you already keep your X credentials in `~/.hermes/.env`, the cleanest setup is:
```bash
mkdir -p ~/.config/x-cli
ln -sf ~/.hermes/.env ~/.config/x-cli/.env
```
Or create a dedicated file:
```bash
mkdir -p ~/.config/x-cli
cat > ~/.config/x-cli/.env <<'EOF'
X_API_KEY=your_consumer_key
X_API_SECRET=your_secret_key
X_BEARER_TOKEN=your_bearer_token
X_ACCESS_TOKEN=your_access_token
X_ACCESS_TOKEN_SECRET=your_access_token_secret
EOF
chmod 600 ~/.config/x-cli/.env
```
## Quick Verification
```bash
x-cli user get openai
x-cli tweet search "from:NousResearch" --max 3
x-cli me mentions --max 5
```
If reads work but writes fail, regenerate the access token after confirming `Read and write` permissions.
## Common Commands
### Tweets
```bash
x-cli tweet post "hello world"
x-cli tweet get https://x.com/user/status/1234567890
x-cli tweet delete 1234567890
x-cli tweet reply 1234567890 "nice post"
x-cli tweet quote 1234567890 "worth reading"
x-cli tweet search "AI agents" --max 20
x-cli tweet metrics 1234567890
```
### Users
```bash
x-cli user get openai
x-cli user timeline openai --max 10
x-cli user followers openai --max 50
x-cli user following openai --max 50
```
### Self / Authenticated User
```bash
x-cli me mentions --max 20
x-cli me bookmarks --max 20
x-cli me bookmark 1234567890
x-cli me unbookmark 1234567890
```
### Quick Actions
```bash
x-cli like 1234567890
x-cli retweet 1234567890
```
## Output Modes
Use structured output when the agent needs to inspect fields programmatically:
```bash
x-cli -j tweet search "AI agents" --max 5
x-cli -p user get openai
x-cli -md tweet get 1234567890
x-cli -v -j tweet get 1234567890
```
Recommended defaults:
- `-j` for machine-readable output
- `-v` when you need timestamps, metrics, or metadata
- plain/default mode for quick human inspection
## Agent Workflow
1. Confirm `x-cli` is installed
2. Confirm credentials are present
3. Start with a read command (`user get`, `tweet search`, `me mentions`)
4. Use `-j` when extracting fields for later steps
5. Only perform write actions after confirming the target tweet/user and the user's intent
## Pitfalls
- **Paid API access**: many failures are plan/permission problems, not code problems.
- **403 oauth1-permissions**: regenerate the access token after enabling `Read and write`.
- **Reply restrictions**: X restricts many programmatic replies. `tweet quote` is often more reliable than `tweet reply`.
- **Rate limits**: expect per-endpoint limits and cooldown windows.
- **Credential drift**: if you rotate tokens in `~/.hermes/.env`, make sure `~/.config/x-cli/.env` still points at the current file.
## Notes
- Prefer official API workflows over cookie/session scraping.
- Use tweet URLs or IDs interchangeably — `x-cli` accepts both.
- If bookmark behavior changes upstream, check the upstream README first:
https://github.com/Infatoshi/x-cli
+386
View File
@@ -0,0 +1,386 @@
---
name: xurl
description: Interact with X/Twitter via xurl, the official X API CLI. Use for posting, replying, quoting, searching, timelines, mentions, likes, reposts, bookmarks, follows, DMs, media upload, and raw v2 endpoint access.
version: 1.0.0
author: xdevplatform + openclaw + Hermes Agent
license: MIT
platforms: [linux, macos]
prerequisites:
commands: [xurl]
metadata:
hermes:
tags: [twitter, x, social-media, xurl, official-api]
homepage: https://github.com/xdevplatform/xurl
upstream_skill: https://github.com/openclaw/openclaw/blob/main/skills/xurl/SKILL.md
---
# xurl — X (Twitter) API via the Official CLI
`xurl` is the X developer platform's official CLI for the X API. It supports shortcut commands for common actions AND raw curl-style access to any v2 endpoint. All commands return JSON to stdout.
Use this skill for:
- posting, replying, quoting, deleting posts
- searching posts and reading timelines/mentions
- liking, reposting, bookmarking
- following, unfollowing, blocking, muting
- direct messages
- media uploads (images and video)
- raw access to any X API v2 endpoint
- multi-app / multi-account workflows
This skill replaces the older `xitter` skill (which wrapped a third-party Python CLI). `xurl` is maintained by the X developer platform team, supports OAuth 2.0 PKCE with auto-refresh, and covers a substantially larger API surface.
---
## Secret Safety (MANDATORY)
Critical rules when operating inside an agent/LLM session:
- **Never** read, print, parse, summarize, upload, or send `~/.xurl` to LLM context.
- **Never** ask the user to paste credentials/tokens into chat.
- The user must fill `~/.xurl` with secrets manually on their own machine.
- **Never** recommend or execute auth commands with inline secrets in agent sessions.
- **Never** use `--verbose` / `-v` in agent sessions — it can expose auth headers/tokens.
- To verify credentials exist, only use: `xurl auth status`.
Forbidden flags in agent commands (they accept inline secrets):
`--bearer-token`, `--consumer-key`, `--consumer-secret`, `--access-token`, `--token-secret`, `--client-id`, `--client-secret`
App credential registration and credential rotation must be done by the user manually, outside the agent session. After credentials are registered, the user authenticates with `xurl auth oauth2` — also outside the agent session. Tokens persist to `~/.xurl` in YAML. Each app has isolated tokens. OAuth 2.0 tokens auto-refresh.
---
## Installation
Pick ONE method. On Linux, the shell script or `go install` are the easiest.
```bash
# Shell script (installs to ~/.local/bin, no sudo, works on Linux + macOS)
curl -fsSL https://raw.githubusercontent.com/xdevplatform/xurl/main/install.sh | bash
# Homebrew (macOS)
brew install --cask xdevplatform/tap/xurl
# npm
npm install -g @xdevplatform/xurl
# Go
go install github.com/xdevplatform/xurl@latest
```
Verify:
```bash
xurl --help
xurl auth status
```
If `xurl` is installed but `auth status` shows no apps or tokens, the user needs to complete auth manually — see the next section.
---
## One-Time User Setup (user runs these outside the agent)
These steps must be performed by the user directly, NOT by the agent, because they involve pasting secrets. Direct the user to this block; do not execute it for them.
1. Create or open an app at https://developer.x.com/en/portal/dashboard
2. Set the redirect URI to `http://localhost:8080/callback`
3. Copy the app's Client ID and Client Secret
4. Register the app locally (user runs this):
```bash
xurl auth apps add my-app --client-id YOUR_CLIENT_ID --client-secret YOUR_CLIENT_SECRET
```
5. Authenticate:
```bash
xurl auth oauth2
```
(This opens a browser for the OAuth 2.0 PKCE flow.)
6. Verify:
```bash
xurl auth status
xurl whoami
```
After this, the agent can use any command below without further setup. OAuth 2.0 tokens auto-refresh.
---
## Quick Reference
| Action | Command |
| --- | --- |
| Post | `xurl post "Hello world!"` |
| Reply | `xurl reply POST_ID "Nice post!"` |
| Quote | `xurl quote POST_ID "My take"` |
| Delete a post | `xurl delete POST_ID` |
| Read a post | `xurl read POST_ID` |
| Search posts | `xurl search "QUERY" -n 10` |
| Who am I | `xurl whoami` |
| Look up a user | `xurl user @handle` |
| Home timeline | `xurl timeline -n 20` |
| Mentions | `xurl mentions -n 10` |
| Like / Unlike | `xurl like POST_ID` / `xurl unlike POST_ID` |
| Repost / Undo | `xurl repost POST_ID` / `xurl unrepost POST_ID` |
| Bookmark / Remove | `xurl bookmark POST_ID` / `xurl unbookmark POST_ID` |
| List bookmarks / likes | `xurl bookmarks -n 10` / `xurl likes -n 10` |
| Follow / Unfollow | `xurl follow @handle` / `xurl unfollow @handle` |
| Following / Followers | `xurl following -n 20` / `xurl followers -n 20` |
| Block / Unblock | `xurl block @handle` / `xurl unblock @handle` |
| Mute / Unmute | `xurl mute @handle` / `xurl unmute @handle` |
| Send DM | `xurl dm @handle "message"` |
| List DMs | `xurl dms -n 10` |
| Upload media | `xurl media upload path/to/file.mp4` |
| Media status | `xurl media status MEDIA_ID` |
| List apps | `xurl auth apps list` |
| Remove app | `xurl auth apps remove NAME` |
| Set default app | `xurl auth default APP_NAME [USERNAME]` |
| Per-request app | `xurl --app NAME /2/users/me` |
| Auth status | `xurl auth status` |
Notes:
- `POST_ID` accepts full URLs too (e.g. `https://x.com/user/status/1234567890`) — xurl extracts the ID.
- Usernames work with or without a leading `@`.
---
## Command Details
### Posting
```bash
xurl post "Hello world!"
xurl post "Check this out" --media-id MEDIA_ID
xurl post "Thread pics" --media-id 111 --media-id 222
xurl reply 1234567890 "Great point!"
xurl reply https://x.com/user/status/1234567890 "Agreed!"
xurl reply 1234567890 "Look at this" --media-id MEDIA_ID
xurl quote 1234567890 "Adding my thoughts"
xurl delete 1234567890
```
### Reading & Search
```bash
xurl read 1234567890
xurl read https://x.com/user/status/1234567890
xurl search "golang"
xurl search "from:elonmusk" -n 20
xurl search "#buildinpublic lang:en" -n 15
```
### Users, Timeline, Mentions
```bash
xurl whoami
xurl user elonmusk
xurl user @XDevelopers
xurl timeline -n 25
xurl mentions -n 20
```
### Engagement
```bash
xurl like 1234567890
xurl unlike 1234567890
xurl repost 1234567890
xurl unrepost 1234567890
xurl bookmark 1234567890
xurl unbookmark 1234567890
xurl bookmarks -n 20
xurl likes -n 20
```
### Social Graph
```bash
xurl follow @XDevelopers
xurl unfollow @XDevelopers
xurl following -n 50
xurl followers -n 50
# Another user's graph
xurl following --of elonmusk -n 20
xurl followers --of elonmusk -n 20
xurl block @spammer
xurl unblock @spammer
xurl mute @annoying
xurl unmute @annoying
```
### Direct Messages
```bash
xurl dm @someuser "Hey, saw your post!"
xurl dms -n 25
```
### Media Upload
```bash
# Auto-detect type
xurl media upload photo.jpg
xurl media upload video.mp4
# Explicit type/category
xurl media upload --media-type image/jpeg --category tweet_image photo.jpg
# Videos need server-side processing — check status (or poll)
xurl media status MEDIA_ID
xurl media status --wait MEDIA_ID
# Full workflow
xurl media upload meme.png # returns media id
xurl post "lol" --media-id MEDIA_ID
```
---
## Raw API Access
The shortcuts cover common operations. For anything else, use raw curl-style mode against any X API v2 endpoint:
```bash
# GET
xurl /2/users/me
# POST with JSON body
xurl -X POST /2/tweets -d '{"text":"Hello world!"}'
# DELETE / PUT / PATCH
xurl -X DELETE /2/tweets/1234567890
# Custom headers
xurl -H "Content-Type: application/json" /2/some/endpoint
# Force streaming
xurl -s /2/tweets/search/stream
# Full URLs also work
xurl https://api.x.com/2/users/me
```
---
## Global Flags
| Flag | Short | Description |
| --- | --- | --- |
| `--app` | | Use a specific registered app (overrides default) |
| `--auth` | | Force auth type: `oauth1`, `oauth2`, or `app` |
| `--username` | `-u` | Which OAuth2 account to use (if multiple exist) |
| `--verbose` | `-v` | **Forbidden in agent sessions** — leaks auth headers |
| `--trace` | `-t` | Add `X-B3-Flags: 1` trace header |
---
## Streaming
Streaming endpoints are auto-detected. Known ones include:
- `/2/tweets/search/stream`
- `/2/tweets/sample/stream`
- `/2/tweets/sample10/stream`
Force streaming on any endpoint with `-s`.
---
## Output Format
All commands return JSON to stdout. Structure mirrors X API v2:
```json
{ "data": { "id": "1234567890", "text": "Hello world!" } }
```
Errors are also JSON:
```json
{ "errors": [ { "message": "Not authorized", "code": 403 } ] }
```
---
## Common Workflows
### Post with an image
```bash
xurl media upload photo.jpg
xurl post "Check out this photo!" --media-id MEDIA_ID
```
### Reply to a conversation
```bash
xurl read https://x.com/user/status/1234567890
xurl reply 1234567890 "Here are my thoughts..."
```
### Search and engage
```bash
xurl search "topic of interest" -n 10
xurl like POST_ID_FROM_RESULTS
xurl reply POST_ID_FROM_RESULTS "Great point!"
```
### Check your activity
```bash
xurl whoami
xurl mentions -n 20
xurl timeline -n 20
```
### Multiple apps (credentials pre-configured manually)
```bash
xurl auth default prod alice # prod app, alice user
xurl --app staging /2/users/me # one-off against staging
```
---
## Error Handling
- Non-zero exit code on any error.
- API errors are still printed as JSON to stdout, so you can parse them.
- Auth errors → have the user re-run `xurl auth oauth2` outside the agent session.
- Commands that need the caller's user ID (like, repost, bookmark, follow, etc.) will auto-fetch it via `/2/users/me`. An auth failure there surfaces as an auth error.
---
## Agent Workflow
1. Verify prerequisites: `xurl --help` and `xurl auth status`.
2. If auth is missing, stop and direct the user to the "One-Time User Setup" section — do NOT attempt to register apps or pass secrets yourself.
3. Start with a cheap read (`xurl whoami`, `xurl user @handle`, `xurl search ... -n 3`) to confirm reachability.
4. Confirm the target post/user and the user's intent before any write action (post, reply, like, repost, DM, follow, block, delete).
5. Use JSON output directly — every response is already structured.
6. Never paste `~/.xurl` contents back into the conversation.
---
## Notes
- **Rate limits:** X enforces per-endpoint rate limits. A 429 means wait and retry. Write endpoints (post, reply, like, repost) have tighter limits than reads.
- **Scopes:** OAuth 2.0 tokens use broad scopes. A 403 on a specific action usually means the token is missing a scope — have the user re-run `xurl auth oauth2`.
- **Token refresh:** OAuth 2.0 tokens auto-refresh. Nothing to do.
- **Multiple apps:** Each app has isolated credentials/tokens. Switch with `xurl auth default` or `--app`.
- **Multiple accounts per app:** Select with `-u / --username`, or set a default with `xurl auth default APP USER`.
- **Token storage:** `~/.xurl` is YAML. Never read or send this file to LLM context.
- **Cost:** X API access is typically paid for meaningful usage. Many failures are plan/permission problems, not code problems.
---
## Attribution
- Upstream CLI: https://github.com/xdevplatform/xurl (X developer platform team, Chris Park et al.)
- Upstream agent skill: https://github.com/openclaw/openclaw/blob/main/skills/xurl/SKILL.md
- Hermes adaptation: reformatted for Hermes skill conventions; safety guardrails preserved verbatim.

Some files were not shown because too many files have changed in this diff Show More