Compare commits

...

25 Commits

Author SHA1 Message Date
Teknium 29c850058f fix(moonshot): strip $ref siblings and collapse tuple items in tool schemas
Port from anomalyco/opencode#24730: Moonshot's JSON Schema validator rejects
two shapes that the rest of the JSON Schema ecosystem accepts:

1. $ref nodes with sibling keywords. Moonshot expands the reference before
   validation and then rejects the node if keys like `description`, `type`,
   or `default` appear alongside $ref. MCP-sourced tool schemas commonly
   put a `description` on $ref-typed properties so the model sees the
   field hint — which worked on every provider except Moonshot.

2. Tuple-style `items` arrays (positional element schemas). Moonshot's
   engine requires ONE schema applied to every array element. Common in
   tool schemas generated from Go/Protobuf that model fixed-length arrays
   as `[{type:number}, {type:number}]`.

Repairs applied in `agent/moonshot_schema.py`:

- Rule 3: when a node has `$ref`, return `{"$ref": <value>}` only
  (strip every sibling). The referenced definition still carries its own
  description on the target node, which Moonshot accepts.
- Rule 4: when `items` is a list, collapse to the first element schema
  (falling back to `{}` which is then filled by the generic missing-type
  rule). Preserves `minItems` / `maxItems` / other siblings.

Tests: 10 new cases across TestRefSiblingStripping + TestTupleItems,
plus the existing TestMissingTypeFilled::test_ref_node_is_not_given_synthetic_type
still passes (it asserted plain $ref passes through; now it passes through
as exactly `{"$ref": "..."}` which is strictly compatible).

All 35 tests in test_moonshot_schema.py pass.
2026-04-30 17:07:46 -07:00
Teknium e5dad4ac57 fix(agent): propagate ContextVars to concurrent tool worker threads (#18123)
Propagates ContextVars (notably `tools.approval._approval_session_key`) into concurrent tool worker threads via `copy_context().run` — mirrors `asyncio.to_thread` semantics.

Fixes approval-card cross-session misrouting in concurrent gateway traffic. Repro'd on Slack: session A's dangerous-command approval was delivered to channel B (@syahidfrd).

Salvages #16660 — core 4-LOC fix preserved, unrelated `tests/eval_018/` scope contamination dropped. Adds 5 regression guards including an AST-level source check on the real call site.

Closes #16660.

Co-authored-by: firefly <promptsiren@gmail.com>
Co-authored-by: banditburai <banditburai@users.noreply.github.com>
2026-04-30 16:26:26 -07:00
Teknium 180a7036bc feat(skills): add Shopify optional skill (Admin + Storefront GraphQL) (#18116)
Adds optional-skills/productivity/shopify — curl-based guide for the
Shopify Admin GraphQL API (products, orders, customers, inventory,
metafields, bulk operations, webhooks) and the Storefront GraphQL API.

- API version 2026-01 (current stable)
- Custom-app access tokens (shpat_...) with X-Shopify-Access-Token header
- Notes the 2026-01-01 deprecation of admin-created custom apps, points
  users at Dev Dashboard for new setups after that date
- Includes a reusable shop_gql() bash helper, cursor pagination,
  rate-limit cost inspection, GID conventions, userErrors check
- Safety section warns on destructive mutations (delete/refund/cancel)

Installs cleanly via: hermes skills install official/productivity/shopify
2026-04-30 15:58:44 -07:00
brooklyn! 8fed969618 Merge pull request #18113 from NousResearch/bb/tui-sgr-mouse-fragments
fix(tui): recover fragmented SGR mouse reports
2026-04-30 15:56:59 -07:00
Brooklyn Nicholson ded011c5a5 fix(tui): tighten SGR fragment matching 2026-04-30 17:50:49 -05:00
Brooklyn Nicholson 71b685aee0 fix(tui): recover fragmented SGR mouse reports 2026-04-30 17:43:21 -05:00
Teknium bbbce92651 feat(tui): render self-improvement review summaries in the transcript
The Ink TUI (\`hermes --tui\` + dashboard \`/chat\`) had no wiring for the
background self-improvement review. When the review fired and patched
a skill or saved a memory entry, the change landed but the user had
no visual indication it happened — only the CLI had a print surface
for the '💾 Self-improvement review: …' line.

Changes:

- tui_gateway/server.py: in _init_session, attach
  agent.background_review_callback to an _emit('review.summary',
  sid, {text}) closure. Wrapped in try/except so agents with locked
  attribute slots don't break session startup.
- ui-tui/src/app/createGatewayEventHandler.ts: handle 'review.summary'
  by routing ev.payload.text through sys(…), matching the existing
  'background.complete' pattern. Empty / whitespace payloads are
  ignored so the transcript never gets a blank system line.
- ui-tui/src/gatewayTypes.ts: extend the GatewayEvent discriminated
  union with { type: 'review.summary', payload?: { text?: string } }.

Gateway platforms (Telegram, Discord, Slack, …) already route the
review summary via background_review_callback → post-delivery queue
in gateway/run.py, so they pick up the new 'Self-improvement review:'
prefix from the companion run_agent change with no platform edits.

Tests:
- tests/tui_gateway/test_review_summary_callback.py (Python, 2 tests):
  _init_session attaches a callback that emits the right event; the
  callback path survives agents that can't accept the attribute.
- ui-tui/src/__tests__/createGatewayEventHandler.test.ts (vitest, 2
  new cases): review.summary events feed sys(...) with the full text;
  empty / missing payloads are no-ops.
- TypeScript type-check passes.
- tui_gateway suite: 64/64 pass.
2026-04-30 14:07:22 -07:00
Teknium 80a676658c fix(cli): surface self-improvement review summaries from bg thread
When the self-improvement background review fires after a turn, it runs
in a bg thread and emits a '  💾 <summary>' line to announce what it
saved to memory or skills. Two problems made this invisible to users
even when the review successfully modified a skill:

1. The print went through `_cprint` (prompt_toolkit's print_formatted_text)
   on a bg thread while the CLI's PromptSession was live. Direct
   print_formatted_text races with the input-area redraw and the line
   can land behind/above the prompt, scrolled off without the user
   seeing it.

2. The message said only '💾 Skill created.' / '💾 Memory updated'
   with no indication that the self-improvement loop was the one doing
   this. Users who did catch the line couldn't tell the background
   review from some other agent action.

Fixes:

- `_cprint` now detects when it's called from a non-app thread with a
  running prompt_toolkit Application, and routes through
  `run_in_terminal` via `loop.call_soon_threadsafe`. That pauses the
  input, prints the line above the prompt, and redraws — the normal
  prompt_toolkit contract for bg-thread output. Direct-print fallback
  preserved for the no-app / same-thread / import-error paths. Affects
  every bg-thread emission, not just the review summary (curator
  summaries and auxiliary failure prints benefit too).

- The summary now reads '  💾 Self-improvement review: <summary>' in
  both the CLI and the gateway `background_review_callback` path, so
  the origin is unambiguous.

Tests:
- New `tests/cli/test_cprint_bg_thread.py` covers all five routing
  branches (no app, app-not-running, cross-thread schedule, same-thread
  direct, app-loop-attribute-error, import-error).
- New case in `tests/run_agent/test_background_review.py` asserts the
  attributed prefix shows up in both `_safe_print` and
  `background_review_callback`.

Live E2E: exercised _cprint from a bg thread inside a real Application
event loop; confirmed get_app_or_none() sees the app, call_soon_threadsafe
schedules run_in_terminal, and the inner _pt_print runs.
2026-04-30 14:07:22 -07:00
Teknium c868425467 feat(kanban): durable multi-profile collaboration board (#17805)
Salvage of PR #16100 onto current main (after emozilla's #17514 fix
that unblocks plugin Pydantic body validation). History preserved on
the standing `feat/kanban-standing` branch; this squashes the 22
iterative commits into one clean landing.

What this lands:
- SQLite kernel (hermes_cli/kanban_db.py) — durable task board with
  tasks, task_links, task_runs, task_comments, task_events,
  kanban_notify_subs tables. WAL mode, atomic claim via CAS,
  tenant-namespaced, skills JSON array per task, max-runtime timeouts,
  worker heartbeats, idempotency keys, circuit breaker on repeated
  spawn failures, crash detection via /proc/<pid>/status, run history
  preserved across attempts.
- Dispatcher — runs inside the gateway by default
  (`kanban.dispatch_in_gateway: true`). Ticks every 60s, reclaims
  stale claims, promotes ready tasks, spawns `hermes -p <assignee>
  chat -q "work kanban task <id>"` with HERMES_KANBAN_TASK +
  HERMES_KANBAN_WORKSPACE env. Auto-loads `--skills kanban-worker`
  plus any per-task skills. Health telemetry warns on stuck ready
  queue.
- Structured tool surface (tools/kanban_tools.py) — 7 tools
  (kanban_show, kanban_complete, kanban_block, kanban_heartbeat,
  kanban_comment, kanban_create, kanban_link). Gated on
  HERMES_KANBAN_TASK via check_fn so zero schema footprint in normal
  sessions.
- System-prompt guidance (agent/prompt_builder.py KANBAN_GUIDANCE)
  injected only when kanban tools are active.
- Dashboard plugin (plugins/kanban/dashboard/) — Linear-style board
  UI: triage/todo/ready/running/blocked/done columns, drag-drop,
  inline create, task drawer with markdown, comments, run history,
  dependency editor, bulk ops, lanes-by-profile grouping, WS-driven
  live refresh. Matches active dashboard theme via CSS variables.
- CLI — `hermes kanban init|create|list|show|assign|link|unlink|
  claim|comment|complete|block|unblock|archive|tail|dispatch|context|
  init|gc|watch|stats|notify|log|heartbeat|runs|assignees` +
  `/kanban` slash in-session.
- Worker + orchestrator skills (skills/devops/kanban-worker +
  kanban-orchestrator) — pattern library for good summary/metadata
  shapes, retry diagnostics, block-reason examples, fan-out patterns.
- Per-task force-loaded skills — `--skill <name>` (repeatable),
  stored as JSON, threaded through to dispatcher argv as one
  `--skills X` pair per skill alongside the built-in kanban-worker.
  Dashboard + CLI + tool parity.
- Deprecation of standalone `hermes kanban daemon` — stub exits 2
  with migration guidance; `--force` escape hatch for headless hosts.
- Docs (website/docs/user-guide/features/kanban.md + kanban-tutorial.md)
  with 11 dashboard screenshots walking through four user stories
  (Solo Dev, Fleet Farming, Role Pipeline, Circuit Breaker).
- Tests (251 passing): kernel schema + migration + CAS atomicity,
  dispatcher logic, circuit breaker, crash detection, max-runtime
  timeouts, claim lifecycle, tenant isolation, idempotency keys, per-
  task skills round-trip + validation + dispatcher argv, tool surface
  (7 tools × round-trip + error paths), dashboard REST (CRUD + bulk
  + links + warnings), gateway-embedded dispatcher (config gate, env
  override, graceful shutdown), CLI deprecation stub, migration from
  legacy schemas.

Gateway integration:
- GatewayRunner._kanban_dispatcher_watcher — new asyncio background
  task, symmetric with _kanban_notifier_watcher. Runs dispatch_once
  via asyncio.to_thread so SQLite WAL never blocks the loop. Sleeps
  in 1s slices for snappy shutdown. Respects HERMES_KANBAN_DISPATCH_IN_GATEWAY=0
  env override for debugging.
- Config: new `kanban` section in DEFAULT_CONFIG with
  `dispatch_in_gateway: true` (default) + `dispatch_interval_seconds: 60`.
  Additive — no \_config_version bump needed.

Forward-compat:
- workflow_template_id / current_step_key columns on tasks (v1 writes
  NULL; v2 will use them for routing).
- task_runs holds claim machinery (claim_lock, claim_expires,
  worker_pid, last_heartbeat_at) so multi-attempt history is first-
  class from day one.

Closes #16102.

Co-authored-by: emozilla <emozilla@nousresearch.com>
2026-04-30 13:36:47 -07:00
ethernet 59c1a13f45 Merge pull request #15680 from NousResearch/fix/nix-package-lock
fix: let fixing nix pkgs command work without an initial build
2026-04-30 16:21:51 -04:00
Teknium 1d8068d71d feat(models): add openrouter/owl-alpha (free) to curated OpenRouter list (#18071) 2026-04-30 12:57:02 -07:00
Ari Lotter 9ac4a2e53e fix: let fixing nix pkgs command work without an initial build 2026-04-30 15:39:45 -04:00
Austin Pickett 6bc5d72271 Merge pull request #16419 from vincez-hms-coder/feat/dashboard-profiles-hms-coder
feat(dashboard): add profiles management page
2026-04-30 12:09:23 -07:00
ethernet b737af8226 Merge pull request #18047 from stephenschoettler/fix/acp-persist-user-message-test-mocks
test(acp): accept prompt persistence kwargs in MCP E2E mocks
2026-04-30 14:43:26 -04:00
Stephen Schoettler 699a9c11a9 test(acp): accept prompt persistence kwargs in mocks 2026-04-30 10:47:23 -07:00
VinceZ-Hms-Coder ca7f46beb5 Merge upstream/main and address Copilot review feedback
Merge resolved conflicts in web/src/{i18n/{en,zh,types}.ts,lib/api.ts}
by keeping both this branch's `profiles` additions and upstream's new
`models` page additions.

Copilot review feedback:
- Implement POST /api/profiles/{name}/open-terminal endpoint (already
  present); align Windows branch to `cmd.exe /c start "" <cmd>` so it
  matches the new test and spawns a fresh window instead of /k reusing
  the parent console.
- Move backslash escaping out of the macOS AppleScript f-string
  expression (Python <3.12 disallows backslashes inside f-string
  expression parts).
- Patch `_get_wrapper_dir` via monkeypatch in
  test_profiles_create_creates_wrapper_alias_when_safe so the test no
  longer writes to the real `~/.local/bin`.
- Extend test_dashboard_browser_safe_imports to scan `.ts` files in
  addition to `.tsx`.
- Switch upstream's new ModelsPage.tsx away from the `@nous-research/ui`
  root barrel onto per-component subpaths to satisfy the stricter scan.
- Fix NouiTypography `leading-1.4` -> `leading-[1.4]` so Tailwind
  actually emits the line-height for the `sm` variant.
- Guard ProfilesPage.openSoulEditor against out-of-order responses by
  tracking the latest requested profile via a ref.
- Replace ProfilesPage's hand-rolled setup command with a fetch to
  `/api/profiles/{name}/setup-command` so the copied command always
  matches what the backend would actually run (handles wrapper-alias
  collisions and reserved names correctly).
- Wire SOUL.md textarea label `htmlFor` -> textarea `id` so screen
  readers and clicking the label work as expected.
2026-04-30 06:43:22 -04:00
Austin Pickett cb0e2e2f36 Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-04-29 15:23:30 -04:00
vincez-hms-coder 4c0cc77e94 fix(dashboard): keep ui imports browser-safe after rebase 2026-04-29 01:47:13 -04:00
vincez-hms-coder 9b62c98170 chore(dashboard): restore package lock metadata 2026-04-29 01:43:21 -04:00
vincez-hms-coder 469e4df3c2 fix(profiles): preserve skills on dashboard profile creation 2026-04-29 01:42:51 -04:00
vincez-hms-coder ae11a31058 feat(profiles): add profile setup command endpoint and wrapper creation 2026-04-29 01:42:51 -04:00
vincez-hms-coder 3e200b64fb fix(profiles): update terminal command for copying based on profile name
Co-authored-by: Copilot <copilot@github.com>
2026-04-29 01:42:51 -04:00
vincez-hms-coder 1745cfc6d7 fix(dashboard): avoid node-only ui imports in browser 2026-04-29 01:42:50 -04:00
vincez-hms-coder 58c07867e3 fix(dashboard): keep profiles list resilient 2026-04-29 01:39:52 -04:00
vincez-hms-coder 4523965de9 feat(dashboard): add profiles management page
Copy profile dashboard changes onto a fresh branch under the vincez-hms-coder account.

Includes:
- Profiles dashboard route and sidebar entry
- Profile lifecycle REST endpoints
- SOUL.md read/write support
- i18n labels and helper text updates
- Targeted profile API tests

Test plan:
- pytest tests/hermes_cli/test_web_server.py -k profile -q
- cd web && npm run build
2026-04-29 01:39:51 -04:00
103 changed files with 20144 additions and 76 deletions
+25 -3
View File
@@ -15,6 +15,16 @@ and MoonshotAI/kimi-cli#1595:
2. When ``anyOf`` is used, ``type`` must be on the ``anyOf`` children, not
the parent. Presence of both causes "type should be defined in anyOf
items instead of the parent schema".
3. ``$ref`` nodes may not carry sibling keywords. Moonshot expands the
reference before validation and then rejects the node if sibling keys
like ``description`` remain on the same node as ``$ref``. Strip every
sibling from ``$ref`` nodes so only ``{"$ref": "..."}`` survives.
(Ported from anomalyco/opencode#24730.)
4. ``items`` may not be a tuple-style array (``items: [schemaA, schemaB]``
for positional element schemas). Moonshot's schema engine requires a
single object schema applied to every array element. Collapse tuple
``items`` to the first element schema (or ``{}`` if the tuple is empty).
(Ported from anomalyco/opencode#24730.)
The ``#/definitions/...`` → ``#/$defs/...`` rewrite for draft-07 refs is
handled separately in ``tools/mcp_tool._normalize_mcp_input_schema`` so it
@@ -66,6 +76,16 @@ def _repair_schema(node: Any, is_schema: bool = True) -> Any:
}
elif key in _SCHEMA_LIST_KEYS and isinstance(value, list):
repaired[key] = [_repair_schema(v, is_schema=True) for v in value]
elif key == "items" and isinstance(value, list):
# Rule 4: tuple-style ``items`` arrays (positional element
# schemas) are not accepted by Moonshot. Collapse to the
# first element schema if present, else to ``{}``. This
# matches opencode's behaviour for moonshotai / kimi models.
first = value[0] if value else {}
if isinstance(first, dict):
repaired[key] = _repair_schema(first, is_schema=True)
else:
repaired[key] = first
elif key in _SCHEMA_NODE_KEYS:
# items / not / additionalProperties: single nested schema.
# additionalProperties can also be a bool — leave those alone.
@@ -85,10 +105,12 @@ def _repair_schema(node: Any, is_schema: bool = True) -> Any:
repaired.pop("type", None)
return repaired
# Rule 1: property schemas without type need one. $ref nodes are exempt
# — their type comes from the referenced definition.
# Rule 3: $ref nodes must not have sibling keywords. Strip everything
# except $ref itself so Moonshot's validator (which expands the ref
# before checking) doesn't reject the node for redundant keys like
# ``description`` / ``type`` / ``default`` appearing alongside $ref.
if "$ref" in repaired:
return repaired
return {"$ref": repaired["$ref"]}
return _fill_missing_type(repaired)
+58
View File
@@ -182,6 +182,64 @@ SKILLS_GUIDANCE = (
"Skills that aren't maintained become liabilities."
)
KANBAN_GUIDANCE = (
"# You are a Kanban worker\n"
"You were spawned by the Hermes Kanban dispatcher to execute ONE task from "
"the shared board at `~/.hermes/kanban.db`. Your task id is in "
"`$HERMES_KANBAN_TASK`; your workspace is `$HERMES_KANBAN_WORKSPACE`. "
"The `kanban_*` tools in your schema are your primary coordination surface — "
"they write directly to the shared SQLite DB and work regardless of terminal "
"backend (local/docker/modal/ssh).\n"
"\n"
"## Lifecycle\n"
"\n"
"1. **Orient.** Call `kanban_show()` first (no args — it defaults to your "
"task). The response includes title, body, parent-task handoffs (summary + "
"metadata), any prior attempts on this task if you're a retry, the full "
"comment thread, and a pre-formatted `worker_context` you can treat as "
"ground truth.\n"
"2. **Work inside the workspace.** `cd $HERMES_KANBAN_WORKSPACE` before "
"any file operations. The workspace is yours for this run. Don't modify "
"files outside it unless the task explicitly asks.\n"
"3. **Heartbeat on long operations.** Call `kanban_heartbeat(note=...)` "
"every few minutes during long subprocesses (training, encoding, crawling). "
"Skip heartbeats for short tasks.\n"
"4. **Block on genuine ambiguity.** If you need a human decision you cannot "
"infer (missing credentials, UX choice, paywalled source, peer output you "
"need first), call `kanban_block(reason=\"...\")` and stop. Don't guess. "
"The user will unblock with context and the dispatcher will respawn you.\n"
"5. **Complete with structured handoff.** Call `kanban_complete(summary=..., "
"metadata=...)`. `summary` is 13 human-readable sentences naming concrete "
"artifacts. `metadata` is machine-readable facts "
"(`{changed_files: [...], tests_run: N, decisions: [...]}`). Downstream "
"workers read both via their own `kanban_show`. Never put secrets / "
"tokens / raw PII in either field — run rows are durable forever.\n"
"6. **If follow-up work appears, create it; don't do it.** Use "
"`kanban_create(title=..., assignee=<right-profile>, parents=[your-task-id])` "
"to spawn a child task for the appropriate specialist profile instead of "
"scope-creeping into the next thing.\n"
"\n"
"## Orchestrator mode\n"
"\n"
"If your task is itself a decomposition task (e.g. a planner profile given "
"a high-level goal), use `kanban_create` to fan out into child tasks — one "
"per specialist, each with an explicit `assignee` and `parents=[...]` to "
"express dependencies. Then `kanban_complete` your own task with a summary "
"of the decomposition. Do NOT execute the work yourself; your job is "
"routing, not implementation.\n"
"\n"
"## Do NOT\n"
"\n"
"- Do not shell out to `hermes kanban <verb>` for board operations. Use "
"the `kanban_*` tools — they work across all terminal backends.\n"
"- Do not complete a task you didn't actually finish. Block it.\n"
"- Do not assign follow-up work to yourself. Assign it to the right "
"specialist profile.\n"
"- Do not call `delegate_task` as a board substitute. `delegate_task` is "
"for short reasoning subtasks inside your own run; board tasks are for "
"cross-agent handoffs that outlive one API loop."
)
TOOL_USE_ENFORCEMENT_GUIDANCE = (
"# Tool-use enforcement\n"
"You MUST use your tools to take action — do not describe what you would do "
+89 -1
View File
@@ -1240,8 +1240,73 @@ def _cprint(text: str):
Raw ANSI escapes written via print() are swallowed by patch_stdout's
StdoutProxy. Routing through print_formatted_text(ANSI(...)) lets
prompt_toolkit parse the escapes and render real colors.
When called from a background thread while a prompt_toolkit
``Application`` is running (the common case for the self-improvement
background review's ``💾 …`` summary, curator summaries, and other
bg-thread emissions), a direct ``_pt_print`` races with the input
area's redraw and the line can end up visually buried behind the
prompt. Route those cases through ``run_in_terminal`` via
``loop.call_soon_threadsafe``, which pauses the input area, prints
the line above it, and redraws the prompt cleanly.
"""
_pt_print(_PT_ANSI(text))
try:
from prompt_toolkit.application import get_app_or_none, run_in_terminal
except Exception:
_pt_print(_PT_ANSI(text))
return
app = None
try:
app = get_app_or_none()
except Exception:
app = None
# No active app, or we're already on the app's main thread: the
# direct prompt_toolkit print is safe and matches existing behavior
# (spinner frames, streamed tokens, tool activity prefixes, …).
if app is None or not getattr(app, "_is_running", False):
_pt_print(_PT_ANSI(text))
return
try:
loop = app.loop # type: ignore[attr-defined]
except Exception:
loop = None
if loop is None:
_pt_print(_PT_ANSI(text))
return
import asyncio as _asyncio
try:
current_loop = _asyncio.get_event_loop_policy().get_event_loop()
except Exception:
current_loop = None
# Same thread as the app's loop → safe to print directly.
if current_loop is loop and loop.is_running():
_pt_print(_PT_ANSI(text))
return
# Cross-thread emission: ask the app's event loop to schedule a
# ``run_in_terminal`` that wraps ``_pt_print``. This hides the
# prompt, prints, and redraws. Fire-and-forget — if scheduling
# fails we fall back to a direct print so the line isn't lost.
def _schedule():
try:
run_in_terminal(lambda: _pt_print(_PT_ANSI(text)))
except Exception:
try:
_pt_print(_PT_ANSI(text))
except Exception:
pass
try:
loop.call_soon_threadsafe(_schedule)
except Exception:
try:
_pt_print(_PT_ANSI(text))
except Exception:
pass
# ---------------------------------------------------------------------------
@@ -6087,6 +6152,27 @@ class HermesCLI:
except Exception as exc:
print(f"(._.) curator: {exc}")
def _handle_kanban_command(self, cmd: str):
"""Handle the /kanban command — delegate to the shared kanban CLI.
The string form passed here is the user's full ``/kanban ...``
including the leading slash; we strip it and hand the remainder
to ``kanban.run_slash`` which returns a single formatted string.
"""
from hermes_cli.kanban import run_slash
rest = cmd.strip()
if rest.startswith("/"):
rest = rest.lstrip("/")
if rest.startswith("kanban"):
rest = rest[len("kanban"):].lstrip()
try:
output = run_slash(rest)
except Exception as exc: # pragma: no cover - defensive
output = f"(._.) kanban error: {exc}"
if output:
print(output)
def _handle_skills_command(self, cmd: str):
"""Handle /skills slash command — delegates to hermes_cli.skills_hub."""
from hermes_cli.skills_hub import handle_skills_slash
@@ -6332,6 +6418,8 @@ class HermesCLI:
self._handle_cron_command(cmd_original)
elif canonical == "curator":
self._handle_curator_command(cmd_original)
elif canonical == "kanban":
self._handle_kanban_command(cmd_original)
elif canonical == "skills":
with self._busy_command(self._slow_command_status(cmd_original)):
self._handle_skills_command(cmd_original)
Binary file not shown.
+493
View File
@@ -2732,6 +2732,17 @@ class GatewayRunner:
# Start background session expiry watcher to finalize expired sessions
asyncio.create_task(self._session_expiry_watcher())
# Start background kanban notifier — delivers `completed`, `blocked`,
# `spawn_auto_blocked`, and `crashed` events to gateway subscribers
# so human-in-the-loop workflows hear back without polling.
asyncio.create_task(self._kanban_notifier_watcher())
# Start background kanban dispatcher — spawns workers for ready
# tasks. Gated by `kanban.dispatch_in_gateway` (default True).
# When false, users run `hermes kanban daemon` externally or
# simply don't use kanban; this loop becomes a no-op.
asyncio.create_task(self._kanban_dispatcher_watcher())
# Start background reconnection watcher for platforms that failed at startup
if self._failed_platforms:
logger.info(
@@ -2907,6 +2918,399 @@ class GatewayRunner:
break
await asyncio.sleep(1)
async def _kanban_notifier_watcher(self, interval: float = 5.0) -> None:
"""Poll ``kanban_notify_subs`` and deliver terminal events to users.
For each subscription row, fetches ``task_events`` newer than the
stored cursor with kind in the terminal set (``completed``,
``blocked``, ``gave_up``, ``crashed``, ``timed_out``). Sends one
message per new event to ``(platform, chat_id, thread_id)``,
then advances the cursor. When a task reaches a terminal state
(``completed`` / ``archived``), the subscription is removed.
Runs in the gateway event loop; all SQLite work is pushed to a
thread via ``asyncio.to_thread`` so the loop never blocks on the
WAL lock. Failures in one tick don't stop subsequent ticks.
"""
from gateway.config import Platform as _Platform
try:
from hermes_cli import kanban_db as _kb
except Exception:
logger.warning("kanban notifier: kanban_db not importable; notifier disabled")
return
TERMINAL_KINDS = ("completed", "blocked", "gave_up", "crashed", "timed_out")
# Terminal event kinds trigger automatic unsubscription — the task
# is done, blocked, or in a retry-needed state that the human
# shouldn't keep pinging a stale chat for. Previously we only
# unsubbed when task.status in ('done', 'archived'), which left
# subscriptions on 'blocked' / 'gave_up' / 'crashed' / 'timed_out'
# tasks stranded forever.
TERMINAL_EVENT_KINDS = TERMINAL_KINDS
# Per-subscription send-failure counter. Adapter.send raising
# means the chat is dead (deleted, bot kicked, etc.) — after N
# consecutive send failures the sub is dropped so we don't spin
# against a dead chat every 5 seconds forever.
MAX_SEND_FAILURES = 3
sub_fail_counts: dict[tuple, int] = getattr(
self, "_kanban_sub_fail_counts", {}
)
self._kanban_sub_fail_counts = sub_fail_counts
# Initial delay so the gateway can finish wiring adapters.
await asyncio.sleep(5)
while self._running:
try:
def _collect():
conn = _kb.connect()
try:
_kb.init_db() # idempotent; handles first-run
except Exception:
pass
try:
subs = _kb.list_notify_subs(conn)
deliveries: list[dict] = []
for sub in subs:
cursor, events = _kb.unseen_events_for_sub(
conn,
task_id=sub["task_id"],
platform=sub["platform"],
chat_id=sub["chat_id"],
thread_id=sub.get("thread_id") or "",
kinds=TERMINAL_KINDS,
)
if not events:
continue
task = _kb.get_task(conn, sub["task_id"])
deliveries.append({
"sub": sub,
"cursor": cursor,
"events": events,
"task": task,
})
return deliveries
finally:
conn.close()
deliveries = await asyncio.to_thread(_collect)
for d in deliveries:
sub = d["sub"]
task = d["task"]
platform_str = (sub["platform"] or "").lower()
try:
plat = _Platform(platform_str)
except ValueError:
# Unknown platform string; skip and advance cursor so
# we don't replay forever.
await asyncio.to_thread(
self._kanban_advance, sub, d["cursor"],
)
continue
adapter = self.adapters.get(plat)
if adapter is None:
continue # platform not currently connected
title = (task.title if task else sub["task_id"])[:120]
for ev in d["events"]:
kind = ev.kind
# Identity prefix: attribute terminal pings to the
# worker that did the work. Makes fleets (where one
# chat subscribes to many tasks) legible at a glance.
who = (task.assignee if task and task.assignee else None)
tag = f"@{who} " if who else ""
if kind == "completed":
# Prefer the run's summary (the worker's
# intentional human-facing handoff, carried
# in the event payload), then fall back to
# task.result for legacy rows written before
# runs shipped.
handoff = ""
payload_summary = None
if ev.payload and ev.payload.get("summary"):
payload_summary = str(ev.payload["summary"])
if payload_summary:
h = payload_summary.strip().splitlines()[0][:200]
handoff = f"\n{h}"
elif task and task.result:
r = task.result.strip().splitlines()[0][:160]
handoff = f"\n{r}"
msg = (
f"{tag}Kanban {sub['task_id']} done"
f"{title}{handoff}"
)
elif kind == "blocked":
reason = ""
if ev.payload and ev.payload.get("reason"):
reason = f": {str(ev.payload['reason'])[:160]}"
msg = f"{tag}Kanban {sub['task_id']} blocked{reason}"
elif kind == "gave_up":
err = ""
if ev.payload and ev.payload.get("error"):
err = f"\n{str(ev.payload['error'])[:200]}"
msg = (
f"{tag}Kanban {sub['task_id']} gave up "
f"after repeated spawn failures{err}"
)
elif kind == "crashed":
msg = (
f"{tag}Kanban {sub['task_id']} worker crashed "
f"(pid gone); dispatcher will retry"
)
elif kind == "timed_out":
limit = 0
if ev.payload and ev.payload.get("limit_seconds"):
limit = int(ev.payload["limit_seconds"])
msg = (
f"{tag}Kanban {sub['task_id']} timed out "
f"(max_runtime={limit}s); will retry"
)
else:
continue
metadata: dict[str, Any] = {}
if sub.get("thread_id"):
metadata["thread_id"] = sub["thread_id"]
sub_key = (
sub["task_id"], sub["platform"],
sub["chat_id"], sub.get("thread_id") or "",
)
try:
await adapter.send(
sub["chat_id"], msg, metadata=metadata,
)
# Reset the failure counter on success.
sub_fail_counts.pop(sub_key, None)
except Exception as exc:
fails = sub_fail_counts.get(sub_key, 0) + 1
sub_fail_counts[sub_key] = fails
logger.warning(
"kanban notifier: send failed for %s on %s "
"(attempt %d/%d): %s",
sub["task_id"], platform_str, fails,
MAX_SEND_FAILURES, exc,
)
if fails >= MAX_SEND_FAILURES:
logger.warning(
"kanban notifier: dropping subscription "
"%s on %s after %d consecutive send failures",
sub["task_id"], platform_str, fails,
)
await asyncio.to_thread(self._kanban_unsub, sub)
sub_fail_counts.pop(sub_key, None)
# Don't advance cursor on send failure — retry next tick.
break
else:
# All events delivered; advance cursor + maybe unsub.
await asyncio.to_thread(
self._kanban_advance, sub, d["cursor"],
)
# Unsubscribe when the LAST delivered event is a
# terminal kind (the task hit a "no further updates"
# state), not just on task.status in {done, archived}.
# Covers blocked / gave_up / crashed / timed_out which
# used to leak subs forever.
last_kind = d["events"][-1].kind if d["events"] else None
task_terminal = task and task.status in ("done", "archived")
event_terminal = last_kind in TERMINAL_EVENT_KINDS
if task_terminal or event_terminal:
await asyncio.to_thread(
self._kanban_unsub, sub,
)
except Exception as exc:
logger.warning("kanban notifier tick failed: %s", exc)
# Sleep with cancellation checks.
for _ in range(int(max(1, interval))):
if not self._running:
return
await asyncio.sleep(1)
def _kanban_advance(self, sub: dict, cursor: int) -> None:
"""Sync helper: advance a subscription's cursor. Runs in to_thread."""
from hermes_cli import kanban_db as _kb
conn = _kb.connect()
try:
_kb.advance_notify_cursor(
conn,
task_id=sub["task_id"],
platform=sub["platform"],
chat_id=sub["chat_id"],
thread_id=sub.get("thread_id") or "",
new_cursor=cursor,
)
finally:
conn.close()
def _kanban_unsub(self, sub: dict) -> None:
from hermes_cli import kanban_db as _kb
conn = _kb.connect()
try:
_kb.remove_notify_sub(
conn,
task_id=sub["task_id"],
platform=sub["platform"],
chat_id=sub["chat_id"],
thread_id=sub.get("thread_id") or "",
)
finally:
conn.close()
async def _kanban_dispatcher_watcher(self) -> None:
"""Embedded kanban dispatcher — one tick every `dispatch_interval_seconds`.
Gated by `kanban.dispatch_in_gateway` in config.yaml (default True).
When true, the gateway hosts the single dispatcher for this profile:
no separate `hermes kanban daemon` process needed. When false, the
loop exits immediately and an external daemon is expected.
Each tick calls :func:`kanban_db.dispatch_once` inside
``asyncio.to_thread`` so the SQLite WAL lock never blocks the
event loop. Failures in one tick don't stop subsequent ticks —
same pattern as `_kanban_notifier_watcher`.
Shutdown: the loop checks ``self._running`` between ticks; gateway
stop() flips it to False and cancels pending tasks, and the
in-flight ``to_thread`` returns on its own after the current
``dispatch_once`` call finishes (typically <1ms on an idle board).
"""
# Read config once at boot. If the user flips the flag later, they
# restart the gateway; same pattern as every other background
# watcher here. Honours HERMES_KANBAN_DISPATCH_IN_GATEWAY env var
# as an escape hatch (false-y value disables without editing YAML).
try:
from hermes_cli.config import load_config as _load_config
except Exception:
logger.warning("kanban dispatcher: config loader unavailable; disabled")
return
env_override = os.environ.get("HERMES_KANBAN_DISPATCH_IN_GATEWAY", "").strip().lower()
if env_override in ("0", "false", "no", "off"):
logger.info("kanban dispatcher: disabled via HERMES_KANBAN_DISPATCH_IN_GATEWAY env")
return
try:
cfg = _load_config()
except Exception as exc:
logger.warning("kanban dispatcher: cannot load config (%s); disabled", exc)
return
kanban_cfg = cfg.get("kanban", {}) if isinstance(cfg, dict) else {}
if not kanban_cfg.get("dispatch_in_gateway", True):
logger.info(
"kanban dispatcher: disabled via config kanban.dispatch_in_gateway=false"
)
return
try:
from hermes_cli import kanban_db as _kb
except Exception:
logger.warning("kanban dispatcher: kanban_db not importable; dispatcher disabled")
return
interval = float(kanban_cfg.get("dispatch_interval_seconds", 60) or 60)
if interval < 1.0:
interval = 1.0 # sanity floor — tighter than this is a footgun
# Initial delay so the gateway finishes wiring adapters before the
# dispatcher spawns workers (those workers may hit gateway notify
# subscriptions etc.). Matches the notifier watcher's delay.
await asyncio.sleep(5)
# Health telemetry mirrored from `_cmd_daemon`: warn when ready
# queue is non-empty but spawns are 0 for N consecutive ticks —
# usually means broken PATH, missing venv, or credential loss.
HEALTH_WINDOW = 6
bad_ticks = 0
last_warn_at = 0
def _tick_once() -> "Optional[object]":
"""Run one dispatch_once; return result or None on error.
Runs in a worker thread via `asyncio.to_thread`."""
conn = None
try:
conn = _kb.connect()
try:
_kb.init_db() # idempotent, handles first-run
except Exception:
pass
return _kb.dispatch_once(conn)
except Exception:
logger.exception("kanban dispatcher: tick failed")
return None
finally:
if conn is not None:
try:
conn.close()
except Exception:
pass
def _ready_nonempty() -> bool:
"""Cheap probe: is there at least one ready+assigned+unclaimed task?"""
conn = None
try:
conn = _kb.connect()
row = conn.execute(
"SELECT 1 FROM tasks "
"WHERE status = 'ready' AND assignee IS NOT NULL "
" AND claim_lock IS NULL LIMIT 1"
).fetchone()
return row is not None
except Exception:
return False
finally:
if conn is not None:
try:
conn.close()
except Exception:
pass
logger.info(
"kanban dispatcher: embedded in gateway (interval=%.1fs)", interval
)
while self._running:
try:
res = await asyncio.to_thread(_tick_once)
if res is not None and getattr(res, "spawned", None):
# Quiet by default — only log when something actually
# happened, so an idle gateway stays silent.
logger.info(
"kanban dispatcher: tick spawned=%d reclaimed=%d "
"crashed=%d timed_out=%d promoted=%d auto_blocked=%d",
len(res.spawned),
res.reclaimed,
len(res.crashed) if hasattr(res.crashed, "__len__") else 0,
len(res.timed_out) if hasattr(res.timed_out, "__len__") else 0,
res.promoted,
len(res.auto_blocked) if hasattr(res.auto_blocked, "__len__") else 0,
)
# Health telemetry
ready_pending = await asyncio.to_thread(_ready_nonempty)
spawned_any = bool(res and getattr(res, "spawned", None))
if ready_pending and not spawned_any:
bad_ticks += 1
else:
bad_ticks = 0
if bad_ticks >= HEALTH_WINDOW:
now = int(time.time())
if now - last_warn_at >= 300:
logger.warning(
"kanban dispatcher stuck: ready queue non-empty for "
"%d consecutive ticks but 0 workers spawned. Check "
"profile health (venv, PATH, credentials) and "
"`hermes kanban list --status ready`.",
bad_ticks,
)
last_warn_at = now
except asyncio.CancelledError:
logger.debug("kanban dispatcher: cancelled")
raise
except Exception:
logger.exception("kanban dispatcher: unexpected watcher error")
# Sleep in 1s slices so shutdown is snappy — otherwise a stop()
# waits up to `interval` seconds for the current sleep to finish.
slept = 0.0
while slept < interval and self._running:
await asyncio.sleep(min(1.0, interval - slept))
slept += 1.0
async def _platform_reconnect_watcher(self) -> None:
"""Background task that periodically retries connecting failed platforms.
@@ -4168,6 +4572,14 @@ class GatewayRunner:
if _cmd_def_inner and _cmd_def_inner.name == "background":
return await self._handle_background_command(event)
# /kanban must bypass the guard. It writes to a profile-agnostic
# DB (kanban.db), not to the running agent's state. In fact
# /kanban unblock is often the only way to free a worker that
# has blocked waiting for a peer — letting that be dispatched
# mid-run is the whole point of the board.
if _cmd_def_inner and _cmd_def_inner.name == "kanban":
return await self._handle_kanban_command(event)
# Session-level toggles that are safe to run mid-agent —
# /yolo can unblock a pending approval prompt, /verbose cycles
# the tool-progress display mode for the ongoing stream.
@@ -4415,6 +4827,9 @@ class GatewayRunner:
if canonical == "personality":
return await self._handle_personality_command(event)
if canonical == "kanban":
return await self._handle_kanban_command(event)
if canonical == "retry":
return await self._handle_retry_command(event)
@@ -6031,6 +6446,84 @@ class GatewayRunner:
return "\n".join(lines)
async def _handle_kanban_command(self, event: MessageEvent) -> str:
"""Handle /kanban — delegate to the shared kanban CLI.
Run the potentially-blocking DB work in a thread pool so the
gateway event loop stays responsive. Read operations (list,
show, context, tail) are permitted while an agent is running;
mutations are allowed too because the board is profile-agnostic
and does not touch the running agent's state.
For ``/kanban create`` invocations we also auto-subscribe the
originating gateway source (platform + chat + thread) to the new
task's terminal events, so the user hears back when the worker
completes / blocks / auto-blocks / crashes without having to poll.
"""
import asyncio
import re
from hermes_cli.kanban import run_slash
text = (event.text or "").strip()
# Strip the leading "/kanban" (with or without slash), leaving args.
if text.startswith("/"):
text = text.lstrip("/")
if text.startswith("kanban"):
text = text[len("kanban"):].lstrip()
is_create = text.split(None, 1)[:1] == ["create"]
try:
output = await asyncio.to_thread(run_slash, text)
except Exception as exc: # pragma: no cover - defensive
return f"⚠ kanban error: {exc}"
# Auto-subscribe on create. Parse the task id from the CLI's standard
# success line ("Created t_abcd (ready, assignee=...)"). If the user
# passed --json we don't subscribe; they're clearly scripting and
# can call /kanban notify-subscribe explicitly.
if is_create and output:
m = re.search(r"Created\s+(t_[0-9a-f]+)\b", output)
if m:
task_id = m.group(1)
try:
source = event.source
platform = getattr(source, "platform", None)
platform_str = (
platform.value if hasattr(platform, "value") else str(platform or "")
).lower()
chat_id = str(getattr(source, "chat_id", "") or "")
thread_id = str(getattr(source, "thread_id", "") or "")
user_id = str(getattr(source, "user_id", "") or "") or None
if platform_str and chat_id:
def _sub():
from hermes_cli import kanban_db as _kb
conn = _kb.connect()
try:
_kb.add_notify_sub(
conn, task_id=task_id,
platform=platform_str, chat_id=chat_id,
thread_id=thread_id or None,
user_id=user_id,
)
finally:
conn.close()
await asyncio.to_thread(_sub)
output = (
output.rstrip()
+ f"\n(subscribed — you'll be notified when {task_id} "
f"completes or blocks)"
)
except Exception as exc:
logger.warning("kanban create auto-subscribe failed: %s", exc)
# Gateway messages have practical length caps; truncate long
# listings to keep the UX reasonable.
if len(output) > 3800:
output = output[:3800] + "\n… (truncated; use `hermes kanban …` in your terminal for full output)"
return output or "(no output)"
async def _handle_status_command(self, event: MessageEvent) -> str:
"""Handle /status command."""
source = event.source
+5
View File
@@ -151,6 +151,11 @@ COMMAND_REGISTRY: list[CommandDef] = [
CommandDef("curator", "Background skill maintenance (status, run, pin, archive)",
"Tools & Skills", args_hint="[subcommand]",
subcommands=("status", "run", "pause", "resume", "pin", "unpin", "restore")),
CommandDef("kanban", "Multi-profile collaboration board (tasks, links, comments)",
"Tools & Skills", args_hint="[subcommand]",
subcommands=("list", "ls", "show", "create", "assign", "link", "unlink",
"claim", "comment", "complete", "block", "unblock", "archive",
"tail", "dispatch", "context", "init", "gc")),
CommandDef("reload", "Reload .env variables into the running session", "Tools & Skills",
cli_only=True),
CommandDef("reload-mcp", "Reload MCP servers from config", "Tools & Skills",
+18
View File
@@ -1104,6 +1104,24 @@ DEFAULT_CONFIG = {
"max_parallel_jobs": None,
},
# Kanban multi-agent coordination — controls the dispatcher loop that
# spawns workers for ready tasks. The dispatcher ticks every N seconds
# (default 60), reclaims stale claims, promotes dependency-satisfied
# todos to ready, and fires `hermes -p <assignee> chat -q ...` for
# each claimable ready task. One dispatcher per profile is sufficient;
# running more than one on the same kanban.db will race for claims.
"kanban": {
# Run the dispatcher inside the gateway process. On by default —
# the cost is ~300µs every `dispatch_interval_seconds` when idle,
# and gateway is the supervisor users already have. Set to false
# only if you run the dispatcher as a separate systemd unit or
# don't want the gateway to spawn workers.
"dispatch_in_gateway": True,
# Seconds between dispatcher ticks (idle or not). Lower = snappier
# pickup of newly-ready tasks; higher = less SQL pressure.
"dispatch_interval_seconds": 60,
},
# execute_code settings — controls the tool used for programmatic tool calls.
"code_execution": {
# Execution mode:
+1393
View File
File diff suppressed because it is too large Load Diff
File diff suppressed because it is too large Load Diff
+15 -1
View File
@@ -5041,6 +5041,13 @@ def cmd_slack(args):
return 1
def cmd_kanban(args):
"""Multi-profile collaboration board."""
from hermes_cli.kanban import kanban_command
return kanban_command(args)
def cmd_hooks(args):
"""Shell-hook inspection and management."""
from hermes_cli.hooks import hooks_command
@@ -7682,7 +7689,7 @@ def cmd_profile(args):
if clone_all:
print(f"Full copy from {source_label}.")
else:
print(f"Cloned config, .env, SOUL.md from {source_label}.")
print(f"Cloned config, .env, SOUL.md, and skills from {source_label}.")
# Auto-clone Honcho config for the new profile (only with --clone/--clone-all)
if clone or clone_all:
@@ -8640,6 +8647,13 @@ def main():
webhook_parser.set_defaults(func=cmd_webhook)
# =========================================================================
# kanban command — multi-profile collaboration board
# =========================================================================
from hermes_cli.kanban import build_parser as _build_kanban_parser
kanban_parser = _build_kanban_parser(subparsers)
kanban_parser.set_defaults(func=cmd_kanban)
# =========================================================================
# hooks command — shell-hook inspection and management
# =========================================================================
+1
View File
@@ -40,6 +40,7 @@ OPENROUTER_MODELS: list[tuple[str, str]] = [
("anthropic/claude-sonnet-4.5", ""),
("anthropic/claude-haiku-4.5", ""),
("openrouter/elephant-alpha", "free"),
("openrouter/owl-alpha", "free"),
("openai/gpt-5.5", ""),
("openai/gpt-5.4-mini", ""),
("xiaomi/mimo-v2.5-pro", ""),
+11 -2
View File
@@ -11,7 +11,7 @@ zero migration needed.
Usage::
hermes profile create coder # fresh profile + bundled skills
hermes profile create coder --clone # also copy config, .env, SOUL.md
hermes profile create coder --clone # also copy config, .env, SOUL.md, skills
hermes profile create coder --clone-all # full copy of source profile
coder chat # use via wrapper alias
hermes -p coder chat # or via flag
@@ -411,7 +411,8 @@ def create_profile(
clone_all:
If True, do a full copytree of the source (all state).
clone_config:
If True, copy only config files (config.yaml, .env, SOUL.md).
If True, copy config files (config.yaml, .env, SOUL.md), installed
skills, and selected profile identity files from the source profile.
no_alias:
If True, skip wrapper script creation.
@@ -469,6 +470,14 @@ def create_profile(
if src.exists():
shutil.copy2(src, profile_dir / filename)
# Clone installed skills from the source profile. The dashboard's
# "clone from default" flow is expected to preserve both bundled
# and user-installed skills so the new profile immediately has the
# same agent capabilities as the source profile.
source_skills = source_dir / "skills"
if source_skills.is_dir():
shutil.copytree(source_skills, profile_dir / "skills", dirs_exist_ok=True)
# Clone memory and other subdirectory files
for relpath in _CLONE_SUBDIR_FILES:
src = source_dir / relpath
+248
View File
@@ -2344,6 +2344,254 @@ async def delete_cron_job(job_id: str):
return {"ok": True}
# ---------------------------------------------------------------------------
# Profile management endpoints (minimal — list/create/rename/delete + SOUL.md)
# ---------------------------------------------------------------------------
class ProfileCreate(BaseModel):
name: str
clone_from_default: bool = False
class ProfileRename(BaseModel):
new_name: str
class ProfileSoulUpdate(BaseModel):
content: str
def _profile_attr(info, name: str, default: Any = None) -> Any:
try:
return getattr(info, name)
except Exception:
return default
def _profile_to_dict(info) -> Dict[str, Any]:
return {
"name": _profile_attr(info, "name", ""),
"path": str(_profile_attr(info, "path", "")),
"is_default": bool(_profile_attr(info, "is_default", False)),
"model": _profile_attr(info, "model"),
"provider": _profile_attr(info, "provider"),
"has_env": bool(_profile_attr(info, "has_env", False)),
"skill_count": int(_profile_attr(info, "skill_count", 0) or 0),
}
def _fallback_profile_dicts(profiles_mod) -> List[Dict[str, Any]]:
def _safe(callable_, default):
try:
return callable_()
except Exception:
return default
profiles: List[Dict[str, Any]] = []
default_home = profiles_mod._get_default_hermes_home()
if default_home.is_dir():
model, provider = _safe(lambda: profiles_mod._read_config_model(default_home), (None, None))
profiles.append({
"name": "default",
"path": str(default_home),
"is_default": True,
"model": model,
"provider": provider,
"has_env": (default_home / ".env").exists(),
"skill_count": _safe(lambda: profiles_mod._count_skills(default_home), 0),
})
profiles_root = profiles_mod._get_profiles_root()
if profiles_root.is_dir():
for entry in sorted(profiles_root.iterdir()):
if not entry.is_dir() or not profiles_mod._PROFILE_ID_RE.match(entry.name):
continue
model, provider = _safe(lambda entry=entry: profiles_mod._read_config_model(entry), (None, None))
profiles.append({
"name": entry.name,
"path": str(entry),
"is_default": False,
"model": model,
"provider": provider,
"has_env": (entry / ".env").exists(),
"skill_count": _safe(lambda entry=entry: profiles_mod._count_skills(entry), 0),
})
return profiles
def _resolve_profile_dir(name: str) -> Path:
"""Validate ``name`` and resolve to its directory or raise an HTTPException."""
from hermes_cli import profiles as profiles_mod
try:
profiles_mod.validate_profile_name(name)
except ValueError as e:
raise HTTPException(status_code=400, detail=str(e))
if not profiles_mod.profile_exists(name):
raise HTTPException(status_code=404, detail=f"Profile '{name}' does not exist.")
return profiles_mod.get_profile_dir(name)
def _profile_setup_command(name: str) -> str:
"""Return the shell command used to configure a profile in the CLI."""
_resolve_profile_dir(name)
return "hermes setup" if name == "default" else f"{name} setup"
@app.get("/api/profiles")
async def list_profiles_endpoint():
from hermes_cli import profiles as profiles_mod
try:
return {"profiles": [_profile_to_dict(p) for p in profiles_mod.list_profiles()]}
except Exception:
_log.exception("GET /api/profiles failed; falling back to profile directory scan")
return {"profiles": _fallback_profile_dicts(profiles_mod)}
@app.post("/api/profiles")
async def create_profile_endpoint(body: ProfileCreate):
from hermes_cli import profiles as profiles_mod
try:
path = profiles_mod.create_profile(
name=body.name,
clone_from="default" if body.clone_from_default else None,
clone_config=body.clone_from_default,
)
# Match the CLI's profile-create flow: fresh named profiles get the
# bundled skills installed. When cloning from default, create_profile()
# has already copied the source profile's skills, including any
# user-installed skills.
if not body.clone_from_default:
profiles_mod.seed_profile_skills(path, quiet=True)
# Match the CLI's profile-create flow: named profiles should get a
# wrapper in ~/.local/bin when the alias is safe to create.
collision = profiles_mod.check_alias_collision(body.name)
if not collision:
profiles_mod.create_wrapper_script(body.name)
except (ValueError, FileExistsError, FileNotFoundError) as e:
raise HTTPException(status_code=400, detail=str(e))
except Exception as e:
_log.exception("POST /api/profiles failed")
raise HTTPException(status_code=500, detail=str(e))
return {"ok": True, "name": body.name, "path": str(path)}
@app.get("/api/profiles/{name}/setup-command")
async def get_profile_setup_command(name: str):
return {"command": _profile_setup_command(name)}
@app.post("/api/profiles/{name}/open-terminal")
async def open_profile_terminal_endpoint(name: str):
try:
command = _profile_setup_command(name)
if sys.platform.startswith("win"):
subprocess.Popen(["cmd.exe", "/c", "start", "", command])
elif sys.platform == "darwin":
escaped = command.replace("\\", "\\\\").replace('"', '\\"')
applescript = (
'tell application "Terminal"\n'
"activate\n"
f'do script "{escaped}"\n'
"end tell"
)
subprocess.Popen(["osascript", "-e", applescript])
else:
terminal_commands = [
("x-terminal-emulator", ["x-terminal-emulator", "-e", "sh", "-lc", command]),
("gnome-terminal", ["gnome-terminal", "--", "sh", "-lc", command]),
("konsole", ["konsole", "-e", "sh", "-lc", command]),
("xfce4-terminal", ["xfce4-terminal", "-e", f"sh -lc '{command}'"]),
("mate-terminal", ["mate-terminal", "-e", f"sh -lc '{command}'"]),
("lxterminal", ["lxterminal", "-e", f"sh -lc '{command}'"]),
("tilix", ["tilix", "-e", "sh", "-lc", command]),
("alacritty", ["alacritty", "-e", "sh", "-lc", command]),
("kitty", ["kitty", "sh", "-lc", command]),
("xterm", ["xterm", "-e", "sh", "-lc", command]),
]
for executable, popen_args in terminal_commands:
if subprocess.call(
["which", executable],
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
) == 0:
subprocess.Popen(popen_args)
break
else:
raise HTTPException(
status_code=400,
detail="No supported terminal emulator found",
)
except FileNotFoundError as e:
raise HTTPException(status_code=404, detail=str(e))
except ValueError as e:
raise HTTPException(status_code=400, detail=str(e))
except HTTPException:
raise
except Exception as e:
_log.exception("POST /api/profiles/%s/open-terminal failed", name)
raise HTTPException(status_code=500, detail=str(e))
return {"ok": True, "command": command}
@app.patch("/api/profiles/{name}")
async def rename_profile_endpoint(name: str, body: ProfileRename):
from hermes_cli import profiles as profiles_mod
try:
path = profiles_mod.rename_profile(name, body.new_name)
except FileNotFoundError as e:
raise HTTPException(status_code=404, detail=str(e))
except (ValueError, FileExistsError) as e:
raise HTTPException(status_code=400, detail=str(e))
except Exception as e:
_log.exception("PATCH /api/profiles/%s failed", name)
raise HTTPException(status_code=500, detail=str(e))
return {"ok": True, "name": body.new_name, "path": str(path)}
@app.delete("/api/profiles/{name}")
async def delete_profile_endpoint(name: str):
"""Delete a profile. The dashboard collects the user's confirmation in
its own dialog before this request, so we always pass ``yes=True`` to
skip the CLI's interactive prompt."""
from hermes_cli import profiles as profiles_mod
try:
path = profiles_mod.delete_profile(name, yes=True)
except FileNotFoundError as e:
raise HTTPException(status_code=404, detail=str(e))
except ValueError as e:
raise HTTPException(status_code=400, detail=str(e))
except Exception as e:
_log.exception("DELETE /api/profiles/%s failed", name)
raise HTTPException(status_code=500, detail=str(e))
return {"ok": True, "path": str(path)}
@app.get("/api/profiles/{name}/soul")
async def get_profile_soul(name: str):
soul_path = _resolve_profile_dir(name) / "SOUL.md"
if soul_path.exists():
try:
return {"content": soul_path.read_text(encoding="utf-8"), "exists": True}
except OSError as e:
raise HTTPException(status_code=500, detail=f"Could not read SOUL.md: {e}")
return {"content": "", "exists": False}
@app.put("/api/profiles/{name}/soul")
async def update_profile_soul(name: str, body: ProfileSoulUpdate):
soul_path = _resolve_profile_dir(name) / "SOUL.md"
try:
soul_path.write_text(body.content, encoding="utf-8")
except OSError as e:
_log.exception("PUT /api/profiles/%s/soul failed", name)
raise HTTPException(status_code=500, detail=f"Could not write SOUL.md: {e}")
return {"ok": True}
# ---------------------------------------------------------------------------
# Skills & Tools endpoints
# ---------------------------------------------------------------------------
+1 -1
View File
@@ -163,7 +163,7 @@
for entry in "''${ENTRIES[@]}"; do
IFS=":" read -r ATTR FOLDER NIX_FILE <<< "$entry"
echo "==> .#$ATTR ($FOLDER -> $NIX_FILE)"
OUTPUT=$(nix build ".#$ATTR.npmDeps" --no-link --rebuild --print-build-logs 2>&1)
OUTPUT=$(nix build ".#$ATTR.npmDeps" --no-link --print-build-logs 2>&1)
STATUS=$?
if [ "$STATUS" -eq 0 ]; then
echo " ok"
@@ -0,0 +1,372 @@
---
name: shopify
description: Shopify Admin & Storefront GraphQL APIs via curl. Products, orders, customers, inventory, metafields.
version: 1.0.0
author: community
license: MIT
prerequisites:
env_vars: [SHOPIFY_ACCESS_TOKEN, SHOPIFY_STORE_DOMAIN]
commands: [curl, jq]
required_environment_variables:
- name: SHOPIFY_ACCESS_TOKEN
prompt: Shopify Admin API access token (starts with shpat_)
help: "Shopify admin → Settings → Apps and sales channels → Develop apps → Create an app → API credentials. Token shown ONCE on install."
- name: SHOPIFY_STORE_DOMAIN
prompt: Your shop subdomain without protocol (e.g. my-store.myshopify.com)
help: "The permanent myshopify.com domain, not your custom domain."
- name: SHOPIFY_API_VERSION
prompt: Shopify API version (default 2026-01)
help: "Stable quarterly version. Override if you need an older one."
metadata:
hermes:
tags: [Shopify, E-commerce, Commerce, API, GraphQL]
related_skills: [airtable, xurl]
homepage: https://shopify.dev/docs/api/admin-graphql
---
# Shopify — Admin & Storefront GraphQL APIs
Work with Shopify stores directly through `curl`: list products, manage inventory, pull orders, update customers, read metafields. No SDK, no app framework — just the GraphQL endpoint and a custom-app access token.
The REST Admin API is legacy since 2024-04 and only receives security fixes. **Use GraphQL Admin** for all admin work. Use **Storefront GraphQL** for read-only customer-facing queries (products, collections, cart).
## Prerequisites
1. In Shopify admin: **Settings → Apps and sales channels → Develop apps → Create an app**.
2. Click **Configure Admin API scopes**, select what you need (examples below), save.
3. **Install app** → the Admin API access token appears ONCE. Copy it immediately — Shopify will never show it again. Tokens start with `shpat_`.
4. Save to `~/.hermes/.env`:
```
SHOPIFY_ACCESS_TOKEN=shpat_xxxxxxxxxxxxxxxxxxxx
SHOPIFY_STORE_DOMAIN=my-store.myshopify.com
SHOPIFY_API_VERSION=2026-01
```
> **Heads up:** As of January 1, 2026, new "legacy custom apps" created in the Shopify admin are gone. New setups should use the **Dev Dashboard** (`shopify.dev/docs/apps/build/dev-dashboard`). Existing admin-created apps keep working. If the user's shop has no existing custom app and it's after 2026-01-01, direct them to Dev Dashboard instead of the admin flow.
Common scopes by task:
- Products / collections: `read_products`, `write_products`
- Inventory: `read_inventory`, `write_inventory`, `read_locations`
- Orders: `read_orders`, `write_orders` (30 most recent without `read_all_orders`)
- Customers: `read_customers`, `write_customers`
- Draft orders: `read_draft_orders`, `write_draft_orders`
- Fulfillments: `read_fulfillments`, `write_fulfillments`
- Metafields / metaobjects: covered by the matching resource scopes
## API Basics
- **Endpoint:** `https://$SHOPIFY_STORE_DOMAIN/admin/api/$SHOPIFY_API_VERSION/graphql.json`
- **Auth header:** `X-Shopify-Access-Token: $SHOPIFY_ACCESS_TOKEN` (NOT `Authorization: Bearer`)
- **Method:** always `POST`, always `Content-Type: application/json`, body is `{"query": "...", "variables": {...}}`
- **HTTP 200 does not mean success.** GraphQL returns errors in a top-level `errors` array and per-field `userErrors`. Always check both.
- **IDs are GID strings:** `gid://shopify/Product/10079467700516`, `gid://shopify/Variant/...`, `gid://shopify/Order/...`. Pass these verbatim — don't strip the prefix.
- **Rate limit:** calculated via query cost (leaky bucket). Each response has `extensions.cost` with `requestedQueryCost`, `actualQueryCost`, `throttleStatus.{currentlyAvailable, maximumAvailable, restoreRate}`. Back off when `currentlyAvailable` drops below your next query's cost. Standard shops = 100 points bucket, 50/s restore; Plus = 1000/100.
Base curl pattern (reusable):
```bash
shop_gql() {
local query="$1"
local variables="${2:-{}}"
curl -sS -X POST \
"https://${SHOPIFY_STORE_DOMAIN}/admin/api/${SHOPIFY_API_VERSION:-2026-01}/graphql.json" \
-H "Content-Type: application/json" \
-H "X-Shopify-Access-Token: ${SHOPIFY_ACCESS_TOKEN}" \
--data "$(jq -nc --arg q "$query" --argjson v "$variables" '{query: $q, variables: $v}')"
}
```
Pipe through `jq` for readable output. `-sS` keeps errors visible but hides the progress bar.
## Discovery
### Shop info + current API version
```bash
shop_gql '{ shop { name myshopifyDomain primaryDomain { url } currencyCode plan { displayName } } }' | jq
```
### List all supported API versions
```bash
shop_gql '{ publicApiVersions { handle supported } }' | jq '.data.publicApiVersions[] | select(.supported)'
```
## Products
### Search products (first 20 matching query)
```bash
shop_gql '
query($q: String!) {
products(first: 20, query: $q) {
edges { node { id title handle status totalInventory variants(first: 5) { edges { node { id sku price inventoryQuantity } } } } }
pageInfo { hasNextPage endCursor }
}
}' '{"q":"hoodie status:active"}' | jq
```
Query syntax supports `title:`, `sku:`, `vendor:`, `product_type:`, `status:active`, `tag:`, `created_at:>2025-01-01`. Full grammar: https://shopify.dev/docs/api/usage/search-syntax
### Paginate products (cursor)
```bash
shop_gql '
query($cursor: String) {
products(first: 100, after: $cursor) {
edges { cursor node { id handle } }
pageInfo { hasNextPage endCursor }
}
}' '{"cursor":null}'
# subsequent calls: pass the previous endCursor
```
### Get a product with variants + metafields
```bash
shop_gql '
query($id: ID!) {
product(id: $id) {
id title handle descriptionHtml tags status
variants(first: 20) { edges { node { id sku price compareAtPrice inventoryQuantity selectedOptions { name value } } } }
metafields(first: 20) { edges { node { namespace key type value } } }
}
}' '{"id":"gid://shopify/Product/10079467700516"}' | jq
```
### Create a product with one variant
```bash
shop_gql '
mutation($input: ProductCreateInput!) {
productCreate(product: $input) {
product { id handle }
userErrors { field message }
}
}' '{"input":{"title":"Test Hoodie","status":"DRAFT","vendor":"Hermes","productType":"Apparel","tags":["test"]}}'
```
Variants now have their own mutations in recent versions:
```bash
# Add variants after creating the product
shop_gql '
mutation($productId: ID!, $variants: [ProductVariantsBulkInput!]!) {
productVariantsBulkCreate(productId: $productId, variants: $variants) {
productVariants { id sku price }
userErrors { field message }
}
}' '{"productId":"gid://shopify/Product/...","variants":[{"optionValues":[{"optionName":"Size","name":"M"}],"price":"49.00","inventoryItem":{"sku":"HD-M","tracked":true}}]}'
```
### Update price / SKU
```bash
shop_gql '
mutation($productId: ID!, $variants: [ProductVariantsBulkInput!]!) {
productVariantsBulkUpdate(productId: $productId, variants: $variants) {
productVariants { id sku price }
userErrors { field message }
}
}' '{"productId":"gid://shopify/Product/...","variants":[{"id":"gid://shopify/ProductVariant/...","price":"55.00"}]}'
```
## Orders
### List recent orders (last 30 by default without `read_all_orders`)
```bash
shop_gql '
{
orders(first: 20, reverse: true, query: "financial_status:paid") {
edges { node {
id name createdAt displayFinancialStatus displayFulfillmentStatus
totalPriceSet { shopMoney { amount currencyCode } }
customer { id displayName email }
lineItems(first: 10) { edges { node { title quantity sku } } }
} }
}
}' | jq
```
Useful order query filters: `financial_status:paid|pending|refunded`, `fulfillment_status:unfulfilled|fulfilled`, `created_at:>2025-01-01`, `tag:gift`, `email:foo@example.com`.
### Fetch a single order with shipping address
```bash
shop_gql '
query($id: ID!) {
order(id: $id) {
id name email
shippingAddress { name address1 address2 city province country zip phone }
lineItems(first: 50) { edges { node { title quantity variant { sku } originalUnitPriceSet { shopMoney { amount currencyCode } } } } }
transactions { id kind status amountSet { shopMoney { amount currencyCode } } }
}
}' '{"id":"gid://shopify/Order/...."}' | jq
```
## Customers
```bash
# Search
shop_gql '
{
customers(first: 10, query: "email:*@example.com") {
edges { node { id email displayName numberOfOrders amountSpent { amount currencyCode } } }
}
}'
# Create
shop_gql '
mutation($input: CustomerInput!) {
customerCreate(input: $input) {
customer { id email }
userErrors { field message }
}
}' '{"input":{"email":"test@example.com","firstName":"Test","lastName":"User","tags":["api-created"]}}'
```
## Inventory
Inventory lives on **inventory items** tied to variants, quantities tracked per **location**.
```bash
# Get inventory for a variant across all locations
shop_gql '
query($id: ID!) {
productVariant(id: $id) {
id sku
inventoryItem {
id tracked
inventoryLevels(first: 10) {
edges { node { location { id name } quantities(names: ["available","on_hand","committed"]) { name quantity } } }
}
}
}
}' '{"id":"gid://shopify/ProductVariant/..."}'
```
Adjust stock (delta) — uses `inventoryAdjustQuantities`:
```bash
shop_gql '
mutation($input: InventoryAdjustQuantitiesInput!) {
inventoryAdjustQuantities(input: $input) {
inventoryAdjustmentGroup { reason changes { name delta } }
userErrors { field message }
}
}' '{
"input": {
"reason": "correction",
"name": "available",
"changes": [{"delta": 5, "inventoryItemId": "gid://shopify/InventoryItem/...", "locationId": "gid://shopify/Location/..."}]
}
}'
```
Set absolute stock (not delta) — `inventorySetQuantities`:
```bash
shop_gql '
mutation($input: InventorySetQuantitiesInput!) {
inventorySetQuantities(input: $input) {
inventoryAdjustmentGroup { id }
userErrors { field message }
}
}' '{"input":{"reason":"correction","name":"available","ignoreCompareQuantity":true,"quantities":[{"inventoryItemId":"gid://shopify/InventoryItem/...","locationId":"gid://shopify/Location/...","quantity":100}]}}'
```
## Metafields & Metaobjects
Metafields attach custom data to resources (products, customers, orders, shop).
```bash
# Read
shop_gql '
query($id: ID!) {
product(id: $id) {
metafields(first: 10, namespace: "custom") {
edges { node { key type value } }
}
}
}' '{"id":"gid://shopify/Product/..."}'
# Write (works for any owner type)
shop_gql '
mutation($metafields: [MetafieldsSetInput!]!) {
metafieldsSet(metafields: $metafields) {
metafields { id key namespace }
userErrors { field message code }
}
}' '{"metafields":[{"ownerId":"gid://shopify/Product/...","namespace":"custom","key":"care_instructions","type":"multi_line_text_field","value":"Wash cold. Tumble dry low."}]}'
```
## Storefront API (public read-only)
Different endpoint, different token, used for customer-facing apps/hydrogen-style headless setups. Headers differ:
- **Endpoint:** `https://$SHOPIFY_STORE_DOMAIN/api/$SHOPIFY_API_VERSION/graphql.json`
- **Auth header (public):** `X-Shopify-Storefront-Access-Token: <public token>` — embeddable in browser
- **Auth header (private):** `Shopify-Storefront-Private-Token: <private token>` — server-only
```bash
curl -sS -X POST \
"https://${SHOPIFY_STORE_DOMAIN}/api/${SHOPIFY_API_VERSION:-2026-01}/graphql.json" \
-H "Content-Type: application/json" \
-H "X-Shopify-Storefront-Access-Token: ${SHOPIFY_STOREFRONT_TOKEN}" \
-d '{"query":"{ shop { name } products(first: 5) { edges { node { id title handle } } } }"}' | jq
```
## Bulk Operations
For dumps larger than rate limits allow (full product catalog, all orders for a year):
```bash
# 1. Start bulk query
shop_gql '
mutation {
bulkOperationRunQuery(query: """
{ products { edges { node { id title handle variants { edges { node { sku price } } } } } } }
""") {
bulkOperation { id status }
userErrors { field message }
}
}'
# 2. Poll status
shop_gql '{ currentBulkOperation { id status errorCode objectCount fileSize url partialDataUrl } }'
# 3. When status=COMPLETED, download the JSONL file
curl -sS "$URL" > products.jsonl
```
Each JSONL line is a node, and nested connections are emitted as separate lines with `__parentId`. Reassemble client-side if needed.
## Webhooks
Subscribe to events so you don't have to poll:
```bash
shop_gql '
mutation($topic: WebhookSubscriptionTopic!, $sub: WebhookSubscriptionInput!) {
webhookSubscriptionCreate(topic: $topic, webhookSubscription: $sub) {
webhookSubscription { id topic endpoint { __typename ... on WebhookHttpEndpoint { callbackUrl } } }
userErrors { field message }
}
}' '{"topic":"ORDERS_CREATE","sub":{"callbackUrl":"https://example.com/webhook","format":"JSON"}}'
```
Verify incoming webhook HMAC using the app's client secret (not the access token):
```bash
echo -n "$REQUEST_BODY" | openssl dgst -sha256 -hmac "$APP_SECRET" -binary | base64
# Compare to X-Shopify-Hmac-Sha256 header
```
## Pitfalls
- **REST endpoints still exist but are frozen.** Don't write new integrations against `/admin/api/.../products.json`. Use GraphQL.
- **Token format check.** Admin tokens start with `shpat_`. Storefront public tokens with `shpua_`. If you have one and the wrong header, every request returns 401 without a useful error body.
- **403 with a valid token = missing scope.** Shopify returns `{"errors":[{"message":"Access denied for ..."}]}`. Re-configure Admin API scopes on the app, then reinstall to regenerate the token.
- **`userErrors` is empty != success.** Also check `data.<mutation>.<resource>` is non-null. Some failures populate neither — inspect the whole response.
- **GID vs numeric ID.** Legacy REST gave numeric IDs; GraphQL wants full GID strings. To convert: `gid://shopify/Product/<numeric>`.
- **Rate limit surprise.** A single `products(first: 250)` with deep nesting can cost 1000+ points and throttle immediately on a standard-plan shop. Start narrow, read `extensions.cost`, adjust.
- **Pagination order.** `products(first: N, reverse: true)` sorts by `id DESC`, not `created_at`. Use `sortKey: CREATED_AT, reverse: true` for "newest first."
- **`read_all_orders` for historical data.** Without it, `orders(...)` silently caps at the 60-day window. You won't get an error, just fewer results than expected. For Shopify Plus merchants with many orders, request this scope via the app's protected-data settings.
- **Currencies are strings.** Amounts come back as `"49.00"` not `49.0`. Don't `jq tonumber` blindly if you care about zero-padding.
- **Multi-currency Money fields** have `shopMoney` (store's currency) AND `presentmentMoney` (customer's). Pick one consistently.
## Safety
Mutations in Shopify are real — they create products, charge refunds, cancel orders, ship fulfillments. Before running `productDelete`, `orderCancel`, `refundCreate`, or any bulk mutation: state clearly what the change is, on which shop, and confirm with the user. There is no staging clone of production data unless the user has a separate dev store.
File diff suppressed because it is too large Load Diff
+752
View File
@@ -0,0 +1,752 @@
/*
* Hermes Kanban dashboard plugin styles.
*
* All colors reference theme CSS vars so the board reskins with the
* active dashboard theme. No hardcoded palette.
*/
.hermes-kanban {
width: 100%;
}
/* ---- Columns layout -------------------------------------------------- */
.hermes-kanban-columns {
display: grid;
grid-template-columns: repeat(auto-fit, minmax(260px, 1fr));
gap: 0.75rem;
align-items: start;
}
.hermes-kanban-column {
display: flex;
flex-direction: column;
background: color-mix(in srgb, var(--color-card) 85%, transparent);
border: 1px solid var(--color-border);
border-radius: var(--radius);
padding: 0.5rem;
min-height: 200px;
max-height: calc(100vh - 220px);
transition: border-color 120ms ease, background-color 120ms ease;
}
.hermes-kanban-column--drop {
border-color: var(--color-ring);
background: color-mix(in srgb, var(--color-ring) 8%, var(--color-card));
}
.hermes-kanban-column-header {
display: flex;
align-items: center;
gap: 0.5rem;
padding: 0.25rem 0.25rem 0.35rem;
font-weight: 600;
font-size: 0.85rem;
color: var(--color-foreground);
}
.hermes-kanban-column-label {
flex: 1;
letter-spacing: 0.01em;
}
.hermes-kanban-column-count {
font-variant-numeric: tabular-nums;
color: var(--color-muted-foreground);
font-size: 0.75rem;
font-weight: 500;
}
.hermes-kanban-column-add {
appearance: none;
background: transparent;
border: 1px solid var(--color-border);
color: var(--color-foreground);
border-radius: var(--radius-sm, 0.25rem);
width: 22px;
height: 22px;
line-height: 1;
font-size: 1rem;
cursor: pointer;
}
.hermes-kanban-column-add:hover {
background: color-mix(in srgb, var(--color-foreground) 8%, transparent);
}
.hermes-kanban-column-sub {
padding: 0 0.25rem 0.5rem;
font-size: 0.7rem;
color: var(--color-muted-foreground);
border-bottom: 1px solid color-mix(in srgb, var(--color-border) 60%, transparent);
margin-bottom: 0.5rem;
}
.hermes-kanban-column-body {
display: flex;
flex-direction: column;
gap: 0.45rem;
overflow-y: auto;
padding-right: 0.1rem;
}
.hermes-kanban-empty {
padding: 1.5rem 0.5rem;
text-align: center;
font-size: 0.75rem;
color: var(--color-muted-foreground);
border: 1px dashed color-mix(in srgb, var(--color-border) 70%, transparent);
border-radius: var(--radius-sm, 0.25rem);
}
/* ---- Status dots ----------------------------------------------------- */
.hermes-kanban-dot {
display: inline-block;
width: 0.5rem;
height: 0.5rem;
border-radius: 999px;
background: var(--color-muted-foreground);
}
.hermes-kanban-dot-triage { background: #b47dd6; } /* lilac — fresh/unspecified */
.hermes-kanban-dot-todo { background: var(--color-muted-foreground); }
.hermes-kanban-dot-ready { background: #d4b348; } /* amber */
.hermes-kanban-dot-running { background: #3fb97d; } /* green */
.hermes-kanban-dot-blocked { background: var(--color-destructive, #d14a4a); }
.hermes-kanban-dot-done { background: #4a8cd1; } /* blue */
.hermes-kanban-dot-archived { background: var(--color-border); }
/* ---- Progress pill (N/M child tasks done) --------------------------- */
.hermes-kanban-progress {
font-family: var(--font-mono, ui-monospace, monospace);
font-size: 0.62rem;
padding: 0.05rem 0.35rem;
border-radius: 999px;
background: color-mix(in srgb, var(--color-foreground) 8%, transparent);
border: 1px solid color-mix(in srgb, var(--color-border) 80%, transparent);
color: var(--color-muted-foreground);
letter-spacing: 0.02em;
}
.hermes-kanban-progress--full {
background: color-mix(in srgb, #3fb97d 22%, transparent);
border-color: color-mix(in srgb, #3fb97d 45%, transparent);
color: var(--color-foreground);
}
/* ---- Lanes (per-profile sub-grouping inside Running) ---------------- */
.hermes-kanban-lane {
display: flex;
flex-direction: column;
gap: 0.35rem;
padding: 0.25rem 0 0.35rem;
border-top: 1px dashed color-mix(in srgb, var(--color-border) 70%, transparent);
}
.hermes-kanban-lane:first-child {
border-top: 0;
padding-top: 0;
}
.hermes-kanban-lane-head {
display: flex;
align-items: center;
gap: 0.4rem;
font-size: 0.65rem;
text-transform: uppercase;
letter-spacing: 0.08em;
color: var(--color-muted-foreground);
padding: 0 0.1rem;
}
.hermes-kanban-lane-name {
font-weight: 600;
font-family: var(--font-mono, ui-monospace, monospace);
}
.hermes-kanban-lane-count {
margin-left: auto;
font-variant-numeric: tabular-nums;
}
/* ---- Card ------------------------------------------------------------ */
.hermes-kanban-card {
cursor: grab;
transition: transform 100ms ease, box-shadow 100ms ease;
}
.hermes-kanban-card:hover {
box-shadow: 0 1px 0 0 var(--color-ring) inset, 0 0 0 1px var(--color-ring) inset;
}
.hermes-kanban-card:active {
cursor: grabbing;
transform: scale(0.995);
}
.hermes-kanban-card-content {
padding: 0.5rem 0.6rem !important;
display: flex;
flex-direction: column;
gap: 0.3rem;
}
.hermes-kanban-card-row {
display: flex;
align-items: center;
gap: 0.35rem;
flex-wrap: wrap;
}
.hermes-kanban-card-id {
font-family: var(--font-mono, ui-monospace, monospace);
font-size: 0.65rem;
color: var(--color-muted-foreground);
letter-spacing: 0.03em;
}
.hermes-kanban-card-title {
font-size: 0.85rem;
font-weight: 500;
line-height: 1.3;
color: var(--color-foreground);
word-break: break-word;
}
.hermes-kanban-card-meta {
font-size: 0.7rem;
color: var(--color-muted-foreground);
gap: 0.55rem;
}
.hermes-kanban-priority {
font-size: 0.6rem !important;
padding: 0.05rem 0.3rem !important;
background: color-mix(in srgb, var(--color-ring) 18%, transparent);
color: var(--color-foreground);
border: 1px solid color-mix(in srgb, var(--color-ring) 40%, transparent);
}
.hermes-kanban-tag {
font-size: 0.6rem !important;
padding: 0.05rem 0.3rem !important;
}
.hermes-kanban-assignee {
font-weight: 500;
color: color-mix(in srgb, var(--color-foreground) 80%, var(--color-muted-foreground));
}
.hermes-kanban-unassigned {
font-style: italic;
}
.hermes-kanban-ago {
margin-left: auto;
}
/* ---- Inline create --------------------------------------------------- */
.hermes-kanban-inline-create {
display: flex;
flex-direction: column;
gap: 0.35rem;
padding: 0.5rem;
margin-bottom: 0.5rem;
background: color-mix(in srgb, var(--color-card) 70%, transparent);
border: 1px dashed var(--color-border);
border-radius: var(--radius-sm, 0.25rem);
}
/* ---- Drawer (task detail side panel) --------------------------------- */
.hermes-kanban-drawer-shade {
position: fixed;
inset: 0;
background: rgba(0, 0, 0, 0.45);
z-index: 60;
display: flex;
justify-content: flex-end;
}
.hermes-kanban-drawer {
width: min(480px, 92vw);
height: 100vh;
background: var(--color-card);
border-left: 1px solid var(--color-border);
display: flex;
flex-direction: column;
box-shadow: -4px 0 18px rgba(0, 0, 0, 0.35);
animation: hermes-kanban-drawer-in 180ms ease-out;
}
@keyframes hermes-kanban-drawer-in {
from { transform: translateX(100%); opacity: 0.3; }
to { transform: translateX(0); opacity: 1; }
}
.hermes-kanban-drawer-head {
display: flex;
align-items: center;
justify-content: space-between;
padding: 0.6rem 0.8rem;
border-bottom: 1px solid var(--color-border);
font-family: var(--font-mono, ui-monospace, monospace);
}
.hermes-kanban-drawer-close {
appearance: none;
background: transparent;
border: 0;
color: var(--color-muted-foreground);
font-size: 1.25rem;
line-height: 1;
cursor: pointer;
padding: 0 0.25rem;
}
.hermes-kanban-drawer-close:hover { color: var(--color-foreground); }
.hermes-kanban-drawer-body {
flex: 1;
overflow-y: auto;
padding: 0.9rem;
display: flex;
flex-direction: column;
gap: 0.85rem;
}
.hermes-kanban-drawer-title {
display: flex;
align-items: center;
gap: 0.5rem;
font-size: 1rem;
font-weight: 600;
}
.hermes-kanban-drawer-meta {
display: flex;
flex-direction: column;
gap: 0.15rem;
padding: 0.5rem 0.6rem;
background: color-mix(in srgb, var(--color-foreground) 4%, transparent);
border: 1px solid var(--color-border);
border-radius: var(--radius-sm, 0.25rem);
}
.hermes-kanban-meta-row {
display: flex;
gap: 0.5rem;
font-size: 0.72rem;
}
.hermes-kanban-meta-label {
width: 92px;
color: var(--color-muted-foreground);
}
.hermes-kanban-meta-value {
color: var(--color-foreground);
word-break: break-word;
}
.hermes-kanban-actions {
display: flex;
flex-wrap: wrap;
gap: 0.3rem;
}
.hermes-kanban-section {
display: flex;
flex-direction: column;
gap: 0.35rem;
}
.hermes-kanban-section-head {
font-size: 0.72rem;
font-weight: 600;
text-transform: uppercase;
letter-spacing: 0.07em;
color: var(--color-muted-foreground);
}
.hermes-kanban-pre {
margin: 0;
padding: 0.45rem 0.55rem;
white-space: pre-wrap;
word-break: break-word;
background: color-mix(in srgb, var(--color-foreground) 4%, transparent);
border: 1px solid var(--color-border);
border-radius: var(--radius-sm, 0.25rem);
font-family: var(--font-mono, ui-monospace, monospace);
font-size: 0.72rem;
color: var(--color-foreground);
}
.hermes-kanban-comment {
border-left: 2px solid color-mix(in srgb, var(--color-ring) 35%, transparent);
padding-left: 0.5rem;
display: flex;
flex-direction: column;
gap: 0.2rem;
}
.hermes-kanban-comment-head {
display: flex;
gap: 0.5rem;
font-size: 0.7rem;
}
.hermes-kanban-comment-author {
font-weight: 600;
color: var(--color-foreground);
}
.hermes-kanban-comment-ago {
color: var(--color-muted-foreground);
}
.hermes-kanban-event {
display: flex;
gap: 0.5rem;
font-size: 0.7rem;
color: var(--color-muted-foreground);
font-family: var(--font-mono, ui-monospace, monospace);
}
.hermes-kanban-event-kind {
color: var(--color-foreground);
min-width: 6rem;
}
.hermes-kanban-event-payload {
color: var(--color-muted-foreground);
overflow: hidden;
text-overflow: ellipsis;
white-space: nowrap;
max-width: 280px;
}
.hermes-kanban-drawer-comment-row {
display: flex;
gap: 0.4rem;
padding: 0.55rem 0.75rem;
border-top: 1px solid var(--color-border);
background: color-mix(in srgb, var(--color-card) 90%, transparent);
}
.hermes-kanban-count {
display: inline-flex;
gap: 0.2rem;
align-items: center;
}
/* ---- Selection chrome ----------------------------------------------- */
.hermes-kanban-card--selected :where(.hermes-kanban-card-content) {
box-shadow: 0 0 0 2px var(--color-ring) inset,
0 0 0 1px var(--color-ring) inset;
background: color-mix(in srgb, var(--color-ring) 6%, var(--color-card));
}
.hermes-kanban-card-check {
width: 0.85rem;
height: 0.85rem;
margin: 0;
cursor: pointer;
accent-color: var(--color-ring);
}
/* ---- Bulk action bar ------------------------------------------------ */
.hermes-kanban-bulk {
display: flex;
align-items: center;
gap: 0.5rem;
padding: 0.4rem 0.75rem;
background: color-mix(in srgb, var(--color-ring) 10%, var(--color-card));
border: 1px solid color-mix(in srgb, var(--color-ring) 40%, var(--color-border));
border-radius: var(--radius-sm, 0.25rem);
flex-wrap: wrap;
}
.hermes-kanban-bulk-count {
font-weight: 600;
font-size: 0.75rem;
padding-right: 0.25rem;
}
.hermes-kanban-bulk-btn {
height: 1.7rem !important;
padding: 0 0.5rem !important;
font-size: 0.7rem !important;
border: 1px solid var(--color-border);
cursor: pointer;
}
.hermes-kanban-bulk-btn:hover {
background: color-mix(in srgb, var(--color-foreground) 8%, transparent);
}
.hermes-kanban-bulk-reassign {
display: flex;
align-items: center;
gap: 0.25rem;
padding-left: 0.5rem;
border-left: 1px solid color-mix(in srgb, var(--color-border) 70%, transparent);
}
/* ---- Dependency editor chips --------------------------------------- */
.hermes-kanban-deps-row {
display: flex;
align-items: center;
gap: 0.5rem;
margin-bottom: 0.4rem;
}
.hermes-kanban-deps-label {
font-size: 0.68rem;
text-transform: uppercase;
letter-spacing: 0.08em;
color: var(--color-muted-foreground);
min-width: 4rem;
}
.hermes-kanban-deps-chips {
display: flex;
gap: 0.3rem;
flex-wrap: wrap;
flex: 1;
}
.hermes-kanban-deps-empty {
font-size: 0.7rem;
color: var(--color-muted-foreground);
font-style: italic;
}
.hermes-kanban-dep-chip {
display: inline-flex;
align-items: center;
gap: 0.15rem;
padding: 0.1rem 0.35rem;
background: color-mix(in srgb, var(--color-foreground) 6%, transparent);
border: 1px solid var(--color-border);
border-radius: var(--radius-sm, 0.25rem);
font-family: var(--font-mono, ui-monospace, monospace);
font-size: 0.68rem;
color: var(--color-foreground);
}
.hermes-kanban-dep-chip-x {
appearance: none;
background: transparent;
border: 0;
color: var(--color-muted-foreground);
cursor: pointer;
font-size: 0.85rem;
line-height: 1;
padding: 0 0.15rem;
}
.hermes-kanban-dep-chip-x:hover { color: var(--color-destructive, #d14a4a); }
/* ---- Inline edit affordances --------------------------------------- */
.hermes-kanban-editable {
cursor: pointer;
border-bottom: 1px dotted color-mix(in srgb, var(--color-border) 80%, transparent);
}
.hermes-kanban-editable:hover {
color: var(--color-foreground);
border-bottom-color: var(--color-ring);
}
.hermes-kanban-drawer-title-text {
cursor: pointer;
}
.hermes-kanban-drawer-title-text:hover {
text-decoration: underline;
text-decoration-color: var(--color-ring);
text-decoration-style: dotted;
text-underline-offset: 3px;
}
.hermes-kanban-edit-row {
display: flex;
align-items: center;
gap: 0.35rem;
width: 100%;
}
.hermes-kanban-section-head-row {
display: flex;
align-items: center;
justify-content: space-between;
gap: 0.5rem;
}
.hermes-kanban-edit-link {
appearance: none;
background: transparent;
border: 0;
color: var(--color-muted-foreground);
font-size: 0.7rem;
text-transform: uppercase;
letter-spacing: 0.05em;
cursor: pointer;
padding: 0;
}
.hermes-kanban-edit-link:hover { color: var(--color-ring); }
.hermes-kanban-textarea {
width: 100%;
min-height: 8rem;
background: var(--color-card);
color: var(--color-foreground);
border: 1px solid var(--color-border);
border-radius: var(--radius-sm, 0.25rem);
padding: 0.5rem 0.6rem;
font-family: var(--font-mono, ui-monospace, monospace);
font-size: 0.8rem;
line-height: 1.5;
resize: vertical;
}
.hermes-kanban-textarea:focus {
outline: none;
border-color: var(--color-ring);
box-shadow: 0 0 0 2px color-mix(in srgb, var(--color-ring) 30%, transparent);
}
/* ---- Markdown rendering -------------------------------------------- */
.hermes-kanban-md {
font-size: 0.8rem;
line-height: 1.55;
color: var(--color-foreground);
}
.hermes-kanban-md p { margin: 0.25rem 0; }
.hermes-kanban-md h1,
.hermes-kanban-md h2,
.hermes-kanban-md h3,
.hermes-kanban-md h4 {
margin: 0.6rem 0 0.2rem;
line-height: 1.25;
}
.hermes-kanban-md h1 { font-size: 1.05rem; }
.hermes-kanban-md h2 { font-size: 0.95rem; }
.hermes-kanban-md h3 { font-size: 0.88rem; }
.hermes-kanban-md h4 { font-size: 0.82rem; }
.hermes-kanban-md ul {
margin: 0.25rem 0 0.25rem 1.1rem;
padding: 0;
}
.hermes-kanban-md li { margin: 0.1rem 0; }
.hermes-kanban-md a {
color: var(--color-ring);
text-decoration: underline;
}
.hermes-kanban-md code {
font-family: var(--font-mono, ui-monospace, monospace);
font-size: 0.75rem;
padding: 0.05rem 0.3rem;
background: color-mix(in srgb, var(--color-foreground) 8%, transparent);
border-radius: 3px;
}
.hermes-kanban-md-code {
margin: 0.35rem 0;
padding: 0.5rem 0.6rem;
background: color-mix(in srgb, var(--color-foreground) 5%, transparent);
border: 1px solid var(--color-border);
border-radius: var(--radius-sm, 0.25rem);
overflow-x: auto;
}
.hermes-kanban-md-code code {
background: transparent;
padding: 0;
font-size: 0.75rem;
white-space: pre;
}
.hermes-kanban-md strong { font-weight: 600; }
/* ---- Touch-drag proxy ---------------------------------------------- */
.hermes-kanban-touch-proxy {
pointer-events: none;
opacity: 0.85;
box-shadow: 0 8px 20px rgba(0, 0, 0, 0.35);
transform: scale(1.02);
transition: none;
}
/* ---- Staleness tiers ------------------------------------------------ */
.hermes-kanban-card--stale-amber :where(.hermes-kanban-card-content) {
box-shadow: 0 0 0 1px #d4b34888 inset;
}
.hermes-kanban-card--stale-amber:hover :where(.hermes-kanban-card-content) {
box-shadow: 0 0 0 2px #d4b348 inset;
}
.hermes-kanban-card--stale-red :where(.hermes-kanban-card-content) {
box-shadow: 0 0 0 1px var(--color-destructive, #d14a4a) inset,
0 0 8px color-mix(in srgb, var(--color-destructive, #d14a4a) 30%, transparent);
}
.hermes-kanban-card--stale-red:hover :where(.hermes-kanban-card-content) {
box-shadow: 0 0 0 2px var(--color-destructive, #d14a4a) inset,
0 0 10px color-mix(in srgb, var(--color-destructive, #d14a4a) 45%, transparent);
}
/* ---- Worker log pane ------------------------------------------------ */
.hermes-kanban-log {
max-height: 340px;
overflow: auto;
white-space: pre;
font-size: 0.7rem;
line-height: 1.45;
}
/* ---- Run history (per-attempt log in the drawer) ------------------- */
.hermes-kanban-run {
border-left: 2px solid var(--color-border);
padding: 0.35rem 0.5rem;
margin-bottom: 0.4rem;
background: color-mix(in srgb, var(--color-foreground) 3%, transparent);
border-radius: var(--radius-sm, 0.25rem);
}
.hermes-kanban-run--active { border-left-color: #3fb97d; }
.hermes-kanban-run--completed { border-left-color: #4a8cd1; }
.hermes-kanban-run--ended { border-left-color: #6b7280; } /* generic fallback when outcome is unset */
.hermes-kanban-run--blocked { border-left-color: var(--color-destructive, #d14a4a); }
.hermes-kanban-run--crashed,
.hermes-kanban-run--timed_out,
.hermes-kanban-run--gave_up,
.hermes-kanban-run--spawn_failed {
border-left-color: var(--color-destructive, #d14a4a);
background: color-mix(in srgb, var(--color-destructive, #d14a4a) 6%, transparent);
}
.hermes-kanban-run--reclaimed { border-left-color: #d4b348; }
.hermes-kanban-run-head {
display: flex;
align-items: center;
gap: 0.6rem;
font-size: 0.7rem;
}
.hermes-kanban-run-outcome {
font-family: var(--font-mono, ui-monospace, monospace);
font-weight: 600;
text-transform: uppercase;
letter-spacing: 0.05em;
color: var(--color-foreground);
}
.hermes-kanban-run-profile {
color: var(--color-muted-foreground);
}
.hermes-kanban-run-elapsed {
font-variant-numeric: tabular-nums;
color: var(--color-muted-foreground);
}
.hermes-kanban-run-ago {
margin-left: auto;
color: var(--color-muted-foreground);
}
.hermes-kanban-run-summary {
font-size: 0.75rem;
padding: 0.2rem 0 0;
color: var(--color-foreground);
}
.hermes-kanban-run-error {
font-size: 0.7rem;
color: var(--color-destructive, #d14a4a);
padding: 0.15rem 0 0;
font-family: var(--font-mono, ui-monospace, monospace);
}
.hermes-kanban-run-meta {
display: block;
font-size: 0.65rem;
padding: 0.15rem 0 0;
color: var(--color-muted-foreground);
white-space: pre-wrap;
word-break: break-word;
font-family: var(--font-mono, ui-monospace, monospace);
}
+14
View File
@@ -0,0 +1,14 @@
{
"name": "kanban",
"label": "Kanban",
"description": "Multi-agent collaboration board — drag-drop cards across columns, read comment threads, see which profile is running what",
"icon": "Package",
"version": "1.0.0",
"tab": {
"path": "/kanban",
"position": "after:skills"
},
"entry": "dist/index.js",
"css": "dist/style.css",
"api": "plugin_api.py"
}
+845
View File
@@ -0,0 +1,845 @@
"""Kanban dashboard plugin — backend API routes.
Mounted at /api/plugins/kanban/ by the dashboard plugin system.
This layer is intentionally thin: every handler is a small wrapper around
``hermes_cli.kanban_db`` or a direct SQL query. Writes use the same code
paths the CLI and gateway ``/kanban`` command use, so the three surfaces
cannot drift.
Live updates arrive via the ``/events`` WebSocket, which tails the
append-only ``task_events`` table on a short poll interval (WAL mode lets
reads run alongside the dispatcher's IMMEDIATE write transactions).
Security note
-------------
The dashboard's HTTP auth middleware (``web_server.auth_middleware``)
explicitly skips ``/api/plugins/`` plugin routes are unauthenticated by
design because the dashboard binds to localhost by default. For the
WebSocket we still require the session token as a ``?token=`` query
parameter (browsers cannot set the ``Authorization`` header on an upgrade
request), matching the established pattern used by the in-browser PTY
bridge in ``hermes_cli/web_server.py``. If you run the dashboard with
``--host 0.0.0.0``, every plugin route kanban included becomes
reachable from the network. Don't do that on a shared host.
"""
from __future__ import annotations
import asyncio
import hmac
import json
import logging
import sqlite3
import time
from dataclasses import asdict
from typing import Any, Optional
from fastapi import APIRouter, HTTPException, Query, WebSocket, WebSocketDisconnect, status as http_status
from pydantic import BaseModel, Field
from hermes_cli import kanban_db
log = logging.getLogger(__name__)
router = APIRouter()
# ---------------------------------------------------------------------------
# Auth helper — WebSocket only (HTTP routes live behind the dashboard's
# existing plugin-bypass; this is documented above).
# ---------------------------------------------------------------------------
def _check_ws_token(provided: Optional[str]) -> bool:
"""Constant-time compare against the dashboard session token.
Imported lazily so the plugin still loads in test contexts where the
dashboard web_server module isn't importable (e.g. the bare-FastAPI
test harness).
"""
if not provided:
return False
try:
from hermes_cli import web_server as _ws
except Exception:
# No dashboard context (tests). Accept so the tail loop is still
# testable; in production the dashboard module always imports
# cleanly because it's the caller.
return True
expected = getattr(_ws, "_SESSION_TOKEN", None)
if not expected:
return True
return hmac.compare_digest(str(provided), str(expected))
def _conn():
"""Open a kanban_db connection, creating the schema on first use.
Every handler that mutates the DB goes through this so the plugin
self-heals on a fresh install (no user-visible "no such table"
error if somebody hits POST /tasks before GET /board).
``init_db`` is idempotent.
"""
try:
kanban_db.init_db()
except Exception as exc:
log.warning("kanban init_db failed: %s", exc)
return kanban_db.connect()
# ---------------------------------------------------------------------------
# Serialization helpers
# ---------------------------------------------------------------------------
# Columns shown by the dashboard, in left-to-right order. "archived" is
# available via a filter toggle rather than a visible column.
BOARD_COLUMNS: list[str] = [
"triage", "todo", "ready", "running", "blocked", "done",
]
def _task_dict(task: kanban_db.Task) -> dict[str, Any]:
d = asdict(task)
# Add derived age metrics so the UI can colour stale cards without
# computing deltas client-side.
d["age"] = kanban_db.task_age(task)
# Keep body short on list endpoints; full body comes from /tasks/:id.
return d
def _event_dict(event: kanban_db.Event) -> dict[str, Any]:
return {
"id": event.id,
"task_id": event.task_id,
"kind": event.kind,
"payload": event.payload,
"created_at": event.created_at,
"run_id": event.run_id,
}
def _comment_dict(c: kanban_db.Comment) -> dict[str, Any]:
return {
"id": c.id,
"task_id": c.task_id,
"author": c.author,
"body": c.body,
"created_at": c.created_at,
}
def _run_dict(r: kanban_db.Run) -> dict[str, Any]:
"""Serialise a Run for the drawer's Run history section."""
return {
"id": r.id,
"task_id": r.task_id,
"profile": r.profile,
"step_key": r.step_key,
"status": r.status,
"claim_lock": r.claim_lock,
"claim_expires": r.claim_expires,
"worker_pid": r.worker_pid,
"max_runtime_seconds": r.max_runtime_seconds,
"last_heartbeat_at": r.last_heartbeat_at,
"started_at": r.started_at,
"ended_at": r.ended_at,
"outcome": r.outcome,
"summary": r.summary,
"metadata": r.metadata,
"error": r.error,
}
def _links_for(conn: sqlite3.Connection, task_id: str) -> dict[str, list[str]]:
"""Return {'parents': [...], 'children': [...]} for a task."""
parents = [
r["parent_id"]
for r in conn.execute(
"SELECT parent_id FROM task_links WHERE child_id = ? ORDER BY parent_id",
(task_id,),
)
]
children = [
r["child_id"]
for r in conn.execute(
"SELECT child_id FROM task_links WHERE parent_id = ? ORDER BY child_id",
(task_id,),
)
]
return {"parents": parents, "children": children}
# ---------------------------------------------------------------------------
# GET /board
# ---------------------------------------------------------------------------
@router.get("/board")
def get_board(
tenant: Optional[str] = Query(None, description="Filter to a single tenant"),
include_archived: bool = Query(False),
):
"""Return the full board grouped by status column.
``_conn()`` auto-initializes ``kanban.db`` on first call so a fresh
install doesn't surface a "failed to load" error on the plugin tab.
"""
conn = _conn()
try:
tasks = kanban_db.list_tasks(
conn, tenant=tenant, include_archived=include_archived
)
# Pre-fetch link counts per task (cheap: one query).
link_counts: dict[str, dict[str, int]] = {}
for row in conn.execute(
"SELECT parent_id, child_id FROM task_links"
).fetchall():
link_counts.setdefault(row["parent_id"], {"parents": 0, "children": 0})[
"children"
] += 1
link_counts.setdefault(row["child_id"], {"parents": 0, "children": 0})[
"parents"
] += 1
# Comment + event counts (both cheap aggregates).
comment_counts: dict[str, int] = {
r["task_id"]: r["n"]
for r in conn.execute(
"SELECT task_id, COUNT(*) AS n FROM task_comments GROUP BY task_id"
)
}
# Progress rollup: for each parent, how many children are done / total.
# One pass over task_links joined with child status — cheaper than
# N per-task queries and the plugin uses it to render "N/M".
progress: dict[str, dict[str, int]] = {}
for row in conn.execute(
"SELECT l.parent_id AS pid, t.status AS cstatus "
"FROM task_links l JOIN tasks t ON t.id = l.child_id"
).fetchall():
p = progress.setdefault(row["pid"], {"done": 0, "total": 0})
p["total"] += 1
if row["cstatus"] == "done":
p["done"] += 1
latest_event_id = conn.execute(
"SELECT COALESCE(MAX(id), 0) AS m FROM task_events"
).fetchone()["m"]
columns: dict[str, list[dict]] = {c: [] for c in BOARD_COLUMNS}
if include_archived:
columns["archived"] = []
for t in tasks:
d = _task_dict(t)
d["link_counts"] = link_counts.get(t.id, {"parents": 0, "children": 0})
d["comment_count"] = comment_counts.get(t.id, 0)
d["progress"] = progress.get(t.id) # None when the task has no children
col = t.status if t.status in columns else "todo"
columns[col].append(d)
# Stable per-column ordering already applied by list_tasks
# (priority DESC, created_at ASC), keep as-is.
# List of known tenants for the UI filter dropdown.
tenants = [
r["tenant"]
for r in conn.execute(
"SELECT DISTINCT tenant FROM tasks WHERE tenant IS NOT NULL ORDER BY tenant"
)
]
# List of distinct assignees for the lane-by-profile sub-grouping.
assignees = [
r["assignee"]
for r in conn.execute(
"SELECT DISTINCT assignee FROM tasks WHERE assignee IS NOT NULL "
"AND status != 'archived' ORDER BY assignee"
)
]
return {
"columns": [
{"name": name, "tasks": columns[name]} for name in columns.keys()
],
"tenants": tenants,
"assignees": assignees,
"latest_event_id": int(latest_event_id),
"now": int(time.time()),
}
finally:
conn.close()
# ---------------------------------------------------------------------------
# GET /tasks/:id
# ---------------------------------------------------------------------------
@router.get("/tasks/{task_id}")
def get_task(task_id: str):
conn = _conn()
try:
task = kanban_db.get_task(conn, task_id)
if task is None:
raise HTTPException(status_code=404, detail=f"task {task_id} not found")
return {
"task": _task_dict(task),
"comments": [_comment_dict(c) for c in kanban_db.list_comments(conn, task_id)],
"events": [_event_dict(e) for e in kanban_db.list_events(conn, task_id)],
"links": _links_for(conn, task_id),
"runs": [_run_dict(r) for r in kanban_db.list_runs(conn, task_id)],
}
finally:
conn.close()
# ---------------------------------------------------------------------------
# POST /tasks
# ---------------------------------------------------------------------------
class CreateTaskBody(BaseModel):
title: str
body: Optional[str] = None
assignee: Optional[str] = None
tenant: Optional[str] = None
priority: int = 0
workspace_kind: str = "scratch"
workspace_path: Optional[str] = None
parents: list[str] = Field(default_factory=list)
triage: bool = False
idempotency_key: Optional[str] = None
max_runtime_seconds: Optional[int] = None
skills: Optional[list[str]] = None
@router.post("/tasks")
def create_task(payload: CreateTaskBody):
conn = _conn()
try:
task_id = kanban_db.create_task(
conn,
title=payload.title,
body=payload.body,
assignee=payload.assignee,
created_by="dashboard",
workspace_kind=payload.workspace_kind,
workspace_path=payload.workspace_path,
tenant=payload.tenant,
priority=payload.priority,
parents=payload.parents,
triage=payload.triage,
idempotency_key=payload.idempotency_key,
max_runtime_seconds=payload.max_runtime_seconds,
skills=payload.skills,
)
task = kanban_db.get_task(conn, task_id)
body: dict[str, Any] = {"task": _task_dict(task) if task else None}
# Surface a dispatcher-presence warning so the UI can show a
# banner when a `ready` task would otherwise sit idle because no
# gateway is running (or dispatch_in_gateway=false). Only emit
# for ready+assigned tasks; triage/todo are expected to wait,
# and unassigned tasks can't be dispatched regardless.
if task and task.status == "ready" and task.assignee:
try:
from hermes_cli.kanban import _check_dispatcher_presence
running, message = _check_dispatcher_presence()
if not running and message:
body["warning"] = message
except Exception:
# Probe failure must never block the create itself.
pass
return body
except ValueError as e:
raise HTTPException(status_code=400, detail=str(e))
finally:
conn.close()
# ---------------------------------------------------------------------------
# PATCH /tasks/:id (status / assignee / priority / title / body)
# ---------------------------------------------------------------------------
class UpdateTaskBody(BaseModel):
status: Optional[str] = None
assignee: Optional[str] = None
priority: Optional[int] = None
title: Optional[str] = None
body: Optional[str] = None
result: Optional[str] = None
block_reason: Optional[str] = None
# Structured handoff fields — forwarded to complete_task when status
# transitions to 'done'. Dashboard parity with ``hermes kanban
# complete --summary ... --metadata ...``.
summary: Optional[str] = None
metadata: Optional[dict] = None
@router.patch("/tasks/{task_id}")
def update_task(task_id: str, payload: UpdateTaskBody):
conn = _conn()
try:
task = kanban_db.get_task(conn, task_id)
if task is None:
raise HTTPException(status_code=404, detail=f"task {task_id} not found")
# --- assignee ----------------------------------------------------
if payload.assignee is not None:
try:
ok = kanban_db.assign_task(
conn, task_id, payload.assignee or None,
)
except RuntimeError as e:
raise HTTPException(status_code=409, detail=str(e))
if not ok:
raise HTTPException(status_code=404, detail="task not found")
# --- status -------------------------------------------------------
if payload.status is not None:
s = payload.status
ok = True
if s == "done":
ok = kanban_db.complete_task(
conn, task_id,
result=payload.result,
summary=payload.summary,
metadata=payload.metadata,
)
elif s == "blocked":
ok = kanban_db.block_task(conn, task_id, reason=payload.block_reason)
elif s == "ready":
# Re-open a blocked task, or just an explicit status set.
current = kanban_db.get_task(conn, task_id)
if current and current.status == "blocked":
ok = kanban_db.unblock_task(conn, task_id)
else:
# Direct status write for drag-drop (todo -> ready etc).
ok = _set_status_direct(conn, task_id, "ready")
elif s == "archived":
ok = kanban_db.archive_task(conn, task_id)
elif s in ("todo", "running", "triage"):
ok = _set_status_direct(conn, task_id, s)
else:
raise HTTPException(status_code=400, detail=f"unknown status: {s}")
if not ok:
raise HTTPException(
status_code=409,
detail=f"status transition to {s!r} not valid from current state",
)
# --- priority -----------------------------------------------------
if payload.priority is not None:
with kanban_db.write_txn(conn):
conn.execute(
"UPDATE tasks SET priority = ? WHERE id = ?",
(int(payload.priority), task_id),
)
conn.execute(
"INSERT INTO task_events (task_id, kind, payload, created_at) "
"VALUES (?, 'reprioritized', ?, ?)",
(task_id, json.dumps({"priority": int(payload.priority)}),
int(time.time())),
)
# --- title / body -------------------------------------------------
if payload.title is not None or payload.body is not None:
with kanban_db.write_txn(conn):
sets, vals = [], []
if payload.title is not None:
if not payload.title.strip():
raise HTTPException(status_code=400, detail="title cannot be empty")
sets.append("title = ?")
vals.append(payload.title.strip())
if payload.body is not None:
sets.append("body = ?")
vals.append(payload.body)
vals.append(task_id)
conn.execute(
f"UPDATE tasks SET {', '.join(sets)} WHERE id = ?", vals,
)
conn.execute(
"INSERT INTO task_events (task_id, kind, payload, created_at) "
"VALUES (?, 'edited', NULL, ?)",
(task_id, int(time.time())),
)
updated = kanban_db.get_task(conn, task_id)
return {"task": _task_dict(updated) if updated else None}
finally:
conn.close()
def _set_status_direct(
conn: sqlite3.Connection, task_id: str, new_status: str,
) -> bool:
"""Direct status write for drag-drop moves that aren't covered by the
structured complete/block/unblock/archive verbs (e.g. todo<->ready,
running<->ready). Appends a ``status`` event row for the live feed.
When this transitions OFF ``running`` to anything other than the
terminal verbs above (which own their own run closing), we close the
active run with outcome='reclaimed' so attempt history isn't
orphaned. ``running -> ready`` via drag-drop is the common case
(user yanking a stuck worker back to the queue).
"""
with kanban_db.write_txn(conn):
# Snapshot current state so we know whether to close a run.
prev = conn.execute(
"SELECT status, current_run_id FROM tasks WHERE id = ?",
(task_id,),
).fetchone()
if prev is None:
return False
was_running = prev["status"] == "running"
cur = conn.execute(
"UPDATE tasks SET status = ?, "
" claim_lock = CASE WHEN ? = 'running' THEN claim_lock ELSE NULL END, "
" claim_expires = CASE WHEN ? = 'running' THEN claim_expires ELSE NULL END, "
" worker_pid = CASE WHEN ? = 'running' THEN worker_pid ELSE NULL END "
"WHERE id = ?",
(new_status, new_status, new_status, new_status, task_id),
)
if cur.rowcount != 1:
return False
run_id = None
if was_running and new_status != "running" and prev["current_run_id"]:
run_id = kanban_db._end_run(
conn, task_id,
outcome="reclaimed", status="reclaimed",
summary=f"status changed to {new_status} (dashboard/direct)",
)
conn.execute(
"INSERT INTO task_events (task_id, run_id, kind, payload, created_at) "
"VALUES (?, ?, 'status', ?, ?)",
(task_id, run_id, json.dumps({"status": new_status}), int(time.time())),
)
# If we re-opened something, children may have gone stale.
if new_status in ("done", "ready"):
kanban_db.recompute_ready(conn)
return True
# ---------------------------------------------------------------------------
# Comments
# ---------------------------------------------------------------------------
class CommentBody(BaseModel):
body: str
author: Optional[str] = "dashboard"
@router.post("/tasks/{task_id}/comments")
def add_comment(task_id: str, payload: CommentBody):
if not payload.body.strip():
raise HTTPException(status_code=400, detail="body is required")
conn = _conn()
try:
if kanban_db.get_task(conn, task_id) is None:
raise HTTPException(status_code=404, detail=f"task {task_id} not found")
kanban_db.add_comment(
conn, task_id, author=payload.author or "dashboard", body=payload.body,
)
return {"ok": True}
finally:
conn.close()
# ---------------------------------------------------------------------------
# Links
# ---------------------------------------------------------------------------
class LinkBody(BaseModel):
parent_id: str
child_id: str
@router.post("/links")
def add_link(payload: LinkBody):
conn = _conn()
try:
kanban_db.link_tasks(conn, payload.parent_id, payload.child_id)
return {"ok": True}
except ValueError as e:
raise HTTPException(status_code=400, detail=str(e))
finally:
conn.close()
@router.delete("/links")
def delete_link(parent_id: str = Query(...), child_id: str = Query(...)):
conn = _conn()
try:
ok = kanban_db.unlink_tasks(conn, parent_id, child_id)
return {"ok": bool(ok)}
finally:
conn.close()
# ---------------------------------------------------------------------------
# Bulk actions (multi-select on the board)
# ---------------------------------------------------------------------------
class BulkTaskBody(BaseModel):
ids: list[str]
status: Optional[str] = None
assignee: Optional[str] = None # "" or None = unassign
priority: Optional[int] = None
archive: bool = False
@router.post("/tasks/bulk")
def bulk_update(payload: BulkTaskBody):
"""Apply the same patch to every id in ``payload.ids``.
This is an *independent* iteration per-task failures don't abort
siblings. Returns per-id outcome so the UI can surface partials.
"""
ids = [i for i in (payload.ids or []) if i]
if not ids:
raise HTTPException(status_code=400, detail="ids is required")
results: list[dict] = []
conn = _conn()
try:
for tid in ids:
entry: dict[str, Any] = {"id": tid, "ok": True}
try:
task = kanban_db.get_task(conn, tid)
if task is None:
entry.update(ok=False, error="not found")
results.append(entry)
continue
if payload.archive:
if not kanban_db.archive_task(conn, tid):
entry.update(ok=False, error="archive refused")
if payload.status is not None and not payload.archive:
s = payload.status
if s == "done":
ok = kanban_db.complete_task(conn, tid)
elif s == "blocked":
ok = kanban_db.block_task(conn, tid)
elif s == "ready":
cur = kanban_db.get_task(conn, tid)
if cur and cur.status == "blocked":
ok = kanban_db.unblock_task(conn, tid)
else:
ok = _set_status_direct(conn, tid, "ready")
elif s in ("todo", "running", "triage"):
ok = _set_status_direct(conn, tid, s)
else:
entry.update(ok=False, error=f"unknown status {s!r}")
results.append(entry)
continue
if not ok:
entry.update(ok=False, error=f"transition to {s!r} refused")
if payload.assignee is not None:
try:
if not kanban_db.assign_task(
conn, tid, payload.assignee or None,
):
entry.update(ok=False, error="assign refused")
except RuntimeError as e:
entry.update(ok=False, error=str(e))
if payload.priority is not None:
with kanban_db.write_txn(conn):
conn.execute(
"UPDATE tasks SET priority = ? WHERE id = ?",
(int(payload.priority), tid),
)
conn.execute(
"INSERT INTO task_events (task_id, kind, payload, created_at) "
"VALUES (?, 'reprioritized', ?, ?)",
(tid, json.dumps({"priority": int(payload.priority)}),
int(time.time())),
)
except Exception as e: # defensive — one bad id shouldn't kill the batch
entry.update(ok=False, error=str(e))
results.append(entry)
return {"results": results}
finally:
conn.close()
# ---------------------------------------------------------------------------
# Plugin config (read dashboard.kanban.* defaults from config.yaml)
# ---------------------------------------------------------------------------
@router.get("/config")
def get_config():
"""Return kanban dashboard preferences from ~/.hermes/config.yaml.
Reads the ``dashboard.kanban`` section if present; defaults otherwise.
Used by the UI to pre-select tenant filters, toggle markdown rendering,
or set column-width preferences without a round-trip per page load.
"""
try:
from hermes_cli.config import load_config
cfg = load_config() or {}
except Exception:
cfg = {}
dash_cfg = (cfg.get("dashboard") or {})
# dashboard.kanban may itself be a dict; fall back to {}.
k_cfg = dash_cfg.get("kanban") or {}
return {
"default_tenant": k_cfg.get("default_tenant") or "",
"lane_by_profile": bool(k_cfg.get("lane_by_profile", True)),
"include_archived_by_default": bool(k_cfg.get("include_archived_by_default", False)),
"render_markdown": bool(k_cfg.get("render_markdown", True)),
}
# ---------------------------------------------------------------------------
# Stats (per-profile / per-status counts + oldest-ready age)
# ---------------------------------------------------------------------------
@router.get("/stats")
def get_stats():
"""Per-status + per-assignee counts + oldest-ready age.
Designed for the dashboard HUD and for router profiles that need to
answer "is this specialist overloaded?" without scanning the whole
board themselves.
"""
conn = _conn()
try:
return kanban_db.board_stats(conn)
finally:
conn.close()
@router.get("/assignees")
def get_assignees():
"""Known profiles + per-profile task counts.
Returns the union of ``~/.hermes/profiles/*`` on disk and every
distinct assignee currently used on the board. The dashboard uses
this to populate its assignee dropdown so a freshly-created profile
appears in the picker before it's been given any task.
"""
conn = _conn()
try:
return {"assignees": kanban_db.known_assignees(conn)}
finally:
conn.close()
# ---------------------------------------------------------------------------
# Worker log (read-only; file written by _default_spawn)
# ---------------------------------------------------------------------------
@router.get("/tasks/{task_id}/log")
def get_task_log(task_id: str, tail: Optional[int] = Query(None, ge=1, le=2_000_000)):
"""Return the worker's stdout/stderr log.
``tail`` caps the response size (bytes) so the dashboard drawer
doesn't paginate megabytes into the browser. Returns 404 if the task
has never spawned. The on-disk log is rotated at 2 MiB per
``_rotate_worker_log`` a single ``.log.1`` is kept, no further
generations, so disk usage per task is bounded at ~4 MiB.
"""
conn = _conn()
try:
task = kanban_db.get_task(conn, task_id)
finally:
conn.close()
if task is None:
raise HTTPException(status_code=404, detail=f"task {task_id} not found")
content = kanban_db.read_worker_log(task_id, tail_bytes=tail)
log_path = kanban_db.worker_log_path(task_id)
size = log_path.stat().st_size if log_path.exists() else 0
return {
"task_id": task_id,
"path": str(log_path),
"exists": content is not None,
"size_bytes": size,
"content": content or "",
# Truncated when the on-disk file was larger than the tail cap.
"truncated": bool(tail and size > tail),
}
# ---------------------------------------------------------------------------
# Dispatch nudge (optional quick-path so the UI doesn't wait 60 s)
# ---------------------------------------------------------------------------
@router.post("/dispatch")
def dispatch(dry_run: bool = Query(False), max_n: int = Query(8, alias="max")):
conn = _conn()
try:
result = kanban_db.dispatch_once(
conn, dry_run=dry_run, max_spawn=max_n,
)
# DispatchResult is a dataclass.
try:
return asdict(result)
except TypeError:
return {"result": str(result)}
finally:
conn.close()
# ---------------------------------------------------------------------------
# WebSocket: /events?since=<event_id>
# ---------------------------------------------------------------------------
# Poll interval for the event tail loop. SQLite WAL + 300 ms polling is
# the simplest and most robust approach; it adds a fraction of a percent
# of CPU and has no shared state to synchronize across workers.
_EVENT_POLL_SECONDS = 0.3
@router.websocket("/events")
async def stream_events(ws: WebSocket):
# Enforce the dashboard session token as a query param — browsers can't
# set Authorization on a WS upgrade. This matches how the PTY bridge
# authenticates in hermes_cli/web_server.py.
token = ws.query_params.get("token")
if not _check_ws_token(token):
await ws.close(code=http_status.WS_1008_POLICY_VIOLATION)
return
await ws.accept()
try:
since_raw = ws.query_params.get("since", "0")
try:
cursor = int(since_raw)
except ValueError:
cursor = 0
def _fetch_new(cursor_val: int) -> tuple[int, list[dict]]:
conn = kanban_db.connect()
try:
rows = conn.execute(
"SELECT id, task_id, run_id, kind, payload, created_at "
"FROM task_events WHERE id > ? ORDER BY id ASC LIMIT 200",
(cursor_val,),
).fetchall()
out: list[dict] = []
new_cursor = cursor_val
for r in rows:
try:
payload = json.loads(r["payload"]) if r["payload"] else None
except Exception:
payload = None
out.append({
"id": r["id"],
"task_id": r["task_id"],
"run_id": r["run_id"],
"kind": r["kind"],
"payload": payload,
"created_at": r["created_at"],
})
new_cursor = r["id"]
return new_cursor, out
finally:
conn.close()
while True:
cursor, events = await asyncio.to_thread(_fetch_new, cursor)
if events:
await ws.send_json({"events": events, "cursor": cursor})
await asyncio.sleep(_EVENT_POLL_SECONDS)
except WebSocketDisconnect:
return
except Exception as exc: # defensive: never crash the dashboard worker
log.warning("Kanban event stream error: %s", exc)
try:
await ws.close()
except Exception:
pass
@@ -0,0 +1,32 @@
# DEPRECATED — the kanban dispatcher now runs inside the gateway by
# default (config key: kanban.dispatch_in_gateway, default true). To
# migrate:
#
# systemctl --user disable --now hermes-kanban-dispatcher.service
# # then make sure a gateway is running; e.g. a systemd user unit
# # for `hermes gateway start`. The gateway hosts the dispatcher.
#
# This unit is kept for users who truly cannot run the gateway (host
# policy forbids long-lived services, etc.). It now invokes the
# standalone dispatcher via the explicit --force flag, so nobody
# accidentally keeps two dispatchers racing against the same
# kanban.db. Running this unit AND a gateway with
# dispatch_in_gateway=true is NOT supported.
[Unit]
Description=Hermes Kanban dispatcher (DEPRECATED standalone daemon — prefer gateway-embedded dispatch)
Documentation=https://hermes-agent.nousresearch.com/docs/user-guide/features/kanban
After=network.target
[Service]
Type=simple
ExecStart=/usr/bin/env hermes kanban daemon --force --interval 60 --pidfile %t/hermes-kanban-dispatcher.pid
Restart=on-failure
RestartSec=5
# Log to the journal via stdout/stderr; the dispatcher also writes per-task
# worker output to $HERMES_HOME/kanban/logs/<task>.log.
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=default.target
+17 -3
View File
@@ -23,6 +23,7 @@ Usage:
import asyncio
import base64
import concurrent.futures
import contextvars
import copy
import hashlib
import json
@@ -133,6 +134,7 @@ from agent.prompt_builder import (
DEFAULT_AGENT_IDENTITY, PLATFORM_HINTS,
MEMORY_GUIDANCE, SESSION_SEARCH_GUIDANCE, SKILLS_GUIDANCE,
HERMES_AGENT_HELP_GUIDANCE,
KANBAN_GUIDANCE,
build_nous_subscription_prompt,
)
from agent.model_metadata import (
@@ -3592,11 +3594,15 @@ class AIAgent:
if actions:
summary = " · ".join(dict.fromkeys(actions))
self._safe_print(f" 💾 {summary}")
self._safe_print(
f" 💾 Self-improvement review: {summary}"
)
_bg_cb = self.background_review_callback
if _bg_cb:
try:
_bg_cb(f"💾 {summary}")
_bg_cb(
f"💾 Self-improvement review: {summary}"
)
except Exception:
pass
@@ -4823,6 +4829,12 @@ class AIAgent:
tool_guidance.append(SESSION_SEARCH_GUIDANCE)
if "skill_manage" in self.valid_tool_names:
tool_guidance.append(SKILLS_GUIDANCE)
# Kanban worker/orchestrator lifecycle — only present when the
# dispatcher spawned this process (kanban_show check_fn gates on
# HERMES_KANBAN_TASK env var). Normal chat sessions never see
# this block.
if "kanban_show" in self.valid_tool_names:
tool_guidance.append(KANBAN_GUIDANCE)
if tool_guidance:
prompt_parts.append(" ".join(tool_guidance))
@@ -9432,7 +9444,9 @@ class AIAgent:
with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
futures = []
for i, (tc, name, args) in enumerate(parsed_calls):
f = executor.submit(_run_tool, i, tc, name, args)
# Propagate ContextVars (e.g. _approval_session_key); mirrors asyncio.to_thread.
ctx = contextvars.copy_context()
f = executor.submit(ctx.run, _run_tool, i, tc, name, args)
futures.append(f)
# Wait for all to complete with periodic heartbeats so the
+152
View File
@@ -0,0 +1,152 @@
---
name: kanban-orchestrator
description: Decomposition playbook + specialist-roster conventions + anti-temptation rules for an orchestrator profile routing work through Kanban. The "don't do the work yourself" rule and the basic lifecycle are auto-injected into every kanban worker's system prompt; this skill is the deeper playbook when you're specifically playing the orchestrator role.
version: 2.0.0
metadata:
hermes:
tags: [kanban, multi-agent, orchestration, routing]
related_skills: [kanban-worker]
---
# Kanban Orchestrator — Decomposition Playbook
> The **core worker lifecycle** (including the `kanban_create` fan-out pattern and the "decompose, don't execute" rule) is auto-injected into every kanban process via the `KANBAN_GUIDANCE` system-prompt block. This skill is the deeper playbook when you're an orchestrator profile whose whole job is routing.
## When to use the board (vs. just doing the work)
Create Kanban tasks when any of these are true:
1. **Multiple specialists are needed.** Research + analysis + writing is three profiles.
2. **The work should survive a crash or restart.** Long-running, recurring, or important.
3. **The user might want to interject.** Human-in-the-loop at any step.
4. **Multiple subtasks can run in parallel.** Fan-out for speed.
5. **Review / iteration is expected.** A reviewer profile loops on drafter output.
6. **The audit trail matters.** Board rows persist in SQLite forever.
If *none* of those apply — it's a small one-shot reasoning task — use `delegate_task` instead or answer the user directly.
## The anti-temptation rules
Your job description says "route, don't execute." The rules that enforce that:
- **Do not execute the work yourself.** Your restricted toolset usually doesn't even include terminal/file/code/web for implementation. If you find yourself "just fixing this quickly" — stop and create a task for the right specialist.
- **For any concrete task, create a Kanban task and assign it.** Every single time.
- **If no specialist fits, ask the user which profile to create.** Do not default to doing it yourself under "close enough."
- **Decompose, route, and summarize — that's the whole job.**
## The standard specialist roster (convention)
Unless the user's setup has customized profiles, assume these exist. Adjust to whatever the user actually has — ask if you're unsure.
| Profile | Does | Typical workspace |
|---|---|---|
| `researcher` | Reads sources, gathers facts, writes findings | `scratch` |
| `analyst` | Synthesizes, ranks, de-dupes. Consumes multiple `researcher` outputs | `scratch` |
| `writer` | Drafts prose in the user's voice | `scratch` or `dir:` into their Obsidian vault |
| `reviewer` | Reads output, leaves findings, gates approval | `scratch` |
| `backend-eng` | Writes server-side code | `worktree` |
| `frontend-eng` | Writes client-side code | `worktree` |
| `ops` | Runs scripts, manages services, handles deployments | `dir:` into ops scripts repo |
| `pm` | Writes specs, acceptance criteria | `scratch` |
## Decomposition playbook
### Step 1 — Understand the goal
Ask clarifying questions if the goal is ambiguous. Cheap to ask; expensive to spawn the wrong fleet.
### Step 2 — Sketch the task graph
Before creating anything, draft the graph out loud (in your response to the user). Example for "Analyze whether we should migrate to Postgres":
```
T1 researcher research: Postgres cost vs current
T2 researcher research: Postgres performance vs current
T3 analyst synthesize migration recommendation parents: T1, T2
T4 writer draft decision memo parents: T3
```
Show this to the user. Let them correct it before you create anything.
### Step 3 — Create tasks and link
```python
t1 = kanban_create(
title="research: Postgres cost vs current",
assignee="researcher",
body="Compare estimated infrastructure costs, migration costs, and ongoing ops costs over a 3-year window. Sources: AWS/GCP pricing, team time estimates, current Postgres bills from peers.",
tenant=os.environ.get("HERMES_TENANT"),
)["task_id"]
t2 = kanban_create(
title="research: Postgres performance vs current",
assignee="researcher",
body="Compare query latency, throughput, and scaling characteristics at our expected data volume (~500GB, 10k QPS peak). Sources: benchmark papers, public case studies, pgbench results if easy.",
)["task_id"]
t3 = kanban_create(
title="synthesize migration recommendation",
assignee="analyst",
body="Read the findings from T1 (cost) and T2 (performance). Produce a 1-page recommendation with explicit trade-offs and a go/no-go call.",
parents=[t1, t2],
)["task_id"]
t4 = kanban_create(
title="draft decision memo",
assignee="writer",
body="Turn the analyst's recommendation into a 2-page memo for the CTO. Match the tone of previous decision memos in the team's knowledge base.",
parents=[t3],
)["task_id"]
```
`parents=[...]` gates promotion — children stay in `todo` until every parent reaches `done`, then auto-promote to `ready`. No manual coordination needed; the dispatcher and dependency engine handle it.
### Step 4 — Complete your own task
If you were spawned as a task yourself (e.g. `planner` profile was assigned `T0: "investigate Postgres migration"`), mark it done with a summary of what you created:
```python
kanban_complete(
summary="decomposed into T1-T4: 2 researchers parallel, 1 analyst on their outputs, 1 writer on the recommendation",
metadata={
"task_graph": {
"T1": {"assignee": "researcher", "parents": []},
"T2": {"assignee": "researcher", "parents": []},
"T3": {"assignee": "analyst", "parents": ["T1", "T2"]},
"T4": {"assignee": "writer", "parents": ["T3"]},
},
},
)
```
### Step 5 — Report back to the user
Tell them what you created in plain prose:
> I've queued 4 tasks:
> - **T1** (researcher): cost comparison
> - **T2** (researcher): performance comparison, in parallel with T1
> - **T3** (analyst): synthesizes T1 + T2 into a recommendation
> - **T4** (writer): turns T3 into a CTO memo
>
> The dispatcher will pick up T1 and T2 now. T3 starts when both finish. You'll get a gateway ping when T4 completes. Use the dashboard or `hermes kanban tail <id>` to follow along.
## Common patterns
**Fan-out + fan-in (research → synthesize):** N `researcher` tasks with no parents, one `analyst` task with all of them as parents.
**Pipeline with gates:** `pm → backend-eng → reviewer`. Each stage's `parents=[previous_task]`. Reviewer blocks or completes; if reviewer blocks, the operator unblocks with feedback and respawns.
**Same-profile queue:** 50 tasks, all assigned to `translator`, no dependencies between them. Dispatcher serializes — translator processes them in priority order, accumulating experience in their own memory.
**Human-in-the-loop:** Any task can `kanban_block()` to wait for input. Dispatcher respawns after `/unblock`. The comment thread carries the full context.
## Pitfalls
**Reassignment vs. new task.** If a reviewer blocks with "needs changes," create a NEW task linked from the reviewer's task — don't re-run the same task with a stern look. The new task is assigned to the original implementer profile.
**Argument order for links.** `kanban_link(parent_id=..., child_id=...)` — parent first. Mixing them up demotes the wrong task to `todo`.
**Don't pre-create the whole graph if the shape depends on intermediate findings.** If T3's structure depends on what T1 and T2 find, let T3 exist as a "synthesize findings" task whose own first step is to read parent handoffs and plan the rest. Orchestrators can spawn orchestrators.
**Tenant inheritance.** If `HERMES_TENANT` is set in your env, pass `tenant=os.environ.get("HERMES_TENANT")` on every `kanban_create` call so child tasks stay in the same namespace.
+134
View File
@@ -0,0 +1,134 @@
---
name: kanban-worker
description: Pitfalls, examples, and edge cases for Hermes Kanban workers. The lifecycle itself is auto-injected into every worker's system prompt as KANBAN_GUIDANCE (from agent/prompt_builder.py); this skill is what you load when you want deeper detail on specific scenarios.
version: 2.0.0
metadata:
hermes:
tags: [kanban, multi-agent, collaboration, workflow, pitfalls]
related_skills: [kanban-orchestrator]
---
# Kanban Worker — Pitfalls and Examples
> You're seeing this skill because the Hermes Kanban dispatcher spawned you as a worker with `--skills kanban-worker` — it's loaded automatically for every dispatched worker. The **lifecycle** (6 steps: orient → work → heartbeat → block/complete) also lives in the `KANBAN_GUIDANCE` block that's auto-injected into your system prompt. This skill is the deeper detail: good handoff shapes, retry diagnostics, edge cases.
## Workspace handling
Your workspace kind determines how you should behave inside `$HERMES_KANBAN_WORKSPACE`:
| Kind | What it is | How to work |
|---|---|---|
| `scratch` | Fresh tmp dir, yours alone | Read/write freely; it gets GC'd when the task is archived. |
| `dir:<path>` | Shared persistent directory | Other runs will read what you write. Treat it like long-lived state. Path is guaranteed absolute (the kernel rejects relative paths). |
| `worktree` | Git worktree at the resolved path | If `.git` doesn't exist, run `git worktree add <path> <branch>` from the main repo first, then cd and work normally. Commit work here. |
## Tenant isolation
If `$HERMES_TENANT` is set, the task belongs to a tenant namespace. When reading or writing persistent memory, prefix memory entries with the tenant so context doesn't leak across tenants:
- Good: `business-a: Acme is our biggest customer`
- Bad (leaks): `Acme is our biggest customer`
## Good summary + metadata shapes
The `kanban_complete(summary=..., metadata=...)` handoff is how downstream workers read what you did. Patterns that work:
**Coding task:**
```python
kanban_complete(
summary="shipped rate limiter — token bucket, keys on user_id with IP fallback, 14 tests pass",
metadata={
"changed_files": ["rate_limiter.py", "tests/test_rate_limiter.py"],
"tests_run": 14,
"tests_passed": 14,
"decisions": ["user_id primary, IP fallback for unauthenticated requests"],
},
)
```
**Research task:**
```python
kanban_complete(
summary="3 competing libraries reviewed; vLLM wins on throughput, SGLang on latency, Tensorrt-LLM on memory efficiency",
metadata={
"sources_read": 12,
"recommendation": "vLLM",
"benchmarks": {"vllm": 1.0, "sglang": 0.87, "trtllm": 0.72},
},
)
```
**Review task:**
```python
kanban_complete(
summary="reviewed PR #123; 2 blocking issues found (SQL injection in /search, missing CSRF on /settings)",
metadata={
"pr_number": 123,
"findings": [
{"severity": "critical", "file": "api/search.py", "line": 42, "issue": "raw SQL concat"},
{"severity": "high", "file": "api/settings.py", "issue": "missing CSRF middleware"},
],
"approved": False,
},
)
```
Shape `metadata` so downstream parsers (reviewers, aggregators, schedulers) can use it without re-reading your prose.
## Block reasons that get answered fast
Bad: `"stuck"` — the human has no context.
Good: one sentence naming the specific decision you need. Leave longer context as a comment instead.
```python
kanban_comment(
task_id=os.environ["HERMES_KANBAN_TASK"],
body="Full context: I have user IPs from Cloudflare headers but some users are behind NATs with thousands of peers. Keying on IP alone causes false positives.",
)
kanban_block(reason="Rate limit key choice: IP (simple, NAT-unsafe) or user_id (requires auth, skips anonymous endpoints)?")
```
The block message is what appears in the dashboard / gateway notifier. The comment is the deeper context a human reads when they open the task.
## Heartbeats worth sending
Good heartbeats name progress: `"epoch 12/50, loss 0.31"`, `"scanned 1.2M/2.4M rows"`, `"uploaded 47/120 videos"`.
Bad heartbeats: `"still working"`, empty notes, sub-second intervals. Every few minutes max; skip entirely for tasks under ~2 minutes.
## Retry scenarios
If you open the task and `kanban_show` returns `runs: [...]` with one or more closed runs, you're a retry. The prior runs' `outcome` / `summary` / `error` tell you what didn't work. Don't repeat that path. Typical retry diagnostics:
- `outcome: "timed_out"` — the previous attempt hit `max_runtime_seconds`. You may need to chunk the work or shorten it.
- `outcome: "crashed"` — OOM or segfault. Reduce memory footprint.
- `outcome: "spawn_failed"` + `error: "..."` — usually a profile config issue (missing credential, bad PATH). Ask the human via `kanban_block` instead of retrying blindly.
- `outcome: "reclaimed"` + `summary: "task archived..."` — operator archived the task out from under the previous run; you probably shouldn't be running at all, check status carefully.
- `outcome: "blocked"` — a previous attempt blocked; the unblock comment should be in the thread by now.
## Do NOT
- Call `delegate_task` as a substitute for `kanban_create`. `delegate_task` is for short reasoning subtasks inside YOUR run; `kanban_create` is for cross-agent handoffs that outlive one API loop.
- Modify files outside `$HERMES_KANBAN_WORKSPACE` unless the task body says to.
- Create follow-up tasks assigned to yourself — assign to the right specialist.
- Complete a task you didn't actually finish. Block it instead.
## Pitfalls
**Task state can change between dispatch and your startup.** Between when the dispatcher claimed and when your process actually booted, the task may have been blocked, reassigned, or archived. Always `kanban_show` first. If it reports `blocked` or `archived`, stop — you shouldn't be running.
**Workspace may have stale artifacts.** Especially `dir:` and `worktree` workspaces can have files from previous runs. Read the comment thread — it usually explains why you're running again and what state the workspace is in.
**Don't rely on the CLI when the guidance is available.** The `kanban_*` tools work across all terminal backends (Docker, Modal, SSH). `hermes kanban <verb>` from your terminal tool will fail in containerized backends because the CLI isn't installed there. When in doubt, use the tool.
## CLI fallback (for scripting)
Every tool has a CLI equivalent for human operators and scripts:
- `kanban_show``hermes kanban show <id> --json`
- `kanban_complete``hermes kanban complete <id> --summary "..." --metadata '{...}'`
- `kanban_block``hermes kanban block <id> "reason"`
- `kanban_create``hermes kanban create "title" --assignee <profile> [--parent <id>]`
- etc.
Use the tools from inside an agent; the CLI exists for the human at the terminal.
+2 -2
View File
@@ -124,7 +124,7 @@ class TestMcpRegistrationE2E:
mock_conn.request_permission = AsyncMock()
acp_agent._conn = mock_conn
def mock_run_conversation(user_message, conversation_history=None, task_id=None):
def mock_run_conversation(user_message, conversation_history=None, task_id=None, **kwargs):
"""Simulate an agent turn that calls terminal, gets a result, then responds."""
agent = state.agent
@@ -213,7 +213,7 @@ class TestMcpRegistrationE2E:
mock_conn.request_permission = AsyncMock()
acp_agent._conn = mock_conn
def mock_run(user_message, conversation_history=None, task_id=None):
def mock_run(user_message, conversation_history=None, task_id=None, **kwargs):
agent = state.agent
# Fire two tool calls
if agent.tool_progress_callback:
+159
View File
@@ -6,6 +6,11 @@ the JSON Schema ecosystem accepts:
1. Properties without ``type`` Moonshot requires ``type`` on every node.
2. ``type`` at the parent of ``anyOf`` Moonshot requires it only inside
``anyOf`` children.
3. ``$ref`` with sibling keywords Moonshot expands the ref first and then
rejects ``description``/``type`` siblings on the same node.
(Ported from anomalyco/opencode#24730.)
4. Tuple-style ``items`` arrays Moonshot requires a single item schema,
not positional ones. (Ported from anomalyco/opencode#24730.)
These tests cover the repairs applied by ``agent/moonshot_schema.py``.
"""
@@ -153,6 +158,160 @@ class TestAnyOfParentType:
assert "type" in children[1]
class TestRefSiblingStripping:
"""Rule 3: ``$ref`` nodes may not carry sibling keywords on Moonshot.
Ported from anomalyco/opencode#24730. The real-world failure was MCP tools
whose generated schemas put a ``description`` on a ``$ref`` property so the
model would see the field's human-readable hint. The reference stays — the
referenced definition still owns the description (on the target node itself)
and still serves the model's context.
"""
def test_description_sibling_stripped_from_ref(self):
params = {
"type": "object",
"properties": {
"variantOptions": {
"$ref": "#/$defs/VariantOptions",
"description": "Required. The variant options for generation.",
},
},
"$defs": {
"VariantOptions": {
"type": "object",
"properties": {},
"description": "Configuration options.",
},
},
}
out = sanitize_moonshot_tool_parameters(params)
# Sibling stripped.
assert out["properties"]["variantOptions"] == {"$ref": "#/$defs/VariantOptions"}
# The target definition's own description is preserved — we only strip
# siblings ON the $ref node, not on the thing it points at.
assert out["$defs"]["VariantOptions"]["description"] == "Configuration options."
def test_multiple_siblings_all_stripped(self):
params = {
"type": "object",
"properties": {
"p": {
"$ref": "#/$defs/T",
"type": "object",
"description": "x",
"default": {},
"title": "P",
},
},
"$defs": {"T": {"type": "object"}},
}
out = sanitize_moonshot_tool_parameters(params)
assert out["properties"]["p"] == {"$ref": "#/$defs/T"}
def test_ref_without_siblings_unchanged(self):
params = {
"type": "object",
"properties": {"p": {"$ref": "#/$defs/T"}},
"$defs": {"T": {"type": "object"}},
}
out = sanitize_moonshot_tool_parameters(params)
assert out["properties"]["p"] == {"$ref": "#/$defs/T"}
def test_ref_inside_anyof_children(self):
params = {
"type": "object",
"properties": {
"v": {
"anyOf": [
{"$ref": "#/$defs/A", "description": "variant A"},
{"type": "null"},
],
},
},
"$defs": {"A": {"type": "object"}},
}
out = sanitize_moonshot_tool_parameters(params)
children = out["properties"]["v"]["anyOf"]
assert children[0] == {"$ref": "#/$defs/A"}
assert children[1] == {"type": "null"}
class TestTupleItems:
"""Rule 4: tuple-style ``items`` arrays collapse to a single schema.
Ported from anomalyco/opencode#24730. Moonshot's schema engine requires
``items`` to be ONE schema object applied to every array element; tuple-
style positional item schemas are rejected. We collapse to the first
element's schema (which is the "closest" interpretation of positional →
single) and drop the rest.
"""
def test_tuple_items_collapsed_to_first(self):
params = {
"type": "object",
"properties": {
"renderedSize": {
"type": "array",
"items": [{"type": "number"}, {"type": "number"}],
"minItems": 2,
"maxItems": 2,
},
},
}
out = sanitize_moonshot_tool_parameters(params)
assert out["properties"]["renderedSize"]["items"] == {"type": "number"}
# Sibling constraints are preserved — only the tuple shape is repaired.
assert out["properties"]["renderedSize"]["minItems"] == 2
def test_empty_tuple_items_becomes_empty_schema(self):
# Empty tuple collapses to ``{}``; the generic repair then fills a
# synthetic ``type`` because Moonshot requires ``type`` on every
# schema node. Either ``{}`` or ``{"type": "string"}`` is a valid
# final shape for Moonshot — both accept any string element — but we
# always go through ``_fill_missing_type`` so the result is fully
# well-formed without needing the consumer to patch it later.
params = {
"type": "object",
"properties": {
"things": {"type": "array", "items": []},
},
}
out = sanitize_moonshot_tool_parameters(params)
items = out["properties"]["things"]["items"]
# Must be a dict and must carry a ``type`` (the whole point of Rule 1).
assert isinstance(items, dict)
assert items.get("type")
def test_tuple_items_first_element_is_repaired(self):
# The first element itself has a missing type — it should be filled.
params = {
"type": "object",
"properties": {
"pair": {
"type": "array",
"items": [{"description": "first"}, {"description": "second"}],
},
},
}
out = sanitize_moonshot_tool_parameters(params)
# Repaired to a single schema with a synthetic type.
assert out["properties"]["pair"]["items"] == {
"description": "first",
"type": "string",
}
def test_single_schema_items_unchanged(self):
params = {
"type": "object",
"properties": {
"tags": {"type": "array", "items": {"type": "string"}},
},
}
out = sanitize_moonshot_tool_parameters(params)
assert out["properties"]["tags"]["items"] == {"type": "string"}
class TestTopLevelGuarantees:
"""The returned top-level schema is always a well-formed object."""
+206
View File
@@ -0,0 +1,206 @@
"""Tests for cli._cprint's bg-thread cooperation with prompt_toolkit.
Background: when a prompt_toolkit Application is running, a bg thread that
calls ``_pt_print`` directly can race with the input-area redraw and the
printed line can end up visually buried behind the prompt. ``_cprint`` now
routes cross-thread prints through ``run_in_terminal`` via
``loop.call_soon_threadsafe`` so the self-improvement background review's
``💾 Self-improvement review: `` summary actually surfaces to the user.
These tests verify the routing logic without spinning up a real PT app.
"""
from __future__ import annotations
import sys
import types
from types import SimpleNamespace
import cli
def test_cprint_no_app_direct_print(monkeypatch):
"""No active app → direct _pt_print, no run_in_terminal involvement."""
calls = []
monkeypatch.setattr(cli, "_pt_print", lambda x: calls.append(("pt_print", x)))
monkeypatch.setattr(cli, "_PT_ANSI", lambda t: ("ANSI", t))
# Patch the prompt_toolkit import the function performs internally.
fake_pt_app = types.ModuleType("prompt_toolkit.application")
fake_pt_app.get_app_or_none = lambda: None
fake_pt_app.run_in_terminal = lambda *a, **kw: calls.append(("run_in_terminal",))
monkeypatch.setitem(sys.modules, "prompt_toolkit.application", fake_pt_app)
cli._cprint("hello")
assert calls == [("pt_print", ("ANSI", "hello"))]
def test_cprint_app_not_running_direct_print(monkeypatch):
"""App exists but not running (e.g. teardown) → direct print."""
calls = []
monkeypatch.setattr(cli, "_pt_print", lambda x: calls.append(("pt_print", x)))
monkeypatch.setattr(cli, "_PT_ANSI", lambda t: t)
fake_app = SimpleNamespace(_is_running=False, loop=None)
fake_pt_app = types.ModuleType("prompt_toolkit.application")
fake_pt_app.get_app_or_none = lambda: fake_app
fake_pt_app.run_in_terminal = lambda *a, **kw: calls.append(("run_in_terminal",))
monkeypatch.setitem(sys.modules, "prompt_toolkit.application", fake_pt_app)
cli._cprint("x")
assert calls == [("pt_print", "x")]
def test_cprint_bg_thread_schedules_on_app_loop(monkeypatch):
"""App running + different thread → schedules via call_soon_threadsafe."""
scheduled = []
direct_prints = []
monkeypatch.setattr(cli, "_pt_print", lambda x: direct_prints.append(x))
monkeypatch.setattr(cli, "_PT_ANSI", lambda t: t)
class FakeLoop:
def is_running(self):
return True
def call_soon_threadsafe(self, cb, *args):
scheduled.append(cb)
fake_loop = FakeLoop()
# Install a fake "current loop" that is NOT the app's loop, so the
# cross-thread branch is taken.
fake_current_loop = SimpleNamespace(is_running=lambda: True)
fake_asyncio = types.ModuleType("asyncio")
class _Policy:
def get_event_loop(self):
return fake_current_loop
fake_asyncio.get_event_loop_policy = lambda: _Policy()
monkeypatch.setitem(sys.modules, "asyncio", fake_asyncio)
fake_app = SimpleNamespace(_is_running=True, loop=fake_loop)
fake_pt_app = types.ModuleType("prompt_toolkit.application")
fake_pt_app.get_app_or_none = lambda: fake_app
run_in_terminal_calls = []
def _fake_run_in_terminal(func, **kw):
run_in_terminal_calls.append(func)
# Simulate run_in_terminal actually calling func (as the real PT
# impl would once the app loop tick picks it up).
func()
return None
fake_pt_app.run_in_terminal = _fake_run_in_terminal
monkeypatch.setitem(sys.modules, "prompt_toolkit.application", fake_pt_app)
cli._cprint("💾 Self-improvement review: Skill updated")
# call_soon_threadsafe must have been called with a scheduling cb.
assert len(scheduled) == 1
# Invoking the scheduled callback should hit run_in_terminal.
scheduled[0]()
assert len(run_in_terminal_calls) == 1
# And run_in_terminal's inner func should have emitted a pt_print.
assert direct_prints == ["💾 Self-improvement review: Skill updated"]
def test_cprint_same_thread_as_app_loop_direct_print(monkeypatch):
"""App running on same thread → direct print (no scheduling)."""
direct_prints = []
monkeypatch.setattr(cli, "_pt_print", lambda x: direct_prints.append(x))
monkeypatch.setattr(cli, "_PT_ANSI", lambda t: t)
class FakeLoop:
def is_running(self):
return True
def call_soon_threadsafe(self, cb, *args):
raise AssertionError(
"call_soon_threadsafe must not be used on the app's own thread"
)
fake_loop = FakeLoop()
fake_asyncio = types.ModuleType("asyncio")
class _Policy:
def get_event_loop(self):
return fake_loop # same as app loop
fake_asyncio.get_event_loop_policy = lambda: _Policy()
monkeypatch.setitem(sys.modules, "asyncio", fake_asyncio)
fake_app = SimpleNamespace(_is_running=True, loop=fake_loop)
fake_pt_app = types.ModuleType("prompt_toolkit.application")
fake_pt_app.get_app_or_none = lambda: fake_app
fake_pt_app.run_in_terminal = lambda *a, **kw: None
monkeypatch.setitem(sys.modules, "prompt_toolkit.application", fake_pt_app)
cli._cprint("x")
assert direct_prints == ["x"]
def test_cprint_swallows_app_loop_attr_error(monkeypatch):
"""Loop missing on app → fall back to direct print, no crash."""
direct_prints = []
monkeypatch.setattr(cli, "_pt_print", lambda x: direct_prints.append(x))
monkeypatch.setattr(cli, "_PT_ANSI", lambda t: t)
class WeirdApp:
_is_running = True
@property
def loop(self):
raise RuntimeError("no loop for you")
fake_pt_app = types.ModuleType("prompt_toolkit.application")
fake_pt_app.get_app_or_none = lambda: WeirdApp()
fake_pt_app.run_in_terminal = lambda *a, **kw: None
monkeypatch.setitem(sys.modules, "prompt_toolkit.application", fake_pt_app)
cli._cprint("fallback")
assert direct_prints == ["fallback"]
def test_cprint_swallows_prompt_toolkit_import_error(monkeypatch):
"""If prompt_toolkit.application itself fails to import, fall back."""
direct_prints = []
monkeypatch.setattr(cli, "_pt_print", lambda x: direct_prints.append(x))
monkeypatch.setattr(cli, "_PT_ANSI", lambda t: t)
# Drop cached prompt_toolkit.application AND install a meta-path finder
# that raises ImportError on re-import.
monkeypatch.delitem(sys.modules, "prompt_toolkit.application", raising=False)
class _BlockFinder:
def find_module(self, name, path=None):
if name == "prompt_toolkit.application":
return self
return None
def load_module(self, name):
raise ImportError("blocked for test")
def find_spec(self, name, path=None, target=None):
if name == "prompt_toolkit.application":
# Returning a bogus spec that will fail on load works too,
# but raising here keeps the test simple.
raise ImportError("blocked for test")
return None
blocker = _BlockFinder()
sys.meta_path.insert(0, blocker)
try:
cli._cprint("fallback2")
finally:
sys.meta_path.remove(blocker)
assert direct_prints == ["fallback2"]
@@ -0,0 +1,16 @@
"""Static dashboard tests for browser-safe @nous-research/ui imports."""
from pathlib import Path
WEB_SRC = Path(__file__).resolve().parents[2] / "web" / "src"
def test_dashboard_does_not_import_nous_ui_root_barrel():
offenders = []
for ext in ("*.tsx", "*.ts"):
for path in WEB_SRC.rglob(ext):
content = path.read_text(encoding="utf-8")
if 'from "@nous-research/ui"' in content or "from '@nous-research/ui'" in content:
offenders.append(str(path.relative_to(WEB_SRC)))
assert offenders == []
@@ -0,0 +1,11 @@
"""Static dashboard tests for the Profiles navigation copy."""
from pathlib import Path
def test_profiles_nav_label_uses_short_multi_agents_copy():
en_i18n = Path(__file__).resolve().parents[2] / "web" / "src" / "i18n" / "en.ts"
content = en_i18n.read_text(encoding="utf-8")
assert 'profiles: "profiles : multi agents"' in content
assert "Profiles: Running Multiple Agents" not in content
+210
View File
@@ -0,0 +1,210 @@
"""Tests for the kanban CLI surface (hermes_cli.kanban)."""
from __future__ import annotations
import argparse
import json
import os
from pathlib import Path
import pytest
from hermes_cli import kanban as kc
from hermes_cli import kanban_db as kb
@pytest.fixture
def kanban_home(tmp_path, monkeypatch):
home = tmp_path / ".hermes"
home.mkdir()
monkeypatch.setenv("HERMES_HOME", str(home))
monkeypatch.setattr(Path, "home", lambda: tmp_path)
kb.init_db()
return home
# ---------------------------------------------------------------------------
# Workspace flag parsing
# ---------------------------------------------------------------------------
@pytest.mark.parametrize(
"value,expected",
[
("scratch", ("scratch", None)),
("worktree", ("worktree", None)),
("dir:/tmp/work", ("dir", "/tmp/work")),
],
)
def test_parse_workspace_flag_valid(value, expected):
assert kc._parse_workspace_flag(value) == expected
def test_parse_workspace_flag_expands_user():
kind, path = kc._parse_workspace_flag("dir:~/vault")
assert kind == "dir"
assert path.endswith("/vault")
assert not path.startswith("~")
@pytest.mark.parametrize("bad", ["cloud", "dir:", "", "worktree:/x"])
def test_parse_workspace_flag_rejects(bad):
if not bad:
# Empty -> defaults; not an error.
assert kc._parse_workspace_flag(bad) == ("scratch", None)
return
with pytest.raises(argparse.ArgumentTypeError):
kc._parse_workspace_flag(bad)
# ---------------------------------------------------------------------------
# run_slash smoke tests (end-to-end via the same entry both CLI and gateway use)
# ---------------------------------------------------------------------------
def test_run_slash_no_args_shows_usage(kanban_home):
out = kc.run_slash("")
assert "kanban" in out.lower()
assert "create" in out.lower() or "subcommand" in out.lower() or "action" in out.lower()
def test_run_slash_create_and_list(kanban_home):
out = kc.run_slash("create 'ship feature' --assignee alice")
assert "Created" in out
out = kc.run_slash("list")
assert "ship feature" in out
assert "alice" in out
def test_run_slash_create_with_parent_and_cascade(kanban_home):
# Parent then child via --parent
out1 = kc.run_slash("create 'parent' --assignee alice")
# Extract the "t_xxxx" id from "Created t_xxxx (ready, ...)"
import re
m = re.search(r"(t_[a-f0-9]+)", out1)
assert m
p = m.group(1)
out2 = kc.run_slash(f"create 'child' --assignee bob --parent {p}")
assert "todo" in out2 # child starts as todo
# Complete parent; list should promote child to ready
kc.run_slash(f"complete {p}")
# Explicit filter: child should now be ready (was todo before complete).
ready_list = kc.run_slash("list --status ready")
assert "child" in ready_list
def test_run_slash_show_includes_comments(kanban_home):
out = kc.run_slash("create 'x'")
import re
tid = re.search(r"(t_[a-f0-9]+)", out).group(1)
kc.run_slash(f"comment {tid} 'source is paywalled'")
show = kc.run_slash(f"show {tid}")
assert "source is paywalled" in show
def test_run_slash_block_unblock_cycle(kanban_home):
out = kc.run_slash("create 'x' --assignee alice")
import re
tid = re.search(r"(t_[a-f0-9]+)", out).group(1)
# Claim first so block() finds it running
kc.run_slash(f"claim {tid}")
assert "Blocked" in kc.run_slash(f"block {tid} 'need decision'")
assert "Unblocked" in kc.run_slash(f"unblock {tid}")
def test_run_slash_json_output(kanban_home):
out = kc.run_slash("create 'jsontask' --assignee alice --json")
payload = json.loads(out)
assert payload["title"] == "jsontask"
assert payload["assignee"] == "alice"
assert payload["status"] == "ready"
def test_run_slash_dispatch_dry_run_counts(kanban_home):
kc.run_slash("create 'a' --assignee alice")
kc.run_slash("create 'b' --assignee bob")
out = kc.run_slash("dispatch --dry-run")
assert "Spawned:" in out
def test_run_slash_context_output_format(kanban_home):
out = kc.run_slash("create 'tech spec' --assignee alice --body 'write an RFC'")
import re
tid = re.search(r"(t_[a-f0-9]+)", out).group(1)
kc.run_slash(f"comment {tid} 'remember to include performance section'")
ctx = kc.run_slash(f"context {tid}")
assert "tech spec" in ctx
assert "write an RFC" in ctx
assert "performance section" in ctx
def test_run_slash_tenant_filter(kanban_home):
kc.run_slash("create 'biz-a task' --tenant biz-a --assignee alice")
kc.run_slash("create 'biz-b task' --tenant biz-b --assignee alice")
a = kc.run_slash("list --tenant biz-a")
b = kc.run_slash("list --tenant biz-b")
assert "biz-a task" in a and "biz-b task" not in a
assert "biz-b task" in b and "biz-a task" not in b
def test_run_slash_usage_error_returns_message(kanban_home):
# Missing required argument for create
out = kc.run_slash("create")
assert "usage" in out.lower() or "error" in out.lower()
def test_run_slash_assign_reassigns(kanban_home):
out = kc.run_slash("create 'x' --assignee alice")
import re
tid = re.search(r"(t_[a-f0-9]+)", out).group(1)
assert "Assigned" in kc.run_slash(f"assign {tid} bob")
show = kc.run_slash(f"show {tid}")
assert "bob" in show
def test_run_slash_link_unlink(kanban_home):
a = kc.run_slash("create 'a'")
b = kc.run_slash("create 'b'")
import re
ta = re.search(r"(t_[a-f0-9]+)", a).group(1)
tb = re.search(r"(t_[a-f0-9]+)", b).group(1)
assert "Linked" in kc.run_slash(f"link {ta} {tb}")
# After link, b is todo
show = kc.run_slash(f"show {tb}")
assert "todo" in show
assert "Unlinked" in kc.run_slash(f"unlink {ta} {tb}")
# ---------------------------------------------------------------------------
# Integration with the COMMAND_REGISTRY
# ---------------------------------------------------------------------------
def test_kanban_is_resolvable():
from hermes_cli.commands import resolve_command
cmd = resolve_command("kanban")
assert cmd is not None
assert cmd.name == "kanban"
def test_kanban_bypasses_active_session_guard():
from hermes_cli.commands import should_bypass_active_session
assert should_bypass_active_session("kanban")
def test_kanban_in_autocomplete_table():
from hermes_cli.commands import COMMANDS, SUBCOMMANDS
assert "/kanban" in COMMANDS
subs = SUBCOMMANDS.get("/kanban") or []
assert "create" in subs
assert "dispatch" in subs
def test_kanban_not_gateway_only():
# kanban is available in BOTH CLI and gateway surfaces.
from hermes_cli.commands import COMMAND_REGISTRY
cmd = next(c for c in COMMAND_REGISTRY if c.name == "kanban")
assert not cmd.cli_only
assert not cmd.gateway_only
File diff suppressed because it is too large Load Diff
+438
View File
@@ -0,0 +1,438 @@
"""Tests for the Kanban DB layer (hermes_cli.kanban_db)."""
from __future__ import annotations
import concurrent.futures
import os
import time
from pathlib import Path
import pytest
from hermes_cli import kanban_db as kb
@pytest.fixture
def kanban_home(tmp_path, monkeypatch):
"""Isolated HERMES_HOME with an empty kanban DB."""
home = tmp_path / ".hermes"
home.mkdir()
monkeypatch.setenv("HERMES_HOME", str(home))
monkeypatch.setattr(Path, "home", lambda: tmp_path)
kb.init_db()
return home
# ---------------------------------------------------------------------------
# Schema / init
# ---------------------------------------------------------------------------
def test_init_db_is_idempotent(kanban_home):
# Second call should not error or drop data.
with kb.connect() as conn:
kb.create_task(conn, title="persisted")
kb.init_db()
with kb.connect() as conn:
tasks = kb.list_tasks(conn)
assert len(tasks) == 1
assert tasks[0].title == "persisted"
def test_init_creates_expected_tables(kanban_home):
with kb.connect() as conn:
rows = conn.execute(
"SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
).fetchall()
names = {r["name"] for r in rows}
assert {"tasks", "task_links", "task_comments", "task_events"} <= names
# ---------------------------------------------------------------------------
# Task creation + status inference
# ---------------------------------------------------------------------------
def test_create_task_no_parents_is_ready(kanban_home):
with kb.connect() as conn:
tid = kb.create_task(conn, title="ship it", assignee="alice")
t = kb.get_task(conn, tid)
assert t is not None
assert t.status == "ready"
assert t.assignee == "alice"
assert t.workspace_kind == "scratch"
def test_create_task_with_parent_is_todo_until_parent_done(kanban_home):
with kb.connect() as conn:
p = kb.create_task(conn, title="parent")
c = kb.create_task(conn, title="child", parents=[p])
assert kb.get_task(conn, c).status == "todo"
kb.complete_task(conn, p, result="ok")
assert kb.get_task(conn, c).status == "ready"
def test_create_task_unknown_parent_errors(kanban_home):
with kb.connect() as conn, pytest.raises(ValueError, match="unknown parent"):
kb.create_task(conn, title="orphan", parents=["t_ghost"])
def test_workspace_kind_validation(kanban_home):
with kb.connect() as conn, pytest.raises(ValueError, match="workspace_kind"):
kb.create_task(conn, title="bad ws", workspace_kind="cloud")
# ---------------------------------------------------------------------------
# Links + dependency resolution
# ---------------------------------------------------------------------------
def test_link_demotes_ready_child_to_todo_when_parent_not_done(kanban_home):
with kb.connect() as conn:
a = kb.create_task(conn, title="a")
b = kb.create_task(conn, title="b")
assert kb.get_task(conn, b).status == "ready"
kb.link_tasks(conn, a, b)
assert kb.get_task(conn, b).status == "todo"
def test_link_keeps_ready_child_when_parent_already_done(kanban_home):
with kb.connect() as conn:
a = kb.create_task(conn, title="a")
kb.complete_task(conn, a)
b = kb.create_task(conn, title="b")
assert kb.get_task(conn, b).status == "ready"
kb.link_tasks(conn, a, b)
assert kb.get_task(conn, b).status == "ready"
def test_link_rejects_self_loop(kanban_home):
with kb.connect() as conn:
a = kb.create_task(conn, title="a")
with pytest.raises(ValueError, match="itself"):
kb.link_tasks(conn, a, a)
def test_link_detects_cycle(kanban_home):
with kb.connect() as conn:
a = kb.create_task(conn, title="a")
b = kb.create_task(conn, title="b", parents=[a])
c = kb.create_task(conn, title="c", parents=[b])
with pytest.raises(ValueError, match="cycle"):
kb.link_tasks(conn, c, a)
with pytest.raises(ValueError, match="cycle"):
kb.link_tasks(conn, b, a)
def test_recompute_ready_cascades_through_chain(kanban_home):
with kb.connect() as conn:
a = kb.create_task(conn, title="a")
b = kb.create_task(conn, title="b", parents=[a])
c = kb.create_task(conn, title="c", parents=[b])
assert [kb.get_task(conn, x).status for x in (a, b, c)] == \
["ready", "todo", "todo"]
kb.complete_task(conn, a)
assert kb.get_task(conn, b).status == "ready"
kb.complete_task(conn, b)
assert kb.get_task(conn, c).status == "ready"
def test_recompute_ready_fan_in_waits_for_all_parents(kanban_home):
with kb.connect() as conn:
a = kb.create_task(conn, title="a")
b = kb.create_task(conn, title="b")
c = kb.create_task(conn, title="c", parents=[a, b])
kb.complete_task(conn, a)
assert kb.get_task(conn, c).status == "todo"
kb.complete_task(conn, b)
assert kb.get_task(conn, c).status == "ready"
# ---------------------------------------------------------------------------
# Atomic claim (CAS)
# ---------------------------------------------------------------------------
def test_claim_once_wins_second_loses(kanban_home):
with kb.connect() as conn:
t = kb.create_task(conn, title="x", assignee="a")
first = kb.claim_task(conn, t, claimer="host:1")
assert first is not None and first.status == "running"
second = kb.claim_task(conn, t, claimer="host:2")
assert second is None
def test_claim_fails_on_non_ready(kanban_home):
with kb.connect() as conn:
t = kb.create_task(conn, title="x")
# Move to todo by introducing an unsatisfied parent.
p = kb.create_task(conn, title="p")
kb.link_tasks(conn, p, t)
assert kb.get_task(conn, t).status == "todo"
assert kb.claim_task(conn, t) is None
def test_stale_claim_reclaimed(kanban_home):
with kb.connect() as conn:
t = kb.create_task(conn, title="x", assignee="a")
kb.claim_task(conn, t)
# Rewind claim_expires so it looks stale.
conn.execute(
"UPDATE tasks SET claim_expires = ? WHERE id = ?",
(int(time.time()) - 3600, t),
)
reclaimed = kb.release_stale_claims(conn)
assert reclaimed == 1
assert kb.get_task(conn, t).status == "ready"
def test_heartbeat_extends_claim(kanban_home):
with kb.connect() as conn:
t = kb.create_task(conn, title="x", assignee="a")
claimer = "host:hb"
kb.claim_task(conn, t, claimer=claimer, ttl_seconds=60)
original = kb.get_task(conn, t).claim_expires
# Rewind then heartbeat.
conn.execute("UPDATE tasks SET claim_expires = ? WHERE id = ?", (0, t))
ok = kb.heartbeat_claim(conn, t, claimer=claimer, ttl_seconds=3600)
assert ok
new = kb.get_task(conn, t).claim_expires
assert new > int(time.time()) + 3000
def test_concurrent_claims_only_one_wins(kanban_home):
"""Fire N threads claiming the same task; exactly one must win."""
with kb.connect() as conn:
t = kb.create_task(conn, title="race", assignee="a")
def attempt(i):
with kb.connect() as c:
return kb.claim_task(c, t, claimer=f"host:{i}")
n_workers = 8
with concurrent.futures.ThreadPoolExecutor(max_workers=n_workers) as ex:
results = list(ex.map(attempt, range(n_workers)))
winners = [r for r in results if r is not None]
assert len(winners) == 1
assert winners[0].status == "running"
# ---------------------------------------------------------------------------
# Complete / block / unblock / archive / assign
# ---------------------------------------------------------------------------
def test_complete_records_result(kanban_home):
with kb.connect() as conn:
t = kb.create_task(conn, title="x")
assert kb.complete_task(conn, t, result="done and dusted")
task = kb.get_task(conn, t)
assert task.status == "done"
assert task.result == "done and dusted"
assert task.completed_at is not None
def test_block_then_unblock(kanban_home):
with kb.connect() as conn:
t = kb.create_task(conn, title="x", assignee="a")
kb.claim_task(conn, t)
assert kb.block_task(conn, t, reason="need input")
assert kb.get_task(conn, t).status == "blocked"
assert kb.unblock_task(conn, t)
assert kb.get_task(conn, t).status == "ready"
def test_assign_refuses_while_running(kanban_home):
with kb.connect() as conn:
t = kb.create_task(conn, title="x", assignee="a")
kb.claim_task(conn, t)
with pytest.raises(RuntimeError, match="currently running"):
kb.assign_task(conn, t, "b")
def test_assign_reassigns_when_not_running(kanban_home):
with kb.connect() as conn:
t = kb.create_task(conn, title="x", assignee="a")
assert kb.assign_task(conn, t, "b")
assert kb.get_task(conn, t).assignee == "b"
def test_archive_hides_from_default_list(kanban_home):
with kb.connect() as conn:
t = kb.create_task(conn, title="x")
kb.complete_task(conn, t)
assert kb.archive_task(conn, t)
assert len(kb.list_tasks(conn)) == 0
assert len(kb.list_tasks(conn, include_archived=True)) == 1
# ---------------------------------------------------------------------------
# Comments / events / worker context
# ---------------------------------------------------------------------------
def test_comments_recorded_in_order(kanban_home):
with kb.connect() as conn:
t = kb.create_task(conn, title="x")
kb.add_comment(conn, t, "user", "first")
kb.add_comment(conn, t, "researcher", "second")
comments = kb.list_comments(conn, t)
assert [c.body for c in comments] == ["first", "second"]
assert [c.author for c in comments] == ["user", "researcher"]
def test_empty_comment_rejected(kanban_home):
with kb.connect() as conn:
t = kb.create_task(conn, title="x")
with pytest.raises(ValueError, match="body is required"):
kb.add_comment(conn, t, "user", "")
def test_events_capture_lifecycle(kanban_home):
with kb.connect() as conn:
t = kb.create_task(conn, title="x", assignee="a")
kb.claim_task(conn, t)
kb.complete_task(conn, t, result="ok")
events = kb.list_events(conn, t)
kinds = [e.kind for e in events]
assert "created" in kinds
assert "claimed" in kinds
assert "completed" in kinds
def test_worker_context_includes_parent_results_and_comments(kanban_home):
with kb.connect() as conn:
p = kb.create_task(conn, title="p")
kb.complete_task(conn, p, result="PARENT_RESULT_MARKER")
c = kb.create_task(conn, title="child", parents=[p])
kb.add_comment(conn, c, "user", "CLARIFICATION_MARKER")
ctx = kb.build_worker_context(conn, c)
assert "PARENT_RESULT_MARKER" in ctx
assert "CLARIFICATION_MARKER" in ctx
assert c in ctx
assert "child" in ctx
# ---------------------------------------------------------------------------
# Dispatcher
# ---------------------------------------------------------------------------
def test_dispatch_dry_run_does_not_claim(kanban_home):
with kb.connect() as conn:
t1 = kb.create_task(conn, title="a", assignee="alice")
t2 = kb.create_task(conn, title="b", assignee="bob")
res = kb.dispatch_once(conn, dry_run=True)
assert {s[0] for s in res.spawned} == {t1, t2}
with kb.connect() as conn:
# Dry run must NOT mutate status.
assert kb.get_task(conn, t1).status == "ready"
assert kb.get_task(conn, t2).status == "ready"
def test_dispatch_skips_unassigned(kanban_home):
with kb.connect() as conn:
t = kb.create_task(conn, title="floater")
res = kb.dispatch_once(conn, dry_run=True)
assert t in res.skipped_unassigned
assert not res.spawned
def test_dispatch_promotes_ready_and_spawns(kanban_home):
spawns = []
def fake_spawn(task, workspace):
spawns.append((task.id, task.assignee, workspace))
with kb.connect() as conn:
p = kb.create_task(conn, title="p", assignee="alice")
c = kb.create_task(conn, title="c", assignee="bob", parents=[p])
# Finish parent outside dispatch; promotion happens inside.
kb.complete_task(conn, p)
res = kb.dispatch_once(conn, spawn_fn=fake_spawn)
# Spawned c (a was already done when dispatch was called).
assert len(spawns) == 1
assert spawns[0][0] == c
assert spawns[0][1] == "bob"
# c is now running
with kb.connect() as conn:
assert kb.get_task(conn, c).status == "running"
def test_dispatch_spawn_failure_releases_claim(kanban_home):
def boom(task, workspace):
raise RuntimeError("spawn failed")
with kb.connect() as conn:
t = kb.create_task(conn, title="boom", assignee="alice")
kb.dispatch_once(conn, spawn_fn=boom)
# Must return to ready so the next tick can retry.
assert kb.get_task(conn, t).status == "ready"
assert kb.get_task(conn, t).claim_lock is None
def test_dispatch_reclaims_stale_before_spawning(kanban_home):
with kb.connect() as conn:
t = kb.create_task(conn, title="x", assignee="alice")
kb.claim_task(conn, t)
conn.execute(
"UPDATE tasks SET claim_expires = ? WHERE id = ?",
(int(time.time()) - 1, t),
)
res = kb.dispatch_once(conn, dry_run=True)
assert res.reclaimed == 1
# ---------------------------------------------------------------------------
# Workspace resolution
# ---------------------------------------------------------------------------
def test_scratch_workspace_created_under_hermes_home(kanban_home):
with kb.connect() as conn:
t = kb.create_task(conn, title="x")
task = kb.get_task(conn, t)
ws = kb.resolve_workspace(task)
assert ws.exists()
assert ws.is_dir()
assert "kanban" in str(ws)
def test_dir_workspace_honors_given_path(kanban_home, tmp_path):
target = tmp_path / "my-vault"
with kb.connect() as conn:
t = kb.create_task(
conn, title="biz", workspace_kind="dir", workspace_path=str(target)
)
task = kb.get_task(conn, t)
ws = kb.resolve_workspace(task)
assert ws == target
assert ws.exists()
def test_worktree_workspace_returns_intended_path(kanban_home, tmp_path):
target = str(tmp_path / ".worktrees" / "my-task")
with kb.connect() as conn:
t = kb.create_task(
conn, title="ship", workspace_kind="worktree", workspace_path=target
)
task = kb.get_task(conn, t)
ws = kb.resolve_workspace(task)
# We do NOT auto-create worktrees; the worker's skill handles that.
assert str(ws) == target
# ---------------------------------------------------------------------------
# Tenancy
# ---------------------------------------------------------------------------
def test_tenant_column_filters_listings(kanban_home):
with kb.connect() as conn:
kb.create_task(conn, title="a1", tenant="biz-a")
kb.create_task(conn, title="b1", tenant="biz-b")
kb.create_task(conn, title="shared") # no tenant
biz_a = kb.list_tasks(conn, tenant="biz-a")
biz_b = kb.list_tasks(conn, tenant="biz-b")
assert [t.title for t in biz_a] == ["a1"]
assert [t.title for t in biz_b] == ["b1"]
def test_tenant_propagates_to_events(kanban_home):
with kb.connect() as conn:
t = kb.create_task(conn, title="tenant-task", tenant="biz-a")
events = kb.list_events(conn, t)
# The "created" event should have tenant in its payload.
created = [e for e in events if e.kind == "created"]
assert created and created[0].payload.get("tenant") == "biz-a"
+17
View File
@@ -149,6 +149,23 @@ class TestCreateProfile:
assert (profile_dir / ".env").read_text() == "KEY=val"
assert (profile_dir / "SOUL.md").read_text() == "Be helpful."
def test_clone_config_copies_source_skills(self, profile_env):
tmp_path = profile_env
default_home = tmp_path / ".hermes"
skill_dir = default_home / "skills" / "custom" / "installed-skill"
skill_dir.mkdir(parents=True)
(skill_dir / "SKILL.md").write_text("---\nname: installed-skill\n---\n")
profile_dir = create_profile("coder", clone_config=True, no_alias=True)
assert (
profile_dir
/ "skills"
/ "custom"
/ "installed-skill"
/ "SKILL.md"
).read_text() == "---\nname: installed-skill\n---\n"
def test_clone_all_copies_entire_tree(self, profile_env):
tmp_path = profile_env
default_home = tmp_path / ".hermes"
+216
View File
@@ -591,6 +591,222 @@ class TestNewEndpoints:
resp = self.client.get("/api/cron/jobs/nonexistent-id")
assert resp.status_code == 404
# --- Profiles ---
def test_profiles_list_includes_default(self):
from hermes_constants import get_hermes_home
get_hermes_home().mkdir(parents=True, exist_ok=True)
resp = self.client.get("/api/profiles")
assert resp.status_code == 200
names = [p["name"] for p in resp.json()["profiles"]]
assert "default" in names
def test_profiles_list_falls_back_when_profile_listing_fails(self, monkeypatch):
from hermes_constants import get_hermes_home
import hermes_cli.profiles as profiles_mod
hermes_home = get_hermes_home()
hermes_home.mkdir(parents=True, exist_ok=True)
(hermes_home / "config.yaml").write_text(
"model:\n provider: openrouter\n name: anthropic/claude-sonnet-4.6\n",
encoding="utf-8",
)
named = hermes_home / "profiles" / "multi-agent"
named.mkdir(parents=True)
(named / ".env").write_text("EXAMPLE=1\n", encoding="utf-8")
(named / "skills" / "demo").mkdir(parents=True)
(named / "skills" / "demo" / "SKILL.md").write_text("---\nname: demo\n---\n", encoding="utf-8")
monkeypatch.setattr(
profiles_mod,
"list_profiles",
lambda: (_ for _ in ()).throw(RuntimeError("boom")),
)
resp = self.client.get("/api/profiles")
assert resp.status_code == 200
profiles = {p["name"]: p for p in resp.json()["profiles"]}
assert profiles["default"]["is_default"] is True
assert profiles["default"]["provider"] == "openrouter"
assert profiles["multi-agent"]["has_env"] is True
assert profiles["multi-agent"]["skill_count"] == 1
def test_profiles_create_rename_delete_round_trip(self, monkeypatch):
# Stub gateway service teardown so the test doesn't shell out to
# launchctl/systemctl on the host.
import hermes_cli.profiles as profiles_mod
monkeypatch.setattr(profiles_mod, "_cleanup_gateway_service", lambda *a, **kw: None)
created = self.client.post("/api/profiles", json={"name": "test-prof"})
assert created.status_code == 200
renamed = self.client.patch(
"/api/profiles/test-prof",
json={"new_name": "test-prof-2"},
)
assert renamed.status_code == 200
names = [p["name"] for p in self.client.get("/api/profiles").json()["profiles"]]
assert "test-prof" not in names
assert "test-prof-2" in names
deleted = self.client.delete("/api/profiles/test-prof-2")
assert deleted.status_code == 200
names = [p["name"] for p in self.client.get("/api/profiles").json()["profiles"]]
assert "test-prof-2" not in names
def test_profile_setup_command_uses_named_profile_wrapper(self):
from hermes_constants import get_hermes_home
(get_hermes_home() / "profiles" / "coder").mkdir(parents=True)
resp = self.client.get("/api/profiles/coder/setup-command")
assert resp.status_code == 200
assert resp.json()["command"] == "coder setup"
def test_profile_setup_command_uses_hermes_for_default_profile(self):
from hermes_constants import get_hermes_home
get_hermes_home().mkdir(parents=True, exist_ok=True)
resp = self.client.get("/api/profiles/default/setup-command")
assert resp.status_code == 200
assert resp.json()["command"] == "hermes setup"
def test_profiles_create_creates_wrapper_alias_when_safe(self, monkeypatch, tmp_path):
import hermes_cli.profiles as profiles_mod
wrapper_dir = tmp_path / "bin"
wrapper_dir.mkdir()
monkeypatch.setattr(profiles_mod, "_get_wrapper_dir", lambda: wrapper_dir)
resp = self.client.post(
"/api/profiles",
json={"name": "writer", "clone_from_default": False},
)
assert resp.status_code == 200
wrapper_path = wrapper_dir / "writer"
assert wrapper_path.exists()
assert wrapper_path.read_text() == '#!/bin/sh\nexec hermes -p writer "$@"\n'
def test_profiles_create_with_clone_from_default_copies_default_skills(self, monkeypatch):
from hermes_constants import get_hermes_home
import hermes_cli.profiles as profiles_mod
monkeypatch.setattr(profiles_mod, "create_wrapper_script", lambda name: None)
default_skill = get_hermes_home() / "skills" / "custom" / "new-skill"
default_skill.mkdir(parents=True)
(default_skill / "SKILL.md").write_text("---\nname: new-skill\n---\n", encoding="utf-8")
resp = self.client.post(
"/api/profiles",
json={"name": "cloned", "clone_from_default": True},
)
assert resp.status_code == 200
cloned_skill = get_hermes_home() / "profiles" / "cloned" / "skills" / "custom" / "new-skill" / "SKILL.md"
assert cloned_skill.exists()
profiles = {p["name"]: p for p in self.client.get("/api/profiles").json()["profiles"]}
assert profiles["cloned"]["skill_count"] == 1
def test_profiles_create_without_clone_seeds_bundled_skills(self, monkeypatch):
from hermes_constants import get_hermes_home
import hermes_cli.profiles as profiles_mod
monkeypatch.setattr(profiles_mod, "create_wrapper_script", lambda name: None)
def fake_seed(profile_dir, quiet=False):
skill_dir = profile_dir / "skills" / "software-development" / "plan"
skill_dir.mkdir(parents=True)
(skill_dir / "SKILL.md").write_text("---\nname: plan\n---\n", encoding="utf-8")
return {"copied": ["plan"]}
monkeypatch.setattr(profiles_mod, "seed_profile_skills", fake_seed)
resp = self.client.post(
"/api/profiles",
json={"name": "fresh", "clone_from_default": False},
)
assert resp.status_code == 200
seeded_skill = get_hermes_home() / "profiles" / "fresh" / "skills" / "software-development" / "plan" / "SKILL.md"
assert seeded_skill.exists()
profiles = {p["name"]: p for p in self.client.get("/api/profiles").json()["profiles"]}
assert profiles["fresh"]["skill_count"] == 1
def test_profile_open_terminal_uses_macos_terminal(self, monkeypatch):
from hermes_constants import get_hermes_home
import hermes_cli.web_server as web_server
(get_hermes_home() / "profiles" / "coder").mkdir(parents=True)
calls = []
monkeypatch.setattr(web_server.sys, "platform", "darwin")
monkeypatch.setattr(web_server.subprocess, "Popen", lambda args, **kwargs: calls.append(args))
resp = self.client.post("/api/profiles/coder/open-terminal")
assert resp.status_code == 200
assert calls
assert calls[0][0] == "osascript"
assert "coder setup" in " ".join(calls[0])
def test_profile_open_terminal_uses_windows_cmd(self, monkeypatch):
from hermes_constants import get_hermes_home
import hermes_cli.web_server as web_server
(get_hermes_home() / "profiles" / "coder").mkdir(parents=True)
calls = []
monkeypatch.setattr(web_server.sys, "platform", "win32")
monkeypatch.setattr(web_server.subprocess, "Popen", lambda args, **kwargs: calls.append(args))
resp = self.client.post("/api/profiles/coder/open-terminal")
assert resp.status_code == 200
assert calls
assert calls[0][:4] == ["cmd.exe", "/c", "start", ""]
assert calls[0][-1] == "coder setup"
def test_profiles_create_rejects_invalid_name(self):
resp = self.client.post("/api/profiles", json={"name": "Has Spaces"})
assert resp.status_code == 400
def test_profiles_delete_default_forbidden(self):
resp = self.client.delete("/api/profiles/default")
assert resp.status_code == 400
def test_profiles_delete_not_found(self):
resp = self.client.delete("/api/profiles/does-not-exist")
assert resp.status_code == 404
def test_profile_soul_round_trip(self, monkeypatch):
import hermes_cli.profiles as profiles_mod
monkeypatch.setattr(profiles_mod, "_cleanup_gateway_service", lambda *a, **kw: None)
self.client.post("/api/profiles", json={"name": "soul-prof"})
get1 = self.client.get("/api/profiles/soul-prof/soul")
assert get1.status_code == 200
assert get1.json()["exists"] is True
put = self.client.put(
"/api/profiles/soul-prof/soul",
json={"content": "# Edited soul"},
)
assert put.status_code == 200
got = self.client.get("/api/profiles/soul-prof/soul").json()
assert got["content"] == "# Edited soul"
self.client.delete("/api/profiles/soul-prof")
def test_profile_soul_unknown_profile_404(self):
resp = self.client.get("/api/profiles/nonexistent/soul")
assert resp.status_code == 404
def test_skills_list(self):
resp = self.client.get("/api/skills")
assert resp.status_code == 200
@@ -0,0 +1,889 @@
"""Tests for the Kanban dashboard plugin backend (plugins/kanban/dashboard/plugin_api.py).
The plugin mounts as /api/plugins/kanban/ inside the dashboard's FastAPI app,
but here we attach its router to a bare FastAPI instance so we can test the
REST surface without spinning up the whole dashboard.
"""
from __future__ import annotations
import importlib.util
import os
import sys
import time
from pathlib import Path
import pytest
from fastapi import FastAPI
from fastapi.testclient import TestClient
from hermes_cli import kanban_db as kb
# ---------------------------------------------------------------------------
# Fixtures
# ---------------------------------------------------------------------------
def _load_plugin_router():
"""Dynamically load plugins/kanban/dashboard/plugin_api.py and return its router."""
repo_root = Path(__file__).resolve().parents[2]
plugin_file = repo_root / "plugins" / "kanban" / "dashboard" / "plugin_api.py"
assert plugin_file.exists(), f"plugin file missing: {plugin_file}"
spec = importlib.util.spec_from_file_location(
"hermes_dashboard_plugin_kanban_test", plugin_file,
)
assert spec is not None and spec.loader is not None
mod = importlib.util.module_from_spec(spec)
sys.modules[spec.name] = mod
spec.loader.exec_module(mod)
return mod.router
@pytest.fixture
def kanban_home(tmp_path, monkeypatch):
"""Isolated HERMES_HOME with an empty kanban DB."""
home = tmp_path / ".hermes"
home.mkdir()
monkeypatch.setenv("HERMES_HOME", str(home))
monkeypatch.setattr(Path, "home", lambda: tmp_path)
kb.init_db()
return home
@pytest.fixture
def client(kanban_home):
app = FastAPI()
app.include_router(_load_plugin_router(), prefix="/api/plugins/kanban")
return TestClient(app)
# ---------------------------------------------------------------------------
# GET /board on an empty DB
# ---------------------------------------------------------------------------
def test_board_empty(client):
r = client.get("/api/plugins/kanban/board")
assert r.status_code == 200
data = r.json()
# All canonical columns present (triage + the rest), each empty.
names = [c["name"] for c in data["columns"]]
for expected in ("triage", "todo", "ready", "running", "blocked", "done"):
assert expected in names, f"missing column {expected}: {names}"
assert all(len(c["tasks"]) == 0 for c in data["columns"])
assert data["tenants"] == []
assert data["assignees"] == []
assert data["latest_event_id"] == 0
# ---------------------------------------------------------------------------
# POST /tasks then GET /board sees it
# ---------------------------------------------------------------------------
def test_create_task_appears_on_board(client):
r = client.post(
"/api/plugins/kanban/tasks",
json={
"title": "Research LLM caching",
"assignee": "researcher",
"priority": 3,
"tenant": "acme",
},
)
assert r.status_code == 200, r.text
task = r.json()["task"]
assert task["title"] == "Research LLM caching"
assert task["assignee"] == "researcher"
assert task["status"] == "ready" # no parents -> immediately ready
assert task["priority"] == 3
assert task["tenant"] == "acme"
task_id = task["id"]
# Board now lists it under 'ready'.
r = client.get("/api/plugins/kanban/board")
assert r.status_code == 200
data = r.json()
ready = next(c for c in data["columns"] if c["name"] == "ready")
assert len(ready["tasks"]) == 1
assert ready["tasks"][0]["id"] == task_id
assert "acme" in data["tenants"]
assert "researcher" in data["assignees"]
def test_tenant_filter(client):
client.post("/api/plugins/kanban/tasks", json={"title": "A", "tenant": "t1"})
client.post("/api/plugins/kanban/tasks", json={"title": "B", "tenant": "t2"})
r = client.get("/api/plugins/kanban/board?tenant=t1")
counts = {c["name"]: len(c["tasks"]) for c in r.json()["columns"]}
total = sum(counts.values())
assert total == 1
r = client.get("/api/plugins/kanban/board?tenant=t2")
total = sum(len(c["tasks"]) for c in r.json()["columns"])
assert total == 1
# ---------------------------------------------------------------------------
# GET /tasks/:id returns body + comments + events + links
# ---------------------------------------------------------------------------
def test_task_detail_includes_links_and_events(client):
parent = client.post(
"/api/plugins/kanban/tasks", json={"title": "parent"},
).json()["task"]
child = client.post(
"/api/plugins/kanban/tasks",
json={"title": "child", "parents": [parent["id"]]},
).json()["task"]
assert child["status"] == "todo" # parent not done yet
# Detail for the child shows the parent link.
r = client.get(f"/api/plugins/kanban/tasks/{child['id']}")
assert r.status_code == 200
data = r.json()
assert data["task"]["id"] == child["id"]
assert parent["id"] in data["links"]["parents"]
# Detail for the parent shows the child.
r = client.get(f"/api/plugins/kanban/tasks/{parent['id']}")
assert child["id"] in r.json()["links"]["children"]
# Events exist from creation.
assert len(data["events"]) >= 1
def test_task_detail_404_on_unknown(client):
r = client.get("/api/plugins/kanban/tasks/does-not-exist")
assert r.status_code == 404
# ---------------------------------------------------------------------------
# PATCH /tasks/:id — status transitions
# ---------------------------------------------------------------------------
def test_patch_status_complete(client):
t = client.post("/api/plugins/kanban/tasks", json={"title": "x"}).json()["task"]
r = client.patch(
f"/api/plugins/kanban/tasks/{t['id']}",
json={"status": "done", "result": "shipped"},
)
assert r.status_code == 200
assert r.json()["task"]["status"] == "done"
# Board reflects the move.
done = next(
c for c in client.get("/api/plugins/kanban/board").json()["columns"]
if c["name"] == "done"
)
assert any(x["id"] == t["id"] for x in done["tasks"])
def test_patch_block_then_unblock(client):
t = client.post("/api/plugins/kanban/tasks", json={"title": "x"}).json()["task"]
r = client.patch(
f"/api/plugins/kanban/tasks/{t['id']}",
json={"status": "blocked", "block_reason": "need input"},
)
assert r.status_code == 200
assert r.json()["task"]["status"] == "blocked"
r = client.patch(
f"/api/plugins/kanban/tasks/{t['id']}",
json={"status": "ready"},
)
assert r.status_code == 200
assert r.json()["task"]["status"] == "ready"
def test_patch_drag_drop_move_todo_to_ready(client):
"""Direct status write: the drag-drop path for statuses without a
dedicated verb (e.g. manually promoting todo -> ready)."""
parent = client.post("/api/plugins/kanban/tasks", json={"title": "p"}).json()["task"]
child = client.post(
"/api/plugins/kanban/tasks",
json={"title": "c", "parents": [parent["id"]]},
).json()["task"]
assert child["status"] == "todo"
r = client.patch(
f"/api/plugins/kanban/tasks/{child['id']}",
json={"status": "ready"},
)
assert r.status_code == 200
assert r.json()["task"]["status"] == "ready"
def test_patch_reassign(client):
t = client.post(
"/api/plugins/kanban/tasks",
json={"title": "x", "assignee": "a"},
).json()["task"]
r = client.patch(
f"/api/plugins/kanban/tasks/{t['id']}",
json={"assignee": "b"},
)
assert r.status_code == 200
assert r.json()["task"]["assignee"] == "b"
def test_patch_priority_and_edit(client):
t = client.post("/api/plugins/kanban/tasks", json={"title": "x"}).json()["task"]
r = client.patch(
f"/api/plugins/kanban/tasks/{t['id']}",
json={"priority": 5, "title": "renamed"},
)
assert r.status_code == 200
data = r.json()["task"]
assert data["priority"] == 5
assert data["title"] == "renamed"
def test_patch_invalid_status(client):
t = client.post("/api/plugins/kanban/tasks", json={"title": "x"}).json()["task"]
r = client.patch(
f"/api/plugins/kanban/tasks/{t['id']}",
json={"status": "banana"},
)
assert r.status_code == 400
# ---------------------------------------------------------------------------
# Comments + Links
# ---------------------------------------------------------------------------
def test_add_comment(client):
t = client.post("/api/plugins/kanban/tasks", json={"title": "x"}).json()["task"]
r = client.post(
f"/api/plugins/kanban/tasks/{t['id']}/comments",
json={"body": "how's progress?", "author": "teknium"},
)
assert r.status_code == 200
r = client.get(f"/api/plugins/kanban/tasks/{t['id']}")
comments = r.json()["comments"]
assert len(comments) == 1
assert comments[0]["body"] == "how's progress?"
assert comments[0]["author"] == "teknium"
def test_add_comment_empty_rejected(client):
t = client.post("/api/plugins/kanban/tasks", json={"title": "x"}).json()["task"]
r = client.post(
f"/api/plugins/kanban/tasks/{t['id']}/comments",
json={"body": " "},
)
assert r.status_code == 400
def test_add_link_and_delete_link(client):
a = client.post("/api/plugins/kanban/tasks", json={"title": "a"}).json()["task"]
b = client.post("/api/plugins/kanban/tasks", json={"title": "b"}).json()["task"]
r = client.post(
"/api/plugins/kanban/links",
json={"parent_id": a["id"], "child_id": b["id"]},
)
assert r.status_code == 200
r = client.get(f"/api/plugins/kanban/tasks/{b['id']}")
assert a["id"] in r.json()["links"]["parents"]
r = client.delete(
"/api/plugins/kanban/links",
params={"parent_id": a["id"], "child_id": b["id"]},
)
assert r.status_code == 200
assert r.json()["ok"] is True
def test_add_link_cycle_rejected(client):
a = client.post("/api/plugins/kanban/tasks", json={"title": "a"}).json()["task"]
b = client.post("/api/plugins/kanban/tasks", json={"title": "b"}).json()["task"]
client.post(
"/api/plugins/kanban/links",
json={"parent_id": a["id"], "child_id": b["id"]},
)
r = client.post(
"/api/plugins/kanban/links",
json={"parent_id": b["id"], "child_id": a["id"]},
)
assert r.status_code == 400
# ---------------------------------------------------------------------------
# Dispatch nudge
# ---------------------------------------------------------------------------
def test_dispatch_dry_run(client):
client.post(
"/api/plugins/kanban/tasks",
json={"title": "work", "assignee": "researcher"},
)
r = client.post("/api/plugins/kanban/dispatch?dry_run=true&max=4")
assert r.status_code == 200
body = r.json()
# DispatchResult is serialized as a dataclass dict.
assert isinstance(body, dict)
# ---------------------------------------------------------------------------
# Triage column (new v1 status)
# ---------------------------------------------------------------------------
def test_create_triage_lands_in_triage_column(client):
r = client.post(
"/api/plugins/kanban/tasks",
json={"title": "rough idea, spec me", "triage": True},
)
assert r.status_code == 200
task = r.json()["task"]
assert task["status"] == "triage"
r = client.get("/api/plugins/kanban/board")
triage = next(c for c in r.json()["columns"] if c["name"] == "triage")
assert len(triage["tasks"]) == 1
assert triage["tasks"][0]["title"] == "rough idea, spec me"
def test_triage_task_not_promoted_to_ready(client):
"""Triage tasks must stay in triage even when they have no parents."""
client.post(
"/api/plugins/kanban/tasks",
json={"title": "must stay put", "triage": True},
)
# Run the dispatcher — it should NOT promote the triage task.
client.post("/api/plugins/kanban/dispatch?dry_run=false&max=4")
r = client.get("/api/plugins/kanban/board")
triage = next(c for c in r.json()["columns"] if c["name"] == "triage")
ready = next(c for c in r.json()["columns"] if c["name"] == "ready")
assert len(triage["tasks"]) == 1
assert len(ready["tasks"]) == 0
def test_patch_status_triage_works(client):
"""A user (or specifier) can push a task back into triage, and out of it."""
t = client.post(
"/api/plugins/kanban/tasks", json={"title": "x"},
).json()["task"]
# Normal creation is 'ready'; push to triage.
r = client.patch(
f"/api/plugins/kanban/tasks/{t['id']}", json={"status": "triage"},
)
assert r.status_code == 200
assert r.json()["task"]["status"] == "triage"
# Now promote to todo.
r = client.patch(
f"/api/plugins/kanban/tasks/{t['id']}", json={"status": "todo"},
)
assert r.status_code == 200
assert r.json()["task"]["status"] == "todo"
# ---------------------------------------------------------------------------
# Progress rollup (done children / total children)
# ---------------------------------------------------------------------------
def test_board_progress_rollup(client):
parent = client.post(
"/api/plugins/kanban/tasks", json={"title": "parent"},
).json()["task"]
child_a = client.post(
"/api/plugins/kanban/tasks",
json={"title": "a", "parents": [parent["id"]]},
).json()["task"]
child_b = client.post(
"/api/plugins/kanban/tasks",
json={"title": "b", "parents": [parent["id"]]},
).json()["task"]
# Children start as "todo" because the parent isn't done yet; promote
# them to "ready" so complete_task will accept the transition.
for cid in (child_a["id"], child_b["id"]):
r = client.patch(
f"/api/plugins/kanban/tasks/{cid}", json={"status": "ready"},
)
assert r.status_code == 200
# 0/2 done.
r = client.get("/api/plugins/kanban/board")
parent_row = next(
t for col in r.json()["columns"] for t in col["tasks"]
if t["id"] == parent["id"]
)
assert parent_row["progress"] == {"done": 0, "total": 2}
# Complete one child. 1/2.
r = client.patch(
f"/api/plugins/kanban/tasks/{child_a['id']}",
json={"status": "done"},
)
assert r.status_code == 200
r = client.get("/api/plugins/kanban/board")
parent_row = next(
t for col in r.json()["columns"] for t in col["tasks"]
if t["id"] == parent["id"]
)
assert parent_row["progress"] == {"done": 1, "total": 2}
# Childless tasks report progress=None, not {0/0}.
assert next(
t for col in r.json()["columns"] for t in col["tasks"]
if t["id"] == child_b["id"]
)["progress"] is None
# ---------------------------------------------------------------------------
# Auto-init on first board read
# ---------------------------------------------------------------------------
def test_board_auto_initializes_missing_db(tmp_path, monkeypatch):
"""If kanban.db doesn't exist yet, GET /board must create it, not 500."""
home = tmp_path / ".hermes"
home.mkdir()
monkeypatch.setenv("HERMES_HOME", str(home))
monkeypatch.setattr(Path, "home", lambda: tmp_path)
# Deliberately DO NOT call kb.init_db().
app = FastAPI()
app.include_router(_load_plugin_router(), prefix="/api/plugins/kanban")
c = TestClient(app)
r = c.get("/api/plugins/kanban/board")
assert r.status_code == 200
assert (home / "kanban.db").exists(), "init_db wasn't invoked by /board"
# ---------------------------------------------------------------------------
# WebSocket auth (query-param token)
# ---------------------------------------------------------------------------
def test_ws_events_rejects_when_token_required(tmp_path, monkeypatch):
"""When _SESSION_TOKEN is set (normal dashboard context), a missing or
wrong ?token= query param must be rejected with policy-violation."""
home = tmp_path / ".hermes"
home.mkdir()
monkeypatch.setenv("HERMES_HOME", str(home))
monkeypatch.setattr(Path, "home", lambda: tmp_path)
kb.init_db()
# Stub web_server so _check_ws_token has a token to compare against.
import types
stub = types.SimpleNamespace(_SESSION_TOKEN="secret-xyz")
monkeypatch.setitem(sys.modules, "hermes_cli.web_server", stub)
app = FastAPI()
app.include_router(_load_plugin_router(), prefix="/api/plugins/kanban")
c = TestClient(app)
# No token → policy violation close.
from starlette.websockets import WebSocketDisconnect
with pytest.raises(WebSocketDisconnect) as exc:
with c.websocket_connect("/api/plugins/kanban/events"):
pass
assert exc.value.code == 1008
# Wrong token → policy violation close.
with pytest.raises(WebSocketDisconnect) as exc:
with c.websocket_connect("/api/plugins/kanban/events?token=nope"):
pass
assert exc.value.code == 1008
# Correct token → accepted (connect then close cleanly from our side).
with c.websocket_connect(
"/api/plugins/kanban/events?token=secret-xyz"
) as ws:
assert ws is not None # handshake succeeded
# ---------------------------------------------------------------------------
# Bulk actions
# ---------------------------------------------------------------------------
def test_bulk_status_ready(client):
a = client.post("/api/plugins/kanban/tasks", json={"title": "a"}).json()["task"]
b = client.post("/api/plugins/kanban/tasks", json={"title": "b"}).json()["task"]
c2 = client.post("/api/plugins/kanban/tasks", json={"title": "c"}).json()["task"]
# Parent-less tasks land in "ready" already; push them to blocked first.
for tid in (a["id"], b["id"], c2["id"]):
client.patch(f"/api/plugins/kanban/tasks/{tid}",
json={"status": "blocked", "block_reason": "wait"})
r = client.post("/api/plugins/kanban/tasks/bulk",
json={"ids": [a["id"], b["id"], c2["id"]], "status": "ready"})
assert r.status_code == 200
results = r.json()["results"]
assert all(r["ok"] for r in results)
# All three are now ready.
board = client.get("/api/plugins/kanban/board").json()
ready = next(col for col in board["columns"] if col["name"] == "ready")
ids = {t["id"] for t in ready["tasks"]}
assert {a["id"], b["id"], c2["id"]}.issubset(ids)
def test_bulk_archive(client):
a = client.post("/api/plugins/kanban/tasks", json={"title": "a"}).json()["task"]
b = client.post("/api/plugins/kanban/tasks", json={"title": "b"}).json()["task"]
r = client.post("/api/plugins/kanban/tasks/bulk",
json={"ids": [a["id"], b["id"]], "archive": True})
assert r.status_code == 200
assert all(r["ok"] for r in r.json()["results"])
# Default board (archived hidden) — both gone.
board = client.get("/api/plugins/kanban/board").json()
ids = {t["id"] for col in board["columns"] for t in col["tasks"]}
assert a["id"] not in ids
assert b["id"] not in ids
def test_bulk_reassign(client):
a = client.post("/api/plugins/kanban/tasks",
json={"title": "a", "assignee": "old"}).json()["task"]
b = client.post("/api/plugins/kanban/tasks",
json={"title": "b", "assignee": "old"}).json()["task"]
r = client.post("/api/plugins/kanban/tasks/bulk",
json={"ids": [a["id"], b["id"]], "assignee": "new"})
assert r.status_code == 200
for tid in (a["id"], b["id"]):
t = client.get(f"/api/plugins/kanban/tasks/{tid}").json()["task"]
assert t["assignee"] == "new"
def test_bulk_unassign_via_empty_string(client):
a = client.post("/api/plugins/kanban/tasks",
json={"title": "a", "assignee": "x"}).json()["task"]
r = client.post("/api/plugins/kanban/tasks/bulk",
json={"ids": [a["id"]], "assignee": ""})
assert r.status_code == 200
t = client.get(f"/api/plugins/kanban/tasks/{a['id']}").json()["task"]
assert t["assignee"] is None
def test_bulk_partial_failure_doesnt_abort_siblings(client):
"""One bad id in the middle of a batch must not prevent others from
applying."""
a = client.post("/api/plugins/kanban/tasks", json={"title": "a"}).json()["task"]
c2 = client.post("/api/plugins/kanban/tasks", json={"title": "c"}).json()["task"]
r = client.post("/api/plugins/kanban/tasks/bulk",
json={"ids": [a["id"], "bogus-id", c2["id"]], "priority": 7})
assert r.status_code == 200
results = r.json()["results"]
assert len(results) == 3
ok_ids = {r["id"] for r in results if r["ok"]}
assert a["id"] in ok_ids
assert c2["id"] in ok_ids
assert any(not r["ok"] and r["id"] == "bogus-id" for r in results)
# Good siblings actually got the priority bump.
for tid in (a["id"], c2["id"]):
t = client.get(f"/api/plugins/kanban/tasks/{tid}").json()["task"]
assert t["priority"] == 7
def test_bulk_empty_ids_400(client):
r = client.post("/api/plugins/kanban/tasks/bulk", json={"ids": []})
assert r.status_code == 400
# ---------------------------------------------------------------------------
# /config endpoint
# ---------------------------------------------------------------------------
def test_config_returns_defaults_when_section_missing(client):
r = client.get("/api/plugins/kanban/config")
assert r.status_code == 200
data = r.json()
# Defaults when dashboard.kanban is missing.
assert data["default_tenant"] == ""
assert data["lane_by_profile"] is True
assert data["include_archived_by_default"] is False
assert data["render_markdown"] is True
def test_config_reads_dashboard_kanban_section(tmp_path, monkeypatch, client):
home = Path(os.environ["HERMES_HOME"])
(home / "config.yaml").write_text(
"dashboard:\n"
" kanban:\n"
" default_tenant: acme\n"
" lane_by_profile: false\n"
" include_archived_by_default: true\n"
" render_markdown: false\n"
)
r = client.get("/api/plugins/kanban/config")
assert r.status_code == 200
data = r.json()
assert data["default_tenant"] == "acme"
assert data["lane_by_profile"] is False
assert data["include_archived_by_default"] is True
assert data["render_markdown"] is False
# ---------------------------------------------------------------------------
# Runs surfacing (vulcan-artivus RFC feedback)
# ---------------------------------------------------------------------------
def test_task_detail_includes_runs(client):
"""GET /tasks/:id carries a runs[] array with the attempt history."""
r = client.post("/api/plugins/kanban/tasks",
json={"title": "port x", "assignee": "worker"}).json()
tid = r["task"]["id"]
# Drive status running to force a run creation: PATCH to running
# doesn't call claim_task (the PATCH path uses _set_status_direct),
# so use the bulk/claim indirection via the kernel.
import hermes_cli.kanban_db as _kb
conn = _kb.connect()
try:
_kb.claim_task(conn, tid)
_kb.complete_task(
conn, tid,
result="done",
summary="tested on rate limiter",
metadata={"changed_files": ["limiter.py"]},
)
finally:
conn.close()
d = client.get(f"/api/plugins/kanban/tasks/{tid}").json()
assert "runs" in d
assert len(d["runs"]) == 1
run = d["runs"][0]
assert run["outcome"] == "completed"
assert run["profile"] == "worker"
assert run["summary"] == "tested on rate limiter"
assert run["metadata"] == {"changed_files": ["limiter.py"]}
assert run["ended_at"] is not None
def test_task_detail_runs_empty_before_claim(client):
"""A task that's never been claimed has an empty runs[] list, not
a missing key."""
r = client.post("/api/plugins/kanban/tasks", json={"title": "fresh"}).json()
d = client.get(f"/api/plugins/kanban/tasks/{r['task']['id']}").json()
assert d["runs"] == []
def test_patch_status_done_with_summary_and_metadata(client):
"""PATCH /tasks/:id with status=done + summary + metadata must
reach complete_task, so the dashboard has CLI parity."""
# Create + claim.
r = client.post("/api/plugins/kanban/tasks", json={"title": "x", "assignee": "worker"})
tid = r.json()["task"]["id"]
from hermes_cli import kanban_db as kb
conn = kb.connect()
try:
kb.claim_task(conn, tid)
finally:
conn.close()
r = client.patch(
f"/api/plugins/kanban/tasks/{tid}",
json={
"status": "done",
"summary": "shipped the thing",
"metadata": {"changed_files": ["a.py", "b.py"], "tests_run": 7},
},
)
assert r.status_code == 200, r.text
# The run must have the summary + metadata attached.
conn = kb.connect()
try:
run = kb.latest_run(conn, tid)
assert run.outcome == "completed"
assert run.summary == "shipped the thing"
assert run.metadata == {"changed_files": ["a.py", "b.py"], "tests_run": 7}
finally:
conn.close()
def test_patch_status_done_without_summary_still_works(client):
"""Back-compat: PATCH without the new fields still completes."""
r = client.post("/api/plugins/kanban/tasks", json={"title": "y", "assignee": "worker"})
tid = r.json()["task"]["id"]
from hermes_cli import kanban_db as kb
conn = kb.connect()
try:
kb.claim_task(conn, tid)
finally:
conn.close()
r = client.patch(
f"/api/plugins/kanban/tasks/{tid}",
json={"status": "done", "result": "legacy shape"},
)
assert r.status_code == 200, r.text
conn = kb.connect()
try:
run = kb.latest_run(conn, tid)
assert run.outcome == "completed"
assert run.summary == "legacy shape" # falls back to result
finally:
conn.close()
def test_patch_status_archive_closes_running_run(client):
"""PATCH to archived while running must close the in-flight run."""
r = client.post("/api/plugins/kanban/tasks", json={"title": "z", "assignee": "worker"})
tid = r.json()["task"]["id"]
from hermes_cli import kanban_db as kb
conn = kb.connect()
try:
kb.claim_task(conn, tid)
open_run = kb.latest_run(conn, tid)
assert open_run.ended_at is None
finally:
conn.close()
r = client.patch(
f"/api/plugins/kanban/tasks/{tid}",
json={"status": "archived"},
)
assert r.status_code == 200, r.text
conn = kb.connect()
try:
task = kb.get_task(conn, tid)
assert task.status == "archived"
assert task.current_run_id is None
assert kb.latest_run(conn, tid).outcome == "reclaimed"
finally:
conn.close()
def test_event_dict_includes_run_id(client):
"""GET /tasks/:id returns events with run_id populated."""
r = client.post("/api/plugins/kanban/tasks", json={"title": "e", "assignee": "worker"})
tid = r.json()["task"]["id"]
from hermes_cli import kanban_db as kb
conn = kb.connect()
try:
kb.claim_task(conn, tid)
run_id = kb.latest_run(conn, tid).id
kb.complete_task(conn, tid, summary="wss")
finally:
conn.close()
r = client.get(f"/api/plugins/kanban/tasks/{tid}")
assert r.status_code == 200
events = r.json()["events"]
# Every event in the response must have a run_id key (None or int).
for e in events:
assert "run_id" in e, f"missing run_id in event: {e}"
# completed event must have the actual run_id.
comp = [e for e in events if e["kind"] == "completed"]
assert comp[0]["run_id"] == run_id
# ---------------------------------------------------------------------------
# Per-task force-loaded skills via REST
# ---------------------------------------------------------------------------
def test_create_task_with_skills_roundtrips(client):
"""POST /tasks accepts `skills: [...]`, GET /tasks/:id returns it."""
r = client.post(
"/api/plugins/kanban/tasks",
json={
"title": "translate docs",
"assignee": "linguist",
"skills": ["translation", "github-code-review"],
},
)
assert r.status_code == 200, r.text
task = r.json()["task"]
assert task["skills"] == ["translation", "github-code-review"]
# Fetch via GET /tasks/:id as the drawer does.
got = client.get(f"/api/plugins/kanban/tasks/{task['id']}").json()
assert got["task"]["skills"] == ["translation", "github-code-review"]
def test_create_task_without_skills_defaults_to_empty_list(client):
"""_task_dict serializes Task.skills=None as [] so the drawer can
always .length check without guarding against null."""
r = client.post(
"/api/plugins/kanban/tasks",
json={"title": "no skills", "assignee": "x"},
)
assert r.status_code == 200, r.text
task = r.json()["task"]
# Task.skills is None in-memory; _task_dict serializes via
# dataclasses.asdict which keeps it None. The drawer's
# `t.skills && t.skills.length > 0` guard handles both null and [].
assert task.get("skills") in (None, [])
# ---------------------------------------------------------------------------
# Dispatcher-presence warning in POST /tasks response
# ---------------------------------------------------------------------------
def test_create_task_includes_warning_when_no_dispatcher(client, monkeypatch):
"""ready+assigned task + no gateway -> response has `warning` field
so the dashboard UI can surface a banner."""
# Force the dispatcher probe to report "not running".
monkeypatch.setattr(
"hermes_cli.kanban._check_dispatcher_presence",
lambda: (False, "No gateway is running — start `hermes gateway start`."),
)
r = client.post(
"/api/plugins/kanban/tasks",
json={"title": "warn-me", "assignee": "worker"},
)
assert r.status_code == 200
data = r.json()
assert data.get("warning")
assert "gateway" in data["warning"].lower()
def test_create_task_no_warning_when_dispatcher_up(client, monkeypatch):
"""Dispatcher running -> no `warning` field in the response."""
monkeypatch.setattr(
"hermes_cli.kanban._check_dispatcher_presence",
lambda: (True, ""),
)
r = client.post(
"/api/plugins/kanban/tasks",
json={"title": "silent", "assignee": "worker"},
)
assert r.status_code == 200
assert "warning" not in r.json() or not r.json()["warning"]
def test_create_task_no_warning_on_triage(client, monkeypatch):
"""Triage tasks never get the warning (they can't be dispatched
anyway until promoted)."""
monkeypatch.setattr(
"hermes_cli.kanban._check_dispatcher_presence",
lambda: (False, "oh no"),
)
r = client.post(
"/api/plugins/kanban/tasks",
json={"title": "triage-task", "assignee": "worker", "triage": True},
)
assert r.status_code == 200
assert "warning" not in r.json() or not r.json()["warning"]
def test_create_task_probe_error_does_not_break_create(client, monkeypatch):
"""Probe failure must never break task creation."""
def _raise():
raise RuntimeError("probe crashed")
monkeypatch.setattr(
"hermes_cli.kanban._check_dispatcher_presence", _raise,
)
r = client.post(
"/api/plugins/kanban/tasks",
json={"title": "resilient", "assignee": "worker"},
)
assert r.status_code == 200
assert r.json()["task"]["title"] == "resilient"
+63
View File
@@ -127,3 +127,66 @@ def test_background_review_installs_auto_deny_approval_callback(monkeypatch):
"Background review leaked its approval callback into the worker "
"thread's TLS slot; a recycled thread-id could reuse it."
)
def test_background_review_summary_is_attributed_to_self_improvement_loop(monkeypatch):
"""The CLI/gateway emission must identify the self-improvement loop.
Users who miss the line in their terminal have no way to tell that the
background review was what modified their skill/memory stores. The
summary prefix ``💾 Self-improvement review: `` makes the origin
explicit so both the CLI and gateway deliveries are unambiguous.
"""
import json
captured_prints: list = []
captured_bg_callback: list = []
class FakeReviewAgent:
def __init__(self, **kwargs):
# Simulate a review that successfully updated memory so
# _summarize_background_review_actions returns a real action.
self._session_messages = [
{
"role": "tool",
"tool_call_id": "call_bg",
"content": json.dumps(
{"success": True, "message": "Entry added", "target": "memory"}
),
}
]
def run_conversation(self, **kwargs):
pass
def shutdown_memory_provider(self):
pass
def close(self):
pass
monkeypatch.setattr(run_agent_module, "AIAgent", FakeReviewAgent)
monkeypatch.setattr(run_agent_module.threading, "Thread", ImmediateThread)
agent = _bare_agent()
agent._safe_print = lambda *a, **kw: captured_prints.append(" ".join(str(x) for x in a))
agent.background_review_callback = lambda msg: captured_bg_callback.append(msg)
AIAgent._spawn_background_review(
agent,
messages_snapshot=[{"role": "user", "content": "hi"}],
review_memory=True,
)
# Exactly one summary should have been emitted, and it must identify
# the self-improvement review explicitly.
assert len(captured_prints) == 1, captured_prints
printed = captured_prints[0]
assert "Self-improvement review" in printed, printed
assert "Memory updated" in printed, printed
# Gateway path gets the same prefix.
assert len(captured_bg_callback) == 1
assert captured_bg_callback[0].startswith("💾 Self-improvement review:"), (
captured_bg_callback[0]
)
@@ -0,0 +1,249 @@
"""Regression guard for PR #16660 (salvaged as PR #18027): ContextVar
propagation into concurrent tool worker threads.
Background
----------
Gateway adapters (Slack, Telegram, Discord, ...) set
``tools.approval._approval_session_key`` as a ContextVar before calling
``agent.run_conversation`` so that dangerous-command approval prompts route
back to the channel/session that initiated the tool call. When the agent
dispatches multiple tools in parallel, it uses
``concurrent.futures.ThreadPoolExecutor.submit(...)`` and ``submit`` runs
the callable in a *fresh* context, NOT the caller's context. Without an
explicit ``contextvars.copy_context().run(...)`` wrapper, worker threads
observe the ContextVar's default value, fall through to the
``os.environ`` legacy fallback (which the gateway overwrites at each
agent step), and route the approval card to *whichever session stepped
most recently* not the one that raised the prompt. Confirmed in the
wild on Slack with two concurrent channels: session A's `rm -rf`
approval card was delivered to session B.
The fix (4 LOC in ``run_agent.py``) snapshots the caller's context with
``copy_context()`` and submits ``ctx.run(_run_tool, )`` instead of
``_run_tool`` directly. Mirrors ``asyncio.to_thread`` semantics.
This suite follows the ``contextvar-run-in-executor-bridge`` skill's
two-test pattern: one end-to-end test proves the fix works at the
call-site level, one documents the Python contract that makes the fix
necessary. If anyone ever reverts the wrapper, the call-site test
fails while the contract test keeps passing a clear diagnostic
signal for *why* the call-site regressed.
"""
from __future__ import annotations
import concurrent.futures
import contextvars
import threading
def test_executor_submit_without_copy_context_does_not_propagate():
"""Documents the Python contract the fix relies on.
``concurrent.futures.ThreadPoolExecutor.submit(fn)`` runs ``fn`` in a
worker thread with a fresh, empty context. A ContextVar set by the
caller is invisible inside ``fn``. This is the exact trap that made
approval-session routing race in the gateway before #16660.
If this test ever fails i.e. submit() starts propagating
ContextVars by default the copy_context() wrapper in run_agent.py
becomes redundant but not harmful, and the call-site test below
should be updated accordingly.
"""
probe: contextvars.ContextVar[str] = contextvars.ContextVar(
"probe_default_propagation", default="unset"
)
def read_in_worker() -> str:
return probe.get()
probe.set("set-in-main")
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as ex:
observed = ex.submit(read_in_worker).result(timeout=5)
assert observed == "unset", (
"Unexpected: executor.submit propagated a ContextVar without "
"copy_context(). If Python's behavior changed, update "
"test_run_tool_worker_sees_parent_context below."
)
def test_executor_submit_with_copy_context_run_propagates():
"""Positive case: the explicit ``copy_context().run(...)`` wrapper the
PR adds makes parent-context ContextVar values visible in the worker.
"""
probe: contextvars.ContextVar[str] = contextvars.ContextVar(
"probe_explicit_propagation", default="unset"
)
def read_in_worker() -> str:
return probe.get()
probe.set("set-in-main")
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as ex:
ctx = contextvars.copy_context()
observed = ex.submit(ctx.run, read_in_worker).result(timeout=5)
assert observed == "set-in-main", (
f"copy_context().run(...) failed to propagate: got {observed!r}"
)
def test_run_tool_worker_sees_parent_approval_session_key():
"""End-to-end call-site guard.
Mirrors the exact shape of the fixed call site in
``run_agent.py::_execute_tool_calls_concurrent`` a
``ThreadPoolExecutor`` with ``executor.submit(ctx.run, fn, *args)``.
Sets the real ``tools.approval._approval_session_key`` ContextVar
in the caller and asserts the worker observes it via
``tools.approval.get_current_session_key()``.
If the PR's ``copy_context().run`` wrapper is reverted, this test
fails with ``Expected 'session-A' but worker saw 'default'``.
"""
from tools.approval import (
_approval_session_key,
get_current_session_key,
)
observed: dict = {}
barrier = threading.Event()
def worker_equivalent_to_run_tool() -> None:
# Mirror what real _run_tool does early: read the session key.
observed["session_key"] = get_current_session_key(default="FALLBACK")
barrier.set()
# Set the ContextVar the gateway would set before calling agent.run.
token = _approval_session_key.set("session-A")
try:
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as ex:
ctx = contextvars.copy_context()
fut = ex.submit(ctx.run, worker_equivalent_to_run_tool)
fut.result(timeout=5)
assert barrier.is_set(), "worker did not complete"
finally:
_approval_session_key.reset(token)
assert observed.get("session_key") == "session-A", (
f"Worker thread did not inherit _approval_session_key from caller. "
f"Expected 'session-A', got {observed.get('session_key')!r}. "
"This is the bug that PR #16660 fixed — approval prompts route to "
"the wrong session in concurrent gateway traffic. Check whether "
"the copy_context().run wrapper in _execute_tool_calls_concurrent "
"was removed."
)
def test_run_agent_concurrent_executor_wraps_submit_with_copy_context():
"""Source-level guard that the fix stays at the REAL call site.
The behavioral tests above exercise the pattern in isolation and
pass regardless of whether ``run_agent.py`` actually uses it.
This guard inspects ``_execute_tool_calls_concurrent`` directly and
asserts that ``executor.submit`` is called with ``ctx.run`` (or
``copy_context()`` appears within a few lines) so reverting the
wrapper in ``run_agent.py`` fails this test with a clear message.
"""
import ast
import inspect
import run_agent
src_path = inspect.getsourcefile(run_agent)
assert src_path is not None
tree = ast.parse(open(src_path, encoding="utf-8").read())
submit_calls_in_agent: list[ast.Call] = []
for node in ast.walk(tree):
if not isinstance(node, ast.Call):
continue
func = node.func
# Match executor.submit(...) style calls.
if isinstance(func, ast.Attribute) and func.attr == "submit":
submit_calls_in_agent.append(node)
# Filter to the submit call inside the concurrent tool executor —
# identifiable by passing `_run_tool` as its target. Other submit()
# call sites in run_agent.py (e.g. auxiliary client warm-up) are
# out of scope for this regression.
tool_submits = []
for call in submit_calls_in_agent:
if not call.args:
continue
first = call.args[0]
# Unfixed: executor.submit(_run_tool, ...) → first arg is a Name
if isinstance(first, ast.Name) and first.id == "_run_tool":
tool_submits.append(("unfixed", call))
# Fixed: executor.submit(ctx.run, _run_tool, ...) → first arg is
# ctx.run (Attribute), and _run_tool is the second arg.
elif (
isinstance(first, ast.Attribute)
and first.attr == "run"
and len(call.args) >= 2
and isinstance(call.args[1], ast.Name)
and call.args[1].id == "_run_tool"
):
tool_submits.append(("fixed", call))
assert tool_submits, (
"Could not locate `executor.submit(... _run_tool ...)` in "
"run_agent.py. The call site may have been renamed — update this "
"guard along with the refactor."
)
unfixed = [c for kind, c in tool_submits if kind == "unfixed"]
assert not unfixed, (
"run_agent.py contains `executor.submit(_run_tool, ...)` without a "
"`ctx.run` wrapper. This is the pre-#16660 shape: worker threads "
"will read a fresh ContextVar and approval-session routing "
"collapses to the os.environ fallback. Wrap with "
"`ctx = contextvars.copy_context(); executor.submit(ctx.run, "
"_run_tool, ...)`."
)
def test_two_concurrent_tool_batches_keep_session_keys_isolated():
"""End-to-end guard: two callers each set a different session key
and submit workers concurrently. Each worker must see its own
caller's key, not the other's.
Guards against a future "optimization" that reuses a single context
snapshot across callers (which would collapse isolation the same way
the unfixed ``submit`` does).
"""
from tools.approval import (
_approval_session_key,
get_current_session_key,
)
results: dict = {}
def caller(label: str) -> None:
token = _approval_session_key.set(f"session-{label}")
try:
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as ex:
ctx = contextvars.copy_context()
fut = ex.submit(
ctx.run,
lambda: get_current_session_key(default="FALLBACK"),
)
results[label] = fut.result(timeout=5)
finally:
_approval_session_key.reset(token)
t_a = threading.Thread(target=caller, args=("A",))
t_b = threading.Thread(target=caller, args=("B",))
t_a.start()
t_b.start()
t_a.join(timeout=10)
t_b.join(timeout=10)
assert results.get("A") == "session-A", (
f"Session A worker saw {results.get('A')!r}, expected 'session-A'"
)
assert results.get("B") == "session-B", (
f"Session B worker saw {results.get('B')!r}, expected 'session-B'"
)
+41
View File
@@ -0,0 +1,41 @@
# Stress / battle-test suite
Long-running tests that exercise the Kanban kernel under adversarial
conditions. **Not run by `scripts/run_tests.sh`** because they can
take 30+ seconds each and spawn real subprocesses.
Run manually:
```bash
./venv/bin/python -m pytest tests/stress/ -v -s
# or individual files:
./venv/bin/python tests/stress/test_concurrency.py
./venv/bin/python tests/stress/test_subprocess_e2e.py
./venv/bin/python tests/stress/test_property_fuzzing.py
./venv/bin/python tests/stress/test_benchmarks.py
```
## What's covered
- **test_concurrency.py** — 5 workers, 100 tasks, race-for-claim. Asserts
no double-claims, no orphan runs, no SQLite errors escape retry.
- **test_concurrency_mixed.py** — 10 workers + 1 reclaimer, 500 tasks,
random ops (claim/complete/block/unblock/archive). Same invariants
under adversarial scheduling.
- **test_concurrency_reclaim_race.py** — TTL < work duration so the
reclaimer intentionally yanks tasks mid-work; verifies the worker's
late-complete is refused cleanly (CAS guard works).
- **test_subprocess_e2e.py** — dispatcher spawns real Python subprocess
workers that heartbeat + complete via the CLI; crash detection
against a real dead PID.
- **test_property_fuzzing.py** — 500 random operation sequences,
~40k operations total, 9 invariant checks after each step.
- **test_atypical_scenarios.py** — 28 scenarios covering atypical
user inputs: unicode/emoji/RTL, 1 MB strings, SQL injection
attempts, cycles, self-parents, wide fan-in/out, clock skew,
HERMES_HOME with spaces/unicode/symlinks, 1000 runs on one
task, idempotency-key race across processes, terminal-state
resurrection attempts, dashboard REST with weird JSON.
- **test_benchmarks.py** — latency at 100/1k/10k tasks for dispatch,
recompute_ready, list_tasks, build_worker_context, etc. Results saved
to JSON for regression diffing.
+50
View File
@@ -0,0 +1,50 @@
#!/usr/bin/env python3
"""Fake worker process that exercises the real subprocess contract.
Reads HERMES_KANBAN_TASK from env, heartbeats periodically, does short
work, completes via the CLI. Designed to be spawned by the dispatcher
exactly the way `hermes chat -q` would be, minus the LLM cost.
"""
import json
import os
import subprocess
import sys
import time
def main():
tid = os.environ["HERMES_KANBAN_TASK"]
workspace = os.environ.get("HERMES_KANBAN_WORKSPACE", "")
# Announce via CLI (goes through real argparse + init_db + etc)
subprocess.run(
["hermes", "kanban", "heartbeat", tid, "--note", "started"],
check=True, capture_output=True,
)
# Simulate work with periodic heartbeats
for i in range(3):
time.sleep(0.3)
subprocess.run(
["hermes", "kanban", "heartbeat", tid, "--note", f"progress {i+1}/3"],
check=True, capture_output=True,
)
# Complete with structured handoff
subprocess.run(
[
"hermes", "kanban", "complete", tid,
"--summary", f"real-subprocess worker finished {tid}",
"--metadata", json.dumps({
"workspace": workspace,
"worker_pid": os.getpid(),
"iterations": 3,
}),
],
check=True, capture_output=True,
)
if __name__ == "__main__":
main()
+37
View File
@@ -0,0 +1,37 @@
"""pytest config for the stress/ subdirectory.
These tests are slow (30s+), spawn subprocesses, and are not run by
default. Enable via `pytest --run-stress` or by running the scripts
directly.
The scripts are primarily __main__-executable entry points; pytest
isn't expected to collect individual test functions from them.
"""
import pytest
def pytest_collection_modifyitems(config, items):
if config.getoption("--run-stress", default=False):
return
skip_stress = pytest.mark.skip(
reason="stress test (opt-in via --run-stress or run script directly)"
)
for item in items:
if "tests/stress" in str(item.fspath):
item.add_marker(skip_stress)
def pytest_addoption(parser):
parser.addoption(
"--run-stress",
action="store_true",
default=False,
help="Run the stress/battle-test suite (slow, spawns subprocesses).",
)
collect_ignore_glob = [
# The stress scripts have top-level code and hard-coded paths; they're
# meant to run as `python tests/stress/<name>.py`, not as pytest modules.
"*.py",
]
File diff suppressed because it is too large Load Diff
+221
View File
@@ -0,0 +1,221 @@
"""Scale benchmarks for the Kanban kernel.
Measures:
- dispatch_once latency at 100, 1000, 10000 tasks
- recompute_ready latency at 100, 1000, 10000 todo tasks with wide parent graphs
- build_worker_context latency with 1, 10, 50 parent dependencies
- board list/stats query latency
- task_runs query latency at scale
Results printed as a table. Saved to JSON for regression-diffing in CI
or future reviews. Not a pass/fail test records numbers so we know
when a change regresses latency by 10x and can decide whether to care.
"""
import json
import os
import random
import sys
import tempfile
import time
from pathlib import Path
WT = str(Path(__file__).resolve().parents[2])
def bench(label, fn, iterations=5):
"""Time fn over `iterations` runs, return (min, median, max) in ms."""
times = []
for _ in range(iterations):
t0 = time.perf_counter()
fn()
times.append((time.perf_counter() - t0) * 1000)
times.sort()
mn = times[0]
md = times[len(times) // 2]
mx = times[-1]
return {"label": label, "iter": iterations, "min_ms": mn, "median_ms": md, "max_ms": mx}
def seed_tasks(conn, kb, n, assignee="bench-worker", with_parents=False):
"""Seed n tasks. Optionally give each task 5 parents."""
ids = []
for i in range(n):
if with_parents and i >= 5:
parents = random.sample(ids[:i], 5)
else:
parents = ()
tid = kb.create_task(
conn, title=f"bench {i}", assignee=assignee,
tenant="bench", parents=parents,
)
ids.append(tid)
return ids
def main():
home = tempfile.mkdtemp(prefix="hermes_bench_")
os.environ["HERMES_HOME"] = home
os.environ["HOME"] = home
sys.path.insert(0, WT)
from hermes_cli import kanban_db as kb
kb.init_db()
results = []
# ============ dispatch_once latency ============
for n in [100, 1000, 10000]:
print(f"\n== dispatch_once @ {n} tasks ==")
# Fresh DB each time so we're not measuring cumulative effects
import shutil
shutil.rmtree(home, ignore_errors=True)
os.makedirs(home)
kb._INITIALIZED_PATHS.clear()
kb.init_db()
conn = kb.connect()
seed_tasks(conn, kb, n, assignee=None) # no assignee → won't spawn
r = bench(
f"dispatch_once (n={n}, no spawn)",
lambda: kb.dispatch_once(conn, spawn_fn=lambda *_: None),
iterations=5,
)
print(f" min={r['min_ms']:.1f} median={r['median_ms']:.1f} max={r['max_ms']:.1f} ms")
r["n"] = n
results.append(r)
conn.close()
# ============ recompute_ready at scale with parent graphs ============
for n in [100, 1000, 10000]:
print(f"\n== recompute_ready @ {n} tasks (5 parents each) ==")
shutil.rmtree(home, ignore_errors=True)
os.makedirs(home)
kb._INITIALIZED_PATHS.clear()
kb.init_db()
conn = kb.connect()
ids = seed_tasks(conn, kb, n, assignee=None, with_parents=True)
# Complete the first 100 so some todo tasks might get promoted
for tid in ids[:min(100, n // 10)]:
kb.complete_task(conn, tid, result="bench")
r = bench(
f"recompute_ready (n={n}, with parents)",
lambda: kb.recompute_ready(conn),
iterations=5,
)
print(f" min={r['min_ms']:.1f} median={r['median_ms']:.1f} max={r['max_ms']:.1f} ms")
r["n"] = n
results.append(r)
conn.close()
# ============ build_worker_context with N parents ============
for parent_count in [1, 10, 50]:
print(f"\n== build_worker_context with {parent_count} parents ==")
shutil.rmtree(home, ignore_errors=True)
os.makedirs(home)
kb._INITIALIZED_PATHS.clear()
kb.init_db()
conn = kb.connect()
# Create parents, complete them with summaries+metadata
parent_ids = []
for i in range(parent_count):
pid = kb.create_task(conn, title=f"parent {i}", assignee="p")
kb.claim_task(conn, pid)
kb.complete_task(
conn, pid,
summary=f"parent {i} result that is longer than a single token "
f"so we actually measure the IO",
metadata={"files": [f"file_{j}.py" for j in range(5)], "i": i},
)
parent_ids.append(pid)
child_id = kb.create_task(
conn, title="child", assignee="c", parents=parent_ids,
)
r = bench(
f"build_worker_context (parents={parent_count})",
lambda: kb.build_worker_context(conn, child_id),
iterations=10,
)
print(f" min={r['min_ms']:.1f} median={r['median_ms']:.1f} max={r['max_ms']:.1f} ms")
r["parent_count"] = parent_count
results.append(r)
conn.close()
# ============ list_tasks at scale ============
for n in [100, 1000, 10000]:
print(f"\n== list_tasks @ {n} ==")
shutil.rmtree(home, ignore_errors=True)
os.makedirs(home)
kb._INITIALIZED_PATHS.clear()
kb.init_db()
conn = kb.connect()
seed_tasks(conn, kb, n)
r = bench(
f"list_tasks (n={n})",
lambda: kb.list_tasks(conn),
iterations=5,
)
print(f" min={r['min_ms']:.1f} median={r['median_ms']:.1f} max={r['max_ms']:.1f} ms")
r["n"] = n
results.append(r)
conn.close()
# ============ board_stats at scale ============
for n in [100, 1000, 10000]:
print(f"\n== board_stats @ {n} ==")
shutil.rmtree(home, ignore_errors=True)
os.makedirs(home)
kb._INITIALIZED_PATHS.clear()
kb.init_db()
conn = kb.connect()
seed_tasks(conn, kb, n)
r = bench(
f"board_stats (n={n})",
lambda: kb.board_stats(conn),
iterations=5,
)
print(f" min={r['min_ms']:.1f} median={r['median_ms']:.1f} max={r['max_ms']:.1f} ms")
r["n"] = n
results.append(r)
conn.close()
# ============ list_runs at scale ============
for n in [100, 1000]:
print(f"\n== list_runs for task with {n} attempts ==")
shutil.rmtree(home, ignore_errors=True)
os.makedirs(home)
kb._INITIALIZED_PATHS.clear()
kb.init_db()
conn = kb.connect()
tid = kb.create_task(conn, title="x", assignee="w")
# Create N attempts via claim/release
for i in range(n):
kb.claim_task(conn, tid, ttl_seconds=0)
kb.release_stale_claims(conn)
r = bench(
f"list_runs (runs={n})",
lambda: kb.list_runs(conn, tid),
iterations=10,
)
print(f" min={r['min_ms']:.1f} median={r['median_ms']:.1f} max={r['max_ms']:.1f} ms")
r["run_count"] = n
results.append(r)
conn.close()
# ============ SUMMARY TABLE ============
print()
print("=" * 60)
print("SUMMARY")
print("=" * 60)
print(f"{'Benchmark':<50} {'min':>8} {'median':>8} {'max':>8}")
for r in results:
print(f"{r['label']:<50} {r['min_ms']:>7.1f}ms {r['median_ms']:>7.1f}ms {r['max_ms']:>7.1f}ms")
# Save for future diffing.
out_path = "/tmp/kanban_bench_results.json"
with open(out_path, "w") as f:
json.dump(results, f, indent=2)
print(f"\nResults saved to {out_path}")
if __name__ == "__main__":
main()
+302
View File
@@ -0,0 +1,302 @@
"""Multi-process concurrency stress test for the Kanban kernel.
5 worker processes race for claims on a shared DB with 100 tasks. Each
worker loops: claim -> simulate work -> complete. Asserts the invariants
that make the system worth building:
- No task claimed by two workers simultaneously
- No task completed twice
- Every claim produces exactly one run row
- Every completion closes exactly one run row
- Zero SQLite locking errors that escape the retry layer
- Total run count == total claim events == total completed events
This test is the primary justification for WAL + CAS-based claim. If it
passes, the architecture holds. If it fails, we have a real bug to fix
before anyone runs this in anger.
"""
import json
import multiprocessing as mp
import os
import random
import sqlite3
import subprocess
import sys
import tempfile
import time
from pathlib import Path
NUM_WORKERS = 5
NUM_TASKS = 100
WORKER_TIMEOUT_S = 60
WT = str(Path(__file__).resolve().parents[2])
def worker_loop(worker_id: int, hermes_home: str, result_file: str) -> None:
"""One worker's inner loop. Runs in a fresh Python process.
Tries to claim a ready task, marks it done with a per-worker summary,
repeats until the ready pool is empty. Records every claim + complete
into its own JSON result file for later aggregation.
"""
os.environ["HERMES_HOME"] = hermes_home
os.environ["HOME"] = hermes_home
sys.path.insert(0, WT)
from hermes_cli import kanban_db as kb
events = []
empty_polls = 0
start = time.monotonic()
while time.monotonic() - start < WORKER_TIMEOUT_S:
conn = kb.connect()
try:
# Find any ready task (non-deterministic order intentional — we
# want workers to race on popular assignees).
row = conn.execute(
"SELECT id FROM tasks WHERE status = 'ready' "
"AND claim_lock IS NULL LIMIT 1"
).fetchone()
if row is None:
empty_polls += 1
if empty_polls > 20:
break # queue empty long enough, stop
time.sleep(0.01)
continue
empty_polls = 0
tid = row["id"]
try:
claimed = kb.claim_task(
conn, tid, claimer=f"worker-{worker_id}",
)
except sqlite3.OperationalError as e:
events.append({"kind": "sqlite_err_on_claim", "task": tid, "err": str(e)})
continue
if claimed is None:
# Someone else beat us — expected contention, not an error.
events.append({"kind": "lost_claim_race", "task": tid})
continue
run = kb.latest_run(conn, tid)
events.append({
"kind": "claimed",
"task": tid,
"worker": worker_id,
"run_id": run.id,
"t": time.monotonic() - start,
})
# Simulate short, variable work
time.sleep(random.uniform(0.001, 0.05))
try:
kb.complete_task(
conn, tid,
result=f"done by worker-{worker_id}",
summary=f"worker-{worker_id} finished task {tid}",
metadata={"worker_id": worker_id, "run_id": run.id},
)
except sqlite3.OperationalError as e:
events.append({"kind": "sqlite_err_on_complete", "task": tid, "err": str(e)})
continue
events.append({
"kind": "completed",
"task": tid,
"worker": worker_id,
"run_id": run.id,
"t": time.monotonic() - start,
})
finally:
conn.close()
with open(result_file, "w") as f:
json.dump(events, f)
def main():
home = tempfile.mkdtemp(prefix="hermes_concurrency_")
print(f"HERMES_HOME = {home}")
# Seed.
os.environ["HERMES_HOME"] = home
os.environ["HOME"] = home
sys.path.insert(0, WT)
from hermes_cli import kanban_db as kb
kb.init_db()
conn = kb.connect()
tids = []
for i in range(NUM_TASKS):
tid = kb.create_task(
conn, title=f"task #{i}", assignee="shared",
tenant="concurrency-test",
)
tids.append(tid)
conn.close()
print(f"Seeded {NUM_TASKS} tasks.")
# Spawn workers.
ctx = mp.get_context("spawn")
result_files = [f"/tmp/concurrency_worker_{i}.json" for i in range(NUM_WORKERS)]
procs = []
start = time.monotonic()
for i in range(NUM_WORKERS):
p = ctx.Process(target=worker_loop, args=(i, home, result_files[i]))
p.start()
procs.append(p)
for p in procs:
p.join(timeout=WORKER_TIMEOUT_S + 30)
if p.is_alive():
p.terminate()
p.join()
elapsed = time.monotonic() - start
print(f"All workers done in {elapsed:.1f}s")
# Aggregate worker events.
all_events = []
for i, f in enumerate(result_files):
if not os.path.isfile(f):
print(f" WORKER {i} produced no result file — died?")
continue
with open(f) as fh:
events = json.load(fh)
all_events.extend(events)
# ============ INVARIANT CHECKS ============
print()
print("=" * 60)
print("INVARIANT CHECKS")
print("=" * 60)
failures = []
# Check 1: no task claimed by two different workers
claims_by_task = {}
for e in all_events:
if e["kind"] == "claimed":
if e["task"] in claims_by_task:
prev = claims_by_task[e["task"]]
if prev["worker"] != e["worker"]:
failures.append(
f"DOUBLE CLAIM: task {e['task']} claimed by "
f"worker {prev['worker']} AND worker {e['worker']}"
)
claims_by_task[e["task"]] = e
# Check 2: every completion has a matching claim from the same worker
for e in all_events:
if e["kind"] == "completed":
prev_claim = claims_by_task.get(e["task"])
if prev_claim is None:
failures.append(f"COMPLETION WITHOUT CLAIM: task {e['task']}")
elif prev_claim["worker"] != e["worker"]:
failures.append(
f"WORKER MISMATCH: task {e['task']} claimed by "
f"{prev_claim['worker']} but completed by {e['worker']}"
)
# Check 3: DB state — every task should be in 'done', no dangling claims
conn = kb.connect()
try:
bad_status = conn.execute(
"SELECT id, status, claim_lock, current_run_id FROM tasks "
"WHERE status != 'done' OR claim_lock IS NOT NULL "
"OR current_run_id IS NOT NULL"
).fetchall()
if bad_status:
for row in bad_status:
failures.append(
f"BAD FINAL STATE: task {row['id']} status={row['status']} "
f"claim_lock={row['claim_lock']} current_run_id={row['current_run_id']}"
)
# Check 4: exactly one run per task, all closed as completed
bad_runs = conn.execute(
"SELECT task_id, COUNT(*) as n FROM task_runs "
"GROUP BY task_id HAVING n != 1"
).fetchall()
if bad_runs:
for row in bad_runs:
failures.append(
f"WRONG RUN COUNT: task {row['task_id']} has {row['n']} runs (expected 1)"
)
open_runs = conn.execute(
"SELECT id, task_id FROM task_runs WHERE ended_at IS NULL"
).fetchall()
for row in open_runs:
failures.append(f"OPEN RUN: run {row['id']} on task {row['task_id']}")
wrong_outcomes = conn.execute(
"SELECT task_id, outcome FROM task_runs "
"WHERE outcome IS NULL OR outcome != 'completed'"
).fetchall()
for row in wrong_outcomes:
failures.append(
f"WRONG OUTCOME: task {row['task_id']} run outcome={row['outcome']}"
)
# Check 5: event counts — exactly NUM_TASKS completed events
completed_events = conn.execute(
"SELECT COUNT(*) as n FROM task_events WHERE kind='completed'"
).fetchone()["n"]
if completed_events != NUM_TASKS:
failures.append(
f"EVENT COUNT MISMATCH: {completed_events} completed events "
f"expected {NUM_TASKS}"
)
# Check 6: count SQLite errors that escaped retry
sqlite_errs = sum(
1 for e in all_events if e["kind"].startswith("sqlite_err")
)
if sqlite_errs > 0:
failures.append(f"UNRETRIED SQLITE ERRORS: {sqlite_errs}")
finally:
conn.close()
# ============ STATS ============
print()
total_claims = sum(1 for e in all_events if e["kind"] == "claimed")
total_completes = sum(1 for e in all_events if e["kind"] == "completed")
total_lost_races = sum(1 for e in all_events if e["kind"] == "lost_claim_race")
per_worker = {}
for e in all_events:
if e["kind"] == "completed":
per_worker.setdefault(e["worker"], 0)
per_worker[e["worker"]] += 1
print(f"Total claims: {total_claims}")
print(f"Total completes: {total_completes}")
print(f"Lost claim races: {total_lost_races} (expected contention; not a bug)")
print(f"Elapsed: {elapsed:.2f}s")
print(f"Throughput: {NUM_TASKS/elapsed:.1f} tasks/sec")
print(f"Per-worker completions:")
for w in sorted(per_worker.keys()):
print(f" worker-{w}: {per_worker[w]}")
if failures:
print()
print("=" * 60)
print(f"FAILURES ({len(failures)}):")
print("=" * 60)
for f in failures[:20]:
print(f" {f}")
if len(failures) > 20:
print(f" ... and {len(failures) - 20} more")
sys.exit(1)
else:
print()
print("✔ ALL INVARIANTS HELD")
if __name__ == "__main__":
main()
+350
View File
@@ -0,0 +1,350 @@
"""Harder concurrency stress: mixed operations + larger scale.
Scales to 500 tasks, 10 workers, 60s runtime. Each worker randomly:
- claims + completes (70%)
- claims + blocks with a reason (15%)
- unblocks a random blocked task (10%)
- archives a random done task (5%)
Adds a background "dispatcher" process that calls release_stale_claims
and detect_crashed_workers every 200ms, racing against the workers to
surface TTL + crash detection races.
Pass criteria: runs invariant holds, no double-completions, no orphan
runs, no SQLite errors escape the retry layer.
"""
import json
import multiprocessing as mp
import os
import random
import sqlite3
import sys
import tempfile
import time
from pathlib import Path
NUM_WORKERS = 10
NUM_TASKS = 500
RUN_DURATION_S = 30
WT = str(Path(__file__).resolve().parents[2])
def worker_loop(worker_id: int, hermes_home: str, result_file: str) -> None:
os.environ["HERMES_HOME"] = hermes_home
os.environ["HOME"] = hermes_home
sys.path.insert(0, WT)
from hermes_cli import kanban_db as kb
events = []
start = time.monotonic()
idle_rounds = 0
while time.monotonic() - start < RUN_DURATION_S:
conn = kb.connect()
try:
op = random.random()
if op < 0.10:
# Try to unblock a blocked task.
row = conn.execute(
"SELECT id FROM tasks WHERE status='blocked' "
"ORDER BY RANDOM() LIMIT 1"
).fetchone()
if row:
try:
ok = kb.unblock_task(conn, row["id"])
events.append({"kind": "unblocked" if ok else "unblock_noop",
"task": row["id"], "worker": worker_id})
except sqlite3.OperationalError as e:
events.append({"kind": "sqlite_err", "op": "unblock",
"task": row["id"], "err": str(e)[:100]})
continue
if op < 0.15:
# Try to archive a done task.
row = conn.execute(
"SELECT id FROM tasks WHERE status='done' "
"ORDER BY RANDOM() LIMIT 1"
).fetchone()
if row:
try:
kb.archive_task(conn, row["id"])
events.append({"kind": "archived", "task": row["id"],
"worker": worker_id})
except sqlite3.OperationalError as e:
events.append({"kind": "sqlite_err", "op": "archive",
"task": row["id"], "err": str(e)[:100]})
continue
# Default: claim + complete-or-block.
row = conn.execute(
"SELECT id FROM tasks WHERE status='ready' "
"AND claim_lock IS NULL LIMIT 1"
).fetchone()
if row is None:
idle_rounds += 1
if idle_rounds > 50:
break
time.sleep(0.02)
continue
idle_rounds = 0
tid = row["id"]
try:
claimed = kb.claim_task(
conn, tid, claimer=f"worker-{worker_id}",
ttl_seconds=5, # short TTL so reclaim races in
)
except sqlite3.OperationalError as e:
events.append({"kind": "sqlite_err", "op": "claim",
"task": tid, "err": str(e)[:100]})
continue
if claimed is None:
events.append({"kind": "lost_claim_race", "task": tid})
continue
run = kb.latest_run(conn, tid)
events.append({"kind": "claimed", "task": tid, "worker": worker_id,
"run_id": run.id, "t": time.monotonic() - start})
time.sleep(random.uniform(0.005, 0.05))
# 20% of the time, block instead of complete
if random.random() < 0.20:
try:
kb.block_task(conn, tid,
reason=f"blocked by worker-{worker_id}")
events.append({"kind": "blocked", "task": tid,
"worker": worker_id, "run_id": run.id})
except sqlite3.OperationalError as e:
events.append({"kind": "sqlite_err", "op": "block",
"task": tid, "err": str(e)[:100]})
else:
try:
kb.complete_task(
conn, tid,
result=f"done by worker-{worker_id}",
summary=f"worker-{worker_id} ok",
metadata={"worker_id": worker_id},
)
events.append({"kind": "completed", "task": tid,
"worker": worker_id, "run_id": run.id,
"t": time.monotonic() - start})
except sqlite3.OperationalError as e:
events.append({"kind": "sqlite_err", "op": "complete",
"task": tid, "err": str(e)[:100]})
finally:
conn.close()
with open(result_file, "w") as f:
json.dump(events, f)
def reclaimer_loop(hermes_home: str, result_file: str) -> None:
"""Background dispatcher-like loop that reclaims stale tasks."""
os.environ["HERMES_HOME"] = hermes_home
os.environ["HOME"] = hermes_home
sys.path.insert(0, WT)
from hermes_cli import kanban_db as kb
events = []
start = time.monotonic()
while time.monotonic() - start < RUN_DURATION_S + 2:
conn = kb.connect()
try:
try:
reclaimed = kb.release_stale_claims(conn)
if reclaimed:
events.append({"kind": "reclaimed", "count": reclaimed,
"t": time.monotonic() - start})
except sqlite3.OperationalError as e:
events.append({"kind": "sqlite_err", "op": "reclaim",
"err": str(e)[:100]})
finally:
conn.close()
time.sleep(0.2)
with open(result_file, "w") as f:
json.dump(events, f)
def main():
home = tempfile.mkdtemp(prefix="hermes_mixed_stress_")
print(f"HERMES_HOME = {home}")
os.environ["HERMES_HOME"] = home
os.environ["HOME"] = home
sys.path.insert(0, WT)
from hermes_cli import kanban_db as kb
kb.init_db()
conn = kb.connect()
for i in range(NUM_TASKS):
kb.create_task(
conn, title=f"t#{i}", assignee="shared", tenant="mixed-stress",
)
conn.close()
print(f"Seeded {NUM_TASKS} tasks, launching {NUM_WORKERS} workers + 1 reclaimer")
ctx = mp.get_context("spawn")
worker_results = [f"/tmp/mixed_worker_{i}.json" for i in range(NUM_WORKERS)]
reclaim_result = "/tmp/mixed_reclaim.json"
procs = []
start = time.monotonic()
for i in range(NUM_WORKERS):
p = ctx.Process(target=worker_loop, args=(i, home, worker_results[i]))
p.start()
procs.append(p)
r = ctx.Process(target=reclaimer_loop, args=(home, reclaim_result))
r.start()
procs.append(r)
for p in procs:
p.join(timeout=RUN_DURATION_S + 30)
if p.is_alive():
p.terminate()
p.join()
elapsed = time.monotonic() - start
print(f"Done in {elapsed:.1f}s")
# Aggregate.
all_events = []
for i, f in enumerate(worker_results):
if os.path.isfile(f):
with open(f) as fh:
all_events.extend(json.load(fh))
else:
print(f" WORKER {i} died with no result file!")
reclaim_events = []
if os.path.isfile(reclaim_result):
with open(reclaim_result) as fh:
reclaim_events = json.load(fh)
# ============ INVARIANT CHECKS ============
print()
print("=" * 60)
print("INVARIANT CHECKS")
print("=" * 60)
failures = []
# Per-run attribution tracking
claims = [e for e in all_events if e["kind"] == "claimed"]
completions = [e for e in all_events if e["kind"] == "completed"]
blocks = [e for e in all_events if e["kind"] == "blocked"]
# Every completion must have a matching claim on the same run_id AND
# the same worker (workers don't steal each other's runs).
claims_by_run = {c["run_id"]: c for c in claims}
for comp in completions:
claim = claims_by_run.get(comp["run_id"])
if claim is None:
# It's possible this worker saw a reclaimed run from another worker
# — that's still a bug: the worker shouldn't be able to complete
# a run it didn't claim. But let me check if reclaim happened first.
failures.append(
f"COMPLETION WITHOUT CLAIM: task {comp['task']} run {comp['run_id']} "
f"by worker {comp['worker']}"
)
elif claim["worker"] != comp["worker"]:
failures.append(
f"CROSS-WORKER COMPLETION: run {comp['run_id']} claimed by "
f"worker {claim['worker']} but completed by worker {comp['worker']}"
)
# SQLite errors that escaped the retry layer
sqlite_errs = [e for e in all_events if e["kind"] == "sqlite_err"]
if sqlite_errs:
for e in sqlite_errs[:5]:
failures.append(f"SQLITE ERROR: op={e.get('op')} err={e.get('err')}")
if len(sqlite_errs) > 5:
failures.append(f" ... and {len(sqlite_errs) - 5} more sqlite errs")
# DB final state — every task should be in a clean terminal state.
conn = kb.connect()
try:
# Invariant: current_run_id NULL iff latest run is terminal
inconsistent = conn.execute("""
SELECT t.id, t.status, t.current_run_id
FROM tasks t
WHERE t.current_run_id IS NOT NULL
AND EXISTS (SELECT 1 FROM task_runs r
WHERE r.id = t.current_run_id AND r.ended_at IS NOT NULL)
""").fetchall()
for row in inconsistent:
failures.append(
f"INVARIANT VIOLATION: task {row['id']} status={row['status']} "
f"has current_run_id={row['current_run_id']} but run is ended"
)
# Invariant: no orphan open runs
orphans = conn.execute("""
SELECT r.id, r.task_id, r.status
FROM task_runs r
LEFT JOIN tasks t ON t.current_run_id = r.id
WHERE r.ended_at IS NULL AND t.id IS NULL
""").fetchall()
for row in orphans:
failures.append(
f"ORPHAN OPEN RUN: run {row['id']} on task {row['task_id']}"
)
# Counts — should roughly balance.
status_counts = dict(
conn.execute("SELECT status, COUNT(*) FROM tasks GROUP BY status").fetchall()
)
run_outcome_counts = dict(
conn.execute(
"SELECT outcome, COUNT(*) FROM task_runs "
"WHERE ended_at IS NOT NULL GROUP BY outcome"
).fetchall()
)
active_runs = conn.execute(
"SELECT COUNT(*) FROM task_runs WHERE ended_at IS NULL"
).fetchone()[0]
finally:
conn.close()
# ============ STATS ============
print()
print(f"Workers: {NUM_WORKERS}, Tasks: {NUM_TASKS}")
print(f"Elapsed: {elapsed:.1f}s")
print(f"Events collected: {len(all_events)} (+{len(reclaim_events)} reclaim)")
print()
print("Operations:")
op_counts = {}
for e in all_events:
op_counts[e["kind"]] = op_counts.get(e["kind"], 0) + 1
for k in sorted(op_counts.keys()):
print(f" {k:<25} {op_counts[k]}")
print()
print("Final task status:")
for s, n in sorted(status_counts.items()):
print(f" {s:<10} {n}")
print("Final run outcomes:")
for o, n in sorted(run_outcome_counts.items(), key=lambda x: (x[0] or '',)):
print(f" {o:<12} {n}")
print(f" active {active_runs}")
if failures:
print()
print("=" * 60)
print(f"FAILURES ({len(failures)}):")
print("=" * 60)
for f in failures[:30]:
print(f" {f}")
if len(failures) > 30:
print(f" ... and {len(failures) - 30} more")
sys.exit(1)
else:
print()
print("✔ ALL INVARIANTS HELD UNDER MIXED STRESS")
if __name__ == "__main__":
main()
@@ -0,0 +1,241 @@
"""Target the reclaim race specifically.
Workers claim tasks with a 1s TTL but sleep 2s before completing. The
reclaimer runs every 200ms. Scenario: worker claims, reclaimer expires
the claim mid-work, worker tries to complete AFTER its run has been
reclaimed.
Expected behavior (per design): the worker's complete_task should
either succeed on the reclaimed-and-re-claimed-by-another-worker case
(no, it should refuse the claim was invalidated), OR succeed by
grace (we "forgive" a late complete from the original worker if no
one else picked it up).
Actually looking at complete_task: it doesn't check claim_lock. It just
transitions from 'running' -> 'done'. So if the reclaimer moved it back
to 'ready', the late worker's complete_task will fail (CAS on
status='running' fails). This is the CORRECT behavior.
Invariant being tested: race between worker.complete and
dispatcher.reclaim must not produce a double-run-close or other
inconsistency.
"""
import json
import multiprocessing as mp
import os
import random
import sqlite3
import sys
import tempfile
import time
from pathlib import Path
NUM_WORKERS = 5
NUM_TASKS = 50
TTL = 1
WORK_DURATION_S = 2.0 # longer than TTL => reclaimer wins
WT = str(Path(__file__).resolve().parents[2])
def worker_loop(worker_id: int, hermes_home: str, result_file: str) -> None:
os.environ["HERMES_HOME"] = hermes_home
os.environ["HOME"] = hermes_home
sys.path.insert(0, WT)
from hermes_cli import kanban_db as kb
events = []
start = time.monotonic()
idle = 0
while time.monotonic() - start < 40:
conn = kb.connect()
try:
row = conn.execute(
"SELECT id FROM tasks WHERE status='ready' AND claim_lock IS NULL LIMIT 1"
).fetchone()
if row is None:
idle += 1
if idle > 30:
break
time.sleep(0.05)
continue
idle = 0
tid = row["id"]
try:
claimed = kb.claim_task(conn, tid, claimer=f"worker-{worker_id}",
ttl_seconds=TTL)
except sqlite3.OperationalError as e:
events.append({"kind": "sqlite_err", "op": "claim", "err": str(e)[:100]})
continue
if claimed is None:
events.append({"kind": "lost_claim", "task": tid})
continue
run = kb.latest_run(conn, tid)
events.append({"kind": "claimed", "task": tid, "worker": worker_id,
"run_id": run.id})
# Sleep longer than TTL so reclaimer has a chance to intervene
time.sleep(WORK_DURATION_S + random.uniform(-0.3, 0.3))
try:
ok = kb.complete_task(
conn, tid,
result=f"by worker-{worker_id}",
summary=f"worker-{worker_id} finished",
)
events.append({"kind": "complete_ok" if ok else "complete_refused",
"task": tid, "worker": worker_id, "run_id": run.id})
except sqlite3.OperationalError as e:
events.append({"kind": "sqlite_err", "op": "complete", "err": str(e)[:100]})
finally:
conn.close()
with open(result_file, "w") as f:
json.dump(events, f)
def reclaimer_loop(hermes_home: str, result_file: str) -> None:
os.environ["HERMES_HOME"] = hermes_home
os.environ["HOME"] = hermes_home
sys.path.insert(0, WT)
from hermes_cli import kanban_db as kb
events = []
start = time.monotonic()
while time.monotonic() - start < 42:
conn = kb.connect()
try:
try:
n = kb.release_stale_claims(conn)
if n:
events.append({"kind": "reclaimed", "count": n,
"t": time.monotonic() - start})
except sqlite3.OperationalError as e:
events.append({"kind": "sqlite_err", "err": str(e)[:100]})
finally:
conn.close()
time.sleep(0.2)
with open(result_file, "w") as f:
json.dump(events, f)
def main():
home = tempfile.mkdtemp(prefix="hermes_reclaim_race_")
os.environ["HERMES_HOME"] = home
os.environ["HOME"] = home
sys.path.insert(0, WT)
from hermes_cli import kanban_db as kb
kb.init_db()
conn = kb.connect()
for i in range(NUM_TASKS):
kb.create_task(conn, title=f"t{i}", assignee="shared",
tenant="reclaim-race")
conn.close()
print(f"Seeded {NUM_TASKS} tasks. TTL={TTL}s, work_duration={WORK_DURATION_S}s")
print(f"(worker work > TTL guarantees reclaims)")
ctx = mp.get_context("spawn")
worker_results = [f"/tmp/rc_worker_{i}.json" for i in range(NUM_WORKERS)]
reclaim_result = "/tmp/rc_reclaim.json"
procs = []
for i in range(NUM_WORKERS):
p = ctx.Process(target=worker_loop, args=(i, home, worker_results[i]))
p.start()
procs.append(p)
r = ctx.Process(target=reclaimer_loop, args=(home, reclaim_result))
r.start()
procs.append(r)
for p in procs:
p.join(timeout=60)
if p.is_alive():
p.terminate()
p.join()
# Aggregate.
all_events = []
for f in worker_results:
if os.path.isfile(f):
with open(f) as fh:
all_events.extend(json.load(fh))
reclaim_events = []
if os.path.isfile(reclaim_result):
with open(reclaim_result) as fh:
reclaim_events = json.load(fh)
op_counts = {}
for e in all_events:
op_counts[e["kind"]] = op_counts.get(e["kind"], 0) + 1
total_reclaims = sum(e.get("count", 0) for e in reclaim_events)
print(f"\nReclaimer fired {len(reclaim_events)} times, total tasks reclaimed: {total_reclaims}")
print("Worker events:")
for k in sorted(op_counts):
print(f" {k:<25} {op_counts[k]}")
# Invariant checks
failures = []
conn = kb.connect()
try:
# Any task stuck with current_run_id pointing at a closed run?
bad = conn.execute("""
SELECT t.id, t.status, t.current_run_id, r.ended_at, r.outcome
FROM tasks t
JOIN task_runs r ON r.id = t.current_run_id
WHERE r.ended_at IS NOT NULL
""").fetchall()
for row in bad:
failures.append(
f"INVARIANT VIOLATION: task {row['id']} status={row['status']} "
f"current_run_id={row['current_run_id']} but run ended "
f"outcome={row['outcome']}"
)
# Every run with NULL ended_at should still have the task pointing at it
orphans = conn.execute("""
SELECT r.id, r.task_id
FROM task_runs r
LEFT JOIN tasks t ON t.current_run_id = r.id
WHERE r.ended_at IS NULL AND t.id IS NULL
""").fetchall()
for row in orphans:
failures.append(f"ORPHAN OPEN RUN: run {row['id']} on task {row['task_id']}")
# Event counts
claim_evts = conn.execute(
"SELECT COUNT(*) FROM task_events WHERE kind='claimed'").fetchone()[0]
reclaim_evts = conn.execute(
"SELECT COUNT(*) FROM task_events WHERE kind='reclaimed'").fetchone()[0]
comp_evts = conn.execute(
"SELECT COUNT(*) FROM task_events WHERE kind='completed'").fetchone()[0]
print(f"\nDB event counts: claimed={claim_evts} reclaimed={reclaim_evts} completed={comp_evts}")
# Every reclaimed run must have ended_at set
unended_reclaims = conn.execute(
"SELECT COUNT(*) FROM task_runs WHERE outcome='reclaimed' AND ended_at IS NULL"
).fetchone()[0]
if unended_reclaims:
failures.append(f"UNENDED RECLAIMED RUNS: {unended_reclaims}")
# Count of completed runs
comp_runs = conn.execute(
"SELECT COUNT(*) FROM task_runs WHERE outcome='completed'"
).fetchone()[0]
reclaim_runs = conn.execute(
"SELECT COUNT(*) FROM task_runs WHERE outcome='reclaimed'"
).fetchone()[0]
print(f"DB run outcomes: completed={comp_runs} reclaimed={reclaim_runs}")
finally:
conn.close()
if reclaim_runs == 0:
failures.append("NO RECLAIMS HAPPENED — test didn't stress what it was supposed to")
if failures:
print(f"\nFAILURES ({len(failures)}):")
for f in failures[:20]:
print(f" {f}")
sys.exit(1)
else:
print("\n✔ RECLAIM RACE INVARIANTS HELD")
if __name__ == "__main__":
main()
+283
View File
@@ -0,0 +1,283 @@
"""Randomized property testing for the Kanban kernel.
Generates 1000 random operation sequences, each 20-50 ops, on small
task graphs. After each step, checks the full invariant set:
I1. If tasks.current_run_id IS NOT NULL, the run MUST exist AND
ended_at MUST be NULL (we never point at a closed run).
I2. If a run has ended_at NULL, SOME task MUST have current_run_id
pointing at it (no orphan open runs).
I3. task.status in the valid set {triage, todo, ready, running,
blocked, done, archived}.
I4. task.claim_lock NULL iff status not in (running,).
I5. Every run has started_at <= ended_at (or ended_at is NULL).
I6. If outcome is set, ended_at must also be set.
I7. Events are strictly monotonic in (created_at, id).
I8. task_events.run_id references a task_runs.id that exists
(or is NULL).
I9. Parent completion invariant: if all parents are 'done', the
child cannot be in 'todo' status (recompute_ready should have
promoted it). This is called out in the comment on
recompute_ready; verify it holds after every random seq.
Not using hypothesis the lib; just Python random for simplicity.
"""
import os
import random
import sys
import tempfile
import time
from pathlib import Path
WT = str(Path(__file__).resolve().parents[2])
NUM_SEQUENCES = 500
OPS_PER_SEQUENCE = 100
TASK_POOL = 10
OPS = [
"create", "create_child", "claim", "complete", "block", "unblock",
"archive", "heartbeat", "release_stale", "detect_crashed",
"recompute_ready", "reassign",
]
def assert_invariants(conn, kb, ops_log):
"""Run all invariant checks; raise AssertionError with context on any."""
failures = []
# I1: current_run_id → run exists and not ended
bad_ptr = conn.execute("""
SELECT t.id, t.current_run_id, r.ended_at, r.outcome
FROM tasks t
LEFT JOIN task_runs r ON r.id = t.current_run_id
WHERE t.current_run_id IS NOT NULL
AND (r.id IS NULL OR r.ended_at IS NOT NULL)
""").fetchall()
for row in bad_ptr:
if row["ended_at"] is None and row["outcome"] is None:
detail = "missing"
else:
detail = f"closed ({row['outcome']})"
failures.append(
f"I1: task {row['id']} points at run {row['current_run_id']} "
f"which is {detail}"
)
# I2: open run → some task points at it
orphans = conn.execute("""
SELECT r.id, r.task_id
FROM task_runs r
WHERE r.ended_at IS NULL
AND NOT EXISTS (SELECT 1 FROM tasks t WHERE t.current_run_id = r.id)
""").fetchall()
for row in orphans:
failures.append(f"I2: open run {row['id']} on task {row['task_id']} has no pointer")
# I3: valid statuses
valid = {"triage", "todo", "ready", "running", "blocked", "done", "archived"}
bad_status = conn.execute("SELECT id, status FROM tasks").fetchall()
for row in bad_status:
if row["status"] not in valid:
failures.append(f"I3: task {row['id']} has invalid status {row['status']!r}")
# I4: claim_lock set only when running
bad_lock = conn.execute("""
SELECT id, status, claim_lock FROM tasks
WHERE (status != 'running' AND claim_lock IS NOT NULL)
""").fetchall()
for row in bad_lock:
failures.append(
f"I4: task {row['id']} status={row['status']} but claim_lock={row['claim_lock']!r}"
)
# I5: run started_at <= ended_at
bad_times = conn.execute("""
SELECT id, started_at, ended_at FROM task_runs
WHERE ended_at IS NOT NULL AND started_at > ended_at
""").fetchall()
for row in bad_times:
failures.append(
f"I5: run {row['id']} started_at={row['started_at']} > ended_at={row['ended_at']}"
)
# I6: outcome set → ended_at set
bad_outcome = conn.execute("""
SELECT id, outcome, ended_at FROM task_runs
WHERE outcome IS NOT NULL AND ended_at IS NULL
""").fetchall()
for row in bad_outcome:
failures.append(f"I6: run {row['id']} outcome={row['outcome']} but ended_at NULL")
# I7: events monotonic in id (always true for autoincrement)
# Skip — autoincrement guarantees it.
# I8: event.run_id references existing run
bad_ev_fk = conn.execute("""
SELECT e.id, e.run_id FROM task_events e
LEFT JOIN task_runs r ON r.id = e.run_id
WHERE e.run_id IS NOT NULL AND r.id IS NULL
""").fetchall()
for row in bad_ev_fk:
failures.append(f"I8: event {row['id']} references missing run {row['run_id']}")
# I9: if all parents done → child not in todo
# (Only applies to children with at least one parent)
orphaned_todo = conn.execute("""
SELECT c.id AS child_id,
COUNT(*) AS n_parents,
SUM(CASE WHEN p.status = 'done' THEN 1 ELSE 0 END) AS done_parents
FROM tasks c
JOIN task_links l ON l.child_id = c.id
JOIN tasks p ON p.id = l.parent_id
WHERE c.status = 'todo'
GROUP BY c.id
HAVING n_parents > 0 AND n_parents = done_parents
""").fetchall()
for row in orphaned_todo:
failures.append(
f"I9: task {row['child_id']} is todo but all {row['n_parents']} parents are done"
)
if failures:
print(f"\n!!! INVARIANT VIOLATION after {len(ops_log)} ops:")
for f in failures[:10]:
print(f" {f}")
if len(failures) > 10:
print(f" ... and {len(failures) - 10} more")
print("\nLast 10 ops:")
for op in ops_log[-10:]:
print(f" {op}")
return False
return True
def random_op(rng, conn, kb, task_pool):
op = rng.choice(OPS)
if op == "create":
tid = kb.create_task(
conn,
title=f"rand {rng.randint(0, 1000)}",
assignee=rng.choice(["w1", "w2", "w3", None]),
)
task_pool.append(tid)
return {"op": "create", "tid": tid}
if op == "create_child" and task_pool:
parent = rng.choice(task_pool)
tid = kb.create_task(
conn, title=f"child of {parent}",
assignee=rng.choice(["w1", "w2", "w3", None]),
parents=[parent],
)
task_pool.append(tid)
return {"op": "create_child", "tid": tid, "parent": parent}
if not task_pool:
return None
tid = rng.choice(task_pool)
task = kb.get_task(conn, tid)
if task is None:
task_pool.remove(tid)
return None
if op == "claim":
claimed = kb.claim_task(conn, tid, ttl_seconds=rng.choice([1, 3, 10]))
return {"op": "claim", "tid": tid, "ok": claimed is not None}
if op == "complete":
summary = rng.choice([None, f"done via op {rng.randint(0, 1000)}"])
ok = kb.complete_task(conn, tid, summary=summary)
return {"op": "complete", "tid": tid, "ok": ok}
if op == "block":
reason = rng.choice([None, "rand block"])
ok = kb.block_task(conn, tid, reason=reason)
return {"op": "block", "tid": tid, "ok": ok}
if op == "unblock":
ok = kb.unblock_task(conn, tid)
return {"op": "unblock", "tid": tid, "ok": ok}
if op == "archive":
ok = kb.archive_task(conn, tid)
if ok:
task_pool.remove(tid)
return {"op": "archive", "tid": tid, "ok": ok}
if op == "heartbeat":
ok = kb.heartbeat_worker(conn, tid)
return {"op": "heartbeat", "tid": tid, "ok": ok}
if op == "release_stale":
n = kb.release_stale_claims(conn)
return {"op": "release_stale", "n": n}
if op == "detect_crashed":
# Force-kill a fake PID first so there's something to detect
crashed = kb.detect_crashed_workers(conn)
return {"op": "detect_crashed", "n": len(crashed)}
if op == "recompute_ready":
n = kb.recompute_ready(conn)
return {"op": "recompute_ready", "promoted": n}
if op == "reassign":
# Reassignment isn't a direct API; simulate via assign_task
new_a = rng.choice(["w1", "w2", "w3", None])
try:
kb.assign_task(conn, tid, new_a)
return {"op": "reassign", "tid": tid, "to": new_a}
except Exception as e:
return {"op": "reassign", "tid": tid, "err": str(e)[:50]}
return None
def main():
total_ops = 0
total_violations = 0
for seq_idx in range(NUM_SEQUENCES):
seed = random.randint(0, 10**9)
rng = random.Random(seed)
home = tempfile.mkdtemp(prefix=f"hermes_fuzz_{seq_idx}_")
os.environ["HERMES_HOME"] = home
os.environ["HOME"] = home
sys.path.insert(0, WT)
# Fresh module state per sequence to avoid cached init paths.
for m in list(sys.modules.keys()):
if m.startswith("hermes_cli"):
del sys.modules[m]
from hermes_cli import kanban_db as kb
kb.init_db()
conn = kb.connect()
task_pool = []
ops_log = []
try:
for i in range(OPS_PER_SEQUENCE):
result = random_op(rng, conn, kb, task_pool)
if result is None:
continue
ops_log.append(result)
total_ops += 1
if not assert_invariants(conn, kb, ops_log):
total_violations += 1
print(f" sequence {seq_idx} (seed={seed}) failed at op {i}")
break
finally:
conn.close()
if seq_idx % 10 == 0:
print(f" seq {seq_idx:3d}: {total_ops} ops so far, {total_violations} violations")
print()
print("=" * 60)
print(f"Total sequences: {NUM_SEQUENCES}")
print(f"Total operations: {total_ops}")
print(f"Invariant violations: {total_violations}")
if total_violations == 0:
print("\n✔ ALL INVARIANTS HELD ACROSS RANDOMIZED SEQUENCES")
else:
print("\n✗ INVARIANT VIOLATIONS FOUND")
sys.exit(1)
if __name__ == "__main__":
main()
+228
View File
@@ -0,0 +1,228 @@
"""E2E: dispatcher spawns real Python subprocess workers.
This validates the IPC + lifecycle story that mocks can't:
- spawn_fn returns a real PID
- the child process resolves hermes_cli.kanban_db on its own
- the child writes heartbeats via the CLI (real argparse, real init_db)
- the child completes via the CLI with --summary + --metadata
- the dispatcher observes all of this through the DB only
- worker logs are captured to HERMES_HOME/kanban/logs/<task>.log
- crash detection works against a real dead PID
"""
import json
import os
import subprocess
import sys
import tempfile
import time
WT = str(Path(__file__).resolve().parents[2])
FAKE_WORKER = str(Path(__file__).parent / "_fake_worker.py")
PY = sys.executable
def make_spawn_fn(home: str):
"""Return a spawn_fn the dispatcher can call. Launches the fake
worker as a detached subprocess."""
def _spawn(task, workspace):
log_path = os.path.join(home, f"worker_{task.id}.log")
env = {
**os.environ,
"HERMES_HOME": home,
"HOME": home,
"PYTHONPATH": WT,
"HERMES_KANBAN_TASK": task.id,
"HERMES_KANBAN_WORKSPACE": workspace,
"PATH": f"{os.path.dirname(PY)}:{os.environ.get('PATH','')}",
}
log_f = open(log_path, "ab")
proc = subprocess.Popen(
[PY, FAKE_WORKER],
stdin=subprocess.DEVNULL,
stdout=log_f,
stderr=subprocess.STDOUT,
env=env,
start_new_session=True,
)
return proc.pid
return _spawn
def main():
home = tempfile.mkdtemp(prefix="hermes_e2e_")
os.environ["HERMES_HOME"] = home
os.environ["HOME"] = home
sys.path.insert(0, WT)
from hermes_cli import kanban_db as kb
# Point the `hermes` CLI child processes will run at the worktree
# hermes_cli.main. We do this by putting a shim on PATH.
shim_dir = os.path.join(home, "bin")
os.makedirs(shim_dir, exist_ok=True)
shim_path = os.path.join(shim_dir, "hermes")
with open(shim_path, "w") as f:
f.write(f"""#!/bin/sh
exec {PY} -m hermes_cli.main "$@"
""")
os.chmod(shim_path, 0o755)
os.environ["PATH"] = f"{shim_dir}:{os.environ.get('PATH','')}"
kb.init_db()
conn = kb.connect()
# ============ SCENARIO A: happy path, 3 tasks ============
print("=" * 60)
print("A. Real-subprocess happy path (3 tasks)")
print("=" * 60)
tids = []
for i in range(3):
tid = kb.create_task(
conn, title=f"real-e2e-{i}", assignee="worker",
)
tids.append(tid)
spawn_fn = make_spawn_fn(home)
result = kb.dispatch_once(conn, spawn_fn=spawn_fn)
print(f" dispatched: {len(result.spawned)} spawned")
spawned_pids = []
# The dispatcher sets worker_pid on each claimed task via _set_worker_pid.
for tid in tids:
task = kb.get_task(conn, tid)
spawned_pids.append(task.worker_pid)
print(f" task {tid}: pid={task.worker_pid} status={task.status}")
# Wait for all workers to complete (up to 10s).
deadline = time.monotonic() + 10
while time.monotonic() < deadline:
statuses = [kb.get_task(conn, tid).status for tid in tids]
if all(s == "done" for s in statuses):
break
time.sleep(0.2)
print()
failures = []
for tid in tids:
task = kb.get_task(conn, tid)
runs = kb.list_runs(conn, tid)
print(f" task {tid}: status={task.status}, current_run_id={task.current_run_id}, "
f"runs={[(r.id, r.outcome) for r in runs]}")
if task.status != "done":
failures.append(f"task {tid} not done: status={task.status}")
if task.current_run_id is not None:
failures.append(f"task {tid} has dangling current_run_id={task.current_run_id}")
if len(runs) != 1:
failures.append(f"task {tid} has {len(runs)} runs, expected 1")
else:
r = runs[0]
if r.outcome != "completed":
failures.append(f"task {tid} run outcome={r.outcome}, expected completed")
if not r.summary or "real-subprocess worker finished" not in r.summary:
failures.append(f"task {tid} summary missing: {r.summary!r}")
if not r.metadata or r.metadata.get("iterations") != 3:
failures.append(f"task {tid} metadata missing iterations: {r.metadata}")
# Heartbeat events should be present
events = kb.list_events(conn, tid)
heartbeats = [e for e in events if e.kind == "heartbeat"]
if len(heartbeats) < 3: # start + 3 progress
failures.append(f"task {tid} heartbeats={len(heartbeats)} expected >=3")
if failures:
print("\nFAILURES:")
for f in failures:
print(f" {f}")
sys.exit(1)
print("\n ✔ Scenario A: all 3 real-subprocess workers completed cleanly")
# ============ SCENARIO B: crashed worker ============
print()
print("=" * 60)
print("B. Crashed worker (kill -9 mid-heartbeat)")
print("=" * 60)
crash_tid = kb.create_task(
conn, title="crash-e2e", assignee="worker",
)
# Spawn a worker that sleeps long enough for us to kill it.
# CRITICAL: spawn through a double-fork so when we kill the child it
# doesn't zombify under our pid (which would fool kill -0 liveness
# checks into thinking it's still alive). In production the
# dispatcher daemon is long-lived but its workers are reaped by init
# after exit; the test needs to match that orphaning behavior.
def spawn_sleeper(task, workspace):
r, w = os.pipe()
middleman = subprocess.Popen(
[
PY, "-c",
"import os,sys,subprocess;"
"p=subprocess.Popen(['sleep','30'],"
"stdin=subprocess.DEVNULL,"
"stdout=subprocess.DEVNULL,stderr=subprocess.DEVNULL,"
"start_new_session=True);"
"os.write(int(sys.argv[1]), str(p.pid).encode());"
"sys.exit(0)",
str(w),
],
pass_fds=(w,),
stdin=subprocess.DEVNULL,
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
)
os.close(w)
middleman.wait() # middleman exits immediately, orphaning the sleep
grandchild_pid = int(os.read(r, 16))
os.close(r)
return grandchild_pid
result = kb.dispatch_once(conn, spawn_fn=spawn_sleeper)
task = kb.get_task(conn, crash_tid)
print(f" spawned sleeper pid={task.worker_pid} for {crash_tid}")
# Kill the sleeper forcibly
os.kill(task.worker_pid, 9)
# Give the OS a moment to reap
time.sleep(0.5)
# Simulate next dispatcher tick — should detect the crashed PID
crashed = kb.detect_crashed_workers(conn)
print(f" detect_crashed_workers returned {len(crashed)} crashed (expected 1)")
task = kb.get_task(conn, crash_tid)
runs = kb.list_runs(conn, crash_tid)
print(f" task status={task.status}, runs={[(r.id, r.outcome) for r in runs]}")
if len(crashed) < 1:
print(" ✗ crash NOT detected")
sys.exit(1)
if task.status != "ready":
print(f" ✗ task should be back to ready, got {task.status}")
sys.exit(1)
if runs[0].outcome != "crashed":
print(f" ✗ run outcome should be 'crashed', got {runs[0].outcome!r}")
sys.exit(1)
print("\n ✔ Scenario B: crash detected, task re-queued, run outcome=crashed")
# ============ SCENARIO C: worker log was captured ============
print()
print("=" * 60)
print("C. Worker log captured to disk")
print("=" * 60)
# Scenario A workers wrote to /tmp/hermes_e2e_*/worker_*.log
import glob
logs = glob.glob(os.path.join(home, "worker_*.log"))
print(f" {len(logs)} worker log files")
for lp in logs[:3]:
size = os.path.getsize(lp)
print(f" {os.path.basename(lp)}: {size} bytes")
# Our fake worker is quiet (no prints); size=0 is fine
conn.close()
print("\n✔ ALL E2E SCENARIOS PASS")
if __name__ == "__main__":
main()
+494
View File
@@ -0,0 +1,494 @@
"""Tests for the Kanban tool surface (tools/kanban_tools.py).
Verifies:
- Tools are gated on HERMES_KANBAN_TASK: a normal chat session sees
zero kanban tools in its schema; a worker session sees all seven.
- Each handler's happy path.
- Error paths (missing required args, bad metadata type, etc).
"""
from __future__ import annotations
import json
import os
import pytest
# ---------------------------------------------------------------------------
# Gating
# ---------------------------------------------------------------------------
def test_kanban_tools_hidden_without_env_var(monkeypatch, tmp_path):
"""Normal `hermes chat` sessions (no HERMES_KANBAN_TASK) must have
zero kanban_* tools in their schema."""
monkeypatch.delenv("HERMES_KANBAN_TASK", raising=False)
home = tmp_path / ".hermes"
home.mkdir()
monkeypatch.setenv("HERMES_HOME", str(home))
import tools.kanban_tools # ensure registered
from tools.registry import registry
from toolsets import resolve_toolset
schema = registry.get_definitions(set(resolve_toolset("hermes-cli")), quiet=True)
names = {s["function"].get("name") for s in schema if "function" in s}
kanban = {n for n in names if n and n.startswith("kanban_")}
assert kanban == set(), (
f"kanban tools leaked into normal chat schema: {kanban}"
)
def test_kanban_tools_visible_with_env_var(monkeypatch, tmp_path):
"""Worker sessions (HERMES_KANBAN_TASK set) must have all 7 tools."""
monkeypatch.setenv("HERMES_KANBAN_TASK", "t_fake")
home = tmp_path / ".hermes"
home.mkdir()
monkeypatch.setenv("HERMES_HOME", str(home))
import tools.kanban_tools # ensure registered
from tools.registry import registry
from toolsets import resolve_toolset
schema = registry.get_definitions(set(resolve_toolset("hermes-cli")), quiet=True)
names = {s["function"].get("name") for s in schema if "function" in s}
kanban = {n for n in names if n and n.startswith("kanban_")}
expected = {
"kanban_show", "kanban_complete", "kanban_block", "kanban_heartbeat",
"kanban_comment", "kanban_create", "kanban_link",
}
assert kanban == expected, f"expected {expected}, got {kanban}"
# ---------------------------------------------------------------------------
# Handler happy paths
# ---------------------------------------------------------------------------
@pytest.fixture
def worker_env(monkeypatch, tmp_path):
"""Simulate being a worker: HERMES_HOME isolated, HERMES_KANBAN_TASK set
after we've created the task."""
home = tmp_path / ".hermes"
home.mkdir()
monkeypatch.setenv("HERMES_HOME", str(home))
monkeypatch.setenv("HERMES_PROFILE", "test-worker")
from pathlib import Path as _Path
monkeypatch.setattr(_Path, "home", lambda: tmp_path)
from hermes_cli import kanban_db as kb
kb._INITIALIZED_PATHS.clear()
kb.init_db()
conn = kb.connect()
try:
tid = kb.create_task(conn, title="worker-test", assignee="test-worker")
kb.claim_task(conn, tid)
finally:
conn.close()
monkeypatch.setenv("HERMES_KANBAN_TASK", tid)
return tid
def test_show_defaults_to_env_task_id(worker_env):
from tools import kanban_tools as kt
out = kt._handle_show({})
d = json.loads(out)
assert "task" in d
assert d["task"]["id"] == worker_env
assert d["task"]["status"] == "running"
assert "worker_context" in d
assert "runs" in d
def test_show_explicit_task_id(worker_env):
"""Peek at a different task than the one in env."""
from hermes_cli import kanban_db as kb
conn = kb.connect()
try:
other = kb.create_task(conn, title="other task", assignee="peer")
finally:
conn.close()
from tools import kanban_tools as kt
out = kt._handle_show({"task_id": other})
d = json.loads(out)
assert d["task"]["id"] == other
def test_complete_happy_path(worker_env):
from tools import kanban_tools as kt
out = kt._handle_complete({
"summary": "got the thing done",
"metadata": {"files": 2},
})
d = json.loads(out)
assert d["ok"] is True
assert d["task_id"] == worker_env
# Verify via kernel
from hermes_cli import kanban_db as kb
conn = kb.connect()
try:
run = kb.latest_run(conn, worker_env)
assert run.outcome == "completed"
assert run.summary == "got the thing done"
assert run.metadata == {"files": 2}
finally:
conn.close()
def test_complete_with_result_only(worker_env):
"""`result` alone (without summary) is accepted for legacy compat."""
from tools import kanban_tools as kt
out = kt._handle_complete({"result": "legacy result"})
d = json.loads(out)
assert d["ok"] is True
def test_complete_rejects_no_handoff(worker_env):
from tools import kanban_tools as kt
out = kt._handle_complete({})
assert json.loads(out).get("error"), "should have errored"
def test_complete_rejects_non_dict_metadata(worker_env):
from tools import kanban_tools as kt
out = kt._handle_complete({"summary": "x", "metadata": [1, 2, 3]})
assert json.loads(out).get("error")
def test_block_happy_path(worker_env):
from tools import kanban_tools as kt
out = kt._handle_block({"reason": "need clarification"})
d = json.loads(out)
assert d["ok"] is True
from hermes_cli import kanban_db as kb
conn = kb.connect()
try:
assert kb.get_task(conn, worker_env).status == "blocked"
finally:
conn.close()
def test_block_rejects_empty_reason(worker_env):
from tools import kanban_tools as kt
for bad in ["", " ", None]:
out = kt._handle_block({"reason": bad})
assert json.loads(out).get("error")
def test_heartbeat_happy_path(worker_env):
from tools import kanban_tools as kt
out = kt._handle_heartbeat({"note": "progress"})
d = json.loads(out)
assert d["ok"] is True
def test_heartbeat_without_note(worker_env):
"""note is optional."""
from tools import kanban_tools as kt
out = kt._handle_heartbeat({})
d = json.loads(out)
assert d["ok"] is True
def test_comment_happy_path(worker_env):
from tools import kanban_tools as kt
out = kt._handle_comment({
"task_id": worker_env,
"body": "hello thread",
})
d = json.loads(out)
assert d["ok"] is True
assert d["comment_id"]
from hermes_cli import kanban_db as kb
conn = kb.connect()
try:
comments = kb.list_comments(conn, worker_env)
assert len(comments) == 1
# Author defaults to HERMES_PROFILE env we set in the fixture
assert comments[0].author == "test-worker"
assert comments[0].body == "hello thread"
finally:
conn.close()
def test_comment_rejects_empty_body(worker_env):
from tools import kanban_tools as kt
out = kt._handle_comment({"task_id": worker_env, "body": " "})
assert json.loads(out).get("error")
def test_comment_custom_author(worker_env):
from tools import kanban_tools as kt
out = kt._handle_comment({
"task_id": worker_env, "body": "hi", "author": "custom-bot",
})
assert json.loads(out)["ok"]
from hermes_cli import kanban_db as kb
conn = kb.connect()
try:
comments = kb.list_comments(conn, worker_env)
assert comments[0].author == "custom-bot"
finally:
conn.close()
def test_create_happy_path(worker_env):
from tools import kanban_tools as kt
out = kt._handle_create({
"title": "child task",
"assignee": "peer",
"parents": [worker_env],
})
d = json.loads(out)
assert d["ok"] is True
assert d["task_id"]
assert d["status"] == "todo" # parent isn't done yet
from hermes_cli import kanban_db as kb
conn = kb.connect()
try:
child = kb.get_task(conn, d["task_id"])
assert child.title == "child task"
assert child.assignee == "peer"
finally:
conn.close()
def test_create_rejects_no_title(worker_env):
from tools import kanban_tools as kt
assert json.loads(kt._handle_create({"assignee": "x"})).get("error")
assert json.loads(kt._handle_create({"title": " ", "assignee": "x"})).get("error")
def test_create_rejects_no_assignee(worker_env):
from tools import kanban_tools as kt
assert json.loads(kt._handle_create({"title": "t"})).get("error")
def test_create_rejects_non_list_parents(worker_env):
from tools import kanban_tools as kt
out = kt._handle_create({"title": "t", "assignee": "a", "parents": 42})
assert json.loads(out).get("error")
def test_create_accepts_string_parent(worker_env):
"""Convenience: a single parent id as string is coerced to [id]."""
from tools import kanban_tools as kt
out = kt._handle_create({
"title": "t", "assignee": "a", "parents": worker_env,
})
assert json.loads(out)["ok"]
def test_create_accepts_skills_list(worker_env):
"""Tool writes the per-task skills through to the kernel."""
from tools import kanban_tools as kt
from hermes_cli import kanban_db as kb
out = kt._handle_create({
"title": "skilled",
"assignee": "linguist",
"skills": ["translation", "github-code-review"],
})
d = json.loads(out)
assert d["ok"] is True
with kb.connect() as conn:
task = kb.get_task(conn, d["task_id"])
assert task.skills == ["translation", "github-code-review"]
def test_create_accepts_skills_string(worker_env):
"""Convenience: a single skill name as string is coerced to [name]."""
from tools import kanban_tools as kt
from hermes_cli import kanban_db as kb
out = kt._handle_create({
"title": "one-skill",
"assignee": "a",
"skills": "translation",
})
d = json.loads(out)
assert d["ok"] is True
with kb.connect() as conn:
task = kb.get_task(conn, d["task_id"])
assert task.skills == ["translation"]
def test_create_rejects_non_list_skills(worker_env):
"""skills: 42 must be rejected, not silently dropped."""
from tools import kanban_tools as kt
out = kt._handle_create({
"title": "t", "assignee": "a", "skills": 42,
})
assert json.loads(out).get("error")
def test_link_happy_path(worker_env):
from hermes_cli import kanban_db as kb
conn = kb.connect()
try:
a = kb.create_task(conn, title="A", assignee="x")
b = kb.create_task(conn, title="B", assignee="x")
finally:
conn.close()
from tools import kanban_tools as kt
out = kt._handle_link({"parent_id": a, "child_id": b})
d = json.loads(out)
assert d["ok"] is True
def test_link_rejects_self_reference(worker_env):
from tools import kanban_tools as kt
out = kt._handle_link({"parent_id": worker_env, "child_id": worker_env})
assert json.loads(out).get("error")
def test_link_rejects_missing_args(worker_env):
from tools import kanban_tools as kt
assert json.loads(kt._handle_link({"parent_id": "x"})).get("error")
assert json.loads(kt._handle_link({"child_id": "y"})).get("error")
def test_link_rejects_cycle(worker_env):
"""A → B, then try to link B → A."""
from hermes_cli import kanban_db as kb
conn = kb.connect()
try:
a = kb.create_task(conn, title="A", assignee="x")
b = kb.create_task(conn, title="B", assignee="x", parents=[a])
finally:
conn.close()
from tools import kanban_tools as kt
out = kt._handle_link({"parent_id": b, "child_id": a})
assert json.loads(out).get("error")
# ---------------------------------------------------------------------------
# End-to-end: simulate a full worker lifecycle through the tools
# ---------------------------------------------------------------------------
def test_worker_lifecycle_through_tools(worker_env):
"""Drive the full claim -> heartbeat -> comment -> complete lifecycle
exclusively through the tools, then verify the DB state matches what
the dispatcher/notifier expect."""
from tools import kanban_tools as kt
# 1. show — worker orientation
show = json.loads(kt._handle_show({}))
assert show["task"]["id"] == worker_env
# 2. heartbeat during long op
assert json.loads(kt._handle_heartbeat({"note": "warming up"}))["ok"]
# 3. comment for a future peer
assert json.loads(kt._handle_comment({
"task_id": worker_env,
"body": "note: using stdlib sqlite3 bindings",
}))["ok"]
# 4. spawn a child task for follow-up
child_out = json.loads(kt._handle_create({
"title": "write integration test",
"assignee": "qa",
"parents": [worker_env],
}))
assert child_out["ok"]
# 5. complete with structured handoff
comp = json.loads(kt._handle_complete({
"summary": "implemented + spawned QA follow-up",
"metadata": {"child_task": child_out["task_id"]},
}))
assert comp["ok"]
# Verify final state
from hermes_cli import kanban_db as kb
conn = kb.connect()
try:
parent = kb.get_task(conn, worker_env)
assert parent.status == "done"
assert parent.current_run_id is None
run = kb.latest_run(conn, worker_env)
assert run.outcome == "completed"
assert run.metadata == {"child_task": child_out["task_id"]}
# Child is todo (parent just finished, but recompute_ready may
# have promoted it — complete_task runs recompute internally).
child = kb.get_task(conn, child_out["task_id"])
assert child.status == "ready", (
f"child should be ready after parent done, got {child.status}"
)
# Comment is visible
assert len(kb.list_comments(conn, worker_env)) == 1
# Heartbeat event recorded
hb = [e for e in kb.list_events(conn, worker_env) if e.kind == "heartbeat"]
assert len(hb) == 1
finally:
conn.close()
# ---------------------------------------------------------------------------
# System-prompt guidance injection
# ---------------------------------------------------------------------------
def test_kanban_guidance_not_in_normal_prompt(monkeypatch, tmp_path):
"""A normal chat session (no HERMES_KANBAN_TASK) must NOT have
KANBAN_GUIDANCE in its system prompt."""
monkeypatch.delenv("HERMES_KANBAN_TASK", raising=False)
home = tmp_path / ".hermes"
home.mkdir()
monkeypatch.setenv("HERMES_HOME", str(home))
from pathlib import Path as _P
monkeypatch.setattr(_P, "home", lambda: tmp_path)
from run_agent import AIAgent
a = AIAgent(
api_key="test",
base_url="https://openrouter.ai/api/v1",
quiet_mode=True,
skip_context_files=True,
skip_memory=True,
)
prompt = a._build_system_prompt()
assert "You are a Kanban worker" not in prompt
assert "kanban_show()" not in prompt
def test_kanban_guidance_in_worker_prompt(monkeypatch, tmp_path):
"""A worker session (HERMES_KANBAN_TASK set) MUST have the full
lifecycle guidance in its system prompt."""
monkeypatch.setenv("HERMES_KANBAN_TASK", "t_fake")
home = tmp_path / ".hermes"
home.mkdir()
monkeypatch.setenv("HERMES_HOME", str(home))
from pathlib import Path as _P
monkeypatch.setattr(_P, "home", lambda: tmp_path)
from run_agent import AIAgent
a = AIAgent(
api_key="test",
base_url="https://openrouter.ai/api/v1",
quiet_mode=True,
skip_context_files=True,
skip_memory=True,
)
prompt = a._build_system_prompt()
# Header phrase
assert "You are a Kanban worker" in prompt
# Lifecycle signals
assert "kanban_show()" in prompt
assert "kanban_complete" in prompt
assert "kanban_block" in prompt
assert "kanban_create" in prompt
# Anti-shell guidance
assert "Do not shell out" in prompt or "tools — they work" in prompt
def test_kanban_guidance_prompt_size_bounded(monkeypatch, tmp_path):
"""Sanity: the guidance block is under 4 KB so it doesn't blow
up the cached prompt."""
monkeypatch.setenv("HERMES_KANBAN_TASK", "t_fake")
home = tmp_path / ".hermes"
home.mkdir()
monkeypatch.setenv("HERMES_HOME", str(home))
from pathlib import Path as _P
monkeypatch.setattr(_P, "home", lambda: tmp_path)
from agent.prompt_builder import KANBAN_GUIDANCE
assert 1_500 < len(KANBAN_GUIDANCE) < 4_096, (
f"KANBAN_GUIDANCE is {len(KANBAN_GUIDANCE)} chars — too short (missing?) or too long"
)
@@ -0,0 +1,117 @@
"""Tests for tui_gateway background-review summary delivery.
When the self-improvement background review fires and saves a skill or
memory entry, it calls ``agent.background_review_callback(message)``. In
the CLI that routes through a prompt_toolkit-safe ``_cprint``; in the TUI
there is no print surface, so without a callback wired up the review
writes the change silently. ``_init_session`` attaches a callback that
emits a ``review.summary`` event which Ink renders as a persistent
transcript line.
"""
from __future__ import annotations
import sys
from unittest.mock import MagicMock, patch
import pytest
@pytest.fixture()
def server():
with patch.dict(
"sys.modules",
{
"hermes_constants": MagicMock(
get_hermes_home=MagicMock(return_value="/tmp/hermes_test_review_summary")
),
"hermes_cli.env_loader": MagicMock(),
"hermes_cli.banner": MagicMock(),
"hermes_state": MagicMock(),
},
):
import importlib
mod = importlib.import_module("tui_gateway.server")
yield mod
mod._sessions.clear()
mod._pending.clear()
mod._answers.clear()
mod._methods.clear()
importlib.reload(mod)
def test_init_session_attaches_background_review_callback(server, monkeypatch):
"""After _init_session, agent.background_review_callback is set to a
function that emits 'review.summary' for the session's sid."""
# Neutralize side-effect calls inside _init_session so we're testing
# just the callback wiring.
monkeypatch.setattr(server, "_SlashWorker", lambda *a, **kw: object())
monkeypatch.setattr(server, "_wire_callbacks", lambda sid: None)
monkeypatch.setattr(server, "_notify_session_boundary", lambda *a, **kw: None)
monkeypatch.setattr(server, "_session_info", lambda agent: {"model": "m"})
monkeypatch.setattr(server, "_load_show_reasoning", lambda: False)
monkeypatch.setattr(server, "_load_tool_progress_mode", lambda: "all")
captured_emits: list = []
monkeypatch.setattr(
server,
"_emit",
lambda event, sid, payload=None: captured_emits.append(
(event, sid, payload)
),
)
class FakeAgent:
model = "fake/model"
# Presence of the attribute is all the Python side needs; the real
# AIAgent has it defaulted to None in __init__.
background_review_callback = None
agent = FakeAgent()
server._init_session("sid-abc", "session-key", agent, [], cols=80)
cb = getattr(agent, "background_review_callback", None)
assert callable(cb), (
"_init_session must attach a background_review_callback to the "
"agent so the self-improvement review is visible in the TUI."
)
# Clear the session.info emit captured during _init_session.
captured_emits.clear()
# Invoke the callback the way AIAgent._spawn_background_review would.
cb("💾 Self-improvement review: Skill 'hermes-release' patched")
# Exactly one review.summary event should have been emitted, bound to
# the session id we passed in, carrying the full message text.
matched = [e for e in captured_emits if e[0] == "review.summary"]
assert len(matched) == 1, captured_emits
event, sid, payload = matched[0]
assert sid == "sid-abc"
assert payload == {
"text": "💾 Self-improvement review: Skill 'hermes-release' patched"
}
def test_review_summary_callback_survives_agent_without_attribute(server, monkeypatch):
"""If the agent is a bare object that doesn't allow attribute
assignment (e.g. some stubbed test double), _init_session must not
raise session startup stays robust."""
monkeypatch.setattr(server, "_SlashWorker", lambda *a, **kw: object())
monkeypatch.setattr(server, "_wire_callbacks", lambda sid: None)
monkeypatch.setattr(server, "_notify_session_boundary", lambda *a, **kw: None)
monkeypatch.setattr(server, "_session_info", lambda agent: {"model": "m"})
monkeypatch.setattr(server, "_load_show_reasoning", lambda: False)
monkeypatch.setattr(server, "_load_tool_progress_mode", lambda: "all")
monkeypatch.setattr(server, "_emit", lambda *a, **kw: None)
class LockedAgent:
__slots__ = ("model",)
def __init__(self):
self.model = "fake/model"
# LockedAgent's __slots__ blocks background_review_callback assignment.
server._init_session("sid-x", "key-x", LockedAgent(), [], cols=80)
# If we got here, _init_session swallowed the AttributeError gracefully.
+726
View File
@@ -0,0 +1,726 @@
"""Kanban tools — structured tool-call surface for worker + orchestrator agents.
These tools are only registered into the model's schema when the agent is
running under the dispatcher (env var ``HERMES_KANBAN_TASK`` set). A
normal ``hermes chat`` session sees **zero** kanban tools in its schema.
Why tools instead of just shelling out to ``hermes kanban``?
1. **Backend portability.** A worker whose terminal tool points at Docker
/ Modal / Singularity / SSH would run ``hermes kanban complete ``
inside the container, where ``hermes`` isn't installed and the DB
isn't mounted. Tools run in the agent's Python process, so they
always reach ``~/.hermes/kanban.db`` regardless of terminal backend.
2. **No shell-quoting footguns.** Passing ``--metadata '{"x": [...]}'``
through shlex+argparse is fragile. Structured tool args skip it.
3. **Better errors.** Tool-call failures return structured JSON the
model can reason about, not stderr strings it has to parse.
Humans continue to use the CLI (``hermes kanban ``), the dashboard
(``hermes dashboard``), and the slash command (``/kanban ``) all
three bypass the agent entirely. The tools are ONLY for the worker
agent's handoff back to the kernel.
"""
from __future__ import annotations
import json
import logging
import os
from typing import Any, Optional
from tools.registry import registry, tool_error
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Gating
# ---------------------------------------------------------------------------
def _check_kanban_mode() -> bool:
"""Tools are available iff the current process has ``HERMES_KANBAN_TASK``
set in its env, which the dispatcher sets when spawning a worker.
Humans running ``hermes chat`` see zero kanban tools. Workers spawned
by the kanban dispatcher (gateway-embedded by default) see all seven.
"""
return bool(os.environ.get("HERMES_KANBAN_TASK"))
# ---------------------------------------------------------------------------
# Shared helpers
# ---------------------------------------------------------------------------
def _default_task_id(arg: Optional[str]) -> Optional[str]:
"""Resolve ``task_id`` arg or fall back to the env var the dispatcher set."""
if arg:
return arg
env_tid = os.environ.get("HERMES_KANBAN_TASK")
return env_tid or None
def _connect():
"""Import + connect lazily so the module imports cleanly in non-kanban
contexts (e.g. test rigs that import every tool module)."""
from hermes_cli import kanban_db as kb
return kb, kb.connect()
def _ok(**fields: Any) -> str:
return json.dumps({"ok": True, **fields})
# ---------------------------------------------------------------------------
# Handlers
# ---------------------------------------------------------------------------
def _handle_show(args: dict, **kw) -> str:
"""Read a task's full state: task row, parents, children, comments,
runs (attempt history), and the last N events."""
tid = _default_task_id(args.get("task_id"))
if not tid:
return tool_error(
"task_id is required (or set HERMES_KANBAN_TASK in the env)"
)
try:
kb, conn = _connect()
try:
task = kb.get_task(conn, tid)
if task is None:
return tool_error(f"task {tid} not found")
comments = kb.list_comments(conn, tid)
events = kb.list_events(conn, tid)
runs = kb.list_runs(conn, tid)
parents = kb.parent_ids(conn, tid)
children = kb.child_ids(conn, tid)
def _task_dict(t):
return {
"id": t.id, "title": t.title, "body": t.body,
"assignee": t.assignee, "status": t.status,
"tenant": t.tenant, "priority": t.priority,
"workspace_kind": t.workspace_kind,
"workspace_path": t.workspace_path,
"created_by": t.created_by, "created_at": t.created_at,
"started_at": t.started_at,
"completed_at": t.completed_at,
"result": t.result,
"current_run_id": t.current_run_id,
}
def _run_dict(r):
return {
"id": r.id, "profile": r.profile,
"status": r.status, "outcome": r.outcome,
"summary": r.summary, "error": r.error,
"metadata": r.metadata,
"started_at": r.started_at, "ended_at": r.ended_at,
}
return json.dumps({
"task": _task_dict(task),
"parents": parents,
"children": children,
"comments": [
{"author": c.author, "body": c.body,
"created_at": c.created_at}
for c in comments
],
"events": [
{"kind": e.kind, "payload": e.payload,
"created_at": e.created_at, "run_id": e.run_id}
for e in events[-50:] # cap; full log via CLI
],
"runs": [_run_dict(r) for r in runs],
# Also surface the worker's own context block so the
# agent can include it directly if it wants. This is
# the same string build_worker_context returns to the
# dispatcher at spawn time.
"worker_context": kb.build_worker_context(conn, tid),
})
finally:
conn.close()
except Exception as e:
logger.exception("kanban_show failed")
return tool_error(f"kanban_show: {e}")
def _handle_complete(args: dict, **kw) -> str:
"""Mark the current task done with a structured handoff."""
tid = _default_task_id(args.get("task_id"))
if not tid:
return tool_error(
"task_id is required (or set HERMES_KANBAN_TASK in the env)"
)
summary = args.get("summary")
metadata = args.get("metadata")
result = args.get("result")
if not (summary or result):
return tool_error(
"provide at least one of: summary (preferred), result"
)
if metadata is not None and not isinstance(metadata, dict):
return tool_error(
f"metadata must be an object/dict, got {type(metadata).__name__}"
)
try:
kb, conn = _connect()
try:
ok = kb.complete_task(
conn, tid,
result=result, summary=summary, metadata=metadata,
)
if not ok:
return tool_error(
f"could not complete {tid} (unknown id or already terminal)"
)
run = kb.latest_run(conn, tid)
return _ok(task_id=tid, run_id=run.id if run else None)
finally:
conn.close()
except Exception as e:
logger.exception("kanban_complete failed")
return tool_error(f"kanban_complete: {e}")
def _handle_block(args: dict, **kw) -> str:
"""Transition the task to blocked with a reason a human will read."""
tid = _default_task_id(args.get("task_id"))
if not tid:
return tool_error(
"task_id is required (or set HERMES_KANBAN_TASK in the env)"
)
reason = args.get("reason")
if not reason or not str(reason).strip():
return tool_error("reason is required — explain what input you need")
try:
kb, conn = _connect()
try:
ok = kb.block_task(conn, tid, reason=reason)
if not ok:
return tool_error(
f"could not block {tid} (unknown id or not in "
f"running/ready)"
)
run = kb.latest_run(conn, tid)
return _ok(task_id=tid, run_id=run.id if run else None)
finally:
conn.close()
except Exception as e:
logger.exception("kanban_block failed")
return tool_error(f"kanban_block: {e}")
def _handle_heartbeat(args: dict, **kw) -> str:
"""Signal that the worker is still alive during a long operation."""
tid = _default_task_id(args.get("task_id"))
if not tid:
return tool_error(
"task_id is required (or set HERMES_KANBAN_TASK in the env)"
)
note = args.get("note")
try:
kb, conn = _connect()
try:
ok = kb.heartbeat_worker(conn, tid, note=note)
if not ok:
return tool_error(
f"could not heartbeat {tid} (unknown id or not running)"
)
return _ok(task_id=tid)
finally:
conn.close()
except Exception as e:
logger.exception("kanban_heartbeat failed")
return tool_error(f"kanban_heartbeat: {e}")
def _handle_comment(args: dict, **kw) -> str:
"""Append a comment to a task's thread."""
tid = args.get("task_id")
if not tid:
return tool_error(
"task_id is required (use the current task id if that's what "
"you mean — pulls from env but kept explicit here)"
)
body = args.get("body")
if not body or not str(body).strip():
return tool_error("body is required")
author = args.get("author") or os.environ.get("HERMES_PROFILE") or "worker"
try:
kb, conn = _connect()
try:
cid = kb.add_comment(conn, tid, author=author, body=str(body))
return _ok(task_id=tid, comment_id=cid)
finally:
conn.close()
except Exception as e:
logger.exception("kanban_comment failed")
return tool_error(f"kanban_comment: {e}")
def _handle_create(args: dict, **kw) -> str:
"""Create a child task. Orchestrator workers use this to fan out.
``parents`` can be a list of task ids; dependency-gated promotion
works as usual.
"""
title = args.get("title")
if not title or not str(title).strip():
return tool_error("title is required")
assignee = args.get("assignee")
if not assignee:
return tool_error(
"assignee is required — name the profile that should execute this "
"task (the dispatcher will only spawn tasks with an assignee)"
)
body = args.get("body")
parents = args.get("parents") or []
tenant = args.get("tenant") or os.environ.get("HERMES_TENANT")
priority = args.get("priority")
workspace_kind = args.get("workspace_kind") or "scratch"
workspace_path = args.get("workspace_path")
triage = bool(args.get("triage"))
idempotency_key = args.get("idempotency_key")
max_runtime_seconds = args.get("max_runtime_seconds")
skills = args.get("skills")
if isinstance(skills, str):
# Accept a single skill name as a string for convenience.
skills = [skills]
if skills is not None and not isinstance(skills, (list, tuple)):
return tool_error(
f"skills must be a list of skill names, got {type(skills).__name__}"
)
if isinstance(parents, str):
parents = [parents]
if not isinstance(parents, (list, tuple)):
return tool_error(
f"parents must be a list of task ids, got {type(parents).__name__}"
)
try:
kb, conn = _connect()
try:
new_tid = kb.create_task(
conn,
title=str(title).strip(),
body=body,
assignee=str(assignee),
parents=tuple(parents),
tenant=tenant,
priority=int(priority) if priority is not None else 0,
workspace_kind=str(workspace_kind),
workspace_path=workspace_path,
triage=triage,
idempotency_key=idempotency_key,
max_runtime_seconds=(
int(max_runtime_seconds)
if max_runtime_seconds is not None else None
),
skills=skills,
created_by=os.environ.get("HERMES_PROFILE") or "worker",
)
new_task = kb.get_task(conn, new_tid)
return _ok(
task_id=new_tid,
status=new_task.status if new_task else None,
)
finally:
conn.close()
except Exception as e:
logger.exception("kanban_create failed")
return tool_error(f"kanban_create: {e}")
def _handle_link(args: dict, **kw) -> str:
"""Add a parent→child dependency edge after the fact."""
parent_id = args.get("parent_id")
child_id = args.get("child_id")
if not parent_id or not child_id:
return tool_error("both parent_id and child_id are required")
try:
kb, conn = _connect()
try:
kb.link_tasks(conn, parent_id=parent_id, child_id=child_id)
return _ok(parent_id=parent_id, child_id=child_id)
finally:
conn.close()
except ValueError as e:
# Covers cycle + self-parent rejections
return tool_error(f"kanban_link: {e}")
except Exception as e:
logger.exception("kanban_link failed")
return tool_error(f"kanban_link: {e}")
# ---------------------------------------------------------------------------
# Schemas
# ---------------------------------------------------------------------------
_DESC_TASK_ID_DEFAULT = (
"Task id. If omitted, defaults to HERMES_KANBAN_TASK from the env "
"(the task the dispatcher spawned you to work on)."
)
KANBAN_SHOW_SCHEMA = {
"name": "kanban_show",
"description": (
"Read a task's full state — title, body, assignee, parent task "
"handoffs, your prior attempts on this task if any, comments, "
"and recent events. Use this to (re)orient yourself before "
"starting work, especially on retries. The response includes a "
"pre-formatted ``worker_context`` string suitable for inclusion "
"verbatim in your reasoning."
),
"parameters": {
"type": "object",
"properties": {
"task_id": {
"type": "string",
"description": _DESC_TASK_ID_DEFAULT,
},
},
"required": [],
},
}
KANBAN_COMPLETE_SCHEMA = {
"name": "kanban_complete",
"description": (
"Mark your current task done with a structured handoff for "
"downstream workers and humans. Prefer ``summary`` for a "
"human-readable 1-3 sentence description of what you did; put "
"machine-readable facts in ``metadata`` (changed_files, "
"tests_run, decisions, findings, etc). At least one of "
"``summary`` or ``result`` is required."
),
"parameters": {
"type": "object",
"properties": {
"task_id": {
"type": "string",
"description": _DESC_TASK_ID_DEFAULT,
},
"summary": {
"type": "string",
"description": (
"Human-readable handoff, 1-3 sentences. Appears in "
"Run History on the dashboard and in downstream "
"workers' context."
),
},
"metadata": {
"type": "object",
"description": (
"Free-form dict of structured facts about this "
"attempt — {\"changed_files\": [...], \"tests_run\": 12, "
"\"findings\": [...]}. Surfaced to downstream "
"workers alongside ``summary``."
),
},
"result": {
"type": "string",
"description": (
"Short result log line (legacy field, maps to "
"task.result). Use ``summary`` instead when "
"possible; this exists for compatibility with "
"callers that still set --result on the CLI."
),
},
},
"required": [],
},
}
KANBAN_BLOCK_SCHEMA = {
"name": "kanban_block",
"description": (
"Transition the task to blocked because you need human input "
"to proceed. ``reason`` will be shown to the human on the "
"board and included in context when someone unblocks you. "
"Use for genuine blockers only — don't block on things you can "
"resolve yourself."
),
"parameters": {
"type": "object",
"properties": {
"task_id": {
"type": "string",
"description": _DESC_TASK_ID_DEFAULT,
},
"reason": {
"type": "string",
"description": (
"What you need answered, in one or two sentences. "
"Don't paste the whole conversation; the human has "
"the board and can ask follow-ups via comments."
),
},
},
"required": ["reason"],
},
}
KANBAN_HEARTBEAT_SCHEMA = {
"name": "kanban_heartbeat",
"description": (
"Signal that you're still alive during a long operation "
"(training, encoding, large crawls). Call every few minutes so "
"humans see liveness separately from PID checks. Pure side "
"effect — no work changes."
),
"parameters": {
"type": "object",
"properties": {
"task_id": {
"type": "string",
"description": _DESC_TASK_ID_DEFAULT,
},
"note": {
"type": "string",
"description": (
"Optional short note describing current progress. "
"Shown in the event log."
),
},
},
"required": [],
},
}
KANBAN_COMMENT_SCHEMA = {
"name": "kanban_comment",
"description": (
"Append a comment to a task's thread. Use for durable notes "
"that should outlive this run (questions for the next worker, "
"partial findings, rationale). Ephemeral reasoning doesn't "
"belong here — use your normal response instead."
),
"parameters": {
"type": "object",
"properties": {
"task_id": {
"type": "string",
"description": (
"Task id. Required (may be your own task or "
"another's — comment threads are per-task)."
),
},
"body": {
"type": "string",
"description": "Markdown-supported comment body.",
},
"author": {
"type": "string",
"description": (
"Override author name. Defaults to the current "
"profile (HERMES_PROFILE env)."
),
},
},
"required": ["task_id", "body"],
},
}
KANBAN_CREATE_SCHEMA = {
"name": "kanban_create",
"description": (
"Create a new kanban task, optionally as a child of the current "
"one (pass the current task id in ``parents``). Used by "
"orchestrator workers to fan out — decompose work into child "
"tasks with specific assignees, link them into a pipeline, "
"then complete your own task. The dispatcher picks up the new "
"tasks on its next tick and spawns the assigned profiles."
),
"parameters": {
"type": "object",
"properties": {
"title": {
"type": "string",
"description": "Short task title (required).",
},
"assignee": {
"type": "string",
"description": (
"Profile name that should execute this task "
"(e.g. 'researcher-a', 'reviewer', 'writer'). "
"Required — tasks without an assignee are never "
"dispatched."
),
},
"body": {
"type": "string",
"description": (
"Opening post: full spec, acceptance criteria, "
"links. The assigned worker reads this as part of "
"its context."
),
},
"parents": {
"type": "array",
"items": {"type": "string"},
"description": (
"Parent task ids. The new task stays in 'todo' "
"until every parent reaches 'done'; then it "
"auto-promotes to 'ready'. Typical fan-in: list "
"all the researcher task ids when creating a "
"synthesizer task."
),
},
"tenant": {
"type": "string",
"description": (
"Optional namespace for multi-project isolation. "
"Defaults to HERMES_TENANT env if set."
),
},
"priority": {
"type": "integer",
"description": (
"Dispatcher tiebreaker. Higher = picked sooner "
"when multiple ready tasks share an assignee."
),
},
"workspace_kind": {
"type": "string",
"enum": ["scratch", "dir", "worktree"],
"description": (
"Workspace flavor: 'scratch' (fresh tmp dir, "
"default), 'dir' (shared directory, requires "
"absolute workspace_path), 'worktree' (git worktree)."
),
},
"workspace_path": {
"type": "string",
"description": (
"Absolute path for 'dir' or 'worktree' workspace. "
"Relative paths are rejected at dispatch."
),
},
"triage": {
"type": "boolean",
"description": (
"If true, task lands in 'triage' instead of 'todo' "
"— a specifier profile is expected to flesh out "
"the body before work starts."
),
},
"idempotency_key": {
"type": "string",
"description": (
"If a non-archived task with this key already "
"exists, return that task's id instead of creating "
"a duplicate. Useful for retry-safe automation."
),
},
"max_runtime_seconds": {
"type": "integer",
"description": (
"Per-task runtime cap. When exceeded, the "
"dispatcher SIGTERMs the worker and re-queues the "
"task with outcome='timed_out'."
),
},
"skills": {
"type": "array",
"items": {"type": "string"},
"description": (
"Skill names to force-load into the dispatched "
"worker (in addition to the built-in kanban-worker "
"skill). Use this to pin a task to a specialist "
"context — e.g. ['translation'] for a translation "
"task, ['github-code-review'] for a reviewer task. "
"The names must match skills installed on the "
"assignee's profile."
),
},
},
"required": ["title", "assignee"],
},
}
KANBAN_LINK_SCHEMA = {
"name": "kanban_link",
"description": (
"Add a parent→child dependency edge after both tasks already "
"exist. The child won't promote to 'ready' until all parents "
"are 'done'. Cycles and self-links are rejected."
),
"parameters": {
"type": "object",
"properties": {
"parent_id": {"type": "string", "description": "Parent task id."},
"child_id": {"type": "string", "description": "Child task id."},
},
"required": ["parent_id", "child_id"],
},
}
# ---------------------------------------------------------------------------
# Registration
# ---------------------------------------------------------------------------
registry.register(
name="kanban_show",
toolset="kanban",
schema=KANBAN_SHOW_SCHEMA,
handler=_handle_show,
check_fn=_check_kanban_mode,
emoji="📋",
)
registry.register(
name="kanban_complete",
toolset="kanban",
schema=KANBAN_COMPLETE_SCHEMA,
handler=_handle_complete,
check_fn=_check_kanban_mode,
emoji="",
)
registry.register(
name="kanban_block",
toolset="kanban",
schema=KANBAN_BLOCK_SCHEMA,
handler=_handle_block,
check_fn=_check_kanban_mode,
emoji="",
)
registry.register(
name="kanban_heartbeat",
toolset="kanban",
schema=KANBAN_HEARTBEAT_SCHEMA,
handler=_handle_heartbeat,
check_fn=_check_kanban_mode,
emoji="💓",
)
registry.register(
name="kanban_comment",
toolset="kanban",
schema=KANBAN_COMMENT_SCHEMA,
handler=_handle_comment,
check_fn=_check_kanban_mode,
emoji="💬",
)
registry.register(
name="kanban_create",
toolset="kanban",
schema=KANBAN_CREATE_SCHEMA,
handler=_handle_create,
check_fn=_check_kanban_mode,
emoji="",
)
registry.register(
name="kanban_link",
toolset="kanban",
schema=KANBAN_LINK_SCHEMA,
handler=_handle_link,
check_fn=_check_kanban_mode,
emoji="🔗",
)
+23
View File
@@ -60,6 +60,11 @@ _HERMES_CORE_TOOLS = [
"send_message",
# Home Assistant smart home control (gated on HASS_TOKEN via check_fn)
"ha_list_entities", "ha_get_state", "ha_list_services", "ha_call_service",
# Kanban multi-agent coordination — only in schema when the agent is
# spawned as a kanban worker (HERMES_KANBAN_TASK env set), otherwise
# zero schema footprint. Gated via check_fn in tools/kanban_tools.py.
"kanban_show", "kanban_complete", "kanban_block", "kanban_heartbeat",
"kanban_comment", "kanban_create", "kanban_link",
]
@@ -202,6 +207,24 @@ TOOLSETS = {
"includes": []
},
"kanban": {
"description": (
"Kanban multi-agent coordination — only active when the agent "
"is spawned by the kanban dispatcher (HERMES_KANBAN_TASK env "
"set). The dispatcher runs inside the gateway by default; see "
"`kanban.dispatch_in_gateway` in config.yaml. Lets workers mark "
"tasks done with structured handoffs, block for human input, "
"heartbeat during long ops, comment on threads, and (for "
"orchestrators) fan out into child tasks."
),
"tools": [
"kanban_show", "kanban_complete", "kanban_block",
"kanban_heartbeat", "kanban_comment",
"kanban_create", "kanban_link",
],
"includes": [],
},
"discord": {
"description": "Discord read and participate tools (fetch messages, search members, create threads)",
"tools": ["discord"],
+15
View File
@@ -1806,6 +1806,21 @@ def _init_session(sid: str, key: str, agent, history: list, cols: int = 80):
load_permanent_allowlist()
except Exception:
pass
# Surface the self-improvement background review's "💾 …" summary as a
# review.summary event so Ink can render it as a persistent system line
# in the transcript. In the CLI path this message is printed via
# prompt_toolkit; the TUI has no equivalent print surface, so without
# this callback the review would write the skill/memory change silently.
try:
agent.background_review_callback = (
lambda message, _sid=sid: _emit(
"review.summary", _sid, {"text": str(message)}
)
)
except Exception:
# Bare AIAgents that don't expose the attribute (unlikely, but keep
# session startup resilient).
pass
_wire_callbacks(sid)
_notify_session_boundary("on_session_reset", key)
_emit("session.info", sid, _session_info(agent))
@@ -96,3 +96,41 @@ describe('mouse wheel modifier decoding', () => {
expect(key).toMatchObject({ name: 'wheelup', meta: true })
})
})
describe('fragmented SGR mouse recovery', () => {
it('re-synthesizes bracket-only SGR mouse tails as mouse events', () => {
const [[mouse]] = parseMultipleKeypresses(INITIAL_STATE, '[<35;159;11M')
expect(mouse).toMatchObject({ kind: 'mouse', button: 35, col: 159, row: 11, action: 'press' })
})
it('re-synthesizes angle-only SGR mouse tails as mouse events', () => {
const [[mouse]] = parseMultipleKeypresses(INITIAL_STATE, '<35;159;11M')
expect(mouse).toMatchObject({ kind: 'mouse', button: 35, col: 159, row: 11, action: 'press' })
})
it('re-synthesizes degraded SGR mouse bursts without leaking prompt text', () => {
const [events] = parseMultipleKeypresses(INITIAL_STATE, '5;142;11M<35;159;11M35;124;26M35;119;26Mtyped')
expect(events.slice(0, 4)).toEqual([
expect.objectContaining({ kind: 'mouse', button: 5, col: 142, row: 11 }),
expect.objectContaining({ kind: 'mouse', button: 35, col: 159, row: 11 }),
expect.objectContaining({ kind: 'mouse', button: 35, col: 124, row: 26 }),
expect.objectContaining({ kind: 'mouse', button: 35, col: 119, row: 26 })
])
expect(events[4]).toMatchObject({ kind: 'key', sequence: 'typed' })
})
it('keeps isolated semicolon text that only resembles a prefixless mouse report', () => {
const [[key]] = parseMultipleKeypresses(INITIAL_STATE, 'see 1;2;3M for details')
expect(key).toMatchObject({ kind: 'key', sequence: 'see 1;2;3M for details' })
})
it('does not match prefixless fragments inside longer digit runs', () => {
const [[key]] = parseMultipleKeypresses(INITIAL_STATE, '1234;56;78M9;10;11M')
expect(key).toMatchObject({ kind: 'key', sequence: '1234;56;78M9;10;11M' })
})
})
@@ -63,6 +63,7 @@ const XTVERSION_RE = /^\x1bP>\|(.*?)(?:\x07|\x1b\\)$/s
// Button 32=left-drag (0x20 | motion-bit). Plain 0/1/2 = left/mid/right click.
// eslint-disable-next-line no-control-regex
const SGR_MOUSE_RE = /^\x1b\[<(\d+);(\d+);(\d+)([Mm])$/
const SGR_MOUSE_FRAGMENT_RE = /(?<!\d)(?:\[<|<)?(?:[0-9]|[1-9][0-9]|1\d{2}|2[0-4]\d|25[0-5]);\d+;\d+[Mm]/g
function createPasteKey(content: string): ParsedKey {
return {
@@ -267,23 +268,22 @@ export function parseMultipleKeypresses(
} else if (token.type === 'text') {
if (inPaste) {
pasteBuffer += token.value
} else if (/^\[<\d+;\d+;\d+[Mm]$/.test(token.value) || /^\[M[\x60-\x7f][\x20-\uffff]{2}$/.test(token.value)) {
// Orphaned SGR/X10 mouse tail (fullscreen only — mouse tracking is off
// otherwise). A heavy render blocked the event loop past App's 50ms
// flush timer, so the buffered ESC was flushed as a lone Escape and
// the continuation `[<btn;col;rowM` arrived as text. Re-synthesize
// with the ESC prefix so the scroll event still fires instead of
// leaking into the prompt. The spurious Escape is gone; App.tsx's
// readableLength check prevents it. The X10 Cb slot is narrowed to
// the wheel range [\x60-\x7f] (0x40|modifiers + 32) — a full [\x20-]
// range would match typed input like `[MAX]` batched into one read
// and silently drop it as a phantom click. Click/drag orphans leak
// as visible garbage instead; deletable garbage beats silent loss.
const resynthesized = '\x1b' + token.value
const mouse = parseMouseEvent(resynthesized)
keys.push(mouse ?? parseKeypress(resynthesized))
} else {
keys.push(parseKeypress(token.value))
const mouseFragments = parseTextWithSgrMouseFragments(token.value)
if (mouseFragments) {
keys.push(...mouseFragments)
} else if (/^\[M[\x60-\x7f][\x20-\uffff]{2}$/.test(token.value)) {
// Orphaned X10 wheel tail (fullscreen only — mouse tracking is off
// otherwise). A heavy render blocked the event loop past App's 50ms
// flush timer, so the buffered ESC was flushed as a lone Escape and
// the continuation arrived as text. Re-synthesize with ESC so the
// scroll event still fires instead of leaking into the prompt.
const resynthesized = '\x1b' + token.value
keys.push(parseKeypress(resynthesized))
} else {
keys.push(parseKeypress(token.value))
}
}
}
}
@@ -625,6 +625,77 @@ function parseMouseEvent(s: string): ParsedMouse | null {
}
}
function normalizeSgrMouseFragment(fragment: string): string {
if (fragment.startsWith('[<')) {
return `\x1b${fragment}`
}
if (fragment.startsWith('<')) {
return `\x1b[${fragment}`
}
return `\x1b[<${fragment}`
}
function parseSgrMouseFragment(fragment: string): ParsedInput {
const sequence = normalizeSgrMouseFragment(fragment)
return parseMouseEvent(sequence) ?? parseKeypress(sequence)
}
function parseTextWithSgrMouseFragments(text: string): ParsedInput[] | null {
SGR_MOUSE_FRAGMENT_RE.lastIndex = 0
const matches = [...text.matchAll(SGR_MOUSE_FRAGMENT_RE)]
if (matches.length === 0) {
return null
}
const parsed: ParsedInput[] = []
let cursor = 0
let consumedAny = false
for (let i = 0; i < matches.length;) {
const first = matches[i]!
const run: RegExpMatchArray[] = [first]
let runEnd = first.index! + first[0].length
i++
while (i < matches.length && matches[i]!.index === runEnd) {
run.push(matches[i]!)
runEnd = matches[i]!.index! + matches[i]![0].length
i++
}
const hasExplicitMousePrefix = run.some(match => match[0].startsWith('[<') || match[0].startsWith('<'))
const isFragmentBurst = run.length > 1
if (!hasExplicitMousePrefix && !isFragmentBurst) {
continue
}
if (first.index! > cursor) {
parsed.push(parseKeypress(text.slice(cursor, first.index!)))
}
for (const match of run) {
parsed.push(parseSgrMouseFragment(match[0]))
}
cursor = runEnd
consumedAny = true
}
if (!consumedAny) {
return null
}
if (cursor < text.length) {
parsed.push(parseKeypress(text.slice(cursor)))
}
return parsed
}
function parseKeypress(s: string = ''): ParsedKey {
let parts
@@ -132,6 +132,33 @@ describe('createGatewayEventHandler', () => {
expect(ctx.system.sys).toHaveBeenCalledWith('compressing 968 messages (~123,400 tok)…')
})
it('surfaces self-improvement review summaries as a persistent system line', () => {
const appended: Msg[] = []
const ctx = buildCtx(appended)
const onEvent = createGatewayEventHandler(ctx)
onEvent({
payload: { text: "💾 Self-improvement review: Skill 'hermes-release' patched" },
type: 'review.summary'
} as any)
expect(ctx.system.sys).toHaveBeenCalledWith(
"💾 Self-improvement review: Skill 'hermes-release' patched"
)
})
it('ignores review.summary events with empty or missing text', () => {
const appended: Msg[] = []
const ctx = buildCtx(appended)
const onEvent = createGatewayEventHandler(ctx)
onEvent({ payload: { text: '' }, type: 'review.summary' } as any)
onEvent({ payload: { text: ' ' }, type: 'review.summary' } as any)
onEvent({ payload: undefined, type: 'review.summary' } as any)
expect(ctx.system.sys).not.toHaveBeenCalled()
})
it('clears the visible todo list when the todo tool returns an empty list', () => {
const appended: Msg[] = []
const todos = [{ content: 'Boil water', id: 'boil', status: 'in_progress' }]
+9 -1
View File
@@ -3,11 +3,19 @@ import { describe, expect, it, vi } from 'vitest'
import { resetTerminalModes, TERMINAL_MODE_RESET } from '../lib/terminalModes.js'
describe('terminal mode reset', () => {
it('includes the sticky input modes Hermes enables', () => {
it('includes common sticky input modes', () => {
expect(TERMINAL_MODE_RESET).toContain('\x1b[0\'z')
expect(TERMINAL_MODE_RESET).toContain('\x1b[0\'{')
expect(TERMINAL_MODE_RESET).toContain('\x1b[?2029l')
expect(TERMINAL_MODE_RESET).toContain('\x1b[?1016l')
expect(TERMINAL_MODE_RESET).toContain('\x1b[?1015l')
expect(TERMINAL_MODE_RESET).toContain('\x1b[?1006l')
expect(TERMINAL_MODE_RESET).toContain('\x1b[?1005l')
expect(TERMINAL_MODE_RESET).toContain('\x1b[?1003l')
expect(TERMINAL_MODE_RESET).toContain('\x1b[?1002l')
expect(TERMINAL_MODE_RESET).toContain('\x1b[?1001l')
expect(TERMINAL_MODE_RESET).toContain('\x1b[?1000l')
expect(TERMINAL_MODE_RESET).toContain('\x1b[?9l')
expect(TERMINAL_MODE_RESET).toContain('\x1b[?1004l')
expect(TERMINAL_MODE_RESET).toContain('\x1b[?2004l')
expect(TERMINAL_MODE_RESET).toContain('\x1b[?1049l')
@@ -510,6 +510,20 @@ export function createGatewayEventHandler(ctx: GatewayEventHandlerContext): (ev:
return
case 'review.summary': {
// Self-improvement background review emitted a persistent summary
// of what it saved to memory/skills. Surface it as a system line
// in the transcript so it never gets lost to a transient status
// flash. Python-side already formats it as "💾 Self-improvement
// review: …".
const text = String(ev.payload?.text ?? '').trim()
if (text) {
sys(text)
}
return
}
case 'subagent.spawn_requested':
// Child built but not yet running (waiting on ThreadPoolExecutor slot).
// Preserve completed state if a later event races in before this one.
+1
View File
@@ -493,6 +493,7 @@ export type GatewayEvent =
| { payload: { request_id: string }; session_id?: string; type: 'sudo.request' }
| { payload: { env_var: string; prompt: string; request_id: string }; session_id?: string; type: 'secret.request' }
| { payload: { task_id: string; text: string }; session_id?: string; type: 'background.complete' }
| { payload?: { text?: string }; session_id?: string; type: 'review.summary' }
| { payload: SubagentEventPayload; session_id?: string; type: 'subagent.spawn_requested' }
| { payload: SubagentEventPayload; session_id?: string; type: 'subagent.start' }
| { payload: SubagentEventPayload; session_id?: string; type: 'subagent.thinking' }
+8
View File
@@ -1,10 +1,18 @@
import { writeSync } from 'node:fs'
export const TERMINAL_MODE_RESET =
'\x1b[0\'z' + // DEC locator reporting
'\x1b[0\'{' + // selectable locator events
'\x1b[?2029l' + // passive mouse
'\x1b[?1016l' + // SGR-pixels mouse
'\x1b[?1015l' + // urxvt decimal mouse
'\x1b[?1006l' + // SGR mouse
'\x1b[?1005l' + // UTF-8 extended mouse
'\x1b[?1003l' + // any-motion mouse
'\x1b[?1002l' + // button-motion mouse
'\x1b[?1001l' + // highlight mouse
'\x1b[?1000l' + // click mouse
'\x1b[?9l' + // X10 mouse
'\x1b[?1004l' + // focus events
'\x1b[?2004l' + // bracketed paste
'\x1b[?1049l' + // alternate screen
+10 -7
View File
@@ -38,17 +38,16 @@ import {
Sparkles,
Star,
Terminal,
Users,
Wrench,
X,
Zap,
} from "lucide-react";
import {
Button,
ListItem,
SelectionSwitcher,
Spinner,
Typography,
} from "@nous-research/ui";
import { Button } from "@nous-research/ui/ui/components/button";
import { ListItem } from "@nous-research/ui/ui/components/list-item";
import { SelectionSwitcher } from "@nous-research/ui/ui/components/selection-switcher";
import { Spinner } from "@nous-research/ui/ui/components/spinner";
import { Typography } from "@/components/NouiTypography";
import { cn } from "@/lib/utils";
import { Backdrop } from "@/components/Backdrop";
import { SidebarFooter } from "@/components/SidebarFooter";
@@ -64,6 +63,7 @@ import LogsPage from "@/pages/LogsPage";
import AnalyticsPage from "@/pages/AnalyticsPage";
import ModelsPage from "@/pages/ModelsPage";
import CronPage from "@/pages/CronPage";
import ProfilesPage from "@/pages/ProfilesPage";
import SkillsPage from "@/pages/SkillsPage";
import ChatPage from "@/pages/ChatPage";
import { LanguageSwitcher } from "@/components/LanguageSwitcher";
@@ -102,6 +102,7 @@ const BUILTIN_ROUTES_CORE: Record<string, ComponentType> = {
"/logs": LogsPage,
"/cron": CronPage,
"/skills": SkillsPage,
"/profiles": ProfilesPage,
"/config": ConfigPage,
"/env": EnvPage,
"/docs": DocsPage,
@@ -137,6 +138,7 @@ const BUILTIN_NAV_REST: NavItem[] = [
{ path: "/logs", labelKey: "logs", label: "Logs", icon: FileText },
{ path: "/cron", labelKey: "cron", label: "Cron", icon: Clock },
{ path: "/skills", labelKey: "skills", label: "Skills", icon: Package },
{ path: "/profiles", labelKey: "profiles", label: "Profiles", icon: Users },
{ path: "/config", labelKey: "config", label: "Config", icon: Settings },
{ path: "/env", labelKey: "keys", label: "Keys", icon: KeyRound },
{
@@ -163,6 +165,7 @@ const ICON_MAP: Record<string, ComponentType<{ className?: string }>> = {
Globe,
Database,
Shield,
Users,
Wrench,
Zap,
Heart,
+2 -1
View File
@@ -1,4 +1,5 @@
import { Select, SelectOption, Switch } from "@nous-research/ui";
import { Select, SelectOption } from "@nous-research/ui/ui/components/select";
import { Switch } from "@nous-research/ui/ui/components/switch";
import { Input } from "@/components/ui/input";
import { Label } from "@/components/ui/label";
+2 -2
View File
@@ -23,8 +23,8 @@
* terminal pane keeps working unimpaired.
*/
import { Button } from "@nous-research/ui";
import { Badge } from "@nous-research/ui";
import { Button } from "@nous-research/ui/ui/components/button";
import { Badge } from "@nous-research/ui/ui/components/badge";
import { Card } from "@/components/ui/card";
import { ModelPickerDialog } from "@/components/ModelPickerDialog";
+2 -1
View File
@@ -1,4 +1,5 @@
import { Button, Typography } from "@nous-research/ui";
import { Button } from "@nous-research/ui/ui/components/button";
import { Typography } from "@/components/NouiTypography";
import { useI18n } from "@/i18n/context";
/**
+1 -1
View File
@@ -1,6 +1,6 @@
import { useEffect, useRef, useState } from "react";
import { Brain, Eye, Gauge, Lightbulb, Wrench } from "lucide-react";
import { Spinner } from "@nous-research/ui";
import { Spinner } from "@nous-research/ui/ui/components/spinner";
import { api } from "@/lib/api";
import type { ModelInfoResponse } from "@/lib/api";
import { formatTokenCount } from "@/lib/format";
+3 -1
View File
@@ -1,4 +1,6 @@
import { Button, ListItem, Spinner } from "@nous-research/ui";
import { Button } from "@nous-research/ui/ui/components/button";
import { ListItem } from "@nous-research/ui/ui/components/list-item";
import { Spinner } from "@nous-research/ui/ui/components/spinner";
import { Input } from "@/components/ui/input";
import type { GatewayClient } from "@/lib/gatewayClient";
import { Check, Search, X } from "lucide-react";
+63
View File
@@ -0,0 +1,63 @@
import { forwardRef, type ElementType, type HTMLAttributes, type ReactNode } from "react";
import { cn } from "@/lib/utils";
type TypographyProps = HTMLAttributes<HTMLElement> & {
as?: ElementType;
children?: ReactNode;
compressed?: boolean;
courier?: boolean;
expanded?: boolean;
mondwest?: boolean;
mono?: boolean;
sans?: boolean;
variant?: "sm" | "md" | "lg" | "xl";
};
const variantClasses: Record<NonNullable<TypographyProps["variant"]>, string> = {
sm: "leading-[1.4] text-[.9375rem] tracking-[0.1875rem]",
md: "text-[2.625rem] leading-[1] tracking-[0.0525rem]",
lg: "text-[2.625rem] leading-[1] tracking-[0.0525rem]",
xl: "text-[4.5rem] leading-[1] tracking-[0.135rem]",
};
export const Typography = forwardRef<HTMLElement, TypographyProps>(function Typography(
{
as: Component = "span",
className,
compressed,
courier,
expanded,
mondwest,
mono,
sans,
variant,
...props
},
ref,
) {
const hasFontVariant = compressed || courier || expanded || mondwest || mono || sans;
return (
<Component
className={cn(
compressed && "font-compressed",
courier && "font-courier",
expanded && "font-expanded",
mondwest && "font-mondwest tracking-[0.1875rem]",
mono && "font-mono",
(!hasFontVariant || sans) && "font-sans",
variant && variantClasses[variant],
className,
)}
ref={ref}
{...props}
/>
);
});
export const H2 = forwardRef<HTMLHeadingElement, Omit<TypographyProps, "as">>(function H2(
{ className, variant = "lg", ...props },
ref,
) {
return <Typography as="h2" className={cn("font-bold", className)} variant={variant} ref={ref} {...props} />;
});
+4 -1
View File
@@ -1,6 +1,9 @@
import { useEffect, useRef, useState } from "react";
import { ExternalLink, X, Check } from "lucide-react";
import { Button, CopyButton, H2, Spinner } from "@nous-research/ui";
import { Button } from "@nous-research/ui/ui/components/button";
import { CopyButton } from "@nous-research/ui/ui/components/command-block";
import { Spinner } from "@nous-research/ui/ui/components/spinner";
import { H2 } from "@/components/NouiTypography";
import { api, type OAuthProvider, type OAuthStartResponse } from "@/lib/api";
import { Input } from "@/components/ui/input";
import { useI18n } from "@/i18n";
+4 -2
View File
@@ -9,7 +9,9 @@ import {
LogIn,
} from "lucide-react";
import { api, type OAuthProvider } from "@/lib/api";
import { Button, CopyButton, Spinner } from "@nous-research/ui";
import { Button } from "@nous-research/ui/ui/components/button";
import { CopyButton } from "@nous-research/ui/ui/components/command-block";
import { Spinner } from "@nous-research/ui/ui/components/spinner";
import {
Card,
CardContent,
@@ -17,7 +19,7 @@ import {
CardHeader,
CardTitle,
} from "@/components/ui/card";
import { Badge } from "@nous-research/ui";
import { Badge } from "@nous-research/ui/ui/components/badge";
import { OAuthLoginModal } from "@/components/OAuthLoginModal";
import { useI18n } from "@/i18n";
+1 -1
View File
@@ -1,7 +1,7 @@
import { AlertTriangle, Radio, Wifi, WifiOff } from "lucide-react";
import type { PlatformStatus } from "@/lib/api";
import { isoTimeAgo } from "@/lib/utils";
import { Badge } from "@nous-research/ui";
import { Badge } from "@nous-research/ui/ui/components/badge";
import { Card, CardContent, CardHeader, CardTitle } from "@/components/ui/card";
import { useI18n } from "@/i18n";
+1 -1
View File
@@ -1,4 +1,4 @@
import { Typography } from "@nous-research/ui";
import { Typography } from "@/components/NouiTypography";
import { useSidebarStatus } from "@/hooks/useSidebarStatus";
import { cn } from "@/lib/utils";
import { useI18n } from "@/i18n";
+1 -1
View File
@@ -1,5 +1,5 @@
import type { GatewayClient } from "@/lib/gatewayClient";
import { ListItem } from "@nous-research/ui";
import { ListItem } from "@nous-research/ui/ui/components/list-item";
import { ChevronRight } from "lucide-react";
import {
forwardRef,
+3 -1
View File
@@ -1,6 +1,8 @@
import { useCallback, useEffect, useRef, useState } from "react";
import { Palette, Check } from "lucide-react";
import { Button, ListItem, Typography } from "@nous-research/ui";
import { Button } from "@nous-research/ui/ui/components/button";
import { ListItem } from "@nous-research/ui/ui/components/list-item";
import { Typography } from "@/components/NouiTypography";
import { BUILTIN_THEMES, useTheme } from "@/themes";
import { useI18n } from "@/i18n";
import { cn } from "@/lib/utils";
+1 -1
View File
@@ -1,4 +1,4 @@
import { ListItem } from "@nous-research/ui";
import { ListItem } from "@nous-research/ui/ui/components/list-item";
import {
AlertCircle,
Check,
+1 -1
View File
@@ -1,7 +1,7 @@
import { useEffect, useRef } from "react";
import { createPortal } from "react-dom";
import { AlertTriangle } from "lucide-react";
import { Button } from "@nous-research/ui";
import { Button } from "@nous-research/ui/ui/components/button";
import { cn } from "@/lib/utils";
export function ConfirmDialog({
+33
View File
@@ -75,6 +75,7 @@ export const en: Translations = {
keys: "Keys",
logs: "Logs",
models: "Models",
profiles: "profiles : multi agents",
sessions: "Sessions",
skills: "Skills",
},
@@ -223,6 +224,38 @@ export const en: Translations = {
},
},
profiles: {
newProfile: "New Profile",
name: "Name",
namePlaceholder: "e.g. coder, writer, etc.",
nameRequired: "Name is required",
nameRule:
"Lowercase letters, digits, _ and - only; must start with a letter or digit; up to 64 characters.",
invalidName: "Invalid profile name",
cloneFromDefault: "Clone config from default profile",
allProfiles: "Profiles",
noProfiles: "No profiles found.",
defaultBadge: "default",
hasEnv: "env",
model: "Model",
skills: "Skills",
rename: "Rename",
editSoul: "Edit SOUL.md",
soulSection: "SOUL.md (personality / system prompt)",
soulPlaceholder: "# How this agent should behave…",
saveSoul: "Save SOUL",
soulSaved: "SOUL.md saved",
openInTerminal: "Copy CLI command",
commandCopied: "Copied to clipboard",
copyFailed: "Could not copy",
confirmDeleteTitle: "Delete profile?",
confirmDeleteMessage:
"This permanently deletes profile '{name}' — config, keys, memories, sessions, skills, cron jobs. Cannot be undone.",
created: "Created",
deleted: "Deleted",
renamed: "Renamed",
},
skills: {
title: "Skills",
searchPlaceholder: "Search skills and toolsets...",
+32
View File
@@ -75,6 +75,7 @@ export interface Translations {
keys: string;
logs: string;
models: string;
profiles: string;
sessions: string;
skills: string;
};
@@ -227,6 +228,37 @@ export interface Translations {
};
};
// ── Profiles page ──
profiles: {
newProfile: string;
name: string;
namePlaceholder: string;
nameRequired: string;
nameRule: string;
invalidName: string;
cloneFromDefault: string;
allProfiles: string;
noProfiles: string;
defaultBadge: string;
hasEnv: string;
model: string;
skills: string;
rename: string;
editSoul: string;
soulSection: string;
soulPlaceholder: string;
saveSoul: string;
soulSaved: string;
openInTerminal: string;
commandCopied: string;
copyFailed: string;
confirmDeleteTitle: string;
confirmDeleteMessage: string;
created: string;
deleted: string;
renamed: string;
};
// ── Skills page ──
skills: {
title: string;
+33
View File
@@ -74,6 +74,7 @@ export const zh: Translations = {
keys: "密钥",
logs: "日志",
models: "模型",
profiles: "多Agent配置",
sessions: "会话",
skills: "技能",
},
@@ -220,6 +221,38 @@ export const zh: Translations = {
},
},
profiles: {
newProfile: "新建多Agent配置",
name: "名称",
namePlaceholder: "例如:coder, writer 等",
nameRequired: "名称必填",
nameRule:
"仅允许小写字母、数字、下划线和短横线;首字符必须是字母或数字;最多 64 个字符。",
invalidName: "多Agent配置名称非法",
cloneFromDefault: "从默认多Agent配置克隆配置",
allProfiles: "多Agent配置列表",
noProfiles: "暂无多Agent配置。",
defaultBadge: "默认",
hasEnv: "已配置 env",
model: "模型",
skills: "技能",
rename: "重命名",
editSoul: "编辑 SOUL.md",
soulSection: "SOUL.md(人格 / 系统提示词)",
soulPlaceholder: "# 这个代理应当如何工作……",
saveSoul: "保存 SOUL",
soulSaved: "SOUL.md 已保存",
openInTerminal: "复制 CLI 命令",
commandCopied: "已复制到剪贴板",
copyFailed: "复制失败",
confirmDeleteTitle: "删除多Agent配置?",
confirmDeleteMessage:
"将永久删除多Agent配置 '{name}' — 包括配置、密钥、记忆、会话、技能、定时任务。此操作无法撤销。",
created: "已创建",
deleted: "已删除",
renamed: "已重命名",
},
skills: {
title: "技能",
searchPlaceholder: "搜索技能和工具集...",
+51
View File
@@ -132,6 +132,47 @@ export const api = {
deleteCronJob: (id: string) =>
fetchJSON<{ ok: boolean }>(`/api/cron/jobs/${id}`, { method: "DELETE" }),
// Profiles (minimal)
getProfiles: () =>
fetchJSON<{ profiles: ProfileInfo[] }>("/api/profiles"),
createProfile: (body: { name: string; clone_from_default: boolean }) =>
fetchJSON<{ ok: boolean; name: string; path: string }>("/api/profiles", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify(body),
}),
renameProfile: (name: string, newName: string) =>
fetchJSON<{ ok: boolean; name: string; path: string }>(
`/api/profiles/${encodeURIComponent(name)}`,
{
method: "PATCH",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ new_name: newName }),
},
),
deleteProfile: (name: string) =>
fetchJSON<{ ok: boolean }>(
`/api/profiles/${encodeURIComponent(name)}`,
{ method: "DELETE" },
),
getProfileSetupCommand: (name: string) =>
fetchJSON<{ command: string }>(
`/api/profiles/${encodeURIComponent(name)}/setup-command`,
),
getProfileSoul: (name: string) =>
fetchJSON<{ content: string; exists: boolean }>(
`/api/profiles/${encodeURIComponent(name)}/soul`,
),
updateProfileSoul: (name: string, content: string) =>
fetchJSON<{ ok: boolean }>(
`/api/profiles/${encodeURIComponent(name)}/soul`,
{
method: "PUT",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ content }),
},
),
// Skills & Toolsets
getSkills: () => fetchJSON<SkillInfo[]>("/api/skills"),
toggleSkill: (name: string, enabled: boolean) =>
@@ -380,6 +421,16 @@ export interface AnalyticsResponse {
};
}
export interface ProfileInfo {
name: string;
path: string;
is_default: boolean;
model: string | null;
provider: string | null;
has_env: boolean;
skill_count: number;
}
export interface ModelsAnalyticsModelEntry {
model: string;
provider: string;
+4 -2
View File
@@ -8,9 +8,11 @@ import type {
AnalyticsSkillEntry,
} from "@/lib/api";
import { timeAgo } from "@/lib/utils";
import { Button, Spinner, Stats } from "@nous-research/ui";
import { Button } from "@nous-research/ui/ui/components/button";
import { Spinner } from "@nous-research/ui/ui/components/spinner";
import { Stats } from "@nous-research/ui/ui/components/stats";
import { Card, CardContent, CardHeader, CardTitle } from "@/components/ui/card";
import { Badge } from "@nous-research/ui";
import { Badge } from "@nous-research/ui/ui/components/badge";
import { usePageHeader } from "@/contexts/usePageHeader";
import { useI18n } from "@/i18n";
import { PluginSlot } from "@/plugins";
+2 -1
View File
@@ -22,7 +22,8 @@ import { WebLinksAddon } from "@xterm/addon-web-links";
import { WebglAddon } from "@xterm/addon-webgl";
import { Terminal } from "@xterm/xterm";
import "@xterm/xterm/css/xterm.css";
import { Button, Typography } from "@nous-research/ui";
import { Button } from "@nous-research/ui/ui/components/button";
import { Typography } from "@/components/NouiTypography";
import { cn } from "@/lib/utils";
import { Copy, PanelRight, X } from "lucide-react";
import { useCallback, useEffect, useMemo, useRef, useState } from "react";
+4 -2
View File
@@ -33,10 +33,12 @@ import { getNestedValue, setNestedValue } from "@/lib/nested";
import { useToast } from "@/hooks/useToast";
import { Toast } from "@/components/Toast";
import { AutoField } from "@/components/AutoField";
import { Button, ListItem, Spinner } from "@nous-research/ui";
import { Button } from "@nous-research/ui/ui/components/button";
import { ListItem } from "@nous-research/ui/ui/components/list-item";
import { Spinner } from "@nous-research/ui/ui/components/spinner";
import { Card, CardContent, CardHeader, CardTitle } from "@/components/ui/card";
import { Input } from "@/components/ui/input";
import { Badge } from "@nous-research/ui";
import { Badge } from "@nous-research/ui/ui/components/badge";
import { useI18n } from "@/i18n";
import { usePageHeader } from "@/contexts/usePageHeader";
import { PluginSlot } from "@/plugins";
+5 -1
View File
@@ -1,6 +1,10 @@
import { useCallback, useEffect, useState } from "react";
import { Clock, Pause, Play, Plus, Trash2, Zap } from "lucide-react";
import { Badge, Button, H2, Select, SelectOption, Spinner } from "@nous-research/ui";
import { Badge } from "@nous-research/ui/ui/components/badge";
import { Button } from "@nous-research/ui/ui/components/button";
import { Select, SelectOption } from "@nous-research/ui/ui/components/select";
import { Spinner } from "@nous-research/ui/ui/components/spinner";
import { H2 } from "@/components/NouiTypography";
import { api } from "@/lib/api";
import type { CronJob } from "@/lib/api";
import { DeleteConfirmDialog } from "@/components/DeleteConfirmDialog";
+4 -2
View File
@@ -21,7 +21,9 @@ import { Toast } from "@/components/Toast";
import { useConfirmDelete } from "@/hooks/useConfirmDelete";
import { useToast } from "@/hooks/useToast";
import { OAuthProvidersCard } from "@/components/OAuthProvidersCard";
import { Button, ListItem, Spinner } from "@nous-research/ui";
import { Button } from "@nous-research/ui/ui/components/button";
import { ListItem } from "@nous-research/ui/ui/components/list-item";
import { Spinner } from "@nous-research/ui/ui/components/spinner";
import {
Card,
CardContent,
@@ -29,7 +31,7 @@ import {
CardHeader,
CardTitle,
} from "@/components/ui/card";
import { Badge } from "@nous-research/ui";
import { Badge } from "@nous-research/ui/ui/components/badge";
import { Input } from "@/components/ui/input";
import { Label } from "@/components/ui/label";
import { useI18n } from "@/i18n";
+5 -8
View File
@@ -7,14 +7,11 @@ import {
} from "react";
import { FileText, RefreshCw } from "lucide-react";
import { api } from "@/lib/api";
import {
Badge,
Button,
FilterGroup,
Segmented,
Spinner,
Switch,
} from "@nous-research/ui";
import { Badge } from "@nous-research/ui/ui/components/badge";
import { Button } from "@nous-research/ui/ui/components/button";
import { FilterGroup, Segmented } from "@nous-research/ui/ui/components/segmented";
import { Spinner } from "@nous-research/ui/ui/components/spinner";
import { Switch } from "@nous-research/ui/ui/components/switch";
import { Card, CardContent, CardHeader, CardTitle } from "@/components/ui/card";
import { Label } from "@/components/ui/label";
import { useI18n } from "@/i18n";
+4 -2
View File
@@ -20,9 +20,11 @@ import type {
} from "@/lib/api";
import { timeAgo } from "@/lib/utils";
import { formatTokenCount } from "@/lib/format";
import { Button, Spinner, Stats } from "@nous-research/ui";
import { Button } from "@nous-research/ui/ui/components/button";
import { Spinner } from "@nous-research/ui/ui/components/spinner";
import { Stats } from "@nous-research/ui/ui/components/stats";
import { Card, CardContent, CardHeader, CardTitle } from "@/components/ui/card";
import { Badge } from "@nous-research/ui";
import { Badge } from "@nous-research/ui/ui/components/badge";
import { usePageHeader } from "@/contexts/usePageHeader";
import { useI18n } from "@/i18n";
import { PluginSlot } from "@/plugins";
+444
View File
@@ -0,0 +1,444 @@
import { useCallback, useEffect, useRef, useState } from "react";
import { ChevronDown, Pencil, Plus, Terminal, Trash2, Users } from "lucide-react";
import { H2 } from "@/components/NouiTypography";
import { api } from "@/lib/api";
import type { ProfileInfo } from "@/lib/api";
import { DeleteConfirmDialog } from "@/components/DeleteConfirmDialog";
import { useToast } from "@/hooks/useToast";
import { useConfirmDelete } from "@/hooks/useConfirmDelete";
import { Toast } from "@/components/Toast";
import { Card, CardContent, CardHeader, CardTitle } from "@/components/ui/card";
import { Badge } from "@nous-research/ui/ui/components/badge";
import { Button } from "@nous-research/ui/ui/components/button";
import { Input } from "@/components/ui/input";
import { Label } from "@/components/ui/label";
import { useI18n } from "@/i18n";
// Mirrors hermes_cli/profiles.py::_PROFILE_ID_RE so we can reject obviously
// invalid names (uppercase, spaces, …) before round-tripping a doomed POST.
const PROFILE_NAME_RE = /^[a-z0-9][a-z0-9_-]{0,63}$/;
export default function ProfilesPage() {
const [profiles, setProfiles] = useState<ProfileInfo[]>([]);
const [loading, setLoading] = useState(true);
const { toast, showToast } = useToast();
const { t } = useI18n();
// Create form
const [newName, setNewName] = useState("");
const [cloneFromDefault, setCloneFromDefault] = useState(true);
const [creating, setCreating] = useState(false);
// Inline rename state
const [renamingFrom, setRenamingFrom] = useState<string | null>(null);
const [renameTo, setRenameTo] = useState("");
// Inline SOUL editor state
const [editingSoulFor, setEditingSoulFor] = useState<string | null>(null);
const [soulText, setSoulText] = useState("");
const [soulSaving, setSoulSaving] = useState(false);
// Tracks the latest SOUL request so out-of-order responses don't overwrite
// newer state when the user switches profiles or closes the editor.
const activeSoulRequest = useRef<string | null>(null);
const load = useCallback(() => {
api
.getProfiles()
.then((res) => setProfiles(res.profiles))
.catch((e) => showToast(`${t.status.error}: ${e}`, "error"))
.finally(() => setLoading(false));
}, [showToast, t.status.error]);
useEffect(() => {
load();
}, [load]);
const handleCreate = async () => {
const name = newName.trim();
if (!name) {
showToast(t.profiles.nameRequired, "error");
return;
}
if (!PROFILE_NAME_RE.test(name)) {
showToast(`${t.profiles.invalidName}: ${t.profiles.nameRule}`, "error");
return;
}
setCreating(true);
try {
await api.createProfile({ name, clone_from_default: cloneFromDefault });
showToast(`${t.profiles.created}: ${name}`, "success");
setNewName("");
load();
} catch (e) {
showToast(`${t.status.error}: ${e}`, "error");
} finally {
setCreating(false);
}
};
const handleRenameSubmit = async () => {
if (!renamingFrom) return;
const target = renameTo.trim();
if (!target || target === renamingFrom) {
setRenamingFrom(null);
setRenameTo("");
return;
}
if (!PROFILE_NAME_RE.test(target)) {
showToast(`${t.profiles.invalidName}: ${t.profiles.nameRule}`, "error");
return;
}
try {
await api.renameProfile(renamingFrom, target);
showToast(`${t.profiles.renamed}: ${renamingFrom}${target}`, "success");
setRenamingFrom(null);
setRenameTo("");
load();
} catch (e) {
showToast(`${t.status.error}: ${e}`, "error");
}
};
const openSoulEditor = useCallback(
async (name: string) => {
if (editingSoulFor === name) {
activeSoulRequest.current = null;
setEditingSoulFor(null);
return;
}
setEditingSoulFor(name);
setSoulText("");
activeSoulRequest.current = name;
try {
const soul = await api.getProfileSoul(name);
if (activeSoulRequest.current === name) {
setSoulText(soul.content);
}
} catch (e) {
if (activeSoulRequest.current === name) {
showToast(`${t.status.error}: ${e}`, "error");
}
}
},
[editingSoulFor, showToast, t.status.error],
);
const handleSaveSoul = async (name: string) => {
setSoulSaving(true);
try {
await api.updateProfileSoul(name, soulText);
showToast(`${t.profiles.soulSaved}: ${name}`, "success");
} catch (e) {
showToast(`${t.status.error}: ${e}`, "error");
} finally {
setSoulSaving(false);
}
};
const handleCopyTerminalCommand = async (name: string) => {
let cmd: string;
try {
const res = await api.getProfileSetupCommand(name);
cmd = res.command;
} catch (e) {
showToast(`${t.status.error}: ${e}`, "error");
return;
}
try {
await navigator.clipboard.writeText(cmd);
showToast(`${t.profiles.commandCopied}: ${cmd}`, "success");
} catch {
showToast(`${t.profiles.copyFailed}: ${cmd}`, "error");
}
};
const profileDelete = useConfirmDelete<string>({
onDelete: useCallback(
async (name: string) => {
try {
await api.deleteProfile(name);
showToast(`${t.profiles.deleted}: ${name}`, "success");
load();
} catch (e) {
showToast(`${t.status.error}: ${e}`, "error");
throw e;
}
},
[load, showToast, t.profiles.deleted, t.status.error],
),
});
const pendingName = profileDelete.pendingId;
if (loading) {
return (
<div className="flex items-center justify-center py-24">
<div className="h-6 w-6 animate-spin rounded-full border-2 border-primary border-t-transparent" />
</div>
);
}
return (
// Profile names, model slugs, and paths are case-sensitive; opt out of
// the app shell's global ``uppercase`` so they render as the user typed.
// Children that explicitly opt back in (Badges, etc.) keep their casing.
<div className="flex flex-col gap-6 normal-case">
<Toast toast={toast} />
<DeleteConfirmDialog
open={profileDelete.isOpen}
onCancel={profileDelete.cancel}
onConfirm={profileDelete.confirm}
title={t.profiles.confirmDeleteTitle}
description={
pendingName
? t.profiles.confirmDeleteMessage.replace("{name}", pendingName)
: t.profiles.confirmDeleteMessage
}
loading={profileDelete.isDeleting}
/>
{/* Create new profile */}
<Card>
<CardHeader>
<CardTitle className="flex items-center gap-2 text-base">
<Plus className="h-4 w-4" />
{t.profiles.newProfile}
</CardTitle>
</CardHeader>
<CardContent>
<div className="grid gap-4">
<div className="grid gap-2">
<Label htmlFor="profile-name">{t.profiles.name}</Label>
<Input
id="profile-name"
placeholder={t.profiles.namePlaceholder}
value={newName}
onChange={(e) => setNewName(e.target.value)}
aria-invalid={
newName.trim() !== "" &&
!PROFILE_NAME_RE.test(newName.trim())
}
/>
<p className="text-xs text-muted-foreground">
{t.profiles.nameRule}
</p>
</div>
<label className="flex items-center gap-2 text-sm cursor-pointer">
<input
type="checkbox"
checked={cloneFromDefault}
onChange={(e) => setCloneFromDefault(e.target.checked)}
/>
{t.profiles.cloneFromDefault}
</label>
<div>
<Button onClick={handleCreate} disabled={creating}>
<Plus className="h-3 w-3" />
{creating ? t.common.creating : t.common.create}
</Button>
</div>
</div>
</CardContent>
</Card>
{/* List */}
<div className="flex flex-col gap-3">
<H2
variant="sm"
className="flex items-center gap-2 text-muted-foreground"
>
<Users className="h-4 w-4" />
{t.profiles.allProfiles} ({profiles.length})
</H2>
{profiles.length === 0 && (
<Card>
<CardContent className="py-8 text-center text-sm text-muted-foreground">
{t.profiles.noProfiles}
</CardContent>
</Card>
)}
{profiles.map((p) => {
const isRenaming = renamingFrom === p.name;
const isEditingSoul = editingSoulFor === p.name;
return (
<Card key={p.name}>
<CardContent className="flex items-center gap-4 py-4">
<div className="flex-1 min-w-0">
<div className="flex items-center gap-2 mb-1 flex-wrap">
{isRenaming ? (
<Input
autoFocus
value={renameTo}
onChange={(e) => setRenameTo(e.target.value)}
onKeyDown={(e) => {
if (e.key === "Enter") handleRenameSubmit();
if (e.key === "Escape") setRenamingFrom(null);
}}
aria-invalid={
renameTo.trim() !== "" &&
renameTo.trim() !== p.name &&
!PROFILE_NAME_RE.test(renameTo.trim())
}
className="max-w-xs"
/>
) : (
<span className="font-medium text-sm truncate">
{p.name}
</span>
)}
{p.is_default && (
<Badge tone="secondary">{t.profiles.defaultBadge}</Badge>
)}
{p.has_env && (
<Badge tone="outline">{t.profiles.hasEnv}</Badge>
)}
</div>
{isRenaming &&
(() => {
const trimmed = renameTo.trim();
const invalid =
trimmed !== "" &&
trimmed !== p.name &&
!PROFILE_NAME_RE.test(trimmed);
return (
<p
className={
"text-xs mb-1 " +
(invalid
? "text-destructive"
: "text-muted-foreground")
}
>
{invalid
? `${t.profiles.invalidName}: ${t.profiles.nameRule}`
: t.profiles.nameRule}
</p>
);
})()}
<div className="flex items-center gap-4 text-xs text-muted-foreground flex-wrap">
{p.model && (
<span>
{t.profiles.model}: {p.model}
{p.provider ? ` (${p.provider})` : ""}
</span>
)}
<span>
{t.profiles.skills}: {p.skill_count}
</span>
<span className="font-mono truncate max-w-[28rem]">
{p.path}
</span>
</div>
</div>
<div className="flex items-center gap-1 shrink-0">
{isRenaming ? (
<>
<Button
size="sm"
onClick={handleRenameSubmit}
>
{t.common.save}
</Button>
<Button
size="sm"
ghost
onClick={() => setRenamingFrom(null)}
>
{t.common.cancel}
</Button>
</>
) : (
<>
<Button
ghost
size="icon"
title={t.profiles.editSoul}
aria-label={t.profiles.editSoul}
onClick={() => openSoulEditor(p.name)}
>
{isEditingSoul ? (
<ChevronDown className="h-4 w-4" />
) : (
<span aria-hidden className="text-xs font-bold">
S
</span>
)}
</Button>
<Button
ghost
size="icon"
title={t.profiles.openInTerminal}
aria-label={t.profiles.openInTerminal}
onClick={() => handleCopyTerminalCommand(p.name)}
>
<Terminal className="h-4 w-4" />
</Button>
{!p.is_default && (
<Button
ghost
size="icon"
title={t.profiles.rename}
aria-label={t.profiles.rename}
onClick={() => {
setRenamingFrom(p.name);
setRenameTo(p.name);
}}
>
<Pencil className="h-4 w-4" />
</Button>
)}
{!p.is_default && (
<Button
ghost
size="icon"
title={t.common.delete}
aria-label={t.common.delete}
onClick={() => profileDelete.requestDelete(p.name)}
>
<Trash2 className="h-4 w-4 text-destructive" />
</Button>
)}
</>
)}
</div>
</CardContent>
{isEditingSoul && (
<div className="border-t border-border px-4 pb-4 pt-3 flex flex-col gap-2">
<Label
htmlFor={`soul-editor-${p.name}`}
className="flex items-center gap-2 text-xs uppercase tracking-wider text-muted-foreground"
>
{t.profiles.soulSection}
</Label>
<textarea
id={`soul-editor-${p.name}`}
className="flex min-h-[180px] w-full border border-input bg-transparent px-3 py-2 text-sm font-mono shadow-sm placeholder:text-muted-foreground focus-visible:outline-none focus-visible:ring-1 focus-visible:ring-ring"
placeholder={t.profiles.soulPlaceholder}
value={soulText}
onChange={(e) => setSoulText(e.target.value)}
/>
<div>
<Button
size="sm"
onClick={() => handleSaveSoul(p.name)}
disabled={soulSaving}
>
{soulSaving ? t.common.saving : t.profiles.saveSoul}
</Button>
</div>
</div>
)}
</Card>
);
})}
</div>
</div>
);
}
+4 -2
View File
@@ -35,8 +35,10 @@ import { timeAgo } from "@/lib/utils";
import { Markdown } from "@/components/Markdown";
import { PlatformsCard } from "@/components/PlatformsCard";
import { Toast } from "@/components/Toast";
import { Button, ListItem, Spinner } from "@nous-research/ui";
import { Badge } from "@nous-research/ui";
import { Button } from "@nous-research/ui/ui/components/button";
import { ListItem } from "@nous-research/ui/ui/components/list-item";
import { Spinner } from "@nous-research/ui/ui/components/spinner";
import { Badge } from "@nous-research/ui/ui/components/badge";
import { Card, CardContent, CardHeader, CardTitle } from "@/components/ui/card";
import { DeleteConfirmDialog } from "@/components/DeleteConfirmDialog";
import { useConfirmDelete } from "@/hooks/useConfirmDelete";
+5 -1
View File
@@ -20,7 +20,11 @@ import type { SkillInfo, ToolsetInfo } from "@/lib/api";
import { useToast } from "@/hooks/useToast";
import { Toast } from "@/components/Toast";
import { Card, CardContent, CardHeader, CardTitle } from "@/components/ui/card";
import { Badge, Button, ListItem, Spinner, Switch } from "@nous-research/ui";
import { Badge } from "@nous-research/ui/ui/components/badge";
import { Button } from "@nous-research/ui/ui/components/button";
import { ListItem } from "@nous-research/ui/ui/components/list-item";
import { Spinner } from "@nous-research/ui/ui/components/spinner";
import { Switch } from "@nous-research/ui/ui/components/switch";
import { cn } from "@/lib/utils";
import { Input } from "@/components/ui/input";
import { useI18n } from "@/i18n";
+1 -1
View File
@@ -1,5 +1,5 @@
import { useSyncExternalStore } from "react";
import { Spinner } from "@nous-research/ui";
import { Spinner } from "@nous-research/ui/ui/components/spinner";
import {
getPluginComponent,
getPluginLoadError,
+4 -2
View File
@@ -19,12 +19,14 @@ import React, {
} from "react";
import { api, fetchJSON } from "@/lib/api";
import { cn, timeAgo, isoTimeAgo } from "@/lib/utils";
import { Badge, Button, Select, SelectOption } from "@nous-research/ui";
import { Badge } from "@nous-research/ui/ui/components/badge";
import { Button } from "@nous-research/ui/ui/components/button";
import { Select, SelectOption } from "@nous-research/ui/ui/components/select";
import { Card, CardHeader, CardTitle, CardContent } from "@/components/ui/card";
import { Input } from "@/components/ui/input";
import { Label } from "@/components/ui/label";
import { Separator } from "@/components/ui/separator";
import { Tabs, TabsList, TabsTrigger } from "@nous-research/ui";
import { Tabs, TabsList, TabsTrigger } from "@nous-research/ui/ui/components/tabs";
import { useI18n } from "@/i18n";
import { registerSlot, PluginSlot } from "./slots";
+33
View File
@@ -47,6 +47,7 @@ hermes [global-options] <command> [subcommand/options]
| `hermes login` / `logout` | **Deprecated** — use `hermes auth` instead. |
| `hermes status` | Show agent, auth, and platform status. |
| `hermes cron` | Inspect and tick the cron scheduler. |
| `hermes kanban` | Multi-profile collaboration board (tasks, links, dispatcher). |
| `hermes webhook` | Manage dynamic webhook subscriptions for event-driven activation. |
| `hermes hooks` | Inspect, approve, or remove shell-script hooks declared in `config.yaml`. |
| `hermes doctor` | Diagnose config and dependency issues. |
@@ -336,6 +337,38 @@ hermes cron <list|create|edit|pause|resume|run|remove|status|tick>
| `status` | Check whether the cron scheduler is running. |
| `tick` | Run due jobs once and exit. |
## `hermes kanban`
```bash
hermes kanban <action> [options]
```
Multi-profile collaboration board. Tasks live in `~/.hermes/kanban.db` (WAL-mode SQLite); every profile reads and writes the same board. A `cron`-driven dispatcher (`hermes kanban dispatch`) atomically claims ready tasks and spawns the assigned profile as its own process with an isolated workspace.
| Action | Purpose |
|--------|---------|
| `init` | Create `kanban.db` if missing. Idempotent. |
| `create "<title>"` | Create a new task. Flags: `--body`, `--assignee`, `--parent` (repeatable), `--workspace scratch\|worktree\|dir:<path>`, `--tenant`, `--priority`. |
| `list` / `ls` | List tasks. Filter with `--mine`, `--assignee`, `--status`, `--tenant`, `--archived`, `--json`. |
| `show <id>` | Show a task with comments and events. `--json` for machine output. |
| `assign <id> <profile>` | Assign or reassign. Use `none` to unassign. Refused while task is running. |
| `link <parent> <child>` | Add a dependency. Cycle-detected. |
| `unlink <parent> <child>` | Remove a dependency. |
| `claim <id>` | Atomically claim a ready task. Prints resolved workspace path. |
| `comment <id> "<text>"` | Append a comment. Visible to the next worker that runs the task. |
| `complete <id>` | Mark task done. Flag: `--result "<summary>"` (goes into children's parent-result context). |
| `block <id> "<reason>"` | Mark task blocked. Also appends the reason as a comment. |
| `unblock <id>` | Return a blocked task to ready. |
| `archive <id>` | Hide from default list. `gc` will remove scratch workspaces. |
| `tail <id>` | Follow a task's event stream. |
| `dispatch` | One dispatcher pass. Flags: `--dry-run`, `--max N`, `--json`. |
| `context <id>` | Print the full context a worker would see (title + body + parent results + comments). |
| `gc` | Remove scratch workspaces for archived tasks. |
All actions are also available as a slash command in the gateway (`/kanban …`), with the same argument surface.
For the full design — comparison with Cline Kanban / Paperclip / NanoClaw / Gemini Enterprise, eight collaboration patterns, four user stories, concurrency correctness proof — see `docs/hermes-kanban-v1-spec.pdf` in the repository or the [Kanban user guide](/docs/user-guide/features/kanban).
## `hermes webhook`
```bash
@@ -0,0 +1,263 @@
# Kanban tutorial
A walkthrough of the four use-cases the Hermes Kanban system was designed for, with the dashboard open in a browser. If you haven't read the [Kanban overview](./kanban) yet, start there — this assumes you know what a task, run, assignee, and dispatcher are.
## Setup
```bash
hermes kanban init # optional; first `hermes kanban <anything>` auto-inits
hermes dashboard # opens http://127.0.0.1:9119 in your browser
# click Kanban in the left nav
```
The dashboard is the most comfortable place to learn the system. Everything you see here is also available via `hermes kanban <verb>` on the CLI — the two surfaces share the same SQLite database at `~/.hermes/kanban.db`.
## The board at a glance
![Kanban board overview](/img/kanban-tutorial/01-board-overview.png)
Six columns, left to right:
- **Triage** — raw ideas, a specifier will flesh out the spec before anyone works on them.
- **Todo** — created but waiting on dependencies, or not yet assigned.
- **Ready** — assigned and waiting for the dispatcher to claim.
- **In progress** — a worker is actively running the task. With "Lanes by profile" on (the default), this column sub-groups by assignee so you can see at a glance what each worker is doing.
- **Blocked** — a worker asked for human input, or the circuit breaker tripped.
- **Done** — completed.
The top bar has filters for search, tenant, and assignee, plus a `Lanes by profile` toggle and a `Nudge dispatcher` button that runs one dispatch tick right now instead of waiting for the daemon's next interval. Clicking any card opens its drawer on the right.
### Flat view
If the profile lanes are noisy, toggle "Lanes by profile" off and the In Progress column collapses to a single flat list ordered by claim time:
![Board with lanes by profile off](/img/kanban-tutorial/02-board-flat.png)
## Story 1 — Solo dev shipping a feature
You're building a feature. Classic flow: design a schema, implement the API, write the tests. Three tasks with parent→child dependencies.
```bash
SCHEMA=$(hermes kanban create "Design auth schema" \
--assignee backend-dev --tenant auth-project --priority 2 \
--body "Design the user/session/token schema for the auth module." \
--json | jq -r .id)
API=$(hermes kanban create "Implement auth API endpoints" \
--assignee backend-dev --tenant auth-project --priority 2 \
--parent $SCHEMA \
--body "POST /register, POST /login, POST /refresh, POST /logout." \
--json | jq -r .id)
hermes kanban create "Write auth integration tests" \
--assignee qa-dev --tenant auth-project --priority 2 \
--parent $API \
--body "Cover happy path, wrong password, expired token, concurrent refresh."
```
Because `API` has `SCHEMA` as its parent, and `tests` has `API` as its parent, only `SCHEMA` starts in `ready`. The other two sit in `todo` until their parents complete. This is the dependency promotion engine doing its job — no other worker will pick up the test-writing until there's an API to test.
Claim the schema task, do the work, hand off:
```bash
hermes kanban claim $SCHEMA
# (you design the schema, commit, etc.)
hermes kanban complete $SCHEMA \
--summary "users(id, email, pw_hash), sessions(id, user_id, jti, expires_at); refresh tokens stored as sessions with type='refresh'" \
--metadata '{
"changed_files": ["migrations/001_users.sql", "migrations/002_sessions.sql"],
"decisions": ["bcrypt for hashing", "JWT for session tokens", "7-day refresh, 15-min access"]
}'
```
When `SCHEMA` hits `done`, the dependency engine promotes `API` to `ready` automatically. The API worker, when it picks up, will read `SCHEMA`'s summary and metadata in its context — so it knows the schema decisions without re-reading a long design doc.
Click the completed schema task on the board and the drawer shows everything:
![Solo dev — completed schema task drawer](/img/kanban-tutorial/03-drawer-schema-task.png)
The Run History section at the bottom is the key addition. One attempt: outcome `completed`, worker `@backend-dev`, duration, timestamp, and the handoff summary in full. The metadata blob (`changed_files`, `decisions`) is stored on the run too and surfaced to any downstream worker that reads this parent.
On the CLI:
```bash
hermes kanban show $SCHEMA
hermes kanban runs $SCHEMA
# # OUTCOME PROFILE ELAPSED STARTED
# 1 completed backend-dev 0s 2026-04-27 19:34
# → users(id, email, pw_hash), sessions(id, user_id, jti, expires_at); refresh tokens ...
```
## Story 2 — Fleet farming
You have three workers (a translator, a transcriber, a copywriter) and a pile of independent tasks. You want all three pulling in parallel and making visible progress. This is the simplest kanban use-case and the one the original design optimized for.
Create the work:
```bash
for lang in Spanish French German; do
hermes kanban create "Translate homepage to $lang" \
--assignee translator --tenant content-ops
done
for i in 1 2 3 4 5; do
hermes kanban create "Transcribe Q3 customer call #$i" \
--assignee transcriber --tenant content-ops
done
for sku in 1001 1002 1003 1004; do
hermes kanban create "Generate product description: SKU-$sku" \
--assignee copywriter --tenant content-ops
done
```
Start the gateway and walk away — it hosts the embedded dispatcher
that picks up all three specialist profiles' tasks on the same
kanban.db:
```bash
hermes gateway start
```
Now filter the board to `content-ops` (or just search for "Transcribe") and you get this:
![Fleet view filtered to transcribe tasks](/img/kanban-tutorial/07-fleet-transcribes.png)
Two transcribes done, one running, two ready waiting for the next dispatcher tick. The In Progress column is grouped by profile (the "Lanes by profile" default) so you see each worker's active task without scanning a mixed list. The dispatcher will promote the next ready task to running as soon as the current one completes. With three daemons working on three assignee pools in parallel, the whole content queue drains without further human input.
**Everything Story 1 said about structured handoff still applies here.** A translator worker completing a call can pass `--summary "translated 4 pages, style matched existing marketing voice"` and `--metadata '{"duration_seconds": 720, "tokens_used": 2100}'` — useful for analytics and for any downstream task that depends on this one.
## Story 3 — Role pipeline with retry
This is where Kanban earns its keep over a flat TODO list. A PM writes a spec. An engineer implements it. A reviewer rejects the first attempt. The engineer tries again with changes. The reviewer approves.
The dashboard view, filtered by `auth-project`:
![Pipeline view for a multi-role feature](/img/kanban-tutorial/08-pipeline-auth.png)
Three-stage chain visible at once: `Spec: password reset flow` (DONE, pm), `Implement password reset flow` (DONE, backend-dev), `Review password reset PR` (READY, reviewer). Each has its parent in green at the bottom and children as dependencies.
The interesting one is the implementation task, because it was blocked and retried:
```bash
# PM completes the spec with acceptance criteria in metadata
hermes kanban complete $SPEC \
--summary "spec approved; POST /forgot-password sends email, GET /reset/:token renders form, POST /reset applies new password" \
--metadata '{"acceptance": [
"expired token returns 410",
"reused last-3 password returns 400 with message",
"successful reset invalidates all active sessions"
]}'
# Engineer claims + implements, but review blocks it for missing strength check
hermes kanban claim $IMPL
hermes kanban block $IMPL "Review: password strength check missing, reset link isn't single-use (can be replayed within 30min)"
# Engineer iterates, resolves, completes
hermes kanban unblock $IMPL
hermes kanban claim $IMPL
hermes kanban complete $IMPL \
--summary "added zxcvbn strength check, reset tokens are now single-use (stored + deleted on success)" \
--metadata '{
"changed_files": ["auth/reset.py", "auth/tests/test_reset.py", "migrations/003_single_use_reset_tokens.sql"],
"tests_run": 11,
"review_iteration": 2
}'
```
Click the implementation task. The drawer shows **two attempts**:
![Implementation task with two runs — blocked then completed](/img/kanban-tutorial/04b-drawer-retry-history-scrolled.png)
- **Run 1**`blocked` by `@backend-dev`. The review feedback sits right under the outcome: "password strength check missing, reset link isn't single-use (can be replayed within 30min)".
- **Run 2**`completed` by `@backend-dev`. Fresh summary, fresh metadata.
Each run is a row in `task_runs` with its own outcome, summary, and metadata. Retry history is not a conceptual afterthought layered on top of a "latest state" task — it's the primary representation. When a retrying worker opens the task, `build_worker_context` shows it the prior attempts, so the second-pass worker sees why the first pass was blocked and addresses those specific findings instead of re-running from scratch.
The reviewer picks up next. When they open `Review password reset PR`, they see:
![Reviewer's drawer view of the pipeline](/img/kanban-tutorial/09-drawer-pipeline-review.png)
The parent link is the completed implementation. When the reviewer's worker calls `build_worker_context`, it pulls the parent's most-recent-completed-run summary + metadata — so the reviewer reads "added zxcvbn strength check, reset tokens are now single-use" and has the list of changed files in hand before looking at a diff.
## Story 4 — Circuit breaker and crash recovery
Real workers fail. Missing credentials, OOM kills, transient network errors. The dispatcher has two lines of defense: a **circuit breaker** that auto-blocks after N consecutive failures so the board doesn't thrash forever, and **crash detection** that reclaims a task whose worker PID went away before its TTL expired.
### Circuit breaker — permanent-looking failure
A deploy task that can't spawn its worker because `AWS_ACCESS_KEY_ID` isn't set in the profile's environment:
```bash
hermes kanban create "Deploy to staging (missing creds)" \
--assignee deploy-bot --tenant ops
```
The dispatcher tries to spawn the worker. Spawn fails (`RuntimeError: AWS_ACCESS_KEY_ID not set`). The dispatcher releases the claim, increments a failure counter, and tries again next tick. After three consecutive failures (the default `failure_limit`), the circuit trips: the task goes to `blocked` with outcome `gave_up`. No more retries until a human unblocks it.
Click the blocked task:
![Circuit breaker — 2 spawn_failed + 1 gave_up](/img/kanban-tutorial/11-drawer-gave-up.png)
Three runs, all with the same error on the `error` field. The first two are `spawn_failed` (retryable), the third is `gave_up` (terminal). The event log above shows the full sequence: `created → claimed → spawn_failed → claimed → spawn_failed → claimed → gave_up`.
On the terminal:
```bash
hermes kanban runs t_ef5d
# # OUTCOME PROFILE ELAPSED STARTED
# 1 spawn_failed deploy-bot 0s 2026-04-27 19:34
# ! AWS_ACCESS_KEY_ID not set in deploy-bot env
# 2 spawn_failed deploy-bot 0s 2026-04-27 19:34
# ! AWS_ACCESS_KEY_ID not set in deploy-bot env
# 3 gave_up deploy-bot 0s 2026-04-27 19:34
# ! AWS_ACCESS_KEY_ID not set in deploy-bot env
```
If Telegram / Discord / Slack is wired in, a gateway notification fires on the `gave_up` event so you hear about the outage without having to check the board.
### Crash recovery — worker dies mid-flight
Sometimes the spawn succeeds but the worker process dies later — segfault, OOM, `systemctl stop`. The dispatcher polls `kill(pid, 0)` and detects the dead pid; the claim releases, the task goes back to `ready`, and the next tick gives it to a fresh worker.
The example in the seed data is a migration that was running out of memory:
```bash
# Worker claims, starts scanning 2.4M rows, OOM kills it at ~2.3M
# Dispatcher detects dead pid, releases claim, increments attempt counter
# Retry with a chunked strategy succeeds
```
The drawer shows the full two-attempt history:
![Crash and recovery — 1 crashed + 1 completed](/img/kanban-tutorial/06-drawer-crash-recovery.png)
Run 1 — `crashed`, with the error `OOM kill at row 2.3M (process 99999 gone)`. Run 2 — `completed`, with `"strategy": "chunked with LIMIT + WHERE id > last_id"` in its metadata. The retrying worker saw the crash of run 1 in its context and picked a safer strategy; the metadata makes it obvious to a future observer (or postmortem writer) what changed.
## Structured handoff — why `--summary` and `--metadata` matter
In every story above, workers passed `--summary` and `--metadata` on completion. That's not decoration — it's the primary handoff channel between stages of a workflow.
When a worker on task B reads its context, it gets:
- B's **prior attempts** (previous runs: outcome, summary, error, metadata) so a retrying worker doesn't repeat a failed path.
- **Parent task results** — for each parent, the most-recent completed run's summary and metadata — so downstream workers see why and how the upstream work was done.
This replaces the "dig through comments and the work output" dance that plagues flat kanban systems. A PM writes acceptance criteria in the spec's metadata, and the engineer's worker sees them structurally. An engineer records which tests they ran and how many passed, and the reviewer's worker has that list in hand before opening a diff.
The bulk-close guard exists because this data is per-run. `hermes kanban complete a b c --summary X` is refused — copy-pasting the same summary to three tasks is almost always wrong. Bulk close without the handoff flags still works for the common "I finished a pile of admin tasks" case.
## Inspecting a task currently running
For completeness — here's the drawer of a task still in flight (the API implementation from Story 1, claimed by `backend-dev` but not yet complete):
![Claimed, in-flight task](/img/kanban-tutorial/10-drawer-in-flight.png)
Status is `Running`. The active run appears in the Run History section with outcome `active` and no `ended_at`. If this worker dies or times out, the dispatcher closes this run with the appropriate outcome and opens a new one on the next claim — the attempt row never disappears.
## Next steps
- [Kanban overview](./kanban) — the full data model, event vocabulary, and CLI reference.
- `hermes kanban --help` — every subcommand, every flag.
- `hermes kanban watch --kinds completed,gave_up,timed_out` — live stream terminal events across the whole board.
- `hermes kanban notify-subscribe <task> --platform telegram --chat-id <id>` — get a gateway ping when a specific task finishes.
+510
View File
@@ -0,0 +1,510 @@
---
sidebar_position: 12
title: "Kanban (Multi-Agent Board)"
description: "Durable SQLite-backed task board for coordinating multiple Hermes profiles"
---
# Kanban — Multi-Agent Profile Collaboration
> **Want a walkthrough?** Read the [Kanban tutorial](./kanban-tutorial) — four user stories (solo dev, fleet farming, role pipeline with retry, circuit breaker) with dashboard screenshots of each. This page is the reference; the tutorial is the narrative.
Hermes Kanban is a durable task board, shared across all your Hermes profiles, that lets multiple named agents collaborate on work without fragile in-process subagent swarms. Every task is a row in `~/.hermes/kanban.db`; every handoff is a row anyone can read and write; every worker is a full OS process with its own identity.
This is the shape that covers the workloads `delegate_task` can't:
- **Research triage** — parallel researchers + analyst + writer, human-in-the-loop.
- **Scheduled ops** — recurring daily briefs that build a journal over weeks.
- **Digital twins** — persistent named assistants (`inbox-triage`, `ops-review`) that accumulate memory over time.
- **Engineering pipelines** — decompose → implement in parallel worktrees → review → iterate → PR.
- **Fleet work** — one specialist managing N subjects (50 social accounts, 12 monitored services).
For the full design rationale, comparative analysis against Cline Kanban / Paperclip / NanoClaw / Google Gemini Enterprise, and the eight canonical collaboration patterns, see `docs/hermes-kanban-v1-spec.pdf` in the repository.
## Kanban vs. `delegate_task`
They look similar; they are not the same primitive.
| | `delegate_task` | Kanban |
|---|---|---|
| Shape | RPC call (fork → join) | Durable message queue + state machine |
| Parent | Blocks until child returns | Fire-and-forget after `create` |
| Child identity | Anonymous subagent | Named profile with persistent memory |
| Resumability | None — failed = failed | Block → unblock → re-run; crash → reclaim |
| Human in the loop | Not supported | Comment / unblock at any point |
| Agents per task | One call = one subagent | N agents over task's life (retry, review, follow-up) |
| Audit trail | Lost on context compression | Durable rows in SQLite forever |
| Coordination | Hierarchical (caller → callee) | Peer — any profile reads/writes any task |
**One-sentence distinction:** `delegate_task` is a function call; Kanban is a work queue where every handoff is a row any profile (or human) can see and edit.
**Use `delegate_task` when** the parent agent needs a short reasoning answer before continuing, no humans involved, result goes back into the parent's context.
**Use Kanban when** work crosses agent boundaries, needs to survive restarts, might need human input, might be picked up by a different role, or needs to be discoverable after the fact.
They coexist: a kanban worker may call `delegate_task` internally during its run.
## Core concepts
- **Task** — a row with title, optional body, one assignee (a profile name), status (`triage | todo | ready | running | blocked | done | archived`), optional tenant namespace, optional idempotency key (dedup for retried automation).
- **Link**`task_links` row recording a parent → child dependency. The dispatcher promotes `todo → ready` when all parents are `done`.
- **Comment** — the inter-agent protocol. Agents and humans append comments; when a worker is (re-)spawned it reads the full comment thread as part of its context.
- **Workspace** — the directory a worker operates in. Three kinds:
- `scratch` (default) — fresh tmp dir under `~/.hermes/kanban/workspaces/<id>/`.
- `dir:<path>` — an existing shared directory (Obsidian vault, mail ops dir, per-account folder). **Must be an absolute path.** Relative paths like `dir:../tenants/foo/` are rejected at dispatch because they'd resolve against whatever CWD the dispatcher happens to be in, which is ambiguous and a confused-deputy escape vector. The path is otherwise trusted — it's your box, your filesystem, the worker runs with your uid. This is the trusted-local-user threat model; kanban is single-host by design.
- `worktree` — a git worktree under `.worktrees/<id>/` for coding tasks. Worker-side `git worktree add` creates it.
- **Dispatcher** — a long-lived loop that, every N seconds (default 60): reclaims stale claims, reclaims crashed workers (PID gone but TTL not yet expired), promotes ready tasks, atomically claims, spawns assigned profiles. Runs **inside the gateway** by default (`kanban.dispatch_in_gateway: true`). After ~5 consecutive spawn failures on the same task the dispatcher auto-blocks it with the last error as the reason — prevents thrashing on tasks whose profile doesn't exist, workspace can't mount, etc.
- **Tenant** — optional string namespace. One specialist fleet can serve multiple businesses (`--tenant business-a`) with data isolation by workspace path and memory key prefix.
## Quick start
```bash
# 1. Create the board
hermes kanban init
# 2. Start the gateway (hosts the embedded dispatcher)
hermes gateway start
# 3. Create a task
hermes kanban create "research AI funding landscape" --assignee researcher
# 4. Watch activity live
hermes kanban watch
# 5. See the board
hermes kanban list
hermes kanban stats
```
### Gateway-embedded dispatcher (default)
The dispatcher runs inside the gateway process. Nothing to install, no
separate service to manage — if the gateway is up, ready tasks get picked
up on the next tick (60s by default).
```yaml
# config.yaml
kanban:
dispatch_in_gateway: true # default
dispatch_interval_seconds: 60 # default
```
Override the config flag at runtime via `HERMES_KANBAN_DISPATCH_IN_GATEWAY=0`
for debugging. Standard gateway supervision applies: run `hermes gateway
start` directly, or wire the gateway up as a systemd user unit (see the
gateway docs). Without a running gateway, `ready` tasks stay where they are
until one comes up — `hermes kanban create` warns about this at creation
time.
Running `hermes kanban daemon` as a separate process is **deprecated**;
use the gateway. If you truly cannot run the gateway (headless host
policy forbids long-lived services, etc.) a `--force` escape hatch keeps
the old standalone daemon alive for one release cycle, but running both
a gateway-embedded dispatcher AND a standalone daemon against the same
`kanban.db` causes claim races and is not supported.
### Idempotent create (for automation / webhooks)
```bash
# First call creates the task. Any subsequent call with the same key
# returns the existing task id instead of duplicating.
hermes kanban create "nightly ops review" \
--assignee ops \
--idempotency-key "nightly-ops-$(date -u +%Y-%m-%d)" \
--json
```
### Bulk CLI verbs
All the lifecycle verbs accept multiple ids so you can clean up a batch
in one command:
```bash
hermes kanban complete t_abc t_def t_hij --result "batch wrap"
hermes kanban archive t_abc t_def t_hij
hermes kanban unblock t_abc t_def
hermes kanban block t_abc "need input" --ids t_def t_hij
```
## How workers interact with the board
When the dispatcher spawns a worker, it sets `HERMES_KANBAN_TASK` in the child's env. That env var is the gate for a dedicated **kanban toolset** — 7 tools that the normal agent schema never sees:
| Tool | Purpose |
|---|---|
| `kanban_show` | Read the current task (title, body, prior attempts, parent handoffs, comments, full `worker_context`). Defaults to the env's task id. |
| `kanban_complete` | Finish with `summary` + `metadata` structured handoff. |
| `kanban_block` | Escalate for human input. |
| `kanban_heartbeat` | Signal liveness during long operations. |
| `kanban_comment` | Append to the task thread. |
| `kanban_create` | (Orchestrators) fan out into child tasks. |
| `kanban_link` | (Orchestrators) add dependency edges after the fact. |
**Why tools and not just shelling to `hermes kanban`?** Three reasons:
1. **Backend portability.** Workers whose terminal tool points at a remote backend (Docker / Modal / Singularity / SSH) would run `hermes kanban complete` inside the container where `hermes` isn't installed and the DB isn't mounted. The kanban tools run in the agent's own Python process and always reach `~/.hermes/kanban.db` regardless of terminal backend.
2. **No shell-quoting fragility.** Passing `--metadata '{"files": [...]}'` through shlex + argparse is a latent footgun. Structured tool args skip it.
3. **Better errors.** Tool results are structured JSON the model can reason about, not stderr strings it has to parse.
**Zero schema footprint on normal sessions.** A regular `hermes chat` session has zero `kanban_*` tools in its schema. The `check_fn` on each tool only returns True when `HERMES_KANBAN_TASK` is set, which only happens when the dispatcher spawned this process. No tool bloat for users who never touch kanban.
The `kanban-worker` and `kanban-orchestrator` skills teach the model which tool to call when and in what order.
### The worker skill
Any profile that should be able to work kanban tasks must load the `kanban-worker` skill. It teaches the worker the full lifecycle:
1. On spawn, call `kanban_show()` to read title + body + parent handoffs + prior attempts + full comment thread.
2. `cd $HERMES_KANBAN_WORKSPACE` and do the work there.
3. Call `kanban_heartbeat(note="...")` every few minutes during long operations.
4. Complete with `kanban_complete(summary="...", metadata={...})`, or `kanban_block(reason="...")` if stuck.
Load it with:
```bash
hermes skills install devops/kanban-worker
```
The dispatcher also auto-passes `--skills kanban-worker` when spawning every worker, so the worker always has the pattern library available even if a profile's default skills config doesn't include it.
### Pinning extra skills to a specific task
Sometimes a single task needs specialist context the assignee profile doesn't carry by default — a translation job that needs the `translation` skill, a review task that needs `github-code-review`, a security audit that needs `security-pr-audit`. Rather than editing the assignee's profile every time, attach the skills directly to the task:
```bash
# CLI — repeat --skill for each extra skill
hermes kanban create "translate README to Japanese" \
--assignee linguist \
--skill translation
# Multiple skills
hermes kanban create "audit auth flow" \
--assignee reviewer \
--skill security-pr-audit \
--skill github-code-review
```
From the dashboard's inline create form, type the skills comma-separated into the **skills** field. From another agent (orchestrator pattern), use `kanban_create(skills=[...])`:
```
kanban_create(
title="translate README to Japanese",
assignee="linguist",
skills=["translation"],
)
```
These skills are **additive** to the built-in `kanban-worker` — the dispatcher emits one `--skills <name>` flag for each (and for the built-in), so the worker spawns with all of them loaded. The skill names must match skills that are actually installed on the assignee's profile (run `hermes skills list` to see what's available); there's no runtime install.
### The orchestrator skill
A **well-behaved orchestrator does not do the work itself.** It decomposes the user's goal into tasks, links them, assigns each to a specialist, and steps back. The `kanban-orchestrator` skill encodes this: anti-temptation rules, a standard specialist roster (`researcher`, `writer`, `analyst`, `backend-eng`, `reviewer`, `ops`), and a decomposition playbook.
Load it into your orchestrator profile:
```bash
hermes skills install devops/kanban-orchestrator
```
For best results, pair it with a profile whose toolsets are restricted to board operations (`kanban`, `gateway`, `memory`) so the orchestrator literally cannot execute implementation tasks even if it tries.
## Dashboard (GUI)
The `/kanban` CLI and slash command are enough to run the board headlessly, but a visual board is often the right interface for humans-in-the-loop: triage, cross-profile supervision, reading comment threads, and dragging cards between columns. Hermes ships this as a **bundled dashboard plugin** at `plugins/kanban/` — not a core feature, not a separate service — following the model laid out in [Extending the Dashboard](./extending-the-dashboard).
Open it with:
```bash
hermes kanban init # one-time: create kanban.db if not already present
hermes dashboard # "Kanban" tab appears in the nav, after "Skills"
```
### What the plugin gives you
- A **Kanban** tab showing one column per status: `triage`, `todo`, `ready`, `running`, `blocked`, `done` (plus `archived` when the toggle is on).
- `triage` is the parking column for rough ideas a specifier is expected to flesh out. Tasks created with `hermes kanban create --triage` (or via the Triage column's inline create) land here and the dispatcher leaves them alone until a human or specifier promotes them to `todo` / `ready`.
- Cards show the task id, title, priority badge, tenant tag, assigned profile, comment/link counts, a **progress pill** (`N/M` children done when the task has dependents), and "created N ago". A per-card checkbox enables multi-select.
- **Per-profile lanes inside Running** — toolbar checkbox toggles sub-grouping of the Running column by assignee.
- **Live updates via WebSocket** — the plugin tails the append-only `task_events` table on a short poll interval; the board reflects changes the instant any profile (CLI, gateway, or another dashboard tab) acts. Reloads are debounced so a burst of events triggers a single refetch.
- **Drag-drop** cards between columns to change status. The drop sends `PATCH /api/plugins/kanban/tasks/:id` which routes through the same `kanban_db` code the CLI uses — the three surfaces can never drift. Moves into destructive statuses (`done`, `archived`, `blocked`) prompt for confirmation. Touch devices use a pointer-based fallback so the board is usable from a tablet.
- **Inline create** — click `+` on any column header to type a title, assignee, priority, and (optionally) a parent task from a dropdown over every existing task. Creating from the Triage column automatically parks the new task in triage.
- **Multi-select with bulk actions** — shift/ctrl-click a card or tick its checkbox to add it to the selection. A bulk action bar appears at the top with batch status transitions, archive, and reassign (by profile dropdown, or "(unassign)"). Destructive batches confirm first. Per-id partial failures are reported without aborting the rest.
- **Click a card** (without shift/ctrl) to open a side drawer (Escape or click-outside closes) with:
- **Editable title** — click the heading to rename.
- **Editable assignee / priority** — click the meta row to rewrite.
- **Editable description** — markdown-rendered by default (headings, bold, italic, inline code, fenced code, `http(s)` / `mailto:` links, bullet lists), with an "edit" button that swaps in a textarea. Markdown rendering is a tiny, XSS-safe renderer — every substitution runs on HTML-escaped input, only `http(s)` / `mailto:` links pass through, and `target="_blank"` + `rel="noopener noreferrer"` are always set.
- **Dependency editor** — chip list of parents and children, each with an `×` to unlink, plus dropdowns over every other task to add a new parent or child. Cycle attempts are rejected server-side with a clear message.
- **Status action row** (→ triage / → ready / → running / block / unblock / complete / archive) with confirm prompts for destructive transitions.
- Result section (also markdown-rendered), comment thread with Enter-to-submit, the last 20 events.
- **Toolbar filters** — free-text search, tenant dropdown (defaults to `dashboard.kanban.default_tenant` from `config.yaml`), assignee dropdown, "show archived" toggle, "lanes by profile" toggle, and a **Nudge dispatcher** button so you don't have to wait for the next 60 s tick.
Visually the target is the familiar Linear / Fusion layout: dark theme, column headers with counts, coloured status dots, pill chips for priority and tenant. The plugin reads only theme CSS vars (`--color-*`, `--radius`, `--font-mono`, ...), so it reskins automatically with whichever dashboard theme is active.
### Architecture
The GUI is strictly a **read-through-the-DB + write-through-kanban_db** layer with no domain logic of its own:
```
┌────────────────────────┐ WebSocket (tails task_events)
│ React SPA (plugin) │ ◀──────────────────────────────────┐
│ HTML5 drag-and-drop │ │
└──────────┬─────────────┘ │
│ REST over fetchJSON │
▼ │
┌────────────────────────┐ writes call kanban_db.* │
│ FastAPI router │ directly — same code path │
│ plugins/kanban/ │ the CLI /kanban verbs use │
│ dashboard/plugin_api.py │
└──────────┬─────────────┘ │
│ │
▼ │
┌────────────────────────┐ │
│ ~/.hermes/kanban.db │ ───── append task_events ──────────┘
│ (WAL, shared) │
└────────────────────────┘
```
### REST surface
All routes are mounted under `/api/plugins/kanban/` and protected by the dashboard's ephemeral session token:
| Method | Path | Purpose |
|---|---|---|
| `GET` | `/board?tenant=<name>&include_archived=…` | Full board grouped by status column, plus tenants + assignees for filter dropdowns |
| `GET` | `/tasks/:id` | Task + comments + events + links |
| `POST` | `/tasks` | Create (wraps `kanban_db.create_task`, accepts `triage: bool` and `parents: [id, …]`) |
| `PATCH` | `/tasks/:id` | Status / assignee / priority / title / body / result |
| `POST` | `/tasks/bulk` | Apply the same patch (status / archive / assignee / priority) to every id in `ids`. Per-id failures reported without aborting siblings |
| `POST` | `/tasks/:id/comments` | Append a comment |
| `POST` | `/links` | Add a dependency (`parent_id``child_id`) |
| `DELETE` | `/links?parent_id=…&child_id=…` | Remove a dependency |
| `POST` | `/dispatch?max=…&dry_run=…` | Nudge the dispatcher — skip the 60 s wait |
| `GET` | `/config` | Read `dashboard.kanban` preferences from `config.yaml``default_tenant`, `lane_by_profile`, `include_archived_by_default`, `render_markdown` |
| `WS` | `/events?since=<event_id>` | Live stream of `task_events` rows |
Every handler is a thin wrapper — the plugin is ~700 lines of Python (router + WebSocket tail + bulk batcher + config reader) and adds no new business logic. A tiny `_conn()` helper auto-initializes `kanban.db` on every read and write, so a fresh install works whether the user opened the dashboard first, hit the REST API directly, or ran `hermes kanban init`.
### Dashboard config
Any of these keys under `dashboard.kanban` in `~/.hermes/config.yaml` changes the tab's defaults — the plugin reads them at load time via `GET /config`:
```yaml
dashboard:
kanban:
default_tenant: acme # preselects the tenant filter
lane_by_profile: true # default for the "lanes by profile" toggle
include_archived_by_default: false
render_markdown: true # set false for plain <pre> rendering
```
Each key is optional and falls back to the shown default.
### Security model
The dashboard's HTTP auth middleware [explicitly skips `/api/plugins/`](./extending-the-dashboard#backend-api-routes) — plugin routes are unauthenticated by design because the dashboard binds to localhost by default. That means the kanban REST surface is reachable from any process on the host.
The WebSocket takes one additional step: it requires the dashboard's ephemeral session token as a `?token=…` query parameter (browsers can't set `Authorization` on an upgrade request), matching the pattern used by the in-browser PTY bridge.
If you run `hermes dashboard --host 0.0.0.0`, every plugin route — kanban included — becomes reachable from the network. **Don't do that on a shared host.** The board contains task bodies, comments, and workspace paths; an attacker reaching these routes gets read access to your entire collaboration surface and can also create / reassign / archive tasks.
Tasks in `~/.hermes/kanban.db` are profile-agnostic on purpose (that's the coordination primitive). If you open the dashboard with `hermes -p <profile> dashboard`, the board still shows tasks created by any other profile on the host. Same user owns all profiles, but this is worth knowing if multiple personas coexist.
### Live updates
`task_events` is an append-only SQLite table with a monotonic `id`. The WebSocket endpoint holds each client's last-seen event id and pushes new rows as they land. When a burst of events arrives, the frontend reloads the (very cheap) board endpoint — simpler and more correct than trying to patch local state from every event kind. WAL mode means the read loop never blocks the dispatcher's `BEGIN IMMEDIATE` claim transactions.
### Extending it
The plugin uses the standard Hermes dashboard plugin contract — see [Extending the Dashboard](./extending-the-dashboard) for the full manifest reference, shell slots, page-scoped slots, and the Plugin SDK. Extra columns, custom card chrome, tenant-filtered layouts, or full `tab.override` replacements are all expressible without forking this plugin.
To disable without removing: add `dashboard.plugins.kanban.enabled: false` to `config.yaml` (or delete `plugins/kanban/dashboard/manifest.json`).
### Scope boundary
The GUI is deliberately thin. Everything the plugin does is reachable from the CLI; the plugin just makes it comfortable for humans. Auto-assignment, budgets, governance gates, and org-chart views remain user-space — a router profile, another plugin, or a reuse of `tools/approval.py` — exactly as listed in the out-of-scope section of the design spec.
## CLI command reference
```
hermes kanban init # create kanban.db + print daemon hint
hermes kanban create "<title>" [--body ...] [--assignee <profile>]
[--parent <id>]... [--tenant <name>]
[--workspace scratch|worktree|dir:<path>]
[--priority N] [--triage] [--idempotency-key KEY]
[--max-runtime 30m|2h|1d|<seconds>]
[--skill <name>]...
[--json]
hermes kanban list [--mine] [--assignee P] [--status S] [--tenant T] [--archived] [--json]
hermes kanban show <id> [--json]
hermes kanban assign <id> <profile> # or 'none' to unassign
hermes kanban link <parent_id> <child_id>
hermes kanban unlink <parent_id> <child_id>
hermes kanban claim <id> [--ttl SECONDS]
hermes kanban comment <id> "<text>" [--author NAME]
# Bulk verbs — accept multiple ids:
hermes kanban complete <id>... [--result "..."]
hermes kanban block <id> "<reason>" [--ids <id>...]
hermes kanban unblock <id>...
hermes kanban archive <id>...
hermes kanban tail <id> # follow a single task's event stream
hermes kanban watch [--assignee P] [--tenant T] # live stream ALL events to the terminal
[--kinds completed,blocked,…] [--interval SECS]
hermes kanban heartbeat <id> [--note "..."] # worker liveness signal for long ops
hermes kanban runs <id> [--json] # attempt history (one row per run)
hermes kanban assignees [--json] # profiles on disk + per-assignee task counts
hermes kanban dispatch [--dry-run] [--max N] # one-shot pass
[--failure-limit N] [--json]
hermes kanban daemon --force # DEPRECATED — standalone dispatcher (use `hermes gateway start` instead)
[--failure-limit N] [--pidfile PATH] [-v]
hermes kanban stats [--json] # per-status + per-assignee counts
hermes kanban log <id> [--tail BYTES] # worker log from ~/.hermes/kanban/logs/
hermes kanban notify-subscribe <id> # gateway bridge hook (used by /kanban in the gateway)
--platform <name> --chat-id <id> [--thread-id <id>] [--user-id <id>]
hermes kanban notify-list [<id>] [--json]
hermes kanban notify-unsubscribe <id>
--platform <name> --chat-id <id> [--thread-id <id>]
hermes kanban context <id> # what a worker sees
hermes kanban gc [--event-retention-days N] # workspaces + old events + old logs
[--log-retention-days N]
```
All commands are also available as a slash command in the gateway (`/kanban list`, `/kanban comment t_abc "need docs"`, etc.). The slash command bypasses the running-agent guard, so you can `/kanban unblock` a stuck worker while the main agent is still chatting.
## Collaboration patterns
The board supports these eight patterns without any new primitives:
| Pattern | Shape | Example |
|---|---|---|
| **P1 Fan-out** | N siblings, same role | "research 5 angles in parallel" |
| **P2 Pipeline** | role chain: scout → editor → writer | daily brief assembly |
| **P3 Voting / quorum** | N siblings + 1 aggregator | 3 researchers → 1 reviewer picks |
| **P4 Long-running journal** | same profile + shared dir + cron | Obsidian vault |
| **P5 Human-in-the-loop** | worker blocks → user comments → unblock | ambiguous decisions |
| **P6 `@mention`** | inline routing from prose | `@reviewer look at this` |
| **P7 Thread-scoped workspace** | `/kanban here` in a thread | per-project gateway threads |
| **P8 Fleet farming** | one profile, N subjects | 50 social accounts |
| **P9 Triage specifier** | rough idea → `triage` → specifier expands body → `todo` | "turn this one-liner into a spec' task" |
For worked examples of each, see `docs/hermes-kanban-v1-spec.pdf`.
## Multi-tenant usage
When one specialist fleet serves multiple businesses, tag each task with a tenant:
```bash
hermes kanban create "monthly report" \
--assignee researcher \
--tenant business-a \
--workspace dir:~/tenants/business-a/data/
```
Workers receive `$HERMES_TENANT` and namespace their memory writes by prefix. The board, the dispatcher, and the profile definitions are all shared; only the data is scoped.
## Gateway notifications
When you run `/kanban create …` from the gateway (Telegram, Discord, Slack, etc.), the originating chat is automatically subscribed to the new task. The gateway's background notifier polls `task_events` every few seconds and delivers one message per terminal event (`completed`, `blocked`, `gave_up`, `crashed`, `timed_out`) to that chat. Completed tasks also send the first line of the worker's `--result` so you see the outcome without having to `/kanban show`.
You can manage subscriptions explicitly from the CLI — useful when a script / cron job wants to notify a chat it didn't originate from:
```bash
hermes kanban notify-subscribe t_abcd \
--platform telegram --chat-id 12345678 --thread-id 7
hermes kanban notify-list
hermes kanban notify-unsubscribe t_abcd \
--platform telegram --chat-id 12345678 --thread-id 7
```
A subscription removes itself automatically once the task reaches `done` or `archived`; no cleanup needed.
## Runs — one row per attempt
A task is a logical unit of work; a **run** is one attempt to execute it. When the dispatcher claims a ready task it creates a row in `task_runs` and points `tasks.current_run_id` at it. When that attempt ends — completed, blocked, crashed, timed out, spawn-failed, reclaimed — the run row closes with an `outcome` and the task's pointer clears. A task that's been attempted three times has three `task_runs` rows.
Why two tables instead of just mutating the task: you need **full attempt history** for real-world postmortems ("the second reviewer attempt got to approve, the third merged"), and you need a clean place to hang per-attempt metadata — which files changed, which tests ran, which findings a reviewer noted. Those are run facts, not task facts.
Runs are also where **structured handoff** lives. When a worker completes a task it can pass:
- `--result "<short log line>"` — goes on the task row as before (for back-compat).
- `--summary "<human handoff>"` — goes on the run; downstream children see it in their `build_worker_context`.
- `--metadata '{"changed_files": [...], "tests_run": 12}'` — JSON dict on the run; children see it serialized alongside the summary.
Downstream children read the most recent completed run's summary + metadata for each parent. Retrying workers read the prior attempts on their own task (outcome, summary, error) so they don't repeat a path that already failed.
```bash
# Worker completes with a structured handoff:
hermes kanban complete t_abcd \
--result "rate limiter shipped" \
--summary "implemented token bucket, keys on user_id with IP fallback, all tests pass" \
--metadata '{"changed_files": ["limiter.py", "tests/test_limiter.py"], "tests_run": 14}'
# Review the attempt history on a retried task:
hermes kanban runs t_abcd
# # OUTCOME PROFILE ELAPSED STARTED
# 1 blocked worker 12s 2026-04-27 14:02
# → BLOCKED: need decision on rate-limit key
# 2 completed worker 8m 2026-04-27 15:18
# → implemented token bucket, keys on user_id with IP fallback
```
Runs are exposed on the dashboard (Run History section in the drawer, one coloured row per attempt) and on the REST API (`GET /api/plugins/kanban/tasks/:id` returns a `runs[]` array). `PATCH /api/plugins/kanban/tasks/:id` with `{status: "done", summary, metadata}` forwards both to the kernel, so the dashboard's "mark done" button is CLI-equivalent. `task_events` rows carry the `run_id` they belong to so the UI can group them by attempt, and the `completed` event embeds the first-line summary in its payload (capped at 400 chars) so gateway notifiers can render structured handoffs without a second SQL round-trip.
**Bulk close caveat.** `hermes kanban complete a b c --summary X` is refused — structured handoff is per-run, so copy-pasting the same summary to N tasks is almost always wrong. Bulk close *without* `--summary` / `--metadata` still works for the common "I finished a pile of admin tasks" case.
**Reclaimed runs from status changes.** If you drag a running task off `running` in the dashboard (back to `ready`, or straight to `todo`), or archive a task that was still running, the in-flight run closes with `outcome='reclaimed'` rather than being orphaned. The `task_runs` row is always in a terminal state when `tasks.current_run_id` is `NULL`, and vice versa — that invariant holds across CLI, dashboard, dispatcher, and notifier.
**Synthetic runs for never-claimed completions.** Completing or blocking a task that was never claimed (e.g. a human closes a `ready` task from the dashboard with a summary, or a CLI user runs `hermes kanban complete <ready-task> --summary X`) would otherwise drop the handoff. Instead the kernel inserts a zero-duration run row (`started_at == ended_at`) carrying the summary / metadata / reason so attempt history stays complete. The `completed` / `blocked` event's `run_id` points at that row.
**Live drawer refresh.** When the dashboard's WebSocket event stream reports new events for the task the user is currently viewing, the drawer reloads itself (via a per-task event counter threaded into its `useEffect` dependency list). Closing and reopening is no longer required to see a run's new row or updated outcome.
### Forward compatibility
Two nullable columns on `tasks` are reserved for v2 workflow routing: `workflow_template_id` (which template this task belongs to) and `current_step_key` (which step in that template is active). The v1 kernel ignores them for routing but lets clients write them, so a v2 release can add the routing machinery without another schema migration.
## Event reference
Every transition appends a row to `task_events`. Each row carries an optional `run_id` so UIs can group events by attempt. Kinds group into three clusters so filtering is easy (`hermes kanban watch --kinds completed,gave_up,timed_out`):
**Lifecycle** (what changed about the task as a logical unit):
| Kind | Payload | When |
|---|---|---|
| `created` | `{assignee, status, parents, tenant}` | Task inserted. `run_id` is `NULL`. |
| `promoted` | — | `todo → ready` because all parents hit `done`. `run_id` is `NULL`. |
| `claimed` | `{lock, expires, run_id}` | Dispatcher atomically claimed a `ready` task for spawn. |
| `completed` | `{result_len, summary?}` | Worker wrote `--result` / `--summary` and task hit `done`. `summary` is the first-line handoff (400-char cap); full version lives on the run row. If `complete_task` is called on a never-claimed task with handoff fields, a zero-duration run is synthesized so `run_id` still points at something. |
| `blocked` | `{reason}` | Worker or human flipped the task to `blocked`. Synthesizes a zero-duration run when called on a never-claimed task with `--reason`. |
| `unblocked` | — | `blocked → ready`, either manually or via `/unblock`. `run_id` is `NULL`. |
| `archived` | — | Hidden from the default board. If the task was still running, carries the `run_id` of the run that was reclaimed as a side effect. |
**Edits** (human-driven changes that aren't transitions):
| Kind | Payload | When |
|---|---|---|
| `assigned` | `{assignee}` | Assignee changed (including unassignment). |
| `edited` | `{fields}` | Title or body updated. |
| `reprioritized` | `{priority}` | Priority changed. |
| `status` | `{status}` | Dashboard drag-drop wrote a status directly (e.g. `todo → ready`). Carries the `run_id` of the run that was reclaimed when dragging off `running`; otherwise `run_id` is NULL. |
**Worker telemetry** (about the execution process, not the logical task):
| Kind | Payload | When |
|---|---|---|
| `spawned` | `{pid}` | Dispatcher successfully started a worker process. |
| `heartbeat` | `{note?}` | Worker called `hermes kanban heartbeat $TASK` to signal liveness during long operations. |
| `reclaimed` | `{stale_lock}` | Claim TTL expired without a completion; task goes back to `ready`. |
| `crashed` | `{pid, claimer}` | Worker PID no longer alive but TTL hadn't expired yet. |
| `timed_out` | `{pid, elapsed_seconds, limit_seconds, sigkill}` | `max_runtime_seconds` exceeded; dispatcher SIGTERM'd (then SIGKILL'd after 5 s grace) and re-queued. |
| `spawn_failed` | `{error, failures}` | One spawn attempt failed (missing PATH, workspace unmountable, …). Counter increments; task returns to `ready` for retry. |
| `gave_up` | `{failures, error}` | Circuit breaker fired after N consecutive `spawn_failed`. Task auto-blocks with the last error. Default N = 5; override via `--failure-limit`. |
`hermes kanban tail <id>` shows these for a single task. `hermes kanban watch` streams them board-wide.
## Out of scope
Kanban is deliberately single-host. `~/.hermes/kanban.db` is a local SQLite file and the dispatcher spawns workers on the same machine. Running a shared board across two hosts is not supported — there's no coordination primitive for "worker X on host A, worker Y on host B," and the crash-detection path assumes PIDs are host-local. If you need multi-host, run an independent board per host and use `delegate_task` / a message queue to bridge them.
## Design spec
The complete design — architecture, concurrency correctness, comparison with other systems, implementation plan, risks, open questions — lives in `docs/hermes-kanban-v1-spec.pdf`. Read that before filing any behavior-change PR.
+2
View File
@@ -62,6 +62,8 @@ const sidebars: SidebarsConfig = {
items: [
'user-guide/features/cron',
'user-guide/features/delegation',
'user-guide/features/kanban',
'user-guide/features/kanban-tutorial',
'user-guide/features/code-execution',
'user-guide/features/hooks',
'user-guide/features/batch-processing',
Binary file not shown.

After

Width:  |  Height:  |  Size: 748 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 764 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 476 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 500 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 496 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 811 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 804 KiB

Some files were not shown because too many files have changed in this diff Show More