Compare commits

75 Commits

Author SHA1 Message Date
kshitijk4poor 8aaefec231 fix: follow-up for salvaged PR #8952
- Rename provider_contracts.py -> volcengine_byteplus.py for explicitness
- Consolidate duplicate host-to-provider mappings: provider_for_base_url()
  now uses the canonical _URL_TO_PROVIDER from model_metadata.py instead of
  maintaining a separate 20-entry dict
- Add volcengine/byteplus to runtime_provider.py model-dependent base URL
  resolution (kimi-style special case) so manually-edited configs resolve
  the coding-plan base URL correctly
- Remove volcengine/byteplus from _API_KEY_PROVIDER_AUX_MODELS — the
  main-model-first design in _resolve_auto() handles these providers
  already; entries were dead code in the normal flow
- Add VOLCENGINE_API_KEY and BYTEPLUS_API_KEY to OPTIONAL_ENV_VARS in
  config.py so they appear in hermes setup
- Update docs: environment-variables.md, fallback-providers.md,
  configuration.md
2026-04-22 22:42:39 +05:30
gaoyiman ccde71a6ab feat(providers): add Volcengine and BytePlus support
Based on PR #8952 by @Maaannnn.

Adds Volcengine and BytePlus as first-class providers, each with standard
and Coding Plan model catalogs. The model prefix (volcengine/ vs
volcengine-coding-plan/) determines the runtime base URL automatically.

- New hermes_cli/provider_contracts.py centralises all constants
- ProviderConfig entries in auth.py with api_key auth
- Model catalogs, aliases, and provider ordering in models.py/providers.py
- Auxiliary client entries and context window resolution
- gateway /provider command detects known Volcengine/BytePlus endpoints
- Comprehensive tests and docs update
2026-04-22 22:33:06 +05:30
kshitijk4poor 5e8262da26 chore: add rnijhara to AUTHOR_MAP 2026-04-22 08:49:24 -07:00
kshitijk4poor 1f216ecbb4 feat(gateway/slack): add SLACK_REACTIONS env toggle for reaction lifecycle
Adds _reactions_enabled() gating to match Discord (DISCORD_REACTIONS) and
Telegram (TELEGRAM_REACTIONS) pattern. Defaults to true to preserve existing
behavior. Gates at three levels:
- _handle_slack_message: skips _reacting_message_ids registration
- on_processing_start: early return
- on_processing_complete: early return

Also adds config.yaml bridge (slack.reactions) and two new tests.
2026-04-22 08:49:24 -07:00
Roopak Nijhara 70a33708e7 fix(gateway/slack): align reaction lifecycle with Discord/Telegram pattern
Slack reactions were placed around handle_message(), which returns
immediately after spawning a background task. This caused the 👀 swap to happen before any real work began.

Fix: implement on_processing_start / on_processing_complete callbacks
(matching Discord/Telegram) so reactions bracket actual _message_handler
work driven by the base class.

Also fixes missing stop_typing() for Slack's assistant thread status
indicator, which left 'is thinking...' stuck in the UI after processing
completed.

- Add _reacting_message_ids set for DM/@mention-only gating
- Add _active_status_threads dict for stop_typing lookup
- Update test_reactions_in_message_flow for new callback pattern
- Add test_reactions_failure_outcome and test_reactions_skipped_for_non_dm_non_mention
2026-04-22 08:49:24 -07:00
kshitijk4poor 04e039f687 fix: Kimi /coding thinking block survival + empty reasoning_content + block ordering
Follow-up to the cherry-picked PR #13897 fix. Three issues found:

1. CRITICAL: The thinking block synthesised from reasoning_content was
   immediately stripped by the third-party signature management code
   (Kimi is classified as _is_third_party_anthropic_endpoint). Added a
   Kimi-specific carve-out that preserves unsigned thinking blocks while
   still stripping Anthropic-signed blocks Kimi can't validate.

2. Empty-string reasoning_content was silently dropped because the
   truthiness check ('if reasoning_content and ...') evaluates to False
   for ''. Changed to 'isinstance(reasoning_content, str)' so the
   tier-3 fallback from _copy_reasoning_content_for_api (which injects
   '' for Kimi tool-call messages with no reasoning) actually produces
   a thinking block.

3. The thinking block was appended AFTER tool_use blocks. Anthropic
   protocol requires thinking -> text -> tool_use ordering. Changed to
   blocks.insert(0, ...) to prepend.
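
Points 2 and 3 can be illustrated with a minimal sketch. `build_content_blocks` and the block dictionaries are hypothetical stand-ins, not the adapter's actual code:

```python
def build_content_blocks(reasoning_content, tool_use_blocks):
    """Hypothetical helper: assemble Anthropic-style content blocks."""
    blocks = list(tool_use_blocks)
    # isinstance() accepts "" as a valid value; `if reasoning_content:` would skip it.
    if isinstance(reasoning_content, str):
        # Prepend so the thinking block comes before every tool_use block.
        blocks.insert(0, {"type": "thinking", "thinking": reasoning_content})
    return blocks
```

With an empty string the thinking block is still emitted and lands first; with `None` the tool_use blocks pass through unchanged.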
2026-04-22 08:21:23 -07:00
Jerome 97a536057d chore(release): add hiddenpuppy to AUTHOR_MAP
Map tsuijinglei@gmail.com → hiddenpuppy.
2026-04-22 08:21:23 -07:00
Jerome 2efb0eea21 fix(anthropic_adapter): preserve reasoning_content on assistant tool-call messages for Kimi /coding
Fixes NousResearch/hermes-agent#13848

Kimi's /coding endpoint speaks the Anthropic Messages protocol but has its
own thinking semantics: when thinking is enabled, Kimi validates message
history and requires every prior assistant tool-call message to carry
OpenAI-style reasoning_content.

The Anthropic path never populated that field, and
convert_messages_to_anthropic strips all Anthropic thinking blocks on
third-party endpoints — so the request failed with HTTP 400:
  "thinking is enabled but reasoning_content is missing in assistant
tool call message at index N"

Now, when an assistant message contains tool_calls and a
reasoning_content string, we append a {"type": "thinking", ...} block
to the Anthropic content so Kimi can validate the history.  This only
affects assistant messages with tool_calls + reasoning_content; plain
text assistant messages are unchanged.
2026-04-22 08:21:23 -07:00
Teknium 77e04a29d5 fix(error_classifier): don't classify generic 404 as model_not_found (#14013)
The 404 branch in _classify_by_status had dead code: the generic
fallback below the _MODEL_NOT_FOUND_PATTERNS check returned the
exact same classification (model_not_found + should_fallback=True),
so every 404 — regardless of message — was treated as a missing model.

This bites local-endpoint users (llama.cpp, Ollama, vLLM) whose 404s
usually mean a wrong endpoint path, proxy routing glitch, or transient
backend issue — not a missing model. Claiming 'model not found' misleads
the next turn and silently falls back to another provider when the real
problem was a URL typo the user should see.

Fix: only classify 404 as model_not_found when the message actually
matches _MODEL_NOT_FOUND_PATTERNS ("invalid model", "model not found",
etc.). Otherwise fall through as unknown (retryable) so the real error
surfaces in the retry loop.
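
A rough sketch of the corrected branch, under stated assumptions: `classify_404` and the returned dictionary shape are hypothetical, and the pattern tuple holds only the two strings quoted above, not the full _MODEL_NOT_FOUND_PATTERNS list.

```python
# Assumed subset of the real pattern list; only these two strings are quoted in the message.
MODEL_NOT_FOUND_PATTERNS = ("invalid model", "model not found")


def classify_404(message):
    """Hypothetical stand-in for the 404 branch of _classify_by_status."""
    lowered = message.lower()
    if any(pattern in lowered for pattern in MODEL_NOT_FOUND_PATTERNS):
        return {"reason": "model_not_found", "should_fallback": True}
    # Any other 404 stays retryable so the real error surfaces in the retry loop.
    return {"reason": "unknown", "retryable": True}
```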

Test updated to match the new behavior. 103 error_classifier tests pass.
2026-04-22 06:11:47 -07:00
Yukipukii1 40619b393f tools: normalize file tool pagination bounds 2026-04-22 06:11:41 -07:00
Teknium 3e652f75b2 fix(plugins+nous): auto-coerce memory plugins; actionable Nous 401 diagnostic (#14005)
* fix(plugins): auto-coerce user-installed memory plugins to kind=exclusive

User-installed memory provider plugins at $HERMES_HOME/plugins/<name>/
were being dispatched to the general PluginManager, which has no
register_memory_provider method on PluginContext. Every startup logged:

  Failed to load plugin 'mempalace': 'PluginContext' object has no
  attribute 'register_memory_provider'

Bundled memory providers were already skipped via skip_names={memory,
context_engine} in discover_and_load, but user-installed ones weren't.

Fix: _parse_manifest now scans the plugin's __init__.py source for
'register_memory_provider' or 'MemoryProvider' (same heuristic as
plugins/memory/__init__.py:_is_memory_provider_dir) and auto-coerces
kind to 'exclusive' when the manifest didn't declare one explicitly.
This routes the plugin to plugins/memory discovery instead of the
general loader.

The escape hatch: if a manifest explicitly declares kind: standalone,
the heuristic doesn't override it.
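
The heuristic and its escape hatch could look roughly like this. `resolve_kind` is a hypothetical name, and the `standalone` fallback for undeclared, non-memory plugins is an assumption rather than something this commit states:

```python
MEMORY_MARKERS = ("register_memory_provider", "MemoryProvider")


def resolve_kind(declared_kind, init_source):
    """Hypothetical sketch of the kind auto-coercion in _parse_manifest."""
    # An explicit manifest declaration always wins (the escape hatch).
    if declared_kind is not None:
        return declared_kind
    # Source scan: either marker string routes the plugin to memory discovery.
    if any(marker in init_source for marker in MEMORY_MARKERS):
        return "exclusive"
    return "standalone"  # assumed default when nothing is declared
```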

Reported by Uncle HODL on Discord.

* fix(nous): actionable CLI message when Nous 401 refresh fails

Mirrors the Anthropic 401 diagnostic pattern. When Nous returns 401
and the credential refresh (_try_refresh_nous_client_credentials)
also fails, the user used to see only the raw APIError. Now prints:

  🔐 Nous 401 — Portal authentication failed.
     Response: <truncated body>
     Most likely: Portal OAuth expired, account out of credits, or
                  agent key revoked.
     Troubleshooting:
       • Re-authenticate: hermes login --provider nous
       • Check credits / billing: https://portal.nousresearch.com
       • Verify stored credentials: $HERMES_HOME/auth.json
       • Switch providers temporarily: /model <model> --provider openrouter

Addresses the common 'my hermes model hangs' pattern where the user's
Portal OAuth expired and the CLI gave no hint about the next step.
2026-04-22 05:54:11 -07:00
kshitijk4poor 5fb143169b feat(dashboard): track real API call count per session
Adds schema v7 'api_call_count' column. run_agent.py increments it by 1
per LLM API call, web_server analytics SQL aggregates it, frontend uses
the real counter instead of summing sessions.

The 'API Calls' card on the analytics dashboard previously displayed
COUNT(*) from the sessions table — the number of conversations, not
LLM requests. Each session makes 10-90 API calls through the tool loop,
so the reported number was ~30x lower than real.
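
The difference between the two aggregates can be reproduced with an in-memory SQLite table. This is only an illustration: the table below has just the column relevant here, and `api_call_totals` is a hypothetical helper, not the web_server query.

```python
import sqlite3


def api_call_totals(per_session_counts):
    """Return (session count, summed API calls) for made-up per-session counts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sessions (id INTEGER PRIMARY KEY, api_call_count INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO sessions (api_call_count) VALUES (?)",
        [(n,) for n in per_session_counts],
    )
    # Old card: counts conversations, not LLM requests.
    session_count = conn.execute("SELECT COUNT(*) FROM sessions").fetchone()[0]
    # New card: sums the per-session counter.
    real_calls = conn.execute(
        "SELECT COALESCE(SUM(api_call_count), 0) FROM sessions"
    ).fetchone()[0]
    conn.close()
    return session_count, real_calls
```

Two sessions with 10 and 90 calls yield a session count of 2 but 100 real API calls.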

Salvaged from PR #10140 (@kshitijk4poor). The cache-token accuracy
portions of the original PR were deferred — per-provider analytics is
the better path there, since cache_write_tokens and actual_cost_usd
are only reliably available from a subset of providers (Anthropic
native, Codex Responses, OpenRouter with usage.include).

Tests:
- schema_version v7 assertion
- migration v2 -> v7 adds api_call_count column with default 0
- update_token_counts increments api_call_count by provided delta
- absolute=True sets api_call_count directly
- /api/analytics/usage exposes total_api_calls in totals
2026-04-22 05:51:58 -07:00
teknium1 be11a75eae chore(release): map hharry11 email to GitHub handle 2026-04-22 05:51:44 -07:00
hharry11 83cb9a03ee fix(cli): ensure project .env is sanitized before loading 2026-04-22 05:51:44 -07:00
WideLee cf55c738e7 refactor(qqbot): migrate qr onboard flow to sync + consolidate into onboard.py
- Replace async create_bind_task/poll_bind_result with synchronous
  httpx.Client equivalents, eliminating manual event loop management
- Move _render_qr and full qr_register() entry-point into onboard.py,
  mirroring the Feishu onboarding pattern
- Remove _qqbot_render_qr and _qqbot_qr_flow from gateway.py (~90 lines);
  call site becomes a single qr_register() import
- Fix potential segfault: previous code called loop.close() in the EXPIRED
  branch and again in the finally block (double-close crashed under uvloop)
2026-04-22 05:50:21 -07:00
Teknium ba7e8b0df9 chore(release): map Abner email to Abnertheforeman 2026-04-22 05:27:10 -07:00
Abner b66644f0ec feat(hindsight): richer session-scoped retain metadata
- Add configurable retain_tags / retain_source / retain_user_prefix /
  retain_assistant_prefix knobs for native Hindsight.
- Thread gateway session identity (user_name, chat_id, chat_name,
  chat_type, thread_id) through AIAgent and MemoryManager into
  MemoryProvider.initialize kwargs so providers can scope and tag
  retained memories.
- Hindsight attaches the new identity fields as retain metadata,
  merges per-call tool tags with configured default tags, and uses
  the configurable transcript labels for auto-retained turns.

Co-authored-by: Abner <abner.the.foreman@agentmail.to>
2026-04-22 05:27:10 -07:00
Teknium b8663813b6 feat(state): auto-prune old sessions + VACUUM state.db at startup (#13861)
* feat(state): auto-prune old sessions + VACUUM state.db at startup

state.db accumulates every session, message, and FTS5 index entry forever.
A heavy user (gateway + cron) reported 384MB with 982 sessions / 68K messages
causing slowdown; manual 'hermes sessions prune --older-than 7' + VACUUM
brought it to 43MB. The prune command and VACUUM are not wired to run
automatically anywhere — sessions grew unbounded until users noticed.

Changes:
- hermes_state.py: new state_meta key/value table, vacuum() method, and
  maybe_auto_prune_and_vacuum() — idempotent via last-run timestamp in
  state_meta so it only actually executes once per min_interval_hours
  across all Hermes processes for a given HERMES_HOME. Never raises.
- hermes_cli/config.py: new 'sessions:' block in DEFAULT_CONFIG
  (auto_prune=True, retention_days=90, vacuum_after_prune=True,
  min_interval_hours=24). Added to _KNOWN_ROOT_KEYS.
- cli.py: call maintenance once at HermesCLI init (shared helper
  _run_state_db_auto_maintenance reads config and delegates to DB).
- gateway/run.py: call maintenance once at GatewayRunner init.
- Docs: user-guide/sessions.md rewrites 'Automatic Cleanup' section.
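
The once-per-interval gate can be sketched as a pure function over the stored timestamp. `maintenance_due` is a hypothetical name, the ISO-8601 marker format is an assumption, and treating an unreadable marker as "never ran" is one plausible reading of the corrupt-marker recovery, not a confirmed detail:

```python
from datetime import datetime, timedelta, timezone


def maintenance_due(last_run_iso, now, min_interval_hours=24):
    """Return True when prune+VACUUM should run again."""
    if not last_run_iso:
        return True  # no marker yet: first run
    try:
        elapsed = now - datetime.fromisoformat(last_run_iso)
    except (ValueError, TypeError):
        # Unparseable or timezone-mismatched marker: assume it never ran.
        return True
    return elapsed >= timedelta(hours=min_interval_hours)
```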

Why VACUUM matters: SQLite does NOT shrink the file on DELETE — freed
pages get reused on next INSERT. Without VACUUM, a delete-heavy DB stays
bloated forever. VACUUM only runs when the prune actually removed rows,
so tight DBs don't pay the I/O cost.
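
The effect is easy to observe with a throwaway database. The sketch below uses a made-up `messages` table and a hypothetical `vacuum_demo` helper; it only demonstrates the SQLite behaviour described above, not the hermes_state.py code:

```python
import os
import sqlite3
import tempfile


def vacuum_demo(row_count=2000):
    """Return file sizes after insert, after DELETE, and after VACUUM."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "state.db")
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE messages (body TEXT)")
        conn.executemany(
            "INSERT INTO messages VALUES (?)",
            [("x" * 1000,) for _ in range(row_count)],
        )
        conn.commit()
        size_full = os.path.getsize(db_path)

        deleted = conn.execute("DELETE FROM messages").rowcount
        conn.commit()  # VACUUM cannot run inside an open transaction
        size_after_delete = os.path.getsize(db_path)

        if deleted > 0:  # skip the I/O cost when nothing was pruned
            conn.execute("VACUUM")
        size_after_vacuum = os.path.getsize(db_path)
        conn.close()
        return size_full, size_after_delete, size_after_vacuum
```

The file only shrinks at the third measurement: the DELETE frees pages for reuse, and VACUUM is what returns them to the filesystem.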

Tests: 10 new tests in tests/test_hermes_state.py covering state_meta,
vacuum, idempotency, interval skipping, VACUUM-only-when-needed,
corrupt-marker recovery. All 246 existing state/config/gateway tests
still pass.

Verified E2E with real imports + isolated HERMES_HOME: DEFAULT_CONFIG
exposes the new block, load_config() returns it for fresh installs,
first call prunes+vacuums, second call within min_interval_hours skips,
and the state_meta marker persists across connection close/reopen.

* sessions.auto_prune defaults to false (opt-in)

Session history powers session_search recall across past conversations,
so silently pruning on startup could surprise users. Ship the machinery
disabled and let users opt in when they notice state.db is hurting
performance.

- DEFAULT_CONFIG.sessions.auto_prune: True → False
- Call-site fallbacks in cli.py and gateway/run.py match the new default
  (so unmigrated configs still see off)
- Docs: flip 'Enable in config.yaml' framing + tip explains the tradeoff
2026-04-22 05:21:49 -07:00
Teknium b43524ecab fix(wecom): visible poll progress + clearer no-bot-info failure + docstring note
Follow-ups on top of salvaged #13923 (@keifergu):
- Print QR poll dot every 3s instead of every 18s so "Fetching
  configuration results..." doesn't look hung.
- On "status=success but no bot_info" from the WeCom query endpoint,
  log the full payload at WARNING and tell the user we're falling
  back to manual entry (was previously a single opaque line).
- Document in the qr_scan_for_bot_info() docstring that the
  work.weixin.qq.com/ai/qc/* endpoints are the admin-console web-UI
  flow, not the public developer API, and may change without notice.

Also add keifergu@tencent.com to scripts/release.py AUTHOR_MAP so
release notes attribute the feature correctly.
2026-04-22 05:15:32 -07:00
keifergu 3f60a907e1 docs(wecom): document QR scan-to-create setup flow 2026-04-22 05:15:32 -07:00
keifergu 8bcd77a9c2 feat(wecom): add QR scan flow and interactive setup wizard for bot credentials 2026-04-22 05:15:32 -07:00
Teknium d166716c65 feat(optional-skills): add page-agent skill under new web-development category (#13976)
Adds an optional skill that walks users through installing and using
alibaba/page-agent — a pure-JS in-page GUI agent that web developers
embed into their own webapps so end users can drive the UI with
natural language.

Three install paths: CDN demo (30s, no install), npm install into an
existing app with provider config table (Qwen/OpenAI/Ollama/OpenRouter),
and clone-from-source for dev/contributor workflow.

Clear use-case framing up front (embed AI copilot in SaaS/admin/B2B,
modernize legacy UIs, accessibility via natural language) and an
explicit NOT-for list that points users wanting server-side browser
automation back to Hermes' built-in browser tool.

Live-verified: repo builds on Node 22.22 + npm 10.9, dev:demo serves
at localhost:5174, API surface (new PageAgent{...}, panel.show(),
execute(task)) matches what the skill documents. Also verified
discovery end-to-end via OptionalSkillSource with isolated
HERMES_HOME — search/inspect/fetch all resolve
official/web-development/page-agent correctly.

New category directory: optional-skills/web-development/ with a
DESCRIPTION.md explaining the distinction from Hermes' own browser
automation (outside-in vs inside-out).
2026-04-22 04:54:26 -07:00
helix4u a7d78d3bfd fix: preserve reasoning_content on Kimi replay 2026-04-22 04:31:59 -07:00
kshitijk4poor 30ec12970b fix(packaging): include agent.* sub-packages in pyproject.toml
The transport refactor (PRs #13862 ff.) added agent/transports/ as a
sub-package but the setuptools packages.find include list only had
"agent" (top-level files), not "agent.*" (sub-packages).

pip install / Nix builds therefore ship run_agent.py (which now imports
from agent.transports on every API call) but omit the transports
directory entirely, causing:

  ModuleNotFoundError: No module named 'agent.transports'

on every LLM call for packaged installs.

Adds "agent.*" to match the existing pattern used by tools, gateway,
tui_gateway, and plugins.
2026-04-22 03:35:37 -07:00
hengm3467 c6b1ef4e58 feat: add Step Plan provider support (salvage #6005)
Adds a first-class 'stepfun' API-key provider surfaced as Step Plan:

- Support Step Plan setup for both International and China regions
- Discover Step Plan models live from /step_plan/v1/models, with a
  small coding-focused fallback catalog when discovery is unavailable
- Thread StepFun through provider metadata, setup persistence, status
  and doctor output, auxiliary routing, and model normalization
- Add tests for provider resolution, model validation, metadata
  mapping, and StepFun region/model persistence

Based on #6005 by @hengm3467.

Co-authored-by: hengm3467 <100685635+hengm3467@users.noreply.github.com>
2026-04-22 02:59:58 -07:00
Teknium ff9752410a feat(plugins): pluggable image_gen backends + OpenAI provider (#13799)
* feat(plugins): pluggable image_gen backends + OpenAI provider

Adds an ImageGenProvider ABC so image generation backends register as
bundled plugins under `plugins/image_gen/<name>/`. The plugin scanner
gains three primitives to make this work generically:

- `kind:` manifest field (`standalone` | `backend` | `exclusive`).
  Bundled `kind: backend` plugins auto-load — no `plugins.enabled`
  incantation. User-installed backends stay opt-in.
- Path-derived keys: `plugins/image_gen/openai/` gets key
  `image_gen/openai`, so a future `tts/openai` cannot collide.
- Depth-2 recursion into category namespaces (parent dirs without a
  `plugin.yaml` of their own).
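
A path-derived key can be computed from the plugin directory relative to the plugins root. `derive_plugin_key` is a hypothetical helper written for illustration; the scanner's real function name and signature are not shown in this commit:

```python
from pathlib import PurePosixPath


def derive_plugin_key(plugins_root, plugin_dir):
    """Key a plugin by its path below the plugins root, e.g. 'image_gen/openai'."""
    root = PurePosixPath(plugins_root)
    return PurePosixPath(plugin_dir).relative_to(root).as_posix()
```

Because the category directory is part of the key, `plugins/image_gen/openai/` and a future `plugins/tts/openai/` map to different keys.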

Includes `OpenAIImageGenProvider` as the first consumer (gpt-image-1.5
default, plus gpt-image-1, gpt-image-1-mini, DALL-E 3/2). Base64
responses save to `$HERMES_HOME/cache/images/`; URL responses pass
through.

FAL stays in-tree for this PR — a follow-up ports it into
`plugins/image_gen/fal/` so the in-tree `image_generation_tool.py`
slims down. The dispatch shim in `_handle_image_generate` only fires
when `image_gen.provider` is explicitly set to a non-FAL value, so
existing FAL setups are untouched.

- 41 unit tests (scanner recursion, kind parsing, gate logic,
  registry, OpenAI payload shapes)
- E2E smoke verified: bundled plugin autoloads, registers, and
  `_handle_image_generate` routes to OpenAI when configured

* fix(image_gen/openai): don't send response_format to gpt-image-*

The live API rejects it: 'Unknown parameter: response_format'
(verified 2026-04-21 with gpt-image-1.5). gpt-image-* models return
b64_json unconditionally, so the parameter was both unnecessary and
actively broken.

* feat(image_gen/openai): gpt-image-2 only, drop legacy catalog

gpt-image-2 is the latest/best OpenAI image model (released 2026-04-21)
and there's no reason to expose the older gpt-image-1.5 / gpt-image-1 /
dall-e-3 / dall-e-2 alongside it — slower, lower quality, or awkward
(dall-e-2 squares only). Trim the catalog down to a single model.

Live-verified end-to-end: landscape 1536x1024 render of a Moog-style
synth matches prompt exactly, 2.4MB PNG saved to cache.

* feat(image_gen/openai): expose gpt-image-2 as three quality tiers

Users pick speed/fidelity via the normal model picker instead of a
hidden quality knob. All three tier IDs resolve to the single underlying
gpt-image-2 API model with a different quality parameter:

  gpt-image-2-low     ~15s   fast iteration
  gpt-image-2-medium  ~40s   default
  gpt-image-2-high    ~2min  highest fidelity

Live-measured on OpenAI's API today: 15.4s / 40.8s / 116.9s for the
same 1024x1024 prompt.
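
One way to resolve a tier ID into the underlying model plus a quality value is to split on the last hyphen. This is a sketch under assumptions: `resolve_tier` and the returned dictionary are hypothetical, and the plugin may well use a lookup table instead.

```python
QUALITY_TIERS = ("low", "medium", "high")


def resolve_tier(model_id):
    """Split e.g. 'gpt-image-2-high' into the API model and its quality setting."""
    base, _, quality = model_id.rpartition("-")
    if base != "gpt-image-2" or quality not in QUALITY_TIERS:
        raise ValueError(f"unknown image model tier: {model_id!r}")
    return {"model": base, "quality": quality}
```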

Config:
  image_gen.openai.model: gpt-image-2-high
  # or
  image_gen.model: gpt-image-2-low
  # or env var for scripts/tests
  OPENAI_IMAGE_MODEL=gpt-image-2-medium

Live-verified end-to-end with the low tier: 18.8s landscape render of a
golden retriever in wildflowers, vision-confirmed exact match.

* feat(tools_config): plugin image_gen providers inject themselves into picker

'hermes tools' → Image Generation now shows plugin-registered backends
alongside Nous Subscription and FAL.ai without tools_config.py needing
to know about them. OpenAI appears as a third option today; future
backends appear automatically as they're added.

Mechanism:
- ImageGenProvider gains an optional get_setup_schema() hook
  (name, badge, tag, env_vars). Default derived from display_name.
- tools_config._plugin_image_gen_providers() pulls the schemas from
  every registered non-FAL plugin provider.
- _visible_providers() appends those rows when rendering the Image
  Generation category.
- _configure_provider() handles the new image_gen_plugin_name marker:
  writes image_gen.provider and routes to the plugin's list_models()
  catalog for the model picker.
- _toolset_needs_configuration_prompt('image_gen') stops demanding a
  FAL key when any plugin provider reports is_available().

FAL is skipped in the plugin path because it already has hardcoded
TOOL_CATEGORIES rows — when it gets ported to a plugin in a follow-up
PR the hardcoded rows go away and it surfaces through the same path
as OpenAI.

Verified live: picker shows Nous Subscription / FAL.ai / OpenAI.
Picking OpenAI prompts for OPENAI_API_KEY, then shows the
gpt-image-2-low/medium/high model picker sourced from the plugin.

397 tests pass across plugins/, tools_config, registry, and picker.

* fix(image_gen): close final gaps for plugin-backend parity with FAL

Two small places that still hardcoded FAL:

- hermes_cli/setup.py status line: an OpenAI-only setup showed
  'Image Generation: missing FAL_KEY'. Now probes plugin providers
  and reports '(OpenAI)' when one is_available() — or falls back to
  'missing FAL_KEY or OPENAI_API_KEY' if nothing is configured.

- image_generate tool schema description: said 'using FAL.ai, default
  FLUX 2 Klein 9B'. Rewrote provider-neutral — 'backend and model are
  user-configured' — and notes the 'image' field can be a URL or an
  absolute path, which the gateway delivers either way via
  extract_local_files().
2026-04-21 21:30:10 -07:00
Teknium d1acf17773 feat(models): add minimax/minimax-m2.5:free to OpenRouter catalog (#13836)
Surfaces the free variant alongside the paid minimax-m2.5 entry in
both the OPENROUTER_MODELS fallback snapshot and the nous/openrouter
provider model list.
2026-04-21 21:27:40 -07:00
Teknium 410f33a728 fix(kimi): don't send Anthropic thinking to api.kimi.com/coding (#13826)
Kimi's /coding endpoint speaks the Anthropic Messages protocol but has
its own thinking semantics: when thinking.enabled is sent, Kimi validates
the history and requires every prior assistant tool-call message to carry
OpenAI-style reasoning_content. The Anthropic path never populates that
field, and convert_messages_to_anthropic strips Anthropic thinking blocks
on third-party endpoints — so after one tool-calling turn the next request
fails with:

  HTTP 400: thinking is enabled but reasoning_content is missing in
  assistant tool call message at index N

Kimi on chat_completions handles thinking via extra_body in
ChatCompletionsTransport (#13503). On the Anthropic route, drop the
parameter entirely and let Kimi drive reasoning server-side.

build_anthropic_kwargs now gates the reasoning_config -> thinking block
on not _is_kimi_coding_endpoint(base_url).
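
The gate might be sketched as follows. Both functions are hypothetical reimplementations written for illustration: the host and path check is inferred from the `api.kimi.com/coding` URL in the title, and the `thinking` value is simplified to pass the config through unchanged.

```python
from urllib.parse import urlparse


def is_kimi_coding_endpoint(base_url):
    """True for api.kimi.com URLs whose path is /coding or below it."""
    parsed = urlparse(base_url or "")
    path = parsed.path.rstrip("/")
    on_coding_path = path == "/coding" or path.startswith("/coding/")
    return parsed.hostname == "api.kimi.com" and on_coding_path


def build_thinking_kwargs(base_url, reasoning_config):
    """Attach a thinking block only when the endpoint is not Kimi /coding."""
    kwargs = {}
    if reasoning_config and not is_kimi_coding_endpoint(base_url):
        kwargs["thinking"] = reasoning_config
    return kwargs
```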

Tests: 8 new parametric tests cover /coding, /coding/v1, /coding/anthropic,
/coding/ (trailing slash), explicit disabled, other third-party endpoints
still getting thinking (MiniMax), native Anthropic unaffected, and the
non-/coding Kimi root route.
2026-04-21 21:19:14 -07:00
Teknium 7b79e0f4c9 chore(models): drop 3 models from nous portal recommended list (#13822)
Remove nvidia/nemotron-3-super-120b-a12b:free, arcee-ai/trinity-large-preview:free,
and openrouter/elephant-alpha from _PROVIDER_MODELS['nous']. The paid nemotron and
arcee-thinking variants remain.
2026-04-21 21:10:20 -07:00
kshitijk4poor 57411fca24 feat: add BedrockTransport + wire all Bedrock transport paths
Fourth and final transport — completes the transport layer with all four
api_modes covered.  Wraps agent/bedrock_adapter.py behind the ProviderTransport
ABC, handles both raw boto3 dicts and already-normalized SimpleNamespace.

Wires all transport methods to production paths in run_agent.py:
- build_kwargs: _build_api_kwargs bedrock branch
- validate_response: response validation, new bedrock_converse branch
- finish_reason: new bedrock_converse branch in finish_reason extraction

Based on PR #13467 by @kshitijk4poor, with one adjustment: the main normalize
loop does NOT add a bedrock_converse branch to invoke normalize_response on
the already-normalized response.  Bedrock's normalize_converse_response runs
at the dispatch site (run_agent.py:5189), so the response already has the
OpenAI-compatible .choices[0].message shape by the time the main loop sees
it.  Falling through to the chat_completions else branch is correct and
sidesteps a redundant NormalizedResponse rebuild.

Transport coverage — complete:
| api_mode           | Transport                | build_kwargs | normalize | validate |
|--------------------|--------------------------|:------------:|:---------:|:--------:|
| anthropic_messages | AnthropicTransport       |             |          |         |
| codex_responses    | ResponsesApiTransport    |             |          |         |
| chat_completions   | ChatCompletionsTransport |             |          |         |
| bedrock_converse   | BedrockTransport         |             |          |         |

17 new BedrockTransport tests pass.  117 transport tests total pass.
160 bedrock/converse tests across tests/agent/ pass.  Full tests/run_agent/
targeted suite passes (885/885 + 15 skipped; the 1 remaining failure is the
pre-existing test_concurrent_interrupt flake on origin/main).
2026-04-21 20:58:37 -07:00
Brooklyn Nicholson 572e27c93f fix(tui): demote gateway log-noise from Activity to info tone
Restore the old-CLI contract where only complete failures tint Activity
red. Everything else is still visible for debugging but no longer
commandeers attention.

- gateway.stderr: always tone='info' (drops the ERRLIKE_RE regex)
- gateway.protocol_error: both pushes demoted to 'info'
- commands.catalog cold-start failure: demoted to 'info'
- approval.request: no longer duplicates the overlay into Activity

Kept as 'error': terminal `error` event, gateway.start_timeout,
gateway-exited, explicit status.update kinds.
2026-04-21 20:57:40 -07:00
Brooklyn Nicholson 76ad697dcb fix(tui): don't force-open Activity on every error
Reverts the auto-expand-on-new-error effect added in 93b47d96. The
effect overrode the user's chosen detailsMode and visually interrupted
every turn. Red/yellow chevron tint remains as the passive signal —
click to read, just like Thinking and Tool calls.
2026-04-21 20:57:40 -07:00
kshitijk4poor 83d86ce344 feat: add ChatCompletionsTransport + wire all default paths
Third concrete transport — handles the default 'chat_completions' api_mode used
by ~16 OpenAI-compatible providers (OpenRouter, Nous, NVIDIA, Qwen, Ollama,
DeepSeek, xAI, Kimi, custom, etc.). Wires build_kwargs + validate_response to
production paths.

Based on PR #13447 by @kshitijk4poor, with fixes:
- Preserve tool_call.extra_content (Gemini thought_signature) via
  ToolCall.provider_data — the original shim stripped it, causing 400 errors
  on multi-turn Gemini 3 thinking requests.
- Preserve reasoning_content distinctly from reasoning (DeepSeek/Moonshot) so
  the thinking-prefill retry check (_has_structured) still triggers.
- Port Kimi/Moonshot quirks (32000 max_tokens, top-level reasoning_effort,
  extra_body.thinking) that landed on main after the original PR was opened.
- Keep _qwen_prepare_chat_messages_inplace alive and call it through the
  transport when sanitization already deepcopied (avoids a second deepcopy).
- Skip the back-compat SimpleNamespace shim in the main normalize loop — for
  chat_completions, response.choices[0].message is already the right shape
  with .content/.tool_calls/.reasoning/.reasoning_content/.reasoning_details
  and per-tool-call .extra_content from the OpenAI SDK.

run_agent.py: -239 lines in _build_api_kwargs default branch extracted to the
transport. build_kwargs now owns: codex-field sanitization, Qwen portal prep,
developer role swap, provider preferences, max_tokens resolution (ephemeral >
user > NVIDIA 16384 > Qwen 65536 > Kimi 32000 > anthropic_max_output), Kimi
reasoning_effort + extra_body.thinking, OpenRouter/Nous/GitHub reasoning,
Nous product attribution tags, Ollama num_ctx, custom-provider think=false,
Qwen vl_high_resolution_images, request_overrides.
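
The max_tokens precedence listed above amounts to "first value that is set wins". A minimal sketch, assuming a hypothetical `resolve_max_tokens` signature and lowercase provider keys (the transport's real parameter names are not shown here):

```python
# Provider defaults quoted in the commit message; the dictionary keys are assumptions.
PROVIDER_MAX_TOKENS = {"nvidia": 16384, "qwen": 65536, "kimi": 32000}


def resolve_max_tokens(ephemeral=None, user=None, provider=None, anthropic_max_output=None):
    """Return the first configured value in precedence order, or None."""
    candidates = (
        ephemeral,
        user,
        PROVIDER_MAX_TOKENS.get(provider),
        anthropic_max_output,
    )
    for value in candidates:
        if value is not None:
            return value
    return None
```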

39 new transport tests (8 build_kwargs, 5 Kimi, 4 validate, 4 normalize
including extra_content regression, 3 cache stats, 3 basic). The tests/run_agent/
targeted suite passes (885/885 + 15 skipped; the 1 remaining failure is the
test_concurrent_interrupt flake present on origin/main).
2026-04-21 20:50:02 -07:00
emozilla 29693f9d8e feat(aux): use Portal /api/nous/recommended-models for auxiliary models
Wire the auxiliary client (compaction, vision, session search, web extract)
to the Nous Portal's curated recommended-models endpoint when running on
Nous Portal, with a TTL-cached fetch that mirrors how we pull /models for
pricing.

hermes_cli/models.py
  - fetch_nous_recommended_models(portal_base_url, force_refresh=False)
    10-minute TTL cache, keyed per portal URL (staging vs prod don't
    collide).  Public endpoint, no auth required.  Returns {} on any
    failure so callers always get a dict.
  - get_nous_recommended_aux_model(vision, free_tier=None, ...)
    Tier-aware pick from the payload:
      - Paid tier → paidRecommended{Vision,Compaction}Model, falling back
        to freeRecommended* when the paid field is null (common during
        staged rollouts of new paid models).
      - Free tier → freeRecommended* only, never leaks paid models.
    When free_tier is None, auto-detects via the existing
    check_nous_free_tier() helper (already cached 3 min against
    /api/oauth/account).  Detection errors default to paid so we never
    silently downgrade a paying user.
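
    The tier-aware pick can be sketched as below. `pick_aux_model` is a
    hypothetical name, and the payload shape (each field holding an object
    with a `modelName` key, or null) is an assumption based on the field
    names above and the modelName handling mentioned under Tests:

```python
def pick_aux_model(payload, vision, free_tier):
    """Return the recommended model name for this tier, or None."""
    suffix = "VisionModel" if vision else "CompactionModel"
    # Free tier never looks at paid fields; paid tier falls back to free.
    prefixes = ["freeRecommended"] if free_tier else ["paidRecommended", "freeRecommended"]
    for prefix in prefixes:
        entry = payload.get(prefix + suffix) or {}
        name = (entry.get("modelName") or "").strip()
        if name:
            return name
    return None
```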

agent/auxiliary_client.py — _try_nous()
  - Replaces the hardcoded xiaomi/mimo free-tier branch with a single call
    to get_nous_recommended_aux_model(vision=vision).
  - Falls back to _NOUS_MODEL (google/gemini-3-flash-preview) when the
    Portal is unreachable or returns a null recommendation.
  - The Portal is now the source of truth for aux model selection; the
    xiaomi allowlist we used to carry is effectively dead.

Tests (15 new)
  - tests/hermes_cli/test_models.py::TestNousRecommendedModels
    Fetch caching, per-portal keying, network failure, force_refresh;
    paid-prefers-paid, paid-falls-to-free, free-never-leaks-paid,
    auto-detect, detection-error → paid default, null/blank modelName
    handling.
  - tests/agent/test_auxiliary_client.py::TestNousAuxiliaryRefresh
    _try_nous honors Portal recommendation for text + vision, falls
    back to google/gemini-3-flash-preview on None or exception.

Behavior won't visibly change today — both tier recommendations currently
point at google/gemini-3-flash-preview — but the moment the Portal ships
a better paid recommendation, subscribers pick it up within 10 minutes
without a Hermes release.
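
The TTL-cached fetch can be pictured roughly as follows. This is a sketch under stated assumptions: the injected fetch and now callables, the module-level dict, and the function name are illustrative choices, not the structure of fetch_nous_recommended_models.

```python
import time
from typing import Callable, Dict, Tuple

_TTL_SECONDS = 600  # 10 minutes, as stated in the commit message
_cache: Dict[str, Tuple[float, dict]] = {}


def fetch_cached(
    portal_base_url: str,
    fetch: Callable[[str], dict],
    force_refresh: bool = False,
    now: Callable[[], float] = time.monotonic,
) -> dict:
    """Hypothetical per-URL TTL cache that always returns a dict."""
    key = portal_base_url.rstrip("/")  # staging and prod get separate entries
    hit = _cache.get(key)
    if hit and not force_refresh and now() - hit[0] < _TTL_SECONDS:
        return hit[1]
    try:
        payload = fetch(key)
    except Exception:
        return {}  # any failure yields an empty dict and is not cached
    if not isinstance(payload, dict):
        return {}
    _cache[key] = (now(), payload)
    return payload
```

Injecting the clock keeps expiry testable without sleeping.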
2026-04-21 20:35:16 -07:00
emozilla c22f4a76de remove Nous Portal free-model allowlist
Drop _NOUS_ALLOWED_FREE_MODELS + filter_nous_free_models and its two call
sites. Whatever Nous Portal prices as free now shows up in the picker as-is
— no local allowlist gatekeeping. Free-tier partitioning (paid vs free in
the menu) still runs via partition_nous_models_by_tier.
2026-04-21 20:35:16 -07:00
Kongxi dd8ab40556 fix(delegation): add hard timeout and stale detection for subagent execution (#13770)
- Wrap child.run_conversation() in a ThreadPoolExecutor with configurable
  timeout (delegation.child_timeout_seconds, default 300s) to prevent
  indefinite blocking when a subagent's API call or tool HTTP request hangs.

- Add heartbeat stale detection: if a child's api_call_count doesn't
  advance for 5 consecutive heartbeat cycles (~2.5 min), stop touching
  the parent's activity timestamp so the gateway inactivity timeout
  can fire as a last resort.

- Add 'timeout' as a new exit_reason/status alongside the existing
  completed/max_iterations/interrupted states.

- Use shutdown(wait=False) on the timeout executor to avoid the
  ThreadPoolExecutor.__exit__ deadlock when a child is stuck on
  blocking I/O.
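
A minimal sketch of the timeout pattern described above, assuming a single worker per child. run_with_timeout and slow_child are hypothetical names; only the "timeout" status and the shutdown(wait=False) call come from the commit message.

```python
import concurrent.futures
import time


def run_with_timeout(fn, timeout_seconds: float) -> dict:
    """Run fn in a worker thread and give up waiting after timeout_seconds."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn)
    try:
        result = future.result(timeout=timeout_seconds)
        return {"status": "completed", "result": result}
    except concurrent.futures.TimeoutError:
        return {"status": "timeout", "result": None}
    finally:
        # wait=False returns immediately. Using the executor as a context
        # manager would call shutdown(wait=True) and block on a stuck worker.
        executor.shutdown(wait=False)


def slow_child():
    """Stand-in for a child whose call takes longer than the timeout."""
    time.sleep(0.2)
    return "late"
```

The worker thread is not killed by the timeout; the caller only stops waiting for it, which is why the stale-heartbeat check remains useful as a second line of defense.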

Closes #13768
2026-04-21 20:20:16 -07:00
kshitijk4poor c832ebd67c feat: add ResponsesApiTransport + wire all Codex transport paths
Add ResponsesApiTransport wrapping codex_responses_adapter.py behind the
ProviderTransport ABC. Auto-registered via _discover_transports().

Wire ALL Codex transport methods to production paths in run_agent.py:
- build_kwargs: main _build_api_kwargs codex branch (50 lines extracted)
- normalize_response: main loop + flush + summary + retry (4 sites)
- convert_tools: memory flush tool override
- convert_messages: called internally via build_kwargs
- validate_response: response validation gate
- preflight_kwargs: request sanitization (2 sites)

Remove 7 dead legacy wrappers from AIAgent (_responses_tools,
_chat_messages_to_responses_input, _normalize_codex_response,
_preflight_codex_api_kwargs, _preflight_codex_input_items,
_extract_responses_message_text, _extract_responses_reasoning_text).
Keep 3 ID manipulation methods still used by _build_assistant_message.

Update 18 test call sites across 3 test files to call adapter functions
directly instead of through deleted AIAgent wrappers.

24 new tests. 343 codex/responses/transport tests pass (0 failures).

PR 4 of the provider transport refactor.
2026-04-21 19:48:56 -07:00
Teknium 09dd5eb6a5 chore(release): map xiaoqiang243 personal email in AUTHOR_MAP 2026-04-21 19:48:39 -07:00
Teknium b2ba351380 fix(kimi): reconcile sk-kimi- routing with Anthropic SDK URL semantics
Follow-ups after salvaging xiaoqiang243's kimi-for-coding patches:

- KIMI_CODE_BASE_URL: drop trailing /v1 (was /coding/v1).
  The /coding endpoint speaks Anthropic Messages, and the Anthropic SDK
  appends /v1/messages internally. /coding/v1 + SDK suffix produced
  /coding/v1/v1/messages (a 404). /coding + SDK suffix now yields
  /coding/v1/messages correctly.
- kimi-coding ProviderConfig: keep legacy default api.moonshot.ai/v1 so
  non-sk-kimi- moonshot keys still authenticate. sk-kimi- keys are
  already redirected to api.kimi.com/coding via _resolve_kimi_base_url.
- doctor.py: update Kimi UA to claude-code/0.1.0 (was KimiCLI/1.30.0)
  and rewrite /coding base URLs to /coding/v1 for the /models health
  check (Anthropic surface has no /models).
- test_kimi_env_vars: accept KIMI_CODING_API_KEY as a secondary env var.

E2E verified:
  sk-kimi-<key>  → https://api.kimi.com/coding/v1/messages (Anthropic)
  sk-<legacy>    → https://api.moonshot.ai/v1/chat/completions (OpenAI)
  UA: claude-code/0.1.0, x-api-key: <sk-kimi-*>
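
The double /v1 can be reproduced with a simplified model of the suffixing described above. The helper below is hypothetical and only imitates the "SDK appends /v1/messages" behaviour; it does not call the Anthropic SDK.

```python
def anthropic_messages_url(base_url: str) -> str:
    """Imitate an SDK that appends /v1/messages to the configured base URL."""
    return base_url.rstrip("/") + "/v1/messages"


old_url = anthropic_messages_url("https://api.kimi.com/coding/v1")
new_url = anthropic_messages_url("https://api.kimi.com/coding")
```

Here old_url ends in /coding/v1/v1/messages (the 404 case) and new_url ends in /coding/v1/messages.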
2026-04-21 19:48:39 -07:00
王强 6caf8bd994 fix: Enhance Kimi Coding API mode detection and User-Agent 2026-04-21 19:48:39 -07:00
王强 2a026eb762 fix: Update Kimi Coding API endpoint and User-Agent 2026-04-21 19:48:39 -07:00
王强 46d680125e fix(kimi-coding): set anthropic_messages api_mode for /coding endpoint 2026-04-21 19:48:39 -07:00
王强 bad5471409 fix(kimi-coding): add KIMI_CODING_API_KEY fallback + api_mode detection for /coding endpoint 2026-04-21 19:48:39 -07:00
王强 fd403854b9 fix: auto-detect anthropic_messages mode for Kimi /coding/v1 endpoints 2026-04-21 19:48:39 -07:00
王强 de181dfd22 fix: add User-Agent claude-code/0.1.0 for Kimi /coding endpoint
- Add _is_kimi_coding_endpoint() to detect Kimi coding API
- Place Kimi check BEFORE _requires_bearer_auth to ensure User-Agent header is set
- Without this header, Kimi returns 403 on /coding/v1/messages
- Fixes kimi-2.5, kimi-for-coding, kimi-k2.6-code-preview all returning 403
2026-04-21 19:48:39 -07:00
Teknium 84449d9afe fix(prompt): tell CLI agents not to emit MEDIA:/path tags (#13766)
The CLI has no attachment channel — MEDIA:<path> tags are only
intercepted on messaging gateway platforms (Telegram, Discord,
Slack, WhatsApp, Signal, BlueBubbles, email, etc.). On the CLI
they render as literal text, which is confusing for users.

The CLI platform hint was the one PLATFORM_HINTS entry that said
nothing about file delivery, so models trained on the messaging
hints would default to MEDIA: tags on the CLI too. Tool schemas
(browser_tool, tts_tool, etc.) also recommend MEDIA: generically.

Extend the CLI hint to explicitly discourage MEDIA: tags and tell
the agent to reference files by plain absolute path instead.

Add a regression test asserting the CLI hint carries negative
guidance about MEDIA: while messaging hints keep positive guidance.
2026-04-21 19:36:05 -07:00
Teknium 0a1e85dd0d fix(skills/baoyu-comic): absolute curl paths + clarify-timeout handling (#13775)
* fix(skills/baoyu-comic): require absolute paths for curl -o downloads

When downloading generated images across several batches of image_generate
calls, relying on persistent-shell CWD is unsafe. The terminal tool's shell
can rotate (TERMINAL_LIFETIME_SECONDS expiry, a failed cd that leaves the
shell somewhere else), and 'curl -fsSL <url> -o relative.png' then silently
writes to the wrong directory with no error.

Update the skill's Step 7 Download step to require absolute -o paths (or
workdir= on the terminal tool) and add a matching pitfall entry referencing
the Apr 2026 incident where pages 06-09 of a 10-page comic landed at the
repo root instead of comic/<slug>/. The agent then spent several turns
claiming the files existed where they didn't.

* fix(skills/baoyu-comic): handle clarify timeouts correctly in Step 2

A clarify timeout returning 'Use your best judgement to make the choice
and proceed' is NOT user consent to default the entire Step 2 questionnaire.
It is a per-question default only. Add guidance at both instruction sites
(SKILL.md User Questions section, references/workflow.md Step 2 header)
telling the agent to:

1. Continue asking the remaining questions in the sequence after a
   timeout — each question is an independent consent point.
2. Surface every defaulted choice in the next user-visible message
   so the user can correct it when they return. An unreported default
   is indistinguishable from never having asked.

Reported live Apr 2026: agent asked style question via clarify, got a
timeout response, and silently defaulted style + narrative focus +
audience + review flags in one pass. User only learned style had
defaulted to 'ohmsha' after the comic was fully generated.
2026-04-21 19:35:42 -07:00
brooklyn! 1dfbfcfe74 Merge pull request #13729 from NousResearch/bb/tui-diff-inline-sequence
fix(tui): tool inline_diff renders inline with the active turn
2026-04-21 21:13:50 -05:00
Teknium 964b444107 fix(website): run skill extraction automatically on npm run build/start (#13747)
website/src/pages/skills/index.tsx imports ../../data/skills.json, but
that file is git-ignored and generated at build time by
website/scripts/extract-skills.py. CI workflows (deploy-site.yml,
docs-site-checks.yml) run the script explicitly before 'npm run build',
so production and PR checks always work — but 'npm run build' on a
contributor's machine fails with:

  Module not found: Can't resolve '../../data/skills.json'

because the extraction step was never wired into the npm scripts.

Adds a prebuild/prestart hook that runs extract-skills.py automatically.
If python3 or pyyaml aren't installed locally, writes an empty
skills.json instead of hard-failing — the Skills Hub page renders with
an empty state, the rest of the site builds normally, and CI (which
always has the deps) still generates the full catalog for production.
2026-04-21 18:02:04 -07:00
Teknium bf73ced4f5 docs: document delegation width + depth knobs (#13745)
Fills the three gaps left by the orchestrator/width-depth salvage:

- configuration.md §Delegation: max_concurrent_children, max_spawn_depth,
  orchestrator_enabled are now in the canonical config.yaml reference
  with a paragraph covering defaults, clamping, role-degradation, and
  the 3x3x3=27-leaf cost scaling.
- environment-variables.md: adds DELEGATION_MAX_CONCURRENT_CHILDREN to
  the Agent Behavior table.
- features/delegation.md: corrects stale 'default 5, cap 8' wording
  (that was from the original PR; the salvage landed on default 3 with
  no ceiling and a tool error on excess instead of truncation).
2026-04-21 17:54:39 -07:00
Jim Liu 宝玉 83a7a005aa fix(skills): clarify baoyu-comic character sheet role
Page prompts are written in Step 5 from the text descriptions in
characters/characters.md — the PNG sheet generated in Step 7.1
cannot be used to write them. Reposition the PNG as a human-facing
review artifact (and reference for later regenerations / manual
edits), and drop the confusing "Character sheet | Strategy" tables
since the embedding rule is uniform.
2026-04-21 17:50:04 -07:00
Jim Liu 宝玉 fe025425cb fix(skills): address baoyu-comic PR review
- Remove PDF merge feature and scripts/ directory (no pdf-lib dep)
- Correct image_generate docs: prompt-only, returns URL; add
  curl download step after every call
- Downgrade reference images to text-based trait extraction
  (style/palette/scene); character sheet is agent-facing reference
- Unify source file naming on source-{slug}.md across SKILL.md
  and workflow.md
2026-04-21 17:50:04 -07:00
Jim Liu 宝玉 a8beba82d0 refactor(skills): adapt baoyu-comic for Hermes
Port the upstream baoyu-comic skill to Hermes' tool ecosystem, matching
the earlier baoyu-infographic adaptation:

- metadata namespace openclaw -> hermes (+ tags, homepage)
- drop EXTEND.md preferences system (references/config/ removed,
  workflow Step 1.1 removed)
- user prompts via clarify (one question at a time) instead of
  AskUserQuestion batches
- image generation via image_generate instead of baoyu-imagine, with
  aspect-ratio mapping to landscape/portrait/square
- Windows/PowerShell/WSL shell snippets dropped
- file I/O referenced via Hermes write_file/read_file tools
- CLI-style --flags converted to natural-language options and
  user-intent cues (skill matching has no slash command trigger)

Add PORT_NOTES.md documenting the adaptations and a sync procedure.
Art-style/tone/layout reference files are preserved verbatim from
upstream v1.56.1.
2026-04-21 17:50:04 -07:00
Jim Liu 宝玉 be7dcf3628 feat(skills): add baoyu-comic skill 2026-04-21 17:50:04 -07:00
Teknium 8f167e8791 fix(tts): use per-provider input-character caps instead of global 4000 (#13743)
A single global MAX_TEXT_LENGTH = 4000 truncated every TTS provider at
4000 chars, causing long inputs to be silently chopped even though the
underlying APIs allow much more:

  - OpenAI:     4096
  - xAI:        15000
  - MiniMax:    10000
  - ElevenLabs: 5000 / 10000 / 30000 / 40000 (model-aware)
  - Gemini:     ~5000
  - Edge:       ~5000

The schema description also told the model 'Keep under 4000 characters',
which encouraged the agent to self-chunk long briefs into multiple TTS
calls (producing 3 separate audio files instead of one).

New behavior:
  - PROVIDER_MAX_TEXT_LENGTH table + ELEVENLABS_MODEL_MAX_TEXT_LENGTH
    encode the documented per-provider limits.
  - _resolve_max_text_length(provider, cfg) resolves:
      1. tts.<provider>.max_text_length user override
      2. ElevenLabs model_id lookup
      3. provider default
      4. 4000 fallback
  - text_to_speech_tool() and stream_tts_to_speaker() both call the
    resolver; old MAX_TEXT_LENGTH alias kept for back-compat.
  - Schema description no longer hardcodes 4000.
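
The resolution order above might look like the sketch below. The table names come from the commit message; the lowercase provider keys, the cfg shape (the tts section as a dict of per-provider dicts), the ElevenLabs default of 5000, and the model id are assumptions made for illustration.

```python
PROVIDER_MAX_TEXT_LENGTH = {
    "openai": 4096,
    "xai": 15000,
    "minimax": 10000,
    "elevenlabs": 5000,  # assumed default; the real limit is model-aware
    "gemini": 5000,      # approximate, per the commit message
    "edge": 5000,        # approximate, per the commit message
}
# Hypothetical model id, used only to show the lookup step.
ELEVENLABS_MODEL_MAX_TEXT_LENGTH = {"example-long-form-model": 40000}
FALLBACK_MAX_TEXT_LENGTH = 4000


def resolve_max_text_length(provider: str, cfg: dict) -> int:
    """Sketch of the four-step resolution order."""
    section = cfg.get(provider)
    if not isinstance(section, dict):
        section = {}
    # 1. Explicit user override.
    override = section.get("max_text_length")
    if isinstance(override, int) and not isinstance(override, bool) and override > 0:
        return override
    # 2. ElevenLabs model_id lookup.
    if provider == "elevenlabs":
        model_limit = ELEVENLABS_MODEL_MAX_TEXT_LENGTH.get(section.get("model_id"))
        if model_limit:
            return model_limit
    # 3. Provider default, then 4. the global fallback.
    return PROVIDER_MAX_TEXT_LENGTH.get(provider, FALLBACK_MAX_TEXT_LENGTH)
```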

Tests: 27 new unit + E2E tests; all 53 existing TTS tests and 253
voice-command/voice-cli tests still pass.
2026-04-21 17:49:39 -07:00
Brooklyn Nicholson a8eb13e828 fix(tui): dedupe inline diffs, strip CLI review-diff header
After the prior inline-diff fix, the gateway still prepends a literal
"  ┊ review diff" line to inline_diff (it's terminal chrome written by
`_emit_inline_diff`). Wrapping that in a ```diff fence left that header
inside the code block. The agent also often narrates its own edit in a
second fenced diff, so the assistant message ended up stacking two
diff blocks for the same change.

- Strip the leading "┊ review diff" header from queued inline diffs
  before fencing.
- Skip appending the fenced diff entirely when the assistant already
  wrote its own ```diff (or ```patch) fence.

Keeps the single-surface diff UX even when the agent is chatty.
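
The two rules above, plus the exact-content dedupe from the earlier commit 9654c9fb10, can be sketched as follows. The TUI itself is not written in Python; this is a Python rendering of the logic with hypothetical names, and the regexes are assumptions about how the header and the fences are recognised.

```python
import re

FENCE = "`" * 3  # built at runtime so this example contains no literal fence
HEADER_RE = re.compile(r"^[ \t]*┊ review diff[ \t]*\n")
OWN_DIFF_RE = re.compile(r"^" + FENCE + r"(?:diff|patch)\b", re.MULTILINE)


def append_inline_diffs(assistant_text: str, queued_diffs: list) -> str:
    """Append queued diffs as fenced blocks unless the text already has one."""
    if OWN_DIFF_RE.search(assistant_text):
        return assistant_text  # the assistant already narrated its own diff
    seen = set()
    blocks = []
    for raw in queued_diffs:
        # Drop the terminal-chrome header, then dedupe by exact content.
        body = HEADER_RE.sub("", raw, count=1).strip("\n")
        if not body or body in seen:
            continue
        seen.add(body)
        blocks.append(f"{FENCE}diff\n{body}\n{FENCE}")
    if not blocks:
        return assistant_text
    return "\n\n".join([assistant_text, *blocks])
```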
2026-04-21 19:21:00 -05:00
Brooklyn Nicholson e684afa151 fix(tui): keep review-diff tool rows terse
When tool.complete already carries inline_diff, the assistant message owns the full diff block. Suppress the tool-row summary/detail in that case so the turn shows one detailed diff surface instead of a rich diff plus a duplicated tool-detail payload.
2026-04-21 19:13:15 -05:00
Brooklyn Nicholson 9654c9fb10 fix(tui): dedupe inline_diff when assistant already echoes it
Avoid duplicate diff rendering in the #13729 flow. We now skip queued inline diffs that are already present in the final assistant text and dedupe repeated queued diffs by exact content.
2026-04-21 19:06:49 -05:00
Brooklyn Nicholson 31b3b09ea4 fix(tui): render inline diffs inside assistant completion
Follow-up for #13729: segment-level system artifacts still looked detached in real flow.

Instead of appending inline_diff as a standalone segment/system row, queue sanitized diffs during tool.complete and append them as a fenced diff block to the assistant completion text on message.complete. This keeps the diff in the same message flow as the assistant response.
2026-04-21 19:02:53 -05:00
brooklyn! 1e5daa4ece Merge pull request #13728 from NousResearch/bb/tui-history-local
fix(tui): /history shows the TUI's own transcript, scrollable
2026-04-21 18:59:31 -05:00
brooklyn! 90fca3c7e0 Merge pull request #13724 from NousResearch/bb/tui-resume-all-sources
fix(tui): /resume picker shows telegram/discord/etc sessions
2026-04-21 18:59:12 -05:00
brooklyn! e2feccf7c6 Merge pull request #13726 from NousResearch/bb/tui-multiline-up-arrow
fix(tui): up-arrow inside a multi-line buffer moves cursor, not history
2026-04-21 18:58:56 -05:00
Brooklyn Nicholson 35cc66df62 fix(tui): arrow history fallback when no line exists
Follow-up on multiline arrow behavior: Up/Down now fall back to queue/history whenever there is no logical line above/below the caret (not only at absolute start/end character positions). This makes Up from the end of the top line cycle history, matching expected readline-ish behavior.
2026-04-21 18:55:57 -05:00
Brooklyn Nicholson bd046220b3 fix(tui): narrow /resume sources to human adapters
Follow-up on #13724: showing literally every source was too noisy.

The /resume picker now fetches a wider window (a larger limit) and then filters to a curated allowlist of human-facing sources (tui/cli plus chat adapters like telegram/discord/slack/whatsapp/etc). This keeps row #7 fixed (telegram sessions visible in /resume) without surfacing internal source kinds such as tool/acp.
2026-04-21 18:52:26 -05:00
Brooklyn Nicholson bddf0cd61e fix(tui): keep inline diffs below tool rows and strip ANSI
Follow-up on #13729 from blitz screenshot feedback.

- When tool.complete carried inline_diff but no buffered assistant text existed, pending tool rows were still in streamPendingTools, so diff rendered above the tool row section. appendSegmentMessage now emits pending tool rows as a trail segment before appending the diff artifact.
- Strip ANSI color escapes from inline_diff payloads so we don't render loud red/green terminal palettes in the transcript.
2026-04-21 18:50:42 -05:00
Brooklyn Nicholson 95fd023eeb fix(tui): only cycle history at input boundaries on arrows
Follow-up on #13726 from blitz feedback: Up/Down history cycling should only trigger when the caret is at the start/end boundary (or the input is empty).

Previously useInputHandlers intercepted arrows whenever inputBuf was empty, which still stole Up/Down from normal multiline editing. textInput now publishes caret position through inputSelectionStore even with no active selection, and useInputHandlers gates history/queue cycling on those boundaries.
2026-04-21 18:48:35 -05:00
Teknium 9c9d9b7ddf feat(delegate): cross-agent file state coordination for concurrent subagents (#13718)
* feat(models): hide OpenRouter models that don't advertise tool support

Port from Kilo-Org/kilocode#9068.

hermes-agent is tool-calling-first — every provider path assumes the
model can invoke tools. Models whose OpenRouter supported_parameters
doesn't include 'tools' (e.g. image-only or completion-only models)
cannot be driven by the agent loop and fail at the first tool call.

Filter them out of fetch_openrouter_models() so they never appear in
the model picker (`hermes model`, setup wizard, /model slash command).

Permissive when the field is missing — OpenRouter-compatible gateways
(Nous Portal, private mirrors, older snapshots) don't always populate
supported_parameters. Treat missing as 'unknown → allow' rather than
silently emptying the picker on those gateways. Only hide models
whose supported_parameters is an explicit list that omits tools.

Tests cover: tools present → kept, tools absent → dropped, field
missing → kept, malformed non-list → kept, non-dict item → kept,
empty list → dropped.
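
The permissive filter rule can be sketched like this. supported_parameters is the field named above; the function names are hypothetical and this is not the body of fetch_openrouter_models().

```python
def supports_tools(item) -> bool:
    """Hide a model only when it explicitly lists parameters without 'tools'."""
    if not isinstance(item, dict):
        return True  # non-dict item: keep
    params = item.get("supported_parameters")
    if not isinstance(params, list):
        return True  # missing or malformed field: unknown, so allow
    return "tools" in params


def filter_tool_capable(models: list) -> list:
    """Keep only entries that pass supports_tools, preserving order."""
    return [m for m in models if supports_tools(m)]
```

An empty list counts as an explicit statement that nothing is supported, so it is dropped, while a missing field is treated as unknown and kept.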

* feat(delegate): cross-agent file state coordination for concurrent subagents

Prevents mangled edits when concurrent subagents touch the same file
(same process, same filesystem — the mangle scenario from #11215).

Three layers, all opt-out via HERMES_DISABLE_FILE_STATE_GUARD=1:

1. FileStateRegistry (tools/file_state.py) — process-wide singleton
   tracking per-agent read stamps and the last writer globally.
   check_stale() names the sibling subagent in the warning when a
   non-owning agent wrote after this agent's last read.

2. Per-path threading.Lock wrapped around the read-modify-write
   region in write_file_tool and patch_tool. Concurrent siblings on
   the same path serialize; different paths stay fully parallel.
   V4A multi-file patches lock in sorted path order (deadlock-free).

3. Delegate-completion reminder in tools/delegate_tool.py: after a
   subagent returns, writes_since(parent, child_start, parent_reads)
   appends '[NOTE: subagent modified files the parent previously
   read — re-read before editing: ...]' to entry.summary when the
   child touched anything the parent had already seen.

Complements (does not replace) the existing path-overlap check in
run_agent._should_parallelize_tool_batch — batch check prevents
same-file parallel dispatch within one agent's turn (cheap prevention,
zero API cost), registry catches cross-subagent and cross-turn
staleness at write time (detection).

Behavior is warning-only, not hard-failing — matches existing project
style. Errors surface naturally: sibling writes often invalidate the
old_string in patch operations, which already errors cleanly.

Tests: tests/tools/test_file_state_registry.py — 16 tests covering
registry state transitions, per-path locking, per-path-not-global
locking, writes_since filtering, kill switch, and end-to-end
integration through the real read_file/write_file/patch handlers.
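
The first layer's staleness check can be illustrated with a toy registry. Only the check_stale name comes from the commit message; the class name, the note_read/note_write methods, the logical clock, and the warning text are assumptions, and the sketch omits the locking a process-wide singleton would need.

```python
import itertools
from typing import Optional


class FileStateSketch:
    """Toy registry: per-agent read stamps plus the last writer per path."""

    def __init__(self):
        self._clock = itertools.count(1)  # logical clock instead of wall time
        self._reads = {}       # (agent, path) -> tick of last read
        self._last_write = {}  # path -> (tick, agent)

    def note_read(self, agent: str, path: str) -> None:
        self._reads[(agent, path)] = next(self._clock)

    def note_write(self, agent: str, path: str) -> None:
        tick = next(self._clock)
        self._last_write[path] = (tick, agent)
        self._reads[(agent, path)] = tick  # the writer has seen its own content

    def check_stale(self, agent: str, path: str) -> Optional[str]:
        write = self._last_write.get(path)
        read_tick = self._reads.get((agent, path))
        if write is None or read_tick is None:
            return None
        tick, writer = write
        if writer != agent and tick > read_tick:
            return f"{path} was modified by {writer} after your last read; re-read before editing"
        return None
```

Returning a message rather than raising matches the warning-only behaviour described above.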
2026-04-21 16:41:26 -07:00
Brooklyn Nicholson dff1c8fcf1 fix(tui): tool inline_diff renders inline with the active turn
Reported during TUI v2 blitz retest: code-review diffs from tool.complete
appeared at the top of the current interaction thread, out of sequence
with the agent's messages and tool rows below them.

Root cause — `sys(inline_diff)` appends to `historyItems`, which sits
above the `StreamingAssistant` pane that renders the active turn.
Until the turn closed, the diff visually floated above everything
else happening in the same turn.

Route the diff through `turnController.appendSegmentMessage` instead
so it flushes any pending streaming text first, then lands in the
segment stream beside assistant output and tool calls.  On
`message.complete` the segment list is committed to history in emit
order (diff → final text), matching what the gateway sent.

Adds a regression test that exercises tool.complete → message.complete
with an inline_diff payload and asserts both the streaming and final
placement.
2026-04-21 18:35:59 -05:00
Brooklyn Nicholson 723a9cfb1e fix(tui): /history shows the TUI's own transcript, scrollable
Reported during TUI v2 blitz retest: `/history` in the TUI only shows
prompts from non-TUI Hermes runs and can't scroll the window.  Root
cause is the slash-worker subprocess: it's a detached HermesCLI that
never sees the TUI's turns, so its `conversation_history` starts empty
and `show_history` surfaces whatever was persisted from earlier CLI
sessions — not what the user just did inside the TUI.

Intercept `/history` as a local slash command so it dumps
`ctx.local.getHistoryItems()` — the TUI's own transcript — routed
through the pager (which scrolls after #13591).  Accepts an optional
preview-length argument (default 400 chars per message).

Adds createSlashHandler coverage.
2026-04-21 18:33:27 -05:00
Brooklyn Nicholson d30f6ac44e fix(tui): up-arrow inside a multi-line buffer moves cursor, not history
Reported during TUI v2 blitz retest: typing a multi-line message with
shift-Enter and then pressing Up to edit an earlier line swapped the
whole buffer for the previous history entry instead of moving the
cursor up a line.  Down then restored the draft → the buffer appeared
to "flip" between the draft and a prior prompt.

`useInputHandlers` cycles history on Up/Down, but textInput only
checked `inputBuf.length` — that only counts lines committed with a
trailing backslash, not shift-Enter newlines inside `input` itself.

Fix: detect logical lines inside the input string and move the cursor
one line up/down preserving column offset (clamp to line end when the
destination is shorter, standard editor behavior).  Only fall through
to history cycling when the cursor is already on the first line (Up)
or last line (Down).
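
The cursor movement can be sketched as a pure function over the input string. This is a Python illustration with a hypothetical name, not the TUI's lineNav helper, and it counts string indices rather than display columns.

```python
from typing import Optional


def move_line(text: str, cursor: int, direction: int) -> Optional[int]:
    """Return the new cursor offset, or None when no line exists that way.

    direction is -1 for Up and +1 for Down. None tells the caller to fall
    through to history cycling.
    """
    line_start = text.rfind("\n", 0, cursor) + 1
    column = cursor - line_start
    if direction < 0:
        if line_start == 0:
            return None  # already on the first line
        target_end = line_start - 1  # index of the newline ending the previous line
        target_start = text.rfind("\n", 0, target_end) + 1
    else:
        line_end = text.find("\n", cursor)
        if line_end == -1:
            return None  # already on the last line
        target_start = line_end + 1
        next_break = text.find("\n", target_start)
        target_end = len(text) if next_break == -1 else next_break
    # Preserve the column, clamped to the end of a shorter destination line.
    return min(target_start + column, target_end)
```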

Adds unit coverage for the new `lineNav` helper.
2026-04-21 18:31:35 -05:00
Brooklyn Nicholson 0dfb7b8a0d fix(tui): /resume picker shows telegram/discord/etc sessions
Reported during TUI v2 blitz retest: /resume modal only surfaced tui/cli
rows, even though `hermes --tui --resume <id>` with a pasted telegram
session id works fine.  The handler double-fetched with explicit
`source="tui"` and `source="cli"` filters and dropped everything else on
the floor.

Drop the filter — list_sessions_rich(source=None) already excludes
child sessions (subagents, compression continuations) via its default,
and users want to resume messenger sessions from inside the TUI.

Adds gateway regression coverage.
2026-04-21 18:28:40 -05:00
brooklyn! 35a4b093d8 Merge pull request #13719 from NousResearch/bb/tui-markdown-cleanup
refactor(tui): clean markdown.tsx per KISS/DRY
2026-04-21 18:13:18 -05:00
brooklyn! 5504ee8de8 Merge pull request #13715 from NousResearch/bb/tui-markdown-tilde-subscript
fix(tui): don't swallow Kimi/Qwen ~! ~? kaomoji as subscript spans
2026-04-21 18:12:59 -05:00
Brooklyn Nicholson b97b4c4981 refactor(tui): clean markdown.tsx per KISS/DRY
- Drop the outer no-op capture group from INLINE_RE and restructure the
  source as an ordered list of patterns-with-index-comments so each
  alternative is individually greppable. Shift group indices in MdInline
  down by one accordingly.
- Inline single-use helpers (parseFence, isFenceClose, isMarkdownFence,
  trimBareUrl) and intermediate variables (path, lang, raw, prefix, body,
  depth, task body, setext match, etc.).
- Hoist block-level regexes used inside MdImpl (FENCE_CLOSE_RE, SETEXT_RE,
  BULLET_RE, TASK_RE, NUMBERED_RE, QUOTE_RE) to top-level consts so
  they're compiled once instead of per-line.
- Collapse the duplicate compact-vs-normal blank-line branches into one
  if/!compact gap call.
- Move Fence and MdProps types to the bottom per house style.
- Shorten splitTableRow → splitRow and use optional chaining in a few
  match sites.

No behavior change; 162/162 tests pass. Net -22 LoC.
2026-04-21 18:11:12 -05:00
Brooklyn Nicholson 43eb1153e9 fix(tui): don't swallow Kimi/Qwen ~! ~? kaomoji as subscript spans
The inline markdown regex had `~([^~\s][^~]*?)~` for Pandoc-style subscript
(H~2~O, CO~2~). On models that decorate prose with kaomoji like `thing ~!`
and `cool ~?` — Kimi especially — the opener `~!` paired with the next
stray `~` on the line and dim-formatted everything between them with a
leading `_` character, mangling markdown output.

Tighten the pattern to short alphanumeric-only content (`~[A-Za-z0-9]{1,8}~`)
since real subscript never contains punctuation, spaces, or long runs.
Same tightening applied to stripInlineMarkup so width measurement stays
consistent. Classic CLI was unaffected because it renders these literally.
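
The effect of the tightening can be checked with Python's re module, which treats these two patterns the same way for the inputs below. The patterns are copied from the message above; the constant names and sample strings are illustrative.

```python
import re

OLD_SUBSCRIPT = re.compile(r"~([^~\s][^~]*?)~")
NEW_SUBSCRIPT = re.compile(r"~[A-Za-z0-9]{1,8}~")

kaomoji = "thing ~! and cool ~? done"
formula = "H~2~O"

old_hit = OLD_SUBSCRIPT.search(kaomoji)  # spans from "~!" to the next "~"
new_hit = NEW_SUBSCRIPT.search(kaomoji)  # no match: "!" is not alphanumeric
formula_hit = NEW_SUBSCRIPT.search(formula)  # real subscript still matches
```

The old pattern pairs the first tilde with the next one on the line, so ordinary prose between them is swallowed; the new one needs one to eight alphanumerics directly between two tildes.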
2026-04-21 17:34:48 -05:00
164 changed files with 13088 additions and 1488 deletions
+1 -1
@@ -13,7 +13,7 @@
**The self-improving AI agent built by [Nous Research](https://nousresearch.com).** It's the only agent with a built-in learning loop — it creates skills from experience, improves them during use, nudges itself to persist knowledge, searches its own past conversations, and builds a deepening model of who you are across sessions. Run it on a $5 VPS, a GPU cluster, or serverless infrastructure that costs nearly nothing when idle. It's not tied to your laptop — talk to it from Telegram while it works on a cloud VM.
Use any model you want — [Nous Portal](https://portal.nousresearch.com), [OpenRouter](https://openrouter.ai) (200+ models), [NVIDIA NIM](https://build.nvidia.com) (Nemotron), [Xiaomi MiMo](https://platform.xiaomimimo.com), [z.ai/GLM](https://z.ai), [Kimi/Moonshot](https://platform.moonshot.ai), [MiniMax](https://www.minimax.io), [Hugging Face](https://huggingface.co), OpenAI, or your own endpoint. Switch with `hermes model` — no code changes, no lock-in.
Use any model you want — [Nous Portal](https://portal.nousresearch.com), [OpenRouter](https://openrouter.ai) (200+ models), [Volcengine](https://www.volcengine.com/product/ark), [BytePlus](https://www.byteplus.com/en/product/modelark), [NVIDIA NIM](https://build.nvidia.com) (Nemotron), [Xiaomi MiMo](https://platform.xiaomimimo.com), [z.ai/GLM](https://z.ai), [Kimi/Moonshot](https://platform.moonshot.ai), [MiniMax](https://www.minimax.io), [Hugging Face](https://huggingface.co), OpenAI, or your own endpoint. Switch with `hermes model` — no code changes, no lock-in.
<table>
<tr><td><b>A real terminal interface</b></td><td>Full TUI with multiline editing, slash-command autocomplete, conversation history, interrupt-and-redirect, and streaming tool output.</td></tr>
+79 -4
@@ -266,6 +266,14 @@ def _is_third_party_anthropic_endpoint(base_url: str | None) -> bool:
return True # Any other endpoint is a third-party proxy
def _is_kimi_coding_endpoint(base_url: str | None) -> bool:
    """Return True for Kimi's /coding endpoint that requires claude-code UA."""
    normalized = _normalize_base_url_text(base_url)
    if not normalized:
        return False
    return normalized.rstrip("/").lower().startswith("https://api.kimi.com/coding")
def _requires_bearer_auth(base_url: str | None) -> bool:
"""Return True for Anthropic-compatible providers that require Bearer auth.
@@ -323,9 +331,18 @@ def build_anthropic_client(api_key: str, base_url: str = None, timeout: float =
kwargs["base_url"] = normalized_base_url
common_betas = _common_betas_for_base_url(normalized_base_url)
if _requires_bearer_auth(normalized_base_url):
if _is_kimi_coding_endpoint(base_url):
# Kimi's /coding endpoint requires User-Agent: claude-code/0.1.0
# to be recognized as a valid Coding Agent. Without it, returns 403.
# Check this BEFORE _requires_bearer_auth since both match api.kimi.com/coding.
kwargs["api_key"] = api_key
kwargs["default_headers"] = {
"User-Agent": "claude-code/0.1.0",
**( {"anthropic-beta": ",".join(common_betas)} if common_betas else {} )
}
elif _requires_bearer_auth(normalized_base_url):
# Some Anthropic-compatible providers (e.g. MiniMax) expect the API key in
# Authorization: Bearer even for regular API keys. Route those endpoints
# Authorization: Bearer *** for regular API keys. Route those endpoints
# through auth_token so the SDK sends Bearer auth instead of x-api-key.
# Check this before OAuth token shape detection because MiniMax secrets do
# not use Anthropic's sk-ant-api prefix and would otherwise be misread as
@@ -1066,6 +1083,31 @@ def convert_messages_to_anthropic(
"name": fn.get("name", ""),
"input": parsed_args,
})
# Kimi's /coding endpoint (Anthropic protocol) requires assistant
# tool-call messages to carry reasoning_content when thinking is
# enabled server-side. Preserve it as a thinking block so Kimi
# can validate the message history. See hermes-agent#13848.
#
# Accept empty string "" — _copy_reasoning_content_for_api()
# injects "" as a tier-3 fallback for Kimi tool-call messages
# that had no reasoning. Kimi requires the field to exist, even
# if empty.
#
# Prepend (not append): Anthropic protocol requires thinking
# blocks before text and tool_use blocks.
#
# Guard: only add when reasoning_details didn't already contribute
# thinking blocks. On native Anthropic, reasoning_details produces
# signed thinking blocks — adding another unsigned one from
# reasoning_content would create a duplicate (same text) that gets
# downgraded to a spurious text block on the last assistant message.
reasoning_content = m.get("reasoning_content")
_already_has_thinking = any(
isinstance(b, dict) and b.get("type") in ("thinking", "redacted_thinking")
for b in blocks
)
if isinstance(reasoning_content, str) and not _already_has_thinking:
blocks.insert(0, {"type": "thinking", "thinking": reasoning_content})
# Anthropic rejects empty assistant content
effective = blocks or content
if not effective or effective == "":
@@ -1221,6 +1263,7 @@ def convert_messages_to_anthropic(
# cache markers can interfere with signature validation.
_THINKING_TYPES = frozenset(("thinking", "redacted_thinking"))
_is_third_party = _is_third_party_anthropic_endpoint(base_url)
_is_kimi = _is_kimi_coding_endpoint(base_url)
last_assistant_idx = None
for i in range(len(result) - 1, -1, -1):
@@ -1232,7 +1275,25 @@ def convert_messages_to_anthropic(
if m.get("role") != "assistant" or not isinstance(m.get("content"), list):
continue
if _is_third_party or idx != last_assistant_idx:
if _is_kimi:
# Kimi's /coding endpoint enables thinking server-side and
# requires unsigned thinking blocks on replayed assistant
# tool-call messages. Strip signed Anthropic blocks (Kimi
# can't validate signatures) but preserve the unsigned ones
# we synthesised from reasoning_content above.
new_content = []
for b in m["content"]:
if not isinstance(b, dict) or b.get("type") not in _THINKING_TYPES:
new_content.append(b)
continue
if b.get("signature") or b.get("data"):
# Anthropic-signed block — Kimi can't validate, strip
continue
# Unsigned thinking (synthesised from reasoning_content) —
# keep it: Kimi needs it for message-history validation.
new_content.append(b)
m["content"] = new_content or [{"type": "text", "text": "(empty)"}]
elif _is_third_party or idx != last_assistant_idx:
# Third-party endpoint: strip ALL thinking blocks from every
# assistant message — signatures are Anthropic-proprietary.
# Direct Anthropic: strip from non-latest assistant messages only.
@@ -1409,11 +1470,25 @@ def build_anthropic_kwargs(
# MiniMax Anthropic-compat endpoints support thinking (manual mode only,
# not adaptive). Haiku does NOT support extended thinking — skip entirely.
#
# Kimi's /coding endpoint speaks the Anthropic Messages protocol but has
# its own thinking semantics: when ``thinking.enabled`` is sent, Kimi
# validates the message history and requires every prior assistant
# tool-call message to carry OpenAI-style ``reasoning_content``. The
# Anthropic path never populates that field, and
# ``convert_messages_to_anthropic`` strips all Anthropic thinking blocks
# on third-party endpoints — so the request fails with HTTP 400
# "thinking is enabled but reasoning_content is missing in assistant
# tool call message at index N". Kimi's reasoning is driven server-side
# on the /coding route, so skip Anthropic's thinking parameter entirely
# for that host. (Kimi on chat_completions enables thinking via
# extra_body in the ChatCompletionsTransport — see #13503.)
#
# On 4.7+ the `thinking.display` field defaults to "omitted", which
# silently hides reasoning text that Hermes surfaces in its CLI. We
# request "summarized" so the reasoning blocks stay populated — matching
# 4.6 behavior and preserving the activity-feed UX during long tool runs.
if reasoning_config and isinstance(reasoning_config, dict):
_is_kimi_coding = _is_kimi_coding_endpoint(base_url)
if reasoning_config and isinstance(reasoning_config, dict) and not _is_kimi_coding:
if reasoning_config.get("enabled") is not False and "haiku" not in model.lower():
effort = str(reasoning_config.get("effort", "medium")).lower()
budget = THINKING_BUDGET.get(effort, 8000)
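The Kimi branch in the hunk above keeps unsigned thinking blocks and drops signed or redacted ones. A minimal standalone sketch of that filter follows. The function name `filter_thinking_for_kimi` is hypothetical and only the block shapes visible in the diff are assumed; it is an illustration, not the adapter's actual code.

```python
from typing import Any, List

_THINKING_TYPES = frozenset(("thinking", "redacted_thinking"))


def filter_thinking_for_kimi(content: List[Any]) -> List[Any]:
    """Drop signed thinking blocks; keep unsigned ones and all other blocks."""
    kept: List[Any] = []
    for block in content:
        is_thinking = isinstance(block, dict) and block.get("type") in _THINKING_TYPES
        if not is_thinking:
            # Text, tool_use, and non-dict blocks pass through untouched.
            kept.append(block)
        elif not (block.get("signature") or block.get("data")):
            # Unsigned thinking block (synthesised from reasoning_content).
            kept.append(block)
    # Same placeholder as the diff: assistant content must not be empty.
    return kept or [{"type": "text", "text": "(empty)"}]


blocks = [
    {"type": "thinking", "thinking": "signed", "signature": "abc"},
    {"type": "thinking", "thinking": "unsigned"},
    {"type": "redacted_thinking", "data": "opaque"},
    {"type": "text", "text": "hello"},
]
filtered = filter_thinking_for_kimi(blocks)
```

With this input only the unsigned thinking block and the text block survive.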
+38 -22
@@ -74,6 +74,10 @@ _PROVIDER_ALIASES = {
"minimax_cn": "minimax-cn",
"claude": "anthropic",
"claude-code": "anthropic",
"volcengine-coding-plan": "volcengine",
"volcengine_coding_plan": "volcengine",
"byteplus-coding-plan": "byteplus",
"byteplus_coding_plan": "byteplus",
}
@@ -134,6 +138,7 @@ _API_KEY_PROVIDER_AUX_MODELS: Dict[str, str] = {
"gemini": "gemini-3-flash-preview",
"zai": "glm-4.5-flash",
"kimi-coding": "kimi-k2-turbo-preview",
"stepfun": "step-3.5-flash",
"kimi-coding-cn": "kimi-k2-turbo-preview",
"minimax": "MiniMax-M2.7",
"minimax-cn": "MiniMax-M2.7",
@@ -182,8 +187,6 @@ auxiliary_is_nous: bool = False
# Default auxiliary models per provider
_OPENROUTER_MODEL = "google/gemini-3-flash-preview"
_NOUS_MODEL = "google/gemini-3-flash-preview"
_NOUS_FREE_TIER_VISION_MODEL = "xiaomi/mimo-v2-omni"
_NOUS_FREE_TIER_AUX_MODEL = "xiaomi/mimo-v2-pro"
_NOUS_DEFAULT_BASE_URL = "https://inference-api.nousresearch.com/v1"
_ANTHROPIC_DEFAULT_BASE_URL = "https://api.anthropic.com"
_AUTH_JSON_PATH = get_hermes_home() / "auth.json"
@@ -845,7 +848,7 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
return GeminiNativeClient(api_key=api_key, base_url=base_url), model
extra = {}
if base_url_host_matches(base_url, "api.kimi.com"):
extra["default_headers"] = {"User-Agent": "KimiCLI/1.30.0"}
extra["default_headers"] = {"User-Agent": "claude-code/0.1.0"}
elif base_url_host_matches(base_url, "api.githubcopilot.com"):
from hermes_cli.models import copilot_default_headers
@@ -871,7 +874,7 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
return GeminiNativeClient(api_key=api_key, base_url=base_url), model
extra = {}
if base_url_host_matches(base_url, "api.kimi.com"):
extra["default_headers"] = {"User-Agent": "KimiCLI/1.30.0"}
extra["default_headers"] = {"User-Agent": "claude-code/0.1.0"}
elif base_url_host_matches(base_url, "api.githubcopilot.com"):
from hermes_cli.models import copilot_default_headers
@@ -927,22 +930,35 @@ def _try_nous(vision: bool = False) -> Tuple[Optional[OpenAI], Optional[str]]:
global auxiliary_is_nous
auxiliary_is_nous = True
logger.debug("Auxiliary client: Nous Portal")
if nous.get("source") == "pool":
model = "gemini-3-flash"
else:
model = _NOUS_MODEL
# Free-tier users can't use paid auxiliary models — use the free
# models instead: mimo-v2-omni for vision, mimo-v2-pro for text tasks.
# Paid accounts keep their tier-appropriate models: gemini-3-flash-preview
# for both text and vision tasks.
# Ask the Portal which model it currently recommends for this task type.
# The /api/nous/recommended-models endpoint is the authoritative source:
# it distinguishes paid vs free tier recommendations, and get_nous_recommended_aux_model
# auto-detects the caller's tier via check_nous_free_tier(). Fall back to
# _NOUS_MODEL (google/gemini-3-flash-preview) when the Portal is unreachable
# or returns a null recommendation for this task type.
model = _NOUS_MODEL
try:
from hermes_cli.models import check_nous_free_tier
if check_nous_free_tier():
model = _NOUS_FREE_TIER_VISION_MODEL if vision else _NOUS_FREE_TIER_AUX_MODEL
logger.debug("Free-tier Nous account — using %s for auxiliary/%s",
model, "vision" if vision else "text")
except Exception:
pass
from hermes_cli.models import get_nous_recommended_aux_model
recommended = get_nous_recommended_aux_model(vision=vision)
if recommended:
model = recommended
logger.debug(
"Auxiliary/%s: using Portal-recommended model %s",
"vision" if vision else "text", model,
)
else:
logger.debug(
"Auxiliary/%s: no Portal recommendation, falling back to %s",
"vision" if vision else "text", model,
)
except Exception as exc:
logger.debug(
"Auxiliary/%s: recommended-models lookup failed (%s); "
"falling back to %s",
"vision" if vision else "text", exc, model,
)
if runtime is not None:
api_key, base_url = runtime
else:
@@ -1487,7 +1503,7 @@ def _to_async_client(sync_client, model: str):
async_kwargs["default_headers"] = copilot_default_headers()
elif base_url_host_matches(sync_base_url, "api.kimi.com"):
async_kwargs["default_headers"] = {"User-Agent": "KimiCLI/1.30.0"}
async_kwargs["default_headers"] = {"User-Agent": "claude-code/0.1.0"}
return AsyncOpenAI(**async_kwargs), model
@@ -1674,7 +1690,7 @@ def resolve_provider_client(
)
extra = {}
if base_url_host_matches(custom_base, "api.kimi.com"):
extra["default_headers"] = {"User-Agent": "KimiCLI/1.30.0"}
extra["default_headers"] = {"User-Agent": "claude-code/0.1.0"}
elif base_url_host_matches(custom_base, "api.githubcopilot.com"):
from hermes_cli.models import copilot_default_headers
extra["default_headers"] = copilot_default_headers()
@@ -1781,7 +1797,7 @@ def resolve_provider_client(
# Provider-specific headers
headers = {}
if base_url_host_matches(base_url, "api.kimi.com"):
headers["User-Agent"] = "KimiCLI/1.30.0"
headers["User-Agent"] = "claude-code/0.1.0"
elif base_url_host_matches(base_url, "api.githubcopilot.com"):
from hermes_cli.models import copilot_default_headers
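The `_try_nous` hunk above replaces hard-coded free-tier models with a Portal lookup that falls back to a default. The pattern can be sketched in isolation as below. `choose_aux_model` and the `lookup` callable are hypothetical stand-ins for `get_nous_recommended_aux_model`, whose real signature beyond the `vision` keyword is not shown in the diff.

```python
from typing import Callable, Optional


def choose_aux_model(
    lookup: Callable[..., Optional[str]],
    default: str,
    vision: bool = False,
) -> str:
    """Return the recommended model, or the default if the lookup fails or is empty."""
    try:
        recommended = lookup(vision=vision)
    except Exception:
        # Portal unreachable or lookup raised: keep the default.
        return default
    # A None or empty recommendation also falls back to the default.
    return recommended or default


def _failing_lookup(vision: bool = False) -> Optional[str]:
    raise ConnectionError("portal unreachable")


DEFAULT = "google/gemini-3-flash-preview"
picked_ok = choose_aux_model(lambda vision=False: "some/recommended-model", DEFAULT)
picked_null = choose_aux_model(lambda vision=False: None, DEFAULT)
picked_err = choose_aux_model(_failing_lookup, DEFAULT)
```

Setting the default first and only overwriting it on a successful lookup means no failure path can leave the model unset.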
+9 -4
@@ -470,11 +470,16 @@ def _classify_by_status(
retryable=False,
should_fallback=True,
)
# Generic 404 — could be model or endpoint
# Generic 404 with no "model not found" signal — could be a wrong
# endpoint path (common with local llama.cpp / Ollama / vLLM when
# the URL is slightly misconfigured), a proxy routing glitch, or
# a transient backend issue. Classifying these as model_not_found
# silently falls back to a different provider and reports the configured
# model as missing, which is wrong and wastes a turn. Treat
# as unknown so the retry loop surfaces the real error instead.
return result_fn(
FailoverReason.model_not_found,
retryable=False,
should_fallback=True,
FailoverReason.unknown,
retryable=True,
)
if status_code == 413:
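The 404 change above can be illustrated with a small standalone classifier. Everything here is a stand-in: the `FailoverReason` enum is reduced to two members, `Classification` and `classify_404` are hypothetical names, the hint strings are invented for illustration, and the `should_fallback=False` default is an assumption, since the diff does not show what `result_fn` defaults to.

```python
from dataclasses import dataclass
from enum import Enum


class FailoverReason(Enum):  # reduced stand-in for the real enum
    model_not_found = "model_not_found"
    unknown = "unknown"


@dataclass
class Classification:
    reason: FailoverReason
    retryable: bool
    should_fallback: bool = False  # assumed default, not confirmed by the diff


# Hypothetical signal list; the real patterns are not visible in this hunk.
_MODEL_MISSING_HINTS = ("model not found", "does not exist")


def classify_404(error_message: str) -> Classification:
    text = error_message.lower()
    if any(hint in text for hint in _MODEL_MISSING_HINTS):
        # Explicit signal: the model really is missing, so fall back.
        return Classification(
            FailoverReason.model_not_found, retryable=False, should_fallback=True
        )
    # Generic 404: likely a bad endpoint path or proxy issue; retry and surface it.
    return Classification(FailoverReason.unknown, retryable=True)


explicit = classify_404("The model `foo` does not exist")
generic = classify_404("404 page not found")
```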
+242
@@ -0,0 +1,242 @@
"""
Image Generation Provider ABC
=============================
Defines the pluggable-backend interface for image generation. Providers register
instances via ``PluginContext.register_image_gen_provider()``; the active one
(selected via ``image_gen.provider`` in ``config.yaml``) services every
``image_generate`` tool call.
Providers live in ``<repo>/plugins/image_gen/<name>/`` (built-in, auto-loaded
as ``kind: backend``) or ``~/.hermes/plugins/image_gen/<name>/`` (user, opt-in
via ``plugins.enabled``).
Response shape
--------------
All providers return a dict that :func:`success_response` / :func:`error_response`
produce. The tool wrapper JSON-serializes it. Keys:
success bool
image str | None URL or absolute file path
model str provider-specific model identifier
prompt str echoed prompt
aspect_ratio str "landscape" | "square" | "portrait"
provider str provider name (for diagnostics)
error str only when success=False
error_type str only when success=False
"""
from __future__ import annotations
import abc
import base64
import datetime
import logging
import uuid
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple
logger = logging.getLogger(__name__)
VALID_ASPECT_RATIOS: Tuple[str, ...] = ("landscape", "square", "portrait")
DEFAULT_ASPECT_RATIO = "landscape"
# ---------------------------------------------------------------------------
# ABC
# ---------------------------------------------------------------------------
class ImageGenProvider(abc.ABC):
"""Abstract base class for an image generation backend.
Subclasses must implement :meth:`generate`. Everything else has sane
defaults — override only what your provider needs.
"""
@property
@abc.abstractmethod
def name(self) -> str:
"""Stable short identifier used in ``image_gen.provider`` config.
Lowercase, no spaces. Examples: ``fal``, ``openai``, ``replicate``.
"""
@property
def display_name(self) -> str:
"""Human-readable label shown in ``hermes tools``. Defaults to ``name.title()``."""
return self.name.title()
def is_available(self) -> bool:
"""Return True when this provider can service calls.
Typically checks for a required API key. Default: True
(providers with no external dependencies are always available).
"""
return True
def list_models(self) -> List[Dict[str, Any]]:
"""Return catalog entries for ``hermes tools`` model picker.
Each entry::
{
"id": "gpt-image-1.5", # required
"display": "GPT Image 1.5", # optional; defaults to id
"speed": "~10s", # optional
"strengths": "...", # optional
"price": "$...", # optional
}
Default: empty list (provider has no user-selectable models).
"""
return []
def get_setup_schema(self) -> Dict[str, Any]:
"""Return provider metadata for the ``hermes tools`` picker.
Used by ``tools_config.py`` to inject this provider as a row in
the Image Generation provider list. Shape::
{
"name": "OpenAI", # picker label
"badge": "paid", # optional short tag
"tag": "One-line description...", # optional subtitle
"env_vars": [ # keys to prompt for
{"key": "OPENAI_API_KEY",
"prompt": "OpenAI API key",
"url": "https://platform.openai.com/api-keys"},
],
}
Default: minimal entry derived from ``display_name``. Override to
expose API key prompts and custom badges.
"""
return {
"name": self.display_name,
"badge": "",
"tag": "",
"env_vars": [],
}
def default_model(self) -> Optional[str]:
"""Return the default model id, or None if not applicable."""
models = self.list_models()
if models:
return models[0].get("id")
return None
@abc.abstractmethod
def generate(
self,
prompt: str,
aspect_ratio: str = DEFAULT_ASPECT_RATIO,
**kwargs: Any,
) -> Dict[str, Any]:
"""Generate an image.
Implementations should return the dict from :func:`success_response`
or :func:`error_response`. ``kwargs`` may contain forward-compat
parameters future versions of the schema will expose — implementations
should ignore unknown keys.
"""
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def resolve_aspect_ratio(value: Optional[str]) -> str:
"""Clamp an aspect_ratio value to the valid set, defaulting to landscape.
Invalid values are coerced rather than rejected so the tool surface is
forgiving of agent mistakes.
"""
if not isinstance(value, str):
return DEFAULT_ASPECT_RATIO
v = value.strip().lower()
if v in VALID_ASPECT_RATIOS:
return v
return DEFAULT_ASPECT_RATIO
def _images_cache_dir() -> Path:
"""Return ``$HERMES_HOME/cache/images/``, creating parents as needed."""
from hermes_constants import get_hermes_home
path = get_hermes_home() / "cache" / "images"
path.mkdir(parents=True, exist_ok=True)
return path
def save_b64_image(
b64_data: str,
*,
prefix: str = "image",
extension: str = "png",
) -> Path:
"""Decode base64 image data and write it under ``$HERMES_HOME/cache/images/``.
Returns the absolute :class:`Path` to the saved file.
Filename format: ``<prefix>_<YYYYMMDD_HHMMSS>_<short-uuid>.<ext>``.
"""
raw = base64.b64decode(b64_data)
ts = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
short = uuid.uuid4().hex[:8]
path = _images_cache_dir() / f"{prefix}_{ts}_{short}.{extension}"
path.write_bytes(raw)
return path
def success_response(
*,
image: str,
model: str,
prompt: str,
aspect_ratio: str,
provider: str,
extra: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
"""Build a uniform success response dict.
``image`` may be an HTTP URL or an absolute filesystem path (for b64
providers like OpenAI). Callers that need to pass through additional
backend-specific fields can supply ``extra``.
"""
payload: Dict[str, Any] = {
"success": True,
"image": image,
"model": model,
"prompt": prompt,
"aspect_ratio": aspect_ratio,
"provider": provider,
}
if extra:
for k, v in extra.items():
payload.setdefault(k, v)
return payload
def error_response(
*,
error: str,
error_type: str = "provider_error",
provider: str = "",
model: str = "",
prompt: str = "",
aspect_ratio: str = DEFAULT_ASPECT_RATIO,
) -> Dict[str, Any]:
"""Build a uniform error response dict."""
return {
"success": False,
"image": None,
"error": error,
"error_type": error_type,
"model": model,
"prompt": prompt,
"aspect_ratio": aspect_ratio,
"provider": provider,
}
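One detail of `success_response` above is easy to miss: `extra` is merged with `setdefault`, so backend-specific fields can add keys but cannot overwrite the core ones. A minimal sketch of that merge rule in isolation follows. The helper name `merge_extra` and the sample values are hypothetical.

```python
from typing import Any, Dict, Optional


def merge_extra(payload: Dict[str, Any], extra: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Copy payload, then add keys from extra without overwriting existing ones."""
    merged = dict(payload)
    for key, value in (extra or {}).items():
        merged.setdefault(key, value)
    return merged


core = {"success": True, "image": "/tmp/a.png", "provider": "demo"}
result = merge_extra(core, {"provider": "spoofed", "seed": 42})
```

Here `result["provider"]` stays `"demo"` while the new `seed` key is added, which keeps the documented response shape stable regardless of what a provider passes in `extra`.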
+120
@@ -0,0 +1,120 @@
"""
Image Generation Provider Registry
==================================
Central map of registered providers. Populated by plugins at import-time via
``PluginContext.register_image_gen_provider()``; consumed by the
``image_generate`` tool to dispatch each call to the active backend.
Active selection
----------------
The active provider is chosen by ``image_gen.provider`` in ``config.yaml``.
If unset (or set to a name that is not registered), :func:`get_active_provider` applies fallback logic:
1. If exactly one provider is registered, use it.
2. Otherwise if a provider named ``fal`` is registered, use it (legacy
default — matches pre-plugin behavior).
3. Otherwise return ``None`` (the tool surfaces a helpful error pointing
the user at ``hermes tools``).
"""
from __future__ import annotations
import logging
import threading
from typing import Dict, List, Optional
from agent.image_gen_provider import ImageGenProvider
logger = logging.getLogger(__name__)
_providers: Dict[str, ImageGenProvider] = {}
_lock = threading.Lock()
def register_provider(provider: ImageGenProvider) -> None:
"""Register an image generation provider.
Re-registration (same ``name``) overwrites the previous entry and logs
a debug message — this makes hot-reload scenarios (tests, dev loops)
behave predictably.
"""
if not isinstance(provider, ImageGenProvider):
raise TypeError(
f"register_provider() expects an ImageGenProvider instance, "
f"got {type(provider).__name__}"
)
name = provider.name
if not isinstance(name, str) or not name.strip():
raise ValueError("Image gen provider .name must be a non-empty string")
with _lock:
existing = _providers.get(name)
_providers[name] = provider
if existing is not None:
logger.debug("Image gen provider '%s' re-registered (was %r)", name, type(existing).__name__)
else:
logger.debug("Registered image gen provider '%s' (%s)", name, type(provider).__name__)
def list_providers() -> List[ImageGenProvider]:
"""Return all registered providers, sorted by name."""
with _lock:
items = list(_providers.values())
return sorted(items, key=lambda p: p.name)
def get_provider(name: str) -> Optional[ImageGenProvider]:
"""Return the provider registered under *name*, or None."""
if not isinstance(name, str):
return None
with _lock:
return _providers.get(name.strip())
def get_active_provider() -> Optional[ImageGenProvider]:
"""Resolve the currently-active provider.
Reads ``image_gen.provider`` from config.yaml; falls back per the
module docstring.
"""
configured: Optional[str] = None
try:
from hermes_cli.config import load_config
cfg = load_config()
section = cfg.get("image_gen") if isinstance(cfg, dict) else None
if isinstance(section, dict):
raw = section.get("provider")
if isinstance(raw, str) and raw.strip():
configured = raw.strip()
except Exception as exc:
logger.debug("Could not read image_gen.provider from config: %s", exc)
with _lock:
snapshot = dict(_providers)
if configured:
provider = snapshot.get(configured)
if provider is not None:
return provider
logger.debug(
"image_gen.provider='%s' configured but not registered; falling back",
configured,
)
# Fallback: single-provider case
if len(snapshot) == 1:
return next(iter(snapshot.values()))
# Fallback: prefer legacy FAL for backward compat
if "fal" in snapshot:
return snapshot["fal"]
return None
def _reset_for_tests() -> None:
"""Clear the registry. **Test-only.**"""
with _lock:
_providers.clear()
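The selection order in `get_active_provider` can be sketched as a pure function over a name-to-provider mapping, which makes the fallback rules easy to check without a config file. `pick_active` is a hypothetical name, and plain strings stand in for provider instances.

```python
from typing import Dict, Optional


def pick_active(configured: Optional[str], providers: Dict[str, object]) -> Optional[object]:
    """Resolve the active provider: configured name, then sole provider, then 'fal'."""
    if configured and configured in providers:
        return providers[configured]
    # Configured name missing or unset: fall through to the fallbacks.
    if len(providers) == 1:
        return next(iter(providers.values()))
    return providers.get("fal")


both = {"fal": "FAL", "openai": "OPENAI"}
chosen_configured = pick_active("openai", both)
chosen_legacy = pick_active(None, both)
chosen_single = pick_active("missing", {"openai": "OPENAI"})
chosen_none = pick_active(None, {"openai": "OPENAI", "replicate": "REPLICATE"})
```

Note that a configured but unregistered name does not produce an error by itself; it goes through the same fallbacks as an unset value.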
+19 -3
@@ -14,8 +14,8 @@ from urllib.parse import urlparse
import requests
import yaml
from hermes_cli.volcengine_byteplus import model_context_window
from utils import base_url_host_matches, base_url_hostname
from hermes_constants import OPENROUTER_MODELS_URL
logger = logging.getLogger(__name__)
@@ -25,18 +25,22 @@ logger = logging.getLogger(__name__)
# are preserved so the full model name reaches cache lookups and server queries.
_PROVIDER_PREFIXES: frozenset[str] = frozenset({
"openrouter", "nous", "openai-codex", "copilot", "copilot-acp",
"gemini", "ollama-cloud", "zai", "kimi-coding", "kimi-coding-cn", "minimax", "minimax-cn", "anthropic", "deepseek",
"gemini", "ollama-cloud", "zai", "kimi-coding", "kimi-coding-cn", "stepfun", "minimax", "minimax-cn", "anthropic", "deepseek",
"opencode-zen", "opencode-go", "ai-gateway", "kilocode", "alibaba",
"qwen-oauth",
"xiaomi",
"arcee",
"volcengine",
"volcengine-coding-plan",
"byteplus",
"byteplus-coding-plan",
"custom", "local",
# Common aliases
"google", "google-gemini", "google-ai-studio",
"glm", "z-ai", "z.ai", "zhipu", "github", "github-copilot",
"github-models", "kimi", "moonshot", "kimi-cn", "moonshot-cn", "claude", "deep-seek",
"ollama",
"opencode", "zen", "go", "vercel", "kilo", "dashscope", "aliyun", "qwen",
"stepfun", "opencode", "zen", "go", "vercel", "kilo", "dashscope", "aliyun", "qwen",
"mimo", "xiaomi-mimo",
"arcee-ai", "arceeai",
"xai", "x-ai", "x.ai", "grok",
@@ -237,6 +241,8 @@ _URL_TO_PROVIDER: Dict[str, str] = {
"api.moonshot.ai": "kimi-coding",
"api.moonshot.cn": "kimi-coding-cn",
"api.kimi.com": "kimi-coding",
"api.stepfun.ai": "stepfun",
"api.stepfun.com": "stepfun",
"api.arcee.ai": "arcee",
"api.minimax": "minimax",
"dashscope.aliyuncs.com": "alibaba",
@@ -255,6 +261,8 @@ _URL_TO_PROVIDER: Dict[str, str] = {
"api.xiaomimimo.com": "xiaomi",
"xiaomimimo.com": "xiaomi",
"ollama.com": "ollama-cloud",
"ark.cn-beijing.volces.com": "volcengine",
"ark.ap-southeast.bytepluses.com": "byteplus",
}
@@ -1117,12 +1125,20 @@ def get_model_context_length(
ctx = _resolve_nous_context_length(model)
if ctx:
return ctx
if effective_provider in {"volcengine", "byteplus"}:
ctx = model_context_window(model)
if ctx:
return ctx
if effective_provider:
from agent.models_dev import lookup_models_dev_context
ctx = lookup_models_dev_context(effective_provider, model)
if ctx:
return ctx
ctx = model_context_window(model)
if ctx:
return ctx
# 6. OpenRouter live API metadata (provider-unaware fallback)
metadata = fetch_model_metadata()
if model in metadata:
+1
@@ -146,6 +146,7 @@ PROVIDER_TO_MODELS_DEV: Dict[str, str] = {
"openai-codex": "openai",
"zai": "zai",
"kimi-coding": "kimi-for-coding",
"stepfun": "stepfun",
"kimi-coding-cn": "kimi-for-coding",
"minimax": "minimax",
"minimax-cn": "minimax-cn",
+7 -1
@@ -350,7 +350,13 @@ PLATFORM_HINTS = {
),
"cli": (
"You are a CLI AI Agent. Try not to use markdown but simple text "
"renderable inside a terminal."
"renderable inside a terminal. "
"File delivery: there is no attachment channel — the user reads your "
"response directly in their terminal. Do NOT emit MEDIA:/path tags "
"(those are only intercepted on messaging platforms like Telegram, "
"Discord, Slack, etc.; on the CLI they render as literal text). "
"When referring to a file you created or changed, just state its "
"absolute path in plain text; the user can open it from there."
),
"sms": (
"You are communicating via SMS. Keep responses concise and use plain text "
+12
@@ -37,3 +37,15 @@ def _discover_transports() -> None:
import agent.transports.anthropic # noqa: F401
except ImportError:
pass
try:
import agent.transports.codex # noqa: F401
except ImportError:
pass
try:
import agent.transports.chat_completions # noqa: F401
except ImportError:
pass
try:
import agent.transports.bedrock # noqa: F401
except ImportError:
pass
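The repeated `try: import ... except ImportError: pass` blocks in `_discover_transports` implement best-effort discovery: a transport whose module cannot be imported is simply skipped. The same idea can be written as a loop with `importlib`. This is an alternative sketch, not the repository's code; the `discover` helper and the module names passed to it are placeholders.

```python
import importlib
from typing import List


def discover(module_names: List[str]) -> List[str]:
    """Import each module for its side effects; return the names that loaded."""
    loaded: List[str] = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            # Optional dependency missing: skip this transport.
            continue
        loaded.append(name)
    return loaded


# "json" stands in for an importable transport module; the second name is
# deliberately nonexistent.
loaded = discover(["json", "hypothetical_missing_transport_module"])
```

The explicit per-module blocks in the diff trade brevity for greppability; the loop form trades the other way.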
+154
@@ -0,0 +1,154 @@
"""AWS Bedrock Converse API transport.
Delegates to the existing adapter functions in agent/bedrock_adapter.py.
Bedrock uses its own boto3 client (not the OpenAI SDK), so the transport
owns format conversion and normalization, while client construction and
boto3 calls stay on AIAgent.
"""
from typing import Any, Dict, List, Optional
from agent.transports.base import ProviderTransport
from agent.transports.types import NormalizedResponse, ToolCall, Usage
class BedrockTransport(ProviderTransport):
"""Transport for api_mode='bedrock_converse'."""
@property
def api_mode(self) -> str:
return "bedrock_converse"
def convert_messages(self, messages: List[Dict[str, Any]], **kwargs) -> Any:
"""Convert OpenAI messages to Bedrock Converse format."""
from agent.bedrock_adapter import convert_messages_to_converse
return convert_messages_to_converse(messages)
def convert_tools(self, tools: List[Dict[str, Any]]) -> Any:
"""Convert OpenAI tool schemas to Bedrock Converse toolConfig."""
from agent.bedrock_adapter import convert_tools_to_converse
return convert_tools_to_converse(tools)
def build_kwargs(
self,
model: str,
messages: List[Dict[str, Any]],
tools: Optional[List[Dict[str, Any]]] = None,
**params,
) -> Dict[str, Any]:
"""Build Bedrock converse() kwargs.
Calls convert_messages and convert_tools internally.
params:
max_tokens: int — output token limit (default 4096)
temperature: float | None
guardrail_config: dict | None — Bedrock guardrails
region: str — AWS region (default 'us-east-1')
"""
from agent.bedrock_adapter import build_converse_kwargs
region = params.get("region", "us-east-1")
guardrail = params.get("guardrail_config")
kwargs = build_converse_kwargs(
model=model,
messages=messages,
tools=tools,
max_tokens=params.get("max_tokens", 4096),
temperature=params.get("temperature"),
guardrail_config=guardrail,
)
# Sentinel keys for dispatch — agent pops these before the boto3 call
kwargs["__bedrock_converse__"] = True
kwargs["__bedrock_region__"] = region
return kwargs
def normalize_response(self, response: Any, **kwargs) -> NormalizedResponse:
"""Normalize Bedrock response to NormalizedResponse.
Handles two shapes:
1. Raw boto3 dict (from direct converse() calls)
2. Already-normalized SimpleNamespace with .choices (from dispatch site)
"""
from agent.bedrock_adapter import normalize_converse_response
# Normalize to OpenAI-compatible SimpleNamespace
if hasattr(response, "choices") and response.choices:
# Already normalized at dispatch site
ns = response
else:
# Raw boto3 dict
ns = normalize_converse_response(response)
choice = ns.choices[0]
msg = choice.message
finish_reason = choice.finish_reason or "stop"
tool_calls = None
if msg.tool_calls:
tool_calls = [
ToolCall(
id=tc.id,
name=tc.function.name,
arguments=tc.function.arguments,
)
for tc in msg.tool_calls
]
usage = None
if hasattr(ns, "usage") and ns.usage:
u = ns.usage
usage = Usage(
prompt_tokens=getattr(u, "prompt_tokens", 0) or 0,
completion_tokens=getattr(u, "completion_tokens", 0) or 0,
total_tokens=getattr(u, "total_tokens", 0) or 0,
)
reasoning = getattr(msg, "reasoning", None) or getattr(msg, "reasoning_content", None)
return NormalizedResponse(
content=msg.content,
tool_calls=tool_calls,
finish_reason=finish_reason,
reasoning=reasoning,
usage=usage,
)
def validate_response(self, response: Any) -> bool:
"""Check Bedrock response structure.
After normalize_converse_response, the response has OpenAI-compatible
.choices — same check as chat_completions.
"""
if response is None:
return False
# Raw Bedrock dict response — check for 'output' key
if isinstance(response, dict):
return "output" in response
# Already-normalized SimpleNamespace
if hasattr(response, "choices"):
return bool(response.choices)
return False
def map_finish_reason(self, raw_reason: str) -> str:
"""Map Bedrock stop reason to OpenAI finish_reason.
The adapter already does this mapping inside normalize_converse_response,
so this is only used for direct access to raw responses.
"""
_MAP = {
"end_turn": "stop",
"tool_use": "tool_calls",
"max_tokens": "length",
"stop_sequence": "stop",
"guardrail_intervened": "content_filter",
"content_filtered": "content_filter",
}
return _MAP.get(raw_reason, "stop")
# Auto-register on import
from agent.transports import register_transport # noqa: E402
register_transport("bedrock_converse", BedrockTransport)
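`build_kwargs` above attaches two sentinel keys that the agent is expected to remove before calling boto3. A minimal sketch of what that dispatch-side step might look like follows. The helper `split_sentinels` is hypothetical and the sample kwargs are placeholders; the real dispatch code in AIAgent is not part of this diff.

```python
from typing import Any, Dict, Tuple


def split_sentinels(kwargs: Dict[str, Any]) -> Tuple[bool, str, Dict[str, Any]]:
    """Separate dispatch sentinels from the kwargs passed to the API call."""
    call_kwargs = dict(kwargs)  # copy so the caller's dict is left intact
    is_bedrock = bool(call_kwargs.pop("__bedrock_converse__", False))
    region = call_kwargs.pop("__bedrock_region__", "us-east-1")
    return is_bedrock, region, call_kwargs


built = {
    "modelId": "example-model",  # placeholder payload key
    "__bedrock_converse__": True,
    "__bedrock_region__": "eu-west-1",
}
is_bedrock, region, call_kwargs = split_sentinels(built)
```

Popping from a copy keeps the sentinels out of the request while leaving the original kwargs available for logging or retries.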
+387
@@ -0,0 +1,387 @@
"""OpenAI Chat Completions transport.
Handles the default api_mode ('chat_completions') used by ~16 OpenAI-compatible
providers (OpenRouter, Nous, NVIDIA, Qwen, Ollama, DeepSeek, xAI, Kimi, etc.).
Messages and tools are already in OpenAI format — convert_messages and
convert_tools are near-identity. The complexity lives in build_kwargs
which has provider-specific conditionals for max_tokens defaults,
reasoning configuration, temperature handling, and extra_body assembly.
"""
import copy
from typing import Any, Dict, List, Optional
from agent.prompt_builder import DEVELOPER_ROLE_MODELS
from agent.transports.base import ProviderTransport
from agent.transports.types import NormalizedResponse, ToolCall, Usage
class ChatCompletionsTransport(ProviderTransport):
"""Transport for api_mode='chat_completions'.
The default path for OpenAI-compatible providers.
"""
@property
def api_mode(self) -> str:
return "chat_completions"
def convert_messages(self, messages: List[Dict[str, Any]], **kwargs) -> List[Dict[str, Any]]:
"""Messages are already in OpenAI format — sanitize Codex leaks only.
Strips Codex Responses API fields (``codex_reasoning_items`` on the
message, ``call_id``/``response_item_id`` on tool_calls) that strict
chat-completions providers reject with 400/422.
"""
needs_sanitize = False
for msg in messages:
if not isinstance(msg, dict):
continue
if "codex_reasoning_items" in msg:
needs_sanitize = True
break
tool_calls = msg.get("tool_calls")
if isinstance(tool_calls, list):
for tc in tool_calls:
if isinstance(tc, dict) and ("call_id" in tc or "response_item_id" in tc):
needs_sanitize = True
break
if needs_sanitize:
break
if not needs_sanitize:
return messages
sanitized = copy.deepcopy(messages)
for msg in sanitized:
if not isinstance(msg, dict):
continue
msg.pop("codex_reasoning_items", None)
tool_calls = msg.get("tool_calls")
if isinstance(tool_calls, list):
for tc in tool_calls:
if isinstance(tc, dict):
tc.pop("call_id", None)
tc.pop("response_item_id", None)
return sanitized
def convert_tools(self, tools: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
"""Tools are already in OpenAI format — identity."""
return tools
def build_kwargs(
self,
model: str,
messages: List[Dict[str, Any]],
tools: Optional[List[Dict[str, Any]]] = None,
**params,
) -> Dict[str, Any]:
"""Build chat.completions.create() kwargs.
This is the most complex transport method — it handles ~16 providers
via params rather than subclasses.
params:
timeout: float — API call timeout
max_tokens: int | None — user-configured max tokens
ephemeral_max_output_tokens: int | None — one-shot override (error recovery)
max_tokens_param_fn: callable — returns {max_tokens: N} or {max_completion_tokens: N}
reasoning_config: dict | None
request_overrides: dict | None
session_id: str | None
qwen_session_metadata: dict | None — {sessionId, promptId} precomputed
model_lower: str — lowercase model name for pattern matching
# Provider detection flags (all optional, default False)
is_openrouter: bool
is_nous: bool
is_qwen_portal: bool
is_github_models: bool
is_nvidia_nim: bool
is_kimi: bool
is_custom_provider: bool
ollama_num_ctx: int | None
# Provider routing
provider_preferences: dict | None
# Qwen-specific
qwen_prepare_fn: callable | None — runs AFTER codex sanitization
qwen_prepare_inplace_fn: callable | None — in-place variant for deepcopied lists
# Temperature
fixed_temperature: Any — from _fixed_temperature_for_model()
omit_temperature: bool
# Reasoning
supports_reasoning: bool
github_reasoning_extra: dict | None
# Claude on OpenRouter/Nous max output
anthropic_max_output: int | None
# Extra
extra_body_additions: dict | None — pre-built extra_body entries
"""
# Codex sanitization: drop codex_reasoning_items / call_id / response_item_id
sanitized = self.convert_messages(messages)
# Qwen portal prep AFTER codex sanitization. If sanitize already
# deepcopied, reuse that copy via the in-place variant to avoid a
# second deepcopy.
is_qwen = params.get("is_qwen_portal", False)
if is_qwen:
qwen_prep = params.get("qwen_prepare_fn")
qwen_prep_inplace = params.get("qwen_prepare_inplace_fn")
if sanitized is messages:
if qwen_prep is not None:
sanitized = qwen_prep(sanitized)
else:
# Already deepcopied — transform in place
if qwen_prep_inplace is not None:
qwen_prep_inplace(sanitized)
elif qwen_prep is not None:
sanitized = qwen_prep(sanitized)
# Developer role swap for GPT-5/Codex models
model_lower = params.get("model_lower", (model or "").lower())
if (
sanitized
and isinstance(sanitized[0], dict)
and sanitized[0].get("role") == "system"
and any(p in model_lower for p in DEVELOPER_ROLE_MODELS)
):
sanitized = list(sanitized)
sanitized[0] = {**sanitized[0], "role": "developer"}
api_kwargs: Dict[str, Any] = {
"model": model,
"messages": sanitized,
}
timeout = params.get("timeout")
if timeout is not None:
api_kwargs["timeout"] = timeout
# Temperature
fixed_temp = params.get("fixed_temperature")
omit_temp = params.get("omit_temperature", False)
if omit_temp:
api_kwargs.pop("temperature", None)
elif fixed_temp is not None:
api_kwargs["temperature"] = fixed_temp
# Qwen metadata (caller precomputes {sessionId, promptId})
qwen_meta = params.get("qwen_session_metadata")
if qwen_meta and is_qwen:
api_kwargs["metadata"] = qwen_meta
# Tools
if tools:
api_kwargs["tools"] = tools
# max_tokens resolution — priority: ephemeral > user > provider default
max_tokens_fn = params.get("max_tokens_param_fn")
ephemeral = params.get("ephemeral_max_output_tokens")
max_tokens = params.get("max_tokens")
anthropic_max_out = params.get("anthropic_max_output")
is_nvidia_nim = params.get("is_nvidia_nim", False)
is_kimi = params.get("is_kimi", False)
reasoning_config = params.get("reasoning_config")
if ephemeral is not None and max_tokens_fn:
api_kwargs.update(max_tokens_fn(ephemeral))
elif max_tokens is not None and max_tokens_fn:
api_kwargs.update(max_tokens_fn(max_tokens))
elif is_nvidia_nim and max_tokens_fn:
api_kwargs.update(max_tokens_fn(16384))
elif is_qwen and max_tokens_fn:
api_kwargs.update(max_tokens_fn(65536))
elif is_kimi and max_tokens_fn:
# Kimi/Moonshot: 32000 matches Kimi CLI's default
api_kwargs.update(max_tokens_fn(32000))
elif anthropic_max_out is not None:
api_kwargs["max_tokens"] = anthropic_max_out
# Kimi: top-level reasoning_effort (unless thinking disabled)
if is_kimi:
_kimi_thinking_off = bool(
reasoning_config
and isinstance(reasoning_config, dict)
and reasoning_config.get("enabled") is False
)
if not _kimi_thinking_off:
_kimi_effort = "medium"
if reasoning_config and isinstance(reasoning_config, dict):
_e = (reasoning_config.get("effort") or "").strip().lower()
if _e in ("low", "medium", "high"):
_kimi_effort = _e
api_kwargs["reasoning_effort"] = _kimi_effort
# extra_body assembly
extra_body: Dict[str, Any] = {}
is_openrouter = params.get("is_openrouter", False)
is_nous = params.get("is_nous", False)
is_github_models = params.get("is_github_models", False)
provider_prefs = params.get("provider_preferences")
if provider_prefs and is_openrouter:
extra_body["provider"] = provider_prefs
# Kimi extra_body.thinking
if is_kimi:
_kimi_thinking_enabled = True
if reasoning_config and isinstance(reasoning_config, dict):
if reasoning_config.get("enabled") is False:
_kimi_thinking_enabled = False
extra_body["thinking"] = {
"type": "enabled" if _kimi_thinking_enabled else "disabled",
}
# Reasoning
if params.get("supports_reasoning", False):
if is_github_models:
gh_reasoning = params.get("github_reasoning_extra")
if gh_reasoning is not None:
extra_body["reasoning"] = gh_reasoning
else:
if reasoning_config is not None:
rc = dict(reasoning_config)
if is_nous and rc.get("enabled") is False:
pass # omit for Nous when disabled
else:
extra_body["reasoning"] = rc
else:
extra_body["reasoning"] = {"enabled": True, "effort": "medium"}
if is_nous:
extra_body["tags"] = ["product=hermes-agent"]
# Ollama num_ctx
ollama_ctx = params.get("ollama_num_ctx")
if ollama_ctx:
options = extra_body.get("options", {})
options["num_ctx"] = ollama_ctx
extra_body["options"] = options
# Ollama/custom think=false
if params.get("is_custom_provider", False):
if reasoning_config and isinstance(reasoning_config, dict):
_effort = (reasoning_config.get("effort") or "").strip().lower()
_enabled = reasoning_config.get("enabled", True)
if _effort == "none" or _enabled is False:
extra_body["think"] = False
if is_qwen:
extra_body["vl_high_resolution_images"] = True
# Merge any pre-built extra_body additions
additions = params.get("extra_body_additions")
if additions:
extra_body.update(additions)
if extra_body:
api_kwargs["extra_body"] = extra_body
# Request overrides last (service_tier etc.)
overrides = params.get("request_overrides")
if overrides:
api_kwargs.update(overrides)
return api_kwargs
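The `max_tokens` branch chain in `build_kwargs` above follows a simple "first limit that is set wins" rule. A minimal sketch of that priority order, assuming a hypothetical helper `resolve_max_tokens` and a hypothetical `as_max_tokens` builder (neither exists in the source, and the source additionally lets the Anthropic default bypass `max_tokens_fn`):

```python
from typing import Any, Callable, Dict, Optional


def resolve_max_tokens(
    max_tokens_fn: Optional[Callable[[int], Dict[str, Any]]],
    ephemeral: Optional[int] = None,
    user_max: Optional[int] = None,
    provider_default: Optional[int] = None,
) -> Dict[str, Any]:
    """Return the kwargs fragment for the first limit that is set.

    Priority: ephemeral > user > provider default (hypothetical helper).
    """
    if max_tokens_fn is None:
        return {}
    for candidate in (ephemeral, user_max, provider_default):
        if candidate is not None:
            return max_tokens_fn(candidate)
    return {}


def as_max_tokens(n: int) -> Dict[str, Any]:
    # Hypothetical builder; real providers may use a different key name.
    return {"max_tokens": n}


print(resolve_max_tokens(as_max_tokens, ephemeral=512, user_max=4096))
print(resolve_max_tokens(as_max_tokens, provider_default=16384))
```

Passing the key-building function in, rather than hard-coding `max_tokens`, keeps the priority logic independent of the parameter name a given backend expects.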
def normalize_response(self, response: Any, **kwargs) -> NormalizedResponse:
"""Normalize OpenAI ChatCompletion to NormalizedResponse.
For chat_completions, this is near-identity — the response is already
in OpenAI format. extra_content on tool_calls (Gemini thought_signature)
is preserved via ToolCall.provider_data. reasoning_details (OpenRouter
unified format) and reasoning_content (DeepSeek/Moonshot) are also
preserved for downstream replay.
"""
choice = response.choices[0]
msg = choice.message
finish_reason = choice.finish_reason or "stop"
tool_calls = None
if msg.tool_calls:
tool_calls = []
for tc in msg.tool_calls:
# Preserve provider-specific extras on the tool call.
# Gemini 3 thinking models attach extra_content with
# thought_signature — without replay on the next turn the API
# rejects the request with 400.
tc_provider_data: Dict[str, Any] = {}
extra = getattr(tc, "extra_content", None)
if extra is None and hasattr(tc, "model_extra"):
extra = (tc.model_extra or {}).get("extra_content")
if extra is not None:
if hasattr(extra, "model_dump"):
try:
extra = extra.model_dump()
except Exception:
pass
tc_provider_data["extra_content"] = extra
tool_calls.append(ToolCall(
id=tc.id,
name=tc.function.name,
arguments=tc.function.arguments,
provider_data=tc_provider_data or None,
))
usage = None
if hasattr(response, "usage") and response.usage:
u = response.usage
usage = Usage(
prompt_tokens=getattr(u, "prompt_tokens", 0) or 0,
completion_tokens=getattr(u, "completion_tokens", 0) or 0,
total_tokens=getattr(u, "total_tokens", 0) or 0,
)
# Preserve reasoning fields separately. DeepSeek/Moonshot use
# ``reasoning_content``; others use ``reasoning``. Downstream code
# (_extract_reasoning, thinking-prefill retry) reads both distinctly,
# so keep them apart in provider_data rather than merging.
reasoning = getattr(msg, "reasoning", None)
reasoning_content = getattr(msg, "reasoning_content", None)
provider_data: Dict[str, Any] = {}
if reasoning_content:
provider_data["reasoning_content"] = reasoning_content
rd = getattr(msg, "reasoning_details", None)
if rd:
provider_data["reasoning_details"] = rd
return NormalizedResponse(
content=msg.content,
tool_calls=tool_calls,
finish_reason=finish_reason,
reasoning=reasoning,
usage=usage,
provider_data=provider_data or None,
)
    def validate_response(self, response: Any) -> bool:
        """Check that response has valid choices."""
        if response is None:
            return False
        if not hasattr(response, "choices") or response.choices is None:
            return False
        if not response.choices:
            return False
        return True

    def extract_cache_stats(self, response: Any) -> Optional[Dict[str, int]]:
        """Extract OpenRouter/OpenAI cache stats from prompt_tokens_details."""
        usage = getattr(response, "usage", None)
        if usage is None:
            return None
        details = getattr(usage, "prompt_tokens_details", None)
        if details is None:
            return None
        cached = getattr(details, "cached_tokens", 0) or 0
        written = getattr(details, "cache_write_tokens", 0) or 0
        if cached or written:
            return {"cached_tokens": cached, "creation_tokens": written}
        return None
# Auto-register on import
from agent.transports import register_transport # noqa: E402
register_transport("chat_completions", ChatCompletionsTransport)
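The developer-role swap at the top of this file's `build_kwargs` copies the message list before editing it, so the caller's list is left untouched. A minimal sketch of that copy-on-write pattern follows; the contents of `DEVELOPER_ROLE_MODELS` are an assumption here (the source only says "GPT-5/Codex models"), and `swap_system_to_developer` is a hypothetical name.

```python
from typing import Any, Dict, List

# Assumed values for illustration; the real tuple lives elsewhere in the repo.
DEVELOPER_ROLE_MODELS = ("gpt-5", "codex")


def swap_system_to_developer(
    messages: List[Dict[str, Any]], model: str
) -> List[Dict[str, Any]]:
    """Return messages with a leading system role rewritten to developer."""
    model_lower = (model or "").lower()
    if (
        messages
        and isinstance(messages[0], dict)
        and messages[0].get("role") == "system"
        and any(p in model_lower for p in DEVELOPER_ROLE_MODELS)
    ):
        messages = list(messages)  # shallow copy: do not mutate the caller's list
        messages[0] = {**messages[0], "role": "developer"}
    return messages


original = [{"role": "system", "content": "Be brief."}, {"role": "user", "content": "hi"}]
swapped = swap_system_to_developer(original, "GPT-5-mini")
print(swapped[0]["role"], original[0]["role"])
```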
+217
@@ -0,0 +1,217 @@
"""OpenAI Responses API (Codex) transport.
Delegates to the existing adapter functions in agent/codex_responses_adapter.py.
This transport owns format conversion and normalization — NOT client lifecycle,
streaming, or the _run_codex_stream() call path.
"""
from typing import Any, Dict, List, Optional
from agent.transports.base import ProviderTransport
from agent.transports.types import NormalizedResponse, ToolCall, Usage
class ResponsesApiTransport(ProviderTransport):
"""Transport for api_mode='codex_responses'.
Wraps the functions extracted into codex_responses_adapter.py (PR 1).
"""
@property
def api_mode(self) -> str:
return "codex_responses"
def convert_messages(self, messages: List[Dict[str, Any]], **kwargs) -> Any:
"""Convert OpenAI chat messages to Responses API input items."""
from agent.codex_responses_adapter import _chat_messages_to_responses_input
return _chat_messages_to_responses_input(messages)
def convert_tools(self, tools: List[Dict[str, Any]]) -> Any:
"""Convert OpenAI tool schemas to Responses API function definitions."""
from agent.codex_responses_adapter import _responses_tools
return _responses_tools(tools)
def build_kwargs(
self,
model: str,
messages: List[Dict[str, Any]],
tools: Optional[List[Dict[str, Any]]] = None,
**params,
) -> Dict[str, Any]:
"""Build Responses API kwargs.
Calls convert_messages and convert_tools internally.
params:
instructions: str — system prompt (extracted from messages[0] if not given)
reasoning_config: dict | None — {effort, enabled}
session_id: str | None — used for prompt_cache_key + xAI conv header
max_tokens: int | None — max_output_tokens
request_overrides: dict | None — extra kwargs merged in
provider: str | None — provider name for backend-specific logic
base_url: str | None — endpoint URL
base_url_hostname: str | None — hostname for backend detection
is_github_responses: bool — Copilot/GitHub models backend
is_codex_backend: bool — chatgpt.com/backend-api/codex
is_xai_responses: bool — xAI/Grok backend
github_reasoning_extra: dict | None — Copilot reasoning params
"""
from agent.codex_responses_adapter import (
_chat_messages_to_responses_input,
_responses_tools,
)
from run_agent import DEFAULT_AGENT_IDENTITY
instructions = params.get("instructions", "")
payload_messages = messages
if not instructions:
if messages and messages[0].get("role") == "system":
instructions = str(messages[0].get("content") or "").strip()
payload_messages = messages[1:]
if not instructions:
instructions = DEFAULT_AGENT_IDENTITY
is_github_responses = params.get("is_github_responses", False)
is_codex_backend = params.get("is_codex_backend", False)
is_xai_responses = params.get("is_xai_responses", False)
# Resolve reasoning effort
reasoning_effort = "medium"
reasoning_enabled = True
reasoning_config = params.get("reasoning_config")
if reasoning_config and isinstance(reasoning_config, dict):
if reasoning_config.get("enabled") is False:
reasoning_enabled = False
elif reasoning_config.get("effort"):
reasoning_effort = reasoning_config["effort"]
_effort_clamp = {"minimal": "low"}
reasoning_effort = _effort_clamp.get(reasoning_effort, reasoning_effort)
kwargs = {
"model": model,
"instructions": instructions,
"input": _chat_messages_to_responses_input(payload_messages),
"tools": _responses_tools(tools),
"tool_choice": "auto",
"parallel_tool_calls": True,
"store": False,
}
session_id = params.get("session_id")
if not is_github_responses and session_id:
kwargs["prompt_cache_key"] = session_id
if reasoning_enabled and is_xai_responses:
kwargs["include"] = ["reasoning.encrypted_content"]
elif reasoning_enabled:
if is_github_responses:
github_reasoning = params.get("github_reasoning_extra")
if github_reasoning is not None:
kwargs["reasoning"] = github_reasoning
else:
kwargs["reasoning"] = {"effort": reasoning_effort, "summary": "auto"}
kwargs["include"] = ["reasoning.encrypted_content"]
elif not is_github_responses and not is_xai_responses:
kwargs["include"] = []
request_overrides = params.get("request_overrides")
if request_overrides:
kwargs.update(request_overrides)
max_tokens = params.get("max_tokens")
if max_tokens is not None and not is_codex_backend:
kwargs["max_output_tokens"] = max_tokens
if is_xai_responses and session_id:
kwargs["extra_headers"] = {"x-grok-conv-id": session_id}
return kwargs
def normalize_response(self, response: Any, **kwargs) -> NormalizedResponse:
"""Normalize Codex Responses API response to NormalizedResponse."""
from agent.codex_responses_adapter import (
_normalize_codex_response,
_extract_responses_message_text,
_extract_responses_reasoning_text,
)
# _normalize_codex_response returns (SimpleNamespace, finish_reason_str)
msg, finish_reason = _normalize_codex_response(response)
tool_calls = None
if msg and msg.tool_calls:
tool_calls = []
for tc in msg.tool_calls:
provider_data = {}
if hasattr(tc, "call_id") and tc.call_id:
provider_data["call_id"] = tc.call_id
if hasattr(tc, "response_item_id") and tc.response_item_id:
provider_data["response_item_id"] = tc.response_item_id
tool_calls.append(ToolCall(
id=tc.id if hasattr(tc, "id") else (tc.function.name if hasattr(tc, "function") else None),
name=tc.function.name if hasattr(tc, "function") else getattr(tc, "name", ""),
arguments=tc.function.arguments if hasattr(tc, "function") else getattr(tc, "arguments", "{}"),
provider_data=provider_data or None,
))
# Extract reasoning items for provider_data
provider_data = {}
if msg and hasattr(msg, "codex_reasoning_items") and msg.codex_reasoning_items:
provider_data["codex_reasoning_items"] = msg.codex_reasoning_items
if msg and hasattr(msg, "reasoning_details") and msg.reasoning_details:
provider_data["reasoning_details"] = msg.reasoning_details
return NormalizedResponse(
content=msg.content if msg else None,
tool_calls=tool_calls,
finish_reason=finish_reason or "stop",
reasoning=msg.reasoning if msg and hasattr(msg, "reasoning") else None,
usage=None, # Codex usage is extracted separately in normalize_usage()
provider_data=provider_data or None,
)
def validate_response(self, response: Any) -> bool:
"""Check Codex Responses API response has valid output structure.
Returns True only if response.output is a non-empty list.
Does NOT check output_text fallback — the caller handles that
with diagnostic logging for stream backfill recovery.
"""
if response is None:
return False
output = getattr(response, "output", None)
if not isinstance(output, list) or not output:
return False
return True
def preflight_kwargs(self, api_kwargs: Any, *, allow_stream: bool = False) -> dict:
"""Validate and sanitize Codex API kwargs before the call.
Normalizes input items, strips unsupported fields, validates structure.
"""
from agent.codex_responses_adapter import _preflight_codex_api_kwargs
return _preflight_codex_api_kwargs(api_kwargs, allow_stream=allow_stream)
    def map_finish_reason(self, raw_reason: str) -> str:
        """Map Codex response.status to OpenAI finish_reason.

        Codex uses response.status ('completed', 'incomplete') +
        response.incomplete_details.reason for granular mapping.
        This method handles the simple status string; the caller
        should check incomplete_details separately for 'max_output_tokens'.
        """
        _MAP = {
            "completed": "stop",
            "incomplete": "length",
            "failed": "stop",
            "cancelled": "stop",
        }
        return _MAP.get(raw_reason, "stop")
# Auto-register on import
from agent.transports import register_transport # noqa: E402
register_transport("codex_responses", ResponsesApiTransport)
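The reasoning-effort resolution inside `ResponsesApiTransport.build_kwargs` can be read in isolation: default to enabled with `medium` effort, let `enabled: False` win over any effort value, and clamp `minimal` to `low`. A minimal sketch under those rules, using a hypothetical standalone function `resolve_reasoning`:

```python
from typing import Any, Dict, Optional, Tuple

_EFFORT_CLAMP = {"minimal": "low"}


def resolve_reasoning(reasoning_config: Optional[Dict[str, Any]]) -> Tuple[bool, str]:
    """Return (enabled, effort) following the branch order in build_kwargs."""
    enabled, effort = True, "medium"
    if reasoning_config and isinstance(reasoning_config, dict):
        if reasoning_config.get("enabled") is False:
            enabled = False  # effort is ignored once reasoning is disabled
        elif reasoning_config.get("effort"):
            effort = reasoning_config["effort"]
    return enabled, _EFFORT_CLAMP.get(effort, effort)


print(resolve_reasoning(None))
print(resolve_reasoning({"effort": "minimal"}))
print(resolve_reasoning({"enabled": False, "effort": "high"}))
```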
+33 -1
@@ -914,6 +914,32 @@ def _cleanup_worktree(info: Dict[str, str] = None) -> None:
print(f"\033[32m✓ Worktree cleaned up: {wt_path}\033[0m")
def _run_state_db_auto_maintenance(session_db) -> None:
    """Call ``SessionDB.maybe_auto_prune_and_vacuum`` using current config.

    Reads the ``sessions:`` section from config.yaml via
    :func:`hermes_cli.config.load_config` (the authoritative loader that
    deep-merges DEFAULT_CONFIG, so unmigrated configs still get default
    values). Honours ``auto_prune`` / ``retention_days`` /
    ``vacuum_after_prune`` / ``min_interval_hours``, and delegates to the
    DB. Never raises: maintenance must never block interactive startup.
    """
    if session_db is None:
        return
    try:
        from hermes_cli.config import load_config as _load_full_config
        cfg = (_load_full_config().get("sessions") or {})
        if not cfg.get("auto_prune", False):
            return
        session_db.maybe_auto_prune_and_vacuum(
            retention_days=int(cfg.get("retention_days", 90)),
            min_interval_hours=int(cfg.get("min_interval_hours", 24)),
            vacuum=bool(cfg.get("vacuum_after_prune", True)),
        )
    except Exception as exc:
        logger.debug("state.db auto-maintenance skipped: %s", exc)
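The body of `maybe_auto_prune_and_vacuum` is not part of this diff. As a rough illustration of the "at most once per `min_interval_hours`" gate it is described as implementing, here is a sketch that stores a last-run timestamp in a key/value table. The table layout, the key name `last_maintenance`, and the function `maybe_run` are all assumptions, not the project's schema.

```python
import sqlite3
import time
from typing import Optional


def maybe_run(conn: sqlite3.Connection, min_interval_hours: int,
              now: Optional[float] = None) -> bool:
    """Run maintenance only if the last recorded run is old enough."""
    now = time.time() if now is None else now
    conn.execute(
        "CREATE TABLE IF NOT EXISTS state_meta (key TEXT PRIMARY KEY, value TEXT)"
    )
    row = conn.execute(
        "SELECT value FROM state_meta WHERE key = 'last_maintenance'"
    ).fetchone()
    if row is not None and now - float(row[0]) < min_interval_hours * 3600:
        return False  # ran recently; skip
    # Pruning and VACUUM would happen here in a real implementation.
    conn.execute(
        "INSERT OR REPLACE INTO state_meta (key, value) VALUES ('last_maintenance', ?)",
        (str(now),),
    )
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
print(maybe_run(conn, 24, now=1000.0))
print(maybe_run(conn, 24, now=1000.0 + 3600))
```

Keeping the timestamp in the database itself, rather than in process memory, is what lets several processes that share one state file agree on when maintenance last ran.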
def _prune_stale_worktrees(repo_root: str, max_age_hours: int = 24) -> None:
"""Remove stale worktrees and orphaned branches on startup.
@@ -1961,7 +1987,13 @@ class HermesCLI:
self._session_db = SessionDB()
except Exception as e:
logger.warning("Failed to initialize SessionDB — session will NOT be indexed for search: %s", e)
# Opportunistic state.db maintenance — runs at most once per
# min_interval_hours, tracked via state_meta in state.db itself so
# it's shared across all Hermes processes for this HERMES_HOME.
# Never blocks startup on failure.
_run_state_db_auto_maintenance(self._session_db)
# Deferred title: stored in memory until the session is created in the DB
self._pending_title: Optional[str] = None
+2
@@ -616,6 +616,8 @@ def load_gateway_config() -> GatewayConfig:
if isinstance(frc, list):
frc = ",".join(str(v) for v in frc)
os.environ["SLACK_FREE_RESPONSE_CHANNELS"] = str(frc)
if "reactions" in slack_cfg and not os.getenv("SLACK_REACTIONS"):
os.environ["SLACK_REACTIONS"] = str(slack_cfg["reactions"]).lower()
# Discord settings → env vars (env vars take precedence)
discord_cfg = yaml_cfg.get("discord", {})
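The two added lines bridge the YAML `reactions` setting into `SLACK_REACTIONS` only when the environment variable is not already set, so the environment wins. A minimal sketch of that precedence rule, written against a plain mapping so it can be exercised without touching `os.environ` (the function name `bridge_reactions_setting` is hypothetical):

```python
from typing import Any, Dict, MutableMapping


def bridge_reactions_setting(
    slack_cfg: Dict[str, Any], env: MutableMapping[str, str]
) -> None:
    """Copy slack.reactions into env unless the variable is already set."""
    if "reactions" in slack_cfg and not env.get("SLACK_REACTIONS"):
        # YAML booleans become the strings "true" / "false".
        env["SLACK_REACTIONS"] = str(slack_cfg["reactions"]).lower()


env_unset: Dict[str, str] = {}
bridge_reactions_setting({"reactions": False}, env_unset)
print(env_unset)

env_preset = {"SLACK_REACTIONS": "true"}
bridge_reactions_setting({"reactions": False}, env_preset)
print(env_preset)
```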
+2 -4
@@ -26,9 +26,8 @@ from .adapter import ( # noqa: F401
# -- Onboard (QR-code scan-to-configure) -----------------------------------
from .onboard import ( # noqa: F401
BindStatus,
create_bind_task,
poll_bind_result,
build_connect_url,
qr_register,
)
from .crypto import decrypt_secret, generate_bind_key # noqa: F401
@@ -44,9 +43,8 @@ __all__ = [
"_ssrf_redirect_guard",
# onboard
"BindStatus",
"create_bind_task",
"poll_bind_result",
"build_connect_url",
"qr_register",
# crypto
"decrypt_secret",
"generate_bind_key",
+117 -21
@@ -1,6 +1,10 @@
"""
QQBot scan-to-configure (QR code onboard) module.
Mirrors the Feishu onboarding pattern: synchronous HTTP + a single public
entry-point ``qr_register()`` that handles the full flow (create task →
display QR code → poll → decrypt credentials).
Calls the ``q.qq.com`` ``create_bind_task`` / ``poll_bind_result`` APIs to
generate a QR-code URL and poll for scan completion. On success the caller
receives the bot's *app_id*, *client_secret* (decrypted locally), and the
@@ -12,18 +16,20 @@ Reference: https://bot.q.qq.com/wiki/develop/api-v2/
from __future__ import annotations
import logging
import time
from enum import IntEnum
from typing import Tuple
from typing import Optional, Tuple
from urllib.parse import quote
from .constants import (
ONBOARD_API_TIMEOUT,
ONBOARD_CREATE_PATH,
ONBOARD_POLL_INTERVAL,
ONBOARD_POLL_PATH,
PORTAL_HOST,
QR_URL_TEMPLATE,
)
from .crypto import generate_bind_key
from .crypto import decrypt_secret, generate_bind_key
from .utils import get_api_headers
logger = logging.getLogger(__name__)
@@ -35,7 +41,7 @@ logger = logging.getLogger(__name__)
class BindStatus(IntEnum):
"""Status codes returned by ``poll_bind_result``."""
"""Status codes returned by ``_poll_bind_result``."""
NONE = 0
PENDING = 1
@@ -44,18 +50,40 @@ class BindStatus(IntEnum):
# ---------------------------------------------------------------------------
# Public API
# QR rendering
# ---------------------------------------------------------------------------
try:
    import qrcode as _qrcode_mod
except (ImportError, TypeError):
    _qrcode_mod = None  # type: ignore[assignment]


def _render_qr(url: str) -> bool:
    """Try to render a QR code in the terminal. Returns True if successful."""
    if _qrcode_mod is None:
        return False
    try:
        qr = _qrcode_mod.QRCode(
            error_correction=_qrcode_mod.constants.ERROR_CORRECT_M,
            border=2,
        )
        qr.add_data(url)
        qr.make(fit=True)
        qr.print_ascii(invert=True)
        return True
    except Exception:
        return False
# ---------------------------------------------------------------------------
# Synchronous HTTP helpers (mirrors Feishu _post_registration pattern)
# ---------------------------------------------------------------------------
async def create_bind_task(
timeout: float = ONBOARD_API_TIMEOUT,
) -> Tuple[str, str]:
def _create_bind_task(timeout: float = ONBOARD_API_TIMEOUT) -> Tuple[str, str]:
"""Create a bind task and return *(task_id, aes_key_base64)*.
The AES key is generated locally and sent to the server so it can
encrypt the bot credentials before returning them.
Raises:
RuntimeError: If the API returns a non-zero ``retcode``.
"""
@@ -64,8 +92,8 @@ async def create_bind_task(
url = f"https://{PORTAL_HOST}{ONBOARD_CREATE_PATH}"
key = generate_bind_key()
async with httpx.AsyncClient(timeout=timeout, follow_redirects=True) as client:
resp = await client.post(url, json={"key": key}, headers=get_api_headers())
with httpx.Client(timeout=timeout, follow_redirects=True) as client:
resp = client.post(url, json={"key": key}, headers=get_api_headers())
resp.raise_for_status()
data = resp.json()
@@ -80,7 +108,7 @@ async def create_bind_task(
return task_id, key
async def poll_bind_result(
def _poll_bind_result(
task_id: str,
timeout: float = ONBOARD_API_TIMEOUT,
) -> Tuple[BindStatus, str, str, str]:
@@ -89,12 +117,6 @@ async def poll_bind_result(
Returns:
A 4-tuple of ``(status, bot_appid, bot_encrypt_secret, user_openid)``.
* ``bot_encrypt_secret`` is AES-256-GCM encrypted — decrypt it with
:func:`~gateway.platforms.qqbot.crypto.decrypt_secret` using the
key from :func:`create_bind_task`.
* ``user_openid`` is the OpenID of the person who scanned the code
(available when ``status == COMPLETED``).
Raises:
RuntimeError: If the API returns a non-zero ``retcode``.
"""
@@ -102,8 +124,8 @@ async def poll_bind_result(
url = f"https://{PORTAL_HOST}{ONBOARD_POLL_PATH}"
async with httpx.AsyncClient(timeout=timeout, follow_redirects=True) as client:
resp = await client.post(url, json={"task_id": task_id}, headers=get_api_headers())
with httpx.Client(timeout=timeout, follow_redirects=True) as client:
resp = client.post(url, json={"task_id": task_id}, headers=get_api_headers())
resp.raise_for_status()
data = resp.json()
@@ -122,3 +144,77 @@ async def poll_bind_result(
def build_connect_url(task_id: str) -> str:
"""Build the QR-code target URL for a given *task_id*."""
return QR_URL_TEMPLATE.format(task_id=quote(task_id))
# ---------------------------------------------------------------------------
# Public entry-point
# ---------------------------------------------------------------------------
_MAX_REFRESHES = 3
def qr_register(timeout_seconds: int = 600) -> Optional[dict]:
"""Run the QQBot scan-to-configure QR registration flow.
Mirrors ``feishu.qr_register()``: handles create → display → poll →
decrypt in one call. Unexpected errors propagate to the caller.
:returns:
``{"app_id": ..., "client_secret": ..., "user_openid": ...}`` on
success, or ``None`` on failure / expiry / cancellation.
"""
deadline = time.monotonic() + timeout_seconds
for refresh_count in range(_MAX_REFRESHES + 1):
# ── Create bind task ──
try:
task_id, aes_key = _create_bind_task()
except Exception as exc:
logger.warning("[QQBot onboard] Failed to create bind task: %s", exc)
return None
url = build_connect_url(task_id)
# ── Display QR code + URL ──
print()
if _render_qr(url):
print(f" Scan the QR code above, or open this URL directly:\n {url}")
else:
print(f" Open this URL in QQ on your phone:\n {url}")
print(" Tip: pip install qrcode to display a scannable QR code here")
print()
# ── Poll loop ──
while time.monotonic() < deadline:
try:
status, app_id, encrypted_secret, user_openid = _poll_bind_result(task_id)
except Exception:
time.sleep(ONBOARD_POLL_INTERVAL)
continue
if status == BindStatus.COMPLETED:
client_secret = decrypt_secret(encrypted_secret, aes_key)
print()
print(f" QR scan complete! (App ID: {app_id})")
if user_openid:
print(f" Scanner's OpenID: {user_openid}")
return {
"app_id": app_id,
"client_secret": client_secret,
"user_openid": user_openid,
}
if status == BindStatus.EXPIRED:
if refresh_count >= _MAX_REFRESHES:
logger.warning("[QQBot onboard] QR code expired %d times — giving up", _MAX_REFRESHES)
return None
print(f"\n QR code expired, refreshing... ({refresh_count + 1}/{_MAX_REFRESHES})")
break # next for-loop iteration creates a new task
time.sleep(ONBOARD_POLL_INTERVAL)
else:
# deadline reached without completing
logger.warning("[QQBot onboard] Poll timed out after %ds", timeout_seconds)
return None
return None
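The control flow of `qr_register()` (one overall deadline, an inner poll loop, and a bounded number of QR refreshes on expiry) can be sketched with the HTTP calls replaced by injected callables. Everything below is illustrative: `Status`, `poll_with_refresh`, and the fake `create`/`poll` functions are hypothetical stand-ins, and decryption and printing are left out.

```python
import time
from enum import Enum, auto
from typing import Callable, List, Optional, Tuple


class Status(Enum):  # hypothetical stand-in for BindStatus
    PENDING = auto()
    COMPLETED = auto()
    EXPIRED = auto()


def poll_with_refresh(
    create: Callable[[], str],
    poll: Callable[[str], Tuple[Status, Optional[str]]],
    *,
    timeout_seconds: float,
    poll_interval: float,
    max_refreshes: int = 3,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    deadline = clock() + timeout_seconds
    for refresh_count in range(max_refreshes + 1):
        task_id = create()
        while clock() < deadline:
            status, payload = poll(task_id)
            if status is Status.COMPLETED:
                return payload
            if status is Status.EXPIRED:
                if refresh_count >= max_refreshes:
                    return None
                break  # outer loop creates a fresh task
            sleep(poll_interval)
        else:
            return None  # deadline reached without a break
    return None


tasks: List[str] = []


def fake_create() -> str:
    tasks.append(f"task-{len(tasks) + 1}")
    return tasks[-1]


def fake_poll(task_id: str) -> Tuple[Status, Optional[str]]:
    if task_id == "task-1":
        return Status.EXPIRED, None  # first code expires, forcing one refresh
    return Status.COMPLETED, "credentials-for-" + task_id


result = poll_with_refresh(
    fake_create, fake_poll, timeout_seconds=5, poll_interval=0, sleep=lambda _s: None
)
print(result)
```

The `while ... else` form is what separates "the deadline ran out" (the `else` branch) from "the code expired and should be refreshed" (the `break`).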
+57 -7
@@ -38,6 +38,7 @@ from gateway.platforms.base import (
BasePlatformAdapter,
MessageEvent,
MessageType,
ProcessingOutcome,
SendResult,
SUPPORTED_DOCUMENT_TYPES,
safe_url_for_log,
@@ -113,6 +114,11 @@ class SlackAdapter(BasePlatformAdapter):
# Cache for _fetch_thread_context results: cache_key → _ThreadContextCache
self._thread_context_cache: Dict[str, _ThreadContextCache] = {}
self._THREAD_CACHE_TTL = 60.0
# Track message IDs that should get reaction lifecycle (DMs / @mentions).
self._reacting_message_ids: set = set()
# Track active assistant thread status indicators so stop_typing can
# clear them (chat_id → thread_ts).
self._active_status_threads: Dict[str, str] = {}
async def connect(self) -> bool:
"""Connect to Slack via Socket Mode."""
@@ -362,6 +368,7 @@ class SlackAdapter(BasePlatformAdapter):
if not thread_ts:
return # Can only set status in a thread context
self._active_status_threads[chat_id] = thread_ts
try:
await self._get_client(chat_id).assistant_threads_setStatus(
channel_id=chat_id,
@@ -373,6 +380,22 @@ class SlackAdapter(BasePlatformAdapter):
# in an assistant-enabled context. Falls back to reactions.
logger.debug("[Slack] assistant.threads.setStatus failed: %s", e)
async def stop_typing(self, chat_id: str) -> None:
"""Clear the assistant thread status indicator."""
if not self._app:
return
thread_ts = self._active_status_threads.pop(chat_id, None)
if not thread_ts:
return
try:
await self._get_client(chat_id).assistant_threads_setStatus(
channel_id=chat_id,
thread_ts=thread_ts,
status="",
)
except Exception as e:
logger.debug("[Slack] assistant.threads.setStatus clear failed: %s", e)
def _dm_top_level_threads_as_sessions(self) -> bool:
"""Whether top-level Slack DMs get per-message session threads.
@@ -584,6 +607,38 @@ class SlackAdapter(BasePlatformAdapter):
logger.debug("[Slack] reactions.remove failed (%s): %s", emoji, e)
return False
    def _reactions_enabled(self) -> bool:
        """Check if message reactions are enabled via config/env."""
        return os.getenv("SLACK_REACTIONS", "true").lower() not in ("false", "0", "no")

    async def on_processing_start(self, event: MessageEvent) -> None:
        """Add an in-progress reaction when message processing begins."""
        if not self._reactions_enabled():
            return
        ts = getattr(event, "message_id", None)
        if not ts or ts not in self._reacting_message_ids:
            return
        channel_id = getattr(event.source, "chat_id", None)
        if channel_id:
            await self._add_reaction(channel_id, ts, "eyes")

    async def on_processing_complete(self, event: MessageEvent, outcome: ProcessingOutcome) -> None:
        """Swap the in-progress reaction for a final success/failure reaction."""
        if not self._reactions_enabled():
            return
        ts = getattr(event, "message_id", None)
        if not ts or ts not in self._reacting_message_ids:
            return
        self._reacting_message_ids.discard(ts)
        channel_id = getattr(event.source, "chat_id", None)
        if not channel_id:
            return
        await self._remove_reaction(channel_id, ts, "eyes")
        if outcome == ProcessingOutcome.SUCCESS:
            await self._add_reaction(channel_id, ts, "white_check_mark")
        elif outcome == ProcessingOutcome.FAILURE:
            await self._add_reaction(channel_id, ts, "x")
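The reaction lifecycle above only acts on message IDs that were explicitly registered, then swaps `eyes` for a final emoji. A minimal sketch of that state machine, with Slack calls replaced by an in-memory log; `Outcome` and `FakeReactor` are hypothetical stand-ins for `ProcessingOutcome` and the adapter.

```python
import asyncio
from enum import Enum
from typing import List, Set, Tuple


class Outcome(Enum):  # hypothetical stand-in for ProcessingOutcome
    SUCCESS = "success"
    FAILURE = "failure"


class FakeReactor:
    def __init__(self) -> None:
        self.tracked: Set[str] = set()
        self.log: List[Tuple[str, str, str]] = []

    async def on_start(self, ts: str) -> None:
        if ts in self.tracked:
            self.log.append(("add", ts, "eyes"))

    async def on_complete(self, ts: str, outcome: Outcome) -> None:
        if ts not in self.tracked:
            return  # message was never registered for reactions
        self.tracked.discard(ts)
        self.log.append(("remove", ts, "eyes"))
        if outcome is Outcome.SUCCESS:
            self.log.append(("add", ts, "white_check_mark"))
        elif outcome is Outcome.FAILURE:
            self.log.append(("add", ts, "x"))


async def demo() -> List[Tuple[str, str, str]]:
    reactor = FakeReactor()
    reactor.tracked.add("m1")
    await reactor.on_start("m1")
    await reactor.on_start("m2")  # untracked: ignored
    await reactor.on_complete("m1", Outcome.FAILURE)
    await reactor.on_complete("m1", Outcome.SUCCESS)  # already finished: ignored
    return reactor.log


print(asyncio.run(demo()))
```

Discarding the ID on completion means a second completion for the same message does nothing, which keeps the final emoji from being added twice.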
# ----- User identity resolution -----
async def _resolve_user_name(self, user_id: str, chat_id: str = "") -> str:
@@ -1213,17 +1268,12 @@ class SlackAdapter(BasePlatformAdapter):
# Only react when bot is directly addressed (DM or @mention).
# In listen-all channels (require_mention=false), reacting to every
# casual message would be noisy.
_should_react = is_dm or is_mentioned
_should_react = (is_dm or is_mentioned) and self._reactions_enabled()
if _should_react:
await self._add_reaction(channel_id, ts, "eyes")
self._reacting_message_ids.add(ts)
await self.handle_message(msg_event)
if _should_react:
await self._remove_reaction(channel_id, ts, "eyes")
await self._add_reaction(channel_id, ts, "white_check_mark")
# ----- Approval button support (Block Kit) -----
async def send_exec_approval(
+131
@@ -1464,3 +1464,134 @@ class WeComAdapter(BasePlatformAdapter):
"name": chat_id,
"type": "group" if chat_id and chat_id.lower().startswith("group") else "dm",
}
# ------------------------------------------------------------------
# QR code scan flow for obtaining bot credentials
# ------------------------------------------------------------------
_QR_GENERATE_URL = "https://work.weixin.qq.com/ai/qc/generate"
_QR_QUERY_URL = "https://work.weixin.qq.com/ai/qc/query_result"
_QR_CODE_PAGE = "https://work.weixin.qq.com/ai/qc/gen?source=hermes&scode="
_QR_POLL_INTERVAL = 3 # seconds
_QR_POLL_TIMEOUT = 300 # 5 minutes
def qr_scan_for_bot_info(
*,
timeout_seconds: int = _QR_POLL_TIMEOUT,
) -> Optional[Dict[str, str]]:
"""Run the WeCom QR scan flow to obtain bot_id and secret.
Fetches a QR code from WeCom, renders it in the terminal, and polls
until the user scans it or the timeout expires.
Returns ``{"bot_id": ..., "secret": ...}`` on success, ``None`` on
failure or timeout.
Note: the ``work.weixin.qq.com/ai/qc/{generate,query_result}`` endpoints
used here are not part of WeCom's public developer API — they back the
admin-console web UI's bot-creation flow and may change without notice.
The same pattern is used by the feishu/dingtalk QR setup wizards.
"""
try:
import urllib.request
import urllib.parse
except ImportError: # pragma: no cover
logger.error("urllib is required for WeCom QR scan")
return None
generate_url = f"{_QR_GENERATE_URL}?source=hermes"
# ── Step 1: Fetch QR code ──
print(" Connecting to WeCom...", end="", flush=True)
try:
req = urllib.request.Request(generate_url, headers={"User-Agent": "HermesAgent/1.0"})
with urllib.request.urlopen(req, timeout=15) as resp:
raw = json.loads(resp.read().decode("utf-8"))
except Exception as exc:
logger.error("WeCom QR: failed to fetch QR code: %s", exc)
print(f" failed: {exc}")
return None
data = raw.get("data") or {}
scode = str(data.get("scode") or "").strip()
auth_url = str(data.get("auth_url") or "").strip()
if not scode or not auth_url:
logger.error("WeCom QR: unexpected response format: %s", raw)
print(" failed: unexpected response format")
return None
print(" done.")
# ── Step 2: Render QR code in terminal ──
print()
qr_rendered = False
try:
import qrcode as _qrcode
qr = _qrcode.QRCode()
qr.add_data(auth_url)
qr.make(fit=True)
qr.print_ascii(invert=True)
qr_rendered = True
except ImportError:
pass
except Exception:
pass
page_url = f"{_QR_CODE_PAGE}{urllib.parse.quote(scode)}"
if qr_rendered:
print(f"\n Scan the QR code above, or open this URL directly:\n {page_url}")
else:
print(f" Open this URL in WeCom on your phone:\n\n {page_url}\n")
print(" Tip: pip install qrcode to display a scannable QR code here next time")
print()
print(" Fetching configuration results...", end="", flush=True)
# ── Step 3: Poll for result ──
import time
deadline = time.time() + timeout_seconds
query_url = f"{_QR_QUERY_URL}?scode={urllib.parse.quote(scode)}"
poll_count = 0
while time.time() < deadline:
try:
req = urllib.request.Request(query_url, headers={"User-Agent": "HermesAgent/1.0"})
with urllib.request.urlopen(req, timeout=10) as resp:
result = json.loads(resp.read().decode("utf-8"))
except Exception as exc:
logger.debug("WeCom QR poll error: %s", exc)
time.sleep(_QR_POLL_INTERVAL)
continue
poll_count += 1
# Print a dot on every poll so progress is visible within 3s.
print(".", end="", flush=True)
result_data = result.get("data") or {}
status = str(result_data.get("status") or "").lower()
if status == "success":
print() # newline after "Fetching configuration results..." dots
bot_info = result_data.get("bot_info") or {}
bot_id = str(bot_info.get("botid") or bot_info.get("bot_id") or "").strip()
secret = str(bot_info.get("secret") or "").strip()
if bot_id and secret:
return {"bot_id": bot_id, "secret": secret}
logger.warning(
"WeCom QR: scan reported success but bot_info missing or incomplete: %s",
result_data,
)
print(
" QR scan reported success but no bot credentials were returned.\n"
" This usually means the bot was not actually created on the WeCom side.\n"
" Falling back to manual credential entry."
)
return None
time.sleep(_QR_POLL_INTERVAL)
print() # newline after dots
print(f" QR scan timed out ({timeout_seconds // 60} minutes). Please try again.")
return None
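The success branch of the poll loop accepts either `botid` or `bot_id` and treats a "success" status without both credentials as a failure. That parsing step can be sketched as a pure function; `extract_bot_credentials` is a hypothetical name, and unlike the loop above it returns `None` for a still-pending status as well, so a caller would need to check the status separately to decide whether to keep polling.

```python
from typing import Any, Dict, Optional


def extract_bot_credentials(result: Dict[str, Any]) -> Optional[Dict[str, str]]:
    """Pull bot_id/secret out of a query_result-style payload, if complete."""
    data = result.get("data") or {}
    if str(data.get("status") or "").lower() != "success":
        return None
    bot_info = data.get("bot_info") or {}
    bot_id = str(bot_info.get("botid") or bot_info.get("bot_id") or "").strip()
    secret = str(bot_info.get("secret") or "").strip()
    if bot_id and secret:
        return {"bot_id": bot_id, "secret": secret}
    return None


sample = {"data": {"status": "SUCCESS", "bot_info": {"botid": " b-123 ", "secret": "s3cr3t"}}}
print(extract_bot_credentials(sample))
print(extract_bot_credentials({"data": {"status": "success", "bot_info": {}}}))
```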
+36 -2
@@ -710,7 +710,26 @@ class GatewayRunner:
self._session_db = SessionDB()
except Exception as e:
logger.debug("SQLite session store not available: %s", e)
# Opportunistic state.db maintenance: prune ended sessions older
# than sessions.retention_days + optional VACUUM. Tracks last-run
# in state_meta so it only actually executes once per
# sessions.min_interval_hours. Gateway is long-lived so blocking
# a few seconds once per day is acceptable; failures are logged
# but never raised.
if self._session_db is not None:
try:
from hermes_cli.config import load_config as _load_full_config
_sess_cfg = (_load_full_config().get("sessions") or {})
if _sess_cfg.get("auto_prune", False):
self._session_db.maybe_auto_prune_and_vacuum(
retention_days=int(_sess_cfg.get("retention_days", 90)),
min_interval_hours=int(_sess_cfg.get("min_interval_hours", 24)),
vacuum=bool(_sess_cfg.get("vacuum_after_prune", True)),
)
except Exception as exc:
logger.debug("state.db auto-maintenance skipped: %s", exc)
# DM pairing store for code-based user authorization
from gateway.pairing import PairingStore
self.pairing_store = PairingStore()
@@ -5671,6 +5690,7 @@ class GatewayRunner:
from hermes_cli.models import (
list_available_providers,
normalize_provider,
provider_for_base_url,
_PROVIDER_LABELS,
)
@@ -5699,7 +5719,10 @@ class GatewayRunner:
# Detect custom endpoint from config base_url
if current_provider == "openrouter":
_cfg_base = model_cfg.get("base_url", "") if isinstance(model_cfg, dict) else ""
if _cfg_base and "openrouter.ai" not in _cfg_base:
inferred_provider = provider_for_base_url(_cfg_base)
if inferred_provider:
current_provider = inferred_provider
elif _cfg_base and "openrouter.ai" not in _cfg_base:
current_provider = "custom"
current_label = _PROVIDER_LABELS.get(current_provider, current_provider)
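The endpoint detection in this hunk can be sketched as a host lookup that falls back to "custom". This is a minimal illustration under stated assumptions, not the actual `provider_for_base_url()`: the `infer_provider` name and the three-entry table are hypothetical, with hosts taken from base URLs that appear elsewhere in this diff.

```python
from urllib.parse import urlparse

# Hypothetical, abbreviated table. The real mapping lives in
# model_metadata.py (_URL_TO_PROVIDER) and is not reproduced here.
_HOST_TO_PROVIDER = {
    "openrouter.ai": "openrouter",
    "api.moonshot.ai": "kimi-coding",
    "api.stepfun.ai": "stepfun",
}


def infer_provider(base_url: str, current: str = "openrouter") -> str:
    """Return a known provider for base_url, else "custom", else current."""
    host = (urlparse(base_url).hostname or "").lower()
    for known_host, provider in _HOST_TO_PROVIDER.items():
        # Match the host itself or any subdomain of it.
        if host == known_host or host.endswith("." + known_host):
            return provider
    # A non-empty base URL that matches nothing known is a custom endpoint.
    return "custom" if base_url else current


print(infer_provider("https://api.moonshot.ai/v1"))
print(infer_provider("http://localhost:8000/v1"))
```

Matching on the parsed hostname rather than a substring of the whole URL avoids false positives when a known host name appears in a path or query string.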
@@ -6456,6 +6479,11 @@ class GatewayRunner:
session_id=task_id,
platform=platform_key,
user_id=source.user_id,
user_name=source.user_name,
chat_id=source.chat_id,
chat_name=source.chat_name,
chat_type=source.chat_type,
thread_id=source.thread_id,
session_db=self._session_db,
fallback_model=self._fallback_model,
)
@@ -7216,6 +7244,7 @@ class GatewayRunner:
tool_calls=msg.get("tool_calls"),
tool_call_id=msg.get("tool_call_id"),
reasoning=msg.get("reasoning"),
reasoning_content=msg.get("reasoning_content"),
)
except Exception:
pass # Best-effort copy
@@ -9698,6 +9727,11 @@ class GatewayRunner:
session_id=session_id,
platform=platform_key,
user_id=source.user_id,
user_name=source.user_name,
chat_id=source.chat_id,
chat_name=source.chat_name,
chat_type=source.chat_type,
thread_id=source.thread_id,
gateway_session_key=session_key,
session_db=self._session_db,
fallback_model=self._fallback_model,
+5
@@ -1147,6 +1147,10 @@ class SessionStore:
tool_name=message.get("tool_name"),
tool_calls=message.get("tool_calls"),
tool_call_id=message.get("tool_call_id"),
reasoning=message.get("reasoning") if message.get("role") == "assistant" else None,
reasoning_content=message.get("reasoning_content") if message.get("role") == "assistant" else None,
reasoning_details=message.get("reasoning_details") if message.get("role") == "assistant" else None,
codex_reasoning_items=message.get("codex_reasoning_items") if message.get("role") == "assistant" else None,
)
except Exception as e:
logger.debug("Session DB operation failed: %s", e)
@@ -1176,6 +1180,7 @@ class SessionStore:
tool_calls=msg.get("tool_calls"),
tool_call_id=msg.get("tool_call_id"),
reasoning=msg.get("reasoning") if role == "assistant" else None,
reasoning_content=msg.get("reasoning_content") if role == "assistant" else None,
reasoning_details=msg.get("reasoning_details") if role == "assistant" else None,
codex_reasoning_items=msg.get("codex_reasoning_items") if role == "assistant" else None,
)
+75 -8
@@ -39,6 +39,13 @@ import httpx
import yaml
from hermes_cli.config import get_hermes_home, get_config_path, read_raw_config
from hermes_cli.volcengine_byteplus import (
VOLCENGINE_PROVIDER,
BYTEPLUS_PROVIDER,
VOLCENGINE_STANDARD_BASE_URL,
BYTEPLUS_STANDARD_BASE_URL,
base_url_for_provider_model,
)
from hermes_constants import OPENROUTER_BASE_URL
logger = logging.getLogger(__name__)
@@ -72,6 +79,8 @@ DEFAULT_QWEN_BASE_URL = "https://portal.qwen.ai/v1"
DEFAULT_GITHUB_MODELS_BASE_URL = "https://api.githubcopilot.com"
DEFAULT_COPILOT_ACP_BASE_URL = "acp://copilot"
DEFAULT_OLLAMA_CLOUD_BASE_URL = "https://ollama.com/v1"
STEPFUN_STEP_PLAN_INTL_BASE_URL = "https://api.stepfun.ai/step_plan/v1"
STEPFUN_STEP_PLAN_CN_BASE_URL = "https://api.stepfun.com/step_plan/v1"
CODEX_OAUTH_CLIENT_ID = "app_EMoamEEZ73f0CkXaXp7hrann"
CODEX_OAUTH_TOKEN_URL = "https://auth.openai.com/oauth/token"
CODEX_ACCESS_TOKEN_REFRESH_SKEW_SECONDS = 120
@@ -168,8 +177,11 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
id="kimi-coding",
name="Kimi / Moonshot",
auth_type="api_key",
# Legacy platform.moonshot.ai keys use this endpoint (OpenAI-compat).
# sk-kimi- (Kimi Code) keys are auto-redirected to api.kimi.com/coding
# by _resolve_kimi_base_url() below.
inference_base_url="https://api.moonshot.ai/v1",
api_key_env_vars=("KIMI_API_KEY",),
api_key_env_vars=("KIMI_API_KEY", "KIMI_CODING_API_KEY"),
base_url_env_var="KIMI_BASE_URL",
),
"kimi-coding-cn": ProviderConfig(
@@ -179,6 +191,14 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
inference_base_url="https://api.moonshot.cn/v1",
api_key_env_vars=("KIMI_CN_API_KEY",),
),
"stepfun": ProviderConfig(
id="stepfun",
name="StepFun Step Plan",
auth_type="api_key",
inference_base_url=STEPFUN_STEP_PLAN_INTL_BASE_URL,
api_key_env_vars=("STEPFUN_API_KEY",),
base_url_env_var="STEPFUN_BASE_URL",
),
"arcee": ProviderConfig(
id="arcee",
name="Arcee AI",
@@ -294,6 +314,20 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
api_key_env_vars=("XIAOMI_API_KEY",),
base_url_env_var="XIAOMI_BASE_URL",
),
"volcengine": ProviderConfig(
id=VOLCENGINE_PROVIDER,
name="Volcengine",
auth_type="api_key",
inference_base_url=VOLCENGINE_STANDARD_BASE_URL,
api_key_env_vars=("VOLCENGINE_API_KEY",),
),
"byteplus": ProviderConfig(
id=BYTEPLUS_PROVIDER,
name="BytePlus",
auth_type="api_key",
inference_base_url=BYTEPLUS_STANDARD_BASE_URL,
api_key_env_vars=("BYTEPLUS_API_KEY",),
),
"ollama-cloud": ProviderConfig(
id="ollama-cloud",
name="Ollama Cloud",
@@ -340,10 +374,16 @@ def get_anthropic_key() -> str:
# =============================================================================
# Kimi Code (kimi.com/code) issues keys prefixed "sk-kimi-" that only work
# on api.kimi.com/coding/v1. Legacy keys from platform.moonshot.ai work on
# api.moonshot.ai/v1 (the default). Auto-detect when user hasn't set
# on api.kimi.com/coding. Legacy keys from platform.moonshot.ai work on
# api.moonshot.ai/v1 (the old default). Auto-detect when user hasn't set
# KIMI_BASE_URL explicitly.
KIMI_CODE_BASE_URL = "https://api.kimi.com/coding/v1"
#
# Note: the base URL intentionally has NO /v1 suffix. The /coding endpoint
# speaks the Anthropic Messages protocol, and the anthropic SDK appends
# "/v1/messages" internally — so "/coding" + SDK suffix → "/coding/v1/messages"
# (the correct target). Using "/coding/v1" here would produce
# "/coding/v1/v1/messages" (a 404).
KIMI_CODE_BASE_URL = "https://api.kimi.com/coding"
def _resolve_kimi_base_url(api_key: str, default_url: str, env_override: str) -> str:
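The suffix arithmetic described in the comment above can be checked with a short sketch. It assumes, as that comment states, that the client appends "/v1/messages" to the configured base URL; the `messages_url` helper is hypothetical and only mimics that join.

```python
# Assumption: the SDK builds the request URL as base_url + "/v1/messages",
# as described in the comment in this diff. This helper is illustrative only.
ANTHROPIC_MESSAGES_SUFFIX = "/v1/messages"


def messages_url(base_url: str) -> str:
    """Join a base URL and the messages suffix without doubling slashes."""
    return base_url.rstrip("/") + ANTHROPIC_MESSAGES_SUFFIX


# New default: resolves to the intended endpoint.
print(messages_url("https://api.kimi.com/coding"))
# → https://api.kimi.com/coding/v1/messages

# Old default: the version segment is duplicated.
print(messages_url("https://api.kimi.com/coding/v1"))
# → https://api.kimi.com/coding/v1/v1/messages
```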
@@ -983,6 +1023,7 @@ def resolve_provider(
"x-ai": "xai", "x.ai": "xai", "grok": "xai",
"kimi": "kimi-coding", "kimi-for-coding": "kimi-coding", "moonshot": "kimi-coding",
"kimi-cn": "kimi-coding-cn", "moonshot-cn": "kimi-coding-cn",
"step": "stepfun", "stepfun-coding-plan": "stepfun",
"arcee-ai": "arcee", "arceeai": "arcee",
"minimax-china": "minimax-cn", "minimax_cn": "minimax-cn",
"claude": "anthropic", "claude-code": "anthropic",
@@ -995,6 +1036,10 @@ def resolve_provider(
"hf": "huggingface", "hugging-face": "huggingface", "huggingface-hub": "huggingface",
"mimo": "xiaomi", "xiaomi-mimo": "xiaomi",
"aws": "bedrock", "aws-bedrock": "bedrock", "amazon-bedrock": "bedrock", "amazon": "bedrock",
"volcengine-coding-plan": "volcengine",
"volcengine_coding_plan": "volcengine",
"byteplus-coding-plan": "byteplus",
"byteplus_coding_plan": "byteplus",
"go": "opencode-go", "opencode-go-sub": "opencode-go",
"kilo": "kilocode", "kilo-code": "kilocode", "kilo-gateway": "kilocode",
# Local server aliases — route through the generic custom provider
@@ -1137,6 +1182,21 @@ def _qwen_cli_auth_path() -> Path:
return Path.home() / ".qwen" / "oauth_creds.json"
def _current_model_for_provider(provider_id: str) -> str:
"""Return the currently configured model when it belongs to the provider."""
try:
config = read_raw_config()
except Exception:
return ""
model_cfg = config.get("model")
if isinstance(model_cfg, dict):
configured_provider = str(model_cfg.get("provider") or "").strip().lower()
if configured_provider == provider_id:
return str(model_cfg.get("default") or model_cfg.get("model") or "").strip()
return ""
def _read_qwen_cli_tokens() -> Dict[str, Any]:
auth_path = _qwen_cli_auth_path()
if not auth_path.exists():
@@ -2535,7 +2595,11 @@ def get_api_key_provider_status(provider_id: str) -> Dict[str, Any]:
if pconfig.base_url_env_var:
env_url = os.getenv(pconfig.base_url_env_var, "").strip()
if provider_id in ("kimi-coding", "kimi-coding-cn"):
active_model = _current_model_for_provider(provider_id)
if provider_id in {VOLCENGINE_PROVIDER, BYTEPLUS_PROVIDER}:
base_url = base_url_for_provider_model(provider_id, active_model) or pconfig.inference_base_url
elif provider_id in ("kimi-coding", "kimi-coding-cn"):
base_url = _resolve_kimi_base_url(api_key, pconfig.inference_base_url, env_url)
elif env_url:
base_url = env_url
@@ -2630,7 +2694,11 @@ def resolve_api_key_provider_credentials(provider_id: str) -> Dict[str, Any]:
if pconfig.base_url_env_var:
env_url = os.getenv(pconfig.base_url_env_var, "").strip()
if provider_id in ("kimi-coding", "kimi-coding-cn"):
active_model = _current_model_for_provider(provider_id)
if provider_id in {VOLCENGINE_PROVIDER, BYTEPLUS_PROVIDER}:
base_url = base_url_for_provider_model(provider_id, active_model) or pconfig.inference_base_url
elif provider_id in ("kimi-coding", "kimi-coding-cn"):
base_url = _resolve_kimi_base_url(api_key, pconfig.inference_base_url, env_url)
elif provider_id == "zai":
base_url = _resolve_zai_base_url(api_key, pconfig.inference_base_url, env_url)
@@ -3375,7 +3443,7 @@ def _login_nous(args, pconfig: ProviderConfig) -> None:
)
from hermes_cli.models import (
_PROVIDER_MODELS, get_pricing_for_provider, filter_nous_free_models,
_PROVIDER_MODELS, get_pricing_for_provider,
check_nous_free_tier, partition_nous_models_by_tier,
)
model_ids = _PROVIDER_MODELS.get("nous", [])
@@ -3384,7 +3452,6 @@ def _login_nous(args, pconfig: ProviderConfig) -> None:
unavailable_models: list = []
if model_ids:
pricing = get_pricing_for_provider("nous")
model_ids = filter_nous_free_models(model_ids, pricing)
free_tier = check_nous_free_tier()
if free_tier:
model_ids, unavailable_models = partition_nous_models_by_tier(
+63
@@ -613,6 +613,10 @@ DEFAULT_CONFIG = {
},
# Text-to-speech configuration
# Each provider supports an optional `max_text_length:` override for the
# per-request input-character cap. Omit it to use the provider's documented
# limit (OpenAI 4096, xAI 15000, MiniMax 10000, ElevenLabs 5k-40k model-aware,
# Gemini 5000, Edge 5000, Mistral 4000, NeuTTS/KittenTTS 2000).
"tts": {
"provider": "edge", # "edge" (free) | "elevenlabs" (premium) | "openai" | "xai" | "minimax" | "mistral" | "neutts" (local)
"edge": {
@@ -889,6 +893,34 @@ DEFAULT_CONFIG = {
"force_ipv4": False,
},
# Session storage — controls automatic cleanup of ~/.hermes/state.db.
# state.db accumulates every session, message, tool call, and FTS5 index
# entry forever. Without auto-pruning, a heavy user (gateway + cron)
# reports 384MB+ databases with 68K+ messages, which slows down FTS5
# inserts, /resume listing, and insights queries.
"sessions": {
# When true, prune ended sessions older than retention_days once
# per (roughly) min_interval_hours at CLI/gateway/cron startup.
# Only touches ended sessions — active sessions are always preserved.
# Default false: session history is valuable for search recall, and
# silently deleting it could surprise users. Opt in explicitly.
"auto_prune": False,
# How many days of ended-session history to keep. Matches the
# default of ``hermes sessions prune``.
"retention_days": 90,
# VACUUM after a prune that actually deleted rows. SQLite does not
# reclaim disk space on DELETE — freed pages are just reused on
# subsequent INSERTs — so without VACUUM the file stays bloated
# even after pruning. VACUUM blocks writes for a few seconds per
# 100MB, so it only runs at startup, and only when prune deleted
# ≥1 session.
"vacuum_after_prune": True,
# Minimum hours between auto-maintenance runs (avoids repeating
# the sweep on every CLI invocation). Tracked via state_meta in
# state.db itself, so it's shared across all processes.
"min_interval_hours": 24,
},
# Config schema version - bump this when adding new required fields
"_config_version": 22,
}
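The behaviour described in the "sessions" comments (prune ended sessions, run at most once per interval, VACUUM only when rows were deleted) can be sketched with the standard `sqlite3` module. This is a minimal illustration, not the `maybe_auto_prune_and_vacuum` implementation: the `maybe_prune` name, the `sessions(id, ended_at)` schema, and the `last_auto_prune` key are all assumptions; only the `state_meta` table name comes from the diff.

```python
import sqlite3
import time


def maybe_prune(conn, retention_days=90, min_interval_hours=24,
                vacuum=True, now=None):
    """Hypothetical sketch. Returns rows deleted, or None if skipped."""
    now = time.time() if now is None else now
    conn.execute(
        "CREATE TABLE IF NOT EXISTS state_meta (key TEXT PRIMARY KEY, value TEXT)"
    )
    row = conn.execute(
        "SELECT value FROM state_meta WHERE key = 'last_auto_prune'"
    ).fetchone()
    # Skip when the previous run is more recent than the minimum interval.
    if row is not None and now - float(row[0]) < min_interval_hours * 3600:
        return None

    cutoff = now - retention_days * 86400
    # Only ended sessions are eligible; active ones have ended_at IS NULL.
    cur = conn.execute(
        "DELETE FROM sessions WHERE ended_at IS NOT NULL AND ended_at < ?",
        (cutoff,),
    )
    deleted = cur.rowcount
    conn.execute(
        "INSERT OR REPLACE INTO state_meta (key, value) VALUES ('last_auto_prune', ?)",
        (str(now),),
    )
    conn.commit()

    # VACUUM cannot run inside a transaction, so it follows the commit.
    if vacuum and deleted > 0:
        conn.execute("VACUUM")
    return deleted
```

Storing the last-run timestamp in the database itself, rather than in process memory, is what lets separate CLI, gateway, and cron processes share one interval.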
@@ -1046,6 +1078,22 @@ OPTIONAL_ENV_VARS = {
"category": "provider",
"advanced": True,
},
"STEPFUN_API_KEY": {
"description": "StepFun Step Plan API key",
"prompt": "StepFun Step Plan API key",
"url": "https://platform.stepfun.com/",
"password": True,
"category": "provider",
"advanced": True,
},
"STEPFUN_BASE_URL": {
"description": "StepFun Step Plan base URL override",
"prompt": "StepFun Step Plan base URL (leave empty for default)",
"url": None,
"password": False,
"category": "provider",
"advanced": True,
},
"ARCEEAI_API_KEY": {
"description": "Arcee AI API key",
"prompt": "Arcee AI API key",
@@ -1233,6 +1281,20 @@ OPTIONAL_ENV_VARS = {
"category": "provider",
"advanced": True,
},
"VOLCENGINE_API_KEY": {
"description": "Volcengine API key for Doubao / Seed models (standard + Coding Plan catalogs)",
"prompt": "Volcengine API Key",
"url": "https://www.volcengine.com/product/ark",
"password": True,
"category": "provider",
},
"BYTEPLUS_API_KEY": {
"description": "BytePlus API key for Seed / Dola models (standard + Coding Plan catalogs)",
"prompt": "BytePlus API Key",
"url": "https://www.byteplus.com/en/product/modelark",
"password": True,
"category": "provider",
},
"AWS_REGION": {
"description": "AWS region for Bedrock API calls (e.g. us-east-1, eu-central-1)",
"prompt": "AWS Region",
@@ -2098,6 +2160,7 @@ _KNOWN_ROOT_KEYS = {
"fallback_providers", "credential_pool_strategies", "toolsets",
"agent", "terminal", "display", "compression", "delegation",
"auxiliary", "custom_providers", "context", "memory", "gateway",
"sessions",
}
# Valid fields inside a custom_providers list entry
+9 -4
@@ -912,6 +912,7 @@ def run_doctor(args):
_apikey_providers = [
("Z.AI / GLM", ("GLM_API_KEY", "ZAI_API_KEY", "Z_AI_API_KEY"), "https://api.z.ai/api/paas/v4/models", "GLM_BASE_URL", True),
("Kimi / Moonshot", ("KIMI_API_KEY",), "https://api.moonshot.ai/v1/models", "KIMI_BASE_URL", True),
("StepFun Step Plan", ("STEPFUN_API_KEY",), "https://api.stepfun.ai/step_plan/v1/models", "STEPFUN_BASE_URL", True),
("Kimi / Moonshot (China)", ("KIMI_CN_API_KEY",), "https://api.moonshot.cn/v1/models", None, True),
("Arcee AI", ("ARCEEAI_API_KEY",), "https://api.arcee.ai/api/v1/models", "ARCEE_BASE_URL", True),
("DeepSeek", ("DEEPSEEK_API_KEY",), "https://api.deepseek.com/v1/models", "DEEPSEEK_BASE_URL", True),
@@ -943,18 +944,22 @@ def run_doctor(args):
try:
import httpx
_base = os.getenv(_base_env, "") if _base_env else ""
# Auto-detect Kimi Code keys (sk-kimi-) → api.kimi.com
# Auto-detect Kimi Code keys (sk-kimi-) → api.kimi.com/coding/v1
# (OpenAI-compat surface, which exposes /models for health check).
if not _base and _key.startswith("sk-kimi-"):
_base = "https://api.kimi.com/coding/v1"
# Anthropic-compat endpoints (/anthropic) don't support /models.
# Rewrite to the OpenAI-compat /v1 surface for health checks.
# Anthropic-compat endpoints (/anthropic, api.kimi.com/coding
# with no /v1) don't support /models. Rewrite to the OpenAI-compat
# /v1 surface for health checks.
if _base and _base.rstrip("/").endswith("/anthropic"):
from agent.auxiliary_client import _to_openai_base_url
_base = _to_openai_base_url(_base)
if base_url_host_matches(_base, "api.kimi.com") and _base.rstrip("/").endswith("/coding"):
_base = _base.rstrip("/") + "/v1"
_url = (_base.rstrip("/") + "/models") if _base else _default_url
_headers = {"Authorization": f"Bearer {_key}"}
if base_url_host_matches(_base, "api.kimi.com"):
_headers["User-Agent"] = "KimiCLI/1.30.0"
_headers["User-Agent"] = "claude-code/0.1.0"
_resp = httpx.get(
_url,
headers=_headers,
+2
@@ -160,6 +160,8 @@ def load_hermes_dotenv(
# Fix corrupted .env files before python-dotenv parses them (#8908).
if user_env.exists():
_sanitize_env_file_if_needed(user_env)
if project_env_path and project_env_path.exists():
_sanitize_env_file_if_needed(project_env_path)
if user_env.exists():
_load_dotenv_with_fallback(user_env, override=True)
+118 -104
@@ -2639,9 +2639,120 @@ def _setup_dingtalk():
def _setup_wecom():
"""Configure WeCom (Enterprise WeChat) via the standard platform setup."""
wecom_platform = next(p for p in _PLATFORMS if p["key"] == "wecom")
_setup_standard_platform(wecom_platform)
"""Interactive setup for WeCom — scan QR code or manual credential input."""
print()
print(color(" ─── 💬 WeCom (Enterprise WeChat) Setup ───", Colors.CYAN))
existing_bot_id = get_env_value("WECOM_BOT_ID")
existing_secret = get_env_value("WECOM_SECRET")
if existing_bot_id and existing_secret:
print()
print_success("WeCom is already configured.")
if not prompt_yes_no(" Reconfigure WeCom?", False):
return
# ── Choose setup method ──
print()
method_choices = [
"Scan QR code to obtain Bot ID and Secret automatically (recommended)",
"Enter existing Bot ID and Secret manually",
]
method_idx = prompt_choice(" How would you like to set up WeCom?", method_choices, 0)
bot_id = None
secret = None
if method_idx == 0:
# ── QR scan flow ──
try:
from gateway.platforms.wecom import qr_scan_for_bot_info
except Exception as exc:
print_error(f" WeCom QR scan import failed: {exc}")
qr_scan_for_bot_info = None
if qr_scan_for_bot_info is not None:
try:
credentials = qr_scan_for_bot_info()
except KeyboardInterrupt:
print()
print_warning(" WeCom setup cancelled.")
return
except Exception as exc:
print_warning(f" QR scan failed: {exc}")
credentials = None
if credentials:
bot_id = credentials.get("bot_id", "")
secret = credentials.get("secret", "")
print_success(" ✔ QR scan successful! Bot ID and Secret obtained.")
if not bot_id or not secret:
print_info(" QR scan did not complete. Continuing with manual input.")
bot_id = None
secret = None
# ── Manual credential input ──
if not bot_id or not secret:
print()
print_info(" 1. Go to WeCom Application → Workspace → Smart Robot -> Create smart robots")
print_info(" 2. Select API Mode")
print_info(" 3. Copy the Bot ID and Secret from the bot's credentials info")
print_info(" 4. The bot connects via WebSocket — no public endpoint needed")
print()
bot_id = prompt(" Bot ID", password=False)
if not bot_id:
print_warning(" Skipped — WeCom won't work without a Bot ID.")
return
secret = prompt(" Secret", password=True)
if not secret:
print_warning(" Skipped — WeCom won't work without a Secret.")
return
# ── Save core credentials ──
save_env_value("WECOM_BOT_ID", bot_id)
save_env_value("WECOM_SECRET", secret)
# ── Allowed users (deny-by-default security) ──
print()
print_info(" The gateway DENIES all users by default for security.")
print_info(" Enter user IDs to create an allowlist, or leave empty.")
allowed = prompt(" Allowed user IDs (comma-separated, or empty)", password=False)
if allowed:
cleaned = allowed.replace(" ", "")
save_env_value("WECOM_ALLOWED_USERS", cleaned)
print_success(" Saved — only these users can interact with the bot.")
else:
print()
access_choices = [
"Enable open access (anyone can message the bot)",
"Use DM pairing (unknown users request access, you approve with 'hermes pairing approve')",
"Disable direct messages",
"Skip for now (bot will deny all users until configured)",
]
access_idx = prompt_choice(" How should unauthorized users be handled?", access_choices, 1)
if access_idx == 0:
save_env_value("WECOM_DM_POLICY", "open")
save_env_value("GATEWAY_ALLOW_ALL_USERS", "true")
print_warning(" Open access enabled — anyone can use your bot!")
elif access_idx == 1:
save_env_value("WECOM_DM_POLICY", "pairing")
print_success(" DM pairing mode — users will receive a code to request access.")
print_info(" Approve with: hermes pairing approve <platform> <code>")
elif access_idx == 2:
save_env_value("WECOM_DM_POLICY", "disabled")
print_warning(" Direct messages disabled.")
else:
print_info(" Skipped — configure later with 'hermes gateway setup'")
# ── Home channel (optional) ──
print()
print_info(" Chat ID for scheduled results and notifications.")
home = prompt(" Home chat ID (optional, for cron/notifications)", password=False)
if home:
save_env_value("WECOM_HOME_CHANNEL", home)
print_success(f" Home channel set to {home}")
print()
print_success("💬 WeCom configured!")
def _is_service_installed() -> bool:
@@ -3021,7 +3132,8 @@ def _setup_qqbot():
if method_idx == 0:
# ── QR scan-to-configure ──
try:
credentials = _qqbot_qr_flow()
from gateway.platforms.qqbot import qr_register
credentials = qr_register()
except KeyboardInterrupt:
print()
print_warning(" QQ Bot setup cancelled.")
@@ -3103,106 +3215,6 @@ def _setup_qqbot():
print_info(f" App ID: {credentials['app_id']}")
def _qqbot_render_qr(url: str) -> bool:
"""Try to render a QR code in the terminal. Returns True if successful."""
try:
import qrcode as _qr
qr = _qr.QRCode(border=1,error_correction=_qr.constants.ERROR_CORRECT_L)
qr.add_data(url)
qr.make(fit=True)
qr.print_ascii(invert=True)
return True
except Exception:
return False
def _qqbot_qr_flow():
"""Run the QR-code scan-to-configure flow.
Returns a dict with app_id, client_secret, user_openid on success,
or None on failure/cancel.
"""
try:
from gateway.platforms.qqbot import (
create_bind_task, poll_bind_result, build_connect_url,
decrypt_secret, BindStatus,
)
from gateway.platforms.qqbot.constants import ONBOARD_POLL_INTERVAL
except Exception as exc:
print_error(f" QQBot onboard import failed: {exc}")
return None
import asyncio
import time
MAX_REFRESHES = 3
refresh_count = 0
while refresh_count <= MAX_REFRESHES:
loop = asyncio.new_event_loop()
# ── Create bind task ──
try:
task_id, aes_key = loop.run_until_complete(create_bind_task())
except Exception as e:
print_warning(f" Failed to create bind task: {e}")
loop.close()
return None
url = build_connect_url(task_id)
# ── Display QR code + URL ──
print()
if _qqbot_render_qr(url):
print(f" Scan the QR code above, or open this URL directly:\n {url}")
else:
print(f" Open this URL in QQ on your phone:\n {url}")
print_info(" Tip: pip install qrcode to show a scannable QR code here")
# ── Poll loop (silent — keep QR visible at bottom) ──
try:
while True:
try:
status, app_id, encrypted_secret, user_openid = loop.run_until_complete(
poll_bind_result(task_id)
)
except Exception:
time.sleep(ONBOARD_POLL_INTERVAL)
continue
if status == BindStatus.COMPLETED:
client_secret = decrypt_secret(encrypted_secret, aes_key)
print()
print_success(f" QR scan complete! (App ID: {app_id})")
if user_openid:
print_info(f" Scanner's OpenID: {user_openid}")
return {
"app_id": app_id,
"client_secret": client_secret,
"user_openid": user_openid,
}
if status == BindStatus.EXPIRED:
refresh_count += 1
if refresh_count > MAX_REFRESHES:
print()
print_warning(f" QR code expired {MAX_REFRESHES} times — giving up.")
return None
print()
print_warning(f" QR code expired, refreshing... ({refresh_count}/{MAX_REFRESHES})")
loop.close()
break # outer while creates a new task
time.sleep(ONBOARD_POLL_INTERVAL)
except KeyboardInterrupt:
loop.close()
raise
finally:
loop.close()
return None
def _setup_signal():
"""Interactive setup for Signal messenger."""
import shutil
@@ -3390,6 +3402,8 @@ def gateway_setup():
_setup_feishu()
elif platform["key"] == "qqbot":
_setup_qqbot()
elif platform["key"] == "wecom":
_setup_wecom()
else:
_setup_standard_platform(platform)
+212 -9
@@ -1566,8 +1566,12 @@ def select_provider_and_model(args=None):
_model_flow_anthropic(config, current_model)
elif selected_provider == "kimi-coding":
_model_flow_kimi(config, current_model)
elif selected_provider == "stepfun":
_model_flow_stepfun(config, current_model)
elif selected_provider == "bedrock":
_model_flow_bedrock(config, current_model)
elif selected_provider in ("volcengine", "byteplus"):
_model_flow_contract_provider(config, selected_provider, current_model)
elif selected_provider in (
"gemini",
"deepseek",
@@ -1952,7 +1956,7 @@ def _aux_flow_custom_endpoint(task: str, task_cfg: dict) -> None:
print(f"{display_name}: custom ({short_url})" + (f" · {model}" if model else ""))
def _prompt_provider_choice(choices, *, default=0):
def _prompt_provider_choice(choices, *, default=0, title="Select provider:"):
"""Show provider selection menu with curses arrow-key navigation.
Falls back to a numbered list when curses is unavailable (e.g. piped
@@ -1961,8 +1965,7 @@ def _prompt_provider_choice(choices, *, default=0):
"""
try:
from hermes_cli.setup import _curses_prompt_choice
idx = _curses_prompt_choice("Select provider:", choices, default)
idx = _curses_prompt_choice(title, choices, default)
if idx >= 0:
print()
return idx
@@ -1970,7 +1973,7 @@ def _prompt_provider_choice(choices, *, default=0):
pass
# Fallback: numbered list
print("Select provider:")
print(title)
for i, c in enumerate(choices, 1):
marker = "" if i - 1 == default else " "
print(f" {marker} {i}. {c}")
@@ -2165,7 +2168,6 @@ def _model_flow_nous(config, current_model="", args=None):
from hermes_cli.models import (
_PROVIDER_MODELS,
get_pricing_for_provider,
filter_nous_free_models,
check_nous_free_tier,
partition_nous_models_by_tier,
)
@@ -2208,10 +2210,8 @@ def _model_flow_nous(config, current_model="", args=None):
# Check if user is on free tier
free_tier = check_nous_free_tier()
# For both tiers: apply the allowlist filter first (removes non-allowlisted
# free models and allowlist models that aren't actually free).
# Then for free users: partition remaining models into selectable/unavailable.
model_ids = filter_nous_free_models(model_ids, pricing)
# For free users: partition models into selectable/unavailable based on
# whether they are free per the Portal-reported pricing.
unavailable_models: list[str] = []
if free_tier:
model_ids, unavailable_models = partition_nous_models_by_tier(
@@ -2945,6 +2945,10 @@ def _model_flow_named_custom(config, provider_info):
# Curated model lists for direct API-key providers — single source in models.py
from hermes_cli.models import _PROVIDER_MODELS
from hermes_cli.volcengine_byteplus import (
base_url_for_provider_model,
provider_models,
)
def _current_reasoning_effort(config) -> str:
@@ -3465,6 +3469,140 @@ def _model_flow_kimi(config, current_model=""):
print("No change.")
def _infer_stepfun_region(base_url: str) -> str:
"""Infer the current StepFun region from the configured endpoint."""
normalized = (base_url or "").strip().lower()
if "api.stepfun.com" in normalized:
return "china"
return "international"
def _stepfun_base_url_for_region(region: str) -> str:
from hermes_cli.auth import (
STEPFUN_STEP_PLAN_CN_BASE_URL,
STEPFUN_STEP_PLAN_INTL_BASE_URL,
)
return (
STEPFUN_STEP_PLAN_CN_BASE_URL
if region == "china"
else STEPFUN_STEP_PLAN_INTL_BASE_URL
)
def _model_flow_stepfun(config, current_model=""):
"""StepFun Step Plan flow with region-specific endpoints."""
from hermes_cli.auth import (
PROVIDER_REGISTRY,
_prompt_model_selection,
_save_model_choice,
deactivate_provider,
)
from hermes_cli.config import get_env_value, save_env_value, load_config, save_config
from hermes_cli.models import fetch_api_models
provider_id = "stepfun"
pconfig = PROVIDER_REGISTRY[provider_id]
key_env = pconfig.api_key_env_vars[0] if pconfig.api_key_env_vars else ""
base_url_env = pconfig.base_url_env_var or ""
existing_key = ""
for ev in pconfig.api_key_env_vars:
existing_key = get_env_value(ev) or os.getenv(ev, "")
if existing_key:
break
if not existing_key:
print(f"No {pconfig.name} API key configured.")
if key_env:
try:
import getpass
new_key = getpass.getpass(f"{key_env} (or Enter to cancel): ").strip()
except (KeyboardInterrupt, EOFError):
print()
return
if not new_key:
print("Cancelled.")
return
save_env_value(key_env, new_key)
existing_key = new_key
print("API key saved.")
print()
else:
print(f" {pconfig.name} API key: {existing_key[:8]}... ✓")
print()
current_base = ""
if base_url_env:
current_base = get_env_value(base_url_env) or os.getenv(base_url_env, "")
if not current_base:
model_cfg = config.get("model")
if isinstance(model_cfg, dict):
current_base = str(model_cfg.get("base_url") or "").strip()
current_region = _infer_stepfun_region(current_base or pconfig.inference_base_url)
region_choices = [
("international", f"International ({_stepfun_base_url_for_region('international')})"),
("china", f"China ({_stepfun_base_url_for_region('china')})"),
]
ordered_regions = []
for region_key, label in region_choices:
if region_key == current_region:
ordered_regions.insert(0, (region_key, f"{label} ← currently active"))
else:
ordered_regions.append((region_key, label))
ordered_regions.append(("cancel", "Cancel"))
region_idx = _prompt_provider_choice([label for _, label in ordered_regions])
if region_idx is None or ordered_regions[region_idx][0] == "cancel":
print("No change.")
return
selected_region = ordered_regions[region_idx][0]
effective_base = _stepfun_base_url_for_region(selected_region)
if base_url_env:
save_env_value(base_url_env, effective_base)
live_models = fetch_api_models(existing_key, effective_base)
if live_models:
model_list = live_models
print(f" Found {len(model_list)} model(s) from {pconfig.name} API")
else:
model_list = _PROVIDER_MODELS.get(provider_id, [])
if model_list:
print(
f" Could not auto-detect models from {pconfig.name} API — "
"showing Step Plan fallback catalog."
)
if model_list:
selected = _prompt_model_selection(model_list, current_model=current_model)
else:
try:
selected = input("Model name: ").strip()
except (KeyboardInterrupt, EOFError):
selected = None
if selected:
_save_model_choice(selected)
cfg = load_config()
model = cfg.get("model")
if not isinstance(model, dict):
model = {"default": model} if model else {}
cfg["model"] = model
model["provider"] = provider_id
model["base_url"] = effective_base
model.pop("api_mode", None)
save_config(cfg)
deactivate_provider()
config["model"] = dict(model)
print(f"Default model set to: {selected} (via {pconfig.name})")
else:
print("No change.")
def _model_flow_bedrock_api_key(config, region, current_model=""):
"""Bedrock API Key mode — uses the OpenAI-compatible bedrock-mantle endpoint.
@@ -3900,6 +4038,70 @@ def _model_flow_api_key_provider(config, provider_id, current_model=""):
print("No change.")
def _model_flow_contract_provider(config, provider_id, current_model=""):
"""Provider flow for Volcengine / BytePlus contract-backed catalogs."""
from hermes_cli.auth import (
PROVIDER_REGISTRY,
_prompt_model_selection,
_save_model_choice,
deactivate_provider,
)
from hermes_cli.config import get_env_value, load_config, save_config, save_env_value
pconfig = PROVIDER_REGISTRY[provider_id]
key_env = pconfig.api_key_env_vars[0] if pconfig.api_key_env_vars else ""
existing_key = ""
for env_var in pconfig.api_key_env_vars:
existing_key = get_env_value(env_var) or os.getenv(env_var, "")
if existing_key:
break
if not existing_key:
print(f"No {pconfig.name} API key configured.")
if key_env:
try:
import getpass
new_key = getpass.getpass(f"{key_env} (or Enter to cancel): ").strip()
except (KeyboardInterrupt, EOFError):
print()
return
if not new_key:
print("Cancelled.")
return
save_env_value(key_env, new_key)
print("API key saved.")
print()
else:
print(f" {pconfig.name} API key: {existing_key[:8]}... ✓")
print()
model_list = provider_models(provider_id)
if not model_list:
print(f"No curated model catalog found for {pconfig.name}.")
return
selected = _prompt_model_selection(model_list, current_model=current_model)
if not selected:
print("No change.")
return
_save_model_choice(selected)
cfg = load_config()
model = cfg.get("model")
if not isinstance(model, dict):
model = {"default": model} if model else {}
cfg["model"] = model
model["provider"] = provider_id
model["base_url"] = base_url_for_provider_model(provider_id, selected)
model.pop("api_mode", None)
save_config(cfg)
deactivate_provider()
print(f"Default model set to: {selected} (via {pconfig.name})")
def _run_anthropic_oauth_flow(save_env_value):
"""Run the Claude OAuth setup-token flow. Returns True if credentials were saved."""
from agent.anthropic_adapter import (
@@ -6533,6 +6735,7 @@ For more help on a command:
"zai",
"kimi-coding",
"kimi-coding-cn",
"stepfun",
"minimax",
"minimax-cn",
"kilocode",
+2 -1
@@ -97,6 +97,8 @@ _MATCHING_PREFIX_STRIP_PROVIDERS: frozenset[str] = frozenset({
"xiaomi",
"arcee",
"ollama-cloud",
"volcengine",
"byteplus",
"custom",
})
@@ -423,4 +425,3 @@ def normalize_model_for_provider(model_input: str, target_provider: str) -> str:
# ---------------------------------------------------------------------------
# Batch / convenience helpers
# ---------------------------------------------------------------------------
+2 -1
@@ -143,7 +143,7 @@ MODEL_ALIASES: dict[str, ModelIdentity] = {
# Z.AI / GLM
"glm": ModelIdentity("z-ai", "glm"),
# StepFun
# Step Plan (StepFun)
"step": ModelIdentity("stepfun", "step"),
# Xiaomi
@@ -678,6 +678,7 @@ def switch_model(
_da = DIRECT_ALIASES.get(resolved_alias)
if _da is not None and _da.base_url:
base_url = _da.base_url
api_mode = "" # clear so determine_api_mode re-detects from URL
if not api_key:
api_key = "no-key-required"
+216 -47
@@ -22,6 +22,12 @@ from hermes_cli import __version__ as _HERMES_VERSION
# Check (error 1010) don't reject the default ``Python-urllib/*`` signature.
_HERMES_USER_AGENT = f"hermes-cli/{_HERMES_VERSION}"
from hermes_cli.volcengine_byteplus import (
BYTEPLUS_PROVIDER,
VOLCENGINE_PROVIDER,
provider_models,
)
COPILOT_BASE_URL = "https://api.githubcopilot.com"
COPILOT_MODELS_URL = f"{COPILOT_BASE_URL}/models"
COPILOT_EDITOR_VERSION = "vscode/1.104.1"
@@ -53,6 +59,7 @@ OPENROUTER_MODELS: list[tuple[str, str]] = [
("stepfun/step-3.5-flash", ""),
("minimax/minimax-m2.7", ""),
("minimax/minimax-m2.5", ""),
("minimax/minimax-m2.5:free", "free"),
("z-ai/glm-5.1", ""),
("z-ai/glm-5v-turbo", ""),
("z-ai/glm-5-turbo", ""),
@@ -125,17 +132,15 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"stepfun/step-3.5-flash",
"minimax/minimax-m2.7",
"minimax/minimax-m2.5",
"minimax/minimax-m2.5:free",
"z-ai/glm-5.1",
"z-ai/glm-5v-turbo",
"z-ai/glm-5-turbo",
"x-ai/grok-4.20-beta",
"nvidia/nemotron-3-super-120b-a12b",
"nvidia/nemotron-3-super-120b-a12b:free",
"arcee-ai/trinity-large-preview:free",
"arcee-ai/trinity-large-thinking",
"openai/gpt-5.4-pro",
"openai/gpt-5.4-nano",
"openrouter/elephant-alpha",
],
"openai-codex": _codex_curated_models(),
"copilot-acp": [
@@ -211,6 +216,10 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"kimi-k2-turbo-preview",
"kimi-k2-0905-preview",
],
"stepfun": [
"step-3.5-flash",
"step-3.5-flash-2603",
],
"moonshot": [
"kimi-k2.6",
"kimi-k2.5",
@@ -353,6 +362,8 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"us.meta.llama4-maverick-17b-instruct-v1:0",
"us.meta.llama4-scout-17b-instruct-v1:0",
],
VOLCENGINE_PROVIDER: provider_models(VOLCENGINE_PROVIDER),
BYTEPLUS_PROVIDER: provider_models(BYTEPLUS_PROVIDER),
}
# Vercel AI Gateway: derive the bare-model-id catalog from the curated
@@ -362,17 +373,11 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
_PROVIDER_MODELS["ai-gateway"] = [mid for mid, _ in VERCEL_AI_GATEWAY_MODELS]
# ---------------------------------------------------------------------------
# Nous Portal free-model filtering
# Nous Portal free-model helper
# ---------------------------------------------------------------------------
# Models that are ALLOWED to appear when priced as free on Nous Portal.
# Any other free model is hidden — prevents promotional/temporary free models
# from cluttering the selection when users are paying subscribers.
# Models in this list are ALSO filtered out if they are NOT free (i.e. they
# should only appear in the menu when they are genuinely free).
_NOUS_ALLOWED_FREE_MODELS: frozenset[str] = frozenset({
"xiaomi/mimo-v2-pro",
"xiaomi/mimo-v2-omni",
})
# The Nous Portal models endpoint is the source of truth for which models
# are currently offered (free or paid). We trust whatever it returns and
# surface it to users as-is — no local allowlist filtering.
def _is_model_free(model_id: str, pricing: dict[str, dict[str, str]]) -> bool:
@@ -386,35 +391,6 @@ def _is_model_free(model_id: str, pricing: dict[str, dict[str, str]]) -> bool:
return False
def filter_nous_free_models(
model_ids: list[str],
pricing: dict[str, dict[str, str]],
) -> list[str]:
"""Filter the Nous Portal model list according to free-model policy.
Rules:
Paid models that are NOT in the allowlist → keep (normal case).
Free models that are NOT in the allowlist → drop.
Allowlist models that ARE free → keep.
Allowlist models that are NOT free → drop.
"""
if not pricing:
return model_ids # no pricing data — can't filter, show everything
result: list[str] = []
for mid in model_ids:
free = _is_model_free(mid, pricing)
if mid in _NOUS_ALLOWED_FREE_MODELS:
# Allowlist model: only show when it's actually free
if free:
result.append(mid)
else:
# Regular model: keep only when it's NOT free
if not free:
result.append(mid)
return result
# ---------------------------------------------------------------------------
# Nous Portal account tier detection
# ---------------------------------------------------------------------------
@@ -478,8 +454,7 @@ def partition_nous_models_by_tier(
) -> tuple[list[str], list[str]]:
"""Split Nous models into (selectable, unavailable) based on user tier.
For paid-tier users: all models are selectable, none unavailable
(free-model filtering is handled separately by ``filter_nous_free_models``).
For paid-tier users: all models are selectable, none unavailable.
For free-tier users: only free models are selectable; paid models
are returned as unavailable (shown grayed out in the menu).
@@ -549,6 +524,157 @@ def check_nous_free_tier() -> bool:
return False # default to paid on error — don't block users
# ---------------------------------------------------------------------------
# Nous Portal recommended models
#
# The Portal publishes a curated list of suggested models (separated into
# paid and free tiers) plus dedicated recommendations for compaction (text
# summarisation / auxiliary) and vision tasks. We fetch it once per process
# with a TTL cache so callers can ask "what's the best aux model right now?"
# without hitting the network on every lookup.
#
# Shape of the response (fields we care about):
# {
# "paidRecommendedModels": [ {modelName, ...}, ... ],
# "freeRecommendedModels": [ {modelName, ...}, ... ],
# "paidRecommendedCompactionModel": {modelName, ...} | null,
# "paidRecommendedVisionModel": {modelName, ...} | null,
# "freeRecommendedCompactionModel": {modelName, ...} | null,
# "freeRecommendedVisionModel": {modelName, ...} | null,
# }
# ---------------------------------------------------------------------------
NOUS_RECOMMENDED_MODELS_PATH = "/api/nous/recommended-models"
_NOUS_RECOMMENDED_CACHE_TTL: int = 600 # seconds (10 minutes)
# (result_dict, timestamp) keyed by portal_base_url so staging vs prod don't collide.
_nous_recommended_cache: dict[str, tuple[dict[str, Any], float]] = {}
def fetch_nous_recommended_models(
portal_base_url: str = "",
timeout: float = 5.0,
*,
force_refresh: bool = False,
) -> dict[str, Any]:
"""Fetch the Nous Portal's curated recommended-models payload.
Hits ``<portal>/api/nous/recommended-models``. The endpoint is public;
no auth is required. Results are cached per portal URL for
``_NOUS_RECOMMENDED_CACHE_TTL`` seconds; pass ``force_refresh=True`` to
bypass the cache.
Returns the parsed JSON dict on success, or ``{}`` on any failure
(network, parse, non-2xx). Callers must treat missing/null fields as
"no recommendation" and fall back to their own default.
"""
base = (portal_base_url or "https://portal.nousresearch.com").rstrip("/")
now = time.monotonic()
cached = _nous_recommended_cache.get(base)
if not force_refresh and cached is not None:
payload, cached_at = cached
if now - cached_at < _NOUS_RECOMMENDED_CACHE_TTL:
return payload
url = f"{base}{NOUS_RECOMMENDED_MODELS_PATH}"
try:
req = urllib.request.Request(
url,
headers={"Accept": "application/json"},
)
with urllib.request.urlopen(req, timeout=timeout) as resp:
data = json.loads(resp.read().decode())
if not isinstance(data, dict):
data = {}
except Exception:
data = {}
_nous_recommended_cache[base] = (data, now)
return data
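The caching behaviour of the function above can be isolated into a small sketch. This is not the project's code: `cached_fetch`, the injectable `loader` and the `clock` parameter are hypothetical names introduced so the TTL logic can be exercised without network access. It assumes the same 600-second TTL and the same "return `{}` on any failure" contract.

```python
import time
from typing import Any, Callable

_TTL_SECONDS = 600.0  # assumed to mirror _NOUS_RECOMMENDED_CACHE_TTL
_cache: dict[str, tuple[dict[str, Any], float]] = {}


def cached_fetch(
    base: str,
    loader: Callable[[str], Any],
    *,
    force_refresh: bool = False,
    clock: Callable[[], float] = time.monotonic,
) -> dict[str, Any]:
    """Return the cached payload for ``base`` or call ``loader`` to refresh it."""
    now = clock()
    hit = _cache.get(base)
    if not force_refresh and hit is not None:
        payload, cached_at = hit
        if now - cached_at < _TTL_SECONDS:
            return payload
    try:
        data = loader(base)
        if not isinstance(data, dict):
            data = {}
    except Exception:
        data = {}  # any loader failure collapses to an empty payload
    # The result is stored even when it is empty, so a failure is cached too.
    _cache[base] = (data, now)
    return data
```

One consequence worth noting: because the empty result is stored as well, a failed fetch is remembered for the full TTL unless the caller passes `force_refresh=True`.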
def _resolve_nous_portal_url() -> str:
"""Best-effort lookup of the Portal base URL the user is authed against."""
try:
from hermes_cli.auth import (
DEFAULT_NOUS_PORTAL_URL,
get_provider_auth_state,
)
state = get_provider_auth_state("nous") or {}
portal = str(state.get("portal_base_url") or "").strip()
if portal:
return portal.rstrip("/")
return str(DEFAULT_NOUS_PORTAL_URL).rstrip("/")
except Exception:
return "https://portal.nousresearch.com"
def _extract_model_name(entry: Any) -> Optional[str]:
"""Pull the ``modelName`` field from a recommended-model entry, else None."""
if not isinstance(entry, dict):
return None
model_name = entry.get("modelName")
if isinstance(model_name, str) and model_name.strip():
return model_name.strip()
return None
def get_nous_recommended_aux_model(
*,
vision: bool = False,
free_tier: Optional[bool] = None,
portal_base_url: str = "",
force_refresh: bool = False,
) -> Optional[str]:
"""Return the Portal's recommended model name for an auxiliary task.
Picks the best field from the Portal's recommended-models payload:
* ``vision=True`` → ``paidRecommendedVisionModel`` (paid tier) or
``freeRecommendedVisionModel`` (free tier)
* ``vision=False`` → ``paidRecommendedCompactionModel`` or
``freeRecommendedCompactionModel``
When ``free_tier`` is ``None`` (default) the user's tier is auto-detected
via :func:`check_nous_free_tier`. Pass an explicit bool to bypass the
detection, which is useful for tests or when the caller already knows the tier.
For paid-tier users we prefer the paid recommendation but gracefully fall
back to the free recommendation if the Portal returned ``null`` for the
paid field (common during the staged rollout of new paid models).
Returns ``None`` when every candidate is missing, null, or the fetch
fails; callers should fall back to their own default (currently
``google/gemini-3-flash-preview``).
"""
base = portal_base_url or _resolve_nous_portal_url()
payload = fetch_nous_recommended_models(base, force_refresh=force_refresh)
if not payload:
return None
if free_tier is None:
try:
free_tier = check_nous_free_tier()
except Exception:
# On any detection error, assume paid — paid users see both fields
# anyway so this is a safe default that maximises model quality.
free_tier = False
if vision:
paid_key, free_key = "paidRecommendedVisionModel", "freeRecommendedVisionModel"
else:
paid_key, free_key = "paidRecommendedCompactionModel", "freeRecommendedCompactionModel"
# Preference order:
# free tier → free only
# paid tier → paid, then free (if paid field is null)
candidates = [free_key] if free_tier else [paid_key, free_key]
for key in candidates:
name = _extract_model_name(payload.get(key))
if name:
return name
return None
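The preference order used above (free tier: free field only; paid tier: paid field, then free field) can be sketched as a pure function over an already-fetched payload. `pick_aux_model` is a hypothetical name, and the model names in the usage example are invented placeholders; only the payload field names come from the code above.

```python
from typing import Any, Optional


def pick_aux_model(
    payload: dict[str, Any], *, vision: bool, free_tier: bool
) -> Optional[str]:
    """Pick a recommended model name using the paid/free fallback order."""
    kind = "Vision" if vision else "Compaction"
    paid_key = f"paidRecommended{kind}Model"
    free_key = f"freeRecommended{kind}Model"
    # Free-tier users never see the paid field; paid users fall back to free.
    candidates = [free_key] if free_tier else [paid_key, free_key]
    for key in candidates:
        entry = payload.get(key)
        if not isinstance(entry, dict):
            continue  # null or malformed entry: try the next candidate
        name = entry.get("modelName")
        if isinstance(name, str) and name.strip():
            return name.strip()
    return None
```

For example, with `{"paidRecommendedVisionModel": None, "freeRecommendedVisionModel": {"modelName": "free-vision"}}`, a paid-tier vision lookup falls through to `"free-vision"`, which matches the staged-rollout fallback the docstring describes.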
# ---------------------------------------------------------------------------
# Canonical provider list — single source of truth for provider identity.
# Every code path that lists, displays, or iterates providers derives from
@@ -572,6 +698,8 @@ CANONICAL_PROVIDERS: list[ProviderEntry] = [
ProviderEntry("ai-gateway", "Vercel AI Gateway", "Vercel AI Gateway (200+ models, $5 free credit, no markup)"),
ProviderEntry("anthropic", "Anthropic", "Anthropic (Claude models — API key or Claude Code)"),
ProviderEntry("openai-codex", "OpenAI Codex", "OpenAI Codex"),
ProviderEntry(VOLCENGINE_PROVIDER, "Volcengine", "Volcengine (standard + Coding Plan catalogs)"),
ProviderEntry(BYTEPLUS_PROVIDER, "BytePlus", "BytePlus (standard + Coding Plan catalogs)"),
ProviderEntry("xiaomi", "Xiaomi MiMo", "Xiaomi MiMo (MiMo-V2 models — pro, omni, flash)"),
ProviderEntry("nvidia", "NVIDIA NIM", "NVIDIA NIM (Nemotron models — build.nvidia.com or local NIM)"),
ProviderEntry("qwen-oauth", "Qwen OAuth (Portal)", "Qwen OAuth (reuses local Qwen CLI login)"),
@@ -585,6 +713,7 @@ CANONICAL_PROVIDERS: list[ProviderEntry] = [
ProviderEntry("zai", "Z.AI / GLM", "Z.AI / GLM (Zhipu AI direct API)"),
ProviderEntry("kimi-coding", "Kimi / Kimi Coding Plan", "Kimi Coding Plan (api.kimi.com) & Moonshot API"),
ProviderEntry("kimi-coding-cn", "Kimi / Moonshot (China)", "Kimi / Moonshot China (Moonshot CN direct API)"),
ProviderEntry("stepfun", "StepFun Step Plan", "StepFun Step Plan (agent/coding models via Step Plan API)"),
ProviderEntry("minimax", "MiniMax", "MiniMax (global direct API)"),
ProviderEntry("minimax-cn", "MiniMax (China)", "MiniMax China (domestic direct API)"),
ProviderEntry("alibaba", "Alibaba Cloud (DashScope)","Alibaba Cloud / DashScope Coding (Qwen + multi-provider)"),
@@ -600,7 +729,6 @@ CANONICAL_PROVIDERS: list[ProviderEntry] = [
_PROVIDER_LABELS = {p.slug: p.label for p in CANONICAL_PROVIDERS}
_PROVIDER_LABELS["custom"] = "Custom endpoint" # special case: not a named provider
_PROVIDER_ALIASES = {
"glm": "zai",
"z-ai": "zai",
@@ -619,6 +747,8 @@ _PROVIDER_ALIASES = {
"moonshot": "kimi-coding",
"kimi-cn": "kimi-coding-cn",
"moonshot-cn": "kimi-coding-cn",
"step": "stepfun",
"stepfun-coding-plan": "stepfun",
"arcee-ai": "arcee",
"arceeai": "arcee",
"minimax-china": "minimax-cn",
@@ -661,6 +791,10 @@ _PROVIDER_ALIASES = {
"nemotron": "nvidia",
"ollama": "custom", # bare "ollama" = local; use "ollama-cloud" for cloud
"ollama_cloud": "ollama-cloud",
"volcengine-coding-plan": VOLCENGINE_PROVIDER,
"volcengine_coding_plan": VOLCENGINE_PROVIDER,
"byteplus-coding-plan": BYTEPLUS_PROVIDER,
"byteplus_coding_plan": BYTEPLUS_PROVIDER,
}
@@ -1121,7 +1255,6 @@ def list_available_providers() -> list[dict[str, str]]:
"""
# Derive display order from canonical list + custom
provider_order = [p.slug for p in CANONICAL_PROVIDERS] + ["custom"]
# Build reverse alias map
aliases_for: dict[str, list[str]] = {}
for alias, canonical in _PROVIDER_ALIASES.items():
@@ -1137,7 +1270,7 @@ def list_available_providers() -> list[dict[str, str]]:
from hermes_cli.auth import get_auth_status, has_usable_secret
if pid == "custom":
custom_base_url = _get_custom_base_url() or ""
has_creds = bool(custom_base_url.strip())
has_creds = bool(custom_base_url.strip()) and provider_for_base_url(custom_base_url) is None
elif pid == "openrouter":
has_creds = has_usable_secret(os.getenv("OPENROUTER_API_KEY", ""))
else:
@@ -1203,6 +1336,29 @@ def _get_custom_base_url() -> str:
return ""
def provider_for_base_url(base_url: str) -> Optional[str]:
"""Return a known built-in provider for a configured base URL, if any.
Uses the canonical _URL_TO_PROVIDER mapping from model_metadata.
OpenRouter URLs and hosts missing from that mapping return ``None``.
"""
normalized = str(base_url or "").strip().rstrip("/")
if not normalized or "openrouter.ai" in normalized.lower():
return None
url_lower = normalized.lower()
# Primary source — shared with context-length resolution
from agent.model_metadata import _URL_TO_PROVIDER
for host, provider_id in _URL_TO_PROVIDER.items():
if host in url_lower:
canonical = normalize_provider(provider_id)
if canonical in _PROVIDER_LABELS and canonical != "custom":
return canonical
return None
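A self-contained sketch of the host-substring lookup above. The mapping here is a hypothetical stand-in: the real `_URL_TO_PROVIDER` contents are not shown in this diff, and `match_provider` and `KNOWN_PROVIDERS` are illustrative names. Alias normalisation via `normalize_provider` is omitted.

```python
from typing import Optional

# Hypothetical stand-in for agent.model_metadata._URL_TO_PROVIDER.
URL_TO_PROVIDER: dict[str, str] = {
    "api.stepfun.ai": "stepfun",
    "api.kimi.com": "kimi-coding",
}
# Hypothetical stand-in for the keys of _PROVIDER_LABELS.
KNOWN_PROVIDERS = {"stepfun", "kimi-coding", "custom"}


def match_provider(base_url: str) -> Optional[str]:
    """Return the built-in provider whose host appears in ``base_url``."""
    normalized = str(base_url or "").strip().rstrip("/")
    if not normalized or "openrouter.ai" in normalized.lower():
        return None
    url_lower = normalized.lower()
    for host, provider_id in URL_TO_PROVIDER.items():
        # Substring test, as in the code above; not a parsed-hostname match.
        if host in url_lower and provider_id in KNOWN_PROVIDERS:
            if provider_id != "custom":
                return provider_id
    return None
```

Because the check is a substring test, a URL that only mentions a known host somewhere in its path would also match. Comparing parsed hostnames would be stricter, if that ever matters for the custom-endpoint check.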
def curated_models_for_provider(
provider: Optional[str],
*,
@@ -1499,6 +1655,19 @@ def provider_model_ids(provider: Optional[str], *, force_refresh: bool = False)
return live
except Exception:
pass
if normalized == "stepfun":
try:
from hermes_cli.auth import resolve_api_key_provider_credentials
creds = resolve_api_key_provider_credentials("stepfun")
api_key = str(creds.get("api_key") or "").strip()
base_url = str(creds.get("base_url") or "").strip()
if api_key and base_url:
live = fetch_api_models(api_key, base_url)
if live:
return live
except Exception:
pass
if normalized == "anthropic":
live = _fetch_anthropic_models()
if live:
+252 -53
@@ -133,6 +133,9 @@ def _get_enabled_plugins() -> Optional[set]:
# Data classes
# ---------------------------------------------------------------------------
_VALID_PLUGIN_KINDS: Set[str] = {"standalone", "backend", "exclusive"}
@dataclass
class PluginManifest:
"""Parsed representation of a plugin.yaml manifest."""
@@ -146,6 +149,23 @@ class PluginManifest:
provides_hooks: List[str] = field(default_factory=list)
source: str = "" # "user", "project", or "entrypoint"
path: Optional[str] = None
# Plugin kind — see plugins.py module docstring for semantics.
# ``standalone`` (default): hooks/tools of its own; opt-in via
# ``plugins.enabled``.
# ``backend``: pluggable backend for an existing core tool (e.g.
# image_gen). Built-in (bundled) backends auto-load;
# user-installed still gated by ``plugins.enabled``.
# ``exclusive``: category with exactly one active provider (memory).
# Selection via ``<category>.provider`` config key; the
# category's own discovery system handles loading and the
# general scanner skips these.
kind: str = "standalone"
# Registry key — path-derived, used by ``plugins.enabled``/``disabled``
# lookups and by ``hermes plugins list``. For a flat plugin at
# ``plugins/disk-cleanup/`` the key is ``disk-cleanup``; for a nested
# category plugin at ``plugins/image_gen/openai/`` the key is
# ``image_gen/openai``. When empty, falls back to ``name``.
key: str = ""
@dataclass
@@ -366,6 +386,33 @@ class PluginContext:
self.manifest.name, engine.name,
)
# -- image gen provider registration ------------------------------------
def register_image_gen_provider(self, provider) -> None:
"""Register an image generation backend.
``provider`` must be an instance of
:class:`agent.image_gen_provider.ImageGenProvider`. The
``provider.name`` attribute is what ``image_gen.provider`` in
``config.yaml`` matches against when routing ``image_generate``
tool calls.
"""
from agent.image_gen_provider import ImageGenProvider
from agent.image_gen_registry import register_provider
if not isinstance(provider, ImageGenProvider):
logger.warning(
"Plugin '%s' tried to register an image_gen provider that does "
"not inherit from ImageGenProvider. Ignoring.",
self.manifest.name,
)
return
register_provider(provider)
logger.info(
"Plugin '%s' registered image_gen provider: %s",
self.manifest.name, provider.name,
)
# -- hook registration --------------------------------------------------
def register_hook(self, hook_name: str, callback: Callable) -> None:
@@ -465,11 +512,16 @@ class PluginManager:
manifests: List[PluginManifest] = []
# 1. Bundled plugins (<repo>/plugins/<name>/)
# Repo-shipped generic plugins live next to hermes_cli/. Memory and
# context_engine subdirs are handled by their own discovery paths, so
# skip those names here. Bundled plugins are discovered (so they
# show up in `hermes plugins`) but only loaded when added to
# `plugins.enabled` in config.yaml — opt-in like any other plugin.
#
# Repo-shipped plugins live next to hermes_cli/. Two layouts are
# supported (see ``_scan_directory`` for details):
#
# - flat: ``plugins/disk-cleanup/plugin.yaml`` (standalone)
# - category: ``plugins/image_gen/openai/plugin.yaml`` (backend)
#
# ``memory/`` and ``context_engine/`` are skipped at the top level —
# they have their own discovery systems. Porting those to the
# category-namespace ``kind: exclusive`` model is a future PR.
repo_plugins = Path(__file__).resolve().parent.parent / "plugins"
manifests.extend(
self._scan_directory(
@@ -492,36 +544,69 @@ class PluginManager:
manifests.extend(self._scan_entry_points())
# Load each manifest (skip user-disabled plugins).
# Later sources override earlier ones on name collision — user plugins
# take precedence over bundled, project plugins take precedence over
# user. Dedup here so we only load the final winner.
# Later sources override earlier ones on key collision — user
# plugins take precedence over bundled, project plugins take
# precedence over user. Dedup here so we only load the final
# winner. Keys are path-derived (``image_gen/openai``,
# ``disk-cleanup``) so ``tts/openai`` and ``image_gen/openai``
# don't collide even when both manifests say ``name: openai``.
disabled = _get_disabled_plugins()
enabled = _get_enabled_plugins() # None = opt-in default (nothing enabled)
winners: Dict[str, PluginManifest] = {}
for manifest in manifests:
winners[manifest.name] = manifest
winners[manifest.key or manifest.name] = manifest
for manifest in winners.values():
# Explicit disable always wins.
if manifest.name in disabled:
lookup_key = manifest.key or manifest.name
# Explicit disable always wins (matches on key or on legacy
# bare name for back-compat with existing user configs).
if lookup_key in disabled or manifest.name in disabled:
loaded = LoadedPlugin(manifest=manifest, enabled=False)
loaded.error = "disabled via config"
self._plugins[manifest.name] = loaded
logger.debug("Skipping disabled plugin '%s'", manifest.name)
self._plugins[lookup_key] = loaded
logger.debug("Skipping disabled plugin '%s'", lookup_key)
continue
# Opt-in gate: plugins must be in the enabled allow-list.
# If the allow-list is missing (None), treat as "nothing enabled"
# — users have to explicitly enable plugins to load them.
# Memory and context_engine providers are excluded from this gate
# since they have their own single-select config (memory.provider
# / context.engine), not the enabled list.
if enabled is None or manifest.name not in enabled:
# Exclusive plugins (memory providers) have their own
# discovery/activation path. The general loader records the
# manifest for introspection but does not load the module.
if manifest.kind == "exclusive":
loaded = LoadedPlugin(manifest=manifest, enabled=False)
loaded.error = "not enabled in config (run `hermes plugins enable {}` to activate)".format(
manifest.name
loaded.error = (
"exclusive plugin — activate via <category>.provider config"
)
self._plugins[manifest.name] = loaded
self._plugins[lookup_key] = loaded
logger.debug(
"Skipping '%s' (not in plugins.enabled)", manifest.name
"Skipping '%s' (exclusive, handled by category discovery)",
lookup_key,
)
continue
# Built-in backends auto-load — they ship with hermes and must
# just work. Selection among them (e.g. which image_gen backend
# services calls) is driven by ``<category>.provider`` config,
# enforced by the tool wrapper.
if manifest.kind == "backend" and manifest.source == "bundled":
self._load_plugin(manifest)
continue
# Everything else (standalone, user-installed backends,
# entry-point plugins) is opt-in via plugins.enabled.
# Accept both the path-derived key and the legacy bare name
# so existing configs keep working.
is_enabled = (
enabled is not None
and (lookup_key in enabled or manifest.name in enabled)
)
if not is_enabled:
loaded = LoadedPlugin(manifest=manifest, enabled=False)
loaded.error = (
"not enabled in config (run `hermes plugins enable {}` to activate)"
.format(lookup_key)
)
self._plugins[lookup_key] = loaded
logger.debug(
"Skipping '%s' (not in plugins.enabled)", lookup_key
)
continue
self._load_plugin(manifest)
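The gating order in the loop above (disable, then exclusive, then bundled backend, then opt-in) can be summarised as a pure decision function. This is a sketch under assumptions: `load_decision` and its string results are hypothetical, and the real loop records a `LoadedPlugin` rather than returning a string.

```python
from typing import Optional


def load_decision(
    key: str,
    name: str,
    kind: str,
    source: str,
    enabled: Optional[set[str]],
    disabled: set[str],
) -> str:
    """Return 'load' or a 'skip: ...' reason, checked in the loop's order."""
    lookup_key = key or name
    # 1. Explicit disable always wins (key or legacy bare name).
    if lookup_key in disabled or name in disabled:
        return "skip: disabled via config"
    # 2. Exclusive plugins are left to their category's own discovery.
    if kind == "exclusive":
        return "skip: exclusive"
    # 3. Bundled backends auto-load without an allow-list entry.
    if kind == "backend" and source == "bundled":
        return "load"
    # 4. Everything else is opt-in; a missing allow-list enables nothing.
    if enabled is not None and (lookup_key in enabled or name in enabled):
        return "load"
    return "skip: not enabled"
```

Note that a bundled backend can still be switched off, because the disable check runs before the auto-load branch.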
@@ -545,9 +630,37 @@ class PluginManager:
) -> List[PluginManifest]:
"""Read ``plugin.yaml`` manifests from subdirectories of *path*.
*skip_names* is an optional allow-list of names to ignore (used
for the bundled scan to exclude ``memory`` / ``context_engine``
subdirs that have their own discovery path).
Supports two layouts, mixed freely:
* **Flat**: ``<root>/<plugin-name>/plugin.yaml``. Key is
``<plugin-name>`` (e.g. ``disk-cleanup``).
* **Category**: ``<root>/<category>/<plugin-name>/plugin.yaml``,
where the ``<category>`` directory itself has no ``plugin.yaml``.
Key is ``<category>/<plugin-name>`` (e.g. ``image_gen/openai``).
Depth is capped at two segments.
*skip_names* is an optional set of names to ignore at the
top level (kept for back-compat; the current call sites no longer
pass it now that categories are first-class).
"""
return self._scan_directory_level(
path, source, skip_names=skip_names, prefix="", depth=0
)
def _scan_directory_level(
self,
path: Path,
source: str,
*,
skip_names: Optional[Set[str]],
prefix: str,
depth: int,
) -> List[PluginManifest]:
"""Recursive implementation of :meth:`_scan_directory`.
``prefix`` is the category path already accumulated ("" at root,
"image_gen" one level in). ``depth`` is the recursion depth; we
cap at 2 so ``<root>/a/b/c/`` is ignored.
"""
manifests: List[PluginManifest] = []
if not path.is_dir():
@@ -556,37 +669,112 @@ class PluginManager:
for child in sorted(path.iterdir()):
if not child.is_dir():
continue
if skip_names and child.name in skip_names:
if depth == 0 and skip_names and child.name in skip_names:
continue
manifest_file = child / "plugin.yaml"
if not manifest_file.exists():
manifest_file = child / "plugin.yml"
if not manifest_file.exists():
logger.debug("Skipping %s (no plugin.yaml)", child)
if manifest_file.exists():
manifest = self._parse_manifest(
manifest_file, child, source, prefix
)
if manifest is not None:
manifests.append(manifest)
continue
try:
if yaml is None:
logger.warning("PyYAML not installed; cannot load %s", manifest_file)
continue
data = yaml.safe_load(manifest_file.read_text()) or {}
manifest = PluginManifest(
name=data.get("name", child.name),
version=str(data.get("version", "")),
description=data.get("description", ""),
author=data.get("author", ""),
requires_env=data.get("requires_env", []),
provides_tools=data.get("provides_tools", []),
provides_hooks=data.get("provides_hooks", []),
source=source,
path=str(child),
# No manifest at this level. If we're still within the depth
# cap, treat this directory as a category namespace and recurse
# one level in looking for children with manifests.
if depth >= 1:
logger.debug("Skipping %s (no plugin.yaml, depth cap reached)", child)
continue
sub_prefix = f"{prefix}/{child.name}" if prefix else child.name
manifests.extend(
self._scan_directory_level(
child,
source,
skip_names=None,
prefix=sub_prefix,
depth=depth + 1,
)
manifests.append(manifest)
except Exception as exc:
logger.warning("Failed to parse %s: %s", manifest_file, exc)
)
return manifests
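The two-layout scan with its depth cap can be sketched without any YAML parsing. `scan_keys` is a hypothetical helper that only checks for the presence of a manifest file and returns the path-derived keys. As a simplification, it uses the directory name for flat plugins, whereas the code above uses the manifest's `name` field, which defaults to the directory name.

```python
from pathlib import Path


def scan_keys(root: Path, prefix: str = "", depth: int = 0) -> list[str]:
    """Collect plugin keys from flat and one-level category layouts."""
    keys: list[str] = []
    if not root.is_dir():
        return keys
    for child in sorted(root.iterdir()):
        if not child.is_dir():
            continue
        has_manifest = (child / "plugin.yaml").exists() or (
            child / "plugin.yml"
        ).exists()
        if has_manifest:
            keys.append(f"{prefix}/{child.name}" if prefix else child.name)
            continue
        if depth >= 1:
            continue  # depth cap reached: do not descend a third level
        sub_prefix = f"{prefix}/{child.name}" if prefix else child.name
        keys.extend(scan_keys(child, sub_prefix, depth + 1))
    return keys
```

Given a tree containing `disk-cleanup/plugin.yaml`, `image_gen/openai/plugin.yaml` and `a/b/c/plugin.yaml`, this yields `disk-cleanup` and `image_gen/openai`; the third manifest sits below the cap and is never inspected.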
def _parse_manifest(
self,
manifest_file: Path,
plugin_dir: Path,
source: str,
prefix: str,
) -> Optional[PluginManifest]:
"""Parse a single ``plugin.yaml`` into a :class:`PluginManifest`.
Returns ``None`` on parse failure (logs a warning).
"""
try:
if yaml is None:
logger.warning("PyYAML not installed; cannot load %s", manifest_file)
return None
data = yaml.safe_load(manifest_file.read_text()) or {}
name = data.get("name", plugin_dir.name)
key = f"{prefix}/{plugin_dir.name}" if prefix else name
raw_kind = data.get("kind", "standalone")
if not isinstance(raw_kind, str):
raw_kind = "standalone"
kind = raw_kind.strip().lower()
if kind not in _VALID_PLUGIN_KINDS:
logger.warning(
"Plugin %s: unknown kind '%s' (valid: %s); treating as 'standalone'",
key, raw_kind, ", ".join(sorted(_VALID_PLUGIN_KINDS)),
)
kind = "standalone"
# Auto-coerce user-installed memory providers to kind="exclusive"
# so they're routed to plugins/memory discovery instead of being
# loaded by the general PluginManager (which has no
# register_memory_provider on PluginContext). Mirrors the
# heuristic in plugins/memory/__init__.py:_is_memory_provider_dir.
# Bundled memory providers are already skipped via skip_names.
if kind == "standalone" and "kind" not in data:
init_file = plugin_dir / "__init__.py"
if init_file.exists():
try:
source_text = init_file.read_text(errors="replace")[:8192]
if (
"register_memory_provider" in source_text
or "MemoryProvider" in source_text
):
kind = "exclusive"
logger.debug(
"Plugin %s: detected memory provider, "
"treating as kind='exclusive'",
key,
)
except Exception:
pass
return PluginManifest(
name=name,
version=str(data.get("version", "")),
description=data.get("description", ""),
author=data.get("author", ""),
requires_env=data.get("requires_env", []),
provides_tools=data.get("provides_tools", []),
provides_hooks=data.get("provides_hooks", []),
source=source,
path=str(plugin_dir),
kind=kind,
key=key,
)
except Exception as exc:
logger.warning("Failed to parse %s: %s", manifest_file, exc)
return None
# -----------------------------------------------------------------------
# Entry-point scanning
# -----------------------------------------------------------------------
@@ -609,6 +797,7 @@ class PluginManager:
name=ep.name,
source="entrypoint",
path=ep.value,
key=ep.name,
)
manifests.append(manifest)
except Exception as exc:
@@ -670,10 +859,16 @@ class PluginManager:
loaded.error = str(exc)
logger.warning("Failed to load plugin '%s': %s", manifest.name, exc)
self._plugins[manifest.name] = loaded
self._plugins[manifest.key or manifest.name] = loaded
def _load_directory_module(self, manifest: PluginManifest) -> types.ModuleType:
"""Import a directory-based plugin as ``hermes_plugins.<name>``."""
"""Import a directory-based plugin as ``hermes_plugins.<slug>``.
The module slug is derived from ``manifest.key`` so category-namespaced
plugins (``image_gen/openai``) import as
``hermes_plugins.image_gen__openai`` without colliding with any
future ``tts/openai``.
"""
plugin_dir = Path(manifest.path) # type: ignore[arg-type]
init_file = plugin_dir / "__init__.py"
if not init_file.exists():
@@ -686,7 +881,9 @@ class PluginManager:
ns_pkg.__package__ = _NS_PARENT
sys.modules[_NS_PARENT] = ns_pkg
module_name = f"{_NS_PARENT}.{manifest.name.replace('-', '_')}"
key = manifest.key or manifest.name
slug = key.replace("/", "__").replace("-", "_")
module_name = f"{_NS_PARENT}.{slug}"
spec = importlib.util.spec_from_file_location(
module_name,
init_file,
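The slug derivation in this hunk is small enough to check in isolation. In the sketch below, `module_name_for` is a hypothetical helper, and `NS_PARENT = "hermes_plugins"` is an assumption based on the docstring (`hermes_plugins.<slug>`); the actual value of `_NS_PARENT` is not shown in this diff.

```python
NS_PARENT = "hermes_plugins"  # assumed value of _NS_PARENT


def module_name_for(key: str, name: str = "") -> str:
    """Build the import name for a plugin from its key (or legacy name)."""
    effective = key or name
    # "/" separates category from plugin; "-" is not valid in identifiers.
    slug = effective.replace("/", "__").replace("-", "_")
    return f"{NS_PARENT}.{slug}"
```

The mapping is not injective in general: keys such as `a-b` and `a_b` would produce the same slug, so this relies on plugin directories not differing only by hyphen versus underscore.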
@@ -767,10 +964,12 @@ class PluginManager:
def list_plugins(self) -> List[Dict[str, Any]]:
"""Return a list of info dicts for all discovered plugins."""
result: List[Dict[str, Any]] = []
for name, loaded in sorted(self._plugins.items()):
for key, loaded in sorted(self._plugins.items()):
result.append(
{
"name": name,
"name": loaded.manifest.name,
"key": loaded.manifest.key or loaded.manifest.name,
"kind": loaded.manifest.kind,
"version": loaded.manifest.version,
"description": loaded.manifest.description,
"source": loaded.manifest.source,
+45
@@ -23,6 +23,12 @@ import logging
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple
from hermes_cli.volcengine_byteplus import (
BYTEPLUS_PROVIDER,
BYTEPLUS_STANDARD_BASE_URL,
VOLCENGINE_PROVIDER,
VOLCENGINE_STANDARD_BASE_URL,
)
from utils import base_url_host_matches, base_url_hostname
logger = logging.getLogger(__name__)
@@ -94,6 +100,12 @@ HERMES_OVERLAYS: Dict[str, HermesOverlay] = {
transport="openai_chat",
base_url_env_var="KIMI_BASE_URL",
),
"stepfun": HermesOverlay(
transport="openai_chat",
extra_env_vars=("STEPFUN_API_KEY",),
base_url_override="https://api.stepfun.ai/step_plan/v1",
base_url_env_var="STEPFUN_BASE_URL",
),
"minimax": HermesOverlay(
transport="anthropic_messages",
base_url_env_var="MINIMAX_BASE_URL",
@@ -157,6 +169,16 @@ HERMES_OVERLAYS: Dict[str, HermesOverlay] = {
transport="openai_chat",
base_url_env_var="OLLAMA_BASE_URL",
),
VOLCENGINE_PROVIDER: HermesOverlay(
transport="openai_chat",
extra_env_vars=("VOLCENGINE_API_KEY",),
base_url_override=VOLCENGINE_STANDARD_BASE_URL,
),
BYTEPLUS_PROVIDER: HermesOverlay(
transport="openai_chat",
extra_env_vars=("BYTEPLUS_API_KEY",),
base_url_override=BYTEPLUS_STANDARD_BASE_URL,
),
}
@@ -210,6 +232,10 @@ ALIASES: Dict[str, str] = {
"kimi-coding-cn": "kimi-for-coding",
"moonshot": "kimi-for-coding",
# stepfun
"step": "stepfun",
"stepfun-coding-plan": "stepfun",
# minimax-cn
"minimax-china": "minimax-cn",
"minimax_cn": "minimax-cn",
@@ -263,6 +289,10 @@ ALIASES: Dict[str, str] = {
# xiaomi
"mimo": "xiaomi",
"xiaomi-mimo": "xiaomi",
"volcengine-coding-plan": VOLCENGINE_PROVIDER,
"volcengine_coding_plan": VOLCENGINE_PROVIDER,
"byteplus-coding-plan": BYTEPLUS_PROVIDER,
"byteplus_coding_plan": BYTEPLUS_PROVIDER,
# bedrock
"aws": "bedrock",
@@ -294,7 +324,10 @@ _LABEL_OVERRIDES: Dict[str, str] = {
"nous": "Nous Portal",
"openai-codex": "OpenAI Codex",
"copilot-acp": "GitHub Copilot ACP",
"stepfun": "StepFun Step Plan",
"xiaomi": "Xiaomi MiMo",
VOLCENGINE_PROVIDER: "Volcengine",
BYTEPLUS_PROVIDER: "BytePlus",
"local": "Local endpoint",
"bedrock": "AWS Bedrock",
"ollama-cloud": "Ollama Cloud",
@@ -427,6 +460,16 @@ def determine_api_mode(provider: str, base_url: str = "") -> str:
"""
pdef = get_provider(provider)
if pdef is not None:
# Even for known providers, check URL heuristics for special endpoints
# (e.g. kimi /coding endpoint needs anthropic_messages even on 'custom')
if base_url:
url_lower = base_url.rstrip("/").lower()
if "api.kimi.com/coding" in url_lower:
return "anthropic_messages"
if url_lower.endswith("/anthropic") or "api.anthropic.com" in url_lower:
return "anthropic_messages"
if "api.openai.com" in url_lower:
return "codex_responses"
return TRANSPORT_TO_API_MODE.get(pdef.transport, "chat_completions")
# Direct provider checks for providers not in HERMES_OVERLAYS
@@ -439,6 +482,8 @@ def determine_api_mode(provider: str, base_url: str = "") -> str:
hostname = base_url_hostname(base_url)
if url_lower.endswith("/anthropic") or hostname == "api.anthropic.com":
return "anthropic_messages"
if hostname == "api.kimi.com" and "/coding" in url_lower:
return "anthropic_messages"
if hostname == "api.openai.com":
return "codex_responses"
if hostname.startswith("bedrock-runtime.") and base_url_host_matches(base_url, "amazonaws.com"):
+10 -3
@@ -46,6 +46,9 @@ def _detect_api_mode_for_url(base_url: str) -> Optional[str]:
protocol under a ``/anthropic`` suffix treat those as
``anthropic_messages`` transport instead of the default
``chat_completions``.
- Kimi Code's ``api.kimi.com/coding`` endpoint also speaks the
Anthropic Messages protocol (the /coding route accepts Claude
Code's native request shape).
"""
normalized = (base_url or "").strip().lower().rstrip("/")
hostname = base_url_hostname(base_url)
@@ -55,6 +58,8 @@ def _detect_api_mode_for_url(base_url: str) -> Optional[str]:
return "codex_responses"
if normalized.endswith("/anthropic"):
return "anthropic_messages"
if hostname == "api.kimi.com" and "/coding" in normalized:
return "anthropic_messages"
return None
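A minimal standalone sketch of the URL-based detection in this hunk. It assumes `urllib.parse.urlparse` as a stand-in for the repo's `base_url_hostname` helper, and the `api.openai.com` branch is inferred from the surrounding comments rather than shown in the hunk; `detect_api_mode` is a hypothetical name.

```python
from typing import Optional
from urllib.parse import urlparse


def detect_api_mode(base_url: str) -> Optional[str]:
    """Hypothetical stand-in for ``_detect_api_mode_for_url``."""
    normalized = (base_url or "").strip().lower().rstrip("/")
    # Assumption: urlparse approximates the repo's base_url_hostname helper.
    hostname = urlparse(normalized).hostname or ""
    if hostname == "api.openai.com":
        return "codex_responses"
    if normalized.endswith("/anthropic"):
        return "anthropic_messages"
    # Match on the parsed hostname so a path that merely contains
    # "api.kimi.com/coding" on another host is not misdetected.
    if hostname == "api.kimi.com" and "/coding" in normalized:
        return "anthropic_messages"
    return None


print(detect_api_mode("https://api.kimi.com/coding/"))
print(detect_api_mode("https://proxy.example/api.kimi.com/coding"))
```

Comparing the parsed hostname, rather than searching the whole URL for a substring, is what keeps the second call from being treated as a Kimi endpoint.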
@@ -205,7 +210,8 @@ def _resolve_runtime_from_pool_entry(
api_mode = opencode_model_api_mode(provider, model_cfg.get("default", ""))
else:
# Auto-detect Anthropic-compatible endpoints (/anthropic suffix,
# api.openai.com → codex_responses, api.x.ai → codex_responses).
# Kimi /coding, api.openai.com → codex_responses, api.x.ai →
# codex_responses).
detected = _detect_api_mode_for_url(base_url)
if detected:
api_mode = detected
@@ -637,7 +643,7 @@ def _resolve_explicit_runtime(
base_url = explicit_base_url
if not base_url:
if provider in ("kimi-coding", "kimi-coding-cn"):
if provider in ("kimi-coding", "kimi-coding-cn", "volcengine", "byteplus"):
creds = resolve_api_key_provider_credentials(provider)
base_url = creds.get("base_url", "").rstrip("/")
else:
@@ -660,7 +666,8 @@ def _resolve_explicit_runtime(
if configured_mode:
api_mode = configured_mode
else:
# Auto-detect Anthropic-compatible endpoints (/anthropic suffix).
# Auto-detect from URL (Anthropic /anthropic suffix,
# api.openai.com → Responses, Kimi /coding, etc.).
detected = _detect_api_mode_for_url(base_url)
if detected:
api_mode = detected
+27 -2
@@ -96,6 +96,7 @@ _DEFAULT_PROVIDER_MODELS = {
"zai": ["glm-5.1", "glm-5", "glm-4.7", "glm-4.5", "glm-4.5-flash"],
"kimi-coding": ["kimi-k2.6", "kimi-k2.5", "kimi-k2-thinking", "kimi-k2-turbo-preview"],
"kimi-coding-cn": ["kimi-k2.6", "kimi-k2.5", "kimi-k2-thinking", "kimi-k2-turbo-preview"],
"stepfun": ["step-3.5-flash", "step-3.5-flash-2603"],
"arcee": ["trinity-large-thinking", "trinity-large-preview", "trinity-mini"],
"minimax": ["MiniMax-M2.7", "MiniMax-M2.5", "MiniMax-M2.1", "MiniMax-M2"],
"minimax-cn": ["MiniMax-M2.7", "MiniMax-M2.5", "MiniMax-M2.1", "MiniMax-M2"],
@@ -408,13 +409,36 @@ def _print_setup_summary(config: dict, hermes_home):
("Browser Automation", False, missing_browser_hint)
)
# FAL (image generation)
# Image generation — FAL (direct or via Nous), or any plugin-registered
# provider (OpenAI, etc.)
if subscription_features.image_gen.managed_by_nous:
tool_status.append(("Image Generation (Nous subscription)", True, None))
elif subscription_features.image_gen.available:
tool_status.append(("Image Generation", True, None))
else:
tool_status.append(("Image Generation", False, "FAL_KEY"))
# Fall back to probing plugin-registered providers so OpenAI-only
# setups don't show as "missing FAL_KEY".
_img_backend = None
try:
from agent.image_gen_registry import list_providers
from hermes_cli.plugins import _ensure_plugins_discovered
_ensure_plugins_discovered()
for _p in list_providers():
if _p.name == "fal":
continue
try:
if _p.is_available():
_img_backend = _p.display_name
break
except Exception:
continue
except Exception:
pass
if _img_backend:
tool_status.append((f"Image Generation ({_img_backend})", True, None))
else:
tool_status.append(("Image Generation", False, "FAL_KEY or OPENAI_API_KEY"))
# TTS — show configured provider
tts_provider = config.get("tts", {}).get("provider", "edge")
@@ -781,6 +805,7 @@ def setup_model_provider(config: dict, *, quick: bool = False):
"zai": "Z.AI / GLM",
"kimi-coding": "Kimi / Moonshot",
"kimi-coding-cn": "Kimi / Moonshot (China)",
"stepfun": "StepFun Step Plan",
"minimax": "MiniMax",
"minimax-cn": "MiniMax CN",
"anthropic": "Anthropic",
+2
@@ -122,6 +122,7 @@ def show_status(args):
"OpenAI": "OPENAI_API_KEY",
"Z.AI/GLM": "GLM_API_KEY",
"Kimi": "KIMI_API_KEY",
"StepFun Step Plan": "STEPFUN_API_KEY",
"MiniMax": "MINIMAX_API_KEY",
"MiniMax-CN": "MINIMAX_CN_API_KEY",
"Firecrawl": "FIRECRAWL_API_KEY",
@@ -252,6 +253,7 @@ def show_status(args):
apikey_providers = {
"Z.AI / GLM": ("GLM_API_KEY", "ZAI_API_KEY", "Z_AI_API_KEY"),
"Kimi / Moonshot": ("KIMI_API_KEY",),
"StepFun Step Plan": ("STEPFUN_API_KEY",),
"MiniMax": ("MINIMAX_API_KEY",),
"MiniMax (China)": ("MINIMAX_CN_API_KEY",),
}
+182 -1
@@ -847,6 +847,51 @@ def _configure_toolset(ts_key: str, config: dict):
_configure_simple_requirements(ts_key)
def _plugin_image_gen_providers() -> list[dict]:
"""Build picker-row dicts from plugin-registered image gen providers.
Each returned dict looks like a regular ``TOOL_CATEGORIES`` provider
row but carries an ``image_gen_plugin_name`` marker so downstream
code (config writing, model picker) knows to route through the
plugin registry instead of the in-tree FAL backend.
FAL is skipped because it's already exposed by the hardcoded
``TOOL_CATEGORIES["image_gen"]`` entries. When FAL gets ported to
a plugin in a follow-up PR, the hardcoded entries go away and this
function surfaces it alongside OpenAI automatically.
"""
try:
from agent.image_gen_registry import list_providers
from hermes_cli.plugins import _ensure_plugins_discovered
_ensure_plugins_discovered()
providers = list_providers()
except Exception:
return []
rows: list[dict] = []
for provider in providers:
if getattr(provider, "name", None) == "fal":
# FAL has its own hardcoded rows today.
continue
try:
schema = provider.get_setup_schema()
except Exception:
continue
if not isinstance(schema, dict):
continue
rows.append(
{
"name": schema.get("name", provider.display_name),
"badge": schema.get("badge", ""),
"tag": schema.get("tag", ""),
"env_vars": schema.get("env_vars", []),
"image_gen_plugin_name": provider.name,
}
)
return rows
def _visible_providers(cat: dict, config: dict) -> list[dict]:
"""Return provider entries visible for the current auth/config state."""
features = get_nous_subscription_features(config)
@@ -857,6 +902,12 @@ def _visible_providers(cat: dict, config: dict) -> list[dict]:
if provider.get("requires_nous_auth") and not features.nous_auth_present:
continue
visible.append(provider)
# Inject plugin-registered image_gen backends (OpenAI today, more
# later) so the picker lists them alongside FAL / Nous Subscription.
if cat.get("name") == "Image Generation":
visible.extend(_plugin_image_gen_providers())
return visible
@@ -876,7 +927,24 @@ def _toolset_needs_configuration_prompt(ts_key: str, config: dict) -> bool:
browser_cfg = config.get("browser", {})
return not isinstance(browser_cfg, dict) or "cloud_provider" not in browser_cfg
if ts_key == "image_gen":
return not fal_key_is_configured()
# Satisfied when the in-tree FAL backend is configured OR any
# plugin-registered image gen provider is available.
if fal_key_is_configured():
return False
try:
from agent.image_gen_registry import list_providers
from hermes_cli.plugins import _ensure_plugins_discovered
_ensure_plugins_discovered()
for provider in list_providers():
try:
if provider.is_available():
return False
except Exception:
continue
except Exception:
pass
return True
return not _toolset_has_keys(ts_key, config)
@@ -1095,6 +1163,88 @@ def _configure_imagegen_model(backend_name: str, config: dict) -> None:
_print_success(f" Model set to: {chosen}")
def _plugin_image_gen_catalog(plugin_name: str):
"""Return ``(catalog_dict, default_model_id)`` for a plugin provider.
``catalog_dict`` is shaped like the legacy ``FAL_MODELS`` table
(``{model_id: {"display", "speed", "strengths", "price", ...}}``),
so the existing picker code paths work without change. Returns
``({}, None)`` if the provider isn't registered or has no models.
"""
try:
from agent.image_gen_registry import get_provider
from hermes_cli.plugins import _ensure_plugins_discovered
_ensure_plugins_discovered()
provider = get_provider(plugin_name)
except Exception:
return {}, None
if provider is None:
return {}, None
try:
models = provider.list_models() or []
default = provider.default_model()
except Exception:
return {}, None
catalog = {m["id"]: m for m in models if isinstance(m, dict) and "id" in m}
return catalog, default
def _configure_imagegen_model_for_plugin(plugin_name: str, config: dict) -> None:
"""Prompt the user to pick a model for a plugin-registered backend.
Writes selection to ``image_gen.model``. Mirrors
:func:`_configure_imagegen_model` but sources its catalog from the
plugin registry instead of :data:`IMAGEGEN_BACKENDS`.
"""
catalog, default_model = _plugin_image_gen_catalog(plugin_name)
if not catalog:
return
cur_cfg = config.setdefault("image_gen", {})
if not isinstance(cur_cfg, dict):
cur_cfg = {}
config["image_gen"] = cur_cfg
current_model = cur_cfg.get("model") or default_model
if current_model not in catalog:
current_model = default_model
model_ids = list(catalog.keys())
ordered = [current_model] + [m for m in model_ids if m != current_model]
widths = {
"model": max(len(m) for m in model_ids),
"speed": max((len(catalog[m].get("speed", "")) for m in model_ids), default=6),
"strengths": max((len(catalog[m].get("strengths", "")) for m in model_ids), default=0),
}
print()
header = (
f" {'Model':<{widths['model']}} "
f"{'Speed':<{widths['speed']}} "
f"{'Strengths':<{widths['strengths']}} "
f"Price"
)
print(color(header, Colors.CYAN))
rows = []
for mid in ordered:
row = _format_imagegen_model_row(mid, catalog[mid], widths)
if mid == current_model:
row += " ← currently in use"
rows.append(row)
idx = _prompt_choice(
f" Choose {plugin_name} model:",
rows,
default=0,
)
chosen = ordered[idx]
cur_cfg["model"] = chosen
_print_success(f" Model set to: {chosen}")
def _configure_provider(provider: dict, config: dict):
"""Configure a single provider - prompt for API keys and set config."""
env_vars = provider.get("env_vars", [])
@@ -1151,10 +1301,28 @@ def _configure_provider(provider: dict, config: dict):
_print_success(f" {provider['name']} - no configuration needed!")
if managed_feature:
_print_info(" Requests for this tool will be billed to your Nous subscription.")
# Plugin-registered image_gen provider: write image_gen.provider
# and route model selection to the plugin's own catalog.
plugin_name = provider.get("image_gen_plugin_name")
if plugin_name:
img_cfg = config.setdefault("image_gen", {})
if not isinstance(img_cfg, dict):
img_cfg = {}
config["image_gen"] = img_cfg
img_cfg["provider"] = plugin_name
_print_success(f" image_gen.provider set to: {plugin_name}")
_configure_imagegen_model_for_plugin(plugin_name, config)
return
# Imagegen backends prompt for model selection after backend pick.
backend = provider.get("imagegen_backend")
if backend:
_configure_imagegen_model(backend, config)
# In-tree FAL is the only non-plugin backend today. Keep
# image_gen.provider clear so the dispatch shim falls through
# to the legacy FAL path.
img_cfg = config.setdefault("image_gen", {})
if isinstance(img_cfg, dict) and img_cfg.get("provider") not in (None, "", "fal"):
img_cfg["provider"] = "fal"
return
# Prompt for each required env var
@@ -1189,10 +1357,23 @@ def _configure_provider(provider: dict, config: dict):
if all_configured:
_print_success(f" {provider['name']} configured!")
plugin_name = provider.get("image_gen_plugin_name")
if plugin_name:
img_cfg = config.setdefault("image_gen", {})
if not isinstance(img_cfg, dict):
img_cfg = {}
config["image_gen"] = img_cfg
img_cfg["provider"] = plugin_name
_print_success(f" image_gen.provider set to: {plugin_name}")
_configure_imagegen_model_for_plugin(plugin_name, config)
return
# Imagegen backends prompt for model selection after env vars are in.
backend = provider.get("imagegen_backend")
if backend:
_configure_imagegen_model(backend, config)
img_cfg = config.setdefault("image_gen", {})
if isinstance(img_cfg, dict) and img_cfg.get("provider") not in (None, "", "fal"):
img_cfg["provider"] = "fal"
def _configure_simple_requirements(ts_key: str):
+134
@@ -0,0 +1,134 @@
"""Source-of-truth contracts for built-in providers without models.dev catalogs."""
from __future__ import annotations
from typing import Dict, List, Tuple
VOLCENGINE_PROVIDER = "volcengine"
BYTEPLUS_PROVIDER = "byteplus"
VOLCENGINE_STANDARD_BASE_URL = "https://ark.cn-beijing.volces.com/api/v3"
VOLCENGINE_CODING_PLAN_BASE_URL = "https://ark.cn-beijing.volces.com/api/coding/v3"
BYTEPLUS_STANDARD_BASE_URL = "https://ark.ap-southeast.bytepluses.com/api/v3"
BYTEPLUS_CODING_PLAN_BASE_URL = "https://ark.ap-southeast.bytepluses.com/api/coding/v3"
VOLCENGINE_STANDARD_MODELS: Tuple[str, ...] = (
"doubao-seed-2-0-pro-260215",
"doubao-seed-2-0-lite-260215",
"doubao-seed-2-0-mini-260215",
"doubao-seed-2-0-code-preview-260215",
"kimi-k2-5-260127",
"glm-4-7-251222",
"deepseek-v3-2-251201",
)
VOLCENGINE_CODING_PLAN_MODELS: Tuple[str, ...] = (
"doubao-seed-2.0-code",
"doubao-seed-2.0-pro",
"doubao-seed-2.0-lite",
"doubao-seed-code",
"minimax-m2.5",
"glm-4.7",
"deepseek-v3.2",
"kimi-k2.5",
)
BYTEPLUS_STANDARD_MODELS: Tuple[str, ...] = (
"seed-2-0-pro-260328",
"seed-2-0-lite-260228",
"seed-2-0-mini-260215",
"kimi-k2-5-260127",
"glm-4-7-251222",
)
BYTEPLUS_CODING_PLAN_MODELS: Tuple[str, ...] = (
"dola-seed-2.0-pro",
"dola-seed-2.0-lite",
"bytedance-seed-code",
"glm-4.7",
"kimi-k2.5",
"gpt-oss-120b",
)
VOLCENGINE_STANDARD_MODEL_REFS: Tuple[str, ...] = tuple(
f"{VOLCENGINE_PROVIDER}/{model_id}" for model_id in VOLCENGINE_STANDARD_MODELS
)
VOLCENGINE_CODING_PLAN_MODEL_REFS: Tuple[str, ...] = tuple(
f"{VOLCENGINE_PROVIDER}-coding-plan/{model_id}" for model_id in VOLCENGINE_CODING_PLAN_MODELS
)
BYTEPLUS_STANDARD_MODEL_REFS: Tuple[str, ...] = tuple(
f"{BYTEPLUS_PROVIDER}/{model_id}" for model_id in BYTEPLUS_STANDARD_MODELS
)
BYTEPLUS_CODING_PLAN_MODEL_REFS: Tuple[str, ...] = tuple(
f"{BYTEPLUS_PROVIDER}-coding-plan/{model_id}" for model_id in BYTEPLUS_CODING_PLAN_MODELS
)
PROVIDER_MODEL_CATALOGS: Dict[str, Tuple[str, ...]] = {
VOLCENGINE_PROVIDER: VOLCENGINE_STANDARD_MODEL_REFS + VOLCENGINE_CODING_PLAN_MODEL_REFS,
BYTEPLUS_PROVIDER: BYTEPLUS_STANDARD_MODEL_REFS + BYTEPLUS_CODING_PLAN_MODEL_REFS,
}
MODEL_CONTEXT_WINDOWS: Dict[str, int] = {
"doubao-seed-2-0-pro-260215": 256000,
"doubao-seed-2-0-lite-260215": 256000,
"doubao-seed-2-0-mini-260215": 256000,
"doubao-seed-2-0-code-preview-260215": 256000,
"kimi-k2-5-260127": 256000,
"glm-4-7-251222": 200000,
"deepseek-v3-2-251201": 128000,
"doubao-seed-2.0-code": 256000,
"doubao-seed-2.0-pro": 256000,
"doubao-seed-2.0-lite": 256000,
"doubao-seed-code": 256000,
"minimax-m2.5": 200000,
"glm-4.7": 200000,
"deepseek-v3.2": 128000,
"kimi-k2.5": 256000,
"seed-2-0-pro-260328": 256000,
"seed-2-0-lite-260228": 256000,
"seed-2-0-mini-260215": 256000,
}
def provider_models(provider_id: str) -> List[str]:
"""Return the full user-facing model catalog for a provider."""
return list(PROVIDER_MODEL_CATALOGS.get(provider_id, ()))
def _bare_model_name(model_name: str) -> str:
value = (model_name or "").strip()
if not value:
return ""
if "/" in value:
return value.split("/", 1)[1].strip()
return value
def is_coding_plan_model(provider_id: str, model_name: str) -> bool:
"""Return True when a model belongs to the coding-plan catalog."""
raw = (model_name or "").strip()
bare = _bare_model_name(raw)
if provider_id == VOLCENGINE_PROVIDER:
return raw in VOLCENGINE_CODING_PLAN_MODEL_REFS or bare in VOLCENGINE_CODING_PLAN_MODELS
if provider_id == BYTEPLUS_PROVIDER:
return raw in BYTEPLUS_CODING_PLAN_MODEL_REFS or bare in BYTEPLUS_CODING_PLAN_MODELS
return False
def base_url_for_provider_model(provider_id: str, model_name: str) -> str:
"""Resolve the source-of-truth base URL for a provider+model pair."""
if provider_id == VOLCENGINE_PROVIDER:
if is_coding_plan_model(provider_id, model_name):
return VOLCENGINE_CODING_PLAN_BASE_URL
return VOLCENGINE_STANDARD_BASE_URL
if provider_id == BYTEPLUS_PROVIDER:
if is_coding_plan_model(provider_id, model_name):
return BYTEPLUS_CODING_PLAN_BASE_URL
return BYTEPLUS_STANDARD_BASE_URL
return ""
def model_context_window(model_name: str) -> int | None:
"""Return a known context window for a model, if specified by the contract."""
bare = _bare_model_name(model_name)
return MODEL_CONTEXT_WINDOWS.get(bare)
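A minimal usage sketch of the resolution rule in this file, assuming only the Volcengine constants and a few model names shown above. It re-declares a small subset so it runs standalone; `resolve_base_url` is a hypothetical stand-in for `base_url_for_provider_model`, not the shipped function.

```python
from typing import Tuple

# Subset of the constants from this file, re-declared so the sketch is standalone.
STANDARD_BASE_URL = "https://ark.cn-beijing.volces.com/api/v3"
CODING_PLAN_BASE_URL = "https://ark.cn-beijing.volces.com/api/coding/v3"
CODING_PLAN_MODELS: Tuple[str, ...] = ("doubao-seed-2.0-code", "glm-4.7", "kimi-k2.5")


def bare_model_name(model_name: str) -> str:
    """Drop an optional ``provider/`` prefix, mirroring ``_bare_model_name``."""
    value = (model_name or "").strip()
    return value.split("/", 1)[1].strip() if "/" in value else value


def resolve_base_url(model_name: str) -> str:
    """Hypothetical stand-in for ``base_url_for_provider_model`` (Volcengine only)."""
    if bare_model_name(model_name) in CODING_PLAN_MODELS:
        return CODING_PLAN_BASE_URL
    return STANDARD_BASE_URL


print(resolve_base_url("volcengine-coding-plan/glm-4.7"))
print(resolve_base_url("volcengine/glm-4-7-251222"))
```

Because the check falls back to the bare model name, a reference such as `volcengine/glm-4.7` would also resolve to the coding-plan URL in this sketch; the two catalogs stay distinguishable only because their model IDs differ (`glm-4.7` versus `glm-4-7-251222`).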
+6 -3
@@ -2189,7 +2189,8 @@ async def get_usage_analytics(days: int = 30):
SUM(reasoning_tokens) as reasoning_tokens,
COALESCE(SUM(estimated_cost_usd), 0) as estimated_cost,
COALESCE(SUM(actual_cost_usd), 0) as actual_cost,
COUNT(*) as sessions
COUNT(*) as sessions,
SUM(COALESCE(api_call_count, 0)) as api_calls
FROM sessions WHERE started_at > ?
GROUP BY day ORDER BY day
""", (cutoff,))
@@ -2200,7 +2201,8 @@ async def get_usage_analytics(days: int = 30):
SUM(input_tokens) as input_tokens,
SUM(output_tokens) as output_tokens,
COALESCE(SUM(estimated_cost_usd), 0) as estimated_cost,
COUNT(*) as sessions
COUNT(*) as sessions,
SUM(COALESCE(api_call_count, 0)) as api_calls
FROM sessions WHERE started_at > ? AND model IS NOT NULL
GROUP BY model ORDER BY SUM(input_tokens) + SUM(output_tokens) DESC
""", (cutoff,))
@@ -2213,7 +2215,8 @@ async def get_usage_analytics(days: int = 30):
SUM(reasoning_tokens) as total_reasoning,
COALESCE(SUM(estimated_cost_usd), 0) as total_estimated_cost,
COALESCE(SUM(actual_cost_usd), 0) as total_actual_cost,
COUNT(*) as total_sessions
COUNT(*) as total_sessions,
SUM(COALESCE(api_call_count, 0)) as total_api_calls
FROM sessions WHERE started_at > ?
""", (cutoff,))
totals = dict(cur3.fetchone())
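A small sketch of what the `SUM(COALESCE(api_call_count, 0))` aggregate buys, using an in-memory SQLite table. The trimmed `sessions` schema and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
# Assumption: a trimmed-down sessions table with only the columns needed here.
conn.execute(
    "CREATE TABLE sessions (id TEXT PRIMARY KEY, model TEXT, api_call_count INTEGER)"
)
conn.executemany(
    "INSERT INTO sessions (id, model, api_call_count) VALUES (?, ?, ?)",
    [
        ("s1", "glm-4.7", 3),
        ("s2", "glm-4.7", None),
        ("s3", "kimi-k2.5", 5),
        ("s4", "legacy", None),  # a group whose only row has no count
    ],
)

rows = conn.execute(
    """SELECT model,
              COUNT(*) AS sessions,
              SUM(COALESCE(api_call_count, 0)) AS api_calls
       FROM sessions
       GROUP BY model ORDER BY model"""
).fetchall()

per_model = {row["model"]: (row["sessions"], row["api_calls"]) for row in rows}
print(per_model)
```

A plain `SUM(api_call_count)` skips NULLs but yields NULL for a group where every value is NULL; wrapping each value in `COALESCE` makes such a group report 0 instead.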
+154 -6
@@ -31,7 +31,7 @@ T = TypeVar("T")
DEFAULT_DB_PATH = get_hermes_home() / "state.db"
SCHEMA_VERSION = 6
SCHEMA_VERSION = 8
SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS schema_version (
@@ -65,6 +65,7 @@ CREATE TABLE IF NOT EXISTS sessions (
cost_source TEXT,
pricing_version TEXT,
title TEXT,
api_call_count INTEGER DEFAULT 0,
FOREIGN KEY (parent_session_id) REFERENCES sessions(id)
);
@@ -80,10 +81,16 @@ CREATE TABLE IF NOT EXISTS messages (
token_count INTEGER,
finish_reason TEXT,
reasoning TEXT,
reasoning_content TEXT,
reasoning_details TEXT,
codex_reasoning_items TEXT
);
CREATE TABLE IF NOT EXISTS state_meta (
key TEXT PRIMARY KEY,
value TEXT
);
CREATE INDEX IF NOT EXISTS idx_sessions_source ON sessions(source);
CREATE INDEX IF NOT EXISTS idx_sessions_parent ON sessions(parent_session_id);
CREATE INDEX IF NOT EXISTS idx_sessions_started ON sessions(started_at DESC);
@@ -329,6 +336,26 @@ class SessionDB:
except sqlite3.OperationalError:
pass # Column already exists
cursor.execute("UPDATE schema_version SET version = 6")
if current_version < 7:
# v7: preserve provider-native reasoning_content separately from
# normalized reasoning text. Kimi/Moonshot replay can require
# this field on assistant tool-call messages when thinking is on.
try:
cursor.execute('ALTER TABLE messages ADD COLUMN "reasoning_content" TEXT')
except sqlite3.OperationalError:
pass # Column already exists
cursor.execute("UPDATE schema_version SET version = 7")
if current_version < 8:
# v8: add api_call_count column to sessions — tracks the number
# of individual LLM API calls made within a session (as opposed
# to the session count itself).
try:
cursor.execute(
'ALTER TABLE sessions ADD COLUMN "api_call_count" INTEGER DEFAULT 0'
)
except sqlite3.OperationalError:
pass # Column already exists
cursor.execute("UPDATE schema_version SET version = 8")
# Unique title index — always ensure it exists (safe to run after migrations
# since the title column is guaranteed to exist at this point)
@@ -435,6 +462,7 @@ class SessionDB:
billing_provider: Optional[str] = None,
billing_base_url: Optional[str] = None,
billing_mode: Optional[str] = None,
api_call_count: int = 0,
absolute: bool = False,
) -> None:
"""Update token counters and backfill model if not already set.
@@ -464,7 +492,8 @@ class SessionDB:
billing_provider = COALESCE(billing_provider, ?),
billing_base_url = COALESCE(billing_base_url, ?),
billing_mode = COALESCE(billing_mode, ?),
model = COALESCE(model, ?)
model = COALESCE(model, ?),
api_call_count = ?
WHERE id = ?"""
else:
sql = """UPDATE sessions SET
@@ -484,7 +513,8 @@ class SessionDB:
billing_provider = COALESCE(billing_provider, ?),
billing_base_url = COALESCE(billing_base_url, ?),
billing_mode = COALESCE(billing_mode, ?),
model = COALESCE(model, ?)
model = COALESCE(model, ?),
api_call_count = COALESCE(api_call_count, 0) + ?
WHERE id = ?"""
params = (
input_tokens,
@@ -502,6 +532,7 @@ class SessionDB:
billing_base_url,
billing_mode,
model,
api_call_count,
session_id,
)
def _do(conn):
@@ -922,6 +953,7 @@ class SessionDB:
token_count: int = None,
finish_reason: str = None,
reasoning: str = None,
reasoning_content: str = None,
reasoning_details: Any = None,
codex_reasoning_items: Any = None,
) -> int:
@@ -951,8 +983,8 @@ class SessionDB:
cursor = conn.execute(
"""INSERT INTO messages (session_id, role, content, tool_call_id,
tool_calls, tool_name, timestamp, token_count, finish_reason,
reasoning, reasoning_details, codex_reasoning_items)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
reasoning, reasoning_content, reasoning_details, codex_reasoning_items)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
(
session_id,
role,
@@ -964,6 +996,7 @@ class SessionDB:
token_count,
finish_reason,
reasoning,
reasoning_content,
reasoning_details_json,
codex_items_json,
),
@@ -1014,7 +1047,7 @@ class SessionDB:
with self._lock:
cursor = self._conn.execute(
"SELECT role, content, tool_call_id, tool_calls, tool_name, "
"reasoning, reasoning_details, codex_reasoning_items "
"reasoning, reasoning_content, reasoning_details, codex_reasoning_items "
"FROM messages WHERE session_id = ? ORDER BY timestamp, id",
(session_id,),
)
@@ -1038,6 +1071,8 @@ class SessionDB:
if row["role"] == "assistant":
if row["reasoning"]:
msg["reasoning"] = row["reasoning"]
if row["reasoning_content"] is not None:
msg["reasoning_content"] = row["reasoning_content"]
if row["reasoning_details"]:
try:
msg["reasoning_details"] = json.loads(row["reasoning_details"])
@@ -1441,3 +1476,116 @@ class SessionDB:
return len(session_ids)
return self._execute_write(_do)
# ── Meta key/value (for scheduler bookkeeping) ──
def get_meta(self, key: str) -> Optional[str]:
"""Read a value from the state_meta key/value store."""
with self._lock:
row = self._conn.execute(
"SELECT value FROM state_meta WHERE key = ?", (key,)
).fetchone()
if row is None:
return None
return row["value"] if isinstance(row, sqlite3.Row) else row[0]
def set_meta(self, key: str, value: str) -> None:
"""Write a value to the state_meta key/value store."""
def _do(conn):
conn.execute(
"INSERT INTO state_meta (key, value) VALUES (?, ?) "
"ON CONFLICT(key) DO UPDATE SET value = excluded.value",
(key, value),
)
self._execute_write(_do)
# ── Space reclamation ──
def vacuum(self) -> None:
"""Run VACUUM to reclaim disk space after large deletes.
SQLite does not shrink the database file when rows are deleted;
freed pages just get reused on the next insert. After a prune that
removed hundreds of sessions, the file stays bloated unless we
explicitly VACUUM.
VACUUM rewrites the entire DB, so it's expensive (seconds per
100MB) and cannot run inside a transaction. It also acquires an
exclusive lock, so callers must ensure no other writers are
active. Safe to call at startup before the gateway/CLI starts
serving traffic.
"""
# VACUUM cannot be executed inside a transaction.
with self._lock:
# Best-effort WAL checkpoint first, then VACUUM.
try:
self._conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")
except Exception:
pass
self._conn.execute("VACUUM")
def maybe_auto_prune_and_vacuum(
self,
retention_days: int = 90,
min_interval_hours: int = 24,
vacuum: bool = True,
) -> Dict[str, Any]:
"""Idempotent auto-maintenance: prune old sessions + optional VACUUM.
Records the last run timestamp in state_meta so subsequent calls
within ``min_interval_hours`` no-op. Designed to be called once at
startup from long-lived entrypoints (CLI, gateway, cron scheduler).
Never raises. On any failure, logs a warning and returns a dict
with ``"error"`` set.
Returns a dict with keys:
- ``"skipped"`` (bool): true if within min_interval_hours of last run
- ``"pruned"`` (int): number of sessions deleted
- ``"vacuumed"`` (bool): true if VACUUM ran
- ``"error"`` (str, optional): present only on failure
"""
result: Dict[str, Any] = {"skipped": False, "pruned": 0, "vacuumed": False}
try:
# Skip if another process/call did maintenance recently.
last_raw = self.get_meta("last_auto_prune")
now = time.time()
if last_raw:
try:
last_ts = float(last_raw)
if now - last_ts < min_interval_hours * 3600:
result["skipped"] = True
return result
except (TypeError, ValueError):
pass # corrupt meta; treat as no prior run
pruned = self.prune_sessions(older_than_days=retention_days)
result["pruned"] = pruned
# Only VACUUM if we actually freed rows — VACUUM on a tight DB
# is wasted I/O. Threshold keeps small DBs from paying the cost.
if vacuum and pruned > 0:
try:
self.vacuum()
result["vacuumed"] = True
except Exception as exc:
logger.warning("state.db VACUUM failed: %s", exc)
# Record the attempt even if pruned == 0, so we don't retry
# every startup within the min_interval_hours window.
self.set_meta("last_auto_prune", str(now))
if pruned > 0:
logger.info(
"state.db auto-maintenance: pruned %d session(s) older than %d days%s",
pruned,
retention_days,
" + VACUUM" if result["vacuumed"] else "",
)
except Exception as exc:
# Maintenance must never block startup. Log and return error marker.
logger.warning("state.db auto-maintenance failed: %s", exc)
result["error"] = str(exc)
return result
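A minimal sketch of the bookkeeping behind `maybe_auto_prune_and_vacuum`, assuming an in-memory database and module-level helpers instead of `SessionDB` methods. `should_run` is a hypothetical name for the interval gate; the upsert statement is the one from `set_meta`.

```python
import sqlite3
import time
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE state_meta (key TEXT PRIMARY KEY, value TEXT)")


def set_meta(key: str, value: str) -> None:
    # Upsert: insert the key, or overwrite its value if it already exists.
    conn.execute(
        "INSERT INTO state_meta (key, value) VALUES (?, ?) "
        "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
        (key, value),
    )


def get_meta(key: str) -> Optional[str]:
    row = conn.execute("SELECT value FROM state_meta WHERE key = ?", (key,)).fetchone()
    return None if row is None else row[0]


def should_run(now: float, min_interval_hours: int = 24) -> bool:
    """Hypothetical gate: False while the last recorded run is too recent."""
    last_raw = get_meta("last_auto_prune")
    if last_raw:
        try:
            if now - float(last_raw) < min_interval_hours * 3600:
                return False
        except (TypeError, ValueError):
            pass  # corrupt value: treat as no prior run
    return True


now = time.time()
first = should_run(now)                 # no record yet
set_meta("last_auto_prune", str(now))
second = should_run(now + 60)           # one minute later
third = should_run(now + 25 * 3600)     # after the 24-hour window
print(first, second, third)
```

Storing the timestamp even when nothing was pruned is what keeps repeated startups inside the window from re-running the prune query.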
@@ -0,0 +1,5 @@
# Web Development
Optional skills for client-side web development workflows — embedding agents, copilots, and AI-native UX patterns into user-facing web apps.
These are distinct from Hermes' own browser automation tools (Browserbase, Camofox), which operate *on* websites from the outside. The web-development skills here help users build agents *into* their own websites.
@@ -0,0 +1,189 @@
---
name: page-agent
description: Embed alibaba/page-agent into your own web application — a pure-JavaScript in-page GUI agent that ships as a single <script> tag or npm package and lets end-users of your site drive the UI with natural language ("click login, fill username as John"). No Python, no headless browser, no extension required. Use this skill when the user is a web developer who wants to add an AI copilot to their SaaS / admin panel / B2B tool, make a legacy web app accessible via natural language, or evaluate page-agent against a local (Ollama) or cloud (Qwen / OpenAI / OpenRouter) LLM. NOT for server-side browser automation — point those users to Hermes' built-in browser tool instead.
version: 1.0.0
author: Hermes Agent
license: MIT
metadata:
  hermes:
    tags: [web, javascript, agent, browser, gui, alibaba, embed, copilot, saas]
    category: web-development
---
# page-agent
alibaba/page-agent (https://github.com/alibaba/page-agent, 17k+ stars, MIT) is an in-page GUI agent written in TypeScript. It lives inside a webpage, reads the DOM as text (no screenshots, no multi-modal LLM), and executes natural-language instructions like "click the login button, then fill username as John" against the current page. Pure client-side — the host site just includes a script and passes an OpenAI-compatible LLM endpoint.
## When to use this skill
Load this skill when a user wants to:
- **Ship an AI copilot inside their own web app** (SaaS, admin panel, B2B tool, ERP, CRM) — "users on my dashboard should be able to type 'create invoice for Acme Corp and email it' instead of clicking through five screens"
- **Modernize a legacy web app** without rewriting the frontend — page-agent drops on top of existing DOM
- **Add accessibility via natural language** — voice / screen-reader users drive the UI by describing what they want
- **Demo or evaluate page-agent** against a local (Ollama) or hosted (Qwen, OpenAI, OpenRouter) LLM
- **Build interactive training / product demos** — let an AI walk a user through "how to submit an expense report" live in the real UI
## When NOT to use this skill
- User wants **Hermes itself to drive a browser** → use Hermes' built-in browser tool (Browserbase / Camofox). page-agent is the *opposite* direction.
- User wants **cross-tab automation without embedding** → use Playwright, browser-use, or the page-agent Chrome extension
- User needs **visual grounding / screenshots** → page-agent is text-DOM only; use a multimodal browser agent instead
## Prerequisites
- Node 22.13+ or 24+, npm 10+ (docs claim 11+ but 10.9 works fine)
- An OpenAI-compatible LLM endpoint: Qwen (DashScope), OpenAI, Ollama, OpenRouter, or anything speaking `/v1/chat/completions`
- Browser with devtools (for debugging)
## Path 1 — 30-second demo via CDN (no install)
Fastest way to see it work. Uses alibaba's free testing LLM proxy — **for evaluation only**, subject to their terms.
Add to any HTML page (or use the bookmarklet form below to inject it into a page you don't control):
```html
<script src="https://cdn.jsdelivr.net/npm/page-agent@1.8.0/dist/iife/page-agent.demo.js" crossorigin="true"></script>
```
A panel appears. Type an instruction. Done.
Bookmarklet form (drop into bookmarks bar, click on any page):
```javascript
javascript:(function(){var s=document.createElement('script');s.src='https://cdn.jsdelivr.net/npm/page-agent@1.8.0/dist/iife/page-agent.demo.js';document.head.appendChild(s);})();
```
## Path 2 — npm install into your own web app (production use)
Inside an existing web project (React / Vue / Svelte / plain):
```bash
npm install page-agent
```
Wire it up with your own LLM endpoint — **never ship the demo CDN to real users**:
```javascript
import { PageAgent } from 'page-agent'
const agent = new PageAgent({
model: 'qwen3.5-plus',
baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
apiKey: process.env.LLM_API_KEY, // never hardcode
language: 'en-US',
})
// Show the panel for end users:
agent.panel.show()
// Or drive it programmatically:
await agent.execute('Click submit button, then fill username as John')
```
Provider examples (any OpenAI-compatible endpoint works):
| Provider | `baseURL` | `model` |
|----------|-----------|---------|
| Qwen / DashScope | `https://dashscope.aliyuncs.com/compatible-mode/v1` | `qwen3.5-plus` |
| OpenAI | `https://api.openai.com/v1` | `gpt-4o-mini` |
| Ollama (local) | `http://localhost:11434/v1` | `qwen3:14b` |
| OpenRouter | `https://openrouter.ai/api/v1` | `anthropic/claude-sonnet-4.6` |
**Key config fields** (passed to `new PageAgent({...})`):
- `model`, `baseURL`, `apiKey` — LLM connection
- `language` — UI language (`en-US`, `zh-CN`, etc.)
- Allowlist and data-masking hooks exist for locking down what the agent can touch — see https://alibaba.github.io/page-agent/ for the full option list
**Security.** Don't put your `apiKey` in client-side code for a real deployment — proxy LLM calls through your backend and point `baseURL` at your proxy. The demo CDN exists because alibaba runs that proxy for evaluation.
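One way the backend side of that proxy could look, sketched in Python with only the standard library. Everything here is an assumption for illustration: `build_upstream_request` is a hypothetical helper, the route wiring and end-user authentication are left to whatever web framework you use, and the upstream is assumed to accept OpenAI-style `POST .../chat/completions` with a bearer token.

```python
import json
import urllib.request


def build_upstream_request(
    path: str, body: bytes, base_url: str, api_key: str
) -> urllib.request.Request:
    """Re-issue a browser request against the real provider.

    The API key is attached here, on the server, so it never reaches
    client-side code.
    """
    json.loads(body)  # reject non-JSON payloads early (raises ValueError)
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/" + path.lstrip("/"),
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


req = build_upstream_request(
    "/chat/completions",
    b'{"model": "gpt-4o-mini", "messages": []}',
    base_url="https://api.openai.com/v1",
    api_key="sk-placeholder",  # in practice, read from a server-side secret store
)
print(req.full_url)
```

A route handler would pass the result to `urllib.request.urlopen` and stream the response back to the browser, while `baseURL` in `new PageAgent({...})` points at that route instead of the provider.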
## Path 3 — clone the source repo (contributing, or hacking on it)
Use this when the user wants to modify page-agent itself, test it against arbitrary sites via a local IIFE bundle, or develop the browser extension.
```bash
git clone https://github.com/alibaba/page-agent.git
cd page-agent
npm ci # exact lockfile install (or `npm i` to allow updates)
```
Create `.env` in the repo root with an LLM endpoint. Example:
```
LLM_MODEL_NAME=gpt-4o-mini
LLM_API_KEY=sk-...
LLM_BASE_URL=https://api.openai.com/v1
```
Ollama flavor:
```
LLM_BASE_URL=http://localhost:11434/v1
LLM_API_KEY=NA
LLM_MODEL_NAME=qwen3:14b
```
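Before starting the dev server it can help to confirm that `.env` defines all three keys. The following Python sketch is a simplified check written for this guide, not a tool shipped with the repo; `parse_env` handles only plain `KEY=value` lines and is not a full dotenv parser.

```python
REQUIRED_KEYS = ("LLM_MODEL_NAME", "LLM_API_KEY", "LLM_BASE_URL")


def parse_env(text):
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


example = """
LLM_BASE_URL=http://localhost:11434/v1
LLM_API_KEY=NA
LLM_MODEL_NAME=qwen3:14b
"""
print(missing_keys(parse_env(example)))
```

With the Ollama example above, nothing is missing, so the printed list is empty.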
Common commands:
```bash
npm start # docs/website dev server
npm run build # build every package
npm run dev:demo # serve IIFE bundle at http://localhost:5174/page-agent.demo.js
npm run dev:ext # develop the browser extension (WXT + React)
npm run build:ext # build the extension
```
**Test on any website** using the local IIFE bundle. Add this bookmarklet:
```javascript
javascript:(function(){var s=document.createElement('script');s.src=`http://localhost:5174/page-agent.demo.js?t=${Math.random()}`;s.onload=()=>console.log('PageAgent ready!');document.head.appendChild(s);})();
```
Then: `npm run dev:demo`, click the bookmarklet on any page, and the local build injects. Auto-rebuilds on save.
**Warning:** your `.env` `LLM_API_KEY` is inlined into the IIFE bundle during dev builds. Don't share the bundle. Don't commit it. Don't paste the URL into Slack. (Verified: grepping the public dev bundle returns the literal values from `.env`.)
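The grep check mentioned in the warning can be scripted. This Python sketch is illustrative only: `find_leaked_keys` and the `min_length` cutoff are names and choices made for this example. It reports which `.env` entries appear verbatim in a bundle's text, and skips very short values such as `NA` to avoid false positives.

```python
def find_leaked_keys(bundle_text, env, min_length=6):
    """Return the names of env entries whose literal value occurs in the bundle."""
    return sorted(
        name
        for name, value in env.items()
        if len(value) >= min_length and value in bundle_text
    )


# Hypothetical bundle fragment and env values, for demonstration only.
bundle = 'const cfg={apiKey:"sk-secret-123",baseURL:"https://api.openai.com/v1"};'
env = {
    "LLM_API_KEY": "sk-secret-123",
    "LLM_BASE_URL": "https://api.openai.com/v1",
    "LLM_MODEL_NAME": "gpt-4o-mini",
}
print(find_leaked_keys(bundle, env))
```

Running the same function over the text served by the dev server and over any production build is a quick way to confirm the key only shows up where you expect it.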
## Repo layout (Path 3)
Monorepo with npm workspaces. Key packages:
| Package | Path | Purpose |
|---------|------|---------|
| `page-agent` | `packages/page-agent/` | Main entry with UI panel |
| `@page-agent/core` | `packages/core/` | Core agent logic, no UI |
| `@page-agent/mcp` | `packages/mcp/` | MCP server (beta) |
| — | `packages/llms/` | LLM client |
| — | `packages/page-controller/` | DOM ops + visual feedback |
| — | `packages/ui/` | Panel + i18n |
| — | `packages/extension/` | Chrome/Firefox extension |
| — | `packages/website/` | Docs + landing site |
## Verifying it works
After Path 1 or Path 2:
1. Open the page in a browser with devtools open
2. You should see a floating panel. If not, check the console for errors (most common: CORS on the LLM endpoint, wrong `baseURL`, or a bad API key)
3. Type a simple instruction matching something visible on the page ("click the Login link")
4. Watch the Network tab — you should see a request to your `baseURL`
After Path 3:
1. `npm run dev:demo` prints `Accepting connections at http://localhost:5174`
2. `curl -I http://localhost:5174/page-agent.demo.js` returns `HTTP/1.1 200 OK` with `Content-Type: application/javascript`
3. Click the bookmarklet on any site; panel appears
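The `curl -I` check in the Path 3 steps can also be done from Python. This is a small sketch under assumptions: the function names are invented for this example, and accepting `text/javascript` alongside `application/javascript` is a guess about what a dev server might send, not documented page-agent behavior.

```python
import urllib.request

JS_MEDIA_TYPES = {"application/javascript", "text/javascript"}


def looks_like_js_bundle(status, content_type):
    """True when the response is a 200 with a JavaScript media type."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return status == 200 and media_type in JS_MEDIA_TYPES


def check_bundle(url="http://localhost:5174/page-agent.demo.js", timeout=5.0):
    """Send a HEAD request to the dev server and classify the response."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        content_type = response.headers.get("Content-Type", "")
        return looks_like_js_bundle(response.status, content_type)
```

`check_bundle()` raises `urllib.error.URLError` when nothing is listening on the port, which usually means `npm run dev:demo` is not running.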
## Pitfalls
- **Demo CDN in production** — don't. It's rate-limited, uses Alibaba's free proxy, and their terms forbid production use.
- **API key exposure** — any key passed to `new PageAgent({apiKey: ...})` ships in your JS bundle. Always proxy through your own backend for real deployments.
- **Non-OpenAI-compatible endpoints** fail silently or with cryptic errors. If your provider needs native Anthropic/Gemini formatting, use an OpenAI-compatibility proxy (LiteLLM, OpenRouter) in front.
- **CSP blocks** — sites with strict Content-Security-Policy may refuse to load the CDN script or disallow inline eval. In that case, self-host from your origin.
- **Restart dev server** after editing `.env` in Path 3 — Vite only reads env at startup.
- **Node version** — the repo declares `^22.13.0 || >=24`. Node 20 will fail `npm ci` with engine errors.
- **npm 10 vs 11** — docs say npm 11+; npm 10.9 actually works fine.
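To see why Node 20 is rejected, the engine range `^22.13.0 || >=24` can be spelled out as a plain comparison. The Python sketch below is a simplified reading of that range written for this guide: it ignores prerelease tags and is not a general semver implementation.

```python
def parse_version(text):
    """Turn 'v22.13.0' or '22.13.0' into a (major, minor, patch) tuple."""
    core = text.strip().lstrip("v").split("-", 1)[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


def node_version_supported(text):
    """Check a Node version against ^22.13.0 || >=24."""
    version = parse_version(text)
    in_caret_range = (22, 13, 0) <= version < (23, 0, 0)  # ^22.13.0
    return in_caret_range or version >= (24, 0, 0)        # >=24


for candidate in ("v20.11.1", "22.12.0", "22.13.0", "23.5.0", "24.0.0"):
    print(candidate, node_version_supported(candidate))
```

Note that Node 23 falls in the gap between the two ranges, so it is rejected along with anything older than 22.13.0.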
## Reference
- Repo: https://github.com/alibaba/page-agent
- Docs: https://alibaba.github.io/page-agent/
- License: MIT (built on browser-use's DOM processing internals, Copyright 2024 Gregor Zunic)
+303
@@ -0,0 +1,303 @@
"""OpenAI image generation backend.

Exposes OpenAI's ``gpt-image-2`` model at three quality tiers as an
:class:`ImageGenProvider` implementation. The tiers are implemented as
three virtual model IDs so the ``hermes tools`` model picker and the
``image_gen.model`` config key behave like any other multi-model backend:

    gpt-image-2-low      ~15s    fastest, good for iteration
    gpt-image-2-medium   ~40s    default balanced
    gpt-image-2-high     ~2min   slowest, highest fidelity

All three hit the same underlying API model (``gpt-image-2``) with a
different ``quality`` parameter. Output is base64 JSON saved under
``$HERMES_HOME/cache/images/``.

Selection precedence (first hit wins):

1. ``OPENAI_IMAGE_MODEL`` env var (escape hatch for scripts / tests)
2. ``image_gen.openai.model`` in ``config.yaml``
3. ``image_gen.model`` in ``config.yaml`` (when it's one of our tier IDs)
4. :data:`DEFAULT_MODEL` ``gpt-image-2-medium``
"""

from __future__ import annotations

import logging
import os
from typing import Any, Dict, List, Optional, Tuple

from agent.image_gen_provider import (
    DEFAULT_ASPECT_RATIO,
    ImageGenProvider,
    error_response,
    resolve_aspect_ratio,
    save_b64_image,
    success_response,
)

logger = logging.getLogger(__name__)

# ---------------------------------------------------------------------------
# Model catalog
# ---------------------------------------------------------------------------
#
# All three IDs resolve to the same underlying API model with a different
# ``quality`` setting. ``api_model`` is what gets sent to OpenAI;
# ``quality`` is the knob that changes generation time and output fidelity.

API_MODEL = "gpt-image-2"

_MODELS: Dict[str, Dict[str, Any]] = {
    "gpt-image-2-low": {
        "display": "GPT Image 2 (Low)",
        "speed": "~15s",
        "strengths": "Fast iteration, lowest cost",
        "quality": "low",
    },
    "gpt-image-2-medium": {
        "display": "GPT Image 2 (Medium)",
        "speed": "~40s",
        "strengths": "Balanced — default",
        "quality": "medium",
    },
    "gpt-image-2-high": {
        "display": "GPT Image 2 (High)",
        "speed": "~2min",
        "strengths": "Highest fidelity, strongest prompt adherence",
        "quality": "high",
    },
}

DEFAULT_MODEL = "gpt-image-2-medium"

_SIZES = {
    "landscape": "1536x1024",
    "square": "1024x1024",
    "portrait": "1024x1536",
}


def _load_openai_config() -> Dict[str, Any]:
    """Read ``image_gen`` from config.yaml (returns {} on any failure)."""
    try:
        from hermes_cli.config import load_config

        cfg = load_config()
        section = cfg.get("image_gen") if isinstance(cfg, dict) else None
        return section if isinstance(section, dict) else {}
    except Exception as exc:
        logger.debug("Could not load image_gen config: %s", exc)
        return {}


def _resolve_model() -> Tuple[str, Dict[str, Any]]:
    """Decide which tier to use and return ``(model_id, meta)``."""
    env_override = os.environ.get("OPENAI_IMAGE_MODEL")
    if env_override and env_override in _MODELS:
        return env_override, _MODELS[env_override]

    cfg = _load_openai_config()
    openai_cfg = cfg.get("openai") if isinstance(cfg.get("openai"), dict) else {}
    candidate: Optional[str] = None
    if isinstance(openai_cfg, dict):
        value = openai_cfg.get("model")
        if isinstance(value, str) and value in _MODELS:
            candidate = value
    if candidate is None:
        top = cfg.get("model")
        if isinstance(top, str) and top in _MODELS:
            candidate = top

    if candidate is not None:
        return candidate, _MODELS[candidate]

    return DEFAULT_MODEL, _MODELS[DEFAULT_MODEL]


# ---------------------------------------------------------------------------
# Provider
# ---------------------------------------------------------------------------


class OpenAIImageGenProvider(ImageGenProvider):
    """OpenAI ``images.generate`` backend — gpt-image-2 at low/medium/high."""

    @property
    def name(self) -> str:
        return "openai"

    @property
    def display_name(self) -> str:
        return "OpenAI"

    def is_available(self) -> bool:
        if not os.environ.get("OPENAI_API_KEY"):
            return False
        try:
            import openai  # noqa: F401
        except ImportError:
            return False
        return True

    def list_models(self) -> List[Dict[str, Any]]:
        return [
            {
                "id": model_id,
                "display": meta["display"],
                "speed": meta["speed"],
                "strengths": meta["strengths"],
                "price": "varies",
            }
            for model_id, meta in _MODELS.items()
        ]

    def default_model(self) -> Optional[str]:
        return DEFAULT_MODEL

    def get_setup_schema(self) -> Dict[str, Any]:
        return {
            "name": "OpenAI",
            "badge": "paid",
            "tag": "gpt-image-2 at low/medium/high quality tiers",
            "env_vars": [
                {
                    "key": "OPENAI_API_KEY",
                    "prompt": "OpenAI API key",
                    "url": "https://platform.openai.com/api-keys",
                },
            ],
        }

    def generate(
        self,
        prompt: str,
        aspect_ratio: str = DEFAULT_ASPECT_RATIO,
        **kwargs: Any,
    ) -> Dict[str, Any]:
        prompt = (prompt or "").strip()
        aspect = resolve_aspect_ratio(aspect_ratio)

        if not prompt:
            return error_response(
                error="Prompt is required and must be a non-empty string",
                error_type="invalid_argument",
                provider="openai",
                aspect_ratio=aspect,
            )

        if not os.environ.get("OPENAI_API_KEY"):
            return error_response(
                error=(
                    "OPENAI_API_KEY not set. Run `hermes tools` → Image "
                    "Generation → OpenAI to configure, or `hermes setup` "
                    "to add the key."
                ),
                error_type="auth_required",
                provider="openai",
                aspect_ratio=aspect,
            )

        try:
            import openai
        except ImportError:
            return error_response(
                error="openai Python package not installed (pip install openai)",
                error_type="missing_dependency",
                provider="openai",
                aspect_ratio=aspect,
            )

        tier_id, meta = _resolve_model()
        size = _SIZES.get(aspect, _SIZES["square"])

        # gpt-image-2 returns b64_json unconditionally and REJECTS
        # ``response_format`` as an unknown parameter. Don't send it.
        payload: Dict[str, Any] = {
            "model": API_MODEL,
            "prompt": prompt,
            "size": size,
            "n": 1,
            "quality": meta["quality"],
        }

        try:
            client = openai.OpenAI()
            response = client.images.generate(**payload)
        except Exception as exc:
            logger.debug("OpenAI image generation failed", exc_info=True)
            return error_response(
                error=f"OpenAI image generation failed: {exc}",
                error_type="api_error",
                provider="openai",
                model=tier_id,
                prompt=prompt,
                aspect_ratio=aspect,
            )

        data = getattr(response, "data", None) or []
        if not data:
            return error_response(
                error="OpenAI returned no image data",
                error_type="empty_response",
                provider="openai",
                model=tier_id,
                prompt=prompt,
                aspect_ratio=aspect,
            )

        first = data[0]
        b64 = getattr(first, "b64_json", None)
        url = getattr(first, "url", None)
        revised_prompt = getattr(first, "revised_prompt", None)

        if b64:
            try:
                saved_path = save_b64_image(b64, prefix=f"openai_{tier_id}")
            except Exception as exc:
                return error_response(
                    error=f"Could not save image to cache: {exc}",
                    error_type="io_error",
                    provider="openai",
                    model=tier_id,
                    prompt=prompt,
                    aspect_ratio=aspect,
                )
            image_ref = str(saved_path)
        elif url:
            # Defensive — gpt-image-2 returns b64 today, but fall back
            # gracefully if the API ever changes.
            image_ref = url
        else:
            return error_response(
                error="OpenAI response contained neither b64_json nor URL",
                error_type="empty_response",
                provider="openai",
                model=tier_id,
                prompt=prompt,
                aspect_ratio=aspect,
            )

        extra: Dict[str, Any] = {"size": size, "quality": meta["quality"]}
        if revised_prompt:
            extra["revised_prompt"] = revised_prompt

        return success_response(
            image=image_ref,
            model=tier_id,
            prompt=prompt,
            aspect_ratio=aspect,
            provider="openai",
            extra=extra,
        )


# ---------------------------------------------------------------------------
# Plugin entry point
# ---------------------------------------------------------------------------


def register(ctx) -> None:
    """Plugin entry point — wire ``OpenAIImageGenProvider`` into the registry."""
    ctx.register_image_gen_provider(OpenAIImageGenProvider())
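
The selection precedence that `_resolve_model()` applies (env var, then `image_gen.openai.model`, then `image_gen.model`, then the default) can be exercised in isolation. The sketch below is a standalone re-statement for illustration, not the module's own code: `pick_tier`, `TIERS`, and `DEFAULT_TIER` are names invented here, and the environment and config are passed in as plain mappings so nothing is read from disk.

```python
from typing import Any, Mapping

TIERS = ("gpt-image-2-low", "gpt-image-2-medium", "gpt-image-2-high")
DEFAULT_TIER = "gpt-image-2-medium"


def pick_tier(env: Mapping[str, str], image_gen_cfg: Mapping[str, Any]) -> str:
    """Return the first candidate that is a known tier ID, else the default."""
    openai_cfg = image_gen_cfg.get("openai")
    candidates = [
        env.get("OPENAI_IMAGE_MODEL"),                                   # 1. env override
        openai_cfg.get("model") if isinstance(openai_cfg, dict) else None,  # 2. image_gen.openai.model
        image_gen_cfg.get("model"),                                      # 3. image_gen.model
    ]
    for candidate in candidates:
        if isinstance(candidate, str) and candidate in TIERS:
            return candidate
    return DEFAULT_TIER                                                  # 4. default


print(pick_tier({"OPENAI_IMAGE_MODEL": "gpt-image-2-high"}, {"model": "gpt-image-2-low"}))
print(pick_tier({}, {"openai": {"model": "gpt-image-2-low"}, "model": "gpt-image-2-high"}))
print(pick_tier({"OPENAI_IMAGE_MODEL": "not-a-tier"}, {}))
```

An unrecognised value at any level is skipped rather than treated as an error, so a typo in the env var silently falls through to the config or the default.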
+7
@@ -0,0 +1,7 @@
name: openai
version: 1.0.0
description: "OpenAI image generation backend (gpt-image-2). Saves generated images to $HERMES_HOME/cache/images/."
author: NousResearch
kind: backend
requires_env:
- OPENAI_API_KEY
+5 -2
@@ -84,7 +84,10 @@ Config file: `~/.hermes/hindsight/config.json`
| `retain_async` | `true` | Process retain asynchronously on the Hindsight server |
| `retain_every_n_turns` | `1` | Retain every N turns (1 = every turn) |
| `retain_context` | `conversation between Hermes Agent and the User` | Context label for retained memories |
| `tags` | — | Tags applied when storing memories |
| `retain_tags` | — | Default tags applied to retained memories; merged with per-call tool tags |
| `retain_source` | — | Optional `metadata.source` attached to retained memories |
| `retain_user_prefix` | `User` | Label used before user turns in auto-retained transcripts |
| `retain_assistant_prefix` | `Assistant` | Label used before assistant turns in auto-retained transcripts |
### Integration
@@ -113,7 +116,7 @@ Available in `hybrid` and `tools` memory modes:
| Tool | Description |
|------|-------------|
| `hindsight_retain` | Store information with auto entity extraction |
| `hindsight_retain` | Store information with auto entity extraction; supports optional per-call `tags` |
| `hindsight_recall` | Multi-strategy search (semantic + entity graph) |
| `hindsight_reflect` | Cross-memory synthesis (LLM-powered) |
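
To make the tag behavior concrete: configured `retain_tags` come first, per-call `tags` from `hindsight_retain` are appended, and duplicates or blank entries are dropped. The Python sketch below illustrates that merge under those assumptions. `merge_tags` is a name chosen for this example and is not the provider's actual helper.

```python
from typing import Iterable, List, Optional


def merge_tags(default_tags: Iterable[str],
               call_tags: Optional[Iterable[str]] = None) -> List[str]:
    """Merge default and per-call tags, preserving order and removing duplicates."""
    merged: List[str] = []
    seen = set()
    for tag in [*default_tags, *(call_tags or [])]:
        cleaned = str(tag).strip()
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            merged.append(cleaned)
    return merged


print(merge_tags(["hermes", "prod"], ["prod", " urgent ", ""]))
```

Keeping the configured defaults first means a per-call tag can add to the set but never displaces or reorders what the operator configured.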
+198 -37
@@ -6,11 +6,15 @@ retrieval. Supports cloud (API key) and local modes.
Original PR #1811 by benfrank241, adapted to MemoryProvider ABC.
Config via environment variables:
HINDSIGHT_API_KEY API key for Hindsight Cloud
HINDSIGHT_BANK_ID memory bank identifier (default: hermes)
HINDSIGHT_BUDGET recall budget: low/mid/high (default: mid)
HINDSIGHT_API_URL API endpoint
HINDSIGHT_MODE cloud or local (default: cloud)
HINDSIGHT_API_KEY API key for Hindsight Cloud
HINDSIGHT_BANK_ID memory bank identifier (default: hermes)
HINDSIGHT_BUDGET recall budget: low/mid/high (default: mid)
HINDSIGHT_API_URL API endpoint
HINDSIGHT_MODE cloud or local (default: cloud)
HINDSIGHT_RETAIN_TAGS comma-separated tags attached to retained memories
HINDSIGHT_RETAIN_SOURCE metadata source value attached to retained memories
HINDSIGHT_RETAIN_USER_PREFIX label used before user turns in retained transcripts
HINDSIGHT_RETAIN_ASSISTANT_PREFIX label used before assistant turns in retained transcripts
Or via $HERMES_HOME/hindsight/config.json (profile-scoped), falling back to
~/.hindsight/config.json (legacy, shared) for backward compatibility.
@@ -24,7 +28,7 @@ import logging
import os
import threading
from hermes_constants import get_hermes_home
from datetime import datetime, timezone
from typing import Any, Dict, List
from agent.memory_provider import MemoryProvider
@@ -99,6 +103,11 @@ RETAIN_SCHEMA = {
"properties": {
"content": {"type": "string", "description": "The information to store."},
"context": {"type": "string", "description": "Short label (e.g. 'user preference', 'project decision')."},
"tags": {
"type": "array",
"items": {"type": "string"},
"description": "Optional per-call tags to merge with configured default retain tags.",
},
},
"required": ["content"],
},
@@ -168,6 +177,10 @@ def _load_config() -> dict:
return {
"mode": os.environ.get("HINDSIGHT_MODE", "cloud"),
"apiKey": os.environ.get("HINDSIGHT_API_KEY", ""),
"retain_tags": os.environ.get("HINDSIGHT_RETAIN_TAGS", ""),
"retain_source": os.environ.get("HINDSIGHT_RETAIN_SOURCE", ""),
"retain_user_prefix": os.environ.get("HINDSIGHT_RETAIN_USER_PREFIX", "User"),
"retain_assistant_prefix": os.environ.get("HINDSIGHT_RETAIN_ASSISTANT_PREFIX", "Assistant"),
"banks": {
"hermes": {
"bankId": os.environ.get("HINDSIGHT_BANK_ID", "hermes"),
@@ -178,6 +191,48 @@ def _load_config() -> dict:
}
def _normalize_retain_tags(value: Any) -> List[str]:
"""Normalize tag config/tool values to a deduplicated list of strings."""
if value is None:
return []
raw_items: list[Any]
if isinstance(value, list):
raw_items = value
elif isinstance(value, str):
text = value.strip()
if not text:
return []
if text.startswith("["):
try:
parsed = json.loads(text)
except Exception:
parsed = None
if isinstance(parsed, list):
raw_items = parsed
else:
raw_items = text.split(",")
else:
raw_items = text.split(",")
else:
raw_items = [value]
normalized = []
seen = set()
for item in raw_items:
tag = str(item).strip()
if not tag or tag in seen:
continue
seen.add(tag)
normalized.append(tag)
return normalized
def _utc_timestamp() -> str:
"""Return current UTC timestamp in ISO-8601 with milliseconds and Z suffix."""
return datetime.now(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z")
# ---------------------------------------------------------------------------
# MemoryProvider implementation
# ---------------------------------------------------------------------------
@@ -195,6 +250,19 @@ class HindsightMemoryProvider(MemoryProvider):
self._llm_base_url = ""
self._memory_mode = "hybrid" # "context", "tools", or "hybrid"
self._prefetch_method = "recall" # "recall" or "reflect"
self._retain_tags: List[str] = []
self._retain_source = ""
self._retain_user_prefix = "User"
self._retain_assistant_prefix = "Assistant"
self._platform = ""
self._user_id = ""
self._user_name = ""
self._chat_id = ""
self._chat_name = ""
self._chat_type = ""
self._thread_id = ""
self._agent_identity = ""
self._turn_index = 0
self._client = None
self._prefetch_result = ""
self._prefetch_lock = threading.Lock()
@@ -210,6 +278,7 @@ class HindsightMemoryProvider(MemoryProvider):
# Retain controls
self._auto_retain = True
self._retain_every_n_turns = 1
self._retain_async = True
self._retain_context = "conversation between Hermes Agent and the User"
self._turn_counter = 0
self._session_turns: list[str] = [] # accumulates ALL turns for the session
@@ -224,7 +293,6 @@ class HindsightMemoryProvider(MemoryProvider):
# Bank
self._bank_mission = ""
self._bank_retain_mission: str | None = None
self._retain_async = True
@property
def name(self) -> str:
@@ -423,7 +491,10 @@ class HindsightMemoryProvider(MemoryProvider):
{"key": "recall_budget", "description": "Recall thoroughness", "default": "mid", "choices": ["low", "mid", "high"]},
{"key": "memory_mode", "description": "Memory integration mode", "default": "hybrid", "choices": ["hybrid", "context", "tools"]},
{"key": "recall_prefetch_method", "description": "Auto-recall method", "default": "recall", "choices": ["recall", "reflect"]},
{"key": "tags", "description": "Tags applied when storing memories (comma-separated)", "default": ""},
{"key": "retain_tags", "description": "Default tags applied to retained memories (comma-separated)", "default": ""},
{"key": "retain_source", "description": "Metadata source value attached to retained memories", "default": ""},
{"key": "retain_user_prefix", "description": "Label used before user turns in retained transcripts", "default": "User"},
{"key": "retain_assistant_prefix", "description": "Label used before assistant turns in retained transcripts", "default": "Assistant"},
{"key": "recall_tags", "description": "Tags to filter when searching memories (comma-separated)", "default": ""},
{"key": "recall_tags_match", "description": "Tag matching mode for recall", "default": "any", "choices": ["any", "all", "any_strict", "all_strict"]},
{"key": "auto_recall", "description": "Automatically recall memories before each turn", "default": True},
@@ -467,7 +538,7 @@ class HindsightMemoryProvider(MemoryProvider):
return self._client
def initialize(self, session_id: str, **kwargs) -> None:
self._session_id = session_id
self._session_id = str(session_id or "").strip()
# Check client version and auto-upgrade if needed
try:
@@ -496,6 +567,16 @@ class HindsightMemoryProvider(MemoryProvider):
pass # packaging not available or other issue — proceed anyway
self._config = _load_config()
self._platform = str(kwargs.get("platform") or "").strip()
self._user_id = str(kwargs.get("user_id") or "").strip()
self._user_name = str(kwargs.get("user_name") or "").strip()
self._chat_id = str(kwargs.get("chat_id") or "").strip()
self._chat_name = str(kwargs.get("chat_name") or "").strip()
self._chat_type = str(kwargs.get("chat_type") or "").strip()
self._thread_id = str(kwargs.get("thread_id") or "").strip()
self._agent_identity = str(kwargs.get("agent_identity") or "").strip()
self._turn_index = 0
self._session_turns = []
self._mode = self._config.get("mode", "cloud")
# "local" is a legacy alias for "local_embedded"
if self._mode == "local":
@@ -513,7 +594,7 @@ class HindsightMemoryProvider(MemoryProvider):
memory_mode = self._config.get("memory_mode", "hybrid")
self._memory_mode = memory_mode if memory_mode in ("context", "tools", "hybrid") else "hybrid"
prefetch_method = self._config.get("recall_prefetch_method", "recall")
prefetch_method = self._config.get("recall_prefetch_method") or self._config.get("prefetch_method", "recall")
self._prefetch_method = prefetch_method if prefetch_method in ("recall", "reflect") else "recall"
# Bank options
@@ -521,9 +602,22 @@ class HindsightMemoryProvider(MemoryProvider):
self._bank_retain_mission = self._config.get("bank_retain_mission") or None
# Tags
self._tags = self._config.get("tags") or None
self._retain_tags = _normalize_retain_tags(
self._config.get("retain_tags")
or os.environ.get("HINDSIGHT_RETAIN_TAGS", "")
)
self._tags = self._retain_tags or None
self._recall_tags = self._config.get("recall_tags") or None
self._recall_tags_match = self._config.get("recall_tags_match", "any")
self._retain_source = str(
self._config.get("retain_source") or os.environ.get("HINDSIGHT_RETAIN_SOURCE", "")
).strip()
self._retain_user_prefix = str(
self._config.get("retain_user_prefix") or os.environ.get("HINDSIGHT_RETAIN_USER_PREFIX", "User")
).strip() or "User"
self._retain_assistant_prefix = str(
self._config.get("retain_assistant_prefix") or os.environ.get("HINDSIGHT_RETAIN_ASSISTANT_PREFIX", "Assistant")
).strip() or "Assistant"
# Retain controls
self._auto_retain = self._config.get("auto_retain", True)
@@ -547,11 +641,9 @@ class HindsightMemoryProvider(MemoryProvider):
logger.info("Hindsight initialized: mode=%s, api_url=%s, bank=%s, budget=%s, memory_mode=%s, prefetch_method=%s, client=%s",
self._mode, self._api_url, self._bank_id, self._budget, self._memory_mode, self._prefetch_method, _client_version)
logger.debug("Hindsight config: auto_retain=%s, auto_recall=%s, retain_every_n=%d, "
"retain_async=%s, retain_context=%s, "
"recall_max_tokens=%d, recall_max_input_chars=%d, tags=%s, recall_tags=%s",
"retain_async=%s, retain_context=%s, recall_max_tokens=%d, recall_max_input_chars=%d, tags=%s, recall_tags=%s",
self._auto_retain, self._auto_recall, self._retain_every_n_turns,
self._retain_async, self._retain_context,
self._recall_max_tokens, self._recall_max_input_chars,
self._retain_async, self._retain_context, self._recall_max_tokens, self._recall_max_input_chars,
self._tags, self._recall_tags)
# For local mode, start the embedded daemon in the background so it
@@ -712,6 +804,78 @@ class HindsightMemoryProvider(MemoryProvider):
self._prefetch_thread = threading.Thread(target=_run, daemon=True, name="hindsight-prefetch")
self._prefetch_thread.start()
def _build_turn_messages(self, user_content: str, assistant_content: str) -> List[Dict[str, str]]:
now = datetime.now(timezone.utc).isoformat()
return [
{
"role": "user",
"content": f"{self._retain_user_prefix}: {user_content}",
"timestamp": now,
},
{
"role": "assistant",
"content": f"{self._retain_assistant_prefix}: {assistant_content}",
"timestamp": now,
},
]
def _build_metadata(self, *, message_count: int, turn_index: int) -> Dict[str, str]:
metadata: Dict[str, str] = {
"retained_at": _utc_timestamp(),
"message_count": str(message_count),
"turn_index": str(turn_index),
}
if self._retain_source:
metadata["source"] = self._retain_source
if self._session_id:
metadata["session_id"] = self._session_id
if self._platform:
metadata["platform"] = self._platform
if self._user_id:
metadata["user_id"] = self._user_id
if self._user_name:
metadata["user_name"] = self._user_name
if self._chat_id:
metadata["chat_id"] = self._chat_id
if self._chat_name:
metadata["chat_name"] = self._chat_name
if self._chat_type:
metadata["chat_type"] = self._chat_type
if self._thread_id:
metadata["thread_id"] = self._thread_id
if self._agent_identity:
metadata["agent_identity"] = self._agent_identity
return metadata
def _build_retain_kwargs(
self,
content: str,
*,
context: str | None = None,
document_id: str | None = None,
metadata: Dict[str, str] | None = None,
tags: List[str] | None = None,
retain_async: bool | None = None,
) -> Dict[str, Any]:
kwargs: Dict[str, Any] = {
"bank_id": self._bank_id,
"content": content,
"metadata": metadata or self._build_metadata(message_count=1, turn_index=self._turn_index),
}
if context is not None:
kwargs["context"] = context
if document_id:
kwargs["document_id"] = document_id
if retain_async is not None:
kwargs["retain_async"] = retain_async
merged_tags = _normalize_retain_tags(self._retain_tags)
for tag in _normalize_retain_tags(tags):
if tag not in merged_tags:
merged_tags.append(tag)
if merged_tags:
kwargs["tags"] = merged_tags
return kwargs
def sync_turn(self, user_content: str, assistant_content: str, *, session_id: str = "") -> None:
"""Retain conversation turn in background (non-blocking).
@@ -721,19 +885,14 @@ class HindsightMemoryProvider(MemoryProvider):
logger.debug("sync_turn: skipped (auto_retain disabled)")
return
from datetime import datetime, timezone
now = datetime.now(timezone.utc).isoformat()
if session_id:
self._session_id = str(session_id).strip()
messages = [
{"role": "user", "content": user_content, "timestamp": now},
{"role": "assistant", "content": assistant_content, "timestamp": now},
]
turn = json.dumps(messages)
turn = json.dumps(self._build_turn_messages(user_content, assistant_content))
self._session_turns.append(turn)
self._turn_counter += 1
self._turn_index = self._turn_counter
# Only retain every N turns
if self._turn_counter % self._retain_every_n_turns != 0:
logger.debug("sync_turn: buffered turn %d (will retain at turn %d)",
self._turn_counter, self._turn_counter + (self._retain_every_n_turns - self._turn_counter % self._retain_every_n_turns))
@@ -741,19 +900,21 @@ class HindsightMemoryProvider(MemoryProvider):
logger.debug("sync_turn: retaining %d turns, total session content %d chars",
len(self._session_turns), sum(len(t) for t in self._session_turns))
# Send the ENTIRE session as a single JSON array (document_id deduplicates).
# Each element in _session_turns is a JSON string of that turn's messages.
content = "[" + ",".join(self._session_turns) + "]"
def _sync():
try:
client = self._get_client()
item: dict = {
"content": content,
"context": self._retain_context,
}
if self._tags:
item["tags"] = self._tags
item = self._build_retain_kwargs(
content,
context=self._retain_context,
metadata=self._build_metadata(
message_count=len(self._session_turns) * 2,
turn_index=self._turn_index,
),
)
item.pop("bank_id", None)
item.pop("retain_async", None)
logger.debug("Hindsight retain: bank=%s, doc=%s, async=%s, content_len=%d, num_turns=%d",
self._bank_id, self._session_id, self._retain_async, len(content), len(self._session_turns))
_run_sync(client.aretain_batch(
@@ -789,11 +950,11 @@ class HindsightMemoryProvider(MemoryProvider):
return tool_error("Missing required parameter: content")
context = args.get("context")
try:
retain_kwargs: dict = {
"bank_id": self._bank_id, "content": content, "context": context,
}
if self._tags:
retain_kwargs["tags"] = self._tags
retain_kwargs = self._build_retain_kwargs(
content,
context=context,
tags=args.get("tags"),
)
logger.debug("Tool hindsight_retain: bank=%s, content_len=%d, context=%s",
self._bank_id, len(content), context)
_run_sync(client.aretain(**retain_kwargs))
+1 -1
@@ -126,7 +126,7 @@ py-modules = ["run_agent", "model_tools", "toolsets", "batch_runner", "trajector
hermes_cli = ["web_dist/**/*"]
[tool.setuptools.packages.find]
include = ["agent", "tools", "tools.*", "hermes_cli", "gateway", "gateway.*", "tui_gateway", "tui_gateway.*", "cron", "acp_adapter", "plugins", "plugins.*"]
include = ["agent", "agent.*", "tools", "tools.*", "hermes_cli", "gateway", "gateway.*", "tui_gateway", "tui_gateway.*", "cron", "acp_adapter", "plugins", "plugins.*"]
[tool.pytest.ini_options]
testpaths = ["tests"]
+307 -404
@@ -751,6 +751,11 @@ class AIAgent:
prefill_messages: List[Dict[str, Any]] = None,
platform: str = None,
user_id: str = None,
user_name: str = None,
chat_id: str = None,
chat_name: str = None,
chat_type: str = None,
thread_id: str = None,
gateway_session_key: str = None,
skip_context_files: bool = False,
skip_memory: bool = False,
@@ -820,6 +825,11 @@ class AIAgent:
self.ephemeral_system_prompt = ephemeral_system_prompt
self.platform = platform # "cli", "telegram", "discord", "whatsapp", etc.
self._user_id = user_id # Platform user identifier (gateway sessions)
self._user_name = user_name
self._chat_id = chat_id
self._chat_name = chat_name
self._chat_type = chat_type
self._thread_id = thread_id
self._gateway_session_key = gateway_session_key # Stable per-chat key (e.g. agent:main:telegram:dm:123)
# Pluggable print function — CLI replaces this with _cprint so that
# raw ANSI status lines are routed through prompt_toolkit's renderer
@@ -1175,7 +1185,7 @@ class AIAgent:
client_kwargs["default_headers"] = copilot_default_headers()
elif base_url_host_matches(effective_base, "api.kimi.com"):
client_kwargs["default_headers"] = {
"User-Agent": "KimiCLI/1.30.0",
"User-Agent": "claude-code/0.1.0",
}
elif base_url_host_matches(effective_base, "portal.qwen.ai"):
client_kwargs["default_headers"] = _qwen_portal_headers()
@@ -1471,6 +1481,16 @@ class AIAgent:
# Thread gateway user identity for per-user memory scoping
if self._user_id:
_init_kwargs["user_id"] = self._user_id
if self._user_name:
_init_kwargs["user_name"] = self._user_name
if self._chat_id:
_init_kwargs["chat_id"] = self._chat_id
if self._chat_name:
_init_kwargs["chat_name"] = self._chat_name
if self._chat_type:
_init_kwargs["chat_type"] = self._chat_type
if self._thread_id:
_init_kwargs["thread_id"] = self._thread_id
# Thread gateway session key for stable per-chat Honcho session isolation
if self._gateway_session_key:
_init_kwargs["gateway_session_key"] = self._gateway_session_key
@@ -2966,6 +2986,7 @@ class AIAgent:
tool_call_id=msg.get("tool_call_id"),
finish_reason=msg.get("finish_reason"),
reasoning=msg.get("reasoning") if role == "assistant" else None,
reasoning_content=msg.get("reasoning_content") if role == "assistant" else None,
reasoning_details=msg.get("reasoning_details") if role == "assistant" else None,
codex_reasoning_items=msg.get("codex_reasoning_items") if role == "assistant" else None,
)
@@ -4308,10 +4329,6 @@ class AIAgent:
if self._memory_store:
self._memory_store.load_from_disk()
def _responses_tools(self, tools: Optional[List[Dict[str, Any]]] = None) -> Optional[List[Dict[str, Any]]]:
"""Convert chat-completions tool schemas to Responses function-tool schemas."""
return _codex_responses_tools(tools if tools is not None else self.tools)
@staticmethod
def _deterministic_call_id(fn_name: str, arguments: str, index: int = 0) -> str:
"""Generate a deterministic call_id from tool call content.
@@ -4335,33 +4352,6 @@ class AIAgent:
"""Build a valid Responses `function_call.id` (must start with `fc_`)."""
return _codex_derive_responses_function_call_id(call_id, response_item_id)
def _chat_messages_to_responses_input(self, messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
"""Convert internal chat-style messages to Responses input items."""
return _codex_chat_messages_to_responses_input(messages)
def _preflight_codex_input_items(self, raw_items: Any) -> List[Dict[str, Any]]:
return _codex_preflight_codex_input_items(raw_items)
def _preflight_codex_api_kwargs(
self,
api_kwargs: Any,
*,
allow_stream: bool = False,
) -> Dict[str, Any]:
return _codex_preflight_codex_api_kwargs(api_kwargs, allow_stream=allow_stream)
def _extract_responses_message_text(self, item: Any) -> str:
"""Extract assistant text from a Responses message output item."""
return _codex_extract_responses_message_text(item)
def _extract_responses_reasoning_text(self, item: Any) -> str:
"""Extract a compact reasoning text from a Responses reasoning item."""
return _codex_extract_responses_reasoning_text(item)
def _normalize_codex_response(self, response: Any) -> tuple[Any, str]:
"""Normalize a Responses API object to an assistant_message-like object."""
return _codex_normalize_codex_response(response)
def _thread_identity(self) -> str:
thread = threading.current_thread()
return f"{thread.name}:{thread.ident}"
@@ -4854,7 +4844,7 @@ class AIAgent:
active_client = client or self._ensure_primary_openai_client(reason="codex_create_stream_fallback")
fallback_kwargs = dict(api_kwargs)
fallback_kwargs["stream"] = True
fallback_kwargs = self._preflight_codex_api_kwargs(fallback_kwargs, allow_stream=True)
fallback_kwargs = self._get_codex_transport().preflight_kwargs(fallback_kwargs, allow_stream=True)
stream_or_response = active_client.responses.create(**fallback_kwargs)
# Compatibility shim for mocks or providers that still return a concrete response.
@@ -5049,7 +5039,7 @@ class AIAgent:
self._client_kwargs["default_headers"] = copilot_default_headers()
elif base_url_host_matches(base_url, "api.kimi.com"):
self._client_kwargs["default_headers"] = {"User-Agent": "KimiCLI/1.30.0"}
self._client_kwargs["default_headers"] = {"User-Agent": "claude-code/0.1.0"}
elif base_url_host_matches(base_url, "portal.qwen.ai"):
self._client_kwargs["default_headers"] = _qwen_portal_headers()
elif base_url_host_matches(base_url, "chatgpt.com"):
@@ -6596,6 +6586,33 @@ class AIAgent:
self._anthropic_transport = t
return t
def _get_codex_transport(self):
"""Return the cached ResponsesApiTransport instance (lazy singleton)."""
t = getattr(self, "_codex_transport", None)
if t is None:
from agent.transports import get_transport
t = get_transport("codex_responses")
self._codex_transport = t
return t
def _get_chat_completions_transport(self):
"""Return the cached ChatCompletionsTransport instance (lazy singleton)."""
t = getattr(self, "_chat_completions_transport", None)
if t is None:
from agent.transports import get_transport
t = get_transport("chat_completions")
self._chat_completions_transport = t
return t
def _get_bedrock_transport(self):
"""Return the cached BedrockTransport instance (lazy singleton)."""
t = getattr(self, "_bedrock_transport", None)
if t is None:
from agent.transports import get_transport
t = get_transport("bedrock_converse")
self._bedrock_transport = t
return t
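The three getters above share one shape: look up a cached attribute, build the transport on first use, then store it. A minimal sketch of that lazy-cache pattern follows. `EchoTransport`, `_REGISTRY`, and `_cached_transport` are hypothetical names used only for illustration; the real `get_transport` is imported from `agent.transports` and its internals are not shown in this diff.

```python
from typing import Callable, Dict


class EchoTransport:
    """Hypothetical stand-in for a real transport class."""

    def __init__(self, name: str) -> None:
        self.name = name


# Hypothetical registry; the real lookup lives in agent.transports.
_REGISTRY: Dict[str, Callable[[], EchoTransport]] = {
    "codex_responses": lambda: EchoTransport("codex_responses"),
    "chat_completions": lambda: EchoTransport("chat_completions"),
    "bedrock_converse": lambda: EchoTransport("bedrock_converse"),
}


def get_transport(api_mode: str) -> EchoTransport:
    """Build a fresh transport for the given API mode (illustrative only)."""
    return _REGISTRY[api_mode]()


class Agent:
    def _cached_transport(self, attr: str, api_mode: str) -> EchoTransport:
        # Reuse the instance stored on self if one exists; otherwise
        # create it once and remember it for later calls.
        t = getattr(self, attr, None)
        if t is None:
            t = get_transport(api_mode)
            setattr(self, attr, t)
        return t

    def _get_codex_transport(self) -> EchoTransport:
        return self._cached_transport("_codex_transport", "codex_responses")


agent = Agent()
first = agent._get_codex_transport()
second = agent._get_codex_transport()
```

In this sketch the cache is per agent instance, so two agents never share a transport object; whether the real transports are safe to share is not something this diff states.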
def _prepare_anthropic_messages_for_api(self, api_messages: list) -> list:
if not any(
isinstance(msg, dict) and self._content_has_image_parts(msg.get("content"))
@@ -6735,31 +6752,20 @@ class AIAgent:
# AWS Bedrock native Converse API — bypasses the OpenAI client entirely.
# The adapter handles message/tool conversion and boto3 calls directly.
if self.api_mode == "bedrock_converse":
from agent.bedrock_adapter import build_converse_kwargs
_bt = self._get_bedrock_transport()
region = getattr(self, "_bedrock_region", None) or "us-east-1"
guardrail = getattr(self, "_bedrock_guardrail_config", None)
return {
"__bedrock_converse__": True,
"__bedrock_region__": region,
**build_converse_kwargs(
model=self.model,
messages=api_messages,
tools=self.tools,
max_tokens=self.max_tokens or 4096,
temperature=None, # Let the model use its default
guardrail_config=guardrail,
),
}
return _bt.build_kwargs(
model=self.model,
messages=api_messages,
tools=self.tools,
max_tokens=self.max_tokens or 4096,
region=region,
guardrail_config=guardrail,
)
if self.api_mode == "codex_responses":
instructions = ""
payload_messages = api_messages
if api_messages and api_messages[0].get("role") == "system":
instructions = str(api_messages[0].get("content") or "").strip()
payload_messages = api_messages[1:]
if not instructions:
instructions = DEFAULT_AGENT_IDENTITY
_ct = self._get_codex_transport()
is_github_responses = (
base_url_host_matches(self.base_url, "models.github.ai")
or base_url_host_matches(self.base_url, "api.githubcopilot.com")
@@ -6771,320 +6777,118 @@ class AIAgent:
and "/backend-api/codex" in self._base_url_lower
)
)
# Resolve reasoning effort: config > default (medium)
reasoning_effort = "medium"
reasoning_enabled = True
if self.reasoning_config and isinstance(self.reasoning_config, dict):
if self.reasoning_config.get("enabled") is False:
reasoning_enabled = False
elif self.reasoning_config.get("effort"):
reasoning_effort = self.reasoning_config["effort"]
# Clamp effort levels not supported by the Responses API model.
# GPT-5.4 supports none/low/medium/high/xhigh but not "minimal".
# "minimal" is valid on OpenRouter and GPT-5 but fails on 5.2/5.4.
_effort_clamp = {"minimal": "low"}
reasoning_effort = _effort_clamp.get(reasoning_effort, reasoning_effort)
kwargs = {
"model": self.model,
"instructions": instructions,
"input": self._chat_messages_to_responses_input(payload_messages),
"tools": self._responses_tools(),
"tool_choice": "auto",
"parallel_tool_calls": True,
"store": False,
}
if not is_github_responses:
kwargs["prompt_cache_key"] = self.session_id
is_xai_responses = self.provider == "xai" or self._base_url_hostname == "api.x.ai"
if reasoning_enabled and is_xai_responses:
# xAI reasons automatically — no effort param, just include encrypted content
kwargs["include"] = ["reasoning.encrypted_content"]
elif reasoning_enabled:
if is_github_responses:
# Copilot's Responses route advertises reasoning-effort support,
# but not OpenAI-specific prompt cache or encrypted reasoning
# fields. Keep the payload to the documented subset.
github_reasoning = self._github_models_reasoning_extra_body()
if github_reasoning is not None:
kwargs["reasoning"] = github_reasoning
else:
kwargs["reasoning"] = {"effort": reasoning_effort, "summary": "auto"}
kwargs["include"] = ["reasoning.encrypted_content"]
elif not is_github_responses and not is_xai_responses:
kwargs["include"] = []
if self.request_overrides:
kwargs.update(self.request_overrides)
if self.max_tokens is not None and not is_codex_backend:
kwargs["max_output_tokens"] = self.max_tokens
if is_xai_responses and getattr(self, "session_id", None):
kwargs["extra_headers"] = {"x-grok-conv-id": self.session_id}
return kwargs
sanitized_messages = api_messages
needs_sanitization = False
for msg in api_messages:
if not isinstance(msg, dict):
continue
if "codex_reasoning_items" in msg:
needs_sanitization = True
break
tool_calls = msg.get("tool_calls")
if isinstance(tool_calls, list):
for tool_call in tool_calls:
if not isinstance(tool_call, dict):
continue
if "call_id" in tool_call or "response_item_id" in tool_call:
needs_sanitization = True
break
if needs_sanitization:
break
if needs_sanitization:
sanitized_messages = copy.deepcopy(api_messages)
for msg in sanitized_messages:
if not isinstance(msg, dict):
continue
# Codex-only replay state must not leak into strict chat-completions APIs.
msg.pop("codex_reasoning_items", None)
tool_calls = msg.get("tool_calls")
if isinstance(tool_calls, list):
for tool_call in tool_calls:
if isinstance(tool_call, dict):
tool_call.pop("call_id", None)
tool_call.pop("response_item_id", None)
# Qwen portal: normalize content to list-of-dicts, inject cache_control.
# Must run AFTER codex sanitization so we transform the final messages.
# If sanitization already deepcopied, reuse that copy (in-place).
if self._is_qwen_portal():
if sanitized_messages is api_messages:
# No sanitization was done — we need our own copy.
sanitized_messages = self._qwen_prepare_chat_messages(sanitized_messages)
else:
# Already a deepcopy — transform in place to avoid a second deepcopy.
self._qwen_prepare_chat_messages_inplace(sanitized_messages)
# GPT-5 and Codex models respond better to 'developer' than 'system'
# for instruction-following. Swap the role at the API boundary so
# internal message representation stays uniform ("system").
_model_lower = (self.model or "").lower()
if (
sanitized_messages
and sanitized_messages[0].get("role") == "system"
and any(p in _model_lower for p in DEVELOPER_ROLE_MODELS)
):
# Shallow-copy the list + first message only — rest stays shared.
sanitized_messages = list(sanitized_messages)
sanitized_messages[0] = {**sanitized_messages[0], "role": "developer"}
provider_preferences = {}
if self.providers_allowed:
provider_preferences["only"] = self.providers_allowed
if self.providers_ignored:
provider_preferences["ignore"] = self.providers_ignored
if self.providers_order:
provider_preferences["order"] = self.providers_order
if self.provider_sort:
provider_preferences["sort"] = self.provider_sort
if self.provider_require_parameters:
provider_preferences["require_parameters"] = True
if self.provider_data_collection:
provider_preferences["data_collection"] = self.provider_data_collection
api_kwargs = {
"model": self.model,
"messages": sanitized_messages,
"timeout": self._resolved_api_call_timeout(),
}
try:
from agent.auxiliary_client import _fixed_temperature_for_model, OMIT_TEMPERATURE
except Exception:
_fixed_temperature_for_model = None
OMIT_TEMPERATURE = None
if _fixed_temperature_for_model is not None:
fixed_temperature = _fixed_temperature_for_model(self.model, self.base_url)
if fixed_temperature is OMIT_TEMPERATURE:
api_kwargs.pop("temperature", None)
elif fixed_temperature is not None:
api_kwargs["temperature"] = fixed_temperature
if self._is_qwen_portal():
api_kwargs["metadata"] = {
"sessionId": self.session_id or "hermes",
"promptId": str(uuid.uuid4()),
}
if self.tools:
api_kwargs["tools"] = self.tools
# ── max_tokens for chat_completions ──────────────────────────────
# Priority: ephemeral override (error recovery / length-continuation
# boost) > user-configured max_tokens > provider-specific defaults.
_ephemeral_out = getattr(self, "_ephemeral_max_output_tokens", None)
if _ephemeral_out is not None:
self._ephemeral_max_output_tokens = None # consume immediately
api_kwargs.update(self._max_tokens_param(_ephemeral_out))
elif self.max_tokens is not None:
api_kwargs.update(self._max_tokens_param(self.max_tokens))
elif "integrate.api.nvidia.com" in self._base_url_lower:
# NVIDIA NIM defaults to a very low max_tokens when omitted,
# causing models like GLM-4.7 to truncate immediately (thinking
# tokens alone exhaust the budget). 16384 provides adequate room.
api_kwargs.update(self._max_tokens_param(16384))
elif self._is_qwen_portal():
# Qwen Portal defaults to a very low max_tokens when omitted.
# Reasoning models (qwen3-coder-plus) exhaust that budget on
# thinking tokens alone, causing the portal to return
# finish_reason="stop" with truncated output — the agent sees
# this as an intentional stop and exits the loop. Send 65536
# (the documented max output for qwen3-coder models) so the
# model has adequate output budget for tool calls.
api_kwargs.update(self._max_tokens_param(65536))
elif (
base_url_host_matches(self.base_url, "api.kimi.com")
or base_url_host_matches(self.base_url, "moonshot.ai")
or base_url_host_matches(self.base_url, "moonshot.cn")
):
# Kimi/Moonshot defaults to a low max_tokens when omitted.
# Reasoning tokens share the output budget — without an explicit
# value the model can exhaust it on thinking alone, causing
# "Response truncated due to output length limit". 32000 matches
# Kimi CLI's default (see MoonshotAI/kimi-cli kimi.py generate()).
api_kwargs.update(self._max_tokens_param(32000))
# Kimi requires reasoning_effort as a top-level chat completions
# parameter (not inside extra_body). Mirror Kimi CLI's
# with_generation_kwargs(reasoning_effort=...) / with_thinking():
# when thinking is disabled, Kimi CLI omits reasoning_effort
# entirely (maps to None).
_kimi_thinking_off = bool(
self.reasoning_config
and isinstance(self.reasoning_config, dict)
and self.reasoning_config.get("enabled") is False
return _ct.build_kwargs(
model=self.model,
messages=api_messages,
tools=self.tools,
reasoning_config=self.reasoning_config,
session_id=getattr(self, "session_id", None),
max_tokens=self.max_tokens,
request_overrides=self.request_overrides,
is_github_responses=is_github_responses,
is_codex_backend=is_codex_backend,
is_xai_responses=is_xai_responses,
github_reasoning_extra=self._github_models_reasoning_extra_body() if is_github_responses else None,
)
if not _kimi_thinking_off:
_kimi_effort = "medium"
if self.reasoning_config and isinstance(self.reasoning_config, dict):
_e = (self.reasoning_config.get("effort") or "").strip().lower()
if _e in ("low", "medium", "high"):
_kimi_effort = _e
api_kwargs["reasoning_effort"] = _kimi_effort
elif (self._is_openrouter_url() or "nousresearch" in self._base_url_lower) and "claude" in (self.model or "").lower():
# OpenRouter and Nous Portal translate requests to Anthropic's
# Messages API, which requires max_tokens as a mandatory field.
# When we omit it, the proxy picks a default that can be too
# low — the model spends its output budget on thinking and has
# almost nothing left for the actual response (especially large
# tool calls like write_file). Sending the model's real output
# limit ensures full capacity.
try:
from agent.anthropic_adapter import _get_anthropic_max_output
_model_output_limit = _get_anthropic_max_output(self.model)
api_kwargs["max_tokens"] = _model_output_limit
except Exception:
pass # fail open — let the proxy pick its default
extra_body = {}
# ── chat_completions (default) ─────────────────────────────────────
_ct = self._get_chat_completions_transport()
_is_openrouter = self._is_openrouter_url()
_is_github_models = (
# Provider detection flags
_is_qwen = self._is_qwen_portal()
_is_or = self._is_openrouter_url()
_is_gh = (
base_url_host_matches(self._base_url_lower, "models.github.ai")
or base_url_host_matches(self._base_url_lower, "api.githubcopilot.com")
)
# Provider preferences (only, ignore, order, sort) are OpenRouter-
# specific. Only send to OpenRouter-compatible endpoints.
# TODO: Nous Portal will add transparent proxy support — re-enable
# for _is_nous when their backend is updated.
if provider_preferences and _is_openrouter:
extra_body["provider"] = provider_preferences
_is_nous = "nousresearch" in self._base_url_lower
# Kimi/Moonshot API uses extra_body.thinking (separate from the
# top-level reasoning_effort) to enable/disable reasoning mode.
# Mirror Kimi CLI's with_thinking() behavior exactly — see
# MoonshotAI/kimi-cli packages/kosong/src/kosong/chat_provider/kimi.py
_is_nvidia = "integrate.api.nvidia.com" in self._base_url_lower
_is_kimi = (
base_url_host_matches(self.base_url, "api.kimi.com")
or base_url_host_matches(self.base_url, "moonshot.ai")
or base_url_host_matches(self.base_url, "moonshot.cn")
)
if _is_kimi:
_kimi_thinking_enabled = True
if self.reasoning_config and isinstance(self.reasoning_config, dict):
if self.reasoning_config.get("enabled") is False:
_kimi_thinking_enabled = False
extra_body["thinking"] = {
"type": "enabled" if _kimi_thinking_enabled else "disabled",
# Temperature: _fixed_temperature_for_model may return OMIT_TEMPERATURE
# sentinel (temperature omitted entirely), a numeric override, or None.
try:
from agent.auxiliary_client import _fixed_temperature_for_model, OMIT_TEMPERATURE
_ft = _fixed_temperature_for_model(self.model, self.base_url)
_omit_temp = _ft is OMIT_TEMPERATURE
_fixed_temp = _ft if not _omit_temp else None
except Exception:
_omit_temp = False
_fixed_temp = None
# Provider preferences (OpenRouter-specific)
_prefs: Dict[str, Any] = {}
if self.providers_allowed:
_prefs["only"] = self.providers_allowed
if self.providers_ignored:
_prefs["ignore"] = self.providers_ignored
if self.providers_order:
_prefs["order"] = self.providers_order
if self.provider_sort:
_prefs["sort"] = self.provider_sort
if self.provider_require_parameters:
_prefs["require_parameters"] = True
if self.provider_data_collection:
_prefs["data_collection"] = self.provider_data_collection
# Anthropic max output for Claude on OpenRouter/Nous
_ant_max = None
if (_is_or or _is_nous) and "claude" in (self.model or "").lower():
try:
from agent.anthropic_adapter import _get_anthropic_max_output
_ant_max = _get_anthropic_max_output(self.model)
except Exception:
pass # fail open — let the proxy pick its default
# Qwen session metadata precomputed here (promptId is per-call random)
_qwen_meta = None
if _is_qwen:
_qwen_meta = {
"sessionId": self.session_id or "hermes",
"promptId": str(uuid.uuid4()),
}
if self._supports_reasoning_extra_body():
if _is_github_models:
github_reasoning = self._github_models_reasoning_extra_body()
if github_reasoning is not None:
extra_body["reasoning"] = github_reasoning
else:
if self.reasoning_config is not None:
rc = dict(self.reasoning_config)
# Nous Portal requires reasoning enabled — don't send
# enabled=false to it (would cause 400).
if _is_nous and rc.get("enabled") is False:
pass # omit reasoning entirely for Nous when disabled
else:
extra_body["reasoning"] = rc
else:
extra_body["reasoning"] = {
"enabled": True,
"effort": "medium"
}
# Ephemeral max output override — consume immediately so the next
# turn doesn't inherit it.
_ephemeral_out = getattr(self, "_ephemeral_max_output_tokens", None)
if _ephemeral_out is not None:
self._ephemeral_max_output_tokens = None
# Nous Portal product attribution
if _is_nous:
extra_body["tags"] = ["product=hermes-agent"]
# Ollama num_ctx: override the 2048 default so the model actually
# uses the context window it was trained for. Passed via the OpenAI
# SDK's extra_body → options.num_ctx, which Ollama's OpenAI-compat
# endpoint forwards to the runner as --ctx-size.
if self._ollama_num_ctx:
options = extra_body.get("options", {})
options["num_ctx"] = self._ollama_num_ctx
extra_body["options"] = options
# Ollama / custom provider: pass think=false when reasoning is disabled.
# Ollama does not recognise the OpenRouter-style `reasoning` extra_body
# field, so we use its native `think` parameter instead.
# This prevents thinking-capable models (Qwen3, etc.) from generating
# <think> blocks and producing empty-response errors when the user has
# set reasoning_effort: none.
if self.provider == "custom" and self.reasoning_config and isinstance(self.reasoning_config, dict):
_effort = (self.reasoning_config.get("effort") or "").strip().lower()
_enabled = self.reasoning_config.get("enabled", True)
if _effort == "none" or _enabled is False:
extra_body["think"] = False
if self._is_qwen_portal():
extra_body["vl_high_resolution_images"] = True
if extra_body:
api_kwargs["extra_body"] = extra_body
# Priority Processing / generic request overrides (e.g. service_tier).
# Applied last so overrides win over any defaults set above.
if self.request_overrides:
api_kwargs.update(self.request_overrides)
return api_kwargs
return _ct.build_kwargs(
model=self.model,
messages=api_messages,
tools=self.tools,
timeout=self._resolved_api_call_timeout(),
max_tokens=self.max_tokens,
ephemeral_max_output_tokens=_ephemeral_out,
max_tokens_param_fn=self._max_tokens_param,
reasoning_config=self.reasoning_config,
request_overrides=self.request_overrides,
session_id=getattr(self, "session_id", None),
model_lower=(self.model or "").lower(),
is_openrouter=_is_or,
is_nous=_is_nous,
is_qwen_portal=_is_qwen,
is_github_models=_is_gh,
is_nvidia_nim=_is_nvidia,
is_kimi=_is_kimi,
is_custom_provider=self.provider == "custom",
ollama_num_ctx=self._ollama_num_ctx,
provider_preferences=_prefs or None,
qwen_prepare_fn=self._qwen_prepare_chat_messages if _is_qwen else None,
qwen_prepare_inplace_fn=self._qwen_prepare_chat_messages_inplace if _is_qwen else None,
qwen_session_metadata=_qwen_meta,
fixed_temperature=_fixed_temp,
omit_temperature=_omit_temp,
supports_reasoning=self._supports_reasoning_extra_body(),
github_reasoning_extra=self._github_models_reasoning_extra_body() if _is_gh else None,
anthropic_max_output=_ant_max,
)
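The refactored method precomputes the OpenRouter provider preferences and passes `_prefs or None` to the transport. The sketch below isolates that step as a standalone function, assuming the same key names the diff uses (`only`, `ignore`, `order`, `sort`, `require_parameters`, `data_collection`). The function name `build_provider_preferences` is hypothetical and does not exist in the codebase.

```python
from typing import Any, Dict, List, Optional


def build_provider_preferences(
    allowed: Optional[List[str]] = None,
    ignored: Optional[List[str]] = None,
    order: Optional[List[str]] = None,
    sort: Optional[str] = None,
    require_parameters: bool = False,
    data_collection: Optional[str] = None,
) -> Optional[Dict[str, Any]]:
    """Collect only the preferences that are set; return None when empty."""
    prefs: Dict[str, Any] = {}
    if allowed:
        prefs["only"] = allowed
    if ignored:
        prefs["ignore"] = ignored
    if order:
        prefs["order"] = order
    if sort:
        prefs["sort"] = sort
    if require_parameters:
        prefs["require_parameters"] = True
    if data_collection:
        prefs["data_collection"] = data_collection
    # An empty dict collapses to None so the caller can skip the field.
    return prefs or None


empty = build_provider_preferences()
some = build_provider_preferences(allowed=["a"], require_parameters=True)
```

Returning `None` instead of an empty dict mirrors the `_prefs or None` expression in the call above, which lets the receiving side test for presence with a plain truthiness check.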
def _supports_reasoning_extra_body(self) -> bool:
"""Return True when reasoning extra_body is safe to send for this route/model.
@@ -7220,6 +7024,11 @@ class AIAgent:
"finish_reason": finish_reason,
}
if hasattr(assistant_message, "reasoning_content"):
raw_reasoning_content = getattr(assistant_message, "reasoning_content", None)
if raw_reasoning_content is not None:
msg["reasoning_content"] = _sanitize_surrogates(raw_reasoning_content)
if hasattr(assistant_message, 'reasoning_details') and assistant_message.reasoning_details:
# Pass reasoning_details back unmodified so providers (OpenRouter,
# Anthropic, OpenAI) can maintain reasoning continuity across turns.
@@ -7294,6 +7103,30 @@ class AIAgent:
return msg
def _copy_reasoning_content_for_api(self, source_msg: dict, api_msg: dict) -> None:
"""Copy provider-facing reasoning fields onto an API replay message."""
if source_msg.get("role") != "assistant":
return
explicit_reasoning = source_msg.get("reasoning_content")
if isinstance(explicit_reasoning, str):
api_msg["reasoning_content"] = explicit_reasoning
return
normalized_reasoning = source_msg.get("reasoning")
if isinstance(normalized_reasoning, str) and normalized_reasoning:
api_msg["reasoning_content"] = normalized_reasoning
return
kimi_requires_reasoning = (
self.provider in {"kimi-coding", "kimi-coding-cn"}
or base_url_host_matches(self.base_url, "api.kimi.com")
or base_url_host_matches(self.base_url, "moonshot.ai")
or base_url_host_matches(self.base_url, "moonshot.cn")
)
if kimi_requires_reasoning and source_msg.get("tool_calls"):
api_msg["reasoning_content"] = ""
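The precedence in `_copy_reasoning_content_for_api` can be read as three ordered rules: an explicit `reasoning_content` string wins (even when empty), then a non-empty `reasoning` string, then an empty placeholder for Kimi-style providers when the message carries tool calls. Below is a standalone sketch of that ordering. The free function and its `kimi_requires_reasoning` keyword are illustrative assumptions; in the diff that flag is derived from `self.provider` and the base URL.

```python
from typing import Any, Dict


def copy_reasoning_content(
    source_msg: Dict[str, Any],
    api_msg: Dict[str, Any],
    *,
    kimi_requires_reasoning: bool = False,
) -> None:
    """Copy reasoning onto api_msg using the same precedence as the diff."""
    if source_msg.get("role") != "assistant":
        return

    # 1. An explicit provider-facing value is replayed as-is, even "".
    explicit = source_msg.get("reasoning_content")
    if isinstance(explicit, str):
        api_msg["reasoning_content"] = explicit
        return

    # 2. Otherwise fall back to the normalized internal field if non-empty.
    normalized = source_msg.get("reasoning")
    if isinstance(normalized, str) and normalized:
        api_msg["reasoning_content"] = normalized
        return

    # 3. Kimi-style endpoints get an empty placeholder on tool-call turns.
    if kimi_requires_reasoning and source_msg.get("tool_calls"):
        api_msg["reasoning_content"] = ""


out_explicit: Dict[str, Any] = {}
copy_reasoning_content(
    {"role": "assistant", "reasoning_content": "", "reasoning": "x"}, out_explicit
)

out_kimi: Dict[str, Any] = {}
copy_reasoning_content(
    {"role": "assistant", "tool_calls": [{"id": "1"}]},
    out_kimi,
    kimi_requires_reasoning=True,
)

out_user: Dict[str, Any] = {}
copy_reasoning_content({"role": "user", "reasoning": "x"}, out_user)
```

Note the difference from the code this helper replaces: the old inline version copied `reasoning` only when it was truthy and never looked at an existing `reasoning_content` value.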
@staticmethod
def _sanitize_tool_calls_for_strict_api(api_msg: dict) -> dict:
"""Strip Codex Responses API fields from tool_calls for strict providers.
@@ -7377,10 +7210,7 @@ class AIAgent:
api_messages = []
for msg in messages:
api_msg = msg.copy()
if msg.get("role") == "assistant":
reasoning = msg.get("reasoning")
if reasoning:
api_msg["reasoning_content"] = reasoning
self._copy_reasoning_content_for_api(msg, api_msg)
api_msg.pop("reasoning", None)
api_msg.pop("finish_reason", None)
api_msg.pop("_flush_sentinel", None)
@@ -7438,7 +7268,7 @@ class AIAgent:
if not _aux_available and self.api_mode == "codex_responses":
# No auxiliary client -- use the Codex Responses path directly
codex_kwargs = self._build_api_kwargs(api_messages)
codex_kwargs["tools"] = self._responses_tools([memory_tool_def])
codex_kwargs["tools"] = self._get_codex_transport().convert_tools([memory_tool_def])
if _flush_temperature is not None:
codex_kwargs["temperature"] = _flush_temperature
else:
@@ -7473,9 +7303,15 @@ class AIAgent:
# Extract tool calls from the response, handling all API formats
tool_calls = []
if self.api_mode == "codex_responses" and not _aux_available:
assistant_msg, _ = self._normalize_codex_response(response)
if assistant_msg and assistant_msg.tool_calls:
tool_calls = assistant_msg.tool_calls
_ct_flush = self._get_codex_transport()
_cnr_flush = _ct_flush.normalize_response(response)
if _cnr_flush and _cnr_flush.tool_calls:
tool_calls = [
SimpleNamespace(
id=tc.id, type="function",
function=SimpleNamespace(name=tc.name, arguments=tc.arguments),
) for tc in _cnr_flush.tool_calls
]
elif self.api_mode == "anthropic_messages" and not _aux_available:
_tfn = self._get_anthropic_transport()
_flush_nr = _tfn.normalize_response(response, strip_tool_prefix=self._is_anthropic_oauth)
@@ -8519,8 +8355,9 @@ class AIAgent:
codex_kwargs = self._build_api_kwargs(api_messages)
codex_kwargs.pop("tools", None)
summary_response = self._run_codex_stream(codex_kwargs)
assistant_message, _ = self._normalize_codex_response(summary_response)
final_response = (assistant_message.content or "").strip() if assistant_message else ""
_ct_sum = self._get_codex_transport()
_cnr_sum = _ct_sum.normalize_response(summary_response)
final_response = (_cnr_sum.content or "").strip()
else:
summary_kwargs = {
"model": self.model,
@@ -8577,8 +8414,9 @@ class AIAgent:
codex_kwargs = self._build_api_kwargs(api_messages)
codex_kwargs.pop("tools", None)
retry_response = self._run_codex_stream(codex_kwargs)
retry_msg, _ = self._normalize_codex_response(retry_response)
final_response = (retry_msg.content or "").strip() if retry_msg else ""
_ct_retry = self._get_codex_transport()
_cnr_retry = _ct_retry.normalize_response(retry_response)
final_response = (_cnr_retry.content or "").strip()
elif self.api_mode == "anthropic_messages":
_tretry = self._get_anthropic_transport()
_ant_kw2 = _tretry.build_kwargs(model=self.model, messages=api_messages, tools=None,
@@ -8689,6 +8527,11 @@ class AIAgent:
self._persist_user_message_override = persist_user_message
# Generate unique task_id if not provided to isolate VMs between concurrent tasks
effective_task_id = task_id or str(uuid.uuid4())
# Expose the active task_id so tools running mid-turn (e.g. delegate_task
# in delegate_tool.py) can identify this agent for the cross-agent file
# state registry. Set BEFORE any tool dispatch so snapshots taken at
# child-launch time see the parent's real id, not None.
self._current_task_id = effective_task_id
# Reset retry counters and iteration budget at the start of each turn
# so subagent usage from a previous turn doesn't eat into the next one.
@@ -9127,11 +8970,7 @@ class AIAgent:
# For ALL assistant messages, pass reasoning back to the API
# This ensures multi-turn reasoning context is preserved
if msg.get("role") == "assistant":
reasoning_text = msg.get("reasoning")
if reasoning_text:
# Add reasoning_content for API compatibility (Moonshot AI, Novita, OpenRouter)
api_msg["reasoning_content"] = reasoning_text
self._copy_reasoning_content_for_api(msg, api_msg)
# Remove 'reasoning' field - it's for trajectory storage only
# We've copied it to 'reasoning_content' for the API above
@@ -9335,7 +9174,7 @@ class AIAgent:
if self._force_ascii_payload:
_sanitize_structure_non_ascii(api_kwargs)
if self.api_mode == "codex_responses":
api_kwargs = self._preflight_codex_api_kwargs(api_kwargs, allow_stream=False)
api_kwargs = self._get_codex_transport().preflight_kwargs(api_kwargs, allow_stream=False)
try:
from hermes_cli.plugins import invoke_hook as _invoke_hook
@@ -9423,38 +9262,34 @@ class AIAgent:
response_invalid = False
error_details = []
if self.api_mode == "codex_responses":
output_items = getattr(response, "output", None) if response is not None else None
if response is None:
response_invalid = True
error_details.append("response is None")
elif not isinstance(output_items, list):
response_invalid = True
error_details.append("response.output is not a list")
elif not output_items:
# Stream backfill may have failed, but
# _normalize_codex_response can still recover
# from response.output_text. Only mark invalid
# when that fallback is also absent.
_out_text = getattr(response, "output_text", None)
_out_text_stripped = _out_text.strip() if isinstance(_out_text, str) else ""
if _out_text_stripped:
logger.debug(
"Codex response.output is empty but output_text is present "
"(%d chars); deferring to normalization.",
len(_out_text_stripped),
)
else:
_resp_status = getattr(response, "status", None)
_resp_incomplete = getattr(response, "incomplete_details", None)
logger.warning(
"Codex response.output is empty after stream backfill "
"(status=%s, incomplete_details=%s, model=%s). %s",
_resp_status, _resp_incomplete,
getattr(response, "model", None),
f"api_mode={self.api_mode} provider={self.provider}",
)
_ct_v = self._get_codex_transport()
if not _ct_v.validate_response(response):
if response is None:
response_invalid = True
error_details.append("response.output is empty")
error_details.append("response is None")
else:
# output_text fallback: stream backfill may have failed
# but normalize can still recover from output_text
_out_text = getattr(response, "output_text", None)
_out_text_stripped = _out_text.strip() if isinstance(_out_text, str) else ""
if _out_text_stripped:
logger.debug(
"Codex response.output is empty but output_text is present "
"(%d chars); deferring to normalization.",
len(_out_text_stripped),
)
else:
_resp_status = getattr(response, "status", None)
_resp_incomplete = getattr(response, "incomplete_details", None)
logger.warning(
"Codex response.output is empty after stream backfill "
"(status=%s, incomplete_details=%s, model=%s). %s",
_resp_status, _resp_incomplete,
getattr(response, "model", None),
f"api_mode={self.api_mode} provider={self.provider}",
)
response_invalid = True
error_details.append("response.output is empty")
elif self.api_mode == "anthropic_messages":
_tv = self._get_anthropic_transport()
if not _tv.validate_response(response):
@@ -9463,8 +9298,17 @@ class AIAgent:
error_details.append("response is None")
else:
error_details.append("response.content invalid (not a non-empty list)")
elif self.api_mode == "bedrock_converse":
_btv = self._get_bedrock_transport()
if not _btv.validate_response(response):
response_invalid = True
if response is None:
error_details.append("response is None")
else:
error_details.append("Bedrock response invalid (no output or choices)")
else:
if response is None or not hasattr(response, 'choices') or response.choices is None or not response.choices:
_ctv = self._get_chat_completions_transport()
if not _ctv.validate_response(response):
response_invalid = True
if response is None:
error_details.append("response is None")
@@ -9625,6 +9469,10 @@ class AIAgent:
elif self.api_mode == "anthropic_messages":
_tfr = self._get_anthropic_transport()
finish_reason = _tfr.map_finish_reason(response.stop_reason)
elif self.api_mode == "bedrock_converse":
# Bedrock response is already normalized at dispatch — finish_reason
# is already in OpenAI format via normalize_converse_response()
finish_reason = response.choices[0].finish_reason if hasattr(response, "choices") and response.choices else "stop"
else:
finish_reason = response.choices[0].finish_reason
assistant_message = response.choices[0].message
@@ -9919,6 +9767,7 @@ class AIAgent:
billing_mode="subscription_included"
if cost_result.status == "included" else None,
model=self.model,
api_call_count=1,
)
except Exception:
pass # never block the agent loop
@@ -10195,6 +10044,27 @@ class AIAgent:
if self._try_refresh_nous_client_credentials(force=True):
print(f"{self.log_prefix}🔐 Nous agent key refreshed after 401. Retrying request...")
continue
# Credential refresh didn't help — show diagnostic info.
# Most common causes: Portal OAuth expired/revoked,
# account out of credits, or agent key blocked.
from hermes_constants import display_hermes_home as _dhh_fn
_dhh = _dhh_fn()
_body_text = ""
try:
_body = getattr(api_error, "body", None) or getattr(api_error, "response", None)
if _body is not None:
_body_text = str(_body)[:200]
except Exception:
pass
print(f"{self.log_prefix}🔐 Nous 401 — Portal authentication failed.")
if _body_text:
print(f"{self.log_prefix} Response: {_body_text}")
print(f"{self.log_prefix} Most likely: Portal OAuth expired, account out of credits, or agent key revoked.")
print(f"{self.log_prefix} Troubleshooting:")
print(f"{self.log_prefix} • Re-authenticate: hermes login --provider nous")
print(f"{self.log_prefix} • Check credits / billing: https://portal.nousresearch.com")
print(f"{self.log_prefix} • Verify stored credentials: {_dhh}/auth.json")
print(f"{self.log_prefix} • Switch providers temporarily: /model <model> --provider openrouter")
if (
self.api_mode == "anthropic_messages"
and status_code == 401
@@ -10880,7 +10750,40 @@ class AIAgent:
try:
if self.api_mode == "codex_responses":
assistant_message, finish_reason = self._normalize_codex_response(response)
_ct = self._get_codex_transport()
_cnr = _ct.normalize_response(response)
# Back-compat shim: downstream expects SimpleNamespace with
# codex-specific fields (.codex_reasoning_items, .reasoning_details,
# and .call_id/.response_item_id on tool calls).
_tc_list = None
if _cnr.tool_calls:
_tc_list = []
for tc in _cnr.tool_calls:
_tc_ns = SimpleNamespace(
id=tc.id, type="function",
function=SimpleNamespace(name=tc.name, arguments=tc.arguments),
)
if tc.provider_data:
if tc.provider_data.get("call_id"):
_tc_ns.call_id = tc.provider_data["call_id"]
if tc.provider_data.get("response_item_id"):
_tc_ns.response_item_id = tc.provider_data["response_item_id"]
_tc_list.append(_tc_ns)
assistant_message = SimpleNamespace(
content=_cnr.content,
tool_calls=_tc_list or None,
reasoning=_cnr.reasoning,
reasoning_content=None,
codex_reasoning_items=(
_cnr.provider_data.get("codex_reasoning_items")
if _cnr.provider_data else None
),
reasoning_details=(
_cnr.provider_data.get("reasoning_details")
if _cnr.provider_data else None
),
)
finish_reason = _cnr.finish_reason
elif self.api_mode == "anthropic_messages":
_transport = self._get_anthropic_transport()
_nr = _transport.normalize_response(
+9
@@ -44,12 +44,16 @@ AUTHOR_MAP = {
"teknium@nousresearch.com": "teknium1",
"127238744+teknium1@users.noreply.github.com": "teknium1",
# contributors (from noreply pattern)
"wangqiang@wangqiangdeMac-mini.local": "xiaoqiang243",
"snreynolds2506@gmail.com": "snreynolds",
"35742124+0xbyt4@users.noreply.github.com": "0xbyt4",
"71184274+MassiveMassimo@users.noreply.github.com": "MassiveMassimo",
"massivemassimo@users.noreply.github.com": "MassiveMassimo",
"82637225+kshitijk4poor@users.noreply.github.com": "kshitijk4poor",
"keifergu@tencent.com": "keifergu",
"kshitijk4poor@users.noreply.github.com": "kshitijk4poor",
"abner.the.foreman@agentmail.to": "Abnertheforeman",
"harryykyle1@gmail.com": "hharry11",
"kshitijk4poor@gmail.com": "kshitijk4poor",
"16443023+stablegenius49@users.noreply.github.com": "stablegenius49",
"185121704+stablegenius49@users.noreply.github.com": "stablegenius49",
@@ -91,6 +95,8 @@ AUTHOR_MAP = {
"135070653+sgaofen@users.noreply.github.com": "sgaofen",
"nocoo@users.noreply.github.com": "nocoo",
"30841158+n-WN@users.noreply.github.com": "n-WN",
"tsuijinglei@gmail.com": "hiddenpuppy",
"jerome@clawwork.ai": "HiddenPuppy",
"leoyuan0099@gmail.com": "keyuyuan",
"bxzt2006@163.com": "Only-Code-A",
"i@troy-y.org": "TroyMitchell911",
@@ -98,6 +104,8 @@ AUTHOR_MAP = {
"hansnow@users.noreply.github.com": "hansnow",
"134848055+UNLINEARITY@users.noreply.github.com": "UNLINEARITY",
"ben.burtenshaw@gmail.com": "burtenshaw",
"roopaknijhara@gmail.com": "rnijhara",
"Maaannnn@users.noreply.github.com": "Maaannnn",
# contributors (manual mapping from git names)
"ahmedsherif95@gmail.com": "asheriif",
"liujinkun@bytedance.com": "liujinkun2025",
@@ -333,6 +341,7 @@ AUTHOR_MAP = {
"asslaenn5@gmail.com": "Aslaaen",
"shalompmc0505@naver.com": "pinion05",
"105142614+VTRiot@users.noreply.github.com": "VTRiot",
"vivien000812@gmail.com": "iamagenius00",
}
@@ -0,0 +1,77 @@
# Port Notes — baoyu-comic
Ported from [JimLiu/baoyu-skills](https://github.com/JimLiu/baoyu-skills) v1.56.1.
## Changes from upstream
### SKILL.md adaptations
| Change | Upstream | Hermes |
|--------|----------|--------|
| Metadata namespace | `openclaw` | `hermes` (with `tags` + `homepage`) |
| Trigger | Slash commands / CLI flags | Natural language skill matching |
| User config | EXTEND.md file (project/user/XDG paths) | Removed — not part of Hermes infra |
| User prompts | `AskUserQuestion` (batched) | `clarify` tool (one question at a time) |
| Image generation | baoyu-imagine (Bun/TypeScript, supports `--ref`) | `image_generate`: **prompt-only**, returns a URL; no reference image input; agent must download the URL to the output directory |
| PDF assembly | `scripts/merge-to-pdf.ts` (Bun + `pdf-lib`) | Removed — the PDF merge step is out of scope for this port; pages are delivered as PNGs only |
| Platform support | Linux/macOS/Windows/WSL/PowerShell | Linux/macOS only |
| File operations | Generic instructions | Hermes file tools (`write_file`, `read_file`) |
### Structural removals
- **`references/config/` directory** (removed entirely):
- `first-time-setup.md` — blocking first-time setup flow for EXTEND.md
- `preferences-schema.md` — EXTEND.md YAML schema
- `watermark-guide.md` — watermark config (tied to EXTEND.md)
- **`scripts/` directory** (removed entirely): upstream's `merge-to-pdf.ts` depended on `pdf-lib`, which is not declared anywhere in the Hermes repo. Rather than add a new dependency, the port drops PDF assembly and delivers per-page PNGs.
- **Workflow Step 8 (Merge to PDF)** removed from `workflow.md`; Step 9 (Completion report) renumbered to Step 8.
- **Workflow Step 1.1** — "Load Preferences (EXTEND.md)" section removed from `workflow.md`; steps 1.2/1.3 renumbered to 1.1/1.2.
- **Generic "User Input Tools" and "Image Generation Tools" preambles** — SKILL.md no longer lists fallback rules for multiple possible tools; it references `clarify` and `image_generate` directly.
### Image generation strategy changes
`image_generate`'s schema accepts only `prompt` and `aspect_ratio` (`landscape` | `portrait` | `square`). Upstream's reference-image flow (`--ref characters.png` for character consistency, plus user-supplied refs for style/palette/scene) does not map to this tool, so the workflow was restructured:
- **Character sheet PNG** is still generated for multi-page comics, but it is repositioned as a **human-facing review artifact** (for visual verification) and a reference for later regenerations / manual prompt edits. Page prompts themselves are built from the **text descriptions** in `characters/characters.md` (embedded inline during Step 5). `image_generate` never sees the PNG as a visual input.
- **User-supplied reference images** are reduced to `style` / `palette` / `scene` trait extraction — traits are embedded in the prompt body; the image files themselves are kept only for provenance under `refs/`.
- **Page prompts** now mandate that character descriptions are embedded inline (copied from `characters/characters.md`) — this is the only mechanism left to enforce cross-page character consistency.
- **Download step** — after every `image_generate` call, the returned URL is fetched to disk (e.g., `curl -fsSL "<url>" -o <target>.png`) and verified before the workflow advances.
### SKILL.md reductions
- CLI option columns (`--art`, `--tone`, `--layout`, `--aspect`, `--lang`, `--ref`, `--storyboard-only`, `--prompts-only`, `--images-only`, `--regenerate`) converted to plain-English option descriptions.
- Preset files (`presets/*.md`) and `ohmsha-guide.md`: `` `--style X` `` / `` `--art X --tone Y` `` shorthand rewritten to `art=X, tone=Y` + natural-language references.
- `partial-workflows.md`: per-skill slash command invocations rewritten as user-intent cues; PDF-related outputs removed.
- `auto-selection.md`: priority order dropped the EXTEND.md tier.
- `analysis-framework.md`: language-priority comment updated (user option → conversation → source).
### File naming convention
Source content pasted by the user is saved as `source-{slug}.md`, where `{slug}` is the kebab-case topic slug used for the output directory. Backups follow the same pattern with a `-backup-YYYYMMDD-HHMMSS` suffix. SKILL.md and `workflow.md` now agree on this single convention.
### What was preserved verbatim
- All 6 art-style definitions (`references/art-styles/`)
- All 7 tone definitions (`references/tones/`)
- All 7 layout definitions (`references/layouts/`)
- Core templates: `character-template.md`, `storyboard-template.md`, `base-prompt.md`
- Preset bodies (only the first few intro lines adapted; special rules unchanged)
- Author, version, homepage attribution
## Syncing with upstream
To pull upstream updates:
```bash
# Compare versions
curl -sL https://raw.githubusercontent.com/JimLiu/baoyu-skills/main/skills/baoyu-comic/SKILL.md | head -5
# Look for the version: line
# Diff a reference file
diff <(curl -sL https://raw.githubusercontent.com/JimLiu/baoyu-skills/main/skills/baoyu-comic/references/art-styles/manga.md) \
references/art-styles/manga.md
```
Art-style, tone, and layout reference files can usually be overwritten directly (they're upstream-verbatim). `SKILL.md`, `references/workflow.md`, `references/partial-workflows.md`, `references/auto-selection.md`, `references/analysis-framework.md`, `references/ohmsha-guide.md`, and `references/presets/*.md` must be manually merged since they contain Hermes-specific adaptations.
If upstream adds a Hermes-compatible PDF merge step (no extra npm deps), restore `scripts/` and reintroduce Step 8 in `workflow.md`.
@@ -0,0 +1,246 @@
---
name: baoyu-comic
description: Knowledge comic creator supporting multiple art styles and tones. Creates original educational comics with detailed panel layouts and sequential image generation. Use when user asks to create "知识漫画", "教育漫画", "biography comic", "tutorial comic", or "Logicomix-style comic".
version: 1.56.1
author: 宝玉 (JimLiu)
license: MIT
metadata:
hermes:
tags: [comic, knowledge-comic, creative, image-generation]
homepage: https://github.com/JimLiu/baoyu-skills#baoyu-comic
---
# Knowledge Comic Creator
Adapted from [baoyu-comic](https://github.com/JimLiu/baoyu-skills) for Hermes Agent's tool ecosystem.
Create original knowledge comics with flexible art style × tone combinations.
## When to Use
Trigger this skill when the user asks to create a knowledge/educational comic, biography comic, tutorial comic, or uses terms like "知识漫画", "教育漫画", or "Logicomix-style". The user provides content (text, file path, URL, or topic) and optionally specifies art style, tone, layout, aspect ratio, or language.
## Reference Images
Hermes' `image_generate` tool is **prompt-only** — it accepts a text prompt and an aspect ratio, and returns an image URL. It does **NOT** accept reference images. When the user supplies a reference image, use it to **extract traits in text** that get embedded in every page prompt:
**Intake**: Accept file paths when the user provides them (or pastes images in conversation).
- File path(s) → copy to `refs/NN-ref-{slug}.{ext}` alongside the comic output for provenance
- Pasted image with no path → ask the user for the path via `clarify`, or extract style traits verbally as a text fallback
- No reference → skip this section
**Usage modes** (per reference):
| Usage | Effect |
|-------|--------|
| `style` | Extract style traits (line treatment, texture, mood) and append to every page's prompt body |
| `palette` | Extract hex colors and append to every page's prompt body |
| `scene` | Extract scene composition or subject notes and append to the relevant page(s) |
**Record in each page's prompt frontmatter** when refs exist:
```yaml
references:
- ref_id: 01
filename: 01-ref-scene.png
usage: style
traits: "muted earth tones, soft-edged ink wash, low-contrast backgrounds"
```
Character consistency is driven by **text descriptions** in `characters/characters.md` (written in Step 3) that get embedded inline in every page prompt (Step 5). The optional PNG character sheet generated in Step 7.1 is a human-facing review artifact, not an input to `image_generate`.
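A minimal sketch of the inline-embedding idea described above, assuming character descriptions and extracted reference traits are already available as plain strings. The function name `build_page_prompt` and the section labels it writes are hypothetical; they are not part of the skill or of Hermes.

```python
def build_page_prompt(scene: str, characters: dict[str, str], ref_traits: list[str]) -> str:
    """Assemble a page prompt body with character text and reference traits inline.

    Hypothetical helper: the layout of the prompt body is an assumption.
    """
    lines = [scene.strip(), "", "Characters (keep identical on every page):"]
    for name, description in characters.items():
        # Each description is copied verbatim so every page carries the same wording.
        lines.append(f"- {name}: {description}")
    if ref_traits:
        lines.append("")
        lines.append("Style traits extracted from references:")
        lines.extend(f"- {trait}" for trait in ref_traits)
    return "\n".join(lines)
```

Because the image tool only sees text, reusing the exact same description string on every page is the one lever this sketch relies on for consistency.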
## Options
### Visual Dimensions
| Option | Values | Description |
|--------|--------|-------------|
| Art | ligne-claire (default), manga, realistic, ink-brush, chalk, minimalist | Art style / rendering technique |
| Tone | neutral (default), warm, dramatic, romantic, energetic, vintage, action | Mood / atmosphere |
| Layout | standard (default), cinematic, dense, splash, mixed, webtoon, four-panel | Panel arrangement |
| Aspect | 3:4 (default, portrait), 4:3 (landscape), 16:9 (widescreen) | Page aspect ratio |
| Language | auto (default), zh, en, ja, etc. | Output language |
| Refs | File paths | Reference images used for style / palette trait extraction (not passed to the image model). See [Reference Images](#reference-images) above. |
### Partial Workflow Options
| Option | Description |
|--------|-------------|
| Storyboard only | Generate storyboard only, skip prompts and images |
| Prompts only | Generate storyboard + prompts, skip images |
| Images only | Generate images from existing prompts directory |
| Regenerate N | Regenerate specific page(s) only (e.g., `3` or `2,5,8`) |
Details: [references/partial-workflows.md](references/partial-workflows.md)
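As an illustration of the "Regenerate N" option, the page selection (`3` or `2,5,8`) could be parsed roughly as follows. `parse_page_selection` is a hypothetical name, and rejecting page numbers below 1 is an assumption rather than a documented rule.

```python
def parse_page_selection(spec: str) -> list[int]:
    """Parse a selection such as "3" or "2,5,8" into sorted, unique page numbers."""
    pages = sorted({int(part) for part in spec.split(",") if part.strip()})
    if not pages:
        raise ValueError("no page numbers given")
    if pages[0] < 1:
        # Assumption: pages are numbered from 1.
        raise ValueError(f"invalid page number: {pages[0]}")
    return pages
```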
### Art, Tone & Preset Catalogue
- **Art styles** (6): `ligne-claire`, `manga`, `realistic`, `ink-brush`, `chalk`, `minimalist`. Full definitions at `references/art-styles/<style>.md`.
- **Tones** (7): `neutral`, `warm`, `dramatic`, `romantic`, `energetic`, `vintage`, `action`. Full definitions at `references/tones/<tone>.md`.
- **Presets** (5) with special rules beyond plain art+tone:
| Preset | Equivalent | Hook |
|--------|-----------|------|
| `ohmsha` | manga + neutral | Visual metaphors, no talking heads, gadget reveals |
| `wuxia` | ink-brush + action | Qi effects, combat visuals, atmospheric |
| `shoujo` | manga + romantic | Decorative elements, eye details, romantic beats |
| `concept-story` | manga + warm | Visual symbol system, growth arc, dialogue+action balance |
| `four-panel` | minimalist + neutral + four-panel layout | 起承转合 structure, B&W + spot color, stick-figure characters |
Full rules at `references/presets/<preset>.md` — load the file when a preset is picked.
- **Compatibility matrix** and **content-signal → preset** table live in [references/auto-selection.md](references/auto-selection.md). Read it before recommending combinations in Step 2.
## File Structure
Output directory: `comic/{topic-slug}/`
- Slug: 2-4 kebab-case words derived from the topic (e.g., `alan-turing-bio`)
- Conflict: append timestamp (e.g., `turing-story-20260118-143052`)
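The slug and conflict rules above might be sketched like this. Both function names are hypothetical. The slug helper only handles ASCII words and does not enforce the two-word minimum, so in practice the agent would still pick the slug itself for non-Latin topics.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Optional


def topic_slug(topic: str, max_words: int = 4) -> str:
    """Lowercase the topic and join up to max_words ASCII words with hyphens."""
    words = re.findall(r"[a-z0-9]+", topic.lower())
    return "-".join(words[:max_words])


def resolve_output_dir(root: Path, slug: str, now: Optional[datetime] = None) -> Path:
    """Return comic/<slug>, or comic/<slug>-YYYYMMDD-HHMMSS if that directory exists."""
    candidate = root / "comic" / slug
    if not candidate.exists():
        return candidate
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    return root / "comic" / f"{slug}-{stamp}"
```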
**Contents**:
| File | Description |
|------|-------------|
| `source-{slug}.md` | Saved source content (kebab-case slug matches the output directory) |
| `analysis.md` | Content analysis |
| `storyboard.md` | Storyboard with panel breakdown |
| `characters/characters.md` | Character definitions |
| `characters/characters.png` | Character reference sheet (downloaded from `image_generate`) |
| `prompts/NN-{cover\|page}-[slug].md` | Generation prompts |
| `NN-{cover\|page}-[slug].png` | Generated images (downloaded from `image_generate`) |
| `refs/NN-ref-{slug}.{ext}` | User-supplied reference images (optional, for provenance) |
## Language Handling
**Detection Priority**:
1. User-specified language (explicit option)
2. User's conversation language
3. Source content language
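The priority order can be read as a first-match lookup. The sketch below assumes each signal is either a language code or `None`, treats `auto` as "not specified", and falls back to `en` when nothing is known; that fallback and the name `resolve_user_language` are assumptions.

```python
from typing import Optional


def resolve_user_language(
    explicit: Optional[str],
    conversation: Optional[str],
    source: Optional[str],
) -> str:
    """Pick the output language: explicit option, then conversation, then source."""
    for candidate in (explicit, conversation, source):
        if candidate and candidate != "auto":
            return candidate
    return "en"  # Assumed fallback, not specified by the skill.
```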
**Rule**: Use user's input language for ALL interactions:
- Storyboard outlines and scene descriptions
- Image generation prompts
- User selection options and confirmations
- Progress updates, questions, errors, summaries
Technical terms remain in English.
## Workflow
### Progress Checklist
```
Comic Progress:
- [ ] Step 1: Setup & Analyze
- [ ] 1.1 Analyze content
- [ ] 1.2 Check existing directory
- [ ] Step 2: Confirmation - Style & options ⚠️ REQUIRED
- [ ] Step 3: Generate storyboard + characters
- [ ] Step 4: Review outline (conditional)
- [ ] Step 5: Generate prompts
- [ ] Step 6: Review prompts (conditional)
- [ ] Step 7: Generate images
- [ ] 7.1 Generate character sheet (if needed) → characters/characters.png
- [ ] 7.2 Generate pages (with character descriptions embedded in prompt)
- [ ] Step 8: Completion report
```
### Flow
```
Input → Analyze → [Check Existing?] → [Confirm: Style + Reviews] → Storyboard → [Review?] → Prompts → [Review?] → Images → Complete
```
### Step Summary
| Step | Action | Key Output |
|------|--------|------------|
| 1.1 | Analyze content | `analysis.md`, `source-{slug}.md` |
| 1.2 | Check existing directory | Handle conflicts |
| 2 | Confirm style, focus, audience, reviews | User preferences |
| 3 | Generate storyboard + characters | `storyboard.md`, `characters/` |
| 4 | Review outline (if requested) | User approval |
| 5 | Generate prompts | `prompts/*.md` |
| 6 | Review prompts (if requested) | User approval |
| 7.1 | Generate character sheet (if needed) | `characters/characters.png` |
| 7.2 | Generate pages | `*.png` files |
| 8 | Completion report | Summary |
### User Questions
Use the `clarify` tool to confirm options. Since `clarify` handles one question at a time, ask the most important question first and proceed sequentially. See [references/workflow.md](references/workflow.md) for the full Step 2 question set.
**Timeout handling (CRITICAL)**: `clarify` can return `"The user did not provide a response within the time limit. Use your best judgement to make the choice and proceed."` — this is NOT user consent to default everything.
- Treat it as a default **for that one question only**. Continue asking the remaining Step 2 questions in sequence; each question is an independent consent point.
- **Surface the default to the user visibly** in your next message so they have a chance to correct it: e.g. `"Style: defaulted to ohmsha preset (clarify timed out). Say the word to switch."` — an unreported default is indistinguishable from never having asked.
- Do NOT collapse Step 2 into a single "use all defaults" pass after one timeout. If the user is genuinely absent, they will be equally absent for all five questions — but they can correct visible defaults when they return, and cannot correct invisible ones.
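The per-question timeout rule can be sketched as a loop that keeps asking after a timeout and records every default it applied. Here `clarify` is passed in as a plain callable that returns the reply text, which is an assumption about how the tool would be wrapped; `ask_step2` and `TIMEOUT_MARKER` are hypothetical names.

```python
from typing import Callable

# Substring of the timeout message quoted above.
TIMEOUT_MARKER = "did not provide a response within the time limit"


def ask_step2(
    questions: list[tuple[str, str, str]],
    clarify: Callable[[str], str],
) -> tuple[dict[str, str], list[str]]:
    """Ask each (key, prompt, default) question in order.

    A timeout applies the default to that one question only; the loop still
    asks every remaining question. Applied defaults are returned as notices
    so they can be shown to the user in the next message.
    """
    answers: dict[str, str] = {}
    notices: list[str] = []
    for key, prompt, default in questions:
        reply = clarify(prompt)
        if TIMEOUT_MARKER in reply:
            answers[key] = default
            notices.append(
                f"{key}: defaulted to {default} (clarify timed out). Say the word to switch."
            )
        else:
            answers[key] = reply
    return answers, notices
```

Returning the notices separately keeps the "surface the default" step explicit instead of leaving it to the caller's memory.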
### Step 7: Image Generation
Use Hermes' built-in `image_generate` tool for all image rendering. Its schema accepts only `prompt` and `aspect_ratio` (`landscape` | `portrait` | `square`); it **returns a URL**, not a local file. Every generated page or character sheet must therefore be downloaded to the output directory.
**Prompt file requirement (hard)**: write each image's full, final prompt to a standalone file under `prompts/` (naming: `NN-{type}-[slug].md`) BEFORE calling `image_generate`. The prompt file is the reproducibility record.
**Aspect ratio mapping** — the storyboard's `aspect_ratio` field maps to `image_generate`'s format as follows:
| Storyboard ratio | `image_generate` format |
|------------------|-------------------------|
| `3:4`, `9:16`, `2:3` | `portrait` |
| `4:3`, `16:9`, `3:2` | `landscape` |
| `1:1` | `square` |
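The mapping table translates directly into a lookup. This sketch covers only the ratios listed above and raises on anything else rather than guessing; `to_image_format` is a hypothetical helper name.

```python
ASPECT_TO_FORMAT = {
    "3:4": "portrait",
    "9:16": "portrait",
    "2:3": "portrait",
    "4:3": "landscape",
    "16:9": "landscape",
    "3:2": "landscape",
    "1:1": "square",
}


def to_image_format(ratio: str) -> str:
    """Map a storyboard aspect ratio to the aspect_ratio value for image_generate."""
    try:
        return ASPECT_TO_FORMAT[ratio.strip()]
    except KeyError:
        raise ValueError(f"unsupported storyboard ratio: {ratio!r}") from None
```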
**Download step** — after every `image_generate` call:
1. Read the URL from the tool result
2. Fetch the image bytes using an **absolute** output path, e.g.
`curl -fsSL "<url>" -o /abs/path/to/comic/<slug>/NN-page-<slug>.png`
3. Verify the file exists and is non-empty at that exact path before proceeding to the next page
**Never rely on shell CWD persistence for `-o` paths.** The terminal tool's persistent-shell CWD can change between batches (session expiry, `TERMINAL_LIFETIME_SECONDS`, a failed `cd` that leaves you in the wrong directory). `curl -o relative/path.png` is a silent footgun: if CWD has drifted, the file lands somewhere else with no error. **Always pass a fully-qualified absolute path to `-o`**, or pass `workdir=<abs path>` to the terminal tool. Incident Apr 2026: pages 06-09 of a 10-page comic landed at the repo root instead of `comic/<slug>/` because batch 3 inherited a stale CWD from batch 2 and `curl -o 06-page-skills.png` wrote to the wrong directory. The agent then spent several turns claiming the files existed where they didn't.
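For comparison with the `curl` command, here is a rough Python equivalent of the download-and-verify step that refuses relative targets outright. It uses only the standard library; `download_image` is a hypothetical helper, not a Hermes tool.

```python
import urllib.request
from pathlib import Path


def download_image(url: str, target: Path, timeout: float = 60.0) -> Path:
    """Fetch url to an absolute target path and verify the file is non-empty."""
    if not target.is_absolute():
        # Guard against the CWD-drift problem: never accept a relative path.
        raise ValueError(f"target must be an absolute path, got {target}")
    target.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = response.read()
    if not data:
        raise RuntimeError(f"empty response body from {url}")
    target.write_bytes(data)
    if target.stat().st_size == 0:
        raise RuntimeError(f"{target} is empty after download")
    return target
```

Failing loudly on a relative path turns the silent misplacement described above into an immediate error.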
**7.1 Character sheet** — generate it (to `characters/characters.png`, aspect `landscape`) when the comic is multi-page with recurring characters. Skip for simple presets (e.g., four-panel minimalist) or single-page comics. The prompt file at `characters/characters.md` must exist before invoking `image_generate`. The rendered PNG is a **human-facing review artifact** (so the user can visually verify character design) and a reference for later regenerations or manual prompt edits — it does **not** drive Step 7.2. Page prompts are already written in Step 5 from the **text descriptions** in `characters/characters.md`; `image_generate` cannot accept images as visual input.
**7.2 Pages** — each page's prompt MUST already be at `prompts/NN-{cover|page}-[slug].md` before invoking `image_generate`. Because `image_generate` is prompt-only, character consistency is enforced by **embedding character descriptions (sourced from `characters/characters.md`) inline in every page prompt during Step 5**. The embedding is done uniformly whether or not a PNG sheet is produced in 7.1; the PNG is only a review/regeneration aid.
**Backup rule**: existing `prompts/…md` and `…png` files → rename with `-backup-YYYYMMDD-HHMMSS` suffix before regenerating.
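The backup rule might look like this in code. Placing the `-backup-…` suffix before the file extension is an assumption about the intended naming, and `backup_existing` is a hypothetical name.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def backup_existing(path: Path, now: Optional[datetime] = None) -> Optional[Path]:
    """Rename an existing file to <stem>-backup-YYYYMMDD-HHMMSS<ext>.

    Returns the backup path, or None when there was nothing to back up.
    """
    if not path.exists():
        return None
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    backup = path.with_name(f"{path.stem}-backup-{stamp}{path.suffix}")
    path.rename(backup)
    return backup
```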
Full step-by-step workflow (analysis, storyboard, review gates, regeneration variants): [references/workflow.md](references/workflow.md).
## References
**Core Templates**:
- [analysis-framework.md](references/analysis-framework.md) - Deep content analysis
- [character-template.md](references/character-template.md) - Character definition format
- [storyboard-template.md](references/storyboard-template.md) - Storyboard structure
- [ohmsha-guide.md](references/ohmsha-guide.md) - Ohmsha manga specifics
**Style Definitions**:
- `references/art-styles/` - Art styles (ligne-claire, manga, realistic, ink-brush, chalk, minimalist)
- `references/tones/` - Tones (neutral, warm, dramatic, romantic, energetic, vintage, action)
- `references/presets/` - Presets with special rules (ohmsha, wuxia, shoujo, concept-story, four-panel)
- `references/layouts/` - Layouts (standard, cinematic, dense, splash, mixed, webtoon, four-panel)
**Workflow**:
- [workflow.md](references/workflow.md) - Full workflow details
- [auto-selection.md](references/auto-selection.md) - Content signal analysis
- [partial-workflows.md](references/partial-workflows.md) - Partial workflow options
## Page Modification
| Action | Steps |
|--------|-------|
| **Edit** | **Update prompt file FIRST** → regenerate image → download new PNG |
| **Add** | Create prompt at position → generate with character descriptions embedded → renumber subsequent → update storyboard |
| **Delete** | Remove files → renumber subsequent → update storyboard |
**IMPORTANT**: When updating pages, ALWAYS update the prompt file (`prompts/NN-{cover|page}-[slug].md`) FIRST before regenerating. This ensures changes are documented and reproducible.
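For the Delete action, renumbering the subsequent pages can be planned before any file is touched. The sketch below assumes two-digit `NN` prefixes and the `NN-{cover|page}-[slug]` naming; `renumber_after_delete` is a hypothetical helper that only computes the renames.

```python
import re

PAGE_RE = re.compile(r"^(\d{2})-(cover|page)-(.+)$")


def renumber_after_delete(names: list[str], deleted: int) -> dict[str, str]:
    """Return {old_name: new_name} for every file numbered after the deleted page."""
    plan: dict[str, str] = {}
    for name in names:
        match = PAGE_RE.match(name)
        if not match:
            continue  # Ignore files that do not follow the page naming scheme.
        number = int(match.group(1))
        if number > deleted:
            plan[name] = f"{number - 1:02d}-{match.group(2)}-{match.group(3)}"
    return plan
```

The same plan would need to be applied to both the `prompts/` files and the PNGs, in ascending order so no rename overwrites a file that has not moved yet.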
## Pitfalls
- Image generation takes 10-30 seconds per page; retry once automatically on failure
- **Always download** the URL returned by `image_generate` to a local PNG — downstream tooling (and the user's review) expects files in the output directory, not ephemeral URLs
- **Use absolute paths for `curl -o`** — never rely on persistent-shell CWD across batches. Silent footgun: files land in the wrong directory and subsequent `ls` on the intended path shows nothing. See Step 7 "Download step".
- Use stylized alternatives for sensitive public figures
- **Step 2 confirmation required** - do not skip
- **Steps 4/6 conditional** - only if user requested in Step 2
- **Step 7.1 character sheet** - recommended for multi-page comics, optional for simple presets. The PNG is a review/regeneration aid; page prompts (written in Step 5) use the text descriptions in `characters/characters.md`, not the PNG. `image_generate` does not accept images as visual input
- **Strip secrets** — scan source content for API keys, tokens, or credentials before writing any output file
@@ -0,0 +1,176 @@
# Comic Content Analysis Framework
Deep analysis framework for transforming source content into effective visual storytelling.
## Purpose
Before creating a comic, thoroughly analyze the source material to:
- Identify the target audience and their needs
- Determine what value the comic will deliver
- Extract narrative potential for visual storytelling
- Plan character arcs and key moments
## Analysis Dimensions
### 1. Core Content (Understanding "What")
**Central Message**
- What is the single most important idea readers should take away?
- Can you express it in one sentence?
**Key Concepts**
- What are the essential concepts readers must understand?
- How should these concepts be visualized?
- Which concepts need simplified explanations?
**Content Structure**
- How is the source material organized?
- What is the natural narrative arc?
- Where are the climax and turning points?
**Evidence & Examples**
- What concrete examples, data, or stories support the main ideas?
- Which examples translate well to visual panels?
- What can be shown rather than told?
### 2. Context & Background (Understanding "Why")
**Source Origin**
- Who created this content? What is their perspective?
- What was the original purpose?
- Is there bias to be aware of?
**Historical/Cultural Context**
- When and where does the story take place?
- What background knowledge do readers need?
- What period-specific visual elements are required?
**Underlying Assumptions**
- What does the source assume readers already know?
- What implicit beliefs or values are present?
- Should the comic challenge or reinforce these?
### 3. Audience Analysis
**Primary Audience**
- Who will read this comic?
- What is their existing knowledge level?
- What are their interests and motivations?
**Secondary Audiences**
- Who else might benefit from this comic?
- How might their needs differ?
**Reader Questions**
- What questions will readers have?
- What misconceptions might they bring?
- What "aha moments" can we create?
### 4. Value Proposition
**Knowledge Value**
- What will readers learn?
- What new perspectives will they gain?
- How will this change their understanding?
**Emotional Value**
- What emotions should readers feel?
- What connections will they make with characters?
- What will make this memorable?
**Practical Value**
- Can readers apply what they learn?
- What actions might this inspire?
- What conversations might it spark?
### 5. Narrative Potential
**Story Arc Candidates**
- What natural narratives exist in the content?
- Where is the conflict or tension?
- What transformations occur?
**Character Potential**
- Who are the key figures?
- What are their motivations and obstacles?
- How do they change throughout?
**Visual Opportunities**
- What scenes have strong visual potential?
- Where can abstract concepts become concrete images?
- What metaphors can be visualized?
**Dramatic Moments**
- What are the breakthrough/revelation moments?
- Where are the emotional peaks?
- What creates tension and release?
### 6. Adaptation Considerations
**What to Keep**
- Essential facts and ideas
- Key quotes or moments
- Core emotional beats
**What to Simplify**
- Complex explanations
- Dense technical details
- Lengthy descriptions
**What to Expand**
- Brief mentions that deserve more attention
- Implied emotions or relationships
- Visual details not in source
**What to Omit**
- Tangential information
- Redundant examples
- Content that doesn't serve the narrative
## Output Format
Analysis results should be saved to `analysis.md` with:
1. **YAML Front Matter**: Metadata (title, topic, time_span, source_language, user_language, aspect_ratio, recommended_page_count, recommended_art, recommended_tone, recommended_layout)
2. **Target Audience**: Primary, secondary, tertiary audiences with their needs
3. **Value Proposition**: What readers will gain (knowledge, emotional, practical)
4. **Core Themes**: Table with theme, narrative potential, visual opportunity
5. **Key Figures & Story Arcs**: Character profiles with arcs, visual identity, key moments
6. **Content Signals**: Style and layout recommendations based on content type
7. **Recommended Approaches**: Narrative approaches ranked by suitability
### YAML Front Matter Example
```yaml
---
title: "Alan Turing: The Father of Computing"
topic: alan-turing-biography
time_span: 1912-1954
source_language: en
user_language: zh # User-specified or detected from conversation
aspect_ratio: "3:4"
recommended_page_count: 16
recommended_art: ligne-claire # ligne-claire|manga|realistic|ink-brush|chalk|minimalist
recommended_tone: neutral # neutral|warm|dramatic|romantic|energetic|vintage|action
recommended_layout: mixed # standard|cinematic|dense|splash|mixed|webtoon|four-panel
---
```
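A small sanity check for the three `recommended_*` fields could look like the following, assuming the front matter has already been parsed into a dict. The allowed values are taken from the art, tone, and layout option lists in `SKILL.md`; `validate_front_matter` is a hypothetical helper.

```python
ALLOWED = {
    "recommended_art": {
        "ligne-claire", "manga", "realistic", "ink-brush", "chalk", "minimalist",
    },
    "recommended_tone": {
        "neutral", "warm", "dramatic", "romantic", "energetic", "vintage", "action",
    },
    "recommended_layout": {
        "standard", "cinematic", "dense", "splash", "mixed", "webtoon", "four-panel",
    },
}


def validate_front_matter(meta: dict) -> list[str]:
    """Return a list of problems; an empty list means the three fields are valid."""
    problems = []
    for field, allowed in ALLOWED.items():
        value = meta.get(field)
        if value not in allowed:
            problems.append(f"{field}: {value!r} is not one of {sorted(allowed)}")
    return problems
```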
### Language Fields
| Field | Description |
|-------|-------------|
| `source_language` | Detected language of source content |
| `user_language` | Output language for comic (user-specified option > conversation language > source_language) |
## Analysis Checklist
Before proceeding to storyboard:
- [ ] Can I state the core message in one sentence?
- [ ] Do I know exactly who will read this comic?
- [ ] Have I identified at least 3 ways this comic provides value?
- [ ] Are there clear protagonists with compelling arcs?
- [ ] Have I found at least 5 visually powerful moments?
- [ ] Do I understand what to keep, simplify, expand, and omit?
- [ ] Have I identified the emotional peaks and valleys?
@@ -0,0 +1,101 @@
# chalk
粉笔画风 - Chalkboard aesthetic with hand-drawn warmth
## Overview
Classic classroom chalkboard aesthetic with hand-drawn chalk illustrations. Nostalgic educational feel with imperfect, sketchy lines that capture the warmth of traditional teaching.
## Line Work
- Sketchy, imperfect hand-drawn lines
- Chalk texture on all strokes
- Varying line weight from chalk pressure
- Soft edges, no sharp digital lines
- Visible chalk dust effects
## Character Design
- Simplified, friendly character designs
- Stick figures to semi-detailed range
- Expressive through simple gestures
- Approachable, non-intimidating
- Educational presenter style
## Background
- Chalkboard Black (#1A1A1A) or Dark Green-Black (#1C2B1C)
- Realistic chalkboard texture
- Subtle scratches and dust particles
- Faint eraser marks for authenticity
- Wooden frame border optional
## Typography
- Hand-drawn chalk lettering style
- Visible chalk texture on text
- Imperfect baseline adds authenticity
- White or bright colored chalk for emphasis
## Visual Elements
- Hand-drawn chalk illustrations
- Chalk dust effects around elements
- Doodles: stars, arrows, underlines, circles
- Mathematical formulas and diagrams
- Eraser smudges and chalk residue
- Stick figures and simple icons
- Connection lines with hand-drawn feel
## Default Color Palette
| Role | Color | Hex |
|------|-------|-----|
| Background | Chalkboard Black | #1A1A1A |
| Alt Background | Green-Black | #1C2B1C |
| Primary Text | Chalk White | #F5F5F5 |
| Accent 1 | Chalk Yellow | #FFE566 |
| Accent 2 | Chalk Pink | #FF9999 |
| Accent 3 | Chalk Blue | #66B3FF |
| Accent 4 | Chalk Green | #90EE90 |
| Accent 5 | Chalk Orange | #FFB366 |
## Style Rules
### Do
- Maintain authentic chalk texture on all elements
- Use imperfect, hand-drawn quality throughout
- Add subtle chalk dust and smudge effects
- Create visual hierarchy with color variety
- Include playful doodles and annotations
### Don't
- Use perfect geometric shapes
- Create clean digital-looking lines
- Add photorealistic elements
- Use gradients or glossy effects
## Quality Markers
- ✓ Authentic chalk texture throughout
- ✓ Imperfect, hand-drawn quality
- ✓ Readable despite sketchy style
- ✓ Nostalgic classroom feel
- ✓ Effective color hierarchy
- ✓ Playful educational aesthetic
## Compatibility
| Tone | Fit | Notes |
|------|-----|-------|
| neutral | ✓✓ | Classic educational |
| warm | ✓✓ | Nostalgic feel |
| dramatic | ✗ | Style mismatch |
| vintage | ✓ | Old school feel |
| romantic | ✗ | Style mismatch |
| energetic | ✓✓ | Fun learning |
| action | ✗ | Style mismatch |
## Best For
Educational content, tutorials, classroom themes, teaching materials, workshops, informal learning, knowledge sharing
@@ -0,0 +1,97 @@
# ink-brush
水墨画风 - Chinese ink brush aesthetics with dynamic strokes
## Overview
Traditional Chinese ink brush painting style adapted for comics. Combines calligraphic brush strokes with ink wash effects. Creates atmospheric, artistic visuals rooted in East Asian aesthetics.
## Line Work
- 2-3px dynamic brush strokes with varying weight
- Ink wash effects, traditional Chinese brush feel
- Bold, confident strokes with sharp edges
- Flowing lines for fabric and hair
- Pressure-sensitive stroke variation
## Character Design
- Realistic human proportions (7.5-8 head heights)
- Defined features with ink brush definition
- Dynamic poses capturing movement
- Flowing hair and clothing in motion
- Traditional attire options (robes, hanfu)
- Intense, expressive faces
## Brush Techniques
| Technique | Usage |
|-----------|-------|
| Bold strokes | Character outlines |
| Fine lines | Details, hair |
| Ink wash | Atmosphere, shadows |
| Dry brush | Texture, aging |
| Splatter | Impact, drama |
## Background Treatment
- Dramatic landscapes: mountains, waterfalls, temples
- Ink wash atmospheric effects
- Misty, layered depth
- Traditional architecture elements
- High contrast silhouettes
- Negative space as design element
## Color Approach
- Ink gradients as primary
- Limited accent colors
- Traditional Chinese palette
- Atmospheric color washes
- High contrast compositions
## Default Color Palette
| Role | Color | Hex |
|------|-------|-----|
| Primary | Deep black ink | #1A1A1A |
| Accent | Crimson red | #8B0000 |
| Accent | Imperial gold | #D4AF37 |
| Skin | Natural tan | #D4A574 |
| Background | Misty gray | #9CA3AF |
| Background | Earth tone | #8B7355 |
| Wash | Ink gradient | #2D3748 |
## Visual Elements
- Calligraphic text integration
- Seal stamps (optional)
- Ink splatter effects
- Flowing fabric trails
- Atmospheric mist
- Mountain silhouettes
## Quality Markers
- ✓ Dynamic brush stroke quality
- ✓ Authentic ink wash atmosphere
- ✓ High contrast compositions
- ✓ Flowing movement in fabric/hair
- ✓ Traditional aesthetic elements
- ✓ Atmospheric depth
## Compatibility
| Tone | Fit | Notes |
|------|-----|-------|
| neutral | ✓ | Contemplative stories |
| warm | ✓ | Nostalgic, gentle |
| dramatic | ✓✓ | High contrast |
| vintage | ✓✓ | Historical pieces |
| romantic | ✗ | Style mismatch |
| energetic | ✗ | Too refined |
| action | ✓✓ | Martial arts |
## Best For
Chinese historical stories, martial arts, traditional tales, contemplative narratives, artistic adaptations
@@ -0,0 +1,75 @@
# ligne-claire
清线画风 - Uniform lines, flat colors, European comic tradition
## Overview
Classic European comic style originating from Hergé's Tintin. Characterized by clean, uniform outlines and flat color fills without gradients. Creates a timeless, accessible aesthetic suitable for educational and narrative content.
## Line Work
- Uniform, clean outlines with consistent weight (2px)
- No hatching or cross-hatching for shading
- Sharp, precise edges on all elements
- Black ink outlines on all figures and objects
- Shadows indicated through flat color areas, not line techniques
## Character Design
- Slightly stylized/cartoonish characters with realistic proportions
- Distinctive, recognizable facial features
- Expressive faces with clear emotions
- Period-appropriate clothing with attention to detail
- Consistent character appearance across panels
- 6-7 head height proportions
## Background Treatment
- Detailed, realistic backgrounds with architectural accuracy
- Period-specific props and technology
- Clear spatial depth and perspective
- Environmental storytelling through details
- Contrast between simplified characters and detailed backgrounds
## Color Approach
- Flat colors without gradients (true to Ligne Claire tradition)
- Limited palette per page for cohesion
- Colors support narrative mood
- Consistent lighting logic within scenes
## Default Color Palette
| Role | Color | Hex |
|------|-------|-----|
| Primary Blue | Clean blue | #3182CE |
| Primary Red | Classic red | #E53E3E |
| Primary Yellow | Warm yellow | #ECC94B |
| Skin | Warm tan | #F7CFAE |
| Background Light | Light cream | #FFFAF0 |
| Background Sky | Sky blue | #BEE3F8 |
## Quality Markers
- ✓ Clean, uniform line weight throughout
- ✓ Flat colors without gradients
- ✓ Detailed backgrounds, stylized characters
- ✓ Clear panel borders and reading flow
- ✓ Hand-drawn text style
- ✓ Proper perspective in environments
## Compatibility
| Tone | Fit | Notes |
|------|-----|-------|
| neutral | ✓✓ | Classic combination |
| warm | ✓✓ | Nostalgic stories |
| dramatic | ✓ | Works with high contrast |
| vintage | ✓ | Period pieces |
| romantic | ✗ | Style mismatch |
| energetic | ✓ | Lighter stories |
| action | ✗ | Lacks dynamic lines |
## Best For
Educational content, balanced narratives, biography comics, historical stories
@@ -0,0 +1,93 @@
# manga
日漫画风 - Anime/manga aesthetics with expressive characters
## Overview
Japanese manga art style characterized by large expressive eyes, dynamic poses, and visual emotion indicators. Versatile style that works across genres from educational to romantic to action.
## Line Work
- Clean, smooth lines (1.5-2px)
- Expressive weight variation for emphasis
- Smooth curves, dynamic strokes
- Speed lines and motion effects available
- Screen tone effects for atmosphere
## Character Design
- Anime/manga proportions: larger eyes, expressive faces
- 5-7 head height proportions (varies by sub-style)
- Clear emotional indicators (sweat drops, sparkles)
- Dynamic poses and gestures
- Detailed hair with individual strands
- Fashionable clothing with natural folds
## Eye Styles
| Type | Description |
|------|-------------|
| Standard | Medium-large, 2-3 highlights |
| Educational | Friendly, approachable eyes |
| Dramatic | Intense, detailed irises |
| Cute | Very large, sparkly eyes |
## Background Treatment
- Simplified during dialogue/explanation
- Detailed for establishing shots
- Screen tone gradients for mood
- Abstract backgrounds for emotional moments
- Technical diagrams styled as displays
## Color Approach
- Clean, bright anime colors
- Soft gradients on skin
- Vibrant palette options
- Light and shadow with soft transitions
- Color coding for character identification
## Default Color Palette
| Role | Color | Hex |
|------|-------|-----|
| Primary Blue | Bright blue | #4299E1 |
| Primary Orange | Warm orange | #ED8936 |
| Primary Green | Soft green | #68D391 |
| Skin | Anime warm | #FEEBC8 |
| Background | Clean white | #FFFFFF |
| Highlight | Golden | #FFD700 |
## Visual Elements
- Speech bubbles: rounded (normal), spiky (excitement)
- Sound effects integrated visually
- Emotion symbols (sweat drops, anger marks, hearts)
- Speed lines and motion blur
- Sparkle and glow effects
## Quality Markers
- ✓ Expressive character faces
- ✓ Clean, consistent line work
- ✓ Dynamic poses and compositions
- ✓ Appropriate use of manga conventions
- ✓ Readable panel flow
- ✓ Consistent character designs
## Compatibility
| Tone | Fit | Notes |
|------|-----|-------|
| neutral | ✓✓ | Educational manga |
| warm | ✓ | Slice of life |
| dramatic | ✓ | Intense moments |
| romantic | ✓✓ | Shoujo style |
| energetic | ✓✓ | Shonen style |
| vintage | ✗ | Style mismatch |
| action | ✓✓ | Battle manga |
## Best For
Educational tutorials, romance, action, coming-of-age, technical explanations, youth-oriented content
@@ -0,0 +1,84 @@
# minimalist
极简画风 - Clean black line art, limited spot color, simplified stick-figure characters
## Overview
Minimalist cartoon illustration characterized by clean black line art on white background with very limited spot color for emphasis. Characters are simplified to near-stick-figure abstraction, focusing on gesture and concept rather than anatomical detail. Designed for business allegory, quick-read educational content, and concept illustration.
## Line Work
- Clean, uniform black lines (1.5-2px)
- No hatching, cross-hatching, or shading techniques
- Minimal detail — every line serves a purpose
- Bold outlines for characters, thinner lines for props/labels
- No decorative flourishes or ornamental lines
## Character Design
- Highly simplified, stick-figure-like business characters
- Circle or oval heads with minimal facial features (dot eyes, simple line mouth)
- Body as simple geometric shapes or line constructions
- Distinguishing features through props only (tie, hat, briefcase, glasses)
- No anatomical detail — expressive through posture and gesture
- 4-5 head height proportions (squat, iconic)
## Background Treatment
- Mostly blank/white — negative space is a design element
- Minimal environmental cues (a line for ground, simple desk outline)
- Concept labels and text annotations replace detailed environments
- Icons and symbols over realistic rendering
- No perspective or spatial depth
## Color Approach
- Primarily black and white (90%+ of the image)
- 1-2 spot accent colors for emphasis on key concepts
- Accent color used sparingly: highlighting key objects, text labels, concept indicators
- No gradients, no shading, no color fills on backgrounds
- Color draws the eye to the most important element in each panel
## Default Color Palette
| Role | Color | Hex |
|------|-------|-----|
| Primary | Black ink | `#1A1A1A` |
| Background | Clean white | `#FFFFFF` |
| Accent 1 | Spot orange | `#FF6B35` |
| Accent 2 | Spot blue (optional) | `#3182CE` |
| Text labels | Dark gray | `#4A4A4A` |
| Panel border | Medium gray | `#666666` |
## Visual Elements
- Text labels with accent-color backgrounds or underlines for key terms
- Simple icons: arrows, circles, checkmarks, crosses
- Concept highlight boxes with spot color
- Minimal speech bubbles (simple oval or rectangle, thin black outline)
- No sound effects, no motion lines, no screen tones
## Quality Markers
- ✓ Clean, purposeful line work with no unnecessary detail
- ✓ 90%+ black-and-white with strategic spot color
- ✓ Simplified characters readable at small sizes
- ✓ Text labels integrated naturally into panels
- ✓ Strong negative space usage
- ✓ Every element serves the narrative point
## Compatibility
| Tone | Fit | Notes |
|------|-----|-------|
| neutral | ✓✓ | Ideal for business/educational content |
| warm | ✓ | Works for gentle stories, slight warmth in accent |
| energetic | ✓ | Works for punchy, high-energy content |
| dramatic | ✗ | Style too stripped down for dramatic intensity |
| vintage | ✗ | Minimalist aesthetic conflicts with aged/textured look |
| romantic | ✗ | No capacity for decorative/soft elements |
| action | ✗ | No dynamic line capability for speed/impact |
## Best For
Business allegory, management fables, short concept illustration, four-panel comic strips, quick-insight education, social media content
@@ -0,0 +1,89 @@
# realistic
写实画风 - Digital painting with realistic proportions and lighting
## Overview
Full-color realistic manga style using digital painting techniques. Features anatomically accurate characters, rich gradients, and detailed environmental rendering. Sophisticated aesthetic for mature audiences.
## Line Work
- Clean, precise outlines with clear contours
- Uniform line weight for character definition
- No excessive hatching - rely on color for depth
- Smooth curves and realistic anatomical lines
- Ligne Claire influence: clean but not simplified
## Character Design
- Realistic human proportions (7-8 head heights)
- Anatomically accurate features and expressions
- Detailed facial structure without exaggeration
- Natural poses and body language
- Consistent appearance across panels
- Subtle expressions rather than manga-style exaggeration
## Rendering Style
- Full-color digital painting with rich gradients
- Soft shadow transitions on skin and fabric
- Realistic material textures (glass, liquid, fabric, wood)
- Detailed hair with natural shine and volume
- Environmental lighting affects all elements
- NOT flat cel-shading - smooth color blending
## Background Treatment
- Highly detailed, realistic environments
- Accurate perspective and spatial depth
- Atmospheric lighting (warm indoor, cool outdoor)
- Professional settings rendered with precision
- Props and objects with realistic textures
## Color Approach
- Rich gradients for depth and volume
- Realistic lighting with warm/cool contrast
- Material-specific rendering
- Subtle color temperature shifts
- Professional, sophisticated palette
## Default Color Palette
| Role | Color | Hex |
|------|-------|-----|
| Skin Light | Natural warm | #F5D6C6 |
| Skin Shadow | Warm shadow | #E8C4B0 |
| Environment | Warm wood | #8B7355 |
| Environment Cool | Cool stone | #9CA3AF |
| Accent | Wine red | #722F37 |
| Accent Gold | Gold | #D4AF37 |
| Light Warm | Amber | #FFB347 |
| Light Cool | Cool blue | #B0C4DE |
## Quality Markers
- ✓ Anatomically accurate proportions
- ✓ Smooth color gradients (not flat fills)
- ✓ Realistic material textures
- ✓ Detailed, atmospheric backgrounds
- ✓ Natural lighting with soft shadows
- ✓ Subtle but readable expressions
- ✓ Professional aesthetic
- ✓ Clean speech bubbles
## Compatibility
| Tone | Fit | Notes |
|------|-----|-------|
| neutral | ✓✓ | Professional content |
| warm | ✓✓ | Nostalgic stories |
| dramatic | ✓✓ | High drama |
| vintage | ✓✓ | Period pieces |
| romantic | ✗ | Style mismatch |
| energetic | ✗ | Too refined |
| action | ✓ | Serious action |
## Best For
Professional topics (wine, food, business), lifestyle content, adult narratives, documentary-style, mature educational guides
@@ -0,0 +1,71 @@
# Auto Selection
Content signals determine default art + tone + layout (or preset).
## Content Signal Matrix
| Content Signals | Art Style | Tone | Layout | Preset |
|-----------------|-----------|------|--------|--------|
| Tutorial, how-to, beginner | manga | neutral | webtoon | **ohmsha** |
| Computing, AI, programming | manga | neutral | dense | **ohmsha** |
| Technical explanation, educational | manga | neutral | webtoon | **ohmsha** |
| Pre-1950, classical, ancient | realistic | vintage | cinematic | - |
| Personal story, mentor | ligne-claire | warm | standard | - |
| Psychology, motivation, self-help, coaching | manga | warm | standard | **concept-story** |
| Business narrative, management, leadership | manga | warm | standard | **concept-story** |
| Conflict, breakthrough | (inherit) | dramatic | splash | - |
| Wine, food, lifestyle | realistic | neutral | cinematic | - |
| Martial arts, wuxia, xianxia | ink-brush | action | splash | **wuxia** |
| Romance, love, school life | manga | romantic | standard | **shoujo** |
| Business allegory, fable, parable, short insight, 四格 | minimalist | neutral | four-panel | **four-panel** |
| Biography, balanced | ligne-claire | neutral | mixed | - |
## Preset Recommendation Rules
**When a preset is recommended**: Load `presets/{preset}.md` and apply all of its special rules.
### ohmsha
- **Triggers**: Tutorial, technical, educational, computing, programming, how-to, beginner
- **Special rules**: Visual metaphors, NO talking heads, gadget reveals, Doraemon-style characters
- **Base**: manga + neutral + webtoon/dense
### wuxia
- **Triggers**: Martial arts, wuxia, xianxia, cultivation, swordplay
- **Special rules**: Qi effects, combat visuals, atmospheric elements
- **Base**: ink-brush + action + splash
### shoujo
- **Triggers**: Romance, love story, school life, emotional drama
- **Special rules**: Decorative elements, eye details, romantic beats
- **Base**: manga + romantic + standard
### concept-story
- **Triggers**: Psychology, motivation, self-help, business narrative, management, leadership, personal growth, coaching, soft skills, abstract concept through story
- **Special rules**: Visual symbol system, growth arc, dialogue+action balance, original characters
- **Base**: manga + warm + standard
### four-panel
- **Triggers**: Business allegory, fable, parable, short insight, four-panel, 四格, 四格漫画, single-page comic, minimalist comic strip
- **Special rules**: Strict 起承转合 4-panel structure, B&W + spot color, simplified stick-figure characters, single-page story
- **Base**: minimalist + neutral + four-panel
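The trigger lists above can be read as keyword lookups. The following is a minimal sketch in Python, assuming plain case-insensitive substring matching; in practice the agent judges the content itself. `PRESET_TRIGGERS` and `recommend_preset` are hypothetical names, and only a subset of the triggers is included.

```python
from typing import Optional

# Hypothetical subset of the trigger lists above, lowercased for matching.
PRESET_TRIGGERS = {
    "ohmsha": ["tutorial", "how-to", "beginner", "programming", "computing"],
    "wuxia": ["martial arts", "wuxia", "xianxia", "cultivation", "swordplay"],
    "shoujo": ["romance", "love story", "school life"],
    "concept-story": ["psychology", "motivation", "self-help", "leadership", "coaching"],
    "four-panel": ["fable", "parable", "four-panel", "四格"],
}


def recommend_preset(content: str) -> Optional[str]:
    """Return the preset with the most trigger hits, or None if nothing matches."""
    text = content.lower()
    scores = {
        preset: sum(1 for trigger in triggers if trigger in text)
        for preset, triggers in PRESET_TRIGGERS.items()
    }
    # On a tie, max() keeps the first preset in dictionary order.
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

A return value of `None` corresponds to the rows of the Content Signal Matrix whose Preset column is `-`.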
## Compatibility Matrix
How well each art style pairs with each tone:
| Art Style | ✓✓ Best | ✓ Works | ✗ Avoid |
|-----------|---------|---------|---------|
| ligne-claire | neutral, warm | dramatic, vintage, energetic | romantic, action |
| manga | neutral, romantic, energetic, action | warm, dramatic | vintage |
| realistic | neutral, warm, dramatic, vintage | action | romantic, energetic |
| ink-brush | dramatic, action, vintage | neutral, warm | romantic, energetic |
| chalk | neutral, warm, energetic | vintage | dramatic, action, romantic |
| minimalist | neutral | warm, energetic | dramatic, vintage, romantic, action |
**Note**: Art Style × Tone × Layout can be freely combined. Combinations marked ✗ still run but may produce unexpected results.
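As an illustration, the matrix can be encoded as a lookup table. This is a sketch only: `COMPATIBILITY` and `fit` are hypothetical names, and just three rows are transcribed; the remaining styles would follow the same shape.

```python
# Three rows transcribed from the matrix above (illustrative subset).
COMPATIBILITY = {
    "ligne-claire": {
        "best": {"neutral", "warm"},
        "works": {"dramatic", "vintage", "energetic"},
        "avoid": {"romantic", "action"},
    },
    "manga": {
        "best": {"neutral", "romantic", "energetic", "action"},
        "works": {"warm", "dramatic"},
        "avoid": {"vintage"},
    },
    "minimalist": {
        "best": {"neutral"},
        "works": {"warm", "energetic"},
        "avoid": {"dramatic", "vintage", "romantic", "action"},
    },
}


def fit(art: str, tone: str) -> str:
    """Return "best", "works", or "avoid" for a pair, or "unknown" if not listed."""
    for level, tones in COMPATIBILITY.get(art, {}).items():
        if tone in tones:
            return level
    return "unknown"
```

Because the note above allows any combination, an `"avoid"` result is best treated as a warning to surface to the user, not as a hard error.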
## Priority Order
1. User-specified options (art / tone / style)
2. Content signal analysis → auto-selection
3. Fallback: ligne-claire + neutral + standard
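
The priority order can be sketched as a layered merge. This assumes priorities apply per dimension, so a user who sets only the tone still receives the auto-selected art style and layout; the source does not state this explicitly. `FALLBACK` and `resolve_options` are hypothetical names.

```python
from typing import Dict, Optional

# Priority 3: the documented fallback combination.
FALLBACK = {"art": "ligne-claire", "tone": "neutral", "layout": "standard"}


def resolve_options(
    user: Optional[Dict[str, str]] = None,
    auto: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Merge sources so user choices beat auto-selection, which beats the fallback."""
    resolved = dict(FALLBACK)
    resolved.update(auto or {})  # Priority 2: content signal analysis
    resolved.update(user or {})  # Priority 1: user-specified options win
    return resolved
```

For example, `resolve_options(user={"tone": "dramatic"}, auto={"art": "manga", "tone": "neutral", "layout": "webtoon"})` keeps the auto-selected `manga` and `webtoon` but overrides the tone with `dramatic`.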
@@ -0,0 +1,98 @@
Create a knowledge biography comic page following these guidelines:
## Image Specifications
- **Type**: Comic book page with multiple panels
- **Orientation**: Portrait (vertical)
- **Aspect Ratio**: 2:3
- **Style**: See style-specific reference for visual guidelines
## Panel Structure
### Panel Borders
- Clean black lines (1-2px) around each panel
- White gutters between panels (8-12px)
- Panels arranged for clear reading flow
- Variety in panel sizes for visual rhythm
### Panel Composition
- Clear focal points in each panel
- Proper use of foreground, midground, background
- Camera angles vary: eye level, bird's eye, low angle, close-up, wide shot
- Action flows logically between panels
- Negative space used intentionally
## Text Elements
### Speech Bubbles
- **Dialogue**: Oval/elliptical bubbles with pointed tails
- White fill with thin black outline
- Tail points clearly to speaker
- Hand-lettered style font (not computer-generated)
### Narrator Boxes
- **Fourth Wall/Narrator**: Rectangular boxes
- Often positioned at panel edges (top or bottom)
- Slightly different fill color (cream or light yellow)
- Used for commentary, time jumps, explanations
### Thought Bubbles
- Cloud-shaped with bubble trail leading to thinker
- Softer outline than speech bubbles
- For internal monologue
### Caption Bars
- Rectangular bars at panel edges
- Time and place information
- "Meanwhile...", "Three years later..." type transitions
- Darker fill with white text, or vice versa
### Typography
- Hand-drawn lettering style throughout
- Bold for emphasis and key terms
- Consistent letter sizing
- Chinese text: use full-width punctuation “”，。！
- Clear hierarchy: titles > dialogue > captions
## Scientific/Concept Visualization
When depicting abstract concepts:
| Concept | Visual Metaphor |
|---------|----------------|
| Neural networks | Glowing nodes connected by clean lines |
| Data flow | Luminous particles along simple paths |
| Algorithms | Geometric patterns, building blocks |
| Logic/proof | Interlocking puzzle pieces |
| Discovery | Light breaking through darkness |
| Uncertainty | Forking paths, question marks |
| Time | Clock motifs, calendar pages |
- Integrate diagrams naturally into narrative panels
- Use inset panels or thought-bubble style for explanations
- Simplified iconography over realistic depiction
## Fourth Wall / Narrator Character
When depicting narrator characters addressing the reader:
- Character may look directly out of panel
- Can appear in "present day" framing scenes
- Distinct visual treatment from main timeline
- Often at page edges or in dedicated panels
- May comment on or question the events shown
## Historical Accuracy
- Research period-specific details: costumes, technology, architecture
- Show aging naturally for characters across time periods
- Iconic items and locations rendered recognizably
- Balance accuracy with stylization
## Language
- All text in Chinese (中文) unless source material is in another language
- Use Chinese full-width punctuation: “”，。！
---
Please generate the comic page based on the content provided below:
@@ -0,0 +1,180 @@
# Character Definition Template
## Character Document Format
Create `characters/characters.md` with the following structure:
```markdown
# Character Definitions - [Comic Title]
**Style**: [selected style]
**Art Direction**: [Ligne Claire / Manga / etc.]
---
## Character 1: [Name]
**Role**: [Protagonist / Mentor / Antagonist / Narrator]
**Age**: [approximate age or age range in story]
**Appearance**:
- Face shape: [oval/square/round]
- Hair: [color, style, length]
- Eyes: [color, shape, distinctive features]
- Build: [height, body type]
- Distinguishing features: [glasses, beard, scar, etc.]
**Costume**:
- Default outfit: [detailed description]
- Color palette: [primary colors for this character]
- Accessories: [hat, bag, tools, etc.]
**Expression Range**:
- Neutral: [description]
- Happy/Excited: [description]
- Thinking/Confused: [description]
- Determined: [description]
**Visual Reference Notes**:
[Any specific artistic direction]
---
## Character 2: [Name]
...
```
## Reference Sheet Image Prompt
After character definitions, include a prompt for generating the reference sheet:
```markdown
## Reference Sheet Prompt
Character reference sheet in [style] style, clean lines, flat colors:
[ROW 1 - Character Name]:
- Front view: [detailed description]
- 3/4 view: [description]
- Expression sheet: Neutral | Happy | Focused | Worried
[ROW 2 - Character Name]:
...
COLOR PALETTE:
- [Character 1]: [colors]
- [Character 2]: [colors]
White background, clear labels under each character.
```
## Example: Turing Biography
```markdown
# Character Definitions - The Imitation Game
**Style**: classic (Ligne Claire)
**Art Direction**: Clean lines, muted colors, period-accurate details
---
## Character 1: Alan Turing
**Role**: Protagonist
**Age**: 25-40 (varies across story)
**Appearance**:
- Face shape: Oval, slightly angular
- Hair: Dark brown, wavy, slightly disheveled
- Eyes: Deep-set, intense gaze
- Build: Tall, lean, slightly awkward posture
- Distinguishing features: Prominent brow, thoughtful expression
**Costume**:
- Default outfit: Tweed jacket with elbow patches, white shirt, no tie
- Color palette: Muted browns, navy blue, cream
- Accessories: Occasionally a pipe, papers/notebooks
**Expression Range**:
- Neutral: Thoughtful, slightly distant
- Happy/Excited: Eureka moment, eyes bright, subtle smile
- Thinking/Confused: Furrowed brow, looking at abstract space
- Determined: Jaw set, focused eyes
---
## Character 2: The Bombe Machine
**Role**: Supporting (anthropomorphized)
**Appearance**:
- Large brass and wood cabinet
- Dial "eyes" that can express states
- Paper tape "mouth"
- Indicator lights for emotions
**Expression Range**:
- Processing: Spinning dials, humming
- Success: Lights up warmly
- Stuck: Smoke wisps, stuttering
---
## Reference Sheet Prompt
Character reference sheet in Ligne Claire style, clean lines, flat colors:
TOP ROW - Alan Turing:
- Front view: Young man, 30s, short dark wavy hair, thoughtful expression, wearing tweed jacket with elbow patches, white shirt
- 3/4 view: Same character, slight smile, showing profile of nose
- Expression sheet: Neutral | Excited (eureka moment) | Focused (working) | Worried
BOTTOM ROW - The Bombe Machine (anthropomorphized):
- Bombe machine as character: Large, brass and wood, dial "eyes", paper tape "mouth"
- Expressions: Processing (spinning dials) | Success (lights up) | Stuck (smoke wisps)
COLOR PALETTE:
- Turing: Muted browns (#8B7355), navy blue (#2C3E50), cream (#F5F5DC)
- Machine: Brass (#B5A642), mahogany (#4E2728), emerald indicators (#2ECC71)
White background, clear labels under each character.
```
## Handling Age Variants
For biographies spanning many years, define age variants:
```markdown
## Alan Turing - Age Variants
### Young (1920s, age 10-18)
- Boyish features, round face
- School uniform (Sherborne)
- Curious, eager expression
### Adult (1930s-40s, age 25-35)
- Angular face, defined jaw
- Tweed jacket, rumpled appearance
- Intense, focused expression
### Later (1950s, age 40+)
- Slightly weathered
- More casual dress
- Thoughtful, sometimes melancholic
```
## Best Practices
| Practice | Description |
|----------|-------------|
| Be specific | "Short dark wavy hair, parted left" not just "dark hair" |
| Use distinguishing features | Glasses, scars, accessories that identify character |
| Define color codes | Use specific color names or hex codes |
| Include age markers | Wrinkles, posture, clothing style matching era |
| Reference real people | For historical figures, note "based on 1940s photographs" |
## Why Character Reference Matters
Without a unified character definition, the AI generates inconsistent character appearances. The reference sheet provides:
1. Visual anchors for consistent features
2. Color palettes for consistent coloring
3. Expression documentation for emotional portrayals
@@ -0,0 +1,23 @@
# cinematic
Wide panels, filmic feel
## Panel Structure
- **Panels per page**: 2-4
- **Structure**: Horizontal emphasis, wide aspect panels
- **Gutters**: Generous spacing (12-15px)
## Grid Configuration
- 1-2 columns, horizontal emphasis
- Panel sizes: Wide aspect ratios (3:1, 4:1)
- Reading flow: Horizontal sweep, filmic rhythm
## Best For
Establishing shots, dramatic moments, landscapes
## Best Style Pairings
dramatic, classic, sepia
@@ -0,0 +1,23 @@
# dense
Information-rich, educational focus
## Panel Structure
- **Panels per page**: 6-9
- **Structure**: Compact grid, smaller panels
- **Gutters**: Tight spacing (4-6px)
## Grid Configuration
- 3 columns × 3 rows
- Panel sizes: Compact, uniform
- Reading flow: Rapid progression, information-rich
## Best For
Technical explanations, complex narratives, timelines
## Best Style Pairings
ohmsha, vibrant
@@ -0,0 +1,40 @@
# four-panel
四格漫画 - Strict 2×2 grid, single-page story
## Panel Structure
- **Panels per page**: 4 (exactly, no variation)
- **Structure**: Strict 2×2 equal grid
- **Gutters**: Consistent white space (8-10px), uniform on all sides
## Grid Configuration
- 2 columns × 2 rows, all panels identical size
- Panel sizes: Exactly equal (each panel = 25% of content area)
- Reading flow: Z-pattern — Panel 1 (top-left) → Panel 2 (top-right) → Panel 3 (bottom-left) → Panel 4 (bottom-right)
## Narrative Structure
Each panel serves a specific narrative role (起承转合 / kishōtenketsu):
| Panel | Position | Role | Purpose |
|-------|----------|------|---------|
| 1 | Top-left | 起 Setup | Establish situation, introduce characters/problem |
| 2 | Top-right | 承 Development | Build on setup, add complication or attempt |
| 3 | Bottom-left | 转 Turn | Twist, key insight, or reversal — the pivotal moment |
| 4 | Bottom-right | 合 Conclusion | Resolution, punchline, or takeaway |
## Aspect Ratio
- Recommended page aspect: **4:3** (landscape)
- Landscape gives each panel a comfortable wide rectangle
- Portrait (3:4) makes panels tall and narrow — avoid for this layout
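
To make the geometry concrete, here is a small sketch that computes the four panel rectangles for a given page size. It assumes the outer margin equals the gutter width (one reading of "uniform on all sides") and that units are pixels; `four_panel_rects` is a hypothetical helper, not part of the skill.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height)


def four_panel_rects(page_w: float, page_h: float, gutter: float = 8.0) -> List[Rect]:
    """Return four equal panel rectangles in Z-pattern reading order."""
    panel_w = (page_w - 3 * gutter) / 2  # gutters: left edge, middle, right edge
    panel_h = (page_h - 3 * gutter) / 2  # gutters: top edge, middle, bottom edge
    rects = []
    for row in range(2):  # top row first
        for col in range(2):  # left panel before right panel
            x = gutter + col * (panel_w + gutter)
            y = gutter + row * (panel_h + gutter)
            rects.append((x, y, panel_w, panel_h))
    return rects
```

On a 1200×900 landscape page with 8px gutters, each panel is 588×438, a wide rectangle. Swapping to a 900×1200 portrait page gives 438×588 panels, which are taller than they are wide; this is the case the guidance above says to avoid.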
## Best For
Business allegory, quick-insight education, social media comics, fables, parables, single-concept explanation
## Best Style Pairings
minimalist, ligne-claire, chalk
@@ -0,0 +1,23 @@
# mixed
Dynamic, varied rhythm
## Panel Structure
- **Panels per page**: 3-7 (varies)
- **Structure**: Intentionally varied for pacing
- **Gutters**: Dynamic spacing
## Grid Configuration
- Intentionally irregular
- Panel sizes: Varied for pacing and emphasis
- Reading flow: Guides eye through varied rhythm
## Best For
Action sequences, emotional arcs, complex stories
## Best Style Pairings
dramatic, vibrant, ohmsha
@@ -0,0 +1,23 @@
# splash
Impact-focused, key moments
## Panel Structure
- **Panels per page**: 1-2 large + 2-3 small
- **Structure**: Dominant splash with supporting panels
- **Gutters**: Varied for emphasis
## Grid Configuration
- 1 dominant panel + 2-3 supporting
- Panel sizes: 50-70% splash, remainder small
- Reading flow: Splash dominates, supporting panels accent
## Best For
Revelations, breakthroughs, chapter openings
## Best Style Pairings
dramatic, classic, vibrant
@@ -0,0 +1,23 @@
# standard
Classic comic grid, versatile
## Panel Structure
- **Panels per page**: 4-6
- **Structure**: Regular grid with occasional variation
- **Gutters**: Consistent white space (8-10px)
## Grid Configuration
- 2-3 columns × 2-3 rows
- Panel sizes: Mostly equal, occasional variation
- Reading flow: Left→right, top→bottom (Z-pattern)
## Best For
Narrative flow, dialogue scenes
## Best Style Pairings
classic, warm, sepia
@@ -0,0 +1,30 @@
# webtoon
Vertical scrolling comic (竖版条漫)
## Panel Structure
- **Panels per page**: 3-5 vertically stacked
- **Structure**: Single column, vertical flow optimized for scrolling
- **Gutters**: Generous vertical spacing (20-40px), panels often bleed horizontally
## Grid Configuration
- Single column, vertical stack
- Panel sizes: Full width, variable height (1:1 to 1:2 aspect)
- Reading flow: Top→bottom continuous scroll
## Special Features
- Panels can extend beyond frame for dramatic effect
- Generous whitespace between beats
- Character close-ups alternate with wide explanation panels
- "Float" effect - elements can exist between panels
## Best For
Ohmsha-style tutorials, mobile reading, step-by-step guides
## Best Style Pairings
ohmsha, vibrant
@@ -0,0 +1,85 @@
# Ohmsha Manga Guide Style
Guidelines for educational manga comics using the `ohmsha` preset.
## Character Setup
| Role | Default | Traits |
|------|---------|--------|
| Student (Role A) | 大雄 | Confused, asks basic but crucial questions, represents reader |
| Mentor (Role B) | 哆啦A梦 | Knowledgeable, patient, uses gadgets as technical metaphors |
| Antagonist (Role C, optional) | 胖虎 | Represents misunderstanding, or "noise" in the data |
Custom characters: ask the user for role → name mappings (e.g., `Student:小明, Mentor:教授, Antagonist:Bug怪`).
## Character Reference Sheet Style
For Ohmsha style, use manga/anime style with:
- Exaggerated expressions for educational clarity
- Simple, distinctive silhouettes
- Bright, saturated color palettes
- Chibi/SD (super-deformed) variants for comedic reactions
## Outline Spec Block
Every ohmsha outline must start with:
```markdown
【漫画规格单】
- Language: [Same as input content]
- Style: Ohmsha (Manga Guide), Full Color
- Layout: Vertical Scrolling Comic (竖版条漫)
- Characters: [List character names and roles]
- Character Reference: characters/characters.png
- Page Limit: ≤20 pages
```
## Visual Metaphor Rules (Critical)
**NEVER** create "talking heads" panels. Every technical concept must become:
1. **A tangible gadget/prop** - Something characters can hold, use, demonstrate
2. **An action scene** - Characters doing something that illustrates the concept
3. **A visual environment** - Stepping into a metaphorical space
### Examples
| Concept | Bad (Talking Heads) | Good (Visual Metaphor) |
|---------|---------------------|------------------------|
| Word embeddings | Characters discussing vectors | 哆啦A梦拿出"词向量压缩机",把书本压缩成彩色小球 |
| Gradient descent | Explaining math formula | 大雄在山谷地形上滚球,寻找最低点 |
| Neural network | Diagram on whiteboard | 角色走进由发光节点组成的网络迷宫 |
## Page Title Convention
Avoid AI-style "Title: Subtitle" format. Use narrative descriptions:
- ❌ "Page 3: Introduction to Neural Networks"
- ✓ "Page 3: 大雄被海量单词淹没,哆啦A梦拿出'词向量压缩机'"
## Ending Requirements
- NO generic endings ("What will you choose?", "Thanks for reading")
- End with: Technical summary moment OR character achieving a small goal
- Final panel: Sense of accomplishment, not open-ended question
### Good Endings
- Student successfully applies learned concept
- Visual callback to opening problem, now solved
- Mentor gives summary while student demonstrates understanding
### Bad Endings
- "What do you think?" open questions
- "Thanks for reading this tutorial"
- Cliffhanger without resolution
## Layout Preference
Ohmsha style typically uses:
- `webtoon` (vertical scrolling) - Primary choice
- `dense` - For information-heavy sections
- `mixed` - For varied pacing
Avoid `cinematic` and `splash` for educational content.
@@ -0,0 +1,106 @@
# Partial Workflows
Options to run specific parts of the workflow. Trigger these via natural language (e.g., "just the storyboard", "regenerate page 3").
## Options Summary
| Option | Steps Executed | Output |
|--------|----------------|--------|
| Storyboard only | 1-3 | `storyboard.md` + `characters/` |
| Prompts only | 1-5 | + `prompts/*.md` |
| Images only | 7-8 | + images |
| Regenerate N | 7 (partial) | Specific page(s) |
---
## Storyboard-only
Generate storyboard and characters without prompts or images.
**User cue**: "storyboard only", "just the outline", "don't generate images yet".
**Workflow**: Steps 1-3 only (stop after storyboard + characters)
**Output**:
- `analysis.md`
- `storyboard.md`
- `characters/characters.md`
**Use case**: Review and edit the storyboard before generating images. Useful for:
- Getting feedback on the narrative structure
- Making manual adjustments to panel layouts
- Defining custom characters
---
## Prompts-only
Generate storyboard, characters, and prompts without images.
**User cue**: "prompts only", "write the prompts but don't generate yet".
**Workflow**: Steps 1-5 (generate prompts, skip images)
**Output**:
- `analysis.md`
- `storyboard.md`
- `characters/characters.md`
- `prompts/*.md`
**Use case**: Review and edit prompts before image generation. Useful for:
- Fine-tuning image generation prompts
- Ensuring visual consistency before committing to generation
- Making style adjustments at the prompt level
---
## Images-only
Generate images from existing prompts (starts at Step 7).
**User cue**: "generate images from existing prompts", "run the images now" (pointing at an existing `comic/topic-slug/` directory).
**Workflow**: Skip to Step 7, then 8
**Prerequisites** (must exist in directory):
- `prompts/` directory with page prompt files
- `storyboard.md` with style information
- `characters/characters.md` with character definitions
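
The prerequisites above could be verified before jumping to Step 7. The following is a sketch under stated assumptions: the function name `missing_prerequisites` is hypothetical, and treating "page prompt files" as any `*.md` file directly under `prompts/` is an assumption based on the `prompts/*.md` output listed earlier.

```python
from pathlib import Path


def missing_prerequisites(comic_dir: str) -> list[str]:
    """List the images-only prerequisites that are absent from comic_dir."""
    root = Path(comic_dir)
    missing = []
    prompts = root / "prompts"
    # Assumption: at least one Markdown prompt file must exist in prompts/.
    if not prompts.is_dir() or not any(prompts.glob("*.md")):
        missing.append("prompts/*.md")
    # Both files are named in the prerequisites list.
    for rel in ("storyboard.md", "characters/characters.md"):
        if not (root / rel).is_file():
            missing.append(rel)
    return missing
```

An empty result means the directory is ready for the images-only workflow; otherwise the returned paths name what still has to be generated.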
**Output**:
- `characters/characters.png` (if it does not already exist)
- `NN-{cover|page}-[slug].png` images
**Use case**: Re-generate images after editing prompts. Useful for:
- Recovering from failed image generation
- Trying different image generation settings
- Regenerating after manual prompt edits
---
## Regenerate
Regenerate specific pages only.
**User cue**: "regenerate page 3", "redo pages 2, 5, 8", "regenerate the cover".
**Workflow**:
1. Read existing prompts for specified pages
2. Regenerate images only for those pages via `image_generate`
3. Download each returned URL and overwrite the existing PNG
**Prerequisites** (must exist):
- `prompts/NN-{cover|page}-[slug].md` for specified pages
- `characters/characters.md` (for agent-side consistency checks, if it was used originally)
**Output**:
- Regenerated `NN-{cover|page}-[slug].png` for specified pages
**Use case**: Fix specific pages without regenerating the entire comic. Useful for:
- Fixing a single problematic page
- Iterating on specific visuals
- Regenerating pages after prompt edits
**Page numbering**:
- `0` = Cover page
- `1-N` = Content pages
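
To illustrate the numbering, here is a small Python sketch that picks the prompt files for a set of requested pages. It assumes `NN` is a zero-padded two-digit number, as in the `00-cover-[slug]` and `01-page-[slug]` filenames of the storyboard template; `PROMPT_NAME` and `select_prompts` are hypothetical names.

```python
import re

# Matches names such as "00-cover-intro.md" or "03-page-first-test.md".
# Assumption: NN is always two digits.
PROMPT_NAME = re.compile(r"^(\d{2})-(cover|page)-(.+)\.md$")


def select_prompts(filenames: list[str], pages: set[int]) -> list[str]:
    """Pick the prompt files whose page number is in `pages` (0 = cover)."""
    selected = []
    for name in sorted(filenames):
        match = PROMPT_NAME.match(name)
        # Files that do not follow the naming pattern are ignored.
        if match and int(match.group(1)) in pages:
            selected.append(name)
    return selected
```

A request such as "regenerate the cover and page 3" would correspond to `pages={0, 3}`.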
@@ -0,0 +1,121 @@
# concept-story
Concept story preset - Narrative comics that visualize abstract concepts through character-driven stories
## Base Configuration
| Dimension | Value |
|-----------|-------|
| Art Style | manga |
| Tone | warm |
| Layout | standard (default) |
Equivalent to: art=manga, tone=warm
## Unique Rules
This preset includes special rules beyond the art+tone combination. When the `concept-story` preset is selected, ALL rules below must be applied.
### Concept Visualization System (CRITICAL)
Each major abstract concept SHOULD have a recurring visual symbol/metaphor:
| Concept Type | Visualization Approach |
|-------------|----------------------|
| Psychological need | Tangible object character holds or discovers (e.g., glowing energy ball = competence) |
| Management principle | Environmental metaphor character navigates (e.g., ship wheel = autonomy) |
| Growth/development | Living organic symbol that transforms (e.g., seed → flowering plant = relatedness) |
| Abstract framework | Spatial structure characters can enter or observe |
| Emotional state | Color/lighting shift in the scene atmosphere |
**Unlike ohmsha**: Dialogue panels are allowed and expected. The goal is to COMBINE visual metaphors WITH dialogue, not replace dialogue entirely.
**Pattern**: "Dialogue introduces idea" → "Visual metaphor illustrates it" → "Character reacts/applies it"
### Visual Symbol Continuity
Symbols must persist across the story:
| Stage | Treatment |
|-------|-----------|
| Introduction | Symbol appears with soft glow effect when concept is first mentioned |
| Recurrence | Same symbol reappears in background or character interaction when concept is referenced |
| Resolution | ALL symbols gather in the final composition, showing integration of learned concepts |
**Storyboard requirement**: Include a Symbol Mapping Table defining concept → visual symbol before panel breakdown.
### Character Archetypes (Flexible)
Create original characters based on content domain. No fixed defaults:
| Role | Archetype | Visual Cues |
|------|-----------|------------|
| Protagonist | Learner/worker facing a challenge | Modern professional or student, relatable, starts with constrained posture |
| Mentor | Experienced guide who teaches through experience | Slightly older, calm demeanor, warm color accents |
| Catalyst | Person or event that triggers transformation | Can be a colleague, situation, challenge, or opportunity |
**IMPORTANT**: Characters are created fresh each time based on the source content's domain (business, psychology, education, etc.). No default character set.
### Narrative Arc Structure
Enforce a five-stage growth arc:
| Act | Structure | Visual Tone |
|-----|-----------|------------|
| Opening | Protagonist stuck in routine, faces frustration | Muted warm tones, tight framing, constrained compositions |
| Inciting moment | Mentor appears or opportunity arrives | Brightness increases, panels open up |
| Learning | Concepts introduced through visual metaphors | Rich warm palette, symbols introduced one by one |
| Turning point | Protagonist applies knowledge, faces test | Contrast increases, dynamic compositions |
| Transformation | Growth demonstrated, new understanding visible | Full warm palette, expansive composition, all symbols present |
### Dialogue + Action Balance
- Dialogue is encouraged and expected (unlike ohmsha's NO talking heads rule)
- Every page should combine at least one dialogue panel with at least one visual/action panel
- Avoid pure "lecture" pages where a character explains for 4+ panels straight
- When a character explains a concept verbally, the NEXT panel should visualize it
**Wrong approach**: Four consecutive panels of mentor lecturing at protagonist
**Right approach**: Mentor introduces concept → visual metaphor panel → protagonist reacts → applies understanding
### Scene Atmosphere Rules
| Scene Type | Atmosphere |
|------------|-----------|
| Problem/frustration | Cool muted tones over warm base, tight framing, cluttered environment |
| Mentoring moment | Golden hour lighting, open composition, warm indoor glow |
| Concept visualization | Soft glow effects, clean simplified backgrounds, symbol spotlight |
| Growth/transformation | Warm light expanding outward, character posture opening up |
| Resolution | Full warm palette, spacious composition, all visual symbols visible |
### Ending Requirements
Final page MUST include:
1. Protagonist demonstrating transformed understanding (not just being told)
2. Visual callback showing contrast with opening state (e.g., wilted plant → thriving plant)
3. All concept symbols visible together in the composition
4. A forward-looking element suggesting ongoing growth (not a closed ending)
### Page Title Convention
Every page MUST have a narrative title:
**Wrong**: "Chapter 3: Self-Determination Theory"
**Right**: "The Day Xiao Ming Found His Own Engine"
## Quality Markers
- ✓ Each major concept has a recurring visual symbol
- ✓ Dialogue and visual metaphors work together (not one replacing the other)
- ✓ Clear growth arc from problem to transformation
- ✓ Original characters suited to the content domain
- ✓ Warm, professional atmosphere throughout
- ✓ Visual symbols recur and accumulate through the story
- ✓ Final page integrates all concept symbols with transformation callback
## Best For
Psychology concepts, business/management principles, motivation theory, personal development,
self-help content, leadership frameworks, coaching narratives, soft skill education,
abstract concept explanation through character-driven stories
@@ -0,0 +1,107 @@
# four-panel
Four-panel comic preset - Minimalist four-panel business allegory comics
## Base Configuration
| Dimension | Value |
|-----------|-------|
| Art Style | minimalist |
| Tone | neutral |
| Layout | four-panel (default) |
| Aspect | 4:3 (landscape) |
Equivalent to: art=minimalist, tone=neutral, layout=four-panel, aspect=4:3
## Unique Rules
This preset includes special rules beyond the art+tone combination. When the `four-panel` preset is selected, ALL rules below must be applied.
### 起承转合 Narrative Structure (CRITICAL)
Every comic MUST follow the four-panel 起承转合 structure:
| Panel | Role | Requirements |
|-------|------|-------------|
| 1 (起 Setup) | Introduce the situation | Show character(s) in a recognizable context. Establish the "normal" state or problem |
| 2 (承 Development) | Build on the setup | Add complication, show an attempt, or introduce the concept. Stakes become clearer |
| 3 (转 Turn) | The twist or key insight | **Most important panel.** Show the unexpected reversal, contrast, or "aha" moment that makes the allegory work |
| 4 (合 Conclusion) | Resolution and takeaway | Show the result, consequence, or lesson learned. Can be a visual punchline or summary |
**CRITICAL**: Do NOT deviate from exactly 4 panels. No 5th panel, no title panel, no footer panel within the image.
### Single-Page Story Rule (CRITICAL)
- The entire story is told in ONE page with exactly 4 panels
- Page count: always 1 (plus optional cover)
- No multi-page four-panel stories — if content requires more, create multiple separate four-panel comics
- Storyboard structure: Cover (optional) + 1 page
### Accent Color System
- The image is primarily black-and-white line art
- Use 1-2 spot colors per strip (default: orange `#FF6B35`)
- Rules:
- Key concept label or object: filled with accent color or outlined in accent
- Panel 3 (转 Turn) should have the strongest color emphasis
- Characters remain B&W — color is for concepts/objects/labels only
- Consistent accent color across all 4 panels (do not switch colors between panels)
### Character Design Rules
- Simplified stick-figure-like characters
- Distinguish characters through simple props: ties, glasses, hats, briefcases, aprons
- No detailed faces — dot eyes, line mouth at most
- Characters should be generic enough to represent archetypes (the manager, the employee, the customer)
- Maximum 2-3 characters per strip
### Text in Panels
- Chinese text for dialogue and labels (or match source language)
- Keep text minimal — 1-2 short lines per panel maximum
- Key concept terms can be highlighted with accent color background
- No narrator boxes — dialogue and labels only
- Speech bubbles: simple rectangles or ovals, thin black outline
### Optional Title & Caption
- A brief descriptive title above the 4 panels
- An optional one-line caption/moral below the panels
- These are part of the page composition, not separate panels
### Character Archetypes (Flexible)
Create simple stick-figure characters based on content. No fixed defaults:
| Role | Archetype | Visual Cues |
|------|-----------|------------|
| Protagonist | Worker/employee facing a situation | Simple figure, minimal distinguishing feature (glasses, tie) |
| Authority | Boss/manager/expert | Slightly larger figure, or prop like pointer/clipboard |
| Object | The concept itself | Labeled object, icon, or highlighted text with accent color |
### Prompt Template
When generating image prompts for four-panel comics, include these keywords:
> A minimalist, clean line art digital comic strip in a four-panel grid layout (2×2). The style is simplified cartoon illustration with clear black outlines and a minimal color palette of black, white, and specific spot [accent color] for key concepts.
Each panel description should specify:
- Panel position (Top Left / Top Right / Bottom Left / Bottom Right)
- Character poses and gestures (simple, stick-figure style)
- Dialogue text in Chinese or the source language (hand-drawn style)
- Any accent-colored elements (concept labels, key objects)
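
The template and the per-panel checklist can be combined into a single prompt string. The sketch below is one possible way to do that, not the skill's actual implementation: `build_four_panel_prompt` is a hypothetical helper, and mapping panels 1-4 to Top Left, Top Right, Bottom Left, Bottom Right in reading order is an assumption.

```python
# Assumption: panels 1-4 map to the 2x2 grid in left-to-right, top-to-bottom order.
POSITIONS = ("Top Left", "Top Right", "Bottom Left", "Bottom Right")

# Keyword paragraph taken from the prompt template, with the accent color as a slot.
STYLE_HEADER = (
    "A minimalist, clean line art digital comic strip in a four-panel grid "
    "layout (2×2). The style is simplified cartoon illustration with clear "
    "black outlines and a minimal color palette of black, white, and "
    "specific spot {accent} for key concepts."
)


def build_four_panel_prompt(panels: list[str], accent: str = "orange (#FF6B35)") -> str:
    """Join the style keywords with exactly four positioned panel descriptions."""
    if len(panels) != 4:
        raise ValueError("a four-panel comic needs exactly 4 panel descriptions")
    lines = [STYLE_HEADER.format(accent=accent)]
    for position, description in zip(POSITIONS, panels):
        lines.append(f"{position}: {description}")
    return "\n".join(lines)
```

Rejecting any panel count other than four mirrors the "exactly 4 panels" rule of this preset.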
## Quality Markers
- ✓ Exactly 4 panels in strict 2×2 grid
- ✓ 起承转合 narrative arc clearly present
- ✓ 90%+ black-and-white with strategic spot color
- ✓ Simplified stick-figure characters
- ✓ Key concept visually highlighted with accent color
- ✓ Text is minimal and in Chinese (or source language)
- ✓ Single complete story in one page
- ✓ Panel 3 delivers a clear "turn" or insight
## Best For
Business allegory, management fables, short insights, workplace parables, concept contrasts, social media educational content, quick-read comics
@@ -0,0 +1,114 @@
# ohmsha
Ohmsha preset - Educational manga with visual metaphors
## Base Configuration
| Dimension | Value |
|-----------|-------|
| Art Style | manga |
| Tone | neutral |
| Layout | webtoon (default) |
Equivalent to: art=manga, tone=neutral
## Unique Rules
This preset includes special rules beyond the art+tone combination. When the `ohmsha` preset is selected, ALL rules below must be applied.
### Visual Metaphor Requirements (CRITICAL)
Every technical concept MUST be visualized as a metaphor:
| Concept Type | Visualization Approach |
|-------------|----------------------|
| Algorithm | Gadget/machine that demonstrates the process |
| Data structure | Physical space characters can enter/explore |
| Mathematical formula | Transformation visible in environment |
| Abstract process | Tangible flow of particles/objects |
**Wrong approach**: Character points at blackboard explaining
**Right approach**: Character uses "Concept Visualizer" gadget, steps into metaphorical space
### Visual Metaphor Examples
| Concept | Wrong (Talking Head) | Right (Visual Metaphor) |
|---------|---------------------|------------------------|
| Attention mechanism | Character points at formula on blackboard | "Attention Flashlight" gadget illuminates key words in dark room |
| Gradient descent | "The algorithm minimizes loss" | Character rides ball rolling down mountain valley |
| Neural network | Diagram with arrows | Living network of glowing creatures passing messages |
| Overfitting | "The model memorized the data" | Character wearing clothes that fit only one specific pose |
### Character Roles (Required)
**DEFAULT: Use Doraemon characters** unless user explicitly specifies custom characters.
| Role | Default Character | Visual | Traits |
|------|-------------------|--------|--------|
| Student (Role A) | 大雄 (Nobita) | Boy, 10yo, round glasses, black hair, yellow shirt, navy shorts | Confused, asks basic but crucial questions, represents reader |
| Mentor (Role B) | 哆啦A梦 (Doraemon) | Blue robot cat, white belly, 4D pocket, red nose, golden bell | Knowledgeable, patient, uses gadgets as technical metaphors |
| Challenge (Role C) | 胖虎 (Gian) | Stocky boy, small eyes, orange shirt | Represents misunderstanding, or "noise" in the data |
| Support (Role D) | 静香 (Shizuka) | Cute girl, black short hair, pink dress | Asks clarifying questions, provides alternative perspectives |
**IMPORTANT**: These Doraemon characters ARE the default for ohmsha preset. Generate character definitions using these exact characters unless user requests otherwise.
To use custom characters: ask the user to provide role → character mappings (e.g., `Student:小明, Mentor:教授`).
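
A mapping string in the `Role:character` form shown above could be turned into a dictionary as follows. This is a minimal sketch: `parse_role_mapping` is a hypothetical helper, and the comma-separated, colon-delimited syntax is inferred from the single example given here.

```python
def parse_role_mapping(text: str) -> dict[str, str]:
    """Parse 'Role:character, Role:character' into a role -> character dict."""
    mapping = {}
    for pair in text.split(","):
        role, sep, character = pair.partition(":")
        # Reject entries without a colon or with an empty side.
        if not sep or not role.strip() or not character.strip():
            raise ValueError(f"expected 'Role:character', got {pair!r}")
        mapping[role.strip()] = character.strip()
    return mapping
```

Roles that the user leaves out would presumably keep their default Doraemon characters, though the source does not say so explicitly.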
### Page Title Convention
Every page MUST have a narrative title (not section header):
**Wrong**: "Chapter 1: Introduction to Transformers"
**Right**: "The Day Nobita Couldn't Understand Anyone"
### Gadget Reveal Pattern
When introducing a concept:
1. Student expresses confusion with a visual indicator (question marks, spiral eyes)
2. Mentor dramatically produces gadget with sparkle effects
3. Gadget name announced in bold with explanation
4. Demonstration begins - student enters metaphorical space
### Ending Requirements
Final page MUST include:
1. Student demonstrating understanding (applying the concept)
2. Callback to opening problem (now resolved)
3. Mentor's satisfied expression
4. Optional: hint at next topic
### NO Talking Heads Rule
**Critical**: Characters must DO things, not just explain.
Every panel should show:
- Action being performed
- Metaphor being demonstrated
- Character interaction with concept-space
- NOT: two characters facing each other talking
### Special Visual Elements
| Element | Usage |
|---------|-------|
| Gadget reveals | Dramatic unveiling with sparkle effects |
| Concept spaces | Rounded borders, glowing edges for "imagination mode" |
| Information displays | Holographic UI style for technical details |
| Aha moments | Radial lines, light burst effects |
| Confusion | Spiral eyes, question marks floating above head |
## Quality Markers
- ✓ Every concept is a visual metaphor
- ✓ Characters are DOING things, not just talking
- ✓ Clear student/mentor dynamic
- ✓ Gadgets and props drive the explanation
- ✓ Expressive manga-style emotions
- ✓ Information density through visual design, not text walls
- ✓ Narrative page titles
## Reference
For complete guidelines, see `references/ohmsha-guide.md`
@@ -0,0 +1,116 @@
# shoujo
Shoujo preset - Classic shoujo manga with romantic aesthetics
## Base Configuration
| Dimension | Value |
|-----------|-------|
| Art Style | manga |
| Tone | romantic |
| Layout | standard (default) |
Equivalent to: art=manga, tone=romantic
## Unique Rules
This preset includes special rules beyond the art+tone combination. When the `shoujo` preset is selected, ALL rules below must be applied.
### Decorative Elements (Required)
Every emotional moment must include decorative elements:
| Emotion | Required Decorations |
|---------|---------------------|
| Love | Floating hearts, sparkles, rose petals |
| Longing | Feathers, bubbles, distant sparkles |
| Joy | Flowers blooming, light bursts, stars |
| Sadness | Falling petals, fading sparkles |
| Shyness | Soft sparkles, floating bubbles |
| Realization | Radiating lines with sparkles |
### Eye Detail Requirements
Eyes are critical in shoujo style:
| Aspect | Treatment |
|--------|-----------|
| Size | Larger than standard manga (1.2x) |
| Highlights | Multiple (3-5), placed for emotion |
| Reflection | Scene reflection in emotional moments |
| Sparkle | Built-in sparkle effects |
| Tears | Crystalline, detailed teardrops |
### Character Beauty Standards
| Feature | Treatment |
|---------|-----------|
| Hair | Flowing, detailed strands, shine highlights |
| Skin | Porcelain, soft blush on cheeks |
| Lips | Soft, slightly glossy |
| Hands | Elegant, expressive gestures |
| Posture | Graceful, elegant poses |
### Background Effects
**Abstract backgrounds** for emotional moments:
| Moment Type | Background |
|-------------|-----------|
| Love confession | Soft gradient + floating flowers |
| Shock | Screen tone speed lines + sparkles |
| Memory | Dreamy blur + scattered petals |
| Realization | Radial lines + light burst |
| Intimate | Soft focus + floating elements |
### Panel Flow
- Overlap panels for intimate moments
- Break panel borders for emotional impact
- Float decorative elements between panels
- Use screen tone gradients for mood
- Irregular panel shapes for drama
### Emotional Beat Timing
Slow down pacing for emotional impact:
| Scene Type | Panel Treatment |
|------------|-----------------|
| Confession | Multiple small panels, then splash |
| Eye contact | Close-up sequence |
| Touch | Slow-motion panel breakdown |
| Realization | Build-up panels then impact |
### Color Palette Application
| Scene Type | Palette |
|------------|---------|
| Romantic | Pink, lavender, rose gold |
| Happy | Soft yellow, peach, sky blue |
| Sad | Pale blue, silver, gray lavender |
| Dramatic | Deep rose, purple, contrast |
### Screen Tone Usage
| Mood | Tone Pattern |
|------|-------------|
| Neutral | Clean, minimal |
| Romantic | Soft gradient overlays |
| Dramatic | Heavy contrast tones |
| Dreamy | Soft dot patterns |
## Quality Markers
- ✓ Large, sparkling detailed eyes
- ✓ Decorative elements in emotional moments
- ✓ Flowing, beautiful character designs
- ✓ Soft, pastel color palette
- ✓ Elegant panel compositions
- ✓ Screen tone mood effects
- ✓ Romantic atmosphere throughout
- ✓ Beautiful, expressive poses
## Best For
Romance stories, coming-of-age, friendship narratives, school life, emotional drama, love stories
@@ -0,0 +1,110 @@
# wuxia
Wuxia preset - Hong Kong martial arts comic style
## Base Configuration
| Dimension | Value |
|-----------|-------|
| Art Style | ink-brush |
| Tone | action |
| Layout | splash (default) |
Equivalent to: art=ink-brush, tone=action
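
For comparison, the base configurations stated in the preset files of this change set (`concept-story`, `four-panel`, `ohmsha`, `shoujo`, `wuxia`) can be collected in one place. The `Preset` dataclass and `PRESETS` dictionary below are hypothetical names for illustration; only the values are taken from the Base Configuration tables.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Preset:
    """Base configuration of a preset; unique rules are not modeled here."""
    art: str
    tone: str
    layout: str                   # default layout
    aspect: Optional[str] = None  # only four-panel states an aspect ratio


PRESETS = {
    "concept-story": Preset("manga", "warm", "standard"),
    "four-panel": Preset("minimalist", "neutral", "four-panel", "4:3"),
    "ohmsha": Preset("manga", "neutral", "webtoon"),
    "shoujo": Preset("manga", "romantic", "standard"),
    "wuxia": Preset("ink-brush", "action", "splash"),
}
```

The preset-specific rules in each file still apply on top of these base values.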
## Unique Rules
This preset includes special rules beyond the art+tone combination. When the `wuxia` preset is selected, ALL rules below must be applied.
### Qi/Energy Effects (Required)
Martial arts power must be visible through qi effects:
| Effect Type | Visual Treatment |
|-------------|-----------------|
| Internal qi | Glowing aura around character |
| External qi | Visible energy projection |
| Qi clash | Radiating impact waves |
| Qi absorption | Flowing particles toward character |
| Hidden power | Subtle glow in eyes/fists |
### Energy Colors
| Qi Type | Color |
|---------|-------|
| Righteous | Blue (#4299E1), Gold (#FFD700) |
| Fierce | Red (#DC2626), Orange (#EA580C) |
| Evil | Purple (#7C3AED), Green (#16A34A) |
| Pure | White, Silver |
| Ancient | Gold with particles |
### Combat Visual Language
**Impact moments** must include:
1. Speed lines radiating from impact point
2. Flying debris (stone, wood, cloth)
3. Shockwave rings
4. Dust/energy clouds
5. Hair and clothing blown back
### Movement Depiction
| Speed Level | Visual Treatment |
|-------------|-----------------|
| Normal | Standard pose |
| Fast | Motion blur, speed lines |
| Lightning | Afterimages, multiple positions |
| Teleport | Fade effect, particle trail |
### Environmental Integration
Backgrounds must support action:
| Environment | Combat Enhancement |
|-------------|-------------------|
| Mountains | Crumbling peaks from impacts |
| Forest | Exploding trees, flying leaves |
| Water | Dramatic splashes, walking on water |
| Temple | Breaking pillars, flying tiles |
| Cliff | Dramatic falls, wind effects |
### Character Pose Guidelines
- Dynamic warrior stances with weight distribution
- Flowing robes and hair showing movement
- Muscle tension visible in action
- Feet planted or in dynamic motion
- Traditional martial arts postures
### Weapon Effects
| Weapon | Visual Treatment |
|--------|-----------------|
| Sword | Trailing light arc, blade glow |
| Palm | Qi projection, wind effect |
| Staff | Spinning blur, impact ripples |
| Whip | Flowing energy trail |
### Atmospheric Elements
Always include:
- Floating particles (leaves, petals, dust)
- Ink wash mist for depth
- Wind direction indicators
- Dramatic sky/weather when appropriate
## Quality Markers
- ✓ Dynamic action poses with sense of motion
- ✓ Ink brush aesthetic in line work
- ✓ Visible qi/energy effects
- ✓ High contrast dramatic lighting
- ✓ Atmospheric backgrounds with Chinese elements
- ✓ Flowing fabric and hair movement
- ✓ Impactful combat moments
- ✓ Speed lines and impact effects
## Best For
Martial arts stories, Chinese historical fiction, wuxia/xianxia adaptations, action-heavy narratives
@@ -0,0 +1,143 @@
# Storyboard Template
## Storyboard Document Format
```markdown
---
title: "[Comic Title]"
topic: "[topic description]"
time_span: "[e.g., 1912-1954]"
narrative_approach: "[chronological/thematic/character-focused]"
recommended_style: "[style name]"
recommended_layout: "[layout name or varies]"
aspect_ratio: "3:4" # 3:4 (portrait), 4:3 (landscape), 16:9 (widescreen)
language: "[zh/en/ja/etc.]"
page_count: [N]
generated: "YYYY-MM-DD HH:mm"
---
# [Comic Title] - Knowledge Comic Storyboard
**Character Reference**: characters/characters.png
---
## Cover
**Filename**: 00-cover-[slug].png
**Core Message**: [one-liner]
**Visual Design**:
- Title typography style
- Main visual composition
- Color scheme
- Subtitle / time span notation
**Visual Prompt**:
[Detailed image generation prompt]
---
## Page 1 / N
**Filename**: 01-page-[slug].png
**Layout**: [standard/cinematic/dense/splash/mixed]
**Narrative Layer**: [Main narrative / Narrator layer / Mixed]
**Core Message**: [What this page conveys]
### Panel Layout
**Panel Count**: X
**Layout Type**: [grid/irregular/splash]
#### Panel 1 (Size: 1/3 page, Position: Top)
**Scene**: [Time, location]
**Image Description**:
- Camera angle: [bird's eye / low angle / eye level / close-up / wide shot]
- Characters: [pose, expression, action]
- Environment: [scene details, period markers]
- Lighting: [atmosphere description]
- Color tone: [palette reference]
**Text Elements**:
- Dialogue bubble (oval): "Character line"
- Narrator box (rectangular): 「Narrator commentary」
- Caption bar: [Background info text]
#### Panel 2...
**Page Hook**: [Cliffhanger or transition at page end]
**Visual Prompt**:
[Full page image generation prompt]
---
## Page 2 / N
...
```
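
Once the front matter has been parsed into a dictionary, a few basic checks can catch template mistakes early. The sketch below is hedged: `front_matter_problems` is a hypothetical helper, treating every template key as required is an assumption, and the allowed aspect ratios are the three listed in the template comment.

```python
# Keys that appear in the storyboard front matter template.
# Assumption: all of them are required.
REQUIRED_KEYS = (
    "title", "topic", "time_span", "narrative_approach", "recommended_style",
    "recommended_layout", "aspect_ratio", "language", "page_count", "generated",
)
# Ratios named in the template comment: portrait, landscape, widescreen.
ASPECT_RATIOS = {"3:4", "4:3", "16:9"}


def front_matter_problems(meta: dict) -> list[str]:
    """Return human-readable problems found in parsed storyboard front matter."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in meta]
    if "aspect_ratio" in meta and meta["aspect_ratio"] not in ASPECT_RATIOS:
        problems.append(f"unsupported aspect_ratio: {meta['aspect_ratio']}")
    if "page_count" in meta and not isinstance(meta["page_count"], int):
        problems.append("page_count must be an integer")
    return problems
```

An empty list means the front matter passes these checks; it says nothing about the quality of the storyboard itself.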
## Cover Design Principles
- Academic gravitas with visual appeal
- Title typography reflecting knowledge/science theme
- Composition hinting at core theme (character silhouette, iconic symbol, concept diagram)
- Subtitle or time span for epic scope
## Panel Composition Guidelines
| Panel Type | Recommended Count | Usage |
|-----------|-------------------|-------|
| Main narrative | 3-5 per page | Story progression |
| Concept diagram | 1-2 per page | Visualize abstractions |
| Narrator panel | 0-1 per page | Commentary, transition |
| Splash (full/half) | Occasional | Major moments |
## Panel Size Reference
- **Full page (Splash)**: Major moments, key breakthroughs
- **Half page**: Important scenes, turning points
- **1/3 page**: Standard narrative panels
- **1/4 or smaller**: Quick progression, sequential action
## Concept Visualization Techniques
Transform abstract concepts into concrete visuals:
| Abstract Concept | Visual Approach |
|-----------------|-----------------|
| Neural network | Glowing nodes with connecting lines |
| Gradient descent | Ball rolling down valley terrain |
| Data flow | Luminous particles flowing through pipes |
| Algorithm iteration | Ascending spiral staircase |
| Breakthrough moment | Shattering barrier, piercing light |
| Logical proof | Building blocks assembling |
| Uncertainty | Forking paths, fog, multiple shadows |
## Text Element Design
| Text Type | Style | Usage |
|-----------|-------|-------|
| Character dialogue | Oval speech bubble | Main narrative speech |
| Narrator commentary | Rectangular box | Explanation, commentary |
| Caption bar | Edge-mounted rectangle | Time, location info |
| Thought bubble | Cloud shape | Character inner monologue |
| Term label | Bold / special color | First appearance of technical terms |
## Prompt Structure for Consistency
Each page prompt should include character reference:
```
[CHARACTER REFERENCE]
(Key details from characters.md for characters in this page)
[PAGE CONTENT]
(Specific scene, panel layout, and visual elements)
[CONSISTENCY REMINDER]
Maintain exact character appearances as defined in character reference.
- [Character A]: [key identifying features]
- [Character B]: [key identifying features]
```
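
The three-part structure above can be assembled mechanically from the character definitions. A minimal sketch, assuming the key features of each character are already available as short strings; `build_page_prompt` is a hypothetical helper and not part of the skill.

```python
def build_page_prompt(characters: dict[str, str], page_content: str) -> str:
    """Assemble a page prompt with reference, content, and reminder sections."""
    # Key details for the characters that appear on this page.
    reference = "\n".join(f"{name}: {features}" for name, features in characters.items())
    # The same features are repeated as bullets in the consistency reminder.
    reminder = "\n".join(f"- {name}: {features}" for name, features in characters.items())
    return (
        "[CHARACTER REFERENCE]\n"
        f"{reference}\n"
        "[PAGE CONTENT]\n"
        f"{page_content}\n"
        "[CONSISTENCY REMINDER]\n"
        "Maintain exact character appearances as defined in character reference.\n"
        f"{reminder}"
    )
```

Repeating the identifying features at the end follows the template's intent of restating character appearance on every page prompt.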
@@ -0,0 +1,110 @@
# action
Action tone - Speed, impact, power
## Overview
High-impact action atmosphere with dynamic movement, combat effects, and powerful visual energy. Creates visceral, exciting sequences.
## Mood Characteristics
- Speed and motion
- Power and impact
- Combat intensity
- Physical energy
- Visceral excitement
## Color Modifiers
When applied to any art style:
| Adjustment | Direction |
|------------|-----------|
| Saturation | High |
| Contrast | Maximum |
| Temperature | Variable per effect |
| Brightness | Dynamic range |
## Action Effects
**Combat/motion effects** (apply liberally):
| Effect | Usage |
|--------|-------|
| Speed lines | Motion, velocity |
| Impact bursts | Hits, collisions |
| Shockwaves | Powerful impacts |
| Flying debris | Environmental destruction |
| Dust clouds | Ground impacts |
| Motion blur | Fast movement |
| Afterimages | Super speed |
## Special Effects
| Effect Type | Visual Approach |
|------------|-----------------|
| Energy attacks | Glowing, radiating |
| Physical impacts | Radiating lines, debris |
| Movement | Speed lines, blur |
| Atmosphere | Flying particles, wind |
## Effect Colors
| Effect | Color | Hex |
|--------|-------|-----|
| Energy glow | Blue | #4299E1 |
| Fire/power | Gold | #FFD700 |
| Impact | White burst | #FFFFFF |
| Blood/intensity | Deep red | #8B0000 |
## Lighting
- Dynamic, shifting
- Impact flashes
- Energy glow sources
- Rim lighting on figures
- Dramatic contrast
## Emotional Range
| Emotion | Expression |
|---------|-----------|
| Determination | Fierce focus |
| Rage | Intense, powerful |
| Triumph | Victorious pose |
| Struggle | Strained effort |
## Composition
- Dynamic angles
- Extreme perspectives
- Panel-breaking layouts
- Asymmetric designs
- Impact-focused framing
## Pose Guidelines
- Dynamic warrior poses
- Weight and momentum visible
- Muscle tension shown
- Flow of movement captured
- Impact points emphasized
## Best For
- Martial arts combat
- Action sequences
- Sports moments
- Physical challenges
- Battle scenes
- Climactic confrontations
## Combination Notes
Works especially well with:
- ink-brush: wuxia combat
- manga: shonen battles
Avoid with:
- chalk: style mismatch
- ligne-claire: style mismatch (too static)
@@ -0,0 +1,95 @@
# dramatic
Dramatic tone - High contrast, intense, powerful moments
## Overview
High-impact dramatic tone for pivotal moments, conflicts, and breakthroughs. Uses strong contrast and intense compositions to create emotional power.
## Mood Characteristics
- Tension and intensity
- Pivotal moments
- Conflict and resolution
- Breakthrough discoveries
- Emotional climaxes
## Color Modifiers
When applied to any art style:
| Adjustment | Direction |
|------------|-----------|
| Saturation | High (vibrant or deep) |
| Contrast | Maximum |
| Temperature | Varies for effect |
| Brightness | Strong highlights, deep shadows |
## Contrast Approach
- Sharp light/dark divisions
- Minimal mid-tones
- Stark compositions
- Silhouette potential
- Rim lighting effects
## Accent Colors
- Deep navy (#1A365D)
- Crimson (#9B2C2C)
- Stark white
- Heavy blacks
- Limited palette per scene
## Lighting
- Dramatic single-source
- High contrast shadows
- Rim lighting on characters
- Spotlight effects
- Chiaroscuro influence
## Emotional Range
| Emotion | Expression |
|---------|-----------|
| Anger | Intense, defined features |
| Determination | Strong, focused gaze |
| Shock | Wide eyes, stark lighting |
| Triumph | Powerful, elevated pose |
## Composition
- Angular, dynamic layouts
- Dramatic camera angles
- Low/high viewpoints
- Diagonal compositions
- Negative space for impact
## Visual Elements
- Speed lines for tension
- Impact effects
- Dramatic backgrounds (storms, fire)
- Silhouettes
- Light burst effects
- Environmental drama
## Best For
- Pivotal discoveries
- Conflict scenes
- Climactic moments
- Breakthrough realizations
- Emotional confrontations
- Historical turning points
## Combination Notes
Works especially well with:
- realistic: powerful drama
- ink-brush: martial arts climax
- ligne-claire: historical pivots
- manga: shonen battles
Avoid with: chalk (style mismatch)
@@ -0,0 +1,105 @@
# energetic
Energetic tone - Bright, dynamic, exciting
## Overview
High-energy atmosphere for exciting, discovery-filled content. Bright colors, dynamic compositions, and movement create engaging visuals for younger audiences.
## Mood Characteristics
- Excitement and wonder
- Discovery and learning
- Energy and enthusiasm
- Movement and action
- Youthful spirit
## Color Modifiers
When applied to any art style:
| Adjustment | Direction |
|------------|-----------|
| Saturation | High (vibrant) |
| Contrast | Medium-high |
| Temperature | Variable, punchy |
| Brightness | Bright, clean |
## Color Palette
Shift toward vibrant tones:
| Role | Color | Hex |
|------|-------|-----|
| Primary Red | Bright red | #F56565 |
| Primary Yellow | Sunny yellow | #F6E05E |
| Primary Blue | Sky blue | #63B3ED |
| Accent 1 | Magenta | #D53F8C |
| Accent 2 | Lime green | #68D391 |
| Background | Clean white | #FFFFFF |
| Background Alt | Bright pastels | Various |
## Lighting
- Bright, clear lighting
- Clean shadows
- High energy
- Spotlight effects for emphasis
- Dynamic light sources
## Dynamic Elements
**Energy effects** (add to compositions):
| Element | Usage |
|---------|-------|
| Speed lines | Motion, excitement |
| Sparkles | Discoveries |
| Burst effects | Aha moments |
| Motion blur | Fast action |
| Star bursts | Emphasis |
| Sweat drops | Effort/surprise |
## Emotional Range
| Emotion | Expression |
|---------|-----------|
| Excitement | Wide eyes, big smile |
| Surprise | Dramatic reaction |
| Determination | Intense focus |
| Wonder | Sparkling eyes |
## Composition
- Dynamic angles
- Action-oriented layouts
- Movement emphasis
- Clean, punchy designs
- Energy flows
## Visual Style
- Expressive, animated characters
- Wide eyes, big reactions
- Dynamic poses
- Motion and action focus
- Simplified backgrounds for energy
## Best For
- Science explanations
- "Aha" moments
- Young audience content
- Discovery narratives
- Learning adventures
- Action tutorials
## Combination Notes
Works especially well with:
- manga: shonen energy
- chalk: fun education
Avoid with:
- realistic: style mismatch
- ink-brush: style mismatch
@@ -0,0 +1,63 @@
# neutral
Neutral tone (中性基调) - Balanced, rational, educational
## Overview
Default balanced tone suitable for educational and informative content. Neither overly emotional nor cold, it creates an accessible, professional atmosphere.
## Mood Characteristics
- Balanced emotional register
- Clear, rational presentation
- Educational focus
- Professional but approachable
- Objective storytelling
## Color Modifiers
When applied to any art style:
| Adjustment | Direction |
|------------|-----------|
| Saturation | Standard (no shift) |
| Contrast | Balanced |
| Temperature | Neutral |
| Brightness | Slightly bright |
## Lighting
- Even, clear lighting
- Minimal dramatic shadows
- Consistent across panels
- Natural light sources
- No extreme contrast
## Emotional Range
| Emotion | Expression Level |
|---------|-----------------|
| Joy | Moderate smile |
| Concern | Thoughtful expression |
| Surprise | Mild widening of eyes |
| Frustration | Slight frown |
## Composition
- Balanced panel layouts
- Clear focal points
- Readable hierarchies
- Standard framing
- Functional compositions
## Best For
- Educational content
- Technical tutorials
- Informative biographies
- Documentary style
- Professional topics
## Usage Notes
Neutral is the default tone. Combine with any art style for baseline professional output. Most versatile tone option.
@@ -0,0 +1,100 @@
# romantic
Romantic tone (浪漫基调) - Soft, beautiful, emotionally delicate
## Overview
Soft, dreamy atmosphere for romantic and emotionally delicate content. Features decorative elements, sparkles, and beautiful compositions that emphasize feeling and beauty.
## Mood Characteristics
- Romance and love
- Beauty and elegance
- Emotional delicacy
- Dreams and hopes
- Youth and idealism
## Color Modifiers
When applied to any art style:
| Adjustment | Direction |
|------------|-----------|
| Saturation | Soft pastels |
| Contrast | Low, gentle |
| Temperature | Slightly warm pink |
| Brightness | Soft, glowing |
## Color Palette
Shift toward romantic tones:
| Role | Color | Hex |
|------|-------|-----|
| Primary | Soft pink | #FFB6C1 |
| Secondary | Lavender | #E6E6FA |
| Accent | Rose | #FF69B4 |
| Highlight | Pearl white | #FFFAF0 |
| Gold | Gold sparkle | #FFD700 |
| Skin | Porcelain | #FFF5EE |
| Blush | Soft blush | #FFE4E1 |
| Background | Soft cream | #FFF8DC |
## Lighting
- Soft, diffused light
- Glowing effects
- Backlighting halos
- Sparkle highlights
- Dreamy atmospheres
## Decorative Elements
**Essential decorations** (add to compositions):
| Element | Usage |
|---------|-------|
| Flower petals | Floating, framing |
| Sparkles | Emotional highlights |
| Bubbles | Dreamy moments |
| Feathers | Gentle floating |
| Stars | Night scenes, wonder |
| Hearts | Love emphasis |
| Light halos | Character highlights |
## Emotional Range
| Emotion | Expression |
|---------|-----------|
| Love | Soft gaze, blush |
| Longing | Distant, beautiful sadness |
| Joy | Radiant smile, sparkles |
| Shyness | Downcast eyes, blush |
## Composition
- Elegant, flowing layouts
- Soft focus backgrounds
- Characters framed by decorations
- Beautiful angles (3/4 profiles)
- Screen tone gradients
## Best For
- Romance stories
- Coming-of-age
- Friendship narratives
- Emotional drama
- School life
- Beautiful moments
## Combination Notes
Works especially well with:
- manga: classic shoujo style
Avoid with:
- realistic: style mismatch
- ink-brush: style mismatch
- ligne-claire: style mismatch
- chalk: style mismatch
@@ -0,0 +1,104 @@
# vintage
Vintage tone (复古基调) - Historical, aged, period authenticity
## Overview
Historical atmosphere with aged paper effects and period-appropriate aesthetics. Creates a sense of time, authenticity, and historical distance.
## Mood Characteristics
- Historical authenticity
- Period distance
- Archival quality
- Time and memory
- Classical elegance
## Color Modifiers
When applied to any art style:
| Adjustment | Direction |
|------------|-----------|
| Saturation | Reduced, muted |
| Contrast | Medium, aged |
| Temperature | Sepia shift |
| Brightness | Slightly faded |
## Color Palette
Shift toward aged tones:
| Role | Color | Hex |
|------|-------|-----|
| Primary | Sepia brown | #8B7355 |
| Background | Aged paper | #F5E6D3 |
| Accent 1 | Faded teal | #6B8E8E |
| Accent 2 | Muted burgundy | #7B3F3F |
| Ink | Aged black | #3D3D3D |
| Yellowed | Paper yellow | #F5DEB3 |
## Visual Effects
**Aging effects** (apply subtly):
| Effect | Application |
|--------|-------------|
| Paper aging | Background texture |
| Faded edges | Vignette effect |
| Dust specks | Subtle overlay |
| Yellowing | Color shift |
| Wear marks | Corner/edge details |
## Period Elements
- Historical typography
- Period-accurate details
- Archival presentation
- Classical compositions
- Formal framing
## Lighting
- Natural, period-appropriate
- Oil lamp/candle warmth
- Soft, diffused light
- Indoor historical lighting
- Photographic quality
## Emotional Range
| Emotion | Expression |
|---------|-----------|
| Dignity | Formal, composed |
| Sorrow | Restrained, elegant |
| Pride | Classical posture |
| Wisdom | Aged grace |
## Composition
- Classical framing
- Formal compositions
- Period-appropriate staging
- Documentary style
- Historical accuracy priority
## Best For
- Pre-1950s stories
- Classical science history
- Historical biographies
- Period pieces
- Documentary comics
- Archival narratives
## Combination Notes
Works especially well with:
- realistic: period drama
- ligne-claire: historical adventure
- ink-brush: classical Asian stories
Avoid with:
- manga: style mismatch (too modern)
- chalk: style mismatch (modern educational)
@@ -0,0 +1,94 @@
# warm
Warm tone (温馨基调) - Nostalgic, personal, comforting
## Overview
Warm, inviting atmosphere for personal stories and nostalgic content. Creates emotional connection through cozy aesthetics and comforting visuals.
## Mood Characteristics
- Nostalgic feeling
- Personal, intimate atmosphere
- Comforting and healing
- Memory and reflection
- Gentle emotional warmth
## Color Modifiers
When applied to any art style:
| Adjustment | Direction |
|------------|-----------|
| Saturation | Slightly reduced |
| Contrast | Softer |
| Temperature | Warm shift (+15%) |
| Brightness | Soft, golden |
## Color Temperature
Shift palette toward warm tones:
| Original | Warm Shift |
|----------|-----------|
| Cool blue | Soft teal |
| Pure white | Cream |
| Gray | Warm gray |
| Black | Soft charcoal |
## Accent Colors
- Golden yellow (#D69E2E)
- Soft orange (#DD6B20)
- Warm brown (#8B6F47)
- Sunset tones
## Lighting
- Golden hour lighting
- Soft, diffused light
- Warm indoor glow
- Candle/lamp warmth
- Gentle shadows
## Emotional Range
| Emotion | Expression |
|---------|-----------|
| Joy | Genuine warm smile |
| Sadness | Gentle melancholy |
| Love | Soft, tender expressions |
| Memory | Distant, reflective gaze |
## Composition
- Intimate framing
- Cozy environments
- Soft focus backgrounds
- Welcoming spaces
- Personal moments highlighted
## Visual Elements
- Warm light rays
- Soft edges
- Nostalgic props (old photos, keepsakes)
- Comfort objects (blankets, tea cups)
- Nature elements (autumn leaves, sunset)
## Best For
- Personal stories
- Childhood memories
- Mentorship narratives
- Family histories
- Gentle biographies
- Healing journeys
## Combination Notes
Works especially well with:
- ligne-claire: nostalgic European comics
- realistic: touching human stories
- manga: slice-of-life warmth
- chalk: nostalgic education
@@ -0,0 +1,401 @@
# Complete Workflow
Full workflow for generating knowledge comics.
## Progress Checklist
Copy and track progress:
```
Comic Progress:
- [ ] Step 1: Setup & Analyze
  - [ ] 1.1 Analyze content
  - [ ] 1.2 Check existing ⚠️ REQUIRED
- [ ] Step 2: Confirmation - Style & options ⚠️ REQUIRED
- [ ] Step 3: Generate storyboard + characters
- [ ] Step 4: Review outline (conditional)
- [ ] Step 5: Generate prompts
- [ ] Step 6: Review prompts (conditional)
- [ ] Step 7: Generate images
  - [ ] 7.1 Character sheet (if needed)
  - [ ] 7.2 Generate pages
- [ ] Step 8: Completion report
```
## Flow Diagram
```
Input → Analyze → [Check Existing?] → [Confirm: Style + Reviews] → Storyboard → [Review Outline?] → Prompts → [Review Prompts?] → Images → Complete
```
---
## Step 1: Setup & Analyze
### 1.1 Analyze Content → `analysis.md`
Read source content, save it if needed, and perform deep analysis.
**Actions**:
1. **Save source content** (if not already a file):
   - If user provides a file path: use as-is
   - If user pastes content: save to `source-{slug}.md` in the target directory using `write_file`, where `{slug}` is the kebab-case topic slug used for the output directory
   - **Backup rule**: If `source-{slug}.md` already exists, rename it to `source-{slug}-backup-YYYYMMDD-HHMMSS.md` before writing
2. Read source content
3. **Deep analysis** following `analysis-framework.md`:
   - Target audience identification
   - Value proposition for readers
   - Core themes and narrative potential
   - Key figures and their story arcs
4. Detect source language
5. **Determine language**:
   - If user specified a language → use it
   - Else → use detected source language or user's conversation language
6. Determine recommended page count:
   - Short story: 5-8 pages
   - Medium complexity: 9-15 pages
   - Full biography: 16-25 pages
7. Analyze content signals for art/tone/layout recommendations
8. **Save to `analysis.md`** using `write_file`
**analysis.md Format**: YAML front matter (title, topic, time_span, source_language, user_language, aspect_ratio, recommended_page_count, recommended_art, recommended_tone) + sections for Target Audience, Value Proposition, Core Themes, Key Figures & Story Arcs, Content Signals, Recommended Approaches. See `analysis-framework.md` for full template.
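
The backup rule in step 1 can be sketched as a small helper. This is a minimal illustration, not the skill's actual implementation: `backup_existing` is a hypothetical name, and it assumes the timestamp is taken from local time.

```python
from __future__ import annotations

from datetime import datetime
from pathlib import Path


def backup_existing(path: Path, now: datetime | None = None) -> Path | None:
    """Rename an existing file to <stem>-backup-YYYYMMDD-HHMMSS<suffix>.

    Returns the backup path, or None when there was nothing to back up.
    (Hypothetical helper; local time is an assumption.)
    """
    if not path.exists():
        return None
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    backup = path.with_name(f"{path.stem}-backup-{stamp}{path.suffix}")
    path.rename(backup)
    return backup
```

The same helper would cover the later backup rules for prompt files and PNGs, since it only rewrites the file name and keeps the extension.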
### 1.2 Check Existing Content ⚠️ REQUIRED
**MUST execute before proceeding to Step 2.**
Check if the output directory exists (e.g., via `test -d "comic/{topic-slug}"`).
**If directory exists**, use `clarify`:
```
question: "Existing content found at comic/{topic-slug}. How to proceed?"
options:
- "Regenerate storyboard — Keep images, regenerate storyboard and characters only"
- "Regenerate images — Keep storyboard, regenerate images only"
- "Backup and regenerate — Backup to {slug}-backup-{timestamp}, then regenerate all"
- "Exit — Cancel, keep existing content unchanged"
```
Save result and handle accordingly:
- **Regenerate storyboard**: Skip to Step 3, preserve `prompts/` and images
- **Regenerate images**: Skip to Step 7, use existing prompts
- **Backup and regenerate**: Move directory, start fresh from Step 2
- **Exit**: End workflow immediately
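
The four outcomes above amount to a lookup from the chosen option to the step where the workflow resumes. A minimal sketch, assuming the `clarify` answer comes back as the option text; `ExistingChoice`, `RESUME_STEP`, and `parse_choice` are hypothetical names:

```python
from enum import Enum


class ExistingChoice(Enum):
    REGENERATE_STORYBOARD = "Regenerate storyboard"
    REGENERATE_IMAGES = "Regenerate images"
    BACKUP_AND_REGENERATE = "Backup and regenerate"
    EXIT = "Exit"


# Step to resume at for each choice; None means end the workflow.
RESUME_STEP = {
    ExistingChoice.REGENERATE_STORYBOARD: 3,
    ExistingChoice.REGENERATE_IMAGES: 7,
    ExistingChoice.BACKUP_AND_REGENERATE: 2,
    ExistingChoice.EXIT: None,
}


def parse_choice(answer: str) -> ExistingChoice:
    """Match an answer by its leading label (assumed to be the option text)."""
    for choice in ExistingChoice:
        if answer.startswith(choice.value):
            return choice
    raise ValueError(f"unrecognized answer: {answer!r}")
```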
---
## Step 2: Confirmation - Style & Options ⚠️
**Purpose**: Select visual style + decide whether to review outline before generation. **Do NOT skip.**
**Display summary first**:
- Content type + topic identified
- Key figures extracted
- Time span detected
- Recommended page count
- Language (detected or user-specified)
- **Recommended style**: [art] + [tone] (based on content signals)
**Use `clarify` one question at a time**, in priority order:
> **Timeout handling (CRITICAL)**: if `clarify` returns `"The user did not provide a response within the time limit. Use your best judgement..."`, that is a per-question default, NOT blanket consent. Continue to the next question in the sequence — do not bail out of Step 2. Then, in your next user-visible message, explicitly surface every default that was taken (e.g. `"Defaulted style → ohmsha, narrative focus → concept explanation, audience → developers (clarify timed out on all three). Say the word to redirect."`). An unreported default is indistinguishable to the user from "the agent never asked."
### Question 1: Visual Style
If a preset is recommended (see `auto-selection.md`), show it first:
```
question: "Which visual style for this comic?"
options:
- "[preset name] preset (Recommended) — [preset description] with special rules"
- "[recommended art] + [recommended tone] (Recommended) — Best match for your content"
- "ligne-claire + neutral — Classic educational, Logicomix style"
- "ohmsha preset — Educational manga with visual metaphors, gadgets, NO talking heads"
- "Custom — Specify your own art + tone or preset"
```
**Preset vs Art+Tone**: Presets include special rules beyond art+tone. `ohmsha` = manga + neutral + visual metaphor rules + character roles + NO talking heads. Plain `manga + neutral` does NOT include these rules.
### Question 2: Narrative Focus
```
question: "What should the comic emphasize? (Pick the primary focus; mention others in a follow-up if needed)"
options:
- "Biography/life story — Follow a person's journey through key life events"
- "Concept explanation — Break down complex ideas visually"
- "Historical event — Dramatize important historical moments"
- "Tutorial/how-to — Step-by-step educational guide"
```
### Question 3: Target Audience
```
question: "Who is the primary reader?"
options:
- "General readers — Broad appeal, accessible content"
- "Students/learners — Educational focus, clear explanations"
- "Industry professionals — Technical depth, domain knowledge"
- "Children/young readers — Simplified language, engaging visuals"
```
### Question 4: Outline Review
```
question: "Do you want to review the outline before image generation?"
options:
- "Yes, let me review (Recommended) — Review storyboard and characters before generating images"
- "No, generate directly — Skip outline review, start generating immediately"
```
### Question 5: Prompt Review
```
question: "Review prompts before generating images?"
options:
- "Yes, review prompts (Recommended) — Review image generation prompts before generating"
- "No, skip prompt review — Proceed directly to image generation"
```
**After responses**:
1. Update `analysis.md` with user preferences
2. **Store `skip_outline_review`** flag based on Question 4 response
3. **Store `skip_prompt_review`** flag based on Question 5 response
4. → Step 3
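
The question sequence and the timeout rule can be sketched as a loop that records which answers were defaults. This is an illustration under assumptions: the `clarify` callable and its `(question, options)` signature are hypothetical stand-ins for the real tool, and only the timeout text is taken from the note above.

```python
from __future__ import annotations

from typing import Callable, NamedTuple

# Substring of the timeout reply quoted in the timeout-handling note.
TIMEOUT_MARKER = "did not provide a response within the time limit"


class Answer(NamedTuple):
    value: str
    defaulted: bool


def ask_all(
    questions: dict[str, tuple[str, list[str], str]],
    clarify: Callable[[str, list[str]], str],  # hypothetical tool wrapper
) -> dict[str, Answer]:
    """Ask every question in order; a timeout defaults only that question."""
    answers: dict[str, Answer] = {}
    for key, (question, options, default) in questions.items():
        reply = clarify(question, options)
        if TIMEOUT_MARKER in reply:
            answers[key] = Answer(default, True)  # keep going, do not bail out
        else:
            answers[key] = Answer(reply, False)
    return answers


def defaults_report(answers: dict[str, Answer]) -> str:
    """Build the user-visible line that surfaces every default taken."""
    taken = [f"{key} → {a.value}" for key, a in answers.items() if a.defaulted]
    if not taken:
        return ""
    return f"Defaulted {', '.join(taken)} (clarify timed out). Say the word to redirect."
```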
---
## Step 3: Generate Storyboard + Characters
Create storyboard and character definitions using the confirmed style from Step 2.
**Loading Style References**:
- Art style: `art-styles/{art}.md`
- Tone: `tones/{tone}.md`
- If preset (ohmsha/wuxia/shoujo/concept-story/four-panel): also load `presets/{preset}.md`
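
As a sketch, the files to load follow directly from the confirmed art, tone, and optional preset. `style_reference_files` is a hypothetical helper, and treating the five presets named above as the complete set is an assumption.

```python
from __future__ import annotations

# Presets named in this workflow; assumed here to be the full list.
PRESETS = {"ohmsha", "wuxia", "shoujo", "concept-story", "four-panel"}


def style_reference_files(art: str, tone: str, preset: str | None = None) -> list[str]:
    """Return the reference files to read, relative to the skill directory."""
    files = [f"art-styles/{art}.md", f"tones/{tone}.md"]
    if preset is not None:
        if preset not in PRESETS:
            raise ValueError(f"unknown preset: {preset!r}")
        files.append(f"presets/{preset}.md")
    return files
```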
**Generate**:
1. **Storyboard** (`storyboard.md`):
   - YAML front matter with art_style, tone, layout, aspect_ratio
   - Cover design
   - Each page: layout, panel breakdown, visual prompts
   - **Written in user's preferred language** (from Step 1)
   - Reference: `storyboard-template.md`
   - **If using preset**: Load and apply preset rules from `presets/`
2. **Character definitions** (`characters/characters.md`):
   - Visual specs matching the art style (in user's preferred language)
   - Include Reference Sheet Prompt for later image generation
   - Reference: `character-template.md`
   - **If using ohmsha preset**: Use default Doraemon characters (see below)
**Ohmsha Default Characters** (use these unless user specifies custom characters):
| Role | Character | Visual Description |
|------|-----------|-------------------|
| Student | 大雄 (Nobita) | Japanese boy, 10yo, round glasses, black hair parted in middle, yellow shirt, navy shorts |
| Mentor | 哆啦 A 梦 (Doraemon) | Round blue robot cat, big white eyes, red nose, whiskers, white belly with 4D pocket, golden bell, no ears |
| Challenge | 胖虎 (Gian) | Stocky boy, rough features, small eyes, orange shirt |
| Support | 静香 (Shizuka) | Cute girl, black short hair, pink dress, gentle expression |
These are the canonical ohmsha-style characters. Do NOT create custom characters for ohmsha unless explicitly requested.
**After generation**:
- If `skip_outline_review` is true → Skip Step 4, go directly to Step 5
- If `skip_outline_review` is false → Continue to Step 4
---
## Step 4: Review Outline (Conditional)
**Skip this step** if user selected "No, generate directly" in Step 2.
**Purpose**: User reviews and confirms storyboard + characters before generation.
**Display**:
- Page count and structure
- Art style + Tone combination
- Page-by-page summary (Cover → P1 → P2...)
- Character list with brief descriptions
**Use `clarify`**:
```
question: "Ready to generate images with this outline?"
options:
- "Yes, proceed (Recommended) — Generate character sheet and comic pages"
- "Edit storyboard first — I'll modify storyboard.md before continuing"
- "Edit characters first — I'll modify characters/characters.md before continuing"
- "Edit both — I'll modify both files before continuing"
```
**After response**:
1. If user wants to edit → Wait for user to finish editing, then ask again
2. If user confirms → Continue to Step 5
---
## Step 5: Generate Prompts
Create image generation prompts for all pages.
**Style Reference Loading**:
- Read `art-styles/{art}.md` for rendering guidelines
- Read `tones/{tone}.md` for mood/color adjustments
- If preset: Read `presets/{preset}.md` for special rules
**For each page (cover + pages)**:
1. Create prompt following art style + tone guidelines
2. **Embed character descriptions** inline (copy relevant traits from `characters/characters.md`) — `image_generate` is prompt-only, so the prompt text is the sole vehicle for character consistency
3. Save to `prompts/NN-{cover|page}-[slug].md` using `write_file`
   - **Backup rule**: If prompt file exists, rename to `prompts/NN-{cover|page}-[slug]-backup-YYYYMMDD-HHMMSS.md`
**Prompt File Format**:
```markdown
# Page NN: [Title]
## Visual Style
Art: [art style] | Tone: [tone] | Layout: [layout type]
## Character Reference (embedded inline — maintain exact traits below)
- [Character A]: [detailed visual traits from characters/characters.md]
- [Character B]: [detailed visual traits from characters/characters.md]
## Panel Breakdown
[From storyboard.md - panel descriptions, actions, dialogue]
## Generation Prompt
[Combined prompt passed to image_generate]
```
**After generation**:
- If `skip_prompt_review` is true → Skip Step 6, go directly to Step 7
- If `skip_prompt_review` is false → Continue to Step 6
---
## Step 6: Review Prompts (Conditional)
**Skip this step** if user selected "No, skip prompt review" in Step 2.
**Purpose**: User reviews and confirms prompts before image generation.
**Display prompt summary table**:
| Page | Title | Key Elements |
|------|-------|--------------|
| Cover | [title] | [main visual] |
| P1 | [title] | [key elements] |
| ... | ... | ... |
**Use `clarify`**:
```
question: "Ready to generate images with these prompts?"
options:
- "Yes, proceed (Recommended) — Generate all comic page images"
- "Edit prompts first — I'll modify prompts/*.md before continuing"
- "Regenerate prompts — Regenerate all prompts with different approach"
```
**After response**:
1. If user wants to edit → Wait for user to finish editing, then ask again
2. If user wants to regenerate → Go back to Step 5
3. If user confirms → Continue to Step 7
---
## Step 7: Generate Images
With confirmed prompts from Step 5/6, use the `image_generate` tool. The tool accepts only `prompt` and `aspect_ratio` (`landscape` | `portrait` | `square`) and **returns a URL** — it does not accept reference images and does not write local files. Every invocation must be followed by a download step.
**Aspect ratio mapping** — map the storyboard's `aspect_ratio` to the tool's enum:
| Storyboard ratio | `image_generate` format |
|------------------|-------------------------|
| `3:4`, `9:16`, `2:3` | `portrait` |
| `4:3`, `16:9`, `3:2` | `landscape` |
| `1:1` | `square` |
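
The table above is a direct lookup. A minimal sketch follows; `to_image_format` is a hypothetical name, and the fallback for ratios not listed in the table (comparing width to height) is an assumption rather than documented behavior.

```python
ASPECT_TO_FORMAT = {
    "3:4": "portrait", "9:16": "portrait", "2:3": "portrait",
    "4:3": "landscape", "16:9": "landscape", "3:2": "landscape",
    "1:1": "square",
}


def to_image_format(ratio: str) -> str:
    """Map a storyboard aspect_ratio string to portrait/landscape/square."""
    key = ratio.strip()
    if key in ASPECT_TO_FORMAT:
        return ASPECT_TO_FORMAT[key]
    # Assumption: for unlisted ratios, classify by comparing width and height.
    width, height = (float(part) for part in key.split(":"))
    if width == height:
        return "square"
    return "landscape" if width > height else "portrait"
```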
**Download procedure** (run after every successful `image_generate` call):
1. Extract the `url` field from the tool result
2. Fetch it to disk, e.g. `curl -fsSL "<url>" -o comic/{slug}/<target>.png`
3. Verify the file is non-empty (`test -s <target>.png`); on failure, retry the generation once
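
For environments where a Python helper is preferred over `curl`, the download procedure could look roughly like the sketch below. It is not the skill's implementation: `generate` is a hypothetical callable standing in for the `image_generate` tool, and the result is assumed to be a dict with a `url` key, as the first step describes.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable
from urllib.request import urlopen


def fetch_bytes(url: str, timeout: float = 60.0) -> bytes:
    """Download a URL; urlopen raises on HTTP error statuses."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def generate_and_save(
    generate: Callable[[str, str], dict],  # hypothetical stand-in for image_generate
    prompt: str,
    aspect_ratio: str,
    target: Path,
    fetch: Callable[[str], bytes] = fetch_bytes,
) -> Path:
    """Generate, download, and verify; retry the generation once on failure."""
    for _attempt in range(2):  # first try plus one retry
        result = generate(prompt, aspect_ratio)
        try:
            data = fetch(result["url"])
        except OSError:  # URLError and HTTPError are OSError subclasses
            data = b""
        if data:  # same check as `test -s`: the file must be non-empty
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(data)
            return target
    raise RuntimeError(f"image download failed twice for {target}")
```

Passing `fetch` as a parameter keeps the retry logic testable without network access.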
### 7.1 Generate Character Reference Sheet (conditional)
A character sheet is recommended for multi-page comics with recurring characters, but it is **NOT required** by every preset.
**When to generate**:
| Condition | Action |
|-----------|--------|
| Multi-page comic with detailed/recurring characters | Generate character sheet (recommended) |
| Preset with simplified characters (e.g., four-panel minimalist) | Skip — prompt descriptions are sufficient |
| Single-page comic | Skip unless characters are complex |
**When generating**:
1. Use Reference Sheet Prompt from `characters/characters.md`
2. **Backup rule**: If `characters/characters.png` exists, rename to `characters/characters-backup-YYYYMMDD-HHMMSS.png`
3. Call `image_generate` with `landscape` format
4. Download the returned URL → save to `characters/characters.png`
**Important**: the downloaded sheet is a **human-facing review artifact** (so the user can visually verify character design) and a reference for later regenerations or manual prompt edits. It does **not** drive Step 7.2 — page prompts were already written in Step 5 from the text descriptions in `characters/characters.md`. `image_generate` cannot accept images as visual input, so the text is the sole cross-page consistency mechanism.
### 7.2 Generate Comic Pages
**Before generating any page**:
1. Confirm each prompt file exists at `prompts/NN-{cover|page}-[slug].md`
2. Confirm that each prompt has character descriptions embedded inline (see Step 5). `image_generate` is prompt-only, so the prompt text is the sole consistency mechanism.
**Page Generation Strategy**: every page prompt must embed character descriptions (sourced from `characters/characters.md`) inline. This is done during Step 5, uniformly whether or not the PNG sheet was produced in 7.1 — the PNG is only a review/regeneration aid, never a generation input.
**Example embedded prompt** (`prompts/01-page-xxx.md`):
```markdown
# Page 01: [Title]
## Character Reference (embedded inline — maintain consistency)
- 大雄 (Nobita): Japanese boy, round glasses, yellow shirt, navy shorts, worried expression...
- 哆啦 A 梦 (Doraemon): Round blue robot cat, white belly, red nose, golden bell, 4D pocket...
## Page Content
[Original page prompt body — panels, dialogue, visual metaphors]
```
**For each page (cover + pages)**:
1. Read prompt from `prompts/NN-{cover|page}-[slug].md`
2. **Backup rule**: If image file exists, rename to `NN-{cover|page}-[slug]-backup-YYYYMMDD-HHMMSS.png`
3. Call `image_generate` with the prompt text and mapped aspect ratio
4. Download the returned URL → save to `NN-{cover|page}-[slug].png`
5. Report progress after each generation: "Generated X/N: [page title]"
---
## Step 8: Completion Report
```
Comic Complete!
Title: [title] | Art: [art] | Tone: [tone] | Pages: [count] | Aspect: [ratio] | Language: [lang]
Location: [path]
✓ source-{slug}.md (if content was pasted)
✓ analysis.md
✓ characters/characters.png (if generated)
✓ 00-cover-[slug].png ... NN-page-[slug].png
```
---
## Page Modification
| Action | Steps |
|--------|-------|
| **Edit** | Update prompt → Regenerate image → Download new PNG |
| **Add** | Create prompt at position → Generate image → Download PNG → Renumber subsequent (NN+1) → Update storyboard |
| **Delete** | Remove files → Renumber subsequent (NN-1) → Update storyboard |
**File naming**: `NN-{cover|page}-[slug].png` (e.g., `03-page-enigma-machine.png`)
- Slugs: kebab-case, unique, derived from content
- Renumbering: Update NN prefix only, slugs unchanged
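
The naming and renumbering rules can be sketched as follows. The helper names are hypothetical, and a two-digit `NN` prefix is assumed from the `03-page-enigma-machine.png` example.

```python
import re

# NN-{cover|page}-[slug].{png|md}; slug is kebab-case.
PAGE_FILE = re.compile(r"^(\d{2})-(cover|page)-([a-z0-9]+(?:-[a-z0-9]+)*)\.(png|md)$")


def page_filename(number: int, kind: str, slug: str, ext: str = "png") -> str:
    """Build a file name such as 03-page-enigma-machine.png."""
    return f"{number:02d}-{kind}-{slug}.{ext}"


def renumber(name: str, delta: int) -> str:
    """Shift only the NN prefix (+1 after an add, -1 after a delete)."""
    match = PAGE_FILE.match(name)
    if match is None:
        raise ValueError(f"not a page file name: {name!r}")
    number, kind, slug, ext = match.groups()
    return page_filename(int(number) + delta, kind, slug, ext)
```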
@@ -483,6 +483,7 @@ class TestNousAuxiliaryRefresh:
        with (
            patch("agent.auxiliary_client._read_nous_auth", return_value={"access_token": "stale-token"}),
            patch("agent.auxiliary_client._resolve_nous_runtime_api", return_value=("fresh-agent-key", fresh_base)),
            patch("hermes_cli.models.get_nous_recommended_aux_model", return_value=None),
            patch("agent.auxiliary_client.OpenAI") as mock_openai,
        ):
            from agent.auxiliary_client import _try_nous
@@ -491,10 +492,60 @@ class TestNousAuxiliaryRefresh:
            client, model = _try_nous()

        assert client is not None
        # No Portal recommendation → falls back to the hardcoded default.
        assert model == "google/gemini-3-flash-preview"
        assert mock_openai.call_args.kwargs["api_key"] == "fresh-agent-key"
        assert mock_openai.call_args.kwargs["base_url"] == fresh_base

    def test_try_nous_uses_portal_recommendation_for_text(self):
        """When the Portal recommends a compaction model, _try_nous honors it."""
        fresh_base = "https://inference-api.nousresearch.com/v1"
        with (
            patch("agent.auxiliary_client._read_nous_auth", return_value={"access_token": "***"}),
            patch("agent.auxiliary_client._resolve_nous_runtime_api", return_value=("fresh-agent-key", fresh_base)),
            patch("hermes_cli.models.get_nous_recommended_aux_model", return_value="minimax/minimax-m2.7") as mock_rec,
            patch("agent.auxiliary_client.OpenAI") as mock_openai,
        ):
            from agent.auxiliary_client import _try_nous

            mock_openai.return_value = MagicMock()
            client, model = _try_nous(vision=False)

        assert client is not None
        assert model == "minimax/minimax-m2.7"
        assert mock_rec.call_args.kwargs["vision"] is False

    def test_try_nous_uses_portal_recommendation_for_vision(self):
        """Vision tasks should ask for the vision-specific recommendation."""
        fresh_base = "https://inference-api.nousresearch.com/v1"
        with (
            patch("agent.auxiliary_client._read_nous_auth", return_value={"access_token": "***"}),
            patch("agent.auxiliary_client._resolve_nous_runtime_api", return_value=("fresh-agent-key", fresh_base)),
            patch("hermes_cli.models.get_nous_recommended_aux_model", return_value="google/gemini-3-flash-preview") as mock_rec,
            patch("agent.auxiliary_client.OpenAI"),
        ):
            from agent.auxiliary_client import _try_nous

            client, model = _try_nous(vision=True)

        assert client is not None
        assert model == "google/gemini-3-flash-preview"
        assert mock_rec.call_args.kwargs["vision"] is True

    def test_try_nous_falls_back_when_recommendation_lookup_raises(self):
        """If the Portal lookup throws, we must still return a usable model."""
        fresh_base = "https://inference-api.nousresearch.com/v1"
        with (
            patch("agent.auxiliary_client._read_nous_auth", return_value={"access_token": "***"}),
            patch("agent.auxiliary_client._resolve_nous_runtime_api", return_value=("fresh-agent-key", fresh_base)),
            patch("hermes_cli.models.get_nous_recommended_aux_model", side_effect=RuntimeError("portal down")),
            patch("agent.auxiliary_client.OpenAI"),
        ):
            from agent.auxiliary_client import _try_nous

            client, model = _try_nous()

        assert client is not None
        assert model == "google/gemini-3-flash-preview"

    def test_call_llm_retries_nous_after_401(self):
        class _Auth401(Exception):
            status_code = 401
@@ -731,6 +782,45 @@ def test_resolve_api_key_provider_skips_unconfigured_anthropic(monkeypatch):
# ---------------------------------------------------------------------------


class TestModelDefaultElimination:
    """_resolve_api_key_provider must skip providers without known aux models."""

    def test_unknown_provider_skipped(self, monkeypatch):
        """Providers not in _API_KEY_PROVIDER_AUX_MODELS are skipped, not sent model='default'."""
        from agent.auxiliary_client import _API_KEY_PROVIDER_AUX_MODELS

        # Verify our known providers have entries
        assert "gemini" in _API_KEY_PROVIDER_AUX_MODELS
        assert "kimi-coding" in _API_KEY_PROVIDER_AUX_MODELS

        # A random provider_id not in the dict should return None
        assert _API_KEY_PROVIDER_AUX_MODELS.get("totally-unknown-provider") is None

    def test_known_provider_gets_real_model(self):
        """Known providers get a real model name, not 'default'."""
        from agent.auxiliary_client import _API_KEY_PROVIDER_AUX_MODELS

        for provider_id, model in _API_KEY_PROVIDER_AUX_MODELS.items():
            assert model != "default", f"{provider_id} should not map to 'default'"
            assert isinstance(model, str) and model.strip(), \
                f"{provider_id} should have a non-empty model string"

    def test_volcengine_byteplus_use_main_model_first(self):
        """Volcengine/BytePlus use main-model-first — no entry in _API_KEY_PROVIDER_AUX_MODELS."""
        from agent.auxiliary_client import _API_KEY_PROVIDER_AUX_MODELS

        assert "volcengine" not in _API_KEY_PROVIDER_AUX_MODELS
        assert "byteplus" not in _API_KEY_PROVIDER_AUX_MODELS


class TestContractProviderAliases:
    def test_coding_plan_aliases_normalize_to_canonical_provider(self):
        from agent.auxiliary_client import _normalize_aux_provider

        assert _normalize_aux_provider("volcengine-coding-plan") == "volcengine"
        assert _normalize_aux_provider("byteplus-coding-plan") == "byteplus"


# ---------------------------------------------------------------------------
# _try_payment_fallback reason parameter (#7512 bug 3)
# ---------------------------------------------------------------------------
@@ -298,9 +298,15 @@ class TestClassifyApiError:
        assert result.retryable is False

    def test_404_generic(self):
        # Generic 404 with no "model not found" signal — common for local
        # llama.cpp/Ollama/vLLM endpoints with slightly wrong paths. Treat
        # as unknown (retryable) so the real error surfaces, rather than
        # claiming the model is missing and silently falling back.
        e = MockAPIError("Not Found", status_code=404)
        result = classify_api_error(e)
        assert result.reason == FailoverReason.unknown
        assert result.retryable is True
        assert result.should_fallback is False

    # ── Payload too large ──
@@ -0,0 +1,111 @@
"""Tests for agent/image_gen_registry.py — provider registration & active lookup."""
from __future__ import annotations
import pytest
from agent import image_gen_registry
from agent.image_gen_provider import ImageGenProvider
class _FakeProvider(ImageGenProvider):
def __init__(self, name: str, available: bool = True):
self._name = name
self._available = available
@property
def name(self) -> str:
return self._name
def is_available(self) -> bool:
return self._available
def generate(self, prompt, aspect_ratio="landscape", **kw):
return {"success": True, "image": f"{self._name}://{prompt}"}
@pytest.fixture(autouse=True)
def _reset_registry():
image_gen_registry._reset_for_tests()
yield
image_gen_registry._reset_for_tests()
class TestRegisterProvider:
def test_register_and_lookup(self):
provider = _FakeProvider("fake")
image_gen_registry.register_provider(provider)
assert image_gen_registry.get_provider("fake") is provider
def test_rejects_non_provider(self):
with pytest.raises(TypeError):
image_gen_registry.register_provider("not a provider") # type: ignore[arg-type]
def test_rejects_empty_name(self):
class Empty(ImageGenProvider):
@property
def name(self) -> str:
return ""
def generate(self, prompt, aspect_ratio="landscape", **kw):
return {}
with pytest.raises(ValueError):
image_gen_registry.register_provider(Empty())
def test_reregister_overwrites(self):
a = _FakeProvider("same")
b = _FakeProvider("same")
image_gen_registry.register_provider(a)
image_gen_registry.register_provider(b)
assert image_gen_registry.get_provider("same") is b
def test_list_is_sorted(self):
image_gen_registry.register_provider(_FakeProvider("zeta"))
image_gen_registry.register_provider(_FakeProvider("alpha"))
names = [p.name for p in image_gen_registry.list_providers()]
assert names == ["alpha", "zeta"]
class TestGetActiveProvider:
    def test_single_provider_autoresolves(self, tmp_path, monkeypatch):
        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
        image_gen_registry.register_provider(_FakeProvider("solo"))
        active = image_gen_registry.get_active_provider()
        assert active is not None and active.name == "solo"

    def test_fal_preferred_on_multi_without_config(self, tmp_path, monkeypatch):
        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
        image_gen_registry.register_provider(_FakeProvider("fal"))
        image_gen_registry.register_provider(_FakeProvider("openai"))
        active = image_gen_registry.get_active_provider()
        assert active is not None and active.name == "fal"

    def test_explicit_config_wins(self, tmp_path, monkeypatch):
        import yaml
        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
        (tmp_path / "config.yaml").write_text(
            yaml.safe_dump({"image_gen": {"provider": "openai"}})
        )
        image_gen_registry.register_provider(_FakeProvider("fal"))
        image_gen_registry.register_provider(_FakeProvider("openai"))
        active = image_gen_registry.get_active_provider()
        assert active is not None and active.name == "openai"

    def test_missing_configured_provider_falls_back(self, tmp_path, monkeypatch):
        import yaml
        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
        (tmp_path / "config.yaml").write_text(
            yaml.safe_dump({"image_gen": {"provider": "replicate"}})
        )
        # Only FAL is registered — configured provider doesn't exist
        image_gen_registry.register_provider(_FakeProvider("fal"))
        active = image_gen_registry.get_active_provider()
        # Falls back to FAL preference (legacy default) rather than None
        assert active is not None and active.name == "fal"

    def test_none_when_empty(self, tmp_path, monkeypatch):
        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
        assert image_gen_registry.get_active_provider() is None
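
The precedence these tests exercise can be sketched roughly as follows. This is a minimal illustration, not the registry's actual code: `Provider`, `resolve_active`, and the `legacy_default` parameter are hypothetical names, and the final tie-break (first name alphabetically) is an assumption the tests above do not cover.

```python
from typing import Dict, Optional


class Provider:
    """Hypothetical stand-in for an image-generation provider."""

    def __init__(self, name: str) -> None:
        self.name = name


def resolve_active(
    providers: Dict[str, Provider],
    configured: Optional[str] = None,
    legacy_default: str = "fal",
) -> Optional[Provider]:
    """Pick the active provider using the precedence the tests describe."""
    if not providers:
        return None  # nothing registered
    if configured and configured in providers:
        return providers[configured]  # explicit config wins
    if len(providers) == 1:
        return next(iter(providers.values()))  # a single provider auto-resolves
    if legacy_default in providers:
        return providers[legacy_default]  # legacy preference for "fal"
    # Assumption: with several providers and no default, take the first by name.
    return providers[sorted(providers)[0]]
```

A configured name that is not registered is simply ignored, which is what lets the "replicate" case above fall back to "fal" instead of returning None.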
@@ -0,0 +1,115 @@
"""Regression guard: don't send Anthropic ``thinking`` to Kimi's /coding endpoint.

Kimi's ``api.kimi.com/coding`` endpoint speaks the Anthropic Messages protocol
but has its own thinking semantics. When ``thinking.enabled`` is present in
the request, Kimi validates the message history and requires every prior
assistant tool-call message to carry OpenAI-style ``reasoning_content``.

The Anthropic path never populates that field, and
``convert_messages_to_anthropic`` strips Anthropic thinking blocks on
third-party endpoints, so after one turn with tool calls the next request
fails with HTTP 400::

    thinking is enabled but reasoning_content is missing in assistant
    tool call message at index N

Kimi on the chat_completions route handles ``thinking`` via ``extra_body`` in
``ChatCompletionsTransport`` (#13503). On the Anthropic route the right
thing to do is drop the parameter entirely and let Kimi drive reasoning
server-side.
"""

from __future__ import annotations

import pytest


class TestKimiCodingSkipsAnthropicThinking:
    """build_anthropic_kwargs must not inject ``thinking`` for Kimi /coding."""

    @pytest.mark.parametrize(
        "base_url",
        [
            "https://api.kimi.com/coding",
            "https://api.kimi.com/coding/v1",
            "https://api.kimi.com/coding/anthropic",
            "https://api.kimi.com/coding/",
        ],
    )
    def test_kimi_coding_endpoint_omits_thinking(self, base_url: str) -> None:
        from agent.anthropic_adapter import build_anthropic_kwargs

        kwargs = build_anthropic_kwargs(
            model="kimi-k2.5",
            messages=[{"role": "user", "content": "hello"}],
            tools=None,
            max_tokens=4096,
            reasoning_config={"enabled": True, "effort": "medium"},
            base_url=base_url,
        )
        assert "thinking" not in kwargs, (
            "Anthropic thinking must not be sent to Kimi /coding — "
            "endpoint requires reasoning_content on history we don't preserve."
        )
        assert "output_config" not in kwargs

    def test_kimi_coding_with_explicit_disabled_also_omits(self) -> None:
        from agent.anthropic_adapter import build_anthropic_kwargs

        kwargs = build_anthropic_kwargs(
            model="kimi-k2.5",
            messages=[{"role": "user", "content": "hello"}],
            tools=None,
            max_tokens=4096,
            reasoning_config={"enabled": False},
            base_url="https://api.kimi.com/coding",
        )
        assert "thinking" not in kwargs

    def test_non_kimi_third_party_still_gets_thinking(self) -> None:
        """MiniMax and other third-party Anthropic endpoints must retain thinking."""
        from agent.anthropic_adapter import build_anthropic_kwargs

        kwargs = build_anthropic_kwargs(
            model="MiniMax-M2.7",
            messages=[{"role": "user", "content": "hello"}],
            tools=None,
            max_tokens=4096,
            reasoning_config={"enabled": True, "effort": "medium"},
            base_url="https://api.minimax.io/anthropic",
        )
        assert "thinking" in kwargs
        assert kwargs["thinking"]["type"] == "enabled"

    def test_native_anthropic_still_gets_thinking(self) -> None:
        from agent.anthropic_adapter import build_anthropic_kwargs

        kwargs = build_anthropic_kwargs(
            model="claude-sonnet-4-20250514",
            messages=[{"role": "user", "content": "hello"}],
            tools=None,
            max_tokens=4096,
            reasoning_config={"enabled": True, "effort": "medium"},
            base_url=None,
        )
        assert "thinking" in kwargs

    def test_kimi_root_endpoint_unaffected(self) -> None:
        """Only the /coding route is special-cased; plain api.kimi.com is not.

        ``api.kimi.com`` without ``/coding`` uses the chat_completions transport
        (see runtime_provider._detect_api_mode_for_url); build_anthropic_kwargs
        should never see it, but if it somehow does we should not suppress
        thinking there, because that path has different semantics.
        """
        from agent.anthropic_adapter import build_anthropic_kwargs

        kwargs = build_anthropic_kwargs(
            model="kimi-k2.5",
            messages=[{"role": "user", "content": "hello"}],
            tools=None,
            max_tokens=4096,
            reasoning_config={"enabled": True, "effort": "medium"},
            base_url="https://api.kimi.com/v1",
        )
        assert "thinking" in kwargs
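
One way the /coding special case could be detected is sketched below. This is only an illustration under stated assumptions: `is_kimi_coding_endpoint` and `maybe_add_thinking` are hypothetical helpers, not functions from `agent.anthropic_adapter`, and the `budget_tokens` value is an arbitrary placeholder.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlparse


def is_kimi_coding_endpoint(base_url: Optional[str]) -> bool:
    """Return True only for api.kimi.com URLs whose path is /coding or below it."""
    if not base_url:
        return False
    parsed = urlparse(base_url)
    host = (parsed.hostname or "").lower()
    path = parsed.path.rstrip("/")
    return host == "api.kimi.com" and (path == "/coding" or path.startswith("/coding/"))


def maybe_add_thinking(
    kwargs: Dict[str, Any], base_url: Optional[str], reasoning_enabled: bool
) -> Dict[str, Any]:
    """Copy kwargs and add ``thinking`` unless the target is Kimi /coding."""
    out = dict(kwargs)
    if reasoning_enabled and not is_kimi_coding_endpoint(base_url):
        # Placeholder budget; the real adapter presumably derives it from effort.
        out["thinking"] = {"type": "enabled", "budget_tokens": 8192}
    return out
```

Matching on the parsed host and path, rather than a substring test, keeps `https://api.kimi.com/v1` and other third-party hosts on the normal path.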
@@ -79,6 +79,28 @@ class TestMemoryManagerUserIdThreading:
assert p._init_kwargs.get("platform") == "telegram"
assert p._init_session_id == "sess-123"
def test_chat_context_forwarded_to_provider(self):
mgr = MemoryManager()
p = RecordingProvider()
mgr.add_provider(p)
mgr.initialize_all(
session_id="sess-chat",
platform="discord",
user_id="discord_u_7",
user_name="fakeusername",
chat_id="1485316232612941897",
chat_name="fakeassistantname-forums",
chat_type="thread",
thread_id="1491249007475949698",
)
assert p._init_kwargs.get("user_name") == "fakeusername"
assert p._init_kwargs.get("chat_id") == "1485316232612941897"
assert p._init_kwargs.get("chat_name") == "fakeassistantname-forums"
assert p._init_kwargs.get("chat_type") == "thread"
assert p._init_kwargs.get("thread_id") == "1491249007475949698"
def test_no_user_id_when_cli(self):
"""CLI sessions should not have user_id in kwargs."""
mgr = MemoryManager()
@@ -334,3 +356,4 @@ class TestAIAgentUserIdPropagation:
agent = object.__new__(AIAgent)
agent._user_id = None
assert agent._user_id is None
@@ -222,6 +222,22 @@ class TestGetModelContextLength:
mock_fetch.return_value = {}
assert get_model_context_length("unknown/never-heard-of-this") == CONTEXT_PROBE_TIERS[0]
@patch("agent.model_metadata.fetch_model_metadata")
def test_volcengine_contract_model_uses_contract_context_length(self, mock_fetch):
mock_fetch.return_value = {}
assert get_model_context_length(
"volcengine/doubao-seed-2-0-pro-260215",
provider="volcengine",
) == 256000
@patch("agent.model_metadata.fetch_model_metadata")
def test_byteplus_contract_model_infers_provider_from_url(self, mock_fetch):
mock_fetch.return_value = {}
assert get_model_context_length(
"byteplus-coding-plan/kimi-k2.5",
base_url="https://ark.ap-southeast.bytepluses.com/api/coding/v3",
) == 256000
@patch("agent.model_metadata.fetch_model_metadata")
def test_partial_match_in_defaults(self, mock_fetch):
mock_fetch.return_value = {}
@@ -385,6 +401,7 @@ class TestStripProviderPrefix:
assert _strip_provider_prefix("local:my-model") == "my-model"
assert _strip_provider_prefix("openrouter:anthropic/claude-sonnet-4") == "anthropic/claude-sonnet-4"
assert _strip_provider_prefix("anthropic:claude-sonnet-4") == "claude-sonnet-4"
assert _strip_provider_prefix("stepfun:step-3.5-flash") == "step-3.5-flash"
def test_ollama_model_tag_preserved(self):
"""Ollama model:tag format must NOT be stripped."""
@@ -82,6 +82,7 @@ class TestProviderMapping:
def test_known_providers_mapped(self):
assert PROVIDER_TO_MODELS_DEV["anthropic"] == "anthropic"
assert PROVIDER_TO_MODELS_DEV["copilot"] == "github-copilot"
assert PROVIDER_TO_MODELS_DEV["stepfun"] == "stepfun"
assert PROVIDER_TO_MODELS_DEV["kilocode"] == "kilo"
assert PROVIDER_TO_MODELS_DEV["ai-gateway"] == "vercel"
@@ -789,6 +789,24 @@ class TestPromptBuilderConstants:
assert "cron" in PLATFORM_HINTS
assert "cli" in PLATFORM_HINTS
def test_cli_hint_does_not_suggest_media_tags(self):
# Regression: MEDIA:/path tags are intercepted only by messaging
# gateway platforms. On the CLI they render as literal text and
# confuse users. The CLI hint must steer the agent away from them.
cli_hint = PLATFORM_HINTS["cli"]
assert "MEDIA:" in cli_hint, (
"CLI hint should mention MEDIA: in order to tell the agent "
"NOT to use it (negative guidance)."
)
# Must contain explicit "don't" language near the MEDIA reference.
assert any(
marker in cli_hint.lower()
for marker in ("do not emit media", "not intercepted", "do not", "don't")
), "CLI hint should explicitly discourage MEDIA: tags."
# Messaging hints should still advertise MEDIA: positively (sanity
# check that this test is calibrated correctly).
assert "include MEDIA:" in PLATFORM_HINTS["telegram"]
# =========================================================================
# Environment hints
@@ -0,0 +1,164 @@
"""Tests for the BedrockTransport."""

import json
import pytest
from types import SimpleNamespace

from agent.transports import get_transport
from agent.transports.types import NormalizedResponse, ToolCall


@pytest.fixture
def transport():
    import agent.transports.bedrock  # noqa: F401
    return get_transport("bedrock_converse")


class TestBedrockBasic:
    def test_api_mode(self, transport):
        assert transport.api_mode == "bedrock_converse"

    def test_registered(self, transport):
        assert transport is not None


class TestBedrockBuildKwargs:
    def test_basic_kwargs(self, transport):
        msgs = [{"role": "user", "content": "Hello"}]
        kw = transport.build_kwargs(model="anthropic.claude-3-5-sonnet-20241022-v2:0", messages=msgs)
        assert kw["modelId"] == "anthropic.claude-3-5-sonnet-20241022-v2:0"
        assert kw["__bedrock_converse__"] is True
        assert kw["__bedrock_region__"] == "us-east-1"
        assert "messages" in kw

    def test_custom_region(self, transport):
        msgs = [{"role": "user", "content": "Hi"}]
        kw = transport.build_kwargs(
            model="anthropic.claude-3-5-sonnet-20241022-v2:0",
            messages=msgs,
            region="eu-west-1",
        )
        assert kw["__bedrock_region__"] == "eu-west-1"

    def test_max_tokens(self, transport):
        msgs = [{"role": "user", "content": "Hi"}]
        kw = transport.build_kwargs(
            model="anthropic.claude-3-5-sonnet-20241022-v2:0",
            messages=msgs,
            max_tokens=8192,
        )
        assert kw["inferenceConfig"]["maxTokens"] == 8192


class TestBedrockConvertTools:
    def test_convert_tools(self, transport):
        tools = [{
            "type": "function",
            "function": {
                "name": "terminal",
                "description": "Run commands",
                "parameters": {"type": "object", "properties": {"command": {"type": "string"}}},
            }
        }]
        result = transport.convert_tools(tools)
        assert len(result) == 1
        assert result[0]["toolSpec"]["name"] == "terminal"


class TestBedrockValidate:
    def test_none(self, transport):
        assert transport.validate_response(None) is False

    def test_raw_dict_valid(self, transport):
        assert transport.validate_response({"output": {"message": {}}}) is True

    def test_raw_dict_invalid(self, transport):
        assert transport.validate_response({"error": "fail"}) is False

    def test_normalized_valid(self, transport):
        r = SimpleNamespace(choices=[SimpleNamespace(message=SimpleNamespace(content="hi"))])
        assert transport.validate_response(r) is True


class TestBedrockMapFinishReason:
    def test_end_turn(self, transport):
        assert transport.map_finish_reason("end_turn") == "stop"

    def test_tool_use(self, transport):
        assert transport.map_finish_reason("tool_use") == "tool_calls"

    def test_max_tokens(self, transport):
        assert transport.map_finish_reason("max_tokens") == "length"

    def test_guardrail(self, transport):
        assert transport.map_finish_reason("guardrail_intervened") == "content_filter"

    def test_unknown(self, transport):
        assert transport.map_finish_reason("unknown") == "stop"


class TestBedrockNormalize:
    def _make_bedrock_response(self, text="Hello", tool_calls=None, stop_reason="end_turn"):
        """Build a raw Bedrock converse response dict."""
        content = []
        if text:
            content.append({"text": text})
        if tool_calls:
            for tc in tool_calls:
                content.append({
                    "toolUse": {
                        "toolUseId": tc["id"],
                        "name": tc["name"],
                        "input": tc["input"],
                    }
                })
        return {
            "output": {"message": {"role": "assistant", "content": content}},
            "stopReason": stop_reason,
            "usage": {"inputTokens": 10, "outputTokens": 5, "totalTokens": 15},
        }

    def test_text_response(self, transport):
        raw = self._make_bedrock_response(text="Hello world")
        nr = transport.normalize_response(raw)
        assert isinstance(nr, NormalizedResponse)
        assert nr.content == "Hello world"
        assert nr.finish_reason == "stop"

    def test_tool_call_response(self, transport):
        raw = self._make_bedrock_response(
            text=None,
            tool_calls=[{"id": "tool_1", "name": "terminal", "input": {"command": "ls"}}],
            stop_reason="tool_use",
        )
        nr = transport.normalize_response(raw)
        assert nr.finish_reason == "tool_calls"
        assert len(nr.tool_calls) == 1
        assert nr.tool_calls[0].name == "terminal"

    def test_already_normalized_response(self, transport):
        """Test normalize_response handles already-normalized SimpleNamespace (from dispatch site)."""
        pre_normalized = SimpleNamespace(
            choices=[SimpleNamespace(
                message=SimpleNamespace(
                    content="Hello from Bedrock",
                    tool_calls=None,
                    reasoning=None,
                    reasoning_content=None,
                ),
                finish_reason="stop",
            )],
            usage=SimpleNamespace(prompt_tokens=10, completion_tokens=5, total_tokens=15),
        )
        nr = transport.normalize_response(pre_normalized)
        assert isinstance(nr, NormalizedResponse)
        assert nr.content == "Hello from Bedrock"
        assert nr.finish_reason == "stop"
        assert nr.usage is not None
        assert nr.usage.prompt_tokens == 10
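
The stop-reason mapping and content splitting asserted above might look roughly like this. It is a sketch, not the BedrockTransport implementation: `map_finish_reason` here is a free function and `split_content` is a hypothetical helper; only the response shape (`output.message.content` blocks carrying `text` or `toolUse`) and the expected mappings come from the tests.

```python
from typing import Any, Dict, List, Optional, Tuple

# Mappings taken from the assertions in TestBedrockMapFinishReason.
_STOP_REASON_MAP = {
    "end_turn": "stop",
    "tool_use": "tool_calls",
    "max_tokens": "length",
    "guardrail_intervened": "content_filter",
}


def map_finish_reason(stop_reason: str) -> str:
    """Translate a Converse stopReason; unrecognised values default to "stop"."""
    return _STOP_REASON_MAP.get(stop_reason, "stop")


def split_content(raw: Dict[str, Any]) -> Tuple[Optional[str], List[Dict[str, Any]]]:
    """Separate text blocks from toolUse blocks in a raw Converse response dict."""
    blocks = raw.get("output", {}).get("message", {}).get("content", [])
    text = "".join(b["text"] for b in blocks if "text" in b)
    tools = [b["toolUse"] for b in blocks if "toolUse" in b]
    return (text or None), tools
```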
@@ -0,0 +1,349 @@
"""Tests for the ChatCompletionsTransport."""

import pytest
from types import SimpleNamespace

from agent.transports import get_transport
from agent.transports.types import NormalizedResponse, ToolCall


@pytest.fixture
def transport():
    import agent.transports.chat_completions  # noqa: F401
    return get_transport("chat_completions")


class TestChatCompletionsBasic:
    def test_api_mode(self, transport):
        assert transport.api_mode == "chat_completions"

    def test_registered(self, transport):
        assert transport is not None

    def test_convert_tools_identity(self, transport):
        tools = [{"type": "function", "function": {"name": "test", "parameters": {}}}]
        assert transport.convert_tools(tools) is tools

    def test_convert_messages_no_codex_leaks(self, transport):
        msgs = [{"role": "user", "content": "hi"}]
        result = transport.convert_messages(msgs)
        assert result is msgs  # no copy needed

    def test_convert_messages_strips_codex_fields(self, transport):
        msgs = [
            {"role": "assistant", "content": "ok", "codex_reasoning_items": [{"id": "rs_1"}],
             "tool_calls": [{"id": "call_1", "call_id": "call_1", "response_item_id": "fc_1",
                             "type": "function", "function": {"name": "t", "arguments": "{}"}}]},
        ]
        result = transport.convert_messages(msgs)
        assert "codex_reasoning_items" not in result[0]
        assert "call_id" not in result[0]["tool_calls"][0]
        assert "response_item_id" not in result[0]["tool_calls"][0]
        # Original list untouched (deepcopy-on-demand)
        assert "codex_reasoning_items" in msgs[0]
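
The copy-on-demand behaviour checked by the two `convert_messages` tests can be sketched as below. `strip_codex_fields` and `_needs_cleaning` are hypothetical names used for illustration; the field names come from the test, but the real transport may organise this differently.

```python
import copy
from typing import Any, Dict, List

_MESSAGE_FIELDS = ("codex_reasoning_items",)
_TOOL_CALL_FIELDS = ("call_id", "response_item_id")


def _needs_cleaning(msg: Dict[str, Any]) -> bool:
    """True if the message or any of its tool calls carries a Codex-only field."""
    if any(f in msg for f in _MESSAGE_FIELDS):
        return True
    return any(
        f in tc
        for tc in (msg.get("tool_calls") or [])
        if isinstance(tc, dict)
        for f in _TOOL_CALL_FIELDS
    )


def strip_codex_fields(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the same list when clean; otherwise a deep copy with fields removed."""
    if not any(_needs_cleaning(m) for m in messages):
        return messages  # fast path: no copy
    cleaned = copy.deepcopy(messages)
    for msg in cleaned:
        for f in _MESSAGE_FIELDS:
            msg.pop(f, None)
        for tc in msg.get("tool_calls") or []:
            if isinstance(tc, dict):
                for f in _TOOL_CALL_FIELDS:
                    tc.pop(f, None)
    return cleaned
```

Returning the input list unchanged in the clean case is what makes the `result is msgs` identity assertion hold, while the deep copy keeps the caller's history intact when stripping is needed.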
class TestChatCompletionsBuildKwargs:
def test_basic_kwargs(self, transport):
msgs = [{"role": "user", "content": "Hello"}]
kw = transport.build_kwargs(model="gpt-4o", messages=msgs, timeout=30.0)
assert kw["model"] == "gpt-4o"
assert kw["messages"][0]["content"] == "Hello"
assert kw["timeout"] == 30.0
def test_developer_role_swap(self, transport):
msgs = [{"role": "system", "content": "You are helpful"}, {"role": "user", "content": "Hi"}]
kw = transport.build_kwargs(model="gpt-5.4", messages=msgs, model_lower="gpt-5.4")
assert kw["messages"][0]["role"] == "developer"
def test_no_developer_swap_for_non_gpt5(self, transport):
msgs = [{"role": "system", "content": "You are helpful"}, {"role": "user", "content": "Hi"}]
kw = transport.build_kwargs(model="claude-sonnet-4", messages=msgs, model_lower="claude-sonnet-4")
assert kw["messages"][0]["role"] == "system"
def test_tools_included(self, transport):
msgs = [{"role": "user", "content": "Hi"}]
tools = [{"type": "function", "function": {"name": "test", "parameters": {}}}]
kw = transport.build_kwargs(model="gpt-4o", messages=msgs, tools=tools)
assert kw["tools"] == tools
def test_openrouter_provider_prefs(self, transport):
msgs = [{"role": "user", "content": "Hi"}]
kw = transport.build_kwargs(
model="gpt-4o", messages=msgs,
is_openrouter=True,
provider_preferences={"only": ["openai"]},
)
assert kw["extra_body"]["provider"] == {"only": ["openai"]}
def test_nous_tags(self, transport):
msgs = [{"role": "user", "content": "Hi"}]
kw = transport.build_kwargs(model="gpt-4o", messages=msgs, is_nous=True)
assert kw["extra_body"]["tags"] == ["product=hermes-agent"]
def test_reasoning_default(self, transport):
msgs = [{"role": "user", "content": "Hi"}]
kw = transport.build_kwargs(
model="gpt-4o", messages=msgs,
supports_reasoning=True,
)
assert kw["extra_body"]["reasoning"] == {"enabled": True, "effort": "medium"}
def test_nous_omits_disabled_reasoning(self, transport):
msgs = [{"role": "user", "content": "Hi"}]
kw = transport.build_kwargs(
model="gpt-4o", messages=msgs,
supports_reasoning=True,
is_nous=True,
reasoning_config={"enabled": False},
)
# Nous rejects enabled=false; reasoning omitted entirely
assert "reasoning" not in kw.get("extra_body", {})
def test_ollama_num_ctx(self, transport):
msgs = [{"role": "user", "content": "Hi"}]
kw = transport.build_kwargs(
model="llama3", messages=msgs,
ollama_num_ctx=32768,
)
assert kw["extra_body"]["options"]["num_ctx"] == 32768
def test_custom_think_false(self, transport):
msgs = [{"role": "user", "content": "Hi"}]
kw = transport.build_kwargs(
model="qwen3", messages=msgs,
is_custom_provider=True,
reasoning_config={"effort": "none"},
)
assert kw["extra_body"]["think"] is False
def test_max_tokens_with_fn(self, transport):
msgs = [{"role": "user", "content": "Hi"}]
kw = transport.build_kwargs(
model="gpt-4o", messages=msgs,
max_tokens=4096,
max_tokens_param_fn=lambda n: {"max_tokens": n},
)
assert kw["max_tokens"] == 4096
def test_ephemeral_overrides_max_tokens(self, transport):
msgs = [{"role": "user", "content": "Hi"}]
kw = transport.build_kwargs(
model="gpt-4o", messages=msgs,
max_tokens=4096,
ephemeral_max_output_tokens=2048,
max_tokens_param_fn=lambda n: {"max_tokens": n},
)
assert kw["max_tokens"] == 2048
def test_nvidia_default_max_tokens(self, transport):
msgs = [{"role": "user", "content": "Hi"}]
kw = transport.build_kwargs(
model="glm-4.7", messages=msgs,
is_nvidia_nim=True,
max_tokens_param_fn=lambda n: {"max_tokens": n},
)
# NVIDIA default: 16384
assert kw["max_tokens"] == 16384
def test_qwen_default_max_tokens(self, transport):
msgs = [{"role": "user", "content": "Hi"}]
kw = transport.build_kwargs(
model="qwen3-coder-plus", messages=msgs,
is_qwen_portal=True,
max_tokens_param_fn=lambda n: {"max_tokens": n},
)
# Qwen default: 65536
assert kw["max_tokens"] == 65536
def test_anthropic_max_output_for_claude_on_aggregator(self, transport):
msgs = [{"role": "user", "content": "Hi"}]
kw = transport.build_kwargs(
model="anthropic/claude-sonnet-4.6", messages=msgs,
is_openrouter=True,
anthropic_max_output=64000,
)
# Set as plain max_tokens (not via fn) because the aggregator proxies to
# Anthropic Messages API which requires the field.
assert kw["max_tokens"] == 64000
def test_request_overrides_last(self, transport):
msgs = [{"role": "user", "content": "Hi"}]
kw = transport.build_kwargs(
model="gpt-4o", messages=msgs,
request_overrides={"service_tier": "priority"},
)
assert kw["service_tier"] == "priority"
def test_fixed_temperature(self, transport):
msgs = [{"role": "user", "content": "Hi"}]
kw = transport.build_kwargs(model="gpt-4o", messages=msgs, fixed_temperature=0.6)
assert kw["temperature"] == 0.6
def test_omit_temperature(self, transport):
msgs = [{"role": "user", "content": "Hi"}]
kw = transport.build_kwargs(model="gpt-4o", messages=msgs, omit_temperature=True, fixed_temperature=0.5)
# omit wins
assert "temperature" not in kw
class TestChatCompletionsKimi:
    """Regression tests for the Kimi/Moonshot quirks migrated into the transport."""

    def test_kimi_max_tokens_default(self, transport):
        kw = transport.build_kwargs(
            model="kimi-k2", messages=[{"role": "user", "content": "Hi"}],
            is_kimi=True,
            max_tokens_param_fn=lambda n: {"max_tokens": n},
        )
        # Kimi CLI default: 32000
        assert kw["max_tokens"] == 32000

    def test_kimi_reasoning_effort_top_level(self, transport):
        kw = transport.build_kwargs(
            model="kimi-k2", messages=[{"role": "user", "content": "Hi"}],
            is_kimi=True,
            reasoning_config={"effort": "high"},
            max_tokens_param_fn=lambda n: {"max_tokens": n},
        )
        # Kimi requires reasoning_effort as a top-level parameter
        assert kw["reasoning_effort"] == "high"

    def test_kimi_reasoning_effort_omitted_when_thinking_disabled(self, transport):
        kw = transport.build_kwargs(
            model="kimi-k2", messages=[{"role": "user", "content": "Hi"}],
            is_kimi=True,
            reasoning_config={"enabled": False},
            max_tokens_param_fn=lambda n: {"max_tokens": n},
        )
        # Mirror Kimi CLI: omit reasoning_effort entirely when thinking off
        assert "reasoning_effort" not in kw

    def test_kimi_thinking_enabled_extra_body(self, transport):
        kw = transport.build_kwargs(
            model="kimi-k2", messages=[{"role": "user", "content": "Hi"}],
            is_kimi=True,
            max_tokens_param_fn=lambda n: {"max_tokens": n},
        )
        assert kw["extra_body"]["thinking"] == {"type": "enabled"}

    def test_kimi_thinking_disabled_extra_body(self, transport):
        kw = transport.build_kwargs(
            model="kimi-k2", messages=[{"role": "user", "content": "Hi"}],
            is_kimi=True,
            reasoning_config={"enabled": False},
            max_tokens_param_fn=lambda n: {"max_tokens": n},
        )
        assert kw["extra_body"]["thinking"] == {"type": "disabled"}
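
Taken together, the Kimi tests imply a small decision table for the reasoning-related kwargs. The sketch below is an assumption-laden illustration: `kimi_reasoning_kwargs` is a hypothetical helper, and omitting `reasoning_effort` when no effort is configured is a guess, since the tests only pin the "high" and disabled cases.

```python
from typing import Any, Dict, Optional

KIMI_DEFAULT_MAX_TOKENS = 32000  # value asserted in test_kimi_max_tokens_default


def kimi_reasoning_kwargs(
    reasoning_config: Optional[Dict[str, Any]] = None,
    max_tokens: Optional[int] = None,
) -> Dict[str, Any]:
    """Build the Kimi-specific slice of chat-completions kwargs."""
    cfg = reasoning_config or {}
    enabled = cfg.get("enabled", True)  # thinking is on unless explicitly disabled
    kwargs: Dict[str, Any] = {
        "max_tokens": max_tokens if max_tokens is not None else KIMI_DEFAULT_MAX_TOKENS,
        "extra_body": {"thinking": {"type": "enabled" if enabled else "disabled"}},
    }
    effort = cfg.get("effort")
    if enabled and effort:
        # Top-level parameter, not nested in extra_body.
        kwargs["reasoning_effort"] = effort
    return kwargs
```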
class TestChatCompletionsValidate:
def test_none(self, transport):
assert transport.validate_response(None) is False
def test_no_choices(self, transport):
r = SimpleNamespace(choices=None)
assert transport.validate_response(r) is False
def test_empty_choices(self, transport):
r = SimpleNamespace(choices=[])
assert transport.validate_response(r) is False
def test_valid(self, transport):
r = SimpleNamespace(choices=[SimpleNamespace(message=SimpleNamespace(content="hi"))])
assert transport.validate_response(r) is True
class TestChatCompletionsNormalize:
def test_text_response(self, transport):
r = SimpleNamespace(
choices=[SimpleNamespace(
message=SimpleNamespace(content="Hello", tool_calls=None, reasoning_content=None),
finish_reason="stop",
)],
usage=SimpleNamespace(prompt_tokens=10, completion_tokens=5, total_tokens=15),
)
nr = transport.normalize_response(r)
assert isinstance(nr, NormalizedResponse)
assert nr.content == "Hello"
assert nr.finish_reason == "stop"
assert nr.tool_calls is None
def test_tool_call_response(self, transport):
tc = SimpleNamespace(
id="call_123",
function=SimpleNamespace(name="terminal", arguments='{"command": "ls"}'),
)
r = SimpleNamespace(
choices=[SimpleNamespace(
message=SimpleNamespace(content=None, tool_calls=[tc], reasoning_content=None),
finish_reason="tool_calls",
)],
usage=SimpleNamespace(prompt_tokens=10, completion_tokens=20, total_tokens=30),
)
nr = transport.normalize_response(r)
assert len(nr.tool_calls) == 1
assert nr.tool_calls[0].name == "terminal"
assert nr.tool_calls[0].id == "call_123"
def test_tool_call_extra_content_preserved(self, transport):
"""Gemini 3 thinking models attach extra_content with thought_signature
on tool_calls. Without this replay on the next turn, the API rejects
the request with 400. The transport MUST surface extra_content so the
agent loop can write it back into the assistant message."""
tc = SimpleNamespace(
id="call_gem",
function=SimpleNamespace(name="terminal", arguments='{"command": "ls"}'),
extra_content={"google": {"thought_signature": "SIG_ABC123"}},
)
r = SimpleNamespace(
choices=[SimpleNamespace(
message=SimpleNamespace(content=None, tool_calls=[tc], reasoning_content=None),
finish_reason="tool_calls",
)],
usage=None,
)
nr = transport.normalize_response(r)
assert nr.tool_calls[0].provider_data == {
"extra_content": {"google": {"thought_signature": "SIG_ABC123"}}
}
def test_reasoning_content_preserved_separately(self, transport):
"""DeepSeek/Moonshot use reasoning_content distinct from reasoning.
Don't merge them — the thinking-prefill retry check reads each field
separately."""
r = SimpleNamespace(
choices=[SimpleNamespace(
message=SimpleNamespace(
content=None, tool_calls=None,
reasoning="summary text",
reasoning_content="detailed scratchpad",
),
finish_reason="stop",
)],
usage=None,
)
nr = transport.normalize_response(r)
assert nr.reasoning == "summary text"
assert nr.provider_data == {"reasoning_content": "detailed scratchpad"}
class TestChatCompletionsCacheStats:
    def test_no_usage(self, transport):
        r = SimpleNamespace(usage=None)
        assert transport.extract_cache_stats(r) is None

    def test_no_details(self, transport):
        r = SimpleNamespace(usage=SimpleNamespace(prompt_tokens_details=None))
        assert transport.extract_cache_stats(r) is None

    def test_with_cache(self, transport):
        details = SimpleNamespace(cached_tokens=500, cache_write_tokens=100)
        r = SimpleNamespace(usage=SimpleNamespace(prompt_tokens_details=details))
        result = transport.extract_cache_stats(r)
        assert result == {"cached_tokens": 500, "creation_tokens": 100}
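
A defensive extraction that satisfies the three cache-stats cases could look like this. It is a free-function sketch rather than the transport method itself; treating missing counters as 0 is an assumption, because the tests only cover the case where both counters are present.

```python
from types import SimpleNamespace
from typing import Any, Dict, Optional


def extract_cache_stats(response: Any) -> Optional[Dict[str, int]]:
    """Pull prompt-cache counters from usage.prompt_tokens_details, if present."""
    usage = getattr(response, "usage", None)
    details = getattr(usage, "prompt_tokens_details", None) if usage is not None else None
    if details is None:
        return None
    return {
        # Assumption: absent or None counters are reported as 0.
        "cached_tokens": getattr(details, "cached_tokens", 0) or 0,
        "creation_tokens": getattr(details, "cache_write_tokens", 0) or 0,
    }


# Usage: mirrors the shape built in test_with_cache.
example = SimpleNamespace(
    usage=SimpleNamespace(
        prompt_tokens_details=SimpleNamespace(cached_tokens=500, cache_write_tokens=100)
    )
)
stats = extract_cache_stats(example)
```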
@@ -0,0 +1,220 @@
"""Tests for the ResponsesApiTransport (Codex)."""
import json
import pytest
from types import SimpleNamespace
from agent.transports import get_transport
from agent.transports.types import NormalizedResponse, ToolCall
@pytest.fixture
def transport():
import agent.transports.codex # noqa: F401
return get_transport("codex_responses")
class TestCodexTransportBasic:
def test_api_mode(self, transport):
assert transport.api_mode == "codex_responses"
def test_registered_on_import(self, transport):
assert transport is not None
def test_convert_tools(self, transport):
tools = [{
"type": "function",
"function": {
"name": "terminal",
"description": "Run a command",
"parameters": {"type": "object", "properties": {"command": {"type": "string"}}},
}
}]
result = transport.convert_tools(tools)
assert len(result) == 1
assert result[0]["type"] == "function"
assert result[0]["name"] == "terminal"
class TestCodexBuildKwargs:
def test_basic_kwargs(self, transport):
messages = [
{"role": "system", "content": "You are helpful."},
{"role": "user", "content": "Hello"},
]
kw = transport.build_kwargs(
model="gpt-5.4",
messages=messages,
tools=[],
)
assert kw["model"] == "gpt-5.4"
assert kw["instructions"] == "You are helpful."
assert "input" in kw
assert kw["store"] is False
def test_system_extracted_from_messages(self, transport):
messages = [
{"role": "system", "content": "Custom system prompt"},
{"role": "user", "content": "Hi"},
]
kw = transport.build_kwargs(model="gpt-5.4", messages=messages, tools=[])
assert kw["instructions"] == "Custom system prompt"
def test_no_system_uses_default(self, transport):
messages = [{"role": "user", "content": "Hi"}]
kw = transport.build_kwargs(model="gpt-5.4", messages=messages, tools=[])
assert kw["instructions"] # should be non-empty default
def test_reasoning_config(self, transport):
messages = [{"role": "user", "content": "Hi"}]
kw = transport.build_kwargs(
model="gpt-5.4", messages=messages, tools=[],
reasoning_config={"effort": "high"},
)
assert kw.get("reasoning", {}).get("effort") == "high"
def test_reasoning_disabled(self, transport):
messages = [{"role": "user", "content": "Hi"}]
kw = transport.build_kwargs(
model="gpt-5.4", messages=messages, tools=[],
reasoning_config={"enabled": False},
)
assert "reasoning" not in kw or kw.get("include") == []
def test_session_id_sets_cache_key(self, transport):
messages = [{"role": "user", "content": "Hi"}]
kw = transport.build_kwargs(
model="gpt-5.4", messages=messages, tools=[],
session_id="test-session-123",
)
assert kw.get("prompt_cache_key") == "test-session-123"
def test_github_responses_no_cache_key(self, transport):
messages = [{"role": "user", "content": "Hi"}]
kw = transport.build_kwargs(
model="gpt-5.4", messages=messages, tools=[],
session_id="test-session",
is_github_responses=True,
)
assert "prompt_cache_key" not in kw
def test_max_tokens(self, transport):
messages = [{"role": "user", "content": "Hi"}]
kw = transport.build_kwargs(
model="gpt-5.4", messages=messages, tools=[],
max_tokens=4096,
)
assert kw.get("max_output_tokens") == 4096
def test_codex_backend_no_max_output_tokens(self, transport):
messages = [{"role": "user", "content": "Hi"}]
kw = transport.build_kwargs(
model="gpt-5.4", messages=messages, tools=[],
max_tokens=4096,
is_codex_backend=True,
)
assert "max_output_tokens" not in kw
def test_xai_headers(self, transport):
messages = [{"role": "user", "content": "Hi"}]
kw = transport.build_kwargs(
model="grok-3", messages=messages, tools=[],
session_id="conv-123",
is_xai_responses=True,
)
assert kw.get("extra_headers", {}).get("x-grok-conv-id") == "conv-123"
def test_minimal_effort_clamped(self, transport):
messages = [{"role": "user", "content": "Hi"}]
kw = transport.build_kwargs(
model="gpt-5.4", messages=messages, tools=[],
reasoning_config={"effort": "minimal"},
)
# "minimal" should be clamped to "low"
assert kw.get("reasoning", {}).get("effort") == "low"
class TestCodexValidateResponse:
def test_none_response(self, transport):
assert transport.validate_response(None) is False
def test_empty_output(self, transport):
r = SimpleNamespace(output=[], output_text=None)
assert transport.validate_response(r) is False
def test_valid_output(self, transport):
r = SimpleNamespace(output=[{"type": "message", "content": []}])
assert transport.validate_response(r) is True
def test_output_text_fallback_not_valid(self, transport):
"""validate_response is strict — output_text doesn't make it valid.
The caller handles output_text fallback with diagnostic logging."""
r = SimpleNamespace(output=None, output_text="Some text")
assert transport.validate_response(r) is False
class TestCodexMapFinishReason:
def test_completed(self, transport):
assert transport.map_finish_reason("completed") == "stop"
def test_incomplete(self, transport):
assert transport.map_finish_reason("incomplete") == "length"
def test_failed(self, transport):
assert transport.map_finish_reason("failed") == "stop"
def test_unknown(self, transport):
assert transport.map_finish_reason("unknown_status") == "stop"
class TestCodexNormalizeResponse:
def test_text_response(self, transport):
"""Normalize a simple text Codex response."""
r = SimpleNamespace(
output=[
SimpleNamespace(
type="message",
role="assistant",
content=[SimpleNamespace(type="output_text", text="Hello world")],
status="completed",
),
],
status="completed",
incomplete_details=None,
usage=SimpleNamespace(input_tokens=10, output_tokens=5,
input_tokens_details=None, output_tokens_details=None),
)
nr = transport.normalize_response(r)
assert isinstance(nr, NormalizedResponse)
assert nr.content == "Hello world"
assert nr.finish_reason == "stop"
def test_tool_call_response(self, transport):
"""Normalize a Codex response with tool calls."""
r = SimpleNamespace(
output=[
SimpleNamespace(
type="function_call",
call_id="call_abc123",
name="terminal",
arguments=json.dumps({"command": "ls"}),
id="fc_abc123",
status="completed",
),
],
status="completed",
incomplete_details=None,
usage=SimpleNamespace(input_tokens=10, output_tokens=20,
input_tokens_details=None, output_tokens_details=None),
)
nr = transport.normalize_response(r)
assert nr.finish_reason == "tool_calls"
assert len(nr.tool_calls) == 1
tc = nr.tool_calls[0]
assert tc.name == "terminal"
assert '"command"' in tc.arguments
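
Two of the behaviours asserted in TestCodexBuildKwargs, lifting the system message into `instructions` and clamping the "minimal" effort, can be sketched as follows. `split_instructions`, `clamp_effort`, and `DEFAULT_INSTRUCTIONS` are hypothetical; the real default prompt text is not shown in this diff.

```python
from typing import Any, Dict, List, Optional, Tuple

# Placeholder text; the transport's actual default is not visible here.
DEFAULT_INSTRUCTIONS = "You are a helpful assistant."


def split_instructions(
    messages: List[Dict[str, Any]],
) -> Tuple[str, List[Dict[str, Any]]]:
    """Move system messages into an instructions string; return the remaining input."""
    system_parts = [m["content"] for m in messages if m.get("role") == "system"]
    rest = [m for m in messages if m.get("role") != "system"]
    return ("\n\n".join(system_parts) or DEFAULT_INSTRUCTIONS), rest


def clamp_effort(effort: Optional[str]) -> Optional[str]:
    """Raise "minimal" to "low", as test_minimal_effort_clamped expects."""
    return "low" if effort == "minimal" else effort
```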
@@ -1059,6 +1059,7 @@ class TestRewriteTranscriptPreservesReasoning:
role="assistant",
content="The answer is 42.",
reasoning="I need to think step by step.",
reasoning_content="provider scratchpad",
reasoning_details=[{"type": "summary", "text": "step by step"}],
codex_reasoning_items=[{"id": "r1", "type": "reasoning"}],
)
@@ -1066,6 +1067,7 @@ class TestRewriteTranscriptPreservesReasoning:
# Verify all three were stored
before = db.get_messages_as_conversation(session_id)
assert before[0].get("reasoning") == "I need to think step by step."
assert before[0].get("reasoning_content") == "provider scratchpad"
assert before[0].get("reasoning_details") == [{"type": "summary", "text": "step by step"}]
assert before[0].get("codex_reasoning_items") == [{"id": "r1", "type": "reasoning"}]
@@ -1082,5 +1084,6 @@ class TestRewriteTranscriptPreservesReasoning:
# Load again — all three reasoning fields must survive
after = db.get_messages_as_conversation(session_id)
assert after[0].get("reasoning") == "I need to think step by step."
assert after[0].get("reasoning_content") == "provider scratchpad"
assert after[0].get("reasoning_details") == [{"type": "summary", "text": "step by step"}]
assert after[0].get("codex_reasoning_items") == [{"id": "r1", "type": "reasoning"}]
@@ -0,0 +1,76 @@
"""Regression tests for the TUI gateway's ``session.list`` handler.

Reported during TUI v2 blitz retest: the ``/resume`` modal inside a TUI
session only surfaced ``tui``/``cli`` rows, hiding telegram sessions users
could still resume directly via ``hermes --tui --resume <id>``.

The fix widens the picker to a curated allowlist of user-facing sources
(tui/cli + chat adapters) while still filtering internal/system sources.
"""

from __future__ import annotations

from tui_gateway import server


class _StubDB:
    def __init__(self, rows):
        self.rows = rows
        self.calls: list[dict] = []

    def list_sessions_rich(self, **kwargs):
        self.calls.append(kwargs)
        return list(self.rows)


def _call(limit: int = 20):
    return server.handle_request({
        "id": "1",
        "method": "session.list",
        "params": {"limit": limit},
    })


def test_session_list_includes_telegram_but_filters_internal_sources(monkeypatch):
    rows = [
        {"id": "tui-1", "source": "tui", "started_at": 9},
        {"id": "tool-1", "source": "tool", "started_at": 8},
        {"id": "tg-1", "source": "telegram", "started_at": 7},
        {"id": "acp-1", "source": "acp", "started_at": 6},
        {"id": "cli-1", "source": "cli", "started_at": 5},
    ]
    db = _StubDB(rows)
    monkeypatch.setattr(server, "_get_db", lambda: db)

    resp = _call(limit=10)
    sessions = resp["result"]["sessions"]
    ids = [s["id"] for s in sessions]

    assert "tg-1" in ids and "tui-1" in ids and "cli-1" in ids, ids
    assert "tool-1" not in ids and "acp-1" not in ids, ids


def test_session_list_fetches_wider_window_before_filtering(monkeypatch):
    db = _StubDB([{"id": "x", "source": "cli", "started_at": 1}])
    monkeypatch.setattr(server, "_get_db", lambda: db)

    _call(limit=10)

    assert len(db.calls) == 1
    assert db.calls[0].get("source") is None, db.calls[0]
    assert db.calls[0].get("limit") == 100, db.calls[0]


def test_session_list_preserves_ordering_after_filter(monkeypatch):
    rows = [
        {"id": "newest", "source": "telegram", "started_at": 5},
        {"id": "internal", "source": "tool", "started_at": 4},
        {"id": "middle", "source": "tui", "started_at": 3},
        {"id": "oldest", "source": "discord", "started_at": 1},
    ]
    monkeypatch.setattr(server, "_get_db", lambda: _StubDB(rows))

    resp = _call()
    ids = [s["id"] for s in resp["result"]["sessions"]]

    assert ids == ["newest", "middle", "oldest"]
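The three tests above pin down the handler's observable behaviour: query the database once without a source filter, ask for a wider window than the caller's limit, drop internal sources, and keep the original ordering. The handler itself is not shown in this diff, so what follows is only a minimal sketch of that behaviour. The name `list_user_sessions`, the contents of `ALLOWED_SOURCES`, and the window rule are all assumptions made for illustration.

```python
from typing import Any

# Hypothetical allowlist. The tests only confirm that tui, cli, telegram
# and discord rows are kept and that tool and acp rows are dropped.
ALLOWED_SOURCES = frozenset({"tui", "cli", "telegram", "discord"})


def list_user_sessions(db: Any, limit: int = 20) -> list[dict]:
    """Fetch a wider window, then filter to user-facing sources."""
    # Assumed window rule: over-fetch so that filtering does not starve the
    # picker. This yields 100 for limit=10, which is the only data point
    # the tests give; the real rule may differ.
    window = max(limit * 5, 100)

    # One unfiltered query; filtering happens in Python afterwards.
    rows = db.list_sessions_rich(source=None, limit=window)

    # A list comprehension keeps the database ordering intact.
    kept = [row for row in rows if row.get("source") in ALLOWED_SOURCES]
    return kept[:limit]
```

Filtering after the query rather than inside it is what lets a single call serve several sources at once, at the cost of fetching rows that are later discarded.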
+135 -3
@@ -1031,7 +1031,7 @@ class TestReactions:
@pytest.mark.asyncio
async def test_reactions_in_message_flow(self, adapter):
-"""Reactions should be added on receipt and swapped on completion."""
+"""Reactions should be bracketed around actual processing via hooks."""
adapter._app.client.reactions_add = AsyncMock()
adapter._app.client.reactions_remove = AsyncMock()
adapter._app.client.users_info = AsyncMock(return_value={
@@ -1047,15 +1047,147 @@ class TestReactions:
}
await adapter._handle_slack_message(event)
-# Should have added 👀, then removed 👀, then added ✅
+# _handle_slack_message should register the message for reactions
assert "1234567890.000001" in adapter._reacting_message_ids
# Simulate the base class calling on_processing_start
from gateway.platforms.base import MessageEvent, MessageType, SessionSource
from gateway.config import Platform
source = SessionSource(
platform=Platform.SLACK,
chat_id="C123",
chat_type="dm",
user_id="U_USER",
)
msg_event = MessageEvent(
text="hello",
message_type=MessageType.TEXT,
source=source,
message_id="1234567890.000001",
)
await adapter.on_processing_start(msg_event)
add_calls = adapter._app.client.reactions_add.call_args_list
assert len(add_calls) == 1
assert add_calls[0].kwargs["name"] == "eyes"
# Simulate the base class calling on_processing_complete
from gateway.platforms.base import ProcessingOutcome
await adapter.on_processing_complete(msg_event, ProcessingOutcome.SUCCESS)
add_calls = adapter._app.client.reactions_add.call_args_list
remove_calls = adapter._app.client.reactions_remove.call_args_list
assert len(add_calls) == 2
assert add_calls[0].kwargs["name"] == "eyes"
assert add_calls[1].kwargs["name"] == "white_check_mark"
assert len(remove_calls) == 1
assert remove_calls[0].kwargs["name"] == "eyes"
# Message ID should be cleaned up
assert "1234567890.000001" not in adapter._reacting_message_ids
@pytest.mark.asyncio
async def test_reactions_failure_outcome(self, adapter):
"""Failed processing should add :x: instead of :white_check_mark:."""
adapter._app.client.reactions_add = AsyncMock()
adapter._app.client.reactions_remove = AsyncMock()
from gateway.platforms.base import MessageEvent, MessageType, SessionSource, ProcessingOutcome
from gateway.config import Platform
source = SessionSource(
platform=Platform.SLACK,
chat_id="C123",
chat_type="dm",
user_id="U_USER",
)
adapter._reacting_message_ids.add("1234567890.000002")
msg_event = MessageEvent(
text="hello",
message_type=MessageType.TEXT,
source=source,
message_id="1234567890.000002",
)
await adapter.on_processing_complete(msg_event, ProcessingOutcome.FAILURE)
add_calls = adapter._app.client.reactions_add.call_args_list
remove_calls = adapter._app.client.reactions_remove.call_args_list
assert len(add_calls) == 1
assert add_calls[0].kwargs["name"] == "x"
assert len(remove_calls) == 1
assert remove_calls[0].kwargs["name"] == "eyes"
@pytest.mark.asyncio
async def test_reactions_skipped_for_non_dm_non_mention(self, adapter):
"""Non-DM, non-mention messages should not get reactions."""
adapter._app.client.reactions_add = AsyncMock()
adapter._app.client.reactions_remove = AsyncMock()
adapter._app.client.users_info = AsyncMock(return_value={
"user": {"profile": {"display_name": "Tyler"}}
})
event = {
"text": "hello",
"user": "U_USER",
"channel": "C123",
"channel_type": "channel",
"ts": "1234567890.000003",
}
await adapter._handle_slack_message(event)
# Should NOT register for reactions when not mentioned in a channel
assert "1234567890.000003" not in adapter._reacting_message_ids
adapter._app.client.reactions_add.assert_not_called()
adapter._app.client.reactions_remove.assert_not_called()
@pytest.mark.asyncio
async def test_reactions_disabled_via_env(self, adapter, monkeypatch):
"""SLACK_REACTIONS=false should suppress all reaction lifecycle."""
monkeypatch.setenv("SLACK_REACTIONS", "false")
adapter._app.client.reactions_add = AsyncMock()
adapter._app.client.reactions_remove = AsyncMock()
adapter._app.client.users_info = AsyncMock(return_value={
"user": {"profile": {"display_name": "Tyler"}}
})
event = {
"text": "hello",
"user": "U_USER",
"channel": "C123",
"channel_type": "im",
"ts": "1234567890.000004",
}
await adapter._handle_slack_message(event)
# Should NOT register for reactions when toggle is off
assert "1234567890.000004" not in adapter._reacting_message_ids
# Hooks should also be no-ops when disabled
from gateway.platforms.base import MessageEvent, MessageType, SessionSource, ProcessingOutcome
from gateway.config import Platform
source = SessionSource(
platform=Platform.SLACK,
chat_id="C123",
chat_type="dm",
user_id="U_USER",
)
msg_event = MessageEvent(
text="hello",
message_type=MessageType.TEXT,
source=source,
message_id="1234567890.000004",
)
# Force-add to verify hooks respect the toggle independently
adapter._reacting_message_ids.add("1234567890.000004")
await adapter.on_processing_start(msg_event)
await adapter.on_processing_complete(msg_event, ProcessingOutcome.SUCCESS)
adapter._app.client.reactions_add.assert_not_called()
adapter._app.client.reactions_remove.assert_not_called()
@pytest.mark.asyncio
async def test_reactions_enabled_by_default(self, adapter):
"""SLACK_REACTIONS defaults to true (matches existing behavior)."""
assert adapter._reactions_enabled() is True
# ---------------------------------------------------------------------------
# TestThreadReplyHandling
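The reaction tests above describe a lifecycle driven by hooks: a message is registered, `eyes` is added when processing starts, and on completion `eyes` is removed and either `white_check_mark` or `x` is added, all gated by `SLACK_REACTIONS`. The adapter code is not part of this excerpt, so the following is a rough sketch only. `ReactionTracker`, `reactions_enabled`, and the set of falsy strings are hypothetical; the tests confirm only that `false` disables reactions and that an unset variable leaves them on.

```python
import os

# Assumed set of values that switch reactions off.
_FALSY = {"0", "false", "no", "off"}


def reactions_enabled(env_var: str = "SLACK_REACTIONS") -> bool:
    """Return True unless the toggle is explicitly set to a falsy value."""
    raw = os.environ.get(env_var, "true")
    return raw.strip().lower() not in _FALSY


class ReactionTracker:
    """Records the reaction calls a hook-driven lifecycle would make."""

    def __init__(self) -> None:
        self.pending: set[str] = set()
        # Each entry is (operation, message_id, reaction_name).
        self.calls: list[tuple[str, str, str]] = []

    def register(self, message_id: str) -> None:
        # Only messages registered here receive reactions later.
        if reactions_enabled():
            self.pending.add(message_id)

    def on_start(self, message_id: str) -> None:
        if reactions_enabled() and message_id in self.pending:
            self.calls.append(("add", message_id, "eyes"))

    def on_complete(self, message_id: str, success: bool) -> None:
        # The hook checks the toggle itself, so a forced registration
        # still produces no calls while reactions are disabled.
        if not reactions_enabled() or message_id not in self.pending:
            return
        self.pending.discard(message_id)
        self.calls.append(("remove", message_id, "eyes"))
        final = "white_check_mark" if success else "x"
        self.calls.append(("add", message_id, final))
```

Checking the toggle inside each hook, and not only at registration time, mirrors the last part of `test_reactions_disabled_via_env`, where the message ID is force-added and the hooks must still make no calls.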
+115 -2
@@ -15,6 +15,8 @@ from hermes_cli.auth import (
get_auth_status,
AuthError,
KIMI_CODE_BASE_URL,
STEPFUN_STEP_PLAN_INTL_BASE_URL,
STEPFUN_STEP_PLAN_CN_BASE_URL,
_resolve_kimi_base_url,
)
from hermes_cli.copilot_auth import _try_gh_cli_token
@@ -35,10 +37,13 @@ class TestProviderRegistry:
("xai", "xAI", "api_key"),
("nvidia", "NVIDIA NIM", "api_key"),
("kimi-coding", "Kimi / Moonshot", "api_key"),
("stepfun", "StepFun Step Plan", "api_key"),
("minimax", "MiniMax", "api_key"),
("minimax-cn", "MiniMax (China)", "api_key"),
("ai-gateway", "Vercel AI Gateway", "api_key"),
("kilocode", "Kilo Code", "api_key"),
("volcengine", "Volcengine", "api_key"),
("byteplus", "BytePlus", "api_key"),
])
def test_provider_registered(self, provider_id, name, auth_type):
assert provider_id in PROVIDER_REGISTRY
@@ -71,7 +76,11 @@ class TestProviderRegistry:
def test_kimi_env_vars(self):
pconfig = PROVIDER_REGISTRY["kimi-coding"]
-assert pconfig.api_key_env_vars == ("KIMI_API_KEY",)
+# KIMI_API_KEY is the primary env var; KIMI_CODING_API_KEY is a
+# secondary fallback for Kimi Code sk-kimi- keys so users don't
+# have to overload the same variable.
+assert "KIMI_API_KEY" in pconfig.api_key_env_vars
+assert "KIMI_CODING_API_KEY" in pconfig.api_key_env_vars
assert pconfig.base_url_env_var == "KIMI_BASE_URL"
def test_minimax_env_vars(self):
@@ -79,6 +88,11 @@ class TestProviderRegistry:
assert pconfig.api_key_env_vars == ("MINIMAX_API_KEY",)
assert pconfig.base_url_env_var == "MINIMAX_BASE_URL"
def test_stepfun_env_vars(self):
pconfig = PROVIDER_REGISTRY["stepfun"]
assert pconfig.api_key_env_vars == ("STEPFUN_API_KEY",)
assert pconfig.base_url_env_var == "STEPFUN_BASE_URL"
def test_minimax_cn_env_vars(self):
pconfig = PROVIDER_REGISTRY["minimax-cn"]
assert pconfig.api_key_env_vars == ("MINIMAX_CN_API_KEY",)
@@ -99,16 +113,29 @@ class TestProviderRegistry:
assert pconfig.api_key_env_vars == ("HF_TOKEN",)
assert pconfig.base_url_env_var == "HF_BASE_URL"
def test_volcengine_env_vars(self):
pconfig = PROVIDER_REGISTRY["volcengine"]
assert pconfig.api_key_env_vars == ("VOLCENGINE_API_KEY",)
assert pconfig.base_url_env_var == ""
def test_byteplus_env_vars(self):
pconfig = PROVIDER_REGISTRY["byteplus"]
assert pconfig.api_key_env_vars == ("BYTEPLUS_API_KEY",)
assert pconfig.base_url_env_var == ""
def test_base_urls(self):
assert PROVIDER_REGISTRY["copilot"].inference_base_url == "https://api.githubcopilot.com"
assert PROVIDER_REGISTRY["copilot-acp"].inference_base_url == "acp://copilot"
assert PROVIDER_REGISTRY["zai"].inference_base_url == "https://api.z.ai/api/paas/v4"
assert PROVIDER_REGISTRY["kimi-coding"].inference_base_url == "https://api.moonshot.ai/v1"
assert PROVIDER_REGISTRY["stepfun"].inference_base_url == STEPFUN_STEP_PLAN_INTL_BASE_URL
assert PROVIDER_REGISTRY["minimax"].inference_base_url == "https://api.minimax.io/anthropic"
assert PROVIDER_REGISTRY["minimax-cn"].inference_base_url == "https://api.minimaxi.com/anthropic"
assert PROVIDER_REGISTRY["ai-gateway"].inference_base_url == "https://ai-gateway.vercel.sh/v1"
assert PROVIDER_REGISTRY["kilocode"].inference_base_url == "https://api.kilo.ai/api/gateway"
assert PROVIDER_REGISTRY["huggingface"].inference_base_url == "https://router.huggingface.co/v1"
assert PROVIDER_REGISTRY["volcengine"].inference_base_url == "https://ark.cn-beijing.volces.com/api/v3"
assert PROVIDER_REGISTRY["byteplus"].inference_base_url == "https://ark.ap-southeast.bytepluses.com/api/v3"
def test_oauth_providers_unchanged(self):
"""Ensure we didn't break the existing OAuth providers."""
@@ -126,13 +153,15 @@ PROVIDER_ENV_VARS = (
"OPENROUTER_API_KEY", "OPENAI_API_KEY", "ANTHROPIC_API_KEY", "ANTHROPIC_TOKEN",
"CLAUDE_CODE_OAUTH_TOKEN",
"GLM_API_KEY", "ZAI_API_KEY", "Z_AI_API_KEY",
-"KIMI_API_KEY", "KIMI_BASE_URL", "MINIMAX_API_KEY", "MINIMAX_CN_API_KEY",
+"KIMI_API_KEY", "KIMI_BASE_URL", "STEPFUN_API_KEY", "STEPFUN_BASE_URL",
+"MINIMAX_API_KEY", "MINIMAX_CN_API_KEY",
"AI_GATEWAY_API_KEY", "AI_GATEWAY_BASE_URL",
"KILOCODE_API_KEY", "KILOCODE_BASE_URL",
"DASHSCOPE_API_KEY", "OPENCODE_ZEN_API_KEY", "OPENCODE_GO_API_KEY",
"NOUS_API_KEY", "GITHUB_TOKEN", "GH_TOKEN",
"OPENAI_BASE_URL", "HERMES_COPILOT_ACP_COMMAND", "COPILOT_CLI_PATH",
"HERMES_COPILOT_ACP_ARGS", "COPILOT_ACP_BASE_URL",
"VOLCENGINE_API_KEY", "BYTEPLUS_API_KEY",
)
@@ -152,6 +181,9 @@ class TestResolveProvider:
def test_explicit_kimi_coding(self):
assert resolve_provider("kimi-coding") == "kimi-coding"
def test_explicit_stepfun(self):
assert resolve_provider("stepfun") == "stepfun"
def test_explicit_minimax(self):
assert resolve_provider("minimax") == "minimax"
@@ -176,6 +208,9 @@ class TestResolveProvider:
def test_alias_moonshot(self):
assert resolve_provider("moonshot") == "kimi-coding"
def test_alias_step(self):
assert resolve_provider("step") == "stepfun"
def test_alias_minimax_underscore(self):
assert resolve_provider("minimax_cn") == "minimax-cn"
@@ -212,6 +247,14 @@ class TestResolveProvider:
assert resolve_provider("github-copilot-acp") == "copilot-acp"
assert resolve_provider("copilot-acp-agent") == "copilot-acp"
def test_alias_volcengine_coding_plan(self):
assert resolve_provider("volcengine-coding-plan") == "volcengine"
assert resolve_provider("volcengine_coding_plan") == "volcengine"
def test_alias_byteplus_coding_plan(self):
assert resolve_provider("byteplus-coding-plan") == "byteplus"
assert resolve_provider("byteplus_coding_plan") == "byteplus"
def test_explicit_huggingface(self):
assert resolve_provider("huggingface") == "huggingface"
@@ -244,6 +287,10 @@ class TestResolveProvider:
monkeypatch.setenv("KIMI_API_KEY", "test-kimi-key")
assert resolve_provider("auto") == "kimi-coding"
def test_auto_detects_stepfun_key(self, monkeypatch):
monkeypatch.setenv("STEPFUN_API_KEY", "test-stepfun-key")
assert resolve_provider("auto") == "stepfun"
def test_auto_detects_minimax_key(self, monkeypatch):
monkeypatch.setenv("MINIMAX_API_KEY", "test-mm-key")
assert resolve_provider("auto") == "minimax"
@@ -308,6 +355,30 @@ class TestApiKeyProviderStatus:
status = get_api_key_provider_status("kimi-coding")
assert status["base_url"] == "https://custom.kimi.example/v1"
def test_stepfun_status_uses_configured_base_url(self, monkeypatch):
monkeypatch.setenv("STEPFUN_API_KEY", "stepfun-key")
monkeypatch.setenv("STEPFUN_BASE_URL", STEPFUN_STEP_PLAN_CN_BASE_URL)
status = get_api_key_provider_status("stepfun")
assert status["configured"] is True
assert status["base_url"] == STEPFUN_STEP_PLAN_CN_BASE_URL
def test_volcengine_status_uses_coding_plan_base_url(self, monkeypatch):
monkeypatch.setenv("VOLCENGINE_API_KEY", "volc-test-key")
monkeypatch.setattr(
"hermes_cli.auth.read_raw_config",
lambda: {
"model": {
"provider": "volcengine",
"default": "volcengine-coding-plan/doubao-seed-2.0-code",
}
},
)
status = get_api_key_provider_status("volcengine")
assert status["configured"] is True
assert status["base_url"] == "https://ark.cn-beijing.volces.com/api/coding/v3"
def test_copilot_status_uses_gh_cli_token(self, monkeypatch):
monkeypatch.setattr("hermes_cli.copilot_auth._try_gh_cli_token", lambda: "gho_gh_cli_token")
status = get_api_key_provider_status("copilot")
@@ -363,6 +434,25 @@ class TestResolveApiKeyProviderCredentials:
assert creds["base_url"] == "https://api.z.ai/api/paas/v4"
assert creds["source"] == "GLM_API_KEY"
def test_resolve_byteplus_with_coding_plan_model_uses_coding_base_url(self, monkeypatch):
monkeypatch.setenv("BYTEPLUS_API_KEY", "byteplus-secret-key")
monkeypatch.setattr(
"hermes_cli.auth.read_raw_config",
lambda: {
"model": {
"provider": "byteplus",
"default": "byteplus-coding-plan/dola-seed-2.0-pro",
}
},
)
creds = resolve_api_key_provider_credentials("byteplus")
assert creds["provider"] == "byteplus"
assert creds["api_key"] == "byteplus-secret-key"
assert creds["base_url"] == "https://ark.ap-southeast.bytepluses.com/api/coding/v3"
assert creds["source"] == "BYTEPLUS_API_KEY"
def test_resolve_copilot_with_github_token(self, monkeypatch):
monkeypatch.setenv("GITHUB_TOKEN", "gh-env-secret")
creds = resolve_api_key_provider_credentials("copilot")
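The Volcengine and BytePlus tests rely on the rule stated in the commit message: the model prefix (`volcengine/` vs `volcengine-coding-plan/`) determines the runtime base URL. A minimal sketch of that rule is below. The four URLs are taken from the assertions in this diff; the function name `resolve_base_url` and the table layout are hypothetical and are not the project's actual API.

```python
# (standard URL, Coding Plan URL) per provider, copied from the test
# assertions in this diff.
BASE_URLS = {
    "volcengine": (
        "https://ark.cn-beijing.volces.com/api/v3",
        "https://ark.cn-beijing.volces.com/api/coding/v3",
    ),
    "byteplus": (
        "https://ark.ap-southeast.bytepluses.com/api/v3",
        "https://ark.ap-southeast.bytepluses.com/api/coding/v3",
    ),
}


def resolve_base_url(provider: str, model: str) -> str:
    """Pick the Coding Plan endpoint when the model carries its prefix."""
    standard, coding = BASE_URLS[provider]
    # Assumed convention: "<provider>-coding-plan/<model-name>".
    if model.startswith(f"{provider}-coding-plan/"):
        return coding
    return standard
```

Under this convention a config needs no separate base-URL setting for these providers, which would be consistent with `base_url_env_var == ""` in the registry tests.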
@@ -425,6 +515,19 @@ class TestResolveApiKeyProviderCredentials:
assert creds["api_key"] == "kimi-secret-key"
assert creds["base_url"] == "https://api.moonshot.ai/v1"
def test_resolve_stepfun_with_key(self, monkeypatch):
monkeypatch.setenv("STEPFUN_API_KEY", "stepfun-secret-key")
creds = resolve_api_key_provider_credentials("stepfun")
assert creds["provider"] == "stepfun"
assert creds["api_key"] == "stepfun-secret-key"
assert creds["base_url"] == STEPFUN_STEP_PLAN_INTL_BASE_URL
def test_resolve_stepfun_custom_base_url(self, monkeypatch):
monkeypatch.setenv("STEPFUN_API_KEY", "stepfun-secret-key")
monkeypatch.setenv("STEPFUN_BASE_URL", STEPFUN_STEP_PLAN_CN_BASE_URL)
creds = resolve_api_key_provider_credentials("stepfun")
assert creds["base_url"] == STEPFUN_STEP_PLAN_CN_BASE_URL
def test_resolve_minimax_with_key(self, monkeypatch):
monkeypatch.setenv("MINIMAX_API_KEY", "mm-secret-key")
creds = resolve_api_key_provider_credentials("minimax")
@@ -515,6 +618,16 @@ class TestRuntimeProviderResolution:
assert result["api_mode"] == "chat_completions"
assert result["api_key"] == "kimi-key"
def test_runtime_stepfun(self, monkeypatch):
monkeypatch.setenv("STEPFUN_API_KEY", "stepfun-key")
monkeypatch.setenv("STEPFUN_BASE_URL", STEPFUN_STEP_PLAN_CN_BASE_URL)
from hermes_cli.runtime_provider import resolve_runtime_provider
result = resolve_runtime_provider(requested="stepfun")
assert result["provider"] == "stepfun"
assert result["api_mode"] == "chat_completions"
assert result["api_key"] == "stepfun-key"
assert result["base_url"] == STEPFUN_STEP_PLAN_CN_BASE_URL
def test_runtime_minimax(self, monkeypatch):
monkeypatch.setenv("MINIMAX_API_KEY", "mm-key")
from hermes_cli.runtime_provider import resolve_runtime_provider
@@ -376,7 +376,6 @@ class TestLoginNousSkipKeepsCurrent:
lambda *a, **kw: prompt_returns,
)
monkeypatch.setattr(models_mod, "get_pricing_for_provider", lambda p: {})
monkeypatch.setattr(models_mod, "filter_nous_free_models", lambda ids, p: ids)
monkeypatch.setattr(models_mod, "check_nous_free_tier", lambda: None)
monkeypatch.setattr(
models_mod, "partition_nous_models_by_tier",
+19
@@ -33,6 +33,25 @@ def test_project_env_overrides_stale_shell_values_when_user_env_missing(tmp_path
assert os.getenv("OPENAI_BASE_URL") == "https://project.example/v1"
def test_project_env_is_sanitized_before_loading(tmp_path, monkeypatch):
home = tmp_path / "hermes"
project_env = tmp_path / ".env"
project_env.write_text(
"TELEGRAM_BOT_TOKEN=8356550917:AAGGEkzg06Hrc3Hjb3Sa1jkGVDOdU_lYy2Q"
"ANTHROPIC_API_KEY=sk-ant-test123\n",
encoding="utf-8",
)
monkeypatch.delenv("TELEGRAM_BOT_TOKEN", raising=False)
monkeypatch.delenv("ANTHROPIC_API_KEY", raising=False)
loaded = load_hermes_dotenv(hermes_home=home, project_env=project_env)
assert loaded == [project_env]
assert os.getenv("TELEGRAM_BOT_TOKEN") == "8356550917:AAGGEkzg06Hrc3Hjb3Sa1jkGVDOdU_lYy2Q"
assert os.getenv("ANTHROPIC_API_KEY") == "sk-ant-test123"
def test_user_env_takes_precedence_over_project_env(tmp_path, monkeypatch):
home = tmp_path / "hermes"
home.mkdir()
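`test_project_env_is_sanitized_before_loading` writes two assignments onto a single line with no newline between them and expects both variables to load. The sanitizer is not shown in this diff. One way such a repair could work, sketched here under assumptions, is to split a line wherever a recognised variable name followed by `=` appears. `KNOWN_KEYS` and `split_concatenated` are hypothetical names for illustration.

```python
import re

# Hypothetical list of recognised variable names; a real implementation
# would need a much longer list.
KNOWN_KEYS = ("TELEGRAM_BOT_TOKEN", "ANTHROPIC_API_KEY")


def split_concatenated(line: str, known_keys: tuple[str, ...] = KNOWN_KEYS) -> list[str]:
    """Split 'A=1B=2' into ['A=1', 'B=2'] when A and B are known keys."""
    pattern = re.compile("(?:" + "|".join(re.escape(k) for k in known_keys) + ")=")
    starts = [m.start() for m in pattern.finditer(line)]

    # Leave the line alone unless it begins with a known assignment.
    if not starts or starts[0] != 0:
        return [line]

    # Cut at each position where a known KEY= begins.
    bounds = starts + [len(line)]
    return [line[a:b] for a, b in zip(bounds, bounds[1:])]
```

Restricting the split to known names keeps ordinary values that happen to contain `=` intact, at the cost of missing concatenations that involve unlisted variables.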

Some files were not shown because too many files have changed in this diff