Compare commits

...

1456 Commits

Author SHA1 Message Date
donrhmexe
f0b325b9f9 fix: link subagent sessions to parent and hide from session list
Subagent sessions spawned by delegate_task were created with
parent_session_id=NULL and source=cli, making them indistinguishable
from user sessions in hermes sessions list and /resume.

Changes:
- delegate_tool.py: pass parent_agent.session_id to child agent
- run_agent.py: accept parent_session_id param, pass to create_session
- hermes_state.py list_sessions_rich: filter parent_session_id IS NULL
  by default (opt-in include_children=True for callers that need them)
- hermes_state.py delete_session: delete child sessions first (FK)
- hermes_state.py prune_sessions: delete children before parents (FK)

session_search already handles parent_session_id correctly — child
sessions are filtered from recent list and resolved to parent root
in full-text search results.

Fixes #5122
2026-04-05 12:44:31 -07:00
Teknium
b63fb03f3f feat(browser): add JS evaluation via browser_console expression parameter (#5303)
Add optional 'expression' parameter to browser_console that evaluates
JavaScript in the page context (like DevTools console). Returns structured
results with auto-JSON parsing.

No new tool — extends the existing browser_console schema with ~20 tokens
of overhead instead of adding a 12th browser tool.

Both backends supported:
- Browserbase: uses agent-browser 'eval' command via CDP
- Camofox: uses /tabs/{tab_id}/eval endpoint with graceful degradation

E2E verified: string eval, number eval, structured JSON, DOM manipulation,
error handling, and original console-output mode all working.
2026-04-05 12:42:52 -07:00
Teknium
8d5226753f fix: add missing ButtonStyle.grey to discord mock for test compatibility 2026-04-05 12:42:47 -07:00
Abhey
66d0fa1778 fix: avoid unnecessary Discord members intent on startup
Only request the privileged members intent when DISCORD_ALLOWED_USERS includes non-numeric entries that need username resolution. Also release the Discord token lock when startup fails so retries and restarts are not blocked by a stale lock.\n\nAdds regression tests for conditional intents and startup lock cleanup.
2026-04-05 12:42:47 -07:00
Teknium
583d9f9597 fix(honcho): migration guard for observation mode default change
Existing honcho.json configs without an explicit observationMode now
default to 'unified' (the old default) instead of being silently
switched to 'directional'. New installations get 'directional' as
the new default.

Detection: _explicitly_configured (host block exists or enabled=true)
signals an existing config. When true and no observationMode is set
anywhere in the config chain, falls back to 'unified'. When false
(fresh install), uses 'directional'.

Users who explicitly set observationMode or granular observation
booleans are unaffected — explicit config always wins.

5 new tests covering all migration paths.
2026-04-05 12:34:11 -07:00
Teknium
0f813c422c fix(plugins): only register CLI commands for the active memory provider
discover_plugin_cli_commands() now reads memory.provider from config.yaml
and only loads CLI registration for the active provider. If no memory
provider is set, no plugin CLI commands appear in the CLI.

Only one memory provider can be active at a time — at most one set of
plugin CLI commands is registered. Users who haven't configured honcho
(or any memory provider) won't see 'hermes honcho' in their help output.

Adds test for inactive provider returning empty results.
2026-04-05 12:34:11 -07:00
Teknium
b074b0b13a test: add plugin CLI registration tests
11 tests covering:
- PluginContext.register_cli_command() storage and overwrite
- get_plugin_cli_commands() return semantics
- Memory plugin discover_plugin_cli_commands() with register_cli convention
- Skipping plugins without register_cli or cli.py
- Honcho register_cli() subcommand tree structure
- Mode choices updated to recall modes (hybrid/context/tools)
- _ProviderCollector.register_cli_command no-op safety
2026-04-05 12:34:11 -07:00
Teknium
dd8a42bf7d feat(plugins): plugin CLI registration system — decouple plugin commands from core
Add ctx.register_cli_command() to PluginContext for general plugins and
discover_plugin_cli_commands() to memory plugin system. Plugins that
provide a register_cli(subparser) function in their cli.py are
automatically discovered during argparse setup and wired into the CLI.

- Remove 95-line hardcoded honcho argparse block from main.py
- Move honcho subcommand tree into plugins/memory/honcho/cli.py
  via register_cli() convention
- hermes honcho setup now redirects to hermes memory setup (unified path)
- hermes honcho (no subcommand) shows status instead of running setup
- Future plugins can register CLI commands without touching core files
- PluginManager stores CLI registrations in _cli_commands dict
- Memory plugin discovery scans cli.py for register_cli at argparse time

main.py: -102 lines of hardcoded plugin routing
2026-04-05 12:34:11 -07:00
erosika
c02c3dc723 fix(honcho): plugin drift overhaul -- observation config, chunking, setup wizard, docs, dead code cleanup
Salvaged from PR #5045 by erosika.

- Replace memoryMode/peer_memory_modes with granular per-peer observation config
- Add message chunking for Honcho API limits (25k chars default)
- Add dialectic input guard (10k chars default)
- Add dialecticDynamic toggle for reasoning level auto-bump
- Rewrite setup wizard with cloud/local deployment picker
- Switch peer card/profile/search from session.context() to direct peer APIs
- Add server-side observation sync via get_peer_configuration()
- Fix base_url/baseUrl config mismatch for self-hosted setups
- Fix local auth leak (cloud API keys no longer sent to local instances)
- Remove dead code: memoryMode, peer_memory_modes, linkedHosts, suppress flags, SOUL.md aiPeer sync
- Add post_setup hook to memory_setup.py for provider-specific setup wizards
- Comprehensive README rewrite with full config reference
- New optional skill: autonomous-ai-agents/honcho
- Expanded memory-providers.md with multi-profile docs
- 9 new tests (chunking, dialectic guard, peer lookups), 14 dead tests removed
- Fix 2 pre-existing TestResolveConfigPath filesystem isolation failures
2026-04-05 12:34:11 -07:00
Teknium
12724e6295 feat: progressive subdirectory hint discovery (#5291)
As the agent navigates into subdirectories via tool calls (read_file,
terminal, search_files, etc.), automatically discover and load project
context files (AGENTS.md, CLAUDE.md, .cursorrules) from those directories.

Previously, context files were only loaded from the CWD at session start.
If the agent moved into backend/, frontend/, or any subdirectory with its
own AGENTS.md, those instructions were never seen.

Now, SubdirectoryHintTracker watches tool call arguments for file paths
and shell commands, resolves directories, and loads hint files on first
access. Discovered hints are appended to the tool result so the model
gets relevant context at the moment it starts working in a new area —
without modifying the system prompt (preserving prompt caching).

Features:
- Extracts paths from tool args (path, workdir) and shell commands
- Loads AGENTS.md, CLAUDE.md, .cursorrules (first match per directory)
- Deduplicates — each directory loaded at most once per session
- Ignores paths outside the working directory
- Truncates large hint files at 8K chars
- Works on both sequential and concurrent tool execution paths

Inspired by Block/goose SubdirectoryHintTracker.
2026-04-05 12:33:47 -07:00
Teknium
567bc79948 fix: clean up cron platform allowlist — add homeassistant, fix import, improve placement
Follow-up for cherry-picked #5118 commits:
- Remove duplicate 'import subprocess'
- Move _KNOWN_DELIVERY_PLATFORMS to module-level (after imports)
- Add 'homeassistant' to allowlist (existing platform missing from original PR)
- Remove trailing whitespace
2026-04-05 12:31:27 -07:00
Maymun
71a4582bf8 fix(security): hoist platform allowlist to module scope as frozenset 2026-04-05 12:31:27 -07:00
Maymun
1ebc932417 fix(security): validate cron deliver platform name to prevent env var enumeration 2026-04-05 12:31:27 -07:00
Xowiek
ef3bd3b276 security(approval): fix privilege escalation in gateway once-approval logic 2026-04-05 12:31:27 -07:00
MichaelWDanko
c6793d6fc3 fix(gateway): wrap cron helpers with staticmethod to prevent self-binding
Plain functions imported as class attributes in APIServerAdapter get
auto-bound as methods via Python's descriptor protocol.  Every
self._cron_*() call injected self as the first positional argument,
causing TypeError on all 8 cron API endpoints at runtime.

Wrap each import with staticmethod() so self._cron_*() calls dispatch
correctly without modifying any call sites.

Co-authored-by: teknium <teknium@nousresearch.com>
2026-04-05 12:31:10 -07:00
Mibayy
cc2b56b26a feat(api): structured run events via /v1/runs SSE endpoint
Add POST /v1/runs to start async agent runs and GET /v1/runs/{run_id}/events
for SSE streaming of typed lifecycle events (tool.started, tool.completed,
message.delta, reasoning.available, run.completed, run.failed).

Changes the internal tool_progress_callback signature from positional
(tool_name, preview, args) to event-type-first
(event_type, tool_name, preview, args, **kwargs). Existing consumers
filter on event_type and remain backward-compatible.

Adds concurrency limit (_MAX_CONCURRENT_RUNS=10) and orphaned run sweep.

Fixes logic inversion in cli.py _on_tool_progress where the original PR
would have displayed internal tools instead of non-internal ones.

Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
2026-04-05 12:05:13 -07:00
Mibayy
e167ad8f61 feat(delegate): add acp_command/acp_args override to delegate_task
Allow delegate_task to specify custom ACP transport per-task, so a parent
running via CLI/Discord/Telegram can spawn child agents over ACP
(e.g. claude --acp --stdio). Follows the existing override_provider pattern.
Supports per-task granularity in batch mode.

Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
2026-04-05 12:05:13 -07:00
NexVeridian
c71b1d197f fix(acp): advertise slash commands via ACP protocol
Send AvailableCommandsUpdate on session create/load/resume/fork so ACP
clients (Zed, etc.) can discover /help, /model, /tools, /compact, etc.
Also rewrites /compact to use agent._compress_context() properly with
token estimation and session DB isolation.

Co-authored-by: NexVeridian <NexVeridian@users.noreply.github.com>
2026-04-05 12:05:13 -07:00
Git-on-my-level
fcdd5447e2 fix: keep ACP stdout protocol-clean
Route AIAgent print output to stderr via _print_fn for ACP stdio sessions.
Gate quiet-mode spinner startup on _should_start_quiet_spinner() so JSON-RPC
on stdout isn't corrupted. Child agents inherit the redirect.

Co-authored-by: Git-on-my-level <Git-on-my-level@users.noreply.github.com>
2026-04-05 12:05:13 -07:00
Teknium
914a7db448 fix(acp): rename AuthMethod to AuthMethodAgent for agent-client-protocol 0.9.0
Straight rename to match the 0.9.0 API where AuthMethod was split into
AuthMethodAgent, AuthMethodEnvVar, AuthMethodTerminal. Bump pin to >=0.9.0,<1.0.

Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
2026-04-05 12:05:13 -07:00
Teknium
6ee90a7cf6 fix: hermes auth remove now clears env-seeded credentials permanently (#5285)
Removing an env-seeded credential (e.g. from OPENROUTER_API_KEY) via
'hermes auth' previously had no lasting effect -- the entry was deleted
from auth.json but load_pool() re-created it on the next call because
the env var was still set.

Now auth_remove_command detects env-sourced entries (source starts with
'env:') and calls the new remove_env_value() to strip the var from both
.env and os.environ, preventing re-seeding.

Changes:
- hermes_cli/config.py: add remove_env_value() -- atomically removes a
  line from .env and pops from os.environ
- hermes_cli/auth_commands.py: auth_remove_command clears env var when
  removing an env-seeded pool entry
- 8 new tests covering remove_env_value and the full zombie-credential
  lifecycle (remove -> reload -> stays gone)
2026-04-05 12:00:53 -07:00
Teknium
0c95e91059 fix: follow-up fixes for salvaged PRs
- Fix GatewayApp → GatewayRunner import in api_server.py (PR #4976)
- Update launchd test assertions for new bootstrap/bootout/kickstart commands (PR #4892)
- Add nonlocal message declaration in run_sync() to fix UnboundLocalError (pre-existing scoping bug)
2026-04-05 11:59:28 -07:00
analista
6a6ae9a5c3 fix(gateway): correct misleading log text for unknown /commands
The warning said 'forwarding as plain text' but the code returns a
user-facing error reply instead of forwarding. Describe what actually
happens.
2026-04-05 11:59:28 -07:00
analista
e8053e8b93 fix(gateway): surface unknown /commands instead of leaking them to the LLM
Previously, typing a /command that isn't a built-in, plugin, or skill
would silently fall through to the LLM as plain text. The model often
interprets it as a loose instruction and invents unrelated tool calls —
e.g. a stray /claude_code slipped through and the model fabricated a
delegate_task invocation that got stuck in an OAuth loop.

Now we check GATEWAY_KNOWN_COMMANDS after the skill / plugin /
unavailable-skill lookups and return an actionable message pointing the
user at /commands. The user gets feedback, and the agent doesn't waste
a round-trip guessing what /foo-bar was supposed to mean.
2026-04-05 11:59:28 -07:00
analista
4a75aec433 fix(gateway): resolve Telegram's underscored /commands to skill/plugin keys
Telegram's Bot API disallows hyphens in command names, so
_build_telegram_menu registers /claude-code as /claude_code. When the
user taps it from autocomplete, the gateway dispatch did a direct
lookup against skill_cmds (keyed on the hyphenated form) and missed,
silently falling through to the LLM as plain text. The model would
then typically call delegate_task, spawning a Hermes subagent instead
of invoking the intended skill.

Normalize underscores to hyphens in skill and plugin command lookup,
matching the existing pattern in _check_unavailable_skill.
2026-04-05 11:59:28 -07:00
Damian P
afccbf253c fix: resolve listed messaging targets consistently 2026-04-05 11:59:28 -07:00
kshitijk4poor
1d2e34c7eb Prevent Telegram polling handoffs and flood-control send failures
Telegram polling can inherit a stale webhook registration when a deployment
switches transport modes, which leaves getUpdates idle even though the gateway
starts cleanly. Outbound send also treats Telegram retry_after responses as
terminal errors, so brief flood control can drop tool progress and replies.

Constraint: Keep the PR narrowly scoped to upstream/main Telegram adapter behavior
Rejected: Port OpenClaw's broader polling supervisor and offset persistence | too broad for an isolated fix PR
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Polling mode should clear webhook state before starting getUpdates, and send-path retry logic must distinguish flood control from timeouts
Tested: uv run --extra dev pytest tests/gateway/test_telegram_* -q
Not-tested: Live Telegram webhook-to-polling migration and real Bot API 429 behavior
2026-04-05 11:59:28 -07:00
Trevin Chow
74ff62f5ac fix(gateway): use kickstart -k for atomic launchd restart
Replace the two-step stop/start restart with a single
launchctl kickstart -k call. When the gateway triggers a
restart from inside its own process tree, the old stop
command kills the shell before the start half is reached.
kickstart -k lets launchd handle the kill+restart atomically.
2026-04-05 11:59:28 -07:00
Trevin Chow
aab74b582c fix(gateway): replace deprecated launchctl start/stop with kickstart/kill
launchctl load/unload/start/stop are deprecated on macOS since 10.10
and fail silently on modern versions. This replaces them with the
current equivalents:

- load -> bootstrap gui/<uid> <plist>
- unload -> bootout gui/<uid>/<label>
- start -> kickstart gui/<uid>/<label>
- stop -> kill SIGTERM gui/<uid>/<label>

Adds _launchd_domain() helper returning the gui/<uid> target domain.
Updates test assertions to match the new command signatures.

Fixes #4820
2026-04-05 11:59:28 -07:00
bg-l2norm
abf1be564b fix(deps): include telegram webhook extra in messaging installs (#4915) 2026-04-05 11:59:28 -07:00
teyrebaz33
6df0f07ff3 fix: /status command bypasses active-session guard during agent run (#5046)
When an agent was actively processing a message, /status sent via Telegram
(or any gateway) was queued as a pending interrupt instead of being dispatched
immediately. The base platform adapter's handle_message() only had special-case
bypass logic for /approve and /deny, so /status fell through to the default
interrupt path and was never processed as a system command.

Apply the same bypass pattern used by /approve//deny: detect cmd == 'status'
inside the active-session guard, dispatch directly to the message handler, and
send the response without touching session lifecycle or interrupt state.

Adds a regression test that verifies /status is dispatched and responded to
immediately even when _active_sessions contains an entry for the session.
2026-04-05 11:59:28 -07:00
nibzard
4df2fca2f0 fix(gateway): cap memory flush retries at 3 to prevent infinite loop
The _session_expiry_watcher retried failed memory flushes forever
because exceptions were caught at debug level without setting
memory_flushed=True. Expired sessions with transient failures
(rate limits, network errors) would retry every 5 minutes
indefinitely, burning API quota and blocking gateway message
processing via 429 rate limit cascades.

Observed case: a March 19 session retried 28+ times over ~17 days,
causing repeated 429 errors that made Telegram unresponsive.

Add a per-session failure counter (_flush_failures) that gives up
after 3 consecutive attempts and marks the session as flushed to
break the loop.
2026-04-05 11:59:28 -07:00
Saurabh
507b63f86b fix(api-server): pass fallback_model to AIAgent (#4954)
The API server platform never passed fallback_model to AIAgent(),
so the fallback provider chain was always empty for requests through
the OpenAI-compatible endpoint. Load it via GatewayApp._load_fallback_model()
to match the behavior of Telegram/Discord/Slack platforms.
2026-04-05 11:59:28 -07:00
memosr
7f853ba7b6 fix: use logger.exception to preserve traceback in logs and drop unused import 2026-04-05 11:59:28 -07:00
memosr
5ff514ec79 fix(security): remove full traceback from cron error output to prevent info leakage 2026-04-05 11:59:28 -07:00
Teknium
daa4a5acdd feat: add docs links to setup wizard sections (#5283)
Each setup step now shows a link to the relevant docs page:
- Model & Provider → integrations/providers
- Terminal Backend → developer-guide/environments
- Agent Settings → user-guide/configuration
- Messaging Platforms → user-guide/messaging (overview)
- Telegram, Discord, Matrix, Mattermost, WhatsApp → per-platform guides
- Tools → user-guide/features/tools

Existing Slack and Webhook URLs migrated to shared _DOCS_BASE constant.
2026-04-05 11:46:13 -07:00
Teknium
54cb311f40 fix: suppress false 'Unknown toolsets' warning for MCP server names (#5279)
MCP server names (e.g. annas, libgen) are added to enabled_toolsets by
_get_platform_tools() but aren't registered in TOOLSETS until later when
_sync_mcp_toolsets() runs during tool discovery. The validation in
HermesCLI.__init__() fires before that, producing a false warning.

Fix: exclude configured MCP server names from the validation check.
CLI_CONFIG is already available at the call site, so no new imports needed.

Closes #5267 (alternative fix)
2026-04-05 11:44:40 -07:00
Teknium
a0a1b86c2e fix: accept reasoning-only responses without retries — set content to "(empty)" (#5278)
* feat: coerce tool call arguments to match JSON Schema types

LLMs frequently return numbers as strings ("42" instead of 42) and
booleans as strings ("true" instead of true). This causes silent
failures with MCP tools and any tool with strictly-typed parameters.

Added coerce_tool_args() in model_tools.py that runs before every tool
dispatch. For each argument, it checks the tool registry schema and
attempts safe coercion:
  - "42" → 42 when schema says "type": "integer"
  - "3.14" → 3.14 when schema says "type": "number"
  - "true"/"false" → True/False when schema says "type": "boolean"
  - Union types tried in order
  - Original values preserved when coercion fails or is not applicable

Inspired by Block/goose tool argument coercion system.

* fix: accept reasoning-only responses without retries — set content to "(empty)"

Previously, when a model returned reasoning/thinking but no visible
content, we entered a 120-line retry/classify/compress/salvage cascade
that wasted 3+ API calls trying to "fix" the response. The model was
done thinking — retrying with the same input just burned money.

Now reasoning-only responses are accepted immediately:
- Reasoning stays in the `reasoning` field (semantically correct)
- Content set to "(empty)" — valid non-empty string every provider accepts
- No retries, no compression triggers, no salvage logic
- Session history contains "(empty)" not "" — prevents #2128 session
  poisoning where empty assistant content caused prefill rejections

Removes ~120 lines, adds ~15. Saves 2-3 API calls per reasoning-only
response. Fixes #2128.
2026-04-05 11:30:52 -07:00
nepenth
534511bebb feat(matrix): Tier 1 enhancement — reactions, read receipts, rich formatting, room management
Cherry-picked from PR #4338 by nepenth, resolved against current main.

Adds:
- Processing lifecycle reactions (eyes/checkmark/cross) via MATRIX_REACTIONS env
- Reaction send/receive with ReactionEvent + UnknownEvent fallback for older nio
- Fire-and-forget read receipts on text and media messages
- Message redaction, room history fetch, room creation, user invite
- Presence status control (online/offline/unavailable)
- Emote (/me) and notice message types with HTML rendering
- XSS-hardened markdown-to-HTML converter (strips raw HTML preprocessor,
  sanitizes link URLs against javascript:/data:/vbscript: schemes)
- Comprehensive regex fallback with full block/inline markdown support
- Markdown>=3.6 added to [matrix] extras in pyproject.toml
- 46 new tests covering all features and security hardening
2026-04-05 11:19:54 -07:00
Teknium
20b4060dbf fix: web_extract fast-fail on scrape timeout + summarizer resilience
- Firecrawl scrape: 60s timeout via asyncio.wait_for + to_thread
  (previously could hang indefinitely)
- Summarizer retries: 6 → 2 (one retry), reads timeout from
  auxiliary.web_extract.timeout config (default 360s / 6min)
- Summarizer failure: falls back to truncated raw content (~5000 chars)
  instead of useless error message, with guidance about config/model
- Config default: auxiliary.web_extract.timeout bumped 30 → 360s
  for local model compatibility

Addresses Discord reports of agent hanging during web_extract.
2026-04-05 11:16:45 -07:00
Teknium
c100ad874c fix(matrix): E2EE cron delivery via live adapter + HTML formatting + origin fallback
Salvaged from PRs #3767 (chalkers), #5236 (ygd58), #2641 (buntingszn).

Three improvements to Matrix cron delivery:

1. Live adapter path: when the gateway is running, cron delivery now uses
   the connected MatrixAdapter via run_coroutine_threadsafe instead of
   the standalone HTTP PUT. This enables delivery to E2EE rooms where
   the raw HTTP path cannot encrypt. Falls back to standalone on failure.
   Threads adapters + event loop from gateway -> cron ticker -> tick() ->
   _deliver_result(). (from #3767)

2. HTML formatted_body: _send_matrix() now converts markdown to HTML
   using the optional markdown library, with h1-h6 to bold conversion
   for Element X compatibility. Falls back to plain text if markdown
   is not installed. Also adds random bytes to txn_id to prevent
   collisions. (from #5236)

3. Origin fallback: when deliver="origin" but origin is null (jobs
   created via API/scripts), falls back to HOME_CHANNEL env vars
   in order: matrix -> telegram -> discord -> slack. (from #2641)
2026-04-05 11:07:47 -07:00
dlkakbs
36e046e843 fix(gateway): MIME type fallback for Matrix document uploads
Cherry-picked run.py portion from PR #3495 by dlkakbs.
When Matrix sends non-image files (text, YAML, JSON, etc.), the MIME
type may be empty or application/octet-stream. Falls back to
extension-based detection so text files are properly injected into
agent context.
2026-04-05 11:07:47 -07:00
chalkers
bec02f3731 fix(matrix): handle encrypted media events and cache decrypted attachments
Cherry-picked from PR #3140 by chalkers, resolved against current main.
Registers RoomEncryptedImage/Audio/Video/File callbacks, decrypts
attachments via nio.crypto, caches all media types (images, audio,
documents), prevents ciphertext URL fallback for encrypted media.
Unifies the separate voice-message download into the main cache block.
Preserves main's MATRIX_REQUIRE_MENTION, auto-thread, and mention
stripping features. Includes 355 lines of encrypted media tests.
2026-04-05 11:07:47 -07:00
binhnt92
b65e67545a fix(gateway): stop Matrix/Mattermost reconnect on permanent auth failures
Cherry-picked from PR #3695 by binhnt92.
Matrix _sync_loop() and Mattermost _ws_loop() were retrying all errors
forever, including permanent auth failures (expired tokens, revoked
access). Now detects M_UNKNOWN_TOKEN, M_FORBIDDEN, 401/403 and stops
instead of spinning. Includes 216 lines of tests.
2026-04-05 11:07:47 -07:00
pjay-io
9d7c288d86 fix(matrix): add filesize to nio.upload() for Synapse compatibility
Cherry-picked from PR #4343 by pjay-io.
Synapse rejects chunked uploads without Content-Length. Adding
filesize=len(data) ensures the upload includes proper sizing.
2026-04-05 11:07:47 -07:00
thakoreh
914f7461dc fix: add missing shutil import for Matrix E2EE setup
Cherry-picked from PR #5136 by thakoreh.
setup_gateway() uses shutil.which('uv') at line 2126 but shutil was
never imported at module level, causing NameError during Matrix E2EE
auto-install. Adds top-level import and regression test.
2026-04-05 11:07:47 -07:00
LucidPaths
70f798043b fix: Ollama Cloud auth, /model switch persistence, and alias tab completion
- Add OLLAMA_API_KEY to credential resolution chain for ollama.com endpoints
- Update requested_provider/_explicit_api_key/_explicit_base_url after /model
  switch so _ensure_runtime_credentials() doesn't revert the switch
- Pass base_url/api_key from fallback config to resolve_provider_client()
- Add DirectAlias system: user-configurable model_aliases in config.yaml
  checked before catalog resolution, with reverse lookup by model ID
- Add /model tab completion showing aliases with provider metadata

Co-authored-by: LucidPaths <LucidPaths@users.noreply.github.com>
2026-04-05 11:06:06 -07:00
Teknium
35d280d0bd feat: coerce tool call arguments to match JSON Schema types (#5265)
LLMs frequently return numbers as strings ("42" instead of 42) and
booleans as strings ("true" instead of true). This causes silent
failures with MCP tools and any tool with strictly-typed parameters.

Added coerce_tool_args() in model_tools.py that runs before every tool
dispatch. For each argument, it checks the tool registry schema and
attempts safe coercion:
  - "42" → 42 when schema says "type": "integer"
  - "3.14" → 3.14 when schema says "type": "number"
  - "true"/"false" → True/False when schema says "type": "boolean"
  - Union types tried in order
  - Original values preserved when coercion fails or is not applicable

Inspired by Block/goose tool argument coercion system.
2026-04-05 10:57:34 -07:00
Teknium
e899d6a05d fix: increase default HERMES_AGENT_TIMEOUT from 10min to 30min
Users hitting the 10-minute default during complex tool chains.
Bumps both the execution cap and stale-lock eviction timeout.
Still overridable via HERMES_AGENT_TIMEOUT env var (0 = unlimited).
2026-04-05 10:32:59 -07:00
Teknium
51ed7dc2f3 feat: save oversized tool results to file instead of destructive truncation (#5210)
Previously, tool results exceeding 100K characters were silently chopped
with only a '[Truncated]' notice — the rest of the content was lost
permanently. The model had no way to access the truncated portion.

Now, oversized results are written to HERMES_HOME/cache/tool_responses/
and the model receives:
  - A 1,500-char head preview for immediate context
  - The file path so it can use read_file/search_files on the full output

This preserves the context window protection (inline content stays small)
while making the full data recoverable. Falls back to the old destructive
truncation if the file write fails.

Inspired by Block/goose's large response handler pattern.
2026-04-05 10:29:57 -07:00
Teknium
d932980c1a Add gitnexus-explorer optional skill (#5208)
Index codebases with GitNexus and serve an interactive knowledge
graph web UI via Cloudflare tunnel. No sudo required.

Includes:
- Full setup/build/serve/tunnel pipeline
- Zero-dependency Node.js reverse proxy script
- Pitfalls section covering cloudflared config conflicts,
  Vite allowedHosts, Claude Code artifact cleanup, and
  browser memory limits for large repos
2026-04-05 03:00:19 -07:00
Teknium
4976a8b066 feat: /model command — models.dev primary database + --provider flag (#5181)
Full overhaul of the model/provider system.

## What changed
- models.dev (109 providers, 4000+ models) as primary database for provider identity AND model metadata
- --provider flag replaces colon syntax for explicit provider switching
- Full ModelInfo/ProviderInfo dataclasses with context, cost, capabilities, modalities
- HermesOverlay system merges models.dev + Hermes-specific transport/auth/aggregator flags
- User-defined endpoints via config.yaml providers: section
- /model (no args) lists authenticated providers with curated model catalog
- Rich metadata display: context window, max output, cost/M tokens, capabilities
- Config migration: custom_providers list → providers dict (v11→v12)
- AIAgent.switch_model() for in-place model swap preserving conversation

## Files
agent/models_dev.py, hermes_cli/providers.py, hermes_cli/model_switch.py,
hermes_cli/model_normalize.py, cli.py, gateway/run.py, run_agent.py,
hermes_cli/config.py, hermes_cli/commands.py
2026-04-05 01:04:44 -07:00
Teknium
cb63b5f381 feat(skills): add popular-web-designs skill with 54 website design systems (#5194)
Curated collection of production-quality design system specifications extracted
from real websites (sourced from VoltAgent/awesome-design-md). Each template
captures a site's complete visual language: colors, typography, components,
layout, shadows, responsive behavior, and agent-ready CSS values.

Hermes-specific adaptations in every template:
- Google Fonts CDN link tags for proprietary font substitutes
- CSS font-family stacks with proper fallbacks
- Integration notes for write_file + generative-widgets workflow
- browser_vision verification reminders

SKILL.md includes categorized catalog, font substitution reference table,
HTML generation pattern, and design-to-use-case matching guide.

Sites: Airbnb, Airtable, Apple, BMW, Cal.com, Claude, Clay, ClickHouse,
Cohere, Coinbase, Composio, Cursor, ElevenLabs, Expo, Figma, Framer,
HashiCorp, IBM, Intercom, Kraken, Linear, Lovable, Minimax, Mintlify,
Miro, Mistral AI, MongoDB, Notion, NVIDIA, Ollama, OpenCode, Pinterest,
PostHog, Raycast, Replicate, Resend, Revolut, RunwayML, Sanity, Sentry,
SpaceX, Spotify, Stripe, Supabase, Superhuman, Together AI, Uber, Vercel,
VoltAgent, Warp, Webflow, Wise, xAI, Zapier
2026-04-05 00:42:55 -07:00
Teknium
0c54da8aaf feat(gateway): live-stream /update output + interactive prompt buttons (#5180)
* feat(gateway): live-stream /update output + forward interactive prompts

Adds real-time output streaming and interactive prompt forwarding for
the gateway /update command, so users on Telegram/Discord/etc see the
full update progress and can respond to prompts (stash restore, config
migration) without needing terminal access.

Changes:

hermes_cli/main.py:
- Add --gateway flag to 'hermes update' argparse
- Add _gateway_prompt() file-based IPC function that writes
  .update_prompt.json and polls for .update_response
- Modify _restore_stashed_changes() to accept optional input_fn
  parameter for gateway mode prompt forwarding
- cmd_update() uses _gateway_prompt when --gateway is set, enabling
  interactive stash restore and config migration prompts

gateway/run.py:
- _handle_update_command: spawn with --gateway flag and
  PYTHONUNBUFFERED=1 for real-time output flushing
- Store session_key in .update_pending.json for cross-restart
  session matching
- Add _update_prompt_pending dict to track sessions awaiting
  update prompt responses
- Replace _watch_for_update_completion with _watch_update_progress:
  streams output chunks every ~4s, detects .update_prompt.json and
  forwards prompts to the user, handles completion/failure/timeout
- Add update prompt interception in _handle_message: when a prompt
  is pending, the user's next message is written to .update_response
  instead of being processed normally
- Preserve _send_update_notification as legacy fallback for
  post-restart cases where adapter isn't available yet

File-based IPC protocol:
- .update_prompt.json: written by update process with prompt text,
  default value, and unique ID
- .update_response: written by gateway with user's answer
- .update_output.txt: existing, now streamed in real-time
- .update_exit_code: existing completion marker

Tests: 16 new tests covering _gateway_prompt IPC, output streaming,
prompt detection/forwarding, message interception, and cleanup.

* feat: interactive buttons for update prompts (Telegram + Discord)

Telegram: Inline keyboard with ✓ Yes / ✗ No buttons. Clicking a button
answers the callback query, edits the message to show the choice, and
writes .update_response directly. CallbackQueryHandler registered on
the update_prompt: prefix.

Discord: UpdatePromptView (discord.ui.View) with green Yes / red No
buttons. Follows the ExecApprovalView pattern — auth check, embed color
update, disabled-after-click. Writes .update_response on click.

All platforms: /approve and /deny (and /yes, /no) now work as shorthand
for yes/no when an update prompt is pending. The text fallback message
instructs users to use these commands. Raw message interception still
works as a fallback for non-command responses.

Gateway watcher checks adapter for send_update_prompt method (class-level
check to avoid MagicMock false positives) and falls back to text prompt
with /approve instructions when unavailable.

* fix: block /update on non-messaging platforms (API, webhooks, ACP)

Add _UPDATE_ALLOWED_PLATFORMS frozenset that explicitly lists messaging
platforms where /update is permitted. API server, webhook, and ACP
platforms get a clear error directing them to run hermes update from
the terminal instead.

ACP and API server already don't reach _handle_message (separate
codepaths), and webhooks have distinct session keys that can't collide
with messaging sessions. This guard is belt-and-suspenders.
2026-04-05 00:28:58 -07:00
Teknium
441ec48802 style: use module-level re import instead of local import re as _re 2026-04-05 00:20:53 -07:00
kshitijk4poor
4437354198 Preserve numeric credential labels in auth removal
Resolve exact label matches before treating digit-only input as a positional index so destructive auth removal does not mis-target credentials named with numeric labels.

Constraint: The CLI remove path must keep supporting existing index-based usage while adding safer label targeting
Rejected: Ban numeric labels | labels are free-form and existing users may already rely on them
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: When a destructive command accepts multiple identifier forms, prefer exact identity matches before fallback parsing heuristics
Tested: Focused pytest slice for auth commands, credential pool recovery, and routing (273 passed); py_compile on changed Python files
Not-tested: Full repository pytest suite
2026-04-05 00:20:53 -07:00
kshitijk4poor
65952ac00c Honor provider reset windows in pooled credential failover
Persist structured exhaustion metadata from provider errors, use explicit reset timestamps when available, and expose label-based credential targeting in the auth CLI. This keeps long-lived Codex cooldowns from being misreported as one-hour waits and avoids forcing operators to manage entries by list position alone.

Constraint: Existing credential pool JSON needs to remain backward compatible with stored entries that only record status code and timestamp
Constraint: Runtime recovery must keep the existing retry-then-rotate semantics for 429s while enriching pool state with provider metadata
Rejected: Add a separate credential scheduler subsystem | too large for the Hermes pool architecture and unnecessary for this fix
Rejected: Only change CLI formatting | would leave runtime rotation blind to resets_at and preserve the serial-failure behavior
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: Preserve structured rate-limit metadata when new providers expose reset hints; do not collapse back to status-code-only exhaustion tracking
Tested: Focused pytest slice for auth commands, credential pool recovery, and routing (272 passed); py_compile on changed Python files; hermes -w auth list/remove smoke test with temporary HERMES_HOME
Not-tested: Full repository pytest suite, broader gateway/integration flows outside the touched auth and pool paths
2026-04-05 00:20:53 -07:00
Lume
ed4a605696 docs: update docstring to mention Fireworks strict validation
Updates _sanitize_tool_calls_for_strict_api docstring to explicitly
mention Fireworks alongside Mistral as strict APIs requiring sanitization.
Also documents the specific fields that are stripped (call_id, response_item_id).
2026-04-05 00:13:25 -07:00
Lume
8545343cba test: add strict API validation tests for Fireworks compatibility
Adds comprehensive tests verifying:
- Fireworks-compatible messages after sanitization
- Codex mode preserves fields for Responses API replay
- Fireworks provider triggers sanitization correctly
- Codex responses mode correctly skips sanitization

Prevents regression of 400 validation errors on strict APIs.
2026-04-05 00:13:25 -07:00
Lume
9be2b18064 test: add test for _should_sanitize_tool_calls()
Adds test verifying that:
- Codex mode returns False (no sanitization needed)
- Chat completions mode returns True (sanitization needed)
- Anthropic mode returns True (sanitization needed)

This ensures strict APIs like Fireworks receive properly sanitized tool_calls.
2026-04-05 00:13:25 -07:00
Lume
d90035835b refactor: use _should_sanitize_tool_calls in run_conversation()
Replaces hardcoded Mistral check with the new _should_sanitize_tool_calls()
method. Updates comment to mention Fireworks alongside Mistral as strict
APIs requiring tool_call field sanitization.
2026-04-05 00:13:25 -07:00
Lume
234c01f690 refactor: use _should_sanitize_tool_calls in _handle_max_iterations()
Replaces hardcoded Mistral check with the new _should_sanitize_tool_calls()
method. Ensures summary generation works correctly with Fireworks and
other strict APIs that reject unknown tool_call fields.
2026-04-05 00:13:25 -07:00
Lume
7f6e509199 refactor: use _should_sanitize_tool_calls in flush_memories()
Replaces hardcoded Mistral check with the new _should_sanitize_tool_calls()
method. This ensures tool_calls are sanitized for all strict APIs, not
just Mistral. Prevents 400 errors from Fireworks and other providers.
2026-04-05 00:13:25 -07:00
Lume
560c6ae143 feat: add _should_sanitize_tool_calls() method
Adds a centralized method to determine when tool_calls need sanitization
for strict APIs. Returns True for all APIs except codex_responses mode.
This prevents 400 errors from providers like Fireworks that reject unknown
fields (call_id, response_item_id) in tool_calls.
2026-04-05 00:13:25 -07:00
Teknium
5b003ca4a0 test(redact): add regression tests for lowercase variable redaction (#4367) (#5185)
Add 5 regression tests from PR #4476 (gnanam1990) to prevent re-introducing
the IGNORECASE bug that caused lowercase Python/TypeScript variable assignments
to be incorrectly redacted as secrets. The core fix landed in 6367e1c4.

Tests cover:
- Lowercase Python variable with 'token' in name
- Lowercase Python variable with 'api_key' in name
- TypeScript 'await' not treated as secret value
- TypeScript 'secret' variable assignment
- 'export' prefix preserved for uppercase env vars

Co-authored-by: gnanam1990 <gnanam1990@users.noreply.github.com>
2026-04-05 00:10:16 -07:00
Teknium
0fd3de2674 docs(skill): claude-code v2.2 — add cheat sheet commands, env vars, rules, advanced features (#5158)
Expands the claude-code skill with content from official docs and community
cheat sheets that was missing from v2.0:

Slash commands: /cost, /btw, /plan, /loop, /batch, /security-review,
  /resume, /effort (with auto level), /mcp, /release-notes, /voice details
Keyboard shortcuts: Alt+P (model), Alt+T (thinking), Alt+O (fast mode),
  Ctrl+V (paste image), Ctrl+O (transcript), Ctrl+G (external editor)
Ultrathink keyword for max reasoning on a specific turn
Rules directory: .claude/rules/*.md and ~/.claude/rules/*.md
Auto-memory: ~/.claude/projects/<proj>/memory/ (25KB/200 lines limit)
Environment variables: CLAUDE_CODE_EFFORT_LEVEL, MAX_THINKING_TOKENS,
  CLAUDE_CODE_NO_FLICKER, CLAUDE_CODE_SUBPROCESS_ENV_SCRUB
MCP limits: 2KB tool desc cap, maxResultSizeChars 500K, transport types
Reorganized slash commands into Session/Development/Configuration groups
Reorganized keyboard shortcuts into Controls/Toggles/Multiline groups
2026-04-04 19:15:57 -07:00
Teknium
85cefc7a5a fix(telegram): prevent duplicate message delivery on send timeout (#5153)
TimedOut is a subclass of NetworkError in python-telegram-bot. The
inner retry loop in send() and the outer _send_with_retry() in base.py
both treated it as a transient connection error and retried — but
send_message is not idempotent. When the request reaches Telegram but
the HTTP response times out, the message is already delivered. Retrying
sends duplicates. Worst case: up to 9 copies (inner 3x × outer 3x).

Inner loop (telegram.py):
- Import TimedOut separately, isinstance-check before generic
  NetworkError retry (same pattern as BadRequest carve-out from #3390)
- Re-raise immediately — no retry
- Mark as retryable=False in outer exception handler

Outer loop (base.py):
- Remove 'timeout', 'timed out', 'readtimeout', 'writetimeout' from
  _RETRYABLE_ERROR_PATTERNS (read/write timeouts are delivery-ambiguous)
- Add 'connecttimeout' (safe — connection never established)
- Keep 'network' (other platforms still need it)
- Add _is_timeout_error() + early return to prevent plain-text fallback
  on timeout errors (would also cause duplicate delivery)

Connection errors (ConnectionReset, ConnectError, etc.) are still
retried — these fail before the request reaches the server.

Credit: tmdgusya (PR #3899), barun1997 (PR #3904) for identifying the
bug and proposing fixes.

Closes #3899, closes #3904.
2026-04-04 19:05:34 -07:00
Teknium
c8220e69a1 fix: strip MEDIA: directives from streamed gateway messages (#5152)
When streaming is enabled, the GatewayStreamConsumer sends raw text
chunks directly to the platform without post-processing. This causes
MEDIA:/path/to/file tags and [[audio_as_voice]] directives to appear
as visible text in the user's chat instead of being stripped.

The non-streaming path already handles this correctly via
extract_media() in base.py, but the streaming path was missing
equivalent cleanup.

Add _clean_for_display() to GatewayStreamConsumer that strips MEDIA:
tags and internal markers before any text reaches the platform. The
actual media file delivery is unaffected — _deliver_media_from_response()
in gateway/run.py still extracts files from the agent's final_response
(separate from the stream consumer's display text).

Reported by Ao [FotM] on Discord.
2026-04-04 19:05:27 -07:00
Teknium
ff544526cd docs(skill): comprehensive claude-code skill rewrite v2.0 (#5155)
Major rewrite of the claude-code orchestration skill from 94 to 460 lines.
Based on official docs research, community guides, and live experimentation.

Key additions:
- Two orchestration modes: Print mode (-p) vs Interactive PTY via tmux
- Detailed PTY dialog handling (trust + permissions bypass patterns)
- Print mode deep dive: JSON output, piped input, session resumption,
  --json-schema, --bare mode for CI
- Complete flag reference (20+ flags organized by category)
- Interactive session patterns with tmux send-keys/capture-pane
- Claude's slash commands and keyboard shortcuts reference
- CLAUDE.md, hooks, custom subagents, MCP, custom commands docs
- Cost/performance tips (effort levels, budget caps, context mgmt)
- 10 specific pitfalls discovered through live testing
- 10 rules for Hermes agents orchestrating Claude Code
2026-04-04 19:00:50 -07:00
memosr
931624feda fix(security): guard cron script against path traversal and redact output
Relative script paths resolved against HERMES_HOME/scripts/ were not
validated to stay within that directory. Paths like '../../etc/passwd'
could escape and be executed as Python.

Fix: resolve the path and verify it stays within scripts_dir using
Path.relative_to(). Also apply redact_sensitive_text() to script stdout
before LLM injection — same pattern as execute_code sandbox output.

Cherry-picked from PR #5093 by memosr (fixes 1 and 3; absolute path
restriction dropped as too restrictive for the feature's design intent).
2026-04-04 17:01:11 -07:00
Teknium
aa475aef31 feat: add exit code context for common CLI tools in terminal results (#5144)
When commands like grep, diff, test, or find return non-zero exit codes
that aren't actual errors (grep 1 = no matches, diff 1 = files differ),
the model wastes turns investigating non-problems. This adds an
exit_code_meaning field to the terminal JSON result that explains
informational exit codes, so the agent can move on instead of debugging.

Covers grep/rg/ag/ack (no matches), diff (files differ), find (partial
access), test/[ (condition false), curl (timeouts, DNS, HTTP errors),
and git (context-dependent). Correctly extracts the last command from
pipelines and chains, strips full paths and env var assignments.

The exit_code field itself is unchanged — this is purely additive context.
2026-04-04 16:57:24 -07:00
Teknium
5879b3ef82 fix: move pre_llm_call plugin context to user message, preserve prompt cache (#5146)
Plugin context from pre_llm_call hooks was injected into the system
prompt, breaking the prompt cache prefix every turn when content
changed (typical for memory plugins). Now all plugin context goes
into the current turn's user message — the system prompt stays
identical across turns, preserving cached tokens.

The system prompt is reserved for Hermes internals. Plugins
contribute context alongside the user's input.

Also adds comprehensive documentation for all 6 plugin hooks:
pre_tool_call, post_tool_call, pre_llm_call, post_llm_call,
on_session_start, on_session_end — each with full callback
signatures, parameter tables, firing conditions, and examples.

Supersedes #5138 which identified the same cache-busting bug
and proposed an uncached system suffix approach. This fix goes
further by removing system prompt injection entirely.

Co-identified-by: OutThisLife (PR #5138)
2026-04-04 16:55:44 -07:00
Teknium
96e96a79ad fix: --yolo and other flags silently dropped when placed before 'chat' subcommand (#5145)
When --yolo, -w, -s, -r, -c, and --pass-session-id exist on both the parent
parser and the 'chat' subparser with explicit defaults (default=False or
default=None), argparse's subparser initialization overwrites the parent's
parsed value. So 'hermes --yolo chat' silently drops --yolo, making it appear
broken.

Fix: use default=argparse.SUPPRESS on all duplicated arguments in the chat
subparser. SUPPRESS means 'don't set this attribute if the user didn't
explicitly provide it', so the parent parser's value survives through.

Affected flags: --yolo, --worktree/-w, --skills/-s, --pass-session-id,
--resume/-r, --continue/-c.

Adds 15 regression tests covering flag-before-subcommand, flag-after-subcommand,
no-subcommand, and env var propagation scenarios.
2026-04-04 16:55:13 -07:00
Teknium
55bbf8caba fix: include approval metadata in terminal tool results (#5141)
When a dangerous command is approved (gateway, CLI, or smart approval),
the terminal tool now includes an 'approval' field in the result JSON
so the model knows approval was requested and granted. Previously the
model only saw normal command output with no indication that approval
happened, causing it to hallucinate that the approval system didn't fire.

Changes:
- approval.py: Return user_approved/description in all 3 approval paths
  (gateway blocking, CLI interactive, smart approval)
- terminal_tool.py: Capture approval metadata and inject into both
  foreground and background command results
2026-04-04 16:33:20 -07:00
Fran Fitzpatrick
2556cfdab1 fix(gateway): match Discord mention-stripping behavior in Matrix adapter
Move mention stripping outside the `if not is_dm` guard so mentions
are stripped in DMs too. Remove the bare-mention early return so a
message containing only a mention passes through as empty string,
matching Discord's behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 13:09:27 -07:00
Fran Fitzpatrick
d86be33161 feat(gateway): add MATRIX_REQUIRE_MENTION and MATRIX_AUTO_THREAD support
Bring Matrix feature parity with Discord by adding mention gating and
auto-threading. Both default to true, matching Discord behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 13:09:27 -07:00
Teknium
569e9f9670 feat: execute_code runs on remote terminal backends (#5088)
* feat: execute_code runs on remote terminal backends (Docker/SSH/Modal/Daytona/Singularity)

When TERMINAL_ENV is not 'local', execute_code now ships the script to
the remote environment and runs it there via the terminal backend --
the same container/sandbox/SSH session used by terminal() and file tools.

Architecture:
- Local backend: unchanged (UDS RPC, subprocess.Popen)
- Remote backends: file-based RPC via execute_oneshot() polling
  - Script writes request files, parent polls and dispatches tool calls
  - Responses written atomically (tmp + rename) via base64/stdin
  - execute_oneshot() bypasses persistent shell lock for concurrency

Changes:
- tools/environments/base.py: add execute_oneshot() (delegates to execute())
- tools/environments/persistent_shell.py: override execute_oneshot() to
  bypass _shell_lock via _execute_oneshot(), enabling concurrent polling
- tools/code_execution_tool.py: add file-based transport to
  generate_hermes_tools_module(), _execute_remote() with full env
  get-or-create, file shipping, RPC poll loop, output post-processing

* fix: use _get_env_config() instead of raw TERMINAL_ENV env var

Read terminal backend type through the canonical config resolution
path (terminal_tool._get_env_config) instead of os.getenv directly.

* fix: use echo piping instead of stdin_data for base64 writes

Modal doesn't reliably deliver stdin_data to chained commands
(base64 -d > file && mv), producing 0-byte files. Switch to
echo 'base64' | base64 -d which works on all backends.

Verified E2E on both Docker and Modal.
2026-04-04 12:57:49 -07:00
Chris Bartholomew
28e1e210ee fix(hindsight): overhaul hindsight memory plugin and memory setup wizard
- Dedicated asyncio event loop for Hindsight async calls (fixes aiohttp session leaks)
- Client caching (reuse instead of creating per-call)
- Local mode daemon management with config change detection and auto-restart
- Memory mode support (hybrid/context/tools) and prefetch method (recall/reflect)
- Proper shutdown with event loop and client cleanup
- Disable HindsightEmbedded.__del__ to avoid GC loop errors
- Update API URLs (app -> ui.hindsight.vectorize.io, api_url -> base_url)
- Setup wizard: conditional fields (when clause), dynamic defaults (default_from)
- Switch dependency install from pip to uv (correct for uv-based venvs)
- Add hindsight-all to plugin.yaml and import mapping
- 12 new tests for dispatch routing and setup field filtering

Original PR #5044 by cdbartholomew.
2026-04-04 12:18:46 -07:00
Teknium
93aa01c71c fix: use main provider model for auxiliary tasks on non-aggregator providers (#5091)
Users on direct API-key providers (Alibaba, DeepSeek, ZAI, etc.) without
an OpenRouter or Nous key would get broken auxiliary tasks (compression,
vision, etc.) because _resolve_auto() only tried aggregator providers
first, then fell back to iterating PROVIDER_REGISTRY with wrong default
model names.

Now _resolve_auto() checks the user's main provider first. If it's not
an aggregator (OpenRouter/Nous), it uses their main model directly for
all auxiliary tasks. Aggregator users still get the cheap gemini-flash
model as before.

Adds _read_main_provider() to read model.provider from config.yaml,
mirroring the existing _read_main_model().

Reported by SkyLinx — Alibaba Coding Plan user getting 400 errors from
google/gemini-3-flash-preview being sent to DashScope.
2026-04-04 12:07:43 -07:00
Teknium
5d0f55cac4 feat(cron): add script field for pre-run data collection (#5082)
Add an optional 'script' parameter to cron jobs that references a Python
script. The script runs before each agent turn, and its stdout is injected
into the prompt as context. This enables stateful monitoring — the script
handles data collection and change detection, the LLM analyzes and reports.

- cron/jobs.py: add script field to create_job(), stored in job dict
- cron/scheduler.py: add _run_job_script() executor with timeout handling,
  inject script output/errors into _build_job_prompt()
- tools/cronjob_tools.py: add script to tool schema, create/update handlers,
  _format_job display
- hermes_cli/cron.py: add --script to create/edit, display in list/edit output
- hermes_cli/main.py: add --script argparse for cron create/edit subcommands
- tests/cron/test_cron_script.py: 20 tests covering job CRUD, script
  execution, path resolution, error handling, prompt injection, tool API

Script paths can be absolute or relative (resolved against ~/.hermes/scripts/).
Scripts run with a 120s timeout. Failures are injected as error context so
the LLM can report the problem. Empty string clears an attached script.
2026-04-04 10:43:39 -07:00
catbusconductor
e09e48567e fix(openviking): correct API endpoint paths and response parsing
- Browse: POST /api/v1/browse → GET /api/v1/fs/{ls,tree,stat}
- Read: POST /api/v1/read[/abstract] → GET /api/v1/content/{read,abstract,overview}
- System prompt: result.get('children') → len(result) (API returns list)
- Content: result.get('content') → result is a plain string
- Browse: result['entries'] → result is the list; is_dir → isDir (camelCase)
- Browse: add rel_path and abstract fields to entry output

Based on PR #4742 by catbusconductor. Auth header changes dropped
(already on main via #4825).
2026-04-04 10:40:38 -07:00
Teknium
2aa3f199cb fix(doctor): sync provider checks, add config migration, WAL and mem0 diagnostics (#5077)
Provider coverage:
- Add 6 missing providers to _PROVIDER_ENV_HINTS (Nous, DeepSeek,
  DashScope, HF, OpenCode Zen/Go)
- Add 5 missing providers to API connectivity checks (DeepSeek,
  Hugging Face, Alibaba/DashScope, OpenCode Zen, OpenCode Go)

New diagnostics:
- Config version check — detects outdated config, --fix runs
  non-interactive migration automatically
- Stale root-level config keys — detects provider/base_url at root
  level (known bug source, PR #4329), --fix migrates them into
  the model section
- WAL file size check — warns on >50MB WAL files (indicates missed
  checkpoints from the duplicate close() bug), --fix runs PASSIVE
  checkpoint
- Mem0 memory plugin status — checks API key resolution including
  the env+json merge we just fixed
2026-04-04 10:21:33 -07:00
LucidPaths
6367e1c4c0 fix: remove stale test skips, fix regex backtracking, file search bug, and test flakiness
Bug fixes:
- agent/redact.py: catastrophic regex backtracking in _ENV_ASSIGN_RE — removed
  re.IGNORECASE and changed [A-Z_]* to [A-Z0-9_]* to restrict matching to actual
  env var name chars. Without this, the pattern backtracks exponentially on large
  strings (e.g. 100K tool output), causing test_file_read_guards to time out.
- tools/file_operations.py: over-escaped newline in find -printf format string
  produced literal backslash-n instead of a real newline, breaking file search
  result parsing (total_count always 1, paths concatenated).

Test fixes:
- Remove stale pytestmark.skip from 4 test modules that were blanket-skipped as
  'Hangs in non-interactive environments' but actually run fine:
  - test_413_compression.py (12 tests, 25s)
  - test_file_tools_live.py (71 tests, 24s)
  - test_code_execution.py (61 tests, 99s)
  - test_agent_loop_tool_calling.py (has proper OPENROUTER_API_KEY skip already)
- test_413_compression.py: fix threshold values in 2 preflight compression tests
  where context_length was too small for the compressed output to fit in one pass.
- test_mcp_probe.py: add missing _MCP_AVAILABLE mock so tests work without MCP SDK.
- test_mcp_tool_issue_948.py: inject MCP symbols (StdioServerParameters etc.) when
  SDK is not installed so patch() targets exist.
- test_approve_deny_commands.py: replace time.sleep(0.3) with deterministic polling
  of _gateway_queues — fixes race condition where resolve fires before threads
  register their approval entries, causing the test to hang indefinitely.

Net effect: +256 tests recovered from skip, 8 real failures fixed.
2026-04-04 10:18:57 -07:00
Teknium
77a2aad771 docs: fix stale references across 8 doc pages
Audit found 24+ discrepancies between docs and code. Fixed:

HIGH severity:
- Remove honcho toolset from tools-reference, toolsets-reference, and tools.md
  (converted to memory provider plugin, not a built-in toolset)
- Add note that Honcho is available via plugin

MEDIUM severity:
- Add hermes memory command family to cli-commands.md (setup/status/off)
- Add --clone-all, --clone-from to profile create in cli-commands.md
- Add --max-turns option to hermes chat in cli-commands.md
- Add /btw slash command to slash-commands.md
- Fix profile show example output (remove nonexistent disk usage,
  add .env and SOUL.md status lines)
- Add missing hermes-webhook toolset to toolsets-reference.md
- Add 5 missing providers to fallback-providers.md table
- Add 7 missing providers to providers.md fallback list
- Fix outdated model examples: glm-4-plus→glm-5, moonshot-v1-auto→kimi-for-coding
2026-04-03 23:30:29 -07:00
Teknium
43d3efd5c8 feat: add docker_env config for explicit container environment variables (#4738)
Add docker_env option to terminal config — a dict of key-value pairs that
get set inside Docker containers via -e flags at both container creation
(docker run) and per-command execution (docker exec) time.

This complements docker_forward_env (which reads values dynamically from
the host process environment). docker_env is useful when Hermes runs as a
systemd service without access to the user's shell environment — e.g.
setting SSH_AUTH_SOCK or GNUPGHOME to known stable paths for SSH/GPG
agent socket forwarding.

Precedence: docker_env provides baseline values; docker_forward_env
overrides for the same key.

Config example:
  terminal:
    docker_env:
      SSH_AUTH_SOCK: /run/user/1000/ssh-agent.sock
      GNUPGHOME: /root/.gnupg
    docker_volumes:
      - /run/user/1000/ssh-agent.sock:/run/user/1000/ssh-agent.sock
      - /run/user/1000/gnupg/S.gpg-agent:/root/.gnupg/S.gpg-agent
2026-04-03 23:30:12 -07:00
Stefan Vandermeulen
78ec8b017f style: add debug log for write-back failure in retry path
Address review feedback: replace bare `except: pass` with a debug
log when the post-retry write-back to ~/.claude/.credentials.json
fails. The write-back is best-effort (token is already resolved),
but logging helps troubleshooting.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 23:26:08 -07:00
Stefan Vandermeulen
a70ee1b898 fix: sync OAuth tokens between credential pool and credentials file
OAuth refresh tokens are single-use. When multiple consumers share the
same Anthropic OAuth session (credential pool entries, Claude Code CLI,
multiple Hermes profiles), whichever refreshes first invalidates the
refresh token for all others. This causes a cascade:

1. Pool entry tries to refresh with a consumed refresh token → 400
2. Pool marks the credential as "exhausted" with a 24-hour cooldown
3. All subsequent heartbeats skip the credential entirely
4. The fallback to resolve_anthropic_token() only works while the
   access token in ~/.claude/.credentials.json hasn't expired
5. Once it expires, nothing can auto-recover without manual re-login

Fix:
- Add _sync_anthropic_entry_from_credentials_file() to detect when
  ~/.claude/.credentials.json has a newer refresh token and sync it
  into the pool entry, clearing exhaustion status
- After a successful pool refresh, write the new tokens back to
  ~/.claude/.credentials.json so other consumers stay in sync
- On refresh failure, check if the credentials file has a different
  (newer) refresh token and retry once before marking exhausted
- In _available_entries(), sync exhausted claude_code entries from
  the credentials file before applying the 24-hour cooldown, so a
  manual re-login or external refresh immediately unblocks agents

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 23:26:08 -07:00
Teknium
b93fa234df fix: clear ghost status-bar lines on terminal resize (#4960)
* feat: add /branch (/fork) command for session branching

Inspired by Claude Code's /branch command. Creates a copy of the current
session's conversation history in a new session, allowing the user to
explore a different approach without losing the original.

Works like 'git checkout -b' for conversations:
- /branch            — auto-generates a title from the parent session
- /branch my-idea    — uses a custom title
- /fork              — alias for /branch

Implementation:
- CLI: _handle_branch_command() in cli.py
- Gateway: _handle_branch_command() in gateway/run.py
- CommandDef with 'fork' alias in commands.py
- Uses existing parent_session_id field in session DB
- Uses get_next_title_in_lineage() for auto-numbered branches
- 14 tests covering session creation, history copy, parent links,
  title generation, edge cases, and agent sync

* fix: clear ghost status-bar lines on terminal resize

When the terminal shrinks (e.g. un-maximize), the emulator reflows
previously full-width rows (status bar, input rules) into multiple
narrower rows. prompt_toolkit's _on_resize only cursor_up()s by the
stored layout height, missing the extra rows from reflow — leaving
ghost duplicates of the status bar visible.

Fix: monkey-patch Application._on_resize to detect width shrinks,
calculate the extra rows created by reflow, and inflate the renderer's
cursor_pos.y so the erase moves up far enough to clear ghosts.
2026-04-03 22:43:45 -07:00
Octopus
f5c212f69b feat: add MiniMax TTS provider support (speech-2.8)
Add MiniMax as a fifth TTS provider alongside Edge TTS, ElevenLabs,
OpenAI, and NeuTTS. Supports speech-2.8-hd (recommended default) and
speech-2.8-turbo models via the MiniMax T2A HTTP API.

Changes:
- Add _generate_minimax_tts() with hex-encoded audio decoding
- Add MiniMax to provider dispatch, requirements check, and Telegram
  Opus compatibility handling
- Add MiniMax to interactive setup wizard with API key prompt
- Update TTS documentation and config example

Configuration:
  tts:
    provider: "minimax"
    minimax:
      model: "speech-2.8-hd"
      voice_id: "English_Graceful_Lady"

Requires MINIMAX_API_KEY environment variable.

API reference: https://platform.minimax.io/docs/api-reference/speech-t2a-http
2026-04-03 22:42:14 -07:00
acsezen
831067c5d3 perf: fix O(n²) catastrophic backtracking in redact regex + reorder file read guard
Two pre-existing issues causing test_file_read_guards timeouts on CI:

1. agent/redact.py: _ENV_ASSIGN_RE used unbounded [A-Z_]* with
   IGNORECASE, matching any letter/underscore to end-of-string at
   each position → O(n²) backtracking on 100K+ char inputs.
   Bounded to {0,50} since env var names are never that long.

2. tools/file_tools.py: redact_sensitive_text() ran BEFORE the
   character-count guard, so oversized content (that would be rejected
   anyway) went through the expensive regex first. Reordered to check
   size limit before redaction.
2026-04-03 22:40:37 -07:00
Teknium
1c0c5d957f fix(gateway): support infinite timeout + periodic notifications + actionable error (#4959)
- HERMES_AGENT_TIMEOUT=0 now means no limit (infinite execution)
- Periodic 'still working' notifications every 10 minutes for long tasks
- Timeout error message now tells users how to increase the limit
- Stale-lock eviction handles infinite timeout correctly (float inf TTL)
2026-04-03 22:37:38 -07:00
Teknium
34308e4de9 docs: improve youtube-content skill structure and workflow
Clearer workflow with validation/chunking steps, expanded description
with trigger terms for better agent matching, tightened error handling.
Fixed stray pipe character in original PR diff.

Based on PR #4778 by fernandezbaptiste.

Co-authored-by: fernandezbaptiste <fernandezbaptiste@users.noreply.github.com>
2026-04-03 22:18:00 -07:00
Teknium
ad4feeaf0d feat: wire skills.external_dirs into all remaining discovery paths
The config key skills.external_dirs and core resolution (get_all_skills_dirs,
get_external_skills_dirs in agent/skill_utils.py) already existed but several
code paths still only scanned SKILLS_DIR. Now external dirs are respected
everywhere:

- skills_categories(): scan all dirs for category discovery
- _get_category_from_path(): resolve categories against any skills root
- skill_manager_tool._find_skill(): search all dirs for edit/patch/delete
- credential_files.get_skills_directory_mount(): mount all dirs into
  Docker/Singularity containers (external dirs at external_skills/<idx>)
- credential_files.iter_skills_files(): list files from all dirs for
  Modal/Daytona upload
- tools/environments/ssh.py: rsync all skill dirs to remote hosts
- gateway _check_unavailable_skill(): check disabled skills across all dirs

Usage in config.yaml:
  skills:
    external_dirs:
      - ~/repos/agent-skills/hermes
      - /shared/team-skills
2026-04-03 21:14:42 -07:00
Teknium
5a98ce5973 fix: use clean user message for all memory provider operations (#4940)
When a skill is active, user_message contains the full SKILL.md content
injected by the skill system. This bloated string was being passed to
memory provider sync_all(), queue_prefetch_all(), and prefetch_all(),
causing providers with query size limits (e.g. Honcho's 10K char limit)
to fail.

Both call sites now use original_user_message (the clean user input,
already defined at line 6516) instead of the skill-inflated user_message:

- Pre-turn prefetch (line ~6695): prefetch_all() query
- Post-turn sync (line ~8672): sync_all() + queue_prefetch_all()

Fixes #4889
2026-04-03 20:43:01 -07:00
Teknium
585a3b40ad fix: use 'is not None and != ""' instead of truthiness for mem0.json merge
The original filter (if v) silently drops False and 0, so
'rerank: false' in mem0.json would be ignored. Use explicit
None/empty-string check to preserve intentional falsy values.
2026-04-03 20:42:48 -07:00
Livia Ellen
5e3303b3d8 fix(mem0): merge env vars with mem0.json instead of either/or
When mem0.json exists but is missing the api_key (e.g. after running
`hermes memory setup`), the plugin reports "not available" even though
MEM0_API_KEY is set in .env.  This happens because _load_config()
returns the JSON file contents verbatim, never falling back to env vars.

Use env vars as the base config and let mem0.json override individual
keys on top, so both config sources work together.

Fixes: mem0 plugin shows "not available" despite valid MEM0_API_KEY in .env
2026-04-03 20:42:48 -07:00
Mibayy
14e87325df fix(openviking): send tenant-scoping headers on every request (#4825)
OpenViking is multi-tenant and requires X-OpenViking-Account and
X-OpenViking-User headers. Without them, API calls like POST
/api/v1/search/find fail on authenticated servers.

Add both headers to _VikingClient._headers(), read from env vars
OPENVIKING_ACCOUNT (default: root) and OPENVIKING_USER (default:
default). All instantiation sites inherit the fix automatically.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 20:32:55 -07:00
Teknium
f1c0847145 fix(gateway): restore short preview truncation for all/new tool progress modes (#4935)
The tool_preview_length: 0 (unlimited) config change from e314833c
removed truncation from gateway progress messages in all/new modes.
This caused full terminal commands, code blocks, and file paths to
appear as permanent messages in Telegram -- the old 40-char truncation
was the correct behavior for messaging platforms.

Now:
- all/new modes: always truncate previews to 40 chars (old behavior)
- verbose mode: respects tool_preview_length config for JSON args cap

Reported by Paulclgro and socialsurfer on Discord.
2026-04-03 20:32:01 -07:00
Teknium
8af6a08695 fix: don't treat bare file paths as slash commands
Input like /Users/ironin/file.md:45-46 was routed to process_command()
because it starts with /. Added _looks_like_slash_command() which checks
whether the first word contains additional / characters — commands never
do (/help, /model), paths always do (/Users/foo/bar.md).

Applied to both process_loop routing and handle_enter interrupt bypass.
Preserves prefix matching (/h → /help) since short prefixes still pass
the check.

Based on PR #4782 by iRonin.

Co-authored-by: iRonin <iRonin@users.noreply.github.com>
2026-04-03 20:16:04 -07:00
Teknium
fb68c22340 fix(gateway): bypass active-session guard for /approve and /deny commands (#4926)
The base adapter's active-session guard queues all messages when an agent
is running. This creates a deadlock for /approve and /deny: the agent
thread is blocked on threading.Event.wait() in tools/approval.py waiting
for resolve_gateway_approval(), but the /approve command is queued waiting
for the agent to finish.

Dispatch /approve and /deny directly to the message handler (which routes
to gateway/run.py's _handle_approve_command) without going through
_process_message_background — avoids spawning a competing background task
that would mess with session lifecycle/guards.

Fixes #4898
Co-authored-by: mechovation (original diagnosis in PR #4904)
2026-04-03 20:08:37 -07:00
memosr
287ac15efd fix(gateway): write update-pending state atomically to prevent corruption 2026-04-03 18:57:38 -07:00
Teknium
cee761ee4a fix: prevent duplicate messages — gateway dedup + partial stream guard (#4878)
* fix(gateway): add message deduplication to Discord and Slack adapters (#4777)

Discord RESUME replays events after reconnects (~7/day observed),
and Slack Socket Mode can redeliver events if the ack was lost.
Neither adapter tracked which messages were already processed,
causing duplicate bot responses.

Add _seen_messages dedup cache (message ID → timestamp) with 5-min
TTL and 2000-entry cap to both adapters, matching the pattern already
used by Mattermost, Matrix, WeCom, Feishu, DingTalk, and Email.

The check goes at the very top of the message handler, before any
other logic, so replayed events are silently dropped.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: prevent duplicate messages on partial stream delivery

When streaming fails after tokens are already delivered to the platform,
_interruptible_streaming_api_call re-raised the error into the outer
retry loop, which would make a new API call — creating a duplicate
message.

Now checks deltas_were_sent before re-raising: if partial content was
already streamed, returns a stub response instead. The outer loop treats
the turn as complete (no retry, no fallback, no duplicate).

Inspired by PR #4871 (@trevorgordon981) which identified the bug.
This implementation avoids monkey-patching exception objects and keeps
the fix within the streaming call boundary.

---------

Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 18:53:52 -07:00
Teknium
36aace34aa fix(opencode-go): strip trailing /v1 from base URL for Anthropic models (#4918)
The Anthropic SDK appends /v1/messages to the base_url, so OpenCode's
base URL https://opencode.ai/zen/go/v1 produced a double /v1 path
(https://opencode.ai/zen/go/v1/v1/messages), causing 404s for MiniMax
models. Strip trailing /v1 when api_mode is anthropic_messages.

Also adds MiMo-V2-Pro, MiMo-V2-Omni, and MiniMax-M2.5 to the OpenCode
Go model lists per their updated docs.

Fixes #4890
2026-04-03 18:47:51 -07:00
Teknium
d4bf517b19 test+docs: add group_topics tests and documentation
- 7 new tests covering skill binding, fallthrough, coercion
- Docs section in telegram.md with config format, field reference,
  comparison table, and thread_id discovery tip
2026-04-03 18:20:50 -07:00
Dolf
1cae9ac628 feat(telegram): add group_topics skill binding for supergroup forum topics
Reads config.extra['group_topics'] to bind skills to specific thread_ids
in supergroup/forum chats. Mirrors the dm_topics skill injection pattern
but for group chat_type. Enables per-topic skill auto-loading in Falcon HQ.

Config format:
  platforms.telegram.extra.group_topics:
    - chat_id: -1003853746818
      topics:
        - name: FalconConnect
          thread_id: 5
          skill: falconconnect-architecture
2026-04-03 18:20:50 -07:00
Teknium
fb654c15d8 fix: add type hints to session key helpers, extend context-local key to terminal_tool
- Add contextvars.Token[str] type hints to set/reset_current_session_key
- Use get_current_session_key(default='') in terminal_tool.py for background
  process session tracking, fixing the same env var race for concurrent
  gateway sessions spawning background processes
2026-04-03 17:50:01 -07:00
Tranquil-Flow
3bfb39a25f fix(gateway): isolate approval session key per turn 2026-04-03 17:50:01 -07:00
kshitijk4poor
5359921199 refactor: simplify scope validation helpers in google workspace scripts
Fix double file read bug in google_api.py _missing_scopes(), consolidate
redundant _normalize_scope_values into callers, merge duplicate except blocks.
2026-04-03 17:49:18 -07:00
kshitijk4poor
37e2ef6c3f fix: protect profile-scoped google workspace oauth tokens 2026-04-03 17:49:18 -07:00
Teknium
92dcdbff66 fix: clarify interrupt re-queue label, document busy_input_mode behaviour
The '📨 Queued:' label was misleading — it looked like the message was
silently deferred when it was actually being sent immediately after the
interrupt. Changed to ' Sending after interrupt:' with multi-message
count when the user typed several messages during agent execution.

Added comment documenting that this code path only applies when
busy_input_mode == 'interrupt' (the default).

Based on PR #4821 by iRonin.

Co-authored-by: iRonin <iRonin@users.noreply.github.com>
2026-04-03 15:00:05 -07:00
Teknium
3f2180037c fix: also filter session_meta in /session switch restore path
The original PR missed the third CLI restore path — the /session switch
command that loads history via get_messages_as_conversation() without
stripping session_meta entries.
2026-04-03 14:57:33 -07:00
kagura-agent
6bf5946bbe fix: filter transcript-only roles from chat-completions payload (#4715)
Add a provider-agnostic role allowlist guard to _sanitize_api_messages()
that drops messages with roles not accepted by the chat-completions API
(e.g. session_meta). This prevents CLI resume/session restore from
leaking transcript-only metadata into the outgoing messages payload.

Two layers of defense:

1. API-boundary guard: _sanitize_api_messages() now filters messages by
   role allowlist (system/user/assistant/tool/function/developer) before
   the existing orphaned tool-call repair logic. This protects all
   current and future call paths.

2. CLI restore defense-in-depth: Both session restore paths in cli.py
   now strip session_meta entries before loading history into
   conversation_history, matching the existing gateway behavior.

Closes #4715
2026-04-03 14:57:33 -07:00
Hermes Agent
bef895b371 fix(memory): preserve holographic prompt and trust score rendering 2026-04-03 14:22:22 -07:00
Teknium
84a875ca02 fix: scope gateway stop/restart to current profile, --all for global kill
gateway stop and restart previously called kill_gateway_processes() which
scans ps aux and kills ALL gateway processes across all profiles. Starting
a profile gateway would nuke the main one (and vice versa).

Now:
- hermes gateway stop → only kills the current profile's gateway (PID file)
- hermes -p work gateway stop → only kills the 'work' profile's gateway
- hermes gateway stop --all → kills every gateway process (old behavior)
- hermes gateway restart → profile-scoped for manual fallback path
- hermes update → discovers and restarts ALL profile gateways (systemctl
  list-units hermes-gateway*) since the code update is shared

Added stop_profile_gateway() which uses the HERMES_HOME-scoped PID file
instead of global process scanning.
2026-04-03 14:21:44 -07:00
Teknium
52ddd6bc64 refactor(skills): consolidate code verification skills into one (#4854)
* chore: release v0.7.0 (2026.4.3)

168 merged PRs, 223 commits, 46 resolved issues, 40+ contributors.

Highlights: pluggable memory providers, credential pools, Camofox browser,
inline diff previews, API server session continuity, ACP MCP registration,
gateway hardening, secret exfiltration blocking.

* refactor(skills): consolidate code-review + verify-code-changes into requesting-code-review

Merge the passive code-review checklist and the automated verification
pipeline (from PR #4459 by @MorAlekss) into a single requesting-code-review
skill. This eliminates model confusion between three overlapping skills.

Now includes:
- Static security scan (grep on diff lines)
- Baseline-aware quality gates (only flag NEW failures)
- Multi-language tool detection (Python, Node, Rust, Go)
- Independent reviewer subagent with fail-closed JSON verdict
- Auto-fix loop with separate fixer agent (max 2 attempts)
- Git checkpoint and [verified] commit convention

Deletes: skills/software-development/code-review/ (absorbed)
Closes: #406 (independent code verification)
2026-04-03 14:13:27 -07:00
Teknium
7def061fee feat: add arcee-ai/trinity-large-thinking to recommended models
Added to OPENROUTER_MODELS and _PROVIDER_MODELS['nous'] lists.
Also added 'trinity' family entry to DEFAULT_CONTEXT_LENGTHS (262K).
2026-04-03 13:45:29 -07:00
CK iRonin.IT
de5aacddd2 fix: normalise \r\n and \r line endings in pasted text
Windows (CRLF) and old Mac (CR) line endings are normalised to LF
before the 5-line collapse threshold is checked in handle_paste.

Without this, markdown copied from Windows sources contains \r\n but
the line counter (pasted_text.count('\n')) still works — however
buf.insert_text() leaves bare \r characters in the buffer which some
terminals render by moving the cursor to the start of the line,
making multi-line pastes appear as a single overwritten line.
2026-04-03 13:20:50 -07:00
Teknium
b1756084a3 feat: add .zip document support and auto-mount cache dirs into remote backends (#4846)
- Add .zip to SUPPORTED_DOCUMENT_TYPES so gateway platforms (Telegram,
  Slack, Discord) cache uploaded zip files instead of rejecting them.
- Add get_cache_directory_mounts() and iter_cache_files() to
  credential_files.py for host-side cache directory passthrough
  (documents, images, audio, screenshots).
- Docker: bind-mount cache dirs read-only alongside credentials/skills.
  Changes are live (bind mount semantics).
- Modal: mount cache files at sandbox creation + resync before each
  command via _sync_files() with mtime+size change detection.
- Handles backward-compat with legacy dir names (document_cache,
  image_cache, audio_cache, browser_screenshots) via get_hermes_dir().
- Container paths always use the new cache/<subdir> layout regardless
  of host layout.

This replaces the need for a dedicated extract_archive tool (PR #4819)
— the agent can now use standard terminal commands (unzip, tar) on
uploaded files inside remote containers.

Closes: related to PR #4819 by kshitijk4poor
2026-04-03 13:16:26 -07:00
Teknium
8a384628a5 fix(memory): profile-scoped memory isolation and clone support (#4845)
Three fixes for memory+profile isolation bugs:

1. memory_tool.py: Replace module-level MEMORY_DIR constant with
   get_memory_dir() function that calls get_hermes_home() dynamically.
   The old constant was cached at import time and could go stale if
   HERMES_HOME changed after import. Internal MemoryStore methods now
   call get_memory_dir() directly. MEMORY_DIR kept as backward-compat
   alias.

2. profiles.py: profile create --clone now copies MEMORY.md and USER.md
   from the source profile. These curated memory files are part of the
   agent's identity (same as SOUL.md) and should carry over on clone.

3. holographic plugin: initialize() now expands $HERMES_HOME and
   ${HERMES_HOME} in the db_path config value, so users can write
   'db_path: $HERMES_HOME/memory_store.db' and it resolves to the
   active profile directory, not the default home.

Tests updated to mock get_memory_dir() alongside the legacy MEMORY_DIR.
2026-04-03 13:10:11 -07:00
Teknium
4979d77a4a fix: complete browser_tool profile isolation — replace remaining 3 hardcoded HERMES_HOME instances
The original PR fixed 4 of 7 instances. This fixes the remaining 3:
- _launch_local_browser() PATH setup (line 908)
- _start_recording() config read (line 1545)
- _cleanup_old_recordings() path (line 1834)
2026-04-03 13:09:54 -07:00
Dusk1e
a09fa690f0 fix: resolve critical stability issues in core, web, and browser tools 2026-04-03 13:09:54 -07:00
Teknium
6d357bb185 fix: regenerate uv.lock to sync with pyproject.toml v0.7.0 (#4842)
uv.lock was stale at v0.5.0 and missing exa-py (core dep), causing
ModuleNotFoundError for Nix flake builds. Also syncs faster-whisper
placement (core → voice extra), adds feishu/debugpy/lark-oapi extras.

Fixes #4648
Credit to @lvnilesh for identifying the issue in PR #4649.
2026-04-03 12:53:45 -07:00
Dat Pham
b3319b1252 fix(memory): Fix ByteRover plugin - run brv query synchronously before LLM call
The pipeline prefetch design was firing \`brv query\` in a background
thread *after* each response, meaning the context injected at turn N
was from turn N-1's message — and the first turn got no BRV context
at all. Replace the async prefetch pipeline with a synchronous query
in \`prefetch()\` so recall runs before the first API call on every
turn. Make \`queue_prefetch()\` a no-op and remove the now-unused
pipeline state.
2026-04-03 12:11:29 -07:00
Teknium
abf1e98f62 chore: release v0.7.0 (2026.4.3) (#4812)
168 merged PRs, 223 commits, 46 resolved issues, 40+ contributors.

Highlights: pluggable memory providers, credential pools, Camofox browser,
inline diff previews, API server session continuity, ACP MCP registration,
gateway hardening, secret exfiltration blocking.
2026-04-03 11:14:55 -07:00
Teknium
e492420df4 fix: route memory provider tools in sequential execution path (#4803)
Memory provider tools (hindsight_retain, honcho_search, etc.) were
advertised to the model via tool schemas but failed with 'Unknown tool'
at execution time. The concurrent path (_invoke_tool) correctly checks
self._memory_manager.has_tool() before falling through to the registry,
but the sequential path (_execute_tool_calls_sequential) was never
updated with this check. Since sequential is the default for single
tool calls, memory provider tools always hit the registry dispatcher
which returns 'Unknown tool' because they're not registered there.

Add the memory_manager dispatch check between the delegate_task handler
and the quiet_mode fallthrough in the sequential path, with proper
spinner/display handling to match the existing pattern.

Reported by KiBenderOP — all memory providers affected (Honcho,
Hindsight, Holographic, etc.).
2026-04-03 10:31:53 -07:00
Teknium
67e3620c5c fix: persist API server sessions to shared SessionDB (state.db) (#4802)
The API server adapter created AIAgent instances without passing
session_db, so conversations via Open WebUI and other OpenAI-compatible
frontends were never persisted to state.db. This meant 'hermes sessions
list' showed no API server sessions — they were effectively stateless.

Changes:
- Add _ensure_session_db() helper for lazy SessionDB initialization
- Pass session_db=self._ensure_session_db() in _create_agent()
- Refactor existing X-Hermes-Session-Id handler to use the shared helper

Sessions now persist with source='api_server' and are visible alongside
CLI and gateway sessions in hermes sessions list/search.
2026-04-03 10:31:11 -07:00
Teknium
aecbf7fa4a fix(discord): register /approve and /deny slash commands, wire up button-based approval UI (#4800)
Two fixes for Discord exec approval:

1. Register /approve and /deny as native Discord slash commands so they
   appear in Discord's command picker (autocomplete). Previously they
   were only handled as text commands, so users saw 'no commands found'
   when typing /approve.

2. Wire up the existing ExecApprovalView button UI (was dead code):
   - ExecApprovalView now calls resolve_gateway_approval() to actually
     unblock the waiting agent thread when a button is clicked
   - Gateway's _approval_notify_sync() detects adapters with
     send_exec_approval() and routes through the button UI
   - Added 'Allow Session' button for parity with /approve session
   - send_exec_approval() now accepts session_key and metadata for
     thread support
   - Graceful fallback to text-based /approve prompt if button send fails

Also updates test mocks to include grey/secondary ButtonStyle and
purple Color (used by new button styles).
2026-04-03 10:24:07 -07:00
Teknium
5db630aae4 fix: respect per-platform disabled skills in Telegram menu and gateway dispatch (#4799)
Three interconnected bugs caused `hermes skills config` per-platform
settings to be silently ignored:

1. telegram_menu_commands() never filtered disabled skills — all skills
   consumed menu slots regardless of platform config, hitting Telegram's
   100 command cap. Now loads disabled skills for 'telegram' and excludes
   them from the menu.

2. Gateway skill dispatch executed disabled skills because
   get_skill_commands() (process-global cache) only filters by the global
   disabled list at scan time. Added per-platform check before execution,
   returning an actionable 'skill is disabled' message.

3. get_disabled_skill_names() only checked HERMES_PLATFORM env var, but
   the gateway sets HERMES_SESSION_PLATFORM instead. Added
   HERMES_SESSION_PLATFORM as fallback, plus an explicit platform=
   parameter for callers that know their platform (menu builder, gateway
   dispatch). Also added platform to prompt_builder's skills cache key
   so multi-platform gateways get correct per-platform skill prompts.

Reported by SteveSkedasticity (CLAW community).
2026-04-03 10:10:53 -07:00
Teknium
b6f9b70afd fix(gateway): route /approve and /deny through running-agent guard (#4798)
When the agent is blocked on a dangerous command approval (threading.Event
wait inside tools/approval.py), incoming /approve and /deny commands were
falling through to the generic interrupt path instead of being dispatched
to their command handlers. The interrupt sets _interrupt_requested on the
agent, but the agent thread is blocked on event.wait() — not checking the
flag. Result: approval times out after 300s (5 minutes) before executing.

Fix: intercept /approve and /deny in the running-agent early-intercept
block (alongside /stop, /new, /queue) and route directly to
_handle_approve_command / _handle_deny_command.
2026-04-03 09:59:52 -07:00
Teknium
93334b2b92 docs: add community FAQ entries — multi-model workflows, WhatsApp binding, verbose control, skills config, thread sessions, migration, install troubleshooting (#4797)
Addresses common questions from the Nous Research community Discord:
- Multi-model workflows via delegation config
- WhatsApp per-chat binding limitations and workarounds
- Controlling tool progress display on Telegram
- Per-platform skills config and Telegram 100-command limit
- Shared thread sessions across multiple users
- Exporting/migrating Hermes to a new machine
- Permission denied on shell reload after install
- HTTP 400 on first agent run
2026-04-03 09:58:22 -07:00
Teknium
d50e5be500 fix: handle None mcp_servers in _get_platform_tools()
When config.yaml has 'mcp_servers:' with no value, YAML parses it as
None. dict.get('mcp_servers', {}) only returns the default when the key
is absent, not when it's explicitly None. Use 'or {}' pattern to handle
both cases, matching the other two assignment sites in the same file.
2026-04-03 09:08:20 -07:00
Teknium
cc54818d26 fix(mcp): stability fix pack — reload timeout, shutdown cleanup, event loop handler, OAuth non-blocking (#4757)
Four fixes for MCP server stability issues reported by community member
(terminal lockup, zombie processes, escape sequence pollution, startup hang):

1. MCP reload timeout guard (cli.py): _check_config_mcp_changes now runs
   _reload_mcp in a separate daemon thread with a 30s hard timeout. Previously,
   a hung MCP server could block the process_loop thread indefinitely, freezing
   the entire TUI (user can type but nothing happens, only Ctrl+D/Ctrl+\ work).

2. MCP stdio subprocess PID tracking (mcp_tool.py): Tracks child PIDs spawned
   by stdio_client via before/after snapshots of /proc children. On shutdown,
   _stop_mcp_loop force-kills any tracked PIDs that survived the SDK's graceful
   SIGTERM→SIGKILL cleanup. Prevents zombie MCP server processes from
   accumulating across sessions.

3. MCP event loop exception handler (mcp_tool.py): Installs
   _mcp_loop_exception_handler on the MCP background event loop — same pattern
   as the existing _suppress_closed_loop_errors on prompt_toolkit's loop.
   Suppresses benign 'Event loop is closed' RuntimeError from httpx transport
   __del__ during MCP shutdown. Salvaged from PR #2538 (acsezen).

4. MCP OAuth non-blocking (mcp_oauth.py): Replaces blocking input() call in
   _wait_for_callback with OAuthNonInteractiveError raise. Adds _is_interactive()
   TTY detection. In non-interactive environments, build_oauth_auth() still
   returns a provider (cached tokens + refresh work), but the callback handler
   raises immediately instead of blocking the MCP event loop for 120s. Re-raises
   OAuth setup failures in _run_http so failed servers are reported cleanly
   without blocking others. Salvaged from PRs #4521 (voidborne-d) and #4465
   (heathley).

Closes #2537, closes #4462
Related: #4128, #3436
2026-04-03 02:29:20 -07:00
Teknium
f374ae4c61 fix: prevent compression death spiral from API disconnects (#2153) (#4750)
Three fixes for long-running gateway sessions that enter a death spiral
when API disconnects prevent token data collection, which prevents
compression, which causes more disconnects:

Layer 1 — Stale token counter fallback (run_agent.py in-loop):
When last_prompt_tokens is 0 (stale after API disconnect or provider
returned no usage data), fall back to estimate_messages_tokens_rough()
instead of passing 0 to should_compress(), which would never fire.

Layer 2 — Server disconnect heuristic (run_agent.py error handler):
When ReadError/RemoteProtocolError hits a large session (>60% context
or >200 messages), treat it as a context-length error and trigger
compression rather than burning through retries that all fail the
same way.

Layer 3 — Hard message count limit (gateway/run.py hygiene):
Force compression when a session exceeds 400 messages, regardless of
token estimates. This catches runaway growth even when all token-based
checks fail due to missing API data.

Based on the analysis from PR #2157 by ygd58 — the gateway threshold
direction fix (1.4x multiplier) was already resolved on main.
2026-04-03 02:16:46 -07:00
Teknium
8fd9fafc84 fix: handle Anthropic Sonnet long-context tier 429 by reducing to 200k (#4747)
Anthropic returns HTTP 429 'Extra usage is required for long context
requests' when a Claude Max subscription doesn't include the 1M context
tier. This is NOT a transient rate limit — retrying won't help.

Only applies to Sonnet models (Opus 1M is general access). Detects
this specific error before the generic rate-limit handler and:
1. Reduces context_length from 1M to 200k (the standard tier)
2. Triggers context compression to fit
3. Retries with the reduced context

The reduction is session-scoped (not persisted) so it auto-recovers
if the user later enables extra usage on their subscription.

Fixes: Sonnet 4.6 instant rate limits on Claude Max without extra usage
2026-04-03 02:05:02 -07:00
Teknium
26d6083624 fix: correct qwen3.6-plus model slug
Renamed qwen/qwen3.6-plus-preview:free to qwen/qwen3.6-plus:free in both
OPENROUTER_MODELS and _PROVIDER_MODELS['nous'] lists.
2026-04-03 01:56:43 -07:00
Teknium
470c3ea51a fix: handle Anthropic long-context tier 429 by reducing to 200k
Anthropic returns HTTP 429 'Extra usage is required for long context
requests' when a Claude Max subscription doesn't include the 1M context
tier. This is NOT a transient rate limit — retrying won't help.

Detect this specific error before the generic rate-limit handler and:
1. Reduce context_length from 1M to 200k (the standard tier)
2. Trigger context compression to fit
3. Retry with the reduced context

The reduction is session-scoped (not persisted) so it auto-recovers
if the user later enables extra usage on their subscription.

Fixes: Sonnet 4.6 instant rate limits on Claude Max without extra usage
2026-04-03 01:56:43 -07:00
NexVeridian
388241f798 docs(acp): fix zed config 2026-04-03 01:46:45 -07:00
Teknium
67ae7a79df fix: use get_hermes_home(), consolidate git_cmd, update tests
Follow-up for salvaged PR #2352:
- Replace hardcoded Path(os.getenv('HERMES_HOME', ...)) with
  get_hermes_home() from hermes_constants (2 places)
- Consolidate redundant git_cmd_base into the existing git_cmd
  variable, constructed once before fork detection
- Update autostash tests for the unmerged index check added
  in the previous commit
2026-04-03 01:46:42 -07:00
Franci Penov
6b0022bb7b Add fork detection and upstream sync to hermes update
- Detect if origin points to a fork (not NousResearch/hermes-agent)
- Show warning when updating from a fork: origin URL
- After pulling from origin/main on a fork:
  - Prompt to add upstream remote if not present
  - Respect ~/.hermes/.skip_upstream_prompt to avoid repeated prompts
  - Compare origin/main with upstream/main
  - If origin has commits not on upstream, skip (don't trample user's work)
  - If upstream is ahead, pull from upstream and try to sync fork
  - Use --force-with-lease for safe fork syncing

Non-main branches are unaffected - they just pull from origin/{branch}.

Co-authored-by: Avery <avery@hermes-agent.ai>
2026-04-03 01:46:42 -07:00
Teknium
0109547fa2 fix(update): handle conflicted git index during hermes update (#4735)
* fix(gateway): race condition, photo media loss, and flood control in Telegram

Three bugs causing intermittent silent drops, partial responses, and
flood control delays on the Telegram platform:

1. Race condition in handle_message() — _active_sessions was set inside
   the background task, not before create_task(). Two rapid messages
   could both pass the guard and spawn duplicate processing tasks.
   Fix: set _active_sessions synchronously before spawning the task
   (grammY sequentialize / aiogram EventIsolation pattern).

2. Photo media loss on dequeue — when a photo (no caption) was queued
   during active processing and later dequeued, only .text was
   extracted. Empty text → message silently dropped.
   Fix: _build_media_placeholder() creates text context for media-only
   events so they survive the dequeue path.

3. Progress message edits triggered Telegram flood control — rapid tool
   calls edited the progress message every 0.3s, hitting Telegram's
   rate limit (23s+ waits). This blocked progress updates and could
   cause stream consumer timeouts.
   Fix: throttle edits to 1.5s minimum interval, detect flood control
   errors and gracefully degrade to new messages. edit_message() now
   returns failure for flood waits >5s instead of blocking.

* fix(gateway): downgrade empty/None response log from WARNING to DEBUG

This warning fires on every successful streamed response (streaming
delivers the text, handler returns None via already_sent=True) and
on every queued message during active processing. Both are expected
behavior, not error conditions. Downgrade to DEBUG to reduce log noise.

* fix(gateway): prevent stuck sessions with agent timeout and staleness eviction

Three changes to prevent sessions from getting permanently locked:

1. Agent execution timeout (HERMES_AGENT_TIMEOUT, default 10min):
   Wraps run_in_executor with asyncio.wait_for so a hung API call or
   runaway tool can't lock a session indefinitely. On timeout, the
   agent is interrupted and the user gets an actionable error message.

2. Staleness eviction for _running_agents:
   Tracks start timestamps for each session entry. When a new message
   arrives and the entry is older than timeout + 1min grace, it's
   evicted as a leaked lock. Safety net for any cleanup path that
   fails to remove the entry.

3. Cron job timeout (HERMES_CRON_TIMEOUT, default 10min):
   Wraps run_conversation in a ThreadPoolExecutor with timeout so a
   hung cron job doesn't block the ticker thread (and all subsequent
   cron jobs) indefinitely.

Follows grammY runner's per-update timeout pattern and aiogram's
asyncio.wait_for approach for handler deadlines.

* fix(gateway): STT config resolution, stream consumer flood control fallback

Three targeted fixes from user-reported issues:

1. STT config resolution (transcription_tools.py):
   _has_openai_audio_backend() and _resolve_openai_audio_client_config()
   now check stt.openai.api_key/base_url in config.yaml FIRST, before
   falling back to env vars. Fixes voice transcription breaking when
   using a custom OpenAI-compatible endpoint via config.yaml.

2. Stream consumer flood control fallback (stream_consumer.py):
   When an edit fails mid-stream (e.g., Telegram flood control returns
   failure for waits >5s), reset _already_sent to False so the normal
   final send path delivers the complete response. Previously, a
   truncated partial was left as the final message.

3. Telegram edit_message comment alignment (telegram.py):
   Clarify that long flood waits return failure so streaming can fall
   back to a normal final send.

* refactor: simplify and harden PR fixes after review

- Fix cron ThreadPoolExecutor blocking on timeout: use shutdown(wait=False,
  cancel_futures=True) instead of context manager that waits indefinitely
- Extract _dequeue_pending_text() to deduplicate media-placeholder logic
  in interrupt and normal-completion dequeue paths
- Remove hasattr guards for _running_agents_ts: add class-level default
  so partial test construction works without scattered defensive checks
- Move `import concurrent.futures` to top of cron/scheduler.py
- Progress throttle: sleep remaining interval instead of busy-looping
  0.1s (~15 wakeups per 1.5s window → 1 wakeup)
- Deduplicate _load_stt_config() in transcription_tools.py:
  _has_openai_audio_backend() now delegates to _resolve_openai_audio_client_config()

* fix: move class-level attribute after docstring, clarify throttle comment

Follow-up nits for salvaged PR #4577:
- Move _running_agents_ts class attribute below the docstring so
  GatewayRunner.__doc__ is preserved.
- Add clarifying comment explaining the throttle continue behavior
  (batches queued messages during the throttle interval).

* fix(update): handle conflicted git index during hermes update

When the git index has unmerged entries (e.g. from an interrupted
merge or rebase), git stash fails with 'needs merge / could not
write index'. Detect this with git ls-files --unmerged and clear
the conflict state with git reset before attempting the stash.
Working-tree changes are preserved.

Reported by @LLMJunky — package-lock.json conflict from a prior
merge left the index dirty, blocking hermes update entirely.

---------

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
2026-04-03 01:17:12 -07:00
Teknium
c66c688727 fix: remove redundant restart message from update launchd path
launchd_restart() already prints stop/start confirmation via its
internal helpers — the extra 'Gateway restarted via launchd' line
was redundant. Update test assertion to match.
2026-04-03 01:16:42 -07:00
Dave Tist
988ecc7420 fix(update): avoid launchd restart race on macOS 2026-04-03 01:16:42 -07:00
kshitijk4poor
7165eff901 fix(whatsapp): add free_response_chats, mention stripping, and interactive message unwrapping
Address feature gaps vs Telegram/Discord/Mattermost adapters:
- free_response_chats whitelist to bypass mention gating per-group
- strip bot @phone mentions from body before forwarding to agent
- unwrap templateMessage/buttonsMessage/listMessage in bridge
- info-level log on successful mention pattern compilation
- use module-level json import instead of inline import in config
- eliminate double _normalize_whatsapp_id call via walrus operator
- hoist botIds computation outside per-message loop in bridge
2026-04-03 01:16:39 -07:00
kshitijk4poor
714e4941b8 fix(whatsapp): enforce require_mention in group chats 2026-04-03 01:16:39 -07:00
Teknium
23addf48d3 fix: allow running gateway service as root for LXC/container environments (#4732)
Previously, `hermes gateway install --system` hard-refused to create a
service running as root, even when explicitly requested via
`--run-as-user root`. This forced LXC/container users (where root is
the only user) to either create throwaway users or comment out the check
in source.

Changes:
- Auto-detected root (no explicit --run-as-user) still raises, but with
  a message explaining how to override
- Explicit `--run-as-user root` now allowed with a warning about
  security implications
- Interactive setup wizard prompt accepts 'root' as a valid username
  (warning comes from _system_service_identity downstream)
- Added tests for all three paths: auto-detected root rejection,
  explicit root allowance, and normal non-root passthrough
2026-04-03 01:14:21 -07:00
kshitijk4poor
4d99305345 fix(cli): surface recent sessions inside /history and /resume
When /history is used in an empty chat or /resume with no argument,
show an inline table of recent resumable sessions with title, preview,
relative timestamp, and session ID instead of a dead-end message.

Table formatting matches the existing hermes sessions list style
(column headers + thin separators, no box drawing).

Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
2026-04-03 00:50:49 -07:00
Teknium
a933079564 fix: move class-level attribute after docstring, clarify throttle comment
Follow-up nits for salvaged PR #4577:
- Move _running_agents_ts class attribute below the docstring so
  GatewayRunner.__doc__ is preserved.
- Add clarifying comment explaining the throttle continue behavior
  (batches queued messages during the throttle interval).
2026-04-03 00:50:17 -07:00
kshitijk4poor
0ed28ab80c refactor: simplify and harden PR fixes after review
- Fix cron ThreadPoolExecutor blocking on timeout: use shutdown(wait=False,
  cancel_futures=True) instead of context manager that waits indefinitely
- Extract _dequeue_pending_text() to deduplicate media-placeholder logic
  in interrupt and normal-completion dequeue paths
- Remove hasattr guards for _running_agents_ts: add class-level default
  so partial test construction works without scattered defensive checks
- Move `import concurrent.futures` to top of cron/scheduler.py
- Progress throttle: sleep remaining interval instead of busy-looping
  0.1s (~15 wakeups per 1.5s window → 1 wakeup)
- Deduplicate _load_stt_config() in transcription_tools.py:
  _has_openai_audio_backend() now delegates to _resolve_openai_audio_client_config()
2026-04-03 00:50:17 -07:00
kshitijk4poor
28380e7aed fix(gateway): STT config resolution, stream consumer flood control fallback
Three targeted fixes from user-reported issues:

1. STT config resolution (transcription_tools.py):
   _has_openai_audio_backend() and _resolve_openai_audio_client_config()
   now check stt.openai.api_key/base_url in config.yaml FIRST, before
   falling back to env vars. Fixes voice transcription breaking when
   using a custom OpenAI-compatible endpoint via config.yaml.

2. Stream consumer flood control fallback (stream_consumer.py):
   When an edit fails mid-stream (e.g., Telegram flood control returns
   failure for waits >5s), reset _already_sent to False so the normal
   final send path delivers the complete response. Previously, a
   truncated partial was left as the final message.

3. Telegram edit_message comment alignment (telegram.py):
   Clarify that long flood waits return failure so streaming can fall
   back to a normal final send.
2026-04-03 00:50:17 -07:00
kshitijk4poor
970042deab fix(gateway): prevent stuck sessions with agent timeout and staleness eviction
Three changes to prevent sessions from getting permanently locked:

1. Agent execution timeout (HERMES_AGENT_TIMEOUT, default 10min):
   Wraps run_in_executor with asyncio.wait_for so a hung API call or
   runaway tool can't lock a session indefinitely. On timeout, the
   agent is interrupted and the user gets an actionable error message.

2. Staleness eviction for _running_agents:
   Tracks start timestamps for each session entry. When a new message
   arrives and the entry is older than timeout + 1min grace, it's
   evicted as a leaked lock. Safety net for any cleanup path that
   fails to remove the entry.

3. Cron job timeout (HERMES_CRON_TIMEOUT, default 10min):
   Wraps run_conversation in a ThreadPoolExecutor with timeout so a
   hung cron job doesn't block the ticker thread (and all subsequent
   cron jobs) indefinitely.

Follows grammY runner's per-update timeout pattern and aiogram's
asyncio.wait_for approach for handler deadlines.
2026-04-03 00:50:17 -07:00
kshitijk4poor
9bb83d1298 fix(gateway): downgrade empty/None response log from WARNING to DEBUG
This warning fires on every successful streamed response (streaming
delivers the text, handler returns None via already_sent=True) and
on every queued message during active processing. Both are expected
behavior, not error conditions. Downgrade to DEBUG to reduce log noise.
2026-04-03 00:50:17 -07:00
kshitijk4poor
69f85a4dce fix(gateway): race condition, photo media loss, and flood control in Telegram
Three bugs causing intermittent silent drops, partial responses, and
flood control delays on the Telegram platform:

1. Race condition in handle_message() — _active_sessions was set inside
   the background task, not before create_task(). Two rapid messages
   could both pass the guard and spawn duplicate processing tasks.
   Fix: set _active_sessions synchronously before spawning the task
   (grammY sequentialize / aiogram EventIsolation pattern).

2. Photo media loss on dequeue — when a photo (no caption) was queued
   during active processing and later dequeued, only .text was
   extracted. Empty text → message silently dropped.
   Fix: _build_media_placeholder() creates text context for media-only
   events so they survive the dequeue path.

3. Progress message edits triggered Telegram flood control — rapid tool
   calls edited the progress message every 0.3s, hitting Telegram's
   rate limit (23s+ waits). This blocked progress updates and could
   cause stream consumer timeouts.
   Fix: throttle edits to 1.5s minimum interval, detect flood control
   errors and gracefully degrade to new messages. edit_message() now
   returns failure for flood waits >5s instead of blocking.
2026-04-03 00:50:17 -07:00
Teknium
3659e1f0c2 test(acp): add E2E tests for MCP registration and tool-result reporting
Tests the full ACP flow:
- new_session with mcpServers → config conversion → register_mcp_servers
- prompt → tool_progress_callback → ToolCallStart events
- step_callback with results → ToolCallUpdate with rawOutput
- toolCallId pairing between start and completion events
- server names with slashes/dots sanitized correctly
- all session lifecycle methods (load/resume/fork) register MCP
2026-04-02 20:54:27 -07:00
Teknium
21c2d32471 fix(gateway): normalize step_callback prev_tools for backward compat
The PR changed prev_tools from list[str] to list[dict] with name/result
keys.  The gateway's _step_callback_sync passed this directly to hooks
as 'tool_names', breaking user-authored hooks that call
', '.join(tool_names).

Now:
- 'tool_names' always contains strings (backward-compatible)
- 'tools' carries the enriched dicts for hooks that want results

Also adds summary logging to register_mcp_servers() and comprehensive
tests for all three PR changes:
- sanitize_mcp_name_component edge cases
- register_mcp_servers public API
- _register_session_mcp_servers ACP integration
- step_callback result forwarding
- gateway normalization backward compat
2026-04-02 20:54:27 -07:00
Jack
f66b3fe76b fix(acp): include tool results in step_callback for ACP tool_call_update events
The step_callback previously only forwarded tool names as strings,
so build_tool_complete received result=None and ACP tool_call_update
events had empty content/rawOutput. Now prev_tools carries dicts with
both name and result by pairing each tool_call with its matching
tool-role message via tool_call_id.
2026-04-02 20:54:27 -07:00
Jack
9aa82d4807 fix(acp): use raw server name as registry key, only sanitize for tool name prefixes 2026-04-02 20:54:27 -07:00
Jack
9b2fb1cc2e feat(acp): register client-provided MCP servers as agent tools
ACP clients pass MCP server definitions in session/new, load_session,
resume_session, and fork_session. Previously these were accepted but
silently ignored — the agent never connected to them.

This wires the mcp_servers parameter into the existing MCP registration
pipeline (tools/mcp_tool.py) so client-provided servers are connected,
their tools discovered, and the agent's tool surface refreshed before
the first prompt.

Changes:

tools/mcp_tool.py:
- Extract sanitize_mcp_name_component() to replace all non-[A-Za-z0-9_]
  characters (fixes crash when server names contain / or other chars
  that violate provider tool-name validation rules)
- Use it in _convert_mcp_schema, _sync_mcp_toolsets, _build_utility_schemas
- Extract register_mcp_servers(servers: dict) as a public API that takes
  an explicit {name: config} map. discover_mcp_tools() becomes a thin
  wrapper that loads config.yaml and calls register_mcp_servers()

acp_adapter/server.py:
- Add _register_session_mcp_servers() which converts ACP McpServerStdio /
  McpServerHttp / McpServerSse objects to Hermes MCP config dicts,
  registers them via asyncio.to_thread (avoids blocking the ACP event
  loop), then rebuilds agent.tools, valid_tool_names, and invalidates
  the cached system prompt
- Call it from new_session, load_session, resume_session, fork_session

Tested with Eden (theproxycompany.com) as ACP client — 5 MCP servers
(HTTP + stdio) registered successfully, 110 tools available to the agent.
2026-04-02 20:54:27 -07:00
Erosika
29c98e8f83 feat(honcho): add configurable observation mode (unified/directional)
Adds observationMode config field to HonchoClientConfig:
- 'unified' (default): user peer self-observations, all agents share one pool
- 'directional': AI peer observes user, each agent keeps its own view

Changes:
- client.py: observation_mode field, _normalize_observation_mode(), config resolution
- session.py: add_peers respects mode (peer observation flags), dialectic_query
  routes through correct peer, create_conclusion uses correct observer
2026-04-02 20:38:36 -07:00
Erosika
9e0fc62650 feat(honcho): restore full integration parity in memory provider plugin
Implements all features from the post-merge Honcho plugin spec:

B1: recall_mode support (context/tools/hybrid)
B2: peer_memory_mode gating (stub for ABC suppression mechanism)
B3: resolve_session_name() session key resolution
B4: first-turn context baking in system_prompt_block()
B5: cost-awareness (cadence, injection frequency, reasoning cap)
B6: memory file migration in initialize()
B7: pre-warming context at init

Ports from open PRs:
- #3265: token budget enforcement in prefetch()
- #4053: cron guard (skip activation for cron/flush sessions)
- #2645: baseUrl-only flow verified in is_available()
- #1969: aiPeer sync from SOUL.md
- #1957: lazy session init in tools mode

Single file change: plugins/memory/honcho/__init__.py
No modifications to client.py, session.py, or any files outside the plugin.
2026-04-02 20:38:36 -07:00
Teknium
924bc67eee feat(memory): pluggable memory provider interface with profile isolation, review fixes, and honcho CLI restoration (#4623)
* feat(memory): add pluggable memory provider interface with profile isolation

Introduces a pluggable MemoryProvider ABC so external memory backends can
integrate with Hermes without modifying core files. Each backend becomes a
plugin implementing a standard interface, orchestrated by MemoryManager.

Key architecture:
- agent/memory_provider.py — ABC with core + optional lifecycle hooks
- agent/memory_manager.py — single integration point in the agent loop
- agent/builtin_memory_provider.py — wraps existing MEMORY.md/USER.md

Profile isolation fixes applied to all 6 shipped plugins:
- Cognitive Memory: use get_hermes_home() instead of raw env var
- Hindsight Memory: check $HERMES_HOME/hindsight/config.json first,
  fall back to legacy ~/.hindsight/ for backward compat
- Hermes Memory Store: replace hardcoded ~/.hermes paths with
  get_hermes_home() for config loading and DB path defaults
- Mem0 Memory: use get_hermes_home() instead of raw env var
- RetainDB Memory: auto-derive profile-scoped project name from
  hermes_home path (hermes-<profile>), explicit env var overrides
- OpenViking Memory: read-only, no local state, isolation via .env

MemoryManager.initialize_all() now injects hermes_home into kwargs so
every provider can resolve profile-scoped storage without importing
get_hermes_home() themselves.

Plugin system: adds register_memory_provider() to PluginContext and
get_plugin_memory_providers() accessor.

Based on PR #3825. 46 tests (37 unit + 5 E2E + 4 plugin registration).

* refactor(memory): drop cognitive plugin, rewrite OpenViking as full provider

Remove cognitive-memory plugin (#727) — core mechanics are broken:
decay runs 24x too fast (hourly not daily), prefetch uses row ID as
timestamp, search limited by importance not similarity.

Rewrite openviking-memory plugin from a read-only search wrapper into
a full bidirectional memory provider using the complete OpenViking
session lifecycle API:

- sync_turn: records user/assistant messages to OpenViking session
  (threaded, non-blocking)
- on_session_end: commits session to trigger automatic memory extraction
  into 6 categories (profile, preferences, entities, events, cases,
  patterns)
- prefetch: background semantic search via find() endpoint
- on_memory_write: mirrors built-in memory writes to the session
- is_available: checks env var only, no network calls (ABC compliance)

Tools expanded from 3 to 5:
- viking_search: semantic search with mode/scope/limit
- viking_read: tiered content (abstract ~100tok / overview ~2k / full)
- viking_browse: filesystem-style navigation (list/tree/stat)
- viking_remember: explicit memory storage via session
- viking_add_resource: ingest URLs/docs into knowledge base

Uses direct HTTP via httpx (no openviking SDK dependency needed).
Response truncation on viking_read to prevent context flooding.

* fix(memory): harden Mem0 plugin — thread safety, non-blocking sync, circuit breaker

- Remove redundant mem0_context tool (identical to mem0_search with
  rerank=true, top_k=5 — wastes a tool slot and confuses the model)
- Thread sync_turn so it's non-blocking — Mem0's server-side LLM
  extraction can take 5-10s, was stalling the agent after every turn
- Add threading.Lock around _get_client() for thread-safe lazy init
  (prefetch and sync threads could race on first client creation)
- Add circuit breaker: after 5 consecutive API failures, pause calls
  for 120s instead of hammering a down server every turn. Auto-resets
  after cooldown. Logs a warning when tripped.
- Track success/failure in prefetch, sync_turn, and all tool calls
- Wait for previous sync to finish before starting a new one (prevents
  unbounded thread accumulation on rapid turns)
- Clean up shutdown to join both prefetch and sync threads

* fix(memory): enforce single external memory provider limit

MemoryManager now rejects a second non-builtin provider with a warning.
Built-in memory (MEMORY.md/USER.md) is always accepted. Only ONE
external plugin provider is allowed at a time. This prevents tool
schema bloat (some providers add 3-5 tools each) and conflicting
memory backends.

The warning message directs users to configure memory.provider in
config.yaml to select which provider to activate.

Updated all 47 tests to use builtin + one external pattern instead
of multiple externals. Added test_second_external_rejected to verify
the enforcement.

* feat(memory): add ByteRover memory provider plugin

Implements the ByteRover integration (from PR #3499 by hieuntg81) as a
MemoryProvider plugin instead of direct run_agent.py modifications.

ByteRover provides persistent memory via the brv CLI — a hierarchical
knowledge tree with tiered retrieval (fuzzy text then LLM-driven search).
Local-first with optional cloud sync.

Plugin capabilities:
- prefetch: background brv query for relevant context
- sync_turn: curate conversation turns (threaded, non-blocking)
- on_memory_write: mirror built-in memory writes to brv
- on_pre_compress: extract insights before context compression

Tools (3):
- brv_query: search the knowledge tree
- brv_curate: store facts/decisions/patterns
- brv_status: check CLI version and context tree state

Profile isolation: working directory at $HERMES_HOME/byterover/ (scoped
per profile). Binary resolution cached with thread-safe double-checked
locking. All write operations threaded to avoid blocking the agent
(curate can take 120s with LLM processing).

* fix(memory): thread remaining sync_turns, fix holographic, add config key

Plugin fixes:
- Hindsight: thread sync_turn (was blocking up to 30s via _run_in_thread)
- RetainDB: thread sync_turn (was blocking on HTTP POST)
- Both: shutdown now joins sync threads alongside prefetch threads

Holographic retrieval fixes:
- reason(): removed dead intersection_key computation (bundled but never
  used in scoring). Now reuses pre-computed entity_residuals directly,
  moved role_content encoding outside the inner loop.
- contradict(): added _MAX_CONTRADICT_FACTS=500 scaling guard. Above
  500 facts, only checks the most recently updated ones to avoid O(n^2)
  explosion (~125K comparisons at 500 is acceptable).

Config:
- Added memory.provider key to DEFAULT_CONFIG ("" = builtin only).
  No version bump needed (deep_merge handles new keys automatically).

* feat(memory): extract Honcho as a MemoryProvider plugin

Creates plugins/honcho-memory/ as a thin adapter over the existing
honcho_integration/ package. All 4 Honcho tools (profile, search,
context, conclude) move from the normal tool registry to the
MemoryProvider interface.

The plugin delegates all work to HonchoSessionManager — no Honcho
logic is reimplemented. It uses the existing config chain:
$HERMES_HOME/honcho.json -> ~/.honcho/config.json -> env vars.

Lifecycle hooks:
- initialize: creates HonchoSessionManager via existing client factory
- prefetch: background dialectic query
- sync_turn: records messages + flushes to API (threaded)
- on_memory_write: mirrors user profile writes as conclusions
- on_session_end: flushes all pending messages

This is a prerequisite for the MemoryManager wiring in run_agent.py.
Once wired, Honcho goes through the same provider interface as all
other memory plugins, and the scattered Honcho code in run_agent.py
can be consolidated into the single MemoryManager integration point.

* feat(memory): wire MemoryManager into run_agent.py

Adds 8 integration points for the external memory provider plugin,
all purely additive (zero existing code modified):

1. Init (~L1130): Create MemoryManager, find matching plugin provider
   from memory.provider config, initialize with session context
2. Tool injection (~L1160): Append provider tool schemas to self.tools
   and self.valid_tool_names after memory_manager init
3. System prompt (~L2705): Add external provider's system_prompt_block
   alongside existing MEMORY.md/USER.md blocks
4. Tool routing (~L5362): Route provider tool calls through
   memory_manager.handle_tool_call() before the catchall handler
5. Memory write bridge (~L5353): Notify external provider via
   on_memory_write() when the built-in memory tool writes
6. Pre-compress (~L5233): Call on_pre_compress() before context
   compression discards messages
7. Prefetch (~L6421): Inject provider prefetch results into the
   current-turn user message (same pattern as Honcho turn context)
8. Turn sync + session end (~L8161, ~L8172): sync_all() after each
   completed turn, queue_prefetch_all() for next turn, on_session_end()
   + shutdown_all() at conversation end

All hooks are wrapped in try/except — a failing provider never breaks
the agent. The existing memory system, Honcho integration, and all
other code paths are completely untouched.

Full suite: 7222 passed, 4 pre-existing failures.

* refactor(memory): remove legacy Honcho integration from core

Extracts all Honcho-specific code from run_agent.py, model_tools.py,
toolsets.py, and gateway/run.py. Honcho is now exclusively available
as a memory provider plugin (plugins/honcho-memory/).

Removed from run_agent.py (-457 lines):
- Honcho init block (session manager creation, activation, config)
- 8 Honcho methods: _honcho_should_activate, _strip_honcho_tools,
  _activate_honcho, _register_honcho_exit_hook, _queue_honcho_prefetch,
  _honcho_prefetch, _honcho_save_user_observation, _honcho_sync
- _inject_honcho_turn_context module-level function
- Honcho system prompt block (tool descriptions, CLI commands)
- Honcho context injection in api_messages building
- Honcho params from __init__ (honcho_session_key, honcho_manager,
  honcho_config)
- HONCHO_TOOL_NAMES constant
- All honcho-specific tool dispatch forwarding

Removed from other files:
- model_tools.py: honcho_tools import, honcho params from handle_function_call
- toolsets.py: honcho toolset definition, honcho tools from core tools list
- gateway/run.py: honcho params from AIAgent constructor calls

Removed tests (-339 lines):
- 9 Honcho-specific test methods from test_run_agent.py
- TestHonchoAtexitFlush class from test_exit_cleanup_interrupt.py

Restored two regex constants (_SURROGATE_RE, _BUDGET_WARNING_RE) that
were accidentally removed during the honcho function extraction.

The honcho_integration/ package is kept intact — the plugin delegates
to it. tools/honcho_tools.py registry entries are now dead code (import
commented out in model_tools.py) but the file is preserved for reference.

Full suite: 7207 passed, 4 pre-existing failures. Zero regressions.

* refactor(memory): restructure plugins, add CLI, clean gateway, migration notice

Plugin restructure:
- Move all memory plugins from plugins/<name>-memory/ to plugins/memory/<name>/
  (byterover, hindsight, holographic, honcho, mem0, openviking, retaindb)
- New plugins/memory/__init__.py discovery module that scans the directory
  directly, loading providers by name without the general plugin system
- run_agent.py uses load_memory_provider() instead of get_plugin_memory_providers()

CLI wiring:
- hermes memory setup — interactive curses picker + config wizard
- hermes memory status — show active provider, config, availability
- hermes memory off — disable external provider (built-in only)
- hermes honcho — now shows migration notice pointing to hermes memory setup

Gateway cleanup:
- Remove _get_or_create_gateway_honcho (already removed in prev commit)
- Remove _shutdown_gateway_honcho and _shutdown_all_gateway_honcho methods
- Remove all calls to shutdown methods (4 call sites)
- Remove _honcho_managers/_honcho_configs dict references

Dead code removal:
- Delete tools/honcho_tools.py (279 lines, import was already commented out)
- Delete tests/gateway/test_honcho_lifecycle.py (131 lines, tested removed methods)
- Remove if False placeholder from run_agent.py

Migration:
- Honcho migration notice on startup: detects existing honcho.json or
  ~/.honcho/config.json, prints guidance to run hermes memory setup.
  Only fires when memory.provider is not set and not in quiet mode.

Full suite: 7203 passed, 4 pre-existing failures. Zero regressions.

* feat(memory): standardize plugin config + add per-plugin documentation

Config architecture:
- Add save_config(values, hermes_home) to MemoryProvider ABC
- Honcho: writes to $HERMES_HOME/honcho.json (SDK native)
- Mem0: writes to $HERMES_HOME/mem0.json
- Hindsight: writes to $HERMES_HOME/hindsight/config.json
- Holographic: writes to config.yaml under plugins.hermes-memory-store
- OpenViking/RetainDB/ByteRover: env-var only (default no-op)

Setup wizard (hermes memory setup):
- Now calls provider.save_config() for non-secret config
- Secrets still go to .env via env vars
- Only memory.provider activation key goes to config.yaml

Documentation:
- README.md for each of the 7 providers in plugins/memory/<name>/
- Requirements, setup (wizard + manual), config reference, tools table
- Consistent format across all providers

The contract for new memory plugins:
- get_config_schema() declares all fields (REQUIRED)
- save_config() writes native config (REQUIRED if not env-var-only)
- Secrets use env_var field in schema, written to .env by wizard
- README.md in the plugin directory

* docs: add memory providers user guide + developer guide

New pages:
- user-guide/features/memory-providers.md — comprehensive guide covering
  all 7 shipped providers (Honcho, OpenViking, Mem0, Hindsight,
  Holographic, RetainDB, ByteRover). Each with setup, config, tools,
  cost, and unique features. Includes comparison table and profile
  isolation notes.
- developer-guide/memory-provider-plugin.md — how to build a new memory
  provider plugin. Covers ABC, required methods, config schema,
  save_config, threading contract, profile isolation, testing.

Updated pages:
- user-guide/features/memory.md — replaced Honcho section with link to
  new Memory Providers page
- user-guide/features/honcho.md — replaced with migration redirect to
  the new Memory Providers page
- sidebars.ts — added both new pages to navigation

* fix(memory): auto-migrate Honcho users to memory provider plugin

When honcho.json or ~/.honcho/config.json exists but memory.provider
is not set, automatically set memory.provider: honcho in config.yaml
and activate the plugin. The plugin reads the same config files, so
all data and credentials are preserved. Zero user action needed.

Persists the migration to config.yaml so it only fires once. Prints
a one-line confirmation in non-quiet mode.

* fix(memory): only auto-migrate Honcho when enabled + credentialed

Check HonchoClientConfig.enabled AND (api_key OR base_url) before
auto-migrating — not just file existence. Prevents false activation
for users who disabled Honcho, stopped using it (config lingers),
or have ~/.honcho/ from a different tool.

* feat(memory): auto-install pip dependencies during hermes memory setup

Reads pip_dependencies from plugin.yaml, checks which are missing,
installs them via pip before config walkthrough. Also shows install
guidance for external_dependencies (e.g. brv CLI for ByteRover).

Updated all 7 plugin.yaml files with pip_dependencies:
- honcho: honcho-ai
- mem0: mem0ai
- openviking: httpx
- hindsight: hindsight-client
- holographic: (none)
- retaindb: requests
- byterover: (external_dependencies for brv CLI)

* fix: remove remaining Honcho crash risks from cli.py and gateway

cli.py: removed Honcho session re-mapping block (would crash importing
deleted tools/honcho_tools.py), Honcho flush on compress, Honcho
session display on startup, Honcho shutdown on exit, honcho_session_key
AIAgent param.

gateway/run.py: removed honcho_session_key params from helper methods,
sync_honcho param, _honcho.shutdown() block.

tests: fixed test_cron_session_with_honcho_key_skipped (was passing
removed honcho_key param to _flush_memories_for_session).

* fix: include plugins/ in pyproject.toml package list

Without this, plugins/memory/ wouldn't be included in non-editable
installs. Hermes always runs from the repo checkout so this is belt-
and-suspenders, but prevents breakage if the install method changes.

* fix(memory): correct pip-to-import name mapping for dep checks

The heuristic dep.replace('-', '_') fails for packages where the pip
name differs from the import name: honcho-ai→honcho, mem0ai→mem0,
hindsight-client→hindsight_client. Added explicit mapping table so
hermes memory setup doesn't try to reinstall already-installed packages.

* chore: remove dead code from old plugin memory registration path

- hermes_cli/plugins.py: removed register_memory_provider(),
  _memory_providers list, get_plugin_memory_providers() — memory
  providers now use plugins/memory/ discovery, not the general plugin system
- hermes_cli/main.py: stripped 74 lines of dead honcho argparse
  subparsers (setup, status, sessions, map, peer, mode, tokens,
  identity, migrate) — kept only the migration redirect
- agent/memory_provider.py: updated docstring to reflect new
  registration path
- tests: replaced TestPluginMemoryProviderRegistration with
  TestPluginMemoryDiscovery that tests the actual plugins/memory/
  discovery system. Added 3 new tests (discover, load, nonexistent).

* chore: delete dead honcho_integration/cli.py and its tests

cli.py (794 lines) was the old 'hermes honcho' command handler — nobody
calls it since cmd_honcho was replaced with a migration redirect.

Deleted tests that imported from removed code:
- tests/honcho_integration/test_cli.py (tested _resolve_api_key)
- tests/honcho_integration/test_config_isolation.py (tested CLI config paths)
- tests/tools/test_honcho_tools.py (tested the deleted tools/honcho_tools.py)

Remaining honcho_integration/ files (actively used by the plugin):
- client.py (445 lines) — config loading, SDK client creation
- session.py (991 lines) — session management, queries, flush

* refactor: move honcho_integration/ into the honcho plugin

Moves client.py (445 lines) and session.py (991 lines) from the
top-level honcho_integration/ package into plugins/memory/honcho/.
No Honcho code remains in the main codebase.

- plugins/memory/honcho/client.py — config loading, SDK client creation
- plugins/memory/honcho/session.py — session management, queries, flush
- Updated all imports: run_agent.py (auto-migration), hermes_cli/doctor.py,
  plugin __init__.py, session.py cross-import, all tests
- Removed honcho_integration/ package and pyproject.toml entry
- Renamed tests/honcho_integration/ → tests/honcho_plugin/

* docs: update architecture + gateway-internals for memory provider system

- architecture.md: replaced honcho_integration/ with plugins/memory/
- gateway-internals.md: replaced Honcho-specific session routing and
  flush lifecycle docs with generic memory provider interface docs

* fix: update stale mock path for resolve_active_host after honcho plugin migration

* fix(memory): address review feedback — P0 lifecycle, ABC contract, honcho CLI restore

Review feedback from Honcho devs (erosika):

P0 — Provider lifecycle:
- Remove on_session_end() + shutdown_all() from run_conversation() tail
  (was killing providers after every turn in multi-turn sessions)
- Add shutdown_memory_provider() method on AIAgent for callers
- Wire shutdown into CLI atexit, reset_conversation, gateway stop/expiry

Bug fixes:
- Remove sync_honcho=False kwarg from /btw callsites (TypeError crash)
- Fix doctor.py references to dead 'hermes honcho setup' command
- Cache prefetch_all() before tool loop (was re-calling every iteration)

ABC contract hardening (all backwards-compatible):
- Add session_id kwarg to prefetch/sync_turn/queue_prefetch
- Make on_pre_compress() return str (provider insights in compression)
- Add **kwargs to on_turn_start() for runtime context
- Add on_delegation() hook for parent-side subagent observation
- Document agent_context/agent_identity/agent_workspace kwargs on
  initialize() (prevents cron corruption, enables profile scoping)
- Fix docstring: single external provider, not multiple

Honcho CLI restoration:
- Add plugins/memory/honcho/cli.py (from main's honcho_integration/cli.py
  with imports adapted to plugin path)
- Restore full hermes honcho command with all subcommands (status, peer,
  mode, tokens, identity, enable/disable, sync, peers, --target-profile)
- Restore auto-clone on profile creation + sync on hermes update
- hermes honcho setup now redirects to hermes memory setup

* fix(memory): wire on_delegation, skip_memory for cron/flush, fix ByteRover return type

- Wire on_delegation() in delegate_tool.py — parent's memory provider
  is notified with task+result after each subagent completes
- Add skip_memory=True to cron scheduler (prevents cron system prompts
  from corrupting user representations — closes #4052)
- Add skip_memory=True to gateway flush agent (throwaway agent shouldn't
  activate memory provider)
- Fix ByteRover on_pre_compress() return type: None -> str

* fix(honcho): port profile isolation fixes from PR #4632

Ports 5 bug fixes found during profile testing (erosika's PR #4632):

1. 3-tier config resolution — resolve_config_path() now checks
   $HERMES_HOME/honcho.json → ~/.hermes/honcho.json → ~/.honcho/config.json
   (non-default profiles couldn't find shared host blocks)

2. Thread host=_host_key() through from_global_config() in cmd_setup,
   cmd_status, cmd_identity (--target-profile was being ignored)

3. Use bare profile name as aiPeer (not host key with dots) — Honcho's
   peer ID pattern is ^[a-zA-Z0-9_-]+$, dots are invalid

4. Wrap add_peers() in try/except — was fatal on new AI peers, killed
   all message uploads for the session

5. Gate Honcho clone behind --clone/--clone-all on profile create
   (bare create should be blank-slate)

Also: sanitize assistant_peer_id via _sanitize_id()

* fix(tests): add module cleanup fixture to test_cli_provider_resolution

test_cli_provider_resolution._import_cli() wipes tools.*, cli, and
run_agent from sys.modules to force fresh imports, but had no cleanup.
This poisoned all subsequent tests on the same xdist worker — mocks
targeting tools.file_tools, tools.send_message_tool, etc. patched the
NEW module object while already-imported functions still referenced
the OLD one. Caused ~25 cascade failures: send_message KeyError,
process_registry FileNotFoundError, file_read_guards timeouts,
read_loop_detection file-not-found, mcp_oauth None port, and
provider_parity/codex_execution stale tool lists.

Fix: autouse fixture saves all affected modules before each test and
restores them after, matching the pattern in
test_managed_browserbase_and_modal.py.
2026-04-02 15:33:51 -07:00
Teknium
e0b2bdb089 fix: webhook platform support — skip home channel prompt, disable tool progress (salvage #4363) (#4660)
Cherry-picked from PR #4363 by @bennyhodl with follow-up fixes:

- Skip 'No home channel' prompt for webhook platform (webhooks deliver
  to configured targets, not a home channel)
- Disable tool progress for webhooks (no message editing support)
- Add webhook to PLATFORMS in tools_config.py and skills_config.py
- Add hermes-webhook toolset to toolsets.py + hermes-gateway includes
- Removed overly aggressive <50 char content filter that blocked
  legitimate short responses (tool progress already handled at source)

Co-authored-by: bennyhodl <bennyhodl@users.noreply.github.com>
2026-04-02 14:00:22 -07:00
SHL0MS
6d68fbf756 Merge pull request #4654 from SHL0MS/skill/research-paper-writing
Replace ml-paper-writing with research-paper-writing: full end-to-end research pipeline
2026-04-02 13:24:12 -07:00
SHL0MS
b86647c295 Replace ml-paper-writing with research-paper-writing: full research pipeline skill
Replaces the writing-focused ml-paper-writing skill (940 lines) with a
complete end-to-end research paper pipeline (1,599 lines SKILL.md + 3,184
lines across 7 reference files).

New content:
- Full 8-phase pipeline: project setup, literature review, experiment
  design, execution/monitoring, analysis, paper drafting, review/revision,
  submission preparation
- Iterative refinement strategy guide from autoreason research (when to use
  autoreason vs critique-and-revise vs single-pass, model selection)
- Hermes agent integration: delegate_task parallel drafting, cronjob
  monitoring, memory/todo state management, skill composition
- Professional LaTeX tooling: microtype, siunitx, TikZ diagram patterns,
  algorithm2e, subcaption, latexdiff, SciencePlots
- Human evaluation design: annotation protocols, inter-annotator agreement,
  crowdsourcing platforms
- Title, Figure 1, conclusion, appendix strategy, page budget management
- Anonymization checklist, rebuttal writing, camera-ready preparation
- AAAI and COLM venue coverage (checklists, reviewer guidelines)

Preserved from ml-paper-writing:
- All writing philosophy (Nanda, Farquhar, Gopen & Swan, Lipton, Perez)
- Citation verification workflow (5-step mandatory process)
- All 6 conference templates (NeurIPS, ICML, ICLR, ACL, AAAI, COLM)
- Conference requirements, format conversion workflow
- Proactivity/collaboration guidance

Bug fixes in inherited reference files:
- BibLaTeX recommendation now correctly says natbib for conferences
- Bare except clauses fixed to except Exception
- Jinja2 template tags removed from citation-workflow.md
- Stale date caveats added to reviewer-guidelines.md
2026-04-02 16:13:26 -04:00
Teknium
798a7b99e4 docs: add Configuration Options section to Slack docs (#4644)
* docs: add Configuration Options section to Slack docs

Documents all config.yaml options for the Slack bot:
- Thread & reply behavior (reply_to_mode, reply_broadcast)
- Session isolation (group_sessions_per_user)
- Mention & trigger behavior (require_mention, mention_patterns, reply_prefix)
- Unauthorized user handling (unauthorized_dm_behavior)
- Voice transcription (stt_enabled)
- Full example config showing all options together

Includes a note about Slack's hardcoded @mention requirement in channels
(no free_response_channels equivalent like Discord/Telegram).

* docs: consolidate reply_in_thread into Configuration Options section

Folds the standalone Reply Threading subsection from PR #4643 into
the Thread & Reply Behavior subsection, keeping all config options
in one place. Adds reply_in_thread to the table and full example.
2026-04-02 12:38:13 -07:00
kshitijk4poor
d2b08406a4 fix(agent): classify think-only empty responses before retrying 2026-04-02 12:29:18 -07:00
Teknium
241cbeeccd docs: add reply_in_thread config to Slack docs 2026-04-02 12:18:40 -07:00
Animesh Mishra
b9a968c1de feat(slack): add reply_in_thread config option
By default, Hermes always threads replies to channel messages. Teams
that prefer direct channel replies had no way to opt out without
patching the source.

Add a reply_in_thread option (default: true) to the Slack platform
extra config:

  platforms:
    slack:
      extra:
        reply_in_thread: false

When false, _resolve_thread_ts() returns None for top-level channel
messages, so replies go directly to the channel. Messages already
inside an existing thread are still replied in-thread to preserve
conversation context. Default is true for full backward compatibility.
2026-04-02 12:18:40 -07:00
Teknium
d89cc7fec1 feat(prompt): add Google model operational guidance for Gemini and Gemma (#4641)
Adapted from OpenCode's gemini.txt. Gemini and Gemma models now get
structured operational directives alongside tool-use enforcement:
absolute paths, verify-before-edit, dependency checks, conciseness,
parallel tool calls, non-interactive flags, autonomous execution.

Based on PR #4026, extended to cover Gemma models.
2026-04-02 11:52:34 -07:00
Teknium
3186668799 feat: per-turn primary runtime restoration and transport recovery (#4624)
Makes provider fallback turn-scoped in long-lived CLI sessions. Previously, a single transient failure pinned the session to the fallback provider for every subsequent turn.

- _primary_runtime dict snapshot at __init__ (model, provider, base_url, api_mode, client_kwargs, compressor state)
- _restore_primary_runtime() at top of run_conversation() — restores all state, resets fallback chain index
- _try_recover_primary_transport() — one extra recovery cycle (client rebuild + cooldown) for transient transport errors on direct endpoints before fallback
- Skipped for aggregator providers (OpenRouter, Nous)
- 25 tests

Inspired by #4612 (@betamod). Closes #4612.
2026-04-02 10:52:01 -07:00
Teknium
918d593544 chore: gitignore generated skills.json
Follow-up to #4500 — the extraction script generates this file at
build time, so it should not be committed.
2026-04-02 10:48:15 -07:00
Nacho Avecilla
b8dd059c40 feat(website): add skills browse and search page to docs (#4500)
Adds a Skills Hub page to the documentation site with browsable/searchable catalog of all skills (built-in, optional, and community from cached hub indexes).

- Python extraction script (website/scripts/extract-skills.py) parses SKILL.md frontmatter and hub index caches into skills.json
- React page (website/src/pages/skills/) with search, category filtering, source filtering, and expandable skill cards
- CI workflow updated to run extraction before Docusaurus build
- Deploy trigger expanded to include skills/ and optional-skills/ changes

Authored by @IAvecilla
2026-04-02 10:47:38 -07:00
kshitijk4poor
20441cf2c8 fix(insights): persist token usage for non-CLI sessions 2026-04-02 10:47:13 -07:00
Teknium
585855d2ca fix: preserve Anthropic thinking block signatures across tool-use turns
Anthropic extended thinking blocks include an opaque 'signature' field
required for thinking chain continuity across multi-turn tool-use
conversations. Previously, normalize_anthropic_response() extracted
only the thinking text and set reasoning_details=None, discarding the
signature. On subsequent turns the API could not verify the chain.

Changes:
- _to_plain_data(): new recursive SDK-to-dict converter with depth cap
  (20 levels) and path-based cycle detection for safety
- _extract_preserved_thinking_blocks(): rehydrates preserved thinking
  blocks (including signature) from reasoning_details on assistant
  messages, placing them before tool_use blocks as Anthropic requires
- normalize_anthropic_response(): stores full thinking blocks in
  reasoning_details via _to_plain_data()
- _extract_reasoning(): adds 'thinking' key to the detail lookup chain
  so Anthropic-format details are found alongside OpenRouter format

Salvaged from PR #4503 by @priveperfumes — focused on the thinking
block continuity fix only (cache strategy and other changes excluded).
2026-04-02 10:30:32 -07:00
Teknium
28a073edc6 fix: repair OpenCode model routing and selection (#4508)
OpenCode Zen and Go are mixed-API-surface providers — different models
behind them use different API surfaces (GPT on Zen uses codex_responses,
Claude on Zen uses anthropic_messages, MiniMax on Go uses
anthropic_messages, GLM/Kimi on Go use chat_completions).

Changes:
- Add normalize_opencode_model_id() and opencode_model_api_mode() to
  models.py for model ID normalization and API surface routing
- Add _provider_supports_explicit_api_mode() to runtime_provider.py
  to prevent stale api_mode from leaking across provider switches
- Wire opencode routing into all three api_mode resolution paths:
  pool entry, api_key provider, and explicit runtime
- Add api_mode field to ModelSwitchResult for propagation through the
  switch pipeline
- Consolidate _PROVIDER_MODELS from main.py into models.py (single
  source of truth, eliminates duplicate dict)
- Add opencode normalization to setup wizard and model picker flows
- Add opencode block to _normalize_model_for_provider in CLI
- Add opencode-zen/go fallback model lists to setup.py

Tests: 160 targeted tests pass (26 new tests covering normalization,
api_mode routing per provider/model, persistence, and setup wizard
normalization).

Based on PR #3017 by SaM13997.

Co-authored-by: SaM13997 <139419381+SaM13997@users.noreply.github.com>
2026-04-02 09:36:24 -07:00
Devorun
f4f64c413f fix(cli): ensure zero exit code on successful quiet mode queries (#4601) 2026-04-02 09:33:31 -07:00
Teknium
8dc5b11e95 fix(honcho): remove redundant local HOST import in _all_profile_host_configs
HOST is already imported at module level from honcho_integration.client.
The local import inside _all_profile_host_configs() was unnecessary.
2026-04-02 09:25:16 -07:00
Erosika
37d73d94bb fix: patch _local_config_path in tests for write isolation 2026-04-02 09:25:16 -07:00
Erosika
a0eae33248 fix(honcho): address PR review findings
- Remove duplicate cmd_sync definition (kept version with error output)
- Fix from_env workspace to stay shared (hermes) not profile-derived
- Add docstring clarifying get_or_create is idempotent in status
- Remove unused import importlib in test
- Fix test assertion for shared workspace in from_env path
- Add 3 tests for sync_honcho_profiles_quiet
2026-04-02 09:25:16 -07:00
Erosika
c146631e3b feat(honcho): sync command + auto-sync on hermes update
- hermes honcho sync: scan all profiles, create missing host blocks
- hermes update: automatically syncs Honcho config to all profiles
  after skill sync (existing users get profile mapping on next update)
- sync_honcho_profiles_quiet() for silent use from update path
2026-04-02 09:25:16 -07:00
Erosika
89eab74c67 feat(honcho): --target-profile flag + peer card display in status
- hermes honcho --target-profile <name> <command>: target another
  profile's Honcho config without switching profiles. Works with all
  subcommands (status, peer, mode, tokens, enable, disable, etc.)
- hermes honcho status now shows user peer card and AI peer
  representation when connected (fetched live from Honcho API)
2026-04-02 09:25:16 -07:00
Erosika
5f6bf2a473 fix(honcho): share workspace across profiles by default
Profiles inherit the default workspace instead of deriving a separate
one. All profiles see the same user context, sessions, and project
history. Each profile is a different AI peer in a shared space.

Workspace can still be overridden per-profile via config if isolation
is needed.
2026-04-02 09:25:16 -07:00
Erosika
f27da5fe8e fix(honcho): remove linkedHosts from peers table 2026-04-02 09:25:16 -07:00
Erosika
0e90df1216 feat(honcho): eager peer creation + enable/disable per profile
- Eagerly create AI and user peers in Honcho when a profile is created
  (not deferred to first message). Uses idempotent peer() SDK call.
- hermes honcho enable: turn on Honcho for active profile, clone
  settings from default if first time, create peer immediately
- hermes honcho disable: turn off Honcho for active profile
- _ensure_peer_exists() helper for idempotent peer creation
2026-04-02 09:25:16 -07:00
Erosika
37458e72a2 feat(honcho): auto-clone config to new profiles on creation
When a profile is created and Honcho is already configured on the
default host, automatically creates a host block for the new profile
with inherited settings (memory mode, recall mode, write frequency,
peer name, etc.) and auto-derived workspace/aiPeer.

Zero-friction path: hermes profile create coder -> Honcho config
cloned as hermes.coder with all settings inherited.
2026-04-02 09:25:16 -07:00
Erosika
d1189f2be9 feat(honcho): add cross-profile observability for Honcho integration
- hermes honcho status: shows active profile name + host key
- hermes honcho status --all: compact table of all profiles with mode,
  recall, write frequency per host block
- hermes honcho peers: cross-profile peer identity table (user peer,
  AI peer, linked hosts)
- All write commands (peer, mode, tokens) print [host_key] label when
  operating on a non-default profile
2026-04-02 09:25:16 -07:00
Erosika
18c156af8e feat(honcho): scope host and peer resolution to active Hermes profile
Derives the Honcho host key from the active Hermes profile so that each
profile gets its own Honcho host block, workspace, and AI peer identity.

Profile "coder" resolves to host "hermes.coder", reads from
hosts["hermes.coder"] in honcho.json, and defaults workspace + aiPeer
to the derived host name.

Resolution order: HERMES_HONCHO_HOST env var > active profile name >
"hermes" (default).

Complements #3681 (profiles) with the Honcho identity layer that was
part of #2845 (named instances), adapted to the merged profiles system.
2026-04-02 09:25:16 -07:00
Teknium
661a1b0ba2 fix: exclude matrix from [all] extras — python-olm is upstream-broken (#4615)
python-olm (required by matrix-nio[e2e]) fails to build on modern macOS:
- CMake 4 rejects vendored libolm's cmake_minimum_required(VERSION 3.4)
- Apple Clang 21+ rejects a C++ type error in include/olm/list.hh
- Upstream libolm repo is archived, no fix forthcoming

Including matrix in [all] causes the entire extras install to fail during
`hermes update`, silently dropping all other extras (telegram, discord,
slack, cron, etc.) when the fallback kicks in.

The [matrix] extra is preserved for opt-in install:
  pip install 'hermes-agent[matrix]'

Closes #4178
2026-04-02 09:21:37 -07:00
Teknium
acea9ee20b fix(tests): fix 11 real test failures + major cascade poisoner (#4570)
Three root causes addressed:

1. AIAgent no longer defaults base_url to OpenRouter (9 tests)
   Tests that assert OpenRouter-specific behavior (prompt caching,
   reasoning extra_body, provider preferences) need explicit base_url
   and model set on the agent. Updated test_run_agent.py and
   test_provider_parity.py.

2. Credential pool auto-seeding from host env (2 tests)
   test_auxiliary_client.py tests for Anthropic OAuth and custom
   endpoint fallback were not mocking _select_pool_entry, so the
   host's credential pool interfered. Added pool + codex mocks.

3. sys.modules corruption cascade (major - ~250 tests)
   test_managed_modal_environment.py replaced sys.modules entries
   (tools, hermes_cli, agent packages) with SimpleNamespace stubs
   but had NO cleanup fixture. Every subsequent test in the process
   saw corrupted imports: 'cannot import get_config_path from
   <unknown module name>' and 'module tools has no attribute
   environments'. Added _restore_tool_and_agent_modules autouse
   fixture matching the pattern in test_managed_browserbase_and_modal.py.

   This was also the root cause of CI failures (104 failed on main).
2026-04-02 08:43:06 -07:00
Teknium
624ad582a5 fix: make gateway approval block agent thread like CLI does (#4557)
The gateway's dangerous command approval system was fundamentally broken:
the agent loop continued running after a command was flagged, and the
approval request only reached the user after the agent finished its
entire conversation loop. By then the context was lost.

This change makes the gateway approval mirror the CLI's synchronous
behavior. When a dangerous command is detected:

1. The agent thread blocks on a threading.Event
2. The approval request is sent to the user immediately
3. The user responds with /approve or /deny
4. The event is signaled and the agent resumes with the real result

The agent never sees 'approval_required' as a tool result. It either
gets the command output (approved) or a definitive BLOCKED message
(denied/timed out) — same as CLI mode.

Queue-based design supports multiple concurrent approvals (parallel
subagents via delegate_task, execute_code RPC handlers). Each approval
gets its own _ApprovalEntry with its own threading.Event. /approve
resolves the oldest (FIFO); /approve all resolves all at once.

Changes:
- tools/approval.py: Queue-based per-session blocking gateway approval
  (register/unregister callbacks, resolve with FIFO or all-at-once)
- gateway/run.py: Register approval callback in run_sync(), remove
  post-loop pop_pending hack, /approve and /deny support 'all' flag
- tests: 21 tests including parallel subagent E2E scenarios
2026-04-02 01:47:19 -07:00
Teknium
64584a931f cleanup: use _generate_session_key for parent key, fix trailing whitespace 2026-04-02 01:33:53 -07:00
Gary Chiu
8cb3596939 fix(gateway): seed DM thread sessions with parent transcript to preserve context 2026-04-02 01:33:53 -07:00
kshitijk4poor
e94b4b2b40 fix: preserve allowed_users during setup reconfigure and quiet unconfigured provider warnings
Setup wizard now shows existing allowed_users when reconfiguring a
platform and preserves them if the user presses Enter. Previously the
wizard would display a misleading "No allowlist set" warning even when
the .env still held the original IDs.

Also downgrades the "provider X has no API key configured" log from
WARNING to DEBUG in resolve_provider_client — callers already handle
the None return with their own contextual messages. This eliminates
noisy startup warnings for providers in the fallback chain that the
user never configured (e.g. minimax).
2026-04-02 01:00:29 -07:00
Teknium
835defe074 fix: invalidate update cache for all profiles, not just current
hermes update only cleared .update_check for the active HERMES_HOME,
leaving other profiles showing stale 'N commits behind' in their banner.

Now _invalidate_update_cache() iterates over ~/.hermes/ (default) plus
every directory under ~/.hermes/profiles/ to clear all caches. The git
repo is shared across profiles so a single update brings them all current.

Reported by SteveSkedasticity on Discord.
2026-04-02 00:49:17 -07:00
Teknium
e4db72ef39 fix: merge dotted+hyphenated FTS5 quoting into single pass
The original PR applied dotted and hyphenated regex quoting in two
sequential steps.  For terms with both dots and hyphens (e.g.
my-app.config.ts), step 2 would re-match inside already-quoted output,
producing malformed double-quoted FTS5 syntax.

Merged into a single regex pass: \w+(?:[.-]\w+)+ — handles dots,
hyphens, and mixed terms in one shot.  Added test coverage for the
mixed case.
2026-04-02 00:49:11 -07:00
Lume
9825cd7b1e fix(state): quote dotted terms in FTS5 queries
FTS5 queries containing dots (e.g. P2.2, simulate.p2.test.ts) can trigger query parse edge cases that yield OperationalError or empty results unless quoted. Extend _sanitize_fts5_query to wrap dotted tokens in double quotes (similar to hyphenated terms) and add regression tests.
2026-04-02 00:49:11 -07:00
Roland Parnaso
c4e626b1fa refactor: extract _detect_file_drop() + add 28 tests
Extract the inline file-drop detection logic into a standalone
_detect_file_drop() function at module level for testability. The main
loop now calls this function instead of inlining the logic.

Tests cover:
- Slash commands still route correctly (/help, /quit, /xyz)
- Image paths auto-detected (.png, .jpg, .gif, etc.)
- Non-image files detected (.py, .txt, Makefile, etc.)
- Backslash-escaped spaces from macOS drag-and-drop
- Trailing user text preserved as remainder
- Edge cases: directories, symlinks, no-extension files
- Non-string input, empty strings, nonexistent paths
2026-04-02 00:40:27 -07:00
Roland Parnaso
1841886898 fix(cli): detect dragged file paths instead of treating them as slash commands
When a user drags a file into the terminal, macOS pastes the absolute
path (e.g. /Users/roland/Desktop/Screenshot.png) which starts with '/'
and was incorrectly routed to process_command(), producing an 'Unknown
command' error.

This change adds file-path detection before the slash-command check:
- Parses the first token, handling backslash-escaped spaces from macOS
- Checks if the path exists as a real file via Path.exists()
- Image files (.png, .jpg, etc.) are auto-attached to the message
- Non-image files are reformatted as [User attached file: ...] context
- Falls through to normal slash-command handling if not a real file path
2026-04-02 00:40:27 -07:00
Teknium
f4bc6aa856 fix: scope extras retry to [all] group only
_load_installable_optional_extras() was returning ALL extras from
pyproject.toml except 'all', which included 'rl' and 'yc-bench' —
extras not referenced by [all] that install heavy research deps
(atroposlib, tinker, wandb) from git repos. Changed to parse the
[all] group's references and only retry those 18 extras.

Also moved tomllib import to function-level since it only runs
during the rare fallback path.
2026-04-02 00:40:07 -07:00
kshitijk4poor
c91f4ef4ed fix(update): preserve optional extras during fallback install 2026-04-02 00:40:07 -07:00
Ben Barclay
5101f853ba Merge pull request #3287 from NousResearch/rewbs/tool-use-charge-to-subscription 2026-04-01 18:42:47 -07:00
Hermes Agent
a0f5fc2570 fix(tools): add debug logging for token refresh and tighten domain check
- Add logger + debug log to read_nous_access_token() catch-all so token
  refresh failures are observable instead of silently swallowed
- Tighten _is_nous_auxiliary_client() domain check to use proper URL
  hostname parsing instead of substring match, preventing false-positives
  on domains like not-nousresearch.com or nousresearch.com.evil.com
2026-04-02 12:40:03 +11:00
Ben
647f99d4dd fix: resolve post-merge issues in auxiliary_client and model flow
- Add missing `from agent.credential_pool import load_pool` import to
  auxiliary_client.py (introduced by the credential pool feature in main)
- Thread `args` through `select_provider_and_model(args=None)` so TLS
  options from `cmd_model` reach `_model_flow_nous`
- Mock `_require_tty` in test_cmd_model_forwards_nous_login_tls_options
  so it can run in non-interactive test environments

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 00:50:40 +00:00
Ben Barclay
a2e56d044b Merge branch 'main' into rewbs/tool-use-charge-to-subscription 2026-04-02 11:00:35 +11:00
pefontana
bd9e0b605f test(e2e): remove section separator comments 2026-04-01 15:23:52 -07:00
pefontana
99e6f44204 test(e2e): remove unused imports and duplicate fixtures 2026-04-01 15:23:52 -07:00
pefontana
1f1297f56c ci: merge e2e into tests workflow as separate job
Move e2e tests into tests.yml as a parallel job instead of a separate
workflow. Unit tests now also ignore tests/e2e/ to avoid running them
twice. Both jobs appear as independent checks in the PR.
2026-04-01 15:23:52 -07:00
pefontana
04e60cfacd test(e2e): add authorization, session lifecycle, and resilience tests
New test classes:
- TestSessionLifecycle: /new then /status sequence, idempotent resets
- TestAuthorization: unauthorized users get pairing code, not commands
- TestSendFailureResilience: pipeline survives send() failures

Additional command coverage: /provider, /verbose, /personality, /yolo.

Note: /provider test is xfail - found a real bug where model_cfg is
referenced unbound when config.yaml is absent (run.py:3247).
2026-04-01 15:23:52 -07:00
pefontana
ecd9bf2ca0 test(e2e): revert intentional failure after CI verification
CI correctly detected the broken assertion — e2e workflow works.
2026-04-01 15:23:52 -07:00
pefontana
b209dc0f43 test(e2e): add intentional failure to verify CI detection
Temporary commit — will be reverted after confirming CI catches it.
2026-04-01 15:23:52 -07:00
pefontana
67e1170b01 ci: add e2e test workflow
Separate workflow for gateway e2e tests, runs on push/PR to main.
Same Python 3.11 + uv setup as existing tests.yml but targets only
tests/e2e/ with verbose output.
2026-04-01 15:23:52 -07:00
pefontana
bff34b1df9 test(e2e): add telegram slash command e2e tests
Tests /help, /status, /new, /stop, /commands through the full adapter
background-task pipeline. Validates command dispatch, session lifecycle,
and response delivery without any LLM involvement.
2026-04-01 15:23:52 -07:00
pefontana
ba48cfe84a test(e2e): add telegram gateway e2e test infrastructure
Fixtures and helpers for driving messages through the full async
pipeline: adapter.handle_message → background task → GatewayRunner
command dispatch → adapter.send (mocked).

Uses the established _make_runner pattern (object.__new__) to skip
filesystem side effects while exercising real command dispatch logic.
2026-04-01 15:23:52 -07:00
Teknium
de9bba8d7c fix: remove hardcoded OpenRouter/opus defaults
No model, base_url, or provider is assumed when the user hasn't
configured one.  Previously the defaults dict in cli.py, AIAgent
constructor args, and several fallback paths all hardcoded
anthropic/claude-opus-4.6 + openrouter.ai/api/v1 — silently routing
unconfigured users to OpenRouter, which 404s for anyone using a
different provider.

Now empty defaults force the setup wizard to run, and existing users
who already completed setup are unaffected (their config.yaml has
the model they chose).

Files changed:
- cli.py: defaults dict, _DEFAULT_CONFIG_MODEL
- run_agent.py: AIAgent.__init__ defaults, main() defaults
- hermes_cli/config.py: DEFAULT_CONFIG
- hermes_cli/runtime_provider.py: is_fallback sentinel
- acp_adapter/session.py: default_model
- tests: updated to reflect empty defaults
2026-04-01 15:22:26 -07:00
Teknium
3628ccc8c4 feat: use 'developer' role for GPT-5 and Codex models (#4498)
OpenAI's newer models (GPT-5, Codex) give stronger instruction-following
weight to the 'developer' role vs 'system'. Swap the role at the API
boundary in _build_api_kwargs() for the chat_completions path so internal
message representation stays consistent ('system' everywhere).

Applies regardless of provider — OpenRouter, Nous portal, direct, etc.
The codex_responses path (direct OpenAI) uses 'instructions' instead of
message roles, so it's unaffected.

DEVELOPER_ROLE_MODELS constant in prompt_builder.py defines the matching
model name substrings: ('gpt-5', 'codex').
2026-04-01 14:49:32 -07:00
Teknium
c59ab8b0da fix: profile model.model promoted to model.default when default not set
When a profile config sets model.model but not model.default, the
hardcoded default (claude-opus-4.6) survived the config merge and
took precedence in HermesCLI.__init__ because it checks model.default
first. Profile model configs were silently ignored.

Now model.model is promoted to model.default during the merge when the
user didn't explicitly set model.default. Fixes #4486.
2026-04-01 13:46:18 -07:00
Teknium
16d9f58445 fix(gateway): persist memory flush state to prevent redundant re-flushes on restart (#4481)
* fix: force-close TCP sockets on client cleanup, detect and recover dead connections

When a provider drops connections mid-stream (e.g. OpenRouter outage),
httpx's graceful close leaves sockets in CLOSE-WAIT indefinitely. These
zombie connections accumulate and can prevent recovery without restarting.

Changes:
- _force_close_tcp_sockets: walks the httpx connection pool and issues
  socket.shutdown(SHUT_RDWR) + close() to force TCP RST on every socket
  when a client is closed, preventing CLOSE-WAIT accumulation
- _cleanup_dead_connections: probes the primary client's pool for dead
  sockets (recv MSG_PEEK), rebuilds the client if any are found
- Pre-turn health check at the start of each run_conversation call that
  auto-recovers with a user-facing status message
- Primary client rebuild after stale stream detection to purge pool
- User-facing messages on streaming connection failures:
  "Connection to provider dropped — Reconnecting (attempt 2/3)"
  "Connection failed after 3 attempts — try again in a moment"

Made-with: Cursor

* fix: pool entry missing base_url for openrouter, clean error messages

- _resolve_runtime_from_pool_entry: add OPENROUTER_BASE_URL fallback
  when pool entry has no runtime_base_url (pool entries from auth.json
  credential_pool often omit base_url)
- Replace Rich console.print for auth errors with plain print() to
  prevent ANSI escape code mangling through prompt_toolkit's stdout patch
- Force-close TCP sockets on client cleanup to prevent CLOSE-WAIT
  accumulation after provider outages
- Pre-turn dead connection detection with auto-recovery and user message
- Primary client rebuild after stale stream detection
- User-facing status messages on streaming connection failures/retries

Made-with: Cursor

* fix(gateway): persist memory flush state to prevent redundant re-flushes on restart

The _session_expiry_watcher tracked flushed sessions in an in-memory set
(_pre_flushed_sessions) that was lost on gateway restart. Expired sessions
remained in sessions.json and were re-discovered every restart, causing
redundant AIAgent runs that burned API credits and blocked the event loop.

Fix: Add a memory_flushed boolean field to SessionEntry, persisted in
sessions.json. The watcher sets it after a successful flush. On restart,
the flag survives and the watcher skips already-flushed sessions.

- Add memory_flushed field to SessionEntry with to_dict/from_dict support
- Old sessions.json entries without the field default to False (backward compat)
- Remove the ephemeral _pre_flushed_sessions set from SessionStore
- Update tests: save/load roundtrip, legacy entry compat, auto-reset behavior
2026-04-01 12:05:02 -07:00
Teknium
1515e8c8f2 fix: rewrite test mock secrets and add redaction fixture
The original test file had mock secrets corrupted by secret-redaction
tooling before commit — the test values (sk-ant...l012) didn't actually
trigger the PREFIX_RE regex, so 4 of 10 tests were asserting against
values that never appeared in the input.

- Replace truncated mock values with proper fake keys built via string
  concatenation (avoids tool redaction during file writes)
- Add _ensure_redaction_enabled autouse fixture to patch the module-level
  _REDACT_ENABLED constant, matching the pattern from test_redact.py
2026-04-01 12:03:56 -07:00
0xbyt4
127a4e512b security: redact secrets from auxiliary and vision LLM responses
LLM responses from browser snapshot extraction and vision analysis
could echo back secrets that appeared on screen or in page content.
Input redaction alone is insufficient — the LLM may reproduce secrets
it read from screenshots (which cannot be text-redacted).

Now redact outputs from:
- _extract_relevant_content (auxiliary LLM response)
- browser_vision (vision LLM response)
- camofox_vision (vision LLM response)
2026-04-01 12:03:56 -07:00
0xbyt4
712aa44325 security: block secret exfiltration via browser URLs and auxiliary LLM calls
Three exfiltration vectors closed:

1. Browser URL exfil — agent could embed secrets in URL params and
   navigate to attacker-controlled server. Now scans URLs for known
   API key patterns before navigating (browser_navigate, web_extract).

2. Browser snapshot leak — page displaying env vars or API keys would
   send secrets to auxiliary LLM via _extract_relevant_content before
   run_agent.py's redaction layer sees the result. Now redacts snapshot
   text before the auxiliary call.

3. Camofox annotation leak — accessibility tree text sent to vision
   LLM could contain secrets visible on screen. Now redacts annotation
   context before the vision call.

10 new tests covering URL blocking, snapshot redaction, and annotation
redaction for both browser and camofox backends.
2026-04-01 12:03:56 -07:00
Teknium
7e91009018 fix: lazy-init SessionDB on adapter instance instead of per-request
Reuse a single SessionDB across requests by caching on self._session_db
with lazy initialization. Avoids creating a new SQLite connection per
request when X-Hermes-Session-Id is used. Updated tests to set
adapter._session_db directly instead of patching the constructor.
2026-04-01 11:41:32 -07:00
txchen
bf19623a53 feat(api-server): support X-Hermes-Session-Id header for session continuity
Allow callers to pass X-Hermes-Session-Id in request headers to continue
an existing conversation. When provided, history is loaded from SessionDB
instead of the request body, and the session_id is echoed in the response
header. Without the header, existing behavior is preserved (new uuid per
request).

This enables web UI clients to maintain thread continuity without modifying
any session state themselves — the same mechanism the gateway uses for IM
platforms (Telegram, Discord, etc.).
2026-04-01 11:41:32 -07:00
Leegenux
3ff9e0101d fix(skill_utils): add type check for metadata field in extract_skill_conditions
When PyYAML is unavailable or YAML frontmatter is malformed, the fallback
parser may return metadata as a string instead of a dict. This causes
AttributeError when calling .get("hermes") on the string.

Added explicit type checks to handle cases where metadata or hermes fields
are not dicts, preventing the crash.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
2026-04-01 11:34:56 -07:00
Teknium
b267516851 fix: also exclude .env from default profile exports
The original PR excluded auth.json from _DEFAULT_EXPORT_EXCLUDE_ROOT and
filtered both auth.json and .env from named profile exports, but missed
adding .env to the default profile exclusion set. Default exports would
still leak .env containing API keys.

Added .env to _DEFAULT_EXPORT_EXCLUDE_ROOT, added test coverage, and
updated the existing test that incorrectly asserted .env presence.
2026-04-01 11:20:33 -07:00
dieutx
d435acc2c0 fix(security): exclude auth.json and .env from profile exports 2026-04-01 11:20:33 -07:00
Teknium
bacc86d031 fix: use RedactingFormatter on stderr handler, update types and test mock
- stderr handler now uses RedactingFormatter to match file handlers
- restart path uses verbose=0 (int) instead of verbose=False (bool)
- test mock updated with new run_gateway(verbose, quiet, replace) signature
2026-04-01 11:05:07 -07:00
Alan Justino
5bd01b838c fix(gateway): wire -v/-q flags to stderr logging
By default 'hermes gateway run' now prints WARNING+ to stderr so
connection errors and startup failures are visible in the terminal
without having to tail ~/.hermes/logs/gateway.log.

- gateway/run.py: start_gateway() accepts verbosity: Optional[int]=0.
  When not None, attaches a StreamHandler to stderr with level mapped
  from the count (0=WARNING, 1=INFO, 2+=DEBUG). Root logger level is
  also lowered when DEBUG is requested so records are not swallowed.

- hermes_cli/gateway.py: run_gateway() gains verbose: int and
  quiet: bool params. -q translates to verbosity=None (no stderr
  handler). Wired through gateway_command().

- hermes_cli/main.py: -v changed from store_true to action=count so
  -v/-vv/-vvv each increment the level. -q/--quiet added as a new flag.

Behaviour summary:
  hermes gateway run        -> WARNING+ on stderr (default)
  hermes gateway run -q     -> silent
  hermes gateway run -v     -> INFO+
  hermes gateway run -vv    -> DEBUG
2026-04-01 11:05:07 -07:00
analista
3400098481 fix: update fetch_transcript.py for youtube-transcript-api v1.x
The library removed the static get_transcript() method in v1.0.
Migrate to the new instance-based fetch() API and normalize
FetchedTranscriptSnippet objects back to dicts for compatibility
with the rest of the script.
2026-04-01 10:49:24 -07:00
Dean Kerr
e905768ffd fix(gateway): remap HERMES_HOME to target user in system service unit
When `sudo hermes gateway install --system --run-as-user <user>` generates
the systemd unit, get_hermes_home() resolves to /root/.hermes because
Path.home() returns root's home under sudo. The unit correctly sets
HOME= and User= via _system_service_identity(), but HERMES_HOME was
computed independently and pointed to root's config directory.

Add _hermes_home_for_target_user() which remaps the current HERMES_HOME
to the equivalent path under the target user's home. This handles:
- Default ~/.hermes → target user's ~/.hermes
- Profiles (e.g. ~/.hermes/profiles/coder) → preserves relative structure
- Custom paths (e.g. /opt/hermes) → kept as-is

Supersedes #3861 which only handled the default case and left profiles
broken (also flagged by Copilot review).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 06:09:33 -07:00
Teknium
e0abf2416d fix: restore _config_version to 11 (reverted by stale-branch merge in #4419) (#4440)
PR #4419 was based on pre-credential-pools main where _config_version was 10.
The squash merge downgraded it from 11 (set by #2647) back to 10.
Also fixes the test assertion.
2026-04-01 04:34:04 -07:00
Teknium
f6ada27d1c feat(skills): size limits for agent writes + fuzzy matching for patch (#4414)
* feat(skills): add content size limits for agent-created skills

Agent writes via skill_manage (create/edit/patch/write_file) are now
constrained to prevent unbounded growth:

- SKILL.md and supporting files: 100,000 character limit
- Supporting files: additional 1 MiB byte limit
- Patches on oversized hand-placed skills that reduce the size are
  allowed (shrink path), but patches that grow beyond the limit are
  rejected

Hand-placed skills and hub-installed skills have NO hard limit —
they load and function normally regardless of size. Hub installs
get a warning in the log if SKILL.md exceeds 100k chars.

This mirrors the memory system's char_limit pattern. Without this,
the agent auto-grows skills indefinitely through iterative patches
(hermes-agent-dev reached 197k chars / 72k tokens — 40x larger than
the largest skill in the entire skills.sh ecosystem).

Constants: MAX_SKILL_CONTENT_CHARS (100k), MAX_SKILL_FILE_BYTES (1MiB)
Tests: 14 new tests covering all write paths and edge cases

* feat(skills): add fuzzy matching to skill patch

_patch_skill now uses the same 8-strategy fuzzy matching engine
(tools/fuzzy_match.py) as the file patch tool. Handles whitespace
normalization, indentation differences, escape sequences, and
block-anchor matching. Eliminates exact-match failures when agents
patch skills with minor formatting mismatches.
2026-04-01 04:19:19 -07:00
Teknium
70744add15 feat(browser): add persistent Camofox sessions and VNC URL discovery (salvage #4400) (#4419)
Adds two Camofox features:

1. Persistent browser sessions: new `browser.camofox.managed_persistence`
   config option. When enabled, Hermes sends a deterministic profile-scoped
   userId to Camofox so the server maps it to a persistent browser profile
   directory. Cookies, logins, and browser state survive across restarts.
   Default remains ephemeral (random userId per session).

2. VNC URL discovery: Camofox /health endpoint returns vncPort when running
   in headed mode. Hermes constructs the VNC URL and includes it in navigate
   responses so the agent can share it with users.

Also fixes camofox_vision bug where call_llm response object was passed
directly to json.dumps instead of extracting .choices[0].message.content.

Changes from original PR:
- Removed browser_evaluate tool (separate feature, needs own PR)
- Removed snapshot truncation limit change (unrelated)
- Config.yaml only for managed_persistence (no env var, no version bump)
- Rewrote tests to use config mock instead of env var
- Reverted package-lock.json churn

Co-authored-by: analista <psikonetik@gmail.com.com>
2026-04-01 04:18:50 -07:00
Teknium
85e96a4638 fix(skills): move unified hermes-agent skill into autonomous-ai-agents category (#4435)
The unified skill from PR #4332 was placed at a top-level
skills/hermes-agent/ directory, creating a redundant standalone
category. Move it to skills/autonomous-ai-agents/hermes-agent/
alongside claude-code, codex, and opencode where it belongs.
2026-04-01 03:39:25 -07:00
Teknium
c9dc6c4749 fix(insights): show cache tokens in overview so total adds up (#4428)
The total_tokens field includes cache_read + cache_write tokens, but
the display only showed input + output — making the math look wrong
(e.g. 765K + 134K displayed but total said 9.2M). Now shows a cache
line when cache tokens are present so all visible numbers sum to the
displayed total.

Affects both terminal (hermes insights) and gateway (/insights)
formats.
2026-04-01 03:06:47 -07:00
kshitijk4poor
935137f0d9 feat: add inline diff previews for write actions
Show inline diffs in the CLI transcript when write_file, patch, or
skill_manage modifies files. Captures a filesystem snapshot before the
tool runs, computes a unified diff after, and renders it with ANSI
coloring in the activity feed.

Adds tool_start_callback and tool_complete_callback hooks to AIAgent
for pre/post tool execution notifications.

Also fixes _extract_parallel_scope_path to normalize relative paths
to absolute, preventing the parallel overlap detection from missing
conflicts when the same file is referenced with different path styles.

Gated by display.inline_diffs config option (default: true).

Based on PR #3774 by @kshitijk4poor.
2026-04-01 02:13:57 -07:00
Teknium
68fc4aec21 fix: comprehensive default profile export exclusions and import guard
- Add _DEFAULT_EXPORT_EXCLUDE_ROOT constant with 25+ entries to exclude
  from default profile exports: repo checkout (hermes-agent), worktrees,
  databases (state.db), caches, runtime state, logs, binaries
- Add _default_export_ignore() with root-level and universal exclusions
  (__pycache__, *.sock, *.tmp at any depth)
- Remove redundant shutil/tempfile imports from contributor's if-block
- Block import_profile() from accepting 'default' as target name with
  clear guidance to use --name
- Add 7 tests covering: archive creation, inclusion of profile data,
  exclusion of infrastructure, nested __pycache__ exclusion, import
  rejection without --name, import rejection with --name default,
  full export-import roundtrip with a different name

Addresses review feedback on PR #4370.
2026-04-01 01:43:51 -07:00
Devorun
f04977f45a fix(cli): support exporting the default root profile (#4366) 2026-04-01 01:43:51 -07:00
Teknium
996250d178 fix(cli): pin entire TUI to bottom of terminal on startup (#4412)
Replace the per-response padding from PR #4359 (which created a void
between short responses and the prompt) with a one-time initial scroll
at session start.  Prints terminal_height newlines before the banner so
the cursor starts at the bottom row — banner, responses, and prompt all
appear pinned to the bottom with empty space above, not below.

patch_stdout naturally keeps the prompt at the bottom from there, so
no per-response padding is needed.
2026-04-01 01:41:09 -07:00
Bartok9
afa75a6185 fix(client): handle is_closed as method in OpenAI SDK
The openai SDK's SyncAPIClient.is_closed is a method, not a property.
getattr(client, 'is_closed', False) returned the bound method object,
which is always truthy — causing _is_openai_client_closed() to report
all clients as closed and triggering unnecessary client recreation
(~100-200ms TCP+TLS overhead per API call).

Fix: check if is_closed is callable and call it, otherwise treat as bool.

Fixes #4377
Co-authored-by: Bartok9 <Bartok9@users.noreply.github.com>
2026-04-01 01:40:43 -07:00
Nick
9a581bba50 fix(gateway): resume agent after /approve executes blocked command
When a dangerous command was blocked and the user approved it via /approve,
the command was executed but the agent loop had already exited — the agent
never received the command output and the task died silently.

Now _handle_approve_command sends immediate feedback to the user, then
creates a synthetic continuation message with the command output and feeds
it through _handle_message so the agent picks up where it left off.

- Send command result to chat immediately via adapter.send()
- Create synthetic MessageEvent with command + output as context
- Spawn asyncio task to re-invoke agent via _handle_message
- Return None (feedback already sent directly)
- Add test for agent re-invocation after approval
- Update existing approval tests for new return behavior
2026-04-01 01:38:55 -07:00
Smyile
8327f7cc61 fix(docs): use compound selector instead of media query
Target the exact state that breaks: when .navbar-sidebar--show is active
on the same <nav> element. This preserves the blur on mobile when the
sidebar is closed, and only removes it when the sidebar is open.
2026-04-01 01:14:39 -07:00
Smyile
7baee0b023 fix(docs): restrict backdrop-filter to desktop to fix mobile sidebar
backdrop-filter on .navbar creates a new CSS stacking context that
hides .navbar-sidebar menu content on mobile (only the close button
is visible). Scope the blur effect to min-width: 997px so it only
applies on desktop where the sidebar is not rendered inside the navbar.

Ref: facebook/docusaurus#6996, facebook/docusaurus#6853
2026-04-01 01:14:39 -07:00
Teknium
efa327a998 fix: add missing provider attrs to cli_obj test fixture
_show_status() now references self.provider and self._provider_source,
added after the original PR was submitted.
2026-04-01 01:12:23 -07:00
Johannnnn506
9b99ea176e fix(cli): initialize ctx_len before compact banner path 2026-04-01 01:12:23 -07:00
Teknium
a7f7e87070 fix: preserve credential_pool through smart routing and defer eager fallback on 429 (#4361)
Three bugs prevented credential pool rotation from working when multiple
Codex OAuth tokens were configured:

1. credential_pool was dropped during smart model turn routing.
   resolve_turn_route() constructed runtime dicts without it, so the
   AIAgent was created without pool access. Fixed in smart_model_routing.py
   (no-route and fallback paths), cli.py, and gateway/run.py.

2. Eager fallback fired before pool rotation on 429. The rate-limit
   handler at line ~7180 switched to a fallback provider immediately,
   before _recover_with_credential_pool got a chance to rotate to the
   next credential. Now deferred when the pool still has credentials.

3. (Non-issue) Retry budget was reported as too small, but successful
   pool rotations already skip retry_count increment — no change needed.

Reported by community member Schinsly who identified all three root
causes and verified the fix locally with multiple Codex accounts.
2026-04-01 01:02:34 -07:00
Teknium
ef2ae3e48f fix(file_tools): refresh staleness timestamp after writes (#4390)
After a successful write_file or patch, update the stored read
timestamp to match the file's new modification time.  Without this,
consecutive edits by the same task (read → write → write) would
false-warn on the second write because the stored timestamp still
reflected the original read, not the first write.

Also renames the internal tracker key from 'file_mtimes' to
'read_timestamps' for clarity.
2026-04-01 00:50:08 -07:00
SHL0MS
83dec2b3ec fix: skip empty/whitespace text in Telegram send to prevent 400 errors
Telegram API returns HTTP 400 when sent whitespace-only or empty
text. Add a guard at the top of send() to silently succeed on
blank content instead of crashing.

Equivalent to OpenClaw #56620.
2026-03-31 19:10:26 -07:00
Laura Batalha
f4d44c777b feat(discord): only create threads and reactions for authorized users 2026-03-31 19:06:46 -07:00
Teknium
0a6d366327 fix(security): redact secrets from execute_code sandbox output
* fix: root-level provider in config.yaml no longer overrides model.provider

load_cli_config() had a priority inversion: a stale root-level
'provider' key in config.yaml would OVERRIDE the canonical
'model.provider' set by 'hermes model'. The gateway reads
model.provider directly from YAML and worked correctly, but
'hermes chat -q' and the interactive CLI went through the merge
logic and picked up the stale root-level key.

Fix: root-level provider/base_url are now only used as a fallback
when model.provider/model.base_url is not set (never as an override).

Also added _normalize_root_model_keys() to config.py load_config()
and save_config() — migrates root-level provider/base_url into the
model section and removes the root-level keys permanently.

Reported by (≧▽≦) in Discord: opencode-go provider persisted as a
root-level key and overrode the correct model.provider=openrouter,
causing 401 errors.

* fix(security): redact secrets from execute_code sandbox output

The execute_code sandbox stripped env vars with secret-like names from
the child process (preventing os.environ access), but scripts could
still read secrets from disk (e.g. open('~/.hermes/.env')) and print
them to stdout. The raw values entered the model context unredacted.

terminal_tool and file_tools already applied redact_sensitive_text()
to their output — execute_code was the only tool that skipped this
step. Now the same redaction runs on both stdout and stderr after
ANSI stripping.

Reported via Discord (not filed on GitHub to avoid public disclosure
of the reproduction steps).
2026-03-31 18:52:11 -07:00
Teknium
3604665e44 feat: add qwen/qwen3.6-plus-preview:free to OpenRouter and Nous model lists (#4376) 2026-03-31 18:05:40 -07:00
Ben Barclay
c36aa5fe98 Merge pull request #4034 from bcross/docker-optimization
fix(docker): optimize docker contanier image creation
2026-03-31 15:27:06 -07:00
Teknium
f8cb54ba04 fix(cli): anchor input prompt near bottom of terminal after responses (#4359)
After short agent responses, the prompt_toolkit input area sat mid-screen
with empty terminal space below it. Now prints padding newlines (half
terminal height) after each response to push the prompt toward the bottom.
patch_stdout renders the padding above the input area.
2026-03-31 14:56:35 -07:00
Teknium
b118f607b2 feat(skills): unify hermes-agent and hermes-agent-setup into single skill (#4332)
Merges the hermes-agent-spawning skill (autonomous-ai-agents/) and
hermes-agent-setup skill (dogfood/) into a single comprehensive
skills/hermes-agent/ skill.

The unified skill covers:
- What Hermes Agent is and how it compares to Claude Code/Codex/OpenClaw
- Complete CLI reference (all subcommands and flags)
- Slash command reference
- Configuration guide (providers, toolsets, config sections)
- Voice/STT/TTS setup
- Spawning additional agent instances (one-shot and interactive PTY)
- Multi-agent coordination patterns
- Troubleshooting guide
- Where-to-find-things lookup table with docs links
- Concise contributor quick reference

Removes:
- skills/autonomous-ai-agents/hermes-agent/ (hermes-agent-spawning)
- skills/dogfood/hermes-agent-setup/
2026-03-31 14:49:20 -07:00
Teknium
f04986029c feat(file_tools): detect stale files on write and patch (#4345)
Track file mtime when read_file is called.  When write_file or patch
subsequently targets the same file, compare the current mtime against
the recorded one.  If they differ (external edit, concurrent agent,
user change), include a _warning in the result advising the agent to
re-read.  The write still proceeds — this is a soft signal, not a
hard block.

Key design points:
- Per-task isolation: task A's reads don't affect task B's writes.
- Files never read produce no warning (not enforcing read-before-write).
- mtime naturally updates after the agent's own writes, so the warning
  only fires on external changes, not the agent's own edits.
- V4A multi-file patches check all target paths.

Tests: 10 new tests covering write staleness, patch staleness,
never-read files, cross-task isolation, and the helper function.
2026-03-31 14:49:00 -07:00
Teknium
f5cc597afc fix: add CAMOFOX_PORT=9377 to Docker commands for camofox-browser (#4340)
The camofox-browser image defaults to port 3000 internally, not 9377.
Without -e CAMOFOX_PORT=9377, the -p 9377:9377 mapping silently fails
because nothing listens on 9377 inside the container.

E2E verified: -p 9377:9377 alone → connection reset,
-p 9377:9377 -e CAMOFOX_PORT=9377 → healthy and functional.
2026-03-31 13:38:22 -07:00
Teknium
1b62ad9de7 fix: root-level provider in config.yaml no longer overrides model.provider
load_cli_config() had a priority inversion: a stale root-level
'provider' key in config.yaml would OVERRIDE the canonical
'model.provider' set by 'hermes model'. The gateway reads
model.provider directly from YAML and worked correctly, but
'hermes chat -q' and the interactive CLI went through the merge
logic and picked up the stale root-level key.

Fix: root-level provider/base_url are now only used as a fallback
when model.provider/model.base_url is not set (never as an override).

Also added _normalize_root_model_keys() to config.py load_config()
and save_config() — migrates root-level provider/base_url into the
model section and removes the root-level keys permanently.

Reported by (≧▽≦) in Discord: opencode-go provider persisted as a
root-level key and overrode the correct model.provider=openrouter,
causing 401 errors.
2026-03-31 12:54:22 -07:00
Teknium
e3f8347be3 feat(file_tools): harden read_file with size guard, dedup, and device blocking (#4315)
* feat(file_tools): harden read_file with size guard, dedup, and device blocking

Three improvements to read_file_tool to reduce wasted context tokens and
prevent process hangs:

1. Character-count guard: reads that produce more than 100K characters
   (≈25-35K tokens across tokenisers) are rejected with an error that
   tells the model to use offset+limit for a smaller range.  The
   effective cap is min(file_size, 100K) so small files that happen to
   have long lines aren't over-penalised.  Large truncated files also
   get a hint nudging toward targeted reads.

2. File-read deduplication: when the same (path, offset, limit) is read
   a second time and the file hasn't been modified (mtime unchanged),
   return a lightweight stub instead of re-sending the full content.
   Writes and patches naturally change mtime, so post-edit reads always
   return fresh content.  The dedup cache is cleared on context
   compression — after compression the original read content is
   summarised away, so the model needs the full content again.

3. Device path blocking: paths like /dev/zero, /dev/random, /dev/stdin
   etc. are rejected before any I/O to prevent process hangs from
   infinite-output or blocking-input devices.

Tests: 17 new tests covering all three features plus the dedup-reset-
on-compression integration.  All 52 file-read tests pass (35 existing +
17 new).  Full tool suite (2124 tests) passes with 0 failures.

* feat: make file_read_max_chars configurable, add docs

Add file_read_max_chars to DEFAULT_CONFIG (default 100K).  read_file_tool
reads this on first call and caches for the process lifetime.  Users on
large-context models can raise it; users on small local models can lower it.

Also adds a 'File Read Safety' section to the configuration docs
explaining the char limit, dedup behavior, and example values.
2026-03-31 12:53:19 -07:00
Teknium
d3f1987a05 fix(security): add .config/gh to read protection for @file references (#4327)
Follow-up to PR #4305 — .config/gh was added to the write-deny list
but missed from _SENSITIVE_HOME_DIRS, leaving GitHub CLI OAuth tokens
exposed via @file:~/.config/gh/hosts.yml context injection.
2026-03-31 12:48:30 -07:00
maymuneth
655eea2db8 fix(security): protect .docker, .azure, and .config/gh from read and write 2026-03-31 12:47:10 -07:00
binhnt92
c94a5fa1b2 fix(cli): use atomic write in save_config_value to prevent config loss on interrupt
save_config_value() used bare open(path, 'w') + yaml.dump() which truncates
the file to zero bytes on open. If the process is interrupted mid-write,
config.yaml is left empty. Replace with atomic_yaml_write() (temp file +
fsync + os.replace), matching the gateway config write path.

Co-authored-by: Hermes Agent <hermes@nousresearch.com>
2026-03-31 12:21:55 -07:00
Teknium
7f78deebe7 fix: apply same path traversal checks to config-based credential files
_load_config_files() had the same hermes_home / item pattern without
containment checks. While config.yaml is user-controlled (lower threat
than skill frontmatter), defense in depth prevents exploitation via
config injection or copy-paste mistakes.
2026-03-31 12:16:37 -07:00
maymuneth
a97641b9f2 fix(security): reject path traversal in credential file registration 2026-03-31 12:16:37 -07:00
Gutslabs
0f2ea2062b fix(profiles): validate tar archive member paths on import
Fixes a zip-slip path traversal vulnerability in hermes profile import.
shutil.unpack_archive() on untrusted tar members allows entries like
../../escape.txt to write files outside ~/.hermes/profiles/.

- Add _normalize_profile_archive_parts() to reject absolute paths
  (POSIX and Windows), traversal (..), empty paths, backslash tricks
- Add _safe_extract_profile_archive() for manual per-member extraction
  that only allows regular files and directories (rejects symlinks)
- Replace shutil.unpack_archive() with the safe extraction path
- Add regression tests for traversal and absolute-path attacks

Co-authored-by: Gutslabs <gutslabsxyz@gmail.com>
2026-03-31 12:14:27 -07:00
0xbyt4
08171c1c31 fix: allow voice mode in WSL when PulseAudio bridge is configured
WSL detection was treated as a hard fail, blocking voice mode even when
audio worked via PulseAudio bridge. Now PULSE_SERVER env var presence
makes WSL a soft notice instead of a blocking warning. Device query
failures in WSL with PULSE_SERVER are also treated as non-blocking.
2026-03-31 12:13:33 -07:00
Teknium
7f670a06cf feat: add --max-turns CLI flag to hermes chat
Exposes the existing max_turns parameter (cli.py main()) as a CLI flag
so programmatic callers (Paperclip adapter, scripts) can control the
agent's tool-calling iteration limit without editing config.yaml.

Priority chain unchanged: CLI flag > config agent.max_turns > env
HERMES_MAX_ITERATIONS > default 90.
2026-03-31 12:10:12 -07:00
curtitoo
cac9d20c4f test: add codex transport drop regression 2026-03-31 12:05:06 -07:00
curtitoo
e75964d46d fix: harden codex responses transport handling 2026-03-31 12:05:06 -07:00
Teknium
161acb0086 fix: credential pool 401 recovery rotates to next credential after failed refresh (#4300)
When an OAuth token refresh fails on a 401 error, the pool recovery
would return 'not recovered' without trying the next credential in the
pool. This meant users who added a second valid credential via
'hermes auth add' would never see it used when the primary credential
was dead.

Now: try refresh first (handles expired tokens quickly), and if that
fails, rotate to the next available credential — same as 429/402
already did.

Adds three tests covering 401 refresh success, refresh-fail-then-rotate,
and refresh-fail-with-no-remaining-credentials.
2026-03-31 12:02:29 -07:00
Teknium
143b74ec00 fix: first-run guard stuck in loop when provider configured via config.yaml (#4298)
The _has_any_provider_configured() guard only checked env vars, .env file,
and auth.json — missing config.yaml model.provider/base_url/api_key entirely.
Users who configured a provider through setup (saving to config.yaml) but had
empty API key placeholders in .env from the install template were permanently
blocked by the 'not configured' message.

Changes:
- _has_any_provider_configured() now checks config.yaml model section for
  explicit provider, base_url, or api_key — covers custom endpoints and
  providers that store credentials in config rather than env vars
- .env.example: comment out all empty API key placeholders so they don't
  pollute the environment when copied to .env by the installer
- .env.example: mark LLM_MODEL as deprecated (config.yaml is source of truth)
- 4 new tests for the config.yaml detection path

Reported by OkadoOP on Discord.
2026-03-31 11:42:52 -07:00
Teknium
57625329a2 docs+feat: comprehensive local LLM provider guides and context length warning (#4294)
* docs: update llama.cpp section with --jinja flag and tool calling guide

The llama.cpp docs were missing the --jinja flag which is required for
tool calling to work. Without it, models output tool calls as raw JSON
text instead of structured API responses, making Hermes unable to
execute them.

Changes:
- Add --jinja and -fa flags to the server startup example
- Replace deprecated env vars (OPENAI_BASE_URL, LLM_MODEL) with
  hermes model interactive setup
- Add caution block explaining the --jinja requirement and symptoms
- List models with native tool calling support
- Add /props endpoint verification tip

* docs+feat: comprehensive local LLM provider guides and context length warning

Docs (providers.md):
- Rewrote Ollama section with context length warning (defaults to 4k on
  <24GB VRAM), three methods to increase it, and verification steps
- Rewrote vLLM section with --max-model-len, tool calling flags
  (--enable-auto-tool-choice, --tool-call-parser), and context guidance
- Rewrote SGLang section with --context-length, --tool-call-parser,
  and warning about 128-token default max output
- Added LM Studio section (port 1234, context length defaults to 2048,
  tool calling since 0.3.6)
- Added llama.cpp context length flag (-c) and GPU offload (-ngl)
- Added Troubleshooting Local Models section covering:
  - Tool calls appearing as text (with per-server fix table)
  - Silent context truncation and diagnosis commands
  - Low detected context at startup
  - Truncated responses
- Replaced all deprecated env vars (OPENAI_BASE_URL, LLM_MODEL) with
  hermes model interactive setup and config.yaml examples
- Added deprecation warning for legacy env vars in General Setup

Code (cli.py):
- Added context length warning in show_banner() when detected context
  is <= 8192 tokens, with server-specific fix hints:
  - Ollama (port 11434): suggests OLLAMA_CONTEXT_LENGTH env var
  - LM Studio (port 1234): suggests model settings adjustment
  - Other servers: suggests config.yaml override

Tests:
- 9 new tests covering warning thresholds, server-specific hints,
  and no-warning cases
2026-03-31 11:42:48 -07:00
arasovic
0240baa357 fix: strip orphaned think/reasoning tags from user-facing responses
Some models (e.g. Kimi K2.5 on Alibaba OpenAI-compatible endpoint)
emit reasoning text followed by a closing </think> without a matching
opening <think> tag.  The existing paired-tag regexes in
_strip_think_blocks() cannot match these orphaned tags, so </think>
leaks into user-facing responses on all platforms.

Add a catch-all regex that strips any remaining opening or closing
think/thinking/reasoning/REASONING_SCRATCHPAD tags after the existing
paired-block removal pass.

Closes #4285
2026-03-31 11:42:44 -07:00
Dakota Secula-Rosell
c1606aed69 fix(cli): allow empty strings and falsy values in config set
`hermes config set KEY ""` and `hermes config set KEY 0` were rejected
because the guard used `not value` which is truthy for empty strings,
zero, and False. Changed to `value is None` so only truly missing
arguments are rejected.

Closes #4277

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 11:41:12 -07:00
MacroAnarchy
49d7210fed fix(gateway): parse thread_id from delivery target format
The delivery target parser uses split(':', 1) which only splits on the
first colon. For the documented format platform:chat_id:thread_id
(e.g. 'telegram:-1001234567890:17585'), thread_id gets munged into
chat_id and is never extracted.

Fix: split(':', 2) to correctly extract all three parts. Also fix
to_string() to include thread_id for proper round-tripping.

The downstream plumbing in _deliver_to_platform() already handles
thread_id correctly (line 292-293) — it just never received a value.
2026-03-31 10:45:27 -07:00
Teknium
84a541b619 feat: support * wildcard in platform allowlists and improve WhatsApp docs
* docs: clarify WhatsApp allowlist behavior and document WHATSAPP_ALLOW_ALL_USERS

- Add WHATSAPP_ALLOW_ALL_USERS and WHATSAPP_DEBUG to env vars reference
- Warn that * is not a wildcard and silently blocks all messages
- Show WHATSAPP_ALLOWED_USERS as optional, not required
- Update troubleshooting with the * trap and debug mode tip
- Fix Security section to mention the allow-all alternative

Prompted by a user report in Discord where WHATSAPP_ALLOWED_USERS=*
caused all incoming messages to be silently dropped at the bridge level.

* feat: support * wildcard in platform allowlists

Follow the precedent set by SIGNAL_GROUP_ALLOWED_USERS which already
supports * as an allow-all wildcard.

Bridge (allowlist.js): matchesAllowedUser() now checks for * in the
allowedUsers set before iterating sender aliases.

Gateway (run.py): _is_authorized() checks for * in allowed_ids after
parsing the allowlist. This is generic — works for all platforms, not
just WhatsApp.

Updated docs to document * as a supported value instead of warning
against it. Added WHATSAPP_ALLOW_ALL_USERS and WHATSAPP_DEBUG to
the env vars reference.

Tests: JS allowlist test + 2 Python gateway tests (WhatsApp + Telegram
to verify cross-platform behavior).
2026-03-31 10:42:03 -07:00
Teknium
cca0996a28 fix(browser): skip SSRF check for local backends (Camofox, headless Chromium) (#4292)
The SSRF protection added in #3041 blocks all private/internal addresses
unconditionally in browser_navigate(). This prevents legitimate local use
cases (localhost apps, LAN devices) when using Camofox or the built-in
headless Chromium without a cloud provider.

The check is only meaningful for cloud backends (Browserbase, BrowserUse)
where the agent could reach internal resources on a remote machine. Local
backends give the user full terminal and network access already — the
SSRF check adds zero security value.

Add _is_local_backend() helper that returns True when Camofox is active
or no cloud provider is configured. Both the pre-navigation and
post-redirect SSRF checks now skip when running locally. The
browser.allow_private_urls config option remains available as an
explicit opt-out for cloud mode.
2026-03-31 10:40:13 -07:00
Teknium
fad3f338d1 fix: patch _REDACT_ENABLED in test fixture for module-level snapshot
The _REDACT_ENABLED constant is snapshotted at import time, so
monkeypatch.delenv() alone doesn't re-enable redaction during tests
when HERMES_REDACT_SECRETS=false is set in the host environment.
2026-03-31 10:30:48 -07:00
Dilee
6dcc3330b3 fix(security): add missing GitHub OAuth token patterns and snapshot redact flag
- Add gho_, ghu_, ghs_, ghr_ prefix patterns (OAuth, user-to-server,
  server-to-server, and refresh tokens) — all four types used by
  GitHub Apps and Copilot auth flows were absent from _PREFIX_PATTERNS
- Snapshot HERMES_REDACT_SECRETS at module import time instead of
  re-reading os.getenv() on every call, preventing runtime env mutations
  (e.g. LLM-generated export commands) from disabling redaction
2026-03-31 10:30:48 -07:00
Bryan Cross
289df5dd1c Merge branch 'NousResearch:main' into docker-optimization 2026-03-31 07:08:44 -05:00
Teknium
344239c2db feat: auto-detect models from server probe in custom endpoint setup (#4218)
Custom endpoint setup (_model_flow_custom) now probes the server first
and presents detected models instead of asking users to type blind:

- Single model: auto-confirms with Y/n prompt
- Multiple models: numbered list picker, or type a name
- No models / probe failed: falls back to manual input

Context length prompt also moved after model selection so the user sees
the verified endpoint before being asked for details.

All recent fixes preserved: config dict sync (#4172), api_key
persistence (#4182), no save_env_value for URLs (#4165).

Inspired by PR #4194 by sudoingX — re-implemented against current main.

Co-authored-by: Xpress AI (Dip KD) <200180104+sudoingX@users.noreply.github.com>
2026-03-31 03:29:00 -07:00
Teknium
79b2694b9a fix: _allow_private_urls name collision + stale OPENAI_BASE_URL test (#4217)
1. browser_tool.py: _allow_private_urls() used 'global _allow_private_urls'
   then assigned a bool to it, replacing the function in the module namespace.
   After first call, subsequent calls hit TypeError: 'bool' object is not
   callable. Renamed cache variable to _cached_allow_private_urls.

2. test_provider_parity.py: test_custom_endpoint_when_no_nous relied on
   OPENAI_BASE_URL env var (removed in config refactor). Mock
   _resolve_custom_runtime directly instead.
2026-03-31 03:16:40 -07:00
Teknium
8d59881a62 feat(auth): same-provider credential pools with rotation, custom endpoint support, and interactive CLI (#2647)
* feat(auth): add same-provider credential pools and rotation UX

Add same-provider credential pooling so Hermes can rotate across
multiple credentials for a single provider, recover from exhausted
credentials without jumping providers immediately, and configure
that behavior directly in hermes setup.

- agent/credential_pool.py: persisted per-provider credential pools
- hermes auth add/list/remove/reset CLI commands
- 429/402/401 recovery with pool rotation in run_agent.py
- Setup wizard integration for pool strategy configuration
- Auto-seeding from env vars and existing OAuth state

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Salvaged from PR #2647

* fix(tests): prevent pool auto-seeding from host env in credential pool tests

Tests for non-pool Anthropic paths and auth remove were failing when
host env vars (ANTHROPIC_API_KEY) or file-backed OAuth credentials
were present. The pool auto-seeding picked these up, causing unexpected
pool entries in tests.

- Mock _select_pool_entry in auxiliary_client OAuth flag tests
- Clear Anthropic env vars and mock _seed_from_singletons in auth remove test

* feat(auth): add thread safety, least_used strategy, and request counting

- Add threading.Lock to CredentialPool for gateway thread safety
  (concurrent requests from multiple gateway sessions could race on
  pool state mutations without this)
- Add 'least_used' rotation strategy that selects the credential
  with the lowest request_count, distributing load more evenly
- Add request_count field to PooledCredential for usage tracking
- Add mark_used() method to increment per-credential request counts
- Wrap select(), mark_exhausted_and_rotate(), and try_refresh_current()
  with lock acquisition
- Add tests: least_used selection, mark_used counting, concurrent
  thread safety (4 threads × 20 selects with no corruption)

* feat(auth): add interactive mode for bare 'hermes auth' command

When 'hermes auth' is called without a subcommand, it now launches an
interactive wizard that:

1. Shows full credential pool status across all providers
2. Offers a menu: add, remove, reset cooldowns, set strategy
3. For OAuth-capable providers (anthropic, nous, openai-codex), the
   add flow explicitly asks 'API key or OAuth login?' — making it
   clear that both auth types are supported for the same provider
4. Strategy picker shows all 4 options (fill_first, round_robin,
   least_used, random) with the current selection marked
5. Remove flow shows entries with indices for easy selection

The subcommand paths (hermes auth add/list/remove/reset) still work
exactly as before for scripted/non-interactive use.

* fix(tests): update runtime_provider tests for config.yaml source of truth (#4165)

Tests were using OPENAI_BASE_URL env var which is no longer consulted
after #4165. Updated to use model config (provider, base_url, api_key)
which is the new single source of truth for custom endpoint URLs.

* feat(auth): support custom endpoint credential pools keyed by provider name

Custom OpenAI-compatible endpoints all share provider='custom', making
the provider-keyed pool useless. Now pools for custom endpoints are
keyed by 'custom:<normalized_name>' where the name comes from the
custom_providers config list (auto-generated from URL hostname).

- Pool key format: 'custom:together.ai', 'custom:local-(localhost:8080)'
- load_pool('custom:name') seeds from custom_providers api_key AND
  model.api_key when base_url matches
- hermes auth add/list now shows custom endpoints alongside registry
  providers
- _resolve_openrouter_runtime and _resolve_named_custom_runtime check
  pool before falling back to single config key
- 6 new tests covering custom pool keying, seeding, and listing

* docs: add Excalidraw diagram of full credential pool flow

Comprehensive architecture diagram showing:
- Credential sources (env vars, auth.json OAuth, config.yaml, CLI)
- Pool storage and auto-seeding
- Runtime resolution paths (registry, custom, OpenRouter)
- Error recovery (429 retry-then-rotate, 402 immediate, 401 refresh)
- CLI management commands and strategy configuration

Open at: https://excalidraw.com/#json=2Ycqhqpi6f12E_3ITyiwh,c7u9jSt5BwrmiVzHGbm87g

* fix(tests): update setup wizard pool tests for unified select_provider_and_model flow

The setup wizard now delegates to select_provider_and_model() instead
of using its own prompt_choice-based provider picker. Tests needed:
- Mock select_provider_and_model as no-op (provider pre-written to config)
- Call _stub_tts BEFORE custom prompt_choice mock (it overwrites it)
- Pre-write model.provider to config so the pool step is reached

* docs: add comprehensive credential pool documentation

- New page: website/docs/user-guide/features/credential-pools.md
  Full guide covering quick start, CLI commands, rotation strategies,
  error recovery, custom endpoint pools, auto-discovery, thread safety,
  architecture, and storage format.
- Updated fallback-providers.md to reference credential pools as the
  first layer of resilience (same-provider rotation before cross-provider)
- Added hermes auth to CLI commands reference with usage examples
- Added credential_pool_strategies to configuration guide

* chore: remove excalidraw diagram from repo (external link only)

* refactor: simplify credential pool code — extract helpers, collapse extras, dedup patterns

- _load_config_safe(): replace 4 identical try/except/import blocks
- _iter_custom_providers(): shared generator for custom provider iteration
- PooledCredential.extra dict: collapse 11 round-trip-only fields
  (token_type, scope, client_id, portal_base_url, obtained_at,
  expires_in, agent_key_id, agent_key_expires_in, agent_key_reused,
  agent_key_obtained_at, tls) into a single extra dict with
  __getattr__ for backward-compatible access
- _available_entries(): shared exhaustion-check between select and peek
- Dedup anthropic OAuth seeding (hermes_pkce + claude_code identical)
- SimpleNamespace replaces class _Args boilerplate in auth_commands
- _try_resolve_from_custom_pool(): shared pool-check in runtime_provider

Net -17 lines. All 383 targeted tests pass.

---------

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
2026-03-31 03:10:01 -07:00
Teknium
2ae50bdddd fix(telegram): enforce 32-char limit on command names with collision avoidance (#4211)
Telegram Bot API requires command names to be 1-32 characters. Plugin
and skill names that exceed this limit now get truncated. If truncation
creates a collision (with core commands, other plugins, or other skills),
the name is shortened to 31 chars and a digit 0-9 is appended.

Adds _clamp_telegram_names() helper used for both plugin and skill
entries in telegram_menu_commands(). Core CommandDef commands are tracked
as reserved names so truncated plugin/skill names never shadow them.

Addresses the fix from PR #4191 (sroecker) with collision-safe truncation.

Tests: 9 new tests covering truncation, digit suffixes, exhaustion, dedup.
2026-03-31 02:41:50 -07:00
Nils
50302ed70a fix(tools): make browser SSRF check configurable via browser.allow_private_urls (#4198)
* fix(tools): skip SSRF check in local browser mode

The SSRF protection added in #3041 blocks all private/internal
addresses unconditionally in browser_navigate(). This prevents
legitimate local development use cases (localhost testing, LAN
device access) when using the local Chromium backend.

The SSRF check is only meaningful for cloud browsers (Browserbase,
BrowserUse) where the agent could reach internal resources on a
remote machine. In local mode, the user already has full terminal
and network access, so the check adds no security value.

This change makes the SSRF check conditional on _get_cloud_provider(),
keeping full protection in cloud mode while allowing private addresses
in local mode.

* fix(tools): make SSRF check configurable via browser.allow_private_urls

Replace unconditional SSRF check with a configurable setting.
Default (False) keeps existing security behavior. Setting to True
allows navigating to private/internal IPs for local dev and LAN use cases.

---------

Co-authored-by: Nils (Norya) <nils@begou.dev>
2026-03-31 02:11:55 -07:00
Teknium
086ec5590d fix: gate Claude Code credentials behind explicit Hermes config in wizard trigger (#4210)
If a user has Claude Code installed but never configured Hermes, the
first-run guard found those external credentials and skipped the setup
wizard. Users got silently routed to someone else's inference without
being asked.

Now _has_any_provider_configured() checks whether Hermes itself has been
explicitly configured (model in config differs from hardcoded default)
before counting Claude Code credentials. Fresh installs trigger the
wizard regardless of what external tools are on the machine.

Salvaged from PR #4194 by sudoingX — wizard trigger fix only.
Model auto-detect change under separate review.

Co-authored-by: Xpress AI (Dip KD) <200180104+sudoingX@users.noreply.github.com>
2026-03-31 02:01:15 -07:00
Teknium
c53a296df1 feat: add MiniMax M2.7 to hermes model picker and opencode-go (#4208)
Add MiniMax-M2.7 and M2.7-highspeed to _PROVIDER_MODELS for minimax
and minimax-cn providers in main.py so hermes model shows them.
Update opencode-go bare ID from m2.5 to m2.7 in models.py.

Salvaged from PR #4197 by octo-patch.
2026-03-31 01:54:13 -07:00
Teknium
1bca6f3930 fix: save API key to model config for custom endpoints (#4182)
Custom cloud endpoints (Together.ai, RunPod, Groq, etc.) lost their
API key after #4165 removed OPENAI_API_KEY .env saves.  The key was
only saved to the custom_providers list which is unreachable at
runtime for plain 'custom' provider resolution.

Save model.api_key to config.yaml alongside model.provider and
model.base_url in all three custom endpoint code paths:
- _model_flow_custom (new endpoint with model name)
- _model_flow_custom (new endpoint without model name)
- _model_flow_named_custom (switching to a saved endpoint)

The runtime resolver already reads model.api_key (runtime_provider.py
line 224-228), so the key is picked up automatically.  Each custom
endpoint carries its own key in config — no shared OPENAI_API_KEY
env var needed.
2026-03-31 01:36:15 -07:00
Teknium
a994cf5e5a docs: update adding-providers guide for unified setup flow
setup_model_provider() now delegates to select_provider_and_model()
from main.py, so new providers only need to be wired in main.py.
Removed setup.py from file checklists, replaced the setup.py section
with a tip explaining the automatic inheritance.
2026-03-31 01:29:43 -07:00
Teknium
ff78ad4c81 feat: add discord.reactions config option to disable message reactions (#4199)
Adds a 'reactions' key under the discord config section (default: true).
When set to false, the bot no longer adds 👀// reactions to messages
during processing. The config maps to DISCORD_REACTIONS env var following
the same pattern as require_mention and auto_thread.

Files changed:
- hermes_cli/config.py: Add reactions default to DEFAULT_CONFIG
- gateway/config.py: Map discord.reactions to DISCORD_REACTIONS env var
- gateway/platforms/discord.py: Gate on_processing_start/complete hooks
- tests/gateway/test_discord_reactions.py: 3 new tests for config gate
2026-03-31 01:24:48 -07:00
Teknium
491e79bca9 refactor: unify setup wizard provider selection with hermes model
setup_model_provider() had 800+ lines of duplicated provider handling
that reimplemented the same credential prompting, OAuth flows, and model
selection that hermes model already provides via the _model_flow_*
functions.  Every new provider had to be added in both places, and the
two implementations diverged in config persistence (setup.py did raw
YAML writes, _set_model_provider, and _update_config_for_provider
depending on the provider — main.py used its own load/save cycle).

This caused the #4172 bug: _model_flow_custom saved config to disk but
the wizard's final save_config(config) overwrote it with stale values.

Fix: extract the core of cmd_model() into select_provider_and_model()
and have setup_model_provider() call it.  After the call, re-sync the
wizard's config dict from disk.  Deletes ~800 lines of duplicated
provider handling from setup.py.

Also fixes cmd_model() double-AuthError crash on fresh installs with
no API keys configured.
2026-03-31 01:04:07 -07:00
Teknium
89d8127772 fix: setup wizard overwrites custom endpoint config (#4172)
_model_flow_custom() saved model.provider and model.base_url to disk
via its own load_config/save_config cycle, but never updated the
setup wizard's in-memory config dict.  The wizard's final
save_config(config) then overwrote the custom settings with the
stale default string model value.

Fix: after saving to disk, also mutate the caller's config dict so
the wizard's final save preserves model.provider='custom' and the
base_url.  Both the model_name and no-model_name branches are
covered.

Added regression tests that simulate the full wizard flow including
the final save_config(config) call — the step that was previously
untested.
2026-03-30 23:17:26 -07:00
Teknium
f890a94c12 refactor: make config.yaml the single source of truth for endpoint URLs (#4165)
OPENAI_BASE_URL was written to .env AND config.yaml, creating a dual-source
confusion. Users (especially Docker) would see the URL in .env and assume
that's where all config lives, then wonder why LLM_MODEL in .env didn't work.

Changes:
- Remove all 27 save_env_value("OPENAI_BASE_URL", ...) calls across main.py,
  setup.py, and tools_config.py
- Remove OPENAI_BASE_URL env var reading from runtime_provider.py, cli.py,
  models.py, and gateway/run.py
- Remove LLM_MODEL/HERMES_MODEL env var reading from gateway/run.py and
  auxiliary_client.py — config.yaml model.default is authoritative
- Vision base URL now saved to config.yaml auxiliary.vision.base_url
  (both setup wizard and tools_config paths)
- Tests updated to set config values instead of env vars

Convention enforced: .env is for SECRETS only (API keys). All other
configuration (model names, base URLs, provider selection) lives
exclusively in config.yaml.
2026-03-30 22:02:53 -07:00
Teknium
4d7e3c7157 fix(tests): provide model name in Codex 401 refresh tests for CI (#4166)
CI has no config.yaml, so cron/gateway resolve an empty model name.
The Codex Responses validator rejects empty models before the mock
API call is reached. Provide explicit model in job dict and env var.
2026-03-30 21:17:09 -07:00
Teknium
1bd206ea5d feat: add /btw command for ephemeral side questions (#4161)
Adds /btw <question> — ask a quick follow-up using the current
session context without interrupting the main conversation.

- Snapshots conversation history, answers with a no-tools agent
- Response is not persisted to session history or DB
- Runs in a background thread (CLI) / async task (gateway)
- Per-session guard prevents concurrent /btw in gateway

Implementation:
- model_tools.py: enabled_toolsets=[] now correctly means "no tools"
  (was falsy, fell through to default "all tools")
- run_agent.py: persist_session=False gates _persist_session()
- cli.py: _handle_btw_command (background thread, Rich panel output)
- gateway/run.py: _handle_btw_command + _run_btw_task (async task)
- hermes_cli/commands.py: CommandDef for "btw"

Inspired by PR #3504 by areu01or00, reimplemented cleanly on current
main with the enabled_toolsets=[] fix and without the __btw_no_tools__
hack.
2026-03-30 21:10:05 -07:00
Teknium
f8e1ee10aa Fix profile list model display (#4160)
Co-authored-by: txhno <roshwarrier@gmail.com>
2026-03-30 20:40:13 -07:00
Teknium
c1ef9b2250 fix(cli): ensure on_session_end hook fires on interrupted exits (#4159)
- Add SIGTERM/SIGHUP signal handlers for graceful shutdown
- Add BrokenPipeError to exit exception handling (SSH disconnects)
- Fire on_session_end plugin hook in finally block, guarded by
  _agent_running to avoid double-firing on normal exits (the hook
  already fires per-turn from run_conversation)

Co-authored-by: kelsia14 <kelsia14@users.noreply.github.com>
2026-03-30 20:37:17 -07:00
Teknium
3a68ec3172 feat: add Fireworks context length detection support (#4158)
- Add api.fireworks.ai to _URL_TO_PROVIDER for automatic provider detection
- Add fireworks to PROVIDER_TO_MODELS_DEV mapped to 'fireworks-ai' (the
  correct models.dev provider key — original PR used 'fireworks' which
  would silently fail the lookup)


Cherry-picked from PR #3989 with models.dev key fix.

Co-authored-by: sroecker <sroecker@users.noreply.github.com>
2026-03-30 20:37:08 -07:00
Teknium
d30ea65c9b fix: URL-based auth for third-party Anthropic endpoints + CI test fixes (#4148)
* fix(tests): mock sys.stdin.isatty for cmd_model TTY guard

* fix(tests): update camofox snapshot format + trajectory compressor mock path

- test_browser_camofox: mock response now uses snapshot format (accessibility tree)
- test_trajectory_compressor: mock _get_async_client instead of setting async_client directly

* fix: URL-based auth detection for third-party Anthropic endpoints + test fixes

Reverts the key-prefix approach from #4093 which broke JWT and managed
key OAuth detection. Instead, detects third-party endpoints by URL:
if base_url is set and isn't anthropic.com, it's a proxy (Azure AI
Foundry, AWS Bedrock, etc.) that uses x-api-key regardless of key format.

Auth decision chain is now:
1. _requires_bearer_auth(url) → MiniMax → Bearer
2. _is_third_party_anthropic_endpoint(url) → Azure/Bedrock → x-api-key
3. _is_oauth_token(key) → OAuth on direct Anthropic → Bearer
4. else → x-api-key

Also includes test fixes from PR #4051 by @erosika:
- Mock sys.stdin.isatty for cmd_model TTY guard
- Update camofox snapshot format mock
- Fix trajectory compressor async client mock path

---------

Co-authored-by: Erosika <eri@plasticlabs.ai>
2026-03-30 20:36:56 -07:00
Teknium
fb4b87f4af chore: add claude-sonnet-4.6 to OpenRouter and Nous model lists (#4157) 2026-03-30 20:33:21 -07:00
Teknium
5b0243e6ad docs: deep quality pass — expand 10 thin pages, fix specific issues (#4134)
Developer guide stubs expanded to full documentation:
- trajectory-format.md: 56→233 lines (JSONL format, ShareGPT example,
  normalization rules, reasoning markup, replay code)
- session-storage.md: 66→388 lines (SQLite schema, migration table,
  FTS5 search syntax, lineage queries, Python API examples)
- context-compression-and-caching.md: 72→321 lines (dual compression
  system, config defaults, 4-phase algorithm, before/after example,
  prompt caching mechanics, cache-aware patterns)
- tools-runtime.md: 65→246 lines (registry API, dispatch flow,
  availability checking, error wrapping, approval flow)
- prompt-assembly.md: 89→246 lines (concrete assembled prompt example,
  SOUL.md injection, context file discovery table)

User-facing pages expanded:
- docker.md: 62→224 lines (volumes, env forwarding, docker-compose,
  resource limits, troubleshooting)
- updating.md: 79→167 lines (update behavior, version checking,
  rollback instructions, Nix users)
- skins.md: 80→206 lines (all color/spinner/branding keys, built-in
  skin descriptions, full custom skin YAML template)

Hub pages improved:
- integrations/index.md: 25→82 lines (web search backends table,
  TTS/browser providers, quick config example)
- features/overview.md: added Integrations section with 6 missing links

Specific fixes:
- configuration.md: removed duplicate Gateway Streaming section
- mcp.md: removed internal "PR work" language
- plugins.md: added inline minimal plugin example (self-contained)

13 files changed, ~1700 lines added. Docusaurus build verified clean.
2026-03-30 20:30:11 -07:00
Teknium
54b876a5c9 fix: add actionable guidance to context-exceeded error messages (#4155)
When context compression fails, users now see hints suggesting /new
or /compress instead of a dead-end error. Covers all 4 error paths:
payload-too-large, max compression attempts (2 paths), and context
length exceeded.

Closes #4061
Salvaged from PR #4076 by SHL0MS.

Co-authored-by: SHL0MS <SHL0MS@users.noreply.github.com>
2026-03-30 20:23:28 -07:00
Teknium
83e5249be6 fix(gateway): use setsid instead of systemd-run --user for /update (salvage #4024) (#4104)
Salvaged from PR #4024 by @Sertug17. Fixes #4017.

- Replace systemd-run --user --scope with setsid for portable session detach
- Add system-level service detection to cmd_update gateway restart
- Falls back to start_new_session=True on systems without setsid (macOS, minimal containers)
2026-03-30 20:22:09 -07:00
Teknium
fb2af3bd1d docs: document tool progress streaming in API server and Open WebUI (#4138)
Update docs to reflect that tool progress now streams inline during
SSE responses. Previously docs said tool calls were invisible.

- api-server.md: add 'Tool progress in streams' note to streaming docs
- open-webui.md: update 'How It Works' steps, add Tool Progress tip
2026-03-30 19:40:39 -07:00
Teknium
cc63b2d1cd fix(gateway): remove user-facing compression warnings (#4139)
Auto-compression still runs silently in the background with server-side
logging, but no longer sends messages to the user's chat about it.

Removed:
- 'Session is large... Auto-compressing' pre-compression notification
- 'Compressed: N → M messages' post-compression notification
- 'Session is still very large after compression' warning
- 'Auto-compression failed' warning
- Rate-limit tracking (only existed for these warnings)
2026-03-30 19:17:07 -07:00
Teknium
45396aaa92 fix(alibaba): use standard DashScope international endpoint (#4133)
* fix(alibaba): use standard DashScope international endpoint

The Alibaba Cloud provider was hardcoded to the coding-intl endpoint
(https://coding-intl.dashscope.aliyuncs.com/v1) which only accepts
Alibaba Coding Plan API keys.

Standard DashScope API keys fail with invalid_api_key error against
this endpoint. Changed to the international compatible-mode endpoint
(https://dashscope-intl.aliyuncs.com/compatible-mode/v1) which works
with standard DashScope keys.

Users with Coding Plan keys or China-region keys can still override
via DASHSCOPE_BASE_URL or config.yaml base_url.

Fixes #3912

* fix: update test to match new DashScope default endpoint

---------

Co-authored-by: kagura-agent <kagura.chen28@gmail.com>
2026-03-30 19:06:30 -07:00
Teknium
04367e2fac fix(cron): stop truncating job IDs in list view (#4132)
Remove [:8] truncation from hermes cron list output. Job IDs are 12
hex chars — truncating to 8 makes them unusable for cron run/pause/remove
which require the full ID.

Co-authored-by: vitobotta <vitobotta@users.noreply.github.com>
2026-03-30 19:05:34 -07:00
Teknium
cdb64a869a fix(security): reject private and loopback IPs in Telegram DoH fallback (#4129)
Co-authored-by: Maymun <139681654+maymuneth@users.noreply.github.com>
2026-03-30 18:53:24 -07:00
Teknium
1e59d4813c feat(api_server): stream tool progress to Open WebUI (#4092)
Wire the existing tool_progress_callback through the API server's
streaming handler so Open WebUI users see what tool is running.

Uses the existing 3-arg callback signature (name, preview, args)
that fires at tool start — no changes to run_agent.py needed.
Progress appears as inline markdown in the SSE content stream.

Inspired by PR #4032 by sroecker, reimplemented to avoid breaking
the callback signature used by CLI and gateway consumers.
2026-03-30 18:50:27 -07:00
Teknium
f776191650 fix: persist compressed context to gateway session after mid-run compression
When context compression fires during run_conversation() in the gateway,
the compressed messages were silently lost on the next turn. Two bugs:

1. Agent-side: _flush_messages_to_session_db() calculated
   flush_from = max(len(conversation_history), _last_flushed_db_idx).
   After compression, _last_flushed_db_idx was correctly reset to 0,
   but conversation_history still had its original pre-compression
   length (e.g. 200). Since compressed messages are shorter (~30),
   messages[200:] was empty — nothing written to the new session's
   SQLite.

   Fix: Set conversation_history = None after each _compress_context()
   call so start_idx = 0 and all compressed messages are flushed.

2. Gateway-side: history_offset was always len(agent_history) — the
   original pre-compression length. After compression shortened the
   message list, agent_messages[200:] was empty, causing the gateway
   to fall back to writing only a user/assistant pair, losing the
   compressed summary and tail context.

   Fix: Detect session splits (agent.session_id != original) and set
   history_offset = 0 so all compressed messages are written to JSONL.
2026-03-30 18:49:14 -07:00
Teknium
44d02f35d2 docs: restructure site navigation — promote features and platforms to top-level (#4116)
Major reorganization of the documentation site for better discoverability
and navigation. 94 pages across 8 top-level sections (was 5).

Structural changes:
- Promote Features from 3-level-deep subcategory to top-level section
  with new Overview hub page categorizing all 26 feature pages
- Promote Messaging Platforms from User Guide subcategory to top-level
  section, add platform comparison matrix (13 platforms x 7 features)
- Create new Integrations section with hub page, grouping MCP, ACP,
  API Server, Honcho, Provider Routing, Fallback Providers
- Extract AI provider content (626 lines) from configuration.md into
  dedicated integrations/providers.md — configuration.md drops from
  1803 to 1178 lines
- Subcategorize Developer Guide into Architecture, Extending, Internals
- Rename "User Guide" to "Using Hermes" for top-level items

Orphan fixes (7 pages now reachable via sidebar):
- build-a-hermes-plugin.md added to Guides
- sms.md added to Messaging Platforms
- context-references.md added to Features > Core
- plugins.md added to Features > Core
- git-worktrees.md added to Using Hermes
- checkpoints-and-rollback.md added to Using Hermes
- checkpoints.md (30-line stub) deleted, superseded by
  checkpoints-and-rollback.md (203 lines)

New files:
- integrations/index.md — Integrations hub page
- integrations/providers.md — AI provider setup (extracted)
- user-guide/features/overview.md — Features hub page

Broken link fixes:
- quickstart.md, faq.md: update context-length-detection anchors
- configuration.md: update checkpoints link
- overview.md: fix checkpoint link path

Docusaurus build verified clean (zero broken links/anchors).
2026-03-30 18:39:51 -07:00
Teknium
b2e1a095f8 fix(anthropic): write scopes field to Claude Code credentials on token refresh (#4126)
Claude Code >=2.1.81 checks for a 'scopes' array containing 'user:inference'
in ~/.claude/.credentials.json before accepting stored OAuth tokens as valid.

When Hermes refreshes the token, it writes only accessToken, refreshToken, and
expiresAt — omitting the scopes field. This causes Claude Code to report
'loggedIn: false' and refuse to start, even though the token is valid.

This commit:
- Parses the 'scope' field from the OAuth refresh response
- Passes it to _write_claude_code_credentials() as a keyword argument
- Persists the scopes array in the claudeAiOauth credential store
- Preserves existing scopes when the refresh response omits the field

Tested against Claude Code v2.1.87 on Linux — auth status correctly reports
loggedIn: true and claude --print works after this fix.

Co-authored-by: Nick <git@flybynight.io>
2026-03-30 18:35:16 -07:00
Teknium
ffd5d37f9b fix: treat non-sk-ant- keys as regular API keys, not OAuth tokens (#4093)
* fix: treat non-sk-ant- prefixed keys (Azure AI Foundry) as regular API keys, not OAuth tokens

* fix: treat non-sk-ant- keys as regular API keys, not OAuth tokens

_is_oauth_token() returned True for any key not starting with
sk-ant-api, misclassifying Azure AI Foundry keys as OAuth tokens
and sending Bearer auth instead of x-api-key → 401 rejection.

Real Anthropic OAuth tokens all start with sk-ant-oat (confirmed
from live .credentials.json). Non-sk-ant- keys are third-party
provider keys that should use x-api-key.

Test fixtures updated to use realistic sk-ant-oat01- prefixed
tokens instead of fake strings.

Salvaged from PR #4075 by @HangGlidersRule.

---------

Co-authored-by: Clawdbot <clawdbot@openclaw.ai>
2026-03-30 17:41:13 -07:00
Teknium
720507efac feat: add post-migration cleanup for OpenClaw directories (#4100)
After migrating from OpenClaw, leftover workspace directories contain
state files (todo.json, sessions, logs) that confuse the agent — it
discovers them and reads/writes to stale locations instead of the
Hermes state directory, causing issues like cron jobs reading a
different todo list than interactive sessions.

Changes:
- hermes claw migrate now offers to archive the source directory after
  successful migration (rename to .pre-migration, not delete)
- New `hermes claw cleanup` subcommand for users who already migrated
  and need to archive leftover OpenClaw directories
- Migration notes updated with explicit cleanup guidance
- 42 tests covering all new functionality

Reported by SteveSkedasticity — multiple todo.json files across
~/.hermes/, ~/.openclaw/workspace/, and ~/.openclaw/workspace-assistant/
caused cron jobs to read from wrong locations.
2026-03-30 17:39:08 -07:00
Teknium
8a794d029d fix(ci): add repo conditionals to prevent fork workflow failures (#4107)
Add github.repository checks to docker-publish and deploy-site
workflows so they skip on forks where upstream-specific resources
(Docker Hub org, custom domain) are unavailable.

Co-authored-by: StreamOfRon <StreamOfRon@users.noreply.github.com>
2026-03-30 17:38:32 -07:00
Teknium
e64b047663 chore: prepare Hermes for Homebrew packaging (#4099)
Co-authored-by: Yabuku-xD <78594762+Yabuku-xD@users.noreply.github.com>
2026-03-30 17:34:43 -07:00
Robin Fernandes
1b7473e702 Fixes and refactors enabled by recent updates to main. 2026-03-31 09:29:59 +09:00
Robin Fernandes
1126284c97 Merge branch 'main' into rewbs/tool-use-charge-to-subscription 2026-03-31 09:29:43 +09:00
Teknium
11aa44d34d docs(telegram): add webhook mode documentation (#4089)
Documents the Telegram webhook mode from #3880:
- New 'Webhook Mode' section in telegram.md with polling vs webhook
  comparison, config table, Fly.io deployment example, troubleshooting
- Add TELEGRAM_WEBHOOK_URL/PORT/SECRET to environment-variables.md
- Add Telegram section to .env.example (existing + webhook vars)

Co-authored-by: raulbcs <raulbcs@users.noreply.github.com>
2026-03-30 17:21:59 -07:00
Teknium
07746dca0c fix(matrix): E2EE decryption — request keys, auto-trust devices, retry buffered events (#4083)
When the Matrix adapter receives encrypted events it can't decrypt
(MegolmEvent), it now:

1. Requests the missing room key from other devices via
   client.request_room_key(event) instead of silently dropping the message

2. Buffers undecrypted events (bounded to 100, 5 min TTL) and retries
   decryption after each E2EE maintenance cycle when new keys arrive

3. Auto-trusts/verifies all devices after key queries so other clients
   share session keys with the bot proactively

4. Exports Megolm keys on disconnect and imports them on connect, so
   session keys survive gateway restarts

This addresses the 'could not decrypt event' warnings that caused the
bot to miss messages in encrypted rooms.
2026-03-30 17:16:09 -07:00
Teknium
7e0c2c3ce3 docs: comprehensive documentation audit — fix 9 HIGH, 20+ MEDIUM gaps (#4087)
Reference docs fixes:
- cli-commands.md: remove non-existent --provider alibaba, add hermes
  profile/completion/plugins/mcp to top-level table, add --profile/-p
  global flag, add --source chat option
- slash-commands.md: add /yolo and /commands, fix /q alias conflict
  (resolves to /queue not /quit), add missing aliases (/bg, /set-home,
  /reload_mcp, /gateway)
- toolsets-reference.md: fix hermes-api-server (not same as hermes-cli,
  omits clarify/send_message/text_to_speech)
- profile-commands.md: fix show name required not optional, --clone-from
  not --from, add --remove/--name to alias, fix alias path, fix export/
  import arg types, remove non-existent fish completion
- tools-reference.md: add EXA_API_KEY to web tools requires_env
- mcp-config-reference.md: add auth key for OAuth, tool name sanitization
- environment-variables.md: add EXA_API_KEY, update provider values
- plugins.md: remove non-existent ctx.register_command(), add
  ctx.inject_message()

Feature docs additions:
- security.md: add /yolo mode, approval modes (manual/smart/off),
  configurable timeout, expanded dangerous patterns table
- cron.md: add wrap_response config, [SILENT] suppression
- mcp.md: add dynamic tool discovery, MCP sampling support
- cli.md: add Ctrl+Z suspend, busy_input_mode, tool_preview_length
- docker.md: add skills/credential file mounting

Messaging platform docs:
- telegram.md: add webhook mode, DoH fallback IPs
- slack.md: add multi-workspace OAuth support
- discord.md: add DISCORD_IGNORE_NO_MENTION
- matrix.md: add MSC3245 native voice messages
- feishu.md: expand from 129 to 365 lines (encrypt key, verification
  token, group policy, card actions, media, rate limiting, markdown,
  troubleshooting)
- wecom.md: expand from 86 to 264 lines (per-group allowlists, media,
  AES decryption, stream replies, reconnection, troubleshooting)

Configuration docs:
- quickstart.md: add DeepSeek, Copilot, Copilot ACP providers
- configuration.md: add DeepSeek provider, Exa web backend, terminal
  env_passthrough/images, browser.command_timeout, compression params,
  discord config, security/tirith config, timezone, auxiliary models

21 files changed, ~1000 lines added
2026-03-30 17:15:21 -07:00
SHL0MS
3c8f910973 feat: respect NO_COLOR env var and TERM=dumb (#4079)
Add should_use_color() function to hermes_cli/colors.py that checks
NO_COLOR (https://no-color.org/) and TERM=dumb before emitting ANSI
escapes. The existing color() helper now uses this function instead
of a bare isatty() check.

This is the foundation — cli.py and banner.py still have inline ANSI
constants that bypass this module (tracked in #4071).

Closes #4066

Co-authored-by: SHL0MS <SHL0MS@users.noreply.github.com>
2026-03-30 17:07:21 -07:00
Teknium
13f3e67165 ux: show 'Initializing agent...' on first message (#4086)
Display a brief status message before the heavy agent initialization
(OpenAI client setup, tool loading, memory init, etc.) so users
aren't staring at a blank screen for several seconds.

Only prints when self.agent is None (first use or after model switch).

Closes #4060

Co-authored-by: SHL0MS <SHL0MS@users.noreply.github.com>
2026-03-30 17:05:40 -07:00
Teknium
4a7c17fca5 fix(gateway): read custom_providers context_length in hygiene compression (#4085)
Gateway hygiene pre-compression only checked model.context_length from
the top-level config, missing per-model context_length defined in
custom_providers entries. This caused premature compression for custom
provider users (e.g. 128K default instead of 200K configured).

The AIAgent's own compressor already reads custom_providers correctly
(run_agent.py lines 1171-1189). This adds the same fallback to the
gateway hygiene path, running after runtime provider resolution so
the base_url is available for matching.
2026-03-30 17:04:31 -07:00
Robin Fernandes
6e4598ce1e Merge branch 'main' into rewbs/tool-use-charge-to-subscription 2026-03-31 08:48:54 +09:00
Teknium
f007284d05 fix: rate-limit pairing rejection messages to prevent spam (#4081)
* fix: rate-limit pairing rejection messages to prevent spam

When generate_code() returns None (rate limited or max pending), the
"Too many pairing requests" message was sent on every subsequent DM
with no cooldown. A user sending 30 messages would get 30 rejection
replies — reported as potential hack on WhatsApp.

Now check _is_rate_limited() before any pairing response, and record
rate limit after sending a rejection. Subsequent messages from the
same user are silently ignored until the rate limit window expires.

* test: add coverage for pairing response rate limiting

Follow-up to cherry-picked PR #4042 — adds tests verifying:
- Rate-limited users get silently ignored (no response sent)
- Rejection messages record rate limit for subsequent suppression

---------

Co-authored-by: 0xbyt4 <35742124+0xbyt4@users.noreply.github.com>
2026-03-30 16:48:00 -07:00
Teknium
3d47af01c3 fix(honcho): write config to instance-local path for profile isolation (#4037)
Multiple agents/profiles running 'hermes honcho setup' all wrote to
the shared global ~/.honcho/config.json, overwriting each other's
configuration.

Root cause: _write_config() defaulted to resolve_config_path() which
returns the global path when no instance-local file exists yet (i.e.
on first setup).

Fix: _write_config() now defaults to _local_config_path() which always
returns $HERMES_HOME/honcho.json. Each profile gets its own config file.
Reading still falls back to global for cross-app interop and seeding.

Also updates cmd_setup and cmd_status messaging to show the actual
write path.

Includes 10 new tests verifying profile isolation, global fallback
reads, and multi-profile independence.
2026-03-30 16:41:19 -07:00
SHL0MS
275fcc6673 Merge pull request #4054 from NousResearch/ascii-video/text-readability-and-layout-oracle
ascii-video skill: text readability techniques and external layout oracle
2026-03-30 15:52:14 -07:00
SHL0MS
ab62614a89 ascii-video: add text readability techniques and external layout oracle pattern
- composition.md: add text backdrop (gaussian dark mask behind glyphs) and
  external layout oracle pattern (browser-based text layout → JSON → Python
  renderer pipeline for obstacle-aware text reflow)
- shaders.md: add reverse vignette shader (center-darkening for text readability)
- troubleshooting.md: add diagnostic entries for text-over-busy-background
  readability and kaleidoscope-destroys-text pitfall
2026-03-30 18:48:22 -04:00
Bryan Cross
0287597d02 Optimize Playwright install 2026-03-30 17:38:07 -05:00
Teknium
de368cac54 fix(tools): show browser and TTS in reconfigure menu (#4041)
* fix(gateway): honor default for invalid bool-like config values

* refactor: simplify web backend priority detection

Replace cascading boolean conditions with a priority-ordered loop.
Same behavior (verified against all 16 env var combinations),
half the lines, trivially extensible for new backends.

* fix(tools): show browser and TTS in reconfigure menu

_toolset_has_keys() returned False for toolsets with no-key providers
(Local Browser, Edge TTS) because it only checked providers with
env_vars. Users couldn't find these tools in the reconfigure list
and had no obvious way to switch browser/TTS backends.

Now treats providers with empty env_vars as always-configured, so
toolsets with free/local options always appear in the reconfigure menu.

---------

Co-authored-by: aydnOktay <xaydinoktay@gmail.com>
2026-03-30 14:11:39 -07:00
Bryan Cross
3a1e489dd6 Add build-essential to Dockerfile dependencies 2026-03-30 15:57:22 -05:00
Teknium
0d1003559d refactor: simplify web backend priority detection (#4036)
* fix(gateway): honor default for invalid bool-like config values

* refactor: simplify web backend priority detection

Replace cascading boolean conditions with a priority-ordered loop.
Same behavior (verified against all 16 env var combinations),
half the lines, trivially extensible for new backends.

---------

Co-authored-by: aydnOktay <xaydinoktay@gmail.com>
2026-03-30 13:37:25 -07:00
Bryan Cross
4f4d7c4eeb Merge branch 'NousResearch:main' into docker-optimization 2026-03-30 15:29:27 -05:00
Bryan Cross
5de312c9e3 Simplify dockerignore 2026-03-30 15:29:06 -05:00
Bryan Cross
48942c89b5 Further npm optimizations 2026-03-30 15:27:11 -05:00
Teknium
eba8d52d54 fix: show correct shell config path for macOS/zsh in install script (#4025)
- print_success() hardcoded 'source ~/.bashrc' regardless of user's shell
- On macOS (default zsh), ~/.bashrc doesn't exist, leaving users unable to
  find the hermes command after install
- Now detects $SHELL and shows the correct file (zshrc/bashrc)
- Also captures .[all] install failure output instead of silencing with
  2>/dev/null, so users can diagnose why full extras failed
2026-03-30 13:25:11 -07:00
Teknium
72104eb06f fix(gateway): honor default for invalid bool-like config values (#4029)
Co-authored-by: aydnOktay <xaydinoktay@gmail.com>
2026-03-30 13:24:48 -07:00
Bryan Cross
fdef0456a7 Merge branch 'NousResearch:main' into docker-optimization 2026-03-30 15:21:45 -05:00
Teknium
4b35836ba4 fix(auth): use bearer auth for MiniMax Anthropic endpoints (#4028)
MiniMax's /anthropic endpoints implement Anthropic's Messages API but
require Authorization: Bearer instead of x-api-key. Without this fix,
MiniMax users get 401 errors in gateway sessions.

Adds _requires_bearer_auth() to detect MiniMax endpoints and route
through auth_token in the Anthropic SDK. Check runs before OAuth
token detection so MiniMax keys aren't misclassified as setup tokens.

Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
2026-03-30 13:21:39 -07:00
Teknium
bd376fe976 fix(docs): improve mobile sidebar navigation
The sidebar had all categories expanded by default (collapsed: false),
which on mobile created a 60+ item flat list when opening the sidebar.
Reported by danny on Discord.

Changes:
- Set all top-level categories to collapsed: true (tap to expand)
- Enable autoCollapseCategories: true (accordion — opening one section
  closes others, prevents the overwhelming flat list)
- Enable hideable sidebar (swipe-to-dismiss on mobile)
- Add mobile CSS: larger touch targets (0.75rem padding), bolder
  category headers, visible subcategory indentation with left border,
  wider sidebar (85vw / 360px max), darker backdrop overlay
2026-03-30 13:20:55 -07:00
Teknium
f93637b3a1 feat: add /profile slash command to show active profile (#4027)
Adds /profile to COMMAND_REGISTRY (Info category) with handlers in
both CLI and gateway. Shows the active profile name and home directory.

Works on all platforms — CLI, Telegram, Discord, Slack, etc.
Detects profile by checking if HERMES_HOME is under ~/.hermes/profiles/.
Shows 'default' when running without a profile.
2026-03-30 13:20:06 -07:00
Bryan Cross
8210e7aba6 Optimize Dockerfile: combine RUN commands, clear caches, add .dockerignore
- Combine apt-get update and install into single RUN with cache clearing
- Remove APT lists after installation
- Add --no-cache-dir to pip install
- Add --prefer-offline --no-audit to npm install
- Create .dockerignore to exclude unnecessary files from build context
- Update docker-publish.yml workflow to tag images with release names
- Ensure buildx caching is used (type=gha)
2026-03-30 15:19:52 -05:00
Teknium
7b4fe0528f fix(auth): use bearer auth for MiniMax Anthropic endpoints (#4028)
MiniMax's /anthropic endpoints implement Anthropic's Messages API but
require Authorization: Bearer instead of x-api-key. Without this fix,
MiniMax users get 401 errors in gateway sessions.

Adds _requires_bearer_auth() to detect MiniMax endpoints and route
through auth_token in the Anthropic SDK. Check runs before OAuth
token detection so MiniMax keys aren't misclassified as setup tokens.

Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
2026-03-30 13:19:44 -07:00
Teknium
950f69475f feat(browser): add Camofox local anti-detection browser backend (#4008)
Camofox-browser is a self-hosted Node.js server wrapping Camoufox
(Firefox fork with C++ fingerprint spoofing). When CAMOFOX_URL is set,
all 11 browser tools route through the Camofox REST API instead of
the agent-browser CLI.

Maps 1:1 to the existing browser tool interface:
- Navigate, snapshot, click, type, scroll, back, press, close
- Get images, vision (screenshot + LLM analysis)
- Console (returns empty with note — camofox limitation)

Setup: npm start in camofox-browser dir, or docker run -p 9377:9377
Then: CAMOFOX_URL=http://localhost:9377 in ~/.hermes/.env

Advantages over Browserbase (cloud):
- Free (no per-session API costs)
- Local (zero network latency for browser ops)
- Anti-detection at C++ level (bypasses Cloudflare/Google bot detection)
- Works offline, Docker-ready

Files:
- tools/browser_camofox.py: Full REST backend (~400 lines)
- tools/browser_tool.py: Routing at each tool function
- hermes_cli/config.py: CAMOFOX_URL env var entry
- tests/tools/test_browser_camofox.py: 20 tests
2026-03-30 13:18:42 -07:00
Teknium
7dac75f2ae fix: prevent context pressure warning spam after compression (#4012)
* feat: add /yolo slash command to toggle dangerous command approvals

Adds a /yolo command that toggles HERMES_YOLO_MODE at runtime, skipping
all dangerous command approval prompts for the current session. Works in
both CLI and gateway (Telegram, Discord, etc.).

- /yolo -> ON: all commands auto-approved, no confirmation prompts
- /yolo -> OFF: normal approval flow restored

The --yolo CLI flag already existed for launch-time opt-in. This adds
the ability to toggle mid-session without restarting.

Session-scoped — resets when the process ends. Uses the existing
HERMES_YOLO_MODE env var that check_all_command_guards() already
respects.

* fix: prevent context pressure warning spam (agent loop + gateway rate-limit)

Two complementary fixes for repeated context pressure warnings spamming
gateway users (Telegram, Discord, etc.):

1. Agent-level loop fix (run_agent.py):
   After compression, only reset _context_pressure_warned if the
   post-compression estimate is actually below the 85% warning level.
   Previously the flag was unconditionally reset, causing the warning
   to re-fire every loop iteration when compression couldn't reduce
   below 85% of the threshold (e.g. very low threshold like 15%,
   or system prompt alone exceeds the warning level).

2. Gateway-level rate-limit (gateway/run.py, salvaged from PR #3786):
   Per-chat_id cooldown of 1 hour on compression warning messages.
   Both warning paths ('still large after compression' and 'compression
   failed') are gated. Defense-in-depth — even if the agent-level fix
   has edge cases, users won't see more than one warning per hour.

Co-authored-by: dlkakbs <dlkakbs@users.noreply.github.com>

---------

Co-authored-by: dlkakbs <dlkakbs@users.noreply.github.com>
2026-03-30 13:18:21 -07:00
Teknium
ed9af6e589 fix: create AsyncOpenAI lazily in trajectory_compressor to avoid closed event loop (#4013)
The AsyncOpenAI client was created once at __init__ and stored as an
instance attribute. process_directory() calls asyncio.run() which creates
and closes a fresh event loop. On a second call, the client's httpx
transport is still bound to the closed loop, raising RuntimeError:
"Event loop is closed" — the same pattern fixed by PR #3398 for the
main agent loop.

Create the client lazily in _get_async_client() so each asyncio.run()
gets a client bound to the current loop.

Co-authored-by: binhnt92 <binhnt.ht.92@gmail.com>
2026-03-30 13:16:16 -07:00
Teknium
158f49f19a fix: enforce priority order in Telegram menu — core > plugins > skills (#4023)
The menu now has explicit priority tiers:
1. Core CommandDef commands (always included, never bumped)
2. Plugin slash commands (take precedence over skills)
3. Built-in skill commands (fill remaining slots alphabetically)

Only skills get trimmed when the 100-command cap is hit. Adding new
core commands or plugin commands automatically pushes skills out,
not the other way around.
2026-03-30 13:04:06 -07:00
Teknium
86250a3e45 docs: expand terminal backends section + fix docs build (#4016)
* feat(telegram): add webhook mode as alternative to polling

When TELEGRAM_WEBHOOK_URL is set, the adapter starts an HTTP webhook
server (via python-telegram-bot's start_webhook()) instead of long
polling. This enables cloud platforms like Fly.io and Railway to
auto-wake suspended machines on inbound HTTP traffic.

Polling remains the default — no behavior change unless the env var
is set.

Env vars:
  TELEGRAM_WEBHOOK_URL    Public HTTPS URL for Telegram to push to
  TELEGRAM_WEBHOOK_PORT   Local listen port (default 8443)
  TELEGRAM_WEBHOOK_SECRET Secret token for update verification

Cherry-picked and adapted from PR #2022 by SHL0MS. Preserved all
current main enhancements (network error recovery, polling conflict
detection, DM topics setup).

Co-authored-by: SHL0MS <SHL0MS@users.noreply.github.com>

* fix: send_document call in background task delivery + vision download timeout

Two fixes salvaged from PR #2269 by amethystani:

1. gateway/run.py: adapter.send_file() → adapter.send_document()
   send_file() doesn't exist on BasePlatformAdapter. Background task
   media files were silently never delivered (AttributeError swallowed
   by except Exception: pass).

2. tools/vision_tools.py: configurable image download timeout via
   HERMES_VISION_DOWNLOAD_TIMEOUT env var (default 30s), plus guard
   against raise None when max_retries=0.

The third fix in #2269 (opencode-go auth config) was already resolved
on main.

Co-authored-by: amethystani <amethystani@users.noreply.github.com>

* docs: expand terminal backends section + fix feishu MDX build error

---------

Co-authored-by: SHL0MS <SHL0MS@users.noreply.github.com>
Co-authored-by: amethystani <amethystani@users.noreply.github.com>
2026-03-30 12:59:58 -07:00
Teknium
ea342f2382 Fix banner alignment in installer script (#4011)
Co-authored-by: Ahmed Khaled <wakeupwithme000@gmail.com>
2026-03-30 11:24:10 -07:00
Teknium
60ecde8ac7 fix: fit all 100 commands in Telegram menu with 40-char descriptions (#4010)
* fix: truncate skill descriptions to 100 chars in Telegram menu

* fix: 40-char desc cap + 100 command limit for Telegram menu

setMyCommands has an undocumented total payload size limit.
50 commands with 256-char descriptions failed, 50 with 100-char
worked, and 100 with 40-char descriptions also works (~5300 total
chars). Truncate skill descriptions to 40 chars in the menu picker
and set cap back to 100. Full descriptions available via /commands.
2026-03-30 11:21:13 -07:00
Teknium
f3069c649c fix(cli): add missing subprocess.run() timeouts in doctor and status (#4009)
Add timeout parameters to 4 subprocess.run() calls that could hang
indefinitely if the child process blocks (e.g., unresponsive docker
daemon, systemctl waiting for D-Bus):

- doctor.py: docker info (timeout=10), ssh check (timeout=15)
- status.py: systemctl is-active (timeout=5), launchctl list (timeout=5)

Each call site now catches subprocess.TimeoutExpired and treats it as
a failure, consistent with how non-zero return codes are already handled.

Add AST-based regression test that verifies every subprocess.run() call
in CLI modules specifies a timeout keyword argument.

Co-authored-by: dieutx <dangtc94@gmail.com>
2026-03-30 11:17:15 -07:00
Teknium
0976bf6cd0 feat: add /yolo slash command to toggle dangerous command approvals (#3990)
Adds a /yolo command that toggles HERMES_YOLO_MODE at runtime, skipping
all dangerous command approval prompts for the current session. Works in
both CLI and gateway (Telegram, Discord, etc.).

- /yolo -> ON: all commands auto-approved, no confirmation prompts
- /yolo -> OFF: normal approval flow restored

The --yolo CLI flag already existed for launch-time opt-in. This adds
the ability to toggle mid-session without restarting.

Session-scoped — resets when the process ends. Uses the existing
HERMES_YOLO_MODE env var that check_all_command_guards() already
respects.
2026-03-30 11:17:09 -07:00
Teknium
da3e22bcfa fix: cap Telegram menu at 50 commands — API rejects above ~60 (#4006)
* fix: use SKILLS_DIR not repo path for Telegram menu skill filter

Skills are synced to ~/.hermes/skills/ (SKILLS_DIR), not the repo's
skills/ directory. The previous filter compared against the repo path
so no skills matched. Now checks SKILLS_DIR and excludes .hub/
subdirectory (user-installed hub skills).

* fix: cap Telegram menu at 50 commands — API rejects above ~60

Telegram's setMyCommands returns BOT_COMMANDS_TOO_MUCH when
registering close to 100 commands despite docs claiming 100 is the
limit. Metadata overhead causes rejection above ~60. Cap at 50 for
reliability — remaining commands accessible via /commands.
2026-03-30 11:05:20 -07:00
Teknium
9fd78c7a8e fix: use SKILLS_DIR not repo path for Telegram menu skill filter (#4005)
Skills are synced to ~/.hermes/skills/ (SKILLS_DIR), not the repo's
skills/ directory. The previous filter compared against the repo path
so no skills matched. Now checks SKILLS_DIR and excludes .hub/
subdirectory (user-installed hub skills).
2026-03-30 11:01:13 -07:00
Teknium
5ceed021dc feat(gateway): skill-aware slash commands, paginated /commands, Telegram 100-cap (#3934)
* feat(gateway): skill-aware slash commands, paginated /commands, Telegram 100-cap

Map active skills to Telegram's slash command menu so users can
discover and invoke skills directly. Three changes:

1. Telegram menu now includes active skill commands alongside built-in
   commands, capped at 100 entries (Telegram Bot API limit). Overflow
   commands remain callable but hidden from the picker. Logged at
   startup when cap is hit.

2. New /commands [page] gateway command for paginated browsing of all
   commands + skills. /help now shows first 10 skill commands and
   points to /commands for the full list.

3. When a user types a slash command that matches a disabled or
   uninstalled skill, they get actionable guidance:
   - Disabled: 'Enable it with: hermes skills config'
   - Optional (not installed): 'Install with: hermes skills install official/<path>'

Built on ideas from PR #3921 by @kshitijk4poor.

* chore: move 21 niche skills to optional-skills

Move specialized/niche skills from built-in (skills/) to optional
(optional-skills/) to reduce the default skill count. Users can
install them with: hermes skills install official/<category>/<name>

Moved skills (21):
- mlops: accelerate, chroma, faiss, flash-attention,
  hermes-atropos-environments, huggingface-tokenizers, instructor,
  lambda-labs, llava, nemo-curator, pinecone, pytorch-lightning,
  qdrant, saelens, simpo, slime, tensorrt-llm, torchtitan
- research: domain-intel, duckduckgo-search
- devops: inference-sh cli

Built-in skills: 96 → 75
Optional skills: 22 → 43

* fix: only include repo built-in skills in Telegram menu, not user-installed

User-installed skills (from hub or manually added) stay accessible via
/skills and by typing the command directly, but don't get registered
in the Telegram slash command picker. Only skills whose SKILL.md is
under the repo's skills/ directory are included in the menu.

This keeps the Telegram menu focused on the curated built-in set while
user-installed skills remain discoverable through /skills and /commands.
2026-03-30 10:57:30 -07:00
Teknium
97d6813f51 fix(cache): use deterministic call_id fallbacks instead of random UUIDs (#3991)
When the API doesn't provide a call_id for tool calls, the fallback
generated a random uuid4 hex. This made every API call's input unique
when replayed, preventing OpenAI's prompt cache from matching the
prefix across turns.

Replaced all four uuid4 fallback sites with a deterministic hash of
(function_name, arguments, position_index). The same tool call now
always produces the same fallback call_id, preserving cache-friendly
input stability.

Affected code paths:
- _chat_messages_to_responses_input() — Codex input reconstruction
- _normalize_codex_response() — function_call and custom_tool_call
- _build_assistant_message() — assistant message construction
2026-03-30 09:43:56 -07:00
Teknium
37825189dd fix(skills): validate hub bundle paths before install (#3986)
Co-authored-by: Gutslabs <gutslabsxyz@gmail.com>
2026-03-30 08:37:19 -07:00
Teknium
e08778fa1e chore: release v0.6.0 (2026.3.30) (#3985) 2026-03-30 08:29:38 -07:00
Teknium
fb634068df fix(security): extend secret redaction to ElevenLabs, Tavily and Exa API keys (#3920)
ElevenLabs (sk_), Tavily (tvly-), and Exa (exa_) keys were not covered
by _PREFIX_PATTERNS, leaking in plain text via printenv or log output.

Salvaged from PR #3790 by @memosr. Tests rewritten with correct
assertions (original tests had vacuously true checks).

Co-authored-by: memosr <memosr@users.noreply.github.com>
2026-03-30 08:13:01 -07:00
Teknium
74181fe726 fix: add TTY guard to interactive CLI commands to prevent CPU spin (#3933)
When interactive TUI commands are invoked non-interactively (e.g. via
the agent's terminal() tool through a subprocess pipe), curses loops
spin at 100% CPU and input() calls hang indefinitely.

Defense in depth — two layers:

1. Source-level guard in curses_checklist() (curses_ui.py + checklist.py):
   Returns cancel_returns immediately when stdin is not a TTY. This
   catches ALL callers automatically, including future code.

2. Command-level guards with clear error messages:
   - hermes tools (interactive checklist, not list/disable/enable)
   - hermes setup (interactive wizard)
   - hermes model (provider/model picker)
   - hermes whatsapp (pairing setup)
   - hermes skills config (skill toggle)
   - hermes mcp configure (tool selection)
   - hermes uninstall (confirmation prompt)

Non-interactive subcommands (hermes tools list, hermes tools enable,
hermes mcp add/remove/list/test, hermes skills search/install/browse)
remain unaffected.
2026-03-30 08:10:23 -07:00
Teknium
1e896b0251 fix: resolve 7 failing CI tests (#3936)
1. matrix voice: _on_room_message_media unconditionally overwrote
   media_urls with the image cache path (always None for non-images),
   wiping the locally-cached voice path. Now only overrides when
   cached_path is truthy.

2. cli_tools_command: /tools disable no longer prompts for confirmation
   (input() removed in earlier commit to fix TUI hang), but tests still
   expected the old Y/N prompt flow. Updated tests to match current
   behavior (direct apply + session reset).

3. slack app_mention: connect() was refactored for multi-workspace
   (creates AsyncWebClient per token), but test only mocked the old
   self._app.client path. Added AsyncWebClient and acquire_scoped_lock
   mocks.

4. website_policy: module-level _cached_policy from earlier tests caused
   fast-path return of None. Added invalidate_cache() before assertion.

5. codex 401 refresh: already passing on current main (fixed by
   intervening commit).
2026-03-30 08:10:14 -07:00
0xbyt4
0b0c1b326c fix: openclaw migration overwrites model config dict with string (#3924)
migrate_model_config() was writing `config["model"] = model_str` which
replaces the entire model dict (default, provider, base_url) with a
bare string. This causes 'str' object has no attribute 'get' errors
throughout Hermes when any code does model_cfg.get("default").

Now preserves the existing model dict and only updates the "default"
key, keeping provider/base_url intact.
2026-03-30 03:02:28 -07:00
Teknium
b4496b33b5 fix: background task media delivery + vision download timeout (#3919)
* feat(telegram): add webhook mode as alternative to polling

When TELEGRAM_WEBHOOK_URL is set, the adapter starts an HTTP webhook
server (via python-telegram-bot's start_webhook()) instead of long
polling. This enables cloud platforms like Fly.io and Railway to
auto-wake suspended machines on inbound HTTP traffic.

Polling remains the default — no behavior change unless the env var
is set.

Env vars:
  TELEGRAM_WEBHOOK_URL    Public HTTPS URL for Telegram to push to
  TELEGRAM_WEBHOOK_PORT   Local listen port (default 8443)
  TELEGRAM_WEBHOOK_SECRET Secret token for update verification

Cherry-picked and adapted from PR #2022 by SHL0MS. Preserved all
current main enhancements (network error recovery, polling conflict
detection, DM topics setup).

Co-authored-by: SHL0MS <SHL0MS@users.noreply.github.com>

* fix: send_document call in background task delivery + vision download timeout

Two fixes salvaged from PR #2269 by amethystani:

1. gateway/run.py: adapter.send_file() → adapter.send_document()
   send_file() doesn't exist on BasePlatformAdapter. Background task
   media files were silently never delivered (AttributeError swallowed
   by except Exception: pass).

2. tools/vision_tools.py: configurable image download timeout via
   HERMES_VISION_DOWNLOAD_TIMEOUT env var (default 30s), plus guard
   against raise None when max_retries=0.

The third fix in #2269 (opencode-go auth config) was already resolved
on main.

Co-authored-by: amethystani <amethystani@users.noreply.github.com>

---------

Co-authored-by: SHL0MS <SHL0MS@users.noreply.github.com>
Co-authored-by: amethystani <amethystani@users.noreply.github.com>
2026-03-30 02:59:39 -07:00
Teknium
d028a94b83 fix(whatsapp): skip reply prefix in bot mode — only needed for self-chat (#3931)
The WhatsApp bridge prepends '⚕ *Hermes Agent*\n────────────\n' to
every outgoing message. In self-chat mode this is necessary to
distinguish the bot's responses from the user's own messages. In bot
mode the messages already come from a different number, making the
prefix redundant and cluttered.

Now only prepends the prefix when WHATSAPP_MODE is 'self-chat' (the
default). Bot mode messages are sent clean.
2026-03-30 02:55:33 -07:00
Teknium
0e592aa5b4 fix(cli): remove input() from /tools disable that freezes the terminal (#3918)
input() hangs inside prompt_toolkit's TUI event loop — this is a known
pitfall (AGENTS.md). The /tools disable and /tools enable commands used
input() for a Y/N confirmation prompt, causing the terminal to freeze
with no way to type a response.

Fix: remove the confirmation prompt. The user typing '/tools disable web'
is implicit consent. The change is applied directly with a status message.
2026-03-30 02:53:21 -07:00
Wing Lian
efae525dc5 feat(plugins): add inject_message interface for remote message injection (#3778) 2026-03-30 02:48:06 -07:00
Teknium
5148682b43 feat: mount skills directory into all remote backends with live sync (#3890)
Skills with scripts/, templates/, and references/ subdirectories need
those files available inside sandboxed execution environments. Previously
the skills directory was missing entirely from remote backends.

Live sync — files stay current as credentials refresh and skills update:
- Docker/Singularity: bind mounts are inherently live (host changes
  visible immediately)
- Modal: _sync_files() runs before each command with mtime+size caching,
  pushing only changed credential and skill files (~13μs no-op overhead)
- SSH: rsync --safe-links before each command (naturally incremental)
- Daytona: _upload_if_changed() with mtime+size caching before each command

Security — symlink filtering:
- Docker/Singularity: sanitized temp copy when symlinks detected
- Modal/Daytona: iter_skills_files() skips symlinks
- SSH: rsync --safe-links skips symlinks pointing outside source tree
- Temp dir cleanup via atexit + reuse across calls

Non-root user support:
- SSH: detects remote home via echo $HOME, syncs to $HOME/.hermes/
- Daytona: detects sandbox home before sync, uploads to $HOME/.hermes/
- Docker/Modal/Singularity: run as root, /root/.hermes/ is correct

Also:
- credential_files.py: fix name/path key fallback in required_credential_files
- Singularity, SSH, Daytona: gained credential file support
- 14 tests covering symlink filtering, name/path fallback, iter_skills_files
2026-03-30 02:45:41 -07:00
Teknium
791f4e94b2 feat(slack): multi-workspace support via OAuth token file (#3903)
Salvaged from PR #2033 by yoannes. Adds multi-workspace Slack support
so a single Hermes instance can serve multiple Slack workspaces after
OAuth installs.

Changes:
- Support comma-separated bot tokens in SLACK_BOT_TOKEN env var
- Load additional OAuth-persisted tokens from HERMES_HOME/slack_tokens.json
- Route all Slack API calls through workspace-aware _get_client(chat_id)
  instead of always using the primary app client
- Track channel → workspace mapping from incoming events
- Per-workspace bot_user_id for correct mention detection
- Workspace-aware file downloads (correct auth token per workspace)

Backward compatible: single-token setups work identically.

Token file format (slack_tokens.json):
  {"T12345": {"token": "xoxb-...", "team_name": "My Workspace"}}

Fixed from original PR:
- Uses get_hermes_home() instead of hardcoded ~/.hermes/ path

Co-authored-by: yoannes <yoannes@users.noreply.github.com>
2026-03-30 01:51:48 -07:00
Teknium
a4b064763d fix(cron): tighten [SILENT] instruction to prevent report-with-silent-prefix (#3901)
The model was interpreting [SILENT] as a metadata prefix and writing
full reports with [SILENT] slapped at the front. The old instruction
said 'optionally followed by a brief internal note' which gave too
much room. New instruction explicitly says: [SILENT] means nothing
else, do NOT combine it with a report.
2026-03-30 00:11:00 -07:00
Teknium
138ea3fbe8 fix(docs): escape angle-bracket URLs in feishu.md breaking MDX build (#3902) 2026-03-30 00:09:30 -07:00
Teknium
ee61485cac feat(matrix): support native voice messages via MSC3245 (#3877)
* feat(matrix): support native voice messages

* fix: skip matrix voice tests when matrix-nio not installed

---------

Co-authored-by: Carlos Alberto Pereira Gomes <carlosapgomes@users.noreply.github.com>
2026-03-30 00:02:51 -07:00
Teknium
947faed3bc feat(approvals): make dangerous command approval timeout configurable (#3886)
* feat(approvals): make dangerous command approval timeout configurable

Read `approvals.timeout` from config.yaml (default 60s) instead of
hardcoding 60 seconds in both the fallback CLI prompt and the TUI
prompt_toolkit callback.

Follows the same pattern as `clarify.timeout` which is already
configurable via CLI_CONFIG.

Closes #3765

* fix: add timeout default to approvals section in DEFAULT_CONFIG

---------

Co-authored-by: acsezen <asezen@icloud.com>
2026-03-30 00:02:02 -07:00
kshitij
c288bbfb57 fix(cli): prevent status bar wrapping into duplicate rows (#3883)
- measure status bar display width using prompt_toolkit cell widths
- trim rendered status text when fragments would overflow
- add a final single-fragment fallback to prevent wrapping
- update width assertions to validate display cells instead of len()
2026-03-29 23:59:07 -07:00
Teknium
a347921314 docs: comprehensive OpenClaw migration guide (#3900)
New standalone guide at guides/migrate-from-openclaw.md with:
- Complete config key mapping tables for every category
- Agent behavior mappings (thinkingDefault → reasoning_effort, etc.)
- Session reset policy mapping (session.reset vs resetTriggers)
- TTS dual-source explanation (messages.tts.providers + talk config)
- MCP server field-by-field mapping
- Messaging platform table with exact config paths and env vars
- API key resolution: 3 sources, priority order, supported targets
- SecretRef handling: plain strings, env templates, SecretRef objects
- Post-migration checklist (6 steps)
- Troubleshooting section
- Complete archived items table with recreation guidance

CLI commands reference condensed to summary + link to full guide.
Added to sidebar under Guides & Tutorials.
2026-03-29 23:58:12 -07:00
Teknium
09def65eff fix(migration): expand OpenClaw migration to cover full data footprint (#3869)
Cross-referenced the OpenClaw Zod schema and TypeScript source against
our migration script. Found and fixed:

Expanded data sources:
- Legacy config fallback: clawdbot.json, moldbot.json
- Legacy dir fallback: ~/.clawdbot/, ~/.moldbot/
- API keys from ~/.openclaw/.env and auth-profiles.json
- Personal skills from ~/.agents/skills/
- Project skills from workspace/.agents/skills/
- BOOTSTRAP.md archived (was silently skipped)
- Expanded env key allowlist: DEEPSEEK, GEMINI, ZAI, MINIMAX

Fixed wrong config paths (verified against Zod schema):
- humanDelay.enabled → humanDelay.mode (field doesn't exist as .enabled)
- agents.defaults.exec.timeout → tools.exec.timeoutSec (wrong path + name)
- messages.tts.elevenlabs.voiceId → messages.tts.providers.elevenlabs.voiceId
- session.resetTriggers (string[]) → session.reset (structured object)
- approvals.mode → approvals.exec.mode (no top-level mode)
- browser.inactivityTimeoutMs → doesn't exist; map cdpUrl+headless instead
- tools.webSearch.braveApiKey → tools.web.search.brave.apiKey
- tools.exec.timeout → tools.exec.timeoutSec

Added SecretRef resolution:
- All token/apiKey fields in OpenClaw can be strings, env templates
  (${VAR}), or SecretRef objects ({source:'env',id:'VAR'}). Added
  resolve_secret_input() to handle all three forms.

Fixed auth-profiles.json:
- Canonical field is 'key' not 'apiKey' (though alias accepted)
- File wraps entries in a 'profiles' key — now handled

Fixed TTS config:
- Provider settings at messages.tts.providers.{name} (not flat)
- Also checks top-level 'talk' config as fallback source

Docs updated with new sources and key list.
2026-03-29 22:49:34 -07:00
Teknium
649d149438 feat(telegram): add webhook mode as alternative to polling (#3880)
When TELEGRAM_WEBHOOK_URL is set, the adapter starts an HTTP webhook
server (via python-telegram-bot's start_webhook()) instead of long
polling. This enables cloud platforms like Fly.io and Railway to
auto-wake suspended machines on inbound HTTP traffic.

Polling remains the default — no behavior change unless the env var
is set.

Env vars:
  TELEGRAM_WEBHOOK_URL    Public HTTPS URL for Telegram to push to
  TELEGRAM_WEBHOOK_PORT   Local listen port (default 8443)
  TELEGRAM_WEBHOOK_SECRET Secret token for update verification

Cherry-picked and adapted from PR #2022 by SHL0MS. Preserved all
current main enhancements (network error recovery, polling conflict
detection, DM topics setup).

Co-authored-by: SHL0MS <SHL0MS@users.noreply.github.com>
2026-03-29 22:36:07 -07:00
Teknium
5602458794 security: harden dangerous command detection and add file tool path guards (#3872)
Closes gaps that allowed an agent to expose Docker's Remote API to the
internet by writing to /etc/docker/daemon.json.

Terminal tool (approval.py):
- chmod: now catches 666 and symbolic modes (o+w, a+w), not just 777
- cp/mv/install: detected when targeting /etc/
- sed -i/--in-place: detected when targeting /etc/

File tools (file_tools.py):
- write_file and patch now refuse to write to sensitive system paths
  (/etc/, /boot/, /usr/lib/systemd/, docker.sock)
- Directs users to the terminal tool (which has approval prompts) for
  system file modifications
2026-03-29 22:33:47 -07:00
Teknium
1c900c45e3 fix(agent): support full context length resolution for direct Gemini API endpoints (#3876)
* add .aac audio file format support to transcription tool

* fix(agent): support full context length resolution for direct Gemini API endpoints

Add generativelanguage.googleapis.com to _URL_TO_PROVIDER so direct
Gemini API users get correct 1M+ context length instead of the 128K
unknown-proxy fallback.

Co-authored-by: bb873 <bb873@users.noreply.github.com>

---------

Co-authored-by: Adrian Scott <adrian@adrianscott.com>
Co-authored-by: bb873 <bb873@users.noreply.github.com>
2026-03-29 21:56:07 -07:00
Teknium
227601c200 feat(discord): add message processing reactions (salvage #1980) (#3871)
Adds lifecycle hooks to the base platform adapter so Discord (and future
platforms) can react to message processing events:

  👀  when processing starts
    on successful completion (delivery confirmed)
    on failure, error, or cancellation

Implementation:
- base.py: on_processing_start/on_processing_complete hooks with
  _run_processing_hook error isolation wrapper; delivery tracking
  via _record_delivery closure for accurate success detection
- discord.py: _add_reaction/_remove_reaction helpers + hook overrides
- Tests for base hook lifecycle and Discord-specific reactions

Co-authored-by: alanwilhelm <alanwilhelm@users.noreply.github.com>
2026-03-29 21:55:23 -07:00
Teknium
fd29933a6d fix: use argparse entrypoint in top-level launcher (#3874)
The ./hermes convenience script still used the legacy Fire-based
cli.main wrapper, which doesn't support subcommands (gateway, cron,
doctor, etc.). The installed 'hermes' command already uses
hermes_cli.main:main (argparse) — this aligns the launcher.

Salvaged from PR #2009 by gito369.
2026-03-29 21:54:36 -07:00
Teknium
839f798b74 feat(telegram): add group mention gating and regex triggers (#3870)
Adds Discord-style mention gating for Telegram groups:
- telegram.require_mention: gate group messages (default: false)
- telegram.mention_patterns: regex wake-word triggers
- telegram.free_response_chats: bypass gating for specific chats

When require_mention is enabled, group messages are accepted only for:
- slash commands
- replies to the bot
- @botusername mentions
- regex wake-word pattern matches

DMs remain unrestricted. @mention text is stripped before passing to
the agent. Invalid regex patterns are ignored with a warning.

Config bridges follow the existing Discord pattern (yaml → env vars).

Cherry-picked and adapted from PR #1977 by mcleay. Fixed ChatType
comparison to work without python-telegram-bot installed (uses string
matching instead of enum, consistent with other entity_type checks).

Co-authored-by: mcleay <mcleay@users.noreply.github.com>
2026-03-29 21:53:59 -07:00
Teknium
366bfc3c76 fix(setup): auto-install matrix-nio during hermes setup (#3873)
Setup previously only printed a manual install hint for matrix-nio,
causing the gateway to crash with 'matrix-nio not installed' after
configuring Matrix. Now auto-installs matrix-nio (or matrix-nio[e2e]
when E2EE is enabled) using the same uv-first/pip-fallback pattern
as Daytona and Modal backends.

Also adds hermes-agent[matrix] to the [all] extra in pyproject.toml
and a regression test to keep it there.

Co-authored-by: Gutslabs <Gutslabs@users.noreply.github.com>
Co-authored-by: cutepawss <cutepawss@users.noreply.github.com>
2026-03-29 21:53:28 -07:00
Teknium
b4ceb541a7 fix(terminal): preserve partial output when command times out (#3868)
When a command timed out, all captured output was discarded — the agent
only saw 'Command timed out after Xs' with zero context. Now returns
the buffered output followed by a timeout marker, matching the existing
interrupt path behavior.

Salvaged from PR #3286 by @binhnt92.

Co-authored-by: nguyen binh <binhnt92@users.noreply.github.com>
2026-03-29 21:51:44 -07:00
Teknium
ccf7bb1102 fix(nous): use curated model list instead of full API dump for Nous Portal (#3867)
All three Nous Portal model selection paths (hermes model, first-time
login, setup wizard) were hitting the live /models endpoint and showing
every model available — potentially hundreds. Now uses the curated
_PROVIDER_MODELS['nous'] list (25 agentic models matching OpenRouter
defaults) with 'Enter custom model name' for anything else.

Fixed in:
- hermes_cli/main.py: _model_flow_nous()
- hermes_cli/auth.py: _login_nous() model selection
- hermes_cli/setup.py: post-login model selection
2026-03-29 21:38:10 -07:00
Teknium
ce2841f3c9 feat(gateway): add WeCom (Enterprise WeChat) platform support (#3847)
Adds WeCom as a gateway platform adapter using the AI Bot WebSocket
gateway for real-time bidirectional communication. No public endpoint
or new pip dependencies needed (uses existing aiohttp + httpx).

Features:
- WebSocket persistent connection with auto-reconnect (exponential backoff)
- DM and group messaging with configurable access policies
- Media upload/download with AES decryption for encrypted attachments
- Markdown rendering, quote context preservation
- Proactive + passive reply message modes
- Chunked media upload pipeline (512KB chunks)

Cherry-picked from PR #1898 by EvilRan with:
- Moved to current main (PR was 300 commits behind)
- Skipped base.py regressions (reply_to additions are good but belong
  in a separate PR since they affect all platforms)
- Fixed test assertions to match current base class send() signature
  (reply_to=None kwarg now explicit)
- All 16 integration points added surgically to current main
- No new pip dependencies (aiohttp + httpx already installed)

Fixes #1898

Co-authored-by: EvilRan <EvilRan@users.noreply.github.com>
2026-03-29 21:29:13 -07:00
Teknium
e296efbf24 fix: add INFO-level logging for auxiliary provider resolution (#3866)
The auxiliary client's auto-detection chain was a black box — when
compression, summarization, or memory flush failed, the only clue was
a generic 'Request timed out' with no indication of which provider was
tried or why it was skipped.

Now logs at INFO level:
- 'Auxiliary auto-detect: using local/custom (qwen3.5-9b) — skipped:
  openrouter, nous' when auto-detection picks a provider
- 'Auxiliary compression: using auto (qwen3.5-9b) at http://localhost:11434/v1'
  before each auxiliary call
- 'Auxiliary compression: provider custom unavailable, falling back to
  openrouter' on fallback
- Clear warning with actionable guidance when NO provider is available:
  'Set OPENROUTER_API_KEY or configure a local model in config.yaml'
2026-03-29 21:29:00 -07:00
Robin Fernandes
1cbb1b99cc Gate tool-gateway behind an env var, so it's not in users' faces until we're ready. Even if users enable it, it'll be blocked server-side for now, until we unlock for non-admin users on tool-gateway. 2026-03-30 13:28:10 +09:00
Teknium
2ff2cd3a59 add .aac audio file format support to transcription tool (#3865)
Co-authored-by: Adrian Scott <adrian@adrianscott.com>
2026-03-29 21:27:03 -07:00
Teknium
f39ca81bab docs: comprehensive hermes claw migrate reference (#3864)
The existing docs were two lines. The migration script handles 35
categories of data across persona, memory, skills, messaging platforms,
model providers, MCP servers, agent config, and more.

New docs cover:
- All CLI options (--dry-run, --preset, --overwrite, --migrate-secrets,
  --source, --workspace-target, --skill-conflict, --yes)
- 27 directly-imported categories with source → destination mapping
- 7 archived categories with manual recreation guidance
- Security notes on API key allowlisting
- Usage examples for common migration scenarios
2026-03-29 21:25:13 -07:00
Teknium
3fad1e7cc1 fix(cron): resolve human-friendly delivery labels via channel directory (#3860)
Cron jobs configured with deliver labels from send_message(action='list')
like 'whatsapp:Alice (dm)' passed the label as a literal chat_id.
WhatsApp bridge failed with jidDecode error since 'Alice (dm)' isn't
a valid JID.

Now _resolve_delivery_target() strips display suffixes like ' (dm)' and
resolves human-friendly names via the channel directory before using
them. Raw IDs pass through unchanged when the directory has no match.

Fixes #1945.
2026-03-29 21:24:17 -07:00
Teknium
86ac23c8da fix(auth): stop silently falling back to OpenRouter when no provider is configured (#3862)
Previously, when no API keys or provider credentials were found, Hermes
silently defaulted to OpenRouter + Claude Opus. This caused confusion
when users configured local servers (LM Studio, Ollama, etc.) with a
typo or unrecognized provider name — the system would silently route to
OpenRouter instead of telling them something was wrong.

Changes:
- resolve_provider() now raises AuthError when no credentials are found
  instead of returning 'openrouter' as a silent fallback
- Added local server aliases: lmstudio, ollama, vllm, llamacpp → custom
- Removed hardcoded 'anthropic/claude-opus-4.6' fallback from gateway
  and cron scheduler (they read from config.yaml instead)
- Updated cli-config.yaml.example with complete provider documentation
  including all supported providers, aliases, and local server setup
2026-03-29 21:06:35 -07:00
Teknium
3cc50532d1 fix: auxiliary client uses placeholder key for local servers without auth (#3842)
Local inference servers (Ollama, llama.cpp, vLLM, LM Studio) don't
require API keys, but the auxiliary client's _resolve_custom_runtime()
rejected endpoints with empty keys — causing the auto-detection chain
to skip the user's local server entirely.  This broke compression,
summarization, and memory flush for users running local models without
an OpenRouter/cloud API key.

The main CLI already had this fix (PR #2556, 'no-key-required'
placeholder), but the auxiliary client's resolution path was missed.

Two fixes:
- _resolve_custom_runtime(): use 'no-key-required' placeholder instead
  of returning None when base_url is present but key is empty
- resolve_provider_client() custom branch: same placeholder fallback
  for explicit_base_url without explicit_api_key

Updates 2 tests that expected the old (broken) behavior.
2026-03-29 21:05:36 -07:00
Teknium
2d607d36f6 fix(security): catch sensitive path writes in approval checks (#3859)
Co-authored-by: Gutslabs <gutslabsxyz@gmail.com>
2026-03-29 20:57:57 -07:00
Teknium
aa389924ad fix: prefer curated model list when live probe returns fewer models (#3856)
The model picker for API-key providers (MiniMax, z.ai, etc.) probes
the live /models endpoint when the curated list has fewer than 8
models. When the live endpoint returns fewer models than the curated
list (e.g. MiniMax's Anthropic-compatible endpoint doesn't list M2.7),
the incomplete live list was used instead.

Now falls back to the curated list when live returns fewer models,
ensuring new models like MiniMax-M2.7 always appear in the picker.
2026-03-29 20:55:15 -07:00
Teknium
5e67fc8c40 fix(vision): reject non-image files and enforce website policy (salvage #1940) (#3845)
Three safety gaps in vision_analyze_tool:

1. Local files accepted without checking if they're actually images —
   a renamed text file would get base64-encoded and sent to the model.
   Now validates magic bytes (PNG, JPEG, GIF, BMP, WebP, SVG).

2. No website policy enforcement on image URLs — blocked domains could
   be fetched via the vision tool. Now checks before download.

3. No redirect check — if an allowed URL redirected to a blocked domain,
   the download would proceed. Now re-checks the final URL.

Fixed one test that needed _validate_image_url mocked to bypass DNS
resolution on the fake blocked.test domain (is_safe_url does DNS
checks that were added after the original PR).

Co-authored-by: GutSlabs <GutSlabs@users.noreply.github.com>
2026-03-29 20:55:04 -07:00
Teknium
b60cfd6ce6 fix(telegram): gracefully handle deleted reply targets (#3858)
* fix: add gpt-5.4-mini to Codex fallback catalog

* fix(telegram): gracefully handle deleted reply targets

When a user deletes their message while Hermes is processing, Telegram
returns BadRequest 'Message to be replied not found'. Previously this
was an unhandled permanent error causing silent delivery failure.

Now clears reply_to_id and retries so the response is still delivered,
matching the existing 'thread not found' recovery pattern.

Inspired by PR #3231 by @heathley. Fixes #3229.

---------

Co-authored-by: Clippy <clippy@grads.flow>
Co-authored-by: Nigel Gibbs <heathley@users.noreply.github.com>
2026-03-29 20:47:07 -07:00
Teknium
981e14001c fix: clear api_mode on provider switch instead of hardcoding chat_completions (#3857)
PR #3726 fixed stale codex_responses persisting when switching providers
by hardcoding api_mode=chat_completions in 5 model flows. This broke
MiniMax, MiniMax-CN, and Alibaba which use /anthropic endpoints that
need anthropic_messages — the hardcoded value overrides the URL-based
auto-detection in runtime_provider.py.

Fix: pop api_mode from config in the 3 URL-dependent flows (custom
endpoint, Kimi, api_key_provider) instead of hardcoding. The runtime
resolver already correctly auto-detects api_mode from the base_url
suffix (/anthropic -> anthropic_messages, else chat_completions).

OpenRouter and Copilot ACP flows keep the explicit value since their
api_mode is always known.

Reported by stefan171.
2026-03-29 20:44:39 -07:00
Teknium
9d28f4aba3 fix: add gpt-5.4-mini to Codex fallback catalog (#3855)
Co-authored-by: Clippy <clippy@grads.flow>
2026-03-29 20:10:00 -07:00
Teknium
3e203de125 fix(skills): block category path traversal in skill manager (#3844)
Validate category names in _create_skill() before using them as
filesystem path segments. Previously, categories like '../escape' or
'/tmp/pwned' could write skill files outside ~/.hermes/skills/.

Adds _validate_category() that rejects slashes, backslashes, absolute
paths, and non-alphanumeric characters (reuses existing VALID_NAME_RE).

Tests: 5 new tests for traversal, absolute paths, and valid categories.

Salvaged from PR #1939 by Gutslabs.
2026-03-29 20:08:22 -07:00
Teknium
2d264a4562 fix(tests): resolve 10 CI failures across hooks, tiktoken, plugins (#3848)
test_hooks.py (7 failures): Built-in boot-md hook was always loaded
by _register_builtin_hooks(), adding +1 to every expected hook count.
Mock out built-in registration in TestDiscoverAndLoad so tests isolate
user-hook discovery logic.

test_tool_token_estimation.py (2 failures): tiktoken is not in
core/[all] dependencies. The estimation function gracefully returns {}
when tiktoken is missing, but tests expected non-empty results. Added
skipif markers for tests that need tiktoken.

test_plugins_cmd.py (1 failure): bare 'hermes plugins' now dispatches
to cmd_toggle() (interactive curses UI) instead of cmd_list(). Updated
test to match the new behavior.
2026-03-29 20:05:59 -07:00
Teknium
3e2c8c529b fix(whatsapp): resolve LID↔phone aliases in allowlist matching (#3830)
WhatsApp DMs can arrive with LID sender IDs even when
WHATSAPP_ALLOWED_USERS is configured with phone numbers. The allowlist
check now reads bridge session mapping files (lid-mapping-*.json) to
resolve phone↔LID aliases, matching users regardless of which
identifier format the message uses.

Both the Python gateway (_is_user_authorized) and the Node bridge
(allowlist.js) now share the same mapping-file-based resolution logic.

Co-authored-by: Frederico Ribeiro <fr@tecompanytea.com>
2026-03-29 18:21:50 -07:00
Teknium
e4d575e563 fix: report subagent status as completed when summary exists (#3829)
When a subagent hit max_iterations, status was always 'failed' even
if it produced a usable summary via _handle_max_iterations(). This
happened because the status check required both completed=True AND
a summary, but completed is False whenever max_iterations is reached
(run_agent.py line 7969).

Now gates status on whether a summary was produced — if the subagent
returned a final_response, the parent has usable output regardless of
iteration budget. The exit_reason field already distinguishes
'completed' vs 'max_iterations' for anything that needs to know how
the task ended.

Closes #1899.
2026-03-29 18:21:36 -07:00
Teknium
2a0e8b001f fix(cli): handle closed stdout ValueError in safe print paths (#3843)
When stdout is closed (piped to a dead process, broken terminal),
Python raises ValueError('I/O operation on closed file'), not OSError.
_safe_print and the API error printer only caught OSError, letting the
ValueError propagate and crash the agent.

Salvaged from PR #3760 by @apexscaleai. Fixes #3534.

Co-authored-by: apexscaleai <apexscaleai@users.noreply.github.com>
2026-03-29 18:21:27 -07:00
Teknium
ca4907dfbc feat(gateway): add Feishu/Lark platform support (#3817)
Adds Feishu (ByteDance's enterprise messaging platform) as a gateway
platform adapter with full feature parity: WebSocket + webhook transports,
message batching, dedup, rate limiting, rich post/card content parsing,
media handling (images/audio/files/video), group @mention gating,
reaction routing, and interactive card button support.

Cherry-picked from PR #1793 by penwyp with:
- Moved to current main (PR was 458 commits behind)
- Fixed _send_with_retry shadowing BasePlatformAdapter method (renamed to
  _feishu_send_with_retry to avoid signature mismatch crash)
- Fixed import structure: aiohttp/websockets imported independently of
  lark_oapi so they remain available when SDK is missing
- Fixed get_hermes_home import (hermes_constants, not hermes_cli.config)
- Added skip decorators for tests requiring lark_oapi SDK
- All 16 integration points added surgically to current main

New dependency: lark-oapi>=1.5.3,<2 (optional, pip install hermes-agent[feishu])

Fixes #1788

Co-authored-by: penwyp <penwyp@users.noreply.github.com>
2026-03-29 18:17:42 -07:00
Teknium
e314833c9d feat(display): configurable tool preview length -- show full paths by default (#3841)
Tool call previews (paths, commands, queries) were hardcoded to truncate
at 35-40 chars across CLI spinners, completion lines, and gateway progress
messages. Users could not see full file paths in tool output.

New config option: display.tool_preview_length (default 0 = no limit).
Set a positive number to truncate at that length.

Changes:
- display.py: module-level _tool_preview_max_len with getter/setter;
  build_tool_preview() and get_cute_tool_message() _trunc/_path respect it
- cli.py: reads config at startup, spinner widget respects config
- gateway/run.py: reads config per-message, progress callback respects config
- run_agent.py: removed redundant 30-char quiet-mode spinner truncation
- config.py: added display.tool_preview_length to DEFAULT_CONFIG

Reported by kriskaminski
2026-03-29 18:02:42 -07:00
Teknium
59f2b228f7 fix(paths): respect HERMES_HOME for protected .env write-deny path (#3840)
The write-deny list in file_operations.py hardcoded ~/.hermes/.env,
which misses the actual .env in custom HERMES_HOME or profile setups.
Use get_hermes_home() for profile-safe path resolution.

Salvaged from PR #3232 by @erhnysr.

Co-authored-by: Erhnysr <erhnysr@users.noreply.github.com>
2026-03-29 18:02:11 -07:00
Teknium
d6b7836210 fix: update session_log_file during context compression (#3835)
When compression creates a child session with a new session_id,
session_log_file was still pointing to the old session's JSON file.
This caused _save_session_log() to write new data to the wrong file.

Closes #3731.

Co-authored-by: kelsia14 <kelsia14@users.noreply.github.com>
2026-03-29 17:49:58 -07:00
Teknium
17b6000e90 feat(skills): add songwriting-and-ai-music creative skill (salvage #1901) (#3834)
Adds a songwriting craft and AI music prompt engineering skill covering
song structure, rhyme/meter, emotional arcs, Suno metatag reference,
phonetic tricks for AI singers, parody adaptation, and production workflow.

Complements existing music skills (heartmula, audiocraft, songsee) which
cover model setup/usage — this one covers the creative process itself.

Also removes the empty skills/music-creation/ category (only had a
DESCRIPTION.md, no actual skills).

Co-authored-by: 123mikeyd <123mikeyd@users.noreply.github.com>
2026-03-29 17:49:19 -07:00
Teknium
45c8d3da96 fix(banner): show lazy-initialized tools in yellow instead of red (salvage #1854) (#3822)
Tools from check_fn-gated toolsets (honcho, homeassistant) showed as
red (disabled) in the startup banner even when properly configured.
This happened because check_fn runs lazily after session context is
set, but the banner renders before agent init.

Now distinguishes three states:
  - red:    truly unavailable (missing env var, no API key)
  - yellow: lazy-initialized (check_fn pending, will activate on use)
  - normal: available and ready

Only the banner fix was salvaged from the original PR; unrelated
bundled changes (context_compressor, STT config, auth default_model,
SessionResetPolicy) were discarded.

Co-authored-by: Jah-yee <Jah-yee@users.noreply.github.com>
2026-03-29 16:53:29 -07:00
Teknium
5ca6d681f0 feat(skills): add memento-flashcards optional skill (#3827)
* feat(skills): add memento-flashcards skill

* docs(skills): clarify memento-flashcards interaction model

* fix: use HERMES_HOME env var for profile-safe data path

---------

Co-authored-by: Magnus Ahmad <magnus.ahmad@gmail.com>
2026-03-29 16:52:52 -07:00
Teknium
df806bdbaf feat(cron): add cron.wrap_response config to disable delivery wrapping (#3807)
Adds a config option to suppress the header/footer text that wraps
cron job responses when delivered to messaging platforms.

Set cron.wrap_response: false in config.yaml for clean output without
the 'Cronjob Response: <name>' header and 'The agent cannot see this
message' footer.  Default is true (preserves current behavior).
2026-03-29 16:31:01 -07:00
Teknium
0ef80c5f32 fix(whatsapp): reuse persistent aiohttp session across requests (#3818)
Replace per-request aiohttp.ClientSession() in every WhatsApp adapter
method with a single persistent self._http_session, matching the pattern
used by Mattermost, HomeAssistant, and SMS adapters.

Changes:
- Create self._http_session in connect(), close in disconnect()
- All bridge HTTP calls (send, edit, send-media, typing, get_chat_info,
  poll_messages) now use the shared session
- Explicitly cancel _poll_task on disconnect() instead of relying
  solely on self._running = False
- Health-check sessions in connect() remain ephemeral (persistent
  session not yet created at that point)
- Remove per-method ImportError guards for aiohttp (always available
  when gateway runs via [messaging] extras)

Salvaged from PR #1851 by Himess. The _poll_task storage was already
on main from PR #3267; this adds the disconnect cancellation and the
persistent session.

Tests: 4 new tests for session close, already-closed skip, poll task
cancellation, and done-task skip.
2026-03-29 16:25:20 -07:00
Teknium
c4cf20f564 fix: clear __pycache__ during update to prevent stale bytecode ImportError (#3819)
Third report of gateway crashing with:
  ImportError: cannot import name 'get_hermes_home' from 'hermes_constants'

Root cause: stale .pyc bytecode files survive code updates. When Python
loads a cached .pyc that references names from the old source, the import
fails and the gateway won't start.

Two bugs fixed:
1. Git update path: no cache clearing at all after git pull
2. ZIP update path: __pycache__ was explicitly in the preserve set

Added _clear_bytecode_cache() helper that removes all __pycache__ dirs
under PROJECT_ROOT (skipping venv/node_modules/.git/.worktrees). Called
in both git and ZIP update paths, before pip install.
2026-03-29 16:23:36 -07:00
Teknium
68d5472810 fix: omit tools param entirely when empty instead of sending None (#3820)
Some providers (Fireworks AI) reject tools=null, and others (Anthropic)
reject tools=[]. The safest approach is to not include the key at all
when there are no tools — the OpenAI SDK treats a missing parameter as
NOT_GIVEN and omits it from the request entirely.

Inspired by PR #3736 (@kelsia14).
2026-03-29 16:12:47 -07:00
Teknium
252fbea005 feat(providers): add ordered fallback provider chain (salvage #1761) (#3813)
Extends the single fallback_model mechanism into an ordered chain.
When the primary model fails, Hermes tries each fallback provider in
sequence until one succeeds or the chain is exhausted.

Config format (new):
  fallback_providers:
    - provider: openrouter
      model: anthropic/claude-sonnet-4
    - provider: openai
      model: gpt-4o

Legacy single-dict fallback_model format still works unchanged.

Key fix vs original PR: the call sites in the retry loop now use
_fallback_index < len(_fallback_chain) instead of the old one-shot
_fallback_activated guard, so the chain actually advances through
all configured providers.

Changes:
- run_agent.py: _fallback_chain list + _fallback_index replaces
  one-shot _fallback_model; _try_activate_fallback() advances
  through chain; failed provider resolution skips to next entry;
  call sites updated to allow chain advancement
- cli.py: reads fallback_providers with legacy fallback_model compat
- gateway/run.py: same
- hermes_cli/config.py: fallback_providers: [] in DEFAULT_CONFIG
- tests: 12 new chain tests + 6 existing test fixtures updated

Co-authored-by: uzaylisak <uzaylisak@users.noreply.github.com>
2026-03-29 16:04:53 -07:00
Teknium
c774833667 fix(banner): show honcho tools as available when configured (#3810)
The honcho check_fn only checked runtime session state, which isn't
set until the agent initializes. At banner time, honcho tools showed
as red/disabled even when properly configured.

Now checks configuration (enabled + api_key/base_url) as a fallback
when the session context isn't active yet. Fast path (session active)
unchanged; slow path (config check) only runs at banner time.

Adds 4 tests covering: session active, configured but no session,
not configured, and import failure graceful fallback.

Closes #1843.
2026-03-29 15:55:05 -07:00
Teknium
d5d22fe7ba feat(mcp): dynamic tool discovery via notifications/tools/list_changed (#3812)
When a connected MCP server sends a ToolListChangedNotification (per the
MCP spec), Hermes now automatically re-fetches the tool list, deregisters
removed tools, and registers new ones — without requiring a restart.

This enables MCP servers with dynamic toolsets (e.g. GitHub MCP with
GITHUB_DYNAMIC_TOOLSETS=1) to add/remove tools at runtime.

Changes:
- registry.py: add ToolRegistry.deregister() for nuke-and-repave refresh
- mcp_tool.py: extract _register_server_tools() from
  _discover_and_register_server() as a shared helper for both initial
  discovery and dynamic refresh
- mcp_tool.py: add _make_message_handler() and _refresh_tools() on
  MCPServerTask, wired into all 3 ClientSession sites (stdio, new HTTP,
  deprecated HTTP)
- Graceful degradation: silently falls back to static discovery when the
  MCP SDK lacks notification types or message_handler support
- 8 new tests covering registration, refresh, handler dispatch, and
  deregister

Salvaged from PR #1794 by shivvor2.
2026-03-29 15:52:54 -07:00
Teknium
bf84cdfa5e fix: ensure tool schema always includes name field in get_definitions (#3811)
When a tool plugin registers a schema without an explicit 'name' key,
get_definitions() crashes with KeyError:

    available_tool_names = {t["function"]["name"] for t in filtered_tools}

Fix: always merge entry.name into schema so 'name' is never missing.

Refs: #3729

Co-authored-by: ekkoitac <ekko.itac@gmail.com>
2026-03-29 15:49:21 -07:00
Teknium
38d694f559 fix(gateway): apply home channel env overrides consistently (#3808)
Home channel env vars (SLACK_HOME_CHANNEL, SIGNAL_HOME_CHANNEL, etc.)
for Slack, Signal, Mattermost, Matrix, Email, and SMS were nested
inside the credential-env blocks, so they were ignored when the
platform was already configured via config.yaml.

Moved the home channel handling outside the credential blocks with a
Platform.X in config.platforms guard, matching the existing pattern
for Telegram and Discord.

Co-authored-by: cutepawss <cutepawss@users.noreply.github.com>
2026-03-29 15:48:51 -07:00
Teknium
ed6427e0a7 fix(agent): user-friendly 429 rate limit messages with Retry-After support (#3809)
When hitting rate limits (429), the agent now:
- Extracts the Retry-After header from the provider response and uses it
  as the wait time instead of blind exponential backoff (capped at 120s)
- Shows rate-limit-specific messaging: 'Rate limit reached. Waiting Xs
  before retry (attempt N/M)...'
- Shows a distinct exhaustion message: 'Rate limit persisted after N
  retries. Please try again later.'

Non-429 errors keep the existing exponential backoff and generic messaging.

Co-authored-by: ygd58 <ygd58@users.noreply.github.com>
2026-03-29 15:48:06 -07:00
Teknium
0fd3b59ba1 feat(cli): add Ctrl+Z process suspend support (#3802)
Adds a Ctrl+Z key binding to suspend the hermes CLI to background
using standard Unix job control. Uses prompt_toolkit's run_in_terminal()
to properly save/restore terminal state, then sends SIGTSTP to the
process group. Prints a branded message with resume instructions.
Shows a not-supported notice on Windows.

Co-authored-by: CharlieKerfoot <CharlieKerfoot@users.noreply.github.com>
2026-03-29 15:47:55 -07:00
Teknium
6716e66e89 feat: add MCP server mode — hermes mcp serve (#3795)
hermes mcp serve starts a stdio MCP server that lets any MCP client
(Claude Code, Cursor, Codex, etc.) interact with Hermes conversations.

Matches OpenClaw's 9-tool channel bridge surface:

Tools exposed:
- conversations_list: list active sessions across all platforms
- conversation_get: details on one conversation
- messages_read: read message history
- attachments_fetch: extract non-text content from messages
- events_poll: poll for new events since a cursor
- events_wait: long-poll / block until next event (near-real-time)
- messages_send: send to any platform via send_message_tool
- channels_list: browse available messaging targets
- permissions_list_open: list pending approval requests
- permissions_respond: allow/deny approvals

Architecture:
- EventBridge: background thread polls SessionDB for new messages,
  maintains in-memory event queue with waiter support
- Reads sessions.json + SessionDB directly (no gateway dep for reads)
- Reuses send_message_tool for sending (same platform adapters)
- FastMCP server with stdio transport
- Zero new dependencies (uses existing mcp>=1.2.0 optional dep)

Files:
- mcp_serve.py: MCP server + EventBridge (~600 lines)
- hermes_cli/main.py: added serve sub-parser to hermes mcp
- hermes_cli/mcp_config.py: route serve action to run_mcp_server
- tests/test_mcp_serve.py: 53 tests
- docs: updated MCP page + CLI commands reference
2026-03-29 15:47:19 -07:00
Teknium
d02561af85 feat: add Gemini 3.1 preview models to OpenRouter and Nous catalogs (#3803)
* Add new Gemini 3.1 model entries to models.py

* fix: also add Gemini 3.1 models to nous provider list

---------

Co-authored-by: Andrei Ignat <andrei@ignat.se>
2026-03-29 15:44:07 -07:00
Teknium
8eb70a6885 fix(email): close SMTP and IMAP connections on failure (#3804)
SMTP connections in _send_email() and _send_email_with_attachment() leak
when login() or send_message() raises before quit() is reached. Both now
wrapped in try/finally with a close() fallback if quit() also fails.

IMAP connection in _fetch_new_messages() leaks when UID processing raises,
since logout() sits after the loop. Restructured with try/finally so
logout() runs unconditionally.

Co-authored-by: Himess <Himess@users.noreply.github.com>
2026-03-29 15:38:32 -07:00
Teknium
ee3d2941cc feat: show estimated tool token context in hermes tools checklist (#3805)
* feat: show estimated tool token context in hermes tools checklist

Adds a live token estimate indicator to the bottom of the interactive
tool configuration checklist (hermes tools / hermes setup). As users
toggle toolsets on/off, the total estimated context cost updates in
real time.

Implementation:
- tools/registry.py: Add get_schema() for check_fn-free schema access
- hermes_cli/curses_ui.py: Add optional status_fn callback to
  curses_checklist — renders at bottom-right of terminal, stays fixed
  while items scroll
- hermes_cli/tools_config.py: Add _estimate_tool_tokens() using
  tiktoken (cl100k_base, already installed) to count tokens in the
  JSON-serialised OpenAI-format tool schemas. Results are cached
  per-process. The status function deduplicates overlapping tools
  (e.g. browser includes web_search) for accurate totals.
- 12 new tests covering estimation, caching, graceful degradation
  when tiktoken is unavailable, status_fn wiring, deduplication,
  and the numbered fallback display

* fix: use effective toolsets (includes plugins) for token estimation index mapping

The status_fn closure built ts_keys from CONFIGURABLE_TOOLSETS but the
checklist uses _get_effective_configurable_toolsets() which appends plugin
toolsets. With plugins present, the indices would mismatch, causing
IndexError when selecting a plugin toolset.
2026-03-29 15:36:56 -07:00
Teknium
475205e30b fix: restore terminalbench2_env.py from patch-tool redaction corruption (#3801)
Commit ed27b826 introduced patch-tool redaction corruption that:
- Replaced max_token_length=16000 with max_token_length=***
- Truncated api_key=os.getenv(...) to api_key=os.get...EY
- Truncated tokenizer_name to NousRe...1-8B
- Deleted 409 lines including _run_tests(), _eval_with_timeout(),
  evaluate(), wandb_log(), and the __main__ entry point

Restores the file from pre-corruption state (ed27b826^) and re-applies
the two legitimate changes from subsequent commits:
- eval_concurrency config field (from ed27b826)
- docker_image registration in register_task_env_overrides (from ed27b826)
- ManagedServer branching for vLLM/SGLang backends (from 13f54596)

Closes #1737, #1740.
2026-03-29 15:33:52 -07:00
Teknium
612321631f fix(gateway): use atomic writes for config.yaml to prevent data loss (#3800)
Replace all 5 plain open(config_path, 'w') calls in gateway command
handlers with atomic_yaml_write() from utils.py. This uses the
established tempfile + fsync + os.replace pattern to ensure config.yaml
is never left half-written if the process is killed mid-write.

Affected handlers: /personality (clear + set), /sethome, /reasoning
(_save_config_key helper), /verbose (tool_progress cycling).

Also fixes missing encoding='utf-8' on the /personality clear write.

Salvaged from PR #1211 by albatrosjj.
2026-03-29 15:32:46 -07:00
Teknium
83cbf7b5bb fix(gateway): use atomic writes for config.yaml to prevent data loss (#3800)
Replace all 5 plain open(config_path, 'w') calls in gateway command
handlers with atomic_yaml_write() from utils.py. This uses the
established tempfile + fsync + os.replace pattern to ensure config.yaml
is never left half-written if the process is killed mid-write.

Affected handlers: /personality (clear + set), /sethome, /reasoning
(_save_config_key helper), /verbose (tool_progress cycling).

Also fixes missing encoding='utf-8' on the /personality clear write.

Salvaged from PR #1211 by albatrosjj.
2026-03-29 15:31:21 -07:00
Teknium
563101e2a9 feat: add Canvas LMS skill for fetching courses and assignments (#3799)
Adds a Canvas LMS integration skill under optional-skills/productivity/canvas/
with a Python CLI wrapper (canvas_api.py) for listing courses and assignments
via personal access token auth.

Cherry-picked from PR #1250 by Alicorn-Max-S with:
- Moved from skills/ to optional-skills/ (niche educational integration)
- Fixed hardcoded ~/.hermes/ path to use $HERMES_HOME
- Removed Canvas env vars from .env.example (optional skill)
- Cleaned stale 'mini-swe-agent backend' reference from .env.example header

Co-authored-by: Alicorn-Max-S <Alicorn-Max-S@users.noreply.github.com>
2026-03-29 15:28:32 -07:00
Teknium
fe6a916284 feat(skills): add one-three-one-rule communication skill (#3797)
Adds a structured 1-3-1 decision-making framework as an optional skill.
Produces: one problem statement, three options with trade-offs, one
recommendation with definition of done and implementation plan.

Moved to optional-skills/ (niche communication framework, not broadly
needed by default). Improved description with clearer trigger conditions
and replaced implementation-specific example with a generic one.

Based on PR #1262 by Willardgmoore.

Co-authored-by: Willard Moore <willardgmoore@users.noreply.github.com>
2026-03-29 15:25:12 -07:00
Teknium
57481c8ac5 fix(tools): implement send_message routing for Matrix, Mattermost, HomeAssistant, DingTalk (#3796)
* fix(tools): implement send_message routing for Matrix, Mattermost, HomeAssistant, DingTalk

Matrix, Mattermost, HomeAssistant, and DingTalk were present in
platform_map but fell through to the "not yet implemented" else branch,
causing send_message tool calls to silently fail on these platforms.

Add four async sender functions:
- _send_mattermost: POST /api/v4/posts via Mattermost REST API
- _send_matrix: PUT /_matrix/client/v3/rooms/.../send via Matrix CS API
- _send_homeassistant: POST /api/services/notify/notify via HA REST API
- _send_dingtalk: POST to session webhook URL

Add routing in _send_to_platform() and 17 unit tests covering success,
HTTP errors, missing config, env var fallback, and Matrix txn_id uniqueness.

* fix: pass platform tokens explicitly to Mattermost/Matrix/HA senders

The original PR passed pconfig.extra to sender functions, but tokens
live at pconfig.token (not in extra). This caused the senders to always
fall through to env var lookup instead of using the gateway-resolved
token.

Changes:
- Mattermost/Matrix/HA: accept token as first arg, matching the
  Telegram/Discord/Slack sender pattern
- DingTalk: add DINGTALK_WEBHOOK_URL env var fallback + docstring
  explaining the session-webhook vs robot-webhook difference
- Tests updated for new signatures + new DingTalk env var test

---------

Co-authored-by: sprmn24 <oncuevtv@gmail.com>
2026-03-29 15:17:46 -07:00
Teknium
c62cadb73a fix: make display_hermes_home imports lazy to prevent ImportError during hermes update (#3776)
When a user runs 'hermes update', the Python process caches old modules
in sys.modules.  After git pull updates files on disk, lazy imports of
newly-updated modules fail because they try to import display_hermes_home
from the cached (old) hermes_constants which doesn't have the function.

This specifically broke the gateway auto-restart in cmd_update — importing
hermes_cli/gateway.py triggered the top-level 'from hermes_constants
import display_hermes_home' against the cached old module.  The ImportError
was silently caught, so the gateway was never restarted after update.

Users with a running gateway then hit the ImportError on their next
Telegram/Discord message when the stale gateway process lazily loaded
run_agent.py (new version) which also had the top-level import.

Fixes:
- hermes_cli/gateway.py: lazy import at call site (line 940)
- run_agent.py: lazy import at call site (line 6927)
- tools/terminal_tool.py: lazy imports at 3 call sites
- tools/tts_tool.py: static schema string (no module-level call)
- hermes_cli/auth.py: lazy import at call site (line 2024)
- hermes_cli/main.py: reload hermes_constants after git pull in cmd_update

Also fixes 4 pre-existing test failures in test_parse_env_var caused by
NameError on display_hermes_home in terminal_tool.py.
2026-03-29 15:15:17 -07:00
Teknium
442888a05b fix: store token lock identity at acquire time for Slack and Discord
Community review (devoruncommented) correctly identified that the Slack
adapter re-read SLACK_APP_TOKEN from os.getenv() during disconnect,
which could differ from the value used during connect if the environment
changed. Discord had the same pattern with self.config.token (less risky
but still not bulletproof).

Both now follow the Telegram pattern: store the token identity on self
at acquire time, use the stored value for release, clear after release.

Also fixes docs: alias naming was hermes-<name> in docs but actual
implementation creates <name> directly (e.g. ~/.local/bin/coder not
~/.local/bin/hermes-coder).
2026-03-29 11:09:17 -07:00
Teknium
b151d5f7a7 docs: fix profile alias naming and improve quick start
The docs incorrectly showed aliases as 'hermes-work' when the actual
implementation creates 'work' (profile name directly, no prefix).

Rewrote the user guide to lead with the alias pattern:
  hermes profile create coder → coder chat, coder setup, etc.

Also clarified that the banner shows 'Profile: coder' and the prompt
shows 'coder ❯' when a non-default profile is active.

Fixed alias paths in command reference (hermes-work → work).
2026-03-29 10:51:51 -07:00
Teknium
f6db1b27ba feat: add profiles — run multiple isolated Hermes instances (#3681)
Each profile is a fully independent HERMES_HOME with its own config,
API keys, memory, sessions, skills, gateway, cron, and state.db.

Core module: hermes_cli/profiles.py (~900 lines)
  - Profile CRUD: create, delete, list, show, rename
  - Three clone levels: blank, --clone (config), --clone-all (everything)
  - Export/import: tar.gz archive for backup and migration
  - Wrapper alias scripts (~/.local/bin/<name>)
  - Collision detection for alias names
  - Sticky default via ~/.hermes/active_profile
  - Skill seeding via subprocess (handles module-level caching)
  - Auto-stop gateway on delete with disable-before-stop for services
  - Tab completion generation for bash and zsh

CLI integration (hermes_cli/main.py):
  - _apply_profile_override(): pre-import -p/--profile flag + sticky default
  - Full 'hermes profile' subcommand: list, use, create, delete, show,
    alias, rename, export, import
  - 'hermes completion bash/zsh' command
  - Multi-profile skill sync in hermes update

Display (cli.py, banner.py, gateway/run.py):
  - CLI prompt: 'coder ❯' when using a non-default profile
  - Banner shows profile name
  - Gateway startup log includes profile name

Gateway safety:
  - Token locks: Discord, Slack, WhatsApp, Signal (extends Telegram pattern)
  - Port conflict detection: API server, webhook adapter

Diagnostics (hermes_cli/doctor.py):
  - Profile health section: lists profiles, checks config, .env, aliases
  - Orphan alias detection: warns when wrapper points to deleted profile

Tests (tests/hermes_cli/test_profiles.py):
  - 71 automated tests covering: validation, CRUD, clone levels, rename,
    export/import, active profile, isolation, alias collision, completion
  - Full suite: 6760 passed, 0 new failures

Documentation:
  - website/docs/user-guide/profiles.md: full user guide (12 sections)
  - website/docs/reference/profile-commands.md: command reference (12 commands)
  - website/docs/reference/faq.md: 6 profile FAQ entries
  - website/sidebars.ts: navigation updated
2026-03-29 10:41:20 -07:00
Teknium
0df4d1278e feat(plugins): add enable/disable commands + interactive toggle UI (#3747)
Adds plugin management with three interfaces:

  hermes plugins          # interactive curses checklist (like hermes tools)
  hermes plugins enable   # non-interactive enable
  hermes plugins disable  # non-interactive disable
  hermes plugins list     # table with status column

Disabled plugins are stored in config.yaml under plugins.disabled and
skipped during discovery. Uses the same curses_checklist component as
hermes tools for the interactive UI.

Changes:
- hermes_cli/plugins.py: _get_disabled_plugins() + skip disabled during
  discover_and_load()
- hermes_cli/plugins_cmd.py: cmd_toggle() interactive UI, cmd_enable(),
  cmd_disable(), updated cmd_list() with status column
- hermes_cli/main.py: enable/disable subparser entries
- website/docs/reference/cli-commands.md: updated plugins section
- website/docs/user-guide/features/plugins.md: updated managing section
2026-03-29 10:39:57 -07:00
Teknium
95f99ea4b9 feat: built-in boot-md hook — run BOOT.md on gateway startup (#3733)
The gateway now ships with a built-in boot-md hook that checks for
~/.hermes/BOOT.md on every startup. If the file exists, the agent
executes its instructions in a background thread. No installation
or configuration needed — just create the file.

No BOOT.md = zero overhead (the hook silently returns).

Implementation:
- gateway/builtin_hooks/boot_md.py: handler with boot prompt,
  background thread, [SILENT] suppression, error handling
- gateway/hooks.py: _register_builtin_hooks() called at the start
  of discover_and_load() to wire in built-in hooks
- Docs updated: hooks page documents BOOT.md as a built-in feature
2026-03-29 10:19:54 -07:00
Teknium
811adca277 feat(skills): add SiYuan Note and Scrapling as optional skills (#3742)
Add two new optional skills:

- siyuan (optional-skills/productivity/): SiYuan Note knowledge base
  API skill — search, read, create, and manage blocks/documents in a
  self-hosted SiYuan instance via curl. Requires SIYUAN_TOKEN.

- scrapling (optional-skills/research/): Intelligent web scraping skill
  using the Scrapling library — anti-bot fetching, Cloudflare bypass,
  CSS/XPath selectors, spider framework for multi-page crawling.

Placed in optional-skills/ (not bundled) since both are niche tools
that require external dependencies.

Co-authored-by: FEUAZUR <FEUAZUR@users.noreply.github.com>
2026-03-29 09:34:56 -07:00
Teknium
aafe37012a docs: update skills catalog — add red-teaming and optional skills (#3745)
* fix(discord): clean up deferred "thinking..." after slash commands complete

After a slash command is deferred (interaction.response.defer), the
"thinking..." indicator persisted indefinitely because the code used
followup.send() which creates a separate message instead of replacing
or removing the deferred response.

Fix: use edit_original_response() to replace "thinking..." with the
confirmation text when provided, or delete_original_response() to
remove it when there is no confirmation. Also consolidated /reasoning
and /voice handlers to use _run_simple_slash instead of duplicating
the defer+dispatch pattern.

Fixes #3595.

* docs: update skills catalog — add red-teaming category and all 16 optional skills

The skills catalog was missing:
- red-teaming category with the godmode jailbreaking skill
- The entire optional skills section (16 skills across 10 categories)

Added both with descriptions sourced from each SKILL.md frontmatter.
Verified against the actual skills/ and optional-skills/ directories.
2026-03-29 09:34:35 -07:00
Teknium
909de72426 fix: set api_mode when switching providers via hermes model (#3726)
When switching providers via 'hermes model', the previous provider's
api_mode persisted in config.yaml. Switching from Copilot
(codex_responses) to a chat_completions provider like Z.AI would send
requests to the wrong endpoint (404).

Set api_mode = chat_completions in the 4 provider flows that were
missing it: OpenRouter, custom endpoint, Kimi, and api_key_provider.

Co-authored-by: Nour Eddine Hamaidi <HenkDz@users.noreply.github.com>
2026-03-29 08:07:11 -07:00
Teknium
ba1b600bce fix(tests): align skill/setup and platform mocks with current behavior (#3721)
- Skill invocation: no secret capture callback so SSH remote setup note is emitted
- Patch agent.skill_utils.sys for platform checks (skill_matches_platform)
- Skip CLAUDE.md priority test on Darwin (case-insensitive FS)

Made-with: Cursor

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
2026-03-29 07:51:43 -07:00
Teknium
fcd1645223 feat(skills): support external skill directories via config (#3678)
Add skills.external_dirs config option — a list of additional directories
to scan for skills alongside ~/.hermes/skills/. External dirs are read-only:
skill creation/editing always writes to the local dir. Local skills take
precedence when names collide.

This lets users share skills across tools/agents without copying them into
Hermes's own directory (e.g. ~/.agents/skills, /shared/team-skills).

Changes:
- agent/skill_utils.py: add get_external_skills_dirs() and get_all_skills_dirs()
- agent/prompt_builder.py: scan external dirs in build_skills_system_prompt()
- tools/skills_tool.py: _find_all_skills() and skill_view() search external dirs;
  security check recognizes configured external dirs as trusted
- agent/skill_commands.py: /skill slash commands discover external skills
- hermes_cli/config.py: add skills.external_dirs to DEFAULT_CONFIG
- cli-config.yaml.example: document the option
- tests/agent/test_external_skills.py: 11 tests covering discovery, precedence,
  deduplication, and skill_view for external skills

Requested by community member primco.
2026-03-29 00:33:30 -07:00
Teknium
253a9adc72 docs(skills): clarify DuckDuckGo runtime requirements (#3680)
Co-authored-by: kshitij <82637225+kshitijk4poor@users.noreply.github.com>
2026-03-29 00:17:57 -07:00
Teknium
300964178f docs: document credential file passthrough and env var forwarding for remote backends (#3677)
Three docs pages updated:

- security.md: New 'Credential File Passthrough' section, updated
  sandbox filter table to include Docker/Modal rows, added info box
  about Docker env_passthrough merge
- creating-skills.md: New 'Credential File Requirements' section
  with frontmatter examples and guidance on when to use env vars
  vs credential files
- environment-variables.md: Updated TERMINAL_DOCKER_FORWARD_ENV
  description to note auto-passthrough from skills
2026-03-29 00:16:34 -07:00
Teknium
7a3682ac3f feat: mount skill credential files + fix env passthrough for remote backends (#3671)
Two related fixes for remote terminal backends (Modal/Docker):

1. NEW: Credential file mounting system
   Skills declare required_credential_files in frontmatter. Files are
   mounted into Docker (read-only bind mounts) and Modal (mounts at
   creation + sync via exec on each command for mid-session changes).
   Google Workspace skill updated with the new field.

2. FIX: Docker backend now includes env_passthrough vars
   Skills that declare required_environment_variables (e.g. Notion with
   NOTION_API_KEY) register vars in the env_passthrough system. The
   local backend checked this, but Docker's forward_env was a separate
   disconnected list. Now Docker exec merges both sources, so
   skill-declared env vars are forwarded into containers automatically.

   This fixes the reported issue where NOTION_API_KEY in ~/.hermes/.env
   wasn't reaching the Docker container despite being registered via
   the Notion skill's prerequisites.

Closes #3665
2026-03-28 23:53:40 -07:00
Teknium
9f01244137 fix: replace user-facing hardcoded ~/.hermes paths with display_hermes_home()
Prep for profiles: user-facing messages now use display_hermes_home() so
diagnostic output shows the correct path for each profile.

New helper: display_hermes_home() in hermes_constants.py
12 files swept, ~30 user-facing string replacements.
Includes dynamic TTS schema description.
2026-03-28 23:47:21 -07:00
Teknium
0a80dd9c7a fix(discord): clean up deferred "thinking..." after slash commands complete (#3674)
After a slash command is deferred (interaction.response.defer), the
"thinking..." indicator persisted indefinitely because the code used
followup.send() which creates a separate message instead of replacing
or removing the deferred response.

Fix: use edit_original_response() to replace "thinking..." with the
confirmation text when provided, or delete_original_response() to
remove it when there is no confirmation. Also consolidated /reasoning
and /voice handlers to use _run_simple_slash instead of duplicating
the defer+dispatch pattern.

Fixes #3595.
2026-03-28 23:46:43 -07:00
Teknium
4764e06fde fix(acp): complete session management surface for editor clients (salvage #3501) (#3675)
* fix acp adapter session methods

* test: stub local command in transcription provider cases

---------

Co-authored-by: David Zhang <david.d.zhang@gmail.com>
2026-03-28 23:45:53 -07:00
kshitij
4c532c153b fix: URL-encode Signal phone numbers and correct attachment RPC parameter (#3670)
Fixes two Signal bugs:

1. SSE connection: URL-encode phone numbers so + isn't interpreted as space (400 Bad Request)
2. Attachment fetch: use 'id' parameter instead of 'attachmentId' (NullPointerException in signal-cli)

Also refactors Signal tests with shared helpers.
2026-03-28 23:45:28 -07:00
kshitij
a99c0478d0 fix(skills): move parallel-cli to optional-skills (#3673)
parallel-cli is a paid third-party vendor skill that requires
PARALLEL_API_KEY, but it was shipped in the default skills/ directory
with no env-var gate. This caused it to appear in every user's system
prompt even when they have no Parallel account or API key.

Move it to optional-skills/ so it is only visible through the Skills
Hub and must be explicitly installed. Also remove it from the default
skills catalog docs.
2026-03-28 23:45:05 -07:00
Teknium
c6e3084baf fix(gateway): replace print() with logger calls in BasePlatformAdapter (#3669)
Salvage of PR #3616 (memosr). Replaces 6 print() calls with proper logger calls in BasePlatformAdapter + removes redundant traceback.print_exc().

Co-Authored-By: memosr <memosr@users.noreply.github.com>
2026-03-28 22:25:35 -07:00
Teknium
dcbdfdbb2b feat(docker): add Docker container for the agent (salvage #1841) (#3668)
Adds a complete Docker packaging for Hermes Agent:
- Dockerfile based on debian:13.4 with all deps
- Entrypoint that bootstraps .env, config.yaml, SOUL.md on first run
- CI workflow to build, test, and push to DockerHub
- Documentation for interactive, gateway, and upgrade workflows

Closes #850, #913.

Changes vs original PR:
- Removed pre-created legacy cache/platform dirs from entrypoint
  (image_cache, audio_cache, pairing, whatsapp/session) — these are
  now created on demand by the application using the consolidated
  layout from get_hermes_dir()
- Moved docs from docs/docker.md to website/docs/user-guide/docker.md
  and added to Docusaurus sidebar

Co-authored-by: benbarclay <benbarclay@users.noreply.github.com>
2026-03-28 22:21:48 -07:00
Teknium
91b881f931 feat(mattermost): configurable mention behavior — respond without @mention (#3664)
Adds MATTERMOST_REQUIRE_MENTION and MATTERMOST_FREE_RESPONSE_CHANNELS
env vars, matching Discord's existing mention gating pattern.

- MATTERMOST_REQUIRE_MENTION=false: respond to all channel messages
- MATTERMOST_FREE_RESPONSE_CHANNELS=id1,id2: specific channels where
  bot responds without @mention even when require_mention is true
- DMs always respond regardless of mention settings
- @mention is now stripped from message text (clean agent input)

7 new tests for mention gating, free-response channels, DM bypass,
and mention stripping. Updated existing test for mention stripping.

Docs: updated mattermost.md with Mention Behavior section,
environment-variables.md with new vars, config.py with metadata.
2026-03-28 22:17:43 -07:00
Teknium
3e1157080a fix(tools): use non-deprecated streamable_http_client for MCP HTTP transport (#3646)
Switch MCP HTTP transport from the deprecated streamablehttp_client()
(mcp < 1.24.0) to the new streamable_http_client() API that accepts a
pre-built httpx.AsyncClient.

Changes vs the original PR #3391:
- Separate try/except imports so mcp < 1.24.0 doesn't break (graceful
  fallback to deprecated API instead of losing HTTP MCP entirely)
- Wrap httpx.AsyncClient in async-with for proper lifecycle management
  (the new SDK API explicitly skips closing caller-provided clients)
- Match SDK's own create_mcp_http_client defaults: follow_redirects=True,
  Timeout(connect_timeout, read=300.0)
- Keep deprecated code path as fallback for older SDK versions

Co-authored-by: HenkDz <HenkDz@users.noreply.github.com>
2026-03-28 18:20:49 -07:00
Teknium
1a032ccf79 fix(skills): stop marking persisted env vars missing on remote backends (#3650)
Salvage of PR #3452 (kentimsit). Fixes skill readiness checks on remote backends — persisted env vars are no longer incorrectly marked as missing.

Co-Authored-By: kentimsit <kentimsit@users.noreply.github.com>
2026-03-28 17:52:32 -07:00
Teknium
0bd7e95dfc fix(honcho): allow self-hosted local instances without API key (#3644)
Self-hosted Honcho on localhost doesn't require authentication, but
both the activation gates and the SDK client required an API key.

Combined fix from three contributor PRs:
- Relax all 8 activation gates to accept (api_key OR base_url) as
  valid credentials (#3482 by @cameronbergh)
- Use 'local' placeholder for the SDK client when base_url points to
  localhost/127.0.0.1/::1 (#3570 by @ygd58)

Files changed: run_agent.py (2 gates), cli.py (1 gate),
gateway/run.py (1 gate), honcho_integration/cli.py (2 gates),
hermes_cli/doctor.py (2 gates), honcho_integration/client.py (SDK).

Co-authored-by: cameronbergh <cameronbergh@users.noreply.github.com>
Co-authored-by: ygd58 <ygd58@users.noreply.github.com>
Co-authored-by: devorun <devorun@users.noreply.github.com>
2026-03-28 17:49:56 -07:00
Teknium
d35567c6e0 feat(web): add Exa as a web search and extract backend (#3648)
Adds Exa (https://exa.ai) as a fourth web backend alongside Parallel,
Firecrawl, and Tavily. Follows the exact same integration pattern:

- Backend selection: config web.backend=exa or auto-detect from EXA_API_KEY
- Search: _exa_search() with highlights for result descriptions
- Extract: _exa_extract() with full text content extraction
- Lazy singleton client with x-exa-integration header
- Wired into web_search_tool and web_extract_tool dispatchers
- check_web_api_key() and requires_env updated
- CLI: hermes setup summary, hermes tools config, hermes config show
- config.py: EXA_API_KEY in OPTIONAL_ENV_VARS with metadata
- pyproject.toml: exa-py>=2.9.0,<3 in dependencies


Salvaged from PR #1850.

Co-authored-by: louiswalsh <louiswalsh@users.noreply.github.com>
2026-03-28 17:35:53 -07:00
Teknium
bea49e02a3 fix: route /bg spinner through TUI widget to prevent status bar collision (#3643)
Background agent's KawaiiSpinner wrote \r-based animation and stop()
messages through StdoutProxy, colliding with prompt_toolkit's status bar.

Two fixes:
- display.py: use isinstance(out, StdoutProxy) instead of fragile
  hasattr+name check for detecting prompt_toolkit's stdout wrapper
- cli.py: silence bg agent's raw spinner (_print_fn=no-op) and route
  thinking updates through the TUI widget only when no foreground
  agent is active; clear spinner text in finally block with same guard

Closes #2718

Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
2026-03-28 17:29:37 -07:00
nguyen binh
c6e2e486bf fix: add download retry to cache_audio_from_url matching cache_image_from_url (#3401)
PR #3323 added retry with exponential backoff to cache_image_from_url
but missed the sibling function cache_audio_from_url 18 lines below in
the same file. A single transient 429/5xx/timeout loses voice messages
while image downloads now survive them.

Apply the same retry pattern: 3 attempts with 1.5s exponential backoff,
immediate raise on non-retryable 4xx.
2026-03-28 17:28:38 -07:00
Teknium
973deb4f76 fix(browser): guard LLM response content against None in snapshot and vision (#3642)
Salvage of PR #3532 (binhnt92). Guards browser_tool.py against None content from reasoning-only models (DeepSeek-R1, QwQ). Follow-up to #3449.

Co-Authored-By: binhnt92 <binhnt92@users.noreply.github.com>
2026-03-28 17:25:04 -07:00
Teknium
dc74998718 fix(sessions): support stdout (-) in session and snapshot export (salvage #3617) (#3641)
* fix(sessions): support stdout when output path is '-' in session export

* fix: style cleanup + extend stdout support to snapshot export

Follow-up for salvaged PR #3617:
- Fix import sys; on one line (style consistency)
- Update help text to mention - for stdout
- Apply same stdout support to hermes skills snapshot export

---------

Co-authored-by: ygd58 <buraysandro9@gmail.com>
2026-03-28 17:24:32 -07:00
Teknium
17617e4399 feat(discord): DISCORD_IGNORE_NO_MENTION — skip messages that @mention others but not the bot (#3640)
Salvage of PR #3310 (luojiesi). When DISCORD_IGNORE_NO_MENTION=true (default), messages that @mention other users but not the bot are silently skipped in server channels. DMs excluded — mentions there are just references.

Co-Authored-By: luojiesi <luojiesi@users.noreply.github.com>
2026-03-28 17:19:41 -07:00
Siddharth Balyan
ffdfeb91d8 fix(nix): unify directory and file permissions across all three layers (#3619)
Activation script, tmpfiles, and container entrypoint now agree on
0750 for all directories. Tighten config.yaml and workspace documents
from 0644 to 0640 (group-readable, no world access). Add explicit
chmod for .managed marker and container $TARGET_HOME to eliminate
umask dependence. Secrets (auth.json, .env) remain 0600.
2026-03-29 05:29:24 +05:30
Teknium
857a5d7b47 fix: sanitize surrogate characters from clipboard paste to prevent UnicodeEncodeError (#3624)
Pasting text from rich-text editors (Google Docs, Word, etc.) can inject
lone surrogate characters (U+D800..U+DFFF) that are invalid UTF-8.
The OpenAI SDK serializes messages with ensure_ascii=False, then encodes
to UTF-8 for the HTTP body — surrogates crash this with:
  UnicodeEncodeError: 'utf-8' codec can't encode character '\udce2'

Three-layer fix:
1. Primary: sanitize user_message at the top of run_conversation()
2. CLI: sanitize in chat() before appending to conversation_history
3. Safety net: catch UnicodeEncodeError in the API error handler,
   sanitize the entire messages list in-place, and retry once.
   Also exclude UnicodeEncodeError from is_local_validation_error
   so it doesn't get classified as non-retryable.

Includes 14 new tests covering the sanitization helpers and the
integration with run_conversation().
2026-03-28 16:53:14 -07:00
Teknium
b029742092 fix(cli): strengthen paste collapse fallback for terminals without bracketed paste (#3625)
The _on_text_changed fallback only detected pastes when all characters
arrived in a single event (chars_added > 1).  Some terminals (notably
VSCode integrated terminal in certain configs) may deliver paste data
differently, causing the fallback to miss.

Add a second heuristic: if the newline count jumps by 4+ in a single
text-change event, treat it as a paste.  Alt+Enter only adds 1 newline
per event, so this never false-positives on manual multi-line input.

Also fixes: the fallback path was missing _paste_just_collapsed flag
set before replacing buffer text, which could cause a re-trigger loop.
2026-03-28 15:40:49 -07:00
Teknium
02fb7c4aaf docs: comprehensive docs audit — fix 12 stale/missing items across 10 pages (#3618)
Fixes found by auditing docs against recent PRs/commits:

Critical (misleading):
- hooks.md: Remove stale 'planned — not yet wired' markers for 4 hooks
  that are now active (#3542). Add correct callback signatures.
- security.md: Update tirith verdict behavior — block verdicts now go
  through approval flow instead of hard-blocking (#3428). Add pkill/killall
  self-termination guard and gateway-run backgrounding patterns (#3593).

New feature docs:
- configuration.md: Add tool_use_enforcement section with value table
  (auto/true/false/list) from #3551/#3528.
- configuration.md: Expand auxiliary config with per-task timeouts
  (compression 120s, web_extract 30s, approval 30s) from #3597.
- api-server.md: Add /v1/health alias, Security Headers section,
  CORS details (Max-Age, SSE headers, Idempotency-Key) from
  #3572/#3573/#3576/#3580/#3530.

Stale/incomplete:
- configuration.md: Fix Alibaba model name qwen-plus -> qwen3.5-plus (#3484).
- environment-variables.md: Specify actual DashScope default URL.
- cli-commands.md: Add alibaba to --provider list.
- fallback-providers.md: Add Alibaba/DashScope to provider table.
- email.md: Document noreply/automated sender filtering (#3606).
- toolsets-reference.md: Add 4 missing platform toolsets — matrix,
  mattermost, dingtalk, api-server (#3583).
- skills.md: List default GitHub taps including garrytan/gstack (#3605).
2026-03-28 15:26:35 -07:00
Teknium
1e924e99b9 refactor: consolidate ~/.hermes directory layout with backward compat (#3610)
New installs get a cleaner structure:
  cache/images/      (was image_cache/)
  cache/audio/       (was audio_cache/)
  cache/documents/   (was document_cache/)
  cache/screenshots/ (was browser_screenshots/)
  platforms/whatsapp/session/ (was whatsapp/session/)
  platforms/matrix/store/    (was matrix/store/)
  platforms/pairing/         (was pairing/)

Existing installs are unaffected -- get_hermes_dir() checks for the
old path first and uses it if present. No migration needed.

Adds get_hermes_dir(new_subpath, old_name) helper to hermes_constants.py
for reuse by any future subsystem.
2026-03-28 15:22:19 -07:00
Teknium
614e43d3d9 feat(skills): add garrytan/gstack as default Skills Hub tap (#3605)
Add the gstack community skills repo to the default tap list and fix
skill_identifier construction for repos with an empty path prefix.

Co-authored-by: Tugrul Guner <tugrulguner@users.noreply.github.com>
2026-03-28 14:55:49 -07:00
Teknium
e4480ff426 fix(config): accept 'model' key as alias for 'default' in model config (#3603)
Users intuitively write model: { model: my-model } instead of
model: { default: my-model } and it silently falls back to the
hardcoded default. Now both spellings work across all three config
consumers: runtime_provider, CLI, and gateway.

Co-authored-by: ygd58 <ygd58@users.noreply.github.com>
2026-03-28 14:55:27 -07:00
Teknium
9a364f2805 fix: cap percentage displays at 100% in stats, gateway, and memory tool (#3599)
Salvage of PR #3533 (binhnt92). Follow-up to #3480 — applies min(100, ...) to 5 remaining unclamped percentage display sites in context_compressor, cli /stats, gateway /stats, and memory tool. Defensive clamps now that the root cause (estimation heuristic) was already removed in #3480.

Co-Authored-By: binhnt92 <binhnt92@users.noreply.github.com>
2026-03-28 14:55:18 -07:00
Teknium
1b2d4f21f3 feat(cli): show resume-by-title command in exit summary (#3607)
When exiting a session that has a title (auto-generated or manual),
the exit summary now also shows:
  hermes -c "Session Title"
alongside the existing hermes --resume <id> command.

Also adds the title to the session info block.
2026-03-28 14:54:53 -07:00
Teknium
9009169eeb fix: recover updater when venv pip is missing (#3608)
Some environments lose pip inside the venv. Before invoking pip install,
check pip --version and bootstrap with ensurepip if missing. Applied to
both update code paths (_update_via_zip and cmd_update).


Salvaged from PR #3359.

Co-authored-by: Git-on-my-level <Git-on-my-level@users.noreply.github.com>
2026-03-28 14:54:49 -07:00
Teknium
0f042f3930 fix(email): filter automated/noreply senders to prevent reply loops (salvage #3461) (#3606)
* fix(gateway): filter automated/noreply senders in email adapter

Fixes #3453

Adds noreply/automated sender filtering to the email adapter. Drops emails from noreply, mailer-daemon, postmaster addresses and bulk mail headers (Auto-Submitted, Precedence, List-Unsubscribe) before dispatching. Prevents pairing codes and AI responses being sent to automated senders.

* fix: remove redundant seen_uids add + trailing whitespace cleanup

---------

Co-authored-by: devorun <130918800+devorun@users.noreply.github.com>
2026-03-28 14:50:50 -07:00
Siddharth Balyan
7a9e45e560 fix: regenerate uv.lock to match v0.5.0 in pyproject.toml (#3594)
The lockfile was still pinned to hermes-agent 0.4.0 after the v0.5.0
release, causing downstream consumers (e.g. the Nix package built via
uv2nix) to report the wrong version.  Also drops stale transitive deps
(bashlex, boto3, swe-rex) that were carried over from the removed
swe-rex integration.
2026-03-29 03:19:47 +05:30
Teknium
a641f20cac fix(gateway): self-heal missing launchd plist on start (#3601)
When the plist is deleted (manual cleanup, failed upgrade),
hermes gateway start now regenerates it automatically instead of
failing. Also simplifies the returncode==3 error path since the
plist is guaranteed to exist at that point.

Co-authored-by: Bartok9 <Bartok9@users.noreply.github.com>
2026-03-28 14:48:55 -07:00
Teknium
ee066b7be6 fix: use placeholder api_key for custom providers without credentials (#3604)
Local/custom OpenAI-compatible providers (Ollama, LM Studio, vLLM) that
don't require auth were hitting empty api_key rejections from the OpenAI
SDK, especially when used as smart model routing targets.

Uses the same 'no-key-required' placeholder already used in
_resolve_openrouter_runtime() for the identical scenario.


Salvaged from PR #3543.

Co-authored-by: scottlowry <scottlowry@users.noreply.github.com>
2026-03-28 14:47:41 -07:00
Mibay
a6bc13ce13 fix(github-auth): check ~/.hermes/.env before ~/.git-credentials for token extraction (#3466)
* fix(github-auth): check ~/.hermes/.env before ~/.git-credentials for token extraction

Users who configured their token via `hermes setup` have it stored in
~/.hermes/.env (GITHUB_TOKEN=...), not in ~/.git-credentials. On macOS
with osxkeychain as the default git credential helper, ~/.git-credentials
may not exist at all, causing silent 401 failures in all GitHub skills.

Add ~/.hermes/.env as the first fallback in the auth detection block and
the inline "Extracting the Token from Git Credentials" example.

Priority order: env var → ~/.hermes/.env → ~/.git-credentials → none

Part of fix for NousResearch/hermes-agent#3464

* fix(github-auth): check ~/.hermes/.env before ~/.git-credentials

Fixes #3464

* fix(github-auth): check ~/.hermes/.env before ~/.git-credentials

Fixes #3464

* fix(github-auth): check ~/.hermes/.env before ~/.git-credentials

Fixes #3464

* fix(github-auth): check ~/.hermes/.env before ~/.git-credentials

Fixes #3464

* fix(github-auth): check ~/.hermes/.env before ~/.git-credentials

Fixes #3464

* fix(github-auth): check ~/.hermes/.env before ~/.git-credentials

Fixes #3464
2026-03-28 14:46:49 -07:00
Teknium
f803f66339 fix(terminal): avoid merging heredoc EOF with fence wrapper (#3598)
One-shot local execution built `printf FENCE; <cmd>; __hermes_rc=...`, so a
command ending in a heredoc produced a closing line like `EOF; __hermes_rc=...`,
which is not a valid delimiter. Bash then treated the rest of the wrapper as
heredoc body, leaking it into tool output (e.g. gh issue/PR flows).

Use newline-separated wrapper lines so the delimiter stays alone and the
trailer runs after the heredoc completes.

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
2026-03-28 14:43:41 -07:00
Teknium
839d9d7471 feat(agent): configurable timeouts for auxiliary LLM calls via config.yaml (#3597)
Add per-task timeout settings under auxiliary.{task}.timeout in config.yaml
instead of hardcoded values. Users with slow local models (Ollama, llama.cpp)
can now increase timeouts for compression, vision, session search, etc.

Defaults:
  - auxiliary.compression.timeout: 120s (was hardcoded 45s)
  - auxiliary.vision.timeout: 30s (unchanged)
  - all other aux tasks: 30s (was hardcoded 30s)
  - title_generator: 30s (was hardcoded 15s)

call_llm/async_call_llm now auto-resolve timeout from config when not
explicitly passed. Callers can still override with an explicit timeout arg.

Based on PR #3406 by alanfwilliams. Converted from env vars to config.yaml
per project conventions.

Co-authored-by: alanfwilliams <alanfwilliams@users.noreply.github.com>
2026-03-28 14:35:28 -07:00
Teknium
404a0b823e fix: add self-termination guard for pkill/killall targeting hermes/gateway (#3593)
Prevent the agent from accidentally killing its own process with
pkill -f gateway, killall hermes, etc. Adds a dangerous command
pattern that triggers the approval flow.

Co-authored-by: arasovic <arasovic@users.noreply.github.com>
2026-03-28 14:33:48 -07:00
Teknium
dabe3c34cc feat(webhook): hermes webhook CLI + skill for event-driven subscriptions (#3578)
Adds 'hermes webhook' CLI subcommand and a skill — zero new model tools.

CLI commands (require webhook platform to be enabled):
  hermes webhook subscribe <name> [--events, --prompt, --deliver, ...]
  hermes webhook list
  hermes webhook remove <name>
  hermes webhook test <name>

All commands gate on webhook platform being enabled in config. If not
configured, prints setup instructions (gateway setup wizard, manual
config.yaml, or env vars).

The agent uses these via terminal tool, guided by the webhook-subscriptions
skill which documents setup, common patterns (GitHub, Stripe, CI/CD,
monitoring), prompt template syntax, security, and troubleshooting.

Adapter enhancement: webhook.py hot-reloads dynamic subscriptions from
~/.hermes/webhook_subscriptions.json on each incoming request (mtime-gated).
Static config.yaml routes always take precedence.

Docs: updated webhooks.md with Dynamic Subscriptions section, added
hermes webhook to cli-commands.md reference.

No new model tools. No toolset changes.

24 new tests for CLI CRUD, persistence, enabled-gate, and adapter
dynamic route loading.
2026-03-28 14:33:35 -07:00
Teknium
82d6c28bd5 fix(skills): cache-aware /skills install and uninstall in TUI (#3586)
Two fixes for /skills install and /skills uninstall slash commands:

1. input() hangs indefinitely inside prompt_toolkit's TUI event loop,
   soft-locking the CLI. The user typing the slash command is already
   implicit consent, so confirmation is now always skipped.

2. Cache invalidation was unconditional — installing or uninstalling a
   skill mid-session silently broke the prompt cache, increasing costs.
   The slash handler now defers cache invalidation by default (skill
   takes effect next session). Pass --now to invalidate immediately,
   with a message explaining the cost tradeoff. The CLI argparse path
   (hermes skills install) is unaffected and still invalidates.

Fixes #3474
Salvaged from PR #3496 by dlkakbs.
2026-03-28 14:32:23 -07:00
Islandman93
dc7d504aca Remove incorrect docker alternative for signal-cli (#3545)
Removed docker alternative for signal-cli-rest-api from the documentation. It does not support the raw signal-cli http daemon. See https://github.com/bbernhard/signal-cli-rest-api/issues/720
2026-03-28 14:28:57 -07:00
Teknium
9e411f7d70 fix(update): skip config migration prompts in non-interactive sessions (#3584)
hermes update hangs on input() when run from cron, scripts, or piped
contexts. Check both stdin and stdout isatty(), catch EOFError as a
fallback, and print guidance to run 'hermes config migrate' later.

Co-authored-by: phippsbot-byte <phippsbot-byte@users.noreply.github.com>
2026-03-28 14:26:32 -07:00
Teknium
708f187549 fix(gateway): exit with failure when all platforms fail with retryable errors (#3592)
When all messaging platforms exhaust retries and get queued for background
reconnection, exit with code 1 so systemd Restart=on-failure can restart
the process. Previously the gateway stayed alive as a zombie with no
connected platforms and exit code 0.

Salvaged from PR #3567 by kelsia14. Test updates added.

Co-authored-by: kelsia14 <kelsia14@users.noreply.github.com>
2026-03-28 14:25:12 -07:00
Teknium
d7c41f3cef fix(telegram): honor proxy env vars in fallback transport (salvage #3411) (#3591)
* fix: keep gateway running through telegram proxy failures

- continue gateway startup in degraded mode when Telegram cannot connect yet
- ensure Telegram fallback transport also honors proxy env vars
- support reconnect retries without taking down the whole gateway

* test(telegram): cover proxy env handling in fallback transport

---------

Co-authored-by: kufufu9 <pi@local>
2026-03-28 14:23:27 -07:00
Teknium
6893c3befc fix(gateway): inject PATH + VIRTUAL_ENV into launchd plist for macOS service (#3585)
Salvage of PR #2173 (hanai) and PR #3432 (timknip).

Injects PATH, VIRTUAL_ENV, and HERMES_HOME into the macOS launchd plist so gateway subprocesses find user-installed tools (node, ffmpeg, etc.). Matches systemd unit parity with venv/bin, node_modules/.bin, and resolved node dir in PATH. Includes 7 new tests and docs updates across 4 pages.

Co-Authored-By: Han <ihanai1991@gmail.com>
Co-Authored-By: timknip <timknip@users.noreply.github.com>
2026-03-28 14:23:26 -07:00
Teknium
5cdc24c2e2 docs(slack): add missing Messages Tab setup step (#3590)
Without enabling the Messages Tab in App Home settings, users see
"Sending messages to this app has been turned off" when trying to DM
the bot — even with all correct scopes and event subscriptions.

Add Step 5 (Enable the Messages Tab) between Event Subscriptions and
Install App, with a danger admonition. Also add troubleshooting entry
for this specific error message. Renumber subsequent steps (6→7→8→9).

Co-authored-by: Alberto Leal <mail4alberto@gmail.com>
2026-03-28 14:23:19 -07:00
Teknium
2dd286c162 fix: write models.dev disk cache atomically (#3588)
Use atomic_json_write() from utils.py instead of plain open()/json.dump()
for the models.dev disk cache. Prevents corrupted cache if the process is
killed mid-write — _load_disk_cache() silently returns {} on corrupt JSON,
losing all model metadata until the next successful API fetch.

Co-authored-by: memosr <memosr@users.noreply.github.com>
2026-03-28 14:20:30 -07:00
Teknium
924857c3e3 fix: prevent tool name/arg concatenation for Ollama-compatible endpoints (#3582)
Ollama reuses index 0 for every tool call in a parallel batch,
distinguishing them only by id.  The streaming accumulator now
detects a new non-empty id at an already-active index and redirects
it to a fresh slot, preventing names and arguments from being
concatenated into a single tool call.

No-op for normal providers that use incrementing indices.

Co-authored-by: dmater01 <dmater01@users.noreply.github.com>
2026-03-28 14:08:26 -07:00
Teknium
ba3bbf5b53 fix: add missing mattermost/matrix/dingtalk toolsets + platform consistency tests (salvage #3512) (#3583)
* Fixing mattermost configuration parsing bugs

* fix: add homeassistant to skills_config + platform consistency tests

Follow-up for cherry-picked #3512:
- Add homeassistant to skills_config.py PLATFORMS (was in tools_config
  but missing from skills_config)
- Add 3 consistency tests that verify all platforms in tools_config have
  matching toolset definitions, gateway includes, and skills_config entries
  — prevents this class of bug from recurring

---------

Co-authored-by: DaneelV3 <dannel@v3rtical.tech>
2026-03-28 14:05:02 -07:00
Teknium
d6b4fa2e9f fix: strip @botname from commands so /new@TigerNanoBot resolves correctly (#3581)
Commands sent directly to the bot in groups include @botname suffix
(e.g. /compress@TigerNanoBot). get_command() now strips the @anything
part before lookup, matching how Telegram bot menu generates commands.
Fixes all slash commands silently doing nothing when sent with @mention.

Co-authored-by: MacroAnarchy <MacroAnarchy@users.noreply.github.com>
2026-03-28 14:01:01 -07:00
Teknium
df1bf0a209 feat(api-server): add basic security headers (#3576)
Add X-Content-Type-Options: nosniff and Referrer-Policy: no-referrer
to all API server responses via a new security_headers_middleware.

Co-authored-by: Oktay Aydin <aydnOktay@users.noreply.github.com>
2026-03-28 14:00:52 -07:00
Teknium
49a49983e4 feat(api-server): add Access-Control-Max-Age to CORS preflight responses (#3580)
Adds Access-Control-Max-Age: 600 to CORS preflight responses, telling
browsers to cache the preflight for 10 minutes. Reduces redundant OPTIONS
requests and improves perceived latency for browser-based API clients.

Salvaged from PR #3514 by aydnOktay.

Co-authored-by: aydnOktay <xaydinoktay@gmail.com>
2026-03-28 14:00:03 -07:00
Teknium
e97c0cb578 fix: replace hardcoded ~/.hermes paths with get_hermes_home() for profile support
* feat: GPT tool-use steering + strip budget warnings from history

Two changes to improve tool reliability, especially for OpenAI GPT models:

1. GPT tool-use enforcement prompt: Adds GPT_TOOL_USE_GUIDANCE to the
   system prompt when the model name contains 'gpt' and tools are loaded.
   This addresses a known behavioral pattern where GPT models describe
   intended actions ('I will run the tests') instead of actually making
   tool calls. Inspired by similar steering in OpenCode (beast.txt) and
   Cline (GPT-5.1 variant).

2. Budget warning history stripping: Budget pressure warnings injected by
   _get_budget_warning() into tool results are now stripped when
   conversation history is replayed via run_conversation(). Previously,
   these turn-scoped signals persisted across turns, causing models to
   avoid tool calls in all subsequent messages after any turn that hit
   the 70-90% iteration threshold.

* fix: replace hardcoded ~/.hermes paths with get_hermes_home() for profile support

Prep for the upcoming profiles feature — each profile is a separate
HERMES_HOME directory, so all paths must respect the env var.

Fixes:
- gateway/platforms/matrix.py: Matrix E2EE store was hardcoded to
  ~/.hermes/matrix/store, ignoring HERMES_HOME. Now uses
  get_hermes_home() so each profile gets its own Matrix state.

- gateway/platforms/telegram.py: Two locations reading config.yaml via
  Path.home()/.hermes instead of get_hermes_home(). DM topic thread_id
  persistence and hot-reload would read the wrong config in a profile.

- tools/file_tools.py: Security path for hub index blocking was
  hardcoded to ~/.hermes, would miss the actual profile's hub cache.

- hermes_cli/gateway.py: Service naming now uses the profile name
  (hermes-gateway-coder) instead of a cryptic hash suffix. Extracted
  _profile_suffix() helper shared by systemd and launchd.

- hermes_cli/gateway.py: Launchd plist path and Label now scoped per
  profile (ai.hermes.gateway-coder.plist). Previously all profiles
  would collide on the same plist file on macOS.

- hermes_cli/gateway.py: Launchd plist now includes HERMES_HOME in
  EnvironmentVariables — was missing entirely, making custom
  HERMES_HOME broken on macOS launchd (pre-existing bug).

- All launchctl commands in gateway.py, main.py, status.py updated
  to use get_launchd_label() instead of hardcoded string.

Test fixes: DM topic tests now set HERMES_HOME env var alongside
Path.home() mock. Launchd test uses get_launchd_label() for expected
commands.
2026-03-28 13:51:08 -07:00
Teknium
c0aa06f300 fix(test): update streaming test to match PR #3566 behavior change (#3574)
PR #3566 intentionally routes suppressed content to stream_delta_callback
when tool calls are present, so reasoning tag extraction can fire during
streaming. The test was still asserting the old behavior where content
after tool calls was fully suppressed from the callback.

Updated the assertion to match: content IS delivered to the callback
(for tag extraction), with display-level suppression handled by the
CLI's _stream_delta.
2026-03-28 13:41:23 -07:00
Teknium
3273732891 fix(api-server): add CORS headers to streaming SSE responses (#3573)
StreamResponse headers are flushed on prepare() before the CORS
middleware can inject them. Resolve CORS headers up front using
_cors_headers_for_origin() so the full set (including
Access-Control-Allow-Origin) is present on SSE streams.

Co-authored-by: ygd58 <ygd58@users.noreply.github.com>
2026-03-28 13:38:30 -07:00
Teknium
09ebf8b252 feat(api-server): add /v1/health alias for OpenAI compatibility (#3572)
Add GET /v1/health as an alias to the existing /health endpoint so
OpenAI-compatible health checks work out of the box.

Co-authored-by: Oktay Aydin <aydnOktay@users.noreply.github.com>
2026-03-28 13:32:39 -07:00
Teknium
33c89e52ec fix(whatsapp): add **kwargs to media sending methods to accept metadata (#3571)
The base orchestrator passes metadata=_thread_metadata to
send_image_file, send_video, and send_document. WhatsApp was the
only platform adapter missing the parameter, causing TypeError
crashes when sending media.

Extended to all three methods (original PR only fixed send_image_file).


Salvaged from PR #3144.

Co-authored-by: afifai <afifai@users.noreply.github.com>
2026-03-28 13:28:04 -07:00
Teknium
558cc14ad9 chore: release v0.5.0 (v2026.3.28) (#3568)
The hardening release — Nous Portal 400+ models, Hugging Face provider,
Telegram Private Chat Topics, native Modal SDK, plugin lifecycle hooks,
improved OpenAI model reliability, Nix flake, supply chain hardening,
Anthropic output limits fix, and 50+ security/reliability fixes.

165 merged PRs, 65 closed issues across a 5-day window.
2026-03-28 13:11:39 -07:00
Teknium
1d0a119368 fix(display): show reasoning before response when tool calls suppress content (#3566)
* fix(provider): remove MiniMax /v1→/anthropic auto-correction to allow user override

The minimax-specific auto-correction in runtime_provider.py was
preventing users from overriding to the OpenAI-compatible endpoint
via MINIMAX_BASE_URL. Users in certain regions get nginx 404 on
api.minimax.io/anthropic and need to switch to api.minimax.chat/v1.

The generic URL-suffix detection already handles /anthropic →
anthropic_messages, so the minimax-specific code was redundant for
the default path and harmful for the override path.

Now: default /anthropic URL works via generic detection, user
override to /v1 gets chat_completions mode naturally.

Closes #3546 (different approach — respects user overrides instead
of changing the default endpoint).

* fix(display): show reasoning during streaming even when tool calls suppress content

When a model generates content (containing <REASONING_SCRATCHPAD> tags)
alongside tool calls in the same API response, content deltas were
suppressed from streaming once any tool call chunk arrived. This
prevented the CLI's tag extraction from running, so reasoning was
never shown during streaming. The post-response fallback then
displayed reasoning AFTER the already-visible streamed response,
creating a confusing reversed order.

Fix: route suppressed content to stream_delta_callback even when tool
calls are present. The CLI's _stream_delta handles tag extraction —
reasoning tags are routed to the reasoning display box, while
non-reasoning text is handled by the existing stream display logic.
This ensures reasoning appears before tool execution and the final
response, matching the expected visual order.
2026-03-28 12:34:32 -07:00
Teknium
901494d728 feat: make tool-use enforcement configurable via agent.tool_use_enforcement (#3551)
The TOOL_USE_ENFORCEMENT_GUIDANCE injection (added in #3528) was
hardcoded to only match gpt/codex model names. This makes it a
config option so users can turn it on for any model family.

New config key: agent.tool_use_enforcement
  - "auto" (default): matches gpt/codex (existing behavior)
  - true: inject for all models
  - false: never inject
  - list of strings: custom model-name substrings to match
    e.g. ["gpt", "codex", "deepseek", "qwen"]

No version bump needed — deep merge provides the default
automatically for existing installs.

12 new tests covering all config modes.
2026-03-28 12:31:22 -07:00
Osman Mehmood
d26ee20659 docs(discord): fix Public Bot setting for Discord-provided invite link (#3519)
The documentation incorrectly instructed users to set Public Bot to OFF,
but this prevents using the Discord-provided invite link (recommended method),
causing the error: 'Private application cannot have a default authorization link'.

Changes:
- Changed Step 2: Public Bot now set to ON (required for Installation tab method)
- Added info callout explaining the Private Bot alternative (use Manual URL)
- Added note in Step 5 Option A clarifying the Public Bot requirement

Fixes Discord bot setup flow for new users following the recommended path.

Co-authored-by: Docs Fix <docs-fix@example.com>
2026-03-28 12:24:43 -07:00
Teknium
393929831e fix(gateway): preserve transcript on /compress and hygiene compression (salvage #3516) (#3556)
* fix(gateway): preserve full transcript on /compress instead of overwriting

The /compress command calls _compress_context() which correctly ends the
old session (preserving its full transcript in SQLite) and creates a new
session_id for the continuation. However, it then immediately called
rewrite_transcript() on the OLD session_id, overwriting the preserved
transcript with the compressed version — destroying searchable history.

Auto-compression (triggered by context pressure) does not have this bug
because the gateway already handles the session_id swap via the
agent.session_id != session_id check after _run_agent_sync.

Fix: after _compress_context creates the new session, write the compressed
messages into the NEW session_id and update the session store pointer.
The old session's full transcript stays intact and searchable via
session_search.

Before: /compress destroys original messages, session_search can't find
details from compressed portions.

After: /compress behaves like /new for history — full transcript preserved,
compressed context for the live session.

* fix(gateway): preserve transcript on /compress and hygiene compression

Apply session_id swap after _compress_context in both /compress handler
and hygiene pre-compression. _compress_context creates a new session
(ending the old one), but both paths were calling rewrite_transcript on
the OLD session_id — overwriting the preserved transcript and destroying
searchable history.

Now follows the same pattern as the auto-compression handler (lines
5415-5423): detect the new session_id, update the session store entry,
and write compressed messages to the new session.

Also fix FakeCompressAgent test mock to include session_id attribute
and simulate the session_id change that real _compress_context performs.

Co-authored-by: MacroAnarchy <MacroAnarchy@users.noreply.github.com>

---------

Co-authored-by: MacroAnarchy <MacroAnarchy@users.noreply.github.com>
2026-03-28 12:23:43 -07:00
Teknium
be322efdf2 fix(matrix): harden e2ee access-token handling (#3562)
* fix(matrix): harden e2ee access-token handling

* fix: patch nio mock in e2ee maintenance sync loop test

The sync_loop now imports nio for SyncError checking (from PR #3280),
so the test needs to inject a fake nio module via sys.modules.

---------

Co-authored-by: Cortana <andrew+cortana@chalkley.org>
2026-03-28 12:13:35 -07:00
Teknium
be39292633 fix(cli): guard .strip() against None values from YAML config (#3552)
dict.get(key, default) only returns default when key is ABSENT.
When YAML has 'key:' with no value, it parses as None — .get()
returns None, then .strip() crashes with AttributeError.

Use (x or '') pattern to handle both missing and null cases.


Salvaged from PR #3217.

Co-authored-by: erosika <erosika@users.noreply.github.com>
2026-03-28 11:39:01 -07:00
Teknium
df6ce848e9 fix(provider): remove MiniMax /v1→/anthropic auto-correction to allow user override (#3553)
The minimax-specific auto-correction in runtime_provider.py was
preventing users from overriding to the OpenAI-compatible endpoint
via MINIMAX_BASE_URL. Users in certain regions get nginx 404 on
api.minimax.io/anthropic and need to switch to api.minimax.chat/v1.

The generic URL-suffix detection already handles /anthropic →
anthropic_messages, so the minimax-specific code was redundant for
the default path and harmful for the override path.

Now: default /anthropic URL works via generic detection, user
override to /v1 gets chat_completions mode naturally.

Closes #3546 (different approach — respects user overrides instead
of changing the default endpoint).
2026-03-28 11:36:59 -07:00
Teknium
735ca9dfb2 refactor: replace swe-rex with native Modal SDK for Modal backend (#3538)
Drop the swe-rex dependency for Modal terminal backend and use the
Modal SDK directly (Sandbox.create + Sandbox.exec). This fixes:

- AsyncUsageWarning from synchronous App.lookup() in async context
- DeprecationError from unencrypted_ports / .url on unencrypted tunnels
  (deprecated 2026-03-05)

The new implementation:
- Uses modal.App.lookup.aio() for async-safe app creation
- Uses Sandbox.create.aio() with 'sleep infinity' entrypoint
- Uses Sandbox.exec.aio() for direct command execution (no HTTP server
  or tunnel needed)
- Keeps all existing features: persistent filesystem snapshots,
  configurable resources (CPU/memory/disk), sudo support, interrupt
  handling, _AsyncWorker for event loop safety

Consistent with the Docker backend precedent (PR #2804) where we
removed mini-swe-agent in favor of direct docker run.

Files changed:
- tools/environments/modal.py - core rewrite
- tools/terminal_tool.py - health check: modal instead of swerex
- hermes_cli/setup.py - install modal instead of swe-rex[modal]
- pyproject.toml - modal extra: modal>=1.0.0 instead of swe-rex[modal]
- scripts/kill_modal.sh - grep for hermes-agent instead of swe-rex
- tests/ - updated for new implementation
- environments/README.md - updated patches section
- website/docs - updated install command
2026-03-28 11:21:44 -07:00
Teknium
455bf2e853 feat: activate plugin lifecycle hooks (pre/post_llm_call, session start/end) (#3542)
The plugin system defined six lifecycle hooks but only pre_tool_call and
post_tool_call were invoked.  This activates the remaining four so that
external plugins (e.g. memory systems) can hook into the conversation
loop without touching core code.

Hook semantics:
- on_session_start: fires once when a new session is created
- pre_llm_call: fires once per turn before the tool-calling loop;
  plugins can return {"context": "..."} to inject into the ephemeral
  system prompt (not cached, not persisted)
- post_llm_call: fires once per turn after the loop completes, with
  user_message and assistant_response for sync/storage
- on_session_end: fires at the end of every run_conversation call

invoke_hook() now returns a list of non-None callback return values,
enabling pre_llm_call context injection while remaining backward
compatible (existing hooks that return None are unaffected).

Salvaged from PR #2823.

Co-authored-by: Nicolò Boschi <boschi1997@gmail.com>
2026-03-28 11:14:54 -07:00
Teknium
411e3c1539 fix(api-server): allow Idempotency-Key in CORS headers (#3530)
Browser clients using the Idempotency-Key header for request
deduplication were blocked by CORS preflight because the header
was not listed in Access-Control-Allow-Headers.

Add Idempotency-Key to _CORS_HEADERS and add tests for both the
new header allowance and the existing Vary: Origin behavior.

Co-authored-by: aydnOktay <aydnOktay@users.noreply.github.com>
Co-authored-by: Hermes Agent <hermes@nousresearch.com>
2026-03-28 08:16:41 -07:00
Teknium
d313a3b7d7 fix: auto-repair jobs.json with invalid control characters (#3537)
load_jobs() uses strict json.load() which rejects bare control characters
(e.g. literal newlines) in JSON string values. When a cron job prompt
contains such characters, the parser throws JSONDecodeError and the
function silently returns an empty list — causing ALL scheduled jobs
to stop firing with no error logged.

Fix: on JSONDecodeError, retry with json.loads(strict=False). If jobs
are recovered, auto-rewrite the file with proper escaping via save_jobs()
and log a warning. Only fall back to empty list if the JSON is truly
unrecoverable.

Co-authored-by: Sebastian Bochna <sbochna@SB-MBP-M2-2.local>
2026-03-28 08:15:31 -07:00
Teknium
80a899a8e2 fix: enable fine-grained tool streaming for Claude/OpenRouter + retry SSE errors (#3497)
Root cause: Anthropic buffers entire tool call arguments and goes silent
for minutes while thinking (verified: 167s gap with zero SSE events on
direct API).  OpenRouter's upstream proxy times out after ~125s of
inactivity and drops the connection with 'Network connection lost'.

Fix: Send the x-anthropic-beta: fine-grained-tool-streaming-2025-05-14
header for Claude models on OpenRouter.  This makes Anthropic stream
tool call arguments token-by-token instead of buffering them, keeping
the connection alive through OpenRouter's proxy.

Live-tested: the exact prompt that consistently failed at ~128s now
completes successfully — 2,972 lines written, 49K tokens, 8 minutes.

Additional improvements:

1. Send explicit max_tokens for Claude through OpenRouter.  Without it,
   OpenRouter defaults to 65,536 (confirmed via echo_upstream_body) —
   only half of Opus 4.6's 128K limit.

2. Classify SSE 'Network connection lost' as retryable in the streaming
   inner retry loop.  The OpenAI SDK raises APIError from SSE error
   events, which was bypassing our transient error retry logic.

3. Actionable diagnostic guidance when stream-drop retries exhaust.
2026-03-28 08:01:37 -07:00
Teknium
e295a2215a fix(gateway): include user-local bin paths in systemd unit PATH (#3527)
Add ~/.local/bin, ~/.cargo/bin, ~/go/bin, ~/.npm-global/bin to the
systemd unit PATH so tools installed via uv/pipx/cargo/go are
discoverable by MCP servers and terminal commands.

Uses a _build_user_local_paths() helper that checks exists() before
adding, and correctly resolves home dir for both user and system
service types.

Co-authored-by: Kal Sze <ksze@users.noreply.github.com>
2026-03-28 07:47:40 -07:00
Teknium
831e8ba0e5 feat: tool-use enforcement + strip budget warnings from history (#3528)
Cherry-pick of feat/gpt-tool-steering with modifications:

1. Tool-use enforcement prompt (refactored from GPT-specific):
   - Renamed GPT_TOOL_USE_GUIDANCE -> TOOL_USE_ENFORCEMENT_GUIDANCE
   - Added TOOL_USE_ENFORCEMENT_MODELS tuple: ('gpt', 'codex')
   - Injection logic now checks against the tuple instead of hardcoding
     'gpt' — adding new model families is a one-line change
   - Addresses models describing actions instead of making tool calls

2. Budget warning history stripping:
   - _strip_budget_warnings_from_history() strips _budget_warning JSON
     keys and [BUDGET WARNING: ...] text from tool results at the start
     of run_conversation()
   - Prevents old budget warnings from poisoning subsequent turns

Based on PR #3479 by teknium1.
2026-03-28 07:38:36 -07:00
Teknium
9d4b3e5470 fix: harden hermes update against diverged history, non-main branches, and gateway edge cases (salvage #3489) (#3492)
* fix: harden `hermes update` against diverged history, non-main branches, and gateway edge cases

The self-update command (`hermes update` / gateway `/update`) could fail
or silently corrupt state in several scenarios:

1. **Diverged history** — `git pull --ff-only` aborts with a cryptic
   subprocess error when upstream has force-pushed or rebased. Now falls
   back to `git reset --hard origin/main` since local changes are already
   stashed.

2. **User on a feature branch / detached HEAD** — the old code would
   either clobber the feature branch HEAD to point at origin/main, or
   silently pull against a non-existent remote branch. Now auto-checkouts
   main before pulling, with a clear warning.

3. **Fetch failures** — network or auth errors produced raw subprocess
   tracebacks. Now shows user-friendly messages ("Network error",
   "Authentication failed") with actionable hints.

4. **reset --hard failure** — if the fallback reset itself fails (disk
   full, permissions), the old code would still attempt stash restore on
   a broken working tree. Now skips restore and tells the user their
   changes are safe in stash.

5. **Gateway /update stash conflicts** — non-interactive mode (Telegram
   `/update`) called sys.exit(1) when stash restore had conflicts, making
   the entire update report as failed even though the code update itself
   succeeded. Now treats stash conflicts as non-fatal in non-interactive
   mode (returns False instead of exiting).

* fix: restore stash and branch on 'already up to date' early return

The PR moved stash creation before the commit-count check (needed for
the branch-switching feature), but the 'already up to date' early return
didn't restore the stash or switch back to the original branch — leaving
the user stranded on main with changes trapped in a stash.

Now the early-return path restores the stash and checks out the original
branch when applicable.

---------

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
2026-03-27 23:12:43 -07:00
Teknium
6ed9740444 fix: prevent unbounded growth of _seen_uids in EmailAdapter (#3490)
EmailAdapter._seen_uids accumulates every IMAP UID ever seen but
never removes any. A long-running gateway processing a high-volume
inbox would leak memory indefinitely — thousands of integers per day.

IMAP UIDs are monotonically increasing integers, so old UIDs are safe
to drop: new messages always have higher UIDs, and the IMAP UNSEEN
flag already prevents re-delivery regardless of our local tracking.

Fix adds _trim_seen_uids() which keeps only the most recent 1000 UIDs
(half of the 2000-entry cap) when the set grows too large. Called
automatically during connect() and after each fetch cycle.

Co-authored-by: memosr.eth <96793918+memosr@users.noreply.github.com>
2026-03-27 23:08:42 -07:00
Teknium
290c71a707 fix(gateway): scope progress thread fallback to Slack only (salvage #3414) (#3488)
* test(gateway): map fixture adapter by platform in progress threading tests

* fix(gateway): scope progress thread fallback to Slack only

---------

Co-authored-by: EmpireOperating <258363005+EmpireOperating@users.noreply.github.com>
2026-03-27 22:37:53 -07:00
Teknium
09796b183b fix: alibaba provider default endpoint and model list (#3484)
- Change default inference_base_url from dashscope-intl Anthropic-compat
  endpoint to coding-intl OpenAI-compat /v1 endpoint. The old Anthropic
  endpoint 404'd when used with the OpenAI SDK (which appends
  /chat/completions to a /apps/anthropic base URL).

- Update curated model list: remove models unavailable on coding-intl
  (qwen3-max, qwen-plus-latest, qwen3.5-flash, qwen-vl-max), add
  third-party models available on the platform (glm-5, glm-4.7,
  kimi-k2.5, MiniMax-M2.5).

- URL-based api_mode auto-detection still works: overriding
  DASHSCOPE_BASE_URL to an /apps/anthropic endpoint automatically
  switches to anthropic_messages mode.

- Update provider description and env var descriptions to reflect the
  coding-intl multi-provider platform.

- Update tests to match new default URL and test the anthropic override
  path instead.
2026-03-27 22:10:10 -07:00
Teknium
15cfd20820 fix: cap context pressure percentage at 100% in display (#3480)
* fix: cap context pressure percentage at 100% in display

The forward-looking token estimate can overshoot the compaction threshold
(e.g. a large tool result pushes it from 70% to 109% in one step). The
progress bar was already capped via min(), but pct_int was not — causing
the user to see '109% to compaction' which is confusing.

Cap pct_int at 100 in both CLI and gateway display functions.

Reported by @JoshExile82.

* refactor: use real API token counts for compression decisions

Replace the rough chars/3 estimation with actual prompt_tokens +
completion_tokens from the API response. The estimation was needed to
predict whether tool results would push context past the threshold, but
the default 50% threshold leaves ample headroom — if tool results push
past it, the next API call reports real usage and triggers compression
then.

This removes all estimation from the compression and context pressure
paths, making both 100% data-driven from provider-reported token counts.

Also removes the dead _msg_count_before_tools variable.
2026-03-27 21:42:09 -07:00
Teknium
03f24c1edd fix: session_search fallback preview on summarization failure (salvage #3413) (#3478)
* Fix #3409: Add fallback to session_search to prevent false negatives on summarization failure

Fixes #3409. When the auxiliary summarizer fails or returns None, the tool now returns a raw fallback preview of the matched session instead of silently dropping it and returning an empty list

* fix: clean up fallback logic — separate exception handling from preview

Restructure the loop: handle exceptions first (log + nullify), build
entry dict once, then branch on result truthiness. Removes duplicated
field assignments and makes the control flow linear.

---------

Co-authored-by: devorun <130918800+devorun@users.noreply.github.com>
2026-03-27 21:27:51 -07:00
Teknium
388fa5293d fix(matrix): add missing matrix entry in PLATFORMS dict (#3473)
Matrix platform was missing from the PLATFORMS config, causing a
KeyError in _get_platform_tools() when handling Matrix messages.
Every other platform (telegram, discord, slack, etc.) was present
but matrix was overlooked.

Co-authored-by: williamtwomey <williamtwomey@users.noreply.github.com>
2026-03-27 18:36:23 -07:00
Teknium
83043e9aa8 fix: add timeout to subprocess calls in context_references (#3469)
_expand_git_reference() and _rg_files() called subprocess.run()
without a timeout. On a large repository, @diff, @staged, or
@git:N references could hang the agent indefinitely while git
or ripgrep processes slow output.

- Add timeout=30 to git subprocess in _expand_git_reference()
  with a user-friendly error message on TimeoutExpired
- Add timeout=10 to rg subprocess in _rg_files() returning
  None on timeout (falls back to os.walk folder listing)

Co-authored-by: memosr.eth <96793918+memosr@users.noreply.github.com>
2026-03-27 17:51:14 -07:00
Teknium
b6b87dedd4 fix: discover plugins before reading plugin toolsets in tools_config (#3457)
hermes tools and _get_platform_tools() call get_plugin_toolsets() /
_get_plugin_toolset_keys() without first ensuring plugins have been
discovered. discover_plugins() only runs as a side effect of importing
model_tools.py, which hermes tools never does. This means:

- hermes tools TUI never shows plugin toolsets (invisible to users)
- _get_platform_tools() in standalone processes misses plugin toolsets

Fix: call discover_plugins() (idempotent) in both
_get_plugin_toolset_keys() and _get_effective_configurable_toolsets()
before accessing plugin state. In the gateway/CLI where model_tools.py
is already imported, the call is a no-op (discover_and_load checks
_discovered flag).
2026-03-27 15:31:17 -07:00
Teknium
8fdfc4b00c fix(agent): detect thinking-budget exhaustion on truncation, skip useless retries (#3444)
When finish_reason='length' and the response contains only reasoning
(think blocks or empty content), the model exhausted its output token
budget on thinking with nothing left for the actual response.

Previously, this fell into either:
- chat_completions: 3 useless continuation retries (model hits same limit)
- anthropic/codex: generic 'Response truncated' error with rollback

Now: detect the think-only + length condition early and return immediately
with a targeted error message: 'Model used all output tokens on reasoning
with none left for the response. Try lowering reasoning effort or
increasing max_tokens.'

This saves 2 wasted API calls on the chat_completions path and gives
users actionable guidance instead of a cryptic error.

The existing think-only retry logic (finish_reason='stop') is unchanged —
that's a genuine model glitch where retrying can help.
2026-03-27 15:29:30 -07:00
Teknium
658692799d fix: guard aux LLM calls against None content + reasoning fallback + retry (salvage #3389) (#3449)
Salvage of #3389 by @binhnt92 with reasoning fallback and retry logic added on top.

All 7 auxiliary LLM call sites now use extract_content_or_reasoning() which mirrors the main agent loop's behavior: extract content, strip think blocks, fall back to structured reasoning fields, retry on empty.

Closes #3389.
2026-03-27 15:28:19 -07:00
Teknium
ab09f6b568 feat: curate HF model picker with OpenRouter analogues (#3440)
Show only agentic models that map to OpenRouter defaults:

  Qwen/Qwen3.5-397B-A17B          ↔ qwen/qwen3.5-plus
  Qwen/Qwen3.5-35B-A3B            ↔ qwen/qwen3.5-35b-a3b
  deepseek-ai/DeepSeek-V3.2       ↔ deepseek/deepseek-chat
  moonshotai/Kimi-K2.5             ↔ moonshotai/kimi-k2.5
  MiniMaxAI/MiniMax-M2.5           ↔ minimax/minimax-m2.5
  zai-org/GLM-5                    ↔ z-ai/glm-5
  XiaomiMiMo/MiMo-V2-Flash         ↔ xiaomi/mimo-v2-pro
  moonshotai/Kimi-K2-Thinking      ↔ moonshotai/kimi-k2-thinking

Users can still pick any HF model via Enter custom model name.
2026-03-27 13:54:46 -07:00
Teknium
e4e04c2005 fix: make tirith block verdicts approvable instead of hard-blocking (#3428)
Previously, tirith exit code 1 (block) immediately rejected the command
with no approval prompt — users saw 'BLOCKED: Command blocked by
security scan' and the agent moved on.  This prevented gateway/CLI users
from approving pipe-to-shell installs like 'curl ... | sh' even when
they understood the risk.

Changes:
- Tirith 'block' and 'warn' now both go through the approval flow.
  Users see the full tirith findings (severity, title, description,
  safer alternatives) and can choose to approve or deny.
- New _format_tirith_description() builds rich descriptions from tirith
  findings JSON so the approval prompt is informative.
- CLI startup now warns when tirith is enabled but not available, so
  users know command scanning is degraded to pattern matching only.

The default approval choice is still deny, so the security posture is
unchanged for unattended/timeout scenarios.

Reported via Discord by pistrie — 'curl -fsSL https://mandex.dev/install.sh | sh'
was hard-blocked with no way to approve.
2026-03-27 13:22:01 -07:00
Teknium
6f11ff53ad fix(anthropic): use model-native output limits instead of hardcoded 16K (#3426)
The Anthropic adapter defaulted to max_tokens=16384 when no explicit value
was configured.  This severely limits thinking-enabled models where thinking
tokens count toward max_tokens:

- Claude Opus 4.6 supports 128K output but was capped at 16K
- Claude Sonnet 4.6 supports 64K output but was capped at 16K

With extended thinking (adaptive or budget-based), the model could exhaust
the entire 16K on reasoning, leaving zero tokens for the actual response.
This caused two user-visible errors:
- 'Response truncated (finish_reason=length)' — thinking consumed most tokens
- 'Response only contains think block with no content' — thinking consumed all

Fix: add _ANTHROPIC_OUTPUT_LIMITS lookup table (sourced from Anthropic docs
and Cline's model catalog) and use the model's actual output limit as the
default.  Unknown future models default to 128K (the current maximum).

Also adds context_length clamping: if the user configured a smaller context
window (e.g. custom endpoint), max_tokens is clamped to context_length - 1
to avoid exceeding the window.

Closes #2706
2026-03-27 13:02:52 -07:00
Teknium
fb46a90098 fix: increase API timeout default from 900s to 1800s for slow-thinking models (#3431)
Models like GLM-5/5.1 can think for 15+ minutes. The previous 900s
(15 min) default for HERMES_API_TIMEOUT killed legitimate requests.

Raised to 1800s (30 min) in both places that read the env var:
- _build_api_kwargs() timeout (non-streaming total timeout)
- _call_chat_completions() write timeout (streaming connection)

The streaming per-chunk read timeout (60s) and stale stream detector
(180-300s) are unchanged — those are appropriate for inter-chunk timing.
2026-03-27 13:02:23 -07:00
Teknium
fd8c465e42 feat: add Hugging Face as a first-class inference provider (#3419)
Salvage of PR #1747 (original PR #1171 by @davanstrien) onto current main.

Registers Hugging Face Inference Providers (router.huggingface.co/v1) as a named provider:
- hermes chat --provider huggingface (or --provider hf)
- 18 curated open models via hermes model picker
- HF_TOKEN in ~/.hermes/.env
- OpenAI-compatible endpoint with automatic failover (Groq, Together, SambaNova, etc.)

Files: auth.py, models.py, main.py, setup.py, config.py, model_metadata.py, .env.example, 5 docs pages, 17 new tests.

Co-authored-by: Daniel van Strien <davanstrien@gmail.com>
2026-03-27 12:41:59 -07:00
Teknium
f57ebf52e9 fix(api-server): cancel orphaned agent + true interrupt on SSE disconnect (salvage #3399) (#3427)
Salvage of #3399 by @binhnt92 with true agent interruption added on top.

When a streaming /v1/chat/completions client disconnects mid-stream, the agent is now interrupted via agent.interrupt() so it stops making LLM API calls, and the asyncio task wrapper is cancelled.

Closes #3399.
2026-03-27 11:33:19 -07:00
Teknium
5127567d5d perf(ttft): cache skills prompt with shared skill_utils module (salvage #3366) (#3421)
Two-layer caching for build_skills_system_prompt():
  1. In-process LRU (OrderedDict, max 8) — same-process: 546ms → <1ms
  2. Disk snapshot (.skills_prompt_snapshot.json) — cold start: 297ms → 103ms

Key improvements over original PR #3366:
- Extract shared logic into agent/skill_utils.py (parse_frontmatter,
  skill_matches_platform, get_disabled_skill_names, extract_skill_conditions,
  extract_skill_description, iter_skill_index_files)
- tools/skills_tool.py delegates to shared module — zero code duplication
- Proper LRU eviction via OrderedDict.move_to_end + popitem(last=False)
- Cache invalidation on all skill mutation paths:
  - skill_manage tool (in-conversation writes)
  - hermes skills install (CLI hub)
  - hermes skills uninstall (CLI hub)
  - Automatic via mtime/size manifest on cold start

prompt_builder.py no longer imports tools.skills_tool (avoids pulling
in the entire tool registry chain at prompt build time).

6301 tests pass, 0 failures.

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
2026-03-27 10:54:02 -07:00
Siddharth Balyan
cc4514076b feat(nix): add suffix PATHs during nix build for more agent-friendliness (#3274)
* refactor: suffix runtimeDeps PATH so apt-installed tools take priority

Changes makeWrapper from --prefix to --suffix. In container mode,
tools installed via apt in /usr/bin now win over read-only nix store
copies. Nix store versions become dead-letter fallbacks. Native NixOS
mode unaffected — tools in /run/current-system/sw/bin already precede
the suffix.

* feat(container): first-boot apt provisioning for agent tools

Installs nodejs, npm, curl via apt and uv via curl on first container
boot. Uses sentinel file so subsequent boots skip. Container recreation
triggers fresh install. Combined with --suffix PATH change, agents get
mutable tools that support npm i -g and uv without hitting read-only
nix store paths.

* docs: update nixosModules header for tool provisioning

* feat(container): consolidate first-boot provisioning + Python 3.11 venv

Merge sudo and tool apt installs into a single apt-get update call.
Move uv install outside the sentinel so transient failures retry on
next boot. Bootstrap a Python 3.11 venv via uv (--seed for pip) and
prepend ~/.venv/bin to PATH so agents get writable python/pip/node
out of the box.

---------

Co-authored-by: Hermes Agent <hermes@nousresearch.com>
2026-03-27 23:00:56 +05:30
Teknium
8ecd7aed2c fix: prevent reasoning box from rendering 3x during tool-calling loops (#3405)
Two independent bugs caused the reasoning box to appear three times when
the model produced reasoning + tool_calls:

Bug A: _build_assistant_message() re-fired reasoning_callback with the full
reasoning text even when streaming had already displayed it. The original
guard only checked structured reasoning_content deltas, but reasoning also
arrives via content tag extraction (<REASONING_SCRATCHPAD>/<think> tags
in delta.content), which went through _fire_stream_delta not
_fire_reasoning_delta. Fix: skip the callback entirely when streaming is
active — both paths display reasoning during the stream. Any reasoning not
shown during streaming is caught by the CLI post-response fallback.

Bug B: The post-response reasoning display checked _reasoning_stream_started,
but that flag was reset by _reset_stream_state() during intermediate turn
boundaries (when stream_delta_callback(None) fires between tool calls).
Introduced _reasoning_shown_this_turn flag that persists across the tool
loop and is only reset at the start of each user turn.

Live-tested in PTY: reasoning now shows exactly once per API call, no
duplicates across tool-calling loops.
2026-03-27 09:57:50 -07:00
Teknium
e0dbbdb2c9 fix: eliminate 'Event loop is closed' / 'Press ENTER to continue' during idle (#3398)
The OpenAI SDK's AsyncHttpxClientWrapper.__del__ schedules aclose() via
asyncio.get_running_loop().create_task().  When an AsyncOpenAI client is
garbage-collected while prompt_toolkit's event loop is running (the common
CLI idle state), the aclose() task runs on prompt_toolkit's loop but the
underlying TCP transport is bound to a different (dead) worker loop.
The transport's self._loop.call_soon() then raises RuntimeError('Event
loop is closed'), which prompt_toolkit surfaces as the disruptive
'Unhandled exception in event loop ... Press ENTER to continue...' error.

Three-layer fix:

1. neuter_async_httpx_del(): Monkey-patches __del__ to a no-op at CLI
   startup before any AsyncOpenAI clients are created.  Safe because
   cached clients are explicitly cleaned via _force_close_async_httpx,
   and uncached clients' TCP connections are cleaned by the OS on exit.

2. Custom asyncio exception handler: Installed on prompt_toolkit's event
   loop to silently suppress 'Event loop is closed' RuntimeError.
   Defense-in-depth for SDK upgrades that might change the class name.

3. cleanup_stale_async_clients(): Called after each agent turn (when the
   agent thread joins) to proactively evict cache entries whose event
   loop is closed, preventing stale clients from accumulating.
2026-03-27 09:45:25 -07:00
Teknium
eb2127c1dc fix(cron): prevent recurring job re-fire on gateway crash/restart loop (#3396)
When a gateway crashes mid-job execution (before mark_job_run can persist
the updated next_run_at), the job would fire again on every restart attempt
within the grace window. For a daily 6:15 AM job with a 2-hour grace,
rapidly restarting the gateway could trigger dozens of duplicate runs.

Fix: call advance_next_run() BEFORE run_job() in tick(). For recurring
jobs (cron/interval), this preemptively advances next_run_at to the next
future occurrence and persists it to disk. If the process then crashes
during execution, the job won't be considered due on restart.

One-shot jobs are left unchanged — they still retry on restart since
there's no future occurrence to advance to.

This changes the scheduler from at-least-once to at-most-once semantics
for recurring jobs, which is the correct tradeoff: missing one daily
message is far better than sending it dozens of times.
2026-03-27 08:02:58 -07:00
Teknium
5a1e2a307a perf(ttft): salvage easy-win startup optimizations from #3346 (#3395)
* perf(ttft): dedupe shared tool availability checks

* perf(ttft): short-circuit vision auto-resolution

* perf(ttft): make Claude Code version detection lazy

* perf(ttft): reuse loaded toolsets for skills prompt

---------

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
2026-03-27 07:49:44 -07:00
Teknium
41d9d08078 fix(telegram): fall back to no thread_id on 'Message thread not found' (#3390)
python-telegram-bot's BadRequest inherits from NetworkError, so the
send() retry loop was catching 'Message thread not found' as a transient
network error and retrying 3 times before silently failing. This killed
all tool progress messages, streaming responses, and typing indicators
when the incoming message carried an invalid message_thread_id.

Now detect BadRequest inside the NetworkError handler:
- 'thread not found' + thread_id set → clear thread_id and retry once
  (message still reaches the chat, just without topic threading)
- Other BadRequest errors → raise immediately (permanent, don't retry)
- True NetworkError → retry as before (transient)

252 silent failures in gateway.log traced to this on 2026-03-26.

5 new tests for thread fallback, non-thread BadRequest, no-thread sends,
network retry, and multi-chunk fallback.
2026-03-27 06:07:28 -07:00
Teknium
b7bcae49c6 fix: SQLite WAL write-lock contention causing 15-20s TUI freeze (#3385)
Multiple hermes processes (gateway + CLI sessions + worktree agents) sharing
one state.db caused WAL write-lock convoy effects. SQLite's built-in busy
handler uses deterministic sleep intervals (up to 100ms) that synchronize
competing writers, creating 15-20 second freezes during agent init.

Root cause: timeout=30.0 with 7+ concurrent connections meant:
- WAL never checkpointed (294MB, readers always blocked it)
- Bloated WAL slowed all reads and writes
- Deterministic backoff caused convoy effects under contention

Fix:
- Replace 30s SQLite timeout with 1s + app-level retry (15 attempts,
  random 20-150ms jitter between retries to break convoys)
- Use BEGIN IMMEDIATE for explicit write-lock acquisition (fail fast)
- Set isolation_level=None for manual transaction control
- PASSIVE WAL checkpoint on close() and every 50 writes
- All 12 write methods converted to _execute_write() helper

Before: 15-20s frozen at create_session during agent init
After:  <1s to API call, WAL stays at ~4MB

Tested: 4355 tests pass, 3 concurrent live sessions with simultaneous
writes showed zero contention on every py-spy sample.
2026-03-27 05:22:57 -07:00
Teknium
915df02bbf fix(streaming): stale stream detector race causing spurious RemoteProtocolError
The stale stream detector (90s timeout) was killing healthy connections
during the model's thinking phase, producing self-inflicted
RemoteProtocolError ("peer closed connection without sending complete
message body"). Three issues:

1. last_chunk_time was never reset between inner stream retries, so
   subsequent attempts inherited the previous attempt's stale budget
2. The non-streaming fallback path didn't reset the timer either
3. 90s base timeout was too aggressive for large-context Opus sessions
   where thinking time before first token routinely exceeds 90s

Fix: reset last_chunk_time at the start of each streaming attempt and
before the non-streaming fallback. Increase base timeout to 180s and
scale to 300s for >100K token contexts.

Made-with: Cursor
2026-03-27 04:05:51 -07:00
Teknium
75fcbc44ce feat(telegram): auto-discover fallback IPs via DoH when api.telegram.org is unreachable (#3376)
* feat(telegram): auto-discover fallback IPs via DoH when api.telegram.org is unreachable

On some networks (university, corporate), api.telegram.org resolves to a
valid Telegram IP that is unreachable due to routing/firewall rules. A
different IP in the same Telegram-owned 149.154.160.0/20 block works fine.

This adds automatic fallback IP discovery at connect time:
1. Query Google and Cloudflare DNS-over-HTTPS for api.telegram.org A records
2. Exclude the system-DNS IP (the unreachable one), use the rest as fallbacks
3. If DoH is also blocked, fall back to a seed list (149.154.167.220)
4. TelegramFallbackTransport tries primary first, sticks to whichever works

No configuration needed — works automatically. TELEGRAM_FALLBACK_IPS env var
still available as manual override. Zero impact on healthy networks (primary
path succeeds on first attempt, fallback never exercised).

No new dependencies (uses httpx already in deps + stdlib socket).

* fix: share transport instance and downgrade seed fallback log to info

- Use single TelegramFallbackTransport shared between request and
  get_updates_request so sticky IP is shared across polling and API calls
- Keep separate HTTPXRequest instances (different timeout settings)
- Downgrade "using seed fallback IPs" from warning to info to avoid
  noisy logs on healthy networks

* fix: add telegram.request mock and discovery fixture to remaining test files

The original PR missed test_dm_topics.py and
test_telegram_network_reconnect.py — both need the telegram.request
mock module. The reconnect test also needs _no_auto_discovery since
_handle_polling_network_error calls connect() which now invokes
discover_fallback_ips().

---------

Co-authored-by: Mohan Qiao <Gavin-Qiao@users.noreply.github.com>
2026-03-27 04:03:13 -07:00
Teknium
be416cdfa9 fix: guard config.get() against YAML null values to prevent AttributeError (#3377)
dict.get(key, default) returns None — not the default — when the key IS
present but explicitly set to null/~ in YAML.  Calling .lower() on that
raises AttributeError.

Use (config.get(key) or fallback) so both missing keys and explicit nulls
coalesce to the intended default.

Files fixed:
- tools/tts_tool.py — _get_provider()
- tools/web_tools.py — _get_backend()
- tools/mcp_tool.py — MCPServerTask auth config
- trajectory_compressor.py — _detect_provider() and config loading

Co-authored-by: dieutx <dangtc94@gmail.com>
2026-03-27 04:03:00 -07:00
Teknium
b8b1f24fd7 fix: handle addition-only hunks in V4A patch parser (#3325)
V4A patches with only + lines (no context or - lines) were silently
dropped because search_lines was empty and the 'if search_lines:' block
was the only code path. Addition-only hunks are common when the model
generates patches for new functions or blocks.

Adds an else branch that inserts at the context_hint position when
available, or appends at end of file.

Includes 2 regression tests for addition-only hunks with and without
context hints.

Salvaged from PR #3092 by thakoreh.

Co-authored-by: Hiren <hiren.thakore58@gmail.com>
2026-03-26 19:38:04 -07:00
Teknium
a2847ea7f0 fix(gateway): add media download retry to Mattermost, Slack, and base cache (#3323)
* fix(gateway): add media download retry to Mattermost, Slack, and base cache

Media downloads on Mattermost and Slack fail permanently on transient
errors (timeouts, 429 rate limits, 5xx server errors). Telegram and
WhatsApp already have retry logic, but these platforms had single-attempt
downloads with hardcoded 30s timeouts.

Changes:
- base.py cache_image_from_url: add retry with exponential backoff
  (covers Signal and any platform using the shared cache helper)
- mattermost.py _send_media_url: retry on 429/5xx/timeout (3 attempts)
- slack.py _download_slack_file: retry on timeout/5xx (3 attempts)
- slack.py _download_slack_file_bytes: same retry pattern

* test: add tests for media download retry

---------

Co-authored-by: dieutx <dangtc94@gmail.com>
2026-03-26 19:33:18 -07:00
Teknium
58ca875e19 feat(gateway): surface session config on /new, /reset, and auto-reset (#3321)
When a new session starts in the gateway (via /new, /reset, or
auto-reset), send the user a summary of the detected configuration:

   Session reset! Starting fresh.

  ◆ Model: qwen3.5:27b-q4_K_M
  ◆ Provider: custom
  ◆ Context: 8K tokens (config)
  ◆ Endpoint: http://localhost:11434/v1

This makes misconfigured context length immediately visible — a user
running a local 8K model that falls to the 128K default will see:

  ◆ Context: 128K tokens (default — set model.context_length in config to override)

Instead of silently getting no compression and degrading responses.

- _format_session_info() resolves model, provider, context length,
  and endpoint from config + runtime, matching the hygiene code's
  resolution chain
- Local/custom endpoints shown; cloud endpoints hidden (not useful)
- Context source annotated: config, detected, or default with hint
- Appended to /new and /reset responses, and auto-reset notifications
- 9 tests covering all formatting paths and failure resilience

Addresses the user-facing side of #2708 — instead of trying to fix
every edge case in context detection, surface the values so users
can immediately see when something is wrong.
2026-03-26 19:27:58 -07:00
Teknium
3f95e741a7 fix: validate empty user messages to prevent Anthropic API 400 errors (#3322)
When user messages have empty content (e.g., Discord @mention-only
messages, unrecognized attachments), the Anthropic API rejects the
request with 'user messages must have non-empty content'.

Changes:
- anthropic_adapter.py: Add empty content validation for user messages
  (string and list formats), matching the existing pattern for assistant
  and tool messages. Empty content gets '(empty message)' placeholder.

- discord.py: Defense-in-depth check at gateway layer to catch empty
  messages before they enter session history.

- Add 4 regression tests covering empty string, whitespace-only,
  empty list, and empty text block scenarios.

Fixes #3143

Co-authored-by: Bartok9 <bartok9@users.noreply.github.com>
2026-03-26 19:24:03 -07:00
Teknium
03396627a6 fix(ci): pin acp <0.9 and update retry-exhaust test (#3320)
Two remaining CI failures:

1. agent-client-protocol 0.9.0 removed AuthMethod (replaced with
   AuthMethodAgent/EnvVar/Terminal). Pin to <0.9 until the new API
   is evaluated — our usage doesn't map 1:1 to the new types.

2. test_429_exhausts_all_retries_before_raising expected pytest.raises
   but the agent now catches 429s after max retries, tries fallback,
   then returns a result dict. Updated to check final_response.
2026-03-26 19:21:34 -07:00
Teknium
22cfad157b fix: gateway token double-counting — use absolute set instead of increment (#3317)
The gateway's update_session() used += for token counts, but the cached
agent's session_prompt_tokens / session_completion_tokens are cumulative
totals that grow across messages. Each update_session call re-added the
running total, inflating usage stats with every message (1.7x after 3
messages, worse over longer conversations).

Fix: change += to = for in-memory entry fields, add set_token_counts()
to SessionDB that uses direct assignment instead of SQL increment, and
switch the gateway to call it.

CLI mode continues using update_token_counts() (increment) since it
tracks per-API-call deltas — that path is unchanged.

Based on analysis from PR #3222 by @zaycruz (closed).

Co-authored-by: zaycruz <zay@users.noreply.github.com>
2026-03-26 19:13:07 -07:00
Teknium
867eefdd9f fix(signal): track SSE keepalive comments as connection activity (#3316)
signal-cli sends SSE comment lines (':') as keepalives every ~15s. The
SSE listener only counted 'data:' lines as activity, so the health
monitor reported false idle warnings every 2 minutes during quiet
periods. Recognize ':' lines as valid activity per the SSE spec.

Salvaged from PR #2938 by ticketclosed-wontfix.
2026-03-26 19:10:25 -07:00
Teknium
a8df7f9964 fix: gateway token double-counting with cached agents (#3306)
The cached agent accumulates session_input_tokens across messages, so
run_conversation() returns cumulative totals. But update_session() used
+= (increment), double-counting on every message after the first.

- session.py: change in-memory entry updates from += to = (direct
  assignment for cumulative values)
- hermes_state.py: add absolute=True flag to update_token_counts()
  that uses SET column = ? instead of SET column = column + ?
- session.py: pass absolute=True to the DB call

CLI path is unchanged — it passes per-API-call deltas directly to
update_token_counts() with the default absolute=False (increment).

Reported by @zaycruz in #3222. Closes #3222.
2026-03-26 19:04:53 -07:00
Teknium
1519c4d477 fix(session): add /resume CLI handler, session log truncation guard, reopen_session API (#3315)
Three improvements salvaged from PR #3225 by Mibayy:

1. Add /resume slash command handler in CLI process_command(). The
   command was registered in the commands registry but had no handler,
   so typing /resume produced 'Unknown command'. The handler resolves
   by title or session ID, ends the current session cleanly, loads
   conversation history from SQLite, re-opens the target session, and
   syncs the AIAgent instance. Follows the same pattern as new_session().

2. Add truncation guard in _save_session_log(). When resuming a session
   whose messages weren't fully written to SQLite, the agent starts with
   partial history and the first save would overwrite the full JSON log
   on disk. The guard reads the existing file and skips the write if it
   already has more messages than the current batch.

3. Add reopen_session() method to SessionDB. Proper API for clearing
   ended_at/end_reason instead of reaching into _conn directly.

Note: Bug 1 from the original PR (INSERT OR IGNORE + _session_db = None)
is already fixed on main — skipped as redundant.

Closes #3123.
2026-03-26 19:04:28 -07:00
Teknium
005786c55d fix(gateway): include per-platform ALLOW_ALL and SIGNAL_GROUP in startup allowlist check (#3313)
The startup warning 'No user allowlists configured' only checked
GATEWAY_ALLOW_ALL_USERS and per-platform _ALLOWED_USERS vars. It
missed SIGNAL_GROUP_ALLOWED_USERS and per-platform _ALLOW_ALL_USERS
vars (e.g. TELEGRAM_ALLOW_ALL_USERS), causing a false warning even
when users had these configured. The actual auth check in
_is_user_authorized already recognized these vars.

Cherry-picked from PR #3202 by binhnt92.

Co-authored-by: binhnt92 <binhnt.ht.92@gmail.com>
2026-03-26 18:23:49 -07:00
Teknium
ad764d3513 fix(auxiliary): catch ImportError from build_anthropic_client in vision auto-detection (#3312)
_try_anthropic() caught ImportError on the module import (line 667-669)
but not on the build_anthropic_client() call (line 696). When the
anthropic_adapter module imports fine but the anthropic SDK is missing,
build_anthropic_client() raises ImportError at call time. This escaped
_try_anthropic() entirely, killing get_available_vision_backends() and
cascading to 7 test failures:

- 4 setup wizard tests hit unexpected 'Configure vision:' prompt
- 3 codex-auth-as-vision tests failed check_vision_requirements()

The fix wraps the build_anthropic_client call in try/except ImportError,
returning (None, None) when the SDK is unavailable — consistent with the
existing guard at the top of the function.
2026-03-26 18:21:59 -07:00
Teknium
f008ee1019 fix(session): preserve reasoning fields in rewrite_transcript (#3311)
rewrite_transcript (used by /retry, /undo, /compress) was calling
append_message without reasoning, reasoning_details, or
codex_reasoning_items — permanently dropping them from SQLite.

Co-authored-by: alireza78a <alireza78.crypto@gmail.com>
2026-03-26 18:18:00 -07:00
Teknium
60fdb58ce4 fix(agent): update context compressor limits after fallback activation (#3305)
When _try_activate_fallback() switches to the fallback model, it
updates the agent's model/provider/client but never touches
self.context_compressor. The compressor keeps the primary model's
context_length and threshold_tokens, so compression decisions use
wrong limits — a 200K primary → 32K fallback still uses 200K-based
thresholds, causing oversized sessions to overflow the fallback.

Update the compressor's model, credentials, context_length, and
threshold_tokens after fallback activation using get_model_context_length()
for the new model.

Cherry-picked from PR #3202 by binhnt92.

Co-authored-by: binhnt92 <binhnt.ht.92@gmail.com>
2026-03-26 18:10:50 -07:00
Teknium
18d28c63a7 fix: add explicit hermes-api-server toolset for API server platform (#3304)
The API server adapter was creating agents without specifying
enabled_toolsets, causing ALL tools to load — including clarify,
send_message, and text_to_speech which don't work without interactive
callbacks or gateway dispatch.

Changes:
- toolsets.py: Add hermes-api-server toolset (core tools minus clarify,
  send_message, text_to_speech)
- api_server.py: Resolve toolsets from config.yaml platform_toolsets
  via _get_platform_tools() — same path as all other gateway platforms.
  Falls back to hermes-api-server default when no override configured.
- tools_config.py: Add api_server to PLATFORMS dict so users can
  customize via 'hermes tools' or platform_toolsets.api_server in
  config.yaml
- 12 tests covering toolset definition, config resolution, and
  user override

Reported by thatwolfieguy on Discord.
2026-03-26 18:02:26 -07:00
Teknium
3c57eaf744 fix: YAML boolean handling for tool_progress config (#3300)
YAML 1.1 parses bare `off` as boolean False, which is falsy in
Python's `or` chain and silently falls through to the 'all' default.
Users setting `display.tool_progress: off` in config.yaml saw no
effect — tool progress stayed on.

Normalise False → 'off' before the or chain in both affected paths:
- gateway/run.py _run_agent() tool progress reader
- cli.py HermesCLI.__init__() tool_progress_mode

Reported by @gibbsoft in #2859. Closes #2859.
2026-03-26 17:58:50 -07:00
Teknium
2d232c9991 feat(cli): configurable busy input mode + fix /queue always working (#3298)
Two changes:

1. Fix /queue command: remove the _agent_running guard that rejected
   /queue after the agent finished. The prompt was deferred in
   _pending_input until the agent completed, then the handler checked
   _agent_running (now False) and rejected it. /queue now always queues
   regardless of timing.

2. Add display.busy_input_mode config (CLI-only):
   - 'interrupt' (default): Enter while busy interrupts the current run
     (preserves existing behavior)
   - 'queue': Enter while busy queues the message for the next turn,
     with a 'Queued for the next turn: ...' confirmation
   Ctrl+C always interrupts regardless of this setting.

Salvaged from PR #3037 by StefanoChiodino. Key differences:
- Default is 'interrupt' (preserves existing behavior) not 'queue'
- No config version bump (unnecessary for new key in existing section)
- Simpler normalization (no alias map)
- /queue fix is simpler: just remove the guard instead of intercepting
  commands during busy state
2026-03-26 17:58:40 -07:00
Teknium
0375b2a0d7 fix(gateway): silence background agent terminal output (#3297)
* fix(gateway): silence flush agent terminal output

quiet_mode=True only suppresses AIAgent init messages.
Tool call output still leaks to the terminal through
_safe_print → _print_fn during session reset/expiry.

Since #2670 injected live memory state into the flush prompt,
the flush agent now reliably calls memory tools — making the
output leak noticeable for the first time.

Set _print_fn to a no-op so the background flush is fully silent.

* test(gateway): add test for flush agent terminal silence + fix dotenv mock

- Add TestFlushAgentSilenced: verifies _print_fn is set to a no-op on
  the flush agent so tool output never leaks to the terminal
- Fix pre-existing test failures: replace patch('run_agent.AIAgent')
  with sys.modules mock to avoid importing run_agent (requires openai)
- Add autouse _mock_dotenv fixture so all tests in this file run
  without the dotenv package installed

* fix(display): route KawaiiSpinner output through print_fn to fully silence flush agent

The previous fix set tmp_agent._print_fn = no-op on the flush agent but
spinner output and quiet-mode cute messages bypassed _print_fn entirely:
- KawaiiSpinner captured sys.stdout at __init__ and wrote directly to it
- quiet-mode tool results used builtin print() instead of _safe_print()

Add optional print_fn parameter to KawaiiSpinner.__init__; _write routes
through it when set. Pass self._print_fn to all spinner construction sites
in run_agent.py and change the quiet-mode cute message print to _safe_print.
The existing gateway fix (tmp_agent._print_fn = lambda) now propagates
correctly through both paths.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(gateway): silence hygiene and compression background agents

Two more background AIAgent instances in the gateway were created with
quiet_mode=True but without _print_fn = no-op, causing tool output to
leak to the terminal:
- _hyg_agent (in-turn hygiene memory agent)
- tmp_agent (_compress_context path)

Apply the same _print_fn no-op pattern used for the flush agent.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore(display): remove unused _last_flush_time from KawaiiSpinner

Attribute was set but never read; upstream already removed it.
Leftover from conflict resolution during rebase onto upstream/main.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Dilee <uzmpsk.dilekakbas@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 17:40:31 -07:00
Teknium
08fa326bb0 feat(gateway): deliver background review notifications to user chat (#3293)
The background memory/skill review (_spawn_background_review) runs
after the agent response when turn/iteration counters exceed their
thresholds. It saves memories and skills, then prints a summary like
'💾 Memory updated · User profile updated'. In CLI mode this goes to
the terminal via _safe_print. In gateway mode, _safe_print routes to
print() which goes to stdout — invisible to the user.

Add a background_review_callback attribute to AIAgent. When set, the
background review thread calls it with the summary string after saves
complete. The gateway wires this to adapter.send() via the same
run_coroutine_threadsafe bridge used by status_callback, delivering
the notification to the user's chat.
2026-03-26 17:38:24 -07:00
Teknium
bde45f5a2a fix(gateway): retry transient send failures and notify user on exhaustion (#3288)
When send() fails due to a network error (ConnectError, ReadTimeout, etc.),
the failure was silently logged and the user received no feedback — appearing
as a hang. In one reported case, a user waited 1+ hour for a response that
had already been generated but failed to deliver (#2910).

Adds _send_with_retry() to BasePlatformAdapter:
- Transient errors: retry up to 2x with exponential backoff + jitter
- On exhaustion: send delivery-failure notice so user knows to retry
- Permanent errors: fall back to plain-text version (preserves existing behavior)
- SendResult.retryable flag for platform-specific transient errors

All adapters benefit automatically via BasePlatformAdapter inheritance.

Cherry-picked from PR #3108 by Mibayy.

Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
2026-03-26 17:37:10 -07:00
Teknium
716e616d28 fix(tui): status bar duplicates and degrades during long sessions (#3291)
shutil.get_terminal_size() can return stale/fallback values on SSH that
differ from prompt_toolkit's actual terminal width. Fragments built for
the wrong width overflow and wrap onto a second line (wrap_lines=True
default), appearing as progressively degrading duplicates.

- Read width from get_app().output.get_size().columns when inside a
  prompt_toolkit TUI, falling back to shutil outside TUI context
- Add wrap_lines=False on the status bar Window as belt-and-suspenders
  guard against any future width mismatch

Closes #3130

Co-authored-by: Mibayy <Mibayy@users.noreply.github.com>
2026-03-26 17:33:11 -07:00
Teknium
bdccdd67a1 fix: OpenClaw migration overwrites defaults and setup wizard skips imported sections (#3282)
Two bugs caused the OpenClaw migration during first-time setup to be
ineffective, forcing users to reconfigure everything manually:

1. The setup wizard created config.yaml with all defaults BEFORE running
   the migration, then the migrator ran with overwrite=False. Every config
   setting was reported as a 'conflict' against the defaults and skipped.
   Fix: use overwrite=True during setup-time migration (safe because only
   defaults exist at that point). The hermes claw migrate CLI command
   still defaults to overwrite=False for post-setup use.

2. After migration, the full setup wizard ran all 5 sections unconditionally,
   forcing the user through model/terminal/agent/messaging/tools configuration
   even when those settings were just imported.
   Fix: add _get_section_config_summary() and _skip_configured_section()
   helpers. After migration, each section checks if it's already configured
   (API keys present, non-default values, platform tokens) and offers
   'Reconfigure? [y/N]' with default No. Unconfigured sections still run
   normally.

Reported by Dev Bredda on social media.
2026-03-26 16:29:38 -07:00
Teknium
148f46620f fix(matrix): add backoff for SyncError in sync loop (#3280)
When the homeserver returns an error response, matrix-nio parses it
as a SyncError return value rather than raising an exception. The sync
loop only had backoff in the except handler, so SyncError caused a
tight retry loop (~489 req/s) flooding logs and hammering the
homeserver. Check the return value and sleep 5s before retry.

Cherry-picked from PR #2937 by ticketclosed-wontfix.

Co-authored-by: ticketclosed-wontfix <ticketclosed-wontfix@users.noreply.github.com>
2026-03-26 16:19:58 -07:00
Robin Fernandes
e95965d76a Merge branch 'main' into rewbs/tool-use-charge-to-subscription 2026-03-26 16:18:28 -07:00
Robin Fernandes
95dc9aaa75 feat: add managed tool gateway and Nous subscription support
- add managed modal and gateway-backed tool integrations\n- improve CLI setup, auth, and configuration for subscriber flows\n- expand tests and docs for managed tool support
2026-03-26 16:17:58 -07:00
Teknium
6610c377ba fix(telegram): self-reschedule reconnect when start_polling fails (#3268)
After a Telegram 502, _handle_polling_network_error calls updater.stop()
then start_polling(). If start_polling() also raises, the old code logged
a warning and returned — but the comment 'The next network error will
trigger another attempt' was wrong. The updater loop is dead after stop(),
so no further error callbacks ever fire. The gateway stays alive but
permanently deaf to messages.

Fix: when start_polling() fails in the except branch, schedule a new
_handle_polling_network_error task to continue the exponential backoff
retry chain. The task is tracked in _background_tasks (preventing GC).
Guarded by has_fatal_error to avoid spurious retries during shutdown.

Closes #3173.
Salvaged from PR #3177 by Mibayy.
2026-03-26 15:34:33 -07:00
Teknium
e5d14445ef fix(security): restrict subagent toolsets to parent's enabled set (#3269)
The delegate_task tool accepts a toolsets parameter directly from the
LLM's function call arguments. When provided, these toolsets are passed
through _strip_blocked_tools but never intersected with the parent
agent's enabled_toolsets. A model can request toolsets the parent does
not have (e.g., web, browser, rl), granting the subagent tools that
were explicitly disabled for the parent.

Intersect LLM-requested toolsets with the parent's enabled set before
applying the blocked-tool filter, so subagents can only receive a
subset of the parent's tools.

Co-authored-by: dieutx <dangtc94@gmail.com>
2026-03-26 14:50:26 -07:00
Teknium
72250b5f62 feat: config-gated /verbose command for messaging gateway (#3262)
* feat: config-gated /verbose command for messaging gateway

Add gateway_config_gate field to CommandDef, allowing cli_only commands
to be conditionally available in the gateway based on a config value.

- CommandDef gains gateway_config_gate: str | None — a config dotpath
  that, when truthy, overrides cli_only for gateway surfaces
- /verbose uses gateway_config_gate='display.tool_progress_command'
- Default is off (cli_only behavior preserved)
- When enabled, /verbose cycles tool_progress mode (off/new/all/verbose)
  in the gateway, saving to config.yaml — same cycle as the CLI
- Gateway helpers (help, telegram menus, slack mapping) dynamically
  check config to include/exclude config-gated commands
- GATEWAY_KNOWN_COMMANDS always includes config-gated commands so
  the gateway recognizes them and can respond appropriately
- Handles YAML 1.1 bool coercion (bare 'off' parses as False)
- 8 new tests for the config gate mechanism + gateway handler

* docs: document gateway_config_gate and /verbose messaging support

- AGENTS.md: add gateway_config_gate to CommandDef fields
- slash-commands.md: note /verbose can be enabled for messaging, update Notes
- configuration.md: add tool_progress_command to display section + usage note
- cli.md: cross-link to config docs for messaging enablement
- messaging/index.md: show tool_progress_command in config snippet
- plugins.md: add gateway_config_gate to register_command parameter table
2026-03-26 14:41:04 -07:00
Teknium
243ee67529 fix: store asyncio task references to prevent GC mid-execution (#3267)
Python's asyncio event loop holds only weak references to tasks.
Without a strong reference, the garbage collector can destroy a task
while it's awaiting I/O — silently dropping messages. Python 3.12+
made this more aggressive.

Audit of all gateway platform adapters found 6 untracked create_task
calls across 6 files:

Per-message tasks (tracked via _background_tasks set from base class):
- gateway/platforms/webhook.py: handle_message task
- gateway/platforms/sms.py: handle_message task
- gateway/platforms/signal.py: SSE response aclose task

Long-running infrastructure tasks (stored in named instance vars):
- gateway/platforms/slack.py: Socket Mode handler (_socket_mode_task)
- gateway/platforms/discord.py: bot client (_bot_task)
- gateway/platforms/whatsapp.py: message poll loop (_poll_task, 2 sites)

All other adapters (telegram, mattermost, matrix, email, homeassistant,
dingtalk) already tracked their tasks correctly.

Salvaged from PR #3160 by memosr — expanded from 1 file to 6.
2026-03-26 14:36:24 -07:00
Teknium
3a86328847 fix(gateway): add request timeouts to HA, Email, Mattermost, SMS adapters (#3258)
Add timeout=30 to all bare ClientSession, IMAP4_SSL, smtplib.SMTP, and
ws_connect calls that previously had no timeout, preventing indefinite
hangs when an external server is slow or unresponsive.

Adapters hardened:
- HomeAssistant: REST + WS session creation, ws_connect handshake
- Email: all IMAP4_SSL (x2) and smtplib.SMTP (x3) calls
- Mattermost: session creation, _api_get, _api_post, _upload_file (60s)
- SMS: session creation in connect() + fallback session in send()

Salvaged from PRs #3161, #3168, #3170 (memosr) and #3201 (binhnt92).
SMS fallback ClientSession on send() also patched (missed in #3201).

Co-authored-by: memosr <memosr@users.noreply.github.com>
Co-authored-by: nguyen binh <binhnt92@users.noreply.github.com>
2026-03-26 14:36:07 -07:00
Teknium
db241ae6ce feat(sessions): add --source flag for third-party session isolation (#3255)
When third-party tools (Paperclip orchestrator, etc.) spawn hermes chat
as a subprocess, their sessions pollute user session history and search.

- hermes chat --source <tag> (also HERMES_SESSION_SOURCE env var)
- exclude_sources parameter on list_sessions_rich() and search_messages()
- Sessions with source=tool hidden from sessions list/browse/search
- Third-party adapters pass --source tool to isolate agent sessions

Cherry-picked from PR #3208 by HenkDz.

Co-authored-by: Henkey <noonou7@gmail.com>
2026-03-26 14:35:31 -07:00
Teknium
41ee207a5e fix: catch KeyboardInterrupt in exit cleanup handlers (#3257)
except Exception does not catch KeyboardInterrupt (inherits from
BaseException). A second Ctrl+C during exit cleanup aborts pending
writes — Honcho observations dropped, SQLite sessions left unclosed,
cron job sessions never marked ended.

Changed to except (Exception, KeyboardInterrupt) at all five sites:
- cli.py: honcho.shutdown() and end_session() in finally exit block
- run_agent.py: _flush_honcho_on_exit atexit handler
- cron/scheduler.py: end_session() and close() in job finally block

Tests exercise the actual production code paths and confirm
KeyboardInterrupt propagates without the fix.

Co-authored-by: dieutx <dangtc94@gmail.com>
2026-03-26 14:34:31 -07:00
Teknium
e9e7fb0683 fix(gateway): track background task references in GatewayRunner (#3254)
Asyncio tasks created with create_task() but never stored can be
garbage collected mid-execution. Add self._background_tasks set to
hold references, with add_done_callback cleanup. Tracks:
- /background command task
- session-reset memory flush task
- session-resume memory flush task
Cancel all pending tasks in stop().

Update test fixtures that construct GatewayRunner via object.__new__()
to include the new _background_tasks attribute.

Cherry-picked from PR #3167 by memosr. The original PR also deleted
the DM topic auto-skill loading code — that deletion was excluded
from this salvage as it removes a shipped feature (#2598).

Co-authored-by: memosr.eth <96793918+memosr@users.noreply.github.com>
2026-03-26 14:33:48 -07:00
Teknium
76ed15dd4d fix(security): normalize input before dangerous command detection (#3260)
detect_dangerous_command() ran regex patterns against raw command strings
without normalization, allowing bypass via Unicode fullwidth chars,
ANSI escape codes, null bytes, and 8-bit C1 controls.

Adds _normalize_command_for_detection() that:
- Strips ANSI escapes using the full ECMA-48 strip_ansi() from
  tools/ansi_strip (CSI, OSC, DCS, 8-bit C1, nF sequences)
- Removes null bytes
- Normalizes Unicode via NFKC (fullwidth Latin → ASCII, etc.)

Includes 12 regression tests covering fullwidth, ANSI, C1, null byte,
and combined obfuscation bypasses.

Salvaged from PR #3089 by thakoreh — improved ANSI stripping to use
existing comprehensive strip_ansi() instead of a weaker hand-rolled
regex, and added test coverage.

Co-authored-by: Hiren <hiren.thakore58@gmail.com>
2026-03-26 14:33:18 -07:00
Teknium
a8e02c7d49 fix: align Nous Portal model slugs with OpenRouter naming (#3253)
Nous Portal now passes through OpenRouter model names and routes from
there. Update the static fallback model list and auxiliary client default
to use OpenRouter-format slugs (provider/model) instead of bare names.

- _PROVIDER_MODELS['nous']: full OpenRouter catalog
- _NOUS_MODEL: google/gemini-3-flash-preview (was gemini-3-flash)
- Updated 4 test assertions for the new default model name
2026-03-26 13:49:43 -07:00
Teknium
b81d49dc45 fix(state): SQLite concurrency hardening + session transcript integrity (#3249)
* fix(session-db): survive CLI/gateway concurrent write contention

Closes #3139

Three layered fixes for the scenario where CLI and gateway write to
state.db concurrently, causing create_session() to fail with
'database is locked' and permanently disabling session_search on the
gateway side.

1. Increase SQLite connection timeout: 10s -> 30s
   hermes_state.py: longer window for the WAL writer to finish a batch
   flush before the other process gives up entirely.

2. INSERT OR IGNORE in create_session
   hermes_state.py: prevents IntegrityError on duplicate session IDs
   (e.g. gateway restarts while CLI session is still alive).

3. Don't null out _session_db on create_session failure  (main fix)
   run_agent.py: a transient lock at agent startup must not permanently
   disable session_search for the lifetime of that agent instance.
   _session_db now stays alive so subsequent flushes and searches work
   once the lock clears.

4. New ensure_session() helper + call it during flush
   hermes_state.py: INSERT OR IGNORE for a minimal session row.
   run_agent.py _flush_messages_to_session_db: calls ensure_session()
   before appending messages, so the FK constraint is satisfied even
   when create_session() failed at startup. No-op when the row exists.

* fix(state): release lock between context queries in search_messages

The context-window queries (one per FTS5 match) were running inside
the same lock acquisition as the primary FTS5 query, holding the lock
for O(N) sequential SQLite round-trips. Move per-match context fetches
outside the outer lock block so each acquires the lock independently,
keeping critical sections short and allowing other threads to interleave.

* fix(session): prefer longer source in load_transcript to prevent legacy truncation

When a long-lived session pre-dates SQLite storage (e.g. sessions
created before the DB layer was introduced, or after a clean
deployment that reset the DB), _flush_messages_to_session_db only
writes the *new* messages from the current turn to SQLite — it skips
messages already present in conversation_history, assuming they are
already persisted.

That assumption fails for legacy JSONL-only sessions:

  Turn N (first after DB migration):
    load_transcript(id)       → SQLite: 0  → falls back to JSONL: 994 ✓
    _flush_messages_to_session_db: skip first 994, write 2 new → SQLite: 2

  Turn N+1:
    load_transcript(id)       → SQLite: 2  → returns immediately ✗
    Agent sees 2 messages of history instead of 996

The same pattern causes the reported symptom: session JSON truncated
to 4 messages (_save_session_log writes agent.messages which only has
2 history + 2 new = 4).

Fix: always load both sources and return whichever is longer.  For a
fully-migrated session SQLite will always be ≥ JSONL, so there is no
regression.  For a legacy session that hasn't been bootstrapped yet,
JSONL wins and the full history is restored.

Closes #3212

* test: add load_transcript source preference tests for #3212

Covers: JSONL longer returns JSONL, SQLite longer returns SQLite,
SQLite empty falls back to JSONL, both empty returns empty, equal
length prefers SQLite (richer reasoning fields).

---------

Co-authored-by: Mibayy <mibayy@hermes.ai>
Co-authored-by: kewe63 <kewe.3217@gmail.com>
Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
2026-03-26 13:47:14 -07:00
Teknium
3a7907b278 fix(security): prevent zip-slip path traversal in self-update (#3250)
Validate each ZIP member's resolved path against the extraction directory
before extracting. A crafted ZIP with paths like ../../etc/passwd would
previously write outside the target directory.

Fixes #3075

Co-authored-by: Hiren <hiren.thakore58@gmail.com>
2026-03-26 13:40:37 -07:00
Teknium
b7b3294c4a fix(skills): preserve trust for skills-sh identifiers + reduce resolution churn (#3251)
* fix(skills): reduce skills.sh resolution churn and preserve trust for wrapped identifiers

- Accept common skills.sh prefix typos (skils-sh/, skils.sh/)
- Strip skills-sh/ prefix in _resolve_trust_level() so trusted repos
  stay trusted when installed through skills.sh
- Use resolved identifier (from bundle/meta) for scan_skill source
- Prefer tree search before root scan in _discover_identifier()
- Add _resolve_github_meta() consolidation for inspect flow

Cherry-picked from PR #3001 by kshitijk4poor.

* fix: restore candidate loop in SkillsShSource.fetch() for consistency

The cherry-picked PR only tried the first candidate identifier in
fetch() while inspect() (via _resolve_github_meta) tried all four.
This meant skills at repo/skills/path would be found by inspect but
missed by fetch, forcing it through the heavier _discover_identifier
flow. Restore the candidate loop so both paths behave identically.

Updated the test assertion to match.

---------

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
2026-03-26 13:40:21 -07:00
Teknium
62f8aa9b03 fix: MCP toolset resolution for runtime and config (#3252)
Gateway sessions had their own inline toolset resolution that only read
platform_toolsets from config, which never includes MCP server names.
MCP tools were discovered and registered but invisible to the model.

- Replace duplicated gateway toolset resolution in _run_agent() and
  _run_background_task() with calls to the shared _get_platform_tools()
- Extend _get_platform_tools() to include globally enabled MCP servers
  at runtime (include_default_mcp_servers=True), while config-editing
  flows use include_default_mcp_servers=False to avoid persisting
  implicit MCP defaults into platform_toolsets
- Add homeassistant to PLATFORMS dict (was missing, caused KeyError)
- Fix CLI entry point to use _get_platform_tools() as well, so MCP
  tools are visible in CLI mode too
- Remove redundant platform_key reassignment in _run_background_task

Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
2026-03-26 13:39:41 -07:00
Teknium
2c719f0701 fix(auth): migrate OAuth token refresh to platform.claude.com with fallback (#3246)
Anthropic migrated their OAuth infrastructure from console.anthropic.com
to platform.claude.com (Claude Code v2.1.81+). Update _refresh_oauth_token()
to try the new endpoint first, falling back to the old one for tokens
issued before the migration.

Also switches Content-Type from application/x-www-form-urlencoded to
application/json to match current Claude Code behavior.

Salvaged from PR #2741 by kshitijk4poor.
2026-03-26 13:26:56 -07:00
Teknium
c6fe75e99b fix(gateway): fingerprint full auth token in agent cache signature (#3247)
Previously _agent_config_signature() used only the first 8 characters of
the API key, which causes false cache hits for JWT/OAuth tokens that share
a common prefix (e.g. 'eyJhbGci'). This led to cross-account cache
collisions when switching OAuth accounts in multi-user gateway deployments.

Replace the 8-char prefix with a SHA-256 hash of the full key so the
signature is unique per credential while keeping secrets out of the
cache key.

Salvaged from PR #3117 by EmpireOperating.

Co-authored-by: EmpireOperating <EmpireOperating@users.noreply.github.com>
2026-03-26 13:19:43 -07:00
Teknium
36af1f3baf feat(telegram): Private Chat Topics with functional skill binding (#2598)
Salvages PR #3005 by web3blind. Cherry-picked onto current main with functional skill binding and docs added.

- DM topic creation via createForumTopic (Bot API 9.4, Feb 2026)
- Config-driven topics with thread_id persistence across restarts
- Session isolation via existing build_session_key thread_id support
- auto_skill field on MessageEvent for topic-skill bindings
- Gateway auto-loads bound skill on new sessions (same as /skill commands)
- Docs: full Private Chat Topics section in Telegram messaging guide
- 20 tests (17 original + 3 for auto_skill)

Closes #2598
Co-authored-by: web3blind <web3blind@users.noreply.github.com>
2026-03-26 02:04:11 -07:00
Teknium
43af094ae3 fix(agent): include tool tokens in preflight estimate, guard context probe persistence (#3164)
Two improvements salvaged from PR #2600 (paraddox):

1. Preflight compression now counts tool schema tokens alongside system
   prompt and messages.  With 50+ tools enabled, schemas can add 20-30K
   tokens that were previously invisible to the estimator, delaying
   compression until the API rejected the request.

2. Context probe persistence guard: when the agent steps down context
   tiers after a context-length error, only provider-confirmed numeric
   limits (parsed from the error message) are cached to disk.  Guessed
   fallback tiers from get_next_probe_tier() stay in-memory only,
   preventing wrong values from polluting the persistent cache.

Co-authored-by: paraddox <paraddox@users.noreply.github.com>
2026-03-26 02:00:50 -07:00
memosr.eth
9989e579da fix: add request timeouts to send_message_tool HTTP calls (#3162)
_send_discord(), _send_slack(), and _send_twilio() all created
aiohttp.ClientSession() without a timeout, leaving HTTP requests
able to hang indefinitely. _send_whatsapp() already used
aiohttp.ClientTimeout(total=30) — this fix applies the same
pattern consistently to all platform send functions.

- Add ClientTimeout(total=30) to _send_discord() ClientSession
- Add ClientTimeout(total=30) to _send_slack() ClientSession
- Add ClientTimeout(total=30) to _send_twilio() ClientSession
2026-03-26 01:58:11 -07:00
Teknium
4a56e2cd88 fix(display): show tool progress for substantive tools, not just "preparing"
_mute_post_response was set True whenever a turn had both content
and tool_calls, suppressing ALL subsequent _vprint output including
tool completion messages. This meant users only saw "preparing
search_files..." but never the result.

Now only mutes output when every tool in the batch is housekeeping
(memory, todo, skill_manage, session_search). Substantive tools
like search_files, read_file, write_file, terminal etc. keep their
completion messages visible.

Also fixes: run_conversation no longer raises on max retries
(returns graceful error dict instead), and cli.py wraps the agent
thread in try/except as a safety net.

Made-with: Cursor
2026-03-26 01:52:52 -07:00
Teknium
26bfdc22b4 feat: add godmode jailbreaking skill + docs (#3157) 2026-03-26 01:37:18 -07:00
Teknium
0426bb745f fix: reset default SOUL.md to baseline identity text (#3159)
The default SOUL.md seeded for new users should match
DEFAULT_AGENT_IDENTITY — a short, neutral identity paragraph.
The elaborate voice spec (avoid lists, dialogue examples, symbol
conventions) was never intended as the default for all users.

Users who want a custom persona write their own SOUL.md.
2026-03-26 01:34:27 -07:00
Teknium
c511e087e0 fix(agent): always prefer streaming for API calls to prevent hung subagents (#3120)
The non-streaming API call path (_interruptible_api_call) had no
wall-clock timeout. When providers keep connections alive with SSE
keep-alive pings but never deliver a response, httpx's inactivity
timeout never fires and the call hangs indefinitely.

Subagents always used the non-streaming path because they have no
stream consumers (quiet_mode=True). This caused delegate_task to
hang for 40+ minutes in production.

The streaming path has two layers of protection:
- httpx read timeout (60s, HERMES_STREAM_READ_TIMEOUT)
- Stale stream detection (90s, HERMES_STREAM_STALE_TIMEOUT)

Both work because streaming sends chunks continuously — a 90-second
gap between chunks genuinely means the connection is broken, even for
reasoning models that take minutes to complete.

Now run_conversation() always prefers the streaming path. The streaming
method falls back to non-streaming automatically if the provider
doesn't support it. Stream delta callbacks are no-ops when no
consumers are registered, so there's no overhead for subagents.
2026-03-26 01:22:31 -07:00
Teknium
c07c17f5f2 feat(agent): surface all retry/fallback/compression lifecycle events (#3153)
Add _emit_status() helper that sends lifecycle notifications to both
CLI (via _vprint force=True) and gateway (via status_callback). No
retry, fallback, or compression path is silent anymore.

Pathways surfaced:
- General retry backoff: was logger-only, now shows countdown
- Provider fallback: changed raw print() to _emit_status for gateway
- Rate limit eager fallback: new notification before switching
- Empty/malformed response fallback: new notification
- Client error fallback: new notification with HTTP status
- Max retries fallback: new notification before attempting
- Max retries giving up: upgraded from _vprint to _emit_status
- Compression retry (413 + context overflow): upgraded to _emit_status
- Compression success + retry: upgraded to _emit_status (2 instances)
2026-03-26 01:08:47 -07:00
Teknium
cbf195e806 chore: fix 154 f-strings, simplify getattr/URL patterns, remove dead code (#3119)
Three categories of cleanup, all zero-behavioral-change:

1. F-strings without placeholders (154 fixes across 29 files)
   - Converted f'...' to '...' where no {expression} was present
   - Heaviest files: run_agent.py (24), cli.py (20), honcho_integration/cli.py (34)

2. Simplify defensive patterns in run_agent.py
   - Added explicit self._is_anthropic_oauth = False in __init__ (before
     the api_mode branch that conditionally sets it)
   - Replaced 7x getattr(self, '_is_anthropic_oauth', False) with direct
     self._is_anthropic_oauth (attribute always initialized now)
   - Added _is_openrouter_url() and _is_anthropic_url() helper methods
   - Replaced 3 inline 'openrouter' in self._base_url_lower checks

3. Remove dead code in small files
   - hermes_cli/claw.py: removed unused 'total' computation
   - tools/fuzzy_match.py: removed unused strip_indent() function and
     pattern_stripped variable

Full test suite: 6184 passed, 0 failures
E2E PTY: banner clean, tool calls work, zero garbled ANSI
2026-03-25 19:47:58 -07:00
Teknium
08d3be0412 fix: graceful return on max retries instead of crashing thread
run_conversation raised the raw exception after exhausting retries,
which crashed the background thread in cli.py (unhandled exception
in Thread). Now returns a proper error result dict with failed=True
and persists the session, matching the pattern used by other error
paths (invalid responses, empty content, etc.).

Also wraps cli.py's run_agent thread function in try/except as a
safety net against any future unhandled exceptions from
run_conversation.

Made-with: Cursor
2026-03-25 19:00:39 -07:00
Teknium
156b50358b fix(reasoning): skip duplicate callback for <think>-extracted reasoning during streaming (#3116)
Local models (Ollama, LM Studio) embed reasoning in <think> tags in
delta.content. During streaming, _stream_delta() already displays these
blocks. Then _build_assistant_message() extracts them again and fires
reasoning_callback, causing duplicate display.

Track whether reasoning came from structured fields (reasoning_content)
vs <think> tag extraction. Only fire the callback for <think>-extracted
reasoning when stream_delta_callback is NOT active. Structured reasoning
always fires regardless.

Salvaged from PR #2076 by dusterbloom (Fix A only — Fix B was already
covered by PR #3013's _current_reasoning_callback centralization).
Closes #2069.
2026-03-25 18:57:18 -07:00
Teknium
59575d6a91 fix(gateway): recover from hung agents — /stop force-unlocks session (#3104)
When an agent thread hangs (truly blocked, never checks _interrupt_requested),
/stop now force-cleans _running_agents to unlock the session immediately.

Two changes:
- Early /stop intercept in the running-agent guard: bypasses normal command
  dispatch to force-interrupt and unlock the session. Follows the same pattern
  as the existing /new intercept.
- Sentinel /stop: force-cleans the sentinel instead of returning 'nothing to
  stop yet', so /stop during slow startup actually unlocks the session.

Follow-up improvements over original PR:
- Consolidated duplicate resolve_command imports into single early resolution
- Updated _handle_stop_command to also force-clean for consistency
- Removed 10-minute hard timeout on the executor (would kill legitimate
  long-running agent tasks; the /stop force-clean handles recovery)

Cherry-picked from Mibayy's PR #2498.

Co-authored-by: Mibayy <Mibayy@users.noreply.github.com>
2026-03-25 18:46:50 -07:00
Teknium
f46542b6c6 fix(cli): read root-level provider and base_url from config.yaml into model config (#3112)
When users write root-level provider and base_url in config.yaml
(instead of nesting under model:), these keys were never merged into
defaults['model']. The CLI reads them from CLI_CONFIG['model']['provider']
so root-level keys were silently ignored, causing fallback to OpenRouter.

Merge root-level provider and base_url into defaults['model'] after
handling the model key, so custom/local provider configs work regardless
of nesting.

Cherry-picked from PR #2283 by ygd58. Fixes #2281.
2026-03-25 18:38:32 -07:00
Teknium
5b29ff50f8 fix(logging): extract useful info from HTML error pages, dump debug on max retries
Three problems with API error debugging:

1. Terminal showed str(error)[:200] — raw HTML gibberish for Cloudflare
   502/503 pages instead of "502 Bad Gateway"
2. errors.log dumped the entire HTML page as unstructured text
3. _dump_api_request_debug was never called when retries exhausted,
   only for non-retryable 4xx errors

Adds _summarize_api_error() that extracts <title> and Cloudflare Ray ID
from HTML error pages, and falls back to SDK error body messages. Now
the terminal shows clean one-liners like:

  📝 Error: HTTP 502 — openrouter.ai | 502: Bad gateway — Ray 9e226...

Also calls _dump_api_request_debug on max_retries_exhausted so the full
request context is written to ~/.hermes/sessions/ for post-mortem.

Made-with: Cursor
2026-03-25 18:36:04 -07:00
Teknium
7258311710 fix: stop recursive AGENTS.md walk, load top-level only (#3110)
The recursive os.walk for AGENTS.md in subdirectories was undesired.
Only load AGENTS.md from the working directory root, matching the
behavior of CLAUDE.md and .cursorrules.
2026-03-25 18:30:45 -07:00
Teknium
910ec7eb38 chore: remove unused Hermes-native PKCE OAuth flow (#3107)
Remove run_hermes_oauth_login(), refresh_hermes_oauth_token(),
read_hermes_oauth_credentials(), _save_hermes_oauth_credentials(),
_generate_pkce(), and associated constants/credential file path.

This code was added in 63e88326 but never wired into any user-facing
flow (setup wizard, hermes model, or any CLI command). Neither
clawdbot/OpenClaw nor opencode implement PKCE for Anthropic — both
use setup-token or API keys. Dead code that was never tested in
production.

Also removes the credential resolution step that checked
~/.hermes/.anthropic_oauth.json (step 3 in resolve_anthropic_token),
renumbering remaining steps.
2026-03-25 18:29:47 -07:00
Teknium
4b45f65858 fix: update api_key in _try_activate_fallback for subagent auth (#3103)
When fallback activates (e.g. minimax → OpenRouter), self.provider,
self.base_url, self.api_mode, and self._client_kwargs were all updated
but self.api_key was not. delegate_tool.py reads parent_agent.api_key
to pass credentials to child agents, so subagents inherited the stale
pre-fallback key (e.g. a minimax key sent to OpenRouter), causing 401
Missing Authentication errors.

Add self.api_key = ... in both the anthropic_messages and
chat_completions branches of _try_activate_fallback().
2026-03-25 18:23:03 -07:00
Teknium
b374f52063 fix(session): clear compressor summary and turn counter on /clear and /new (#3102)
reset_session_state() was missing two fields added after it was written:
- _user_turn_count: kept accumulating across sessions, affecting
  flush_min_turns guard behavior
- context_compressor._previous_summary: old session's compression
  summary leaked into new session's iterative compression

Cherry-picked from PR #2640 by dusterbloom. Closes #2635.
2026-03-25 18:22:21 -07:00
Teknium
bd43a43f07 fix(cli): handle EOFError in sessions delete/prune confirmation prompts (#3101)
sessions delete and prune call input() for confirmation without
catching EOFError. When stdin isn't a TTY (piped input, CI/CD, cron),
input() throws EOFError and the command crashes.

Extract a _confirm_prompt() helper that handles EOFError and
KeyboardInterrupt, defaulting to cancel. Both call sites now use it.

Salvaged from PR #2622 by dieutx (improved from duplicated try/except
to shared helper). Closes #2565.
2026-03-25 18:06:04 -07:00
Teknium
432ba3b709 fix: use sys.executable for pip in update commands to fix PEP 668 (#3099)
The update commands called bare 'pip' as fallback when uv wasn't found.
On modern Debian/Ubuntu enforcing PEP 668, this resolves to system pip
which refuses to install in an externally-managed environment.

Use sys.executable -m pip to ensure the venv's pip is used. Fixed in
both cmd_update and _update_via_zip (the PR only caught one instance).

Salvaged from PR #2655 by devorun. Fixes #2648.
2026-03-25 17:52:59 -07:00
Teknium
712cebc40f fix(logging): show HTTP status code and 400 body in API error output (#3096)
When an API call fails, the terminal output now includes the HTTP status
code in the header line and, for 400 errors, the response body from the
provider (truncated to 300 chars). Makes it much easier to diagnose
issues like invalid model names or malformed requests that were
previously hidden behind generic error messages.

Salvaged from PR #2646 by Mibayy. Fixes #2644.
2026-03-25 17:47:55 -07:00
Teknium
45f57c2012 feat(models): add glm-5-turbo to zai provider model list (#3095)
Cherry-picked from PR #2542 by ReqX. Adds glm-5-turbo to the direct
zai provider curated model list so /model zai:glm-5-turbo validates
correctly. The model was already in _OPENROUTER_UPSTREAM_MODELS but
missing from the direct provider list.
2026-03-25 17:42:25 -07:00
Teknium
41081d718c fix(cli): prevent update crash in non-TTY environments (#3094)
cmd_update calls input() unconditionally during config migration.
In headless environments (Telegram gateway, systemd), there's no TTY,
so input() throws EOFError and the update crashes.

Guard with sys.stdin.isatty(), default to skipping the migration
prompt when non-interactive.

Salvaged from PR #2850 by devorun. Closes #2848.
2026-03-25 17:34:20 -07:00
ctlst
281100e2df fix(agent): prevent AsyncOpenAI/httpx cross-loop deadlock in gateway mode (#2701)
In gateway mode, async tools (vision_analyze, web_extract, session_search)
deadlock because _run_async() spawns a thread with asyncio.run(), creating
a new event loop, but _get_cached_client() returns an AsyncOpenAI client
bound to a different loop. httpx.AsyncClient cannot work across event loop
boundaries, causing await client.chat.completions.create() to hang forever.

Fix: include the event loop identity in the async client cache key so each
loop gets its own AsyncOpenAI instance. Also fix session_search_tool.py
which had its own broken asyncio.run()-in-thread pattern — now uses the
centralized _run_async() bridge.
2026-03-25 17:31:56 -07:00
Teknium
0d7f739675 fix(setup): use explicit key mapping for returning-user menu dispatch instead of positional index (#3083)
Co-authored-by: ygd58 <buraysandro9@gmail.com>
2026-03-25 17:14:43 -07:00
Teknium
9783c9d5c1 refactor: remove /model slash command from CLI and gateway (#3080)
The /model command is removed from both the interactive CLI and
messenger gateway (Telegram/Discord/Slack/WhatsApp). Users can
still change models via 'hermes model' CLI subcommand or by
editing config.yaml directly.

Removed:
- CommandDef entry from COMMAND_REGISTRY
- CLI process_command() handler and model autocomplete logic
- Gateway _handle_model_command() and dispatch
- SlashCommandCompleter model_completer_provider parameter
- Two-stage Tab completion and ghost text for /model
- All /model-specific tests

Unaffected:
- /provider command (read-only, shows current model + providers)
- ACP adapter _cmd_model (separate system for VS Code/Zed/JetBrains)
- model_switch.py module (used by ACP)
- 'hermes model' CLI subcommand

Author: Teknium
2026-03-25 17:03:05 -07:00
Teknium
0cfc1f88a3 fix: add MCP tool name collision protection (#3077)
- Registry now warns when a tool name is overwritten by a different
  toolset (silent dict overwrite was the previous behavior)
- MCP tool registration checks for collisions with non-MCP (built-in)
  tools before registering. If an MCP tool's prefixed name matches an
  existing built-in, the MCP tool is skipped and a warning is logged.
  MCP-to-MCP collisions are allowed (last server wins).
- Both regular MCP tools and utility tools (resources/prompts) are
  guarded.
- Adds 5 tests covering: registry overwrite warning, same-toolset
  re-registration silence, built-in collision skip, normal registration,
  and MCP-to-MCP collision pass-through.

Reported by k_sze (KONG) — MiniMax MCP server's web_search tool could
theoretically shadow Hermes's built-in web_search if prefixing failed.
2026-03-25 16:52:04 -07:00
Teknium
3bc953a666 fix(security): bump dependencies to fix CVEs + regenerate uv.lock (#3073)
* fix(security): bump dependencies to fix 7 CVEs

Python (pyproject.toml):
- requests >=2.33.0: CVE-2026-25645
- PyJWT >=2.12.0: CVE-2026-32597

Transitive Python CVEs (require lock file or upstream fix):
- cbor2 5.8.0: CVE-2026-26209 (via modal)
- pygments 2.19.2: CVE-2026-4539 (via rich)
- pynacl 1.5.0: CVE-2025-69277 (via discord.py)

NPM (package-lock.json via npm audit fix):
- basic-ftp: CRITICAL path traversal (GHSA-5rq4-664w-9x2c)
- fast-xml-parser: HIGH stack overflow + entity expansion
- undici: HIGH CRLF injection, memory DoS, smuggling
- minimatch: HIGH ReDoS

Remaining: lodash moderate prototype pollution in @appium/logger
(upstream fix needed).

* chore: regenerate uv.lock for CVE version bumps

uv lock after requests >=2.33.0 and PyJWT >=2.12.0 minimum bumps.
Without this, uv sync --locked fails because the old lock pinned
requests==2.32.5 and pyjwt==2.11.0 (below new minimums).

---------

Co-authored-by: 0xbyt4 <35742124+0xbyt4@users.noreply.github.com>
2026-03-25 16:43:21 -07:00
Teknium
bd6b138e85 fix: clean up HTML error messages in CLI display (#3069)
When API calls fail with HTML error pages (e.g., CloudFlare errors), the CLI
was dumping raw HTML content to users like:
  📝 Error: <!DOCTYPE html><!--[if lt IE 7]> <html class="no-js ie6...

This commit adds a _clean_error_message() utility method that:
- Detects HTML content and replaces with user-friendly message
- Collapses multiline errors to single line
- Truncates overly long errors (>150 chars)
- Preserves meaningful error text for regular errors

Applied to all user-facing error displays:
- API call failure messages (line 6314)
- Interrupt error responses (line 6324)
- Invalid response error messages (line 6000)

Before: 📝 Error: <!DOCTYPE html><!--[if lt IE 7]>...
After:  📝 Error: Service temporarily unavailable (HTML error page returned)
2026-03-25 16:39:22 -07:00
Teknium
9792bde31a fix(agent): count compression restarts toward retry limit (#3070)
When context overflow triggers compression, the outer retry loop
restarts via continue without incrementing retry_count. If compression
reduces messages but not enough to fit the context window, this creates
an infinite loop burning API credits: API call → overflow → compress →
retry → overflow → compress → ...

Increment retry_count on compression restarts so the loop exits after
max_retries total attempts.

Cherry-picked from PR #2766 by dieutx.
2026-03-25 16:35:17 -07:00
Teknium
9d1e13019e fix(cli): prevent TypeError on startup when base_url is None (#3068)
Description
This PR fixes the startup crash introduced in v0.4.0 where `self.base_url` being `None` throws a `TypeError`.

Root Cause:
At `cli.py:1108`, a membership check (`"openrouter.ai" in self.base_url`) is performed. If a user's config doesn't explicitly set a `base_url` (meaning it's `None`), Python raises a `TypeError: argument of type 'NoneType' is not iterable`, causing the entire CLI to crash on boot.

Fix:
Added a simple truthiness guard (`if self.base_url and ...`) to ensure the membership check only occurs if `base_url` is a valid string.

Closes #2842

Co-authored-by: devorun <130918800+devorun@users.noreply.github.com>
2026-03-25 16:21:00 -07:00
Teknium
37cabc47d3 test(skills): add regression tests for null metadata frontmatter
Covers the case where a SKILL.md has `metadata:` (null) or
`metadata.hermes:` (null), which caused an AttributeError
before the fix in d218cf91.

Made-with: Cursor
2026-03-25 16:09:27 -07:00
Teknium
f7f30aaab9 fix(streaming): detect and kill stale SSE connections
Adds a wall-clock stale stream detector (HERMES_STREAM_STALE_TIMEOUT,
default 90s) that force-closes the httpx client when no real chunks
arrive, even if SSE keep-alive pings keep the socket alive. Works
with the existing streaming retry loop to recover via fresh connection.

Made-with: Cursor
2026-03-25 16:07:05 -07:00
Teknium
d218cf9118 fix(skills): handle null metadata in skill frontmatter
frontmatter.get("metadata", {}) returns None (not {}) when the
key exists with a null value, crashing build_skills_system_prompt
with AttributeError: 'NoneType' object has no attribute 'get'.

Made-with: Cursor
2026-03-25 16:06:15 -07:00
Teknium
841401f588 feat(cli): preserve user input on multiline paste (#3065)
When pasting 5+ lines, the CLI previously replaced the entire input
buffer with a file reference placeholder. If the user had already typed
a question, it was lost.

Fix: move paste collapsing into handle_paste (BracketedPaste handler)
so only the pasted content is saved to file. The placeholder is inserted
at the cursor position, preserving existing buffer text.

Also fixes:
- Multi-ref expansion on submit (re.sub instead of re.match) so
  multiple paste blocks and surrounding text are all preserved
- Double-collapse prevention via _paste_just_collapsed flag
- Consistent Unicode arrow character across all paste paths

Salvaged from PR #2607 by crazywriter1 (option B: core fix only,
without keybinding overrides for solid-object navigation/deletion).
2026-03-25 16:00:36 -07:00
Teknium
77bcaba2d7 refactor: consolidate get_hermes_home() and parse_reasoning_effort() (#3062)
Centralizes two widely-duplicated patterns into hermes_constants.py:

1. get_hermes_home() — Path resolution for ~/.hermes (HERMES_HOME env var)
   - Was copy-pasted inline across 30+ files as:
     Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
   - Now defined once in hermes_constants.py (zero-dependency module)
   - hermes_cli/config.py re-exports it for backward compatibility
   - Removed local wrapper functions in honcho_integration/client.py,
     tools/website_policy.py, tools/tirith_security.py, hermes_cli/uninstall.py

2. parse_reasoning_effort() — Reasoning effort string validation
   - Was copy-pasted in cli.py, gateway/run.py, cron/scheduler.py
   - Same validation logic: check against (xhigh, high, medium, low, minimal, none)
   - Now defined once in hermes_constants.py, called from all 3 locations
   - Warning log for unknown values kept at call sites (context-specific)

31 files changed, net +31 lines (125 insertions, 94 deletions)
Full test suite: 6179 passed, 0 failed
2026-03-25 15:54:28 -07:00
Teknium
e0cfc089da fix(gateway/slack): send progress messages to correct thread (#3063)
Co-authored-by: Jneeee <jneeee@outlook.com>
2026-03-25 15:51:15 -07:00
Siddharth Balyan
7126524e8d remove config drift check for nix (#3061) 2026-03-25 15:46:29 -07:00
Teknium
f83c27e26f feat(skills): add Docker management skill to optional-skills (#3060)
Docker CLI reference covering containers, images, Compose, volumes,
networks, troubleshooting, and Dockerfile optimization. Placed in
optional-skills/devops/ since it's a documentation-only skill with
no external dependencies beyond Docker CLI.

Based on PR #3032 by @sprmn24. Moved from skills/ to optional-skills/
and trimmed the description to be concise.

Co-authored-by: sprmn24 <sprmn24@users.noreply.github.com>
2026-03-25 15:32:25 -07:00
Teknium
ab548a9b5e fix(security): add SSRF protection to browser_navigate (#3058)
* fix(security): add SSRF protection to browser_navigate

browser_navigate() only checked the website blocklist policy but did
not call is_safe_url() to block private/internal addresses. This
allowed the agent to navigate to localhost, cloud metadata endpoints
(169.254.169.254), and private network IPs via the browser.

web_tools and vision_tools already had this check. Added the same
is_safe_url() pre-flight validation before the blocklist check in
browser_navigate().

* fix: move SSRF import to module level, fix policy test mock

Move is_safe_url import to module level so it can be monkeypatched
in tests. Update test_browser_navigate_returns_policy_block to mock
_is_safe_url so the SSRF check passes and the policy check is reached.

* fix(security): harden browser SSRF protection

Follow-up to cherry-picked PR #3041:

1. Fail-closed fallback: if url_safety module can't import, block all
   URLs instead of allowing all. Security guards should never fail-open.

2. Post-redirect SSRF check: after navigation, verify the final URL
   isn't a private/internal address. If a public URL redirected to
   169.254.169.254 or localhost, navigate to about:blank and return
   an error — prevents the model from reading internal content via
   subsequent browser_snapshot calls.

---------

Co-authored-by: 0xbyt4 <35742124+0xbyt4@users.noreply.github.com>
2026-03-25 15:16:57 -07:00
Teknium
73e66eb3c0 fix(gateway): thread-safe SessionStore — protect _entries with threading.Lock (#3052)
SessionStore._entries was read and mutated without synchronisation,
causing race conditions when multiple platforms (Telegram + Discord)
received messages concurrently on the same gateway process. Two threads
could simultaneously pass the session_key check and create duplicate
sessions for the same user, splitting conversation history.

- Added threading.Lock to protect all _entries / _loaded mutations
- Split _ensure_loaded() into public wrapper + internal _ensure_loaded_locked()
- SQLite I/O is performed outside the lock to avoid blocking during
  slow disk operations
- _save() stays inside the lock since it reads _entries for serialization

Cherry-picked from PR #3012 by Kewe63. Removed unrelated changes
(delivery.py case-sensitivity, hermes_state.py schema tracking) and
stripped the UTC timezone switch to keep the change focused on threading.

Co-authored-by: Kewe63 <Kewe63@users.noreply.github.com>
2026-03-25 15:15:37 -07:00
Teknium
14cf2d85ca fix(display): guard isatty() against closed streams via _is_tty property (#3056)
In gateway/Telegram mode, the stdout fd can be closed by executor
thread cleanup. KawaiiSpinner.stop() called isatty() on the closed fd,
raising ValueError and masking the original error.

Instead of a point fix, add a _is_tty property that centralizes the
closed-stream guard — both _animate() and stop() now use it. Follows
the same (ValueError, OSError) pattern already in _write().

Inspired by PR #2632 by bot-deo88.
2026-03-25 15:15:15 -07:00
Teknium
8bb1d15da4 chore: remove ~100 unused imports across 55 files (#3016)
Automated cleanup via pyflakes + autoflake with manual review.

Changes:
- Removed unused stdlib imports (os, sys, json, pathlib.Path, etc.)
- Removed unused typing imports (List, Dict, Any, Optional, Tuple, Set, etc.)
- Removed unused internal imports (hermes_cli.auth, hermes_cli.config, etc.)
- Fixed cli.py: removed 8 shadowed banner imports (imported from hermes_cli.banner
  then immediately redefined locally — only build_welcome_banner is actually used)
- Added noqa comments to imports that appear unused but serve a purpose:
  - Re-exports (gateway/session.py SessionResetPolicy, tools/terminal_tool.py
    is_interrupted/_interrupt_event)
  - SDK presence checks in try/except (daytona, fal_client, discord)
  - Test mock targets (auxiliary_client.py Path, mcp_config.py get_hermes_home)

Zero behavioral changes. Full test suite passes (6162/6162, 2 pre-existing
streaming test failures unrelated to this change).
2026-03-25 15:02:03 -07:00
Teknium
861624d4e9 fix(cli): refresh TUI before background task output to prevent status bar overlap (#3048)
When a background task (/bg command) prints its output while the main agent
is processing with the thinking spinner visible, the status bar could render
on the same row as the spinner, causing visual overlap.

This fix adds an explicit app.invalidate() call with a brief pause before
printing background task output, ensuring the TUI layout is in a consistent
state before the output is written.

Changes:
- Add TUI refresh before success output in _handle_background_command
- Add TUI refresh before error output in the exception handler
- Add tests for the refresh behavior

Closes #2718

Co-authored-by: Bartok9 <bartokmagic@proton.me>
2026-03-25 15:00:33 -07:00
Teknium
e4033b2baf fix(cli): catch KeyboardInterrupt during flush_memories on exit (#3025)
KeyboardInterrupt inherits from BaseException, not Exception, so the
except Exception: clauses wrapping flush_memories() on exit paths
silently skipped the flush when the user pressed Ctrl+C. This could
lose conversation memory.

Change both call sites to except (Exception, KeyboardInterrupt): so
the memory flush is attempted even during interrupt.

Salvaged from PR #2855 by RufusLin (dropped unrelated bundled changes).
2026-03-25 12:47:51 -07:00
Teknium
94e3d9adbf fix(agent): restore safe non-streaming fallback after stream failures (#3020)
After streaming retries are exhausted on transient errors, fall back to
non-streaming instead of propagating the error. Also fall back for any
other pre-delivery stream error (not just 'streaming not supported').

Added user-facing message when streaming is not supported by a model/
provider, directing users to set display.streaming: false in config.yaml
to avoid the fallback delay.

Cherry-picked from PR #3008 by kshitijk4poor. Added UX message for
streaming-not-supported detection.

Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
2026-03-25 12:46:04 -07:00
Teknium
0dcd6ab2f2 fix: status bar shows 26K instead of 260K for token counts with trailing zeros (#3024)
format_token_count_compact() used unconditional rstrip("0") to clean up
decimal trailing zeros (e.g. "1.50" → "1.5"), but this also stripped
meaningful trailing zeros from whole numbers ("260" → "26", "100" → "1").
Guard the strip behind a decimal-point check.

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
2026-03-25 12:45:58 -07:00
Siddharth Balyan
b6461903ff feat: nix flake — uv2nix build, NixOS module, persistent container mode (#20)
* feat: nix flake, uv2nix build, dev shell and home manager

* fixed nix run, updated docs for setup

* feat(nix): NixOS module with persistent container mode, managed guards, checks

- Replace homeModules.nix with nixosModules.nix (two deployment modes)
- Mode A (native): hardened systemd service with ProtectSystem=strict
- Mode B (container): persistent Ubuntu container with /nix/store bind-mount,
  identity-hash-based recreation, GC root protection, symlink-based updates
- Add HERMES_MANAGED guards blocking CLI config mutation (config set, setup,
  gateway install/uninstall) when running under NixOS module
- Add nix/checks.nix with build-time verification (binary, CLI, managed guard)
- Remove container.nix (no Nix-built OCI image; pulls ubuntu:24.04 at runtime)
- Simplify packages.nix (drop fetchFromGitHub submodules, PYTHONPATH wrappers)
- Rewrite docs/nixos-setup.md with full options reference, container
  architecture, secrets management, and troubleshooting guide

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Update config.py

* feat(nix): add CI workflow and enhanced build checks
- GitHub Actions workflow for nix flake check + build on linux/macOS
- Entry point sync check to catch pyproject.toml drift
- Expanded managed-guard check to cover config edit
- Wrap hermes-acp binary in Nix package
- Fix Path type mismatch in is_managed()

* Update MCP server package name; bundled skills support

* fix reading .env. instead have container user a common mounted .env file

* feat(nix): container entrypoint with privilege drop and sudo provisioning

Container was running as non-root via --user, which broke apt/pip installs
and caused crashes when $HOME didn't exist. Replace --user with a Nix-built
entrypoint script that provisions the hermes user, sudo (NOPASSWD), and
/home/hermes inside the container on first boot, then drops privileges via
setpriv. Writable layer persists so setup only runs once.

Also expands MCP server options to support HTTP transport and sampling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix group and user creation in container mode

* feat(nix): persistent /home/hermes and MESSAGING_CWD in container mode

Container mode now bind-mounts ${stateDir}/home to /home/hermes so the
agent's home directory survives container recreation. Previously it lived
in the writable layer and was lost on image/volume/options changes.

Also passes MESSAGING_CWD to the container so the agent finds its
workspace and documents, matching native mode behavior.

Other changes:
- Extract containerDataDir/containerHomeDir bindings (no more magic strings)
- Fix entrypoint chown to run unconditionally (volume mounts always exist)
- Add schema field to container identity hash for auto-recreation
- Add idempotency test (Scenario G) to config-roundtrip check

* docs: add Nix & NixOS setup guide to docs site

Add comprehensive Nix documentation to the Docusaurus site at
website/docs/getting-started/nix-setup.md, covering nix run/profile
install, NixOS module (native + container modes), declarative settings,
secrets management, MCP servers, managed mode, container architecture,
dev shell, flake checks, and full options reference.

- Register nix-setup in sidebar after installation page
- Add Nix callout tip to installation.md linking to new guide
- Add canonical version pointer in docs/nixos-setup.md

* docs: remove docs/nixos-setup.md, consolidate into website docs

Backfill missing details (restart/restartSec in full example,
gateway.pid, 0750 permissions, docker inspect commands) into
the canonical website/docs/getting-started/nix-setup.md and
delete the old standalone file.

* fix(nix): add compression.protect_last_n and target_ratio to config-keys.json

New keys were added to DEFAULT_CONFIG on main, causing the
config-drift check to fail in CI.

* fix(nix): skip checks on aarch64-darwin (onnxruntime wheel missing)

The full Python venv includes onnxruntime (via faster-whisper/STT)
which lacks a compatible uv2nix wheel on aarch64-darwin. Gate all
checks behind stdenv.hostPlatform.isLinux. The package and devShell
still evaluate on macOS.

* fix(nix): skip flake check and build on macOS CI

onnxruntime (transitive dep via faster-whisper) lacks a compatible
uv2nix wheel on aarch64-darwin. Run full checks and build on Linux
only; macOS CI verifies the flake evaluates without building.

* fix(nix): preserve container writable layer across nixos-rebuild

The container identity hash included the entrypoint's Nix store path,
which changes on every nixpkgs update (due to runtimeShell/stdenv
input-addressing). This caused false-positive identity mismatches,
triggering container recreation and losing the persistent writable layer.

- Use stable symlink (current-entrypoint) like current-package already does
- Remove entrypoint from identity hash (only image/volumes/options matter)
- Add GC root for entrypoint so nix-collect-garbage doesn't break it
- Remove global HERMES_HOME env var from addToSystemPackages (conflicted
  with interactive CLI use, service already sets its own)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 01:08:02 +05:30
Teknium
8f6ef042c1 fix(cli): buffer reasoning preview chunks and fix duplicate display (#3013)
Three improvements to reasoning/thinking display in the CLI:

1. Buffer tiny reasoning chunks: providers like DeepSeek stream reasoning
   one word at a time, producing a separate [thinking] line per token.
   Add a buffer that coalesces chunks and flushes at natural boundaries
   (newlines, sentence endings, terminal width).

2. Fix duplicate reasoning display: centralize callback selection into
   _current_reasoning_callback() — one place instead of 4 scattered
   inline ternaries. Prevents both the streaming box AND the preview
   callback from firing simultaneously.

3. Fix post-response reasoning box guard: change the check from
   'not self._stream_started' to 'not self._reasoning_stream_started'
   so the final reasoning box is only suppressed when reasoning was
   actually streamed live, not when any text was streamed.

Cherry-picked from PR #2781 by juanfradb.
2026-03-25 12:16:39 -07:00
Teknium
099dfca6db fix: GLM reasoning-only and max-length handling (#3010)
- Add 'prompt exceeds max length' to context overflow detection for
  Z.AI/GLM 400 errors
- Extract inline reasoning blocks from assistant content as fallback
  when no structured reasoning fields are present
- Guard inline extraction so structured API reasoning takes priority
- Update test for reasoning-only response salvage behavior

Cherry-picked from PR #2993 by kshitijk4poor. Added priority guard
to fix test_structured_reasoning_takes_priority failure.

Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
2026-03-25 12:05:37 -07:00
Teknium
68ab37e891 fix(delegate): give subagents independent iteration budgets (#3004)
Each subagent now gets its own IterationBudget instead of sharing the
parent's.  The per-subagent cap is controlled by delegation.max_iterations
in config.yaml (default 50).  Total iterations across parent + subagents
can exceed the parent's max_iterations, but the user retains control via
the config setting.

Previously, subagents shared the parent's budget, so three parallel
subagents configured for max_iterations=50 racing against a parent that
already used 60 of 90 would each only get ~10 iterations.

Inspired by PR #2928 (Bartok9) which identified the issue (#2873).
2026-03-25 11:29:49 -07:00
Teknium
65dace1b1a fix(discord): stop phantom typing indicator after agent turn completes (#3003)
Two fixes for a race where Discord's typing indicator lingers after the
agent finishes:

1. _keep_typing (root cause): after outer stop_typing() clears the task
   dict, _keep_typing wakes from its 2s sleep and calls send_typing()
   again, recreating an orphaned loop. Add a finally block so _keep_typing
   always calls stop_typing() on exit, cleaning up any loop it recreated.

2. _process_message_background (safety net): add stop_typing() after
   cancelling the typing task, catching any platform-level persistent
   typing tasks that slipped through.

Combines fixes from PR #2945 by catbusconductor (root cause in
_keep_typing) and PR #2832 by subrih (safety net in
_process_message_background).
2026-03-25 11:28:28 -07:00
Teknium
650b400c98 fix(cron): mark session as ended after job completes (#2998)
Cron was the only execution path that never called end_session(),
leaving ended_at = NULL permanently. This made cron sessions invisible
to hermes prune --older-than and indistinguishable from active sessions.

Captures session_id in a local variable before agent construction so
it's available in the finally block even if AIAgent() fails, then calls
end_session(session_id, 'cron_complete') before close().

Cherry-picked from PR #2979 by ygd58. Fixed bug: original PR called
end_session() with zero arguments (TypeError — method requires
session_id and end_reason).

Fixes #2972.

Co-authored-by: ygd58 <ygd58@users.noreply.github.com>
2026-03-25 11:13:21 -07:00
Teknium
61949f0af7 Fix (#2997)
Co-authored-by: Jack <jvand@DESKTOP-JACK.localdomain>
2026-03-25 11:12:11 -07:00
Teknium
52c5e491f5 fix(session): surface silent SessionDB failures that cause session data loss (#2999)
* fix(session): surface silent SessionDB failures that cause session data loss

SessionDB initialization and operation failures were logged at debug level
or silently swallowed, causing sessions to never be indexed in the FTS5
database. This made session_search unable to find affected conversations.

In practice, ~48% of sessions can be lost without any visible indication.
The JSON session files are still written (separate code path), but the
SQLite/FTS5 index gets nothing — making session_search return empty results
for affected sessions.

Changes:
- cli.py: Log warnings (not debug) when SessionDB init fails at both
  __init__ and _start_session entry points
- run_agent.py: Log warnings on create_session, append_message, and
  compression split failures
- run_agent.py: Set _session_db = None after create_session failure to
  fail fast instead of silently dropping every message for the session

Root cause: When gateway restarts or DB lock contention occurs during
SessionDB() init, the exception is caught and swallowed. The agent
continues running normally — JSON session logs are written to disk —
but no messages reach the FTS5 index.

* fix: use module logger instead of root logging for SessionDB warnings

Follow-up to cherry-picked PR #2939 — the original used logging.warning()
(root logger) instead of logger.warning() (module logger) in the 5 new
warning calls. Module logger preserves the logger hierarchy and shows the
correct module name in log output.

---------

Co-authored-by: LucidPaths <lc77@outlook.de>
2026-03-25 11:10:19 -07:00
Teknium
f665351740 fix(shell): exponential backoff for persistent shell polling (#2996)
* fix(shell): replace fixed 10ms poll interval with exponential backoff to reduce WSL2 resource consumption

* fix(shell): rename _poll_interval to _poll_interval_start for clarity, update SSH override

* fix(shell): correctly rename _poll_interval to _poll_interval_start in ssh.py

---------

Co-authored-by: ygd58 <buraysandro9@gmail.com>
2026-03-25 10:56:48 -07:00
Teknium
fba73a60e3 fix(skills): use Git Trees API to prevent silent subdirectory loss during install (#2995)
* fix(skills): use Git Trees API to prevent silent subdirectory loss during install

Refactors _download_directory() to use the Git Trees API (single call
for the entire repo tree) as the primary path, falling back to the
recursive Contents API when the tree endpoint is unavailable or
truncated.  Prevents silent subdirectory loss caused by per-directory
rate limiting or transient failures.

Cherry-picked from PR #2981 by tugrulguner.
Fixes #2940.

* fix: simplify tree API — use branch name directly as tree-ish

Eliminates an extra git/ref/heads API call by passing the branch name
directly to git/trees/{branch}?recursive=1, matching the pattern
already used by _find_skill_in_repo_tree.

---------

Co-authored-by: tugrulguner <tugrulguner@users.noreply.github.com>
2026-03-25 10:48:18 -07:00
Teknium
114e636b7d fix(display): suppress KawaiiSpinner animation under patch_stdout (#2994)
When the CLI is active, sys.stdout is prompt_toolkit's StdoutProxy which
queues writes and injects newlines around each flush(). This causes every
\r spinner frame to land on its own line instead of overwriting the
previous one, producing visible flickering where the spinner and status
bar repeatedly swap positions.

The CLI already renders spinner state via a dedicated TUI widget
(_spinner_text / get_spinner_text), so KawaiiSpinner's \r-based loop is
redundant under StdoutProxy. Detect the proxy and suppress the animation
entirely — the thread still runs to preserve start()/stop() semantics.

Also removes the 0.4s flush rate-limit workaround that was papering over
the same issue, and cleans up the unused _last_flush_time attribute.

Salvaged from PR #2908 by Mibayy (fixed _raw -> raw detection, dropped
unrelated bundled changes).
2026-03-25 10:46:54 -07:00
Teknium
20cc1731f4 perf(prompt_builder): avoid redundant file re-read for skill conditions (#2992)
build_skills_system_prompt() was calling _read_skill_conditions() which
re-read each SKILL.md file to extract conditional activation fields.
The frontmatter was already parsed by _parse_skill_file() earlier in
the same loop. Extract conditions inline from the existing frontmatter
dict instead, saving one file read per skill (~80+ on a typical setup).

Salvaged from PR #2827 by InB4DevOps.
2026-03-25 10:39:27 -07:00
Teknium
b2a6b012fe fix(api_server): streaming breaks when agent makes tool calls (#2985)
* fix(run_agent): ensure _fire_first_delta() is called for tool generation events

Added calls to _fire_first_delta() in the AIAgent class to improve the handling of tool generation events, ensuring timely notifications during the processing of function calls and tool usage.

* fix(run_agent): improve timeout handling for chat completions

Enhanced the timeout configuration for chat completions in the AIAgent class by introducing customizable connection, read, and write timeouts using environment variables. This ensures more robust handling of API requests during streaming operations.

* fix(run_agent): reduce default stream read timeout for chat completions

Updated the default stream read timeout from 120 seconds to 60 seconds in the AIAgent class, enhancing the timeout configuration for chat completions. This change aims to improve responsiveness during streaming operations.

* fix(run_agent): enhance streaming error handling and retry logic

Improved the error handling and retry mechanism for streaming requests in the AIAgent class. Introduced a configurable maximum number of stream retries and refined the handling of transient network errors, allowing for retries with fresh connections. Non-transient errors now trigger a fallback to non-streaming only when appropriate, ensuring better resilience during API interactions.

* fix(api_server): streaming breaks when agent makes tool calls

The agent fires stream_delta_callback(None) to signal the CLI display
to close its response box before tool execution begins. The API server's
_on_delta callback was forwarding this None directly into the SSE queue,
where the SSE writer treats it as end-of-stream and terminates the HTTP
response prematurely.

After tool calls complete, the agent streams the final answer through
the same callback, but the SSE response was already closed — so Open
WebUI (and similar frontends) never received the actual answer.

Fix: filter out None in _on_delta so the SSE stream stays open. The SSE
loop already detects completion via agent_task.done(), which handles
stream termination correctly without needing the None sentinel.

Reported by Rohit Paul on X.
2026-03-25 09:56:20 -07:00
Teknium
42fec19151 feat: persist reasoning across gateway session turns (schema v6) (#2974)
feat: persist reasoning across gateway session turns (schema v6)

Tested against OpenAI Codex (direct), Anthropic (direct + OAI-compat), and OpenRouter → 6 backends. All reasoning field types (reasoning, reasoning_details, codex_reasoning_items) round-trip through the DB correctly.
2026-03-25 09:47:28 -07:00
Teknium
5dbe2d9d73 fix: skills-sh install fails for deeply nested repo structures (#2980)
* fix(run_agent): ensure _fire_first_delta() is called for tool generation events

Added calls to _fire_first_delta() in the AIAgent class to improve the handling of tool generation events, ensuring timely notifications during the processing of function calls and tool usage.

* fix(run_agent): improve timeout handling for chat completions

Enhanced the timeout configuration for chat completions in the AIAgent class by introducing customizable connection, read, and write timeouts using environment variables. This ensures more robust handling of API requests during streaming operations.

* fix(run_agent): reduce default stream read timeout for chat completions

Updated the default stream read timeout from 120 seconds to 60 seconds in the AIAgent class, enhancing the timeout configuration for chat completions. This change aims to improve responsiveness during streaming operations.

* fix(run_agent): enhance streaming error handling and retry logic

Improved the error handling and retry mechanism for streaming requests in the AIAgent class. Introduced a configurable maximum number of stream retries and refined the handling of transient network errors, allowing for retries with fresh connections. Non-transient errors now trigger a fallback to non-streaming only when appropriate, ensuring better resilience during API interactions.

* fix: skills-sh install fails for deeply nested repo structures

Skills in repos with deep directory nesting (e.g.
cli-tool/components/skills/development/senior-backend/) could not be
installed because the candidate path generation and shallow root-dir
scan never reached them.

Added GitHubSource._find_skill_in_repo_tree() which uses the GitHub
Trees API to recursively search the entire repo tree in a single API
call. This is used as a final fallback in
SkillsShSource._discover_identifier() when the standard candidate
paths and shallow scan both fail.

Fixes installation of skills from repos like davila7/claude-code-templates
where skills are nested 4+ levels deep.

Reported by user Samuraixheart.
2026-03-25 09:31:05 -07:00
Teknium
c6f4515f73 fix(whatsapp): download documents, audio, and video media from messages (#2978)
Add downloadMediaMessage() calls for documents, audio/voice notes, and
video in bridge.js — previously only images were downloaded, leaving all
other file types inaccessible to the agent.

Handle local file paths from the bridge for DOCUMENT, VOICE, and VIDEO
types in whatsapp.py with proper MIME detection. Inject text content
inline for readable files (.txt, .md, .csv, .json, etc.).

Follow-up fixes applied during salvage:
- Remove unused cache_document_from_bytes import
- Add 100KB size cap on text injection (matches Telegram/Discord/Slack)
- Align injection format with other platforms

Cherry-picked from PR #2818. Also fixes #2856 (bugs 1 & 2).
PR #2865 by ayberkesn fixed the same voice note issue.

Co-authored-by: noestelar <hola@noeali.com>
2026-03-25 08:37:28 -07:00
Teknium
fd292e676b fix: skip KawaiiSpinner when TUI handles tool progress (#2973)
* docs: unify hooks documentation — add plugin hooks to hooks page, add session:end event

The hooks page only documented gateway event hooks (HOOK.yaml system).
The plugins page listed plugin hooks (pre_tool_call, etc.) that weren't
referenced from the hooks page, which was confusing.

Changes:
- hooks.md: Add overview table showing both hook systems
- hooks.md: Add Plugin Hooks section with available hooks, callback
  signatures, and example
- hooks.md: Add missing session:end gateway event (emitted but undocumented)
- hooks.md: Mark pre_llm_call, post_llm_call, on_session_start,
  on_session_end as planned (defined in VALID_HOOKS but not yet invoked)
- hooks.md: Update info box to cross-reference plugin hooks
- hooks.md: Fix heading hierarchy (gateway content as subsections)
- plugins.md: Add cross-reference to hooks page for full details
- plugins.md: Mark planned hooks as (planned)

* feat(session_search): add recent sessions mode when query is omitted

When session_search is called without a query (or with an empty query),
it now returns metadata for the most recent sessions instead of erroring.
This lets the agent quickly see what was worked on recently without
needing specific keywords.

Returns for each session: session_id, title, source, started_at,
last_active, message_count, preview (first user message).
Zero LLM cost — pure DB query. Current session lineage and child
delegation sessions are excluded.

The agent can then keyword-search specific sessions if it needs
deeper context from any of them.

* docs: clarify two-mode behavior in session_search schema description

* fix(compression): restore sane defaults and cap summary at 12K tokens

- threshold: 0.80 → 0.50 (compress at 50%, not 80%)
- target_ratio: 0.40 → 0.20, now relative to threshold not total context
  (20% of 50% = 10% of context as tail budget)
- summary ceiling: 32K → 12K (Gemini can't output more than ~12K)
- Updated DEFAULT_CONFIG, config display, example config, and tests

* fix: browser_vision ignores auxiliary.vision.timeout config (#2901)

* docs: unify hooks documentation — add plugin hooks to hooks page, add session:end event

The hooks page only documented gateway event hooks (HOOK.yaml system).
The plugins page listed plugin hooks (pre_tool_call, etc.) that weren't
referenced from the hooks page, which was confusing.

Changes:
- hooks.md: Add overview table showing both hook systems
- hooks.md: Add Plugin Hooks section with available hooks, callback
  signatures, and example
- hooks.md: Add missing session:end gateway event (emitted but undocumented)
- hooks.md: Mark pre_llm_call, post_llm_call, on_session_start,
  on_session_end as planned (defined in VALID_HOOKS but not yet invoked)
- hooks.md: Update info box to cross-reference plugin hooks
- hooks.md: Fix heading hierarchy (gateway content as subsections)
- plugins.md: Add cross-reference to hooks page for full details
- plugins.md: Mark planned hooks as (planned)

* fix: browser_vision ignores auxiliary.vision.timeout config

browser_vision called call_llm() without passing a timeout parameter,
so it always used the 30-second default in auxiliary_client.py. This
made vision analysis with local models (llama.cpp, ollama) impossible
since they typically need more than 30s for screenshot analysis.

Now browser_vision reads auxiliary.vision.timeout from config.yaml
(same config key that vision_analyze already uses) and passes it
through to call_llm().

Also bumped the default vision timeout from 30s to 120s in both
browser_vision and vision_analyze — 30s is too aggressive for local
models and the previous default silently failed for anyone running
vision locally.

Fixes user report from GamerGB1988.

* fix(skills): agent-created skills were incorrectly treated as untrusted community content

_resolve_trust_level() didn't handle 'agent-created' source, so it
fell through to 'community' trust level. Community policy blocks on
any caution or dangerous findings, which meant common patterns like
curl with env vars, systemctl, crontab, cloudflared references etc.
would block skill creation/patching.

The agent-created policy row already existed in INSTALL_POLICY with
permissive settings (allow caution, ask on dangerous) but was never
reached. Now it is.

Fixes reports of skill_manage being blocked by security scanner.

* fix(cli): enhance real-time reasoning output by forcing flush of long partial lines

Updated the reasoning output mechanism to emit complete lines and force-flush long partial lines, ensuring reasoning is visible in real-time even without newlines. This improves user experience during reasoning sessions.

* fix: skip KawaiiSpinner when TUI handles tool progress

In the interactive CLI, the agent runs with quiet_mode=True and
tool_progress_callback set. The quiet_mode condition triggered
KawaiiSpinner for every tool call, but the TUI was already handling
progress display via the spinner widget.

The KawaiiSpinner writes carriage-return animation through StdoutProxy,
triggering run_in_terminal() erase/redraw cycles on every flush. These
redundant cycles cause the status bar to ghost into terminal scrollback.

The thinking spinner already had this guard (checks thinking_callback).
This extends the same pattern to the three tool spinner creation sites:
concurrent tools, delegate_task, and single tool execution.
2026-03-25 08:33:44 -07:00
Teknium
e5691eed38 feat(gateway): configurable Telegram reply threading mode (#2907)
Add reply_to_mode setting (off/first/all) to control whether Telegram
replies quote/thread to the user's original message.

- 'off': Never thread replies (no quote bubble)
- 'first': Only first chunk threads to user's message (default, preserves existing behavior)
- 'all': All chunks in multi-part replies thread to user's message

Configurable via:
- reply_to_mode in platform config (gateway config YAML)
- TELEGRAM_REPLY_TO_MODE env var

Based on PR #855 by raulvidis.
2026-03-24 19:56:00 -07:00
Teknium
ab4ba8163a feat(migration): comprehensive OpenClaw migration v2 — 17 new modules, terminal recap (#2906)
* feat(migration): comprehensive OpenClaw -> Hermes migration v2

Extends the existing migration script from ~15% to ~95% coverage of
OpenClaw's configuration surface. Adds 17 new migration modules:

Direct migrations (written to config.yaml/.env):
- MCP servers: full server definitions with transport, tools, sampling
- Agent defaults: reasoning_effort, compression, human_delay, timezone
- Session config: reset triggers (daily/idle) -> session_reset
- Full model providers: custom_providers with base_url/api_mode
- Deep channel config: Matrix, Mattermost, IRC, Discord deep settings
- Browser config: timeout settings
- Tools config: exec timeout -> terminal.timeout
- Approvals: mode mapping (smart/manual/auto -> Hermes equivalents)

Archived for manual review (no direct Hermes equivalent):
- Plugins config + installed extensions
- Cron jobs (with note to use 'hermes cron')
- Hooks/webhooks config
- Multi-agent list + routing bindings
- Gateway config (port, auth, TLS)
- Memory backend config (QMD, vector search)
- Skills registry per-entry config
- UI/identity settings
- Logging/diagnostics preferences

Also adds:
- MIGRATION_NOTES.md generation with PM2 reassurance message
- _set_env_var helper for consistent env file management
- Updated presets to include all new options
- Comprehensive mock test passing (12 migrated, 12 archived)

* feat(migration): add terminal recap with visual summary

Replaces raw JSON dump with a formatted box showing migrated/archived/
skipped/conflict/error counts, detailed item lists with labels, PM2
reassurance message, and actionable next steps. JSON output available
via MIGRATION_JSON_OUTPUT=1 env var.

* fix(test): allowlist python_os_environ as known false-positive in skills guard test

MIGRATION_JSON_OUTPUT env var is a legitimate CLI feature flag that enables
JSON output mode, not an env dump. Add it alongside agent_config_mod as an
accepted finding in test_skill_installs_cleanly_under_skills_guard.

* fix(test): add hermes_config_mod to known false-positives in skills guard test

The scanner flags two print statements that tell the user to *review*
~/.hermes/config.yaml in the post-migration summary. The script never
writes to that file — those are informational strings, not config mutations.

---------

Co-authored-by: Hermes <hermes@nousresearch.ai>
2026-03-24 19:44:02 -07:00
Teknium
80cc27eb9d feat(api-server): Idempotency-Key support, body size limit, OpenAI error envelope (#2903)
* feat(api-server): add Idempotency-Key support and request size limit; unify OpenAI error envelope

* fix(api-server): include provider error message in 500 OpenAI error body

---------

Co-authored-by: aydnOktay <xaydinoktay@gmail.com>
2026-03-24 19:31:08 -07:00
Teknium
1b24a226ea fix(skills): agent-created skills were incorrectly treated as untrusted community content
_resolve_trust_level() didn't handle 'agent-created' source, so it
fell through to 'community' trust level. Community policy blocks on
any caution or dangerous findings, which meant common patterns like
curl with env vars, systemctl, crontab, cloudflared references etc.
would block skill creation/patching.

The agent-created policy row already existed in INSTALL_POLICY with
permissive settings (allow caution, ask on dangerous) but was never
reached. Now it is.

Fixes reports of skill_manage being blocked by security scanner.
2026-03-24 19:15:03 -07:00
Teknium
9b32f846a8 fix: browser_vision ignores auxiliary.vision.timeout config (#2901)
* docs: unify hooks documentation — add plugin hooks to hooks page, add session:end event

The hooks page only documented gateway event hooks (HOOK.yaml system).
The plugins page listed plugin hooks (pre_tool_call, etc.) that weren't
referenced from the hooks page, which was confusing.

Changes:
- hooks.md: Add overview table showing both hook systems
- hooks.md: Add Plugin Hooks section with available hooks, callback
  signatures, and example
- hooks.md: Add missing session:end gateway event (emitted but undocumented)
- hooks.md: Mark pre_llm_call, post_llm_call, on_session_start,
  on_session_end as planned (defined in VALID_HOOKS but not yet invoked)
- hooks.md: Update info box to cross-reference plugin hooks
- hooks.md: Fix heading hierarchy (gateway content as subsections)
- plugins.md: Add cross-reference to hooks page for full details
- plugins.md: Mark planned hooks as (planned)

* fix: browser_vision ignores auxiliary.vision.timeout config

browser_vision called call_llm() without passing a timeout parameter,
so it always used the 30-second default in auxiliary_client.py. This
made vision analysis with local models (llama.cpp, ollama) impossible
since they typically need more than 30s for screenshot analysis.

Now browser_vision reads auxiliary.vision.timeout from config.yaml
(same config key that vision_analyze already uses) and passes it
through to call_llm().

Also bumped the default vision timeout from 30s to 120s in both
browser_vision and vision_analyze — 30s is too aggressive for local
models and the previous default silently failed for anyone running
vision locally.

Fixes user report from GamerGB1988.
2026-03-24 19:10:12 -07:00
Teknium
7ca22ea11b fix(compression): restore sane defaults and cap summary at 12K tokens
- threshold: 0.80 → 0.50 (compress at 50%, not 80%)
- target_ratio: 0.40 → 0.20, now relative to threshold not total context
  (20% of 50% = 10% of context as tail budget)
- summary ceiling: 32K → 12K (Gemini can't output more than ~12K)
- Updated DEFAULT_CONFIG, config display, example config, and tests
2026-03-24 18:48:47 -07:00
Teknium
ef47531617 docs: unify hooks documentation — add plugin hooks to hooks page, add session:end event
The hooks page only documented gateway event hooks (HOOK.yaml system).
The plugins page listed plugin hooks (pre_tool_call, etc.) that weren't
referenced from the hooks page, which was confusing.

Changes:
- hooks.md: Add overview table showing both hook systems
- hooks.md: Add Plugin Hooks section with available hooks, callback
  signatures, and example
- hooks.md: Add missing session:end gateway event (emitted but undocumented)
- hooks.md: Mark pre_llm_call, post_llm_call, on_session_start,
  on_session_end as planned (defined in VALID_HOOKS but not yet invoked)
- hooks.md: Update info box to cross-reference plugin hooks
- hooks.md: Fix heading hierarchy (gateway content as subsections)
- plugins.md: Add cross-reference to hooks page for full details
- plugins.md: Mark planned hooks as (planned)
2026-03-24 18:48:47 -07:00
Teknium
b36fe9282a feat(session_search): add recent sessions mode when query is omitted (#2533)
feat(session_search): add recent sessions mode when query is omitted
2026-03-24 18:41:38 -07:00
Teknium
1e9ff53a74 docs: clarify two-mode behavior in session_search schema description 2026-03-24 18:08:06 -07:00
Teknium
27c023e071 feat(config): expose compression target_ratio, protect_last_n, and threshold in DEFAULT_CONFIG
PR #2554 made these configurable via config.yaml but didn't add them
to DEFAULT_CONFIG or the config display. Users couldn't discover the
new knobs without reading the source.

- threshold: 0.80 (compress at 80% context usage)
- target_ratio: 0.40 (preserve 40% of context as recent tail)
- protect_last_n: 20 (keep last 20 messages uncompressed)
- Updated hermes config display to show all three fields
2026-03-24 18:05:43 -07:00
Teknium
9231a335d4 fix(compression): replace dead summary_target_tokens with ratio-based scaling (#2554)
The summary_target_tokens parameter was accepted in the constructor,
stored on the instance, and never used — the summary budget was always
computed from hardcoded module constants (_SUMMARY_RATIO=0.20,
_MAX_SUMMARY_TOKENS=8000). This caused two compounding problems:

1. The config value was silently ignored, giving users no control
   over post-compression size.
2. Fixed budgets (20K tail, 8K summary cap) didn't scale with
   context window size. Switching from a 1M-context model to a
   200K model would trigger compression that nuked 350K tokens
   of conversation history down to ~30K.

Changes:
- Replace summary_target_tokens with summary_target_ratio (default 0.40)
  which sets the post-compression target as a fraction of context_length.
  Tail token budget and summary cap now scale proportionally:
    MiniMax 200K → ~80K post-compression
    GPT-5   1M  → ~400K post-compression
- Change threshold_percent default: 0.50 → 0.80 (don't fire until
  80% of context is consumed)
- Change protect_last_n default: 4 → 20 (preserve ~10 full turns)
- Summary token cap scales to 5% of context (was fixed 8K), capped
  at 32K ceiling
- Read target_ratio and protect_last_n from config.yaml compression
  section (both are now configurable)
- Remove hardcoded summary_target_tokens=500 from run_agent.py
- Add 5 new tests for ratio scaling, clamping, and new defaults
2026-03-24 17:45:49 -07:00
Teknium
7efaa5968d Merge pull request #2891 from NousResearch/hermes/hermes-gateway-context
fix(gateway): stop loading hermes repo AGENTS.md into gateway sessions (~10k wasted tokens)
2026-03-24 17:43:41 -07:00
Teknium
8ee4f32819 fix(gateway): use TERMINAL_CWD for context file discovery, not process cwd
The gateway process runs from the hermes-agent install directory, so
os.getcwd() picks up the repo's AGENTS.md (16k chars) and other dev
context files — inflating input tokens by ~10k on every gateway message.

Fix: use TERMINAL_CWD (which the gateway sets to MESSAGING_CWD or
$HOME) as the cwd for build_context_files_prompt(). In CLI mode,
TERMINAL_CWD is the user's actual project directory, so behavior
is unchanged.

Before: gateway 15-20k input tokens, CLI 6-8k
After:  gateway ~6-8k input tokens (same as CLI)

Reported by keri on Discord.
2026-03-24 17:30:33 -07:00
Teknium
689344430c chore: gitignore orphaned mini-swe-agent directory 2026-03-24 12:50:34 -07:00
Teknium
618f15dda9 fix: reorder setup wizard providers — OpenRouter first
Move OpenRouter to position 1 in the setup wizard's provider list
to match hermes model ordering. Update default selection index and
fix test expectations for the new ordering.

Setup order: OpenRouter → Nous Portal → Codex → Custom → ...
2026-03-24 12:50:24 -07:00
Teknium
481915587e fix: update context pressure warnings and token estimates after compaction
Reset context pressure warnings and update last_prompt_tokens and last_completion_tokens in the context compressor to prevent stale values from causing excessive warnings and re-triggering compression. This change ensures accurate pressure calculations following the compaction process.
2026-03-24 09:25:10 -07:00
Teknium
0b993c1e07 docs: quote pip install extras to fix zsh glob errors (#2815)
zsh interprets square brackets as glob patterns, so
`pip install hermes-agent[voice]` fails with 'no matches found'.
Quote all pip install commands with extras across 5 docs pages (12 instances).

Reported by OFumik0OP.
2026-03-24 09:25:01 -07:00
Teknium
9718334962 docs: fix api-server response storage — SQLite, not in-memory (#2819)
* docs: update all docs for /model command overhaul and custom provider support

Documents the full /model command overhaul across 6 files:

AGENTS.md:
- Add model_switch.py to project structure tree

configuration.md:
- Rewrite General Setup with 3 config methods (interactive, config.yaml, env vars)
- Add new 'Switching Models with /model' section documenting all syntax variants
- Add 'Named Custom Providers' section with config.yaml examples and
  custom:name:model triple syntax

slash-commands.md:
- Update /model descriptions in both CLI and messaging tables with
  full syntax examples (provider:model, custom:model, custom:name:model,
  bare custom auto-detect)

cli-commands.md:
- Add /model slash command subsection under hermes model with syntax table
- Add custom endpoint config to hermes model use cases

faq.md:
- Add config.yaml example for offline/local model setup
- Note that provider: custom is a first-class provider
- Document /model custom auto-detect

provider-runtime.md:
- Add model_switch.py to implementation file list
- Update provider families to show Custom as first-class with named variants

* docs: fix api-server response storage description — SQLite, not in-memory

The ResponseStore class uses SQLite persistence (with in-memory
fallback), not pure in-memory storage. Responses survive gateway
restarts.
2026-03-24 09:05:15 -07:00
Teknium
ebcb81b649 docs: document 9 previously undocumented features
New documentation for features that existed in code but had no docs:

New page:
- context-references.md: Full docs for @-syntax inline context
  injection (@file:, @folder:, @diff, @staged, @git:, @url:) with
  line ranges, CLI autocomplete, size limits, sensitive path blocking,
  and error handling

configuration.md additions:
- Environment variable substitution: ${VAR_NAME} syntax in config.yaml
  with expansion, fallback, and multi-reference support
- Gateway streaming: Progressive token delivery on messaging platforms
  via message editing (StreamingConfig: enabled, transport, edit_interval,
  buffer_threshold, cursor) with platform support matrix
- Web search backends: Three providers (Firecrawl, Parallel, Tavily)
  with web.backend config key, capability matrix, auto-detection from
  API keys, self-hosted Firecrawl, and Parallel search modes

security.md additions:
- SSRF protection: Always-on URL validation blocking private networks,
  loopback, link-local, CGNAT, cloud metadata hostnames, with
  fail-closed DNS and redirect chain re-validation
- Tirith pre-exec security scanning: Content-level command scanning
  for homograph URLs, pipe-to-interpreter, terminal injection with
  auto-install, SHA-256/cosign verification, config options, and
  fail-open/fail-closed modes

sessions.md addition:
- Auto-generated session titles: Background LLM-powered title
  generation after first exchange

creating-skills.md additions:
- Conditional skill activation: requires_toolsets, requires_tools,
  fallback_for_toolsets, fallback_for_tools frontmatter fields with
  matching logic and use cases
- Environment variable requirements: required_environment_variables
  frontmatter for automatic env passthrough to sandboxed execution,
  plus terminal.env_passthrough user config
2026-03-24 08:56:21 -07:00
Teknium
ac5b8a478a ci: add supply chain audit workflow for PR scanning (#2816)
Scans every PR diff for patterns associated with supply chain attacks:

CRITICAL (blocks merge):
- .pth files (auto-execute on Python startup — litellm attack vector)
- base64 decode + exec/eval combo (obfuscated payload execution)
- subprocess with encoded/obfuscated commands

WARNING (comment only, no block):
- base64 encode/decode alone (legitimate uses: images, JWT, etc.)
- exec/eval alone
- Outbound POST/PUT requests
- setup.py/sitecustomize.py/usercustomize.py changes
- marshal.loads/pickle.loads/compile()

Posts a detailed comment on the PR with matched lines and context.
Excludes lockfiles (uv.lock, package-lock.json) from scanning.

Motivated by the litellm 1.82.7/1.82.8 credential stealer attack
(BerriAI/litellm#24512).
2026-03-24 08:56:04 -07:00
Teknium
624e4a8e7a chore: regenerate uv.lock with hashes, use lockfile in setup (#2812)
- Regenerate uv.lock with sha256 hashes for all 2965 package artifacts
- Add python_version marker to yc-bench (requires >=3.12)
- Update setup-hermes.sh to prefer 'uv sync --locked' for hash-verified
  installs, with fallback to 'uv pip install' when lockfile is stale

This completes the supply chain hardening: pyproject.toml bounds the
version ranges, and uv.lock pins exact versions with cryptographic
hashes so tampered packages are rejected at install time.
2026-03-24 08:42:45 -07:00
Teknium
177e43259f refactor: update mini_swe_runner to use Hermes built-in backends
Replace all minisweagent imports with Hermes-Agent's own environment
classes (LocalEnvironment, DockerEnvironment, ModalEnvironment).

mini_swe_runner.py no longer has any dependency on mini-swe-agent.
The runner now uses the same backends as the terminal tool, so Docker
and Modal environments work out of the box without extra submodules.

Tested: local and Docker backends verified working through the runner.
2026-03-24 08:27:15 -07:00
Teknium
c9b76057d4 chore: pin all dependency version ranges (supply chain hardening) (#2810)
Adds upper-bound version pins (<next_major) to all dependencies in
pyproject.toml — both core and optional. Previously most deps were
unpinned or had only floor bounds, meaning fresh installs would pull
whatever version was latest on PyPI.

This limits blast radius from supply chain attacks like the litellm
1.82.7/1.82.8 credential stealer (BerriAI/litellm#24512). With bounded
ranges, a compromised major version bump won't be pulled automatically.

Floors are set to current known-good installed versions.
2026-03-24 08:25:17 -07:00
Teknium
745859babb feat: env var passthrough for skills and user config (#2807)
* feat: env var passthrough for skills and user config

Skills that declare required_environment_variables now have those vars
passed through to sandboxed execution environments (execute_code and
terminal).  Previously, execute_code stripped all vars containing KEY,
TOKEN, SECRET, etc. and the terminal blocklist removed Hermes
infrastructure vars — both blocked skill-declared env vars.

Two passthrough sources:

1. Skill-scoped (automatic): when a skill is loaded via skill_view and
   declares required_environment_variables, vars that are present in
   the environment are registered in a session-scoped passthrough set.

2. Config-based (manual): terminal.env_passthrough in config.yaml lets
   users explicitly allowlist vars for non-skill use cases.

Changes:
- New module: tools/env_passthrough.py — shared passthrough registry
- hermes_cli/config.py: add terminal.env_passthrough to DEFAULT_CONFIG
- tools/skills_tool.py: register available skill env vars on load
- tools/code_execution_tool.py: check passthrough before filtering
- tools/environments/local.py: check passthrough in _sanitize_subprocess_env
  and _make_run_env
- 19 new tests covering all layers

* docs: add environment variable passthrough documentation

Document the env var passthrough feature across four docs pages:

- security.md: new 'Environment Variable Passthrough' section with
  full explanation, comparison table, and security considerations
- code-execution.md: update security section, add passthrough subsection,
  fix comparison table
- creating-skills.md: add tip about automatic sandbox passthrough
- skills.md: add note about passthrough after secure setup docs

Live-tested: launched interactive CLI, loaded a skill with
required_environment_variables, verified TEST_SKILL_SECRET_KEY was
accessible inside execute_code sandbox (value: passthrough-test-value-42).
2026-03-24 08:19:34 -07:00
Teknium
ad1bf16f28 chore: remove all remaining mini-swe-agent references
Complete cleanup after dropping the mini-swe-agent submodule (PR #2804):

- Remove MSWEA_SILENT_STARTUP and MSWEA_GLOBAL_CONFIG_DIR env var
  settings from cli.py, run_agent.py, hermes_cli/main.py, doctor.py
- Remove mini-swe-agent health check from hermes doctor
- Remove 'minisweagent' from logger suppression lists
- Remove litellm/typer/platformdirs from requirements.txt
- Remove mini-swe-agent install steps from install.ps1 (Windows)
- Remove mini-swe-agent install steps from website docs
- Update all stale comments/docstrings referencing mini-swe-agent
  in terminal_tool.py, tools/__init__.py, code_execution_tool.py,
  environments/README.md, environments/agent_loop.py
- Remove mini_swe_runner from pyproject.toml py-modules
  (still exists as standalone script for RL training use)
- Shrink test_minisweagent_path.py to empty stub

The orphaned mini-swe-agent/ directory on disk needs manual removal:
  rm -rf mini-swe-agent/
2026-03-24 08:19:23 -07:00
Teknium
e2c81c6e2f docs: add missing skills, CLI commands, and messaging env vars
Complete the documentation gaps identified in the previous audit:

Skills catalogs:
- skills-catalog.md: Add 7 missing bundled skills — data-science/
  jupyter-live-kernel, dogfood/hermes-agent-setup, inference-sh/
  inference-sh-cli, mlops/huggingface-hub, productivity/linear,
  research/parallel-cli, social-media/xitter
- optional-skills-catalog.md: Add 8 missing optional skills —
  blockchain/base, creative/blender-mcp, creative/meme-generation,
  mcp/fastmcp, productivity/telephony, research/bioinformatics,
  security/oss-forensics, security/sherlock

CLI commands reference:
- cli-commands.md: Add full documentation for hermes mcp (add/remove/
  list/test/configure) and hermes plugins (install/update/remove/list)

Messaging platform docs:
- discord.md: Add DISCORD_REQUIRE_MENTION and
  DISCORD_FREE_RESPONSE_CHANNELS to manual config env vars section
- signal.md: Add SIGNAL_ALLOW_ALL_USERS to env var reference table
- slack.md: Add SLACK_HOME_CHANNEL_NAME to config section
2026-03-24 08:12:37 -07:00
Teknium
677b11d84c fix: reject relative cwd paths for container terminal backends
When TERMINAL_CWD is set to '.' or any relative path (common when the
CLI config defaults to cwd='.'), container backends (docker, modal,
singularity, daytona) would pass it directly to the container where it's
meaningless. This caused 'docker run -d -w .' to fail.

Now relative paths are caught alongside host paths and replaced with
the default '/root' for container backends.
2026-03-24 08:03:14 -07:00
Teknium
ee3f3e756d docs: fix stale and incorrect documentation across 18 files
Cross-referenced all 84 docs pages against the actual codebase and
corrected every discrepancy found.

Reference docs:
- faq.md: Fix non-existent commands (/stats→/usage, /context→/usage,
  hermes models→hermes model, hermes config get→hermes config show,
  hermes gateway logs→cat gateway.log, async→sync chat() call)
- cli-commands.md: Fix --provider choices list (remove providers not
  in argparse), add undocumented -s/--skills flag
- slash-commands.md: Add missing /queue and /resume commands, fix
  /approve args_hint to show [session|always]
- tools-reference.md: Remove duplicate vision and web toolset sections
- environment-variables.md: Fix HERMES_INFERENCE_PROVIDER list (add
  copilot-acp, remove alibaba to match actual argparse choices)

Configuration & user guide:
- configuration.md: Fix approval_mode→approvals.mode (manual not ask),
  checkpoints.enabled default true not false, human_delay defaults
  (500/2000→800/2500), remove non-existent delegation.max_iterations
  and delegation.default_toolsets, fix website_blocklist nesting
  under security:, add .hermes.md and CLAUDE.md to context files
  table with priority system explanation
- security.md: Fix website_blocklist nesting under security:
- context-files.md: Add .hermes.md/HERMES.md and CLAUDE.md support,
  document priority-based first-match-wins loading behavior
- cli.md: Fix personalities config nesting (top-level, not under agent:)
- delegation.md: Fix model override docs (config-level, not per-call
  tool parameter)
- rl-training.md: Fix log directory (tinker-atropos/logs/→
  ~/.hermes/logs/rl_training/)
- tts.md: Fix Discord delivery format (voice bubble with fallback,
  not just file attachment)
- git-worktrees.md: Remove outdated v0.2.0 version reference

Developer guide:
- prompt-assembly.md: Add .hermes.md, CLAUDE.md, document priority
  system for context files
- agent-loop.md: Fix callback list (remove non-existent
  message_callback, add stream_delta_callback, tool_gen_callback,
  status_callback)

Messaging & guides:
- webhooks.md: Fix command (hermes setup gateway→hermes gateway setup)
- tips.md: Fix session idle timeout (120min→24h), config file
  (gateway.json→config.yaml)
- build-a-hermes-plugin.md: Fix plugin.yaml provides: format
  (provides_tools/provides_hooks as lists), note register_command()
  as not yet implemented
2026-03-24 07:53:07 -07:00
Teknium
02b38b93cb refactor: remove mini-swe-agent dependency — inline Docker/Modal backends (#2804)
Drop the mini-swe-agent git submodule. All terminal backends now use
hermes-agent's own environment implementations directly.

Docker backend:
- Inline the `docker run -d` container startup (was 15 lines in
  minisweagent's DockerEnvironment). Our wrapper already handled
  execute(), cleanup(), security hardening, volumes, and resource limits.

Modal backend:
- Import swe-rex's ModalDeployment directly instead of going through
  minisweagent's 90-line passthrough wrapper.
- Bake the _AsyncWorker pattern (from environments/patches.py) directly
  into ModalEnvironment for Atropos compatibility without monkey-patching.

Cleanup:
- Remove minisweagent_path.py (submodule path resolution helper)
- Remove submodule init/install from install.sh and setup-hermes.sh
- Remove mini-swe-agent from .gitmodules
- environments/patches.py is now a no-op (kept for backward compat)
- terminal_tool.py no longer does sys.path hacking for minisweagent
- mini_swe_runner.py guards imports (optional, for RL training only)
- Update all affected tests to mock the new direct subprocess calls
- Update README.md, CONTRIBUTING.md

No functionality change — all Docker, Modal, local, SSH, Singularity,
and Daytona backends behave identically. 6093 tests pass.
2026-03-24 07:30:25 -07:00
Teknium
2233f764af fix(tools): handle 402 insufficient credits error in vision tool (#2802)
Co-authored-by: Dilee <uzmpsk.dilekakbas@gmail.com>
2026-03-24 07:23:07 -07:00
Teknium
98b5570961 fix: make browser command timeout configurable via config.yaml (#2801)
browser_vision and other browser commands had a hardcoded 30-second
subprocess timeout that couldn't be overridden. Users with slower
machines (local Chromium without GPU) would hit timeouts on screenshot
capture even when setting browser.command_timeout in config.yaml,
because nothing read that value.

Changes:
- Add browser.command_timeout to DEFAULT_CONFIG (default: 30s)
- Add _get_command_timeout() helper that reads config, falls back to 30s
- _run_browser_command() now defaults to config value instead of constant
- browser_vision screenshot no longer hardcodes timeout=30
- browser_navigate uses max(config_timeout, 60) as floor for navigation

Reported by Gamer1988.
2026-03-24 07:21:50 -07:00
Teknium
773d3bb4df docs: update all docs for /model command overhaul and custom provider support
Documents the full /model command overhaul across 6 files:

AGENTS.md:
- Add model_switch.py to project structure tree

configuration.md:
- Rewrite General Setup with 3 config methods (interactive, config.yaml, env vars)
- Add new 'Switching Models with /model' section documenting all syntax variants
- Add 'Named Custom Providers' section with config.yaml examples and
  custom:name:model triple syntax

slash-commands.md:
- Update /model descriptions in both CLI and messaging tables with
  full syntax examples (provider:model, custom:model, custom:name:model,
  bare custom auto-detect)

cli-commands.md:
- Add /model slash command subsection under hermes model with syntax table
- Add custom endpoint config to hermes model use cases

faq.md:
- Add config.yaml example for offline/local model setup
- Note that provider: custom is a first-class provider
- Document /model custom auto-detect

provider-runtime.md:
- Add model_switch.py to implementation file list
- Update provider families to show Custom as first-class with named variants
2026-03-24 07:19:26 -07:00
Teknium
a312ee7b4c fix(agent): ensure first delta is fired during reasoning updates
- Added calls to `_fire_first_delta()` in the `AIAgent` class to ensure that the first delta is triggered for both reasoning and thinking updates. This change improves the handling of delta events during streaming, enhancing the responsiveness of the agent's reasoning capabilities.
2026-03-24 07:16:20 -07:00
Teknium
2e524272b1 refactor(model): extract shared switch_model() from CLI and gateway handlers
Phase 4 of the /model command overhaul.

Both the CLI (cli.py) and gateway (gateway/run.py) /model handlers
had ~50 lines of duplicated core logic: parsing, provider detection,
credential resolution, and model validation. This extracts that
pipeline into hermes_cli/model_switch.py.

New module exports:
- ModelSwitchResult: dataclass with all fields both handlers need
- CustomAutoResult: dataclass for bare '/model custom' results
- switch_model(): core pipeline — parse → detect → resolve → validate
- switch_to_custom_provider(): resolve endpoint + auto-detect model

The shared functions are pure (no I/O side effects). Each caller
handles its own platform-specific concerns:
- CLI: sets self.model/provider/etc, calls save_config_value(), prints
- Gateway: writes config.yaml directly, sets env vars, returns markdown

Net result: -244 lines from handlers, +234 lines in shared module.
The handlers are now ~80 lines each (down from ~150+) and can't drift
apart on core logic.
2026-03-24 07:08:07 -07:00
Teknium
ce39f9cc44 fix(gateway): detect virtualenv path instead of hardcoding venv/ (#2797)
Fixes #2492.

`generate_systemd_unit()` and `get_python_path()` hardcoded `venv`
as the virtualenv directory name. When the virtualenv is `.venv`
(which `setup-hermes.sh` and `.gitignore` both reference), the
generated systemd unit had incorrect VIRTUAL_ENV and PATH variables.

Introduce `_detect_venv_dir()` which:
1. Checks `sys.prefix` vs `sys.base_prefix` to detect the active venv
2. Falls back to probing `.venv` then `venv` under PROJECT_ROOT

Both `get_python_path()` and `generate_systemd_unit()` now use
this detection instead of hardcoded paths.

Co-authored-by: Hermes <hermes@nousresearch.ai>
2026-03-24 07:05:57 -07:00
Teknium
18cbd18fa9 fix: remove litellm/typer/platformdirs from hermes-agent deps (supply chain compromise) (#2796)
litellm 1.82.7/1.82.8 contained a credential stealer (.pth auto-exec
payload). PyPI quarantined the entire package, blocking all fresh
hermes-agent installs since litellm was listed as a hard dependency.

These three deps (litellm, typer, platformdirs) are only used by the
mini-swe-agent submodule, which has its own pyproject.toml and manages
its own dependencies. They were redundantly duplicated in hermes-agent's
pyproject.toml.

Also fixes install.sh to not print 'mini-swe-agent installed' on
failure, and updates warning messages in both install scripts to clarify
that only Docker/Modal backends are affected — local terminal is
unaffected.

Ref: https://github.com/BerriAI/litellm/issues/24512
2026-03-24 07:03:16 -07:00
Teknium
b641ee88f4 feat(model): /model command overhaul — Phases 2, 3, 5
* feat(model): persist base_url on /model switch, auto-detect for bare /model custom

Phase 2+3 of the /model command overhaul:

Phase 2 — Persist base_url on model switch:
- CLI: save model.base_url when switching to a non-OpenRouter endpoint;
  clear it when switching away from custom to prevent stale URLs
  leaking into the new provider's resolution
- Gateway: same logic using direct YAML write

Phase 3 — Better feedback and edge cases:
- Bare '/model custom' now auto-detects the model from the endpoint
  using _auto_detect_local_model() and saves all three config values
  (model, provider, base_url) atomically
- Shows endpoint URL in success messages when switching to/from
  custom providers (both CLI and gateway)
- Clear error messages when no custom endpoint is configured
- Updated test assertions for the additional save_config_value call

Fixes #2562 (Phase 2+3)

* feat(model): support custom:name:model triple syntax for named custom providers

Phase 5 of the /model command overhaul.

Extends parse_model_input() to handle the triple syntax:
  /model custom:local-server:qwen → provider='custom:local-server', model='qwen'
  /model custom:my-model          → provider='custom', model='my-model' (unchanged)

The 'custom:local-server' provider string is already supported by
_get_named_custom_provider() in runtime_provider.py, which matches
it against the custom_providers list in config.yaml. This just wires
the parsing so users can do it from the /model slash command.

Added 4 tests covering single, triple, whitespace, and empty model cases.
2026-03-24 06:58:04 -07:00
Teknium
2f1c4fb01f fix(auth): preserve 'custom' provider instead of silently remapping to 'openrouter'
resolve_provider('custom') was silently returning 'openrouter', causing
users who set provider: custom in config.yaml to unknowingly route
through OpenRouter instead of their local/custom endpoint. The display
showed 'via openrouter' even when the user explicitly chose custom.

Changes:
- auth.py: Split the conditional so 'custom' returns 'custom' as-is
- runtime_provider.py: _resolve_named_custom_runtime now returns
  provider='custom' instead of 'openrouter'
- runtime_provider.py: _resolve_openrouter_runtime returns
  provider='custom' when that was explicitly requested
- Add 'no-key-required' placeholder for keyless local servers
- Update existing test + add 5 new tests covering the fix

Fixes #2562
2026-03-24 06:41:11 -07:00
Teknium
4313b8aff6 fix(cli): ensure single closure of streaming boxes during tool generation
- Updated `_on_tool_gen_start` method in `HermesCLI` to close open streaming boxes exactly once, preventing potential multiple closures.
- Added a check for `_stream_box_opened` to manage the state of the streaming box more effectively, enhancing user experience during large payload streaming.
2026-03-24 06:33:21 -07:00
Teknium
87e2626cf6 feat(cli, agent): add tool generation callback for streaming updates
- Introduced `_on_tool_gen_start` in `HermesCLI` to indicate when tool-call arguments are being generated, enhancing user feedback during streaming.
- Updated `AIAgent` to support a new `tool_gen_callback`, notifying the display layer when tool generation starts, allowing for better user experience during large payloads.
- Ensured that the callback is triggered appropriately during streaming events to prevent user interface freezing.
2026-03-23 23:10:58 -07:00
Teknium
1345e93393 fix: add macOS Homebrew paths to browser and terminal PATH resolution
On macOS with Homebrew (Apple Silicon), Node.js and agent-browser
binaries live under /opt/homebrew/bin/ which is not included in the
_SANE_PATH fallback used by browser_tool.py and environments/local.py.
When Hermes runs with a filtered PATH (e.g. as a systemd service),
these binaries are invisible, causing 'env: node: No such file or
directory' errors when using browser tools.

Changes:
- Add /opt/homebrew/bin and /opt/homebrew/sbin to _SANE_PATH in both
  browser_tool.py and environments/local.py
- Add _discover_homebrew_node_dirs() to find versioned Node installs
  (e.g. brew install node@24) that aren't linked into /opt/homebrew/bin
- Extend _find_agent_browser() to search Homebrew and Hermes-managed
  dirs when agent-browser isn't on the current PATH
- Include discovered Homebrew node dirs in subprocess PATH when
  launching agent-browser
- Add 11 new tests covering all Homebrew path discovery logic
2026-03-23 22:45:55 -07:00
Teknium
6e97a3b338 docs: revise v0.4.0 changelog — fix feature attribution, reorder sections 2026-03-23 22:42:22 -07:00
Teknium
8416bc2142 chore: release v0.4.0 (v2026.3.23) 2026-03-23 22:34:04 -07:00
Teknium
48b5bc6038 fix(gateway): prevent stale memory overwrites by flush agent (#2670)
The gateway memory flush agent reviews old conversation history on session
reset/expiry and writes to memory. It had no awareness of memory changes
made after that conversation ended (by the live agent, cron jobs, or other
sessions), causing silent overwrites of newer entries.

Two fixes:

1. Skip memory flush entirely for cron sessions (session IDs starting with
   'cron_'). Cron sessions are headless with no meaningful user conversation
   to extract memories from.

2. Inject the current live memory state (MEMORY.md + USER.md) directly into
   the flush prompt. The flush agent can now see what's already saved and
   make informed decisions — only adding genuinely new information rather
   than blindly overwriting entries that may have been updated since the
   conversation ended.

Addresses the root cause identified in #2670: the flush agent was making
memory decisions blind to the current state of memory, causing stale
context to overwrite newer entries on gateway restarts and session resets.

Co-authored-by: devorun <devorun@users.noreply.github.com>
Co-authored-by: dlkakbs <dlkakbs@users.noreply.github.com>
2026-03-23 16:08:38 -07:00
Teknium
4ff73fb32c feat(config): support ${ENV_VAR} substitution in config.yaml (#2684)
* feat(config): support ${ENV_VAR} substitution in config.yaml

* fix: extend env var expansion to CLI and gateway config loaders

The original PR (#2680) only wired _expand_env_vars into load_config(),
which is used by 'hermes tools' and 'hermes setup'. The two primary
config paths were missed:

- load_cli_config() in cli.py (interactive CLI)
- Module-level _cfg in gateway/run.py (gateway — bridges api_keys to env vars)

Also:
- Remove redundant 'import re' (already imported at module level)
- Add missing blank lines between top-level functions (PEP 8)
- Add tests for load_cli_config() expansion

---------

Co-authored-by: teyrebaz33 <hakanerten02@hotmail.com>
2026-03-23 16:02:06 -07:00
Teknium
73a88a02fe fix(security): prevent shell injection in _expand_path via ~user path suffix (#2047)
echo was called with the full unquoted path (~username/suffix), allowing
command substitution in the suffix (e.g. ~user/$(malicious)) to execute
arbitrary shell commands. The fix expands only the validated ~username
portion via the shell and concatenates the suffix as a plain string.

Co-authored-by: Gutslabs <gutslabsxyz@gmail.com>
2026-03-23 16:00:34 -07:00
Teknium
f9c2565ab4 fix(config): log warning instead of silently swallowing config.yaml errors (#2683)
A bare `except Exception: pass` meant any YAML syntax error, bad value,
or unexpected structure in config.yaml was silently ignored and the
gateway fell back to .env / gateway.json without any indication.
Users had no way to know why their config changes had no effect.

Co-authored-by: sprmn24 <oncuevtv@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 15:54:11 -07:00
Teknium
ad5f973a8d fix(vision): make SSRF redirect guard async for httpx.AsyncClient
httpx.AsyncClient awaits event hooks. The sync _ssrf_redirect_guard
returned None, causing 'object NoneType can't be used in await
expression' on any vision_analyze call that followed redirects.

Caught during live PTY testing of the merged SSRF protection.
2026-03-23 15:44:52 -07:00
Teknium
0791efe2c3 fix(security): add SSRF protection to vision_tools and web_tools (hardened)
* fix(security): add SSRF protection to vision_tools and web_tools

Both vision_analyze and web_extract/web_crawl accept arbitrary URLs
without checking if they target private/internal network addresses.
A prompt-injected or malicious skill could use this to access cloud
metadata endpoints (169.254.169.254), localhost services, or private
network hosts.

Adds a shared url_safety.is_safe_url() that resolves hostnames and
blocks private, loopback, link-local, and reserved IP ranges. Also
blocks known internal hostnames (metadata.google.internal).

Integrated at the URL validation layer in vision_tools and before
each website_policy check in web_tools (extract, crawl).

* test(vision): update localhost test to reflect SSRF protection

The existing test_valid_url_with_port asserted localhost URLs pass
validation. With SSRF protection, localhost is now correctly blocked.
Update the test to verify the block, and add a separate test for
valid URLs with ports using a public hostname.

* fix(security): harden SSRF protection — fail-closed, CGNAT, multicast, redirect guard

Follow-up hardening on top of dieutx's SSRF protection (PR #2630):

- Change fail-open to fail-closed: DNS errors and unexpected exceptions
  now block the request instead of allowing it (OWASP best practice)
- Block CGNAT range (100.64.0.0/10): Python's ipaddress.is_private
  does NOT cover this range (returns False for both is_private and
  is_global). Used by Tailscale/WireGuard and carrier infrastructure.
- Add is_multicast and is_unspecified checks: multicast (224.0.0.0/4)
  and unspecified (0.0.0.0) addresses were not caught by the original
  four-check chain
- Add redirect guard for vision_tools: httpx event hook re-validates
  each redirect target against SSRF checks, preventing the classic
  redirect-based SSRF bypass (302 to internal IP)
- Move SSRF filtering before backend dispatch in web_extract: now
  covers Parallel and Tavily backends, not just Firecrawl
- Extract _is_blocked_ip() helper for cleaner IP range checking
- Add 24 new tests (CGNAT, multicast, IPv4-mapped IPv6, fail-closed
  behavior, parametrized blocked/allowed IP lists)
- Fix existing tests to mock DNS resolution for test hostnames

---------

Co-authored-by: dieutx <dangtc94@gmail.com>
2026-03-23 15:40:42 -07:00
Teknium
934fbe3c06 fix: strip ANSI at the source — clean terminal output before it reaches the model
Root cause: terminal_tool, execute_code, and process_registry returned raw
subprocess output with ANSI escape sequences intact. The model saw these
in tool results and copied them into file writes.

Previous fix (PR #2532) stripped ANSI at the write point in file_tools.py,
but this was a band-aid — regex on file content risks corrupting legitimate
content, and doesn't prevent ANSI from wasting tokens in the model context.

Source-level fix:
- New tools/ansi_strip.py with comprehensive ECMA-48 regex covering CSI
  (incl. private-mode, colon-separated, intermediate bytes), OSC (both
  terminators), DCS/SOS/PM/APC strings, Fp/Fe/Fs/nF escapes, 8-bit C1
- terminal_tool.py: strip output before returning to model
- code_execution_tool.py: strip stdout/stderr before returning
- process_registry.py: strip output in poll/read_log/wait
- file_tools.py: remove _strip_ansi band-aid (no longer needed)

Verified: `ls --color=always` output returned as clean text to model,
file written from that output contains zero ESC bytes.
2026-03-23 07:43:12 -07:00
Teknium
6302e56e7c fix(gateway): add all missing platform allowlist env vars to startup warning check (#2628)
* fix(gateway): added MATRIX_ALLOWED_USERS to list of env vars checked by gateway

* fix(gateway): add all missing platform allowlist env vars to startup check

The startup warning for 'No user allowlists configured' was only checking
TELEGRAM, DISCORD, WHATSAPP, SLACK, and SMS — missing SIGNAL, EMAIL,
MATTERMOST, and DINGTALK. Users of those platforms would see a spurious
warning even with their platform-specific allowlist configured.

Now matches the canonical platform_env_map in _is_user_authorized().

---------

Co-authored-by: SteelPh0enix <wojciech_olech@hotmail.com>
2026-03-23 07:19:14 -07:00
Teknium
868b3c07e3 fix: platform default toolsets silently override tool deselection in hermes tools (#2624)
Cherry-picked from PR #2576 by ereid7, plus read-side fix from 173a5c62.

Both fixes were originally landed in 173a5c62 but were inadvertently
reverted by commit 34be3f8b (a squash-merge that bundled unrelated
tools_config.py changes).

Save side (_save_platform_tools): exclude platform default toolset
names (hermes-cli, hermes-telegram) from preserved entries so they
don't silently re-enable everything.

Read side (_get_platform_tools): when the saved list contains explicit
configurable keys, use direct membership instead of subset inference.
The subset approach is broken when composite toolsets like hermes-cli
resolve to ALL tools.
2026-03-23 07:06:51 -07:00
Teknium
9d6148316c fix: media delivery fails for file paths containing spaces (#2621)
Cherry-picked from PR #2583 by Glucksberg.

The MEDIA: regex used \S+ which truncated paths at the first space.
Added a space-aware alternative anchored to known media extensions.
Also updated extract_local_files to allow spaces in path segments.

Follow-up fix: changed \s to [^\S\n] in the space-matching group
so the regex doesn't greedily match across newlines (broke multi-line
MEDIA: tags).
2026-03-23 06:59:59 -07:00
Teknium
7da0822456 fix(approval): honor bare YAML approvals.mode: off (#2620)
Cherry-picked from PR #2563 by tumf.

YAML 1.1 parses unquoted 'off' as boolean False. Added
_normalize_approval_mode() to map False -> 'off', True -> 'manual',
and normalize string values. Includes regression tests.
2026-03-23 06:56:09 -07:00
Teknium
d35df0db71 fix(discord): ignore system messages in on_message handler (#2618)
Cherry-picked from PR #2575 by ticketclosed-wontfix.

Filters out Discord system messages (thread renames, pins, member joins,
boosts) that were being treated as regular user messages.

Follow-up fix: also allow MessageType.reply (value 19) — the original
filter only allowed MessageType.default, which would silently drop all
reply-based interactions.

Added pytest.importorskip for discord dependency in tests.
2026-03-23 06:50:09 -07:00
Teknium
93dc5dee6f fix: prevent agents from starting gateway outside systemd management (#2617)
An agent session killed the systemd-managed gateway (PID 1605) and restarted
it with '&disown', taking it outside systemd's Restart= management. When the
orphaned process later received SIGTERM, nothing restarted it.

Add dangerous command patterns to detect:
- 'gateway run' with & (background), disown, nohup, or setsid
- These should use 'systemctl --user restart hermes-gateway' instead

Also applied directly to main repo and fixed the systemd service:
- Changed Restart=on-failure to Restart=always (clean SIGTERM = exit 0 = not
  a 'failure', so on-failure never triggered)
- RestartSec=10 for reasonable restart delay
2026-03-23 06:45:17 -07:00
Guts
2d8fad8230 fix(context): restrict @ references to safe workspace paths (#2601)
fix(context): block @ references from reading secrets outside the workspace. Defaults allowed_root to cwd, adds sensitive file blocklist.
2026-03-23 06:40:05 -07:00
Mibay
ca2958ff98 fix: normalize repeat<=0 to None to prevent cron jobs deleting after first run (#2612)
fix: normalize repeat<=0 to None — cron jobs deleted after first run when LLM passes -1
2026-03-23 06:35:43 -07:00
Teknium
f60ebc7bf2 fix: move activated skills line below welcome text
Previously 'Activated skills: xxx' was printed above the banner in
show_banner(). Now it prints directly after the 'Welcome to Hermes
Agent!' line in run(), which is a more natural placement.
2026-03-23 06:20:19 -07:00
Teknium
b072737193 fix: expand tilde (~) in vision_analyze local file paths (#2585)
Path('~/.hermes/image.png').is_file() returns False because Path
doesn't expand tilde. This caused the tool to fall through to URL
validation, which also failed, producing a confusing error:
'Invalid image source. Provide an HTTP/HTTPS URL or a valid local
file path.'

Fix: use os.path.expanduser() before constructing the Path object.
Added two tests for tilde expansion (success and nonexistent file).
2026-03-22 23:48:32 -07:00
Teknium
3b509da571 feat: auto-reconnect failed gateway platforms with exponential backoff (#2584)
When a messaging platform fails to connect at startup (e.g. transient DNS
failure) or disconnects at runtime with a retryable error, the gateway now
queues it for background reconnection instead of giving up permanently.

- New _platform_reconnect_watcher background task runs alongside the
  existing session expiry watcher
- Exponential backoff: 30s, 60s, 120s, 240s, 300s cap
- Max 20 retry attempts before giving up on a platform
- Non-retryable errors (bad auth token, etc.) are not retried
- Runtime disconnections via _handle_adapter_fatal_error now queue
  retryable failures instead of triggering gateway shutdown
- On successful reconnect, adapter is wired up and channel directory
  is rebuilt automatically

Fixes the case where a DNS blip during gateway startup caused Telegram
and Discord to be permanently unavailable until manual restart.
2026-03-22 23:48:24 -07:00
Teknium
5ddb6a191f Merge pull request #2556 from NousResearch/hermes/hermes-fdcb4c4a
fix(cli): allow custom/local endpoints without API key
2026-03-22 16:19:12 -07:00
Teknium
1b5fb36c9d fix(cli): allow custom/local endpoints without API key
Local LLM servers (llama.cpp, ollama, vLLM, etc.) typically don't
require authentication. When a custom base_url is configured but no
API key is found, use a placeholder instead of failing with
'Provider resolver returned an empty API key.'

The OpenAI SDK accepts any string as api_key, and local servers
simply ignore the Authorization header.

Fixes issue reported by @ThatWolfieGuy — llama.cpp stopped working
after updating because the new runtime provider resolver enforces
non-empty API keys even for keyless local endpoints.
2026-03-22 16:08:21 -07:00
Teknium
942f6eac94 fix(run_agent): ensure proper cleanup of OpenAI client in background review
Added explicit closing of the OpenAI/httpx client in the background review process to prevent "Event loop is closed" errors. This change ensures that the client is properly cleaned up when the review agent is no longer needed, enhancing stability and resource management.
2026-03-22 16:03:16 -07:00
Teknium
2b3c1d81f0 Merge pull request #2555 from NousResearch/hermes/hermes-fdcb4c4a
fix(cli): prevent 'Press ENTER to continue...' on exit
2026-03-22 16:03:13 -07:00
Teknium
1f21ef7488 fix(cli): prevent 'Press ENTER to continue...' on exit
When AsyncOpenAI clients are garbage-collected after the event loop
closes, their AsyncHttpxClientWrapper.__del__ tries to schedule
aclose() on the dead loop, causing RuntimeError: Event loop is closed.
prompt_toolkit catches this as an unhandled exception and shows
'Press ENTER to continue...' which blocks CLI exit.

Fix: Add shutdown_cached_clients() to auxiliary_client.py that marks
all cached async clients' underlying httpx transport as CLOSED before
GC runs. This prevents __del__ from attempting the aclose() call.

- _force_close_async_httpx(): sets httpx AsyncClient._state to CLOSED
- shutdown_cached_clients(): iterates _client_cache, closes sync clients
  normally and marks async clients as closed
- Also fix stale client eviction in _get_cached_client to mark evicted
  async clients as closed (was just del-ing them, triggering __del__)
- Call shutdown_cached_clients() from _run_cleanup() in cli.py
2026-03-22 15:31:54 -07:00
Teknium
b799bca7a3 refactor(gateway): remove broken 1.4x hygiene multiplier entirely
The previous commit capped the 1.4x at 95% of context, but the multiplier
itself is unnecessary and confusing:

  85% threshold × 1.4 = 119% of context → never fires
  95% warn      × 1.4 = 133% of context → never warns

The 85% hygiene threshold already provides ample headroom over the agent's
own 50% compressor. Even if rough estimates overestimate by 50%, hygiene
would fire at ~57% actual usage — safe and harmless.

Remove the multiplier entirely. Both actual and estimated token paths
now use the same 85% / 95% thresholds. Update tests and comments.
2026-03-22 15:21:18 -07:00
Teknium
b2b4a9ee7d fix(gateway): hygiene compression ignores config context_length and 1.4x exceeds model limit
Three bugs in gateway session hygiene pre-compression caused 'Session too
large' errors for ~200K context models like GLM-5-turbo on z.ai:

1. Gateway hygiene called get_model_context_length(model) without passing
   config_context_length, provider, or base_url — so user overrides like
   model.context_length: 180000 were ignored, and provider-aware detection
   (models.dev, z.ai endpoint) couldn't fire. The agent's own compressor
   correctly passed all three (run_agent.py line 1038).

2. The 1.4x safety factor on rough token estimates pushed the compression
   threshold above the model's actual context limit:
     200K * 0.85 * 1.4 = 238K > 200K (model limit)
   So hygiene never compressed, sessions grew past the limit, and the API
   rejected the request.

3. Same issue for the warn threshold: 200K * 0.95 * 1.4 = 266K.

Fix:
- Read model.context_length, provider, and base_url from config.yaml
  (same as run_agent.py does) and pass them to get_model_context_length()
- Resolve provider/base_url from runtime when not in config
- Cap the 1.4x-adjusted compress threshold at 95% of context_length
- Cap the 1.4x-adjusted warn threshold at context_length

Affects: z.ai GLM-5/GLM-5-turbo, any ~200K or smaller context model
where the 1.4x factor would push 85% above 100%.

Ref: Discord report from Ddox — glm-5-turbo on z.ai coding plan
2026-03-22 15:15:37 -07:00
Teknium
ed805f57ff fix(mcp-oauth): port mismatch, path traversal, and shared handler state (salvage #2521) (#2552)
* fix(mcp-oauth): port mismatch, path traversal, and shared state in OAuth flow

Three bugs in the new MCP OAuth 2.1 PKCE implementation:

1. CRITICAL: OAuth redirect port mismatch — build_oauth_auth() calls
   _find_free_port() to register the redirect_uri, but _wait_for_callback()
   calls _find_free_port() again getting a DIFFERENT port. Browser redirects
   to port A, server listens on port B — callback never arrives, 120s timeout.
   Fix: share the port via module-level _oauth_port variable.

2. MEDIUM: Path traversal via unsanitized server_name — HermesTokenStorage
   uses server_name directly in filenames. A name like "../../.ssh/config"
   writes token files outside ~/.hermes/mcp-tokens/.
   Fix: sanitize server_name with the same regex pattern used elsewhere.

3. MEDIUM: Class-level auth_code/state on _CallbackHandler causes data
   races if concurrent OAuth flows run. Second callback overwrites first.
   Fix: factory function _make_callback_handler() returns a handler class
   with a closure-scoped result dict, isolating each flow.

* test: add tests for MCP OAuth path traversal, handler isolation, and port sharing

7 new tests covering:
- Path traversal blocked (../../.ssh/config stays in mcp-tokens/)
- Dots/slashes sanitized and resolved within base dir
- Normal server names preserved
- Special characters sanitized (@, :, /)
- Concurrent handler result dicts are independent
- Handler writes to its own result dict, not class-level
- build_oauth_auth stores port in module-level _oauth_port

---------

Co-authored-by: 0xbyt4 <35742124+0xbyt4@users.noreply.github.com>
2026-03-22 15:02:26 -07:00
Teknium
e93b539a8f feat(session_search): add recent sessions mode when query is omitted
When session_search is called without a query (or with an empty query),
it now returns metadata for the most recent sessions instead of erroring.
This lets the agent quickly see what was worked on recently without
needing specific keywords.

Returns for each session: session_id, title, source, started_at,
last_active, message_count, preview (first user message).
Zero LLM cost — pure DB query. Current session lineage and child
delegation sessions are excluded.

The agent can then keyword-search specific sessions if it needs
deeper context from any of them.
2026-03-22 11:22:10 -07:00
Teknium
fa6f069577 fix(file_tools): strip ANSI escape codes from write_file and patch content (#2532)
Models occasionally copy ANSI escape sequences from terminal output
or display formatting into file content, breaking shebangs and
injecting binary characters into scripts.

Strip ANSI codes (CSI, OSC, simple escapes) from:
- write_file content
- patch old_string, new_string, and V4A patch content

The check is fast (skips entirely if no ESC byte present).

Reported by Andi Jaeger.
2026-03-22 11:17:06 -07:00
Teknium
cd2280d1a3 feat(gateway): notify users when session auto-resets (#2519)
When a session expires (daily schedule or idle timeout) and is
automatically reset, send a notification to the user explaining
what happened:

  ◐ Session automatically reset (inactive for 24h).
    Conversation history cleared.
  Use /resume to browse and restore a previous session.
  Adjust reset timing in config.yaml under session_reset.

Notifications are suppressed when:
- The expired session had no activity (no tokens used)
- The platform is excluded (api_server, webhook by default)
- notify: false in config

Changes:
- session.py: _should_reset() returns reason string ('idle'/'daily')
  instead of bool; SessionEntry gains auto_reset_reason and
  reset_had_activity fields; old entry's total_tokens checked
- config.py: SessionResetPolicy gains notify (bool, default: true)
  and notify_exclude_platforms (default: api_server, webhook)
- run.py: sends notification via adapter.send() before processing
  the user's message, with activity + platform checks
- 13 new tests

Config (config.yaml):

  session_reset:
    notify: true
    notify_exclude_platforms: [api_server, webhook]
2026-03-22 09:33:39 -07:00
Teknium
5e5ad634a1 fix(matrix): duplicate messages, image caching for vision support (#2520)
Three fixes for the Matrix adapter:

1. Remove RoomMessageMedia callback registration — RoomMessageImage
   inherits from it, causing images to be processed twice.

2. Add event ID deduplication to both text and media handlers.
   nio can fire the same event more than once; bounded deque+set
   tracks the last 1000 events.

3. Cache images locally via Matrix client download. MXC URLs require
   authentication, so the vision pipeline couldn't access them.
   Images are now downloaded via the authenticated client and saved
   to the local cache (same pattern as Telegram/Discord).

Cherry-picked from PR #2353 by williamtwomey.

Co-authored-by: williamtwomey <williamtwomey@users.noreply.github.com>
2026-03-22 09:27:25 -07:00
Teknium
55a27a3fb8 Merge pull request #2517 from NousResearch/hermes/hermes-31d7db3b
fix(telegram): auto-reconnect polling after network interruption
2026-03-22 09:19:10 -07:00
Teknium
8587cddd6c chore: remove unused imports, dead code, and stale comments (#2509)
chore: remove unused imports, dead code, and stale comments
2026-03-22 09:18:58 -07:00
Teknium
2bd8e5cb23 fix(telegram): auto-reconnect polling after network interruption
Closes #2476

The polling error callback previously only handled Conflict errors
(409 from multiple getUpdates callers). All other errors, including
NetworkError and TimedOut that python-telegram-bot raises when the
host loses connectivity (Mac sleep, WiFi switch, VPN reconnect),
were logged and silently discarded. The bot would stop responding
until manually restarted.

Fix:
- Add _looks_like_network_error() to classify transient connectivity
  errors (NetworkError, TimedOut, OSError, ConnectionError).
- Add _handle_polling_network_error() with exponential back-off
  reconnect: retries up to 10 times with delays 5s, 10s, 20s, 40s,
  60s (capped). On exhaustion, marks the adapter retryable-fatal so
  launchd/systemd can restart the gateway process.
- Refactor _polling_error_callback() to route network errors to the
  new handler before falling through to a generic error log.
- Track _polling_network_error_count (reset on successful reconnect)
  independently from _polling_conflict_count.
2026-03-22 09:18:58 -07:00
Teknium
bfe4baa6ed chore: remove unused imports, dead code, and stale comments
Mechanical cleanup — no behavior changes.

Unused imports removed:
- model_tools.py: import os
- run_agent.py: OPENROUTER_MODELS_URL, get_model_context_length
- cli.py: Table, VERSION, RELEASE_DATE, resolve_toolset, get_skill_commands
- terminal_tool.py: signal, uuid, tempfile, set_interrupt_event,
  DANGEROUS_PATTERNS, _load_permanent_allowlist, _detect_dangerous_command

Dead code removed:
- toolsets.py: print_toolset_tree() (zero callers)
- browser_tool.py: _get_session_name() (never called)

Stale comments removed:
- toolsets.py: duplicated/garbled comment line
- web_tools.py: 3 aspirational TODO comments from early development
2026-03-22 08:33:34 -07:00
Teknium
72a6d7dffe fix(model_metadata): skip endpoint probe for known providers (Copilot context bug) (#2507)
The context length resolver was querying the /models endpoint for known
providers like GitHub Copilot, which returns a provider-imposed limit
(128k) instead of the model's actual context window (400k for gpt-5.4).
Since this check happened before the models.dev lookup, the wrong value
won every time.

Fix:
- Add api.githubcopilot.com and models.github.ai to _URL_TO_PROVIDER
- Skip the endpoint metadata probe for known providers — their /models
  data is unreliable for context length. models.dev has the correct
  per-provider values.

Reported by danny [DUMB] — gpt-5.4 via Copilot was resolving to 128k
instead of the correct 400k from models.dev.
2026-03-22 08:15:06 -07:00
Teknium
afe2f0abe1 feat(discord): add document caching and text-file injection (#2503)
- Download and cache .pdf, .docx, .xlsx, .pptx attachments locally
  instead of passing expiring CDN URLs to the agent
- Inject .txt and .md content (≤100 KB) into event.text so the agent
  sees file content without needing to fetch the URL
- Add 20 MB size guard and SUPPORTED_DOCUMENT_TYPES allowlist
- Fix: unsupported types (.zip etc.) no longer get MessageType.DOCUMENT
- Add 9 unit tests in test_discord_document_handling.py

Mirrors the Slack implementation from PR #784. Discord CDN URLs are
publicly accessible so no auth header is needed (unlike Slack).

Co-authored-by: Dilee <uzmpsk.dilekakbas@gmail.com>
2026-03-22 07:38:14 -07:00
Teknium
09fd007c6e Merge pull request #2482 from NousResearch/hermes/hermes-5d6932ba
feat(cli): Claude Code-style @ context completions
2026-03-22 06:33:16 -07:00
Teknium
24cf2a7954 Merge pull request #2488 from NousResearch/hermes/hermes-31d7db3b
fix(tests): resolve all consistently failing tests
2026-03-22 06:24:48 -07:00
Teknium
be3eb62047 fix(tests): resolve all consistently failing tests
- test_plugins.py: remove tests for unimplemented plugin command API
  (get_plugin_command_handler, register_command never existed)
- test_redact.py: add autouse fixture to clear HERMES_REDACT_SECRETS
  env var leaked by cli.py import in other tests
- test_signal.py: same HERMES_REDACT_SECRETS fix for phone redaction
- test_mattermost.py: add @bot_user_id to test messages after the
  mention-only filter was added in #2443
- test_context_token_tracking.py: mock resolve_provider_client for
  openai-codex provider that requires real OAuth credentials

Full suite: 5893 passed, 0 failed.
2026-03-22 05:58:26 -07:00
Teknium
9c32fed184 feat(cli): Claude Code-style @ context completions
Based on PR #2454 by @kshitijk4poor (reimplemented lean — 127 lines
vs original 715).

Type @ in the CLI input to get autocomplete suggestions for context
references:
- Static: @diff, @staged, @file:, @folder:, @git:, @url:
- @file:path and @folder:path browse the filesystem
- Bare @ or @partial shows matching files/folders from cwd

Dropped from original: .hermesignore walking, custom shell tokenizer,
PathToken dataclass, fuzzy matching, token estimates. Kept: all
user-facing functionality.
2026-03-22 05:32:04 -07:00
Teknium
6435d69a6d fix: make vision_analyze timeout configurable via config.yaml (#2480)
Reads auxiliary.vision.timeout from config.yaml (default: 30s) and
passes it to async_call_llm. Useful for slow local vision models
that need more than 30 seconds.

Setting is in config.yaml (not .env) since it's not a secret:

  auxiliary:
    vision:
      timeout: 120

Based on PR #2306.

Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
2026-03-22 05:28:24 -07:00
Teknium
a2276177a3 Merge pull request #2475 from NousResearch/hermes/hermes-31d7db3b
docs(honcho): add self-hosted / Docker configuration section
2026-03-22 05:03:34 -07:00
Teknium
ebd0291ef2 docs(honcho): add self-hosted / Docker configuration section
Document HONCHO_BASE_URL for users running a local Honcho instance.
Both hermes config and ~/.honcho/config.json paths are covered.

Closes #2318
2026-03-22 05:03:17 -07:00
Teknium
0510ee056d chore: add minimax-m2.7 to model catalogs (#2474)
* fix: respect DashScope v1 runtime mode for alibaba

Remove the hardcoded Alibaba branch from resolve_runtime_provider()
that forced api_mode='anthropic_messages' regardless of the base URL.

Alibaba now goes through the generic API-key provider path, which
auto-detects the protocol from the URL:
- /apps/anthropic → anthropic_messages (via endswith check)
- /v1 → chat_completions (default)

This fixes Alibaba setup with OpenAI-compatible DashScope endpoints
(e.g. coding-intl.dashscope.aliyuncs.com/v1) that were broken because
runtime always forced Anthropic mode even when setup saved a /v1 URL.

Based on PR #2024 by @kshitijk4poor.

* docs(skill): add split, merge, search examples to ocr-and-documents skill

Adds pymupdf examples for PDF splitting, merging, and text search
to the existing ocr-and-documents skill. No new dependencies — pymupdf
already covers all three operations natively.

* fix: replace all production print() calls with logger in rl_training_tool

Replace all bare print() calls in production code paths with proper logger calls.

- Add `import logging` and module-level `logger = logging.getLogger(__name__)`
- Replace print() in _start_training_run() with logger.info()
- Replace print() in _stop_training_run() with logger.info()
- Replace print(Warning/Note) calls with logger.warning() and logger.info()

Using the logging framework allows log level filtering, proper formatting,
and log routing instead of always printing to stdout.

* fix(gateway): process /queue'd messages after agent completion

/queue stored messages in adapter._pending_messages but never consumed
them after normal (non-interrupted) completion. The consumption path
at line 5219 only checked pending messages when result.get('interrupted')
was True — since /queue deliberately doesn't interrupt, queued messages
were silently dropped.

Now checks adapter._pending_messages after both interrupted AND normal
completion. For queued messages (non-interrupt), the first response is
delivered before recursing to process the queued follow-up. Skips the
direct send when streaming already delivered the response.

Reported by GhostMode on Discord.

* chore: add minimax/minimax-m2.7 to OpenRouter and MiniMax model catalogs

---------

Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
Co-authored-by: memosr.eth <96793918+memosr@users.noreply.github.com>
2026-03-22 05:00:25 -07:00
Teknium
44b572a9e0 fix: defer streaming iteration linebreak to prevent blank line stacking (#2473)
fix: defer streaming iteration linebreak to prevent blank line stacking
2026-03-22 04:59:40 -07:00
MacroAnarchy
f9c2ad48c2 fix: defer streaming iteration linebreak to prevent blank line stacking
Follow-up to 669c60a6 (cherry-pick of PR #2187, fixes #2177).

The original fix emits a "\n\n" delta immediately after every
_execute_tool_calls() invocation. When the model runs multiple
consecutive tool iterations before producing text (common with
search → read → analyze flows), each iteration appends its own
paragraph break, resulting in 4-6+ blank lines before the actual
response.

Replace the immediate delta with a deferred flag
(_stream_needs_break). _fire_stream_delta() checks the flag and
prepends a single "\n\n" only when the first real text delta
arrives, so multiple back-to-back tool iterations still produce
exactly one paragraph break.
2026-03-22 04:59:12 -07:00
Teknium
c275aa4732 Merge pull request #2465 from NousResearch/hermes/hermes-31d7db3b
feat(cli): MCP server management CLI + OAuth 2.1 PKCE auth
2026-03-22 04:56:48 -07:00
Teknium
ff071fc74c fix(gateway): process /queue'd messages after agent completion (#2469)
* fix: respect DashScope v1 runtime mode for alibaba

Remove the hardcoded Alibaba branch from resolve_runtime_provider()
that forced api_mode='anthropic_messages' regardless of the base URL.

Alibaba now goes through the generic API-key provider path, which
auto-detects the protocol from the URL:
- /apps/anthropic → anthropic_messages (via endswith check)
- /v1 → chat_completions (default)

This fixes Alibaba setup with OpenAI-compatible DashScope endpoints
(e.g. coding-intl.dashscope.aliyuncs.com/v1) that were broken because
runtime always forced Anthropic mode even when setup saved a /v1 URL.

Based on PR #2024 by @kshitijk4poor.

* docs(skill): add split, merge, search examples to ocr-and-documents skill

Adds pymupdf examples for PDF splitting, merging, and text search
to the existing ocr-and-documents skill. No new dependencies — pymupdf
already covers all three operations natively.

* fix: replace all production print() calls with logger in rl_training_tool

Replace all bare print() calls in production code paths with proper logger calls.

- Add `import logging` and module-level `logger = logging.getLogger(__name__)`
- Replace print() in _start_training_run() with logger.info()
- Replace print() in _stop_training_run() with logger.info()
- Replace print(Warning/Note) calls with logger.warning() and logger.info()

Using the logging framework allows log level filtering, proper formatting,
and log routing instead of always printing to stdout.

* fix(gateway): process /queue'd messages after agent completion

/queue stored messages in adapter._pending_messages but never consumed
them after normal (non-interrupted) completion. The consumption path
at line 5219 only checked pending messages when result.get('interrupted')
was True — since /queue deliberately doesn't interrupt, queued messages
were silently dropped.

Now checks adapter._pending_messages after both interrupted AND normal
completion. For queued messages (non-interrupt), the first response is
delivered before recursing to process the queued follow-up. Skips the
direct send when streaming already delivered the response.

Reported by GhostMode on Discord.

---------

Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
Co-authored-by: memosr.eth <96793918+memosr@users.noreply.github.com>
2026-03-22 04:56:13 -07:00
Teknium
8d528e0045 fix(api_server): persist ResponseStore to SQLite across restarts (#2472)
The /v1/responses endpoint used an in-memory OrderedDict that lost
all conversation state on gateway restart. Replace with SQLite-backed
storage at ~/.hermes/response_store.db.

- Responses and conversation name mappings survive restarts
- Same LRU eviction behavior (configurable max_size)
- WAL mode for concurrent read performance
- Falls back to in-memory SQLite if disk path unavailable
- Conversation name→response_id mapping moved into the store
2026-03-22 04:56:06 -07:00
Teknium
fd32e3d6e8 revert: remove trailing empty assistant message stripping (#2471)
revert: remove trailing empty assistant message stripping
2026-03-22 04:55:58 -07:00
Teknium
34be3f8be6 revert: remove trailing empty assistant message stripping
Reverts the sanitizer addition from PR #2466 (originally #2129).
We already have _empty_content_retries handling for reasoning-only
responses. The trailing strip risks silently eating valid messages
and is redundant with existing empty-content handling.
2026-03-22 04:55:34 -07:00
Teknium
3037450c77 Merge pull request #2468 from NousResearch/hermes/hermes-5d6932ba
feat(discord): persistent typing indicator for DMs
2026-03-22 04:53:32 -07:00
Teknium
b7091f93b1 feat(cli): MCP server management CLI + OAuth 2.1 PKCE auth
Add hermes mcp add/remove/list/test/configure CLI for managing MCP
server connections interactively. Discovery-first 'add' flow connects,
discovers tools, and lets users select which to enable via curses checklist.

Add OAuth 2.1 PKCE authentication for MCP HTTP servers (RFC 7636).
Supports browser-based and manual (headless) authorization, token
caching with 0600 permissions, automatic refresh. Zero external deps.

Add ${ENV_VAR} interpolation in MCP server config values, resolved
from os.environ + ~/.hermes/.env at load time.

Core OAuth module from PR #2021 by @imnotdev25. CLI and mcp_tool
wiring rewritten against current main. Closes #497, #690.
2026-03-22 04:52:52 -07:00
Teknium
ab3cbfc99d feat(discord): persistent typing indicator for DMs
Based on PR #2427 by @oxngon (core feature extracted, reformatting
and unrelated changes dropped).

Discord's TYPING_START gateway event is unreliable for bot DMs. This
adds a background typing loop that hits POST /channels/{id}/typing
every 8 seconds (indicator lasts ~10s) until the response is sent.

- send_typing() starts a per-channel background loop (idempotent)
- stop_typing() cancels it (called after _run_agent returns)
- Base adapter gets stop_typing() as a no-op default
- Per-channel tracking via _typing_tasks dict prevents duplicates
2026-03-22 04:52:33 -07:00
Teknium
26030266d2 docs: Gemini OAuth provider implementation plan (#2467)
* docs: add Gemini OAuth provider implementation plan

Planning doc for a standard-route Gemini provider using Google OAuth
(Authorization Code + PKCE) with the OpenAI-compatible endpoint at
generativelanguage.googleapis.com. Covers OAuth flow, token lifecycle,
file list, and estimated scope (~700 lines).

Replaces the Node.js bridge approach from PR #2042.

* chore: update OpenRouter model list

- Add xiaomi/mimo-v2-pro
- Add nvidia/nemotron-3-super-120b-a12b (paid, higher rate limits)
- Remove openrouter/hunter-alpha and openrouter/healer-alpha (discontinued)
2026-03-22 04:46:05 -07:00
Teknium
edda0e324b fix: batch of 5 small contributor fixes (#2466)
fix: batch of 5 small contributor fixes — PortAudio, SafeWriter, IMAP, thread lock, prefill
2026-03-22 04:40:20 -07:00
ygd58
5407d12bc6 fix(agent): strip trailing empty assistant messages before API calls to prevent prefill rejection 2026-03-22 04:38:17 -07:00
Hermes
2de42ba690 fix(state): add missing thread lock to session_count() and message_count()
Both methods accessed self._conn without self._lock, breaking the
thread-safety contract documented on SessionDB (line 111). All 22 other
DB methods use with self._lock — these two were the only exceptions.

In the gateway's multi-threaded environment (multiple platform reader
threads + single writer) this could cause cursor interleaving,
sqlite3.ProgrammingError, or inconsistent COUNT results.

Closes #2130
2026-03-22 04:38:17 -07:00
Hermes
f3301a31d5 fix(email): guard against IndexError when IMAP search returns empty list
imap.uid('search') can return data=[] when the mailbox is empty or
has no matching messages. Accessing data[0] without checking len first
raises IndexError: list index out of range.

Fixed at both call sites in gateway/platforms/email.py:
- Line 233 (connect): ALL search on startup
- Line 298 (fetch): UNSEEN search in the polling loop

Closes #2137
2026-03-22 04:38:17 -07:00
Bartok Moltbot
e6a708aa04 fix(io): catch ValueError in _SafeWriter for closed file handles (#2428)
When subagents run in ThreadPoolExecutor threads, the shared stdout handle
can close between thread teardown and KawaiiSpinner cleanup. Python raises
ValueError (not OSError) for I/O operations on closed files:
  ValueError: I/O operation on closed file

The _SafeWriter class was only catching OSError, missing this case.

Changes:
- Add ValueError to exception handling in write(), flush(), and isatty()
- Update docstring to document the ThreadPoolExecutor teardown scenario

Fixes #2428
2026-03-22 04:38:17 -07:00
Ivelin Tenev
e80489135b fix: improve error message when PortAudio system library is missing
When sounddevice is installed but libportaudio2 is not present on the
system, the OSError was caught together with ImportError and showed a
generic 'pip install sounddevice' message that sent users down the wrong
path.

Split the except clause to give a clear, actionable message for the
OSError case, including the correct apt/brew commands to install the
system library.
2026-03-22 04:38:17 -07:00
Teknium
a53db44d40 fix(compression): remove hardcoded gemini-3-flash-preview as default summary model (#2464)
fix(compression): remove hardcoded gemini-3-flash-preview as default summary model
2026-03-22 04:37:02 -07:00
Mibayy
0698ddb496 fix(compression): remove hardcoded gemini-3-flash-preview as default summary model
Closes #2453

The DEFAULT_CONFIG was hardcoding google/gemini-3-flash-preview as the
summary_model for context compression. This caused unexpected OpenRouter
charges for users who configured a different provider/model, because the
compression task would silently fall back to gemini via OpenRouter even
when the user's main model was on a different provider.

Fix: change summary_model default to empty string. When empty,
call_llm() resolves the model through the standard auto-detection chain
(auxiliary.compression config -> env vars -> main provider), which
correctly uses the user's configured provider and model.

Users who want a dedicated cheap model for compression can still
explicitly set compression.summary_model in their config.yaml.
2026-03-22 04:36:36 -07:00
Teknium
0962cbb2e5 fix: /stop command crash + UnboundLocalError in streaming media delivery (#2463)
fix: /stop command crash + UnboundLocalError in streaming media delivery
2026-03-22 04:35:57 -07:00
Teknium
f69c47d9ae fix: /stop command crash + UnboundLocalError in streaming media delivery
Two fixes:

1. CLI /stop command crashed with 'cannot import name get_registry' —
   the code imported a non-existent function. Fixed to use the actual
   process_registry singleton and list_sessions() method.
   (Reported in #2458 by haiyuzhong1980)

2. Streaming media delivery used undefined 'adapter' variable —
   our PR #2382 called _deliver_media_from_response(adapter=adapter)
   but 'adapter' wasn't guaranteed to be defined in that scope.
   Fixed to resolve via self.adapters.get(source.platform).
   (Reported in #2424 by 42-evey)
2026-03-22 04:35:27 -07:00
Teknium
027fc1a85a fix: replace production print() calls with logger in rl_training_tool (salvage #1981) (#2462)
* fix: respect DashScope v1 runtime mode for alibaba

Remove the hardcoded Alibaba branch from resolve_runtime_provider()
that forced api_mode='anthropic_messages' regardless of the base URL.

Alibaba now goes through the generic API-key provider path, which
auto-detects the protocol from the URL:
- /apps/anthropic → anthropic_messages (via endswith check)
- /v1 → chat_completions (default)

This fixes Alibaba setup with OpenAI-compatible DashScope endpoints
(e.g. coding-intl.dashscope.aliyuncs.com/v1) that were broken because
runtime always forced Anthropic mode even when setup saved a /v1 URL.

Based on PR #2024 by @kshitijk4poor.

* docs(skill): add split, merge, search examples to ocr-and-documents skill

Adds pymupdf examples for PDF splitting, merging, and text search
to the existing ocr-and-documents skill. No new dependencies — pymupdf
already covers all three operations natively.

* fix: replace all production print() calls with logger in rl_training_tool

Replace all bare print() calls in production code paths with proper logger calls.

- Add `import logging` and module-level `logger = logging.getLogger(__name__)`
- Replace print() in _start_training_run() with logger.info()
- Replace print() in _stop_training_run() with logger.info()
- Replace print(Warning/Note) calls with logger.warning() and logger.info()

Using the logging framework allows log level filtering, proper formatting,
and log routing instead of always printing to stdout.

---------

Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
Co-authored-by: memosr.eth <96793918+memosr@users.noreply.github.com>
2026-03-22 04:35:23 -07:00
Teknium
f84230527c docs(skill): add split, merge, search examples to ocr-and-documents skill (#2461)
* fix: respect DashScope v1 runtime mode for alibaba

Remove the hardcoded Alibaba branch from resolve_runtime_provider()
that forced api_mode='anthropic_messages' regardless of the base URL.

Alibaba now goes through the generic API-key provider path, which
auto-detects the protocol from the URL:
- /apps/anthropic → anthropic_messages (via endswith check)
- /v1 → chat_completions (default)

This fixes Alibaba setup with OpenAI-compatible DashScope endpoints
(e.g. coding-intl.dashscope.aliyuncs.com/v1) that were broken because
runtime always forced Anthropic mode even when setup saved a /v1 URL.

Based on PR #2024 by @kshitijk4poor.

* docs(skill): add split, merge, search examples to ocr-and-documents skill

Adds pymupdf examples for PDF splitting, merging, and text search
to the existing ocr-and-documents skill. No new dependencies — pymupdf
already covers all three operations natively.

---------

Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
2026-03-22 04:31:22 -07:00
Teknium
0e64a48743 Merge pull request #2460 from NousResearch/hermes/hermes-5d6932ba
fix(discord): properly route slash event handling in threads
2026-03-22 04:28:53 -07:00
Teknium
ffa8b562e9 fix(discord): properly route slash event handling in threads
Cherry-picked from PR #2017 by @simpolism. Fixes #2011.

Discord slash commands in threads were missing thread_id in the
SessionSource, causing them to route to the parent channel session.
Commands like /usage and /reset returned wrong data or affected the
wrong session.

Detects discord.Thread channels in _build_slash_event and sets
chat_type='thread' with thread_id. Two tests added.
2026-03-22 04:25:19 -07:00
Teknium
56b0104154 fix: respect DashScope v1 runtime mode for alibaba (#2459)
Remove the hardcoded Alibaba branch from resolve_runtime_provider()
that forced api_mode='anthropic_messages' regardless of the base URL.

Alibaba now goes through the generic API-key provider path, which
auto-detects the protocol from the URL:
- /apps/anthropic → anthropic_messages (via endswith check)
- /v1 → chat_completions (default)

This fixes Alibaba setup with OpenAI-compatible DashScope endpoints
(e.g. coding-intl.dashscope.aliyuncs.com/v1) that were broken because
runtime always forced Anthropic mode even when setup saved a /v1 URL.

Based on PR #2024 by @kshitijk4poor.

Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
2026-03-22 04:24:43 -07:00
Teknium
c0c13e4ed4 fix(api-server): harden jobs API — input limits, field whitelist, startup check, tests (#2456)
fix(api-server): harden jobs API — input limits, field whitelist, startup check, tests
2026-03-22 04:18:45 -07:00
Teknium
89befcaf33 fix(cron): support Telegram topic delivery via platform:chat_id:thread_id format (#2455)
Parse thread_id from explicit deliver target (e.g. telegram:-1003724596514:17)
and forward it to _send_to_platform and mirror_to_session.

Previously _resolve_delivery_target() always set thread_id=None when
parsing the platform:chat_id format, breaking cron job delivery to
specific Telegram topics.

Added tests:
- test_explicit_telegram_topic_target_with_thread_id
- test_explicit_telegram_chat_id_without_thread_id

Also updated CRONJOB_SCHEMA deliver description to document the
platform:chat_id:thread_id format.

Co-authored-by: Alex Ferrari <alex@thealexferrari.com>
2026-03-22 04:18:28 -07:00
Teknium
0f1c970179 fix(api-server): harden jobs API — input limits, field whitelist, startup check, tests
Five improvements to the /api/jobs endpoints:

1. Startup availability check — cron module imported once at class load,
   endpoints return 501 if unavailable (not 500 per-request import error)
2. Input limits — name ≤ 200 chars, prompt ≤ 5000 chars, repeat must be
   positive int
3. Update field whitelist — only name/schedule/prompt/deliver/skills/
   repeat/enabled pass through to cron.jobs.update_job, preventing
   arbitrary key injection
4. Deduplicated validation — _check_job_id and _check_jobs_available
   helpers replace repeated boilerplate
5. 32 new tests covering all endpoints, validation, auth, and
   cron-unavailable cases
2026-03-22 04:18:18 -07:00
Teknium
57d3ac0c0b Merge pull request #2452 from NousResearch/hermes/hermes-5d6932ba
fix(deps): add dingtalk-stream to optional dependencies
2026-03-22 04:12:36 -07:00
Teknium
a9f9c60efd fix(deps): add dingtalk-stream to optional dependencies
Cherry-picked from PR #2065 by @ygd58. Fixes #2062.

dingtalk-stream was required by gateway/platforms/dingtalk.py but not
listed in pyproject.toml, causing ImportError on pip install .[all].
Adds dingtalk extras group following the same pattern as slack/sms/etc.
2026-03-22 04:08:49 -07:00
Teknium
e109a8b502 fix(security): block untrusted browser access to api server (#2451)
Co-authored-by: ifrederico <fr@tecompanytea.com>
2026-03-22 04:08:48 -07:00
Teknium
b81926def6 feat(api-server): add /api/jobs endpoints for cron job management (#2450)
feat(api-server): add /api/jobs endpoints for cron job management
2026-03-22 04:07:22 -07:00
Teknium
8cb7864110 fix: resolve garbled ANSI escape codes in status printouts (#2262) (#2448)
Two related root causes for the '?[33mTool progress: NEW?[0m' garbling
reported on kitty, alacritty, ghostty and gnome-console:

1. /verbose label printing used self.console.print() with Rich markup
   ([yellow]...[/]).  self.console is a plain Rich Console() whose output
   goes directly to sys.stdout, which patch_stdout's StdoutProxy
   intercepts and mangles raw ANSI sequences.

2. Context pressure status lines (e.g. 'approaching compaction') from
   AIAgent._safe_print() had the same problem -- _safe_print() was a
   @staticmethod that always called builtin print(), bypassing the
   prompt_toolkit renderer entirely.

Fix:
- Convert AIAgent._safe_print() from @staticmethod to an instance method
  that delegates to self._print_fn (defaults to builtin print, preserving
  all non-CLI behaviour).
- After the CLI creates its AIAgent instance, wire self.agent._print_fn to
  the existing _cprint() helper which routes through
  prompt_toolkit.print_formatted_text(ANSI(text)).
- Rewrite the /verbose feedback labels to use hermes_cli.colors.Colors
  ANSI constants in f-strings and emit them via _cprint() directly,
  removing the Rich-markup-inside-patch_stdout anti-pattern.

Fixes #2262

Co-authored-by: Animesh Mishra <animesh.m.7523@gmail.com>
2026-03-22 04:07:06 -07:00
Teknium
7cd9f9ed48 feat(api-server): add /api/jobs endpoints for cron job management
CRUD + actions for cron jobs on the existing API server (port 8642):
  GET    /api/jobs              — list jobs
  POST   /api/jobs              — create job
  GET    /api/jobs/{id}         — get job
  PATCH  /api/jobs/{id}         — update job
  DELETE /api/jobs/{id}         — delete job
  POST   /api/jobs/{id}/pause   — pause job
  POST   /api/jobs/{id}/resume  — resume job
  POST   /api/jobs/{id}/run     — trigger immediate run

All endpoints use existing API_SERVER_KEY auth. Job ID format
validated (12 hex chars). Logic ported from PR #2111 by nock4,
adapted from FastAPI to aiohttp on the existing API server.
2026-03-22 04:06:57 -07:00
Teknium
2c2334d4db Merge pull request #2449 from NousResearch/hermes/hermes-31d7db3b
fix(cron): scale missed-job grace window with schedule frequency
2026-03-22 04:04:42 -07:00
Teknium
21ffadc2a6 fix: dynamic grace window for missed cron job catch-up
Replace hardcoded 120-second grace period with a dynamic window that
scales with the job's scheduling frequency (half the period, clamped
to [120s, 2h]). Daily jobs now catch up if missed by up to 2 hours
instead of being silently skipped after just 2 minutes.
2026-03-22 04:04:24 -07:00
Teknium
241f966b1a Merge pull request #2447 from NousResearch/hermes/hermes-5d6932ba
fix: skills hub inspect/resolve — 4 bugs in inspect, redirects, discovery, tap list
2026-03-22 04:04:19 -07:00
Teknium
7d0e4510b8 fix: skills hub inspect/resolve — 4 bugs
Cherry-picked from PR #2122 by @AtlasMeridia.

1. do_inspect bytes crash: bundle.files returns bytes for official
   skills, .split() expected str. Added decode guard.
2. GitHub redirects: three httpx.get calls missing follow_redirects=True,
   causing silent 301 failures on renamed orgs.
3. Skill discovery fallback: scan repo root directories when standard
   paths (skills/, .agents/skills/, .claude/skills/) miss.
4. tap list KeyError: t['repo'] crashes for local taps. Use safe .get().
2026-03-22 04:03:28 -07:00
Teknium
306e67f32d fix: fail fast when explicit provider has no API key instead of silent OpenRouter fallback (#2445)
When a non-OpenRouter provider (e.g. minimax, anthropic) is set in
config.yaml but its API key is missing, Hermes silently fell back to
OpenRouter, causing confusing 404 errors.

Now checks if the user explicitly configured a provider before falling
back. Explicit providers raise RuntimeError with a clear message naming
the missing env var. Auto/openrouter/custom providers still fall through
to OpenRouter as before.

Three code paths fixed:
- run_agent.py AIAgent.__init__ — main client initialization
- auxiliary_client.py call_llm — sync auxiliary calls
- auxiliary_client.py call_llm_streaming — async auxiliary calls

Based on PR #2272 by @StefanIsMe. Applied manually to fix a
pconfig NameError in the original and extend to call_llm_streaming.

Co-authored-by: StefanIsMe <StefanIsMe@users.noreply.github.com>
2026-03-22 03:59:29 -07:00
Teknium
5c8d7d5d6f fix(skills_guard): agent-created dangerous skills ask instead of block (#2446)
fix(skills_guard): agent-created dangerous skills ask instead of block
2026-03-22 03:56:30 -07:00
Teknium
0b370f2dd9 fix(skills_guard): agent-created dangerous skills ask instead of block
Changes the policy for agent-created skills with critical security
findings from 'block' (silently rejected) to 'ask' (allowed with
warning logged). The agent created the skill, so blocking it entirely
is too aggressive — let it through but log the findings.

- Policy: agent-created dangerous changed from block to ask
- should_allow_install returns None for 'ask' (vs True/False)
- format_scan_report shows 'NEEDS CONFIRMATION' for ask
- skill_manager_tool.py caller handles None (allows with warning)
- force=True still overrides as before

Based on PR #2271 by redhelix (closed — 3200 lines of unrelated
Mission Control code excluded).
2026-03-22 03:56:02 -07:00
Teknium
887e8a8d84 Merge pull request #2444 from NousResearch/hermes/hermes-31d7db3b
fix(tests): replace FakePath with monkeypatch for Python 3.12 compat
2026-03-22 03:52:56 -07:00
Teknium
189214a69d fix(tests): replace FakePath subclass with monkeypatch for Python 3.12 compat
Python 3.12 changed PosixPath.__new__ to ignore the redirected path
argument, breaking the FakePath subclass pattern. Use monkeypatch on
Path.exists instead.

Based on PR #2261 by @dieutx, fixed NameError (bare Path not imported).
2026-03-22 03:52:39 -07:00
Teknium
cd6d24f111 Merge pull request #2443 from NousResearch/hermes/hermes-31d7db3b
feat(gateway): add @-mention-only filter for Mattermost channels
2026-03-22 03:50:35 -07:00
Teknium
c01cfe4f9a fix(cron): silent jobs return empty response for delivery skip (#2442)
Fixes #2234

The placeholder '(No response generated)' was overwriting the actual
final_response, causing it to be delivered to Discord even when the
agent completed work silently via tools.

Changes:
- Separate logged_response for output template display
- Keep final_response clean (empty when agent has no text)
- Delivery logic now correctly skips when final_response is empty

Test added to verify empty response stays empty for delivery.

Co-authored-by: Bartok9 <bartokmagic@proton.me>
2026-03-22 03:50:27 -07:00
Teknium
fbbe9e6030 feat(gateway): add @-mention-only filter for Mattermost channels
The Mattermost adapter now only responds to messages in channels and
groups when the bot is @-mentioned. DMs are always processed without
filtering.

Detection checks both the bot's @username and user ID in the message
text, providing a reliable fallback when the structured mentions field
is unavailable.

Fixes #2174
2026-03-22 03:50:20 -07:00
Teknium
43bca6d107 Merge pull request #2413 from NousResearch/hermes/hermes-5d6932ba
fix: add iteration boundary linebreak to prevent stream concatenation
2026-03-21 19:28:12 -07:00
Teknium
669c60a6bb fix: add iteration boundary linebreak to prevent stream concatenation
Cherry-picked from PR #2187 by @devorun. Fixes #2177.

When streaming is enabled, text before and after tool calls gets
concatenated without separation. Adds a paragraph break delta after
_execute_tool_calls() so stream consumers insert proper whitespace
between iteration boundaries.
2026-03-21 19:19:26 -07:00
Teknium
dd39003a9b Merge pull request #2406 from NousResearch/hermes/hermes-31d7db3b
fix(gateway): detect stopped processes and release stale locks on --replace
2026-03-21 18:16:15 -07:00
Teknium
4bded44b6a fix(gateway): detect stopped processes and release stale locks on --replace 2026-03-21 18:13:53 -07:00
Teknium
ec22635b47 Merge pull request #2403 from NousResearch/hermes/hermes-31d7db3b
fix(model_metadata): use /v1/props endpoint for llama.cpp context detection
2026-03-21 18:07:41 -07:00
Teknium
29d0541ac9 fix(model_metadata): use /v1/props endpoint for llama.cpp context detection
Recent versions of llama.cpp moved the server properties endpoint from
/props to /v1/props (consistent with the /v1 API prefix convention).

The server-type detection path and the n_ctx reading path both used the
old /props URL, which returns 404 on current builds. This caused the
allocated context window size to fall back to a hardcoded default,
resulting in an incorrect (too small) value being displayed in the TUI
context bar.

Fix: try /v1/props first, fall back to /props for backward compatibility
with older llama.cpp builds. Both paths are now handled gracefully.
2026-03-21 18:07:18 -07:00
Teknium
a0f411c87d Merge pull request #2400 from NousResearch/hermes/hermes-5d6932ba
fix(signal): use id instead of attachmentId in getAttachment RPC
2026-03-21 18:05:28 -07:00
Teknium
862d5224dd docs: replace ASCII diagrams with Mermaid/lists, add linting note (#2402)
docs: replace ASCII diagrams with Mermaid/lists, add linting note
2026-03-21 17:58:52 -07:00
Teknium
e664bc7632 docs: replace ASCII diagrams with Mermaid/lists, add linting note
CI enforces ascii-guard linting on docs. Replaced ASCII box diagrams
with Mermaid flowcharts (open-webui architecture) and numbered lists
(CLI layout). Added diagram linting note to website README.

Based on PR #2364 by aydnOktay (closed — README had broken formatting).
2026-03-21 17:58:30 -07:00
Teknium
f9052d7ecf fix(signal): use id instead of attachmentId in getAttachment RPC
Cherry-picked from PR #2365 by @xerpert.

Three bugs preventing Signal image attachments from being processed:
1. signal-cli getAttachment RPC expects 'id', not 'attachmentId'
2. signal-cli daemon returns dict {"data": "base64..."} not raw base64
3. MessageType.IMAGE doesn't exist — correct enum is MessageType.PHOTO
2026-03-21 17:56:12 -07:00
Teknium
7dff34ba4e fix: auxiliary client skips expired Codex JWT and propagates Anthropic OAuth flag (salvage #2378)
fix: auxiliary client skips expired Codex JWT and propagates Anthropic OAuth flag (salvage #2378)
2026-03-21 17:54:19 -07:00
0xbyt4
dbc25a386e fix: auxiliary client skips expired Codex JWT and propagates Anthropic OAuth flag
Two bugs in the auxiliary provider auto-detection chain:

1. Expired Codex JWT blocks the auto chain: _read_codex_access_token()
   returned any stored token without checking expiry, preventing fallback
   to working providers. Now decodes JWT exp claim and returns None for
   expired tokens.

2. Auxiliary Anthropic client missing OAuth identity transforms:
   _AnthropicCompletionsAdapter always called build_anthropic_kwargs with
   is_oauth=False, causing 400 errors for OAuth tokens. Now detects OAuth
   tokens via _is_oauth_token() and propagates the flag through the
   adapter chain.

Cherry-picked from PR #2378 by 0xbyt4. Fixed test_api_key_no_oauth_flag
to mock resolve_anthropic_token directly (env var alone was insufficient).
2026-03-21 17:36:25 -07:00
Teknium
0ea7d0ec80 fix(terminal): log disk warning check failures at debug level (salvage #2372) (#2394)
* fix(terminal): log disk warning check failures at debug level

* fix(terminal): guard _check_disk_usage_warning by moving scratch_dir into try

---------

Co-authored-by: aydnOktay <xaydinoktay@gmail.com>
2026-03-21 17:10:17 -07:00
Teknium
1d28b4699b fix(redact): safely handle non-string inputs (salvage #2369)
fix(redact): safely handle non-string inputs (salvage #2369)
2026-03-21 17:10:14 -07:00
0xbyt4
e0ca46cd73 fix: restore opencode-go provider config corrupted by secret redaction (#2393)
auth_type was "***" instead of "api_key" and api_key_env_vars was
("OPEN...",) instead of ("OPENCODE_GO_API_KEY",). This was introduced
in 35d948b6 when a secret redaction tool masked these values during
the Kilo Code provider commit. OpenCode Go provider was completely
broken as a result.
2026-03-21 17:08:52 -07:00
Teknium
5454a55269 fix(prompt-caching): skip top-level cache_control on role:tool for OpenRouter (#2391)
fix(prompt-caching): skip top-level cache_control on role:tool for OpenRouter
2026-03-21 16:55:23 -07:00
aydnOktay
40c9a13476 fix(redact): safely handle non-string inputs
redact_sensitive_text() now returns early for None and coerces other
non-string values to str before applying regex-based redaction,
preventing TypeErrors in logging/tool-output paths.

Cherry-picked from PR #2369 by aydnOktay.
2026-03-21 16:55:02 -07:00
teyrebaz33
bd49bce278 fix(prompt-caching): skip top-level cache_control on role:tool for OpenRouter
On the native Anthropic Messages API path, convert_messages_to_anthropic()
moves top-level cache_control on role:tool messages inside the tool_result
block. On OpenRouter (chat_completions), no such conversion happens — the
unexpected top-level field causes a silent hang on the second tool call.

Add native_anthropic parameter to _apply_cache_marker() and
apply_anthropic_cache_control(). When False (OpenRouter), role:tool messages
are skipped entirely. When True (native Anthropic), existing behaviour is
preserved.

Fixes #2362
2026-03-21 16:54:43 -07:00
Teknium
52dd479214 Merge pull request #2361 from NousResearch/hermes/hermes-5d6932ba
feat(gateway): cache AIAgent per session for prompt caching
2026-03-21 16:53:21 -07:00
Teknium
c57d5cbdde fix(update): prompt before resetting working tree on stash conflicts (#2390)
When 'hermes update' stashes local changes and the restore hits
conflicts, the previous behavior silently ran 'git reset --hard HEAD'
to clean up. This could surprise users who didn't realize their
working tree was being nuked.

Now the conflict handler:
- Lists the specific conflicted files
- Reassures the user their stash is preserved
- Asks before resetting (interactive mode)
- Auto-resets in non-interactive mode (prompt_user=False)
- If declined, leaves the working tree as-is with guidance
2026-03-21 16:49:19 -07:00
Teknium
525caadd8c fix: prevent Anthropic token leaking to third-party anthropic_messages providers (salvage #2383) (#2389)
* fix: prevent Anthropic token fallback leaking to third-party anthropic_messages providers

When provider is minimax/alibaba/etc and MINIMAX_API_KEY is not set,
the code fell back to resolve_anthropic_token() sending Anthropic OAuth
credentials to third-party endpoints, causing 401 errors.

Now only provider=="anthropic" triggers the fallback. Generalizes the
Alibaba-specific guard from #1739 to all non-Anthropic providers.

* fix: set provider='anthropic' in credential refresh tests

Follow-up for cherry-picked PR #2383 — existing tests didn't set
agent.provider, which the new guard requires to allow Anthropic
token refresh.

---------

Co-authored-by: 0xbyt4 <35742124+0xbyt4@users.noreply.github.com>
2026-03-21 16:42:46 -07:00
Teknium
f9fa7421cb feat: bioinformatics gateway skill — index to 400+ bio skills
feat: bioinformatics gateway skill — index to 400+ bio skills
2026-03-21 16:38:43 -07:00
Teknium
342096b4bd feat(gateway): cache AIAgent per session for prompt caching
The gateway created a fresh AIAgent per message, rebuilding the system
prompt (including memory, skills, context files) every turn. This broke
prompt prefix caching — providers like Anthropic charge ~10x more for
uncached prefixes.

Now caches AIAgent instances per session_key with a config signature.
The cached agent is reused across messages in the same session,
preserving the frozen system prompt and tool schemas. Cache is
invalidated when:
- Config changes (model, provider, toolsets, reasoning, ephemeral
  prompt) — detected via signature mismatch
- /new, /reset, /clear — explicit session reset
- /model — global model change clears all cached agents
- /reasoning — global reasoning change clears all cached agents

Per-message state (callbacks, stream consumers, progress queues) is
set on the agent instance before each run_conversation() call.

This matches CLI behavior where a single AIAgent lives across all turns
in a session, with _cached_system_prompt built once and reused.
2026-03-21 16:21:06 -07:00
Teknium
55510cbad2 Merge pull request #2388 from NousResearch/hermes/hermes-31d7db3b
fix(provider): prevent Anthropic fallback from inheriting non-Anthropic base_url + fix(update): reset on stash conflict
2026-03-21 16:20:08 -07:00
Teknium
3ab50376b0 fix(update): reset working tree when stash restore leaves conflict markers
When `hermes update` stashes local changes and the subsequent
`git stash apply` fails or leaves unmerged files, the conflict markers
(<<<<<<< etc.) were left in the working tree, making Hermes unrunnable
until manually cleaned up.

Now the update command runs `git reset --hard HEAD` to restore a clean
working tree before exiting, and also detects unmerged files even when
git stash apply reports success.

Closes #2348
2026-03-21 16:16:35 -07:00
Teknium
f8fb61d4ad fix(provider): prevent Anthropic fallback from inheriting non-Anthropic base_url
Only honor config.model.base_url for Anthropic resolution when
config.model.provider is actually "anthropic". This prevents a Codex
(or other provider) base_url from leaking into Anthropic runtime and
auxiliary client paths, which would send  requests to the wrong
endpoint.

Closes #2384
2026-03-21 16:16:17 -07:00
Teknium
0d68446323 feat: add bioinformatics gateway skill
Meta-skill that indexes 400+ bioinformatics skills from two open-source
repos (GPTomics/bioSkills and ClawBio/ClawBio) and fetches domain-specific
reference material on demand. Covers genomics, transcriptomics, single-cell,
variant calling, pharmacogenomics, metagenomics, structural biology, and
20+ other computational biology domains.

No dependencies bundled — the skill clones the relevant repo when needed
and reads the domain-specific guides as reference material.
2026-03-21 16:15:24 -07:00
Teknium
81dbf4309a fix(telegram): escape bare parentheses/braces in MarkdownV2 output (#2386)
fix(telegram): escape bare parentheses/braces in MarkdownV2 output
2026-03-21 16:13:34 -07:00
Teknium
febfe1c268 fix(telegram): escape bare parentheses/braces in MarkdownV2 output
The MarkdownV2 format_message conversion left unescaped ( ) { }
in edge cases where placeholder processing didn't cover them (e.g.
partial link matches, URLs with parens). This caused Telegram to
reject the message with 'character ( is reserved and must be escaped'
and fall back to plain text — losing all formatting.

Added a safety-net pass (step 12) after placeholder restoration that
escapes any remaining bare ( ) { } outside code blocks and valid
MarkdownV2 link syntax.
2026-03-21 16:13:13 -07:00
Teknium
2a5f86ed6d Merge pull request #2343 from NousResearch/hermes/hermes-31d7db3b
feat: @ context references + Honcho config fixes
2026-03-21 16:10:19 -07:00
Tenzin Jampa
d3659c8ca0 fix(gateway): /title command fails when session doesn't exist in SQLite yet (#2379)
The /title command would fail with 'Session not found in database.' when
used as the first command in a new session. This happened because:

1. Gateway creates session in session_store (in-memory)
2. But SQLite _session_db only gets sessions when agent flushes messages
3. set_session_title() does UPDATE which fails if row doesn't exist

Now we check if session exists in SQLite and create it if needed before
attempting to set the title.

Fixes: Session not found in database. error on /title in new chats
2026-03-21 16:04:53 -07:00
Teknium
f7f75de7c3 fix(gateway): deliver MEDIA: files after streaming responses (#2382)
fix(gateway): deliver MEDIA: files after streaming responses
2026-03-21 16:01:47 -07:00
Teknium
f58902818d fix(gateway): deliver MEDIA: files after streaming responses
When streaming is enabled, text chunks are sent to the user in
real-time including raw MEDIA: tags. The normal post-processing in
_process_message_background is skipped when already_sent=True, so
MEDIA: files were never extracted or delivered — the user just saw
the raw MEDIA:/path/to/file text.

Fix: after streaming completes, extract MEDIA: tags and local file
paths from the response and deliver them via the platform adapter.
The text is already sent (with the raw tag visible in the stream),
but the actual files now get delivered as attachments.
2026-03-21 16:01:25 -07:00
Teknium
8da410ed95 feat(plugins): add slash command registration for plugins (#2359)
Plugins can now register slash commands via ctx.register_command()
in their register() function. Commands automatically appear in:
- /help and COMMANDS_BY_CATEGORY (under 'Plugins' category)
- Tab autocomplete in CLI
- Telegram bot menu
- Slack subcommand mapping
- Gateway dispatch

Handler signature: handler(args: str) -> str | None
Async handlers are supported in gateway context.

Changes:
- commands.py: add register_plugin_command() and rebuild_lookups()
- plugins.py: add register_command() to PluginContext, track in
  PluginManager._plugin_commands and LoadedPlugin.commands_registered
- cli.py: dispatch plugin commands in process_command()
- gateway/run.py: dispatch plugin commands before skill commands
- tests: 5 new tests for registration, help, tracking, handler, gateway
- docs: update plugins feature page and build guide
2026-03-21 16:00:30 -07:00
Teknium
da44c196b6 feat: @ context references — inline file, folder, diff, git, and URL injection
Add @file:path, @folder:dir, @diff, @staged, @git:N, and @url:
references that expand inline before the message reaches the LLM.
Supports line ranges (@file:main.py:10-50), token budget enforcement
(soft warn at 25%, hard block at 50%), and path sandboxing for gateway.

Core module from PR #2090 by @kshitijk4poor. CLI and gateway wiring
rewritten against current main. Fixed asyncio.run() crash when called
from inside a running event loop (gateway).

Closes #682.
2026-03-21 15:57:13 -07:00
Teknium
36079c6646 fix(tools): fix resource leak and double socket close in code_execution_tool (#2381)
Two fixes:
1. Use a single open(os.devnull) handle for both stdout and stderr
   suppression, preventing a file handle leak if the second open() fails.
2. Set server_sock = None after closing it in the try block to prevent
   the finally block from closing it again (causing an OSError).

Closes #2136

Co-authored-by: dieutx <dangtc94@gmail.com>
2026-03-21 15:55:25 -07:00
Teknium
135448f513 fix: ignore placeholder provider keys in provider activation checks (salvage #2121)
fix: ignore placeholder provider keys in provider activation checks (salvage #2121)
2026-03-21 15:54:59 -07:00
Teknium
2e143fd15c fix(acp): preserve session provider when switching models (#2380)
fix(acp): preserve session provider when switching models
2026-03-21 15:54:42 -07:00
Gutslabs
0b9526b476 fix(acp): preserve session provider when switching models 2026-03-21 15:54:10 -07:00
aashizpoudel
f304bc63b8 fix: ignore placeholder provider keys in provider activation checks
Add has_usable_secret() to reject empty, short (<4 char), and common
placeholder API key values (changeme, your_api_key, placeholder, etc.)
throughout the auth/runtime resolution chain.

Update list_available_providers() to use provider-specific auth status
via get_auth_status() instead of resolve_runtime_provider(), preventing
cross-provider key fallback from making providers appear available when
they aren't actually configured.

Preserve keyless custom endpoint support by checking via base URL.

Cherry-picked from PR #2121 by aashizpoudel.
2026-03-21 12:55:42 -07:00
Teknium
decc7851f2 fix(cli): pass conversation_history in quiet mode with --resume (#2357)
fix(cli): pass conversation_history in quiet mode with --resume
2026-03-21 12:51:56 -07:00
christopher-kapic
97108db038 fix(cli): pass conversation_history in quiet mode with --resume
hermes chat -q 'msg' --resume SESSION_ID loaded the session history
but never passed it to run_conversation(), so the model responded
without prior context. The interactive mode already does this correctly.

Based on work by christopher-kapic in PR #2081. Fixes #2106.
2026-03-21 12:51:34 -07:00
Teknium
1f1fa71d0c feat(skill): meme-generation — real image generator with Pillow (#2344)
* feat: add meme-generation skill

* Reduce meme skill prompt cost with tighter selection rules

* feat(skill): overhaul meme-generation into real image generator

Move from skills/creative/ to optional-skills/creative/ (niche skill,
not needed by default). Replace prompt-only meme concept brainstormer
with actual meme image generation:

- Python script using Pillow to overlay text on template images
- 10 curated templates with hand-tuned text positioning
- Dynamic access to ~100 popular imgflip templates via public API
- Custom image mode (--image): use AI-generated or any image as base
- Two text modes: overlay (white+outline on image) or bars (black bars)
- Vision verification workflow: use vision_analyze to QA the result
- Auto-scaling font with pixel-accurate word wrapping
- Template search via --search
- No API keys required

Original skill concept by adanaleycio (PR #1771), overhauled with
image generation and custom image support.

---------

Co-authored-by: adanaleycio <atillababa767@gmail.com>
2026-03-21 12:48:57 -07:00
Teknium
2988334fe5 fix: case-insensitive model family matching + compressor init logging (#2350)
fix: case-insensitive model family matching + compressor init logging
2026-03-21 10:48:08 -07:00
Teknium
292d12bed4 fix: case-insensitive model family matching + compressor init logging
Two fixes for local model context detection:

1. Hardcoded DEFAULT_CONTEXT_LENGTHS matching was case-sensitive.
   'qwen' didn't match 'Qwen3.5-9B-Q4_K_M.gguf' because of the
   capital Q. Now uses model.lower() for comparison.

2. Added compressor initialization logging showing the detected
   context_length, threshold, model, provider, and base_url.
   This makes turn-1 compression bugs diagnosable from logs —
   previously there was no log of what context length was detected.
2026-03-21 10:47:44 -07:00
Teknium
509cff6e5c revert: remove Shift+Enter keybindings that crash prompt_toolkit (#2349)
revert: remove Shift+Enter keybindings that crash prompt_toolkit
2026-03-21 10:41:24 -07:00
Teknium
29520df44f revert: remove Shift+Enter keybindings that crash prompt_toolkit
Reverts the s-enter and Kitty CSI keybindings from PR #2345/#2346.
The s-enter key notation causes 'Invalid key: s-enter' crash on
some prompt_toolkit versions, breaking hermes startup entirely.
2026-03-21 10:41:07 -07:00
Teknium
9be42e49f9 fix: resolve merge conflict markers in cli.py breaking hermes startup (#2347)
fix: resolve merge conflict markers in cli.py breaking hermes startup
2026-03-21 10:34:40 -07:00
Teknium
42cef9c282 fix: resolve merge conflict markers in cli.py breaking hermes startup
PR #2346 was merged with unresolved git conflict markers (<<<<<<,
=======, >>>>>>>) in cli.py at line 6047, causing SyntaxError on
startup. Resolved by keeping both the Shift+Enter keybindings and
the tab handler.
2026-03-21 10:34:21 -07:00
Teknium
3a71099dac fix(cli): handle Kitty keyboard protocol Shift+Enter for Ghostty/WezTerm (#2345)
fix(cli): handle Kitty keyboard protocol Shift+Enter for Ghostty/WezTerm
2026-03-21 10:04:19 -07:00
ygd58
356122e990 fix(cli): handle Kitty keyboard protocol Shift+Enter for Ghostty/WezTerm
Kitty-protocol terminals (Ghostty, WezTerm) encode Shift+Enter as
CSI 13;2u instead of plain Enter. Without this binding, raw escape
characters appear in the input buffer. Adds s-enter and the Kitty
escape sequence as newline-insert bindings.

Based on work by ygd58 in PR #1798. Fixes #1795.
Registry.py apostrophe sanitization change excluded (unrelated scope).
2026-03-21 10:03:55 -07:00
Teknium
aefcdd6f7f fix: return JSON parse error to model instead of dispatching with empty args (#2342)
When the model produces malformed JSON in tool call arguments, the agent
loop was setting args={} and dispatching the tool anyway, wasting an
iteration and producing a confusing downstream error. Now the error is
returned directly as the tool result so the model can retry with valid JSON.

Co-authored-by: alireza78a <alireza78.crypto@gmail.com>
2026-03-21 09:56:44 -07:00
Teknium
3835a8d5df fix: whitespace-only env vars bypass web backend detection + clearer Firecrawl error (#2341)
fix: whitespace-only env vars bypass web backend detection + clearer Firecrawl error
2026-03-21 09:55:03 -07:00
JackTheGit
e8188a56c7 Fix backend detection when environment variables contain only whitespace 2026-03-21 09:53:06 -07:00
JackTheGit
c42a18e9e5 Improve Firecrawl configuration error message and add logging 2026-03-21 09:53:06 -07:00
Teknium
b73d221324 fix: Alibaba/DashScope: preserve model dots, fix 401 auth, fix dead provider check (salvage #1748 + fix #2314)
fix: Alibaba/DashScope: preserve model dots, fix 401 auth, fix dead provider check (salvage #1748 + fix #2314)
2026-03-21 09:51:40 -07:00
Teknium
cc51ffdb57 Merge pull request #2340 from NousResearch/feat/streaming-default
feat: enable streaming by default in CLI
2026-03-21 09:50:54 -07:00
Teknium
c8971db435 fix(gateway): pass message_thread_id in send_image_file, send_document, send_video (#2339)
fix(gateway): pass message_thread_id in send_image_file, send_document, send_video
2026-03-21 09:50:09 -07:00
Teknium
c4e787d47b feat: enable streaming by default in CLI
Streaming provides a better UX — tokens appear as they arrive instead
of waiting for the full response. show_reasoning remains false so
thinking blocks are not streamed to the user.
2026-03-21 09:49:47 -07:00
unmodeled-tyler
fb48b8f0c5 fix(gateway): pass message_thread_id in send_image_file, send_document, send_video
Fixes #1803. send_image_file, send_document, and send_video were missing
message_thread_id forwarding, causing them to fail in Telegram forum/supergroups
where thread_id is required. send_voice already handled this correctly. Adds
metadata parameter + message_thread_id to all three methods, and adds tests
covering the thread_id forwarding path.
2026-03-21 09:49:33 -07:00
Teknium
67600d0a0b feat(cli): add hermes plugins install/remove/list command (#2337)
feat(cli): add hermes plugins install/remove/list command
2026-03-21 09:47:59 -07:00
Angello Picasso
5a9ab09bc3 feat(cli): add hermes plugins install/remove/list command
Plugin management via git repos:
- hermes plugins install <git-url|owner/repo>
- hermes plugins update <name>
- hermes plugins remove <name> (aliases: rm, uninstall)
- hermes plugins list (alias: ls)

Security: path traversal protection, no shell injection, manifest
version guard, insecure URL warnings.

42 tests covering security, dispatch, helpers, and commands.

Based on work by Angello Picasso in PR #1785. Closes #1789.
2026-03-21 09:47:33 -07:00
Teknium
2c06ec5f51 fix: correct provider check for Alibaba model identity injection
PR #2314 checked for provider names 'alibaba-coding-plan' and
'alibaba-coding-plan-anthropic' which don't exist in the provider
registry. The provider is always 'alibaba' — the condition was dead
code. Fixed to check self.provider == 'alibaba'.
2026-03-21 09:46:26 -07:00
Teknium
d70e07fc45 refactor(cli): add protected TUI extension hooks for wrapper CLIs
Based on PR #1749 by @erosika (reimplemented on current main).

Extracts three protected methods from run() so wrapper CLIs can extend
the TUI without overriding the entire method:

- _get_extra_tui_widgets(): inject widgets between spacer and status bar
- _register_extra_tui_keybindings(kb, input_area): add keybindings
- _build_tui_layout_children(**widgets): full control over ordering

Default implementations reproduce existing layout exactly. The inline
HSplit in run() now delegates to _build_tui_layout_children().

5 tests covering defaults, widget insertion position, and keybinding
registration.
2026-03-21 09:42:07 -07:00
Teknium
fff7203049 fix(mistral-parser): handle nested JSON in fallback extraction (#2335)
fix(mistral-parser): handle nested JSON in fallback extraction
2026-03-21 09:41:45 -07:00
Himess
5663980015 fix(mistral-parser): handle nested JSON in fallback extraction 2026-03-21 09:41:17 -07:00
Teknium
8304a7716d fix(gateway): restart on whatsapp bridge child exit (#2334)
Co-authored-by: Frederico Ribeiro <fr@tecompanytea.com>
2026-03-21 09:38:52 -07:00
crazywriter1
523d8c38f9 fix: Alibaba/DashScope: preserve model dots (qwen3.5-plus) and fix 401 auth
When using Alibaba (DashScope) with an anthropic-compatible endpoint,
model names like qwen3.5-plus were being normalized to qwen3-5-plus.
Alibaba's API expects the dot. Added preserve_dots parameter to
normalize_model_name() and build_anthropic_kwargs().

Also fixed 401 auth: when provider is alibaba or base_url contains
dashscope/aliyuncs, use only the resolved API key (DASHSCOPE_API_KEY).
Never fall back to resolve_anthropic_token(), and skip Anthropic
credential refresh for DashScope endpoints.

Cherry-picked from PR #1748 by crazywriter1. Fixes #1739.
2026-03-21 09:38:04 -07:00
Teknium
e6299960cc docs(discord): mark Server Members Intent as required (#2330)
docs(discord): mark Server Members Intent as required
2026-03-21 09:34:21 -07:00
Teknium
fb6d41237c docs(discord): mark Server Members Intent as required
Users reported that the bot fails to resolve usernames without the
Server Members privileged intent enabled. Updated the setup docs
to mark it as Required instead of Optional.

Feedback from Blangs [MADD].
2026-03-21 09:34:01 -07:00
Teknium
e183744cb5 feat(honcho): instance-local config via HERMES_HOME, default session strategy to per-directory
- Add resolve_config_path(): checks $HERMES_HOME/honcho.json first,
  falls back to ~/.honcho/config.json.  Enables isolated Hermes instances
  with independent Honcho credentials and settings.
- Update CLI and doctor to use resolved path instead of hardcoded global.
- Change default session_strategy from per-session to per-directory.

Part 1 of #1962 by @erosika.
2026-03-21 09:34:00 -07:00
Teknium
07112e4e98 fix(mattermost): use MIME types for media attachments (#2329)
fix(mattermost): use MIME types for media attachments
2026-03-21 09:31:53 -07:00
Himess
bc15f6cca3 fix(mattermost): use MIME types for media attachments
Bare strings like "image", "audio", "document" were appended to
media_types, but downstream run.py checks mtype.startswith("image/")
and mtype.startswith("audio/"), which never matched. This caused all
Mattermost file attachments to be silently dropped from vision/STT
processing. Use the actual MIME type from file_info instead.
2026-03-21 09:31:15 -07:00
Teknium
3921fb973c fix(gateway): load platforms section from config.yaml for webhook routes (#2328)
fix(gateway): load platforms section from config.yaml for webhook routes
2026-03-21 09:27:40 -07:00
Teknium
6408b4ad53 Merge pull request #2327 from NousResearch/hermes/hermes-5d6932ba
fix: prevent systemd restart storm on gateway connection failure
2026-03-21 09:26:57 -07:00
Teknium
326b146d68 fix: prevent systemd restart storm on gateway connection failure
Cherry-picked from PR #2319 by @itenev.

When the gateway fails to connect (e.g. PrivilegedIntentsRequired,
missing token), systemd's default RestartSec=10 with no start rate
limit causes rapid reconnect storms flooding logs and triggering
platform-side rate limits.

- StartLimitIntervalSec=600 + StartLimitBurst=5 in [Unit] (max 5
  restarts per 10 min)
- RestartSec: 10 → 30
- Applied to both templates in gateway.py and scripts/hermes-gateway
2026-03-21 09:26:39 -07:00
dieutx
1830db0476 fix(gateway): load platforms section from config.yaml into gateway config
The gateway config loader read config.yaml but never merged its
`platforms` key into the runtime config dict.  This meant that
platform-specific settings defined under `platforms.<name>.extra`
(e.g. webhook routes) were silently ignored unless the user also
duplicated them in the legacy gateway.json file.

Merge `yaml_cfg["platforms"]` into `gw_data["platforms"]` with a
shallow deep-merge of the `extra` dict so that gateway.json defaults
are preserved while config.yaml values take precedence.

Closes #2305
2026-03-21 09:26:24 -07:00
Teknium
3ba6043c62 feat(compressor): major context compaction improvements (#2323)
feat(compressor): major context compaction improvements — structured summaries, iterative updates, token-budget tail protection
2026-03-21 08:51:42 -07:00
Teknium
f4a74d3ac7 fix(honcho): hide session banner when not explicitly configured
Add explicitly_configured field to HonchoClientConfig — set when the
config has a hosts.hermes block or explicit enabled flag, vs auto-enabled
from a stray HONCHO_API_KEY env var.  Banner only shows when this is true.

Based on #1960 by @erosika, reimplemented without duplicating config parsing.
2026-03-21 08:33:44 -07:00
Teknium
e75f58420c feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:

1. Structured summary template — sections for Goal, Progress (Done/
   In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
   and Critical Context. Forces the summarizer to preserve each
   category instead of writing a vague paragraph.

2. Iterative summary updates — on re-compression, the prompt says
   'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
   status.' Previous summary is stored and fed back to the summarizer
   so accumulated context survives across multiple compactions.

3. Token-budget tail protection — instead of fixed protect_last_n=4,
   walks backward keeping ~20K tokens of recent context. Adapts to
   message density: sessions with big tool results protect fewer
   messages, short exchanges protect more. Falls back to protect_last_n
   for small conversations.

4. Tool output pruning (pre-pass) — before the expensive LLM summary,
   replaces old tool result contents with a placeholder. This is free
   (no LLM call) and can save 30%+ of context by itself.

5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
   of compressed content tokens (clamped to 2000-8000). A 50-turn
   conversation gets more summary space than a 10-turn one.

6. Richer summarizer input — tool calls now include arguments (up to
   500 chars) and tool results keep up to 3000 chars (was 1500).
   The summarizer sees 'terminal(git status) → M src/config.py'
   instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
Teknium
28bb0e770f fix(voice): enable TTS voice reply when streaming is active (#2322)
When streaming is enabled, the base adapter receives None from
_handle_message (already_sent=True) and cannot run auto-TTS for
voice input. The runner was unconditionally skipping voice input
TTS assuming the base adapter would handle it.

Now the runner takes over TTS responsibility when streaming has
already delivered the text response, so voice channel playback
works with both streaming on and off.

Streaming off behavior is unchanged (default already_sent=False
preserves the original code path exactly).

Co-authored-by: 0xbyt4 <35742124+0xbyt4@users.noreply.github.com>
2026-03-21 08:08:37 -07:00
Teknium
06f4df52f1 fix(install): add zprofile fallback and create zshrc on fresh macOS installs (#2320)
On macOS, zsh users may not have ~/.zshrc if they haven't customized
their shell yet. The installer would silently fail to add ~/.local/bin
to PATH, causing 'hermes: command not found' after installation.

- Check ~/.zprofile as fallback for zsh users (macOS login shell config)
- Create ~/.zshrc if neither config file exists

Cherry-picked from PR #2315 by erhnysr.

Co-authored-by: erhnysr <erhnysr@users.noreply.github.com>
2026-03-21 07:30:43 -07:00
Teknium
a03cbcd5f9 Merge pull request #2317 from NousResearch/hermes/hermes-5d6932ba
fix(cron): close abandoned coroutine when asyncio.run() raises RuntimeError
2026-03-21 07:21:18 -07:00
Teknium
df67ae730b fix(cron): close abandoned coroutine when asyncio.run() raises RuntimeError
Cherry-picked from PR #2290 by @Mibayy. Closes #2138.

When asyncio.run() raises RuntimeError (running loop exists), the
coroutine was created but never awaited, producing a RuntimeWarning
on GC. Extract coro before try, call coro.close() in the except
branch before falling back to ThreadPoolExecutor.
2026-03-21 07:20:58 -07:00
Teknium
9305164bf3 fix: add None-entry guard to tool_calls loops in run_agent, batch_runner, and mini_swe_runner (#2316)
Co-authored-by: Dilee <uzmpsk.dilekakbas@gmail.com>
2026-03-21 07:20:41 -07:00
Teknium
453f4c5175 Merge pull request #2312 from NousResearch/hermes/hermes-31d7db3b
fix(gateway): retry Telegram 409 polling conflicts before giving up
2026-03-21 07:19:43 -07:00
Teknium
37a9979459 fix(cron): stop injecting cron outputs into gateway session history (#2313)
Cron deliveries were mirrored into the target gateway session as
assistant-role messages, causing consecutive assistant messages that
violate message alternation (issue #2221).

Instead of fixing the role, remove the mirror injection entirely.
Cron outputs already live in their own cron session and don't belong
in the interactive conversation history.

Delivered messages are now wrapped with a header (task name) and a
footer noting the agent cannot see or respond to the message, so
users have clear context about what they're reading.

Closes #2221
2026-03-21 07:18:36 -07:00
Teknium
713f2f73da fix(agent): inject model identity for Alibaba Coding Plan (#2314)
fix(agent): inject model identity for Alibaba Coding Plan
2026-03-21 07:11:51 -07:00
Teknium
237499d102 Merge pull request #2311 from NousResearch/hermes/hermes-5d6932ba
fix(toolsets): pass visited set by reference to prevent diamond dependency duplication
2026-03-21 07:11:27 -07:00
Teknium
3f811f52fd fix(toolsets): pass visited set by reference to prevent diamond dependency duplication
Cherry-picked from PR #2292 by @Mibayy. Closes #2134.

resolve_toolset() called visited.copy() per sibling include, breaking
dedup for diamond dependencies (D resolved twice via B and C paths)
and causing duplicate cycle warnings.

Fix: pass visited directly so siblings share the same set. The .copy()
for the all/* alias at the top level is kept so each top-level toolset
gets an independent pass. Removes the print() cycle warning since
hitting a visited name now usually means diamond (not a bug).
2026-03-21 07:11:09 -07:00
ygd58
2ea8054304 fix(agent): inject model identity for Alibaba Coding Plan to work around API returning wrong model name 2026-03-21 07:11:08 -07:00
Teknium
488a30e879 fix(gateway): retry Telegram 409 polling conflicts before giving up
A single Telegram 409 Conflict from getUpdates permanently killed
Telegram polling with no recovery possible (retryable=False on
first occurrence).  This is too aggressive for production use with
process supervisors.

Transient 409s are expected during:
- --replace handoffs where the old long-poll session lingers on
  Telegram servers for a few seconds after SIGTERM
- systemd Restart=on-failure respawns that overlap with the dying
  instance cleanup

Now _handle_polling_conflict() retries up to 3 times with a
10-second delay between attempts.  The 30-second total retry window
lets stale server-side sessions expire.  If all retries fail, the
error is still marked as permanently fatal — preserving the original
protection against genuine dual-instance conflicts.

Tests updated: split the single conflict test into two — one verifying
retry on transient conflict, one verifying fatal after exhausted
retries.

Closes #2296
2026-03-21 07:11:06 -07:00
Teknium
bc3f425212 Merge pull request #2309 from NousResearch/hermes/hermes-5d6932ba
fix(cli): correct truncated AUXILIARY_WEB_EXTRACT_API_KEY env var name
2026-03-21 07:09:47 -07:00
Teknium
fd1d6c03cb fix(cli): correct truncated AUXILIARY_WEB_EXTRACT_API_KEY env var name
Cherry-picked from PR #2295 by @dlkakbs.

The web_extract auxiliary client api_key env var was literally stored as
'AUXILI..._KEY' (dots in the source) instead of the full name. Users
configuring an auxiliary web_extract model with an API key would have
auth failures because the key was written to a non-existent var.
2026-03-21 07:09:28 -07:00
Teknium
58b52dfb2f Merge pull request #2303 from NousResearch/hermes/hermes-31d7db3b
fix: remove synthetic error message injection, fix session resume after repeated failures
2026-03-21 07:03:54 -07:00
Teknium
651e92fbbf fix: use git pull --ff-only in update/install to avoid divergent branch error (#2274)
fix: use git pull --rebase in update/install to avoid divergent branch error
2026-03-21 06:33:22 -07:00
Teknium
779619f742 fix: remove synthetic error message injection, fix session resume after repeated failures
Two changes to the error handler in the agent loop:

1. Remove the 'if not pending_handled' block that injected fake
   [System error during processing: ...] messages into conversation
   history.  These polluted history, burned tokens on retries, and
   could violate role alternation by injecting as role=user.
   The tool_calls error-result path (role=tool) is preserved.

2. Append the error final_response as an assistant message when
   hitting the iteration limit, so session resume doesn't produce
   consecutive user messages.
2026-03-21 06:33:05 -07:00
Teknium
96a5e9fc11 feat(agent): add summary of successful tool actions in review agent
Enhanced the review agent to scan and summarize successful tool actions, providing users with a compact overview of updates made during the review process. This includes actions related to memory and user profiles, improving user feedback and interaction clarity.
2026-03-21 06:31:59 -07:00
Teknium
eb537b5db4 fix(cli): prevent multiple reasoning boxes from rendering
Added a check to suppress further reasoning rendering once the response box is open, preventing potential overlap of reasoning boxes during late thinking blocks. This enhances the user experience by maintaining a clean output in the CLI.
2026-03-21 06:28:47 -07:00
Teknium
2da79b13df feat: priority-based context file selection + CLAUDE.md support (#2301)
Previously, all project context files (AGENTS.md, .cursorrules, .hermes.md)
were loaded and concatenated into the system prompt. This bloated the prompt
with potentially redundant or conflicting instructions.

Now only ONE project context type is loaded, using priority order:
  1. .hermes.md / HERMES.md  (walk to git root)
  2. AGENTS.md / agents.md   (recursive directory walk)
  3. CLAUDE.md / claude.md   (cwd only, NEW)
  4. .cursorrules / .cursor/rules/*.mdc  (cwd only)

SOUL.md from HERMES_HOME remains independent and always loads.

Also adds CLAUDE.md as a recognized context file format, matching the
convention popularized by Claude Code.

Refactored the monolithic function into four focused helpers:
_load_hermes_md, _load_agents_md, _load_claude_md, _load_cursorrules.

Tests: replaced 1 coexistence test with 10 new tests covering priority
ordering, CLAUDE.md loading, case sensitivity, injection blocking.
2026-03-21 06:26:20 -07:00
Teknium
885f88fb60 feat(agent): suppress non-forced output during post-response housekeeping
- Introduced a mechanism to mute output after the main response is delivered, ensuring that subsequent tool calls run without cluttering the CLI.
- Redirected stdout to devnull during the review agent's execution to prevent any print statements from interfering with the main CLI display.
- Added a new attribute `_mute_post_response` to manage output suppression effectively.
2026-03-20 23:54:42 -07:00
Teknium
3585019831 feat(cli): enhance user input display with consistent formatting
- Added a user bar separator for improved visual clarity when displaying pasted text and user input in the HermesCLI.
- Ensured consistent formatting for both multi-line and single-line user inputs, enhancing the overall user experience in the command-line interface.

These changes contribute to a more organized and visually appealing output during interactions.
2026-03-20 23:36:49 -07:00
Teknium
6d7f3dbbb7 Merge pull request #2278 from NousResearch/hermes/hermes-5d6932ba
fix(setup): add alibaba and deepseek to provider model selection
2026-03-20 22:50:18 -07:00
Test
71cf7ad11a fix(setup): add alibaba to provider model selection
Same bug as opencode-zen/go — alibaba fell through to the OpenRouter
model list instead of using _setup_provider_model_selection() which
probes the provider's own /models endpoint.

All user-selectable providers now have correct model selection routing.
2026-03-20 22:48:59 -07:00
Teknium
b748fcf836 Merge pull request #2277 from NousResearch/hermes/hermes-5d6932ba
fix(setup): OpenCode Zen/Go show OpenRouter models instead of their own
2026-03-20 22:42:33 -07:00
Test
7289256114 fix(setup): OpenCode Zen/Go show OpenRouter models instead of their own
After selecting OpenCode Zen or Go as provider in hermes setup, the
model selection page showed OpenRouter models because these providers
weren't in the list that routes to _setup_provider_model_selection().
They fell through to the else branch which shows the OpenRouter catalog.

Users ended up with an OpenCode API key but an OpenRouter model name,
causing 'Provider resolver returned an empty API key' on first use.

Fix: add opencode-zen and opencode-go to the provider list that uses
_setup_provider_model_selection() for live /models detection.
2026-03-20 22:42:14 -07:00
Test
870ebb8850 fix: use git pull --ff-only in update/install to avoid divergent branch error
Fresh installs without pull.rebase configured hit a git error when
running hermes update because git doesn't know how to reconcile
divergent branches. --ff-only is the right strategy: it works for the
normal case (local branch is behind remote) and fails cleanly if the
user somehow has local commits, rather than silently rebasing them.
2026-03-20 22:28:55 -07:00
Teknium
517b5c17d6 Merge pull request #2275 from NousResearch/hermes/hermes-5d6932ba
chore: remove dead top-level toolsets config key
2026-03-20 22:27:35 -07:00
Test
d0ac8d9fc7 chore: remove dead top-level toolsets config key
The top-level 'toolsets' key in config.yaml was never read at runtime.
Tool selection uses platform_toolsets (per-platform) or the --toolsets
CLI flag. The key existed in load_cli_config() defaults and the example
config as 'toolsets: [all]', misleading users into thinking it
controlled tool availability.

- Remove from load_cli_config() hardcoded defaults
- Remove from hermes config show output
- Replace in cli-config.yaml.example with deprecation note pointing
  to platform_toolsets and hermes tools
2026-03-20 22:27:13 -07:00
Teknium
761a8ad39a fix(display): show provider and endpoint in API error messages (#2266)
fix(display): show provider and endpoint in API error messages
2026-03-20 21:57:53 -07:00
Teknium
52adc8873b Merge pull request #2268 from NousResearch/hermes/hermes-5d6932ba
fix(tools): disabled toolsets re-enable themselves after hermes tools
2026-03-20 21:57:39 -07:00
Test
173a5c6290 fix(tools): disabled toolsets re-enable themselves after hermes tools
Two bugs in the save/load roundtrip for platform_toolsets:

1. _save_platform_tools preserved composite toolset entries (hermes-cli,
   hermes-telegram, etc.) because they weren't in configurable_keys.
   These composites include ALL _HERMES_CORE_TOOLS, so having hermes-cli
   in the saved list alongside individual keys negated any disables —
   the subset check always found the disabled toolset's tools via the
   composite entry.

   Fix: also filter out known TOOLSETS keys from preserved entries. Only
   truly unknown entries (MCP server names, custom entries) are kept.

2. _get_platform_tools used reverse subset inference to determine which
   configurable toolsets were enabled. This is inherently broken when
   tools appear in multiple toolsets (e.g. HA tools in both the
   homeassistant toolset and _HERMES_CORE_TOOLS).

   Fix: when the saved list contains explicit configurable keys (meaning
   the user has configured this platform), use direct membership instead
   of subset inference. The fallback path still handles legacy configs
   that only have a composite entry like hermes-cli.
2026-03-20 21:11:54 -07:00
Test
f3b2303428 fix(gateway): skip model auto-detection for custom/local providers
Mirrors the CLI fix for the gateway /model handler. When the user is on
a custom provider (provider=custom, localhost, or 127.0.0.1 endpoint),
/model <name> no longer tries to auto-detect a provider switch.

Previously, typing /model openrouter/nvidia/nemotron:free on Telegram
while on a localhost endpoint would silently accept the model name on
the local server — auto-detection failed to match the free model, so
the provider stayed as custom with the localhost base_url. The user saw
'Model changed' but requests still went to localhost, which doesn't
serve that model.

Now shows the endpoint URL and provider:model syntax tip, matching
the CLI behavior.
2026-03-20 21:07:48 -07:00
Test
1870069f80 fix(session_search): exclude current session lineage
Cherry-picked from PR #2201 by @Gutslabs.

session_search resolved hits to parent/root sessions but only excluded
the exact current_session_id. If the active session was a child
continuation (compression/delegation), its parent could still appear
as a 'past' conversation result.

Fix: resolve current_session_id to its lineage root before filtering,
so the entire active lineage (parent and children) is excluded.
2026-03-20 21:07:48 -07:00
Test
d560f2d1f2 fix(display): show provider and endpoint in API error messages
When an API call fails, the error output now shows the provider name,
model, and endpoint URL so users can immediately identify which service
rejected their request. Auth errors (401/403) get actionable guidance:
check key validity, model access, and OpenRouter credits link.

Before: 'API call failed (attempt 1/3): PermissionDeniedError'
After:  'API call failed (attempt 1/3): PermissionDeniedError
         Provider: openrouter  Model: anthropic/claude-sonnet-4
         Endpoint: https://openrouter.ai/api/v1
         Your API key was rejected by the provider. Check:
           • Is the key valid? Run: hermes setup
           • Does your account have access to anthropic/claude-sonnet-4?
           • Check credits: https://openrouter.ai/settings/credits'
2026-03-20 21:06:55 -07:00
Test
f7e2ed20fa feat(cli): implement true-color ANSI support for response text
- Added support for true-color ANSI escape codes in the HermesCLI to enhance the visual appearance of streamed content.
- Introduced a fallback mechanism for text color in case of errors while retrieving the color from the active skin.
- Updated the output formatting to include the new text color in both line emissions and buffer flushing.

These changes improve the user experience by ensuring consistent and visually appealing text output in the command-line interface.
2026-03-20 21:02:36 -07:00
Test
10d719ac1b fix(security): require opt-in for project plugin discovery 2026-03-20 20:50:30 -07:00
Teknium
45058b4105 feat: replace inline nudges with background memory/skill review (#2235)
Remove the memory and skill nudges that were appended directly to user
messages, causing backward-looking system instructions to compete with
forward-looking user tasks. Found in 43% of user messages across 15
sessions, with confirmed cases of the agent spending tool calls on
nudge responses before starting the user's actual request.

Replace with a background review agent that runs AFTER the main agent
finishes responding:
- Spawns a background thread with a snapshot of the conversation
- Uses the main model (not auxiliary) for high-precision memory/skill work
- Only has memory + skill_manage tools (5 iteration budget)
- Shares the memory store for direct writes
- Never modifies the main conversation history
- Never competes with the user's task for model attention
- Zero latency impact (runs after response is delivered)
- Same token cost (processes the same context, just on a separate track)

The trigger conditions are unchanged (every 10 user turns for memory,
after 10+ tool iterations for skills). Only the execution path changes:
from inline injection to background fork.

Closes #2227.

Co-authored-by: Test <test@test.com>
2026-03-20 18:51:31 -07:00
Teknium
2416b2b7af refactor(cli, banner): update gold ANSI color to true-color format (#2246)
- Changed the ANSI escape code for gold color in cli.py and banner.py to use true-color format (#FFD700) for better visual consistency.
- Enhanced the _on_tool_progress method in HermesCLI to update the TUI spinner with tool execution status, improving user feedback during operations.

These changes improve the visual representation and user experience in the command-line interface.

Co-authored-by: Test <test@test.com>
2026-03-20 18:17:38 -07:00
Teknium
4263350c5b fix: remove post-compression file-read history injection (#2226)
Remove the [Files already read — do NOT re-read these] user message
that was injected into the conversation after context compression.

This message used role='user' for system-generated content, creating
a fake user turn that confused models about conversation state and
could contribute to task-redo behavior.

The file_tools.py read tracker (warn on 3rd consecutive read, block
on 4th+) already handles re-read prevention inline without injecting
synthetic messages.

Closes #2224.

Co-authored-by: Test <test@test.com>
2026-03-20 14:54:25 -07:00
Teknium
214047dee1 fix(display): suppress spinner animation in non-TTY environments (#2216)
fix(display): suppress spinner animation in non-TTY environments
2026-03-20 12:55:54 -07:00
Teknium
ba0b77a803 Merge pull request #2214 from NousResearch/fix/event-loop-closed-delegate
Completes the event loop lifecycle fix trilogy (#2190#2207#2214). Per-thread persistent loops for worker threads prevent GC crashes on cached async clients.
2026-03-20 12:54:19 -07:00
Evey
6e2be3356d fix(display): suppress spinner animation in non-TTY environments
In Docker/systemd/piped environments, the KawaiiSpinner animation
generates ~500 log lines per tool call. Now checks isatty() and
falls back to clean [tool]/[done] log lines in non-TTY contexts.
Interactive CLI behavior unchanged.

Based on work by 42-evey in PR #2203.
2026-03-20 12:52:21 -07:00
Teknium
8e884fb3f1 Merge pull request #2215 from NousResearch/hermes/hermes-31d7db3b
fix: infer provider from base URL for models.dev context length lookup
2026-03-20 12:52:07 -07:00
Test
59074df021 fix: add dashscope-intl.aliyuncs.com to URL-to-provider mapping
The official international DashScope endpoint uses dashscope-intl.aliyuncs.com
(per Alibaba docs), which the substring match on dashscope.aliyuncs.com misses
because of the hyphenated prefix.
2026-03-20 12:51:39 -07:00
Teknium
f853e50589 Merge pull request #2199 from llbn/fix/telegram-markdownv2-features
Clean PR, well-tested. Adds MarkdownV2 strikethrough, spoiler, and blockquote support to Telegram adapter.
2026-03-20 12:45:47 -07:00
Teknium
ca03358575 Merge pull request #2200 from llbn/fix/telegram-mdv2-code-backslash
fix(telegram): escape backslashes and backticks inside code entities for Telegram (MarkdownV2)
2026-03-20 12:43:59 -07:00
emozilla
ab6abc2c13 fix: use per-thread persistent event loops in worker threads
Replace asyncio.run() with thread-local persistent event loops for
worker threads (e.g., delegate_task's ThreadPoolExecutor). asyncio.run()
creates and closes a fresh loop on every call, leaving cached
httpx/AsyncOpenAI clients bound to a dead loop — causing 'Event loop is
closed' errors during GC when parallel subagents clean up connections.

The fix mirrors the main thread's _get_tool_loop() pattern but uses
threading.local() so each worker thread gets its own long-lived loop,
avoiding both cross-thread contention and the create-destroy lifecycle.

Added 4 regression tests covering worker loop persistence, reuse,
per-thread isolation, and separation from the main thread's loop.
2026-03-20 15:41:06 -04:00
0xbyt4
0ce35a117c fix: crash on None entry in tool_calls list during Anthropic conversion (#2209)
If a tool_calls list contains a None entry (from malformed API response,
compression artifact, or corrupt session replay), convert_messages_to_anthropic
crashes with AttributeError: 'NoneType' object has no attribute 'get'.

Skip None and non-dict entries in the tool_calls iteration. Found via
chaos/fuzz testing with mixed valid/invalid tool_call entries.
2026-03-20 12:01:42 -07:00
Test
900e848522 fix: infer provider from base URL for models.dev context length lookup
Custom endpoint users (DashScope/Alibaba, Z.AI, Kimi, DeepSeek, etc.)
get wrong context lengths because their provider resolves as "openrouter"
or "custom", skipping the models.dev lookup entirely. For example,
qwen3.5-plus on DashScope falls to the generic "qwen" hardcoded default
(131K) instead of the correct 1M.

Add _infer_provider_from_url() that maps known API hostnames to their
models.dev provider IDs. When the explicit provider is generic
(openrouter/custom/empty), infer from the base URL before the models.dev
lookup. This resolves context lengths correctly for DashScope, Z.AI,
Kimi, MiniMax, DeepSeek, and Nous endpoints without requiring users to
manually set context_length in config.

Also refactors _is_known_provider_base_url() to use the same URL mapping,
removing the duplicated hostname list.
2026-03-20 11:57:24 -07:00
Teknium
aafe86d81a fix: prevent 'event loop already running' when async tools run in parallel (#2207)
When the model returns multiple tool calls, run_agent.py executes them
concurrently in a ThreadPoolExecutor. Each thread called _run_async()
which used a shared persistent event loop (_get_tool_loop()). If two
async tools (like web_extract) ran in parallel, the second thread would
hit 'This event loop is already running' on the shared loop.

Fix: detect worker threads (not main thread) and use asyncio.run() with
a per-thread fresh loop instead of the shared persistent one. The shared
loop is still used for the main thread (CLI sequential path) to keep
cached async clients (httpx/AsyncOpenAI) alive.

Co-authored-by: Test <test@test.com>
2026-03-20 11:39:13 -07:00
llbn
43b3a0ac66 fix(telegram): escape backslashes and backticks inside code entities for MarkdownV2
- Escape \ → \\ inside inline code and fenced code blocks
- Escape ` → \` inside fenced code block bodies (not delimiters)
- Add regression tests for code entity backslash handling
2026-03-20 18:32:45 +01:00
llbn
02f639e561 fix(telegram): add MarkdownV2 support for strikethrough, spoiler, and blockquotes
- Convert ~~text~~ to ~text~ (MarkdownV2 strikethrough)
- Protect ||text|| from pipe escaping (MarkdownV2 spoiler)
- Preserve > at line start as blockquote instead of escaping it
- Update _strip_mdv2() to strip ~strikethrough~ and ||spoiler|| markers
- Add tests covering new formatting paths and edge cases
2026-03-20 18:21:24 +01:00
Test
76bc27199f fix(cli, agent): improve streaming handling and state management
- Updated _stream_delta method in HermesCLI to handle None values, flushing the stream and resetting state for clean tool execution.
- Enhanced quiet mode handling in AIAgent to ensure proper display closure before tool execution, preventing display issues with intermediate streamed content.

These changes improve the robustness of the streaming functionality and ensure a smoother user experience during tool interactions.
2026-03-20 10:02:42 -07:00
Teknium
1aa7027be1 Merge pull request #2192 from NousResearch/hermes/hermes-3d7c23c9
fix(acp): preserve leading whitespace in streaming chunks
2026-03-20 09:52:32 -07:00
Teknium
f961937097 Merge pull request #2181 from NousResearch/hermes/hermes-4a7e401e
fix: missing platforms in delivery maps + WhatsApp image/bridge improvements
2026-03-20 09:45:50 -07:00
Teknium
7a427d7b03 fix: persistent event loop in _run_async prevents 'Event loop is closed' (#2190)
Cherry-picked from PR #2146 by @crazywriter1. Fixes #2104.

asyncio.run() creates and closes a fresh event loop each call. Cached
httpx/AsyncOpenAI clients bound to the dead loop crash on GC with
'Event loop is closed'. This hit vision_analyze on first use in CLI.

Two-layer fix:
- model_tools._run_async(): replace asyncio.run() with persistent
  loop via _get_tool_loop() + run_until_complete()
- auxiliary_client._get_cached_client(): track which loop created
  each async client, discard stale entries if loop is closed

6 regression tests covering loop lifecycle, reuse, and full vision
dispatch chain.

Co-authored-by: Test <test@test.com>
2026-03-20 09:44:50 -07:00
Teknium
66a1942524 feat: add /queue command to queue prompts without interrupting (#2191)
Adds /queue <prompt> (alias /q) that queues a message for the next
turn while the agent is busy, without interrupting the current run.

- CLI: /queue <prompt> puts it in _pending_input for the next turn
- Gateway: /queue <prompt> creates a pending MessageEvent on the
  adapter, picked up after the current agent run finishes
- Enter still interrupts as usual (no behavior change)
- /queue with no prompt shows usage
- /queue when agent is idle tells user to just type normally

Co-authored-by: Test <test@test.com>
2026-03-20 09:44:27 -07:00
Dilee
1173adbe86 fix(acp): preserve leading whitespace in streaming chunks 2026-03-20 09:38:13 -07:00
Test
a5beb6d8f0 fix(whatsapp): image downloading, bridge reuse, LID allowlist, Baileys 7.x compat
Salvaged from PR #2162 by @Zindar. Reply prefix changes excluded (already
on main via #1756 configurable prefix).

Bridge improvements (bridge.js):
- Download incoming images to ~/.hermes/image_cache/ via downloadMediaMessage
  so the agent can actually see user-sent photos
- Add getMessage callback required for Baileys 7.x E2EE session
  re-establishment (without it, some messages arrive as null)
- Build LID→phone reverse map for allowlist resolution (WhatsApp LID format)
- Add placeholder body for media without caption: [image received]
- Bind express to 127.0.0.1 instead of 0.0.0.0 for security
- Use 127.0.0.1 consistently throughout (more reliable than localhost)

Adapter improvements (whatsapp.py):
- Detect and reuse already-running bridge (only if status=connected)
- Handle local file paths from bridge-cached images in _build_message_event
- Don't kill external bridges on disconnect
- Use 127.0.0.1 throughout for consistency with bridge binding

Fix vs original PR: bridge reuse now checks status=connected, not just
HTTP 200. A disconnected bridge gets restarted instead of reused.

Co-authored-by: Zindar <zindar@users.noreply.github.com>
2026-03-20 09:37:48 -07:00
Teknium
0e3b7b6a39 docs: fill documentation gaps from recent PRs (#2183)
- slash-commands.md: add /approve, /deny (gateway-only), /statusbar
  (CLI-only); update Notes section with new platform-specific commands
- messaging/index.md: add Webhooks to architecture diagram, platform
  toolsets table, and Next Steps links; add /approve and /deny to
  Chat Commands table
- environment-variables.md: add HONCHO_BASE_URL for self-hosted
  Honcho instances
- configuration.md: add Context Pressure Warnings section (separate
  from iteration budget pressure); add base_url to OpenAI TTS config;
  add display.show_cost to Display Settings
- tts.md: add base_url to OpenAI TTS config example

Co-authored-by: Test <test@test.com>
2026-03-20 08:55:49 -07:00
Teknium
5e705bc31b Merge pull request #2182 from NousResearch/hermes/hermes-5d6932ba
fix: 6 bugs in model metadata, reasoning detection, and delegate tool
2026-03-20 08:53:01 -07:00
Test
55ce601502 fix: 6 bugs in model metadata, reasoning detection, and delegate tool
Cherry-picked from PR #2169 by @0xbyt4.

1. _strip_provider_prefix: skip Ollama model:tag names (qwen:0.5b)
2. Fuzzy match: remove reverse direction that made claude-sonnet-4
   resolve to 1M instead of 200K
3. _has_content_after_think_block: reuse _strip_think_blocks() to
   handle all tag variants (thinking, reasoning, REASONING_SCRATCHPAD)
4. models.dev lookup: elif→if so nous provider also queries models.dev
5. Disk cache fallback: use 5-min TTL instead of full hour so network
   is retried soon
6. Delegate build: wrap child construction in try/finally so
   _last_resolved_tool_names is always restored on exception
2026-03-20 08:52:37 -07:00
Test
8f6ecd5c64 fix: add missing platforms to cron/send_message delivery maps and tool schema
Matrix, Mattermost, Home Assistant, and DingTalk were missing from the
platform_map in both cron/scheduler.py and tools/send_message_tool.py,
causing delivery to those platforms to silently fail.

Also updates the cronjob tool schema description to list all available
delivery targets so the model knows its options.
2026-03-20 08:52:21 -07:00
Teknium
a51a767407 Merge pull request #2167 from buntingszn/fix/cron-matrix-delivery
fix(cron): add Matrix to scheduler delivery platform_map
2026-03-20 08:50:14 -07:00
Teknium
2ea4dd30c6 fix(gateway): strip orphaned tool_results + let /reset bypass running agent (#2180)
Two fixes for Telegram/gateway-specific bugs:

1. Anthropic adapter: strip orphaned tool_result blocks (mirror of
   existing tool_use stripping). Context compression or session
   truncation can remove an assistant message containing a tool_use
   while leaving the subsequent tool_result intact. Anthropic rejects
   these with a 400: 'unexpected tool_use_id found in tool_result
   blocks'. The adapter now collects all tool_use IDs and filters out
   any tool_result blocks referencing IDs not in that set.

2. Gateway: /reset and /new now bypass the running-agent guard (like
   /status already does). Previously, sending /reset while an agent
   was running caused the raw text to be queued and later fed back as
   a user message with the same broken history — replaying the
   corrupted session instead of resetting it. Now the running agent is
   interrupted, pending messages are cleared, and the reset command
   dispatches immediately.

Tests updated: existing tests now include proper tool_use→tool_result
pairs; two new tests cover orphaned tool_result stripping.

Co-authored-by: Test <test@test.com>
2026-03-20 08:39:49 -07:00
Teknium
80e578d3e3 docs: add context length detection references to FAQ and quickstart (#2179)
- quickstart.md: mention context length prompt for custom endpoints,
  link to configuration docs, add Ollama to provider table
- faq.md: rewrite local models section with hermes model flow and
  context length prompt example, add Ollama num_ctx tip, expand
  context-length-exceeded troubleshooting with detection override
  options and config.yaml examples

Co-authored-by: Test <test@test.com>
2026-03-20 08:38:44 -07:00
Teknium
c52353cf8a feat: context pressure warnings for CLI and gateway (#2159)
* feat: context pressure warnings for CLI and gateway

User-facing notifications as context approaches the compaction threshold.
Warnings fire at 60% and 85% of the way to compaction — relative to
the configured compression threshold, not the raw context window.

CLI: Formatted line with a progress bar showing distance to compaction.
Cyan at 60% (approaching), bold yellow at 85% (imminent).

  ◐ context ▰▰▰▰▰▰▰▰▰▰▰▰▱▱▱▱▱▱▱▱ 60% to compaction  100k threshold (50%) · approaching compaction
  ⚠ context ▰▰▰▰▰▰▰▰▰▰▰▰▰▰▰▰▰▱▱▱ 85% to compaction  100k threshold (50%) · compaction imminent

Gateway: Plain-text notification sent to the user's chat via the new
status_callback mechanism (asyncio.run_coroutine_threadsafe bridge,
same pattern as step_callback).

Does NOT inject into the message stream. The LLM never sees these
warnings. Flags reset after each compaction cycle.

Files changed:
- agent/display.py — format_context_pressure(), format_context_pressure_gateway()
- run_agent.py — status_callback param, _context_50/70_warned flags,
  _emit_context_pressure(), flag reset in _compress_context()
- gateway/run.py — _status_callback_sync bridge, wired to AIAgent
- tests/test_context_pressure.py — 23 tests

* Merge remote-tracking branch 'origin/main' into hermes/hermes-7ea545bf

---------

Co-authored-by: Test <test@test.com>
2026-03-20 08:37:36 -07:00
Teknium
d76ebf0ec3 feat(gateway): webhook platform adapter for external event triggers (#2166)
feat(gateway): webhook platform adapter for external event triggers
2026-03-20 08:27:58 -07:00
bunting szn
4be5070427 fix(cron): add Matrix to scheduler delivery platform_map
Matrix is a supported gateway platform but was missing from the
cron scheduler's delivery platform_map, causing cron job results
to silently fail delivery when targeting Matrix rooms.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 08:33:46 -05:00
Test
e140c02d51 feat(gateway): add webhook platform adapter for external event triggers
Add a generic webhook platform adapter that receives HTTP POSTs from
external services (GitHub, GitLab, JIRA, Stripe, etc.), validates HMAC
signatures, transforms payloads into agent prompts, and routes responses
back to the source or to another platform.

Features:
- Configurable routes with per-route HMAC secrets, event filters,
  prompt templates with dot-notation payload access, skill loading,
  and pluggable delivery (github_comment, telegram, discord, log)
- HMAC signature validation (GitHub SHA-256, GitLab token, generic)
- Rate limiting (30 req/min per route, configurable)
- Idempotency cache (1hr TTL, prevents duplicate runs on retries)
- Body size limits (1MB default, checked before reading payload)
- Setup wizard integration with security warnings and docs links
- 33 tests (29 unit + 4 integration), all passing

Security:
- HMAC secret required per route (startup validation)
- Setup wizard warns about internet exposure for webhook/SMS platforms
- Sandboxing (Docker/VM) recommended in docs for public-facing deployments

Files changed:
- gateway/config.py — Platform.WEBHOOK enum + env var overrides
- gateway/platforms/webhook.py — WebhookAdapter (~420 lines)
- gateway/run.py — factory wiring + auth bypass for webhook events
- hermes_cli/config.py — WEBHOOK_* env var definitions
- hermes_cli/setup.py — webhook section in setup_gateway()
- tests/gateway/test_webhook_adapter.py — 29 unit tests
- tests/gateway/test_webhook_integration.py — 4 integration tests
- website/docs/user-guide/messaging/webhooks.md — full user docs
- website/docs/reference/environment-variables.md — WEBHOOK_* vars
- website/sidebars.ts — nav entry
2026-03-20 06:33:36 -07:00
Teknium
88643a1ba9 feat: overhaul context length detection with models.dev and provider-aware resolution (#2158)
Replace the fragile hardcoded context length system with a multi-source
resolution chain that correctly identifies context windows per provider.

Key changes:

- New agent/models_dev.py: Fetches and caches the models.dev registry
  (3800+ models across 100+ providers with per-provider context windows).
  In-memory cache (1hr TTL) + disk cache for cold starts.

- Rewritten get_model_context_length() resolution chain:
  0. Config override (model.context_length)
  1. Custom providers per-model context_length
  2. Persistent disk cache
  3. Endpoint /models (local servers)
  4. Anthropic /v1/models API (max_input_tokens, API-key only)
  5. OpenRouter live API (existing, unchanged)
  6. Nous suffix-match via OpenRouter (dot/dash normalization)
  7. models.dev registry lookup (provider-aware)
  8. Thin hardcoded defaults (broad family patterns)
  9. 128K fallback (was 2M)

- Provider-aware context: same model now correctly resolves to different
  context windows per provider (e.g. claude-opus-4.6: 1M on Anthropic,
  128K on GitHub Copilot). Provider name flows through ContextCompressor.

- DEFAULT_CONTEXT_LENGTHS shrunk from 80+ entries to ~16 broad patterns.
  models.dev replaces the per-model hardcoding.

- CONTEXT_PROBE_TIERS changed from [2M, 1M, 512K, 200K, 128K, 64K, 32K]
  to [128K, 64K, 32K, 16K, 8K]. Unknown models no longer start at 2M.

- hermes model: prompts for context_length when configuring custom
  endpoints. Supports shorthand (32k, 128K). Saved to custom_providers
  per-model config.

- custom_providers schema extended with optional models dict for
  per-model context_length (backward compatible).

- Nous Portal: suffix-matches bare IDs (claude-opus-4-6) against
  OpenRouter's prefixed IDs (anthropic/claude-opus-4.6) with dot/dash
  normalization. Handles all 15 current Nous models.

- Anthropic direct: queries /v1/models for max_input_tokens. Only works
  with regular API keys (sk-ant-api*), not OAuth tokens. Falls through
  to models.dev for OAuth users.

Tests: 5574 passed (18 new tests for models_dev + updated probe tiers)
Docs: Updated configuration.md context length section, AGENTS.md

Co-authored-by: Test <test@test.com>
2026-03-20 06:04:33 -07:00
Teknium
b7b585656b Merge pull request #2110 from NousResearch/hermes/hermes-5d6932ba
fix: session reset + custom provider model switch + honcho base_url
2026-03-20 06:01:44 -07:00
Test
4494c0b033 fix(cron): remove send_message/clarify from cron agents + autonomous prompt
Cron jobs run unattended with no user present. Previously the agent had
send_message and clarify tools available, which makes no sense — the
final response is auto-delivered, and there's nobody to ask questions to.

Changes:
- Disable messaging and clarify toolsets for cron agent sessions
- Update cron platform hint to emphasize autonomous execution: no user
  present, cannot ask questions, must execute fully and make decisions
- Update cronjob tool schema description to match (remove stale
  send_message guidance)
2026-03-20 05:18:05 -07:00
Teknium
aa6416399e Merge pull request #2161 from NousResearch/hermes/hermes-6757a563
fix(display): show spinners and tool progress during streaming mode
2026-03-20 05:17:55 -07:00
Test
b313751acf fix(display): show spinners and tool progress during streaming mode
When streaming was enabled, two visual feedback mechanisms were
completely suppressed:

1. The thinking spinner (TUI toolbar) was skipped because the entire
   spinner block was gated on 'not self._has_stream_consumers()'.
   Now the thinking_callback fires in streaming mode too — the
   raw KawaiiSpinner is still skipped (would conflict with streamed
   tokens) but the TUI toolbar widget works fine alongside streaming.

2. Tool progress lines (the ┊ feed) were invisible because _vprint
   was blanket-suppressed when stream consumers existed. But during
   tool execution, no tokens are actively streaming, so printing is
   safe. Added an _executing_tools flag that _vprint respects to
   allow output during tool execution even with stream consumers
   registered.
2026-03-20 05:14:42 -07:00
Test
b1d05dfe8b fix(openai): route api.openai.com to Responses API for GPT-5.x
Based on PR #1859 by @magi-morph (too stale to cherry-pick, reimplemented).

GPT-5.x models reject tool calls + reasoning_effort on
/v1/chat/completions with a 400 error directing to /v1/responses.
This auto-detects api.openai.com in the base URL and switches to
codex_responses mode in three places:

- AIAgent.__init__: upgrades chat_completions → codex_responses
- _try_activate_fallback(): same routing for fallback model
- runtime_provider.py: _detect_api_mode_for_url() for both custom
  provider and openrouter runtime resolution paths

Also extracts _is_direct_openai_url() helper to replace the inline
check in _max_tokens_param().
2026-03-20 05:09:41 -07:00
Teknium
f8899af113 Merge pull request #2156 from NousResearch/hermes/hermes-6757a563
fix(signal): handle Note to Self messages with echo-back protection
2026-03-20 04:56:57 -07:00
Test
cf29cba084 docs(signal): add Note to Self section to Signal setup guide 2026-03-20 04:48:13 -07:00
Test
ec9b868aea fix(signal): handle Note to Self messages with echo-back protection
Support Signal 'Note to Self' messages in single-number setups where
signal-cli is linked as a secondary device on the user's own account.

syncMessage.sentMessage envelopes addressed to the bot's own account
are now promoted to dataMessage for normal processing, while other
sync events (read receipts, typing, etc.) are still filtered.

Echo-back prevention mirrors the WhatsApp bridge pattern:
- Track timestamps of recently sent messages (bounded set of 50)
- When a Note to Self sync arrives, check if its timestamp matches
  a recent outbound — skip if so (agent echo-back)
- Only process sync messages that are genuinely user-initiated

Based on PR #2115 by @Stonelinks with added echo-back protection.
2026-03-20 04:46:32 -07:00
Teknium
3ec6c71e43 fix: update claude 4.6 context length from 200K to 1M (#2155)
* fix: preserve Ollama model:tag colons in context length detection

The colon-split logic in get_model_context_length() and
_query_local_context_length() assumed any colon meant provider:model
format (e.g. "local:my-model"). But Ollama uses model:tag format
(e.g. "qwen3.5:27b"), so the split turned "qwen3.5:27b" into just
"27b" — which matches nothing, causing a fallback to the 2M token
probe tier.

Now only recognised provider prefixes (local, openrouter, anthropic,
etc.) are stripped. Ollama model:tag names pass through intact.

* fix: update claude-opus-4-6 and claude-sonnet-4-6 context length from 200K to 1M

Both models support 1,000,000 token context windows. The hardcoded defaults
were set before Anthropic expanded the context for the 4.6 generation.
Verified via models.dev and OpenRouter API data.

---------

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Co-authored-by: Test <test@test.com>
2026-03-20 04:38:59 -07:00
Test
4ad0083118 fix(honcho): read HONCHO_BASE_URL for local/self-hosted instances
Cherry-picked from PR #2120 by @unclebumpy.

- from_env() now reads HONCHO_BASE_URL and enables Honcho when base_url
  is set, even without an API key
- from_global_config() reads baseUrl from config root with
  HONCHO_BASE_URL env var as fallback
- get_honcho_client() guard relaxed to allow base_url without api_key
  for no-auth local instances
- Added HONCHO_BASE_URL to OPTIONAL_ENV_VARS registry

Result: Setting HONCHO_BASE_URL=http://localhost:8000 in ~/.hermes/.env
now correctly routes the Honcho client to a local instance.
2026-03-20 04:36:06 -07:00
Test
1055d4356a fix: skip model auto-detection for custom/local providers
When the user is on a custom provider (provider=custom, localhost, or
127.0.0.1 endpoint), /model <name> no longer tries to auto-detect a
provider switch. The model name changes on the current endpoint as-is.

To switch away from a custom endpoint, users must use explicit
provider:model syntax (e.g. /model openai-codex:gpt-5.2-codex).
A helpful tip is printed when changing models on a custom endpoint.

This prevents the confusing case where someone on LM Studio types
/model gpt-5.2-codex, the auto-detection tries to switch providers,
fails or partially succeeds, and requests still go to the old endpoint.

Also fixes the missing prompt_toolkit.auto_suggest mock stub in
test_cli_init.py (same issue already fixed in test_cli_new_session.py).
2026-03-20 04:35:17 -07:00
Test
5822711ae6 fix: complete session reset — missing compressor counters + test
Follow-up to PR #2101 (InB4DevOps). Adds three missing context compressor
resets in reset_session_state():
- compression_count (displayed in status bar)
- last_total_tokens
- _context_probed (stale context-error flag)

Also fixes the test_cli_new_session.py prompt_toolkit mock (missing
auto_suggest stub) and adds a regression test for #2099 that verifies
all token counters and compressor state are zeroed on /new.
2026-03-20 04:35:17 -07:00
Teknium
b19f5133c3 Merge pull request #2118 from NousResearch/hermes/hermes-e83093f0
feat: show reasoning/thinking blocks when show_reasoning is enabled
2026-03-20 04:35:12 -07:00
Teknium
471ea81a7d fix: preserve Ollama model:tag colons in context length detection (#2149)
The colon-split logic in get_model_context_length() and
_query_local_context_length() assumed any colon meant provider:model
format (e.g. "local:my-model"). But Ollama uses model:tag format
(e.g. "qwen3.5:27b"), so the split turned "qwen3.5:27b" into just
"27b" — which matches nothing, causing a fallback to the 2M token
probe tier.

Now only recognised provider prefixes (local, openrouter, anthropic,
etc.) are stripped. Ollama model:tag names pass through intact.

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
2026-03-20 03:19:31 -07:00
Test
b1832faaae feat: show reasoning/thinking blocks when show_reasoning is enabled
- Add <thinking> tag to streaming filter's tag list
- When show_reasoning is on, route XML reasoning content to the
  reasoning display box instead of silently discarding it
- Expand _strip_think_blocks to handle all tag variants:
  <think>, <thinking>, <THINKING>, <reasoning>, <REASONING_SCRATCHPAD>
2026-03-19 19:44:31 -07:00
Teknium
3a9a1bbb84 Merge pull request #2091 from dusterbloom/fix/lmstudio-context-length-detection
feat: query local servers for actual context window size
2026-03-19 19:08:21 -07:00
Teknium
d8081790f3 Merge pull request #2102 from NousResearch/hermes/hermes-6757a563
fix(tools,cli): normalise MCP schemas + expand session list columns
2026-03-19 19:06:56 -07:00
Teknium
493bf8db7e Merge pull request #2083 from ygd58/fix/delegate-save-parent-tool-names-before-child-build
fix(delegate): save parent tool names before child construction mutates global
2026-03-19 18:47:29 -07:00
Teknium
d9eba2a44f feat: optional FastMCP skill + fix: gateway session race guard (#2113)
feat: optional FastMCP skill + fix: gateway session race guard
2026-03-19 18:26:49 -07:00
Test
fc061c2fee fix: harden sentinel guard for /stop during setup and shutdown
- /stop during sentinel returns helpful message instead of queuing
- Shutdown loop skips sentinel entries instead of catching AttributeError
- _handle_stop_command guards against sentinel (defensive)
- Added tests for both edge cases (7 total race guard tests)
2026-03-19 18:26:09 -07:00
Gutslabs
aaa96713d4 fix(gateway): prevent concurrent agent runs for the same session
Place a sentinel in _running_agents immediately after the "already
running" guard check passes — before any await.  Without this, the
numerous await points between the guard (line 1324) and agent
registration (track_agent at line 4790) create a window where a
second message for the same session can bypass the guard and start
a duplicate agent, corrupting the transcript.

The await gap includes: hook emissions, vision enrichment (external
API call), audio transcription (external API call), session hygiene
compression, and the run_in_executor call itself.  For messages with
media attachments the window can be several seconds wide.

The sentinel is wrapped in try/finally so it is always cleaned up —
even if the handler raises or takes an early-return path.  When the
real AIAgent is created, track_agent() overwrites the sentinel with
the actual instance (preserving interrupt support).

Also handles the edge case where a message arrives while the sentinel
is set but no real agent exists yet: the message is queued via the
adapter's pending-message mechanism instead of attempting to call
interrupt() on the sentinel object.
2026-03-19 18:23:24 -07:00
kshitijk4poor
02954c1a10 feat: add optional FastMCP skill for building MCP servers
Add FastMCP skill to optional-skills/mcp/fastmcp/ with:
- SKILL.md with workflow, design patterns, quality checklist
- Templates: API wrapper, database server, file processor
- Scaffold CLI script for template instantiation
- FastMCP CLI reference documentation

Moved to optional-skills (requires pip install fastmcp).

Based on work by kshitijk4poor in PR #2096.
Closes #343
2026-03-19 18:23:16 -07:00
Teknium
4355f30422 Merge pull request #2114 from NousResearch/hermes/hermes-14b05543
docs: align venv path to match installer (venv/ not .venv/)
2026-03-19 18:22:03 -07:00
Test
2f07df3177 fix(cli): expand session list columns for full ID visibility
Show complete session IDs in 'hermes sessions list' instead of
truncating to 20 characters. Widens title column from 20→30 chars
and adjusts header widths accordingly.

Fixes #2068. Based on PR #2085 by @Nebula037 with a correction
to preserve the no-titles layout (the original PR accidentally
replaced the Preview/Src header with a duplicate Title/Preview header).
2026-03-19 18:17:28 -07:00
Test
672e9752a0 docs: align venv path to match installer (venv/ not .venv/)
The install script creates venv/ but several docs referenced .venv/,
causing agents to fail with 'No such file or directory' when following
AGENTS.md instructions.

Fixes #2066
2026-03-19 18:16:26 -07:00
Teknium
df0f684c34 Merge pull request #2098 from JiwaniZakir/minisweagent_path-missing-wheel-2075
Clean fix — adds minisweagent_path to py-modules so it ships in the wheel. Thanks @JiwaniZakir!
2026-03-19 17:47:25 -07:00
Teknium
21afa134f0 Merge pull request #2101 from InB4DevOps/main
fix: Reset token counters on new session for accurate usage display
2026-03-19 17:47:11 -07:00
Teknium
6bcec1ac25 fix: resolve MiniMax 401 auth error by defaulting to anthropic_messages (#2103)
MiniMax's default base URL was /v1 which caused runtime_provider to
default to chat_completions mode (OpenAI-style Authorization: Bearer
header). MiniMax rejects this with a 401 because they require the
Anthropic-style x-api-key header.

Changes:
- auth.py: Change default inference_base_url for minimax and minimax-cn
  from /v1 to /anthropic
- runtime_provider.py: Auto-correct stale /v1 URLs from existing .env
  files to /anthropic, and always default minimax/minimax-cn providers
  to anthropic_messages mode
- Update tests to reflect new defaults, add tests for stale URL
  auto-correction and explicit api_mode override

Based on PR #2100 by @devorun. Fixes #2094.

Co-authored-by: Test <test@test.com>
2026-03-19 17:47:05 -07:00
InB4DevOps
fe331ed9bd fix: Reset token counters on new session for accurate usage display (#2099) 2026-03-20 01:21:25 +01:00
Peppi Littera
746abf5e28 fix: use reasoning content as response when model only produces think blocks
Local models (especially Qwen 3.5) sometimes wrap their entire response
inside <think> tags, leaving actual content empty. Previously this caused
3 retries and then an error, wasting tokens and failing the request.

Now when retries are exhausted and reasoning_text contains the response,
it is used as final_response instead of returning an error. The user
sees the actual answer instead of "Model generated only think blocks."
2026-03-20 00:26:36 +01:00
hermes
4d2c93a04f fix: normalize MCP object schemas without properties 2026-03-19 16:23:45 -07:00
Zakir Jiwani
3959e3cadb fix: add minisweagent_path to py-modules in pyproject.toml
Closes #2075
2026-03-19 22:20:44 +00:00
Peppi Littera
ec5fdb8b92 feat: query local servers for actual context window size
Custom endpoints (LM Studio, Ollama, vLLM, llama.cpp) silently fall
back to 2M tokens when /v1/models doesn't include context_length.

Adds _query_local_context_length() which queries server-specific APIs:
- LM Studio: /api/v1/models (max_context_length + loaded instances)
- Ollama: /api/show (model_info + num_ctx parameters)
- llama.cpp: /props (n_ctx from default_generation_settings)
- vLLM: /v1/models/{model} (max_model_len)

Prefers loaded instance context over max (e.g., 122K loaded vs 1M max).
Results are cached via save_context_length() to avoid repeated queries.

Also fixes detect_local_server_type() misidentifying LM Studio as
Ollama (LM Studio returns 200 for /api/tags with an error body).
2026-03-19 21:32:04 +01:00
Peppi Littera
c030ac1d85 fix: prefer loaded instance context size over max for LM Studio
When LM Studio has a model loaded with a custom context size (e.g.,
122K), prefer that over the model's max_context_length (e.g., 1M).
This makes the TUI status bar show the actual runtime context window.
2026-03-19 21:24:53 +01:00
Peppi Littera
d223f7388d feat: query local server for actual context window size
Instead of defaulting to 2M for unknown local models, query the server
API for the real context length. Supports Ollama (/api/show), vLLM
(max_model_len), and LM Studio (/v1/models). Results are cached to
avoid repeated queries.
2026-03-19 21:24:05 +01:00
ygd58
816d1344ee fix(delegate): save parent tool names before child construction mutates global 2026-03-19 20:27:26 +01:00
Teknium
4c0c7f4c6e fix: /model command — bare provider names, custom endpoint display
Two issues with /model preventing proper provider switching:

1. Bare provider names not detected: typing '/model nous' treated 'nous'
   as a model name instead of triggering a provider switch. Fixed by adding
   step 0 in detect_provider_for_model() that checks if the input matches
   a known provider name/alias (excluding 'custom'/'openrouter' which need
   explicit model names) and returns that provider's default model.

2. Custom endpoint details hidden: /model (no args) showed '[custom]' with
   just a usage hint but no endpoint URL or model name. Now displays the
   configured base_url for custom providers in both CLI and gateway.

Note: config base_url and OPENAI_BASE_URL are intentionally NOT cleared on
provider switch — dedicated provider paths (nous, anthropic, codex) have
their own credential resolution that ignores these, and clearing them would
destroy the user's custom endpoint config, preventing switching back.

Co-authored-by: Test <test@test.com>
2026-03-19 12:06:48 -07:00
StefanIsMe
04b6ecadc4 feat(cli): Tab now accepts auto-suggestions (ghost text)
Previously, Tab only handled dropdown completions. Users seeing gray
ghost text from history-based suggestions had no way to accept them
with Tab - they had to use Right arrow or Ctrl+E.

Now Tab follows priority:
1. Completion menu open → accept selected completion
2. Ghost text suggestion available → accept auto-suggestion
3. Otherwise → start completion menu

This matches user intuition that Tab should 'complete what I see.'
2026-03-19 10:40:37 -07:00
Teknium
e84d952dc0 fix(codex): handle reasoning-only responses and replay path (#2070)
* fix(codex): treat reasoning-only responses as incomplete, not stop

When a Codex Responses API response contains only reasoning items
(encrypted thinking state) with no message text or tool calls, the
_normalize_codex_response method was setting finish_reason='stop'.
This sent the response into the empty-content retry loop, which
burned 3 retries and then failed — exactly the pattern Nester
reported in Discord.

Two fixes:
1. _normalize_codex_response: reasoning-only responses (reasoning_items_raw
   non-empty but no final_text) now get finish_reason='incomplete', routing
   them to the Codex continuation path instead of the retry loop.
2. Incomplete handling: also checks for codex_reasoning_items when deciding
   whether to preserve an interim message, so encrypted reasoning state is
   not silently dropped when there is no visible reasoning text.

Adds 4 regression tests covering:
- Unit: reasoning-only → incomplete, reasoning+content → stop
- E2E: reasoning-only → continuation → final answer succeeds
- E2E: encrypted reasoning items preserved in interim messages

* fix(codex): ensure reasoning items have required following item in API input

Follow-up to the reasoning-only response fix. Three additional issues
found by tracing the full replay path:

1. _chat_messages_to_responses_input: when a reasoning-only interim
   message was converted to Responses API input, the reasoning items
   were emitted as the last items with no following item. The Responses
   API requires a following item after each reasoning item (otherwise:
   'missing_following_item' error, as seen in OpenHands #11406). Now
   emits an empty assistant message as the required following item when
   content is empty but reasoning items were added.

2. Duplicate detection: two consecutive reasoning-only incomplete
   messages with identical empty content/reasoning but different
   encrypted codex_reasoning_items were incorrectly treated as
   duplicates, silently dropping the second response's reasoning state.
   Now includes codex_reasoning_items in the duplicate comparison.

3. Added tests for both the API input conversion path and the duplicate
   detection edge case.

Research context: verified against OpenCode (uses Vercel AI SDK, no
retry loop so avoids the issue), Clawdbot (drops orphaned reasoning
blocks entirely), and OpenHands (hit the missing_following_item error).
Our approach preserves reasoning continuity while satisfying the API
constraint.

---------

Co-authored-by: Test <test@test.com>
2026-03-19 10:34:44 -07:00
Teknium
388130a122 fix: persist ACP sessions to SessionDB so they survive process restarts
* fix: persist ACP sessions to disk so they survive process restarts

The ACP adapter stored sessions entirely in-memory. When the editor
restarted the ACP subprocess (idle timeout, crash, system sleep/wake,
editor restart), all sessions were lost. The editor's load_session /
resume_session calls would fail to find the session, forcing a new
empty session and losing all conversation history.

Changes:
- SessionManager now persists each session as a JSON file under
  ~/.hermes/acp_sessions/<session_id>.json
- get_session() transparently restores from disk when not in memory
- update_cwd(), fork_session(), list_sessions() all check disk
- server.py calls save_session() after prompt completion, /reset,
  /compact, and model switches
- cleanup() and remove_session() delete disk files too
- Sessions have a 7-day TTL; expired sessions are pruned on startup
- Atomic writes via tempfile + os.replace to prevent corruption
- 11 new tests covering persistence, disk restoration, and TTL expiry

* refactor: use SessionDB instead of JSON files for ACP session persistence

Replace the standalone JSON file persistence layer with SessionDB
(~/.hermes/state.db) integration. ACP sessions now:
- Share the same DB as CLI and gateway sessions
- Are searchable via session_search (FTS5)
- Get token tracking, cost tracking, and session titles for free
- Follow existing session pruning policies

Key changes:
- _get_db() lazily creates a SessionDB, resolving HERMES_HOME
  dynamically (not at import time) for test compatibility
- _persist() creates session record + replaces messages in DB
- _restore() loads from DB with source='acp' filter
- cwd stored in model_config JSON field (no schema migration)
- Model values coerced to str to handle mock agents in tests
- Removed: json files, sessions_dir, ttl_days, _expire logic
- Tests updated: DB-backed persistence, FTS search, tool_call
  round-tripping, source filtering

---------

Co-authored-by: Test <test@test.com>
2026-03-19 10:30:50 -07:00
cmcleay
bb59057d5d fix: normalize live Chrome CDP endpoints for browser tools 2026-03-19 10:17:03 -07:00
Teknium
36a4481152 fix: prevent unavailable tool names from leaking into model schemas
* fix: prevent unavailable tool names from leaking into model schemas

When web_search/web_extract fail check_fn (no API key configured), their
names were still leaking into tool descriptions via two paths:

1. execute_code schema: sandbox_enabled was computed from tools_to_include
   (pre-filter) instead of the actual available tools (post-filter), so
   the execute_code description listed web_search/web_extract as available
   sandbox imports even when they weren't.

2. browser_navigate schema: hardcoded description said 'prefer web_search
   or web_extract' regardless of whether those tools existed.

The model saw these references, assumed the tools existed, and tried
calling them directly — triggering 'Unknown tool' errors.

Fix: compute available_tool_names from the filtered result set and use
that for both execute_code sandbox listing and browser_navigate description
patching.

* docs: add pitfall about cross-tool references in schema descriptions

---------

Co-authored-by: Test <test@test.com>
2026-03-19 10:08:14 -07:00
Test
efa753678c Merge PR #2064: feat(tools): add base_url support to OpenAI TTS provider
Authored by Hanai. Allows overriding the OpenAI TTS endpoint via
tts.openai.base_url in config.yaml for self-hosted or OpenAI-compatible
TTS services. Falls back to api.openai.com when not set.
2026-03-19 10:07:58 -07:00
Test
7f3a567259 Merge PR #2063: fix(daytona): migrate sandbox lookup from find_one to get/list
Authored by Lovre Pešut (rovle). Migrates from deprecated find_one(labels=...)
to get(sandbox_name) with deterministic naming (hermes-{task_id}), plus legacy
fallback via list(labels=...) for pre-migration sandboxes.
2026-03-19 10:01:40 -07:00
Yannick Stephan
defbe0f9e9 fix(cron): warn and skip missing skills instead of crashing job
When a cron job references a skill that is no longer installed,
_build_job_prompt() now logs a warning and injects a user-visible notice
into the prompt instead of raising RuntimeError. The job continues with
any remaining valid skills and the user prompt.

Adds 4 regression tests for missing skill handling.
2026-03-19 09:56:16 -07:00
rovle
18862145e4 fix(daytona): migrate sandbox lookup from find_one to get/list
find_one is being deprecated. Primary lookup now uses get() with a
deterministic sandbox name (hermes-{task_id}). A legacy fallback via
list(labels=...) ensures sandboxes created before this migration are
still resumable.
2026-03-19 17:54:46 +01:00
Test
35558dadf4 Merge PR #2061: fix(security): eliminate SQL string formatting in execute() calls
Authored by dusterbloom. Closes #1911.

Pre-computes SQL query strings at class definition time in insights.py,
adds identifier quoting for ALTER TABLE DDL in hermes_state.py, and adds
4 regression tests verifying query construction safety.
2026-03-19 09:52:00 -07:00
Test
ae8059ca24 fix(delegate): move _saved_tool_names assignment to correct scope
The merge at e7844e9c re-introduced a line in _build_child_agent() that
references _saved_tool_names — a variable only defined in _run_single_child().
This caused NameError on every delegate_task call, completely breaking
subagent delegation.

Moves the child._delegate_saved_tool_names assignment to _run_single_child()
where _saved_tool_names is actually defined, keeping the save/restore in the
same scope as the try/finally block.

Adds two regression tests from PR #2038 (YanSte).
Also fixes the same issue reported in PR #2048 (Gutslabs).

Co-authored-by: Yannick Stephan <yannick.stephan@gmail.com>
Co-authored-by: Guts <gutslabs@users.noreply.github.com>
2026-03-19 09:26:05 -07:00
Han
116984feb7 feat(tools): add base_url support to OpenAI TTS provider
Allow users to configure a custom base_url for the OpenAI TTS provider
in ~/.hermes/config.yaml under tts.openai.base_url. Defaults to the
official OpenAI endpoint. Enables use of self-hosted or OpenAI-compatible
TTS services (e.g. http://localhost:8000/v1).

Also adds a TTS configuration example block to cli-config.yaml.example.
2026-03-19 23:55:13 +08:00
Peppi Littera
219af75704 fix(security): eliminate SQL string formatting in execute() calls
Closes #1911

- insights.py: Pre-compute SELECT queries as class constants instead of
  f-string interpolation at runtime. _SESSION_COLS is now evaluated once
  at class definition time.
- hermes_state.py: Add identifier quoting and whitelist validation for
  ALTER TABLE column names in schema migrations.
- Add 4 tests verifying no injection vectors in SQL query construction.
2026-03-19 15:16:35 +01:00
Teknium
d76fa7fc37 fix: detect context length for custom model endpoints via fuzzy matching + config override (#2051)
* fix: detect context length for custom model endpoints via fuzzy matching + config override

Custom model endpoints (non-OpenRouter, non-known-provider) were silently
falling back to 2M tokens when the model name didn't exactly match what the
endpoint's /v1/models reported. This happened because:

1. Endpoint metadata lookup used exact match only — model name mismatches
   (e.g. 'qwen3.5:9b' vs 'Qwen3.5-9B-Q4_K_M.gguf') caused a miss
2. Single-model servers (common for local inference) required exact name
   match even though only one model was loaded
3. No user escape hatch to manually set context length

Changes:
- Add fuzzy matching for endpoint model metadata: single-model servers
  use the only available model regardless of name; multi-model servers
  try substring matching in both directions
- Add model.context_length config override (highest priority) so users
  can explicitly set their model's context length in config.yaml
- Log an informative message when falling back to 2M probe, telling
  users about the config override option
- Thread config_context_length through ContextCompressor and AIAgent init

Tests: 6 new tests covering fuzzy match, single-model fallback, config
override (including zero/None edge cases).

* fix: auto-detect local model name and context length for local servers

Cherry-picked from PR #2043 by sudoingX.

- Auto-detect model name from local server's /v1/models when only one
  model is loaded (no manual model name config needed)
- Add n_ctx_train and n_ctx to context length detection keys for llama.cpp
- Query llama.cpp /props endpoint for actual allocated context (not just
  training context from GGUF metadata)
- Strip .gguf suffix from display in banner and status bar
- _auto_detect_local_model() in runtime_provider.py for CLI init

Co-authored-by: sudo <sudoingx@users.noreply.github.com>

* fix: revert accidental summary_target_tokens change + add docs for context_length config

- Revert summary_target_tokens from 2500 back to 500 (accidental change
  during patching)
- Add 'Context Length Detection' section to Custom & Self-Hosted docs
  explaining model.context_length config override

---------

Co-authored-by: Test <test@test.com>
Co-authored-by: sudo <sudoingx@users.noreply.github.com>
2026-03-19 06:01:16 -07:00
Teknium
7b6d14e62a fix(gateway): replace bare text approval with /approve and /deny commands (#2002)
The gateway approval system previously intercepted bare 'yes'/'no' text
from the user's next message to approve/deny dangerous commands. This was
fragile and dangerous — if the agent asked a clarify question and the user
said 'yes' to answer it, the gateway would execute the pending dangerous
command instead. (Fixes #1888)

Changes:
- Remove bare text matching ('yes', 'y', 'approve', 'ok', etc.) from
  _handle_message approval check
- Add /approve and /deny as gateway-only slash commands in the command
  registry
- /approve supports scoping: /approve (one-time), /approve session,
  /approve always (permanent)
- Add 5-minute timeout for stale approvals
- Gateway appends structured instructions to the agent response when a
  dangerous command is pending, telling the user exactly how to respond
- 9 tests covering approve, deny, timeout, scoping, and verification
  that bare 'yes' no longer triggers execution

Credit to @solo386 and @FlyByNight69420 for identifying and reporting
this security issue in PR #1971 and issue #1888.

Co-authored-by: Test <test@test.com>
2026-03-18 16:58:20 -07:00
Teknium
67d707e851 fix: respect config.yaml model.base_url for Anthropic provider (#1948) (#1998)
After #1675 removed ANTHROPIC_BASE_URL env var support, the Anthropic
provider base URL was hardcoded to https://api.anthropic.com. Now reads
model.base_url from config.yaml as an override, falling back to the
default when not set. Also applies to the auxiliary client.

Cherry-picked from PR #1949 by @rivercrab26.

Co-authored-by: rivercrab26 <rivercrab26@users.noreply.github.com>
2026-03-18 16:51:24 -07:00
Teknium
e648863d52 docs: fix documentation inconsistencies across reference and user guides
- toolsets-reference: add browser_console to browser + all platform toolsets,
  add missing hermes-acp, hermes-sms, messaging toolsets, correct hermes-gateway
  as composite, deduplicate platform toolset listings
- tools-reference: add missing vision and web toolset sections
- slash-commands: fix /new+/reset as alias (not separate commands), add /stop to
  CLI section (available in both CLI and gateway), add /plugins command, fix Notes
  section about messaging-only vs CLI-only
- environment-variables: fix HERMES_MAX_ITERATIONS default (90 not 60), add
  DEEPSEEK_API_KEY/BASE_URL, OPENCODE_ZEN/GO keys, TAVILY_API_KEY,
  GITHUB_TOKEN, HERMES_EPHEMERAL_SYSTEM_PROMPT
- configuration: remove duplicate Alibaba Cloud row, add OpenCode Zen/Go providers
- cli-commands: add missing providers to --provider list (opencode-zen,
  opencode-go, ai-gateway, kilocode, alibaba)
- quickstart: add OpenCode Zen and OpenCode Go to provider table

Co-authored-by: Test <test@test.com>
2026-03-18 16:26:27 -07:00
Teknium
a7cc1cf309 fix: support Anthropic-compatible endpoints for third-party providers (#1997)
Three bugs prevented providers like MiniMax from using their
Anthropic-compatible endpoints (e.g. api.minimax.io/anthropic):

1. _VALID_API_MODES was missing 'anthropic_messages', so explicit
   api_mode config was silently rejected and defaulted to
   chat_completions.

2. API-key provider resolution hardcoded api_mode to 'chat_completions'
   without checking model config or detecting Anthropic-compatible URLs.

3. run_agent.py auto-detection only recognized api.anthropic.com, not
   third-party endpoints using the /anthropic URL convention.

Fixes:
- Add 'anthropic_messages' to _VALID_API_MODES
- API-key providers now check model config api_mode and auto-detect
  URLs ending in /anthropic
- run_agent.py and fallback logic detect /anthropic URL convention
- 5 new tests covering all scenarios

Users can now either:
- Set MINIMAX_BASE_URL=https://api.minimax.io/anthropic (auto-detected)
- Set api_mode: anthropic_messages in model config (explicit)
- Use custom_providers with api_mode: anthropic_messages

Co-authored-by: Test <test@test.com>
2026-03-18 16:26:06 -07:00
Teknium
f24db23458 fix: custom provider uses config base_url and api_key over env vars (#1760) (#1994)
When provider: custom is set in config.yaml with base_url and api_key,
those values are now used instead of falling back to OPENAI_BASE_URL and
OPENAI_API_KEY env vars. Also reads the 'api' field as an alternative to
'api_key' for config compatibility.

Cherry-picked from PR #1762 by crazywriter1.

Co-authored-by: crazywriter1 <53251494+crazywriter1@users.noreply.github.com>
2026-03-18 16:00:14 -07:00
Teknium
d132e344d7 fix(agent): prevent silent tool result loss during context compression (#1993)
_align_boundary_backward only checked messages[idx-1] to decide if
the compress-end boundary splits a tool_call/result group. When an
assistant issues 3+ parallel tool calls, their results span multiple
consecutive messages. If the boundary fell in the middle of that group,
the parent assistant was summarized away and orphaned tool results were
silently deleted by _sanitize_tool_pairs.

Now walks backward through all consecutive tool results to find the
parent assistant, then pulls the boundary before the entire group.

6 regression tests added in tests/test_compression_boundary.py.

Co-authored-by: Guts <Gutslabs@users.noreply.github.com>
2026-03-18 15:22:51 -07:00
Teknium
22f41daded fix: send error details to user in gateway outer exception handler
Previously, if an error occurred during response processing in
_process_message_background (e.g. during extract_media, send, or
any uncaught exception from the handler), the error was only logged
to server console and the user was left with radio silence — typing
indicator stops but no message arrives.

Now the outer except block attempts to send the error type and detail
(truncated to 300 chars) to the user's chat, matching the format
already used by the inner handler in gateway/run.py.

Co-authored-by: Test <test@test.com>
2026-03-18 10:42:43 -07:00
Teknium
7c7feaa033 Merge pull request #1929 from NousResearch/hermes/hermes-b29f73b2
feat: inject model and provider into system prompt
2026-03-18 04:18:41 -07:00
Teknium
2f80bd9f87 fix: whatsapp reply_prefix config.yaml bridging was dead code (#1923)
The whatsapp reply_prefix bridging referenced config.platforms before
the config object was constructed, making it a silent NameError caught
by except Exception: pass.

Fix: fold reply_prefix into the per-platform bridging loop (introduced
in #1919) which correctly writes to gw_data dict pre-construction.
Removes the broken standalone whatsapp bridging block.

Co-authored-by: Test <test@test.com>
2026-03-18 04:18:33 -07:00
Teknium
23e5e8dde9 Merge pull request #1928 from NousResearch/hermes/hermes-ba3c8fa1
chore: trim huggingface-hub skill description
2026-03-18 04:18:27 -07:00
Test
e99aca98ab feat: inject model and provider into system prompt
Adds model name and provider to the system prompt metadata block,
alongside the existing session ID and timestamp. These are frozen
at session start and don't change mid-conversation, so they won't
break prompt caching.
2026-03-18 04:18:26 -07:00
Test
7e30e97a59 chore: trim redundant trigger sentence from huggingface-hub description 2026-03-18 04:18:13 -07:00
Teknium
db4dfea7ec docs: document SOUL.md as primary agent identity (#1927)
Update all SOUL.md documentation to reflect that it now occupies
slot #1 in the system prompt, replacing the hardcoded default identity.

Updated pages:
- user-guide/features/personality.md — SOUL.md is primary identity, not just a layer
- developer-guide/prompt-assembly.md — updated prompt layer order, context files list
- guides/use-soul-with-hermes.md — SOUL.md replaces built-in identity
- user-guide/configuration.md — updated context files table and directory tree

Co-authored-by: Test <test@test.com>
2026-03-18 04:18:08 -07:00
Teknium
17254a7692 Merge pull request #1926 from NousResearch/hermes/hermes-ba3c8fa1
chore: add search to huggingface-hub skill description
2026-03-18 04:15:17 -07:00
Test
adf188c439 chore: add search to huggingface-hub skill description 2026-03-18 04:15:03 -07:00
Teknium
21958a55d1 Merge pull request #1925 from NousResearch/hermes/hermes-ba3c8fa1
chore: tighten huggingface-hub skill description
2026-03-18 04:11:43 -07:00
Test
947827bba0 chore: tighten huggingface-hub skill description 2026-03-18 04:11:33 -07:00
Teknium
e4a3ffa9c1 feat: use SOUL.md as primary agent identity instead of hardcoded default (#1922)
SOUL.md now loads in slot #1 of the system prompt, replacing the
hardcoded DEFAULT_AGENT_IDENTITY. This lets users fully customize
the agent's identity and personality by editing ~/.hermes/SOUL.md
without it conflicting with the built-in identity text.

When SOUL.md is loaded as identity, it's excluded from the context
files section to avoid appearing twice. When SOUL.md is missing,
empty, unreadable, or skip_context_files is set, the hardcoded
DEFAULT_AGENT_IDENTITY is used as a fallback.

The default SOUL.md (seeded on first run) already contains the full
Hermes personality, so existing installs are unaffected.

Co-authored-by: Test <test@test.com>
2026-03-18 04:11:20 -07:00
Teknium
1fa3737134 feat: GitHub Copilot provider integration (#1924)
feat: GitHub Copilot provider integration with OAuth auth, API routing, and docs
2026-03-18 04:09:30 -07:00
Test
e7844e9c8d Merge origin/main, resolve conflicts (self._base_url_lower) 2026-03-18 04:09:00 -07:00
Teknium
1c761ae042 feat: add huggingface-hub bundled skill (#1921)
feat: add huggingface-hub bundled skill
2026-03-18 04:08:00 -07:00
Test
56ca84f243 feat: add huggingface-hub bundled skill
Adds the Hugging Face CLI (hf) reference as a built-in skill under
mlops/. Covers downloading/uploading models and datasets, repo
management, SQL queries on datasets, inference endpoints, Spaces,
buckets, and more.

Based on the official HF skill from huggingface/skills.
2026-03-18 04:07:41 -07:00
Test
04101bc59e docs: comprehensive GitHub Copilot provider documentation
- Add dedicated GitHub Copilot section in configuration guide with:
  - Auth options (OAuth device code, env vars, gh CLI)
  - Token type table (supported vs unsupported)
  - API routing explanation (GPT-5+ → Responses, others → Chat)
  - Copilot ACP setup instructions
  - Environment variable reference
- Add all Copilot env vars to environment-variables.md:
  COPILOT_GITHUB_TOKEN, HERMES_COPILOT_ACP_COMMAND, etc.
- Add copilot-acp to --provider list in cli-commands.md
- Docs build verified
2026-03-18 04:07:34 -07:00
Teknium
0a247a50f2 feat: support ignoring unauthorized gateway DMs (#1919)
Add unauthorized_dm_behavior config (pair|ignore) with global default
and per-platform override. WhatsApp can silently drop unknown DMs
instead of sending pairing codes.

Adapted config bridging to work with gw_data dict (pre-construction)
rather than config object. Dropped implementation plan document.

Co-authored-by: Frederico Ribeiro <fr@tecompanytea.com>
2026-03-18 04:06:08 -07:00
Teknium
0e2714acea fix(cron): recover recent one-shot jobs (#1918)
Co-authored-by: Frederico Ribeiro <fr@tecompanytea.com>
2026-03-18 04:06:02 -07:00
Test
36921a3e98 fix: correct Copilot API mode selection to match opencode
The previous copilot_model_api_mode() checked the catalog's
supported_endpoints first and picked /chat/completions when a model
supported both endpoints. This is wrong — GPT-5+ models should use
the Responses API even when the catalog lists both.

Replicate opencode's shouldUseCopilotResponsesApi() logic:
- GPT-5+ models (gpt-5.4, gpt-5.3-codex, etc.) → Responses API
- gpt-5-mini → Chat Completions (explicit exception)
- Everything else (gpt-4o, claude, gemini, etc.) → Chat Completions
- Model ID pattern is the primary signal, catalog is secondary

The catalog fallback now only matters for non-GPT-5 models that might
exclusively support /v1/messages (e.g. Claude via Copilot).

Models are auto-detected from the live catalog at
api.githubcopilot.com/models — no hardcoded list required for
supported models, only a static fallback for when the API is
unreachable.
2026-03-18 03:54:50 -07:00
Teknium
c1a127c87c Merge pull request #1917 from NousResearch/hermes/hermes-b29f73b2
feat(cli): add /statusbar command to toggle context bar
2026-03-18 03:50:05 -07:00
Test
c1750bb32d feat(cli): add /statusbar command to toggle context bar
Adds /statusbar (alias /sb) to show/hide the bottom status bar that
displays model name, context usage, and session duration.

Uses ConditionalContainer so the bar takes zero space when hidden
rather than leaving a blank line.
2026-03-18 03:49:49 -07:00
Teknium
4699c226da chore: reorder OpenRouter model catalog (#1916)
chore: reorder OpenRouter model catalog
2026-03-18 03:31:19 -07:00
Test
b05f9b6256 chore: reorder OpenRouter catalog — glm-5-turbo under glm-5, minimax under stepfun 2026-03-18 03:31:04 -07:00
Teknium
0679712d26 feat: reorder OpenRouter catalog, add haiku-4.5, fix minimax slug (#1915)
feat: reorder OpenRouter catalog, add haiku-4.5, fix minimax slug
2026-03-18 03:26:22 -07:00
Test
cb54750e07 feat: reorder OpenRouter catalog, add haiku-4.5, fix minimax slug
- Add anthropic/claude-haiku-4.5
- Move gpt-5.4-pro and gpt-5.4-nano to bottom
- Fix minimax/minimax-m2.7 → minimax-m2.5 (m2.7 not on OpenRouter)
- Tag hunter-alpha and healer-alpha as free
- Place hunter/healer-alpha right below gpt-5.4-mini
2026-03-18 03:26:06 -07:00
Test
21c45ba0ac feat: proper Copilot auth with OAuth device code flow and token validation
Builds on PR #1879's Copilot integration with critical auth improvements
modeled after opencode's implementation:

- Add hermes_cli/copilot_auth.py with:
  - OAuth device code flow (copilot_device_code_login) using the same
    client_id (Ov23li8tweQw6odWQebz) as opencode and Copilot CLI
  - Token type validation: reject classic PATs (ghp_*) with a clear
    error message explaining supported token types
  - Proper env var priority: COPILOT_GITHUB_TOKEN > GH_TOKEN > GITHUB_TOKEN
    (matching Copilot CLI documentation)
  - copilot_request_headers() with Openai-Intent, x-initiator, and
    Copilot-Vision-Request headers (matching opencode)

- Update auth.py:
  - PROVIDER_REGISTRY copilot entry uses correct env var order
  - _resolve_api_key_provider_secret delegates to copilot_auth for
    the copilot provider with proper token validation

- Update models.py:
  - copilot_default_headers() now includes Openai-Intent and x-initiator

- Update main.py:
  - _model_flow_copilot offers OAuth device code login when no token
    is found, with manual token entry as fallback
  - Shows supported vs unsupported token types

- 22 new tests covering token validation, env var priority, header
  generation, and integration with existing auth infrastructure
2026-03-18 03:25:58 -07:00
Teknium
c0c14e60b4 fix: make concurrent tool batching path-aware for file mutations (#1914)
* Improve tool batching independence checks

* fix: address review feedback on path-aware batching

- Log malformed/non-dict tool arguments at debug level before
  falling back to sequential, instead of silently swallowing
  the error into an empty dict
- Guard empty paths in _paths_overlap (unreachable in practice
  due to upstream filtering, but makes the invariant explicit)
- Add tests: malformed JSON args, non-dict args, _paths_overlap
  unit tests including empty path edge cases
- web_crawl is not a registered tool (only web_search/web_extract
  are); no addition needed to _PARALLEL_SAFE_TOOLS

---------

Co-authored-by: kshitij <82637225+kshitijk4poor@users.noreply.github.com>
2026-03-18 03:25:38 -07:00
Teknium
050b43108c feat: add gpt-5.4-mini, gpt-5.4-nano, healer-alpha to OpenRouter catalog (#1913)
feat: add gpt-5.4-mini, gpt-5.4-nano, healer-alpha to OpenRouter catalog
2026-03-18 03:23:36 -07:00
Test
00cc0c6a28 feat: add gpt-5.4-mini, gpt-5.4-nano, healer-alpha to OpenRouter catalog 2026-03-18 03:23:20 -07:00
Teknium
bee13d9921 Merge pull request #1912 from NousResearch/hermes/hermes-b29f73b2
fix(banner): normalize toolset labels and use skin colors
2026-03-18 03:23:15 -07:00
Test
f814787144 fix(banner): normalize toolset labels and use skin colors
- Strip '_tools' suffix from internal toolset identifiers in the banner
  (e.g. 'web_tools' -> 'web', 'homeassistant_tools' -> 'homeassistant')
- Stop appending '_tools' to unavailable toolset names
- Replace 6 hardcoded hex colors (#B8860B, #FFBF00, #FFF8DC) in toolset
  rows, overflow line, and MCP server rows with the skin variables
  (dim, accent, text) already resolved at the top of the function

Inspired by PR #1871 by @kshitijk4poor.
Adds 4 tests.
2026-03-18 03:22:58 -07:00
Teknium
c9bb0c587f fix: direct user message on STT failure + hermes-agent-setup skill (#1905)
fix: direct user message on STT failure + hermes-agent-setup skill
2026-03-18 03:21:12 -07:00
Test
8422196e89 Merge PR #1879: feat: integrate GitHub Copilot providers 2026-03-18 03:18:33 -07:00
Teknium
b70dd51cfa fix: disabled skills respected across banner, system prompt, slash commands, and skill_view (#1897)
* fix: banner skill count now respects disabled skills and platform filtering

The banner's get_available_skills() was doing a raw rglob scan of
~/.hermes/skills/ without checking:
- Whether skills are disabled (skills.disabled config)
- Whether skills match the current platform (platforms: frontmatter)

This caused the banner to show inflated skill counts (e.g. '100 skills'
when many are disabled) and list macOS-only skills on Linux.

Fix: delegate to _find_all_skills() from tools/skills_tool which already
handles both platform gating and disabled-skill filtering.

* fix: system prompt and slash commands now respect disabled skills

Two more places where disabled skills were still surfaced:

1. build_skills_system_prompt() in prompt_builder.py — disabled skills
   appeared in the <available_skills> system prompt section, causing
   the agent to suggest/load them despite being disabled.

2. scan_skill_commands() in skill_commands.py — disabled skills still
   registered as /skill-name slash commands in CLI help and could be
   invoked.

Both now load _get_disabled_skill_names() and filter accordingly.

* fix: skill_view blocks disabled skills

skill_view() checked platform compatibility but not disabled state,
so the agent could still load and read disabled skills directly.

Now returns a clear error when a disabled skill is requested, telling
the user to enable it via hermes skills or inspect the files manually.

---------

Co-authored-by: Test <test@test.com>
2026-03-18 03:17:37 -07:00
Test
190c07975d fix: check skill availability before hinting at hermes-agent-setup
Only mention the hermes-agent-setup skill in STT failure notes (both
the direct user message and the agent context note) when the skill is
actually installed. Uses _find_skill() from skill_manager_tool.

Also confirmed: STT is the only user-facing failure case where the
setup skill hint helps. Vision failures are transient API issues,
runtime transcription errors indicate a configured-but-broken provider,
and platform startup warnings are server logs.
2026-03-18 03:17:23 -07:00
Teknium
011ed540dd Merge pull request #1909 from NousResearch/hermes/hermes-b29f73b2
docs: fix MCP install commands — use uv, not bare pip
2026-03-18 03:15:15 -07:00
Test
a9c405fac9 docs: fix MCP install commands — use uv, not bare pip
The standard install already includes MCP via .[all]. For users who
need to add it separately, the correct command is:
  cd ~/.hermes/hermes-agent && uv pip install -e ".[mcp]"

The venv is created by uv, so bare 'pip' isn't available. All four
occurrences across 3 docs pages updated.
2026-03-18 03:14:58 -07:00
Teknium
9c174e0940 Merge pull request #1908 from NousResearch/hermes/hermes-b29f73b2
fix(gateway): detect script-style gateway processes for --replace
2026-03-18 03:13:21 -07:00
TheSameCat2
5c4c4b8b7d fix(gateway): detect script-style gateway processes for --replace
Recognize hermes_cli/main.py gateway command lines in gateway
process detection and PID validation so --replace reliably finds
existing gateway instances.

Adds a regression test covering script-style cmdline detection.

Closes #1830
2026-03-18 03:12:59 -07:00
Test
764825bbff feat: expand hermes-agent-setup skill + tell agent about it in STT notes
Skill now covers full CLI usage (hermes setup, hermes skills, hermes
tools, hermes config, session management, etc.), config file reference,
and expanded gateway commands.

Agent context notes for STT failure now mention the hermes-agent-setup
skill is available to help users configure Hermes features.
2026-03-18 03:05:17 -07:00
Teknium
ee4cc8ee3b Merge pull request #1907 from NousResearch/hermes/hermes-b29f73b2
feat(mcp): expose MCP servers as standalone toolsets
2026-03-18 03:04:34 -07:00
Test
4b53b89f09 feat(mcp): expose MCP servers as standalone toolsets
Each configured MCP server now registers as its own toolset in TOOLSETS
(e.g. TOOLSETS['github'] = {tools: ['mcp_github_list_files', ...]}),
making raw server names resolvable in platform_toolsets overrides.

Previously MCP tools were only injected into hermes-* umbrella toolsets,
so gateway sessions using raw toolset names like ['terminal', 'github']
in platform_toolsets couldn't resolve MCP tools.

Skips server names that collide with built-in toolsets. Also handles
idempotent reloads (syncs toolsets even when no new servers connect).

Inspired by PR #1876 by @kshitijk4poor.
Adds 2 tests (standalone toolset creation + built-in collision guard).
2026-03-18 03:04:17 -07:00
Teknium
a2440f72f6 feat: use endpoint metadata for custom model context and pricing (#1906)
* perf: cache base_url.lower() via property, consolidate triple load_config(), hoist set constant

run_agent.py:
- Add base_url property that auto-caches _base_url_lower on every
  assignment, eliminating 12+ redundant .lower() calls per API cycle
  across __init__, _build_api_kwargs, _supports_reasoning_extra_body,
  and the main conversation loop
- Consolidate three separate load_config() disk reads in __init__
  (memory, skills, compression) into a single call, reusing the
  result dict for all three config sections

model_tools.py:
- Hoist _READ_SEARCH_TOOLS set to module level (was rebuilt inside
  handle_function_call on every tool invocation)

* Use endpoint metadata for custom model context and pricing

---------

Co-authored-by: kshitij <82637225+kshitijk4poor@users.noreply.github.com>
2026-03-18 03:04:07 -07:00
Test
9c0f346258 fix: direct user message on STT failure + hermes-agent-setup skill
When a user sends a voice message and STT isn't configured, the gateway
now sends a clear message directly to the user explaining how to set up
voice transcription, rather than relying on the agent to relay an
injected context note (which often gets misinterpreted).

Also adds a hermes-agent-setup bundled skill covering STT/TTS setup,
tool configuration, dependency installation, and troubleshooting.
2026-03-18 03:01:41 -07:00
Teknium
11f029c311 fix(tts): document NeuTTS provider and align install guidance (#1903)
Co-authored-by: charles-édouard <59705750+ccbbccbb@users.noreply.github.com>
2026-03-18 02:55:30 -07:00
Teknium
fb923d5efc Merge pull request #1902 from NousResearch/hermes/hermes-b29f73b2
fix(gateway): PID-based wait with force-kill for gateway restart
2026-03-18 02:54:38 -07:00
Test
ace2cc6257 fix(gateway): PID-based wait with force-kill for gateway restart
Add _wait_for_gateway_exit() that polls get_running_pid() to confirm
the old gateway process has actually exited before starting a new one.
If the process doesn't exit within 5s, sends SIGKILL to the specific
PID. Uses the saved PID from gateway.pid (not launchd labels) so it
works correctly with multiple gateway instances under separate
HERMES_HOME directories.

Applied to both launchd_restart() and the manual restart path (replaces
the blind time.sleep(2)).

Inspired by PR #1881 by @AzothZephyr (race condition diagnosis).
Adds 4 tests.
2026-03-18 02:54:18 -07:00
Teknium
24ac577046 fix: respect model.default from config.yaml for openai-codex provider (#1896)
When config.yaml had a non-default model (e.g. gpt-5.3-codex) and the
provider was openai-codex, _normalize_model_for_provider() would replace
it with the latest available codex model because _model_is_default only
checked the CLI argument, not the config value.

Now _model_is_default is False when config.yaml has a model that differs
from the global fallback (anthropic/claude-opus-4.6), so the user's
explicit config choice is preserved.

Fixes #1887

Co-authored-by: Test <test@test.com>
2026-03-18 02:50:31 -07:00
Teknium
e86bfd7667 feat: upgrade MiniMax default to M2.7 + add new OpenRouter models (#1900)
feat: upgrade MiniMax default to M2.7 + add new OpenRouter models
2026-03-18 02:43:19 -07:00
octo-patch
e4043633fc feat: upgrade MiniMax default to M2.7 + add new OpenRouter models
MiniMax: Add M2.7 and M2.7-highspeed as new defaults across provider
model lists, auxiliary client, metadata, setup wizard, RL training tool,
fallback tests, and docs. Retain M2.5/M2.1 as alternatives.

OpenRouter: Add grok-4.20-beta, nemotron-3-super-120b-a12b:free,
trinity-large-preview:free, glm-5-turbo, and hunter-alpha to the
model catalog.

MiniMax changes based on PR #1882 by @octo-patch (applied manually
due to stale conflicts in refactored pricing module).
2026-03-18 02:42:58 -07:00
Test
a8132d1252 fix: respect model.default from config.yaml for openai-codex provider
When config.yaml had a non-default model (e.g. gpt-5.3-codex) and the
provider was openai-codex, _normalize_model_for_provider() would replace
it with the latest available codex model because _model_is_default only
checked the CLI argument, not the config value.

Now _model_is_default is False when config.yaml has a model that differs
from the global fallback (anthropic/claude-opus-4.6), so the user's
explicit config choice is preserved.

Fixes #1887
2026-03-18 02:24:41 -07:00
Teknium
927f4d3a37 fix(matrix): use correct reply_to_message_id parameter name (#1895)
fix(matrix): use correct reply_to_message_id parameter name
2026-03-18 02:23:38 -07:00
Bartok9
66f71c1836 fix(matrix): use correct reply_to_message_id parameter name
Fixes #1842

The MessageEvent dataclass expects 'reply_to_message_id' but the Matrix
connector was passing 'reply_to'. This caused replies to fail with:

    MessageEvent.__init__() got an unexpected keyword argument 'reply_to'

Changed the parameter name to match the dataclass definition.
2026-03-18 02:23:21 -07:00
Teknium
b1069196a6 Merge pull request #1894 from NousResearch/hermes/hermes-b29f73b2
fix(delegate): move _saved_tool_names save/restore to _run_single_child scope
2026-03-18 02:23:14 -07:00
Bartok9
ba7248c669 fix(delegate): move _saved_tool_names save/restore to _run_single_child scope
Fixes #1802

The v0.3.0 refactor split child agent construction (_build_child_agent)
and execution (_run_single_child) into separate functions. This created
a scope bug where _saved_tool_names was defined in _build_child_agent
but referenced in _run_single_child's finally block, causing a NameError
on every delegate_task call.

Solution: Move the save/restore logic entirely into _run_single_child,
keeping the save and restore in the same scope as the try/finally block.
This is cleaner than passing the variable through and removes the dead
save from _build_child_agent.
2026-03-18 02:22:46 -07:00
Teknium
6fc4e36625 fix: search all sources by default in session_search (#1892)
* fix: include ACP sessions in default search sources

* fix: remove hardcoded source allowlist from session search

The default source_filter was a hardcoded list that silently excluded
any platform not explicitly listed. Instead of maintaining an ever-growing
allowlist, remove it entirely so all sources are searched by default.
Callers can still pass source_filter explicitly to narrow results.

Follow-up to cherry-picked PR #1817.

---------

Co-authored-by: someoneexistsontheinternet <154079416+someoneexistsontheinternet@users.noreply.github.com>
Co-authored-by: Test <test@test.com>
2026-03-18 02:21:29 -07:00
Teknium
7d7c2a62dd Merge pull request #1890 from NousResearch/hermes/hermes-b29f73b2
fix: OAuth flag stale after refresh/fallback, memory nudge never fires, dead code
2026-03-18 02:20:19 -07:00
Test
5b74df2bfc fix: OAuth flag stale after refresh/fallback, memory nudge never fires, dead code
- Update _is_anthropic_oauth in _try_refresh_anthropic_client_credentials()
  when token type changes during credential refresh
- Set _is_anthropic_oauth in _try_activate_fallback() Anthropic path
- Move _turns_since_memory and _iters_since_skill init to __init__ so
  nudge counters accumulate across run_conversation() calls in CLI mode
- Remove unreachable retry_count >= max_retries block after raise

Adds 7 regression tests. Salvaged from PR #1797 by @0xbyt4.
2026-03-18 02:19:57 -07:00
max
0c392e7a87 feat: integrate GitHub Copilot providers across Hermes
Add first-class GitHub Copilot and Copilot ACP provider support across
model selection, runtime provider resolution, CLI sessions, delegated
subagents, cron jobs, and the Telegram gateway.

This also normalizes Copilot model catalogs and API modes, introduces a
Copilot ACP OpenAI-compatible shim, and fixes service-mode auth by
resolving Homebrew-installed gh binaries under launchd.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-17 23:40:22 -07:00
Teknium
f656dfcb32 Merge pull request #1840 from NousResearch/hermes/hermes-b29f73b2
fix: allow agent-created skills with caution-level findings
2026-03-17 16:33:04 -07:00
Test
0fab46f65c fix: allow agent-created skills with caution-level findings
Agent-created skills were using the same policy as community hub
installs, blocking any skill with medium/high severity findings
(e.g. docker pull, pip install, git clone). This meant the agent
couldn't create skills that reference Docker or other common tools.

Changed agent-created policy from (allow, block, block) to
(allow, allow, block) — matching the trusted policy. Caution-level
findings (medium/high severity) are now allowed through, while
dangerous findings (critical severity like exfiltration, prompt
injection, reverse shells) remain blocked.

Added 4 tests covering the agent-created policy: safe allowed,
caution allowed, dangerous blocked, force override.
2026-03-17 16:32:25 -07:00
Teknium
37dceb043e fix: improve gateway error handling for 429 usage limits and 500 context overflow (#1839)
fix: improve gateway error handling for 429 usage limits and 500 context overflow
2026-03-17 16:32:20 -07:00
silentconsensus
7ce374d3b9 Improve gateway error handling for 429 usage limits and 500 context overflow
- Distinguish plan usage limits (429 with usage_limit_reached) from transient rate limits
- Show approximate reset time in hours for plan limits
- Treat HTTP 500 with large sessions as context overflow (same as 400)
- Move history length check earlier for reuse across status codes
2026-03-17 16:32:01 -07:00
Teknium
6e4415e865 Merge pull request #1838 from NousResearch/hermes/hermes-b29f73b2
fix(context_compressor): replace print() calls with logger
2026-03-17 16:31:32 -07:00
Test
45bad9771d fix(context_compressor): replace print() calls with logger
Replaces all remaining print() calls in compress() with logger.info()
and logger.warning() for consistency with the rest of the module.

Inspired by PR #1822.
2026-03-17 16:31:01 -07:00
Teknium
8d60db0f6f fix(discord): remove bugged followup messages + remove /ask command (#1836)
fix(discord): remove bugged followup messages + remove /ask command
2026-03-17 16:28:36 -07:00
Test
1bee519a6f fix(discord): remove redundant /ask slash command
/ask was just 'send a message to the bot' via the slash command menu —
completely redundant since Discord bots already listen to channel messages.
Removed as part of salvaging PR #1827.
2026-03-17 16:25:09 -07:00
charliekerfoot
72bfa115a0 fix(discord): removebugged follow up messages from discord slash commands 2026-03-17 16:24:17 -07:00
Teknium
7f85b2914d Merge pull request #1824 from cutepawss/fix/search-files-pagination
Clean fix — adds pagination args to search_key for parity with read_file. Thanks @cutepawss!
2026-03-17 16:16:47 -07:00
Teknium
b8076bb0bd feat: cron agents can suppress delivery with [SILENT] response (#1833)
feat: cron agents can suppress delivery with [SILENT] response
2026-03-17 16:09:24 -07:00
Test
d35d923c76 feat: cron agents can suppress delivery with [SILENT] response
Every cron job prompt now includes guidance that the agent can respond
with [SILENT] when it has nothing new or noteworthy to report. The
scheduler checks for this marker and skips delivery, while still saving
output to disk for audit. Failed jobs always deliver regardless.

This replaces the notify parameter approach from PR #1807 with a simpler
always-on design — the model is smart enough to decide when there's
nothing worth reporting without needing a per-job flag.
2026-03-17 16:06:49 -07:00
darya
a654bc04f7 fix(file_tools): include pagination args in repeated search key 2026-03-18 01:19:05 +03:00
Test
a71e3f4d98 fix: add /browser to COMMAND_REGISTRY so it shows in help and autocomplete
The /browser command handler existed in cli.py but was never added to
COMMAND_REGISTRY after the centralized command registry refactor. This
meant:
- /browser didn't appear in /help
- No tab-completion or subcommand suggestions
- Dispatch used _base_word fallback instead of canonical resolution

Added CommandDef with connect/disconnect/status subcommands and
switched dispatch to use canonical instead of _base_word.
2026-03-17 13:29:36 -07:00
Teknium
588962d24e docs: escape {id} in api-server.md headings to fix MDX build (#1787)
MDX v2+ interprets curly braces in regular markdown as JSX
expressions. The headings 'GET /v1/responses/{id}' and
'DELETE /v1/responses/{id}' caused a ReferenceError during
Docusaurus static site generation because 'id' is not a
defined JavaScript variable. Escaped with backslashes.

Co-authored-by: Test <test@test.com>
2026-03-17 11:04:37 -07:00
Teknium
2fa33dde81 fix: handle message length overflow in streaming mode (#1783)
Stream consumer now splits messages that exceed the platform's
MAX_MESSAGE_LENGTH. When accumulated text grows past the safe limit,
the current message is finalized and a new message is started for the
overflow — same as how normal sends chunk long responses.

Split point prefers line boundaries (rfind newline) for clean breaks.
Works for all platforms (Telegram 4096, Discord 2000, etc.) by reading
the adapter's MAX_MESSAGE_LENGTH at runtime.

Also added a safety net in the Telegram adapter: if edit_message_text
still hits MESSAGE_TOO_LONG (e.g. markdown formatting expansion), it
truncates and returns success so the stream consumer doesn't die.

Co-authored-by: Test <test@test.com>
2026-03-17 11:00:52 -07:00
Teknium
7ac9088d5c fix: Telegram streaming — config bridge, not-modified, flood control (#1782)
* fix: NameError in OpenCode provider setup (prompt_text -> prompt)

The OpenCode Zen and OpenCode Go setup sections used prompt_text()
which is undefined. All other providers correctly use the local
prompt() function defined in setup.py. Fixes crash during
'hermes setup' when selecting either OpenCode provider.

* fix: Telegram streaming — config bridge, not-modified, flood control

Three fixes for gateway streaming:

1. Bridge streaming config from config.yaml into gateway runtime.
   load_gateway_config() now reads the 'streaming' key from config.yaml
   (same pattern as session_reset, stt, etc.), matching the docs.
   Previously only gateway.json was read.

2. Handle 'Message is not modified' in Telegram edit_message().
   This Telegram API error fires when editing with identical content —
   a no-op, not a real failure. Previously it returned success=False
   which made the stream consumer disable streaming entirely.

3. Handle RetryAfter / flood control in Telegram edit_message().
   Fast providers can hit Telegram rate limits during streaming.
   Now waits the requested retry_after duration and retries once,
   instead of treating it as a fatal edit failure.

Also fixed double-edit on stream finish: the consumer now tracks
last-sent text and skips redundant edits, preventing the not-modified
error at the source.

* refactor: make config.yaml the primary gateway config source

Eliminates the per-key bridge pattern in load_gateway_config().
Previously gateway.json was the primary source and each config.yaml
key needed an individual bridge — easy to forget (streaming was
missing, causing garl4546's bug).

Now config.yaml is read first and its keys are mapped directly into
the GatewayConfig.from_dict() schema. gateway.json is kept as a
legacy fallback layer (loaded first, then overwritten by config.yaml
keys). If gateway.json exists, a log message suggests migrating.

Also:
- Removed dead save_gateway_config() (never called anywhere)
- Updated CLI help text and send_message error to reference
  config.yaml instead of gateway.json

---------

Co-authored-by: Test <test@test.com>
2026-03-17 10:51:54 -07:00
Teknium
dd60bcbfb7 feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter

Salvaged from PR #956, updated for current main.

Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.

Endpoints:
- POST /v1/chat/completions  — stateless Chat Completions API
- POST /v1/responses         — stateful Responses API with chaining
- GET  /v1/responses/{id}    — retrieve stored response
- DELETE /v1/responses/{id}  — delete stored response
- GET  /v1/models            — list hermes-agent as available model
- GET  /health               — health check

Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses

Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()

Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)

Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference

* feat(whatsapp): make reply prefix configurable via config.yaml

Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.

The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:

  whatsapp:
    reply_prefix: ''                     # disable header
    reply_prefix: '🤖 *My Bot*\n───\n'  # custom prefix

How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
  and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
  WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
  or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix

Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.

Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.

---------

Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
Teknium
b5cf0f0aef fix: preserve parent agent's tool list after subagent delegation (#1778)
Save and restore the process-global _last_resolved_tool_names in
_run_single_child() so the parent's execute_code sandbox generates
correct tool imports after delegation completes.

The global was already mostly mitigated (run_agent.py passes
enabled_tools via self.valid_tool_names), but the global itself
remained corrupted — a footgun for any code that reads it directly.

Co-authored-by: shane9coy <shane9coy@users.noreply.github.com>
2026-03-17 10:31:38 -07:00
Teknium
9a1e971126 fix(stt): respect explicit provider config instead of env-var fallback (#1775)
* fix(session): skip corrupt lines in load_transcript instead of crashing

Wrap json.loads() in load_transcript() with try/except JSONDecodeError
so that partial JSONL lines (from mid-write crashes like OOM/SIGKILL)
are skipped with a warning instead of crashing the entire transcript
load. The rest of the history loads fine.

Adds a logger.warning with the session ID and truncated corrupt line
content for debugging visibility.

Salvaged from PR #1193 by alireza78a.
Closes #1193

* fix(stt): respect explicit provider config instead of env-var fallback

Rework _get_provider() to separate explicit config from auto-detect.
When stt.provider is explicitly set in config.yaml, that choice is
authoritative — no silent cross-provider fallback based on which env
vars happen to be set. When no provider is configured, auto-detect
still tries: local > groq > openai.

This fixes the reported scenario where provider: local + a placeholder
OPENAI_API_KEY caused the system to silently select OpenAI and fail
with a 401.

Closes #1774
2026-03-17 10:30:58 -07:00
Teknium
088d65605a fix: NameError in OpenCode provider setup (prompt_text -> prompt) (#1779)
The OpenCode Zen and OpenCode Go setup sections used prompt_text()
which is undefined. All other providers correctly use the local
prompt() function defined in setup.py. Fixes crash during
'hermes setup' when selecting either OpenCode provider.
2026-03-17 10:30:16 -07:00
teknium1
c881209b92 Revert "feat(cli): skin-aware light/dark theme mode with terminal auto-detection"
This reverts commit a1c81360a5.
2026-03-17 10:04:53 -07:00
Teknium
d7a2e3ddae fix: handle hyphenated FTS5 queries and preserve quoted literals (#1776)
_sanitize_fts5_query() was stripping ALL double quotes (including
properly paired ones), breaking user-provided quoted phrases like
"exact phrase".  Hyphenated terms like chat-send also silently
expanded to chat AND send, returning unexpected or zero results.

Fix:
1. Extract balanced quoted phrases into placeholders before
   stripping FTS5-special characters, then restore them.
2. Wrap unquoted hyphenated terms (word-word) in double quotes so
   FTS5 matches them as exact phrases instead of splitting on
   the hyphen.
3. Unmatched quotes are still stripped as before.

Based on issue report by @bailob (#1770) and PR #1773 by @Jah-yee
(whose branch contained unrelated changes and couldn't be merged
directly).

Closes #1770
Closes #1773

Co-authored-by: Jah-yee <Jah-yee@users.noreply.github.com>
2026-03-17 09:44:01 -07:00
Teknium
d5af593769 Merge pull request #1769 from sai-samarth/fix/whatsapp-send-message-support
Clean merge — PR is current against main, tests pass, implementation matches existing gateway WhatsApp bridge pattern.
2026-03-17 09:42:01 -07:00
Teknium
df74f86955 Merge pull request #1767 from sai-samarth/fix/systemd-node-path-whatsapp
Clean fix for nvm/non-standard Node.js paths in systemd units. Merges cleanly.
2026-03-17 09:41:39 -07:00
sai-samarth
a3de843fdb test: replace real-looking WhatsApp jid in regression test 2026-03-17 15:38:37 +00:00
sai-samarth
dc15bc508f fix(tools): add outbound WhatsApp send_message routing 2026-03-17 15:31:13 +00:00
sai-samarth
b8eb7c5fed fix(gateway): include resolved node path in systemd unit 2026-03-17 15:11:28 +00:00
Teknium
548cedb869 fix(context_compressor): prevent consecutive same-role messages after compression (#1743)
compress() checks both the head and tail neighbors when choosing the
summary message role.  When only the tail collides, the role is flipped.
When BOTH roles would create consecutive same-role messages (e.g.
head=assistant, tail=user), the summary is merged into the first tail
message instead of inserting a standalone message that breaks role
alternation and causes API 400 errors.

The previous code handled head-side collision but left the tail-side
uncovered — long conversations would crash mid-reply with no useful
error, forcing the user to /reset and lose session history.

Based on PR #1186 by @alireza78a, with improved double-collision
handling (merge into tail instead of unconditional 'user' fallback).

Co-authored-by: alireza78a <alireza78.crypto@gmail.com>
2026-03-17 05:18:52 -07:00
Teknium
702191049f fix(session): skip corrupt lines in load_transcript instead of crashing (#1744)
Wrap json.loads() in load_transcript() with try/except JSONDecodeError
so that partial JSONL lines (from mid-write crashes like OOM/SIGKILL)
are skipped with a warning instead of crashing the entire transcript
load. The rest of the history loads fine.

Adds a logger.warning with the session ID and truncated corrupt line
content for debugging visibility.

Salvaged from PR #1193 by alireza78a.
Closes #1193
2026-03-17 05:18:12 -07:00
Teknium
aea39eeafb Merge pull request #1736 from NousResearch/fix/gateway-platform-hardening
fix(gateway): SMS session-per-send + Matrix bare media types break downstream processing
2026-03-17 04:46:25 -07:00
Teknium
23a3f01b2b Merge pull request #1735 from NousResearch/fix/tool-handler-safety
fix(tools): browser handlers TypeError on unexpected LLM params + fuzzy_match docstring
2026-03-17 04:46:22 -07:00
Teknium
af118501b9 Merge pull request #1733 from NousResearch/fix/defensive-hardening
fix: defensive hardening — logging, dedup, locks, dead code
2026-03-17 04:46:20 -07:00
Teknium
d1d17f4f0a feat(compression): add summary_base_url + move compression config to YAML-only
- Add summary_base_url config option to compression block for custom
  OpenAI-compatible endpoints (e.g. zai, DeepSeek, Ollama)
- Remove compression env var bridges from cli.py and gateway/run.py
  (CONTEXT_COMPRESSION_* env vars no longer set from config)
- Switch run_agent.py to read compression config directly from
  config.yaml instead of env vars
- Fix backwards-compat block in _resolve_task_provider_model to also
  fire when auxiliary.compression.provider is 'auto' (DEFAULT_CONFIG
  sets this, which was silently preventing the compression section's
  summary_* keys from being read)
- Add test for summary_base_url config-to-client flow
- Update docs to show compression as config.yaml-only

Closes #1591
Based on PR #1702 by @uzaylisak
2026-03-17 04:46:15 -07:00
teknium1
6832d60bc0 fix(gateway): SMS persistent HTTP session + Matrix MIME media types
1. sms.py: Replace per-send aiohttp.ClientSession with a persistent
   session created in connect() and closed in disconnect(). Each
   outbound SMS no longer pays the TCP+TLS handshake cost. Falls back
   to a temporary session if the persistent one isn't available.

2. matrix.py: Use proper MIME types (image/png, audio/ogg, video/mp4)
   instead of bare category words (image, audio, video). The gateway's
   media processing checks startswith('image/') and startswith('audio/')
   so bare words caused Matrix images to skip vision enrichment and
   Matrix audio to skip transcription. Now extracts the actual MIME
   type from the nio event's content info when available.
2026-03-17 04:35:14 -07:00
teknium1
ea95462998 fix(tools): browser handler safety + fuzzy_match docstring accuracy
1. browser_tool.py: Replace **args spread on browser_click, browser_type,
   and browser_scroll handlers with explicit parameter extraction. The
   **args pattern passed all dict keys as keyword arguments, causing
   TypeError if the LLM sent unexpected parameters. Now extracts only
   the expected params (ref, text, direction) with safe defaults.

2. fuzzy_match.py: Update module docstring to match actual strategy
   order in code. Block anchor was listed as #3 but is actually #7.
   Multi-occurrence is not a separate strategy but a flag. Updated
   count from 9 to 8.
2026-03-17 04:32:39 -07:00
teknium1
847ee20390 fix: defensive hardening — logging, dedup, locks, dead code
Four small fixes:

1. model_tools.py: Tool import failures logged at WARNING instead of
   DEBUG. If a tool module fails to import (syntax error, missing dep),
   the user now sees a warning instead of the tool silently vanishing.

2. hermes_cli/config.py: Remove duplicate 'import sys' (lines 19, 21).

3. agent/model_metadata.py: Remove 6 duplicate entries in
   DEFAULT_CONTEXT_LENGTHS dict. Python keeps the last value, so no
   functional change, but removes maintenance confusion.

4. hermes_state.py: Add missing self._lock to the LIKE query in
   resolve_session_id(). The exact-match path used get_session()
   (which locks internally), but the prefix fallback queried _conn
   without the lock.
2026-03-17 04:31:26 -07:00
Teknium
867a96c051 fix+feat: bug fixes, auto session titles, .hermes.md project config (#1712)
fix+feat: bug fixes, auto session titles, .hermes.md project config
2026-03-17 04:30:48 -07:00
teknium1
0897e4350e merge: resolve conflicts with origin/main 2026-03-17 04:30:37 -07:00
Teknium
d2b10545db feat(web): add Tavily as web search/extract/crawl backend (#1731)
Salvage of PR #1707 by @kshitijk4poor (cherry-picked with authorship preserved).

Adds Tavily as a third web backend alongside Firecrawl and Parallel, using the Tavily REST API via httpx.

- Backend selection via hermes tools → saved as web.backend in config.yaml
- All three tools supported: search, extract, crawl
- TAVILY_API_KEY in config registry, doctor, status, setup wizard
- 15 new Tavily tests + 9 backend selection tests + 5 config tests
- Backward compatible

Closes #1707
2026-03-17 04:28:03 -07:00
Teknium
85993fbb5a feat: pre-call sanitization and post-call tool guardrails (#1732)
Salvage of PR #1321 by @alireza78a (cherry-picked concept, reimplemented
against current main).

Phase 1 — Pre-call message sanitization:
  _sanitize_api_messages() now runs unconditionally before every LLM call.
  Previously gated on context_compressor being present, so sessions loaded
  from disk or running without compression could accumulate dangling
  tool_call/tool_result pairs causing API errors.

Phase 2a — Delegate task cap:
  _cap_delegate_task_calls() truncates excess delegate_task calls per turn
  to MAX_CONCURRENT_CHILDREN. The existing cap in delegate_tool.py only
  limits the task array within a single call; this catches multiple
  separate delegate_task tool_calls in one turn.

Phase 2b — Tool call deduplication:
  _deduplicate_tool_calls() drops duplicate (tool_name, arguments) pairs
  within a single turn when models stutter.

All three are static methods on AIAgent, independently testable.
29 tests covering happy paths and edge cases.
2026-03-17 04:24:27 -07:00
Teknium
fb20a9e120 Merge pull request #1729 from NousResearch/fix/cron-timezone-naive-iso
fix(cron): naive ISO timestamps stored without timezone — jobs fire at wrong time
2026-03-17 04:24:02 -07:00
Teknium
21b823dd3b Merge pull request #1726 from NousResearch/fix/memory-tool-file-locking
fix(memory): concurrent writes silently drop entries — add file locking
2026-03-17 04:23:59 -07:00
Teknium
618ed2c65f fix(update): use .[all] extras with fallback in hermes update (#1728)
Both update paths now try .[all] first, fall back to . if extras fail. Fixes #1336.

Inspired by PR #1342 by @baketnk.
2026-03-17 04:22:37 -07:00
Teknium
9f81c11ba0 feat: eager fallback to backup model on rate-limit errors (#1730)
When a fallback model is configured, switch to it immediately upon
detecting rate-limit conditions (429, quota exhaustion, empty/malformed
responses) instead of exhausting all retries with exponential backoff.

Two eager-fallback checks:
1. Invalid/empty API responses — fallback attempted before retry loop
2. HTTP 429 / rate-limit keyword detection — fallback before backoff

Both guarded by _fallback_activated for one-shot semantics.

Cherry-picked from PR #1413 by usvimal.

Co-authored-by: usvimal <usvimal@users.noreply.github.com>
2026-03-17 04:21:16 -07:00
teknium1
5301c01776 fix(cron): make naive ISO timestamps timezone-aware at parse time
User-provided ISO timestamps like '2026-02-03T14:00' (no timezone)
were stored naive. The _ensure_aware() helper at check time interprets
naive datetimes using the current system timezone, but if the system
timezone changes between job creation and checking, the job fires at
the wrong time.

Fix: call dt.astimezone() at parse time to immediately stamp the
datetime with the local timezone. The stored value is now always
timezone-aware, so it's stable regardless of later timezone changes.
2026-03-17 04:20:24 -07:00
teknium1
d81de2f3d8 fix(memory): file-lock read-modify-write to prevent concurrent data loss
Two concurrent gateway sessions calling memory add/replace/remove
simultaneously could both read the old state, apply their changes
independently, and write — the last writer silently drops the first
writer's entry.

Fix: wrap each mutation in a file lock (fcntl.flock on a .lock file).
Under the lock, re-read entries from disk to get the latest state,
apply the mutation, then write. This ensures concurrent writers
serialize properly.

The lock uses a separate .lock file since the memory file itself is
atomically replaced via os.replace() (can't flock a replaced file).
Readers remain lock-free since atomic rename ensures they always see
a complete file.
2026-03-17 04:19:11 -07:00
Teknium
1314b4b541 feat(hooks): emit session:end lifecycle event (#1725)
Based on PR #1432 by @bayrakdarerdem. session:start was already on main; this adds the session:end event.

Co-authored-by: bayrakdarerdem <bayrakdarerdem@users.noreply.github.com>
2026-03-17 04:17:44 -07:00
ch3ronsa
695eb04243 feat(agent): .hermes.md per-repository project config discovery
Adds .hermes.md / HERMES.md discovery for per-project agent configuration.
When the agent starts, it walks from cwd to the git root looking for
.hermes.md (preferred) or HERMES.md, strips any YAML frontmatter, and
injects the markdown body into the system prompt as project context.

- Nearest-first discovery (subdirectory configs shadow parent)
- Stops at git root boundary (no leaking into parent repos)
- YAML frontmatter stripped (structured config deferred to Phase 2)
- Same injection scanning and 20K truncation as other context files
- 22 comprehensive tests

Original implementation by ch3ronsa. Cherry-picked and adapted for current main.

Closes #681 (Phase 1)
2026-03-17 04:16:32 -07:00
teknium1
e5fc916814 feat: auto-generate session titles after first exchange
After the first user→assistant exchange, Hermes now generates a short
descriptive session title via the auxiliary LLM (compression task config).
Title generation runs in a background thread so it never delays the
user-facing response.

Key behaviors:
- Fires only on the first 1-2 exchanges (checks user message count)
- Skips if a title already exists (user-set titles are never overwritten)
- Uses call_llm with compression task config (cheapest/fastest model)
- Truncates long messages to keep the title generation request small
- Cleans up LLM output: strips quotes, 'Title:' prefixes, enforces 80 char max
- Works in both CLI and gateway (Telegram/Discord/etc.)

Also updates /title (no args) to show the session ID alongside the title
in both CLI and gateway.

Implements #1426
2026-03-17 04:14:40 -07:00
Teknium
0878e5f4a8 Merge pull request #1724 from NousResearch/fix/model-metadata-fuzzy-match
fix(metadata): fuzzy context length match can return wrong model's value
2026-03-17 04:13:56 -07:00
Teknium
72bcec0ce5 Merge pull request #1723 from NousResearch/fix/compression-attempts-persist
fix(core): compression_attempts resets each iteration — allows unlimited compressions
2026-03-17 04:13:54 -07:00
Teknium
d604b9622c Merge pull request #1722 from NousResearch/fix/run-agent-role-violations
fix(core): message role alternation violations in JSON recovery and error handler
2026-03-17 04:13:51 -07:00
Teknium
cf0dd777c8 Merge pull request #1721 from NousResearch/fix/browser-session-race
fix(browser): race condition in session creation orphans cloud sessions
2026-03-17 04:13:49 -07:00
Teknium
ec272ca8be Merge pull request #1720 from NousResearch/fix/compressor-consecutive-role-violation
fix(compressor): summary role can violate consecutive-role constraint
2026-03-17 04:13:46 -07:00
Teknium
99a44d87dc Merge pull request #1718 from NousResearch/fix/messaging-toolset-missing
fix(toolsets): add missing 'messaging' toolset — can't enable/disable send_message
2026-03-17 04:13:44 -07:00
Teknium
16f38abd25 Merge pull request #1717 from NousResearch/fix/length-continue-retries-reset
fix(core): length_continue_retries never resets — later truncations get fewer retries
2026-03-17 04:13:41 -07:00
Teknium
cac3c4d45f Merge pull request #1716 from NousResearch/fix/cron-double-load-jobs
fix(cron): get_due_jobs reads jobs.json twice — race condition
2026-03-17 04:13:39 -07:00
Teknium
4167e2e294 Merge pull request #1714 from NousResearch/fix/anthropic-tool-choice-none
fix(anthropic): tool_choice 'none' still allows tool calls
2026-03-17 04:13:36 -07:00
Teknium
6ddb9ee3e3 Merge pull request #1713 from NousResearch/fix/auxiliary-is-nous-reset
fix(aux): auxiliary_is_nous flag never resets — leaks Nous tags to other providers
2026-03-17 04:13:33 -07:00
Teknium
05aefeddc7 Merge pull request #1711 from NousResearch/fix/matrix-mattermost-mark-connected
fix(gateway): Matrix and Mattermost never report as connected
2026-03-17 04:13:31 -07:00
teknium1
9db75fcfc2 fix(metadata): fuzzy context length match prefers longest key
The fuzzy match for model context lengths iterated dict insertion
order. Shorter model names (e.g. 'gpt-5') could match before more
specific ones (e.g. 'gpt-5.4-pro'), returning the wrong context
length.

Sort by key length descending so more specific model names always
match first.
2026-03-17 04:12:08 -07:00
teknium1
1264275cc3 fix(core): compression_attempts counter resets each loop iteration
compression_attempts was initialized inside the outer while loop,
resetting to 0 on every iteration. Since compression triggers a
'continue' back to the top of the loop, the counter never accumulated
past 1 — effectively allowing unlimited compression attempts.

Move initialization before the outer while loop so the cap of 3
applies across the entire run_conversation() call.
2026-03-17 04:11:32 -07:00
teknium1
cd6dc4ef7e fix(core): message role violations in JSON recovery and error handler
Two edge cases could inject messages that violate role alternation:

1. Invalid JSON recovery (line ~5985): After 3 retries of invalid JSON
   tool args, a user-role recovery message was injected. But the
   assistant's tool_calls were never appended, so the sequence could
   become user → user. Fix: append the assistant message with its
   tool_calls, then respond with proper tool-role error results.

2. System error handler (line ~6238): Always injected a user-role
   error message, which creates consecutive user messages if the last
   message was already user. Fix: dynamically choose the role based on
   the last message to maintain alternation.
2026-03-17 04:10:41 -07:00
teknium1
8cd4a96686 fix(browser): race condition in session creation can orphan cloud sessions
Two concurrent threads (e.g. parallel subagents) could both pass the
'task_id in _active_sessions' check, both create cloud sessions via
network calls, and then one would overwrite the other — leaking the
first cloud session.

Add double-check after the lock is re-acquired: if another thread
already created a session while we were doing the network call, use
the existing one instead of orphaning it.
2026-03-17 04:09:16 -07:00
teknium1
344f3771cb fix(compressor): summary role can create consecutive same-role messages
The summary message role was determined only by the last head message,
ignoring the first tail message. This could create consecutive user
messages (rejected by Anthropic) when the tail started with 'user'.

Now checks both neighbors. Priority: avoid colliding with the head
(already committed). If the chosen role also collides with the tail,
flip it — but only if flipping wouldn't re-collide with the head.
2026-03-17 04:08:37 -07:00
teknium1
8b851e2eeb fix(toolsets): add missing 'messaging' toolset definition
send_message_tool registers under toolset='messaging' but no
'messaging' entry existed in TOOLSETS. This meant --disable-toolset
messaging and --enable-toolset messaging silently failed, and the
hermes tools config UI couldn't toggle the messaging tools.
2026-03-17 04:06:06 -07:00
teknium1
24282dceb1 fix(core): reset length_continue_retries after successful continuation
length_continue_retries and truncated_response_prefix were initialized
once before the outer loop and never reset after a successful
continuation. If a conversation hit length truncation once (counter=1),
succeeded on continuation, did more tool calls, then hit length again,
the counter started at 1 instead of 0 — reducing available retries
from 3 to 2. The stale truncated_response_prefix would also leak
into the next response.

Reset both after the prefix is consumed on a successful final response.
2026-03-17 04:05:20 -07:00
teknium1
1f0bb8742f fix(cron): get_due_jobs read jobs.json twice creating race window
get_due_jobs() called load_jobs() twice: once for filtering (with
_apply_skill_fields) and once for saving updates. Between the two
reads, another process could modify jobs.json, causing the filtering
and saving to operate on different versions.

Fix: load once, deepcopy for the skill-applied working list.
2026-03-17 04:03:42 -07:00
teknium1
0de75505f3 fix(anthropic): tool_choice 'none' still allowed tool calls
When tool_choice was 'none', the code did 'pass' — no tool_choice
was sent but tools were still included in the request. Anthropic
defaults to 'auto' when tools are present, so the model could still
call tools despite the caller requesting 'none'.

Fix: omit tools entirely from the request when tool_choice is 'none',
which is the only way to prevent tool use with the Anthropic API.
2026-03-17 04:02:49 -07:00
teknium1
e5a244ad5d fix(aux): reset auxiliary_is_nous flag on each resolution attempt
The module-level auxiliary_is_nous was set to True by _try_nous() and
never reset. In long-running gateway processes, once Nous was resolved
as auxiliary provider, the flag stayed True forever — even if
subsequent resolutions chose a different provider (e.g. OpenRouter).
This caused Nous product tags to be sent to non-Nous providers.

Reset the flag at the start of _resolve_auto() so only the winning
provider's flag persists.
2026-03-17 04:02:15 -07:00
Teknium
4433b83378 feat(web): add Parallel as alternative web search/extract backend (#1696)
* feat(web): add Parallel as alternative web search/extract backend

Adds Parallel (parallel.ai) as a drop-in alternative to Firecrawl for
web_search and web_extract tools using the official parallel-web SDK.

- Backend selection via WEB_SEARCH_BACKEND env var (auto/parallel/firecrawl)
- Auto mode prefers Firecrawl when both keys present; Parallel when sole backend
- web_crawl remains Firecrawl-only with clear error when unavailable
- Lazy SDK imports, interrupt support, singleton clients
- 16 new unit tests for backend selection and client config

Co-authored-by: s-jag <s-jag@users.noreply.github.com>

* fix: add PARALLEL_API_KEY to config registry and fix web_crawl policy tests

Follow-up for Parallel backend integration:
- Add PARALLEL_API_KEY to OPTIONAL_ENV_VARS (hermes doctor, env blocklist)
- Add to set_config_value api_keys list (hermes config set)
- Add to doctor keys display
- Fix 2 web_crawl policy tests that didn't set FIRECRAWL_API_KEY
  (needed now that web_crawl has a Firecrawl availability guard)

* refactor: explicit backend selection via hermes tools, not auto-detect

Replace the auto-detect backend selection with explicit user choice:
- hermes tools saves WEB_SEARCH_BACKEND to .env when user picks a provider
- _get_backend() reads the explicit choice first
- Fallback only for manual/legacy config (uses whichever key is present)
- _is_provider_active() shows [active] for the selected web backend
- Updated tests, docs, and .env.example to remove 'auto' mode language

* refactor: use config.yaml for web backend, not env var

Match the TTS/browser pattern — web.backend is stored in config.yaml
(set by hermes tools), not as a WEB_SEARCH_BACKEND env var.

- _load_web_config() reads web: section from config.yaml
- _get_backend() reads web.backend from config, falls back to key detection
- _configure_provider() saves to config dict (saved to config.yaml)
- _is_provider_active() reads from config dict
- Removed WEB_SEARCH_BACKEND from .env.example, set_config_value, docs
- Updated all tests to mock _load_web_config instead of env vars

---------

Co-authored-by: s-jag <s-jag@users.noreply.github.com>
2026-03-17 04:02:02 -07:00
crazywriter1
7049dba778 fix(docker): remove container on cleanup when container_persistent=false
When container_persistent=false, the inner mini-swe-agent cleanup only
runs 'docker stop' in the background, leaving containers in Exited state.
Now cleanup() also runs 'docker rm -f' to fully remove the container.

Also fixes pre-existing test failures in model_metadata (gpt-4.1 1M context),
setup tests (TTS provider step), and adds MockInnerDocker.cleanup().

Original fix by crazywriter1. Cherry-picked and adapted for current main.

Fixes #1679
2026-03-17 04:02:01 -07:00
Teknium
6405d389aa test: align Hermes setup and full-suite expectations (#1710)
Salvaged from PR #1708 by @kartikkabadi. Cherry-picked with authorship preserved.

Fixes pre-existing test failures from setup TTS prompt flow changes and environment-sensitive assumptions.

Co-authored-by: Kartik <user2@RentKars-MacBook-Air.local>
2026-03-17 04:01:37 -07:00
teknium1
b111f2a779 fix(gateway): Matrix and Mattermost never report as connected
Neither adapter called _mark_connected() after successful connect(),
so _running stayed False, runtime status never showed 'connected',
and /status reported them as offline even while actively processing
messages.

Add _mark_connected() calls matching the pattern used by Telegram
and DingTalk adapters.
2026-03-17 04:01:02 -07:00
Teknium
b16186a32a feat(telegram): auto-detect HTML tags and use parse_mode=HTML in send_message (#1709)
* feat: interactive MCP tool configuration in hermes tools

Add the ability to selectively enable/disable individual MCP server
tools through the interactive 'hermes tools' TUI.

Changes:
- tools/mcp_tool.py: Add probe_mcp_server_tools() — lightweight function
  that temporarily connects to configured MCP servers, discovers their
  tools (names + descriptions), and disconnects. No registry side effects.

- hermes_cli/tools_config.py: Add 'Configure MCP tools' option to the
  interactive menu. When selected:
  1. Probes all enabled MCP servers for their available tools
  2. Shows a per-server curses checklist with tool descriptions
  3. Pre-selects tools based on existing include/exclude config
  4. Writes changes back as tools.exclude entries in config.yaml
  5. Reports which servers failed to connect

The existing CLI commands (hermes tools enable/disable server:tool)
continue to work unchanged. This adds the interactive TUI counterpart
so users can browse and toggle MCP tools visually.

Tests: 22 new tests covering probe function edge cases and interactive
flow (pre-selection, exclude/include modes, description truncation,
multi-server handling, error paths).

* feat(telegram): auto-detect HTML tags and use parse_mode=HTML in send_message

When _send_telegram detects HTML tags in the message body, it now sends
with parse_mode='HTML' instead of converting to MarkdownV2. This allows
cron jobs and agents to send rich HTML-formatted Telegram messages with
bold, italic, code blocks, etc. that render correctly.

Detection uses the same regex from PR #1568 by @ashaney:
  re.search(r'<[a-zA-Z/][^>]*>', message)

Plain-text and markdown messages continue through the existing
MarkdownV2 pipeline. The HTML fallback path also catches HTML parse
errors and falls back to plain text, matching the existing MarkdownV2
error handling.

Inspired by: github.com/ashaney — PR #1568
2026-03-17 03:56:06 -07:00
Teknium
abdb4660d4 Merge pull request #1705 from NousResearch/fix/dingtalk-requirements-check
fix(dingtalk): requirements check passes with only one credential set
2026-03-17 03:53:51 -07:00
Teknium
ed3bcae8bd Merge pull request #1704 from NousResearch/fix/hermes-state-thread-locks
fix(state): add missing thread locks to 4 SessionDB methods
2026-03-17 03:53:48 -07:00
Teknium
75c5136e5a Merge pull request #1703 from NousResearch/fix/anthropic-adapter-merge-content-loss
fix(anthropic): consecutive assistant message merge drops content on mixed types
2026-03-17 03:53:45 -07:00
Teknium
1781c05adb Merge pull request #1701 from NousResearch/fix/gateway-yaml-pii-redaction
fix(gateway): PII redaction config never read — missing yaml import
2026-03-17 03:53:43 -07:00
teknium1
f613da4219 fix: add missing subprocess import in _install_neutts_deps
The function uses subprocess.run() and subprocess.CalledProcessError but
never imported the module. This caused a NameError crash during setup
when users selected NeuTTS as their TTS provider.

Fixes #1698
2026-03-17 03:53:35 -07:00
Teknium
d87655afff fix(gateway): persist watcher metadata in checkpoint for crash recovery (#1706)
Salvaged from PR #1573 by @eren-karakus0. Cherry-picked with authorship preserved.

Fixes #1143 — background process notifications resume after gateway restart.

Co-authored-by: Muhammet Eren Karakuş <erenkar950@gmail.com>
2026-03-17 03:52:15 -07:00
teknium1
a9da944a5d fix(dingtalk): requirements check passes with only one credential set
check_dingtalk_requirements() used 'and' to check for missing env vars:
  if not CLIENT_ID and not CLIENT_SECRET: return False

This only returns False when BOTH are missing. If only one is set
(e.g. CLIENT_ID without CLIENT_SECRET), the check passes and
connect() fails later with a cryptic error.

Fix: Change 'and' to 'or' so it returns False when EITHER is missing.
2026-03-17 03:50:45 -07:00
teknium1
efa778a0ef fix(state): add missing thread locks to 4 SessionDB methods
search_sessions(), clear_messages(), delete_session(), and
prune_sessions() all accessed self._conn without acquiring self._lock.
Every other method in the class uses the lock. In multi-threaded
contexts (gateway serving concurrent platform messages), these
unprotected methods can cause sqlite3.ProgrammingError from concurrent
cursor operations on the same connection.
2026-03-17 03:50:06 -07:00
teknium1
8b411b234d fix(anthropic): merge consecutive assistant messages with mixed content types
When two consecutive assistant messages had mixed content types (one
string, one list), the merge logic just replaced the earlier message
entirely with the later one (fixed[-1] = m), silently dropping the
earlier message's content.

Apply the same normalization pattern used in the tool_use merge path
(lines 952-956): convert both to list format before concatenating.
This preserves all content from both messages.
2026-03-17 03:48:55 -07:00
Teknium
ce7418e274 feat: interactive MCP tool configuration in hermes tools (#1694)
Add the ability to selectively enable/disable individual MCP server
tools through the interactive 'hermes tools' TUI.

Changes:
- tools/mcp_tool.py: Add probe_mcp_server_tools() — lightweight function
  that temporarily connects to configured MCP servers, discovers their
  tools (names + descriptions), and disconnects. No registry side effects.

- hermes_cli/tools_config.py: Add 'Configure MCP tools' option to the
  interactive menu. When selected:
  1. Probes all enabled MCP servers for their available tools
  2. Shows a per-server curses checklist with tool descriptions
  3. Pre-selects tools based on existing include/exclude config
  4. Writes changes back as tools.exclude entries in config.yaml
  5. Reports which servers failed to connect

The existing CLI commands (hermes tools enable/disable server:tool)
continue to work unchanged. This adds the interactive TUI counterpart
so users can browse and toggle MCP tools visually.

Tests: 22 new tests covering probe function edge cases and interactive
flow (pre-selection, exclude/include modes, description truncation,
multi-server handling, error paths).
2026-03-17 03:48:44 -07:00
teknium1
7c9beb5829 fix(gateway): add missing yaml import for PII redaction config read
The privacy.redact_pii config reader on line 1546 used bare 'yaml'
which is not in scope — yaml is imported as '_yaml' at module level
(line 93) and as '_y' in other methods. The NameError was silently
caught by the try/except, so PII redaction never activated even when
configured.

Add a local 'import yaml as _pii_yaml' consistent with the pattern
used elsewhere in the file.
2026-03-17 03:48:15 -07:00
Teknium
56e0c90445 Merge pull request #1700 from NousResearch/fix/redacting-formatter-import
fix(core): RedactingFormatter NameError when verbose_logging=True
2026-03-17 03:46:49 -07:00
Teknium
490d37bb80 Merge pull request #1699 from NousResearch/fix/nous-model-fetch-kwargs
fix(cli): fetch_nous_models called with positional args — always TypeError
2026-03-17 03:46:43 -07:00
Teknium
ea238721f0 Merge pull request #1697 from NousResearch/fix/gateway-skill-command-nameref
fix(gateway): NameError on skill slash commands — wrong variable reference
2026-03-17 03:46:08 -07:00
Teknium
d417ba2a48 feat: add route-aware pricing estimates (#1695)
Salvaged from PR #1563 by @kshitijk4poor. Cherry-picked with authorship preserved.

- Route-aware pricing architecture replacing static MODEL_PRICING + heuristics
- Canonical usage normalization (Anthropic/OpenAI/Codex API shapes)
- Cache-aware billing (separate cache_read/cache_write rates)
- Cost status tracking (estimated/included/unknown/actual)
- OpenRouter live pricing via models API
- Schema migration v4→v5 with billing metadata columns
- Removed speculative forward-looking entries
- Removed cost display from CLI status bar
- Threaded OpenRouter metadata pre-warm

Co-authored-by: kshitij <82637225+kshitijk4poor@users.noreply.github.com>
2026-03-17 03:44:44 -07:00
teknium1
c713d01e72 fix(core): move RedactingFormatter import before conditional block
RedactingFormatter was imported inside 'if not has_errors_log_handler:'
(line 461) but also used unconditionally in the verbose_logging block
(line 479). When the error log handler already exists (e.g. second
AIAgent in the same process) AND verbose_logging=True, the import was
skipped and line 479 raised NameError.

Fix: Move the import one level up so it's always available regardless
of whether the error log handler already exists.
2026-03-17 03:43:21 -07:00
teknium1
f95c6a221b fix(cli): use keyword args for fetch_nous_models (always TypeError)
fetch_nous_models() uses keyword-only parameters (the * separator in
its signature), but models.py called it with positional args and in
the wrong order (api_key first, base_url second). This always raised
TypeError, silently caught by except Exception: pass.

Result: Nous provider model list was completely broken — /model
autocomplete and provider_model_ids('nous') always fell back to the
static model catalog instead of fetching live models.
2026-03-17 03:42:46 -07:00
teknium1
718d4b013c fix(gateway): use correct variable for skill slash command task_id
Line 1482 referenced 'session_key' which is not defined until line 1519,
causing a NameError on every skill slash command invocation in the gateway
(e.g. /deploy, /plan-with-skill). The try/except silently swallowed the
error, making all user-defined skill slash commands silently fail.

The correct variable is '_quick_key', defined at line 1292 (same variable
used by the /plan handler on line 1379).
2026-03-17 03:42:15 -07:00
Teknium
d9b9987ad3 docs: comprehensive documentation update for recent features
New documentation:
- DingTalk messaging platform setup guide (dingtalk.md)

Updated existing docs:
- quickstart.md: add Alibaba Cloud, Kilo Code, Vercel AI Gateway to provider table
- configuration.md: add Alibaba Cloud provider, website blocklist config,
  light/dark theme mode, smart approvals (ask/smart/off)
- environment-variables.md: add Mattermost, Matrix, DingTalk, Browser Use,
  DashScope env vars
- browser.md: add Browser Use cloud provider, /browser connect CDP mode,
  multi-provider architecture, fix limitation section contradiction
- slash-commands.md: add /tools enable/disable/list, /browser connect/disconnect/status
- messaging/index.md: add DingTalk, Mattermost, Matrix to architecture diagram,
  platform toolset table, security allowlists, and Next Steps links
- security.md: add website access policy (blocklist) documentation
- sidebars.ts: add Mattermost, Matrix, DingTalk to Messaging Gateway sidebar
2026-03-17 03:42:02 -07:00
Teknium
ba728f3e63 docs: add DingTalk setup guide and Alibaba Cloud provider to Docusaurus docs (#1692)
* feat(gateway): wire DingTalk into gateway setup and platform maps

Add DingTalk to:
- hermes_cli/gateway.py: _PLATFORMS list with setup instructions,
  AppKey/AppSecret prompts, and Stream Mode setup guide
- gateway/run.py: all platform-to-config-key maps, allowed users
  map, allow-all-users map, and toolset resolution maps

* docs: add DingTalk setup guide and Alibaba Cloud provider to docs

- Create website/docs/user-guide/messaging/dingtalk.md with full
  setup guide (prerequisites, app creation, config, access control,
  features, troubleshooting, env var reference)
- Update messaging/index.md: add DingTalk to diagram, toolsets
  table, security examples, and next steps
- Update configuration.md: add Alibaba Cloud to provider table
2026-03-17 03:26:44 -07:00
Teknium
d83efbb5bc feat(gateway): wire DingTalk into gateway setup and platform maps (#1690)
Add DingTalk to:
- hermes_cli/gateway.py: _PLATFORMS list with setup instructions,
  AppKey/AppSecret prompts, and Stream Mode setup guide
- gateway/run.py: all platform-to-config-key maps, allowed users
  map, allow-all-users map, and toolset resolution maps
2026-03-17 03:19:45 -07:00
Teknium
3cb83404e9 Merge pull request #1683 from NousResearch/feat/mattermost-matrix-adapters
feat: add Mattermost and Matrix gateway adapters
2026-03-17 03:18:25 -07:00
teknium1
1ae1e361b7 docs: add Mattermost and Matrix setup guides
Full Docusaurus docs following the Discord guide structure:

Mattermost (277 lines):
- Step-by-step: enable bot accounts, create bot, get token, add to channels
- All env vars documented with examples
- Reply mode (thread/off), home channel, troubleshooting

Matrix (354 lines):
- Step-by-step: create bot account, get access token (Element or API)
- Dual auth (token + password), E2EE section with libolm install
- Thread support, DM detection, home room, troubleshooting
- Works with any homeserver (Synapse, Conduit, Dendrite, matrix.org)
2026-03-17 03:18:16 -07:00
teknium1
016b1e10d7 feat: register Mattermost and Matrix env vars in OPTIONAL_ENV_VARS
Adds both platforms to the config system so hermes setup, hermes doctor,
and hermes config properly discover and manage their env vars.

- MATTERMOST_URL, MATTERMOST_TOKEN, MATTERMOST_ALLOWED_USERS
- MATRIX_HOMESERVER, MATRIX_ACCESS_TOKEN, MATRIX_USER_ID, MATRIX_ALLOWED_USERS
- Extra env keys for .env sanitizer: MATTERMOST_HOME_CHANNEL,
  MATTERMOST_REPLY_MODE, MATRIX_PASSWORD, MATRIX_ENCRYPTION, MATRIX_HOME_ROOM
2026-03-17 03:18:16 -07:00
teknium1
c3ce6108e3 test: add comprehensive tests for Mattermost and Matrix adapters
77 tests covering:

Mattermost (37 tests):
- Platform enum and config loading
- Message formatting (image markdown stripping)
- Message chunking at 4000 chars
- Send with mocked aiohttp (payload, threading, errors)
- WebSocket event parsing (double-encoded JSON!)
- File upload flow
- Post dedup cache (TTL, pruning)
- Requirements check

Matrix (40 tests):
- Platform enum and config loading (token + password auth, E2EE)
- mxc:// to HTTP URL conversion (authenticated v1.11+ endpoint)
- DM detection via m.direct cache
- Reply fallback stripping
- Thread detection from m.relates_to
- Message formatting and markdown to HTML
- Display name resolution
- Requirements check
2026-03-17 03:18:16 -07:00
teknium1
cd67f60e01 feat: add Mattermost and Matrix gateway adapters
Add support for Mattermost (self-hosted Slack alternative) and Matrix
(federated messaging protocol) as messaging platforms.

Mattermost adapter:
- REST API v4 client for posts, files, channels, typing indicators
- WebSocket listener for real-time 'posted' events with reconnect backoff
- Thread support via root_id
- File upload/download with auth-aware caching
- Dedup cache (5min TTL, 2000 entries)
- Full self-hosted instance support

Matrix adapter:
- matrix-nio AsyncClient with sync loop
- Dual auth: access token or user_id + password
- Optional E2EE via matrix-nio[e2e] (libolm)
- Thread support via m.thread (MSC3440)
- Reply support via m.in_reply_to with fallback stripping
- Media upload/download via mxc:// URLs (authenticated v1.11+ endpoint)
- Auto-join on room invite
- DM detection via m.direct account data with sync fallback
- Markdown to HTML conversion

Fixes applied over original PR #1225 by @cyb0rgk1tty:
- Mattermost: add timeout to file downloads, wrap API helpers in
  try/except for network errors, download incoming files immediately
  with auth headers instead of passing auth-required URLs
- Matrix: use authenticated media endpoint (/_matrix/client/v1/media/),
  robust m.direct cache with sync fallback, prefer aiohttp over httpx

Install Matrix support: pip install 'hermes-agent[matrix]'
Mattermost needs no extra deps (uses aiohttp).

Salvaged from PR #1225 by @cyb0rgk1tty with fixes.
2026-03-17 03:18:16 -07:00
Teknium
07549c967a feat: add SMS (Twilio) platform adapter
Add SMS as a first-class messaging platform via the Twilio API.
Shares credentials with the existing telephony skill — same
TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN, TWILIO_PHONE_NUMBER env vars.

Adapter (gateway/platforms/sms.py):
- aiohttp webhook server for inbound (Twilio form-encoded POSTs)
- Twilio REST API with Basic auth for outbound
- Markdown stripping, smart chunking at 1600 chars
- Echo loop prevention, phone number redaction in logs

Integration (13 files):
- gateway config, run, channel_directory
- agent prompt_builder (SMS platform hint)
- cron scheduler, cronjob tools
- send_message_tool (_send_sms via Twilio API)
- toolsets (hermes-sms + hermes-gateway)
- gateway setup wizard, status display
- pyproject.toml (sms optional extra)
- 21 tests

Docs:
- website/docs/user-guide/messaging/sms.md (full setup guide)
- Updated messaging index (architecture, toolsets, security, links)
- Updated environment-variables.md reference

Inspired by PR #1575 (@sunsakis), rewritten for Twilio.
2026-03-17 03:14:53 -07:00
Teknium
3d38d85287 docs: add Alibaba Cloud and DingTalk to setup wizard and docs (#1687)
* feat(gateway): add DingTalk platform adapter

Add DingTalk as a messaging platform using the dingtalk-stream SDK
for real-time message reception via Stream Mode (no webhook needed).
Replies are sent via session webhook using markdown format.

Features:
- Stream Mode connection (long-lived WebSocket, no public URL needed)
- Text and rich text message support
- DM and group chat support
- Message deduplication with 5-minute window
- Auto-reconnection with exponential backoff
- Session webhook caching for reply routing

Configuration:
  export DINGTALK_CLIENT_ID=your-app-key
  export DINGTALK_CLIENT_SECRET=your-app-secret

  # or in config.yaml:
  platforms:
    dingtalk:
      enabled: true
      extra:
        client_id: your-app-key
        client_secret: your-app-secret

Files:
- gateway/platforms/dingtalk.py (340 lines) — adapter implementation
- gateway/config.py — add DINGTALK to Platform enum
- gateway/run.py — add DingTalk to _create_adapter
- hermes_cli/config.py — add env vars to _EXTRA_ENV_KEYS
- hermes_cli/tools_config.py — add dingtalk to PLATFORMS
- tests/gateway/test_dingtalk.py — 21 tests

* docs: add Alibaba Cloud and DingTalk to setup wizard and docs

Wire Alibaba Cloud (DashScope) into hermes setup and hermes model
provider selection flows. Add DingTalk env vars to documentation.

Changes:
- setup.py: Add Alibaba Cloud as provider choice (index 11) with
  DASHSCOPE_API_KEY prompt and model studio link
- main.py: Add alibaba to provider_labels, providers list, and
  model flow dispatch
- environment-variables.md: Add DASHSCOPE_API_KEY, DINGTALK_CLIENT_ID,
  DINGTALK_CLIENT_SECRET, and alibaba to HERMES_INFERENCE_PROVIDER
2026-03-17 03:13:58 -07:00
teknium1
6fc76ef954 fix: harden website blocklist — default off, TTL cache, fail-open, guarded imports
- Default enabled: false (zero overhead when not configured)
- Fast path: cached disabled state skips all work immediately
- TTL cache (30s) for parsed policy — avoids re-reading config.yaml
  on every URL check
- Missing shared files warn + skip instead of crashing all web tools
- Lazy yaml import — missing PyYAML doesn't break browser toolset
- Guarded browser_tool import — fail-open lambda fallback
- check_website_access never raises for default path (fail-open with
  warning log); only raises with explicit config_path (test mode)
- Simplified enforcement code in web_tools/browser_tool — no more
  try/except wrappers since errors are handled internally
2026-03-17 03:11:26 -07:00
Teknium
d132a3dfbb feat(skills): add inference.sh skill (terminal-based, no custom tools) (#1686)
Add inference.sh as a built-in skill that uses the terminal tool to
run infsh CLI commands. No custom tools or tool registration — the
skill teaches the agent how to use the infsh binary via terminal.

Covers 150+ AI apps: image gen (FLUX, Reve, Seedream), video (Veo,
Wan, Seedance), LLMs, search (Tavily, Exa), 3D, avatars, and more.

Includes reference docs for authentication, app discovery, running
apps, and CLI command reference.

Based on PR #1021 by @okaris, reworked as a skill-only integration.

Co-authored-by: okaris <okaris@users.noreply.github.com>
2026-03-17 03:06:53 -07:00
Teknium
a6dcc231f8 feat(gateway): add DingTalk platform adapter (#1685)
Add DingTalk as a messaging platform using the dingtalk-stream SDK
for real-time message reception via Stream Mode (no webhook needed).
Replies are sent via session webhook using markdown format.

Features:
- Stream Mode connection (long-lived WebSocket, no public URL needed)
- Text and rich text message support
- DM and group chat support
- Message deduplication with 5-minute window
- Auto-reconnection with exponential backoff
- Session webhook caching for reply routing

Configuration:
  export DINGTALK_CLIENT_ID=your-app-key
  export DINGTALK_CLIENT_SECRET=your-app-secret

  # or in config.yaml:
  platforms:
    dingtalk:
      enabled: true
      extra:
        client_id: your-app-key
        client_secret: your-app-secret

Files:
- gateway/platforms/dingtalk.py (340 lines) — adapter implementation
- gateway/config.py — add DINGTALK to Platform enum
- gateway/run.py — add DingTalk to _create_adapter
- hermes_cli/config.py — add env vars to _EXTRA_ENV_KEYS
- hermes_cli/tools_config.py — add dingtalk to PLATFORMS
- tests/gateway/test_dingtalk.py — 21 tests
2026-03-17 03:04:58 -07:00
Teknium
c3d626eb07 Revert "feat: add inference.sh integration (infsh tool + skill) (#1682)" (#1684)
This reverts commit 6020db0243.
2026-03-17 03:01:30 -07:00
Teknium
6d1c5d4491 refactor(tools): extract position calculation logic in fuzzy_match (#1681)
Extract the repeated line-position calculation pattern into a
_calculate_line_positions() helper. The same 4-line pattern was
duplicated across _strategy_trimmed_boundary, _strategy_block_anchor,
_strategy_context_aware, and _find_normalized_matches. Also
standardizes the end_pos clamping (some sites used min(), some used
an if-guard).

Based on PR #1604 by aydnOktay.

Co-authored-by: aydnOktay <aydnOktay@users.noreply.github.com>
2026-03-17 03:00:17 -07:00
teknium1
30c417fe70 feat: add website blocklist enforcement for web/browser tools (#1064)
Adds security.website_blocklist config for user-managed domain blocking
across URL-capable tools. Enforced at the tool level (not monkey-patching)
so it's safe and predictable.

- tools/website_policy.py: shared policy loader with domain normalization,
  wildcard support (*.tracking.example), shared file imports, and
  structured block metadata
- web_extract: pre-fetch URL check + post-redirect recheck
- web_crawl: pre-crawl URL check + per-page URL recheck
- browser_navigate: pre-navigation URL check
- Blocked responses include blocked_by_policy metadata so the agent
  can explain exactly what was denied

Config:
  security:
    website_blocklist:
      enabled: true
      domains: ["evil.com", "*.tracking.example"]
      shared_files: ["team-blocklist.txt"]

Salvaged from PR #1086 by @kshitijk4poor. Browser post-redirect checks
deferred (browser_tool was fully rewritten since the PR branched).

Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
2026-03-17 02:59:39 -07:00
Teknium
6020db0243 feat: add inference.sh integration (infsh tool + skill) (#1682)
Add inference.sh CLI (infsh) as a tool integration, giving agents
access to 150+ AI apps through a single CLI — image gen (FLUX, Reve,
Seedream), video (Veo, Wan, Seedance), LLMs, search (Tavily, Exa),
3D, avatar/lipsync, and more. One API key manages all services.

Tools:
- infsh: run any infsh CLI command (app list, app run, etc.)
- infsh_install: install the CLI if not present

Registered as an 'inference' toolset (opt-in, not in core tools).
Includes comprehensive skill docs with examples for all app categories.

Changes from original PR:
- NOT added to _HERMES_CORE_TOOLS (available via --toolsets inference)
- Added 12 tests covering tool registration, command execution,
  error handling, timeout, JSON parsing, and install flow

Inspired by PR #1021 by @okaris.

Co-authored-by: okaris <okaris@users.noreply.github.com>
2026-03-17 02:59:21 -07:00
Teknium
d9a7b83ae3 fix: make _is_write_denied robust to Path objects (#1678)
Cast path to str() before os.path.expanduser() to handle pathlib.Path
inputs safely.

Based on PR #1051 by JackTheGit.

Co-authored-by: JackTheGit <JackTheGit@users.noreply.github.com>
2026-03-17 02:57:02 -07:00
Teknium
1d5a39e002 fix: thread safety for concurrent subagent delegation (#1672)
* fix: thread safety for concurrent subagent delegation

Four thread-safety fixes that prevent crashes and data races when
running multiple subagents concurrently via delegate_task:

1. Remove redirect_stdout/stderr from delegate_tool — mutating global
   sys.stdout races with the spinner thread when multiple children start
   concurrently, causing segfaults. Children already run with
   quiet_mode=True so the redirect was redundant.

2. Split _run_single_child into _build_child_agent (main thread) +
   _run_single_child (worker thread). AIAgent construction creates
   httpx/SSL clients which are not thread-safe to initialize
   concurrently.

3. Add threading.Lock to SessionDB — subagents share the parent's
   SessionDB and call create_session/append_message from worker threads
   with no synchronization.

4. Add _active_children_lock to AIAgent — interrupt() iterates
   _active_children while worker threads append/remove children.

5. Add _client_cache_lock to auxiliary_client — multiple subagent
   threads may resolve clients concurrently via call_llm().

Based on PR #1471 by peteromallet.

* feat: Honcho base_url override via config.yaml + quick command alias type

Two features salvaged from PR #1576:

1. Honcho base_url override: allows pointing Hermes at a remote
   self-hosted Honcho deployment via config.yaml:

     honcho:
       base_url: "http://192.168.x.x:8000"

   When set, this overrides the Honcho SDK's environment mapping
   (production/local), enabling LAN/VPN Honcho deployments without
   requiring the server to live on localhost. Uses config.yaml instead
   of env var (HONCHO_URL) per project convention.

2. Quick command alias type: adds a new 'alias' quick command type
   that rewrites to another slash command before normal dispatch:

     quick_commands:
       sc:
         type: alias
         target: /context

   Supports both CLI and gateway. Arguments are forwarded to the
   target command.

Based on PR #1576 by redhelix.

---------

Co-authored-by: peteromallet <peteromallet@users.noreply.github.com>
Co-authored-by: redhelix <redhelix@users.noreply.github.com>
2026-03-17 02:53:33 -07:00
Teknium
fd61ae13e5 revert: revert SMS (Telnyx) platform adapter for review
This reverts commit ef67037f8e.
2026-03-17 02:53:30 -07:00
Teknium
ef67037f8e feat: add SMS (Telnyx) platform adapter
Implement SMS as a first-class messaging platform following
ADDING_A_PLATFORM.md checklist. All 16 integration points covered:

- gateway/platforms/sms.py: Core adapter with aiohttp webhook server,
  Telnyx REST API send, markdown stripping, 1600-char chunking,
  echo loop prevention, multi-number reply-from tracking
- gateway/config.py: Platform.SMS enum + env override block
- gateway/run.py: Adapter factory + auth maps (SMS_ALLOWED_USERS,
  SMS_ALLOW_ALL_USERS)
- toolsets.py: hermes-sms toolset + included in hermes-gateway
- cron/scheduler.py: SMS in platform_map for cron delivery
- tools/send_message_tool.py: SMS routing + _send_sms() standalone sender
- tools/cronjob_tools.py: 'sms' in deliver description
- gateway/channel_directory.py: SMS in session-based discovery
- agent/prompt_builder.py: SMS platform hint (plain text, concise)
- hermes_cli/status.py: SMS in platforms status display
- hermes_cli/gateway.py: SMS in setup wizard with Telnyx instructions
- pyproject.toml: sms optional dependency group (aiohttp>=3.9.0)
- tests/gateway/test_sms.py: Unit tests for config, format, truncate,
  echo prevention, requirements, toolset integration

Co-authored-by: sunsakis <teo@sunsakis.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-17 02:52:34 -07:00
Teknium
71c6b1ee99 fix: remove ANTHROPIC_BASE_URL env var to avoid collisions (#1675)
ANTHROPIC_BASE_URL collides with Claude Code and other Anthropic
tooling. Remove it from the Anthropic provider — base URL overrides
should go through config.yaml model.base_url instead.

The Alibaba/DashScope provider has its own dedicated base URL and
API key env vars which don't collide with anything.
2026-03-17 02:51:49 -07:00
teknium1
a1c81360a5 feat(cli): skin-aware light/dark theme mode with terminal auto-detection
Add display.theme_mode setting (auto/light/dark) that makes the CLI
readable on light terminal backgrounds.

- Auto-detect terminal background via COLORFGBG, OSC 11, and macOS
  appearance (fallback chain in hermes_cli/colors.py)
- Add colors_light overrides to all 7 built-in skins with dark/readable
  colors for light backgrounds
- SkinConfig.get_color() now returns light overrides when theme is light
- get_prompt_toolkit_style_overrides() uses light bg colors for
  completion menus in light mode
- init_skin_from_config() reads display.theme_mode from config
- 7 new tests covering theme mode resolution, detection fallbacks,
  and light-mode skin overrides

Salvaged from PR #1187 by @peteromallet. Core design preserved;
adapted to current main (kept all existing helpers, tool_emojis,
convenience functions that were added after the PR branched).

Co-authored-by: Peter O'Mallet <peteromallet@users.noreply.github.com>
2026-03-17 02:51:40 -07:00
Teknium
d156942419 fix(telegram): aggregate split text messages before dispatching (#1674)
When a user sends a long message, Telegram clients split it into
multiple updates that arrive within milliseconds of each other.
Previously each chunk was dispatched independently — the first would
start the agent, and subsequent chunks would interrupt or queue as
separate turns, causing the agent to only see part of the message.

Add text message batching to TelegramAdapter following the same pattern
as the existing photo burst batching:

- _enqueue_text_event() buffers text by session key, concatenating
  chunks that arrive in rapid succession
- _flush_text_batch() dispatches the combined message after a 0.6s
  quiet period (configurable via HERMES_TELEGRAM_TEXT_BATCH_DELAY_SECONDS)
- Timer resets on each new chunk, so all parts of a split arrive
  before the batch is dispatched

Reported by NulledVector on Discord.
2026-03-17 02:49:57 -07:00
Teknium
7042a748f5 feat: add Alibaba Cloud provider and Anthropic base_url override (#1673)
Add Alibaba Cloud (DashScope) as a first-class inference provider
using the Anthropic-compatible endpoint. This gives access to Qwen
models (qwen3.5-plus, qwen3-max, qwen3-coder-plus, etc.) through
the same api_mode as native Anthropic.

Also add ANTHROPIC_BASE_URL env var support so users can point the
Anthropic provider at any compatible endpoint.

Changes:
- auth.py: Add alibaba ProviderConfig + ANTHROPIC_BASE_URL on anthropic
- models.py: Add alibaba to catalog, labels, aliases (dashscope/aliyun/qwen), provider order
- runtime_provider.py: Add alibaba resolution (anthropic_messages api_mode) + ANTHROPIC_BASE_URL
- model_metadata.py: Add Qwen model context lengths (128K)
- config.py: Add DASHSCOPE_API_KEY, DASHSCOPE_BASE_URL, ANTHROPIC_BASE_URL env vars

Usage:
  hermes --provider alibaba --model qwen3.5-plus
  # or via aliases:
  hermes --provider qwen --model qwen3-max
2026-03-17 02:49:22 -07:00
Teknium
d9d937b7f7 fix: detect Claude Code version dynamically for OAuth user-agent
* fix: prevent infinite 400 failure loop on context overflow (#1630)

When a gateway session exceeds the model's context window, Anthropic may
return a generic 400 invalid_request_error with just 'Error' as the
message.  This bypassed the phrase-based context-length detection,
causing the agent to treat it as a non-retryable client error.  Worse,
the failed user message was still persisted to the transcript, making
the session even larger on each attempt — creating an infinite loop.

Three-layer fix:

1. run_agent.py — Fallback heuristic: when a 400 error has a very short
   generic message AND the session is large (>40% of context or >80
   messages), treat it as a probable context overflow and trigger
   compression instead of aborting.

2. run_agent.py + gateway/run.py — Don't persist failed messages:
   when the agent returns failed=True before generating any response,
   skip writing the user's message to the transcript/DB. This prevents
   the session from growing on each failure.

3. gateway/run.py — Smarter error messages: detect context-overflow
   failures and suggest /compact or /reset specifically, instead of a
   generic 'try again' that will fail identically.

* fix(skills): detect prompt injection patterns and block cache file reads

Adds two security layers to prevent prompt injection via skills hub
cache files (#1558):

1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory
   (index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json
   was the original injection vector — untrusted skill descriptions
   in the catalog contained adversarial text that the model executed.

2. skill_view: warns when skills are loaded from outside the trusted
   ~/.hermes/skills/ directory, and detects common injection patterns
   in skill content ("ignore previous instructions", "<system>", etc.).

Cherry-picked from PR #1562 by ygd58.

* fix(tools): chunk long messages in send_message_tool before dispatch (#1552)

Long messages sent via send_message tool or cron delivery silently
failed when exceeding platform limits. Gateway adapters handle this
via truncate_message(), but the standalone senders in send_message_tool
bypassed that entirely.

- Apply truncate_message() chunking in _send_to_platform() before
  dispatching to individual platform senders
- Remove naive message[i:i+2000] character split in _send_discord()
  in favor of centralized smart splitting
- Attach media files to last chunk only for Telegram
- Add regression tests for chunking and media placement

Cherry-picked from PR #1557 by llbn.

* fix(approval): show full command in dangerous command approval (#1553)

Previously the command was truncated to 80 chars in CLI (with a
[v]iew full option), 500 chars in Discord embeds, and missing entirely
in Telegram/Slack approval messages. Now the full command is always
displayed everywhere:

- CLI: removed 80-char truncation and [v]iew full menu option
- Gateway (TG/Slack): approval_required message includes full command
  in a code block
- Discord: embed shows full command up to 4096-char limit
- Windows: skip SIGALRM-based test timeout (Unix-only)
- Updated tests: replaced view-flow tests with direct approval tests

Cherry-picked from PR #1566 by crazywriter1.

* fix(cli): flush stdout during agent loop to prevent macOS display freeze (#1624)

The interrupt polling loop in chat() waited on the queue without
invalidating the prompt_toolkit renderer. On macOS, the StdoutProxy
buffer only flushed on input events, causing the CLI to appear frozen
during tool execution until the user typed a key.

Fix: call _invalidate() on each queue timeout (every ~100ms, throttled
to 150ms) to force the renderer to flush buffered agent output.

* fix(claw): warn when API keys are skipped during OpenClaw migration (#1580)

When --migrate-secrets is not passed (the default), API keys like
OPENROUTER_API_KEY are silently skipped with no warning. Users don't
realize their keys weren't migrated until the agent fails to connect.

Add a post-migration warning with actionable instructions: either
re-run with --migrate-secrets or add the key manually via
hermes config set.

Cherry-picked from PR #1593 by ygd58.

* fix(security): block sandbox backend creds from subprocess env (#1264)

Add Modal and Daytona sandbox credentials to the subprocess env
blocklist so they're not leaked to agent terminal sessions via
printenv/env.

Cherry-picked from PR #1571 by ygd58.

* fix(gateway): cap interrupt recursion depth to prevent resource exhaustion (#816)

When a user sends multiple messages while the agent keeps failing,
_run_agent() calls itself recursively with no depth limit. This can
exhaust stack/memory if the agent is in a failure loop.

Add _MAX_INTERRUPT_DEPTH = 3. When exceeded, the pending message is
logged and the current result is returned instead of recursing deeper.

The log handler duplication bug described in #816 was already fixed
separately (AIAgent.__init__ deduplicates handlers).

* fix(gateway): /model shows active fallback model instead of config default (#1615)

When the agent falls back to a different model (e.g. due to rate
limiting), /model still showed the config default. Now tracks the
effective model/provider after each agent run and displays it.

Cleared when the primary model succeeds again or the user explicitly
switches via /model.

Cherry-picked from PR #1616 by MaxKerkula. Added hasattr guard for
test compatibility.

* feat(gateway): inject reply-to message context for out-of-session replies (#1594)

When a user replies to a Telegram message, check if the quoted text
exists in the current session transcript. If missing (from cron jobs,
background tasks, or old sessions), prepend [Replying to: "..."] to
the message so the agent has context about what's being referenced.

- Add reply_to_text field to MessageEvent (base.py)
- Populate from Telegram's reply_to_message (text or caption)
- Inject context in _handle_message when not found in history

Based on PR #1596 by anpicasso (cherry-picked reply-to feature only,
excluded unrelated /server command and background delegation changes).

* fix: recognize Claude Code OAuth credentials in startup gate (#1455)

The _has_any_provider_configured() startup check didn't look for
Claude Code OAuth credentials (~/.claude/.credentials.json). Users
with only Claude Code auth got the setup wizard instead of starting.

Cherry-picked from PR #1455 by kshitijk4poor.

* perf: use ripgrep for file search (200x faster than find)

search_files(target='files') now uses rg --files -g instead of find.
Ripgrep respects .gitignore, excludes hidden dirs by default, and has
parallel directory traversal — ~200x faster on wide trees (0.14s vs 34s
benchmarked on 164-repo tree).

Falls back to find when rg is unavailable, preserving hidden-dir
exclusion and BSD find compatibility.

Salvaged from PR #1464 by @light-merlin-dark (Merlin) — adapted to
preserve hidden-dir exclusion added since the original PR.

* refactor(tts): replace NeuTTS optional skill with built-in provider + setup flow

Remove the optional skill (redundant now that NeuTTS is a built-in TTS
provider). Replace neutts_cli dependency with a standalone synthesis
helper (tools/neutts_synth.py) that calls the neutts Python API directly
in a subprocess.

Add TTS provider selection to hermes setup:
- 'hermes setup' now prompts for TTS provider after model selection
- 'hermes setup tts' available as standalone section
- Selecting NeuTTS checks for deps and offers to install:
  espeak-ng (system) + neutts[all] (pip)
- ElevenLabs/OpenAI selections prompt for API keys
- Tool status display shows NeuTTS install state

Changes:
- Remove optional-skills/mlops/models/neutts/ (skill + CLI scaffold)
- Add tools/neutts_synth.py (standalone synthesis subprocess helper)
- Move jo.wav/jo.txt to tools/neutts_samples/ (bundled default voice)
- Refactor _generate_neutts() — uses neutts API via subprocess, no
  neutts_cli dependency, config-driven ref_audio/ref_text/model/device
- Add TTS setup to hermes_cli/setup.py (SETUP_SECTIONS, tool status)
- Update config.py defaults (ref_audio, ref_text, model, device)

* fix(docker): add explicit env allowlist for container credentials (#1436)

Docker terminal sessions are secret-dark by default. This adds
terminal.docker_forward_env as an explicit allowlist for env vars
that may be forwarded into Docker containers.

Values resolve from the current shell first, then fall back to
~/.hermes/.env. Only variables the user explicitly lists are
forwarded — nothing is auto-exposed.

Cherry-picked from PR #1449 by @teknium1, conflict-resolved onto
current main.

Fixes #1436
Supersedes #1439

* fix: email send_typing metadata param + ☤ Hermes staff symbol

- email.py: add missing metadata parameter to send_typing() to match
  BasePlatformAdapter signature (PR #1431 by @ItsChoudhry)
- README.md: ⚕ → ☤ — the caduceus is Hermes's staff, not the
  medical Staff of Asclepius (PR #1420 by @rianczerwinski)

* fix(whatsapp): support LID format in self-chat mode (#1556)

WhatsApp now uses LID (Linked Identity Device) format alongside classic
@s.whatsapp.net. Self-chat detection checked only the classic format,
breaking self-chat mode for users on newer WhatsApp versions.

- Check both sock.user.id and sock.user.lid for self-chat detection
- Accept 'append' message type in addition to 'notify' (self-chat
  messages arrive as 'append')
- Track sent message IDs to prevent echo-back loops with media
- Add WHATSAPP_DEBUG env var for troubleshooting

Based on PR #1556 by jcorrego (manually applied due to cherry-pick
conflicts).

* fix: detect Claude Code version dynamically for OAuth user-agent

The _CLAUDE_CODE_VERSION was hardcoded to '2.1.2' but Anthropic
rejects OAuth requests when the spoofed user-agent version is too
far behind the current Claude Code release. The error is a generic
400 with just 'Error' as the message, making it very hard to diagnose.

Fix: detect the installed version via 'claude --version' at import
time, falling back to a bumped static constant (2.1.74) when Claude
Code isn't installed. This means users who keep Claude Code updated
never hit stale-version rejections.

Reported by Jack — changing the version string to match the installed
claude binary fixed persistent OAuth 400 errors immediately.

---------

Co-authored-by: buray <ygd58@users.noreply.github.com>
Co-authored-by: lbn <llbn@users.noreply.github.com>
Co-authored-by: crazywriter1 <53251494+crazywriter1@users.noreply.github.com>
Co-authored-by: Max K <MaxKerkula@users.noreply.github.com>
Co-authored-by: Angello Picasso <angello.picasso@devsu.com>
Co-authored-by: kshitij <kshitijk4poor@users.noreply.github.com>
Co-authored-by: jcorrego <jcorrego@users.noreply.github.com>
2026-03-17 02:48:33 -07:00
Teknium
65be657a79 feat(skills): add Sherlock OSINT username search skill
Add optional skill for username enumeration across 400+ social networks
using the Sherlock Project CLI (https://github.com/sherlock-project/sherlock).

Features:
- Smart username extraction from user messages
- Installation verification before execution
- Categorized output with clickable links
- Ethical use guidelines
- Docker, pipx, and pip installation paths

Co-authored-by: unmodeled-tyler <unmodeled.tyler@proton.me>
2026-03-17 02:48:21 -07:00
Teknium
b197bb01d3 docs(configuration): clarify self-hosted firecrawl setup
Co-authored-by: caentzminger <112503481+caentzminger@users.noreply.github.com>
2026-03-17 02:45:13 -07:00
Teknium
a3ac142c83 fix(core): guard print() calls in run_conversation() against OSError
In headless environments (systemd, Docker, nohup) stdout can become
unavailable mid-session. Raw print() raises OSError which crashes
cron jobs — agent finishes work but delivery never happens because
the error handler's own print() also raises OSError.

Fix:
- Add _safe_print() static method that wraps print() with try/except
  OSError — silently drops output when stdout is broken
- Make _vprint() use _safe_print() — protects all calls through the
  verbose print path
- Convert raw print() calls in run_conversation() hot path to use
  _safe_print(): starting conversation, interrupt, budget exhausted,
  preflight compression, context cache, conversation completed
- Error handler print (the cascading crash point) gets explicit
  try/except with logger.error() fallback so diagnostics aren't lost

Fixes #845
Closes #1358 (superseded — PR was 323 commits stale with a bug)
2026-03-17 02:41:01 -07:00
Teknium
342a0ad372 fix(whatsapp): support LID format in self-chat mode (#1556)
* fix: prevent infinite 400 failure loop on context overflow (#1630)

When a gateway session exceeds the model's context window, Anthropic may
return a generic 400 invalid_request_error with just 'Error' as the
message.  This bypassed the phrase-based context-length detection,
causing the agent to treat it as a non-retryable client error.  Worse,
the failed user message was still persisted to the transcript, making
the session even larger on each attempt — creating an infinite loop.

Three-layer fix:

1. run_agent.py — Fallback heuristic: when a 400 error has a very short
   generic message AND the session is large (>40% of context or >80
   messages), treat it as a probable context overflow and trigger
   compression instead of aborting.

2. run_agent.py + gateway/run.py — Don't persist failed messages:
   when the agent returns failed=True before generating any response,
   skip writing the user's message to the transcript/DB. This prevents
   the session from growing on each failure.

3. gateway/run.py — Smarter error messages: detect context-overflow
   failures and suggest /compact or /reset specifically, instead of a
   generic 'try again' that will fail identically.

* fix(skills): detect prompt injection patterns and block cache file reads

Adds two security layers to prevent prompt injection via skills hub
cache files (#1558):

1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory
   (index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json
   was the original injection vector — untrusted skill descriptions
   in the catalog contained adversarial text that the model executed.

2. skill_view: warns when skills are loaded from outside the trusted
   ~/.hermes/skills/ directory, and detects common injection patterns
   in skill content ("ignore previous instructions", "<system>", etc.).

Cherry-picked from PR #1562 by ygd58.

* fix(tools): chunk long messages in send_message_tool before dispatch (#1552)

Long messages sent via send_message tool or cron delivery silently
failed when exceeding platform limits. Gateway adapters handle this
via truncate_message(), but the standalone senders in send_message_tool
bypassed that entirely.

- Apply truncate_message() chunking in _send_to_platform() before
  dispatching to individual platform senders
- Remove naive message[i:i+2000] character split in _send_discord()
  in favor of centralized smart splitting
- Attach media files to last chunk only for Telegram
- Add regression tests for chunking and media placement

Cherry-picked from PR #1557 by llbn.

* fix(approval): show full command in dangerous command approval (#1553)

Previously the command was truncated to 80 chars in CLI (with a
[v]iew full option), 500 chars in Discord embeds, and missing entirely
in Telegram/Slack approval messages. Now the full command is always
displayed everywhere:

- CLI: removed 80-char truncation and [v]iew full menu option
- Gateway (TG/Slack): approval_required message includes full command
  in a code block
- Discord: embed shows full command up to 4096-char limit
- Windows: skip SIGALRM-based test timeout (Unix-only)
- Updated tests: replaced view-flow tests with direct approval tests

Cherry-picked from PR #1566 by crazywriter1.

* fix(cli): flush stdout during agent loop to prevent macOS display freeze (#1624)

The interrupt polling loop in chat() waited on the queue without
invalidating the prompt_toolkit renderer. On macOS, the StdoutProxy
buffer only flushed on input events, causing the CLI to appear frozen
during tool execution until the user typed a key.

Fix: call _invalidate() on each queue timeout (every ~100ms, throttled
to 150ms) to force the renderer to flush buffered agent output.

* fix(claw): warn when API keys are skipped during OpenClaw migration (#1580)

When --migrate-secrets is not passed (the default), API keys like
OPENROUTER_API_KEY are silently skipped with no warning. Users don't
realize their keys weren't migrated until the agent fails to connect.

Add a post-migration warning with actionable instructions: either
re-run with --migrate-secrets or add the key manually via
hermes config set.

Cherry-picked from PR #1593 by ygd58.

* fix(security): block sandbox backend creds from subprocess env (#1264)

Add Modal and Daytona sandbox credentials to the subprocess env
blocklist so they're not leaked to agent terminal sessions via
printenv/env.

Cherry-picked from PR #1571 by ygd58.

* fix(gateway): cap interrupt recursion depth to prevent resource exhaustion (#816)

When a user sends multiple messages while the agent keeps failing,
_run_agent() calls itself recursively with no depth limit. This can
exhaust stack/memory if the agent is in a failure loop.

Add _MAX_INTERRUPT_DEPTH = 3. When exceeded, the pending message is
logged and the current result is returned instead of recursing deeper.

The log handler duplication bug described in #816 was already fixed
separately (AIAgent.__init__ deduplicates handlers).

* fix(gateway): /model shows active fallback model instead of config default (#1615)

When the agent falls back to a different model (e.g. due to rate
limiting), /model still showed the config default. Now tracks the
effective model/provider after each agent run and displays it.

Cleared when the primary model succeeds again or the user explicitly
switches via /model.

Cherry-picked from PR #1616 by MaxKerkula. Added hasattr guard for
test compatibility.

* feat(gateway): inject reply-to message context for out-of-session replies (#1594)

When a user replies to a Telegram message, check if the quoted text
exists in the current session transcript. If missing (from cron jobs,
background tasks, or old sessions), prepend [Replying to: "..."] to
the message so the agent has context about what's being referenced.

- Add reply_to_text field to MessageEvent (base.py)
- Populate from Telegram's reply_to_message (text or caption)
- Inject context in _handle_message when not found in history

Based on PR #1596 by anpicasso (cherry-picked reply-to feature only,
excluded unrelated /server command and background delegation changes).

* fix: recognize Claude Code OAuth credentials in startup gate (#1455)

The _has_any_provider_configured() startup check didn't look for
Claude Code OAuth credentials (~/.claude/.credentials.json). Users
with only Claude Code auth got the setup wizard instead of starting.

Cherry-picked from PR #1455 by kshitijk4poor.

* perf: use ripgrep for file search (200x faster than find)

search_files(target='files') now uses rg --files -g instead of find.
Ripgrep respects .gitignore, excludes hidden dirs by default, and has
parallel directory traversal — ~200x faster on wide trees (0.14s vs 34s
benchmarked on 164-repo tree).

Falls back to find when rg is unavailable, preserving hidden-dir
exclusion and BSD find compatibility.

Salvaged from PR #1464 by @light-merlin-dark (Merlin) — adapted to
preserve hidden-dir exclusion added since the original PR.

* refactor(tts): replace NeuTTS optional skill with built-in provider + setup flow

Remove the optional skill (redundant now that NeuTTS is a built-in TTS
provider). Replace neutts_cli dependency with a standalone synthesis
helper (tools/neutts_synth.py) that calls the neutts Python API directly
in a subprocess.

Add TTS provider selection to hermes setup:
- 'hermes setup' now prompts for TTS provider after model selection
- 'hermes setup tts' available as standalone section
- Selecting NeuTTS checks for deps and offers to install:
  espeak-ng (system) + neutts[all] (pip)
- ElevenLabs/OpenAI selections prompt for API keys
- Tool status display shows NeuTTS install state

Changes:
- Remove optional-skills/mlops/models/neutts/ (skill + CLI scaffold)
- Add tools/neutts_synth.py (standalone synthesis subprocess helper)
- Move jo.wav/jo.txt to tools/neutts_samples/ (bundled default voice)
- Refactor _generate_neutts() — uses neutts API via subprocess, no
  neutts_cli dependency, config-driven ref_audio/ref_text/model/device
- Add TTS setup to hermes_cli/setup.py (SETUP_SECTIONS, tool status)
- Update config.py defaults (ref_audio, ref_text, model, device)

* fix(docker): add explicit env allowlist for container credentials (#1436)

Docker terminal sessions are secret-dark by default. This adds
terminal.docker_forward_env as an explicit allowlist for env vars
that may be forwarded into Docker containers.

Values resolve from the current shell first, then fall back to
~/.hermes/.env. Only variables the user explicitly lists are
forwarded — nothing is auto-exposed.

Cherry-picked from PR #1449 by @teknium1, conflict-resolved onto
current main.

Fixes #1436
Supersedes #1439

* fix: email send_typing metadata param + ☤ Hermes staff symbol

- email.py: add missing metadata parameter to send_typing() to match
  BasePlatformAdapter signature (PR #1431 by @ItsChoudhry)
- README.md: ⚕ → ☤ — the caduceus is Hermes's staff, not the
  medical Staff of Asclepius (PR #1420 by @rianczerwinski)

* fix(whatsapp): support LID format in self-chat mode (#1556)

WhatsApp now uses LID (Linked Identity Device) format alongside classic
@s.whatsapp.net. Self-chat detection checked only the classic format,
breaking self-chat mode for users on newer WhatsApp versions.

- Check both sock.user.id and sock.user.lid for self-chat detection
- Accept 'append' message type in addition to 'notify' (self-chat
  messages arrive as 'append')
- Track sent message IDs to prevent echo-back loops with media
- Add WHATSAPP_DEBUG env var for troubleshooting

Based on PR #1556 by jcorrego (manually applied due to cherry-pick
conflicts).

---------

Co-authored-by: buray <ygd58@users.noreply.github.com>
Co-authored-by: lbn <llbn@users.noreply.github.com>
Co-authored-by: crazywriter1 <53251494+crazywriter1@users.noreply.github.com>
Co-authored-by: Max K <MaxKerkula@users.noreply.github.com>
Co-authored-by: Angello Picasso <angello.picasso@devsu.com>
Co-authored-by: kshitij <kshitijk4poor@users.noreply.github.com>
Co-authored-by: jcorrego <jcorrego@users.noreply.github.com>
2026-03-17 02:40:55 -07:00
Teknium
35d948b6e1 feat: add Kilo Code (kilocode) as first-class inference provider (#1666)
Add Kilo Gateway (kilo.ai) as an API-key provider with OpenAI-compatible
endpoint at https://api.kilo.ai/api/gateway. Supports 500+ models from
Anthropic, OpenAI, Google, xAI, Mistral, MiniMax via a single API key.

- Register kilocode in PROVIDER_REGISTRY with aliases (kilo, kilo-code,
  kilo-gateway) and KILOCODE_API_KEY / KILOCODE_BASE_URL env vars
- Add to model catalog, CLI provider menu, setup wizard, doctor checks
- Add google/gemini-3-flash-preview as default aux model
- 12 new tests covering registration, aliases, credential resolution,
  runtime config
- Documentation updates (env vars, config, fallback providers)
- Fix setup test index shift from provider insertion

Inspired by PR #1473 by @amanning3390.

Co-authored-by: amanning3390 <amanning3390@users.noreply.github.com>
2026-03-17 02:40:34 -07:00
Teknium
6c6d12033f fix: email send_typing metadata + ☤ Hermes staff symbol (#1431, #1420)
* fix: prevent infinite 400 failure loop on context overflow (#1630)

When a gateway session exceeds the model's context window, Anthropic may
return a generic 400 invalid_request_error with just 'Error' as the
message.  This bypassed the phrase-based context-length detection,
causing the agent to treat it as a non-retryable client error.  Worse,
the failed user message was still persisted to the transcript, making
the session even larger on each attempt — creating an infinite loop.

Three-layer fix:

1. run_agent.py — Fallback heuristic: when a 400 error has a very short
   generic message AND the session is large (>40% of context or >80
   messages), treat it as a probable context overflow and trigger
   compression instead of aborting.

2. run_agent.py + gateway/run.py — Don't persist failed messages:
   when the agent returns failed=True before generating any response,
   skip writing the user's message to the transcript/DB. This prevents
   the session from growing on each failure.

3. gateway/run.py — Smarter error messages: detect context-overflow
   failures and suggest /compact or /reset specifically, instead of a
   generic 'try again' that will fail identically.

* fix(skills): detect prompt injection patterns and block cache file reads

Adds two security layers to prevent prompt injection via skills hub
cache files (#1558):

1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory
   (index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json
   was the original injection vector — untrusted skill descriptions
   in the catalog contained adversarial text that the model executed.

2. skill_view: warns when skills are loaded from outside the trusted
   ~/.hermes/skills/ directory, and detects common injection patterns
   in skill content ("ignore previous instructions", "<system>", etc.).

Cherry-picked from PR #1562 by ygd58.

* fix(tools): chunk long messages in send_message_tool before dispatch (#1552)

Long messages sent via send_message tool or cron delivery silently
failed when exceeding platform limits. Gateway adapters handle this
via truncate_message(), but the standalone senders in send_message_tool
bypassed that entirely.

- Apply truncate_message() chunking in _send_to_platform() before
  dispatching to individual platform senders
- Remove naive message[i:i+2000] character split in _send_discord()
  in favor of centralized smart splitting
- Attach media files to last chunk only for Telegram
- Add regression tests for chunking and media placement

Cherry-picked from PR #1557 by llbn.

* fix(approval): show full command in dangerous command approval (#1553)

Previously the command was truncated to 80 chars in CLI (with a
[v]iew full option), 500 chars in Discord embeds, and missing entirely
in Telegram/Slack approval messages. Now the full command is always
displayed everywhere:

- CLI: removed 80-char truncation and [v]iew full menu option
- Gateway (TG/Slack): approval_required message includes full command
  in a code block
- Discord: embed shows full command up to 4096-char limit
- Windows: skip SIGALRM-based test timeout (Unix-only)
- Updated tests: replaced view-flow tests with direct approval tests

Cherry-picked from PR #1566 by crazywriter1.

* fix(cli): flush stdout during agent loop to prevent macOS display freeze (#1624)

The interrupt polling loop in chat() waited on the queue without
invalidating the prompt_toolkit renderer. On macOS, the StdoutProxy
buffer only flushed on input events, causing the CLI to appear frozen
during tool execution until the user typed a key.

Fix: call _invalidate() on each queue timeout (every ~100ms, throttled
to 150ms) to force the renderer to flush buffered agent output.

* fix(claw): warn when API keys are skipped during OpenClaw migration (#1580)

When --migrate-secrets is not passed (the default), API keys like
OPENROUTER_API_KEY are silently skipped with no warning. Users don't
realize their keys weren't migrated until the agent fails to connect.

Add a post-migration warning with actionable instructions: either
re-run with --migrate-secrets or add the key manually via
hermes config set.

Cherry-picked from PR #1593 by ygd58.

* fix(security): block sandbox backend creds from subprocess env (#1264)

Add Modal and Daytona sandbox credentials to the subprocess env
blocklist so they're not leaked to agent terminal sessions via
printenv/env.

Cherry-picked from PR #1571 by ygd58.

* fix(gateway): cap interrupt recursion depth to prevent resource exhaustion (#816)

When a user sends multiple messages while the agent keeps failing,
_run_agent() calls itself recursively with no depth limit. This can
exhaust stack/memory if the agent is in a failure loop.

Add _MAX_INTERRUPT_DEPTH = 3. When exceeded, the pending message is
logged and the current result is returned instead of recursing deeper.

The log handler duplication bug described in #816 was already fixed
separately (AIAgent.__init__ deduplicates handlers).

* fix(gateway): /model shows active fallback model instead of config default (#1615)

When the agent falls back to a different model (e.g. due to rate
limiting), /model still showed the config default. Now tracks the
effective model/provider after each agent run and displays it.

Cleared when the primary model succeeds again or the user explicitly
switches via /model.

Cherry-picked from PR #1616 by MaxKerkula. Added hasattr guard for
test compatibility.

* feat(gateway): inject reply-to message context for out-of-session replies (#1594)

When a user replies to a Telegram message, check if the quoted text
exists in the current session transcript. If missing (from cron jobs,
background tasks, or old sessions), prepend [Replying to: "..."] to
the message so the agent has context about what's being referenced.

- Add reply_to_text field to MessageEvent (base.py)
- Populate from Telegram's reply_to_message (text or caption)
- Inject context in _handle_message when not found in history

Based on PR #1596 by anpicasso (cherry-picked reply-to feature only,
excluded unrelated /server command and background delegation changes).

* fix: recognize Claude Code OAuth credentials in startup gate (#1455)

The _has_any_provider_configured() startup check didn't look for
Claude Code OAuth credentials (~/.claude/.credentials.json). Users
with only Claude Code auth got the setup wizard instead of starting.

Cherry-picked from PR #1455 by kshitijk4poor.

* perf: use ripgrep for file search (200x faster than find)

search_files(target='files') now uses rg --files -g instead of find.
Ripgrep respects .gitignore, excludes hidden dirs by default, and has
parallel directory traversal — ~200x faster on wide trees (0.14s vs 34s
benchmarked on 164-repo tree).

Falls back to find when rg is unavailable, preserving hidden-dir
exclusion and BSD find compatibility.

Salvaged from PR #1464 by @light-merlin-dark (Merlin) — adapted to
preserve hidden-dir exclusion added since the original PR.

* refactor(tts): replace NeuTTS optional skill with built-in provider + setup flow

Remove the optional skill (redundant now that NeuTTS is a built-in TTS
provider). Replace neutts_cli dependency with a standalone synthesis
helper (tools/neutts_synth.py) that calls the neutts Python API directly
in a subprocess.

Add TTS provider selection to hermes setup:
- 'hermes setup' now prompts for TTS provider after model selection
- 'hermes setup tts' available as standalone section
- Selecting NeuTTS checks for deps and offers to install:
  espeak-ng (system) + neutts[all] (pip)
- ElevenLabs/OpenAI selections prompt for API keys
- Tool status display shows NeuTTS install state

Changes:
- Remove optional-skills/mlops/models/neutts/ (skill + CLI scaffold)
- Add tools/neutts_synth.py (standalone synthesis subprocess helper)
- Move jo.wav/jo.txt to tools/neutts_samples/ (bundled default voice)
- Refactor _generate_neutts() — uses neutts API via subprocess, no
  neutts_cli dependency, config-driven ref_audio/ref_text/model/device
- Add TTS setup to hermes_cli/setup.py (SETUP_SECTIONS, tool status)
- Update config.py defaults (ref_audio, ref_text, model, device)

* fix(docker): add explicit env allowlist for container credentials (#1436)

Docker terminal sessions are secret-dark by default. This adds
terminal.docker_forward_env as an explicit allowlist for env vars
that may be forwarded into Docker containers.

Values resolve from the current shell first, then fall back to
~/.hermes/.env. Only variables the user explicitly lists are
forwarded — nothing is auto-exposed.

Cherry-picked from PR #1449 by @teknium1, conflict-resolved onto
current main.

Fixes #1436
Supersedes #1439

* fix: email send_typing metadata param + ☤ Hermes staff symbol

- email.py: add missing metadata parameter to send_typing() to match
  BasePlatformAdapter signature (PR #1431 by @ItsChoudhry)
- README.md: ⚕ → ☤ — the caduceus is Hermes's staff, not the
  medical Staff of Asclepius (PR #1420 by @rianczerwinski)

---------

Co-authored-by: buray <ygd58@users.noreply.github.com>
Co-authored-by: lbn <llbn@users.noreply.github.com>
Co-authored-by: crazywriter1 <53251494+crazywriter1@users.noreply.github.com>
Co-authored-by: Max K <MaxKerkula@users.noreply.github.com>
Co-authored-by: Angello Picasso <angello.picasso@devsu.com>
Co-authored-by: kshitij <kshitijk4poor@users.noreply.github.com>
2026-03-17 02:37:40 -07:00
Teknium
556e0f4b43 fix(docker): add explicit env allowlist for container credentials (#1436)
Docker terminal sessions are secret-dark by default. This adds
terminal.docker_forward_env as an explicit allowlist for env vars
that may be forwarded into Docker containers.

Values resolve from the current shell first, then fall back to
~/.hermes/.env. Only variables the user explicitly lists are
forwarded — nothing is auto-exposed.

Cherry-picked from PR #1449 by @teknium1, conflict-resolved onto
current main.

Fixes #1436
Supersedes #1439
2026-03-17 02:34:35 -07:00
Teknium
d50e0711c2 refactor(tts): replace NeuTTS optional skill with built-in provider + setup flow
Remove the optional skill (redundant now that NeuTTS is a built-in TTS
provider). Replace neutts_cli dependency with a standalone synthesis
helper (tools/neutts_synth.py) that calls the neutts Python API directly
in a subprocess.

Add TTS provider selection to hermes setup:
- 'hermes setup' now prompts for TTS provider after model selection
- 'hermes setup tts' available as standalone section
- Selecting NeuTTS checks for deps and offers to install:
  espeak-ng (system) + neutts[all] (pip)
- ElevenLabs/OpenAI selections prompt for API keys
- Tool status display shows NeuTTS install state

Changes:
- Remove optional-skills/mlops/models/neutts/ (skill + CLI scaffold)
- Add tools/neutts_synth.py (standalone synthesis subprocess helper)
- Move jo.wav/jo.txt to tools/neutts_samples/ (bundled default voice)
- Refactor _generate_neutts() — uses neutts API via subprocess, no
  neutts_cli dependency, config-driven ref_audio/ref_text/model/device
- Add TTS setup to hermes_cli/setup.py (SETUP_SECTIONS, tool status)
- Update config.py defaults (ref_audio, ref_text, model, device)
2026-03-17 02:33:12 -07:00
Teknium
e2e53d497f fix: recognize Claude Code OAuth credentials in startup gate (#1455)
* fix: prevent infinite 400 failure loop on context overflow (#1630)

When a gateway session exceeds the model's context window, Anthropic may
return a generic 400 invalid_request_error with just 'Error' as the
message.  This bypassed the phrase-based context-length detection,
causing the agent to treat it as a non-retryable client error.  Worse,
the failed user message was still persisted to the transcript, making
the session even larger on each attempt — creating an infinite loop.

Three-layer fix:

1. run_agent.py — Fallback heuristic: when a 400 error has a very short
   generic message AND the session is large (>40% of context or >80
   messages), treat it as a probable context overflow and trigger
   compression instead of aborting.

2. run_agent.py + gateway/run.py — Don't persist failed messages:
   when the agent returns failed=True before generating any response,
   skip writing the user's message to the transcript/DB. This prevents
   the session from growing on each failure.

3. gateway/run.py — Smarter error messages: detect context-overflow
   failures and suggest /compact or /reset specifically, instead of a
   generic 'try again' that will fail identically.

* fix(skills): detect prompt injection patterns and block cache file reads

Adds two security layers to prevent prompt injection via skills hub
cache files (#1558):

1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory
   (index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json
   was the original injection vector — untrusted skill descriptions
   in the catalog contained adversarial text that the model executed.

2. skill_view: warns when skills are loaded from outside the trusted
   ~/.hermes/skills/ directory, and detects common injection patterns
   in skill content ("ignore previous instructions", "<system>", etc.).

Cherry-picked from PR #1562 by ygd58.

* fix(tools): chunk long messages in send_message_tool before dispatch (#1552)

Long messages sent via send_message tool or cron delivery silently
failed when exceeding platform limits. Gateway adapters handle this
via truncate_message(), but the standalone senders in send_message_tool
bypassed that entirely.

- Apply truncate_message() chunking in _send_to_platform() before
  dispatching to individual platform senders
- Remove naive message[i:i+2000] character split in _send_discord()
  in favor of centralized smart splitting
- Attach media files to last chunk only for Telegram
- Add regression tests for chunking and media placement

Cherry-picked from PR #1557 by llbn.

* fix(approval): show full command in dangerous command approval (#1553)

Previously the command was truncated to 80 chars in CLI (with a
[v]iew full option), 500 chars in Discord embeds, and missing entirely
in Telegram/Slack approval messages. Now the full command is always
displayed everywhere:

- CLI: removed 80-char truncation and [v]iew full menu option
- Gateway (TG/Slack): approval_required message includes full command
  in a code block
- Discord: embed shows full command up to 4096-char limit
- Windows: skip SIGALRM-based test timeout (Unix-only)
- Updated tests: replaced view-flow tests with direct approval tests

Cherry-picked from PR #1566 by crazywriter1.

* fix(cli): flush stdout during agent loop to prevent macOS display freeze (#1624)

The interrupt polling loop in chat() waited on the queue without
invalidating the prompt_toolkit renderer. On macOS, the StdoutProxy
buffer only flushed on input events, causing the CLI to appear frozen
during tool execution until the user typed a key.

Fix: call _invalidate() on each queue timeout (every ~100ms, throttled
to 150ms) to force the renderer to flush buffered agent output.

* fix(claw): warn when API keys are skipped during OpenClaw migration (#1580)

When --migrate-secrets is not passed (the default), API keys like
OPENROUTER_API_KEY are silently skipped with no warning. Users don't
realize their keys weren't migrated until the agent fails to connect.

Add a post-migration warning with actionable instructions: either
re-run with --migrate-secrets or add the key manually via
hermes config set.

Cherry-picked from PR #1593 by ygd58.

* fix(security): block sandbox backend creds from subprocess env (#1264)

Add Modal and Daytona sandbox credentials to the subprocess env
blocklist so they're not leaked to agent terminal sessions via
printenv/env.

Cherry-picked from PR #1571 by ygd58.

* fix(gateway): cap interrupt recursion depth to prevent resource exhaustion (#816)

When a user sends multiple messages while the agent keeps failing,
_run_agent() calls itself recursively with no depth limit. This can
exhaust stack/memory if the agent is in a failure loop.

Add _MAX_INTERRUPT_DEPTH = 3. When exceeded, the pending message is
logged and the current result is returned instead of recursing deeper.

The log handler duplication bug described in #816 was already fixed
separately (AIAgent.__init__ deduplicates handlers).

* fix(gateway): /model shows active fallback model instead of config default (#1615)

When the agent falls back to a different model (e.g. due to rate
limiting), /model still showed the config default. Now tracks the
effective model/provider after each agent run and displays it.

Cleared when the primary model succeeds again or the user explicitly
switches via /model.

Cherry-picked from PR #1616 by MaxKerkula. Added hasattr guard for
test compatibility.

* feat(gateway): inject reply-to message context for out-of-session replies (#1594)

When a user replies to a Telegram message, check if the quoted text
exists in the current session transcript. If missing (from cron jobs,
background tasks, or old sessions), prepend [Replying to: "..."] to
the message so the agent has context about what's being referenced.

- Add reply_to_text field to MessageEvent (base.py)
- Populate from Telegram's reply_to_message (text or caption)
- Inject context in _handle_message when not found in history

Based on PR #1596 by anpicasso (cherry-picked reply-to feature only,
excluded unrelated /server command and background delegation changes).

* fix: recognize Claude Code OAuth credentials in startup gate (#1455)

The _has_any_provider_configured() startup check didn't look for
Claude Code OAuth credentials (~/.claude/.credentials.json). Users
with only Claude Code auth got the setup wizard instead of starting.

Cherry-picked from PR #1455 by kshitijk4poor.

---------

Co-authored-by: buray <ygd58@users.noreply.github.com>
Co-authored-by: lbn <llbn@users.noreply.github.com>
Co-authored-by: crazywriter1 <53251494+crazywriter1@users.noreply.github.com>
Co-authored-by: Max K <MaxKerkula@users.noreply.github.com>
Co-authored-by: Angello Picasso <angello.picasso@devsu.com>
Co-authored-by: kshitij <kshitijk4poor@users.noreply.github.com>
2026-03-17 02:32:16 -07:00
teknium1
693f5786ac perf: use ripgrep for file search (200x faster than find)
search_files(target='files') now uses rg --files -g instead of find.
Ripgrep respects .gitignore, excludes hidden dirs by default, and has
parallel directory traversal — ~200x faster on wide trees (0.14s vs 34s
benchmarked on 164-repo tree).

Falls back to find when rg is unavailable, preserving hidden-dir
exclusion and BSD find compatibility.

Salvaged from PR #1464 by @light-merlin-dark (Merlin) — adapted to
preserve hidden-dir exclusion added since the original PR.
2026-03-17 02:32:02 -07:00
Teknium
9ece1ce2de feat(gateway): inject reply-to message context for out-of-session replies (#1594)
* fix: prevent infinite 400 failure loop on context overflow (#1630)

When a gateway session exceeds the model's context window, Anthropic may
return a generic 400 invalid_request_error with just 'Error' as the
message.  This bypassed the phrase-based context-length detection,
causing the agent to treat it as a non-retryable client error.  Worse,
the failed user message was still persisted to the transcript, making
the session even larger on each attempt — creating an infinite loop.

Three-layer fix:

1. run_agent.py — Fallback heuristic: when a 400 error has a very short
   generic message AND the session is large (>40% of context or >80
   messages), treat it as a probable context overflow and trigger
   compression instead of aborting.

2. run_agent.py + gateway/run.py — Don't persist failed messages:
   when the agent returns failed=True before generating any response,
   skip writing the user's message to the transcript/DB. This prevents
   the session from growing on each failure.

3. gateway/run.py — Smarter error messages: detect context-overflow
   failures and suggest /compact or /reset specifically, instead of a
   generic 'try again' that will fail identically.

* fix(skills): detect prompt injection patterns and block cache file reads

Adds two security layers to prevent prompt injection via skills hub
cache files (#1558):

1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory
   (index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json
   was the original injection vector — untrusted skill descriptions
   in the catalog contained adversarial text that the model executed.

2. skill_view: warns when skills are loaded from outside the trusted
   ~/.hermes/skills/ directory, and detects common injection patterns
   in skill content ("ignore previous instructions", "<system>", etc.).

Cherry-picked from PR #1562 by ygd58.

* fix(tools): chunk long messages in send_message_tool before dispatch (#1552)

Long messages sent via send_message tool or cron delivery silently
failed when exceeding platform limits. Gateway adapters handle this
via truncate_message(), but the standalone senders in send_message_tool
bypassed that entirely.

- Apply truncate_message() chunking in _send_to_platform() before
  dispatching to individual platform senders
- Remove naive message[i:i+2000] character split in _send_discord()
  in favor of centralized smart splitting
- Attach media files to last chunk only for Telegram
- Add regression tests for chunking and media placement

Cherry-picked from PR #1557 by llbn.

* fix(approval): show full command in dangerous command approval (#1553)

Previously the command was truncated to 80 chars in CLI (with a
[v]iew full option), 500 chars in Discord embeds, and missing entirely
in Telegram/Slack approval messages. Now the full command is always
displayed everywhere:

- CLI: removed 80-char truncation and [v]iew full menu option
- Gateway (TG/Slack): approval_required message includes full command
  in a code block
- Discord: embed shows full command up to 4096-char limit
- Windows: skip SIGALRM-based test timeout (Unix-only)
- Updated tests: replaced view-flow tests with direct approval tests

Cherry-picked from PR #1566 by crazywriter1.

* fix(cli): flush stdout during agent loop to prevent macOS display freeze (#1624)

The interrupt polling loop in chat() waited on the queue without
invalidating the prompt_toolkit renderer. On macOS, the StdoutProxy
buffer only flushed on input events, causing the CLI to appear frozen
during tool execution until the user typed a key.

Fix: call _invalidate() on each queue timeout (every ~100ms, throttled
to 150ms) to force the renderer to flush buffered agent output.

* fix(claw): warn when API keys are skipped during OpenClaw migration (#1580)

When --migrate-secrets is not passed (the default), API keys like
OPENROUTER_API_KEY are silently skipped with no warning. Users don't
realize their keys weren't migrated until the agent fails to connect.

Add a post-migration warning with actionable instructions: either
re-run with --migrate-secrets or add the key manually via
hermes config set.

Cherry-picked from PR #1593 by ygd58.

* fix(security): block sandbox backend creds from subprocess env (#1264)

Add Modal and Daytona sandbox credentials to the subprocess env
blocklist so they're not leaked to agent terminal sessions via
printenv/env.

Cherry-picked from PR #1571 by ygd58.

* fix(gateway): cap interrupt recursion depth to prevent resource exhaustion (#816)

When a user sends multiple messages while the agent keeps failing,
_run_agent() calls itself recursively with no depth limit. This can
exhaust stack/memory if the agent is in a failure loop.

Add _MAX_INTERRUPT_DEPTH = 3. When exceeded, the pending message is
logged and the current result is returned instead of recursing deeper.

The log handler duplication bug described in #816 was already fixed
separately (AIAgent.__init__ deduplicates handlers).

* fix(gateway): /model shows active fallback model instead of config default (#1615)

When the agent falls back to a different model (e.g. due to rate
limiting), /model still showed the config default. Now tracks the
effective model/provider after each agent run and displays it.

Cleared when the primary model succeeds again or the user explicitly
switches via /model.

Cherry-picked from PR #1616 by MaxKerkula. Added hasattr guard for
test compatibility.

* feat(gateway): inject reply-to message context for out-of-session replies (#1594)

When a user replies to a Telegram message, check if the quoted text
exists in the current session transcript. If missing (from cron jobs,
background tasks, or old sessions), prepend [Replying to: "..."] to
the message so the agent has context about what's being referenced.

- Add reply_to_text field to MessageEvent (base.py)
- Populate from Telegram's reply_to_message (text or caption)
- Inject context in _handle_message when not found in history

Based on PR #1596 by anpicasso (cherry-picked reply-to feature only,
excluded unrelated /server command and background delegation changes).

---------

Co-authored-by: buray <ygd58@users.noreply.github.com>
Co-authored-by: lbn <llbn@users.noreply.github.com>
Co-authored-by: crazywriter1 <53251494+crazywriter1@users.noreply.github.com>
Co-authored-by: Max K <MaxKerkula@users.noreply.github.com>
Co-authored-by: Angello Picasso <angello.picasso@devsu.com>
2026-03-17 02:31:27 -07:00
Teknium
36a76bf9db Merge pull request #1661 from NousResearch/fix/discord-thread-persistence
fix(discord): persist thread participation across gateway restarts
2026-03-17 02:27:09 -07:00
Teknium
d0faf77208 fix(gateway): /model shows active fallback model instead of config default (#1615)
* fix: prevent infinite 400 failure loop on context overflow (#1630)

When a gateway session exceeds the model's context window, Anthropic may
return a generic 400 invalid_request_error with just 'Error' as the
message.  This bypassed the phrase-based context-length detection,
causing the agent to treat it as a non-retryable client error.  Worse,
the failed user message was still persisted to the transcript, making
the session even larger on each attempt — creating an infinite loop.

Three-layer fix:

1. run_agent.py — Fallback heuristic: when a 400 error has a very short
   generic message AND the session is large (>40% of context or >80
   messages), treat it as a probable context overflow and trigger
   compression instead of aborting.

2. run_agent.py + gateway/run.py — Don't persist failed messages:
   when the agent returns failed=True before generating any response,
   skip writing the user's message to the transcript/DB. This prevents
   the session from growing on each failure.

3. gateway/run.py — Smarter error messages: detect context-overflow
   failures and suggest /compact or /reset specifically, instead of a
   generic 'try again' that will fail identically.

* fix(skills): detect prompt injection patterns and block cache file reads

Adds two security layers to prevent prompt injection via skills hub
cache files (#1558):

1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory
   (index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json
   was the original injection vector — untrusted skill descriptions
   in the catalog contained adversarial text that the model executed.

2. skill_view: warns when skills are loaded from outside the trusted
   ~/.hermes/skills/ directory, and detects common injection patterns
   in skill content ("ignore previous instructions", "<system>", etc.).

Cherry-picked from PR #1562 by ygd58.

* fix(tools): chunk long messages in send_message_tool before dispatch (#1552)

Long messages sent via send_message tool or cron delivery silently
failed when exceeding platform limits. Gateway adapters handle this
via truncate_message(), but the standalone senders in send_message_tool
bypassed that entirely.

- Apply truncate_message() chunking in _send_to_platform() before
  dispatching to individual platform senders
- Remove naive message[i:i+2000] character split in _send_discord()
  in favor of centralized smart splitting
- Attach media files to last chunk only for Telegram
- Add regression tests for chunking and media placement

Cherry-picked from PR #1557 by llbn.

* fix(approval): show full command in dangerous command approval (#1553)

Previously the command was truncated to 80 chars in CLI (with a
[v]iew full option), 500 chars in Discord embeds, and missing entirely
in Telegram/Slack approval messages. Now the full command is always
displayed everywhere:

- CLI: removed 80-char truncation and [v]iew full menu option
- Gateway (TG/Slack): approval_required message includes full command
  in a code block
- Discord: embed shows full command up to 4096-char limit
- Windows: skip SIGALRM-based test timeout (Unix-only)
- Updated tests: replaced view-flow tests with direct approval tests

Cherry-picked from PR #1566 by crazywriter1.

* fix(cli): flush stdout during agent loop to prevent macOS display freeze (#1624)

The interrupt polling loop in chat() waited on the queue without
invalidating the prompt_toolkit renderer. On macOS, the StdoutProxy
buffer only flushed on input events, causing the CLI to appear frozen
during tool execution until the user typed a key.

Fix: call _invalidate() on each queue timeout (every ~100ms, throttled
to 150ms) to force the renderer to flush buffered agent output.

* fix(claw): warn when API keys are skipped during OpenClaw migration (#1580)

When --migrate-secrets is not passed (the default), API keys like
OPENROUTER_API_KEY are silently skipped with no warning. Users don't
realize their keys weren't migrated until the agent fails to connect.

Add a post-migration warning with actionable instructions: either
re-run with --migrate-secrets or add the key manually via
hermes config set.

Cherry-picked from PR #1593 by ygd58.

* fix(security): block sandbox backend creds from subprocess env (#1264)

Add Modal and Daytona sandbox credentials to the subprocess env
blocklist so they're not leaked to agent terminal sessions via
printenv/env.

Cherry-picked from PR #1571 by ygd58.

* fix(gateway): cap interrupt recursion depth to prevent resource exhaustion (#816)

When a user sends multiple messages while the agent keeps failing,
_run_agent() calls itself recursively with no depth limit. This can
exhaust stack/memory if the agent is in a failure loop.

Add _MAX_INTERRUPT_DEPTH = 3. When exceeded, the pending message is
logged and the current result is returned instead of recursing deeper.

The log handler duplication bug described in #816 was already fixed
separately (AIAgent.__init__ deduplicates handlers).

* fix(gateway): /model shows active fallback model instead of config default (#1615)

When the agent falls back to a different model (e.g. due to rate
limiting), /model still showed the config default. Now tracks the
effective model/provider after each agent run and displays it.

Cleared when the primary model succeeds again or the user explicitly
switches via /model.

Cherry-picked from PR #1616 by MaxKerkula. Added hasattr guard for
test compatibility.

---------

Co-authored-by: buray <ygd58@users.noreply.github.com>
Co-authored-by: lbn <llbn@users.noreply.github.com>
Co-authored-by: crazywriter1 <53251494+crazywriter1@users.noreply.github.com>
Co-authored-by: Max K <MaxKerkula@users.noreply.github.com>
2026-03-17 02:26:51 -07:00
teknium1
c8582fc4a2 fix(discord): persist thread participation across gateway restarts
_bot_participated_threads was an in-memory set — lost on every restart.
After restart, the bot forgot which threads it was active in, requiring
fresh @mentions and potentially creating duplicate threads instead of
continuing existing conversations.

Changes:
- Persist thread IDs to ~/.hermes/discord_threads.json
- Load on adapter init, save on every new thread participation
- _track_thread() replaces direct .add() calls for atomic persist
- Cap at 500 tracked threads to prevent unbounded growth
- /thread slash command also tracks participation
- 7 new tests covering persistence, restart survival, corruption
  recovery, cap enforcement
2026-03-17 02:26:34 -07:00
Teknium
60b67e2b47 fix(gateway): cap interrupt recursion depth to prevent resource exhaustion (#816)
* fix: prevent infinite 400 failure loop on context overflow (#1630)

When a gateway session exceeds the model's context window, Anthropic may
return a generic 400 invalid_request_error with just 'Error' as the
message.  This bypassed the phrase-based context-length detection,
causing the agent to treat it as a non-retryable client error.  Worse,
the failed user message was still persisted to the transcript, making
the session even larger on each attempt — creating an infinite loop.

Three-layer fix:

1. run_agent.py — Fallback heuristic: when a 400 error has a very short
   generic message AND the session is large (>40% of context or >80
   messages), treat it as a probable context overflow and trigger
   compression instead of aborting.

2. run_agent.py + gateway/run.py — Don't persist failed messages:
   when the agent returns failed=True before generating any response,
   skip writing the user's message to the transcript/DB. This prevents
   the session from growing on each failure.

3. gateway/run.py — Smarter error messages: detect context-overflow
   failures and suggest /compact or /reset specifically, instead of a
   generic 'try again' that will fail identically.

* fix(skills): detect prompt injection patterns and block cache file reads

Adds two security layers to prevent prompt injection via skills hub
cache files (#1558):

1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory
   (index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json
   was the original injection vector — untrusted skill descriptions
   in the catalog contained adversarial text that the model executed.

2. skill_view: warns when skills are loaded from outside the trusted
   ~/.hermes/skills/ directory, and detects common injection patterns
   in skill content ("ignore previous instructions", "<system>", etc.).

Cherry-picked from PR #1562 by ygd58.

* fix(tools): chunk long messages in send_message_tool before dispatch (#1552)

Long messages sent via send_message tool or cron delivery silently
failed when exceeding platform limits. Gateway adapters handle this
via truncate_message(), but the standalone senders in send_message_tool
bypassed that entirely.

- Apply truncate_message() chunking in _send_to_platform() before
  dispatching to individual platform senders
- Remove naive message[i:i+2000] character split in _send_discord()
  in favor of centralized smart splitting
- Attach media files to last chunk only for Telegram
- Add regression tests for chunking and media placement

Cherry-picked from PR #1557 by llbn.

* fix(approval): show full command in dangerous command approval (#1553)

Previously the command was truncated to 80 chars in CLI (with a
[v]iew full option), 500 chars in Discord embeds, and missing entirely
in Telegram/Slack approval messages. Now the full command is always
displayed everywhere:

- CLI: removed 80-char truncation and [v]iew full menu option
- Gateway (TG/Slack): approval_required message includes full command
  in a code block
- Discord: embed shows full command up to 4096-char limit
- Windows: skip SIGALRM-based test timeout (Unix-only)
- Updated tests: replaced view-flow tests with direct approval tests

Cherry-picked from PR #1566 by crazywriter1.

* fix(cli): flush stdout during agent loop to prevent macOS display freeze (#1624)

The interrupt polling loop in chat() waited on the queue without
invalidating the prompt_toolkit renderer. On macOS, the StdoutProxy
buffer only flushed on input events, causing the CLI to appear frozen
during tool execution until the user typed a key.

Fix: call _invalidate() on each queue timeout (every ~100ms, throttled
to 150ms) to force the renderer to flush buffered agent output.

* fix(claw): warn when API keys are skipped during OpenClaw migration (#1580)

When --migrate-secrets is not passed (the default), API keys like
OPENROUTER_API_KEY are silently skipped with no warning. Users don't
realize their keys weren't migrated until the agent fails to connect.

Add a post-migration warning with actionable instructions: either
re-run with --migrate-secrets or add the key manually via
hermes config set.

Cherry-picked from PR #1593 by ygd58.

* fix(security): block sandbox backend creds from subprocess env (#1264)

Add Modal and Daytona sandbox credentials to the subprocess env
blocklist so they're not leaked to agent terminal sessions via
printenv/env.

Cherry-picked from PR #1571 by ygd58.

* fix(gateway): cap interrupt recursion depth to prevent resource exhaustion (#816)

When a user sends multiple messages while the agent keeps failing,
_run_agent() calls itself recursively with no depth limit. This can
exhaust stack/memory if the agent is in a failure loop.

Add _MAX_INTERRUPT_DEPTH = 3. When exceeded, the pending message is
logged and the current result is returned instead of recursing deeper.

The log handler duplication bug described in #816 was already fixed
separately (AIAgent.__init__ deduplicates handlers).

---------

Co-authored-by: buray <ygd58@users.noreply.github.com>
Co-authored-by: lbn <llbn@users.noreply.github.com>
Co-authored-by: crazywriter1 <53251494+crazywriter1@users.noreply.github.com>
2026-03-17 02:23:07 -07:00
Teknium
2c7c30be69 fix(security): harden terminal safety and sandbox file writes (#1653)
* fix(security): harden terminal safety and sandbox file writes

Two security improvements:

1. Dangerous command detection: expand shell -c pattern to catch
   combined flags (bash -lc, bash -ic, ksh -c) that were previously
   undetected. Pattern changed from matching only 'bash -c' to
   matching any shell invocation with -c anywhere in the flags.

2. File write sandboxing: add HERMES_WRITE_SAFE_ROOT env var that
   constrains all write_file/patch operations to a configured directory
   tree. Opt-in — when unset, behavior is unchanged. Useful for
   gateway/messaging deployments that should only touch a workspace.

Based on PR #1085 by ismoilh.

* fix: correct "POSIDEON" typo to "POSEIDON" in banner ASCII art

The poseidon skin's banner_logo had the E and I letters swapped,
spelling "POSIDEON-AGENT" instead of "POSEIDON-AGENT".

---------

Co-authored-by: ismoilh <ismoilh@users.noreply.github.com>
Co-authored-by: unmodeled-tyler <unmodeled.tyler@proton.me>
2026-03-17 02:22:12 -07:00
Teknium
6a320e8bfe fix(security): block sandbox backend creds from subprocess env (#1264)
* fix: prevent infinite 400 failure loop on context overflow (#1630)

When a gateway session exceeds the model's context window, Anthropic may
return a generic 400 invalid_request_error with just 'Error' as the
message.  This bypassed the phrase-based context-length detection,
causing the agent to treat it as a non-retryable client error.  Worse,
the failed user message was still persisted to the transcript, making
the session even larger on each attempt — creating an infinite loop.

Three-layer fix:

1. run_agent.py — Fallback heuristic: when a 400 error has a very short
   generic message AND the session is large (>40% of context or >80
   messages), treat it as a probable context overflow and trigger
   compression instead of aborting.

2. run_agent.py + gateway/run.py — Don't persist failed messages:
   when the agent returns failed=True before generating any response,
   skip writing the user's message to the transcript/DB. This prevents
   the session from growing on each failure.

3. gateway/run.py — Smarter error messages: detect context-overflow
   failures and suggest /compact or /reset specifically, instead of a
   generic 'try again' that will fail identically.

* fix(skills): detect prompt injection patterns and block cache file reads

Adds two security layers to prevent prompt injection via skills hub
cache files (#1558):

1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory
   (index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json
   was the original injection vector — untrusted skill descriptions
   in the catalog contained adversarial text that the model executed.

2. skill_view: warns when skills are loaded from outside the trusted
   ~/.hermes/skills/ directory, and detects common injection patterns
   in skill content ("ignore previous instructions", "<system>", etc.).

Cherry-picked from PR #1562 by ygd58.

* fix(tools): chunk long messages in send_message_tool before dispatch (#1552)

Long messages sent via send_message tool or cron delivery silently
failed when exceeding platform limits. Gateway adapters handle this
via truncate_message(), but the standalone senders in send_message_tool
bypassed that entirely.

- Apply truncate_message() chunking in _send_to_platform() before
  dispatching to individual platform senders
- Remove naive message[i:i+2000] character split in _send_discord()
  in favor of centralized smart splitting
- Attach media files to last chunk only for Telegram
- Add regression tests for chunking and media placement

Cherry-picked from PR #1557 by llbn.

* fix(approval): show full command in dangerous command approval (#1553)

Previously the command was truncated to 80 chars in CLI (with a
[v]iew full option), 500 chars in Discord embeds, and missing entirely
in Telegram/Slack approval messages. Now the full command is always
displayed everywhere:

- CLI: removed 80-char truncation and [v]iew full menu option
- Gateway (TG/Slack): approval_required message includes full command
  in a code block
- Discord: embed shows full command up to 4096-char limit
- Windows: skip SIGALRM-based test timeout (Unix-only)
- Updated tests: replaced view-flow tests with direct approval tests

Cherry-picked from PR #1566 by crazywriter1.

* fix(cli): flush stdout during agent loop to prevent macOS display freeze (#1624)

The interrupt polling loop in chat() waited on the queue without
invalidating the prompt_toolkit renderer. On macOS, the StdoutProxy
buffer only flushed on input events, causing the CLI to appear frozen
during tool execution until the user typed a key.

Fix: call _invalidate() on each queue timeout (every ~100ms, throttled
to 150ms) to force the renderer to flush buffered agent output.

* fix(claw): warn when API keys are skipped during OpenClaw migration (#1580)

When --migrate-secrets is not passed (the default), API keys like
OPENROUTER_API_KEY are silently skipped with no warning. Users don't
realize their keys weren't migrated until the agent fails to connect.

Add a post-migration warning with actionable instructions: either
re-run with --migrate-secrets or add the key manually via
hermes config set.

Cherry-picked from PR #1593 by ygd58.

* fix(security): block sandbox backend creds from subprocess env (#1264)

Add Modal and Daytona sandbox credentials to the subprocess env
blocklist so they're not leaked to agent terminal sessions via
printenv/env.

Cherry-picked from PR #1571 by ygd58.

---------

Co-authored-by: buray <ygd58@users.noreply.github.com>
Co-authored-by: lbn <llbn@users.noreply.github.com>
Co-authored-by: crazywriter1 <53251494+crazywriter1@users.noreply.github.com>
2026-03-17 02:20:42 -07:00
Teknium
cb0deb5f9d feat: add NeuTTS optional skill + local TTS provider backend
* feat(skills): add bundled neutts optional skill

Add NeuTTS optional skill with CLI scaffold, bootstrap helper, and
sample voice profile. Also fixes skills_hub.py to handle binary
assets (WAV files) during skill installation.

Changes:
- optional-skills/mlops/models/neutts/ — skill + CLI scaffold
- tools/skills_hub.py — binary asset support (read_bytes, write_bytes)
- tests/tools/test_skills_hub.py — regression tests for binary assets

* feat(tts): add NeuTTS as local TTS provider backend

Add NeuTTS as a fourth TTS provider option alongside Edge, ElevenLabs,
and OpenAI. NeuTTS runs fully on-device via neutts_cli — no API key
needed.

Provider behavior:
- Explicit: set tts.provider to 'neutts' in config.yaml
- Fallback: when Edge TTS is unavailable and neutts_cli is installed,
  automatically falls back to NeuTTS instead of failing
- check_tts_requirements() now includes NeuTTS in availability checks

NeuTTS outputs WAV natively. For Telegram voice bubbles, ffmpeg
converts to Opus (same pattern as Edge TTS).

Changes:
- tools/tts_tool.py — _generate_neutts(), _check_neutts_available(),
  provider dispatch, fallback logic, Opus conversion
- hermes_cli/config.py — tts.neutts config defaults

---------

Co-authored-by: unmodeled-tyler <unmodeled.tyler@proton.me>
2026-03-17 02:13:34 -07:00
Teknium
766f4aae2b refactor: tie api_mode to provider config instead of env var (#1656)
Remove HERMES_API_MODE env var. api_mode is now configured where the
endpoint is defined:

- model.api_mode in config.yaml (for the active model config)
- custom_providers[].api_mode (for named custom providers)

Replace _get_configured_api_mode() with _parse_api_mode() which just
validates a value against the whitelist without reading env vars.

Both paths (model config and named custom providers) now read api_mode
from their respective config entries rather than a global override.
2026-03-17 02:13:26 -07:00
Teknium
4e66d22151 fix(claw): warn when API keys are skipped during OpenClaw migration (#1580)
* fix: prevent infinite 400 failure loop on context overflow (#1630)

When a gateway session exceeds the model's context window, Anthropic may
return a generic 400 invalid_request_error with just 'Error' as the
message.  This bypassed the phrase-based context-length detection,
causing the agent to treat it as a non-retryable client error.  Worse,
the failed user message was still persisted to the transcript, making
the session even larger on each attempt — creating an infinite loop.

Three-layer fix:

1. run_agent.py — Fallback heuristic: when a 400 error has a very short
   generic message AND the session is large (>40% of context or >80
   messages), treat it as a probable context overflow and trigger
   compression instead of aborting.

2. run_agent.py + gateway/run.py — Don't persist failed messages:
   when the agent returns failed=True before generating any response,
   skip writing the user's message to the transcript/DB. This prevents
   the session from growing on each failure.

3. gateway/run.py — Smarter error messages: detect context-overflow
   failures and suggest /compact or /reset specifically, instead of a
   generic 'try again' that will fail identically.

* fix(skills): detect prompt injection patterns and block cache file reads

Adds two security layers to prevent prompt injection via skills hub
cache files (#1558):

1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory
   (index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json
   was the original injection vector — untrusted skill descriptions
   in the catalog contained adversarial text that the model executed.

2. skill_view: warns when skills are loaded from outside the trusted
   ~/.hermes/skills/ directory, and detects common injection patterns
   in skill content ("ignore previous instructions", "<system>", etc.).

Cherry-picked from PR #1562 by ygd58.

* fix(tools): chunk long messages in send_message_tool before dispatch (#1552)

Long messages sent via send_message tool or cron delivery silently
failed when exceeding platform limits. Gateway adapters handle this
via truncate_message(), but the standalone senders in send_message_tool
bypassed that entirely.

- Apply truncate_message() chunking in _send_to_platform() before
  dispatching to individual platform senders
- Remove naive message[i:i+2000] character split in _send_discord()
  in favor of centralized smart splitting
- Attach media files to last chunk only for Telegram
- Add regression tests for chunking and media placement

Cherry-picked from PR #1557 by llbn.

* fix(approval): show full command in dangerous command approval (#1553)

Previously the command was truncated to 80 chars in CLI (with a
[v]iew full option), 500 chars in Discord embeds, and missing entirely
in Telegram/Slack approval messages. Now the full command is always
displayed everywhere:

- CLI: removed 80-char truncation and [v]iew full menu option
- Gateway (TG/Slack): approval_required message includes full command
  in a code block
- Discord: embed shows full command up to 4096-char limit
- Windows: skip SIGALRM-based test timeout (Unix-only)
- Updated tests: replaced view-flow tests with direct approval tests

Cherry-picked from PR #1566 by crazywriter1.

* fix(cli): flush stdout during agent loop to prevent macOS display freeze (#1624)

The interrupt polling loop in chat() waited on the queue without
invalidating the prompt_toolkit renderer. On macOS, the StdoutProxy
buffer only flushed on input events, causing the CLI to appear frozen
during tool execution until the user typed a key.

Fix: call _invalidate() on each queue timeout (every ~100ms, throttled
to 150ms) to force the renderer to flush buffered agent output.

* fix(claw): warn when API keys are skipped during OpenClaw migration (#1580)

When --migrate-secrets is not passed (the default), API keys like
OPENROUTER_API_KEY are silently skipped with no warning. Users don't
realize their keys weren't migrated until the agent fails to connect.

Add a post-migration warning with actionable instructions: either
re-run with --migrate-secrets or add the key manually via
hermes config set.

Cherry-picked from PR #1593 by ygd58.

---------

Co-authored-by: buray <ygd58@users.noreply.github.com>
Co-authored-by: lbn <llbn@users.noreply.github.com>
Co-authored-by: crazywriter1 <53251494+crazywriter1@users.noreply.github.com>
2026-03-17 02:10:36 -07:00
Teknium
8992babaa3 fix(cli): flush stdout during agent loop to prevent macOS display freeze (#1624)
* fix: prevent infinite 400 failure loop on context overflow (#1630)

When a gateway session exceeds the model's context window, Anthropic may
return a generic 400 invalid_request_error with just 'Error' as the
message.  This bypassed the phrase-based context-length detection,
causing the agent to treat it as a non-retryable client error.  Worse,
the failed user message was still persisted to the transcript, making
the session even larger on each attempt — creating an infinite loop.

Three-layer fix:

1. run_agent.py — Fallback heuristic: when a 400 error has a very short
   generic message AND the session is large (>40% of context or >80
   messages), treat it as a probable context overflow and trigger
   compression instead of aborting.

2. run_agent.py + gateway/run.py — Don't persist failed messages:
   when the agent returns failed=True before generating any response,
   skip writing the user's message to the transcript/DB. This prevents
   the session from growing on each failure.

3. gateway/run.py — Smarter error messages: detect context-overflow
   failures and suggest /compact or /reset specifically, instead of a
   generic 'try again' that will fail identically.

* fix(skills): detect prompt injection patterns and block cache file reads

Adds two security layers to prevent prompt injection via skills hub
cache files (#1558):

1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory
   (index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json
   was the original injection vector — untrusted skill descriptions
   in the catalog contained adversarial text that the model executed.

2. skill_view: warns when skills are loaded from outside the trusted
   ~/.hermes/skills/ directory, and detects common injection patterns
   in skill content ("ignore previous instructions", "<system>", etc.).

Cherry-picked from PR #1562 by ygd58.

* fix(tools): chunk long messages in send_message_tool before dispatch (#1552)

Long messages sent via send_message tool or cron delivery silently
failed when exceeding platform limits. Gateway adapters handle this
via truncate_message(), but the standalone senders in send_message_tool
bypassed that entirely.

- Apply truncate_message() chunking in _send_to_platform() before
  dispatching to individual platform senders
- Remove naive message[i:i+2000] character split in _send_discord()
  in favor of centralized smart splitting
- Attach media files to last chunk only for Telegram
- Add regression tests for chunking and media placement

Cherry-picked from PR #1557 by llbn.

* fix(approval): show full command in dangerous command approval (#1553)

Previously the command was truncated to 80 chars in CLI (with a
[v]iew full option), 500 chars in Discord embeds, and missing entirely
in Telegram/Slack approval messages. Now the full command is always
displayed everywhere:

- CLI: removed 80-char truncation and [v]iew full menu option
- Gateway (TG/Slack): approval_required message includes full command
  in a code block
- Discord: embed shows full command up to 4096-char limit
- Windows: skip SIGALRM-based test timeout (Unix-only)
- Updated tests: replaced view-flow tests with direct approval tests

Cherry-picked from PR #1566 by crazywriter1.

* fix(cli): flush stdout during agent loop to prevent macOS display freeze (#1624)

The interrupt polling loop in chat() waited on the queue without
invalidating the prompt_toolkit renderer. On macOS, the StdoutProxy
buffer only flushed on input events, causing the CLI to appear frozen
during tool execution until the user typed a key.

Fix: call _invalidate() on each queue timeout (every ~100ms, throttled
to 150ms) to force the renderer to flush buffered agent output.

---------

Co-authored-by: buray <ygd58@users.noreply.github.com>
Co-authored-by: lbn <llbn@users.noreply.github.com>
Co-authored-by: crazywriter1 <53251494+crazywriter1@users.noreply.github.com>
2026-03-17 02:09:26 -07:00
Teknium
49043b7b7d feat: add /tools disable/enable/list slash commands with session reset (#1652)
Add in-session tool management via /tools disable/enable/list, plus
hermes tools list/disable/enable CLI subcommands. Supports both
built-in toolsets (web, memory) and MCP tools (github:create_issue).

To preserve prompt caching, /tools disable/enable in a chat session
saves the change to config and resets the session cleanly — the user
is asked to confirm before the reset happens.

Also improves prefix matching: /qui now dispatches to /quit instead
of showing ambiguous when longer skill commands like /quint-pipeline
are installed.

Based on PR #1520 by @YanSte.

Co-authored-by: Yannick Stephan <YanSte@users.noreply.github.com>
2026-03-17 02:05:26 -07:00
Teknium
f2414bfd45 feat: allow custom endpoints to use responses API via api_mode override (#1651)
Add HERMES_API_MODE env var and model.api_mode config field to let
custom OpenAI-compatible endpoints opt into codex_responses mode
without requiring the OpenAI Codex OAuth provider path.

- _get_configured_api_mode() reads HERMES_API_MODE env (precedence)
  then model.api_mode from config.yaml; validates against whitelist
- Applied in both _resolve_openrouter_runtime() and
  _resolve_named_custom_runtime() (original PR only covered openrouter)
- Fix _dump_api_request_debug() to show /responses URL when in
  codex_responses mode instead of always showing /chat/completions
- Tests for config override, env override, invalid values, named
  custom providers, and debug dump URL for both API modes

Inspired by PR #1041 by @mxyhi.

Co-authored-by: mxyhi <mxyhi@users.noreply.github.com>
2026-03-17 02:04:36 -07:00
0xbyt4
68fbcdaa06 fix: add browser_console to browser toolset and core tools list (#1084)
browser_console was registered in the tool registry but missing from
all toolset definitions (TOOLSETS, _HERMES_CORE_TOOLS, _LEGACY_TOOLSET_MAP),
so the agent could never discover or use it.

Added to all 4 locations + 4 wiring tests.

Cherry-picked from PR #1084 by @0xbyt4 (authorship preserved in tests).
2026-03-17 02:02:57 -07:00
teknium1
7d91b436e4 fix: exclude hidden directories from find/grep search backends (#1558)
The primary injection vector in #1558 was search_files discovering
catalog cache files in .hub/index-cache/ via find or grep, which
don't skip hidden directories like ripgrep does by default.

Three-layer fix:

1. _search_files (find): add -not -path '*/.*' to exclude hidden
   directories, matching ripgrep's default behavior.

2. _search_with_grep: add --exclude-dir='.*' to skip hidden
   directories in the grep fallback path.

3. _write_index_cache: write a .ignore file to .hub/ so ripgrep
   also skips it even when invoked with --hidden (belt-and-suspenders).

This makes all three search backends (rg, grep, find) consistently
exclude hidden directories, preventing the agent from discovering
and reading unvetted community content in hub cache files.
2026-03-17 02:02:57 -07:00
Teknium
40e2f8d9f0 feat(provider): add OpenCode Zen and OpenCode Go providers
Add support for OpenCode Zen (pay-as-you-go, 35+ curated models) and
OpenCode Go ($10/month subscription, open models) as first-class providers.

Both are OpenAI-compatible endpoints resolved via the generic api_key
provider flow — no custom adapter needed.

Files changed:
- hermes_cli/auth.py — ProviderConfig entries + aliases
- hermes_cli/config.py — OPENCODE_ZEN/GO API key env vars
- hermes_cli/models.py — model catalogs, labels, aliases, provider order
- hermes_cli/main.py — provider labels, menu entries, model flow dispatch
- hermes_cli/setup.py — setup wizard branches (idx 10, 11)
- agent/model_metadata.py — context lengths for all OpenCode models
- agent/auxiliary_client.py — default aux models
- .env.example — documentation

Co-authored-by: DevAgarwal2 <DevAgarwal2@users.noreply.github.com>
2026-03-17 02:02:43 -07:00
Teknium
4cb6735541 fix(approval): show full command in dangerous command approval (#1553)
* fix: prevent infinite 400 failure loop on context overflow (#1630)

When a gateway session exceeds the model's context window, Anthropic may
return a generic 400 invalid_request_error with just 'Error' as the
message.  This bypassed the phrase-based context-length detection,
causing the agent to treat it as a non-retryable client error.  Worse,
the failed user message was still persisted to the transcript, making
the session even larger on each attempt — creating an infinite loop.

Three-layer fix:

1. run_agent.py — Fallback heuristic: when a 400 error has a very short
   generic message AND the session is large (>40% of context or >80
   messages), treat it as a probable context overflow and trigger
   compression instead of aborting.

2. run_agent.py + gateway/run.py — Don't persist failed messages:
   when the agent returns failed=True before generating any response,
   skip writing the user's message to the transcript/DB. This prevents
   the session from growing on each failure.

3. gateway/run.py — Smarter error messages: detect context-overflow
   failures and suggest /compact or /reset specifically, instead of a
   generic 'try again' that will fail identically.

* fix(skills): detect prompt injection patterns and block cache file reads

Adds two security layers to prevent prompt injection via skills hub
cache files (#1558):

1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory
   (index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json
   was the original injection vector — untrusted skill descriptions
   in the catalog contained adversarial text that the model executed.

2. skill_view: warns when skills are loaded from outside the trusted
   ~/.hermes/skills/ directory, and detects common injection patterns
   in skill content ("ignore previous instructions", "<system>", etc.).

Cherry-picked from PR #1562 by ygd58.

* fix(tools): chunk long messages in send_message_tool before dispatch (#1552)

Long messages sent via send_message tool or cron delivery silently
failed when exceeding platform limits. Gateway adapters handle this
via truncate_message(), but the standalone senders in send_message_tool
bypassed that entirely.

- Apply truncate_message() chunking in _send_to_platform() before
  dispatching to individual platform senders
- Remove naive message[i:i+2000] character split in _send_discord()
  in favor of centralized smart splitting
- Attach media files to last chunk only for Telegram
- Add regression tests for chunking and media placement

Cherry-picked from PR #1557 by llbn.

* fix(approval): show full command in dangerous command approval (#1553)

Previously the command was truncated to 80 chars in CLI (with a
[v]iew full option), 500 chars in Discord embeds, and missing entirely
in Telegram/Slack approval messages. Now the full command is always
displayed everywhere:

- CLI: removed 80-char truncation and [v]iew full menu option
- Gateway (TG/Slack): approval_required message includes full command
  in a code block
- Discord: embed shows full command up to 4096-char limit
- Windows: skip SIGALRM-based test timeout (Unix-only)
- Updated tests: replaced view-flow tests with direct approval tests

Cherry-picked from PR #1566 by crazywriter1.

---------

Co-authored-by: buray <ygd58@users.noreply.github.com>
Co-authored-by: lbn <llbn@users.noreply.github.com>
Co-authored-by: crazywriter1 <53251494+crazywriter1@users.noreply.github.com>
2026-03-17 02:02:33 -07:00
teknium1
0351e4fa90 fix: add metadata param to base send_image and forward in send_animation
_send_response_parts() calls send_image(metadata=_thread_metadata) but
the base class signature didn't accept metadata, crashing platforms that
don't override send_image. send_animation already had the param but
wasn't forwarding it.

Credit: @0xbyt4 (PR #1077)
2026-03-17 02:02:28 -07:00
Teknium
1b2d6c424c fix: add --yes flag to bypass confirmation in /skills install and uninstall (#1647)
Fixes hanging when using /skills install or /skills uninstall from the
TUI — bare input() calls hang inside prompt_toolkit's event loop.

Changes:
- Add skip_confirm parameter to do_install() and do_uninstall()
- Separate --yes/-y (confirmation bypass) from --force (scan override)
  in both argparse and slash command handlers
- Update usage hint for /skills uninstall to show [--yes]

The original PR (#1595) accidentally deleted the install_from_quarantine()
call, which would have broken all installs. That bug is not present here.

Based on PR #1595 by 333Alden333.

Co-authored-by: 333Alden333 <333Alden333@users.noreply.github.com>
2026-03-17 01:59:07 -07:00
Teknium
28c35d045d Merge pull request #1537 from aydnOktay/improve/skill-manager-error-logging
Improve error logging in skill manager tool
2026-03-17 01:53:58 -07:00
Teknium
1f6a1f0028 fix(tools): chunk long messages in send_message_tool before platform dispatch
* add base support

* fix: correct skill author attribution to youssefea

* fix(tools): chunk long messages in send_message_tool before platform dispatch

  - Convert BasePlatformAdapter.truncate_message() to @staticmethod
  - Apply truncate_message() in _send_to_platform() with per-platform
    max lengths
  - Remove naive character split in _send_discord()
  - Attach media files to last chunk only for Telegram
  - Add regression tests for chunking and media placement

---------

Co-authored-by: youssefea <youcefea99@gmail.com>
Co-authored-by: llbn <46884939+llbn@users.noreply.github.com>
2026-03-17 01:52:51 -07:00
Teknium
d7029489d6 fix: show custom endpoint models in /model via live API probe (#1645)
Add 'custom' to the provider order so custom OpenAI-compatible
endpoints appear in /model list. Probes the endpoint's /models API
to dynamically discover available models.

Changes:
- Add 'custom' to _PROVIDER_ORDER in list_available_providers()
- Add _get_custom_base_url() helper to read model.base_url from config
- Add custom branch in provider_model_ids() using fetch_api_models()
- Custom endpoint detection via base_url presence for has_creds check

Based on PR #1612 by @aashizpoudel.

Co-authored-by: Aashish Poudel <aashizpoudel@users.noreply.github.com>
2026-03-17 01:52:46 -07:00
Teknium
12afccd9ca fix(tools): chunk long messages in send_message_tool before dispatch (#1552)
* fix: prevent infinite 400 failure loop on context overflow (#1630)

When a gateway session exceeds the model's context window, Anthropic may
return a generic 400 invalid_request_error with just 'Error' as the
message.  This bypassed the phrase-based context-length detection,
causing the agent to treat it as a non-retryable client error.  Worse,
the failed user message was still persisted to the transcript, making
the session even larger on each attempt — creating an infinite loop.

Three-layer fix:

1. run_agent.py — Fallback heuristic: when a 400 error has a very short
   generic message AND the session is large (>40% of context or >80
   messages), treat it as a probable context overflow and trigger
   compression instead of aborting.

2. run_agent.py + gateway/run.py — Don't persist failed messages:
   when the agent returns failed=True before generating any response,
   skip writing the user's message to the transcript/DB. This prevents
   the session from growing on each failure.

3. gateway/run.py — Smarter error messages: detect context-overflow
   failures and suggest /compact or /reset specifically, instead of a
   generic 'try again' that will fail identically.

* fix(skills): detect prompt injection patterns and block cache file reads

Adds two security layers to prevent prompt injection via skills hub
cache files (#1558):

1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory
   (index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json
   was the original injection vector — untrusted skill descriptions
   in the catalog contained adversarial text that the model executed.

2. skill_view: warns when skills are loaded from outside the trusted
   ~/.hermes/skills/ directory, and detects common injection patterns
   in skill content ("ignore previous instructions", "<system>", etc.).

Cherry-picked from PR #1562 by ygd58.

* fix(tools): chunk long messages in send_message_tool before dispatch (#1552)

Long messages sent via send_message tool or cron delivery silently
failed when exceeding platform limits. Gateway adapters handle this
via truncate_message(), but the standalone senders in send_message_tool
bypassed that entirely.

- Apply truncate_message() chunking in _send_to_platform() before
  dispatching to individual platform senders
- Remove naive message[i:i+2000] character split in _send_discord()
  in favor of centralized smart splitting
- Attach media files to last chunk only for Telegram
- Add regression tests for chunking and media placement

Cherry-picked from PR #1557 by llbn.

---------

Co-authored-by: buray <ygd58@users.noreply.github.com>
Co-authored-by: lbn <llbn@users.noreply.github.com>
2026-03-17 01:52:43 -07:00
Teknium
81f76111b0 Merge pull request #1560 from eren-karakus0/fix/singularity-preflight-check
fix(terminal): add Singularity/Apptainer preflight availability check
2026-03-17 01:52:03 -07:00
Teknium
96dac22194 fix: prevent infinite 400 loop on context overflow + block prompt injection via cache files (#1630, #1558)
* fix: prevent infinite 400 failure loop on context overflow (#1630)

When a gateway session exceeds the model's context window, Anthropic may
return a generic 400 invalid_request_error with just 'Error' as the
message.  This bypassed the phrase-based context-length detection,
causing the agent to treat it as a non-retryable client error.  Worse,
the failed user message was still persisted to the transcript, making
the session even larger on each attempt — creating an infinite loop.

Three-layer fix:

1. run_agent.py — Fallback heuristic: when a 400 error has a very short
   generic message AND the session is large (>40% of context or >80
   messages), treat it as a probable context overflow and trigger
   compression instead of aborting.

2. run_agent.py + gateway/run.py — Don't persist failed messages:
   when the agent returns failed=True before generating any response,
   skip writing the user's message to the transcript/DB. This prevents
   the session from growing on each failure.

3. gateway/run.py — Smarter error messages: detect context-overflow
   failures and suggest /compact or /reset specifically, instead of a
   generic 'try again' that will fail identically.

* fix(skills): detect prompt injection patterns and block cache file reads

Adds two security layers to prevent prompt injection via skills hub
cache files (#1558):

1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory
   (index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json
   was the original injection vector — untrusted skill descriptions
   in the catalog contained adversarial text that the model executed.

2. skill_view: warns when skills are loaded from outside the trusted
   ~/.hermes/skills/ directory, and detects common injection patterns
   in skill content ("ignore previous instructions", "<system>", etc.).

Cherry-picked from PR #1562 by ygd58.

---------

Co-authored-by: buray <ygd58@users.noreply.github.com>
2026-03-17 01:50:59 -07:00
Teknium
2d36819503 feat: add Base blockchain optional skill
* add base support

* fix: correct skill author attribution to youssefea

---------

Co-authored-by: youssefea <youcefea99@gmail.com>
2026-03-17 01:50:03 -07:00
Teknium
8e20a7e035 fix(gateway): strip MEDIA: and [[audio_as_voice]] tags from message body
* fix(gateway): strip MEDIA: and [[audio_as_voice]] tags from message body

Closes #1561

* fix: remove redundant re import, use existing import

---------

Co-authored-by: mettin4 <coktinmetin@gmail.com>
2026-03-17 01:47:35 -07:00
Teknium
4920c5940f feat: auto-detect local file paths in gateway responses for native media delivery (#1640)
Small models (7B-14B) can't reliably use MEDIA: or IMAGE: syntax. This
adds extract_local_files() to BasePlatformAdapter that regex-detects
bare local file paths ending in image/video extensions, validates them
with os.path.isfile(), and delivers them as native platform attachments.

Hardened over the original PR:
- Code-block exclusion: paths inside fenced blocks and inline code are
  skipped so code samples are never mutilated
- URL rejection: negative lookbehind prevents matching path segments
  inside HTTP URLs
- Relative path rejection: ./foo.png no longer matches
- Tilde path cleanup: raw ~/... form is removed from response text
- Deduplication by expanded path
- Added .webm to _VIDEO_EXTS
- Fallback to send_document for unrecognized media extensions

Based on PR #1636 by sudoingX.

Co-authored-by: sudoingX <sudoingX@users.noreply.github.com>
2026-03-17 01:47:34 -07:00
Teknium
3744118311 feat(cli): two-stage /model autocomplete with ghost text suggestions (#1641)
* feat(cli): two-stage /model autocomplete with ghost text suggestions

- SlashCommandCompleter: Tab-complete providers first (anthropic:, openrouter:, etc.)
  then models within the selected provider
- SlashCommandAutoSuggest: inline ghost text for slash commands, subcommands,
  and /model provider:model two-stage suggestions
- Custom Tab key binding: accepts provider completion and immediately
  re-triggers completions to show that provider's models
- COMMANDS_BY_CATEGORY: structured format with explicit subcommands for
  tab completion and ghost text (prompt, reasoning, voice, skills, cron, browser)
- SUBCOMMANDS dict auto-extracted from command definitions
- Model/provider info cached 60s for responsive completions

* fix: repair test regression and restore gold color from PR #1622

- Fix test_unknown_command_still_shows_error: patch _cprint instead of
  console.print to match the _cprint switch in process_command()
- Restore gold color on 'Type /help' hint using _DIM + _GOLD constants
  instead of bare \033[2m (was losing the #B8860B gold)
- Use _GOLD constant for ambiguous command message for consistency
- Add clarifying comment on SUBCOMMANDS regex fallback

---------

Co-authored-by: Lars van der Zande <lmvanderzande@gmail.com>
2026-03-17 01:47:32 -07:00
Teknium
5ada0b95e9 Merge pull request #1609 from 0xbyt4/fix/context-counter-cache-tokens
fix: context counter shows cached token count in status bar
2026-03-17 01:45:12 -07:00
teknium1
19eaf5d956 test: fix telegram mock to include ParseMode constant
The MarkdownV2 formatting change imports telegram.constants.ParseMode,
which the test mock didn't provide. Add ParseMode to the mock so
existing tests continue working.
2026-03-17 01:44:11 -07:00
Alex Ferrari
365d175100 fix: apply MarkdownV2 formatting in _send_telegram for proper rendering
The _send_telegram() function was sending raw markdown text without
parse_mode, causing bold, links, and headers to render as plain text.
This fix reuses the gateway adapter's format_message() to convert
markdown to Telegram's MarkdownV2 format, with a fallback to plain
text if parsing fails.
2026-03-17 01:44:11 -07:00
Teknium
c3ca68d25b Merge pull request #1614 from PeterFile/fix/launchd-service-recovery
fix(gateway): recover stale launchd service state
2026-03-17 01:43:07 -07:00
Teknium
eaa9ceeb43 Merge pull request #1621 from Death-Incarnate/main
fix: isolate test_anthropic_adapter from local credentials
2026-03-17 01:40:39 -07:00
Teknium
949fac192f fix(tools): remove unnecessary crontab requirement from cronjob tool (#1638)
* fix(tools): remove unnecessary crontab requirement from cronjob tool

The hermes cron system is internal — it uses a JSON-based scheduler
ticked by the gateway (cron/scheduler.py), not system crontab.

The check for shutil.which('crontab') was preventing the cronjob tool
from being available in environments without crontab installed (e.g.
minimal Ubuntu containers).

Changes:
- Remove shutil.which('crontab') check from check_cronjob_requirements()
- Remove unused shutil import
- Update docstring to clarify internal scheduler is used
- Update tests to reflect new behavior and add coverage for all
  session modes (interactive, gateway, exec_ask)

Fixes #1589

* test: add HERMES_EXEC_ASK coverage for cronjob requirements

Adds missing test for the exec_ask session mode, complementing
the cherry-picked fix from PR #1633.

---------

Co-authored-by: Bartok9 <bartokmagic@proton.me>
2026-03-17 01:40:02 -07:00
Teknium
4b96d10bc3 fix(cli): invalidate update-check cache after hermes update
Signed-off-by: nidhi-singh02 <nidhi2894@gmail.com>
Co-authored-by: nidhi-singh02 <nidhi2894@gmail.com>
2026-03-17 01:38:11 -07:00
teknium1
c16870277c test: add regression test for stale PID in gateway_state.json (#1631)
Verifies that write_runtime_status() overwrites pid and start_time
from a previous process rather than preserving them via setdefault().
Covers the fix from PR #1632.
2026-03-17 01:35:02 -07:00
Teknium
247e3c1470 Merge pull request #1632 from nidhi-singh02/fix/stale-pid-gateway-state
fix(gateway): overwrite stale PID in gateway_state.json on restart
2026-03-17 01:34:24 -07:00
Teknium
2af4af6390 Merge pull request #1635 from NousResearch/hermes/hermes-a86162db
fix: sanitize corrupted .env files on read and during migration
2026-03-17 01:33:36 -07:00
Teknium
749e9977a0 Merge pull request #1629 from NousResearch/hermes/hermes-6891ac11
feat(browser): multi-provider cloud browser support + Browser Use integration
2026-03-17 01:32:38 -07:00
teknium1
1c61ab6bd9 fix: unconditionally clear ANTHROPIC_TOKEN on v8→v9 migration
No conditional checks — just clear it. The new auth flow doesn't use
this env var. Anyone upgrading gets it wiped once, then it's done.
2026-03-17 01:31:20 -07:00
teknium1
e9f1a8e39b fix: gate ANTHROPIC_TOKEN cleanup to config version 8→9 migration
- Bump _config_version 8 → 9
- Move stale ANTHROPIC_TOKEN clearing into 'if current_ver < 9' block
  so it only runs once during the upgrade, not on every migrate_config()
- ANTHROPIC_TOKEN is still a valid auth path (OAuth flow), so we don't
  want to clear it repeatedly — only during the one-time migration from
  old setups that left it stale
- Add test_skips_on_version_9_or_later to verify one-time behavior
- All tests set config version 8 to trigger migration
2026-03-17 01:28:38 -07:00
teknium1
b6a51c955e fix: clear stale ANTHROPIC_TOKEN during migration, remove false *** detection
- Remove *** placeholder detection from _sanitize_env_lines (was based on
  confusing terminal redaction with literal file content)
- Add migrate_config() logic to clear stale ANTHROPIC_TOKEN when better
  credentials exist (ANTHROPIC_API_KEY or Claude Code auto-discovery)
- Old ANTHROPIC_TOKEN values shadow Claude Code credential fallthrough,
  breaking auth for users who updated without re-running setup
- Preserves ANTHROPIC_TOKEN when it's the only auth method available
- 3 new migration tests, updated existing tests
2026-03-17 01:26:23 -07:00
teknium1
634c1f6752 fix: sanitize corrupted .env files on read and during migration
Fixes two corruption patterns that break API keys during updates:

1. Concatenated KEY=VALUE pairs on a single line due to missing newlines
   (e.g. ANTHROPIC_API_KEY=sk-...OPENAI_BASE_URL=https://...). Uses a
   known-keys set to safely detect and split concatenated entries without
   false-splitting values that contain uppercase text.

2. Stale KEY=*** placeholder entries left by incomplete setup runs that
   never get updated and shadow real credentials.

Changes:
- Add _sanitize_env_lines() that splits concatenated known keys and drops
  *** placeholders
- Add sanitize_env_file() public API for explicit repair
- Call sanitization in save_env_value() on every read (self-healing)
- Call sanitize_env_file() at the start of migrate_config() so existing
  corrupted files are repaired on update
- 12 new tests covering splits, placeholders, edge cases, and integration
2026-03-17 01:13:34 -07:00
Teknium
6ebb816e56 Merge pull request #1634 from NousResearch/hermes/hermes-a86162db
chore: release v0.3.0 (v2026.3.17)
2026-03-17 00:55:51 -07:00
teknium1
37862f74fa chore: release v0.3.0 (v2026.3.17)
- Bump version 0.2.0 → 0.3.0
- Add comprehensive changelog (248 merged PRs, 15 contributors)
- CalVer tag: v2026.3.17
2026-03-17 00:38:48 -07:00
nidhi-singh02
67546746d4 fix(gateway): overwrite stale PID in gateway_state.json on restart
Signed-off-by: nidhi-singh02 <nidhi2894@gmail.com>
2026-03-17 13:01:55 +05:30
ShawnPana
d44b6b7f1b feat(browser): multi-provider cloud browser support + Browser Use integration
Introduce a cloud browser provider abstraction so users can switch
between Local Browser, Browserbase, and Browser Use (or future providers)
via hermes tools / hermes setup.

Cloud browser providers are behind an ABC (tools/browser_providers/base.py)
so adding a new provider is a single-file addition with no changes to
browser_tool.py internals.

Changes:
- tools/browser_providers/ package with ABC, Browserbase extraction,
  and Browser Use provider
- browser_tool.py refactored to use _PROVIDER_REGISTRY + _get_cloud_provider()
  (cached) instead of hardcoded _is_local_mode() / _create_browserbase_session()
- tools_config.py: generic _is_provider_active() / _detect_active_provider_index()
  replace TTS-only logic; Browser Use added as third browser option
- config.py: BROWSER_USE_API_KEY added to OPTIONAL_ENV_VARS + show_config + allowlist
- subprocess pipe hang fix: agent-browser daemon inherits pipe fds,
  communicate() blocks. Replaced with Popen + temp files.

Original PR: #1208
Co-authored-by: ShawnPana <shawnpana@users.noreply.github.com>
2026-03-17 00:16:34 -07:00
Teknium
3576f44a57 feat: add Vercel AI Gateway provider (#1628)
* feat: add Vercel AI Gateway as a first-class provider

Adds AI Gateway (ai-gateway.vercel.sh) as a new inference provider
with AI_GATEWAY_API_KEY authentication, live model discovery, and
reasoning support via extra_body.reasoning.

Based on PR #1492 by jerilynzheng.

* feat: add AI Gateway to setup wizard, doctor, and fallback providers

* test: add AI Gateway to api_key_providers test suite

* feat: add AI Gateway to hermes model CLI and model metadata

Wire AI Gateway into the interactive model selection menu and add
context lengths for AI Gateway model IDs in model_metadata.py.

* feat: use claude-haiku-4.5 as AI Gateway auxiliary model

* revert: use gemini-3-flash as AI Gateway auxiliary model

* fix: move AI Gateway below established providers in selection order

---------

Co-authored-by: jerilynzheng <jerilynzheng@users.noreply.github.com>
Co-authored-by: jerilynzheng <zheng.jerilyn@gmail.com>
2026-03-17 00:12:16 -07:00
teknium1
4768ea624d fix: skip stale cron jobs on gateway restart instead of firing immediately
When the gateway restarts after being down past a scheduled run time,
recurring jobs (cron/interval) were firing immediately because their
next_run_at was in the past. Now jobs more than 2 minutes late are
fast-forwarded to the next future occurrence instead.

- get_due_jobs() checks staleness for cron/interval jobs
- Stale jobs get next_run_at recomputed and saved
- Jobs within 2 minutes of their schedule still fire normally
- One-shot (once) jobs are unaffected — they fire if missed

Fixes the 'cron jobs run on every gateway restart' issue.
2026-03-16 23:48:14 -07:00
Teknium
e3f9894caf fix: send_animation metadata, MarkdownV2 inline code splitting, tirith cosign-free install (#1626)
* fix: Anthropic OAuth compatibility — Claude Code identity fingerprinting

Anthropic routes OAuth/subscription requests based on Claude Code's
identity markers. Without them, requests get intermittent 500 errors
(~25% failure rate observed). This matches what pi-ai (clawdbot) and
OpenCode both implement for OAuth compatibility.

Changes (OAuth tokens only — API key users unaffected):

1. Headers: user-agent 'claude-cli/2.1.2 (external, cli)' + x-app 'cli'
2. System prompt: prepend 'You are Claude Code, Anthropic's official CLI'
3. System prompt sanitization: replace Hermes/Nous references
4. Tool names: prefix with 'mcp_' (Claude Code convention for non-native tools)
5. Tool name stripping: remove 'mcp_' prefix from response tool calls

Before: 9/12 OK, 1 hard fail, 4 needed retries (~25% error rate)
After: 16/16 OK, 0 failures, 0 retries (0% error rate)

* fix: three gateway issues from user error logs

1. send_animation missing metadata kwarg (base.py)
   - Base class send_animation lacked the metadata parameter that the
     call site in base.py line 917 passes. Telegram's override accepted
     it, but any platform without an override (Discord, Slack, etc.)
     hit TypeError. Added metadata to base class signature.

2. MarkdownV2 split-inside-inline-code (base.py truncate_message)
   - truncate_message could split at a space inside an inline code span
     (e.g. `function(arg1, arg2)`), leaving an unpaired backtick and
     unescaped parentheses in the chunk. Telegram rejects with
     'character ( is reserved'. Added inline code awareness to the
     split-point finder — detects odd backtick counts and moves the
     split before the code span.

3. tirith auto-install without cosign (tirith_security.py)
   - Previously required cosign on PATH for auto-install, blocking
     install entirely with a warning if missing. Now proceeds with
     SHA-256 checksum verification only when cosign is unavailable.
     Cosign is still used for full supply chain verification when
     present. If cosign IS present but verification explicitly fails,
     install is still aborted (tampered release).
2026-03-16 23:39:41 -07:00
teknium1
19c8ad3d3d fix: add Claude Code user-agent to OAuth token exchange/refresh requests
Anthropic's token endpoint is behind Cloudflare which blocks Python's
default urllib user-agent (Python-urllib/3.x). Without a proper
user-agent, the token exchange returns 403 (Cloudflare error 1010).

Adds 'claude-cli/2.1.2 (external, cli)' user-agent to all three
OAuth HTTP requests:
- Initial token exchange (authorization_code grant)
- Hermes token refresh (refresh_token grant)
- Claude Code credential refresh (refresh_token grant)

Verified: full OAuth PKCE flow now works end-to-end.
2026-03-16 23:26:43 -07:00
teknium1
bd3b0c712b fix: make OAuth login URL prominent for SSH/headless users
The URL is now the primary element — displayed in a bordered box
before the browser auto-open attempt. Works for users who SSH into
remote servers where webbrowser.open() silently fails.
2026-03-16 23:21:30 -07:00
Teknium
46176c8029 refactor: centralize slash command registry (#1603)
* refactor: centralize slash command registry

Replace 7+ scattered command definition sites with a single
CommandDef registry in hermes_cli/commands.py. All downstream
consumers now derive from this registry:

- CLI process_command() resolves aliases via resolve_command()
- Gateway _known_commands uses GATEWAY_KNOWN_COMMANDS frozenset
- Gateway help text generated by gateway_help_lines()
- Telegram BotCommands generated by telegram_bot_commands()
- Slack subcommand map generated by slack_subcommand_map()

Adding a command or alias is now a one-line change to
COMMAND_REGISTRY instead of touching 6+ files.

Bugfixes included:
- Telegram now registers /rollback, /background (were missing)
- Slack now has /voice, /update, /reload-mcp (were missing)
- Gateway duplicate 'reasoning' dispatch (dead code) removed
- Gateway help text can no longer drift from CLI help

Backwards-compatible: COMMANDS and COMMANDS_BY_CATEGORY dicts are
rebuilt from the registry, so existing imports work unchanged.

* docs: update developer docs for centralized command registry

Update AGENTS.md with full 'Slash Command Registry' and 'Adding a
Slash Command' sections covering CommandDef fields, registry helpers,
and the one-line alias workflow.

Also update:
- CONTRIBUTING.md: commands.py description
- website/docs/reference/slash-commands.md: reference central registry
- docs/plans/centralize-command-registry.md: mark COMPLETED
- plans/checkpoint-rollback.md: reference new pattern
- hermes-agent-dev skill: architecture table

* chore: remove stale plan docs
2026-03-16 23:21:03 -07:00
teknium1
b798062501 fix: improve OAuth login UX for headless/SSH users
Put the authorization URL front and center instead of treating it as
a fallback. Most Hermes users run on remote servers via SSH where
webbrowser.open() silently fails.
2026-03-16 23:17:29 -07:00
teknium1
63e88326a8 feat: Hermes-native PKCE OAuth flow for Claude Pro/Max subscriptions
Adds our own OAuth login and token refresh flow, independent of Claude
Code CLI. Mirrors the PKCE flow used by pi-ai (clawdbot) and OpenCode:

- run_hermes_oauth_login(): full PKCE authorization code flow
  - Opens browser to claude.ai/oauth/authorize
  - User pastes code#state back
  - Exchanges for access + refresh tokens
  - Stores in ~/.hermes/.anthropic_oauth.json (our own file)
  - Also writes to ~/.claude/.credentials.json for backward compat

- refresh_hermes_oauth_token(): automatic token refresh
  - POST to console.anthropic.com/v1/oauth/token with refresh_token
  - Updates both credential files on success

- Credential resolution priority updated:
  1. ANTHROPIC_TOKEN env var
  2. CLAUDE_CODE_OAUTH_TOKEN env var
  3. Hermes OAuth credentials (~/.hermes/.anthropic_oauth.json) ← NEW
  4. Claude Code credentials (~/.claude/.credentials.json)
  5. ANTHROPIC_API_KEY env var

Uses same CLIENT_ID, endpoints, scopes, and PKCE parameters as
Claude Code / OpenCode / pi-ai. Token refresh happens automatically
before each API call via _try_refresh_anthropic_client_credentials.
2026-03-16 23:15:56 -07:00
Teknium
474301adc6 fix: improve execute_code error logging and harden cleanup (#1623)
* fix(tools): improve error logging in code_execution_tool

* fix: harden execute_code cleanup and reduce logging noise

Follow-up to cherry-picked PR #1588 (aydnOktay):
- Initialize server_sock = None before try block to prevent NameError
  if exception occurs before socket creation (line 413 is inside the try)
- Guard server_sock.close() with None check
- Narrow cleanup exception handlers to OSError (the actual error type)
- Remove exc_info=True from cleanup debug logs — benign teardown
  failures don't need stack traces, the message is sufficient
- Remove redundant try/except around shutil.rmtree(ignore_errors=True)
- Silence sock_path unlink with pass — expected when already cleaned up

---------

Co-authored-by: aydnOktay <xaydinoktay@gmail.com>
2026-03-16 23:13:26 -07:00
DeadMan
285300528b fix: isolate test_anthropic_adapter from local credentials
Two tests lacked filesystem isolation causing them to pick up real
~/.claude/.credentials.json tokens on machines with Claude Code installed.

- test_prefers_oauth_token_over_api_key: add tmp_path, mock Path.home,
  clear CLAUDE_CODE_OAUTH_TOKEN env
- test_falls_back_to_token: same isolation

Also commit run_agent.py generic-400 retry fix.
2026-03-16 22:53:32 -07:00
Verne
673f132151 fix(gateway): Recover stale service state
Repair stale launchd/systemd definitions during install and
teach launchd start to reload unloaded jobs before retrying.

Stop masking service restart failures by falling back to a
foreground gateway when a configured service manager is still
broken.

Refs: #1613
2026-03-17 11:05:28 +08:00
0xbyt4
8d0a96a8bf fix: context counter shows cached token count in status bar
Anthropic prompt caching splits input into cache_read_input_tokens,
cache_creation_input_tokens, and non-cached input_tokens. The context
counter only read input_tokens (non-cached portion), showing ~3 tokens
instead of the real ~18K total. Now includes cached portions for
Anthropic native provider only — other providers (OpenAI, OpenRouter,
Codex) already include cached tokens in their prompt_tokens field.

Before: 3/200K | 0%
After: 17.7K/200K | 9%
2026-03-17 05:06:11 +03:00
SHL0MS
cfa87e77a9 Merge pull request #1598 from NousResearch/shloms/ascii-video-v3
Refactor ascii-video skill: creative-first SKILL.md, consolidate references
2026-03-16 20:46:12 -04:00
Teknium
60e38e82ec fix: auto-detect D-Bus session bus for systemctl --user on headless servers (#1601)
* fix: Anthropic OAuth compatibility — Claude Code identity fingerprinting

Anthropic routes OAuth/subscription requests based on Claude Code's
identity markers. Without them, requests get intermittent 500 errors
(~25% failure rate observed). This matches what pi-ai (clawdbot) and
OpenCode both implement for OAuth compatibility.

Changes (OAuth tokens only — API key users unaffected):

1. Headers: user-agent 'claude-cli/2.1.2 (external, cli)' + x-app 'cli'
2. System prompt: prepend 'You are Claude Code, Anthropic's official CLI'
3. System prompt sanitization: replace Hermes/Nous references
4. Tool names: prefix with 'mcp_' (Claude Code convention for non-native tools)
5. Tool name stripping: remove 'mcp_' prefix from response tool calls

Before: 9/12 OK, 1 hard fail, 4 needed retries (~25% error rate)
After: 16/16 OK, 0 failures, 0 retries (0% error rate)

* fix: auto-detect DBUS_SESSION_BUS_ADDRESS for systemctl --user on headless servers

On SSH sessions to headless servers, DBUS_SESSION_BUS_ADDRESS and
XDG_RUNTIME_DIR may not be set even when the user's systemd instance
is running via linger. This causes 'systemctl --user' to fail with
'Failed to connect to bus: No medium found', breaking gateway
restart/start/stop as a service and falling back to foreground mode.

Add _ensure_user_systemd_env() that detects the standard D-Bus socket
at /run/user/<UID>/bus and sets the env vars before any systemctl --user
call. Called from _systemctl_cmd() so all existing call sites benefit
automatically with zero changes.

Fixes: gateway restart falling back to foreground on headless servers

* fix: show linger guidance when gateway restart fails during update and gateway restart

When systemctl --user restart fails during 'hermes update' or
'hermes gateway restart', check linger status and tell the user
exactly what to run (sudo -S -p '' loginctl enable-linger) instead of
silently falling back to foreground mode.

Also applies _ensure_user_systemd_env() to the raw systemctl calls
in cmd_update so they work properly on SSH sessions where D-Bus
env vars are missing.
2026-03-16 17:45:48 -07:00
Teknium
ce430fed4c installer: clarify why sudo is needed at every prompt (#1602)
* fix: Anthropic OAuth compatibility — Claude Code identity fingerprinting

Anthropic routes OAuth/subscription requests based on Claude Code's
identity markers. Without them, requests get intermittent 500 errors
(~25% failure rate observed). This matches what pi-ai (clawdbot) and
OpenCode both implement for OAuth compatibility.

Changes (OAuth tokens only — API key users unaffected):

1. Headers: user-agent 'claude-cli/2.1.2 (external, cli)' + x-app 'cli'
2. System prompt: prepend 'You are Claude Code, Anthropic's official CLI'
3. System prompt sanitization: replace Hermes/Nous references
4. Tool names: prefix with 'mcp_' (Claude Code convention for non-native tools)
5. Tool name stripping: remove 'mcp_' prefix from response tool calls

Before: 9/12 OK, 1 hard fail, 4 needed retries (~25% error rate)
After: 16/16 OK, 0 failures, 0 retries (0% error rate)

* installer: clarify why sudo is needed at every prompt

Every sudo prompt now explicitly states what packages are being installed
and that Hermes Agent itself does not require or retain root access.
Covers system packages, build tools, and Playwright browser deps.
2026-03-16 17:43:48 -07:00
Teknium
6794e79bb4 feat: add /bg as alias for /background slash command (#1590)
* feat: add optional smart model routing

Add a conservative cheap-vs-strong routing option that can send very short/simple turns to a cheaper model across providers while keeping the primary model for complex work. Wire it through CLI, gateway, and cron, and document the config.yaml workflow.

* fix(gateway): remove recursive ExecStop from systemd units, extend TimeoutStopSec to 60s

* fix(gateway): avoid recursive ExecStop in user systemd unit

* fix: extend ExecStop removal and TimeoutStopSec=60 to system unit

The cherry-picked PR #1448 fix only covered the user systemd unit.
The system unit had the same TimeoutStopSec=15 and could benefit
from the same 60s timeout for clean shutdown. Also adds a regression
test for the system unit.

---------

Co-authored-by: Ninja <ninja@local>

* feat(skills): add blender-mcp optional skill for 3D modeling

Control a running Blender instance from Hermes via socket connection
to the blender-mcp addon (port 9876). Supports creating 3D objects,
materials, animations, and running arbitrary bpy code.

Placed in optional-skills/ since it requires Blender 4.3+ desktop
with a third-party addon manually started each session.

* feat(acp): support slash commands in ACP adapter (#1532)

Adds /help, /model, /tools, /context, /reset, /compact, /version
to the ACP adapter (VS Code, Zed, JetBrains). Commands are handled
directly in the server without instantiating the TUI — each command
queries agent/session state and returns plain text.

Unrecognized /commands fall through to the LLM as normal messages.

/model uses detect_provider_for_model() for auto-detection when
switching models, matching the CLI and gateway behavior.

Fixes #1402

* fix(logging): improve error logging in session search tool (#1533)

* fix(gateway): restart on retryable startup failures (#1517)

* feat(email): add skip_attachments option via config.yaml

* feat(email): add skip_attachments option via config.yaml

Adds a config.yaml-driven option to skip email attachments in the
gateway email adapter. Useful for malware protection and bandwidth
savings.

Configure in config.yaml:
  platforms:
    email:
      skip_attachments: true

Based on PR #1521 by @an420eth, changed from env var to config.yaml
(via PlatformConfig.extra) to match the project's config-first pattern.

* docs: document skip_attachments option for email adapter

* fix(telegram): retry on transient TLS failures during connect and send

Add exponential-backoff retry (3 attempts) around initialize() to
handle transient TLS resets during gateway startup. Also catches
TimedOut and OSError in addition to NetworkError.

Add exponential-backoff retry (3 attempts) around send_message() for
NetworkError during message delivery, wrapping the existing Markdown
fallback logic.

Both imports are guarded with try/except ImportError for test
environments where telegram is mocked.

Based on PR #1527 by cmd8. Closes #1526.

* feat: permissive block_anchor thresholds and unicode normalization (#1539)

Salvaged from PR #1528 by an420eth. Closes #517.

Improves _strategy_block_anchor in fuzzy_match.py:
- Add unicode normalization (smart quotes, em/en-dashes, ellipsis,
  non-breaking spaces → ASCII) so LLM-produced unicode artifacts
  don't break anchor line matching
- Lower thresholds: 0.10 for unique matches (was 0.70), 0.30 for
  multiple candidates — if first/last lines match exactly, the
  block is almost certainly correct
- Use original (non-normalized) content for offset calculation to
  preserve correct character positions

Tested: 3 new scenarios fixed (em-dash anchors, non-breaking space
anchors, very-low-similarity unique matches), zero regressions on
all 9 existing fuzzy match tests.

Co-authored-by: an420eth <an420eth@users.noreply.github.com>

* feat(cli): add file path autocomplete in the input prompt (#1545)

When typing a path-like token (./  ../  ~/  /  or containing /),
the CLI now shows filesystem completions in the dropdown menu.
Directories show a trailing slash and 'dir' label; files show
their size. Completions are case-insensitive and capped at 30
entries.

Triggered by tokens like:
  edit ./src/ma     → shows ./src/main.py, ./src/manifest.json, ...
  check ~/doc       → shows ~/docs/, ~/documents/, ...
  read /etc/hos     → shows /etc/hosts, /etc/hostname, ...
  open tools/reg    → shows tools/registry.py

Slash command autocomplete (/help, /model, etc.) is unaffected —
it still triggers when the input starts with /.

Inspired by OpenCode PR #145 (file path completion menu).

Implementation:
- hermes_cli/commands.py: _extract_path_word() detects path-like
  tokens, _path_completions() yields filesystem Completions with
  size labels, get_completions() routes to paths vs slash commands
- tests/hermes_cli/test_path_completion.py: 26 tests covering
  path extraction, prefix filtering, directory markers, home
  expansion, case-insensitivity, integration with slash commands

* feat(privacy): redact PII from LLM context when privacy.redact_pii is enabled

Add privacy.redact_pii config option (boolean, default false). When
enabled, the gateway redacts personally identifiable information from
the system prompt before sending it to the LLM provider:

- Phone numbers (user IDs on WhatsApp/Signal) → hashed to user_<sha256>
- User IDs → hashed to user_<sha256>
- Chat IDs → numeric portion hashed, platform prefix preserved
- Home channel IDs → hashed
- Names/usernames → NOT affected (user-chosen, publicly visible)

Hashes are deterministic (same user → same hash) so the model can
still distinguish users in group chats. Routing and delivery use
the original values internally — redaction only affects LLM context.

Inspired by OpenClaw PR #47959.

* fix(privacy): skip PII redaction on Discord/Slack (mentions need real IDs)

Discord uses <@user_id> for mentions and Slack uses <@U12345> — the LLM
needs the real ID to tag users. Redaction now only applies to WhatsApp,
Signal, and Telegram where IDs are pure routing metadata.

Add 4 platform-specific tests covering Discord, WhatsApp, Signal, Slack.

* feat: smart approvals + /stop command (inspired by OpenAI Codex)

* feat: smart approvals — LLM-based risk assessment for dangerous commands

Adds a 'smart' approval mode that uses the auxiliary LLM to assess
whether a flagged command is genuinely dangerous or a false positive,
auto-approving low-risk commands without prompting the user.

Inspired by OpenAI Codex's Smart Approvals guardian subagent
(openai/codex#13860).

Config (config.yaml):
  approvals:
    mode: manual   # manual (default), smart, off

Modes:
- manual — current behavior, always prompt the user
- smart  — aux LLM evaluates risk: APPROVE (auto-allow), DENY (block),
           or ESCALATE (fall through to manual prompt)
- off    — skip all approval prompts (equivalent to --yolo)

When smart mode auto-approves, the pattern gets session-level approval
so subsequent uses of the same pattern don't trigger another LLM call.
When it denies, the command is blocked without user prompt. When
uncertain, it escalates to the normal manual approval flow.

The LLM prompt is carefully scoped: it sees only the command text and
the flagged reason, assesses actual risk vs false positive, and returns
a single-word verdict.

* feat: make smart approval model configurable via config.yaml

Adds auxiliary.approval section to config.yaml with the same
provider/model/base_url/api_key pattern as other aux tasks (vision,
web_extract, compression, etc.).

Config:
  auxiliary:
    approval:
      provider: auto
      model: ''        # fast/cheap model recommended
      base_url: ''
      api_key: ''

Bridged to env vars in both CLI and gateway paths so the aux client
picks them up automatically.

* feat: add /stop command to kill all background processes

Adds a /stop slash command that kills all running background processes
at once. Currently users have to process(list) then process(kill) for
each one individually.

Inspired by OpenAI Codex's separation of interrupt (Ctrl+C stops current
turn) from /stop (cleans up background processes). See openai/codex#14602.

Ctrl+C continues to only interrupt the active agent turn — background
dev servers, watchers, etc. are preserved. /stop is the explicit way
to clean them all up.

* feat: first-class plugin architecture + hide status bar cost by default (#1544)

The persistent status bar now shows context %, token counts, and
duration but NOT $ cost by default. Cost display is opt-in via:

  display:
    show_cost: true

in config.yaml, or: hermes config set display.show_cost true

The /usage command still shows full cost breakdown since the user
explicitly asked for it — this only affects the always-visible bar.

Status bar without cost:
  ⚕ claude-sonnet-4 │ 12K/200K │ 6% │ 15m

Status bar with show_cost: true:
  ⚕ claude-sonnet-4 │ 12K/200K │ 6% │ $0.06 │ 15m

* feat: improve memory prioritization + aggressive skill updates (inspired by OpenAI Codex)

* feat: improve memory prioritization — user preferences over procedural knowledge

Inspired by OpenAI Codex's memory prompt improvements (openai/codex#14493)
which focus memory writes on user preferences and recurring patterns
rather than procedural task details.

Key insight: 'Optimize for reducing future user steering — the most
valuable memory prevents the user from having to repeat themselves.'

Changes:
- MEMORY_GUIDANCE (prompt_builder.py): added prioritization hierarchy
  and the core principle about reducing user steering
- MEMORY_SCHEMA (memory_tool.py): reordered WHEN TO SAVE list to put
  corrections first, added explicit PRIORITY guidance
- Memory nudge (run_agent.py): now asks specifically about preferences,
  corrections, and workflow patterns instead of generic 'anything'
- Memory flush (run_agent.py): now instructs to prioritize user
  preferences and corrections over task-specific details

* feat: more aggressive skill creation and update prompting

Press harder on skill updates — the agent should proactively patch
skills when it encounters issues during use, not wait to be asked.

Changes:
- SKILLS_GUIDANCE: 'consider saving' → 'save'; added explicit instruction
  to patch skills immediately when found outdated/wrong
- Skills header: added instruction to update loaded skills before finishing
  if they had missing steps or wrong commands
- Skill nudge: more assertive ('save the approach' not 'consider saving'),
  now also prompts for updating existing skills used in the task
- Skill nudge interval: lowered default from 15 to 10 iterations
- skill_manage schema: added 'patch it immediately' to update triggers

* feat: first-class plugin architecture (#1555)

Plugin system for extending Hermes with custom tools, hooks, and
integrations — no source code changes required.

Core system (hermes_cli/plugins.py):
  - Plugin discovery from ~/.hermes/plugins/, .hermes/plugins/, and
    pip entry_points (hermes_agent.plugins group)
  - PluginContext with register_tool() and register_hook()
  - 6 lifecycle hooks: pre/post tool_call, pre/post llm_call,
    on_session_start/end
  - Namespace package handling for relative imports in plugins
  - Graceful error isolation — broken plugins never crash the agent

Integration (model_tools.py):
  - Plugin discovery runs after built-in + MCP tools
  - Plugin tools bypass toolset filter via get_plugin_tool_names()
  - Pre/post tool call hooks fire in handle_function_call()

CLI:
  - /plugins command shows loaded plugins, tool counts, status
  - Added to COMMANDS dict for autocomplete

Docs:
  - Getting started guide (build-a-hermes-plugin.md) — full tutorial
    building a calculator plugin step by step
  - Reference page (features/plugins.md) — quick overview + tables
  - Covers: file structure, schemas, handlers, hooks, data files,
    bundled skills, env var gating, pip distribution, common mistakes

Tests: 16 tests covering discovery, loading, hooks, tool visibility.

* feat: add /bg as alias for /background slash command

Adds /bg alias across CLI, gateway, and Slack platform adapter.
Updates help text, autocomplete, known_commands set, and dispatch
logic. Includes tests for the new alias.

* docs: add plan for centralized slash command registry

Scopes a refactor to replace 7+ scattered command definition sites
with a single CommandDef registry in hermes_cli/commands.py. Includes
derived helper functions for gateway help text, Telegram BotCommands,
Slack subcommand maps, and alias resolution.

Documents current drift (Telegram missing /rollback + /background,
Slack missing /voice + /update, gateway dead code) that the refactor
fixes for free.

---------

Co-authored-by: Ninja <ninja@local>
Co-authored-by: alireza78a <alireza78a@users.noreply.github.com>
Co-authored-by: Oktay Aydin <113846926+aydnOktay@users.noreply.github.com>
Co-authored-by: JP Lew <polydegen@protonmail.com>
Co-authored-by: an420eth <an420eth@users.noreply.github.com>
2026-03-16 17:27:02 -07:00
Teknium
181077b785 fix: hide Honcho session line on CLI load when no API key configured (#1582)
HonchoClientConfig.from_env() set enabled=True unconditionally,
even when HONCHO_API_KEY was not set. When ~/.honcho/config.json
didn't exist, from_global_config() fell back to from_env() and
returned enabled=True with a null api_key, causing the Honcho
session indicator to display on every CLI launch.

Fix: from_env() now sets enabled=bool(api_key), matching the
auto-enable logic already used in from_global_config().
Also added api_key guard to the CLI display as defense-in-depth.
2026-03-16 17:22:52 -07:00
SHL0MS
63635744bf Refactor ascii-video skill: creative-first SKILL.md, consolidate reference files 2026-03-16 20:11:12 -04:00
Teknium
2158c44efd fix: Anthropic OAuth compatibility — Claude Code identity fingerprinting (#1597)
Anthropic routes OAuth/subscription requests based on Claude Code's
identity markers. Without them, requests get intermittent 500 errors
(~25% failure rate observed). This matches what pi-ai (clawdbot) and
OpenCode both implement for OAuth compatibility.

Changes (OAuth tokens only — API key users unaffected):

1. Headers: user-agent 'claude-cli/2.1.2 (external, cli)' + x-app 'cli'
2. System prompt: prepend 'You are Claude Code, Anthropic's official CLI'
3. System prompt sanitization: replace Hermes/Nous references
4. Tool names: prefix with 'mcp_' (Claude Code convention for non-native tools)
5. Tool name stripping: remove 'mcp_' prefix from response tool calls

Before: 9/12 OK, 1 hard fail, 4 needed retries (~25% error rate)
After: 16/16 OK, 0 failures, 0 retries (0% error rate)
2026-03-16 17:08:22 -07:00
Teknium
e6cf1c94a8 Merge pull request #1585 from 0xbyt4/fix/anthropic-error-handling
fix(anthropic): retry 429/529 errors and surface error details to users
2026-03-16 15:46:06 -07:00
0xbyt4
d998cac319 fix(anthropic): retry 429/529 errors and surface error details to users
- 429 rate limit and 529 overloaded were incorrectly treated as
  non-retryable client errors, causing immediate failure instead of
  exponential backoff retry. Users hitting Anthropic rate limits got
  silent failures or no response at all.
- Generic "Sorry, I encountered an unexpected error" now includes
  error type, details, and status-specific hints (auth, rate limit,
  overloaded).
- Failed agent with final_response=None now surfaces the actual
  error message instead of returning an empty response.
2026-03-17 01:07:11 +03:00
Teknium
6c84e26e70 Merge pull request #1538 from NousResearch/hermes/hermes-a098c323
feat: unified streaming infrastructure — real-time token delivery for CLI + gateway
2026-03-16 14:22:57 -07:00
teknium1
f4d61c168b merge: resolve conflicts with main (show_cost, turn routing, docker docs) 2026-03-16 14:22:38 -07:00
teknium1
8feb9e4656 docs: add streaming section to configuration guide 2026-03-16 12:53:49 -07:00
teknium1
25a1f1867f fix(gateway): prevent message flooding on adapters without edit support
When the stream consumer's first edit_message() call fails (Signal,
Email, HomeAssistant don't support editing), it now disables editing
for the rest of the stream instead of falling back to sending a new
message every 0.3 seconds. The final response is delivered by the
normal send path since already_sent stays false.

Without this fix, enabling gateway streaming on Signal/Email/HA would
flood the chat with dozens of partial messages.
2026-03-16 12:41:28 -07:00
Teknium
5e5c92663d fix: hermes update causes dual gateways on macOS (launchd) (#1567)
* feat: add optional smart model routing

Add a conservative cheap-vs-strong routing option that can send very short/simple turns to a cheaper model across providers while keeping the primary model for complex work. Wire it through CLI, gateway, and cron, and document the config.yaml workflow.

* fix(gateway): remove recursive ExecStop from systemd units, extend TimeoutStopSec to 60s

* fix(gateway): avoid recursive ExecStop in user systemd unit

* fix: extend ExecStop removal and TimeoutStopSec=60 to system unit

The cherry-picked PR #1448 fix only covered the user systemd unit.
The system unit had the same TimeoutStopSec=15 and could benefit
from the same 60s timeout for clean shutdown. Also adds a regression
test for the system unit.

---------

Co-authored-by: Ninja <ninja@local>

* feat(skills): add blender-mcp optional skill for 3D modeling

Control a running Blender instance from Hermes via socket connection
to the blender-mcp addon (port 9876). Supports creating 3D objects,
materials, animations, and running arbitrary bpy code.

Placed in optional-skills/ since it requires Blender 4.3+ desktop
with a third-party addon manually started each session.

* feat(acp): support slash commands in ACP adapter (#1532)

Adds /help, /model, /tools, /context, /reset, /compact, /version
to the ACP adapter (VS Code, Zed, JetBrains). Commands are handled
directly in the server without instantiating the TUI — each command
queries agent/session state and returns plain text.

Unrecognized /commands fall through to the LLM as normal messages.

/model uses detect_provider_for_model() for auto-detection when
switching models, matching the CLI and gateway behavior.

Fixes #1402

* fix(logging): improve error logging in session search tool (#1533)

* fix(gateway): restart on retryable startup failures (#1517)

* feat(email): add skip_attachments option via config.yaml

* feat(email): add skip_attachments option via config.yaml

Adds a config.yaml-driven option to skip email attachments in the
gateway email adapter. Useful for malware protection and bandwidth
savings.

Configure in config.yaml:
  platforms:
    email:
      skip_attachments: true

Based on PR #1521 by @an420eth, changed from env var to config.yaml
(via PlatformConfig.extra) to match the project's config-first pattern.

* docs: document skip_attachments option for email adapter

* fix(telegram): retry on transient TLS failures during connect and send

Add exponential-backoff retry (3 attempts) around initialize() to
handle transient TLS resets during gateway startup. Also catches
TimedOut and OSError in addition to NetworkError.

Add exponential-backoff retry (3 attempts) around send_message() for
NetworkError during message delivery, wrapping the existing Markdown
fallback logic.

Both imports are guarded with try/except ImportError for test
environments where telegram is mocked.

Based on PR #1527 by cmd8. Closes #1526.

* feat: permissive block_anchor thresholds and unicode normalization (#1539)

Salvaged from PR #1528 by an420eth. Closes #517.

Improves _strategy_block_anchor in fuzzy_match.py:
- Add unicode normalization (smart quotes, em/en-dashes, ellipsis,
  non-breaking spaces → ASCII) so LLM-produced unicode artifacts
  don't break anchor line matching
- Lower thresholds: 0.10 for unique matches (was 0.70), 0.30 for
  multiple candidates — if first/last lines match exactly, the
  block is almost certainly correct
- Use original (non-normalized) content for offset calculation to
  preserve correct character positions

Tested: 3 new scenarios fixed (em-dash anchors, non-breaking space
anchors, very-low-similarity unique matches), zero regressions on
all 9 existing fuzzy match tests.

Co-authored-by: an420eth <an420eth@users.noreply.github.com>

* feat(cli): add file path autocomplete in the input prompt (#1545)

When typing a path-like token (./  ../  ~/  /  or containing /),
the CLI now shows filesystem completions in the dropdown menu.
Directories show a trailing slash and 'dir' label; files show
their size. Completions are case-insensitive and capped at 30
entries.

Triggered by tokens like:
  edit ./src/ma     → shows ./src/main.py, ./src/manifest.json, ...
  check ~/doc       → shows ~/docs/, ~/documents/, ...
  read /etc/hos     → shows /etc/hosts, /etc/hostname, ...
  open tools/reg    → shows tools/registry.py

Slash command autocomplete (/help, /model, etc.) is unaffected —
it still triggers when the input starts with /.

Inspired by OpenCode PR #145 (file path completion menu).

Implementation:
- hermes_cli/commands.py: _extract_path_word() detects path-like
  tokens, _path_completions() yields filesystem Completions with
  size labels, get_completions() routes to paths vs slash commands
- tests/hermes_cli/test_path_completion.py: 26 tests covering
  path extraction, prefix filtering, directory markers, home
  expansion, case-insensitivity, integration with slash commands

* feat(privacy): redact PII from LLM context when privacy.redact_pii is enabled

Add privacy.redact_pii config option (boolean, default false). When
enabled, the gateway redacts personally identifiable information from
the system prompt before sending it to the LLM provider:

- Phone numbers (user IDs on WhatsApp/Signal) → hashed to user_<sha256>
- User IDs → hashed to user_<sha256>
- Chat IDs → numeric portion hashed, platform prefix preserved
- Home channel IDs → hashed
- Names/usernames → NOT affected (user-chosen, publicly visible)

Hashes are deterministic (same user → same hash) so the model can
still distinguish users in group chats. Routing and delivery use
the original values internally — redaction only affects LLM context.

Inspired by OpenClaw PR #47959.

* fix(privacy): skip PII redaction on Discord/Slack (mentions need real IDs)

Discord uses <@user_id> for mentions and Slack uses <@U12345> — the LLM
needs the real ID to tag users. Redaction now only applies to WhatsApp,
Signal, and Telegram where IDs are pure routing metadata.

Add 4 platform-specific tests covering Discord, WhatsApp, Signal, Slack.

* feat: smart approvals + /stop command (inspired by OpenAI Codex)

* feat: smart approvals — LLM-based risk assessment for dangerous commands

Adds a 'smart' approval mode that uses the auxiliary LLM to assess
whether a flagged command is genuinely dangerous or a false positive,
auto-approving low-risk commands without prompting the user.

Inspired by OpenAI Codex's Smart Approvals guardian subagent
(openai/codex#13860).

Config (config.yaml):
  approvals:
    mode: manual   # manual (default), smart, off

Modes:
- manual — current behavior, always prompt the user
- smart  — aux LLM evaluates risk: APPROVE (auto-allow), DENY (block),
           or ESCALATE (fall through to manual prompt)
- off    — skip all approval prompts (equivalent to --yolo)

When smart mode auto-approves, the pattern gets session-level approval
so subsequent uses of the same pattern don't trigger another LLM call.
When it denies, the command is blocked without user prompt. When
uncertain, it escalates to the normal manual approval flow.

The LLM prompt is carefully scoped: it sees only the command text and
the flagged reason, assesses actual risk vs false positive, and returns
a single-word verdict.

* feat: make smart approval model configurable via config.yaml

Adds auxiliary.approval section to config.yaml with the same
provider/model/base_url/api_key pattern as other aux tasks (vision,
web_extract, compression, etc.).

Config:
  auxiliary:
    approval:
      provider: auto
      model: ''        # fast/cheap model recommended
      base_url: ''
      api_key: ''

Bridged to env vars in both CLI and gateway paths so the aux client
picks them up automatically.

* feat: add /stop command to kill all background processes

Adds a /stop slash command that kills all running background processes
at once. Currently users have to process(list) then process(kill) for
each one individually.

Inspired by OpenAI Codex's separation of interrupt (Ctrl+C stops current
turn) from /stop (cleans up background processes). See openai/codex#14602.

Ctrl+C continues to only interrupt the active agent turn — background
dev servers, watchers, etc. are preserved. /stop is the explicit way
to clean them all up.

* feat: first-class plugin architecture + hide status bar cost by default (#1544)

The persistent status bar now shows context %, token counts, and
duration but NOT $ cost by default. Cost display is opt-in via:

  display:
    show_cost: true

in config.yaml, or: hermes config set display.show_cost true

The /usage command still shows full cost breakdown since the user
explicitly asked for it — this only affects the always-visible bar.

Status bar without cost:
  ⚕ claude-sonnet-4 │ 12K/200K │ 6% │ 15m

Status bar with show_cost: true:
  ⚕ claude-sonnet-4 │ 12K/200K │ 6% │ $0.06 │ 15m

* feat: improve memory prioritization + aggressive skill updates (inspired by OpenAI Codex)

* feat: improve memory prioritization — user preferences over procedural knowledge

Inspired by OpenAI Codex's memory prompt improvements (openai/codex#14493)
which focus memory writes on user preferences and recurring patterns
rather than procedural task details.

Key insight: 'Optimize for reducing future user steering — the most
valuable memory prevents the user from having to repeat themselves.'

Changes:
- MEMORY_GUIDANCE (prompt_builder.py): added prioritization hierarchy
  and the core principle about reducing user steering
- MEMORY_SCHEMA (memory_tool.py): reordered WHEN TO SAVE list to put
  corrections first, added explicit PRIORITY guidance
- Memory nudge (run_agent.py): now asks specifically about preferences,
  corrections, and workflow patterns instead of generic 'anything'
- Memory flush (run_agent.py): now instructs to prioritize user
  preferences and corrections over task-specific details

* feat: more aggressive skill creation and update prompting

Press harder on skill updates — the agent should proactively patch
skills when it encounters issues during use, not wait to be asked.

Changes:
- SKILLS_GUIDANCE: 'consider saving' → 'save'; added explicit instruction
  to patch skills immediately when found outdated/wrong
- Skills header: added instruction to update loaded skills before finishing
  if they had missing steps or wrong commands
- Skill nudge: more assertive ('save the approach' not 'consider saving'),
  now also prompts for updating existing skills used in the task
- Skill nudge interval: lowered default from 15 to 10 iterations
- skill_manage schema: added 'patch it immediately' to update triggers

* feat: first-class plugin architecture (#1555)

Plugin system for extending Hermes with custom tools, hooks, and
integrations — no source code changes required.

Core system (hermes_cli/plugins.py):
  - Plugin discovery from ~/.hermes/plugins/, .hermes/plugins/, and
    pip entry_points (hermes_agent.plugins group)
  - PluginContext with register_tool() and register_hook()
  - 6 lifecycle hooks: pre/post tool_call, pre/post llm_call,
    on_session_start/end
  - Namespace package handling for relative imports in plugins
  - Graceful error isolation — broken plugins never crash the agent

Integration (model_tools.py):
  - Plugin discovery runs after built-in + MCP tools
  - Plugin tools bypass toolset filter via get_plugin_tool_names()
  - Pre/post tool call hooks fire in handle_function_call()

CLI:
  - /plugins command shows loaded plugins, tool counts, status
  - Added to COMMANDS dict for autocomplete

Docs:
  - Getting started guide (build-a-hermes-plugin.md) — full tutorial
    building a calculator plugin step by step
  - Reference page (features/plugins.md) — quick overview + tables
  - Covers: file structure, schemas, handlers, hooks, data files,
    bundled skills, env var gating, pip distribution, common mistakes

Tests: 16 tests covering discovery, loading, hooks, tool visibility.

* fix: hermes update causes dual gateways on macOS (launchd)

Three bugs worked together to create the dual-gateway problem:

1. cmd_update only checked systemd for gateway restart, completely
   ignoring launchd on macOS. After killing the PID it would print
   'Restart it with: hermes gateway run' even when launchd was about
   to auto-respawn the process.

2. launchd's KeepAlive.SuccessfulExit=false respawns the gateway
   after SIGTERM (non-zero exit), so the user's manual restart
   created a second instance.

3. The launchd plist lacked --replace (systemd had it), so the
   respawned gateway didn't kill stale instances on startup.

Fixes:
- Add --replace to launchd ProgramArguments (matches systemd)
- Add launchd detection to cmd_update's auto-restart logic
- Print 'auto-restart via launchd' instead of manual restart hint

* fix: add launchd plist auto-refresh + explicit restart in cmd_update

Two integration issues with the initial fix:

1. Existing macOS users with old plist (no --replace) would never
   get the fix until manual uninstall/reinstall. Added
   refresh_launchd_plist_if_needed() — mirrors the existing
   refresh_systemd_unit_if_needed(). Called from launchd_start(),
   launchd_restart(), and cmd_update.

2. cmd_update relied on KeepAlive respawn after SIGTERM rather than
   explicit launchctl stop/start. This caused races: launchd would
   respawn the old process before the PID file was cleaned up.
   Now does explicit stop+start (matching how systemd gets an
   explicit systemctl restart), with plist refresh first so the
   new --replace flag is picked up.

---------

Co-authored-by: Ninja <ninja@local>
Co-authored-by: alireza78a <alireza78a@users.noreply.github.com>
Co-authored-by: Oktay Aydin <113846926+aydnOktay@users.noreply.github.com>
Co-authored-by: JP Lew <polydegen@protonmail.com>
Co-authored-by: an420eth <an420eth@users.noreply.github.com>
2026-03-16 12:36:29 -07:00
teknium1
942950f5b9 feat(cli): live reasoning token streaming — dim box above response
When both display.streaming and display.show_reasoning are enabled,
reasoning tokens stream in real-time into a dim bordered box. When
content tokens start arriving, the reasoning box closes and the
response box opens — smooth visual transition.

- _stream_reasoning_delta(): line-buffered rendering in dim text
- _close_reasoning_box(): flush + close, called on first content token
- Reasoning callback routes to streaming version when both flags set
- Skips static post-response reasoning display when streamed live
- State reset per turn via _reset_stream_state()

Works with reasoning_content deltas (OpenRouter reasoning mode) and
thinking_delta events (Anthropic extended thinking).
2026-03-16 10:29:55 -07:00
teknium1
d3687d3e81 docs: document planned live reasoning token display as future enhancement
The streaming infrastructure already fires reasoning deltas via
_fire_reasoning_delta() during streaming. The remaining work is the
CLI display layer: a dim reasoning box that opens on first reasoning
token, streams live, then transitions to the response box.

Reference: PR #1214 (raulvidis) for gateway reasoning visibility.
2026-03-16 10:22:44 -07:00
Muhammet Eren Karakuş
43b8ecd172 fix(tests): use case-insensitive regex in singularity preflight tests
pytest.raises(match=...) is case-sensitive by default. The error
message starts with "Neither" (capital N) but the regex used lowercase
"neither", causing CI failures on Linux.
2026-03-16 19:01:39 +03:00
Muhammet Eren Karakuş
606f57a3ab fix(terminal): add Singularity/Apptainer preflight availability check
When neither apptainer nor singularity is installed, the Singularity
backend silently defaults to "singularity" and fails with a cryptic
FileNotFoundError inside _start_instance().  Add a preflight check
that resolves the executable and verifies it responds, raising a
clear RuntimeError with install instructions on failure.

Closes #1511
2026-03-16 18:25:20 +03:00
teknium1
23b9d88a76 docs: add streaming config to cli-config.yaml.example and defaults
Documents the new streaming options in the example config:
- display.streaming for CLI (under display section)
- streaming.enabled + transport/interval/threshold/cursor for gateway
- Added streaming: false to load_cli_config() defaults dict
2026-03-16 07:53:08 -07:00
teknium1
c0b88018eb feat: ship streaming disabled by default — opt-in via config
Streaming is now off by default for both CLI and gateway. Users opt in:

CLI (config.yaml):
  display:
    streaming: true

Gateway (config.yaml):
  streaming:
    enabled: true

This lets early adopters test streaming while existing users see zero
change. Once we have enough field validation, we flip the default to
true in a subsequent release.
2026-03-16 07:44:42 -07:00
teknium1
fc4080c58a fix(cli): add <THINKING> to streaming tag suppression list
Anthropic native models emit <THINKING> tags in text content (separate
from the SDK's thinking_delta events). Without suppression, these tags
leak into the streamed CLI output. Found during live provider testing.
2026-03-16 07:34:29 -07:00
Teknium
91b9495b04 feat(browser): /browser connect — attach browser tools to live Chrome via CDP (#1549)
feat(browser): /browser connect — attach browser tools to live Chrome via CDP
2026-03-16 07:32:07 -07:00
teknium1
c2769dffe0 merge: resolve conflicts with main (plugins + stop commands) 2026-03-16 07:32:00 -07:00
teknium1
71e35311f5 fix(browser): model waits for user instruction after /browser connect
Updated the injected context message to tell the model to await the
user's instruction before operating the browser. Typical flow is:
user opens Chrome → logs into sites → /browser connect → tells the
agent what to do.
2026-03-16 07:20:43 -07:00
Teknium
97990e7ad5 feat: first-class plugin architecture (#1555)
Plugin system for extending Hermes with custom tools, hooks, and
integrations — no source code changes required.

Core system (hermes_cli/plugins.py):
  - Plugin discovery from ~/.hermes/plugins/, .hermes/plugins/, and
    pip entry_points (hermes_agent.plugins group)
  - PluginContext with register_tool() and register_hook()
  - 6 lifecycle hooks: pre/post tool_call, pre/post llm_call,
    on_session_start/end
  - Namespace package handling for relative imports in plugins
  - Graceful error isolation — broken plugins never crash the agent

Integration (model_tools.py):
  - Plugin discovery runs after built-in + MCP tools
  - Plugin tools bypass toolset filter via get_plugin_tool_names()
  - Pre/post tool call hooks fire in handle_function_call()

CLI:
  - /plugins command shows loaded plugins, tool counts, status
  - Added to COMMANDS dict for autocomplete

Docs:
  - Getting started guide (build-a-hermes-plugin.md) — full tutorial
    building a calculator plugin step by step
  - Reference page (features/plugins.md) — quick overview + tables
  - Covers: file structure, schemas, handlers, hooks, data files,
    bundled skills, env var gating, pip distribution, common mistakes

Tests: 16 tests covering discovery, loading, hooks, tool visibility.
2026-03-16 07:17:36 -07:00
teknium1
73f39a7761 feat(browser): auto-launch Chrome when /browser connect finds no debugger
When /browser connect detects that port 9222 isn't open, it now:
1. Finds Chrome/Chromium/Brave/Edge on the system (macOS app bundles
   or Linux PATH lookup)
2. Launches it with --remote-debugging-port=9222 (detached)
3. Waits up to 5 seconds for the port to come up
4. Falls back to manual instructions if auto-launch fails

This means GUI-only users can just type /browser connect without
needing to know about terminal flags or Chrome launch commands.
2026-03-16 07:05:48 -07:00
Teknium
1ecfe68675 feat: improve memory prioritization + aggressive skill updates (inspired by OpenAI Codex)
* feat: improve memory prioritization — user preferences over procedural knowledge

Inspired by OpenAI Codex's memory prompt improvements (openai/codex#14493)
which focus memory writes on user preferences and recurring patterns
rather than procedural task details.

Key insight: 'Optimize for reducing future user steering — the most
valuable memory prevents the user from having to repeat themselves.'

Changes:
- MEMORY_GUIDANCE (prompt_builder.py): added prioritization hierarchy
  and the core principle about reducing user steering
- MEMORY_SCHEMA (memory_tool.py): reordered WHEN TO SAVE list to put
  corrections first, added explicit PRIORITY guidance
- Memory nudge (run_agent.py): now asks specifically about preferences,
  corrections, and workflow patterns instead of generic 'anything'
- Memory flush (run_agent.py): now instructs to prioritize user
  preferences and corrections over task-specific details

* feat: more aggressive skill creation and update prompting

Press harder on skill updates — the agent should proactively patch
skills when it encounters issues during use, not wait to be asked.

Changes:
- SKILLS_GUIDANCE: 'consider saving' → 'save'; added explicit instruction
  to patch skills immediately when found outdated/wrong
- Skills header: added instruction to update loaded skills before finishing
  if they had missing steps or wrong commands
- Skill nudge: more assertive ('save the approach' not 'consider saving'),
  now also prompts for updating existing skills used in the task
- Skill nudge interval: lowered default from 15 to 10 iterations
- skill_manage schema: added 'patch it immediately' to update triggers
2026-03-16 06:52:32 -07:00
Teknium
447594be28 feat: first-class plugin architecture + hide status bar cost by default (#1544)
The persistent status bar now shows context %, token counts, and
duration but NOT $ cost by default. Cost display is opt-in via:

  display:
    show_cost: true

in config.yaml, or: hermes config set display.show_cost true

The /usage command still shows full cost breakdown since the user
explicitly asked for it — this only affects the always-visible bar.

Status bar without cost:
  ⚕ claude-sonnet-4 │ 12K/200K │ 6% │ 15m

Status bar with show_cost: true:
  ⚕ claude-sonnet-4 │ 12K/200K │ 6% │ $0.06 │ 15m
2026-03-16 06:43:57 -07:00
teknium1
9d1483c7e6 feat(browser): /browser connect — attach browser tools to live Chrome via CDP
Add /browser slash command for connecting browser tools to the user's
live Chrome instance via Chrome DevTools Protocol:

  /browser connect       — connect to Chrome on localhost:9222
  /browser connect ws://host:port  — custom CDP endpoint
  /browser disconnect    — revert to default (headless/Browserbase)
  /browser status        — show current browser mode + connectivity

When connected:
- All browser tools (navigate, snapshot, click, etc.) control the
  user's real Chrome — logged-in sessions, cookies, open tabs
- Platform-specific Chrome launch instructions are shown
- Port connectivity is tested immediately
- A context message is injected so the model knows it's controlling
  a live browser and should be mindful of user's open tabs

Implementation:
- BROWSER_CDP_URL env var drives the backend selection in browser_tool.py
- New _create_cdp_session() creates sessions using the CDP override
- _get_cdp_override() checked before local/Browserbase selection
- Existing agent-browser --cdp flag handles the actual CDP connection

Inspired by OpenClaw's browser profile system.
2026-03-16 06:38:20 -07:00
teknium1
8e07f9ca56 fix: audit fixes — 5 bugs found and resolved
Thorough code review found 5 issues across run_agent.py, cli.py, and gateway/:

1. CRITICAL — Gateway stream consumer task never started: stream_consumer_holder
   was checked BEFORE run_sync populated it. Fixed with async polling pattern
   (same as track_agent).

2. MEDIUM-HIGH — Streaming fallback after partial delivery caused double-response:
   if streaming failed after some tokens were delivered, the fallback would
   re-deliver the full response. Now tracks deltas_were_sent and only falls
   back when no tokens reached consumers yet.

3. MEDIUM — Codex mode lost on_first_delta spinner callback: _run_codex_stream
   now accepts on_first_delta parameter, fires it on first text delta. Passed
   through from _interruptible_streaming_api_call via _codex_on_first_delta
   instance attribute.

4. MEDIUM — CLI close-tag after-text bypassed tag filtering: text after a
   reasoning close tag was sent directly to _emit_stream_text, skipping
   open-tag detection. Now routes through _stream_delta for full filtering.

5. LOW — Removed 140 lines of dead code: old _streaming_api_call method
   (superseded by _interruptible_streaming_api_call). Updated 13 tests in
   test_run_agent.py and test_openai_client_lifecycle.py to use the new
   method name and signature.

4573 tests passing.
2026-03-16 06:35:46 -07:00
Teknium
57be18c026 feat: smart approvals + /stop command (inspired by OpenAI Codex)
* feat: smart approvals — LLM-based risk assessment for dangerous commands

Adds a 'smart' approval mode that uses the auxiliary LLM to assess
whether a flagged command is genuinely dangerous or a false positive,
auto-approving low-risk commands without prompting the user.

Inspired by OpenAI Codex's Smart Approvals guardian subagent
(openai/codex#13860).

Config (config.yaml):
  approvals:
    mode: manual   # manual (default), smart, off

Modes:
- manual — current behavior, always prompt the user
- smart  — aux LLM evaluates risk: APPROVE (auto-allow), DENY (block),
           or ESCALATE (fall through to manual prompt)
- off    — skip all approval prompts (equivalent to --yolo)

When smart mode auto-approves, the pattern gets session-level approval
so subsequent uses of the same pattern don't trigger another LLM call.
When it denies, the command is blocked without user prompt. When
uncertain, it escalates to the normal manual approval flow.

The LLM prompt is carefully scoped: it sees only the command text and
the flagged reason, assesses actual risk vs false positive, and returns
a single-word verdict.

* feat: make smart approval model configurable via config.yaml

Adds auxiliary.approval section to config.yaml with the same
provider/model/base_url/api_key pattern as other aux tasks (vision,
web_extract, compression, etc.).

Config:
  auxiliary:
    approval:
      provider: auto
      model: ''        # fast/cheap model recommended
      base_url: ''
      api_key: ''

Bridged to env vars in both CLI and gateway paths so the aux client
picks them up automatically.

* feat: add /stop command to kill all background processes

Adds a /stop slash command that kills all running background processes
at once. Currently users have to process(list) then process(kill) for
each one individually.

Inspired by OpenAI Codex's separation of interrupt (Ctrl+C stops current
turn) from /stop (cleans up background processes). See openai/codex#14602.

Ctrl+C continues to only interrupt the active agent turn — background
dev servers, watchers, etc. are preserved. /stop is the explicit way
to clean them all up.
2026-03-16 06:20:11 -07:00
teknium1
99369b926c fix: always fall back to non-streaming on ANY streaming error
Previously the fallback only triggered on specific error keywords like
'streaming is not supported'. Many third-party providers have partial
or broken streaming — rejecting stream=True, crashing on stream_options,
dropping connections mid-stream, returning malformed chunks, etc.

Now: any exception during the streaming API call triggers an automatic
fallback to the standard non-streaming request path. The error is logged
at INFO level for diagnostics but never surfaces to the user. If the
fallback also fails, THAT error propagates normally.

This ensures streaming is additive — it improves UX when it works but
never breaks providers that don't support it.

Tests: 2 new (any-error fallback, double-failure propagation), 15 total.
2026-03-16 06:15:09 -07:00
Teknium
2633272ea9 feat(privacy): redact PII from LLM context when privacy.redact_pii is enabled (#1542)
feat(privacy): redact PII from LLM context when privacy.redact_pii is enabled
2026-03-16 06:08:17 -07:00
Teknium
2ba219fa4b feat(cli): add file path autocomplete in the input prompt (#1545)
When typing a path-like token (./  ../  ~/  /  or containing /),
the CLI now shows filesystem completions in the dropdown menu.
Directories show a trailing slash and 'dir' label; files show
their size. Completions are case-insensitive and capped at 30
entries.

Triggered by tokens like:
  edit ./src/ma     → shows ./src/main.py, ./src/manifest.json, ...
  check ~/doc       → shows ~/docs/, ~/documents/, ...
  read /etc/hos     → shows /etc/hosts, /etc/hostname, ...
  open tools/reg    → shows tools/registry.py

Slash command autocomplete (/help, /model, etc.) is unaffected —
it still triggers when the input starts with /.

Inspired by OpenCode PR #145 (file path completion menu).

Implementation:
- hermes_cli/commands.py: _extract_path_word() detects path-like
  tokens, _path_completions() yields filesystem Completions with
  size labels, get_completions() routes to paths vs slash commands
- tests/hermes_cli/test_path_completion.py: 26 tests covering
  path extraction, prefix filtering, directory markers, home
  expansion, case-insensitivity, integration with slash commands
2026-03-16 06:07:45 -07:00
teknium1
9a423c3487 fix(privacy): skip PII redaction on Discord/Slack (mentions need real IDs)
Discord uses <@user_id> for mentions and Slack uses <@U12345> — the LLM
needs the real ID to tag users. Redaction now only applies to WhatsApp,
Signal, and Telegram where IDs are pure routing metadata.

Add 4 platform-specific tests covering Discord, WhatsApp, Signal, Slack.
2026-03-16 05:58:34 -07:00
teknium1
5479bb0e0c feat(gateway): streaming token delivery — StreamingConfig, GatewayStreamConsumer, already_sent
Stage 3 of streaming support. Gateway now streams tokens to messaging platforms:

- StreamingConfig dataclass (enabled, transport, edit_interval, buffer_threshold, cursor)
  on GatewayConfig with from_dict/to_dict serialization
- GatewayStreamConsumer: async queue-based consumer that progressively edits
  a single message on the target platform (edit transport)
- on_delta() → queue → run() async task → send_or_edit() with rate limiting
- already_sent propagation: when streaming delivered the response, handler
  returns None so base adapter skips duplicate send()
- stream_delta_callback wired into AIAgent constructor in _run_agent
- Consumer lifecycle: started as asyncio task, awaited with timeout in finally

Config (config.yaml):
  streaming:
    enabled: true
    transport: edit      # progressive editMessageText
    edit_interval: 0.3   # seconds between edits
    buffer_threshold: 40 # chars before forcing flush
    cursor: ' ▉'

Credit: jobless0x (#774, #1312), OutThisLife (#798), clicksingh (#697).
2026-03-16 05:52:42 -07:00
teknium1
c51e7b4af7 feat(privacy): redact PII from LLM context when privacy.redact_pii is enabled
Add privacy.redact_pii config option (boolean, default false). When
enabled, the gateway redacts personally identifiable information from
the system prompt before sending it to the LLM provider:

- Phone numbers (user IDs on WhatsApp/Signal) → hashed to user_<sha256>
- User IDs → hashed to user_<sha256>
- Chat IDs → numeric portion hashed, platform prefix preserved
- Home channel IDs → hashed
- Names/usernames → NOT affected (user-chosen, publicly visible)

Hashes are deterministic (same user → same hash) so the model can
still distinguish users in group chats. Routing and delivery use
the original values internally — redaction only affects LLM context.

Inspired by OpenClaw PR #47959.
2026-03-16 05:48:45 -07:00
Teknium
7d2c786acc Merge pull request #1534 from NousResearch/fix/1445-docker-cwd-optin
fix(docker): make cwd workspace mount explicit opt-in
2026-03-16 05:42:21 -07:00
teknium1
b72f522e30 test: fake minisweagent for docker cwd mount regressions
Make the new Docker cwd-mount tests pass in CI environments that do not have the minisweagent package installed by injecting a fake module instead of monkeypatching an import path that may not exist.
2026-03-16 05:40:05 -07:00
Teknium
352980311b feat: permissive block_anchor thresholds and unicode normalization (#1539)
Salvaged from PR #1528 by an420eth. Closes #517.

Improves _strategy_block_anchor in fuzzy_match.py:
- Add unicode normalization (smart quotes, em/en-dashes, ellipsis,
  non-breaking spaces → ASCII) so LLM-produced unicode artifacts
  don't break anchor line matching
- Lower thresholds: 0.10 for unique matches (was 0.70), 0.30 for
  multiple candidates — if first/last lines match exactly, the
  block is almost certainly correct
- Use original (non-normalized) content for offset calculation to
  preserve correct character positions

Tested: 3 new scenarios fixed (em-dash anchors, non-breaking space
anchors, very-low-similarity unique matches), zero regressions on
all 9 existing fuzzy match tests.

Co-authored-by: an420eth <an420eth@users.noreply.github.com>
2026-03-16 05:29:25 -07:00
Teknium
b411b979cb fix(telegram): retry on transient TLS failures during connect and send (#1535)
fix(telegram): retry on transient TLS failures during connect and send
2026-03-16 05:28:11 -07:00
teknium1
ac739e485f fix(cli): reasoning tag suppression during streaming + fix fallback detection
Fixes two issues found during live testing:

1. Reasoning tag suppression: close tags like </REASONING_SCRATCHPAD>
   that arrive split across stream tokens (e.g. '</REASONING_SCRATCH' +
   'PAD>\n\nHello') were being lost because the buffer was discarded.
   Fix: keep a sliding window of the tail (max close tag length) so
   partial tags survive across tokens.

2. Streaming fallback detection was too broad — 'stream' matched any
   error containing that word (including 'stream_options' rejections).
   Narrowed to specific phrases: 'streaming is not', 'streaming not
   support', 'does not support stream', 'not available'.

Verified with real API calls: streaming works end-to-end with
reasoning block suppression, response box framing, and proper
fallback to Rich Panel when streaming isn't active.
2026-03-16 05:28:10 -07:00
Teknium
8758e2e8d7 feat(email): add skip_attachments option via config.yaml
* feat(email): add skip_attachments option via config.yaml

Adds a config.yaml-driven option to skip email attachments in the
gateway email adapter. Useful for malware protection and bandwidth
savings.

Configure in config.yaml:
  platforms:
    email:
      skip_attachments: true

Based on PR #1521 by @an420eth, changed from env var to config.yaml
(via PlatformConfig.extra) to match the project's config-first pattern.

* docs: document skip_attachments option for email adapter
2026-03-16 05:27:54 -07:00
JP Lew
17e87478d2 fix(gateway): restart on retryable startup failures (#1517) 2026-03-16 05:26:31 -07:00
aydnOktay
a5359e61e7 fix(tools): improve error logging in skill_manager_tool 2026-03-16 15:25:30 +03:00
teknium1
25b0ae7979 fix(telegram): retry on transient TLS failures during connect and send
Add exponential-backoff retry (3 attempts) around initialize() to
handle transient TLS resets during gateway startup. Also catches
TimedOut and OSError in addition to NetworkError.

Add exponential-backoff retry (3 attempts) around send_message() for
NetworkError during message delivery, wrapping the existing Markdown
fallback logic.

Both imports are guarded with try/except ImportError for test
environments where telegram is mocked.

Based on PR #1527 by cmd8. Closes #1526.
2026-03-16 05:23:32 -07:00
Oktay Aydin
dfe72b9d97 fix(logging): improve error logging in session search tool (#1533) 2026-03-16 05:22:00 -07:00
teknium1
780ddd102b fix(docker): gate cwd workspace mount behind config
Keep Docker sandboxes isolated by default. Add an explicit terminal.docker_mount_cwd_to_workspace opt-in, thread it through terminal/file environment creation, and document the security tradeoff and config.yaml workflow clearly.
2026-03-16 05:20:56 -07:00
Bartok9
8cdbbcaaa2 fix(docker): auto-mount host CWD to /workspace
Fixes #1445 — When using Docker backend, the user's current working
directory is now automatically bind-mounted to /workspace inside the
container. This allows users to run `cd my-project && hermes` and have
their project files accessible to the agent without manual volume config.

Changes:
- Add host_cwd and auto_mount_cwd parameters to DockerEnvironment
- Capture original host CWD in _get_env_config() before container fallback
- Pass host_cwd through _create_environment() to Docker backend
- Add TERMINAL_DOCKER_NO_AUTO_MOUNT env var to disable if needed
- Skip auto-mount when /workspace is already explicitly mounted
- Add tests for auto-mount behavior
- Add documentation for the new feature

The auto-mount is skipped when:
1. TERMINAL_DOCKER_NO_AUTO_MOUNT=true is set
2. User configured docker_volumes with :/workspace
3. persistent_filesystem=true (persistent sandbox mode)

This makes the Docker backend behave more intuitively — the agent
operates on the user's actual project directory by default.
2026-03-16 05:20:21 -07:00
Teknium
a2f0d14f29 feat(acp): support slash commands in ACP adapter (#1532)
Adds /help, /model, /tools, /context, /reset, /compact, /version
to the ACP adapter (VS Code, Zed, JetBrains). Commands are handled
directly in the server without instantiating the TUI — each command
queries agent/session state and returns plain text.

Unrecognized /commands fall through to the LLM as normal messages.

/model uses detect_provider_for_model() for auto-detection when
switching models, matching the CLI and gateway behavior.

Fixes #1402
2026-03-16 05:19:36 -07:00
teknium1
2219695d92 test: 14-test streaming suite — accumulator, callbacks, fallback, reasoning, Codex
Tests cover:
- Text/tool-call/mixed response accumulation into correct shape
- Delta callback ordering and on_first_delta firing once
- Tool-call suppression (no callbacks during tool turns)
- Provider fallback on 'not supported' errors
- Reasoning content accumulation and callback
- _has_stream_consumers() detection
- Codex stream delta callback firing
2026-03-16 05:12:38 -07:00
teknium1
d23e9a9bed feat(cli): streaming token display — line-buffered rendering with response box framing
Stage 2 of streaming support. CLI now streams tokens in real-time:

- _stream_delta(): line-buffered rendering via _cprint (prompt_toolkit safe)
- _flush_stream(): emits remaining buffer and closes response box
- Response box opens on first token, closes on flush
- Skip Rich Panel when streaming already displayed content
- Reset streaming state before each agent turn
- Compatible with existing TTS streaming (both can fire simultaneously)
- Uses skin engine for response label branding

Credit: OutThisLife (#798 CLI streaming concept).
2026-03-16 05:10:15 -07:00
Teknium
add945e53c feat(skills): add blender-mcp optional skill for 3D modeling (#1531)
feat(skills): add blender-mcp optional skill for 3D modeling
2026-03-16 05:05:56 -07:00
teknium1
c1ac32737d feat: unified streaming infrastructure — core delta callbacks for all providers
Stage 1 of streaming support. Adds:

- stream_delta_callback parameter on AIAgent.__init__ for real-time token delivery
- _interruptible_streaming_api_call() handling chat_completions + anthropic_messages
- Enhanced _run_codex_stream() to fire delta callbacks during Codex streaming
- _fire_stream_delta() fires both display and TTS callbacks
- _fire_reasoning_delta() for reasoning content streaming
- Tool-call suppression: callbacks only fire on text-only responses
- on_first_delta callback for spinner control on first token
- Provider fallback: graceful degradation to non-streaming
- _has_stream_consumers() unifies stream_delta_callback and _stream_callback checks
- Anthropic streaming returns native Message for downstream compatibility

Drawing from PRs #922 (unified streaming), #1312 (gateway consumer),
#774 (Telegram streaming), #798 (CLI streaming), #1214 (reasoning modes).
Credit: jobless0x, OutThisLife, clicksingh, raulvidis.
2026-03-16 05:05:45 -07:00
alireza78a
14b049d658 feat(skills): add blender-mcp optional skill for 3D modeling
Control a running Blender instance from Hermes via socket connection
to the blender-mcp addon (port 9876). Supports creating 3D objects,
materials, animations, and running arbitrary bpy code.

Placed in optional-skills/ since it requires Blender 4.3+ desktop
with a third-party addon manually started each session.
2026-03-16 05:03:19 -07:00
Teknium
002c459981 fix(gateway): remove recursive ExecStop from systemd units, extend TimeoutStopSec to 60s
* fix(gateway): avoid recursive ExecStop in user systemd unit

* fix: extend ExecStop removal and TimeoutStopSec=60 to system unit

The cherry-picked PR #1448 fix only covered the user systemd unit.
The system unit had the same TimeoutStopSec=15 and could benefit
from the same 60s timeout for clean shutdown. Also adds a regression
test for the system unit.

---------

Co-authored-by: Ninja <ninja@local>
2026-03-16 05:03:11 -07:00
Teknium
ce660a4413 fix(gateway): remove app-specific Athabasca references from vision enrichment (#1529)
Salvaged from PR #1428 by jplew.

Removes Athabasca-specific persistence guidance accidentally merged
in PR #1422:
- Drop Athabasca docstring and injected note from _enrich_message_with_vision
- Delete tests/gateway/test_image_enrichment.py (asserted app-specific behavior)

Co-authored-by: jplew <jplew@users.noreply.github.com>
2026-03-16 05:02:58 -07:00
Teknium
ee579af566 docs: add CLI status bar docs and update /usage reference (#1523)
- Add Status Bar section to user-guide/cli.md with layout example,
  element descriptions, responsive width behavior, and color-coded
  context threshold table
- Update /usage description in slash-commands reference to mention
  cost breakdown and session duration
2026-03-16 04:58:28 -07:00
Teknium
caa944e752 fix(setup+gateway): defer config write, PID-based gateway kill, scoped systemd service names (#1499)
fix(setup+gateway): defer config write, PID-based gateway kill, scoped systemd service names
2026-03-16 04:58:12 -07:00
Teknium
00110fb3c3 docs: update checkpoint/rollback docs for new features
- Reflect that checkpoints are now enabled by default
- Document /rollback diff <N> for previewing changes
- Document /rollback <N> <file> for single-file restore
- Document automatic conversation undo on rollback
- Document terminal command checkpoint coverage
- Update listing example to show change stats
- Fix config path (checkpoints.enabled, not agent.checkpoints_enabled)
- Consolidate features/checkpoints.md to brief summary with link
2026-03-16 04:56:22 -07:00
Bartok9
3543b755af fix(docker): auto-mount host CWD to /workspace
Fixes #1445 — When using Docker backend, the user's current working
directory is now automatically bind-mounted to /workspace inside the
container. This allows users to run `cd my-project && hermes` and have
their project files accessible to the agent without manual volume config.

Changes:
- Add host_cwd and auto_mount_cwd parameters to DockerEnvironment
- Capture original host CWD in _get_env_config() before container fallback
- Pass host_cwd through _create_environment() to Docker backend
- Add TERMINAL_DOCKER_NO_AUTO_MOUNT env var to disable if needed
- Skip auto-mount when /workspace is already explicitly mounted
- Add tests for auto-mount behavior
- Add documentation for the new feature

The auto-mount is skipped when:
1. TERMINAL_DOCKER_NO_AUTO_MOUNT=true is set
2. User configured docker_volumes with :/workspace
3. persistent_filesystem=true (persistent sandbox mode)

This makes the Docker backend behave more intuitively — the agent
operates on the user's actual project directory by default.
2026-03-16 04:53:24 -07:00
teknium1
51185354dd docs: document scoped systemd service names for multi-install
- Update messaging guide to use 'hermes gateway' CLI commands instead
  of raw systemctl (auto-resolves the correct service name)
- Add info callout explaining multi-install service name scoping
- Update HERMES_HOME env var docs to mention PID + service name scoping
2026-03-16 04:44:53 -07:00
Teknium
9e845a6e53 feat: major /rollback improvements — enabled by default, diff preview, file-level restore, conversation undo, terminal checkpoints
Checkpoint & rollback upgrades:

1. Enabled by default — checkpoints are now on for all new sessions.
   Zero cost when no file-mutating tools fire. Disable with
   checkpoints.enabled: false in config.yaml.

2. Diff preview — /rollback diff <N> shows a git diff between the
   checkpoint and current working tree before committing to a restore.

3. File-level restore — /rollback <N> <file> restores a single file
   from a checkpoint instead of the entire directory.

4. Conversation undo on rollback — when restoring files, the last
   chat turn is automatically undone so the agent's context matches
   the restored filesystem state.

5. Terminal command checkpoints — destructive terminal commands (rm,
   mv, sed -i, truncate, git reset/clean, output redirects) now
   trigger automatic checkpoints before execution. Previously only
   write_file and patch were covered.

6. Change summary in listing — /rollback now shows file count and
   +insertions/-deletions for each checkpoint.

7. Fixed dead code — removed duplicate _run_git call in
   list_checkpoints with nonsensical --all if False condition.

8. Updated help text — /rollback with no args now shows available
   subcommands (diff, file-level restore).
2026-03-16 04:43:37 -07:00
Teknium
00a0c56598 feat: add persistent CLI status bar and usage details (#1522)
Salvaged from PR #1104 by kshitijk4poor. Closes #683.

Adds a persistent status bar to the CLI showing model name, context
window usage with visual bar, estimated cost, and session duration.
Responsive layout degrades gracefully for narrow terminals.

Changes:
- agent/usage_pricing.py: shared pricing table, cost estimation with
  Decimal arithmetic, duration/token formatting helpers
- agent/insights.py: refactored to reuse usage_pricing (eliminates
  duplicate pricing table and formatting logic)
- cli.py: status bar with FormattedTextControl fragments, color-coded
  context thresholds (green/yellow/orange/red), enhanced /usage with
  cost breakdown, 1Hz idle refresh for status bar updates
- tests/test_cli_status_bar.py: status bar snapshot, width collapsing,
  usage report with/without pricing, zero-priced model handling
- tests/test_insights.py: verify zero-priced providers show as unknown

Salvage fixes:
- Resolved conflict with voice status bar (both coexist in layout)
- Import _format_context_length from hermes_cli.banner (moved since PR)

Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
2026-03-16 04:42:48 -07:00
teknium1
30da22e1c1 feat(gateway): scope systemd service name to HERMES_HOME
Multiple Hermes installations on the same machine now get unique
systemd service names:
- Default ~/.hermes → hermes-gateway (backward compatible)
- Custom HERMES_HOME → hermes-gateway-<8-char-hash>

Changes:
- Add get_service_name() in hermes_cli/gateway.py that derives a
  deterministic service name from HERMES_HOME via SHA256
- Replace all hardcoded 'hermes-gateway' systemd references with
  get_service_name() across gateway.py, main.py, status.py, uninstall.py
- Add HERMES_HOME env var to both user and system systemd unit templates
  so the gateway process uses the correct installation
- Update tests to use get_service_name() in assertions
2026-03-16 04:42:46 -07:00
teknium1
e7d3f1f3ba fix(update): kill gateway via PID file before restart
cmd_update only ran 'systemctl --user restart hermes-gateway', which
left manually-started gateway processes alive, causing duplicates.

Now uses get_running_pid() from gateway/status.py (scoped to
HERMES_HOME) to find and SIGTERM this installation's gateway before
restarting. Safe with multiple Hermes installations since each
HERMES_HOME has its own PID file.

If no systemd service exists, informs the user to restart manually.

Based on PR #1131 by teknium1. Dropped the cli.py Rich from_ansi
changes (already on main).
2026-03-16 04:35:34 -07:00
Teknium
c1da1fdcd5 feat: auto-detect provider when switching models via /model (#1506)
When typing /model deepseek-chat while on a different provider, the
model name now auto-resolves to the correct provider instead of
silently staying on the wrong one and causing API errors.

Detection priority:
1. Direct provider with credentials (e.g. DEEPSEEK_API_KEY set)
2. OpenRouter catalog match with proper slug remapping
3. Direct provider without creds (clear error beats silent failure)

Also adds DeepSeek as a first-class API-key provider — just set
DEEPSEEK_API_KEY and /model deepseek-chat routes directly.

Bare model names get remapped to proper OpenRouter slugs:
  /model gpt-5.4 → openai/gpt-5.4
  /model claude-opus-4.6 → anthropic/claude-opus-4.6

Salvages the concept from PR #1177 by @virtaava with credential
awareness and OpenRouter slug mapping added.

Co-authored-by: virtaava <virtaava@users.noreply.github.com>
2026-03-16 04:34:45 -07:00
teknium1
f7c5d8a749 Merge remote-tracking branch 'origin/main' into hermes/hermes-6360cdf9 2026-03-16 00:29:49 -07:00
Teknium
9cf7e2f0af Merge pull request #1495 from NousResearch/fix/814-group-session-isolation
fix(gateway): default group sessions to per-user isolation
2026-03-16 00:25:43 -07:00
Teknium
dd7921d514 fix(honcho): isolate session routing for multi-user gateway (#1500)
Salvaged from PR #1470 by adavyas.

Core fix: Honcho tool calls in a multi-session gateway could route to
the wrong session because honcho_tools.py relied on process-global
state. Now threads session context through the call chain:
  AIAgent._invoke_tool() → handle_function_call() → registry.dispatch()
  → handler **kw → _resolve_session_context()

Changes:
- Add _resolve_session_context() to prefer per-call context over globals
- Plumb honcho_manager + honcho_session_key through handle_function_call
- Add sync_honcho=False to run_conversation() for synthetic flush turns
- Pass honcho_session_key through gateway memory flush lifecycle
- Harden gateway PID detection when /proc cmdline is unreadable
- Make interrupt test scripts import-safe for pytest-xdist
- Wrap BibTeX examples in Jekyll raw blocks for docs build
- Fix thread-order-dependent assertion in client lifecycle test
- Expand Honcho docs: session isolation, lifecycle, routing internals

Dropped from original PR:
- Indentation change in _create_request_openai_client that would move
  client creation inside the lock (causes unnecessary contention)

Co-authored-by: adavyas <adavyas@users.noreply.github.com>
2026-03-16 00:23:47 -07:00
Teknium
eb4f0348e1 fix: persist CLI token counts to session DB for /insights
Token usage was tracked in-memory during CLI sessions (session_prompt_tokens,
session_completion_tokens) but never written to the SQLite session DB. The
gateway persisted tokens via session_store.update_session(), but CLI sessions
always showed 0 tokens in /insights.

Now run_agent.py persists token deltas to the DB after each API call for CLI
sessions. Gateway sessions continue to use their existing persist path to
avoid double-counting.
2026-03-16 00:23:13 -07:00
teknium1
38b4fd3737 fix(gateway): make group session isolation configurable
default group and channel sessions to per-user isolation, allow opting back into shared room sessions via config.yaml, and document Discord gateway routing and session behavior.
2026-03-16 00:22:23 -07:00
ygd58
36dd7a3e8d fix(setup): defer config.yaml write until after model selection
_update_config_for_provider() was called immediately after provider
selection for zai, kimi-coding, minimax, minimax-cn, and anthropic —
before model selection happened. Since the gateway re-reads config.yaml
per-message, this created a race where the gateway would pick up the
new provider but still use the old (incompatible) model name.

Capture selected_base_url in each provider block, then call
_update_config_for_provider() once, after model selection completes,
right before save_config(). The in-memory _set_model_provider() calls
stay in place so the config object remains consistent during setup.

Closes #1182
2026-03-16 00:18:30 -07:00
Teknium
dd698f6d5d fix(gateway): SSL certificate auto-detection for NixOS and non-standard systems (#1494)
fix(gateway): SSL certificate auto-detection for NixOS and non-standard systems
2026-03-16 00:14:13 -07:00
teknium1
06a7d19f98 fix(gateway): isolate group sessions per user
Include participant identifiers in non-DM session keys when available so group and channel conversations no longer share one transcript across every active user in the chat.
2026-03-15 23:08:56 -07:00
teknium1
3801532bd3 fix(gateway): SSL certificate auto-detection for NixOS and non-standard systems
Add _ensure_ssl_certs() that discovers CA certificate bundles before any
HTTP library is imported.  Resolution order:
1. Python's ssl.get_default_verify_paths()
2. certifi (if installed)
3. Common distro/macOS paths

Only sets SSL_CERT_FILE if not already present in the environment.
Wrapped in a function (called immediately) to avoid polluting module
namespace.

Based on PR #1151 by sylvesterroos.
2026-03-15 23:04:34 -07:00
Teknium
aaacab7de7 docs: explain checkpoints, /rollback, and git worktrees
* docs: explain checkpoints, rollback, and git worktrees

* fix: correct hermes -w description — auto-creates worktree, takes no path arg

---------

Co-authored-by: aydnOktay <xaydinoktay@gmail.com>
2026-03-15 23:04:07 -07:00
Teknium
4298c6fd9a fix: route background process watcher notifications to Telegram forum topics (#1481)
Salvaged from PR #1146 by spanishflu-est1918.

Background process progress/completion messages were sent with only
chat_id, landing in the general topic instead of the originating forum
topic. Thread the thread_id from HERMES_SESSION_THREAD_ID through the
watcher payload and pass it as metadata to adapter.send() so Telegram
routes notifications to the correct topic.

The env var export (HERMES_SESSION_THREAD_ID in _set_session_env /
_clear_session_env) already existed on main — this commit adds the
missing watcher plumbing.

Co-authored-by: spanishflu-est1918 <spanishflu-est1918@users.noreply.github.com>
2026-03-15 23:01:57 -07:00
Teknium
c30505dddd feat: add OSS Security Forensics skill (Skills Hub) (#1482)
* feat: add OSS Security Forensics skill (Skills Hub)

Salvaged from PR #1066 by zagiscoming. Adds a 7-phase multi-agent
investigation framework for GitHub supply chain attack forensics.

Skill contents (optional-skills/security/oss-forensics/):
- SKILL.md: 420-line investigation framework with 8 anti-hallucination
  guardrails, 5 specialist investigators, ethical use guidelines,
  and API rate limiting guidance
- evidence-store.py: CLI evidence manager with add/list/verify/query/
  export/summary + SHA-256 integrity + chain of custody
- references/: evidence types, GH Archive BigQuery guide (expanded with
  12 event types and 6 query templates), recovery techniques (4 methods),
  investigation templates (5 attack patterns)
- templates/: forensic report template (151 lines), malicious package
  report template

Changes from original PR:
- Dropped unrelated core tool changes (delegate_tool.py role parameter,
  AGENTS.md, README.md modifications)
- Removed duplicate skills/security/oss-forensics/ placement
- Fixed github-archive-guide.md (missing from optional-skills/, expanded
  from 33 to 160+ lines with all 12 event types and query templates)
- Added ethical use guidelines and API rate limiting sections
- Rewrote tests to match the v2 evidence store API (12 tests, all pass)

Closes #384

* fix: use python3 and SKILL_DIR paths throughout oss-forensics skill

- Replace all 'python' invocations with 'python3' for portability
  (Ubuntu doesn't ship 'python' by default)
- Replace relative '../scripts/' and '../templates/' paths with
  SKILL_DIR/scripts/ and SKILL_DIR/templates/ convention
- Add path convention note before Phase 0 explaining SKILL_DIR
- Fix double --- separator (cosmetic)
- Applies to SKILL.md, evidence-store.py docstring,
  recovery-techniques.md, and forensic-report.md template

---------

Co-authored-by: zagiscoming <zagiscoming@users.noreply.github.com>
2026-03-15 21:59:53 -07:00
Teknium
70e24d77a1 Merge pull request #1490 from NousResearch/fix/1033-telegram-voice-fallback
fix: restore local STT fallback for gateway voice notes
2026-03-15 21:58:32 -07:00
Teknium
fa3db2671a docs(readme): add CLI vs messaging quick reference
Co-authored-by: Frank <97429702+tsubasakong@users.noreply.github.com>
2026-03-15 21:58:11 -07:00
Teknium
6fd9f2a0c5 fix(gateway): null-coalesce mode in SessionResetPolicy.from_dict (#1488)
fix(gateway): null-coalesce mode in SessionResetPolicy.from_dict
2026-03-15 21:57:31 -07:00
teknium1
1f72ce71b7 fix: restore local STT fallback for gateway voice notes
Restore local STT command fallback for voice transcription, detect whisper and ffmpeg in common local install paths, and avoid bogus no-provider messaging when only a backend-specific key is missing.
2026-03-15 21:51:40 -07:00
teknium1
102a255575 fix(gateway): null-coalesce mode in SessionResetPolicy.from_dict
Complete the YAML null handling for all three SessionResetPolicy fields.
at_hour and idle_minutes already had null coalescing; mode was still
using data.get('mode', 'both') which returns None when the key exists
with an explicit null value.

Add regression test covering all-null input.

Based on PR #1120 by stablegenius49.
2026-03-15 21:40:22 -07:00
Teknium
5beb681c70 fix(cli): prefer curses over simple_term_menu in setup.py (#1487) 2026-03-15 21:16:21 -07:00
Teknium
c9a9db318e feat(tools): persistent shell mode for local and SSH backends (#1483)
feat(tools): persistent shell mode for local and SSH backends
2026-03-15 21:14:01 -07:00
teknium1
01e62c067b merge: resolve conflicts with origin/main (SSH preflight check) 2026-03-15 21:13:40 -07:00
Teknium
ceb970c559 fix(terminal): add SSH preflight check (#1486) 2026-03-15 21:09:07 -07:00
teknium1
6894358fe1 docs: add persistent shell section to configuration and env-vars reference
Documents terminal.persistent_shell config option, per-backend env var
overrides, precedence table, and what state persists across commands.
2026-03-15 21:01:50 -07:00
Teknium
3f0f4a04a9 fix(agent): skip reasoning extra_body for unsupported OpenRouter models (#1485)
* fix(agent): skip reasoning extra_body for models that don't support it

Sending reasoning config to models like MiniMax or Nvidia via OpenRouter
causes a 400 BadRequestError. Previously, reasoning extra_body was sent
to all OpenRouter and Nous models unconditionally.

Fix: only send reasoning extra_body when the model slug starts with a
known reasoning-capable prefix (deepseek/, anthropic/, openai/, x-ai/,
google/gemini-2, qwen/qwen3) or when using Nous Portal directly.

Applies to both the main API call path (_build_api_kwargs) and the
conversation summary path.

Fixes #1083

* test(agent): cover reasoning extra_body gating

---------

Co-authored-by: ygd58 <buraysandro9@gmail.com>
2026-03-15 20:42:07 -07:00
Teknium
c564e1c3dc feat(tools): centralize tool emoji metadata in registry + skin integration (#1484)
feat(tools): centralize tool emoji metadata in registry + skin integration
2026-03-15 20:35:24 -07:00
teknium1
210d5ade1e feat(tools): centralize tool emoji metadata in registry + skin integration
- Add 'emoji' field to ToolEntry and 'get_emoji()' to ToolRegistry
- Add emoji= to all 50+ registry.register() calls across tool files
- Add get_tool_emoji() helper in agent/display.py with 3-tier resolution:
  skin override → registry default → hardcoded fallback
- Replace hardcoded emoji maps in run_agent.py, delegate_tool.py, and
  gateway/run.py with centralized get_tool_emoji() calls
- Add 'tool_emojis' field to SkinConfig so skins can override per-tool
  emojis (e.g. ares skin could use swords instead of wrenches)
- Add 11 tests (5 registry emoji, 6 display/skin integration)
- Update AGENTS.md skin docs table

Based on the approach from PR #1061 by ForgingAlex (emoji centralization
in registry). This salvage fixes several issues from the original:
- Does NOT split the cronjob tool (which would crash on missing schemas)
- Does NOT change image_generate toolset/requires_env/is_async
- Does NOT delete existing tests
- Completes the centralization (gateway/run.py was missed)
- Hooks into the skin system for full customizability
2026-03-15 20:21:21 -07:00
teknium1
33ebedc76d feat: enable persistent shell by default for SSH, add config option
SSH persistent shell now defaults to true — non-local backends benefit
most from state persistence across execute() calls. Local backend
remains opt-in via TERMINAL_LOCAL_PERSISTENT env var.

New config.yaml option: terminal.persistent_shell (default: true)
Controls the default for non-local backends. Users can disable with:
  hermes config set terminal.persistent_shell false

Precedence: per-backend env var > TERMINAL_PERSISTENT_SHELL > default.

Wired through cli.py, gateway/run.py, and hermes_cli/config.py so the
config.yaml value reaches terminal_tool via env var bridge.
2026-03-15 20:17:13 -07:00
teknium1
5b80654198 feat(tools): add persistent shell mode to local and SSH backends
Cherry-picked from PR #1067 by alt-glitch.
Adds PersistentShellMixin with file-based IPC protocol for long-lived
bash shells. LocalEnvironment and SSHEnvironment gain persistent=True
option. Controlled via TERMINAL_LOCAL_PERSISTENT / TERMINAL_SSH_PERSISTENT
env vars. Fixes latent stderr pipe buffer deadlock.

Co-authored-by: alt-glitch <balyan.sid@gmail.com>
2026-03-15 20:13:02 -07:00
Teknium
25e53f3c1a fix(custom-endpoint): verify /models and suggest working /v1 base URL (#1480) 2026-03-15 20:09:50 -07:00
Teknium
103f7b1ebc fix: verbose mode shows full untruncated output
* fix(cli): silence tirith prefetch install warnings at startup

* fix: verbose mode now shows full untruncated tool args, results, content, and think blocks

When tool progress is set to 'verbose' (via /verbose or config), the display
was still truncating tool arguments to 100 chars, tool results to 100-200 chars,
assistant content to 100 chars, and think blocks to 5 lines. This defeated the
purpose of verbose mode.

Changes:
- Tool args: show full JSON args (not truncated to log_prefix_chars)
- Tool results: show full result content in both display and debug logs
- Assistant content: show full content during tool-call loops
- Think blocks: show full reasoning text (not truncated to 5 lines/100 chars)
- Auto-enable reasoning display when verbose mode is active
- Fix initial agent creation to respect verbose config (was always quiet_mode=True)
- Updated verbose label to mention think blocks
2026-03-15 20:03:37 -07:00
Teknium
a56937735e fix(telegram): escape chunk indicators in MarkdownV2 (#1478) 2026-03-15 19:27:15 -07:00
Teknium
7148534401 fix(gateway): make /status report live state and tokens (#1476) 2026-03-15 19:18:58 -07:00
Teknium
4e91b0240b fix(honcho): correct seed_ai_identity to use session.add_messages() (#1475)
The seed_ai_identity method was calling assistant_peer.add_message() which
doesn't exist on the Honcho SDK's Peer class. Fixed to use the correct
pattern: session.add_messages([peer.message(content)]), matching the
existing message sync code at line 294.

Discovered and fixed by Yuqi (Hermes Agent), Angello's AI companion.

Co-authored-by: Angello Picasso <angello.picasso@devsu.com>
2026-03-15 19:07:57 -07:00
Teknium
5e92a4ce5a fix: auto-reload MCP tools when mcp_servers config changes without restart (#1474)
Fixes #1036

After adding an MCP server to config.yaml, users had to restart Hermes
before the new tools became visible — even though /reload-mcp existed.

Add _check_config_mcp_changes() called from process_loop every 5s:
- stat() config.yaml for mtime changes (fast path, no YAML parse)
- On mtime change, parse and compare mcp_servers section
- If mcp_servers changed, auto-trigger _reload_mcp() and notify user
- Skip check while agent is running to avoid interrupting tool calls
- Throttled to CONFIG_WATCH_INTERVAL=5s to avoid busy-polling

/reload-mcp still works for manual force-reload.

Tests: 6 new tests in TestMCPConfigWatch, all passed

Co-authored-by: teyrebaz33 <hakanerten02@hotmail.com>
2026-03-15 19:03:34 -07:00
Teknium
471c663fdf fix(cli): silence tirith prefetch install warnings at startup (#1452) 2026-03-15 18:07:03 -07:00
Teknium
64d333204b Merge pull request #1242 from NousResearch/fix/file-tool-log-noise
fix: reduce file tool log noise
2026-03-15 11:11:18 -07:00
Teknium
c44af43840 Merge pull request #1401 from NousResearch/hermes/hermes-eca4a640
test: protect atomic temp cleanup on interrupts
2026-03-15 11:10:41 -07:00
alt-glitch
4511322f56 Merge origin/main into sid/persistent-backend
Resolve conflict in local.py: keep refactored _make_run_env helper
over inline _sanitize_subprocess_env logic.
2026-03-15 21:08:11 +05:30
Teknium
934fc9df22 Merge pull request #1440 from NousResearch/fix/1071-dict-tool-args
fix: handle dict tool call arguments from local backends
2026-03-15 08:04:09 -07:00
teknium1
5847c180c6 test: restore vllm integration coverage and add dict-args regression
Restore the existing vLLM integration test module that was accidentally replaced during development and add a focused agent-loop regression test for dict tool-call arguments from OpenAI-compatible local backends.
2026-03-15 08:02:29 -07:00
teknium1
93a0c0cddd fix: handle dict tool call arguments from local backends
Normalize tool call arguments when OpenAI-compatible backends return parsed dict/list payloads instead of JSON strings. This prevents the .strip() crash during tool-call validation for llama.cpp and similar servers, while preserving existing empty-string and invalid-JSON handling. Adds a focused regression test for dict arguments in the agent loop.
2026-03-15 08:00:19 -07:00
Teknium
23e8fdd167 feat(discord): auto-thread on @mention + skip mention in bot threads
Two changes to align Discord behavior with Slack:

1. Auto-thread on @mention (default: true)
   - When someone @mentions the bot in a server channel, a thread is
     automatically created from their message and the response goes there.
   - Each thread gets its own isolated session (like Slack).
   - Configurable via discord.auto_thread in config.yaml (default: true)
     or DISCORD_AUTO_THREAD env var (env takes precedence).
   - DMs and existing threads are unaffected.

2. Skip @mention in bot-participated threads
   - Once the bot has responded in a thread (auto-created or manually
     entered), subsequent messages in that thread no longer require
     @mention. Users can just type normally.
   - Tracked via in-memory set (_bot_participated_threads). After a
     gateway restart, users need to @mention once to re-establish.
   - Threads the bot hasn't participated in still require @mention.

Config change:
   discord:
     auto_thread: true  # new, added to DEFAULT_CONFIG

Tests: 7 new tests covering auto-thread default, disable, bot thread
participation tracking, and mention skip logic. All 903 gateway tests pass.
2026-03-15 07:59:55 -07:00
Teknium
3268b98779 Merge pull request #1437 from NousResearch/fix/1219-cron-thread-context
fix: preserve thread context for cronjob deliver=origin
2026-03-15 06:58:37 -07:00
teyrebaz33
20f381cfb6 fix: preserve thread context for cronjob deliver=origin
When a cronjob is created from within a Telegram or Slack thread,
deliver=origin was posting to the parent channel instead of the thread.

Root cause: the gateway never set HERMES_SESSION_THREAD_ID in the
session environment, so cronjob_tools.py could not capture thread_id
into the job's origin metadata — even though the scheduler already
reads origin.get('thread_id').

Fix:
- gateway/run.py: set HERMES_SESSION_THREAD_ID when thread_id is
  present on the session context, and clear it in _clear_session_env
- tools/cronjob_tools.py: read HERMES_SESSION_THREAD_ID into origin

Closes #1219
2026-03-15 06:57:00 -07:00
Teknium
77bfa252b9 Merge pull request #1434 from NousResearch/fix/1244-env-override
fix(config): reload .env over stale shell overrides
2026-03-15 06:47:40 -07:00
teknium1
f24c00a5bf fix(config): reload .env over stale shell overrides
Hermes startup entrypoints now load ~/.hermes/.env and project fallback env files with user config taking precedence over stale shell-exported values. This makes model/provider/base URL changes in .env actually take effect after restarting Hermes. Adds a shared env loader plus regression coverage, and reproduces the original bug case where OPENAI_BASE_URL and HERMES_INFERENCE_PROVIDER remained stuck on old shell values before import.
2026-03-15 06:46:28 -07:00
Teknium
463239ed85 docs: fallback providers + /background command documentation
* docs: comprehensive fallback providers documentation

- New dedicated page: user-guide/features/fallback-providers.md covering
  both primary model fallback and auxiliary task fallback systems
- Updated configuration.md with fallback_model config section
- Updated environment-variables.md noting fallback is config-only
- Fleshed out developer-guide/provider-runtime.md fallback section with
  internal architecture details (trigger points, activation flow, config flow)
- Added cross-reference from provider-routing.md distinguishing OpenRouter
  sub-provider routing from Hermes-level model fallback
- Added new page to sidebar under Integrations

* docs: comprehensive /background command documentation

- Added Background Sessions section to cli.md covering how it works
  (daemon threads, isolated sessions, config inheritance, Rich panel
  output, bell notification, concurrent tasks)
- Added Background Sessions section to messaging/index.md covering
  messaging-specific behavior (async execution, result delivery back
  to same chat, fire-and-forget pattern)
- Documented background_process_notifications config
  (all/result/error/off) in messaging docs and configuration.md
- Added HERMES_BACKGROUND_NOTIFICATIONS env var to reference page
- Fixed inconsistency in slash-commands.md: /background was listed as
  messaging-only but works in both CLI and messaging. Moved it to the
  'both surfaces' note.
- Expanded one-liner table descriptions with detail and cross-references
2026-03-15 06:24:28 -07:00
Teknium
60cce9ca6d Merge pull request #1429 from NousResearch/fix/1336-discord-voice-reliability
fix(voice): Discord voice channel reliability fixes
2026-03-15 05:25:45 -07:00
teknium1
2d57946ee9 test(voice): clarify install guidance and local skips
Add an explicit messaging-extra install hint to the missing PyNaCl/davey error path, cover it with a voice-channel join regression test, and skip the low-level NaCl packet tests when PyNaCl is not installed locally.
2026-03-15 05:24:34 -07:00
0xbyt4
5f32fd8b6d feat(voice): add discord-voice-doctor diagnostic script
Checks the full voice environment and reports what's missing:
- Python packages: discord.py, PyNaCl, davey, STT/TTS providers
- System tools: Opus codec (macOS + Linux paths), ffmpeg
- Environment: bot token, allowed users (resolved to usernames), API keys
- Configuration: STT/TTS provider, voice mode state
- Bot permissions: live Discord API check for Connect, Speak, VAD, etc.

All sensitive values are masked. Gracefully handles missing deps,
invalid tokens, API timeouts, and unreachable Discord API.
2026-03-15 05:20:17 -07:00
0xbyt4
3ea039684e test(voice): add integration tests with real NaCl crypto and Opus codec
End-to-end voice channel tests using real crypto (no mocks):

NaCl decrypt (5): valid packet, wrong key, bot SSRC, multi-packet, multi-SSRC
DAVE passthrough (3): unknown SSRC, Unencrypted error, real error drop
Full flow (5): utterance lifecycle, auto-map, pause/resume, corruption, cleanup
SPEAKING hook (4): hook installed, map/overwrite, mapped audio processed
Auth filtering (3): allowed user, rejected user, empty allowlist
Rejoin flow (3): clean state, new SSRC, missing SPEAKING auto-map
Multi-guild (2): independent receivers, stop isolation
Echo prevention (2): paused audio ignored, resumed audio processed
2026-03-15 05:20:17 -07:00
0xbyt4
63f0ec96ec test(voice): add comprehensive flow tests for voice channel fixes
Tests cover the actual code paths changed in voice fixes:

_on_packet DAVE passthrough (8 tests):
- Known SSRC + DAVE decrypt success → buffered
- Unknown SSRC + DAVE → skip DAVE, passthrough to Opus
- DAVE "Unencrypted" error → passthrough, not dropped
- DAVE other error → packet dropped
- No DAVE session → direct decode
- Bot's own SSRC → ignored (echo prevention)
- Multiple SSRCs → separate buffers

SSRC auto-mapping (6 tests):
- Single allowed user → auto-mapped
- Multiple allowed users → no auto-map
- No allowlist → sole non-bot member inferred
- Unallowed user → rejected
- Only bot in channel → no map
- Auto-map persists across checks

Buffer lifecycle (4 tests):
- Known SSRC completed utterance
- Short buffer ignored
- Recent audio waits
- Stale unknown buffer discarded

TTS playback (10 tests):
- play_tts calls play_in_voice_channel in VC
- play_tts falls through when not in VC
- play_tts wrong channel no match
- Voice input dedup (runner skips)
- Text + voice_mode combinations
- Error/empty response skipped
- Agent TTS tool dedup

UDP keepalive (2 tests):
- Interval within bounds
- Silence frame actually sent via send_packet
2026-03-15 05:20:17 -07:00
0xbyt4
1cacaccca6 fix(voice): show clear error when voice dependencies are missing
When PyNaCl or davey is not installed, joining a voice channel fails
with a raw exception. Now shows a human-readable message pointing
the user to reinstall with voice support.

Closes #1336
2026-03-15 05:20:17 -07:00
0xbyt4
773f3c1137 fix(voice): DAVE passthrough + auto-map SSRC after bot rejoin
After bot leave/rejoin, Discord doesn't resend SPEAKING events for
users already in the channel. This left SSRC unmapped and all audio
was silently dropped by DAVE decrypt.

Fixes:
- Skip DAVE for unknown SSRCs instead of dropping (passthrough)
- Handle "UnencryptedWhenPassthroughDisabled" DAVE errors gracefully
- Auto-infer user_id from sole allowed member in voice channel
- Pass allowed_user_ids to VoiceReceiver for secure inference
2026-03-15 05:20:17 -07:00
0xbyt4
0cc784068d fix(voice): add UDP keepalive to prevent Discord dropping voice after silence
Discord drops the UDP voice route after ~60s of silence - no packets
arrive even when users start speaking again. Send an Opus silence
frame every 15s to keep the UDP session alive.
2026-03-15 05:20:17 -07:00
0xbyt4
f1b4d0b280 fix(voice): make play_tts play in VC instead of no-op
play_tts was returning success without playing anything when bot was
in a voice channel. Now it calls play_in_voice_channel directly.

Simplified skip_double dedup: base adapter handles voice input TTS
via play_tts (which now works for VC), runner skips to avoid double.
2026-03-15 05:20:17 -07:00
Teknium
5254d0bba1 Merge pull request #1427 from NousResearch/fix/1414-gateway-shutdown-restart
fix(gateway): cancel active runs during shutdown
2026-03-15 04:47:28 -07:00
teknium1
21c20aeaa5 fix(gateway): cancel active runs during shutdown
Track adapter background message-processing tasks, cancel them during gateway shutdown, and interrupt running agents before disconnecting adapters. This prevents old gateway instances from continuing in-flight work after stop/replace, which was contributing to the restart-time task continuation/flicker behavior reported in #1414. Adds regression coverage for adapter task cancellation and shutdown interrupts.
2026-03-15 04:21:50 -07:00
Teknium
dc095f8491 Merge pull request #1425 from NousResearch/fix/1412-session-delete-prefix
fix(cli): accept session ID prefixes for session actions
2026-03-15 04:02:47 -07:00
teknium1
621fd80b1e fix(cli): accept session ID prefixes for session actions
Resolve session IDs by exact match or unique prefix for sessions delete/export/rename so IDs copied from Preview                                            Last Active   Src    ID
──────────────────────────────────────────────────────────────────────────────────────────
Search for GitHub/GitLab source repositories for   11m ago       cli    20260315_034720_8e1f
[SYSTEM: The user has invoked the "minecraft-atm   1m ago        cli    20260315_034035_57b6
                                                   1h ago        cron   cron_job-1_20260315_
[SYSTEM: The user has invoked the "hermes-agent-   9m ago        cli    20260315_014304_652a
                                                   4h ago        cron   cron_job-1_20260314_
[The user attached an image. Here's what it cont   4h ago        cli    20260314_233806_c8f3
[SYSTEM: The user has invoked the "google-worksp   1h ago        cli    20260314_233301_b04f
Inspect the opencode codebase for how it sends m   4h ago        cli    20260314_232543_0601
Inspect the clawdbot codebase for how it sends m   4h ago        cli    20260314_232543_8125
                                                   4h ago        cron   cron_job-1_20260314_
Reply with exactly: smoke-ok                       4h ago        cli    20260314_231730_aac9
                                                   4h ago        cron   cron_job-1_20260314_
[SYSTEM: The user has invoked the "hermes-agent-   4h ago        cli    20260314_231111_3586
[SYSTEM: The user has invoked the "hermes-agent-   4h ago        cli    20260314_225551_daff
                                                   5h ago        cron   cron_job-1_20260314_
[SYSTEM: The user has invoked the "google-worksp   4h ago        cli    20260314_224629_a9c6
k_sze   — 10:34 PM Just ran hermes update and I    5h ago        cli    20260314_224243_544e
                                                   5h ago        cron   cron_job-1_20260314_
                                                   5h ago        cron   cron_job-1_20260314_
                                                   5h ago        cron   cron_job-1_20260314_ work even when the table view truncates them. Add SessionDB prefix-resolution coverage and a CLI regression test for deleting by listed prefix.
2026-03-15 04:01:56 -07:00
Teknium
2b8fd9a8e3 Merge pull request #1422 from NousResearch/fix/1409-photo-burst-interrupts
fix(gateway): prevent Telegram photo burst interrupts
2026-03-15 03:51:51 -07:00
teknium1
fef710aca8 test(gateway): cover photo burst interrupt regressions
Add regression coverage for non-album Telegram photo burst batching, photo follow-ups that should queue without interrupting active runs, and the gateway priority-interrupt path for photo events.
2026-03-15 03:50:45 -07:00
CoinDegen
4ae1334287 fix(gateway): prevent telegram photo burst interrupts 2026-03-15 03:49:01 -07:00
Teknium
db3e3aa6c5 Merge pull request #1421 from NousResearch/fix/1247-preserve-mcp-toolsets
fix(tools): preserve MCP toolsets when saving platform tool config
2026-03-15 03:33:31 -07:00
ygd58
633488e0c0 fix(tools): preserve MCP toolsets when saving platform tool config
_save_platform_tools() overwrote the entire platform_toolsets list with
only the toolsets known to CONFIGURABLE_TOOLSETS. This silently dropped
any MCP server toolsets that users had added manually to config.yaml.

Fix: collect any existing toolset keys that are not in CONFIGURABLE_TOOLSETS
and append them back after the wizard's selections are written. This ensures
MCP toolsets survive a hermes tools save.

Fixes #1247
2026-03-15 03:28:20 -07:00
Teknium
0de200cf4d Merge pull request #1419 from NousResearch/fix/1264-env-secret-blocklist
fix(security): block gateway and tool env vars in subprocesses
2026-03-15 03:22:58 -07:00
Teknium
f6fdb18fe6 Merge pull request #1417 from NousResearch/fix/1056-dm-session-isolation
fix(gateway): isolate DM sessions by chat_id
2026-03-15 03:22:39 -07:00
teknium1
b177b4abad fix(security): block gateway and tool env vars in subprocesses
Extend subprocess env sanitization beyond provider credentials by blocking Hermes-managed tool, messaging, and related gateway runtime vars. Reuse a shared sanitizer in LocalEnvironment and ProcessRegistry so background and PTY processes honor the same blocklist and _HERMES_FORCE_ escape hatch. Add regression coverage for local env execution and process_registry spawning.
2026-03-15 02:51:04 -07:00
teknium1
232ba441d7 test: cover DM session key isolation
Update interrupt-key expectations for namespaced DM session keys and add a regression test that different DM chat IDs produce distinct gateway sessions.
2026-03-15 02:38:48 -07:00
heyyyimmax
34e120bcbb fix(gateway): enforce chat_id isolation for all DM sessions 2026-03-15 02:37:53 -07:00
Teknium
779f8df6a6 Merge pull request #1408 from NousResearch/hermes/hermes-daa73839
fix: make Claude image handling work end-to-end
2026-03-14 23:45:03 -07:00
teknium1
62abb453d3 Merge origin/main into hermes/hermes-daa73839 2026-03-14 23:44:47 -07:00
teknium1
735a6e7651 fix: convert anthropic image content blocks 2026-03-14 23:41:20 -07:00
Teknium
e5ddca1c8b Merge pull request #1407 from NousResearch/hermes/slack-thread-docs
docs: clarify Slack thread reply behavior
2026-03-14 23:25:43 -07:00
teknium1
214827a594 docs: move Discord behavior guidance to top 2026-03-14 23:18:21 -07:00
Teknium
fd0e1aac72 Merge pull request #1400 from NousResearch/hermes/hermes-45b79a59-clawhub-search
fix: harden ClawHub skill search exact matches
2026-03-14 23:17:24 -07:00
teknium1
678e0bd9cc docs: clarify Slack thread reply behavior 2026-03-14 23:15:25 -07:00
teknium1
8ccd14a0d4 fix: improve clawhub skill search matching 2026-03-14 23:15:04 -07:00
Teknium
6c611c852e fix(update): clarify manual autostash cleanup
fix(update): clarify manual autostash cleanup
2026-03-14 22:58:11 -07:00
teknium1
f882dabf19 fix(update): clarify manual autostash cleanup 2026-03-14 22:56:59 -07:00
Teknium
973aa9b549 fix(update): drop autostash by stash selector
fix(update): drop autostash by stash selector
2026-03-14 22:53:50 -07:00
Teknium
2316b8dc98 Merge pull request #1405 from NousResearch/hermes/hermes-7ef7cb6a
docs: stabilize website diagrams
2026-03-14 22:52:56 -07:00
teknium1
259208bfe4 docs: stabilize website diagrams 2026-03-14 22:49:57 -07:00
teknium1
47c5c97654 fix(update): drop autostash by stash selector 2026-03-14 22:45:29 -07:00
teknium1
b117bbc125 test: cover atomic temp cleanup on interrupts
- add regression coverage for BaseException cleanup in atomic_json_write
- add dedicated atomic_yaml_write tests, including interrupt cleanup
- document why BaseException is intentional in both helpers
2026-03-14 22:31:51 -07:00
teknium1
df9020dfa3 fix: harden clawhub skill search exact matches 2026-03-14 22:31:09 -07:00
Teknium
c6fb7f6463 Merge pull request #1399 from NousResearch/hermes/hermes-629f8bde
fix(#1002): expand environment blocklist for terminal isolation
2026-03-14 22:30:05 -07:00
teknium1
672dc1666f test: cover extra provider env blocklist vars 2026-03-14 22:29:35 -07:00
Teknium
5b11570517 Merge pull request #1398 from NousResearch/hermes/hermes-1b6f4583
fix(cron): support per-job runtime overrides
2026-03-14 22:29:30 -07:00
teknium1
ff87a566c4 fix(test): make Nous setup prompt selection robust to optional vision step 2026-03-14 22:28:15 -07:00
Nikita
9e3752df36 fix(#1002): expand environment blocklist for terminal isolation
Expanded the list of blocked environment variables to include Google, Groq, Mistral, and other major LLM providers. This ensures complete isolation and prevents conflicts with external CLI tools.
2026-03-14 22:27:32 -07:00
Teknium
15bf0b4af2 Merge pull request #1365 from mr-emmett-one/fix/deepseek-multi-tool-calls-989
fix: support multiple parallel tool calls in DeepSeek V3 parser (#989)
2026-03-14 22:22:45 -07:00
Synergy
28b3764d1e fix(cron): support per-job runtime overrides
Salvaged from PR #1292 onto current main. Preserve per-job model,
provider, and base_url overrides in cron execution, persist them in
job records, expose them through the cronjob tool create/update paths,
and add regression coverage. Deliberately does not persist per-job
api_key values.
2026-03-14 22:22:31 -07:00
Teknium
62f1c2b622 Merge pull request #1397 from NousResearch/hermes/hermes-629f8bde
fix: escape parens and braces in fork bomb regex pattern
2026-03-14 22:17:16 -07:00
Teknium
71cff92eb7 Merge pull request #1377 from NousResearch/hermes/hermes-aa701810
feat: add native Anthropic auxiliary vision
2026-03-14 22:16:09 -07:00
teknium1
1337c9efd8 test: resolve auxiliary client merge conflict 2026-03-14 22:15:16 -07:00
Teknium
747612fb3e Merge pull request #1396 from NousResearch/hermes/hermes-0fadff1b
fix: persist Google OAuth PKCE state for headless setup
2026-03-14 22:13:37 -07:00
Teknium
84d99f7754 Merge pull request #1394 from NousResearch/hermes/hermes-eca4a640
fix: honor stt.enabled false across gateway transcription
2026-03-14 22:11:47 -07:00
teknium1
4524cddc72 fix: persist google oauth pkce for headless auth
Store the pending OAuth state and code verifier between --auth-url and --auth-code so the manual headless flow can reuse Flow.fetch_token() without disabling PKCE.
2026-03-14 22:11:34 -07:00
teknium1
f4e8772de4 fix: require oauth creds for native Anthropic 2026-03-14 22:11:21 -07:00
Teknium
39fe9e8533 Merge pull request #1395 from NousResearch/hermes/hermes-7ef7cb6a
fix: use description as pattern_key to prevent approval collisions
2026-03-14 22:11:09 -07:00
teknium1
d5b64ebdb3 fix: preserve legacy approval keys after pattern key migration 2026-03-14 22:10:39 -07:00
teknium1
f8ceadbad0 fix: propagate STT disable through shared transcription config
- add stt.enabled to the default user config
- make transcription_tools respect the disabled flag globally
- surface disabled state cleanly in voice mode diagnostics
- add regression coverage for disabled STT provider selection
2026-03-14 22:09:59 -07:00
teyrebaz33
c36136084a fix(gateway): honor stt.enabled false for voice transcription
- bridge stt.enabled from config.yaml into gateway runtime config
- preserve the flag in GatewayConfig serialization
- skip gateway voice transcription when STT is disabled
- add regression tests for config loading and disabled transcription flow
2026-03-14 22:09:53 -07:00
0xbyt4
4a93cfd889 fix: use description as pattern_key to prevent approval collisions
pattern_key was derived by splitting the regex on \b and taking [1],
so patterns starting with the same word (e.g. find -exec rm and
find -delete) produced the same key "find". Approving one silently
approved the other. Using the unique description string as the key
eliminates all collisions.
2026-03-14 22:07:58 -07:00
Teknium
f46b35e3d1 Merge pull request #1393 from NousResearch/hermes/hermes-45b79a59-pr1087
fix: normalize Codex dict tool arguments as JSON
2026-03-14 22:07:22 -07:00
0xbyt4
e6417cb7bc fix: escape parens and braces in fork bomb regex pattern
The fork bomb regex used `()` (empty capture group) and unescaped `{}`
instead of literal `\(\)` and `\{\}`. This meant the classic fork bomb
`:(){ :|:& };:` was never detected. Also added `\s*` between `:` and
`&` and between `;` and trailing `:` to catch whitespace variants.
2026-03-14 22:06:44 -07:00
teknium1
08081e5969 Merge origin/main into hermes/hermes-7ef7cb6a 2026-03-14 22:06:26 -07:00
Teknium
30120f05a6 Merge pull request #1392 from NousResearch/hermes/hermes-1b6f4583
fix(discord): preserve native document and video attachment support
2026-03-14 22:04:04 -07:00
0xbyt4
6f85283553 fix: use json.dumps instead of str() for Codex Responses API arguments
When the Responses API returns tool call arguments as a dict,
str(dict) produces Python repr with single quotes (e.g. {'key': 'val'})
which is invalid JSON. Downstream json.loads() fails silently and the
tool gets called with empty arguments, losing all parameters.

Affects both function_call and custom_tool_call item types in
_normalize_codex_response().
2026-03-14 22:03:53 -07:00
halfprice06
9a177d6f4b fix(discord): preserve native document and video attachment support
Salvaged from PR #1115 onto current main by reusing the shared
Discord file-attachment helper for local video and document sends,
including file_name support for documents and regression coverage.
2026-03-14 22:01:02 -07:00
Teknium
6761021fb4 Merge pull request #1391 from NousResearch/hermes/hermes-629f8bde
fix: prevent closed OpenAI client reuse across retries
2026-03-14 21:58:11 -07:00
yemi-lagosinternationalmarket
00c5e77724 fix: prevent closed OpenAI client reuse across retries
Use per-request OpenAI clients inside _interruptible_api_call so interrupts and transport failures do not poison later retries. Also add closed-client detection/recreation for the shared client and regression tests covering retry and concurrency behavior.
2026-03-14 21:56:00 -07:00
Teknium
69045711c1 Merge pull request #1389 from NousResearch/hermes/hermes-7ef7cb6a
fix(telegram): check updater/app state before disconnect
2026-03-14 21:53:54 -07:00
teknium1
9938d27e27 test(telegram): cover disconnect with inactive updater 2026-03-14 21:53:28 -07:00
Teknium
d36b3d498d Merge pull request #1388 from NousResearch/hermes/hermes-0fadff1b
fix: harden .worktreeinclude path containment
2026-03-14 21:53:28 -07:00
Vimal
0c182211a1 fix(telegram): check updater/app state before disconnect
The disconnect() method was unconditionally calling updater.stop() and
app.stop(), causing errors when:
- The updater was not running (RuntimeError: This Updater is not running!)
- The app was None (AttributeError: 'NoneType' object has no attribute)

Changes:
- Check if updater exists and is running before stopping
- Check if app is running before stopping
- Only log warnings for actual errors, not expected shutdown states

Fixes spurious warnings during gateway shutdown.
2026-03-14 21:51:30 -07:00
teknium1
f4c012873c fix: harden salvaged worktree include checks
Use Path.relative_to-based containment checks for the salvaged .worktreeinclude guard, remove the replayed test logic from the cherry-picked PR, and add real integration regressions for file, directory, and symlink escapes.
2026-03-14 21:51:27 -07:00
teknium1
8ac5baf2d8 Merge origin/main into hermes/hermes-7ef7cb6a 2026-03-14 21:51:21 -07:00
Teknium
c54db79edc Merge pull request #1387 from NousResearch/hermes/hermes-eca4a640
fix: improve Slack setup guidance
2026-03-14 21:50:28 -07:00
teknium1
2119b68799 fix: clarify Slack setup guidance
- mark private-channel scopes/events as optional
- note reinstall requirement after scope/event changes
- correct Slack allowlist messaging to match gateway behavior
2026-03-14 21:49:04 -07:00
Joshua Martinez
fd687d0967 fix slack docs reference 2026-03-14 21:48:39 -07:00
Sebastion
12bc86d9c9 fix: prevent path traversal in .worktreeinclude file processing
Resolve .worktreeinclude entries and validate that both the source path
stays within the repository root and the destination path stays within
the worktree directory before copying files or creating symlinks.

A malicious .worktreeinclude in a cloned repository could previously
reference paths like "../../etc/passwd" to copy or symlink arbitrary
files from outside the repo into the worktree.

CWE-22: Improper Limitation of a Pathname to a Restricted Directory
2026-03-14 21:48:19 -07:00
Teknium
9e0f86cd3b Merge pull request #1386 from NousResearch/hermes/hermes-7ef7cb6a
fix(cli): non-blocking startup update check and banner deduplication
2026-03-14 21:47:54 -07:00
Teknium
883f6c81a2 Merge pull request #1385 from NousResearch/hermes/hermes-1b6f4583
fix(discord): retry without reply reference for system messages
2026-03-14 21:47:50 -07:00
Nyk
b89177668e fix(cli): non-blocking startup update check and banner deduplication
- Add background thread mechanism (prefetch_update_check/get_update_result)
  so git fetch runs in parallel with skill sync and agent init
- Fix repo path fallback in check_for_updates() for dev installs
- Remove duplicate build_welcome_banner (~180 lines) and
  _format_context_length from cli.py — the banner.py version is
  now the single source of truth
- Port skin banner_hero/banner_logo support and terminal width check
  from cli.py's version into banner.py
- Add update status output to hermes version command
- Add unit tests for update check, prefetch, and version string
2026-03-14 21:45:50 -07:00
teknium1
9f51de7261 Merge origin/main into hermes/hermes-7ef7cb6a 2026-03-14 21:45:41 -07:00
teknium1
a05a4afa53 fix: align salvaged Discord send test mock with current slash-command API 2026-03-14 21:44:50 -07:00
teknium1
db9e512424 fix: fall back from managed Anthropic keys 2026-03-14 21:44:39 -07:00
insecurejezza
8ce66a01ee fix(discord): retry without reply reference for system messages 2026-03-14 21:44:38 -07:00
Teknium
f9a61a0d9e Merge pull request #1383 from NousResearch/hermes/hermes-7ef7cb6a
fix: add project root to PYTHONPATH in execute_code sandbox
2026-03-14 21:41:50 -07:00
Teknium
ba9f82946d Merge pull request #1382 from NousResearch/hermes/hermes-0fadff1b
fix: verify crontab availability for cronjob tools
2026-03-14 21:41:14 -07:00
teknium1
0614969f7b test: cover repo-root imports in execute_code sandbox 2026-03-14 21:41:12 -07:00
teknium1
f6ff6639e8 fix: complete salvaged cronjob dependency check
Add regression coverage for cronjob availability and import shutil for the crontab PATH check added from PR #1380.
2026-03-14 21:39:59 -07:00
Nikita
861869cb48 fix(#878): add robust crontab binary check to requirements 2026-03-14 21:39:59 -07:00
anastazya
23bc642c82 fix: add project root to PYTHONPATH in execute_code sandbox
The execute_code sandbox spawns a child process with cwd set to a
temporary directory, but never adds the hermes-agent project root to
PYTHONPATH. This makes project-root modules like minisweagent_path
unreachable from sandboxed scripts, causing ImportError when the
agent runs self-diagnostic or analysis code via execute_code.

Fix by prepending the hermes-agent root directory to PYTHONPATH in
the child process environment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 21:39:05 -07:00
teknium1
9c322f7f59 Merge origin/main into hermes/hermes-7ef7cb6a 2026-03-14 21:39:01 -07:00
Teknium
b14a07315b fix: save /plan output in workspace (#1381) 2026-03-14 21:28:51 -07:00
Teknium
ff3473a37c feat: add /plan command (#1372)
* feat: add /plan command

* refactor: back /plan with bundled skill

* docs: document /plan skill
2026-03-14 21:18:17 -07:00
Teknium
cb7690b2b5 Merge pull request #1375 from NousResearch/hermes/hermes-dd253d81
feat: add direct endpoint overrides for auxiliary and delegation
2026-03-14 21:17:42 -07:00
Teknium
95939a1b51 docs: clarify gateway service scopes (#1378) 2026-03-14 21:17:41 -07:00
teknium1
85ef09e520 Merge origin/main into hermes/hermes-dd253d81 2026-03-14 21:16:29 -07:00
teknium1
db362dbd4c feat: add native Anthropic auxiliary vision 2026-03-14 21:14:20 -07:00
teknium1
9f6bccd76a feat: add direct endpoint overrides for auxiliary and delegation
Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.
2026-03-14 21:11:37 -07:00
Emmett
26bedf973b fix: support multiple parallel tool calls in DeepSeek V3 parser (#989)
- Refactored regex pattern to handle varied whitespace and newlines for better robustness.
- Replaced logic to iterate through all tool call blocks using finditer instead of stopping at the first match.
- Ensured full extraction of multiple tool calls for complex agentic workflows.
- Added error logging for failed parsing attempts.
2026-03-15 03:55:24 +01:00
alt-glitch
e266530c7d add different polling intervals for ssh and local backends. ssh has a
longer roundtrip
2026-03-15 02:54:32 +05:30
alt-glitch
879b7d3fbf fix(tests): update mock stdout in env blocklist tests
The fake_popen mock used iter([]) for proc.stdout which doesn't
support .close(). Use MagicMock with __iter__ instead, since
_drain_stdout now calls proc.stdout.close() in its finally block.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 02:48:05 +05:30
alt-glitch
9f36483bf4 refactor: deduplicate execute/cleanup, merge init, clean up helpers
- Merge _init_persistent_shell + _start_persistent_shell into single method
- Move execute() dispatcher and cleanup() into PersistentShellMixin
  so LocalEnvironment and SSHEnvironment inherit them
- Remove broad except Exception wrappers from _execute_oneshot in both backends
- Replace try/except with os.path.exists checks in local _read_temp_files
  and _cleanup_temp_files
- Remove redundant bash -c from SSH oneshot (SSH already runs in a shell)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 02:39:56 +05:30
alt-glitch
7be314c456 pass configs to file_tools for r+w over ssh.
pass TERM env.
default to ~ to in local and ssh backends.
ssh backend.
2026-03-15 02:26:39 +05:30
balyan.sid@gmail.com
9001b34146 simplify docstrings, fix some bugs 2026-03-15 01:20:42 +05:30
balyan.sid@gmail.com
861202b56c wip: add persistent shell to ssh and local terminal backends 2026-03-15 01:20:42 +05:30
balyan.sid@gmail.com
9d63dcc3f9 add persistent ssh backend 2026-03-15 01:19:38 +05:30
teknium1
b59da08730 fix: reduce file tool log noise
- treat git diff --cached --quiet rc=1 as an expected checkpoint state
  instead of logging it as an error
- downgrade expected write PermissionError/EROFS/EACCES failures out of
  error logging while keeping unexpected exceptions at error level
- add regression tests for both logging behaviors
2026-03-13 22:14:00 -07:00
981 changed files with 206264 additions and 18515 deletions

15
.dockerignore Normal file
View File

@@ -0,0 +1,15 @@
# Git
.git
.gitignore
.gitmodules
# Dependencies
node_modules
# CI/CD
.github
# Environment files
.env
*.md

View File

@@ -7,18 +7,19 @@
# OpenRouter provides access to many models through one API
# All LLM calls go through OpenRouter - no direct provider keys needed
# Get your key at: https://openrouter.ai/keys
OPENROUTER_API_KEY=
# OPENROUTER_API_KEY=
# Default model to use (OpenRouter format: provider/model)
# Examples: anthropic/claude-opus-4.6, openai/gpt-4o, google/gemini-3-flash-preview, zhipuai/glm-4-plus
LLM_MODEL=anthropic/claude-opus-4.6
# Default model is configured in ~/.hermes/config.yaml (model.default).
# Use 'hermes model' or 'hermes setup' to change it.
# LLM_MODEL is no longer read from .env — this line is kept for reference only.
# LLM_MODEL=anthropic/claude-opus-4.6
# =============================================================================
# LLM PROVIDER (z.ai / GLM)
# =============================================================================
# z.ai provides access to ZhipuAI GLM models (GLM-4-Plus, etc.)
# Get your key at: https://z.ai or https://open.bigmodel.cn
GLM_API_KEY=
# GLM_API_KEY=
# GLM_BASE_URL=https://api.z.ai/api/paas/v4 # Override default base URL
# =============================================================================
@@ -28,7 +29,7 @@ GLM_API_KEY=
# Get your key at: https://platform.kimi.ai (Kimi Code console)
# Keys prefixed sk-kimi- use the Kimi Code API (api.kimi.com) by default.
# Legacy keys from platform.moonshot.ai need KIMI_BASE_URL override below.
KIMI_API_KEY=
# KIMI_API_KEY=
# KIMI_BASE_URL=https://api.kimi.com/coding/v1 # Default for sk-kimi- keys
# KIMI_BASE_URL=https://api.moonshot.ai/v1 # For legacy Moonshot keys
# KIMI_BASE_URL=https://api.moonshot.cn/v1 # For Moonshot China keys
@@ -38,33 +39,67 @@ KIMI_API_KEY=
# =============================================================================
# MiniMax provides access to MiniMax models (global endpoint)
# Get your key at: https://www.minimax.io
MINIMAX_API_KEY=
# MINIMAX_API_KEY=
# MINIMAX_BASE_URL=https://api.minimax.io/v1 # Override default base URL
# MiniMax China endpoint (for users in mainland China)
MINIMAX_CN_API_KEY=
# MINIMAX_CN_API_KEY=
# MINIMAX_CN_BASE_URL=https://api.minimaxi.com/v1 # Override default base URL
# =============================================================================
# LLM PROVIDER (OpenCode Zen)
# =============================================================================
# OpenCode Zen provides curated, tested models (GPT, Claude, Gemini, MiniMax, GLM, Kimi)
# Pay-as-you-go pricing. Get your key at: https://opencode.ai/auth
# OPENCODE_ZEN_API_KEY=
# OPENCODE_ZEN_BASE_URL=https://opencode.ai/zen/v1 # Override default base URL
# =============================================================================
# LLM PROVIDER (OpenCode Go)
# =============================================================================
# OpenCode Go provides access to open models (GLM-5, Kimi K2.5, MiniMax M2.5)
# $10/month subscription. Get your key at: https://opencode.ai/auth
# OPENCODE_GO_API_KEY=
# =============================================================================
# LLM PROVIDER (Hugging Face Inference Providers)
# =============================================================================
# Hugging Face routes to 20+ open models via unified OpenAI-compatible endpoint.
# Free tier included ($0.10/month), no markup on provider rates.
# Get your token at: https://huggingface.co/settings/tokens
# Required permission: "Make calls to Inference Providers"
# HF_TOKEN=
# OPENCODE_GO_BASE_URL=https://opencode.ai/zen/go/v1 # Override default base URL
# =============================================================================
# TOOL API KEYS
# =============================================================================
# Exa API Key - AI-native web search and contents
# Get at: https://exa.ai
# EXA_API_KEY=
# Parallel API Key - AI-native web search and extract
# Get at: https://parallel.ai
# PARALLEL_API_KEY=
# Firecrawl API Key - Web search, extract, and crawl
# Get at: https://firecrawl.dev/
FIRECRAWL_API_KEY=
# FIRECRAWL_API_KEY=
# FAL.ai API Key - Image generation
# Get at: https://fal.ai/
FAL_KEY=
# FAL_KEY=
# Honcho - Cross-session AI-native user modeling (optional)
# Builds a persistent understanding of the user across sessions and tools.
# Get at: https://app.honcho.dev
# Also requires ~/.honcho/config.json with enabled=true (see README).
HONCHO_API_KEY=
# HONCHO_API_KEY=
# =============================================================================
# TERMINAL TOOL CONFIGURATION (mini-swe-agent backend)
# TERMINAL TOOL CONFIGURATION
# =============================================================================
# Backend type: "local", "singularity", "docker", "modal", or "ssh"
# Terminal backend is configured in ~/.hermes/config.yaml (terminal.backend).
@@ -147,10 +182,10 @@ TERMINAL_LIFETIME_SECONDS=300
# Browserbase API Key - Cloud browser execution
# Get at: https://browserbase.com/
BROWSERBASE_API_KEY=
# BROWSERBASE_API_KEY=
# Browserbase Project ID - From your Browserbase dashboard
BROWSERBASE_PROJECT_ID=
# BROWSERBASE_PROJECT_ID=
# Enable residential proxies for better CAPTCHA solving (default: true)
# Routes traffic through residential IPs, significantly improves success rate
@@ -182,7 +217,7 @@ BROWSER_INACTIVITY_TIMEOUT=120
# Uses OpenAI's API directly (not via OpenRouter).
# Named VOICE_TOOLS_OPENAI_KEY to avoid interference with OpenRouter.
# Get at: https://platform.openai.com/api-keys
VOICE_TOOLS_OPENAI_KEY=
# VOICE_TOOLS_OPENAI_KEY=
# =============================================================================
# SLACK INTEGRATION
@@ -197,6 +232,21 @@ VOICE_TOOLS_OPENAI_KEY=
# Slack allowed users (comma-separated Slack user IDs)
# SLACK_ALLOWED_USERS=
# =============================================================================
# TELEGRAM INTEGRATION
# =============================================================================
# Telegram Bot Token - From @BotFather (https://t.me/BotFather)
# TELEGRAM_BOT_TOKEN=
# TELEGRAM_ALLOWED_USERS= # Comma-separated user IDs
# TELEGRAM_HOME_CHANNEL= # Default chat for cron delivery
# TELEGRAM_HOME_CHANNEL_NAME= # Display name for home channel
# Webhook mode (optional — for cloud deployments like Fly.io/Railway)
# Default is long polling. Setting TELEGRAM_WEBHOOK_URL switches to webhook mode.
# TELEGRAM_WEBHOOK_URL=https://my-app.fly.dev/telegram
# TELEGRAM_WEBHOOK_PORT=8443
# TELEGRAM_WEBHOOK_SECRET= # Recommended for production
# WhatsApp (built-in Baileys bridge — run `hermes whatsapp` to pair)
# WHATSAPP_ENABLED=false
# WHATSAPP_ALLOWED_USERS=15551234567
@@ -253,11 +303,11 @@ IMAGE_TOOLS_DEBUG=false
# Tinker API Key - RL training service
# Get at: https://tinker-console.thinkingmachines.ai/keys
TINKER_API_KEY=
# TINKER_API_KEY=
# Weights & Biases API Key - Experiment tracking and metrics
# Get at: https://wandb.ai/authorize
WANDB_API_KEY=
# WANDB_API_KEY=
# RL API Server URL (default: http://localhost:8080)
# Change if running the rl-server on a different host/port

1
.envrc Normal file
View File

@@ -0,0 +1 @@
use flake

View File

@@ -6,6 +6,8 @@ on:
paths:
- 'website/**'
- 'landingpage/**'
- 'skills/**'
- 'optional-skills/**'
- '.github/workflows/deploy-site.yml'
workflow_dispatch:
@@ -19,6 +21,8 @@ concurrency:
jobs:
build-and-deploy:
# Only run on the upstream repository, not on forks
if: github.repository == 'NousResearch/hermes-agent'
runs-on: ubuntu-latest
environment:
name: github-pages
@@ -32,6 +36,16 @@ jobs:
cache: npm
cache-dependency-path: website/package-lock.json
- uses: actions/setup-python@v5
with:
python-version: '3.11'
- name: Install PyYAML for skill extraction
run: pip install pyyaml
- name: Extract skill metadata for dashboard
run: python3 website/scripts/extract-skills.py
- name: Install dependencies
run: npm ci
working-directory: website

79
.github/workflows/docker-publish.yml vendored Normal file
View File

@@ -0,0 +1,79 @@
name: Docker Build and Publish
on:
push:
branches: [main]
pull_request:
branches: [main]
release:
types: [published]
concurrency:
group: docker-${{ github.ref }}
cancel-in-progress: true
jobs:
build-and-push:
# Only run on the upstream repository, not on forks
if: github.repository == 'NousResearch/hermes-agent'
runs-on: ubuntu-latest
timeout-minutes: 30
steps:
- name: Checkout code
uses: actions/checkout@v4
with:
submodules: recursive
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Build image
uses: docker/build-push-action@v6
with:
context: .
file: Dockerfile
load: true
tags: nousresearch/hermes-agent:test
cache-from: type=gha
cache-to: type=gha,mode=max
- name: Test image starts
run: |
docker run --rm \
-v /tmp/hermes-test:/opt/data \
--entrypoint /opt/hermes/docker/entrypoint.sh \
nousresearch/hermes-agent:test --help
- name: Log in to Docker Hub
if: github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release'
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
- name: Push image (main branch)
if: github.event_name == 'push' && github.ref == 'refs/heads/main'
uses: docker/build-push-action@v6
with:
context: .
file: Dockerfile
push: true
tags: |
nousresearch/hermes-agent:latest
nousresearch/hermes-agent:${{ github.sha }}
cache-from: type=gha
cache-to: type=gha,mode=max
- name: Push image (release)
if: github.event_name == 'release'
uses: docker/build-push-action@v6
with:
context: .
file: Dockerfile
push: true
tags: |
nousresearch/hermes-agent:latest
nousresearch/hermes-agent:${{ github.event.release.tag_name }}
nousresearch/hermes-agent:${{ github.sha }}
cache-from: type=gha
cache-to: type=gha,mode=max

42
.github/workflows/docs-site-checks.yml vendored Normal file
View File

@@ -0,0 +1,42 @@
name: Docs Site Checks
on:
pull_request:
paths:
- 'website/**'
- '.github/workflows/docs-site-checks.yml'
workflow_dispatch:
jobs:
docs-site-checks:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: 20
cache: npm
cache-dependency-path: website/package-lock.json
- name: Install website dependencies
run: npm ci
working-directory: website
- uses: actions/setup-python@v5
with:
python-version: '3.11'
- name: Install Python dependencies
run: python -m pip install ascii-guard pyyaml
- name: Extract skill metadata for dashboard
run: python3 website/scripts/extract-skills.py
- name: Lint docs diagrams
run: npm run lint:diagrams
working-directory: website
- name: Build Docusaurus
run: npm run build
working-directory: website

40
.github/workflows/nix.yml vendored Normal file
View File

@@ -0,0 +1,40 @@
name: Nix
on:
push:
branches: [main]
pull_request:
paths:
- 'flake.nix'
- 'flake.lock'
- 'nix/**'
- 'pyproject.toml'
- 'uv.lock'
- 'hermes_cli/**'
- 'run_agent.py'
- 'acp_adapter/**'
concurrency:
group: nix-${{ github.ref }}
cancel-in-progress: true
jobs:
nix:
strategy:
matrix:
os: [ubuntu-latest, macos-latest]
runs-on: ${{ matrix.os }}
timeout-minutes: 30
steps:
- uses: actions/checkout@v4
- uses: DeterminateSystems/nix-installer-action@main
- uses: DeterminateSystems/magic-nix-cache-action@main
- name: Check flake
if: runner.os == 'Linux'
run: nix flake check --print-build-logs
- name: Build package
if: runner.os == 'Linux'
run: nix build --print-build-logs
- name: Evaluate flake (macOS)
if: runner.os == 'macOS'
run: nix flake show --json > /dev/null

192
.github/workflows/supply-chain-audit.yml vendored Normal file
View File

@@ -0,0 +1,192 @@
name: Supply Chain Audit
on:
pull_request:
types: [opened, synchronize, reopened]
permissions:
pull-requests: write
contents: read
jobs:
scan:
name: Scan PR for supply chain risks
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Scan diff for suspicious patterns
id: scan
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
set -euo pipefail
BASE="${{ github.event.pull_request.base.sha }}"
HEAD="${{ github.event.pull_request.head.sha }}"
# Get the full diff (added lines only)
DIFF=$(git diff "$BASE".."$HEAD" -- . ':!uv.lock' ':!*.lock' ':!package-lock.json' ':!yarn.lock' || true)
FINDINGS=""
CRITICAL=false
# --- .pth files (auto-execute on Python startup) ---
PTH_FILES=$(git diff --name-only "$BASE".."$HEAD" | grep '\.pth$' || true)
if [ -n "$PTH_FILES" ]; then
CRITICAL=true
FINDINGS="${FINDINGS}
### 🚨 CRITICAL: .pth file added or modified
Python \`.pth\` files in \`site-packages/\` execute automatically when the interpreter starts — no import required. This is the exact mechanism used in the [litellm supply chain attack](https://github.com/BerriAI/litellm/issues/24512).
**Files:**
\`\`\`
${PTH_FILES}
\`\`\`
"
fi
# --- base64 + exec/eval combo (the litellm attack pattern) ---
B64_EXEC_HITS=$(echo "$DIFF" | grep -n '^\+' | grep -iE 'base64\.(b64decode|decodebytes|urlsafe_b64decode)' | grep -iE 'exec\(|eval\(' | head -10 || true)
if [ -n "$B64_EXEC_HITS" ]; then
CRITICAL=true
FINDINGS="${FINDINGS}
### 🚨 CRITICAL: base64 decode + exec/eval combo
This is the exact pattern used in the [litellm supply chain attack](https://github.com/BerriAI/litellm/issues/24512) — base64-decoded strings passed to exec/eval to hide credential-stealing payloads.
**Matches:**
\`\`\`
${B64_EXEC_HITS}
\`\`\`
"
fi
# --- base64 decode/encode (alone — legitimate uses exist) ---
B64_HITS=$(echo "$DIFF" | grep -n '^\+' | grep -iE 'base64\.(b64decode|b64encode|decodebytes|encodebytes|urlsafe_b64decode)|atob\(|btoa\(|Buffer\.from\(.*base64' | head -20 || true)
if [ -n "$B64_HITS" ]; then
FINDINGS="${FINDINGS}
### ⚠️ WARNING: base64 encoding/decoding detected
Base64 has legitimate uses (images, JWT, etc.) but is also commonly used to obfuscate malicious payloads. Verify the usage is appropriate.
**Matches (first 20):**
\`\`\`
${B64_HITS}
\`\`\`
"
fi
# --- exec/eval with string arguments ---
EXEC_HITS=$(echo "$DIFF" | grep -n '^\+' | grep -E '(exec|eval)\s*\(' | grep -v '^\+\s*#' | grep -v 'test_\|mock\|assert\|# ' | head -20 || true)
if [ -n "$EXEC_HITS" ]; then
FINDINGS="${FINDINGS}
### ⚠️ WARNING: exec() or eval() usage
Dynamic code execution can hide malicious behavior, especially when combined with base64 or network fetches.
**Matches (first 20):**
\`\`\`
${EXEC_HITS}
\`\`\`
"
fi
# --- subprocess with encoded/obfuscated commands ---
PROC_HITS=$(echo "$DIFF" | grep -n '^\+' | grep -E 'subprocess\.(Popen|call|run)\s*\(' | grep -iE 'base64|decode|encode|\\x|chr\(' | head -10 || true)
if [ -n "$PROC_HITS" ]; then
CRITICAL=true
FINDINGS="${FINDINGS}
### 🚨 CRITICAL: subprocess with encoded/obfuscated command
Subprocess calls with encoded arguments are a strong indicator of payload execution.
**Matches:**
\`\`\`
${PROC_HITS}
\`\`\`
"
fi
# --- Network calls to non-standard domains ---
EXFIL_HITS=$(echo "$DIFF" | grep -n '^\+' | grep -iE 'requests\.(post|put)\(|httpx\.(post|put)\(|urllib\.request\.urlopen' | grep -v '^\+\s*#' | grep -v 'test_\|mock\|assert' | head -10 || true)
if [ -n "$EXFIL_HITS" ]; then
FINDINGS="${FINDINGS}
### ⚠️ WARNING: Outbound network calls (POST/PUT)
Outbound POST/PUT requests in new code could be data exfiltration. Verify the destination URLs are legitimate.
**Matches (first 10):**
\`\`\`
${EXFIL_HITS}
\`\`\`
"
fi
# --- setup.py / setup.cfg install hooks ---
SETUP_HITS=$(git diff --name-only "$BASE".."$HEAD" | grep -E '(setup\.py|setup\.cfg|__init__\.pth|sitecustomize\.py|usercustomize\.py)$' || true)
if [ -n "$SETUP_HITS" ]; then
FINDINGS="${FINDINGS}
### ⚠️ WARNING: Install hook files modified
These files can execute code during package installation or interpreter startup.
**Files:**
\`\`\`
${SETUP_HITS}
\`\`\`
"
fi
# --- Compile/marshal/pickle (code object injection) ---
MARSHAL_HITS=$(echo "$DIFF" | grep -n '^\+' | grep -iE 'marshal\.loads|pickle\.loads|compile\(' | grep -v '^\+\s*#' | grep -v 'test_\|re\.compile\|ast\.compile' | head -10 || true)
if [ -n "$MARSHAL_HITS" ]; then
FINDINGS="${FINDINGS}
### ⚠️ WARNING: marshal/pickle/compile usage
These can deserialize or construct executable code objects.
**Matches:**
\`\`\`
${MARSHAL_HITS}
\`\`\`
"
fi
# --- Output results ---
if [ -n "$FINDINGS" ]; then
echo "found=true" >> "$GITHUB_OUTPUT"
if [ "$CRITICAL" = true ]; then
echo "critical=true" >> "$GITHUB_OUTPUT"
else
echo "critical=false" >> "$GITHUB_OUTPUT"
fi
# Write findings to a file (multiline env vars are fragile)
echo "$FINDINGS" > /tmp/findings.md
else
echo "found=false" >> "$GITHUB_OUTPUT"
echo "critical=false" >> "$GITHUB_OUTPUT"
fi
- name: Post warning comment
if: steps.scan.outputs.found == 'true'
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
SEVERITY="⚠️ Supply Chain Risk Detected"
if [ "${{ steps.scan.outputs.critical }}" = "true" ]; then
SEVERITY="🚨 CRITICAL Supply Chain Risk Detected"
fi
BODY="## ${SEVERITY}
This PR contains patterns commonly associated with supply chain attacks. This does **not** mean the PR is malicious — but these patterns require careful human review before merging.
$(cat /tmp/findings.md)
---
*Automated scan triggered by [supply-chain-audit](/.github/workflows/supply-chain-audit.yml). If this is a false positive, a maintainer can approve after manual review.*"
gh pr comment "${{ github.event.pull_request.number }}" --body "$BODY"
- name: Fail on critical findings
if: steps.scan.outputs.critical == 'true'
run: |
echo "::error::CRITICAL supply chain risk patterns detected in this PR. See the PR comment for details."
exit 1

View File

@@ -34,9 +34,37 @@ jobs:
- name: Run tests
run: |
source .venv/bin/activate
python -m pytest tests/ -q --ignore=tests/integration --tb=short -n auto
python -m pytest tests/ -q --ignore=tests/integration --ignore=tests/e2e --tb=short -n auto
env:
# Ensure tests don't accidentally call real APIs
OPENROUTER_API_KEY: ""
OPENAI_API_KEY: ""
NOUS_API_KEY: ""
e2e:
runs-on: ubuntu-latest
timeout-minutes: 10
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Install uv
uses: astral-sh/setup-uv@v5
- name: Set up Python 3.11
run: uv python install 3.11
- name: Install dependencies
run: |
uv venv .venv --python 3.11
source .venv/bin/activate
uv pip install -e ".[all,dev]"
- name: Run e2e tests
run: |
source .venv/bin/activate
python -m pytest tests/e2e/ -v --tb=short
env:
OPENROUTER_API_KEY: ""
OPENAI_API_KEY: ""
NOUS_API_KEY: ""

5
.gitignore vendored
View File

@@ -53,3 +53,8 @@ environments/benchmarks/evals/
# Release script temp files
.release_notes.md
mini-swe-agent/
# Nix
.direnv/
result

3
.gitmodules vendored
View File

@@ -1,6 +1,3 @@
[submodule "mini-swe-agent"]
path = mini-swe-agent
url = https://github.com/SWE-agent/mini-swe-agent
[submodule "tinker-atropos"]
path = tinker-atropos
url = https://github.com/nousresearch/tinker-atropos

139
AGENTS.md
View File

@@ -5,7 +5,7 @@ Instructions for AI coding assistants and developers working on the hermes-agent
## Development Environment
```bash
source .venv/bin/activate # ALWAYS activate before running Python
source venv/bin/activate # ALWAYS activate before running Python
```
## Project Structure
@@ -23,6 +23,7 @@ hermes-agent/
│ ├── prompt_caching.py # Anthropic prompt caching
│ ├── auxiliary_client.py # Auxiliary LLM client (vision, summarization)
│ ├── model_metadata.py # Model context lengths, token estimation
│ ├── models_dev.py # models.dev registry integration (provider-aware context)
│ ├── display.py # KawaiiSpinner, tool preview formatting
│ ├── skill_commands.py # Skill slash commands (shared CLI/gateway)
│ └── trajectory.py # Trajectory saving helpers
@@ -37,6 +38,7 @@ hermes-agent/
│ ├── tools_config.py # `hermes tools` — enable/disable tools per platform
│ ├── skills_hub.py # `/skills` slash command (search, browse, install)
│ ├── models.py # Model catalog, provider model lists
│ ├── model_switch.py # Shared /model switch pipeline (CLI + gateway)
│ └── auth.py # Provider credential resolution
├── tools/ # Tool implementations (one file per tool)
│ ├── registry.py # Central tool registry (schemas, handlers, dispatch)
@@ -44,7 +46,7 @@ hermes-agent/
│ ├── terminal_tool.py # Terminal orchestration
│ ├── process_registry.py # Background process management
│ ├── file_tools.py # File read/write/search/patch
│ ├── web_tools.py # Firecrawl search/extract
│ ├── web_tools.py # Web search/extract (Parallel + Firecrawl)
│ ├── browser_tool.py # Browserbase browser automation
│ ├── code_execution_tool.py # execute_code sandbox
│ ├── delegate_tool.py # Subagent delegation
@@ -129,14 +131,51 @@ Messages follow OpenAI format: `{"role": "system/user/assistant/tool", ...}`. Re
- **KawaiiSpinner** (`agent/display.py`) — animated faces during API calls, `┊` activity feed for tool results
- `load_cli_config()` in cli.py merges hardcoded defaults + user config YAML
- **Skin engine** (`hermes_cli/skin_engine.py`) — data-driven CLI theming; initialized from `display.skin` config key at startup; skins customize banner colors, spinner faces/verbs/wings, tool prefix, response box, branding text
- `process_command()` is a method on `HermesCLI` (not in commands.py)
- `process_command()` is a method on `HermesCLI` — dispatches on canonical command name resolved via `resolve_command()` from the central registry
- Skill slash commands: `agent/skill_commands.py` scans `~/.hermes/skills/`, injects as **user message** (not system prompt) to preserve prompt caching
### Adding CLI Commands
### Slash Command Registry (`hermes_cli/commands.py`)
1. Add to `COMMANDS` dict in `hermes_cli/commands.py`
2. Add handler in `HermesCLI.process_command()` in `cli.py`
3. For persistent settings, use `save_config_value()` in `cli.py`
All slash commands are defined in a central `COMMAND_REGISTRY` list of `CommandDef` objects. Every downstream consumer derives from this registry automatically:
- **CLI** — `process_command()` resolves aliases via `resolve_command()`, dispatches on canonical name
- **Gateway** — `GATEWAY_KNOWN_COMMANDS` frozenset for hook emission, `resolve_command()` for dispatch
- **Gateway help** — `gateway_help_lines()` generates `/help` output
- **Telegram** — `telegram_bot_commands()` generates the BotCommand menu
- **Slack** — `slack_subcommand_map()` generates `/hermes` subcommand routing
- **Autocomplete** — `COMMANDS` flat dict feeds `SlashCommandCompleter`
- **CLI help** — `COMMANDS_BY_CATEGORY` dict feeds `show_help()`
### Adding a Slash Command
1. Add a `CommandDef` entry to `COMMAND_REGISTRY` in `hermes_cli/commands.py`:
```python
CommandDef("mycommand", "Description of what it does", "Session",
aliases=("mc",), args_hint="[arg]"),
```
2. Add handler in `HermesCLI.process_command()` in `cli.py`:
```python
elif canonical == "mycommand":
self._handle_mycommand(cmd_original)
```
3. If the command is available in the gateway, add a handler in `gateway/run.py`:
```python
if canonical == "mycommand":
return await self._handle_mycommand(event)
```
4. For persistent settings, use `save_config_value()` in `cli.py`
**CommandDef fields:**
- `name` — canonical name without slash (e.g. `"background"`)
- `description` — human-readable description
- `category` — one of `"Session"`, `"Configuration"`, `"Tools & Skills"`, `"Info"`, `"Exit"`
- `aliases` — tuple of alternative names (e.g. `("bg",)`)
- `args_hint` — argument placeholder shown in help (e.g. `"<prompt>"`, `"[name]"`)
- `cli_only` — only available in the interactive CLI
- `gateway_only` — only available in messaging platforms
- `gateway_config_gate` — config dotpath (e.g. `"display.tool_progress_command"`); when set on a `cli_only` command, the command becomes available in the gateway if the config value is truthy. `GATEWAY_KNOWN_COMMANDS` always includes config-gated commands so the gateway can dispatch them; help/menus only show them when the gate is open.
**Adding an alias** requires only adding it to the `aliases` tuple on the existing `CommandDef`. No other file changes needed — dispatch, help text, Telegram menu, Slack mapping, and autocomplete all update automatically.
---
@@ -171,6 +210,10 @@ registry.register(
The registry handles schema collection, dispatch, availability checking, and error wrapping. All handlers MUST return a JSON string.
**Path references in tool schemas**: If the schema description mentions file paths (e.g. default output directories), use `display_hermes_home()` to make them profile-aware. The schema is generated at import time, which is after `_apply_profile_override()` sets `HERMES_HOME`.
**State files**: If a tool stores persistent state (caches, logs, checkpoints), use `get_hermes_home()` for the base directory — never `Path.home() / ".hermes"`. This ensures each profile gets its own state.
**Agent-level tools** (todo, memory): intercepted by `run_agent.py` before `handle_function_call()`. See `todo_tool.py` for the pattern.
---
@@ -235,6 +278,7 @@ hermes_cli/skin_engine.py # SkinConfig dataclass, built-in skins, YAML loader
| Spinner verbs | `spinner.thinking_verbs` | `display.py` |
| Spinner wings (optional) | `spinner.wings` | `display.py` |
| Tool output prefix | `tool_prefix` | `display.py` |
| Per-tool emojis | `tool_emojis` | `display.py``get_tool_emoji()` |
| Agent name | `branding.agent_name` | `banner.py`, `cli.py` |
| Welcome message | `branding.welcome` | `cli.py` |
| Response box label | `branding.response_label` | `cli.py` |
@@ -318,8 +362,69 @@ in config.yaml (or `HERMES_BACKGROUND_NOTIFICATIONS` env var):
---
## Profiles: Multi-Instance Support
Hermes supports **profiles** — multiple fully isolated instances, each with its own
`HERMES_HOME` directory (config, API keys, memory, sessions, skills, gateway, etc.).
The core mechanism: `_apply_profile_override()` in `hermes_cli/main.py` sets
`HERMES_HOME` before any module imports. All 119+ references to `get_hermes_home()`
automatically scope to the active profile.
### Rules for profile-safe code
1. **Use `get_hermes_home()` for all HERMES_HOME paths.** Import from `hermes_constants`.
NEVER hardcode `~/.hermes` or `Path.home() / ".hermes"` in code that reads/writes state.
```python
# GOOD
from hermes_constants import get_hermes_home
config_path = get_hermes_home() / "config.yaml"
# BAD — breaks profiles
config_path = Path.home() / ".hermes" / "config.yaml"
```
2. **Use `display_hermes_home()` for user-facing messages.** Import from `hermes_constants`.
This returns `~/.hermes` for default or `~/.hermes/profiles/<name>` for profiles.
```python
# GOOD
from hermes_constants import display_hermes_home
print(f"Config saved to {display_hermes_home()}/config.yaml")
# BAD — shows wrong path for profiles
print("Config saved to ~/.hermes/config.yaml")
```
3. **Module-level constants are fine** — they cache `get_hermes_home()` at import time,
which is AFTER `_apply_profile_override()` sets the env var. Just use `get_hermes_home()`,
not `Path.home() / ".hermes"`.
4. **Tests that mock `Path.home()` must also set `HERMES_HOME`** — since code now uses
`get_hermes_home()` (reads env var), not `Path.home() / ".hermes"`:
```python
with patch.object(Path, "home", return_value=tmp_path), \
patch.dict(os.environ, {"HERMES_HOME": str(tmp_path / ".hermes")}):
...
```
5. **Gateway platform adapters should use token locks** — if the adapter connects with
a unique credential (bot token, API key), call `acquire_scoped_lock()` from
`gateway.status` in the `connect()`/`start()` method and `release_scoped_lock()` in
`disconnect()`/`stop()`. This prevents two profiles from using the same credential.
See `gateway/platforms/telegram.py` for the canonical pattern.
6. **Profile operations are HOME-anchored, not HERMES_HOME-anchored** — `_get_profiles_root()`
returns `Path.home() / ".hermes" / "profiles"`, NOT `get_hermes_home() / "profiles"`.
This is intentional — it lets `hermes -p coder profile list` see all profiles regardless
of which one is active.
## Known Pitfalls
### DO NOT hardcode `~/.hermes` paths
Use `get_hermes_home()` from `hermes_constants` for code paths. Use `display_hermes_home()`
for user-facing print/log messages. Hardcoding `~/.hermes` breaks profiles — each profile
has its own `HERMES_HOME` directory. This was the source of 5 bugs fixed in PR #3575.
### DO NOT use `simple_term_menu` for interactive menus
Rendering bugs in tmux/iTerm2 — ghosting on scroll. Use `curses` (stdlib) instead. See `hermes_cli/tools_config.py` for the pattern.
@@ -327,17 +432,33 @@ Rendering bugs in tmux/iTerm2 — ghosting on scroll. Use `curses` (stdlib) inst
Leaks as literal `?[K` text under `prompt_toolkit`'s `patch_stdout`. Use space-padding: `f"\r{line}{' ' * pad}"`.
### `_last_resolved_tool_names` is a process-global in `model_tools.py`
When subagents overwrite this global, `execute_code` calls after delegation may fail with missing tool imports. Known bug.
`_run_single_child()` in `delegate_tool.py` saves and restores this global around subagent execution. If you add new code that reads this global, be aware it may be temporarily stale during child agent runs.
### DO NOT hardcode cross-tool references in schema descriptions
Tool schema descriptions must not mention tools from other toolsets by name (e.g., `browser_navigate` saying "prefer web_search"). Those tools may be unavailable (missing API keys, disabled toolset), causing the model to hallucinate calls to non-existent tools. If a cross-reference is needed, add it dynamically in `get_tool_definitions()` in `model_tools.py` — see the `browser_navigate` / `execute_code` post-processing blocks for the pattern.
### Tests must not write to `~/.hermes/`
The `_isolate_hermes_home` autouse fixture in `tests/conftest.py` redirects `HERMES_HOME` to a temp dir. Never hardcode `~/.hermes/` paths in tests.
**Profile tests**: When testing profile features, also mock `Path.home()` so that
`_get_profiles_root()` and `_get_default_hermes_home()` resolve within the temp dir.
Use the pattern from `tests/hermes_cli/test_profiles.py`:
```python
@pytest.fixture
def profile_env(tmp_path, monkeypatch):
home = tmp_path / ".hermes"
home.mkdir()
monkeypatch.setattr(Path, "home", lambda: tmp_path)
monkeypatch.setenv("HERMES_HOME", str(home))
return home
```
---
## Testing
```bash
source .venv/bin/activate
source venv/bin/activate
python -m pytest tests/ -q # Full suite (~3000 tests, ~3 min)
python -m pytest tests/test_model_tools.py -q # Toolset resolution
python -m pytest tests/test_cli_init.py -q # CLI config loading

View File

@@ -72,8 +72,9 @@ export VIRTUAL_ENV="$(pwd)/venv"
# Install with all extras (messaging, cron, CLI menus, dev tools)
uv pip install -e ".[all,dev]"
uv pip install -e "./mini-swe-agent"
uv pip install -e "./tinker-atropos"
# Optional: RL training submodule
# git submodule update --init tinker-atropos && uv pip install -e "./tinker-atropos"
# Optional: browser tools
npm install
@@ -136,7 +137,7 @@ hermes-agent/
│ ├── auth.py # Provider resolution, OAuth, Nous Portal
│ ├── models.py # OpenRouter model selection lists
│ ├── banner.py # Welcome banner, ASCII art
│ ├── commands.py # Slash command definitions + autocomplete
│ ├── commands.py # Central slash command registry (CommandDef), autocomplete, gateway helpers
│ ├── callbacks.py # Interactive callbacks (clarify, sudo, approval)
│ ├── doctor.py # Diagnostics
│ ├── skills_hub.py # Skills Hub CLI + /skills slash command
@@ -147,7 +148,7 @@ hermes-agent/
│ ├── approval.py # Dangerous command detection + per-session approval
│ ├── terminal_tool.py # Terminal orchestration (sudo, env lifecycle, backends)
│ ├── file_operations.py # read_file, write_file, search, patch, etc.
│ ├── web_tools.py # web_search, web_extract (Firecrawl + Gemini summarization)
│ ├── web_tools.py # web_search, web_extract (Parallel/Firecrawl + Gemini summarization)
│ ├── vision_tools.py # Image analysis via multimodal models
│ ├── delegate_tool.py # Subagent spawning and parallel task execution
│ ├── code_execution_tool.py # Sandboxed Python with RPC tool access

25
Dockerfile Normal file
View File

@@ -0,0 +1,25 @@
FROM debian:13.4
# Install system dependencies in one layer, clear APT cache
RUN apt-get update && \
apt-get install -y --no-install-recommends \
build-essential nodejs npm python3 python3-pip ripgrep ffmpeg gcc python3-dev libffi-dev && \
rm -rf /var/lib/apt/lists/*
COPY . /opt/hermes
WORKDIR /opt/hermes
# Install Python and Node dependencies in one layer, no cache
RUN pip install --no-cache-dir -e ".[all]" --break-system-packages && \
npm install --prefer-offline --no-audit && \
npx playwright install --with-deps chromium --only-shell && \
cd /opt/hermes/scripts/whatsapp-bridge && \
npm install --prefer-offline --no-audit && \
npm cache clean --force
WORKDIR /opt/hermes
RUN chmod +x /opt/hermes/docker/entrypoint.sh
ENV HERMES_HOME=/opt/data
VOLUME [ "/opt/data" ]
ENTRYPOINT [ "/opt/hermes/docker/entrypoint.sh" ]

4
MANIFEST.in Normal file
View File

@@ -0,0 +1,4 @@
graft skills
graft optional-skills
global-exclude __pycache__
global-exclude *.py[cod]

View File

@@ -2,7 +2,7 @@
<img src="assets/banner.png" alt="Hermes Agent" width="100%">
</p>
# Hermes Agent
# Hermes Agent
<p align="center">
<a href="https://hermes-agent.nousresearch.com/docs/"><img src="https://img.shields.io/badge/Docs-hermes--agent.nousresearch.com-FFD700?style=for-the-badge" alt="Documentation"></a>
@@ -62,6 +62,24 @@ hermes doctor # Diagnose any issues
📖 **[Full documentation →](https://hermes-agent.nousresearch.com/docs/)**
## CLI vs Messaging Quick Reference
Hermes has two entry points: start the terminal UI with `hermes`, or run the gateway and talk to it from Telegram, Discord, Slack, WhatsApp, Signal, or Email. Once you're in a conversation, many slash commands are shared across both interfaces.
| Action | CLI | Messaging platforms |
|---------|-----|---------------------|
| Start chatting | `hermes` | Run `hermes gateway setup` + `hermes gateway start`, then send the bot a message |
| Start fresh conversation | `/new` or `/reset` | `/new` or `/reset` |
| Change model | `/model [provider:model]` | `/model [provider:model]` |
| Set a personality | `/personality [name]` | `/personality [name]` |
| Retry or undo the last turn | `/retry`, `/undo` | `/retry`, `/undo` |
| Compress context / check usage | `/compress`, `/usage`, `/insights [--days N]` | `/compress`, `/usage`, `/insights [days]` |
| Browse skills | `/skills` or `/<skill-name>` | `/skills` or `/<skill-name>` |
| Interrupt current work | `Ctrl+C` or send a new message | `/stop` or send a new message |
| Platform-specific status | `/platforms` | `/status`, `/sethome` |
For the full command lists, see the [CLI guide](https://hermes-agent.nousresearch.com/docs/user-guide/cli) and the [Messaging Gateway guide](https://hermes-agent.nousresearch.com/docs/user-guide/messaging).
---
## Documentation
@@ -126,16 +144,14 @@ Quick start for contributors:
```bash
git clone https://github.com/NousResearch/hermes-agent.git
cd hermes-agent
git submodule update --init mini-swe-agent # required terminal backend
curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv .venv --python 3.11
source .venv/bin/activate
uv venv venv --python 3.11
source venv/bin/activate
uv pip install -e ".[all,dev]"
uv pip install -e "./mini-swe-agent"
python -m pytest tests/ -q
```
> **RL Training (optional):** To work on the RL/Tinker-Atropos integration, also run:
> **RL Training (optional):** To work on the RL/Tinker-Atropos integration:
> ```bash
> git submodule update --init tinker-atropos
> uv pip install -e "./tinker-atropos"

377
RELEASE_v0.3.0.md Normal file
View File

@@ -0,0 +1,377 @@
# Hermes Agent v0.3.0 (v2026.3.17)
**Release Date:** March 17, 2026
> The streaming, plugins, and provider release — unified real-time token delivery, first-class plugin architecture, rebuilt provider system with Vercel AI Gateway, native Anthropic provider, smart approvals, live Chrome CDP browser connect, ACP IDE integration, Honcho memory, voice mode, persistent shell, and 50+ bug fixes across every platform.
---
## ✨ Highlights
- **Unified Streaming Infrastructure** — Real-time token-by-token delivery in CLI and all gateway platforms. Responses stream as they're generated instead of arriving as a block. ([#1538](https://github.com/NousResearch/hermes-agent/pull/1538))
- **First-Class Plugin Architecture** — Drop Python files into `~/.hermes/plugins/` to extend Hermes with custom tools, commands, and hooks. No forking required. ([#1544](https://github.com/NousResearch/hermes-agent/pull/1544), [#1555](https://github.com/NousResearch/hermes-agent/pull/1555))
- **Native Anthropic Provider** — Direct Anthropic API calls with Claude Code credential auto-discovery, OAuth PKCE flows, and native prompt caching. No OpenRouter middleman needed. ([#1097](https://github.com/NousResearch/hermes-agent/pull/1097))
- **Smart Approvals + /stop Command** — Codex-inspired approval system that learns which commands are safe and remembers your preferences. `/stop` kills the current agent run immediately. ([#1543](https://github.com/NousResearch/hermes-agent/pull/1543))
- **Honcho Memory Integration** — Async memory writes, configurable recall modes, session title integration, and multi-user isolation in gateway mode. By @erosika. ([#736](https://github.com/NousResearch/hermes-agent/pull/736))
- **Voice Mode** — Push-to-talk in CLI, voice notes in Telegram/Discord, Discord voice channel support, and local Whisper transcription via faster-whisper. ([#1299](https://github.com/NousResearch/hermes-agent/pull/1299), [#1185](https://github.com/NousResearch/hermes-agent/pull/1185), [#1429](https://github.com/NousResearch/hermes-agent/pull/1429))
- **Concurrent Tool Execution** — Multiple independent tool calls now run in parallel via ThreadPoolExecutor, significantly reducing latency for multi-tool turns. ([#1152](https://github.com/NousResearch/hermes-agent/pull/1152))
- **PII Redaction** — When `privacy.redact_pii` is enabled, personally identifiable information is automatically scrubbed before sending context to LLM providers. ([#1542](https://github.com/NousResearch/hermes-agent/pull/1542))
- **`/browser connect` via CDP** — Attach browser tools to a live Chrome instance through Chrome DevTools Protocol. Debug, inspect, and interact with pages you already have open. ([#1549](https://github.com/NousResearch/hermes-agent/pull/1549))
- **Vercel AI Gateway Provider** — Route Hermes through Vercel's AI Gateway for access to their model catalog and infrastructure. ([#1628](https://github.com/NousResearch/hermes-agent/pull/1628))
- **Centralized Provider Router** — Rebuilt provider system with `call_llm` API, unified `/model` command, auto-detect provider on model switch, and direct endpoint overrides for auxiliary/delegation clients. ([#1003](https://github.com/NousResearch/hermes-agent/pull/1003), [#1506](https://github.com/NousResearch/hermes-agent/pull/1506), [#1375](https://github.com/NousResearch/hermes-agent/pull/1375))
- **ACP Server (IDE Integration)** — VS Code, Zed, and JetBrains can now connect to Hermes as an agent backend, with full slash command support. ([#1254](https://github.com/NousResearch/hermes-agent/pull/1254), [#1532](https://github.com/NousResearch/hermes-agent/pull/1532))
- **Persistent Shell Mode** — Local and SSH terminal backends can maintain shell state across tool calls — cd, env vars, and aliases persist. By @alt-glitch. ([#1067](https://github.com/NousResearch/hermes-agent/pull/1067), [#1483](https://github.com/NousResearch/hermes-agent/pull/1483))
- **Agentic On-Policy Distillation (OPD)** — New RL training environment for distilling agent policies, expanding the Atropos training ecosystem. ([#1149](https://github.com/NousResearch/hermes-agent/pull/1149))
---
## 🏗️ Core Agent & Architecture
### Provider & Model Support
- **Centralized provider router** with `call_llm` API and unified `/model` command — switch models and providers seamlessly ([#1003](https://github.com/NousResearch/hermes-agent/pull/1003))
- **Vercel AI Gateway** provider support ([#1628](https://github.com/NousResearch/hermes-agent/pull/1628))
- **Auto-detect provider** when switching models via `/model` ([#1506](https://github.com/NousResearch/hermes-agent/pull/1506))
- **Direct endpoint overrides** for auxiliary and delegation clients — point vision/subagent calls at specific endpoints ([#1375](https://github.com/NousResearch/hermes-agent/pull/1375))
- **Native Anthropic auxiliary vision** — use Claude's native vision API instead of routing through OpenAI-compatible endpoints ([#1377](https://github.com/NousResearch/hermes-agent/pull/1377))
- Anthropic OAuth flow improvements — auto-run `claude setup-token`, reauthentication, PKCE state persistence, identity fingerprinting ([#1132](https://github.com/NousResearch/hermes-agent/pull/1132), [#1360](https://github.com/NousResearch/hermes-agent/pull/1360), [#1396](https://github.com/NousResearch/hermes-agent/pull/1396), [#1597](https://github.com/NousResearch/hermes-agent/pull/1597))
- Fix adaptive thinking without `budget_tokens` for Claude 4.6 models — by @ASRagab ([#1128](https://github.com/NousResearch/hermes-agent/pull/1128))
- Fix Anthropic cache markers through adapter — by @brandtcormorant ([#1216](https://github.com/NousResearch/hermes-agent/pull/1216))
- Retry Anthropic 429/529 errors and surface details to users — by @0xbyt4 ([#1585](https://github.com/NousResearch/hermes-agent/pull/1585))
- Fix Anthropic adapter max_tokens, fallback crash, proxy base_url — by @0xbyt4 ([#1121](https://github.com/NousResearch/hermes-agent/pull/1121))
- Fix DeepSeek V3 parser dropping multiple parallel tool calls — by @mr-emmett-one ([#1365](https://github.com/NousResearch/hermes-agent/pull/1365), [#1300](https://github.com/NousResearch/hermes-agent/pull/1300))
- Accept unlisted models with warning instead of rejecting ([#1047](https://github.com/NousResearch/hermes-agent/pull/1047), [#1102](https://github.com/NousResearch/hermes-agent/pull/1102))
- Skip reasoning params for unsupported OpenRouter models ([#1485](https://github.com/NousResearch/hermes-agent/pull/1485))
- MiniMax Anthropic API compatibility fix ([#1623](https://github.com/NousResearch/hermes-agent/pull/1623))
- Custom endpoint `/models` verification and `/v1` base URL suggestion ([#1480](https://github.com/NousResearch/hermes-agent/pull/1480))
- Resolve delegation providers from `custom_providers` config ([#1328](https://github.com/NousResearch/hermes-agent/pull/1328))
- Kimi model additions and User-Agent fix ([#1039](https://github.com/NousResearch/hermes-agent/pull/1039))
- Strip `call_id`/`response_item_id` for Mistral compatibility ([#1058](https://github.com/NousResearch/hermes-agent/pull/1058))
### Agent Loop & Conversation
- **Anthropic Context Editing API** support ([#1147](https://github.com/NousResearch/hermes-agent/pull/1147))
- Improved context compaction handoff summaries — compressor now preserves more actionable state ([#1273](https://github.com/NousResearch/hermes-agent/pull/1273))
- Sync session_id after mid-run context compression ([#1160](https://github.com/NousResearch/hermes-agent/pull/1160))
- Session hygiene threshold tuned to 50% for more proactive compression ([#1096](https://github.com/NousResearch/hermes-agent/pull/1096), [#1161](https://github.com/NousResearch/hermes-agent/pull/1161))
- Include session ID in system prompt via `--pass-session-id` flag ([#1040](https://github.com/NousResearch/hermes-agent/pull/1040))
- Prevent closed OpenAI client reuse across retries ([#1391](https://github.com/NousResearch/hermes-agent/pull/1391))
- Sanitize chat payloads and provider precedence ([#1253](https://github.com/NousResearch/hermes-agent/pull/1253))
- Handle dict tool call arguments from Codex and local backends ([#1393](https://github.com/NousResearch/hermes-agent/pull/1393), [#1440](https://github.com/NousResearch/hermes-agent/pull/1440))
### Memory & Sessions
- **Improve memory prioritization** — user preferences and corrections weighted above procedural knowledge ([#1548](https://github.com/NousResearch/hermes-agent/pull/1548))
- Tighter memory and session recall guidance in system prompts ([#1329](https://github.com/NousResearch/hermes-agent/pull/1329))
- Persist CLI token counts to session DB for `/insights` ([#1498](https://github.com/NousResearch/hermes-agent/pull/1498))
- Keep Honcho recall out of the cached system prefix ([#1201](https://github.com/NousResearch/hermes-agent/pull/1201))
- Correct `seed_ai_identity` to use `session.add_messages()` ([#1475](https://github.com/NousResearch/hermes-agent/pull/1475))
- Isolate Honcho session routing for multi-user gateway ([#1500](https://github.com/NousResearch/hermes-agent/pull/1500))
---
## 📱 Messaging Platforms (Gateway)
### Gateway Core
- **System gateway service mode** — run as a system-level systemd service, not just user-level ([#1371](https://github.com/NousResearch/hermes-agent/pull/1371))
- **Gateway install scope prompts** — choose user vs system scope during setup ([#1374](https://github.com/NousResearch/hermes-agent/pull/1374))
- **Reasoning hot reload** — change reasoning settings without restarting the gateway ([#1275](https://github.com/NousResearch/hermes-agent/pull/1275))
- Default group sessions to per-user isolation — no more shared state across users in group chats ([#1495](https://github.com/NousResearch/hermes-agent/pull/1495), [#1417](https://github.com/NousResearch/hermes-agent/pull/1417))
- Harden gateway restart recovery ([#1310](https://github.com/NousResearch/hermes-agent/pull/1310))
- Cancel active runs during shutdown ([#1427](https://github.com/NousResearch/hermes-agent/pull/1427))
- SSL certificate auto-detection for NixOS and non-standard systems ([#1494](https://github.com/NousResearch/hermes-agent/pull/1494))
- Auto-detect D-Bus session bus for `systemctl --user` on headless servers ([#1601](https://github.com/NousResearch/hermes-agent/pull/1601))
- Auto-enable systemd linger during gateway install on headless servers ([#1334](https://github.com/NousResearch/hermes-agent/pull/1334))
- Fall back to module entrypoint when `hermes` is not on PATH ([#1355](https://github.com/NousResearch/hermes-agent/pull/1355))
- Fix dual gateways on macOS launchd after `hermes update` ([#1567](https://github.com/NousResearch/hermes-agent/pull/1567))
- Remove recursive ExecStop from systemd units ([#1530](https://github.com/NousResearch/hermes-agent/pull/1530))
- Prevent logging handler accumulation in gateway mode ([#1251](https://github.com/NousResearch/hermes-agent/pull/1251))
- Restart on retryable startup failures — by @jplew ([#1517](https://github.com/NousResearch/hermes-agent/pull/1517))
- Backfill model on gateway sessions after agent runs ([#1306](https://github.com/NousResearch/hermes-agent/pull/1306))
- PID-based gateway kill and deferred config write ([#1499](https://github.com/NousResearch/hermes-agent/pull/1499))
### Telegram
- Buffer media groups to prevent self-interruption from photo bursts ([#1341](https://github.com/NousResearch/hermes-agent/pull/1341), [#1422](https://github.com/NousResearch/hermes-agent/pull/1422))
- Retry on transient TLS failures during connect and send ([#1535](https://github.com/NousResearch/hermes-agent/pull/1535))
- Harden polling conflict handling ([#1339](https://github.com/NousResearch/hermes-agent/pull/1339))
- Escape chunk indicators and inline code in MarkdownV2 ([#1478](https://github.com/NousResearch/hermes-agent/pull/1478), [#1626](https://github.com/NousResearch/hermes-agent/pull/1626))
- Check updater/app state before disconnect ([#1389](https://github.com/NousResearch/hermes-agent/pull/1389))
### Discord
- `/thread` command with `auto_thread` config and media metadata fixes ([#1178](https://github.com/NousResearch/hermes-agent/pull/1178))
- Auto-thread on @mention, skip mention text in bot threads ([#1438](https://github.com/NousResearch/hermes-agent/pull/1438))
- Retry without reply reference for system messages ([#1385](https://github.com/NousResearch/hermes-agent/pull/1385))
- Preserve native document and video attachment support ([#1392](https://github.com/NousResearch/hermes-agent/pull/1392))
- Defer discord adapter annotations to avoid optional import crashes ([#1314](https://github.com/NousResearch/hermes-agent/pull/1314))
### Slack
- Thread handling overhaul — progress messages, responses, and session isolation all respect threads ([#1103](https://github.com/NousResearch/hermes-agent/pull/1103))
- Formatting, reactions, user resolution, and command improvements ([#1106](https://github.com/NousResearch/hermes-agent/pull/1106))
- Fix MAX_MESSAGE_LENGTH 3900 → 39000 ([#1117](https://github.com/NousResearch/hermes-agent/pull/1117))
- File upload fallback preserves thread context — by @0xbyt4 ([#1122](https://github.com/NousResearch/hermes-agent/pull/1122))
- Improve setup guidance ([#1387](https://github.com/NousResearch/hermes-agent/pull/1387))
### Email
- Fix IMAP UID tracking and SMTP TLS verification ([#1305](https://github.com/NousResearch/hermes-agent/pull/1305))
- Add `skip_attachments` option via config.yaml ([#1536](https://github.com/NousResearch/hermes-agent/pull/1536))
### Home Assistant
- Event filtering closed by default ([#1169](https://github.com/NousResearch/hermes-agent/pull/1169))
---
## 🖥️ CLI & User Experience
### Interactive CLI
- **Persistent CLI status bar** — always-visible model, provider, and token counts ([#1522](https://github.com/NousResearch/hermes-agent/pull/1522))
- **File path autocomplete** in the input prompt ([#1545](https://github.com/NousResearch/hermes-agent/pull/1545))
- **`/plan` command** — generate implementation plans from specs ([#1372](https://github.com/NousResearch/hermes-agent/pull/1372), [#1381](https://github.com/NousResearch/hermes-agent/pull/1381))
- **Major `/rollback` improvements** — richer checkpoint history, clearer UX ([#1505](https://github.com/NousResearch/hermes-agent/pull/1505))
- **Preload CLI skills on launch** — skills are ready before the first prompt ([#1359](https://github.com/NousResearch/hermes-agent/pull/1359))
- **Centralized slash command registry** — all commands defined once, consumed everywhere ([#1603](https://github.com/NousResearch/hermes-agent/pull/1603))
- `/bg` alias for `/background` ([#1590](https://github.com/NousResearch/hermes-agent/pull/1590))
- Prefix matching for slash commands — `/mod` resolves to `/model` ([#1320](https://github.com/NousResearch/hermes-agent/pull/1320))
- `/new`, `/reset`, `/clear` now start genuinely fresh sessions ([#1237](https://github.com/NousResearch/hermes-agent/pull/1237))
- Accept session ID prefixes for session actions ([#1425](https://github.com/NousResearch/hermes-agent/pull/1425))
- TUI prompt and accent output now respect active skin ([#1282](https://github.com/NousResearch/hermes-agent/pull/1282))
- Centralize tool emoji metadata in registry + skin integration ([#1484](https://github.com/NousResearch/hermes-agent/pull/1484))
- "View full command" option added to dangerous command approval — by @teknium1 based on design by community ([#887](https://github.com/NousResearch/hermes-agent/pull/887))
- Non-blocking startup update check and banner deduplication ([#1386](https://github.com/NousResearch/hermes-agent/pull/1386))
- `/reasoning` command output ordering and inline think extraction fixes ([#1031](https://github.com/NousResearch/hermes-agent/pull/1031))
- Verbose mode shows full untruncated output ([#1472](https://github.com/NousResearch/hermes-agent/pull/1472))
- Fix `/status` to report live state and tokens ([#1476](https://github.com/NousResearch/hermes-agent/pull/1476))
- Seed a default global SOUL.md ([#1311](https://github.com/NousResearch/hermes-agent/pull/1311))
### Setup & Configuration
- **OpenClaw migration** during first-time setup — by @kshitijk4poor ([#981](https://github.com/NousResearch/hermes-agent/pull/981))
- `hermes claw migrate` command + migration docs ([#1059](https://github.com/NousResearch/hermes-agent/pull/1059))
- Smart vision setup that respects the user's chosen provider ([#1323](https://github.com/NousResearch/hermes-agent/pull/1323))
- Handle headless setup flows end-to-end ([#1274](https://github.com/NousResearch/hermes-agent/pull/1274))
- Prefer curses over `simple_term_menu` in setup.py ([#1487](https://github.com/NousResearch/hermes-agent/pull/1487))
- Show effective model and provider in `/status` ([#1284](https://github.com/NousResearch/hermes-agent/pull/1284))
- Config set examples use placeholder syntax ([#1322](https://github.com/NousResearch/hermes-agent/pull/1322))
- Reload .env over stale shell overrides ([#1434](https://github.com/NousResearch/hermes-agent/pull/1434))
- Fix is_coding_plan NameError crash — by @0xbyt4 ([#1123](https://github.com/NousResearch/hermes-agent/pull/1123))
- Add missing packages to setuptools config — by @alt-glitch ([#912](https://github.com/NousResearch/hermes-agent/pull/912))
- Installer: clarify why sudo is needed at every prompt ([#1602](https://github.com/NousResearch/hermes-agent/pull/1602))
---
## 🔧 Tool System
### Terminal & Execution
- **Persistent shell mode** for local and SSH backends — maintain shell state across tool calls — by @alt-glitch ([#1067](https://github.com/NousResearch/hermes-agent/pull/1067), [#1483](https://github.com/NousResearch/hermes-agent/pull/1483))
- **Tirith pre-exec command scanning** — security layer that analyzes commands before execution ([#1256](https://github.com/NousResearch/hermes-agent/pull/1256))
- Strip Hermes provider env vars from all subprocess environments ([#1157](https://github.com/NousResearch/hermes-agent/pull/1157), [#1172](https://github.com/NousResearch/hermes-agent/pull/1172), [#1399](https://github.com/NousResearch/hermes-agent/pull/1399), [#1419](https://github.com/NousResearch/hermes-agent/pull/1419)) — initial fix by @eren-karakus0
- SSH preflight check ([#1486](https://github.com/NousResearch/hermes-agent/pull/1486))
- Docker backend: make cwd workspace mount explicit opt-in ([#1534](https://github.com/NousResearch/hermes-agent/pull/1534))
- Add project root to PYTHONPATH in execute_code sandbox ([#1383](https://github.com/NousResearch/hermes-agent/pull/1383))
- Eliminate execute_code progress spam on gateway platforms ([#1098](https://github.com/NousResearch/hermes-agent/pull/1098))
- Clearer docker backend preflight errors ([#1276](https://github.com/NousResearch/hermes-agent/pull/1276))
### Browser
- **`/browser connect`** — attach browser tools to a live Chrome instance via CDP ([#1549](https://github.com/NousResearch/hermes-agent/pull/1549))
- Improve browser cleanup, local browser PATH setup, and screenshot recovery ([#1333](https://github.com/NousResearch/hermes-agent/pull/1333))
### MCP
- **Selective tool loading** with utility policies — filter which MCP tools are available ([#1302](https://github.com/NousResearch/hermes-agent/pull/1302))
- Auto-reload MCP tools when `mcp_servers` config changes without restart ([#1474](https://github.com/NousResearch/hermes-agent/pull/1474))
- Resolve npx stdio connection failures ([#1291](https://github.com/NousResearch/hermes-agent/pull/1291))
- Preserve MCP toolsets when saving platform tool config ([#1421](https://github.com/NousResearch/hermes-agent/pull/1421))
### Vision
- Unify vision backend gating ([#1367](https://github.com/NousResearch/hermes-agent/pull/1367))
- Surface actual error reason instead of generic message ([#1338](https://github.com/NousResearch/hermes-agent/pull/1338))
- Make Claude image handling work end-to-end ([#1408](https://github.com/NousResearch/hermes-agent/pull/1408))
### Cron
- **Compress cron management into one tool** — single `cronjob` tool replaces multiple commands ([#1343](https://github.com/NousResearch/hermes-agent/pull/1343))
- Suppress duplicate cron sends to auto-delivery targets ([#1357](https://github.com/NousResearch/hermes-agent/pull/1357))
- Persist cron sessions to SQLite ([#1255](https://github.com/NousResearch/hermes-agent/pull/1255))
- Per-job runtime overrides (provider, model, base_url) ([#1398](https://github.com/NousResearch/hermes-agent/pull/1398))
- Atomic write in `save_job_output` to prevent data loss on crash ([#1173](https://github.com/NousResearch/hermes-agent/pull/1173))
- Preserve thread context for `deliver=origin` ([#1437](https://github.com/NousResearch/hermes-agent/pull/1437))
### Patch Tool
- Avoid corrupting pipe chars in V4A patch apply ([#1286](https://github.com/NousResearch/hermes-agent/pull/1286))
- Permissive `block_anchor` thresholds and unicode normalization ([#1539](https://github.com/NousResearch/hermes-agent/pull/1539))
### Delegation
- Add observability metadata to subagent results (model, tokens, duration, tool trace) ([#1175](https://github.com/NousResearch/hermes-agent/pull/1175))
---
## 🧩 Skills Ecosystem
### Skills System
- **Integrate skills.sh** as a hub source alongside ClawHub ([#1303](https://github.com/NousResearch/hermes-agent/pull/1303))
- Secure skill env setup on load ([#1153](https://github.com/NousResearch/hermes-agent/pull/1153))
- Honor policy table for dangerous verdicts ([#1330](https://github.com/NousResearch/hermes-agent/pull/1330))
- Harden ClawHub skill search exact matches ([#1400](https://github.com/NousResearch/hermes-agent/pull/1400))
- Fix ClawHub skill install — use `/download` ZIP endpoint ([#1060](https://github.com/NousResearch/hermes-agent/pull/1060))
- Avoid mislabeling local skills as builtin — by @arceus77-7 ([#862](https://github.com/NousResearch/hermes-agent/pull/862))
### New Skills
- **Linear** project management ([#1230](https://github.com/NousResearch/hermes-agent/pull/1230))
- **X/Twitter** via x-cli ([#1285](https://github.com/NousResearch/hermes-agent/pull/1285))
- **Telephony** — Twilio, SMS, and AI calls ([#1289](https://github.com/NousResearch/hermes-agent/pull/1289))
- **1Password** — by @arceus77-7 ([#883](https://github.com/NousResearch/hermes-agent/pull/883), [#1179](https://github.com/NousResearch/hermes-agent/pull/1179))
- **NeuroSkill BCI** integration ([#1135](https://github.com/NousResearch/hermes-agent/pull/1135))
- **Blender MCP** for 3D modeling ([#1531](https://github.com/NousResearch/hermes-agent/pull/1531))
- **OSS Security Forensics** ([#1482](https://github.com/NousResearch/hermes-agent/pull/1482))
- **Parallel CLI** research skill ([#1301](https://github.com/NousResearch/hermes-agent/pull/1301))
- **OpenCode** CLI skill ([#1174](https://github.com/NousResearch/hermes-agent/pull/1174))
- **ASCII Video** skill refactored — by @SHL0MS ([#1213](https://github.com/NousResearch/hermes-agent/pull/1213), [#1598](https://github.com/NousResearch/hermes-agent/pull/1598))
---
## 🎙️ Voice Mode
- Voice mode foundation — push-to-talk CLI, Telegram/Discord voice notes ([#1299](https://github.com/NousResearch/hermes-agent/pull/1299))
- Free local Whisper transcription via faster-whisper ([#1185](https://github.com/NousResearch/hermes-agent/pull/1185))
- Discord voice channel reliability fixes ([#1429](https://github.com/NousResearch/hermes-agent/pull/1429))
- Restore local STT fallback for gateway voice notes ([#1490](https://github.com/NousResearch/hermes-agent/pull/1490))
- Honor `stt.enabled: false` across gateway transcription ([#1394](https://github.com/NousResearch/hermes-agent/pull/1394))
- Fix bogus incapability message on Telegram voice notes (Issue [#1033](https://github.com/NousResearch/hermes-agent/issues/1033))
---
## 🔌 ACP (IDE Integration)
- Restore ACP server implementation ([#1254](https://github.com/NousResearch/hermes-agent/pull/1254))
- Support slash commands in ACP adapter ([#1532](https://github.com/NousResearch/hermes-agent/pull/1532))
---
## 🧪 RL Training
- **Agentic On-Policy Distillation (OPD)** environment — new RL training environment for agent policy distillation ([#1149](https://github.com/NousResearch/hermes-agent/pull/1149))
- Make tinker-atropos RL training fully optional ([#1062](https://github.com/NousResearch/hermes-agent/pull/1062))
---
## 🔒 Security & Reliability
### Security Hardening
- **Tirith pre-exec command scanning** — static analysis of terminal commands before execution ([#1256](https://github.com/NousResearch/hermes-agent/pull/1256))
- **PII redaction** when `privacy.redact_pii` is enabled ([#1542](https://github.com/NousResearch/hermes-agent/pull/1542))
- Strip Hermes provider/gateway/tool env vars from all subprocess environments ([#1157](https://github.com/NousResearch/hermes-agent/pull/1157), [#1172](https://github.com/NousResearch/hermes-agent/pull/1172), [#1399](https://github.com/NousResearch/hermes-agent/pull/1399), [#1419](https://github.com/NousResearch/hermes-agent/pull/1419))
- Docker cwd workspace mount now explicit opt-in — never auto-mount host directories ([#1534](https://github.com/NousResearch/hermes-agent/pull/1534))
- Escape parens and braces in fork bomb regex pattern ([#1397](https://github.com/NousResearch/hermes-agent/pull/1397))
- Harden `.worktreeinclude` path containment ([#1388](https://github.com/NousResearch/hermes-agent/pull/1388))
- Use description as `pattern_key` to prevent approval collisions ([#1395](https://github.com/NousResearch/hermes-agent/pull/1395))
### Reliability
- Guard init-time stdio writes ([#1271](https://github.com/NousResearch/hermes-agent/pull/1271))
- Session log writes reuse shared atomic JSON helper ([#1280](https://github.com/NousResearch/hermes-agent/pull/1280))
- Atomic temp cleanup protected on interrupts ([#1401](https://github.com/NousResearch/hermes-agent/pull/1401))
---
## 🐛 Notable Bug Fixes
- **`/status` always showing 0 tokens** — now reports live state (Issue [#1465](https://github.com/NousResearch/hermes-agent/issues/1465), [#1476](https://github.com/NousResearch/hermes-agent/pull/1476))
- **Custom model endpoints not working** — restored config-saved endpoint resolution (Issue [#1460](https://github.com/NousResearch/hermes-agent/issues/1460), [#1373](https://github.com/NousResearch/hermes-agent/pull/1373))
- **MCP tools not visible until restart** — auto-reload on config change (Issue [#1036](https://github.com/NousResearch/hermes-agent/issues/1036), [#1474](https://github.com/NousResearch/hermes-agent/pull/1474))
- **`hermes tools` removing MCP tools** — preserve MCP toolsets when saving (Issue [#1247](https://github.com/NousResearch/hermes-agent/issues/1247), [#1421](https://github.com/NousResearch/hermes-agent/pull/1421))
- **Terminal subprocesses inheriting `OPENAI_BASE_URL`** breaking external tools (Issue [#1002](https://github.com/NousResearch/hermes-agent/issues/1002), [#1399](https://github.com/NousResearch/hermes-agent/pull/1399))
- **Background process lost on gateway restart** — improved recovery (Issue [#1144](https://github.com/NousResearch/hermes-agent/issues/1144))
- **Cron jobs not persisting state** — now stored in SQLite (Issue [#1416](https://github.com/NousResearch/hermes-agent/issues/1416), [#1255](https://github.com/NousResearch/hermes-agent/pull/1255))
- **Cronjob `deliver: origin` not preserving thread context** (Issue [#1219](https://github.com/NousResearch/hermes-agent/issues/1219), [#1437](https://github.com/NousResearch/hermes-agent/pull/1437))
- **Gateway systemd service failing to auto-restart** when browser processes orphaned (Issue [#1617](https://github.com/NousResearch/hermes-agent/issues/1617))
- **`/background` completion report cut off in Telegram** (Issue [#1443](https://github.com/NousResearch/hermes-agent/issues/1443))
- **Model switching not taking effect** (Issue [#1244](https://github.com/NousResearch/hermes-agent/issues/1244), [#1183](https://github.com/NousResearch/hermes-agent/pull/1183))
- **`hermes doctor` reporting cronjob as unavailable** (Issue [#878](https://github.com/NousResearch/hermes-agent/issues/878), [#1180](https://github.com/NousResearch/hermes-agent/pull/1180))
- **WhatsApp bridge messages not received** from mobile (Issue [#1142](https://github.com/NousResearch/hermes-agent/issues/1142))
- **Setup wizard hanging on headless SSH** (Issue [#905](https://github.com/NousResearch/hermes-agent/issues/905), [#1274](https://github.com/NousResearch/hermes-agent/pull/1274))
- **Log handler accumulation** degrading gateway performance (Issue [#990](https://github.com/NousResearch/hermes-agent/issues/990), [#1251](https://github.com/NousResearch/hermes-agent/pull/1251))
- **Gateway NULL model in DB** (Issue [#987](https://github.com/NousResearch/hermes-agent/issues/987), [#1306](https://github.com/NousResearch/hermes-agent/pull/1306))
- **Strict endpoints rejecting replayed tool_calls** (Issue [#893](https://github.com/NousResearch/hermes-agent/issues/893))
- **Remaining hardcoded `~/.hermes` paths** — all now respect `HERMES_HOME` (Issue [#892](https://github.com/NousResearch/hermes-agent/issues/892), [#1233](https://github.com/NousResearch/hermes-agent/pull/1233))
- **Delegate tool not working with custom inference providers** (Issue [#1011](https://github.com/NousResearch/hermes-agent/issues/1011), [#1328](https://github.com/NousResearch/hermes-agent/pull/1328))
- **Skills Guard blocking official skills** (Issue [#1006](https://github.com/NousResearch/hermes-agent/issues/1006), [#1330](https://github.com/NousResearch/hermes-agent/pull/1330))
- **Setup writing provider before model selection** (Issue [#1182](https://github.com/NousResearch/hermes-agent/issues/1182))
- **`GatewayConfig.get()` AttributeError** crashing all message handling (Issue [#1158](https://github.com/NousResearch/hermes-agent/issues/1158), [#1287](https://github.com/NousResearch/hermes-agent/pull/1287))
- **`/update` hard-failing with "command not found"** (Issue [#1049](https://github.com/NousResearch/hermes-agent/issues/1049))
- **Image analysis failing silently** (Issue [#1034](https://github.com/NousResearch/hermes-agent/issues/1034), [#1338](https://github.com/NousResearch/hermes-agent/pull/1338))
- **API `BadRequestError` from `'dict'` object has no attribute `'strip'`** (Issue [#1071](https://github.com/NousResearch/hermes-agent/issues/1071))
- **Slash commands requiring exact full name** — now uses prefix matching (Issue [#928](https://github.com/NousResearch/hermes-agent/issues/928), [#1320](https://github.com/NousResearch/hermes-agent/pull/1320))
- **Gateway stops responding when terminal is closed on headless** (Issue [#1005](https://github.com/NousResearch/hermes-agent/issues/1005))
---
## 🧪 Testing
- Cover empty cached Anthropic tool-call turns ([#1222](https://github.com/NousResearch/hermes-agent/pull/1222))
- Fix stale CI assumptions in parser and quick-command coverage ([#1236](https://github.com/NousResearch/hermes-agent/pull/1236))
- Fix gateway async tests without implicit event loop ([#1278](https://github.com/NousResearch/hermes-agent/pull/1278))
- Make gateway async tests xdist-safe ([#1281](https://github.com/NousResearch/hermes-agent/pull/1281))
- Cross-timezone naive timestamp regression for cron ([#1319](https://github.com/NousResearch/hermes-agent/pull/1319))
- Isolate codex provider tests from local env ([#1335](https://github.com/NousResearch/hermes-agent/pull/1335))
- Lock retry replacement semantics ([#1379](https://github.com/NousResearch/hermes-agent/pull/1379))
- Improve error logging in session search tool — by @aydnOktay ([#1533](https://github.com/NousResearch/hermes-agent/pull/1533))
---
## 📚 Documentation
- Comprehensive SOUL.md guide ([#1315](https://github.com/NousResearch/hermes-agent/pull/1315))
- Voice mode documentation ([#1316](https://github.com/NousResearch/hermes-agent/pull/1316), [#1362](https://github.com/NousResearch/hermes-agent/pull/1362))
- Provider contribution guide ([#1361](https://github.com/NousResearch/hermes-agent/pull/1361))
- ACP and internal systems implementation guides ([#1259](https://github.com/NousResearch/hermes-agent/pull/1259))
- Expand Docusaurus coverage across CLI, tools, skills, and skins ([#1232](https://github.com/NousResearch/hermes-agent/pull/1232))
- Terminal backend and Windows troubleshooting ([#1297](https://github.com/NousResearch/hermes-agent/pull/1297))
- Skills hub reference section ([#1317](https://github.com/NousResearch/hermes-agent/pull/1317))
- Checkpoint, /rollback, and git worktrees guide ([#1493](https://github.com/NousResearch/hermes-agent/pull/1493), [#1524](https://github.com/NousResearch/hermes-agent/pull/1524))
- CLI status bar and /usage reference ([#1523](https://github.com/NousResearch/hermes-agent/pull/1523))
- Fallback providers + /background command docs ([#1430](https://github.com/NousResearch/hermes-agent/pull/1430))
- Gateway service scopes docs ([#1378](https://github.com/NousResearch/hermes-agent/pull/1378))
- Slack thread reply behavior docs ([#1407](https://github.com/NousResearch/hermes-agent/pull/1407))
- Redesigned landing page with Nous blue palette — by @austinpickett ([#974](https://github.com/NousResearch/hermes-agent/pull/974))
- Fix several documentation typos — by @JackTheGit ([#953](https://github.com/NousResearch/hermes-agent/pull/953))
- Stabilize website diagrams ([#1405](https://github.com/NousResearch/hermes-agent/pull/1405))
- CLI vs messaging quick reference in README ([#1491](https://github.com/NousResearch/hermes-agent/pull/1491))
- Add search to Docusaurus ([#1053](https://github.com/NousResearch/hermes-agent/pull/1053))
- Home Assistant integration docs ([#1170](https://github.com/NousResearch/hermes-agent/pull/1170))
---
## 👥 Contributors
### Core
- **@teknium1** — 220+ PRs spanning every area of the codebase
### Top Community Contributors
- **@0xbyt4** (4 PRs) — Anthropic adapter fixes (max_tokens, fallback crash, 429/529 retry), Slack file upload thread context, setup NameError fix
- **@erosika** (1 PR) — Honcho memory integration: async writes, memory modes, session title integration
- **@SHL0MS** (2 PRs) — ASCII video skill design patterns and refactoring
- **@alt-glitch** (2 PRs) — Persistent shell mode for local/SSH backends, setuptools packaging fix
- **@arceus77-7** (2 PRs) — 1Password skill, fix skills list mislabeling
- **@kshitijk4poor** (1 PR) — OpenClaw migration during setup wizard
- **@ASRagab** (1 PR) — Fix adaptive thinking for Claude 4.6 models
- **@eren-karakus0** (1 PR) — Strip Hermes provider env vars from subprocess environment
- **@mr-emmett-one** (1 PR) — Fix DeepSeek V3 parser multi-tool call support
- **@jplew** (1 PR) — Gateway restart on retryable startup failures
- **@brandtcormorant** (1 PR) — Fix Anthropic cache control for empty text blocks
- **@aydnOktay** (1 PR) — Improve error logging in session search tool
- **@austinpickett** (1 PR) — Landing page redesign with Nous blue palette
- **@JackTheGit** (1 PR) — Documentation typo fixes
### All Contributors
@0xbyt4, @alt-glitch, @arceus77-7, @ASRagab, @austinpickett, @aydnOktay, @brandtcormorant, @eren-karakus0, @erosika, @JackTheGit, @jplew, @kshitijk4poor, @mr-emmett-one, @SHL0MS, @teknium1
---
**Full Changelog**: [v2026.3.12...v2026.3.17](https://github.com/NousResearch/hermes-agent/compare/v2026.3.12...v2026.3.17)

400
RELEASE_v0.4.0.md Normal file
View File

@@ -0,0 +1,400 @@
# Hermes Agent v0.4.0 (v2026.3.23)
**Release Date:** March 23, 2026
> The platform expansion release — OpenAI-compatible API server, 6 new messaging adapters, 4 new inference providers, MCP server management with OAuth 2.1, @ context references, gateway prompt caching, streaming enabled by default, and a sweeping reliability pass with 200+ bug fixes.
---
## ✨ Highlights
- **OpenAI-compatible API server** — Expose Hermes as an `/v1/chat/completions` endpoint with a new `/api/jobs` REST API for cron job management, hardened with input limits, field whitelists, SQLite-backed response persistence, and CORS origin protection ([#1756](https://github.com/NousResearch/hermes-agent/pull/1756), [#2450](https://github.com/NousResearch/hermes-agent/pull/2450), [#2456](https://github.com/NousResearch/hermes-agent/pull/2456), [#2451](https://github.com/NousResearch/hermes-agent/pull/2451), [#2472](https://github.com/NousResearch/hermes-agent/pull/2472))
- **6 new messaging platform adapters** — Signal, DingTalk, SMS (Twilio), Mattermost, Matrix, and Webhook adapters join Telegram, Discord, and WhatsApp. Gateway auto-reconnects failed platforms with exponential backoff ([#2206](https://github.com/NousResearch/hermes-agent/pull/2206), [#1685](https://github.com/NousResearch/hermes-agent/pull/1685), [#1688](https://github.com/NousResearch/hermes-agent/pull/1688), [#1683](https://github.com/NousResearch/hermes-agent/pull/1683), [#2166](https://github.com/NousResearch/hermes-agent/pull/2166), [#2584](https://github.com/NousResearch/hermes-agent/pull/2584))
- **@ context references** — Claude Code-style `@file` and `@url` context injection with tab completions in the CLI ([#2343](https://github.com/NousResearch/hermes-agent/pull/2343), [#2482](https://github.com/NousResearch/hermes-agent/pull/2482))
- **4 new inference providers** — GitHub Copilot (OAuth + token validation), Alibaba Cloud / DashScope, Kilo Code, and OpenCode Zen/Go ([#1924](https://github.com/NousResearch/hermes-agent/pull/1924), [#1879](https://github.com/NousResearch/hermes-agent/pull/1879) by @mchzimm, [#1673](https://github.com/NousResearch/hermes-agent/pull/1673), [#1666](https://github.com/NousResearch/hermes-agent/pull/1666), [#1650](https://github.com/NousResearch/hermes-agent/pull/1650))
- **MCP server management CLI** — `hermes mcp` commands for installing, configuring, and authenticating MCP servers with full OAuth 2.1 PKCE flow ([#2465](https://github.com/NousResearch/hermes-agent/pull/2465))
- **Gateway prompt caching** — Cache AIAgent instances per session, preserving Anthropic prompt cache across turns for dramatic cost reduction on long conversations ([#2282](https://github.com/NousResearch/hermes-agent/pull/2282), [#2284](https://github.com/NousResearch/hermes-agent/pull/2284), [#2361](https://github.com/NousResearch/hermes-agent/pull/2361))
- **Context compression overhaul** — Structured summaries with iterative updates, token-budget tail protection, configurable summary endpoint, and fallback model support ([#2323](https://github.com/NousResearch/hermes-agent/pull/2323), [#1727](https://github.com/NousResearch/hermes-agent/pull/1727), [#2224](https://github.com/NousResearch/hermes-agent/pull/2224))
- **Streaming enabled by default** — CLI streaming on by default with proper spinner/tool progress display during streaming mode, plus extensive linebreak and concatenation fixes ([#2340](https://github.com/NousResearch/hermes-agent/pull/2340), [#2161](https://github.com/NousResearch/hermes-agent/pull/2161), [#2258](https://github.com/NousResearch/hermes-agent/pull/2258))
---
## 🖥️ CLI & User Experience
### New Commands & Interactions
- **@ context completions** — Tab-completable `@file`/`@url` references that inject file content or web pages into the conversation ([#2482](https://github.com/NousResearch/hermes-agent/pull/2482), [#2343](https://github.com/NousResearch/hermes-agent/pull/2343))
- **`/statusbar`** — Toggle a persistent config bar showing model + provider info in the prompt ([#2240](https://github.com/NousResearch/hermes-agent/pull/2240), [#1917](https://github.com/NousResearch/hermes-agent/pull/1917))
- **`/queue`** — Queue prompts for the agent without interrupting the current run ([#2191](https://github.com/NousResearch/hermes-agent/pull/2191), [#2469](https://github.com/NousResearch/hermes-agent/pull/2469))
- **`/permission`** — Switch approval mode dynamically during a session ([#2207](https://github.com/NousResearch/hermes-agent/pull/2207))
- **`/browser`** — Interactive browser sessions from the CLI ([#2273](https://github.com/NousResearch/hermes-agent/pull/2273), [#1814](https://github.com/NousResearch/hermes-agent/pull/1814))
- **`/cost`** — Live pricing and usage tracking in gateway mode ([#2180](https://github.com/NousResearch/hermes-agent/pull/2180))
- **`/approve` and `/deny`** — Replaced bare text approval in gateway with explicit commands ([#2002](https://github.com/NousResearch/hermes-agent/pull/2002))
### Streaming & Display
- Streaming enabled by default in CLI ([#2340](https://github.com/NousResearch/hermes-agent/pull/2340))
- Show spinners and tool progress during streaming mode ([#2161](https://github.com/NousResearch/hermes-agent/pull/2161))
- Show reasoning/thinking blocks when `show_reasoning` enabled ([#2118](https://github.com/NousResearch/hermes-agent/pull/2118))
- Context pressure warnings for CLI and gateway ([#2159](https://github.com/NousResearch/hermes-agent/pull/2159))
- Fix: streaming chunks concatenated without whitespace ([#2258](https://github.com/NousResearch/hermes-agent/pull/2258))
- Fix: iteration boundary linebreak prevents stream concatenation ([#2413](https://github.com/NousResearch/hermes-agent/pull/2413))
- Fix: defer streaming linebreak to prevent blank line stacking ([#2473](https://github.com/NousResearch/hermes-agent/pull/2473))
- Fix: suppress spinner animation in non-TTY environments ([#2216](https://github.com/NousResearch/hermes-agent/pull/2216))
- Fix: display provider and endpoint in API error messages ([#2266](https://github.com/NousResearch/hermes-agent/pull/2266))
- Fix: resolve garbled ANSI escape codes in status printouts ([#2448](https://github.com/NousResearch/hermes-agent/pull/2448))
- Fix: update gold ANSI color to true-color format ([#2246](https://github.com/NousResearch/hermes-agent/pull/2246))
- Fix: normalize toolset labels and use skin colors in banner ([#1912](https://github.com/NousResearch/hermes-agent/pull/1912))
### CLI Polish
- Fix: prevent 'Press ENTER to continue...' on exit ([#2555](https://github.com/NousResearch/hermes-agent/pull/2555))
- Fix: flush stdout during agent loop to prevent macOS display freeze ([#1654](https://github.com/NousResearch/hermes-agent/pull/1654))
- Fix: show human-readable error when `hermes setup` hits permissions error ([#2196](https://github.com/NousResearch/hermes-agent/pull/2196))
- Fix: `/stop` command crash + UnboundLocalError in streaming media delivery ([#2463](https://github.com/NousResearch/hermes-agent/pull/2463))
- Fix: allow custom/local endpoints without API key ([#2556](https://github.com/NousResearch/hermes-agent/pull/2556))
- Fix: Kitty keyboard protocol Shift+Enter for Ghostty/WezTerm (attempted + reverted due to prompt_toolkit crash) ([#2345](https://github.com/NousResearch/hermes-agent/pull/2345), [#2349](https://github.com/NousResearch/hermes-agent/pull/2349))
### Configuration
- **`${ENV_VAR}` substitution** in config.yaml ([#2684](https://github.com/NousResearch/hermes-agent/pull/2684))
- **Real-time config reload** — config.yaml changes apply without restart ([#2210](https://github.com/NousResearch/hermes-agent/pull/2210))
- **`custom_models.yaml`** for user-managed model additions ([#2214](https://github.com/NousResearch/hermes-agent/pull/2214))
- **Priority-based context file selection** + CLAUDE.md support ([#2301](https://github.com/NousResearch/hermes-agent/pull/2301))
- **Merge nested YAML sections** instead of replacing on config update ([#2213](https://github.com/NousResearch/hermes-agent/pull/2213))
- Fix: config.yaml provider key overrides env var silently ([#2272](https://github.com/NousResearch/hermes-agent/pull/2272))
- Fix: log warning instead of silently swallowing config.yaml errors ([#2683](https://github.com/NousResearch/hermes-agent/pull/2683))
- Fix: disabled toolsets re-enable themselves after `hermes tools` ([#2268](https://github.com/NousResearch/hermes-agent/pull/2268))
- Fix: platform default toolsets silently override tool deselection ([#2624](https://github.com/NousResearch/hermes-agent/pull/2624))
- Fix: honor bare YAML `approvals.mode: off` ([#2620](https://github.com/NousResearch/hermes-agent/pull/2620))
- Fix: `hermes update` use `.[all]` extras with fallback ([#1728](https://github.com/NousResearch/hermes-agent/pull/1728))
- Fix: `hermes update` prompt before resetting working tree on stash conflicts ([#2390](https://github.com/NousResearch/hermes-agent/pull/2390))
- Fix: use git pull --rebase in update/install to avoid divergent branch error ([#2274](https://github.com/NousResearch/hermes-agent/pull/2274))
- Fix: add zprofile fallback and create zshrc on fresh macOS installs ([#2320](https://github.com/NousResearch/hermes-agent/pull/2320))
- Fix: remove `ANTHROPIC_BASE_URL` env var to avoid collisions ([#1675](https://github.com/NousResearch/hermes-agent/pull/1675))
- Fix: don't ask IMAP password if already in keyring or env ([#2212](https://github.com/NousResearch/hermes-agent/pull/2212))
- Fix: OpenCode Zen/Go show OpenRouter models instead of their own ([#2277](https://github.com/NousResearch/hermes-agent/pull/2277))
---
## 🏗️ Core Agent & Architecture
### New Providers
- **GitHub Copilot** — Full OAuth auth, API routing, token validation, and 400k context. ([#1924](https://github.com/NousResearch/hermes-agent/pull/1924), [#1896](https://github.com/NousResearch/hermes-agent/pull/1896), [#1879](https://github.com/NousResearch/hermes-agent/pull/1879) by @mchzimm, [#2507](https://github.com/NousResearch/hermes-agent/pull/2507))
- **Alibaba Cloud / DashScope** — Full integration with DashScope v1 runtime, model dot preservation, and 401 auth fixes ([#1673](https://github.com/NousResearch/hermes-agent/pull/1673), [#2332](https://github.com/NousResearch/hermes-agent/pull/2332), [#2459](https://github.com/NousResearch/hermes-agent/pull/2459))
- **Kilo Code** — First-class inference provider ([#1666](https://github.com/NousResearch/hermes-agent/pull/1666))
- **OpenCode Zen and OpenCode Go** — New provider backends ([#1650](https://github.com/NousResearch/hermes-agent/pull/1650), [#2393](https://github.com/NousResearch/hermes-agent/pull/2393) by @0xbyt4)
- **NeuTTS** — Local TTS provider backend with built-in setup flow, replacing the old optional skill ([#1657](https://github.com/NousResearch/hermes-agent/pull/1657), [#1664](https://github.com/NousResearch/hermes-agent/pull/1664))
### Provider Improvements
- **Eager fallback** to backup model on rate-limit errors ([#1730](https://github.com/NousResearch/hermes-agent/pull/1730))
- **Endpoint metadata** for custom model context and pricing; query local servers for actual context window size ([#1906](https://github.com/NousResearch/hermes-agent/pull/1906), [#2091](https://github.com/NousResearch/hermes-agent/pull/2091) by @dusterbloom)
- **Context length detection overhaul** — models.dev integration, provider-aware resolution, fuzzy matching for custom endpoints, `/v1/props` for llama.cpp ([#2158](https://github.com/NousResearch/hermes-agent/pull/2158), [#2051](https://github.com/NousResearch/hermes-agent/pull/2051), [#2403](https://github.com/NousResearch/hermes-agent/pull/2403))
- **Model catalog updates** — gpt-5.4-mini, gpt-5.4-nano, healer-alpha, haiku-4.5, minimax-m2.7, claude 4.6 at 1M context ([#1913](https://github.com/NousResearch/hermes-agent/pull/1913), [#1915](https://github.com/NousResearch/hermes-agent/pull/1915), [#1900](https://github.com/NousResearch/hermes-agent/pull/1900), [#2155](https://github.com/NousResearch/hermes-agent/pull/2155), [#2474](https://github.com/NousResearch/hermes-agent/pull/2474))
- **Custom endpoint improvements** — `model.base_url` in config.yaml, `api_mode` override for responses API, allow endpoints without API key, fail fast on missing keys ([#2330](https://github.com/NousResearch/hermes-agent/pull/2330), [#1651](https://github.com/NousResearch/hermes-agent/pull/1651), [#2556](https://github.com/NousResearch/hermes-agent/pull/2556), [#2445](https://github.com/NousResearch/hermes-agent/pull/2445), [#1994](https://github.com/NousResearch/hermes-agent/pull/1994), [#1998](https://github.com/NousResearch/hermes-agent/pull/1998))
- Inject model and provider into system prompt ([#1929](https://github.com/NousResearch/hermes-agent/pull/1929))
- Tie `api_mode` to provider config instead of env var ([#1656](https://github.com/NousResearch/hermes-agent/pull/1656))
- Fix: prevent Anthropic token leaking to third-party `anthropic_messages` providers ([#2389](https://github.com/NousResearch/hermes-agent/pull/2389))
- Fix: prevent Anthropic fallback from inheriting non-Anthropic `base_url` ([#2388](https://github.com/NousResearch/hermes-agent/pull/2388))
- Fix: `auxiliary_is_nous` flag never resets — leaked Nous tags to other providers ([#1713](https://github.com/NousResearch/hermes-agent/pull/1713))
- Fix: Anthropic `tool_choice 'none'` still allowed tool calls ([#1714](https://github.com/NousResearch/hermes-agent/pull/1714))
- Fix: Mistral parser nested JSON fallback extraction ([#2335](https://github.com/NousResearch/hermes-agent/pull/2335))
- Fix: MiniMax 401 auth resolved by defaulting to `anthropic_messages` ([#2103](https://github.com/NousResearch/hermes-agent/pull/2103))
- Fix: case-insensitive model family matching ([#2350](https://github.com/NousResearch/hermes-agent/pull/2350))
- Fix: ignore placeholder provider keys in activation checks ([#2358](https://github.com/NousResearch/hermes-agent/pull/2358))
- Fix: Preserve Ollama model:tag colons in context length detection ([#2149](https://github.com/NousResearch/hermes-agent/pull/2149))
- Fix: recognize Claude Code OAuth credentials in startup gate ([#1663](https://github.com/NousResearch/hermes-agent/pull/1663))
- Fix: detect Claude Code version dynamically for OAuth user-agent ([#1670](https://github.com/NousResearch/hermes-agent/pull/1670))
- Fix: OAuth flag stale after refresh/fallback ([#1890](https://github.com/NousResearch/hermes-agent/pull/1890))
- Fix: auxiliary client skips expired Codex JWT ([#2397](https://github.com/NousResearch/hermes-agent/pull/2397))
### Agent Loop
- **Gateway prompt caching** — Cache AIAgent per session, keep assistant turns, fix session restore ([#2282](https://github.com/NousResearch/hermes-agent/pull/2282), [#2284](https://github.com/NousResearch/hermes-agent/pull/2284), [#2361](https://github.com/NousResearch/hermes-agent/pull/2361))
- **Context compression overhaul** — Structured summaries, iterative updates, token-budget tail protection, configurable `summary_base_url` ([#2323](https://github.com/NousResearch/hermes-agent/pull/2323), [#1727](https://github.com/NousResearch/hermes-agent/pull/1727), [#2224](https://github.com/NousResearch/hermes-agent/pull/2224))
- **Pre-call sanitization and post-call tool guardrails** ([#1732](https://github.com/NousResearch/hermes-agent/pull/1732))
- **Auto-recover** from provider-rejected `tool_choice` by retrying without ([#2174](https://github.com/NousResearch/hermes-agent/pull/2174))
- **Background memory/skill review** replaces inline nudges ([#2235](https://github.com/NousResearch/hermes-agent/pull/2235))
- **SOUL.md as primary agent identity** instead of hardcoded default ([#1922](https://github.com/NousResearch/hermes-agent/pull/1922))
- Fix: prevent silent tool result loss during context compression ([#1993](https://github.com/NousResearch/hermes-agent/pull/1993))
- Fix: handle empty/null function arguments in tool call recovery ([#2163](https://github.com/NousResearch/hermes-agent/pull/2163))
- Fix: handle API refusal responses gracefully instead of crashing ([#2156](https://github.com/NousResearch/hermes-agent/pull/2156))
- Fix: prevent stuck agent loop on malformed tool calls ([#2114](https://github.com/NousResearch/hermes-agent/pull/2114))
- Fix: return JSON parse error to model instead of dispatching with empty args ([#2342](https://github.com/NousResearch/hermes-agent/pull/2342))
- Fix: consecutive assistant message merge drops content on mixed types ([#1703](https://github.com/NousResearch/hermes-agent/pull/1703))
- Fix: message role alternation violations in JSON recovery and error handler ([#1722](https://github.com/NousResearch/hermes-agent/pull/1722))
- Fix: `compression_attempts` resets each iteration — allowed unlimited compressions ([#1723](https://github.com/NousResearch/hermes-agent/pull/1723))
- Fix: `length_continue_retries` never resets — later truncations got fewer retries ([#1717](https://github.com/NousResearch/hermes-agent/pull/1717))
- Fix: compressor summary role violated consecutive-role constraint ([#1720](https://github.com/NousResearch/hermes-agent/pull/1720), [#1743](https://github.com/NousResearch/hermes-agent/pull/1743))
- Fix: remove hardcoded `gemini-3-flash-preview` as default summary model ([#2464](https://github.com/NousResearch/hermes-agent/pull/2464))
- Fix: correctly handle empty tool results ([#2201](https://github.com/NousResearch/hermes-agent/pull/2201))
- Fix: crash on None entry in `tool_calls` list ([#2209](https://github.com/NousResearch/hermes-agent/pull/2209) by @0xbyt4, [#2316](https://github.com/NousResearch/hermes-agent/pull/2316))
- Fix: per-thread persistent event loops in worker threads ([#2214](https://github.com/NousResearch/hermes-agent/pull/2214) by @jquesnelle)
- Fix: prevent 'event loop already running' when async tools run in parallel ([#2207](https://github.com/NousResearch/hermes-agent/pull/2207))
- Fix: strip ANSI at the source — clean terminal output before it reaches the model ([#2115](https://github.com/NousResearch/hermes-agent/pull/2115))
- Fix: skip top-level `cache_control` on role:tool for OpenRouter ([#2391](https://github.com/NousResearch/hermes-agent/pull/2391))
- Fix: delegate tool — save parent tool names before child construction mutates global ([#2083](https://github.com/NousResearch/hermes-agent/pull/2083) by @ygd58, [#1894](https://github.com/NousResearch/hermes-agent/pull/1894))
- Fix: only strip last assistant message if empty string ([#2326](https://github.com/NousResearch/hermes-agent/pull/2326))
### Session & Memory
- **Session search** and management slash commands ([#2198](https://github.com/NousResearch/hermes-agent/pull/2198))
- **Auto session titles** and `.hermes.md` project config ([#1712](https://github.com/NousResearch/hermes-agent/pull/1712))
- Fix: concurrent memory writes silently drop entries — added file locking ([#1726](https://github.com/NousResearch/hermes-agent/pull/1726))
- Fix: search all sources by default in `session_search` ([#1892](https://github.com/NousResearch/hermes-agent/pull/1892))
- Fix: handle hyphenated FTS5 queries and preserve quoted literals ([#1776](https://github.com/NousResearch/hermes-agent/pull/1776))
- Fix: skip corrupt lines in `load_transcript` instead of crashing ([#1744](https://github.com/NousResearch/hermes-agent/pull/1744))
- Fix: normalize session keys to prevent case-sensitive duplicates ([#2157](https://github.com/NousResearch/hermes-agent/pull/2157))
- Fix: prevent `session_search` crash when no sessions exist ([#2194](https://github.com/NousResearch/hermes-agent/pull/2194))
- Fix: reset token counters on new session for accurate usage display ([#2101](https://github.com/NousResearch/hermes-agent/pull/2101) by @InB4DevOps)
- Fix: prevent stale memory overwrites by flush agent ([#2687](https://github.com/NousResearch/hermes-agent/pull/2687))
- Fix: remove synthetic error message injection, fix session resume after repeated failures ([#2303](https://github.com/NousResearch/hermes-agent/pull/2303))
- Fix: quiet mode with `--resume` now passes conversation_history ([#2357](https://github.com/NousResearch/hermes-agent/pull/2357))
- Fix: unify resume logic in batch mode ([#2331](https://github.com/NousResearch/hermes-agent/pull/2331))
### Honcho Memory
- Honcho config fixes and @ context reference integration ([#2343](https://github.com/NousResearch/hermes-agent/pull/2343))
- Self-hosted / Docker configuration documentation ([#2475](https://github.com/NousResearch/hermes-agent/pull/2475))
---
## 📱 Messaging Platforms (Gateway)
### New Platform Adapters
- **Signal Messenger** — Full adapter with attachment handling, group message filtering, and Note to Self echo-back protection ([#2206](https://github.com/NousResearch/hermes-agent/pull/2206), [#2400](https://github.com/NousResearch/hermes-agent/pull/2400), [#2297](https://github.com/NousResearch/hermes-agent/pull/2297), [#2156](https://github.com/NousResearch/hermes-agent/pull/2156))
- **DingTalk** — Adapter with gateway wiring and setup docs ([#1685](https://github.com/NousResearch/hermes-agent/pull/1685), [#1690](https://github.com/NousResearch/hermes-agent/pull/1690), [#1692](https://github.com/NousResearch/hermes-agent/pull/1692))
- **SMS (Twilio)** ([#1688](https://github.com/NousResearch/hermes-agent/pull/1688))
- **Mattermost** — With @-mention-only channel filter ([#1683](https://github.com/NousResearch/hermes-agent/pull/1683), [#2443](https://github.com/NousResearch/hermes-agent/pull/2443))
- **Matrix** — With vision support and image caching ([#1683](https://github.com/NousResearch/hermes-agent/pull/1683), [#2520](https://github.com/NousResearch/hermes-agent/pull/2520))
- **Webhook** — Platform adapter for external event triggers ([#2166](https://github.com/NousResearch/hermes-agent/pull/2166))
- **OpenAI-compatible API server** — `/v1/chat/completions` endpoint with `/api/jobs` cron management ([#1756](https://github.com/NousResearch/hermes-agent/pull/1756), [#2450](https://github.com/NousResearch/hermes-agent/pull/2450), [#2456](https://github.com/NousResearch/hermes-agent/pull/2456))
### Telegram Improvements
- MarkdownV2 support — strikethrough, spoiler, blockquotes, escape parentheses/braces/backslashes/backticks ([#2199](https://github.com/NousResearch/hermes-agent/pull/2199), [#2200](https://github.com/NousResearch/hermes-agent/pull/2200) by @llbn, [#2386](https://github.com/NousResearch/hermes-agent/pull/2386))
- Auto-detect HTML tags and use `parse_mode=HTML` ([#1709](https://github.com/NousResearch/hermes-agent/pull/1709))
- Telegram group vision support + thread-based sessions ([#2153](https://github.com/NousResearch/hermes-agent/pull/2153))
- Auto-reconnect polling after network interruption ([#2517](https://github.com/NousResearch/hermes-agent/pull/2517))
- Aggregate split text messages before dispatching ([#1674](https://github.com/NousResearch/hermes-agent/pull/1674))
- Fix: streaming config bridge, not-modified, flood control ([#1782](https://github.com/NousResearch/hermes-agent/pull/1782), [#1783](https://github.com/NousResearch/hermes-agent/pull/1783))
- Fix: edited_message event crashes ([#2074](https://github.com/NousResearch/hermes-agent/pull/2074))
- Fix: retry 409 polling conflicts before giving up ([#2312](https://github.com/NousResearch/hermes-agent/pull/2312))
- Fix: topic delivery via `platform:chat_id:thread_id` format ([#2455](https://github.com/NousResearch/hermes-agent/pull/2455))
### Discord Improvements
- Document caching and text-file injection ([#2503](https://github.com/NousResearch/hermes-agent/pull/2503))
- Persistent typing indicator for DMs ([#2468](https://github.com/NousResearch/hermes-agent/pull/2468))
- Discord DM vision — inline images + attachment analysis ([#2186](https://github.com/NousResearch/hermes-agent/pull/2186))
- Persist thread participation across gateway restarts ([#1661](https://github.com/NousResearch/hermes-agent/pull/1661))
- Fix: gateway crash on non-ASCII guild names ([#2302](https://github.com/NousResearch/hermes-agent/pull/2302))
- Fix: thread permission errors ([#2073](https://github.com/NousResearch/hermes-agent/pull/2073))
- Fix: slash event routing in threads ([#2460](https://github.com/NousResearch/hermes-agent/pull/2460))
- Fix: remove bugged followup messages + `/ask` command ([#1836](https://github.com/NousResearch/hermes-agent/pull/1836))
- Fix: graceful WebSocket reconnection ([#2127](https://github.com/NousResearch/hermes-agent/pull/2127))
- Fix: voice channel TTS when streaming enabled ([#2322](https://github.com/NousResearch/hermes-agent/pull/2322))
### WhatsApp & Other Adapters
- WhatsApp: outbound `send_message` routing ([#1769](https://github.com/NousResearch/hermes-agent/pull/1769) by @sai-samarth), LID format self-chat ([#1667](https://github.com/NousResearch/hermes-agent/pull/1667)), `reply_prefix` config fix ([#1923](https://github.com/NousResearch/hermes-agent/pull/1923)), restart on bridge child exit ([#2334](https://github.com/NousResearch/hermes-agent/pull/2334)), image/bridge improvements ([#2181](https://github.com/NousResearch/hermes-agent/pull/2181))
- Matrix: correct `reply_to_message_id` parameter ([#1895](https://github.com/NousResearch/hermes-agent/pull/1895)), bare media types fix ([#1736](https://github.com/NousResearch/hermes-agent/pull/1736))
- Mattermost: MIME types for media attachments ([#2329](https://github.com/NousResearch/hermes-agent/pull/2329))
### Gateway Core
- **Auto-reconnect** failed platforms with exponential backoff ([#2584](https://github.com/NousResearch/hermes-agent/pull/2584))
- **Notify users when session auto-resets** ([#2519](https://github.com/NousResearch/hermes-agent/pull/2519))
- **Reply-to message context** for out-of-session replies ([#1662](https://github.com/NousResearch/hermes-agent/pull/1662))
- **Ignore unauthorized DMs** config option ([#1919](https://github.com/NousResearch/hermes-agent/pull/1919))
- Fix: `/reset` in thread-mode resets global session instead of thread ([#2254](https://github.com/NousResearch/hermes-agent/pull/2254))
- Fix: deliver MEDIA: files after streaming responses ([#2382](https://github.com/NousResearch/hermes-agent/pull/2382))
- Fix: cap interrupt recursion depth to prevent resource exhaustion ([#1659](https://github.com/NousResearch/hermes-agent/pull/1659))
- Fix: detect stopped processes and release stale locks on `--replace` ([#2406](https://github.com/NousResearch/hermes-agent/pull/2406), [#1908](https://github.com/NousResearch/hermes-agent/pull/1908))
- Fix: PID-based wait with force-kill for gateway restart ([#1902](https://github.com/NousResearch/hermes-agent/pull/1902))
- Fix: prevent `--replace` mode from killing the caller process ([#2185](https://github.com/NousResearch/hermes-agent/pull/2185))
- Fix: `/model` shows active fallback model instead of config default ([#1660](https://github.com/NousResearch/hermes-agent/pull/1660))
- Fix: `/title` command fails when session doesn't exist in SQLite yet ([#2379](https://github.com/NousResearch/hermes-agent/pull/2379) by @ten-jampa)
- Fix: process `/queue`'d messages after agent completion ([#2469](https://github.com/NousResearch/hermes-agent/pull/2469))
- Fix: strip orphaned `tool_results` + let `/reset` bypass running agent ([#2180](https://github.com/NousResearch/hermes-agent/pull/2180))
- Fix: prevent agents from starting gateway outside systemd management ([#2617](https://github.com/NousResearch/hermes-agent/pull/2617))
- Fix: prevent systemd restart storm on gateway connection failure ([#2327](https://github.com/NousResearch/hermes-agent/pull/2327))
- Fix: include resolved node path in systemd unit ([#1767](https://github.com/NousResearch/hermes-agent/pull/1767) by @sai-samarth)
- Fix: send error details to user in gateway outer exception handler ([#1966](https://github.com/NousResearch/hermes-agent/pull/1966))
- Fix: improve error handling for 429 usage limits and 500 context overflow ([#1839](https://github.com/NousResearch/hermes-agent/pull/1839))
- Fix: add all missing platform allowlist env vars to startup warning check ([#2628](https://github.com/NousResearch/hermes-agent/pull/2628))
- Fix: media delivery fails for file paths containing spaces ([#2621](https://github.com/NousResearch/hermes-agent/pull/2621))
- Fix: duplicate session-key collision in multi-platform gateway ([#2171](https://github.com/NousResearch/hermes-agent/pull/2171))
- Fix: Matrix and Mattermost never report as connected ([#1711](https://github.com/NousResearch/hermes-agent/pull/1711))
- Fix: PII redaction config never read — missing yaml import ([#1701](https://github.com/NousResearch/hermes-agent/pull/1701))
- Fix: NameError on skill slash commands ([#1697](https://github.com/NousResearch/hermes-agent/pull/1697))
- Fix: persist watcher metadata in checkpoint for crash recovery ([#1706](https://github.com/NousResearch/hermes-agent/pull/1706))
- Fix: pass `message_thread_id` in send_image_file, send_document, send_video ([#2339](https://github.com/NousResearch/hermes-agent/pull/2339))
- Fix: media-group aggregation on rapid successive photo messages ([#2160](https://github.com/NousResearch/hermes-agent/pull/2160))
---
## 🔧 Tool System
### MCP Enhancements
- **MCP server management CLI** + OAuth 2.1 PKCE auth ([#2465](https://github.com/NousResearch/hermes-agent/pull/2465))
- **Expose MCP servers as standalone toolsets** ([#1907](https://github.com/NousResearch/hermes-agent/pull/1907))
- **Interactive MCP tool configuration** in `hermes tools` ([#1694](https://github.com/NousResearch/hermes-agent/pull/1694))
- Fix: MCP-OAuth port mismatch, path traversal, and shared handler state ([#2552](https://github.com/NousResearch/hermes-agent/pull/2552))
- Fix: preserve MCP tool registrations across session resets ([#2124](https://github.com/NousResearch/hermes-agent/pull/2124))
- Fix: concurrent file access crash + duplicate MCP registration ([#2154](https://github.com/NousResearch/hermes-agent/pull/2154))
- Fix: normalise MCP schemas + expand session list columns ([#2102](https://github.com/NousResearch/hermes-agent/pull/2102))
- Fix: `tool_choice` `mcp_` prefix handling ([#1775](https://github.com/NousResearch/hermes-agent/pull/1775))
### Web Tool Backends
- **Tavily** as web search/extract/crawl backend ([#1731](https://github.com/NousResearch/hermes-agent/pull/1731))
- **Parallel** as alternative web search/extract backend ([#1696](https://github.com/NousResearch/hermes-agent/pull/1696))
- **Configurable web backend** — Firecrawl/BeautifulSoup/Playwright selection ([#2256](https://github.com/NousResearch/hermes-agent/pull/2256))
- Fix: whitespace-only env vars bypass web backend detection ([#2341](https://github.com/NousResearch/hermes-agent/pull/2341))
### New Tools
- **IMAP email** reading and sending ([#2173](https://github.com/NousResearch/hermes-agent/pull/2173))
- **STT (speech-to-text)** tool using Whisper API ([#2072](https://github.com/NousResearch/hermes-agent/pull/2072))
- **Route-aware pricing estimates** ([#1695](https://github.com/NousResearch/hermes-agent/pull/1695))
### Tool Improvements
- TTS: `base_url` support for OpenAI TTS provider ([#2064](https://github.com/NousResearch/hermes-agent/pull/2064) by @hanai)
- Vision: configurable timeout, tilde expansion in file paths, DM vision with multi-image and base64 fallback ([#2480](https://github.com/NousResearch/hermes-agent/pull/2480), [#2585](https://github.com/NousResearch/hermes-agent/pull/2585), [#2211](https://github.com/NousResearch/hermes-agent/pull/2211))
- Browser: race condition fix in session creation ([#1721](https://github.com/NousResearch/hermes-agent/pull/1721)), TypeError on unexpected LLM params ([#1735](https://github.com/NousResearch/hermes-agent/pull/1735))
- File tools: strip ANSI escape codes from write_file and patch content ([#2532](https://github.com/NousResearch/hermes-agent/pull/2532)), include pagination args in repeated search key ([#1824](https://github.com/NousResearch/hermes-agent/pull/1824) by @cutepawss), improve fuzzy matching accuracy + position calculation refactor ([#2096](https://github.com/NousResearch/hermes-agent/pull/2096), [#1681](https://github.com/NousResearch/hermes-agent/pull/1681))
- Code execution: resource leak and double socket close fix ([#2381](https://github.com/NousResearch/hermes-agent/pull/2381))
- Delegate: thread safety for concurrent subagent delegation ([#1672](https://github.com/NousResearch/hermes-agent/pull/1672)), preserve parent agent's tool list after delegation ([#1778](https://github.com/NousResearch/hermes-agent/pull/1778))
- Fix: make concurrent tool batching path-aware for file mutations ([#1914](https://github.com/NousResearch/hermes-agent/pull/1914))
- Fix: chunk long messages in `send_message_tool` before platform dispatch ([#1646](https://github.com/NousResearch/hermes-agent/pull/1646))
- Fix: add missing 'messaging' toolset ([#1718](https://github.com/NousResearch/hermes-agent/pull/1718))
- Fix: prevent unavailable tool names from leaking into model schemas ([#2072](https://github.com/NousResearch/hermes-agent/pull/2072))
- Fix: pass visited set by reference to prevent diamond dependency duplication ([#2311](https://github.com/NousResearch/hermes-agent/pull/2311))
- Fix: Daytona sandbox lookup migrated from `find_one` to `get/list` ([#2063](https://github.com/NousResearch/hermes-agent/pull/2063) by @rovle)
---
## 🧩 Skills Ecosystem
### Skills System Improvements
- **Agent-created skills** — Caution-level findings allowed, dangerous skills ask instead of block ([#1840](https://github.com/NousResearch/hermes-agent/pull/1840), [#2446](https://github.com/NousResearch/hermes-agent/pull/2446))
- **`--yes` flag** to bypass confirmation in `/skills install` and uninstall ([#1647](https://github.com/NousResearch/hermes-agent/pull/1647))
- **Disabled skills respected** across banner, system prompt, and slash commands ([#1897](https://github.com/NousResearch/hermes-agent/pull/1897))
- Fix: skills custom_tools import crash + sandbox file_tools integration ([#2239](https://github.com/NousResearch/hermes-agent/pull/2239))
- Fix: agent-created skills with pip requirements crash on install ([#2145](https://github.com/NousResearch/hermes-agent/pull/2145))
- Fix: race condition in `Skills.__init__` when `hub.yaml` missing ([#2242](https://github.com/NousResearch/hermes-agent/pull/2242))
- Fix: validate skill metadata before install and block duplicates ([#2241](https://github.com/NousResearch/hermes-agent/pull/2241))
- Fix: skills hub inspect/resolve — 4 bugs in inspect, redirects, discovery, tap list ([#2447](https://github.com/NousResearch/hermes-agent/pull/2447))
- Fix: agent-created skills keep working after session reset ([#2121](https://github.com/NousResearch/hermes-agent/pull/2121))
### New Skills
- **OCR-and-documents** — PDF/DOCX/XLS/PPTX/image OCR with optional GPU ([#2236](https://github.com/NousResearch/hermes-agent/pull/2236), [#2461](https://github.com/NousResearch/hermes-agent/pull/2461))
- **Huggingface-hub** bundled skill ([#1921](https://github.com/NousResearch/hermes-agent/pull/1921))
- **Sherlock OSINT** username search ([#1671](https://github.com/NousResearch/hermes-agent/pull/1671))
- **Meme-generation** — Image generator with Pillow ([#2344](https://github.com/NousResearch/hermes-agent/pull/2344))
- **Bioinformatics** gateway skill — index to 400+ bio skills ([#2387](https://github.com/NousResearch/hermes-agent/pull/2387))
- **Inference.sh** skill (terminal-based) ([#1686](https://github.com/NousResearch/hermes-agent/pull/1686))
- **Base blockchain** optional skill ([#1643](https://github.com/NousResearch/hermes-agent/pull/1643))
- **3D-model-viewer** optional skill ([#2226](https://github.com/NousResearch/hermes-agent/pull/2226))
- **FastMCP** optional skill ([#2113](https://github.com/NousResearch/hermes-agent/pull/2113))
- **Hermes-agent-setup** skill ([#1905](https://github.com/NousResearch/hermes-agent/pull/1905))
---
## 🔌 Plugin System Enhancements
- **TUI extension hooks** — Build custom CLIs on top of Hermes ([#2333](https://github.com/NousResearch/hermes-agent/pull/2333))
- **`hermes plugins install/remove/list`** commands ([#2337](https://github.com/NousResearch/hermes-agent/pull/2337))
- **Slash command registration** for plugins ([#2359](https://github.com/NousResearch/hermes-agent/pull/2359))
- **`session:end` lifecycle event** hook ([#1725](https://github.com/NousResearch/hermes-agent/pull/1725))
- Fix: require opt-in for project plugin discovery ([#2215](https://github.com/NousResearch/hermes-agent/pull/2215))
---
## 🔒 Security & Reliability
### Security
- **SSRF protection** for vision_tools and web_tools ([#2679](https://github.com/NousResearch/hermes-agent/pull/2679))
- **Shell injection prevention** in `_expand_path` via `~user` path suffix ([#2685](https://github.com/NousResearch/hermes-agent/pull/2685))
- **Block untrusted browser-origin** API server access ([#2451](https://github.com/NousResearch/hermes-agent/pull/2451))
- **Block sandbox backend creds** from subprocess env ([#1658](https://github.com/NousResearch/hermes-agent/pull/1658))
- **Block @ references** from reading secrets outside workspace ([#2601](https://github.com/NousResearch/hermes-agent/pull/2601) by @Gutslabs)
- **Malicious code pattern pre-exec scanner** for terminal_tool ([#2245](https://github.com/NousResearch/hermes-agent/pull/2245))
- **Harden terminal safety** and sandbox file writes ([#1653](https://github.com/NousResearch/hermes-agent/pull/1653))
- **PKCE verifier leak** fix + OAuth refresh Content-Type ([#1775](https://github.com/NousResearch/hermes-agent/pull/1775))
- **Eliminate SQL string formatting** in `execute()` calls ([#2061](https://github.com/NousResearch/hermes-agent/pull/2061) by @dusterbloom)
- **Harden jobs API** — input limits, field whitelist, startup check ([#2456](https://github.com/NousResearch/hermes-agent/pull/2456))
### Reliability
- Thread locks on 4 SessionDB methods ([#1704](https://github.com/NousResearch/hermes-agent/pull/1704))
- File locking for concurrent memory writes ([#1726](https://github.com/NousResearch/hermes-agent/pull/1726))
- Handle OpenRouter errors gracefully ([#2112](https://github.com/NousResearch/hermes-agent/pull/2112))
- Guard print() calls against OSError ([#1668](https://github.com/NousResearch/hermes-agent/pull/1668))
- Safely handle non-string inputs in redacting formatter ([#2392](https://github.com/NousResearch/hermes-agent/pull/2392), [#1700](https://github.com/NousResearch/hermes-agent/pull/1700))
- ACP: preserve session provider on model switch, persist sessions to disk ([#2380](https://github.com/NousResearch/hermes-agent/pull/2380), [#2071](https://github.com/NousResearch/hermes-agent/pull/2071))
- API server: persist ResponseStore to SQLite across restarts ([#2472](https://github.com/NousResearch/hermes-agent/pull/2472))
- Fix: `fetch_nous_models` always TypeError from positional args ([#1699](https://github.com/NousResearch/hermes-agent/pull/1699))
- Fix: resolve merge conflict markers in cli.py breaking startup ([#2347](https://github.com/NousResearch/hermes-agent/pull/2347))
- Fix: `minisweagent_path.py` missing from wheel ([#2098](https://github.com/NousResearch/hermes-agent/pull/2098) by @JiwaniZakir)
### Cron System
- **`[SILENT]` response** — cron agents can suppress delivery ([#1833](https://github.com/NousResearch/hermes-agent/pull/1833))
- **Scale missed-job grace window** with schedule frequency ([#2449](https://github.com/NousResearch/hermes-agent/pull/2449))
- **Recover recent one-shot jobs** ([#1918](https://github.com/NousResearch/hermes-agent/pull/1918))
- Fix: normalize `repeat<=0` to None — jobs deleted after first run when LLM passes -1 ([#2612](https://github.com/NousResearch/hermes-agent/pull/2612) by @Mibayy)
- Fix: Matrix added to scheduler delivery platform_map ([#2167](https://github.com/NousResearch/hermes-agent/pull/2167) by @buntingszn)
- Fix: naive ISO timestamps without timezone — jobs fire at wrong time ([#1729](https://github.com/NousResearch/hermes-agent/pull/1729))
- Fix: `get_due_jobs` reads `jobs.json` twice — race condition ([#1716](https://github.com/NousResearch/hermes-agent/pull/1716))
- Fix: silent jobs return empty response for delivery skip ([#2442](https://github.com/NousResearch/hermes-agent/pull/2442))
- Fix: stop injecting cron outputs into gateway session history ([#2313](https://github.com/NousResearch/hermes-agent/pull/2313))
- Fix: close abandoned coroutine when `asyncio.run()` raises RuntimeError ([#2317](https://github.com/NousResearch/hermes-agent/pull/2317))
---
## 🧪 Testing
- Resolve all consistently failing tests ([#2488](https://github.com/NousResearch/hermes-agent/pull/2488))
- Replace `FakePath` with `monkeypatch` for Python 3.12 compat ([#2444](https://github.com/NousResearch/hermes-agent/pull/2444))
- Align Hermes setup and full-suite expectations ([#1710](https://github.com/NousResearch/hermes-agent/pull/1710))
---
## 📚 Documentation
- Comprehensive docs update for recent features ([#1693](https://github.com/NousResearch/hermes-agent/pull/1693), [#2183](https://github.com/NousResearch/hermes-agent/pull/2183))
- Alibaba Cloud and DingTalk setup guides ([#1687](https://github.com/NousResearch/hermes-agent/pull/1687), [#1692](https://github.com/NousResearch/hermes-agent/pull/1692))
- Detailed skills documentation ([#2244](https://github.com/NousResearch/hermes-agent/pull/2244))
- Honcho self-hosted / Docker configuration ([#2475](https://github.com/NousResearch/hermes-agent/pull/2475))
- Context length detection FAQ and quickstart references ([#2179](https://github.com/NousResearch/hermes-agent/pull/2179))
- Fix docs inconsistencies across reference and user guides ([#1995](https://github.com/NousResearch/hermes-agent/pull/1995))
- Fix MCP install commands — use uv, not bare pip ([#1909](https://github.com/NousResearch/hermes-agent/pull/1909))
- Replace ASCII diagrams with Mermaid/lists ([#2402](https://github.com/NousResearch/hermes-agent/pull/2402))
- Gemini OAuth provider implementation plan ([#2467](https://github.com/NousResearch/hermes-agent/pull/2467))
- Discord Server Members Intent marked as required ([#2330](https://github.com/NousResearch/hermes-agent/pull/2330))
- Fix MDX build error in api-server.md ([#1787](https://github.com/NousResearch/hermes-agent/pull/1787))
- Align venv path to match installer ([#2114](https://github.com/NousResearch/hermes-agent/pull/2114))
- New skills added to hub index ([#2281](https://github.com/NousResearch/hermes-agent/pull/2281))
---
## 👥 Contributors
### Core
- **@teknium1** (Teknium) — 280 PRs
### Community Contributors
- **@mchzimm** (to_the_max) — GitHub Copilot provider integration ([#1879](https://github.com/NousResearch/hermes-agent/pull/1879))
- **@jquesnelle** (Jeffrey Quesnelle) — Per-thread persistent event loops fix ([#2214](https://github.com/NousResearch/hermes-agent/pull/2214))
- **@llbn** (lbn) — Telegram MarkdownV2 strikethrough, spoiler, blockquotes, and escape fixes ([#2199](https://github.com/NousResearch/hermes-agent/pull/2199), [#2200](https://github.com/NousResearch/hermes-agent/pull/2200))
- **@dusterbloom** — SQL injection prevention + local server context window querying ([#2061](https://github.com/NousResearch/hermes-agent/pull/2061), [#2091](https://github.com/NousResearch/hermes-agent/pull/2091))
- **@0xbyt4** — Anthropic tool_calls None guard + OpenCode-Go provider config fix ([#2209](https://github.com/NousResearch/hermes-agent/pull/2209), [#2393](https://github.com/NousResearch/hermes-agent/pull/2393))
- **@sai-samarth** (Saisamarth) — WhatsApp send_message routing + systemd node path ([#1769](https://github.com/NousResearch/hermes-agent/pull/1769), [#1767](https://github.com/NousResearch/hermes-agent/pull/1767))
- **@Gutslabs** (Guts) — Block @ references from reading secrets ([#2601](https://github.com/NousResearch/hermes-agent/pull/2601))
- **@Mibayy** (Mibay) — Cron job repeat normalization ([#2612](https://github.com/NousResearch/hermes-agent/pull/2612))
- **@ten-jampa** (Tenzin Jampa) — Gateway /title command fix ([#2379](https://github.com/NousResearch/hermes-agent/pull/2379))
- **@cutepawss** (lila) — File tools search pagination fix ([#1824](https://github.com/NousResearch/hermes-agent/pull/1824))
- **@hanai** (Hanai) — OpenAI TTS base_url support ([#2064](https://github.com/NousResearch/hermes-agent/pull/2064))
- **@rovle** (Lovre Pešut) — Daytona sandbox API migration ([#2063](https://github.com/NousResearch/hermes-agent/pull/2063))
- **@buntingszn** (bunting szn) — Matrix cron delivery support ([#2167](https://github.com/NousResearch/hermes-agent/pull/2167))
- **@InB4DevOps** — Token counter reset on new session ([#2101](https://github.com/NousResearch/hermes-agent/pull/2101))
- **@JiwaniZakir** (Zakir Jiwani) — Missing file in wheel fix ([#2098](https://github.com/NousResearch/hermes-agent/pull/2098))
- **@ygd58** (buray) — Delegate tool parent tool names fix ([#2083](https://github.com/NousResearch/hermes-agent/pull/2083))
---
**Full Changelog**: [v2026.3.17...v2026.3.23](https://github.com/NousResearch/hermes-agent/compare/v2026.3.17...v2026.3.23)

348
RELEASE_v0.5.0.md Normal file
View File

@@ -0,0 +1,348 @@
# Hermes Agent v0.5.0 (v2026.3.28)
**Release Date:** March 28, 2026
> The hardening release — Hugging Face provider, /model command overhaul, Telegram Private Chat Topics, native Modal SDK, plugin lifecycle hooks, tool-use enforcement for GPT models, Nix flake, 50+ security and reliability fixes, and a comprehensive supply chain audit.
---
## ✨ Highlights
- **Nous Portal now supports 400+ models** — The Nous Research inference portal has expanded dramatically, giving Hermes Agent users access to over 400 models through a single provider endpoint
- **Hugging Face as a first-class inference provider** — Full integration with HF Inference API including curated agentic model picker that maps to OpenRouter analogues, live `/models` endpoint probe, and setup wizard flow ([#3419](https://github.com/NousResearch/hermes-agent/pull/3419), [#3440](https://github.com/NousResearch/hermes-agent/pull/3440))
- **Telegram Private Chat Topics** — Project-based conversations with functional skill binding per topic, enabling isolated workflows within a single Telegram chat ([#3163](https://github.com/NousResearch/hermes-agent/pull/3163))
- **Native Modal SDK backend** — Replaced swe-rex dependency with native Modal SDK (`Sandbox.create.aio` + `exec.aio`), eliminating tunnels and simplifying the Modal terminal backend ([#3538](https://github.com/NousResearch/hermes-agent/pull/3538))
- **Plugin lifecycle hooks activated** — `pre_llm_call`, `post_llm_call`, `on_session_start`, and `on_session_end` hooks now fire in the agent loop and CLI/gateway, completing the plugin hook system ([#3542](https://github.com/NousResearch/hermes-agent/pull/3542))
- **Improved OpenAI Model Reliability** — Added `GPT_TOOL_USE_GUIDANCE` to prevent GPT models from describing intended actions instead of making tool calls, plus automatic stripping of stale budget warnings from conversation history that caused models to avoid tools across turns ([#3528](https://github.com/NousResearch/hermes-agent/pull/3528))
- **Nix flake** — Full uv2nix build, NixOS module with persistent container mode, auto-generated config keys from Python source, and suffix PATHs for agent-friendliness ([#20](https://github.com/NousResearch/hermes-agent/pull/20), [#3274](https://github.com/NousResearch/hermes-agent/pull/3274), [#3061](https://github.com/NousResearch/hermes-agent/pull/3061)) by @alt-glitch
- **Supply chain hardening** — Removed compromised `litellm` dependency, pinned all dependency version ranges, regenerated `uv.lock` with hashes, added CI workflow scanning PRs for supply chain attack patterns, and bumped deps to fix CVEs ([#2796](https://github.com/NousResearch/hermes-agent/pull/2796), [#2810](https://github.com/NousResearch/hermes-agent/pull/2810), [#2812](https://github.com/NousResearch/hermes-agent/pull/2812), [#2816](https://github.com/NousResearch/hermes-agent/pull/2816), [#3073](https://github.com/NousResearch/hermes-agent/pull/3073))
- **Anthropic output limits fix** — Replaced hardcoded 16K `max_tokens` with per-model native output limits (128K for Opus 4.6, 64K for Sonnet 4.6), fixing "Response truncated" and thinking-budget exhaustion on direct Anthropic API ([#3426](https://github.com/NousResearch/hermes-agent/pull/3426), [#3444](https://github.com/NousResearch/hermes-agent/pull/3444))
---
## 🏗️ Core Agent & Architecture
### New Provider: Hugging Face
- First-class Hugging Face Inference API integration with auth, setup wizard, and model picker ([#3419](https://github.com/NousResearch/hermes-agent/pull/3419))
- Curated model list mapping OpenRouter agentic defaults to HF equivalents — providers with 8+ curated models skip live `/models` probe for speed ([#3440](https://github.com/NousResearch/hermes-agent/pull/3440))
- Added glm-5-turbo to Z.AI provider model list ([#3095](https://github.com/NousResearch/hermes-agent/pull/3095))
### Provider & Model Improvements
- `/model` command overhaul — extracted shared `switch_model()` pipeline for CLI and gateway, custom endpoint support, provider-aware routing ([#2795](https://github.com/NousResearch/hermes-agent/pull/2795), [#2799](https://github.com/NousResearch/hermes-agent/pull/2799))
- Removed `/model` slash command from CLI and gateway in favor of `hermes model` subcommand ([#3080](https://github.com/NousResearch/hermes-agent/pull/3080))
- Preserve `custom` provider instead of silently remapping to `openrouter` ([#2792](https://github.com/NousResearch/hermes-agent/pull/2792))
- Read root-level `provider` and `base_url` from config.yaml into model config ([#3112](https://github.com/NousResearch/hermes-agent/pull/3112))
- Align Nous Portal model slugs with OpenRouter naming ([#3253](https://github.com/NousResearch/hermes-agent/pull/3253))
- Fix Alibaba provider default endpoint and model list ([#3484](https://github.com/NousResearch/hermes-agent/pull/3484))
- Allow MiniMax users to override `/v1``/anthropic` auto-correction ([#3553](https://github.com/NousResearch/hermes-agent/pull/3553))
- Migrate OAuth token refresh to `platform.claude.com` with fallback ([#3246](https://github.com/NousResearch/hermes-agent/pull/3246))
### Agent Loop & Conversation
- **Improved OpenAI model reliability** — `GPT_TOOL_USE_GUIDANCE` prevents GPT models from describing actions instead of calling tools + automatic budget warning stripping from history ([#3528](https://github.com/NousResearch/hermes-agent/pull/3528))
- **Surface lifecycle events** — All retry, fallback, and compression events now surface to the user as formatted messages ([#3153](https://github.com/NousResearch/hermes-agent/pull/3153))
- **Anthropic output limits** — Per-model native output limits instead of hardcoded 16K `max_tokens` ([#3426](https://github.com/NousResearch/hermes-agent/pull/3426))
- **Thinking-budget exhaustion detection** — Skip useless continuation retries when model uses all output tokens on reasoning ([#3444](https://github.com/NousResearch/hermes-agent/pull/3444))
- Always prefer streaming for API calls to prevent hung subagents ([#3120](https://github.com/NousResearch/hermes-agent/pull/3120))
- Restore safe non-streaming fallback after stream failures ([#3020](https://github.com/NousResearch/hermes-agent/pull/3020))
- Give subagents independent iteration budgets ([#3004](https://github.com/NousResearch/hermes-agent/pull/3004))
- Update `api_key` in `_try_activate_fallback` for subagent auth ([#3103](https://github.com/NousResearch/hermes-agent/pull/3103))
- Graceful return on max retries instead of crashing thread ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Count compression restarts toward retry limit ([#3070](https://github.com/NousResearch/hermes-agent/pull/3070))
- Include tool tokens in preflight estimate, guard context probe persistence ([#3164](https://github.com/NousResearch/hermes-agent/pull/3164))
- Update context compressor limits after fallback activation ([#3305](https://github.com/NousResearch/hermes-agent/pull/3305))
- Validate empty user messages to prevent Anthropic API 400 errors ([#3322](https://github.com/NousResearch/hermes-agent/pull/3322))
- GLM reasoning-only and max-length handling ([#3010](https://github.com/NousResearch/hermes-agent/pull/3010))
- Increase API timeout default from 900s to 1800s for slow-thinking models ([#3431](https://github.com/NousResearch/hermes-agent/pull/3431))
- Send `max_tokens` for Claude/OpenRouter + retry SSE connection errors ([#3497](https://github.com/NousResearch/hermes-agent/pull/3497))
- Prevent AsyncOpenAI/httpx cross-loop deadlock in gateway mode ([#2701](https://github.com/NousResearch/hermes-agent/pull/2701)) by @ctlst
### Streaming & Reasoning
- **Persist reasoning across gateway session turns** with new schema v6 columns (`reasoning`, `reasoning_details`, `codex_reasoning_items`) ([#2974](https://github.com/NousResearch/hermes-agent/pull/2974))
- Detect and kill stale SSE connections ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Fix stale stream detector race causing spurious `RemoteProtocolError` ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Skip duplicate callback for `<think>`-extracted reasoning during streaming ([#3116](https://github.com/NousResearch/hermes-agent/pull/3116))
- Preserve reasoning fields in `rewrite_transcript` ([#3311](https://github.com/NousResearch/hermes-agent/pull/3311))
- Preserve Gemini thought signatures in streamed tool calls ([#2997](https://github.com/NousResearch/hermes-agent/pull/2997))
- Ensure first delta is fired during reasoning updates ([untagged commit](https://github.com/NousResearch/hermes-agent))
### Session & Memory
- **Session search recent sessions mode** — Omit query to browse recent sessions with titles, previews, and timestamps ([#2533](https://github.com/NousResearch/hermes-agent/pull/2533))
- **Session config surfacing** on `/new`, `/reset`, and auto-reset ([#3321](https://github.com/NousResearch/hermes-agent/pull/3321))
- **Third-party session isolation** — `--source` flag for isolating sessions by origin ([#3255](https://github.com/NousResearch/hermes-agent/pull/3255))
- Add `/resume` CLI handler, session log truncation guard, `reopen_session` API ([#3315](https://github.com/NousResearch/hermes-agent/pull/3315))
- Clear compressor summary and turn counter on `/clear` and `/new` ([#3102](https://github.com/NousResearch/hermes-agent/pull/3102))
- Surface silent SessionDB failures that cause session data loss ([#2999](https://github.com/NousResearch/hermes-agent/pull/2999))
- Session search fallback preview on summarization failure ([#3478](https://github.com/NousResearch/hermes-agent/pull/3478))
- Prevent stale memory overwrites by flush agent ([#2687](https://github.com/NousResearch/hermes-agent/pull/2687))
### Context Compression
- Replace dead `summary_target_tokens` with ratio-based scaling ([#2554](https://github.com/NousResearch/hermes-agent/pull/2554))
- Expose `compression.target_ratio`, `protect_last_n`, and `threshold` in `DEFAULT_CONFIG` ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Restore sane defaults and cap summary at 12K tokens ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Preserve transcript on `/compress` and hygiene compression ([#3556](https://github.com/NousResearch/hermes-agent/pull/3556))
- Update context pressure warnings and token estimates after compaction ([untagged commit](https://github.com/NousResearch/hermes-agent))
### Architecture & Dependencies
- **Remove mini-swe-agent dependency** — Inline Docker and Modal backends directly ([#2804](https://github.com/NousResearch/hermes-agent/pull/2804))
- **Replace swe-rex with native Modal SDK** for Modal backend ([#3538](https://github.com/NousResearch/hermes-agent/pull/3538))
- **Plugin lifecycle hooks** — `pre_llm_call`, `post_llm_call`, `on_session_start`, `on_session_end` now fire in the agent loop ([#3542](https://github.com/NousResearch/hermes-agent/pull/3542))
- Fix plugin toolsets invisible in `hermes tools` and standalone processes ([#3457](https://github.com/NousResearch/hermes-agent/pull/3457))
- Consolidate `get_hermes_home()` and `parse_reasoning_effort()` ([#3062](https://github.com/NousResearch/hermes-agent/pull/3062))
- Remove unused Hermes-native PKCE OAuth flow ([#3107](https://github.com/NousResearch/hermes-agent/pull/3107))
- Remove ~100 unused imports across 55 files ([#3016](https://github.com/NousResearch/hermes-agent/pull/3016))
- Fix 154 f-strings, simplify getattr/URL patterns, remove dead code ([#3119](https://github.com/NousResearch/hermes-agent/pull/3119))
---
## 📱 Messaging Platforms (Gateway)
### Telegram
- **Private Chat Topics** — Project-based conversations with functional skill binding per topic, enabling isolated workflows within a single Telegram chat ([#3163](https://github.com/NousResearch/hermes-agent/pull/3163))
- **Auto-discover fallback IPs via DNS-over-HTTPS** when `api.telegram.org` is unreachable ([#3376](https://github.com/NousResearch/hermes-agent/pull/3376))
- **Configurable reply threading mode** ([#2907](https://github.com/NousResearch/hermes-agent/pull/2907))
- Fall back to no `thread_id` on "Message thread not found" BadRequest ([#3390](https://github.com/NousResearch/hermes-agent/pull/3390))
- Self-reschedule reconnect when `start_polling` fails after 502 ([#3268](https://github.com/NousResearch/hermes-agent/pull/3268))
### Discord
- Stop phantom typing indicator after agent turn completes ([#3003](https://github.com/NousResearch/hermes-agent/pull/3003))
### Slack
- Send tool call progress messages to correct Slack thread ([#3063](https://github.com/NousResearch/hermes-agent/pull/3063))
- Scope progress thread fallback to Slack only ([#3488](https://github.com/NousResearch/hermes-agent/pull/3488))
### WhatsApp
- Download documents, audio, and video media from messages ([#2978](https://github.com/NousResearch/hermes-agent/pull/2978))
### Matrix
- Add missing Matrix entry in `PLATFORMS` dict ([#3473](https://github.com/NousResearch/hermes-agent/pull/3473))
- Harden e2ee access-token handling ([#3562](https://github.com/NousResearch/hermes-agent/pull/3562))
- Add backoff for `SyncError` in sync loop ([#3280](https://github.com/NousResearch/hermes-agent/pull/3280))
### Signal
- Track SSE keepalive comments as connection activity ([#3316](https://github.com/NousResearch/hermes-agent/pull/3316))
### Email
- Prevent unbounded growth of `_seen_uids` in EmailAdapter ([#3490](https://github.com/NousResearch/hermes-agent/pull/3490))
### Gateway Core
- **Config-gated `/verbose` command** for messaging platforms — toggle tool output verbosity from chat ([#3262](https://github.com/NousResearch/hermes-agent/pull/3262))
- **Background review notifications** delivered to user chat ([#3293](https://github.com/NousResearch/hermes-agent/pull/3293))
- **Retry transient send failures** and notify user on exhaustion ([#3288](https://github.com/NousResearch/hermes-agent/pull/3288))
- Recover from hung agents — `/stop` hard-kills session lock ([#3104](https://github.com/NousResearch/hermes-agent/pull/3104))
- Thread-safe `SessionStore` — protect `_entries` with `threading.Lock` ([#3052](https://github.com/NousResearch/hermes-agent/pull/3052))
- Fix gateway token double-counting with cached agents — use absolute set instead of increment ([#3306](https://github.com/NousResearch/hermes-agent/pull/3306), [#3317](https://github.com/NousResearch/hermes-agent/pull/3317))
- Fingerprint full auth token in agent cache signature ([#3247](https://github.com/NousResearch/hermes-agent/pull/3247))
- Silence background agent terminal output ([#3297](https://github.com/NousResearch/hermes-agent/pull/3297))
- Include per-platform `ALLOW_ALL` and `SIGNAL_GROUP` in startup allowlist check ([#3313](https://github.com/NousResearch/hermes-agent/pull/3313))
- Include user-local bin paths in systemd unit PATH ([#3527](https://github.com/NousResearch/hermes-agent/pull/3527))
- Track background task references in `GatewayRunner` ([#3254](https://github.com/NousResearch/hermes-agent/pull/3254))
- Add request timeouts to HA, Email, Mattermost, SMS adapters ([#3258](https://github.com/NousResearch/hermes-agent/pull/3258))
- Add media download retry to Mattermost, Slack, and base cache ([#3323](https://github.com/NousResearch/hermes-agent/pull/3323))
- Detect virtualenv path instead of hardcoding `venv/` ([#2797](https://github.com/NousResearch/hermes-agent/pull/2797))
- Use `TERMINAL_CWD` for context file discovery, not process cwd ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Stop loading hermes repo AGENTS.md into gateway sessions (~10k wasted tokens) ([#2891](https://github.com/NousResearch/hermes-agent/pull/2891))
---
## 🖥️ CLI & User Experience
### Interactive CLI
- **Configurable busy input mode** + fix `/queue` always working ([#3298](https://github.com/NousResearch/hermes-agent/pull/3298))
- **Preserve user input on multiline paste** ([#3065](https://github.com/NousResearch/hermes-agent/pull/3065))
- **Tool generation callback** — streaming "preparing terminal…" updates during tool argument generation ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Show tool progress for substantive tools, not just "preparing" ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Buffer reasoning preview chunks and fix duplicate display ([#3013](https://github.com/NousResearch/hermes-agent/pull/3013))
- Prevent reasoning box from rendering 3x during tool-calling loops ([#3405](https://github.com/NousResearch/hermes-agent/pull/3405))
- Eliminate "Event loop is closed" / "Press ENTER to continue" during idle — three-layer fix with `neuter_async_httpx_del()`, custom exception handler, and stale client cleanup ([#3398](https://github.com/NousResearch/hermes-agent/pull/3398))
- Fix status bar shows 26K instead of 260K for token counts with trailing zeros ([#3024](https://github.com/NousResearch/hermes-agent/pull/3024))
- Fix status bar duplicates and degrades during long sessions ([#3291](https://github.com/NousResearch/hermes-agent/pull/3291))
- Refresh TUI before background task output to prevent status bar overlap ([#3048](https://github.com/NousResearch/hermes-agent/pull/3048))
- Suppress KawaiiSpinner animation under `patch_stdout` ([#2994](https://github.com/NousResearch/hermes-agent/pull/2994))
- Skip KawaiiSpinner when TUI handles tool progress ([#2973](https://github.com/NousResearch/hermes-agent/pull/2973))
- Guard `isatty()` against closed streams via `_is_tty` property ([#3056](https://github.com/NousResearch/hermes-agent/pull/3056))
- Ensure single closure of streaming boxes during tool generation ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Cap context pressure percentage at 100% in display ([#3480](https://github.com/NousResearch/hermes-agent/pull/3480))
- Clean up HTML error messages in CLI display ([#3069](https://github.com/NousResearch/hermes-agent/pull/3069))
- Show HTTP status code and 400 body in API error output ([#3096](https://github.com/NousResearch/hermes-agent/pull/3096))
- Extract useful info from HTML error pages, dump debug on max retries ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Prevent TypeError on startup when `base_url` is None ([#3068](https://github.com/NousResearch/hermes-agent/pull/3068))
- Prevent update crash in non-TTY environments ([#3094](https://github.com/NousResearch/hermes-agent/pull/3094))
- Handle EOFError in sessions delete/prune confirmation prompts ([#3101](https://github.com/NousResearch/hermes-agent/pull/3101))
- Catch KeyboardInterrupt during `flush_memories` on exit and in exit cleanup handlers ([#3025](https://github.com/NousResearch/hermes-agent/pull/3025), [#3257](https://github.com/NousResearch/hermes-agent/pull/3257))
- Guard `.strip()` against None values from YAML config ([#3552](https://github.com/NousResearch/hermes-agent/pull/3552))
- Guard `config.get()` against YAML null values to prevent AttributeError ([#3377](https://github.com/NousResearch/hermes-agent/pull/3377))
- Store asyncio task references to prevent GC mid-execution ([#3267](https://github.com/NousResearch/hermes-agent/pull/3267))
### Setup & Configuration
- Use explicit key mapping for returning-user menu dispatch instead of positional index ([#3083](https://github.com/NousResearch/hermes-agent/pull/3083))
- Use `sys.executable` for pip in update commands to fix PEP 668 ([#3099](https://github.com/NousResearch/hermes-agent/pull/3099))
- Harden `hermes update` against diverged history, non-main branches, and gateway edge cases ([#3492](https://github.com/NousResearch/hermes-agent/pull/3492))
- OpenClaw migration overwrites defaults and setup wizard skips imported sections — fixed ([#3282](https://github.com/NousResearch/hermes-agent/pull/3282))
- Stop recursive AGENTS.md walk, load top-level only ([#3110](https://github.com/NousResearch/hermes-agent/pull/3110))
- Add macOS Homebrew paths to browser and terminal PATH resolution ([#2713](https://github.com/NousResearch/hermes-agent/pull/2713))
- YAML boolean handling for `tool_progress` config ([#3300](https://github.com/NousResearch/hermes-agent/pull/3300))
- Reset default SOUL.md to baseline identity text ([#3159](https://github.com/NousResearch/hermes-agent/pull/3159))
- Reject relative cwd paths for container terminal backends ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Add explicit `hermes-api-server` toolset for API server platform ([#3304](https://github.com/NousResearch/hermes-agent/pull/3304))
- Reorder setup wizard providers — OpenRouter first ([untagged commit](https://github.com/NousResearch/hermes-agent))
---
## 🔧 Tool System
### API Server
- **Idempotency-Key support**, body size limit, and OpenAI error envelope ([#2903](https://github.com/NousResearch/hermes-agent/pull/2903))
- Allow Idempotency-Key in CORS headers ([#3530](https://github.com/NousResearch/hermes-agent/pull/3530))
- Cancel orphaned agent + true interrupt on SSE disconnect ([#3427](https://github.com/NousResearch/hermes-agent/pull/3427))
- Fix streaming breaks when agent makes tool calls ([#2985](https://github.com/NousResearch/hermes-agent/pull/2985))
### Terminal & File Operations
- Handle addition-only hunks in V4A patch parser ([#3325](https://github.com/NousResearch/hermes-agent/pull/3325))
- Exponential backoff for persistent shell polling ([#2996](https://github.com/NousResearch/hermes-agent/pull/2996))
- Add timeout to subprocess calls in `context_references` ([#3469](https://github.com/NousResearch/hermes-agent/pull/3469))
### Browser & Vision
- Handle 402 insufficient credits error in vision tool ([#2802](https://github.com/NousResearch/hermes-agent/pull/2802))
- Fix `browser_vision` ignores `auxiliary.vision.timeout` config ([#2901](https://github.com/NousResearch/hermes-agent/pull/2901))
- Make browser command timeout configurable via config.yaml ([#2801](https://github.com/NousResearch/hermes-agent/pull/2801))
### MCP
- MCP toolset resolution for runtime and config ([#3252](https://github.com/NousResearch/hermes-agent/pull/3252))
- Add MCP tool name collision protection ([#3077](https://github.com/NousResearch/hermes-agent/pull/3077))
### Auxiliary LLM
- Guard aux LLM calls against None content + reasoning fallback + retry ([#3449](https://github.com/NousResearch/hermes-agent/pull/3449))
- Catch ImportError from `build_anthropic_client` in vision auto-detection ([#3312](https://github.com/NousResearch/hermes-agent/pull/3312))
### Other Tools
- Add request timeouts to `send_message_tool` HTTP calls ([#3162](https://github.com/NousResearch/hermes-agent/pull/3162)) by @memosr
- Auto-repair `jobs.json` with invalid control characters ([#3537](https://github.com/NousResearch/hermes-agent/pull/3537))
- Enable fine-grained tool streaming for Claude/OpenRouter ([#3497](https://github.com/NousResearch/hermes-agent/pull/3497))
---
## 🧩 Skills Ecosystem
### Skills System
- **Env var passthrough** for skills and user config — skills can declare environment variables to pass through ([#2807](https://github.com/NousResearch/hermes-agent/pull/2807))
- Cache skills prompt with shared `skill_utils` module for faster TTFT ([#3421](https://github.com/NousResearch/hermes-agent/pull/3421))
- Avoid redundant file re-read for skill conditions ([#2992](https://github.com/NousResearch/hermes-agent/pull/2992))
- Use Git Trees API to prevent silent subdirectory loss during install ([#2995](https://github.com/NousResearch/hermes-agent/pull/2995))
- Fix skills-sh install for deeply nested repo structures ([#2980](https://github.com/NousResearch/hermes-agent/pull/2980))
- Handle null metadata in skill frontmatter ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Preserve trust for skills-sh identifiers + reduce resolution churn ([#3251](https://github.com/NousResearch/hermes-agent/pull/3251))
- Agent-created skills were incorrectly treated as untrusted community content — fixed ([untagged commit](https://github.com/NousResearch/hermes-agent))
### New Skills
- **G0DM0D3 godmode jailbreaking skill** + docs ([#3157](https://github.com/NousResearch/hermes-agent/pull/3157))
- **Docker management skill** added to optional-skills ([#3060](https://github.com/NousResearch/hermes-agent/pull/3060))
- **OpenClaw migration v2** — 17 new modules, terminal recap for migrating from OpenClaw to Hermes ([#2906](https://github.com/NousResearch/hermes-agent/pull/2906))
---
## 🔒 Security & Reliability
### Security Hardening
- **SSRF protection** added to `browser_navigate` ([#3058](https://github.com/NousResearch/hermes-agent/pull/3058))
- **SSRF protection** added to `vision_tools` and `web_tools` (hardened) ([#2679](https://github.com/NousResearch/hermes-agent/pull/2679))
- **Restrict subagent toolsets** to parent's enabled set ([#3269](https://github.com/NousResearch/hermes-agent/pull/3269))
- **Prevent zip-slip path traversal** in self-update ([#3250](https://github.com/NousResearch/hermes-agent/pull/3250))
- **Prevent shell injection** in `_expand_path` via `~user` path suffix ([#2685](https://github.com/NousResearch/hermes-agent/pull/2685))
- **Normalize input** before dangerous command detection ([#3260](https://github.com/NousResearch/hermes-agent/pull/3260))
- Make tirith block verdicts approvable instead of hard-blocking ([#3428](https://github.com/NousResearch/hermes-agent/pull/3428))
- Remove compromised `litellm`/`typer`/`platformdirs` from deps ([#2796](https://github.com/NousResearch/hermes-agent/pull/2796))
- Pin all dependency version ranges ([#2810](https://github.com/NousResearch/hermes-agent/pull/2810))
- Regenerate `uv.lock` with hashes, use lockfile in setup ([#2812](https://github.com/NousResearch/hermes-agent/pull/2812))
- Bump dependencies to fix CVEs + regenerate `uv.lock` ([#3073](https://github.com/NousResearch/hermes-agent/pull/3073))
- Supply chain audit CI workflow for PR scanning ([#2816](https://github.com/NousResearch/hermes-agent/pull/2816))
### Reliability
- **SQLite WAL write-lock contention** causing 15-20s TUI freeze — fixed ([#3385](https://github.com/NousResearch/hermes-agent/pull/3385))
- **SQLite concurrency hardening** + session transcript integrity ([#3249](https://github.com/NousResearch/hermes-agent/pull/3249))
- Prevent recurring cron job re-fire on gateway crash/restart loop ([#3396](https://github.com/NousResearch/hermes-agent/pull/3396))
- Mark cron session as ended after job completes ([#2998](https://github.com/NousResearch/hermes-agent/pull/2998))
---
## ⚡ Performance
- **TTFT startup optimizations** — salvaged easy-win startup improvements ([#3395](https://github.com/NousResearch/hermes-agent/pull/3395))
- Cache skills prompt with shared `skill_utils` module ([#3421](https://github.com/NousResearch/hermes-agent/pull/3421))
- Avoid redundant file re-read for skill conditions in prompt builder ([#2992](https://github.com/NousResearch/hermes-agent/pull/2992))
---
## 🐛 Notable Bug Fixes
- Fix gateway token double-counting with cached agents ([#3306](https://github.com/NousResearch/hermes-agent/pull/3306), [#3317](https://github.com/NousResearch/hermes-agent/pull/3317))
- Fix "Event loop is closed" / "Press ENTER to continue" during idle sessions ([#3398](https://github.com/NousResearch/hermes-agent/pull/3398))
- Fix reasoning box rendering 3x during tool-calling loops ([#3405](https://github.com/NousResearch/hermes-agent/pull/3405))
- Fix status bar shows 26K instead of 260K for token counts ([#3024](https://github.com/NousResearch/hermes-agent/pull/3024))
- Fix `/queue` always working regardless of config ([#3298](https://github.com/NousResearch/hermes-agent/pull/3298))
- Fix phantom Discord typing indicator after agent turn ([#3003](https://github.com/NousResearch/hermes-agent/pull/3003))
- Fix Slack progress messages appearing in wrong thread ([#3063](https://github.com/NousResearch/hermes-agent/pull/3063))
- Fix WhatsApp media downloads (documents, audio, video) ([#2978](https://github.com/NousResearch/hermes-agent/pull/2978))
- Fix Telegram "Message thread not found" killing progress messages ([#3390](https://github.com/NousResearch/hermes-agent/pull/3390))
- Fix OpenClaw migration overwriting defaults ([#3282](https://github.com/NousResearch/hermes-agent/pull/3282))
- Fix returning-user setup menu dispatching wrong section ([#3083](https://github.com/NousResearch/hermes-agent/pull/3083))
- Fix `hermes update` PEP 668 "externally-managed-environment" error ([#3099](https://github.com/NousResearch/hermes-agent/pull/3099))
- Fix subagents hitting `max_iterations` prematurely via shared budget ([#3004](https://github.com/NousResearch/hermes-agent/pull/3004))
- Fix YAML boolean handling for `tool_progress` config ([#3300](https://github.com/NousResearch/hermes-agent/pull/3300))
- Fix `config.get()` crashes on YAML null values ([#3377](https://github.com/NousResearch/hermes-agent/pull/3377))
- Fix `.strip()` crash on None values from YAML config ([#3552](https://github.com/NousResearch/hermes-agent/pull/3552))
- Fix hung agents on gateway — `/stop` now hard-kills session lock ([#3104](https://github.com/NousResearch/hermes-agent/pull/3104))
- Fix `_custom` provider silently remapped to `openrouter` ([#2792](https://github.com/NousResearch/hermes-agent/pull/2792))
- Fix Matrix missing from `PLATFORMS` dict ([#3473](https://github.com/NousResearch/hermes-agent/pull/3473))
- Fix Email adapter unbounded `_seen_uids` growth ([#3490](https://github.com/NousResearch/hermes-agent/pull/3490))
---
## 🧪 Testing
- Pin `agent-client-protocol` < 0.9 to handle breaking upstream release ([#3320](https://github.com/NousResearch/hermes-agent/pull/3320))
- Catch anthropic ImportError in vision auto-detection tests ([#3312](https://github.com/NousResearch/hermes-agent/pull/3312))
- Update retry-exhaust test for new graceful return behavior ([#3320](https://github.com/NousResearch/hermes-agent/pull/3320))
- Add regression tests for null metadata frontmatter ([untagged commit](https://github.com/NousResearch/hermes-agent))
---
## 📚 Documentation
- Update all docs for `/model` command overhaul and custom provider support ([#2800](https://github.com/NousResearch/hermes-agent/pull/2800))
- Fix stale and incorrect documentation across 18 files ([#2805](https://github.com/NousResearch/hermes-agent/pull/2805))
- Document 9 previously undocumented features ([#2814](https://github.com/NousResearch/hermes-agent/pull/2814))
- Add missing skills, CLI commands, and messaging env vars to docs ([#2809](https://github.com/NousResearch/hermes-agent/pull/2809))
- Fix api-server response storage documentation — SQLite, not in-memory ([#2819](https://github.com/NousResearch/hermes-agent/pull/2819))
- Quote pip install extras to fix zsh glob errors ([#2815](https://github.com/NousResearch/hermes-agent/pull/2815))
- Unify hooks documentation — add plugin hooks to hooks page, add `session:end` event ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Clarify two-mode behavior in `session_search` schema description ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Fix Discord Public Bot setting for Discord-provided invite link ([#3519](https://github.com/NousResearch/hermes-agent/pull/3519)) by @mehmoodosman
- Revise v0.4.0 changelog — fix feature attribution, reorder sections ([untagged commit](https://github.com/NousResearch/hermes-agent))
---
## 👥 Contributors
### Core
- **@teknium1** — 157 PRs covering the full scope of this release
### Community Contributors
- **@alt-glitch** (Siddharth Balyan) — 2 PRs: Nix flake with uv2nix build, NixOS module, and persistent container mode ([#20](https://github.com/NousResearch/hermes-agent/pull/20)); auto-generated config keys and suffix PATHs for Nix builds ([#3061](https://github.com/NousResearch/hermes-agent/pull/3061), [#3274](https://github.com/NousResearch/hermes-agent/pull/3274))
- **@ctlst** — 1 PR: Prevent AsyncOpenAI/httpx cross-loop deadlock in gateway mode ([#2701](https://github.com/NousResearch/hermes-agent/pull/2701))
- **@memosr** (memosr.eth) — 1 PR: Add request timeouts to `send_message_tool` HTTP calls ([#3162](https://github.com/NousResearch/hermes-agent/pull/3162))
- **@mehmoodosman** (Osman Mehmood) — 1 PR: Fix Discord docs for Public Bot setting ([#3519](https://github.com/NousResearch/hermes-agent/pull/3519))
### All Contributors
@alt-glitch, @ctlst, @mehmoodosman, @memosr, @teknium1
---
**Full Changelog**: [v2026.3.23...v2026.3.28](https://github.com/NousResearch/hermes-agent/compare/v2026.3.23...v2026.3.28)

249
RELEASE_v0.6.0.md Normal file
View File

@@ -0,0 +1,249 @@
# Hermes Agent v0.6.0 (v2026.3.30)
**Release Date:** March 30, 2026
> The multi-instance release — Profiles for running isolated agent instances, MCP server mode, Docker container, fallback provider chains, two new messaging platforms (Feishu/Lark and WeCom), Telegram webhook mode, Slack multi-workspace OAuth, 95 PRs and 16 resolved issues in 2 days.
---
## ✨ Highlights
- **Profiles — Multi-Instance Hermes** — Run multiple isolated Hermes instances from the same installation. Each profile gets its own config, memory, sessions, skills, and gateway service. Create with `hermes profile create`, switch with `hermes -p <name>`, export/import for sharing. Full token-lock isolation prevents two profiles from using the same bot credential. ([#3681](https://github.com/NousResearch/hermes-agent/pull/3681))
- **MCP Server Mode** — Expose Hermes conversations and sessions to any MCP-compatible client (Claude Desktop, Cursor, VS Code, etc.) via `hermes mcp serve`. Browse conversations, read messages, search across sessions, and manage attachments — all through the Model Context Protocol. Supports both stdio and Streamable HTTP transports. ([#3795](https://github.com/NousResearch/hermes-agent/pull/3795))
- **Docker Container** — Official Dockerfile for running Hermes Agent in a container. Supports both CLI and gateway modes with volume-mounted config. ([#3668](https://github.com/NousResearch/hermes-agent/pull/3668), closes [#850](https://github.com/NousResearch/hermes-agent/issues/850))
- **Ordered Fallback Provider Chain** — Configure multiple inference providers with automatic failover. When your primary provider returns errors or is unreachable, Hermes automatically tries the next provider in the chain. Configure via `fallback_providers` in config.yaml. ([#3813](https://github.com/NousResearch/hermes-agent/pull/3813), closes [#1734](https://github.com/NousResearch/hermes-agent/issues/1734))
- **Feishu/Lark Platform Support** — Full gateway adapter for Feishu (飞书) and Lark with event subscriptions, message cards, group chat, image/file attachments, and interactive card callbacks. ([#3799](https://github.com/NousResearch/hermes-agent/pull/3799), [#3817](https://github.com/NousResearch/hermes-agent/pull/3817), closes [#1788](https://github.com/NousResearch/hermes-agent/issues/1788))
- **WeCom (Enterprise WeChat) Platform Support** — New gateway adapter for WeCom (企业微信) with text/image/voice messages, group chats, and callback verification. ([#3847](https://github.com/NousResearch/hermes-agent/pull/3847))
- **Slack Multi-Workspace OAuth** — Connect a single Hermes gateway to multiple Slack workspaces via OAuth token file. Each workspace gets its own bot token, resolved dynamically per incoming event. ([#3903](https://github.com/NousResearch/hermes-agent/pull/3903))
- **Telegram Webhook Mode & Group Controls** — Run the Telegram adapter in webhook mode as an alternative to polling — faster response times and better for production deployments behind a reverse proxy. New group mention gating controls when the bot responds: always, only when @mentioned, or via regex triggers. ([#3880](https://github.com/NousResearch/hermes-agent/pull/3880), [#3870](https://github.com/NousResearch/hermes-agent/pull/3870))
- **Exa Search Backend** — Add Exa as an alternative web search and content extraction backend alongside Firecrawl and DuckDuckGo. Set `EXA_API_KEY` and configure as preferred backend. ([#3648](https://github.com/NousResearch/hermes-agent/pull/3648))
- **Skills & Credentials on Remote Backends** — Mount skill directories and credential files into Modal and Docker containers, so remote terminal sessions have access to the same skills and secrets as local execution. ([#3890](https://github.com/NousResearch/hermes-agent/pull/3890), [#3671](https://github.com/NousResearch/hermes-agent/pull/3671), closes [#3665](https://github.com/NousResearch/hermes-agent/issues/3665), [#3433](https://github.com/NousResearch/hermes-agent/issues/3433))
---
## 🏗️ Core Agent & Architecture
### Provider & Model Support
- **Ordered fallback provider chain** — automatic failover across multiple configured providers ([#3813](https://github.com/NousResearch/hermes-agent/pull/3813))
- **Fix api_mode on provider switch** — switching providers via `hermes model` now correctly clears stale `api_mode` instead of hardcoding `chat_completions`, fixing 404s for providers with Anthropic-compatible endpoints ([#3726](https://github.com/NousResearch/hermes-agent/pull/3726), [#3857](https://github.com/NousResearch/hermes-agent/pull/3857), closes [#3685](https://github.com/NousResearch/hermes-agent/issues/3685))
- **Stop silent OpenRouter fallback** — when no provider is configured, Hermes now raises a clear error instead of silently routing to OpenRouter ([#3807](https://github.com/NousResearch/hermes-agent/pull/3807), [#3862](https://github.com/NousResearch/hermes-agent/pull/3862))
- **Gemini 3.1 preview models** — added to OpenRouter and Nous Portal catalogs ([#3803](https://github.com/NousResearch/hermes-agent/pull/3803), closes [#3753](https://github.com/NousResearch/hermes-agent/issues/3753))
- **Gemini direct API context length** — full context length resolution for direct Google AI endpoints ([#3876](https://github.com/NousResearch/hermes-agent/pull/3876))
- **gpt-5.4-mini** added to Codex fallback catalog ([#3855](https://github.com/NousResearch/hermes-agent/pull/3855))
- **Curated model lists preferred** over live API probe when the probe returns fewer models ([#3856](https://github.com/NousResearch/hermes-agent/pull/3856), [#3867](https://github.com/NousResearch/hermes-agent/pull/3867))
- **User-friendly 429 rate limit messages** with Retry-After countdown ([#3809](https://github.com/NousResearch/hermes-agent/pull/3809))
- **Auxiliary client placeholder key** for local servers without auth requirements ([#3842](https://github.com/NousResearch/hermes-agent/pull/3842))
- **INFO-level logging** for auxiliary provider resolution ([#3866](https://github.com/NousResearch/hermes-agent/pull/3866))
### Agent Loop & Conversation
- **Subagent status reporting** — reports `completed` status when summary exists instead of generic failure ([#3829](https://github.com/NousResearch/hermes-agent/pull/3829))
- **Session log file updated during compression** — prevents stale file references after context compression ([#3835](https://github.com/NousResearch/hermes-agent/pull/3835))
- **Omit empty tools param** — sends no `tools` parameter when empty instead of `None`, fixing compatibility with strict providers ([#3820](https://github.com/NousResearch/hermes-agent/pull/3820))
### Profiles & Multi-Instance
- **Profiles system** — `hermes profile create/list/switch/delete/export/import/rename`. Each profile gets isolated HERMES_HOME, gateway service, CLI wrapper. Token locks prevent credential collisions. Tab completion for profile names. ([#3681](https://github.com/NousResearch/hermes-agent/pull/3681))
- **Profile-aware display paths** — all user-facing `~/.hermes` paths replaced with `display_hermes_home()` to show the correct profile directory ([#3623](https://github.com/NousResearch/hermes-agent/pull/3623))
- **Lazy display_hermes_home imports** — prevents `ImportError` during `hermes update` when modules cache stale bytecode ([#3776](https://github.com/NousResearch/hermes-agent/pull/3776))
- **HERMES_HOME for protected paths** — `.env` write-deny path now respects HERMES_HOME instead of hardcoded `~/.hermes` ([#3840](https://github.com/NousResearch/hermes-agent/pull/3840))
---
## 📱 Messaging Platforms (Gateway)
### New Platforms
- **Feishu/Lark** — Full adapter with event subscriptions, message cards, group chat, image/file attachments, interactive card callbacks ([#3799](https://github.com/NousResearch/hermes-agent/pull/3799), [#3817](https://github.com/NousResearch/hermes-agent/pull/3817))
- **WeCom (Enterprise WeChat)** — Text/image/voice messages, group chats, callback verification ([#3847](https://github.com/NousResearch/hermes-agent/pull/3847))
### Telegram
- **Webhook mode** — run as webhook endpoint instead of polling for production deployments ([#3880](https://github.com/NousResearch/hermes-agent/pull/3880))
- **Group mention gating & regex triggers** — configurable bot response behavior in groups: always, @mention-only, or regex-matched ([#3870](https://github.com/NousResearch/hermes-agent/pull/3870))
- **Gracefully handle deleted reply targets** — no more crashes when the message being replied to was deleted ([#3858](https://github.com/NousResearch/hermes-agent/pull/3858), closes [#3229](https://github.com/NousResearch/hermes-agent/issues/3229))
### Discord
- **Message processing reactions** — adds a reaction emoji while processing and removes it when done, giving visual feedback in channels ([#3871](https://github.com/NousResearch/hermes-agent/pull/3871))
- **DISCORD_IGNORE_NO_MENTION** — skip messages that @mention other users/bots but not Hermes ([#3640](https://github.com/NousResearch/hermes-agent/pull/3640))
- **Clean up deferred "thinking..."** — properly removes the "thinking..." indicator after slash commands complete ([#3674](https://github.com/NousResearch/hermes-agent/pull/3674), closes [#3595](https://github.com/NousResearch/hermes-agent/issues/3595))
### Slack
- **Multi-workspace OAuth** — connect to multiple Slack workspaces from a single gateway via OAuth token file ([#3903](https://github.com/NousResearch/hermes-agent/pull/3903))
### WhatsApp
- **Persistent aiohttp session** — reuse HTTP sessions across requests instead of creating new ones per message ([#3818](https://github.com/NousResearch/hermes-agent/pull/3818))
- **LID↔phone alias resolution** — correctly match Linked ID and phone number formats in allowlists ([#3830](https://github.com/NousResearch/hermes-agent/pull/3830))
- **Skip reply prefix in bot mode** — cleaner message formatting when running as a WhatsApp bot ([#3931](https://github.com/NousResearch/hermes-agent/pull/3931))
### Matrix
- **Native voice messages via MSC3245** — send voice messages as proper Matrix voice events instead of file attachments ([#3877](https://github.com/NousResearch/hermes-agent/pull/3877))
### Mattermost
- **Configurable mention behavior** — respond to messages without requiring @mention ([#3664](https://github.com/NousResearch/hermes-agent/pull/3664))
### Signal
- **URL-encode phone numbers** and correct attachment RPC parameter — fixes delivery failures with certain phone number formats ([#3670](https://github.com/NousResearch/hermes-agent/pull/3670)) — @kshitijk4poor
### Email
- **Close SMTP/IMAP connections on failure** — prevents connection leaks during error scenarios ([#3804](https://github.com/NousResearch/hermes-agent/pull/3804))
### Gateway Core
- **Atomic config writes** — use atomic file writes for config.yaml to prevent data loss during crashes ([#3800](https://github.com/NousResearch/hermes-agent/pull/3800))
- **Home channel env overrides** — apply environment variable overrides for home channels consistently ([#3796](https://github.com/NousResearch/hermes-agent/pull/3796), [#3808](https://github.com/NousResearch/hermes-agent/pull/3808))
- **Replace print() with logger** — BasePlatformAdapter now uses proper logging instead of print statements ([#3669](https://github.com/NousResearch/hermes-agent/pull/3669))
- **Cron delivery labels** — resolve human-friendly delivery labels via channel directory ([#3860](https://github.com/NousResearch/hermes-agent/pull/3860), closes [#1945](https://github.com/NousResearch/hermes-agent/issues/1945))
- **Cron [SILENT] tightening** — prevent agents from prefixing reports with [SILENT] to suppress delivery ([#3901](https://github.com/NousResearch/hermes-agent/pull/3901))
- **Background task media delivery** and vision download timeout fixes ([#3919](https://github.com/NousResearch/hermes-agent/pull/3919))
- **Boot-md hook** — example built-in hook to run a BOOT.md file on gateway startup ([#3733](https://github.com/NousResearch/hermes-agent/pull/3733))
---
## 🖥️ CLI & User Experience
### Interactive CLI
- **Configurable tool preview length** — show full file paths by default instead of truncating at 40 chars ([#3841](https://github.com/NousResearch/hermes-agent/pull/3841))
- **Tool token context display** — `hermes tools` checklist now shows estimated token cost per toolset ([#3805](https://github.com/NousResearch/hermes-agent/pull/3805))
- **/bg spinner TUI fix** — route background task spinner through the TUI widget to prevent status bar collision ([#3643](https://github.com/NousResearch/hermes-agent/pull/3643))
- **Prevent status bar wrapping** into duplicate rows ([#3883](https://github.com/NousResearch/hermes-agent/pull/3883)) — @kshitijk4poor
- **Handle closed stdout ValueError** in safe print paths — fixes crashes when stdout is closed during gateway thread shutdown ([#3843](https://github.com/NousResearch/hermes-agent/pull/3843), closes [#3534](https://github.com/NousResearch/hermes-agent/issues/3534))
- **Remove input() from /tools disable** — eliminates freeze in terminal when disabling tools ([#3918](https://github.com/NousResearch/hermes-agent/pull/3918))
- **TTY guard for interactive CLI commands** — prevent CPU spin when launched without a terminal ([#3933](https://github.com/NousResearch/hermes-agent/pull/3933))
- **Argparse entrypoint** — use argparse in the top-level launcher for cleaner error handling ([#3874](https://github.com/NousResearch/hermes-agent/pull/3874))
- **Lazy-initialized tools show yellow** in banner instead of red, reducing false alarm about "missing" tools ([#3822](https://github.com/NousResearch/hermes-agent/pull/3822))
- **Honcho tools shown in banner** when configured ([#3810](https://github.com/NousResearch/hermes-agent/pull/3810))
### Setup & Configuration
- **Auto-install matrix-nio** during `hermes setup` when Matrix is selected ([#3802](https://github.com/NousResearch/hermes-agent/pull/3802), [#3873](https://github.com/NousResearch/hermes-agent/pull/3873))
- **Session export stdout support** — export sessions to stdout with `-` for piping ([#3641](https://github.com/NousResearch/hermes-agent/pull/3641), closes [#3609](https://github.com/NousResearch/hermes-agent/issues/3609))
- **Configurable approval timeouts** — set how long dangerous command approval prompts wait before auto-denying ([#3886](https://github.com/NousResearch/hermes-agent/pull/3886), closes [#3765](https://github.com/NousResearch/hermes-agent/issues/3765))
- **Clear __pycache__ during update** — prevents stale bytecode ImportError after `hermes update` ([#3819](https://github.com/NousResearch/hermes-agent/pull/3819))
---
## 🔧 Tool System
### MCP
- **MCP Server Mode** — `hermes mcp serve` exposes conversations, sessions, and attachments to MCP clients via stdio or Streamable HTTP ([#3795](https://github.com/NousResearch/hermes-agent/pull/3795))
- **Dynamic tool discovery** — respond to `notifications/tools/list_changed` events to pick up new tools from MCP servers without reconnecting ([#3812](https://github.com/NousResearch/hermes-agent/pull/3812))
- **Non-deprecated HTTP transport** — switched from `sse_client` to `streamable_http_client` ([#3646](https://github.com/NousResearch/hermes-agent/pull/3646))
### Web Tools
- **Exa search backend** — alternative to Firecrawl and DuckDuckGo for web search and extraction ([#3648](https://github.com/NousResearch/hermes-agent/pull/3648))
### Browser
- **Guard against None LLM responses** in browser snapshot and vision tools ([#3642](https://github.com/NousResearch/hermes-agent/pull/3642))
### Terminal & Remote Backends
- **Mount skill directories** into Modal and Docker containers ([#3890](https://github.com/NousResearch/hermes-agent/pull/3890))
- **Mount credential files** into remote backends with mtime+size caching ([#3671](https://github.com/NousResearch/hermes-agent/pull/3671))
- **Preserve partial output** when commands time out instead of losing everything ([#3868](https://github.com/NousResearch/hermes-agent/pull/3868))
- **Stop marking persisted env vars as missing** on remote backends ([#3650](https://github.com/NousResearch/hermes-agent/pull/3650))
### Audio
- **.aac format support** in transcription tool ([#3865](https://github.com/NousResearch/hermes-agent/pull/3865), closes [#1963](https://github.com/NousResearch/hermes-agent/issues/1963))
- **Audio download retry** — retry logic for `cache_audio_from_url` matching the existing image download pattern ([#3401](https://github.com/NousResearch/hermes-agent/pull/3401)) — @binhnt92
### Vision
- **Reject non-image files** and enforce website-only policy for vision analysis ([#3845](https://github.com/NousResearch/hermes-agent/pull/3845))
### Tool Schema
- **Ensure name field** always present in tool definitions, fixing `KeyError: 'name'` crashes ([#3811](https://github.com/NousResearch/hermes-agent/pull/3811), closes [#3729](https://github.com/NousResearch/hermes-agent/issues/3729))
### ACP (Editor Integration)
- **Complete session management surface** for VS Code/Zed/JetBrains clients — proper task lifecycle, cancel support, session persistence ([#3675](https://github.com/NousResearch/hermes-agent/pull/3675))
---
## 🧩 Skills & Plugins
### Skills System
- **External skill directories** — configure additional skill directories via `skills.external_dirs` in config.yaml ([#3678](https://github.com/NousResearch/hermes-agent/pull/3678))
- **Category path traversal blocked** — prevents `../` attacks in skill category names ([#3844](https://github.com/NousResearch/hermes-agent/pull/3844))
- **parallel-cli moved to optional-skills** — reduces default skill footprint ([#3673](https://github.com/NousResearch/hermes-agent/pull/3673)) — @kshitijk4poor
### New Skills
- **memento-flashcards** — spaced repetition flashcard system ([#3827](https://github.com/NousResearch/hermes-agent/pull/3827))
- **songwriting-and-ai-music** — songwriting craft and AI music generation prompts ([#3834](https://github.com/NousResearch/hermes-agent/pull/3834))
- **SiYuan Note** — integration with SiYuan note-taking app ([#3742](https://github.com/NousResearch/hermes-agent/pull/3742))
- **Scrapling** — web scraping skill using Scrapling library ([#3742](https://github.com/NousResearch/hermes-agent/pull/3742))
- **one-three-one-rule** — communication framework skill ([#3797](https://github.com/NousResearch/hermes-agent/pull/3797))
### Plugin System
- **Plugin enable/disable commands** — `hermes plugins enable/disable <name>` for managing plugin state without removing them ([#3747](https://github.com/NousResearch/hermes-agent/pull/3747))
- **Plugin message injection** — plugins can now inject messages into the conversation stream on behalf of the user via `ctx.inject_message()` ([#3778](https://github.com/NousResearch/hermes-agent/pull/3778)) — @winglian
- **Honcho self-hosted support** — allow local Honcho instances without requiring an API key ([#3644](https://github.com/NousResearch/hermes-agent/pull/3644))
---
## 🔒 Security & Reliability
### Security Hardening
- **Hardened dangerous command detection** — expanded pattern matching for risky shell commands and added file tool path guards for sensitive locations (`/etc/`, `/boot/`, docker.sock) ([#3872](https://github.com/NousResearch/hermes-agent/pull/3872))
- **Sensitive path write checks** in approval system — catch writes to system config files through file tools, not just terminal ([#3859](https://github.com/NousResearch/hermes-agent/pull/3859))
- **Secret redaction expansion** — now covers ElevenLabs, Tavily, and Exa API keys ([#3920](https://github.com/NousResearch/hermes-agent/pull/3920))
- **Vision file rejection** — reject non-image files passed to vision analysis to prevent information disclosure ([#3845](https://github.com/NousResearch/hermes-agent/pull/3845))
- **Category path traversal blocking** — prevent directory traversal in skill category names ([#3844](https://github.com/NousResearch/hermes-agent/pull/3844))
### Reliability
- **Atomic config.yaml writes** — prevent data loss during gateway crashes ([#3800](https://github.com/NousResearch/hermes-agent/pull/3800))
- **Clear __pycache__ on update** — prevent stale bytecode from causing ImportError after updates ([#3819](https://github.com/NousResearch/hermes-agent/pull/3819))
- **Lazy imports for update safety** — prevent ImportError chains during `hermes update` when modules reference new functions ([#3776](https://github.com/NousResearch/hermes-agent/pull/3776))
- **Restore terminalbench2 from patch corruption** — recovered file damaged by patch tool's secret redaction ([#3801](https://github.com/NousResearch/hermes-agent/pull/3801))
- **Terminal timeout preserves partial output** — no more lost command output on timeout ([#3868](https://github.com/NousResearch/hermes-agent/pull/3868))
---
## 🐛 Notable Bug Fixes
- **OpenClaw migration model config overwrite** — migration no longer overwrites model config dict with a string ([#3924](https://github.com/NousResearch/hermes-agent/pull/3924)) — @0xbyt4
- **OpenClaw migration expanded** — covers full data footprint including sessions, cron, memory ([#3869](https://github.com/NousResearch/hermes-agent/pull/3869))
- **Telegram deleted reply targets** — gracefully handle replies to deleted messages instead of crashing ([#3858](https://github.com/NousResearch/hermes-agent/pull/3858))
- **Discord "thinking..." persistence** — properly cleans up deferred response indicators ([#3674](https://github.com/NousResearch/hermes-agent/pull/3674))
- **WhatsApp LID↔phone aliases** — fixes allowlist matching failures with Linked ID format ([#3830](https://github.com/NousResearch/hermes-agent/pull/3830))
- **Signal URL-encoded phone numbers** — fixes delivery failures with certain formats ([#3670](https://github.com/NousResearch/hermes-agent/pull/3670))
- **Email connection leaks** — properly close SMTP/IMAP connections on error ([#3804](https://github.com/NousResearch/hermes-agent/pull/3804))
- **_safe_print ValueError** — no more gateway thread crashes on closed stdout ([#3843](https://github.com/NousResearch/hermes-agent/pull/3843))
- **Tool schema KeyError 'name'** — ensure name field always present in tool definitions ([#3811](https://github.com/NousResearch/hermes-agent/pull/3811))
- **api_mode stale on provider switch** — correctly clear when switching providers via `hermes model` ([#3857](https://github.com/NousResearch/hermes-agent/pull/3857))
---
## 🧪 Testing
- Resolved 10+ CI failures across hooks, tiktoken, plugins, and skill tests ([#3848](https://github.com/NousResearch/hermes-agent/pull/3848), [#3721](https://github.com/NousResearch/hermes-agent/pull/3721), [#3936](https://github.com/NousResearch/hermes-agent/pull/3936))
---
## 📚 Documentation
- **Comprehensive OpenClaw migration guide** — step-by-step guide for migrating from OpenClaw/Claw3D to Hermes Agent ([#3864](https://github.com/NousResearch/hermes-agent/pull/3864), [#3900](https://github.com/NousResearch/hermes-agent/pull/3900))
- **Credential file passthrough docs** — document how to forward credential files and env vars to remote backends ([#3677](https://github.com/NousResearch/hermes-agent/pull/3677))
- **DuckDuckGo requirements clarified** — note runtime dependency on duckduckgo-search package ([#3680](https://github.com/NousResearch/hermes-agent/pull/3680))
- **Skills catalog updated** — added red-teaming category and optional skills listing ([#3745](https://github.com/NousResearch/hermes-agent/pull/3745))
- **Feishu docs MDX fix** — escape angle-bracket URLs that break Docusaurus build ([#3902](https://github.com/NousResearch/hermes-agent/pull/3902))
---
## 👥 Contributors
### Core
- **@teknium1** — 90 PRs across all subsystems
### Community Contributors
- **@kshitijk4poor** — 3 PRs: Signal phone number fix ([#3670](https://github.com/NousResearch/hermes-agent/pull/3670)), parallel-cli to optional-skills ([#3673](https://github.com/NousResearch/hermes-agent/pull/3673)), status bar wrapping fix ([#3883](https://github.com/NousResearch/hermes-agent/pull/3883))
- **@winglian** — 1 PR: Plugin message injection interface ([#3778](https://github.com/NousResearch/hermes-agent/pull/3778))
- **@binhnt92** — 1 PR: Audio download retry logic ([#3401](https://github.com/NousResearch/hermes-agent/pull/3401))
- **@0xbyt4** — 1 PR: OpenClaw migration model config fix ([#3924](https://github.com/NousResearch/hermes-agent/pull/3924))
### Issues Resolved from Community
@Material-Scientist ([#850](https://github.com/NousResearch/hermes-agent/issues/850)), @hanxu98121 ([#1734](https://github.com/NousResearch/hermes-agent/issues/1734)), @penwyp ([#1788](https://github.com/NousResearch/hermes-agent/issues/1788)), @dan-and ([#1945](https://github.com/NousResearch/hermes-agent/issues/1945)), @AdrianScott ([#1963](https://github.com/NousResearch/hermes-agent/issues/1963)), @clawdbot47 ([#3229](https://github.com/NousResearch/hermes-agent/issues/3229)), @alanfwilliams ([#3404](https://github.com/NousResearch/hermes-agent/issues/3404)), @kentimsit ([#3433](https://github.com/NousResearch/hermes-agent/issues/3433)), @hayka-pacha ([#3534](https://github.com/NousResearch/hermes-agent/issues/3534)), @primmer ([#3595](https://github.com/NousResearch/hermes-agent/issues/3595)), @dagelf ([#3609](https://github.com/NousResearch/hermes-agent/issues/3609)), @HenkDz ([#3685](https://github.com/NousResearch/hermes-agent/issues/3685)), @tmdgusya ([#3729](https://github.com/NousResearch/hermes-agent/issues/3729)), @TypQxQ ([#3753](https://github.com/NousResearch/hermes-agent/issues/3753)), @acsezen ([#3765](https://github.com/NousResearch/hermes-agent/issues/3765))
---
**Full Changelog**: [v2026.3.28...v2026.3.30](https://github.com/NousResearch/hermes-agent/compare/v2026.3.28...v2026.3.30)

290
RELEASE_v0.7.0.md Normal file
View File

@@ -0,0 +1,290 @@
# Hermes Agent v0.7.0 (v2026.4.3)
**Release Date:** April 3, 2026
> The resilience release — pluggable memory providers, credential pool rotation, Camofox anti-detection browser, inline diff previews, gateway hardening across race conditions and approval routing, and deep security fixes across 168 PRs and 46 resolved issues.
---
## ✨ Highlights
- **Pluggable Memory Provider Interface** — Memory is now an extensible plugin system. Third-party memory backends (Honcho, vector stores, custom DBs) implement a simple provider ABC and register via the plugin system. Built-in memory is the default provider. Honcho integration restored to full parity as the reference plugin with profile-scoped host/peer resolution. ([#4623](https://github.com/NousResearch/hermes-agent/pull/4623), [#4616](https://github.com/NousResearch/hermes-agent/pull/4616), [#4355](https://github.com/NousResearch/hermes-agent/pull/4355))
- **Same-Provider Credential Pools** — Configure multiple API keys for the same provider with automatic rotation. Thread-safe `least_used` strategy distributes load across keys, and 401 failures trigger automatic rotation to the next credential. Set up via the setup wizard or `credential_pool` config. ([#4188](https://github.com/NousResearch/hermes-agent/pull/4188), [#4300](https://github.com/NousResearch/hermes-agent/pull/4300), [#4361](https://github.com/NousResearch/hermes-agent/pull/4361))
- **Camofox Anti-Detection Browser Backend** — New local browser backend using Camoufox for stealth browsing. Persistent sessions with VNC URL discovery for visual debugging, configurable SSRF bypass for local backends, auto-install via `hermes tools`. ([#4008](https://github.com/NousResearch/hermes-agent/pull/4008), [#4419](https://github.com/NousResearch/hermes-agent/pull/4419), [#4292](https://github.com/NousResearch/hermes-agent/pull/4292))
- **Inline Diff Previews** — File write and patch operations now show inline diffs in the tool activity feed, giving you visual confirmation of what changed before the agent moves on. ([#4411](https://github.com/NousResearch/hermes-agent/pull/4411), [#4423](https://github.com/NousResearch/hermes-agent/pull/4423))
- **API Server Session Continuity & Tool Streaming** — The API server (Open WebUI integration) now streams tool progress events in real-time and supports `X-Hermes-Session-Id` headers for persistent sessions across requests. Sessions persist to the shared SessionDB. ([#4092](https://github.com/NousResearch/hermes-agent/pull/4092), [#4478](https://github.com/NousResearch/hermes-agent/pull/4478), [#4802](https://github.com/NousResearch/hermes-agent/pull/4802))
- **ACP: Client-Provided MCP Servers** — Editor integrations (VS Code, Zed, JetBrains) can now register their own MCP servers, which Hermes picks up as additional agent tools. Your editor's MCP ecosystem flows directly into the agent. ([#4705](https://github.com/NousResearch/hermes-agent/pull/4705))
- **Gateway Hardening** — Major stability pass across race conditions, photo media delivery, flood control, stuck sessions, approval routing, and compression death spirals. The gateway is substantially more reliable in production. ([#4727](https://github.com/NousResearch/hermes-agent/pull/4727), [#4750](https://github.com/NousResearch/hermes-agent/pull/4750), [#4798](https://github.com/NousResearch/hermes-agent/pull/4798), [#4557](https://github.com/NousResearch/hermes-agent/pull/4557))
- **Security: Secret Exfiltration Blocking** — Browser URLs and LLM responses are now scanned for secret patterns, blocking exfiltration attempts via URL encoding, base64, or prompt injection. Credential directory protections expanded to `.docker`, `.azure`, `.config/gh`. Execute_code sandbox output is redacted. ([#4483](https://github.com/NousResearch/hermes-agent/pull/4483), [#4360](https://github.com/NousResearch/hermes-agent/pull/4360), [#4305](https://github.com/NousResearch/hermes-agent/pull/4305), [#4327](https://github.com/NousResearch/hermes-agent/pull/4327))
---
## 🏗️ Core Agent & Architecture
### Provider & Model Support
- **Same-provider credential pools** — configure multiple API keys with automatic `least_used` rotation and 401 failover ([#4188](https://github.com/NousResearch/hermes-agent/pull/4188), [#4300](https://github.com/NousResearch/hermes-agent/pull/4300))
- **Credential pool preserved through smart routing** — pool state survives fallback provider switches and defers eager fallback on 429 ([#4361](https://github.com/NousResearch/hermes-agent/pull/4361))
- **Per-turn primary runtime restoration** — after fallback provider use, the agent automatically restores the primary provider on the next turn with transport recovery ([#4624](https://github.com/NousResearch/hermes-agent/pull/4624))
- **`developer` role for GPT-5 and Codex models** — uses OpenAI's recommended system message role for newer models ([#4498](https://github.com/NousResearch/hermes-agent/pull/4498))
- **Google model operational guidance** — Gemini and Gemma models get provider-specific prompting guidance ([#4641](https://github.com/NousResearch/hermes-agent/pull/4641))
- **Anthropic long-context tier 429 handling** — automatically reduces context to 200k when hitting tier limits ([#4747](https://github.com/NousResearch/hermes-agent/pull/4747))
- **URL-based auth for third-party Anthropic endpoints** + CI test fixes ([#4148](https://github.com/NousResearch/hermes-agent/pull/4148))
- **Bearer auth for MiniMax Anthropic endpoints** ([#4028](https://github.com/NousResearch/hermes-agent/pull/4028))
- **Fireworks context length detection** ([#4158](https://github.com/NousResearch/hermes-agent/pull/4158))
- **Standard DashScope international endpoint** for Alibaba provider ([#4133](https://github.com/NousResearch/hermes-agent/pull/4133), closes [#3912](https://github.com/NousResearch/hermes-agent/issues/3912))
- **Custom providers context_length** honored in hygiene compression ([#4085](https://github.com/NousResearch/hermes-agent/pull/4085))
- **Non-sk-ant keys** treated as regular API keys, not OAuth tokens ([#4093](https://github.com/NousResearch/hermes-agent/pull/4093))
- **Claude-sonnet-4.6** added to OpenRouter and Nous model lists ([#4157](https://github.com/NousResearch/hermes-agent/pull/4157))
- **Qwen 3.6 Plus Preview** added to model lists ([#4376](https://github.com/NousResearch/hermes-agent/pull/4376))
- **MiniMax M2.7** added to hermes model picker and OpenCode ([#4208](https://github.com/NousResearch/hermes-agent/pull/4208))
- **Auto-detect models from server probe** in custom endpoint setup ([#4218](https://github.com/NousResearch/hermes-agent/pull/4218))
- **Config.yaml single source of truth** for endpoint URLs — no more env var vs config.yaml conflicts ([#4165](https://github.com/NousResearch/hermes-agent/pull/4165))
- **Setup wizard no longer overwrites** custom endpoint config ([#4180](https://github.com/NousResearch/hermes-agent/pull/4180), closes [#4172](https://github.com/NousResearch/hermes-agent/issues/4172))
- **Unified setup wizard provider selection** with `hermes model` — single code path for both flows ([#4200](https://github.com/NousResearch/hermes-agent/pull/4200))
- **Root-level provider config** no longer overrides `model.provider` ([#4329](https://github.com/NousResearch/hermes-agent/pull/4329))
- **Rate-limit pairing rejection messages** to prevent spam ([#4081](https://github.com/NousResearch/hermes-agent/pull/4081))
### Agent Loop & Conversation
- **Preserve Anthropic thinking block signatures** across tool-use turns ([#4626](https://github.com/NousResearch/hermes-agent/pull/4626))
- **Classify think-only empty responses** before retrying — prevents infinite retry loops on models that produce thinking blocks without content ([#4645](https://github.com/NousResearch/hermes-agent/pull/4645))
- **Prevent compression death spiral** from API disconnects — stops the loop where compression triggers, fails, compresses again ([#4750](https://github.com/NousResearch/hermes-agent/pull/4750), closes [#2153](https://github.com/NousResearch/hermes-agent/issues/2153))
- **Persist compressed context** to gateway session after mid-run compression ([#4095](https://github.com/NousResearch/hermes-agent/pull/4095))
- **Context-exceeded error messages** now include actionable guidance ([#4155](https://github.com/NousResearch/hermes-agent/pull/4155), closes [#4061](https://github.com/NousResearch/hermes-agent/issues/4061))
- **Strip orphaned think/reasoning tags** from user-facing responses ([#4311](https://github.com/NousResearch/hermes-agent/pull/4311), closes [#4285](https://github.com/NousResearch/hermes-agent/issues/4285))
- **Harden Codex responses preflight** and stream error handling ([#4313](https://github.com/NousResearch/hermes-agent/pull/4313))
- **Deterministic call_id fallbacks** instead of random UUIDs for prompt cache consistency ([#3991](https://github.com/NousResearch/hermes-agent/pull/3991))
- **Context pressure warning spam** prevented after compression ([#4012](https://github.com/NousResearch/hermes-agent/pull/4012))
- **AsyncOpenAI created lazily** in trajectory compressor to avoid closed event loop errors ([#4013](https://github.com/NousResearch/hermes-agent/pull/4013))
### Memory & Sessions
- **Pluggable memory provider interface** — ABC-based plugin system for custom memory backends with profile isolation ([#4623](https://github.com/NousResearch/hermes-agent/pull/4623))
- **Honcho full integration parity** restored as reference memory provider plugin ([#4355](https://github.com/NousResearch/hermes-agent/pull/4355)) — @erosika
- **Honcho profile-scoped** host and peer resolution ([#4616](https://github.com/NousResearch/hermes-agent/pull/4616))
- **Memory flush state persisted** to prevent redundant re-flushes on gateway restart ([#4481](https://github.com/NousResearch/hermes-agent/pull/4481))
- **Memory provider tools** routed through sequential execution path ([#4803](https://github.com/NousResearch/hermes-agent/pull/4803))
- **Honcho config** written to instance-local path for profile isolation ([#4037](https://github.com/NousResearch/hermes-agent/pull/4037))
- **API server sessions** persist to shared SessionDB ([#4802](https://github.com/NousResearch/hermes-agent/pull/4802))
- **Token usage persisted** for non-CLI sessions ([#4627](https://github.com/NousResearch/hermes-agent/pull/4627))
- **Quote dotted terms in FTS5 queries** — fixes session search for terms containing dots ([#4549](https://github.com/NousResearch/hermes-agent/pull/4549))
---
## 📱 Messaging Platforms (Gateway)
### Gateway Core
- **Race condition fixes** — photo media loss, flood control, stuck sessions, and STT config issues resolved in one hardening pass ([#4727](https://github.com/NousResearch/hermes-agent/pull/4727))
- **Approval routing through running-agent guard** — `/approve` and `/deny` now route correctly when the agent is blocked waiting for approval instead of being swallowed as interrupts ([#4798](https://github.com/NousResearch/hermes-agent/pull/4798), [#4557](https://github.com/NousResearch/hermes-agent/pull/4557), closes [#4542](https://github.com/NousResearch/hermes-agent/issues/4542))
- **Resume agent after /approve** — tool result is no longer lost when executing blocked commands ([#4418](https://github.com/NousResearch/hermes-agent/pull/4418))
- **DM thread sessions seeded** with parent transcript to preserve context ([#4559](https://github.com/NousResearch/hermes-agent/pull/4559))
- **Skill-aware slash commands** — gateway dynamically registers installed skills as slash commands with paginated `/commands` list and Telegram 100-command cap ([#3934](https://github.com/NousResearch/hermes-agent/pull/3934), [#4005](https://github.com/NousResearch/hermes-agent/pull/4005), [#4006](https://github.com/NousResearch/hermes-agent/pull/4006), [#4010](https://github.com/NousResearch/hermes-agent/pull/4010), [#4023](https://github.com/NousResearch/hermes-agent/pull/4023))
- **Per-platform disabled skills** respected in Telegram menu and gateway dispatch ([#4799](https://github.com/NousResearch/hermes-agent/pull/4799))
- **Remove user-facing compression warnings** — cleaner message flow ([#4139](https://github.com/NousResearch/hermes-agent/pull/4139))
- **`-v/-q` flags wired to stderr logging** for gateway service ([#4474](https://github.com/NousResearch/hermes-agent/pull/4474))
- **HERMES_HOME remapped** to target user in system service unit ([#4456](https://github.com/NousResearch/hermes-agent/pull/4456))
- **Honor default for invalid bool-like config values** ([#4029](https://github.com/NousResearch/hermes-agent/pull/4029))
- **setsid instead of systemd-run** for `/update` command to avoid systemd permission issues ([#4104](https://github.com/NousResearch/hermes-agent/pull/4104), closes [#4017](https://github.com/NousResearch/hermes-agent/issues/4017))
- **'Initializing agent...'** shown on first message for better UX ([#4086](https://github.com/NousResearch/hermes-agent/pull/4086))
- **Allow running gateway service as root** for LXC/container environments ([#4732](https://github.com/NousResearch/hermes-agent/pull/4732))
### Telegram
- **32-char limit on command names** with collision avoidance ([#4211](https://github.com/NousResearch/hermes-agent/pull/4211))
- **Priority order enforced** in menu — core > plugins > skills ([#4023](https://github.com/NousResearch/hermes-agent/pull/4023))
- **Capped at 50 commands** — API rejects above ~60 ([#4006](https://github.com/NousResearch/hermes-agent/pull/4006))
- **Skip empty/whitespace text** to prevent 400 errors ([#4388](https://github.com/NousResearch/hermes-agent/pull/4388))
- **E2E gateway tests** added ([#4497](https://github.com/NousResearch/hermes-agent/pull/4497)) — @pefontana
### Discord
- **Button-based approval UI** — register `/approve` and `/deny` slash commands with interactive button prompts ([#4800](https://github.com/NousResearch/hermes-agent/pull/4800))
- **Configurable reactions** — `discord.reactions` config option to disable message processing reactions ([#4199](https://github.com/NousResearch/hermes-agent/pull/4199))
- **Skip reactions and auto-threading** for unauthorized users ([#4387](https://github.com/NousResearch/hermes-agent/pull/4387))
### Slack
- **Reply in thread** — `slack.reply_in_thread` config option for threaded responses ([#4643](https://github.com/NousResearch/hermes-agent/pull/4643), closes [#2662](https://github.com/NousResearch/hermes-agent/issues/2662))
### WhatsApp
- **Enforce require_mention in group chats** ([#4730](https://github.com/NousResearch/hermes-agent/pull/4730))
### Webhook
- **Platform support fixes** — skip home channel prompt, disable tool progress for webhook adapters ([#4660](https://github.com/NousResearch/hermes-agent/pull/4660))
### Matrix
- **E2EE decryption hardening** — request missing keys, auto-trust devices, retry buffered events ([#4083](https://github.com/NousResearch/hermes-agent/pull/4083))
---
## 🖥️ CLI & User Experience
### New Slash Commands
- **`/yolo`** — toggle dangerous command approvals on/off for the session ([#3990](https://github.com/NousResearch/hermes-agent/pull/3990))
- **`/btw`** — ephemeral side questions that don't affect the main conversation context ([#4161](https://github.com/NousResearch/hermes-agent/pull/4161))
- **`/profile`** — show active profile info without leaving the chat session ([#4027](https://github.com/NousResearch/hermes-agent/pull/4027))
### Interactive CLI
- **Inline diff previews** for write and patch operations in the tool activity feed ([#4411](https://github.com/NousResearch/hermes-agent/pull/4411), [#4423](https://github.com/NousResearch/hermes-agent/pull/4423))
- **TUI pinned to bottom** on startup — no more large blank spaces between response and input ([#4412](https://github.com/NousResearch/hermes-agent/pull/4412), [#4359](https://github.com/NousResearch/hermes-agent/pull/4359), closes [#4398](https://github.com/NousResearch/hermes-agent/issues/4398), [#4421](https://github.com/NousResearch/hermes-agent/issues/4421))
- **`/history` and `/resume`** now surface recent sessions directly instead of requiring search ([#4728](https://github.com/NousResearch/hermes-agent/pull/4728))
- **Cache tokens shown** in `/insights` overview so total adds up ([#4428](https://github.com/NousResearch/hermes-agent/pull/4428))
- **`--max-turns` CLI flag** for `hermes chat` to limit agent iterations ([#4314](https://github.com/NousResearch/hermes-agent/pull/4314))
- **Detect dragged file paths** instead of treating them as slash commands ([#4533](https://github.com/NousResearch/hermes-agent/pull/4533)) — @rolme
- **Allow empty strings and falsy values** in `config set` ([#4310](https://github.com/NousResearch/hermes-agent/pull/4310), closes [#4277](https://github.com/NousResearch/hermes-agent/issues/4277))
- **Voice mode in WSL** when PulseAudio bridge is configured ([#4317](https://github.com/NousResearch/hermes-agent/pull/4317))
- **Respect `NO_COLOR` env var** and `TERM=dumb` for accessibility ([#4079](https://github.com/NousResearch/hermes-agent/pull/4079), closes [#4066](https://github.com/NousResearch/hermes-agent/issues/4066)) — @SHL0MS
- **Correct shell reload instruction** for macOS/zsh users ([#4025](https://github.com/NousResearch/hermes-agent/pull/4025))
- **Zero exit code** on successful quiet mode queries ([#4613](https://github.com/NousResearch/hermes-agent/pull/4613), closes [#4601](https://github.com/NousResearch/hermes-agent/issues/4601)) — @devorun
- **on_session_end hook fires** on interrupted exits ([#4159](https://github.com/NousResearch/hermes-agent/pull/4159))
- **Profile list display** reads `model.default` key correctly ([#4160](https://github.com/NousResearch/hermes-agent/pull/4160))
- **Browser and TTS** shown in reconfigure menu ([#4041](https://github.com/NousResearch/hermes-agent/pull/4041))
- **Web backend priority** detection simplified ([#4036](https://github.com/NousResearch/hermes-agent/pull/4036))
### Setup & Configuration
- **Allowed_users preserved** during setup and quiet unconfigured provider warnings ([#4551](https://github.com/NousResearch/hermes-agent/pull/4551)) — @kshitijk4poor
- **Save API key to model config** for custom endpoints ([#4202](https://github.com/NousResearch/hermes-agent/pull/4202), closes [#4182](https://github.com/NousResearch/hermes-agent/issues/4182))
- **Claude Code credentials gated** behind explicit Hermes config in wizard trigger ([#4210](https://github.com/NousResearch/hermes-agent/pull/4210))
- **Atomic writes in save_config_value** to prevent config loss on interrupt ([#4298](https://github.com/NousResearch/hermes-agent/pull/4298), [#4320](https://github.com/NousResearch/hermes-agent/pull/4320))
- **Scopes field written** to Claude Code credentials on token refresh ([#4126](https://github.com/NousResearch/hermes-agent/pull/4126))
### Update System
- **Fork detection and upstream sync** in `hermes update` ([#4744](https://github.com/NousResearch/hermes-agent/pull/4744))
- **Preserve working optional extras** when one extra fails during update ([#4550](https://github.com/NousResearch/hermes-agent/pull/4550))
- **Handle conflicted git index** during hermes update ([#4735](https://github.com/NousResearch/hermes-agent/pull/4735))
- **Avoid launchd restart race** on macOS ([#4736](https://github.com/NousResearch/hermes-agent/pull/4736))
- **Missing subprocess.run() timeouts** added to doctor and status commands ([#4009](https://github.com/NousResearch/hermes-agent/pull/4009))
---
## 🔧 Tool System
### Browser
- **Camofox anti-detection browser backend** — local stealth browsing with auto-install via `hermes tools` ([#4008](https://github.com/NousResearch/hermes-agent/pull/4008))
- **Persistent Camofox sessions** with VNC URL discovery for visual debugging ([#4419](https://github.com/NousResearch/hermes-agent/pull/4419))
- **Skip SSRF check for local backends** (Camofox, headless Chromium) ([#4292](https://github.com/NousResearch/hermes-agent/pull/4292))
- **Configurable SSRF check** via `browser.allow_private_urls` ([#4198](https://github.com/NousResearch/hermes-agent/pull/4198)) — @nils010485
- **CAMOFOX_PORT=9377** added to Docker commands ([#4340](https://github.com/NousResearch/hermes-agent/pull/4340))
### File Operations
- **Inline diff previews** on write and patch actions ([#4411](https://github.com/NousResearch/hermes-agent/pull/4411), [#4423](https://github.com/NousResearch/hermes-agent/pull/4423))
- **Stale file detection** on write and patch — warns when file was modified externally since last read ([#4345](https://github.com/NousResearch/hermes-agent/pull/4345))
- **Staleness timestamp refreshed** after writes ([#4390](https://github.com/NousResearch/hermes-agent/pull/4390))
- **Size guard, dedup, and device blocking** on read_file ([#4315](https://github.com/NousResearch/hermes-agent/pull/4315))
### MCP
- **Stability fix pack** — reload timeout, shutdown cleanup, event loop handler, OAuth non-blocking ([#4757](https://github.com/NousResearch/hermes-agent/pull/4757), closes [#4462](https://github.com/NousResearch/hermes-agent/issues/4462), [#2537](https://github.com/NousResearch/hermes-agent/issues/2537))
### ACP (Editor Integration)
- **Client-provided MCP servers** registered as agent tools — editors pass their MCP servers to Hermes ([#4705](https://github.com/NousResearch/hermes-agent/pull/4705))
### Skills System
- **Size limits for agent writes** and **fuzzy matching for skill patch** — prevents oversized skill writes and improves edit reliability ([#4414](https://github.com/NousResearch/hermes-agent/pull/4414))
- **Validate hub bundle paths** before install — blocks path traversal in skill bundles ([#3986](https://github.com/NousResearch/hermes-agent/pull/3986))
- **Unified hermes-agent and hermes-agent-setup** into single skill ([#4332](https://github.com/NousResearch/hermes-agent/pull/4332))
- **Skill metadata type check** in extract_skill_conditions ([#4479](https://github.com/NousResearch/hermes-agent/pull/4479))
### New/Updated Skills
- **research-paper-writing** — full end-to-end research pipeline (replaced ml-paper-writing) ([#4654](https://github.com/NousResearch/hermes-agent/pull/4654)) — @SHL0MS
- **ascii-video** — text readability techniques and external layout oracle ([#4054](https://github.com/NousResearch/hermes-agent/pull/4054)) — @SHL0MS
- **youtube-transcript** updated for youtube-transcript-api v1.x ([#4455](https://github.com/NousResearch/hermes-agent/pull/4455)) — @el-analista
- **Skills browse and search page** added to documentation site ([#4500](https://github.com/NousResearch/hermes-agent/pull/4500)) — @IAvecilla
---
## 🔒 Security & Reliability
### Security Hardening
- **Block secret exfiltration** via browser URLs and LLM responses — scans for secret patterns in URL encoding, base64, and prompt injection vectors ([#4483](https://github.com/NousResearch/hermes-agent/pull/4483))
- **Redact secrets from execute_code sandbox output** ([#4360](https://github.com/NousResearch/hermes-agent/pull/4360))
- **Protect `.docker`, `.azure`, `.config/gh` credential directories** from read/write via file tools and terminal ([#4305](https://github.com/NousResearch/hermes-agent/pull/4305), [#4327](https://github.com/NousResearch/hermes-agent/pull/4327)) — @memosr
- **GitHub OAuth token patterns** added to redaction + snapshot redact flag ([#4295](https://github.com/NousResearch/hermes-agent/pull/4295))
- **Reject private and loopback IPs** in Telegram DoH fallback ([#4129](https://github.com/NousResearch/hermes-agent/pull/4129))
- **Reject path traversal** in credential file registration ([#4316](https://github.com/NousResearch/hermes-agent/pull/4316))
- **Validate tar archive member paths** on profile import — blocks zip-slip attacks ([#4318](https://github.com/NousResearch/hermes-agent/pull/4318))
- **Exclude auth.json and .env** from profile exports ([#4475](https://github.com/NousResearch/hermes-agent/pull/4475))
### Reliability
- **Prevent compression death spiral** from API disconnects ([#4750](https://github.com/NousResearch/hermes-agent/pull/4750), closes [#2153](https://github.com/NousResearch/hermes-agent/issues/2153))
- **Handle `is_closed` as method** in OpenAI SDK — prevents false positive client closure detection ([#4416](https://github.com/NousResearch/hermes-agent/pull/4416), closes [#4377](https://github.com/NousResearch/hermes-agent/issues/4377))
- **Exclude matrix from [all] extras** — python-olm is upstream-broken, prevents install failures ([#4615](https://github.com/NousResearch/hermes-agent/pull/4615), closes [#4178](https://github.com/NousResearch/hermes-agent/issues/4178))
- **OpenCode model routing** repaired ([#4508](https://github.com/NousResearch/hermes-agent/pull/4508))
- **Docker container image** optimized ([#4034](https://github.com/NousResearch/hermes-agent/pull/4034)) — @bcross
### Windows & Cross-Platform
- **Voice mode in WSL** with PulseAudio bridge ([#4317](https://github.com/NousResearch/hermes-agent/pull/4317))
- **Homebrew packaging** preparation ([#4099](https://github.com/NousResearch/hermes-agent/pull/4099))
- **CI fork conditionals** to prevent workflow failures on forks ([#4107](https://github.com/NousResearch/hermes-agent/pull/4107))
---
## 🐛 Notable Bug Fixes
- **Gateway approval blocked agent thread** — approval now blocks the agent thread like CLI does, preventing tool result loss ([#4557](https://github.com/NousResearch/hermes-agent/pull/4557), closes [#4542](https://github.com/NousResearch/hermes-agent/issues/4542))
- **Compression death spiral** from API disconnects — detected and halted instead of looping ([#4750](https://github.com/NousResearch/hermes-agent/pull/4750), closes [#2153](https://github.com/NousResearch/hermes-agent/issues/2153))
- **Anthropic thinking blocks lost** across tool-use turns ([#4626](https://github.com/NousResearch/hermes-agent/pull/4626))
- **Profile model config ignored** with `-p` flag — model.model now promoted to model.default correctly ([#4160](https://github.com/NousResearch/hermes-agent/pull/4160), closes [#4486](https://github.com/NousResearch/hermes-agent/issues/4486))
- **CLI blank space** between response and input area ([#4412](https://github.com/NousResearch/hermes-agent/pull/4412), [#4359](https://github.com/NousResearch/hermes-agent/pull/4359), closes [#4398](https://github.com/NousResearch/hermes-agent/issues/4398))
- **Dragged file paths** treated as slash commands instead of file references ([#4533](https://github.com/NousResearch/hermes-agent/pull/4533)) — @rolme
- **Orphaned `</think>` tags** leaking into user-facing responses ([#4311](https://github.com/NousResearch/hermes-agent/pull/4311), closes [#4285](https://github.com/NousResearch/hermes-agent/issues/4285))
- **OpenAI SDK `is_closed`** is a method not property — false positive client closure ([#4416](https://github.com/NousResearch/hermes-agent/pull/4416), closes [#4377](https://github.com/NousResearch/hermes-agent/issues/4377))
- **MCP OAuth server** could block Hermes startup instead of degrading gracefully ([#4757](https://github.com/NousResearch/hermes-agent/pull/4757), closes [#4462](https://github.com/NousResearch/hermes-agent/issues/4462))
- **MCP event loop closed** on shutdown with HTTP servers ([#4757](https://github.com/NousResearch/hermes-agent/pull/4757), closes [#2537](https://github.com/NousResearch/hermes-agent/issues/2537))
- **Alibaba provider** hardcoded to wrong endpoint ([#4133](https://github.com/NousResearch/hermes-agent/pull/4133), closes [#3912](https://github.com/NousResearch/hermes-agent/issues/3912))
- **Slack reply_in_thread** missing config option ([#4643](https://github.com/NousResearch/hermes-agent/pull/4643), closes [#2662](https://github.com/NousResearch/hermes-agent/issues/2662))
- **Quiet mode exit code** — successful `-q` queries no longer exit nonzero ([#4613](https://github.com/NousResearch/hermes-agent/pull/4613), closes [#4601](https://github.com/NousResearch/hermes-agent/issues/4601))
- **Mobile sidebar** shows only close button due to backdrop-filter issue in docs site ([#4207](https://github.com/NousResearch/hermes-agent/pull/4207)) — @xsmyile
- **Config restore reverted** by stale-branch squash merge — `_config_version` fixed ([#4440](https://github.com/NousResearch/hermes-agent/pull/4440))
---
## 🧪 Testing
- **Telegram gateway E2E tests** — full integration test suite for the Telegram adapter ([#4497](https://github.com/NousResearch/hermes-agent/pull/4497)) — @pefontana
- **11 real test failures fixed** plus sys.modules cascade poisoner resolved ([#4570](https://github.com/NousResearch/hermes-agent/pull/4570))
- **7 CI failures resolved** across hooks, plugins, and skill tests ([#3936](https://github.com/NousResearch/hermes-agent/pull/3936))
- **Codex 401 refresh tests** updated for CI compatibility ([#4166](https://github.com/NousResearch/hermes-agent/pull/4166))
- **Stale OPENAI_BASE_URL test** fixed ([#4217](https://github.com/NousResearch/hermes-agent/pull/4217))
---
## 📚 Documentation
- **Comprehensive documentation audit** — 9 HIGH and 20+ MEDIUM gaps fixed across 21 files ([#4087](https://github.com/NousResearch/hermes-agent/pull/4087))
- **Site navigation restructured** — features and platforms promoted to top-level ([#4116](https://github.com/NousResearch/hermes-agent/pull/4116))
- **Tool progress streaming** documented for API server and Open WebUI ([#4138](https://github.com/NousResearch/hermes-agent/pull/4138))
- **Telegram webhook mode** documentation ([#4089](https://github.com/NousResearch/hermes-agent/pull/4089))
- **Local LLM provider guides** — comprehensive setup guides with context length warnings ([#4294](https://github.com/NousResearch/hermes-agent/pull/4294))
- **WhatsApp allowlist behavior** clarified with `WHATSAPP_ALLOW_ALL_USERS` documentation ([#4293](https://github.com/NousResearch/hermes-agent/pull/4293))
- **Slack configuration options** — new config section in Slack docs ([#4644](https://github.com/NousResearch/hermes-agent/pull/4644))
- **Terminal backends section** expanded + docs build fixes ([#4016](https://github.com/NousResearch/hermes-agent/pull/4016))
- **Adding-providers guide** updated for unified setup flow ([#4201](https://github.com/NousResearch/hermes-agent/pull/4201))
- **ACP Zed config** fixed ([#4743](https://github.com/NousResearch/hermes-agent/pull/4743))
- **Community FAQ** entries for common workflows and troubleshooting ([#4797](https://github.com/NousResearch/hermes-agent/pull/4797))
- **Skills browse and search page** on docs site ([#4500](https://github.com/NousResearch/hermes-agent/pull/4500)) — @IAvecilla
---
## 👥 Contributors
### Core
- **@teknium1** — 135 commits across all subsystems
### Top Community Contributors
- **@kshitijk4poor** — 13 commits: preserve allowed_users during setup ([#4551](https://github.com/NousResearch/hermes-agent/pull/4551)), and various fixes
- **@erosika** — 12 commits: Honcho full integration parity restored as memory provider plugin ([#4355](https://github.com/NousResearch/hermes-agent/pull/4355))
- **@pefontana** — 9 commits: Telegram gateway E2E test suite ([#4497](https://github.com/NousResearch/hermes-agent/pull/4497))
- **@bcross** — 5 commits: Docker container image optimization ([#4034](https://github.com/NousResearch/hermes-agent/pull/4034))
- **@SHL0MS** — 4 commits: NO_COLOR/TERM=dumb support ([#4079](https://github.com/NousResearch/hermes-agent/pull/4079)), ascii-video skill updates ([#4054](https://github.com/NousResearch/hermes-agent/pull/4054)), research-paper-writing skill ([#4654](https://github.com/NousResearch/hermes-agent/pull/4654))
### All Contributors
@0xbyt4, @arasovic, @Bartok9, @bcross, @binhnt92, @camden-lowrance, @curtitoo, @Dakota, @Dave Tist, @Dean Kerr, @devorun, @dieutx, @Dilee, @el-analista, @erosika, @Gutslabs, @IAvecilla, @Jack, @Johannnnn506, @kshitijk4poor, @Laura Batalha, @Leegenux, @Lume, @MacroAnarchy, @maymuneth, @memosr, @NexVeridian, @Nick, @nils010485, @pefontana, @Penov, @rolme, @SHL0MS, @txchen, @xsmyile
### Issues Resolved from Community
@acsezen ([#2537](https://github.com/NousResearch/hermes-agent/issues/2537)), @arasovic ([#4285](https://github.com/NousResearch/hermes-agent/issues/4285)), @camden-lowrance ([#4462](https://github.com/NousResearch/hermes-agent/issues/4462)), @devorun ([#4601](https://github.com/NousResearch/hermes-agent/issues/4601)), @eloklam ([#4486](https://github.com/NousResearch/hermes-agent/issues/4486)), @HenkDz ([#3719](https://github.com/NousResearch/hermes-agent/issues/3719)), @hypotyposis ([#2153](https://github.com/NousResearch/hermes-agent/issues/2153)), @kazamak ([#4178](https://github.com/NousResearch/hermes-agent/issues/4178)), @lstep ([#4366](https://github.com/NousResearch/hermes-agent/issues/4366)), @Mark-Lok ([#4542](https://github.com/NousResearch/hermes-agent/issues/4542)), @NoJster ([#4421](https://github.com/NousResearch/hermes-agent/issues/4421)), @patp ([#2662](https://github.com/NousResearch/hermes-agent/issues/2662)), @pr0n ([#4601](https://github.com/NousResearch/hermes-agent/issues/4601)), @saulmc ([#4377](https://github.com/NousResearch/hermes-agent/issues/4377)), @SHL0MS ([#4060](https://github.com/NousResearch/hermes-agent/issues/4060), [#4061](https://github.com/NousResearch/hermes-agent/issues/4061), [#4066](https://github.com/NousResearch/hermes-agent/issues/4066), [#4172](https://github.com/NousResearch/hermes-agent/issues/4172), [#4277](https://github.com/NousResearch/hermes-agent/issues/4277)), @Z-Mackintosh ([#4398](https://github.com/NousResearch/hermes-agent/issues/4398))
---
**Full Changelog**: [v2026.3.30...v2026.4.3](https://github.com/NousResearch/hermes-agent/compare/v2026.3.30...v2026.4.3)

View File

@@ -18,6 +18,7 @@ import logging
import os
import sys
from pathlib import Path
from hermes_constants import get_hermes_home
def _setup_logging() -> None:
@@ -42,19 +43,16 @@ def _setup_logging() -> None:
def _load_env() -> None:
"""Load .env from HERMES_HOME (default ``~/.hermes``)."""
from dotenv import load_dotenv
from hermes_cli.env_loader import load_hermes_dotenv
hermes_home = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
env_file = hermes_home / ".env"
if env_file.exists():
try:
load_dotenv(dotenv_path=env_file, encoding="utf-8")
except UnicodeDecodeError:
load_dotenv(dotenv_path=env_file, encoding="latin-1")
logging.getLogger(__name__).info("Loaded env from %s", env_file)
hermes_home = get_hermes_home()
loaded = load_hermes_dotenv(hermes_home=hermes_home)
if loaded:
for env_file in loaded:
logging.getLogger(__name__).info("Loaded env from %s", env_file)
else:
logging.getLogger(__name__).info(
"No .env found at %s, using system env", env_file
"No .env found at %s, using system env", hermes_home / ".env"
)
@@ -76,7 +74,7 @@ def main() -> None:
agent = HermesACPAgent()
try:
asyncio.run(acp.run_agent(agent))
asyncio.run(acp.run_agent(agent, use_unstable_protocol=True))
except KeyboardInterrupt:
logger.info("Shutting down (KeyboardInterrupt)")
except Exception:

View File

@@ -10,7 +10,7 @@ thread while the event loop lives on the main thread).
import asyncio
import json
import logging
from collections import defaultdict, deque
from collections import deque
from typing import Any, Callable, Deque, Dict
import acp
@@ -54,14 +54,18 @@ def make_tool_progress_cb(
Signature expected by AIAgent::
tool_progress_callback(name: str, preview: str, args: dict)
tool_progress_callback(event_type: str, name: str, preview: str, args: dict, **kwargs)
Emits ``ToolCallStart`` for each tool invocation and tracks IDs in a FIFO
Emits ``ToolCallStart`` for ``tool.started`` events and tracks IDs in a FIFO
queue per tool name so duplicate/parallel same-name calls still complete
against the correct ACP tool call.
against the correct ACP tool call. Other event types (``tool.completed``,
``reasoning.available``) are silently ignored.
"""
def _tool_progress(name: str, preview: str, args: Any = None) -> None:
def _tool_progress(event_type: str, name: str = None, preview: str = None, args: Any = None, **kwargs) -> None:
# Only emit ACP ToolCallStart for tool.started; ignore other event types
if event_type != "tool.started":
return
if isinstance(args, str):
try:
args = json.loads(args)

View File

@@ -5,14 +5,11 @@ from __future__ import annotations
import asyncio
import logging
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Any, Callable, Optional
from typing import Callable
from acp.schema import (
AllowedOutcome,
DeniedOutcome,
PermissionOption,
RequestPermissionRequest,
SelectedPermissionOutcome,
)
logger = logging.getLogger(__name__)

View File

@@ -12,7 +12,8 @@ import acp
from acp.schema import (
AgentCapabilities,
AuthenticateResponse,
AuthMethod,
AvailableCommand,
AvailableCommandsUpdate,
ClientCapabilities,
EmbeddedResourceContentBlock,
ForkSessionResponse,
@@ -22,18 +23,31 @@ from acp.schema import (
InitializeResponse,
ListSessionsResponse,
LoadSessionResponse,
McpServerHttp,
McpServerSse,
McpServerStdio,
NewSessionResponse,
PromptResponse,
ResumeSessionResponse,
SetSessionConfigOptionResponse,
SetSessionModelResponse,
SetSessionModeResponse,
ResourceContentBlock,
SessionCapabilities,
SessionForkCapabilities,
SessionListCapabilities,
SessionInfo,
TextContentBlock,
UnstructuredCommandInput,
Usage,
)
# AuthMethodAgent was renamed from AuthMethod in agent-client-protocol 0.9.0
try:
from acp.schema import AuthMethodAgent
except ImportError:
from acp.schema import AuthMethod as AuthMethodAgent # type: ignore[attr-defined]
from acp_adapter.auth import detect_provider, has_provider
from acp_adapter.events import (
make_message_cb,
@@ -42,7 +56,7 @@ from acp_adapter.events import (
make_tool_progress_cb,
)
from acp_adapter.permissions import make_approval_callback
from acp_adapter.session import SessionManager
from acp_adapter.session import SessionManager, SessionState
logger = logging.getLogger(__name__)
@@ -78,6 +92,48 @@ def _extract_text(
class HermesACPAgent(acp.Agent):
"""ACP Agent implementation wrapping Hermes AIAgent."""
_SLASH_COMMANDS = {
"help": "Show available commands",
"model": "Show or change current model",
"tools": "List available tools",
"context": "Show conversation context info",
"reset": "Clear conversation history",
"compact": "Compress conversation context",
"version": "Show Hermes version",
}
_ADVERTISED_COMMANDS = (
{
"name": "help",
"description": "List available commands",
},
{
"name": "model",
"description": "Show current model and provider, or switch models",
"input_hint": "model name to switch to",
},
{
"name": "tools",
"description": "List available tools with descriptions",
},
{
"name": "context",
"description": "Show conversation message counts by role",
},
{
"name": "reset",
"description": "Clear conversation history",
},
{
"name": "compact",
"description": "Compress conversation context",
},
{
"name": "version",
"description": "Show Hermes version",
},
)
def __init__(self, session_manager: SessionManager | None = None):
super().__init__()
self.session_manager = session_manager or SessionManager()
@@ -90,20 +146,88 @@ class HermesACPAgent(acp.Agent):
self._conn = conn
logger.info("ACP client connected")
async def _register_session_mcp_servers(
self,
state: SessionState,
mcp_servers: list[McpServerStdio | McpServerHttp | McpServerSse] | None,
) -> None:
"""Register ACP-provided MCP servers and refresh the agent tool surface."""
if not mcp_servers:
return
try:
from tools.mcp_tool import register_mcp_servers
config_map: dict[str, dict] = {}
for server in mcp_servers:
name = server.name
if isinstance(server, McpServerStdio):
config = {
"command": server.command,
"args": list(server.args),
"env": {item.name: item.value for item in server.env},
}
else:
config = {
"url": server.url,
"headers": {item.name: item.value for item in server.headers},
}
config_map[name] = config
await asyncio.to_thread(register_mcp_servers, config_map)
except Exception:
logger.warning(
"Session %s: failed to register ACP MCP servers",
state.session_id,
exc_info=True,
)
return
try:
from model_tools import get_tool_definitions
enabled_toolsets = getattr(state.agent, "enabled_toolsets", None) or ["hermes-acp"]
disabled_toolsets = getattr(state.agent, "disabled_toolsets", None)
state.agent.tools = get_tool_definitions(
enabled_toolsets=enabled_toolsets,
disabled_toolsets=disabled_toolsets,
quiet_mode=True,
)
state.agent.valid_tool_names = {
tool["function"]["name"] for tool in state.agent.tools or []
}
invalidate = getattr(state.agent, "_invalidate_system_prompt", None)
if callable(invalidate):
invalidate()
logger.info(
"Session %s: refreshed tool surface after ACP MCP registration (%d tools)",
state.session_id,
len(state.agent.tools or []),
)
except Exception:
logger.warning(
"Session %s: failed to refresh tool surface after ACP MCP registration",
state.session_id,
exc_info=True,
)
# ---- ACP lifecycle ------------------------------------------------------
async def initialize(
self,
protocol_version: int,
protocol_version: int | None = None,
client_capabilities: ClientCapabilities | None = None,
client_info: Implementation | None = None,
**kwargs: Any,
) -> InitializeResponse:
resolved_protocol_version = (
protocol_version if isinstance(protocol_version, int) else acp.PROTOCOL_VERSION
)
provider = detect_provider()
auth_methods = None
if provider:
auth_methods = [
AuthMethod(
AuthMethodAgent(
id=provider,
name=f"{provider} runtime credentials",
description=f"Authenticate Hermes using the currently configured {provider} runtime credentials.",
@@ -111,7 +235,11 @@ class HermesACPAgent(acp.Agent):
]
client_name = client_info.name if client_info else "unknown"
logger.info("Initialize from %s (protocol v%s)", client_name, protocol_version)
logger.info(
"Initialize from %s (protocol v%s)",
client_name,
resolved_protocol_version,
)
return InitializeResponse(
protocol_version=acp.PROTOCOL_VERSION,
@@ -139,7 +267,9 @@ class HermesACPAgent(acp.Agent):
**kwargs: Any,
) -> NewSessionResponse:
state = self.session_manager.create_session(cwd=cwd)
await self._register_session_mcp_servers(state, mcp_servers)
logger.info("New session %s (cwd=%s)", state.session_id, cwd)
self._schedule_available_commands_update(state.session_id)
return NewSessionResponse(session_id=state.session_id)
async def load_session(
@@ -153,7 +283,9 @@ class HermesACPAgent(acp.Agent):
if state is None:
logger.warning("load_session: session %s not found", session_id)
return None
await self._register_session_mcp_servers(state, mcp_servers)
logger.info("Loaded session %s", session_id)
self._schedule_available_commands_update(session_id)
return LoadSessionResponse()
async def resume_session(
@@ -167,7 +299,9 @@ class HermesACPAgent(acp.Agent):
if state is None:
logger.warning("resume_session: session %s not found, creating new", session_id)
state = self.session_manager.create_session(cwd=cwd)
await self._register_session_mcp_servers(state, mcp_servers)
logger.info("Resumed session %s", state.session_id)
self._schedule_available_commands_update(state.session_id)
return ResumeSessionResponse()
async def cancel(self, session_id: str, **kwargs: Any) -> None:
@@ -190,7 +324,11 @@ class HermesACPAgent(acp.Agent):
) -> ForkSessionResponse:
state = self.session_manager.fork_session(session_id, cwd=cwd)
new_id = state.session_id if state else ""
if state is not None:
await self._register_session_mcp_servers(state, mcp_servers)
logger.info("Forked session %s -> %s", session_id, new_id)
if new_id:
self._schedule_available_commands_update(new_id)
return ForkSessionResponse(session_id=new_id)
async def list_sessions(
@@ -226,10 +364,19 @@ class HermesACPAgent(acp.Agent):
logger.error("prompt: session %s not found", session_id)
return PromptResponse(stop_reason="refusal")
user_text = _extract_text(prompt)
if not user_text.strip():
user_text = _extract_text(prompt).strip()
if not user_text:
return PromptResponse(stop_reason="end_turn")
# Intercept slash commands — handle locally without calling the LLM
if user_text.startswith("/"):
response_text = self._handle_slash_command(user_text, state)
if response_text is not None:
if self._conn:
update = acp.update_agent_message_text(response_text)
await self._conn.session_update(session_id, update)
return PromptResponse(stop_reason="end_turn")
logger.info("Prompt on session %s: %s", session_id, user_text[:100])
conn = self._conn
@@ -295,6 +442,8 @@ class HermesACPAgent(acp.Agent):
if result.get("messages"):
state.history = result["messages"]
# Persist updated history so sessions survive process restarts.
self.session_manager.save_session(session_id)
final_response = result.get("final_response", "")
if final_response and conn:
@@ -315,19 +464,263 @@ class HermesACPAgent(acp.Agent):
stop_reason = "cancelled" if state.cancel_event and state.cancel_event.is_set() else "end_turn"
return PromptResponse(stop_reason=stop_reason, usage=usage)
# ---- Model switching ----------------------------------------------------
# ---- Slash commands (headless) -------------------------------------------
@classmethod
def _available_commands(cls) -> list[AvailableCommand]:
commands: list[AvailableCommand] = []
for spec in cls._ADVERTISED_COMMANDS:
input_hint = spec.get("input_hint")
commands.append(
AvailableCommand(
name=spec["name"],
description=spec["description"],
input=UnstructuredCommandInput(hint=input_hint)
if input_hint
else None,
)
)
return commands
async def _send_available_commands_update(self, session_id: str) -> None:
"""Advertise supported slash commands to the connected ACP client."""
if not self._conn:
return
try:
await self._conn.session_update(
session_id=session_id,
update=AvailableCommandsUpdate(
sessionUpdate="available_commands_update",
availableCommands=self._available_commands(),
),
)
except Exception:
logger.warning(
"Failed to advertise ACP slash commands for session %s",
session_id,
exc_info=True,
)
def _schedule_available_commands_update(self, session_id: str) -> None:
"""Send the command advertisement after the session response is queued."""
if not self._conn:
return
loop = asyncio.get_running_loop()
loop.call_soon(
asyncio.create_task, self._send_available_commands_update(session_id)
)
def _handle_slash_command(self, text: str, state: SessionState) -> str | None:
"""Dispatch a slash command and return the response text.
Returns ``None`` for unrecognized commands so they fall through
to the LLM (the user may have typed ``/something`` as prose).
"""
parts = text.split(maxsplit=1)
cmd = parts[0].lstrip("/").lower()
args = parts[1].strip() if len(parts) > 1 else ""
handler = {
"help": self._cmd_help,
"model": self._cmd_model,
"tools": self._cmd_tools,
"context": self._cmd_context,
"reset": self._cmd_reset,
"compact": self._cmd_compact,
"version": self._cmd_version,
}.get(cmd)
if handler is None:
return None # not a known command — let the LLM handle it
try:
return handler(args, state)
except Exception as e:
logger.error("Slash command /%s error: %s", cmd, e, exc_info=True)
return f"Error executing /{cmd}: {e}"
def _cmd_help(self, args: str, state: SessionState) -> str:
lines = ["Available commands:", ""]
for cmd, desc in self._SLASH_COMMANDS.items():
lines.append(f" /{cmd:10s} {desc}")
lines.append("")
lines.append("Unrecognized /commands are sent to the model as normal messages.")
return "\n".join(lines)
def _cmd_model(self, args: str, state: SessionState) -> str:
if not args:
model = state.model or getattr(state.agent, "model", "unknown")
provider = getattr(state.agent, "provider", None) or "auto"
return f"Current model: {model}\nProvider: {provider}"
new_model = args.strip()
target_provider = None
current_provider = getattr(state.agent, "provider", None) or "openrouter"
# Auto-detect provider for the requested model
try:
from hermes_cli.models import parse_model_input, detect_provider_for_model
target_provider, new_model = parse_model_input(new_model, current_provider)
if target_provider == current_provider:
detected = detect_provider_for_model(new_model, current_provider)
if detected:
target_provider, new_model = detected
except Exception:
logger.debug("Provider detection failed, using model as-is", exc_info=True)
state.model = new_model
state.agent = self.session_manager._make_agent(
session_id=state.session_id,
cwd=state.cwd,
model=new_model,
requested_provider=target_provider or current_provider,
)
self.session_manager.save_session(state.session_id)
provider_label = getattr(state.agent, "provider", None) or target_provider or current_provider
logger.info("Session %s: model switched to %s", state.session_id, new_model)
return f"Model switched to: {new_model}\nProvider: {provider_label}"
def _cmd_tools(self, args: str, state: SessionState) -> str:
try:
from model_tools import get_tool_definitions
toolsets = getattr(state.agent, "enabled_toolsets", None) or ["hermes-acp"]
tools = get_tool_definitions(enabled_toolsets=toolsets, quiet_mode=True)
if not tools:
return "No tools available."
lines = [f"Available tools ({len(tools)}):"]
for t in tools:
name = t.get("function", {}).get("name", "?")
desc = t.get("function", {}).get("description", "")
# Truncate long descriptions
if len(desc) > 80:
desc = desc[:77] + "..."
lines.append(f" {name}: {desc}")
return "\n".join(lines)
except Exception as e:
return f"Could not list tools: {e}"
def _cmd_context(self, args: str, state: SessionState) -> str:
n_messages = len(state.history)
if n_messages == 0:
return "Conversation is empty (no messages yet)."
# Count by role
roles: dict[str, int] = {}
for msg in state.history:
role = msg.get("role", "unknown")
roles[role] = roles.get(role, 0) + 1
lines = [
f"Conversation: {n_messages} messages",
f" user: {roles.get('user', 0)}, assistant: {roles.get('assistant', 0)}, "
f"tool: {roles.get('tool', 0)}, system: {roles.get('system', 0)}",
]
model = state.model or getattr(state.agent, "model", "")
if model:
lines.append(f"Model: {model}")
return "\n".join(lines)
def _cmd_reset(self, args: str, state: SessionState) -> str:
state.history.clear()
self.session_manager.save_session(state.session_id)
return "Conversation history cleared."
def _cmd_compact(self, args: str, state: SessionState) -> str:
if not state.history:
return "Nothing to compress — conversation is empty."
try:
agent = state.agent
if not getattr(agent, "compression_enabled", True):
return "Context compression is disabled for this agent."
if not hasattr(agent, "_compress_context"):
return "Context compression not available for this agent."
from agent.model_metadata import estimate_messages_tokens_rough
original_count = len(state.history)
approx_tokens = estimate_messages_tokens_rough(state.history)
original_session_db = getattr(agent, "_session_db", None)
try:
# ACP sessions must keep a stable session id, so avoid the
# SQLite session-splitting side effect inside _compress_context.
agent._session_db = None
compressed, _ = agent._compress_context(
state.history,
getattr(agent, "_cached_system_prompt", "") or "",
approx_tokens=approx_tokens,
task_id=state.session_id,
)
finally:
agent._session_db = original_session_db
state.history = compressed
self.session_manager.save_session(state.session_id)
new_count = len(state.history)
new_tokens = estimate_messages_tokens_rough(state.history)
return (
f"Context compressed: {original_count} -> {new_count} messages\n"
f"~{approx_tokens:,} -> ~{new_tokens:,} tokens"
)
except Exception as e:
return f"Compression failed: {e}"
def _cmd_version(self, args: str, state: SessionState) -> str:
return f"Hermes Agent v{HERMES_VERSION}"
# ---- Model switching (ACP protocol method) -------------------------------
async def set_session_model(
self, model_id: str, session_id: str, **kwargs: Any
):
"""Switch the model for a session."""
) -> SetSessionModelResponse | None:
"""Switch the model for a session (called by ACP protocol)."""
state = self.session_manager.get_session(session_id)
if state:
state.model = model_id
current_provider = getattr(state.agent, "provider", None)
current_base_url = getattr(state.agent, "base_url", None)
current_api_mode = getattr(state.agent, "api_mode", None)
state.agent = self.session_manager._make_agent(
session_id=session_id,
cwd=state.cwd,
model=model_id,
requested_provider=current_provider,
base_url=current_base_url,
api_mode=current_api_mode,
)
self.session_manager.save_session(session_id)
logger.info("Session %s: model switched to %s", session_id, model_id)
return SetSessionModelResponse()
logger.warning("Session %s: model switch requested for missing session", session_id)
return None
async def set_session_mode(
self, mode_id: str, session_id: str, **kwargs: Any
) -> SetSessionModeResponse | None:
"""Persist the editor-requested mode so ACP clients do not fail on mode switches."""
state = self.session_manager.get_session(session_id)
if state is None:
logger.warning("Session %s: mode switch requested for missing session", session_id)
return None
setattr(state, "mode", mode_id)
self.session_manager.save_session(session_id)
logger.info("Session %s: mode switched to %s", session_id, mode_id)
return SetSessionModeResponse()
async def set_config_option(
self, config_id: str, session_id: str, value: str, **kwargs: Any
) -> SetSessionConfigOptionResponse | None:
"""Accept ACP config option updates even when Hermes has no typed ACP config surface yet."""
state = self.session_manager.get_session(session_id)
if state is None:
logger.warning("Session %s: config update requested for missing session", session_id)
return None
options = getattr(state, "config_options", None)
if not isinstance(options, dict):
options = {}
options[str(config_id)] = value
setattr(state, "config_options", options)
self.session_manager.save_session(session_id)
logger.info("Session %s: config option %s updated", session_id, config_id)
return SetSessionConfigOptionResponse(config_options=[])

View File

@@ -1,8 +1,19 @@
"""ACP session manager — maps ACP sessions to Hermes AIAgent instances."""
"""ACP session manager — maps ACP sessions to Hermes AIAgent instances.
Sessions are persisted to the shared SessionDB (``~/.hermes/state.db``) so they
survive process restarts and appear in ``session_search``. When the editor
reconnects after idle/restart, the ``load_session`` / ``resume_session`` calls
find the persisted session in the database and restore the full conversation
history.
"""
from __future__ import annotations
from hermes_constants import get_hermes_home
import copy
import json
import logging
import sys
import uuid
from dataclasses import dataclass, field
from threading import Lock
@@ -11,6 +22,17 @@ from typing import Any, Dict, List, Optional
logger = logging.getLogger(__name__)
def _acp_stderr_print(*args, **kwargs) -> None:
"""Best-effort human-readable output sink for ACP stdio sessions.
ACP reserves stdout for JSON-RPC frames, so any incidental CLI/status output
from AIAgent must be redirected away from stdout. Route it to stderr instead.
"""
kwargs = dict(kwargs)
kwargs.setdefault("file", sys.stderr)
print(*args, **kwargs)
def _register_task_cwd(task_id: str, cwd: str) -> None:
"""Bind a task/session id to the editor's working directory for tools."""
if not task_id:
@@ -46,18 +68,26 @@ class SessionState:
class SessionManager:
"""Thread-safe manager for ACP sessions backed by Hermes AIAgent instances."""
"""Thread-safe manager for ACP sessions backed by Hermes AIAgent instances.
def __init__(self, agent_factory=None):
Sessions are held in-memory for fast access **and** persisted to the
shared SessionDB so they survive process restarts and are searchable
via ``session_search``.
"""
def __init__(self, agent_factory=None, db=None):
"""
Args:
agent_factory: Optional callable that creates an AIAgent-like object.
Used by tests. When omitted, a real AIAgent is created
using the current Hermes runtime provider configuration.
db: Optional SessionDB instance. When omitted, the default
SessionDB (``~/.hermes/state.db``) is lazily created.
"""
self._sessions: Dict[str, SessionState] = {}
self._lock = Lock()
self._agent_factory = agent_factory
self._db_instance = db # None → lazy-init on first use
# ---- public API ---------------------------------------------------------
@@ -77,54 +107,67 @@ class SessionManager:
with self._lock:
self._sessions[session_id] = state
_register_task_cwd(session_id, cwd)
self._persist(state)
logger.info("Created ACP session %s (cwd=%s)", session_id, cwd)
return state
def get_session(self, session_id: str) -> Optional[SessionState]:
"""Return the session for *session_id*, or ``None``."""
"""Return the session for *session_id*, or ``None``.
If the session is not in memory but exists in the database (e.g. after
a process restart), it is transparently restored.
"""
with self._lock:
return self._sessions.get(session_id)
state = self._sessions.get(session_id)
if state is not None:
return state
# Attempt to restore from database.
return self._restore(session_id)
def remove_session(self, session_id: str) -> bool:
"""Remove a session. Returns True if it existed."""
"""Remove a session from memory and database. Returns True if it existed."""
with self._lock:
existed = self._sessions.pop(session_id, None) is not None
if existed:
db_existed = self._delete_persisted(session_id)
if existed or db_existed:
_clear_task_cwd(session_id)
return existed
return existed or db_existed
def fork_session(self, session_id: str, cwd: str = ".") -> Optional[SessionState]:
"""Deep-copy a session's history into a new session."""
import threading
with self._lock:
original = self._sessions.get(session_id)
if original is None:
return None
original = self.get_session(session_id) # checks DB too
if original is None:
return None
new_id = str(uuid.uuid4())
agent = self._make_agent(
session_id=new_id,
cwd=cwd,
model=original.model or None,
)
state = SessionState(
session_id=new_id,
agent=agent,
cwd=cwd,
model=getattr(agent, "model", original.model) or original.model,
history=copy.deepcopy(original.history),
cancel_event=threading.Event(),
)
new_id = str(uuid.uuid4())
agent = self._make_agent(
session_id=new_id,
cwd=cwd,
model=original.model or None,
)
state = SessionState(
session_id=new_id,
agent=agent,
cwd=cwd,
model=getattr(agent, "model", original.model) or original.model,
history=copy.deepcopy(original.history),
cancel_event=threading.Event(),
)
with self._lock:
self._sessions[new_id] = state
_register_task_cwd(new_id, cwd)
self._persist(state)
logger.info("Forked ACP session %s -> %s", session_id, new_id)
return state
def list_sessions(self) -> List[Dict[str, Any]]:
"""Return lightweight info dicts for all sessions."""
"""Return lightweight info dicts for all sessions (memory + database)."""
# Collect in-memory sessions first.
with self._lock:
return [
seen_ids = set(self._sessions.keys())
results = [
{
"session_id": s.session_id,
"cwd": s.cwd,
@@ -134,23 +177,245 @@ class SessionManager:
for s in self._sessions.values()
]
# Merge any persisted sessions not currently in memory.
db = self._get_db()
if db is not None:
try:
rows = db.search_sessions(source="acp", limit=1000)
for row in rows:
sid = row["id"]
if sid in seen_ids:
continue
# Extract cwd from model_config JSON.
cwd = "."
mc = row.get("model_config")
if mc:
try:
cwd = json.loads(mc).get("cwd", ".")
except (json.JSONDecodeError, TypeError):
pass
results.append({
"session_id": sid,
"cwd": cwd,
"model": row.get("model") or "",
"history_len": row.get("message_count") or 0,
})
except Exception:
logger.debug("Failed to list ACP sessions from DB", exc_info=True)
return results
def update_cwd(self, session_id: str, cwd: str) -> Optional[SessionState]:
"""Update the working directory for a session and its tool overrides."""
with self._lock:
state = self._sessions.get(session_id)
if state is None:
return None
state.cwd = cwd
state = self.get_session(session_id) # checks DB too
if state is None:
return None
state.cwd = cwd
_register_task_cwd(session_id, cwd)
self._persist(state)
return state
def cleanup(self) -> None:
"""Remove all sessions and clear task-specific cwd overrides."""
"""Remove all sessions (memory and database) and clear task-specific cwd overrides."""
with self._lock:
session_ids = list(self._sessions.keys())
self._sessions.clear()
for session_id in session_ids:
_clear_task_cwd(session_id)
self._delete_persisted(session_id)
# Also remove any DB-only ACP sessions not currently in memory.
db = self._get_db()
if db is not None:
try:
rows = db.search_sessions(source="acp", limit=10000)
for row in rows:
sid = row["id"]
_clear_task_cwd(sid)
db.delete_session(sid)
except Exception:
logger.debug("Failed to cleanup ACP sessions from DB", exc_info=True)
def save_session(self, session_id: str) -> None:
"""Persist the current state of a session to the database.
Called by the server after prompt completion, slash commands that
mutate history, and model switches.
"""
with self._lock:
state = self._sessions.get(session_id)
if state is not None:
self._persist(state)
# ---- persistence via SessionDB ------------------------------------------
def _get_db(self):
"""Lazily initialise and return the SessionDB instance.
Returns ``None`` if the DB is unavailable (e.g. import error in a
minimal test environment).
Note: we resolve ``HERMES_HOME`` dynamically rather than relying on
the module-level ``DEFAULT_DB_PATH`` constant, because that constant
is evaluated at import time and won't reflect env-var changes made
later (e.g. by the test fixture ``_isolate_hermes_home``).
"""
if self._db_instance is not None:
return self._db_instance
try:
import os
from pathlib import Path
from hermes_state import SessionDB
hermes_home = get_hermes_home()
self._db_instance = SessionDB(db_path=hermes_home / "state.db")
return self._db_instance
except Exception:
logger.debug("SessionDB unavailable for ACP persistence", exc_info=True)
return None
def _persist(self, state: SessionState) -> None:
"""Write session state to the database.
Creates the session record if it doesn't exist, then replaces all
stored messages with the current in-memory history.
"""
db = self._get_db()
if db is None:
return
# Ensure model is a plain string (not a MagicMock or other proxy).
model_str = str(state.model) if state.model else None
session_meta = {"cwd": state.cwd}
provider = getattr(state.agent, "provider", None)
base_url = getattr(state.agent, "base_url", None)
api_mode = getattr(state.agent, "api_mode", None)
if isinstance(provider, str) and provider.strip():
session_meta["provider"] = provider.strip()
if isinstance(base_url, str) and base_url.strip():
session_meta["base_url"] = base_url.strip()
if isinstance(api_mode, str) and api_mode.strip():
session_meta["api_mode"] = api_mode.strip()
cwd_json = json.dumps(session_meta)
try:
# Ensure the session record exists.
existing = db.get_session(state.session_id)
if existing is None:
db.create_session(
session_id=state.session_id,
source="acp",
model=model_str,
model_config={"cwd": state.cwd},
)
else:
# Update model_config (contains cwd) if changed.
try:
with db._lock:
db._conn.execute(
"UPDATE sessions SET model_config = ?, model = COALESCE(?, model) WHERE id = ?",
(cwd_json, model_str, state.session_id),
)
db._conn.commit()
except Exception:
logger.debug("Failed to update ACP session metadata", exc_info=True)
# Replace stored messages with current history.
db.clear_messages(state.session_id)
for msg in state.history:
db.append_message(
session_id=state.session_id,
role=msg.get("role", "user"),
content=msg.get("content"),
tool_name=msg.get("tool_name") or msg.get("name"),
tool_calls=msg.get("tool_calls"),
tool_call_id=msg.get("tool_call_id"),
)
except Exception:
logger.warning("Failed to persist ACP session %s", state.session_id, exc_info=True)
def _restore(self, session_id: str) -> Optional[SessionState]:
"""Load a session from the database into memory, recreating the AIAgent."""
import threading
db = self._get_db()
if db is None:
return None
try:
row = db.get_session(session_id)
except Exception:
logger.debug("Failed to query DB for ACP session %s", session_id, exc_info=True)
return None
if row is None:
return None
# Only restore ACP sessions.
if row.get("source") != "acp":
return None
# Extract cwd from model_config.
cwd = "."
requested_provider = row.get("billing_provider")
restored_base_url = row.get("billing_base_url")
restored_api_mode = None
mc = row.get("model_config")
if mc:
try:
meta = json.loads(mc)
if isinstance(meta, dict):
cwd = meta.get("cwd", ".")
requested_provider = meta.get("provider") or requested_provider
restored_base_url = meta.get("base_url") or restored_base_url
restored_api_mode = meta.get("api_mode") or restored_api_mode
except (json.JSONDecodeError, TypeError):
pass
model = row.get("model") or None
# Load conversation history.
try:
history = db.get_messages_as_conversation(session_id)
except Exception:
logger.warning("Failed to load messages for ACP session %s", session_id, exc_info=True)
history = []
try:
agent = self._make_agent(
session_id=session_id,
cwd=cwd,
model=model,
requested_provider=requested_provider,
base_url=restored_base_url,
api_mode=restored_api_mode,
)
except Exception:
logger.warning("Failed to recreate agent for ACP session %s", session_id, exc_info=True)
return None
state = SessionState(
session_id=session_id,
agent=agent,
cwd=cwd,
model=model or getattr(agent, "model", "") or "",
history=history,
cancel_event=threading.Event(),
)
with self._lock:
self._sessions[session_id] = state
_register_task_cwd(session_id, cwd)
logger.info("Restored ACP session %s from DB (%d messages)", session_id, len(history))
return state
def _delete_persisted(self, session_id: str) -> bool:
"""Delete a session from the database. Returns True if it existed."""
db = self._get_db()
if db is None:
return False
try:
return db.delete_session(session_id)
except Exception:
logger.debug("Failed to delete ACP session %s from DB", session_id, exc_info=True)
return False
# ---- internal -----------------------------------------------------------
@@ -160,6 +425,9 @@ class SessionManager:
session_id: str,
cwd: str,
model: str | None = None,
requested_provider: str | None = None,
base_url: str | None = None,
api_mode: str | None = None,
):
if self._agent_factory is not None:
return self._agent_factory()
@@ -170,11 +438,11 @@ class SessionManager:
config = load_config()
model_cfg = config.get("model")
default_model = "anthropic/claude-opus-4.6"
requested_provider = None
default_model = ""
config_provider = None
if isinstance(model_cfg, dict):
default_model = str(model_cfg.get("default") or default_model)
requested_provider = model_cfg.get("provider")
config_provider = model_cfg.get("provider")
elif isinstance(model_cfg, str) and model_cfg.strip():
default_model = model_cfg.strip()
@@ -187,17 +455,23 @@ class SessionManager:
}
try:
runtime = resolve_runtime_provider(requested=requested_provider)
runtime = resolve_runtime_provider(requested=requested_provider or config_provider)
kwargs.update(
{
"provider": runtime.get("provider"),
"api_mode": runtime.get("api_mode"),
"base_url": runtime.get("base_url"),
"api_mode": api_mode or runtime.get("api_mode"),
"base_url": base_url or runtime.get("base_url"),
"api_key": runtime.get("api_key"),
"command": runtime.get("command"),
"args": list(runtime.get("args") or []),
}
)
except Exception:
logger.debug("ACP session falling back to default provider resolution", exc_info=True)
_register_task_cwd(session_id, cwd)
return AIAgent(**kwargs)
agent = AIAgent(**kwargs)
# ACP stdio transport requires stdout to remain protocol-only JSON-RPC.
# Route any incidental human-readable agent output to stderr instead.
agent._print_fn = _acp_stderr_print
return agent

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,113 @@
"""BuiltinMemoryProvider — wraps MEMORY.md / USER.md as a MemoryProvider.
Always registered as the first provider. Cannot be disabled or removed.
This is the existing Hermes memory system exposed through the provider
interface for compatibility with the MemoryManager.
The actual storage logic lives in tools/memory_tool.py (MemoryStore).
This provider is a thin adapter that delegates to MemoryStore and
exposes the memory tool schema.
"""
from __future__ import annotations
import json
import logging
from typing import Any, Dict, List, Optional
from agent.memory_provider import MemoryProvider
logger = logging.getLogger(__name__)
class BuiltinMemoryProvider(MemoryProvider):
"""Built-in file-backed memory (MEMORY.md + USER.md).
Always active, never disabled by other providers. The `memory` tool
is handled by run_agent.py's agent-level tool interception (not through
the normal registry), so get_tool_schemas() returns an empty list —
the memory tool is already wired separately.
"""
def __init__(
self,
memory_store=None,
memory_enabled: bool = False,
user_profile_enabled: bool = False,
):
self._store = memory_store
self._memory_enabled = memory_enabled
self._user_profile_enabled = user_profile_enabled
@property
def name(self) -> str:
return "builtin"
def is_available(self) -> bool:
"""Built-in memory is always available."""
return True
def initialize(self, session_id: str, **kwargs) -> None:
"""Load memory from disk if not already loaded."""
if self._store is not None:
self._store.load_from_disk()
def system_prompt_block(self) -> str:
"""Return MEMORY.md and USER.md content for the system prompt.
Uses the frozen snapshot captured at load time. This ensures the
system prompt stays stable throughout a session (preserving the
prompt cache), even though the live entries may change via tool calls.
"""
if not self._store:
return ""
parts = []
if self._memory_enabled:
mem_block = self._store.format_for_system_prompt("memory")
if mem_block:
parts.append(mem_block)
if self._user_profile_enabled:
user_block = self._store.format_for_system_prompt("user")
if user_block:
parts.append(user_block)
return "\n\n".join(parts)
def prefetch(self, query: str, *, session_id: str = "") -> str:
"""Built-in memory doesn't do query-based recall — it's injected via system_prompt_block."""
return ""
def sync_turn(self, user_content: str, assistant_content: str, *, session_id: str = "") -> None:
"""Built-in memory doesn't auto-sync turns — writes happen via the memory tool."""
def get_tool_schemas(self) -> List[Dict[str, Any]]:
"""Return empty list.
The `memory` tool is an agent-level intercepted tool, handled
specially in run_agent.py before normal tool dispatch. It's not
part of the standard tool registry. We don't duplicate it here.
"""
return []
def handle_tool_call(self, tool_name: str, args: Dict[str, Any], **kwargs) -> str:
"""Not used — the memory tool is intercepted in run_agent.py."""
return json.dumps({"error": "Built-in memory tool is handled by the agent loop"})
def shutdown(self) -> None:
"""No cleanup needed — files are saved on every write."""
# -- Property access for backward compatibility --------------------------
@property
def store(self):
"""Access the underlying MemoryStore for legacy code paths."""
return self._store
@property
def memory_enabled(self) -> bool:
return self._memory_enabled
@property
def user_profile_enabled(self) -> bool:
return self._user_profile_enabled

View File

@@ -1,12 +1,19 @@
"""Automatic context window compression for long conversations.
Self-contained class with its own OpenAI client for summarization.
Uses Gemini Flash (cheap/fast) to summarize middle turns while
Uses auxiliary model (cheap/fast) to summarize middle turns while
protecting head and tail context.
Improvements over v1:
- Structured summary template (Goal, Progress, Decisions, Files, Next Steps)
- Iterative summary updates (preserves info across multiple compactions)
- Token-budget tail protection instead of fixed message count
- Tool output pruning before LLM summarization (cheap pre-pass)
- Scaled summary budget (proportional to compressed content)
- Richer tool call/result detail in summarizer input
"""
import logging
import os
from typing import Any, Dict, List, Optional
from agent.auxiliary_client import call_llm
@@ -27,12 +34,29 @@ SUMMARY_PREFIX = (
)
LEGACY_SUMMARY_PREFIX = "[CONTEXT SUMMARY]:"
# Minimum tokens for the summary output
_MIN_SUMMARY_TOKENS = 2000
# Proportion of compressed content to allocate for summary
_SUMMARY_RATIO = 0.20
# Absolute ceiling for summary tokens (even on very large context windows)
_SUMMARY_TOKENS_CEILING = 12_000
# Placeholder used when pruning old tool results
_PRUNED_TOOL_PLACEHOLDER = "[Old tool output cleared to save context space]"
# Chars per token rough estimate
_CHARS_PER_TOKEN = 4
class ContextCompressor:
"""Compresses conversation context when approaching the model's context limit.
Algorithm: protect first N + last N turns, summarize everything in between.
Token tracking uses actual counts from API responses for accuracy.
Algorithm:
1. Prune old tool results (cheap, no LLM call)
2. Protect head messages (system prompt + first exchange)
3. Protect tail messages by token budget (most recent ~20K tokens)
4. Summarize middle turns with structured LLM prompt
5. On subsequent compactions, iteratively update the previous summary
"""
def __init__(
@@ -40,23 +64,50 @@ class ContextCompressor:
model: str,
threshold_percent: float = 0.50,
protect_first_n: int = 3,
protect_last_n: int = 4,
summary_target_tokens: int = 2500,
protect_last_n: int = 20,
summary_target_ratio: float = 0.20,
quiet_mode: bool = False,
summary_model_override: str = None,
base_url: str = "",
api_key: str = "",
config_context_length: int | None = None,
provider: str = "",
):
self.model = model
self.base_url = base_url
self.api_key = api_key
self.provider = provider
self.threshold_percent = threshold_percent
self.protect_first_n = protect_first_n
self.protect_last_n = protect_last_n
self.summary_target_tokens = summary_target_tokens
self.summary_target_ratio = max(0.10, min(summary_target_ratio, 0.80))
self.quiet_mode = quiet_mode
self.context_length = get_model_context_length(model, base_url=base_url)
self.context_length = get_model_context_length(
model, base_url=base_url, api_key=api_key,
config_context_length=config_context_length,
provider=provider,
)
self.threshold_tokens = int(self.context_length * threshold_percent)
self.compression_count = 0
# Derive token budgets: ratio is relative to the threshold, not total context
target_tokens = int(self.threshold_tokens * self.summary_target_ratio)
self.tail_token_budget = target_tokens
self.max_summary_tokens = min(
int(self.context_length * 0.05), _SUMMARY_TOKENS_CEILING,
)
if not quiet_mode:
logger.info(
"Context compressor initialized: model=%s context_length=%d "
"threshold=%d (%.0f%%) target_ratio=%.0f%% tail_budget=%d "
"provider=%s base_url=%s",
model, self.context_length, self.threshold_tokens,
threshold_percent * 100, self.summary_target_ratio * 100,
self.tail_token_budget,
provider or "none", base_url or "none",
)
self._context_probed = False # True after a step-down from context error
self.last_prompt_tokens = 0
@@ -65,6 +116,9 @@ class ContextCompressor:
self.summary_model = summary_model_override or ""
# Stores the previous compaction summary for iterative updates
self._previous_summary: Optional[str] = None
def update_from_response(self, usage: Dict[str, Any]):
"""Update tracked token usage from API response."""
self.last_prompt_tokens = usage.get("prompt_tokens", 0)
@@ -87,57 +141,213 @@ class ContextCompressor:
"last_prompt_tokens": self.last_prompt_tokens,
"threshold_tokens": self.threshold_tokens,
"context_length": self.context_length,
"usage_percent": (self.last_prompt_tokens / self.context_length * 100) if self.context_length else 0,
"usage_percent": min(100, (self.last_prompt_tokens / self.context_length * 100)) if self.context_length else 0,
"compression_count": self.compression_count,
}
def _generate_summary(self, turns_to_summarize: List[Dict[str, Any]]) -> Optional[str]:
"""Generate a concise summary of conversation turns.
# ------------------------------------------------------------------
# Tool output pruning (cheap pre-pass, no LLM call)
# ------------------------------------------------------------------
Tries the auxiliary model first, then falls back to the user's main
model. Returns None if all attempts fail — the caller should drop
def _prune_old_tool_results(
self, messages: List[Dict[str, Any]], protect_tail_count: int,
) -> tuple[List[Dict[str, Any]], int]:
"""Replace old tool result contents with a short placeholder.
Walks backward from the end, protecting the most recent
``protect_tail_count`` messages. Older tool results get their
content replaced with a placeholder string.
Returns (pruned_messages, pruned_count).
"""
if not messages:
return messages, 0
result = [m.copy() for m in messages]
pruned = 0
prune_boundary = len(result) - protect_tail_count
for i in range(prune_boundary):
msg = result[i]
if msg.get("role") != "tool":
continue
content = msg.get("content", "")
if not content or content == _PRUNED_TOOL_PLACEHOLDER:
continue
# Only prune if the content is substantial (>200 chars)
if len(content) > 200:
result[i] = {**msg, "content": _PRUNED_TOOL_PLACEHOLDER}
pruned += 1
return result, pruned
# ------------------------------------------------------------------
# Summarization
# ------------------------------------------------------------------
def _compute_summary_budget(self, turns_to_summarize: List[Dict[str, Any]]) -> int:
"""Scale summary token budget with the amount of content being compressed.
The maximum scales with the model's context window (5% of context,
capped at ``_SUMMARY_TOKENS_CEILING``) so large-context models get
richer summaries instead of being hard-capped at 8K tokens.
"""
content_tokens = estimate_messages_tokens_rough(turns_to_summarize)
budget = int(content_tokens * _SUMMARY_RATIO)
return max(_MIN_SUMMARY_TOKENS, min(budget, self.max_summary_tokens))
def _serialize_for_summary(self, turns: List[Dict[str, Any]]) -> str:
"""Serialize conversation turns into labeled text for the summarizer.
Includes tool call arguments and result content (up to 3000 chars
per message) so the summarizer can preserve specific details like
file paths, commands, and outputs.
"""
parts = []
for msg in turns:
role = msg.get("role", "unknown")
content = msg.get("content") or ""
# Tool results: keep more content than before (3000 chars)
if role == "tool":
tool_id = msg.get("tool_call_id", "")
if len(content) > 3000:
content = content[:2000] + "\n...[truncated]...\n" + content[-800:]
parts.append(f"[TOOL RESULT {tool_id}]: {content}")
continue
# Assistant messages: include tool call names AND arguments
if role == "assistant":
if len(content) > 3000:
content = content[:2000] + "\n...[truncated]...\n" + content[-800:]
tool_calls = msg.get("tool_calls", [])
if tool_calls:
tc_parts = []
for tc in tool_calls:
if isinstance(tc, dict):
fn = tc.get("function", {})
name = fn.get("name", "?")
args = fn.get("arguments", "")
# Truncate long arguments but keep enough for context
if len(args) > 500:
args = args[:400] + "..."
tc_parts.append(f" {name}({args})")
else:
fn = getattr(tc, "function", None)
name = getattr(fn, "name", "?") if fn else "?"
tc_parts.append(f" {name}(...)")
content += "\n[Tool calls:\n" + "\n".join(tc_parts) + "\n]"
parts.append(f"[ASSISTANT]: {content}")
continue
# User and other roles
if len(content) > 3000:
content = content[:2000] + "\n...[truncated]...\n" + content[-800:]
parts.append(f"[{role.upper()}]: {content}")
return "\n\n".join(parts)
def _generate_summary(self, turns_to_summarize: List[Dict[str, Any]]) -> Optional[str]:
"""Generate a structured summary of conversation turns.
Uses a structured template (Goal, Progress, Decisions, Files, Next Steps)
inspired by Pi-mono and OpenCode. When a previous summary exists,
generates an iterative update instead of summarizing from scratch.
Returns None if all attempts fail — the caller should drop
the middle turns without a summary rather than inject a useless
placeholder.
"""
parts = []
for msg in turns_to_summarize:
role = msg.get("role", "unknown")
content = msg.get("content") or ""
if len(content) > 2000:
content = content[:1000] + "\n...[truncated]...\n" + content[-500:]
tool_calls = msg.get("tool_calls", [])
if tool_calls:
tool_names = [tc.get("function", {}).get("name", "?") for tc in tool_calls if isinstance(tc, dict)]
content += f"\n[Tool calls: {', '.join(tool_names)}]"
parts.append(f"[{role.upper()}]: {content}")
summary_budget = self._compute_summary_budget(turns_to_summarize)
content_to_summarize = self._serialize_for_summary(turns_to_summarize)
content_to_summarize = "\n\n".join(parts)
prompt = f"""Create a concise handoff summary for a later assistant that will continue this conversation after earlier turns are compacted.
if self._previous_summary:
# Iterative update: preserve existing info, add new progress
prompt = f"""You are updating a context compaction summary. A previous compaction produced the summary below. New conversation turns have occurred since then and need to be incorporated.
Describe:
1. What actions were taken (tool calls, searches, file operations)
2. Key information or results obtained
3. Important decisions, constraints, or user preferences
4. Relevant data, file names, outputs, or next steps needed to continue
PREVIOUS SUMMARY:
{self._previous_summary}
Keep it factual, concise, and focused on helping the next assistant resume without repeating work. Target ~{self.summary_target_tokens} tokens.
NEW TURNS TO INCORPORATE:
{content_to_summarize}
Update the summary using this exact structure. PRESERVE all existing information that is still relevant. ADD new progress. Move items from "In Progress" to "Done" when completed. Remove information only if it is clearly obsolete.
## Goal
[What the user is trying to accomplish — preserve from previous summary, update if goal evolved]
## Constraints & Preferences
[User preferences, coding style, constraints, important decisions — accumulate across compactions]
## Progress
### Done
[Completed work — include specific file paths, commands run, results obtained]
### In Progress
[Work currently underway]
### Blocked
[Any blockers or issues encountered]
## Key Decisions
[Important technical decisions and why they were made]
## Relevant Files
[Files read, modified, or created — with brief note on each. Accumulate across compactions.]
## Next Steps
[What needs to happen next to continue the work]
## Critical Context
[Any specific values, error messages, configuration details, or data that would be lost without explicit preservation]
Target ~{summary_budget} tokens. Be specific — include file paths, command outputs, error messages, and concrete values rather than vague descriptions.
Write only the summary body. Do not include any preamble or prefix."""
else:
# First compaction: summarize from scratch
prompt = f"""Create a structured handoff summary for a later assistant that will continue this conversation after earlier turns are compacted.
---
TURNS TO SUMMARIZE:
{content_to_summarize}
---
Write only the summary body. Do not include any preamble or prefix; the system will add the handoff wrapper."""
Use this exact structure:
## Goal
[What the user is trying to accomplish]
## Constraints & Preferences
[User preferences, coding style, constraints, important decisions]
## Progress
### Done
[Completed work — include specific file paths, commands run, results obtained]
### In Progress
[Work currently underway]
### Blocked
[Any blockers or issues encountered]
## Key Decisions
[Important technical decisions and why they were made]
## Relevant Files
[Files read, modified, or created — with brief note on each]
## Next Steps
[What needs to happen next to continue the work]
## Critical Context
[Any specific values, error messages, configuration details, or data that would be lost without explicit preservation]
Target ~{summary_budget} tokens. Be specific — include file paths, command outputs, error messages, and concrete values rather than vague descriptions. The goal is to prevent the next assistant from repeating work or losing important details.
Write only the summary body. Do not include any preamble or prefix."""
# Use the centralized LLM router — handles provider resolution,
# auth, and fallback internally.
try:
call_kwargs = {
"task": "compression",
"messages": [{"role": "user", "content": prompt}],
"temperature": 0.3,
"max_tokens": self.summary_target_tokens * 2,
"timeout": 30.0,
"max_tokens": summary_budget * 2,
# timeout resolved from auxiliary.compression.timeout config by call_llm
}
if self.summary_model:
call_kwargs["model"] = self.summary_model
@@ -147,6 +357,8 @@ Write only the summary body. Do not include any preamble or prefix; the system w
if not isinstance(content, str):
content = str(content) if content else ""
summary = content.strip()
# Store for iterative updates on next compaction
self._previous_summary = summary
return self._with_summary_prefix(summary)
except RuntimeError:
logging.warning("Context compression: no provider available for "
@@ -251,56 +463,155 @@ Write only the summary body. Do not include any preamble or prefix; the system w
"""Pull a compress-end boundary backward to avoid splitting a
tool_call / result group.
If the message just before ``idx`` is an assistant message with
tool_calls, those tool results will start at ``idx`` and would be
separated from their parent. Move backwards to include the whole
group in the summarised region.
If the boundary falls in the middle of a tool-result group (i.e.
there are consecutive tool messages before ``idx``), walk backward
past all of them to find the parent assistant message. If found,
move the boundary before the assistant so the entire
assistant + tool_results group is included in the summarised region
rather than being split (which causes silent data loss when
``_sanitize_tool_pairs`` removes the orphaned tail results).
"""
if idx <= 0 or idx >= len(messages):
return idx
prev = messages[idx - 1]
if prev.get("role") == "assistant" and prev.get("tool_calls"):
# The results for this assistant turn sit at idx..idx+k.
# Include the assistant message in the summarised region too.
idx -= 1
# Walk backward past consecutive tool results
check = idx - 1
while check >= 0 and messages[check].get("role") == "tool":
check -= 1
# If we landed on the parent assistant with tool_calls, pull the
# boundary before it so the whole group gets summarised together.
if check >= 0 and messages[check].get("role") == "assistant" and messages[check].get("tool_calls"):
idx = check
return idx
# ------------------------------------------------------------------
# Tail protection by token budget
# ------------------------------------------------------------------
def _find_tail_cut_by_tokens(
self, messages: List[Dict[str, Any]], head_end: int,
token_budget: int | None = None,
) -> int:
"""Walk backward from the end of messages, accumulating tokens until
the budget is reached. Returns the index where the tail starts.
``token_budget`` defaults to ``self.tail_token_budget`` which is
derived from ``summary_target_ratio * context_length``, so it
scales automatically with the model's context window.
Never cuts inside a tool_call/result group. Falls back to the old
``protect_last_n`` if the budget would protect fewer messages.
"""
if token_budget is None:
token_budget = self.tail_token_budget
n = len(messages)
min_tail = self.protect_last_n
accumulated = 0
cut_idx = n # start from beyond the end
for i in range(n - 1, head_end - 1, -1):
msg = messages[i]
content = msg.get("content") or ""
msg_tokens = len(content) // _CHARS_PER_TOKEN + 10 # +10 for role/metadata
# Include tool call arguments in estimate
for tc in msg.get("tool_calls") or []:
if isinstance(tc, dict):
args = tc.get("function", {}).get("arguments", "")
msg_tokens += len(args) // _CHARS_PER_TOKEN
if accumulated + msg_tokens > token_budget and (n - i) >= min_tail:
break
accumulated += msg_tokens
cut_idx = i
# Ensure we protect at least protect_last_n messages
fallback_cut = n - min_tail
if cut_idx > fallback_cut:
cut_idx = fallback_cut
# If the token budget would protect everything (small conversations),
# fall back to the fixed protect_last_n approach so compression can
# still remove middle turns.
if cut_idx <= head_end:
cut_idx = fallback_cut
# Align to avoid splitting tool groups
cut_idx = self._align_boundary_backward(messages, cut_idx)
return max(cut_idx, head_end + 1)
# ------------------------------------------------------------------
# Main compression entry point
# ------------------------------------------------------------------
def compress(self, messages: List[Dict[str, Any]], current_tokens: int = None) -> List[Dict[str, Any]]:
"""Compress conversation messages by summarizing middle turns.
Keeps first N + last N turns, summarizes everything in between.
Algorithm:
1. Prune old tool results (cheap pre-pass, no LLM call)
2. Protect head messages (system prompt + first exchange)
3. Find tail boundary by token budget (~20K tokens of recent context)
4. Summarize middle turns with structured LLM prompt
5. On re-compression, iteratively update the previous summary
After compression, orphaned tool_call / tool_result pairs are cleaned
up so the API never receives mismatched IDs.
"""
n_messages = len(messages)
if n_messages <= self.protect_first_n + self.protect_last_n + 1:
if not self.quiet_mode:
print(f"⚠️ Cannot compress: only {n_messages} messages (need > {self.protect_first_n + self.protect_last_n + 1})")
logger.warning(
"Cannot compress: only %d messages (need > %d)",
n_messages,
self.protect_first_n + self.protect_last_n + 1,
)
return messages
display_tokens = current_tokens if current_tokens else self.last_prompt_tokens or estimate_messages_tokens_rough(messages)
# Phase 1: Prune old tool results (cheap, no LLM call)
messages, pruned_count = self._prune_old_tool_results(
messages, protect_tail_count=self.protect_last_n * 3,
)
if pruned_count and not self.quiet_mode:
logger.info("Pre-compression: pruned %d old tool result(s)", pruned_count)
# Phase 2: Determine boundaries
compress_start = self.protect_first_n
compress_end = n_messages - self.protect_last_n
if compress_start >= compress_end:
return messages
# Adjust boundaries to avoid splitting tool_call/result groups.
compress_start = self._align_boundary_forward(messages, compress_start)
compress_end = self._align_boundary_backward(messages, compress_end)
# Use token-budget tail protection instead of fixed message count
compress_end = self._find_tail_cut_by_tokens(messages, compress_start)
if compress_start >= compress_end:
return messages
turns_to_summarize = messages[compress_start:compress_end]
display_tokens = current_tokens if current_tokens else self.last_prompt_tokens or estimate_messages_tokens_rough(messages)
if not self.quiet_mode:
print(f"\n📦 Context compression triggered ({display_tokens:,} tokens ≥ {self.threshold_tokens:,} threshold)")
print(f" 📊 Model context limit: {self.context_length:,} tokens ({self.threshold_percent*100:.0f}% = {self.threshold_tokens:,})")
if not self.quiet_mode:
print(f" 🗜️ Summarizing turns {compress_start+1}-{compress_end} ({len(turns_to_summarize)} turns)")
logger.info(
"Context compression triggered (%d tokens >= %d threshold)",
display_tokens,
self.threshold_tokens,
)
logger.info(
"Model context limit: %d tokens (%.0f%% = %d)",
self.context_length,
self.threshold_percent * 100,
self.threshold_tokens,
)
tail_msgs = n_messages - compress_end
logger.info(
"Summarizing turns %d-%d (%d turns), protecting %d head + %d tail messages",
compress_start + 1,
compress_end,
len(turns_to_summarize),
compress_start,
tail_msgs,
)
# Phase 3: Generate structured summary
summary = self._generate_summary(turns_to_summarize)
# Phase 4: Assemble compressed message list
compressed = []
for i in range(compress_start):
msg = messages[i].copy()
@@ -311,16 +622,41 @@ Write only the summary body. Do not include any preamble or prefix; the system w
)
compressed.append(msg)
_merge_summary_into_tail = False
if summary:
last_head_role = messages[compress_start - 1].get("role", "user") if compress_start > 0 else "user"
summary_role = "user" if last_head_role in ("assistant", "tool") else "assistant"
compressed.append({"role": summary_role, "content": summary})
first_tail_role = messages[compress_end].get("role", "user") if compress_end < n_messages else "user"
# Pick a role that avoids consecutive same-role with both neighbors.
# Priority: avoid colliding with head (already committed), then tail.
if last_head_role in ("assistant", "tool"):
summary_role = "user"
else:
summary_role = "assistant"
# If the chosen role collides with the tail AND flipping wouldn't
# collide with the head, flip it.
if summary_role == first_tail_role:
flipped = "assistant" if summary_role == "user" else "user"
if flipped != last_head_role:
summary_role = flipped
else:
# Both roles would create consecutive same-role messages
# (e.g. head=assistant, tail=user — neither role works).
# Merge the summary into the first tail message instead
# of inserting a standalone message that breaks alternation.
_merge_summary_into_tail = True
if not _merge_summary_into_tail:
compressed.append({"role": summary_role, "content": summary})
else:
if not self.quiet_mode:
print(" ⚠️ No summary model available — middle turns dropped without summary")
logger.warning("No summary model available — middle turns dropped without summary")
for i in range(compress_end, n_messages):
compressed.append(messages[i].copy())
msg = messages[i].copy()
if _merge_summary_into_tail and i == compress_end:
original = msg.get("content") or ""
msg["content"] = summary + "\n\n" + original
_merge_summary_into_tail = False
compressed.append(msg)
self.compression_count += 1
@@ -329,7 +665,12 @@ Write only the summary body. Do not include any preamble or prefix; the system w
if not self.quiet_mode:
new_estimate = estimate_messages_tokens_rough(compressed)
saved_estimate = display_tokens - new_estimate
print(f" ✅ Compressed: {n_messages}{len(compressed)} messages (~{saved_estimate:,} tokens saved)")
print(f" 💡 Compression #{self.compression_count} complete")
logger.info(
"Compressed: %d -> %d messages (~%d tokens saved)",
n_messages,
len(compressed),
saved_estimate,
)
logger.info("Compression #%d complete", self.compression_count)
return compressed

492
agent/context_references.py Normal file
View File

@@ -0,0 +1,492 @@
from __future__ import annotations
import asyncio
import inspect
import json
import mimetypes
import os
import re
import subprocess
from dataclasses import dataclass, field
from pathlib import Path
from typing import Awaitable, Callable
from agent.model_metadata import estimate_tokens_rough
REFERENCE_PATTERN = re.compile(
r"(?<![\w/])@(?:(?P<simple>diff|staged)\b|(?P<kind>file|folder|git|url):(?P<value>\S+))"
)
TRAILING_PUNCTUATION = ",.;!?"
_SENSITIVE_HOME_DIRS = (".ssh", ".aws", ".gnupg", ".kube", ".docker", ".azure", ".config/gh")
_SENSITIVE_HERMES_DIRS = (Path("skills") / ".hub",)
_SENSITIVE_HOME_FILES = (
Path(".ssh") / "authorized_keys",
Path(".ssh") / "id_rsa",
Path(".ssh") / "id_ed25519",
Path(".ssh") / "config",
Path(".bashrc"),
Path(".zshrc"),
Path(".profile"),
Path(".bash_profile"),
Path(".zprofile"),
Path(".netrc"),
Path(".pgpass"),
Path(".npmrc"),
Path(".pypirc"),
)
@dataclass(frozen=True)
class ContextReference:
raw: str
kind: str
target: str
start: int
end: int
line_start: int | None = None
line_end: int | None = None
@dataclass
class ContextReferenceResult:
message: str
original_message: str
references: list[ContextReference] = field(default_factory=list)
warnings: list[str] = field(default_factory=list)
injected_tokens: int = 0
expanded: bool = False
blocked: bool = False
def parse_context_references(message: str) -> list[ContextReference]:
refs: list[ContextReference] = []
if not message:
return refs
for match in REFERENCE_PATTERN.finditer(message):
simple = match.group("simple")
if simple:
refs.append(
ContextReference(
raw=match.group(0),
kind=simple,
target="",
start=match.start(),
end=match.end(),
)
)
continue
kind = match.group("kind")
value = _strip_trailing_punctuation(match.group("value") or "")
line_start = None
line_end = None
target = value
if kind == "file":
range_match = re.match(r"^(?P<path>.+?):(?P<start>\d+)(?:-(?P<end>\d+))?$", value)
if range_match:
target = range_match.group("path")
line_start = int(range_match.group("start"))
line_end = int(range_match.group("end") or range_match.group("start"))
refs.append(
ContextReference(
raw=match.group(0),
kind=kind,
target=target,
start=match.start(),
end=match.end(),
line_start=line_start,
line_end=line_end,
)
)
return refs
def preprocess_context_references(
message: str,
*,
cwd: str | Path,
context_length: int,
url_fetcher: Callable[[str], str | Awaitable[str]] | None = None,
allowed_root: str | Path | None = None,
) -> ContextReferenceResult:
coro = preprocess_context_references_async(
message,
cwd=cwd,
context_length=context_length,
url_fetcher=url_fetcher,
allowed_root=allowed_root,
)
# Safe for both CLI (no loop) and gateway (loop already running).
try:
loop = asyncio.get_running_loop()
except RuntimeError:
loop = None
if loop and loop.is_running():
import concurrent.futures
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
return pool.submit(asyncio.run, coro).result()
return asyncio.run(coro)
async def preprocess_context_references_async(
message: str,
*,
cwd: str | Path,
context_length: int,
url_fetcher: Callable[[str], str | Awaitable[str]] | None = None,
allowed_root: str | Path | None = None,
) -> ContextReferenceResult:
refs = parse_context_references(message)
if not refs:
return ContextReferenceResult(message=message, original_message=message)
cwd_path = Path(cwd).expanduser().resolve()
# Default to the current working directory so @ references cannot escape
# the active workspace unless a caller explicitly widens the root.
allowed_root_path = (
Path(allowed_root).expanduser().resolve() if allowed_root is not None else cwd_path
)
warnings: list[str] = []
blocks: list[str] = []
injected_tokens = 0
for ref in refs:
warning, block = await _expand_reference(
ref,
cwd_path,
url_fetcher=url_fetcher,
allowed_root=allowed_root_path,
)
if warning:
warnings.append(warning)
if block:
blocks.append(block)
injected_tokens += estimate_tokens_rough(block)
hard_limit = max(1, int(context_length * 0.50))
soft_limit = max(1, int(context_length * 0.25))
if injected_tokens > hard_limit:
warnings.append(
f"@ context injection refused: {injected_tokens} tokens exceeds the 50% hard limit ({hard_limit})."
)
return ContextReferenceResult(
message=message,
original_message=message,
references=refs,
warnings=warnings,
injected_tokens=injected_tokens,
expanded=False,
blocked=True,
)
if injected_tokens > soft_limit:
warnings.append(
f"@ context injection warning: {injected_tokens} tokens exceeds the 25% soft limit ({soft_limit})."
)
stripped = _remove_reference_tokens(message, refs)
final = stripped
if warnings:
final = f"{final}\n\n--- Context Warnings ---\n" + "\n".join(f"- {warning}" for warning in warnings)
if blocks:
final = f"{final}\n\n--- Attached Context ---\n\n" + "\n\n".join(blocks)
return ContextReferenceResult(
message=final.strip(),
original_message=message,
references=refs,
warnings=warnings,
injected_tokens=injected_tokens,
expanded=bool(blocks or warnings),
blocked=False,
)
async def _expand_reference(
ref: ContextReference,
cwd: Path,
*,
url_fetcher: Callable[[str], str | Awaitable[str]] | None = None,
allowed_root: Path | None = None,
) -> tuple[str | None, str | None]:
try:
if ref.kind == "file":
return _expand_file_reference(ref, cwd, allowed_root=allowed_root)
if ref.kind == "folder":
return _expand_folder_reference(ref, cwd, allowed_root=allowed_root)
if ref.kind == "diff":
return _expand_git_reference(ref, cwd, ["diff"], "git diff")
if ref.kind == "staged":
return _expand_git_reference(ref, cwd, ["diff", "--staged"], "git diff --staged")
if ref.kind == "git":
count = max(1, min(int(ref.target or "1"), 10))
return _expand_git_reference(ref, cwd, ["log", f"-{count}", "-p"], f"git log -{count} -p")
if ref.kind == "url":
content = await _fetch_url_content(ref.target, url_fetcher=url_fetcher)
if not content:
return f"{ref.raw}: no content extracted", None
return None, f"🌐 {ref.raw} ({estimate_tokens_rough(content)} tokens)\n{content}"
except Exception as exc:
return f"{ref.raw}: {exc}", None
return f"{ref.raw}: unsupported reference type", None
def _expand_file_reference(
ref: ContextReference,
cwd: Path,
*,
allowed_root: Path | None = None,
) -> tuple[str | None, str | None]:
path = _resolve_path(cwd, ref.target, allowed_root=allowed_root)
_ensure_reference_path_allowed(path)
if not path.exists():
return f"{ref.raw}: file not found", None
if not path.is_file():
return f"{ref.raw}: path is not a file", None
if _is_binary_file(path):
return f"{ref.raw}: binary files are not supported", None
text = path.read_text(encoding="utf-8")
if ref.line_start is not None:
lines = text.splitlines()
start_idx = max(ref.line_start - 1, 0)
end_idx = min(ref.line_end or ref.line_start, len(lines))
text = "\n".join(lines[start_idx:end_idx])
lang = _code_fence_language(path)
label = ref.raw
return None, f"📄 {label} ({estimate_tokens_rough(text)} tokens)\n```{lang}\n{text}\n```"
def _expand_folder_reference(
ref: ContextReference,
cwd: Path,
*,
allowed_root: Path | None = None,
) -> tuple[str | None, str | None]:
path = _resolve_path(cwd, ref.target, allowed_root=allowed_root)
_ensure_reference_path_allowed(path)
if not path.exists():
return f"{ref.raw}: folder not found", None
if not path.is_dir():
return f"{ref.raw}: path is not a folder", None
listing = _build_folder_listing(path, cwd)
return None, f"📁 {ref.raw} ({estimate_tokens_rough(listing)} tokens)\n{listing}"
def _expand_git_reference(
ref: ContextReference,
cwd: Path,
args: list[str],
label: str,
) -> tuple[str | None, str | None]:
try:
result = subprocess.run(
["git", *args],
cwd=cwd,
capture_output=True,
text=True,
timeout=30,
)
except subprocess.TimeoutExpired:
return f"{ref.raw}: git command timed out (30s)", None
if result.returncode != 0:
stderr = (result.stderr or "").strip() or "git command failed"
return f"{ref.raw}: {stderr}", None
content = result.stdout.strip()
if not content:
content = "(no output)"
return None, f"🧾 {label} ({estimate_tokens_rough(content)} tokens)\n```diff\n{content}\n```"
async def _fetch_url_content(
url: str,
*,
url_fetcher: Callable[[str], str | Awaitable[str]] | None = None,
) -> str:
fetcher = url_fetcher or _default_url_fetcher
content = fetcher(url)
if inspect.isawaitable(content):
content = await content
return str(content or "").strip()
async def _default_url_fetcher(url: str) -> str:
from tools.web_tools import web_extract_tool
raw = await web_extract_tool([url], format="markdown", use_llm_processing=True)
payload = json.loads(raw)
docs = payload.get("data", {}).get("documents", [])
if not docs:
return ""
doc = docs[0]
return str(doc.get("content") or doc.get("raw_content") or "").strip()
def _resolve_path(cwd: Path, target: str, *, allowed_root: Path | None = None) -> Path:
path = Path(os.path.expanduser(target))
if not path.is_absolute():
path = cwd / path
resolved = path.resolve()
if allowed_root is not None:
try:
resolved.relative_to(allowed_root)
except ValueError as exc:
raise ValueError("path is outside the allowed workspace") from exc
return resolved
def _ensure_reference_path_allowed(path: Path) -> None:
home = Path(os.path.expanduser("~")).resolve()
hermes_home = Path(
os.getenv("HERMES_HOME", str(home / ".hermes"))
).expanduser().resolve()
blocked_exact = {home / rel for rel in _SENSITIVE_HOME_FILES}
blocked_exact.add(hermes_home / ".env")
blocked_dirs = [home / rel for rel in _SENSITIVE_HOME_DIRS]
blocked_dirs.extend(hermes_home / rel for rel in _SENSITIVE_HERMES_DIRS)
if path in blocked_exact:
raise ValueError("path is a sensitive credential file and cannot be attached")
for blocked_dir in blocked_dirs:
try:
path.relative_to(blocked_dir)
except ValueError:
continue
raise ValueError("path is a sensitive credential or internal Hermes path and cannot be attached")
def _strip_trailing_punctuation(value: str) -> str:
stripped = value.rstrip(TRAILING_PUNCTUATION)
while stripped.endswith((")", "]", "}")):
closer = stripped[-1]
opener = {")": "(", "]": "[", "}": "{"}[closer]
if stripped.count(closer) > stripped.count(opener):
stripped = stripped[:-1]
continue
break
return stripped
def _remove_reference_tokens(message: str, refs: list[ContextReference]) -> str:
pieces: list[str] = []
cursor = 0
for ref in refs:
pieces.append(message[cursor:ref.start])
cursor = ref.end
pieces.append(message[cursor:])
text = "".join(pieces)
text = re.sub(r"\s{2,}", " ", text)
text = re.sub(r"\s+([,.;:!?])", r"\1", text)
return text.strip()
def _is_binary_file(path: Path) -> bool:
mime, _ = mimetypes.guess_type(path.name)
if mime and not mime.startswith("text/") and not any(
path.name.endswith(ext) for ext in (".py", ".md", ".txt", ".json", ".yaml", ".yml", ".toml", ".js", ".ts")
):
return True
chunk = path.read_bytes()[:4096]
return b"\x00" in chunk
def _build_folder_listing(path: Path, cwd: Path, limit: int = 200) -> str:
lines = [f"{path.relative_to(cwd)}/"]
entries = _iter_visible_entries(path, cwd, limit=limit)
for entry in entries:
rel = entry.relative_to(cwd)
indent = " " * max(len(rel.parts) - len(path.relative_to(cwd).parts) - 1, 0)
if entry.is_dir():
lines.append(f"{indent}- {entry.name}/")
else:
meta = _file_metadata(entry)
lines.append(f"{indent}- {entry.name} ({meta})")
if len(entries) >= limit:
lines.append("- ...")
return "\n".join(lines)
def _iter_visible_entries(path: Path, cwd: Path, limit: int) -> list[Path]:
rg_entries = _rg_files(path, cwd, limit=limit)
if rg_entries is not None:
output: list[Path] = []
seen_dirs: set[Path] = set()
for rel in rg_entries:
full = cwd / rel
for parent in full.parents:
if parent == cwd or parent in seen_dirs or path not in {parent, *parent.parents}:
continue
seen_dirs.add(parent)
output.append(parent)
output.append(full)
return sorted({p for p in output if p.exists()}, key=lambda p: (not p.is_dir(), str(p)))
output = []
for root, dirs, files in os.walk(path):
dirs[:] = sorted(d for d in dirs if not d.startswith(".") and d != "__pycache__")
files = sorted(f for f in files if not f.startswith("."))
root_path = Path(root)
for d in dirs:
output.append(root_path / d)
if len(output) >= limit:
return output
for f in files:
output.append(root_path / f)
if len(output) >= limit:
return output
return output
def _rg_files(path: Path, cwd: Path, limit: int) -> list[Path] | None:
try:
result = subprocess.run(
["rg", "--files", str(path.relative_to(cwd))],
cwd=cwd,
capture_output=True,
text=True,
timeout=10,
)
except FileNotFoundError:
return None
except subprocess.TimeoutExpired:
return None
if result.returncode != 0:
return None
files = [Path(line.strip()) for line in result.stdout.splitlines() if line.strip()]
return files[:limit]
def _file_metadata(path: Path) -> str:
if _is_binary_file(path):
return f"{path.stat().st_size} bytes"
try:
line_count = path.read_text(encoding="utf-8").count("\n") + 1
except Exception:
return f"{path.stat().st_size} bytes"
return f"{line_count} lines"
def _code_fence_language(path: Path) -> str:
mapping = {
".py": "python",
".js": "javascript",
".ts": "typescript",
".tsx": "tsx",
".jsx": "jsx",
".json": "json",
".md": "markdown",
".sh": "bash",
".yml": "yaml",
".yaml": "yaml",
".toml": "toml",
}
return mapping.get(path.suffix.lower(), "")

447
agent/copilot_acp_client.py Normal file
View File

@@ -0,0 +1,447 @@
"""OpenAI-compatible shim that forwards Hermes requests to `copilot --acp`.
This adapter lets Hermes treat the GitHub Copilot ACP server as a chat-style
backend. Each request starts a short-lived ACP session, sends the formatted
conversation as a single prompt, collects text chunks, and converts the result
back into the minimal shape Hermes expects from an OpenAI client.
"""
from __future__ import annotations
import json
import os
import queue
import shlex
import subprocess
import threading
import time
from collections import deque
from pathlib import Path
from types import SimpleNamespace
from typing import Any
ACP_MARKER_BASE_URL = "acp://copilot"
_DEFAULT_TIMEOUT_SECONDS = 900.0
def _resolve_command() -> str:
return (
os.getenv("HERMES_COPILOT_ACP_COMMAND", "").strip()
or os.getenv("COPILOT_CLI_PATH", "").strip()
or "copilot"
)
def _resolve_args() -> list[str]:
raw = os.getenv("HERMES_COPILOT_ACP_ARGS", "").strip()
if not raw:
return ["--acp", "--stdio"]
return shlex.split(raw)
def _jsonrpc_error(message_id: Any, code: int, message: str) -> dict[str, Any]:
return {
"jsonrpc": "2.0",
"id": message_id,
"error": {
"code": code,
"message": message,
},
}
def _format_messages_as_prompt(messages: list[dict[str, Any]], model: str | None = None) -> str:
sections: list[str] = [
"You are being used as the active ACP agent backend for Hermes.",
"Use your own ACP capabilities and respond directly in natural language.",
"Do not emit OpenAI tool-call JSON.",
]
if model:
sections.append(f"Hermes requested model hint: {model}")
transcript: list[str] = []
for message in messages:
if not isinstance(message, dict):
continue
role = str(message.get("role") or "unknown").strip().lower()
if role == "tool":
role = "tool"
elif role not in {"system", "user", "assistant"}:
role = "context"
content = message.get("content")
rendered = _render_message_content(content)
if not rendered:
continue
label = {
"system": "System",
"user": "User",
"assistant": "Assistant",
"tool": "Tool",
"context": "Context",
}.get(role, role.title())
transcript.append(f"{label}:\n{rendered}")
if transcript:
sections.append("Conversation transcript:\n\n" + "\n\n".join(transcript))
sections.append("Continue the conversation from the latest user request.")
return "\n\n".join(section.strip() for section in sections if section and section.strip())
def _render_message_content(content: Any) -> str:
if content is None:
return ""
if isinstance(content, str):
return content.strip()
if isinstance(content, dict):
if "text" in content:
return str(content.get("text") or "").strip()
if "content" in content and isinstance(content.get("content"), str):
return str(content.get("content") or "").strip()
return json.dumps(content, ensure_ascii=True)
if isinstance(content, list):
parts: list[str] = []
for item in content:
if isinstance(item, str):
parts.append(item)
elif isinstance(item, dict):
text = item.get("text")
if isinstance(text, str) and text.strip():
parts.append(text.strip())
return "\n".join(parts).strip()
return str(content).strip()
def _ensure_path_within_cwd(path_text: str, cwd: str) -> Path:
candidate = Path(path_text)
if not candidate.is_absolute():
raise PermissionError("ACP file-system paths must be absolute.")
resolved = candidate.resolve()
root = Path(cwd).resolve()
try:
resolved.relative_to(root)
except ValueError as exc:
raise PermissionError(f"Path '{resolved}' is outside the session cwd '{root}'.") from exc
return resolved
class _ACPChatCompletions:
def __init__(self, client: "CopilotACPClient"):
self._client = client
def create(self, **kwargs: Any) -> Any:
return self._client._create_chat_completion(**kwargs)
class _ACPChatNamespace:
def __init__(self, client: "CopilotACPClient"):
self.completions = _ACPChatCompletions(client)
class CopilotACPClient:
"""Minimal OpenAI-client-compatible facade for Copilot ACP."""
def __init__(
self,
*,
api_key: str | None = None,
base_url: str | None = None,
default_headers: dict[str, str] | None = None,
acp_command: str | None = None,
acp_args: list[str] | None = None,
acp_cwd: str | None = None,
command: str | None = None,
args: list[str] | None = None,
**_: Any,
):
self.api_key = api_key or "copilot-acp"
self.base_url = base_url or ACP_MARKER_BASE_URL
self._default_headers = dict(default_headers or {})
self._acp_command = acp_command or command or _resolve_command()
self._acp_args = list(acp_args or args or _resolve_args())
self._acp_cwd = str(Path(acp_cwd or os.getcwd()).resolve())
self.chat = _ACPChatNamespace(self)
self.is_closed = False
self._active_process: subprocess.Popen[str] | None = None
self._active_process_lock = threading.Lock()
def close(self) -> None:
proc: subprocess.Popen[str] | None
with self._active_process_lock:
proc = self._active_process
self._active_process = None
self.is_closed = True
if proc is None:
return
try:
proc.terminate()
proc.wait(timeout=2)
except Exception:
try:
proc.kill()
except Exception:
pass
def _create_chat_completion(
self,
*,
model: str | None = None,
messages: list[dict[str, Any]] | None = None,
timeout: float | None = None,
**_: Any,
) -> Any:
prompt_text = _format_messages_as_prompt(messages or [], model=model)
response_text, reasoning_text = self._run_prompt(
prompt_text,
timeout_seconds=float(timeout or _DEFAULT_TIMEOUT_SECONDS),
)
usage = SimpleNamespace(
prompt_tokens=0,
completion_tokens=0,
total_tokens=0,
prompt_tokens_details=SimpleNamespace(cached_tokens=0),
)
assistant_message = SimpleNamespace(
content=response_text,
tool_calls=[],
reasoning=reasoning_text or None,
reasoning_content=reasoning_text or None,
reasoning_details=None,
)
choice = SimpleNamespace(message=assistant_message, finish_reason="stop")
return SimpleNamespace(
choices=[choice],
usage=usage,
model=model or "copilot-acp",
)
def _run_prompt(self, prompt_text: str, *, timeout_seconds: float) -> tuple[str, str]:
try:
proc = subprocess.Popen(
[self._acp_command] + self._acp_args,
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
text=True,
bufsize=1,
cwd=self._acp_cwd,
)
except FileNotFoundError as exc:
raise RuntimeError(
f"Could not start Copilot ACP command '{self._acp_command}'. "
"Install GitHub Copilot CLI or set HERMES_COPILOT_ACP_COMMAND/COPILOT_CLI_PATH."
) from exc
if proc.stdin is None or proc.stdout is None:
proc.kill()
raise RuntimeError("Copilot ACP process did not expose stdin/stdout pipes.")
self.is_closed = False
with self._active_process_lock:
self._active_process = proc
inbox: queue.Queue[dict[str, Any]] = queue.Queue()
stderr_tail: deque[str] = deque(maxlen=40)
def _stdout_reader() -> None:
for line in proc.stdout:
try:
inbox.put(json.loads(line))
except Exception:
inbox.put({"raw": line.rstrip("\n")})
def _stderr_reader() -> None:
if proc.stderr is None:
return
for line in proc.stderr:
stderr_tail.append(line.rstrip("\n"))
out_thread = threading.Thread(target=_stdout_reader, daemon=True)
err_thread = threading.Thread(target=_stderr_reader, daemon=True)
out_thread.start()
err_thread.start()
next_id = 0
def _request(method: str, params: dict[str, Any], *, text_parts: list[str] | None = None, reasoning_parts: list[str] | None = None) -> Any:
nonlocal next_id
next_id += 1
request_id = next_id
payload = {
"jsonrpc": "2.0",
"id": request_id,
"method": method,
"params": params,
}
proc.stdin.write(json.dumps(payload) + "\n")
proc.stdin.flush()
deadline = time.time() + timeout_seconds
while time.time() < deadline:
if proc.poll() is not None:
break
try:
msg = inbox.get(timeout=0.1)
except queue.Empty:
continue
if self._handle_server_message(
msg,
process=proc,
cwd=self._acp_cwd,
text_parts=text_parts,
reasoning_parts=reasoning_parts,
):
continue
if msg.get("id") != request_id:
continue
if "error" in msg:
err = msg.get("error") or {}
raise RuntimeError(
f"Copilot ACP {method} failed: {err.get('message') or err}"
)
return msg.get("result")
stderr_text = "\n".join(stderr_tail).strip()
if proc.poll() is not None and stderr_text:
raise RuntimeError(f"Copilot ACP process exited early: {stderr_text}")
raise TimeoutError(f"Timed out waiting for Copilot ACP response to {method}.")
try:
_request(
"initialize",
{
"protocolVersion": 1,
"clientCapabilities": {
"fs": {
"readTextFile": True,
"writeTextFile": True,
}
},
"clientInfo": {
"name": "hermes-agent",
"title": "Hermes Agent",
"version": "0.0.0",
},
},
)
session = _request(
"session/new",
{
"cwd": self._acp_cwd,
"mcpServers": [],
},
) or {}
session_id = str(session.get("sessionId") or "").strip()
if not session_id:
raise RuntimeError("Copilot ACP did not return a sessionId.")
text_parts: list[str] = []
reasoning_parts: list[str] = []
_request(
"session/prompt",
{
"sessionId": session_id,
"prompt": [
{
"type": "text",
"text": prompt_text,
}
],
},
text_parts=text_parts,
reasoning_parts=reasoning_parts,
)
return "".join(text_parts), "".join(reasoning_parts)
finally:
self.close()
def _handle_server_message(
self,
msg: dict[str, Any],
*,
process: subprocess.Popen[str],
cwd: str,
text_parts: list[str] | None,
reasoning_parts: list[str] | None,
) -> bool:
method = msg.get("method")
if not isinstance(method, str):
return False
if method == "session/update":
params = msg.get("params") or {}
update = params.get("update") or {}
kind = str(update.get("sessionUpdate") or "").strip()
content = update.get("content") or {}
chunk_text = ""
if isinstance(content, dict):
chunk_text = str(content.get("text") or "")
if kind == "agent_message_chunk" and chunk_text and text_parts is not None:
text_parts.append(chunk_text)
elif kind == "agent_thought_chunk" and chunk_text and reasoning_parts is not None:
reasoning_parts.append(chunk_text)
return True
if process.stdin is None:
return True
message_id = msg.get("id")
params = msg.get("params") or {}
if method == "session/request_permission":
response = {
"jsonrpc": "2.0",
"id": message_id,
"result": {
"outcome": {
"outcome": "allow_once",
}
},
}
elif method == "fs/read_text_file":
try:
path = _ensure_path_within_cwd(str(params.get("path") or ""), cwd)
content = path.read_text() if path.exists() else ""
line = params.get("line")
limit = params.get("limit")
if isinstance(line, int) and line > 1:
lines = content.splitlines(keepends=True)
start = line - 1
end = start + limit if isinstance(limit, int) and limit > 0 else None
content = "".join(lines[start:end])
response = {
"jsonrpc": "2.0",
"id": message_id,
"result": {
"content": content,
},
}
except Exception as exc:
response = _jsonrpc_error(message_id, -32602, str(exc))
elif method == "fs/write_text_file":
try:
path = _ensure_path_within_cwd(str(params.get("path") or ""), cwd)
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(str(params.get("content") or ""))
response = {
"jsonrpc": "2.0",
"id": message_id,
"result": None,
}
except Exception as exc:
response = _jsonrpc_error(message_id, -32602, str(exc))
else:
response = _jsonrpc_error(
message_id,
-32601,
f"ACP client method '{method}' is not supported by Hermes yet.",
)
process.stdin.write(json.dumps(response) + "\n")
process.stdin.flush()
return True

1103
agent/credential_pool.py Normal file

File diff suppressed because it is too large Load Diff

View File

@@ -10,6 +10,9 @@ import os
import sys
import threading
import time
from dataclasses import dataclass, field
from difflib import unified_diff
from pathlib import Path
# ANSI escape codes for coloring tool failure indicators
_RED = "\033[31m"
@@ -17,6 +20,39 @@ _RESET = "\033[0m"
logger = logging.getLogger(__name__)
_ANSI_RESET = "\033[0m"
_ANSI_DIM = "\033[38;2;150;150;150m"
_ANSI_FILE = "\033[38;2;180;160;255m"
_ANSI_HUNK = "\033[38;2;120;120;140m"
_ANSI_MINUS = "\033[38;2;255;255;255;48;2;120;20;20m"
_ANSI_PLUS = "\033[38;2;255;255;255;48;2;20;90;20m"
_MAX_INLINE_DIFF_FILES = 6
_MAX_INLINE_DIFF_LINES = 80
@dataclass
class LocalEditSnapshot:
"""Pre-tool filesystem snapshot used to render diffs locally after writes."""
paths: list[Path] = field(default_factory=list)
before: dict[str, str | None] = field(default_factory=dict)
# =========================================================================
# Configurable tool preview length (0 = no limit)
# Set once at startup by CLI or gateway from display.tool_preview_length config.
# =========================================================================
_tool_preview_max_len: int = 0 # 0 = unlimited
def set_tool_preview_max_len(n: int) -> None:
"""Set the global max length for tool call previews. 0 = no limit."""
global _tool_preview_max_len
_tool_preview_max_len = max(int(n), 0) if n else 0
def get_tool_preview_max_len() -> int:
"""Return the configured max preview length (0 = unlimited)."""
return _tool_preview_max_len
# =========================================================================
# Skin-aware helpers (lazy import to avoid circular deps)
@@ -59,6 +95,32 @@ def get_skin_tool_prefix() -> str:
return ""
def get_tool_emoji(tool_name: str, default: str = "") -> str:
"""Get the display emoji for a tool.
Resolution order:
1. Active skin's ``tool_emojis`` overrides (if a skin is loaded)
2. Tool registry's per-tool ``emoji`` field
3. *default* fallback
"""
# 1. Skin override
skin = _get_skin()
if skin and skin.tool_emojis:
override = skin.tool_emojis.get(tool_name)
if override:
return override
# 2. Registry default
try:
from tools.registry import registry
emoji = registry.get_emoji(tool_name, default="")
if emoji:
return emoji
except Exception:
pass
# 3. Hardcoded fallback
return default
# =========================================================================
# Tool preview (one-line summary of a tool call's primary argument)
# =========================================================================
@@ -68,8 +130,14 @@ def _oneline(text: str) -> str:
return " ".join(text.split())
def build_tool_preview(tool_name: str, args: dict, max_len: int = 40) -> str | None:
"""Build a short preview of a tool call's primary argument for display."""
def build_tool_preview(tool_name: str, args: dict, max_len: int | None = None) -> str | None:
"""Build a short preview of a tool call's primary argument for display.
*max_len* controls truncation. ``None`` (default) defers to the global
``_tool_preview_max_len`` set via config; ``0`` means unlimited.
"""
if max_len is None:
max_len = _tool_preview_max_len
if not args:
return None
primary_args = {
@@ -164,11 +232,305 @@ def build_tool_preview(tool_name: str, args: dict, max_len: int = 40) -> str | N
preview = _oneline(str(value))
if not preview:
return None
if len(preview) > max_len:
if max_len > 0 and len(preview) > max_len:
preview = preview[:max_len - 3] + "..."
return preview
# =========================================================================
# Inline diff previews for write actions
# =========================================================================
def _resolved_path(path: str) -> Path:
"""Resolve a possibly-relative filesystem path against the current cwd."""
candidate = Path(os.path.expanduser(path))
if candidate.is_absolute():
return candidate
return Path.cwd() / candidate
def _snapshot_text(path: Path) -> str | None:
"""Return UTF-8 file content, or None for missing/unreadable files."""
try:
return path.read_text(encoding="utf-8")
except (FileNotFoundError, IsADirectoryError, UnicodeDecodeError, OSError):
return None
def _display_diff_path(path: Path) -> str:
"""Prefer cwd-relative paths in diffs when available."""
try:
return str(path.resolve().relative_to(Path.cwd().resolve()))
except Exception:
return str(path)
def _resolve_skill_manage_paths(args: dict) -> list[Path]:
"""Resolve skill_manage write targets to filesystem paths."""
action = args.get("action")
name = args.get("name")
if not action or not name:
return []
from tools.skill_manager_tool import _find_skill, _resolve_skill_dir
if action == "create":
skill_dir = _resolve_skill_dir(name, args.get("category"))
return [skill_dir / "SKILL.md"]
existing = _find_skill(name)
if not existing:
return []
skill_dir = Path(existing["path"])
if action in {"edit", "patch"}:
file_path = args.get("file_path")
return [skill_dir / file_path] if file_path else [skill_dir / "SKILL.md"]
if action in {"write_file", "remove_file"}:
file_path = args.get("file_path")
return [skill_dir / file_path] if file_path else []
if action == "delete":
files = [path for path in sorted(skill_dir.rglob("*")) if path.is_file()]
return files
return []
def _resolve_local_edit_paths(tool_name: str, function_args: dict | None) -> list[Path]:
"""Resolve local filesystem targets for write-capable tools."""
if not isinstance(function_args, dict):
return []
if tool_name == "write_file":
path = function_args.get("path")
return [_resolved_path(path)] if path else []
if tool_name == "patch":
path = function_args.get("path")
return [_resolved_path(path)] if path else []
if tool_name == "skill_manage":
return _resolve_skill_manage_paths(function_args)
return []
def capture_local_edit_snapshot(tool_name: str, function_args: dict | None) -> LocalEditSnapshot | None:
"""Capture before-state for local write previews."""
paths = _resolve_local_edit_paths(tool_name, function_args)
if not paths:
return None
snapshot = LocalEditSnapshot(paths=paths)
for path in paths:
snapshot.before[str(path)] = _snapshot_text(path)
return snapshot
def _result_succeeded(result: str | None) -> bool:
"""Conservatively detect whether a tool result represents success."""
if not result:
return False
try:
data = json.loads(result)
except (json.JSONDecodeError, TypeError):
return False
if not isinstance(data, dict):
return False
if data.get("error"):
return False
if "success" in data:
return bool(data.get("success"))
return True
def _diff_from_snapshot(snapshot: LocalEditSnapshot | None) -> str | None:
"""Generate unified diff text from a stored before-state and current files."""
if not snapshot:
return None
chunks: list[str] = []
for path in snapshot.paths:
before = snapshot.before.get(str(path))
after = _snapshot_text(path)
if before == after:
continue
display_path = _display_diff_path(path)
diff = "".join(
unified_diff(
[] if before is None else before.splitlines(keepends=True),
[] if after is None else after.splitlines(keepends=True),
fromfile=f"a/{display_path}",
tofile=f"b/{display_path}",
)
)
if diff:
chunks.append(diff)
if not chunks:
return None
return "".join(chunk if chunk.endswith("\n") else chunk + "\n" for chunk in chunks)
def extract_edit_diff(
tool_name: str,
result: str | None,
*,
function_args: dict | None = None,
snapshot: LocalEditSnapshot | None = None,
) -> str | None:
"""Extract a unified diff from a file-edit tool result."""
if tool_name == "patch" and result:
try:
data = json.loads(result)
except (json.JSONDecodeError, TypeError):
data = None
if isinstance(data, dict):
diff = data.get("diff")
if isinstance(diff, str) and diff.strip():
return diff
if tool_name not in {"write_file", "patch", "skill_manage"}:
return None
if not _result_succeeded(result):
return None
return _diff_from_snapshot(snapshot)
def _emit_inline_diff(diff_text: str, print_fn) -> bool:
"""Emit rendered diff text through the CLI's prompt_toolkit-safe printer."""
if print_fn is None or not diff_text:
return False
try:
print_fn(" ┊ review diff")
for line in diff_text.rstrip("\n").splitlines():
print_fn(line)
return True
except Exception:
return False
def _render_inline_unified_diff(diff: str) -> list[str]:
"""Render unified diff lines in Hermes' inline transcript style."""
rendered: list[str] = []
from_file = None
to_file = None
for raw_line in diff.splitlines():
if raw_line.startswith("--- "):
from_file = raw_line[4:].strip()
continue
if raw_line.startswith("+++ "):
to_file = raw_line[4:].strip()
if from_file or to_file:
rendered.append(f"{_ANSI_FILE}{from_file or 'a/?'}{to_file or 'b/?'}{_ANSI_RESET}")
continue
if raw_line.startswith("@@"):
rendered.append(f"{_ANSI_HUNK}{raw_line}{_ANSI_RESET}")
continue
if raw_line.startswith("-"):
rendered.append(f"{_ANSI_MINUS}{raw_line}{_ANSI_RESET}")
continue
if raw_line.startswith("+"):
rendered.append(f"{_ANSI_PLUS}{raw_line}{_ANSI_RESET}")
continue
if raw_line.startswith(" "):
rendered.append(f"{_ANSI_DIM}{raw_line}{_ANSI_RESET}")
continue
if raw_line:
rendered.append(raw_line)
return rendered
def _split_unified_diff_sections(diff: str) -> list[str]:
"""Split a unified diff into per-file sections."""
sections: list[list[str]] = []
current: list[str] = []
for line in diff.splitlines():
if line.startswith("--- ") and current:
sections.append(current)
current = [line]
continue
current.append(line)
if current:
sections.append(current)
return ["\n".join(section) for section in sections if section]
def _summarize_rendered_diff_sections(
diff: str,
*,
max_files: int = _MAX_INLINE_DIFF_FILES,
max_lines: int = _MAX_INLINE_DIFF_LINES,
) -> list[str]:
"""Render diff sections while capping file count and total line count."""
sections = _split_unified_diff_sections(diff)
rendered: list[str] = []
omitted_files = 0
omitted_lines = 0
for idx, section in enumerate(sections):
if idx >= max_files:
omitted_files += 1
omitted_lines += len(_render_inline_unified_diff(section))
continue
section_lines = _render_inline_unified_diff(section)
remaining_budget = max_lines - len(rendered)
if remaining_budget <= 0:
omitted_lines += len(section_lines)
omitted_files += 1
continue
if len(section_lines) <= remaining_budget:
rendered.extend(section_lines)
continue
rendered.extend(section_lines[:remaining_budget])
omitted_lines += len(section_lines) - remaining_budget
omitted_files += 1 + max(0, len(sections) - idx - 1)
for leftover in sections[idx + 1:]:
omitted_lines += len(_render_inline_unified_diff(leftover))
break
if omitted_files or omitted_lines:
summary = f"… omitted {omitted_lines} diff line(s)"
if omitted_files:
summary += f" across {omitted_files} additional file(s)/section(s)"
rendered.append(f"{_ANSI_HUNK}{summary}{_ANSI_RESET}")
return rendered
def render_edit_diff_with_delta(
tool_name: str,
result: str | None,
*,
function_args: dict | None = None,
snapshot: LocalEditSnapshot | None = None,
print_fn=None,
) -> bool:
"""Render an edit diff inline without taking over the terminal UI."""
diff = extract_edit_diff(
tool_name,
result,
function_args=function_args,
snapshot=snapshot,
)
if not diff:
return False
try:
rendered_lines = _summarize_rendered_diff_sections(diff)
except Exception as exc:
logger.debug("Could not render inline diff: %s", exc)
return False
return _emit_inline_diff("\n".join(rendered_lines), print_fn)
# =========================================================================
# KawaiiSpinner
# =========================================================================
@@ -205,7 +567,7 @@ class KawaiiSpinner:
"analyzing", "computing", "synthesizing", "formulating", "brainstorming",
]
def __init__(self, message: str = "", spinner_type: str = 'dots'):
def __init__(self, message: str = "", spinner_type: str = 'dots', print_fn=None):
self.message = message
self.spinner_frames = self.SPINNERS.get(spinner_type, self.SPINNERS['dots'])
self.running = False
@@ -213,13 +575,26 @@ class KawaiiSpinner:
self.frame_idx = 0
self.start_time = None
self.last_line_len = 0
self._last_flush_time = 0.0 # Rate-limit flushes for patch_stdout compat
# Optional callable to route all output through (e.g. a no-op for silent
# background agents). When set, bypasses self._out entirely so that
# agents with _print_fn overridden remain fully silent.
self._print_fn = print_fn
# Capture stdout NOW, before any redirect_stdout(devnull) from
# child agents can replace sys.stdout with a black hole.
self._out = sys.stdout
def _write(self, text: str, end: str = '\n', flush: bool = False):
"""Write to the stdout captured at spinner creation time."""
"""Write to the stdout captured at spinner creation time.
If a print_fn was supplied at construction, all output is routed through
it instead — allowing callers to silence the spinner with a no-op lambda.
"""
if self._print_fn is not None:
try:
self._print_fn(text)
except Exception:
pass
return
try:
self._out.write(text + end)
if flush:
@@ -227,7 +602,50 @@ class KawaiiSpinner:
except (ValueError, OSError):
pass
@property
def _is_tty(self) -> bool:
"""Check if output is a real terminal, safe against closed streams."""
try:
return hasattr(self._out, 'isatty') and self._out.isatty()
except (ValueError, OSError):
return False
def _is_patch_stdout_proxy(self) -> bool:
"""Return True when stdout is prompt_toolkit's StdoutProxy.
patch_stdout wraps sys.stdout in a StdoutProxy that queues writes and
injects newlines around each flush(). The \\r overwrite never lands on
the correct line — each spinner frame ends up on its own line.
The CLI already drives a TUI widget (_spinner_text) for spinner display,
so KawaiiSpinner's \\r-based animation is redundant under StdoutProxy.
"""
try:
from prompt_toolkit.patch_stdout import StdoutProxy
return isinstance(self._out, StdoutProxy)
except ImportError:
return False
def _animate(self):
# When stdout is not a real terminal (e.g. Docker, systemd, pipe),
# skip the animation entirely — it creates massive log bloat.
# Just log the start once and let stop() log the completion.
if not self._is_tty:
self._write(f" [tool] {self.message}", flush=True)
while self.running:
time.sleep(0.5)
return
# When running inside prompt_toolkit's patch_stdout context the CLI
# renders spinner state via a dedicated TUI widget (_spinner_text).
# Driving a \r-based animation here too causes visual overdraw: the
# StdoutProxy injects newlines around each flush, so every frame lands
# on a new line and overwrites the status bar.
if self._is_patch_stdout_proxy():
while self.running:
time.sleep(0.1)
return
# Cache skin wings at start (avoid per-frame imports)
skin = _get_skin()
wings = skin.get_spinner_wings() if skin else []
@@ -244,18 +662,7 @@ class KawaiiSpinner:
else:
line = f" {frame} {self.message} ({elapsed:.1f}s)"
pad = max(self.last_line_len - len(line), 0)
# Rate-limit flush() calls to avoid spinner spam under
# prompt_toolkit's patch_stdout. Each flush() pushes a queue
# item that may trigger a separate run_in_terminal() call; if
# items are processed one-at-a-time the \r overwrite is lost
# and every frame appears on its own line. By flushing at
# most every 0.4s we guarantee multiple \r-frames are batched
# into a single write, so the terminal collapses them correctly.
now = time.time()
should_flush = (now - self._last_flush_time) >= 0.4
self._write(f"\r{line}{' ' * pad}", end='', flush=should_flush)
if should_flush:
self._last_flush_time = now
self._write(f"\r{line}{' ' * pad}", end='', flush=True)
self.last_line_len = len(line)
self.frame_idx += 1
time.sleep(0.12)
@@ -293,12 +700,19 @@ class KawaiiSpinner:
self.running = False
if self.thread:
self.thread.join(timeout=0.5)
# Clear the spinner line with spaces instead of \033[K to avoid
# garbled escape codes when prompt_toolkit's patch_stdout is active.
blanks = ' ' * max(self.last_line_len + 5, 40)
self._write(f"\r{blanks}\r", end='', flush=True)
is_tty = self._is_tty
if is_tty:
# Clear the spinner line with spaces instead of \033[K to avoid
# garbled escape codes when prompt_toolkit's patch_stdout is active.
blanks = ' ' * max(self.last_line_len + 5, 40)
self._write(f"\r{blanks}\r", end='', flush=True)
if final_message:
self._write(f" {final_message}", flush=True)
elapsed = f" ({time.time() - self.start_time:.1f}s)" if self.start_time else ""
if is_tty:
self._write(f" {final_message}", flush=True)
else:
self._write(f" [done] {final_message}{elapsed}", flush=True)
def __enter__(self):
self.start()
@@ -406,10 +820,14 @@ def get_cute_tool_message(
def _trunc(s, n=40):
s = str(s)
if _tool_preview_max_len == 0:
return s # no limit
return (s[:n-3] + "...") if len(s) > n else s
def _path(p, n=35):
p = str(p)
if _tool_preview_max_len == 0:
return p # no limit
return ("..." + p[-(n-3):]) if len(p) > n else p
def _wrap(line: str) -> str:
@@ -586,3 +1004,81 @@ def write_tty(text: str) -> None:
except OSError:
sys.stdout.write(text)
sys.stdout.flush()
# =========================================================================
# Context pressure display (CLI user-facing warnings)
# =========================================================================
# ANSI color codes for context pressure tiers
_CYAN = "\033[36m"
_YELLOW = "\033[33m"
_BOLD = "\033[1m"
_DIM_ANSI = "\033[2m"
# Bar characters
_BAR_FILLED = ""
_BAR_EMPTY = ""
_BAR_WIDTH = 20
def format_context_pressure(
compaction_progress: float,
threshold_tokens: int,
threshold_percent: float,
compression_enabled: bool = True,
) -> str:
"""Build a formatted context pressure line for CLI display.
The bar and percentage show progress toward the compaction threshold,
NOT the raw context window. 100% = compaction fires.
Args:
compaction_progress: How close to compaction (0.01.0, 1.0 = fires).
threshold_tokens: Compaction threshold in tokens.
threshold_percent: Compaction threshold as a fraction of context window.
compression_enabled: Whether auto-compression is active.
"""
pct_int = min(int(compaction_progress * 100), 100)
filled = min(int(compaction_progress * _BAR_WIDTH), _BAR_WIDTH)
bar = _BAR_FILLED * filled + _BAR_EMPTY * (_BAR_WIDTH - filled)
threshold_k = f"{threshold_tokens // 1000}k" if threshold_tokens >= 1000 else str(threshold_tokens)
threshold_pct_int = int(threshold_percent * 100)
color = f"{_BOLD}{_YELLOW}"
icon = ""
if compression_enabled:
hint = "compaction approaching"
else:
hint = "no auto-compaction"
return (
f" {color}{icon} context {bar} {pct_int}% to compaction{_ANSI_RESET}"
f" {_DIM_ANSI}{threshold_k} threshold ({threshold_pct_int}%) · {hint}{_ANSI_RESET}"
)
def format_context_pressure_gateway(
compaction_progress: float,
threshold_percent: float,
compression_enabled: bool = True,
) -> str:
"""Build a plain-text context pressure notification for messaging platforms.
No ANSI — just Unicode and plain text suitable for Telegram/Discord/etc.
The percentage shows progress toward the compaction threshold.
"""
pct_int = min(int(compaction_progress * 100), 100)
filled = min(int(compaction_progress * _BAR_WIDTH), _BAR_WIDTH)
bar = _BAR_FILLED * filled + _BAR_EMPTY * (_BAR_WIDTH - filled)
threshold_pct_int = int(threshold_percent * 100)
icon = "⚠️"
if compression_enabled:
hint = f"Context compaction approaching (threshold: {threshold_pct_int}% of window)."
else:
hint = "Auto-compaction is disabled — context may be truncated."
return f"{icon} Context: {bar} {pct_int}% to compaction\n{hint}"

View File

@@ -20,65 +20,23 @@ import json
import time
from collections import Counter, defaultdict
from datetime import datetime
from typing import Any, Dict, List, Optional
from typing import Any, Dict, List
# =========================================================================
# Model pricing (USD per million tokens) — approximate as of early 2026
# =========================================================================
MODEL_PRICING = {
# OpenAI
"gpt-4o": {"input": 2.50, "output": 10.00},
"gpt-4o-mini": {"input": 0.15, "output": 0.60},
"gpt-4.1": {"input": 2.00, "output": 8.00},
"gpt-4.1-mini": {"input": 0.40, "output": 1.60},
"gpt-4.1-nano": {"input": 0.10, "output": 0.40},
"gpt-4.5-preview": {"input": 75.00, "output": 150.00},
"gpt-5": {"input": 10.00, "output": 30.00},
"gpt-5.4": {"input": 10.00, "output": 30.00},
"o3": {"input": 10.00, "output": 40.00},
"o3-mini": {"input": 1.10, "output": 4.40},
"o4-mini": {"input": 1.10, "output": 4.40},
# Anthropic
"claude-opus-4-20250514": {"input": 15.00, "output": 75.00},
"claude-sonnet-4-20250514": {"input": 3.00, "output": 15.00},
"claude-3-5-sonnet-20241022": {"input": 3.00, "output": 15.00},
"claude-3-5-haiku-20241022": {"input": 0.80, "output": 4.00},
"claude-3-opus-20240229": {"input": 15.00, "output": 75.00},
"claude-3-haiku-20240307": {"input": 0.25, "output": 1.25},
# DeepSeek
"deepseek-chat": {"input": 0.14, "output": 0.28},
"deepseek-reasoner": {"input": 0.55, "output": 2.19},
# Google
"gemini-2.5-pro": {"input": 1.25, "output": 10.00},
"gemini-2.5-flash": {"input": 0.15, "output": 0.60},
"gemini-2.0-flash": {"input": 0.10, "output": 0.40},
# Meta (via providers)
"llama-4-maverick": {"input": 0.50, "output": 0.70},
"llama-4-scout": {"input": 0.20, "output": 0.30},
# Z.AI / GLM (direct provider — pricing not published externally, treat as local)
"glm-5": {"input": 0.0, "output": 0.0},
"glm-4.7": {"input": 0.0, "output": 0.0},
"glm-4.5": {"input": 0.0, "output": 0.0},
"glm-4.5-flash": {"input": 0.0, "output": 0.0},
# Kimi / Moonshot (direct provider — pricing not published externally, treat as local)
"kimi-k2.5": {"input": 0.0, "output": 0.0},
"kimi-k2-thinking": {"input": 0.0, "output": 0.0},
"kimi-k2-turbo-preview": {"input": 0.0, "output": 0.0},
"kimi-k2-0905-preview": {"input": 0.0, "output": 0.0},
# MiniMax (direct provider — pricing not published externally, treat as local)
"MiniMax-M2.5": {"input": 0.0, "output": 0.0},
"MiniMax-M2.5-highspeed": {"input": 0.0, "output": 0.0},
"MiniMax-M2.1": {"input": 0.0, "output": 0.0},
}
from agent.usage_pricing import (
CanonicalUsage,
DEFAULT_PRICING,
estimate_usage_cost,
format_duration_compact,
get_pricing,
has_known_pricing,
)
# Fallback: unknown/custom models get zero cost (we can't assume pricing
# for self-hosted models, custom OAI endpoints, local inference, etc.)
_DEFAULT_PRICING = {"input": 0.0, "output": 0.0}
_DEFAULT_PRICING = DEFAULT_PRICING
def _has_known_pricing(model_name: str) -> bool:
def _has_known_pricing(model_name: str, provider: str = None, base_url: str = None) -> bool:
"""Check if a model has known pricing (vs unknown/custom endpoint)."""
return _get_pricing(model_name) is not _DEFAULT_PRICING
return has_known_pricing(model_name, provider=provider, base_url=base_url)
def _get_pricing(model_name: str) -> Dict[str, float]:
@@ -87,67 +45,51 @@ def _get_pricing(model_name: str) -> Dict[str, float]:
Returns _DEFAULT_PRICING (zero cost) for unknown/custom models —
we can't assume costs for self-hosted endpoints, local inference, etc.
"""
if not model_name:
return _DEFAULT_PRICING
# Strip provider prefix (e.g., "anthropic/claude-..." -> "claude-...")
bare = model_name.split("/")[-1].lower()
# Exact match first
if bare in MODEL_PRICING:
return MODEL_PRICING[bare]
# Fuzzy prefix match — prefer the LONGEST matching key to avoid
# e.g. "gpt-4o" matching before "gpt-4o-mini" for "gpt-4o-mini-2024-07-18"
best_match = None
best_len = 0
for key, price in MODEL_PRICING.items():
if bare.startswith(key) and len(key) > best_len:
best_match = price
best_len = len(key)
if best_match:
return best_match
# Keyword heuristics (checked in most-specific-first order)
if "opus" in bare:
return {"input": 15.00, "output": 75.00}
if "sonnet" in bare:
return {"input": 3.00, "output": 15.00}
if "haiku" in bare:
return {"input": 0.80, "output": 4.00}
if "gpt-4o-mini" in bare:
return {"input": 0.15, "output": 0.60}
if "gpt-4o" in bare:
return {"input": 2.50, "output": 10.00}
if "gpt-5" in bare:
return {"input": 10.00, "output": 30.00}
if "deepseek" in bare:
return {"input": 0.14, "output": 0.28}
if "gemini" in bare:
return {"input": 0.15, "output": 0.60}
return _DEFAULT_PRICING
return get_pricing(model_name)
def _estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
"""Estimate the USD cost for a given model and token counts."""
pricing = _get_pricing(model)
return (input_tokens * pricing["input"] + output_tokens * pricing["output"]) / 1_000_000
def _estimate_cost(
session_or_model: Dict[str, Any] | str,
input_tokens: int = 0,
output_tokens: int = 0,
*,
cache_read_tokens: int = 0,
cache_write_tokens: int = 0,
provider: str = None,
base_url: str = None,
) -> tuple[float, str]:
"""Estimate the USD cost for a session row or a model/token tuple."""
if isinstance(session_or_model, dict):
session = session_or_model
model = session.get("model") or ""
usage = CanonicalUsage(
input_tokens=session.get("input_tokens") or 0,
output_tokens=session.get("output_tokens") or 0,
cache_read_tokens=session.get("cache_read_tokens") or 0,
cache_write_tokens=session.get("cache_write_tokens") or 0,
)
provider = session.get("billing_provider")
base_url = session.get("billing_base_url")
else:
model = session_or_model or ""
usage = CanonicalUsage(
input_tokens=input_tokens,
output_tokens=output_tokens,
cache_read_tokens=cache_read_tokens,
cache_write_tokens=cache_write_tokens,
)
result = estimate_usage_cost(
model,
usage,
provider=provider,
base_url=base_url,
)
return float(result.amount_usd or 0.0), result.status
def _format_duration(seconds: float) -> str:
"""Format seconds into a human-readable duration string."""
if seconds < 60:
return f"{seconds:.0f}s"
minutes = seconds / 60
if minutes < 60:
return f"{minutes:.0f}m"
hours = minutes / 60
if hours < 24:
remaining_min = int(minutes % 60)
return f"{int(hours)}h {remaining_min}m" if remaining_min else f"{int(hours)}h"
days = hours / 24
return f"{days:.1f}d"
return format_duration_compact(seconds)
def _bar_chart(values: List[int], max_width: int = 20) -> List[str]:
@@ -234,24 +176,30 @@ class InsightsEngine:
# Columns we actually need (skip system_prompt, model_config blobs)
_SESSION_COLS = ("id, source, model, started_at, ended_at, "
"message_count, tool_call_count, input_tokens, output_tokens")
"message_count, tool_call_count, input_tokens, output_tokens, "
"cache_read_tokens, cache_write_tokens, billing_provider, "
"billing_base_url, billing_mode, estimated_cost_usd, "
"actual_cost_usd, cost_status, cost_source")
# Pre-computed query strings — f-string evaluated once at class definition,
# not at runtime, so no user-controlled value can alter the query structure.
_GET_SESSIONS_WITH_SOURCE = (
f"SELECT {_SESSION_COLS} FROM sessions"
" WHERE started_at >= ? AND source = ?"
" ORDER BY started_at DESC"
)
_GET_SESSIONS_ALL = (
f"SELECT {_SESSION_COLS} FROM sessions"
" WHERE started_at >= ?"
" ORDER BY started_at DESC"
)
def _get_sessions(self, cutoff: float, source: str = None) -> List[Dict]:
"""Fetch sessions within the time window."""
if source:
cursor = self._conn.execute(
f"""SELECT {self._SESSION_COLS} FROM sessions
WHERE started_at >= ? AND source = ?
ORDER BY started_at DESC""",
(cutoff, source),
)
cursor = self._conn.execute(self._GET_SESSIONS_WITH_SOURCE, (cutoff, source))
else:
cursor = self._conn.execute(
f"""SELECT {self._SESSION_COLS} FROM sessions
WHERE started_at >= ?
ORDER BY started_at DESC""",
(cutoff,),
)
cursor = self._conn.execute(self._GET_SESSIONS_ALL, (cutoff,))
return [dict(row) for row in cursor.fetchall()]
def _get_tool_usage(self, cutoff: float, source: str = None) -> List[Dict]:
@@ -386,21 +334,30 @@ class InsightsEngine:
"""Compute high-level overview statistics."""
total_input = sum(s.get("input_tokens") or 0 for s in sessions)
total_output = sum(s.get("output_tokens") or 0 for s in sessions)
total_tokens = total_input + total_output
total_cache_read = sum(s.get("cache_read_tokens") or 0 for s in sessions)
total_cache_write = sum(s.get("cache_write_tokens") or 0 for s in sessions)
total_tokens = total_input + total_output + total_cache_read + total_cache_write
total_tool_calls = sum(s.get("tool_call_count") or 0 for s in sessions)
total_messages = sum(s.get("message_count") or 0 for s in sessions)
# Cost estimation (weighted by model)
total_cost = 0.0
actual_cost = 0.0
models_with_pricing = set()
models_without_pricing = set()
unknown_cost_sessions = 0
included_cost_sessions = 0
for s in sessions:
model = s.get("model") or ""
inp = s.get("input_tokens") or 0
out = s.get("output_tokens") or 0
total_cost += _estimate_cost(model, inp, out)
estimated, status = _estimate_cost(s)
total_cost += estimated
actual_cost += s.get("actual_cost_usd") or 0.0
display = model.split("/")[-1] if "/" in model else (model or "unknown")
if _has_known_pricing(model):
if status == "included":
included_cost_sessions += 1
elif status == "unknown":
unknown_cost_sessions += 1
if _has_known_pricing(model, s.get("billing_provider"), s.get("billing_base_url")):
models_with_pricing.add(display)
else:
models_without_pricing.add(display)
@@ -427,8 +384,11 @@ class InsightsEngine:
"total_tool_calls": total_tool_calls,
"total_input_tokens": total_input,
"total_output_tokens": total_output,
"total_cache_read_tokens": total_cache_read,
"total_cache_write_tokens": total_cache_write,
"total_tokens": total_tokens,
"estimated_cost": total_cost,
"actual_cost": actual_cost,
"total_hours": total_hours,
"avg_session_duration": avg_duration,
"avg_messages_per_session": total_messages / len(sessions) if sessions else 0,
@@ -440,12 +400,15 @@ class InsightsEngine:
"date_range_end": date_range_end,
"models_with_pricing": sorted(models_with_pricing),
"models_without_pricing": sorted(models_without_pricing),
"unknown_cost_sessions": unknown_cost_sessions,
"included_cost_sessions": included_cost_sessions,
}
def _compute_model_breakdown(self, sessions: List[Dict]) -> List[Dict]:
"""Break down usage by model."""
model_data = defaultdict(lambda: {
"sessions": 0, "input_tokens": 0, "output_tokens": 0,
"cache_read_tokens": 0, "cache_write_tokens": 0,
"total_tokens": 0, "tool_calls": 0, "cost": 0.0,
})
@@ -457,12 +420,18 @@ class InsightsEngine:
d["sessions"] += 1
inp = s.get("input_tokens") or 0
out = s.get("output_tokens") or 0
cache_read = s.get("cache_read_tokens") or 0
cache_write = s.get("cache_write_tokens") or 0
d["input_tokens"] += inp
d["output_tokens"] += out
d["total_tokens"] += inp + out
d["cache_read_tokens"] += cache_read
d["cache_write_tokens"] += cache_write
d["total_tokens"] += inp + out + cache_read + cache_write
d["tool_calls"] += s.get("tool_call_count") or 0
d["cost"] += _estimate_cost(model, inp, out)
d["has_pricing"] = _has_known_pricing(model)
estimate, status = _estimate_cost(s)
d["cost"] += estimate
d["has_pricing"] = _has_known_pricing(model, s.get("billing_provider"), s.get("billing_base_url"))
d["cost_status"] = status
result = [
{"model": model, **data}
@@ -476,7 +445,8 @@ class InsightsEngine:
"""Break down usage by platform/source."""
platform_data = defaultdict(lambda: {
"sessions": 0, "messages": 0, "input_tokens": 0,
"output_tokens": 0, "total_tokens": 0, "tool_calls": 0,
"output_tokens": 0, "cache_read_tokens": 0,
"cache_write_tokens": 0, "total_tokens": 0, "tool_calls": 0,
})
for s in sessions:
@@ -486,9 +456,13 @@ class InsightsEngine:
d["messages"] += s.get("message_count") or 0
inp = s.get("input_tokens") or 0
out = s.get("output_tokens") or 0
cache_read = s.get("cache_read_tokens") or 0
cache_write = s.get("cache_write_tokens") or 0
d["input_tokens"] += inp
d["output_tokens"] += out
d["total_tokens"] += inp + out
d["cache_read_tokens"] += cache_read
d["cache_write_tokens"] += cache_write
d["total_tokens"] += inp + out + cache_read + cache_write
d["tool_calls"] += s.get("tool_call_count") or 0
result = [
@@ -670,6 +644,9 @@ class InsightsEngine:
lines.append(f" Sessions: {o['total_sessions']:<12} Messages: {o['total_messages']:,}")
lines.append(f" Tool calls: {o['total_tool_calls']:<12,} User messages: {o['user_messages']:,}")
lines.append(f" Input tokens: {o['total_input_tokens']:<12,} Output tokens: {o['total_output_tokens']:,}")
cache_total = o.get("total_cache_read_tokens", 0) + o.get("total_cache_write_tokens", 0)
if cache_total > 0:
lines.append(f" Cache read: {o['total_cache_read_tokens']:<12,} Cache write: {o['total_cache_write_tokens']:,}")
cost_str = f"${o['estimated_cost']:.2f}"
if o.get("models_without_pricing"):
cost_str += " *"
@@ -692,7 +669,7 @@ class InsightsEngine:
cost_cell = " N/A"
lines.append(f" {model_name:<30} {m['sessions']:>8} {m['total_tokens']:>12,} {cost_cell}")
if o.get("models_without_pricing"):
lines.append(f" * Cost N/A for custom/self-hosted models")
lines.append(" * Cost N/A for custom/self-hosted models")
lines.append("")
# Platform breakdown
@@ -772,7 +749,11 @@ class InsightsEngine:
# Overview
lines.append(f"**Sessions:** {o['total_sessions']} | **Messages:** {o['total_messages']:,} | **Tool calls:** {o['total_tool_calls']:,}")
lines.append(f"**Tokens:** {o['total_tokens']:,} (in: {o['total_input_tokens']:,} / out: {o['total_output_tokens']:,})")
cache_total = o.get("total_cache_read_tokens", 0) + o.get("total_cache_write_tokens", 0)
if cache_total > 0:
lines.append(f"**Tokens:** {o['total_tokens']:,} (in: {o['total_input_tokens']:,} / out: {o['total_output_tokens']:,} / cache: {cache_total:,})")
else:
lines.append(f"**Tokens:** {o['total_tokens']:,} (in: {o['total_input_tokens']:,} / out: {o['total_output_tokens']:,})")
cost_note = ""
if o.get("models_without_pricing"):
cost_note = " _(excludes custom/self-hosted models)_"

335
agent/memory_manager.py Normal file
View File

@@ -0,0 +1,335 @@
"""MemoryManager — orchestrates the built-in memory provider plus at most
ONE external plugin memory provider.
Single integration point in run_agent.py. Replaces scattered per-backend
code with one manager that delegates to registered providers.
The BuiltinMemoryProvider is always registered first and cannot be removed.
Only ONE external (non-builtin) provider is allowed at a time — attempting
to register a second external provider is rejected with a warning. This
prevents tool schema bloat and conflicting memory backends.
Usage in run_agent.py:
self._memory_manager = MemoryManager()
self._memory_manager.add_provider(BuiltinMemoryProvider(...))
# Only ONE of these:
self._memory_manager.add_provider(plugin_provider)
# System prompt
prompt_parts.append(self._memory_manager.build_system_prompt())
# Pre-turn
context = self._memory_manager.prefetch_all(user_message)
# Post-turn
self._memory_manager.sync_all(user_msg, assistant_response)
self._memory_manager.queue_prefetch_all(user_msg)
"""
from __future__ import annotations
import json
import logging
from typing import Any, Dict, List, Optional
from agent.memory_provider import MemoryProvider
logger = logging.getLogger(__name__)
class MemoryManager:
"""Orchestrates the built-in provider plus at most one external provider.
The builtin provider is always first. Only one non-builtin (external)
provider is allowed. Failures in one provider never block the other.
"""
def __init__(self) -> None:
self._providers: List[MemoryProvider] = []
self._tool_to_provider: Dict[str, MemoryProvider] = {}
self._has_external: bool = False # True once a non-builtin provider is added
# -- Registration --------------------------------------------------------
def add_provider(self, provider: MemoryProvider) -> None:
"""Register a memory provider.
Built-in provider (name ``"builtin"``) is always accepted.
Only **one** external (non-builtin) provider is allowed — a second
attempt is rejected with a warning.
"""
is_builtin = provider.name == "builtin"
if not is_builtin:
if self._has_external:
existing = next(
(p.name for p in self._providers if p.name != "builtin"), "unknown"
)
logger.warning(
"Rejected memory provider '%s' — external provider '%s' is "
"already registered. Only one external memory provider is "
"allowed at a time. Configure which one via memory.provider "
"in config.yaml.",
provider.name, existing,
)
return
self._has_external = True
self._providers.append(provider)
# Index tool names → provider for routing
for schema in provider.get_tool_schemas():
tool_name = schema.get("name", "")
if tool_name and tool_name not in self._tool_to_provider:
self._tool_to_provider[tool_name] = provider
elif tool_name in self._tool_to_provider:
logger.warning(
"Memory tool name conflict: '%s' already registered by %s, "
"ignoring from %s",
tool_name,
self._tool_to_provider[tool_name].name,
provider.name,
)
logger.info(
"Memory provider '%s' registered (%d tools)",
provider.name,
len(provider.get_tool_schemas()),
)
@property
def providers(self) -> List[MemoryProvider]:
"""All registered providers in order."""
return list(self._providers)
@property
def provider_names(self) -> List[str]:
"""Names of all registered providers."""
return [p.name for p in self._providers]
def get_provider(self, name: str) -> Optional[MemoryProvider]:
"""Get a provider by name, or None if not registered."""
for p in self._providers:
if p.name == name:
return p
return None
# -- System prompt -------------------------------------------------------
def build_system_prompt(self) -> str:
"""Collect system prompt blocks from all providers.
Returns combined text, or empty string if no providers contribute.
Each non-empty block is labeled with the provider name.
"""
blocks = []
for provider in self._providers:
try:
block = provider.system_prompt_block()
if block and block.strip():
blocks.append(block)
except Exception as e:
logger.warning(
"Memory provider '%s' system_prompt_block() failed: %s",
provider.name, e,
)
return "\n\n".join(blocks)
# -- Prefetch / recall ---------------------------------------------------
def prefetch_all(self, query: str, *, session_id: str = "") -> str:
"""Collect prefetch context from all providers.
Returns merged context text labeled by provider. Empty providers
are skipped. Failures in one provider don't block others.
"""
parts = []
for provider in self._providers:
try:
result = provider.prefetch(query, session_id=session_id)
if result and result.strip():
parts.append(result)
except Exception as e:
logger.debug(
"Memory provider '%s' prefetch failed (non-fatal): %s",
provider.name, e,
)
return "\n\n".join(parts)
def queue_prefetch_all(self, query: str, *, session_id: str = "") -> None:
"""Queue background prefetch on all providers for the next turn."""
for provider in self._providers:
try:
provider.queue_prefetch(query, session_id=session_id)
except Exception as e:
logger.debug(
"Memory provider '%s' queue_prefetch failed (non-fatal): %s",
provider.name, e,
)
# -- Sync ----------------------------------------------------------------
def sync_all(self, user_content: str, assistant_content: str, *, session_id: str = "") -> None:
"""Sync a completed turn to all providers."""
for provider in self._providers:
try:
provider.sync_turn(user_content, assistant_content, session_id=session_id)
except Exception as e:
logger.warning(
"Memory provider '%s' sync_turn failed: %s",
provider.name, e,
)
# -- Tools ---------------------------------------------------------------
def get_all_tool_schemas(self) -> List[Dict[str, Any]]:
"""Collect tool schemas from all providers."""
schemas = []
seen = set()
for provider in self._providers:
try:
for schema in provider.get_tool_schemas():
name = schema.get("name", "")
if name and name not in seen:
schemas.append(schema)
seen.add(name)
except Exception as e:
logger.warning(
"Memory provider '%s' get_tool_schemas() failed: %s",
provider.name, e,
)
return schemas
def get_all_tool_names(self) -> set:
"""Return set of all tool names across all providers."""
return set(self._tool_to_provider.keys())
def has_tool(self, tool_name: str) -> bool:
"""Check if any provider handles this tool."""
return tool_name in self._tool_to_provider
def handle_tool_call(
self, tool_name: str, args: Dict[str, Any], **kwargs
) -> str:
"""Route a tool call to the correct provider.
Returns JSON string result. Raises ValueError if no provider
handles the tool.
"""
provider = self._tool_to_provider.get(tool_name)
if provider is None:
return json.dumps({"error": f"No memory provider handles tool '{tool_name}'"})
try:
return provider.handle_tool_call(tool_name, args, **kwargs)
except Exception as e:
logger.error(
"Memory provider '%s' handle_tool_call(%s) failed: %s",
provider.name, tool_name, e,
)
return json.dumps({"error": f"Memory tool '{tool_name}' failed: {e}"})
# -- Lifecycle hooks -----------------------------------------------------
def on_turn_start(self, turn_number: int, message: str, **kwargs) -> None:
"""Notify all providers of a new turn.
kwargs may include: remaining_tokens, model, platform, tool_count.
"""
for provider in self._providers:
try:
provider.on_turn_start(turn_number, message, **kwargs)
except Exception as e:
logger.debug(
"Memory provider '%s' on_turn_start failed: %s",
provider.name, e,
)
def on_session_end(self, messages: List[Dict[str, Any]]) -> None:
"""Notify all providers of session end."""
for provider in self._providers:
try:
provider.on_session_end(messages)
except Exception as e:
logger.debug(
"Memory provider '%s' on_session_end failed: %s",
provider.name, e,
)
def on_pre_compress(self, messages: List[Dict[str, Any]]) -> str:
"""Notify all providers before context compression.
Returns combined text from providers to include in the compression
summary prompt. Empty string if no provider contributes.
"""
parts = []
for provider in self._providers:
try:
result = provider.on_pre_compress(messages)
if result and result.strip():
parts.append(result)
except Exception as e:
logger.debug(
"Memory provider '%s' on_pre_compress failed: %s",
provider.name, e,
)
return "\n\n".join(parts)
def on_memory_write(self, action: str, target: str, content: str) -> None:
"""Notify external providers when the built-in memory tool writes.
Skips the builtin provider itself (it's the source of the write).
"""
for provider in self._providers:
if provider.name == "builtin":
continue
try:
provider.on_memory_write(action, target, content)
except Exception as e:
logger.debug(
"Memory provider '%s' on_memory_write failed: %s",
provider.name, e,
)
def on_delegation(self, task: str, result: str, *,
child_session_id: str = "", **kwargs) -> None:
"""Notify all providers that a subagent completed."""
for provider in self._providers:
try:
provider.on_delegation(
task, result, child_session_id=child_session_id, **kwargs
)
except Exception as e:
logger.debug(
"Memory provider '%s' on_delegation failed: %s",
provider.name, e,
)
def shutdown_all(self) -> None:
"""Shut down all providers (reverse order for clean teardown)."""
for provider in reversed(self._providers):
try:
provider.shutdown()
except Exception as e:
logger.warning(
"Memory provider '%s' shutdown failed: %s",
provider.name, e,
)
def initialize_all(self, session_id: str, **kwargs) -> None:
"""Initialize all providers.
Automatically injects ``hermes_home`` into *kwargs* so that every
provider can resolve profile-scoped storage paths without importing
``get_hermes_home()`` themselves.
"""
if "hermes_home" not in kwargs:
from hermes_constants import get_hermes_home
kwargs["hermes_home"] = str(get_hermes_home())
for provider in self._providers:
try:
provider.initialize(session_id=session_id, **kwargs)
except Exception as e:
logger.warning(
"Memory provider '%s' initialize failed: %s",
provider.name, e,
)

231
agent/memory_provider.py Normal file
View File

@@ -0,0 +1,231 @@
"""Abstract base class for pluggable memory providers.
Memory providers give the agent persistent recall across sessions. One
external provider is active at a time alongside the always-on built-in
memory (MEMORY.md / USER.md). The MemoryManager enforces this limit.
Built-in memory is always active as the first provider and cannot be removed.
External providers (Honcho, Hindsight, Mem0, etc.) are additive — they never
disable the built-in store. Only one external provider runs at a time to
prevent tool schema bloat and conflicting memory backends.
Registration:
1. Built-in: BuiltinMemoryProvider — always present, not removable.
2. Plugins: Ship in plugins/memory/<name>/, activated by memory.provider config.
Lifecycle (called by MemoryManager, wired in run_agent.py):
initialize() — connect, create resources, warm up
system_prompt_block() — static text for the system prompt
prefetch(query) — background recall before each turn
sync_turn(user, asst) — async write after each turn
get_tool_schemas() — tool schemas to expose to the model
handle_tool_call() — dispatch a tool call
shutdown() — clean exit
Optional hooks (override to opt in):
on_turn_start(turn, message, **kwargs) — per-turn tick with runtime context
on_session_end(messages) — end-of-session extraction
on_pre_compress(messages) -> str — extract before context compression
on_memory_write(action, target, content) — mirror built-in memory writes
on_delegation(task, result, **kwargs) — parent-side observation of subagent work
"""
from __future__ import annotations
import logging
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional
logger = logging.getLogger(__name__)
class MemoryProvider(ABC):
"""Abstract base class for memory providers."""
@property
@abstractmethod
def name(self) -> str:
"""Short identifier for this provider (e.g. 'builtin', 'honcho', 'hindsight')."""
# -- Core lifecycle (implement these) ------------------------------------
@abstractmethod
def is_available(self) -> bool:
"""Return True if this provider is configured, has credentials, and is ready.
Called during agent init to decide whether to activate the provider.
Should not make network calls — just check config and installed deps.
"""
@abstractmethod
def initialize(self, session_id: str, **kwargs) -> None:
"""Initialize for a session.
Called once at agent startup. May create resources (banks, tables),
establish connections, start background threads, etc.
kwargs always include:
- hermes_home (str): The active HERMES_HOME directory path. Use this
for profile-scoped storage instead of hardcoding ``~/.hermes``.
- platform (str): "cli", "telegram", "discord", "cron", etc.
kwargs may also include:
- agent_context (str): "primary", "subagent", "cron", or "flush".
Providers should skip writes for non-primary contexts (cron system
prompts would corrupt user representations).
- agent_identity (str): Profile name (e.g. "coder"). Use for
per-profile provider identity scoping.
- agent_workspace (str): Shared workspace name (e.g. "hermes").
- parent_session_id (str): For subagents, the parent's session_id.
- user_id (str): Platform user identifier (gateway sessions).
"""
def system_prompt_block(self) -> str:
"""Return text to include in the system prompt.
Called during system prompt assembly. Return empty string to skip.
This is for STATIC provider info (instructions, status). Prefetched
recall context is injected separately via prefetch().
"""
return ""
def prefetch(self, query: str, *, session_id: str = "") -> str:
"""Recall relevant context for the upcoming turn.
Called before each API call. Return formatted text to inject as
context, or empty string if nothing relevant. Implementations
should be fast — use background threads for the actual recall
and return cached results here.
session_id is provided for providers serving concurrent sessions
(gateway group chats, cached agents). Providers that don't need
per-session scoping can ignore it.
"""
return ""
def queue_prefetch(self, query: str, *, session_id: str = "") -> None:
"""Queue a background recall for the NEXT turn.
Called after each turn completes. The result will be consumed
by prefetch() on the next turn. Default is no-op — providers
that do background prefetching should override this.
"""
def sync_turn(self, user_content: str, assistant_content: str, *, session_id: str = "") -> None:
"""Persist a completed turn to the backend.
Called after each turn. Should be non-blocking — queue for
background processing if the backend has latency.
"""
@abstractmethod
def get_tool_schemas(self) -> List[Dict[str, Any]]:
"""Return tool schemas this provider exposes.
Each schema follows the OpenAI function calling format:
{"name": "...", "description": "...", "parameters": {...}}
Return empty list if this provider has no tools (context-only).
"""
def handle_tool_call(self, tool_name: str, args: Dict[str, Any], **kwargs) -> str:
"""Handle a tool call for one of this provider's tools.
Must return a JSON string (the tool result).
Only called for tool names returned by get_tool_schemas().
"""
raise NotImplementedError(f"Provider {self.name} does not handle tool {tool_name}")
def shutdown(self) -> None:
"""Clean shutdown — flush queues, close connections."""
# -- Optional hooks (override to opt in) ---------------------------------
def on_turn_start(self, turn_number: int, message: str, **kwargs) -> None:
"""Called at the start of each turn with the user message.
Use for turn-counting, scope management, periodic maintenance.
kwargs may include: remaining_tokens, model, platform, tool_count.
Providers use what they need; extras are ignored.
"""
def on_session_end(self, messages: List[Dict[str, Any]]) -> None:
"""Called when a session ends (explicit exit or timeout).
Use for end-of-session fact extraction, summarization, etc.
messages is the full conversation history.
NOT called after every turn — only at actual session boundaries
(CLI exit, /reset, gateway session expiry).
"""
def on_pre_compress(self, messages: List[Dict[str, Any]]) -> str:
"""Called before context compression discards old messages.
Use to extract insights from messages about to be compressed.
messages is the list that will be summarized/discarded.
Return text to include in the compression summary prompt so the
compressor preserves provider-extracted insights. Return empty
string for no contribution (backwards-compatible default).
"""
return ""
def on_delegation(self, task: str, result: str, *,
child_session_id: str = "", **kwargs) -> None:
"""Called on the PARENT agent when a subagent completes.
The parent's memory provider gets the task+result pair as an
observation of what was delegated and what came back. The subagent
itself has no provider session (skip_memory=True).
task: the delegation prompt
result: the subagent's final response
child_session_id: the subagent's session_id
"""
def get_config_schema(self) -> List[Dict[str, Any]]:
"""Return config fields this provider needs for setup.
Used by 'hermes memory setup' to walk the user through configuration.
Each field is a dict with:
key: config key name (e.g. 'api_key', 'mode')
description: human-readable description
secret: True if this should go to .env (default: False)
required: True if required (default: False)
default: default value (optional)
choices: list of valid values (optional)
url: URL where user can get this credential (optional)
env_var: explicit env var name for secrets (default: auto-generated)
Return empty list if no config needed (e.g. local-only providers).
"""
return []
def save_config(self, values: Dict[str, Any], hermes_home: str) -> None:
"""Write non-secret config to the provider's native location.
Called by 'hermes memory setup' after collecting user inputs.
``values`` contains only non-secret fields (secrets go to .env).
``hermes_home`` is the active HERMES_HOME directory path.
Providers with native config files (JSON, YAML) should override
this to write to their expected location. Providers that use only
env vars can leave the default (no-op).
All new memory provider plugins MUST implement either:
- save_config() for native config file formats, OR
- use only env vars (in which case get_config_schema() fields
should all have ``env_var`` set and this method stays no-op).
"""
def on_memory_write(self, action: str, target: str, content: str) -> None:
"""Called when the built-in memory tool writes an entry.
action: 'add', 'replace', or 'remove'
target: 'memory' or 'user'
content: the entry content
Use to mirror built-in memory writes to your backend.
"""

View File

@@ -10,6 +10,7 @@ import re
import time
from pathlib import Path
from typing import Any, Dict, List, Optional
from urllib.parse import urlparse
import requests
import yaml
@@ -18,61 +19,361 @@ from hermes_constants import OPENROUTER_MODELS_URL
logger = logging.getLogger(__name__)
# Provider names that can appear as a "provider:" prefix before a model ID.
# Only these are stripped — Ollama-style "model:tag" colons (e.g. "qwen3.5:27b")
# are preserved so the full model name reaches cache lookups and server queries.
_PROVIDER_PREFIXES: frozenset[str] = frozenset({
"openrouter", "nous", "openai-codex", "copilot", "copilot-acp",
"zai", "kimi-coding", "minimax", "minimax-cn", "anthropic", "deepseek",
"opencode-zen", "opencode-go", "ai-gateway", "kilocode", "alibaba",
"custom", "local",
# Common aliases
"glm", "z-ai", "z.ai", "zhipu", "github", "github-copilot",
"github-models", "kimi", "moonshot", "claude", "deep-seek",
"opencode", "zen", "go", "vercel", "kilo", "dashscope", "aliyun", "qwen",
})
_OLLAMA_TAG_PATTERN = re.compile(
r"^(\d+\.?\d*b|latest|stable|q\d|fp?\d|instruct|chat|coder|vision|text)",
re.IGNORECASE,
)
def _strip_provider_prefix(model: str) -> str:
"""Strip a recognised provider prefix from a model string.
``"local:my-model"`` → ``"my-model"``
``"qwen3.5:27b"`` → ``"qwen3.5:27b"`` (unchanged — not a provider prefix)
``"qwen:0.5b"`` → ``"qwen:0.5b"`` (unchanged — Ollama model:tag)
``"deepseek:latest"``→ ``"deepseek:latest"``(unchanged — Ollama model:tag)
"""
if ":" not in model or model.startswith("http"):
return model
prefix, suffix = model.split(":", 1)
prefix_lower = prefix.strip().lower()
if prefix_lower in _PROVIDER_PREFIXES:
# Don't strip if suffix looks like an Ollama tag (e.g. "7b", "latest", "q4_0")
if _OLLAMA_TAG_PATTERN.match(suffix.strip()):
return model
return suffix
return model
_model_metadata_cache: Dict[str, Dict[str, Any]] = {}
_model_metadata_cache_time: float = 0
_MODEL_CACHE_TTL = 3600
_endpoint_model_metadata_cache: Dict[str, Dict[str, Dict[str, Any]]] = {}
_endpoint_model_metadata_cache_time: Dict[str, float] = {}
_ENDPOINT_MODEL_CACHE_TTL = 300
# Descending tiers for context length probing when the model is unknown.
# We start high and step down on context-length errors until one works.
# We start at 128K (a safe default for most modern models) and step down
# on context-length errors until one works.
CONTEXT_PROBE_TIERS = [
2_000_000,
1_000_000,
512_000,
200_000,
128_000,
64_000,
32_000,
16_000,
8_000,
]
# Default context length when no detection method succeeds.
DEFAULT_FALLBACK_CONTEXT = CONTEXT_PROBE_TIERS[0]
# Thin fallback defaults — only broad model family patterns.
# These fire only when provider is unknown AND models.dev/OpenRouter/Anthropic
# all miss. Replaced the previous 80+ entry dict.
# For provider-specific context lengths, models.dev is the primary source.
DEFAULT_CONTEXT_LENGTHS = {
"anthropic/claude-opus-4": 200000,
"anthropic/claude-opus-4.5": 200000,
"anthropic/claude-opus-4.6": 200000,
"anthropic/claude-sonnet-4": 200000,
"anthropic/claude-sonnet-4-20250514": 200000,
"anthropic/claude-haiku-4.5": 200000,
# Bare Anthropic model IDs (for native API provider)
"claude-opus-4-6": 200000,
"claude-sonnet-4-6": 200000,
"claude-opus-4-5-20251101": 200000,
"claude-sonnet-4-5-20250929": 200000,
"claude-opus-4-1-20250805": 200000,
"claude-opus-4-20250514": 200000,
"claude-sonnet-4-20250514": 200000,
"claude-haiku-4-5-20251001": 200000,
"openai/gpt-4o": 128000,
"openai/gpt-4-turbo": 128000,
"openai/gpt-4o-mini": 128000,
"google/gemini-2.0-flash": 1048576,
"google/gemini-2.5-pro": 1048576,
"meta-llama/llama-3.3-70b-instruct": 131072,
"deepseek/deepseek-chat-v3": 65536,
"qwen/qwen-2.5-72b-instruct": 32768,
"glm-4.7": 202752,
"glm-5": 202752,
"glm-4.5": 131072,
"glm-4.5-flash": 131072,
"kimi-for-coding": 262144,
"kimi-k2.5": 262144,
"kimi-k2-thinking": 262144,
"kimi-k2-thinking-turbo": 262144,
"kimi-k2-turbo-preview": 262144,
"kimi-k2-0905-preview": 131072,
"MiniMax-M2.5": 204800,
"MiniMax-M2.5-highspeed": 204800,
"MiniMax-M2.1": 204800,
# Anthropic Claude 4.6 (1M context) — bare IDs only to avoid
# fuzzy-match collisions (e.g. "anthropic/claude-sonnet-4" is a
# substring of "anthropic/claude-sonnet-4.6").
# OpenRouter-prefixed models resolve via OpenRouter live API or models.dev.
"claude-opus-4-6": 1000000,
"claude-sonnet-4-6": 1000000,
"claude-opus-4.6": 1000000,
"claude-sonnet-4.6": 1000000,
# Catch-all for older Claude models (must sort after specific entries)
"claude": 200000,
# OpenAI
"gpt-4.1": 1047576,
"gpt-5": 128000,
"gpt-4": 128000,
# Google
"gemini": 1048576,
# DeepSeek
"deepseek": 128000,
# Meta
"llama": 131072,
# Qwen
"qwen": 131072,
# MiniMax
"minimax": 204800,
# GLM
"glm": 202752,
# Kimi
"kimi": 262144,
# Arcee
"trinity": 262144,
# Hugging Face Inference Providers — model IDs use org/name format
"Qwen/Qwen3.5-397B-A17B": 131072,
"Qwen/Qwen3.5-35B-A3B": 131072,
"deepseek-ai/DeepSeek-V3.2": 65536,
"moonshotai/Kimi-K2.5": 262144,
"moonshotai/Kimi-K2-Thinking": 262144,
"MiniMaxAI/MiniMax-M2.5": 204800,
"XiaomiMiMo/MiMo-V2-Flash": 32768,
"mimo-v2-pro": 1048576,
"mimo-v2-omni": 1048576,
"zai-org/GLM-5": 202752,
}
_CONTEXT_LENGTH_KEYS = (
"context_length",
"context_window",
"max_context_length",
"max_position_embeddings",
"max_model_len",
"max_input_tokens",
"max_sequence_length",
"max_seq_len",
"n_ctx_train",
"n_ctx",
)
_MAX_COMPLETION_KEYS = (
"max_completion_tokens",
"max_output_tokens",
"max_tokens",
)
# Local server hostnames / address patterns
_LOCAL_HOSTS = ("localhost", "127.0.0.1", "::1", "0.0.0.0")
def _normalize_base_url(base_url: str) -> str:
return (base_url or "").strip().rstrip("/")
def _is_openrouter_base_url(base_url: str) -> bool:
return "openrouter.ai" in _normalize_base_url(base_url).lower()
def _is_custom_endpoint(base_url: str) -> bool:
normalized = _normalize_base_url(base_url)
return bool(normalized) and not _is_openrouter_base_url(normalized)
_URL_TO_PROVIDER: Dict[str, str] = {
"api.openai.com": "openai",
"chatgpt.com": "openai",
"api.anthropic.com": "anthropic",
"api.z.ai": "zai",
"api.moonshot.ai": "kimi-coding",
"api.kimi.com": "kimi-coding",
"api.minimax": "minimax",
"dashscope.aliyuncs.com": "alibaba",
"dashscope-intl.aliyuncs.com": "alibaba",
"openrouter.ai": "openrouter",
"generativelanguage.googleapis.com": "google",
"inference-api.nousresearch.com": "nous",
"api.deepseek.com": "deepseek",
"api.githubcopilot.com": "copilot",
"models.github.ai": "copilot",
"api.fireworks.ai": "fireworks",
}
def _infer_provider_from_url(base_url: str) -> Optional[str]:
"""Infer the models.dev provider name from a base URL.
This allows context length resolution via models.dev for custom endpoints
like DashScope (Alibaba), Z.AI, Kimi, etc. without requiring the user to
explicitly set the provider name in config.
"""
normalized = _normalize_base_url(base_url)
if not normalized:
return None
parsed = urlparse(normalized if "://" in normalized else f"https://{normalized}")
host = parsed.netloc.lower() or parsed.path.lower()
for url_part, provider in _URL_TO_PROVIDER.items():
if url_part in host:
return provider
return None
def _is_known_provider_base_url(base_url: str) -> bool:
return _infer_provider_from_url(base_url) is not None
def is_local_endpoint(base_url: str) -> bool:
"""Return True if base_url points to a local machine (localhost / RFC-1918 / WSL)."""
normalized = _normalize_base_url(base_url)
if not normalized:
return False
url = normalized if "://" in normalized else f"http://{normalized}"
try:
parsed = urlparse(url)
host = parsed.hostname or ""
except Exception:
return False
if host in _LOCAL_HOSTS:
return True
# RFC-1918 private ranges and link-local
import ipaddress
try:
addr = ipaddress.ip_address(host)
return addr.is_private or addr.is_loopback or addr.is_link_local
except ValueError:
pass
# Bare IP that looks like a private range (e.g. 172.26.x.x for WSL)
parts = host.split(".")
if len(parts) == 4:
try:
first, second = int(parts[0]), int(parts[1])
if first == 10:
return True
if first == 172 and 16 <= second <= 31:
return True
if first == 192 and second == 168:
return True
except ValueError:
pass
return False
def detect_local_server_type(base_url: str) -> Optional[str]:
"""Detect which local server is running at base_url by probing known endpoints.
Returns one of: "ollama", "lm-studio", "vllm", "llamacpp", or None.
"""
import httpx
normalized = _normalize_base_url(base_url)
server_url = normalized
if server_url.endswith("/v1"):
server_url = server_url[:-3]
try:
with httpx.Client(timeout=2.0) as client:
# LM Studio exposes /api/v1/models — check first (most specific)
try:
r = client.get(f"{server_url}/api/v1/models")
if r.status_code == 200:
return "lm-studio"
except Exception:
pass
# Ollama exposes /api/tags and responds with {"models": [...]}
# LM Studio returns {"error": "Unexpected endpoint"} with status 200
# on this path, so we must verify the response contains "models".
try:
r = client.get(f"{server_url}/api/tags")
if r.status_code == 200:
try:
data = r.json()
if "models" in data:
return "ollama"
except Exception:
pass
except Exception:
pass
# llama.cpp exposes /v1/props (older builds used /props without the /v1 prefix)
try:
r = client.get(f"{server_url}/v1/props")
if r.status_code != 200:
r = client.get(f"{server_url}/props") # fallback for older builds
if r.status_code == 200 and "default_generation_settings" in r.text:
return "llamacpp"
except Exception:
pass
# vLLM: /version
try:
r = client.get(f"{server_url}/version")
if r.status_code == 200:
data = r.json()
if "version" in data:
return "vllm"
except Exception:
pass
except Exception:
pass
return None
def _iter_nested_dicts(value: Any):
if isinstance(value, dict):
yield value
for nested in value.values():
yield from _iter_nested_dicts(nested)
elif isinstance(value, list):
for item in value:
yield from _iter_nested_dicts(item)
def _coerce_reasonable_int(value: Any, minimum: int = 1024, maximum: int = 10_000_000) -> Optional[int]:
try:
if isinstance(value, bool):
return None
if isinstance(value, str):
value = value.strip().replace(",", "")
result = int(value)
except (TypeError, ValueError):
return None
if minimum <= result <= maximum:
return result
return None
def _extract_first_int(payload: Dict[str, Any], keys: tuple[str, ...]) -> Optional[int]:
keyset = {key.lower() for key in keys}
for mapping in _iter_nested_dicts(payload):
for key, value in mapping.items():
if str(key).lower() not in keyset:
continue
coerced = _coerce_reasonable_int(value)
if coerced is not None:
return coerced
return None
def _extract_context_length(payload: Dict[str, Any]) -> Optional[int]:
return _extract_first_int(payload, _CONTEXT_LENGTH_KEYS)
def _extract_max_completion_tokens(payload: Dict[str, Any]) -> Optional[int]:
return _extract_first_int(payload, _MAX_COMPLETION_KEYS)
def _extract_pricing(payload: Dict[str, Any]) -> Dict[str, Any]:
alias_map = {
"prompt": ("prompt", "input", "input_cost_per_token", "prompt_token_cost"),
"completion": ("completion", "output", "output_cost_per_token", "completion_token_cost"),
"request": ("request", "request_cost"),
"cache_read": ("cache_read", "cached_prompt", "input_cache_read", "cache_read_cost_per_token"),
"cache_write": ("cache_write", "cache_creation", "input_cache_write", "cache_write_cost_per_token"),
}
for mapping in _iter_nested_dicts(payload):
normalized = {str(key).lower(): value for key, value in mapping.items()}
if not any(any(alias in normalized for alias in aliases) for aliases in alias_map.values()):
continue
pricing: Dict[str, Any] = {}
for target, aliases in alias_map.items():
for alias in aliases:
if alias in normalized and normalized[alias] not in (None, ""):
pricing[target] = normalized[alias]
break
if pricing:
return pricing
return {}
def _add_model_aliases(cache: Dict[str, Dict[str, Any]], model_id: str, entry: Dict[str, Any]) -> None:
cache[model_id] = entry
if "/" in model_id:
bare_model = model_id.split("/", 1)[1]
cache.setdefault(bare_model, entry)
def fetch_model_metadata(force_refresh: bool = False) -> Dict[str, Dict[str, Any]]:
"""Fetch model metadata from OpenRouter (cached for 1 hour)."""
@@ -89,15 +390,16 @@ def fetch_model_metadata(force_refresh: bool = False) -> Dict[str, Dict[str, Any
cache = {}
for model in data.get("data", []):
model_id = model.get("id", "")
cache[model_id] = {
entry = {
"context_length": model.get("context_length", 128000),
"max_completion_tokens": model.get("top_provider", {}).get("max_completion_tokens", 4096),
"name": model.get("name", model_id),
"pricing": model.get("pricing", {}),
}
_add_model_aliases(cache, model_id, entry)
canonical = model.get("canonical_slug", "")
if canonical and canonical != model_id:
cache[canonical] = cache[model_id]
_add_model_aliases(cache, canonical, entry)
_model_metadata_cache = cache
_model_metadata_cache_time = time.time()
@@ -109,6 +411,97 @@ def fetch_model_metadata(force_refresh: bool = False) -> Dict[str, Dict[str, Any
return _model_metadata_cache or {}
def fetch_endpoint_model_metadata(
base_url: str,
api_key: str = "",
force_refresh: bool = False,
) -> Dict[str, Dict[str, Any]]:
"""Fetch model metadata from an OpenAI-compatible ``/models`` endpoint.
This is used for explicit custom endpoints where hardcoded global model-name
defaults are unreliable. Results are cached in memory per base URL.
"""
normalized = _normalize_base_url(base_url)
if not normalized or _is_openrouter_base_url(normalized):
return {}
if not force_refresh:
cached = _endpoint_model_metadata_cache.get(normalized)
cached_at = _endpoint_model_metadata_cache_time.get(normalized, 0)
if cached is not None and (time.time() - cached_at) < _ENDPOINT_MODEL_CACHE_TTL:
return cached
candidates = [normalized]
if normalized.endswith("/v1"):
alternate = normalized[:-3].rstrip("/")
else:
alternate = normalized + "/v1"
if alternate and alternate not in candidates:
candidates.append(alternate)
headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
last_error: Optional[Exception] = None
for candidate in candidates:
url = candidate.rstrip("/") + "/models"
try:
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
payload = response.json()
cache: Dict[str, Dict[str, Any]] = {}
for model in payload.get("data", []):
if not isinstance(model, dict):
continue
model_id = model.get("id")
if not model_id:
continue
entry: Dict[str, Any] = {"name": model.get("name", model_id)}
context_length = _extract_context_length(model)
if context_length is not None:
entry["context_length"] = context_length
max_completion_tokens = _extract_max_completion_tokens(model)
if max_completion_tokens is not None:
entry["max_completion_tokens"] = max_completion_tokens
pricing = _extract_pricing(model)
if pricing:
entry["pricing"] = pricing
_add_model_aliases(cache, model_id, entry)
# If this is a llama.cpp server, query /props for actual allocated context
is_llamacpp = any(
m.get("owned_by") == "llamacpp"
for m in payload.get("data", []) if isinstance(m, dict)
)
if is_llamacpp:
try:
# Try /v1/props first (current llama.cpp); fall back to /props for older builds
base = candidate.rstrip("/").replace("/v1", "")
props_resp = requests.get(base + "/v1/props", headers=headers, timeout=5)
if not props_resp.ok:
props_resp = requests.get(base + "/props", headers=headers, timeout=5)
if props_resp.ok:
props = props_resp.json()
gen_settings = props.get("default_generation_settings", {})
n_ctx = gen_settings.get("n_ctx")
model_alias = props.get("model_alias", "")
if n_ctx and model_alias and model_alias in cache:
cache[model_alias]["context_length"] = n_ctx
except Exception:
pass
_endpoint_model_metadata_cache[normalized] = cache
_endpoint_model_metadata_cache_time[normalized] = time.time()
return cache
except Exception as exc:
last_error = exc
if last_error:
logger.debug("Failed to fetch model metadata from %s/models: %s", normalized, last_error)
_endpoint_model_metadata_cache[normalized] = {}
_endpoint_model_metadata_cache_time[normalized] = time.time()
return {}
def _get_context_cache_path() -> Path:
"""Return path to the persistent context length cache file."""
hermes_home = Path(os.environ.get("HERMES_HOME", Path.home() / ".hermes"))
@@ -116,7 +509,7 @@ def _get_context_cache_path() -> Path:
def _load_context_cache() -> Dict[str, int]:
"""Load the model+provider context_length cache from disk."""
"""Load the model+provider -> context_length cache from disk."""
path = _get_context_cache_path()
if not path.exists():
return {}
@@ -145,7 +538,7 @@ def save_context_length(model: str, base_url: str, length: int) -> None:
path.parent.mkdir(parents=True, exist_ok=True)
with open(path, "w") as f:
yaml.dump({"context_lengths": cache}, f, default_flow_style=False)
logger.info("Cached context length %s %s tokens", key, f"{length:,}")
logger.info("Cached context length %s -> %s tokens", key, f"{length:,}")
except Exception as e:
logger.debug("Failed to save context length cache: %s", e)
@@ -193,33 +586,317 @@ def parse_context_limit_from_error(error_msg: str) -> Optional[int]:
return None
def get_model_context_length(model: str, base_url: str = "") -> int:
def _model_id_matches(candidate_id: str, lookup_model: str) -> bool:
"""Return True if *candidate_id* (from server) matches *lookup_model* (configured).
Supports two forms:
- Exact match: "nvidia-nemotron-super-49b-v1" == "nvidia-nemotron-super-49b-v1"
- Slug match: "nvidia/nvidia-nemotron-super-49b-v1" matches "nvidia-nemotron-super-49b-v1"
(the part after the last "/" equals lookup_model)
This covers LM Studio's native API which stores models as "publisher/slug"
while users typically configure only the slug after the "local:" prefix.
"""
if candidate_id == lookup_model:
return True
# Slug match: basename of candidate equals the lookup name
if "/" in candidate_id and candidate_id.rsplit("/", 1)[1] == lookup_model:
return True
return False
def _query_local_context_length(model: str, base_url: str) -> Optional[int]:
"""Query a local server for the model's context length."""
import httpx
# Strip recognised provider prefix (e.g., "local:model-name" → "model-name").
# Ollama "model:tag" colons (e.g. "qwen3.5:27b") are intentionally preserved.
model = _strip_provider_prefix(model)
# Strip /v1 suffix to get the server root
server_url = base_url.rstrip("/")
if server_url.endswith("/v1"):
server_url = server_url[:-3]
try:
server_type = detect_local_server_type(base_url)
except Exception:
server_type = None
try:
with httpx.Client(timeout=3.0) as client:
# Ollama: /api/show returns model details with context info
if server_type == "ollama":
resp = client.post(f"{server_url}/api/show", json={"name": model})
if resp.status_code == 200:
data = resp.json()
# Check model_info for context length
model_info = data.get("model_info", {})
for key, value in model_info.items():
if "context_length" in key and isinstance(value, (int, float)):
return int(value)
# Check parameters string for num_ctx
params = data.get("parameters", "")
if "num_ctx" in params:
for line in params.split("\n"):
if "num_ctx" in line:
parts = line.strip().split()
if len(parts) >= 2:
try:
return int(parts[-1])
except ValueError:
pass
# LM Studio native API: /api/v1/models returns max_context_length.
# This is more reliable than the OpenAI-compat /v1/models which
# doesn't include context window information for LM Studio servers.
# Use _model_id_matches for fuzzy matching: LM Studio stores models as
# "publisher/slug" but users configure only "slug" after "local:" prefix.
if server_type == "lm-studio":
resp = client.get(f"{server_url}/api/v1/models")
if resp.status_code == 200:
data = resp.json()
for m in data.get("models", []):
if _model_id_matches(m.get("key", ""), model) or _model_id_matches(m.get("id", ""), model):
# Prefer loaded instance context (actual runtime value)
for inst in m.get("loaded_instances", []):
cfg = inst.get("config", {})
ctx = cfg.get("context_length")
if ctx and isinstance(ctx, (int, float)):
return int(ctx)
# Fall back to max_context_length (theoretical model max)
ctx = m.get("max_context_length") or m.get("context_length")
if ctx and isinstance(ctx, (int, float)):
return int(ctx)
# LM Studio / vLLM / llama.cpp: try /v1/models/{model}
resp = client.get(f"{server_url}/v1/models/{model}")
if resp.status_code == 200:
data = resp.json()
# vLLM returns max_model_len
ctx = data.get("max_model_len") or data.get("context_length") or data.get("max_tokens")
if ctx and isinstance(ctx, (int, float)):
return int(ctx)
# Try /v1/models and find the model in the list.
# Use _model_id_matches to handle "publisher/slug" vs bare "slug".
resp = client.get(f"{server_url}/v1/models")
if resp.status_code == 200:
data = resp.json()
models_list = data.get("data", [])
for m in models_list:
if _model_id_matches(m.get("id", ""), model):
ctx = m.get("max_model_len") or m.get("context_length") or m.get("max_tokens")
if ctx and isinstance(ctx, (int, float)):
return int(ctx)
except Exception:
pass
return None
def _normalize_model_version(model: str) -> str:
"""Normalize version separators for matching.
Nous uses dashes: claude-opus-4-6, claude-sonnet-4-5
OpenRouter uses dots: claude-opus-4.6, claude-sonnet-4.5
Normalize both to dashes for comparison.
"""
return model.replace(".", "-")
def _query_anthropic_context_length(model: str, base_url: str, api_key: str) -> Optional[int]:
"""Query Anthropic's /v1/models endpoint for context length.
Only works with regular ANTHROPIC_API_KEY (sk-ant-api*).
OAuth tokens (sk-ant-oat*) from Claude Code return 401.
"""
if not api_key or api_key.startswith("sk-ant-oat"):
return None # OAuth tokens can't access /v1/models
try:
base = base_url.rstrip("/")
if base.endswith("/v1"):
base = base[:-3]
url = f"{base}/v1/models?limit=1000"
headers = {
"x-api-key": api_key,
"anthropic-version": "2023-06-01",
}
resp = requests.get(url, headers=headers, timeout=10)
if resp.status_code != 200:
return None
data = resp.json()
for m in data.get("data", []):
if m.get("id") == model:
ctx = m.get("max_input_tokens")
if isinstance(ctx, int) and ctx > 0:
return ctx
except Exception as e:
logger.debug("Anthropic /v1/models query failed: %s", e)
return None
def _resolve_nous_context_length(model: str) -> Optional[int]:
"""Resolve Nous Portal model context length via OpenRouter metadata.
Nous model IDs are bare (e.g. 'claude-opus-4-6') while OpenRouter uses
prefixed IDs (e.g. 'anthropic/claude-opus-4.6'). Try suffix matching
with version normalization (dot↔dash).
"""
metadata = fetch_model_metadata() # OpenRouter cache
# Exact match first
if model in metadata:
return metadata[model].get("context_length")
normalized = _normalize_model_version(model).lower()
for or_id, entry in metadata.items():
bare = or_id.split("/", 1)[1] if "/" in or_id else or_id
if bare.lower() == model.lower() or _normalize_model_version(bare).lower() == normalized:
return entry.get("context_length")
# Partial prefix match for cases like gemini-3-flash → gemini-3-flash-preview
# Require match to be at a word boundary (followed by -, :, or end of string)
model_lower = model.lower()
for or_id, entry in metadata.items():
bare = or_id.split("/", 1)[1] if "/" in or_id else or_id
for candidate, query in [(bare.lower(), model_lower), (_normalize_model_version(bare).lower(), normalized)]:
if candidate.startswith(query) and (
len(candidate) == len(query) or candidate[len(query)] in "-:."
):
return entry.get("context_length")
return None
def get_model_context_length(
model: str,
base_url: str = "",
api_key: str = "",
config_context_length: int | None = None,
provider: str = "",
) -> int:
"""Get the context length for a model.
Resolution order:
0. Explicit config override (model.context_length or custom_providers per-model)
1. Persistent cache (previously discovered via probing)
2. OpenRouter API metadata
3. Hardcoded DEFAULT_CONTEXT_LENGTHS (fuzzy match)
4. First probe tier (2M) — will be narrowed on first context error
2. Active endpoint metadata (/models for explicit custom endpoints)
3. Local server query (for local endpoints)
4. Anthropic /v1/models API (API-key users only, not OAuth)
5. OpenRouter live API metadata
6. Nous suffix-match via OpenRouter cache
7. models.dev registry lookup (provider-aware)
8. Thin hardcoded defaults (broad family patterns)
9. Default fallback (128K)
"""
# 0. Explicit config override — user knows best
if config_context_length is not None and isinstance(config_context_length, int) and config_context_length > 0:
return config_context_length
# Normalise provider-prefixed model names (e.g. "local:model-name" →
# "model-name") so cache lookups and server queries use the bare ID that
# local servers actually know about. Ollama "model:tag" colons are preserved.
model = _strip_provider_prefix(model)
# 1. Check persistent cache (model+provider)
if base_url:
cached = get_cached_context_length(model, base_url)
if cached is not None:
return cached
# 2. OpenRouter API metadata
# 2. Active endpoint metadata for truly custom/unknown endpoints.
# Known providers (Copilot, OpenAI, Anthropic, etc.) skip this — their
# /models endpoint may report a provider-imposed limit (e.g. Copilot
# returns 128k) instead of the model's full context (400k). models.dev
# has the correct per-provider values and is checked at step 5+.
if _is_custom_endpoint(base_url) and not _is_known_provider_base_url(base_url):
endpoint_metadata = fetch_endpoint_model_metadata(base_url, api_key=api_key)
matched = endpoint_metadata.get(model)
if not matched:
# Single-model servers: if only one model is loaded, use it
if len(endpoint_metadata) == 1:
matched = next(iter(endpoint_metadata.values()))
else:
# Fuzzy match: substring in either direction
for key, entry in endpoint_metadata.items():
if model in key or key in model:
matched = entry
break
if matched:
context_length = matched.get("context_length")
if isinstance(context_length, int):
return context_length
if not _is_known_provider_base_url(base_url):
# 3. Try querying local server directly
if is_local_endpoint(base_url):
local_ctx = _query_local_context_length(model, base_url)
if local_ctx and local_ctx > 0:
save_context_length(model, base_url, local_ctx)
return local_ctx
logger.info(
"Could not detect context length for model %r at %s"
"defaulting to %s tokens (probe-down). Set model.context_length "
"in config.yaml to override.",
model, base_url, f"{DEFAULT_FALLBACK_CONTEXT:,}",
)
return DEFAULT_FALLBACK_CONTEXT
# 4. Anthropic /v1/models API (only for regular API keys, not OAuth)
if provider == "anthropic" or (
base_url and "api.anthropic.com" in base_url
):
ctx = _query_anthropic_context_length(model, base_url or "https://api.anthropic.com", api_key)
if ctx:
return ctx
# 5. Provider-aware lookups (before generic OpenRouter cache)
# These are provider-specific and take priority over the generic OR cache,
# since the same model can have different context limits per provider
# (e.g. claude-opus-4.6 is 1M on Anthropic but 128K on GitHub Copilot).
# If provider is generic (openrouter/custom/empty), try to infer from URL.
effective_provider = provider
if not effective_provider or effective_provider in ("openrouter", "custom"):
if base_url:
inferred = _infer_provider_from_url(base_url)
if inferred:
effective_provider = inferred
if effective_provider == "nous":
ctx = _resolve_nous_context_length(model)
if ctx:
return ctx
if effective_provider:
from agent.models_dev import lookup_models_dev_context
ctx = lookup_models_dev_context(effective_provider, model)
if ctx:
return ctx
# 6. OpenRouter live API metadata (provider-unaware fallback)
metadata = fetch_model_metadata()
if model in metadata:
return metadata[model].get("context_length", 128000)
# 3. Hardcoded defaults (fuzzy match)
for default_model, length in DEFAULT_CONTEXT_LENGTHS.items():
if default_model in model or model in default_model:
# 8. Hardcoded defaults (fuzzy match — longest key first for specificity)
# Only check `default_model in model` (is the key a substring of the input).
# The reverse (`model in default_model`) causes shorter names like
# "claude-sonnet-4" to incorrectly match "claude-sonnet-4-6" and return 1M.
model_lower = model.lower()
for default_model, length in sorted(
DEFAULT_CONTEXT_LENGTHS.items(), key=lambda x: len(x[0]), reverse=True
):
if default_model in model_lower:
return length
# 4. Unknown model — start at highest probe tier
return CONTEXT_PROBE_TIERS[0]
# 9. Query local server as last resort
if base_url and is_local_endpoint(base_url):
local_ctx = _query_local_context_length(model, base_url)
if local_ctx and local_ctx > 0:
save_context_length(model, base_url, local_ctx)
return local_ctx
# 10. Default fallback — 128K
return DEFAULT_FALLBACK_CONTEXT
def estimate_tokens_rough(text: str) -> int:
@@ -233,3 +910,26 @@ def estimate_messages_tokens_rough(messages: List[Dict[str, Any]]) -> int:
"""Rough token estimate for a message list (pre-flight only)."""
total_chars = sum(len(str(msg)) for msg in messages)
return total_chars // 4
def estimate_request_tokens_rough(
messages: List[Dict[str, Any]],
*,
system_prompt: str = "",
tools: Optional[List[Dict[str, Any]]] = None,
) -> int:
"""Rough token estimate for a full chat-completions request.
Includes the major payload buckets Hermes sends to providers:
system prompt, conversation messages, and tool schemas. With 50+
tools enabled, schemas alone can add 20-30K tokens — a significant
blind spot when only counting messages.
"""
total_chars = 0
if system_prompt:
total_chars += len(system_prompt)
if messages:
total_chars += sum(len(str(msg)) for msg in messages)
if tools:
total_chars += len(str(tools))
return total_chars // 4

747
agent/models_dev.py Normal file
View File

@@ -0,0 +1,747 @@
"""Models.dev registry integration — primary database for providers and models.
Fetches from https://models.dev/api.json — a community-maintained database
of 4000+ models across 109+ providers. Provides:
- **Provider metadata**: name, base URL, env vars, documentation link
- **Model metadata**: context window, max output, cost/M tokens, capabilities
(reasoning, tools, vision, PDF, audio), modalities, knowledge cutoff,
open-weights flag, family grouping, deprecation status
Data resolution order (like TypeScript OpenCode):
1. Bundled snapshot (ships with the package — offline-first)
2. Disk cache (~/.hermes/models_dev_cache.json)
3. Network fetch (https://models.dev/api.json)
4. Background refresh every 60 minutes
Other modules should import the dataclasses and query functions from here
rather than parsing the raw JSON themselves.
"""
import difflib
import json
import logging
import os
import time
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple, Union
from utils import atomic_json_write
import requests
logger = logging.getLogger(__name__)
MODELS_DEV_URL = "https://models.dev/api.json"
_MODELS_DEV_CACHE_TTL = 3600 # 1 hour in-memory
# In-memory cache
_models_dev_cache: Dict[str, Any] = {}
_models_dev_cache_time: float = 0
# ---------------------------------------------------------------------------
# Dataclasses — rich metadata for providers and models
# ---------------------------------------------------------------------------
@dataclass
class ModelInfo:
"""Full metadata for a single model from models.dev."""
id: str
name: str
family: str
provider_id: str # models.dev provider ID (e.g. "anthropic")
# Capabilities
reasoning: bool = False
tool_call: bool = False
attachment: bool = False # supports image/file attachments (vision)
temperature: bool = False
structured_output: bool = False
open_weights: bool = False
# Modalities
input_modalities: Tuple[str, ...] = () # ("text", "image", "pdf", ...)
output_modalities: Tuple[str, ...] = ()
# Limits
context_window: int = 0
max_output: int = 0
max_input: Optional[int] = None
# Cost (per million tokens, USD)
cost_input: float = 0.0
cost_output: float = 0.0
cost_cache_read: Optional[float] = None
cost_cache_write: Optional[float] = None
# Metadata
knowledge_cutoff: str = ""
release_date: str = ""
status: str = "" # "alpha", "beta", "deprecated", or ""
interleaved: Any = False # True or {"field": "reasoning_content"}
def has_cost_data(self) -> bool:
return self.cost_input > 0 or self.cost_output > 0
def supports_vision(self) -> bool:
return self.attachment or "image" in self.input_modalities
def supports_pdf(self) -> bool:
return "pdf" in self.input_modalities
def supports_audio_input(self) -> bool:
return "audio" in self.input_modalities
def format_cost(self) -> str:
"""Human-readable cost string, e.g. '$3.00/M in, $15.00/M out'."""
if not self.has_cost_data():
return "unknown"
parts = [f"${self.cost_input:.2f}/M in", f"${self.cost_output:.2f}/M out"]
if self.cost_cache_read is not None:
parts.append(f"cache read ${self.cost_cache_read:.2f}/M")
return ", ".join(parts)
def format_capabilities(self) -> str:
"""Human-readable capabilities, e.g. 'reasoning, tools, vision, PDF'."""
caps = []
if self.reasoning:
caps.append("reasoning")
if self.tool_call:
caps.append("tools")
if self.supports_vision():
caps.append("vision")
if self.supports_pdf():
caps.append("PDF")
if self.supports_audio_input():
caps.append("audio")
if self.structured_output:
caps.append("structured output")
if self.open_weights:
caps.append("open weights")
return ", ".join(caps) if caps else "basic"
@dataclass
class ProviderInfo:
"""Full metadata for a provider from models.dev."""
id: str # models.dev provider ID
name: str # display name
env: Tuple[str, ...] # env var names for API key
api: str # base URL
doc: str = "" # documentation URL
model_count: int = 0
def has_api_url(self) -> bool:
return bool(self.api)
# ---------------------------------------------------------------------------
# Provider ID mapping: Hermes ↔ models.dev
# ---------------------------------------------------------------------------
# Hermes provider names → models.dev provider IDs
PROVIDER_TO_MODELS_DEV: Dict[str, str] = {
"openrouter": "openrouter",
"anthropic": "anthropic",
"zai": "zai",
"kimi-coding": "kimi-for-coding",
"minimax": "minimax",
"minimax-cn": "minimax-cn",
"deepseek": "deepseek",
"alibaba": "alibaba",
"copilot": "github-copilot",
"ai-gateway": "vercel",
"opencode-zen": "opencode",
"opencode-go": "opencode-go",
"kilocode": "kilo",
"fireworks": "fireworks-ai",
"huggingface": "huggingface",
"google": "google",
"xai": "xai",
"nvidia": "nvidia",
"groq": "groq",
"mistral": "mistral",
"togetherai": "togetherai",
"perplexity": "perplexity",
"cohere": "cohere",
}
# Reverse mapping: models.dev → Hermes (built lazily)
_MODELS_DEV_TO_PROVIDER: Optional[Dict[str, str]] = None
def _get_reverse_mapping() -> Dict[str, str]:
"""Return models.dev ID → Hermes provider ID mapping."""
global _MODELS_DEV_TO_PROVIDER
if _MODELS_DEV_TO_PROVIDER is None:
_MODELS_DEV_TO_PROVIDER = {v: k for k, v in PROVIDER_TO_MODELS_DEV.items()}
return _MODELS_DEV_TO_PROVIDER
def _get_cache_path() -> Path:
"""Return path to disk cache file."""
env_val = os.environ.get("HERMES_HOME", "")
hermes_home = Path(env_val) if env_val else Path.home() / ".hermes"
return hermes_home / "models_dev_cache.json"
def _load_disk_cache() -> Dict[str, Any]:
"""Load models.dev data from disk cache."""
try:
cache_path = _get_cache_path()
if cache_path.exists():
with open(cache_path, encoding="utf-8") as f:
return json.load(f)
except Exception as e:
logger.debug("Failed to load models.dev disk cache: %s", e)
return {}
def _save_disk_cache(data: Dict[str, Any]) -> None:
"""Save models.dev data to disk cache atomically."""
try:
cache_path = _get_cache_path()
atomic_json_write(cache_path, data, indent=None, separators=(",", ":"))
except Exception as e:
logger.debug("Failed to save models.dev disk cache: %s", e)
def fetch_models_dev(force_refresh: bool = False) -> Dict[str, Any]:
"""Fetch models.dev registry. In-memory cache (1hr) + disk fallback.
Returns the full registry dict keyed by provider ID, or empty dict on failure.
"""
global _models_dev_cache, _models_dev_cache_time
# Check in-memory cache
if (
not force_refresh
and _models_dev_cache
and (time.time() - _models_dev_cache_time) < _MODELS_DEV_CACHE_TTL
):
return _models_dev_cache
# Try network fetch
try:
response = requests.get(MODELS_DEV_URL, timeout=15)
response.raise_for_status()
data = response.json()
if isinstance(data, dict) and len(data) > 0:
_models_dev_cache = data
_models_dev_cache_time = time.time()
_save_disk_cache(data)
logger.debug(
"Fetched models.dev registry: %d providers, %d total models",
len(data),
sum(len(p.get("models", {})) for p in data.values() if isinstance(p, dict)),
)
return data
except Exception as e:
logger.debug("Failed to fetch models.dev: %s", e)
# Fall back to disk cache — use a short TTL (5 min) so we retry
# the network fetch soon instead of serving stale data for a full hour.
if not _models_dev_cache:
_models_dev_cache = _load_disk_cache()
if _models_dev_cache:
_models_dev_cache_time = time.time() - _MODELS_DEV_CACHE_TTL + 300
logger.debug("Loaded models.dev from disk cache (%d providers)", len(_models_dev_cache))
return _models_dev_cache
def lookup_models_dev_context(provider: str, model: str) -> Optional[int]:
"""Look up context_length for a provider+model combo in models.dev.
Returns the context window in tokens, or None if not found.
Handles case-insensitive matching and filters out context=0 entries.
"""
mdev_provider_id = PROVIDER_TO_MODELS_DEV.get(provider)
if not mdev_provider_id:
return None
data = fetch_models_dev()
provider_data = data.get(mdev_provider_id)
if not isinstance(provider_data, dict):
return None
models = provider_data.get("models", {})
if not isinstance(models, dict):
return None
# Exact match
entry = models.get(model)
if entry:
ctx = _extract_context(entry)
if ctx:
return ctx
# Case-insensitive match
model_lower = model.lower()
for mid, mdata in models.items():
if mid.lower() == model_lower:
ctx = _extract_context(mdata)
if ctx:
return ctx
return None
def _extract_context(entry: Dict[str, Any]) -> Optional[int]:
"""Extract context_length from a models.dev model entry.
Returns None for invalid/zero values (some audio/image models have context=0).
"""
if not isinstance(entry, dict):
return None
limit = entry.get("limit")
if not isinstance(limit, dict):
return None
ctx = limit.get("context")
if isinstance(ctx, (int, float)) and ctx > 0:
return int(ctx)
return None
# ---------------------------------------------------------------------------
# Model capability metadata
# ---------------------------------------------------------------------------
@dataclass
class ModelCapabilities:
"""Structured capability metadata for a model from models.dev."""
supports_tools: bool = True
supports_vision: bool = False
supports_reasoning: bool = False
context_window: int = 200000
max_output_tokens: int = 8192
model_family: str = ""
def _get_provider_models(provider: str) -> Optional[Dict[str, Any]]:
"""Resolve a Hermes provider ID to its models dict from models.dev.
Returns the models dict or None if the provider is unknown or has no data.
"""
mdev_provider_id = PROVIDER_TO_MODELS_DEV.get(provider)
if not mdev_provider_id:
return None
data = fetch_models_dev()
provider_data = data.get(mdev_provider_id)
if not isinstance(provider_data, dict):
return None
models = provider_data.get("models", {})
if not isinstance(models, dict):
return None
return models
def _find_model_entry(models: Dict[str, Any], model: str) -> Optional[Dict[str, Any]]:
"""Find a model entry by exact match, then case-insensitive fallback."""
# Exact match
entry = models.get(model)
if isinstance(entry, dict):
return entry
# Case-insensitive match
model_lower = model.lower()
for mid, mdata in models.items():
if mid.lower() == model_lower and isinstance(mdata, dict):
return mdata
return None
def get_model_capabilities(provider: str, model: str) -> Optional[ModelCapabilities]:
"""Look up full capability metadata from models.dev cache.
Uses the existing fetch_models_dev() and PROVIDER_TO_MODELS_DEV mapping.
Returns None if model not found.
Extracts from model entry fields:
- reasoning (bool) → supports_reasoning
- tool_call (bool) → supports_tools
- attachment (bool) → supports_vision
- limit.context (int) → context_window
- limit.output (int) → max_output_tokens
- family (str) → model_family
"""
models = _get_provider_models(provider)
if models is None:
return None
entry = _find_model_entry(models, model)
if entry is None:
return None
# Extract capability flags (default to False if missing)
supports_tools = bool(entry.get("tool_call", False))
supports_vision = bool(entry.get("attachment", False))
supports_reasoning = bool(entry.get("reasoning", False))
# Extract limits
limit = entry.get("limit", {})
if not isinstance(limit, dict):
limit = {}
ctx = limit.get("context")
context_window = int(ctx) if isinstance(ctx, (int, float)) and ctx > 0 else 200000
out = limit.get("output")
max_output_tokens = int(out) if isinstance(out, (int, float)) and out > 0 else 8192
model_family = entry.get("family", "") or ""
return ModelCapabilities(
supports_tools=supports_tools,
supports_vision=supports_vision,
supports_reasoning=supports_reasoning,
context_window=context_window,
max_output_tokens=max_output_tokens,
model_family=model_family,
)
def list_provider_models(provider: str) -> List[str]:
"""Return all model IDs for a provider from models.dev.
Returns an empty list if the provider is unknown or has no data.
"""
models = _get_provider_models(provider)
if models is None:
return []
return list(models.keys())
def search_models_dev(
query: str, provider: str = None, limit: int = 5
) -> List[Dict[str, Any]]:
"""Fuzzy search across models.dev catalog. Returns matching model entries.
Args:
query: Search string to match against model IDs.
provider: Optional Hermes provider ID to restrict search scope.
If None, searches across all providers in PROVIDER_TO_MODELS_DEV.
limit: Maximum number of results to return.
Returns:
List of dicts, each containing 'provider', 'model_id', and the full
model 'entry' from models.dev.
"""
data = fetch_models_dev()
if not data:
return []
# Build list of (provider_id, model_id, entry) candidates
candidates: List[tuple] = []
if provider is not None:
# Search only the specified provider
mdev_provider_id = PROVIDER_TO_MODELS_DEV.get(provider)
if not mdev_provider_id:
return []
provider_data = data.get(mdev_provider_id, {})
if isinstance(provider_data, dict):
models = provider_data.get("models", {})
if isinstance(models, dict):
for mid, mdata in models.items():
candidates.append((provider, mid, mdata))
else:
# Search across all mapped providers
for hermes_prov, mdev_prov in PROVIDER_TO_MODELS_DEV.items():
provider_data = data.get(mdev_prov, {})
if isinstance(provider_data, dict):
models = provider_data.get("models", {})
if isinstance(models, dict):
for mid, mdata in models.items():
candidates.append((hermes_prov, mid, mdata))
if not candidates:
return []
# Use difflib for fuzzy matching — case-insensitive comparison
model_ids_lower = [c[1].lower() for c in candidates]
query_lower = query.lower()
# First try exact substring matches (more intuitive than pure edit-distance)
substring_matches = []
for prov, mid, mdata in candidates:
if query_lower in mid.lower():
substring_matches.append({"provider": prov, "model_id": mid, "entry": mdata})
# Then add difflib fuzzy matches for any remaining slots
fuzzy_ids = difflib.get_close_matches(
query_lower, model_ids_lower, n=limit * 2, cutoff=0.4
)
seen_ids: set = set()
results: List[Dict[str, Any]] = []
# Prioritize substring matches
for match in substring_matches:
key = (match["provider"], match["model_id"])
if key not in seen_ids:
seen_ids.add(key)
results.append(match)
if len(results) >= limit:
return results
# Add fuzzy matches
for fid in fuzzy_ids:
# Find original-case candidates matching this lowered ID
for prov, mid, mdata in candidates:
if mid.lower() == fid:
key = (prov, mid)
if key not in seen_ids:
seen_ids.add(key)
results.append({"provider": prov, "model_id": mid, "entry": mdata})
if len(results) >= limit:
return results
return results
# ---------------------------------------------------------------------------
# Rich dataclass constructors — parse raw models.dev JSON into dataclasses
# ---------------------------------------------------------------------------
def _parse_model_info(model_id: str, raw: Dict[str, Any], provider_id: str) -> ModelInfo:
"""Convert a raw models.dev model entry dict into a ModelInfo dataclass."""
limit = raw.get("limit") or {}
if not isinstance(limit, dict):
limit = {}
cost = raw.get("cost") or {}
if not isinstance(cost, dict):
cost = {}
modalities = raw.get("modalities") or {}
if not isinstance(modalities, dict):
modalities = {}
input_mods = modalities.get("input") or []
output_mods = modalities.get("output") or []
ctx = limit.get("context")
ctx_int = int(ctx) if isinstance(ctx, (int, float)) and ctx > 0 else 0
out = limit.get("output")
out_int = int(out) if isinstance(out, (int, float)) and out > 0 else 0
inp = limit.get("input")
inp_int = int(inp) if isinstance(inp, (int, float)) and inp > 0 else None
return ModelInfo(
id=model_id,
name=raw.get("name", "") or model_id,
family=raw.get("family", "") or "",
provider_id=provider_id,
reasoning=bool(raw.get("reasoning", False)),
tool_call=bool(raw.get("tool_call", False)),
attachment=bool(raw.get("attachment", False)),
temperature=bool(raw.get("temperature", False)),
structured_output=bool(raw.get("structured_output", False)),
open_weights=bool(raw.get("open_weights", False)),
input_modalities=tuple(input_mods) if isinstance(input_mods, list) else (),
output_modalities=tuple(output_mods) if isinstance(output_mods, list) else (),
context_window=ctx_int,
max_output=out_int,
max_input=inp_int,
cost_input=float(cost.get("input", 0) or 0),
cost_output=float(cost.get("output", 0) or 0),
cost_cache_read=float(cost["cache_read"]) if "cache_read" in cost and cost["cache_read"] is not None else None,
cost_cache_write=float(cost["cache_write"]) if "cache_write" in cost and cost["cache_write"] is not None else None,
knowledge_cutoff=raw.get("knowledge", "") or "",
release_date=raw.get("release_date", "") or "",
status=raw.get("status", "") or "",
interleaved=raw.get("interleaved", False),
)
def _parse_provider_info(provider_id: str, raw: Dict[str, Any]) -> ProviderInfo:
"""Convert a raw models.dev provider entry dict into a ProviderInfo."""
env = raw.get("env") or []
models = raw.get("models") or {}
return ProviderInfo(
id=provider_id,
name=raw.get("name", "") or provider_id,
env=tuple(env) if isinstance(env, list) else (),
api=raw.get("api", "") or "",
doc=raw.get("doc", "") or "",
model_count=len(models) if isinstance(models, dict) else 0,
)
# ---------------------------------------------------------------------------
# Provider-level queries
# ---------------------------------------------------------------------------
def get_provider_info(provider_id: str) -> Optional[ProviderInfo]:
"""Get full provider metadata from models.dev.
Accepts either a Hermes provider ID (e.g. "kilocode") or a models.dev
ID (e.g. "kilo"). Returns None if the provider is not in the catalog.
"""
# Resolve Hermes ID → models.dev ID
mdev_id = PROVIDER_TO_MODELS_DEV.get(provider_id, provider_id)
data = fetch_models_dev()
raw = data.get(mdev_id)
if not isinstance(raw, dict):
return None
return _parse_provider_info(mdev_id, raw)
def list_all_providers() -> Dict[str, ProviderInfo]:
"""Return all providers from models.dev as {provider_id: ProviderInfo}.
Returns the full catalog — 109+ providers. For providers that have
a Hermes alias, both the models.dev ID and the Hermes ID are included.
"""
data = fetch_models_dev()
result: Dict[str, ProviderInfo] = {}
for pid, pdata in data.items():
if isinstance(pdata, dict):
info = _parse_provider_info(pid, pdata)
result[pid] = info
return result
def get_providers_for_env_var(env_var: str) -> List[str]:
"""Reverse lookup: find all providers that use a given env var.
Useful for auto-detection: "user has ANTHROPIC_API_KEY set, which
providers does that enable?"
Returns list of models.dev provider IDs.
"""
data = fetch_models_dev()
matches: List[str] = []
for pid, pdata in data.items():
if isinstance(pdata, dict):
env = pdata.get("env", [])
if isinstance(env, list) and env_var in env:
matches.append(pid)
return matches
# ---------------------------------------------------------------------------
# Model-level queries (rich ModelInfo)
# ---------------------------------------------------------------------------
def get_model_info(
provider_id: str, model_id: str
) -> Optional[ModelInfo]:
"""Get full model metadata from models.dev.
Accepts Hermes or models.dev provider ID. Tries exact match then
case-insensitive fallback. Returns None if not found.
"""
mdev_id = PROVIDER_TO_MODELS_DEV.get(provider_id, provider_id)
data = fetch_models_dev()
pdata = data.get(mdev_id)
if not isinstance(pdata, dict):
return None
models = pdata.get("models", {})
if not isinstance(models, dict):
return None
# Exact match
raw = models.get(model_id)
if isinstance(raw, dict):
return _parse_model_info(model_id, raw, mdev_id)
# Case-insensitive fallback
model_lower = model_id.lower()
for mid, mdata in models.items():
if mid.lower() == model_lower and isinstance(mdata, dict):
return _parse_model_info(mid, mdata, mdev_id)
return None
def get_model_info_any_provider(model_id: str) -> Optional[ModelInfo]:
"""Search all providers for a model by ID.
Useful when you have a full slug like "anthropic/claude-sonnet-4.6" or
a bare name and want to find it anywhere. Checks Hermes-mapped providers
first, then falls back to all models.dev providers.
"""
data = fetch_models_dev()
# Try Hermes-mapped providers first (more likely what the user wants)
for hermes_id, mdev_id in PROVIDER_TO_MODELS_DEV.items():
pdata = data.get(mdev_id)
if not isinstance(pdata, dict):
continue
models = pdata.get("models", {})
if not isinstance(models, dict):
continue
raw = models.get(model_id)
if isinstance(raw, dict):
return _parse_model_info(model_id, raw, mdev_id)
# Case-insensitive
model_lower = model_id.lower()
for mid, mdata in models.items():
if mid.lower() == model_lower and isinstance(mdata, dict):
return _parse_model_info(mid, mdata, mdev_id)
# Fall back to ALL providers
for pid, pdata in data.items():
if pid in _get_reverse_mapping():
continue # already checked
if not isinstance(pdata, dict):
continue
models = pdata.get("models", {})
if not isinstance(models, dict):
continue
raw = models.get(model_id)
if isinstance(raw, dict):
return _parse_model_info(model_id, raw, pid)
return None
def list_provider_model_infos(provider_id: str) -> List[ModelInfo]:
"""Return all models for a provider as ModelInfo objects.
Filters out deprecated models by default.
"""
mdev_id = PROVIDER_TO_MODELS_DEV.get(provider_id, provider_id)
data = fetch_models_dev()
pdata = data.get(mdev_id)
if not isinstance(pdata, dict):
return []
models = pdata.get("models", {})
if not isinstance(models, dict):
return []
result: List[ModelInfo] = []
for mid, mdata in models.items():
if not isinstance(mdata, dict):
continue
status = mdata.get("status", "")
if status == "deprecated":
continue
result.append(_parse_model_info(mid, mdata, mdev_id))
return result

View File

@@ -4,12 +4,28 @@ All functions are stateless. AIAgent._build_system_prompt() calls these to
assemble pieces, then combines them with memory and ephemeral prompts.
"""
import json
import logging
import os
import re
import threading
from collections import OrderedDict
from pathlib import Path
from hermes_constants import get_hermes_home
from typing import Optional
from agent.skill_utils import (
extract_skill_conditions,
extract_skill_description,
get_all_skills_dirs,
get_disabled_skill_names,
iter_skill_index_files,
parse_frontmatter,
skill_matches_platform,
)
from utils import atomic_json_write
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
@@ -56,6 +72,61 @@ def _scan_context_content(content: str, filename: str) -> str:
return content
def _find_git_root(start: Path) -> Optional[Path]:
"""Walk *start* and its parents looking for a ``.git`` directory.
Returns the directory containing ``.git``, or ``None`` if we hit the
filesystem root without finding one.
"""
current = start.resolve()
for parent in [current, *current.parents]:
if (parent / ".git").exists():
return parent
return None
_HERMES_MD_NAMES = (".hermes.md", "HERMES.md")
def _find_hermes_md(cwd: Path) -> Optional[Path]:
"""Discover the nearest ``.hermes.md`` or ``HERMES.md``.
Search order: *cwd* first, then each parent directory up to (and
including) the git repository root. Returns the first match, or
``None`` if nothing is found.
"""
stop_at = _find_git_root(cwd)
current = cwd.resolve()
for directory in [current, *current.parents]:
for name in _HERMES_MD_NAMES:
candidate = directory / name
if candidate.is_file():
return candidate
# Stop walking at the git root (or filesystem root).
if stop_at and directory == stop_at:
break
return None
def _strip_yaml_frontmatter(content: str) -> str:
"""Remove optional YAML frontmatter (``---`` delimited) from *content*.
The frontmatter may contain structured config (model overrides, tool
settings) that will be handled separately in a future PR. For now we
strip it so only the human-readable markdown body is injected into the
system prompt.
"""
if content.startswith("---"):
end = content.find("\n---", 3)
if end != -1:
# Skip past the closing --- and any trailing newline
body = content[end + 4:].lstrip("\n")
return body if body else content
return content
# =========================================================================
# Constants
# =========================================================================
@@ -73,9 +144,15 @@ DEFAULT_AGENT_IDENTITY = (
MEMORY_GUIDANCE = (
"You have persistent memory across sessions. Save durable facts using the memory "
"tool: user preferences, environment details, tool quirks, and stable conventions. "
"Memory is injected into every turn, so keep it compact. Do NOT save task progress, "
"session outcomes, or completed-work logs to memory; use session_search to recall "
"those from past transcripts."
"Memory is injected into every turn, so keep it compact and focused on facts that "
"will still matter later.\n"
"Prioritize what reduces future user steering — the most valuable memory is one "
"that prevents the user from having to correct or remind you again. "
"User preferences and recurring corrections matter more than procedural task details.\n"
"Do NOT save task progress, session outcomes, completed-work logs, or temporary TODO "
"state to memory; use session_search to recall those from past transcripts. "
"If you've discovered a new way to do something, solved a problem that could be "
"necessary later, save it as a skill with the skill tool."
)
SESSION_SEARCH_GUIDANCE = (
@@ -86,10 +163,61 @@ SESSION_SEARCH_GUIDANCE = (
SKILLS_GUIDANCE = (
"After completing a complex task (5+ tool calls), fixing a tricky error, "
"or discovering a non-trivial workflow, consider saving the approach as a "
"skill with skill_manage so you can reuse it next time."
"or discovering a non-trivial workflow, save the approach as a "
"skill with skill_manage so you can reuse it next time.\n"
"When using a skill and finding it outdated, incomplete, or wrong, "
"patch it immediately with skill_manage(action='patch') — don't wait to be asked. "
"Skills that aren't maintained become liabilities."
)
TOOL_USE_ENFORCEMENT_GUIDANCE = (
"# Tool-use enforcement\n"
"You MUST use your tools to take action — do not describe what you would do "
"or plan to do without actually doing it. When you say you will perform an "
"action (e.g. 'I will run the tests', 'Let me check the file', 'I will create "
"the project'), you MUST immediately make the corresponding tool call in the same "
"response. Never end your turn with a promise of future action — execute it now.\n"
"Keep working until the task is actually complete. Do not stop with a summary of "
"what you plan to do next time. If you have tools available that can accomplish "
"the task, use them instead of telling the user what you would do.\n"
"Every response should either (a) contain tool calls that make progress, or "
"(b) deliver a final result to the user. Responses that only describe intentions "
"without acting are not acceptable."
)
# Model name substrings that trigger tool-use enforcement guidance.
# Add new patterns here when a model family needs explicit steering.
TOOL_USE_ENFORCEMENT_MODELS = ("gpt", "codex", "gemini", "gemma")
# Gemini/Gemma-specific operational guidance, adapted from OpenCode's gemini.txt.
# Injected alongside TOOL_USE_ENFORCEMENT_GUIDANCE when the model is Gemini or Gemma.
GOOGLE_MODEL_OPERATIONAL_GUIDANCE = (
"# Google model operational directives\n"
"Follow these operational rules strictly:\n"
"- **Absolute paths:** Always construct and use absolute file paths for all "
"file system operations. Combine the project root with relative paths.\n"
"- **Verify first:** Use read_file/search_files to check file contents and "
"project structure before making changes. Never guess at file contents.\n"
"- **Dependency checks:** Never assume a library is available. Check "
"package.json, requirements.txt, Cargo.toml, etc. before importing.\n"
"- **Conciseness:** Keep explanatory text brief — a few sentences, not "
"paragraphs. Focus on actions and results over narration.\n"
"- **Parallel tool calls:** When you need to perform multiple independent "
"operations (e.g. reading several files), make all the tool calls in a "
"single response rather than sequentially.\n"
"- **Non-interactive commands:** Use flags like -y, --yes, --non-interactive "
"to prevent CLI tools from hanging on prompts.\n"
"- **Keep going:** Work autonomously until the task is fully resolved. "
"Don't stop with a plan — execute it.\n"
)
# Model name substrings that should use the 'developer' role instead of
# 'system' for the system prompt. OpenAI's newer models (GPT-5, Codex)
# give stronger instruction-following weight to the 'developer' role.
# The swap happens at the API boundary in _build_api_kwargs() so internal
# message representation stays consistent ("system" everywhere).
DEVELOPER_ROLE_MODELS = ("gpt-5", "codex")
PLATFORM_HINTS = {
"whatsapp": (
"You are on a text messaging communication platform, WhatsApp. "
@@ -142,16 +270,21 @@ PLATFORM_HINTS = {
"contextually appropriate."
),
"cron": (
"You are running as a scheduled cron job. Your final response is automatically "
"delivered to the job's configured destination, so do not use send_message to "
"send to that same target again. If you want the user to receive something in "
"the scheduled destination, put it directly in your final response. Use "
"send_message only for additional or different targets."
"You are running as a scheduled cron job. There is no user present — you "
"cannot ask questions, request clarification, or wait for follow-up. Execute "
"the task fully and autonomously, making reasonable decisions where needed. "
"Your final response is automatically delivered to the job's configured "
"destination — put the primary content directly in your response."
),
"cli": (
"You are a CLI AI Agent. Try not to use markdown but simple text "
"renderable inside a terminal."
),
"sms": (
"You are communicating via SMS. Keep responses concise and use plain text "
"only — no markdown, no formatting. SMS messages are limited to ~1600 "
"characters, so be brief and direct."
),
}
CONTEXT_FILE_MAX_CHARS = 20_000
@@ -159,6 +292,111 @@ CONTEXT_TRUNCATE_HEAD_RATIO = 0.7
CONTEXT_TRUNCATE_TAIL_RATIO = 0.2
# =========================================================================
# Skills prompt cache
# =========================================================================
_SKILLS_PROMPT_CACHE_MAX = 8
_SKILLS_PROMPT_CACHE: OrderedDict[tuple, str] = OrderedDict()
_SKILLS_PROMPT_CACHE_LOCK = threading.Lock()
_SKILLS_SNAPSHOT_VERSION = 1
def _skills_prompt_snapshot_path() -> Path:
return get_hermes_home() / ".skills_prompt_snapshot.json"
def clear_skills_system_prompt_cache(*, clear_snapshot: bool = False) -> None:
"""Drop the in-process skills prompt cache (and optionally the disk snapshot)."""
with _SKILLS_PROMPT_CACHE_LOCK:
_SKILLS_PROMPT_CACHE.clear()
if clear_snapshot:
try:
_skills_prompt_snapshot_path().unlink(missing_ok=True)
except OSError as e:
logger.debug("Could not remove skills prompt snapshot: %s", e)
def _build_skills_manifest(skills_dir: Path) -> dict[str, list[int]]:
"""Build an mtime/size manifest of all SKILL.md and DESCRIPTION.md files."""
manifest: dict[str, list[int]] = {}
for filename in ("SKILL.md", "DESCRIPTION.md"):
for path in iter_skill_index_files(skills_dir, filename):
try:
st = path.stat()
except OSError:
continue
manifest[str(path.relative_to(skills_dir))] = [st.st_mtime_ns, st.st_size]
return manifest
def _load_skills_snapshot(skills_dir: Path) -> Optional[dict]:
"""Load the disk snapshot if it exists and its manifest still matches."""
snapshot_path = _skills_prompt_snapshot_path()
if not snapshot_path.exists():
return None
try:
snapshot = json.loads(snapshot_path.read_text(encoding="utf-8"))
except Exception:
return None
if not isinstance(snapshot, dict):
return None
if snapshot.get("version") != _SKILLS_SNAPSHOT_VERSION:
return None
if snapshot.get("manifest") != _build_skills_manifest(skills_dir):
return None
return snapshot
def _write_skills_snapshot(
skills_dir: Path,
manifest: dict[str, list[int]],
skill_entries: list[dict],
category_descriptions: dict[str, str],
) -> None:
"""Persist skill metadata to disk for fast cold-start reuse."""
payload = {
"version": _SKILLS_SNAPSHOT_VERSION,
"manifest": manifest,
"skills": skill_entries,
"category_descriptions": category_descriptions,
}
try:
atomic_json_write(_skills_prompt_snapshot_path(), payload)
except Exception as e:
logger.debug("Could not write skills prompt snapshot: %s", e)
def _build_snapshot_entry(
skill_file: Path,
skills_dir: Path,
frontmatter: dict,
description: str,
) -> dict:
"""Build a serialisable metadata dict for one skill."""
rel_path = skill_file.relative_to(skills_dir)
parts = rel_path.parts
if len(parts) >= 2:
skill_name = parts[-2]
category = "/".join(parts[:-2]) if len(parts) > 2 else parts[0]
else:
category = "general"
skill_name = skill_file.parent.name
platforms = frontmatter.get("platforms") or []
if isinstance(platforms, str):
platforms = [platforms]
return {
"skill_name": skill_name,
"category": category,
"frontmatter_name": str(frontmatter.get("name", skill_name)),
"description": description,
"platforms": [str(p).strip() for p in platforms if str(p).strip()],
"conditions": extract_skill_conditions(frontmatter),
}
# =========================================================================
# Skills index
# =========================================================================
@@ -170,22 +408,13 @@ def _parse_skill_file(skill_file: Path) -> tuple[bool, dict, str]:
(True, {}, "") to err on the side of showing the skill.
"""
try:
from tools.skills_tool import _parse_frontmatter, skill_matches_platform
raw = skill_file.read_text(encoding="utf-8")[:2000]
frontmatter, _ = _parse_frontmatter(raw)
frontmatter, _ = parse_frontmatter(raw)
if not skill_matches_platform(frontmatter):
return False, {}, ""
return False, frontmatter, ""
desc = ""
raw_desc = frontmatter.get("description", "")
if raw_desc:
desc = str(raw_desc).strip().strip("'\"")
if len(desc) > 60:
desc = desc[:57] + "..."
return True, frontmatter, desc
return True, frontmatter, extract_skill_description(frontmatter)
except Exception as e:
logger.debug("Failed to parse skill file %s: %s", skill_file, e)
return True, {}, ""
@@ -194,16 +423,9 @@ def _parse_skill_file(skill_file: Path) -> tuple[bool, dict, str]:
def _read_skill_conditions(skill_file: Path) -> dict:
"""Extract conditional activation fields from SKILL.md frontmatter."""
try:
from tools.skills_tool import _parse_frontmatter
raw = skill_file.read_text(encoding="utf-8")[:2000]
frontmatter, _ = _parse_frontmatter(raw)
hermes = frontmatter.get("metadata", {}).get("hermes", {})
return {
"fallback_for_toolsets": hermes.get("fallback_for_toolsets", []),
"requires_toolsets": hermes.get("requires_toolsets", []),
"fallback_for_tools": hermes.get("fallback_for_tools", []),
"requires_tools": hermes.get("requires_tools", []),
}
frontmatter, _ = parse_frontmatter(raw)
return extract_skill_conditions(frontmatter)
except Exception as e:
logger.debug("Failed to read skill conditions from %s: %s", skill_file, e)
return {}
@@ -246,93 +468,285 @@ def build_skills_system_prompt(
) -> str:
"""Build a compact skill index for the system prompt.
Scans ~/.hermes/skills/ for SKILL.md files grouped by category.
Includes per-skill descriptions from frontmatter so the model can
match skills by meaning, not just name.
Filters out skills incompatible with the current OS platform.
Two-layer cache:
1. In-process LRU dict keyed by (skills_dir, tools, toolsets)
2. Disk snapshot (``.skills_prompt_snapshot.json``) validated by
mtime/size manifest — survives process restarts
Falls back to a full filesystem scan when both layers miss.
External skill directories (``skills.external_dirs`` in config.yaml) are
scanned alongside the local ``~/.hermes/skills/`` directory. External dirs
are read-only — they appear in the index but new skills are always created
in the local dir. Local skills take precedence when names collide.
"""
hermes_home = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
hermes_home = get_hermes_home()
skills_dir = hermes_home / "skills"
external_dirs = get_all_skills_dirs()[1:] # skip local (index 0)
if not skills_dir.exists():
if not skills_dir.exists() and not external_dirs:
return ""
# Collect skills with descriptions, grouped by category.
# Each entry: (skill_name, description)
# Supports sub-categories: skills/mlops/training/axolotl/SKILL.md
# -> category "mlops/training", skill "axolotl"
# ── Layer 1: in-process LRU cache ─────────────────────────────────
# Include the resolved platform so per-platform disabled-skill lists
# produce distinct cache entries (gateway serves multiple platforms).
_platform_hint = (
os.environ.get("HERMES_PLATFORM")
or os.environ.get("HERMES_SESSION_PLATFORM")
or ""
)
cache_key = (
str(skills_dir.resolve()),
tuple(str(d) for d in external_dirs),
tuple(sorted(str(t) for t in (available_tools or set()))),
tuple(sorted(str(ts) for ts in (available_toolsets or set()))),
_platform_hint,
)
with _SKILLS_PROMPT_CACHE_LOCK:
cached = _SKILLS_PROMPT_CACHE.get(cache_key)
if cached is not None:
_SKILLS_PROMPT_CACHE.move_to_end(cache_key)
return cached
disabled = get_disabled_skill_names()
# ── Layer 2: disk snapshot ────────────────────────────────────────
snapshot = _load_skills_snapshot(skills_dir)
skills_by_category: dict[str, list[tuple[str, str]]] = {}
for skill_file in skills_dir.rglob("SKILL.md"):
is_compatible, _, desc = _parse_skill_file(skill_file)
if not is_compatible:
continue
# Skip skills whose conditional activation rules exclude them
conditions = _read_skill_conditions(skill_file)
if not _skill_should_show(conditions, available_tools, available_toolsets):
continue
rel_path = skill_file.relative_to(skills_dir)
parts = rel_path.parts
if len(parts) >= 2:
# Category is everything between skills_dir and the skill folder
# e.g. parts = ("mlops", "training", "axolotl", "SKILL.md")
# → category = "mlops/training", skill_name = "axolotl"
# e.g. parts = ("github", "github-auth", "SKILL.md")
# → category = "github", skill_name = "github-auth"
skill_name = parts[-2]
category = "/".join(parts[:-2]) if len(parts) > 2 else parts[0]
else:
category = "general"
skill_name = skill_file.parent.name
skills_by_category.setdefault(category, []).append((skill_name, desc))
category_descriptions: dict[str, str] = {}
if not skills_by_category:
return ""
if snapshot is not None:
# Fast path: use pre-parsed metadata from disk
for entry in snapshot.get("skills", []):
if not isinstance(entry, dict):
continue
skill_name = entry.get("skill_name") or ""
category = entry.get("category") or "general"
frontmatter_name = entry.get("frontmatter_name") or skill_name
platforms = entry.get("platforms") or []
if not skill_matches_platform({"platforms": platforms}):
continue
if frontmatter_name in disabled or skill_name in disabled:
continue
if not _skill_should_show(
entry.get("conditions") or {},
available_tools,
available_toolsets,
):
continue
skills_by_category.setdefault(category, []).append(
(skill_name, entry.get("description", ""))
)
category_descriptions = {
str(k): str(v)
for k, v in (snapshot.get("category_descriptions") or {}).items()
}
else:
# Cold path: full filesystem scan + write snapshot for next time
skill_entries: list[dict] = []
for skill_file in iter_skill_index_files(skills_dir, "SKILL.md"):
is_compatible, frontmatter, desc = _parse_skill_file(skill_file)
entry = _build_snapshot_entry(skill_file, skills_dir, frontmatter, desc)
skill_entries.append(entry)
if not is_compatible:
continue
skill_name = entry["skill_name"]
if entry["frontmatter_name"] in disabled or skill_name in disabled:
continue
if not _skill_should_show(
extract_skill_conditions(frontmatter),
available_tools,
available_toolsets,
):
continue
skills_by_category.setdefault(entry["category"], []).append(
(skill_name, entry["description"])
)
# Read category-level descriptions from DESCRIPTION.md
# Checks both the exact category path and parent directories
category_descriptions = {}
for category in skills_by_category:
cat_path = Path(category)
desc_file = skills_dir / cat_path / "DESCRIPTION.md"
if desc_file.exists():
# Read category-level DESCRIPTION.md files
for desc_file in iter_skill_index_files(skills_dir, "DESCRIPTION.md"):
try:
content = desc_file.read_text(encoding="utf-8")
match = re.search(r"^---\s*\n.*?description:\s*(.+?)\s*\n.*?^---", content, re.MULTILINE | re.DOTALL)
if match:
category_descriptions[category] = match.group(1).strip()
fm, _ = parse_frontmatter(content)
cat_desc = fm.get("description")
if not cat_desc:
continue
rel = desc_file.relative_to(skills_dir)
cat = "/".join(rel.parts[:-1]) if len(rel.parts) > 1 else "general"
category_descriptions[cat] = str(cat_desc).strip().strip("'\"")
except Exception as e:
logger.debug("Could not read skill description %s: %s", desc_file, e)
index_lines = []
for category in sorted(skills_by_category.keys()):
cat_desc = category_descriptions.get(category, "")
if cat_desc:
index_lines.append(f" {category}: {cat_desc}")
else:
index_lines.append(f" {category}:")
# Deduplicate and sort skills within each category
seen = set()
for name, desc in sorted(skills_by_category[category], key=lambda x: x[0]):
if name in seen:
continue
seen.add(name)
if desc:
index_lines.append(f" - {name}: {desc}")
else:
index_lines.append(f" - {name}")
_write_skills_snapshot(
skills_dir,
_build_skills_manifest(skills_dir),
skill_entries,
category_descriptions,
)
return (
"## Skills (mandatory)\n"
"Before replying, scan the skills below. If one clearly matches your task, "
"load it with skill_view(name) and follow its instructions. "
"If a skill has issues, fix it with skill_manage(action='patch').\n"
"\n"
"<available_skills>\n"
+ "\n".join(index_lines) + "\n"
"</available_skills>\n"
"\n"
"If none match, proceed normally without loading a skill."
# ── External skill directories ─────────────────────────────────────
# Scan external dirs directly (no snapshot caching — they're read-only
# and typically small). Local skills already in skills_by_category take
# precedence: we track seen names and skip duplicates from external dirs.
seen_skill_names: set[str] = set()
for cat_skills in skills_by_category.values():
for name, _desc in cat_skills:
seen_skill_names.add(name)
for ext_dir in external_dirs:
if not ext_dir.exists():
continue
for skill_file in iter_skill_index_files(ext_dir, "SKILL.md"):
try:
is_compatible, frontmatter, desc = _parse_skill_file(skill_file)
if not is_compatible:
continue
entry = _build_snapshot_entry(skill_file, ext_dir, frontmatter, desc)
skill_name = entry["skill_name"]
if skill_name in seen_skill_names:
continue
if entry["frontmatter_name"] in disabled or skill_name in disabled:
continue
if not _skill_should_show(
extract_skill_conditions(frontmatter),
available_tools,
available_toolsets,
):
continue
seen_skill_names.add(skill_name)
skills_by_category.setdefault(entry["category"], []).append(
(skill_name, entry["description"])
)
except Exception as e:
logger.debug("Error reading external skill %s: %s", skill_file, e)
# External category descriptions
for desc_file in iter_skill_index_files(ext_dir, "DESCRIPTION.md"):
try:
content = desc_file.read_text(encoding="utf-8")
fm, _ = parse_frontmatter(content)
cat_desc = fm.get("description")
if not cat_desc:
continue
rel = desc_file.relative_to(ext_dir)
cat = "/".join(rel.parts[:-1]) if len(rel.parts) > 1 else "general"
category_descriptions.setdefault(cat, str(cat_desc).strip().strip("'\""))
except Exception as e:
logger.debug("Could not read external skill description %s: %s", desc_file, e)
if not skills_by_category:
result = ""
else:
index_lines = []
for category in sorted(skills_by_category.keys()):
cat_desc = category_descriptions.get(category, "")
if cat_desc:
index_lines.append(f" {category}: {cat_desc}")
else:
index_lines.append(f" {category}:")
# Deduplicate and sort skills within each category
seen = set()
for name, desc in sorted(skills_by_category[category], key=lambda x: x[0]):
if name in seen:
continue
seen.add(name)
if desc:
index_lines.append(f" - {name}: {desc}")
else:
index_lines.append(f" - {name}")
result = (
"## Skills (mandatory)\n"
"Before replying, scan the skills below. If one clearly matches your task, "
"load it with skill_view(name) and follow its instructions. "
"If a skill has issues, fix it with skill_manage(action='patch').\n"
"After difficult/iterative tasks, offer to save as a skill. "
"If a skill you loaded was missing steps, had wrong commands, or needed "
"pitfalls you discovered, update it before finishing.\n"
"\n"
"<available_skills>\n"
+ "\n".join(index_lines) + "\n"
"</available_skills>\n"
"\n"
"If none match, proceed normally without loading a skill."
)
# ── Store in LRU cache ────────────────────────────────────────────
with _SKILLS_PROMPT_CACHE_LOCK:
_SKILLS_PROMPT_CACHE[cache_key] = result
_SKILLS_PROMPT_CACHE.move_to_end(cache_key)
while len(_SKILLS_PROMPT_CACHE) > _SKILLS_PROMPT_CACHE_MAX:
_SKILLS_PROMPT_CACHE.popitem(last=False)
return result
def build_nous_subscription_prompt(valid_tool_names: "set[str] | None" = None) -> str:
"""Build a compact Nous subscription capability block for the system prompt."""
try:
from hermes_cli.nous_subscription import get_nous_subscription_features
from tools.tool_backend_helpers import managed_nous_tools_enabled
except Exception as exc:
logger.debug("Failed to import Nous subscription helper: %s", exc)
return ""
if not managed_nous_tools_enabled():
return ""
valid_names = set(valid_tool_names or set())
relevant_tool_names = {
"web_search",
"web_extract",
"browser_navigate",
"browser_snapshot",
"browser_click",
"browser_type",
"browser_scroll",
"browser_console",
"browser_close",
"browser_press",
"browser_get_images",
"browser_vision",
"image_generate",
"text_to_speech",
"terminal",
"process",
"execute_code",
}
if valid_names and not (valid_names & relevant_tool_names):
return ""
features = get_nous_subscription_features()
def _status_line(feature) -> str:
if feature.managed_by_nous:
return f"- {feature.label}: active via Nous subscription"
if feature.active:
current = feature.current_provider or "configured provider"
return f"- {feature.label}: currently using {current}"
if feature.included_by_default and features.nous_auth_present:
return f"- {feature.label}: included with Nous subscription, not currently selected"
if feature.key == "modal" and features.nous_auth_present:
return f"- {feature.label}: optional via Nous subscription"
return f"- {feature.label}: not currently available"
lines = [
"# Nous Subscription",
"Nous subscription includes managed web tools (Firecrawl), image generation (FAL), OpenAI TTS, and browser automation (Browserbase) by default. Modal execution is optional.",
"Current capability status:",
]
lines.extend(_status_line(feature) for feature in features.items())
lines.extend(
[
"When a Nous-managed feature is active, do not ask the user for Firecrawl, FAL, OpenAI TTS, or Browserbase API keys.",
"If the user is not subscribed and asks for a capability that Nous subscription would unlock or simplify, suggest Nous subscription as one option alongside direct setup or local alternatives.",
"Do not mention subscription unless the user asks about it or it directly solves the current missing capability.",
"Useful commands: hermes setup, hermes setup tools, hermes setup terminal, hermes status.",
]
)
return "\n".join(lines)
# =========================================================================
@@ -351,51 +765,91 @@ def _truncate_content(content: str, filename: str, max_chars: int = CONTEXT_FILE
return head + marker + tail
def build_context_files_prompt(cwd: Optional[str] = None) -> str:
"""Discover and load context files for the system prompt.
def load_soul_md() -> Optional[str]:
"""Load SOUL.md from HERMES_HOME and return its content, or None.
Discovery: AGENTS.md (recursive), .cursorrules / .cursor/rules/*.mdc,
and SOUL.md from HERMES_HOME only. Each capped at 20,000 chars.
Used as the agent identity (slot #1 in the system prompt). When this
returns content, ``build_context_files_prompt`` should be called with
``skip_soul=True`` so SOUL.md isn't injected twice.
"""
if cwd is None:
cwd = os.getcwd()
try:
from hermes_cli.config import ensure_hermes_home
ensure_hermes_home()
except Exception as e:
logger.debug("Could not ensure HERMES_HOME before loading SOUL.md: %s", e)
cwd_path = Path(cwd).resolve()
sections = []
soul_path = get_hermes_home() / "SOUL.md"
if not soul_path.exists():
return None
try:
content = soul_path.read_text(encoding="utf-8").strip()
if not content:
return None
content = _scan_context_content(content, "SOUL.md")
content = _truncate_content(content, "SOUL.md")
return content
except Exception as e:
logger.debug("Could not read SOUL.md from %s: %s", soul_path, e)
return None
# AGENTS.md (hierarchical, recursive)
top_level_agents = None
def _load_hermes_md(cwd_path: Path) -> str:
""".hermes.md / HERMES.md — walk to git root."""
hermes_md_path = _find_hermes_md(cwd_path)
if not hermes_md_path:
return ""
try:
content = hermes_md_path.read_text(encoding="utf-8").strip()
if not content:
return ""
content = _strip_yaml_frontmatter(content)
rel = hermes_md_path.name
try:
rel = str(hermes_md_path.relative_to(cwd_path))
except ValueError:
pass
content = _scan_context_content(content, rel)
result = f"## {rel}\n\n{content}"
return _truncate_content(result, ".hermes.md")
except Exception as e:
logger.debug("Could not read %s: %s", hermes_md_path, e)
return ""
def _load_agents_md(cwd_path: Path) -> str:
"""AGENTS.md — top-level only (no recursive walk)."""
for name in ["AGENTS.md", "agents.md"]:
candidate = cwd_path / name
if candidate.exists():
top_level_agents = candidate
break
if top_level_agents:
agents_files = []
for root, dirs, files in os.walk(cwd_path):
dirs[:] = [d for d in dirs if not d.startswith('.') and d not in ('node_modules', '__pycache__', 'venv', '.venv')]
for f in files:
if f.lower() == "agents.md":
agents_files.append(Path(root) / f)
agents_files.sort(key=lambda p: len(p.parts))
total_agents_content = ""
for agents_path in agents_files:
try:
content = agents_path.read_text(encoding="utf-8").strip()
content = candidate.read_text(encoding="utf-8").strip()
if content:
rel_path = agents_path.relative_to(cwd_path)
content = _scan_context_content(content, str(rel_path))
total_agents_content += f"## {rel_path}\n\n{content}\n\n"
content = _scan_context_content(content, name)
result = f"## {name}\n\n{content}"
return _truncate_content(result, "AGENTS.md")
except Exception as e:
logger.debug("Could not read %s: %s", agents_path, e)
logger.debug("Could not read %s: %s", candidate, e)
return ""
if total_agents_content:
total_agents_content = _truncate_content(total_agents_content, "AGENTS.md")
sections.append(total_agents_content)
# .cursorrules
def _load_claude_md(cwd_path: Path) -> str:
"""CLAUDE.md / claude.md — cwd only."""
for name in ["CLAUDE.md", "claude.md"]:
candidate = cwd_path / name
if candidate.exists():
try:
content = candidate.read_text(encoding="utf-8").strip()
if content:
content = _scan_context_content(content, name)
result = f"## {name}\n\n{content}"
return _truncate_content(result, "CLAUDE.md")
except Exception as e:
logger.debug("Could not read %s: %s", candidate, e)
return ""
def _load_cursorrules(cwd_path: Path) -> str:
""".cursorrules + .cursor/rules/*.mdc — cwd only."""
cursorrules_content = ""
cursorrules_file = cwd_path / ".cursorrules"
if cursorrules_file.exists():
@@ -419,27 +873,47 @@ def build_context_files_prompt(cwd: Optional[str] = None) -> str:
except Exception as e:
logger.debug("Could not read %s: %s", mdc_file, e)
if cursorrules_content:
cursorrules_content = _truncate_content(cursorrules_content, ".cursorrules")
sections.append(cursorrules_content)
if not cursorrules_content:
return ""
return _truncate_content(cursorrules_content, ".cursorrules")
# SOUL.md from HERMES_HOME only
try:
from hermes_cli.config import ensure_hermes_home
ensure_hermes_home()
except Exception as e:
logger.debug("Could not ensure HERMES_HOME before loading SOUL.md: %s", e)
soul_path = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes")) / "SOUL.md"
if soul_path.exists():
try:
content = soul_path.read_text(encoding="utf-8").strip()
if content:
content = _scan_context_content(content, "SOUL.md")
content = _truncate_content(content, "SOUL.md")
sections.append(content)
except Exception as e:
logger.debug("Could not read SOUL.md from %s: %s", soul_path, e)
def build_context_files_prompt(cwd: Optional[str] = None, skip_soul: bool = False) -> str:
"""Discover and load context files for the system prompt.
Priority (first found wins — only ONE project context type is loaded):
1. .hermes.md / HERMES.md (walk to git root)
2. AGENTS.md / agents.md (cwd only)
3. CLAUDE.md / claude.md (cwd only)
4. .cursorrules / .cursor/rules/*.mdc (cwd only)
SOUL.md from HERMES_HOME is independent and always included when present.
Each context source is capped at 20,000 chars.
When *skip_soul* is True, SOUL.md is not included here (it was already
loaded via ``load_soul_md()`` for the identity slot).
"""
if cwd is None:
cwd = os.getcwd()
cwd_path = Path(cwd).resolve()
sections = []
# Priority-based project context: first match wins
project_context = (
_load_hermes_md(cwd_path)
or _load_agents_md(cwd_path)
or _load_claude_md(cwd_path)
or _load_cursorrules(cwd_path)
)
if project_context:
sections.append(project_context)
# SOUL.md from HERMES_HOME only — skip when already loaded as identity
if not skip_soul:
soul_content = load_soul_md()
if soul_content:
sections.append(soul_content)
if not sections:
return ""

View File

@@ -12,13 +12,14 @@ import copy
from typing import Any, Dict, List
def _apply_cache_marker(msg: dict, cache_marker: dict) -> None:
def _apply_cache_marker(msg: dict, cache_marker: dict, native_anthropic: bool = False) -> None:
"""Add cache_control to a single message, handling all format variations."""
role = msg.get("role", "")
content = msg.get("content")
if role == "tool":
msg["cache_control"] = cache_marker
if native_anthropic:
msg["cache_control"] = cache_marker
return
if content is None or content == "":
@@ -40,6 +41,7 @@ def _apply_cache_marker(msg: dict, cache_marker: dict) -> None:
def apply_anthropic_cache_control(
api_messages: List[Dict[str, Any]],
cache_ttl: str = "5m",
native_anthropic: bool = False,
) -> List[Dict[str, Any]]:
"""Apply system_and_3 caching strategy to messages for Anthropic models.
@@ -59,12 +61,12 @@ def apply_anthropic_cache_control(
breakpoints_used = 0
if messages[0].get("role") == "system":
_apply_cache_marker(messages[0], marker)
_apply_cache_marker(messages[0], marker, native_anthropic=native_anthropic)
breakpoints_used += 1
remaining = 4 - breakpoints_used
non_sys = [i for i in range(len(messages)) if messages[i].get("role") != "system"]
for idx in non_sys[-remaining:]:
_apply_cache_marker(messages[idx], marker)
_apply_cache_marker(messages[idx], marker, native_anthropic=native_anthropic)
return messages

View File

@@ -13,11 +13,19 @@ import re
logger = logging.getLogger(__name__)
# Snapshot at import time so runtime env mutations (e.g. LLM-generated
# `export HERMES_REDACT_SECRETS=false`) cannot disable redaction mid-session.
_REDACT_ENABLED = os.getenv("HERMES_REDACT_SECRETS", "").lower() not in ("0", "false", "no", "off")
# Known API key prefixes -- match the prefix + contiguous token chars
_PREFIX_PATTERNS = [
r"sk-[A-Za-z0-9_-]{10,}", # OpenAI / OpenRouter / Anthropic (sk-ant-*)
r"ghp_[A-Za-z0-9]{10,}", # GitHub PAT (classic)
r"github_pat_[A-Za-z0-9_]{10,}", # GitHub PAT (fine-grained)
r"gho_[A-Za-z0-9]{10,}", # GitHub OAuth access token
r"ghu_[A-Za-z0-9]{10,}", # GitHub user-to-server token
r"ghs_[A-Za-z0-9]{10,}", # GitHub server-to-server token
r"ghr_[A-Za-z0-9]{10,}", # GitHub refresh token
r"xox[baprs]-[A-Za-z0-9-]{10,}", # Slack tokens
r"AIza[A-Za-z0-9_-]{30,}", # Google API keys
r"pplx-[A-Za-z0-9]{10,}", # Perplexity
@@ -37,13 +45,15 @@ _PREFIX_PATTERNS = [
r"dop_v1_[A-Za-z0-9]{10,}", # DigitalOcean PAT
r"doo_v1_[A-Za-z0-9]{10,}", # DigitalOcean OAuth
r"am_[A-Za-z0-9_-]{10,}", # AgentMail API key
r"sk_[A-Za-z0-9_]{10,}", # ElevenLabs TTS key (sk_ underscore, not sk- dash)
r"tvly-[A-Za-z0-9]{10,}", # Tavily search API key
r"exa_[A-Za-z0-9]{10,}", # Exa search API key
]
# ENV assignment patterns: KEY=value where KEY contains a secret-like name
_SECRET_ENV_NAMES = r"(?:API_?KEY|TOKEN|SECRET|PASSWORD|PASSWD|CREDENTIAL|AUTH)"
_ENV_ASSIGN_RE = re.compile(
rf"([A-Z_]*{_SECRET_ENV_NAMES}[A-Z_]*)\s*=\s*(['\"]?)(\S+)\2",
re.IGNORECASE,
rf"([A-Z0-9_]{{0,50}}{_SECRET_ENV_NAMES}[A-Z0-9_]{{0,50}})\s*=\s*(['\"]?)(\S+)\2",
)
# JSON field patterns: "apiKey": "value", "token": "value", etc.
@@ -100,9 +110,13 @@ def redact_sensitive_text(text: str) -> str:
Safe to call on any string -- non-matching text passes through unchanged.
Disabled when security.redact_secrets is false in config.yaml.
"""
if text is None:
return None
if not isinstance(text, str):
text = str(text)
if not text:
return text
if os.getenv("HERMES_REDACT_SECRETS", "").lower() in ("0", "false", "no", "off"):
if not _REDACT_ENABLED:
return text
# Known prefixes (sk-, ghp_, etc.)

View File

@@ -1,17 +1,42 @@
"""Skill slash commands — scan installed skills and build invocation messages.
"""Shared slash command helpers for skills and built-in prompt-style modes.
Shared between CLI (cli.py) and gateway (gateway/run.py) so both surfaces
can invoke skills via /skill-name commands.
can invoke skills via /skill-name commands and prompt-only built-ins like
/plan.
"""
import json
import logging
import re
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, Optional
logger = logging.getLogger(__name__)
_skill_commands: Dict[str, Dict[str, Any]] = {}
_PLAN_SLUG_RE = re.compile(r"[^a-z0-9]+")
def build_plan_path(
user_instruction: str = "",
*,
now: datetime | None = None,
) -> Path:
"""Return the default workspace-relative markdown path for a /plan invocation.
Relative paths are intentional: file tools are task/backend-aware and resolve
them against the active working directory for local, docker, ssh, modal,
daytona, and similar terminal backends. That keeps the plan with the active
workspace instead of the Hermes host's global home directory.
"""
slug_source = (user_instruction or "").strip().splitlines()[0] if user_instruction else ""
slug = _PLAN_SLUG_RE.sub("-", slug_source.lower()).strip("-")
if slug:
slug = "-".join(part for part in slug.split("-")[:8] if part)[:48].strip("-")
slug = slug or "conversation-plan"
timestamp = (now or datetime.now()).strftime("%Y-%m-%d_%H%M%S")
return Path(".hermes") / "plans" / f"{timestamp}-{slug}.md"
def _load_skill_payload(skill_identifier: str, task_id: str | None = None) -> tuple[dict[str, Any], Path | None, str] | None:
@@ -56,6 +81,7 @@ def _build_skill_message(
skill_dir: Path | None,
activation_note: str,
user_instruction: str = "",
runtime_note: str = "",
) -> str:
"""Format a loaded skill into a user/system message payload."""
from tools.skills_tool import SKILLS_DIR
@@ -102,7 +128,11 @@ def _build_skill_message(
supporting.append(rel)
if supporting and skill_dir:
skill_view_target = str(skill_dir.relative_to(SKILLS_DIR))
try:
skill_view_target = str(skill_dir.relative_to(SKILLS_DIR))
except ValueError:
# Skill is from an external dir — use the skill name instead
skill_view_target = skill_dir.name
parts.append("")
parts.append("[This skill has supporting files you can load with the skill_view tool:]")
for sf in supporting:
@@ -115,6 +145,10 @@ def _build_skill_message(
parts.append("")
parts.append(f"The user has provided the following instruction alongside the skill invocation: {user_instruction}")
if runtime_note:
parts.append("")
parts.append(f"[Runtime note: {runtime_note}]")
return "\n".join(parts)
@@ -127,35 +161,50 @@ def scan_skill_commands() -> Dict[str, Dict[str, Any]]:
global _skill_commands
_skill_commands = {}
try:
from tools.skills_tool import SKILLS_DIR, _parse_frontmatter, skill_matches_platform
if not SKILLS_DIR.exists():
return _skill_commands
for skill_md in SKILLS_DIR.rglob("SKILL.md"):
if any(part in ('.git', '.github', '.hub') for part in skill_md.parts):
continue
try:
content = skill_md.read_text(encoding='utf-8')
frontmatter, body = _parse_frontmatter(content)
# Skip skills incompatible with the current OS platform
if not skill_matches_platform(frontmatter):
from tools.skills_tool import SKILLS_DIR, _parse_frontmatter, skill_matches_platform, _get_disabled_skill_names
from agent.skill_utils import get_external_skills_dirs
disabled = _get_disabled_skill_names()
seen_names: set = set()
# Scan local dir first, then external dirs
dirs_to_scan = []
if SKILLS_DIR.exists():
dirs_to_scan.append(SKILLS_DIR)
dirs_to_scan.extend(get_external_skills_dirs())
for scan_dir in dirs_to_scan:
for skill_md in scan_dir.rglob("SKILL.md"):
if any(part in ('.git', '.github', '.hub') for part in skill_md.parts):
continue
try:
content = skill_md.read_text(encoding='utf-8')
frontmatter, body = _parse_frontmatter(content)
# Skip skills incompatible with the current OS platform
if not skill_matches_platform(frontmatter):
continue
name = frontmatter.get('name', skill_md.parent.name)
if name in seen_names:
continue
# Respect user's disabled skills config
if name in disabled:
continue
description = frontmatter.get('description', '')
if not description:
for line in body.strip().split('\n'):
line = line.strip()
if line and not line.startswith('#'):
description = line[:80]
break
seen_names.add(name)
cmd_name = name.lower().replace(' ', '-').replace('_', '-')
_skill_commands[f"/{cmd_name}"] = {
"name": name,
"description": description or f"Invoke the {name} skill",
"skill_md_path": str(skill_md),
"skill_dir": str(skill_md.parent),
}
except Exception:
continue
name = frontmatter.get('name', skill_md.parent.name)
description = frontmatter.get('description', '')
if not description:
for line in body.strip().split('\n'):
line = line.strip()
if line and not line.startswith('#'):
description = line[:80]
break
cmd_name = name.lower().replace(' ', '-').replace('_', '-')
_skill_commands[f"/{cmd_name}"] = {
"name": name,
"description": description or f"Invoke the {name} skill",
"skill_md_path": str(skill_md),
"skill_dir": str(skill_md.parent),
}
except Exception:
continue
except Exception:
pass
return _skill_commands
@@ -168,10 +217,30 @@ def get_skill_commands() -> Dict[str, Dict[str, Any]]:
return _skill_commands
def resolve_skill_command_key(command: str) -> Optional[str]:
"""Resolve a user-typed /command to its canonical skill_cmds key.
Skills are always stored with hyphens — ``scan_skill_commands`` normalizes
spaces and underscores to hyphens when building the key. Hyphens and
underscores are treated interchangeably in user input: this matches
``_check_unavailable_skill`` and accommodates Telegram bot-command names
(which disallow hyphens, so ``/claude-code`` is registered as
``/claude_code`` and comes back in the underscored form).
Returns the matching ``/slug`` key from ``get_skill_commands()`` or
``None`` if no match.
"""
if not command:
return None
cmd_key = f"/{command.replace('_', '-')}"
return cmd_key if cmd_key in get_skill_commands() else None
def build_skill_invocation_message(
cmd_key: str,
user_instruction: str = "",
task_id: str | None = None,
runtime_note: str = "",
) -> Optional[str]:
"""Build the user message content for a skill slash command invocation.
@@ -201,6 +270,7 @@ def build_skill_invocation_message(
skill_dir,
activation_note,
user_instruction=user_instruction,
runtime_note=runtime_note,
)

285
agent/skill_utils.py Normal file
View File

@@ -0,0 +1,285 @@
"""Lightweight skill metadata utilities shared by prompt_builder and skills_tool.
This module intentionally avoids importing the tool registry, CLI config, or any
heavy dependency chain. It is safe to import at module level without triggering
tool registration or provider resolution.
"""
import logging
import os
import re
import sys
from pathlib import Path
from typing import Any, Dict, List, Optional, Set, Tuple
from hermes_constants import get_hermes_home
logger = logging.getLogger(__name__)
# ── Platform mapping ──────────────────────────────────────────────────────
PLATFORM_MAP = {
"macos": "darwin",
"linux": "linux",
"windows": "win32",
}
EXCLUDED_SKILL_DIRS = frozenset((".git", ".github", ".hub"))
# ── Lazy YAML loader ─────────────────────────────────────────────────────
_yaml_load_fn = None
def yaml_load(content: str):
"""Parse YAML with lazy import and CSafeLoader preference."""
global _yaml_load_fn
if _yaml_load_fn is None:
import yaml
loader = getattr(yaml, "CSafeLoader", None) or yaml.SafeLoader
def _load(value: str):
return yaml.load(value, Loader=loader)
_yaml_load_fn = _load
return _yaml_load_fn(content)
# ── Frontmatter parsing ──────────────────────────────────────────────────
def parse_frontmatter(content: str) -> Tuple[Dict[str, Any], str]:
"""Parse YAML frontmatter from a markdown string.
Uses yaml with CSafeLoader for full YAML support (nested metadata, lists)
with a fallback to simple key:value splitting for robustness.
Returns:
(frontmatter_dict, remaining_body)
"""
frontmatter: Dict[str, Any] = {}
body = content
if not content.startswith("---"):
return frontmatter, body
end_match = re.search(r"\n---\s*\n", content[3:])
if not end_match:
return frontmatter, body
yaml_content = content[3 : end_match.start() + 3]
body = content[end_match.end() + 3 :]
try:
parsed = yaml_load(yaml_content)
if isinstance(parsed, dict):
frontmatter = parsed
except Exception:
# Fallback: simple key:value parsing for malformed YAML
for line in yaml_content.strip().split("\n"):
if ":" not in line:
continue
key, value = line.split(":", 1)
frontmatter[key.strip()] = value.strip()
return frontmatter, body
# ── Platform matching ─────────────────────────────────────────────────────
def skill_matches_platform(frontmatter: Dict[str, Any]) -> bool:
"""Return True when the skill is compatible with the current OS.
Skills declare platform requirements via a top-level ``platforms`` list
in their YAML frontmatter::
platforms: [macos] # macOS only
platforms: [macos, linux] # macOS and Linux
If the field is absent or empty the skill is compatible with **all**
platforms (backward-compatible default).
"""
platforms = frontmatter.get("platforms")
if not platforms:
return True
if not isinstance(platforms, list):
platforms = [platforms]
current = sys.platform
for platform in platforms:
normalized = str(platform).lower().strip()
mapped = PLATFORM_MAP.get(normalized, normalized)
if current.startswith(mapped):
return True
return False
# ── Disabled skills ───────────────────────────────────────────────────────
def get_disabled_skill_names(platform: str | None = None) -> Set[str]:
"""Read disabled skill names from config.yaml.
Args:
platform: Explicit platform name (e.g. ``"telegram"``). When
*None*, resolves from ``HERMES_PLATFORM`` or
``HERMES_SESSION_PLATFORM`` env vars. Falls back to the
global disabled list when no platform is determined.
Reads the config file directly (no CLI config imports) to stay
lightweight.
"""
config_path = get_hermes_home() / "config.yaml"
if not config_path.exists():
return set()
try:
parsed = yaml_load(config_path.read_text(encoding="utf-8"))
except Exception as e:
logger.debug("Could not read skill config %s: %s", config_path, e)
return set()
if not isinstance(parsed, dict):
return set()
skills_cfg = parsed.get("skills")
if not isinstance(skills_cfg, dict):
return set()
resolved_platform = (
platform
or os.getenv("HERMES_PLATFORM")
or os.getenv("HERMES_SESSION_PLATFORM")
)
if resolved_platform:
platform_disabled = (skills_cfg.get("platform_disabled") or {}).get(
resolved_platform
)
if platform_disabled is not None:
return _normalize_string_set(platform_disabled)
return _normalize_string_set(skills_cfg.get("disabled"))
def _normalize_string_set(values) -> Set[str]:
if values is None:
return set()
if isinstance(values, str):
values = [values]
return {str(v).strip() for v in values if str(v).strip()}
# ── External skills directories ──────────────────────────────────────────
def get_external_skills_dirs() -> List[Path]:
"""Read ``skills.external_dirs`` from config.yaml and return validated paths.
Each entry is expanded (``~`` and ``${VAR}``) and resolved to an absolute
path. Only directories that actually exist are returned. Duplicates and
paths that resolve to the local ``~/.hermes/skills/`` are silently skipped.
"""
config_path = get_hermes_home() / "config.yaml"
if not config_path.exists():
return []
try:
parsed = yaml_load(config_path.read_text(encoding="utf-8"))
except Exception:
return []
if not isinstance(parsed, dict):
return []
skills_cfg = parsed.get("skills")
if not isinstance(skills_cfg, dict):
return []
raw_dirs = skills_cfg.get("external_dirs")
if not raw_dirs:
return []
if isinstance(raw_dirs, str):
raw_dirs = [raw_dirs]
if not isinstance(raw_dirs, list):
return []
local_skills = (get_hermes_home() / "skills").resolve()
seen: Set[Path] = set()
result: List[Path] = []
for entry in raw_dirs:
entry = str(entry).strip()
if not entry:
continue
# Expand ~ and environment variables
expanded = os.path.expanduser(os.path.expandvars(entry))
p = Path(expanded).resolve()
if p == local_skills:
continue
if p in seen:
continue
if p.is_dir():
seen.add(p)
result.append(p)
else:
logger.debug("External skills dir does not exist, skipping: %s", p)
return result
def get_all_skills_dirs() -> List[Path]:
"""Return all skill directories: local ``~/.hermes/skills/`` first, then external.
The local dir is always first (and always included even if it doesn't exist
yet — callers handle that). External dirs follow in config order.
"""
dirs = [get_hermes_home() / "skills"]
dirs.extend(get_external_skills_dirs())
return dirs
# ── Condition extraction ──────────────────────────────────────────────────
def extract_skill_conditions(frontmatter: Dict[str, Any]) -> Dict[str, List]:
"""Extract conditional activation fields from parsed frontmatter."""
metadata = frontmatter.get("metadata")
# Handle cases where metadata is not a dict (e.g., a string from malformed YAML)
if not isinstance(metadata, dict):
metadata = {}
hermes = metadata.get("hermes") or {}
if not isinstance(hermes, dict):
hermes = {}
return {
"fallback_for_toolsets": hermes.get("fallback_for_toolsets", []),
"requires_toolsets": hermes.get("requires_toolsets", []),
"fallback_for_tools": hermes.get("fallback_for_tools", []),
"requires_tools": hermes.get("requires_tools", []),
}
# ── Description extraction ────────────────────────────────────────────────
def extract_skill_description(frontmatter: Dict[str, Any]) -> str:
"""Extract a truncated description from parsed frontmatter."""
raw_desc = frontmatter.get("description", "")
if not raw_desc:
return ""
desc = str(raw_desc).strip().strip("'\"")
if len(desc) > 60:
return desc[:57] + "..."
return desc
# ── File iteration ────────────────────────────────────────────────────────
def iter_skill_index_files(skills_dir: Path, filename: str):
"""Walk skills_dir yielding sorted paths matching *filename*.
Excludes ``.git``, ``.github``, ``.hub`` directories.
"""
matches = []
for root, dirs, files in os.walk(skills_dir):
dirs[:] = [d for d in dirs if d not in EXCLUDED_SKILL_DIRS]
if filename in files:
matches.append(Path(root) / filename)
for path in sorted(matches, key=lambda p: str(p.relative_to(skills_dir))):
yield path

View File

@@ -0,0 +1,194 @@
"""Helpers for optional cheap-vs-strong model routing."""
from __future__ import annotations
import os
import re
from typing import Any, Dict, Optional
from utils import is_truthy_value
_COMPLEX_KEYWORDS = {
"debug",
"debugging",
"implement",
"implementation",
"refactor",
"patch",
"traceback",
"stacktrace",
"exception",
"error",
"analyze",
"analysis",
"investigate",
"architecture",
"design",
"compare",
"benchmark",
"optimize",
"optimise",
"review",
"terminal",
"shell",
"tool",
"tools",
"pytest",
"test",
"tests",
"plan",
"planning",
"delegate",
"subagent",
"cron",
"docker",
"kubernetes",
}
_URL_RE = re.compile(r"https?://|www\.", re.IGNORECASE)
def _coerce_bool(value: Any, default: bool = False) -> bool:
return is_truthy_value(value, default=default)
def _coerce_int(value: Any, default: int) -> int:
try:
return int(value)
except (TypeError, ValueError):
return default
def choose_cheap_model_route(user_message: str, routing_config: Optional[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
"""Return the configured cheap-model route when a message looks simple.
Conservative by design: if the message has signs of code/tool/debugging/
long-form work, keep the primary model.
"""
cfg = routing_config or {}
if not _coerce_bool(cfg.get("enabled"), False):
return None
cheap_model = cfg.get("cheap_model") or {}
if not isinstance(cheap_model, dict):
return None
provider = str(cheap_model.get("provider") or "").strip().lower()
model = str(cheap_model.get("model") or "").strip()
if not provider or not model:
return None
text = (user_message or "").strip()
if not text:
return None
max_chars = _coerce_int(cfg.get("max_simple_chars"), 160)
max_words = _coerce_int(cfg.get("max_simple_words"), 28)
if len(text) > max_chars:
return None
if len(text.split()) > max_words:
return None
if text.count("\n") > 1:
return None
if "```" in text or "`" in text:
return None
if _URL_RE.search(text):
return None
lowered = text.lower()
words = {token.strip(".,:;!?()[]{}\"'`") for token in lowered.split()}
if words & _COMPLEX_KEYWORDS:
return None
route = dict(cheap_model)
route["provider"] = provider
route["model"] = model
route["routing_reason"] = "simple_turn"
return route
def resolve_turn_route(user_message: str, routing_config: Optional[Dict[str, Any]], primary: Dict[str, Any]) -> Dict[str, Any]:
"""Resolve the effective model/runtime for one turn.
Returns a dict with model/runtime/signature/label fields.
"""
route = choose_cheap_model_route(user_message, routing_config)
if not route:
return {
"model": primary.get("model"),
"runtime": {
"api_key": primary.get("api_key"),
"base_url": primary.get("base_url"),
"provider": primary.get("provider"),
"api_mode": primary.get("api_mode"),
"command": primary.get("command"),
"args": list(primary.get("args") or []),
"credential_pool": primary.get("credential_pool"),
},
"label": None,
"signature": (
primary.get("model"),
primary.get("provider"),
primary.get("base_url"),
primary.get("api_mode"),
primary.get("command"),
tuple(primary.get("args") or ()),
),
}
from hermes_cli.runtime_provider import resolve_runtime_provider
explicit_api_key = None
api_key_env = str(route.get("api_key_env") or "").strip()
if api_key_env:
explicit_api_key = os.getenv(api_key_env) or None
try:
runtime = resolve_runtime_provider(
requested=route.get("provider"),
explicit_api_key=explicit_api_key,
explicit_base_url=route.get("base_url"),
)
except Exception:
return {
"model": primary.get("model"),
"runtime": {
"api_key": primary.get("api_key"),
"base_url": primary.get("base_url"),
"provider": primary.get("provider"),
"api_mode": primary.get("api_mode"),
"command": primary.get("command"),
"args": list(primary.get("args") or []),
"credential_pool": primary.get("credential_pool"),
},
"label": None,
"signature": (
primary.get("model"),
primary.get("provider"),
primary.get("base_url"),
primary.get("api_mode"),
primary.get("command"),
tuple(primary.get("args") or ()),
),
}
return {
"model": route.get("model"),
"runtime": {
"api_key": runtime.get("api_key"),
"base_url": runtime.get("base_url"),
"provider": runtime.get("provider"),
"api_mode": runtime.get("api_mode"),
"command": runtime.get("command"),
"args": list(runtime.get("args") or []),
},
"label": f"smart route → {route.get('model')} ({runtime.get('provider')})",
"signature": (
route.get("model"),
runtime.get("provider"),
runtime.get("base_url"),
runtime.get("api_mode"),
runtime.get("command"),
tuple(runtime.get("args") or ()),
),
}

219
agent/subdirectory_hints.py Normal file
View File

@@ -0,0 +1,219 @@
"""Progressive subdirectory hint discovery.
As the agent navigates into subdirectories via tool calls (read_file, terminal,
search_files, etc.), this module discovers and loads project context files
(AGENTS.md, CLAUDE.md, .cursorrules) from those directories. Discovered hints
are appended to the tool result so the model gets relevant context at the moment
it starts working in a new area of the codebase.
This complements the startup context loading in ``prompt_builder.py`` which only
loads from the CWD. Subdirectory hints are discovered lazily and injected into
the conversation without modifying the system prompt (preserving prompt caching).
Inspired by Block/goose's SubdirectoryHintTracker.
"""
import logging
import os
import re
import shlex
from pathlib import Path
from typing import Dict, Any, Optional, Set
from agent.prompt_builder import _scan_context_content
logger = logging.getLogger(__name__)
# Context files to look for in subdirectories, in priority order.
# Same filenames as prompt_builder.py but we load ALL found (not first-wins)
# since different subdirectories may use different conventions.
_HINT_FILENAMES = [
"AGENTS.md", "agents.md",
"CLAUDE.md", "claude.md",
".cursorrules",
]
# Maximum chars per hint file to prevent context bloat
_MAX_HINT_CHARS = 8_000
# Tool argument keys that typically contain file paths
_PATH_ARG_KEYS = {"path", "file_path", "workdir"}
# Tools that take shell commands where we should extract paths
_COMMAND_TOOLS = {"terminal"}
# How many parent directories to walk up when looking for hints.
# Prevents scanning all the way to / for deeply nested paths.
_MAX_ANCESTOR_WALK = 5
class SubdirectoryHintTracker:
"""Track which directories the agent visits and load hints on first access.
Usage::
tracker = SubdirectoryHintTracker(working_dir="/path/to/project")
# After each tool call:
hints = tracker.check_tool_call("read_file", {"path": "backend/src/main.py"})
if hints:
tool_result += hints # append to the tool result string
"""
def __init__(self, working_dir: Optional[str] = None):
self.working_dir = Path(working_dir or os.getcwd()).resolve()
self._loaded_dirs: Set[Path] = set()
# Pre-mark the working dir as loaded (startup context handles it)
self._loaded_dirs.add(self.working_dir)
def check_tool_call(
self,
tool_name: str,
tool_args: Dict[str, Any],
) -> Optional[str]:
"""Check tool call arguments for new directories and load any hint files.
Returns formatted hint text to append to the tool result, or None.
"""
dirs = self._extract_directories(tool_name, tool_args)
if not dirs:
return None
all_hints = []
for d in dirs:
hints = self._load_hints_for_directory(d)
if hints:
all_hints.append(hints)
if not all_hints:
return None
return "\n\n" + "\n\n".join(all_hints)
def _extract_directories(
self, tool_name: str, args: Dict[str, Any]
) -> list:
"""Extract directory paths from tool call arguments."""
candidates: Set[Path] = set()
# Direct path arguments
for key in _PATH_ARG_KEYS:
val = args.get(key)
if isinstance(val, str) and val.strip():
self._add_path_candidate(val, candidates)
# Shell commands — extract path-like tokens
if tool_name in _COMMAND_TOOLS:
cmd = args.get("command", "")
if isinstance(cmd, str):
self._extract_paths_from_command(cmd, candidates)
return list(candidates)
def _add_path_candidate(self, raw_path: str, candidates: Set[Path]):
"""Resolve a raw path and add its directory + ancestors to candidates.
Walks up from the resolved directory toward the filesystem root,
stopping at the first directory already in ``_loaded_dirs`` (or after
``_MAX_ANCESTOR_WALK`` levels). This ensures that reading
``project/src/main.py`` discovers ``project/AGENTS.md`` even when
``project/src/`` has no hint files of its own.
"""
try:
p = Path(raw_path).expanduser()
if not p.is_absolute():
p = self.working_dir / p
p = p.resolve()
# Use parent if it's a file path (has extension or doesn't exist as dir)
if p.suffix or (p.exists() and p.is_file()):
p = p.parent
# Walk up ancestors — stop at already-loaded or root
for _ in range(_MAX_ANCESTOR_WALK):
if p in self._loaded_dirs:
break
if self._is_valid_subdir(p):
candidates.add(p)
parent = p.parent
if parent == p:
break # filesystem root
p = parent
except (OSError, ValueError):
pass
def _extract_paths_from_command(self, cmd: str, candidates: Set[Path]):
"""Extract path-like tokens from a shell command string."""
try:
tokens = shlex.split(cmd)
except ValueError:
tokens = cmd.split()
for token in tokens:
# Skip flags
if token.startswith("-"):
continue
# Must look like a path (contains / or .)
if "/" not in token and "." not in token:
continue
# Skip URLs
if token.startswith(("http://", "https://", "git@")):
continue
self._add_path_candidate(token, candidates)
def _is_valid_subdir(self, path: Path) -> bool:
"""Check if path is a valid directory to scan for hints."""
if not path.is_dir():
return False
if path in self._loaded_dirs:
return False
return True
def _load_hints_for_directory(self, directory: Path) -> Optional[str]:
"""Load hint files from a directory. Returns formatted text or None."""
self._loaded_dirs.add(directory)
found_hints = []
for filename in _HINT_FILENAMES:
hint_path = directory / filename
if not hint_path.is_file():
continue
try:
content = hint_path.read_text(encoding="utf-8").strip()
if not content:
continue
# Same security scan as startup context loading
content = _scan_context_content(content, filename)
if len(content) > _MAX_HINT_CHARS:
content = (
content[:_MAX_HINT_CHARS]
+ f"\n\n[...truncated {filename}: {len(content):,} chars total]"
)
# Best-effort relative path for display
rel_path = str(hint_path)
try:
rel_path = str(hint_path.relative_to(self.working_dir))
except ValueError:
try:
rel_path = str(hint_path.relative_to(Path.home()))
rel_path = "~/" + rel_path
except ValueError:
pass # keep absolute
found_hints.append((rel_path, content))
# First match wins per directory (like startup loading)
break
except Exception as exc:
logger.debug("Could not read %s: %s", hint_path, exc)
if not found_hints:
return None
sections = []
for rel_path, content in found_hints:
sections.append(
f"[Subdirectory context discovered: {rel_path}]\n{content}"
)
logger.debug(
"Loaded subdirectory hints from %s: %s",
directory,
[h[0] for h in found_hints],
)
return "\n\n".join(sections)

125
agent/title_generator.py Normal file
View File

@@ -0,0 +1,125 @@
"""Auto-generate short session titles from the first user/assistant exchange.
Runs asynchronously after the first response is delivered so it never
adds latency to the user-facing reply.
"""
import logging
import threading
from typing import Optional
from agent.auxiliary_client import call_llm
logger = logging.getLogger(__name__)
_TITLE_PROMPT = (
"Generate a short, descriptive title (3-7 words) for a conversation that starts with the "
"following exchange. The title should capture the main topic or intent. "
"Return ONLY the title text, nothing else. No quotes, no punctuation at the end, no prefixes."
)
def generate_title(user_message: str, assistant_response: str, timeout: float = 30.0) -> Optional[str]:
"""Generate a session title from the first exchange.
Uses the auxiliary LLM client (cheapest/fastest available model).
Returns the title string or None on failure.
"""
# Truncate long messages to keep the request small
user_snippet = user_message[:500] if user_message else ""
assistant_snippet = assistant_response[:500] if assistant_response else ""
messages = [
{"role": "system", "content": _TITLE_PROMPT},
{"role": "user", "content": f"User: {user_snippet}\n\nAssistant: {assistant_snippet}"},
]
try:
response = call_llm(
task="compression", # reuse compression task config (cheap/fast model)
messages=messages,
max_tokens=30,
temperature=0.3,
timeout=timeout,
)
title = (response.choices[0].message.content or "").strip()
# Clean up: remove quotes, trailing punctuation, prefixes like "Title: "
title = title.strip('"\'')
if title.lower().startswith("title:"):
title = title[6:].strip()
# Enforce reasonable length
if len(title) > 80:
title = title[:77] + "..."
return title if title else None
except Exception as e:
logger.debug("Title generation failed: %s", e)
return None
def auto_title_session(
session_db,
session_id: str,
user_message: str,
assistant_response: str,
) -> None:
"""Generate and set a session title if one doesn't already exist.
Called in a background thread after the first exchange completes.
Silently skips if:
- session_db is None
- session already has a title (user-set or previously auto-generated)
- title generation fails
"""
if not session_db or not session_id:
return
# Check if title already exists (user may have set one via /title before first response)
try:
existing = session_db.get_session_title(session_id)
if existing:
return
except Exception:
return
title = generate_title(user_message, assistant_response)
if not title:
return
try:
session_db.set_session_title(session_id, title)
logger.debug("Auto-generated session title: %s", title)
except Exception as e:
logger.debug("Failed to set auto-generated title: %s", e)
def maybe_auto_title(
session_db,
session_id: str,
user_message: str,
assistant_response: str,
conversation_history: list,
) -> None:
"""Fire-and-forget title generation after the first exchange.
Only generates a title when:
- This appears to be the first user→assistant exchange
- No title is already set
"""
if not session_db or not session_id or not user_message or not assistant_response:
return
# Count user messages in history to detect first exchange.
# conversation_history includes the exchange that just happened,
# so for a first exchange we expect exactly 1 user message
# (or 2 counting system). Be generous: generate on first 2 exchanges.
user_msg_count = sum(1 for m in (conversation_history or []) if m.get("role") == "user")
if user_msg_count > 2:
return
thread = threading.Thread(
target=auto_title_session,
args=(session_db, session_id, user_message, assistant_response),
daemon=True,
name="auto-title",
)
thread.start()

656
agent/usage_pricing.py Normal file
View File

@@ -0,0 +1,656 @@
from __future__ import annotations
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal
from typing import Any, Dict, Literal, Optional
from agent.model_metadata import fetch_endpoint_model_metadata, fetch_model_metadata
DEFAULT_PRICING = {"input": 0.0, "output": 0.0}
_ZERO = Decimal("0")
_ONE_MILLION = Decimal("1000000")
CostStatus = Literal["actual", "estimated", "included", "unknown"]
CostSource = Literal[
"provider_cost_api",
"provider_generation_api",
"provider_models_api",
"official_docs_snapshot",
"user_override",
"custom_contract",
"none",
]
@dataclass(frozen=True)
class CanonicalUsage:
input_tokens: int = 0
output_tokens: int = 0
cache_read_tokens: int = 0
cache_write_tokens: int = 0
reasoning_tokens: int = 0
request_count: int = 1
raw_usage: Optional[dict[str, Any]] = None
@property
def prompt_tokens(self) -> int:
return self.input_tokens + self.cache_read_tokens + self.cache_write_tokens
@property
def total_tokens(self) -> int:
return self.prompt_tokens + self.output_tokens
@dataclass(frozen=True)
class BillingRoute:
provider: str
model: str
base_url: str = ""
billing_mode: str = "unknown"
@dataclass(frozen=True)
class PricingEntry:
input_cost_per_million: Optional[Decimal] = None
output_cost_per_million: Optional[Decimal] = None
cache_read_cost_per_million: Optional[Decimal] = None
cache_write_cost_per_million: Optional[Decimal] = None
request_cost: Optional[Decimal] = None
source: CostSource = "none"
source_url: Optional[str] = None
pricing_version: Optional[str] = None
fetched_at: Optional[datetime] = None
@dataclass(frozen=True)
class CostResult:
amount_usd: Optional[Decimal]
status: CostStatus
source: CostSource
label: str
fetched_at: Optional[datetime] = None
pricing_version: Optional[str] = None
notes: tuple[str, ...] = ()
_UTC_NOW = lambda: datetime.now(timezone.utc)
# Official docs snapshot entries. Models whose published pricing and cache
# semantics are stable enough to encode exactly.
_OFFICIAL_DOCS_PRICING: Dict[tuple[str, str], PricingEntry] = {
(
"anthropic",
"claude-opus-4-20250514",
): PricingEntry(
input_cost_per_million=Decimal("15.00"),
output_cost_per_million=Decimal("75.00"),
cache_read_cost_per_million=Decimal("1.50"),
cache_write_cost_per_million=Decimal("18.75"),
source="official_docs_snapshot",
source_url="https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching",
pricing_version="anthropic-prompt-caching-2026-03-16",
),
(
"anthropic",
"claude-sonnet-4-20250514",
): PricingEntry(
input_cost_per_million=Decimal("3.00"),
output_cost_per_million=Decimal("15.00"),
cache_read_cost_per_million=Decimal("0.30"),
cache_write_cost_per_million=Decimal("3.75"),
source="official_docs_snapshot",
source_url="https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching",
pricing_version="anthropic-prompt-caching-2026-03-16",
),
# OpenAI
(
"openai",
"gpt-4o",
): PricingEntry(
input_cost_per_million=Decimal("2.50"),
output_cost_per_million=Decimal("10.00"),
cache_read_cost_per_million=Decimal("1.25"),
source="official_docs_snapshot",
source_url="https://openai.com/api/pricing/",
pricing_version="openai-pricing-2026-03-16",
),
(
"openai",
"gpt-4o-mini",
): PricingEntry(
input_cost_per_million=Decimal("0.15"),
output_cost_per_million=Decimal("0.60"),
cache_read_cost_per_million=Decimal("0.075"),
source="official_docs_snapshot",
source_url="https://openai.com/api/pricing/",
pricing_version="openai-pricing-2026-03-16",
),
(
"openai",
"gpt-4.1",
): PricingEntry(
input_cost_per_million=Decimal("2.00"),
output_cost_per_million=Decimal("8.00"),
cache_read_cost_per_million=Decimal("0.50"),
source="official_docs_snapshot",
source_url="https://openai.com/api/pricing/",
pricing_version="openai-pricing-2026-03-16",
),
(
"openai",
"gpt-4.1-mini",
): PricingEntry(
input_cost_per_million=Decimal("0.40"),
output_cost_per_million=Decimal("1.60"),
cache_read_cost_per_million=Decimal("0.10"),
source="official_docs_snapshot",
source_url="https://openai.com/api/pricing/",
pricing_version="openai-pricing-2026-03-16",
),
(
"openai",
"gpt-4.1-nano",
): PricingEntry(
input_cost_per_million=Decimal("0.10"),
output_cost_per_million=Decimal("0.40"),
cache_read_cost_per_million=Decimal("0.025"),
source="official_docs_snapshot",
source_url="https://openai.com/api/pricing/",
pricing_version="openai-pricing-2026-03-16",
),
(
"openai",
"o3",
): PricingEntry(
input_cost_per_million=Decimal("10.00"),
output_cost_per_million=Decimal("40.00"),
cache_read_cost_per_million=Decimal("2.50"),
source="official_docs_snapshot",
source_url="https://openai.com/api/pricing/",
pricing_version="openai-pricing-2026-03-16",
),
(
"openai",
"o3-mini",
): PricingEntry(
input_cost_per_million=Decimal("1.10"),
output_cost_per_million=Decimal("4.40"),
cache_read_cost_per_million=Decimal("0.55"),
source="official_docs_snapshot",
source_url="https://openai.com/api/pricing/",
pricing_version="openai-pricing-2026-03-16",
),
# Anthropic older models (pre-4.6 generation)
(
"anthropic",
"claude-3-5-sonnet-20241022",
): PricingEntry(
input_cost_per_million=Decimal("3.00"),
output_cost_per_million=Decimal("15.00"),
cache_read_cost_per_million=Decimal("0.30"),
cache_write_cost_per_million=Decimal("3.75"),
source="official_docs_snapshot",
source_url="https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching",
pricing_version="anthropic-pricing-2026-03-16",
),
(
"anthropic",
"claude-3-5-haiku-20241022",
): PricingEntry(
input_cost_per_million=Decimal("0.80"),
output_cost_per_million=Decimal("4.00"),
cache_read_cost_per_million=Decimal("0.08"),
cache_write_cost_per_million=Decimal("1.00"),
source="official_docs_snapshot",
source_url="https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching",
pricing_version="anthropic-pricing-2026-03-16",
),
(
"anthropic",
"claude-3-opus-20240229",
): PricingEntry(
input_cost_per_million=Decimal("15.00"),
output_cost_per_million=Decimal("75.00"),
cache_read_cost_per_million=Decimal("1.50"),
cache_write_cost_per_million=Decimal("18.75"),
source="official_docs_snapshot",
source_url="https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching",
pricing_version="anthropic-pricing-2026-03-16",
),
(
"anthropic",
"claude-3-haiku-20240307",
): PricingEntry(
input_cost_per_million=Decimal("0.25"),
output_cost_per_million=Decimal("1.25"),
cache_read_cost_per_million=Decimal("0.03"),
cache_write_cost_per_million=Decimal("0.30"),
source="official_docs_snapshot",
source_url="https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching",
pricing_version="anthropic-pricing-2026-03-16",
),
# DeepSeek
(
"deepseek",
"deepseek-chat",
): PricingEntry(
input_cost_per_million=Decimal("0.14"),
output_cost_per_million=Decimal("0.28"),
source="official_docs_snapshot",
source_url="https://api-docs.deepseek.com/quick_start/pricing",
pricing_version="deepseek-pricing-2026-03-16",
),
(
"deepseek",
"deepseek-reasoner",
): PricingEntry(
input_cost_per_million=Decimal("0.55"),
output_cost_per_million=Decimal("2.19"),
source="official_docs_snapshot",
source_url="https://api-docs.deepseek.com/quick_start/pricing",
pricing_version="deepseek-pricing-2026-03-16",
),
# Google Gemini
(
"google",
"gemini-2.5-pro",
): PricingEntry(
input_cost_per_million=Decimal("1.25"),
output_cost_per_million=Decimal("10.00"),
source="official_docs_snapshot",
source_url="https://ai.google.dev/pricing",
pricing_version="google-pricing-2026-03-16",
),
(
"google",
"gemini-2.5-flash",
): PricingEntry(
input_cost_per_million=Decimal("0.15"),
output_cost_per_million=Decimal("0.60"),
source="official_docs_snapshot",
source_url="https://ai.google.dev/pricing",
pricing_version="google-pricing-2026-03-16",
),
(
"google",
"gemini-2.0-flash",
): PricingEntry(
input_cost_per_million=Decimal("0.10"),
output_cost_per_million=Decimal("0.40"),
source="official_docs_snapshot",
source_url="https://ai.google.dev/pricing",
pricing_version="google-pricing-2026-03-16",
),
}
def _to_decimal(value: Any) -> Optional[Decimal]:
if value is None:
return None
try:
return Decimal(str(value))
except Exception:
return None
def _to_int(value: Any) -> int:
try:
return int(value or 0)
except Exception:
return 0
def resolve_billing_route(
model_name: str,
provider: Optional[str] = None,
base_url: Optional[str] = None,
) -> BillingRoute:
provider_name = (provider or "").strip().lower()
base = (base_url or "").strip().lower()
model = (model_name or "").strip()
if not provider_name and "/" in model:
inferred_provider, bare_model = model.split("/", 1)
if inferred_provider in {"anthropic", "openai", "google"}:
provider_name = inferred_provider
model = bare_model
if provider_name == "openai-codex":
return BillingRoute(provider="openai-codex", model=model, base_url=base_url or "", billing_mode="subscription_included")
if provider_name == "openrouter" or "openrouter.ai" in base:
return BillingRoute(provider="openrouter", model=model, base_url=base_url or "", billing_mode="official_models_api")
if provider_name == "anthropic":
return BillingRoute(provider="anthropic", model=model.split("/")[-1], base_url=base_url or "", billing_mode="official_docs_snapshot")
if provider_name == "openai":
return BillingRoute(provider="openai", model=model.split("/")[-1], base_url=base_url or "", billing_mode="official_docs_snapshot")
if provider_name in {"custom", "local"} or (base and "localhost" in base):
return BillingRoute(provider=provider_name or "custom", model=model, base_url=base_url or "", billing_mode="unknown")
return BillingRoute(provider=provider_name or "unknown", model=model.split("/")[-1] if model else "", base_url=base_url or "", billing_mode="unknown")
def _lookup_official_docs_pricing(route: BillingRoute) -> Optional[PricingEntry]:
return _OFFICIAL_DOCS_PRICING.get((route.provider, route.model.lower()))
def _openrouter_pricing_entry(route: BillingRoute) -> Optional[PricingEntry]:
return _pricing_entry_from_metadata(
fetch_model_metadata(),
route.model,
source_url="https://openrouter.ai/docs/api/api-reference/models/get-models",
pricing_version="openrouter-models-api",
)
def _pricing_entry_from_metadata(
metadata: Dict[str, Dict[str, Any]],
model_id: str,
*,
source_url: str,
pricing_version: str,
) -> Optional[PricingEntry]:
if model_id not in metadata:
return None
pricing = metadata[model_id].get("pricing") or {}
prompt = _to_decimal(pricing.get("prompt"))
completion = _to_decimal(pricing.get("completion"))
request = _to_decimal(pricing.get("request"))
cache_read = _to_decimal(
pricing.get("cache_read")
or pricing.get("cached_prompt")
or pricing.get("input_cache_read")
)
cache_write = _to_decimal(
pricing.get("cache_write")
or pricing.get("cache_creation")
or pricing.get("input_cache_write")
)
if prompt is None and completion is None and request is None:
return None
def _per_token_to_per_million(value: Optional[Decimal]) -> Optional[Decimal]:
if value is None:
return None
return value * _ONE_MILLION
return PricingEntry(
input_cost_per_million=_per_token_to_per_million(prompt),
output_cost_per_million=_per_token_to_per_million(completion),
cache_read_cost_per_million=_per_token_to_per_million(cache_read),
cache_write_cost_per_million=_per_token_to_per_million(cache_write),
request_cost=request,
source="provider_models_api",
source_url=source_url,
pricing_version=pricing_version,
fetched_at=_UTC_NOW(),
)
def get_pricing_entry(
model_name: str,
provider: Optional[str] = None,
base_url: Optional[str] = None,
api_key: Optional[str] = None,
) -> Optional[PricingEntry]:
route = resolve_billing_route(model_name, provider=provider, base_url=base_url)
if route.billing_mode == "subscription_included":
return PricingEntry(
input_cost_per_million=_ZERO,
output_cost_per_million=_ZERO,
cache_read_cost_per_million=_ZERO,
cache_write_cost_per_million=_ZERO,
source="none",
pricing_version="included-route",
)
if route.provider == "openrouter":
return _openrouter_pricing_entry(route)
if route.base_url:
entry = _pricing_entry_from_metadata(
fetch_endpoint_model_metadata(route.base_url, api_key=api_key or ""),
route.model,
source_url=f"{route.base_url.rstrip('/')}/models",
pricing_version="openai-compatible-models-api",
)
if entry:
return entry
return _lookup_official_docs_pricing(route)
def normalize_usage(
response_usage: Any,
*,
provider: Optional[str] = None,
api_mode: Optional[str] = None,
) -> CanonicalUsage:
"""Normalize raw API response usage into canonical token buckets.
Handles three API shapes:
- Anthropic: input_tokens/output_tokens/cache_read_input_tokens/cache_creation_input_tokens
- Codex Responses: input_tokens includes cache tokens; input_tokens_details.cached_tokens separates them
- OpenAI Chat Completions: prompt_tokens includes cache tokens; prompt_tokens_details.cached_tokens separates them
In both Codex and OpenAI modes, input_tokens is derived by subtracting cache
tokens from the total — the API contract is that input/prompt totals include
cached tokens and the details object breaks them out.
"""
if not response_usage:
return CanonicalUsage()
provider_name = (provider or "").strip().lower()
mode = (api_mode or "").strip().lower()
if mode == "anthropic_messages" or provider_name == "anthropic":
input_tokens = _to_int(getattr(response_usage, "input_tokens", 0))
output_tokens = _to_int(getattr(response_usage, "output_tokens", 0))
cache_read_tokens = _to_int(getattr(response_usage, "cache_read_input_tokens", 0))
cache_write_tokens = _to_int(getattr(response_usage, "cache_creation_input_tokens", 0))
elif mode == "codex_responses":
input_total = _to_int(getattr(response_usage, "input_tokens", 0))
output_tokens = _to_int(getattr(response_usage, "output_tokens", 0))
details = getattr(response_usage, "input_tokens_details", None)
cache_read_tokens = _to_int(getattr(details, "cached_tokens", 0) if details else 0)
cache_write_tokens = _to_int(
getattr(details, "cache_creation_tokens", 0) if details else 0
)
input_tokens = max(0, input_total - cache_read_tokens - cache_write_tokens)
else:
prompt_total = _to_int(getattr(response_usage, "prompt_tokens", 0))
output_tokens = _to_int(getattr(response_usage, "completion_tokens", 0))
details = getattr(response_usage, "prompt_tokens_details", None)
cache_read_tokens = _to_int(getattr(details, "cached_tokens", 0) if details else 0)
cache_write_tokens = _to_int(
getattr(details, "cache_write_tokens", 0) if details else 0
)
input_tokens = max(0, prompt_total - cache_read_tokens - cache_write_tokens)
reasoning_tokens = 0
output_details = getattr(response_usage, "output_tokens_details", None)
if output_details:
reasoning_tokens = _to_int(getattr(output_details, "reasoning_tokens", 0))
return CanonicalUsage(
input_tokens=input_tokens,
output_tokens=output_tokens,
cache_read_tokens=cache_read_tokens,
cache_write_tokens=cache_write_tokens,
reasoning_tokens=reasoning_tokens,
)
def estimate_usage_cost(
model_name: str,
usage: CanonicalUsage,
*,
provider: Optional[str] = None,
base_url: Optional[str] = None,
api_key: Optional[str] = None,
) -> CostResult:
route = resolve_billing_route(model_name, provider=provider, base_url=base_url)
if route.billing_mode == "subscription_included":
return CostResult(
amount_usd=_ZERO,
status="included",
source="none",
label="included",
pricing_version="included-route",
)
entry = get_pricing_entry(model_name, provider=provider, base_url=base_url, api_key=api_key)
if not entry:
return CostResult(amount_usd=None, status="unknown", source="none", label="n/a")
notes: list[str] = []
amount = _ZERO
if usage.input_tokens and entry.input_cost_per_million is None:
return CostResult(amount_usd=None, status="unknown", source=entry.source, label="n/a")
if usage.output_tokens and entry.output_cost_per_million is None:
return CostResult(amount_usd=None, status="unknown", source=entry.source, label="n/a")
if usage.cache_read_tokens:
if entry.cache_read_cost_per_million is None:
return CostResult(
amount_usd=None,
status="unknown",
source=entry.source,
label="n/a",
notes=("cache-read pricing unavailable for route",),
)
if usage.cache_write_tokens:
if entry.cache_write_cost_per_million is None:
return CostResult(
amount_usd=None,
status="unknown",
source=entry.source,
label="n/a",
notes=("cache-write pricing unavailable for route",),
)
if entry.input_cost_per_million is not None:
amount += Decimal(usage.input_tokens) * entry.input_cost_per_million / _ONE_MILLION
if entry.output_cost_per_million is not None:
amount += Decimal(usage.output_tokens) * entry.output_cost_per_million / _ONE_MILLION
if entry.cache_read_cost_per_million is not None:
amount += Decimal(usage.cache_read_tokens) * entry.cache_read_cost_per_million / _ONE_MILLION
if entry.cache_write_cost_per_million is not None:
amount += Decimal(usage.cache_write_tokens) * entry.cache_write_cost_per_million / _ONE_MILLION
if entry.request_cost is not None and usage.request_count:
amount += Decimal(usage.request_count) * entry.request_cost
status: CostStatus = "estimated"
label = f"~${amount:.2f}"
if entry.source == "none" and amount == _ZERO:
status = "included"
label = "included"
if route.provider == "openrouter":
notes.append("OpenRouter cost is estimated from the models API until reconciled.")
return CostResult(
amount_usd=amount,
status=status,
source=entry.source,
label=label,
fetched_at=entry.fetched_at,
pricing_version=entry.pricing_version,
notes=tuple(notes),
)
def has_known_pricing(
model_name: str,
provider: Optional[str] = None,
base_url: Optional[str] = None,
api_key: Optional[str] = None,
) -> bool:
"""Check whether we have pricing data for this model+route.
Uses direct lookup instead of routing through the full estimation
pipeline — avoids creating dummy usage objects just to check status.
"""
route = resolve_billing_route(model_name, provider=provider, base_url=base_url)
if route.billing_mode == "subscription_included":
return True
entry = get_pricing_entry(model_name, provider=provider, base_url=base_url, api_key=api_key)
return entry is not None
def get_pricing(
model_name: str,
provider: Optional[str] = None,
base_url: Optional[str] = None,
api_key: Optional[str] = None,
) -> Dict[str, float]:
"""Backward-compatible thin wrapper for legacy callers.
Returns only non-cache input/output fields when a pricing entry exists.
Unknown routes return zeroes.
"""
entry = get_pricing_entry(model_name, provider=provider, base_url=base_url, api_key=api_key)
if not entry:
return {"input": 0.0, "output": 0.0}
return {
"input": float(entry.input_cost_per_million or _ZERO),
"output": float(entry.output_cost_per_million or _ZERO),
}
def estimate_cost_usd(
model: str,
input_tokens: int,
output_tokens: int,
*,
provider: Optional[str] = None,
base_url: Optional[str] = None,
api_key: Optional[str] = None,
) -> float:
"""Backward-compatible helper for legacy callers.
This uses non-cached input/output only. New code should call
`estimate_usage_cost()` with canonical usage buckets.
"""
result = estimate_usage_cost(
model,
CanonicalUsage(input_tokens=input_tokens, output_tokens=output_tokens),
provider=provider,
base_url=base_url,
api_key=api_key,
)
return float(result.amount_usd or _ZERO)
def format_duration_compact(seconds: float) -> str:
if seconds < 60:
return f"{seconds:.0f}s"
minutes = seconds / 60
if minutes < 60:
return f"{minutes:.0f}m"
hours = minutes / 60
if hours < 24:
remaining_min = int(minutes % 60)
return f"{int(hours)}h {remaining_min}m" if remaining_min else f"{int(hours)}h"
days = hours / 24
return f"{days:.1f}d"
def format_token_count_compact(value: int) -> str:
abs_value = abs(int(value))
if abs_value < 1_000:
return str(int(value))
sign = "-" if value < 0 else ""
units = ((1_000_000_000, "B"), (1_000_000, "M"), (1_000, "K"))
for threshold, suffix in units:
if abs_value >= threshold:
scaled = abs_value / threshold
if scaled < 10:
text = f"{scaled:.2f}"
elif scaled < 100:
text = f"{scaled:.1f}"
else:
text = f"{scaled:.0f}"
if "." in text:
text = text.rstrip("0").rstrip(".")
return f"{sign}{text}{suffix}"
return f"{value:,}"

View File

@@ -128,6 +128,7 @@ def _extract_tool_stats(messages: List[Dict[str, Any]]) -> Dict[str, Dict[str, i
# Track tool calls from assistant messages
if msg["role"] == "assistant" and "tool_calls" in msg and msg["tool_calls"]:
for tool_call in msg["tool_calls"]:
if not tool_call or not isinstance(tool_call, dict): continue
tool_name = tool_call["function"]["name"]
tool_call_id = tool_call["id"]

View File

@@ -7,17 +7,39 @@
# =============================================================================
model:
# Default model to use (can be overridden with --model flag)
# Both "default" and "model" work as the key name here.
default: "anthropic/claude-opus-4.6"
# Inference provider selection:
# "auto" - Use Nous Portal if logged in, otherwise OpenRouter/env vars (default)
# "nous-api" - Use Nous Portal via API key (requires: NOUS_API_KEY)
# "openrouter" - Always use OpenRouter API key from OPENROUTER_API_KEY
# "nous" - Always use Nous Portal (requires: hermes login)
# "zai" - Use z.ai / ZhipuAI GLM models (requires: GLM_API_KEY)
# "kimi-coding"- Use Kimi / Moonshot AI models (requires: KIMI_API_KEY)
# "minimax" - Use MiniMax global endpoint (requires: MINIMAX_API_KEY)
# "minimax-cn" - Use MiniMax China endpoint (requires: MINIMAX_CN_API_KEY)
# "auto" - Auto-detect from credentials (default)
# "openrouter" - OpenRouter (requires: OPENROUTER_API_KEY or OPENAI_API_KEY)
# "nous" - Nous Portal OAuth (requires: hermes login)
# "nous-api" - Nous Portal API key (requires: NOUS_API_KEY)
# "anthropic" - Direct Anthropic API (requires: ANTHROPIC_API_KEY)
# "openai-codex" - OpenAI Codex (requires: hermes login --provider openai-codex)
# "copilot" - GitHub Copilot / GitHub Models (requires: GITHUB_TOKEN)
# "zai" - z.ai / ZhipuAI GLM (requires: GLM_API_KEY)
# "kimi-coding" - Kimi / Moonshot AI (requires: KIMI_API_KEY)
# "minimax" - MiniMax global (requires: MINIMAX_API_KEY)
# "minimax-cn" - MiniMax China (requires: MINIMAX_CN_API_KEY)
# "huggingface" - Hugging Face Inference (requires: HF_TOKEN)
# "kilocode" - KiloCode gateway (requires: KILOCODE_API_KEY)
# "ai-gateway" - Vercel AI Gateway (requires: AI_GATEWAY_API_KEY)
#
# Local servers (LM Studio, Ollama, vLLM, llama.cpp):
# "custom" - Any OpenAI-compatible endpoint. Set base_url below.
# Aliases: "lmstudio", "ollama", "vllm", "llamacpp" all map to "custom".
# Example for LM Studio:
# provider: "lmstudio"
# base_url: "http://localhost:1234/v1"
# No API key needed — local servers typically ignore auth.
#
# For Ollama Cloud (https://ollama.com/pricing):
# provider: "custom"
# base_url: "https://ollama.com/v1"
# Set OLLAMA_API_KEY in .env — automatically picked up when base_url
# points to ollama.com.
#
# Can also be overridden with --provider flag or HERMES_INFERENCE_PROVIDER env var.
provider: "auto"
@@ -51,6 +73,20 @@ model:
# # Data policy: "allow" (default) or "deny" to exclude providers that may store data
# # data_collection: "deny"
# =============================================================================
# Smart Model Routing (optional)
# =============================================================================
# Use a cheaper model for short/simple turns while keeping your main model for
# more complex requests. Disabled by default.
#
# smart_model_routing:
# enabled: true
# max_simple_chars: 160
# max_simple_words: 28
# cheap_model:
# provider: openrouter
# model: google/gemini-2.5-flash
# =============================================================================
# Git Worktree Isolation
# =============================================================================
@@ -76,8 +112,9 @@ model:
# - Messaging (Telegram/Discord): Uses MESSAGING_CWD from .env (default: home)
terminal:
backend: "local"
cwd: "." # For local backend: "." = current directory. Ignored for remote backends.
cwd: "." # For local backend: "." = current directory. Ignored for remote backends unless a backend documents otherwise.
timeout: 180
docker_mount_cwd_to_workspace: false # SECURITY: off by default. Opt in to mount the launch cwd into Docker /workspace.
lifetime_seconds: 300
# sudo_password: "" # Enable sudo commands (pipes via sudo -S) - SECURITY WARNING: plaintext!
@@ -107,6 +144,13 @@ terminal:
# timeout: 180
# lifetime_seconds: 300
# docker_image: "nikolaik/python-nodejs:python3.11-nodejs20"
# docker_mount_cwd_to_workspace: true # Explicit opt-in: mount your launch cwd into /workspace
# # Optional: explicitly forward selected env vars into Docker.
# # These values come from your current shell first, then ~/.hermes/.env.
# # Warning: anything forwarded here is visible to commands run in the container.
# docker_forward_env:
# - "GITHUB_TOKEN"
# - "NPM_TOKEN"
# -----------------------------------------------------------------------------
# OPTION 4: Singularity/Apptainer container
@@ -210,19 +254,34 @@ browser:
# 1. Tracks actual token usage from API responses (not estimates)
# 2. When prompt_tokens >= threshold% of model's context_length, triggers compression
# 3. Protects first 3 turns (system prompt, initial request, first response)
# 4. Protects last 4 turns (recent context is most relevant)
# 4. Protects last N turns (default 20 messages = ~10 full turns of recent context)
# 5. Summarizes middle turns using a fast/cheap model
# 6. Inserts summary as a user message, continues conversation seamlessly
#
# Post-compression tail budget is target_ratio × threshold × context_length:
# 200K context, threshold 0.50, ratio 0.20 → 20K tokens of recent tail preserved
# 1M context, threshold 0.50, ratio 0.20 → 100K tokens of recent tail preserved
#
compression:
# Enable automatic context compression (default: true)
# Set to false if you prefer to manage context manually or want errors on overflow
enabled: true
# Trigger compression at this % of model's context limit (default: 0.85 = 85%)
# Trigger compression at this % of model's context limit (default: 0.50 = 50%)
# Lower values = more aggressive compression, higher values = compress later
threshold: 0.85
threshold: 0.50
# Fraction of the threshold to preserve as recent tail (default: 0.20 = 20%)
# e.g. 20% of 50% threshold = 10% of total context kept as recent messages.
# Summary output is separately capped at 12K tokens (Gemini output limit).
# Range: 0.10 - 0.80
target_ratio: 0.20
# Number of most-recent messages to always preserve (default: 20 ≈ 10 full turns)
# Higher values keep more recent conversation intact at the cost of more aggressive
# compression of older turns.
protect_last_n: 20
# Model to use for generating summaries (fast/cheap recommended)
# This model compresses the middle turns into a concise summary.
# IMPORTANT: it receives the full middle section of the conversation, so it
@@ -271,6 +330,9 @@ compression:
# vision:
# provider: "auto"
# model: "" # e.g. "google/gemini-2.5-flash", "openai/gpt-4o"
# timeout: 30 # LLM API call timeout (seconds)
# download_timeout: 30 # Image HTTP download timeout (seconds)
# # Increase for slow connections or self-hosted image servers
#
# # Web page scraping / summarization + browser page text extraction
# web_extract:
@@ -333,6 +395,25 @@ session_reset:
idle_minutes: 1440 # Inactivity timeout in minutes (default: 1440 = 24 hours)
at_hour: 4 # Daily reset hour, 0-23 local time (default: 4 AM)
# When true, group/channel chats use one session per participant when the platform
# provides a user ID. This is the secure default and prevents users in the same
# room from sharing context, interrupts, and token costs. Set false only if you
# explicitly want one shared "room brain" per group/channel.
group_sessions_per_user: true
# ─────────────────────────────────────────────────────────────────────────────
# Gateway Streaming
# ─────────────────────────────────────────────────────────────────────────────
# Stream tokens to messaging platforms in real-time. The bot sends a message
# on first token, then progressively edits it as more tokens arrive.
# Disabled by default — enable to try the streaming UX on Telegram/Discord/Slack.
streaming:
enabled: false
# transport: edit # "edit" = progressive editMessageText
# edit_interval: 0.3 # seconds between message edits
# buffer_threshold: 40 # chars before forcing an edit flush
# cursor: " ▉" # cursor shown during streaming
# =============================================================================
# Skills Configuration
# =============================================================================
@@ -345,6 +426,15 @@ skills:
# Set to 0 to disable.
creation_nudge_interval: 15
# External skill directories — share skills across tools/agents without
# copying them into ~/.hermes/skills/. Each path is expanded (~ and ${VAR})
# and resolved to an absolute path. External dirs are read-only: skill
# creation always writes to ~/.hermes/skills/. Local skills take precedence
# when names collide.
# external_dirs:
# - ~/.agents/skills
# - /home/shared/team-skills
# =============================================================================
# Agent Behavior
# =============================================================================
@@ -383,7 +473,7 @@ agent:
# Toolsets
# =============================================================================
# Control which tools the agent has access to.
# Use "all" to enable everything, or specify individual toolsets.
# Use `hermes tools` to interactively enable/disable tools per platform.
# =============================================================================
# Platform Toolsets (per-platform tool configuration)
@@ -455,7 +545,7 @@ platform_toolsets:
# skills_hub - skill_hub (search/install/manage from online registries — user-driven only)
# moa - mixture_of_agents (requires OPENROUTER_API_KEY)
# todo - todo (in-memory task planning, no deps)
# tts - text_to_speech (Edge TTS free, or ELEVENLABS/OPENAI key)
# tts - text_to_speech (Edge TTS free, or ELEVENLABS/OPENAI/MINIMAX key)
# cronjob - cronjob (create/list/update/pause/resume/run/remove scheduled tasks)
# rl - rl_list_environments, rl_start_training, etc. (requires TINKER_API_KEY)
#
@@ -484,7 +574,7 @@ platform_toolsets:
# todo - Task planning and tracking for multi-step work
# memory - Persistent memory across sessions (personal notes + user profile)
# session_search - Search and recall past conversations (FTS5 + Gemini Flash summarization)
# tts - Text-to-speech (Edge TTS free, ElevenLabs, OpenAI)
# tts - Text-to-speech (Edge TTS free, ElevenLabs, OpenAI, MiniMax)
# cronjob - Schedule and manage automated tasks (CLI-only)
# rl - RL training tools (Tinker-Atropos)
#
@@ -492,53 +582,11 @@ platform_toolsets:
# debugging - terminal + web + file (for troubleshooting)
# safe - web + vision + moa (no terminal access)
# -----------------------------------------------------------------------------
# OPTION 1: Enable all tools (default)
# -----------------------------------------------------------------------------
toolsets:
- all
# -----------------------------------------------------------------------------
# OPTION 2: Minimal - just web search and terminal
# Great for: Simple coding tasks, quick lookups
# -----------------------------------------------------------------------------
# toolsets:
# - web
# - terminal
# -----------------------------------------------------------------------------
# OPTION 3: Research mode - no execution capabilities
# Great for: Safe information gathering, research tasks
# -----------------------------------------------------------------------------
# toolsets:
# - web
# - vision
# - skills
# -----------------------------------------------------------------------------
# OPTION 4: Full automation - browser + terminal
# Great for: Web scraping, automation tasks, testing
# -----------------------------------------------------------------------------
# toolsets:
# - terminal
# - browser
# - web
# -----------------------------------------------------------------------------
# OPTION 5: Creative mode - vision + image generation
# Great for: Design work, image analysis, creative tasks
# -----------------------------------------------------------------------------
# toolsets:
# - vision
# - image_gen
# - web
# -----------------------------------------------------------------------------
# OPTION 6: Safe mode - no terminal or browser
# Great for: Restricted environments, untrusted queries
# -----------------------------------------------------------------------------
# toolsets:
# - safe
# NOTE: The top-level "toolsets" key is deprecated and ignored.
# Tool configuration is managed per-platform via platform_toolsets above.
# Use `hermes tools` to configure interactively, or edit platform_toolsets directly.
#
# CLI override: hermes chat --toolsets terminal,web,file
# =============================================================================
# MCP (Model Context Protocol) Servers
@@ -674,6 +722,12 @@ display:
# Toggle at runtime with /verbose in the CLI
tool_progress: all
# What Enter does when Hermes is already busy in the CLI.
# interrupt: Interrupt the current run and redirect Hermes (default)
# queue: Queue your message for the next turn
# Ctrl+C always interrupts regardless of this setting.
busy_input_mode: interrupt
# Background process notifications (gateway/messaging only).
# Controls how chatty the process watcher is when you use
# terminal(background=true, check_interval=...) from Telegram/Discord/etc.
@@ -694,6 +748,12 @@ display:
# Toggle at runtime with /reasoning show or /reasoning hide.
show_reasoning: false
# Stream tokens to the terminal as they arrive instead of waiting for the
# full response. The response box opens on first token and text appears
# line-by-line. Tool calls are still captured silently.
# Stream tokens to the terminal in real-time. Disable to wait for full responses.
streaming: true
# ───────────────────────────────────────────────────────────────────────────
# Skin / Theme
# ───────────────────────────────────────────────────────────────────────────
@@ -734,3 +794,35 @@ display:
# tool_prefix: "╎" # Tool output line prefix (default: ┊)
#
skin: default
# =============================================================================
# Model Aliases — short names for /model command
# =============================================================================
# Map short aliases to exact (model, provider, base_url) tuples.
# Used by /model tab completion and resolve_alias().
# Aliases are checked BEFORE the models.dev catalog, so they can route
# to endpoints not in the catalog (e.g. Ollama Cloud, local servers).
#
# model_aliases:
# opus:
# model: claude-opus-4-6
# provider: anthropic
# qwen:
# model: "qwen3.5:397b"
# provider: custom
# base_url: "https://ollama.com/v1"
# glm:
# model: glm-4.7
# provider: custom
# base_url: "https://ollama.com/v1"
# =============================================================================
# Privacy
# =============================================================================
# privacy:
# # Redact PII from the LLM context prompt.
# # When true, phone numbers are stripped and user/chat IDs are replaced
# # with deterministic hashes before being sent to the model.
# # Names and usernames are NOT affected (user-chosen, publicly visible).
# # Routing/delivery still uses the original values internally.
# redact_pii: false

3673
cli.py Executable file → Normal file

File diff suppressed because it is too large Load Diff

View File

@@ -7,7 +7,8 @@ This module provides scheduled task execution, allowing the agent to:
- Execute tasks in isolated sessions (no prior context)
Cron jobs are executed automatically by the gateway daemon:
hermes gateway install # Install as system service (recommended)
hermes gateway install # Install as a user service
sudo hermes gateway install --system # Linux servers: boot-time system service
hermes gateway # Or run in foreground
The gateway ticks the scheduler every 60 seconds. A file lock prevents

View File

@@ -5,15 +5,20 @@ Jobs are stored in ~/.hermes/cron/jobs.json
Output is saved to ~/.hermes/cron/output/{job_id}/{timestamp}.md
"""
import copy
import json
import logging
import tempfile
import os
import re
import uuid
from datetime import datetime, timedelta
from pathlib import Path
from hermes_constants import get_hermes_home
from typing import Optional, Dict, List, Any
logger = logging.getLogger(__name__)
from hermes_time import now as _hermes_now
try:
@@ -26,10 +31,11 @@ except ImportError:
# Configuration
# =============================================================================
HERMES_DIR = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
HERMES_DIR = get_hermes_home()
CRON_DIR = HERMES_DIR / "cron"
JOBS_FILE = CRON_DIR / "jobs.json"
OUTPUT_DIR = CRON_DIR / "output"
ONESHOT_GRACE_SECONDS = 120
def _normalize_skill_list(skill: Optional[str] = None, skills: Optional[Any] = None) -> List[str]:
@@ -164,6 +170,10 @@ def parse_schedule(schedule: str) -> Dict[str, Any]:
try:
# Parse and validate
dt = datetime.fromisoformat(schedule.replace('Z', '+00:00'))
# Make naive timestamps timezone-aware at parse time so the stored
# value doesn't depend on the system timezone matching at check time.
if dt.tzinfo is None:
dt = dt.astimezone() # Interpret as local timezone
return {
"kind": "once",
"run_at": dt.isoformat(),
@@ -212,6 +222,65 @@ def _ensure_aware(dt: datetime) -> datetime:
return dt.astimezone(target_tz)
def _recoverable_oneshot_run_at(
schedule: Dict[str, Any],
now: datetime,
*,
last_run_at: Optional[str] = None,
) -> Optional[str]:
"""Return a one-shot run time if it is still eligible to fire.
One-shot jobs get a small grace window so jobs created a few seconds after
their requested minute still run on the next tick. Once a one-shot has
already run, it is never eligible again.
"""
if schedule.get("kind") != "once":
return None
if last_run_at:
return None
run_at = schedule.get("run_at")
if not run_at:
return None
run_at_dt = _ensure_aware(datetime.fromisoformat(run_at))
if run_at_dt >= now - timedelta(seconds=ONESHOT_GRACE_SECONDS):
return run_at
return None
def _compute_grace_seconds(schedule: dict) -> int:
"""Compute how late a job can be and still catch up instead of fast-forwarding.
Uses half the schedule period, clamped between 120 seconds and 2 hours.
This ensures daily jobs can catch up if missed by up to 2 hours,
while frequent jobs (every 5-10 min) still fast-forward quickly.
"""
MIN_GRACE = 120
MAX_GRACE = 7200 # 2 hours
kind = schedule.get("kind")
if kind == "interval":
period_seconds = schedule.get("minutes", 1) * 60
grace = period_seconds // 2
return max(MIN_GRACE, min(grace, MAX_GRACE))
if kind == "cron" and HAS_CRONITER:
try:
now = _hermes_now()
cron = croniter(schedule["expr"], now)
first = cron.get_next(datetime)
second = cron.get_next(datetime)
period_seconds = int((second - first).total_seconds())
grace = period_seconds // 2
return max(MIN_GRACE, min(grace, MAX_GRACE))
except Exception:
pass
return MIN_GRACE
def compute_next_run(schedule: Dict[str, Any], last_run_at: Optional[str] = None) -> Optional[str]:
"""
Compute the next run time for a schedule.
@@ -221,9 +290,7 @@ def compute_next_run(schedule: Dict[str, Any], last_run_at: Optional[str] = None
now = _hermes_now()
if schedule["kind"] == "once":
run_at = _ensure_aware(datetime.fromisoformat(schedule["run_at"]))
# If in the future, return it; if in the past, no more runs
return schedule["run_at"] if run_at > now else None
return _recoverable_oneshot_run_at(schedule, now, last_run_at=last_run_at)
elif schedule["kind"] == "interval":
minutes = schedule["minutes"]
@@ -260,7 +327,20 @@ def load_jobs() -> List[Dict[str, Any]]:
with open(JOBS_FILE, 'r', encoding='utf-8') as f:
data = json.load(f)
return data.get("jobs", [])
except (json.JSONDecodeError, IOError):
except json.JSONDecodeError:
# Retry with strict=False to handle bare control chars in string values
try:
with open(JOBS_FILE, 'r', encoding='utf-8') as f:
data = json.loads(f.read(), strict=False)
jobs = data.get("jobs", [])
if jobs:
# Auto-repair: rewrite with proper escaping
save_jobs(jobs)
logger.warning("Auto-repaired jobs.json (had invalid control characters)")
return jobs
except Exception:
return []
except IOError:
return []
@@ -292,6 +372,10 @@ def create_job(
origin: Optional[Dict[str, Any]] = None,
skill: Optional[str] = None,
skills: Optional[List[str]] = None,
model: Optional[str] = None,
provider: Optional[str] = None,
base_url: Optional[str] = None,
script: Optional[str] = None,
) -> Dict[str, Any]:
"""
Create a new cron job.
@@ -305,12 +389,22 @@ def create_job(
origin: Source info where job was created (for "origin" delivery)
skill: Optional legacy single skill name to load before running the prompt
skills: Optional ordered list of skills to load before running the prompt
model: Optional per-job model override
provider: Optional per-job provider override
base_url: Optional per-job base URL override
script: Optional path to a Python script whose stdout is injected into the
prompt each run. The script runs before the agent turn, and its output
is prepended as context. Useful for data collection / change detection.
Returns:
The created job dict
"""
parsed_schedule = parse_schedule(schedule)
# Normalize repeat: treat 0 or negative values as None (infinite)
if repeat is not None and repeat <= 0:
repeat = None
# Auto-set repeat=1 for one-shot schedules if not specified
if parsed_schedule["kind"] == "once" and repeat is None:
repeat = 1
@@ -323,6 +417,15 @@ def create_job(
now = _hermes_now().isoformat()
normalized_skills = _normalize_skill_list(skill, skills)
normalized_model = str(model).strip() if isinstance(model, str) else None
normalized_provider = str(provider).strip() if isinstance(provider, str) else None
normalized_base_url = str(base_url).strip().rstrip("/") if isinstance(base_url, str) else None
normalized_model = normalized_model or None
normalized_provider = normalized_provider or None
normalized_base_url = normalized_base_url or None
normalized_script = str(script).strip() if isinstance(script, str) else None
normalized_script = normalized_script or None
label_source = (prompt or (normalized_skills[0] if normalized_skills else None)) or "cron job"
job = {
"id": job_id,
@@ -330,6 +433,10 @@ def create_job(
"prompt": prompt,
"skills": normalized_skills,
"skill": normalized_skills[0] if normalized_skills else None,
"model": normalized_model,
"provider": normalized_provider,
"base_url": normalized_base_url,
"script": normalized_script,
"schedule": parsed_schedule,
"schedule_display": parsed_schedule.get("display", schedule),
"repeat": {
@@ -489,7 +596,7 @@ def mark_job_run(job_id: str, success: bool, error: Optional[str] = None):
# Check if we've hit the repeat limit
times = job["repeat"].get("times")
completed = job["repeat"]["completed"]
if times is not None and completed >= times:
if times is not None and times > 0 and completed >= times:
# Remove the job (limit reached)
jobs.pop(i)
save_jobs(jobs)
@@ -511,11 +618,47 @@ def mark_job_run(job_id: str, success: bool, error: Optional[str] = None):
save_jobs(jobs)
def advance_next_run(job_id: str) -> bool:
"""Preemptively advance next_run_at for a recurring job before execution.
Call this BEFORE run_job() so that if the process crashes mid-execution,
the job won't re-fire on the next gateway restart. This converts the
scheduler from at-least-once to at-most-once for recurring jobs — missing
one run is far better than firing dozens of times in a crash loop.
One-shot jobs are left unchanged so they can still retry on restart.
Returns True if next_run_at was advanced, False otherwise.
"""
jobs = load_jobs()
for job in jobs:
if job["id"] == job_id:
kind = job.get("schedule", {}).get("kind")
if kind not in ("cron", "interval"):
return False
now = _hermes_now().isoformat()
new_next = compute_next_run(job["schedule"], now)
if new_next and new_next != job.get("next_run_at"):
job["next_run_at"] = new_next
save_jobs(jobs)
return True
return False
return False
def get_due_jobs() -> List[Dict[str, Any]]:
"""Get all jobs that are due to run now."""
"""Get all jobs that are due to run now.
For recurring jobs (cron/interval), if the scheduled time is stale
(more than one period in the past, e.g. because the gateway was down),
the job is fast-forwarded to the next future run instead of firing
immediately. This prevents a burst of missed jobs on gateway restart.
"""
now = _hermes_now()
jobs = [_apply_skill_fields(j) for j in load_jobs()]
raw_jobs = load_jobs()
jobs = [_apply_skill_fields(j) for j in copy.deepcopy(raw_jobs)]
due = []
needs_save = False
for job in jobs:
if not job.get("enabled", True):
@@ -523,12 +666,62 @@ def get_due_jobs() -> List[Dict[str, Any]]:
next_run = job.get("next_run_at")
if not next_run:
continue
recovered_next = _recoverable_oneshot_run_at(
job.get("schedule", {}),
now,
last_run_at=job.get("last_run_at"),
)
if not recovered_next:
continue
job["next_run_at"] = recovered_next
next_run = recovered_next
logger.info(
"Job '%s' had no next_run_at; recovering one-shot run at %s",
job.get("name", job["id"]),
recovered_next,
)
for rj in raw_jobs:
if rj["id"] == job["id"]:
rj["next_run_at"] = recovered_next
needs_save = True
break
next_run_dt = _ensure_aware(datetime.fromisoformat(next_run))
if next_run_dt <= now:
schedule = job.get("schedule", {})
kind = schedule.get("kind")
# For recurring jobs, check if the scheduled time is stale
# (gateway was down and missed the window). Fast-forward to
# the next future occurrence instead of firing a stale run.
grace = _compute_grace_seconds(schedule)
if kind in ("cron", "interval") and (now - next_run_dt).total_seconds() > grace:
# Job is past its catch-up grace window — this is a stale missed run.
# Grace scales with schedule period: daily=2h, hourly=30m, 10min=5m.
new_next = compute_next_run(schedule, now.isoformat())
if new_next:
logger.info(
"Job '%s' missed its scheduled time (%s, grace=%ds). "
"Fast-forwarding to next run: %s",
job.get("name", job["id"]),
next_run,
grace,
new_next,
)
# Update the job in storage
for rj in raw_jobs:
if rj["id"] == job["id"]:
rj["next_run_at"] = new_next
needs_save = True
break
continue # Skip this run
due.append(job)
if needs_save:
save_jobs(raw_jobs)
return due

View File

@@ -9,11 +9,12 @@ runs at a time if multiple processes overlap.
"""
import asyncio
import concurrent.futures
import json
import logging
import os
import subprocess
import sys
import traceback
# fcntl is Unix-only; on Windows use msvcrt for file locking
try:
@@ -24,21 +25,35 @@ except ImportError:
import msvcrt
except ImportError:
msvcrt = None
from datetime import datetime
from pathlib import Path
from hermes_constants import get_hermes_home
from hermes_cli.config import load_config
from typing import Optional
from hermes_time import now as _hermes_now
logger = logging.getLogger(__name__)
# Valid delivery platforms — used to validate user-supplied platform names
# in cron delivery targets, preventing env var enumeration via crafted names.
_KNOWN_DELIVERY_PLATFORMS = frozenset({
"telegram", "discord", "slack", "whatsapp", "signal",
"matrix", "mattermost", "homeassistant", "dingtalk", "feishu",
"wecom", "sms", "email", "webhook",
})
# Add parent directory to path for imports
sys.path.insert(0, str(Path(__file__).parent.parent))
from cron.jobs import get_due_jobs, mark_job_run, save_job_output
from cron.jobs import get_due_jobs, mark_job_run, save_job_output, advance_next_run
# Sentinel: when a cron agent has nothing new to report, it can start its
# response with this marker to suppress delivery. Output is still saved
# locally for audit.
SILENT_MARKER = "[SILENT]"
# Resolve Hermes home directory (respects HERMES_HOME override)
_hermes_home = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
_hermes_home = get_hermes_home()
# File-based lock prevents concurrent ticks from gateway + daemon + systemd timer
_LOCK_DIR = _hermes_home / "cron"
@@ -66,20 +81,58 @@ def _resolve_delivery_target(job: dict) -> Optional[dict]:
return None
if deliver == "origin":
if not origin:
return None
return {
"platform": origin["platform"],
"chat_id": str(origin["chat_id"]),
"thread_id": origin.get("thread_id"),
}
if origin:
return {
"platform": origin["platform"],
"chat_id": str(origin["chat_id"]),
"thread_id": origin.get("thread_id"),
}
# Origin missing (e.g. job created via API/script) — try each
# platform's home channel as a fallback instead of silently dropping.
for platform_name in ("matrix", "telegram", "discord", "slack"):
chat_id = os.getenv(f"{platform_name.upper()}_HOME_CHANNEL", "")
if chat_id:
logger.info(
"Job '%s' has deliver=origin but no origin; falling back to %s home channel",
job.get("name", job.get("id", "?")),
platform_name,
)
return {
"platform": platform_name,
"chat_id": chat_id,
"thread_id": None,
}
return None
if ":" in deliver:
platform_name, chat_id = deliver.split(":", 1)
platform_name, rest = deliver.split(":", 1)
platform_key = platform_name.lower()
from tools.send_message_tool import _parse_target_ref
parsed_chat_id, parsed_thread_id, is_explicit = _parse_target_ref(platform_key, rest)
if is_explicit:
chat_id, thread_id = parsed_chat_id, parsed_thread_id
else:
chat_id, thread_id = rest, None
# Resolve human-friendly labels like "Alice (dm)" to real IDs.
try:
from gateway.channel_directory import resolve_channel_name
resolved = resolve_channel_name(platform_key, chat_id)
if resolved:
parsed_chat_id, parsed_thread_id, resolved_is_explicit = _parse_target_ref(platform_key, resolved)
if resolved_is_explicit:
chat_id, thread_id = parsed_chat_id, parsed_thread_id
else:
chat_id = resolved
except Exception:
pass
return {
"platform": platform_name,
"chat_id": chat_id,
"thread_id": None,
"thread_id": thread_id,
}
platform_name = deliver
@@ -90,6 +143,8 @@ def _resolve_delivery_target(job: dict) -> Optional[dict]:
"thread_id": origin.get("thread_id"),
}
if platform_name.lower() not in _KNOWN_DELIVERY_PLATFORMS:
return None
chat_id = os.getenv(f"{platform_name.upper()}_HOME_CHANNEL", "")
if not chat_id:
return None
@@ -101,12 +156,14 @@ def _resolve_delivery_target(job: dict) -> Optional[dict]:
}
def _deliver_result(job: dict, content: str) -> None:
def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> None:
"""
Deliver job output to the configured target (origin chat, specific platform, etc.).
Uses the standalone platform send functions from send_message_tool so delivery
works whether or not the gateway is running.
When ``adapters`` and ``loop`` are provided (gateway is running), tries to
use the live adapter first — this supports E2EE rooms (e.g. Matrix) where
the standalone HTTP path cannot encrypt. Falls back to standalone send if
the adapter path fails or is unavailable.
"""
target = _resolve_delivery_target(job)
if not target:
@@ -131,7 +188,14 @@ def _deliver_result(job: dict, content: str) -> None:
"slack": Platform.SLACK,
"whatsapp": Platform.WHATSAPP,
"signal": Platform.SIGNAL,
"matrix": Platform.MATRIX,
"mattermost": Platform.MATTERMOST,
"homeassistant": Platform.HOMEASSISTANT,
"dingtalk": Platform.DINGTALK,
"feishu": Platform.FEISHU,
"wecom": Platform.WECOM,
"email": Platform.EMAIL,
"sms": Platform.SMS,
}
platform = platform_map.get(platform_name.lower())
if not platform:
@@ -149,15 +213,66 @@ def _deliver_result(job: dict, content: str) -> None:
logger.warning("Job '%s': platform '%s' not configured/enabled", job["id"], platform_name)
return
# Run the async send in a fresh event loop (safe from any thread)
# Optionally wrap the content with a header/footer so the user knows this
# is a cron delivery. Wrapping is on by default; set cron.wrap_response: false
# in config.yaml for clean output.
wrap_response = True
try:
result = asyncio.run(_send_to_platform(platform, pconfig, chat_id, content, thread_id=thread_id))
user_cfg = load_config()
wrap_response = user_cfg.get("cron", {}).get("wrap_response", True)
except Exception:
pass
if wrap_response:
task_name = job.get("name", job["id"])
delivery_content = (
f"Cronjob Response: {task_name}\n"
f"-------------\n\n"
f"{content}\n\n"
f"Note: The agent cannot see this message, and therefore cannot respond to it."
)
else:
delivery_content = content
# Prefer the live adapter when the gateway is running — this supports E2EE
# rooms (e.g. Matrix) where the standalone HTTP path cannot encrypt.
runtime_adapter = (adapters or {}).get(platform)
if runtime_adapter is not None and loop is not None and getattr(loop, "is_running", lambda: False)():
send_metadata = {"thread_id": thread_id} if thread_id else None
try:
future = asyncio.run_coroutine_threadsafe(
runtime_adapter.send(chat_id, delivery_content, metadata=send_metadata),
loop,
)
send_result = future.result(timeout=60)
if send_result and not getattr(send_result, "success", True):
err = getattr(send_result, "error", "unknown")
logger.warning(
"Job '%s': live adapter send to %s:%s failed (%s), falling back to standalone",
job["id"], platform_name, chat_id, err,
)
else:
logger.info("Job '%s': delivered to %s:%s via live adapter", job["id"], platform_name, chat_id)
return
except Exception as e:
logger.warning(
"Job '%s': live adapter delivery to %s:%s failed (%s), falling back to standalone",
job["id"], platform_name, chat_id, e,
)
# Standalone path: run the async send in a fresh event loop (safe from any thread)
coro = _send_to_platform(platform, pconfig, chat_id, delivery_content, thread_id=thread_id)
try:
result = asyncio.run(coro)
except RuntimeError:
# asyncio.run() fails if there's already a running loop in this thread;
# spin up a new thread to avoid that.
# asyncio.run() checks for a running loop before awaiting the coroutine;
# when it raises, the original coro was never started — close it to
# prevent "coroutine was never awaited" RuntimeWarning, then retry in a
# fresh thread that has no running loop.
coro.close()
import concurrent.futures
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
future = pool.submit(asyncio.run, _send_to_platform(platform, pconfig, chat_id, content, thread_id=thread_id))
future = pool.submit(asyncio.run, _send_to_platform(platform, pconfig, chat_id, delivery_content, thread_id=thread_id))
result = future.result(timeout=30)
except Exception as e:
logger.error("Job '%s': delivery to %s:%s failed: %s", job["id"], platform_name, chat_id, e)
@@ -167,18 +282,115 @@ def _deliver_result(job: dict, content: str) -> None:
logger.error("Job '%s': delivery error: %s", job["id"], result["error"])
else:
logger.info("Job '%s': delivered to %s:%s", job["id"], platform_name, chat_id)
# Mirror the delivered content into the target's gateway session
_SCRIPT_TIMEOUT = 120 # seconds
def _run_job_script(script_path: str) -> tuple[bool, str]:
"""Execute a cron job's data-collection script and capture its output.
Args:
script_path: Path to a Python script (resolved via HERMES_HOME/scripts/ or absolute).
Returns:
(success, output) — on failure *output* contains the error message so the
LLM can report the problem to the user.
"""
from hermes_constants import get_hermes_home
path = Path(script_path).expanduser()
if not path.is_absolute():
# Resolve relative paths against HERMES_HOME/scripts/
scripts_dir = get_hermes_home() / "scripts"
path = (scripts_dir / path).resolve()
# Guard against path traversal (e.g. "../../etc/passwd")
try:
from gateway.mirror import mirror_to_session
mirror_to_session(platform_name, chat_id, content, source_label="cron", thread_id=thread_id)
except Exception as e:
logger.warning("Job '%s': mirror_to_session failed: %s", job["id"], e)
path.relative_to(scripts_dir.resolve())
except ValueError:
return False, f"Script path escapes the scripts directory: {script_path!r}"
if not path.exists():
return False, f"Script not found: {path}"
if not path.is_file():
return False, f"Script path is not a file: {path}"
try:
result = subprocess.run(
[sys.executable, str(path)],
capture_output=True,
text=True,
timeout=_SCRIPT_TIMEOUT,
cwd=str(path.parent),
)
stdout = (result.stdout or "").strip()
stderr = (result.stderr or "").strip()
if result.returncode != 0:
parts = [f"Script exited with code {result.returncode}"]
if stderr:
parts.append(f"stderr:\n{stderr}")
if stdout:
parts.append(f"stdout:\n{stdout}")
return False, "\n".join(parts)
# Redact any secrets that may appear in script output before
# they are injected into the LLM prompt context.
try:
from agent.redact import redact_sensitive_text
stdout = redact_sensitive_text(stdout)
except Exception:
pass
return True, stdout
except subprocess.TimeoutExpired:
return False, f"Script timed out after {_SCRIPT_TIMEOUT}s: {path}"
except Exception as exc:
return False, f"Script execution failed: {exc}"
def _build_job_prompt(job: dict) -> str:
"""Build the effective prompt for a cron job, optionally loading one or more skills first."""
prompt = job.get("prompt", "")
skills = job.get("skills")
# Run data-collection script if configured, inject output as context.
script_path = job.get("script")
if script_path:
success, script_output = _run_job_script(script_path)
if success:
if script_output:
prompt = (
"## Script Output\n"
"The following data was collected by a pre-run script. "
"Use it as context for your analysis.\n\n"
f"```\n{script_output}\n```\n\n"
f"{prompt}"
)
else:
prompt = (
"[Script ran successfully but produced no output.]\n\n"
f"{prompt}"
)
else:
prompt = (
"## Script Error\n"
"The data-collection script failed. Report this to the user.\n\n"
f"```\n{script_output}\n```\n\n"
f"{prompt}"
)
# Always prepend [SILENT] guidance so the cron agent can suppress
# delivery when it has nothing new or noteworthy to report.
silent_hint = (
"[SYSTEM: If you have a meaningful status report or findings, "
"send them — that is the whole point of this job. Only respond "
"with exactly \"[SILENT]\" (nothing else) when there is genuinely "
"nothing new to report. [SILENT] suppresses delivery to the user. "
"Never combine [SILENT] with content — either report your "
"findings normally, or say [SILENT] and nothing more.]\n\n"
)
prompt = silent_hint + prompt
if skills is None:
legacy = job.get("skill")
skills = [legacy] if legacy else []
@@ -190,11 +402,14 @@ def _build_job_prompt(job: dict) -> str:
from tools.skills_tool import skill_view
parts = []
skipped: list[str] = []
for skill_name in skill_names:
loaded = json.loads(skill_view(skill_name))
if not loaded.get("success"):
error = loaded.get("error") or f"Failed to load skill '{skill_name}'"
raise RuntimeError(error)
logger.warning("Cron job '%s': skill not found, skipping — %s", job.get("name", job.get("id")), error)
skipped.append(skill_name)
continue
content = str(loaded.get("content") or "").strip()
if parts:
@@ -207,6 +422,15 @@ def _build_job_prompt(job: dict) -> str:
]
)
if skipped:
notice = (
f"[SYSTEM: The following skill(s) were listed for this job but could not be found "
f"and were skipped: {', '.join(skipped)}. "
f"Start your response with a brief notice so the user is aware, e.g.: "
f"'⚠️ Skill(s) not found and skipped: {', '.join(skipped)}']"
)
parts.insert(0, notice)
if prompt:
parts.extend(["", f"The user has provided the following instruction alongside the skill invocation: {prompt}"])
return "\n".join(parts)
@@ -234,6 +458,7 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
job_name = job["name"]
prompt = _build_job_prompt(job)
origin = _resolve_origin(job)
_cron_session_id = f"cron_{job_id}_{_hermes_now().strftime('%Y%m%d_%H%M%S')}"
logger.info("Running job '%s' (ID: %s)", job_name, job_id)
logger.info("Prompt: %s", prompt[:100])
@@ -261,7 +486,7 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
if delivery_target.get("thread_id") is not None:
os.environ["HERMES_CRON_AUTO_DELIVER_THREAD_ID"] = str(delivery_target["thread_id"])
model = os.getenv("HERMES_MODEL") or "anthropic/claude-opus-4.6"
model = job.get("model") or os.getenv("HERMES_MODEL") or ""
# Load config.yaml for model, reasoning, prefill, toolsets, provider routing
_cfg = {}
@@ -272,24 +497,20 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
with open(_cfg_path) as _f:
_cfg = yaml.safe_load(_f) or {}
_model_cfg = _cfg.get("model", {})
if isinstance(_model_cfg, str):
model = _model_cfg
elif isinstance(_model_cfg, dict):
model = _model_cfg.get("default", model)
if not job.get("model"):
if isinstance(_model_cfg, str):
model = _model_cfg
elif isinstance(_model_cfg, dict):
model = _model_cfg.get("default", model)
except Exception as e:
logger.warning("Job '%s': failed to load config.yaml, using defaults: %s", job_id, e)
# Reasoning config from env or config.yaml
reasoning_config = None
from hermes_constants import parse_reasoning_effort
effort = os.getenv("HERMES_REASONING_EFFORT", "")
if not effort:
effort = str(_cfg.get("agent", {}).get("reasoning_effort", "")).strip()
if effort and effort.lower() != "none":
valid = ("xhigh", "high", "medium", "low", "minimal")
if effort.lower() in valid:
reasoning_config = {"enabled": True, "effort": effort.lower()}
elif effort.lower() == "none":
reasoning_config = {"enabled": False}
reasoning_config = parse_reasoning_effort(effort)
# Prefill messages from env or config.yaml
prefill_messages = None
@@ -314,25 +535,46 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
# Provider routing
pr = _cfg.get("provider_routing", {})
smart_routing = _cfg.get("smart_model_routing", {}) or {}
from hermes_cli.runtime_provider import (
resolve_runtime_provider,
format_runtime_provider_error,
)
try:
runtime = resolve_runtime_provider(
requested=os.getenv("HERMES_INFERENCE_PROVIDER"),
)
runtime_kwargs = {
"requested": job.get("provider") or os.getenv("HERMES_INFERENCE_PROVIDER"),
}
if job.get("base_url"):
runtime_kwargs["explicit_base_url"] = job.get("base_url")
runtime = resolve_runtime_provider(**runtime_kwargs)
except Exception as exc:
message = format_runtime_provider_error(exc)
raise RuntimeError(message) from exc
from agent.smart_model_routing import resolve_turn_route
turn_route = resolve_turn_route(
prompt,
smart_routing,
{
"model": model,
"api_key": runtime.get("api_key"),
"base_url": runtime.get("base_url"),
"provider": runtime.get("provider"),
"api_mode": runtime.get("api_mode"),
"command": runtime.get("command"),
"args": list(runtime.get("args") or []),
},
)
agent = AIAgent(
model=model,
api_key=runtime.get("api_key"),
base_url=runtime.get("base_url"),
provider=runtime.get("provider"),
api_mode=runtime.get("api_mode"),
model=turn_route["model"],
api_key=turn_route["runtime"].get("api_key"),
base_url=turn_route["runtime"].get("base_url"),
provider=turn_route["runtime"].get("provider"),
api_mode=turn_route["runtime"].get("api_mode"),
acp_command=turn_route["runtime"].get("command"),
acp_args=turn_route["runtime"].get("args"),
max_iterations=max_iterations,
reasoning_config=reasoning_config,
prefill_messages=prefill_messages,
@@ -340,18 +582,42 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
providers_ignored=pr.get("ignore"),
providers_order=pr.get("order"),
provider_sort=pr.get("sort"),
disabled_toolsets=["cronjob"],
disabled_toolsets=["cronjob", "messaging", "clarify"],
quiet_mode=True,
skip_memory=True, # Cron system prompts would corrupt user representations
platform="cron",
session_id=f"cron_{job_id}_{_hermes_now().strftime('%Y%m%d_%H%M%S')}",
session_id=_cron_session_id,
session_db=_session_db,
)
result = agent.run_conversation(prompt)
final_response = result.get("final_response", "")
if not final_response:
final_response = "(No response generated)"
# Run the agent with a timeout so a hung API call or tool doesn't
# block the cron ticker thread indefinitely. Default 10 minutes;
# override via env var. Uses a separate thread because
# run_conversation is synchronous.
_cron_timeout = float(os.getenv("HERMES_CRON_TIMEOUT", 600))
_cron_pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
_cron_future = _cron_pool.submit(agent.run_conversation, prompt)
try:
result = _cron_future.result(timeout=_cron_timeout)
except concurrent.futures.TimeoutError:
logger.error(
"Job '%s' timed out after %.0fs — interrupting agent",
job_name, _cron_timeout,
)
if hasattr(agent, "interrupt"):
agent.interrupt("Cron job timed out")
_cron_pool.shutdown(wait=False, cancel_futures=True)
raise TimeoutError(
f"Cron job '{job_name}' timed out after "
f"{int(_cron_timeout // 60)} minutes"
)
finally:
_cron_pool.shutdown(wait=False)
final_response = result.get("final_response", "") or ""
# Use a separate variable for log display; keep final_response clean
# for delivery logic (empty response = no delivery).
logged_response = final_response if final_response else "(No response generated)"
output = f"""# Cron Job: {job_name}
@@ -365,7 +631,7 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
## Response
{final_response}
{logged_response}
"""
logger.info("Job '%s' completed successfully", job_name)
@@ -373,7 +639,7 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
except Exception as e:
error_msg = f"{type(e).__name__}: {str(e)}"
logger.error("Job '%s' failed: %s", job_name, error_msg)
logger.exception("Job '%s' failed: %s", job_name, error_msg)
output = f"""# Cron Job: {job_name} (FAILED)
@@ -389,8 +655,6 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
```
{error_msg}
{traceback.format_exc()}
```
"""
return False, output, "", error_msg
@@ -407,13 +671,17 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
):
os.environ.pop(key, None)
if _session_db:
try:
_session_db.end_session(_cron_session_id, "cron_complete")
except (Exception, KeyboardInterrupt) as e:
logger.debug("Job '%s': failed to end session: %s", job_id, e)
try:
_session_db.close()
except Exception as e:
except (Exception, KeyboardInterrupt) as e:
logger.debug("Job '%s': failed to close SQLite session store: %s", job_id, e)
def tick(verbose: bool = True) -> int:
def tick(verbose: bool = True, adapters=None, loop=None) -> int:
"""
Check and run all due jobs.
@@ -422,6 +690,8 @@ def tick(verbose: bool = True) -> int:
Args:
verbose: Whether to print status messages
adapters: Optional dict mapping Platform → live adapter (from gateway)
loop: Optional asyncio event loop (from gateway) for live adapter sends
Returns:
Number of jobs executed (0 if another tick is already running)
@@ -455,17 +725,30 @@ def tick(verbose: bool = True) -> int:
executed = 0
for job in due_jobs:
try:
# For recurring jobs (cron/interval), advance next_run_at to the
# next future occurrence BEFORE execution. This way, if the
# process crashes mid-run, the job won't re-fire on restart.
# One-shot jobs are left alone so they can retry on restart.
advance_next_run(job["id"])
success, output, final_response, error = run_job(job)
output_file = save_job_output(job["id"], output)
if verbose:
logger.info("Output saved to: %s", output_file)
# Deliver the final response to the origin/target chat
# Deliver the final response to the origin/target chat.
# If the agent responded with [SILENT], skip delivery (but
# output is already saved above). Failed jobs always deliver.
deliver_content = final_response if success else f"⚠️ Cron job '{job.get('name', job['id'])}' failed:\n{error}"
if deliver_content:
should_deliver = bool(deliver_content)
if should_deliver and success and deliver_content.strip().upper().startswith(SILENT_MARKER):
logger.info("Job '%s': agent returned %s — skipping delivery", job["id"], SILENT_MARKER)
should_deliver = False
if should_deliver:
try:
_deliver_result(job, deliver_content)
_deliver_result(job, deliver_content, adapters=adapters, loop=loop)
except Exception as de:
logger.error("Delivery failed for job %s: %s", job["id"], de)

15
docker/SOUL.md Normal file
View File

@@ -0,0 +1,15 @@
# Hermes Agent Persona
<!--
This file defines the agent's personality and tone.
The agent will embody whatever you write here.
Edit this to customize how Hermes communicates with you.
Examples:
- "You are a warm, playful assistant who uses kaomoji occasionally."
- "You are a concise technical expert. No fluff, just facts."
- "You speak like a friendly coworker who happens to know everything."
This file is loaded fresh each message -- no restart needed.
Delete the contents (or this file) to use the default personality.
-->

34
docker/entrypoint.sh Normal file
View File

@@ -0,0 +1,34 @@
#!/bin/bash
# Docker entrypoint: bootstrap config files into the mounted volume, then run hermes.
set -e
HERMES_HOME="/opt/data"
INSTALL_DIR="/opt/hermes"
# Create essential directory structure. Cache and platform directories
# (cache/images, cache/audio, platforms/whatsapp, etc.) are created on
# demand by the application — don't pre-create them here so new installs
# get the consolidated layout from get_hermes_dir().
mkdir -p "$HERMES_HOME"/{cron,sessions,logs,hooks,memories,skills}
# .env
if [ ! -f "$HERMES_HOME/.env" ]; then
cp "$INSTALL_DIR/.env.example" "$HERMES_HOME/.env"
fi
# config.yaml
if [ ! -f "$HERMES_HOME/config.yaml" ]; then
cp "$INSTALL_DIR/cli-config.yaml.example" "$HERMES_HOME/config.yaml"
fi
# SOUL.md
if [ ! -f "$HERMES_HOME/SOUL.md" ]; then
cp "$INSTALL_DIR/docker/SOUL.md" "$HERMES_HOME/SOUL.md"
fi
# Sync bundled skills (manifest-based so user edits are preserved)
if [ -d "$INSTALL_DIR/skills" ]; then
python3 "$INSTALL_DIR/tools/skills_sync.py"
fi
exec hermes "$@"

View File

@@ -76,14 +76,13 @@ Open Zed settings (`Cmd+,` on macOS or `Ctrl+,` on Linux) and add to your
```json
{
"acp": {
"agents": [
{
"name": "hermes-agent",
"registry_dir": "/path/to/hermes-agent/acp_registry"
}
]
}
"agent_servers": {
"hermes-agent": {
"type": "custom",
"command": "hermes",
"args": ["acp"],
},
},
}
```

View File

@@ -0,0 +1,608 @@
# Pricing Accuracy Architecture
Date: 2026-03-16
## Goal
Hermes should only show dollar costs when they are backed by an official source for the user's actual billing path.
This design replaces the current static, heuristic pricing flow in:
- `run_agent.py`
- `agent/usage_pricing.py`
- `agent/insights.py`
- `cli.py`
with a provider-aware pricing system that:
- handles cache billing correctly
- distinguishes `actual` vs `estimated` vs `included` vs `unknown`
- reconciles post-hoc costs when providers expose authoritative billing data
- supports direct providers, OpenRouter, subscriptions, enterprise pricing, and custom endpoints
## Problems In The Current Design
Current Hermes behavior has four structural issues:
1. It stores only `prompt_tokens` and `completion_tokens`, which is insufficient for providers that bill cache reads and cache writes separately.
2. It uses a static model price table and fuzzy heuristics, which can drift from current official pricing.
3. It assumes public API list pricing matches the user's real billing path.
4. It has no distinction between live estimates and reconciled billed cost.
## Design Principles
1. Normalize usage before pricing.
2. Never fold cached tokens into plain input cost.
3. Track certainty explicitly.
4. Treat the billing path as part of the model identity.
5. Prefer official machine-readable sources over scraped docs.
6. Use post-hoc provider cost APIs when available.
7. Show `n/a` rather than inventing precision.
## High-Level Architecture
The new system has four layers:
1. `usage_normalization`
Converts raw provider usage into a canonical usage record.
2. `pricing_source_resolution`
Determines the billing path, source of truth, and applicable pricing source.
3. `cost_estimation_and_reconciliation`
Produces an immediate estimate when possible, then replaces or annotates it with actual billed cost later.
4. `presentation`
`/usage`, `/insights`, and the status bar display cost with certainty metadata.
## Canonical Usage Record
Add a canonical usage model that every provider path maps into before any pricing math happens.
Suggested structure:
```python
@dataclass
class CanonicalUsage:
provider: str
billing_provider: str
model: str
billing_route: str
input_tokens: int = 0
output_tokens: int = 0
cache_read_tokens: int = 0
cache_write_tokens: int = 0
reasoning_tokens: int = 0
request_count: int = 1
raw_usage: dict[str, Any] | None = None
raw_usage_fields: dict[str, str] | None = None
computed_fields: set[str] | None = None
provider_request_id: str | None = None
provider_generation_id: str | None = None
provider_response_id: str | None = None
```
Rules:
- `input_tokens` means non-cached input only.
- `cache_read_tokens` and `cache_write_tokens` are never merged into `input_tokens`.
- `output_tokens` excludes cache metrics.
- `reasoning_tokens` is telemetry unless a provider officially bills it separately.
This is the same normalization pattern used by `opencode`, extended with provenance and reconciliation ids.
## Provider Normalization Rules
### OpenAI Direct
Source usage fields:
- `prompt_tokens`
- `completion_tokens`
- `prompt_tokens_details.cached_tokens`
Normalization:
- `cache_read_tokens = cached_tokens`
- `input_tokens = prompt_tokens - cached_tokens`
- `cache_write_tokens = 0` unless OpenAI exposes it in the relevant route
- `output_tokens = completion_tokens`
### Anthropic Direct
Source usage fields:
- `input_tokens`
- `output_tokens`
- `cache_read_input_tokens`
- `cache_creation_input_tokens`
Normalization:
- `input_tokens = input_tokens`
- `output_tokens = output_tokens`
- `cache_read_tokens = cache_read_input_tokens`
- `cache_write_tokens = cache_creation_input_tokens`
### OpenRouter
Estimate-time usage normalization should use the response usage payload with the same rules as the underlying provider when possible.
Reconciliation-time records should also store:
- OpenRouter generation id
- native token fields when available
- `total_cost`
- `cache_discount`
- `upstream_inference_cost`
- `is_byok`
### Gemini / Vertex
Use official Gemini or Vertex usage fields where available.
If cached content tokens are exposed:
- map them to `cache_read_tokens`
If a route exposes no cache creation metric:
- store `cache_write_tokens = 0`
- preserve the raw usage payload for later extension
### DeepSeek And Other Direct Providers
Normalize only the fields that are officially exposed.
If a provider does not expose cache buckets:
- do not infer them unless the provider explicitly documents how to derive them
### Subscription / Included-Cost Routes
These still use the canonical usage model.
Tokens are tracked normally. Cost depends on billing mode, not on whether usage exists.
## Billing Route Model
Hermes must stop keying pricing solely by `model`.
Introduce a billing route descriptor:
```python
@dataclass
class BillingRoute:
provider: str
base_url: str | None
model: str
billing_mode: str
organization_hint: str | None = None
```
`billing_mode` values:
- `official_cost_api`
- `official_generation_api`
- `official_models_api`
- `official_docs_snapshot`
- `subscription_included`
- `user_override`
- `custom_contract`
- `unknown`
Examples:
- OpenAI direct API with Costs API access: `official_cost_api`
- Anthropic direct API with Usage & Cost API access: `official_cost_api`
- OpenRouter request before reconciliation: `official_models_api`
- OpenRouter request after generation lookup: `official_generation_api`
- GitHub Copilot style subscription route: `subscription_included`
- local OpenAI-compatible server: `unknown`
- enterprise contract with configured rates: `custom_contract`
## Cost Status Model
Every displayed cost should have:
```python
@dataclass
class CostResult:
amount_usd: Decimal | None
status: Literal["actual", "estimated", "included", "unknown"]
source: Literal[
"provider_cost_api",
"provider_generation_api",
"provider_models_api",
"official_docs_snapshot",
"user_override",
"custom_contract",
"none",
]
label: str
fetched_at: datetime | None
pricing_version: str | None
notes: list[str]
```
Presentation rules:
- `actual`: show dollar amount as final
- `estimated`: show dollar amount with estimate labeling
- `included`: show `included` or `$0.00 (included)` depending on UX choice
- `unknown`: show `n/a`
## Official Source Hierarchy
Resolve cost using this order:
1. Request-level or account-level official billed cost
2. Official machine-readable model pricing
3. Official docs snapshot
4. User override or custom contract
5. Unknown
The system must never skip to a lower level if a higher-confidence source exists for the current billing route.
## Provider-Specific Truth Rules
### OpenAI Direct
Preferred truth:
1. Costs API for reconciled spend
2. Official pricing page for live estimate
### Anthropic Direct
Preferred truth:
1. Usage & Cost API for reconciled spend
2. Official pricing docs for live estimate
### OpenRouter
Preferred truth:
1. `GET /api/v1/generation` for reconciled `total_cost`
2. `GET /api/v1/models` pricing for live estimate
Do not use underlying provider public pricing as the source of truth for OpenRouter billing.
### Gemini / Vertex
Preferred truth:
1. official billing export or billing API for reconciled spend when available for the route
2. official pricing docs for estimate
### DeepSeek
Preferred truth:
1. official machine-readable cost source if available in the future
2. official pricing docs snapshot today
### Subscription-Included Routes
Preferred truth:
1. explicit route config marking the model as included in subscription
These should display `included`, not an API list-price estimate.
### Custom Endpoint / Local Model
Preferred truth:
1. user override
2. custom contract config
3. unknown
These should default to `unknown`.
## Pricing Catalog
Replace the current `MODEL_PRICING` dict with a richer pricing catalog.
Suggested record:
```python
@dataclass
class PricingEntry:
provider: str
route_pattern: str
model_pattern: str
input_cost_per_million: Decimal | None = None
output_cost_per_million: Decimal | None = None
cache_read_cost_per_million: Decimal | None = None
cache_write_cost_per_million: Decimal | None = None
request_cost: Decimal | None = None
image_cost: Decimal | None = None
source: str = "official_docs_snapshot"
source_url: str | None = None
fetched_at: datetime | None = None
pricing_version: str | None = None
```
The catalog should be route-aware:
- `openai:gpt-5`
- `anthropic:claude-opus-4-6`
- `openrouter:anthropic/claude-opus-4.6`
- `copilot:gpt-4o`
This avoids conflating direct-provider billing with aggregator billing.
## Pricing Sync Architecture
Introduce a pricing sync subsystem instead of manually maintaining a single hardcoded table.
Suggested modules:
- `agent/pricing/catalog.py`
- `agent/pricing/sources.py`
- `agent/pricing/sync.py`
- `agent/pricing/reconcile.py`
- `agent/pricing/types.py`
### Sync Sources
- OpenRouter models API
- official provider docs snapshots where no API exists
- user overrides from config
### Sync Output
Cache pricing entries locally with:
- source URL
- fetch timestamp
- version/hash
- confidence/source type
### Sync Frequency
- startup warm cache
- background refresh every 6 to 24 hours depending on source
- manual `hermes pricing sync`
## Reconciliation Architecture
Live requests may produce only an estimate initially. Hermes should reconcile them later when a provider exposes actual billed cost.
Suggested flow:
1. Agent call completes.
2. Hermes stores canonical usage plus reconciliation ids.
3. Hermes computes an immediate estimate if a pricing source exists.
4. A reconciliation worker fetches actual cost when supported.
5. Session and message records are updated with `actual` cost.
This can run:
- inline for cheap lookups
- asynchronously for delayed provider accounting
## Persistence Changes
Session storage should stop storing only aggregate prompt/completion totals.
Add fields for both usage and cost certainty:
- `input_tokens`
- `output_tokens`
- `cache_read_tokens`
- `cache_write_tokens`
- `reasoning_tokens`
- `estimated_cost_usd`
- `actual_cost_usd`
- `cost_status`
- `cost_source`
- `pricing_version`
- `billing_provider`
- `billing_mode`
If schema expansion is too large for one PR, add a new pricing events table:
```text
session_cost_events
id
session_id
request_id
provider
model
billing_mode
input_tokens
output_tokens
cache_read_tokens
cache_write_tokens
estimated_cost_usd
actual_cost_usd
cost_status
cost_source
pricing_version
created_at
updated_at
```
## Hermes Touchpoints
### `run_agent.py`
Current responsibility:
- parse raw provider usage
- update session token counters
New responsibility:
- build `CanonicalUsage`
- update canonical counters
- store reconciliation ids
- emit usage event to pricing subsystem
### `agent/usage_pricing.py`
Current responsibility:
- static lookup table
- direct cost arithmetic
New responsibility:
- move or replace with pricing catalog facade
- no fuzzy model-family heuristics
- no direct pricing without billing-route context
### `cli.py`
Current responsibility:
- compute session cost directly from prompt/completion totals
New responsibility:
- display `CostResult`
- show status badges:
- `actual`
- `estimated`
- `included`
- `n/a`
### `agent/insights.py`
Current responsibility:
- recompute historical estimates from static pricing
New responsibility:
- aggregate stored pricing events
- prefer actual cost over estimate
- surface estimates only when reconciliation is unavailable
## UX Rules
### Status Bar
Show one of:
- `$1.42`
- `~$1.42`
- `included`
- `cost n/a`
Where:
- `$1.42` means `actual`
- `~$1.42` means `estimated`
- `included` means subscription-backed or explicitly zero-cost route
- `cost n/a` means unknown
### `/usage`
Show:
- token buckets
- estimated cost
- actual cost if available
- cost status
- pricing source
### `/insights`
Aggregate:
- actual cost totals
- estimated-only totals
- unknown-cost sessions count
- included-cost sessions count
## Config And Overrides
Add user-configurable pricing overrides in config:
```yaml
pricing:
mode: hybrid
sync_on_startup: true
sync_interval_hours: 12
overrides:
- provider: openrouter
model: anthropic/claude-opus-4.6
billing_mode: custom_contract
input_cost_per_million: 4.25
output_cost_per_million: 22.0
cache_read_cost_per_million: 0.5
cache_write_cost_per_million: 6.0
included_routes:
- provider: copilot
model: "*"
- provider: codex-subscription
model: "*"
```
Overrides must win over catalog defaults for the matching billing route.
## Rollout Plan
### Phase 1
- add canonical usage model
- split cache token buckets in `run_agent.py`
- stop pricing cache-inflated prompt totals
- preserve current UI with improved backend math
### Phase 2
- add route-aware pricing catalog
- integrate OpenRouter models API sync
- add `estimated` vs `included` vs `unknown`
### Phase 3
- add reconciliation for OpenRouter generation cost
- add actual cost persistence
- update `/insights` to prefer actual cost
### Phase 4
- add direct OpenAI and Anthropic reconciliation paths
- add user overrides and contract pricing
- add pricing sync CLI command
## Testing Strategy
Add tests for:
- OpenAI cached token subtraction
- Anthropic cache read/write separation
- OpenRouter estimated vs actual reconciliation
- subscription-backed models showing `included`
- custom endpoints showing `n/a`
- override precedence
- stale catalog fallback behavior
Current tests that assume heuristic pricing should be replaced with route-aware expectations.
## Non-Goals
- exact enterprise billing reconstruction without an official source or user override
- backfilling perfect historical cost for old sessions that lack cache bucket data
- scraping arbitrary provider web pages at request time
## Recommendation
Do not expand the existing `MODEL_PRICING` dict.
That path cannot satisfy the product requirement. Hermes should instead migrate to:
- canonical usage normalization
- route-aware pricing sources
- estimate-then-reconcile cost lifecycle
- explicit certainty states in the UI
This is the minimum architecture that makes the statement "Hermes pricing is backed by official sources where possible, and otherwise clearly labeled" defensible.

View File

@@ -101,21 +101,11 @@ Available methods:
### Patches (`patches.py`)
**Problem**: Some hermes-agent tools use `asyncio.run()` internally (e.g., mini-swe-agent's Modal backend via SWE-ReX). This crashes when called from inside Atropos's event loop because `asyncio.run()` cannot be nested.
**Problem**: Some hermes-agent tools use `asyncio.run()` internally (e.g., the Modal backend). This crashes when called from inside Atropos's event loop because `asyncio.run()` cannot be nested.
**Solution**: `patches.py` monkey-patches `SwerexModalEnvironment` to use a dedicated background thread (`_AsyncWorker`) with its own event loop. The calling code sees the same sync interface, but internally the async work happens on a separate thread that doesn't conflict with Atropos's loop.
**Solution**: `ModalEnvironment` uses a dedicated `_AsyncWorker` background thread with its own event loop. The calling code sees a sync interface, but internally all async Modal SDK calls happen on the worker thread so they don't conflict with Atropos's loop. This is built directly into `tools/environments/modal.py` — no monkey-patching required.
What gets patched:
- `SwerexModalEnvironment.__init__` -- creates Modal deployment on a background thread
- `SwerexModalEnvironment.execute` -- runs commands on the same background thread
- `SwerexModalEnvironment.stop` -- stops deployment on the background thread
The patches are:
- **Idempotent** -- calling `apply_patches()` multiple times is safe
- **Transparent** -- same interface and behavior, only the internal async execution changes
- **Universal** -- works identically in normal CLI use (no running event loop)
Applied automatically at import time by `hermes_base_env.py`.
`patches.py` is now a no-op (kept for backward compatibility with imports).
### Tool Call Parsers (`tool_call_parsers/`)

View File

@@ -23,7 +23,7 @@ from typing import Any, Dict, List, Optional, Set
from model_tools import handle_function_call
# Thread pool for running sync tool calls that internally use asyncio.run()
# (e.g., mini-swe-agent's modal/docker/daytona backends). Running them in a separate
# (e.g., the Modal/Docker/Daytona terminal backends). Running them in a separate
# thread gives them a clean event loop so they don't deadlock inside Atropos's loop.
# Size must be large enough for concurrent eval tasks (e.g., 89 TB2 tasks all
# making tool calls). Too small = thread pool starvation, tasks queue for minutes.
@@ -346,78 +346,89 @@ class HermesAgentLoop:
tool_name, turn + 1,
)
else:
# Parse arguments and dispatch
# Parse arguments
try:
args = json.loads(tool_args_raw)
except json.JSONDecodeError:
args = {}
except json.JSONDecodeError as e:
args = None
tool_result = json.dumps(
{"error": f"Invalid JSON in tool arguments: {e}. Please retry with valid JSON."}
)
tool_errors.append(ToolError(
turn=turn + 1, tool_name=tool_name,
arguments=tool_args_raw[:200],
error=f"Invalid JSON: {e}",
tool_result=tool_result,
))
logger.warning(
"Invalid JSON in tool call arguments for '%s': %s",
tool_name, tool_args_raw[:200],
)
try:
if tool_name == "terminal":
backend = os.getenv("TERMINAL_ENV", "local")
cmd_preview = args.get("command", "")[:80]
logger.info(
"[%s] $ %s", self.task_id[:8], cmd_preview,
)
# Dispatch tool only if arguments parsed successfully
if args is not None:
try:
if tool_name == "terminal":
backend = os.getenv("TERMINAL_ENV", "local")
cmd_preview = args.get("command", "")[:80]
logger.info(
"[%s] $ %s", self.task_id[:8], cmd_preview,
)
tool_submit_time = _time.monotonic()
tool_submit_time = _time.monotonic()
# Todo tool -- handle locally (needs per-loop TodoStore)
if tool_name == "todo":
tool_result = _todo_tool(
todos=args.get("todos"),
merge=args.get("merge", False),
store=_todo_store,
)
tool_elapsed = _time.monotonic() - tool_submit_time
elif tool_name == "memory":
tool_result = json.dumps({"error": "Memory is not available in RL environments."})
tool_elapsed = _time.monotonic() - tool_submit_time
elif tool_name == "session_search":
tool_result = json.dumps({"error": "Session search is not available in RL environments."})
tool_elapsed = _time.monotonic() - tool_submit_time
else:
# Run tool calls in a thread pool so backends that
# use asyncio.run() internally (modal, docker, daytona) get
# a clean event loop instead of deadlocking.
loop = asyncio.get_event_loop()
# Capture current tool_name/args for the lambda
_tn, _ta, _tid = tool_name, args, self.task_id
tool_result = await loop.run_in_executor(
_tool_executor,
lambda: handle_function_call(
_tn, _ta, task_id=_tid,
user_task=_user_task,
),
)
tool_elapsed = _time.monotonic() - tool_submit_time
# Todo tool -- handle locally (needs per-loop TodoStore)
if tool_name == "todo":
tool_result = _todo_tool(
todos=args.get("todos"),
merge=args.get("merge", False),
store=_todo_store,
)
tool_elapsed = _time.monotonic() - tool_submit_time
elif tool_name == "memory":
tool_result = json.dumps({"error": "Memory is not available in RL environments."})
tool_elapsed = _time.monotonic() - tool_submit_time
elif tool_name == "session_search":
tool_result = json.dumps({"error": "Session search is not available in RL environments."})
tool_elapsed = _time.monotonic() - tool_submit_time
else:
# Run tool calls in a thread pool so backends that
# use asyncio.run() internally (modal, docker, daytona) get
# a clean event loop instead of deadlocking.
loop = asyncio.get_event_loop()
# Capture current tool_name/args for the lambda
_tn, _ta, _tid = tool_name, args, self.task_id
tool_result = await loop.run_in_executor(
_tool_executor,
lambda: handle_function_call(
_tn, _ta, task_id=_tid,
user_task=_user_task,
),
)
tool_elapsed = _time.monotonic() - tool_submit_time
# Log slow tools and thread pool stats for debugging
pool_active = _tool_executor._work_queue.qsize()
if tool_elapsed > 30:
logger.warning(
"[%s] turn %d: %s took %.1fs (pool queue=%d)",
self.task_id[:8], turn + 1, tool_name,
tool_elapsed, pool_active,
# Log slow tools and thread pool stats for debugging
pool_active = _tool_executor._work_queue.qsize()
if tool_elapsed > 30:
logger.warning(
"[%s] turn %d: %s took %.1fs (pool queue=%d)",
self.task_id[:8], turn + 1, tool_name,
tool_elapsed, pool_active,
)
except Exception as e:
tool_result = json.dumps(
{"error": f"Tool execution failed: {type(e).__name__}: {str(e)}"}
)
tool_errors.append(ToolError(
turn=turn + 1, tool_name=tool_name,
arguments=tool_args_raw[:200],
error=f"{type(e).__name__}: {str(e)}",
tool_result=tool_result,
))
logger.error(
"Tool '%s' execution failed on turn %d: %s",
tool_name, turn + 1, e,
)
except Exception as e:
tool_result = json.dumps(
{"error": f"Tool execution failed: {type(e).__name__}: {str(e)}"}
)
tool_errors.append(ToolError(
turn=turn + 1, tool_name=tool_name,
arguments=tool_args_raw[:200],
error=f"{type(e).__name__}: {str(e)}",
tool_result=tool_result,
))
logger.error(
"Tool '%s' execution failed on turn %d: %s",
tool_name, turn + 1, e,
)
# Also check if the tool returned an error in its JSON result
try:

View File

@@ -209,7 +209,7 @@ class TerminalBench2EvalEnv(HermesAgentBaseEnv):
# Agent settings -- TB2 tasks are complex, need many turns
max_agent_turns=60,
max_token_length=***
max_token_length=16000,
agent_temperature=0.6,
system_prompt=None,
@@ -233,7 +233,7 @@ class TerminalBench2EvalEnv(HermesAgentBaseEnv):
steps_per_eval=1,
total_steps=1,
tokenizer_name="NousRe...1-8B",
tokenizer_name="NousResearch/Hermes-3-Llama-3.1-8B",
use_wandb=True,
wandb_name="terminal-bench-2",
ensure_scores_are_not_same=False, # Binary rewards may all be 0 or 1
@@ -245,7 +245,7 @@ class TerminalBench2EvalEnv(HermesAgentBaseEnv):
base_url="https://openrouter.ai/api/v1",
model_name="anthropic/claude-sonnet-4",
server_type="openai",
api_key=os.get...EY", ""),
api_key=os.getenv("OPENROUTER_API_KEY", ""),
health_check=False,
)
]
@@ -513,3 +513,446 @@ class TerminalBench2EvalEnv(HermesAgentBaseEnv):
reward = 0.0
else:
# Run tests in a thread so the blocking ctx.terminal() calls
# don't freeze the entire event loop (which would stall all
# other tasks, tqdm updates, and timeout timers).
ctx = ToolContext(task_id)
try:
loop = asyncio.get_event_loop()
reward = await loop.run_in_executor(
None, # default thread pool
self._run_tests, eval_item, ctx, task_name,
)
except Exception as e:
logger.error("Task %s: test verification failed: %s", task_name, e)
reward = 0.0
finally:
ctx.cleanup()
passed = reward == 1.0
status = "PASS" if passed else "FAIL"
elapsed = time.time() - task_start
tqdm.write(f" [{status}] {task_name} (turns={result.turns_used}, {elapsed:.0f}s)")
logger.info(
"Task %s: reward=%.1f, turns=%d, finished=%s",
task_name, reward, result.turns_used, result.finished_naturally,
)
out = {
"passed": passed,
"reward": reward,
"task_name": task_name,
"category": category,
"turns_used": result.turns_used,
"finished_naturally": result.finished_naturally,
"messages": result.messages,
}
self._save_result(out)
return out
except Exception as e:
elapsed = time.time() - task_start
logger.error("Task %s: rollout failed: %s", task_name, e, exc_info=True)
tqdm.write(f" [ERROR] {task_name}: {e} ({elapsed:.0f}s)")
out = {
"passed": False, "reward": 0.0,
"task_name": task_name, "category": category,
"error": str(e),
}
self._save_result(out)
return out
finally:
# --- Cleanup: clear overrides, sandbox, and temp files ---
clear_task_env_overrides(task_id)
try:
cleanup_vm(task_id)
except Exception as e:
logger.debug("VM cleanup for %s: %s", task_id[:8], e)
if task_dir and task_dir.exists():
shutil.rmtree(task_dir, ignore_errors=True)
def _run_tests(
self, item: Dict[str, Any], ctx: ToolContext, task_name: str
) -> float:
"""
Upload and execute the test suite in the agent's sandbox, then
download the verifier output locally to read the reward.
Follows Harbor's verification pattern:
1. Upload tests/ directory into the sandbox
2. Execute test.sh inside the sandbox
3. Download /logs/verifier/ directory to a local temp dir
4. Read reward.txt locally with native Python I/O
Downloading locally avoids issues with the file_read tool on
the Modal VM and matches how Harbor handles verification.
TB2 test scripts (test.sh) typically:
1. Install pytest via uv/pip
2. Run pytest against the test files in /tests/
3. Write results to /logs/verifier/reward.txt
Args:
item: The TB2 task dict (contains tests_tar, test_sh)
ctx: ToolContext scoped to this task's sandbox
task_name: For logging
Returns:
1.0 if tests pass, 0.0 otherwise
"""
tests_tar = item.get("tests_tar", "")
test_sh = item.get("test_sh", "")
if not test_sh:
logger.warning("Task %s: no test_sh content, reward=0", task_name)
return 0.0
# Create required directories in the sandbox
ctx.terminal("mkdir -p /tests /logs/verifier")
# Upload test files into the sandbox (binary-safe via base64)
if tests_tar:
tests_temp = Path(tempfile.mkdtemp(prefix=f"tb2-tests-{task_name}-"))
try:
_extract_base64_tar(tests_tar, tests_temp)
ctx.upload_dir(str(tests_temp), "/tests")
except Exception as e:
logger.warning("Task %s: failed to upload test files: %s", task_name, e)
finally:
shutil.rmtree(tests_temp, ignore_errors=True)
# Write the test runner script (test.sh)
ctx.write_file("/tests/test.sh", test_sh)
ctx.terminal("chmod +x /tests/test.sh")
# Execute the test suite
logger.info(
"Task %s: running test suite (timeout=%ds)",
task_name, self.config.test_timeout,
)
test_result = ctx.terminal(
"bash /tests/test.sh",
timeout=self.config.test_timeout,
)
exit_code = test_result.get("exit_code", -1)
output = test_result.get("output", "")
# Download the verifier output directory locally, then read reward.txt
# with native Python I/O. This avoids issues with file_read on the
# Modal VM and matches Harbor's verification pattern.
reward = 0.0
local_verifier_dir = Path(tempfile.mkdtemp(prefix=f"tb2-verifier-{task_name}-"))
try:
ctx.download_dir("/logs/verifier", str(local_verifier_dir))
reward_file = local_verifier_dir / "reward.txt"
if reward_file.exists() and reward_file.stat().st_size > 0:
content = reward_file.read_text().strip()
if content == "1":
reward = 1.0
elif content == "0":
reward = 0.0
else:
# Unexpected content -- try parsing as float
try:
reward = float(content)
except (ValueError, TypeError):
logger.warning(
"Task %s: reward.txt content unexpected (%r), "
"falling back to exit_code=%d",
task_name, content, exit_code,
)
reward = 1.0 if exit_code == 0 else 0.0
else:
# reward.txt not written -- fall back to exit code
logger.warning(
"Task %s: reward.txt not found after download, "
"falling back to exit_code=%d",
task_name, exit_code,
)
reward = 1.0 if exit_code == 0 else 0.0
except Exception as e:
logger.warning(
"Task %s: failed to download verifier dir: %s, "
"falling back to exit_code=%d",
task_name, e, exit_code,
)
reward = 1.0 if exit_code == 0 else 0.0
finally:
shutil.rmtree(local_verifier_dir, ignore_errors=True)
# Log test output for debugging failures
if reward == 0.0:
output_preview = output[-500:] if output else "(no output)"
logger.info(
"Task %s: FAIL (exit_code=%d)\n%s",
task_name, exit_code, output_preview,
)
return reward
# =========================================================================
# Evaluate -- main entry point for the eval subcommand
# =========================================================================
async def _eval_with_timeout(self, item: Dict[str, Any]) -> Dict:
"""
Wrap rollout_and_score_eval with a per-task wall-clock timeout.
If the task exceeds task_timeout seconds, it's automatically scored
as FAIL. This prevents any single task from hanging indefinitely.
"""
task_name = item.get("task_name", "unknown")
category = item.get("category", "unknown")
try:
return await asyncio.wait_for(
self.rollout_and_score_eval(item),
timeout=self.config.task_timeout,
)
except asyncio.TimeoutError:
from tqdm import tqdm
elapsed = self.config.task_timeout
tqdm.write(f" [TIMEOUT] {task_name} (exceeded {elapsed}s wall-clock limit)")
logger.error("Task %s: wall-clock timeout after %ds", task_name, elapsed)
out = {
"passed": False, "reward": 0.0,
"task_name": task_name, "category": category,
"error": f"timeout ({elapsed}s)",
}
self._save_result(out)
return out
async def evaluate(self, *args, **kwargs) -> None:
"""
Run Terminal-Bench 2.0 evaluation over all tasks.
This is the main entry point when invoked via:
python environments/terminalbench2_env.py evaluate
Runs all tasks through rollout_and_score_eval() via asyncio.gather()
(same pattern as GPQA and other Atropos eval envs). Each task is
wrapped with a wall-clock timeout so hung tasks auto-fail.
Suppresses noisy Modal/terminal output (HERMES_QUIET) so the tqdm
bar stays visible.
"""
start_time = time.time()
# Route all logging through tqdm.write() so the progress bar stays
# pinned at the bottom while log lines scroll above it.
from tqdm import tqdm
class _TqdmHandler(logging.Handler):
def emit(self, record):
try:
tqdm.write(self.format(record))
except Exception:
self.handleError(record)
handler = _TqdmHandler()
handler.setFormatter(logging.Formatter(
"%(asctime)s [%(name)s] %(levelname)s: %(message)s",
datefmt="%H:%M:%S",
))
root = logging.getLogger()
root.handlers = [handler] # Replace any existing handlers
root.setLevel(logging.INFO)
# Silence noisy third-party loggers that flood the output
logging.getLogger("httpx").setLevel(logging.WARNING) # Every HTTP request
logging.getLogger("openai").setLevel(logging.WARNING) # OpenAI client retries
logging.getLogger("rex-deploy").setLevel(logging.WARNING) # Swerex deployment
logging.getLogger("rex_image_builder").setLevel(logging.WARNING) # Image builds
print(f"\n{'='*60}")
print("Starting Terminal-Bench 2.0 Evaluation")
print(f"{'='*60}")
print(f" Dataset: {self.config.dataset_name}")
print(f" Total tasks: {len(self.all_eval_items)}")
print(f" Max agent turns: {self.config.max_agent_turns}")
print(f" Task timeout: {self.config.task_timeout}s")
print(f" Terminal backend: {self.config.terminal_backend}")
print(f" Tool thread pool: {self.config.tool_pool_size}")
print(f" Terminal timeout: {self.config.terminal_timeout}s/cmd")
print(f" Terminal lifetime: {self.config.terminal_lifetime}s (auto: task_timeout + 120)")
print(f" Max concurrent tasks: {self.config.max_concurrent_tasks}")
print(f"{'='*60}\n")
# Semaphore to limit concurrent Modal sandbox creations.
# Without this, all 86 tasks fire simultaneously, each creating a Modal
# sandbox via asyncio.run() inside a thread pool worker. Modal's blocking
# calls (App.lookup, etc.) deadlock when too many are created at once.
semaphore = asyncio.Semaphore(self.config.max_concurrent_tasks)
async def _eval_with_semaphore(item):
async with semaphore:
return await self._eval_with_timeout(item)
# Fire all tasks with wall-clock timeout, track live accuracy on the bar
total_tasks = len(self.all_eval_items)
eval_tasks = [
asyncio.ensure_future(_eval_with_semaphore(item))
for item in self.all_eval_items
]
results = []
passed_count = 0
pbar = tqdm(total=total_tasks, desc="Evaluating TB2", dynamic_ncols=True)
try:
for coro in asyncio.as_completed(eval_tasks):
result = await coro
results.append(result)
if result and result.get("passed"):
passed_count += 1
done = len(results)
pct = (passed_count / done * 100) if done else 0
pbar.set_postfix_str(f"pass={passed_count}/{done} ({pct:.1f}%)")
pbar.update(1)
except (KeyboardInterrupt, asyncio.CancelledError):
pbar.close()
print(f"\n\nInterrupted! Cleaning up {len(eval_tasks)} tasks...")
# Cancel all pending tasks
for task in eval_tasks:
task.cancel()
# Let cancellations propagate (finally blocks run cleanup_vm)
await asyncio.gather(*eval_tasks, return_exceptions=True)
# Belt-and-suspenders: clean up any remaining sandboxes
from tools.terminal_tool import cleanup_all_environments
cleanup_all_environments()
print("All sandboxes cleaned up.")
return
finally:
pbar.close()
end_time = time.time()
# Filter out None results (shouldn't happen, but be safe)
valid_results = [r for r in results if r is not None]
if not valid_results:
print("Warning: No valid evaluation results obtained")
return
# ---- Compute metrics ----
total = len(valid_results)
passed = sum(1 for r in valid_results if r.get("passed"))
overall_pass_rate = passed / total if total > 0 else 0.0
# Per-category breakdown
cat_results: Dict[str, List[Dict]] = defaultdict(list)
for r in valid_results:
cat_results[r.get("category", "unknown")].append(r)
# Build metrics dict
eval_metrics = {
"eval/pass_rate": overall_pass_rate,
"eval/total_tasks": total,
"eval/passed_tasks": passed,
"eval/evaluation_time_seconds": end_time - start_time,
}
# Per-category metrics
for category, cat_items in sorted(cat_results.items()):
cat_passed = sum(1 for r in cat_items if r.get("passed"))
cat_total = len(cat_items)
cat_pass_rate = cat_passed / cat_total if cat_total > 0 else 0.0
cat_key = category.replace(" ", "_").replace("-", "_").lower()
eval_metrics[f"eval/pass_rate_{cat_key}"] = cat_pass_rate
# Store metrics for wandb_log
self.eval_metrics = [(k, v) for k, v in eval_metrics.items()]
# ---- Print summary ----
print(f"\n{'='*60}")
print("Terminal-Bench 2.0 Evaluation Results")
print(f"{'='*60}")
print(f"Overall Pass Rate: {overall_pass_rate:.4f} ({passed}/{total})")
print(f"Evaluation Time: {end_time - start_time:.1f} seconds")
print("\nCategory Breakdown:")
for category, cat_items in sorted(cat_results.items()):
cat_passed = sum(1 for r in cat_items if r.get("passed"))
cat_total = len(cat_items)
cat_rate = cat_passed / cat_total if cat_total > 0 else 0.0
print(f" {category}: {cat_rate:.1%} ({cat_passed}/{cat_total})")
# Print individual task results
print("\nTask Results:")
for r in sorted(valid_results, key=lambda x: x.get("task_name", "")):
status = "PASS" if r.get("passed") else "FAIL"
turns = r.get("turns_used", "?")
error = r.get("error", "")
extra = f" (error: {error})" if error else ""
print(f" [{status}] {r['task_name']} (turns={turns}){extra}")
print(f"{'='*60}\n")
# Build sample records for evaluate_log (includes full conversations)
samples = [
{
"task_name": r.get("task_name"),
"category": r.get("category"),
"passed": r.get("passed"),
"reward": r.get("reward"),
"turns_used": r.get("turns_used"),
"error": r.get("error"),
"messages": r.get("messages"),
}
for r in valid_results
]
# Log evaluation results
try:
await self.evaluate_log(
metrics=eval_metrics,
samples=samples,
start_time=start_time,
end_time=end_time,
generation_parameters={
"temperature": self.config.agent_temperature,
"max_tokens": self.config.max_token_length,
"max_agent_turns": self.config.max_agent_turns,
"terminal_backend": self.config.terminal_backend,
},
)
except Exception as e:
print(f"Error logging evaluation results: {e}")
# Close streaming file
if hasattr(self, "_streaming_file") and not self._streaming_file.closed:
self._streaming_file.close()
print(f" Live results saved to: {self._streaming_path}")
# Kill all remaining sandboxes. Timed-out tasks leave orphaned thread
# pool workers still executing commands -- cleanup_all stops them.
from tools.terminal_tool import cleanup_all_environments
print("\nCleaning up all sandboxes...")
cleanup_all_environments()
# Shut down the tool thread pool so orphaned workers from timed-out
# tasks are killed immediately instead of retrying against dead
# sandboxes and spamming the console with TimeoutError warnings.
from environments.agent_loop import _tool_executor
_tool_executor.shutdown(wait=False, cancel_futures=True)
print("Done.")
# =========================================================================
# Wandb logging
# =========================================================================
async def wandb_log(self, wandb_metrics: Optional[Dict] = None):
"""Log TB2-specific metrics to wandb."""
if wandb_metrics is None:
wandb_metrics = {}
# Add stored eval metrics
for metric_name, metric_value in self.eval_metrics:
wandb_metrics[metric_name] = metric_value
self.eval_metrics = []
await super().wandb_log(wandb_metrics)
if __name__ == "__main__":
TerminalBench2EvalEnv.cli()

View File

@@ -2,203 +2,34 @@
Monkey patches for making hermes-agent tools work inside async frameworks (Atropos).
Problem:
Some tools use asyncio.run() internally (e.g., mini-swe-agent's Modal backend,
Some tools use asyncio.run() internally (e.g., Modal backend via SWE-ReX,
web_extract). This crashes when called from inside Atropos's event loop because
asyncio.run() can't be nested.
Solution:
Replace the problematic methods with versions that use a dedicated background
thread with its own event loop. The calling code sees the same sync interface --
call a function, get a result -- but internally the async work happens on a
separate thread that doesn't conflict with Atropos's loop.
The Modal environment (tools/environments/modal.py) now uses a dedicated
_AsyncWorker thread internally, making it safe for both CLI and Atropos use.
No monkey-patching is required.
These patches are safe for normal CLI use too: when there's no running event
loop, the behavior is identical (the background thread approach works regardless).
What gets patched:
- SwerexModalEnvironment.__init__ -- creates Modal deployment on a background thread
- SwerexModalEnvironment.execute -- runs commands on the same background thread
- SwerexModalEnvironment.stop -- stops deployment on the background thread
This module is kept for backward compatibility. apply_patches() is a no-op.
Usage:
Call apply_patches() once at import time (done automatically by hermes_base_env.py).
This is idempotent -- calling it multiple times is safe.
This is idempotent and safe to call multiple times.
"""
import asyncio
import logging
import threading
from typing import Any
logger = logging.getLogger(__name__)
_patches_applied = False
class _AsyncWorker:
"""
A dedicated background thread with its own event loop.
Allows sync code to submit async coroutines and block for results,
even when called from inside another running event loop. Used to
bridge sync tool interfaces with async backends (Modal, SWE-ReX).
"""
def __init__(self):
self._loop: asyncio.AbstractEventLoop = None
self._thread: threading.Thread = None
self._started = threading.Event()
def start(self):
"""Start the background event loop thread."""
self._thread = threading.Thread(target=self._run_loop, daemon=True)
self._thread.start()
self._started.wait(timeout=30)
def _run_loop(self):
"""Background thread entry point -- runs the event loop forever."""
self._loop = asyncio.new_event_loop()
asyncio.set_event_loop(self._loop)
self._started.set()
self._loop.run_forever()
def run_coroutine(self, coro, timeout=600):
"""
Submit a coroutine to the background loop and block until it completes.
Safe to call from any thread, including threads that already have
a running event loop.
"""
if self._loop is None or self._loop.is_closed():
raise RuntimeError("AsyncWorker loop is not running")
future = asyncio.run_coroutine_threadsafe(coro, self._loop)
return future.result(timeout=timeout)
def stop(self):
"""Stop the background event loop and join the thread."""
if self._loop and self._loop.is_running():
self._loop.call_soon_threadsafe(self._loop.stop)
if self._thread:
self._thread.join(timeout=10)
def _patch_swerex_modal():
"""
Monkey patch SwerexModalEnvironment to use a background thread event loop
instead of asyncio.run(). This makes it safe to call from inside Atropos's
async event loop.
The patched methods have the exact same interface and behavior -- the only
difference is HOW the async work is executed internally.
"""
try:
from minisweagent.environments.extra.swerex_modal import (
SwerexModalEnvironment,
SwerexModalEnvironmentConfig,
)
from swerex.deployment.modal import ModalDeployment
from swerex.runtime.abstract import Command as RexCommand
except ImportError:
# mini-swe-agent or swe-rex not installed -- nothing to patch
logger.debug("mini-swe-agent Modal backend not available, skipping patch")
return
# Save original methods so we can refer to config handling
_original_init = SwerexModalEnvironment.__init__
def _patched_init(self, **kwargs):
"""Patched __init__: creates Modal deployment on a background thread."""
self.config = SwerexModalEnvironmentConfig(**kwargs)
# Start a dedicated event loop thread for all Modal async operations
self._worker = _AsyncWorker()
self._worker.start()
# Pre-build a modal.Image with pip fix for Modal's legacy image builder.
# Modal requires `python -m pip` to work during image build, but some
# task images (e.g., TBLite's broken-python) have intentionally broken pip.
# Fix: remove stale pip dist-info and reinstall via ensurepip before Modal
# tries to use it. This is a no-op for images where pip already works.
import modal as _modal
image_spec = self.config.image
if isinstance(image_spec, str):
image_spec = _modal.Image.from_registry(
image_spec,
setup_dockerfile_commands=[
"RUN rm -rf /usr/local/lib/python*/site-packages/pip* 2>/dev/null; "
"python -m ensurepip --upgrade --default-pip 2>/dev/null || true",
],
)
# Create AND start the deployment entirely on the worker's loop/thread
# so all gRPC channels and async state are bound to that loop
async def _create_and_start():
deployment = ModalDeployment(
image=image_spec,
startup_timeout=self.config.startup_timeout,
runtime_timeout=self.config.runtime_timeout,
deployment_timeout=self.config.deployment_timeout,
install_pipx=self.config.install_pipx,
modal_sandbox_kwargs=self.config.modal_sandbox_kwargs,
)
await deployment.start()
return deployment
self.deployment = self._worker.run_coroutine(_create_and_start())
def _patched_execute(self, command: str, cwd: str = "", *, timeout: int | None = None) -> dict[str, Any]:
"""Patched execute: runs commands on the background thread's loop."""
async def _do_execute():
return await self.deployment.runtime.execute(
RexCommand(
command=command,
shell=True,
check=False,
cwd=cwd or self.config.cwd,
timeout=timeout or self.config.timeout,
merge_output_streams=True,
env=self.config.env if self.config.env else None,
)
)
output = self._worker.run_coroutine(_do_execute())
return {
"output": output.stdout,
"returncode": output.exit_code,
}
def _patched_stop(self):
"""Patched stop: stops deployment on the background thread, then stops the thread."""
try:
self._worker.run_coroutine(
asyncio.wait_for(self.deployment.stop(), timeout=10),
timeout=15,
)
except Exception:
pass
finally:
self._worker.stop()
# Apply the patches
SwerexModalEnvironment.__init__ = _patched_init
SwerexModalEnvironment.execute = _patched_execute
SwerexModalEnvironment.stop = _patched_stop
logger.debug("Patched SwerexModalEnvironment for async-safe operation")
def apply_patches():
"""
Apply all monkey patches needed for Atropos compatibility.
Safe to call multiple times -- patches are only applied once.
Safe for normal CLI use -- patched code works identically when
there is no running event loop.
"""
"""Apply all monkey patches needed for Atropos compatibility."""
global _patches_applied
if _patches_applied:
return
_patch_swerex_modal()
logger.debug("apply_patches() called; no patches needed (async safety is built-in)")
_patches_applied = True

View File

@@ -10,12 +10,13 @@ Format uses special unicode tokens:
<tool▁call▁end>
<tool▁calls▁end>
Based on VLLM's DeepSeekV3ToolParser.extract_tool_calls()
Fixes Issue #989: Support for multiple simultaneous tool calls.
"""
import re
import uuid
from typing import List, Optional
import logging
from typing import List, Optional, Tuple
from openai.types.chat.chat_completion_message_tool_call import (
ChatCompletionMessageToolCall,
@@ -24,6 +25,7 @@ from openai.types.chat.chat_completion_message_tool_call import (
from environments.tool_call_parsers import ParseResult, ToolCallParser, register_parser
logger = logging.getLogger(__name__)
@register_parser("deepseek_v3")
class DeepSeekV3ToolCallParser(ToolCallParser):
@@ -32,45 +34,56 @@ class DeepSeekV3ToolCallParser(ToolCallParser):
Uses special unicode tokens with fullwidth angle brackets and block elements.
Extracts type, function name, and JSON arguments from the structured format.
Ensures all tool calls are captured when the model executes multiple actions.
"""
START_TOKEN = "<tool▁calls▁begin>"
# Regex captures: type, function_name, function_arguments
# Updated PATTERN: Using \s* instead of literal \n for increased robustness
# against variations in model formatting (Issue #989).
PATTERN = re.compile(
r"<tool▁call▁begin>(?P<type>.*?)<tool▁sep>(?P<function_name>.*?)\n```json\n(?P<function_arguments>.*?)\n```<tool▁call▁end>",
r"<tool▁call▁begin>(?P<type>.*?)<tool▁sep>(?P<function_name>.*?)\s*```json\s*(?P<function_arguments>.*?)\s*```\s*<tool▁call▁end>",
re.DOTALL,
)
def parse(self, text: str) -> ParseResult:
"""
Parses the input text and extracts all available tool calls.
"""
if self.START_TOKEN not in text:
return text, None
try:
matches = self.PATTERN.findall(text)
# Using finditer to capture ALL tool calls in the sequence
matches = list(self.PATTERN.finditer(text))
if not matches:
return text, None
tool_calls: List[ChatCompletionMessageToolCall] = []
for match in matches:
tc_type, func_name, func_args = match
func_name = match.group("function_name").strip()
func_args = match.group("function_arguments").strip()
tool_calls.append(
ChatCompletionMessageToolCall(
id=f"call_{uuid.uuid4().hex[:8]}",
type="function",
function=Function(
name=func_name.strip(),
arguments=func_args.strip(),
name=func_name,
arguments=func_args,
),
)
)
if not tool_calls:
return text, None
if tool_calls:
# Content is text before the first tool call block
content_index = text.find(self.START_TOKEN)
content = text[:content_index].strip()
return content if content else None, tool_calls
# Content is everything before the tool calls section
content = text[: text.find(self.START_TOKEN)].strip()
return content if content else None, tool_calls
except Exception:
return text, None
except Exception as e:
logger.error(f"Error parsing DeepSeek V3 tool calls: {e}")
return text, None

View File

@@ -10,7 +10,6 @@ The [TOOL_CALLS] token is the bot_token used by Mistral models.
"""
import json
import re
import uuid
from typing import List, Optional
@@ -42,9 +41,6 @@ class MistralToolCallParser(ToolCallParser):
# The [TOOL_CALLS] token -- may appear as different strings depending on tokenizer
BOT_TOKEN = "[TOOL_CALLS]"
# Fallback regex for pre-v11 format when JSON parsing fails
TOOL_CALL_REGEX = re.compile(r"\[?\s*(\{.*?\})\s*\]?", re.DOTALL)
def parse(self, text: str) -> ParseResult:
if self.BOT_TOKEN not in text:
return text, None
@@ -71,6 +67,13 @@ class MistralToolCallParser(ToolCallParser):
tool_name = raw[:brace_idx].strip()
args_str = raw[brace_idx:]
# Validate and clean the JSON arguments
try:
parsed_args = json.loads(args_str)
args_str = json.dumps(parsed_args, ensure_ascii=False)
except json.JSONDecodeError:
pass # Keep raw if parsing fails
tool_calls.append(
ChatCompletionMessageToolCall(
id=_generate_mistral_id(),
@@ -100,13 +103,14 @@ class MistralToolCallParser(ToolCallParser):
)
)
except json.JSONDecodeError:
# Fallback regex extraction
match = self.TOOL_CALL_REGEX.findall(first_raw)
if match:
for raw_json in match:
try:
tc = json.loads(raw_json)
args = tc.get("arguments", {})
# Fallback: extract JSON objects using raw_decode
decoder = json.JSONDecoder()
idx = 0
while idx < len(first_raw):
try:
obj, end_idx = decoder.raw_decode(first_raw, idx)
if isinstance(obj, dict) and "name" in obj:
args = obj.get("arguments", {})
if isinstance(args, dict):
args = json.dumps(args, ensure_ascii=False)
tool_calls.append(
@@ -114,12 +118,13 @@ class MistralToolCallParser(ToolCallParser):
id=_generate_mistral_id(),
type="function",
function=Function(
name=tc["name"], arguments=args
name=obj["name"], arguments=args
),
)
)
except (json.JSONDecodeError, KeyError):
continue
idx = end_idx
except json.JSONDecodeError:
idx += 1
if not tool_calls:
return text, None

181
flake.lock generated Normal file
View File

@@ -0,0 +1,181 @@
{
"nodes": {
"flake-parts": {
"inputs": {
"nixpkgs-lib": [
"nixpkgs"
]
},
"locked": {
"lastModified": 1772408722,
"narHash": "sha256-rHuJtdcOjK7rAHpHphUb1iCvgkU3GpfvicLMwwnfMT0=",
"owner": "hercules-ci",
"repo": "flake-parts",
"rev": "f20dc5d9b8027381c474144ecabc9034d6a839a3",
"type": "github"
},
"original": {
"owner": "hercules-ci",
"repo": "flake-parts",
"type": "github"
}
},
"nixpkgs": {
"locked": {
"lastModified": 1751274312,
"narHash": "sha256-/bVBlRpECLVzjV19t5KMdMFWSwKLtb5RyXdjz3LJT+g=",
"owner": "NixOS",
"repo": "nixpkgs",
"rev": "50ab793786d9de88ee30ec4e4c24fb4236fc2674",
"type": "github"
},
"original": {
"owner": "NixOS",
"ref": "nixos-24.11",
"repo": "nixpkgs",
"type": "github"
}
},
"pyproject-build-systems": {
"inputs": {
"nixpkgs": [
"nixpkgs"
],
"pyproject-nix": "pyproject-nix",
"uv2nix": "uv2nix"
},
"locked": {
"lastModified": 1772555609,
"narHash": "sha256-3BA3HnUvJSbHJAlJj6XSy0Jmu7RyP2gyB/0fL7XuEDo=",
"owner": "pyproject-nix",
"repo": "build-system-pkgs",
"rev": "c37f66a953535c394244888598947679af231863",
"type": "github"
},
"original": {
"owner": "pyproject-nix",
"repo": "build-system-pkgs",
"type": "github"
}
},
"pyproject-nix": {
"inputs": {
"nixpkgs": [
"pyproject-build-systems",
"nixpkgs"
]
},
"locked": {
"lastModified": 1769936401,
"narHash": "sha256-kwCOegKLZJM9v/e/7cqwg1p/YjjTAukKPqmxKnAZRgA=",
"owner": "nix-community",
"repo": "pyproject.nix",
"rev": "b0d513eeeebed6d45b4f2e874f9afba2021f7812",
"type": "github"
},
"original": {
"owner": "nix-community",
"repo": "pyproject.nix",
"type": "github"
}
},
"pyproject-nix_2": {
"inputs": {
"nixpkgs": [
"nixpkgs"
]
},
"locked": {
"lastModified": 1772865871,
"narHash": "sha256-/ZTSg97aouL0SlPHaokA4r3iuH9QzHVuWPACD2CUCFY=",
"owner": "pyproject-nix",
"repo": "pyproject.nix",
"rev": "e537db02e72d553cea470976b9733581bcf5b3ed",
"type": "github"
},
"original": {
"owner": "pyproject-nix",
"repo": "pyproject.nix",
"type": "github"
}
},
"pyproject-nix_3": {
"inputs": {
"nixpkgs": [
"uv2nix",
"nixpkgs"
]
},
"locked": {
"lastModified": 1771518446,
"narHash": "sha256-nFJSfD89vWTu92KyuJWDoTQJuoDuddkJV3TlOl1cOic=",
"owner": "pyproject-nix",
"repo": "pyproject.nix",
"rev": "eb204c6b3335698dec6c7fc1da0ebc3c6df05937",
"type": "github"
},
"original": {
"owner": "pyproject-nix",
"repo": "pyproject.nix",
"type": "github"
}
},
"root": {
"inputs": {
"flake-parts": "flake-parts",
"nixpkgs": "nixpkgs",
"pyproject-build-systems": "pyproject-build-systems",
"pyproject-nix": "pyproject-nix_2",
"uv2nix": "uv2nix_2"
}
},
"uv2nix": {
"inputs": {
"nixpkgs": [
"pyproject-build-systems",
"nixpkgs"
],
"pyproject-nix": [
"pyproject-build-systems",
"pyproject-nix"
]
},
"locked": {
"lastModified": 1770770348,
"narHash": "sha256-A2GzkmzdYvdgmMEu5yxW+xhossP+txrYb7RuzRaqhlg=",
"owner": "pyproject-nix",
"repo": "uv2nix",
"rev": "5d1b2cb4fe3158043fbafbbe2e46238abbc954b0",
"type": "github"
},
"original": {
"owner": "pyproject-nix",
"repo": "uv2nix",
"type": "github"
}
},
"uv2nix_2": {
"inputs": {
"nixpkgs": [
"nixpkgs"
],
"pyproject-nix": "pyproject-nix_3"
},
"locked": {
"lastModified": 1773039484,
"narHash": "sha256-+boo33KYkJDw9KItpeEXXv8+65f7hHv/earxpcyzQ0I=",
"owner": "pyproject-nix",
"repo": "uv2nix",
"rev": "b68be7cfeacbed9a3fa38a2b5adc0cfb81d9bb1f",
"type": "github"
},
"original": {
"owner": "pyproject-nix",
"repo": "uv2nix",
"type": "github"
}
}
},
"root": "root",
"version": 7
}

35
flake.nix Normal file
View File

@@ -0,0 +1,35 @@
{
description = "Hermes Agent - AI agent framework by Nous Research";
inputs = {
nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.11";
flake-parts = {
url = "github:hercules-ci/flake-parts";
inputs.nixpkgs-lib.follows = "nixpkgs";
};
pyproject-nix = {
url = "github:pyproject-nix/pyproject.nix";
inputs.nixpkgs.follows = "nixpkgs";
};
uv2nix = {
url = "github:pyproject-nix/uv2nix";
inputs.nixpkgs.follows = "nixpkgs";
};
pyproject-build-systems = {
url = "github:pyproject-nix/build-system-pkgs";
inputs.nixpkgs.follows = "nixpkgs";
};
};
outputs = inputs:
inputs.flake-parts.lib.mkFlake { inherit inputs; } {
systems = [ "x86_64-linux" "aarch64-linux" "aarch64-darwin" ];
imports = [
./nix/packages.nix
./nix/nixosModules.nix
./nix/checks.nix
./nix/devShell.nix
];
};
}

View File

@@ -0,0 +1 @@
"""Built-in gateway hooks that are always registered."""

View File

@@ -0,0 +1,86 @@
"""Built-in boot-md hook — run ~/.hermes/BOOT.md on gateway startup.
This hook is always registered. It silently skips if no BOOT.md exists.
To activate, create ``~/.hermes/BOOT.md`` with instructions for the
agent to execute on every gateway restart.
Example BOOT.md::
# Startup Checklist
1. Check if any cron jobs failed overnight
2. Send a status update to Discord #general
3. If there are errors in /opt/app/deploy.log, summarize them
The agent runs in a background thread so it doesn't block gateway
startup. If nothing needs attention, it replies with [SILENT] to
suppress delivery.
"""
import logging
import os
import threading
from pathlib import Path
logger = logging.getLogger("hooks.boot-md")
HERMES_HOME = Path(os.environ.get("HERMES_HOME", Path.home() / ".hermes"))
BOOT_FILE = HERMES_HOME / "BOOT.md"
def _build_boot_prompt(content: str) -> str:
"""Wrap BOOT.md content in a system-level instruction."""
return (
"You are running a startup boot checklist. Follow the BOOT.md "
"instructions below exactly.\n\n"
"---\n"
f"{content}\n"
"---\n\n"
"Execute each instruction. If you need to send a message to a "
"platform, use the send_message tool.\n"
"If nothing needs attention and there is nothing to report, "
"reply with ONLY: [SILENT]"
)
def _run_boot_agent(content: str) -> None:
"""Spawn a one-shot agent session to execute the boot instructions."""
try:
from run_agent import AIAgent
prompt = _build_boot_prompt(content)
agent = AIAgent(
quiet_mode=True,
skip_context_files=True,
skip_memory=True,
max_iterations=20,
)
result = agent.run_conversation(prompt)
response = result.get("final_response", "")
if response and "[SILENT]" not in response:
logger.info("boot-md completed: %s", response[:200])
else:
logger.info("boot-md completed (nothing to report)")
except Exception as e:
logger.error("boot-md agent failed: %s", e)
async def handle(event_type: str, context: dict) -> None:
"""Gateway startup handler — run BOOT.md if it exists."""
if not BOOT_FILE.exists():
return
content = BOOT_FILE.read_text(encoding="utf-8").strip()
if not content:
return
logger.info("Running BOOT.md (%d chars)", len(content))
# Run in a background thread so we don't block gateway startup.
thread = threading.Thread(
target=_run_boot_agent,
args=(content,),
name="boot-md",
daemon=True,
)
thread.start()

View File

@@ -9,7 +9,6 @@ action="list" and for resolving human-friendly channel names to numeric IDs.
import json
import logging
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, List, Optional
from hermes_cli.config import get_hermes_home
@@ -19,6 +18,20 @@ logger = logging.getLogger(__name__)
DIRECTORY_PATH = get_hermes_home() / "channel_directory.json"
def _normalize_channel_query(value: str) -> str:
return value.lstrip("#").strip().lower()
def _channel_target_name(platform_name: str, channel: Dict[str, Any]) -> str:
"""Return the human-facing target label shown to users for a channel entry."""
name = channel["name"]
if platform_name == "discord" and channel.get("guild"):
return f"#{name}"
if platform_name != "discord" and channel.get("type"):
return f"{name} ({channel['type']})"
return name
def _session_entry_id(origin: Dict[str, Any]) -> Optional[str]:
chat_id = origin.get("chat_id")
if not chat_id:
@@ -63,7 +76,7 @@ def build_channel_directory(adapters: Dict[Any, Any]) -> Dict[str, Any]:
logger.warning("Channel directory: failed to build %s: %s", platform.value, e)
# Telegram, WhatsApp & Signal can't enumerate chats -- pull from session history
for plat_name in ("telegram", "whatsapp", "signal", "email"):
for plat_name in ("telegram", "whatsapp", "signal", "email", "sms"):
if plat_name not in platforms:
platforms[plat_name] = _build_from_sessions(plat_name)
@@ -90,7 +103,7 @@ def _build_discord(adapter) -> List[Dict[str, str]]:
return channels
try:
import discord as _discord
import discord as _discord # noqa: F401 — SDK presence check
except ImportError:
return channels
@@ -119,7 +132,6 @@ def _build_slack(adapter) -> List[Dict[str, str]]:
return _build_from_sessions("slack")
try:
import asyncio
from tools.send_message_tool import _send_slack # noqa: F401
# Use the Slack Web API directly if available
except Exception:
@@ -190,23 +202,25 @@ def resolve_channel_name(platform_name: str, name: str) -> Optional[str]:
if not channels:
return None
query = name.lstrip("#").lower()
query = _normalize_channel_query(name)
# 1. Exact name match
# 1. Exact name match, including the display labels shown by send_message(action="list")
for ch in channels:
if ch["name"].lower() == query:
if _normalize_channel_query(ch["name"]) == query:
return ch["id"]
if _normalize_channel_query(_channel_target_name(platform_name, ch)) == query:
return ch["id"]
# 2. Guild-qualified match for Discord ("GuildName/channel")
if "/" in query:
guild_part, ch_part = query.rsplit("/", 1)
for ch in channels:
guild = ch.get("guild", "").lower()
if guild == guild_part and ch["name"].lower() == ch_part:
guild = ch.get("guild", "").strip().lower()
if guild == guild_part and _normalize_channel_query(ch["name"]) == ch_part:
return ch["id"]
# 3. Partial prefix match (only if unambiguous)
matches = [ch for ch in channels if ch["name"].lower().startswith(query)]
matches = [ch for ch in channels if _normalize_channel_query(ch["name"]).startswith(query)]
if len(matches) == 1:
return matches[0]["id"]
@@ -241,17 +255,16 @@ def format_directory_for_display() -> str:
for guild_name, guild_channels in sorted(guilds.items()):
lines.append(f"Discord ({guild_name}):")
for ch in sorted(guild_channels, key=lambda c: c["name"]):
lines.append(f" discord:#{ch['name']}")
lines.append(f" discord:{_channel_target_name(plat_name, ch)}")
if dms:
lines.append("Discord (DMs):")
for ch in dms:
lines.append(f" discord:{ch['name']}")
lines.append(f" discord:{_channel_target_name(plat_name, ch)}")
lines.append("")
else:
lines.append(f"{plat_name.title()}:")
for ch in channels:
type_label = f" ({ch['type']})" if ch.get("type") else ""
lines.append(f" {plat_name}:{ch['name']}{type_label}")
lines.append(f" {plat_name}:{_channel_target_name(plat_name, ch)}")
lines.append("")
lines.append('Use these as the "target" parameter when sending.')

View File

@@ -17,10 +17,34 @@ from typing import Dict, List, Optional, Any
from enum import Enum
from hermes_cli.config import get_hermes_home
from utils import is_truthy_value
logger = logging.getLogger(__name__)
def _coerce_bool(value: Any, default: bool = True) -> bool:
"""Coerce bool-ish config values, preserving a caller-provided default."""
if value is None:
return default
if isinstance(value, str):
lowered = value.strip().lower()
if lowered in ("true", "1", "yes", "on"):
return True
if lowered in ("false", "0", "no", "off"):
return False
return default
return is_truthy_value(value, default=default)
def _normalize_unauthorized_dm_behavior(value: Any, default: str = "pair") -> str:
"""Normalize unauthorized DM behavior to a supported value."""
if isinstance(value, str):
normalized = value.strip().lower()
if normalized in {"pair", "ignore"}:
return normalized
return default
class Platform(Enum):
"""Supported messaging platforms."""
LOCAL = "local"
@@ -29,8 +53,16 @@ class Platform(Enum):
WHATSAPP = "whatsapp"
SLACK = "slack"
SIGNAL = "signal"
MATTERMOST = "mattermost"
MATRIX = "matrix"
HOMEASSISTANT = "homeassistant"
EMAIL = "email"
SMS = "sms"
DINGTALK = "dingtalk"
API_SERVER = "api_server"
WEBHOOK = "webhook"
FEISHU = "feishu"
WECOM = "wecom"
@dataclass
@@ -75,23 +107,32 @@ class SessionResetPolicy:
mode: str = "both" # "daily", "idle", "both", or "none"
at_hour: int = 4 # Hour for daily reset (0-23, local time)
idle_minutes: int = 1440 # Minutes of inactivity before reset (24 hours)
notify: bool = True # Send a notification to the user when auto-reset occurs
notify_exclude_platforms: tuple = ("api_server", "webhook") # Platforms that don't get reset notifications
def to_dict(self) -> Dict[str, Any]:
return {
"mode": self.mode,
"at_hour": self.at_hour,
"idle_minutes": self.idle_minutes,
"notify": self.notify,
"notify_exclude_platforms": list(self.notify_exclude_platforms),
}
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "SessionResetPolicy":
# Handle both missing keys and explicit null values (YAML null → None)
mode = data.get("mode")
at_hour = data.get("at_hour")
idle_minutes = data.get("idle_minutes")
notify = data.get("notify")
exclude = data.get("notify_exclude_platforms")
return cls(
mode=data.get("mode", "both"),
mode=mode if mode is not None else "both",
at_hour=at_hour if at_hour is not None else 4,
idle_minutes=idle_minutes if idle_minutes is not None else 1440,
notify=notify if notify is not None else True,
notify_exclude_platforms=tuple(exclude) if exclude is not None else ("api_server", "webhook"),
)
@@ -103,6 +144,12 @@ class PlatformConfig:
api_key: Optional[str] = None # API key if different from token
home_channel: Optional[HomeChannel] = None
# Reply threading mode (Telegram/Slack)
# - "off": Never thread replies to original message
# - "first": Only first chunk threads to user's message (default)
# - "all": All chunks in multi-part replies thread to user's message
reply_to_mode: str = "first"
# Platform-specific settings
extra: Dict[str, Any] = field(default_factory=dict)
@@ -110,6 +157,7 @@ class PlatformConfig:
result = {
"enabled": self.enabled,
"extra": self.extra,
"reply_to_mode": self.reply_to_mode,
}
if self.token:
result["token"] = self.token
@@ -130,10 +178,42 @@ class PlatformConfig:
token=data.get("token"),
api_key=data.get("api_key"),
home_channel=home_channel,
reply_to_mode=data.get("reply_to_mode", "first"),
extra=data.get("extra", {}),
)
@dataclass
class StreamingConfig:
"""Configuration for real-time token streaming to messaging platforms."""
enabled: bool = False
transport: str = "edit" # "edit" (progressive editMessageText) or "off"
edit_interval: float = 0.3 # Seconds between message edits
buffer_threshold: int = 40 # Chars before forcing an edit
cursor: str = "" # Cursor shown during streaming
def to_dict(self) -> Dict[str, Any]:
return {
"enabled": self.enabled,
"transport": self.transport,
"edit_interval": self.edit_interval,
"buffer_threshold": self.buffer_threshold,
"cursor": self.cursor,
}
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "StreamingConfig":
if not data:
return cls()
return cls(
enabled=data.get("enabled", False),
transport=data.get("transport", "edit"),
edit_interval=float(data.get("edit_interval", 0.3)),
buffer_threshold=int(data.get("buffer_threshold", 40)),
cursor=data.get("cursor", ""),
)
@dataclass
class GatewayConfig:
"""
@@ -160,7 +240,19 @@ class GatewayConfig:
# Delivery settings
always_log_local: bool = True # Always save cron outputs to local files
# STT settings
stt_enabled: bool = True # Whether to auto-transcribe inbound voice messages
# Session isolation in shared chats
group_sessions_per_user: bool = True # Isolate group/channel sessions per participant when user IDs are available
# Unauthorized DM policy
unauthorized_dm_behavior: str = "pair" # "pair" or "ignore"
# Streaming configuration
streaming: StreamingConfig = field(default_factory=StreamingConfig)
def get_connected_platforms(self) -> List[Platform]:
"""Return list of platforms that are enabled and configured."""
connected = []
@@ -179,6 +271,21 @@ class GatewayConfig:
# Email uses extra dict for config (address + imap_host + smtp_host)
elif platform == Platform.EMAIL and config.extra.get("address"):
connected.append(platform)
# SMS uses api_key (Twilio auth token) — SID checked via env
elif platform == Platform.SMS and os.getenv("TWILIO_ACCOUNT_SID"):
connected.append(platform)
# API Server uses enabled flag only (no token needed)
elif platform == Platform.API_SERVER:
connected.append(platform)
# Webhook uses enabled flag only (secrets are per-route)
elif platform == Platform.WEBHOOK:
connected.append(platform)
# Feishu uses extra dict for app credentials
elif platform == Platform.FEISHU and config.extra.get("app_id"):
connected.append(platform)
# WeCom uses extra dict for bot credentials
elif platform == Platform.WECOM and config.extra.get("bot_id"):
connected.append(platform)
return connected
def get_home_channel(self, platform: Platform) -> Optional[HomeChannel]:
@@ -224,6 +331,10 @@ class GatewayConfig:
"quick_commands": self.quick_commands,
"sessions_dir": str(self.sessions_dir),
"always_log_local": self.always_log_local,
"stt_enabled": self.stt_enabled,
"group_sessions_per_user": self.group_sessions_per_user,
"unauthorized_dm_behavior": self.unauthorized_dm_behavior,
"streaming": self.streaming.to_dict(),
}
@classmethod
@@ -260,6 +371,16 @@ class GatewayConfig:
if not isinstance(quick_commands, dict):
quick_commands = {}
stt_enabled = data.get("stt_enabled")
if stt_enabled is None:
stt_enabled = data.get("stt", {}).get("enabled") if isinstance(data.get("stt"), dict) else None
group_sessions_per_user = data.get("group_sessions_per_user")
unauthorized_dm_behavior = _normalize_unauthorized_dm_behavior(
data.get("unauthorized_dm_behavior"),
"pair",
)
return cls(
platforms=platforms,
default_reset_policy=default_policy,
@@ -269,57 +390,152 @@ class GatewayConfig:
quick_commands=quick_commands,
sessions_dir=sessions_dir,
always_log_local=data.get("always_log_local", True),
stt_enabled=_coerce_bool(stt_enabled, True),
group_sessions_per_user=_coerce_bool(group_sessions_per_user, True),
unauthorized_dm_behavior=unauthorized_dm_behavior,
streaming=StreamingConfig.from_dict(data.get("streaming", {})),
)
def get_unauthorized_dm_behavior(self, platform: Optional[Platform] = None) -> str:
"""Return the effective unauthorized-DM behavior for a platform."""
if platform:
platform_cfg = self.platforms.get(platform)
if platform_cfg and "unauthorized_dm_behavior" in platform_cfg.extra:
return _normalize_unauthorized_dm_behavior(
platform_cfg.extra.get("unauthorized_dm_behavior"),
self.unauthorized_dm_behavior,
)
return self.unauthorized_dm_behavior
def load_gateway_config() -> GatewayConfig:
"""
Load gateway configuration from multiple sources.
Priority (highest to lowest):
1. Environment variables
2. ~/.hermes/gateway.json
3. cli-config.yaml gateway section
4. Defaults
2. ~/.hermes/config.yaml (primary user-facing config)
3. ~/.hermes/gateway.json (legacy — provides defaults under config.yaml)
4. Built-in defaults
"""
config = GatewayConfig()
# Try loading from ~/.hermes/gateway.json
_home = get_hermes_home()
gateway_config_path = _home / "gateway.json"
if gateway_config_path.exists():
try:
with open(gateway_config_path, "r", encoding="utf-8") as f:
data = json.load(f)
config = GatewayConfig.from_dict(data)
except Exception as e:
print(f"[gateway] Warning: Failed to load {gateway_config_path}: {e}")
gw_data: dict = {}
# Bridge session_reset from config.yaml (the user-facing config file)
# into the gateway config. config.yaml takes precedence over gateway.json
# for session reset policy since that's where hermes setup writes it.
# Legacy fallback: gateway.json provides the base layer.
# config.yaml keys always win when both specify the same setting.
gateway_json_path = _home / "gateway.json"
if gateway_json_path.exists():
try:
with open(gateway_json_path, "r", encoding="utf-8") as f:
gw_data = json.load(f) or {}
logger.info(
"Loaded legacy %s — consider moving settings to config.yaml",
gateway_json_path,
)
except Exception as e:
logger.warning("Failed to load %s: %s", gateway_json_path, e)
# Primary source: config.yaml
try:
import yaml
config_yaml_path = _home / "config.yaml"
if config_yaml_path.exists():
with open(config_yaml_path, encoding="utf-8") as f:
yaml_cfg = yaml.safe_load(f) or {}
# Map config.yaml keys → GatewayConfig.from_dict() schema.
# Each key overwrites whatever gateway.json may have set.
sr = yaml_cfg.get("session_reset")
if sr and isinstance(sr, dict):
config.default_reset_policy = SessionResetPolicy.from_dict(sr)
gw_data["default_reset_policy"] = sr
# Bridge quick commands from config.yaml into gateway runtime config.
# config.yaml is the user-facing config source, so when present it
# should override gateway.json for this setting.
qc = yaml_cfg.get("quick_commands")
if qc is not None:
if isinstance(qc, dict):
config.quick_commands = qc
gw_data["quick_commands"] = qc
else:
logger.warning("Ignoring invalid quick_commands in config.yaml (expected mapping, got %s)", type(qc).__name__)
logger.warning(
"Ignoring invalid quick_commands in config.yaml "
"(expected mapping, got %s)",
type(qc).__name__,
)
# Bridge discord settings from config.yaml to env vars
# (env vars take precedence — only set if not already defined)
stt_cfg = yaml_cfg.get("stt")
if isinstance(stt_cfg, dict):
gw_data["stt"] = stt_cfg
if "group_sessions_per_user" in yaml_cfg:
gw_data["group_sessions_per_user"] = yaml_cfg["group_sessions_per_user"]
streaming_cfg = yaml_cfg.get("streaming")
if isinstance(streaming_cfg, dict):
gw_data["streaming"] = streaming_cfg
if "reset_triggers" in yaml_cfg:
gw_data["reset_triggers"] = yaml_cfg["reset_triggers"]
if "always_log_local" in yaml_cfg:
gw_data["always_log_local"] = yaml_cfg["always_log_local"]
if "unauthorized_dm_behavior" in yaml_cfg:
gw_data["unauthorized_dm_behavior"] = _normalize_unauthorized_dm_behavior(
yaml_cfg.get("unauthorized_dm_behavior"),
"pair",
)
# Merge platforms section from config.yaml into gw_data so that
# nested keys like platforms.webhook.extra.routes are loaded.
yaml_platforms = yaml_cfg.get("platforms")
platforms_data = gw_data.setdefault("platforms", {})
if not isinstance(platforms_data, dict):
platforms_data = {}
gw_data["platforms"] = platforms_data
if isinstance(yaml_platforms, dict):
for plat_name, plat_block in yaml_platforms.items():
if not isinstance(plat_block, dict):
continue
existing = platforms_data.get(plat_name, {})
if not isinstance(existing, dict):
existing = {}
# Deep-merge extra dicts so gateway.json defaults survive
merged_extra = {**existing.get("extra", {}), **plat_block.get("extra", {})}
merged = {**existing, **plat_block}
if merged_extra:
merged["extra"] = merged_extra
platforms_data[plat_name] = merged
gw_data["platforms"] = platforms_data
for plat in Platform:
if plat == Platform.LOCAL:
continue
platform_cfg = yaml_cfg.get(plat.value)
if not isinstance(platform_cfg, dict):
continue
# Collect bridgeable keys from this platform section
bridged = {}
if "unauthorized_dm_behavior" in platform_cfg:
bridged["unauthorized_dm_behavior"] = _normalize_unauthorized_dm_behavior(
platform_cfg.get("unauthorized_dm_behavior"),
gw_data.get("unauthorized_dm_behavior", "pair"),
)
if "reply_prefix" in platform_cfg:
bridged["reply_prefix"] = platform_cfg["reply_prefix"]
if "require_mention" in platform_cfg:
bridged["require_mention"] = platform_cfg["require_mention"]
if "mention_patterns" in platform_cfg:
bridged["mention_patterns"] = platform_cfg["mention_patterns"]
if not bridged:
continue
plat_data = platforms_data.setdefault(plat.value, {})
if not isinstance(plat_data, dict):
plat_data = {}
platforms_data[plat.value] = plat_data
extra = plat_data.setdefault("extra", {})
if not isinstance(extra, dict):
extra = {}
plat_data["extra"] = extra
extra.update(bridged)
# Discord settings → env vars (env vars take precedence)
discord_cfg = yaml_cfg.get("discord", {})
if isinstance(discord_cfg, dict):
if "require_mention" in discord_cfg and not os.getenv("DISCORD_REQUIRE_MENTION"):
@@ -331,8 +547,57 @@ def load_gateway_config() -> GatewayConfig:
os.environ["DISCORD_FREE_RESPONSE_CHANNELS"] = str(frc)
if "auto_thread" in discord_cfg and not os.getenv("DISCORD_AUTO_THREAD"):
os.environ["DISCORD_AUTO_THREAD"] = str(discord_cfg["auto_thread"]).lower()
except Exception:
pass
if "reactions" in discord_cfg and not os.getenv("DISCORD_REACTIONS"):
os.environ["DISCORD_REACTIONS"] = str(discord_cfg["reactions"]).lower()
# Telegram settings → env vars (env vars take precedence)
telegram_cfg = yaml_cfg.get("telegram", {})
if isinstance(telegram_cfg, dict):
if "require_mention" in telegram_cfg and not os.getenv("TELEGRAM_REQUIRE_MENTION"):
os.environ["TELEGRAM_REQUIRE_MENTION"] = str(telegram_cfg["require_mention"]).lower()
if "mention_patterns" in telegram_cfg and not os.getenv("TELEGRAM_MENTION_PATTERNS"):
import json as _json
os.environ["TELEGRAM_MENTION_PATTERNS"] = _json.dumps(telegram_cfg["mention_patterns"])
frc = telegram_cfg.get("free_response_chats")
if frc is not None and not os.getenv("TELEGRAM_FREE_RESPONSE_CHATS"):
if isinstance(frc, list):
frc = ",".join(str(v) for v in frc)
os.environ["TELEGRAM_FREE_RESPONSE_CHATS"] = str(frc)
whatsapp_cfg = yaml_cfg.get("whatsapp", {})
if isinstance(whatsapp_cfg, dict):
if "require_mention" in whatsapp_cfg and not os.getenv("WHATSAPP_REQUIRE_MENTION"):
os.environ["WHATSAPP_REQUIRE_MENTION"] = str(whatsapp_cfg["require_mention"]).lower()
if "mention_patterns" in whatsapp_cfg and not os.getenv("WHATSAPP_MENTION_PATTERNS"):
os.environ["WHATSAPP_MENTION_PATTERNS"] = json.dumps(whatsapp_cfg["mention_patterns"])
frc = whatsapp_cfg.get("free_response_chats")
if frc is not None and not os.getenv("WHATSAPP_FREE_RESPONSE_CHATS"):
if isinstance(frc, list):
frc = ",".join(str(v) for v in frc)
os.environ["WHATSAPP_FREE_RESPONSE_CHATS"] = str(frc)
# Matrix settings → env vars (env vars take precedence)
matrix_cfg = yaml_cfg.get("matrix", {})
if isinstance(matrix_cfg, dict):
if "require_mention" in matrix_cfg and not os.getenv("MATRIX_REQUIRE_MENTION"):
os.environ["MATRIX_REQUIRE_MENTION"] = str(matrix_cfg["require_mention"]).lower()
frc = matrix_cfg.get("free_response_rooms")
if frc is not None and not os.getenv("MATRIX_FREE_RESPONSE_ROOMS"):
if isinstance(frc, list):
frc = ",".join(str(v) for v in frc)
os.environ["MATRIX_FREE_RESPONSE_ROOMS"] = str(frc)
if "auto_thread" in matrix_cfg and not os.getenv("MATRIX_AUTO_THREAD"):
os.environ["MATRIX_AUTO_THREAD"] = str(matrix_cfg["auto_thread"]).lower()
except Exception as e:
logger.warning(
"Failed to process config.yaml — falling back to .env / gateway.json values. "
"Check %s for syntax errors. Error: %s",
_home / "config.yaml",
e,
)
config = GatewayConfig.from_dict(gw_data)
# Override with environment variables
_apply_env_overrides(config)
@@ -359,6 +624,8 @@ def load_gateway_config() -> GatewayConfig:
Platform.TELEGRAM: "TELEGRAM_BOT_TOKEN",
Platform.DISCORD: "DISCORD_BOT_TOKEN",
Platform.SLACK: "SLACK_BOT_TOKEN",
Platform.MATTERMOST: "MATTERMOST_TOKEN",
Platform.MATRIX: "MATRIX_ACCESS_TOKEN",
}
for platform, pconfig in config.platforms.items():
if not pconfig.enabled:
@@ -385,6 +652,21 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
config.platforms[Platform.TELEGRAM].enabled = True
config.platforms[Platform.TELEGRAM].token = telegram_token
# Reply threading mode for Telegram (off/first/all)
telegram_reply_mode = os.getenv("TELEGRAM_REPLY_TO_MODE", "").lower()
if telegram_reply_mode in ("off", "first", "all"):
if Platform.TELEGRAM not in config.platforms:
config.platforms[Platform.TELEGRAM] = PlatformConfig()
config.platforms[Platform.TELEGRAM].reply_to_mode = telegram_reply_mode
telegram_fallback_ips = os.getenv("TELEGRAM_FALLBACK_IPS", "")
if telegram_fallback_ips:
if Platform.TELEGRAM not in config.platforms:
config.platforms[Platform.TELEGRAM] = PlatformConfig()
config.platforms[Platform.TELEGRAM].extra["fallback_ips"] = [
ip.strip() for ip in telegram_fallback_ips.split(",") if ip.strip()
]
telegram_home = os.getenv("TELEGRAM_HOME_CHANNEL")
if telegram_home and Platform.TELEGRAM in config.platforms:
config.platforms[Platform.TELEGRAM].home_channel = HomeChannel(
@@ -423,14 +705,13 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
config.platforms[Platform.SLACK] = PlatformConfig()
config.platforms[Platform.SLACK].enabled = True
config.platforms[Platform.SLACK].token = slack_token
# Home channel
slack_home = os.getenv("SLACK_HOME_CHANNEL")
if slack_home:
config.platforms[Platform.SLACK].home_channel = HomeChannel(
platform=Platform.SLACK,
chat_id=slack_home,
name=os.getenv("SLACK_HOME_CHANNEL_NAME", ""),
)
slack_home = os.getenv("SLACK_HOME_CHANNEL")
if slack_home and Platform.SLACK in config.platforms:
config.platforms[Platform.SLACK].home_channel = HomeChannel(
platform=Platform.SLACK,
chat_id=slack_home,
name=os.getenv("SLACK_HOME_CHANNEL_NAME", ""),
)
# Signal
signal_url = os.getenv("SIGNAL_HTTP_URL")
@@ -444,13 +725,60 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
"account": signal_account,
"ignore_stories": os.getenv("SIGNAL_IGNORE_STORIES", "true").lower() in ("true", "1", "yes"),
})
signal_home = os.getenv("SIGNAL_HOME_CHANNEL")
if signal_home:
config.platforms[Platform.SIGNAL].home_channel = HomeChannel(
platform=Platform.SIGNAL,
chat_id=signal_home,
name=os.getenv("SIGNAL_HOME_CHANNEL_NAME", "Home"),
)
signal_home = os.getenv("SIGNAL_HOME_CHANNEL")
if signal_home and Platform.SIGNAL in config.platforms:
config.platforms[Platform.SIGNAL].home_channel = HomeChannel(
platform=Platform.SIGNAL,
chat_id=signal_home,
name=os.getenv("SIGNAL_HOME_CHANNEL_NAME", "Home"),
)
# Mattermost
mattermost_token = os.getenv("MATTERMOST_TOKEN")
if mattermost_token:
mattermost_url = os.getenv("MATTERMOST_URL", "")
if not mattermost_url:
logger.warning("MATTERMOST_TOKEN set but MATTERMOST_URL is missing")
if Platform.MATTERMOST not in config.platforms:
config.platforms[Platform.MATTERMOST] = PlatformConfig()
config.platforms[Platform.MATTERMOST].enabled = True
config.platforms[Platform.MATTERMOST].token = mattermost_token
config.platforms[Platform.MATTERMOST].extra["url"] = mattermost_url
mattermost_home = os.getenv("MATTERMOST_HOME_CHANNEL")
if mattermost_home and Platform.MATTERMOST in config.platforms:
config.platforms[Platform.MATTERMOST].home_channel = HomeChannel(
platform=Platform.MATTERMOST,
chat_id=mattermost_home,
name=os.getenv("MATTERMOST_HOME_CHANNEL_NAME", "Home"),
)
# Matrix
matrix_token = os.getenv("MATRIX_ACCESS_TOKEN")
matrix_homeserver = os.getenv("MATRIX_HOMESERVER", "")
if matrix_token or os.getenv("MATRIX_PASSWORD"):
if not matrix_homeserver:
logger.warning("MATRIX_ACCESS_TOKEN/MATRIX_PASSWORD set but MATRIX_HOMESERVER is missing")
if Platform.MATRIX not in config.platforms:
config.platforms[Platform.MATRIX] = PlatformConfig()
config.platforms[Platform.MATRIX].enabled = True
if matrix_token:
config.platforms[Platform.MATRIX].token = matrix_token
config.platforms[Platform.MATRIX].extra["homeserver"] = matrix_homeserver
matrix_user = os.getenv("MATRIX_USER_ID", "")
if matrix_user:
config.platforms[Platform.MATRIX].extra["user_id"] = matrix_user
matrix_password = os.getenv("MATRIX_PASSWORD", "")
if matrix_password:
config.platforms[Platform.MATRIX].extra["password"] = matrix_password
matrix_e2ee = os.getenv("MATRIX_ENCRYPTION", "").lower() in ("true", "1", "yes")
config.platforms[Platform.MATRIX].extra["encryption"] = matrix_e2ee
matrix_home = os.getenv("MATRIX_HOME_ROOM")
if matrix_home and Platform.MATRIX in config.platforms:
config.platforms[Platform.MATRIX].home_channel = HomeChannel(
platform=Platform.MATRIX,
chat_id=matrix_home,
name=os.getenv("MATRIX_HOME_ROOM_NAME", "Home"),
)
# Home Assistant
hass_token = os.getenv("HASS_TOKEN")
@@ -477,12 +805,116 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
"imap_host": email_imap,
"smtp_host": email_smtp,
})
email_home = os.getenv("EMAIL_HOME_ADDRESS")
if email_home:
config.platforms[Platform.EMAIL].home_channel = HomeChannel(
platform=Platform.EMAIL,
chat_id=email_home,
name=os.getenv("EMAIL_HOME_ADDRESS_NAME", "Home"),
email_home = os.getenv("EMAIL_HOME_ADDRESS")
if email_home and Platform.EMAIL in config.platforms:
config.platforms[Platform.EMAIL].home_channel = HomeChannel(
platform=Platform.EMAIL,
chat_id=email_home,
name=os.getenv("EMAIL_HOME_ADDRESS_NAME", "Home"),
)
# SMS (Twilio)
twilio_sid = os.getenv("TWILIO_ACCOUNT_SID")
if twilio_sid:
if Platform.SMS not in config.platforms:
config.platforms[Platform.SMS] = PlatformConfig()
config.platforms[Platform.SMS].enabled = True
config.platforms[Platform.SMS].api_key = os.getenv("TWILIO_AUTH_TOKEN", "")
sms_home = os.getenv("SMS_HOME_CHANNEL")
if sms_home and Platform.SMS in config.platforms:
config.platforms[Platform.SMS].home_channel = HomeChannel(
platform=Platform.SMS,
chat_id=sms_home,
name=os.getenv("SMS_HOME_CHANNEL_NAME", "Home"),
)
# API Server
api_server_enabled = os.getenv("API_SERVER_ENABLED", "").lower() in ("true", "1", "yes")
api_server_key = os.getenv("API_SERVER_KEY", "")
api_server_cors_origins = os.getenv("API_SERVER_CORS_ORIGINS", "")
api_server_port = os.getenv("API_SERVER_PORT")
api_server_host = os.getenv("API_SERVER_HOST")
if api_server_enabled or api_server_key:
if Platform.API_SERVER not in config.platforms:
config.platforms[Platform.API_SERVER] = PlatformConfig()
config.platforms[Platform.API_SERVER].enabled = True
if api_server_key:
config.platforms[Platform.API_SERVER].extra["key"] = api_server_key
if api_server_cors_origins:
origins = [origin.strip() for origin in api_server_cors_origins.split(",") if origin.strip()]
if origins:
config.platforms[Platform.API_SERVER].extra["cors_origins"] = origins
if api_server_port:
try:
config.platforms[Platform.API_SERVER].extra["port"] = int(api_server_port)
except ValueError:
pass
if api_server_host:
config.platforms[Platform.API_SERVER].extra["host"] = api_server_host
# Webhook platform
webhook_enabled = os.getenv("WEBHOOK_ENABLED", "").lower() in ("true", "1", "yes")
webhook_port = os.getenv("WEBHOOK_PORT")
webhook_secret = os.getenv("WEBHOOK_SECRET", "")
if webhook_enabled:
if Platform.WEBHOOK not in config.platforms:
config.platforms[Platform.WEBHOOK] = PlatformConfig()
config.platforms[Platform.WEBHOOK].enabled = True
if webhook_port:
try:
config.platforms[Platform.WEBHOOK].extra["port"] = int(webhook_port)
except ValueError:
pass
if webhook_secret:
config.platforms[Platform.WEBHOOK].extra["secret"] = webhook_secret
# Feishu / Lark
feishu_app_id = os.getenv("FEISHU_APP_ID")
feishu_app_secret = os.getenv("FEISHU_APP_SECRET")
if feishu_app_id and feishu_app_secret:
if Platform.FEISHU not in config.platforms:
config.platforms[Platform.FEISHU] = PlatformConfig()
config.platforms[Platform.FEISHU].enabled = True
config.platforms[Platform.FEISHU].extra.update({
"app_id": feishu_app_id,
"app_secret": feishu_app_secret,
"domain": os.getenv("FEISHU_DOMAIN", "feishu"),
"connection_mode": os.getenv("FEISHU_CONNECTION_MODE", "websocket"),
})
feishu_encrypt_key = os.getenv("FEISHU_ENCRYPT_KEY", "")
if feishu_encrypt_key:
config.platforms[Platform.FEISHU].extra["encrypt_key"] = feishu_encrypt_key
feishu_verification_token = os.getenv("FEISHU_VERIFICATION_TOKEN", "")
if feishu_verification_token:
config.platforms[Platform.FEISHU].extra["verification_token"] = feishu_verification_token
feishu_home = os.getenv("FEISHU_HOME_CHANNEL")
if feishu_home:
config.platforms[Platform.FEISHU].home_channel = HomeChannel(
platform=Platform.FEISHU,
chat_id=feishu_home,
name=os.getenv("FEISHU_HOME_CHANNEL_NAME", "Home"),
)
# WeCom (Enterprise WeChat)
wecom_bot_id = os.getenv("WECOM_BOT_ID")
wecom_secret = os.getenv("WECOM_SECRET")
if wecom_bot_id and wecom_secret:
if Platform.WECOM not in config.platforms:
config.platforms[Platform.WECOM] = PlatformConfig()
config.platforms[Platform.WECOM].enabled = True
config.platforms[Platform.WECOM].extra.update({
"bot_id": wecom_bot_id,
"secret": wecom_secret,
})
wecom_ws_url = os.getenv("WECOM_WEBSOCKET_URL", "")
if wecom_ws_url:
config.platforms[Platform.WECOM].extra["websocket_url"] = wecom_ws_url
wecom_home = os.getenv("WECOM_HOME_CHANNEL")
if wecom_home:
config.platforms[Platform.WECOM].home_channel = HomeChannel(
platform=Platform.WECOM,
chat_id=wecom_home,
name=os.getenv("WECOM_HOME_CHANNEL_NAME", "Home"),
)
# Session settings
@@ -499,12 +931,3 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
config.default_reset_policy.at_hour = int(reset_hour)
except ValueError:
pass
def save_gateway_config(config: GatewayConfig) -> None:
"""Save gateway configuration to ~/.hermes/gateway.json."""
gateway_config_path = get_hermes_home() / "gateway.json"
gateway_config_path.parent.mkdir(parents=True, exist_ok=True)
with open(gateway_config_path, "w", encoding="utf-8") as f:
json.dump(config.to_dict(), f, indent=2)

View File

@@ -13,7 +13,6 @@ from pathlib import Path
from datetime import datetime
from dataclasses import dataclass
from typing import Dict, List, Optional, Any, Union
from enum import Enum
from hermes_cli.config import get_hermes_home
@@ -71,12 +70,15 @@ class DeliveryTarget:
if target == "local":
return cls(platform=Platform.LOCAL)
# Check for platform:chat_id format
# Check for platform:chat_id or platform:chat_id:thread_id format
if ":" in target:
platform_str, chat_id = target.split(":", 1)
parts = target.split(":", 2)
platform_str = parts[0]
chat_id = parts[1] if len(parts) > 1 else None
thread_id = parts[2] if len(parts) > 2 else None
try:
platform = Platform(platform_str)
return cls(platform=platform, chat_id=chat_id, is_explicit=True)
return cls(platform=platform, chat_id=chat_id, thread_id=thread_id, is_explicit=True)
except ValueError:
# Unknown platform, treat as local
return cls(platform=Platform.LOCAL)
@@ -95,6 +97,8 @@ class DeliveryTarget:
return "origin"
if self.platform == Platform.LOCAL:
return "local"
if self.chat_id and self.thread_id:
return f"{self.platform.value}:{self.chat_id}:{self.thread_id}"
if self.chat_id:
return f"{self.platform.value}:{self.chat_id}"
return self.platform.value

View File

@@ -8,8 +8,9 @@ Hooks are discovered from ~/.hermes/hooks/ directories, each containing:
Events:
- gateway:startup -- Gateway process starts
- session:start -- New session created
- session:reset -- User ran /new or /reset
- session:start -- New session created (first message of a new session)
- session:end -- Session ends (user ran /new or /reset)
- session:reset -- Session reset completed (new session entry created)
- agent:start -- Agent begins processing a message
- agent:step -- Each turn in the tool-calling loop
- agent:end -- Agent finishes processing
@@ -20,8 +21,6 @@ Errors in hooks are caught and logged but never block the main pipeline.
import asyncio
import importlib.util
import os
from pathlib import Path
from typing import Any, Callable, Dict, List, Optional
import yaml
@@ -52,14 +51,33 @@ class HookRegistry:
"""Return metadata about all loaded hooks."""
return list(self._loaded_hooks)
def _register_builtin_hooks(self) -> None:
"""Register built-in hooks that are always active."""
try:
from gateway.builtin_hooks.boot_md import handle as boot_md_handle
self._handlers.setdefault("gateway:startup", []).append(boot_md_handle)
self._loaded_hooks.append({
"name": "boot-md",
"description": "Run ~/.hermes/BOOT.md on gateway startup",
"events": ["gateway:startup"],
"path": "(builtin)",
})
except Exception as e:
print(f"[hooks] Could not load built-in boot-md hook: {e}", flush=True)
def discover_and_load(self) -> None:
"""
Scan the hooks directory for hook directories and load their handlers.
Also registers built-in hooks that are always active.
Each hook directory must contain:
- HOOK.yaml with at least 'name' and 'events' keys
- handler.py with a top-level 'handle' function (sync or async)
"""
self._register_builtin_hooks()
if not HOOKS_DIR.exists():
return

View File

@@ -12,7 +12,6 @@ the full SessionStore machinery.
import json
import logging
from datetime import datetime
from pathlib import Path
from typing import Optional
from hermes_cli.config import get_hermes_home

View File

@@ -25,7 +25,7 @@ import time
from pathlib import Path
from typing import Optional
from hermes_cli.config import get_hermes_home
from hermes_constants import get_hermes_dir
# Unambiguous alphabet -- excludes 0/O, 1/I to prevent confusion
@@ -41,7 +41,7 @@ LOCKOUT_SECONDS = 3600 # Lockout duration after too many failures
MAX_PENDING_PER_PLATFORM = 3 # Max pending codes per platform
MAX_FAILED_ATTEMPTS = 5 # Failed approvals before lockout
PAIRING_DIR = get_hermes_home() / "pairing"
PAIRING_DIR = get_hermes_dir("platforms/pairing", "pairing")
def _secure_write(path: Path, data: str) -> None:

File diff suppressed because it is too large Load Diff

View File

@@ -8,6 +8,7 @@ and implement the required methods.
import asyncio
import logging
import os
import random
import re
import uuid
from abc import ABC, abstractmethod
@@ -26,6 +27,7 @@ sys.path.insert(0, str(_Path(__file__).resolve().parents[2]))
from gateway.config import Platform, PlatformConfig
from gateway.session import SessionSource, build_session_key
from hermes_cli.config import get_hermes_home
from hermes_constants import get_hermes_dir
GATEWAY_SECRET_CAPTURE_UNSUPPORTED_MESSAGE = (
@@ -43,8 +45,8 @@ GATEWAY_SECRET_CAPTURE_UNSUPPORTED_MESSAGE = (
# (e.g. Telegram file URLs expire after ~1 hour).
# ---------------------------------------------------------------------------
# Default location: {HERMES_HOME}/image_cache/
IMAGE_CACHE_DIR = get_hermes_home() / "image_cache"
# Default location: {HERMES_HOME}/cache/images/ (legacy: image_cache/)
IMAGE_CACHE_DIR = get_hermes_dir("cache/images", "image_cache")
def get_image_cache_dir() -> Path:
@@ -71,31 +73,51 @@ def cache_image_from_bytes(data: bytes, ext: str = ".jpg") -> str:
return str(filepath)
async def cache_image_from_url(url: str, ext: str = ".jpg") -> str:
async def cache_image_from_url(url: str, ext: str = ".jpg", retries: int = 2) -> str:
"""
Download an image from a URL and save it to the local cache.
Uses httpx for async download with a reasonable timeout.
Retries on transient failures (timeouts, 429, 5xx) with exponential
backoff so a single slow CDN response doesn't lose the media.
Args:
url: The HTTP/HTTPS URL to download from.
ext: File extension including the dot (e.g. ".jpg", ".png").
retries: Number of retry attempts on transient failures.
Returns:
Absolute path to the cached image file as a string.
"""
import asyncio
import httpx
import logging as _logging
_log = _logging.getLogger(__name__)
last_exc = None
async with httpx.AsyncClient(timeout=30.0, follow_redirects=True) as client:
response = await client.get(
url,
headers={
"User-Agent": "Mozilla/5.0 (compatible; HermesAgent/1.0)",
"Accept": "image/*,*/*;q=0.8",
},
)
response.raise_for_status()
return cache_image_from_bytes(response.content, ext)
for attempt in range(retries + 1):
try:
response = await client.get(
url,
headers={
"User-Agent": "Mozilla/5.0 (compatible; HermesAgent/1.0)",
"Accept": "image/*,*/*;q=0.8",
},
)
response.raise_for_status()
return cache_image_from_bytes(response.content, ext)
except (httpx.TimeoutException, httpx.HTTPStatusError) as exc:
last_exc = exc
if isinstance(exc, httpx.HTTPStatusError) and exc.response.status_code < 429:
raise
if attempt < retries:
wait = 1.5 * (attempt + 1)
_log.debug("Media cache retry %d/%d for %s (%.1fs): %s",
attempt + 1, retries, url[:80], wait, exc)
await asyncio.sleep(wait)
continue
raise
raise last_exc
def cleanup_image_cache(max_age_hours: int = 24) -> int:
@@ -126,7 +148,7 @@ def cleanup_image_cache(max_age_hours: int = 24) -> int:
# here so the STT tool (OpenAI Whisper) can transcribe them from local files.
# ---------------------------------------------------------------------------
AUDIO_CACHE_DIR = get_hermes_home() / "audio_cache"
AUDIO_CACHE_DIR = get_hermes_dir("cache/audio", "audio_cache")
def get_audio_cache_dir() -> Path:
@@ -153,29 +175,51 @@ def cache_audio_from_bytes(data: bytes, ext: str = ".ogg") -> str:
return str(filepath)
async def cache_audio_from_url(url: str, ext: str = ".ogg") -> str:
async def cache_audio_from_url(url: str, ext: str = ".ogg", retries: int = 2) -> str:
"""
Download an audio file from a URL and save it to the local cache.
Retries on transient failures (timeouts, 429, 5xx) with exponential
backoff so a single slow CDN response doesn't lose the media.
Args:
url: The HTTP/HTTPS URL to download from.
ext: File extension including the dot (e.g. ".ogg", ".mp3").
retries: Number of retry attempts on transient failures.
Returns:
Absolute path to the cached audio file as a string.
"""
import asyncio
import httpx
import logging as _logging
_log = _logging.getLogger(__name__)
last_exc = None
async with httpx.AsyncClient(timeout=30.0, follow_redirects=True) as client:
response = await client.get(
url,
headers={
"User-Agent": "Mozilla/5.0 (compatible; HermesAgent/1.0)",
"Accept": "audio/*,*/*;q=0.8",
},
)
response.raise_for_status()
return cache_audio_from_bytes(response.content, ext)
for attempt in range(retries + 1):
try:
response = await client.get(
url,
headers={
"User-Agent": "Mozilla/5.0 (compatible; HermesAgent/1.0)",
"Accept": "audio/*,*/*;q=0.8",
},
)
response.raise_for_status()
return cache_audio_from_bytes(response.content, ext)
except (httpx.TimeoutException, httpx.HTTPStatusError) as exc:
last_exc = exc
if isinstance(exc, httpx.HTTPStatusError) and exc.response.status_code < 429:
raise
if attempt < retries:
wait = 1.5 * (attempt + 1)
_log.debug("Audio cache retry %d/%d for %s (%.1fs): %s",
attempt + 1, retries, url[:80], wait, exc)
await asyncio.sleep(wait)
continue
raise
raise last_exc
# ---------------------------------------------------------------------------
@@ -185,12 +229,13 @@ async def cache_audio_from_url(url: str, ext: str = ".ogg") -> str:
# here so the agent can reference them by local file path.
# ---------------------------------------------------------------------------
DOCUMENT_CACHE_DIR = get_hermes_home() / "document_cache"
DOCUMENT_CACHE_DIR = get_hermes_dir("cache/documents", "document_cache")
SUPPORTED_DOCUMENT_TYPES = {
".pdf": "application/pdf",
".md": "text/markdown",
".txt": "text/plain",
".zip": "application/zip",
".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
@@ -288,11 +333,16 @@ class MessageEvent:
message_id: Optional[str] = None
# Media attachments
# media_urls: local file paths (for vision tool access)
media_urls: List[str] = field(default_factory=list)
media_types: List[str] = field(default_factory=list)
# Reply context
reply_to_message_id: Optional[str] = None
reply_to_text: Optional[str] = None # Text of the replied-to message (for context injection)
# Auto-loaded skill for topic/channel bindings (e.g., Telegram DM Topics)
auto_skill: Optional[str] = None
# Timestamps
timestamp: datetime = field(default_factory=datetime.now)
@@ -307,7 +357,10 @@ class MessageEvent:
return None
# Split on space and get first word, strip the /
parts = self.text.split(maxsplit=1)
return parts[0][1:].lower() if parts else None
raw = parts[0][1:].lower() if parts else None
if raw and "@" in raw:
raw = raw.split("@", 1)[0]
return raw
def get_command_args(self) -> str:
"""Get the arguments after a command."""
@@ -324,6 +377,27 @@ class SendResult:
message_id: Optional[str] = None
error: Optional[str] = None
raw_response: Any = None
retryable: bool = False # True for transient connection errors — base will retry automatically
# Error substrings that indicate a transient *connection* failure worth retrying.
# "timeout" / "timed out" / "readtimeout" / "writetimeout" are intentionally
# excluded: a read/write timeout on a non-idempotent call (e.g. send_message)
# means the request may have reached the server — retrying risks duplicate
# delivery. "connecttimeout" is safe because the connection was never
# established. Platforms that know a timeout is safe to retry should set
# SendResult.retryable = True explicitly.
_RETRYABLE_ERROR_PATTERNS = (
"connecterror",
"connectionerror",
"connectionreset",
"connectionrefused",
"connecttimeout",
"network",
"broken pipe",
"remotedisconnected",
"eoferror",
)
# Type for message handlers
@@ -355,6 +429,10 @@ class BasePlatformAdapter(ABC):
# Key: session_key (e.g., chat_id), Value: (event, asyncio.Event for interrupt)
self._active_sessions: Dict[str, asyncio.Event] = {}
self._pending_messages: Dict[str, MessageEvent] = {}
# Background message-processing tasks spawned by handle_message().
# Gateway shutdown cancels these so an old gateway instance doesn't keep
# working on a task after --replace or manual restarts.
self._background_tasks: set[asyncio.Task] = set()
# Chats where auto-TTS on voice input is disabled (set by /voice off)
self._auto_tts_disabled_chats: set = set()
@@ -498,6 +576,14 @@ class BasePlatformAdapter(ABC):
metadata: optional dict with platform-specific context (e.g. thread_id for Slack).
"""
pass
async def stop_typing(self, chat_id: str) -> None:
"""Stop a persistent typing indicator (if the platform uses one).
Override in subclasses that start background typing loops.
Default is a no-op for platforms with one-shot typing indicators.
"""
pass
async def send_image(
self,
@@ -505,6 +591,7 @@ class BasePlatformAdapter(ABC):
image_url: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""
Send an image natively via the platform API.
@@ -523,6 +610,7 @@ class BasePlatformAdapter(ABC):
animation_url: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""
Send an animated GIF natively via the platform API.
@@ -531,7 +619,7 @@ class BasePlatformAdapter(ABC):
(e.g., Telegram send_animation) so they auto-play inline.
Default falls back to send_image.
"""
return await self.send_image(chat_id=chat_id, image_url=animation_url, caption=caption, reply_to=reply_to)
return await self.send_image(chat_id=chat_id, image_url=animation_url, caption=caption, reply_to=reply_to, metadata=metadata)
@staticmethod
def _is_animation_url(url: str) -> bool:
@@ -705,7 +793,7 @@ class BasePlatformAdapter(ABC):
# Extract MEDIA:<path> tags, allowing optional whitespace after the colon
# and quoted/backticked paths for LLM-formatted outputs.
media_pattern = re.compile(
r'''[`"']?MEDIA:\s*(?P<path>`[^`\n]+`|"[^"\n]+"|'[^'\n]+'|\S+)[`"']?'''
r'''[`"']?MEDIA:\s*(?P<path>`[^`\n]+`|"[^"\n]+"|'[^'\n]+'|(?:~/|/)\S+(?:[^\S\n]+\S+)*?\.(?:png|jpe?g|gif|webp|mp4|mov|avi|mkv|webm|ogg|opus|mp3|wav|m4a)(?=[\s`"',;:)\]}]|$)|\S+)[`"']?'''
)
for match in media_pattern.finditer(content):
path = match.group("path").strip()
@@ -721,7 +809,75 @@ class BasePlatformAdapter(ABC):
cleaned = re.sub(r'\n{3,}', '\n\n', cleaned).strip()
return media, cleaned
@staticmethod
def extract_local_files(content: str) -> Tuple[List[str], str]:
"""
Detect bare local file paths in response text for native media delivery.
Matches absolute paths (/...) and tilde paths (~/) ending in common
image or video extensions. Validates each candidate with
``os.path.isfile()`` to avoid false positives from URLs or
non-existent paths.
Paths inside fenced code blocks (``` ... ```) and inline code
(`...`) are ignored so that code samples are never mutilated.
Returns:
Tuple of (list of expanded file paths, cleaned text with the
raw path strings removed).
"""
_LOCAL_MEDIA_EXTS = (
'.png', '.jpg', '.jpeg', '.gif', '.webp',
'.mp4', '.mov', '.avi', '.mkv', '.webm',
)
ext_part = '|'.join(e.lstrip('.') for e in _LOCAL_MEDIA_EXTS)
# (?<![/:\w.]) prevents matching inside URLs (e.g. https://…/img.png)
# and relative paths (./foo.png)
# (?:~/|/) anchors to absolute or home-relative paths
path_re = re.compile(
r'(?<![/:\w.])(?:~/|/)(?:[\w.\-]+/)*[\w.\-]+\.(?:' + ext_part + r')\b',
re.IGNORECASE,
)
# Build spans covered by fenced code blocks and inline code
code_spans: list = []
for m in re.finditer(r'```[^\n]*\n.*?```', content, re.DOTALL):
code_spans.append((m.start(), m.end()))
for m in re.finditer(r'`[^`\n]+`', content):
code_spans.append((m.start(), m.end()))
def _in_code(pos: int) -> bool:
return any(s <= pos < e for s, e in code_spans)
found: list = [] # (raw_match_text, expanded_path)
for match in path_re.finditer(content):
if _in_code(match.start()):
continue
raw = match.group(0)
expanded = os.path.expanduser(raw)
if os.path.isfile(expanded):
found.append((raw, expanded))
# Deduplicate by expanded path, preserving discovery order
seen: set = set()
unique: list = []
for raw, expanded in found:
if expanded not in seen:
seen.add(expanded)
unique.append((raw, expanded))
paths = [expanded for _, expanded in unique]
cleaned = content
if unique:
for raw, _exp in unique:
cleaned = cleaned.replace(raw, '')
cleaned = re.sub(r'\n{3,}', '\n\n', cleaned).strip()
return paths, cleaned
async def _keep_typing(self, chat_id: str, interval: float = 2.0, metadata=None) -> None:
"""
Continuously send typing indicator until cancelled.
@@ -735,7 +891,139 @@ class BasePlatformAdapter(ABC):
await asyncio.sleep(interval)
except asyncio.CancelledError:
pass # Normal cancellation when handler completes
finally:
# Ensure the underlying platform typing loop is stopped.
# _keep_typing may have called send_typing() after an outer
# stop_typing() cleared the task dict, recreating the loop.
# Cancelling _keep_typing alone won't clean that up.
if hasattr(self, "stop_typing"):
try:
await self.stop_typing(chat_id)
except Exception:
pass
# ── Processing lifecycle hooks ──────────────────────────────────────────
# Subclasses override these to react to message processing events
# (e.g. Discord adds 👀/✅/❌ reactions).
async def on_processing_start(self, event: MessageEvent) -> None:
"""Hook called when background processing begins."""
async def on_processing_complete(self, event: MessageEvent, success: bool) -> None:
"""Hook called when background processing completes."""
async def _run_processing_hook(self, hook_name: str, *args: Any, **kwargs: Any) -> None:
"""Run a lifecycle hook without letting failures break message flow."""
hook = getattr(self, hook_name, None)
if not callable(hook):
return
try:
await hook(*args, **kwargs)
except Exception as e:
logger.warning("[%s] %s hook failed: %s", self.name, hook_name, e)
@staticmethod
def _is_retryable_error(error: Optional[str]) -> bool:
"""Return True if the error string looks like a transient network failure."""
if not error:
return False
lowered = error.lower()
return any(pat in lowered for pat in _RETRYABLE_ERROR_PATTERNS)
@staticmethod
def _is_timeout_error(error: Optional[str]) -> bool:
"""Return True if the error string indicates a read/write timeout.
Timeout errors are NOT retryable and should NOT trigger plain-text
fallback — the request may have already been delivered.
"""
if not error:
return False
lowered = error.lower()
return "timed out" in lowered or "readtimeout" in lowered or "writetimeout" in lowered
async def _send_with_retry(
self,
chat_id: str,
content: str,
reply_to: Optional[str] = None,
metadata: Any = None,
max_retries: int = 2,
base_delay: float = 2.0,
) -> "SendResult":
"""
Send a message with automatic retry for transient network errors.
On permanent failures (e.g. formatting / permission errors) falls back
to a plain-text version before giving up. If all attempts fail due to
network errors, sends the user a brief delivery-failure notice so they
know to retry rather than waiting indefinitely.
"""
result = await self.send(
chat_id=chat_id,
content=content,
reply_to=reply_to,
metadata=metadata,
)
if result.success:
return result
error_str = result.error or ""
is_network = result.retryable or self._is_retryable_error(error_str)
# Timeout errors are not safe to retry (message may have been
# delivered) and not formatting errors — return the failure as-is.
if not is_network and self._is_timeout_error(error_str):
return result
if is_network:
# Retry with exponential backoff for transient errors
for attempt in range(1, max_retries + 1):
delay = base_delay * (2 ** (attempt - 1)) + random.uniform(0, 1)
logger.warning(
"[%s] Send failed (attempt %d/%d, retrying in %.1fs): %s",
self.name, attempt, max_retries, delay, error_str,
)
await asyncio.sleep(delay)
result = await self.send(
chat_id=chat_id,
content=content,
reply_to=reply_to,
metadata=metadata,
)
if result.success:
logger.info("[%s] Send succeeded on retry %d", self.name, attempt)
return result
error_str = result.error or ""
if not (result.retryable or self._is_retryable_error(error_str)):
break # error switched to non-transient — fall through to plain-text fallback
else:
# All retries exhausted (loop completed without break) — notify user
logger.error("[%s] Failed to deliver response after %d retries: %s", self.name, max_retries, error_str)
notice = (
"\u26a0\ufe0f Message delivery failed after multiple attempts. "
"Please try again \u2014 your request was processed but the response could not be sent."
)
try:
await self.send(chat_id=chat_id, content=notice, reply_to=reply_to, metadata=metadata)
except Exception as notify_err:
logger.debug("[%s] Could not send delivery-failure notice: %s", self.name, notify_err)
return result
# Non-network / post-retry formatting failure: try plain text as fallback
logger.warning("[%s] Send failed: %s — trying plain-text fallback", self.name, error_str)
fallback_result = await self.send(
chat_id=chat_id,
content=f"(Response formatting failed, plain text:)\n\n{content[:3500]}",
reply_to=reply_to,
metadata=metadata,
)
if not fallback_result.success:
logger.error("[%s] Fallback send also failed: %s", self.name, fallback_result.error)
return fallback_result
async def handle_message(self, event: MessageEvent) -> None:
"""
Process an incoming message.
@@ -747,19 +1035,103 @@ class BasePlatformAdapter(ABC):
if not self._message_handler:
return
session_key = build_session_key(event.source)
session_key = build_session_key(
event.source,
group_sessions_per_user=self.config.extra.get("group_sessions_per_user", True),
)
# Check if there's already an active handler for this session
if session_key in self._active_sessions:
# Store this as a pending message - it will interrupt the running agent
print(f"[{self.name}] ⚡ New message while session {session_key} is active - triggering interrupt")
# /approve and /deny must bypass the active-session guard.
# The agent thread is blocked on threading.Event.wait() inside
# tools/approval.py — queuing these commands creates a deadlock:
# the agent waits for approval, approval waits for agent to finish.
# Dispatch directly to the message handler without touching session
# lifecycle (no competing background task, no session guard removal).
cmd = event.get_command()
if cmd in ("approve", "deny"):
logger.debug(
"[%s] Approval command '/%s' bypassing active-session guard for %s",
self.name, cmd, session_key,
)
try:
_thread_meta = {"thread_id": event.source.thread_id} if event.source.thread_id else None
response = await self._message_handler(event)
if response:
await self._send_with_retry(
chat_id=event.source.chat_id,
content=response,
reply_to=event.message_id,
metadata=_thread_meta,
)
except Exception as e:
logger.error("[%s] Approval dispatch failed: %s", self.name, e, exc_info=True)
return
# /status must also bypass the active-session guard so it always
# returns a system-generated response instead of being queued as
# user text and passed to the agent (#5046).
if cmd == "status":
logger.debug(
"[%s] Status command bypassing active-session guard for %s",
self.name, session_key,
)
try:
_thread_meta = {"thread_id": event.source.thread_id} if event.source.thread_id else None
response = await self._message_handler(event)
if response:
await self._send_with_retry(
chat_id=event.source.chat_id,
content=response,
reply_to=event.message_id,
metadata=_thread_meta,
)
except Exception as e:
logger.error("[%s] Status dispatch failed: %s", self.name, e, exc_info=True)
return
# Special case: photo bursts/albums frequently arrive as multiple near-
# simultaneous messages. Queue them without interrupting the active run,
# then process them immediately after the current task finishes.
if event.message_type == MessageType.PHOTO:
logger.debug("[%s] Queuing photo follow-up for session %s without interrupt", self.name, session_key)
existing = self._pending_messages.get(session_key)
if existing and existing.message_type == MessageType.PHOTO:
existing.media_urls.extend(event.media_urls)
existing.media_types.extend(event.media_types)
if event.text:
if not existing.text:
existing.text = event.text
elif event.text not in existing.text:
existing.text = f"{existing.text}\n\n{event.text}".strip()
else:
self._pending_messages[session_key] = event
return # Don't interrupt now - will run after current task completes
# Default behavior for non-photo follow-ups: interrupt the running agent
logger.debug("[%s] New message while session %s is active — triggering interrupt", self.name, session_key)
self._pending_messages[session_key] = event
# Signal the interrupt (the processing task checks this)
self._active_sessions[session_key].set()
return # Don't process now - will be handled after current task finishes
# Mark session as active BEFORE spawning background task to close
# the race window where a second message arriving before the task
# starts would also pass the _active_sessions check and spawn a
# duplicate task. (grammY sequentialize / aiogram EventIsolation
# pattern — set the guard synchronously, not inside the task.)
self._active_sessions[session_key] = asyncio.Event()
# Spawn background task to process this message
asyncio.create_task(self._process_message_background(event, session_key))
task = asyncio.create_task(self._process_message_background(event, session_key))
try:
self._background_tasks.add(task)
except TypeError:
# Some tests stub create_task() with lightweight sentinels that are not
# hashable and do not support lifecycle callbacks.
return
if hasattr(task, "add_done_callback"):
task.add_done_callback(self._background_tasks.discard)
@staticmethod
def _get_human_delay() -> float:
@@ -784,8 +1156,22 @@ class BasePlatformAdapter(ABC):
async def _process_message_background(self, event: MessageEvent, session_key: str) -> None:
"""Background task that actually processes the message."""
# Create interrupt event for this session
interrupt_event = asyncio.Event()
# Track delivery outcomes for the processing-complete hook
delivery_attempted = False
delivery_succeeded = False
def _record_delivery(result):
nonlocal delivery_attempted, delivery_succeeded
if result is None:
return
delivery_attempted = True
if getattr(result, "success", False):
delivery_succeeded = True
# Reuse the interrupt event set by handle_message() (which marks
# the session active before spawning this task to prevent races).
# Fall back to a new Event only if the entry was removed externally.
interrupt_event = self._active_sessions.get(session_key) or asyncio.Event()
self._active_sessions[session_key] = interrupt_event
# Start continuous typing indicator (refreshes every 2 seconds)
@@ -793,20 +1179,34 @@ class BasePlatformAdapter(ABC):
typing_task = asyncio.create_task(self._keep_typing(event.source.chat_id, metadata=_thread_metadata))
try:
await self._run_processing_hook("on_processing_start", event)
# Call the handler (this can take a while with tool calls)
response = await self._message_handler(event)
# Send response if any
# Send response if any. A None/empty response is normal when
# streaming already delivered the text (already_sent=True) or
# when the message was queued behind an active agent. Log at
# DEBUG to avoid noisy warnings for expected behavior.
if not response:
logger.warning("[%s] Handler returned empty/None response for %s", self.name, event.source.chat_id)
logger.debug("[%s] Handler returned empty/None response for %s", self.name, event.source.chat_id)
if response:
# Extract MEDIA:<path> tags (from TTS tool) before other processing
media_files, response = self.extract_media(response)
# Extract image URLs and send them as native platform attachments
images, text_content = self.extract_images(response)
# Strip any remaining internal directives from message body (fixes #1561)
text_content = text_content.replace("[[audio_as_voice]]", "").strip()
text_content = re.sub(r"MEDIA:\s*\S+", "", text_content).strip()
if images:
logger.info("[%s] extract_images found %d image(s) in response (%d chars)", self.name, len(images), len(response))
# Auto-detect bare local file paths for native media delivery
# (helps small models that don't use MEDIA: syntax)
local_files, text_content = self.extract_local_files(text_content)
if local_files:
logger.info("[%s] extract_local_files found %d file(s) in response", self.name, len(local_files))
# Auto-TTS: if voice message, generate audio FIRST (before sending text)
# Skipped when the chat has voice mode disabled (/voice off)
@@ -847,25 +1247,13 @@ class BasePlatformAdapter(ABC):
# Send the text portion
if text_content:
logger.info("[%s] Sending response (%d chars) to %s", self.name, len(text_content), event.source.chat_id)
result = await self.send(
result = await self._send_with_retry(
chat_id=event.source.chat_id,
content=text_content,
reply_to=event.message_id,
metadata=_thread_metadata,
)
# Log send failures (don't raise - user already saw tool progress)
if not result.success:
print(f"[{self.name}] Failed to send response: {result.error}")
# Try sending without markdown as fallback
fallback_result = await self.send(
chat_id=event.source.chat_id,
content=f"(Response formatting failed, plain text:)\n\n{text_content[:3500]}",
reply_to=event.message_id,
metadata=_thread_metadata,
)
if not fallback_result.success:
print(f"[{self.name}] Fallback send also failed: {fallback_result.error}")
_record_delivery(result)
# Human-like pacing delay between text and media
human_delay = self._get_human_delay()
@@ -900,7 +1288,7 @@ class BasePlatformAdapter(ABC):
# Send extracted media files — route by file type
_AUDIO_EXTS = {'.ogg', '.opus', '.mp3', '.wav', '.m4a'}
_VIDEO_EXTS = {'.mp4', '.mov', '.avi', '.mkv', '.3gp'}
_VIDEO_EXTS = {'.mp4', '.mov', '.avi', '.mkv', '.webm', '.3gp'}
_IMAGE_EXTS = {'.jpg', '.jpeg', '.png', '.webp', '.gif'}
for media_path, is_voice in media_files:
@@ -934,14 +1322,45 @@ class BasePlatformAdapter(ABC):
)
if not media_result.success:
print(f"[{self.name}] Failed to send media ({ext}): {media_result.error}")
logger.warning("[%s] Failed to send media (%s): %s", self.name, ext, media_result.error)
except Exception as media_err:
print(f"[{self.name}] Error sending media: {media_err}")
logger.warning("[%s] Error sending media: %s", self.name, media_err)
# Send auto-detected local files as native attachments
for file_path in local_files:
if human_delay > 0:
await asyncio.sleep(human_delay)
try:
ext = Path(file_path).suffix.lower()
if ext in _IMAGE_EXTS:
await self.send_image_file(
chat_id=event.source.chat_id,
image_path=file_path,
metadata=_thread_metadata,
)
elif ext in _VIDEO_EXTS:
await self.send_video(
chat_id=event.source.chat_id,
video_path=file_path,
metadata=_thread_metadata,
)
else:
await self.send_document(
chat_id=event.source.chat_id,
file_path=file_path,
metadata=_thread_metadata,
)
except Exception as file_err:
logger.error("[%s] Error sending local file %s: %s", self.name, file_path, file_err)
# Determine overall success for the processing hook
processing_ok = delivery_succeeded if delivery_attempted else not bool(response)
await self._run_processing_hook("on_processing_complete", event, processing_ok)
# Check if there's a pending message that was queued during our processing
if session_key in self._pending_messages:
pending_event = self._pending_messages.pop(session_key)
print(f"[{self.name}] 📨 Processing queued message from interrupt")
logger.debug("[%s] Processing queued message from interrupt", self.name)
# Clean up current session before processing pending
if session_key in self._active_sessions:
del self._active_sessions[session_key]
@@ -954,10 +1373,28 @@ class BasePlatformAdapter(ABC):
await self._process_message_background(pending_event, session_key)
return # Already cleaned up
except asyncio.CancelledError:
await self._run_processing_hook("on_processing_complete", event, False)
raise
except Exception as e:
print(f"[{self.name}] Error handling message: {e}")
import traceback
traceback.print_exc()
await self._run_processing_hook("on_processing_complete", event, False)
logger.error("[%s] Error handling message: %s", self.name, e, exc_info=True)
# Send the error to the user so they aren't left with radio silence
try:
error_type = type(e).__name__
error_detail = str(e)[:300] if str(e) else "no details available"
_thread_metadata = {"thread_id": event.source.thread_id} if event.source.thread_id else None
await self.send(
chat_id=event.source.chat_id,
content=(
f"Sorry, I encountered an error ({error_type}).\n"
f"{error_detail}\n"
"Try again or use /reset to start a fresh session."
),
metadata=_thread_metadata,
)
except Exception:
pass # Last resort — don't let error reporting crash the handler
finally:
# Stop typing indicator
typing_task.cancel()
@@ -965,10 +1402,32 @@ class BasePlatformAdapter(ABC):
await typing_task
except asyncio.CancelledError:
pass
# Also cancel any platform-level persistent typing tasks (e.g. Discord)
# that may have been recreated by _keep_typing after the last stop_typing()
try:
if hasattr(self, "stop_typing"):
await self.stop_typing(event.source.chat_id)
except Exception:
pass
# Clean up session tracking
if session_key in self._active_sessions:
del self._active_sessions[session_key]
async def cancel_background_tasks(self) -> None:
"""Cancel any in-flight background message-processing tasks.
Used during gateway shutdown/replacement so active sessions from the old
process do not keep running after adapters are being torn down.
"""
tasks = [task for task in self._background_tasks if not task.done()]
for task in tasks:
task.cancel()
if tasks:
await asyncio.gather(*tasks, return_exceptions=True)
self._background_tasks.clear()
self._pending_messages.clear()
self._active_sessions.clear()
def has_pending_interrupt(self, session_key: str) -> bool:
"""Check if there's a pending interrupt for a session."""
return session_key in self._active_sessions and self._active_sessions[session_key].is_set()
@@ -1028,7 +1487,8 @@ class BasePlatformAdapter(ABC):
"""
return content
def truncate_message(self, content: str, max_length: int = 4096) -> List[str]:
@staticmethod
def truncate_message(content: str, max_length: int = 4096) -> List[str]:
"""
Split a long message into chunks, preserving code block boundaries.
@@ -1080,6 +1540,27 @@ class BasePlatformAdapter(ABC):
if split_at < 1:
split_at = headroom
# Avoid splitting inside an inline code span (`...`).
# If the text before split_at has an odd number of unescaped
# backticks, the split falls inside inline code — the resulting
# chunk would have an unpaired backtick and any special characters
# (like parentheses) inside the broken span would be unescaped,
# causing MarkdownV2 parse errors on Telegram.
candidate = remaining[:split_at]
backtick_count = candidate.count("`") - candidate.count("\\`")
if backtick_count % 2 == 1:
# Find the last unescaped backtick and split before it
last_bt = candidate.rfind("`")
while last_bt > 0 and candidate[last_bt - 1] == "\\":
last_bt = candidate.rfind("`", 0, last_bt)
if last_bt > 0:
# Try to find a space or newline just before the backtick
safe_split = candidate.rfind(" ", 0, last_bt)
nl_split = candidate.rfind("\n", 0, last_bt)
safe_split = max(safe_split, nl_split)
if safe_split > headroom // 4:
split_at = safe_split
chunk_body = remaining[:split_at]
remaining = remaining[split_at:].lstrip()

View File

@@ -0,0 +1,340 @@
"""
DingTalk platform adapter using Stream Mode.
Uses dingtalk-stream SDK for real-time message reception without webhooks.
Responses are sent via DingTalk's session webhook (markdown format).
Requires:
pip install dingtalk-stream httpx
DINGTALK_CLIENT_ID and DINGTALK_CLIENT_SECRET env vars
Configuration in config.yaml:
platforms:
dingtalk:
enabled: true
extra:
client_id: "your-app-key" # or DINGTALK_CLIENT_ID env var
client_secret: "your-secret" # or DINGTALK_CLIENT_SECRET env var
"""
import asyncio
import logging
import os
import time
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, Optional
try:
import dingtalk_stream
from dingtalk_stream import ChatbotHandler, ChatbotMessage
DINGTALK_STREAM_AVAILABLE = True
except ImportError:
DINGTALK_STREAM_AVAILABLE = False
dingtalk_stream = None # type: ignore[assignment]
try:
import httpx
HTTPX_AVAILABLE = True
except ImportError:
HTTPX_AVAILABLE = False
httpx = None # type: ignore[assignment]
from gateway.config import Platform, PlatformConfig
from gateway.platforms.base import (
BasePlatformAdapter,
MessageEvent,
MessageType,
SendResult,
)
logger = logging.getLogger(__name__)
MAX_MESSAGE_LENGTH = 20000
DEDUP_WINDOW_SECONDS = 300
DEDUP_MAX_SIZE = 1000
RECONNECT_BACKOFF = [2, 5, 10, 30, 60]
def check_dingtalk_requirements() -> bool:
"""Check if DingTalk dependencies are available and configured."""
if not DINGTALK_STREAM_AVAILABLE or not HTTPX_AVAILABLE:
return False
if not os.getenv("DINGTALK_CLIENT_ID") or not os.getenv("DINGTALK_CLIENT_SECRET"):
return False
return True
class DingTalkAdapter(BasePlatformAdapter):
"""DingTalk chatbot adapter using Stream Mode.
The dingtalk-stream SDK maintains a long-lived WebSocket connection.
Incoming messages arrive via a ChatbotHandler callback. Replies are
sent via the incoming message's session_webhook URL using httpx.
"""
MAX_MESSAGE_LENGTH = MAX_MESSAGE_LENGTH
def __init__(self, config: PlatformConfig):
super().__init__(config, Platform.DINGTALK)
extra = config.extra or {}
self._client_id: str = extra.get("client_id") or os.getenv("DINGTALK_CLIENT_ID", "")
self._client_secret: str = extra.get("client_secret") or os.getenv("DINGTALK_CLIENT_SECRET", "")
self._stream_client: Any = None
self._stream_task: Optional[asyncio.Task] = None
self._http_client: Optional["httpx.AsyncClient"] = None
# Message deduplication: msg_id -> timestamp
self._seen_messages: Dict[str, float] = {}
# Map chat_id -> session_webhook for reply routing
self._session_webhooks: Dict[str, str] = {}
# -- Connection lifecycle -----------------------------------------------
async def connect(self) -> bool:
"""Connect to DingTalk via Stream Mode."""
if not DINGTALK_STREAM_AVAILABLE:
logger.warning("[%s] dingtalk-stream not installed. Run: pip install dingtalk-stream", self.name)
return False
if not HTTPX_AVAILABLE:
logger.warning("[%s] httpx not installed. Run: pip install httpx", self.name)
return False
if not self._client_id or not self._client_secret:
logger.warning("[%s] DINGTALK_CLIENT_ID and DINGTALK_CLIENT_SECRET required", self.name)
return False
try:
self._http_client = httpx.AsyncClient(timeout=30.0)
credential = dingtalk_stream.Credential(self._client_id, self._client_secret)
self._stream_client = dingtalk_stream.DingTalkStreamClient(credential)
# Capture the current event loop for cross-thread dispatch
loop = asyncio.get_running_loop()
handler = _IncomingHandler(self, loop)
self._stream_client.register_callback_handler(
dingtalk_stream.ChatbotMessage.TOPIC, handler
)
self._stream_task = asyncio.create_task(self._run_stream())
self._mark_connected()
logger.info("[%s] Connected via Stream Mode", self.name)
return True
except Exception as e:
logger.error("[%s] Failed to connect: %s", self.name, e)
return False
async def _run_stream(self) -> None:
"""Run the blocking stream client with auto-reconnection."""
backoff_idx = 0
while self._running:
try:
logger.debug("[%s] Starting stream client...", self.name)
await asyncio.to_thread(self._stream_client.start)
except asyncio.CancelledError:
return
except Exception as e:
if not self._running:
return
logger.warning("[%s] Stream client error: %s", self.name, e)
if not self._running:
return
delay = RECONNECT_BACKOFF[min(backoff_idx, len(RECONNECT_BACKOFF) - 1)]
logger.info("[%s] Reconnecting in %ds...", self.name, delay)
await asyncio.sleep(delay)
backoff_idx += 1
async def disconnect(self) -> None:
"""Disconnect from DingTalk."""
self._running = False
self._mark_disconnected()
if self._stream_task:
self._stream_task.cancel()
try:
await self._stream_task
except asyncio.CancelledError:
pass
self._stream_task = None
if self._http_client:
await self._http_client.aclose()
self._http_client = None
self._stream_client = None
self._session_webhooks.clear()
self._seen_messages.clear()
logger.info("[%s] Disconnected", self.name)
# -- Inbound message processing -----------------------------------------
async def _on_message(self, message: "ChatbotMessage") -> None:
"""Process an incoming DingTalk chatbot message."""
msg_id = getattr(message, "message_id", None) or uuid.uuid4().hex
if self._is_duplicate(msg_id):
logger.debug("[%s] Duplicate message %s, skipping", self.name, msg_id)
return
text = self._extract_text(message)
if not text:
logger.debug("[%s] Empty message, skipping", self.name)
return
# Chat context
conversation_id = getattr(message, "conversation_id", "") or ""
conversation_type = getattr(message, "conversation_type", "1")
is_group = str(conversation_type) == "2"
sender_id = getattr(message, "sender_id", "") or ""
sender_nick = getattr(message, "sender_nick", "") or sender_id
sender_staff_id = getattr(message, "sender_staff_id", "") or ""
chat_id = conversation_id or sender_id
chat_type = "group" if is_group else "dm"
# Store session webhook for reply routing
session_webhook = getattr(message, "session_webhook", None) or ""
if session_webhook and chat_id:
self._session_webhooks[chat_id] = session_webhook
source = self.build_source(
chat_id=chat_id,
chat_name=getattr(message, "conversation_title", None),
chat_type=chat_type,
user_id=sender_id,
user_name=sender_nick,
user_id_alt=sender_staff_id if sender_staff_id else None,
)
# Parse timestamp
create_at = getattr(message, "create_at", None)
try:
timestamp = datetime.fromtimestamp(int(create_at) / 1000, tz=timezone.utc) if create_at else datetime.now(tz=timezone.utc)
except (ValueError, OSError, TypeError):
timestamp = datetime.now(tz=timezone.utc)
event = MessageEvent(
text=text,
message_type=MessageType.TEXT,
source=source,
message_id=msg_id,
raw_message=message,
timestamp=timestamp,
)
logger.debug("[%s] Message from %s in %s: %s",
self.name, sender_nick, chat_id[:20] if chat_id else "?", text[:50])
await self.handle_message(event)
@staticmethod
def _extract_text(message: "ChatbotMessage") -> str:
"""Extract plain text from a DingTalk chatbot message."""
text = getattr(message, "text", None) or ""
if isinstance(text, dict):
content = text.get("content", "").strip()
else:
content = str(text).strip()
# Fall back to rich text if present
if not content:
rich_text = getattr(message, "rich_text", None)
if rich_text and isinstance(rich_text, list):
parts = [item["text"] for item in rich_text
if isinstance(item, dict) and item.get("text")]
content = " ".join(parts).strip()
return content
# -- Deduplication ------------------------------------------------------
def _is_duplicate(self, msg_id: str) -> bool:
"""Check and record a message ID. Returns True if already seen."""
now = time.time()
if len(self._seen_messages) > DEDUP_MAX_SIZE:
cutoff = now - DEDUP_WINDOW_SECONDS
self._seen_messages = {k: v for k, v in self._seen_messages.items() if v > cutoff}
if msg_id in self._seen_messages:
return True
self._seen_messages[msg_id] = now
return False
# -- Outbound messaging -------------------------------------------------
async def send(
self,
chat_id: str,
content: str,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Send a markdown reply via DingTalk session webhook."""
metadata = metadata or {}
session_webhook = metadata.get("session_webhook") or self._session_webhooks.get(chat_id)
if not session_webhook:
return SendResult(success=False,
error="No session_webhook available. Reply must follow an incoming message.")
if not self._http_client:
return SendResult(success=False, error="HTTP client not initialized")
payload = {
"msgtype": "markdown",
"markdown": {"title": "Hermes", "text": content[:self.MAX_MESSAGE_LENGTH]},
}
try:
resp = await self._http_client.post(session_webhook, json=payload, timeout=15.0)
if resp.status_code < 300:
return SendResult(success=True, message_id=uuid.uuid4().hex[:12])
body = resp.text
logger.warning("[%s] Send failed HTTP %d: %s", self.name, resp.status_code, body[:200])
return SendResult(success=False, error=f"HTTP {resp.status_code}: {body[:200]}")
except httpx.TimeoutException:
return SendResult(success=False, error="Timeout sending message to DingTalk")
except Exception as e:
logger.error("[%s] Send error: %s", self.name, e)
return SendResult(success=False, error=str(e))
async def send_typing(self, chat_id: str, metadata=None) -> None:
"""DingTalk does not support typing indicators."""
pass
async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
"""Return basic info about a DingTalk conversation."""
return {"name": chat_id, "type": "group" if "group" in chat_id.lower() else "dm"}
# ---------------------------------------------------------------------------
# Internal stream handler
# ---------------------------------------------------------------------------
class _IncomingHandler(ChatbotHandler if DINGTALK_STREAM_AVAILABLE else object):
"""dingtalk-stream ChatbotHandler that forwards messages to the adapter."""
def __init__(self, adapter: DingTalkAdapter, loop: asyncio.AbstractEventLoop):
if DINGTALK_STREAM_AVAILABLE:
super().__init__()
self._adapter = adapter
self._loop = loop
def process(self, message: "ChatbotMessage"):
"""Called by dingtalk-stream in its thread when a message arrives.
Schedules the async handler on the main event loop.
"""
loop = self._loop
if loop is None or loop.is_closed():
logger.error("[DingTalk] Event loop unavailable, cannot dispatch message")
return dingtalk_stream.AckMessage.STATUS_OK, "OK"
future = asyncio.run_coroutine_threadsafe(self._adapter._on_message(message), loop)
try:
future.result(timeout=60)
except Exception:
logger.exception("[DingTalk] Error processing incoming message")
return dingtalk_stream.AckMessage.STATUS_OK, "OK"

File diff suppressed because it is too large Load Diff

View File

@@ -24,7 +24,6 @@ import re
import smtplib
import ssl
import uuid
from datetime import datetime
from email.header import decode_header
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
@@ -44,6 +43,20 @@ from gateway.platforms.base import (
from gateway.config import Platform, PlatformConfig
logger = logging.getLogger(__name__)
# Automated sender patterns — emails from these are silently ignored
_NOREPLY_PATTERNS = (
"noreply", "no-reply", "no_reply", "donotreply", "do-not-reply",
"mailer-daemon", "postmaster", "bounce", "notifications@",
"automated@", "auto-confirm", "auto-reply", "automailer",
)
# RFC headers that indicate bulk/automated mail
_AUTOMATED_HEADERS = {
"Auto-Submitted": lambda v: v.lower() != "no",
"Precedence": lambda v: v.lower() in ("bulk", "list", "junk"),
"X-Auto-Response-Suppress": lambda v: bool(v),
"List-Unsubscribe": lambda v: bool(v),
}
# Gmail-safe max length per email body
MAX_MESSAGE_LENGTH = 50_000
@@ -51,7 +64,17 @@ MAX_MESSAGE_LENGTH = 50_000
# Supported image extensions for inline detection
_IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}
def _is_automated_sender(address: str, headers: dict) -> bool:
"""Return True if this email is from an automated/noreply source."""
addr = address.lower()
if any(pattern in addr for pattern in _NOREPLY_PATTERNS):
return True
for header, check in _AUTOMATED_HEADERS.items():
value = headers.get(header, "")
if value and check(value):
return True
return False
def check_email_requirements() -> bool:
"""Check if email platform dependencies are available."""
addr = os.getenv("EMAIL_ADDRESS")
@@ -135,14 +158,23 @@ def _extract_email_address(raw: str) -> str:
return raw.strip().lower()
def _extract_attachments(msg: email_lib.message.Message) -> List[Dict[str, Any]]:
"""Extract attachment metadata and cache files locally."""
def _extract_attachments(
msg: email_lib.message.Message,
skip_attachments: bool = False,
) -> List[Dict[str, Any]]:
"""Extract attachment metadata and cache files locally.
When *skip_attachments* is True, all attachment/inline parts are ignored
(useful for malware protection or bandwidth savings).
"""
attachments = []
if not msg.is_multipart():
return attachments
for part in msg.walk():
disposition = str(part.get("Content-Disposition", ""))
if skip_attachments and ("attachment" in disposition or "inline" in disposition):
continue
if "attachment" not in disposition and "inline" not in disposition:
continue
# Skip text/plain and text/html body parts
@@ -196,8 +228,16 @@ class EmailAdapter(BasePlatformAdapter):
self._smtp_port = int(os.getenv("EMAIL_SMTP_PORT", "587"))
self._poll_interval = int(os.getenv("EMAIL_POLL_INTERVAL", "15"))
# Skip attachments — configured via config.yaml:
# platforms:
# email:
# skip_attachments: true
extra = config.extra or {}
self._skip_attachments = extra.get("skip_attachments", False)
# Track message IDs we've already processed to avoid duplicates
self._seen_uids: set = set()
self._seen_uids_max: int = 2000 # cap to prevent unbounded memory growth
self._poll_task: Optional[asyncio.Task] = None
# Map chat_id (sender email) -> last subject + message-id for threading
@@ -205,18 +245,40 @@ class EmailAdapter(BasePlatformAdapter):
logger.info("[Email] Adapter initialized for %s", self._address)
def _trim_seen_uids(self) -> None:
"""Keep only the most recent UIDs to prevent unbounded memory growth.
IMAP UIDs are monotonically increasing integers. When the set grows
beyond the cap, we keep only the highest half — old UIDs are safe to
drop because new messages always have higher UIDs and IMAP's UNSEEN
flag prevents re-delivery regardless.
"""
if len(self._seen_uids) <= self._seen_uids_max:
return
try:
# UIDs are bytes like b'1234' — sort numerically and keep top half
sorted_uids = sorted(self._seen_uids, key=lambda u: int(u))
keep = self._seen_uids_max // 2
self._seen_uids = set(sorted_uids[-keep:])
logger.debug("[Email] Trimmed seen UIDs to %d entries", len(self._seen_uids))
except (ValueError, TypeError):
# Fallback: just clear old entries if sort fails
self._seen_uids = set(list(self._seen_uids)[-self._seen_uids_max // 2:])
async def connect(self) -> bool:
"""Connect to the IMAP server and start polling for new messages."""
try:
# Test IMAP connection
imap = imaplib.IMAP4_SSL(self._imap_host, self._imap_port)
imap = imaplib.IMAP4_SSL(self._imap_host, self._imap_port, timeout=30)
imap.login(self._address, self._password)
# Mark all existing messages as seen so we only process new ones
imap.select("INBOX")
status, data = imap.uid("search", None, "ALL")
if status == "OK" and data[0]:
if status == "OK" and data and data[0]:
for uid in data[0].split():
self._seen_uids.add(uid)
# Keep only the most recent UIDs to prevent unbounded growth
self._trim_seen_uids()
imap.logout()
logger.info("[Email] IMAP connection test passed. %d existing messages skipped.", len(self._seen_uids))
except Exception as e:
@@ -225,7 +287,7 @@ class EmailAdapter(BasePlatformAdapter):
try:
# Test SMTP connection
smtp = smtplib.SMTP(self._smtp_host, self._smtp_port)
smtp = smtplib.SMTP(self._smtp_host, self._smtp_port, timeout=30)
smtp.starttls(context=ssl.create_default_context())
smtp.login(self._address, self._password)
smtp.quit()
@@ -274,53 +336,64 @@ class EmailAdapter(BasePlatformAdapter):
"""Fetch new (unseen) messages from IMAP. Runs in executor thread."""
results = []
try:
imap = imaplib.IMAP4_SSL(self._imap_host, self._imap_port)
imap.login(self._address, self._password)
imap.select("INBOX")
imap = imaplib.IMAP4_SSL(self._imap_host, self._imap_port, timeout=30)
try:
imap.login(self._address, self._password)
imap.select("INBOX")
status, data = imap.uid("search", None, "UNSEEN")
if status != "OK" or not data[0]:
imap.logout()
return results
status, data = imap.uid("search", None, "UNSEEN")
if status != "OK" or not data or not data[0]:
return results
for uid in data[0].split():
if uid in self._seen_uids:
continue
self._seen_uids.add(uid)
for uid in data[0].split():
if uid in self._seen_uids:
continue
self._seen_uids.add(uid)
# Trim periodically to prevent unbounded memory growth
if len(self._seen_uids) > self._seen_uids_max:
self._trim_seen_uids()
status, msg_data = imap.uid("fetch", uid, "(RFC822)")
if status != "OK":
continue
status, msg_data = imap.uid("fetch", uid, "(RFC822)")
if status != "OK":
continue
raw_email = msg_data[0][1]
msg = email_lib.message_from_bytes(raw_email)
raw_email = msg_data[0][1]
msg = email_lib.message_from_bytes(raw_email)
sender_raw = msg.get("From", "")
sender_addr = _extract_email_address(sender_raw)
sender_name = _decode_header_value(sender_raw)
# Remove email from name if present
if "<" in sender_name:
sender_name = sender_name.split("<")[0].strip().strip('"')
sender_raw = msg.get("From", "")
sender_addr = _extract_email_address(sender_raw)
sender_name = _decode_header_value(sender_raw)
# Remove email from name if present
if "<" in sender_name:
sender_name = sender_name.split("<")[0].strip().strip('"')
subject = _decode_header_value(msg.get("Subject", "(no subject)"))
message_id = msg.get("Message-ID", "")
in_reply_to = msg.get("In-Reply-To", "")
body = _extract_text_body(msg)
attachments = _extract_attachments(msg)
subject = _decode_header_value(msg.get("Subject", "(no subject)"))
message_id = msg.get("Message-ID", "")
in_reply_to = msg.get("In-Reply-To", "")
# Skip automated/noreply senders before any processing
msg_headers = dict(msg.items())
if _is_automated_sender(sender_addr, msg_headers):
logger.debug("[Email] Skipping automated sender: %s", sender_addr)
continue
body = _extract_text_body(msg)
attachments = _extract_attachments(msg, skip_attachments=self._skip_attachments)
results.append({
"uid": uid,
"sender_addr": sender_addr,
"sender_name": sender_name,
"subject": subject,
"message_id": message_id,
"in_reply_to": in_reply_to,
"body": body,
"attachments": attachments,
"date": msg.get("Date", ""),
})
imap.logout()
results.append({
"uid": uid,
"sender_addr": sender_addr,
"sender_name": sender_name,
"subject": subject,
"message_id": message_id,
"in_reply_to": in_reply_to,
"body": body,
"attachments": attachments,
"date": msg.get("Date", ""),
})
finally:
try:
imap.logout()
except Exception:
pass
except Exception as e:
logger.error("[Email] IMAP fetch error: %s", e)
return results
@@ -333,6 +406,11 @@ class EmailAdapter(BasePlatformAdapter):
if sender_addr == self._address.lower():
return
# Never reply to automated senders
if _is_automated_sender(sender_addr, {}):
logger.debug("[Email] Dropping automated sender at dispatch: %s", sender_addr)
return
subject = msg_data["subject"]
body = msg_data["body"].strip()
attachments = msg_data["attachments"]
@@ -427,18 +505,22 @@ class EmailAdapter(BasePlatformAdapter):
msg.attach(MIMEText(body, "plain", "utf-8"))
smtp = smtplib.SMTP(self._smtp_host, self._smtp_port)
smtp.starttls(context=ssl.create_default_context())
smtp.login(self._address, self._password)
smtp.send_message(msg)
smtp.quit()
smtp = smtplib.SMTP(self._smtp_host, self._smtp_port, timeout=30)
try:
smtp.starttls(context=ssl.create_default_context())
smtp.login(self._address, self._password)
smtp.send_message(msg)
finally:
try:
smtp.quit()
except Exception:
smtp.close()
logger.info("[Email] Sent reply to %s (subject: %s)", to_addr, subject)
return msg_id
async def send_typing(self, chat_id: str) -> None:
async def send_typing(self, chat_id: str, metadata: Optional[Dict[str, Any]] = None) -> None:
"""Email has no typing indicator — no-op."""
pass
async def send_image(
self,
@@ -515,11 +597,16 @@ class EmailAdapter(BasePlatformAdapter):
part.add_header("Content-Disposition", f"attachment; filename={fname}")
msg.attach(part)
smtp = smtplib.SMTP(self._smtp_host, self._smtp_port)
smtp.starttls(context=ssl.create_default_context())
smtp.login(self._address, self._password)
smtp.send_message(msg)
smtp.quit()
smtp = smtplib.SMTP(self._smtp_host, self._smtp_port, timeout=30)
try:
smtp.starttls(context=ssl.create_default_context())
smtp.login(self._address, self._password)
smtp.send_message(msg)
finally:
try:
smtp.quit()
except Exception:
smtp.close()
return msg_id

3255
gateway/platforms/feishu.py Normal file

File diff suppressed because it is too large Load Diff

View File

@@ -19,7 +19,7 @@ import os
import time
import uuid
from datetime import datetime
from typing import Any, Dict, List, Optional, Set
from typing import Any, Dict, Optional, Set
try:
import aiohttp
@@ -114,7 +114,9 @@ class HomeAssistantAdapter(BasePlatformAdapter):
return False
# Dedicated REST session for send() calls
self._rest_session = aiohttp.ClientSession()
self._rest_session = aiohttp.ClientSession(
timeout=aiohttp.ClientTimeout(total=30)
)
# Warn if no event filters are configured
if not self._watch_domains and not self._watch_entities and not self._watch_all:
@@ -140,8 +142,10 @@ class HomeAssistantAdapter(BasePlatformAdapter):
ws_url = self._hass_url.replace("http://", "ws://").replace("https://", "wss://")
ws_url = f"{ws_url}/api/websocket"
self._session = aiohttp.ClientSession()
self._ws = await self._session.ws_connect(ws_url, heartbeat=30)
self._session = aiohttp.ClientSession(
timeout=aiohttp.ClientTimeout(total=30)
)
self._ws = await self._session.ws_connect(ws_url, heartbeat=30, timeout=30)
# Step 1: Receive auth_required
msg = await self._ws.receive_json()
@@ -435,7 +439,6 @@ class HomeAssistantAdapter(BasePlatformAdapter):
async def send_typing(self, chat_id: str, metadata=None) -> None:
"""No typing indicator for Home Assistant."""
pass
async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
"""Return basic info about the HA event channel."""

1986
gateway/platforms/matrix.py Normal file

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,733 @@
"""Mattermost gateway adapter.
Connects to a self-hosted (or cloud) Mattermost instance via its REST API
(v4) and WebSocket for real-time events. No external Mattermost library
required — uses aiohttp which is already a Hermes dependency.
Environment variables:
MATTERMOST_URL Server URL (e.g. https://mm.example.com)
MATTERMOST_TOKEN Bot token or personal-access token
MATTERMOST_ALLOWED_USERS Comma-separated user IDs
MATTERMOST_HOME_CHANNEL Channel ID for cron/notification delivery
"""
from __future__ import annotations
import asyncio
import json
import logging
import os
import re
import time
from pathlib import Path
from typing import Any, Dict, List, Optional
from gateway.config import Platform, PlatformConfig
from gateway.platforms.base import (
BasePlatformAdapter,
MessageEvent,
MessageType,
SendResult,
)
logger = logging.getLogger(__name__)
# Mattermost post size limit (server default is 16383, but 4000 is the
# practical limit for readable messages — matching OpenClaw's choice).
MAX_POST_LENGTH = 4000
# Channel type codes returned by the Mattermost API.
_CHANNEL_TYPE_MAP = {
"D": "dm",
"G": "group",
"P": "group", # private channel → treat as group
"O": "channel",
}
# Reconnect parameters (exponential backoff).
_RECONNECT_BASE_DELAY = 2.0
_RECONNECT_MAX_DELAY = 60.0
_RECONNECT_JITTER = 0.2
def check_mattermost_requirements() -> bool:
"""Return True if the Mattermost adapter can be used."""
token = os.getenv("MATTERMOST_TOKEN", "")
url = os.getenv("MATTERMOST_URL", "")
if not token:
logger.debug("Mattermost: MATTERMOST_TOKEN not set")
return False
if not url:
logger.warning("Mattermost: MATTERMOST_URL not set")
return False
try:
import aiohttp # noqa: F401
return True
except ImportError:
logger.warning("Mattermost: aiohttp not installed")
return False
class MattermostAdapter(BasePlatformAdapter):
"""Gateway adapter for Mattermost (self-hosted or cloud)."""
def __init__(self, config: PlatformConfig):
super().__init__(config, Platform.MATTERMOST)
self._base_url: str = (
config.extra.get("url", "")
or os.getenv("MATTERMOST_URL", "")
).rstrip("/")
self._token: str = config.token or os.getenv("MATTERMOST_TOKEN", "")
self._bot_user_id: str = ""
self._bot_username: str = ""
# aiohttp session + websocket handle
self._session: Any = None # aiohttp.ClientSession
self._ws: Any = None # aiohttp.ClientWebSocketResponse
self._ws_task: Optional[asyncio.Task] = None
self._reconnect_task: Optional[asyncio.Task] = None
self._closing = False
# Reply mode: "thread" to nest replies, "off" for flat messages.
self._reply_mode: str = (
config.extra.get("reply_mode", "")
or os.getenv("MATTERMOST_REPLY_MODE", "off")
).lower()
# Dedup cache: post_id → timestamp (prevent reprocessing)
self._seen_posts: Dict[str, float] = {}
self._SEEN_MAX = 2000
self._SEEN_TTL = 300 # 5 minutes
# ------------------------------------------------------------------
# HTTP helpers
# ------------------------------------------------------------------
def _headers(self) -> Dict[str, str]:
return {
"Authorization": f"Bearer {self._token}",
"Content-Type": "application/json",
}
async def _api_get(self, path: str) -> Dict[str, Any]:
"""GET /api/v4/{path}."""
import aiohttp
url = f"{self._base_url}/api/v4/{path.lstrip('/')}"
try:
async with self._session.get(url, headers=self._headers(), timeout=aiohttp.ClientTimeout(total=30)) as resp:
if resp.status >= 400:
body = await resp.text()
logger.error("MM API GET %s%s: %s", path, resp.status, body[:200])
return {}
return await resp.json()
except aiohttp.ClientError as exc:
logger.error("MM API GET %s network error: %s", path, exc)
return {}
async def _api_post(
self, path: str, payload: Dict[str, Any]
) -> Dict[str, Any]:
"""POST /api/v4/{path} with JSON body."""
import aiohttp
url = f"{self._base_url}/api/v4/{path.lstrip('/')}"
try:
async with self._session.post(
url, headers=self._headers(), json=payload,
timeout=aiohttp.ClientTimeout(total=30)
) as resp:
if resp.status >= 400:
body = await resp.text()
logger.error("MM API POST %s%s: %s", path, resp.status, body[:200])
return {}
return await resp.json()
except aiohttp.ClientError as exc:
logger.error("MM API POST %s network error: %s", path, exc)
return {}
async def _api_put(
self, path: str, payload: Dict[str, Any]
) -> Dict[str, Any]:
"""PUT /api/v4/{path} with JSON body."""
import aiohttp
url = f"{self._base_url}/api/v4/{path.lstrip('/')}"
try:
async with self._session.put(
url, headers=self._headers(), json=payload
) as resp:
if resp.status >= 400:
body = await resp.text()
logger.error("MM API PUT %s%s: %s", path, resp.status, body[:200])
return {}
return await resp.json()
except aiohttp.ClientError as exc:
logger.error("MM API PUT %s network error: %s", path, exc)
return {}
async def _upload_file(
self, channel_id: str, file_data: bytes, filename: str, content_type: str = "application/octet-stream"
) -> Optional[str]:
"""Upload a file and return its file ID, or None on failure."""
import aiohttp
url = f"{self._base_url}/api/v4/files"
form = aiohttp.FormData()
form.add_field("channel_id", channel_id)
form.add_field(
"files",
file_data,
filename=filename,
content_type=content_type,
)
headers = {"Authorization": f"Bearer {self._token}"}
async with self._session.post(url, headers=headers, data=form, timeout=aiohttp.ClientTimeout(total=60)) as resp:
if resp.status >= 400:
body = await resp.text()
logger.error("MM file upload → %s: %s", resp.status, body[:200])
return None
data = await resp.json()
infos = data.get("file_infos", [])
return infos[0]["id"] if infos else None
# ------------------------------------------------------------------
# Required overrides
# ------------------------------------------------------------------
async def connect(self) -> bool:
"""Connect to Mattermost and start the WebSocket listener."""
import aiohttp
if not self._base_url or not self._token:
logger.error("Mattermost: URL or token not configured")
return False
self._session = aiohttp.ClientSession(
timeout=aiohttp.ClientTimeout(total=30)
)
self._closing = False
# Verify credentials and fetch bot identity.
me = await self._api_get("users/me")
if not me or "id" not in me:
logger.error("Mattermost: failed to authenticate — check MATTERMOST_TOKEN and MATTERMOST_URL")
await self._session.close()
return False
self._bot_user_id = me["id"]
self._bot_username = me.get("username", "")
logger.info(
"Mattermost: authenticated as @%s (%s) on %s",
self._bot_username,
self._bot_user_id,
self._base_url,
)
# Start WebSocket in background.
self._ws_task = asyncio.create_task(self._ws_loop())
self._mark_connected()
return True
async def disconnect(self) -> None:
"""Disconnect from Mattermost."""
self._closing = True
if self._ws_task and not self._ws_task.done():
self._ws_task.cancel()
try:
await self._ws_task
except (asyncio.CancelledError, Exception):
pass
if self._reconnect_task and not self._reconnect_task.done():
self._reconnect_task.cancel()
if self._ws:
await self._ws.close()
self._ws = None
if self._session and not self._session.closed:
await self._session.close()
logger.info("Mattermost: disconnected")
async def send(
self,
chat_id: str,
content: str,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Send a message (or multiple chunks) to a channel."""
if not content:
return SendResult(success=True)
formatted = self.format_message(content)
chunks = self.truncate_message(formatted, MAX_POST_LENGTH)
last_id = None
for chunk in chunks:
payload: Dict[str, Any] = {
"channel_id": chat_id,
"message": chunk,
}
# Thread support: reply_to is the root post ID.
if reply_to and self._reply_mode == "thread":
payload["root_id"] = reply_to
data = await self._api_post("posts", payload)
if not data or "id" not in data:
return SendResult(success=False, error="Failed to create post")
last_id = data["id"]
return SendResult(success=True, message_id=last_id)
async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
"""Return channel name and type."""
data = await self._api_get(f"channels/{chat_id}")
if not data:
return {"name": chat_id, "type": "channel"}
ch_type = _CHANNEL_TYPE_MAP.get(data.get("type", "O"), "channel")
display_name = data.get("display_name") or data.get("name") or chat_id
return {"name": display_name, "type": ch_type}
# ------------------------------------------------------------------
# Optional overrides
# ------------------------------------------------------------------
async def send_typing(
self, chat_id: str, metadata: Optional[Dict[str, Any]] = None
) -> None:
"""Send a typing indicator."""
await self._api_post(
f"users/{self._bot_user_id}/typing",
{"channel_id": chat_id},
)
async def edit_message(
self, chat_id: str, message_id: str, content: str
) -> SendResult:
"""Edit an existing post."""
formatted = self.format_message(content)
data = await self._api_put(
f"posts/{message_id}/patch",
{"message": formatted},
)
if not data or "id" not in data:
return SendResult(success=False, error="Failed to edit post")
return SendResult(success=True, message_id=data["id"])
async def send_image(
self,
chat_id: str,
image_url: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Download an image and upload it as a file attachment."""
return await self._send_url_as_file(
chat_id, image_url, caption, reply_to, "image"
)
async def send_image_file(
self,
chat_id: str,
image_path: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Upload a local image file."""
return await self._send_local_file(
chat_id, image_path, caption, reply_to
)
async def send_document(
self,
chat_id: str,
file_path: str,
caption: Optional[str] = None,
file_name: Optional[str] = None,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Upload a local file as a document."""
return await self._send_local_file(
chat_id, file_path, caption, reply_to, file_name
)
async def send_voice(
self,
chat_id: str,
audio_path: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Upload an audio file."""
return await self._send_local_file(
chat_id, audio_path, caption, reply_to
)
async def send_video(
self,
chat_id: str,
video_path: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Upload a video file."""
return await self._send_local_file(
chat_id, video_path, caption, reply_to
)
def format_message(self, content: str) -> str:
"""Mattermost uses standard Markdown — mostly pass through.
Strip image markdown into plain links (files are uploaded separately).
"""
# Convert ![alt](url) to just the URL — Mattermost renders
# image URLs as inline previews automatically.
content = re.sub(r"!\[([^\]]*)\]\(([^)]+)\)", r"\2", content)
return content
# ------------------------------------------------------------------
# File helpers
# ------------------------------------------------------------------
async def _send_url_as_file(
self,
chat_id: str,
url: str,
caption: Optional[str],
reply_to: Optional[str],
kind: str = "file",
) -> SendResult:
"""Download a URL and upload it as a file attachment."""
import asyncio
import aiohttp
last_exc = None
file_data = None
ct = "application/octet-stream"
fname = url.rsplit("/", 1)[-1].split("?")[0] or f"{kind}.png"
for attempt in range(3):
try:
async with self._session.get(url, timeout=aiohttp.ClientTimeout(total=30)) as resp:
if resp.status >= 500 or resp.status == 429:
if attempt < 2:
logger.debug("Mattermost download retry %d/2 for %s (status %d)",
attempt + 1, url[:80], resp.status)
await asyncio.sleep(1.5 * (attempt + 1))
continue
if resp.status >= 400:
return await self.send(chat_id, f"{caption or ''}\n{url}".strip(), reply_to)
file_data = await resp.read()
ct = resp.content_type or "application/octet-stream"
break
except (aiohttp.ClientError, asyncio.TimeoutError) as exc:
last_exc = exc
if attempt < 2:
await asyncio.sleep(1.5 * (attempt + 1))
continue
logger.warning("Mattermost: failed to download %s after %d attempts: %s", url, attempt + 1, exc)
return await self.send(chat_id, f"{caption or ''}\n{url}".strip(), reply_to)
if file_data is None:
logger.warning("Mattermost: download returned no data for %s", url)
return await self.send(chat_id, f"{caption or ''}\n{url}".strip(), reply_to)
file_id = await self._upload_file(chat_id, file_data, fname, ct)
if not file_id:
return await self.send(chat_id, f"{caption or ''}\n{url}".strip(), reply_to)
payload: Dict[str, Any] = {
"channel_id": chat_id,
"message": caption or "",
"file_ids": [file_id],
}
if reply_to and self._reply_mode == "thread":
payload["root_id"] = reply_to
data = await self._api_post("posts", payload)
if not data or "id" not in data:
return SendResult(success=False, error="Failed to post with file")
return SendResult(success=True, message_id=data["id"])
async def _send_local_file(
self,
chat_id: str,
file_path: str,
caption: Optional[str],
reply_to: Optional[str],
file_name: Optional[str] = None,
) -> SendResult:
"""Upload a local file and attach it to a post."""
import mimetypes
p = Path(file_path)
if not p.exists():
return await self.send(
chat_id, f"{caption or ''}\n(file not found: {file_path})", reply_to
)
fname = file_name or p.name
ct = mimetypes.guess_type(fname)[0] or "application/octet-stream"
file_data = p.read_bytes()
file_id = await self._upload_file(chat_id, file_data, fname, ct)
if not file_id:
return SendResult(success=False, error="File upload failed")
payload: Dict[str, Any] = {
"channel_id": chat_id,
"message": caption or "",
"file_ids": [file_id],
}
if reply_to and self._reply_mode == "thread":
payload["root_id"] = reply_to
data = await self._api_post("posts", payload)
if not data or "id" not in data:
return SendResult(success=False, error="Failed to post with file")
return SendResult(success=True, message_id=data["id"])
# ------------------------------------------------------------------
# WebSocket
# ------------------------------------------------------------------
async def _ws_loop(self) -> None:
"""Connect to the WebSocket and listen for events, reconnecting on failure."""
delay = _RECONNECT_BASE_DELAY
while not self._closing:
try:
await self._ws_connect_and_listen()
# Clean disconnect — reset delay.
delay = _RECONNECT_BASE_DELAY
except asyncio.CancelledError:
return
except Exception as exc:
if self._closing:
return
# Detect permanent auth/permission failures that will never
# succeed on retry — stop reconnecting instead of looping forever.
import aiohttp
err_str = str(exc).lower()
if isinstance(exc, aiohttp.WSServerHandshakeError) and exc.status in (401, 403):
logger.error("Mattermost WS auth failed (HTTP %d) — stopping reconnect", exc.status)
return
if "401" in err_str or "403" in err_str or "unauthorized" in err_str:
logger.error("Mattermost WS permanent error: %s — stopping reconnect", exc)
return
logger.warning("Mattermost WS error: %s — reconnecting in %.0fs", exc, delay)
if self._closing:
return
# Exponential backoff with jitter.
import random
jitter = delay * _RECONNECT_JITTER * random.random()
await asyncio.sleep(delay + jitter)
delay = min(delay * 2, _RECONNECT_MAX_DELAY)
async def _ws_connect_and_listen(self) -> None:
"""Single WebSocket session: connect, authenticate, process events."""
# Build WS URL: https:// → wss://, http:// → ws://
ws_url = re.sub(r"^http", "ws", self._base_url) + "/api/v4/websocket"
logger.info("Mattermost: connecting to %s", ws_url)
self._ws = await self._session.ws_connect(ws_url, heartbeat=30.0)
# Authenticate via the WebSocket.
auth_msg = {
"seq": 1,
"action": "authentication_challenge",
"data": {"token": self._token},
}
await self._ws.send_json(auth_msg)
logger.info("Mattermost: WebSocket connected and authenticated")
async for raw_msg in self._ws:
if self._closing:
return
if raw_msg.type in (
raw_msg.type.TEXT,
raw_msg.type.BINARY,
):
try:
event = json.loads(raw_msg.data)
except (json.JSONDecodeError, TypeError):
continue
await self._handle_ws_event(event)
elif raw_msg.type in (
raw_msg.type.ERROR,
raw_msg.type.CLOSE,
raw_msg.type.CLOSING,
raw_msg.type.CLOSED,
):
logger.info("Mattermost: WebSocket closed (%s)", raw_msg.type)
break
async def _handle_ws_event(self, event: Dict[str, Any]) -> None:
"""Process a single WebSocket event."""
event_type = event.get("event")
if event_type != "posted":
return
data = event.get("data", {})
raw_post_str = data.get("post")
if not raw_post_str:
return
try:
post = json.loads(raw_post_str)
except (json.JSONDecodeError, TypeError):
return
# Ignore own messages.
if post.get("user_id") == self._bot_user_id:
return
# Ignore system posts.
if post.get("type"):
return
post_id = post.get("id", "")
# Dedup.
self._prune_seen()
if post_id in self._seen_posts:
return
self._seen_posts[post_id] = time.time()
# Build message event.
channel_id = post.get("channel_id", "")
channel_type_raw = data.get("channel_type", "O")
chat_type = _CHANNEL_TYPE_MAP.get(channel_type_raw, "channel")
# For DMs, user_id is sufficient. For channels, check for @mention.
message_text = post.get("message", "")
# Mention-gating for non-DM channels.
# Config (env vars):
# MATTERMOST_REQUIRE_MENTION: Require @mention in channels (default: true)
# MATTERMOST_FREE_RESPONSE_CHANNELS: Channel IDs where bot responds without mention
if channel_type_raw != "D":
require_mention = os.getenv(
"MATTERMOST_REQUIRE_MENTION", "true"
).lower() not in ("false", "0", "no")
free_channels_raw = os.getenv("MATTERMOST_FREE_RESPONSE_CHANNELS", "")
free_channels = {ch.strip() for ch in free_channels_raw.split(",") if ch.strip()}
is_free_channel = channel_id in free_channels
mention_patterns = [
f"@{self._bot_username}",
f"@{self._bot_user_id}",
]
has_mention = any(
pattern.lower() in message_text.lower()
for pattern in mention_patterns
)
if require_mention and not is_free_channel and not has_mention:
logger.debug(
"Mattermost: skipping non-DM message without @mention (channel=%s)",
channel_id,
)
return
# Strip @mention from the message text so the agent sees clean input.
if has_mention:
for pattern in mention_patterns:
message_text = re.sub(
re.escape(pattern), "", message_text, flags=re.IGNORECASE
).strip()
# Resolve sender info.
sender_id = post.get("user_id", "")
sender_name = data.get("sender_name", "").lstrip("@") or sender_id
# Thread support: if the post is in a thread, use root_id.
thread_id = post.get("root_id") or None
# Determine message type.
file_ids = post.get("file_ids") or []
msg_type = MessageType.TEXT
if message_text.startswith("/"):
msg_type = MessageType.COMMAND
# Download file attachments immediately (URLs require auth headers
# that downstream tools won't have).
media_urls: List[str] = []
media_types: List[str] = []
for fid in file_ids:
try:
file_info = await self._api_get(f"files/{fid}/info")
fname = file_info.get("name", f"file_{fid}")
ext = Path(fname).suffix or ""
mime = file_info.get("mime_type", "application/octet-stream")
import aiohttp
dl_url = f"{self._base_url}/api/v4/files/{fid}"
async with self._session.get(
dl_url,
headers={"Authorization": f"Bearer {self._token}"},
timeout=aiohttp.ClientTimeout(total=30),
) as resp:
if resp.status < 400:
file_data = await resp.read()
from gateway.platforms.base import cache_image_from_bytes, cache_document_from_bytes
if mime.startswith("image/"):
local_path = cache_image_from_bytes(file_data, ext or ".png")
media_urls.append(local_path)
media_types.append(mime)
elif mime.startswith("audio/"):
from gateway.platforms.base import cache_audio_from_bytes
local_path = cache_audio_from_bytes(file_data, ext or ".ogg")
media_urls.append(local_path)
media_types.append(mime)
else:
local_path = cache_document_from_bytes(file_data, fname)
media_urls.append(local_path)
media_types.append(mime)
else:
logger.warning("Mattermost: failed to download file %s: HTTP %s", fid, resp.status)
except Exception as exc:
logger.warning("Mattermost: error downloading file %s: %s", fid, exc)
source = self.build_source(
chat_id=channel_id,
chat_type=chat_type,
user_id=sender_id,
user_name=sender_name,
thread_id=thread_id,
)
msg_event = MessageEvent(
text=message_text,
message_type=msg_type,
source=source,
raw_message=post,
message_id=post_id,
media_urls=media_urls if media_urls else None,
media_types=media_types if media_types else None,
)
await self.handle_message(msg_event)
def _prune_seen(self) -> None:
"""Remove expired entries from the dedup cache."""
if len(self._seen_posts) < self._SEEN_MAX:
return
now = time.time()
self._seen_posts = {
pid: ts
for pid, ts in self._seen_posts.items()
if now - ts < self._SEEN_TTL
}

View File

@@ -22,7 +22,7 @@ import time
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List, Optional, Any
from urllib.parse import unquote
from urllib.parse import quote, unquote
import httpx
@@ -179,6 +179,13 @@ class SignalAdapter(BasePlatformAdapter):
# Normalize account for self-message filtering
self._account_normalized = self.account.strip()
# Track recently sent message timestamps to prevent echo-back loops
# in Note to Self / self-chat mode (mirrors WhatsApp recentlySentIds)
self._recent_sent_timestamps: set = set()
self._max_recent_timestamps = 50
self._phone_lock_identity: Optional[str] = None
logger.info("Signal adapter initialized: url=%s account=%s groups=%s",
self.http_url, _redact_phone(self.account),
"enabled" if self.group_allow_from else "disabled")
@@ -193,6 +200,29 @@ class SignalAdapter(BasePlatformAdapter):
logger.error("Signal: SIGNAL_HTTP_URL and SIGNAL_ACCOUNT are required")
return False
# Acquire scoped lock to prevent duplicate Signal listeners for the same phone
try:
from gateway.status import acquire_scoped_lock
self._phone_lock_identity = self.account
acquired, existing = acquire_scoped_lock(
"signal-phone",
self._phone_lock_identity,
metadata={"platform": self.platform.value},
)
if not acquired:
owner_pid = existing.get("pid") if isinstance(existing, dict) else None
message = (
"Another local Hermes gateway is already using this Signal account"
+ (f" (PID {owner_pid})." if owner_pid else ".")
+ " Stop the other gateway before starting a second Signal listener."
)
logger.error("Signal: %s", message)
self._set_fatal_error("signal_phone_lock", message, retryable=False)
return False
except Exception as e:
logger.warning("Signal: Could not acquire phone lock (non-fatal): %s", e)
self.client = httpx.AsyncClient(timeout=30.0)
# Health check — verify signal-cli daemon is reachable
@@ -240,6 +270,14 @@ class SignalAdapter(BasePlatformAdapter):
await self.client.aclose()
self.client = None
if self._phone_lock_identity:
try:
from gateway.status import release_scoped_lock
release_scoped_lock("signal-phone", self._phone_lock_identity)
except Exception as e:
logger.warning("Signal: Error releasing phone lock: %s", e, exc_info=True)
self._phone_lock_identity = None
logger.info("Signal: disconnected")
# ------------------------------------------------------------------
@@ -248,7 +286,7 @@ class SignalAdapter(BasePlatformAdapter):
async def _sse_listener(self) -> None:
"""Listen for SSE events from signal-cli daemon."""
url = f"{self.http_url}/api/v1/events?account={self.account}"
url = f"{self.http_url}/api/v1/events?account={quote(self.account, safe='')}"
backoff = SSE_RETRY_DELAY_INITIAL
while self._running:
@@ -274,6 +312,12 @@ class SignalAdapter(BasePlatformAdapter):
line = line.strip()
if not line:
continue
# SSE keepalive comments (":") prove the connection
# is alive — update activity so the health monitor
# doesn't report false idle warnings.
if line.startswith(":"):
self._last_sse_activity = time.time()
continue
# Parse SSE data lines
if line.startswith("data:"):
data_str = line[5:].strip()
@@ -339,7 +383,9 @@ class SignalAdapter(BasePlatformAdapter):
"""Force SSE reconnection by closing the current response."""
if self._sse_response and not self._sse_response.is_stream_consumed:
try:
asyncio.create_task(self._sse_response.aclose())
task = asyncio.create_task(self._sse_response.aclose())
self._background_tasks.add(task)
task.add_done_callback(self._background_tasks.discard)
except Exception:
pass
self._sse_response = None
@@ -353,10 +399,26 @@ class SignalAdapter(BasePlatformAdapter):
# Unwrap nested envelope if present
envelope_data = envelope.get("envelope", envelope)
# Filter syncMessage envelopes (sent transcripts, read receipts, etc.)
# signal-cli may set syncMessage to null vs omitting it, so check key existence
# Handle syncMessage: extract "Note to Self" messages (sent to own account)
# while still filtering other sync events (read receipts, typing, etc.)
is_note_to_self = False
if "syncMessage" in envelope_data:
return
sync_msg = envelope_data.get("syncMessage")
if sync_msg and isinstance(sync_msg, dict):
sent_msg = sync_msg.get("sentMessage")
if sent_msg and isinstance(sent_msg, dict):
dest = sent_msg.get("destinationNumber") or sent_msg.get("destination")
sent_ts = sent_msg.get("timestamp")
if dest == self._account_normalized:
# Check if this is an echo of our own outbound reply
if sent_ts and sent_ts in self._recent_sent_timestamps:
self._recent_sent_timestamps.discard(sent_ts)
return
# Genuine user Note to Self — promote to dataMessage
is_note_to_self = True
envelope_data = {**envelope_data, "dataMessage": sent_msg}
if not is_note_to_self:
return
# Extract sender info
sender = (
@@ -371,8 +433,8 @@ class SignalAdapter(BasePlatformAdapter):
logger.debug("Signal: ignoring envelope with no sender")
return
# Self-message filtering — prevent reply loops
if self._account_normalized and sender == self._account_normalized:
# Self-message filtering — prevent reply loops (but allow Note to Self)
if self._account_normalized and sender == self._account_normalized and not is_note_to_self:
return
# Filter stories
@@ -457,7 +519,7 @@ class SignalAdapter(BasePlatformAdapter):
if any(mt.startswith("audio/") for mt in media_types):
msg_type = MessageType.VOICE
elif any(mt.startswith("image/") for mt in media_types):
msg_type = MessageType.IMAGE
msg_type = MessageType.PHOTO
# Parse timestamp from envelope data (milliseconds since epoch)
ts_ms = envelope_data.get("timestamp", 0)
@@ -492,12 +554,19 @@ class SignalAdapter(BasePlatformAdapter):
"""Fetch an attachment via JSON-RPC and cache it. Returns (path, ext)."""
result = await self._rpc("getAttachment", {
"account": self.account,
"attachmentId": attachment_id,
"id": attachment_id,
})
if not result:
return None, ""
# Handle dict response (signal-cli returns {"data": "base64..."})
if isinstance(result, dict):
result = result.get("data")
if not result:
logger.warning("Signal: attachment response missing 'data' key")
return None, ""
# Result is base64-encoded file content
raw_data = base64.b64decode(result)
ext = _guess_extension(raw_data)
@@ -577,9 +646,18 @@ class SignalAdapter(BasePlatformAdapter):
result = await self._rpc("send", params)
if result is not None:
self._track_sent_timestamp(result)
return SendResult(success=True)
return SendResult(success=False, error="RPC send failed")
def _track_sent_timestamp(self, rpc_result) -> None:
"""Record outbound message timestamp for echo-back filtering."""
ts = rpc_result.get("timestamp") if isinstance(rpc_result, dict) else None
if ts:
self._recent_sent_timestamps.add(ts)
if len(self._recent_sent_timestamps) > self._max_recent_timestamps:
self._recent_sent_timestamps.pop()
async def send_typing(self, chat_id: str, metadata=None) -> None:
"""Send a typing indicator."""
params: Dict[str, Any] = {
@@ -635,6 +713,7 @@ class SignalAdapter(BasePlatformAdapter):
result = await self._rpc("send", params)
if result is not None:
self._track_sent_timestamp(result)
return SendResult(success=True)
return SendResult(success=False, error="RPC send with attachment failed")
@@ -665,6 +744,7 @@ class SignalAdapter(BasePlatformAdapter):
result = await self._rpc("send", params)
if result is not None:
self._track_sent_timestamp(result)
return SendResult(success=True)
return SendResult(success=False, error="RPC send document failed")

View File

@@ -9,10 +9,12 @@ Uses slack-bolt (Python) with Socket Mode for:
"""
import asyncio
import json
import logging
import os
import re
from typing import Dict, List, Optional, Any
import time
from typing import Dict, Optional, Any
try:
from slack_bolt.async_app import AsyncApp
@@ -37,8 +39,6 @@ from gateway.platforms.base import (
SendResult,
SUPPORTED_DOCUMENT_TYPES,
cache_document_from_bytes,
cache_image_from_url,
cache_audio_from_url,
)
@@ -74,6 +74,16 @@ class SlackAdapter(BasePlatformAdapter):
self._handler: Optional[AsyncSocketModeHandler] = None
self._bot_user_id: Optional[str] = None
self._user_name_cache: Dict[str, str] = {} # user_id → display name
self._socket_mode_task: Optional[asyncio.Task] = None
# Multi-workspace support
self._team_clients: Dict[str, AsyncWebClient] = {} # team_id → WebClient
self._team_bot_user_ids: Dict[str, str] = {} # team_id → bot_user_id
self._channel_team: Dict[str, str] = {} # channel_id → team_id
# Dedup cache: event_ts → timestamp. Prevents duplicate bot
# responses when Socket Mode reconnects redeliver events.
self._seen_messages: Dict[str, float] = {}
self._SEEN_TTL = 300 # 5 minutes
self._SEEN_MAX = 2000 # prune threshold
async def connect(self) -> bool:
"""Connect to Slack via Socket Mode."""
@@ -83,23 +93,70 @@ class SlackAdapter(BasePlatformAdapter):
)
return False
bot_token = self.config.token
raw_token = self.config.token
app_token = os.getenv("SLACK_APP_TOKEN")
if not bot_token:
if not raw_token:
logger.error("[Slack] SLACK_BOT_TOKEN not set")
return False
if not app_token:
logger.error("[Slack] SLACK_APP_TOKEN not set")
return False
try:
self._app = AsyncApp(token=bot_token)
# Support comma-separated bot tokens for multi-workspace
bot_tokens = [t.strip() for t in raw_token.split(",") if t.strip()]
# Get our own bot user ID for mention detection
auth_response = await self._app.client.auth_test()
self._bot_user_id = auth_response.get("user_id")
bot_name = auth_response.get("user", "unknown")
# Also load tokens from OAuth token file
from hermes_constants import get_hermes_home
tokens_file = get_hermes_home() / "slack_tokens.json"
if tokens_file.exists():
try:
saved = json.loads(tokens_file.read_text(encoding="utf-8"))
for team_id, entry in saved.items():
tok = entry.get("token", "") if isinstance(entry, dict) else ""
if tok and tok not in bot_tokens:
bot_tokens.append(tok)
team_label = entry.get("team_name", team_id) if isinstance(entry, dict) else team_id
logger.info("[Slack] Loaded saved token for workspace %s", team_label)
except Exception as e:
logger.warning("[Slack] Failed to read %s: %s", tokens_file, e)
try:
# Acquire scoped lock to prevent duplicate app token usage
from gateway.status import acquire_scoped_lock
self._token_lock_identity = app_token
acquired, existing = acquire_scoped_lock('slack-app-token', app_token, metadata={'platform': 'slack'})
if not acquired:
owner_pid = existing.get('pid') if isinstance(existing, dict) else None
message = f'Slack app token already in use' + (f' (PID {owner_pid})' if owner_pid else '') + '. Stop the other gateway first.'
logger.error('[%s] %s', self.name, message)
self._set_fatal_error('slack_token_lock', message, retryable=False)
return False
# First token is the primary — used for AsyncApp / Socket Mode
primary_token = bot_tokens[0]
self._app = AsyncApp(token=primary_token)
# Register each bot token and map team_id → client
for token in bot_tokens:
client = AsyncWebClient(token=token)
auth_response = await client.auth_test()
team_id = auth_response.get("team_id", "")
bot_user_id = auth_response.get("user_id", "")
bot_name = auth_response.get("user", "unknown")
team_name = auth_response.get("team", "unknown")
self._team_clients[team_id] = client
self._team_bot_user_ids[team_id] = bot_user_id
# First token sets the primary bot_user_id (backward compat)
if self._bot_user_id is None:
self._bot_user_id = bot_user_id
logger.info(
"[Slack] Authenticated as @%s in workspace %s (team: %s)",
bot_name, team_name, team_id,
)
# Register message event handler
@self._app.event("message")
@@ -121,10 +178,13 @@ class SlackAdapter(BasePlatformAdapter):
# Start Socket Mode handler in background
self._handler = AsyncSocketModeHandler(self._app, app_token)
asyncio.create_task(self._handler.start_async())
self._socket_mode_task = asyncio.create_task(self._handler.start_async())
self._running = True
logger.info("[Slack] Connected as @%s (Socket Mode)", bot_name)
logger.info(
"[Slack] Socket Mode connected (%d workspace(s))",
len(self._team_clients),
)
return True
except Exception as e: # pragma: no cover - defensive logging
@@ -139,8 +199,25 @@ class SlackAdapter(BasePlatformAdapter):
except Exception as e: # pragma: no cover - defensive logging
logger.warning("[Slack] Error while closing Socket Mode handler: %s", e, exc_info=True)
self._running = False
# Release the token lock (use stored identity, not re-read env)
try:
from gateway.status import release_scoped_lock
if getattr(self, '_token_lock_identity', None):
release_scoped_lock('slack-app-token', self._token_lock_identity)
self._token_lock_identity = None
except Exception:
pass
logger.info("[Slack] Disconnected")
def _get_client(self, chat_id: str) -> AsyncWebClient:
"""Return the workspace-specific WebClient for a channel."""
team_id = self._channel_team.get(chat_id)
if team_id and team_id in self._team_clients:
return self._team_clients[team_id]
return self._app.client # fallback to primary
async def send(
self,
chat_id: str,
@@ -177,7 +254,7 @@ class SlackAdapter(BasePlatformAdapter):
if broadcast and i == 0:
kwargs["reply_broadcast"] = True
last_result = await self._app.client.chat_postMessage(**kwargs)
last_result = await self._get_client(chat_id).chat_postMessage(**kwargs)
return SendResult(
success=True,
@@ -199,7 +276,7 @@ class SlackAdapter(BasePlatformAdapter):
if not self._app:
return SendResult(success=False, error="Not connected")
try:
await self._app.client.chat_update(
await self._get_client(chat_id).chat_update(
channel=chat_id,
ts=message_id,
text=content,
@@ -233,7 +310,7 @@ class SlackAdapter(BasePlatformAdapter):
return # Can only set status in a thread context
try:
await self._app.client.assistant_threads_setStatus(
await self._get_client(chat_id).assistant_threads_setStatus(
channel_id=chat_id,
thread_ts=thread_ts,
status="is thinking...",
@@ -252,7 +329,18 @@ class SlackAdapter(BasePlatformAdapter):
Prefers metadata thread_id (the thread parent's ts, set by the
gateway) over reply_to (which may be a child message's ts).
When ``reply_in_thread`` is ``false`` in the platform extra config,
top-level channel messages receive direct channel replies instead of
thread replies. Messages that originate inside an existing thread are
always replied to in-thread to preserve conversation context.
"""
# When reply_in_thread is disabled (default: True for backward compat),
# only thread messages that are already part of an existing thread.
if not self.config.extra.get("reply_in_thread", True):
existing_thread = (metadata or {}).get("thread_id") or (metadata or {}).get("thread_ts")
return existing_thread or None
if metadata:
if metadata.get("thread_id"):
return metadata["thread_id"]
@@ -275,7 +363,7 @@ class SlackAdapter(BasePlatformAdapter):
if not os.path.exists(file_path):
raise FileNotFoundError(f"File not found: {file_path}")
result = await self._app.client.files_upload_v2(
result = await self._get_client(chat_id).files_upload_v2(
channel=chat_id,
file=file_path,
filename=os.path.basename(file_path),
@@ -377,7 +465,7 @@ class SlackAdapter(BasePlatformAdapter):
if not self._app:
return False
try:
await self._app.client.reactions_add(
await self._get_client(channel).reactions_add(
channel=channel, timestamp=timestamp, name=emoji
)
return True
@@ -393,7 +481,7 @@ class SlackAdapter(BasePlatformAdapter):
if not self._app:
return False
try:
await self._app.client.reactions_remove(
await self._get_client(channel).reactions_remove(
channel=channel, timestamp=timestamp, name=emoji
)
return True
@@ -403,7 +491,7 @@ class SlackAdapter(BasePlatformAdapter):
# ----- User identity resolution -----
async def _resolve_user_name(self, user_id: str) -> str:
async def _resolve_user_name(self, user_id: str, chat_id: str = "") -> str:
"""Resolve a Slack user ID to a display name, with caching."""
if not user_id:
return ""
@@ -414,7 +502,8 @@ class SlackAdapter(BasePlatformAdapter):
return user_id
try:
result = await self._app.client.users_info(user=user_id)
client = self._get_client(chat_id) if chat_id else self._app.client
result = await client.users_info(user=user_id)
user = result.get("user", {})
# Prefer display_name → real_name → user_id
profile = user.get("profile", {})
@@ -478,7 +567,7 @@ class SlackAdapter(BasePlatformAdapter):
response = await client.get(image_url)
response.raise_for_status()
result = await self._app.client.files_upload_v2(
result = await self._get_client(chat_id).files_upload_v2(
channel=chat_id,
content=response.content,
filename="image.png",
@@ -538,7 +627,7 @@ class SlackAdapter(BasePlatformAdapter):
return SendResult(success=False, error=f"Video file not found: {video_path}")
try:
result = await self._app.client.files_upload_v2(
result = await self._get_client(chat_id).files_upload_v2(
channel=chat_id,
file=video_path,
filename=os.path.basename(video_path),
@@ -579,7 +668,7 @@ class SlackAdapter(BasePlatformAdapter):
display_name = file_name or os.path.basename(file_path)
try:
result = await self._app.client.files_upload_v2(
result = await self._get_client(chat_id).files_upload_v2(
channel=chat_id,
file=file_path,
filename=display_name,
@@ -607,7 +696,7 @@ class SlackAdapter(BasePlatformAdapter):
return {"name": chat_id, "type": "unknown"}
try:
result = await self._app.client.conversations_info(channel=chat_id)
result = await self._get_client(chat_id).conversations_info(channel=chat_id)
channel = result.get("channel", {})
is_dm = channel.get("is_im", False)
return {
@@ -627,6 +716,20 @@ class SlackAdapter(BasePlatformAdapter):
async def _handle_slack_message(self, event: dict) -> None:
"""Handle an incoming Slack message event."""
# Dedup: Slack Socket Mode can redeliver events after reconnects (#4777)
event_ts = event.get("ts", "")
if event_ts:
now = time.time()
if event_ts in self._seen_messages:
return
self._seen_messages[event_ts] = now
if len(self._seen_messages) > self._SEEN_MAX:
cutoff = now - self._SEEN_TTL
self._seen_messages = {
k: v for k, v in self._seen_messages.items()
if v > cutoff
}
# Ignore bot messages (including our own)
if event.get("bot_id") or event.get("subtype") == "bot_message":
return
@@ -640,6 +743,11 @@ class SlackAdapter(BasePlatformAdapter):
user_id = event.get("user", "")
channel_id = event.get("channel", "")
ts = event.get("ts", "")
team_id = event.get("team", "")
# Track which workspace owns this channel
if team_id and channel_id:
self._channel_team[channel_id] = team_id
# Determine if this is a DM or channel message
channel_type = event.get("channel_type", "")
@@ -656,11 +764,12 @@ class SlackAdapter(BasePlatformAdapter):
thread_ts = event.get("thread_ts") or ts # ts fallback for channels
# In channels, only respond if bot is mentioned
if not is_dm and self._bot_user_id:
if f"<@{self._bot_user_id}>" not in text:
bot_uid = self._team_bot_user_ids.get(team_id, self._bot_user_id)
if not is_dm and bot_uid:
if f"<@{bot_uid}>" not in text:
return
# Strip the bot mention from the text
text = text.replace(f"<@{self._bot_user_id}>", "").strip()
text = text.replace(f"<@{bot_uid}>", "").strip()
# Determine message type
msg_type = MessageType.TEXT
@@ -680,7 +789,7 @@ class SlackAdapter(BasePlatformAdapter):
if ext not in (".jpg", ".jpeg", ".png", ".gif", ".webp"):
ext = ".jpg"
# Slack private URLs require the bot token as auth header
cached = await self._download_slack_file(url, ext)
cached = await self._download_slack_file(url, ext, team_id=team_id)
media_urls.append(cached)
media_types.append(mimetype)
msg_type = MessageType.PHOTO
@@ -691,7 +800,7 @@ class SlackAdapter(BasePlatformAdapter):
ext = "." + mimetype.split("/")[-1].split(";")[0]
if ext not in (".ogg", ".mp3", ".wav", ".webm", ".m4a"):
ext = ".ogg"
cached = await self._download_slack_file(url, ext, audio=True)
cached = await self._download_slack_file(url, ext, audio=True, team_id=team_id)
media_urls.append(cached)
media_types.append(mimetype)
msg_type = MessageType.VOICE
@@ -722,7 +831,7 @@ class SlackAdapter(BasePlatformAdapter):
continue
# Download and cache
raw_bytes = await self._download_slack_file_bytes(url)
raw_bytes = await self._download_slack_file_bytes(url, team_id=team_id)
cached_path = cache_document_from_bytes(
raw_bytes, original_filename or f"document{ext}"
)
@@ -751,7 +860,7 @@ class SlackAdapter(BasePlatformAdapter):
logger.warning("[Slack] Failed to cache document from %s: %s", url, e, exc_info=True)
# Resolve user display name (cached after first lookup)
user_name = await self._resolve_user_name(user_id)
user_name = await self._resolve_user_name(user_id, chat_id=channel_id)
# Build source
source = self.build_source(
@@ -788,24 +897,17 @@ class SlackAdapter(BasePlatformAdapter):
text = command.get("text", "").strip()
user_id = command.get("user_id", "")
channel_id = command.get("channel_id", "")
team_id = command.get("team_id", "")
# Map subcommands to gateway commands
subcommand_map = {
"new": "/reset", "reset": "/reset",
"status": "/status", "stop": "/stop",
"help": "/help",
"model": "/model", "personality": "/personality",
"retry": "/retry", "undo": "/undo",
"compact": "/compress", "compress": "/compress",
"resume": "/resume",
"background": "/background",
"usage": "/usage",
"insights": "/insights",
"title": "/title",
"reasoning": "/reasoning",
"provider": "/provider",
"rollback": "/rollback",
}
# Track which workspace owns this channel
if team_id and channel_id:
self._channel_team[channel_id] = team_id
# Map subcommands to gateway commands — derived from central registry.
# Also keep "compact" as a Slack-specific alias for /compress.
from hermes_cli.commands import slack_subcommand_map
subcommand_map = slack_subcommand_map()
subcommand_map["compact"] = "/compress"
first_word = text.split()[0] if text else ""
if first_word in subcommand_map:
# Preserve arguments after the subcommand
@@ -831,34 +933,66 @@ class SlackAdapter(BasePlatformAdapter):
await self.handle_message(event)
async def _download_slack_file(self, url: str, ext: str, audio: bool = False) -> str:
"""Download a Slack file using the bot token for auth."""
async def _download_slack_file(self, url: str, ext: str, audio: bool = False, team_id: str = "") -> str:
"""Download a Slack file using the bot token for auth, with retry."""
import asyncio
import httpx
bot_token = self.config.token
bot_token = self._team_clients[team_id].token if team_id and team_id in self._team_clients else self.config.token
last_exc = None
async with httpx.AsyncClient(timeout=30.0, follow_redirects=True) as client:
response = await client.get(
url,
headers={"Authorization": f"Bearer {bot_token}"},
)
response.raise_for_status()
for attempt in range(3):
try:
response = await client.get(
url,
headers={"Authorization": f"Bearer {bot_token}"},
)
response.raise_for_status()
if audio:
from gateway.platforms.base import cache_audio_from_bytes
return cache_audio_from_bytes(response.content, ext)
else:
from gateway.platforms.base import cache_image_from_bytes
return cache_image_from_bytes(response.content, ext)
if audio:
from gateway.platforms.base import cache_audio_from_bytes
return cache_audio_from_bytes(response.content, ext)
else:
from gateway.platforms.base import cache_image_from_bytes
return cache_image_from_bytes(response.content, ext)
except (httpx.TimeoutException, httpx.HTTPStatusError) as exc:
last_exc = exc
if isinstance(exc, httpx.HTTPStatusError) and exc.response.status_code < 429:
raise
if attempt < 2:
logger.debug("Slack file download retry %d/2 for %s: %s",
attempt + 1, url[:80], exc)
await asyncio.sleep(1.5 * (attempt + 1))
continue
raise
raise last_exc
async def _download_slack_file_bytes(self, url: str) -> bytes:
"""Download a Slack file and return raw bytes."""
async def _download_slack_file_bytes(self, url: str, team_id: str = "") -> bytes:
"""Download a Slack file and return raw bytes, with retry."""
import asyncio
import httpx
bot_token = self.config.token
bot_token = self._team_clients[team_id].token if team_id and team_id in self._team_clients else self.config.token
last_exc = None
async with httpx.AsyncClient(timeout=30.0, follow_redirects=True) as client:
response = await client.get(
url,
headers={"Authorization": f"Bearer {bot_token}"},
)
response.raise_for_status()
return response.content
for attempt in range(3):
try:
response = await client.get(
url,
headers={"Authorization": f"Bearer {bot_token}"},
)
response.raise_for_status()
return response.content
except (httpx.TimeoutException, httpx.HTTPStatusError) as exc:
last_exc = exc
if isinstance(exc, httpx.HTTPStatusError) and exc.response.status_code < 429:
raise
if attempt < 2:
logger.debug("Slack file download retry %d/2 for %s: %s",
attempt + 1, url[:80], exc)
await asyncio.sleep(1.5 * (attempt + 1))
continue
raise
raise last_exc

276
gateway/platforms/sms.py Normal file
View File

@@ -0,0 +1,276 @@
"""SMS (Twilio) platform adapter.
Connects to the Twilio REST API for outbound SMS and runs an aiohttp
webhook server to receive inbound messages.
Shares credentials with the optional telephony skill — same env vars:
- TWILIO_ACCOUNT_SID
- TWILIO_AUTH_TOKEN
- TWILIO_PHONE_NUMBER (E.164 from-number, e.g. +15551234567)
Gateway-specific env vars:
- SMS_WEBHOOK_PORT (default 8080)
- SMS_ALLOWED_USERS (comma-separated E.164 phone numbers)
- SMS_ALLOW_ALL_USERS (true/false)
- SMS_HOME_CHANNEL (phone number for cron delivery)
"""
import asyncio
import base64
import logging
import os
import re
import urllib.parse
from typing import Any, Dict, Optional
from gateway.config import Platform, PlatformConfig
from gateway.platforms.base import (
BasePlatformAdapter,
MessageEvent,
MessageType,
SendResult,
)
logger = logging.getLogger(__name__)
TWILIO_API_BASE = "https://api.twilio.com/2010-04-01/Accounts"
MAX_SMS_LENGTH = 1600 # ~10 SMS segments
DEFAULT_WEBHOOK_PORT = 8080
# E.164 phone number pattern for redaction
_PHONE_RE = re.compile(r"\+[1-9]\d{6,14}")
def _redact_phone(phone: str) -> str:
"""Redact a phone number for logging: +15551234567 -> +1555***4567."""
if not phone:
return "<none>"
if len(phone) <= 8:
return phone[:2] + "***" + phone[-2:] if len(phone) > 4 else "****"
return phone[:5] + "***" + phone[-4:]
def check_sms_requirements() -> bool:
"""Check if SMS adapter dependencies are available."""
try:
import aiohttp # noqa: F401
except ImportError:
return False
return bool(os.getenv("TWILIO_ACCOUNT_SID") and os.getenv("TWILIO_AUTH_TOKEN"))
class SmsAdapter(BasePlatformAdapter):
"""
Twilio SMS <-> Hermes gateway adapter.
Each inbound phone number gets its own Hermes session (multi-tenant).
Replies are always sent from the configured TWILIO_PHONE_NUMBER.
"""
MAX_MESSAGE_LENGTH = MAX_SMS_LENGTH
def __init__(self, config: PlatformConfig):
super().__init__(config, Platform.SMS)
self._account_sid: str = os.environ["TWILIO_ACCOUNT_SID"]
self._auth_token: str = os.environ["TWILIO_AUTH_TOKEN"]
self._from_number: str = os.getenv("TWILIO_PHONE_NUMBER", "")
self._webhook_port: int = int(
os.getenv("SMS_WEBHOOK_PORT", str(DEFAULT_WEBHOOK_PORT))
)
self._runner = None
self._http_session: Optional["aiohttp.ClientSession"] = None
def _basic_auth_header(self) -> str:
"""Build HTTP Basic auth header value for Twilio."""
creds = f"{self._account_sid}:{self._auth_token}"
encoded = base64.b64encode(creds.encode("ascii")).decode("ascii")
return f"Basic {encoded}"
# ------------------------------------------------------------------
# Required abstract methods
# ------------------------------------------------------------------
async def connect(self) -> bool:
import aiohttp
from aiohttp import web
if not self._from_number:
logger.error("[sms] TWILIO_PHONE_NUMBER not set — cannot send replies")
return False
app = web.Application()
app.router.add_post("/webhooks/twilio", self._handle_webhook)
app.router.add_get("/health", lambda _: web.Response(text="ok"))
self._runner = web.AppRunner(app)
await self._runner.setup()
site = web.TCPSite(self._runner, "0.0.0.0", self._webhook_port)
await site.start()
self._http_session = aiohttp.ClientSession(
timeout=aiohttp.ClientTimeout(total=30),
)
self._running = True
logger.info(
"[sms] Twilio webhook server listening on port %d, from: %s",
self._webhook_port,
_redact_phone(self._from_number),
)
return True
async def disconnect(self) -> None:
if self._http_session:
await self._http_session.close()
self._http_session = None
if self._runner:
await self._runner.cleanup()
self._runner = None
self._running = False
logger.info("[sms] Disconnected")
async def send(
self,
chat_id: str,
content: str,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
import aiohttp
formatted = self.format_message(content)
chunks = self.truncate_message(formatted)
last_result = SendResult(success=True)
url = f"{TWILIO_API_BASE}/{self._account_sid}/Messages.json"
headers = {
"Authorization": self._basic_auth_header(),
}
session = self._http_session or aiohttp.ClientSession(
timeout=aiohttp.ClientTimeout(total=30),
)
try:
for chunk in chunks:
form_data = aiohttp.FormData()
form_data.add_field("From", self._from_number)
form_data.add_field("To", chat_id)
form_data.add_field("Body", chunk)
try:
async with session.post(url, data=form_data, headers=headers) as resp:
body = await resp.json()
if resp.status >= 400:
error_msg = body.get("message", str(body))
logger.error(
"[sms] send failed to %s: %s %s",
_redact_phone(chat_id),
resp.status,
error_msg,
)
return SendResult(
success=False,
error=f"Twilio {resp.status}: {error_msg}",
)
msg_sid = body.get("sid", "")
last_result = SendResult(success=True, message_id=msg_sid)
except Exception as e:
logger.error("[sms] send error to %s: %s", _redact_phone(chat_id), e)
return SendResult(success=False, error=str(e))
finally:
# Close session only if we created a fallback (no persistent session)
if not self._http_session and session:
await session.close()
return last_result
async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
return {"name": chat_id, "type": "dm"}
# ------------------------------------------------------------------
# SMS-specific formatting
# ------------------------------------------------------------------
def format_message(self, content: str) -> str:
"""Strip markdown — SMS renders it as literal characters."""
content = re.sub(r"\*\*(.+?)\*\*", r"\1", content, flags=re.DOTALL)
content = re.sub(r"\*(.+?)\*", r"\1", content, flags=re.DOTALL)
content = re.sub(r"__(.+?)__", r"\1", content, flags=re.DOTALL)
content = re.sub(r"_(.+?)_", r"\1", content, flags=re.DOTALL)
content = re.sub(r"```[a-z]*\n?", "", content)
content = re.sub(r"`(.+?)`", r"\1", content)
content = re.sub(r"^#{1,6}\s+", "", content, flags=re.MULTILINE)
content = re.sub(r"\[([^\]]+)\]\([^\)]+\)", r"\1", content)
content = re.sub(r"\n{3,}", "\n\n", content)
return content.strip()
# ------------------------------------------------------------------
# Twilio webhook handler
# ------------------------------------------------------------------
async def _handle_webhook(self, request) -> "aiohttp.web.Response":
from aiohttp import web
try:
raw = await request.read()
# Twilio sends form-encoded data, not JSON
form = urllib.parse.parse_qs(raw.decode("utf-8"))
except Exception as e:
logger.error("[sms] webhook parse error: %s", e)
return web.Response(
text='<?xml version="1.0" encoding="UTF-8"?><Response></Response>',
content_type="application/xml",
status=400,
)
# Extract fields (parse_qs returns lists)
from_number = (form.get("From", [""]))[0].strip()
to_number = (form.get("To", [""]))[0].strip()
text = (form.get("Body", [""]))[0].strip()
message_sid = (form.get("MessageSid", [""]))[0].strip()
if not from_number or not text:
return web.Response(
text='<?xml version="1.0" encoding="UTF-8"?><Response></Response>',
content_type="application/xml",
)
# Ignore messages from our own number (echo prevention)
if from_number == self._from_number:
logger.debug("[sms] ignoring echo from own number %s", _redact_phone(from_number))
return web.Response(
text='<?xml version="1.0" encoding="UTF-8"?><Response></Response>',
content_type="application/xml",
)
logger.info(
"[sms] inbound from %s -> %s: %s",
_redact_phone(from_number),
_redact_phone(to_number),
text[:80],
)
source = self.build_source(
chat_id=from_number,
chat_name=from_number,
chat_type="dm",
user_id=from_number,
user_name=from_number,
)
event = MessageEvent(
text=text,
message_type=MessageType.TEXT,
source=source,
raw_message=form,
message_id=message_sid,
)
# Non-blocking: Twilio expects a fast response
task = asyncio.create_task(self.handle_message(event))
self._background_tasks.add(task)
task.add_done_callback(self._background_tasks.discard)
# Return empty TwiML — we send replies via the REST API, not inline TwiML
return web.Response(
text='<?xml version="1.0" encoding="UTF-8"?><Response></Response>',
content_type="application/xml",
)

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,248 @@
"""Telegram-specific network helpers.
Provides a hostname-preserving fallback transport for networks where
api.telegram.org resolves to an endpoint that is unreachable from the current
host. The transport keeps the logical request host and TLS SNI as
api.telegram.org while retrying the TCP connection against one or more fallback
IPv4 addresses.
"""
from __future__ import annotations
import asyncio
import ipaddress
import logging
import os
import socket
from typing import Iterable, Optional
import httpx
logger = logging.getLogger(__name__)
_TELEGRAM_API_HOST = "api.telegram.org"
# DNS-over-HTTPS providers used to discover Telegram API IPs that may differ
# from the (potentially unreachable) IP returned by the local system resolver.
_DOH_TIMEOUT = 4.0 # seconds — bounded so connect() isn't noticeably delayed
_DOH_PROVIDERS: list[dict] = [
{
"url": "https://dns.google/resolve",
"params": {"name": _TELEGRAM_API_HOST, "type": "A"},
"headers": {},
},
{
"url": "https://cloudflare-dns.com/dns-query",
"params": {"name": _TELEGRAM_API_HOST, "type": "A"},
"headers": {"Accept": "application/dns-json"},
},
]
# Last-resort IPs when DoH is also blocked. These are stable Telegram Bot API
# endpoints in the 149.154.160.0/20 block (same seed used by OpenClaw).
_SEED_FALLBACK_IPS: list[str] = ["149.154.167.220"]
def _resolve_proxy_url() -> str | None:
for key in ("HTTPS_PROXY", "HTTP_PROXY", "ALL_PROXY", "https_proxy", "http_proxy", "all_proxy"):
value = (os.environ.get(key) or "").strip()
if value:
return value
return None
class TelegramFallbackTransport(httpx.AsyncBaseTransport):
"""Retry Telegram Bot API requests via fallback IPs while preserving TLS/SNI.
Requests continue to target https://api.telegram.org/... logically, but on
connect failures the underlying TCP connection is retried against a known
reachable IP. This is effectively the programmatic equivalent of
``curl --resolve api.telegram.org:443:<ip>``.
"""
def __init__(self, fallback_ips: Iterable[str], **transport_kwargs):
self._fallback_ips = [ip for ip in dict.fromkeys(_normalize_fallback_ips(fallback_ips))]
proxy_url = _resolve_proxy_url()
if proxy_url and "proxy" not in transport_kwargs:
transport_kwargs["proxy"] = proxy_url
self._primary = httpx.AsyncHTTPTransport(**transport_kwargs)
self._fallbacks = {
ip: httpx.AsyncHTTPTransport(**transport_kwargs) for ip in self._fallback_ips
}
self._sticky_ip: Optional[str] = None
self._sticky_lock = asyncio.Lock()
async def handle_async_request(self, request: httpx.Request) -> httpx.Response:
if request.url.host != _TELEGRAM_API_HOST or not self._fallback_ips:
return await self._primary.handle_async_request(request)
sticky_ip = self._sticky_ip
attempt_order: list[Optional[str]] = [sticky_ip] if sticky_ip else [None]
for ip in self._fallback_ips:
if ip != sticky_ip:
attempt_order.append(ip)
last_error: Exception | None = None
for ip in attempt_order:
candidate = request if ip is None else _rewrite_request_for_ip(request, ip)
transport = self._primary if ip is None else self._fallbacks[ip]
try:
response = await transport.handle_async_request(candidate)
if ip is not None and self._sticky_ip != ip:
async with self._sticky_lock:
if self._sticky_ip != ip:
self._sticky_ip = ip
logger.warning(
"[Telegram] Primary api.telegram.org path unreachable; using sticky fallback IP %s",
ip,
)
return response
except Exception as exc:
last_error = exc
if not _is_retryable_connect_error(exc):
raise
if ip is None:
logger.warning(
"[Telegram] Primary api.telegram.org connection failed (%s); trying fallback IPs %s",
exc,
", ".join(self._fallback_ips),
)
continue
logger.warning("[Telegram] Fallback IP %s failed: %s", ip, exc)
continue
assert last_error is not None
raise last_error
async def aclose(self) -> None:
await self._primary.aclose()
for transport in self._fallbacks.values():
await transport.aclose()
def _normalize_fallback_ips(values: Iterable[str]) -> list[str]:
normalized: list[str] = []
for value in values:
raw = str(value).strip()
if not raw:
continue
try:
addr = ipaddress.ip_address(raw)
except ValueError:
logger.warning("Ignoring invalid Telegram fallback IP: %r", raw)
continue
if addr.version != 4:
logger.warning("Ignoring non-IPv4 Telegram fallback IP: %s", raw)
continue
if addr.is_private or addr.is_loopback or addr.is_link_local or addr.is_unspecified:
logger.warning("Ignoring private/internal Telegram fallback IP: %s", raw)
continue
normalized.append(str(addr))
return normalized
def parse_fallback_ip_env(value: str | None) -> list[str]:
if not value:
return []
parts = [part.strip() for part in value.split(",")]
return _normalize_fallback_ips(parts)
def _resolve_system_dns() -> set[str]:
"""Return the IPv4 addresses that the OS resolver gives for api.telegram.org."""
try:
results = socket.getaddrinfo(_TELEGRAM_API_HOST, 443, socket.AF_INET)
return {addr[4][0] for addr in results}
except Exception:
return set()
async def _query_doh_provider(
client: httpx.AsyncClient, provider: dict
) -> list[str]:
"""Query one DoH provider and return A-record IPs."""
try:
resp = await client.get(
provider["url"], params=provider["params"], headers=provider["headers"]
)
resp.raise_for_status()
data = resp.json()
ips: list[str] = []
for answer in data.get("Answer", []):
if answer.get("type") != 1: # A record
continue
raw = answer.get("data", "").strip()
try:
ipaddress.ip_address(raw)
ips.append(raw)
except ValueError:
continue
return ips
except Exception as exc:
logger.debug("DoH query to %s failed: %s", provider["url"], exc)
return []
async def discover_fallback_ips() -> list[str]:
"""Auto-discover Telegram API IPs via DNS-over-HTTPS.
Resolves api.telegram.org through Google and Cloudflare DoH, collects all
unique IPs, and excludes the system-DNS-resolved IP (which is presumably
unreachable on this network). Falls back to a hardcoded seed list when DoH
is also unavailable.
"""
async with httpx.AsyncClient(timeout=httpx.Timeout(_DOH_TIMEOUT)) as client:
doh_tasks = [_query_doh_provider(client, p) for p in _DOH_PROVIDERS]
system_dns_task = asyncio.to_thread(_resolve_system_dns)
results = await asyncio.gather(system_dns_task, *doh_tasks, return_exceptions=True)
# results[0] = system DNS IPs (set), results[1:] = DoH IP lists
system_ips: set[str] = results[0] if isinstance(results[0], set) else set()
doh_ips: list[str] = []
for r in results[1:]:
if isinstance(r, list):
doh_ips.extend(r)
# Deduplicate preserving order, exclude system-DNS IPs
seen: set[str] = set()
candidates: list[str] = []
for ip in doh_ips:
if ip not in seen and ip not in system_ips:
seen.add(ip)
candidates.append(ip)
# Validate through existing normalization
validated = _normalize_fallback_ips(candidates)
if validated:
logger.debug("Discovered Telegram fallback IPs via DoH: %s", ", ".join(validated))
return validated
logger.info(
"DoH discovery yielded no new IPs (system DNS: %s); using seed fallback IPs %s",
", ".join(system_ips) or "unknown",
", ".join(_SEED_FALLBACK_IPS),
)
return list(_SEED_FALLBACK_IPS)
def _rewrite_request_for_ip(request: httpx.Request, ip: str) -> httpx.Request:
original_host = request.url.host or _TELEGRAM_API_HOST
url = request.url.copy_with(host=ip)
headers = request.headers.copy()
headers["host"] = original_host
extensions = dict(request.extensions)
extensions["sni_hostname"] = original_host
return httpx.Request(
method=request.method,
url=url,
headers=headers,
stream=request.stream,
extensions=extensions,
)
def _is_retryable_connect_error(exc: Exception) -> bool:
return isinstance(exc, (httpx.ConnectTimeout, httpx.ConnectError))

View File

@@ -0,0 +1,616 @@
"""Generic webhook platform adapter.
Runs an aiohttp HTTP server that receives webhook POSTs from external
services (GitHub, GitLab, JIRA, Stripe, etc.), validates HMAC signatures,
transforms payloads into agent prompts, and routes responses back to the
source or to another configured platform.
Configuration lives in config.yaml under platforms.webhook.extra.routes.
Each route defines:
- events: which event types to accept (header-based filtering)
- secret: HMAC secret for signature validation (REQUIRED)
- prompt: template string formatted with the webhook payload
- skills: optional list of skills to load for the agent
- deliver: where to send the response (github_comment, telegram, etc.)
- deliver_extra: additional delivery config (repo, pr_number, chat_id)
Security:
- HMAC secret is required per route (validated at startup)
- Rate limiting per route (fixed-window, configurable)
- Idempotency cache prevents duplicate agent runs on webhook retries
- Body size limits checked before reading payload
- Set secret to "INSECURE_NO_AUTH" to skip validation (testing only)
"""
import asyncio
import hashlib
import hmac
import json
import logging
import os
import re
import subprocess
import time
from typing import Any, Dict, List, Optional
try:
from aiohttp import web
AIOHTTP_AVAILABLE = True
except ImportError:
AIOHTTP_AVAILABLE = False
web = None # type: ignore[assignment]
from gateway.config import Platform, PlatformConfig
from gateway.platforms.base import (
BasePlatformAdapter,
MessageEvent,
MessageType,
SendResult,
)
logger = logging.getLogger(__name__)
DEFAULT_HOST = "0.0.0.0"
DEFAULT_PORT = 8644
_INSECURE_NO_AUTH = "INSECURE_NO_AUTH"
_DYNAMIC_ROUTES_FILENAME = "webhook_subscriptions.json"
def check_webhook_requirements() -> bool:
"""Check if webhook adapter dependencies are available."""
return AIOHTTP_AVAILABLE
class WebhookAdapter(BasePlatformAdapter):
"""Generic webhook receiver that triggers agent runs from HTTP POSTs."""
def __init__(self, config: PlatformConfig):
super().__init__(config, Platform.WEBHOOK)
self._host: str = config.extra.get("host", DEFAULT_HOST)
self._port: int = int(config.extra.get("port", DEFAULT_PORT))
self._global_secret: str = config.extra.get("secret", "")
self._static_routes: Dict[str, dict] = config.extra.get("routes", {})
self._dynamic_routes: Dict[str, dict] = {}
self._dynamic_routes_mtime: float = 0.0
self._routes: Dict[str, dict] = dict(self._static_routes)
self._runner = None
# Delivery info keyed by session chat_id — consumed by send()
self._delivery_info: Dict[str, dict] = {}
# Reference to gateway runner for cross-platform delivery (set externally)
self.gateway_runner = None
# Idempotency: TTL cache of recently processed delivery IDs.
# Prevents duplicate agent runs when webhook providers retry.
self._seen_deliveries: Dict[str, float] = {}
self._idempotency_ttl: int = 3600 # 1 hour
# Rate limiting: per-route timestamps in a fixed window.
self._rate_counts: Dict[str, List[float]] = {}
self._rate_limit: int = int(config.extra.get("rate_limit", 30)) # per minute
# Body size limit (auth-before-body pattern)
self._max_body_bytes: int = int(
config.extra.get("max_body_bytes", 1_048_576)
) # 1MB
# ------------------------------------------------------------------
# Lifecycle
# ------------------------------------------------------------------
async def connect(self) -> bool:
# Load agent-created subscriptions before validating
self._reload_dynamic_routes()
# Validate routes at startup — secret is required per route
for name, route in self._routes.items():
secret = route.get("secret", self._global_secret)
if not secret:
raise ValueError(
f"[webhook] Route '{name}' has no HMAC secret. "
f"Set 'secret' on the route or globally. "
f"For testing without auth, set secret to '{_INSECURE_NO_AUTH}'."
)
app = web.Application()
app.router.add_get("/health", self._handle_health)
app.router.add_post("/webhooks/{route_name}", self._handle_webhook)
# Port conflict detection — fail fast if port is already in use
import socket as _socket
try:
with _socket.socket(_socket.AF_INET, _socket.SOCK_STREAM) as _s:
_s.settimeout(1)
_s.connect(('127.0.0.1', self._port))
logger.error('[webhook] Port %d already in use. Set a different port in config.yaml: platforms.webhook.port', self._port)
return False
except (ConnectionRefusedError, OSError):
pass # port is free
self._runner = web.AppRunner(app)
await self._runner.setup()
site = web.TCPSite(self._runner, self._host, self._port)
await site.start()
self._mark_connected()
route_names = ", ".join(self._routes.keys()) or "(none configured)"
logger.info(
"[webhook] Listening on %s:%d — routes: %s",
self._host,
self._port,
route_names,
)
return True
async def disconnect(self) -> None:
if self._runner:
await self._runner.cleanup()
self._runner = None
self._mark_disconnected()
logger.info("[webhook] Disconnected")
async def send(
self,
chat_id: str,
content: str,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Deliver the agent's response to the configured destination.
chat_id is ``webhook:{route}:{delivery_id}`` — we pop the delivery
info stored during webhook receipt so it doesn't leak memory.
"""
delivery = self._delivery_info.pop(chat_id, {})
deliver_type = delivery.get("deliver", "log")
if deliver_type == "log":
logger.info("[webhook] Response for %s: %s", chat_id, content[:200])
return SendResult(success=True)
if deliver_type == "github_comment":
return await self._deliver_github_comment(content, delivery)
# Cross-platform delivery (telegram, discord, etc.)
if self.gateway_runner and deliver_type in (
"telegram",
"discord",
"slack",
"signal",
"sms",
):
return await self._deliver_cross_platform(
deliver_type, content, delivery
)
logger.warning("[webhook] Unknown deliver type: %s", deliver_type)
return SendResult(
success=False, error=f"Unknown deliver type: {deliver_type}"
)
async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
return {"name": chat_id, "type": "webhook"}
# ------------------------------------------------------------------
# HTTP handlers
# ------------------------------------------------------------------
async def _handle_health(self, request: "web.Request") -> "web.Response":
"""GET /health — simple health check."""
return web.json_response({"status": "ok", "platform": "webhook"})
def _reload_dynamic_routes(self) -> None:
"""Reload agent-created subscriptions from disk if the file changed."""
from pathlib import Path as _Path
hermes_home = _Path(
os.getenv("HERMES_HOME", str(_Path.home() / ".hermes"))
).expanduser()
subs_path = hermes_home / _DYNAMIC_ROUTES_FILENAME
if not subs_path.exists():
if self._dynamic_routes:
self._dynamic_routes = {}
self._routes = dict(self._static_routes)
logger.debug("[webhook] Dynamic subscriptions file removed, cleared dynamic routes")
return
try:
mtime = subs_path.stat().st_mtime
if mtime <= self._dynamic_routes_mtime:
return # No change
data = json.loads(subs_path.read_text(encoding="utf-8"))
if not isinstance(data, dict):
return
# Merge: static routes take precedence over dynamic ones
self._dynamic_routes = {
k: v for k, v in data.items()
if k not in self._static_routes
}
self._routes = {**self._dynamic_routes, **self._static_routes}
self._dynamic_routes_mtime = mtime
logger.info(
"[webhook] Reloaded %d dynamic route(s): %s",
len(self._dynamic_routes),
", ".join(self._dynamic_routes.keys()) or "(none)",
)
except Exception as e:
logger.warning("[webhook] Failed to reload dynamic routes: %s", e)
async def _handle_webhook(self, request: "web.Request") -> "web.Response":
"""POST /webhooks/{route_name} — receive and process a webhook event."""
# Hot-reload dynamic subscriptions on each request (mtime-gated, cheap)
self._reload_dynamic_routes()
route_name = request.match_info.get("route_name", "")
route_config = self._routes.get(route_name)
if not route_config:
return web.json_response(
{"error": f"Unknown route: {route_name}"}, status=404
)
# ── Auth-before-body ─────────────────────────────────────
# Check Content-Length before reading the full payload.
content_length = request.content_length or 0
if content_length > self._max_body_bytes:
return web.json_response(
{"error": "Payload too large"}, status=413
)
# ── Rate limiting ────────────────────────────────────────
now = time.time()
window = self._rate_counts.setdefault(route_name, [])
window[:] = [t for t in window if now - t < 60]
if len(window) >= self._rate_limit:
return web.json_response(
{"error": "Rate limit exceeded"}, status=429
)
window.append(now)
# Read body
try:
raw_body = await request.read()
except Exception as e:
logger.error("[webhook] Failed to read body: %s", e)
return web.json_response({"error": "Bad request"}, status=400)
# Validate HMAC signature (skip for INSECURE_NO_AUTH testing mode)
secret = route_config.get("secret", self._global_secret)
if secret and secret != _INSECURE_NO_AUTH:
if not self._validate_signature(request, raw_body, secret):
logger.warning(
"[webhook] Invalid signature for route %s", route_name
)
return web.json_response(
{"error": "Invalid signature"}, status=401
)
# Parse payload
try:
payload = json.loads(raw_body)
except json.JSONDecodeError:
# Try form-encoded as fallback
try:
import urllib.parse
payload = dict(
urllib.parse.parse_qsl(raw_body.decode("utf-8"))
)
except Exception:
return web.json_response(
{"error": "Cannot parse body"}, status=400
)
# Check event type filter
event_type = (
request.headers.get("X-GitHub-Event", "")
or request.headers.get("X-GitLab-Event", "")
or payload.get("event_type", "")
or "unknown"
)
allowed_events = route_config.get("events", [])
if allowed_events and event_type not in allowed_events:
logger.debug(
"[webhook] Ignoring event %s for route %s (allowed: %s)",
event_type,
route_name,
allowed_events,
)
return web.json_response(
{"status": "ignored", "event": event_type}
)
# Format prompt from template
prompt_template = route_config.get("prompt", "")
prompt = self._render_prompt(
prompt_template, payload, event_type, route_name
)
# Inject skill content if configured.
# We call build_skill_invocation_message() directly rather than
# using /skill-name slash commands — the gateway's command parser
# would intercept those and break the flow.
skills = route_config.get("skills", [])
if skills:
try:
from agent.skill_commands import (
build_skill_invocation_message,
get_skill_commands,
)
skill_cmds = get_skill_commands()
for skill_name in skills:
cmd_key = f"/{skill_name}"
if cmd_key in skill_cmds:
skill_content = build_skill_invocation_message(
cmd_key, user_instruction=prompt
)
if skill_content:
prompt = skill_content
break # Load the first matching skill
else:
logger.warning(
"[webhook] Skill '%s' not found", skill_name
)
except Exception as e:
logger.warning("[webhook] Skill loading failed: %s", e)
# Build a unique delivery ID
delivery_id = request.headers.get(
"X-GitHub-Delivery",
request.headers.get("X-Request-ID", str(int(time.time() * 1000))),
)
# ── Idempotency ─────────────────────────────────────────
# Skip duplicate deliveries (webhook retries).
now = time.time()
# Prune expired entries
self._seen_deliveries = {
k: v
for k, v in self._seen_deliveries.items()
if now - v < self._idempotency_ttl
}
if delivery_id in self._seen_deliveries:
logger.info(
"[webhook] Skipping duplicate delivery %s", delivery_id
)
return web.json_response(
{"status": "duplicate", "delivery_id": delivery_id},
status=200,
)
self._seen_deliveries[delivery_id] = now
# Use delivery_id in session key so concurrent webhooks on the
# same route get independent agent runs (not queued/interrupted).
session_chat_id = f"webhook:{route_name}:{delivery_id}"
# Store delivery info for send() — consumed (popped) on delivery
deliver_config = {
"deliver": route_config.get("deliver", "log"),
"deliver_extra": self._render_delivery_extra(
route_config.get("deliver_extra", {}), payload
),
"payload": payload,
}
self._delivery_info[session_chat_id] = deliver_config
# Build source and event
source = self.build_source(
chat_id=session_chat_id,
chat_name=f"webhook/{route_name}",
chat_type="webhook",
user_id=f"webhook:{route_name}",
user_name=route_name,
)
event = MessageEvent(
text=prompt,
message_type=MessageType.TEXT,
source=source,
raw_message=payload,
message_id=delivery_id,
)
logger.info(
"[webhook] %s event=%s route=%s prompt_len=%d delivery=%s",
request.method,
event_type,
route_name,
len(prompt),
delivery_id,
)
# Non-blocking — return 202 Accepted immediately
task = asyncio.create_task(self.handle_message(event))
self._background_tasks.add(task)
task.add_done_callback(self._background_tasks.discard)
return web.json_response(
{
"status": "accepted",
"route": route_name,
"event": event_type,
"delivery_id": delivery_id,
},
status=202,
)
# ------------------------------------------------------------------
# Signature validation
# ------------------------------------------------------------------
def _validate_signature(
self, request: "web.Request", body: bytes, secret: str
) -> bool:
"""Validate webhook signature (GitHub, GitLab, generic HMAC-SHA256)."""
# GitHub: X-Hub-Signature-256 = sha256=<hex>
gh_sig = request.headers.get("X-Hub-Signature-256", "")
if gh_sig:
expected = "sha256=" + hmac.new(
secret.encode(), body, hashlib.sha256
).hexdigest()
return hmac.compare_digest(gh_sig, expected)
# GitLab: X-Gitlab-Token = <plain secret>
gl_token = request.headers.get("X-Gitlab-Token", "")
if gl_token:
return hmac.compare_digest(gl_token, secret)
# Generic: X-Webhook-Signature = <hex HMAC-SHA256>
generic_sig = request.headers.get("X-Webhook-Signature", "")
if generic_sig:
expected = hmac.new(
secret.encode(), body, hashlib.sha256
).hexdigest()
return hmac.compare_digest(generic_sig, expected)
# No recognised signature header but secret is configured → reject
logger.debug(
"[webhook] Secret configured but no signature header found"
)
return False
# ------------------------------------------------------------------
# Prompt rendering
# ------------------------------------------------------------------
def _render_prompt(
self,
template: str,
payload: dict,
event_type: str,
route_name: str,
) -> str:
"""Render a prompt template with the webhook payload.
Supports dot-notation access into nested dicts:
``{pull_request.title}`` → ``payload["pull_request"]["title"]``
"""
if not template:
truncated = json.dumps(payload, indent=2)[:4000]
return (
f"Webhook event '{event_type}' on route "
f"'{route_name}':\n\n```json\n{truncated}\n```"
)
def _resolve(match: re.Match) -> str:
key = match.group(1)
value: Any = payload
for part in key.split("."):
if isinstance(value, dict):
value = value.get(part, f"{{{key}}}")
else:
return f"{{{key}}}"
if isinstance(value, (dict, list)):
return json.dumps(value, indent=2)[:2000]
return str(value)
return re.sub(r"\{([a-zA-Z0-9_.]+)\}", _resolve, template)
def _render_delivery_extra(
self, extra: dict, payload: dict
) -> dict:
"""Render delivery_extra template values with payload data."""
rendered: Dict[str, Any] = {}
for key, value in extra.items():
if isinstance(value, str):
rendered[key] = self._render_prompt(value, payload, "", "")
else:
rendered[key] = value
return rendered
# ------------------------------------------------------------------
# Response delivery
# ------------------------------------------------------------------
async def _deliver_github_comment(
self, content: str, delivery: dict
) -> SendResult:
"""Post agent response as a GitHub PR/issue comment via ``gh`` CLI."""
extra = delivery.get("deliver_extra", {})
repo = extra.get("repo", "")
pr_number = extra.get("pr_number", "")
if not repo or not pr_number:
logger.error(
"[webhook] github_comment delivery missing repo or pr_number"
)
return SendResult(
success=False, error="Missing repo or pr_number"
)
try:
result = subprocess.run(
[
"gh",
"pr",
"comment",
str(pr_number),
"--repo",
repo,
"--body",
content,
],
capture_output=True,
text=True,
timeout=30,
)
if result.returncode == 0:
logger.info(
"[webhook] Posted comment on %s#%s", repo, pr_number
)
return SendResult(success=True)
else:
logger.error(
"[webhook] gh pr comment failed: %s", result.stderr
)
return SendResult(success=False, error=result.stderr)
except FileNotFoundError:
logger.error(
"[webhook] 'gh' CLI not found — install GitHub CLI for "
"github_comment delivery"
)
return SendResult(
success=False, error="gh CLI not installed"
)
except Exception as e:
logger.error("[webhook] github_comment delivery error: %s", e)
return SendResult(success=False, error=str(e))
async def _deliver_cross_platform(
self, platform_name: str, content: str, delivery: dict
) -> SendResult:
"""Route response to another platform (telegram, discord, etc.)."""
if not self.gateway_runner:
return SendResult(
success=False,
error="No gateway runner for cross-platform delivery",
)
try:
target_platform = Platform(platform_name)
except ValueError:
return SendResult(
success=False, error=f"Unknown platform: {platform_name}"
)
adapter = self.gateway_runner.adapters.get(target_platform)
if not adapter:
return SendResult(
success=False,
error=f"Platform {platform_name} not connected",
)
# Use home channel if no specific chat_id in deliver_extra
extra = delivery.get("deliver_extra", {})
chat_id = extra.get("chat_id", "")
if not chat_id:
home = self.gateway_runner.config.get_home_channel(target_platform)
if home:
chat_id = home.chat_id
else:
return SendResult(
success=False,
error=f"No chat_id or home channel for {platform_name}",
)
return await adapter.send(chat_id, content)

1338
gateway/platforms/wecom.py Normal file

File diff suppressed because it is too large Load Diff

View File

@@ -20,13 +20,15 @@ import json
import logging
import os
import platform
import re
import subprocess
_IS_WINDOWS = platform.system() == "Windows"
from pathlib import Path
from typing import Dict, List, Optional, Any
from typing import Dict, Optional, Any
from hermes_cli.config import get_hermes_home
from hermes_constants import get_hermes_dir
logger = logging.getLogger(__name__)
@@ -74,6 +76,7 @@ from gateway.platforms.base import (
MessageEvent,
MessageType,
SendResult,
SUPPORTED_DOCUMENT_TYPES,
cache_image_from_url,
cache_audio_from_url,
)
@@ -134,11 +137,140 @@ class WhatsAppAdapter(BasePlatformAdapter):
)
self._session_path: Path = Path(config.extra.get(
"session_path",
get_hermes_home() / "whatsapp" / "session"
get_hermes_dir("platforms/whatsapp/session", "whatsapp/session")
))
self._reply_prefix: Optional[str] = config.extra.get("reply_prefix")
self._mention_patterns = self._compile_mention_patterns()
self._message_queue: asyncio.Queue = asyncio.Queue()
self._bridge_log_fh = None
self._bridge_log: Optional[Path] = None
self._poll_task: Optional[asyncio.Task] = None
self._http_session: Optional["aiohttp.ClientSession"] = None
self._session_lock_identity: Optional[str] = None
def _whatsapp_require_mention(self) -> bool:
configured = self.config.extra.get("require_mention")
if configured is not None:
if isinstance(configured, str):
return configured.lower() in ("true", "1", "yes", "on")
return bool(configured)
return os.getenv("WHATSAPP_REQUIRE_MENTION", "false").lower() in ("true", "1", "yes", "on")
def _whatsapp_free_response_chats(self) -> set[str]:
raw = self.config.extra.get("free_response_chats")
if raw is None:
raw = os.getenv("WHATSAPP_FREE_RESPONSE_CHATS", "")
if isinstance(raw, list):
return {str(part).strip() for part in raw if str(part).strip()}
return {part.strip() for part in str(raw).split(",") if part.strip()}
def _compile_mention_patterns(self):
patterns = self.config.extra.get("mention_patterns")
if patterns is None:
raw = os.getenv("WHATSAPP_MENTION_PATTERNS", "").strip()
if raw:
try:
patterns = json.loads(raw)
except Exception:
patterns = [part.strip() for part in raw.splitlines() if part.strip()]
if not patterns:
patterns = [part.strip() for part in raw.split(",") if part.strip()]
if patterns is None:
return []
if isinstance(patterns, str):
patterns = [patterns]
if not isinstance(patterns, list):
logger.warning("[%s] whatsapp mention_patterns must be a list or string; got %s", self.name, type(patterns).__name__)
return []
compiled = []
for pattern in patterns:
if not isinstance(pattern, str) or not pattern.strip():
continue
try:
compiled.append(re.compile(pattern, re.IGNORECASE))
except re.error as exc:
logger.warning("[%s] Invalid WhatsApp mention pattern %r: %s", self.name, pattern, exc)
if compiled:
logger.info("[%s] Loaded %d WhatsApp mention pattern(s)", self.name, len(compiled))
return compiled
@staticmethod
def _normalize_whatsapp_id(value: Optional[str]) -> str:
if not value:
return ""
normalized = str(value).strip()
if ":" in normalized and "@" in normalized:
normalized = normalized.replace(":", "@", 1)
return normalized
def _bot_ids_from_message(self, data: Dict[str, Any]) -> set[str]:
bot_ids = set()
for candidate in data.get("botIds") or []:
normalized = self._normalize_whatsapp_id(candidate)
if normalized:
bot_ids.add(normalized)
return bot_ids
def _message_is_reply_to_bot(self, data: Dict[str, Any]) -> bool:
quoted_participant = self._normalize_whatsapp_id(data.get("quotedParticipant"))
if not quoted_participant:
return False
return quoted_participant in self._bot_ids_from_message(data)
def _message_mentions_bot(self, data: Dict[str, Any]) -> bool:
bot_ids = self._bot_ids_from_message(data)
if not bot_ids:
return False
mentioned_ids = {
nid
for candidate in (data.get("mentionedIds") or [])
if (nid := self._normalize_whatsapp_id(candidate))
}
if mentioned_ids & bot_ids:
return True
body = str(data.get("body") or "")
lower_body = body.lower()
for bot_id in bot_ids:
bare_id = bot_id.split("@", 1)[0].lower()
if bare_id and (f"@{bare_id}" in lower_body or bare_id in lower_body):
return True
return False
def _message_matches_mention_patterns(self, data: Dict[str, Any]) -> bool:
if not self._mention_patterns:
return False
body = str(data.get("body") or "")
return any(pattern.search(body) for pattern in self._mention_patterns)
def _clean_bot_mention_text(self, text: str, data: Dict[str, Any]) -> str:
if not text:
return text
bot_ids = self._bot_ids_from_message(data)
cleaned = text
for bot_id in bot_ids:
bare_id = bot_id.split("@", 1)[0]
if bare_id:
cleaned = re.sub(rf"@{re.escape(bare_id)}\b[,:\-]*\s*", "", cleaned)
return cleaned.strip() or text
def _should_process_message(self, data: Dict[str, Any]) -> bool:
if not data.get("isGroup"):
return True
chat_id = str(data.get("chatId") or "")
if chat_id in self._whatsapp_free_response_chats():
return True
if not self._whatsapp_require_mention():
return True
body = str(data.get("body") or "").strip()
if body.startswith("/"):
return True
if self._message_is_reply_to_bot(data):
return True
if self._message_mentions_bot(data):
return True
return self._message_matches_mention_patterns(data)
async def connect(self) -> bool:
"""
@@ -157,6 +289,29 @@ class WhatsAppAdapter(BasePlatformAdapter):
logger.info("[%s] Bridge found at %s", self.name, bridge_path)
# Acquire scoped lock to prevent duplicate sessions
try:
from gateway.status import acquire_scoped_lock
self._session_lock_identity = str(self._session_path)
acquired, existing = acquire_scoped_lock(
"whatsapp-session",
self._session_lock_identity,
metadata={"platform": self.platform.value},
)
if not acquired:
owner_pid = existing.get("pid") if isinstance(existing, dict) else None
message = (
"Another local Hermes gateway is already using this WhatsApp session"
+ (f" (PID {owner_pid})." if owner_pid else ".")
+ " Stop the other gateway before starting a second WhatsApp bridge."
)
logger.error("[%s] %s", self.name, message)
self._set_fatal_error("whatsapp_session_lock", message, retryable=False)
return False
except Exception as e:
logger.warning("[%s] Could not acquire session lock (non-fatal): %s", self.name, e)
# Auto-install npm dependencies if node_modules doesn't exist
bridge_dir = bridge_path.parent
if not (bridge_dir / "node_modules").exists():
@@ -181,9 +336,32 @@ class WhatsAppAdapter(BasePlatformAdapter):
# Ensure session directory exists
self._session_path.mkdir(parents=True, exist_ok=True)
# Check if bridge is already running and connected
import aiohttp
import asyncio
try:
async with aiohttp.ClientSession() as session:
async with session.get(
f"http://127.0.0.1:{self._bridge_port}/health",
timeout=aiohttp.ClientTimeout(total=2)
) as resp:
if resp.status == 200:
data = await resp.json()
bridge_status = data.get("status", "unknown")
if bridge_status == "connected":
print(f"[{self.name}] Using existing bridge (status: {bridge_status})")
self._mark_connected()
self._bridge_process = None # Not managed by us
self._http_session = aiohttp.ClientSession()
self._poll_task = asyncio.create_task(self._poll_messages())
return True
else:
print(f"[{self.name}] Bridge found but not connected (status: {bridge_status}), restarting")
except Exception:
pass # Bridge not running, start a new one
# Kill any orphaned bridge from a previous gateway run
_kill_port_process(self._bridge_port)
import asyncio
await asyncio.sleep(1)
# Start the bridge process in its own process group.
@@ -193,6 +371,14 @@ class WhatsAppAdapter(BasePlatformAdapter):
self._bridge_log = self._session_path.parent / "bridge.log"
bridge_log_fh = open(self._bridge_log, "a")
self._bridge_log_fh = bridge_log_fh
# Build bridge subprocess environment.
# Pass WHATSAPP_REPLY_PREFIX from config.yaml so the Node bridge
# can use it without the user needing to set a separate env var.
bridge_env = os.environ.copy()
if self._reply_prefix is not None:
bridge_env["WHATSAPP_REPLY_PREFIX"] = self._reply_prefix
self._bridge_process = subprocess.Popen(
[
"node",
@@ -204,6 +390,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
stdout=bridge_log_fh,
stderr=bridge_log_fh,
preexec_fn=None if _IS_WINDOWS else os.setsid,
env=bridge_env,
)
# Wait for the bridge to connect to WhatsApp.
@@ -222,7 +409,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
try:
async with aiohttp.ClientSession() as session:
async with session.get(
f"http://localhost:{self._bridge_port}/health",
f"http://127.0.0.1:{self._bridge_port}/health",
timeout=aiohttp.ClientTimeout(total=2)
) as resp:
if resp.status == 200:
@@ -254,7 +441,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
try:
async with aiohttp.ClientSession() as session:
async with session.get(
f"http://localhost:{self._bridge_port}/health",
f"http://127.0.0.1:{self._bridge_port}/health",
timeout=aiohttp.ClientTimeout(total=2)
) as resp:
if resp.status == 200:
@@ -271,14 +458,23 @@ class WhatsAppAdapter(BasePlatformAdapter):
print(f"[{self.name}] Bridge log: {self._bridge_log}")
print(f"[{self.name}] If session expired, re-pair: hermes whatsapp")
# Create a persistent HTTP session for all bridge communication
self._http_session = aiohttp.ClientSession()
# Start message polling task
asyncio.create_task(self._poll_messages())
self._poll_task = asyncio.create_task(self._poll_messages())
self._running = True
self._mark_connected()
print(f"[{self.name}] Bridge started on port {self._bridge_port}")
return True
except Exception as e:
if self._session_lock_identity:
try:
from gateway.status import release_scoped_lock
release_scoped_lock("whatsapp-session", self._session_lock_identity)
except Exception:
pass
logger.error("[%s] Failed to start bridge: %s", self.name, e, exc_info=True)
self._close_bridge_log()
return False
@@ -292,6 +488,23 @@ class WhatsAppAdapter(BasePlatformAdapter):
pass
self._bridge_log_fh = None
async def _check_managed_bridge_exit(self) -> Optional[str]:
"""Return a fatal error message if the managed bridge child exited."""
if self._bridge_process is None:
return None
returncode = self._bridge_process.poll()
if returncode is None:
return None
message = f"WhatsApp bridge process exited unexpectedly (code {returncode})."
if not self.has_fatal_error:
logger.error("[%s] %s", self.name, message)
self._set_fatal_error("whatsapp_bridge_exited", message, retryable=True)
self._close_bridge_log()
await self._notify_fatal_error()
return self.fatal_error_message or message
async def disconnect(self) -> None:
"""Stop the WhatsApp bridge and clean up any orphaned processes."""
if self._bridge_process:
@@ -316,13 +529,35 @@ class WhatsAppAdapter(BasePlatformAdapter):
self._bridge_process.kill()
except Exception as e:
print(f"[{self.name}] Error stopping bridge: {e}")
# Also kill any orphaned bridge processes on our port
_kill_port_process(self._bridge_port)
self._running = False
else:
# Bridge was not started by us, don't kill it
print(f"[{self.name}] Disconnecting (external bridge left running)")
# Cancel the poll task explicitly
if self._poll_task and not self._poll_task.done():
self._poll_task.cancel()
try:
await self._poll_task
except (asyncio.CancelledError, Exception):
pass
self._poll_task = None
# Close the persistent HTTP session
if self._http_session and not self._http_session.closed:
await self._http_session.close()
self._http_session = None
if self._session_lock_identity:
try:
from gateway.status import release_scoped_lock
release_scoped_lock("whatsapp-session", self._session_lock_identity)
except Exception as e:
logger.warning("[%s] Error releasing WhatsApp session lock: %s", self.name, e, exc_info=True)
self._mark_disconnected()
self._bridge_process = None
self._close_bridge_log()
self._session_lock_identity = None
print(f"[{self.name}] Disconnected")
async def send(
@@ -333,41 +568,37 @@ class WhatsAppAdapter(BasePlatformAdapter):
metadata: Optional[Dict[str, Any]] = None
) -> SendResult:
"""Send a message via the WhatsApp bridge."""
if not self._running:
if not self._running or not self._http_session:
return SendResult(success=False, error="Not connected")
bridge_exit = await self._check_managed_bridge_exit()
if bridge_exit:
return SendResult(success=False, error=bridge_exit)
try:
import aiohttp
payload = {
"chatId": chat_id,
"message": content,
}
if reply_to:
payload["replyTo"] = reply_to
async with aiohttp.ClientSession() as session:
payload = {
"chatId": chat_id,
"message": content,
}
if reply_to:
payload["replyTo"] = reply_to
async with session.post(
f"http://localhost:{self._bridge_port}/send",
json=payload,
timeout=aiohttp.ClientTimeout(total=30)
) as resp:
if resp.status == 200:
data = await resp.json()
return SendResult(
success=True,
message_id=data.get("messageId"),
raw_response=data
)
else:
error = await resp.text()
return SendResult(success=False, error=error)
except ImportError:
return SendResult(
success=False,
error="aiohttp not installed. Run: pip install aiohttp"
)
async with self._http_session.post(
f"http://127.0.0.1:{self._bridge_port}/send",
json=payload,
timeout=aiohttp.ClientTimeout(total=30)
) as resp:
if resp.status == 200:
data = await resp.json()
return SendResult(
success=True,
message_id=data.get("messageId"),
raw_response=data
)
else:
error = await resp.text()
return SendResult(success=False, error=error)
except Exception as e:
return SendResult(success=False, error=str(e))
@@ -378,25 +609,27 @@ class WhatsAppAdapter(BasePlatformAdapter):
content: str,
) -> SendResult:
"""Edit a previously sent message via the WhatsApp bridge."""
if not self._running:
if not self._running or not self._http_session:
return SendResult(success=False, error="Not connected")
bridge_exit = await self._check_managed_bridge_exit()
if bridge_exit:
return SendResult(success=False, error=bridge_exit)
try:
import aiohttp
async with aiohttp.ClientSession() as session:
async with session.post(
f"http://localhost:{self._bridge_port}/edit",
json={
"chatId": chat_id,
"messageId": message_id,
"message": content,
},
timeout=aiohttp.ClientTimeout(total=15)
) as resp:
if resp.status == 200:
return SendResult(success=True, message_id=message_id)
else:
error = await resp.text()
return SendResult(success=False, error=error)
async with self._http_session.post(
f"http://127.0.0.1:{self._bridge_port}/edit",
json={
"chatId": chat_id,
"messageId": message_id,
"message": content,
},
timeout=aiohttp.ClientTimeout(total=15)
) as resp:
if resp.status == 200:
return SendResult(success=True, message_id=message_id)
else:
error = await resp.text()
return SendResult(success=False, error=error)
except Exception as e:
return SendResult(success=False, error=str(e))
@@ -409,8 +642,11 @@ class WhatsAppAdapter(BasePlatformAdapter):
file_name: Optional[str] = None,
) -> SendResult:
"""Send any media file via bridge /send-media endpoint."""
if not self._running:
if not self._running or not self._http_session:
return SendResult(success=False, error="Not connected")
bridge_exit = await self._check_managed_bridge_exit()
if bridge_exit:
return SendResult(success=False, error=bridge_exit)
try:
import aiohttp
@@ -427,22 +663,21 @@ class WhatsAppAdapter(BasePlatformAdapter):
if file_name:
payload["fileName"] = file_name
async with aiohttp.ClientSession() as session:
async with session.post(
f"http://localhost:{self._bridge_port}/send-media",
json=payload,
timeout=aiohttp.ClientTimeout(total=120),
) as resp:
if resp.status == 200:
data = await resp.json()
return SendResult(
success=True,
message_id=data.get("messageId"),
raw_response=data,
)
else:
error = await resp.text()
return SendResult(success=False, error=error)
async with self._http_session.post(
f"http://127.0.0.1:{self._bridge_port}/send-media",
json=payload,
timeout=aiohttp.ClientTimeout(total=120),
) as resp:
if resp.status == 200:
data = await resp.json()
return SendResult(
success=True,
message_id=data.get("messageId"),
raw_response=data,
)
else:
error = await resp.text()
return SendResult(success=False, error=error)
except Exception as e:
return SendResult(success=False, error=str(e))
@@ -467,6 +702,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
image_path: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
**kwargs,
) -> SendResult:
"""Send a local image file natively via bridge."""
return await self._send_media_to_bridge(chat_id, image_path, "image", caption)
@@ -477,6 +713,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
video_path: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
**kwargs,
) -> SendResult:
"""Send a video natively via bridge — plays inline in WhatsApp."""
return await self._send_media_to_bridge(chat_id, video_path, "video", caption)
@@ -488,6 +725,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
caption: Optional[str] = None,
file_name: Optional[str] = None,
reply_to: Optional[str] = None,
**kwargs,
) -> SendResult:
"""Send a document/file as a downloadable attachment via bridge."""
return await self._send_media_to_bridge(
@@ -497,41 +735,43 @@ class WhatsAppAdapter(BasePlatformAdapter):
async def send_typing(self, chat_id: str, metadata=None) -> None:
"""Send typing indicator via bridge."""
if not self._running:
if not self._running or not self._http_session:
return
if await self._check_managed_bridge_exit():
return
try:
import aiohttp
async with aiohttp.ClientSession() as session:
await session.post(
f"http://localhost:{self._bridge_port}/typing",
json={"chatId": chat_id},
timeout=aiohttp.ClientTimeout(total=5)
)
await self._http_session.post(
f"http://127.0.0.1:{self._bridge_port}/typing",
json={"chatId": chat_id},
timeout=aiohttp.ClientTimeout(total=5)
)
except Exception:
pass # Ignore typing indicator failures
async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
"""Get information about a WhatsApp chat."""
if not self._running:
if not self._running or not self._http_session:
return {"name": "Unknown", "type": "dm"}
if await self._check_managed_bridge_exit():
return {"name": chat_id, "type": "dm"}
try:
import aiohttp
async with aiohttp.ClientSession() as session:
async with session.get(
f"http://localhost:{self._bridge_port}/chat/{chat_id}",
timeout=aiohttp.ClientTimeout(total=10)
) as resp:
if resp.status == 200:
data = await resp.json()
return {
"name": data.get("name", chat_id),
"type": "group" if data.get("isGroup") else "dm",
"participants": data.get("participants", []),
}
async with self._http_session.get(
f"http://127.0.0.1:{self._bridge_port}/chat/{chat_id}",
timeout=aiohttp.ClientTimeout(total=10)
) as resp:
if resp.status == 200:
data = await resp.json()
return {
"name": data.get("name", chat_id),
"type": "group" if data.get("isGroup") else "dm",
"participants": data.get("participants", []),
}
except Exception as e:
logger.debug("Could not get WhatsApp chat info for %s: %s", chat_id, e)
@@ -539,28 +779,33 @@ class WhatsAppAdapter(BasePlatformAdapter):
async def _poll_messages(self) -> None:
"""Poll the bridge for incoming messages."""
try:
import aiohttp
except ImportError:
print(f"[{self.name}] aiohttp not installed, message polling disabled")
return
import aiohttp
while self._running:
if not self._http_session:
break
bridge_exit = await self._check_managed_bridge_exit()
if bridge_exit:
print(f"[{self.name}] {bridge_exit}")
break
try:
async with aiohttp.ClientSession() as session:
async with session.get(
f"http://localhost:{self._bridge_port}/messages",
timeout=aiohttp.ClientTimeout(total=30)
) as resp:
if resp.status == 200:
messages = await resp.json()
for msg_data in messages:
event = await self._build_message_event(msg_data)
if event:
await self.handle_message(event)
async with self._http_session.get(
f"http://127.0.0.1:{self._bridge_port}/messages",
timeout=aiohttp.ClientTimeout(total=30)
) as resp:
if resp.status == 200:
messages = await resp.json()
for msg_data in messages:
event = await self._build_message_event(msg_data)
if event:
await self.handle_message(event)
except asyncio.CancelledError:
break
except Exception as e:
bridge_exit = await self._check_managed_bridge_exit()
if bridge_exit:
print(f"[{self.name}] {bridge_exit}")
break
print(f"[{self.name}] Poll error: {e}")
await asyncio.sleep(5)
@@ -569,6 +814,9 @@ class WhatsAppAdapter(BasePlatformAdapter):
async def _build_message_event(self, data: Dict[str, Any]) -> Optional[MessageEvent]:
"""Build a MessageEvent from bridge message data, downloading images to cache."""
try:
if not self._should_process_message(data):
return None
# Determine message type
msg_type = MessageType.TEXT
if data.get("hasMedia"):
@@ -595,7 +843,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
user_name=data.get("senderName"),
)
# Download image media URLs to the local cache so the vision tool
# Download media URLs to the local cache so agent tools
# can access them reliably regardless of URL expiration.
raw_urls = data.get("mediaUrls", [])
cached_urls = []
@@ -611,6 +859,11 @@ class WhatsAppAdapter(BasePlatformAdapter):
print(f"[{self.name}] Failed to cache image: {e}", flush=True)
cached_urls.append(url)
media_types.append("image/jpeg")
elif msg_type == MessageType.PHOTO and os.path.isabs(url):
# Local file path — bridge already downloaded the image
cached_urls.append(url)
media_types.append("image/jpeg")
print(f"[{self.name}] Using bridge-cached image: {url}", flush=True)
elif msg_type == MessageType.VOICE and url.startswith(("http://", "https://")):
try:
cached_path = await cache_audio_from_url(url, ext=".ogg")
@@ -621,12 +874,61 @@ class WhatsAppAdapter(BasePlatformAdapter):
print(f"[{self.name}] Failed to cache voice: {e}", flush=True)
cached_urls.append(url)
media_types.append("audio/ogg")
elif msg_type == MessageType.VOICE and os.path.isabs(url):
# Local file path — bridge already downloaded the audio
cached_urls.append(url)
media_types.append("audio/ogg")
print(f"[{self.name}] Using bridge-cached audio: {url}", flush=True)
elif msg_type == MessageType.DOCUMENT and os.path.isabs(url):
# Local file path — bridge already downloaded the document
cached_urls.append(url)
ext = Path(url).suffix.lower()
mime = SUPPORTED_DOCUMENT_TYPES.get(ext, "application/octet-stream")
media_types.append(mime)
print(f"[{self.name}] Using bridge-cached document: {url}", flush=True)
elif msg_type == MessageType.VIDEO and os.path.isabs(url):
cached_urls.append(url)
media_types.append("video/mp4")
print(f"[{self.name}] Using bridge-cached video: {url}", flush=True)
else:
cached_urls.append(url)
media_types.append("unknown")
# For text-readable documents, inject file content directly into
# the message text so the agent can read it inline.
# Cap at 100KB to match Telegram/Discord/Slack behaviour.
body = data.get("body", "")
if data.get("isGroup"):
body = self._clean_bot_mention_text(body, data)
MAX_TEXT_INJECT_BYTES = 100 * 1024
if msg_type == MessageType.DOCUMENT and cached_urls:
for doc_path in cached_urls:
ext = Path(doc_path).suffix.lower()
if ext in (".txt", ".md", ".csv", ".json", ".xml", ".yaml", ".yml", ".log", ".py", ".js", ".ts", ".html", ".css"):
try:
file_size = Path(doc_path).stat().st_size
if file_size > MAX_TEXT_INJECT_BYTES:
print(f"[{self.name}] Skipping text injection for {doc_path} ({file_size} bytes > {MAX_TEXT_INJECT_BYTES})", flush=True)
continue
content = Path(doc_path).read_text(errors="replace")
fname = Path(doc_path).name
# Remove the doc_<hex>_ prefix for display
display_name = fname
if "_" in fname:
parts = fname.split("_", 2)
if len(parts) >= 3:
display_name = parts[2]
injection = f"[Content of {display_name}]:\n{content}"
if body:
body = f"{injection}\n\n{body}"
else:
body = injection
print(f"[{self.name}] Injected text content from: {doc_path}", flush=True)
except Exception as e:
print(f"[{self.name}] Failed to read document text: {e}", flush=True)
return MessageEvent(
text=data.get("body", ""),
text=body,
message_type=msg_type,
source=source,
raw_message=data,
@@ -637,4 +939,3 @@ class WhatsAppAdapter(BasePlatformAdapter):
except Exception as e:
print(f"[{self.name}] Error building event: {e}")
return None

File diff suppressed because it is too large Load Diff

View File

@@ -8,21 +8,64 @@ Handles:
- Dynamic system prompt injection (agent knows its context)
"""
import hashlib
import logging
import os
import json
import re
import threading
import uuid
from pathlib import Path
from datetime import datetime, timedelta
from dataclasses import dataclass, field
from dataclasses import dataclass
from typing import Dict, List, Optional, Any
logger = logging.getLogger(__name__)
def _now() -> datetime:
"""Return the current local time."""
return datetime.now()
# ---------------------------------------------------------------------------
# PII redaction helpers
# ---------------------------------------------------------------------------
_PHONE_RE = re.compile(r"^\+?\d[\d\-\s]{6,}$")
def _hash_id(value: str) -> str:
"""Deterministic 12-char hex hash of an identifier."""
return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]
def _hash_sender_id(value: str) -> str:
"""Hash a sender ID to ``user_<12hex>``."""
return f"user_{_hash_id(value)}"
def _hash_chat_id(value: str) -> str:
"""Hash the numeric portion of a chat ID, preserving platform prefix.
``telegram:12345`` → ``telegram:<hash>``
``12345`` → ``<hash>``
"""
colon = value.find(":")
if colon > 0:
prefix = value[:colon]
return f"{prefix}:{_hash_id(value[colon + 1:])}"
return _hash_id(value)
def _looks_like_phone(value: str) -> bool:
"""Return True if *value* looks like a phone number (E.164 or similar)."""
return bool(_PHONE_RE.match(value.strip()))
from .config import (
Platform,
GatewayConfig,
SessionResetPolicy,
SessionResetPolicy, # noqa: F401 — re-exported via gateway/__init__.py
HomeChannel,
)
@@ -146,7 +189,21 @@ class SessionContext:
}
def build_session_context_prompt(context: SessionContext) -> str:
_PII_SAFE_PLATFORMS = frozenset({
Platform.WHATSAPP,
Platform.SIGNAL,
Platform.TELEGRAM,
})
"""Platforms where user IDs can be safely redacted (no in-message mention system
that requires raw IDs). Discord is excluded because mentions use ``<@user_id>``
and the LLM needs the real ID to tag users."""
def build_session_context_prompt(
context: SessionContext,
*,
redact_pii: bool = False,
) -> str:
"""
Build the dynamic system prompt section that tells the agent about its context.
@@ -154,7 +211,15 @@ def build_session_context_prompt(context: SessionContext) -> str:
- Where messages are coming from
- What platforms are connected
- Where it can deliver scheduled task outputs
When *redact_pii* is True **and** the source platform is in
``_PII_SAFE_PLATFORMS``, phone numbers are stripped and user/chat IDs
are replaced with deterministic hashes before being sent to the LLM.
Platforms like Discord are excluded because mentions need real IDs.
Routing still uses the original values (they stay in SessionSource).
"""
# Only apply redaction on platforms where IDs aren't needed for mentions
redact_pii = redact_pii and context.source.platform in _PII_SAFE_PLATFORMS
lines = [
"## Current Session Context",
"",
@@ -165,7 +230,25 @@ def build_session_context_prompt(context: SessionContext) -> str:
if context.source.platform == Platform.LOCAL:
lines.append(f"**Source:** {platform_name} (the machine running this agent)")
else:
lines.append(f"**Source:** {platform_name} ({context.source.description})")
# Build a description that respects PII redaction
src = context.source
if redact_pii:
# Build a safe description without raw IDs
_uname = src.user_name or (
_hash_sender_id(src.user_id) if src.user_id else "user"
)
_cname = src.chat_name or _hash_chat_id(src.chat_id)
if src.chat_type == "dm":
desc = f"DM with {_uname}"
elif src.chat_type == "group":
desc = f"group: {_cname}"
elif src.chat_type == "channel":
desc = f"channel: {_cname}"
else:
desc = _cname
else:
desc = src.description
lines.append(f"**Source:** {platform_name} ({desc})")
# Channel topic (if available - provides context about the channel's purpose)
if context.source.chat_topic:
@@ -175,7 +258,10 @@ def build_session_context_prompt(context: SessionContext) -> str:
if context.source.user_name:
lines.append(f"**User:** {context.source.user_name}")
elif context.source.user_id:
lines.append(f"**User ID:** {context.source.user_id}")
uid = context.source.user_id
if redact_pii:
uid = _hash_sender_id(uid)
lines.append(f"**User ID:** {uid}")
# Platform-specific behavioral notes
if context.source.platform == Platform.SLACK:
@@ -210,7 +296,8 @@ def build_session_context_prompt(context: SessionContext) -> str:
lines.append("")
lines.append("**Home Channels (default destinations):**")
for platform, home in context.home_channels.items():
lines.append(f" - {platform.value}: {home.name} (ID: {home.chat_id})")
hc_id = _hash_chat_id(home.chat_id) if redact_pii else home.chat_id
lines.append(f" - {platform.value}: {home.name} (ID: {hc_id})")
# Delivery options for scheduled tasks
lines.append("")
@@ -220,7 +307,10 @@ def build_session_context_prompt(context: SessionContext) -> str:
if context.source.platform == Platform.LOCAL:
lines.append("- `\"origin\"` → Local output (saved to files)")
else:
lines.append(f"- `\"origin\"` → Back to this chat ({context.source.chat_name or context.source.chat_id})")
_origin_label = context.source.chat_name or (
_hash_chat_id(context.source.chat_id) if redact_pii else context.source.chat_id
)
lines.append(f"- `\"origin\"` → Back to this chat ({_origin_label})")
# Local always available
lines.append("- `\"local\"` → Save to local files only (~/.hermes/cron/output/)")
@@ -259,7 +349,11 @@ class SessionEntry:
# Token tracking
input_tokens: int = 0
output_tokens: int = 0
cache_read_tokens: int = 0
cache_write_tokens: int = 0
total_tokens: int = 0
estimated_cost_usd: float = 0.0
cost_status: str = "unknown"
# Last API-reported prompt tokens (for accurate compression pre-check)
last_prompt_tokens: int = 0
@@ -267,6 +361,14 @@ class SessionEntry:
# Set when a session was created because the previous one expired;
# consumed once by the message handler to inject a notice into context
was_auto_reset: bool = False
auto_reset_reason: Optional[str] = None # "idle" or "daily"
reset_had_activity: bool = False # whether the expired session had any messages
# Set by the background expiry watcher after it successfully flushes
# memories for this session. Persisted to sessions.json so the flag
# survives gateway restarts (the old in-memory _pre_flushed_sessions
# set was lost on restart, causing redundant re-flushes).
memory_flushed: bool = False
def to_dict(self) -> Dict[str, Any]:
result = {
@@ -279,8 +381,13 @@ class SessionEntry:
"chat_type": self.chat_type,
"input_tokens": self.input_tokens,
"output_tokens": self.output_tokens,
"cache_read_tokens": self.cache_read_tokens,
"cache_write_tokens": self.cache_write_tokens,
"total_tokens": self.total_tokens,
"last_prompt_tokens": self.last_prompt_tokens,
"estimated_cost_usd": self.estimated_cost_usd,
"cost_status": self.cost_status,
"memory_flushed": self.memory_flushed,
}
if self.origin:
result["origin"] = self.origin.to_dict()
@@ -310,36 +417,57 @@ class SessionEntry:
chat_type=data.get("chat_type", "dm"),
input_tokens=data.get("input_tokens", 0),
output_tokens=data.get("output_tokens", 0),
cache_read_tokens=data.get("cache_read_tokens", 0),
cache_write_tokens=data.get("cache_write_tokens", 0),
total_tokens=data.get("total_tokens", 0),
last_prompt_tokens=data.get("last_prompt_tokens", 0),
estimated_cost_usd=data.get("estimated_cost_usd", 0.0),
cost_status=data.get("cost_status", "unknown"),
memory_flushed=data.get("memory_flushed", False),
)
def build_session_key(source: SessionSource) -> str:
def build_session_key(source: SessionSource, group_sessions_per_user: bool = True) -> str:
"""Build a deterministic session key from a message source.
This is the single source of truth for session key construction.
DM rules:
- WhatsApp DMs include chat_id (multi-user support).
- Other DMs include thread_id when present (e.g. Slack threaded DMs),
so each DM thread gets its own session while top-level DMs share one.
- Without thread_id or chat_id, all DMs share a single session.
- DMs include chat_id when present, so each private conversation is isolated.
- thread_id further differentiates threaded DMs within the same DM chat.
- Without chat_id, thread_id is used as a best-effort fallback.
- Without thread_id or chat_id, DMs share a single session.
Group/channel rules:
- thread_id differentiates threads within a channel.
- Without thread_id, all messages in a channel share one session.
- chat_id identifies the parent group/channel.
- user_id/user_id_alt isolates participants within that parent chat when available when
``group_sessions_per_user`` is enabled.
- thread_id differentiates threads within that parent chat.
- Without participant identifiers, or when isolation is disabled, messages fall back to one
shared session per chat.
- Without identifiers, messages fall back to one session per platform/chat_type.
"""
platform = source.platform.value
if source.chat_type == "dm":
if source.chat_id:
if source.thread_id:
return f"agent:main:{platform}:dm:{source.chat_id}:{source.thread_id}"
return f"agent:main:{platform}:dm:{source.chat_id}"
if source.thread_id:
return f"agent:main:{platform}:dm:{source.thread_id}"
if platform == "whatsapp" and source.chat_id:
return f"agent:main:{platform}:dm:{source.chat_id}"
return f"agent:main:{platform}:dm"
participant_id = source.user_id_alt or source.user_id
key_parts = ["agent:main", platform, source.chat_type]
if source.chat_id:
key_parts.append(source.chat_id)
if source.thread_id:
return f"agent:main:{platform}:{source.chat_type}:{source.chat_id}:{source.thread_id}"
return f"agent:main:{platform}:{source.chat_type}:{source.chat_id}"
key_parts.append(source.thread_id)
if group_sessions_per_user and participant_id:
key_parts.append(str(participant_id))
return ":".join(key_parts)
class SessionStore:
@@ -357,10 +485,8 @@ class SessionStore:
self.config = config
self._entries: Dict[str, SessionEntry] = {}
self._loaded = False
self._lock = threading.Lock()
self._has_active_processes_fn = has_active_processes_fn
# on_auto_reset is deprecated — memory flush now runs proactively
# via the background session expiry watcher in GatewayRunner.
self._pre_flushed_sessions: set = set() # session_ids already flushed by watcher
# Initialize SQLite session database
self._db = None
@@ -372,12 +498,17 @@ class SessionStore:
def _ensure_loaded(self) -> None:
"""Load sessions index from disk if not already loaded."""
with self._lock:
self._ensure_loaded_locked()
def _ensure_loaded_locked(self) -> None:
"""Load sessions index from disk. Must be called with self._lock held."""
if self._loaded:
return
self.sessions_dir.mkdir(parents=True, exist_ok=True)
sessions_file = self.sessions_dir / "sessions.json"
if sessions_file.exists():
try:
with open(sessions_file, "r", encoding="utf-8") as f:
@@ -390,7 +521,7 @@ class SessionStore:
continue
except Exception as e:
print(f"[gateway] Warning: Failed to load sessions: {e}")
self._loaded = True
def _save(self) -> None:
@@ -418,7 +549,10 @@ class SessionStore:
def _generate_session_key(self, source: SessionSource) -> str:
"""Generate a session key from a source."""
return build_session_key(source)
return build_session_key(
source,
group_sessions_per_user=getattr(self.config, "group_sessions_per_user", True),
)
def _is_session_expired(self, entry: SessionEntry) -> bool:
"""Check if a session has expired based on its reset policy.
@@ -439,7 +573,7 @@ class SessionStore:
if policy.mode == "none":
return False
now = datetime.now()
now = _now()
if policy.mode in ("idle", "both"):
idle_deadline = entry.updated_at + timedelta(minutes=policy.idle_minutes)
@@ -458,16 +592,19 @@ class SessionStore:
return False
def _should_reset(self, entry: SessionEntry, source: SessionSource) -> bool:
def _should_reset(self, entry: SessionEntry, source: SessionSource) -> Optional[str]:
"""
Check if a session should be reset based on policy.
Returns the reset reason ("idle" or "daily") if a reset is needed,
or None if the session is still valid.
Sessions with active background processes are never reset.
"""
if self._has_active_processes_fn:
session_key = self._generate_session_key(source)
if self._has_active_processes_fn(session_key):
return False
return None
policy = self.config.get_reset_policy(
platform=source.platform,
@@ -475,14 +612,14 @@ class SessionStore:
)
if policy.mode == "none":
return False
return None
now = datetime.now()
now = _now()
if policy.mode in ("idle", "both"):
idle_deadline = entry.updated_at + timedelta(minutes=policy.idle_minutes)
if now > idle_deadline:
return True
return "idle"
if policy.mode in ("daily", "both"):
today_reset = now.replace(
@@ -495,9 +632,9 @@ class SessionStore:
today_reset -= timedelta(days=1)
if entry.updated_at < today_reset:
return True
return "daily"
return False
return None
def has_any_sessions(self) -> bool:
"""Check if any sessions have ever been created (across all platforms).
@@ -517,151 +654,192 @@ class SessionStore:
pass # fall through to heuristic
# Fallback: check if sessions.json was loaded with existing data.
# This covers the rare case where the DB is unavailable.
self._ensure_loaded()
return len(self._entries) > 1
with self._lock:
self._ensure_loaded_locked()
return len(self._entries) > 1
def get_or_create_session(
self,
self,
source: SessionSource,
force_new: bool = False
) -> SessionEntry:
"""
Get an existing session or create a new one.
Evaluates reset policy to determine if the existing session is stale.
Creates a session record in SQLite when a new session starts.
"""
self._ensure_loaded()
session_key = self._generate_session_key(source)
now = datetime.now()
if session_key in self._entries and not force_new:
entry = self._entries[session_key]
if not self._should_reset(entry, source):
entry.updated_at = now
self._save()
return entry
now = _now()
# SQLite calls are made outside the lock to avoid holding it during I/O.
# All _entries / _loaded mutations are protected by self._lock.
db_end_session_id = None
db_create_kwargs = None
with self._lock:
self._ensure_loaded_locked()
if session_key in self._entries and not force_new:
entry = self._entries[session_key]
reset_reason = self._should_reset(entry, source)
if not reset_reason:
entry.updated_at = now
self._save()
return entry
else:
# Session is being auto-reset.
was_auto_reset = True
auto_reset_reason = reset_reason
# Track whether the expired session had any real conversation
reset_had_activity = entry.total_tokens > 0
db_end_session_id = entry.session_id
else:
# Session is being auto-reset. The background expiry watcher
# should have already flushed memories proactively; discard
# the marker so it doesn't accumulate.
was_auto_reset = True
self._pre_flushed_sessions.discard(entry.session_id)
if self._db:
try:
self._db.end_session(entry.session_id, "session_reset")
except Exception as e:
logger.debug("Session DB operation failed: %s", e)
else:
was_auto_reset = False
# Create new session
session_id = f"{now.strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:8]}"
entry = SessionEntry(
session_key=session_key,
session_id=session_id,
created_at=now,
updated_at=now,
origin=source,
display_name=source.chat_name,
platform=source.platform,
chat_type=source.chat_type,
was_auto_reset=was_auto_reset,
)
self._entries[session_key] = entry
self._save()
# Create session in SQLite
if self._db:
was_auto_reset = False
auto_reset_reason = None
reset_had_activity = False
# Create new session
session_id = f"{now.strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:8]}"
entry = SessionEntry(
session_key=session_key,
session_id=session_id,
created_at=now,
updated_at=now,
origin=source,
display_name=source.chat_name,
platform=source.platform,
chat_type=source.chat_type,
was_auto_reset=was_auto_reset,
auto_reset_reason=auto_reset_reason,
reset_had_activity=reset_had_activity,
)
self._entries[session_key] = entry
self._save()
db_create_kwargs = {
"session_id": session_id,
"source": source.platform.value,
"user_id": source.user_id,
}
# SQLite operations outside the lock
if self._db and db_end_session_id:
try:
self._db.create_session(
session_id=session_id,
source=source.platform.value,
user_id=source.user_id,
)
self._db.end_session(db_end_session_id, "session_reset")
except Exception as e:
logger.debug("Session DB operation failed: %s", e)
if self._db and db_create_kwargs:
try:
self._db.create_session(**db_create_kwargs)
except Exception as e:
print(f"[gateway] Warning: Failed to create SQLite session: {e}")
return entry
def update_session(
self,
session_key: str,
input_tokens: int = 0,
output_tokens: int = 0,
last_prompt_tokens: int = None,
model: str = None,
) -> None:
"""Update a session's metadata after an interaction."""
self._ensure_loaded()
if session_key in self._entries:
entry = self._entries[session_key]
entry.updated_at = datetime.now()
entry.input_tokens += input_tokens
entry.output_tokens += output_tokens
if last_prompt_tokens is not None:
entry.last_prompt_tokens = last_prompt_tokens
entry.total_tokens = entry.input_tokens + entry.output_tokens
self._save()
if self._db:
# Seed new DM thread sessions with parent DM session history.
# When a bot reply creates a Slack thread and the user responds in it,
# the thread gets a new session (keyed by thread_ts). Without seeding,
# the thread session starts with zero context — the user's original
# question and the bot's answer are invisible. Fix: copy the parent
# DM session's transcript into the new thread session so context carries
# over while still keeping threads isolated from each other.
if (
source.chat_type == "dm"
and source.thread_id
and entry.created_at == entry.updated_at # brand-new session
and not was_auto_reset
):
parent_source = SessionSource(
platform=source.platform,
chat_id=source.chat_id,
chat_type="dm",
user_id=source.user_id,
# no thread_id — this is the parent DM session
)
parent_key = self._generate_session_key(parent_source)
with self._lock:
parent_entry = self._entries.get(parent_key)
if parent_entry and parent_entry.session_id != entry.session_id:
try:
self._db.update_token_counts(
entry.session_id, input_tokens, output_tokens,
model=model,
)
parent_history = self.load_transcript(parent_entry.session_id)
if parent_history:
self.rewrite_transcript(entry.session_id, parent_history)
logger.info(
"[Session] Seeded DM thread session %s with %d messages from parent %s",
entry.session_id, len(parent_history), parent_entry.session_id,
)
except Exception as e:
logger.debug("Session DB operation failed: %s", e)
logger.warning("[Session] Failed to seed thread session: %s", e)
return entry
def update_session(
self,
session_key: str,
last_prompt_tokens: int = None,
) -> None:
"""Update lightweight session metadata after an interaction."""
with self._lock:
self._ensure_loaded_locked()
if session_key in self._entries:
entry = self._entries[session_key]
entry.updated_at = _now()
if last_prompt_tokens is not None:
entry.last_prompt_tokens = last_prompt_tokens
self._save()
def reset_session(self, session_key: str) -> Optional[SessionEntry]:
"""Force reset a session, creating a new session ID."""
self._ensure_loaded()
if session_key not in self._entries:
return None
old_entry = self._entries[session_key]
# End old session in SQLite
if self._db:
db_end_session_id = None
db_create_kwargs = None
new_entry = None
with self._lock:
self._ensure_loaded_locked()
if session_key not in self._entries:
return None
old_entry = self._entries[session_key]
db_end_session_id = old_entry.session_id
now = _now()
session_id = f"{now.strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:8]}"
new_entry = SessionEntry(
session_key=session_key,
session_id=session_id,
created_at=now,
updated_at=now,
origin=old_entry.origin,
display_name=old_entry.display_name,
platform=old_entry.platform,
chat_type=old_entry.chat_type,
)
self._entries[session_key] = new_entry
self._save()
db_create_kwargs = {
"session_id": session_id,
"source": old_entry.platform.value if old_entry.platform else "unknown",
"user_id": old_entry.origin.user_id if old_entry.origin else None,
}
if self._db and db_end_session_id:
try:
self._db.end_session(old_entry.session_id, "session_reset")
self._db.end_session(db_end_session_id, "session_reset")
except Exception as e:
logger.debug("Session DB operation failed: %s", e)
now = datetime.now()
session_id = f"{now.strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:8]}"
new_entry = SessionEntry(
session_key=session_key,
session_id=session_id,
created_at=now,
updated_at=now,
origin=old_entry.origin,
display_name=old_entry.display_name,
platform=old_entry.platform,
chat_type=old_entry.chat_type,
)
self._entries[session_key] = new_entry
self._save()
# Create new session in SQLite
if self._db:
if self._db and db_create_kwargs:
try:
self._db.create_session(
session_id=session_id,
source=old_entry.platform.value if old_entry.platform else "unknown",
user_id=old_entry.origin.user_id if old_entry.origin else None,
)
self._db.create_session(**db_create_kwargs)
except Exception as e:
logger.debug("Session DB operation failed: %s", e)
return new_entry
def switch_session(self, session_key: str, target_session_id: str) -> Optional[SessionEntry]:
@@ -672,52 +850,58 @@ class SessionStore:
generating a fresh session ID, re-uses ``target_session_id`` so the
old transcript is loaded on the next message.
"""
self._ensure_loaded()
db_end_session_id = None
new_entry = None
if session_key not in self._entries:
return None
with self._lock:
self._ensure_loaded_locked()
old_entry = self._entries[session_key]
if session_key not in self._entries:
return None
# Don't switch if already on that session
if old_entry.session_id == target_session_id:
return old_entry
old_entry = self._entries[session_key]
# End the current session in SQLite
if self._db:
# Don't switch if already on that session
if old_entry.session_id == target_session_id:
return old_entry
db_end_session_id = old_entry.session_id
now = _now()
new_entry = SessionEntry(
session_key=session_key,
session_id=target_session_id,
created_at=now,
updated_at=now,
origin=old_entry.origin,
display_name=old_entry.display_name,
platform=old_entry.platform,
chat_type=old_entry.chat_type,
)
self._entries[session_key] = new_entry
self._save()
if self._db and db_end_session_id:
try:
self._db.end_session(old_entry.session_id, "session_switch")
self._db.end_session(db_end_session_id, "session_switch")
except Exception as e:
logger.debug("Session DB end_session failed: %s", e)
now = datetime.now()
new_entry = SessionEntry(
session_key=session_key,
session_id=target_session_id,
created_at=now,
updated_at=now,
origin=old_entry.origin,
display_name=old_entry.display_name,
platform=old_entry.platform,
chat_type=old_entry.chat_type,
)
self._entries[session_key] = new_entry
self._save()
return new_entry
def list_sessions(self, active_minutes: Optional[int] = None) -> List[SessionEntry]:
"""List all sessions, optionally filtered by activity."""
self._ensure_loaded()
entries = list(self._entries.values())
with self._lock:
self._ensure_loaded_locked()
entries = list(self._entries.values())
if active_minutes is not None:
cutoff = datetime.now() - timedelta(minutes=active_minutes)
cutoff = _now() - timedelta(minutes=active_minutes)
entries = [e for e in entries if e.updated_at >= cutoff]
entries.sort(key=lambda e: e.updated_at, reverse=True)
return entries
def get_transcript_path(self, session_id: str) -> Path:
@@ -763,13 +947,17 @@ class SessionStore:
try:
self._db.clear_messages(session_id)
for msg in messages:
role = msg.get("role", "unknown")
self._db.append_message(
session_id=session_id,
role=msg.get("role", "unknown"),
role=role,
content=msg.get("content"),
tool_name=msg.get("tool_name"),
tool_calls=msg.get("tool_calls"),
tool_call_id=msg.get("tool_call_id"),
reasoning=msg.get("reasoning") if role == "assistant" else None,
reasoning_details=msg.get("reasoning_details") if role == "assistant" else None,
codex_reasoning_items=msg.get("codex_reasoning_items") if role == "assistant" else None,
)
except Exception as e:
logger.debug("Failed to rewrite transcript in DB: %s", e)
@@ -782,29 +970,51 @@ class SessionStore:
def load_transcript(self, session_id: str) -> List[Dict[str, Any]]:
"""Load all messages from a session's transcript."""
db_messages = []
# Try SQLite first
if self._db:
try:
messages = self._db.get_messages_as_conversation(session_id)
if messages:
return messages
db_messages = self._db.get_messages_as_conversation(session_id)
except Exception as e:
logger.debug("Could not load messages from DB: %s", e)
# Fall back to legacy JSONL
# Load legacy JSONL transcript (may contain more history than SQLite
# for sessions created before the DB layer was introduced).
transcript_path = self.get_transcript_path(session_id)
if not transcript_path.exists():
return []
messages = []
with open(transcript_path, "r", encoding="utf-8") as f:
for line in f:
line = line.strip()
if line:
messages.append(json.loads(line))
return messages
jsonl_messages = []
if transcript_path.exists():
with open(transcript_path, "r", encoding="utf-8") as f:
for line in f:
line = line.strip()
if line:
try:
jsonl_messages.append(json.loads(line))
except json.JSONDecodeError:
logger.warning(
"Skipping corrupt line in transcript %s: %s",
session_id, line[:120],
)
# Prefer whichever source has more messages.
#
# Background: when a session pre-dates SQLite storage (or when the DB
# layer was added while a long-lived session was already active), the
# first post-migration turn writes only the *new* messages to SQLite
# (because _flush_messages_to_session_db skips messages already in
# conversation_history, assuming they're persisted). On the *next*
# turn load_transcript returns those few SQLite rows and ignores the
# full JSONL history — the model sees a context of 1-4 messages instead
# of hundreds. Using the longer source prevents this silent truncation.
if len(jsonl_messages) > len(db_messages):
if db_messages:
logger.debug(
"Session %s: JSONL has %d messages vs SQLite %d"
"using JSONL (legacy session not yet fully migrated)",
session_id, len(jsonl_messages), len(db_messages),
)
return jsonl_messages
return db_messages
def build_session_context(

View File

@@ -17,6 +17,7 @@ import os
import sys
from datetime import datetime, timezone
from pathlib import Path
from hermes_constants import get_hermes_home
from typing import Any, Optional
_GATEWAY_KIND = "hermes-gateway"
@@ -26,7 +27,7 @@ _LOCKS_DIRNAME = "gateway-locks"
def _get_pid_path() -> Path:
"""Return the path to the gateway PID file, respecting HERMES_HOME."""
home = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
home = get_hermes_home()
return home / "gateway.pid"
@@ -83,11 +84,30 @@ def _looks_like_gateway_process(pid: int) -> bool:
"""Return True when the live PID still looks like the Hermes gateway."""
cmdline = _read_process_cmdline(pid)
if not cmdline:
# If we cannot inspect the process, fall back to the liveness check.
return True
return False
patterns = (
"hermes_cli.main gateway",
"hermes_cli/main.py gateway",
"hermes gateway",
"gateway/run.py",
)
return any(pattern in cmdline for pattern in patterns)
def _record_looks_like_gateway(record: dict[str, Any]) -> bool:
"""Validate gateway identity from PID-file metadata when cmdline is unavailable."""
if record.get("kind") != _GATEWAY_KIND:
return False
argv = record.get("argv")
if not isinstance(argv, list) or not argv:
return False
cmdline = " ".join(str(part) for part in argv)
patterns = (
"hermes_cli.main gateway",
"hermes_cli/main.py gateway",
"hermes gateway",
"gateway/run.py",
)
@@ -178,8 +198,8 @@ def write_runtime_status(
payload = _read_json_file(path) or _build_runtime_status_record()
payload.setdefault("platforms", {})
payload.setdefault("kind", _GATEWAY_KIND)
payload.setdefault("pid", os.getpid())
payload.setdefault("start_time", _get_process_start_time(os.getpid()))
payload["pid"] = os.getpid()
payload["start_time"] = _get_process_start_time(os.getpid())
payload["updated_at"] = _utc_now_iso()
if gateway_state is not None:
@@ -255,6 +275,21 @@ def acquire_scoped_lock(scope: str, identity: str, metadata: Optional[dict[str,
and current_start != existing.get("start_time")
):
stale = True
# Check if process is stopped (Ctrl+Z / SIGTSTP) — stopped
# processes still respond to os.kill(pid, 0) but are not
# actually running. Treat them as stale so --replace works.
if not stale:
try:
_proc_status = Path(f"/proc/{existing_pid}/status")
if _proc_status.exists():
for _line in _proc_status.read_text().splitlines():
if _line.startswith("State:"):
_state = _line.split()[1]
if _state in ("T", "t"): # stopped or tracing stop
stale = True
break
except (OSError, PermissionError):
pass
if stale:
try:
lock_path.unlink(missing_ok=True)
@@ -295,6 +330,25 @@ def release_scoped_lock(scope: str, identity: str) -> None:
pass
def release_all_scoped_locks() -> int:
"""Remove all scoped lock files in the lock directory.
Called during --replace to clean up stale locks left by stopped/killed
gateway processes that did not release their locks gracefully.
Returns the number of lock files removed.
"""
lock_dir = _get_lock_dir()
removed = 0
if lock_dir.exists():
for lock_file in lock_dir.glob("*.lock"):
try:
lock_file.unlink(missing_ok=True)
removed += 1
except OSError:
pass
return removed
def get_running_pid() -> Optional[int]:
"""Return the PID of a running gateway instance, or ``None``.
@@ -325,8 +379,9 @@ def get_running_pid() -> Optional[int]:
return None
if not _looks_like_gateway_process(pid):
remove_pid_file()
return None
if not _record_looks_like_gateway(record):
remove_pid_file()
return None
return pid

View File

@@ -9,9 +9,7 @@ Cache location: ~/.hermes/sticker_cache.json
"""
import json
import os
import time
from pathlib import Path
from typing import Optional
from hermes_cli.config import get_hermes_home

234
gateway/stream_consumer.py Normal file
View File

@@ -0,0 +1,234 @@
"""Gateway streaming consumer — bridges sync agent callbacks to async platform delivery.
The agent fires stream_delta_callback(text) synchronously from its worker thread.
GatewayStreamConsumer:
1. Receives deltas via on_delta() (thread-safe, sync)
2. Queues them to an asyncio task via queue.Queue
3. The async run() task buffers, rate-limits, and progressively edits
a single message on the target platform
Design: Uses the edit transport (send initial message, then editMessageText).
This is universally supported across Telegram, Discord, and Slack.
Credit: jobless0x (#774, #1312), OutThisLife (#798), clicksingh (#697).
"""
from __future__ import annotations
import asyncio
import logging
import queue
import re
import time
from dataclasses import dataclass
from typing import Any, Optional
logger = logging.getLogger("gateway.stream_consumer")
# Sentinel to signal the stream is complete
_DONE = object()
@dataclass
class StreamConsumerConfig:
"""Runtime config for a single stream consumer instance."""
edit_interval: float = 0.3
buffer_threshold: int = 40
cursor: str = ""
class GatewayStreamConsumer:
"""Async consumer that progressively edits a platform message with streamed tokens.
Usage::
consumer = GatewayStreamConsumer(adapter, chat_id, config, metadata=metadata)
# Pass consumer.on_delta as stream_delta_callback to AIAgent
agent = AIAgent(..., stream_delta_callback=consumer.on_delta)
# Start the consumer as an asyncio task
task = asyncio.create_task(consumer.run())
# ... run agent in thread pool ...
consumer.finish() # signal completion
await task # wait for final edit
"""
def __init__(
self,
adapter: Any,
chat_id: str,
config: Optional[StreamConsumerConfig] = None,
metadata: Optional[dict] = None,
):
self.adapter = adapter
self.chat_id = chat_id
self.cfg = config or StreamConsumerConfig()
self.metadata = metadata
self._queue: queue.Queue = queue.Queue()
self._accumulated = ""
self._message_id: Optional[str] = None
self._already_sent = False
self._edit_supported = True # Disabled on first edit failure (Signal/Email/HA)
self._last_edit_time = 0.0
self._last_sent_text = "" # Track last-sent text to skip redundant edits
@property
def already_sent(self) -> bool:
"""True if at least one message was sent/edited — signals the base
adapter to skip re-sending the final response."""
return self._already_sent
def on_delta(self, text: str) -> None:
"""Thread-safe callback — called from the agent's worker thread."""
if text:
self._queue.put(text)
def finish(self) -> None:
"""Signal that the stream is complete."""
self._queue.put(_DONE)
async def run(self) -> None:
"""Async task that drains the queue and edits the platform message."""
# Platform message length limit — leave room for cursor + formatting
_raw_limit = getattr(self.adapter, "MAX_MESSAGE_LENGTH", 4096)
_safe_limit = max(500, _raw_limit - len(self.cfg.cursor) - 100)
try:
while True:
# Drain all available items from the queue
got_done = False
while True:
try:
item = self._queue.get_nowait()
if item is _DONE:
got_done = True
break
self._accumulated += item
except queue.Empty:
break
# Decide whether to flush an edit
now = time.monotonic()
elapsed = now - self._last_edit_time
should_edit = (
got_done
or (elapsed >= self.cfg.edit_interval
and len(self._accumulated) > 0)
or len(self._accumulated) >= self.cfg.buffer_threshold
)
if should_edit and self._accumulated:
# Split overflow: if accumulated text exceeds the platform
# limit, finalize the current message and start a new one.
while (
len(self._accumulated) > _safe_limit
and self._message_id is not None
):
split_at = self._accumulated.rfind("\n", 0, _safe_limit)
if split_at < _safe_limit // 2:
split_at = _safe_limit
chunk = self._accumulated[:split_at]
await self._send_or_edit(chunk)
self._accumulated = self._accumulated[split_at:].lstrip("\n")
self._message_id = None
self._last_sent_text = ""
display_text = self._accumulated
if not got_done:
display_text += self.cfg.cursor
await self._send_or_edit(display_text)
self._last_edit_time = time.monotonic()
if got_done:
# Final edit without cursor
if self._accumulated and self._message_id:
await self._send_or_edit(self._accumulated)
return
await asyncio.sleep(0.05) # Small yield to not busy-loop
except asyncio.CancelledError:
# Best-effort final edit on cancellation
if self._accumulated and self._message_id:
try:
await self._send_or_edit(self._accumulated)
except Exception:
pass
except Exception as e:
logger.error("Stream consumer error: %s", e)
# Pattern to strip MEDIA:<path> tags (including optional surrounding quotes).
# Matches the simple cleanup regex used by the non-streaming path in
# gateway/platforms/base.py for post-processing.
_MEDIA_RE = re.compile(r'''[`"']?MEDIA:\s*\S+[`"']?''')
@staticmethod
def _clean_for_display(text: str) -> str:
"""Strip MEDIA: directives and internal markers from text before display.
The streaming path delivers raw text chunks that may include
``MEDIA:<path>`` tags and ``[[audio_as_voice]]`` directives meant for
the platform adapter's post-processing. The actual media files are
delivered separately via ``_deliver_media_from_response()`` after the
stream finishes — we just need to hide the raw directives from the
user.
"""
if "MEDIA:" not in text and "[[audio_as_voice]]" not in text:
return text
cleaned = text.replace("[[audio_as_voice]]", "")
cleaned = GatewayStreamConsumer._MEDIA_RE.sub("", cleaned)
# Collapse excessive blank lines left behind by removed tags
cleaned = re.sub(r'\n{3,}', '\n\n', cleaned)
# Strip trailing whitespace/newlines but preserve leading content
return cleaned.rstrip()
async def _send_or_edit(self, text: str) -> None:
"""Send or edit the streaming message."""
# Strip MEDIA: directives so they don't appear as visible text.
# Media files are delivered as native attachments after the stream
# finishes (via _deliver_media_from_response in gateway/run.py).
text = self._clean_for_display(text)
if not text.strip():
return
try:
if self._message_id is not None:
if self._edit_supported:
# Skip if text is identical to what we last sent
if text == self._last_sent_text:
return
# Edit existing message
result = await self.adapter.edit_message(
chat_id=self.chat_id,
message_id=self._message_id,
content=text,
)
if result.success:
self._already_sent = True
self._last_sent_text = text
else:
# If an edit fails mid-stream (especially Telegram flood control),
# stop progressive edits and let the normal final send path deliver
# the complete answer instead of leaving the user with a partial.
logger.debug("Edit failed, disabling streaming for this adapter")
self._edit_supported = False
self._already_sent = False
else:
# Editing not supported — skip intermediate updates.
# The final response will be sent by the normal path.
pass
else:
# First message — send new
result = await self.adapter.send(
chat_id=self.chat_id,
content=text,
metadata=self.metadata,
)
if result.success and result.message_id:
self._message_id = result.message_id
self._already_sent = True
self._last_sent_text = text
else:
# Initial send failed — disable streaming for this session
self._edit_supported = False
except Exception as e:
logger.error("Stream send/edit error: %s", e)

11
hermes
View File

@@ -1,12 +1,11 @@
#!/usr/bin/env python3
"""
Hermes Agent CLI Launcher
Hermes Agent CLI launcher.
This is a convenience wrapper to launch the Hermes CLI.
Usage: ./hermes [options]
This wrapper should behave like the installed `hermes` command, including
subcommands such as `gateway`, `cron`, and `doctor`.
"""
if __name__ == "__main__":
from cli import main
import fire
fire.Fire(main)
from hermes_cli.main import main
main()

View File

@@ -11,5 +11,5 @@ Provides subcommands for:
- hermes cron - Manage cron jobs
"""
__version__ = "0.2.0"
__release_date__ = "2026.3.12"
__version__ = "0.7.0"
__release_date__ = "2026.4.3"

File diff suppressed because it is too large Load Diff

493
hermes_cli/auth_commands.py Normal file
View File

@@ -0,0 +1,493 @@
"""Credential-pool auth subcommands."""
from __future__ import annotations
from getpass import getpass
import math
import time
from types import SimpleNamespace
import uuid
from agent.credential_pool import (
AUTH_TYPE_API_KEY,
AUTH_TYPE_OAUTH,
CUSTOM_POOL_PREFIX,
SOURCE_MANUAL,
STATUS_EXHAUSTED,
STRATEGY_FILL_FIRST,
STRATEGY_ROUND_ROBIN,
STRATEGY_RANDOM,
STRATEGY_LEAST_USED,
SUPPORTED_POOL_STRATEGIES,
PooledCredential,
_exhausted_until,
_normalize_custom_pool_name,
get_pool_strategy,
label_from_token,
list_custom_pool_providers,
load_pool,
)
import hermes_cli.auth as auth_mod
from hermes_cli.auth import PROVIDER_REGISTRY
from hermes_constants import OPENROUTER_BASE_URL
# Providers that support OAuth login in addition to API keys.
_OAUTH_CAPABLE_PROVIDERS = {"anthropic", "nous", "openai-codex"}
def _get_custom_provider_names() -> list:
"""Return list of (display_name, pool_key) tuples for custom_providers in config."""
try:
from hermes_cli.config import load_config
config = load_config()
except Exception:
return []
custom_providers = config.get("custom_providers")
if not isinstance(custom_providers, list):
return []
result = []
for entry in custom_providers:
if not isinstance(entry, dict):
continue
name = entry.get("name")
if not isinstance(name, str) or not name.strip():
continue
pool_key = f"{CUSTOM_POOL_PREFIX}{_normalize_custom_pool_name(name)}"
result.append((name.strip(), pool_key))
return result
def _resolve_custom_provider_input(raw: str) -> str | None:
"""If raw input matches a custom_providers entry name (case-insensitive), return its pool key."""
normalized = (raw or "").strip().lower().replace(" ", "-")
if not normalized:
return None
# Direct match on 'custom:name' format
if normalized.startswith(CUSTOM_POOL_PREFIX):
return normalized
for display_name, pool_key in _get_custom_provider_names():
if _normalize_custom_pool_name(display_name) == normalized:
return pool_key
return None
def _normalize_provider(provider: str) -> str:
normalized = (provider or "").strip().lower()
if normalized in {"or", "open-router"}:
return "openrouter"
# Check if it matches a custom provider name
custom_key = _resolve_custom_provider_input(normalized)
if custom_key:
return custom_key
return normalized
def _provider_base_url(provider: str) -> str:
if provider == "openrouter":
return OPENROUTER_BASE_URL
if provider.startswith(CUSTOM_POOL_PREFIX):
from agent.credential_pool import _get_custom_provider_config
cp_config = _get_custom_provider_config(provider)
if cp_config:
return str(cp_config.get("base_url") or "").strip()
return ""
pconfig = PROVIDER_REGISTRY.get(provider)
return pconfig.inference_base_url if pconfig else ""
def _oauth_default_label(provider: str, count: int) -> str:
return f"{provider}-oauth-{count}"
def _api_key_default_label(count: int) -> str:
return f"api-key-{count}"
def _display_source(source: str) -> str:
return source.split(":", 1)[1] if source.startswith("manual:") else source
def _format_exhausted_status(entry) -> str:
if entry.last_status != STATUS_EXHAUSTED:
return ""
reason = getattr(entry, "last_error_reason", None)
reason_text = f" {reason}" if isinstance(reason, str) and reason.strip() else ""
code = f" ({entry.last_error_code})" if entry.last_error_code else ""
exhausted_until = _exhausted_until(entry)
if exhausted_until is None:
return f" exhausted{reason_text}{code}"
remaining = max(0, int(math.ceil(exhausted_until - time.time())))
if remaining <= 0:
return f" exhausted{reason_text}{code} (ready to retry)"
minutes, seconds = divmod(remaining, 60)
hours, minutes = divmod(minutes, 60)
days, hours = divmod(hours, 24)
if days:
wait = f"{days}d {hours}h"
elif hours:
wait = f"{hours}h {minutes}m"
elif minutes:
wait = f"{minutes}m {seconds}s"
else:
wait = f"{seconds}s"
return f" exhausted{reason_text}{code} ({wait} left)"
def auth_add_command(args) -> None:
provider = _normalize_provider(getattr(args, "provider", ""))
if provider not in PROVIDER_REGISTRY and provider != "openrouter" and not provider.startswith(CUSTOM_POOL_PREFIX):
raise SystemExit(f"Unknown provider: {provider}")
requested_type = str(getattr(args, "auth_type", "") or "").strip().lower()
if requested_type in {AUTH_TYPE_API_KEY, "api-key"}:
requested_type = AUTH_TYPE_API_KEY
if not requested_type:
if provider.startswith(CUSTOM_POOL_PREFIX):
requested_type = AUTH_TYPE_API_KEY
else:
requested_type = AUTH_TYPE_OAUTH if provider in {"anthropic", "nous", "openai-codex"} else AUTH_TYPE_API_KEY
pool = load_pool(provider)
if requested_type == AUTH_TYPE_API_KEY:
token = (getattr(args, "api_key", None) or "").strip()
if not token:
token = getpass("Paste your API key: ").strip()
if not token:
raise SystemExit("No API key provided.")
default_label = _api_key_default_label(len(pool.entries()) + 1)
label = (getattr(args, "label", None) or "").strip()
if not label:
label = input(f"Label (optional, default: {default_label}): ").strip() or default_label
entry = PooledCredential(
provider=provider,
id=uuid.uuid4().hex[:6],
label=label,
auth_type=AUTH_TYPE_API_KEY,
priority=0,
source=SOURCE_MANUAL,
access_token=token,
base_url=_provider_base_url(provider),
)
pool.add_entry(entry)
print(f'Added {provider} credential #{len(pool.entries())}: "{label}"')
return
if provider == "anthropic":
from agent import anthropic_adapter as anthropic_mod
creds = anthropic_mod.run_hermes_oauth_login_pure()
if not creds:
raise SystemExit("Anthropic OAuth login did not return credentials.")
label = (getattr(args, "label", None) or "").strip() or label_from_token(
creds["access_token"],
_oauth_default_label(provider, len(pool.entries()) + 1),
)
entry = PooledCredential(
provider=provider,
id=uuid.uuid4().hex[:6],
label=label,
auth_type=AUTH_TYPE_OAUTH,
priority=0,
source=f"{SOURCE_MANUAL}:hermes_pkce",
access_token=creds["access_token"],
refresh_token=creds.get("refresh_token"),
expires_at_ms=creds.get("expires_at_ms"),
base_url=_provider_base_url(provider),
)
pool.add_entry(entry)
print(f'Added {provider} OAuth credential #{len(pool.entries())}: "{entry.label}"')
return
if provider == "nous":
creds = auth_mod._nous_device_code_login(
portal_base_url=getattr(args, "portal_url", None),
inference_base_url=getattr(args, "inference_url", None),
client_id=getattr(args, "client_id", None),
scope=getattr(args, "scope", None),
open_browser=not getattr(args, "no_browser", False),
timeout_seconds=getattr(args, "timeout", None) or 15.0,
insecure=bool(getattr(args, "insecure", False)),
ca_bundle=getattr(args, "ca_bundle", None),
min_key_ttl_seconds=max(60, int(getattr(args, "min_key_ttl_seconds", 5 * 60))),
)
label = (getattr(args, "label", None) or "").strip() or label_from_token(
creds.get("access_token", ""),
_oauth_default_label(provider, len(pool.entries()) + 1),
)
entry = PooledCredential.from_dict(provider, {
**creds,
"label": label,
"auth_type": AUTH_TYPE_OAUTH,
"source": f"{SOURCE_MANUAL}:device_code",
"base_url": creds.get("inference_base_url"),
})
pool.add_entry(entry)
print(f'Added {provider} OAuth credential #{len(pool.entries())}: "{entry.label}"')
return
if provider == "openai-codex":
creds = auth_mod._codex_device_code_login()
label = (getattr(args, "label", None) or "").strip() or label_from_token(
creds["tokens"]["access_token"],
_oauth_default_label(provider, len(pool.entries()) + 1),
)
entry = PooledCredential(
provider=provider,
id=uuid.uuid4().hex[:6],
label=label,
auth_type=AUTH_TYPE_OAUTH,
priority=0,
source=f"{SOURCE_MANUAL}:device_code",
access_token=creds["tokens"]["access_token"],
refresh_token=creds["tokens"].get("refresh_token"),
base_url=creds.get("base_url"),
last_refresh=creds.get("last_refresh"),
)
pool.add_entry(entry)
print(f'Added {provider} OAuth credential #{len(pool.entries())}: "{entry.label}"')
return
raise SystemExit(f"`hermes auth add {provider}` is not implemented for auth type {requested_type} yet.")
def auth_list_command(args) -> None:
provider_filter = _normalize_provider(getattr(args, "provider", "") or "")
if provider_filter:
providers = [provider_filter]
else:
providers = sorted({
*PROVIDER_REGISTRY.keys(),
"openrouter",
*list_custom_pool_providers(),
})
for provider in providers:
pool = load_pool(provider)
entries = pool.entries()
if not entries:
continue
current = pool.peek()
print(f"{provider} ({len(entries)} credentials):")
for idx, entry in enumerate(entries, start=1):
marker = " "
if current is not None and entry.id == current.id:
marker = ""
status = _format_exhausted_status(entry)
source = _display_source(entry.source)
print(f" #{idx} {entry.label:<20} {entry.auth_type:<7} {source}{status} {marker}".rstrip())
print()
def auth_remove_command(args) -> None:
provider = _normalize_provider(getattr(args, "provider", ""))
target = getattr(args, "target", None)
if target is None:
target = getattr(args, "index", None)
pool = load_pool(provider)
index, matched, error = pool.resolve_target(target)
if matched is None or index is None:
raise SystemExit(f"{error} Provider: {provider}.")
removed = pool.remove_index(index)
if removed is None:
raise SystemExit(f'No credential matching "{target}" for provider {provider}.')
print(f"Removed {provider} credential #{index} ({removed.label})")
# If this was an env-seeded credential, also clear the env var from .env
# so it doesn't get re-seeded on the next load_pool() call.
if removed.source.startswith("env:"):
env_var = removed.source[len("env:"):]
if env_var:
from hermes_cli.config import remove_env_value
cleared = remove_env_value(env_var)
if cleared:
print(f"Cleared {env_var} from .env")
def auth_reset_command(args) -> None:
provider = _normalize_provider(getattr(args, "provider", ""))
pool = load_pool(provider)
count = pool.reset_statuses()
print(f"Reset status on {count} {provider} credentials")
def _interactive_auth() -> None:
"""Interactive credential pool management when `hermes auth` is called bare."""
# Show current pool status first
print("Credential Pool Status")
print("=" * 50)
auth_list_command(SimpleNamespace(provider=None))
print()
# Main menu
choices = [
"Add a credential",
"Remove a credential",
"Reset cooldowns for a provider",
"Set rotation strategy for a provider",
"Exit",
]
print("What would you like to do?")
for i, choice in enumerate(choices, 1):
print(f" {i}. {choice}")
try:
raw = input("\nChoice: ").strip()
except (EOFError, KeyboardInterrupt):
return
if not raw or raw == str(len(choices)):
return
if raw == "1":
_interactive_add()
elif raw == "2":
_interactive_remove()
elif raw == "3":
_interactive_reset()
elif raw == "4":
_interactive_strategy()
def _pick_provider(prompt: str = "Provider") -> str:
"""Prompt for a provider name with auto-complete hints."""
known = sorted(set(list(PROVIDER_REGISTRY.keys()) + ["openrouter"]))
custom_names = _get_custom_provider_names()
if custom_names:
custom_display = [name for name, _key in custom_names]
print(f"\nKnown providers: {', '.join(known)}")
print(f"Custom endpoints: {', '.join(custom_display)}")
else:
print(f"\nKnown providers: {', '.join(known)}")
try:
raw = input(f"{prompt}: ").strip()
except (EOFError, KeyboardInterrupt):
raise SystemExit()
return _normalize_provider(raw)
def _interactive_add() -> None:
provider = _pick_provider("Provider to add credential for")
if provider not in PROVIDER_REGISTRY and provider != "openrouter" and not provider.startswith(CUSTOM_POOL_PREFIX):
raise SystemExit(f"Unknown provider: {provider}")
# For OAuth-capable providers, ask which type
if provider in _OAUTH_CAPABLE_PROVIDERS:
print(f"\n{provider} supports both API keys and OAuth login.")
print(" 1. API key (paste a key from the provider dashboard)")
print(" 2. OAuth login (authenticate via browser)")
try:
type_choice = input("Type [1/2]: ").strip()
except (EOFError, KeyboardInterrupt):
return
if type_choice == "2":
auth_type = "oauth"
else:
auth_type = "api_key"
else:
auth_type = "api_key"
label = None
try:
typed_label = input("Label / account name (optional): ").strip()
except (EOFError, KeyboardInterrupt):
return
if typed_label:
label = typed_label
auth_add_command(SimpleNamespace(
provider=provider, auth_type=auth_type, label=label, api_key=None,
portal_url=None, inference_url=None, client_id=None, scope=None,
no_browser=False, timeout=None, insecure=False, ca_bundle=None,
))
def _interactive_remove() -> None:
provider = _pick_provider("Provider to remove credential from")
pool = load_pool(provider)
if not pool.has_credentials():
print(f"No credentials for {provider}.")
return
# Show entries with indices
for i, e in enumerate(pool.entries(), 1):
exhausted = _format_exhausted_status(e)
print(f" #{i} {e.label:25s} {e.auth_type:10s} {e.source}{exhausted} [id:{e.id}]")
try:
raw = input("Remove #, id, or label (blank to cancel): ").strip()
except (EOFError, KeyboardInterrupt):
return
if not raw:
return
auth_remove_command(SimpleNamespace(provider=provider, target=raw))
def _interactive_reset() -> None:
provider = _pick_provider("Provider to reset cooldowns for")
auth_reset_command(SimpleNamespace(provider=provider))
def _interactive_strategy() -> None:
provider = _pick_provider("Provider to set strategy for")
current = get_pool_strategy(provider)
strategies = [STRATEGY_FILL_FIRST, STRATEGY_ROUND_ROBIN, STRATEGY_LEAST_USED, STRATEGY_RANDOM]
print(f"\nCurrent strategy for {provider}: {current}")
print()
descriptions = {
STRATEGY_FILL_FIRST: "Use first key until exhausted, then next",
STRATEGY_ROUND_ROBIN: "Cycle through keys evenly",
STRATEGY_LEAST_USED: "Always pick the least-used key",
STRATEGY_RANDOM: "Random selection",
}
for i, s in enumerate(strategies, 1):
marker = "" if s == current else ""
print(f" {i}. {s:15s}{descriptions.get(s, '')}{marker}")
try:
raw = input("\nStrategy [1-4]: ").strip()
except (EOFError, KeyboardInterrupt):
return
if not raw:
return
try:
idx = int(raw) - 1
strategy = strategies[idx]
except (ValueError, IndexError):
print("Invalid choice.")
return
from hermes_cli.config import load_config, save_config
cfg = load_config()
pool_strategies = cfg.get("credential_pool_strategies") or {}
if not isinstance(pool_strategies, dict):
pool_strategies = {}
pool_strategies[provider] = strategy
cfg["credential_pool_strategies"] = pool_strategies
save_config(cfg)
print(f"Set {provider} strategy to: {strategy}")
def auth_command(args) -> None:
action = getattr(args, "auth_action", "")
if action == "add":
auth_add_command(args)
return
if action == "list":
auth_list_command(args)
return
if action == "remove":
auth_remove_command(args)
return
if action == "reset":
auth_reset_command(args)
return
# No subcommand — launch interactive mode
_interactive_auth()

Some files were not shown because too many files have changed in this diff Show More