Compare commits

...

157 Commits

Author SHA1 Message Date
Teknium 00c65280c4 feat(xai): add video generation, image editing, and X search tools
Cherry-picked from PR #10600 by Jaaneek — the media/search tool additions,
separated from the core provider upgrade (PR #10783).

NOTE: Depends on PR #10783 being merged first (for xai_http.py, codex_responses
transport, and XAI_API_KEY env var).

- Add video generation tool (generate, edit, extend) with async polling
- Add xAI image generation/editing backend alongside FAL
- Add X search tool backed by xAI Responses API
- Add x_search and video_gen toolset definitions
- Add CONFIGURABLE_TOOLSETS entries for tools_config UI
- Wire into safe and api-server toolsets
- Add test coverage for all new tools

Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>
2026-04-15 22:44:40 -07:00
Teknium 3ff18ffe14 fix: add circuit breaker to MCP tool handler to prevent retry burn loops (#10447) (#10776)
When an MCP server returns errors consistently (crashed, disconnected,
auth expired), the model sees each error and retries the tool call.
With no circuit breaker, this burned through all 90 iterations — each
one a full LLM API call plus failed MCP call — producing 15-45 minutes
of zero useful output while the gateway inactivity timeout never fired
(because the agent WAS active, just uselessly).

Fix: track consecutive error counts per MCP server. After 3 consecutive
failures (connection errors, MCP-level errors, or transport exceptions),
the handler short-circuits with a message telling the model to stop
retrying and use alternative approaches. The counter resets to 0 on
any successful call.

Closes #10447
2026-04-15 22:33:48 -07:00
Teknium 36b54afbc4 feat(plugins): add dispatch_tool() to PluginContext (#10763)
Expands the plugin interface so slash command handlers can dispatch tool
calls through the registry with parent agent context wired up automatically.

This is the public API for plugins that need to orchestrate tools like
delegate_task — they call ctx.dispatch_tool() instead of reaching into
framework internals. The parent agent is resolved lazily from _cli_ref
when available (CLI mode) and omitted in gateway mode (tools degrade
gracefully).

Enables the hermes-deliver-plugin pattern where /deliver and /fanout
slash commands spawn subagents via delegate_task without touching the
agent conversation loop.

7 new tests covering: registry delegation, parent_agent injection from
cli_ref, gateway mode (no cli_ref), uninitialized agent, explicit
parent_agent override, kwargs forwarding, return value passthrough.
2026-04-15 22:23:01 -07:00
Teknium 9b7bd4ca61 docs: add missing pages to sidebar navigation (#10758)
* feat: implement register_command() on plugin context

Complete the half-built plugin slash command system. The dispatch
code in cli.py and gateway/run.py already called
get_plugin_command_handler() but the registration side was never
implemented.

Changes:
- Add register_command() to PluginContext — stores handler,
  description, and plugin name; normalizes names; rejects conflicts
  with built-in commands
- Add _plugin_commands dict to PluginManager
- Add commands_registered tracking on LoadedPlugin
- Add get_plugin_command_handler() and get_plugin_commands()
  module-level convenience functions
- Fix commands.py to use actual plugin description in Telegram
  bot menu (was hardcoded 'Plugin command')
- Add plugin commands to SlashCommandCompleter autocomplete
- Show command count in /plugins display
- 12 new tests covering registration, conflict detection,
  normalization, handler dispatch, and introspection

Closes #10495

* docs: add register_command() to plugin guides

- Build a Plugin guide: new 'Register slash commands' section with
  full API reference, comparison table vs register_cli_command(),
  sync/async examples, and conflict protection docs
- Features/Plugins page: add slash commands to capabilities table
  and plugin types summary

* docs: add missing pages to sidebar navigation

- guides/aws-bedrock → Guides & Tutorials
- user-guide/features/credential-pools → Integrations
2026-04-15 22:22:43 -07:00
Teknium 8a246910bf fix: reject startup when no provider configured instead of silent OpenRouter fallback (#10766)
When no provider was set in config.yaml and auto-detection found no
credentials, the agent silently fell back to bare OPENROUTER_API_KEY
from the environment and sent the configured model name to OpenRouter.
This produced undefined behavior -- wrong provider, wrong model routing,
and auxiliary tasks (compression, vision) hitting the wrong endpoint.

Fix: replace the silent fallback with a hard RuntimeError telling
the user to run hermes model or hermes setup. The provider must
be explicitly configured -- env vars are for secrets, not config.
2026-04-15 22:22:07 -07:00
leeyang1990 c5acc6edb6 feat(telegram): add dedicated TELEGRAM_PROXY env var and config.yaml proxy_url support
Pass platform_env_var="TELEGRAM_PROXY" to resolve_proxy_url() in both
telegram.py (main connect) and telegram_network.py (fallback transport),
so a Telegram-specific proxy takes priority over the generic HTTPS_PROXY.

Also bridge telegram.proxy_url from config.yaml to the TELEGRAM_PROXY
env var (env var takes precedence if both are set), add OPTIONAL_ENV_VARS
entry, docs, and tests.

Composite salvage of four community PRs:
- Core approach (both call sites): #9414 by @leeyang1990
- config.yaml bridging + docs: #6530 by @WhiteWorld
- Naming convention: #9074 by @brantzh6
- Earlier proxy work: #7786 by @ten-ltw

Closes #9414, closes #9074, closes #7786, closes #6530

Co-authored-by: WhiteWorld <WhiteWorld@users.noreply.github.com>
Co-authored-by: brantzh6 <brantzh6@users.noreply.github.com>
Co-authored-by: ten-ltw <ten-ltw@users.noreply.github.com>
2026-04-15 22:13:11 -07:00
kshitijk4poor ff5bf0d6c8 fix(tests): resolve CI test failures — pool auto-seeding, stale assertions, mock isolation
Salvaged from PR #10643 by kshitijk4poor, updated for current main.

Root causes fixed:
1. Telegram xdist mock pollution — new tests/gateway/conftest.py with shared
   mock that runs at collection time (prevents ChatType=None caching)
2. VIRTUAL_ENV env var leak — monkeypatch.delenv in _detect_venv_dir tests
3. Copilot base_url missing — add fallback in _resolve_runtime_from_pool_entry
4. Stale vision model assertion — zai now uses glm-5v-turbo
5. Reasoning item id intentionally stripped — assert 'id' not in (store=False)
6. Context length warning unreachable — pass base_url to AIAgent in test
7. Kimi provider label updated — 'Kimi / Kimi Coding Plan' matches models.py
8. Google Workspace calendar tests — rewritten for current production code,
   properly mock subprocess on api_module, removed stale +agenda assertions
9. Credential pool auto-seeding — mock _select_pool_entry / _resolve_auto /
   _import_codex_cli_tokens to prevent real credentials from leaking into tests
2026-04-15 22:05:21 -07:00
Austin Pickett cedaefce9e Merge pull request #10704 from NousResearch/revert-10686-feat/vercel-deployment
Revert "feat: add vercel deployment, remove old landing page"
2026-04-15 20:30:31 -07:00
Austin Pickett 4683b97d92 Revert "feat: add vercel deployment, remove old landing page (#10686)"
This reverts commit 51d5c76488.
2026-04-15 23:29:41 -04:00
Austin Pickett 51d5c76488 feat: add vercel deployment, remove old landing page (#10686) 2026-04-15 20:12:52 -07:00
Teknium fb903b8f08 docs: document register_command() for plugin slash commands (#10671)
* feat: implement register_command() on plugin context

Complete the half-built plugin slash command system. The dispatch
code in cli.py and gateway/run.py already called
get_plugin_command_handler() but the registration side was never
implemented.

Changes:
- Add register_command() to PluginContext — stores handler,
  description, and plugin name; normalizes names; rejects conflicts
  with built-in commands
- Add _plugin_commands dict to PluginManager
- Add commands_registered tracking on LoadedPlugin
- Add get_plugin_command_handler() and get_plugin_commands()
  module-level convenience functions
- Fix commands.py to use actual plugin description in Telegram
  bot menu (was hardcoded 'Plugin command')
- Add plugin commands to SlashCommandCompleter autocomplete
- Show command count in /plugins display
- 12 new tests covering registration, conflict detection,
  normalization, handler dispatch, and introspection

Closes #10495

* docs: add register_command() to plugin guides

- Build a Plugin guide: new 'Register slash commands' section with
  full API reference, comparison table vs register_cli_command(),
  sync/async examples, and conflict protection docs
- Features/Plugins page: add slash commands to capabilities table
  and plugin types summary
2026-04-15 19:55:25 -07:00
Teknium 498b995c13 feat: implement register_command() on plugin context (#10626)
Complete the half-built plugin slash command system. The dispatch
code in cli.py and gateway/run.py already called
get_plugin_command_handler() but the registration side was never
implemented.

Changes:
- Add register_command() to PluginContext — stores handler,
  description, and plugin name; normalizes names; rejects conflicts
  with built-in commands
- Add _plugin_commands dict to PluginManager
- Add commands_registered tracking on LoadedPlugin
- Add get_plugin_command_handler() and get_plugin_commands()
  module-level convenience functions
- Fix commands.py to use actual plugin description in Telegram
  bot menu (was hardcoded 'Plugin command')
- Add plugin commands to SlashCommandCompleter autocomplete
- Show command count in /plugins display
- 12 new tests covering registration, conflict detection,
  normalization, handler dispatch, and introspection

Closes #10495
2026-04-15 19:53:11 -07:00
Teknium df714add9d fix: preserve file permissions on atomic writes (Docker/NAS fix) (#10618)
atomic_yaml_write() and atomic_json_write() used tempfile.mkstemp()
which creates files with 0o600 (owner-only). After os.replace(), the
original file's permissions were destroyed. Combined with _secure_file()
forcing 0o600, this broke Docker/NAS setups where volume-mounted config
files need broader permissions (e.g. 0o666).

Changes:
- atomic_yaml_write/atomic_json_write: capture original permissions
  before write, restore after os.replace()
- _secure_file: skip permission tightening in container environments
  (detected via /.dockerenv, /proc/1/cgroup, or HERMES_SKIP_CHMOD env)
- save_env_value: preserve original .env permissions, remove redundant
  third os.chmod call
- remove_env_value: same permission preservation

On desktop installs, _secure_file() still tightens to 0o600 as before.
In containers, the user's original permissions are respected.

Reported by Cedric Weber (Docker/Portainer on NAS).
2026-04-15 19:52:46 -07:00
Teknium cc6e8941db feat(honcho): context injection overhaul, 5-tool surface, cost safety, session isolation (#10619)
Salvaged from PR #9884 by erosika. Cherry-picked plugin changes onto
current main with minimal core modifications.

Plugin changes (plugins/memory/honcho/):
- New honcho_reasoning tool (5th tool, splits LLM calls from honcho_context)
- Two-layer context injection: base context (summary + representation + card)
  on contextCadence, dialectic supplement on dialecticCadence
- Multi-pass dialectic depth (1-3 passes) with early bail-out on strong signal
- Cold/warm prompt selection based on session state
- dialecticCadence defaults to 3 (was 1) — ~66% fewer Honcho LLM calls
- Session summary injection for conversational continuity
- Bidirectional peer targeting on all 5 tools
- Correctness fixes: peer param fallback, None guard on set_peer_card,
  schema validation, signal_sufficient anchored regex, mid->medium level fix

Core changes (~20 lines across 3 files):
- agent/memory_manager.py: Enhanced sanitize_context() to strip full
  <memory-context> blocks and system notes (prevents leak from saveMessages)
- run_agent.py: gateway_session_key param for stable per-chat Honcho sessions,
  on_turn_start() call before prefetch_all() for cadence tracking,
  sanitize_context() on user messages to strip leaked memory blocks
- gateway/run.py: skip_memory=True on 2 temp agents (prevents orphan sessions),
  gateway_session_key threading to main agent

Tests: 509 passed (3 skipped — honcho SDK not installed locally)
Docs: Updated honcho.md, memory-providers.md, tools-reference.md, SKILL.md

Co-authored-by: erosika <erosika@users.noreply.github.com>
2026-04-15 19:12:19 -07:00
Kovyrin Family Claw 00ff9a26cd Fix Telegram link preview suppression for bot sends 2026-04-15 17:54:43 -07:00
Oleksiy Kovyrin 192ef00bb2 docs(config): document telegram link preview setting 2026-04-15 17:54:43 -07:00
Oleksiy Kovyrin 5221ff9ed1 fix(telegram): tolerate bare adapters in link preview helper 2026-04-15 17:54:43 -07:00
Kovyrin Family Claw aea3499e56 feat(telegram): add config option to disable link previews 2026-04-15 17:54:43 -07:00
root 06d6903d3c fix(telegram): escape Markdown special chars in send_exec_approval
The command preview and description were wrapped in Markdown v1 inline
code (backticks) without escaping, causing Telegram API parse errors
when the command itself contained backticks or asterisks.

Fixes: 'Can't parse entities: can't find end of the entity'
2026-04-15 17:54:36 -07:00
jneeee 4936b19144 fix(cron): guard telegram import in _send_to_platform against ImportError
Wrap the TelegramAdapter import in _send_to_platform() with a try/except
ImportError guard, matching the existing Feishu pattern in the same function.

When python-telegram-bot is not installed, the import no longer crashes the
cron scheduler. Instead, MAX_MESSAGE_LENGTH falls back to a hardcoded 4096.

The _send_telegram() function already had its own ImportError guard for the
telegram package; this fixes the remaining bare import of TelegramAdapter
in the platform-routing function.
2026-04-15 17:54:33 -07:00
Mil Wang (from Dev Box) 63548e4fe1 fix: validate Telegram bot token format during gateway setup (#9843)
The setup wizard accepted any string as a Telegram bot token without
validation. Invalid tokens were only caught at runtime when the gateway
failed to connect, with no clear error message.

Add regex validation for the expected format (<numeric_id>:<hash>) and
loop until a valid token is entered or the user cancels.
2026-04-15 17:54:19 -07:00
Roque 92a23479c0 fix(model-switch): normalize Unicode dashes from Telegram/iOS input
Telegram on iOS auto-converts double hyphens (--) to em dashes (—)
or en dashes (–) via autocorrect. This breaks /model flag parsing
since parse_model_flags() only recognizes literal '--provider' and
'--global'.

When the flag isn't parsed, the entire string (e.g. 'glm-5.1 —provider zai')
gets treated as the model name and fails with 'Model names cannot
contain spaces.'

Fix: normalize Unicode dashes (U+2012-U+2015) to '--' when they
appear before flag keywords (provider, global), before flag extraction.

The existing test suite in test_model_switch_provider_routing.py
already covers all four dash variants — this commit adds the code
that makes them pass.
2026-04-15 17:54:16 -07:00
flobo3 c6398fcaab fix(prompt): list all supported Telegram markdown formatting 2026-04-15 17:54:13 -07:00
helix4u e7c61baaa1 fix: include telegram dependency in termux bundle 2026-04-15 17:54:10 -07:00
cuyua9 5d3a81408d docs: document Telegram ignored threads 2026-04-15 17:54:07 -07:00
Xowiek 21cd3a3fc0 fix(profile): use existing get_active_profile_name() for /profile command
Replace inline Path.home() / '.hermes' / 'profiles' detection in both CLI
and gateway /profile handlers with the existing get_active_profile_name()
from hermes_cli.profiles — which already handles custom-root deployments,
standard profiles, and Docker layouts.

Fixes /profile incorrectly reporting 'default' when HERMES_HOME points to
a custom-root profile path like /opt/data/profiles/coder.

Based on PR #10484 by Xowiek.
2026-04-15 17:52:03 -07:00
Xowiek 77435c4f13 fix(gateway): use profile-aware Hermes paths in runtime hints 2026-04-15 17:52:03 -07:00
Teknium 5ef0fe1665 docs: fix stale hermes login references in hermes-agent skill (#10603)
Follow-up to #10471 — replace remaining 'hermes login --provider'
references with current 'hermes auth' flow.
2026-04-15 17:43:54 -07:00
Teknium c850a40e4e fix: gate Matrix adapter path on media_files presence
Text-only Matrix sends should continue using the lightweight _send_matrix()
HTTP helper (~100ms). Only route through the heavy MatrixAdapter (full sync +
E2EE setup) when media files are present. Adds test verifying text-only
messages don't take the adapter path.
2026-04-15 17:37:43 -07:00
Teknium 276ed5c399 fix(send_message): deliver Matrix media via adapter
Matrix media delivery was silently dropped by send_message because Matrix
wasn't wired into the native adapter-backed media path. Only Telegram,
Discord, and Weixin had native media support.

Adds _send_matrix_via_adapter() which creates a MatrixAdapter instance,
connects, sends text + media via the adapter's native upload methods
(send_document, send_image_file, send_video, send_voice), then disconnects.

Also fixes a stale URL-encoding assertion in test_send_message_missing_platforms
that broke after PR #10151 added quote() to room IDs.

Cherry-picked from PR #10486 by helix4u.
2026-04-15 17:37:43 -07:00
Joshua Santos 55c8098601 docs: update openai-codex setup reference (#10471)
Fixes stale openai-codex onboarding reference in cli-config.yaml.example
2026-04-15 17:37:05 -07:00
Teknium b750c720cd fix: three CLI quality-of-life fixes (#10468, #10230, #10526, #9545) (#10599)
Three independent fixes batched together:

1. hermes auth add crashes on non-interactive stdin (#10468)
   input() for the label prompt was called without checking isatty().
   In scripted/CI environments this raised EOFError. Fix: check
   sys.stdin.isatty() and fall back to the computed default label.

2. Subcommand help prints twice (#10230)
   'hermes dashboard -h' printed help text twice because the
   SystemExit(0) from argparse was caught by the fallback retry
   logic, which re-parsed and printed help again. Fix: re-raise
   SystemExit with code 0 (help/version) immediately.

3. Duplicate entries in /model picker (#10526, #9545)
   - Kimi showed 2x because kimi-coding and kimi-coding-cn both
     mapped to the same models.dev ID. Fix: track seen mdev_ids
     and skip aliases.
   - Providers could show 2-3x from case-variant slugs across the
     four loading paths. Fix: normalize all seen_slugs membership
     checks and insertions to lowercase.

Closes #10468, #10230, #10526, #9545
2026-04-15 17:34:15 -07:00
Teknium a6ad8ace29 chore: add handsdiff to AUTHOR_MAP 2026-04-15 17:26:31 -07:00
handsdiff 933fbd8fea fix: prevent agent hang when backgrounding processes via terminal tool
bash -lic with a PTY enables job control (set -m), which waits for all
background jobs before the shell exits. A command like
`python3 -m http.server &>/dev/null &` hangs forever because the shell
never completes.

Prefix `set +m;` to disable job control while keeping -i for .bashrc
sourcing and PTY for interactive tools.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 17:26:31 -07:00
Greer Guthrie 33ff29dfae fix(gateway): defer background review notifications until after main reply
Background review notifications ("💾 Skill created", "💾 Memory updated")
could race ahead of the main assistant reply in chat, making it look like
the agent stopped after creating a skill.

Gate bg-review notifications behind a threading.Event + pending queue.
Register a release callback on the adapter's _post_delivery_callbacks dict
so base.py's finally block fires it after the main response is delivered.

The queued-message path in _run_agent pops and calls the callback directly
to prevent double-fire.

Co-authored-by: Hermes Agent <hermes@nousresearch.com>
Closes #10541
2026-04-15 17:23:15 -07:00
Teknium 44941f0ed1 fix: activate WeCom callback message deduplication (#10305) (#10588)
WecomCallbackAdapter declared a _seen_messages dict and
MESSAGE_DEDUP_TTL_SECONDS constant but never actually checked
them in _handle_callback(). WeCom retries callback deliveries
on timeout, and each retry with the same MsgId was treated as
a fresh message and queued for processing.

Fix: check _seen_messages before enqueuing. Uses the same TTL-
based pattern as MessageDeduplicator (fixed in #10306) — check
age before returning duplicate, prune on overflow.

Closes #10305
2026-04-15 17:22:58 -07:00
Teknium 4fdcae6c91 fix: use absolute skill_dir for external skills (#10313) (#10587)
_load_skill_payload() reconstructed skill_dir as SKILLS_DIR / relative_path,
which is wrong for external skills from skills.external_dirs — they live
outside SKILLS_DIR entirely. Scripts and linked files failed to load.

Fix: skill_view() now includes the absolute skill_dir in its result dict.
_load_skill_payload() uses that directly when available, falling back to
the SKILLS_DIR-relative reconstruction only for legacy responses.

Closes #10313
2026-04-15 17:22:55 -07:00
shin4 63d045b51a fix: pass HERMES_HOME to execute_code subprocess (#6644)
Add "HERMES_" to _SAFE_ENV_PREFIXES in code_execution_tool.py so HERMES_HOME and other Hermes env vars pass through to execute_code subprocesses. Fixes vision_analyze and other tools that rely on get_hermes_home() failing in Docker environments with non-default HERMES_HOME.

Authored by @shin4.
2026-04-15 17:13:11 -07:00
Teknium e402906d48 fix: five HERMES_HOME profile-isolation leaks (#10570)
* fix: show correct env var name in provider API key error (#9506)

The error message for missing provider API keys dynamically built
the env var name as PROVIDER_API_KEY (e.g. ALIBABA_API_KEY), but
some providers use different names (alibaba uses DASHSCOPE_API_KEY).
Users following the error message set the wrong variable.

Fix: look up the actual env var from PROVIDER_REGISTRY before
building the error. Falls back to the dynamic name if the registry
lookup fails.

Closes #9506

* fix: five HERMES_HOME profile-isolation leaks (#5947)

Bug A: Thread session_title from session_db to memory provider init kwargs
so honcho can derive chat-scoped session keys instead of falling back to
cwd-based naming that merges all gateway users into one session.

Bug B: Replace 14 hardcoded ~/.hermes/skills/ paths across 10 skill files
with HERMES_HOME-aware alternatives (${HERMES_HOME:-$HOME/.hermes} in
shell, os.environ.get('HERMES_HOME', ...) in Python).

Bug C: install.sh now respects HERMES_HOME env var and adds --hermes-home
flag. Previously --dir only set INSTALL_DIR while HERMES_HOME was always
hardcoded to $HOME/.hermes.

Bug D: Remove hardcoded ~/.hermes/honcho.json fallback in resolve_config_path().
Non-default profiles no longer silently inherit the default profile's honcho
config. Falls through to ~/.honcho/config.json (global) instead.

Bug E: Guard _edit_skill, _patch_skill, _delete_skill, _write_file, and
_remove_file against writing to skills found in external_dirs. Skills
outside the local SKILLS_DIR are now read-only from the agent's perspective.

Closes #5947
2026-04-15 17:09:41 -07:00
Teknium c483b4ceca fix: use POSIX ps -A instead of BSD -ax for Docker compat (#9723) (#10569)
procps-ng 4.0.4 in Docker rejects BSD-style 'ps eww -ax' with a
'must set personality' error, causing find_gateway_pids() to return
empty and falsely report the gateway as not running.

Fix: replace 'ps eww -ax' with 'ps -A eww'. -A is the POSIX
equivalent of BSD -ax (select all processes), and the eww modifiers
(show environment + wide output) still work as BSD flags alongside
the POSIX -A flag. This preserves the HERMES_HOME= environment
visibility needed for profile-aware PID matching.

Closes #9723
2026-04-15 17:07:22 -07:00
Teknium 9d9b424390 fix: Nous Portal rate limit guard — prevent retry amplification (#10568)
When Nous returns a 429, the retry amplification chain burns up to 9
API requests per conversation turn (3 SDK retries × 3 Hermes retries),
each counting against RPH and deepening the rate limit. With multiple
concurrent sessions (cron + gateway + auxiliary), this creates a spiral
where retries keep the limit tapped indefinitely.

New module: agent/nous_rate_guard.py
- Shared file-based rate limit state (~/.hermes/rate_limits/nous.json)
- Parses reset time from x-ratelimit-reset-requests-1h, x-ratelimit-
  reset-requests, retry-after headers, or error context
- Falls back to 5-minute default cooldown if no header data
- Atomic writes (tempfile + rename) for cross-process safety
- Auto-cleanup of expired state files

run_agent.py changes:
- Top-of-retry-loop guard: when another session already recorded Nous
  as rate-limited, skip the API call entirely. Try fallback provider
  first, then return a clear message with the reset time.
- On 429 from Nous: record rate limit state and skip further retries
  (sets retry_count = max_retries to trigger fallback path)
- On success from Nous: clear the rate limit state so other sessions
  know they can resume

auxiliary_client.py changes:
- _try_nous() checks rate guard before attempting Nous in the auxiliary
  fallback chain. When rate-limited, returns (None, None) so the chain
  skips to the next provider instead of piling more requests onto Nous.

This eliminates three sources of amplification:
1. Hermes-level retries (saves 6 of 9 calls per turn)
2. Cross-session retries (cron + gateway all skip Nous)
3. Auxiliary fallback to Nous (compression/session_search skip too)

Includes 24 tests covering the rate guard module, header parsing,
state lifecycle, and auxiliary client integration.
2026-04-15 16:31:48 -07:00
Teknium 0d05bd34f8 feat: extend channel_prompts to Telegram, Slack, and Mattermost
Extract resolve_channel_prompt() shared helper into
gateway/platforms/base.py. Refactor Discord to use it.
Wire channel_prompts into Telegram (groups + forum topics),
Slack (channels), and Mattermost (channels).

Config bridging now applies to all platforms (not just Discord).
Added channel_prompts defaults to telegram/slack/mattermost
config sections.

Docs added to all four platform pages with platform-specific
examples (topic inheritance for Telegram, channel IDs for Slack,
etc.).
2026-04-15 16:31:28 -07:00
Teknium 620c296b1d fix: discord mock setup and AUTHOR_MAP for channel_prompts tests
Move _ensure_discord_mock() from module level to _make_adapter() so it
doesn't poison sys.modules for other discord test files. Use
types.ModuleType instead of MagicMock for the mock module to avoid
auto-generated __file__ attribute confusing hasattr checks.

Add BrennerSpear to AUTHOR_MAP.
2026-04-15 16:31:28 -07:00
Brenner Spear 90a6336145 fix: remove redundant key normalization and defensive getattr in channel_prompts
- Remove double str() normalization in _resolve_channel_prompt since
  config bridging already handles numeric YAML key conversion
- Remove dead prompts.get(str(key)) fallback that could never match
  after keys were already normalized to strings
- Replace getattr(event, "channel_prompt", None) with direct attribute
  access since channel_prompt is a declared dataclass field
- Update test to verify normalization responsibility lives in config bridging
2026-04-15 16:31:28 -07:00
Brenner Spear 2fbdc2c8fa feat(discord): add channel_prompts config
Add native Discord channel_prompts support with parent forum fallback,
ephemeral runtime injection, config migration updates, docs, and tests.
2026-04-15 16:31:28 -07:00
Teknium 2918328009 fix: show correct env var name in provider API key error (#9506) (#10563)
The error message for missing provider API keys dynamically built
the env var name as PROVIDER_API_KEY (e.g. ALIBABA_API_KEY), but
some providers use different names (alibaba uses DASHSCOPE_API_KEY).
Users following the error message set the wrong variable.

Fix: look up the actual env var from PROVIDER_REGISTRY before
building the error. Falls back to the dynamic name if the registry
lookup fails.

Closes #9506
2026-04-15 16:31:08 -07:00
JiaDe WU 0cb8c51fa5 feat: native AWS Bedrock provider via Converse API
Salvaged from PR #7920 by JiaDe-Wu — cherry-picked Bedrock-specific
additions onto current main, skipping stale-branch reverts (293 commits
behind).

Dual-path architecture:
  - Claude models → AnthropicBedrock SDK (prompt caching, thinking budgets)
  - Non-Claude models → Converse API via boto3 (Nova, DeepSeek, Llama, Mistral)

Includes:
  - Core adapter (agent/bedrock_adapter.py, 1098 lines)
  - Full provider registration (auth, models, providers, config, runtime, main)
  - IAM credential chain + Bedrock API Key auth modes
  - Dynamic model discovery via ListFoundationModels + ListInferenceProfiles
  - Streaming with delta callbacks, error classification, guardrails
  - hermes doctor + hermes auth integration
  - /usage pricing for 7 Bedrock models
  - 130 automated tests (79 unit + 28 integration + follow-up fixes)
  - Documentation (website/docs/guides/aws-bedrock.md)
  - boto3 optional dependency (pip install hermes-agent[bedrock])

Co-authored-by: JiaDe WU <40445668+JiaDe-Wu@users.noreply.github.com>
2026-04-15 16:17:17 -07:00
Teknium 21afc9502a fix: respect explicit api_mode for custom GPT-5 endpoints (#10473) (#10548)
The GPT-5 auto-upgrade logic unconditionally overrode api_mode to
codex_responses for any model starting with gpt-5, even when the
user explicitly set api_mode=chat_completions. Custom proxies that
serve GPT-5 via /chat/completions became unusable.

Fix: check api_mode is None before the override fires. If the caller
passed any explicit api_mode, it is final -- no auto-upgrade.

Closes #10473
2026-04-15 16:10:56 -07:00
MestreY0d4-Uninter f4724803b4 fix(runtime): surface malformed proxy env and base URL before client init
When proxy env vars (HTTP_PROXY, HTTPS_PROXY, ALL_PROXY) contain
malformed URLs — e.g. 'http://127.0.0.1:6153export' from a broken
shell config — the OpenAI/httpx client throws a cryptic 'Invalid port'
error that doesn't identify the offending variable.

Add _validate_proxy_env_urls() and _validate_base_url() in
auxiliary_client.py, called from resolve_provider_client() and
_create_openai_client() to fail fast with a clear, actionable error
message naming the broken env var or URL.

Closes #6360
Co-authored-by: MestreY0d4-Uninter <MestreY0d4-Uninter@users.noreply.github.com>
2026-04-15 16:10:53 -07:00
Teknium ee9c0a3ed0 fix(security): add JWT token and Discord mention redaction (#10547)
Found via trace data audit: JWT tokens (eyJ...) and Discord snowflake
mentions (<@ID>) were passing through unredacted.

JWT pattern: matches 1/2/3-part tokens starting with eyJ (base64 for '{').
Zero false-positive risk — no normal text matches eyJ + 10+ base64url chars.

Discord pattern: matches <@digits> and <@!digits> with 17-20 digit snowflake
IDs. Syntactically unique to Discord's mention format.

Both patterns follow the same structural-uniqueness standard as existing
prefix patterns (sk-, ghp_, AKIA, etc.).
2026-04-15 16:08:52 -07:00
Teknium 1d4b9c1a74 fix(gateway): don't treat group session user_id as thread_id in shutdown notifications (#10546)
_parse_session_key() blindly assigned parts[5] as thread_id for all
chat types. For group sessions with per-user isolation, parts[5] is
a user_id, not a thread_id. This could cause shutdown notifications
to route with incorrect thread metadata.

Only return thread_id for chat types where the 6th element is
unambiguous: dm and thread. For group/channel sessions, omit
thread_id since the suffix may be a user_id.

Based on the approach from PR #9938 by @Ruzzgar.
2026-04-15 15:09:23 -07:00
Ruzzgar de3f8bc6ce fix terminal workdir validation for Windows paths 2026-04-15 15:06:51 -07:00
Teknium eb3d928da6 chore: add counterposition to AUTHOR_MAP 2026-04-15 15:05:32 -07:00
Harish Kukreja f1df83179f fix(doctor): skip health check for OpenCode Go (no shared /models endpoint)
OpenCode Go does not expose a shared /models endpoint, so the doctor
probe was always failing and producing a false warning. Set the default
URL to None and disable the health check for this provider.
2026-04-15 15:05:32 -07:00
Teknium ddaadfb9f0 chore: add helix4u to AUTHOR_MAP 2026-04-15 15:04:14 -07:00
helix4u 96cc556055 fix(copilot): preserve base URL and gpt-5-mini routing 2026-04-15 15:04:14 -07:00
Teknium 3b4ecf8ee7 fix: remove 'q' alias from /quit so /queue's 'q' alias works (#10467) (#10538)
Both /queue and /quit registered 'q' as an alias. Since /quit appeared
later in COMMAND_REGISTRY, _build_command_lookup() silently overwrote
/queue's claim, making the documented /queue shorthand unusable.

Fix: remove 'q' from /quit's aliases. /quit already has 'exit' as an
alias plus the full '/quit' command. /queue has no other short alias.

Closes #10467
2026-04-15 15:04:01 -07:00
Teknium 93b6f45224 fix: always retry on ASCII codec UnicodeEncodeError — don't gate on per-component sanitization
The recovery block previously only retried (continue) when one of the
per-component sanitization checks (messages, tools, system prompt,
headers, credentials) found and stripped non-ASCII content.  When the
non-ASCII lived only in api_messages' reasoning_content field (which
is built from messages['reasoning'] and not checked by the original
_sanitize_messages_non_ascii), all checks returned False and the
recovery fell through to the normal error path — burning a retry
attempt despite _force_ascii_payload being set.

Now the recovery always continues (retries) when _is_ascii_codec is
detected.  The _force_ascii_payload flag guarantees the next iteration
runs _sanitize_structure_non_ascii(api_kwargs) on the full API payload,
catching any remaining non-ASCII regardless of where it lives.

Also adds test for the 'reasoning' field on canonical messages.

Fixes #6843
2026-04-15 15:03:28 -07:00
MestreY0d4-Uninter 902f1e6ede chore: add MestreY0d4-Uninter to AUTHOR_MAP and .mailmap 2026-04-15 15:03:28 -07:00
MestreY0d4-Uninter efd1ddc6e1 fix: sanitize api_messages and extra string fields during ASCII-codec recovery (#6843)
The ASCII-locale recovery path in run_agent.py sanitized the canonical
'messages' list but left 'api_messages' untouched. api_messages is a
separate API-copy built before the retry loop and may carry extra fields
(reasoning_content, extra_body entries) that are not present in
'messages'. This caused the retry to still raise UnicodeEncodeError even
after the 'System encoding is ASCII — stripped...' log line appeared.

Two changes:
- _sanitize_messages_non_ascii now walks all extra top-level string fields
  in each message dict (any key not in {content, name, tool_calls, role})
  so reasoning_content and future extras are cleaned in both 'messages'
  and 'api_messages'.
- The ASCII-codec recovery block now also calls sanitize on api_messages
  and api_kwargs so no non-ASCII survives into the next retry attempt.

Adds regression tests covering:
- reasoning_content with non-ASCII in api_messages
- extra_body with non-ASCII in api_kwargs
- canonical messages clean but api_messages dirty

Fixes #6843
2026-04-15 15:03:28 -07:00
LehaoLin d4eba82a37 fix(streaming): don't suppress final response when commentary message is sent
Commentary messages (interim assistant status updates like "Using browser
tool...") are sent via _send_commentary(), which was incorrectly setting
_already_sent = True on success. This caused the final response to be
suppressed when there were multiple tool calls, because the gateway checks
already_sent to decide whether to skip re-sending the response.

The fix: commentary messages are interim status updates, not the final
response, so _already_sent should not be set when they succeed. This
ensures the final response is always delivered regardless of how many
commentary messages were sent during the turn.

Fixes: #10454
2026-04-15 15:00:58 -07:00
Teknium 23f1fa22af fix(kimi): include kimi-coding-cn in Kimi base URL resolution (#10534)
Route kimi-coding-cn through _resolve_kimi_base_url() in both
get_api_key_provider_status() and resolve_api_key_provider_credentials()
so CN users with sk-kimi- prefixed keys get auto-detected to the Kimi
Coding Plan endpoint, matching the existing behavior for kimi-coding.

Also update the kimi-coding display label to accurately reflect the
dual-endpoint setup (Kimi Coding Plan + Moonshot API).

Salvaged from PR #10525 by kkikione999.
2026-04-15 14:54:30 -07:00
Junass1 096260ce78 fix(telegram): authorize update prompt callbacks 2026-04-15 14:54:23 -07:00
Teknium 18396af31e fix: handle cross-device shutil.move failure in tirith auto-install (#10127) (#10524)
_install_tirith() uses shutil.move() to place the binary from tmpdir
to ~/.hermes/bin/.  When these are on different filesystems (common in
Docker, NFS), shutil.move() falls back to copy2 + unlink, but copy2's
metadata step can raise PermissionError.  This exception propagated
past the fail_open guard, crashing the terminal tool entirely.

Additionally, a failed install could leave a non-executable tirith
binary at the destination, causing a retry loop on every subsequent
terminal command.

Fix:
- Catch OSError from shutil.move() and fall back to shutil.copy()
  (skips metadata/xattr copying that causes PermissionError)
- If even copy fails, clean up the partial dest file to prevent
  the non-executable retry loop
- Return (None, 'cross_device_copy_failed') so the failure routes
  through the existing install-failure caching and fail_open logic

Closes #10127
2026-04-15 14:50:07 -07:00
Teknium 1b12f9b1d6 docs: add terminal bypass test to Out of Scope section
Clarifies that tool-level access restrictions are not security boundaries
when the agent has unrestricted terminal access. Deny lists only matter
when paired with equivalent terminal-side restrictions (like WRITE_DENIED_PATHS
pairs with the dangerous command approval system).
2026-04-15 14:34:09 -07:00
i3eg1nner 407d27bd82 feat: add SECURITY.md 2026-04-15 14:34:09 -07:00
Teknium b3b88a279b fix: prevent stale os.environ leak after clear_session_vars (#10304) (#10527)
After clear_session_vars() reset contextvars to their default (''),
get_session_env() treated the empty string as falsy and fell through
to os.environ — resurrecting stale HERMES_SESSION_* values from CLI
startup, cron, or previous sessions.  This broke session isolation
in the gateway where concurrent messages could see each other's
stale environment values.

Fix: use a sentinel (_UNSET) as the contextvar default instead of ''.
get_session_env() now checks 'value is not _UNSET' instead of
truthiness.  Three states are cleanly distinguished:

  - _UNSET (never set): fall back to os.environ (CLI/cron compat)
  - '' (explicitly cleared): return '' — no os.environ fallback
  - 'telegram' (actively set): return the value

clear_session_vars() now uses var.set('') instead of var.reset(token)
to mark vars as explicitly cleared rather than reverting to _UNSET.

Closes #10304
2026-04-15 14:27:17 -07:00
Teknium e36c804bc2 fix: prevent already_sent from swallowing empty responses after tool calls (#10531)
When a model (e.g. mimo-v2-pro) streams intermediate text alongside tool
calls ("Let me search for that") but then returns empty after processing
tool results, the stream consumer already_sent flag is True from the
earlier text delivery.  The gateway suppression check
(already_sent=True, failed=False → return None) would swallow the final
response, leaving the user staring at silence after the search.

Two changes:

1. gateway/run.py return path: skip already_sent suppression when the
   final_response is "(empty)" or empty — the user needs to know the
   agent finished even if streaming sent partial content earlier.

2. gateway/run.py response handler: convert the internal "(empty)"
   sentinel to a user-friendly warning instead of delivering the raw
   sentinel string.

Tests added for all empty/None/sentinel cases plus preserved existing
suppression behavior for normal non-empty responses.
2026-04-15 14:26:45 -07:00
Teknium a9197f9bb1 fix(memory): discover user-installed memory providers from $HERMES_HOME/plugins/ (#10529)
Memory provider discovery (discover_memory_providers, load_memory_provider)
only scanned the bundled plugins/memory/ directory. User-installed providers
at $HERMES_HOME/plugins/<name>/ were invisible, forcing users to symlink
into the repo source tree — which broke on hermes update and created a
dual-registration path causing duplicate tool names (400 errors on strict
providers like Xiaomi MiMo).

Changes:
- Add _get_user_plugins_dir(), _is_memory_provider_dir(), _iter_provider_dirs(),
  and find_provider_dir() helpers to plugins/memory/__init__.py
- discover_memory_providers() now scans both bundled and user dirs
- load_memory_provider() uses find_provider_dir() (bundled-first)
- discover_plugin_cli_commands() uses find_provider_dir()
- _install_dependencies() in memory_setup.py uses find_provider_dir()
- User plugins use _hermes_user_memory namespace to avoid sys.modules collisions
- Non-memory user plugins filtered via source text heuristic
- Bundled providers always take precedence on name collisions

Fixes #4956, #9099. Supersedes #4987, #9123, #9130, #9132, #9982.
2026-04-15 14:25:40 -07:00
Teknium 22d22cd75c fix: auto-register all gateway commands as Discord slash commands (#10528)
Discord's _register_slash_commands() had a hardcoded list of ~27 commands
while COMMAND_REGISTRY defines 34+ gateway-available commands. Missing
commands (debug, branch, rollback, snapshot, profile, yolo, fast, reload,
commands) were invisible in Discord's / autocomplete — users couldn't
discover them.

Add a dynamic catch-all loop after the explicit registrations that
iterates COMMAND_REGISTRY, skips already-registered commands, and
auto-registers the rest using discord.app_commands.Command(). Commands
with args_hint get an optional string parameter; parameterless commands
get a simple callback.

This ensures any future commands added to COMMAND_REGISTRY automatically
appear on Discord without needing a manual entry in discord.py.

Telegram and Slack already derive dynamically from COMMAND_REGISTRY
via telegram_bot_commands() and slack_subcommand_map() — no changes
needed there.
2026-04-15 14:25:27 -07:00
Teknium c4674cbe21 fix: parse string schedules in cron update_job() (#10129) (#10521)
update_job() assumed the schedule value was always a pre-parsed dict
and called .get() on it directly.  When the API passes a raw string
like "every 10m", this crashed with AttributeError.

The create path already handles this correctly by calling
parse_schedule() on the incoming string.  The fix adds the same
normalization to the update path: if the schedule is a string,
parse it into a dict before proceeding.

Closes #10129
2026-04-15 14:25:12 -07:00
Teknium 305a702e09 fix: /browser connect CDP override now takes priority over Camofox (#10523)
When a user runs /browser connect to attach browser tools to their real
Chrome instance via CDP, the BROWSER_CDP_URL env var is set. However,
every browser tool function checks _is_camofox_mode() first, which
short-circuits to the Camofox backend before _get_session_info() ever
checks for the CDP override.

Fix: is_camofox_mode() now returns False when BROWSER_CDP_URL is set,
so the explicit CDP connection takes priority. This is the correct
behavior — /browser connect is an intentional user override.

Reported by SkyLinx on Discord.
2026-04-15 14:11:18 -07:00
Teknium 824c33729d fix(session_search): coerce limit to int to prevent TypeError with non-int values (#10522)
Models (especially open-source like qwen3.5-plus) may send non-int values
for the limit parameter — None (JSON null), string, or even a type object.
This caused TypeError: '<=' not supported between instances of 'int' and
'type' when the value reached min()/comparison operations.

Changes:
- Add defensive int coercion at session_search() entry with fallback to 3
- Clamp limit to [1, 5] range (was only capped at 5, not floored)
- Add tests for None, type object, string, negative, and zero limit values

Reported by community user ludoSifu via Discord.
2026-04-15 14:11:05 -07:00
Teknium 91980e3518 fix: deduplicate memory provider tools to prevent 400 on strict providers (#10511)
Memory provider plugins (e.g. Mnemosyne) can register tools via two paths:
1. Plugin system (ctx.register_tool) → tool registry → get_tool_definitions()
2. Memory manager → get_all_tool_schemas() → direct append in AIAgent.__init__

Path 2 blindly appended without checking if path 1 already added the same
tool names. This created duplicate function names in the tools array sent
to the API. Most providers silently handle duplicates, but Xiaomi MiMo
(via Nous Portal) strictly rejects them with a 400 Bad Request.

Fix: build a set of existing tool names before memory manager injection
and skip any tool whose name is already present.

Confirmed via live testing against Nous Portal:
- Unique tool names → 200 OK
- Duplicate tool names → 400 'Provider returned error'
2026-04-15 14:09:32 -07:00
Teknium 861efe274b fix: add ensure_ascii=False to all MCP json.dumps calls (#10234) (#10512)
Python's json.dumps() defaults to ensure_ascii=True, escaping non-ASCII
characters to \uXXXX sequences.  For CJK characters this inflates
token count 3-4x — a single Chinese character like '中' becomes
'\u4e2d' (6 chars vs 3 bytes, ~6 tokens vs ~1 token).

Since MCP tool results feed directly into the model's conversation
context, this silently multiplied API costs for Chinese, Japanese,
and Korean users.

Fix: add ensure_ascii=False to all 20 json.dumps calls in mcp_tool.py.
Raw UTF-8 is valid JSON per RFC 8259 and all downstream consumers
(LLM APIs, display) handle it correctly.

Closes #10234
2026-04-15 13:59:57 -07:00
Teknium 19142810ed fix: /debug privacy — auto-delete pastes after 1 hour, add privacy notices (#10510)
- Pastes uploaded by /debug now auto-delete after 1 hour via a detached
  background process that sends DELETE to paste.rs
- CLI: shows privacy notice listing what data will be uploaded
- Gateway: only uploads summary report (system info + log tails), NOT
  full log files containing conversation content
- Added 'hermes debug delete <url>' for immediate manual deletion
- 16 new tests covering auto-delete scheduling, paste deletion, privacy
  notices, and the delete subcommand

Addresses user privacy concern where /debug uploaded full conversation
logs to a public paste service with no warning or expiry.
2026-04-15 13:40:27 -07:00
Teknium 2edbf15560 fix: enforce TTL in MessageDeduplicator + use yaml for gateway --config (#10306, #10216) (#10509)
Two gateway fixes:

1. MessageDeduplicator.is_duplicate() now checks TTL at query time (#10306)

   Previously, is_duplicate() returned True for any previously seen ID
   without checking its age — expired entries were only purged when cache
   size exceeded max_size.  On normal workloads that never overflow, message
   IDs stayed deduplicated forever instead of expiring after the TTL.

   Fix: check `now - timestamp < ttl` before returning True.  Expired
   entries are removed and treated as new messages.

2. Gateway --config flag now uses yaml.safe_load() (#10216)

   The --config CLI flag in gateway/run.py main() used json.load() to
   parse config files.  YAML is the only documented config format and
   every other config loader uses yaml.safe_load().  A YAML config file
   passed via --config would crash with json.JSONDecodeError.

Closes #10306
Closes #10216
2026-04-15 13:35:40 -07:00
Teknium af4bf505b3 fix: add on_memory_write bridge to sequential tool execution path (#10174) (#10507)
The on_memory_write bridge that notifies external memory providers
(ClawMem, retaindb, supermemory, etc.) of built-in memory writes was
only present in the concurrent tool execution path (_invoke_tool).
The sequential path (_execute_tool_calls_sequential) — which handles
all single tool calls, the common case — was missing it entirely.

This meant external memory providers silently missed every single-call
memory write, which is the vast majority of memory operations.

Fix: add the identical bridge block to the sequential path, right
after the memory_tool call returns.

Closes #10174
2026-04-15 13:32:59 -07:00
helix4u 93f6f66872 fix(interrupt): preserve pre-start terminal interrupts 2026-04-15 13:29:57 -07:00
Teknium a418ddbd8b fix: add activity heartbeats to prevent false gateway inactivity timeouts (#10501)
Multiple gaps in activity tracking could cause the gateway's inactivity
timeout to fire while the agent is actively working:

1. Streaming wait loop had no periodic heartbeat — the outer thread only
   touched activity when the stale-stream detector fired (180-300s), and
   for local providers (Ollama) the stale timeout was infinity, meaning
   zero heartbeats. Now touches activity every 30s.

2. Concurrent tool execution never set the activity callback on worker
   threads (threading.local invisible across threads) and never set
   _current_tool. Workers now set the callback, and the concurrent wait
   uses a polling loop with 30s heartbeats.

3. Modal backend's execute() override had its own polling loop without
   any activity callback. Now matches _wait_for_process cadence (10s).
2026-04-15 13:29:05 -07:00
Teknium 0d25e1c146 fix: prevent premature loop exit when weak models return empty after substantive tool calls (#10472)
The _last_content_with_tools fallback was firing indiscriminately for ALL
content+tool turns, including mid-task narration alongside substantive
tools (terminal, search_files, etc.).  This caused the agent to exit
the loop with 'I'll scan the directory...' as the final answer instead
of nudging the model to continue processing tool results.

The fix restricts the fallback to housekeeping-only turns (memory, todo,
skill_manage, session_search) where the content genuinely IS the final
answer.  When substantive tools are present, the existing post-tool
nudge mechanism now fires instead, prompting the model to continue.

Affected models: xiaomi/mimo-v2-pro, GLM-5, and other weaker models
that intermittently return empty after tool results.

Reported by user Renaissance on Discord.
2026-04-15 13:28:09 -07:00
Teknium 6391b46779 fix: bound auxiliary client cache to prevent fd exhaustion in long-running gateways (#10200) (#10470)
The _client_cache used event loop id() as part of the cache key, so
every new worker-thread event loop created a new entry for the same
provider config.  In long-running gateways where threads are recycled
frequently, this caused unbounded cache growth — each stale entry
held an unclosed AsyncOpenAI client with its httpx connection pool,
eventually exhausting file descriptors.

Fix: remove loop_id from the cache key and instead validate on each
async cache hit that the cached loop is the current, open loop.  If
the loop changed or was closed, the stale entry is replaced in-place
rather than creating an additional entry.  This bounds cache growth
to at most one entry per unique provider config.

Also adds a _CLIENT_CACHE_MAX_SIZE (64) safety belt with FIFO
eviction as defense-in-depth against any remaining unbounded growth.

Cross-loop safety is preserved: different event loops still get
different client instances (validated by existing test suite).

Closes #10200
2026-04-15 13:16:28 -07:00
Teknium d1d425e9d0 chore: add ZaynJarvis bytedance email to AUTHOR_MAP 2026-04-15 11:28:45 -07:00
zhiheng.liu 7cb06e3bb3 refactor(memory): drop on_session_reset — commit-only is enough
OV transparently handles message history across /new and /compress: old
messages stay in the same session and extraction is idempotent, so there's
no need to rebind providers to a new session_id. The only thing the
session boundary actually needs is to trigger extraction.

- MemoryProvider / MemoryManager: remove on_session_reset hook
- OpenViking: remove on_session_reset override (nothing to do)
- AIAgent: replace rotate_memory_session with commit_memory_session
  (just calls on_session_end, no rebind)
- cli.py / run_agent.py: single commit_memory_session call at the
  session boundary before session_id rotates
- tests: replace on_session_reset coverage with routing tests for
  MemoryManager.on_session_end

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-15 11:28:45 -07:00
zhiheng.liu 8275fa597a refactor(memory): promote on_session_reset to base provider hook
Replace hasattr-forked OpenViking-specific paths with a proper base-class
hook. Collapse the two agent wrappers into a single rotate_memory_session
so callers don't orchestrate commit + rebind themselves.

- MemoryProvider: add on_session_reset(new_session_id) as a default no-op
- MemoryManager: on_session_reset fans out unconditionally (no hasattr,
  no builtin skip — base no-op covers it)
- OpenViking: rename reset_session -> on_session_reset; drop the explicit
  POST /api/v1/sessions (OV auto-creates on first message) and the two
  debug raise_for_status wrappers
- AIAgent: collapse commit_memory_session + reinitialize_memory_session
  into rotate_memory_session(new_sid, messages)
- cli.py / run_agent.py: replace hasattr blocks and the split calls with
  a single unconditional rotate_memory_session call; compression path
  now passes the real messages list instead of []
- tests: align with on_session_reset, assert reset does NOT POST /sessions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-15 11:28:45 -07:00
zhiheng.liu 7856d304f2 fix(openviking): commit session on /new and context compression
The OpenViking memory provider extracts memories when its session is
committed (POST /api/v1/sessions/{id}/commit).  Before this fix, the
CLI had two code paths that changed the active session_id without ever
committing the outgoing OpenViking session:

1. /new (new_session() in cli.py) — called flush_memories() to write
   MEMORY.md, then immediately discarded the old session_id.  The
   accumulated OpenViking session was never committed, so all context
   from that session was lost before extraction could run.

2. /compress and auto-compress (_compress_context() in run_agent.py) —
   split the SQLite session (new session_id) but left the OpenViking
   provider pointing at the old session_id with no commit, meaning all
   messages synced to OpenViking were silently orphaned.

The gateway already handles session commit on /new and /reset via
shutdown_memory_provider() on the cached agent; the CLI path did not.

Fix: introduce a lightweight session-transition lifecycle alongside
the existing full shutdown path:

- OpenVikingMemoryProvider.reset_session(new_session_id): waits for
  in-flight background threads, resets per-session counters, and
  creates the new OV session via POST /api/v1/sessions — without
  tearing down the HTTP client (avoids connection overhead on /new).

- MemoryManager.restart_session(new_session_id): calls reset_session()
  on providers that implement it; falls back to initialize() for
  providers that do not.  Skips the builtin provider (no per-session
  state).

- AIAgent.commit_memory_session(messages): wraps
  memory_manager.on_session_end() without shutdown — commits OV session
  for extraction but leaves the provider alive for the next session.

- AIAgent.reinitialize_memory_session(new_session_id): wraps
  memory_manager.restart_session() — transitions all external providers
  to the new session after session_id has been assigned.

Call sites:
- cli.py new_session(): commit BEFORE session_id changes, reinitialize
  AFTER — ensuring OV extraction runs on the correct session and the
  new session is immediately ready for the next turn.
- run_agent._compress_context(): same pattern, inside the
  if self._session_db: block where the session_id split happens.

/compress and auto-compress are functionally identical at this layer:
both call _compress_context(), so both are fixed by the same change.

Tests added to tests/agent/test_memory_provider.py:
- TestMemoryManagerRestartSession: reset_session() routing, builtin
  skip, initialize() fallback, failure tolerance, empty-manager noop.
- TestOpenVikingResetSession: session_id update, per-session state
  clear, POST /api/v1/sessions call, API failure tolerance, no-client
  noop.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 11:28:45 -07:00
zhiheng.liu f3ec4b3a16 Fix OpenViking integration issues: explicit session creation, better error logging 2026-04-15 11:28:45 -07:00
ZaynJarvis 5082a9f66c fix: wire agent/account/user params through _VikingClient
- Fix copy-paste bug: `self._agent = user` → `self._agent = agent`
  with new `agent` parameter in `_VikingClient.__init__`
- Read account/user/agent env vars in `initialize()` and pass them
  to all 4 `_VikingClient` instantiations so identity headers are
  consistently applied across health check, prefetch, sync, and
  memory write paths

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-15 11:28:45 -07:00
Zayn Jarvis 0c30385be2 chore: update doc 2026-04-15 11:28:45 -07:00
Zayn Jarvis 8b167af66b feat: add ov agent header 2026-04-15 11:28:45 -07:00
ZaynJarvis 990030c26e feat: add contrib map 2026-04-15 11:28:45 -07:00
ZaynJarvis d2f85383e8 fix: change default OPENVIKING_ACCOUNT from root to default
- Change default OPENVIKING_ACCOUNT from 'root' to 'default'
- Add account and user config options to get_config_schema()
- Add session creation in initialize()
- Add reset_session() method
- Update docstring to reflect new default

This is a breaking change: existing users who relied on the 'root' account will need to either:
1. Set OPENVIKING_ACCOUNT=root in their environment, or
2. Migrate their data to the 'default' account

Future release will add support for OPENVIKING_ACCOUNT and OPENVIKING_USER in setup when API key is provided.

update desc for key setup
2026-04-15 11:28:45 -07:00
Teknium 2dc5f9d2d3 fix: light mode link/primary colors unreadable on white background (#10457)
Gold #FFD700 has 1.4:1 contrast ratio on white — barely visible.
Replace with dark amber palette (#8B6508 primary, #7A5800 links)
that passes WCAG AA (5.3:1 and 6.5:1 respectively).

Changes:
- :root primary palette → dark amber tones for light mode
- Explicit light mode link colors (#7A5800 / #5A4100 hover)
- Light mode sidebar active state with amber accent
- Light mode table header/border styling
- Footer hover color split by theme (gold for dark, amber for light)

Dark mode is completely unchanged.

Reported by @AbrahamMat7632
2026-04-15 11:17:44 -07:00
Teknium f61cc464f0 fix: include thread_id in _parse_session_key and fix stale parts reference
_parse_session_key() now extracts the optional 6th part (thread_id) from
session keys, and _notify_active_sessions_of_shutdown uses _parsed.get()
instead of the removed 'parts' variable. Without this, shutdown notifications
silently failed (NameError caught by try/except) and forum topic routing
was lost.
2026-04-15 11:16:01 -07:00
kshitijk4poor 2276b72141 fix: follow-up improvements for watch notification routing (#9537)
- Populate watcher_* routing fields for watch-only processes (not just
  notify_on_complete), so watch-pattern events carry direct metadata
  instead of relying solely on session_key parsing fallback
- Extract _parse_session_key() helper to dedupe session key parsing
  at two call sites in gateway/run.py
- Add negative test proving cross-thread leakage doesn't happen
- Add edge-case tests for _build_process_event_source returning None
  (empty evt, invalid platform, short session_key)
- Add unit tests for _parse_session_key helper
2026-04-15 11:16:01 -07:00
etcircle dee592a0b1 fix(gateway): route synthetic background events by session 2026-04-15 11:16:01 -07:00
kshitij da448d4fce test(cron): add regression test for credential_files ContextVar propagation (#10462)
Follow-up to #10459 (salvage of #7527). The copy_context() fix propagates
ALL ContextVars into the cron worker thread, including credential_files.
This test verifies that skill-declared required_credential_files are
visible inside the worker thread, matching the existing env_passthrough
regression test.
2026-04-15 11:11:08 -07:00
helix4u aa398ad655 fix(cron): preserve skill env passthrough in worker thread 2026-04-15 11:03:49 -07:00
WideLee 422f2866e6 docs: restore sidebar entries removed by PR #9931
Re-add 'qqbot' and 'automation-templates' doc indexes to sidebars.ts
that were accidentally dropped in https://github.com/NousResearch/hermes-agent/pull/9931.
2026-04-15 09:39:12 -07:00
Teknium 722331a57d fix: replace hardcoded ~/.hermes with display_hermes_home() in agent-facing text (#10285)
Tool schema descriptions and tool return values contained hardcoded
~/.hermes paths that the model sees and uses. When HERMES_HOME is set
to a custom path (Docker containers, profiles), the agent would still
reference ~/.hermes — looking at the wrong directory.

Fixes 6 locations across 5 files:
- tools/tts_tool.py: output_path schema description
- tools/cronjob_tools.py: script path schema description
- tools/skill_manager_tool.py: skill_manage schema description
- tools/skills_tool.py: two tool return messages
- agent/skill_commands.py: skill config injection text

All now use display_hermes_home() which resolves to the actual
HERMES_HOME path (e.g. /opt/data for Docker, ~/.hermes/profiles/X
for profiles, ~/.hermes for default).

Reported by: Sandeep Narahari (PrithviDevs)
2026-04-15 04:57:55 -07:00
sprmn24 41e2d61b3f feat(discord): add native send_animation for inline GIF playback 2026-04-15 04:51:27 -07:00
Teknium 4da598b48a docs: clarify hermes model vs /model — two commands, two purposes (#10276)
Users are confused about the difference between `hermes model` (terminal
command for full provider setup) and `/model` (session command for switching
between already-configured providers). This distinction was not documented
anywhere.

Changes across 4 doc pages:
- cli-commands.md: Added warning callout explaining the difference, added
  --global flag docs, added 'only see OpenRouter models?' info box
- slash-commands.md: Added notes on both TUI and messaging /model entries
  that /model only switches between configured providers
- providers.md: Added 'Two Commands for Model Management' comparison table
  near top of page, added warning callout in switching section
- faq.md: Added new FAQ entry '/model only shows one provider' with quick
  reference table

Prompted by user feedback in Discord — new users consistently hit this
confusion when trying to add providers from inside a session.
2026-04-15 04:39:34 -07:00
asheriif 33ae403890 fix(gateway): fix matrix lingering typing indicator 2026-04-15 04:16:16 -07:00
Teknium 47e6ea84bb fix: file handle bug, warning text, and tests for Discord media send
- Fix file handle closed before POST: nest session.post() inside
  the 'with open()' block so aiohttp can read the file during upload
- Update warning text to include weixin (also supports media delivery)
- Add 8 unit tests covering: text+media, media-only, missing files,
  upload failures, multiple files, and _send_to_platform routing
2026-04-15 04:16:06 -07:00
sprmn24 4bcb2f2d26 feat(send_message): add native media attachment support for Discord
Previously send_message only supported media delivery for Telegram.
Discord users received a warning that media was omitted.

- Add media_files parameter to _send_discord()
- Upload media via Discord multipart/form-data API (files[0] field)
- Handle Discord in _send_to_platform() same way as Telegram block
- Remove Discord from generic chunk loop (now handled above)
- Update error/warning strings to mention telegram and discord
2026-04-15 04:16:06 -07:00
Teknium 1c4d3216d3 fix(cron): include job_id in delivery and guide models on removal workflow (#10242)
* fix(gateway): suppress duplicate replies on interrupt and streaming flood control

Three fixes for the duplicate reply bug affecting all gateway platforms:

1. base.py: Suppress stale response when the session was interrupted by a
   new message that hasn't been consumed yet. Checks both interrupt_event
   and _pending_messages to avoid false positives. (#8221, #2483)

2. run.py (return path): Remove response_previewed guard from already_sent
   check. Stream consumer's already_sent alone is authoritative — if
   content was delivered via streaming, the duplicate send must be
   suppressed regardless of the agent's response_previewed flag. (#8375)

3. run.py (queued-message path): Same fix — already_sent without
   response_previewed now correctly marks the first response as already
   streamed, preventing re-send before processing the queued message.

The response_previewed field is still produced by the agent (run_agent.py)
but is no longer required as a gate for duplicate suppression. The stream
consumer's already_sent flag is the delivery-level truth about what the
user actually saw.

Concepts from PR #8380 (konsisumer). Closes #8375, #8221, #2483.

* fix(cron): include job_id in delivery and guide models on removal workflow

Users reported cron reminders keep firing after asking the agent to stop.
Root cause: the conversational agent didn't know the job_id (not in delivery)
and models don't reliably do the list→remove two-step without guidance.

1. Include job_id in the cron delivery wrapper so users and agents can
   reference it when requesting removal.

2. Replace confusing footer ('The agent cannot see this message') with
   actionable guidance ('To stop or manage this job, send me a new
   message').

3. Add explicit list→remove guidance in the cronjob tool schema so models
   know to list first and never guess job IDs.
2026-04-15 03:46:58 -07:00
Misturi dedc4600dd fix(skills): handle missing fields in Google Workspace token file gracefully instead of crashing with KeyError 2026-04-15 03:45:09 -07:00
Misturi 8bc9b5a0b4 fix(skills): use is None check for coordinates in find-nearby to avoid dropping valid 0.0 values 2026-04-15 03:45:09 -07:00
Teknium 2546b7acea fix(gateway): suppress duplicate replies on interrupt and streaming flood control
Three fixes for the duplicate reply bug affecting all gateway platforms:

1. base.py: Suppress stale response when the session was interrupted by a
   new message that hasn't been consumed yet. Checks both interrupt_event
   and _pending_messages to avoid false positives. (#8221, #2483)

2. run.py (return path): Remove response_previewed guard from already_sent
   check. Stream consumer's already_sent alone is authoritative — if
   content was delivered via streaming, the duplicate send must be
   suppressed regardless of the agent's response_previewed flag. (#8375)

3. run.py (queued-message path): Same fix — already_sent without
   response_previewed now correctly marks the first response as already
   streamed, preventing re-send before processing the queued message.

The response_previewed field is still produced by the agent (run_agent.py)
but is no longer required as a gate for duplicate suppression. The stream
consumer's already_sent flag is the delivery-level truth about what the
user actually saw.

Concepts from PR #8380 (konsisumer). Closes #8375, #8221, #2483.
2026-04-15 03:42:24 -07:00
Teknium 7b2700c9af fix(browser): use 127.0.0.1 instead of localhost for CDP default (#10231)
/browser connect set BROWSER_CDP_URL to http://localhost:9222, but
Chrome's --remote-debugging-port only binds to 127.0.0.1 (IPv4).
On macOS, 'localhost' can resolve to ::1 (IPv6) first, causing both
_resolve_cdp_override's /json/version fetch and agent-browser's
--cdp connection to fail when Chrome isn't listening on IPv6.

The socket check in the connect handler already used 127.0.0.1
explicitly and succeeded, masking the mismatch.

Use 127.0.0.1 in the default CDP URL to match what Chrome actually
binds to.
2026-04-15 03:29:37 -07:00
Teknium a4e1842f12 fix: strip reasoning item IDs from Responses API input when store=False (#10217)
With store=False (our default for the Responses API), the API does not
persist response items.  When reasoning items with 'id' fields were
replayed on subsequent turns, the API attempted a server-side lookup
for those IDs and returned 404:

  Item with id 'rs_...' not found. Items are not persisted when store
  is set to false.

The encrypted_content blob is self-contained for reasoning chain
continuity — the id field is unnecessary and triggers the failed lookup.

Fix: strip 'id' from reasoning items in both _chat_messages_to_responses_input
(message conversion) and _preflight_codex_input_items (normalization layer).
The id is still used for local deduplication but never sent to the API.

Reported by @zuogl448 on GPT-5.4.
2026-04-15 03:19:43 -07:00
Teknium e69526be79 fix(send_message): URL-encode Matrix room IDs and add Matrix to schema examples (#10151)
Matrix room IDs contain ! and : which must be percent-encoded in URI
path segments per the Matrix C-S spec. Without encoding, some
homeservers reject the PUT request.

Also adds 'matrix:!roomid:server.org' and 'matrix:@user:server.org'
to the tool schema examples so models know the correct target format.
2026-04-15 00:10:59 -07:00
Teknium 180b14442f test: add _parse_target_ref Matrix coverage for salvaged PR #6144 2026-04-15 00:08:14 -07:00
bkadish 03446e06bb fix(send_message): accept Matrix room IDs and user MXIDs as explicit targets
`_parse_target_ref` has explicit-reference branches for Telegram, Feishu,
and numeric IDs, but none for Matrix. As a result, callers of
`send_message(target="matrix:!roomid:server")` or
`send_message(target="matrix:@user:server")` fall through to
`(None, None, False)` and the tool errors out with a resolution failure —
even though a raw Matrix room ID or MXID is the most unambiguous possible
target.

Three-line fix: recognize `!…` as a room ID and `@…` as a user MXID when
platform is `matrix`, and return them as explicit targets. Alias-based
targets (`#…`) continue to go through the normal resolve path.
2026-04-15 00:08:14 -07:00
Teknium df7be3d8ae fix(cli): /model picker shows curated models instead of full catalog (#10146)
The /model picker called provider_model_ids() which fetches the FULL
live API catalog (hundreds of models for Anthropic, Copilot, etc.) and
only fell back to the curated list when the live fetch failed.

This flips the priority: use the curated model list from
list_authenticated_providers() (same lists as `hermes model` and
gateway pickers), falling back to provider_model_ids() only when the
curated list is empty (e.g. user-defined endpoints).
2026-04-15 00:07:50 -07:00
Ubuntu da8bab77fb fix(cli): restore messaging toolset for gateway platforms 2026-04-14 23:13:35 -07:00
Teknium 9932366f3c feat(doctor): add Command Installation check for hermes bin symlink
hermes doctor now checks whether the ~/.local/bin/hermes symlink exists
and points to the correct venv entry point. With --fix, it creates or
repairs the symlink automatically.

Covers:
- Missing symlink at ~/.local/bin/hermes (or $PREFIX/bin on Termux)
- Symlink pointing to wrong target
- Missing venv entry point (venv/bin/hermes or .venv/bin/hermes)
- PATH warning when ~/.local/bin is not on PATH
- Skipped on Windows (different mechanism)

Addresses user report: 'python -m hermes_cli.main doesn't have an option
to fix the local bin/install'

10 new tests covering all scenarios.
2026-04-14 23:13:11 -07:00
Teknium 029938fbed fix(cli): defensive subparser routing for argparse bpo-9338 (#10113)
On some Python versions, argparse fails to route subcommand tokens when
the parent parser has nargs='?' optional arguments (--continue).  The
symptom: 'hermes model' produces 'unrecognized arguments: model' even
though 'model' is a registered subcommand.

Fix: when argv contains a token matching a known subcommand, set
subparsers.required=True to force deterministic routing.  If that fails
(e.g. 'hermes -c model' where 'model' is consumed as the session name
for --continue), fall back to the default optional-subparsers behaviour.

Adds 13 tests covering all key argument combinations.

Reported via user screenshot showing the exact error on an installed
version with the model subcommand listed in usage but rejected at parse
time.
2026-04-14 23:13:02 -07:00
Teknium 772cfb6c4e fix: stale agent timeout, uv venv detection, empty response after tools, compression model fallback (#9051, #8620, #9400) (#10093)
Four independent fixes:

1. Reset activity timestamp on cached agent reuse (#9051)
   When the gateway reuses a cached AIAgent for a new turn, the
   _last_activity_ts from the previous turn (possibly hours ago)
   carried over. The inactivity timeout handler immediately saw
   the agent as idle for hours and killed it.

   Fix: reset _last_activity_ts, _last_activity_desc, and
   _api_call_count when retrieving an agent from the cache.

2. Detect uv-managed virtual environments (#8620 sub-issue 1)
   The systemd unit generator fell back to sys.executable (uv's
   standalone Python) when running under 'uv run', because
   sys.prefix == sys.base_prefix. The generated ExecStart pointed
   to a Python binary without site-packages.

   Fix: check VIRTUAL_ENV env var before falling back to
   sys.executable. uv sets VIRTUAL_ENV even when sys.prefix
   doesn't reflect the venv.

3. Nudge model to continue after empty post-tool response (#9400)
   Weaker models sometimes return empty after tool calls. The agent
   silently abandoned the remaining work.

   Fix: append assistant('(empty)') + user nudge message and retry
   once. Resets after each successful tool round.

4. Compression model fallback on permanent errors (#8620 sub-issue 4)
   When the default summary model (gemini-3-flash) returns 503
   'model_not_found' on custom proxies, the compressor entered a
   600s cooldown, leaving context growing unbounded.

   Fix: detect permanent model-not-found errors (503, 404,
   'model_not_found', 'no available channel') and fall back to
   using the main model for compression instead of entering
   cooldown. One-time fallback with immediate retry.

Test plan: 40 compressor tests + 97 gateway/CLI tests + 9 venv tests pass
2026-04-14 22:38:17 -07:00
Teknium 5d5d21556e fix: sync client.api_key during UnicodeEncodeError ASCII recovery (#10090)
The existing recovery block sanitized self.api_key and
self._client_kwargs['api_key'] but did not update self.client.api_key.
The OpenAI SDK stores its own copy of api_key and reads it dynamically
via the auth_headers property on every request. Without this fix, the
retry after sanitization would still send the corrupted key in the
Authorization header, causing the same UnicodeEncodeError.

The bug manifests when an API key contains Unicode lookalike characters
(e.g. ʋ U+028B instead of v) from copy-pasting out of PDFs, rich-text
editors, or web pages with decorative fonts. httpx hard-encodes all
HTTP headers as ASCII, so the non-ASCII char in the Authorization
header triggers the error.

Adds TestApiKeyClientSync with two tests verifying:
- All three key locations are synced after sanitization
- Recovery handles client=None (pre-init) without crashing
2026-04-14 22:37:45 -07:00
kshitijk4poor 9855190f23 feat(compressor): smart collapse, dedup, anti-thrashing, template upgrade, hardening
Combined salvage of PRs #9661, #9663, #9674, #9677, #9678 by kshitijk4poor.

- Smart tool output collapse: informative 1-line summaries replace generic placeholder
- Dedup identical tool results via MD5 hash, truncate large tool_call arguments
- Anti-thrashing: skip compression after 2 consecutive <10% savings passes
- Structured action-log summary template with numbered actions and Active State
- Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown

Follow-up fixes applied during salvage:
- web_extract: reads 'urls' (list) not 'url' (original PR bug)
- Multimodal list content guards in dedup and prune passes
- Kept 'Relevant Files' section in template (original PR removed it)

Skipped PRs #9665 (user msg preservation — duplication risk) and #9675 (dead code).
2026-04-14 22:21:25 -07:00
Teknium 50c35dcabe fix: stale agent timeout, uv venv detection, empty response after tools (#9051, #8620, #9400)
Three independent fixes:

1. Reset activity timestamp on cached agent reuse (#9051)
   When the gateway reuses a cached AIAgent for a new turn, the
   _last_activity_ts from the previous turn (possibly hours ago)
   carried over. The inactivity timeout handler immediately saw
   the agent as idle for hours and killed it.

   Fix: reset _last_activity_ts, _last_activity_desc, and
   _api_call_count when retrieving an agent from the cache.

2. Detect uv-managed virtual environments (#8620 sub-issue 1)
   The systemd unit generator fell back to sys.executable (uv's
   standalone Python) when running under 'uv run', because
   sys.prefix == sys.base_prefix (uv doesn't set up traditional
   venv activation). The generated ExecStart pointed to a Python
   binary without site-packages, crashing the service on startup.

   Fix: check VIRTUAL_ENV env var before falling back to
   sys.executable. uv sets VIRTUAL_ENV even when sys.prefix
   doesn't reflect the venv.

3. Nudge model to continue after empty post-tool response (#9400)
   Weaker models (GLM-5, mimo-v2-pro) sometimes return empty
   responses after tool calls instead of continuing to the next
   step. The agent silently abandoned the remaining work with
   '(empty)' or used prior-turn fallback text.

   Fix: when the model returns empty after tool calls AND there's
   no prior-turn content to fall back on, inject a one-time user
   nudge message telling the model to process the tool results and
   continue. The flag resets after each successful tool round so it
   can fire again on later rounds.

Test plan: 97 gateway + CLI tests pass, 9 venv detection tests pass
2026-04-14 22:16:02 -07:00
Teknium 93fe4ead83 fix: warn on invalid context_length format in config.yaml (#10067)
Previously, non-integer context_length values (e.g. '256K') in
config.yaml were silently ignored, causing the agent to fall back
to 128K auto-detection with no user feedback. This was confusing
for users with custom LiteLLM endpoints expecting larger context.

Now prints a clear stderr warning and logs at WARNING level when
model.context_length or custom_providers[].models.<model>.context_length
cannot be parsed as an integer, telling users to use plain integers
(e.g. 256000 instead of '256K').

Reported by community user ChFarhan via Discord.
2026-04-14 22:14:27 -07:00
Teknium a8b7db35b2 fix: interrupt agent immediately when user messages during active run (#10068)
When a user sends a message while the agent is executing a task on the
gateway, the agent is now interrupted immediately — not silently queued.
Previously, messages were stored in _pending_messages with zero feedback
to the user, potentially leaving them waiting 1+ hours.

Root cause: Level 1 guard (base.py) intercepted all messages for active
sessions and returned with no response. Level 2 (gateway/run.py) which
calls agent.interrupt() was never reached.

Fix: Expand _handle_active_session_busy_message to handle the normal
(non-draining) case:
  1. Call running_agent.interrupt(text) to abort in-flight tool calls
     and exit the agent loop at the next check point
  2. Store the message as pending so it becomes the next turn once the
     interrupted run returns
  3. Send a brief ack: 'Interrupting current task (10 min elapsed,
     iteration 21/60, running: terminal). I'll respond shortly.'
  4. Debounce acks to once per 30s to avoid spam on rapid messages

Reported by @Lonely__MH.
2026-04-14 22:07:28 -07:00
Teknium 8548893d14 feat: entry-level Podman support — find_docker() + rootless entrypoint (#10066)
- find_docker() now checks HERMES_DOCKER_BINARY env var first, then
  docker on PATH, then podman on PATH, then macOS known locations
- Entrypoint respects HERMES_HOME env var (was hardcoded to /opt/data)
- Entrypoint uses groupmod -o to tolerate non-unique GIDs (fixes macOS
  GID 20 conflict with Debian's dialout group)
- Entrypoint makes chown best-effort so rootless Podman continues
  instead of failing with 'Operation not permitted'
- 5 new tests covering env var override, podman fallback, precedence

Based on work by alanjds (PR #3996) and malaiwah (PR #8115).
Closes #4084.
2026-04-14 21:20:37 -07:00
Teknium c5688e7c8b fix(gateway): break compression-exhaustion infinite loop and auto-reset session (#9893)
When compression fails after max attempts, the agent returns
{completed: False, partial: True} but was missing the 'failed' flag.
The gateway's agent_failed_early guard checked for 'failed' AND
'not final_response', but _run_agent_blocking always converts errors
to final_response — making the guard dead code.  This caused the
oversized session to persist, creating an infinite fail loop where
every subsequent message hits the same compression failure.

Changes:
- run_agent.py: add 'failed: True' and 'compression_exhausted: True'
  to all 5 compression-exhaustion return paths
- gateway/run.py (_run_agent_blocking): forward 'failed' and
  'compression_exhausted' flags through to the caller
- gateway/run.py (_handle_message_with_agent): fix agent_failed_early
  to check bool(failed) without the broken 'not final_response' clause;
  auto-reset the session when compression is exhausted so the next
  message starts fresh
- Update tests to match new guard logic and add
  TestCompressionExhaustedFlag test class

Closes #9893
2026-04-14 21:18:17 -07:00
Teknium ba24f058ed docs: fix stale docstring reference to _discover_tools in mcp_tool.py 2026-04-14 21:12:29 -07:00
Teknium ef04de3e98 docs: update tool-adding instructions for auto-discovery
- AGENTS.md: 3 files → 2 files, remove _discover_tools() step
- adding-tools.md: remove Step 3, note auto-discovery
- architecture.md: update discovery description
- tools-runtime.md: replace manual list with discover_builtin_tools() docs
- hermes-agent skill: remove manual import step
2026-04-14 21:12:29 -07:00
Teknium fc6cb5b970 fix: tighten AST check to module-level only
The original tree-wide ast.walk() would match registry.register() calls
inside functions too. Restrict to top-level ast.Expr statements so helper
modules that call registry.register() inside a function are never picked
up as tool modules.
2026-04-14 21:12:29 -07:00
Greer Guthrie 4b2a1a4337 fix(tools): auto-discover built-in tool modules 2026-04-14 21:12:29 -07:00
Teknium 2871ef1807 docs: note session continuity for previous_response_id chains (#10060) 2026-04-14 21:07:37 -07:00
Teknium 5cbb45d93e fix: preserve session_id across previous_response_id chains in /v1/responses (#10059)
The /v1/responses endpoint generated a new UUID session_id for every
request, even when previous_response_id was provided. This caused each
turn of a multi-turn conversation to appear as a separate session on the
web dashboard, despite the conversation history being correctly chained.

Fix: store session_id alongside the response in the ResponseStore, and
reuse it when a subsequent request chains via previous_response_id.
Applies to both the non-streaming /v1/responses path and the streaming
SSE path. The /v1/runs endpoint also gains session continuity from
stored responses (explicit body.session_id still takes priority).

Adds test verifying session_id is preserved across chained requests.
2026-04-14 21:06:32 -07:00
Teknium ca0ae56ccb fix: add 402 billing error hint to gateway error handler (#5220) (#10057)
* fix: hermes gateway restart waits for service to come back up (#8260)

Previously, systemd_restart() sent SIGUSR1 to the gateway, printed
'restart requested', and returned immediately. The gateway still
needed to drain active agents, exit with code 75, wait for systemd's
RestartSec=30, and start the new process. The user saw 'success' but
the gateway was actually down for 30-60 seconds.

Now the SIGUSR1 path blocks with progress feedback:

Phase 1 — wait for old process to die:
   User service draining active work...
  Polls os.kill(pid, 0) until ProcessLookupError (up to 90s)

Phase 2 — wait for new process to become active:
   Waiting for hermes-gateway to restart...
  Polls systemctl is-active + verifies new PID (up to 60s)

Success:
  ✓ User service restarted (PID 12345)

Timeout:
  ⚠ User service did not become active within 60s.
    Check status: hermes gateway status
    Check logs: journalctl --user -u hermes-gateway --since '2 min ago'

The reload-or-restart fallback path (line 1189) already blocks because
systemctl reload-or-restart is synchronous.

Test plan:
- Updated test to verify wait-for-restart behavior
- All 118 gateway CLI tests pass

* fix: add 402 billing error hint to gateway error handler (#5220)

The gateway's exception handler for agent errors had specific hints for
HTTP 401, 429, 529, 400, 500 — but not 402 (Payment Required / quota
exhausted). Users hitting billing limits from custom proxy providers
got a generic error with no guidance.

Added: 'Your API balance or quota is exhausted. Check your provider
dashboard.'

The underlying billing classification (error_classifier.py) already
correctly handles 402 as FailoverReason.billing with credential
rotation and fallback. The original issue (#5220) where 402 killed
the entire gateway was from an older version — on current main, 402
is excluded from the is_client_error abort path (line 9460) and goes
through the proper retry/fallback/fail flow. Combined with PR #9875
(auto-recover from unexpected SIGTERM), even edge cases where the
gateway dies are now survivable.
2026-04-14 21:03:05 -07:00
Teknium 23b87c8ca8 chore: add zons-zhaozhy to AUTHOR_MAP 2026-04-14 21:01:40 -07:00
阿泥豆 92385679b6 fix: reset retry counters after compression and stop poisoning conversation history
Three bugfixes in the agent loop:

1. Reset retry counters after context compression. Without this,
   pre-compression retry counts carry over, causing the model to
   hit empty-response recovery immediately after a compression-
   induced context loss, wasting API calls on a now-valid context.

2. Unmute output in the final-response (no-tool-call) branch.
   _mute_post_response could be left True from a prior housekeeping
   turn, silently suppressing empty-response warnings and recovery
   status that the user should see.

3. Stop injecting 'Calling the X tools...' into assistant message
   content when falling back to prior-turn content. This mutated
   conversation history with synthetic text that the model never
   produced, poisoning subsequent turns.
2026-04-14 21:01:40 -07:00
Teknium 82f364ffd1 feat: add --all flag to gateway start and restart commands (#10043)
- gateway start --all: kills all stale gateway processes across all
  profiles before starting the current profile's service
- gateway restart --all: stops all gateway processes across all
  profiles, then starts the current profile's service fresh
- gateway stop --all: already existed, unchanged

The --all flag was only available on 'stop' but not on 'start' or
'restart', causing 'unrecognized arguments' errors for users.
2026-04-14 20:52:18 -07:00
Teknium 31d0620663 chore: add simon-marcus to AUTHOR_MAP 2026-04-14 20:51:52 -07:00
Teknium cf1d718823 fix: keep batch-path function_call_output.output as string per OpenAI spec
The streaming path emits output as content-part arrays for Open WebUI
compatibility, but the batch (non-streaming) Responses API path must
return output as a plain string per the OpenAI Responses API spec.
Reverts the _extract_output_items change from the cherry-picked commits
while preserving the streaming path's array format.
2026-04-14 20:51:52 -07:00
simon-marcus 302554b158 fix(api-server): format responses tool outputs for open webui 2026-04-14 20:51:52 -07:00
simon-marcus d6c09ab94a feat(api-server): stream /v1/responses SSE tool events 2026-04-14 20:51:52 -07:00
Teknium da528a8207 fix: detect and strip non-ASCII characters from API keys (#6843)
API keys containing Unicode lookalike characters (e.g. ʋ U+028B instead
of v) cause UnicodeEncodeError when httpx encodes the Authorization
header as ASCII.  This commonly happens when users copy-paste keys from
PDFs, rich-text editors, or web pages with decorative fonts.

Three layers of defense:

1. **Save-time validation** (hermes_cli/config.py):
   _check_non_ascii_credential() strips non-ASCII from credential values
   when saving to .env, with a clear warning explaining the issue.

2. **Load-time sanitization** (hermes_cli/env_loader.py):
   _sanitize_loaded_credentials() strips non-ASCII from credential env
   vars (those ending in _API_KEY, _TOKEN, _SECRET, _KEY) after dotenv
   loads them, so the rest of the codebase never sees non-ASCII keys.

3. **Runtime recovery** (run_agent.py):
   The UnicodeEncodeError recovery block now also sanitizes self.api_key
   and self._client_kwargs['api_key'], fixing the gap where message/tool
   sanitization succeeded but the API key still caused httpx to fail on
   the Authorization header.

Also: hermes_logging.py RotatingFileHandler now explicitly sets
encoding='utf-8' instead of relying on locale default (defensive
hardening for ASCII-locale systems).
2026-04-14 20:20:31 -07:00
kshitijk4poor 677f1227c3 fix: remove @staticmethod from _context_completions — crashes on @ mention
PR #9467 added a call to self._fuzzy_file_completions() inside
_context_completions(), but the method was still decorated with
@staticmethod and didn't receive self. Every @ mention in the input
triggers 'name self is not defined' from prompt_toolkit's async
completer, spamming the error on every keystroke.

Fix: remove @staticmethod, add self parameter. The method already uses
self._fuzzy_file_completions() and self._get_project_files() via that
call chain, so it was never meant to stay static after the fuzzy search
feature was added.
2026-04-14 19:43:42 -07:00
Teknium 4610551d74 fix: update stale comment referencing removed _sync_mcp_toolsets 2026-04-14 17:19:20 -07:00
Greer Guthrie 498cb7a0fc chore(release): map greer guthrie attribution 2026-04-14 17:19:20 -07:00
Greer Guthrie c10fea8d26 fix(mcp): make server aliases explicit 2026-04-14 17:19:20 -07:00
Greer Guthrie cda64a5961 fix(mcp): resolve toolsets from live registry 2026-04-14 17:19:20 -07:00
Teknium 2a98098035 fix: hermes gateway restart waits for service to come back up (#8260)
Previously, systemd_restart() sent SIGUSR1 to the gateway, printed
'restart requested', and returned immediately. The gateway still
needed to drain active agents, exit with code 75, wait for systemd's
RestartSec=30, and start the new process. The user saw 'success' but
the gateway was actually down for 30-60 seconds.

Now the SIGUSR1 path blocks with progress feedback:

Phase 1 — wait for old process to die:
   User service draining active work...
  Polls os.kill(pid, 0) until ProcessLookupError (up to 90s)

Phase 2 — wait for new process to become active:
   Waiting for hermes-gateway to restart...
  Polls systemctl is-active + verifies new PID (up to 60s)

Success:
  ✓ User service restarted (PID 12345)

Timeout:
  ⚠ User service did not become active within 60s.
    Check status: hermes gateway status
    Check logs: journalctl --user -u hermes-gateway --since '2 min ago'

The reload-or-restart fallback path (line 1189) already blocks because
systemctl reload-or-restart is synchronous.

Test plan:
- Updated test to verify wait-for-restart behavior
- All 118 gateway CLI tests pass
2026-04-14 17:12:58 -07:00
Teknium 6c89306437 fix: break stuck session resume loops after repeated restarts (#7536)
When a session gets stuck (hung terminal, runaway tool loop) and the
user restarts the gateway, the same session history loads and puts the
agent right back in the stuck state. The user is trapped in a loop:
restart → stuck → restart → stuck.

Fix: track restart-failure counts per session using a simple JSON file
(.restart_failure_counts). On each shutdown with active agents, the
counter increments for those sessions. On startup, if any session has
been active across 3+ consecutive restarts, it's auto-suspended —
giving the user a clean slate on their next message.

The counter resets to 0 when a session completes a turn successfully
(response delivered), so normal sessions that happen to be active
during planned restarts (/restart, hermes update) won't accumulate
false counts.

Implementation:
- _increment_restart_failure_counts(): called during stop() when
  agents are active. Writes {session_key: count} to JSON file.
  Sessions NOT active are dropped (loop broken).
- _suspend_stuck_loop_sessions(): called on startup. Reads the file,
  suspends sessions at threshold (3), clears the file.
- _clear_restart_failure_count(): called after successful response
  delivery. Removes the session from the counter file.

No SessionEntry schema changes. No database migration. Pure file-based
tracking that naturally cleans up.

Test plan:
- 9 new stuck-loop tests (increment, accumulate, threshold, clear,
  suspend, file cleanup, edge cases)
- All 28 gateway lifecycle tests pass (restart drain + auto-continue
  + stuck loop)
2026-04-14 17:08:35 -07:00
Teknium 847d7cbea5 fix: improve CLI text padding, word-wrap for responses and verbose tool output (#9920)
* feat(skills): add fitness-nutrition skill to optional-skills

Cherry-picked from PR #9177 by @haileymarshall.

Adds a fitness and nutrition skill for gym-goers and health-conscious users:
- Exercise search via wger API (690+ exercises, free, no auth)
- Nutrition lookup via USDA FoodData Central (380K+ foods, DEMO_KEY fallback)
- Offline body composition calculators (BMI, TDEE, 1RM, macros, body fat %)
- Pure stdlib Python, no pip dependencies

Changes from original PR:
- Moved from skills/ to optional-skills/health/ (correct location)
- Fixed BMR formula in FORMULAS.md (removed confusing -5+10, now just +5)
- Fixed author attribution to match PR submitter
- Marked USDA_API_KEY as optional (DEMO_KEY works without signup)

Also adds optional env var support to the skill readiness checker:
- New 'optional: true' field in required_environment_variables entries
- Optional vars are preserved in metadata but don't block skill readiness
- Optional vars skip the CLI capture prompt flow
- Skills with only optional missing vars show as 'available' not 'setup_needed'

* fix: increase CLI response text padding to 4-space tab indent

Increases horizontal padding on all response display paths:

- Rich Panel responses (main, background, /btw): padding (1,2) -> (1,4)
- Streaming text: add 4-space indent prefix to each line
- Streaming TTS: add 4-space indent prefix to sentences

Gives response text proper breathing room with a tab-width indent.
Rich Panel word wrapping automatically adjusts for the wider padding.

Requested by AriesTheCoder.

* fix: word-wrap verbose tool call args and results to terminal width

Verbose mode (tool_progress: verbose) printed tool args and results as
single unwrapped lines that could be thousands of characters long.

Adds _wrap_verbose() helper that:
- Pretty-prints JSON args with indent=2 instead of one-line dumps
- Splits text on existing newlines (preserves JSON/structured output)
- Wraps lines exceeding terminal width with 5-char continuation indent
- Uses break_long_words=True for URLs and paths without spaces

Applied to all 4 verbose print sites:
- Concurrent tool call args
- Concurrent tool results
- Sequential tool call args
- Sequential tool results

---------

Co-authored-by: haileymarshall <haileymarshall@users.noreply.github.com>
2026-04-14 16:58:23 -07:00
Teknium a9c78d0eb0 feat(setup): add recommendation badges to tool provider selection (#9929)
New users don't know which tool providers to pick during setup.
Add [badge] labels to each provider in the selection menu:

  - [★ recommended · free] for best default choices (Edge TTS, Local Browser)
  - [★ recommended] for top-tier paid options (Firecrawl Cloud)
  - [paid] for options requiring an API key
  - [free tier] for services with a free tier (Tavily)
  - [free · self-hosted] / [free · local] for self-run options
  - [subscription] for Nous subscription-managed options

Also improves vague tag descriptions — e.g. 'AI-native search and
contents' becomes 'Neural search with semantic understanding' and
Tavily gets '1000 free searches/mo'.

Both hermes setup and hermes tools share the same rendering path,
so badges appear in both flows.

Addresses user feedback about setup being confusing for newcomers.
2026-04-14 16:58:10 -07:00
Teknium e7475b1582 feat: auto-continue interrupted agent work after gateway restart (#4493)
When the gateway restarts mid-agent-work, the session transcript ends
on a tool result the agent never processed. Previously, the user had
to type 'continue' or use /retry (which replays from scratch, losing
all prior work).

Now, when the next user message arrives and the loaded history ends
with role='tool', a system note is prepended:

  [System note: Your previous turn was interrupted before you could
  process the last tool result(s). Please finish processing those
  results and summarize what was accomplished, then address the
  user's new message below.]

This is injected in _run_agent()'s run_sync closure, right before
calling agent.run_conversation(). The agent sees the full history
(including the pending tool results) and the system note, so it can
summarize what was accomplished and then handle the user's new input.

Design decisions:
- No new session flags or schema changes — purely detects trailing
  tool messages in the loaded history
- Works for any restart scenario (clean, crash, SIGTERM, drain timeout)
  as long as the session wasn't suspended (suspended = fresh start)
- The user's actual message is preserved after the note
- If the session WAS suspended (unclean shutdown), the old history is
  abandoned and the user starts fresh — no false auto-continue

Also updates the shutdown notification message from 'Use /retry after
restart to continue' to 'Send any message after restart to resume
where it left off' — which is now accurate.

Test plan:
- 6 new auto-continue tests (trailing tool detection, no false
  positives for assistant/user/empty history, multi-tool, message
  preservation)
- All 13 restart drain tests pass (updated /retry assertion)
2026-04-14 16:56:49 -07:00
Teknium ac1f8fcccd docs(termux): note browser tool PATH auto-discovery
Update the Termux guide to mention that the browser tool now
automatically discovers Termux directories, and add the missing
pkg install nodejs-lts step.
2026-04-14 16:55:55 -07:00
adybag14-cyber 56c34ac4f7 fix(browser): add termux PATH fallbacks
Refactor browser tool PATH construction to include Termux directories
(/data/data/com.termux/files/usr/bin, /data/data/com.termux/files/usr/sbin)
so agent-browser and npx are discoverable on Android/Termux.

Extracts _browser_candidate_path_dirs() and _merge_browser_path() helpers
to centralize PATH construction shared between _find_agent_browser() and
_run_browser_command(), replacing duplicated inline logic.

Also fixes os.pathsep usage (was hardcoded ':') for cross-platform correctness.

Cherry-picked from PR #9846.
2026-04-14 16:55:55 -07:00
Teknium 3ca7417c2a chore: add areu01or00 to AUTHOR_MAP 2026-04-14 16:55:48 -07:00
areu01or00 cfa24532d3 fix(discord): register native /restart slash command 2026-04-14 16:55:48 -07:00
Teknium b24e5ee4b0 feat(google-workspace): add --from flag for custom sender display name (#9931)
Adds --from flag to gmail send and gmail reply commands, allowing agents
to customize the From header display name when sharing the same email
account. Usage: --from '"Agent Name" <user@example.com>'

Also syncs repo google_api.py with the deployed standalone implementation
(replaces outdated gws_bridge thin wrapper), adds dedicated docs page
under Features > Skills, and updates sidebar navigation.

Requested by community user @Maxime44.
2026-04-14 16:55:34 -07:00
Julien Talbot 3b50821555 feat(xai): add xAI/Grok to provider prefix stripping
Add 'xai', 'x-ai', 'x.ai', 'grok' to _PROVIDER_PREFIXES so that
colon-prefixed model names (e.g. xai:grok-4.20) are stripped correctly
for context length lookups.

Cherry-picked from PR #9184 by @Julientalbot.
2026-04-14 16:43:42 -07:00
199 changed files with 19297 additions and 1704 deletions
+4
View File
@@ -145,6 +145,10 @@
# Only override here if you need to force a backend without touching config.yaml:
# TERMINAL_ENV=local
# Override the container runtime binary (e.g. to use Podman instead of Docker).
# Useful on systems where Docker's storage driver is broken or unavailable.
# HERMES_DOCKER_BINARY=/usr/local/bin/podman
# Container images (for singularity/docker/modal backends)
# TERMINAL_DOCKER_IMAGE=nikolaik/python-nodejs:python3.11-nodejs20
# TERMINAL_SINGULARITY_IMAGE=docker://nikolaik/python-nodejs:python3.11-nodejs20
+1
View File
@@ -105,3 +105,4 @@ tesseracttars-creator <tesseracttars@gmail.com> <tesseracttars@gmail.com>
xinbenlv <zzn+pa@zzn.im> <zzn+pa@zzn.im>
SaulJWu <saul.jj.wu@gmail.com> <saul.jj.wu@gmail.com>
angelos <angelos@oikos.lan.home.malaiwah.com> <angelos@oikos.lan.home.malaiwah.com>
MestreY0d4-Uninter <241404605+MestreY0d4-Uninter@users.noreply.github.com> <MestreY0d4-Uninter@users.noreply.github.com>
+4 -4
View File
@@ -13,7 +13,7 @@ source venv/bin/activate # ALWAYS activate before running Python
```
hermes-agent/
├── run_agent.py # AIAgent class — core conversation loop
├── model_tools.py # Tool orchestration, _discover_tools(), handle_function_call()
├── model_tools.py # Tool orchestration, discover_builtin_tools(), handle_function_call()
├── toolsets.py # Toolset definitions, _HERMES_CORE_TOOLS list
├── cli.py # HermesCLI class — interactive CLI orchestrator
├── hermes_state.py # SessionDB — SQLite session store (FTS5 search)
@@ -181,7 +181,7 @@ if canonical == "mycommand":
## Adding New Tools
Requires changes in **3 files**:
Requires changes in **2 files**:
**1. Create `tools/your_tool.py`:**
```python
@@ -204,9 +204,9 @@ registry.register(
)
```
**2. Add import** in `model_tools.py` `_discover_tools()` list.
**2. Add to `toolsets.py`** — either `_HERMES_CORE_TOOLS` (all platforms) or a new toolset.
**3. Add to `toolsets.py`** — either `_HERMES_CORE_TOOLS` (all platforms) or a new toolset.
Auto-discovery: any `tools/*.py` file with a top-level `registry.register()` call is imported automatically — no manual import list to maintain.
The registry handles schema collection, dispatch, availability checking, and error wrapping. All handlers MUST return a JSON string.
+84
View File
@@ -0,0 +1,84 @@
# Hermes Agent Security Policy
This document outlines the security protocols, trust model, and deployment hardening guidelines for the **Hermes Agent** project.
## 1. Vulnerability Reporting
Hermes Agent does **not** operate a bug bounty program. Security issues should be reported via [GitHub Security Advisories (GHSA)](https://github.com/NousResearch/hermes-agent/security/advisories/new) or by emailing **security@nousresearch.com**. Do not open public issues for security vulnerabilities.
### Required Submission Details
- **Title & Severity:** Concise description and CVSS score/rating.
- **Affected Component:** Exact file path and line range (e.g., `tools/approval.py:120-145`).
- **Environment:** Output of `hermes version`, commit SHA, OS, and Python version.
- **Reproduction:** Step-by-step Proof-of-Concept (PoC) against `main` or the latest release.
- **Impact:** Explanation of what trust boundary was crossed.
---
## 2. Trust Model
The core assumption is that Hermes is a **personal agent** with one trusted operator.
### Operator & Session Trust
- **Single Tenant:** The system protects the operator from LLM actions, not from malicious co-tenants. Multi-user isolation must happen at the OS/host level.
- **Gateway Security:** Authorized callers (Telegram, Discord, Slack, etc.) receive equal trust. Session keys are used for routing, not as authorization boundaries.
- **Execution:** Defaults to `terminal.backend: local` (direct host execution). Container isolation (Docker, Modal, Daytona) is opt-in for sandboxing.
### Dangerous Command Approval
The approval system (`tools/approval.py`) is a core security boundary. Terminal commands, file operations, and other potentially destructive actions are gated behind explicit user confirmation before execution. The approval mode is configurable via `approvals.mode` in `config.yaml`:
- `"on"` (default) — prompts the user to approve dangerous commands.
- `"auto"` — auto-approves after a configurable delay.
- `"off"` — disables the gate entirely (break-glass; see Section 3).
### Output Redaction
`agent/redact.py` strips secret-like patterns (API keys, tokens, credentials) from all display output before it reaches the terminal or gateway platform. This prevents accidental credential leakage in chat logs, tool previews, and response text. Redaction operates on the display layer only — underlying values remain intact for internal agent operations.
### Skills vs. MCP Servers
- **Installed Skills:** High trust. Equivalent to local host code; skills can read environment variables and run arbitrary commands.
- **MCP Servers:** Lower trust. MCP subprocesses receive a filtered environment (`_build_safe_env()` in `tools/mcp_tool.py`) — only safe baseline variables (`PATH`, `HOME`, `XDG_*`) plus variables explicitly declared in the server's `env` config block are passed through. Host credentials are stripped by default. Additionally, packages invoked via `npx`/`uvx` are checked against the OSV malware database before spawning.
### Code Execution Sandbox
The `execute_code` tool (`tools/code_execution_tool.py`) runs LLM-generated Python scripts in a child process with API keys and tokens stripped from the environment to prevent credential exfiltration. Only environment variables explicitly declared by loaded skills (via `env_passthrough`) or by the user in `config.yaml` (`terminal.env_passthrough`) are passed through. The child accesses Hermes tools via RPC, not direct API calls.
### Subagents
- **No recursive delegation:** The `delegate_task` tool is disabled for child agents.
- **Depth limit:** `MAX_DEPTH = 2` — parent (depth 0) can spawn a child (depth 1); grandchildren are rejected.
- **Memory isolation:** Subagents run with `skip_memory=True` and do not have access to the parent's persistent memory provider. The parent receives only the task prompt and final response as an observation.
---
## 3. Out of Scope (Non-Vulnerabilities)
The following scenarios are **not** considered security breaches:
- **Prompt Injection:** Unless it results in a concrete bypass of the approval system, toolset restrictions, or container sandbox.
- **Public Exposure:** Deploying the gateway to the public internet without external authentication or network protection.
- **Trusted State Access:** Reports that require pre-existing write access to `~/.hermes/`, `.env`, or `config.yaml` (these are operator-owned files).
- **Default Behavior:** Host-level command execution when `terminal.backend` is set to `local` — this is the documented default, not a vulnerability.
- **Configuration Trade-offs:** Intentional break-glass settings such as `approvals.mode: "off"` or `terminal.backend: local` in production.
- **Tool-level read/access restrictions:** The agent has unrestricted shell access via the `terminal` tool by design. Reports that a specific tool (e.g., `read_file`) can access a resource are not vulnerabilities if the same access is available through `terminal`. Tool-level deny lists only constitute a meaningful security boundary when paired with equivalent restrictions on the terminal side (as with write operations, where `WRITE_DENIED_PATHS` is paired with the dangerous command approval system).
---
## 4. Deployment Hardening & Best Practices
### Filesystem & Network
- **Production sandboxing:** Use container backends (`docker`, `modal`, `daytona`) instead of `local` for untrusted workloads.
- **File permissions:** Run as non-root (the Docker image uses UID 10000); protect credentials with `chmod 600 ~/.hermes/.env` on local installs.
- **Network exposure:** Do not expose the gateway or API server to the public internet without VPN, Tailscale, or firewall protection. SSRF protection is enabled by default across all gateway platform adapters (Telegram, Discord, Slack, Matrix, Mattermost, etc.) with redirect validation. Note: the local terminal backend does not apply SSRF filtering, as it operates within the trusted operator's environment.
### Skills & Supply Chain
- **Skill installation:** Review Skills Guard reports (`tools/skills_guard.py`) before installing third-party skills. The audit log at `~/.hermes/skills/.hub/audit.log` tracks every install and removal.
- **MCP safety:** OSV malware checking runs automatically for `npx`/`uvx` packages before MCP server processes are spawned.
- **CI/CD:** GitHub Actions are pinned to full commit SHAs. The `supply-chain-audit.yml` workflow blocks PRs containing `.pth` files or suspicious `base64`+`exec` patterns.
### Credential Storage
- API keys and tokens belong exclusively in `~/.hermes/.env` — never in `config.yaml` or checked into version control.
- The credential pool system (`agent/credential_pool.py`) handles key rotation and fallback. Credentials are resolved from environment variables, not stored in plaintext databases.
---
## 5. Disclosure Process
- **Coordinated Disclosure:** 90-day window or until a fix is released, whichever comes first.
- **Communication:** All updates occur via the GHSA thread or email correspondence with security@nousresearch.com.
- **Credits:** Reporters are credited in release notes unless anonymity is requested.
+27
View File
@@ -298,6 +298,33 @@ def build_anthropic_client(api_key: str, base_url: str = None):
return _anthropic_sdk.Anthropic(**kwargs)
def build_anthropic_bedrock_client(region: str):
"""Create an AnthropicBedrock client for Bedrock Claude models.
Uses the Anthropic SDK's native Bedrock adapter, which provides full
Claude feature parity: prompt caching, thinking budgets, adaptive
thinking, fast mode — features not available via the Converse API.
Auth uses the boto3 default credential chain (IAM roles, SSO, env vars).
"""
if _anthropic_sdk is None:
raise ImportError(
"The 'anthropic' package is required for the Bedrock provider. "
"Install it with: pip install 'anthropic>=0.39.0'"
)
if not hasattr(_anthropic_sdk, "AnthropicBedrock"):
raise ImportError(
"anthropic.AnthropicBedrock not available. "
"Upgrade with: pip install 'anthropic>=0.39.0'"
)
from httpx import Timeout
return _anthropic_sdk.AnthropicBedrock(
aws_region=region,
timeout=Timeout(timeout=900.0, connect=10.0),
)
def read_claude_code_credentials() -> Optional[Dict[str, Any]]:
"""Read refreshable Claude Code OAuth credentials from ~/.claude/.credentials.json.
+101 -18
View File
@@ -775,6 +775,21 @@ def _try_openrouter() -> Tuple[Optional[OpenAI], Optional[str]]:
def _try_nous(vision: bool = False) -> Tuple[Optional[OpenAI], Optional[str]]:
# Check cross-session rate limit guard before attempting Nous —
# if another session already recorded a 429, skip Nous entirely
# to avoid piling more requests onto the tapped RPH bucket.
try:
from agent.nous_rate_guard import nous_rate_limit_remaining
_remaining = nous_rate_limit_remaining()
if _remaining is not None and _remaining > 0:
logger.debug(
"Auxiliary: skipping Nous Portal (rate-limited, resets in %.0fs)",
_remaining,
)
return None, None
except Exception:
pass
nous = _read_nous_auth()
if not nous:
return None, None
@@ -899,6 +914,51 @@ def _current_custom_base_url() -> str:
return custom_base or ""
def _validate_proxy_env_urls() -> None:
"""Fail fast with a clear error when proxy env vars have malformed URLs.
Common cause: shell config (e.g. .zshrc) with a typo like
``export HTTP_PROXY=http://127.0.0.1:6153export NEXT_VAR=...``
which concatenates 'export' into the port number. Without this
check the OpenAI/httpx client raises a cryptic ``Invalid port``
error that doesn't name the offending env var.
"""
from urllib.parse import urlparse
for key in ("HTTPS_PROXY", "HTTP_PROXY", "ALL_PROXY",
"https_proxy", "http_proxy", "all_proxy"):
value = str(os.environ.get(key) or "").strip()
if not value:
continue
try:
parsed = urlparse(value)
if parsed.scheme:
_ = parsed.port # raises ValueError for e.g. '6153export'
except ValueError as exc:
raise RuntimeError(
f"Malformed proxy environment variable {key}={value!r}. "
"Fix or unset your proxy settings and try again."
) from exc
def _validate_base_url(base_url: str) -> None:
"""Reject obviously broken custom endpoint URLs before they reach httpx."""
from urllib.parse import urlparse
candidate = str(base_url or "").strip()
if not candidate or candidate.startswith("acp://"):
return
try:
parsed = urlparse(candidate)
if parsed.scheme in {"http", "https"}:
_ = parsed.port # raises ValueError for malformed ports
except ValueError as exc:
raise RuntimeError(
f"Malformed custom endpoint URL: {candidate!r}. "
"Run `hermes setup` or `hermes model` and enter a valid http(s) base URL."
) from exc
def _try_custom_endpoint() -> Tuple[Optional[OpenAI], Optional[str]]:
runtime = _resolve_custom_runtime()
if len(runtime) == 2:
@@ -1299,6 +1359,7 @@ def resolve_provider_client(
Returns:
(client, resolved_model) or (None, None) if auth is unavailable.
"""
_validate_proxy_env_urls()
# Normalise aliases
provider = _normalize_aux_provider(provider)
@@ -1835,9 +1896,15 @@ def auxiliary_max_tokens_param(value: int) -> dict:
# Every auxiliary LLM consumer should use these instead of manually
# constructing clients and calling .chat.completions.create().
# Client cache: (provider, async_mode, base_url, api_key) -> (client, default_model)
# Client cache: (provider, async_mode, base_url, api_key, api_mode, runtime_key) -> (client, default_model, loop)
# NOTE: loop identity is NOT part of the key. On async cache hits we check
# whether the cached loop is the *current* loop; if not, the stale entry is
# replaced in-place. This bounds cache growth to one entry per unique
# provider config rather than one per (config × event-loop), which previously
# caused unbounded fd accumulation in long-running gateway processes (#10200).
_client_cache: Dict[tuple, tuple] = {}
_client_cache_lock = threading.Lock()
_CLIENT_CACHE_MAX_SIZE = 64 # safety belt — evict oldest when exceeded
def neuter_async_httpx_del() -> None:
@@ -1970,39 +2037,49 @@ def _get_cached_client(
Async clients (AsyncOpenAI) use httpx.AsyncClient internally, which
binds to the event loop that was current when the client was created.
Using such a client on a *different* loop causes deadlocks or
RuntimeError. To prevent cross-loop issues (especially in gateway
mode where _run_async() may spawn fresh loops in worker threads), the
cache key for async clients includes the current event loop's identity
so each loop gets its own client instance.
RuntimeError. To prevent cross-loop issues, the cache validates on
every async hit that the cached loop is the *current, open* loop.
If the loop changed (e.g. a new gateway worker-thread loop), the stale
entry is replaced in-place rather than creating an additional entry.
This keeps cache size bounded to one entry per unique provider config,
preventing the fd-exhaustion that previously occurred in long-running
gateways where recycled worker threads created unbounded entries (#10200).
"""
# Include loop identity for async clients to prevent cross-loop reuse.
# httpx.AsyncClient (inside AsyncOpenAI) is bound to the loop where it
# was created — reusing it on a different loop causes deadlocks (#2681).
loop_id = 0
# Resolve the current event loop for async clients so we can validate
# cached entries. Loop identity is NOT in the cache key — instead we
# check at hit time whether the cached loop is still current and open.
# This prevents unbounded cache growth from recycled worker-thread loops
# while still guaranteeing we never reuse a client on the wrong loop
# (which causes deadlocks, see #2681).
current_loop = None
if async_mode:
try:
import asyncio as _aio
current_loop = _aio.get_event_loop()
loop_id = id(current_loop)
except RuntimeError:
pass
runtime = _normalize_main_runtime(main_runtime)
runtime_key = tuple(runtime.get(field, "") for field in _MAIN_RUNTIME_FIELDS) if provider == "auto" else ()
cache_key = (provider, async_mode, base_url or "", api_key or "", api_mode or "", loop_id, runtime_key)
cache_key = (provider, async_mode, base_url or "", api_key or "", api_mode or "", runtime_key)
with _client_cache_lock:
if cache_key in _client_cache:
cached_client, cached_default, cached_loop = _client_cache[cache_key]
if async_mode:
# A cached async client whose loop has been closed will raise
# "Event loop is closed" when httpx tries to clean up its
# transport. Discard the stale client and create a fresh one.
if cached_loop is not None and cached_loop.is_closed():
_force_close_async_httpx(cached_client)
del _client_cache[cache_key]
else:
# Validate: the cached client must be bound to the CURRENT,
# OPEN loop. If the loop changed or was closed, the httpx
# transport inside is dead — force-close and replace.
loop_ok = (
cached_loop is not None
and cached_loop is current_loop
and not cached_loop.is_closed()
)
if loop_ok:
effective = _compat_model(cached_client, model, cached_default)
return cached_client, effective
# Stale — evict and fall through to create a new client.
_force_close_async_httpx(cached_client)
del _client_cache[cache_key]
else:
effective = _compat_model(cached_client, model, cached_default)
return cached_client, effective
@@ -2022,6 +2099,12 @@ def _get_cached_client(
bound_loop = current_loop
with _client_cache_lock:
if cache_key not in _client_cache:
# Safety belt: if the cache has grown beyond the max, evict
# the oldest entries (FIFO — dict preserves insertion order).
while len(_client_cache) >= _CLIENT_CACHE_MAX_SIZE:
evict_key, evict_entry = next(iter(_client_cache.items()))
_force_close_async_httpx(evict_entry[0])
del _client_cache[evict_key]
_client_cache[cache_key] = (client, default_model, bound_loop)
else:
client, default_model, _ = _client_cache[cache_key]
File diff suppressed because it is too large Load Diff
+307 -36
View File
@@ -17,7 +17,10 @@ Improvements over v2:
- Richer tool call/result detail in summarizer input
"""
import hashlib
import json
import logging
import re
import time
from typing import Any, Dict, List, Optional
@@ -57,6 +60,128 @@ _CHARS_PER_TOKEN = 4
_SUMMARY_FAILURE_COOLDOWN_SECONDS = 600
def _summarize_tool_result(tool_name: str, tool_args: str, tool_content: str) -> str:
"""Create an informative 1-line summary of a tool call + result.
Used during the pre-compression pruning pass to replace large tool
outputs with a short but useful description of what the tool did,
rather than a generic placeholder that carries zero information.
Returns strings like::
[terminal] ran `npm test` -> exit 0, 47 lines output
[read_file] read config.py from line 1 (1,200 chars)
[search_files] content search for 'compress' in agent/ -> 12 matches
"""
try:
args = json.loads(tool_args) if tool_args else {}
except (json.JSONDecodeError, TypeError):
args = {}
content = tool_content or ""
content_len = len(content)
line_count = content.count("\n") + 1 if content.strip() else 0
if tool_name == "terminal":
cmd = args.get("command", "")
if len(cmd) > 80:
cmd = cmd[:77] + "..."
exit_match = re.search(r'"exit_code"\s*:\s*(-?\d+)', content)
exit_code = exit_match.group(1) if exit_match else "?"
return f"[terminal] ran `{cmd}` -> exit {exit_code}, {line_count} lines output"
if tool_name == "read_file":
path = args.get("path", "?")
offset = args.get("offset", 1)
return f"[read_file] read {path} from line {offset} ({content_len:,} chars)"
if tool_name == "write_file":
path = args.get("path", "?")
written_lines = args.get("content", "").count("\n") + 1 if args.get("content") else "?"
return f"[write_file] wrote to {path} ({written_lines} lines)"
if tool_name == "search_files":
pattern = args.get("pattern", "?")
path = args.get("path", ".")
target = args.get("target", "content")
match_count = re.search(r'"total_count"\s*:\s*(\d+)', content)
count = match_count.group(1) if match_count else "?"
return f"[search_files] {target} search for '{pattern}' in {path} -> {count} matches"
if tool_name == "patch":
path = args.get("path", "?")
mode = args.get("mode", "replace")
return f"[patch] {mode} in {path} ({content_len:,} chars result)"
if tool_name in ("browser_navigate", "browser_click", "browser_snapshot",
"browser_type", "browser_scroll", "browser_vision"):
url = args.get("url", "")
ref = args.get("ref", "")
detail = f" {url}" if url else (f" ref={ref}" if ref else "")
return f"[{tool_name}]{detail} ({content_len:,} chars)"
if tool_name == "web_search":
query = args.get("query", "?")
return f"[web_search] query='{query}' ({content_len:,} chars result)"
if tool_name == "web_extract":
urls = args.get("urls", [])
url_desc = urls[0] if isinstance(urls, list) and urls else "?"
if isinstance(urls, list) and len(urls) > 1:
url_desc += f" (+{len(urls) - 1} more)"
return f"[web_extract] {url_desc} ({content_len:,} chars)"
if tool_name == "delegate_task":
goal = args.get("goal", "")
if len(goal) > 60:
goal = goal[:57] + "..."
return f"[delegate_task] '{goal}' ({content_len:,} chars result)"
if tool_name == "execute_code":
code_preview = (args.get("code") or "")[:60].replace("\n", " ")
if len(args.get("code", "")) > 60:
code_preview += "..."
return f"[execute_code] `{code_preview}` ({line_count} lines output)"
if tool_name in ("skill_view", "skills_list", "skill_manage"):
name = args.get("name", "?")
return f"[{tool_name}] name={name} ({content_len:,} chars)"
if tool_name == "vision_analyze":
question = args.get("question", "")[:50]
return f"[vision_analyze] '{question}' ({content_len:,} chars)"
if tool_name == "memory":
action = args.get("action", "?")
target = args.get("target", "?")
return f"[memory] {action} on {target}"
if tool_name == "todo":
return "[todo] updated task list"
if tool_name == "clarify":
return "[clarify] asked user a question"
if tool_name == "text_to_speech":
return f"[text_to_speech] generated audio ({content_len:,} chars)"
if tool_name == "cronjob":
action = args.get("action", "?")
return f"[cronjob] {action}"
if tool_name == "process":
action = args.get("action", "?")
sid = args.get("session_id", "?")
return f"[process] {action} session={sid}"
# Generic fallback
first_arg = ""
for k, v in list(args.items())[:2]:
sv = str(v)[:40]
first_arg += f" {k}={sv}"
return f"[{tool_name}]{first_arg} ({content_len:,} chars result)"
class ContextCompressor(ContextEngine):
"""Default context engine — compresses conversation context via lossy summarization.
@@ -78,6 +203,8 @@ class ContextCompressor(ContextEngine):
self._context_probed = False
self._context_probe_persistable = False
self._previous_summary = None
self._last_compression_savings_pct = 100.0
self._ineffective_compression_count = 0
def update_model(
self,
@@ -167,6 +294,9 @@ class ContextCompressor(ContextEngine):
# Stores the previous compaction summary for iterative updates
self._previous_summary: Optional[str] = None
# Anti-thrashing: track whether last compression was effective
self._last_compression_savings_pct: float = 100.0
self._ineffective_compression_count: int = 0
self._summary_failure_cooldown_until: float = 0.0
def update_from_response(self, usage: Dict[str, Any]):
@@ -175,9 +305,26 @@ class ContextCompressor(ContextEngine):
self.last_completion_tokens = usage.get("completion_tokens", 0)
def should_compress(self, prompt_tokens: int = None) -> bool:
"""Check if context exceeds the compression threshold."""
"""Check if context exceeds the compression threshold.
Includes anti-thrashing protection: if the last two compressions
each saved less than 10%, skip compression to avoid infinite loops
where each pass removes only 1-2 messages.
"""
tokens = prompt_tokens if prompt_tokens is not None else self.last_prompt_tokens
return tokens >= self.threshold_tokens
if tokens < self.threshold_tokens:
return False
# Anti-thrashing: back off if recent compressions were ineffective
if self._ineffective_compression_count >= 2:
if not self.quiet_mode:
logger.warning(
"Compression skipped — last %d compressions saved <10%% each. "
"Consider /new to start a fresh session, or /compress <topic> "
"for focused compression.",
self._ineffective_compression_count,
)
return False
return True
# ------------------------------------------------------------------
# Tool output pruning (cheap pre-pass, no LLM call)
@@ -187,7 +334,16 @@ class ContextCompressor(ContextEngine):
self, messages: List[Dict[str, Any]], protect_tail_count: int,
protect_tail_tokens: int | None = None,
) -> tuple[List[Dict[str, Any]], int]:
"""Replace old tool result contents with a short placeholder.
"""Replace old tool result contents with informative 1-line summaries.
Instead of a generic placeholder, generates a summary like::
[terminal] ran `npm test` -> exit 0, 47 lines output
[read_file] read config.py from line 1 (3,400 chars)
Also deduplicates identical tool results (e.g. reading the same file
5x keeps only the newest full copy) and truncates large tool_call
arguments in assistant messages outside the protected tail.
Walks backward from the end, protecting the most recent messages that
fall within ``protect_tail_tokens`` (when provided) OR the last
@@ -203,6 +359,22 @@ class ContextCompressor(ContextEngine):
result = [m.copy() for m in messages]
pruned = 0
# Build index: tool_call_id -> (tool_name, arguments_json)
call_id_to_tool: Dict[str, tuple] = {}
for msg in result:
if msg.get("role") == "assistant":
for tc in msg.get("tool_calls") or []:
if isinstance(tc, dict):
cid = tc.get("id", "")
fn = tc.get("function", {})
call_id_to_tool[cid] = (fn.get("name", "unknown"), fn.get("arguments", ""))
else:
cid = getattr(tc, "id", "") or ""
fn = getattr(tc, "function", None)
name = getattr(fn, "name", "unknown") if fn else "unknown"
args_str = getattr(fn, "arguments", "") if fn else ""
call_id_to_tool[cid] = (name, args_str)
# Determine the prune boundary
if protect_tail_tokens is not None and protect_tail_tokens > 0:
# Token-budget approach: walk backward accumulating tokens
@@ -211,7 +383,8 @@ class ContextCompressor(ContextEngine):
min_protect = min(protect_tail_count, len(result) - 1)
for i in range(len(result) - 1, -1, -1):
msg = result[i]
content_len = len(msg.get("content") or "")
raw_content = msg.get("content") or ""
content_len = sum(len(p.get("text", "")) for p in raw_content) if isinstance(raw_content, list) else len(raw_content)
msg_tokens = content_len // _CHARS_PER_TOKEN + 10
for tc in msg.get("tool_calls") or []:
if isinstance(tc, dict):
@@ -226,18 +399,69 @@ class ContextCompressor(ContextEngine):
else:
prune_boundary = len(result) - protect_tail_count
# Pass 1: Deduplicate identical tool results.
# When the same file is read multiple times, keep only the most recent
# full copy and replace older duplicates with a back-reference.
content_hashes: dict = {} # hash -> (index, tool_call_id)
for i in range(len(result) - 1, -1, -1):
msg = result[i]
if msg.get("role") != "tool":
continue
content = msg.get("content") or ""
# Skip multimodal content (list of content blocks)
if isinstance(content, list):
continue
if len(content) < 200:
continue
h = hashlib.md5(content.encode("utf-8", errors="replace")).hexdigest()[:12]
if h in content_hashes:
# This is an older duplicate — replace with back-reference
result[i] = {**msg, "content": "[Duplicate tool output — same content as a more recent call]"}
pruned += 1
else:
content_hashes[h] = (i, msg.get("tool_call_id", "?"))
# Pass 2: Replace old tool results with informative summaries
for i in range(prune_boundary):
msg = result[i]
if msg.get("role") != "tool":
continue
content = msg.get("content", "")
# Skip multimodal content (list of content blocks)
if isinstance(content, list):
continue
if not content or content == _PRUNED_TOOL_PLACEHOLDER:
continue
# Skip already-deduplicated or previously-summarized results
if content.startswith("[Duplicate tool output"):
continue
# Only prune if the content is substantial (>200 chars)
if len(content) > 200:
result[i] = {**msg, "content": _PRUNED_TOOL_PLACEHOLDER}
call_id = msg.get("tool_call_id", "")
tool_name, tool_args = call_id_to_tool.get(call_id, ("unknown", ""))
summary = _summarize_tool_result(tool_name, tool_args, content)
result[i] = {**msg, "content": summary}
pruned += 1
# Pass 3: Truncate large tool_call arguments in assistant messages
# outside the protected tail. write_file with 50KB content, for
# example, survives pruning entirely without this.
for i in range(prune_boundary):
msg = result[i]
if msg.get("role") != "assistant" or not msg.get("tool_calls"):
continue
new_tcs = []
modified = False
for tc in msg["tool_calls"]:
if isinstance(tc, dict):
args = tc.get("function", {}).get("arguments", "")
if len(args) > 500:
tc = {**tc, "function": {**tc["function"], "arguments": args[:200] + "...[truncated]"}}
modified = True
new_tcs.append(tc)
if modified:
result[i] = {**msg, "tool_calls": new_tcs}
return result, pruned
# ------------------------------------------------------------------
@@ -357,29 +581,37 @@ class ContextCompressor(ContextEngine):
)
# Shared structured template (used by both paths).
# Key changes vs v1:
# - "Pending User Asks" section (from Claude Code) explicitly tracks
# unanswered questions so the model knows what's resolved vs open
# - "Remaining Work" replaces "Next Steps" to avoid reading as active
# instructions
# - "Resolved Questions" makes it clear which questions were already
# answered (prevents model from re-answering them)
_template_sections = f"""## Goal
[What the user is trying to accomplish]
## Constraints & Preferences
[User preferences, coding style, constraints, important decisions]
## Progress
### Done
[Completed work — include specific file paths, commands run, results obtained]
### In Progress
[Work currently underway]
### Blocked
[Any blockers or issues encountered]
## Completed Actions
[Numbered list of concrete actions taken — include tool used, target, and outcome.
Format each as: N. ACTION target — outcome [tool: name]
Example:
1. READ config.py:45 — found `==` should be `!=` [tool: read_file]
2. PATCH config.py:45 — changed `==` to `!=` [tool: patch]
3. TEST `pytest tests/` — 3/50 failed: test_parse, test_validate, test_edge [tool: terminal]
Be specific with file paths, commands, line numbers, and results.]
## Active State
[Current working state — include:
- Working directory and branch (if applicable)
- Modified/created files with brief note on each
- Test status (X/Y passing)
- Any running processes or servers
- Environment details that matter]
## In Progress
[Work currently underway — what was being done when compaction fired]
## Blocked
[Any blockers, errors, or issues not yet resolved. Include exact error messages.]
## Key Decisions
[Important technical decisions and why they were made]
[Important technical decisions and WHY they were made]
## Resolved Questions
[Questions the user asked that were ALREADY answered — include the answer so the next assistant does not re-answer them]
@@ -396,10 +628,7 @@ class ContextCompressor(ContextEngine):
## Critical Context
[Any specific values, error messages, configuration details, or data that would be lost without explicit preservation]
## Tools & Patterns
[Which tools were used, how they were used effectively, and any tool-specific discoveries]
Target ~{summary_budget} tokens. Be specific — include file paths, command outputs, error messages, and concrete values rather than vague descriptions.
Target ~{summary_budget} tokens. Be CONCRETE — include file paths, command outputs, error messages, line numbers, and specific values. Avoid vague descriptions like "made some changes" — say exactly what changed.
Write only the summary body. Do not include any preamble or prefix."""
@@ -415,7 +644,7 @@ PREVIOUS SUMMARY:
NEW TURNS TO INCORPORATE:
{content_to_summarize}
Update the summary using this exact structure. PRESERVE all existing information that is still relevant. ADD new progress. Move items from "In Progress" to "Done" when completed. Move answered questions to "Resolved Questions". Remove information only if it is clearly obsolete.
Update the summary using this exact structure. PRESERVE all existing information that is still relevant. ADD new completed actions to the numbered list (continue numbering). Move items from "In Progress" to "Completed Actions" when done. Move answered questions to "Resolved Questions". Update "Active State" to reflect current state. Remove information only if it is clearly obsolete.
{_template_sections}"""
else:
@@ -450,7 +679,7 @@ The user has requested that this compaction PRIORITISE preserving all informatio
"api_mode": self.api_mode,
},
"messages": [{"role": "user", "content": prompt}],
"max_tokens": summary_budget * 2,
"max_tokens": int(summary_budget * 1.3),
# timeout resolved from auxiliary.compression.timeout config by call_llm
}
if self.summary_model:
@@ -464,8 +693,10 @@ The user has requested that this compaction PRIORITISE preserving all informatio
# Store for iterative updates on next compaction
self._previous_summary = summary
self._summary_failure_cooldown_until = 0.0
self._summary_model_fallen_back = False
return self._with_summary_prefix(summary)
except RuntimeError:
# No provider configured — long cooldown, unlikely to self-resolve
self._summary_failure_cooldown_until = time.monotonic() + _SUMMARY_FAILURE_COOLDOWN_SECONDS
logging.warning("Context compression: no provider available for "
"summary. Middle turns will be dropped without summary "
@@ -473,12 +704,42 @@ The user has requested that this compaction PRIORITISE preserving all informatio
_SUMMARY_FAILURE_COOLDOWN_SECONDS)
return None
except Exception as e:
self._summary_failure_cooldown_until = time.monotonic() + _SUMMARY_FAILURE_COOLDOWN_SECONDS
# If the summary model is different from the main model and the
# error looks permanent (model not found, 503, 404), fall back to
# using the main model instead of entering cooldown that leaves
# context growing unbounded. (#8620 sub-issue 4)
_status = getattr(e, "status_code", None) or getattr(getattr(e, "response", None), "status_code", None)
_err_str = str(e).lower()
_is_model_not_found = (
_status in (404, 503)
or "model_not_found" in _err_str
or "does not exist" in _err_str
or "no available channel" in _err_str
)
if (
_is_model_not_found
and self.summary_model
and self.summary_model != self.model
and not getattr(self, "_summary_model_fallen_back", False)
):
self._summary_model_fallen_back = True
logging.warning(
"Summary model '%s' not available (%s). "
"Falling back to main model '%s' for compression.",
self.summary_model, e, self.model,
)
self.summary_model = "" # empty = use main model
self._summary_failure_cooldown_until = 0.0 # no cooldown
return self._generate_summary(messages, summary_budget) # retry immediately
# Transient errors (timeout, rate limit, network) — shorter cooldown
_transient_cooldown = 60
self._summary_failure_cooldown_until = time.monotonic() + _transient_cooldown
logging.warning(
"Failed to generate context summary: %s. "
"Further summary attempts paused for %d seconds.",
e,
_SUMMARY_FAILURE_COOLDOWN_SECONDS,
_transient_cooldown,
)
return None
@@ -744,11 +1005,11 @@ The user has requested that this compaction PRIORITISE preserving all informatio
compressed = []
for i in range(compress_start):
msg = messages[i].copy()
if i == 0 and msg.get("role") == "system" and self.compression_count == 0:
msg["content"] = (
(msg.get("content") or "")
+ "\n\n[Note: Some earlier conversation turns have been compacted into a handoff summary to preserve context space. The current session state may still reflect earlier work, so build on that summary and state rather than re-doing work.]"
)
if i == 0 and msg.get("role") == "system":
existing = msg.get("content") or ""
_compression_note = "[Note: Some earlier conversation turns have been compacted into a handoff summary to preserve context space. The current session state may still reflect earlier work, so build on that summary and state rather than re-doing work.]"
if _compression_note not in existing:
msg["content"] = existing + "\n\n" + _compression_note
compressed.append(msg)
# If LLM summary failed, insert a static fallback so the model
@@ -806,14 +1067,24 @@ The user has requested that this compaction PRIORITISE preserving all informatio
compressed = self._sanitize_tool_pairs(compressed)
new_estimate = estimate_messages_tokens_rough(compressed)
saved_estimate = display_tokens - new_estimate
# Anti-thrashing: track compression effectiveness
savings_pct = (saved_estimate / display_tokens * 100) if display_tokens > 0 else 0
self._last_compression_savings_pct = savings_pct
if savings_pct < 10:
self._ineffective_compression_count += 1
else:
self._ineffective_compression_count = 0
if not self.quiet_mode:
new_estimate = estimate_messages_tokens_rough(compressed)
saved_estimate = display_tokens - new_estimate
logger.info(
"Compressed: %d -> %d messages (~%d tokens saved)",
"Compressed: %d -> %d messages (~%d tokens saved, %.0f%%)",
n_messages,
len(compressed),
saved_estimate,
savings_pct,
)
logger.info("Compression #%d complete", self.compression_count)
+2
View File
@@ -1162,6 +1162,7 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup
if token:
source_name = "gh_cli" if "gh" in source.lower() else f"env:{source}"
active_sources.add(source_name)
pconfig = PROVIDER_REGISTRY.get(provider)
changed |= _upsert_entry(
entries,
provider,
@@ -1170,6 +1171,7 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup
"source": source_name,
"auth_type": AUTH_TYPE_API_KEY,
"access_token": token,
"base_url": pconfig.inference_base_url if pconfig else "",
"label": source,
},
)
+9
View File
@@ -112,6 +112,10 @@ _RATE_LIMIT_PATTERNS = [
"please retry after",
"resource_exhausted",
"rate increased too quickly", # Alibaba/DashScope throttling
# AWS Bedrock throttling
"throttlingexception",
"too many concurrent requests",
"servicequotaexceededexception",
]
# Usage-limit patterns that need disambiguation (could be billing OR rate_limit)
@@ -171,6 +175,11 @@ _CONTEXT_OVERFLOW_PATTERNS = [
# Chinese error messages (some providers return these)
"超过最大长度",
"上下文长度",
# AWS Bedrock Converse API error patterns
"input is too long",
"max input token",
"input token",
"exceeds the maximum number of input tokens",
]
# Model not found patterns
+14 -2
View File
@@ -28,6 +28,7 @@ Usage in run_agent.py:
from __future__ import annotations
import json
import logging
import re
from typing import Any, Dict, List, Optional
@@ -43,11 +44,22 @@ logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
_FENCE_TAG_RE = re.compile(r'</?\s*memory-context\s*>', re.IGNORECASE)
_INTERNAL_CONTEXT_RE = re.compile(
r'<\s*memory-context\s*>[\s\S]*?</\s*memory-context\s*>',
re.IGNORECASE,
)
_INTERNAL_NOTE_RE = re.compile(
r'\[System note:\s*The following is recalled memory context,\s*NOT new user input\.\s*Treat as informational background data\.\]\s*',
re.IGNORECASE,
)
def sanitize_context(text: str) -> str:
"""Strip fence-escape sequences from provider output."""
return _FENCE_TAG_RE.sub('', text)
"""Strip fence tags, injected context blocks, and system notes from provider output."""
text = _INTERNAL_CONTEXT_RE.sub('', text)
text = _INTERNAL_NOTE_RE.sub('', text)
text = _FENCE_TAG_RE.sub('', text)
return text
def build_memory_context_block(raw_context: str) -> str:
+11
View File
@@ -36,6 +36,7 @@ _PROVIDER_PREFIXES: frozenset[str] = frozenset({
"opencode", "zen", "go", "vercel", "kilo", "dashscope", "aliyun", "qwen",
"mimo", "xiaomi-mimo",
"arcee-ai", "arceeai",
"xai", "x-ai", "x.ai", "grok",
"qwen-portal",
})
@@ -1011,6 +1012,16 @@ def get_model_context_length(
if ctx:
return ctx
# 4b. AWS Bedrock — use static context length table.
# Bedrock's ListFoundationModels doesn't expose context window sizes,
# so we maintain a curated table in bedrock_adapter.py.
if provider == "bedrock" or (base_url and "bedrock-runtime" in base_url):
try:
from agent.bedrock_adapter import get_bedrock_context_length
return get_bedrock_context_length(model)
except ImportError:
pass # boto3 not installed — fall through to generic resolution
# 5. Provider-aware lookups (before generic OpenRouter cache)
# These are provider-specific and take priority over the generic OR cache,
# since the same model can have different context limits per provider
+182
View File
@@ -0,0 +1,182 @@
"""Cross-session rate limit guard for Nous Portal.
Writes rate limit state to a shared file so all sessions (CLI, gateway,
cron, auxiliary) can check whether Nous Portal is currently rate-limited
before making requests. Prevents retry amplification when RPH is tapped.
Each 429 from Nous triggers up to 9 API calls per conversation turn
(3 SDK retries x 3 Hermes retries), and every one of those calls counts
against RPH. By recording the rate limit state on first 429 and checking
it before subsequent attempts, we eliminate the amplification effect.
"""
from __future__ import annotations
import json
import logging
import os
import tempfile
import time
from typing import Any, Mapping, Optional
logger = logging.getLogger(__name__)
_STATE_SUBDIR = "rate_limits"
_STATE_FILENAME = "nous.json"
def _state_path() -> str:
"""Return the path to the Nous rate limit state file."""
try:
from hermes_constants import get_hermes_home
base = get_hermes_home()
except ImportError:
base = os.path.join(os.path.expanduser("~"), ".hermes")
return os.path.join(base, _STATE_SUBDIR, _STATE_FILENAME)
def _parse_reset_seconds(headers: Optional[Mapping[str, str]]) -> Optional[float]:
"""Extract the best available reset-time estimate from response headers.
Priority:
1. x-ratelimit-reset-requests-1h (hourly RPH window — most useful)
2. x-ratelimit-reset-requests (per-minute RPM window)
3. retry-after (generic HTTP header)
Returns seconds-from-now, or None if no usable header found.
"""
if not headers:
return None
lowered = {k.lower(): v for k, v in headers.items()}
for key in (
"x-ratelimit-reset-requests-1h",
"x-ratelimit-reset-requests",
"retry-after",
):
raw = lowered.get(key)
if raw is not None:
try:
val = float(raw)
if val > 0:
return val
except (TypeError, ValueError):
pass
return None
def record_nous_rate_limit(
*,
headers: Optional[Mapping[str, str]] = None,
error_context: Optional[dict[str, Any]] = None,
default_cooldown: float = 300.0,
) -> None:
"""Record that Nous Portal is rate-limited.
Parses the reset time from response headers or error context.
Falls back to ``default_cooldown`` (5 minutes) if no reset info
is available. Writes to a shared file that all sessions can read.
Args:
headers: HTTP response headers from the 429 error.
error_context: Structured error context from _extract_api_error_context().
default_cooldown: Fallback cooldown in seconds when no header data.
"""
now = time.time()
reset_at = None
# Try headers first (most accurate)
header_seconds = _parse_reset_seconds(headers)
if header_seconds is not None:
reset_at = now + header_seconds
# Try error_context reset_at (from body parsing)
if reset_at is None and isinstance(error_context, dict):
ctx_reset = error_context.get("reset_at")
if isinstance(ctx_reset, (int, float)) and ctx_reset > now:
reset_at = float(ctx_reset)
# Default cooldown
if reset_at is None:
reset_at = now + default_cooldown
path = _state_path()
try:
state_dir = os.path.dirname(path)
os.makedirs(state_dir, exist_ok=True)
state = {
"reset_at": reset_at,
"recorded_at": now,
"reset_seconds": reset_at - now,
}
# Atomic write: write to temp file + rename
fd, tmp_path = tempfile.mkstemp(dir=state_dir, suffix=".tmp")
try:
with os.fdopen(fd, "w") as f:
json.dump(state, f)
os.replace(tmp_path, path)
except Exception:
# Clean up temp file on failure
try:
os.unlink(tmp_path)
except OSError:
pass
raise
logger.info(
"Nous rate limit recorded: resets in %.0fs (at %.0f)",
reset_at - now, reset_at,
)
except Exception as exc:
logger.debug("Failed to write Nous rate limit state: %s", exc)
def nous_rate_limit_remaining() -> Optional[float]:
"""Check if Nous Portal is currently rate-limited.
Returns:
Seconds remaining until reset, or None if not rate-limited.
"""
path = _state_path()
try:
with open(path) as f:
state = json.load(f)
reset_at = state.get("reset_at", 0)
remaining = reset_at - time.time()
if remaining > 0:
return remaining
# Expired — clean up
try:
os.unlink(path)
except OSError:
pass
return None
except (FileNotFoundError, json.JSONDecodeError, KeyError, TypeError):
return None
def clear_nous_rate_limit() -> None:
"""Clear the rate limit state (e.g., after a successful Nous request)."""
try:
os.unlink(_state_path())
except FileNotFoundError:
pass
except OSError as exc:
logger.debug("Failed to clear Nous rate limit state: %s", exc)
def format_remaining(seconds: float) -> str:
"""Format seconds remaining into human-readable duration."""
s = max(0, int(seconds))
if s < 60:
return f"{s}s"
if s < 3600:
m, sec = divmod(s, 60)
return f"{m}m {sec}s" if sec else f"{m}m"
h, remainder = divmod(s, 3600)
m = remainder // 60
return f"{h}h {m}m" if m else f"{h}h"
+3 -1
View File
@@ -295,7 +295,9 @@ PLATFORM_HINTS = {
),
"telegram": (
"You are on a text messaging communication platform, Telegram. "
"Please do not use markdown as it does not render. "
"Standard markdown is automatically converted to Telegram format. "
"Supported: **bold**, *italic*, ~~strikethrough~~, ||spoiler||, "
"`inline code`, ```code blocks```, [links](url), and ## headers. "
"You can send media files natively: to deliver a file to the user, "
"include MEDIA:/absolute/path/to/file in your response. Images "
"(.png, .jpg, .webp) appear as photos, audio (.ogg) sends as voice "
+17
View File
@@ -93,6 +93,17 @@ _DB_CONNSTR_RE = re.compile(
re.IGNORECASE,
)
# JWT tokens: header.payload[.signature] — always start with "eyJ" (base64 for "{")
# Matches 1-part (header only), 2-part (header.payload), and full 3-part JWTs.
_JWT_RE = re.compile(
r"eyJ[A-Za-z0-9_-]{10,}" # Header (always starts with eyJ)
r"(?:\.[A-Za-z0-9_=-]{4,}){0,2}" # Optional payload and/or signature
)
# Discord user/role mentions: <@123456789012345678> or <@!123456789012345678>
# Snowflake IDs are 17-20 digit integers that resolve to specific Discord accounts.
_DISCORD_MENTION_RE = re.compile(r"<@!?(\d{17,20})>")
# E.164 phone numbers: +<country><number>, 7-15 digits
# Negative lookahead prevents matching hex strings or identifiers
_SIGNAL_PHONE_RE = re.compile(r"(\+[1-9]\d{6,14})(?![A-Za-z0-9])")
@@ -159,6 +170,12 @@ def redact_sensitive_text(text: str) -> str:
# Database connection string passwords
text = _DB_CONNSTR_RE.sub(lambda m: f"{m.group(1)}***{m.group(3)}", text)
# JWT tokens (eyJ... — base64-encoded JSON headers)
text = _JWT_RE.sub(lambda m: _mask_token(m.group(0)), text)
# Discord user/role mentions (<@snowflake_id>)
text = _DISCORD_MENTION_RE.sub(lambda m: f"<@{'!' if '!' in m.group(0) else ''}***>", text)
# E.164 phone numbers (Signal, WhatsApp)
def _redact_phone(m):
phone = m.group(1)
+11 -2
View File
@@ -12,6 +12,8 @@ from datetime import datetime
from pathlib import Path
from typing import Any, Dict, Optional
from hermes_constants import display_hermes_home
logger = logging.getLogger(__name__)
_skill_commands: Dict[str, Dict[str, Any]] = {}
@@ -70,7 +72,14 @@ def _load_skill_payload(skill_identifier: str, task_id: str | None = None) -> tu
skill_name = str(loaded_skill.get("name") or normalized)
skill_path = str(loaded_skill.get("path") or "")
skill_dir = None
if skill_path:
# Prefer the absolute skill_dir returned by skill_view() — this is
# correct for both local and external skills. Fall back to the old
# SKILLS_DIR-relative reconstruction only when skill_dir is absent
# (e.g. legacy skill_view responses).
abs_skill_dir = loaded_skill.get("skill_dir")
if abs_skill_dir:
skill_dir = Path(abs_skill_dir)
elif skill_path:
try:
skill_dir = SKILLS_DIR / Path(skill_path).parent
except Exception:
@@ -108,7 +117,7 @@ def _inject_skill_config(loaded_skill: dict[str, Any], parts: list[str]) -> None
if not resolved:
return
lines = ["", "[Skill config (from ~/.hermes/config.yaml):"]
lines = ["", f"[Skill config (from {display_hermes_home()}/config.yaml):"]
for key, value in resolved.items():
display_val = str(value) if value else "(not set)"
lines.append(f" {key} = {display_val}")
+74
View File
@@ -284,6 +284,80 @@ _OFFICIAL_DOCS_PRICING: Dict[tuple[str, str], PricingEntry] = {
source_url="https://ai.google.dev/pricing",
pricing_version="google-pricing-2026-03-16",
),
# AWS Bedrock — pricing per the Bedrock pricing page.
# Bedrock charges the same per-token rates as the model provider but
# through AWS billing. These are the on-demand prices (no commitment).
# Source: https://aws.amazon.com/bedrock/pricing/
(
"bedrock",
"anthropic.claude-opus-4-6",
): PricingEntry(
input_cost_per_million=Decimal("15.00"),
output_cost_per_million=Decimal("75.00"),
source="official_docs_snapshot",
source_url="https://aws.amazon.com/bedrock/pricing/",
pricing_version="bedrock-pricing-2026-04",
),
(
"bedrock",
"anthropic.claude-sonnet-4-6",
): PricingEntry(
input_cost_per_million=Decimal("3.00"),
output_cost_per_million=Decimal("15.00"),
source="official_docs_snapshot",
source_url="https://aws.amazon.com/bedrock/pricing/",
pricing_version="bedrock-pricing-2026-04",
),
(
"bedrock",
"anthropic.claude-sonnet-4-5",
): PricingEntry(
input_cost_per_million=Decimal("3.00"),
output_cost_per_million=Decimal("15.00"),
source="official_docs_snapshot",
source_url="https://aws.amazon.com/bedrock/pricing/",
pricing_version="bedrock-pricing-2026-04",
),
(
"bedrock",
"anthropic.claude-haiku-4-5",
): PricingEntry(
input_cost_per_million=Decimal("0.80"),
output_cost_per_million=Decimal("4.00"),
source="official_docs_snapshot",
source_url="https://aws.amazon.com/bedrock/pricing/",
pricing_version="bedrock-pricing-2026-04",
),
(
"bedrock",
"amazon.nova-pro",
): PricingEntry(
input_cost_per_million=Decimal("0.80"),
output_cost_per_million=Decimal("3.20"),
source="official_docs_snapshot",
source_url="https://aws.amazon.com/bedrock/pricing/",
pricing_version="bedrock-pricing-2026-04",
),
(
"bedrock",
"amazon.nova-lite",
): PricingEntry(
input_cost_per_million=Decimal("0.06"),
output_cost_per_million=Decimal("0.24"),
source="official_docs_snapshot",
source_url="https://aws.amazon.com/bedrock/pricing/",
pricing_version="bedrock-pricing-2026-04",
),
(
"bedrock",
"amazon.nova-micro",
): PricingEntry(
input_cost_per_million=Decimal("0.035"),
output_cost_per_million=Decimal("0.14"),
source="official_docs_snapshot",
source_url="https://aws.amazon.com/bedrock/pricing/",
pricing_version="bedrock-pricing-2026-04",
),
}
+13 -1
View File
@@ -16,7 +16,7 @@ model:
# "nous" - Nous Portal OAuth (requires: hermes login)
# "nous-api" - Nous Portal API key (requires: NOUS_API_KEY)
# "anthropic" - Direct Anthropic API (requires: ANTHROPIC_API_KEY)
# "openai-codex" - OpenAI Codex (requires: hermes login --provider openai-codex)
# "openai-codex" - OpenAI Codex (requires: hermes auth)
# "copilot" - GitHub Copilot / GitHub Models (requires: GITHUB_TOKEN)
# "gemini" - Use Google AI Studio direct (requires: GOOGLE_API_KEY or GEMINI_API_KEY)
# "zai" - Use z.ai / ZhipuAI GLM models (requires: GLM_API_KEY)
@@ -564,6 +564,18 @@ platform_toolsets:
homeassistant: [hermes-homeassistant]
qqbot: [hermes-qqbot]
# =============================================================================
# Gateway Platform Settings
# =============================================================================
# Optional per-platform messaging settings.
# Platform-specific knobs live under `extra`.
#
# platforms:
# telegram:
# reply_to_mode: "first" # off | first | all
# extra:
# disable_link_previews: false # Set true to suppress Telegram URL previews in bot messages
# ─────────────────────────────────────────────────────────────────────────────
# Available toolsets (use these names in platform_toolsets or the toolsets list)
#
+31 -33
View File
@@ -989,6 +989,7 @@ def _prune_orphaned_branches(repo_root: str) -> None:
_ACCENT_ANSI_DEFAULT = "\033[1;38;2;255;215;0m" # True-color #FFD700 bold — fallback
_BOLD = "\033[1m"
_RST = "\033[0m"
_STREAM_PAD = " " # 4-space indent for streamed response text (matches Panel padding)
def _hex_to_ansi(hex_color: str, *, bold: bool = False) -> str:
@@ -1712,9 +1713,9 @@ class HermesCLI:
# Parse and validate toolsets
self.enabled_toolsets = toolsets
if toolsets and "all" not in toolsets and "*" not in toolsets:
# Validate each toolset — MCP server names are added by
# _get_platform_tools() but aren't registered in TOOLSETS yet
# (that happens later in _sync_mcp_toolsets), so exclude them.
# Validate each toolset — MCP server names are resolved via
# live registry aliases (registered during discover_mcp_tools),
# but discovery hasn't run yet at this point, so exclude them.
mcp_names = set((CLI_CONFIG.get("mcp_servers") or {}).keys())
invalid = [t for t in toolsets if not validate_toolset(t) and t not in mcp_names]
if invalid:
@@ -2580,7 +2581,7 @@ class HermesCLI:
_tc = getattr(self, "_stream_text_ansi", "")
while "\n" in self._stream_buf:
line, self._stream_buf = self._stream_buf.split("\n", 1)
_cprint(f"{_tc}{line}{_RST}" if _tc else line)
_cprint(f"{_STREAM_PAD}{_tc}{line}{_RST}" if _tc else f"{_STREAM_PAD}{line}")
def _flush_stream(self) -> None:
"""Emit any remaining partial line from the stream buffer and close the box."""
@@ -2597,7 +2598,7 @@ class HermesCLI:
if self._stream_buf:
_tc = getattr(self, "_stream_text_ansi", "")
_cprint(f"{_tc}{self._stream_buf}{_RST}" if _tc else self._stream_buf)
_cprint(f"{_STREAM_PAD}{_tc}{self._stream_buf}{_RST}" if _tc else f"{_STREAM_PAD}{self._stream_buf}")
self._stream_buf = ""
# Close the response box
@@ -3896,23 +3897,14 @@ class HermesCLI:
def _handle_profile_command(self):
"""Display active profile name and home directory."""
from hermes_constants import get_hermes_home, display_hermes_home
from hermes_constants import display_hermes_home
from hermes_cli.profiles import get_active_profile_name
home = get_hermes_home()
display = display_hermes_home()
profiles_parent = Path.home() / ".hermes" / "profiles"
try:
rel = home.relative_to(profiles_parent)
profile_name = str(rel).split("/")[0]
except ValueError:
profile_name = None
profile_name = get_active_profile_name()
print()
if profile_name:
print(f" Profile: {profile_name}")
else:
print(" Profile: default")
print(f" Profile: {profile_name}")
print(f" Home: {display}")
print()
@@ -4099,6 +4091,8 @@ class HermesCLI:
self.agent.flush_memories(self.conversation_history)
except (Exception, KeyboardInterrupt):
pass
# Trigger memory extraction on the old session before session_id rotates.
self.agent.commit_memory_session(self.conversation_history)
self._notify_session_boundary("on_session_finalize")
elif self.agent:
# First session or empty history — still finalize the old session
@@ -4587,16 +4581,19 @@ class HermesCLI:
self._close_model_picker()
return
provider_data = providers[selected]
model_list = []
try:
from hermes_cli.models import provider_model_ids
live = provider_model_ids(provider_data["slug"])
if live:
model_list = live
except Exception:
pass
# Use the curated model list from list_authenticated_providers()
# (same lists as `hermes model` and gateway pickers).
# Only fall back to the live provider catalog when the curated
# list is empty (e.g. user-defined endpoints with no curated list).
model_list = provider_data.get("models", [])
if not model_list:
model_list = provider_data.get("models", [])
try:
from hermes_cli.models import provider_model_ids
live = provider_model_ids(provider_data["slug"])
if live:
model_list = live
except Exception:
pass
state["stage"] = "model"
state["provider_data"] = provider_data
state["model_list"] = model_list
@@ -5487,7 +5484,8 @@ class HermesCLI:
version = f" v{p['version']}" if p["version"] else ""
tools = f"{p['tools']} tools" if p["tools"] else ""
hooks = f"{p['hooks']} hooks" if p["hooks"] else ""
parts = [x for x in [tools, hooks] if x]
commands = f"{p['commands']} commands" if p.get("commands") else ""
parts = [x for x in [tools, hooks, commands] if x]
detail = f" ({', '.join(parts)})" if parts else ""
error = f"{p['error']}" if p["error"] else ""
print(f" {status} {p['name']}{version}{detail}{error}")
@@ -5761,7 +5759,7 @@ class HermesCLI:
border_style=_resp_color,
style=_resp_text,
box=rich_box.HORIZONTALS,
padding=(1, 2),
padding=(1, 4),
))
else:
_cprint(" (No response generated)")
@@ -5885,7 +5883,7 @@ class HermesCLI:
title_align="left",
border_style=_resp_color,
box=rich_box.HORIZONTALS,
padding=(1, 2),
padding=(1, 4),
))
else:
_cprint(" 💬 /btw: (no response)")
@@ -5952,7 +5950,7 @@ class HermesCLI:
parts = cmd.strip().split(None, 1)
sub = parts[1].lower().strip() if len(parts) > 1 else "status"
_DEFAULT_CDP = "http://localhost:9222"
_DEFAULT_CDP = "http://127.0.0.1:9222"
current = os.environ.get("BROWSER_CDP_URL", "").strip()
if sub.startswith("connect"):
@@ -7648,7 +7646,7 @@ class HermesCLI:
label = " ⚕ Hermes "
fill = w - 2 - len(label)
_cprint(f"\n{_ACCENT}╭─{label}{'' * max(fill - 1, 0)}{_RST}")
_cprint(sentence.rstrip())
_cprint(f"{_STREAM_PAD}{sentence.rstrip()}")
tts_thread = threading.Thread(
target=stream_tts_to_speaker,
@@ -7879,7 +7877,7 @@ class HermesCLI:
border_style=_resp_color,
style=_resp_text,
box=rich_box.HORIZONTALS,
padding=(1, 2),
padding=(1, 4),
))
+6
View File
@@ -501,6 +501,12 @@ def update_job(job_id: str, updates: Dict[str, Any]) -> Optional[Dict[str, Any]]
if schedule_changed:
updated_schedule = updated["schedule"]
# The API may pass schedule as a raw string (e.g. "every 10m")
# instead of a pre-parsed dict. Normalize it the same way
# create_job() does so downstream code can call .get() safely.
if isinstance(updated_schedule, str):
updated_schedule = parse_schedule(updated_schedule)
updated["schedule"] = updated_schedule
updated["schedule_display"] = updates.get(
"schedule_display",
updated_schedule.get("display", updated.get("schedule_display")),
+9 -2
View File
@@ -10,6 +10,7 @@ runs at a time if multiple processes overlap.
import asyncio
import concurrent.futures
import contextvars
import json
import logging
import os
@@ -288,11 +289,13 @@ def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> Option
if wrap_response:
task_name = job.get("name", job["id"])
job_id = job.get("id", "")
delivery_content = (
f"Cronjob Response: {task_name}\n"
f"(job_id: {job_id})\n"
f"-------------\n\n"
f"{content}\n\n"
f"Note: The agent cannot see this message, and therefore cannot respond to it."
f"To stop or manage this job, send me a new message (e.g. \"stop reminder {task_name}\")."
)
else:
delivery_content = content
@@ -768,7 +771,11 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
_cron_inactivity_limit = _cron_timeout if _cron_timeout > 0 else None
_POLL_INTERVAL = 5.0
_cron_pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
_cron_future = _cron_pool.submit(agent.run_conversation, prompt)
# Preserve scheduler-scoped ContextVar state (for example skill-declared
# env passthrough registrations) when the cron run hops into the worker
# thread used for inactivity timeout monitoring.
_cron_context = contextvars.copy_context()
_cron_future = _cron_pool.submit(_cron_context.run, agent.run_conversation, prompt)
_inactivity_timeout = False
try:
if _cron_inactivity_limit is None:
Regular → Executable
+13 -6
View File
@@ -1,13 +1,14 @@
#!/bin/bash
# Docker entrypoint: bootstrap config files into the mounted volume, then run hermes.
# Docker/Podman entrypoint: bootstrap config files into the mounted volume, then run hermes.
set -e
HERMES_HOME="/opt/data"
HERMES_HOME="${HERMES_HOME:-/opt/data}"
INSTALL_DIR="/opt/hermes"
# --- Privilege dropping via gosu ---
# When started as root (the default), optionally remap the hermes user/group
# to match host-side ownership, fix volume permissions, then re-exec as hermes.
# When started as root (the default for Docker, or fakeroot in rootless Podman),
# optionally remap the hermes user/group to match host-side ownership, fix volume
# permissions, then re-exec as hermes.
if [ "$(id -u)" = "0" ]; then
if [ -n "$HERMES_UID" ] && [ "$HERMES_UID" != "$(id -u hermes)" ]; then
echo "Changing hermes UID to $HERMES_UID"
@@ -16,13 +17,19 @@ if [ "$(id -u)" = "0" ]; then
if [ -n "$HERMES_GID" ] && [ "$HERMES_GID" != "$(id -g hermes)" ]; then
echo "Changing hermes GID to $HERMES_GID"
groupmod -g "$HERMES_GID" hermes
# -o allows non-unique GID (e.g. macOS GID 20 "staff" may already exist
# as "dialout" in the Debian-based container image)
groupmod -o -g "$HERMES_GID" hermes 2>/dev/null || true
fi
actual_hermes_uid=$(id -u hermes)
if [ "$(stat -c %u "$HERMES_HOME" 2>/dev/null)" != "$actual_hermes_uid" ]; then
echo "$HERMES_HOME is not owned by $actual_hermes_uid, fixing"
chown -R hermes:hermes "$HERMES_HOME"
# In rootless Podman the container's "root" is mapped to an unprivileged
# host UID — chown will fail. That's fine: the volume is already owned
# by the mapped user on the host side.
chown -R hermes:hermes "$HERMES_HOME" 2>/dev/null || \
echo "Warning: chown failed (rootless container?) — continuing anyway"
fi
echo "Dropping root privileges"
+18
View File
@@ -554,6 +554,12 @@ def load_gateway_config() -> GatewayConfig:
bridged["mention_patterns"] = platform_cfg["mention_patterns"]
if plat == Platform.DISCORD and "channel_skill_bindings" in platform_cfg:
bridged["channel_skill_bindings"] = platform_cfg["channel_skill_bindings"]
if "channel_prompts" in platform_cfg:
channel_prompts = platform_cfg["channel_prompts"]
if isinstance(channel_prompts, dict):
bridged["channel_prompts"] = {str(k): v for k, v in channel_prompts.items()}
else:
bridged["channel_prompts"] = channel_prompts
if not bridged:
continue
plat_data = platforms_data.setdefault(plat.value, {})
@@ -632,6 +638,18 @@ def load_gateway_config() -> GatewayConfig:
os.environ["TELEGRAM_IGNORED_THREADS"] = str(ignored_threads)
if "reactions" in telegram_cfg and not os.getenv("TELEGRAM_REACTIONS"):
os.environ["TELEGRAM_REACTIONS"] = str(telegram_cfg["reactions"]).lower()
if "proxy_url" in telegram_cfg and not os.getenv("TELEGRAM_PROXY"):
os.environ["TELEGRAM_PROXY"] = str(telegram_cfg["proxy_url"]).strip()
if "disable_link_previews" in telegram_cfg:
plat_data = platforms_data.setdefault(Platform.TELEGRAM.value, {})
if not isinstance(plat_data, dict):
plat_data = {}
platforms_data[Platform.TELEGRAM.value] = plat_data
extra = plat_data.setdefault("extra", {})
if not isinstance(extra, dict):
extra = {}
plat_data["extra"] = extra
extra["disable_link_previews"] = telegram_cfg["disable_link_previews"]
whatsapp_cfg = yaml_cfg.get("whatsapp", {})
if isinstance(whatsapp_cfg, dict):
+512 -3
View File
@@ -515,6 +515,8 @@ class APIServerAdapter(BasePlatformAdapter):
session_id: Optional[str] = None,
stream_delta_callback=None,
tool_progress_callback=None,
tool_start_callback=None,
tool_complete_callback=None,
) -> Any:
"""
Create an AIAgent instance using the gateway's runtime config.
@@ -553,6 +555,8 @@ class APIServerAdapter(BasePlatformAdapter):
platform="api_server",
stream_delta_callback=stream_delta_callback,
tool_progress_callback=tool_progress_callback,
tool_start_callback=tool_start_callback,
tool_complete_callback=tool_complete_callback,
session_db=self._ensure_session_db(),
fallback_model=fallback_model,
)
@@ -965,6 +969,427 @@ class APIServerAdapter(BasePlatformAdapter):
return response
async def _write_sse_responses(
self,
request: "web.Request",
response_id: str,
model: str,
created_at: int,
stream_q,
agent_task,
agent_ref,
conversation_history: List[Dict[str, str]],
user_message: str,
instructions: Optional[str],
conversation: Optional[str],
store: bool,
session_id: str,
) -> "web.StreamResponse":
"""Write an SSE stream for POST /v1/responses (OpenAI Responses API).
Emits spec-compliant event types as the agent runs:
- ``response.created`` initial envelope (status=in_progress)
- ``response.output_text.delta`` / ``response.output_text.done``
streamed assistant text
- ``response.output_item.added`` / ``response.output_item.done``
with ``item.type == "function_call"`` when the agent invokes a
tool (both events fire; the ``done`` event carries the finalized
``arguments`` string)
- ``response.output_item.added`` with
``item.type == "function_call_output"`` tool result with
``{call_id, output, status}``
- ``response.completed`` terminal event carrying the full
response object with all output items + usage (same payload
shape as the non-streaming path for parity)
- ``response.failed`` terminal event on agent error
If the client disconnects mid-stream, ``agent.interrupt()`` is
called so the agent stops issuing upstream LLM calls, then the
asyncio task is cancelled. When ``store=True`` the full response
is persisted to the ResponseStore in a ``finally`` block so GET
/v1/responses/{id} and ``previous_response_id`` chaining work the
same as the batch path.
"""
import queue as _q
sse_headers = {
"Content-Type": "text/event-stream",
"Cache-Control": "no-cache",
"X-Accel-Buffering": "no",
}
origin = request.headers.get("Origin", "")
cors = self._cors_headers_for_origin(origin) if origin else None
if cors:
sse_headers.update(cors)
if session_id:
sse_headers["X-Hermes-Session-Id"] = session_id
response = web.StreamResponse(status=200, headers=sse_headers)
await response.prepare(request)
# State accumulated during the stream
final_text_parts: List[str] = []
# Track open function_call items by name so we can emit a matching
# ``done`` event when the tool completes. Order preserved.
pending_tool_calls: List[Dict[str, Any]] = []
# Output items we've emitted so far (used to build the terminal
# response.completed payload). Kept in the order they appeared.
emitted_items: List[Dict[str, Any]] = []
# Monotonic counter for output_index (spec requires it).
output_index = 0
# Monotonic counter for call_id generation if the agent doesn't
# provide one (it doesn't, from tool_progress_callback).
call_counter = 0
# Canonical Responses SSE events include a monotonically increasing
# sequence_number. Add it server-side for every emitted event so
# clients that validate the OpenAI event schema can parse our stream.
sequence_number = 0
# Track the assistant message item id + content index for text
# delta events — the spec ties deltas to a specific item.
message_item_id = f"msg_{uuid.uuid4().hex[:24]}"
message_output_index: Optional[int] = None
message_opened = False
async def _write_event(event_type: str, data: Dict[str, Any]) -> None:
nonlocal sequence_number
if "sequence_number" not in data:
data["sequence_number"] = sequence_number
sequence_number += 1
payload = f"event: {event_type}\ndata: {json.dumps(data)}\n\n"
await response.write(payload.encode())
def _envelope(status: str) -> Dict[str, Any]:
env: Dict[str, Any] = {
"id": response_id,
"object": "response",
"status": status,
"created_at": created_at,
"model": model,
}
return env
final_response_text = ""
agent_error: Optional[str] = None
usage: Dict[str, int] = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
try:
# response.created — initial envelope, status=in_progress
created_env = _envelope("in_progress")
created_env["output"] = []
await _write_event("response.created", {
"type": "response.created",
"response": created_env,
})
last_activity = time.monotonic()
async def _open_message_item() -> None:
"""Emit response.output_item.added for the assistant message
the first time any text delta arrives."""
nonlocal message_opened, message_output_index, output_index
if message_opened:
return
message_opened = True
message_output_index = output_index
output_index += 1
item = {
"id": message_item_id,
"type": "message",
"status": "in_progress",
"role": "assistant",
"content": [],
}
await _write_event("response.output_item.added", {
"type": "response.output_item.added",
"output_index": message_output_index,
"item": item,
})
async def _emit_text_delta(delta_text: str) -> None:
await _open_message_item()
final_text_parts.append(delta_text)
await _write_event("response.output_text.delta", {
"type": "response.output_text.delta",
"item_id": message_item_id,
"output_index": message_output_index,
"content_index": 0,
"delta": delta_text,
"logprobs": [],
})
async def _emit_tool_started(payload: Dict[str, Any]) -> str:
"""Emit response.output_item.added for a function_call.
Returns the call_id so the matching completion event can
reference it. Prefer the real ``tool_call_id`` from the
agent when available; fall back to a generated call id for
safety in tests or older code paths.
"""
nonlocal output_index, call_counter
call_counter += 1
call_id = payload.get("tool_call_id") or f"call_{response_id[5:]}_{call_counter}"
args = payload.get("arguments", {})
if isinstance(args, dict):
arguments_str = json.dumps(args)
else:
arguments_str = str(args)
item = {
"id": f"fc_{uuid.uuid4().hex[:24]}",
"type": "function_call",
"status": "in_progress",
"name": payload.get("name", ""),
"call_id": call_id,
"arguments": arguments_str,
}
idx = output_index
output_index += 1
pending_tool_calls.append({
"call_id": call_id,
"name": payload.get("name", ""),
"arguments": arguments_str,
"item_id": item["id"],
"output_index": idx,
})
emitted_items.append({
"type": "function_call",
"name": payload.get("name", ""),
"arguments": arguments_str,
"call_id": call_id,
})
await _write_event("response.output_item.added", {
"type": "response.output_item.added",
"output_index": idx,
"item": item,
})
return call_id
async def _emit_tool_completed(payload: Dict[str, Any]) -> None:
"""Emit response.output_item.done (function_call) followed
by response.output_item.added (function_call_output)."""
nonlocal output_index
call_id = payload.get("tool_call_id")
result = payload.get("result", "")
pending = None
if call_id:
for i, p in enumerate(pending_tool_calls):
if p["call_id"] == call_id:
pending = pending_tool_calls.pop(i)
break
if pending is None:
# Completion without a matching start — skip to avoid
# emitting orphaned done events.
return
# function_call done
done_item = {
"id": pending["item_id"],
"type": "function_call",
"status": "completed",
"name": pending["name"],
"call_id": pending["call_id"],
"arguments": pending["arguments"],
}
await _write_event("response.output_item.done", {
"type": "response.output_item.done",
"output_index": pending["output_index"],
"item": done_item,
})
# function_call_output added (result)
result_str = result if isinstance(result, str) else json.dumps(result)
output_parts = [{"type": "input_text", "text": result_str}]
output_item = {
"id": f"fco_{uuid.uuid4().hex[:24]}",
"type": "function_call_output",
"call_id": pending["call_id"],
"output": output_parts,
"status": "completed",
}
idx = output_index
output_index += 1
emitted_items.append({
"type": "function_call_output",
"call_id": pending["call_id"],
"output": output_parts,
})
await _write_event("response.output_item.added", {
"type": "response.output_item.added",
"output_index": idx,
"item": output_item,
})
await _write_event("response.output_item.done", {
"type": "response.output_item.done",
"output_index": idx,
"item": output_item,
})
# Main drain loop — thread-safe queue fed by agent callbacks.
async def _dispatch(it) -> None:
"""Route a queue item to the correct SSE emitter.
Plain strings are text deltas. Tagged tuples with
``__tool_started__`` / ``__tool_completed__`` prefixes
are tool lifecycle events.
"""
if isinstance(it, tuple) and len(it) == 2 and isinstance(it[0], str):
tag, payload = it
if tag == "__tool_started__":
await _emit_tool_started(payload)
elif tag == "__tool_completed__":
await _emit_tool_completed(payload)
# Unknown tags are silently ignored (forward-compat).
elif isinstance(it, str):
await _emit_text_delta(it)
# Other types (non-string, non-tuple) are silently dropped.
loop = asyncio.get_event_loop()
while True:
try:
item = await loop.run_in_executor(None, lambda: stream_q.get(timeout=0.5))
except _q.Empty:
if agent_task.done():
# Drain remaining
while True:
try:
item = stream_q.get_nowait()
if item is None:
break
await _dispatch(item)
last_activity = time.monotonic()
except _q.Empty:
break
break
if time.monotonic() - last_activity >= CHAT_COMPLETIONS_SSE_KEEPALIVE_SECONDS:
await response.write(b": keepalive\n\n")
last_activity = time.monotonic()
continue
if item is None: # EOS sentinel
break
await _dispatch(item)
last_activity = time.monotonic()
# Pick up agent result + usage from the completed task
try:
result, agent_usage = await agent_task
usage = agent_usage or usage
# If the agent produced a final_response but no text
# deltas were streamed (e.g. some providers only emit
# the full response at the end), emit a single fallback
# delta so Responses clients still receive a live text part.
agent_final = result.get("final_response", "") if isinstance(result, dict) else ""
if agent_final and not final_text_parts:
await _emit_text_delta(agent_final)
if agent_final and not final_response_text:
final_response_text = agent_final
if isinstance(result, dict) and result.get("error") and not final_response_text:
agent_error = result["error"]
except Exception as e: # noqa: BLE001
logger.error("Error running agent for streaming responses: %s", e, exc_info=True)
agent_error = str(e)
# Close the message item if it was opened
final_response_text = "".join(final_text_parts) or final_response_text
if message_opened:
await _write_event("response.output_text.done", {
"type": "response.output_text.done",
"item_id": message_item_id,
"output_index": message_output_index,
"content_index": 0,
"text": final_response_text,
"logprobs": [],
})
msg_done_item = {
"id": message_item_id,
"type": "message",
"status": "completed",
"role": "assistant",
"content": [
{"type": "output_text", "text": final_response_text}
],
}
await _write_event("response.output_item.done", {
"type": "response.output_item.done",
"output_index": message_output_index,
"item": msg_done_item,
})
# Always append a final message item in the completed
# response envelope so clients that only parse the terminal
# payload still see the assistant text. This mirrors the
# shape produced by _extract_output_items in the batch path.
final_items: List[Dict[str, Any]] = list(emitted_items)
final_items.append({
"type": "message",
"role": "assistant",
"content": [
{"type": "output_text", "text": final_response_text or (agent_error or "")}
],
})
if agent_error:
failed_env = _envelope("failed")
failed_env["output"] = final_items
failed_env["error"] = {"message": agent_error, "type": "server_error"}
failed_env["usage"] = {
"input_tokens": usage.get("input_tokens", 0),
"output_tokens": usage.get("output_tokens", 0),
"total_tokens": usage.get("total_tokens", 0),
}
await _write_event("response.failed", {
"type": "response.failed",
"response": failed_env,
})
else:
completed_env = _envelope("completed")
completed_env["output"] = final_items
completed_env["usage"] = {
"input_tokens": usage.get("input_tokens", 0),
"output_tokens": usage.get("output_tokens", 0),
"total_tokens": usage.get("total_tokens", 0),
}
await _write_event("response.completed", {
"type": "response.completed",
"response": completed_env,
})
# Persist for future chaining / GET retrieval, mirroring
# the batch path behavior.
if store:
full_history = list(conversation_history)
full_history.append({"role": "user", "content": user_message})
if isinstance(result, dict) and result.get("messages"):
full_history.extend(result["messages"])
else:
full_history.append({"role": "assistant", "content": final_response_text})
self._response_store.put(response_id, {
"response": completed_env,
"conversation_history": full_history,
"instructions": instructions,
"session_id": session_id,
})
if conversation:
self._response_store.set_conversation(conversation, response_id)
except (ConnectionResetError, ConnectionAbortedError, BrokenPipeError, OSError):
# Client disconnected — interrupt the agent so it stops
# making upstream LLM calls, then cancel the task.
agent = agent_ref[0] if agent_ref else None
if agent is not None:
try:
agent.interrupt("SSE client disconnected")
except Exception:
pass
if not agent_task.done():
agent_task.cancel()
try:
await agent_task
except (asyncio.CancelledError, Exception):
pass
logger.info("SSE client disconnected; interrupted agent task %s", response_id)
return response
async def _handle_responses(self, request: "web.Request") -> "web.Response":
"""POST /v1/responses — OpenAI Responses API format."""
auth_err = self._check_auth(request)
@@ -1035,11 +1460,13 @@ class APIServerAdapter(BasePlatformAdapter):
if previous_response_id:
logger.debug("Both conversation_history and previous_response_id provided; using conversation_history")
stored_session_id = None
if not conversation_history and previous_response_id:
stored = self._response_store.get(previous_response_id)
if stored is None:
return web.json_response(_openai_error(f"Previous response not found: {previous_response_id}"), status=404)
conversation_history = list(stored.get("conversation_history", []))
stored_session_id = stored.get("session_id")
# If no instructions provided, carry forward from previous
if instructions is None:
instructions = stored.get("instructions")
@@ -1057,8 +1484,83 @@ class APIServerAdapter(BasePlatformAdapter):
if body.get("truncation") == "auto" and len(conversation_history) > 100:
conversation_history = conversation_history[-100:]
# Run the agent (with Idempotency-Key support)
session_id = str(uuid.uuid4())
# Reuse session from previous_response_id chain so the dashboard
# groups the entire conversation under one session entry.
session_id = stored_session_id or str(uuid.uuid4())
stream = bool(body.get("stream", False))
if stream:
# Streaming branch — emit OpenAI Responses SSE events as the
# agent runs so frontends can render text deltas and tool
# calls in real time. See _write_sse_responses for details.
import queue as _q
_stream_q: _q.Queue = _q.Queue()
def _on_delta(delta):
# None from the agent is a CLI box-close signal, not EOS.
# Forwarding would kill the SSE stream prematurely; the
# SSE writer detects completion via agent_task.done().
if delta is not None:
_stream_q.put(delta)
def _on_tool_progress(event_type, name, preview, args, **kwargs):
"""Queue non-start tool progress events if needed in future.
The structured Responses stream uses ``tool_start_callback``
and ``tool_complete_callback`` for exact call-id correlation,
so progress events are currently ignored here.
"""
return
def _on_tool_start(tool_call_id, function_name, function_args):
"""Queue a started tool for live function_call streaming."""
_stream_q.put(("__tool_started__", {
"tool_call_id": tool_call_id,
"name": function_name,
"arguments": function_args or {},
}))
def _on_tool_complete(tool_call_id, function_name, function_args, function_result):
"""Queue a completed tool result for live function_call_output streaming."""
_stream_q.put(("__tool_completed__", {
"tool_call_id": tool_call_id,
"name": function_name,
"arguments": function_args or {},
"result": function_result,
}))
agent_ref = [None]
agent_task = asyncio.ensure_future(self._run_agent(
user_message=user_message,
conversation_history=conversation_history,
ephemeral_system_prompt=instructions,
session_id=session_id,
stream_delta_callback=_on_delta,
tool_progress_callback=_on_tool_progress,
tool_start_callback=_on_tool_start,
tool_complete_callback=_on_tool_complete,
agent_ref=agent_ref,
))
response_id = f"resp_{uuid.uuid4().hex[:28]}"
model_name = body.get("model", self._model_name)
created_at = int(time.time())
return await self._write_sse_responses(
request=request,
response_id=response_id,
model=model_name,
created_at=created_at,
stream_q=_stream_q,
agent_task=agent_task,
agent_ref=agent_ref,
conversation_history=conversation_history,
user_message=user_message,
instructions=instructions,
conversation=conversation,
store=store,
session_id=session_id,
)
async def _compute_response():
return await self._run_agent(
@@ -1133,6 +1635,7 @@ class APIServerAdapter(BasePlatformAdapter):
"response": response_data,
"conversation_history": full_history,
"instructions": instructions,
"session_id": session_id,
})
# Update conversation mapping so the next request with the same
# conversation name automatically chains to this response
@@ -1486,6 +1989,8 @@ class APIServerAdapter(BasePlatformAdapter):
session_id: Optional[str] = None,
stream_delta_callback=None,
tool_progress_callback=None,
tool_start_callback=None,
tool_complete_callback=None,
agent_ref: Optional[list] = None,
) -> tuple:
"""
@@ -1507,6 +2012,8 @@ class APIServerAdapter(BasePlatformAdapter):
session_id=session_id,
stream_delta_callback=stream_delta_callback,
tool_progress_callback=tool_progress_callback,
tool_start_callback=tool_start_callback,
tool_complete_callback=tool_complete_callback,
)
if agent_ref is not None:
agent_ref[0] = agent
@@ -1643,10 +2150,12 @@ class APIServerAdapter(BasePlatformAdapter):
if previous_response_id:
logger.debug("Both conversation_history and previous_response_id provided; using conversation_history")
stored_session_id = None
if not conversation_history and previous_response_id:
stored = self._response_store.get(previous_response_id)
if stored:
conversation_history = list(stored.get("conversation_history", []))
stored_session_id = stored.get("session_id")
if instructions is None:
instructions = stored.get("instructions")
@@ -1665,7 +2174,7 @@ class APIServerAdapter(BasePlatformAdapter):
)
conversation_history.append({"role": msg["role"], "content": str(content)})
session_id = body.get("session_id") or run_id
session_id = body.get("session_id") or stored_session_id or run_id
ephemeral_system_prompt = instructions
async def _run_and_close():
+62
View File
@@ -682,6 +682,10 @@ class MessageEvent:
# Auto-loaded skill(s) for topic/channel bindings (e.g., Telegram DM Topics,
# Discord channel_skill_bindings). A single name or ordered list.
auto_skill: Optional[str | list[str]] = None
# Per-channel ephemeral system prompt (e.g. Discord channel_prompts).
# Applied at API call time and never persisted to transcript history.
channel_prompt: Optional[str] = None
# Internal flag — set for synthetic events (e.g. background process
# completion notifications) that must bypass user authorization checks.
@@ -776,6 +780,36 @@ _RETRYABLE_ERROR_PATTERNS = (
MessageHandler = Callable[[MessageEvent], Awaitable[Optional[str]]]
def resolve_channel_prompt(
config_extra: dict,
channel_id: str,
parent_id: str | None = None,
) -> str | None:
"""Resolve a per-channel ephemeral prompt from platform config.
Looks up ``channel_prompts`` in the adapter's ``config.extra`` dict.
Prefers an exact match on *channel_id*; falls back to *parent_id*
(useful for forum threads / child channels inheriting a parent prompt).
Returns the prompt string, or None if no match is found. Blank/whitespace-
only prompts are treated as absent.
"""
prompts = config_extra.get("channel_prompts") or {}
if not isinstance(prompts, dict):
return None
for key in (channel_id, parent_id):
if not key:
continue
prompt = prompts.get(key)
if prompt is None:
continue
prompt = str(prompt).strip()
if prompt:
return prompt
return None
class BasePlatformAdapter(ABC):
"""
Base class for platform adapters.
@@ -805,6 +839,11 @@ class BasePlatformAdapter(ABC):
# Gateway shutdown cancels these so an old gateway instance doesn't keep
# working on a task after --replace or manual restarts.
self._background_tasks: set[asyncio.Task] = set()
# One-shot callbacks to fire after the main response is delivered.
# Keyed by session_key. GatewayRunner uses this to defer
# background-review notifications ("💾 Skill created") until the
# primary reply has been sent.
self._post_delivery_callbacks: Dict[str, Callable] = {}
self._expected_cancelled_tasks: set[asyncio.Task] = set()
self._busy_session_handler: Optional[Callable[[MessageEvent, str], Awaitable[bool]]] = None
# Chats where auto-TTS on voice input is disabled (set by /voice off)
@@ -1624,6 +1663,21 @@ class BasePlatformAdapter(ABC):
# streaming already delivered the text (already_sent=True) or
# when the message was queued behind an active agent. Log at
# DEBUG to avoid noisy warnings for expected behavior.
#
# Suppress stale response when the session was interrupted by a
# new message that hasn't been consumed yet. The pending message
# is processed by the pending-message handler below (#8221/#2483).
if (
response
and interrupt_event.is_set()
and session_key in self._pending_messages
):
logger.info(
"[%s] Suppressing stale response for interrupted session %s",
self.name,
session_key,
)
response = None
if not response:
logger.debug("[%s] Handler returned empty/None response for %s", self.name, event.source.chat_id)
if response:
@@ -1845,6 +1899,14 @@ class BasePlatformAdapter(ABC):
except Exception:
pass # Last resort — don't let error reporting crash the handler
finally:
# Fire any one-shot post-delivery callback registered for this
# session (e.g. deferred background-review notifications).
_post_cb = getattr(self, "_post_delivery_callbacks", {}).pop(session_key, None)
if callable(_post_cb):
try:
_post_cb()
except Exception:
pass
# Stop typing indicator
typing_task.cancel()
try:
+150 -1
View File
@@ -1379,6 +1379,68 @@ class DiscordAdapter(BasePlatformAdapter):
)
return await super().send_image(chat_id, image_url, caption, reply_to)
async def send_animation(
self,
chat_id: str,
animation_url: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Send an animated GIF natively as a Discord file attachment."""
if not self._client:
return SendResult(success=False, error="Not connected")
if not is_safe_url(animation_url):
logger.warning("[%s] Blocked unsafe animation URL during Discord send_animation", self.name)
return await super().send_animation(chat_id, animation_url, caption, reply_to, metadata=metadata)
try:
import aiohttp
channel = self._client.get_channel(int(chat_id))
if not channel:
channel = await self._client.fetch_channel(int(chat_id))
if not channel:
return SendResult(success=False, error=f"Channel {chat_id} not found")
# Download the GIF and send as a Discord file attachment
# (Discord renders .gif attachments as auto-playing animations inline)
from gateway.platforms.base import resolve_proxy_url, proxy_kwargs_for_aiohttp
_proxy = resolve_proxy_url(platform_env_var="DISCORD_PROXY")
_sess_kw, _req_kw = proxy_kwargs_for_aiohttp(_proxy)
async with aiohttp.ClientSession(**_sess_kw) as session:
async with session.get(animation_url, timeout=aiohttp.ClientTimeout(total=30), **_req_kw) as resp:
if resp.status != 200:
raise Exception(f"Failed to download animation: HTTP {resp.status}")
animation_data = await resp.read()
import io
file = discord.File(io.BytesIO(animation_data), filename="animation.gif")
msg = await channel.send(
content=caption if caption else None,
file=file,
)
return SendResult(success=True, message_id=str(msg.id))
except ImportError:
logger.warning(
"[%s] aiohttp not installed, falling back to URL. Run: pip install aiohttp",
self.name,
exc_info=True,
)
return await super().send_animation(chat_id, animation_url, caption, reply_to, metadata=metadata)
except Exception as e: # pragma: no cover - defensive logging
logger.error(
"[%s] Failed to send animation attachment, falling back to URL: %s",
self.name,
e,
exc_info=True,
)
return await super().send_animation(chat_id, animation_url, caption, reply_to, metadata=metadata)
async def send_video(
self,
chat_id: str,
@@ -1696,6 +1758,10 @@ class DiscordAdapter(BasePlatformAdapter):
async def slash_update(interaction: discord.Interaction):
await self._run_simple_slash(interaction, "/update", "Update initiated~")
@tree.command(name="restart", description="Gracefully restart the Hermes gateway")
async def slash_restart(interaction: discord.Interaction):
await self._run_simple_slash(interaction, "/restart", "Restart requested~")
@tree.command(name="approve", description="Approve a pending dangerous command")
@discord.app_commands.describe(scope="Optional: 'all', 'session', 'always', 'all session', 'all always'")
async def slash_approve(interaction: discord.Interaction, scope: str = ""):
@@ -1736,6 +1802,76 @@ class DiscordAdapter(BasePlatformAdapter):
async def slash_btw(interaction: discord.Interaction, question: str):
await self._run_simple_slash(interaction, f"/btw {question}")
# ── Auto-register any gateway-available commands not yet on the tree ──
# This ensures new commands added to COMMAND_REGISTRY in
# hermes_cli/commands.py automatically appear as Discord slash
# commands without needing a manual entry here.
try:
from hermes_cli.commands import COMMAND_REGISTRY, _is_gateway_available, _resolve_config_gates
already_registered = set()
try:
already_registered = {cmd.name for cmd in tree.get_commands()}
except Exception:
pass
config_overrides = _resolve_config_gates()
for cmd_def in COMMAND_REGISTRY:
if not _is_gateway_available(cmd_def, config_overrides):
continue
# Discord command names: lowercase, hyphens OK, max 32 chars.
discord_name = cmd_def.name.lower()[:32]
if discord_name in already_registered:
continue
# Skip aliases that overlap with already-registered names
# (aliases for explicitly registered commands are handled above).
desc = (cmd_def.description or f"Run /{cmd_def.name}")[:100]
has_args = bool(cmd_def.args_hint)
if has_args:
# Command takes optional arguments — create handler with
# an optional ``args`` string parameter.
def _make_args_handler(_name: str, _hint: str):
@discord.app_commands.describe(args=f"Arguments: {_hint}"[:100])
async def _handler(interaction: discord.Interaction, args: str = ""):
await self._run_simple_slash(
interaction, f"/{_name} {args}".strip()
)
_handler.__name__ = f"auto_slash_{_name.replace('-', '_')}"
return _handler
handler = _make_args_handler(cmd_def.name, cmd_def.args_hint)
else:
# Parameterless command.
def _make_simple_handler(_name: str):
async def _handler(interaction: discord.Interaction):
await self._run_simple_slash(interaction, f"/{_name}")
_handler.__name__ = f"auto_slash_{_name.replace('-', '_')}"
return _handler
handler = _make_simple_handler(cmd_def.name)
auto_cmd = discord.app_commands.Command(
name=discord_name,
description=desc,
callback=handler,
)
try:
tree.add_command(auto_cmd)
already_registered.add(discord_name)
except Exception:
# Silently skip commands that fail registration (e.g.
# name conflict with a subcommand group).
pass
logger.debug(
"Discord auto-registered %d commands from COMMAND_REGISTRY",
len(already_registered),
)
except Exception as e:
logger.warning("Discord auto-register from COMMAND_REGISTRY failed: %s", e)
# Register skills under a single /skill command group with category
# subcommand groups. This uses 1 top-level slot instead of N,
# supporting up to 25 categories × 25 skills = 625 skills.
@@ -1856,11 +1992,14 @@ class DiscordAdapter(BasePlatformAdapter):
)
msg_type = MessageType.COMMAND if text.startswith("/") else MessageType.TEXT
channel_id = str(interaction.channel_id)
parent_id = str(getattr(getattr(interaction, "channel", None), "parent_id", "") or "")
return MessageEvent(
text=text,
message_type=msg_type,
source=source,
raw_message=interaction,
channel_prompt=self._resolve_channel_prompt(channel_id, parent_id or None),
)
# ------------------------------------------------------------------
@@ -1931,14 +2070,17 @@ class DiscordAdapter(BasePlatformAdapter):
chat_topic=chat_topic,
)
_parent_id = str(getattr(getattr(interaction, "channel", None), "parent_id", "") or "")
_parent_channel = self._thread_parent_channel(getattr(interaction, "channel", None))
_parent_id = str(getattr(_parent_channel, "id", "") or "")
_skills = self._resolve_channel_skills(thread_id, _parent_id or None)
_channel_prompt = self._resolve_channel_prompt(thread_id, _parent_id or None)
event = MessageEvent(
text=text,
message_type=MessageType.TEXT,
source=source,
raw_message=interaction,
auto_skill=_skills,
channel_prompt=_channel_prompt,
)
await self.handle_message(event)
@@ -1967,6 +2109,11 @@ class DiscordAdapter(BasePlatformAdapter):
return list(dict.fromkeys(skills)) # dedup, preserve order
return None
def _resolve_channel_prompt(self, channel_id: str, parent_id: str | None = None) -> str | None:
"""Resolve a Discord per-channel prompt, preferring the exact channel over its parent."""
from gateway.platforms.base import resolve_channel_prompt
return resolve_channel_prompt(self.config.extra, channel_id, parent_id)
def _thread_parent_channel(self, channel: Any) -> Any:
"""Return the parent text channel when invoked from a thread."""
return getattr(channel, "parent", None) or channel
@@ -2518,6 +2665,7 @@ class DiscordAdapter(BasePlatformAdapter):
_parent_id = str(getattr(_chan, "parent_id", "") or "")
_chan_id = str(getattr(_chan, "id", ""))
_skills = self._resolve_channel_skills(_chan_id, _parent_id or None)
_channel_prompt = self._resolve_channel_prompt(_chan_id, _parent_id or None)
reply_to_id = None
reply_to_text = None
@@ -2538,6 +2686,7 @@ class DiscordAdapter(BasePlatformAdapter):
reply_to_text=reply_to_text,
timestamp=message.created_at,
auto_skill=_skills,
channel_prompt=_channel_prompt,
)
# Track thread participation so the bot won't require @mention for
+4 -1
View File
@@ -49,7 +49,10 @@ class MessageDeduplicator:
return False
now = time.time()
if msg_id in self._seen:
return True
if now - self._seen[msg_id] < self._ttl:
return True
# Entry has expired — remove it and treat as new
del self._seen[msg_id]
self._seen[msg_id] = now
if len(self._seen) > self._max_size:
cutoff = now - self._ttl
+8
View File
@@ -729,6 +729,14 @@ class MatrixAdapter(BasePlatformAdapter):
except Exception:
pass
async def stop_typing(self, chat_id: str) -> None:
"""Stop the Matrix typing indicator."""
if self._client:
try:
await self._client.set_typing(RoomID(chat_id), timeout=0)
except Exception:
pass
async def edit_message(
self, chat_id: str, message_id: str, content: str
) -> SendResult:
+7
View File
@@ -718,6 +718,12 @@ class MattermostAdapter(BasePlatformAdapter):
thread_id=thread_id,
)
# Per-channel ephemeral prompt
from gateway.platforms.base import resolve_channel_prompt
_channel_prompt = resolve_channel_prompt(
self.config.extra, channel_id, None,
)
msg_event = MessageEvent(
text=message_text,
message_type=msg_type,
@@ -726,6 +732,7 @@ class MattermostAdapter(BasePlatformAdapter):
message_id=post_id,
media_urls=media_urls if media_urls else None,
media_types=media_types if media_types else None,
channel_prompt=_channel_prompt,
)
await self.handle_message(msg_event)
+7
View File
@@ -1167,6 +1167,12 @@ class SlackAdapter(BasePlatformAdapter):
thread_id=thread_ts,
)
# Per-channel ephemeral prompt
from gateway.platforms.base import resolve_channel_prompt
_channel_prompt = resolve_channel_prompt(
self.config.extra, channel_id, None,
)
msg_event = MessageEvent(
text=text,
message_type=msg_type,
@@ -1176,6 +1182,7 @@ class SlackAdapter(BasePlatformAdapter):
media_urls=media_urls,
media_types=media_types,
reply_to_message_id=thread_ts if thread_ts != ts else None,
channel_prompt=_channel_prompt,
)
# Only react when bot is directly addressed (DM or @mention).
+63 -9
View File
@@ -18,6 +18,10 @@ logger = logging.getLogger(__name__)
try:
from telegram import Update, Bot, Message, InlineKeyboardButton, InlineKeyboardMarkup
try:
from telegram import LinkPreviewOptions
except ImportError:
LinkPreviewOptions = None
from telegram.ext import (
Application,
CommandHandler,
@@ -36,6 +40,7 @@ except ImportError:
Message = Any
InlineKeyboardButton = Any
InlineKeyboardMarkup = Any
LinkPreviewOptions = None
Application = Any
CommandHandler = Any
CallbackQueryHandler = Any
@@ -137,6 +142,7 @@ class TelegramAdapter(BasePlatformAdapter):
self._webhook_mode: bool = False
self._mention_patterns = self._compile_mention_patterns()
self._reply_to_mode: str = getattr(config, 'reply_to_mode', 'first') or 'first'
self._disable_link_previews: bool = self._coerce_bool_extra("disable_link_previews", False)
# Buffer rapid/album photo updates so Telegram image bursts are handled
# as a single MessageEvent instead of self-interrupting multiple turns.
self._media_batch_delay_seconds = float(os.getenv("HERMES_TELEGRAM_MEDIA_BATCH_DELAY_SECONDS", "0.8"))
@@ -163,6 +169,15 @@ class TelegramAdapter(BasePlatformAdapter):
# Approval button state: message_id → session_key
self._approval_state: Dict[int, str] = {}
@staticmethod
def _is_callback_user_authorized(user_id: str) -> bool:
"""Return whether a Telegram inline-button caller may perform gated actions."""
allowed_csv = os.getenv("TELEGRAM_ALLOWED_USERS", "").strip()
if not allowed_csv:
return True
allowed_ids = {uid.strip() for uid in allowed_csv.split(",") if uid.strip()}
return "*" in allowed_ids or user_id in allowed_ids
def _fallback_ips(self) -> list[str]:
"""Return validated fallback IPs from config (populated by _apply_env_overrides)."""
configured = self.config.extra.get("fallback_ips", []) if getattr(self.config, "extra", None) else []
@@ -193,6 +208,26 @@ class TelegramAdapter(BasePlatformAdapter):
pass
return isinstance(error, OSError)
def _coerce_bool_extra(self, key: str, default: bool = False) -> bool:
value = self.config.extra.get(key) if getattr(self.config, "extra", None) else None
if value is None:
return default
if isinstance(value, str):
lowered = value.strip().lower()
if lowered in ("true", "1", "yes", "on"):
return True
if lowered in ("false", "0", "no", "off"):
return False
return default
return bool(value)
def _link_preview_kwargs(self) -> Dict[str, Any]:
if not getattr(self, "_disable_link_previews", False):
return {}
if LinkPreviewOptions is not None:
return {"link_preview_options": LinkPreviewOptions(is_disabled=True)}
return {"disable_web_page_preview": True}
async def _handle_polling_network_error(self, error: Exception) -> None:
"""Reconnect polling after a transient network interruption.
@@ -540,7 +575,7 @@ class TelegramAdapter(BasePlatformAdapter):
"write_timeout": _env_float("HERMES_TELEGRAM_HTTP_WRITE_TIMEOUT", 20.0),
}
proxy_url = resolve_proxy_url()
proxy_url = resolve_proxy_url("TELEGRAM_PROXY")
disable_fallback = (os.getenv("HERMES_TELEGRAM_DISABLE_FALLBACK_IPS", "").strip().lower() in ("1", "true", "yes", "on"))
fallback_ips = self._fallback_ips()
if not fallback_ips:
@@ -847,6 +882,7 @@ class TelegramAdapter(BasePlatformAdapter):
parse_mode=ParseMode.MARKDOWN_V2,
reply_to_message_id=reply_to_id,
message_thread_id=effective_thread_id,
**self._link_preview_kwargs(),
)
except Exception as md_error:
# Markdown parsing failed, try plain text
@@ -859,6 +895,7 @@ class TelegramAdapter(BasePlatformAdapter):
parse_mode=None,
reply_to_message_id=reply_to_id,
message_thread_id=effective_thread_id,
**self._link_preview_kwargs(),
)
else:
raise
@@ -1046,6 +1083,7 @@ class TelegramAdapter(BasePlatformAdapter):
text=text,
parse_mode=ParseMode.MARKDOWN,
reply_markup=keyboard,
**self._link_preview_kwargs(),
)
return SendResult(success=True, message_id=str(msg.message_id))
except Exception as e:
@@ -1067,10 +1105,13 @@ class TelegramAdapter(BasePlatformAdapter):
try:
cmd_preview = command[:3800] + "..." if len(command) > 3800 else command
# Escape backticks that would break Markdown v1 inline code parsing
safe_cmd = cmd_preview.replace("`", "'")
safe_desc = description.replace("`", "'").replace("*", "")
text = (
f"⚠️ *Command Approval Required*\n\n"
f"`{cmd_preview}`\n\n"
f"Reason: {description}"
f"`{safe_cmd}`\n\n"
f"Reason: {safe_desc}"
)
# Resolve thread context for thread replies
@@ -1102,6 +1143,7 @@ class TelegramAdapter(BasePlatformAdapter):
"text": text,
"parse_mode": ParseMode.MARKDOWN,
"reply_markup": keyboard,
**self._link_preview_kwargs(),
}
if thread_id:
kwargs["message_thread_id"] = int(thread_id)
@@ -1172,6 +1214,7 @@ class TelegramAdapter(BasePlatformAdapter):
parse_mode=ParseMode.MARKDOWN,
reply_markup=keyboard,
message_thread_id=int(thread_id) if thread_id else None,
**self._link_preview_kwargs(),
)
# Store picker state keyed by chat_id
@@ -1440,12 +1483,9 @@ class TelegramAdapter(BasePlatformAdapter):
# Only authorized users may click approval buttons.
caller_id = str(getattr(query.from_user, "id", ""))
allowed_csv = os.getenv("TELEGRAM_ALLOWED_USERS", "").strip()
if allowed_csv:
allowed_ids = {uid.strip() for uid in allowed_csv.split(",") if uid.strip()}
if "*" not in allowed_ids and caller_id not in allowed_ids:
await query.answer(text="⛔ You are not authorized to approve commands.")
return
if not self._is_callback_user_authorized(caller_id):
await query.answer(text="⛔ You are not authorized to approve commands.")
return
session_key = self._approval_state.pop(approval_id, None)
if not session_key:
@@ -1490,6 +1530,10 @@ class TelegramAdapter(BasePlatformAdapter):
if not data.startswith("update_prompt:"):
return
answer = data.split(":", 1)[1] # "y" or "n"
caller_id = str(getattr(query.from_user, "id", ""))
if not self._is_callback_user_authorized(caller_id):
await query.answer(text="⛔ You are not authorized to answer update prompts.")
return
await query.answer(text=f"Sent '{answer}' to the update process.")
# Edit the message to show the choice and remove buttons
label = "Yes" if answer == "y" else "No"
@@ -2765,6 +2809,15 @@ class TelegramAdapter(BasePlatformAdapter):
reply_to_id = str(message.reply_to_message.message_id)
reply_to_text = message.reply_to_message.text or message.reply_to_message.caption or None
# Per-channel/topic ephemeral prompt
from gateway.platforms.base import resolve_channel_prompt
_chat_id_str = str(chat.id)
_channel_prompt = resolve_channel_prompt(
self.config.extra,
thread_id_str or _chat_id_str,
_chat_id_str if thread_id_str else None,
)
return MessageEvent(
text=message.text or "",
message_type=msg_type,
@@ -2774,6 +2827,7 @@ class TelegramAdapter(BasePlatformAdapter):
reply_to_message_id=reply_to_id,
reply_to_text=reply_to_text,
auto_skill=topic_skill,
channel_prompt=_channel_prompt,
timestamp=message.date,
)
+1 -1
View File
@@ -46,7 +46,7 @@ _SEED_FALLBACK_IPS: list[str] = ["149.154.167.220"]
def _resolve_proxy_url() -> str | None:
# Delegate to shared implementation (env vars + macOS system proxy detection)
from gateway.platforms.base import resolve_proxy_url
return resolve_proxy_url()
return resolve_proxy_url("TELEGRAM_PROXY")
class TelegramFallbackTransport(httpx.AsyncBaseTransport):
+14
View File
@@ -258,6 +258,20 @@ class WecomCallbackAdapter(BasePlatformAdapter):
)
event = self._build_event(app, decrypted)
if event is not None:
# Deduplicate: WeCom retries callbacks on timeout,
# producing duplicate inbound messages (#10305).
if event.message_id:
now = time.time()
if event.message_id in self._seen_messages:
if now - self._seen_messages[event.message_id] < MESSAGE_DEDUP_TTL_SECONDS:
logger.debug("[WecomCallback] Duplicate MsgId %s, skipping", event.message_id)
return web.Response(text="success", content_type="text/plain")
del self._seen_messages[event.message_id]
self._seen_messages[event.message_id] = now
# Prune expired entries when cache grows large
if len(self._seen_messages) > 2000:
cutoff = now - MESSAGE_DEDUP_TTL_SECONDS
self._seen_messages = {k: v for k, v in self._seen_messages.items() if v > cutoff}
# Record which app this user belongs to.
if event.source and event.source.user_id:
map_key = self._user_app_key(
+503 -116
View File
@@ -482,6 +482,32 @@ def _resolve_hermes_bin() -> Optional[list[str]]:
return None
def _parse_session_key(session_key: str) -> "dict | None":
"""Parse a session key into its component parts.
Session keys follow the format
``agent:main:{platform}:{chat_type}:{chat_id}[:{extra}...]``.
Returns a dict with ``platform``, ``chat_type``, ``chat_id``, and
optionally ``thread_id`` keys, or None if the key doesn't match.
The 6th element is only returned as ``thread_id`` for chat types where
it is unambiguous (``dm`` and ``thread``). For group/channel sessions
the suffix may be a user_id (per-user isolation) rather than a
thread_id, so we leave ``thread_id`` out to avoid mis-routing.
"""
parts = session_key.split(":")
if len(parts) >= 5 and parts[0] == "agent" and parts[1] == "main":
result = {
"platform": parts[2],
"chat_type": parts[3],
"chat_id": parts[4],
}
if len(parts) > 5 and parts[3] in ("dm", "thread"):
result["thread_id"] = parts[5]
return result
return None
def _format_gateway_process_notification(evt: dict) -> "str | None":
"""Format a watch pattern event from completion_queue into a [SYSTEM:] message."""
evt_type = evt.get("type", "completion")
@@ -573,6 +599,7 @@ class GatewayRunner:
self._running_agents: Dict[str, Any] = {}
self._running_agents_ts: Dict[str, float] = {} # start timestamp per session
self._pending_messages: Dict[str, str] = {} # Queued messages during interrupt
self._busy_ack_ts: Dict[str, float] = {} # last busy-ack timestamp per session (debounce)
# Cache AIAgent instances per session to preserve prompt caching.
# Without this, a new AIAgent is created per message, rebuilding the
@@ -1329,26 +1356,100 @@ class GatewayRunner:
merge_pending_message_event(adapter._pending_messages, session_key, event)
async def _handle_active_session_busy_message(self, event: MessageEvent, session_key: str) -> bool:
if not self._draining:
return False
# --- Draining case (gateway restarting/stopping) ---
if self._draining:
adapter = self.adapters.get(event.source.platform)
if not adapter:
return True
thread_meta = {"thread_id": event.source.thread_id} if event.source.thread_id else None
if self._queue_during_drain_enabled():
self._queue_or_replace_pending_event(session_key, event)
message = f"⏳ Gateway {self._status_action_gerund()} — queued for the next turn after it comes back."
else:
message = f"⏳ Gateway is {self._status_action_gerund()} and is not accepting another turn right now."
await adapter._send_with_retry(
chat_id=event.source.chat_id,
content=message,
reply_to=event.message_id,
metadata=thread_meta,
)
return True
# --- Normal busy case (agent actively running a task) ---
# The user sent a message while the agent is working. Interrupt the
# agent immediately so it stops the current tool-calling loop and
# processes the new message. The pending message is stored in the
# adapter so the base adapter picks it up once the interrupted run
# returns. A brief ack tells the user what's happening (debounced
# to avoid spam when they fire multiple messages quickly).
adapter = self.adapters.get(event.source.platform)
if not adapter:
return True
return False # let default path handle it
# Store the message so it's processed as the next turn after the
# interrupt causes the current run to exit.
from gateway.platforms.base import merge_pending_message_event
merge_pending_message_event(adapter._pending_messages, session_key, event)
# Interrupt the running agent — this aborts in-flight tool calls and
# causes the agent loop to exit at the next check point.
running_agent = self._running_agents.get(session_key)
if running_agent and running_agent is not _AGENT_PENDING_SENTINEL:
try:
running_agent.interrupt(event.text)
except Exception:
pass # don't let interrupt failure block the ack
# Debounce: only send an acknowledgment once every 30 seconds per session
# to avoid spamming the user when they send multiple messages quickly
_BUSY_ACK_COOLDOWN = 30
now = time.time()
last_ack = self._busy_ack_ts.get(session_key, 0)
if now - last_ack < _BUSY_ACK_COOLDOWN:
return True # interrupt sent, ack already delivered recently
self._busy_ack_ts[session_key] = now
# Build a status-rich acknowledgment
status_parts = []
if running_agent and running_agent is not _AGENT_PENDING_SENTINEL:
try:
summary = running_agent.get_activity_summary()
iteration = summary.get("api_call_count", 0)
max_iter = summary.get("max_iterations", 0)
current_tool = summary.get("current_tool")
start_ts = self._running_agents_ts.get(session_key, 0)
if start_ts:
elapsed_min = int((now - start_ts) / 60)
if elapsed_min > 0:
status_parts.append(f"{elapsed_min} min elapsed")
if max_iter:
status_parts.append(f"iteration {iteration}/{max_iter}")
if current_tool:
status_parts.append(f"running: {current_tool}")
except Exception:
pass
status_detail = f" ({', '.join(status_parts)})" if status_parts else ""
message = (
f"⚡ Interrupting current task{status_detail}. "
f"I'll respond to your message shortly."
)
thread_meta = {"thread_id": event.source.thread_id} if event.source.thread_id else None
if self._queue_during_drain_enabled():
self._queue_or_replace_pending_event(session_key, event)
message = f"⏳ Gateway {self._status_action_gerund()} — queued for the next turn after it comes back."
else:
message = f"⏳ Gateway is {self._status_action_gerund()} and is not accepting another turn right now."
try:
await adapter._send_with_retry(
chat_id=event.source.chat_id,
content=message,
reply_to=event.message_id,
metadata=thread_meta,
)
except Exception as e:
logger.debug("Failed to send busy-ack: %s", e)
await adapter._send_with_retry(
chat_id=event.source.chat_id,
content=message,
reply_to=event.message_id,
metadata=thread_meta,
)
return True
async def _drain_active_agents(self, timeout: float) -> tuple[Dict[str, Any], bool]:
@@ -1405,7 +1506,7 @@ class GatewayRunner:
action = "restarting" if self._restart_requested else "shutting down"
hint = (
"Your current task will be interrupted. "
"Use /retry after restart to continue."
"Send any message after restart to resume where it left off."
if self._restart_requested
else "Your current task will be interrupted."
)
@@ -1414,12 +1515,11 @@ class GatewayRunner:
notified: set = set()
for session_key in active:
# Parse platform + chat_id from the session key.
# Format: agent:main:{platform}:{chat_type}:{chat_id}[:{extra}...]
parts = session_key.split(":")
if len(parts) < 5:
_parsed = _parse_session_key(session_key)
if not _parsed:
continue
platform_str = parts[2]
chat_id = parts[4]
platform_str = _parsed["platform"]
chat_id = _parsed["chat_id"]
# Deduplicate: one notification per chat, even if multiple
# sessions (different users/threads) share the same chat.
@@ -1435,7 +1535,7 @@ class GatewayRunner:
# Include thread_id if present so the message lands in the
# correct forum topic / thread.
thread_id = parts[5] if len(parts) > 5 else None
thread_id = _parsed.get("thread_id")
metadata = {"thread_id": thread_id} if thread_id else None
await adapter.send(chat_id, msg, metadata=metadata)
@@ -1475,6 +1575,106 @@ class GatewayRunner:
except Exception:
pass
_STUCK_LOOP_THRESHOLD = 3 # restarts while active before auto-suspend
_STUCK_LOOP_FILE = ".restart_failure_counts"
def _increment_restart_failure_counts(self, active_session_keys: set) -> None:
"""Increment restart-failure counters for sessions active at shutdown.
Persists to a JSON file so counters survive across restarts.
Sessions NOT in active_session_keys are removed (they completed
successfully, so the loop is broken).
"""
import json
path = _hermes_home / self._STUCK_LOOP_FILE
try:
counts = json.loads(path.read_text()) if path.exists() else {}
except Exception:
counts = {}
# Increment active sessions, remove inactive ones (loop broken)
new_counts = {}
for key in active_session_keys:
new_counts[key] = counts.get(key, 0) + 1
# Keep any entries that are still above 0 even if not active now
# (they might become active again next restart)
try:
path.write_text(json.dumps(new_counts))
except Exception:
pass
def _suspend_stuck_loop_sessions(self) -> int:
"""Suspend sessions that have been active across too many restarts.
Returns the number of sessions suspended. Called on gateway startup
AFTER suspend_recently_active() to catch the stuck-loop pattern:
session loads agent gets stuck gateway restarts repeat.
"""
import json
path = _hermes_home / self._STUCK_LOOP_FILE
if not path.exists():
return 0
try:
counts = json.loads(path.read_text())
except Exception:
return 0
suspended = 0
stuck_keys = [k for k, v in counts.items() if v >= self._STUCK_LOOP_THRESHOLD]
for session_key in stuck_keys:
try:
entry = self.session_store._entries.get(session_key)
if entry and not entry.suspended:
entry.suspended = True
suspended += 1
logger.warning(
"Auto-suspended stuck session %s (active across %d "
"consecutive restarts — likely a stuck loop)",
session_key[:30], counts[session_key],
)
except Exception:
pass
if suspended:
try:
self.session_store._save()
except Exception:
pass
# Clear the file — counters start fresh after suspension
try:
path.unlink(missing_ok=True)
except Exception:
pass
return suspended
def _clear_restart_failure_count(self, session_key: str) -> None:
"""Clear the restart-failure counter for a session that completed OK.
Called after a successful agent turn to signal the loop is broken.
"""
import json
path = _hermes_home / self._STUCK_LOOP_FILE
if not path.exists():
return
try:
counts = json.loads(path.read_text())
if session_key in counts:
del counts[session_key]
if counts:
path.write_text(json.dumps(counts))
else:
path.unlink(missing_ok=True)
except Exception:
pass
async def _launch_detached_restart_command(self) -> None:
import shutil
import subprocess
@@ -1618,6 +1818,17 @@ class GatewayRunner:
except Exception as e:
logger.warning("Session suspension on startup failed: %s", e)
# Stuck-loop detection (#7536): if a session has been active across
# 3+ consecutive restarts, it's probably stuck in a loop (the same
# history keeps causing the agent to hang). Auto-suspend it so the
# user gets a clean slate on the next message.
try:
stuck = self._suspend_stuck_loop_sessions()
if stuck:
logger.warning("Auto-suspended %d stuck-loop session(s)", stuck)
except Exception as e:
logger.debug("Stuck-loop detection failed: %s", e)
connected_count = 0
enabled_platform_count = 0
startup_nonretryable_errors: list[str] = []
@@ -2126,6 +2337,8 @@ class GatewayRunner:
self._running_agents.clear()
self._pending_messages.clear()
self._pending_approvals.clear()
if hasattr(self, '_busy_ack_ts'):
self._busy_ack_ts.clear()
self._shutdown_event.set()
# Global cleanup: kill any remaining tool subprocesses not tied
@@ -2169,6 +2382,14 @@ class GatewayRunner:
"active sessions."
)
# Track sessions that were active at shutdown for stuck-loop
# detection (#7536). On each restart, the counter increments
# for sessions that were running. If a session hits the
# threshold (3 consecutive restarts while active), the next
# startup auto-suspends it — breaking the loop.
if active_agents:
self._increment_restart_failure_counts(set(active_agents.keys()))
if self._restart_requested and self._restart_via_service:
self._exit_code = GATEWAY_SERVICE_RESTART_EXIT_CODE
self._exit_reason = self._exit_reason or "Gateway restart requested"
@@ -2602,6 +2823,7 @@ class GatewayRunner:
)
del self._running_agents[_quick_key]
self._running_agents_ts.pop(_quick_key, None)
self._busy_ack_ts.pop(_quick_key, None)
if _quick_key in self._running_agents:
if event.get_command() == "status":
@@ -2669,6 +2891,7 @@ class GatewayRunner:
message_type=_MT.TEXT,
source=event.source,
message_id=event.message_id,
channel_prompt=event.channel_prompt,
)
adapter._pending_messages[_quick_key] = queued_event
return "Queued for the next turn."
@@ -3516,6 +3739,7 @@ class GatewayRunner:
model=_hyg_model,
max_iterations=4,
quiet_mode=True,
skip_memory=True,
enabled_toolsets=["memory"],
session_id=session_entry.session_id,
)
@@ -3646,6 +3870,7 @@ class GatewayRunner:
session_id=session_entry.session_id,
session_key=session_key,
event_message_id=event.message_id,
channel_prompt=event.channel_prompt,
)
# Stop persistent typing indicator now that the agent is done
@@ -3657,6 +3882,18 @@ class GatewayRunner:
pass
response = agent_result.get("final_response") or ""
# Convert the agent's internal "(empty)" sentinel into a
# user-friendly message. "(empty)" means the model failed to
# produce visible content after exhausting all retries (nudge,
# prefill, empty-retry, fallback). Sending the raw sentinel
# looks like a bug; a short explanation is more helpful.
if response == "(empty)":
response = (
"⚠️ The model returned no response after processing tool "
"results. This can happen with some models — try again or "
"rephrase your question."
)
agent_messages = agent_result.get("messages", [])
_response_time = time.time() - _msg_start_time
_api_calls = agent_result.get("api_calls", 0)
@@ -3667,6 +3904,12 @@ class GatewayRunner:
_response_time, _api_calls, _resp_len,
)
# Successful turn — clear any stuck-loop counter for this session.
# This ensures the counter only accumulates across CONSECUTIVE
# restarts where the session was active (never completed).
if session_key:
self._clear_restart_failure_count(session_key)
# Surface error details when the agent failed silently (final_response=None)
if not response and agent_result.get("failed"):
error_detail = agent_result.get("error", "unknown error")
@@ -3755,7 +3998,7 @@ class GatewayRunner:
synth_text = _format_gateway_process_notification(evt)
if synth_text:
try:
await self._inject_watch_notification(synth_text, event)
await self._inject_watch_notification(synth_text, evt)
except Exception as e2:
logger.error("Watch notification injection error: %s", e2)
except Exception as e:
@@ -3773,14 +4016,11 @@ class GatewayRunner:
# intermediate reasoning) so sessions can be resumed with full context
# and transcripts are useful for debugging and training data.
#
# IMPORTANT: When the agent failed before producing any response
# (e.g. context-overflow 400), do NOT persist the user's message.
# IMPORTANT: When the agent failed (e.g. context-overflow 400,
# compression exhausted), do NOT persist the user's message.
# Persisting it would make the session even larger, causing the
# same failure on the next attempt — an infinite loop. (#1630)
agent_failed_early = (
agent_result.get("failed")
and not agent_result.get("final_response")
)
# same failure on the next attempt — an infinite loop. (#1630, #9893)
agent_failed_early = bool(agent_result.get("failed"))
if agent_failed_early:
logger.info(
"Skipping transcript persistence for failed request in "
@@ -3788,6 +4028,24 @@ class GatewayRunner:
session_entry.session_id,
)
# When compression is exhausted, the session is permanently too
# large to process. Auto-reset it so the next message starts
# fresh instead of replaying the same oversized context in an
# infinite fail loop. (#9893)
if agent_result.get("compression_exhausted") and session_entry and session_key:
logger.info(
"Auto-resetting session %s after compression exhaustion.",
session_entry.session_id,
)
self.session_store.reset_session(session_key)
self._evict_cached_agent(session_key)
self._session_model_overrides.pop(session_key, None)
response = (response or "") + (
"\n\n🔄 Session auto-reset — the conversation exceeded the "
"maximum context size and could not be compressed further. "
"Your next message will start a fresh session."
)
ts = datetime.now().isoformat()
# If this is a fresh session (no history), write the full tool
@@ -3895,6 +4153,8 @@ class GatewayRunner:
_hist_len = len(history) if 'history' in locals() else 0
if status_code == 401:
status_hint = " Check your API key or run `claude /login` to refresh OAuth credentials."
elif status_code == 402:
status_hint = " Your API balance or quota is exhausted. Check your provider dashboard."
elif status_code == 429:
# Check if this is a plan usage limit (resets on a schedule) vs a transient rate limit
_err_body = getattr(e, "response", None)
@@ -4134,31 +4394,16 @@ class GatewayRunner:
async def _handle_profile_command(self, event: MessageEvent) -> str:
"""Handle /profile — show active profile name and home directory."""
from hermes_constants import get_hermes_home, display_hermes_home
from pathlib import Path
from hermes_constants import display_hermes_home
from hermes_cli.profiles import get_active_profile_name
home = get_hermes_home()
display = display_hermes_home()
profile_name = get_active_profile_name()
# Detect profile name from HERMES_HOME path
# Profile paths look like: ~/.hermes/profiles/<name>
profiles_parent = Path.home() / ".hermes" / "profiles"
try:
rel = home.relative_to(profiles_parent)
profile_name = str(rel).split("/")[0]
except ValueError:
profile_name = None
if profile_name:
lines = [
f"👤 **Profile:** `{profile_name}`",
f"📂 **Home:** `{display}`",
]
else:
lines = [
"👤 **Profile:** default",
f"📂 **Home:** `{display}`",
]
lines = [
f"👤 **Profile:** `{profile_name}`",
f"📂 **Home:** `{display}`",
]
return "\n".join(lines)
@@ -4731,6 +4976,7 @@ class GatewayRunner:
async def _handle_personality_command(self, event: MessageEvent) -> str:
"""Handle /personality command - list or set a personality."""
import yaml
from hermes_constants import display_hermes_home
args = event.get_command_args().strip().lower()
config_path = _hermes_home / 'config.yaml'
@@ -4748,7 +4994,7 @@ class GatewayRunner:
personalities = {}
if not personalities:
return "No personalities configured in `~/.hermes/config.yaml`"
return f"No personalities configured in `{display_hermes_home()}/config.yaml`"
if not args:
lines = ["🎭 **Available Personalities**\n"]
@@ -4832,6 +5078,7 @@ class GatewayRunner:
message_type=MessageType.TEXT,
source=source,
raw_message=event.raw_message,
channel_prompt=event.channel_prompt,
)
# Let the normal message handler process it
@@ -5975,6 +6222,7 @@ class GatewayRunner:
model=model,
max_iterations=4,
quiet_mode=True,
skip_memory=True,
enabled_toolsets=["memory"],
session_id=session_entry.session_id,
)
@@ -6340,6 +6588,11 @@ class GatewayRunner:
import asyncio as _asyncio
args = event.get_command_args().strip()
# Normalize Unicode dashes (Telegram/iOS auto-converts -- to em/en dash)
import re as _re
args = _re.sub(r'[\u2012\u2013\u2014\u2015](days|source)', r'--\1', args)
days = 30
source = None
@@ -6563,11 +6816,17 @@ class GatewayRunner:
})
async def _handle_debug_command(self, event: MessageEvent) -> str:
"""Handle /debug — upload debug report + logs and return paste URLs."""
"""Handle /debug — upload debug report (summary only) and return paste URLs.
Gateway uploads ONLY the summary report (system info + log tails),
NOT full log files, to protect conversation privacy. Users who need
full log uploads should use ``hermes debug share`` from the CLI.
"""
import asyncio
from hermes_cli.debug import (
_capture_dump, collect_debug_report, _read_full_log,
upload_to_pastebin,
_capture_dump, collect_debug_report,
upload_to_pastebin, _schedule_auto_delete,
_GATEWAY_PRIVACY_NOTICE,
)
loop = asyncio.get_running_loop()
@@ -6576,43 +6835,25 @@ class GatewayRunner:
def _collect_and_upload():
dump_text = _capture_dump()
report = collect_debug_report(log_lines=200, dump_text=dump_text)
agent_log = _read_full_log("agent")
gateway_log = _read_full_log("gateway")
if agent_log:
agent_log = dump_text + "\n\n--- full agent.log ---\n" + agent_log
if gateway_log:
gateway_log = dump_text + "\n\n--- full gateway.log ---\n" + gateway_log
urls = {}
failures = []
try:
urls["Report"] = upload_to_pastebin(report)
except Exception as exc:
return f"✗ Failed to upload debug report: {exc}"
if agent_log:
try:
urls["agent.log"] = upload_to_pastebin(agent_log)
except Exception:
failures.append("agent.log")
# Schedule auto-deletion after 1 hour
_schedule_auto_delete(list(urls.values()))
if gateway_log:
try:
urls["gateway.log"] = upload_to_pastebin(gateway_log)
except Exception:
failures.append("gateway.log")
lines = ["**Debug report uploaded:**", ""]
lines = [_GATEWAY_PRIVACY_NOTICE, "", "**Debug report uploaded:**", ""]
label_width = max(len(k) for k in urls)
for label, url in urls.items():
lines.append(f"`{label:<{label_width}}` {url}")
if failures:
lines.append(f"\n_(failed to upload: {', '.join(failures)})_")
lines.append("\nShare these links with the Hermes team for support.")
lines.append("")
lines.append("⏱ Pastes will auto-delete in 1 hour.")
lines.append("For full log uploads, use `hermes debug share` from the CLI.")
lines.append("Share these links with the Hermes team for support.")
return "\n".join(lines)
return await loop.run_in_executor(None, _collect_and_upload)
@@ -7232,14 +7473,75 @@ class GatewayRunner:
return prefix
return user_text
async def _inject_watch_notification(self, synth_text: str, original_event) -> None:
def _build_process_event_source(self, evt: dict):
"""Resolve the canonical source for a synthetic background-process event.
Prefer the persisted session-store origin for the event's session key.
Falling back to the currently active foreground event is what causes
cross-topic bleed, so don't do that.
"""
from gateway.session import SessionSource
session_key = str(evt.get("session_key") or "").strip()
derived_platform = ""
derived_chat_type = ""
derived_chat_id = ""
if session_key:
try:
self.session_store._ensure_loaded()
entry = self.session_store._entries.get(session_key)
if entry and getattr(entry, "origin", None):
return entry.origin
except Exception as exc:
logger.debug(
"Synthetic process-event session-store lookup failed for %s: %s",
session_key,
exc,
)
_parsed = _parse_session_key(session_key)
if _parsed:
derived_platform = _parsed["platform"]
derived_chat_type = _parsed["chat_type"]
derived_chat_id = _parsed["chat_id"]
platform_name = str(evt.get("platform") or derived_platform or "").strip().lower()
chat_type = str(evt.get("chat_type") or derived_chat_type or "").strip().lower()
chat_id = str(evt.get("chat_id") or derived_chat_id or "").strip()
if not platform_name or not chat_type or not chat_id:
return None
try:
platform = Platform(platform_name)
except Exception:
logger.warning(
"Synthetic process event has invalid platform metadata: %r",
platform_name,
)
return None
return SessionSource(
platform=platform,
chat_id=chat_id,
chat_type=chat_type,
thread_id=str(evt.get("thread_id") or "").strip() or None,
user_id=str(evt.get("user_id") or "").strip() or None,
user_name=str(evt.get("user_name") or "").strip() or None,
)
async def _inject_watch_notification(self, synth_text: str, evt: dict) -> None:
"""Inject a watch-pattern notification as a synthetic message event.
Uses the source from the original user event to route the notification
back to the correct chat/adapter.
Routing must come from the queued watch event itself, not from whatever
foreground message happened to be active when the queue was drained.
"""
source = getattr(original_event, "source", None)
source = self._build_process_event_source(evt)
if not source:
logger.warning(
"Dropping watch notification with no routing metadata for process %s",
evt.get("session_id", "unknown"),
)
return
platform_name = source.platform.value if hasattr(source.platform, "value") else str(source.platform)
adapter = None
@@ -7257,7 +7559,12 @@ class GatewayRunner:
source=source,
internal=True,
)
logger.info("Watch pattern notification — injecting for %s", platform_name)
logger.info(
"Watch pattern notification — injecting for %s chat=%s thread=%s",
platform_name,
source.chat_id,
source.thread_id,
)
await adapter.handle_message(synth_event)
except Exception as e:
logger.error("Watch notification injection error: %s", e)
@@ -7327,33 +7634,42 @@ class GatewayRunner:
f"Command: {session.command}\n"
f"Output:\n{_out}]"
)
source = self._build_process_event_source({
"session_id": session_id,
"session_key": session_key,
"platform": platform_name,
"chat_id": chat_id,
"thread_id": thread_id,
"user_id": user_id,
"user_name": user_name,
})
if not source:
logger.warning(
"Dropping completion notification with no routing metadata for process %s",
session_id,
)
break
adapter = None
for p, a in self.adapters.items():
if p.value == platform_name:
if p == source.platform:
adapter = a
break
if adapter and chat_id:
if adapter and source.chat_id:
try:
from gateway.platforms.base import MessageEvent, MessageType
from gateway.session import SessionSource
from gateway.config import Platform
_platform_enum = Platform(platform_name)
_source = SessionSource(
platform=_platform_enum,
chat_id=chat_id,
thread_id=thread_id or None,
user_id=user_id or None,
user_name=user_name or None,
)
synth_event = MessageEvent(
text=synth_text,
message_type=MessageType.TEXT,
source=_source,
source=source,
internal=True,
)
logger.info(
"Process %s finished — injecting agent notification for session %s",
session_id, session_key,
"Process %s finished — injecting agent notification for session %s chat=%s thread=%s",
session_id,
session_key,
source.chat_id,
source.thread_id,
)
await adapter.handle_message(synth_event)
except Exception as e:
@@ -7749,6 +8065,7 @@ class GatewayRunner:
session_key: str = None,
_interrupt_depth: int = 0,
event_message_id: Optional[str] = None,
channel_prompt: Optional[str] = None,
) -> Dict[str, Any]:
"""
Run the agent with the given message and context.
@@ -8103,8 +8420,12 @@ class GatewayRunner:
# Platform.LOCAL ("local") maps to "cli"; others pass through as-is.
platform_key = "cli" if source.platform == Platform.LOCAL else source.platform.value
# Combine platform context with user-configured ephemeral system prompt
# Combine platform context, per-channel context, and the user-configured
# ephemeral system prompt.
combined_ephemeral = context_prompt or ""
event_channel_prompt = (channel_prompt or "").strip()
if event_channel_prompt:
combined_ephemeral = (combined_ephemeral + "\n\n" + event_channel_prompt).strip()
if self._ephemeral_system_prompt:
combined_ephemeral = (combined_ephemeral + "\n\n" + self._ephemeral_system_prompt).strip()
@@ -8238,6 +8559,12 @@ class GatewayRunner:
cached = _cache.get(session_key)
if cached and cached[1] == _sig:
agent = cached[0]
# Reset activity timestamp so the inactivity timeout
# handler doesn't see stale idle time from the previous
# turn and immediately kill this agent. (#9051)
agent._last_activity_ts = time.time()
agent._last_activity_desc = "starting new turn (cached)"
agent._api_call_count = 0
logger.debug("Reusing cached agent for session %s", session_key)
if agent is None:
@@ -8263,6 +8590,7 @@ class GatewayRunner:
session_id=session_id,
platform=platform_key,
user_id=source.user_id,
gateway_session_key=session_key,
session_db=self._session_db,
fallback_model=self._fallback_model,
)
@@ -8282,8 +8610,11 @@ class GatewayRunner:
agent.service_tier = self._service_tier
agent.request_overrides = turn_route.get("request_overrides")
# Background review delivery — send "💾 Memory updated" etc. to user
def _bg_review_send(message: str) -> None:
_bg_review_release = threading.Event()
_bg_review_pending: list[str] = []
_bg_review_pending_lock = threading.Lock()
def _deliver_bg_review_message(message: str) -> None:
if not _status_adapter:
return
try:
@@ -8298,7 +8629,32 @@ class GatewayRunner:
except Exception as _e:
logger.debug("background_review_callback error: %s", _e)
def _release_bg_review_messages() -> None:
_bg_review_release.set()
with _bg_review_pending_lock:
pending = list(_bg_review_pending)
_bg_review_pending.clear()
for queued in pending:
_deliver_bg_review_message(queued)
# Background review delivery — send "💾 Memory updated" etc. to user
def _bg_review_send(message: str) -> None:
if not _status_adapter:
return
if not _bg_review_release.is_set():
with _bg_review_pending_lock:
if not _bg_review_release.is_set():
_bg_review_pending.append(message)
return
_deliver_bg_review_message(message)
agent.background_review_callback = _bg_review_send
# Register the release hook on the adapter so base.py's finally
# block can fire it after delivering the main response.
if _status_adapter and session_key:
_pdc = getattr(_status_adapter, "_post_delivery_callbacks", None)
if _pdc is not None:
_pdc[session_key] = _release_bg_review_messages
# Store agent reference for interrupt support
agent_holder[0] = agent
@@ -8450,6 +8806,21 @@ class GatewayRunner:
if _msn:
message = _msn + "\n\n" + message
# Auto-continue: if the loaded history ends with a tool result,
# the previous agent turn was interrupted mid-work (gateway
# restart, crash, SIGTERM). Prepend a system note so the model
# finishes processing the pending tool results before addressing
# the user's new message. (#4493)
if agent_history and agent_history[-1].get("role") == "tool":
message = (
"[System note: Your previous turn was interrupted before you could "
"process the last tool result(s). The conversation history contains "
"tool outputs you haven't responded to yet. Please finish processing "
"those results and summarize what was accomplished, then address the "
"user's new message below.]\n\n"
+ message
)
_approval_session_key = session_key or ""
_approval_session_token = set_current_session_key(_approval_session_key)
register_gateway_notify(_approval_session_key, _approval_notify_sync)
@@ -8484,6 +8855,8 @@ class GatewayRunner:
"final_response": error_msg,
"messages": result.get("messages", []),
"api_calls": result.get("api_calls", 0),
"failed": result.get("failed", False),
"compression_exhausted": result.get("compression_exhausted", False),
"tools": tools_holder[0] or [],
"history_offset": len(agent_history),
"last_prompt_tokens": _last_prompt_toks,
@@ -8988,15 +9361,11 @@ class GatewayRunner:
pass
except Exception as e:
logger.debug("Stream consumer wait before queued message failed: %s", e)
_response_previewed = bool(result.get("response_previewed"))
_already_streamed = bool(
_sc
and (
getattr(_sc, "final_response_sent", False)
or (
_response_previewed
and getattr(_sc, "already_sent", False)
)
or getattr(_sc, "already_sent", False)
)
)
first_response = result.get("final_response", "")
@@ -9009,6 +9378,17 @@ class GatewayRunner:
)
except Exception as e:
logger.warning("Failed to send first response before queued message: %s", e)
# Release deferred bg-review notifications now that the
# first response has been delivered. Pop from the
# adapter's callback dict (prevents double-fire in
# base.py's finally block) and call it.
if adapter and hasattr(adapter, "_post_delivery_callbacks"):
_bg_cb = adapter._post_delivery_callbacks.pop(session_key, None)
if callable(_bg_cb):
try:
_bg_cb()
except Exception:
pass
# else: interrupted — discard the interrupted response ("Operation
# interrupted." is just noise; the user already knows they sent a
# new message).
@@ -9037,6 +9417,7 @@ class GatewayRunner:
session_key=session_key,
_interrupt_depth=_interrupt_depth + 1,
event_message_id=next_message_id,
channel_prompt=pending_event.channel_prompt,
)
finally:
# Stop progress sender, interrupt monitor, and notification task
@@ -9078,15 +9459,21 @@ class GatewayRunner:
# BUT: never suppress delivery when the agent failed — the error
# message is new content the user hasn't seen, and it must reach
# them even if streaming had sent earlier partial output.
#
# Also never suppress when the final response is "(empty)" — this
# means the model failed to produce content after tool calls (common
# with mimo-v2-pro, GLM-5, etc.). The stream consumer may have
# sent intermediate text ("Let me search for that…") alongside the
# tool call, setting already_sent=True, but that text is NOT the
# final answer. Suppressing delivery here leaves the user staring
# at silence. (#10xxx — "agent stops after web search")
_sc = stream_consumer_holder[0]
if _sc and isinstance(response, dict) and not response.get("failed"):
_response_previewed = bool(response.get("response_previewed"))
if (
_final = response.get("final_response") or ""
_is_empty_sentinel = not _final or _final == "(empty)"
if not _is_empty_sentinel and (
getattr(_sc, "final_response_sent", False)
or (
_response_previewed
and getattr(_sc, "already_sent", False)
)
or getattr(_sc, "already_sent", False)
):
response["already_sent"] = True
@@ -9395,9 +9782,9 @@ def main():
config = None
if args.config:
import json
import yaml
with open(args.config, encoding="utf-8") as f:
data = json.load(f)
data = yaml.safe_load(f)
config = GatewayConfig.from_dict(data)
# Run the gateway - exit with code 1 if no platforms connected,
+6 -2
View File
@@ -301,6 +301,8 @@ def build_session_context_prompt(
lines.append("")
lines.append("**Delivery options for scheduled tasks:**")
from hermes_constants import display_hermes_home
# Origin delivery
if context.source.platform == Platform.LOCAL:
lines.append("- `\"origin\"` → Local output (saved to files)")
@@ -309,9 +311,11 @@ def build_session_context_prompt(
_hash_chat_id(context.source.chat_id) if redact_pii else context.source.chat_id
)
lines.append(f"- `\"origin\"` → Back to this chat ({_origin_label})")
# Local always available
lines.append("- `\"local\"` → Save to local files only (~/.hermes/cron/output/)")
lines.append(
f"- `\"local\"` → Save to local files only ({display_hermes_home()}/cron/output/)"
)
# Platform home channels
for platform, home in context.home_channels.items():
+34 -17
View File
@@ -37,18 +37,24 @@ needs to replace the import + call site:
"""
from contextvars import ContextVar
from typing import Any
# Sentinel to distinguish "never set in this context" from "explicitly set to empty".
# When a contextvar holds _UNSET, we fall back to os.environ (CLI/cron compat).
# When it holds "" (after clear_session_vars resets it), we return "" — no fallback.
_UNSET: Any = object()
# ---------------------------------------------------------------------------
# Per-task session variables
# ---------------------------------------------------------------------------
_SESSION_PLATFORM: ContextVar[str] = ContextVar("HERMES_SESSION_PLATFORM", default="")
_SESSION_CHAT_ID: ContextVar[str] = ContextVar("HERMES_SESSION_CHAT_ID", default="")
_SESSION_CHAT_NAME: ContextVar[str] = ContextVar("HERMES_SESSION_CHAT_NAME", default="")
_SESSION_THREAD_ID: ContextVar[str] = ContextVar("HERMES_SESSION_THREAD_ID", default="")
_SESSION_USER_ID: ContextVar[str] = ContextVar("HERMES_SESSION_USER_ID", default="")
_SESSION_USER_NAME: ContextVar[str] = ContextVar("HERMES_SESSION_USER_NAME", default="")
_SESSION_KEY: ContextVar[str] = ContextVar("HERMES_SESSION_KEY", default="")
_SESSION_PLATFORM: ContextVar = ContextVar("HERMES_SESSION_PLATFORM", default=_UNSET)
_SESSION_CHAT_ID: ContextVar = ContextVar("HERMES_SESSION_CHAT_ID", default=_UNSET)
_SESSION_CHAT_NAME: ContextVar = ContextVar("HERMES_SESSION_CHAT_NAME", default=_UNSET)
_SESSION_THREAD_ID: ContextVar = ContextVar("HERMES_SESSION_THREAD_ID", default=_UNSET)
_SESSION_USER_ID: ContextVar = ContextVar("HERMES_SESSION_USER_ID", default=_UNSET)
_SESSION_USER_NAME: ContextVar = ContextVar("HERMES_SESSION_USER_NAME", default=_UNSET)
_SESSION_KEY: ContextVar = ContextVar("HERMES_SESSION_KEY", default=_UNSET)
_VAR_MAP = {
"HERMES_SESSION_PLATFORM": _SESSION_PLATFORM,
@@ -91,10 +97,17 @@ def set_session_vars(
def clear_session_vars(tokens: list) -> None:
"""Restore session context variables to their pre-handler values."""
if not tokens:
return
vars_in_order = [
"""Mark session context variables as explicitly cleared.
Sets all variables to ``""`` so that ``get_session_env`` returns an empty
string instead of falling back to (potentially stale) ``os.environ``
values. The *tokens* argument is accepted for API compatibility with
callers that saved the return value of ``set_session_vars``, but the
actual clearing uses ``var.set("")`` rather than ``var.reset(token)``
to ensure the "explicitly cleared" state is distinguishable from
"never set" (which holds the ``_UNSET`` sentinel).
"""
for var in (
_SESSION_PLATFORM,
_SESSION_CHAT_ID,
_SESSION_CHAT_NAME,
@@ -102,9 +115,8 @@ def clear_session_vars(tokens: list) -> None:
_SESSION_USER_ID,
_SESSION_USER_NAME,
_SESSION_KEY,
]
for var, token in zip(vars_in_order, tokens):
var.reset(token)
):
var.set("")
def get_session_env(name: str, default: str = "") -> str:
@@ -113,8 +125,13 @@ def get_session_env(name: str, default: str = "") -> str:
Drop-in replacement for ``os.getenv("HERMES_SESSION_*", default)``.
Resolution order:
1. Context variable (set by the gateway for concurrency-safe access)
2. ``os.environ`` (used by CLI, cron scheduler, and tests)
1. Context variable (set by the gateway for concurrency-safe access).
If the variable was explicitly set (even to ``""``) via
``set_session_vars`` or ``clear_session_vars``, that value is
returned **no fallback to os.environ**.
2. ``os.environ`` (only when the context variable was never set in
this context i.e. CLI, cron scheduler, and test processes that
don't use ``set_session_vars`` at all).
3. *default*
"""
import os
@@ -122,7 +139,7 @@ def get_session_env(name: str, default: str = "") -> str:
var = _VAR_MAP.get(name)
if var is not None:
value = var.get()
if value:
if value is not _UNSET:
return value
# Fall back to os.environ for CLI, cron, and test compatibility
return os.getenv(name, default)
+7 -4
View File
@@ -609,12 +609,15 @@ class GatewayStreamConsumer:
content=text,
metadata=self.metadata,
)
if result.success:
self._already_sent = True
return True
# Note: do NOT set _already_sent = True here.
# Commentary messages are interim status updates (e.g. "Using browser
# tool..."), not the final response. Setting already_sent would cause
# the final response to be incorrectly suppressed when there are
# multiple tool calls. See: https://github.com/NousResearch/hermes-agent/issues/10454
return result.success
except Exception as e:
logger.error("Commentary send error: %s", e)
return False
return False
async def _send_or_edit(self, text: str) -> bool:
"""Send or edit the streaming message.
+27 -2
View File
@@ -274,6 +274,14 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
api_key_env_vars=("XIAOMI_API_KEY",),
base_url_env_var="XIAOMI_BASE_URL",
),
"bedrock": ProviderConfig(
id="bedrock",
name="AWS Bedrock",
auth_type="aws_sdk",
inference_base_url="https://bedrock-runtime.us-east-1.amazonaws.com",
api_key_env_vars=(),
base_url_env_var="BEDROCK_BASE_URL",
),
}
@@ -924,6 +932,7 @@ def resolve_provider(
"qwen-portal": "qwen-oauth", "qwen-cli": "qwen-oauth", "qwen-oauth": "qwen-oauth",
"hf": "huggingface", "hugging-face": "huggingface", "huggingface-hub": "huggingface",
"mimo": "xiaomi", "xiaomi-mimo": "xiaomi",
"aws": "bedrock", "aws-bedrock": "bedrock", "amazon-bedrock": "bedrock", "amazon": "bedrock",
"go": "opencode-go", "opencode-go-sub": "opencode-go",
"kilo": "kilocode", "kilo-code": "kilocode", "kilo-gateway": "kilocode",
# Local server aliases — route through the generic custom provider
@@ -980,6 +989,15 @@ def resolve_provider(
if has_usable_secret(os.getenv(env_var, "")):
return pid
# AWS Bedrock — detect via boto3 credential chain (IAM roles, SSO, env vars).
# This runs after API-key providers so explicit keys always win.
try:
from agent.bedrock_adapter import has_aws_credentials
if has_aws_credentials():
return "bedrock"
except ImportError:
pass # boto3 not installed — skip Bedrock auto-detection
raise AuthError(
"No inference provider configured. Run 'hermes model' to choose a "
"provider and model, or set an API key (OPENROUTER_API_KEY, "
@@ -2384,7 +2402,7 @@ def get_api_key_provider_status(provider_id: str) -> Dict[str, Any]:
if pconfig.base_url_env_var:
env_url = os.getenv(pconfig.base_url_env_var, "").strip()
if provider_id == "kimi-coding":
if provider_id in ("kimi-coding", "kimi-coding-cn"):
base_url = _resolve_kimi_base_url(api_key, pconfig.inference_base_url, env_url)
elif env_url:
base_url = env_url
@@ -2446,6 +2464,13 @@ def get_auth_status(provider_id: Optional[str] = None) -> Dict[str, Any]:
pconfig = PROVIDER_REGISTRY.get(target)
if pconfig and pconfig.auth_type == "api_key":
return get_api_key_provider_status(target)
# AWS SDK providers (Bedrock) — check via boto3 credential chain
if pconfig and pconfig.auth_type == "aws_sdk":
try:
from agent.bedrock_adapter import has_aws_credentials
return {"logged_in": has_aws_credentials(), "provider": target}
except ImportError:
return {"logged_in": False, "provider": target, "error": "boto3 not installed"}
return {"logged_in": False}
@@ -2470,7 +2495,7 @@ def resolve_api_key_provider_credentials(provider_id: str) -> Dict[str, Any]:
if pconfig.base_url_env_var:
env_url = os.getenv(pconfig.base_url_env_var, "").strip()
if provider_id == "kimi-coding":
if provider_id in ("kimi-coding", "kimi-coding-cn"):
base_url = _resolve_kimi_base_url(api_key, pconfig.inference_base_url, env_url)
elif provider_id == "zai":
base_url = _resolve_zai_base_url(api_key, pconfig.inference_base_url, env_url)
+26 -1
View File
@@ -4,6 +4,7 @@ from __future__ import annotations
from getpass import getpass
import math
import sys
import time
from types import SimpleNamespace
import uuid
@@ -160,7 +161,10 @@ def auth_add_command(args) -> None:
default_label = _api_key_default_label(len(pool.entries()) + 1)
label = (getattr(args, "label", None) or "").strip()
if not label:
label = input(f"Label (optional, default: {default_label}): ").strip() or default_label
if sys.stdin.isatty():
label = input(f"Label (optional, default: {default_label}): ").strip() or default_label
else:
label = default_label
entry = PooledCredential(
provider=provider,
id=uuid.uuid4().hex[:6],
@@ -368,6 +372,27 @@ def _interactive_auth() -> None:
print("=" * 50)
auth_list_command(SimpleNamespace(provider=None))
# Show AWS Bedrock credential status (not in the pool — uses boto3 chain)
try:
from agent.bedrock_adapter import has_aws_credentials, resolve_aws_auth_env_var, resolve_bedrock_region
if has_aws_credentials():
auth_source = resolve_aws_auth_env_var() or "unknown"
region = resolve_bedrock_region()
print(f"bedrock (AWS SDK credential chain):")
print(f" Auth: {auth_source}")
print(f" Region: {region}")
try:
import boto3
sts = boto3.client("sts", region_name=region)
identity = sts.get_caller_identity()
arn = identity.get("Arn", "unknown")
print(f" Identity: {arn}")
except Exception:
print(f" Identity: (could not resolve — boto3 STS call failed)")
print()
except ImportError:
pass # boto3 or bedrock_adapter not available
print()
# Main menu
+19 -4
View File
@@ -164,7 +164,7 @@ COMMAND_REGISTRY: list[CommandDef] = [
# Exit
CommandDef("quit", "Exit the CLI", "Exit",
cli_only=True, aliases=("exit", "q")),
cli_only=True, aliases=("exit",)),
]
@@ -450,7 +450,7 @@ def _collect_gateway_skill_entries(
name = sanitize_name(cmd_name) if sanitize_name else cmd_name
if not name:
continue
desc = "Plugin command"
desc = plugin_cmds[cmd_name].get("description", "Plugin command")
if len(desc) > desc_limit:
desc = desc[:desc_limit - 3] + "..."
plugin_pairs.append((name, desc))
@@ -844,8 +844,7 @@ class SlashCommandCompleter(Completer):
return None
return word
@staticmethod
def _context_completions(word: str, limit: int = 30):
def _context_completions(self, word: str, limit: int = 30):
"""Yield Claude Code-style @ context completions.
Bare ``@`` or ``@partial`` shows static references and matching
@@ -1140,6 +1139,22 @@ class SlashCommandCompleter(Completer):
display_meta=f"{short_desc}",
)
# Plugin-registered slash commands
try:
from hermes_cli.plugins import get_plugin_commands
for cmd_name, cmd_info in get_plugin_commands().items():
if cmd_name.startswith(word):
desc = str(cmd_info.get("description", "Plugin command"))
short_desc = desc[:50] + ("..." if len(desc) > 50 else "")
yield Completion(
self._completion_text(cmd_name, word),
start_position=-len(word),
display=f"/{cmd_name}",
display_meta=f"🔌 {short_desc}",
)
except Exception:
pass
# ---------------------------------------------------------------------------
# Inline auto-suggest (ghost text) for slash commands
+156 -9
View File
@@ -241,13 +241,41 @@ def _secure_dir(path):
pass
def _is_container() -> bool:
"""Detect if we're running inside a Docker/Podman/LXC container.
When Hermes runs in a container with volume-mounted config files, forcing
0o600 permissions breaks multi-process setups where the gateway and
dashboard run as different UIDs or the volume mount requires broader
permissions.
"""
# Explicit opt-out
if os.environ.get("HERMES_CONTAINER") or os.environ.get("HERMES_SKIP_CHMOD"):
return True
# Docker / Podman marker file
if os.path.exists("/.dockerenv"):
return True
# LXC / cgroup-based detection
try:
with open("/proc/1/cgroup", "r") as f:
cgroup_content = f.read()
if "docker" in cgroup_content or "lxc" in cgroup_content or "kubepods" in cgroup_content:
return True
except (OSError, IOError):
pass
return False
def _secure_file(path):
"""Set file to owner-only read/write (0600). No-op on Windows.
Skipped in managed mode the NixOS activation script sets
group-readable permissions (0640) on config files.
Skipped in containers Docker/Podman volume mounts often need broader
permissions. Set HERMES_SKIP_CHMOD=1 to force-skip on other systems.
"""
if is_managed():
if is_managed() or _is_container():
return
try:
if os.path.exists(str(path)):
@@ -419,6 +447,27 @@ DEFAULT_CONFIG = {
"protect_last_n": 20, # minimum recent messages to keep uncompressed
},
# AWS Bedrock provider configuration.
# Only used when model.provider is "bedrock".
"bedrock": {
"region": "", # AWS region for Bedrock API calls (empty = AWS_REGION env var → us-east-1)
"discovery": {
"enabled": True, # Auto-discover models via ListFoundationModels
"provider_filter": [], # Only show models from these providers (e.g. ["anthropic", "amazon"])
"refresh_interval": 3600, # Cache discovery results for this many seconds
},
"guardrail": {
# Amazon Bedrock Guardrails — content filtering and safety policies.
# Create a guardrail in the Bedrock console, then set the ID and version here.
# See: https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails.html
"guardrail_identifier": "", # e.g. "abc123def456"
"guardrail_version": "", # e.g. "1" or "DRAFT"
"stream_processing_mode": "async", # "sync" or "async"
"trace": "disabled", # "enabled", "disabled", or "enabled_full"
},
},
"smart_model_routing": {
"enabled": False,
"max_simple_chars": 160,
@@ -638,6 +687,7 @@ DEFAULT_CONFIG = {
"allowed_channels": "", # If set, bot ONLY responds in these channel IDs (whitelist)
"auto_thread": True, # Auto-create threads on @mention in channels (like Slack)
"reactions": True, # Add 👀/✅/❌ reactions to messages during processing
"channel_prompts": {}, # Per-channel ephemeral system prompts (forum parents apply to child threads)
},
# WhatsApp platform settings (gateway mode)
@@ -648,6 +698,21 @@ DEFAULT_CONFIG = {
# Supports \n for newlines, e.g. "🤖 *My Bot*\n──────\n"
},
# Telegram platform settings (gateway mode)
"telegram": {
"channel_prompts": {}, # Per-chat/topic ephemeral system prompts (topics inherit from parent group)
},
# Slack platform settings (gateway mode)
"slack": {
"channel_prompts": {}, # Per-channel ephemeral system prompts
},
# Mattermost platform settings (gateway mode)
"mattermost": {
"channel_prompts": {}, # Per-channel ephemeral system prompts
},
# Approval mode for dangerous commands:
# manual — always prompt the user (default)
# smart — use auxiliary LLM to auto-approve low-risk commands, prompt for high-risk
@@ -703,7 +768,7 @@ DEFAULT_CONFIG = {
},
# Config schema version - bump this when adding new required fields
"_config_version": 17,
"_config_version": 18,
}
# =============================================================================
@@ -974,6 +1039,22 @@ OPTIONAL_ENV_VARS = {
"category": "provider",
"advanced": True,
},
"AWS_REGION": {
"description": "AWS region for Bedrock API calls (e.g. us-east-1, eu-central-1)",
"prompt": "AWS Region",
"url": "https://docs.aws.amazon.com/bedrock/latest/userguide/bedrock-regions.html",
"password": False,
"category": "provider",
"advanced": True,
},
"AWS_PROFILE": {
"description": "AWS named profile for Bedrock authentication (from ~/.aws/credentials)",
"prompt": "AWS Profile",
"url": None,
"password": False,
"category": "provider",
"advanced": True,
},
# ── Tool API keys ──
"EXA_API_KEY": {
@@ -1171,6 +1252,12 @@ OPTIONAL_ENV_VARS = {
"password": False,
"category": "messaging",
},
"TELEGRAM_PROXY": {
"description": "Proxy URL for Telegram connections (overrides HTTPS_PROXY). Supports http://, https://, socks5://",
"prompt": "Telegram proxy URL (optional)",
"password": False,
"category": "messaging",
},
"DISCORD_BOT_TOKEN": {
"description": "Discord bot token from Developer Portal",
"prompt": "Discord bot token",
@@ -2766,6 +2853,47 @@ def sanitize_env_file() -> int:
return fixes
def _check_non_ascii_credential(key: str, value: str) -> str:
"""Warn and strip non-ASCII characters from credential values.
API keys and tokens must be pure ASCII they are sent as HTTP header
values which httpx/httpcore encode as ASCII. Non-ASCII characters
(commonly introduced by copy-pasting from rich-text editors or PDFs
that substitute lookalike Unicode glyphs for ASCII letters) cause
``UnicodeEncodeError: 'ascii' codec can't encode character`` at
request time.
Returns the sanitized (ASCII-only) value. Prints a warning if any
non-ASCII characters were found and removed.
"""
try:
value.encode("ascii")
return value # all ASCII — nothing to do
except UnicodeEncodeError:
pass
# Build a readable list of the offending characters
bad_chars: list[str] = []
for i, ch in enumerate(value):
if ord(ch) > 127:
bad_chars.append(f" position {i}: {ch!r} (U+{ord(ch):04X})")
sanitized = value.encode("ascii", errors="ignore").decode("ascii")
import sys
print(
f"\n Warning: {key} contains non-ASCII characters that will break API requests.\n"
f" This usually happens when copy-pasting from a PDF, rich-text editor,\n"
f" or web page that substitutes lookalike Unicode glyphs for ASCII letters.\n"
f"\n"
+ "\n".join(f" {line}" for line in bad_chars[:5])
+ ("\n ... and more" if len(bad_chars) > 5 else "")
+ f"\n\n The non-ASCII characters have been stripped automatically.\n"
f" If authentication fails, re-copy the key from the provider's dashboard.\n",
file=sys.stderr,
)
return sanitized
def save_env_value(key: str, value: str):
"""Save or update a value in ~/.hermes/.env."""
if is_managed():
@@ -2774,6 +2902,8 @@ def save_env_value(key: str, value: str):
if not _ENV_VAR_NAME_RE.match(key):
raise ValueError(f"Invalid environment variable name: {key!r}")
value = value.replace("\n", "").replace("\r", "")
# API keys / tokens must be ASCII — strip non-ASCII with a warning.
value = _check_non_ascii_credential(key, value)
ensure_hermes_home()
env_path = get_env_path()
@@ -2804,12 +2934,25 @@ def save_env_value(key: str, value: str):
lines.append(f"{key}={value}\n")
fd, tmp_path = tempfile.mkstemp(dir=str(env_path.parent), suffix='.tmp', prefix='.env_')
# Preserve original permissions so Docker volume mounts aren't clobbered.
original_mode = None
if env_path.exists():
try:
original_mode = stat.S_IMODE(env_path.stat().st_mode)
except OSError:
pass
try:
with os.fdopen(fd, 'w', **write_kw) as f:
f.writelines(lines)
f.flush()
os.fsync(f.fileno())
os.replace(tmp_path, env_path)
# Restore original permissions before _secure_file may tighten them.
if original_mode is not None:
try:
os.chmod(env_path, original_mode)
except OSError:
pass
except BaseException:
try:
os.unlink(tmp_path)
@@ -2820,13 +2963,6 @@ def save_env_value(key: str, value: str):
os.environ[key] = value
# Restrict .env permissions to owner-only (contains API keys)
if not _IS_WINDOWS:
try:
os.chmod(env_path, stat.S_IRUSR | stat.S_IWUSR)
except OSError:
pass
def remove_env_value(key: str) -> bool:
"""Remove a key from ~/.hermes/.env and os.environ.
@@ -2855,12 +2991,23 @@ def remove_env_value(key: str) -> bool:
if found:
fd, tmp_path = tempfile.mkstemp(dir=str(env_path.parent), suffix='.tmp', prefix='.env_')
# Preserve original permissions so Docker volume mounts aren't clobbered.
original_mode = None
try:
original_mode = stat.S_IMODE(env_path.stat().st_mode)
except OSError:
pass
try:
with os.fdopen(fd, 'w', **write_kw) as f:
f.writelines(new_lines)
f.flush()
os.fsync(f.fileno())
os.replace(tmp_path, env_path)
if original_mode is not None:
try:
os.chmod(env_path, original_mode)
except OSError:
pass
except BaseException:
try:
os.unlink(tmp_path)
+143 -2
View File
@@ -27,6 +27,110 @@ _DPASTE_COM_URL = "https://dpaste.com/api/"
# paste.rs caps at ~1 MB; we stay under that with headroom.
_MAX_LOG_BYTES = 512_000
# Auto-delete pastes after this many seconds (1 hour).
_AUTO_DELETE_SECONDS = 3600
# ---------------------------------------------------------------------------
# Privacy / delete helpers
# ---------------------------------------------------------------------------
_PRIVACY_NOTICE = """\
This will upload the following to a public paste service:
System info (OS, Python version, Hermes version, provider, which API keys
are configured NOT the actual keys)
Recent log lines (agent.log, errors.log, gateway.log may contain
conversation fragments and file paths)
Full agent.log and gateway.log (up to 512 KB each likely contains
conversation content, tool outputs, and file paths)
Pastes auto-delete after 1 hour.
"""
_GATEWAY_PRIVACY_NOTICE = (
"⚠️ **Privacy notice:** This uploads system info + recent log tails "
"(may contain conversation fragments) to a public paste service. "
"Full logs are NOT included from the gateway — use `hermes debug share` "
"from the CLI for full log uploads.\n"
"Pastes auto-delete after 1 hour."
)
def _extract_paste_id(url: str) -> Optional[str]:
"""Extract the paste ID from a paste.rs or dpaste.com URL.
Returns the ID string, or None if the URL doesn't match a known service.
"""
url = url.strip().rstrip("/")
for prefix in ("https://paste.rs/", "http://paste.rs/"):
if url.startswith(prefix):
return url[len(prefix):]
return None
def delete_paste(url: str) -> bool:
"""Delete a paste from paste.rs. Returns True on success.
Only paste.rs supports unauthenticated DELETE. dpaste.com pastes
expire automatically but cannot be deleted via API.
"""
paste_id = _extract_paste_id(url)
if not paste_id:
raise ValueError(
f"Cannot delete: only paste.rs URLs are supported. Got: {url}"
)
target = f"{_PASTE_RS_URL}{paste_id}"
req = urllib.request.Request(
target, method="DELETE",
headers={"User-Agent": "hermes-agent/debug-share"},
)
with urllib.request.urlopen(req, timeout=30) as resp:
return 200 <= resp.status < 300
def _schedule_auto_delete(urls: list[str], delay_seconds: int = _AUTO_DELETE_SECONDS):
"""Spawn a detached process to delete paste.rs pastes after *delay_seconds*.
The child process is fully detached (``start_new_session=True``) so it
survives the parent exiting (important for CLI mode). Only paste.rs
URLs are attempted dpaste.com pastes auto-expire on their own.
"""
import subprocess
paste_rs_urls = [u for u in urls if _extract_paste_id(u)]
if not paste_rs_urls:
return
# Build a tiny inline Python script. No imports beyond stdlib.
url_list = ", ".join(f'"{u}"' for u in paste_rs_urls)
script = (
"import time, urllib.request; "
f"time.sleep({delay_seconds}); "
f"[urllib.request.urlopen(urllib.request.Request(u, method='DELETE', "
f"headers={{'User-Agent': 'hermes-agent/auto-delete'}}), timeout=15) "
f"for u in [{url_list}]]"
)
try:
subprocess.Popen(
[sys.executable, "-c", script],
start_new_session=True,
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
)
except Exception:
pass # Best-effort; manual delete still available.
def _delete_hint(url: str) -> str:
"""Return a one-liner delete command for the given paste URL."""
paste_id = _extract_paste_id(url)
if paste_id:
return f"hermes debug delete {url}"
# dpaste.com — no API delete, expires on its own.
return "(auto-expires per dpaste.com policy)"
def _upload_paste_rs(content: str) -> str:
"""Upload to paste.rs. Returns the paste URL.
@@ -250,6 +354,9 @@ def run_debug_share(args):
expiry = getattr(args, "expire", 7)
local_only = getattr(args, "local", False)
if not local_only:
print(_PRIVACY_NOTICE)
print("Collecting debug report...")
# Capture dump once — prepended to every paste for context.
@@ -315,22 +422,56 @@ def run_debug_share(args):
if failures:
print(f"\n (failed to upload: {', '.join(failures)})")
# Schedule auto-deletion after 1 hour
_schedule_auto_delete(list(urls.values()))
print(f"\n⏱ Pastes will auto-delete in 1 hour.")
# Manual delete fallback
print(f"To delete now: hermes debug delete <url>")
print(f"\nShare these links with the Hermes team for support.")
def run_debug_delete(args):
"""Delete one or more paste URLs uploaded by /debug."""
urls = getattr(args, "urls", [])
if not urls:
print("Usage: hermes debug delete <url> [<url> ...]")
print(" Deletes paste.rs pastes uploaded by 'hermes debug share'.")
return
for url in urls:
try:
ok = delete_paste(url)
if ok:
print(f" ✓ Deleted: {url}")
else:
print(f" ✗ Failed to delete: {url} (unexpected response)")
except ValueError as exc:
print(f"{exc}")
except Exception as exc:
print(f" ✗ Could not delete {url}: {exc}")
def run_debug(args):
"""Route debug subcommands."""
subcmd = getattr(args, "debug_command", None)
if subcmd == "share":
run_debug_share(args)
elif subcmd == "delete":
run_debug_delete(args)
else:
# Default: show help
print("Usage: hermes debug share [--lines N] [--expire N] [--local]")
print("Usage: hermes debug <command>")
print()
print("Commands:")
print(" share Upload debug report to a paste service and print URL")
print(" delete Delete a previously uploaded paste")
print()
print("Options:")
print("Options (share):")
print(" --lines N Number of log lines to include (default: 200)")
print(" --expire N Paste expiry in days (default: 7)")
print(" --local Print report locally instead of uploading")
print()
print("Options (delete):")
print(" <url> ... One or more paste URLs to delete")
+109 -2
View File
@@ -8,6 +8,7 @@ import os
import sys
import subprocess
import shutil
from pathlib import Path
from hermes_cli.config import get_project_root, get_hermes_home, get_env_path
from hermes_constants import display_hermes_home
@@ -513,7 +514,87 @@ def run_doctor(args):
pass
_check_gateway_service_linger(issues)
# =========================================================================
# Check: Command installation (hermes bin symlink)
# =========================================================================
if sys.platform != "win32":
print()
print(color("◆ Command Installation", Colors.CYAN, Colors.BOLD))
# Determine the venv entry point location
_venv_bin = None
for _venv_name in ("venv", ".venv"):
_candidate = PROJECT_ROOT / _venv_name / "bin" / "hermes"
if _candidate.exists():
_venv_bin = _candidate
break
# Determine the expected command link directory (mirrors install.sh logic)
_prefix = os.environ.get("PREFIX", "")
_is_termux_env = bool(os.environ.get("TERMUX_VERSION")) or "com.termux/files/usr" in _prefix
if _is_termux_env and _prefix:
_cmd_link_dir = Path(_prefix) / "bin"
_cmd_link_display = "$PREFIX/bin"
else:
_cmd_link_dir = Path.home() / ".local" / "bin"
_cmd_link_display = "~/.local/bin"
_cmd_link = _cmd_link_dir / "hermes"
if _venv_bin is None:
check_warn(
"Venv entry point not found",
"(hermes not in venv/bin/ or .venv/bin/ — reinstall with pip install -e '.[all]')"
)
manual_issues.append(
f"Reinstall entry point: cd {PROJECT_ROOT} && source venv/bin/activate && pip install -e '.[all]'"
)
else:
check_ok(f"Venv entry point exists ({_venv_bin.relative_to(PROJECT_ROOT)})")
# Check the symlink at the command link location
if _cmd_link.is_symlink():
_target = _cmd_link.resolve()
_expected = _venv_bin.resolve()
if _target == _expected:
check_ok(f"{_cmd_link_display}/hermes → correct target")
else:
check_warn(
f"{_cmd_link_display}/hermes points to wrong target",
f"(→ {_target}, expected → {_expected})"
)
if should_fix:
_cmd_link.unlink()
_cmd_link.symlink_to(_venv_bin)
check_ok(f"Fixed symlink: {_cmd_link_display}/hermes → {_venv_bin}")
fixed_count += 1
else:
issues.append(f"Broken symlink at {_cmd_link_display}/hermes — run 'hermes doctor --fix'")
elif _cmd_link.exists():
# It's a regular file, not a symlink — possibly a wrapper script
check_ok(f"{_cmd_link_display}/hermes exists (non-symlink)")
else:
check_fail(
f"{_cmd_link_display}/hermes not found",
"(hermes command may not work outside the venv)"
)
if should_fix:
_cmd_link_dir.mkdir(parents=True, exist_ok=True)
_cmd_link.symlink_to(_venv_bin)
check_ok(f"Created symlink: {_cmd_link_display}/hermes → {_venv_bin}")
fixed_count += 1
# Check if the link dir is on PATH
_path_dirs = os.environ.get("PATH", "").split(os.pathsep)
if str(_cmd_link_dir) not in _path_dirs:
check_warn(
f"{_cmd_link_display} is not on your PATH",
"(add it to your shell config: export PATH=\"$HOME/.local/bin:$PATH\")"
)
manual_issues.append(f"Add {_cmd_link_display} to your PATH")
else:
issues.append(f"Missing {_cmd_link_display}/hermes symlink — run 'hermes doctor --fix'")
# =========================================================================
# Check: External tools
# =========================================================================
@@ -733,7 +814,8 @@ def run_doctor(args):
("Vercel AI Gateway", ("AI_GATEWAY_API_KEY",), "https://ai-gateway.vercel.sh/v1/models", "AI_GATEWAY_BASE_URL", True),
("Kilo Code", ("KILOCODE_API_KEY",), "https://api.kilo.ai/api/gateway/models", "KILOCODE_BASE_URL", True),
("OpenCode Zen", ("OPENCODE_ZEN_API_KEY",), "https://opencode.ai/zen/v1/models", "OPENCODE_ZEN_BASE_URL", True),
("OpenCode Go", ("OPENCODE_GO_API_KEY",), "https://opencode.ai/zen/go/v1/models", "OPENCODE_GO_BASE_URL", True),
# OpenCode Go has no shared /models endpoint; skip the health check.
("OpenCode Go", ("OPENCODE_GO_API_KEY",), None, "OPENCODE_GO_BASE_URL", False),
]
for _pname, _env_vars, _default_url, _base_env, _supports_health_check in _apikey_providers:
_key = ""
@@ -778,6 +860,31 @@ def run_doctor(args):
except Exception as _e:
print(f"\r {color('', Colors.YELLOW)} {_label} {color(f'({_e})', Colors.DIM)} ")
# -- AWS Bedrock --
# Bedrock uses the AWS SDK credential chain, not API keys.
try:
from agent.bedrock_adapter import has_aws_credentials, resolve_aws_auth_env_var, resolve_bedrock_region
if has_aws_credentials():
_auth_var = resolve_aws_auth_env_var()
_region = resolve_bedrock_region()
_label = "AWS Bedrock".ljust(20)
print(f" Checking AWS Bedrock...", end="", flush=True)
try:
import boto3
_br_client = boto3.client("bedrock", region_name=_region)
_br_resp = _br_client.list_foundation_models()
_model_count = len(_br_resp.get("modelSummaries", []))
print(f"\r {color('', Colors.GREEN)} {_label} {color(f'({_auth_var}, {_region}, {_model_count} models)', Colors.DIM)} ")
except ImportError:
print(f"\r {color('', Colors.YELLOW)} {_label} {color('(boto3 not installed — pip install hermes-agent[bedrock])', Colors.DIM)} ")
issues.append("Install boto3 for Bedrock: pip install hermes-agent[bedrock]")
except Exception as _e:
_err_name = type(_e).__name__
print(f"\r {color('', Colors.YELLOW)} {_label} {color(f'({_err_name}: {_e})', Colors.DIM)} ")
issues.append(f"AWS Bedrock: {_err_name} — check IAM permissions for bedrock:ListFoundationModels")
except ImportError:
pass # bedrock_adapter not available — skip silently
# =========================================================================
# Check: Submodules
# =========================================================================
+29
View File
@@ -8,11 +8,40 @@ from pathlib import Path
from dotenv import load_dotenv
# Env var name suffixes that indicate credential values. These are the
# only env vars whose values we sanitize on load — we must not silently
# alter arbitrary user env vars, but credentials are known to require
# pure ASCII (they become HTTP header values).
_CREDENTIAL_SUFFIXES = ("_API_KEY", "_TOKEN", "_SECRET", "_KEY")
def _sanitize_loaded_credentials() -> None:
"""Strip non-ASCII characters from credential env vars in os.environ.
Called after dotenv loads so the rest of the codebase never sees
non-ASCII API keys. Only touches env vars whose names end with
known credential suffixes (``_API_KEY``, ``_TOKEN``, etc.).
"""
for key, value in list(os.environ.items()):
if not any(key.endswith(suffix) for suffix in _CREDENTIAL_SUFFIXES):
continue
try:
value.encode("ascii")
except UnicodeEncodeError:
os.environ[key] = value.encode("ascii", errors="ignore").decode("ascii")
def _load_dotenv_with_fallback(path: Path, *, override: bool) -> None:
try:
load_dotenv(dotenv_path=path, override=override, encoding="utf-8")
except UnicodeDecodeError:
load_dotenv(dotenv_path=path, override=override, encoding="latin-1")
# Strip non-ASCII characters from credential env vars that were just
# loaded. API keys must be pure ASCII since they're sent as HTTP
# header values (httpx encodes headers as ASCII). Non-ASCII chars
# typically come from copy-pasting keys from PDFs or rich-text editors
# that substitute Unicode lookalike glyphs (e.g. ʋ U+028B for v).
_sanitize_loaded_credentials()
def _sanitize_env_file_if_needed(path: Path) -> None:
+110 -3
View File
@@ -222,7 +222,7 @@ def find_gateway_pids(exclude_pids: set | None = None, all_profiles: bool = Fals
current_cmd = ""
else:
result = subprocess.run(
["ps", "eww", "-ax", "-o", "pid=,command="],
["ps", "-A", "eww", "-o", "pid=,command="],
capture_output=True,
text=True,
timeout=10,
@@ -715,7 +715,9 @@ def _detect_venv_dir() -> Path | None:
"""Detect the active virtualenv directory.
Checks ``sys.prefix`` first (works regardless of the directory name),
then falls back to probing common directory names under PROJECT_ROOT.
then ``VIRTUAL_ENV`` env var (covers uv-managed environments where
sys.prefix == sys.base_prefix), then falls back to probing common
directory names under PROJECT_ROOT.
Returns ``None`` when no virtualenv can be found.
"""
# If we're running inside a virtualenv, sys.prefix points to it.
@@ -724,6 +726,15 @@ def _detect_venv_dir() -> Path | None:
if venv.is_dir():
return venv
# uv and some other tools set VIRTUAL_ENV without changing sys.prefix.
# This catches `uv run` where sys.prefix == sys.base_prefix but the
# environment IS a venv. (#8620)
_virtual_env = os.environ.get("VIRTUAL_ENV")
if _virtual_env:
venv = Path(_virtual_env)
if venv.is_dir():
return venv
# Fallback: check common virtualenv directory names under the project root.
for candidate in (".venv", "venv"):
venv = PROJECT_ROOT / candidate
@@ -1128,7 +1139,62 @@ def systemd_restart(system: bool = False):
pid = get_running_pid()
if pid is not None and _request_gateway_self_restart(pid):
print(f"{_service_scope_label(system).capitalize()} service restart requested")
# SIGUSR1 sent — the gateway will drain active agents, exit with
# code 75, and systemd will restart it after RestartSec (30s).
# Wait for the old process to die and the new one to become active
# so the CLI doesn't return while the service is still restarting.
import time
scope_label = _service_scope_label(system).capitalize()
svc = get_service_name()
scope_cmd = _systemctl_cmd(system)
# Phase 1: wait for old process to exit (drain + shutdown)
print(f"{scope_label} service draining active work...")
deadline = time.time() + 90
while time.time() < deadline:
try:
os.kill(pid, 0)
time.sleep(1)
except (ProcessLookupError, PermissionError):
break # old process is gone
else:
print(f"⚠ Old process (PID {pid}) still alive after 90s")
# Phase 2: wait for systemd to start the new process
print(f"⏳ Waiting for {svc} to restart...")
deadline = time.time() + 60
while time.time() < deadline:
try:
result = subprocess.run(
scope_cmd + ["is-active", svc],
capture_output=True, text=True, timeout=5,
)
if result.stdout.strip() == "active":
# Verify it's a NEW process, not the old one somehow
new_pid = get_running_pid()
if new_pid and new_pid != pid:
print(f"{scope_label} service restarted (PID {new_pid})")
return
except (subprocess.TimeoutExpired, FileNotFoundError):
pass
time.sleep(2)
# Timed out — check final state
try:
result = subprocess.run(
scope_cmd + ["is-active", svc],
capture_output=True, text=True, timeout=5,
)
if result.stdout.strip() == "active":
print(f"{scope_label} service restarted")
return
except Exception:
pass
print(
f"{scope_label} service did not become active within 60s.\n"
f" Check status: {'sudo ' if system else ''}hermes gateway status\n"
f" Check logs: journalctl {'--user ' if not system else ''}-u {svc} --since '2 min ago'"
)
return
_run_systemctl(["reload-or-restart", get_service_name()], system=system, check=True, timeout=90)
print(f"{_service_scope_label(system).capitalize()} service restarted")
@@ -2864,6 +2930,15 @@ def gateway_command(args):
elif subcmd == "start":
system = getattr(args, 'system', False)
start_all = getattr(args, 'all', False)
if start_all:
# Kill all stale gateway processes across all profiles before starting
killed = kill_gateway_processes(all_profiles=True)
if killed:
print(f"✓ Killed {killed} stale gateway process(es) across all profiles")
_wait_for_gateway_exit(timeout=10.0, force_after=5.0)
if is_termux():
print("Gateway service start is not supported on Termux because there is no system service manager.")
print("Run manually: hermes gateway")
@@ -2949,7 +3024,39 @@ def gateway_command(args):
# Try service first, fall back to killing and restarting
service_available = False
system = getattr(args, 'system', False)
restart_all = getattr(args, 'all', False)
service_configured = False
if restart_all:
# --all: stop every gateway process across all profiles, then start fresh
service_stopped = False
if supports_systemd_services() and (get_systemd_unit_path(system=False).exists() or get_systemd_unit_path(system=True).exists()):
try:
systemd_stop(system=system)
service_stopped = True
except subprocess.CalledProcessError:
pass
elif is_macos() and get_launchd_plist_path().exists():
try:
launchd_stop()
service_stopped = True
except subprocess.CalledProcessError:
pass
killed = kill_gateway_processes(all_profiles=True)
total = killed + (1 if service_stopped else 0)
if total:
print(f"✓ Stopped {total} gateway process(es) across all profiles")
_wait_for_gateway_exit(timeout=10.0, force_after=5.0)
# Start the current profile's service fresh
print("Starting gateway...")
if supports_systemd_services() and (get_systemd_unit_path(system=False).exists() or get_systemd_unit_path(system=True).exists()):
systemd_start(system=system)
elif is_macos() and get_launchd_plist_path().exists():
launchd_start()
else:
run_gateway(verbose=0)
return
if supports_systemd_services() and (get_systemd_unit_path(system=False).exists() or get_systemd_unit_path(system=True).exists()):
service_configured = True
+295 -1
View File
@@ -1139,6 +1139,8 @@ def select_provider_and_model(args=None):
_model_flow_anthropic(config, current_model)
elif selected_provider == "kimi-coding":
_model_flow_kimi(config, current_model)
elif selected_provider == "bedrock":
_model_flow_bedrock(config, current_model)
elif selected_provider in ("gemini", "deepseek", "xai", "zai", "kimi-coding-cn", "minimax", "minimax-cn", "kilocode", "opencode-zen", "opencode-go", "ai-gateway", "alibaba", "huggingface", "xiaomi", "arcee"):
_model_flow_api_key_provider(config, selected_provider, current_model)
@@ -2425,6 +2427,252 @@ def _model_flow_kimi(config, current_model=""):
print("No change.")
def _model_flow_bedrock_api_key(config, region, current_model=""):
"""Bedrock API Key mode — uses the OpenAI-compatible bedrock-mantle endpoint.
For developers who don't have an AWS account but received a Bedrock API Key
from their AWS admin. Works like any OpenAI-compatible endpoint.
"""
from hermes_cli.auth import _prompt_model_selection, _save_model_choice, deactivate_provider
from hermes_cli.config import load_config, save_config, get_env_value, save_env_value
from hermes_cli.models import _PROVIDER_MODELS
mantle_base_url = f"https://bedrock-mantle.{region}.api.aws/v1"
# Prompt for API key
existing_key = get_env_value("AWS_BEARER_TOKEN_BEDROCK") or ""
if existing_key:
print(f" Bedrock API Key: {existing_key[:12]}... ✓")
else:
print(f" Endpoint: {mantle_base_url}")
print()
try:
import getpass
api_key = getpass.getpass(" Bedrock API Key: ").strip()
except (KeyboardInterrupt, EOFError):
print()
return
if not api_key:
print(" Cancelled.")
return
save_env_value("AWS_BEARER_TOKEN_BEDROCK", api_key)
existing_key = api_key
print(" ✓ API key saved.")
print()
# Model selection — use static list (mantle doesn't need boto3 for discovery)
model_list = _PROVIDER_MODELS.get("bedrock", [])
print(f" Showing {len(model_list)} curated models")
if model_list:
selected = _prompt_model_selection(model_list, current_model=current_model)
else:
try:
selected = input(" Model ID: ").strip()
except (KeyboardInterrupt, EOFError):
selected = None
if selected:
_save_model_choice(selected)
# Save as custom provider pointing to bedrock-mantle
cfg = load_config()
model = cfg.get("model")
if not isinstance(model, dict):
model = {"default": model} if model else {}
cfg["model"] = model
model["provider"] = "custom"
model["base_url"] = mantle_base_url
model.pop("api_mode", None) # chat_completions is the default
# Also save region in bedrock config for reference
bedrock_cfg = cfg.get("bedrock", {})
if not isinstance(bedrock_cfg, dict):
bedrock_cfg = {}
bedrock_cfg["region"] = region
cfg["bedrock"] = bedrock_cfg
# Save the API key env var name so hermes knows where to find it
save_env_value("OPENAI_API_KEY", existing_key)
save_env_value("OPENAI_BASE_URL", mantle_base_url)
save_config(cfg)
deactivate_provider()
print(f" Default model set to: {selected} (via Bedrock API Key, {region})")
print(f" Endpoint: {mantle_base_url}")
else:
print(" No change.")
def _model_flow_bedrock(config, current_model=""):
"""AWS Bedrock provider: verify credentials, pick region, discover models.
Uses the native Converse API via boto3 not the OpenAI-compatible endpoint.
Auth is handled by the AWS SDK default credential chain (env vars, profile,
instance role), so no API key prompt is needed.
"""
from hermes_cli.auth import _prompt_model_selection, _save_model_choice, deactivate_provider
from hermes_cli.config import load_config, save_config
from hermes_cli.models import _PROVIDER_MODELS
# 1. Check for AWS credentials
try:
from agent.bedrock_adapter import (
has_aws_credentials,
resolve_aws_auth_env_var,
resolve_bedrock_region,
discover_bedrock_models,
)
except ImportError:
print(" ✗ boto3 is not installed. Install it with:")
print(" pip install boto3")
print()
return
if not has_aws_credentials():
print(" ⚠ No AWS credentials detected via environment variables.")
print(" Bedrock will use boto3's default credential chain (IMDS, SSO, etc.)")
print()
auth_var = resolve_aws_auth_env_var()
if auth_var:
print(f" AWS credentials: {auth_var}")
else:
print(" AWS credentials: boto3 default chain (instance role / SSO)")
print()
# 2. Region selection
current_region = resolve_bedrock_region()
try:
region_input = input(f" AWS Region [{current_region}]: ").strip()
except (KeyboardInterrupt, EOFError):
print()
return
region = region_input or current_region
# 2b. Authentication mode
print(" Choose authentication method:")
print()
print(" 1. IAM credential chain (recommended)")
print(" Works with EC2 instance roles, SSO, env vars, aws configure")
print(" 2. Bedrock API Key")
print(" Enter your Bedrock API Key directly — also supports")
print(" team scenarios where an admin distributes keys")
print()
try:
auth_choice = input(" Choice [1]: ").strip()
except (KeyboardInterrupt, EOFError):
print()
return
if auth_choice == "2":
_model_flow_bedrock_api_key(config, region, current_model)
return
# 3. Model discovery — try live API first, fall back to static list
print(f" Discovering models in {region}...")
live_models = discover_bedrock_models(region)
if live_models:
_EXCLUDE_PREFIXES = (
"stability.", "cohere.embed", "twelvelabs.", "us.stability.",
"us.cohere.embed", "us.twelvelabs.", "global.cohere.embed",
"global.twelvelabs.",
)
_EXCLUDE_SUBSTRINGS = ("safeguard", "voxtral", "palmyra-vision")
filtered = []
for m in live_models:
mid = m["id"]
if any(mid.startswith(p) for p in _EXCLUDE_PREFIXES):
continue
if any(s in mid.lower() for s in _EXCLUDE_SUBSTRINGS):
continue
filtered.append(m)
# Deduplicate: prefer inference profiles (us.*, global.*) over bare
# foundation model IDs.
profile_base_ids = set()
for m in filtered:
mid = m["id"]
if mid.startswith(("us.", "global.")):
base = mid.split(".", 1)[1] if "." in mid[3:] else mid
profile_base_ids.add(base)
deduped = []
for m in filtered:
mid = m["id"]
if not mid.startswith(("us.", "global.")) and mid in profile_base_ids:
continue
deduped.append(m)
_RECOMMENDED = [
"us.anthropic.claude-sonnet-4-6",
"us.anthropic.claude-opus-4-6",
"us.anthropic.claude-haiku-4-5",
"us.amazon.nova-pro",
"us.amazon.nova-lite",
"us.amazon.nova-micro",
"deepseek.v3",
"us.meta.llama4-maverick",
"us.meta.llama4-scout",
]
def _sort_key(m):
mid = m["id"]
for i, rec in enumerate(_RECOMMENDED):
if mid.startswith(rec):
return (0, i, mid)
if mid.startswith("global."):
return (1, 0, mid)
return (2, 0, mid)
deduped.sort(key=_sort_key)
model_list = [m["id"] for m in deduped]
print(f" Found {len(model_list)} text model(s) (filtered from {len(live_models)} total)")
else:
model_list = _PROVIDER_MODELS.get("bedrock", [])
if model_list:
print(f" Using {len(model_list)} curated models (live discovery unavailable)")
else:
print(" No models found. Check IAM permissions for bedrock:ListFoundationModels.")
return
# 4. Model selection
if model_list:
selected = _prompt_model_selection(model_list, current_model=current_model)
else:
try:
selected = input(" Model ID: ").strip()
except (KeyboardInterrupt, EOFError):
selected = None
if selected:
_save_model_choice(selected)
cfg = load_config()
model = cfg.get("model")
if not isinstance(model, dict):
model = {"default": model} if model else {}
cfg["model"] = model
model["provider"] = "bedrock"
model["base_url"] = f"https://bedrock-runtime.{region}.amazonaws.com"
model.pop("api_mode", None) # bedrock_converse is auto-detected
bedrock_cfg = cfg.get("bedrock", {})
if not isinstance(bedrock_cfg, dict):
bedrock_cfg = {}
bedrock_cfg["region"] = region
cfg["bedrock"] = bedrock_cfg
save_config(cfg)
deactivate_provider()
print(f" Default model set to: {selected} (via AWS Bedrock, {region})")
else:
print(" No change.")
def _model_flow_api_key_provider(config, provider_id, current_model=""):
"""Generic flow for API-key providers (z.ai, MiniMax, OpenCode, etc.)."""
from hermes_cli.auth import (
@@ -4749,6 +4997,7 @@ For more help on a command:
# gateway start
gateway_start = gateway_subparsers.add_parser("start", help="Start the installed systemd/launchd background service")
gateway_start.add_argument("--system", action="store_true", help="Target the Linux system-level gateway service")
gateway_start.add_argument("--all", action="store_true", help="Kill ALL stale gateway processes across all profiles before starting")
# gateway stop
gateway_stop = gateway_subparsers.add_parser("stop", help="Stop gateway service")
@@ -4758,6 +5007,7 @@ For more help on a command:
# gateway restart
gateway_restart = gateway_subparsers.add_parser("restart", help="Restart gateway service")
gateway_restart.add_argument("--system", action="store_true", help="Target the Linux system-level gateway service")
gateway_restart.add_argument("--all", action="store_true", help="Kill ALL gateway processes across all profiles before restarting")
# gateway status
gateway_status = gateway_subparsers.add_parser("status", help="Show gateway status")
@@ -5071,6 +5321,7 @@ Examples:
hermes debug share --lines 500 Include more log lines
hermes debug share --expire 30 Keep paste for 30 days
hermes debug share --local Print report locally (no upload)
hermes debug delete <url> Delete a previously uploaded paste
""",
)
debug_sub = debug_parser.add_subparsers(dest="debug_command")
@@ -5090,6 +5341,14 @@ Examples:
"--local", action="store_true",
help="Print the report locally instead of uploading",
)
delete_parser = debug_sub.add_parser(
"delete",
help="Delete a paste uploaded by 'hermes debug share'",
)
delete_parser.add_argument(
"urls", nargs="*", default=[],
help="One or more paste URLs to delete (e.g. https://paste.rs/abc123)",
)
debug_parser.set_defaults(func=cmd_debug)
# =========================================================================
@@ -6044,7 +6303,42 @@ Examples:
sys.exit(1)
_processed_argv = _coalesce_session_name_args(sys.argv[1:])
args = parser.parse_args(_processed_argv)
# ── Defensive subparser routing (bpo-9338 workaround) ───────────
# On some Python versions (notably <3.11), argparse fails to route
# subcommand tokens when the parent parser has nargs='?' optional
# arguments (--continue). The symptom: "unrecognized arguments: model"
# even though 'model' is a registered subcommand.
#
# Fix: when argv contains a token matching a known subcommand, set
# subparsers.required=True to force deterministic routing. If that
# fails (e.g. 'hermes -c model' where 'model' is consumed as the
# session name for --continue), fall back to the default behaviour.
import io as _io
_known_cmds = set(subparsers.choices.keys()) if hasattr(subparsers, "choices") else set()
_has_cmd_token = any(t in _known_cmds for t in _processed_argv if not t.startswith("-"))
if _has_cmd_token:
subparsers.required = True
_saved_stderr = sys.stderr
try:
sys.stderr = _io.StringIO()
args = parser.parse_args(_processed_argv)
sys.stderr = _saved_stderr
except SystemExit as exc:
sys.stderr = _saved_stderr
# Help/version flags (exit code 0) already printed output —
# re-raise immediately to avoid a second parse_args printing
# the same help text again (#10230).
if exc.code == 0:
raise
# Subcommand name was consumed as a flag value (e.g. -c model).
# Fall back to optional subparsers so argparse handles it normally.
subparsers.required = False
args = parser.parse_args(_processed_argv)
else:
subparsers.required = False
args = parser.parse_args(_processed_argv)
# Handle --version flag
if args.version:
+4 -2
View File
@@ -58,9 +58,11 @@ def _prompt(label: str, default: str | None = None, secret: bool = False) -> str
def _install_dependencies(provider_name: str) -> None:
"""Install pip dependencies declared in plugin.yaml."""
import subprocess
from pathlib import Path as _Path
from plugins.memory import find_provider_dir
plugin_dir = _Path(__file__).parent.parent / "plugins" / "memory" / provider_name
plugin_dir = find_provider_dir(provider_name)
if not plugin_dir:
return
yaml_path = plugin_dir / "plugin.yaml"
if not yaml_path.exists():
return
+22 -10
View File
@@ -274,6 +274,11 @@ def parse_model_flags(raw_args: str) -> tuple[str, str, bool]:
is_global = False
explicit_provider = ""
# Normalize Unicode dashes (Telegram/iOS auto-converts -- to em/en dash)
# A single Unicode dash before a flag keyword becomes "--"
import re as _re
raw_args = _re.sub(r'[\u2012\u2013\u2014\u2015](provider|global)', r'--\1', raw_args)
# Extract --global
if "--global" in raw_args:
is_global = True
@@ -786,7 +791,8 @@ def list_authenticated_providers(
from hermes_cli.models import OPENROUTER_MODELS, _PROVIDER_MODELS
results: List[dict] = []
seen_slugs: set = set()
seen_slugs: set = set() # lowercase-normalized to catch case variants (#9545)
seen_mdev_ids: set = set() # prevent duplicate entries for aliases (e.g. kimi-coding + kimi-coding-cn)
data = fetch_models_dev()
@@ -799,6 +805,11 @@ def list_authenticated_providers(
# --- 1. Check Hermes-mapped providers ---
for hermes_id, mdev_id in PROVIDER_TO_MODELS_DEV.items():
# Skip aliases that map to the same models.dev provider (e.g.
# kimi-coding and kimi-coding-cn both → kimi-for-coding).
# The first one with valid credentials wins (#10526).
if mdev_id in seen_mdev_ids:
continue
pdata = data.get(mdev_id)
if not isinstance(pdata, dict):
continue
@@ -837,7 +848,8 @@ def list_authenticated_providers(
"total_models": total,
"source": "built-in",
})
seen_slugs.add(slug)
seen_slugs.add(slug.lower())
seen_mdev_ids.add(mdev_id)
# --- 2. Check Hermes-only providers (nous, openai-codex, copilot, opencode-go) ---
from hermes_cli.providers import HERMES_OVERLAYS
@@ -849,12 +861,12 @@ def list_authenticated_providers(
_mdev_to_hermes = {v: k for k, v in PROVIDER_TO_MODELS_DEV.items()}
for pid, overlay in HERMES_OVERLAYS.items():
if pid in seen_slugs:
if pid.lower() in seen_slugs:
continue
# Resolve Hermes slug — e.g. "github-copilot" → "copilot"
hermes_slug = _mdev_to_hermes.get(pid, pid)
if hermes_slug in seen_slugs:
if hermes_slug.lower() in seen_slugs:
continue
# Check if credentials exist
@@ -935,8 +947,8 @@ def list_authenticated_providers(
"total_models": total,
"source": "hermes",
})
seen_slugs.add(pid)
seen_slugs.add(hermes_slug)
seen_slugs.add(pid.lower())
seen_slugs.add(hermes_slug.lower())
# --- 2b. Cross-check canonical provider list ---
# Catches providers that are in CANONICAL_PROVIDERS but weren't found
@@ -948,7 +960,7 @@ def list_authenticated_providers(
_canon_provs = []
for _cp in _canon_provs:
if _cp.slug in seen_slugs:
if _cp.slug.lower() in seen_slugs:
continue
# Check credentials via PROVIDER_REGISTRY (auth.py)
@@ -995,7 +1007,7 @@ def list_authenticated_providers(
"total_models": _cp_total,
"source": "canonical",
})
seen_slugs.add(_cp.slug)
seen_slugs.add(_cp.slug.lower())
# --- 3. User-defined endpoints from config ---
if user_providers and isinstance(user_providers, dict):
@@ -1068,7 +1080,7 @@ def list_authenticated_providers(
groups[slug]["models"].append(default_model)
for slug, grp in groups.items():
if slug in seen_slugs:
if slug.lower() in seen_slugs:
continue
results.append({
"slug": slug,
@@ -1080,7 +1092,7 @@ def list_authenticated_providers(
"source": "user-config",
"api_url": grp["api_url"],
})
seen_slugs.add(slug)
seen_slugs.add(slug.lower())
# Sort: current provider first, then by model count descending
results.sort(key=lambda r: (not r["is_current"], -r["total_models"]))
+58 -1
View File
@@ -303,6 +303,22 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"XiaomiMiMo/MiMo-V2-Flash",
"moonshotai/Kimi-K2-Thinking",
],
# AWS Bedrock — static fallback list used when dynamic discovery is
# unavailable (no boto3, no credentials, or API error). The agent
# prefers live discovery via ListFoundationModels + ListInferenceProfiles.
# Use inference profile IDs (us.*) since most models require them.
"bedrock": [
"us.anthropic.claude-sonnet-4-6",
"us.anthropic.claude-opus-4-6-v1",
"us.anthropic.claude-haiku-4-5-20251001-v1:0",
"us.anthropic.claude-sonnet-4-5-20250929-v1:0",
"us.amazon.nova-pro-v1:0",
"us.amazon.nova-lite-v1:0",
"us.amazon.nova-micro-v1:0",
"deepseek.v3.2",
"us.meta.llama4-maverick-17b-instruct-v1:0",
"us.meta.llama4-scout-17b-instruct-v1:0",
],
}
# ---------------------------------------------------------------------------
@@ -526,7 +542,7 @@ CANONICAL_PROVIDERS: list[ProviderEntry] = [
ProviderEntry("deepseek", "DeepSeek", "DeepSeek (DeepSeek-V3, R1, coder — direct API)"),
ProviderEntry("xai", "xAI", "xAI (Grok models — direct API)"),
ProviderEntry("zai", "Z.AI / GLM", "Z.AI / GLM (Zhipu AI direct API)"),
ProviderEntry("kimi-coding", "Kimi / Moonshot", "Kimi / Moonshot (Moonshot AI direct API)"),
ProviderEntry("kimi-coding", "Kimi / Kimi Coding Plan", "Kimi Coding Plan (api.kimi.com) & Moonshot API"),
ProviderEntry("kimi-coding-cn", "Kimi / Moonshot (China)", "Kimi / Moonshot China (Moonshot CN direct API)"),
ProviderEntry("minimax", "MiniMax", "MiniMax (global direct API)"),
ProviderEntry("minimax-cn", "MiniMax (China)", "MiniMax China (domestic direct API)"),
@@ -536,6 +552,7 @@ CANONICAL_PROVIDERS: list[ProviderEntry] = [
ProviderEntry("opencode-zen", "OpenCode Zen", "OpenCode Zen (35+ curated models, pay-as-you-go)"),
ProviderEntry("opencode-go", "OpenCode Go", "OpenCode Go (open models, $10/month subscription)"),
ProviderEntry("ai-gateway", "Vercel AI Gateway", "Vercel AI Gateway (200+ models, pay-per-use)"),
ProviderEntry("bedrock", "AWS Bedrock", "AWS Bedrock (Claude, Nova, Llama, DeepSeek — IAM or API key)"),
]
# Derived dicts — used throughout the codebase
@@ -587,6 +604,10 @@ _PROVIDER_ALIASES = {
"huggingface-hub": "huggingface",
"mimo": "xiaomi",
"xiaomi-mimo": "xiaomi",
"aws": "bedrock",
"aws-bedrock": "bedrock",
"amazon-bedrock": "bedrock",
"amazon": "bedrock",
"grok": "xai",
"x-ai": "xai",
"x.ai": "xai",
@@ -1957,6 +1978,42 @@ def validate_requested_model(
# api_models is None — couldn't reach API. Accept and persist,
# but warn so typos don't silently break things.
# Bedrock: use our own discovery instead of HTTP /models endpoint.
# Bedrock's bedrock-runtime URL doesn't support /models — it uses the
# AWS SDK control plane (ListFoundationModels + ListInferenceProfiles).
if normalized == "bedrock":
try:
from agent.bedrock_adapter import discover_bedrock_models, resolve_bedrock_region
region = resolve_bedrock_region()
discovered = discover_bedrock_models(region)
discovered_ids = {m["id"] for m in discovered}
if requested in discovered_ids:
return {
"accepted": True,
"persist": True,
"recognized": True,
"message": None,
}
# Not in discovered list — still accept (user may have custom
# inference profiles or cross-account access), but warn.
suggestions = get_close_matches(requested, list(discovered_ids), n=3, cutoff=0.4)
suggestion_text = ""
if suggestions:
suggestion_text = "\n Similar models: " + ", ".join(f"`{s}`" for s in suggestions)
return {
"accepted": True,
"persist": True,
"recognized": False,
"message": (
f"Note: `{requested}` was not found in Bedrock model discovery for {region}. "
f"It may still work with custom inference profiles or cross-account access."
f"{suggestion_text}"
),
}
except Exception:
pass # Fall through to generic warning
provider_label = _PROVIDER_LABELS.get(normalized, normalized)
return {
"accepted": True,
+99
View File
@@ -112,6 +112,7 @@ class LoadedPlugin:
module: Optional[types.ModuleType] = None
tools_registered: List[str] = field(default_factory=list)
hooks_registered: List[str] = field(default_factory=list)
commands_registered: List[str] = field(default_factory=list)
enabled: bool = False
error: Optional[str] = None
@@ -211,6 +212,84 @@ class PluginContext:
}
logger.debug("Plugin %s registered CLI command: %s", self.manifest.name, name)
# -- slash command registration -------------------------------------------
def register_command(
self,
name: str,
handler: Callable,
description: str = "",
) -> None:
"""Register a slash command (e.g. ``/lcm``) available in CLI and gateway sessions.
The handler signature is ``fn(raw_args: str) -> str | None``.
It may also be an async callable the gateway dispatch handles both.
Unlike ``register_cli_command()`` (which creates ``hermes <subcommand>``
terminal commands), this registers in-session slash commands that users
invoke during a conversation.
Names conflicting with built-in commands are rejected with a warning.
"""
clean = name.lower().strip().lstrip("/").replace(" ", "-")
if not clean:
logger.warning(
"Plugin '%s' tried to register a command with an empty name.",
self.manifest.name,
)
return
# Reject if it conflicts with a built-in command
try:
from hermes_cli.commands import resolve_command
if resolve_command(clean) is not None:
logger.warning(
"Plugin '%s' tried to register command '/%s' which conflicts "
"with a built-in command. Skipping.",
self.manifest.name, clean,
)
return
except Exception:
pass # If commands module isn't available, skip the check
self._manager._plugin_commands[clean] = {
"handler": handler,
"description": description or "Plugin command",
"plugin": self.manifest.name,
}
logger.debug("Plugin %s registered command: /%s", self.manifest.name, clean)
# -- tool dispatch -------------------------------------------------------
def dispatch_tool(self, tool_name: str, args: dict, **kwargs) -> str:
"""Dispatch a tool call through the registry, with parent agent context.
This is the public interface for plugin slash commands that need to call
tools like ``delegate_task`` without reaching into framework internals.
The parent agent (if available) is resolved automatically plugins never
need to access the agent directly.
Args:
tool_name: Registry name of the tool (e.g. ``"delegate_task"``).
args: Tool arguments dict (same as what the model would pass).
**kwargs: Extra keyword args forwarded to the registry dispatch.
Returns:
JSON string from the tool handler (same format as model tool calls).
"""
from tools.registry import registry
# Wire up parent agent context when available (CLI mode).
# In gateway mode _cli_ref is None — tools degrade gracefully
# (workspace hints fall back to TERMINAL_CWD, no spinner).
if "parent_agent" not in kwargs:
cli = self._manager._cli_ref
agent = getattr(cli, "agent", None) if cli else None
if agent is not None:
kwargs["parent_agent"] = agent
return registry.dispatch(tool_name, args, **kwargs)
# -- context engine registration -----------------------------------------
def register_context_engine(self, engine) -> None:
@@ -323,6 +402,7 @@ class PluginManager:
self._plugin_tool_names: Set[str] = set()
self._cli_commands: Dict[str, dict] = {}
self._context_engine = None # Set by a plugin via register_context_engine()
self._plugin_commands: Dict[str, dict] = {} # Slash commands registered by plugins
self._discovered: bool = False
self._cli_ref = None # Set by CLI after plugin discovery
# Plugin skill registry: qualified name → metadata dict.
@@ -485,6 +565,10 @@ class PluginManager:
for h in p.hooks_registered
}
)
loaded.commands_registered = [
c for c in self._plugin_commands
if self._plugin_commands[c].get("plugin") == manifest.name
]
loaded.enabled = True
except Exception as exc:
@@ -598,6 +682,7 @@ class PluginManager:
"enabled": loaded.enabled,
"tools": len(loaded.tools_registered),
"hooks": len(loaded.hooks_registered),
"commands": len(loaded.commands_registered),
"error": loaded.error,
}
)
@@ -699,6 +784,20 @@ def get_plugin_context_engine():
return get_plugin_manager()._context_engine
def get_plugin_command_handler(name: str) -> Optional[Callable]:
"""Return the handler for a plugin-registered slash command, or ``None``."""
entry = get_plugin_manager()._plugin_commands.get(name)
return entry["handler"] if entry else None
def get_plugin_commands() -> Dict[str, dict]:
"""Return the full plugin commands dict (name → {handler, description, plugin}).
Safe to call before discovery returns an empty dict if no plugins loaded.
"""
return get_plugin_manager()._plugin_commands
def get_plugin_toolsets() -> List[tuple]:
"""Return plugin toolsets as ``(key, label, description)`` tuples.
+14
View File
@@ -236,6 +236,12 @@ ALIASES: Dict[str, str] = {
"mimo": "xiaomi",
"xiaomi-mimo": "xiaomi",
# bedrock
"aws": "bedrock",
"aws-bedrock": "bedrock",
"amazon-bedrock": "bedrock",
"amazon": "bedrock",
# arcee
"arcee-ai": "arcee",
"arceeai": "arcee",
@@ -262,6 +268,7 @@ _LABEL_OVERRIDES: Dict[str, str] = {
"copilot-acp": "GitHub Copilot ACP",
"xiaomi": "Xiaomi MiMo",
"local": "Local endpoint",
"bedrock": "AWS Bedrock",
}
@@ -271,6 +278,7 @@ TRANSPORT_TO_API_MODE: Dict[str, str] = {
"openai_chat": "chat_completions",
"anthropic_messages": "anthropic_messages",
"codex_responses": "codex_responses",
"bedrock_converse": "bedrock_converse",
}
@@ -388,6 +396,10 @@ def determine_api_mode(provider: str, base_url: str = "") -> str:
if pdef is not None:
return TRANSPORT_TO_API_MODE.get(pdef.transport, "chat_completions")
# Direct provider checks for providers not in HERMES_OVERLAYS
if provider == "bedrock":
return "bedrock_converse"
# URL-based heuristics for custom / unknown providers
if base_url:
url_lower = base_url.rstrip("/").lower()
@@ -395,6 +407,8 @@ def determine_api_mode(provider: str, base_url: str = "") -> str:
return "anthropic_messages"
if "api.openai.com" in url_lower:
return "codex_responses"
if "bedrock-runtime" in url_lower and "amazonaws.com" in url_lower:
return "bedrock_converse"
return "chat_completions"
+73 -1
View File
@@ -124,7 +124,7 @@ def _copilot_runtime_api_mode(model_cfg: Dict[str, Any], api_key: str) -> str:
return "chat_completions"
_VALID_API_MODES = {"chat_completions", "codex_responses", "anthropic_messages"}
_VALID_API_MODES = {"chat_completions", "codex_responses", "anthropic_messages", "bedrock_converse"}
def _parse_api_mode(raw: Any) -> Optional[str]:
@@ -167,6 +167,7 @@ def _resolve_runtime_from_pool_entry(
api_mode = "chat_completions"
elif provider == "copilot":
api_mode = _copilot_runtime_api_mode(model_cfg, getattr(entry, "runtime_api_key", ""))
base_url = base_url or PROVIDER_REGISTRY["copilot"].inference_base_url
else:
configured_provider = str(model_cfg.get("provider") or "").strip().lower()
# Honour model.base_url from config.yaml when the configured provider
@@ -836,6 +837,77 @@ def resolve_runtime_provider(
"requested_provider": requested_provider,
}
# AWS Bedrock (native Converse API via boto3)
if provider == "bedrock":
from agent.bedrock_adapter import (
has_aws_credentials,
resolve_aws_auth_env_var,
resolve_bedrock_region,
is_anthropic_bedrock_model,
)
# When the user explicitly selected bedrock (not auto-detected),
# trust boto3's credential chain — it handles IMDS, ECS task roles,
# Lambda execution roles, SSO, and other implicit sources that our
# env-var check can't detect.
is_explicit = requested_provider in ("bedrock", "aws", "aws-bedrock", "amazon-bedrock", "amazon")
if not is_explicit and not has_aws_credentials():
raise AuthError(
"No AWS credentials found for Bedrock. Configure one of:\n"
" - AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY\n"
" - AWS_PROFILE (for SSO / named profiles)\n"
" - IAM instance role (EC2, ECS, Lambda)\n"
"Or run 'aws configure' to set up credentials.",
code="no_aws_credentials",
)
# Read bedrock-specific config from config.yaml
from hermes_cli.config import load_config as _load_bedrock_config
_bedrock_cfg = _load_bedrock_config().get("bedrock", {})
# Region priority: config.yaml bedrock.region → env var → us-east-1
region = (_bedrock_cfg.get("region") or "").strip() or resolve_bedrock_region()
auth_source = resolve_aws_auth_env_var() or "aws-sdk-default-chain"
# Build guardrail config if configured
_gr = _bedrock_cfg.get("guardrail", {})
guardrail_config = None
if _gr.get("guardrail_identifier") and _gr.get("guardrail_version"):
guardrail_config = {
"guardrailIdentifier": _gr["guardrail_identifier"],
"guardrailVersion": _gr["guardrail_version"],
}
if _gr.get("stream_processing_mode"):
guardrail_config["streamProcessingMode"] = _gr["stream_processing_mode"]
if _gr.get("trace"):
guardrail_config["trace"] = _gr["trace"]
# Dual-path routing: Claude models use AnthropicBedrock SDK for full
# feature parity (prompt caching, thinking budgets, adaptive thinking).
# Non-Claude models use the Converse API for multi-model support.
_current_model = str(model_cfg.get("default") or "").strip()
if is_anthropic_bedrock_model(_current_model):
# Claude on Bedrock → AnthropicBedrock SDK → anthropic_messages path
runtime = {
"provider": "bedrock",
"api_mode": "anthropic_messages",
"base_url": f"https://bedrock-runtime.{region}.amazonaws.com",
"api_key": "aws-sdk",
"source": auth_source,
"region": region,
"bedrock_anthropic": True, # Signal to use AnthropicBedrock client
"requested_provider": requested_provider,
}
else:
# Non-Claude (Nova, DeepSeek, Llama, etc.) → Converse API
runtime = {
"provider": "bedrock",
"api_mode": "bedrock_converse",
"base_url": f"https://bedrock-runtime.{region}.amazonaws.com",
"api_key": "aws-sdk",
"source": auth_source,
"region": region,
"requested_provider": requested_provider,
}
if guardrail_config:
runtime["guardrail_config"] = guardrail_config
return runtime
# API-key providers (z.ai/GLM, Kimi, MiniMax, MiniMax-CN)
pconfig = PROVIDER_REGISTRY.get(provider)
if pconfig and pconfig.auth_type == "api_key":
+13 -3
View File
@@ -1611,9 +1611,19 @@ def _setup_telegram():
return
print_info("Create a bot via @BotFather on Telegram")
token = prompt("Telegram bot token", password=True)
if not token:
return
import re
while True:
token = prompt("Telegram bot token", password=True)
if not token:
return
if not re.match(r"^\d+:[A-Za-z0-9_-]{30,}$", token):
print_error(
"Invalid token format. Expected: <numeric_id>:<alphanumeric_hash> "
"(e.g., 123456789:ABCdefGHI-jklMNOpqrSTUvwxYZ)"
)
continue
break
save_env_value("TELEGRAM_BOT_TOKEN", token)
print_success("Telegram token saved")
+40 -16
View File
@@ -48,12 +48,14 @@ from hermes_cli.cli_output import ( # noqa: E402 — late import block
# These map to keys in toolsets.py TOOLSETS dict.
CONFIGURABLE_TOOLSETS = [
("web", "🔍 Web Search & Scraping", "web_search, web_extract"),
("x_search", "🐦 X Search", "x_search"),
("browser", "🌐 Browser Automation", "navigate, click, type, scroll"),
("terminal", "💻 Terminal & Processes", "terminal, process"),
("file", "📁 File Operations", "read, write, patch, search"),
("code_execution", "⚡ Code Execution", "execute_code"),
("vision", "👁️ Vision / Image Analysis", "vision_analyze"),
("image_gen", "🎨 Image Generation", "image_generate"),
("video_gen", "🎬 Video Generation", "video_generate"),
("moa", "🧠 Mixture of Agents", "mixture_of_agents"),
("tts", "🔊 Text-to-Speech", "text_to_speech"),
("skills", "📚 Skills", "list, view, manage"),
@@ -63,6 +65,7 @@ CONFIGURABLE_TOOLSETS = [
("clarify", "❓ Clarifying Questions", "clarify"),
("delegation", "👥 Task Delegation", "delegate_task"),
("cronjob", "⏰ Cron Jobs", "create/list/update/pause/resume/run, with optional attached skills"),
("messaging", "📨 Cross-Platform Messaging", "send_message"),
("rl", "🧪 RL Training", "Tinker-Atropos training tools"),
("homeassistant", "🏠 Home Assistant", "smart home device control"),
]
@@ -121,6 +124,7 @@ TOOL_CATEGORIES = {
"providers": [
{
"name": "Nous Subscription",
"badge": "subscription",
"tag": "Managed OpenAI TTS billed to your subscription",
"env_vars": [],
"tts_provider": "openai",
@@ -130,13 +134,15 @@ TOOL_CATEGORIES = {
},
{
"name": "Microsoft Edge TTS",
"tag": "Free - no API key needed",
"badge": "★ recommended · free",
"tag": "Good quality, no API key needed",
"env_vars": [],
"tts_provider": "edge",
},
{
"name": "OpenAI TTS",
"tag": "Premium - high quality voices",
"badge": "paid",
"tag": "High quality voices",
"env_vars": [
{"key": "VOICE_TOOLS_OPENAI_KEY", "prompt": "OpenAI API key", "url": "https://platform.openai.com/api-keys"},
],
@@ -144,7 +150,8 @@ TOOL_CATEGORIES = {
},
{
"name": "ElevenLabs",
"tag": "Premium - most natural voices",
"badge": "paid",
"tag": "Most natural voices",
"env_vars": [
{"key": "ELEVENLABS_API_KEY", "prompt": "ElevenLabs API key", "url": "https://elevenlabs.io/app/settings/api-keys"},
],
@@ -152,7 +159,8 @@ TOOL_CATEGORIES = {
},
{
"name": "Mistral (Voxtral TTS)",
"tag": "Multilingual, native Opus, needs MISTRAL_API_KEY",
"badge": "paid",
"tag": "Multilingual, native Opus",
"env_vars": [
{"key": "MISTRAL_API_KEY", "prompt": "Mistral API key", "url": "https://console.mistral.ai/"},
],
@@ -168,6 +176,7 @@ TOOL_CATEGORIES = {
"providers": [
{
"name": "Nous Subscription",
"badge": "subscription",
"tag": "Managed Firecrawl billed to your subscription",
"web_backend": "firecrawl",
"env_vars": [],
@@ -177,7 +186,8 @@ TOOL_CATEGORIES = {
},
{
"name": "Firecrawl Cloud",
"tag": "Hosted service - search, extract, and crawl",
"badge": "★ recommended",
"tag": "Full-featured search, extract, and crawl",
"web_backend": "firecrawl",
"env_vars": [
{"key": "FIRECRAWL_API_KEY", "prompt": "Firecrawl API key", "url": "https://firecrawl.dev"},
@@ -185,7 +195,8 @@ TOOL_CATEGORIES = {
},
{
"name": "Exa",
"tag": "AI-native search and contents",
"badge": "paid",
"tag": "Neural search with semantic understanding",
"web_backend": "exa",
"env_vars": [
{"key": "EXA_API_KEY", "prompt": "Exa API key", "url": "https://exa.ai"},
@@ -193,7 +204,8 @@ TOOL_CATEGORIES = {
},
{
"name": "Parallel",
"tag": "AI-native search and extract",
"badge": "paid",
"tag": "AI-powered search and extract",
"web_backend": "parallel",
"env_vars": [
{"key": "PARALLEL_API_KEY", "prompt": "Parallel API key", "url": "https://parallel.ai"},
@@ -201,7 +213,8 @@ TOOL_CATEGORIES = {
},
{
"name": "Tavily",
"tag": "AI-native search, extract, and crawl",
"badge": "free tier",
"tag": "Search, extract, and crawl — 1000 free searches/mo",
"web_backend": "tavily",
"env_vars": [
{"key": "TAVILY_API_KEY", "prompt": "Tavily API key", "url": "https://app.tavily.com/home"},
@@ -209,7 +222,8 @@ TOOL_CATEGORIES = {
},
{
"name": "Firecrawl Self-Hosted",
"tag": "Free - run your own instance",
"badge": "free · self-hosted",
"tag": "Run your own Firecrawl instance (Docker)",
"web_backend": "firecrawl",
"env_vars": [
{"key": "FIRECRAWL_API_URL", "prompt": "Your Firecrawl instance URL (e.g., http://localhost:3002)"},
@@ -223,6 +237,7 @@ TOOL_CATEGORIES = {
"providers": [
{
"name": "Nous Subscription",
"badge": "subscription",
"tag": "Managed FAL image generation billed to your subscription",
"env_vars": [],
"requires_nous_auth": True,
@@ -231,6 +246,7 @@ TOOL_CATEGORIES = {
},
{
"name": "FAL.ai",
"badge": "paid",
"tag": "FLUX 2 Pro with auto-upscaling",
"env_vars": [
{"key": "FAL_KEY", "prompt": "FAL API key", "url": "https://fal.ai/dashboard/keys"},
@@ -244,6 +260,7 @@ TOOL_CATEGORIES = {
"providers": [
{
"name": "Nous Subscription (Browser Use cloud)",
"badge": "subscription",
"tag": "Managed Browser Use billed to your subscription",
"env_vars": [],
"browser_provider": "browser-use",
@@ -254,14 +271,16 @@ TOOL_CATEGORIES = {
},
{
"name": "Local Browser",
"tag": "Free headless Chromium (no API key needed)",
"badge": "★ recommended · free",
"tag": "Headless Chromium, no API key needed",
"env_vars": [],
"browser_provider": "local",
"post_setup": "agent_browser",
},
{
"name": "Browserbase",
"tag": "Cloud browser with stealth & proxies",
"badge": "paid",
"tag": "Cloud browser with stealth and proxies",
"env_vars": [
{"key": "BROWSERBASE_API_KEY", "prompt": "Browserbase API key", "url": "https://browserbase.com"},
{"key": "BROWSERBASE_PROJECT_ID", "prompt": "Browserbase project ID"},
@@ -271,6 +290,7 @@ TOOL_CATEGORIES = {
},
{
"name": "Browser Use",
"badge": "paid",
"tag": "Cloud browser with remote execution",
"env_vars": [
{"key": "BROWSER_USE_API_KEY", "prompt": "Browser Use API key", "url": "https://browser-use.com"},
@@ -280,6 +300,7 @@ TOOL_CATEGORIES = {
},
{
"name": "Firecrawl",
"badge": "paid",
"tag": "Cloud browser with remote execution",
"env_vars": [
{"key": "FIRECRAWL_API_KEY", "prompt": "Firecrawl API key", "url": "https://firecrawl.dev"},
@@ -289,7 +310,8 @@ TOOL_CATEGORIES = {
},
{
"name": "Camofox",
"tag": "Local anti-detection browser (Firefox/Camoufox)",
"badge": "free · local",
"tag": "Anti-detection browser (Firefox/Camoufox)",
"env_vars": [
{"key": "CAMOFOX_URL", "prompt": "Camofox server URL", "default": "http://localhost:9377",
"url": "https://github.com/jo-inc/camofox-browser"},
@@ -838,7 +860,8 @@ def _configure_tool_category(ts_key: str, cat: dict, config: dict):
# Plain text labels only (no ANSI codes in menu items)
provider_choices = []
for p in providers:
tag = f" ({p['tag']})" if p.get("tag") else ""
badge = f" [{p['badge']}]" if p.get("badge") else ""
tag = f"{p['tag']}" if p.get("tag") else ""
configured = ""
env_vars = p.get("env_vars", [])
if not env_vars or all(get_env_value(v["key"]) for v in env_vars):
@@ -848,7 +871,7 @@ def _configure_tool_category(ts_key: str, cat: dict, config: dict):
configured = ""
else:
configured = " [configured]"
provider_choices.append(f"{p['name']}{tag}{configured}")
provider_choices.append(f"{p['name']}{badge}{tag}{configured}")
# Add skip option
provider_choices.append("Skip — keep defaults / configure later")
@@ -1104,7 +1127,8 @@ def _configure_tool_category_for_reconfig(ts_key: str, cat: dict, config: dict):
provider_choices = []
for p in providers:
tag = f" ({p['tag']})" if p.get("tag") else ""
badge = f" [{p['badge']}]" if p.get("badge") else ""
tag = f"{p['tag']}" if p.get("tag") else ""
configured = ""
env_vars = p.get("env_vars", [])
if not env_vars or all(get_env_value(v["key"]) for v in env_vars):
@@ -1114,7 +1138,7 @@ def _configure_tool_category_for_reconfig(ts_key: str, cat: dict, config: dict):
configured = ""
else:
configured = " [configured]"
provider_choices.append(f"{p['name']}{tag}{configured}")
provider_choices.append(f"{p['name']}{badge}{tag}{configured}")
default_idx = _detect_active_provider_index(providers, config)
+1
View File
@@ -358,6 +358,7 @@ def _add_rotating_handler(
path.parent.mkdir(parents=True, exist_ok=True)
handler = _ManagedRotatingFileHandler(
str(path), maxBytes=max_bytes, backupCount=backup_count,
encoding="utf-8",
)
handler.setLevel(level)
handler.setFormatter(formatter)
+2 -40
View File
@@ -26,7 +26,7 @@ import logging
import threading
from typing import Dict, Any, List, Optional, Tuple
from tools.registry import registry
from tools.registry import discover_builtin_tools, registry
from toolsets import resolve_toolset, validate_toolset
logger = logging.getLogger(__name__)
@@ -129,45 +129,7 @@ def _run_async(coro):
# Tool Discovery (importing each module triggers its registry.register calls)
# =============================================================================
def _discover_tools():
"""Import all tool modules to trigger their registry.register() calls.
Wrapped in a function so import errors in optional tools (e.g., fal_client
not installed) don't prevent the rest from loading.
"""
_modules = [
"tools.web_tools",
"tools.terminal_tool",
"tools.file_tools",
"tools.vision_tools",
"tools.mixture_of_agents_tool",
"tools.image_generation_tool",
"tools.skills_tool",
"tools.skill_manager_tool",
"tools.browser_tool",
"tools.cronjob_tools",
"tools.rl_training_tool",
"tools.tts_tool",
"tools.todo_tool",
"tools.memory_tool",
"tools.session_search_tool",
"tools.clarify_tool",
"tools.code_execution_tool",
"tools.delegate_tool",
"tools.process_registry",
"tools.send_message_tool",
# "tools.honcho_tools", # Removed — Honcho is now a memory provider plugin
"tools.homeassistant_tool",
]
import importlib
for mod_name in _modules:
try:
importlib.import_module(mod_name)
except Exception as e:
logger.warning("Could not import tool module %s: %s", mod_name, e)
_discover_tools()
discover_builtin_tools()
# MCP tool discovery (external MCP servers from config)
try:
@@ -1,12 +1,12 @@
---
name: honcho
description: Configure and use Honcho memory with Hermes -- cross-session user modeling, multi-profile peer isolation, observation config, and dialectic reasoning. Use when setting up Honcho, troubleshooting memory, managing profiles with Honcho peers, or tuning observation and recall settings.
version: 1.0.0
description: Configure and use Honcho memory with Hermes -- cross-session user modeling, multi-profile peer isolation, observation config, dialectic reasoning, session summaries, and context budget enforcement. Use when setting up Honcho, troubleshooting memory, managing profiles with Honcho peers, or tuning observation, recall, and dialectic settings.
version: 2.0.0
author: Hermes Agent
license: MIT
metadata:
hermes:
tags: [Honcho, Memory, Profiles, Observation, Dialectic, User-Modeling]
tags: [Honcho, Memory, Profiles, Observation, Dialectic, User-Modeling, Session-Summary]
homepage: https://docs.honcho.dev
related_skills: [hermes-agent]
prerequisites:
@@ -22,8 +22,9 @@ Honcho provides AI-native cross-session user modeling. It learns who the user is
- Setting up Honcho (cloud or self-hosted)
- Troubleshooting memory not working / peers not syncing
- Creating multi-profile setups where each agent has its own Honcho peer
- Tuning observation, recall, or write frequency settings
- Understanding what the 4 Honcho tools do and when to use them
- Tuning observation, recall, dialectic depth, or write frequency settings
- Understanding what the 5 Honcho tools do and when to use them
- Configuring context budgets and session summary injection
## Setup
@@ -51,6 +52,27 @@ hermes honcho status # shows resolved config, connection test, peer info
## Architecture
### Base Context Injection
When Honcho injects context into the system prompt (in `hybrid` or `context` recall modes), it assembles the base context block in this order:
1. **Session summary** -- a short digest of the current session so far (placed first so the model has immediate conversational continuity)
2. **User representation** -- Honcho's accumulated model of the user (preferences, facts, patterns)
3. **AI peer card** -- the identity card for this Hermes profile's AI peer
The session summary is generated automatically by Honcho at the start of each turn (when a prior session exists). It gives the model a warm start without replaying full history.
### Cold / Warm Prompt Selection
Honcho automatically selects between two prompt strategies:
| Condition | Strategy | What happens |
|-----------|----------|--------------|
| No prior session or empty representation | **Cold start** | Lightweight intro prompt; skips summary injection; encourages the model to learn about the user |
| Existing representation and/or session history | **Warm start** | Full base context injection (summary → representation → card); richer system prompt |
You do not need to configure this -- it is automatic based on session state.
### Peers
Honcho models conversations as interactions between **peers**. Hermes creates two peers per session:
@@ -112,6 +134,63 @@ How the agent accesses Honcho memory:
| `context` | Yes | No (hidden) | Minimal token cost, no tool calls |
| `tools` | No | Yes | Agent controls all memory access explicitly |
## Three Orthogonal Knobs
Honcho's dialectic behavior is controlled by three independent dimensions. Each can be tuned without affecting the others:
### Cadence (when)
Controls **how often** dialectic and context calls happen.
| Key | Default | Description |
|-----|---------|-------------|
| `contextCadence` | `1` | Min turns between context API calls |
| `dialecticCadence` | `3` | Min turns between dialectic API calls |
| `injectionFrequency` | `every-turn` | `every-turn` or `first-turn` for base context injection |
Higher cadence values reduce API calls and cost. `dialecticCadence: 3` (default) means the dialectic engine fires at most every 3rd turn.
### Depth (how many)
Controls **how many rounds** of dialectic reasoning Honcho performs per query.
| Key | Default | Range | Description |
|-----|---------|-------|-------------|
| `dialecticDepth` | `1` | 1-3 | Number of dialectic reasoning rounds per query |
| `dialecticDepthLevels` | -- | array | Optional per-depth-round level overrides (see below) |
`dialecticDepth: 2` means Honcho runs two rounds of dialectic synthesis. The first round produces an initial answer; the second refines it.
`dialecticDepthLevels` lets you set the reasoning level for each round independently:
```json
{
"dialecticDepth": 3,
"dialecticDepthLevels": ["low", "medium", "high"]
}
```
If `dialecticDepthLevels` is omitted, rounds use **proportional levels** derived from `dialecticReasoningLevel` (the base):
| Depth | Pass levels |
|-------|-------------|
| 1 | [base] |
| 2 | [minimal, base] |
| 3 | [minimal, base, low] |
This keeps earlier passes cheap while using full depth on the final synthesis.
### Level (how hard)
Controls the **intensity** of each dialectic reasoning round.
| Key | Default | Description |
|-----|---------|-------------|
| `dialecticReasoningLevel` | `low` | `minimal`, `low`, `medium`, `high`, `max` |
| `dialecticDynamic` | `true` | When `true`, the model can pass `reasoning_level` to `honcho_reasoning` to override the default per-call. `false` = always use `dialecticReasoningLevel`, model overrides ignored |
Higher levels produce richer synthesis but cost more tokens on Honcho's backend.
## Multi-Profile Setup
Each Hermes profile gets its own Honcho AI peer while sharing the same workspace (user context). This means:
@@ -149,6 +228,7 @@ Override any setting in the host block:
"hermes.coder": {
"aiPeer": "coder",
"recallMode": "tools",
"dialecticDepth": 2,
"observation": {
"user": { "observeMe": true, "observeOthers": false },
"ai": { "observeMe": true, "observeOthers": true }
@@ -160,19 +240,97 @@ Override any setting in the host block:
## Tools
The agent has 4 Honcho tools (hidden in `context` recall mode):
The agent has 5 bidirectional Honcho tools (hidden in `context` recall mode):
| Tool | LLM call? | Cost | Use when |
|------|-----------|------|----------|
| `honcho_profile` | No | minimal | Quick factual snapshot at conversation start or for fast name/role/pref lookups |
| `honcho_search` | No | low | Fetch specific past facts to reason over yourself — raw excerpts, no synthesis |
| `honcho_context` | No | low | Full session context snapshot: summary, representation, card, recent messages |
| `honcho_reasoning` | Yes | mediumhigh | Natural language question synthesized by Honcho's dialectic engine |
| `honcho_conclude` | No | minimal | Write or delete a persistent fact; pass `peer: "ai"` for AI self-knowledge |
### `honcho_profile`
Quick factual snapshot of the user -- name, role, preferences, patterns. No LLM call, minimal cost. Use at conversation start or for fast lookups.
Read or update a peer card — curated key facts (name, role, preferences, communication style). Pass `card: [...]` to update; omit to read. No LLM call.
### `honcho_search`
Semantic search over stored context. Returns raw excerpts ranked by relevance, no LLM synthesis. Default 800 tokens, max 2000. Use when you want specific past facts to reason over yourself.
Semantic search over stored context for a specific peer. Returns raw excerpts ranked by relevance, no synthesis. Default 800 tokens, max 2000. Good when you need specific past facts to reason over yourself rather than a synthesized answer.
### `honcho_context`
Natural language question answered by Honcho's dialectic reasoning (LLM call on Honcho's backend). Higher cost, higher quality. Can query about user (default) or the AI peer.
Full session context snapshot from Honcho — session summary, peer representation, peer card, and recent messages. No LLM call. Use when you want to see everything Honcho knows about the current session and peer in one shot.
### `honcho_reasoning`
Natural language question answered by Honcho's dialectic reasoning engine (LLM call on Honcho's backend). Higher cost, higher quality. Pass `reasoning_level` to control depth: `minimal` (fast/cheap) → `low``medium``high``max` (thorough). Omit to use the configured default (`low`). Use for synthesized understanding of the user's patterns, goals, or current state.
### `honcho_conclude`
Write a persistent fact about the user. Conclusions build the user's profile over time. Use when the user states a preference, corrects you, or shares something to remember.
Write or delete a persistent conclusion about a peer. Pass `conclusion: "..."` to create. Pass `delete_id: "..."` to remove a conclusion (for PII removal — Honcho self-heals incorrect conclusions over time, so deletion is only needed for PII). You MUST pass exactly one of the two.
### Bidirectional peer targeting
All 5 tools accept an optional `peer` parameter:
- `peer: "user"` (default) — operates on the user peer
- `peer: "ai"` — operates on this profile's AI peer
- `peer: "<explicit-id>"` — any peer ID in the workspace
Examples:
```
honcho_profile # read user's card
honcho_profile peer="ai" # read AI peer's card
honcho_reasoning query="What does this user care about most?"
honcho_reasoning query="What are my interaction patterns?" peer="ai" reasoning_level="medium"
honcho_conclude conclusion="Prefers terse answers"
honcho_conclude conclusion="I tend to over-explain code" peer="ai"
honcho_conclude delete_id="abc123" # PII removal
```
## Agent Usage Patterns
Guidelines for Hermes when Honcho memory is active.
### On conversation start
```
1. honcho_profile → fast warmup, no LLM cost
2. If context looks thin → honcho_context (full snapshot, still no LLM)
3. If deep synthesis needed → honcho_reasoning (LLM call, use sparingly)
```
Do NOT call `honcho_reasoning` on every turn. Auto-injection already handles ongoing context refresh. Use the reasoning tool only when you genuinely need synthesized insight the base context doesn't provide.
### When the user shares something to remember
```
honcho_conclude conclusion="<specific, actionable fact>"
```
Good conclusions: "Prefers code examples over prose explanations", "Working on a Rust async project through April 2026"
Bad conclusions: "User said something about Rust" (too vague), "User seems technical" (already in representation)
### When the user asks about past context / you need to recall specifics
```
honcho_search query="<topic>" → fast, no LLM, good for specific facts
honcho_context → full snapshot with summary + messages
honcho_reasoning query="<question>" → synthesized answer, use when search isn't enough
```
### When to use `peer: "ai"`
Use AI peer targeting to build and query the agent's own self-knowledge:
- `honcho_conclude conclusion="I tend to be verbose when explaining architecture" peer="ai"` — self-correction
- `honcho_reasoning query="How do I typically handle ambiguous requests?" peer="ai"` — self-audit
- `honcho_profile peer="ai"` — review own identity card
### When NOT to call tools
In `hybrid` and `context` modes, base context (user representation + card + session summary) is auto-injected before every turn. Do not re-fetch what was already injected. Call tools only when:
- You need something the injected context doesn't have
- The user explicitly asks you to recall or check memory
- You're writing a conclusion about something new
### Cadence awareness
`honcho_reasoning` on the tool side shares the same cost as auto-injection dialectic. After an explicit tool call, the auto-injection cadence resets — avoiding double-charging the same turn.
## Config Reference
@@ -191,18 +349,39 @@ Config file: `$HERMES_HOME/honcho.json` (profile-local) or `~/.honcho/config.jso
| `observation` | all on | Per-peer `observeMe`/`observeOthers` booleans |
| `writeFrequency` | `async` | `async`, `turn`, `session`, or integer N |
| `sessionStrategy` | `per-directory` | `per-directory`, `per-repo`, `per-session`, `global` |
| `dialecticReasoningLevel` | `low` | `minimal`, `low`, `medium`, `high`, `max` |
| `dialecticDynamic` | `true` | Auto-bump reasoning by query length. `false` = fixed level |
| `messageMaxChars` | `25000` | Max chars per message (chunked if exceeded) |
| `dialecticMaxInputChars` | `10000` | Max chars for dialectic query input |
### Cost-awareness (advanced, root config only)
### Dialectic settings
| Key | Default | Description |
|-----|---------|-------------|
| `dialecticReasoningLevel` | `low` | `minimal`, `low`, `medium`, `high`, `max` |
| `dialecticDynamic` | `true` | Auto-bump reasoning by query complexity. `false` = fixed level |
| `dialecticDepth` | `1` | Number of dialectic rounds per query (1-3) |
| `dialecticDepthLevels` | -- | Optional array of per-round levels, e.g. `["low", "high"]` |
| `dialecticMaxInputChars` | `10000` | Max chars for dialectic query input |
### Context budget and injection
| Key | Default | Description |
|-----|---------|-------------|
| `contextTokens` | uncapped | Max tokens for the combined base context injection (summary + representation + card). Opt-in cap — omit to leave uncapped, set to an integer to bound injection size. |
| `injectionFrequency` | `every-turn` | `every-turn` or `first-turn` |
| `contextCadence` | `1` | Min turns between context API calls |
| `dialecticCadence` | `1` | Min turns between dialectic API calls |
| `dialecticCadence` | `3` | Min turns between dialectic LLM calls |
The `contextTokens` budget is enforced at injection time. If the session summary + representation + card exceed the budget, Honcho trims the summary first, then the representation, preserving the card. This prevents context blowup in long sessions.
### Memory-context sanitization
Honcho sanitizes the `memory-context` block before injection to prevent prompt injection and malformed content:
- Strips XML/HTML tags from user-authored conclusions
- Normalizes whitespace and control characters
- Truncates individual conclusions that exceed `messageMaxChars`
- Escapes delimiter sequences that could break the system prompt structure
This fix addresses edge cases where raw user conclusions containing markup or special characters could corrupt the injected context block.
## Troubleshooting
@@ -221,6 +400,12 @@ Observation config is synced from the server on each session init. Start a new s
### Messages truncated
Messages over `messageMaxChars` (default 25k) are automatically chunked with `[continued]` markers. If you're hitting this often, check if tool results or skill content is inflating message size.
### Context injection too large
If you see warnings about context budget exceeded, lower `contextTokens` or reduce `dialecticDepth`. The session summary is trimmed first when the budget is tight.
### Session summary missing
Session summary requires at least one prior turn in the current Honcho session. On cold start (new session, no history), the summary is omitted and Honcho uses the cold-start prompt strategy instead.
## CLI Commands
| Command | Description |
+116 -27
View File
@@ -1,18 +1,22 @@
"""Memory provider plugin discovery.
Scans ``plugins/memory/<name>/`` directories for memory provider plugins.
Each subdirectory must contain ``__init__.py`` with a class implementing
the MemoryProvider ABC.
Scans two directories for memory provider plugins:
Memory providers are separate from the general plugin system they live
in the repo and are always available without user installation. Only ONE
can be active at a time, selected via ``memory.provider`` in config.yaml.
1. Bundled providers: ``plugins/memory/<name>/`` (shipped with hermes-agent)
2. User-installed providers: ``$HERMES_HOME/plugins/<name>/``
Each subdirectory must contain ``__init__.py`` with a class implementing
the MemoryProvider ABC. On name collisions, bundled providers take
precedence.
Only ONE provider can be active at a time, selected via
``memory.provider`` in config.yaml.
Usage:
from plugins.memory import discover_memory_providers, load_memory_provider
available = discover_memory_providers() # [(name, desc, available), ...]
provider = load_memory_provider("openviking") # MemoryProvider instance
provider = load_memory_provider("mnemosyne") # MemoryProvider instance
"""
from __future__ import annotations
@@ -29,24 +33,101 @@ logger = logging.getLogger(__name__)
_MEMORY_PLUGINS_DIR = Path(__file__).parent
# ---------------------------------------------------------------------------
# Directory helpers
# ---------------------------------------------------------------------------
def _get_user_plugins_dir() -> Optional[Path]:
"""Return ``$HERMES_HOME/plugins/`` or None if unavailable."""
try:
from hermes_constants import get_hermes_home
d = get_hermes_home() / "plugins"
return d if d.is_dir() else None
except Exception:
return None
def _is_memory_provider_dir(path: Path) -> bool:
"""Heuristic: does *path* look like a memory provider plugin?
Checks for ``register_memory_provider`` or ``MemoryProvider`` in the
``__init__.py`` source. Cheap text scan no import needed.
"""
init_file = path / "__init__.py"
if not init_file.exists():
return False
try:
source = init_file.read_text(errors="replace")[:8192]
return "register_memory_provider" in source or "MemoryProvider" in source
except Exception:
return False
def _iter_provider_dirs() -> List[Tuple[str, Path]]:
"""Yield ``(name, path)`` for all discovered provider directories.
Scans bundled first, then user-installed. Bundled takes precedence
on name collisions (first-seen wins via ``seen`` set).
"""
seen: set = set()
dirs: List[Tuple[str, Path]] = []
# 1. Bundled providers (plugins/memory/<name>/)
if _MEMORY_PLUGINS_DIR.is_dir():
for child in sorted(_MEMORY_PLUGINS_DIR.iterdir()):
if not child.is_dir() or child.name.startswith(("_", ".")):
continue
if not (child / "__init__.py").exists():
continue
seen.add(child.name)
dirs.append((child.name, child))
# 2. User-installed providers ($HERMES_HOME/plugins/<name>/)
user_dir = _get_user_plugins_dir()
if user_dir:
for child in sorted(user_dir.iterdir()):
if not child.is_dir() or child.name.startswith(("_", ".")):
continue
if child.name in seen:
continue # bundled takes precedence
if not _is_memory_provider_dir(child):
continue # skip non-memory plugins
dirs.append((child.name, child))
return dirs
def find_provider_dir(name: str) -> Optional[Path]:
"""Resolve a provider name to its directory.
Checks bundled first, then user-installed.
"""
# Bundled
bundled = _MEMORY_PLUGINS_DIR / name
if bundled.is_dir() and (bundled / "__init__.py").exists():
return bundled
# User-installed
user_dir = _get_user_plugins_dir()
if user_dir:
user = user_dir / name
if user.is_dir() and _is_memory_provider_dir(user):
return user
return None
# ---------------------------------------------------------------------------
# Public API
# ---------------------------------------------------------------------------
def discover_memory_providers() -> List[Tuple[str, str, bool]]:
"""Scan plugins/memory/ for available providers.
"""Scan bundled and user-installed directories for available providers.
Returns list of (name, description, is_available) tuples.
Does NOT import the providers just reads plugin.yaml for metadata
and does a lightweight availability check.
Bundled providers take precedence on name collisions.
"""
results = []
if not _MEMORY_PLUGINS_DIR.is_dir():
return results
for child in sorted(_MEMORY_PLUGINS_DIR.iterdir()):
if not child.is_dir() or child.name.startswith(("_", ".")):
continue
init_file = child / "__init__.py"
if not init_file.exists():
continue
for name, child in _iter_provider_dirs():
# Read description from plugin.yaml if available
desc = ""
yaml_file = child / "plugin.yaml"
@@ -70,7 +151,7 @@ def discover_memory_providers() -> List[Tuple[str, str, bool]]:
except Exception:
available = False
results.append((child.name, desc, available))
results.append((name, desc, available))
return results
@@ -78,11 +159,15 @@ def discover_memory_providers() -> List[Tuple[str, str, bool]]:
def load_memory_provider(name: str) -> Optional["MemoryProvider"]:
"""Load and return a MemoryProvider instance by name.
Checks both bundled (``plugins/memory/<name>/``) and user-installed
(``$HERMES_HOME/plugins/<name>/``) directories. Bundled takes
precedence on name collisions.
Returns None if the provider is not found or fails to load.
"""
provider_dir = _MEMORY_PLUGINS_DIR / name
if not provider_dir.is_dir():
logger.debug("Memory provider '%s' not found in %s", name, _MEMORY_PLUGINS_DIR)
provider_dir = find_provider_dir(name)
if not provider_dir:
logger.debug("Memory provider '%s' not found in bundled or user plugins", name)
return None
try:
@@ -104,7 +189,10 @@ def _load_provider_from_dir(provider_dir: Path) -> Optional["MemoryProvider"]:
- A top-level class that extends MemoryProvider we instantiate it
"""
name = provider_dir.name
module_name = f"plugins.memory.{name}"
# Use a separate namespace for user-installed plugins so they don't
# collide with bundled providers in sys.modules.
_is_bundled = _MEMORY_PLUGINS_DIR in provider_dir.parents or provider_dir.parent == _MEMORY_PLUGINS_DIR
module_name = f"plugins.memory.{name}" if _is_bundled else f"_hermes_user_memory.{name}"
init_file = provider_dir / "__init__.py"
if not init_file.exists():
@@ -257,15 +345,16 @@ def discover_plugin_cli_commands() -> List[dict]:
return results
# Only look at the active provider's directory
plugin_dir = _MEMORY_PLUGINS_DIR / active_provider
if not plugin_dir.is_dir():
plugin_dir = find_provider_dir(active_provider)
if not plugin_dir:
return results
cli_file = plugin_dir / "cli.py"
if not cli_file.exists():
return results
module_name = f"plugins.memory.{active_provider}.cli"
_is_bundled = _MEMORY_PLUGINS_DIR in plugin_dir.parents or plugin_dir.parent == _MEMORY_PLUGINS_DIR
module_name = f"plugins.memory.{active_provider}.cli" if _is_bundled else f"_hermes_user_memory.{active_provider}.cli"
try:
# Import the CLI module (lightweight — no SDK needed)
if module_name in sys.modules:
+207 -99
View File
@@ -1,6 +1,6 @@
# Honcho Memory Provider
AI-native cross-session user modeling with dialectic Q&A, semantic search, peer cards, and persistent conclusions.
AI-native cross-session user modeling with multi-pass dialectic reasoning, session summaries, bidirectional peer tools, and persistent conclusions.
> **Honcho docs:** <https://docs.honcho.dev/v3/guides/integrations/hermes>
@@ -19,9 +19,86 @@ hermes memory setup # generic picker, also works
Or manually:
```bash
hermes config set memory.provider honcho
echo "HONCHO_API_KEY=your-key" >> ~/.hermes/.env
echo "HONCHO_API_KEY=***" >> ~/.hermes/.env
```
## Architecture Overview
### Two-Layer Context Injection
Context is injected into the **user message** at API-call time (not the system prompt) to preserve prompt caching. Only a static mode header goes in the system prompt. The injected block is wrapped in `<memory-context>` fences with a system note clarifying it's background data, not new user input.
Two independent layers, each on its own cadence:
**Layer 1 — Base context** (refreshed every `contextCadence` turns):
1. **SESSION SUMMARY** — from `session.context(summary=True)`, placed first
2. **User Representation** — Honcho's evolving model of the user
3. **User Peer Card** — key facts snapshot
4. **AI Self-Representation** — Honcho's model of the AI peer
5. **AI Identity Card** — AI peer facts
**Layer 2 — Dialectic supplement** (fired every `dialecticCadence` turns):
Multi-pass `.chat()` reasoning about the user, appended after base context.
Both layers are joined, then truncated to fit `contextTokens` budget via `_truncate_to_budget` (tokens × 4 chars, word-boundary safe).
### Cold Start vs Warm Session Prompts
Dialectic pass 0 automatically selects its prompt based on session state:
- **Cold** (no base context cached): "Who is this person? What are their preferences, goals, and working style? Focus on facts that would help an AI assistant be immediately useful."
- **Warm** (base context exists): "Given what's been discussed in this session so far, what context about this user is most relevant to the current conversation? Prioritize active context over biographical facts."
Not configurable — determined automatically.
### Dialectic Depth (Multi-Pass Reasoning)
`dialecticDepth` (13, clamped) controls how many `.chat()` calls fire per dialectic cycle:
| Depth | Passes | Behavior |
|-------|--------|----------|
| 1 | single `.chat()` | Base query only (cold or warm prompt) |
| 2 | audit + synthesis | Pass 0 result is self-audited; pass 1 does targeted synthesis. Conditional bail-out if pass 0 returns strong signal (>300 chars or structured with bullets/sections >100 chars) |
| 3 | audit + synthesis + reconciliation | Pass 2 reconciles contradictions across prior passes into a final synthesis |
### Proportional Reasoning Levels
When `dialecticDepthLevels` is not set, each pass uses a proportional level relative to `dialecticReasoningLevel` (the "base"):
| Depth | Pass levels |
|-------|-------------|
| 1 | [base] |
| 2 | [minimal, base] |
| 3 | [minimal, base, low] |
Override with `dialecticDepthLevels`: an explicit array of reasoning level strings per pass.
### Three Orthogonal Dialectic Knobs
| Knob | Controls | Type |
|------|----------|------|
| `dialecticCadence` | How often — minimum turns between dialectic firings | int |
| `dialecticDepth` | How many — passes per firing (13) | int |
| `dialecticReasoningLevel` | How hard — reasoning ceiling per `.chat()` call | string |
### Input Sanitization
`run_conversation` strips leaked `<memory-context>` blocks from user input before processing. When `saveMessages` persists a turn that included injected context, the block can reappear in subsequent turns via message history. The sanitizer removes `<memory-context>` blocks plus associated system notes.
## Tools
Five bidirectional tools. All accept an optional `peer` parameter (`"user"` or `"ai"`, default `"user"`).
| Tool | LLM call? | Description |
|------|-----------|-------------|
| `honcho_profile` | No | Peer card — key facts snapshot |
| `honcho_search` | No | Semantic search over stored context (800 tok default, 2000 max) |
| `honcho_context` | No | Full session context: summary, representation, card, messages |
| `honcho_reasoning` | Yes | LLM-synthesized answer via dialectic `.chat()` |
| `honcho_conclude` | No | Write a persistent fact/conclusion about the user |
Tool visibility depends on `recallMode`: hidden in `context` mode, always present in `tools` and `hybrid`.
## Config Resolution
Config is read from the first file that exists:
@@ -34,42 +111,128 @@ Config is read from the first file that exists:
Host key is derived from the active Hermes profile: `hermes` (default) or `hermes.<profile>`.
## Tools
| Tool | LLM call? | Description |
|------|-----------|-------------|
| `honcho_profile` | No | User's peer card -- key facts snapshot |
| `honcho_search` | No | Semantic search over stored context (800 tok default, 2000 max) |
| `honcho_context` | Yes | LLM-synthesized answer via dialectic reasoning |
| `honcho_conclude` | No | Write a persistent fact about the user |
Tool availability depends on `recallMode`: hidden in `context` mode, always present in `tools` and `hybrid`.
For every key, resolution order is: **host block > root > env var > default**.
## Full Configuration Reference
### Identity & Connection
| Key | Type | Default | Scope | Description |
|-----|------|---------|-------|-------------|
| `apiKey` | string | -- | root / host | API key. Falls back to `HONCHO_API_KEY` env var |
| `baseUrl` | string | -- | root | Base URL for self-hosted Honcho. Local URLs (`localhost`, `127.0.0.1`, `::1`) auto-skip API key auth |
| `environment` | string | `"production"` | root / host | SDK environment mapping |
| `enabled` | bool | auto | root / host | Master toggle. Auto-enables when `apiKey` or `baseUrl` present |
| `workspace` | string | host key | root / host | Honcho workspace ID |
| `peerName` | string | -- | root / host | User peer identity |
| `aiPeer` | string | host key | root / host | AI peer identity |
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `apiKey` | string | | API key. Falls back to `HONCHO_API_KEY` env var |
| `baseUrl` | string | | Base URL for self-hosted Honcho. Local URLs auto-skip API key auth |
| `environment` | string | `"production"` | SDK environment mapping |
| `enabled` | bool | auto | Master toggle. Auto-enables when `apiKey` or `baseUrl` present |
| `workspace` | string | host key | Honcho workspace ID. Shared environment — all profiles in the same workspace can see the same user identity and related memories |
| `peerName` | string | | User peer identity |
| `aiPeer` | string | host key | AI peer identity |
### Memory & Recall
| Key | Type | Default | Scope | Description |
|-----|------|---------|-------|-------------|
| `recallMode` | string | `"hybrid"` | root / host | `"hybrid"` (auto-inject + tools), `"context"` (auto-inject only, tools hidden), `"tools"` (tools only, no injection). Legacy `"auto"` normalizes to `"hybrid"` |
| `observationMode` | string | `"directional"` | root / host | Shorthand preset: `"directional"` (all on) or `"unified"` (shared pool). Use `observation` object for granular control |
| `observation` | object | -- | root / host | Per-peer observation config (see below) |
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `recallMode` | string | `"hybrid"` | `"hybrid"` (auto-inject + tools), `"context"` (auto-inject only, tools hidden), `"tools"` (tools only, no injection). Legacy `"auto"` `"hybrid"` |
| `observationMode` | string | `"directional"` | Preset: `"directional"` (all on) or `"unified"` (shared pool). Use `observation` object for granular control |
| `observation` | object | | Per-peer observation config (see Observation section) |
#### Observation (granular)
### Write Behavior
Maps 1:1 to Honcho's per-peer `SessionPeerConfig`. Set at root or per host block -- each profile can have different observation settings. When present, overrides `observationMode` preset.
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `writeFrequency` | string/int | `"async"` | `"async"` (background), `"turn"` (sync per turn), `"session"` (batch on end), or integer N (every N turns) |
| `saveMessages` | bool | `true` | Persist messages to Honcho API |
### Session Resolution
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `sessionStrategy` | string | `"per-directory"` | `"per-directory"`, `"per-session"`, `"per-repo"` (git root), `"global"` |
| `sessionPeerPrefix` | bool | `false` | Prepend peer name to session keys |
| `sessions` | object | `{}` | Manual directory-to-session-name mappings |
#### Session Name Resolution
The Honcho session name determines which conversation bucket memory lands in. Resolution follows a priority chain — first match wins:
| Priority | Source | Example session name |
|----------|--------|---------------------|
| 1 | Manual map (`sessions` config) | `"myproject-main"` |
| 2 | `/title` command (mid-session rename) | `"refactor-auth"` |
| 3 | Gateway session key (Telegram, Discord, etc.) | `"agent-main-telegram-dm-8439114563"` |
| 4 | `per-session` strategy | Hermes session ID (`20260415_a3f2b1`) |
| 5 | `per-repo` strategy | Git root directory name (`hermes-agent`) |
| 6 | `per-directory` strategy | Current directory basename (`src`) |
| 7 | `global` strategy | Workspace name (`hermes`) |
Gateway platforms always resolve via priority 3 (per-chat isolation) regardless of `sessionStrategy`. The strategy setting only affects CLI sessions.
If `sessionPeerPrefix` is `true`, the peer name is prepended: `eri-hermes-agent`.
#### What each strategy produces
- **`per-directory`** — basename of `$PWD`. Opening hermes in `~/code/myapp` and `~/code/other` gives two separate sessions. Same directory = same session across runs.
- **`per-repo`** — git root directory name. All subdirectories within a repo share one session. Falls back to `per-directory` if not inside a git repo.
- **`per-session`** — Hermes session ID (timestamp + hex). Every `hermes` invocation starts a fresh Honcho session. Falls back to `per-directory` if no session ID is available.
- **`global`** — workspace name. One session for everything. Memory accumulates across all directories and runs.
### Multi-Profile Pattern
Multiple Hermes profiles can share one workspace while maintaining separate AI identities. Config resolution is **host block > root > env var > default** — host blocks inherit from root, so shared settings only need to be declared once:
```json
{
"apiKey": "***",
"workspace": "hermes",
"peerName": "yourname",
"hosts": {
"hermes": {
"aiPeer": "hermes",
"recallMode": "hybrid",
"sessionStrategy": "per-directory"
},
"hermes.coder": {
"aiPeer": "coder",
"recallMode": "tools",
"sessionStrategy": "per-repo"
}
}
}
```
Both profiles see the same user (`yourname`) in the same shared environment (`hermes`), but each AI peer builds its own observations, conclusions, and behavior patterns. The coder's memory stays code-oriented; the main agent's stays broad.
Host key is derived from the active Hermes profile: `hermes` (default) or `hermes.<profile>` (e.g. `hermes -p coder` → host key `hermes.coder`).
### Dialectic & Reasoning
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `dialecticDepth` | int | `1` | Passes per dialectic cycle (13, clamped). 1=single query, 2=audit+synthesis, 3=audit+synthesis+reconciliation |
| `dialecticDepthLevels` | array | — | Optional array of reasoning level strings per pass. Overrides proportional defaults. Example: `["minimal", "low", "medium"]` |
| `dialecticReasoningLevel` | string | `"low"` | Base reasoning level for `.chat()`: `"minimal"`, `"low"`, `"medium"`, `"high"`, `"max"` |
| `dialecticDynamic` | bool | `true` | When `true`, model can override reasoning level per-call via `honcho_reasoning` tool. When `false`, always uses `dialecticReasoningLevel` |
| `dialecticMaxChars` | int | `600` | Max chars of dialectic result injected into system prompt |
| `dialecticMaxInputChars` | int | `10000` | Max chars for dialectic query input to `.chat()`. Honcho cloud limit: 10k |
### Token Budgets
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `contextTokens` | int | SDK default | Token budget for `context()` API calls. Also gates prefetch truncation (tokens × 4 chars) |
| `messageMaxChars` | int | `25000` | Max chars per message sent via `add_messages()`. Exceeding this triggers chunking with `[continued]` markers. Honcho cloud limit: 25k |
### Cadence (Cost Control)
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `contextCadence` | int | `1` | Minimum turns between base context refreshes (session summary + representation + card) |
| `dialecticCadence` | int | `1` | Minimum turns between dialectic `.chat()` firings |
| `injectionFrequency` | string | `"every-turn"` | `"every-turn"` or `"first-turn"` (inject context on the first user message only, skip from turn 2 onward) |
| `reasoningLevelCap` | string | — | Hard cap on reasoning level: `"minimal"`, `"low"`, `"medium"`, `"high"` |
### Observation (Granular)
Maps 1:1 to Honcho's per-peer `SessionPeerConfig`. When present, overrides `observationMode` preset.
```json
"observation": {
@@ -85,74 +248,16 @@ Maps 1:1 to Honcho's per-peer `SessionPeerConfig`. Set at root or per host block
| `ai.observeMe` | `true` | AI peer self-observation (Honcho builds AI representation) |
| `ai.observeOthers` | `true` | AI peer observes user messages (enables cross-peer dialectic) |
Presets for `observationMode`:
- `"directional"` (default): all four booleans `true`
Presets:
- `"directional"` (default): all four `true`
- `"unified"`: user `observeMe=true`, AI `observeOthers=true`, rest `false`
Per-profile example -- coder profile observes the user but user doesn't observe coder:
### Hardcoded Limits
```json
"hosts": {
"hermes.coder": {
"observation": {
"user": { "observeMe": true, "observeOthers": false },
"ai": { "observeMe": true, "observeOthers": true }
}
}
}
```
Settings changed in the [Honcho dashboard](https://app.honcho.dev) are synced back on session init.
### Write Behavior
| Key | Type | Default | Scope | Description |
|-----|------|---------|-------|-------------|
| `writeFrequency` | string or int | `"async"` | root / host | `"async"` (background thread), `"turn"` (sync per turn), `"session"` (batch on end), or integer N (every N turns) |
| `saveMessages` | bool | `true` | root / host | Whether to persist messages to Honcho API |
### Session Resolution
| Key | Type | Default | Scope | Description |
|-----|------|---------|-------|-------------|
| `sessionStrategy` | string | `"per-directory"` | root / host | `"per-directory"`, `"per-session"` (new each run), `"per-repo"` (git root name), `"global"` (single session) |
| `sessionPeerPrefix` | bool | `false` | root / host | Prepend peer name to session keys |
| `sessions` | object | `{}` | root | Manual directory-to-session-name mappings: `{"/path/to/project": "my-session"}` |
### Token Budgets & Dialectic
| Key | Type | Default | Scope | Description |
|-----|------|---------|-------|-------------|
| `contextTokens` | int | SDK default | root / host | Token budget for `context()` API calls. Also gates prefetch truncation (tokens x 4 chars) |
| `dialecticReasoningLevel` | string | `"low"` | root / host | Base reasoning level for `peer.chat()`: `"minimal"`, `"low"`, `"medium"`, `"high"`, `"max"` |
| `dialecticDynamic` | bool | `true` | root / host | Auto-bump reasoning based on query length: `<120` chars = base level, `120-400` = +1, `>400` = +2 (capped at `"high"`). Set `false` to always use `dialecticReasoningLevel` as-is |
| `dialecticMaxChars` | int | `600` | root / host | Max chars of dialectic result injected into system prompt |
| `dialecticMaxInputChars` | int | `10000` | root / host | Max chars for dialectic query input to `peer.chat()`. Honcho cloud limit: 10k |
| `messageMaxChars` | int | `25000` | root / host | Max chars per message sent via `add_messages()`. Messages exceeding this are chunked with `[continued]` markers. Honcho cloud limit: 25k |
### Cost Awareness (Advanced)
These are read from the root config object, not the host block. Must be set manually in `honcho.json`.
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `injectionFrequency` | string | `"every-turn"` | `"every-turn"` or `"first-turn"` (inject context only on turn 0) |
| `contextCadence` | int | `1` | Minimum turns between `context()` API calls |
| `dialecticCadence` | int | `1` | Minimum turns between `peer.chat()` API calls |
| `reasoningLevelCap` | string | -- | Hard cap on auto-bumped reasoning: `"minimal"`, `"low"`, `"mid"`, `"high"` |
### Hardcoded Limits (Not Configurable)
| Limit | Value | Location |
|-------|-------|----------|
| Search tool max tokens | 2000 (hard cap), 800 (default) | `__init__.py` handle_tool_call |
| Peer card fetch tokens | 200 | `session.py` get_peer_card |
## Config Precedence
For every key, resolution order is: **host block > root > env var > default**.
Host key derivation: `HERMES_HONCHO_HOST` env > active profile (`hermes.<profile>`) > `"hermes"`.
| Limit | Value |
|-------|-------|
| Search tool max tokens | 2000 (hard cap), 800 (default) |
| Peer card fetch tokens | 200 |
## Environment Variables
@@ -182,15 +287,16 @@ Host key derivation: `HERMES_HONCHO_HOST` env > active profile (`hermes.<profile
```json
{
"apiKey": "your-key",
"apiKey": "***",
"workspace": "hermes",
"peerName": "eri",
"peerName": "username",
"contextCadence": 2,
"dialecticCadence": 3,
"dialecticDepth": 2,
"hosts": {
"hermes": {
"enabled": true,
"aiPeer": "hermes",
"workspace": "hermes",
"peerName": "eri",
"recallMode": "hybrid",
"observation": {
"user": { "observeMe": true, "observeOthers": true },
@@ -199,14 +305,16 @@ Host key derivation: `HERMES_HONCHO_HOST` env > active profile (`hermes.<profile
"writeFrequency": "async",
"sessionStrategy": "per-directory",
"dialecticReasoningLevel": "low",
"dialecticDepth": 2,
"dialecticMaxChars": 600,
"saveMessages": true
},
"hermes.coder": {
"enabled": true,
"aiPeer": "coder",
"workspace": "hermes",
"peerName": "eri",
"sessionStrategy": "per-repo",
"dialecticDepth": 1,
"dialecticDepthLevels": ["low"],
"observation": {
"user": { "observeMe": true, "observeOthers": false },
"ai": { "observeMe": true, "observeOthers": true }
+409 -78
View File
@@ -17,6 +17,7 @@ from __future__ import annotations
import json
import logging
import re
import threading
from typing import Any, Dict, List, Optional
@@ -33,20 +34,33 @@ logger = logging.getLogger(__name__)
PROFILE_SCHEMA = {
"name": "honcho_profile",
"description": (
"Retrieve the user's peer card from Honcho — a curated list of key facts "
"about them (name, role, preferences, communication style, patterns). "
"Fast, no LLM reasoning, minimal cost. "
"Use this at conversation start or when you need a quick factual snapshot."
"Retrieve or update a peer card from Honcho — a curated list of key facts "
"about that peer (name, role, preferences, communication style, patterns). "
"Pass `card` to update; omit `card` to read."
),
"parameters": {"type": "object", "properties": {}, "required": []},
"parameters": {
"type": "object",
"properties": {
"peer": {
"type": "string",
"description": "Peer to query. Built-in aliases: 'user' (default), 'ai'. Or pass any peer ID from this workspace.",
},
"card": {
"type": "array",
"items": {"type": "string"},
"description": "New peer card as a list of fact strings. Omit to read the current card.",
},
},
"required": [],
},
}
SEARCH_SCHEMA = {
"name": "honcho_search",
"description": (
"Semantic search over Honcho's stored context about the user. "
"Semantic search over Honcho's stored context about a peer. "
"Returns raw excerpts ranked by relevance — no LLM synthesis. "
"Cheaper and faster than honcho_context. "
"Cheaper and faster than honcho_reasoning. "
"Good when you want to find specific past facts and reason over them yourself."
),
"parameters": {
@@ -60,17 +74,23 @@ SEARCH_SCHEMA = {
"type": "integer",
"description": "Token budget for returned context (default 800, max 2000).",
},
"peer": {
"type": "string",
"description": "Peer to query. Built-in aliases: 'user' (default), 'ai'. Or pass any peer ID from this workspace.",
},
},
"required": ["query"],
},
}
CONTEXT_SCHEMA = {
"name": "honcho_context",
REASONING_SCHEMA = {
"name": "honcho_reasoning",
"description": (
"Ask Honcho a natural language question and get a synthesized answer. "
"Uses Honcho's LLM (dialectic reasoning) — higher cost than honcho_profile or honcho_search. "
"Can query about any peer: the user (default) or the AI assistant."
"Can query about any peer via alias or explicit peer ID. "
"Pass reasoning_level to control depth: minimal (fast/cheap), low (default), "
"medium, high, max (deep/expensive). Omit for configured default."
),
"parameters": {
"type": "object",
@@ -79,37 +99,87 @@ CONTEXT_SCHEMA = {
"type": "string",
"description": "A natural language question.",
},
"reasoning_level": {
"type": "string",
"description": (
"Override the default reasoning depth. "
"Omit to use the configured default (typically low). "
"Guide:\n"
"- minimal: quick factual lookups (name, role, simple preference)\n"
"- low: straightforward questions with clear answers\n"
"- medium: multi-aspect questions requiring synthesis across observations\n"
"- high: complex behavioral patterns, contradictions, deep analysis\n"
"- max: thorough audit-level analysis, leave no stone unturned"
),
"enum": ["minimal", "low", "medium", "high", "max"],
},
"peer": {
"type": "string",
"description": "Which peer to query about: 'user' (default) or 'ai'.",
"description": "Peer to query. Built-in aliases: 'user' (default), 'ai'. Or pass any peer ID from this workspace.",
},
},
"required": ["query"],
},
}
CONTEXT_SCHEMA = {
"name": "honcho_context",
"description": (
"Retrieve full session context from Honcho — summary, peer representation, "
"peer card, and recent messages. No LLM synthesis. "
"Cheaper than honcho_reasoning. Use this to see what Honcho knows about "
"the current conversation and the specified peer."
),
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Optional focus query to filter context. Omit for full session context snapshot.",
},
"peer": {
"type": "string",
"description": "Peer to query. Built-in aliases: 'user' (default), 'ai'. Or pass any peer ID from this workspace.",
},
},
"required": [],
},
}
CONCLUDE_SCHEMA = {
"name": "honcho_conclude",
"description": (
"Write a conclusion about the user back to Honcho's memory. "
"Conclusions are persistent facts that build the user's profile. "
"Use when the user states a preference, corrects you, or shares "
"something to remember across sessions."
"Write or delete a conclusion about a peer in Honcho's memory. "
"Conclusions are persistent facts that build a peer's profile. "
"You MUST pass exactly one of: `conclusion` (to create) or `delete_id` (to delete). "
"Passing neither is an error. "
"Deletion is only for PII removal — Honcho self-heals incorrect conclusions over time."
),
"parameters": {
"type": "object",
"properties": {
"conclusion": {
"type": "string",
"description": "A factual statement about the user to persist.",
}
"description": "A factual statement to persist. Required when not using delete_id.",
},
"delete_id": {
"type": "string",
"description": "Conclusion ID to delete (for PII removal). Required when not using conclusion.",
},
"peer": {
"type": "string",
"description": "Peer to query. Built-in aliases: 'user' (default), 'ai'. Or pass any peer ID from this workspace.",
},
},
"required": ["conclusion"],
"anyOf": [
{"required": ["conclusion"]},
{"required": ["delete_id"]},
],
},
}
ALL_TOOL_SCHEMAS = [PROFILE_SCHEMA, SEARCH_SCHEMA, CONTEXT_SCHEMA, CONCLUDE_SCHEMA]
ALL_TOOL_SCHEMAS = [PROFILE_SCHEMA, SEARCH_SCHEMA, REASONING_SCHEMA, CONTEXT_SCHEMA, CONCLUDE_SCHEMA]
# ---------------------------------------------------------------------------
@@ -131,16 +201,18 @@ class HonchoMemoryProvider(MemoryProvider):
# B1: recall_mode — set during initialize from config
self._recall_mode = "hybrid" # "context", "tools", or "hybrid"
# B4: First-turn context baking
self._first_turn_context: Optional[str] = None
self._first_turn_lock = threading.Lock()
# Base context cache — refreshed on context_cadence, not frozen
self._base_context_cache: Optional[str] = None
self._base_context_lock = threading.Lock()
# B5: Cost-awareness turn counting and cadence
self._turn_count = 0
self._injection_frequency = "every-turn" # or "first-turn"
self._context_cadence = 1 # minimum turns between context API calls
self._dialectic_cadence = 1 # minimum turns between dialectic API calls
self._reasoning_level_cap: Optional[str] = None # "minimal", "low", "mid", "high"
self._dialectic_cadence = 3 # minimum turns between dialectic API calls
self._dialectic_depth = 1 # how many .chat() calls per dialectic cycle (1-3)
self._dialectic_depth_levels: list[str] | None = None # per-pass reasoning levels
self._reasoning_level_cap: Optional[str] = None # "minimal", "low", "medium", "high"
self._last_context_turn = -999
self._last_dialectic_turn = -999
@@ -236,9 +308,11 @@ class HonchoMemoryProvider(MemoryProvider):
raw = cfg.raw or {}
self._injection_frequency = raw.get("injectionFrequency", "every-turn")
self._context_cadence = int(raw.get("contextCadence", 1))
self._dialectic_cadence = int(raw.get("dialecticCadence", 1))
self._dialectic_cadence = int(raw.get("dialecticCadence", 3))
self._dialectic_depth = max(1, min(cfg.dialectic_depth, 3))
self._dialectic_depth_levels = cfg.dialectic_depth_levels
cap = raw.get("reasoningLevelCap")
if cap and cap in ("minimal", "low", "mid", "high"):
if cap and cap in ("minimal", "low", "medium", "high"):
self._reasoning_level_cap = cap
except Exception as e:
logger.debug("Honcho cost-awareness config parse error: %s", e)
@@ -251,9 +325,7 @@ class HonchoMemoryProvider(MemoryProvider):
# ----- Port #1957: lazy session init for tools-only mode -----
if self._recall_mode == "tools":
if cfg.init_on_session_start:
# Eager init: create session now so sync_turn() works from turn 1.
# Does NOT enable auto-injection — prefetch() still returns empty.
logger.debug("Honcho tools-only mode — eager session init (initOnSessionStart=true)")
# Eager init even in tools mode (opt-in)
self._do_session_init(cfg, session_id, **kwargs)
return
# Defer actual session creation until first tool call
@@ -287,8 +359,13 @@ class HonchoMemoryProvider(MemoryProvider):
# ----- B3: resolve_session_name -----
session_title = kwargs.get("session_title")
gateway_session_key = kwargs.get("gateway_session_key")
self._session_key = (
cfg.resolve_session_name(session_title=session_title, session_id=session_id)
cfg.resolve_session_name(
session_title=session_title,
session_id=session_id,
gateway_session_key=gateway_session_key,
)
or session_id
or "hermes-default"
)
@@ -299,12 +376,21 @@ class HonchoMemoryProvider(MemoryProvider):
self._session_initialized = True
# ----- B6: Memory file migration (one-time, for new sessions) -----
# Skip under per-session strategy: every Hermes run creates a fresh
# Honcho session by design, so uploading MEMORY.md/USER.md/SOUL.md to
# each one would flood the backend with short-lived duplicates instead
# of performing a one-time migration.
try:
if not session.messages:
if not session.messages and cfg.session_strategy != "per-session":
from hermes_constants import get_hermes_home
mem_dir = str(get_hermes_home() / "memories")
self._manager.migrate_memory_files(self._session_key, mem_dir)
logger.debug("Honcho memory file migration attempted for new session: %s", self._session_key)
elif cfg.session_strategy == "per-session":
logger.debug(
"Honcho memory file migration skipped: per-session strategy creates a fresh session per run (%s)",
self._session_key,
)
except Exception as e:
logger.debug("Honcho memory file migration skipped: %s", e)
@@ -347,6 +433,11 @@ class HonchoMemoryProvider(MemoryProvider):
"""Format the prefetch context dict into a readable system prompt block."""
parts = []
# Session summary — session-scoped context, placed first for relevance
summary = ctx.get("summary", "")
if summary:
parts.append(f"## Session Summary\n{summary}")
rep = ctx.get("representation", "")
if rep:
parts.append(f"## User Representation\n{rep}")
@@ -370,9 +461,9 @@ class HonchoMemoryProvider(MemoryProvider):
def system_prompt_block(self) -> str:
"""Return system prompt text, adapted by recall_mode.
B4: On the FIRST call, fetch and bake the full Honcho context
(user representation, peer card, AI representation, continuity synthesis).
Subsequent calls return the cached block for prompt caching stability.
Returns only the mode header and tool instructions static text
that doesn't change between turns (prompt-cache friendly).
Live context (representation, card) is injected via prefetch().
"""
if self._cron_skipped:
return ""
@@ -382,24 +473,10 @@ class HonchoMemoryProvider(MemoryProvider):
return (
"# Honcho Memory\n"
"Active (tools-only mode). Use honcho_profile, honcho_search, "
"honcho_context, and honcho_conclude tools to access user memory."
"honcho_reasoning, honcho_context, and honcho_conclude tools to access user memory."
)
return ""
# ----- B4: First-turn context baking -----
first_turn_block = ""
if self._recall_mode in ("context", "hybrid"):
with self._first_turn_lock:
if self._first_turn_context is None:
# First call — fetch and cache
try:
ctx = self._manager.get_prefetch_context(self._session_key)
self._first_turn_context = self._format_first_turn_context(ctx) if ctx else ""
except Exception as e:
logger.debug("Honcho first-turn context fetch failed: %s", e)
self._first_turn_context = ""
first_turn_block = self._first_turn_context
# ----- B1: adapt text based on recall_mode -----
if self._recall_mode == "context":
header = (
@@ -412,7 +489,8 @@ class HonchoMemoryProvider(MemoryProvider):
header = (
"# Honcho Memory\n"
"Active (tools-only mode). Use honcho_profile for a quick factual snapshot, "
"honcho_search for raw excerpts, honcho_context for synthesized answers, "
"honcho_search for raw excerpts, honcho_context for raw peer context, "
"honcho_reasoning for synthesized answers, "
"honcho_conclude to save facts about the user. "
"No automatic context injection — you must use tools to access memory."
)
@@ -421,16 +499,19 @@ class HonchoMemoryProvider(MemoryProvider):
"# Honcho Memory\n"
"Active (hybrid mode). Relevant context is auto-injected AND memory tools are available. "
"Use honcho_profile for a quick factual snapshot, "
"honcho_search for raw excerpts, honcho_context for synthesized answers, "
"honcho_search for raw excerpts, honcho_context for raw peer context, "
"honcho_reasoning for synthesized answers, "
"honcho_conclude to save facts about the user."
)
if first_turn_block:
return f"{header}\n\n{first_turn_block}"
return header
def prefetch(self, query: str, *, session_id: str = "") -> str:
"""Return prefetched dialectic context from background thread.
"""Return base context (representation + card) plus dialectic supplement.
Assembles two layers:
1. Base context from peer.context() cached, refreshed on context_cadence
2. Dialectic supplement cached, refreshed on dialectic_cadence
B1: Returns empty when recall_mode is "tools" (no injection).
B5: Respects injection_frequency "first-turn" returns cached/empty after turn 0.
@@ -443,22 +524,95 @@ class HonchoMemoryProvider(MemoryProvider):
if self._recall_mode == "tools":
return ""
# B5: injection_frequency — if "first-turn" and past first turn, return empty
if self._injection_frequency == "first-turn" and self._turn_count > 0:
# B5: injection_frequency — if "first-turn" and past first turn, return empty.
# _turn_count is 1-indexed (first user message = 1), so > 1 means "past first".
if self._injection_frequency == "first-turn" and self._turn_count > 1:
return ""
parts = []
# ----- Layer 1: Base context (representation + card) -----
# On first call, fetch synchronously so turn 1 isn't empty.
# After that, serve from cache and refresh in background on cadence.
with self._base_context_lock:
if self._base_context_cache is None:
# First call — synchronous fetch
try:
ctx = self._manager.get_prefetch_context(self._session_key)
self._base_context_cache = self._format_first_turn_context(ctx) if ctx else ""
self._last_context_turn = self._turn_count
except Exception as e:
logger.debug("Honcho base context fetch failed: %s", e)
self._base_context_cache = ""
base_context = self._base_context_cache
# Check if background context prefetch has a fresher result
if self._manager:
fresh_ctx = self._manager.pop_context_result(self._session_key)
if fresh_ctx:
formatted = self._format_first_turn_context(fresh_ctx)
if formatted:
with self._base_context_lock:
self._base_context_cache = formatted
base_context = formatted
if base_context:
parts.append(base_context)
# ----- Layer 2: Dialectic supplement -----
# On the very first turn, no queue_prefetch() has run yet so the
# dialectic result is empty. Run with a bounded timeout so a slow
# Honcho connection doesn't block the first response indefinitely.
# On timeout the result is skipped and queue_prefetch() will pick it
# up at the next cadence-allowed turn.
if self._last_dialectic_turn == -999 and query:
_first_turn_timeout = (
self._config.timeout if self._config and self._config.timeout else 8.0
)
_result_holder: list[str] = []
def _run_first_turn() -> None:
try:
_result_holder.append(self._run_dialectic_depth(query))
except Exception as exc:
logger.debug("Honcho first-turn dialectic failed: %s", exc)
_t = threading.Thread(target=_run_first_turn, daemon=True)
_t.start()
_t.join(timeout=_first_turn_timeout)
if not _t.is_alive():
first_turn_dialectic = _result_holder[0] if _result_holder else ""
if first_turn_dialectic and first_turn_dialectic.strip():
with self._prefetch_lock:
self._prefetch_result = first_turn_dialectic
self._last_dialectic_turn = self._turn_count
else:
logger.debug(
"Honcho first-turn dialectic timed out (%.1fs) — "
"will inject at next cadence-allowed turn",
_first_turn_timeout,
)
# Don't update _last_dialectic_turn: queue_prefetch() will
# retry at the next cadence-allowed turn via the async path.
if self._prefetch_thread and self._prefetch_thread.is_alive():
self._prefetch_thread.join(timeout=3.0)
with self._prefetch_lock:
result = self._prefetch_result
dialectic_result = self._prefetch_result
self._prefetch_result = ""
if not result:
if dialectic_result and dialectic_result.strip():
parts.append(dialectic_result)
if not parts:
return ""
result = "\n\n".join(parts)
# ----- Port #3265: token budget enforcement -----
result = self._truncate_to_budget(result)
return f"## Honcho Context\n{result}"
return result
def _truncate_to_budget(self, text: str) -> str:
"""Truncate text to fit within context_tokens budget if set."""
@@ -475,9 +629,11 @@ class HonchoMemoryProvider(MemoryProvider):
return truncated + ""
def queue_prefetch(self, query: str, *, session_id: str = "") -> None:
"""Fire a background dialectic query for the upcoming turn.
"""Fire background prefetch threads for the upcoming turn.
B5: Checks cadence before firing background threads.
B5: Checks cadence independently for dialectic and context refresh.
Context refresh updates the base layer (representation + card).
Dialectic fires the LLM reasoning supplement.
"""
if self._cron_skipped:
return
@@ -488,6 +644,15 @@ class HonchoMemoryProvider(MemoryProvider):
if self._recall_mode == "tools":
return
# ----- Context refresh (base layer) — independent cadence -----
if self._context_cadence <= 1 or (self._turn_count - self._last_context_turn) >= self._context_cadence:
self._last_context_turn = self._turn_count
try:
self._manager.prefetch_context(self._session_key, query)
except Exception as e:
logger.debug("Honcho context prefetch failed: %s", e)
# ----- Dialectic prefetch (supplement layer) -----
# B5: cadence check — skip if too soon since last dialectic call
if self._dialectic_cadence > 1:
if (self._turn_count - self._last_dialectic_turn) < self._dialectic_cadence:
@@ -499,9 +664,7 @@ class HonchoMemoryProvider(MemoryProvider):
def _run():
try:
result = self._manager.dialectic_query(
self._session_key, query, peer="user"
)
result = self._run_dialectic_depth(query)
if result and result.strip():
with self._prefetch_lock:
self._prefetch_result = result
@@ -513,13 +676,140 @@ class HonchoMemoryProvider(MemoryProvider):
)
self._prefetch_thread.start()
# Also fire context prefetch if cadence allows
if self._context_cadence <= 1 or (self._turn_count - self._last_context_turn) >= self._context_cadence:
self._last_context_turn = self._turn_count
try:
self._manager.prefetch_context(self._session_key, query)
except Exception as e:
logger.debug("Honcho context prefetch failed: %s", e)
# ----- Dialectic depth: multi-pass .chat() with cold/warm prompts -----
# Proportional reasoning levels per depth/pass when dialecticDepthLevels
# is not configured. The base level is dialecticReasoningLevel.
# Index: (depth, pass) → level relative to base.
_PROPORTIONAL_LEVELS: dict[tuple[int, int], str] = {
# depth 1: single pass at base level
(1, 0): "base",
# depth 2: pass 0 lighter, pass 1 at base
(2, 0): "minimal",
(2, 1): "base",
# depth 3: pass 0 lighter, pass 1 at base, pass 2 one above minimal
(3, 0): "minimal",
(3, 1): "base",
(3, 2): "low",
}
_LEVEL_ORDER = ("minimal", "low", "medium", "high", "max")
def _resolve_pass_level(self, pass_idx: int) -> str:
"""Resolve reasoning level for a given pass index.
Uses dialecticDepthLevels if configured, otherwise proportional
defaults relative to dialecticReasoningLevel.
"""
if self._dialectic_depth_levels and pass_idx < len(self._dialectic_depth_levels):
return self._dialectic_depth_levels[pass_idx]
base = (self._config.dialectic_reasoning_level if self._config else "low")
mapping = self._PROPORTIONAL_LEVELS.get((self._dialectic_depth, pass_idx))
if mapping is None or mapping == "base":
return base
return mapping
def _build_dialectic_prompt(self, pass_idx: int, prior_results: list[str], is_cold: bool) -> str:
"""Build the prompt for a given dialectic pass.
Pass 0: cold start (general user query) or warm (session-scoped).
Pass 1: self-audit / targeted synthesis against gaps from pass 0.
Pass 2: reconciliation / contradiction check across prior passes.
"""
if pass_idx == 0:
if is_cold:
return (
"Who is this person? What are their preferences, goals, "
"and working style? Focus on facts that would help an AI "
"assistant be immediately useful."
)
return (
"Given what's been discussed in this session so far, what "
"context about this user is most relevant to the current "
"conversation? Prioritize active context over biographical facts."
)
elif pass_idx == 1:
prior = prior_results[-1] if prior_results else ""
return (
f"Given this initial assessment:\n\n{prior}\n\n"
"What gaps remain in your understanding that would help "
"going forward? Synthesize what you actually know about "
"the user's current state and immediate needs, grounded "
"in evidence from recent sessions."
)
else:
# pass 2: reconciliation
return (
f"Prior passes produced:\n\n"
f"Pass 1:\n{prior_results[0] if len(prior_results) > 0 else '(empty)'}\n\n"
f"Pass 2:\n{prior_results[1] if len(prior_results) > 1 else '(empty)'}\n\n"
"Do these assessments cohere? Reconcile any contradictions "
"and produce a final, concise synthesis of what matters most "
"for the current conversation."
)
@staticmethod
def _signal_sufficient(result: str) -> bool:
"""Check if a dialectic pass returned enough signal to skip further passes.
Heuristic: a response longer than 100 chars with some structure
(section headers, bullets, or an ordered list) is considered sufficient.
"""
if not result or len(result.strip()) < 100:
return False
# Structured output with sections/bullets is strong signal
if "\n" in result and (
"##" in result
or "" in result
or re.search(r"^[*-] ", result, re.MULTILINE)
or re.search(r"^\s*\d+\. ", result, re.MULTILINE)
):
return True
# Long enough even without structure
return len(result.strip()) > 300
def _run_dialectic_depth(self, query: str) -> str:
"""Execute up to dialecticDepth .chat() calls with conditional bail-out.
Cold start (no base context): general user-oriented query.
Warm session (base context exists): session-scoped query.
Each pass is conditional bails early if prior pass returned strong signal.
Returns the best (usually last) result.
"""
if not self._manager or not self._session_key:
return ""
is_cold = not self._base_context_cache
results: list[str] = []
for i in range(self._dialectic_depth):
if i == 0:
prompt = self._build_dialectic_prompt(0, results, is_cold)
else:
# Skip further passes if prior pass delivered strong signal
if results and self._signal_sufficient(results[-1]):
logger.debug("Honcho dialectic depth %d: pass %d skipped, prior signal sufficient",
self._dialectic_depth, i)
break
prompt = self._build_dialectic_prompt(i, results, is_cold)
level = self._resolve_pass_level(i)
logger.debug("Honcho dialectic depth %d: pass %d, level=%s, cold=%s",
self._dialectic_depth, i, level, is_cold)
result = self._manager.dialectic_query(
self._session_key, prompt,
reasoning_level=level,
peer="user",
)
results.append(result or "")
# Return the last non-empty result (deepest pass that ran)
for r in reversed(results):
if r and r.strip():
return r
return ""
def on_turn_start(self, turn_number: int, message: str, **kwargs) -> None:
"""Track turn count for cadence and injection_frequency logic."""
@@ -659,7 +949,14 @@ class HonchoMemoryProvider(MemoryProvider):
try:
if tool_name == "honcho_profile":
card = self._manager.get_peer_card(self._session_key)
peer = args.get("peer", "user")
card_update = args.get("card")
if card_update:
result = self._manager.set_peer_card(self._session_key, card_update, peer=peer)
if result is None:
return tool_error("Failed to update peer card.")
return json.dumps({"result": f"Peer card updated ({len(result)} facts).", "card": result})
card = self._manager.get_peer_card(self._session_key, peer=peer)
if not card:
return json.dumps({"result": "No profile facts available yet."})
return json.dumps({"result": card})
@@ -669,30 +966,64 @@ class HonchoMemoryProvider(MemoryProvider):
if not query:
return tool_error("Missing required parameter: query")
max_tokens = min(int(args.get("max_tokens", 800)), 2000)
peer = args.get("peer", "user")
result = self._manager.search_context(
self._session_key, query, max_tokens=max_tokens
self._session_key, query, max_tokens=max_tokens, peer=peer
)
if not result:
return json.dumps({"result": "No relevant context found."})
return json.dumps({"result": result})
elif tool_name == "honcho_context":
elif tool_name == "honcho_reasoning":
query = args.get("query", "")
if not query:
return tool_error("Missing required parameter: query")
peer = args.get("peer", "user")
reasoning_level = args.get("reasoning_level")
result = self._manager.dialectic_query(
self._session_key, query, peer=peer
self._session_key, query,
reasoning_level=reasoning_level,
peer=peer,
)
# Update cadence tracker so auto-injection respects the gap after an explicit call
self._last_dialectic_turn = self._turn_count
return json.dumps({"result": result or "No result from Honcho."})
elif tool_name == "honcho_context":
peer = args.get("peer", "user")
ctx = self._manager.get_session_context(self._session_key, peer=peer)
if not ctx:
return json.dumps({"result": "No context available yet."})
parts = []
if ctx.get("summary"):
parts.append(f"## Summary\n{ctx['summary']}")
if ctx.get("representation"):
parts.append(f"## Representation\n{ctx['representation']}")
if ctx.get("card"):
parts.append(f"## Card\n{ctx['card']}")
if ctx.get("recent_messages"):
msgs = ctx["recent_messages"]
msg_str = "\n".join(
f" [{m['role']}] {m['content'][:200]}"
for m in msgs[-5:] # last 5 for brevity
)
parts.append(f"## Recent messages\n{msg_str}")
return json.dumps({"result": "\n\n".join(parts) or "No context available."})
elif tool_name == "honcho_conclude":
delete_id = args.get("delete_id")
peer = args.get("peer", "user")
if delete_id:
ok = self._manager.delete_conclusion(self._session_key, delete_id, peer=peer)
if ok:
return json.dumps({"result": f"Conclusion {delete_id} deleted."})
return tool_error(f"Failed to delete conclusion {delete_id}.")
conclusion = args.get("conclusion", "")
if not conclusion:
return tool_error("Missing required parameter: conclusion")
ok = self._manager.create_conclusion(self._session_key, conclusion)
return tool_error("Missing required parameter: conclusion or delete_id")
ok = self._manager.create_conclusion(self._session_key, conclusion, peer=peer)
if ok:
return json.dumps({"result": f"Conclusion saved: {conclusion}"})
return json.dumps({"result": f"Conclusion saved for {peer}: {conclusion}"})
return tool_error("Failed to save conclusion.")
return tool_error(f"Unknown tool: {tool_name}")
+109 -16
View File
@@ -440,11 +440,43 @@ def cmd_setup(args) -> None:
if new_recall in ("hybrid", "context", "tools"):
hermes_host["recallMode"] = new_recall
# --- 7. Session strategy ---
current_strat = hermes_host.get("sessionStrategy") or cfg.get("sessionStrategy", "per-directory")
# --- 7. Context token budget ---
current_ctx_tokens = hermes_host.get("contextTokens") or cfg.get("contextTokens")
current_display = str(current_ctx_tokens) if current_ctx_tokens else "uncapped"
print("\n Context injection per turn (hybrid/context recall modes only):")
print(" uncapped -- no limit (default)")
print(" N -- token limit per turn (e.g. 1200)")
new_ctx_tokens = _prompt("Context tokens", default=current_display)
if new_ctx_tokens.strip().lower() in ("none", "uncapped", "no limit"):
hermes_host.pop("contextTokens", None)
elif new_ctx_tokens.strip() == "":
pass # keep current
else:
try:
val = int(new_ctx_tokens)
if val >= 0:
hermes_host["contextTokens"] = val
except (ValueError, TypeError):
pass # keep current
# --- 7b. Dialectic cadence ---
current_dialectic = str(hermes_host.get("dialecticCadence") or cfg.get("dialecticCadence") or "3")
print("\n Dialectic cadence:")
print(" How often Honcho rebuilds its user model (LLM call on Honcho backend).")
print(" 1 = every turn (aggressive), 3 = every 3 turns (recommended), 5+ = sparse.")
new_dialectic = _prompt("Dialectic cadence", default=current_dialectic)
try:
val = int(new_dialectic)
if val >= 1:
hermes_host["dialecticCadence"] = val
except (ValueError, TypeError):
hermes_host["dialecticCadence"] = 3
# --- 8. Session strategy ---
current_strat = hermes_host.get("sessionStrategy") or cfg.get("sessionStrategy", "per-session")
print("\n Session strategy:")
print(" per-directory -- one session per working directory (default)")
print(" per-session -- new Honcho session each run")
print(" per-session -- each run starts clean, Honcho injects context automatically")
print(" per-directory -- reuses session per dir, prior context auto-injected each run")
print(" per-repo -- one session per git repository")
print(" global -- single session across all directories")
new_strat = _prompt("Session strategy", default=current_strat)
@@ -490,10 +522,11 @@ def cmd_setup(args) -> None:
print(f" Recall: {hcfg.recall_mode}")
print(f" Sessions: {hcfg.session_strategy}")
print("\n Honcho tools available in chat:")
print(" honcho_context -- ask Honcho about the user (LLM-synthesized)")
print(" honcho_search -- semantic search over history (no LLM)")
print(" honcho_profile -- peer card, key facts (no LLM)")
print(" honcho_conclude -- persist a user fact to memory (no LLM)")
print(" honcho_context -- session context: summary, representation, card, messages")
print(" honcho_search -- semantic search over history")
print(" honcho_profile -- peer card, key facts")
print(" honcho_reasoning -- ask Honcho a question, synthesized answer")
print(" honcho_conclude -- persist a user fact to memory")
print("\n Other commands:")
print(" hermes honcho status -- show full config")
print(" hermes honcho mode -- change recall/observation mode")
@@ -585,13 +618,26 @@ def cmd_status(args) -> None:
print(f" Enabled: {hcfg.enabled}")
print(f" API key: {masked}")
print(f" Workspace: {hcfg.workspace_id}")
print(f" Config path: {active_path}")
# Config paths — show where config was read from and where writes go
global_path = Path.home() / ".honcho" / "config.json"
print(f" Config: {active_path}")
if write_path != active_path:
print(f" Write path: {write_path} (instance-local)")
print(f" Write to: {write_path} (profile-local)")
if active_path == global_path:
print(f" Fallback: (none — using global ~/.honcho/config.json)")
elif global_path.exists():
print(f" Fallback: {global_path} (exists, cross-app interop)")
print(f" AI peer: {hcfg.ai_peer}")
print(f" User peer: {hcfg.peer_name or 'not set'}")
print(f" Session key: {hcfg.resolve_session_name()}")
print(f" Session strat: {hcfg.session_strategy}")
print(f" Recall mode: {hcfg.recall_mode}")
print(f" Context budget: {hcfg.context_tokens or '(uncapped)'} tokens")
raw = getattr(hcfg, "raw", None) or {}
dialectic_cadence = raw.get("dialecticCadence") or 3
print(f" Dialectic cad: every {dialectic_cadence} turn{'s' if dialectic_cadence != 1 else ''}")
print(f" Observation: user(me={hcfg.user_observe_me},others={hcfg.user_observe_others}) ai(me={hcfg.ai_observe_me},others={hcfg.ai_observe_others})")
print(f" Write freq: {hcfg.write_frequency}")
@@ -599,8 +645,8 @@ def cmd_status(args) -> None:
print("\n Connection... ", end="", flush=True)
try:
client = get_honcho_client(hcfg)
print("OK")
_show_peer_cards(hcfg, client)
print("OK")
except Exception as e:
print(f"FAILED ({e})\n")
else:
@@ -824,6 +870,41 @@ def cmd_mode(args) -> None:
print(f" {label}Recall mode -> {mode_arg} ({MODES[mode_arg]})\n")
def cmd_strategy(args) -> None:
"""Show or set the session strategy."""
STRATEGIES = {
"per-session": "each run starts clean, Honcho injects context automatically",
"per-directory": "reuses session per dir, prior context auto-injected each run",
"per-repo": "one session per git repository",
"global": "single session across all directories",
}
cfg = _read_config()
strat_arg = getattr(args, "strategy", None)
if strat_arg is None:
current = (
(cfg.get("hosts") or {}).get(_host_key(), {}).get("sessionStrategy")
or cfg.get("sessionStrategy")
or "per-session"
)
print("\nHoncho session strategy\n" + "" * 40)
for s, desc in STRATEGIES.items():
marker = " <-" if s == current else ""
print(f" {s:<15} {desc}{marker}")
print(f"\n Set with: hermes honcho strategy [per-session|per-directory|per-repo|global]\n")
return
if strat_arg not in STRATEGIES:
print(f" Invalid strategy '{strat_arg}'. Options: {', '.join(STRATEGIES)}\n")
return
host = _host_key()
label = f"[{host}] " if host != "hermes" else ""
cfg.setdefault("hosts", {}).setdefault(host, {})["sessionStrategy"] = strat_arg
_write_config(cfg)
print(f" {label}Session strategy -> {strat_arg} ({STRATEGIES[strat_arg]})\n")
def cmd_tokens(args) -> None:
"""Show or set token budget settings."""
cfg = _read_config()
@@ -1143,10 +1224,11 @@ def cmd_migrate(args) -> None:
print(" automatically. Files become the seed, not the live store.")
print()
print(" Honcho tools (available to the agent during conversation)")
print(" honcho_context — ask Honcho a question, get a synthesized answer (LLM)")
print(" honcho_search — semantic search over stored context (no LLM)")
print(" honcho_profile — fast peer card snapshot (no LLM)")
print(" honcho_conclude — write a conclusion/fact back to memory (no LLM)")
print(" honcho_context — session context: summary, representation, card, messages")
print(" honcho_search — semantic search over stored context")
print(" honcho_profile — fast peer card snapshot")
print(" honcho_reasoning — ask Honcho a question, synthesized answer")
print(" honcho_conclude — write a conclusion/fact back to memory")
print()
print(" Session naming")
print(" OpenClaw: no persistent session concept — files are global.")
@@ -1197,6 +1279,8 @@ def honcho_command(args) -> None:
cmd_peer(args)
elif sub == "mode":
cmd_mode(args)
elif sub == "strategy":
cmd_strategy(args)
elif sub == "tokens":
cmd_tokens(args)
elif sub == "identity":
@@ -1211,7 +1295,7 @@ def honcho_command(args) -> None:
cmd_sync(args)
else:
print(f" Unknown honcho command: {sub}")
print(" Available: status, sessions, map, peer, mode, tokens, identity, migrate, enable, disable, sync\n")
print(" Available: status, sessions, map, peer, mode, strategy, tokens, identity, migrate, enable, disable, sync\n")
def register_cli(subparser) -> None:
@@ -1270,6 +1354,15 @@ def register_cli(subparser) -> None:
help="Recall mode to set (hybrid/context/tools). Omit to show current.",
)
strategy_parser = subs.add_parser(
"strategy", help="Show or set session strategy (per-session/per-directory/per-repo/global)",
)
strategy_parser.add_argument(
"strategy", nargs="?", metavar="STRATEGY",
choices=("per-session", "per-directory", "per-repo", "global"),
help="Session strategy to set. Omit to show current.",
)
tokens_parser = subs.add_parser(
"tokens", help="Show or set token budget for context and dialectic",
)
+127 -16
View File
@@ -94,6 +94,68 @@ def _resolve_bool(host_val, root_val, *, default: bool) -> bool:
return default
def _parse_context_tokens(host_val, root_val) -> int | None:
"""Parse contextTokens: host wins, then root, then None (uncapped)."""
for val in (host_val, root_val):
if val is not None:
try:
return int(val)
except (ValueError, TypeError):
pass
return None
def _parse_dialectic_depth(host_val, root_val) -> int:
"""Parse dialecticDepth: host wins, then root, then 1. Clamped to 1-3."""
for val in (host_val, root_val):
if val is not None:
try:
return max(1, min(int(val), 3))
except (ValueError, TypeError):
pass
return 1
_VALID_REASONING_LEVELS = ("minimal", "low", "medium", "high", "max")
def _parse_dialectic_depth_levels(host_val, root_val, depth: int) -> list[str] | None:
"""Parse dialecticDepthLevels: optional array of reasoning levels per pass.
Returns None when not configured (use proportional defaults).
When configured, validates each level and truncates/pads to match depth.
"""
for val in (host_val, root_val):
if val is not None and isinstance(val, list):
levels = [
lvl if lvl in _VALID_REASONING_LEVELS else "low"
for lvl in val[:depth]
]
# Pad with "low" if array is shorter than depth
while len(levels) < depth:
levels.append("low")
return levels
return None
def _resolve_optional_float(*values: Any) -> float | None:
"""Return the first non-empty value coerced to a positive float."""
for value in values:
if value is None:
continue
if isinstance(value, str):
value = value.strip()
if not value:
continue
try:
parsed = float(value)
except (TypeError, ValueError):
continue
if parsed > 0:
return parsed
return None
_VALID_OBSERVATION_MODES = {"unified", "directional"}
_OBSERVATION_MODE_ALIASES = {"shared": "unified", "separate": "directional", "cross": "directional"}
@@ -159,6 +221,8 @@ class HonchoClientConfig:
environment: str = "production"
# Optional base URL for self-hosted Honcho (overrides environment mapping)
base_url: str | None = None
# Optional request timeout in seconds for Honcho SDK HTTP calls
timeout: float | None = None
# Identity
peer_name: str | None = None
ai_peer: str = "hermes"
@@ -168,17 +232,25 @@ class HonchoClientConfig:
# Write frequency: "async" (background thread), "turn" (sync per turn),
# "session" (flush on session end), or int (every N turns)
write_frequency: str | int = "async"
# Prefetch budget
# Prefetch budget (None = no cap; set to an integer to bound auto-injected context)
context_tokens: int | None = None
# Dialectic (peer.chat) settings
# reasoning_level: "minimal" | "low" | "medium" | "high" | "max"
dialectic_reasoning_level: str = "low"
# dynamic: auto-bump reasoning level based on query length
# true — low->medium (120+ chars), low->high (400+ chars), capped at "high"
# false — always use dialecticReasoningLevel as-is
# When true, the model can override reasoning_level per-call via the
# honcho_reasoning tool param (agentic). When false, always uses
# dialecticReasoningLevel and ignores model-provided overrides.
dialectic_dynamic: bool = True
# Max chars of dialectic result to inject into Hermes system prompt
dialectic_max_chars: int = 600
# Dialectic depth: how many .chat() calls per dialectic cycle (1-3).
# Depth 1: single call. Depth 2: self-audit + targeted synthesis.
# Depth 3: self-audit + synthesis + reconciliation.
dialectic_depth: int = 1
# Optional per-pass reasoning level override. Array of reasoning levels
# matching dialectic_depth length. When None, uses proportional defaults
# derived from dialectic_reasoning_level.
dialectic_depth_levels: list[str] | None = None
# Honcho API limits — configurable for self-hosted instances
# Max chars per message sent via add_messages() (Honcho cloud: 25000)
message_max_chars: int = 25000
@@ -189,10 +261,8 @@ class HonchoClientConfig:
# "context" — auto-injected context only, Honcho tools removed
# "tools" — Honcho tools only, no auto-injected context
recall_mode: str = "hybrid"
# When True and recallMode is "tools", create the Honcho session eagerly
# during initialize() instead of deferring to the first tool call.
# This ensures sync_turn() can write from the very first turn.
# Does NOT enable automatic context injection — only changes init timing.
# Eager init in tools mode — when true, initializes session during
# initialize() instead of deferring to first tool call
init_on_session_start: bool = False
# Observation mode: legacy string shorthand ("directional" or "unified").
# Kept for backward compat; granular per-peer booleans below are preferred.
@@ -224,12 +294,14 @@ class HonchoClientConfig:
resolved_host = host or resolve_active_host()
api_key = os.environ.get("HONCHO_API_KEY")
base_url = os.environ.get("HONCHO_BASE_URL", "").strip() or None
timeout = _resolve_optional_float(os.environ.get("HONCHO_TIMEOUT"))
return cls(
host=resolved_host,
workspace_id=workspace_id,
api_key=api_key,
environment=os.environ.get("HONCHO_ENVIRONMENT", "production"),
base_url=base_url,
timeout=timeout,
ai_peer=resolved_host,
enabled=bool(api_key or base_url),
)
@@ -290,6 +362,11 @@ class HonchoClientConfig:
or os.environ.get("HONCHO_BASE_URL", "").strip()
or None
)
timeout = _resolve_optional_float(
raw.get("timeout"),
raw.get("requestTimeout"),
os.environ.get("HONCHO_TIMEOUT"),
)
# Auto-enable when API key or base_url is present (unless explicitly disabled)
# Host-level enabled wins, then root-level, then auto-enable if key/url exists.
@@ -335,12 +412,16 @@ class HonchoClientConfig:
api_key=api_key,
environment=environment,
base_url=base_url,
timeout=timeout,
peer_name=host_block.get("peerName") or raw.get("peerName"),
ai_peer=ai_peer,
enabled=enabled,
save_messages=save_messages,
write_frequency=write_frequency,
context_tokens=host_block.get("contextTokens") or raw.get("contextTokens"),
context_tokens=_parse_context_tokens(
host_block.get("contextTokens"),
raw.get("contextTokens"),
),
dialectic_reasoning_level=(
host_block.get("dialecticReasoningLevel")
or raw.get("dialecticReasoningLevel")
@@ -356,6 +437,15 @@ class HonchoClientConfig:
or raw.get("dialecticMaxChars")
or 600
),
dialectic_depth=_parse_dialectic_depth(
host_block.get("dialecticDepth"),
raw.get("dialecticDepth"),
),
dialectic_depth_levels=_parse_dialectic_depth_levels(
host_block.get("dialecticDepthLevels"),
raw.get("dialecticDepthLevels"),
depth=_parse_dialectic_depth(host_block.get("dialecticDepth"), raw.get("dialecticDepth")),
),
message_max_chars=int(
host_block.get("messageMaxChars")
or raw.get("messageMaxChars")
@@ -422,16 +512,18 @@ class HonchoClientConfig:
cwd: str | None = None,
session_title: str | None = None,
session_id: str | None = None,
gateway_session_key: str | None = None,
) -> str | None:
"""Resolve Honcho session name.
Resolution order:
1. Manual directory override from sessions map
2. Hermes session title (from /title command)
3. per-session strategy Hermes session_id ({timestamp}_{hex})
4. per-repo strategy git repo root directory name
5. per-directory strategy directory basename
6. global strategy workspace name
3. Gateway session key (stable per-chat identifier from gateway platforms)
4. per-session strategy Hermes session_id ({timestamp}_{hex})
5. per-repo strategy git repo root directory name
6. per-directory strategy directory basename
7. global strategy workspace name
"""
import re
@@ -445,12 +537,22 @@ class HonchoClientConfig:
# /title mid-session remap
if session_title:
sanitized = re.sub(r'[^a-zA-Z0-9_-]', '-', session_title).strip('-')
sanitized = re.sub(r'[^a-zA-Z0-9_-]+', '-', session_title).strip('-')
if sanitized:
if self.session_peer_prefix and self.peer_name:
return f"{self.peer_name}-{sanitized}"
return sanitized
# Gateway session key: stable per-chat identifier passed by the gateway
# (e.g. "agent:main:telegram:dm:8439114563"). Sanitize colons to hyphens
# for Honcho session ID compatibility. This takes priority over strategy-
# based resolution because gateway platforms need per-chat isolation that
# cwd-based strategies cannot provide.
if gateway_session_key:
sanitized = re.sub(r'[^a-zA-Z0-9_-]+', '-', gateway_session_key).strip('-')
if sanitized:
return sanitized
# per-session: inherit Hermes session_id (new Honcho session each run)
if self.session_strategy == "per-session" and session_id:
if self.session_peer_prefix and self.peer_name:
@@ -512,13 +614,20 @@ def get_honcho_client(config: HonchoClientConfig | None = None) -> Honcho:
# mapping, enabling remote self-hosted Honcho deployments without
# requiring the server to live on localhost.
resolved_base_url = config.base_url
if not resolved_base_url:
resolved_timeout = config.timeout
if not resolved_base_url or resolved_timeout is None:
try:
from hermes_cli.config import load_config
hermes_cfg = load_config()
honcho_cfg = hermes_cfg.get("honcho", {})
if isinstance(honcho_cfg, dict):
resolved_base_url = honcho_cfg.get("base_url", "").strip() or None
if not resolved_base_url:
resolved_base_url = honcho_cfg.get("base_url", "").strip() or None
if resolved_timeout is None:
resolved_timeout = _resolve_optional_float(
honcho_cfg.get("timeout"),
honcho_cfg.get("request_timeout"),
)
except Exception:
pass
@@ -553,6 +662,8 @@ def get_honcho_client(config: HonchoClientConfig | None = None) -> Honcho:
}
if resolved_base_url:
kwargs["base_url"] = resolved_base_url
if resolved_timeout is not None:
kwargs["timeout"] = resolved_timeout
_honcho_client = Honcho(**kwargs)
+247 -75
View File
@@ -486,36 +486,9 @@ class HonchoSessionManager:
_REASONING_LEVELS = ("minimal", "low", "medium", "high", "max")
def _dynamic_reasoning_level(self, query: str) -> str:
"""
Pick a reasoning level for a dialectic query.
When dialecticDynamic is true (default), auto-bumps based on query
length so Honcho applies more inference where it matters:
< 120 chars -> configured default (typically "low")
120-400 chars -> +1 level above default (cap at "high")
> 400 chars -> +2 levels above default (cap at "high")
"max" is never selected automatically -- reserve it for explicit config.
When dialecticDynamic is false, always returns the configured level.
"""
if not self._dialectic_dynamic:
return self._dialectic_reasoning_level
levels = self._REASONING_LEVELS
default_idx = levels.index(self._dialectic_reasoning_level) if self._dialectic_reasoning_level in levels else 1
n = len(query)
if n < 120:
bump = 0
elif n < 400:
bump = 1
else:
bump = 2
# Cap at "high" (index 3) for auto-selection
idx = min(default_idx + bump, 3)
return levels[idx]
def _default_reasoning_level(self) -> str:
"""Return the configured default reasoning level."""
return self._dialectic_reasoning_level
def dialectic_query(
self, session_key: str, query: str,
@@ -532,8 +505,9 @@ class HonchoSessionManager:
Args:
session_key: The session key to query against.
query: Natural language question.
reasoning_level: Override the config default. If None, uses
_dynamic_reasoning_level(query).
reasoning_level: Override the configured default (dialecticReasoningLevel).
Only honored when dialecticDynamic is true.
If None or dialecticDynamic is false, uses the configured default.
peer: Which peer to query "user" (default) or "ai".
Returns:
@@ -543,29 +517,34 @@ class HonchoSessionManager:
if not session:
return ""
target_peer_id = self._resolve_peer_id(session, peer)
if target_peer_id is None:
return ""
# Guard: truncate query to Honcho's dialectic input limit
if len(query) > self._dialectic_max_input_chars:
query = query[:self._dialectic_max_input_chars].rsplit(" ", 1)[0]
level = reasoning_level or self._dynamic_reasoning_level(query)
if self._dialectic_dynamic and reasoning_level:
level = reasoning_level
else:
level = self._default_reasoning_level()
try:
if self._ai_observe_others:
# AI peer can observe user — use cross-observation routing
if peer == "ai":
ai_peer_obj = self._get_or_create_peer(session.assistant_peer_id)
# AI peer can observe other peers — use assistant as observer.
ai_peer_obj = self._get_or_create_peer(session.assistant_peer_id)
if target_peer_id == session.assistant_peer_id:
result = ai_peer_obj.chat(query, reasoning_level=level) or ""
else:
ai_peer_obj = self._get_or_create_peer(session.assistant_peer_id)
result = ai_peer_obj.chat(
query,
target=session.user_peer_id,
target=target_peer_id,
reasoning_level=level,
) or ""
else:
# AI can't observe others — each peer queries self
peer_id = session.assistant_peer_id if peer == "ai" else session.user_peer_id
target_peer = self._get_or_create_peer(peer_id)
# Without cross-observation, each peer queries its own context.
target_peer = self._get_or_create_peer(target_peer_id)
result = target_peer.chat(query, reasoning_level=level) or ""
# Apply Hermes-side char cap before caching
@@ -647,10 +626,11 @@ class HonchoSessionManager:
"""
Pre-fetch user and AI peer context from Honcho.
Fetches peer_representation and peer_card for both peers. search_query
is intentionally omitted it would only affect additional excerpts
that this code does not consume, and passing the raw message exposes
conversation content in server access logs.
Fetches peer_representation and peer_card for both peers, plus the
session summary when available. search_query is intentionally omitted
it would only affect additional excerpts that this code does not
consume, and passing the raw message exposes conversation content in
server access logs.
Args:
session_key: The session key to get context for.
@@ -658,15 +638,29 @@ class HonchoSessionManager:
Returns:
Dictionary with 'representation', 'card', 'ai_representation',
and 'ai_card' keys.
'ai_card', and optionally 'summary' keys.
"""
session = self._cache.get(session_key)
if not session:
return {}
result: dict[str, str] = {}
# Session summary — provides session-scoped context.
# Fresh sessions (per-session cold start, or first-ever per-directory)
# return null summary — the guard below handles that gracefully.
# Per-directory returning sessions get their accumulated summary.
try:
user_ctx = self._fetch_peer_context(session.user_peer_id)
honcho_session = self._sessions_cache.get(session.honcho_session_id)
if honcho_session:
ctx = honcho_session.context(summary=True)
if ctx.summary and getattr(ctx.summary, "content", None):
result["summary"] = ctx.summary.content
except Exception as e:
logger.debug("Failed to fetch session summary from Honcho: %s", e)
try:
user_ctx = self._fetch_peer_context(session.user_peer_id, target=session.user_peer_id)
result["representation"] = user_ctx["representation"]
result["card"] = "\n".join(user_ctx["card"])
except Exception as e:
@@ -674,7 +668,7 @@ class HonchoSessionManager:
# Also fetch AI peer's own representation so Hermes knows itself.
try:
ai_ctx = self._fetch_peer_context(session.assistant_peer_id)
ai_ctx = self._fetch_peer_context(session.assistant_peer_id, target=session.assistant_peer_id)
result["ai_representation"] = ai_ctx["representation"]
result["ai_card"] = "\n".join(ai_ctx["card"])
except Exception as e:
@@ -862,7 +856,7 @@ class HonchoSessionManager:
return [str(item) for item in card if item]
return [str(card)]
def _fetch_peer_card(self, peer_id: str) -> list[str]:
def _fetch_peer_card(self, peer_id: str, *, target: str | None = None) -> list[str]:
"""Fetch a peer card directly from the peer object.
This avoids relying on session.context(), which can return an empty
@@ -872,22 +866,33 @@ class HonchoSessionManager:
peer = self._get_or_create_peer(peer_id)
getter = getattr(peer, "get_card", None)
if callable(getter):
return self._normalize_card(getter())
return self._normalize_card(getter(target=target) if target is not None else getter())
legacy_getter = getattr(peer, "card", None)
if callable(legacy_getter):
return self._normalize_card(legacy_getter())
return self._normalize_card(legacy_getter(target=target) if target is not None else legacy_getter())
return []
def _fetch_peer_context(self, peer_id: str, search_query: str | None = None) -> dict[str, Any]:
def _fetch_peer_context(
self,
peer_id: str,
search_query: str | None = None,
*,
target: str | None = None,
) -> dict[str, Any]:
"""Fetch representation + peer card directly from a peer object."""
peer = self._get_or_create_peer(peer_id)
representation = ""
card: list[str] = []
try:
ctx = peer.context(search_query=search_query) if search_query else peer.context()
context_kwargs: dict[str, Any] = {}
if target is not None:
context_kwargs["target"] = target
if search_query is not None:
context_kwargs["search_query"] = search_query
ctx = peer.context(**context_kwargs) if context_kwargs else peer.context()
representation = (
getattr(ctx, "representation", None)
or getattr(ctx, "peer_representation", None)
@@ -899,24 +904,111 @@ class HonchoSessionManager:
if not representation:
try:
representation = peer.representation() or ""
representation = (
peer.representation(target=target) if target is not None else peer.representation()
) or ""
except Exception as e:
logger.debug("Direct peer.representation() failed for '%s': %s", peer_id, e)
if not card:
try:
card = self._fetch_peer_card(peer_id)
card = self._fetch_peer_card(peer_id, target=target)
except Exception as e:
logger.debug("Direct peer card fetch failed for '%s': %s", peer_id, e)
return {"representation": representation, "card": card}
def get_peer_card(self, session_key: str) -> list[str]:
def get_session_context(self, session_key: str, peer: str = "user") -> dict[str, Any]:
"""Fetch full session context from Honcho including summary.
Uses the session-level context() API which returns summary,
peer_representation, peer_card, and messages.
"""
Fetch the user peer's card — a curated list of key facts.
session = self._cache.get(session_key)
if not session:
return {}
honcho_session = self._sessions_cache.get(session.honcho_session_id)
if not honcho_session:
# Fall back to peer-level context, respecting the requested peer
peer_id = self._resolve_peer_id(session, peer)
if peer_id is None:
peer_id = session.user_peer_id
return self._fetch_peer_context(peer_id, target=peer_id)
try:
peer_id = self._resolve_peer_id(session, peer)
ctx = honcho_session.context(
summary=True,
peer_target=peer_id,
peer_perspective=session.user_peer_id if peer == "user" else session.assistant_peer_id,
)
result: dict[str, Any] = {}
# Summary
if ctx.summary:
result["summary"] = ctx.summary.content
# Peer representation and card
if ctx.peer_representation:
result["representation"] = ctx.peer_representation
if ctx.peer_card:
result["card"] = "\n".join(ctx.peer_card)
# Messages (last N for context)
if ctx.messages:
recent = ctx.messages[-10:] # last 10 messages
result["recent_messages"] = [
{"role": getattr(m, "peer_id", "unknown"), "content": (m.content or "")[:500]}
for m in recent
]
return result
except Exception as e:
logger.debug("Session context fetch failed: %s", e)
return {}
def _resolve_peer_id(self, session: HonchoSession, peer: str | None) -> str:
"""Resolve a peer alias or explicit peer ID to a concrete Honcho peer ID.
Always returns a non-empty string: either a known peer ID or a
sanitized version of the caller-supplied alias/ID.
"""
candidate = (peer or "user").strip()
if not candidate:
return session.user_peer_id
normalized = self._sanitize_id(candidate)
if normalized == self._sanitize_id("user"):
return session.user_peer_id
if normalized == self._sanitize_id("ai"):
return session.assistant_peer_id
return normalized
def _resolve_observer_target(
self,
session: HonchoSession,
peer: str | None,
) -> tuple[str, str | None]:
"""Resolve observer and target peer IDs for context/search/profile queries."""
target_peer_id = self._resolve_peer_id(session, peer)
if target_peer_id == session.assistant_peer_id:
return session.assistant_peer_id, session.assistant_peer_id
if self._ai_observe_others:
return session.assistant_peer_id, target_peer_id
return target_peer_id, None
def get_peer_card(self, session_key: str, peer: str = "user") -> list[str]:
"""
Fetch a peer card a curated list of key facts.
Fast, no LLM reasoning. Returns raw structured facts Honcho has
inferred about the user (name, role, preferences, patterns).
inferred about the target peer (name, role, preferences, patterns).
Empty list if unavailable.
"""
session = self._cache.get(session_key)
@@ -924,12 +1016,19 @@ class HonchoSessionManager:
return []
try:
return self._fetch_peer_card(session.user_peer_id)
observer_peer_id, target_peer_id = self._resolve_observer_target(session, peer)
return self._fetch_peer_card(observer_peer_id, target=target_peer_id)
except Exception as e:
logger.debug("Failed to fetch peer card from Honcho: %s", e)
return []
def search_context(self, session_key: str, query: str, max_tokens: int = 800) -> str:
def search_context(
self,
session_key: str,
query: str,
max_tokens: int = 800,
peer: str = "user",
) -> str:
"""
Semantic search over Honcho session context.
@@ -941,6 +1040,7 @@ class HonchoSessionManager:
session_key: Session to search against.
query: Search query for semantic matching.
max_tokens: Token budget for returned content.
peer: Peer alias or explicit peer ID to search about.
Returns:
Relevant context excerpts as a string, or empty string if none.
@@ -950,7 +1050,13 @@ class HonchoSessionManager:
return ""
try:
ctx = self._fetch_peer_context(session.user_peer_id, search_query=query)
observer_peer_id, target = self._resolve_observer_target(session, peer)
ctx = self._fetch_peer_context(
observer_peer_id,
search_query=query,
target=target,
)
parts = []
if ctx["representation"]:
parts.append(ctx["representation"])
@@ -962,16 +1068,17 @@ class HonchoSessionManager:
logger.debug("Honcho search_context failed: %s", e)
return ""
def create_conclusion(self, session_key: str, content: str) -> bool:
"""Write a conclusion about the user back to Honcho.
def create_conclusion(self, session_key: str, content: str, peer: str = "user") -> bool:
"""Write a conclusion about a target peer back to Honcho.
Conclusions are facts the AI peer observes about the user
preferences, corrections, clarifications, project context.
They feed into the user's peer card and representation.
Conclusions are facts a peer observes about another peer or itself
preferences, corrections, clarifications, and project context.
They feed into the target peer's card and representation.
Args:
session_key: Session to associate the conclusion with.
content: The conclusion text (e.g. "User prefers dark mode").
content: The conclusion text.
peer: Peer alias or explicit peer ID. "user" is the default alias.
Returns:
True on success, False on failure.
@@ -985,25 +1092,90 @@ class HonchoSessionManager:
return False
try:
if self._ai_observe_others:
# AI peer creates conclusion about user (cross-observation)
target_peer_id = self._resolve_peer_id(session, peer)
if target_peer_id is None:
logger.warning("Could not resolve conclusion peer '%s' for session '%s'", peer, session_key)
return False
if target_peer_id == session.assistant_peer_id:
assistant_peer = self._get_or_create_peer(session.assistant_peer_id)
conclusions_scope = assistant_peer.conclusions_of(session.user_peer_id)
conclusions_scope = assistant_peer.conclusions_of(session.assistant_peer_id)
elif self._ai_observe_others:
assistant_peer = self._get_or_create_peer(session.assistant_peer_id)
conclusions_scope = assistant_peer.conclusions_of(target_peer_id)
else:
# AI can't observe others — user peer creates self-conclusion
user_peer = self._get_or_create_peer(session.user_peer_id)
conclusions_scope = user_peer.conclusions_of(session.user_peer_id)
target_peer = self._get_or_create_peer(target_peer_id)
conclusions_scope = target_peer.conclusions_of(target_peer_id)
conclusions_scope.create([{
"content": content.strip(),
"session_id": session.honcho_session_id,
}])
logger.info("Created conclusion for %s: %s", session_key, content[:80])
logger.info("Created conclusion about %s for %s: %s", target_peer_id, session_key, content[:80])
return True
except Exception as e:
logger.error("Failed to create conclusion: %s", e)
return False
def delete_conclusion(self, session_key: str, conclusion_id: str, peer: str = "user") -> bool:
"""Delete a conclusion by ID. Use only for PII removal.
Args:
session_key: Session key for peer resolution.
conclusion_id: The conclusion ID to delete.
peer: Peer alias or explicit peer ID.
Returns:
True on success, False on failure.
"""
session = self._cache.get(session_key)
if not session:
return False
try:
target_peer_id = self._resolve_peer_id(session, peer)
if target_peer_id == session.assistant_peer_id:
observer = self._get_or_create_peer(session.assistant_peer_id)
scope = observer.conclusions_of(session.assistant_peer_id)
elif self._ai_observe_others:
observer = self._get_or_create_peer(session.assistant_peer_id)
scope = observer.conclusions_of(target_peer_id)
else:
target_peer = self._get_or_create_peer(target_peer_id)
scope = target_peer.conclusions_of(target_peer_id)
scope.delete(conclusion_id)
logger.info("Deleted conclusion %s for %s", conclusion_id, session_key)
return True
except Exception as e:
logger.error("Failed to delete conclusion %s: %s", conclusion_id, e)
return False
def set_peer_card(self, session_key: str, card: list[str], peer: str = "user") -> list[str] | None:
"""Update a peer's card.
Args:
session_key: Session key for peer resolution.
card: New peer card as list of fact strings.
peer: Peer alias or explicit peer ID.
Returns:
Updated card on success, None on failure.
"""
session = self._cache.get(session_key)
if not session:
return None
try:
peer_id = self._resolve_peer_id(session, peer)
if peer_id is None:
logger.warning("Could not resolve peer '%s' for set_peer_card in session '%s'", peer, session_key)
return None
peer_obj = self._get_or_create_peer(peer_id)
result = peer_obj.set_card(card)
logger.info("Updated peer card for %s (%d facts)", peer_id, len(card))
return result
except Exception as e:
logger.error("Failed to set peer card: %s", e)
return None
def seed_ai_identity(self, session_key: str, content: str, source: str = "manual") -> bool:
"""
Seed the AI peer's Honcho representation from text content.
@@ -1061,7 +1233,7 @@ class HonchoSessionManager:
return {"representation": "", "card": ""}
try:
ctx = self._fetch_peer_context(session.assistant_peer_id)
ctx = self._fetch_peer_context(session.assistant_peer_id, target=session.assistant_peer_id)
return {
"representation": ctx["representation"] or "",
"card": "\n".join(ctx["card"]),
+46 -9
View File
@@ -10,8 +10,9 @@ lifecycle instead of read-only search endpoints.
Config via environment variables (profile-scoped via each profile's .env):
OPENVIKING_ENDPOINT Server URL (default: http://127.0.0.1:1933)
OPENVIKING_API_KEY API key (required for authenticated servers)
OPENVIKING_ACCOUNT Tenant account (default: root)
OPENVIKING_ACCOUNT Tenant account (default: default)
OPENVIKING_USER Tenant user (default: default)
OPENVIKING_AGENT Tenant agent (default: hermes)
Capabilities:
- Automatic memory extraction on session commit (6 categories)
@@ -80,11 +81,12 @@ class _VikingClient:
"""Thin HTTP client for the OpenViking REST API."""
def __init__(self, endpoint: str, api_key: str = "",
account: str = "", user: str = ""):
account: str = "", user: str = "", agent: str = ""):
self._endpoint = endpoint.rstrip("/")
self._api_key = api_key
self._account = account or os.environ.get("OPENVIKING_ACCOUNT", "root")
self._account = account or os.environ.get("OPENVIKING_ACCOUNT", "default")
self._user = user or os.environ.get("OPENVIKING_USER", "default")
self._agent = agent or os.environ.get("OPENVIKING_AGENT", "hermes")
self._httpx = _get_httpx()
if self._httpx is None:
raise ImportError("httpx is required for OpenViking: pip install httpx")
@@ -94,6 +96,7 @@ class _VikingClient:
"Content-Type": "application/json",
"X-OpenViking-Account": self._account,
"X-OpenViking-User": self._user,
"X-OpenViking-Agent": self._agent,
}
if self._api_key:
h["X-API-Key"] = self._api_key
@@ -282,20 +285,44 @@ class OpenVikingMemoryProvider(MemoryProvider):
},
{
"key": "api_key",
"description": "OpenViking API key",
"description": "OpenViking API key (leave blank for local dev mode)",
"secret": True,
"env_var": "OPENVIKING_API_KEY",
},
{
"key": "account",
"description": "OpenViking tenant account ID ([default], used when local mode, OPENVIKING_API_KEY is empty)",
"default": "default",
"env_var": "OPENVIKING_ACCOUNT",
},
{
"key": "user",
"description": "OpenViking user ID within the account ([default], used when local mode, OPENVIKING_API_KEY is empty)",
"default": "default",
"env_var": "OPENVIKING_USER",
},
{
"key": "agent",
"description": "OpenViking agent ID within the account ([hermes], useful in multi-agent mode)",
"default": "hermes",
"env_var": "OPENVIKING_AGENT",
},
]
def initialize(self, session_id: str, **kwargs) -> None:
self._endpoint = os.environ.get("OPENVIKING_ENDPOINT", _DEFAULT_ENDPOINT)
self._api_key = os.environ.get("OPENVIKING_API_KEY", "")
self._account = os.environ.get("OPENVIKING_ACCOUNT", "default")
self._user = os.environ.get("OPENVIKING_USER", "default")
self._agent = os.environ.get("OPENVIKING_AGENT", "hermes")
self._session_id = session_id
self._turn_count = 0
try:
self._client = _VikingClient(self._endpoint, self._api_key)
self._client = _VikingClient(
self._endpoint, self._api_key,
account=self._account, user=self._user, agent=self._agent,
)
if not self._client.health():
logger.warning("OpenViking server at %s is not reachable", self._endpoint)
self._client = None
@@ -325,7 +352,8 @@ class OpenVikingMemoryProvider(MemoryProvider):
"(abstract/overview/full), viking_browse to explore.\n"
"Use viking_remember to store facts, viking_add_resource to index URLs/docs."
)
except Exception:
except Exception as e:
logger.warning("OpenViking system_prompt_block failed: %s", e)
return (
"# OpenViking Knowledge Base\n"
f"Active. Endpoint: {self._endpoint}\n"
@@ -351,7 +379,10 @@ class OpenVikingMemoryProvider(MemoryProvider):
def _run():
try:
client = _VikingClient(self._endpoint, self._api_key)
client = _VikingClient(
self._endpoint, self._api_key,
account=self._account, user=self._user, agent=self._agent,
)
resp = client.post("/api/v1/search/find", {
"query": query,
"top_k": 5,
@@ -386,7 +417,10 @@ class OpenVikingMemoryProvider(MemoryProvider):
def _sync():
try:
client = _VikingClient(self._endpoint, self._api_key)
client = _VikingClient(
self._endpoint, self._api_key,
account=self._account, user=self._user, agent=self._agent,
)
sid = self._session_id
# Add user message
@@ -442,7 +476,10 @@ class OpenVikingMemoryProvider(MemoryProvider):
def _write():
try:
client = _VikingClient(self._endpoint, self._api_key)
client = _VikingClient(
self._endpoint, self._api_key,
account=self._account, user=self._user, agent=self._agent,
)
# Add as a user message with memory context so the commit
# picks it up as an explicit memory during extraction
client.post(f"/api/v1/sessions/{self._session_id}/messages", {
+3
View File
@@ -63,10 +63,12 @@ homeassistant = ["aiohttp>=3.9.0,<4"]
sms = ["aiohttp>=3.9.0,<4"]
acp = ["agent-client-protocol>=0.9.0,<1.0"]
mistral = ["mistralai>=2.3.0,<3"]
bedrock = ["boto3>=1.35.0,<2"]
termux = [
# Tested Android / Termux path: keeps the core CLI feature-rich while
# avoiding extras that currently depend on non-Android wheels (notably
# faster-whisper -> ctranslate2 via the voice extra).
"python-telegram-bot[webhooks]>=22.6,<23",
"hermes-agent[cron]",
"hermes-agent[cli]",
"hermes-agent[pty]",
@@ -108,6 +110,7 @@ all = [
"hermes-agent[dingtalk]",
"hermes-agent[feishu]",
"hermes-agent[mistral]",
"hermes-agent[bedrock]",
"hermes-agent[web]",
]
+696 -84
View File
File diff suppressed because it is too large Load Diff
+6 -1
View File
@@ -28,7 +28,7 @@ BOLD='\033[1m'
# Configuration
REPO_URL_SSH="git@github.com:NousResearch/hermes-agent.git"
REPO_URL_HTTPS="https://github.com/NousResearch/hermes-agent.git"
HERMES_HOME="$HOME/.hermes"
HERMES_HOME="${HERMES_HOME:-$HOME/.hermes}"
INSTALL_DIR="${HERMES_INSTALL_DIR:-$HERMES_HOME/hermes-agent}"
PYTHON_VERSION="3.11"
NODE_VERSION="22"
@@ -66,6 +66,10 @@ while [[ $# -gt 0 ]]; do
INSTALL_DIR="$2"
shift 2
;;
--hermes-home)
HERMES_HOME="$2"
shift 2
;;
-h|--help)
echo "Hermes Agent Installer"
echo ""
@@ -76,6 +80,7 @@ while [[ $# -gt 0 ]]; do
echo " --skip-setup Skip interactive setup wizard"
echo " --branch NAME Git branch to install (default: main)"
echo " --dir PATH Installation directory (default: ~/.hermes/hermes-agent)"
echo " --hermes-home PATH Data directory (default: ~/.hermes, or \$HERMES_HOME)"
echo " -h, --help Show this help"
exit 0
;;
+15 -1
View File
@@ -62,7 +62,9 @@ AUTHOR_MAP = {
"258577966+voidborne-d@users.noreply.github.com": "voidborne-d",
"70424851+insecurejezza@users.noreply.github.com": "insecurejezza",
"259807879+Bartok9@users.noreply.github.com": "Bartok9",
"241404605+MestreY0d4-Uninter@users.noreply.github.com": "MestreY0d4-Uninter",
"268667990+Roy-oss1@users.noreply.github.com": "Roy-oss1",
"241404605+MestreY0d4-Uninter@users.noreply.github.com": "MestreY0d4-Uninter",
# contributors (manual mapping from git names)
"dmayhem93@gmail.com": "dmahan93",
"samherring99@gmail.com": "samherring99",
@@ -75,8 +77,13 @@ AUTHOR_MAP = {
"abdullahfarukozden@gmail.com": "Farukest",
"lovre.pesut@gmail.com": "rovle",
"hakanerten02@hotmail.com": "teyrebaz33",
"ruzzgarcn@gmail.com": "Ruzzgar",
"alireza78.crypto@gmail.com": "alireza78a",
"brooklyn.bb.nicholson@gmail.com": "brooklynnicholson",
"4317663+helix4u@users.noreply.github.com": "helix4u",
"331214+counterposition@users.noreply.github.com": "counterposition",
"blspear@gmail.com": "BrennerSpear",
"239876380+handsdiff@users.noreply.github.com": "handsdiff",
"gpickett00@gmail.com": "gpickett00",
"mcosma@gmail.com": "wakamex",
"clawdia.nash@proton.me": "clawdia-nash",
@@ -95,7 +102,9 @@ AUTHOR_MAP = {
"vincentcharlebois@gmail.com": "vincentcharlebois",
"aryan@synvoid.com": "aryansingh",
"johnsonblake1@gmail.com": "blakejohnson",
"greer.guthrie@gmail.com": "g-guthrie",
"kennyx102@gmail.com": "bobashopcashier",
"shokatalishaikh95@gmail.com": "areu01or00",
"bryan@intertwinesys.com": "bryanyoung",
"christo.mitov@gmail.com": "christomitov",
"hermes@nousresearch.com": "NousResearch",
@@ -115,6 +124,9 @@ AUTHOR_MAP = {
"m@statecraft.systems": "mbierling",
"balyan.sid@gmail.com": "balyansid",
"oluwadareab12@gmail.com": "bennytimz",
"simon@simonmarcus.org": "simon-marcus",
"xowiekk@gmail.com": "Xowiek",
"1243352777@qq.com": "zons-zhaozhy",
# ── bulk addition: 75 emails resolved via API, PR salvage bodies, noreply
# crossref, and GH contributor list matching (April 2026 audit) ──
"1115117931@qq.com": "aaronagent",
@@ -187,12 +199,14 @@ AUTHOR_MAP = {
"yangzhi.see@gmail.com": "SeeYangZhi",
"yongtenglei@gmail.com": "yongtenglei",
"young@YoungdeMacBook-Pro.local": "YoungYang963",
"ysfalweshcan@gmail.com": "Awsh1",
"ysfalweshcan@gmail.com": "Junass1",
"ysfwaxlycan@gmail.com": "WAXLYY",
"yusufalweshdemir@gmail.com": "Dusk1e",
"zhouboli@gmail.com": "zhouboli",
"zqiao@microsoft.com": "tomqiaozc",
"zzn+pa@zzn.im": "xinbenlv",
"zaynjarvis@gmail.com": "ZaynJarvis",
"zhiheng.liu@bytedance.com": "ZaynJarvis",
}
@@ -313,7 +313,7 @@ Type these during an interactive chat session.
```
~/.hermes/config.yaml Main configuration
~/.hermes/.env API keys and secrets
~/.hermes/skills/ Installed skills
$HERMES_HOME/skills/ Installed skills
~/.hermes/sessions/ Session transcripts
~/.hermes/logs/ Gateway and error logs
~/.hermes/auth.json OAuth tokens and credential pools
@@ -351,8 +351,8 @@ Full config reference: https://hermes-agent.nousresearch.com/docs/user-guide/con
|----------|------|-------------|
| OpenRouter | API key | `OPENROUTER_API_KEY` |
| Anthropic | API key | `ANTHROPIC_API_KEY` |
| Nous Portal | OAuth | `hermes login --provider nous` |
| OpenAI Codex | OAuth | `hermes login --provider openai-codex` |
| Nous Portal | OAuth | `hermes auth` |
| OpenAI Codex | OAuth | `hermes auth` |
| GitHub Copilot | Token | `COPILOT_GITHUB_TOKEN` |
| Google Gemini | API key | `GOOGLE_API_KEY` or `GEMINI_API_KEY` |
| DeepSeek | API key | `DEEPSEEK_API_KEY` |
@@ -650,9 +650,9 @@ registry.register(
)
```
**2. Add import** in `model_tools.py``_discover_tools()` list.
**2. Add to `toolsets.py`**`_HERMES_CORE_TOOLS` list.
**3. Add to `toolsets.py`** → `_HERMES_CORE_TOOLS` list.
Auto-discovery: any `tools/*.py` file with a top-level `registry.register()` call is imported automatically — no manual list needed.
All handlers must return JSON strings. Use `get_hermes_home()` for paths, never hardcode `~/.hermes`.
+1 -1
View File
@@ -334,7 +334,7 @@ When the user asks you to "review PR #N", "look at this PR", or gives you a PR U
### Step 1: Set up environment
```bash
source ~/.hermes/skills/github/github-auth/scripts/gh-env.sh
source "${HERMES_HOME:-$HOME/.hermes}/skills/github/github-auth/scripts/gh-env.sh"
# Or run the inline setup block from the top of this skill
```
@@ -6,7 +6,7 @@ All requests need: `-H "Authorization: token $GITHUB_TOKEN"`
Use the `gh-env.sh` helper to set `$GITHUB_TOKEN`, `$GH_OWNER`, `$GH_REPO` automatically:
```bash
source ~/.hermes/skills/github/github-auth/scripts/gh-env.sh
source "${HERMES_HOME:-$HOME/.hermes}/skills/github/github-auth/scripts/gh-env.sh"
```
## Repositories
@@ -98,7 +98,7 @@ def find_nearby(lat: float, lon: float, types: list[str], radius: int = 1500, li
# Get coordinates (nodes have lat/lon directly, ways/relations use center)
plat = el.get("lat") or (el.get("center", {}) or {}).get("lat")
plon = el.get("lon") or (el.get("center", {}) or {}).get("lon")
if not plat or not plon:
if plat is None or plon is None:
continue
dist = haversine(lat, lon, plat, plon)
+136 -105
View File
@@ -1,35 +1,19 @@
---
name: google-workspace
description: Gmail, Calendar, Drive, Contacts, Sheets, and Docs integration via gws CLI (googleworkspace/cli). Uses OAuth2 with automatic token refresh via bridge script. Requires gws binary.
version: 2.0.0
description: Gmail, Calendar, Drive, Contacts, Sheets, and Docs integration for Hermes. Uses Hermes-managed OAuth2 setup, prefers the Google Workspace CLI (`gws`) when available for broader API coverage, and falls back to the Python client libraries otherwise.
version: 1.0.0
author: Nous Research
license: MIT
required_credential_files:
- path: google_token.json
description: Google OAuth2 token (created by setup script)
- path: google_client_secret.json
description: Google OAuth2 client credentials (downloaded from Google Cloud Console)
metadata:
hermes:
tags: [Google, Gmail, Calendar, Drive, Sheets, Docs, Contacts, Email, OAuth, gws]
tags: [Google, Gmail, Calendar, Drive, Sheets, Docs, Contacts, Email, OAuth]
homepage: https://github.com/NousResearch/hermes-agent
related_skills: [himalaya]
---
# Google Workspace
Gmail, Calendar, Drive, Contacts, Sheets, and Docs — powered by `gws` (Google's official Rust CLI). The skill provides a backward-compatible Python wrapper that handles OAuth token refresh and delegates to `gws`.
## Architecture
```
google_api.py → gws_bridge.py → gws CLI
(argparse compat) (token refresh) (Google APIs)
```
- `setup.py` handles OAuth2 (headless-compatible, works on CLI/Telegram/Discord)
- `gws_bridge.py` refreshes the Hermes token and injects it into `gws` via `GOOGLE_WORKSPACE_CLI_TOKEN`
- `google_api.py` provides the same CLI interface as v1 but delegates to `gws`
Gmail, Calendar, Drive, Contacts, Sheets, and Docs — through Hermes-managed OAuth and a thin CLI wrapper. When `gws` is installed, the skill uses it as the execution backend for broader Google Workspace coverage; otherwise it falls back to the bundled Python client implementation.
## References
@@ -38,22 +22,7 @@ google_api.py → gws_bridge.py → gws CLI
## Scripts
- `scripts/setup.py` — OAuth2 setup (run once to authorize)
- `scripts/gws_bridge.py` — Token refresh bridge to gws CLI
- `scripts/google_api.py` — Backward-compatible API wrapper (delegates to gws)
## Prerequisites
Install `gws`:
```bash
cargo install google-workspace-cli
# or via npm (recommended, downloads prebuilt binary):
npm install -g @googleworkspace/cli
# or via Homebrew:
brew install googleworkspace-cli
```
Verify: `gws --version`
- `scripts/google_api.py` — compatibility wrapper CLI. It prefers `gws` for operations when available, while preserving Hermes' existing JSON output contract.
## First-Time Setup
@@ -63,13 +32,7 @@ on CLI, Telegram, Discord, or any platform.
Define a shorthand first:
```bash
HERMES_HOME="${HERMES_HOME:-$HOME/.hermes}"
GWORKSPACE_SKILL_DIR="$HERMES_HOME/skills/productivity/google-workspace"
PYTHON_BIN="${HERMES_PYTHON:-python3}"
if [ -x "$HERMES_HOME/hermes-agent/venv/bin/python" ]; then
PYTHON_BIN="$HERMES_HOME/hermes-agent/venv/bin/python"
fi
GSETUP="$PYTHON_BIN $GWORKSPACE_SKILL_DIR/scripts/setup.py"
GSETUP="python ${HERMES_HOME:-$HOME/.hermes}/skills/productivity/google-workspace/scripts/setup.py"
```
### Step 0: Check if already set up
@@ -82,88 +45,166 @@ If it prints `AUTHENTICATED`, skip to Usage — setup is already done.
### Step 1: Triage — ask the user what they need
Before starting OAuth setup, ask the user TWO questions:
**Question 1: "What Google services do you need? Just email, or also
Calendar/Drive/Sheets/Docs?"**
- **Email only** → Use the `himalaya` skill instead — simpler setup.
- **Calendar, Drive, Sheets, Docs (or email + these)** → Continue below.
- **Email only** They don't need this skill at all. Use the `himalaya` skill
instead — it works with a Gmail App Password (Settings → Security → App
Passwords) and takes 2 minutes to set up. No Google Cloud project needed.
Load the himalaya skill and follow its setup instructions.
**Partial scopes**: Users can authorize only a subset of services. The setup
script accepts partial scopes and warns about missing ones.
- **Email + Calendar** → Continue with this skill, but use
`--services email,calendar` during auth so the consent screen only asks for
the scopes they actually need.
**Question 2: "Does your Google account use Advanced Protection?"**
- **Calendar/Drive/Sheets/Docs only** → Continue with this skill and use a
narrower `--services` set like `calendar,drive,sheets,docs`.
- **No / Not sure** → Normal setup.
- **Yes** → Workspace admin must add the OAuth client ID to allowed apps first.
- **Full Workspace access** → Continue with this skill and use the default
`all` service set.
**Question 2: "Does your Google account use Advanced Protection (hardware
security keys required to sign in)? If you're not sure, you probably don't
— it's something you would have explicitly enrolled in."**
- **No / Not sure** → Normal setup. Continue below.
- **Yes** → Their Workspace admin must add the OAuth client ID to the org's
allowed apps list before Step 4 will work. Let them know upfront.
### Step 2: Create OAuth credentials (one-time, ~5 minutes)
Tell the user:
> 1. Go to https://console.cloud.google.com/apis/credentials
> 2. Create a project (or use an existing one)
> 3. Enable the APIs you need (Gmail, Calendar, Drive, Sheets, Docs, People)
> 4. Credentials → Create Credentials → OAuth 2.0 Client ID → Desktop app
> 5. Download JSON and tell me the file path
> You need a Google Cloud OAuth client. This is a one-time setup:
>
> 1. Create or select a project:
> https://console.cloud.google.com/projectselector2/home/dashboard
> 2. Enable the required APIs from the API Library:
> https://console.cloud.google.com/apis/library
> Enable: Gmail API, Google Calendar API, Google Drive API,
> Google Sheets API, Google Docs API, People API
> 3. Create the OAuth client here:
> https://console.cloud.google.com/apis/credentials
> Credentials → Create Credentials → OAuth 2.0 Client ID
> 4. Application type: "Desktop app" → Create
> 5. If the app is still in Testing, add the user's Google account as a test user here:
> https://console.cloud.google.com/auth/audience
> Audience → Test users → Add users
> 6. Download the JSON file and tell me the file path
>
> Important Hermes CLI note: if the file path starts with `/`, do NOT send only the bare path as its own message in the CLI, because it can be mistaken for a slash command. Send it in a sentence instead, like:
> `The JSON file path is: /home/user/Downloads/client_secret_....json`
Once they provide the path:
```bash
$GSETUP --client-secret /path/to/client_secret.json
```
If they paste the raw client ID / client secret values instead of a file path,
write a valid Desktop OAuth JSON file for them yourself, save it somewhere
explicit (for example `~/Downloads/hermes-google-client-secret.json`), then run
`--client-secret` against that file.
### Step 3: Get authorization URL
Use the service set chosen in Step 1. Examples:
```bash
$GSETUP --auth-url
$GSETUP --auth-url --services email,calendar --format json
$GSETUP --auth-url --services calendar,drive,sheets,docs --format json
$GSETUP --auth-url --services all --format json
```
Send the URL to the user. After authorizing, they paste back the redirect URL or code.
This returns JSON with an `auth_url` field and also saves the exact URL to
`~/.hermes/google_oauth_last_url.txt`.
Agent rules for this step:
- Extract the `auth_url` field and send that exact URL to the user as a single line.
- Tell the user that the browser will likely fail on `http://localhost:1` after approval, and that this is expected.
- Tell them to copy the ENTIRE redirected URL from the browser address bar.
- If the user gets `Error 403: access_denied`, send them directly to `https://console.cloud.google.com/auth/audience` to add themselves as a test user.
### Step 4: Exchange the code
The user will paste back either a URL like `http://localhost:1/?code=4/0A...&scope=...`
or just the code string. Either works. The `--auth-url` step stores a temporary
pending OAuth session locally so `--auth-code` can complete the PKCE exchange
later, even on headless systems:
```bash
$GSETUP --auth-code "THE_URL_OR_CODE_THE_USER_PASTED"
$GSETUP --auth-code "THE_URL_OR_CODE_THE_USER_PASTED" --format json
```
If `--auth-code` fails because the code expired, was already used, or came from
an older browser tab, it now returns a fresh `fresh_auth_url`. In that case,
immediately send the new URL to the user and have them retry with the newest
browser redirect only.
### Step 5: Verify
```bash
$GSETUP --check
```
Should print `AUTHENTICATED`. Token refreshes automatically from now on.
Should print `AUTHENTICATED`. Setup is complete — token refreshes automatically from now on.
### Notes
- Token is stored at `~/.hermes/google_token.json` and auto-refreshes.
- Pending OAuth session state/verifier are stored temporarily at `~/.hermes/google_oauth_pending.json` until exchange completes.
- If `gws` is installed, `google_api.py` points it at the same `~/.hermes/google_token.json` credentials file. Users do not need to run a separate `gws auth login` flow.
- To revoke: `$GSETUP --revoke`
## Usage
All commands go through the API script:
All commands go through the API script. Set `GAPI` as a shorthand:
```bash
HERMES_HOME="${HERMES_HOME:-$HOME/.hermes}"
GWORKSPACE_SKILL_DIR="$HERMES_HOME/skills/productivity/google-workspace"
PYTHON_BIN="${HERMES_PYTHON:-python3}"
if [ -x "$HERMES_HOME/hermes-agent/venv/bin/python" ]; then
PYTHON_BIN="$HERMES_HOME/hermes-agent/venv/bin/python"
fi
GAPI="$PYTHON_BIN $GWORKSPACE_SKILL_DIR/scripts/google_api.py"
GAPI="python ${HERMES_HOME:-$HOME/.hermes}/skills/productivity/google-workspace/scripts/google_api.py"
```
### Gmail
```bash
# Search (returns JSON array with id, from, subject, date, snippet)
$GAPI gmail search "is:unread" --max 10
$GAPI gmail search "from:boss@company.com newer_than:1d"
$GAPI gmail search "has:attachment filename:pdf newer_than:7d"
# Read full message (returns JSON with body text)
$GAPI gmail get MESSAGE_ID
# Send
$GAPI gmail send --to user@example.com --subject "Hello" --body "Message text"
$GAPI gmail send --to user@example.com --subject "Report" --body "<h1>Q4</h1>" --html
$GAPI gmail send --to user@example.com --subject "Report" --body "<h1>Q4</h1><p>Details...</p>" --html
$GAPI gmail send --to user@example.com --subject "Hello" --from '"Research Agent" <user@example.com>' --body "Message text"
# Reply (automatically threads and sets In-Reply-To)
$GAPI gmail reply MESSAGE_ID --body "Thanks, that works for me."
$GAPI gmail reply MESSAGE_ID --from '"Support Bot" <user@example.com>' --body "Thanks"
# Labels
$GAPI gmail labels
$GAPI gmail modify MESSAGE_ID --add-labels LABEL_ID
$GAPI gmail modify MESSAGE_ID --remove-labels UNREAD
```
### Calendar
```bash
# List events (defaults to next 7 days)
$GAPI calendar list
$GAPI calendar create --summary "Standup" --start 2026-03-01T10:00:00+01:00 --end 2026-03-01T10:30:00+01:00
$GAPI calendar create --summary "Review" --start ... --end ... --attendees "alice@co.com,bob@co.com"
$GAPI calendar list --start 2026-03-01T00:00:00Z --end 2026-03-07T23:59:59Z
# Create event (ISO 8601 with timezone required)
$GAPI calendar create --summary "Team Standup" --start 2026-03-01T10:00:00-06:00 --end 2026-03-01T10:30:00-06:00
$GAPI calendar create --summary "Lunch" --start 2026-03-01T12:00:00Z --end 2026-03-01T13:00:00Z --location "Cafe"
$GAPI calendar create --summary "Review" --start 2026-03-01T14:00:00Z --end 2026-03-01T15:00:00Z --attendees "alice@co.com,bob@co.com"
# Delete event
$GAPI calendar delete EVENT_ID
```
@@ -183,8 +224,13 @@ $GAPI contacts list --max 20
### Sheets
```bash
# Read
$GAPI sheets get SHEET_ID "Sheet1!A1:D10"
# Write
$GAPI sheets update SHEET_ID "Sheet1!A1:B2" --values '[["Name","Score"],["Alice","95"]]'
# Append rows
$GAPI sheets append SHEET_ID "Sheet1!A:C" --values '[["new","row","data"]]'
```
@@ -194,52 +240,37 @@ $GAPI sheets append SHEET_ID "Sheet1!A:C" --values '[["new","row","data"]]'
$GAPI docs get DOC_ID
```
### Direct gws access (advanced)
For operations not covered by the wrapper, use `gws_bridge.py` directly:
```bash
GBRIDGE="$PYTHON_BIN $GWORKSPACE_SKILL_DIR/scripts/gws_bridge.py"
$GBRIDGE calendar +agenda --today --format table
$GBRIDGE gmail +triage --labels --format json
$GBRIDGE drive +upload ./report.pdf
$GBRIDGE sheets +read --spreadsheet SHEET_ID --range "Sheet1!A1:D10"
```
## Output Format
All commands return JSON via `gws --format json`. Key output shapes:
All commands return JSON. Parse with `jq` or read directly. Key fields:
- **Gmail search/triage**: Array of message summaries (sender, subject, date, snippet)
- **Gmail get/read**: Message object with headers and body text
- **Gmail send/reply**: Confirmation with message ID
- **Calendar list/agenda**: Array of event objects (summary, start, end, location)
- **Calendar create**: Confirmation with event ID and htmlLink
- **Drive search**: Array of file objects (id, name, mimeType, webViewLink)
- **Sheets get/read**: 2D array of cell values
- **Docs get**: Full document JSON (use `body.content` for text extraction)
- **Contacts list**: Array of person objects with names, emails, phones
Parse output with `jq` or read JSON directly.
- **Gmail search**: `[{id, threadId, from, to, subject, date, snippet, labels}]`
- **Gmail get**: `{id, threadId, from, to, subject, date, labels, body}`
- **Gmail send/reply**: `{status: "sent", id, threadId}`
- **Calendar list**: `[{id, summary, start, end, location, description, htmlLink}]`
- **Calendar create**: `{status: "created", id, summary, htmlLink}`
- **Drive search**: `[{id, name, mimeType, modifiedTime, webViewLink}]`
- **Contacts list**: `[{name, emails: [...], phones: [...]}]`
- **Sheets get**: `[[cell, cell, ...], ...]`
## Rules
1. **Never send email or create/delete events without confirming with the user first.**
2. **Check auth before first use** — run `setup.py --check`.
3. **Use the Gmail search syntax reference** for complex queries.
4. **Calendar times must include timezone** — ISO 8601 with offset or UTC.
5. **Respect rate limits** — avoid rapid-fire sequential API calls.
1. **Never send email or create/delete events without confirming with the user first.** Show the draft content and ask for approval.
2. **Check auth before first use** — run `setup.py --check`. If it fails, guide the user through setup.
3. **Use the Gmail search syntax reference** for complex queries — load it with `skill_view("google-workspace", file_path="references/gmail-search-syntax.md")`.
4. **Calendar times must include timezone** always use ISO 8601 with offset (e.g., `2026-03-01T10:00:00-06:00`) or UTC (`Z`).
5. **Respect rate limits** — avoid rapid-fire sequential API calls. Batch reads when possible.
## Troubleshooting
| Problem | Fix |
|---------|-----|
| `NOT_AUTHENTICATED` | Run setup Steps 2-5 |
| `REFRESH_FAILED` | Token revoked — redo Steps 3-5 |
| `gws: command not found` | Install: `npm install -g @googleworkspace/cli` |
| `HttpError 403` | Missing scope — `$GSETUP --revoke` then redo Steps 3-5 |
| `HttpError 403: Access Not Configured` | Enable API in Google Cloud Console |
| Advanced Protection blocks auth | Admin must allowlist the OAuth client ID |
| `NOT_AUTHENTICATED` | Run setup Steps 2-5 above |
| `REFRESH_FAILED` | Token revoked or expired — redo Steps 3-5 |
| `HttpError 403: Insufficient Permission` | Missing API scope — `$GSETUP --revoke` then redo Steps 3-5 |
| `HttpError 403: Access Not Configured` | API not enabled — user needs to enable it in Google Cloud Console |
| `ModuleNotFoundError` | Run `$GSETUP --install-deps` |
| Advanced Protection blocks auth | Workspace admin must allowlist the OAuth client ID |
## Revoking Access
@@ -1,17 +1,17 @@
#!/usr/bin/env python3
"""Google Workspace API CLI for Hermes Agent.
Thin wrapper that delegates to gws (googleworkspace/cli) via gws_bridge.py.
Maintains the same CLI interface for backward compatibility with Hermes skills.
Uses the Google Workspace CLI (`gws`) when available, but preserves the
existing Hermes-facing JSON contract and falls back to the Python client
libraries if `gws` is not installed.
Usage:
python google_api.py gmail search "is:unread" [--max 10]
python google_api.py gmail get MESSAGE_ID
python google_api.py gmail send --to user@example.com --subject "Hi" --body "Hello"
python google_api.py gmail reply MESSAGE_ID --body "Thanks"
python google_api.py calendar list [--start DATE] [--end DATE] [--calendar primary]
python google_api.py calendar list [--from DATE] [--to DATE] [--calendar primary]
python google_api.py calendar create --summary "Meeting" --start DATETIME --end DATETIME
python google_api.py calendar delete EVENT_ID
python google_api.py drive search "budget report" [--max 10]
python google_api.py contacts list [--max 20]
python google_api.py sheets get SHEET_ID RANGE
@@ -21,47 +21,396 @@ Usage:
"""
import argparse
import base64
import json
import os
import shutil
import subprocess
import sys
from datetime import datetime, timedelta, timezone
from email.mime.text import MIMEText
from pathlib import Path
BRIDGE = Path(__file__).parent / "gws_bridge.py"
PYTHON = sys.executable
HERMES_HOME = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
TOKEN_PATH = HERMES_HOME / "google_token.json"
CLIENT_SECRET_PATH = HERMES_HOME / "google_client_secret.json"
SCOPES = [
"https://www.googleapis.com/auth/gmail.readonly",
"https://www.googleapis.com/auth/gmail.send",
"https://www.googleapis.com/auth/gmail.modify",
"https://www.googleapis.com/auth/calendar",
"https://www.googleapis.com/auth/drive.readonly",
"https://www.googleapis.com/auth/contacts.readonly",
"https://www.googleapis.com/auth/spreadsheets",
"https://www.googleapis.com/auth/documents.readonly",
]
def gws(*args: str) -> None:
"""Call gws via the bridge and exit with its return code."""
def _ensure_authenticated():
if not TOKEN_PATH.exists():
print("Not authenticated. Run the setup script first:", file=sys.stderr)
print(f" python {Path(__file__).parent / 'setup.py'}", file=sys.stderr)
sys.exit(1)
def _stored_token_scopes() -> list[str]:
try:
data = json.loads(TOKEN_PATH.read_text())
except Exception:
return list(SCOPES)
scopes = data.get("scopes")
if isinstance(scopes, list) and scopes:
return scopes
return list(SCOPES)
def _gws_binary() -> str | None:
override = os.getenv("HERMES_GWS_BIN")
if override:
return override
return shutil.which("gws")
def _gws_env() -> dict[str, str]:
env = os.environ.copy()
env["GOOGLE_WORKSPACE_CLI_CREDENTIALS_FILE"] = str(TOKEN_PATH)
return env
def _run_gws(parts: list[str], *, params: dict | None = None, body: dict | None = None):
binary = _gws_binary()
if not binary:
raise RuntimeError("gws not installed")
_ensure_authenticated()
cmd = [binary, *parts]
if params is not None:
cmd.extend(["--params", json.dumps(params)])
if body is not None:
cmd.extend(["--json", json.dumps(body)])
result = subprocess.run(
[PYTHON, str(BRIDGE)] + list(args),
env={**os.environ, "HERMES_HOME": os.environ.get("HERMES_HOME", str(Path.home() / ".hermes"))},
cmd,
capture_output=True,
text=True,
env=_gws_env(),
)
sys.exit(result.returncode)
if result.returncode != 0:
err = result.stderr.strip() or result.stdout.strip() or "Unknown gws error"
print(err, file=sys.stderr)
sys.exit(result.returncode or 1)
stdout = result.stdout.strip()
if not stdout:
return {}
try:
return json.loads(stdout)
except json.JSONDecodeError:
print("ERROR: Unexpected non-JSON output from gws:", file=sys.stderr)
print(stdout, file=sys.stderr)
sys.exit(1)
# -- Gmail --
def _headers_dict(msg: dict) -> dict[str, str]:
return {h["name"]: h["value"] for h in msg.get("payload", {}).get("headers", [])}
def _extract_message_body(msg: dict) -> str:
body = ""
payload = msg.get("payload", {})
if payload.get("body", {}).get("data"):
body = base64.urlsafe_b64decode(payload["body"]["data"]).decode("utf-8", errors="replace")
elif payload.get("parts"):
for part in payload["parts"]:
if part.get("mimeType") == "text/plain" and part.get("body", {}).get("data"):
body = base64.urlsafe_b64decode(part["body"]["data"]).decode("utf-8", errors="replace")
break
if not body:
for part in payload["parts"]:
if part.get("mimeType") == "text/html" and part.get("body", {}).get("data"):
body = base64.urlsafe_b64decode(part["body"]["data"]).decode("utf-8", errors="replace")
break
return body
def _extract_doc_text(doc: dict) -> str:
text_parts = []
for element in doc.get("body", {}).get("content", []):
paragraph = element.get("paragraph", {})
for pe in paragraph.get("elements", []):
text_run = pe.get("textRun", {})
if text_run.get("content"):
text_parts.append(text_run["content"])
return "".join(text_parts)
def _datetime_with_timezone(value: str) -> str:
if not value:
return value
if "T" not in value:
return value
if value.endswith("Z"):
return value
tail = value[10:]
if "+" in tail or "-" in tail:
return value
return value + "Z"
def get_credentials():
"""Load and refresh credentials from token file."""
_ensure_authenticated()
from google.oauth2.credentials import Credentials
from google.auth.transport.requests import Request
creds = Credentials.from_authorized_user_file(str(TOKEN_PATH), _stored_token_scopes())
if creds.expired and creds.refresh_token:
creds.refresh(Request())
TOKEN_PATH.write_text(creds.to_json())
if not creds.valid:
print("Token is invalid. Re-run setup.", file=sys.stderr)
sys.exit(1)
return creds
def build_service(api, version):
from googleapiclient.discovery import build
return build(api, version, credentials=get_credentials())
# =========================================================================
# Gmail
# =========================================================================
def gmail_search(args):
cmd = ["gmail", "+triage", "--query", args.query, "--max", str(args.max), "--format", "json"]
gws(*cmd)
if _gws_binary():
results = _run_gws(
["gmail", "users", "messages", "list"],
params={"userId": "me", "q": args.query, "maxResults": args.max},
)
messages = results.get("messages", [])
output = []
for msg_meta in messages:
msg = _run_gws(
["gmail", "users", "messages", "get"],
params={
"userId": "me",
"id": msg_meta["id"],
"format": "metadata",
"metadataHeaders": ["From", "To", "Subject", "Date"],
},
)
headers = _headers_dict(msg)
output.append(
{
"id": msg["id"],
"threadId": msg["threadId"],
"from": headers.get("From", ""),
"to": headers.get("To", ""),
"subject": headers.get("Subject", ""),
"date": headers.get("Date", ""),
"snippet": msg.get("snippet", ""),
"labels": msg.get("labelIds", []),
}
)
print(json.dumps(output, indent=2, ensure_ascii=False))
return
service = build_service("gmail", "v1")
results = service.users().messages().list(
userId="me", q=args.query, maxResults=args.max
).execute()
messages = results.get("messages", [])
if not messages:
print("No messages found.")
return
output = []
for msg_meta in messages:
msg = service.users().messages().get(
userId="me", id=msg_meta["id"], format="metadata",
metadataHeaders=["From", "To", "Subject", "Date"],
).execute()
headers = _headers_dict(msg)
output.append({
"id": msg["id"],
"threadId": msg["threadId"],
"from": headers.get("From", ""),
"to": headers.get("To", ""),
"subject": headers.get("Subject", ""),
"date": headers.get("Date", ""),
"snippet": msg.get("snippet", ""),
"labels": msg.get("labelIds", []),
})
print(json.dumps(output, indent=2, ensure_ascii=False))
def gmail_get(args):
gws("gmail", "+read", "--id", args.message_id, "--headers", "--format", "json")
if _gws_binary():
msg = _run_gws(
["gmail", "users", "messages", "get"],
params={"userId": "me", "id": args.message_id, "format": "full"},
)
headers = _headers_dict(msg)
result = {
"id": msg["id"],
"threadId": msg["threadId"],
"from": headers.get("From", ""),
"to": headers.get("To", ""),
"subject": headers.get("Subject", ""),
"date": headers.get("Date", ""),
"labels": msg.get("labelIds", []),
"body": _extract_message_body(msg),
}
print(json.dumps(result, indent=2, ensure_ascii=False))
return
service = build_service("gmail", "v1")
msg = service.users().messages().get(
userId="me", id=args.message_id, format="full"
).execute()
headers = _headers_dict(msg)
result = {
"id": msg["id"],
"threadId": msg["threadId"],
"from": headers.get("From", ""),
"to": headers.get("To", ""),
"subject": headers.get("Subject", ""),
"date": headers.get("Date", ""),
"labels": msg.get("labelIds", []),
"body": _extract_message_body(msg),
}
print(json.dumps(result, indent=2, ensure_ascii=False))
def gmail_send(args):
cmd = ["gmail", "+send", "--to", args.to, "--subject", args.subject, "--body", args.body, "--format", "json"]
if _gws_binary():
message = MIMEText(args.body, "html" if args.html else "plain")
message["to"] = args.to
message["subject"] = args.subject
if args.cc:
message["cc"] = args.cc
if args.from_header:
message["from"] = args.from_header
raw = base64.urlsafe_b64encode(message.as_bytes()).decode()
body = {"raw": raw}
if args.thread_id:
body["threadId"] = args.thread_id
result = _run_gws(
["gmail", "users", "messages", "send"],
params={"userId": "me"},
body=body,
)
print(json.dumps({"status": "sent", "id": result["id"], "threadId": result.get("threadId", "")}, indent=2))
return
service = build_service("gmail", "v1")
message = MIMEText(args.body, "html" if args.html else "plain")
message["to"] = args.to
message["subject"] = args.subject
if args.cc:
cmd += ["--cc", args.cc]
if args.html:
cmd.append("--html")
gws(*cmd)
message["cc"] = args.cc
if args.from_header:
message["from"] = args.from_header
raw = base64.urlsafe_b64encode(message.as_bytes()).decode()
body = {"raw": raw}
if args.thread_id:
body["threadId"] = args.thread_id
result = service.users().messages().send(userId="me", body=body).execute()
print(json.dumps({"status": "sent", "id": result["id"], "threadId": result.get("threadId", "")}, indent=2))
def gmail_reply(args):
gws("gmail", "+reply", "--message-id", args.message_id, "--body", args.body, "--format", "json")
if _gws_binary():
original = _run_gws(
["gmail", "users", "messages", "get"],
params={
"userId": "me",
"id": args.message_id,
"format": "metadata",
"metadataHeaders": ["From", "Subject", "Message-ID"],
},
)
headers = _headers_dict(original)
subject = headers.get("Subject", "")
if not subject.startswith("Re:"):
subject = f"Re: {subject}"
message = MIMEText(args.body)
message["to"] = headers.get("From", "")
message["subject"] = subject
if args.from_header:
message["from"] = args.from_header
if headers.get("Message-ID"):
message["In-Reply-To"] = headers["Message-ID"]
message["References"] = headers["Message-ID"]
raw = base64.urlsafe_b64encode(message.as_bytes()).decode()
result = _run_gws(
["gmail", "users", "messages", "send"],
params={"userId": "me"},
body={"raw": raw, "threadId": original["threadId"]},
)
print(json.dumps({"status": "sent", "id": result["id"], "threadId": result.get("threadId", "")}, indent=2))
return
service = build_service("gmail", "v1")
original = service.users().messages().get(
userId="me", id=args.message_id, format="metadata",
metadataHeaders=["From", "Subject", "Message-ID"],
).execute()
headers = _headers_dict(original)
subject = headers.get("Subject", "")
if not subject.startswith("Re:"):
subject = f"Re: {subject}"
message = MIMEText(args.body)
message["to"] = headers.get("From", "")
message["subject"] = subject
if args.from_header:
message["from"] = args.from_header
if headers.get("Message-ID"):
message["In-Reply-To"] = headers["Message-ID"]
message["References"] = headers["Message-ID"]
raw = base64.urlsafe_b64encode(message.as_bytes()).decode()
body = {"raw": raw, "threadId": original["threadId"]}
result = service.users().messages().send(userId="me", body=body).execute()
print(json.dumps({"status": "sent", "id": result["id"], "threadId": result.get("threadId", "")}, indent=2))
def gmail_labels(args):
gws("gmail", "users", "labels", "list", "--params", json.dumps({"userId": "me"}), "--format", "json")
if _gws_binary():
results = _run_gws(["gmail", "users", "labels", "list"], params={"userId": "me"})
labels = [{"id": l["id"], "name": l["name"], "type": l.get("type", "")} for l in results.get("labels", [])]
print(json.dumps(labels, indent=2))
return
service = build_service("gmail", "v1")
results = service.users().labels().list(userId="me").execute()
labels = [{"id": l["id"], "name": l["name"], "type": l.get("type", "")} for l in results.get("labels", [])]
print(json.dumps(labels, indent=2))
def gmail_modify(args):
body = {}
@@ -69,145 +418,310 @@ def gmail_modify(args):
body["addLabelIds"] = args.add_labels.split(",")
if args.remove_labels:
body["removeLabelIds"] = args.remove_labels.split(",")
gws(
"gmail", "users", "messages", "modify",
"--params", json.dumps({"userId": "me", "id": args.message_id}),
"--json", json.dumps(body),
"--format", "json",
)
if _gws_binary():
result = _run_gws(
["gmail", "users", "messages", "modify"],
params={"userId": "me", "id": args.message_id},
body=body,
)
print(json.dumps({"id": result["id"], "labels": result.get("labelIds", [])}, indent=2))
return
service = build_service("gmail", "v1")
result = service.users().messages().modify(userId="me", id=args.message_id, body=body).execute()
print(json.dumps({"id": result["id"], "labels": result.get("labelIds", [])}, indent=2))
# -- Calendar --
# =========================================================================
# Calendar
# =========================================================================
def calendar_list(args):
if args.start or args.end:
# Specific date range — use raw Calendar API for precise timeMin/timeMax
from datetime import datetime, timedelta, timezone as tz
now = datetime.now(tz.utc)
time_min = args.start or now.isoformat()
time_max = args.end or (now + timedelta(days=7)).isoformat()
gws(
"calendar", "events", "list",
"--params", json.dumps({
now = datetime.now(timezone.utc)
time_min = _datetime_with_timezone(args.start or now.isoformat())
time_max = _datetime_with_timezone(args.end or (now + timedelta(days=7)).isoformat())
if _gws_binary():
results = _run_gws(
["calendar", "events", "list"],
params={
"calendarId": args.calendar,
"timeMin": time_min,
"timeMax": time_max,
"maxResults": args.max,
"singleEvents": True,
"orderBy": "startTime",
}),
"--format", "json",
},
)
else:
# No date range — use +agenda helper (defaults to 7 days)
cmd = ["calendar", "+agenda", "--days", "7", "--format", "json"]
if args.calendar != "primary":
cmd += ["--calendar", args.calendar]
gws(*cmd)
events = []
for e in results.get("items", []):
events.append({
"id": e["id"],
"summary": e.get("summary", "(no title)"),
"start": e.get("start", {}).get("dateTime", e.get("start", {}).get("date", "")),
"end": e.get("end", {}).get("dateTime", e.get("end", {}).get("date", "")),
"location": e.get("location", ""),
"description": e.get("description", ""),
"status": e.get("status", ""),
"htmlLink": e.get("htmlLink", ""),
})
print(json.dumps(events, indent=2, ensure_ascii=False))
return
service = build_service("calendar", "v3")
results = service.events().list(
calendarId=args.calendar, timeMin=time_min, timeMax=time_max,
maxResults=args.max, singleEvents=True, orderBy="startTime",
).execute()
events = []
for e in results.get("items", []):
events.append({
"id": e["id"],
"summary": e.get("summary", "(no title)"),
"start": e.get("start", {}).get("dateTime", e.get("start", {}).get("date", "")),
"end": e.get("end", {}).get("dateTime", e.get("end", {}).get("date", "")),
"location": e.get("location", ""),
"description": e.get("description", ""),
"status": e.get("status", ""),
"htmlLink": e.get("htmlLink", ""),
})
print(json.dumps(events, indent=2, ensure_ascii=False))
def calendar_create(args):
cmd = [
"calendar", "+insert",
"--summary", args.summary,
"--start", args.start,
"--end", args.end,
"--format", "json",
]
event = {
"summary": args.summary,
"start": {"dateTime": args.start},
"end": {"dateTime": args.end},
}
if args.location:
cmd += ["--location", args.location]
event["location"] = args.location
if args.description:
cmd += ["--description", args.description]
event["description"] = args.description
if args.attendees:
for email in args.attendees.split(","):
cmd += ["--attendee", email.strip()]
if args.calendar != "primary":
cmd += ["--calendar", args.calendar]
gws(*cmd)
event["attendees"] = [{"email": e.strip()} for e in args.attendees.split(",") if e.strip()]
if _gws_binary():
result = _run_gws(
["calendar", "events", "insert"],
params={"calendarId": args.calendar},
body=event,
)
print(json.dumps({
"status": "created",
"id": result["id"],
"summary": result.get("summary", ""),
"htmlLink": result.get("htmlLink", ""),
}, indent=2))
return
service = build_service("calendar", "v3")
result = service.events().insert(calendarId=args.calendar, body=event).execute()
print(json.dumps({
"status": "created",
"id": result["id"],
"summary": result.get("summary", ""),
"htmlLink": result.get("htmlLink", ""),
}, indent=2))
def calendar_delete(args):
gws(
"calendar", "events", "delete",
"--params", json.dumps({"calendarId": args.calendar, "eventId": args.event_id}),
"--format", "json",
)
if _gws_binary():
_run_gws(["calendar", "events", "delete"], params={"calendarId": args.calendar, "eventId": args.event_id})
print(json.dumps({"status": "deleted", "eventId": args.event_id}))
return
service = build_service("calendar", "v3")
service.events().delete(calendarId=args.calendar, eventId=args.event_id).execute()
print(json.dumps({"status": "deleted", "eventId": args.event_id}))
# -- Drive --
# =========================================================================
# Drive
# =========================================================================
def drive_search(args):
query = args.query if args.raw_query else f"fullText contains '{args.query}'"
gws(
"drive", "files", "list",
"--params", json.dumps({
"q": query,
"pageSize": args.max,
"fields": "files(id,name,mimeType,modifiedTime,webViewLink)",
}),
"--format", "json",
)
if _gws_binary():
results = _run_gws(
["drive", "files", "list"],
params={
"q": query,
"pageSize": args.max,
"fields": "files(id, name, mimeType, modifiedTime, webViewLink)",
},
)
print(json.dumps(results.get("files", []), indent=2, ensure_ascii=False))
return
service = build_service("drive", "v3")
results = service.files().list(
q=query, pageSize=args.max, fields="files(id, name, mimeType, modifiedTime, webViewLink)",
).execute()
files = results.get("files", [])
print(json.dumps(files, indent=2, ensure_ascii=False))
# -- Contacts --
# =========================================================================
# Contacts
# =========================================================================
def contacts_list(args):
gws(
"people", "people", "connections", "list",
"--params", json.dumps({
"resourceName": "people/me",
"pageSize": args.max,
"personFields": "names,emailAddresses,phoneNumbers",
}),
"--format", "json",
)
if _gws_binary():
results = _run_gws(
["people", "people", "connections", "list"],
params={
"resourceName": "people/me",
"pageSize": args.max,
"personFields": "names,emailAddresses,phoneNumbers",
},
)
contacts = []
for person in results.get("connections", []):
names = person.get("names", [{}])
emails = person.get("emailAddresses", [])
phones = person.get("phoneNumbers", [])
contacts.append({
"name": names[0].get("displayName", "") if names else "",
"emails": [e.get("value", "") for e in emails],
"phones": [p.get("value", "") for p in phones],
})
print(json.dumps(contacts, indent=2, ensure_ascii=False))
return
service = build_service("people", "v1")
results = service.people().connections().list(
resourceName="people/me",
pageSize=args.max,
personFields="names,emailAddresses,phoneNumbers",
).execute()
contacts = []
for person in results.get("connections", []):
names = person.get("names", [{}])
emails = person.get("emailAddresses", [])
phones = person.get("phoneNumbers", [])
contacts.append({
"name": names[0].get("displayName", "") if names else "",
"emails": [e.get("value", "") for e in emails],
"phones": [p.get("value", "") for p in phones],
})
print(json.dumps(contacts, indent=2, ensure_ascii=False))
# -- Sheets --
# =========================================================================
# Sheets
# =========================================================================
def sheets_get(args):
gws(
"sheets", "+read",
"--spreadsheet", args.sheet_id,
"--range", args.range,
"--format", "json",
)
if _gws_binary():
result = _run_gws(
["sheets", "spreadsheets", "values", "get"],
params={"spreadsheetId": args.sheet_id, "range": args.range},
)
print(json.dumps(result.get("values", []), indent=2, ensure_ascii=False))
return
service = build_service("sheets", "v4")
result = service.spreadsheets().values().get(
spreadsheetId=args.sheet_id, range=args.range,
).execute()
print(json.dumps(result.get("values", []), indent=2, ensure_ascii=False))
def sheets_update(args):
values = json.loads(args.values)
gws(
"sheets", "spreadsheets", "values", "update",
"--params", json.dumps({
"spreadsheetId": args.sheet_id,
"range": args.range,
"valueInputOption": "USER_ENTERED",
}),
"--json", json.dumps({"values": values}),
"--format", "json",
)
body = {"values": values}
if _gws_binary():
result = _run_gws(
["sheets", "spreadsheets", "values", "update"],
params={
"spreadsheetId": args.sheet_id,
"range": args.range,
"valueInputOption": "USER_ENTERED",
},
body=body,
)
print(json.dumps({"updatedCells": result.get("updatedCells", 0), "updatedRange": result.get("updatedRange", "")}, indent=2))
return
service = build_service("sheets", "v4")
result = service.spreadsheets().values().update(
spreadsheetId=args.sheet_id, range=args.range,
valueInputOption="USER_ENTERED", body=body,
).execute()
print(json.dumps({"updatedCells": result.get("updatedCells", 0), "updatedRange": result.get("updatedRange", "")}, indent=2))
def sheets_append(args):
values = json.loads(args.values)
gws(
"sheets", "+append",
"--spreadsheet", args.sheet_id,
"--json-values", json.dumps(values),
"--format", "json",
)
body = {"values": values}
if _gws_binary():
result = _run_gws(
["sheets", "spreadsheets", "values", "append"],
params={
"spreadsheetId": args.sheet_id,
"range": args.range,
"valueInputOption": "USER_ENTERED",
"insertDataOption": "INSERT_ROWS",
},
body=body,
)
print(json.dumps({"updatedCells": result.get("updates", {}).get("updatedCells", 0)}, indent=2))
return
service = build_service("sheets", "v4")
result = service.spreadsheets().values().append(
spreadsheetId=args.sheet_id, range=args.range,
valueInputOption="USER_ENTERED", insertDataOption="INSERT_ROWS", body=body,
).execute()
print(json.dumps({"updatedCells": result.get("updates", {}).get("updatedCells", 0)}, indent=2))
# -- Docs --
# =========================================================================
# Docs
# =========================================================================
def docs_get(args):
gws(
"docs", "documents", "get",
"--params", json.dumps({"documentId": args.doc_id}),
"--format", "json",
)
if _gws_binary():
doc = _run_gws(["docs", "documents", "get"], params={"documentId": args.doc_id})
result = {
"title": doc.get("title", ""),
"documentId": doc.get("documentId", ""),
"body": _extract_doc_text(doc),
}
print(json.dumps(result, indent=2, ensure_ascii=False))
return
service = build_service("docs", "v1")
doc = service.documents().get(documentId=args.doc_id).execute()
result = {
"title": doc.get("title", ""),
"documentId": doc.get("documentId", ""),
"body": _extract_doc_text(doc),
}
print(json.dumps(result, indent=2, ensure_ascii=False))
# -- CLI parser (backward-compatible interface) --
# =========================================================================
# CLI parser
# =========================================================================
def main():
parser = argparse.ArgumentParser(description="Google Workspace API for Hermes Agent (gws backend)")
parser = argparse.ArgumentParser(description="Google Workspace API for Hermes Agent")
sub = parser.add_subparsers(dest="service", required=True)
# --- Gmail ---
@@ -228,13 +742,15 @@ def main():
p.add_argument("--subject", required=True)
p.add_argument("--body", required=True)
p.add_argument("--cc", default="")
p.add_argument("--from", dest="from_header", default="", help="Custom From header (e.g. '\"Agent Name\" <user@example.com>')")
p.add_argument("--html", action="store_true", help="Send body as HTML")
p.add_argument("--thread-id", default="", help="Thread ID (unused with gws, kept for compat)")
p.add_argument("--thread-id", default="", help="Thread ID for threading")
p.set_defaults(func=gmail_send)
p = gmail_sub.add_parser("reply")
p.add_argument("message_id", help="Message ID to reply to")
p.add_argument("--body", required=True)
p.add_argument("--from", dest="from_header", default="", help="Custom From header (e.g. '\"Agent Name\" <user@example.com>')")
p.set_defaults(func=gmail_reply)
p = gmail_sub.add_parser("labels")
@@ -25,6 +25,13 @@ def refresh_token(token_data: dict) -> dict:
import urllib.parse
import urllib.request
required_keys = ["client_id", "client_secret", "refresh_token", "token_uri"]
missing = [k for k in required_keys if k not in token_data]
if missing:
print(f"ERROR: google_token.json is missing required fields: {', '.join(missing)}", file=sys.stderr)
print("Please re-authenticate by running the Google Workspace setup script.", file=sys.stderr)
sys.exit(1)
params = urllib.parse.urlencode({
"client_id": token_data["client_id"],
"client_secret": token_data["client_secret"],
+3 -3
View File
@@ -60,7 +60,7 @@ The fastest path — auto-detect the model, test strategies, and lock in the win
# In execute_code — use the loader to avoid exec-scoping issues:
import os
exec(open(os.path.expanduser(
"~/.hermes/skills/red-teaming/godmode/scripts/load_godmode.py"
os.path.join(os.environ.get("HERMES_HOME", os.path.expanduser("~/.hermes")), "skills/red-teaming/godmode/scripts/load_godmode.py")
)).read())
# Auto-detect model from config and jailbreak it
@@ -192,7 +192,7 @@ python3 scripts/parseltongue.py "How do I hack into a WiFi network?" --tier stan
Or use `execute_code` inline:
```python
# Load the parseltongue module
exec(open(os.path.expanduser("~/.hermes/skills/red-teaming/godmode/scripts/parseltongue.py")).read())
exec(open(os.path.join(os.environ.get("HERMES_HOME", os.path.expanduser("~/.hermes")), "skills/red-teaming/godmode/scripts/parseltongue.py")).read())
query = "How do I hack into a WiFi network?"
variants = generate_variants(query, tier="standard")
@@ -229,7 +229,7 @@ Race multiple models against the same query, score responses, pick the winner:
```python
# Via execute_code
exec(open(os.path.expanduser("~/.hermes/skills/red-teaming/godmode/scripts/godmode_race.py")).read())
exec(open(os.path.join(os.environ.get("HERMES_HOME", os.path.expanduser("~/.hermes")), "skills/red-teaming/godmode/scripts/godmode_race.py")).read())
result = race_models(
query="Explain how SQL injection works with a practical example",
@@ -114,7 +114,7 @@ hermes
### Via the GODMODE CLASSIC racer script
```python
exec(open(os.path.expanduser("~/.hermes/skills/red-teaming/godmode/scripts/godmode_race.py")).read())
exec(open(os.path.join(os.environ.get("HERMES_HOME", os.path.expanduser("~/.hermes")), "skills/red-teaming/godmode/scripts/godmode_race.py")).read())
result = race_godmode_classic("Your query here")
print(f"Winner: {result['codename']} — Score: {result['score']}")
print(result['content'])
@@ -129,7 +129,7 @@ These don't auto-reject but reduce the response score:
## Using in Python
```python
exec(open(os.path.expanduser("~/.hermes/skills/red-teaming/godmode/scripts/godmode_race.py")).read())
exec(open(os.path.join(os.environ.get("HERMES_HOME", os.path.expanduser("~/.hermes")), "skills/red-teaming/godmode/scripts/godmode_race.py")).read())
# Check if a response is a refusal
text = "I'm sorry, but I can't assist with that request."
@@ -7,7 +7,7 @@ finds what works, and locks it in by writing config.yaml + prefill.json.
Usage in execute_code:
exec(open(os.path.expanduser(
"~/.hermes/skills/red-teaming/godmode/scripts/auto_jailbreak.py"
os.path.join(os.environ.get("HERMES_HOME", os.path.expanduser("~/.hermes")), "skills/red-teaming/godmode/scripts/auto_jailbreak.py")
)).read())
result = auto_jailbreak() # Uses current model from config
@@ -7,7 +7,7 @@ Queries multiple models in parallel via OpenRouter, scores responses
on quality/filteredness/speed, returns the best unfiltered answer.
Usage in execute_code:
exec(open(os.path.expanduser("~/.hermes/skills/red-teaming/godmode/scripts/godmode_race.py")).read())
exec(open(os.path.join(os.environ.get("HERMES_HOME", os.path.expanduser("~/.hermes")), "skills/red-teaming/godmode/scripts/godmode_race.py")).read())
result = race_models(
query="Your query here",
@@ -3,7 +3,7 @@ Loader for G0DM0D3 scripts. Handles the exec-scoping issues.
Usage in execute_code:
exec(open(os.path.expanduser(
"~/.hermes/skills/red-teaming/godmode/scripts/load_godmode.py"
os.path.join(os.environ.get("HERMES_HOME", os.path.expanduser("~/.hermes")), "skills/red-teaming/godmode/scripts/load_godmode.py")
)).read())
# Now all functions are available:
@@ -11,7 +11,7 @@ Usage:
python parseltongue.py "How do I hack a WiFi network?" --tier standard
# As a module in execute_code
exec(open("~/.hermes/skills/red-teaming/godmode/scripts/parseltongue.py").read())
exec(open(os.path.join(os.environ.get("HERMES_HOME", os.path.expanduser("~/.hermes")), "skills/red-teaming/godmode/scripts/parseltongue.py")).read())
variants = generate_variants("How do I hack a WiFi network?", tier="standard")
"""
+16 -7
View File
@@ -89,7 +89,8 @@ class TestReadCodexAccessToken:
hermes_home.mkdir(parents=True, exist_ok=True)
(hermes_home / "auth.json").write_text(json.dumps({"version": 1, "providers": {}}))
monkeypatch.setenv("HERMES_HOME", str(hermes_home))
result = _read_codex_access_token()
with patch("agent.auxiliary_client._select_pool_entry", return_value=(False, None)):
result = _read_codex_access_token()
assert result is None
def test_empty_token_returns_none(self, tmp_path, monkeypatch):
@@ -146,7 +147,8 @@ class TestReadCodexAccessToken:
},
}))
monkeypatch.setenv("HERMES_HOME", str(hermes_home))
result = _read_codex_access_token()
with patch("agent.auxiliary_client._select_pool_entry", return_value=(False, None)):
result = _read_codex_access_token()
assert result is None, "Expired JWT should return None"
def test_valid_jwt_returns_token(self, tmp_path, monkeypatch):
@@ -585,7 +587,10 @@ class TestGetTextAuxiliaryClient:
assert call_kwargs.kwargs["base_url"] == "http://localhost:1234/v1"
def test_codex_fallback_when_nothing_else(self, codex_auth_dir):
with patch("agent.auxiliary_client._read_nous_auth", return_value=None), \
with patch("agent.auxiliary_client._try_openrouter", return_value=(None, None)), \
patch("agent.auxiliary_client._try_nous", return_value=(None, None)), \
patch("agent.auxiliary_client._try_custom_endpoint", return_value=(None, None)), \
patch("agent.auxiliary_client._read_main_provider", return_value="openrouter"), \
patch("agent.auxiliary_client.OpenAI") as mock_openai:
client, model = get_text_auxiliary_client()
assert model == "gpt-5.2-codex"
@@ -623,17 +628,21 @@ class TestGetTextAuxiliaryClient:
monkeypatch.delenv("OPENAI_BASE_URL", raising=False)
monkeypatch.delenv("OPENAI_API_KEY", raising=False)
monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
with patch("agent.auxiliary_client._read_nous_auth", return_value=None), \
patch("agent.auxiliary_client._read_codex_access_token", return_value=None), \
patch("agent.auxiliary_client._resolve_api_key_provider", return_value=(None, None)):
with patch("agent.auxiliary_client._resolve_auto", return_value=(None, None)):
client, model = get_text_auxiliary_client()
assert client is None
assert model is None
def test_custom_endpoint_uses_codex_wrapper_when_runtime_requests_responses_api(self):
def test_custom_endpoint_uses_codex_wrapper_when_runtime_requests_responses_api(self, monkeypatch):
monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
monkeypatch.delenv("OPENAI_API_KEY", raising=False)
monkeypatch.delenv("OPENAI_BASE_URL", raising=False)
with patch("agent.auxiliary_client._resolve_custom_runtime",
return_value=("https://api.openai.com/v1", "sk-test", "codex_responses")), \
patch("agent.auxiliary_client._read_main_model", return_value="gpt-5.3-codex"), \
patch("agent.auxiliary_client._try_openrouter", return_value=(None, None)), \
patch("agent.auxiliary_client._try_nous", return_value=(None, None)), \
patch("agent.auxiliary_client._read_main_provider", return_value="openrouter"), \
patch("agent.auxiliary_client.OpenAI") as mock_openai:
client, model = get_text_auxiliary_client()
@@ -232,7 +232,7 @@ class TestResolveVisionProviderClientModelNormalization:
assert provider == "zai"
assert client is not None
assert model == "glm-5.1"
assert model == "glm-5v-turbo" # zai has dedicated vision model in _PROVIDER_VISION_MODELS
class TestVisionPathApiMode:
File diff suppressed because it is too large Load Diff
+269
View File
@@ -0,0 +1,269 @@
"""Integration tests for the AWS Bedrock provider wiring.
Verifies that the Bedrock provider is correctly registered in the
provider registry, model catalog, and runtime resolution pipeline.
These tests do NOT require AWS credentials or boto3 all AWS calls
are mocked.
Note: Tests that import ``hermes_cli.auth`` or ``hermes_cli.runtime_provider``
require Python 3.10+ due to ``str | None`` type syntax in the import chain.
"""
import os
from unittest.mock import MagicMock, patch
import pytest
class TestProviderRegistry:
"""Verify Bedrock is registered in PROVIDER_REGISTRY."""
def test_bedrock_in_registry(self):
from hermes_cli.auth import PROVIDER_REGISTRY
assert "bedrock" in PROVIDER_REGISTRY
def test_bedrock_auth_type_is_aws_sdk(self):
from hermes_cli.auth import PROVIDER_REGISTRY
pconfig = PROVIDER_REGISTRY["bedrock"]
assert pconfig.auth_type == "aws_sdk"
def test_bedrock_has_no_api_key_env_vars(self):
"""Bedrock uses the AWS SDK credential chain, not API keys."""
from hermes_cli.auth import PROVIDER_REGISTRY
pconfig = PROVIDER_REGISTRY["bedrock"]
assert pconfig.api_key_env_vars == ()
def test_bedrock_base_url_env_var(self):
from hermes_cli.auth import PROVIDER_REGISTRY
pconfig = PROVIDER_REGISTRY["bedrock"]
assert pconfig.base_url_env_var == "BEDROCK_BASE_URL"
class TestProviderAliases:
"""Verify Bedrock aliases resolve correctly."""
def test_aws_alias(self):
from hermes_cli.models import _PROVIDER_ALIASES
assert _PROVIDER_ALIASES.get("aws") == "bedrock"
def test_aws_bedrock_alias(self):
from hermes_cli.models import _PROVIDER_ALIASES
assert _PROVIDER_ALIASES.get("aws-bedrock") == "bedrock"
def test_amazon_bedrock_alias(self):
from hermes_cli.models import _PROVIDER_ALIASES
assert _PROVIDER_ALIASES.get("amazon-bedrock") == "bedrock"
def test_amazon_alias(self):
from hermes_cli.models import _PROVIDER_ALIASES
assert _PROVIDER_ALIASES.get("amazon") == "bedrock"
class TestProviderLabels:
"""Verify Bedrock appears in provider labels."""
def test_bedrock_label(self):
from hermes_cli.models import _PROVIDER_LABELS
assert _PROVIDER_LABELS.get("bedrock") == "AWS Bedrock"
class TestModelCatalog:
"""Verify Bedrock has a static model fallback list."""
def test_bedrock_has_curated_models(self):
from hermes_cli.models import _PROVIDER_MODELS
models = _PROVIDER_MODELS.get("bedrock", [])
assert len(models) > 0
def test_bedrock_models_include_claude(self):
from hermes_cli.models import _PROVIDER_MODELS
models = _PROVIDER_MODELS.get("bedrock", [])
claude_models = [m for m in models if "anthropic.claude" in m]
assert len(claude_models) > 0
def test_bedrock_models_include_nova(self):
from hermes_cli.models import _PROVIDER_MODELS
models = _PROVIDER_MODELS.get("bedrock", [])
nova_models = [m for m in models if "amazon.nova" in m]
assert len(nova_models) > 0
class TestResolveProvider:
"""Verify resolve_provider() handles bedrock correctly."""
def test_explicit_bedrock_resolves(self, monkeypatch):
"""When user explicitly requests 'bedrock', it should resolve."""
from hermes_cli.auth import PROVIDER_REGISTRY
# bedrock is in the registry, so resolve_provider should return it
from hermes_cli.auth import resolve_provider
result = resolve_provider("bedrock")
assert result == "bedrock"
def test_aws_alias_resolves_to_bedrock(self):
from hermes_cli.auth import resolve_provider
result = resolve_provider("aws")
assert result == "bedrock"
def test_amazon_bedrock_alias_resolves(self):
from hermes_cli.auth import resolve_provider
result = resolve_provider("amazon-bedrock")
assert result == "bedrock"
def test_auto_detect_with_aws_credentials(self, monkeypatch):
"""When AWS credentials are present and no other provider is configured,
auto-detect should find bedrock."""
from hermes_cli.auth import resolve_provider
# Clear all other provider env vars
for var in ["OPENAI_API_KEY", "OPENROUTER_API_KEY", "ANTHROPIC_API_KEY",
"ANTHROPIC_TOKEN", "GOOGLE_API_KEY", "DEEPSEEK_API_KEY"]:
monkeypatch.delenv(var, raising=False)
# Set AWS credentials
monkeypatch.setenv("AWS_ACCESS_KEY_ID", "AKIAIOSFODNN7EXAMPLE")
monkeypatch.setenv("AWS_SECRET_ACCESS_KEY", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
# Mock the auth store to have no active provider
with patch("hermes_cli.auth._load_auth_store", return_value={}):
result = resolve_provider("auto")
assert result == "bedrock"
class TestRuntimeProvider:
"""Verify resolve_runtime_provider() handles bedrock correctly."""
def test_bedrock_runtime_resolution(self, monkeypatch):
from hermes_cli.runtime_provider import resolve_runtime_provider
monkeypatch.setenv("AWS_ACCESS_KEY_ID", "AKIAIOSFODNN7EXAMPLE")
monkeypatch.setenv("AWS_SECRET_ACCESS_KEY", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
monkeypatch.setenv("AWS_REGION", "eu-west-1")
# Mock resolve_provider to return bedrock
with patch("hermes_cli.runtime_provider.resolve_provider", return_value="bedrock"), \
patch("hermes_cli.runtime_provider._get_model_config", return_value={"provider": "bedrock"}):
result = resolve_runtime_provider(requested="bedrock")
assert result["provider"] == "bedrock"
assert result["api_mode"] == "bedrock_converse"
assert result["region"] == "eu-west-1"
assert "bedrock-runtime.eu-west-1.amazonaws.com" in result["base_url"]
assert result["api_key"] == "aws-sdk"
def test_bedrock_runtime_default_region(self, monkeypatch):
from hermes_cli.runtime_provider import resolve_runtime_provider
monkeypatch.setenv("AWS_PROFILE", "default")
monkeypatch.delenv("AWS_REGION", raising=False)
monkeypatch.delenv("AWS_DEFAULT_REGION", raising=False)
with patch("hermes_cli.runtime_provider.resolve_provider", return_value="bedrock"), \
patch("hermes_cli.runtime_provider._get_model_config", return_value={"provider": "bedrock"}):
result = resolve_runtime_provider(requested="bedrock")
assert result["region"] == "us-east-1"
def test_bedrock_runtime_no_credentials_raises_on_auto_detect(self, monkeypatch):
"""When bedrock is auto-detected (not explicitly requested) and no
credentials are found, runtime resolution should raise AuthError."""
from hermes_cli.runtime_provider import resolve_runtime_provider
from hermes_cli.auth import AuthError
# Clear all AWS env vars
for var in ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_PROFILE",
"AWS_BEARER_TOKEN_BEDROCK", "AWS_CONTAINER_CREDENTIALS_RELATIVE_URI",
"AWS_WEB_IDENTITY_TOKEN_FILE"]:
monkeypatch.delenv(var, raising=False)
# Mock both the provider resolution and boto3's credential chain
mock_session = MagicMock()
mock_session.get_credentials.return_value = None
with patch("hermes_cli.runtime_provider.resolve_provider", return_value="bedrock"), \
patch("hermes_cli.runtime_provider._get_model_config", return_value={"provider": "bedrock"}), \
patch("hermes_cli.runtime_provider.resolve_requested_provider", return_value="auto"), \
patch.dict("sys.modules", {"botocore": MagicMock(), "botocore.session": MagicMock()}):
import botocore.session as _bs
_bs.get_session = MagicMock(return_value=mock_session)
with pytest.raises(AuthError, match="No AWS credentials"):
resolve_runtime_provider(requested="auto")
def test_bedrock_runtime_explicit_skips_credential_check(self, monkeypatch):
"""When user explicitly requests bedrock, trust boto3's credential chain
even if env-var detection finds nothing (covers IMDS, SSO, etc.)."""
from hermes_cli.runtime_provider import resolve_runtime_provider
# No AWS env vars set — but explicit bedrock request should not raise
for var in ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_PROFILE",
"AWS_BEARER_TOKEN_BEDROCK"]:
monkeypatch.delenv(var, raising=False)
with patch("hermes_cli.runtime_provider.resolve_provider", return_value="bedrock"), \
patch("hermes_cli.runtime_provider._get_model_config", return_value={"provider": "bedrock"}):
result = resolve_runtime_provider(requested="bedrock")
assert result["provider"] == "bedrock"
assert result["api_mode"] == "bedrock_converse"
# ---------------------------------------------------------------------------
# providers.py integration
# ---------------------------------------------------------------------------
class TestProvidersModule:
"""Verify bedrock is wired into hermes_cli/providers.py."""
def test_bedrock_alias_in_providers(self):
from hermes_cli.providers import ALIASES
assert ALIASES.get("bedrock") is None # "bedrock" IS the canonical name, not an alias
assert ALIASES.get("aws") == "bedrock"
assert ALIASES.get("aws-bedrock") == "bedrock"
def test_bedrock_transport_mapping(self):
from hermes_cli.providers import TRANSPORT_TO_API_MODE
assert TRANSPORT_TO_API_MODE.get("bedrock_converse") == "bedrock_converse"
def test_determine_api_mode_from_bedrock_url(self):
from hermes_cli.providers import determine_api_mode
assert determine_api_mode(
"unknown", "https://bedrock-runtime.us-east-1.amazonaws.com"
) == "bedrock_converse"
def test_label_override(self):
from hermes_cli.providers import _LABEL_OVERRIDES
assert _LABEL_OVERRIDES.get("bedrock") == "AWS Bedrock"
# ---------------------------------------------------------------------------
# Error classifier integration
# ---------------------------------------------------------------------------
class TestErrorClassifierBedrock:
"""Verify Bedrock error patterns are in the global error classifier."""
def test_throttling_in_rate_limit_patterns(self):
from agent.error_classifier import _RATE_LIMIT_PATTERNS
assert "throttlingexception" in _RATE_LIMIT_PATTERNS
def test_context_overflow_patterns(self):
from agent.error_classifier import _CONTEXT_OVERFLOW_PATTERNS
assert "input is too long" in _CONTEXT_OVERFLOW_PATTERNS
# ---------------------------------------------------------------------------
# pyproject.toml bedrock extra
# ---------------------------------------------------------------------------
class TestPackaging:
"""Verify bedrock optional dependency is declared."""
def test_bedrock_extra_exists(self):
import configparser
from pathlib import Path
# Read pyproject.toml to verify [bedrock] extra
toml_path = Path(__file__).parent.parent.parent / "pyproject.toml"
content = toml_path.read_text()
assert 'bedrock = ["boto3' in content
def test_bedrock_in_all_extra(self):
from pathlib import Path
content = (Path(__file__).parent.parent.parent / "pyproject.toml").read_text()
assert '"hermes-agent[bedrock]"' in content
+6
View File
@@ -252,6 +252,11 @@ def test_exhausted_402_entry_resets_after_one_hour(tmp_path, monkeypatch):
def test_explicit_reset_timestamp_overrides_default_429_ttl(tmp_path, monkeypatch):
monkeypatch.setenv("HERMES_HOME", str(tmp_path / "hermes"))
# Prevent auto-seeding from Codex CLI tokens on the host
monkeypatch.setattr(
"hermes_cli.auth._import_codex_cli_tokens",
lambda: None,
)
_write_auth_store(
tmp_path,
{
@@ -1091,6 +1096,7 @@ def test_load_pool_seeds_copilot_via_gh_auth_token(tmp_path, monkeypatch):
assert len(entries) == 1
assert entries[0].source == "gh_cli"
assert entries[0].access_token == "gho_fake_token_abc123"
assert entries[0].base_url == "https://api.githubcopilot.com"
def test_load_pool_does_not_seed_copilot_when_no_token(tmp_path, monkeypatch):
+315
View File
@@ -396,6 +396,108 @@ class TestPluginMemoryDiscovery:
assert load_memory_provider("nonexistent_provider") is None
class TestUserInstalledProviderDiscovery:
"""Memory providers installed to $HERMES_HOME/plugins/ should be found.
Regression test for issues #4956 and #9099: load_memory_provider() and
discover_memory_providers() only scanned the bundled plugins/memory/
directory, ignoring user-installed plugins.
"""
def _make_user_memory_plugin(self, tmp_path, name="myprovider"):
"""Create a minimal user memory provider plugin."""
plugin_dir = tmp_path / "plugins" / name
plugin_dir.mkdir(parents=True)
(plugin_dir / "__init__.py").write_text(
"from agent.memory_provider import MemoryProvider\n"
"class MyProvider(MemoryProvider):\n"
f" @property\n"
f" def name(self): return {name!r}\n"
" def is_available(self): return True\n"
" def initialize(self, **kw): pass\n"
" def sync_turn(self, *a, **kw): pass\n"
" def get_tool_schemas(self): return []\n"
" def handle_tool_call(self, *a, **kw): return '{}'\n"
)
(plugin_dir / "plugin.yaml").write_text(
f"name: {name}\ndescription: Test user provider\n"
)
return plugin_dir
def test_discover_finds_user_plugins(self, tmp_path, monkeypatch):
"""discover_memory_providers() includes user-installed plugins."""
from plugins.memory import discover_memory_providers, _get_user_plugins_dir
self._make_user_memory_plugin(tmp_path, "myexternal")
monkeypatch.setattr(
"plugins.memory._get_user_plugins_dir",
lambda: tmp_path / "plugins",
)
providers = discover_memory_providers()
names = [n for n, _, _ in providers]
assert "myexternal" in names
assert "holographic" in names # bundled still found
def test_load_user_plugin(self, tmp_path, monkeypatch):
"""load_memory_provider() can load from $HERMES_HOME/plugins/."""
from plugins.memory import load_memory_provider
self._make_user_memory_plugin(tmp_path, "myexternal")
monkeypatch.setattr(
"plugins.memory._get_user_plugins_dir",
lambda: tmp_path / "plugins",
)
p = load_memory_provider("myexternal")
assert p is not None
assert p.name == "myexternal"
assert p.is_available()
def test_bundled_takes_precedence(self, tmp_path, monkeypatch):
"""Bundled provider wins when user plugin has the same name."""
from plugins.memory import load_memory_provider, discover_memory_providers
# Create user plugin named "holographic" (same as bundled)
plugin_dir = tmp_path / "plugins" / "holographic"
plugin_dir.mkdir(parents=True)
(plugin_dir / "__init__.py").write_text(
"from agent.memory_provider import MemoryProvider\n"
"class Fake(MemoryProvider):\n"
" @property\n"
" def name(self): return 'holographic-FAKE'\n"
" def is_available(self): return True\n"
" def initialize(self, **kw): pass\n"
" def sync_turn(self, *a, **kw): pass\n"
" def get_tool_schemas(self): return []\n"
" def handle_tool_call(self, *a, **kw): return '{}'\n"
)
monkeypatch.setattr(
"plugins.memory._get_user_plugins_dir",
lambda: tmp_path / "plugins",
)
# Load should return bundled (name "holographic"), not user (name "holographic-FAKE")
p = load_memory_provider("holographic")
assert p is not None
assert p.name == "holographic" # bundled wins
# discover should not duplicate
providers = discover_memory_providers()
holo_count = sum(1 for n, _, _ in providers if n == "holographic")
assert holo_count == 1
def test_non_memory_user_plugins_excluded(self, tmp_path, monkeypatch):
"""User plugins that don't reference MemoryProvider are skipped."""
from plugins.memory import discover_memory_providers
plugin_dir = tmp_path / "plugins" / "notmemory"
plugin_dir.mkdir(parents=True)
(plugin_dir / "__init__.py").write_text(
"def register(ctx):\n ctx.register_tool('foo', 'bar', {}, lambda: None)\n"
)
monkeypatch.setattr(
"plugins.memory._get_user_plugins_dir",
lambda: tmp_path / "plugins",
)
providers = discover_memory_providers()
names = [n for n, _, _ in providers]
assert "notmemory" not in names
# ---------------------------------------------------------------------------
# Sequential dispatch routing tests
# ---------------------------------------------------------------------------
@@ -695,3 +797,216 @@ class TestMemoryContextFencing:
fence_end = combined.index("</memory-context>")
assert "Alice" in combined[fence_start:fence_end]
assert combined.index("weather") < fence_start
# ---------------------------------------------------------------------------
# AIAgent.commit_memory_session — routes to MemoryManager.on_session_end
# ---------------------------------------------------------------------------
class _CommitRecorder(FakeMemoryProvider):
"""Provider that records on_session_end calls for assertions."""
def __init__(self, name="recorder"):
super().__init__(name)
self.end_calls = []
def on_session_end(self, messages):
self.end_calls.append(list(messages or []))
class TestCommitMemorySessionRouting:
def test_on_session_end_fans_out(self):
mgr = MemoryManager()
builtin = _CommitRecorder("builtin")
external = _CommitRecorder("openviking")
mgr.add_provider(builtin)
mgr.add_provider(external)
msgs = [{"role": "user", "content": "hi"}]
mgr.on_session_end(msgs)
assert builtin.end_calls == [msgs]
assert external.end_calls == [msgs]
def test_on_session_end_tolerates_failure(self):
mgr = MemoryManager()
builtin = FakeMemoryProvider("builtin")
bad = _CommitRecorder("bad-provider")
bad.on_session_end = lambda m: (_ for _ in ()).throw(RuntimeError("boom"))
mgr.add_provider(builtin)
mgr.add_provider(bad)
mgr.on_session_end([]) # must not raise
# ---------------------------------------------------------------------------
# on_memory_write bridge — must fire from both concurrent AND sequential paths
# ---------------------------------------------------------------------------
class TestOnMemoryWriteBridge:
"""Verify that MemoryManager.on_memory_write is called when built-in
memory writes happen. This is a regression test for #10174 where the
sequential tool execution path (_execute_tool_calls_sequential) was
missing the bridge call, so single memory tool calls never notified
external memory providers.
"""
def test_on_memory_write_add(self):
"""on_memory_write fires for 'add' actions."""
mgr = MemoryManager()
p = FakeMemoryProvider("ext")
mgr.add_provider(p)
mgr.on_memory_write("add", "memory", "new fact")
assert p.memory_writes == [("add", "memory", "new fact")]
def test_on_memory_write_replace(self):
"""on_memory_write fires for 'replace' actions."""
mgr = MemoryManager()
p = FakeMemoryProvider("ext")
mgr.add_provider(p)
mgr.on_memory_write("replace", "user", "updated pref")
assert p.memory_writes == [("replace", "user", "updated pref")]
def test_on_memory_write_remove_not_bridged(self):
"""The bridge intentionally skips 'remove' — only add/replace notify."""
# This tests the contract that run_agent.py checks:
# function_args.get("action") in ("add", "replace")
mgr = MemoryManager()
p = FakeMemoryProvider("ext")
mgr.add_provider(p)
# Manager itself doesn't filter — run_agent.py does.
# But providers should handle remove gracefully.
mgr.on_memory_write("remove", "memory", "old fact")
assert p.memory_writes == [("remove", "memory", "old fact")]
def test_memory_manager_tool_injection_deduplicates(self):
"""Memory manager tools already in self.tools (from plugin registry)
must not be appended again. Duplicate function names cause 400 errors
on providers that enforce unique names (e.g. Xiaomi MiMo via Nous Portal).
Regression test for: duplicate mnemosyne_recall / mnemosyne_remember /
mnemosyne_stats in tools array 400 from Nous Portal.
"""
mgr = MemoryManager()
p = FakeMemoryProvider("ext", tools=[
{"name": "ext_recall", "description": "Recall", "parameters": {}},
{"name": "ext_remember", "description": "Remember", "parameters": {}},
])
mgr.add_provider(p)
# Simulate self.tools already containing one of the plugin tools
# (as if it was registered via ctx.register_tool → get_tool_definitions)
existing_tools = [
{"type": "function", "function": {"name": "ext_recall", "description": "Recall (from registry)", "parameters": {}}},
{"type": "function", "function": {"name": "web_search", "description": "Search", "parameters": {}}},
]
# Apply the same dedup logic from run_agent.py __init__
_existing_names = {
t.get("function", {}).get("name")
for t in existing_tools
if isinstance(t, dict)
}
for _schema in mgr.get_all_tool_schemas():
_tname = _schema.get("name", "")
if _tname and _tname in _existing_names:
continue
existing_tools.append({"type": "function", "function": _schema})
if _tname:
_existing_names.add(_tname)
# ext_recall should NOT be duplicated; ext_remember should be added
tool_names = [t["function"]["name"] for t in existing_tools]
assert tool_names.count("ext_recall") == 1, f"ext_recall duplicated: {tool_names}"
assert tool_names.count("ext_remember") == 1
assert tool_names.count("web_search") == 1
assert len(existing_tools) == 3 # web_search + ext_recall + ext_remember
def test_on_memory_write_tolerates_provider_failure(self):
"""If a provider's on_memory_write raises, others still get notified."""
mgr = MemoryManager()
bad = FakeMemoryProvider("builtin")
bad.on_memory_write = MagicMock(side_effect=RuntimeError("boom"))
good = FakeMemoryProvider("good")
mgr.add_provider(bad)
mgr.add_provider(good)
mgr.on_memory_write("add", "user", "test")
# Good provider still received the call despite bad provider crashing
assert good.memory_writes == [("add", "user", "test")]
class TestHonchoCadenceTracking:
"""Verify Honcho provider cadence gating depends on on_turn_start().
Bug: _turn_count was never updated because on_turn_start() was not called
from run_conversation(). This meant cadence checks always passed (every
turn fired both context refresh and dialectic). Fixed by calling
on_turn_start(self._user_turn_count, msg) before prefetch_all().
"""
def test_turn_count_updates_on_turn_start(self):
"""on_turn_start sets _turn_count, enabling cadence math."""
from plugins.memory.honcho import HonchoMemoryProvider
p = HonchoMemoryProvider()
assert p._turn_count == 0
p.on_turn_start(1, "hello")
assert p._turn_count == 1
p.on_turn_start(5, "world")
assert p._turn_count == 5
def test_queue_prefetch_respects_dialectic_cadence(self):
"""With dialecticCadence=3, dialectic should skip turns 2 and 3."""
from plugins.memory.honcho import HonchoMemoryProvider
p = HonchoMemoryProvider()
p._dialectic_cadence = 3
p._recall_mode = "context"
p._session_key = "test-session"
# Simulate a manager that records prefetch calls
class FakeManager:
def prefetch_context(self, key, query=None):
pass
def prefetch_dialectic(self, key, query):
pass
p._manager = FakeManager()
# Simulate turn 1: last_dialectic_turn = -999, so (1 - (-999)) >= 3 -> fires
p.on_turn_start(1, "turn 1")
p._last_dialectic_turn = 1 # simulate it fired
p._last_context_turn = 1
# Simulate turn 2: (2 - 1) = 1 < 3 -> should NOT fire dialectic
p.on_turn_start(2, "turn 2")
assert (p._turn_count - p._last_dialectic_turn) < p._dialectic_cadence
# Simulate turn 3: (3 - 1) = 2 < 3 -> should NOT fire dialectic
p.on_turn_start(3, "turn 3")
assert (p._turn_count - p._last_dialectic_turn) < p._dialectic_cadence
# Simulate turn 4: (4 - 1) = 3 >= 3 -> should fire dialectic
p.on_turn_start(4, "turn 4")
assert (p._turn_count - p._last_dialectic_turn) >= p._dialectic_cadence
def test_injection_frequency_first_turn_with_1indexed(self):
"""injection_frequency='first-turn' must inject on turn 1 (1-indexed)."""
from plugins.memory.honcho import HonchoMemoryProvider
p = HonchoMemoryProvider()
p._injection_frequency = "first-turn"
# Turn 1 should inject (not skip)
p.on_turn_start(1, "first message")
assert p._turn_count == 1
# The guard is `_turn_count > 1`, so turn 1 passes through
should_skip = p._injection_frequency == "first-turn" and p._turn_count > 1
assert not should_skip, "First turn (turn 1) should NOT be skipped"
# Turn 2 should skip
p.on_turn_start(2, "second message")
should_skip = p._injection_frequency == "first-turn" and p._turn_count > 1
assert should_skip, "Second turn (turn 2) SHOULD be skipped"
+253
View File
@@ -0,0 +1,253 @@
"""Tests for agent/nous_rate_guard.py — cross-session Nous Portal rate limit guard."""
import json
import os
import time
import pytest
@pytest.fixture
def rate_guard_env(tmp_path, monkeypatch):
"""Isolate rate guard state to a temp directory."""
hermes_home = str(tmp_path / ".hermes")
os.makedirs(hermes_home, exist_ok=True)
monkeypatch.setenv("HERMES_HOME", hermes_home)
# Clear any cached module-level imports
return hermes_home
class TestRecordNousRateLimit:
"""Test recording rate limit state."""
def test_records_with_header_reset(self, rate_guard_env):
from agent.nous_rate_guard import record_nous_rate_limit, _state_path
headers = {"x-ratelimit-reset-requests-1h": "1800"}
record_nous_rate_limit(headers=headers)
path = _state_path()
assert os.path.exists(path)
with open(path) as f:
state = json.load(f)
assert state["reset_seconds"] == pytest.approx(1800, abs=2)
assert state["reset_at"] > time.time()
def test_records_with_per_minute_header(self, rate_guard_env):
from agent.nous_rate_guard import record_nous_rate_limit, _state_path
headers = {"x-ratelimit-reset-requests": "45"}
record_nous_rate_limit(headers=headers)
with open(_state_path()) as f:
state = json.load(f)
assert state["reset_seconds"] == pytest.approx(45, abs=2)
def test_records_with_retry_after_header(self, rate_guard_env):
from agent.nous_rate_guard import record_nous_rate_limit, _state_path
headers = {"retry-after": "60"}
record_nous_rate_limit(headers=headers)
with open(_state_path()) as f:
state = json.load(f)
assert state["reset_seconds"] == pytest.approx(60, abs=2)
def test_prefers_hourly_over_per_minute(self, rate_guard_env):
from agent.nous_rate_guard import record_nous_rate_limit, _state_path
headers = {
"x-ratelimit-reset-requests-1h": "1800",
"x-ratelimit-reset-requests": "45",
}
record_nous_rate_limit(headers=headers)
with open(_state_path()) as f:
state = json.load(f)
# Should use the hourly value, not the per-minute one
assert state["reset_seconds"] == pytest.approx(1800, abs=2)
def test_falls_back_to_error_context_reset_at(self, rate_guard_env):
from agent.nous_rate_guard import record_nous_rate_limit, _state_path
future_reset = time.time() + 900
record_nous_rate_limit(
headers=None,
error_context={"reset_at": future_reset},
)
with open(_state_path()) as f:
state = json.load(f)
assert state["reset_at"] == pytest.approx(future_reset, abs=1)
def test_falls_back_to_default_cooldown(self, rate_guard_env):
from agent.nous_rate_guard import record_nous_rate_limit, _state_path
record_nous_rate_limit(headers=None)
with open(_state_path()) as f:
state = json.load(f)
# Default is 300 seconds (5 minutes)
assert state["reset_seconds"] == pytest.approx(300, abs=2)
def test_custom_default_cooldown(self, rate_guard_env):
from agent.nous_rate_guard import record_nous_rate_limit, _state_path
record_nous_rate_limit(headers=None, default_cooldown=120.0)
with open(_state_path()) as f:
state = json.load(f)
assert state["reset_seconds"] == pytest.approx(120, abs=2)
def test_creates_directory_if_missing(self, rate_guard_env):
from agent.nous_rate_guard import record_nous_rate_limit, _state_path
record_nous_rate_limit(headers={"retry-after": "10"})
assert os.path.exists(_state_path())
class TestNousRateLimitRemaining:
"""Test checking remaining rate limit time."""
def test_returns_none_when_no_file(self, rate_guard_env):
from agent.nous_rate_guard import nous_rate_limit_remaining
assert nous_rate_limit_remaining() is None
def test_returns_remaining_seconds_when_active(self, rate_guard_env):
from agent.nous_rate_guard import record_nous_rate_limit, nous_rate_limit_remaining
record_nous_rate_limit(headers={"x-ratelimit-reset-requests-1h": "600"})
remaining = nous_rate_limit_remaining()
assert remaining is not None
assert 595 < remaining <= 605 # ~600 seconds, allowing for test execution time
def test_returns_none_when_expired(self, rate_guard_env):
from agent.nous_rate_guard import nous_rate_limit_remaining, _state_path
# Write an already-expired state
state_dir = os.path.dirname(_state_path())
os.makedirs(state_dir, exist_ok=True)
with open(_state_path(), "w") as f:
json.dump({"reset_at": time.time() - 10, "recorded_at": time.time() - 100}, f)
assert nous_rate_limit_remaining() is None
# File should be cleaned up
assert not os.path.exists(_state_path())
def test_handles_corrupt_file(self, rate_guard_env):
from agent.nous_rate_guard import nous_rate_limit_remaining, _state_path
state_dir = os.path.dirname(_state_path())
os.makedirs(state_dir, exist_ok=True)
with open(_state_path(), "w") as f:
f.write("not valid json{{{")
assert nous_rate_limit_remaining() is None
class TestClearNousRateLimit:
"""Test clearing rate limit state."""
def test_clears_existing_file(self, rate_guard_env):
from agent.nous_rate_guard import (
record_nous_rate_limit,
clear_nous_rate_limit,
nous_rate_limit_remaining,
_state_path,
)
record_nous_rate_limit(headers={"retry-after": "600"})
assert nous_rate_limit_remaining() is not None
clear_nous_rate_limit()
assert nous_rate_limit_remaining() is None
assert not os.path.exists(_state_path())
def test_clear_when_no_file(self, rate_guard_env):
from agent.nous_rate_guard import clear_nous_rate_limit
# Should not raise
clear_nous_rate_limit()
class TestFormatRemaining:
"""Test human-readable duration formatting."""
def test_seconds(self):
from agent.nous_rate_guard import format_remaining
assert format_remaining(30) == "30s"
def test_minutes(self):
from agent.nous_rate_guard import format_remaining
assert format_remaining(125) == "2m 5s"
def test_exact_minutes(self):
from agent.nous_rate_guard import format_remaining
assert format_remaining(120) == "2m"
def test_hours(self):
from agent.nous_rate_guard import format_remaining
assert format_remaining(3720) == "1h 2m"
class TestParseResetSeconds:
"""Test header parsing for reset times."""
def test_case_insensitive_headers(self, rate_guard_env):
from agent.nous_rate_guard import _parse_reset_seconds
headers = {"X-Ratelimit-Reset-Requests-1h": "1200"}
assert _parse_reset_seconds(headers) == 1200.0
def test_returns_none_for_empty_headers(self):
from agent.nous_rate_guard import _parse_reset_seconds
assert _parse_reset_seconds(None) is None
assert _parse_reset_seconds({}) is None
def test_ignores_zero_values(self):
from agent.nous_rate_guard import _parse_reset_seconds
headers = {"x-ratelimit-reset-requests-1h": "0"}
assert _parse_reset_seconds(headers) is None
def test_ignores_invalid_values(self):
from agent.nous_rate_guard import _parse_reset_seconds
headers = {"x-ratelimit-reset-requests-1h": "not-a-number"}
assert _parse_reset_seconds(headers) is None
class TestAuxiliaryClientIntegration:
"""Test that the auxiliary client respects the rate guard."""
def test_try_nous_skips_when_rate_limited(self, rate_guard_env, monkeypatch):
from agent.nous_rate_guard import record_nous_rate_limit
# Record a rate limit
record_nous_rate_limit(headers={"retry-after": "600"})
# Mock _read_nous_auth to return valid creds (would normally succeed)
import agent.auxiliary_client as aux
monkeypatch.setattr(aux, "_read_nous_auth", lambda: {
"access_token": "test-token",
"inference_base_url": "https://api.nous.test/v1",
})
result = aux._try_nous()
assert result == (None, None)
def test_try_nous_works_when_not_rate_limited(self, rate_guard_env, monkeypatch):
import agent.auxiliary_client as aux
# No rate limit recorded — _try_nous should proceed normally
# (will return None because no real creds, but won't be blocked
# by the rate guard)
monkeypatch.setattr(aux, "_read_nous_auth", lambda: None)
result = aux._try_nous()
assert result == (None, None)
@@ -0,0 +1,60 @@
"""Tests for malformed proxy env var and base URL validation.
Salvaged from PR #6403 by MestreY0d4-Uninter — validates that the agent
surfaces clear errors instead of cryptic httpx ``Invalid port`` exceptions
when proxy env vars or custom endpoint URLs are malformed.
"""
from __future__ import annotations
import pytest
from agent.auxiliary_client import _validate_base_url, _validate_proxy_env_urls
# -- proxy env validation ------------------------------------------------
def test_proxy_env_accepts_normal_values(monkeypatch):
monkeypatch.setenv("HTTP_PROXY", "http://127.0.0.1:6153")
monkeypatch.setenv("HTTPS_PROXY", "https://proxy.example.com:8443")
monkeypatch.setenv("ALL_PROXY", "socks5://127.0.0.1:1080")
_validate_proxy_env_urls() # should not raise
def test_proxy_env_accepts_empty(monkeypatch):
monkeypatch.delenv("HTTP_PROXY", raising=False)
monkeypatch.delenv("HTTPS_PROXY", raising=False)
monkeypatch.delenv("ALL_PROXY", raising=False)
monkeypatch.delenv("http_proxy", raising=False)
monkeypatch.delenv("https_proxy", raising=False)
monkeypatch.delenv("all_proxy", raising=False)
_validate_proxy_env_urls() # should not raise
@pytest.mark.parametrize("key", [
"HTTP_PROXY", "HTTPS_PROXY", "ALL_PROXY",
"http_proxy", "https_proxy", "all_proxy",
])
def test_proxy_env_rejects_malformed_port(monkeypatch, key):
monkeypatch.setenv(key, "http://127.0.0.1:6153export")
with pytest.raises(RuntimeError, match=rf"Malformed proxy environment variable {key}=.*6153export"):
_validate_proxy_env_urls()
# -- base URL validation -------------------------------------------------
@pytest.mark.parametrize("url", [
"https://api.example.com/v1",
"http://127.0.0.1:6153/v1",
"acp://copilot",
"",
None,
])
def test_base_url_accepts_valid(url):
_validate_base_url(url) # should not raise
def test_base_url_rejects_malformed_port():
with pytest.raises(RuntimeError, match="Malformed custom endpoint URL"):
_validate_base_url("http://127.0.0.1:6153export")
+92
View File
@@ -284,3 +284,95 @@ class TestElevenLabsTavilyExaKeys:
assert "XYZ789abcdef" not in result
assert "HOME=/home/user" in result
assert "SHELL=/bin/bash" in result
class TestJWTTokens:
"""JWT tokens start with eyJ (base64 for '{') and have dot-separated parts."""
def test_full_3part_jwt(self):
text = (
"Token: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9"
".eyJpc3MiOiI0MjNiZDJkYjg4MjI0MDAwIn0"
".Gxgv0rru-_kS-I_60EJ7CENTnBh9UeuL3QhkMoQ-VnM"
)
result = redact_sensitive_text(text)
assert "Token:" in result
# Payload and signature must not survive
assert "eyJpc3Mi" not in result
assert "Gxgv0rru" not in result
def test_2part_jwt(self):
text = "eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiIxMjM0NTY3ODkwIn0"
result = redact_sensitive_text(text)
assert "eyJzdWIi" not in result
def test_standalone_jwt_header(self):
text = "leaked header: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9 here"
result = redact_sensitive_text(text)
assert "IkpXVCJ9" not in result
assert "leaked header:" in result
def test_jwt_with_base64_padding(self):
text = "eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiIxMjM0NTY3ODkwIn0=.abc123def456ghij"
result = redact_sensitive_text(text)
assert "abc123def456" not in result
def test_short_eyj_not_matched(self):
"""eyJ followed by fewer than 10 base64 chars should not match."""
text = "eyJust a normal word"
assert redact_sensitive_text(text) == text
def test_jwt_preserves_surrounding_text(self):
text = "before eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiIxMjM0NTY3ODkwIn0 after"
result = redact_sensitive_text(text)
assert result.startswith("before ")
assert result.endswith(" after")
def test_home_assistant_jwt_in_memory(self):
"""Real-world pattern: HA token stored in agent memory block."""
text = (
"Home Assistant API Token: "
"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9"
".eyJpc3MiOiJhYmNkZWYiLCJleHAiOjE3NzQ5NTcxMDN9"
".Gxgv0rru-_kS-I_60EJ7CENTnBh9UeuL3QhkMoQ-VnM"
)
result = redact_sensitive_text(text)
assert "Home Assistant API Token:" in result
assert "Gxgv0rru" not in result
assert "..." in result
class TestDiscordMentions:
"""Discord snowflake IDs in <@ID> or <@!ID> format."""
def test_normal_mention(self):
result = redact_sensitive_text("Hello <@222589316709220353>")
assert "222589316709220353" not in result
assert "<@***>" in result
def test_nickname_mention(self):
result = redact_sensitive_text("Ping <@!1331549159177846844>")
assert "1331549159177846844" not in result
assert "<@!***>" in result
def test_multiple_mentions(self):
text = "<@111111111111111111> and <@222222222222222222>"
result = redact_sensitive_text(text)
assert "111111111111111111" not in result
assert "222222222222222222" not in result
def test_short_id_not_matched(self):
"""IDs shorter than 17 digits are not Discord snowflakes."""
text = "<@12345>"
assert redact_sensitive_text(text) == text
def test_slack_mention_not_matched(self):
"""Slack mentions use letters, not pure digits."""
text = "<@U024BE7LH>"
assert redact_sensitive_text(text) == text
def test_preserves_surrounding_text(self):
text = "User <@222589316709220353> said hello"
result = redact_sensitive_text(text)
assert result.startswith("User ")
assert result.endswith(" said hello")
+16
View File
@@ -1,5 +1,6 @@
"""Tests for CLI /status command behavior."""
from datetime import datetime
from pathlib import Path
from types import SimpleNamespace
from unittest.mock import MagicMock, patch
@@ -83,3 +84,18 @@ def test_show_session_status_prints_gateway_style_summary():
_, kwargs = cli_obj.console.print.call_args
assert kwargs.get("highlight") is False
assert kwargs.get("markup") is False
def test_profile_command_reports_custom_root_profile(monkeypatch, tmp_path, capsys):
"""Profile detection works for custom-root deployments (not under ~/.hermes)."""
cli_obj = _make_cli()
profile_home = tmp_path / "profiles" / "coder"
monkeypatch.setenv("HERMES_HOME", str(profile_home))
monkeypatch.setattr(Path, "home", lambda: tmp_path / "unrelated-home")
cli_obj._handle_profile_command()
out = capsys.readouterr().out
assert "Profile: coder" in out
assert f"Home: {profile_home}" in out
+12
View File
@@ -144,6 +144,18 @@ class TestGatewayPersonalityNone:
assert "none" in result.lower()
@pytest.mark.asyncio
async def test_empty_personality_list_uses_profile_display_path(self, tmp_path):
runner = self._make_runner(personalities={})
(tmp_path / "config.yaml").write_text(yaml.dump({"agent": {"personalities": {}}}))
with patch("gateway.run._hermes_home", tmp_path), \
patch("hermes_constants.display_hermes_home", return_value="~/.hermes/profiles/coder"):
event = self._make_event("")
result = await runner._handle_personality_command(event)
assert result == "No personalities configured in `~/.hermes/profiles/coder/config.yaml`"
class TestPersonalityDictFormat:
"""Test dict-format custom personalities with description, tone, style."""
+115 -1
View File
@@ -8,6 +8,8 @@ from unittest.mock import AsyncMock, patch, MagicMock
import pytest
from cron.scheduler import _resolve_origin, _resolve_delivery_target, _deliver_result, _send_media_via_adapter, run_job, SILENT_MARKER, _build_job_prompt
from tools.env_passthrough import clear_env_passthrough
from tools.credential_files import clear_credential_files
class TestResolveOrigin:
@@ -233,9 +235,10 @@ class TestDeliverResultWrapping:
send_mock.assert_called_once()
sent_content = send_mock.call_args.kwargs.get("content") or send_mock.call_args[0][-1]
assert "Cronjob Response: daily-report" in sent_content
assert "(job_id: test-job)" in sent_content
assert "-------------" in sent_content
assert "Here is today's summary." in sent_content
assert "The agent cannot see this message" in sent_content
assert "To stop or manage this job" in sent_content
def test_delivery_uses_job_id_when_no_name(self):
"""When a job has no name, the wrapper should fall back to job id."""
@@ -876,6 +879,117 @@ class TestRunJobPerJobOverrides:
class TestRunJobSkillBacked:
def test_run_job_preserves_skill_env_passthrough_into_worker_thread(self, tmp_path):
job = {
"id": "skill-env-job",
"name": "skill env test",
"prompt": "Use the skill.",
"skill": "notion",
}
fake_db = MagicMock()
def _skill_view(name):
assert name == "notion"
from tools.env_passthrough import register_env_passthrough
register_env_passthrough(["NOTION_API_KEY"])
return json.dumps({"success": True, "content": "# notion\nUse Notion."})
def _run_conversation(prompt):
from tools.env_passthrough import get_all_passthrough
assert "NOTION_API_KEY" in get_all_passthrough()
return {"final_response": "ok"}
with patch("cron.scheduler._hermes_home", tmp_path), \
patch("cron.scheduler._resolve_origin", return_value=None), \
patch("dotenv.load_dotenv"), \
patch("hermes_state.SessionDB", return_value=fake_db), \
patch(
"hermes_cli.runtime_provider.resolve_runtime_provider",
return_value={
"api_key": "***",
"base_url": "https://example.invalid/v1",
"provider": "openrouter",
"api_mode": "chat_completions",
},
), \
patch("tools.skills_tool.skill_view", side_effect=_skill_view), \
patch("run_agent.AIAgent") as mock_agent_cls:
mock_agent = MagicMock()
mock_agent.run_conversation.side_effect = _run_conversation
mock_agent_cls.return_value = mock_agent
try:
success, output, final_response, error = run_job(job)
finally:
clear_env_passthrough()
assert success is True
assert error is None
assert final_response == "ok"
def test_run_job_preserves_credential_file_passthrough_into_worker_thread(self, tmp_path):
"""copy_context() also propagates credential_files ContextVar."""
job = {
"id": "cred-env-job",
"name": "cred file test",
"prompt": "Use the skill.",
"skill": "google-workspace",
}
fake_db = MagicMock()
# Create a credential file so register_credential_file succeeds
cred_dir = tmp_path / "credentials"
cred_dir.mkdir()
(cred_dir / "google_token.json").write_text('{"token": "t"}')
def _skill_view(name):
assert name == "google-workspace"
from tools.credential_files import register_credential_file
register_credential_file("credentials/google_token.json")
return json.dumps({"success": True, "content": "# google-workspace\nUse Google."})
def _run_conversation(prompt):
from tools.credential_files import _get_registered
registered = _get_registered()
assert registered, "credential files must be visible in worker thread"
assert any("google_token.json" in v for v in registered.values())
return {"final_response": "ok"}
with patch("cron.scheduler._hermes_home", tmp_path), \
patch("cron.scheduler._resolve_origin", return_value=None), \
patch("tools.credential_files._resolve_hermes_home", return_value=tmp_path), \
patch("dotenv.load_dotenv"), \
patch("hermes_state.SessionDB", return_value=fake_db), \
patch(
"hermes_cli.runtime_provider.resolve_runtime_provider",
return_value={
"api_key": "***",
"base_url": "https://example.invalid/v1",
"provider": "openrouter",
"api_mode": "chat_completions",
},
), \
patch("tools.skills_tool.skill_view", side_effect=_skill_view), \
patch("run_agent.AIAgent") as mock_agent_cls:
mock_agent = MagicMock()
mock_agent.run_conversation.side_effect = _run_conversation
mock_agent_cls.return_value = mock_agent
try:
success, output, final_response, error = run_job(job)
finally:
clear_credential_files()
assert success is True
assert error is None
assert final_response == "ok"
def test_run_job_loads_skill_and_disables_recursive_cron_tools(self, tmp_path):
job = {
"id": "skill-job",
+66
View File
@@ -0,0 +1,66 @@
"""Shared fixtures for gateway tests.
The ``_ensure_telegram_mock`` helper guarantees that a minimal mock of
the ``telegram`` package is registered in :data:`sys.modules` **before**
any test file triggers ``from gateway.platforms.telegram import ...``.
Without this, ``pytest-xdist`` workers that happen to collect
``test_telegram_caption_merge.py`` (bare top-level import, no per-file
mock) first will cache ``ChatType = None`` from the production
ImportError fallback, causing 30+ downstream test failures wherever
``ChatType.GROUP`` / ``ChatType.SUPERGROUP`` is accessed.
Individual test files may still call their own ``_ensure_telegram_mock``
it short-circuits when the mock is already present.
"""
import sys
from unittest.mock import MagicMock
def _ensure_telegram_mock() -> None:
"""Install a comprehensive telegram mock in sys.modules.
Idempotent skips when the real library is already imported.
Uses ``sys.modules[name] = mod`` (overwrite) instead of
``setdefault`` so it wins even if a partial/broken import
already cached a module with ``ChatType = None``.
"""
if "telegram" in sys.modules and hasattr(sys.modules["telegram"], "__file__"):
return # Real library is installed — nothing to mock
mod = MagicMock()
mod.ext.ContextTypes.DEFAULT_TYPE = type(None)
mod.constants.ParseMode.MARKDOWN = "Markdown"
mod.constants.ParseMode.MARKDOWN_V2 = "MarkdownV2"
mod.constants.ParseMode.HTML = "HTML"
mod.constants.ChatType.PRIVATE = "private"
mod.constants.ChatType.GROUP = "group"
mod.constants.ChatType.SUPERGROUP = "supergroup"
mod.constants.ChatType.CHANNEL = "channel"
# Real exception classes so ``except (NetworkError, ...)`` clauses
# in production code don't blow up with TypeError.
mod.error.NetworkError = type("NetworkError", (OSError,), {})
mod.error.TimedOut = type("TimedOut", (OSError,), {})
mod.error.BadRequest = type("BadRequest", (Exception,), {})
mod.error.Forbidden = type("Forbidden", (Exception,), {})
mod.error.InvalidToken = type("InvalidToken", (Exception,), {})
mod.error.RetryAfter = type("RetryAfter", (Exception,), {"retry_after": 1})
mod.error.Conflict = type("Conflict", (Exception,), {})
# Update.ALL_TYPES used in start_polling()
mod.Update.ALL_TYPES = []
for name in (
"telegram",
"telegram.ext",
"telegram.constants",
"telegram.request",
):
sys.modules[name] = mod
sys.modules["telegram.error"] = mod.error
# Run at collection time — before any test file's module-level imports.
_ensure_telegram_mock()
+169
View File
@@ -1016,6 +1016,47 @@ class TestResponsesEndpoint:
assert len(call_kwargs["conversation_history"]) > 0
assert call_kwargs["user_message"] == "Now add 1 more"
@pytest.mark.asyncio
async def test_previous_response_id_preserves_session(self, adapter):
"""Chained responses via previous_response_id reuse the same session_id."""
mock_result = {
"final_response": "ok",
"messages": [{"role": "assistant", "content": "ok"}],
"api_calls": 1,
}
usage = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
app = _create_app(adapter)
async with TestClient(TestServer(app)) as cli:
# First request — establishes a session
with patch.object(adapter, "_run_agent", new_callable=AsyncMock) as mock_run:
mock_run.return_value = (mock_result, usage)
resp1 = await cli.post(
"/v1/responses",
json={"model": "hermes-agent", "input": "Hello"},
)
assert resp1.status == 200
first_session_id = mock_run.call_args.kwargs["session_id"]
data1 = await resp1.json()
response_id = data1["id"]
# Second request — chains from the first
with patch.object(adapter, "_run_agent", new_callable=AsyncMock) as mock_run:
mock_run.return_value = (mock_result, usage)
resp2 = await cli.post(
"/v1/responses",
json={
"model": "hermes-agent",
"input": "Follow up",
"previous_response_id": response_id,
},
)
assert resp2.status == 200
second_session_id = mock_run.call_args.kwargs["session_id"]
# Session must be the same across the chain
assert first_session_id == second_session_id
@pytest.mark.asyncio
async def test_invalid_previous_response_id_returns_404(self, adapter):
app = _create_app(adapter)
@@ -1115,6 +1156,134 @@ class TestResponsesEndpoint:
assert resp.status == 400
class TestResponsesStreaming:
@pytest.mark.asyncio
async def test_stream_true_returns_responses_sse(self, adapter):
app = _create_app(adapter)
async with TestClient(TestServer(app)) as cli:
async def _mock_run_agent(**kwargs):
cb = kwargs.get("stream_delta_callback")
if cb:
cb("Hello")
cb(" world")
return (
{"final_response": "Hello world", "messages": [], "api_calls": 1},
{"input_tokens": 10, "output_tokens": 5, "total_tokens": 15},
)
with patch.object(adapter, "_run_agent", side_effect=_mock_run_agent):
resp = await cli.post(
"/v1/responses",
json={"model": "hermes-agent", "input": "hi", "stream": True},
)
assert resp.status == 200
assert "text/event-stream" in resp.headers.get("Content-Type", "")
body = await resp.text()
assert "event: response.created" in body
assert "event: response.output_text.delta" in body
assert "event: response.output_text.done" in body
assert "event: response.completed" in body
assert '"sequence_number":' in body
assert '"logprobs": []' in body
assert "Hello" in body
assert " world" in body
@pytest.mark.asyncio
async def test_stream_emits_function_call_and_output_items(self, adapter):
app = _create_app(adapter)
async with TestClient(TestServer(app)) as cli:
async def _mock_run_agent(**kwargs):
start_cb = kwargs.get("tool_start_callback")
complete_cb = kwargs.get("tool_complete_callback")
text_cb = kwargs.get("stream_delta_callback")
if start_cb:
start_cb("call_123", "read_file", {"path": "/tmp/test.txt"})
if complete_cb:
complete_cb("call_123", "read_file", {"path": "/tmp/test.txt"}, '{"content":"hello"}')
if text_cb:
text_cb("Done.")
return (
{
"final_response": "Done.",
"messages": [
{
"role": "assistant",
"tool_calls": [
{
"id": "call_123",
"function": {
"name": "read_file",
"arguments": '{"path":"/tmp/test.txt"}',
},
}
],
},
{
"role": "tool",
"tool_call_id": "call_123",
"content": '{"content":"hello"}',
},
],
"api_calls": 1,
},
{"input_tokens": 10, "output_tokens": 5, "total_tokens": 15},
)
with patch.object(adapter, "_run_agent", side_effect=_mock_run_agent):
resp = await cli.post(
"/v1/responses",
json={"model": "hermes-agent", "input": "read the file", "stream": True},
)
assert resp.status == 200
body = await resp.text()
assert "event: response.output_item.added" in body
assert "event: response.output_item.done" in body
assert body.count("event: response.output_item.done") >= 2
assert '"type": "function_call"' in body
assert '"type": "function_call_output"' in body
assert '"call_id": "call_123"' in body
assert '"name": "read_file"' in body
assert '"output": [{"type": "input_text", "text": "{\\"content\\":\\"hello\\"}"}]' in body
@pytest.mark.asyncio
async def test_streamed_response_is_stored_for_get(self, adapter):
app = _create_app(adapter)
async with TestClient(TestServer(app)) as cli:
async def _mock_run_agent(**kwargs):
cb = kwargs.get("stream_delta_callback")
if cb:
cb("Stored response")
return (
{"final_response": "Stored response", "messages": [], "api_calls": 1},
{"input_tokens": 1, "output_tokens": 2, "total_tokens": 3},
)
with patch.object(adapter, "_run_agent", side_effect=_mock_run_agent):
resp = await cli.post(
"/v1/responses",
json={"model": "hermes-agent", "input": "store this", "stream": True},
)
body = await resp.text()
response_id = None
for line in body.splitlines():
if line.startswith("data: "):
try:
payload = json.loads(line[len("data: "):])
except json.JSONDecodeError:
continue
if payload.get("type") == "response.completed":
response_id = payload["response"]["id"]
break
assert response_id
get_resp = await cli.get(f"/v1/responses/{response_id}")
assert get_resp.status == 200
data = await get_resp.json()
assert data["id"] == response_id
assert data["status"] == "completed"
assert data["output"][-1]["content"][0]["text"] == "Stored response"
# ---------------------------------------------------------------------------
# Auth on endpoints
# ---------------------------------------------------------------------------
+95
View File
@@ -0,0 +1,95 @@
"""Tests for the auto-continue feature (#4493).
When the gateway restarts mid-agent-work, the session transcript ends on a
tool result that the agent never processed. The auto-continue logic detects
this and prepends a system note to the next user message so the model
finishes the interrupted work before addressing the new input.
"""
import pytest
def _simulate_auto_continue(agent_history: list, user_message: str) -> str:
"""Reproduce the auto-continue injection logic from _run_agent().
This mirrors the exact code in gateway/run.py so we can test the
detection and message transformation without spinning up a full
gateway runner.
"""
message = user_message
if agent_history and agent_history[-1].get("role") == "tool":
message = (
"[System note: Your previous turn was interrupted before you could "
"process the last tool result(s). The conversation history contains "
"tool outputs you haven't responded to yet. Please finish processing "
"those results and summarize what was accomplished, then address the "
"user's new message below.]\n\n"
+ message
)
return message
class TestAutoDetection:
"""Test that trailing tool results are correctly detected."""
def test_trailing_tool_result_triggers_note(self):
history = [
{"role": "user", "content": "deploy the app"},
{"role": "assistant", "content": None, "tool_calls": [
{"id": "call_1", "function": {"name": "terminal", "arguments": "{}"}}
]},
{"role": "tool", "tool_call_id": "call_1", "content": "deployed successfully"},
]
result = _simulate_auto_continue(history, "what happened?")
assert "[System note:" in result
assert "interrupted" in result
assert "what happened?" in result
def test_trailing_assistant_message_no_note(self):
history = [
{"role": "user", "content": "hello"},
{"role": "assistant", "content": "Hi there!"},
]
result = _simulate_auto_continue(history, "how are you?")
assert "[System note:" not in result
assert result == "how are you?"
def test_empty_history_no_note(self):
result = _simulate_auto_continue([], "hello")
assert result == "hello"
def test_trailing_user_message_no_note(self):
"""Shouldn't happen in practice, but ensure no false positive."""
history = [
{"role": "user", "content": "hello"},
]
result = _simulate_auto_continue(history, "hello again")
assert result == "hello again"
def test_multiple_tool_results_still_triggers(self):
"""Multiple tool calls in a row — last one is still role=tool."""
history = [
{"role": "user", "content": "search and read"},
{"role": "assistant", "content": None, "tool_calls": [
{"id": "call_1", "function": {"name": "search", "arguments": "{}"}},
{"id": "call_2", "function": {"name": "read", "arguments": "{}"}},
]},
{"role": "tool", "tool_call_id": "call_1", "content": "found it"},
{"role": "tool", "tool_call_id": "call_2", "content": "file content here"},
]
result = _simulate_auto_continue(history, "continue")
assert "[System note:" in result
def test_original_message_preserved_after_note(self):
"""The user's actual message must appear after the system note."""
history = [
{"role": "assistant", "content": None, "tool_calls": [
{"id": "c1", "function": {"name": "t", "arguments": "{}"}}
]},
{"role": "tool", "tool_call_id": "c1", "content": "done"},
]
result = _simulate_auto_continue(history, "now do X")
# System note comes first, then user's message
note_end = result.index("]\n\n")
user_msg_start = result.index("now do X")
assert user_msg_start > note_end
@@ -14,7 +14,7 @@ from unittest.mock import AsyncMock, patch
import pytest
from gateway.config import GatewayConfig, Platform
from gateway.run import GatewayRunner
from gateway.run import GatewayRunner, _parse_session_key
# ---------------------------------------------------------------------------
@@ -45,7 +45,7 @@ def _build_runner(monkeypatch, tmp_path, mode: str) -> GatewayRunner:
monkeypatch.setattr(gateway_run, "_hermes_home", tmp_path)
runner = GatewayRunner(GatewayConfig())
adapter = SimpleNamespace(send=AsyncMock())
adapter = SimpleNamespace(send=AsyncMock(), handle_message=AsyncMock())
runner.adapters[Platform.TELEGRAM] = adapter
return runner
@@ -243,3 +243,174 @@ async def test_no_thread_id_sends_no_metadata(monkeypatch, tmp_path):
assert adapter.send.await_count == 1
_, kwargs = adapter.send.call_args
assert kwargs["metadata"] is None
@pytest.mark.asyncio
async def test_inject_watch_notification_routes_from_session_store_origin(monkeypatch, tmp_path):
from gateway.session import SessionSource
runner = _build_runner(monkeypatch, tmp_path, "all")
adapter = runner.adapters[Platform.TELEGRAM]
runner.session_store._entries["agent:main:telegram:group:-100:42"] = SimpleNamespace(
origin=SessionSource(
platform=Platform.TELEGRAM,
chat_id="-100",
chat_type="group",
thread_id="42",
user_id="123",
user_name="Emiliyan",
)
)
evt = {
"session_id": "proc_watch",
"session_key": "agent:main:telegram:group:-100:42",
}
await runner._inject_watch_notification("[SYSTEM: Background process matched]", evt)
adapter.handle_message.assert_awaited_once()
synth_event = adapter.handle_message.await_args.args[0]
assert synth_event.internal is True
assert synth_event.source.platform == Platform.TELEGRAM
assert synth_event.source.chat_id == "-100"
assert synth_event.source.chat_type == "group"
assert synth_event.source.thread_id == "42"
assert synth_event.source.user_id == "123"
assert synth_event.source.user_name == "Emiliyan"
def test_build_process_event_source_falls_back_to_session_key_chat_type(monkeypatch, tmp_path):
runner = _build_runner(monkeypatch, tmp_path, "all")
evt = {
"session_id": "proc_watch",
"session_key": "agent:main:telegram:group:-100:42",
"platform": "telegram",
"chat_id": "-100",
"thread_id": "42",
"user_id": "123",
"user_name": "Emiliyan",
}
source = runner._build_process_event_source(evt)
assert source is not None
assert source.platform == Platform.TELEGRAM
assert source.chat_id == "-100"
assert source.chat_type == "group"
assert source.thread_id == "42"
assert source.user_id == "123"
assert source.user_name == "Emiliyan"
@pytest.mark.asyncio
async def test_inject_watch_notification_ignores_foreground_event_source(monkeypatch, tmp_path):
"""Negative test: watch notification must NOT route to the foreground thread."""
from gateway.session import SessionSource
runner = _build_runner(monkeypatch, tmp_path, "all")
adapter = runner.adapters[Platform.TELEGRAM]
# Session store has the process's original thread (thread 42)
runner.session_store._entries["agent:main:telegram:group:-100:42"] = SimpleNamespace(
origin=SessionSource(
platform=Platform.TELEGRAM,
chat_id="-100",
chat_type="group",
thread_id="42",
user_id="proc_owner",
user_name="alice",
)
)
# The evt dict carries the correct session_key — NOT a foreground event
evt = {
"session_id": "proc_cross_thread",
"session_key": "agent:main:telegram:group:-100:42",
}
await runner._inject_watch_notification("[SYSTEM: watch match]", evt)
adapter.handle_message.assert_awaited_once()
synth_event = adapter.handle_message.await_args.args[0]
# Must route to thread 42 (process origin), NOT some other thread
assert synth_event.source.thread_id == "42"
assert synth_event.source.user_id == "proc_owner"
def test_build_process_event_source_returns_none_for_empty_evt(monkeypatch, tmp_path):
"""Missing session_key and no platform metadata → None (drop notification)."""
runner = _build_runner(monkeypatch, tmp_path, "all")
source = runner._build_process_event_source({"session_id": "proc_orphan"})
assert source is None
def test_build_process_event_source_returns_none_for_invalid_platform(monkeypatch, tmp_path):
"""Invalid platform string → None."""
runner = _build_runner(monkeypatch, tmp_path, "all")
evt = {
"session_id": "proc_bad",
"platform": "not_a_real_platform",
"chat_type": "dm",
"chat_id": "123",
}
source = runner._build_process_event_source(evt)
assert source is None
def test_build_process_event_source_returns_none_for_short_session_key(monkeypatch, tmp_path):
"""Session key with <5 parts doesn't parse, falls through to empty metadata → None."""
runner = _build_runner(monkeypatch, tmp_path, "all")
evt = {
"session_id": "proc_short",
"session_key": "agent:main:telegram", # Too few parts
}
source = runner._build_process_event_source(evt)
assert source is None
# ---------------------------------------------------------------------------
# _parse_session_key helper
# ---------------------------------------------------------------------------
def test_parse_session_key_valid():
result = _parse_session_key("agent:main:telegram:group:-100")
assert result == {"platform": "telegram", "chat_type": "group", "chat_id": "-100"}
def test_parse_session_key_with_extra_parts():
"""6th part in a group key may be a user_id, not a thread_id — omit it."""
result = _parse_session_key("agent:main:discord:group:chan123:thread456")
assert result == {"platform": "discord", "chat_type": "group", "chat_id": "chan123"}
def test_parse_session_key_with_user_id_part():
"""Group keys with per-user isolation have user_id as 6th part — don't return as thread_id."""
result = _parse_session_key("agent:main:telegram:group:chat1:user99")
assert result == {"platform": "telegram", "chat_type": "group", "chat_id": "chat1"}
def test_parse_session_key_dm_with_thread():
"""DM keys use parts[5] as thread_id unambiguously."""
result = _parse_session_key("agent:main:telegram:dm:chat1:topic42")
assert result == {"platform": "telegram", "chat_type": "dm", "chat_id": "chat1", "thread_id": "topic42"}
def test_parse_session_key_thread_chat_type():
"""Thread-typed keys use parts[5] as thread_id unambiguously."""
result = _parse_session_key("agent:main:discord:thread:chan1:thread99")
assert result == {"platform": "discord", "chat_type": "thread", "chat_id": "chan1", "thread_id": "thread99"}
def test_parse_session_key_too_short():
assert _parse_session_key("agent:main:telegram") is None
assert _parse_session_key("") is None
def test_parse_session_key_wrong_prefix():
assert _parse_session_key("cron:main:telegram:dm:123") is None
assert _parse_session_key("agent:cron:telegram:dm:123") is None
+293
View File
@@ -0,0 +1,293 @@
"""Tests for busy-session acknowledgment when user sends messages during active agent runs.
Verifies that users get an immediate status response instead of total silence
when the agent is working on a task. See PR fix for the @Lonely__MH report.
"""
import asyncio
import time
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
# ---------------------------------------------------------------------------
# Minimal stubs so we can import gateway code without heavy deps
# ---------------------------------------------------------------------------
import sys, types
_tg = types.ModuleType("telegram")
_tg.constants = types.ModuleType("telegram.constants")
_ct = MagicMock()
_ct.SUPERGROUP = "supergroup"
_ct.GROUP = "group"
_ct.PRIVATE = "private"
_tg.constants.ChatType = _ct
sys.modules.setdefault("telegram", _tg)
sys.modules.setdefault("telegram.constants", _tg.constants)
sys.modules.setdefault("telegram.ext", types.ModuleType("telegram.ext"))
from gateway.platforms.base import (
BasePlatformAdapter,
MessageEvent,
MessageType,
SessionSource,
build_session_key,
)
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def _make_event(text="hello", chat_id="123", platform_val="telegram"):
"""Build a minimal MessageEvent."""
source = SessionSource(
platform=MagicMock(value=platform_val),
chat_id=chat_id,
chat_type="private",
user_id="user1",
)
evt = MessageEvent(
text=text,
message_type=MessageType.TEXT,
source=source,
message_id="msg1",
)
return evt
def _make_runner():
"""Build a minimal GatewayRunner-like object for testing."""
from gateway.run import GatewayRunner, _AGENT_PENDING_SENTINEL
runner = object.__new__(GatewayRunner)
runner._running_agents = {}
runner._running_agents_ts = {}
runner._pending_messages = {}
runner._busy_ack_ts = {}
runner._draining = False
runner.adapters = {}
runner.config = MagicMock()
runner.session_store = None
runner.hooks = MagicMock()
runner.hooks.emit = AsyncMock()
return runner, _AGENT_PENDING_SENTINEL
def _make_adapter(platform_val="telegram"):
"""Build a minimal adapter mock."""
adapter = MagicMock()
adapter._pending_messages = {}
adapter._send_with_retry = AsyncMock()
adapter.config = MagicMock()
adapter.config.extra = {}
adapter.platform = MagicMock(value=platform_val)
return adapter
# ---------------------------------------------------------------------------
# Tests
# ---------------------------------------------------------------------------
class TestBusySessionAck:
"""User sends a message while agent is running — should get acknowledgment."""
@pytest.mark.asyncio
async def test_sends_ack_when_agent_running(self):
"""First message during busy session should get a status ack."""
runner, sentinel = _make_runner()
adapter = _make_adapter()
event = _make_event(text="Are you working?")
sk = build_session_key(event.source)
# Simulate running agent
agent = MagicMock()
agent.get_activity_summary.return_value = {
"api_call_count": 21,
"max_iterations": 60,
"current_tool": "terminal",
"last_activity_ts": time.time(),
"last_activity_desc": "terminal",
"seconds_since_activity": 1.0,
}
runner._running_agents[sk] = agent
runner._running_agents_ts[sk] = time.time() - 600 # 10 min ago
runner.adapters[event.source.platform] = adapter
result = await runner._handle_active_session_busy_message(event, sk)
assert result is True # handled
# Verify ack was sent
adapter._send_with_retry.assert_called_once()
call_kwargs = adapter._send_with_retry.call_args
content = call_kwargs.kwargs.get("content") or call_kwargs[1].get("content", "")
if not content and call_kwargs.args:
# positional args
content = str(call_kwargs)
assert "Interrupting" in content or "respond" in content
assert "/stop" not in content # no need — we ARE interrupting
# Verify message was queued in adapter pending
assert sk in adapter._pending_messages
# Verify agent interrupt was called
agent.interrupt.assert_called_once_with("Are you working?")
@pytest.mark.asyncio
async def test_debounce_suppresses_rapid_acks(self):
"""Second message within 30s should NOT send another ack."""
runner, sentinel = _make_runner()
adapter = _make_adapter()
event1 = _make_event(text="hello?")
# Reuse the same source so platform mock matches
event2 = MessageEvent(
text="still there?",
message_type=MessageType.TEXT,
source=event1.source,
message_id="msg2",
)
sk = build_session_key(event1.source)
agent = MagicMock()
agent.get_activity_summary.return_value = {
"api_call_count": 5,
"max_iterations": 60,
"current_tool": None,
"last_activity_ts": time.time(),
"last_activity_desc": "api_call",
"seconds_since_activity": 0.5,
}
runner._running_agents[sk] = agent
runner._running_agents_ts[sk] = time.time() - 60
runner.adapters[event1.source.platform] = adapter
# First message — should get ack
result1 = await runner._handle_active_session_busy_message(event1, sk)
assert result1 is True
assert adapter._send_with_retry.call_count == 1
# Second message within cooldown — should be queued but no ack
result2 = await runner._handle_active_session_busy_message(event2, sk)
assert result2 is True
assert adapter._send_with_retry.call_count == 1 # still 1, no new ack
# But interrupt should still be called for both
assert agent.interrupt.call_count == 2
@pytest.mark.asyncio
async def test_ack_after_cooldown_expires(self):
"""After 30s cooldown, a new message should send a fresh ack."""
runner, sentinel = _make_runner()
adapter = _make_adapter()
event = _make_event(text="hello?")
sk = build_session_key(event.source)
agent = MagicMock()
agent.get_activity_summary.return_value = {
"api_call_count": 10,
"max_iterations": 60,
"current_tool": "web_search",
"last_activity_ts": time.time(),
"last_activity_desc": "tool",
"seconds_since_activity": 0.5,
}
runner._running_agents[sk] = agent
runner._running_agents_ts[sk] = time.time() - 120
runner.adapters[event.source.platform] = adapter
# First ack
await runner._handle_active_session_busy_message(event, sk)
assert adapter._send_with_retry.call_count == 1
# Fake that cooldown expired
runner._busy_ack_ts[sk] = time.time() - 31
# Second ack should go through
await runner._handle_active_session_busy_message(event, sk)
assert adapter._send_with_retry.call_count == 2
@pytest.mark.asyncio
async def test_includes_status_detail(self):
"""Ack message should include iteration and tool info when available."""
runner, sentinel = _make_runner()
adapter = _make_adapter()
event = _make_event(text="yo")
sk = build_session_key(event.source)
agent = MagicMock()
agent.get_activity_summary.return_value = {
"api_call_count": 21,
"max_iterations": 60,
"current_tool": "terminal",
"last_activity_ts": time.time(),
"last_activity_desc": "terminal",
"seconds_since_activity": 0.5,
}
runner._running_agents[sk] = agent
runner._running_agents_ts[sk] = time.time() - 600 # 10 min
runner.adapters[event.source.platform] = adapter
await runner._handle_active_session_busy_message(event, sk)
call_kwargs = adapter._send_with_retry.call_args
content = call_kwargs.kwargs.get("content", "")
assert "21/60" in content # iteration
assert "terminal" in content # current tool
assert "10 min" in content # elapsed
@pytest.mark.asyncio
async def test_draining_still_works(self):
"""Draining case should still produce the drain-specific message."""
runner, sentinel = _make_runner()
runner._draining = True
adapter = _make_adapter()
event = _make_event(text="hello")
sk = build_session_key(event.source)
runner.adapters[event.source.platform] = adapter
# Mock the drain-specific methods
runner._queue_during_drain_enabled = lambda: False
runner._status_action_gerund = lambda: "restarting"
result = await runner._handle_active_session_busy_message(event, sk)
assert result is True
call_kwargs = adapter._send_with_retry.call_args
content = call_kwargs.kwargs.get("content", "")
assert "restarting" in content
@pytest.mark.asyncio
async def test_pending_sentinel_no_interrupt(self):
"""When agent is PENDING_SENTINEL, don't call interrupt (it has no method)."""
runner, sentinel = _make_runner()
adapter = _make_adapter()
event = _make_event(text="hey")
sk = build_session_key(event.source)
runner._running_agents[sk] = sentinel
runner._running_agents_ts[sk] = time.time()
runner.adapters[event.source.platform] = adapter
result = await runner._handle_active_session_busy_message(event, sk)
assert result is True
# Should still send ack
adapter._send_with_retry.assert_called_once()
@pytest.mark.asyncio
async def test_no_adapter_falls_through(self):
"""If adapter is missing, return False so default path handles it."""
runner, sentinel = _make_runner()
event = _make_event(text="hello")
sk = build_session_key(event.source)
# No adapter registered
runner._running_agents[sk] = MagicMock()
result = await runner._handle_active_session_busy_message(event, sk)
assert result is False # not handled, let default path try
+113
View File
@@ -193,6 +193,67 @@ class TestLoadGatewayConfig:
assert config.thread_sessions_per_user is False
def test_bridges_discord_channel_prompts_from_config_yaml(self, tmp_path, monkeypatch):
hermes_home = tmp_path / ".hermes"
hermes_home.mkdir()
config_path = hermes_home / "config.yaml"
config_path.write_text(
"discord:\n"
" channel_prompts:\n"
" \"123\": Research mode\n"
" 456: Therapist mode\n",
encoding="utf-8",
)
monkeypatch.setenv("HERMES_HOME", str(hermes_home))
config = load_gateway_config()
assert config.platforms[Platform.DISCORD].extra["channel_prompts"] == {
"123": "Research mode",
"456": "Therapist mode",
}
def test_bridges_telegram_channel_prompts_from_config_yaml(self, tmp_path, monkeypatch):
hermes_home = tmp_path / ".hermes"
hermes_home.mkdir()
config_path = hermes_home / "config.yaml"
config_path.write_text(
"telegram:\n"
" channel_prompts:\n"
' "-1001234567": Research assistant\n'
" 789: Creative writing\n",
encoding="utf-8",
)
monkeypatch.setenv("HERMES_HOME", str(hermes_home))
config = load_gateway_config()
assert config.platforms[Platform.TELEGRAM].extra["channel_prompts"] == {
"-1001234567": "Research assistant",
"789": "Creative writing",
}
def test_bridges_slack_channel_prompts_from_config_yaml(self, tmp_path, monkeypatch):
hermes_home = tmp_path / ".hermes"
hermes_home.mkdir()
config_path = hermes_home / "config.yaml"
config_path.write_text(
"slack:\n"
" channel_prompts:\n"
' "C01ABC": Code review mode\n',
encoding="utf-8",
)
monkeypatch.setenv("HERMES_HOME", str(hermes_home))
config = load_gateway_config()
assert config.platforms[Platform.SLACK].extra["channel_prompts"] == {
"C01ABC": "Code review mode",
}
def test_invalid_quick_commands_in_config_yaml_are_ignored(self, tmp_path, monkeypatch):
hermes_home = tmp_path / ".hermes"
hermes_home.mkdir()
@@ -223,6 +284,58 @@ class TestLoadGatewayConfig:
assert config.unauthorized_dm_behavior == "ignore"
assert config.platforms[Platform.WHATSAPP].extra["unauthorized_dm_behavior"] == "pair"
def test_bridges_telegram_disable_link_previews_from_config_yaml(self, tmp_path, monkeypatch):
hermes_home = tmp_path / ".hermes"
hermes_home.mkdir()
config_path = hermes_home / "config.yaml"
config_path.write_text(
"telegram:\n"
" disable_link_previews: true\n",
encoding="utf-8",
)
monkeypatch.setenv("HERMES_HOME", str(hermes_home))
config = load_gateway_config()
assert config.platforms[Platform.TELEGRAM].extra["disable_link_previews"] is True
def test_bridges_telegram_proxy_url_from_config_yaml(self, tmp_path, monkeypatch):
hermes_home = tmp_path / ".hermes"
hermes_home.mkdir()
config_path = hermes_home / "config.yaml"
config_path.write_text(
"telegram:\n"
" proxy_url: socks5://127.0.0.1:1080\n",
encoding="utf-8",
)
monkeypatch.setenv("HERMES_HOME", str(hermes_home))
monkeypatch.delenv("TELEGRAM_PROXY", raising=False)
load_gateway_config()
import os
assert os.environ.get("TELEGRAM_PROXY") == "socks5://127.0.0.1:1080"
def test_telegram_proxy_env_takes_precedence_over_config(self, tmp_path, monkeypatch):
hermes_home = tmp_path / ".hermes"
hermes_home.mkdir()
config_path = hermes_home / "config.yaml"
config_path.write_text(
"telegram:\n"
" proxy_url: http://from-config:8080\n",
encoding="utf-8",
)
monkeypatch.setenv("HERMES_HOME", str(hermes_home))
monkeypatch.setenv("TELEGRAM_PROXY", "socks5://from-env:1080")
load_gateway_config()
import os
assert os.environ.get("TELEGRAM_PROXY") == "socks5://from-env:1080"
class TestHomeChannelEnvOverrides:
"""Home channel env vars should apply even when the platform was already

Some files were not shown because too many files have changed in this diff Show More