Compare commits

...

26 Commits

Author SHA1 Message Date
Sam Herring
e3123be445 Removing old patches 2026-03-30 10:06:08 -07:00
Sam Herring
e46d5b2c13 Removing old files 2026-03-30 09:58:05 -07:00
Sam Herring
34cc666105 Updating with trainer config pieces 2026-03-30 09:46:24 -07:00
Sam Herring
d6832260f9 Fixing eval steps to be a set number of tasks 2026-03-30 09:46:24 -07:00
Sam Herring
d2652e980f Adding random jitter for agent temp to add variance into rollouts 2026-03-30 09:46:24 -07:00
Sam Herring
89cea9fd2d Test basic Atropos trainer 2026-03-30 09:46:24 -07:00
Sam Herring
143e72c145 Updating endless terminals env with silenced warnings 2026-03-30 09:46:24 -07:00
Sam Herring
51305b3f3d Tool call changes 2026-03-30 09:46:24 -07:00
Sam Herring
570e52b342 Monkey patching chat template kwargs 2026-03-30 09:46:24 -07:00
Sam Herring
d6e874491d Env changes for tool use 2026-03-30 09:46:24 -07:00
Sam Herring
dd3812dffe Adding tool call parser default 2026-03-30 09:46:24 -07:00
Sam Herring
6e17630bac Eval splits for holdout sets 2026-03-30 09:46:24 -07:00
Sam Herring
53b710b13f Changing return type to be ScoredDataGroup to account for multiple trajectories 2026-03-30 09:46:24 -07:00
Sam Herring
5b1e8059cb Added task sppecific metris and evals 2026-03-30 09:46:24 -07:00
Sam Herring
ff16a33cdd Wandb changes 2026-03-30 09:46:24 -07:00
Sam Herring
7cfb9eb1f6 Updating config 2026-03-30 09:46:24 -07:00
Sam Herring
c7b15f8ce1 Adding config init method 2026-03-30 09:46:24 -07:00
Sam Herring
7602c462ee Updating path vars and dataset loading 2026-03-30 09:46:24 -07:00
Sam Herring
e38c24363c Updating to use hermes-agent backend and parse container definition out of provided .sif files 2026-03-30 09:46:24 -07:00
Sam Herring
d768b244a5 Adding endless terminal environment after rebase: 2026-03-30 09:46:24 -07:00
Teknium
97d6813f51 fix(cache): use deterministic call_id fallbacks instead of random UUIDs (#3991)
When the API doesn't provide a call_id for tool calls, the fallback
generated a random uuid4 hex. This made every API call's input unique
when replayed, preventing OpenAI's prompt cache from matching the
prefix across turns.

Replaced all four uuid4 fallback sites with a deterministic hash of
(function_name, arguments, position_index). The same tool call now
always produces the same fallback call_id, preserving cache-friendly
input stability.

Affected code paths:
- _chat_messages_to_responses_input() — Codex input reconstruction
- _normalize_codex_response() — function_call and custom_tool_call
- _build_assistant_message() — assistant message construction
2026-03-30 09:43:56 -07:00
Teknium
37825189dd fix(skills): validate hub bundle paths before install (#3986)
Co-authored-by: Gutslabs <gutslabsxyz@gmail.com>
2026-03-30 08:37:19 -07:00
Teknium
e08778fa1e chore: release v0.6.0 (2026.3.30) (#3985) 2026-03-30 08:29:38 -07:00
Teknium
fb634068df fix(security): extend secret redaction to ElevenLabs, Tavily and Exa API keys (#3920)
ElevenLabs (sk_), Tavily (tvly-), and Exa (exa_) keys were not covered
by _PREFIX_PATTERNS, leaking in plain text via printenv or log output.

Salvaged from PR #3790 by @memosr. Tests rewritten with correct
assertions (original tests had vacuously true checks).

Co-authored-by: memosr <memosr@users.noreply.github.com>
2026-03-30 08:13:01 -07:00
Teknium
74181fe726 fix: add TTY guard to interactive CLI commands to prevent CPU spin (#3933)
When interactive TUI commands are invoked non-interactively (e.g. via
the agent's terminal() tool through a subprocess pipe), curses loops
spin at 100% CPU and input() calls hang indefinitely.

Defense in depth — two layers:

1. Source-level guard in curses_checklist() (curses_ui.py + checklist.py):
   Returns cancel_returns immediately when stdin is not a TTY. This
   catches ALL callers automatically, including future code.

2. Command-level guards with clear error messages:
   - hermes tools (interactive checklist, not list/disable/enable)
   - hermes setup (interactive wizard)
   - hermes model (provider/model picker)
   - hermes whatsapp (pairing setup)
   - hermes skills config (skill toggle)
   - hermes mcp configure (tool selection)
   - hermes uninstall (confirmation prompt)

Non-interactive subcommands (hermes tools list, hermes tools enable,
hermes mcp add/remove/list/test, hermes skills search/install/browse)
remain unaffected.
2026-03-30 08:10:23 -07:00
Teknium
1e896b0251 fix: resolve 7 failing CI tests (#3936)
1. matrix voice: _on_room_message_media unconditionally overwrote
   media_urls with the image cache path (always None for non-images),
   wiping the locally-cached voice path. Now only overrides when
   cached_path is truthy.

2. cli_tools_command: /tools disable no longer prompts for confirmation
   (input() removed in earlier commit to fix TUI hang), but tests still
   expected the old Y/N prompt flow. Updated tests to match current
   behavior (direct apply + session reset).

3. slack app_mention: connect() was refactored for multi-workspace
   (creates AsyncWebClient per token), but test only mocked the old
   self._app.client path. Added AsyncWebClient and acquire_scoped_lock
   mocks.

4. website_policy: module-level _cached_policy from earlier tests caused
   fast-path return of None. Added invalidate_cache() before assertion.

5. codex 401 refresh: already passing on current main (fixed by
   intervening commit).
2026-03-30 08:10:14 -07:00
23 changed files with 1821 additions and 45 deletions

249
RELEASE_v0.6.0.md Normal file
View File

@@ -0,0 +1,249 @@
# Hermes Agent v0.6.0 (v2026.3.30)
**Release Date:** March 30, 2026
> The multi-instance release — Profiles for running isolated agent instances, MCP server mode, Docker container, fallback provider chains, two new messaging platforms (Feishu/Lark and WeCom), Telegram webhook mode, Slack multi-workspace OAuth, 95 PRs and 16 resolved issues in 2 days.
---
## ✨ Highlights
- **Profiles — Multi-Instance Hermes** — Run multiple isolated Hermes instances from the same installation. Each profile gets its own config, memory, sessions, skills, and gateway service. Create with `hermes profile create`, switch with `hermes -p <name>`, export/import for sharing. Full token-lock isolation prevents two profiles from using the same bot credential. ([#3681](https://github.com/NousResearch/hermes-agent/pull/3681))
- **MCP Server Mode** — Expose Hermes conversations and sessions to any MCP-compatible client (Claude Desktop, Cursor, VS Code, etc.) via `hermes mcp serve`. Browse conversations, read messages, search across sessions, and manage attachments — all through the Model Context Protocol. Supports both stdio and Streamable HTTP transports. ([#3795](https://github.com/NousResearch/hermes-agent/pull/3795))
- **Docker Container** — Official Dockerfile for running Hermes Agent in a container. Supports both CLI and gateway modes with volume-mounted config. ([#3668](https://github.com/NousResearch/hermes-agent/pull/3668), closes [#850](https://github.com/NousResearch/hermes-agent/issues/850))
- **Ordered Fallback Provider Chain** — Configure multiple inference providers with automatic failover. When your primary provider returns errors or is unreachable, Hermes automatically tries the next provider in the chain. Configure via `fallback_providers` in config.yaml. ([#3813](https://github.com/NousResearch/hermes-agent/pull/3813), closes [#1734](https://github.com/NousResearch/hermes-agent/issues/1734))
- **Feishu/Lark Platform Support** — Full gateway adapter for Feishu (飞书) and Lark with event subscriptions, message cards, group chat, image/file attachments, and interactive card callbacks. ([#3799](https://github.com/NousResearch/hermes-agent/pull/3799), [#3817](https://github.com/NousResearch/hermes-agent/pull/3817), closes [#1788](https://github.com/NousResearch/hermes-agent/issues/1788))
- **WeCom (Enterprise WeChat) Platform Support** — New gateway adapter for WeCom (企业微信) with text/image/voice messages, group chats, and callback verification. ([#3847](https://github.com/NousResearch/hermes-agent/pull/3847))
- **Slack Multi-Workspace OAuth** — Connect a single Hermes gateway to multiple Slack workspaces via OAuth token file. Each workspace gets its own bot token, resolved dynamically per incoming event. ([#3903](https://github.com/NousResearch/hermes-agent/pull/3903))
- **Telegram Webhook Mode & Group Controls** — Run the Telegram adapter in webhook mode as an alternative to polling — faster response times and better for production deployments behind a reverse proxy. New group mention gating controls when the bot responds: always, only when @mentioned, or via regex triggers. ([#3880](https://github.com/NousResearch/hermes-agent/pull/3880), [#3870](https://github.com/NousResearch/hermes-agent/pull/3870))
- **Exa Search Backend** — Add Exa as an alternative web search and content extraction backend alongside Firecrawl and DuckDuckGo. Set `EXA_API_KEY` and configure as preferred backend. ([#3648](https://github.com/NousResearch/hermes-agent/pull/3648))
- **Skills & Credentials on Remote Backends** — Mount skill directories and credential files into Modal and Docker containers, so remote terminal sessions have access to the same skills and secrets as local execution. ([#3890](https://github.com/NousResearch/hermes-agent/pull/3890), [#3671](https://github.com/NousResearch/hermes-agent/pull/3671), closes [#3665](https://github.com/NousResearch/hermes-agent/issues/3665), [#3433](https://github.com/NousResearch/hermes-agent/issues/3433))
---
## 🏗️ Core Agent & Architecture
### Provider & Model Support
- **Ordered fallback provider chain** — automatic failover across multiple configured providers ([#3813](https://github.com/NousResearch/hermes-agent/pull/3813))
- **Fix api_mode on provider switch** — switching providers via `hermes model` now correctly clears stale `api_mode` instead of hardcoding `chat_completions`, fixing 404s for providers with Anthropic-compatible endpoints ([#3726](https://github.com/NousResearch/hermes-agent/pull/3726), [#3857](https://github.com/NousResearch/hermes-agent/pull/3857), closes [#3685](https://github.com/NousResearch/hermes-agent/issues/3685))
- **Stop silent OpenRouter fallback** — when no provider is configured, Hermes now raises a clear error instead of silently routing to OpenRouter ([#3807](https://github.com/NousResearch/hermes-agent/pull/3807), [#3862](https://github.com/NousResearch/hermes-agent/pull/3862))
- **Gemini 3.1 preview models** — added to OpenRouter and Nous Portal catalogs ([#3803](https://github.com/NousResearch/hermes-agent/pull/3803), closes [#3753](https://github.com/NousResearch/hermes-agent/issues/3753))
- **Gemini direct API context length** — full context length resolution for direct Google AI endpoints ([#3876](https://github.com/NousResearch/hermes-agent/pull/3876))
- **gpt-5.4-mini** added to Codex fallback catalog ([#3855](https://github.com/NousResearch/hermes-agent/pull/3855))
- **Curated model lists preferred** over live API probe when the probe returns fewer models ([#3856](https://github.com/NousResearch/hermes-agent/pull/3856), [#3867](https://github.com/NousResearch/hermes-agent/pull/3867))
- **User-friendly 429 rate limit messages** with Retry-After countdown ([#3809](https://github.com/NousResearch/hermes-agent/pull/3809))
- **Auxiliary client placeholder key** for local servers without auth requirements ([#3842](https://github.com/NousResearch/hermes-agent/pull/3842))
- **INFO-level logging** for auxiliary provider resolution ([#3866](https://github.com/NousResearch/hermes-agent/pull/3866))
### Agent Loop & Conversation
- **Subagent status reporting** — reports `completed` status when summary exists instead of generic failure ([#3829](https://github.com/NousResearch/hermes-agent/pull/3829))
- **Session log file updated during compression** — prevents stale file references after context compression ([#3835](https://github.com/NousResearch/hermes-agent/pull/3835))
- **Omit empty tools param** — sends no `tools` parameter when empty instead of `None`, fixing compatibility with strict providers ([#3820](https://github.com/NousResearch/hermes-agent/pull/3820))
### Profiles & Multi-Instance
- **Profiles system** — `hermes profile create/list/switch/delete/export/import/rename`. Each profile gets isolated HERMES_HOME, gateway service, CLI wrapper. Token locks prevent credential collisions. Tab completion for profile names. ([#3681](https://github.com/NousResearch/hermes-agent/pull/3681))
- **Profile-aware display paths** — all user-facing `~/.hermes` paths replaced with `display_hermes_home()` to show the correct profile directory ([#3623](https://github.com/NousResearch/hermes-agent/pull/3623))
- **Lazy display_hermes_home imports** — prevents `ImportError` during `hermes update` when modules cache stale bytecode ([#3776](https://github.com/NousResearch/hermes-agent/pull/3776))
- **HERMES_HOME for protected paths** — `.env` write-deny path now respects HERMES_HOME instead of hardcoded `~/.hermes` ([#3840](https://github.com/NousResearch/hermes-agent/pull/3840))
---
## 📱 Messaging Platforms (Gateway)
### New Platforms
- **Feishu/Lark** — Full adapter with event subscriptions, message cards, group chat, image/file attachments, interactive card callbacks ([#3799](https://github.com/NousResearch/hermes-agent/pull/3799), [#3817](https://github.com/NousResearch/hermes-agent/pull/3817))
- **WeCom (Enterprise WeChat)** — Text/image/voice messages, group chats, callback verification ([#3847](https://github.com/NousResearch/hermes-agent/pull/3847))
### Telegram
- **Webhook mode** — run as webhook endpoint instead of polling for production deployments ([#3880](https://github.com/NousResearch/hermes-agent/pull/3880))
- **Group mention gating & regex triggers** — configurable bot response behavior in groups: always, @mention-only, or regex-matched ([#3870](https://github.com/NousResearch/hermes-agent/pull/3870))
- **Gracefully handle deleted reply targets** — no more crashes when the message being replied to was deleted ([#3858](https://github.com/NousResearch/hermes-agent/pull/3858), closes [#3229](https://github.com/NousResearch/hermes-agent/issues/3229))
### Discord
- **Message processing reactions** — adds a reaction emoji while processing and removes it when done, giving visual feedback in channels ([#3871](https://github.com/NousResearch/hermes-agent/pull/3871))
- **DISCORD_IGNORE_NO_MENTION** — skip messages that @mention other users/bots but not Hermes ([#3640](https://github.com/NousResearch/hermes-agent/pull/3640))
- **Clean up deferred "thinking..."** — properly removes the "thinking..." indicator after slash commands complete ([#3674](https://github.com/NousResearch/hermes-agent/pull/3674), closes [#3595](https://github.com/NousResearch/hermes-agent/issues/3595))
### Slack
- **Multi-workspace OAuth** — connect to multiple Slack workspaces from a single gateway via OAuth token file ([#3903](https://github.com/NousResearch/hermes-agent/pull/3903))
### WhatsApp
- **Persistent aiohttp session** — reuse HTTP sessions across requests instead of creating new ones per message ([#3818](https://github.com/NousResearch/hermes-agent/pull/3818))
- **LID↔phone alias resolution** — correctly match Linked ID and phone number formats in allowlists ([#3830](https://github.com/NousResearch/hermes-agent/pull/3830))
- **Skip reply prefix in bot mode** — cleaner message formatting when running as a WhatsApp bot ([#3931](https://github.com/NousResearch/hermes-agent/pull/3931))
### Matrix
- **Native voice messages via MSC3245** — send voice messages as proper Matrix voice events instead of file attachments ([#3877](https://github.com/NousResearch/hermes-agent/pull/3877))
### Mattermost
- **Configurable mention behavior** — respond to messages without requiring @mention ([#3664](https://github.com/NousResearch/hermes-agent/pull/3664))
### Signal
- **URL-encode phone numbers** and correct attachment RPC parameter — fixes delivery failures with certain phone number formats ([#3670](https://github.com/NousResearch/hermes-agent/pull/3670)) — @kshitijk4poor
### Email
- **Close SMTP/IMAP connections on failure** — prevents connection leaks during error scenarios ([#3804](https://github.com/NousResearch/hermes-agent/pull/3804))
### Gateway Core
- **Atomic config writes** — use atomic file writes for config.yaml to prevent data loss during crashes ([#3800](https://github.com/NousResearch/hermes-agent/pull/3800))
- **Home channel env overrides** — apply environment variable overrides for home channels consistently ([#3796](https://github.com/NousResearch/hermes-agent/pull/3796), [#3808](https://github.com/NousResearch/hermes-agent/pull/3808))
- **Replace print() with logger** — BasePlatformAdapter now uses proper logging instead of print statements ([#3669](https://github.com/NousResearch/hermes-agent/pull/3669))
- **Cron delivery labels** — resolve human-friendly delivery labels via channel directory ([#3860](https://github.com/NousResearch/hermes-agent/pull/3860), closes [#1945](https://github.com/NousResearch/hermes-agent/issues/1945))
- **Cron [SILENT] tightening** — prevent agents from prefixing reports with [SILENT] to suppress delivery ([#3901](https://github.com/NousResearch/hermes-agent/pull/3901))
- **Background task media delivery** and vision download timeout fixes ([#3919](https://github.com/NousResearch/hermes-agent/pull/3919))
- **Boot-md hook** — example built-in hook to run a BOOT.md file on gateway startup ([#3733](https://github.com/NousResearch/hermes-agent/pull/3733))
---
## 🖥️ CLI & User Experience
### Interactive CLI
- **Configurable tool preview length** — show full file paths by default instead of truncating at 40 chars ([#3841](https://github.com/NousResearch/hermes-agent/pull/3841))
- **Tool token context display** — `hermes tools` checklist now shows estimated token cost per toolset ([#3805](https://github.com/NousResearch/hermes-agent/pull/3805))
- **/bg spinner TUI fix** — route background task spinner through the TUI widget to prevent status bar collision ([#3643](https://github.com/NousResearch/hermes-agent/pull/3643))
- **Prevent status bar wrapping** into duplicate rows ([#3883](https://github.com/NousResearch/hermes-agent/pull/3883)) — @kshitijk4poor
- **Handle closed stdout ValueError** in safe print paths — fixes crashes when stdout is closed during gateway thread shutdown ([#3843](https://github.com/NousResearch/hermes-agent/pull/3843), closes [#3534](https://github.com/NousResearch/hermes-agent/issues/3534))
- **Remove input() from /tools disable** — eliminates freeze in terminal when disabling tools ([#3918](https://github.com/NousResearch/hermes-agent/pull/3918))
- **TTY guard for interactive CLI commands** — prevent CPU spin when launched without a terminal ([#3933](https://github.com/NousResearch/hermes-agent/pull/3933))
- **Argparse entrypoint** — use argparse in the top-level launcher for cleaner error handling ([#3874](https://github.com/NousResearch/hermes-agent/pull/3874))
- **Lazy-initialized tools show yellow** in banner instead of red, reducing false alarm about "missing" tools ([#3822](https://github.com/NousResearch/hermes-agent/pull/3822))
- **Honcho tools shown in banner** when configured ([#3810](https://github.com/NousResearch/hermes-agent/pull/3810))
### Setup & Configuration
- **Auto-install matrix-nio** during `hermes setup` when Matrix is selected ([#3802](https://github.com/NousResearch/hermes-agent/pull/3802), [#3873](https://github.com/NousResearch/hermes-agent/pull/3873))
- **Session export stdout support** — export sessions to stdout with `-` for piping ([#3641](https://github.com/NousResearch/hermes-agent/pull/3641), closes [#3609](https://github.com/NousResearch/hermes-agent/issues/3609))
- **Configurable approval timeouts** — set how long dangerous command approval prompts wait before auto-denying ([#3886](https://github.com/NousResearch/hermes-agent/pull/3886), closes [#3765](https://github.com/NousResearch/hermes-agent/issues/3765))
- **Clear __pycache__ during update** — prevents stale bytecode ImportError after `hermes update` ([#3819](https://github.com/NousResearch/hermes-agent/pull/3819))
---
## 🔧 Tool System
### MCP
- **MCP Server Mode** — `hermes mcp serve` exposes conversations, sessions, and attachments to MCP clients via stdio or Streamable HTTP ([#3795](https://github.com/NousResearch/hermes-agent/pull/3795))
- **Dynamic tool discovery** — respond to `notifications/tools/list_changed` events to pick up new tools from MCP servers without reconnecting ([#3812](https://github.com/NousResearch/hermes-agent/pull/3812))
- **Non-deprecated HTTP transport** — switched from `sse_client` to `streamable_http_client` ([#3646](https://github.com/NousResearch/hermes-agent/pull/3646))
### Web Tools
- **Exa search backend** — alternative to Firecrawl and DuckDuckGo for web search and extraction ([#3648](https://github.com/NousResearch/hermes-agent/pull/3648))
### Browser
- **Guard against None LLM responses** in browser snapshot and vision tools ([#3642](https://github.com/NousResearch/hermes-agent/pull/3642))
### Terminal & Remote Backends
- **Mount skill directories** into Modal and Docker containers ([#3890](https://github.com/NousResearch/hermes-agent/pull/3890))
- **Mount credential files** into remote backends with mtime+size caching ([#3671](https://github.com/NousResearch/hermes-agent/pull/3671))
- **Preserve partial output** when commands time out instead of losing everything ([#3868](https://github.com/NousResearch/hermes-agent/pull/3868))
- **Stop marking persisted env vars as missing** on remote backends ([#3650](https://github.com/NousResearch/hermes-agent/pull/3650))
### Audio
- **.aac format support** in transcription tool ([#3865](https://github.com/NousResearch/hermes-agent/pull/3865), closes [#1963](https://github.com/NousResearch/hermes-agent/issues/1963))
- **Audio download retry** — retry logic for `cache_audio_from_url` matching the existing image download pattern ([#3401](https://github.com/NousResearch/hermes-agent/pull/3401)) — @binhnt92
### Vision
- **Reject non-image files** and enforce website-only policy for vision analysis ([#3845](https://github.com/NousResearch/hermes-agent/pull/3845))
### Tool Schema
- **Ensure name field** always present in tool definitions, fixing `KeyError: 'name'` crashes ([#3811](https://github.com/NousResearch/hermes-agent/pull/3811), closes [#3729](https://github.com/NousResearch/hermes-agent/issues/3729))
### ACP (Editor Integration)
- **Complete session management surface** for VS Code/Zed/JetBrains clients — proper task lifecycle, cancel support, session persistence ([#3675](https://github.com/NousResearch/hermes-agent/pull/3675))
---
## 🧩 Skills & Plugins
### Skills System
- **External skill directories** — configure additional skill directories via `skills.external_dirs` in config.yaml ([#3678](https://github.com/NousResearch/hermes-agent/pull/3678))
- **Category path traversal blocked** — prevents `../` attacks in skill category names ([#3844](https://github.com/NousResearch/hermes-agent/pull/3844))
- **parallel-cli moved to optional-skills** — reduces default skill footprint ([#3673](https://github.com/NousResearch/hermes-agent/pull/3673)) — @kshitijk4poor
### New Skills
- **memento-flashcards** — spaced repetition flashcard system ([#3827](https://github.com/NousResearch/hermes-agent/pull/3827))
- **songwriting-and-ai-music** — songwriting craft and AI music generation prompts ([#3834](https://github.com/NousResearch/hermes-agent/pull/3834))
- **SiYuan Note** — integration with SiYuan note-taking app ([#3742](https://github.com/NousResearch/hermes-agent/pull/3742))
- **Scrapling** — web scraping skill using Scrapling library ([#3742](https://github.com/NousResearch/hermes-agent/pull/3742))
- **one-three-one-rule** — communication framework skill ([#3797](https://github.com/NousResearch/hermes-agent/pull/3797))
### Plugin System
- **Plugin enable/disable commands** — `hermes plugins enable/disable <name>` for managing plugin state without removing them ([#3747](https://github.com/NousResearch/hermes-agent/pull/3747))
- **Plugin message injection** — plugins can now inject messages into the conversation stream on behalf of the user via `ctx.inject_message()` ([#3778](https://github.com/NousResearch/hermes-agent/pull/3778)) — @winglian
- **Honcho self-hosted support** — allow local Honcho instances without requiring an API key ([#3644](https://github.com/NousResearch/hermes-agent/pull/3644))
---
## 🔒 Security & Reliability
### Security Hardening
- **Hardened dangerous command detection** — expanded pattern matching for risky shell commands and added file tool path guards for sensitive locations (`/etc/`, `/boot/`, docker.sock) ([#3872](https://github.com/NousResearch/hermes-agent/pull/3872))
- **Sensitive path write checks** in approval system — catch writes to system config files through file tools, not just terminal ([#3859](https://github.com/NousResearch/hermes-agent/pull/3859))
- **Secret redaction expansion** — now covers ElevenLabs, Tavily, and Exa API keys ([#3920](https://github.com/NousResearch/hermes-agent/pull/3920))
- **Vision file rejection** — reject non-image files passed to vision analysis to prevent information disclosure ([#3845](https://github.com/NousResearch/hermes-agent/pull/3845))
- **Category path traversal blocking** — prevent directory traversal in skill category names ([#3844](https://github.com/NousResearch/hermes-agent/pull/3844))
### Reliability
- **Atomic config.yaml writes** — prevent data loss during gateway crashes ([#3800](https://github.com/NousResearch/hermes-agent/pull/3800))
- **Clear __pycache__ on update** — prevent stale bytecode from causing ImportError after updates ([#3819](https://github.com/NousResearch/hermes-agent/pull/3819))
- **Lazy imports for update safety** — prevent ImportError chains during `hermes update` when modules reference new functions ([#3776](https://github.com/NousResearch/hermes-agent/pull/3776))
- **Restore terminalbench2 from patch corruption** — recovered file damaged by patch tool's secret redaction ([#3801](https://github.com/NousResearch/hermes-agent/pull/3801))
- **Terminal timeout preserves partial output** — no more lost command output on timeout ([#3868](https://github.com/NousResearch/hermes-agent/pull/3868))
---
## 🐛 Notable Bug Fixes
- **OpenClaw migration model config overwrite** — migration no longer overwrites model config dict with a string ([#3924](https://github.com/NousResearch/hermes-agent/pull/3924)) — @0xbyt4
- **OpenClaw migration expanded** — covers full data footprint including sessions, cron, memory ([#3869](https://github.com/NousResearch/hermes-agent/pull/3869))
- **Telegram deleted reply targets** — gracefully handle replies to deleted messages instead of crashing ([#3858](https://github.com/NousResearch/hermes-agent/pull/3858))
- **Discord "thinking..." persistence** — properly cleans up deferred response indicators ([#3674](https://github.com/NousResearch/hermes-agent/pull/3674))
- **WhatsApp LID↔phone aliases** — fixes allowlist matching failures with Linked ID format ([#3830](https://github.com/NousResearch/hermes-agent/pull/3830))
- **Signal URL-encoded phone numbers** — fixes delivery failures with certain formats ([#3670](https://github.com/NousResearch/hermes-agent/pull/3670))
- **Email connection leaks** — properly close SMTP/IMAP connections on error ([#3804](https://github.com/NousResearch/hermes-agent/pull/3804))
- **_safe_print ValueError** — no more gateway thread crashes on closed stdout ([#3843](https://github.com/NousResearch/hermes-agent/pull/3843))
- **Tool schema KeyError 'name'** — ensure name field always present in tool definitions ([#3811](https://github.com/NousResearch/hermes-agent/pull/3811))
- **api_mode stale on provider switch** — correctly clear when switching providers via `hermes model` ([#3857](https://github.com/NousResearch/hermes-agent/pull/3857))
---
## 🧪 Testing
- Resolved 10+ CI failures across hooks, tiktoken, plugins, and skill tests ([#3848](https://github.com/NousResearch/hermes-agent/pull/3848), [#3721](https://github.com/NousResearch/hermes-agent/pull/3721), [#3936](https://github.com/NousResearch/hermes-agent/pull/3936))
---
## 📚 Documentation
- **Comprehensive OpenClaw migration guide** — step-by-step guide for migrating from OpenClaw/Claw3D to Hermes Agent ([#3864](https://github.com/NousResearch/hermes-agent/pull/3864), [#3900](https://github.com/NousResearch/hermes-agent/pull/3900))
- **Credential file passthrough docs** — document how to forward credential files and env vars to remote backends ([#3677](https://github.com/NousResearch/hermes-agent/pull/3677))
- **DuckDuckGo requirements clarified** — note runtime dependency on duckduckgo-search package ([#3680](https://github.com/NousResearch/hermes-agent/pull/3680))
- **Skills catalog updated** — added red-teaming category and optional skills listing ([#3745](https://github.com/NousResearch/hermes-agent/pull/3745))
- **Feishu docs MDX fix** — escape angle-bracket URLs that break Docusaurus build ([#3902](https://github.com/NousResearch/hermes-agent/pull/3902))
---
## 👥 Contributors
### Core
- **@teknium1** — 90 PRs across all subsystems
### Community Contributors
- **@kshitijk4poor** — 3 PRs: Signal phone number fix ([#3670](https://github.com/NousResearch/hermes-agent/pull/3670)), parallel-cli to optional-skills ([#3673](https://github.com/NousResearch/hermes-agent/pull/3673)), status bar wrapping fix ([#3883](https://github.com/NousResearch/hermes-agent/pull/3883))
- **@winglian** — 1 PR: Plugin message injection interface ([#3778](https://github.com/NousResearch/hermes-agent/pull/3778))
- **@binhnt92** — 1 PR: Audio download retry logic ([#3401](https://github.com/NousResearch/hermes-agent/pull/3401))
- **@0xbyt4** — 1 PR: OpenClaw migration model config fix ([#3924](https://github.com/NousResearch/hermes-agent/pull/3924))
### Issues Resolved from Community
@Material-Scientist ([#850](https://github.com/NousResearch/hermes-agent/issues/850)), @hanxu98121 ([#1734](https://github.com/NousResearch/hermes-agent/issues/1734)), @penwyp ([#1788](https://github.com/NousResearch/hermes-agent/issues/1788)), @dan-and ([#1945](https://github.com/NousResearch/hermes-agent/issues/1945)), @AdrianScott ([#1963](https://github.com/NousResearch/hermes-agent/issues/1963)), @clawdbot47 ([#3229](https://github.com/NousResearch/hermes-agent/issues/3229)), @alanfwilliams ([#3404](https://github.com/NousResearch/hermes-agent/issues/3404)), @kentimsit ([#3433](https://github.com/NousResearch/hermes-agent/issues/3433)), @hayka-pacha ([#3534](https://github.com/NousResearch/hermes-agent/issues/3534)), @primmer ([#3595](https://github.com/NousResearch/hermes-agent/issues/3595)), @dagelf ([#3609](https://github.com/NousResearch/hermes-agent/issues/3609)), @HenkDz ([#3685](https://github.com/NousResearch/hermes-agent/issues/3685)), @tmdgusya ([#3729](https://github.com/NousResearch/hermes-agent/issues/3729)), @TypQxQ ([#3753](https://github.com/NousResearch/hermes-agent/issues/3753)), @acsezen ([#3765](https://github.com/NousResearch/hermes-agent/issues/3765))
---
**Full Changelog**: [v2026.3.28...v2026.3.30](https://github.com/NousResearch/hermes-agent/compare/v2026.3.28...v2026.3.30)

View File

@@ -37,6 +37,9 @@ _PREFIX_PATTERNS = [
r"dop_v1_[A-Za-z0-9]{10,}", # DigitalOcean PAT
r"doo_v1_[A-Za-z0-9]{10,}", # DigitalOcean OAuth
r"am_[A-Za-z0-9_-]{10,}", # AgentMail API key
r"sk_[A-Za-z0-9_]{10,}", # ElevenLabs TTS key (sk_ underscore, not sk- dash)
r"tvly-[A-Za-z0-9]{10,}", # Tavily search API key
r"exa_[A-Za-z0-9]{10,}", # Exa search API key
]
# ENV assignment patterns: KEY=value where KEY contains a secret-like name

View File

@@ -13,6 +13,7 @@ Core layers:
Concrete environments:
- terminal_test_env/: Simple file-creation tasks for testing the stack
- hermes_swe_env/: SWE-bench style tasks with Modal sandboxes
- endless_terminals/: Terminal tasks from HuggingFace dataset with Apptainer containers
Benchmarks (eval-only):
- benchmarks/terminalbench_2/: Terminal-Bench 2.0 evaluation

View File

@@ -0,0 +1,5 @@
"""Endless Terminals Environment - Terminal task training from HuggingFace dataset."""
from .endless_terminals_env import EndlessTerminalsEnv, EndlessTerminalsEnvConfig
__all__ = ["EndlessTerminalsEnv", "EndlessTerminalsEnvConfig"]

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,91 @@
# Endless Terminals - Qwen3-4B-Instruct-2507
# Single config for both trainer (launch_training.py) and env (endless_terminals_env.py serve)
#
# Usage:
# Terminal 1: run-api
# Terminal 2: cd tinker-atropos && python launch_training.py --config ../environments/endless_terminals/tinker_qwen.yaml
# Terminal 3: python environments/endless_terminals/endless_terminals_env.py serve --config environments/endless_terminals/tinker_qwen.yaml
env:
# Toolsets
enabled_toolsets: ["terminal", "file"]
# Model / tokenizer
tokenizer_name: "Qwen/Qwen3-4B-Instruct-2507"
# Agent configuration
max_agent_turns: 16
max_token_length: 2048
agent_temperature: 0.6
extra_body:
chat_template_kwargs:
enable_thinking: false
tool_call_parser: "hermes"
# Terminal backend
terminal_backend: "docker"
# Dataset settings
use_dataset: true
dataset_name: "obiwan96/endless-terminals"
dataset_split: "train"
dataset_cache_dir: "~/.cache/huggingface/datasets"
tasks_base_dir: "/Users/samherring/Desktop/Projects/Hermes-Agent/endless-terminals"
# Test execution
test_timeout_s: 180
default_docker_image: "ubuntu:22.04"
max_concurrent_containers: 16
# Training configuration
group_size: 16
batch_size: 64 # 4 groups × 16 rollouts per step
total_steps: 500
steps_per_eval: 5
min_items_sent_before_logging: 1
ensure_scores_are_not_same: true
max_num_workers: 2048
worker_timeout: 3600
inference_weight: 1.0
eval_limit_ratio: 0.1
rollout_server_url: "http://localhost:8000"
# Evaluation configuration
num_eval_tasks: 20
eval_split_ratio: 0.1
# Logging
use_wandb: true
wandb_name: "endless-terminals-qwen3-4b"
# System prompt
system_prompt: >
You are a skilled Linux system administrator and programmer.
You have access to a terminal and file tools to complete system administration
and programming tasks. Use the tools effectively to solve the given task,
and verify your solution works correctly before finishing.
Keep each command short and focused — break complex tasks into multiple steps
rather than writing long one-liners.
tinker:
lora_rank: 32
learning_rate: 0.0000005
max_token_trainer_length: 32768
checkpoint_dir: "./temp/"
save_checkpoint_interval: 50
wandb_project: "endless-terminals"
wandb_group: null
wandb_run_name: "qwen3-4b"
tool_call_parser: "hermes"
openai:
- model_name: "Qwen/Qwen3-4B-Instruct-2507"
base_url: "http://localhost:8001/v1"
api_key: "x"
weight: 1.0
num_requests_for_eval: 64
timeout: 600
server_type: "sglang"
slurm: false
testing: false

View File

@@ -298,7 +298,6 @@ class HermesAgentBaseEnv(BaseEnv):
return False
server = self.server.servers[0]
# If the server is an OpenAI server (not VLLM/SGLang), use direct mode
from atroposlib.envs.server_handling.openai_server import OpenAIServer
return not isinstance(server, OpenAIServer)

View File

@@ -48,7 +48,13 @@ class HermesToolCallParser(ToolCallParser):
if not raw_json.strip():
continue
tc_data = json.loads(raw_json)
try:
tc_data = json.loads(raw_json)
except json.JSONDecodeError:
# Fix invalid backslash escapes from shell commands in JSON strings
# e.g. \s \w \d \n (unescaped) → \\s \\w \\d \\n
fixed = re.sub(r'\\([^"\\/bfnrtu0-9\n])', r'\\\\\1', raw_json)
tc_data = json.loads(fixed)
tool_calls.append(
ChatCompletionMessageToolCall(
id=f"call_{uuid.uuid4().hex[:8]}",

View File

@@ -904,8 +904,9 @@ class MatrixAdapter(BasePlatformAdapter):
thread_id=thread_id,
)
# Use cached local path for images, HTTP URL for other media types
media_urls = [cached_path] if cached_path else ([http_url] if http_url else None)
# Use cached local path for images (voice messages already handled above).
if cached_path:
media_urls = [cached_path]
media_types = [media_type] if media_urls else None
msg_event = MessageEvent(

View File

@@ -11,5 +11,5 @@ Provides subcommands for:
- hermes cron - Manage cron jobs
"""
__version__ = "0.5.0"
__release_date__ = "2026.3.28"
__version__ = "0.6.0"
__release_date__ = "2026.3.30"

View File

@@ -5,6 +5,7 @@ toggleable list of items. Falls back to a numbered text UI when
curses is unavailable (Windows without curses, piped stdin, etc.).
"""
import sys
from typing import List, Set
from hermes_cli.colors import Colors, color
@@ -26,6 +27,10 @@ def curses_checklist(
The indices the user confirmed as checked. On cancel (ESC/q),
returns ``pre_selected`` unchanged.
"""
# Safety: return defaults when stdin is not a terminal.
if not sys.stdin.isatty():
return set(pre_selected)
try:
import curses
selected = set(pre_selected)

View File

@@ -4,6 +4,7 @@ Used by `hermes tools` and `hermes skills` for interactive checklists.
Provides a curses multi-select with keyboard navigation, plus a
text-based numbered fallback for terminals without curses support.
"""
import sys
from typing import Callable, List, Optional, Set
from hermes_cli.colors import Colors, color
@@ -31,6 +32,11 @@ def curses_checklist(
if cancel_returns is None:
cancel_returns = set(selected)
# Safety: curses and input() both hang or spin when stdin is not a
# terminal (e.g. subprocess pipe). Return defaults immediately.
if not sys.stdin.isatty():
return cancel_returns
try:
import curses
chosen = set(selected)

View File

@@ -50,6 +50,23 @@ import sys
from pathlib import Path
from typing import Optional
def _require_tty(command_name: str) -> None:
"""Exit with a clear error if stdin is not a terminal.
Interactive TUI commands (hermes tools, hermes setup, hermes model) use
curses or input() prompts that spin at 100% CPU when stdin is a pipe.
This guard prevents accidental non-interactive invocation.
"""
if not sys.stdin.isatty():
print(
f"Error: 'hermes {command_name}' requires an interactive terminal.\n"
f"It cannot be run through a pipe or non-interactive subprocess.\n"
f"Run it directly in your terminal instead.",
file=sys.stderr,
)
sys.exit(1)
# Add project root to path
PROJECT_ROOT = Path(__file__).parent.parent.resolve()
sys.path.insert(0, str(PROJECT_ROOT))
@@ -617,6 +634,7 @@ def cmd_gateway(args):
def cmd_whatsapp(args):
"""Set up WhatsApp: choose mode, configure, install bridge, pair via QR."""
_require_tty("whatsapp")
import subprocess
from pathlib import Path
from hermes_cli.config import get_env_value, save_env_value
@@ -803,12 +821,14 @@ def cmd_whatsapp(args):
def cmd_setup(args):
"""Interactive setup wizard."""
_require_tty("setup")
from hermes_cli.setup import run_setup_wizard
run_setup_wizard(args)
def cmd_model(args):
"""Select default model — starts with provider selection, then model picker."""
_require_tty("model")
from hermes_cli.auth import (
resolve_provider, AuthError, format_auth_error,
)
@@ -2459,6 +2479,7 @@ def cmd_version(args):
def cmd_uninstall(args):
"""Uninstall Hermes Agent."""
_require_tty("uninstall")
from hermes_cli.uninstall import run_uninstall
run_uninstall(args)
@@ -4131,6 +4152,7 @@ For more help on a command:
def cmd_skills(args):
# Route 'config' action to skills_config module
if getattr(args, 'skills_action', None) == 'config':
_require_tty("skills config")
from hermes_cli.skills_config import skills_command as skills_config_command
skills_config_command(args)
else:
@@ -4341,6 +4363,7 @@ For more help on a command:
from hermes_cli.tools_config import tools_disable_enable_command
tools_disable_enable_command(args)
else:
_require_tty("tools")
from hermes_cli.tools_config import tools_command
tools_command(args)

View File

@@ -511,6 +511,10 @@ def _interpolate_value(value: str) -> str:
def cmd_mcp_configure(args):
"""Reconfigure which tools are enabled for an existing MCP server."""
import sys as _sys
if not _sys.stdin.isatty():
print("Error: 'hermes mcp configure' requires an interactive terminal.", file=_sys.stderr)
_sys.exit(1)
name = args.name
servers = _get_mcp_servers()

View File

@@ -354,7 +354,14 @@ def do_install(identifier: str, category: str = "", force: bool = False,
extra_metadata.update(getattr(bundle, "metadata", {}) or {})
# Quarantine the bundle
q_path = quarantine_bundle(bundle)
try:
q_path = quarantine_bundle(bundle)
except ValueError as exc:
c.print(f"[bold red]Installation blocked:[/] {exc}\n")
from tools.skills_hub import append_audit_log
append_audit_log("BLOCKED", bundle.name, bundle.source,
bundle.trust_level, "invalid_path", str(exc))
return
c.print(f"[dim]Quarantined to {q_path.relative_to(q_path.parent.parent.parent)}[/]")
# Scan
@@ -414,7 +421,15 @@ def do_install(identifier: str, category: str = "", force: bool = False,
return
# Install
install_dir = install_from_quarantine(q_path, bundle.name, category, bundle, result)
try:
install_dir = install_from_quarantine(q_path, bundle.name, category, bundle, result)
except ValueError as exc:
c.print(f"[bold red]Installation blocked:[/] {exc}\n")
shutil.rmtree(q_path, ignore_errors=True)
from tools.skills_hub import append_audit_log
append_audit_log("BLOCKED", bundle.name, bundle.source,
bundle.trust_level, "invalid_path", str(exc))
return
from tools.skills_hub import SKILLS_DIR
c.print(f"[bold green]Installed:[/] {install_dir.relative_to(SKILLS_DIR)}")
c.print(f"[dim]Files: {', '.join(bundle.files.keys())}[/]\n")

View File

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project]
name = "hermes-agent"
version = "0.5.0"
version = "0.6.0"
description = "The self-improving AI agent — creates skills from experience, improves them during use, and runs anywhere"
readme = "README.md"
requires-python = ">=3.11"

View File

@@ -2907,6 +2907,19 @@ class AIAgent:
})
return converted or None
@staticmethod
def _deterministic_call_id(fn_name: str, arguments: str, index: int = 0) -> str:
"""Generate a deterministic call_id from tool call content.
Used as a fallback when the API doesn't provide a call_id.
Deterministic IDs prevent cache invalidation — random UUIDs would
make every API call's prefix unique, breaking OpenAI's prompt cache.
"""
import hashlib
seed = f"{fn_name}:{arguments}:{index}"
digest = hashlib.sha256(seed.encode("utf-8", errors="replace")).hexdigest()[:12]
return f"call_{digest}"
@staticmethod
def _split_responses_tool_id(raw_id: Any) -> tuple[Optional[str], Optional[str]]:
"""Split a stored tool id into (call_id, response_item_id)."""
@@ -3013,7 +3026,8 @@ class AIAgent:
):
call_id = f"call_{embedded_response_item_id[len('fc_'):]}"
else:
call_id = f"call_{uuid.uuid4().hex[:12]}"
_raw_args = str(fn.get("arguments", "{}"))
call_id = self._deterministic_call_id(fn_name, _raw_args, len(items))
call_id = call_id.strip()
arguments = fn.get("arguments", "{}")
@@ -3377,7 +3391,7 @@ class AIAgent:
embedded_call_id, _ = self._split_responses_tool_id(raw_item_id)
call_id = raw_call_id if isinstance(raw_call_id, str) and raw_call_id.strip() else embedded_call_id
if not isinstance(call_id, str) or not call_id.strip():
call_id = f"call_{uuid.uuid4().hex[:12]}"
call_id = self._deterministic_call_id(fn_name, arguments, len(tool_calls))
call_id = call_id.strip()
response_item_id = raw_item_id if isinstance(raw_item_id, str) else None
response_item_id = self._derive_responses_function_call_id(call_id, response_item_id)
@@ -3398,7 +3412,7 @@ class AIAgent:
embedded_call_id, _ = self._split_responses_tool_id(raw_item_id)
call_id = raw_call_id if isinstance(raw_call_id, str) and raw_call_id.strip() else embedded_call_id
if not isinstance(call_id, str) or not call_id.strip():
call_id = f"call_{uuid.uuid4().hex[:12]}"
call_id = self._deterministic_call_id(fn_name, arguments, len(tool_calls))
call_id = call_id.strip()
response_item_id = raw_item_id if isinstance(raw_item_id, str) else None
response_item_id = self._derive_responses_function_call_id(call_id, response_item_id)
@@ -4933,7 +4947,10 @@ class AIAgent:
if isinstance(raw_id, str) and raw_id.strip():
call_id = raw_id.strip()
else:
call_id = f"call_{uuid.uuid4().hex[:12]}"
_fn = getattr(tool_call, "function", None)
_fn_name = getattr(_fn, "name", "") if _fn else ""
_fn_args = getattr(_fn, "arguments", "{}") if _fn else "{}"
call_id = self._deterministic_call_id(_fn_name, _fn_args, len(tool_calls))
call_id = call_id.strip()
response_item_id = getattr(tool_call, "response_item_id", None)

View File

@@ -201,3 +201,52 @@ class TestSecretCapturePayloadRedaction:
text = '{"raw_secret": "ghp_abc123def456ghi789jkl"}'
result = redact_sensitive_text(text)
assert "abc123def456" not in result
class TestElevenLabsTavilyExaKeys:
"""Regression tests for ElevenLabs (sk_), Tavily (tvly-), and Exa (exa_) keys."""
def test_elevenlabs_key_redacted(self):
text = "ELEVENLABS_API_KEY=sk_abc123def456ghi789jklmnopqrstu"
result = redact_sensitive_text(text)
assert "abc123def456ghi" not in result
def test_elevenlabs_key_in_log_line(self):
text = "Connecting to ElevenLabs with key sk_abc123def456ghi789jklmnopqrstu"
result = redact_sensitive_text(text)
assert "abc123def456ghi" not in result
def test_tavily_key_redacted(self):
text = "TAVILY_API_KEY=tvly-ABCdef123456789GHIJKL0000"
result = redact_sensitive_text(text)
assert "ABCdef123456789" not in result
def test_tavily_key_in_log_line(self):
text = "Initialising Tavily client with tvly-ABCdef123456789GHIJKL0000"
result = redact_sensitive_text(text)
assert "ABCdef123456789" not in result
def test_exa_key_redacted(self):
text = "EXA_API_KEY=exa_XYZ789abcdef000000000000000"
result = redact_sensitive_text(text)
assert "XYZ789abcdef" not in result
def test_exa_key_in_log_line(self):
text = "Using Exa client with key exa_XYZ789abcdef000000000000000"
result = redact_sensitive_text(text)
assert "XYZ789abcdef" not in result
def test_all_three_in_env_dump(self):
env_dump = (
"HOME=/home/user\n"
"ELEVENLABS_API_KEY=sk_abc123def456ghi789jklmnopqrstu\n"
"TAVILY_API_KEY=tvly-ABCdef123456789GHIJKL0000\n"
"EXA_API_KEY=exa_XYZ789abcdef000000000000000\n"
"SHELL=/bin/bash\n"
)
result = redact_sensitive_text(env_dump)
assert "abc123def456ghi" not in result
assert "ABCdef123456789" not in result
assert "XYZ789abcdef" not in result
assert "HOME=/home/user" in result
assert "SHELL=/bin/bash" in result

View File

@@ -126,9 +126,20 @@ class TestAppMentionHandler:
"user": "testbot",
})
# Mock AsyncWebClient so multi-workspace auth_test is awaitable
mock_web_client = AsyncMock()
mock_web_client.auth_test = AsyncMock(return_value={
"user_id": "U_BOT",
"user": "testbot",
"team_id": "T_FAKE",
"team": "FakeTeam",
})
with patch.object(_slack_mod, "AsyncApp", return_value=mock_app), \
patch.object(_slack_mod, "AsyncWebClient", return_value=mock_web_client), \
patch.object(_slack_mod, "AsyncSocketModeHandler", return_value=MagicMock()), \
patch.dict(os.environ, {"SLACK_APP_TOKEN": "xapp-fake"}), \
patch("gateway.status.acquire_scoped_lock", return_value=(True, None)), \
patch("asyncio.create_task"):
asyncio.run(adapter.connect())

View File

@@ -60,34 +60,43 @@ class TestToolsSlashList:
class TestToolsSlashDisableWithReset:
def test_disable_confirms_then_resets_session(self):
def test_disable_applies_directly_and_resets_session(self):
"""Disable applies immediately (no confirmation prompt) and resets session."""
cli_obj = _make_cli(["web", "memory"])
with patch("hermes_cli.tools_config.load_config",
return_value={"platform_toolsets": {"cli": ["web", "memory"]}}), \
patch("hermes_cli.tools_config.save_config"), \
patch("hermes_cli.tools_config._get_platform_tools", return_value={"memory"}), \
patch("hermes_cli.config.load_config", return_value={}), \
patch.object(cli_obj, "new_session") as mock_reset, \
patch("builtins.input", return_value="y"):
patch.object(cli_obj, "new_session") as mock_reset:
cli_obj._handle_tools_command("/tools disable web")
mock_reset.assert_called_once()
assert "web" not in cli_obj.enabled_toolsets
def test_disable_cancelled_does_not_reset(self):
def test_disable_does_not_prompt_for_confirmation(self):
"""Disable no longer uses input() — it applies directly."""
cli_obj = _make_cli(["web", "memory"])
with patch.object(cli_obj, "new_session") as mock_reset, \
patch("builtins.input", return_value="n"):
with patch("hermes_cli.tools_config.load_config",
return_value={"platform_toolsets": {"cli": ["web", "memory"]}}), \
patch("hermes_cli.tools_config.save_config"), \
patch("hermes_cli.tools_config._get_platform_tools", return_value={"memory"}), \
patch("hermes_cli.config.load_config", return_value={}), \
patch.object(cli_obj, "new_session"), \
patch("builtins.input") as mock_input:
cli_obj._handle_tools_command("/tools disable web")
mock_reset.assert_not_called()
# Toolsets unchanged
assert cli_obj.enabled_toolsets == {"web", "memory"}
mock_input.assert_not_called()
def test_disable_eof_cancels(self):
def test_disable_always_resets_session(self):
"""Even without a confirmation prompt, disable always resets the session."""
cli_obj = _make_cli(["web", "memory"])
with patch.object(cli_obj, "new_session") as mock_reset, \
patch("builtins.input", side_effect=EOFError):
with patch("hermes_cli.tools_config.load_config",
return_value={"platform_toolsets": {"cli": ["web", "memory"]}}), \
patch("hermes_cli.tools_config.save_config"), \
patch("hermes_cli.tools_config._get_platform_tools", return_value={"memory"}), \
patch("hermes_cli.config.load_config", return_value={}), \
patch.object(cli_obj, "new_session") as mock_reset:
cli_obj._handle_tools_command("/tools disable web")
mock_reset.assert_not_called()
mock_reset.assert_called_once()
def test_disable_missing_name_prints_usage(self, capsys):
cli_obj = _make_cli()
@@ -101,15 +110,15 @@ class TestToolsSlashDisableWithReset:
class TestToolsSlashEnableWithReset:
def test_enable_confirms_then_resets_session(self):
def test_enable_applies_directly_and_resets_session(self):
"""Enable applies immediately (no confirmation prompt) and resets session."""
cli_obj = _make_cli(["memory"])
with patch("hermes_cli.tools_config.load_config",
return_value={"platform_toolsets": {"cli": ["memory"]}}), \
patch("hermes_cli.tools_config.save_config"), \
patch("hermes_cli.tools_config._get_platform_tools", return_value={"memory", "web"}), \
patch("hermes_cli.config.load_config", return_value={}), \
patch.object(cli_obj, "new_session") as mock_reset, \
patch("builtins.input", return_value="y"):
patch.object(cli_obj, "new_session") as mock_reset:
cli_obj._handle_tools_command("/tools enable web")
mock_reset.assert_called_once()
assert "web" in cli_obj.enabled_toolsets

View File

@@ -5,6 +5,7 @@ from pathlib import Path
from unittest.mock import patch, MagicMock
import httpx
import pytest
from tools.skills_hub import (
GitHubAuth,
@@ -648,6 +649,29 @@ class TestWellKnownSkillSource:
assert bundle.files["SKILL.md"] == "# Code Review\n"
assert bundle.files["references/checklist.md"] == "- [ ] security\n"
@patch("tools.skills_hub._write_index_cache")
@patch("tools.skills_hub._read_index_cache", return_value=None)
@patch("tools.skills_hub.httpx.get")
def test_fetch_rejects_unsafe_file_paths_from_well_known_endpoint(self, mock_get, _mock_read_cache, _mock_write_cache):
def fake_get(url, *args, **kwargs):
if url.endswith("/index.json"):
return MagicMock(status_code=200, json=lambda: {
"skills": [{
"name": "code-review",
"description": "Review code",
"files": ["SKILL.md", "../../../escape.txt"],
}]
})
if url.endswith("/code-review/SKILL.md"):
return MagicMock(status_code=200, text="# Code Review\n")
raise AssertionError(url)
mock_get.side_effect = fake_get
bundle = self._source().fetch("well-known:https://example.com/.well-known/skills/code-review")
assert bundle is None
class TestCheckForSkillUpdates:
def test_bundle_content_hash_matches_installed_content_hash(self, tmp_path):
@@ -1143,6 +1167,61 @@ class TestQuarantineBundleBinaryAssets:
assert (q_path / "SKILL.md").read_text(encoding="utf-8").startswith("---")
assert (q_path / "assets" / "neutts-cli" / "samples" / "jo.wav").read_bytes() == b"RIFF\x00\x01fakewav"
def test_quarantine_bundle_rejects_traversal_file_paths(self, tmp_path):
import tools.skills_hub as hub
hub_dir = tmp_path / "skills" / ".hub"
with patch.object(hub, "SKILLS_DIR", tmp_path / "skills"), \
patch.object(hub, "HUB_DIR", hub_dir), \
patch.object(hub, "LOCK_FILE", hub_dir / "lock.json"), \
patch.object(hub, "QUARANTINE_DIR", hub_dir / "quarantine"), \
patch.object(hub, "AUDIT_LOG", hub_dir / "audit.log"), \
patch.object(hub, "TAPS_FILE", hub_dir / "taps.json"), \
patch.object(hub, "INDEX_CACHE_DIR", hub_dir / "index-cache"):
bundle = SkillBundle(
name="demo",
files={
"SKILL.md": "---\nname: demo\n---\n",
"../../../escape.txt": "owned",
},
source="well-known",
identifier="well-known:https://example.com/.well-known/skills/demo",
trust_level="community",
)
with pytest.raises(ValueError, match="Unsafe bundle file path"):
quarantine_bundle(bundle)
assert not (tmp_path / "skills" / "escape.txt").exists()
def test_quarantine_bundle_rejects_absolute_file_paths(self, tmp_path):
import tools.skills_hub as hub
hub_dir = tmp_path / "skills" / ".hub"
absolute_target = tmp_path / "outside.txt"
with patch.object(hub, "SKILLS_DIR", tmp_path / "skills"), \
patch.object(hub, "HUB_DIR", hub_dir), \
patch.object(hub, "LOCK_FILE", hub_dir / "lock.json"), \
patch.object(hub, "QUARANTINE_DIR", hub_dir / "quarantine"), \
patch.object(hub, "AUDIT_LOG", hub_dir / "audit.log"), \
patch.object(hub, "TAPS_FILE", hub_dir / "taps.json"), \
patch.object(hub, "INDEX_CACHE_DIR", hub_dir / "index-cache"):
bundle = SkillBundle(
name="demo",
files={
"SKILL.md": "---\nname: demo\n---\n",
str(absolute_target): "owned",
},
source="well-known",
identifier="well-known:https://example.com/.well-known/skills/demo",
trust_level="community",
)
with pytest.raises(ValueError, match="Unsafe bundle file path"):
quarantine_bundle(bundle)
assert not absolute_target.exists()
# ---------------------------------------------------------------------------
# GitHubSource._download_directory — tree API + fallback (#2940)

View File

@@ -259,6 +259,12 @@ def test_check_website_access_uses_dynamic_hermes_home(monkeypatch, tmp_path):
monkeypatch.setenv("HERMES_HOME", str(hermes_home))
# Invalidate the module-level cache so the new HERMES_HOME is picked up.
# A prior test may have cached a default policy (enabled=False) under the
# old HERMES_HOME set by the autouse _isolate_hermes_home fixture.
from tools.website_policy import invalidate_cache
invalidate_cache()
blocked = check_website_access("https://dynamic.example/path")
assert blocked is not None

View File

@@ -24,7 +24,7 @@ import time
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from pathlib import Path, PurePosixPath
from hermes_constants import get_hermes_home
from typing import Any, Dict, List, Optional, Tuple, Union
from urllib.parse import urlparse, urlunparse
@@ -85,6 +85,43 @@ class SkillBundle:
metadata: Dict[str, Any] = field(default_factory=dict)
def _normalize_bundle_path(path_value: str, *, field_name: str, allow_nested: bool) -> str:
"""Normalize and validate bundle-controlled paths before touching disk."""
if not isinstance(path_value, str):
raise ValueError(f"Unsafe {field_name}: expected a string")
raw = path_value.strip()
if not raw:
raise ValueError(f"Unsafe {field_name}: empty path")
normalized = raw.replace("\\", "/")
path = PurePosixPath(normalized)
parts = [part for part in path.parts if part not in ("", ".")]
if normalized.startswith("/") or path.is_absolute():
raise ValueError(f"Unsafe {field_name}: {path_value}")
if not parts or any(part == ".." for part in parts):
raise ValueError(f"Unsafe {field_name}: {path_value}")
if re.fullmatch(r"[A-Za-z]:", parts[0]):
raise ValueError(f"Unsafe {field_name}: {path_value}")
if not allow_nested and len(parts) != 1:
raise ValueError(f"Unsafe {field_name}: {path_value}")
return "/".join(parts)
def _validate_skill_name(name: str) -> str:
return _normalize_bundle_path(name, field_name="skill name", allow_nested=False)
def _validate_category_name(category: str) -> str:
return _normalize_bundle_path(category, field_name="category", allow_nested=False)
def _validate_bundle_rel_path(rel_path: str) -> str:
return _normalize_bundle_path(rel_path, field_name="bundle file path", allow_nested=True)
# ---------------------------------------------------------------------------
# GitHub Authentication
# ---------------------------------------------------------------------------
@@ -701,6 +738,12 @@ class WellKnownSkillSource(SkillSource):
if not parsed:
return None
try:
skill_name = _validate_skill_name(parsed["skill_name"])
except ValueError:
logger.warning("Well-known skill identifier contained unsafe skill name: %s", identifier)
return None
entry = self._index_entry(parsed["index_url"], parsed["skill_name"])
if not entry:
return None
@@ -713,19 +756,28 @@ class WellKnownSkillSource(SkillSource):
for rel_path in files:
if not isinstance(rel_path, str) or not rel_path:
continue
text = self._fetch_text(f"{parsed['skill_url']}/{rel_path}")
try:
safe_rel_path = _validate_bundle_rel_path(rel_path)
except ValueError:
logger.warning(
"Well-known skill %s advertised unsafe file path: %r",
identifier,
rel_path,
)
return None
text = self._fetch_text(f"{parsed['skill_url']}/{safe_rel_path}")
if text is None:
return None
downloaded[rel_path] = text
downloaded[safe_rel_path] = text
if "SKILL.md" not in downloaded:
return None
return SkillBundle(
name=parsed["skill_name"],
name=skill_name,
files=downloaded,
source="well-known",
identifier=self._wrap_identifier(parsed["base_url"], parsed["skill_name"]),
identifier=self._wrap_identifier(parsed["base_url"], skill_name),
trust_level="community",
metadata={
"index_url": parsed["index_url"],
@@ -1752,9 +1804,10 @@ class ClawHubSource(SkillSource):
for info in zf.infolist():
if info.is_dir():
continue
# Sanitize path — strip leading slashes and ..
name = info.filename.lstrip("/")
if ".." in name or name.startswith("/"):
try:
name = _validate_bundle_rel_path(info.filename)
except ValueError:
logger.debug("Skipping unsafe ZIP member path: %s", info.filename)
continue
# Only extract text-sized files (skip large binaries)
if info.file_size > 500_000:
@@ -2423,13 +2476,19 @@ def ensure_hub_dirs() -> None:
def quarantine_bundle(bundle: SkillBundle) -> Path:
"""Write a skill bundle to the quarantine directory for scanning."""
ensure_hub_dirs()
dest = QUARANTINE_DIR / bundle.name
skill_name = _validate_skill_name(bundle.name)
validated_files: List[Tuple[str, Union[str, bytes]]] = []
for rel_path, file_content in bundle.files.items():
safe_rel_path = _validate_bundle_rel_path(rel_path)
validated_files.append((safe_rel_path, file_content))
dest = QUARANTINE_DIR / skill_name
if dest.exists():
shutil.rmtree(dest)
dest.mkdir(parents=True)
for rel_path, file_content in bundle.files.items():
file_dest = dest / rel_path
for rel_path, file_content in validated_files:
file_dest = dest.joinpath(*rel_path.split("/"))
file_dest.parent.mkdir(parents=True, exist_ok=True)
if isinstance(file_content, bytes):
file_dest.write_bytes(file_content)
@@ -2447,10 +2506,17 @@ def install_from_quarantine(
scan_result: ScanResult,
) -> Path:
"""Move a scanned skill from quarantine into the skills directory."""
if category:
install_dir = SKILLS_DIR / category / skill_name
safe_skill_name = _validate_skill_name(skill_name)
safe_category = _validate_category_name(category) if category else ""
quarantine_resolved = quarantine_path.resolve()
quarantine_root = QUARANTINE_DIR.resolve()
if not quarantine_resolved.is_relative_to(quarantine_root):
raise ValueError(f"Unsafe quarantine path: {quarantine_path}")
if safe_category:
install_dir = SKILLS_DIR / safe_category / safe_skill_name
else:
install_dir = SKILLS_DIR / skill_name
install_dir = SKILLS_DIR / safe_skill_name
if install_dir.exists():
shutil.rmtree(install_dir)
@@ -2461,7 +2527,7 @@ def install_from_quarantine(
# Record in lock file
lock = HubLockFile()
lock.record_install(
name=skill_name,
name=safe_skill_name,
source=bundle.source,
identifier=bundle.identifier,
trust_level=bundle.trust_level,
@@ -2473,7 +2539,7 @@ def install_from_quarantine(
)
append_audit_log(
"INSTALL", skill_name, bundle.source,
"INSTALL", safe_skill_name, bundle.source,
bundle.trust_level, scan_result.verdict,
content_hash(install_dir),
)