Compare commits

...

55 Commits

Author SHA1 Message Date
Teknium a18884a3d0 fix(run_agent): enhance streaming error handling and retry logic
Improved the error handling and retry mechanism for streaming requests in the AIAgent class. Introduced a configurable maximum number of stream retries and refined the handling of transient network errors, allowing for retries with fresh connections. Non-transient errors now trigger a fallback to non-streaming only when appropriate, ensuring better resilience during API interactions.
2026-03-25 08:33:22 -07:00
Teknium 29d3f1216b fix(run_agent): reduce default stream read timeout for chat completions
Updated the default stream read timeout from 120 seconds to 60 seconds in the AIAgent class, enhancing the timeout configuration for chat completions. This change aims to improve responsiveness during streaming operations.
2026-03-25 08:19:43 -07:00
Teknium fe37a53b75 fix(run_agent): improve timeout handling for chat completions
Enhanced the timeout configuration for chat completions in the AIAgent class by introducing customizable connection, read, and write timeouts using environment variables. This ensures more robust handling of API requests during streaming operations.
2026-03-25 08:12:22 -07:00
Teknium b6ef1deafd fix(run_agent): ensure _fire_first_delta() is called for tool generation events
Added calls to _fire_first_delta() in the AIAgent class to improve the handling of tool generation events, ensuring timely notifications during the processing of function calls and tool usage.
2026-03-25 07:39:49 -07:00
Teknium 0f3c191ef1 fix(cli): enhance real-time reasoning output by forcing flush of long partial lines
Updated the reasoning output mechanism to emit complete lines and force-flush long partial lines, ensuring reasoning is visible in real-time even without newlines. This improves user experience during reasoning sessions.
2026-03-24 19:56:30 -07:00
Teknium 7cdf4efe05 fix(skills): agent-created skills were incorrectly treated as untrusted community content
_resolve_trust_level() didn't handle 'agent-created' source, so it
fell through to 'community' trust level. Community policy blocks on
any caution or dangerous findings, which meant common patterns like
curl with env vars, systemctl, crontab, cloudflared references etc.
would block skill creation/patching.

The agent-created policy row already existed in INSTALL_POLICY with
permissive settings (allow caution, ask on dangerous) but was never
reached. Now it is.

Fixes reports of skill_manage being blocked by security scanner.
2026-03-24 19:56:30 -07:00
Teknium adee8d1b5f fix: browser_vision ignores auxiliary.vision.timeout config (#2901)
* docs: unify hooks documentation — add plugin hooks to hooks page, add session:end event

The hooks page only documented gateway event hooks (HOOK.yaml system).
The plugins page listed plugin hooks (pre_tool_call, etc.) that weren't
referenced from the hooks page, which was confusing.

Changes:
- hooks.md: Add overview table showing both hook systems
- hooks.md: Add Plugin Hooks section with available hooks, callback
  signatures, and example
- hooks.md: Add missing session:end gateway event (emitted but undocumented)
- hooks.md: Mark pre_llm_call, post_llm_call, on_session_start,
  on_session_end as planned (defined in VALID_HOOKS but not yet invoked)
- hooks.md: Update info box to cross-reference plugin hooks
- hooks.md: Fix heading hierarchy (gateway content as subsections)
- plugins.md: Add cross-reference to hooks page for full details
- plugins.md: Mark planned hooks as (planned)

* fix: browser_vision ignores auxiliary.vision.timeout config

browser_vision called call_llm() without passing a timeout parameter,
so it always used the 30-second default in auxiliary_client.py. This
made vision analysis with local models (llama.cpp, ollama) impossible
since they typically need more than 30s for screenshot analysis.

Now browser_vision reads auxiliary.vision.timeout from config.yaml
(same config key that vision_analyze already uses) and passes it
through to call_llm().

Also bumped the default vision timeout from 30s to 120s in both
browser_vision and vision_analyze — 30s is too aggressive for local
models and the previous default silently failed for anyone running
vision locally.

Fixes user report from GamerGB1988.
2026-03-24 19:56:30 -07:00
Teknium f5b84dddfd fix(compression): restore sane defaults and cap summary at 12K tokens
- threshold: 0.80 → 0.50 (compress at 50%, not 80%)
- target_ratio: 0.40 → 0.20, now relative to threshold not total context
  (20% of 50% = 10% of context as tail budget)
- summary ceiling: 32K → 12K (Gemini can't output more than ~12K)
- Updated DEFAULT_CONFIG, config display, example config, and tests
2026-03-24 19:56:30 -07:00
Teknium 4549a2f51a docs: clarify two-mode behavior in session_search schema description 2026-03-24 19:56:30 -07:00
Teknium 466720c2f3 feat(session_search): add recent sessions mode when query is omitted
When session_search is called without a query (or with an empty query),
it now returns metadata for the most recent sessions instead of erroring.
This lets the agent quickly see what was worked on recently without
needing specific keywords.

Returns for each session: session_id, title, source, started_at,
last_active, message_count, preview (first user message).
Zero LLM cost — pure DB query. Current session lineage and child
delegation sessions are excluded.

The agent can then keyword-search specific sessions if it needs
deeper context from any of them.
2026-03-24 19:56:30 -07:00
Teknium fccd7a2ab4 docs: unify hooks documentation — add plugin hooks to hooks page, add session:end event
The hooks page only documented gateway event hooks (HOOK.yaml system).
The plugins page listed plugin hooks (pre_tool_call, etc.) that weren't
referenced from the hooks page, which was confusing.

Changes:
- hooks.md: Add overview table showing both hook systems
- hooks.md: Add Plugin Hooks section with available hooks, callback
  signatures, and example
- hooks.md: Add missing session:end gateway event (emitted but undocumented)
- hooks.md: Mark pre_llm_call, post_llm_call, on_session_start,
  on_session_end as planned (defined in VALID_HOOKS but not yet invoked)
- hooks.md: Update info box to cross-reference plugin hooks
- hooks.md: Fix heading hierarchy (gateway content as subsections)
- plugins.md: Add cross-reference to hooks page for full details
- plugins.md: Mark planned hooks as (planned)
2026-03-24 18:34:14 -07:00
Teknium 27c023e071 feat(config): expose compression target_ratio, protect_last_n, and threshold in DEFAULT_CONFIG
PR #2554 made these configurable via config.yaml but didn't add them
to DEFAULT_CONFIG or the config display. Users couldn't discover the
new knobs without reading the source.

- threshold: 0.80 (compress at 80% context usage)
- target_ratio: 0.40 (preserve 40% of context as recent tail)
- protect_last_n: 20 (keep last 20 messages uncompressed)
- Updated hermes config display to show all three fields
2026-03-24 18:05:43 -07:00
Teknium 9231a335d4 fix(compression): replace dead summary_target_tokens with ratio-based scaling (#2554)
The summary_target_tokens parameter was accepted in the constructor,
stored on the instance, and never used — the summary budget was always
computed from hardcoded module constants (_SUMMARY_RATIO=0.20,
_MAX_SUMMARY_TOKENS=8000). This caused two compounding problems:

1. The config value was silently ignored, giving users no control
   over post-compression size.
2. Fixed budgets (20K tail, 8K summary cap) didn't scale with
   context window size. Switching from a 1M-context model to a
   200K model would trigger compression that nuked 350K tokens
   of conversation history down to ~30K.

Changes:
- Replace summary_target_tokens with summary_target_ratio (default 0.40)
  which sets the post-compression target as a fraction of context_length.
  Tail token budget and summary cap now scale proportionally:
    MiniMax 200K → ~80K post-compression
    GPT-5   1M  → ~400K post-compression
- Change threshold_percent default: 0.50 → 0.80 (don't fire until
  80% of context is consumed)
- Change protect_last_n default: 4 → 20 (preserve ~10 full turns)
- Summary token cap scales to 5% of context (was fixed 8K), capped
  at 32K ceiling
- Read target_ratio and protect_last_n from config.yaml compression
  section (both are now configurable)
- Remove hardcoded summary_target_tokens=500 from run_agent.py
- Add 5 new tests for ratio scaling, clamping, and new defaults
2026-03-24 17:45:49 -07:00
Teknium 7efaa5968d Merge pull request #2891 from NousResearch/hermes/hermes-gateway-context
fix(gateway): stop loading hermes repo AGENTS.md into gateway sessions (~10k wasted tokens)
2026-03-24 17:43:41 -07:00
Teknium 8ee4f32819 fix(gateway): use TERMINAL_CWD for context file discovery, not process cwd
The gateway process runs from the hermes-agent install directory, so
os.getcwd() picks up the repo's AGENTS.md (16k chars) and other dev
context files — inflating input tokens by ~10k on every gateway message.

Fix: use TERMINAL_CWD (which the gateway sets to MESSAGING_CWD or
$HOME) as the cwd for build_context_files_prompt(). In CLI mode,
TERMINAL_CWD is the user's actual project directory, so behavior
is unchanged.

Before: gateway 15-20k input tokens, CLI 6-8k
After:  gateway ~6-8k input tokens (same as CLI)

Reported by keri on Discord.
2026-03-24 17:30:33 -07:00
Teknium 689344430c chore: gitignore orphaned mini-swe-agent directory 2026-03-24 12:50:34 -07:00
Teknium 618f15dda9 fix: reorder setup wizard providers — OpenRouter first
Move OpenRouter to position 1 in the setup wizard's provider list
to match hermes model ordering. Update default selection index and
fix test expectations for the new ordering.

Setup order: OpenRouter → Nous Portal → Codex → Custom → ...
2026-03-24 12:50:24 -07:00
Teknium 481915587e fix: update context pressure warnings and token estimates after compaction
Reset context pressure warnings and update last_prompt_tokens and last_completion_tokens in the context compressor to prevent stale values from causing excessive warnings and re-triggering compression. This change ensures accurate pressure calculations following the compaction process.
2026-03-24 09:25:10 -07:00
Teknium 0b993c1e07 docs: quote pip install extras to fix zsh glob errors (#2815)
zsh interprets square brackets as glob patterns, so
`pip install hermes-agent[voice]` fails with 'no matches found'.
Quote all pip install commands with extras across 5 docs pages (12 instances).

Reported by OFumik0OP.
2026-03-24 09:25:01 -07:00
Teknium 9718334962 docs: fix api-server response storage — SQLite, not in-memory (#2819)
* docs: update all docs for /model command overhaul and custom provider support

Documents the full /model command overhaul across 6 files:

AGENTS.md:
- Add model_switch.py to project structure tree

configuration.md:
- Rewrite General Setup with 3 config methods (interactive, config.yaml, env vars)
- Add new 'Switching Models with /model' section documenting all syntax variants
- Add 'Named Custom Providers' section with config.yaml examples and
  custom:name:model triple syntax

slash-commands.md:
- Update /model descriptions in both CLI and messaging tables with
  full syntax examples (provider:model, custom:model, custom:name:model,
  bare custom auto-detect)

cli-commands.md:
- Add /model slash command subsection under hermes model with syntax table
- Add custom endpoint config to hermes model use cases

faq.md:
- Add config.yaml example for offline/local model setup
- Note that provider: custom is a first-class provider
- Document /model custom auto-detect

provider-runtime.md:
- Add model_switch.py to implementation file list
- Update provider families to show Custom as first-class with named variants

* docs: fix api-server response storage description — SQLite, not in-memory

The ResponseStore class uses SQLite persistence (with in-memory
fallback), not pure in-memory storage. Responses survive gateway
restarts.
2026-03-24 09:05:15 -07:00
Teknium ebcb81b649 docs: document 9 previously undocumented features
New documentation for features that existed in code but had no docs:

New page:
- context-references.md: Full docs for @-syntax inline context
  injection (@file:, @folder:, @diff, @staged, @git:, @url:) with
  line ranges, CLI autocomplete, size limits, sensitive path blocking,
  and error handling

configuration.md additions:
- Environment variable substitution: ${VAR_NAME} syntax in config.yaml
  with expansion, fallback, and multi-reference support
- Gateway streaming: Progressive token delivery on messaging platforms
  via message editing (StreamingConfig: enabled, transport, edit_interval,
  buffer_threshold, cursor) with platform support matrix
- Web search backends: Three providers (Firecrawl, Parallel, Tavily)
  with web.backend config key, capability matrix, auto-detection from
  API keys, self-hosted Firecrawl, and Parallel search modes

security.md additions:
- SSRF protection: Always-on URL validation blocking private networks,
  loopback, link-local, CGNAT, cloud metadata hostnames, with
  fail-closed DNS and redirect chain re-validation
- Tirith pre-exec security scanning: Content-level command scanning
  for homograph URLs, pipe-to-interpreter, terminal injection with
  auto-install, SHA-256/cosign verification, config options, and
  fail-open/fail-closed modes

sessions.md addition:
- Auto-generated session titles: Background LLM-powered title
  generation after first exchange

creating-skills.md additions:
- Conditional skill activation: requires_toolsets, requires_tools,
  fallback_for_toolsets, fallback_for_tools frontmatter fields with
  matching logic and use cases
- Environment variable requirements: required_environment_variables
  frontmatter for automatic env passthrough to sandboxed execution,
  plus terminal.env_passthrough user config
2026-03-24 08:56:21 -07:00
Teknium ac5b8a478a ci: add supply chain audit workflow for PR scanning (#2816)
Scans every PR diff for patterns associated with supply chain attacks:

CRITICAL (blocks merge):
- .pth files (auto-execute on Python startup — litellm attack vector)
- base64 decode + exec/eval combo (obfuscated payload execution)
- subprocess with encoded/obfuscated commands

WARNING (comment only, no block):
- base64 encode/decode alone (legitimate uses: images, JWT, etc.)
- exec/eval alone
- Outbound POST/PUT requests
- setup.py/sitecustomize.py/usercustomize.py changes
- marshal.loads/pickle.loads/compile()

Posts a detailed comment on the PR with matched lines and context.
Excludes lockfiles (uv.lock, package-lock.json) from scanning.

Motivated by the litellm 1.82.7/1.82.8 credential stealer attack
(BerriAI/litellm#24512).
2026-03-24 08:56:04 -07:00
Teknium 624e4a8e7a chore: regenerate uv.lock with hashes, use lockfile in setup (#2812)
- Regenerate uv.lock with sha256 hashes for all 2965 package artifacts
- Add python_version marker to yc-bench (requires >=3.12)
- Update setup-hermes.sh to prefer 'uv sync --locked' for hash-verified
  installs, with fallback to 'uv pip install' when lockfile is stale

This completes the supply chain hardening: pyproject.toml bounds the
version ranges, and uv.lock pins exact versions with cryptographic
hashes so tampered packages are rejected at install time.
2026-03-24 08:42:45 -07:00
Teknium 177e43259f refactor: update mini_swe_runner to use Hermes built-in backends
Replace all minisweagent imports with Hermes-Agent's own environment
classes (LocalEnvironment, DockerEnvironment, ModalEnvironment).

mini_swe_runner.py no longer has any dependency on mini-swe-agent.
The runner now uses the same backends as the terminal tool, so Docker
and Modal environments work out of the box without extra submodules.

Tested: local and Docker backends verified working through the runner.
2026-03-24 08:27:15 -07:00
Teknium c9b76057d4 chore: pin all dependency version ranges (supply chain hardening) (#2810)
Adds upper-bound version pins (<next_major) to all dependencies in
pyproject.toml — both core and optional. Previously most deps were
unpinned or had only floor bounds, meaning fresh installs would pull
whatever version was latest on PyPI.

This limits blast radius from supply chain attacks like the litellm
1.82.7/1.82.8 credential stealer (BerriAI/litellm#24512). With bounded
ranges, a compromised major version bump won't be pulled automatically.

Floors are set to current known-good installed versions.
2026-03-24 08:25:17 -07:00
Teknium 745859babb feat: env var passthrough for skills and user config (#2807)
* feat: env var passthrough for skills and user config

Skills that declare required_environment_variables now have those vars
passed through to sandboxed execution environments (execute_code and
terminal).  Previously, execute_code stripped all vars containing KEY,
TOKEN, SECRET, etc. and the terminal blocklist removed Hermes
infrastructure vars — both blocked skill-declared env vars.

Two passthrough sources:

1. Skill-scoped (automatic): when a skill is loaded via skill_view and
   declares required_environment_variables, vars that are present in
   the environment are registered in a session-scoped passthrough set.

2. Config-based (manual): terminal.env_passthrough in config.yaml lets
   users explicitly allowlist vars for non-skill use cases.

Changes:
- New module: tools/env_passthrough.py — shared passthrough registry
- hermes_cli/config.py: add terminal.env_passthrough to DEFAULT_CONFIG
- tools/skills_tool.py: register available skill env vars on load
- tools/code_execution_tool.py: check passthrough before filtering
- tools/environments/local.py: check passthrough in _sanitize_subprocess_env
  and _make_run_env
- 19 new tests covering all layers

* docs: add environment variable passthrough documentation

Document the env var passthrough feature across four docs pages:

- security.md: new 'Environment Variable Passthrough' section with
  full explanation, comparison table, and security considerations
- code-execution.md: update security section, add passthrough subsection,
  fix comparison table
- creating-skills.md: add tip about automatic sandbox passthrough
- skills.md: add note about passthrough after secure setup docs

Live-tested: launched interactive CLI, loaded a skill with
required_environment_variables, verified TEST_SKILL_SECRET_KEY was
accessible inside execute_code sandbox (value: passthrough-test-value-42).
2026-03-24 08:19:34 -07:00
Teknium ad1bf16f28 chore: remove all remaining mini-swe-agent references
Complete cleanup after dropping the mini-swe-agent submodule (PR #2804):

- Remove MSWEA_SILENT_STARTUP and MSWEA_GLOBAL_CONFIG_DIR env var
  settings from cli.py, run_agent.py, hermes_cli/main.py, doctor.py
- Remove mini-swe-agent health check from hermes doctor
- Remove 'minisweagent' from logger suppression lists
- Remove litellm/typer/platformdirs from requirements.txt
- Remove mini-swe-agent install steps from install.ps1 (Windows)
- Remove mini-swe-agent install steps from website docs
- Update all stale comments/docstrings referencing mini-swe-agent
  in terminal_tool.py, tools/__init__.py, code_execution_tool.py,
  environments/README.md, environments/agent_loop.py
- Remove mini_swe_runner from pyproject.toml py-modules
  (still exists as standalone script for RL training use)
- Shrink test_minisweagent_path.py to empty stub

The orphaned mini-swe-agent/ directory on disk needs manual removal:
  rm -rf mini-swe-agent/
2026-03-24 08:19:23 -07:00
Teknium e2c81c6e2f docs: add missing skills, CLI commands, and messaging env vars
Complete the documentation gaps identified in the previous audit:

Skills catalogs:
- skills-catalog.md: Add 7 missing bundled skills — data-science/
  jupyter-live-kernel, dogfood/hermes-agent-setup, inference-sh/
  inference-sh-cli, mlops/huggingface-hub, productivity/linear,
  research/parallel-cli, social-media/xitter
- optional-skills-catalog.md: Add 8 missing optional skills —
  blockchain/base, creative/blender-mcp, creative/meme-generation,
  mcp/fastmcp, productivity/telephony, research/bioinformatics,
  security/oss-forensics, security/sherlock

CLI commands reference:
- cli-commands.md: Add full documentation for hermes mcp (add/remove/
  list/test/configure) and hermes plugins (install/update/remove/list)

Messaging platform docs:
- discord.md: Add DISCORD_REQUIRE_MENTION and
  DISCORD_FREE_RESPONSE_CHANNELS to manual config env vars section
- signal.md: Add SIGNAL_ALLOW_ALL_USERS to env var reference table
- slack.md: Add SLACK_HOME_CHANNEL_NAME to config section
2026-03-24 08:12:37 -07:00
Teknium 677b11d84c fix: reject relative cwd paths for container terminal backends
When TERMINAL_CWD is set to '.' or any relative path (common when the
CLI config defaults to cwd='.'), container backends (docker, modal,
singularity, daytona) would pass it directly to the container where it's
meaningless. This caused 'docker run -d -w .' to fail.

Now relative paths are caught alongside host paths and replaced with
the default '/root' for container backends.
2026-03-24 08:03:14 -07:00
Teknium ee3f3e756d docs: fix stale and incorrect documentation across 18 files
Cross-referenced all 84 docs pages against the actual codebase and
corrected every discrepancy found.

Reference docs:
- faq.md: Fix non-existent commands (/stats→/usage, /context→/usage,
  hermes models→hermes model, hermes config get→hermes config show,
  hermes gateway logs→cat gateway.log, async→sync chat() call)
- cli-commands.md: Fix --provider choices list (remove providers not
  in argparse), add undocumented -s/--skills flag
- slash-commands.md: Add missing /queue and /resume commands, fix
  /approve args_hint to show [session|always]
- tools-reference.md: Remove duplicate vision and web toolset sections
- environment-variables.md: Fix HERMES_INFERENCE_PROVIDER list (add
  copilot-acp, remove alibaba to match actual argparse choices)

Configuration & user guide:
- configuration.md: Fix approval_mode→approvals.mode (manual not ask),
  checkpoints.enabled default true not false, human_delay defaults
  (500/2000→800/2500), remove non-existent delegation.max_iterations
  and delegation.default_toolsets, fix website_blocklist nesting
  under security:, add .hermes.md and CLAUDE.md to context files
  table with priority system explanation
- security.md: Fix website_blocklist nesting under security:
- context-files.md: Add .hermes.md/HERMES.md and CLAUDE.md support,
  document priority-based first-match-wins loading behavior
- cli.md: Fix personalities config nesting (top-level, not under agent:)
- delegation.md: Fix model override docs (config-level, not per-call
  tool parameter)
- rl-training.md: Fix log directory (tinker-atropos/logs/→
  ~/.hermes/logs/rl_training/)
- tts.md: Fix Discord delivery format (voice bubble with fallback,
  not just file attachment)
- git-worktrees.md: Remove outdated v0.2.0 version reference

Developer guide:
- prompt-assembly.md: Add .hermes.md, CLAUDE.md, document priority
  system for context files
- agent-loop.md: Fix callback list (remove non-existent
  message_callback, add stream_delta_callback, tool_gen_callback,
  status_callback)

Messaging & guides:
- webhooks.md: Fix command (hermes setup gateway→hermes gateway setup)
- tips.md: Fix session idle timeout (120min→24h), config file
  (gateway.json→config.yaml)
- build-a-hermes-plugin.md: Fix plugin.yaml provides: format
  (provides_tools/provides_hooks as lists), note register_command()
  as not yet implemented
2026-03-24 07:53:07 -07:00
Teknium 02b38b93cb refactor: remove mini-swe-agent dependency — inline Docker/Modal backends (#2804)
Drop the mini-swe-agent git submodule. All terminal backends now use
hermes-agent's own environment implementations directly.

Docker backend:
- Inline the `docker run -d` container startup (was 15 lines in
  minisweagent's DockerEnvironment). Our wrapper already handled
  execute(), cleanup(), security hardening, volumes, and resource limits.

Modal backend:
- Import swe-rex's ModalDeployment directly instead of going through
  minisweagent's 90-line passthrough wrapper.
- Bake the _AsyncWorker pattern (from environments/patches.py) directly
  into ModalEnvironment for Atropos compatibility without monkey-patching.

Cleanup:
- Remove minisweagent_path.py (submodule path resolution helper)
- Remove submodule init/install from install.sh and setup-hermes.sh
- Remove mini-swe-agent from .gitmodules
- environments/patches.py is now a no-op (kept for backward compat)
- terminal_tool.py no longer does sys.path hacking for minisweagent
- mini_swe_runner.py guards imports (optional, for RL training only)
- Update all affected tests to mock the new direct subprocess calls
- Update README.md, CONTRIBUTING.md

No functionality change — all Docker, Modal, local, SSH, Singularity,
and Daytona backends behave identically. 6093 tests pass.
2026-03-24 07:30:25 -07:00
Teknium 2233f764af fix(tools): handle 402 insufficient credits error in vision tool (#2802)
Co-authored-by: Dilee <uzmpsk.dilekakbas@gmail.com>
2026-03-24 07:23:07 -07:00
Teknium 98b5570961 fix: make browser command timeout configurable via config.yaml (#2801)
browser_vision and other browser commands had a hardcoded 30-second
subprocess timeout that couldn't be overridden. Users with slower
machines (local Chromium without GPU) would hit timeouts on screenshot
capture even when setting browser.command_timeout in config.yaml,
because nothing read that value.

Changes:
- Add browser.command_timeout to DEFAULT_CONFIG (default: 30s)
- Add _get_command_timeout() helper that reads config, falls back to 30s
- _run_browser_command() now defaults to config value instead of constant
- browser_vision screenshot no longer hardcodes timeout=30
- browser_navigate uses max(config_timeout, 60) as floor for navigation

Reported by Gamer1988.
2026-03-24 07:21:50 -07:00
Teknium 773d3bb4df docs: update all docs for /model command overhaul and custom provider support
Documents the full /model command overhaul across 6 files:

AGENTS.md:
- Add model_switch.py to project structure tree

configuration.md:
- Rewrite General Setup with 3 config methods (interactive, config.yaml, env vars)
- Add new 'Switching Models with /model' section documenting all syntax variants
- Add 'Named Custom Providers' section with config.yaml examples and
  custom:name:model triple syntax

slash-commands.md:
- Update /model descriptions in both CLI and messaging tables with
  full syntax examples (provider:model, custom:model, custom:name:model,
  bare custom auto-detect)

cli-commands.md:
- Add /model slash command subsection under hermes model with syntax table
- Add custom endpoint config to hermes model use cases

faq.md:
- Add config.yaml example for offline/local model setup
- Note that provider: custom is a first-class provider
- Document /model custom auto-detect

provider-runtime.md:
- Add model_switch.py to implementation file list
- Update provider families to show Custom as first-class with named variants
2026-03-24 07:19:26 -07:00
Teknium a312ee7b4c fix(agent): ensure first delta is fired during reasoning updates
- Added calls to `_fire_first_delta()` in the `AIAgent` class to ensure that the first delta is triggered for both reasoning and thinking updates. This change improves the handling of delta events during streaming, enhancing the responsiveness of the agent's reasoning capabilities.
2026-03-24 07:16:20 -07:00
Teknium 2e524272b1 refactor(model): extract shared switch_model() from CLI and gateway handlers
Phase 4 of the /model command overhaul.

Both the CLI (cli.py) and gateway (gateway/run.py) /model handlers
had ~50 lines of duplicated core logic: parsing, provider detection,
credential resolution, and model validation. This extracts that
pipeline into hermes_cli/model_switch.py.

New module exports:
- ModelSwitchResult: dataclass with all fields both handlers need
- CustomAutoResult: dataclass for bare '/model custom' results
- switch_model(): core pipeline — parse → detect → resolve → validate
- switch_to_custom_provider(): resolve endpoint + auto-detect model

The shared functions are pure (no I/O side effects). Each caller
handles its own platform-specific concerns:
- CLI: sets self.model/provider/etc, calls save_config_value(), prints
- Gateway: writes config.yaml directly, sets env vars, returns markdown

Net result: -244 lines from handlers, +234 lines in shared module.
The handlers are now ~80 lines each (down from ~150+) and can't drift
apart on core logic.
2026-03-24 07:08:07 -07:00
Teknium ce39f9cc44 fix(gateway): detect virtualenv path instead of hardcoding venv/ (#2797)
Fixes #2492.

`generate_systemd_unit()` and `get_python_path()` hardcoded `venv`
as the virtualenv directory name. When the virtualenv is `.venv`
(which `setup-hermes.sh` and `.gitignore` both reference), the
generated systemd unit had incorrect VIRTUAL_ENV and PATH variables.

Introduce `_detect_venv_dir()` which:
1. Checks `sys.prefix` vs `sys.base_prefix` to detect the active venv
2. Falls back to probing `.venv` then `venv` under PROJECT_ROOT

Both `get_python_path()` and `generate_systemd_unit()` now use
this detection instead of hardcoded paths.

Co-authored-by: Hermes <hermes@nousresearch.ai>
2026-03-24 07:05:57 -07:00
Teknium 18cbd18fa9 fix: remove litellm/typer/platformdirs from hermes-agent deps (supply chain compromise) (#2796)
litellm 1.82.7/1.82.8 contained a credential stealer (.pth auto-exec
payload). PyPI quarantined the entire package, blocking all fresh
hermes-agent installs since litellm was listed as a hard dependency.

These three deps (litellm, typer, platformdirs) are only used by the
mini-swe-agent submodule, which has its own pyproject.toml and manages
its own dependencies. They were redundantly duplicated in hermes-agent's
pyproject.toml.

Also fixes install.sh to not print 'mini-swe-agent installed' on
failure, and updates warning messages in both install scripts to clarify
that only Docker/Modal backends are affected — local terminal is
unaffected.

Ref: https://github.com/BerriAI/litellm/issues/24512
2026-03-24 07:03:16 -07:00
Teknium b641ee88f4 feat(model): /model command overhaul — Phases 2, 3, 5
* feat(model): persist base_url on /model switch, auto-detect for bare /model custom

Phase 2+3 of the /model command overhaul:

Phase 2 — Persist base_url on model switch:
- CLI: save model.base_url when switching to a non-OpenRouter endpoint;
  clear it when switching away from custom to prevent stale URLs
  leaking into the new provider's resolution
- Gateway: same logic using direct YAML write

Phase 3 — Better feedback and edge cases:
- Bare '/model custom' now auto-detects the model from the endpoint
  using _auto_detect_local_model() and saves all three config values
  (model, provider, base_url) atomically
- Shows endpoint URL in success messages when switching to/from
  custom providers (both CLI and gateway)
- Clear error messages when no custom endpoint is configured
- Updated test assertions for the additional save_config_value call

Fixes #2562 (Phase 2+3)

* feat(model): support custom:name:model triple syntax for named custom providers

Phase 5 of the /model command overhaul.

Extends parse_model_input() to handle the triple syntax:
  /model custom:local-server:qwen → provider='custom:local-server', model='qwen'
  /model custom:my-model          → provider='custom', model='my-model' (unchanged)

The 'custom:local-server' provider string is already supported by
_get_named_custom_provider() in runtime_provider.py, which matches
it against the custom_providers list in config.yaml. This just wires
the parsing so users can do it from the /model slash command.

Added 4 tests covering single, triple, whitespace, and empty model cases.
2026-03-24 06:58:04 -07:00
Teknium 2f1c4fb01f fix(auth): preserve 'custom' provider instead of silently remapping to 'openrouter'
resolve_provider('custom') was silently returning 'openrouter', causing
users who set provider: custom in config.yaml to unknowingly route
through OpenRouter instead of their local/custom endpoint. The display
showed 'via openrouter' even when the user explicitly chose custom.

Changes:
- auth.py: Split the conditional so 'custom' returns 'custom' as-is
- runtime_provider.py: _resolve_named_custom_runtime now returns
  provider='custom' instead of 'openrouter'
- runtime_provider.py: _resolve_openrouter_runtime returns
  provider='custom' when that was explicitly requested
- Add 'no-key-required' placeholder for keyless local servers
- Update existing test + add 5 new tests covering the fix

Fixes #2562
2026-03-24 06:41:11 -07:00
Teknium 4313b8aff6 fix(cli): ensure single closure of streaming boxes during tool generation
- Updated `_on_tool_gen_start` method in `HermesCLI` to close open streaming boxes exactly once, preventing potential multiple closures.
- Added a check for `_stream_box_opened` to manage the state of the streaming box more effectively, enhancing user experience during large payload streaming.
2026-03-24 06:33:21 -07:00
Teknium 87e2626cf6 feat(cli, agent): add tool generation callback for streaming updates
- Introduced `_on_tool_gen_start` in `HermesCLI` to indicate when tool-call arguments are being generated, enhancing user feedback during streaming.
- Updated `AIAgent` to support a new `tool_gen_callback`, notifying the display layer when tool generation starts, allowing for better user experience during large payloads.
- Ensured that the callback is triggered appropriately during streaming events to prevent user interface freezing.
2026-03-23 23:10:58 -07:00
Teknium 1345e93393 fix: add macOS Homebrew paths to browser and terminal PATH resolution
On macOS with Homebrew (Apple Silicon), Node.js and agent-browser
binaries live under /opt/homebrew/bin/ which is not included in the
_SANE_PATH fallback used by browser_tool.py and environments/local.py.
When Hermes runs with a filtered PATH (e.g. as a systemd service),
these binaries are invisible, causing 'env: node: No such file or
directory' errors when using browser tools.

Changes:
- Add /opt/homebrew/bin and /opt/homebrew/sbin to _SANE_PATH in both
  browser_tool.py and environments/local.py
- Add _discover_homebrew_node_dirs() to find versioned Node installs
  (e.g. brew install node@24) that aren't linked into /opt/homebrew/bin
- Extend _find_agent_browser() to search Homebrew and Hermes-managed
  dirs when agent-browser isn't on the current PATH
- Include discovered Homebrew node dirs in subprocess PATH when
  launching agent-browser
- Add 11 new tests covering all Homebrew path discovery logic
2026-03-23 22:45:55 -07:00
Teknium 6e97a3b338 docs: revise v0.4.0 changelog — fix feature attribution, reorder sections 2026-03-23 22:42:22 -07:00
Teknium 8416bc2142 chore: release v0.4.0 (v2026.3.23) 2026-03-23 22:34:04 -07:00
Teknium 48b5bc6038 fix(gateway): prevent stale memory overwrites by flush agent (#2670)
The gateway memory flush agent reviews old conversation history on session
reset/expiry and writes to memory. It had no awareness of memory changes
made after that conversation ended (by the live agent, cron jobs, or other
sessions), causing silent overwrites of newer entries.

Two fixes:

1. Skip memory flush entirely for cron sessions (session IDs starting with
   'cron_'). Cron sessions are headless with no meaningful user conversation
   to extract memories from.

2. Inject the current live memory state (MEMORY.md + USER.md) directly into
   the flush prompt. The flush agent can now see what's already saved and
   make informed decisions — only adding genuinely new information rather
   than blindly overwriting entries that may have been updated since the
   conversation ended.

Addresses the root cause identified in #2670: the flush agent was making
memory decisions blind to the current state of memory, causing stale
context to overwrite newer entries on gateway restarts and session resets.

Co-authored-by: devorun <devorun@users.noreply.github.com>
Co-authored-by: dlkakbs <dlkakbs@users.noreply.github.com>
2026-03-23 16:08:38 -07:00
Teknium 4ff73fb32c feat(config): support ${ENV_VAR} substitution in config.yaml (#2684)
* feat(config): support ${ENV_VAR} substitution in config.yaml

* fix: extend env var expansion to CLI and gateway config loaders

The original PR (#2680) only wired _expand_env_vars into load_config(),
which is used by 'hermes tools' and 'hermes setup'. The two primary
config paths were missed:

- load_cli_config() in cli.py (interactive CLI)
- Module-level _cfg in gateway/run.py (gateway — bridges api_keys to env vars)

Also:
- Remove redundant 'import re' (already imported at module level)
- Add missing blank lines between top-level functions (PEP 8)
- Add tests for load_cli_config() expansion

---------

Co-authored-by: teyrebaz33 <hakanerten02@hotmail.com>
2026-03-23 16:02:06 -07:00
Teknium 73a88a02fe fix(security): prevent shell injection in _expand_path via ~user path suffix (#2047)
echo was called with the full unquoted path (~username/suffix), allowing
command substitution in the suffix (e.g. ~user/$(malicious)) to execute
arbitrary shell commands. The fix expands only the validated ~username
portion via the shell and concatenates the suffix as a plain string.

Co-authored-by: Gutslabs <gutslabsxyz@gmail.com>
2026-03-23 16:00:34 -07:00
Teknium f9c2565ab4 fix(config): log warning instead of silently swallowing config.yaml errors (#2683)
A bare `except Exception: pass` meant any YAML syntax error, bad value,
or unexpected structure in config.yaml was silently ignored and the
gateway fell back to .env / gateway.json without any indication.
Users had no way to know why their config changes had no effect.

Co-authored-by: sprmn24 <oncuevtv@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 15:54:11 -07:00
Teknium ad5f973a8d fix(vision): make SSRF redirect guard async for httpx.AsyncClient
httpx.AsyncClient awaits event hooks. The sync _ssrf_redirect_guard
returned None, causing 'object NoneType can't be used in await
expression' on any vision_analyze call that followed redirects.

Caught during live PTY testing of the merged SSRF protection.
2026-03-23 15:44:52 -07:00
Teknium 0791efe2c3 fix(security): add SSRF protection to vision_tools and web_tools (hardened)
* fix(security): add SSRF protection to vision_tools and web_tools

Both vision_analyze and web_extract/web_crawl accept arbitrary URLs
without checking if they target private/internal network addresses.
A prompt-injected or malicious skill could use this to access cloud
metadata endpoints (169.254.169.254), localhost services, or private
network hosts.

Adds a shared url_safety.is_safe_url() that resolves hostnames and
blocks private, loopback, link-local, and reserved IP ranges. Also
blocks known internal hostnames (metadata.google.internal).

Integrated at the URL validation layer in vision_tools and before
each website_policy check in web_tools (extract, crawl).

* test(vision): update localhost test to reflect SSRF protection

The existing test_valid_url_with_port asserted localhost URLs pass
validation. With SSRF protection, localhost is now correctly blocked.
Update the test to verify the block, and add a separate test for
valid URLs with ports using a public hostname.

* fix(security): harden SSRF protection — fail-closed, CGNAT, multicast, redirect guard

Follow-up hardening on top of dieutx's SSRF protection (PR #2630):

- Change fail-open to fail-closed: DNS errors and unexpected exceptions
  now block the request instead of allowing it (OWASP best practice)
- Block CGNAT range (100.64.0.0/10): Python's ipaddress.is_private
  does NOT cover this range (returns False for both is_private and
  is_global). Used by Tailscale/WireGuard and carrier infrastructure.
- Add is_multicast and is_unspecified checks: multicast (224.0.0.0/4)
  and unspecified (0.0.0.0) addresses were not caught by the original
  four-check chain
- Add redirect guard for vision_tools: httpx event hook re-validates
  each redirect target against SSRF checks, preventing the classic
  redirect-based SSRF bypass (302 to internal IP)
- Move SSRF filtering before backend dispatch in web_extract: now
  covers Parallel and Tavily backends, not just Firecrawl
- Extract _is_blocked_ip() helper for cleaner IP range checking
- Add 24 new tests (CGNAT, multicast, IPv4-mapped IPv6, fail-closed
  behavior, parametrized blocked/allowed IP lists)
- Fix existing tests to mock DNS resolution for test hostnames

---------

Co-authored-by: dieutx <dangtc94@gmail.com>
2026-03-23 15:40:42 -07:00
Teknium 934fbe3c06 fix: strip ANSI at the source — clean terminal output before it reaches the model
Root cause: terminal_tool, execute_code, and process_registry returned raw
subprocess output with ANSI escape sequences intact. The model saw these
in tool results and copied them into file writes.

Previous fix (PR #2532) stripped ANSI at the write point in file_tools.py,
but this was a band-aid — regex on file content risks corrupting legitimate
content, and doesn't prevent ANSI from wasting tokens in the model context.

Source-level fix:
- New tools/ansi_strip.py with comprehensive ECMA-48 regex covering CSI
  (incl. private-mode, colon-separated, intermediate bytes), OSC (both
  terminators), DCS/SOS/PM/APC strings, Fp/Fe/Fs/nF escapes, 8-bit C1
- terminal_tool.py: strip output before returning to model
- code_execution_tool.py: strip stdout/stderr before returning
- process_registry.py: strip output in poll/read_log/wait
- file_tools.py: remove _strip_ansi band-aid (no longer needed)

Verified: `ls --color=always` output returned as clean text to model,
file written from that output contains zero ESC bytes.
2026-03-23 07:43:12 -07:00
Teknium 6302e56e7c fix(gateway): add all missing platform allowlist env vars to startup warning check (#2628)
* fix(gateway): added MATRIX_ALLOWED_USERS to list of env vars checked by gateway

* fix(gateway): add all missing platform allowlist env vars to startup check

The startup warning for 'No user allowlists configured' was only checking
TELEGRAM, DISCORD, WHATSAPP, SLACK, and SMS — missing SIGNAL, EMAIL,
MATTERMOST, and DINGTALK. Users of those platforms would see a spurious
warning even with their platform-specific allowlist configured.

Now matches the canonical platform_env_map in _is_user_authorized().

---------

Co-authored-by: SteelPh0enix <wojciech_olech@hotmail.com>
2026-03-23 07:19:14 -07:00
Teknium 868b3c07e3 fix: platform default toolsets silently override tool deselection in hermes tools (#2624)
Cherry-picked from PR #2576 by ereid7, plus read-side fix from 173a5c62.

Both fixes were originally landed in 173a5c62 but were inadvertently
reverted by commit 34be3f8b (a squash-merge that bundled unrelated
tools_config.py changes).

Save side (_save_platform_tools): exclude platform default toolset
names (hermes-cli, hermes-telegram) from preserved entries so they
don't silently re-enable everything.

Read side (_get_platform_tools): when the saved list contains explicit
configurable keys, use direct membership instead of subset inference.
The subset approach is broken when composite toolsets like hermes-cli
resolve to ALL tools.
2026-03-23 07:06:51 -07:00
Teknium 9d6148316c fix: media delivery fails for file paths containing spaces (#2621)
Cherry-picked from PR #2583 by Glucksberg.

The MEDIA: regex used \S+ which truncated paths at the first space.
Added a space-aware alternative anchored to known media extensions.
Also updated extract_local_files to allow spaces in path segments.

Follow-up fix: changed \s to [^\S\n] in the space-matching group
so the regex doesn't greedily match across newlines (broke multi-line
MEDIA: tags).
2026-03-23 06:59:59 -07:00
121 changed files with 7340 additions and 1875 deletions
+192
View File
@@ -0,0 +1,192 @@
name: Supply Chain Audit
on:
pull_request:
types: [opened, synchronize, reopened]
permissions:
pull-requests: write
contents: read
jobs:
scan:
name: Scan PR for supply chain risks
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Scan diff for suspicious patterns
id: scan
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
set -euo pipefail
BASE="${{ github.event.pull_request.base.sha }}"
HEAD="${{ github.event.pull_request.head.sha }}"
# Get the full diff (added lines only)
DIFF=$(git diff "$BASE".."$HEAD" -- . ':!uv.lock' ':!*.lock' ':!package-lock.json' ':!yarn.lock' || true)
FINDINGS=""
CRITICAL=false
# --- .pth files (auto-execute on Python startup) ---
PTH_FILES=$(git diff --name-only "$BASE".."$HEAD" | grep '\.pth$' || true)
if [ -n "$PTH_FILES" ]; then
CRITICAL=true
FINDINGS="${FINDINGS}
### 🚨 CRITICAL: .pth file added or modified
Python \`.pth\` files in \`site-packages/\` execute automatically when the interpreter starts — no import required. This is the exact mechanism used in the [litellm supply chain attack](https://github.com/BerriAI/litellm/issues/24512).
**Files:**
\`\`\`
${PTH_FILES}
\`\`\`
"
fi
# --- base64 + exec/eval combo (the litellm attack pattern) ---
B64_EXEC_HITS=$(echo "$DIFF" | grep -n '^\+' | grep -iE 'base64\.(b64decode|decodebytes|urlsafe_b64decode)' | grep -iE 'exec\(|eval\(' | head -10 || true)
if [ -n "$B64_EXEC_HITS" ]; then
CRITICAL=true
FINDINGS="${FINDINGS}
### 🚨 CRITICAL: base64 decode + exec/eval combo
This is the exact pattern used in the [litellm supply chain attack](https://github.com/BerriAI/litellm/issues/24512) — base64-decoded strings passed to exec/eval to hide credential-stealing payloads.
**Matches:**
\`\`\`
${B64_EXEC_HITS}
\`\`\`
"
fi
# --- base64 decode/encode (alone — legitimate uses exist) ---
B64_HITS=$(echo "$DIFF" | grep -n '^\+' | grep -iE 'base64\.(b64decode|b64encode|decodebytes|encodebytes|urlsafe_b64decode)|atob\(|btoa\(|Buffer\.from\(.*base64' | head -20 || true)
if [ -n "$B64_HITS" ]; then
FINDINGS="${FINDINGS}
### ⚠️ WARNING: base64 encoding/decoding detected
Base64 has legitimate uses (images, JWT, etc.) but is also commonly used to obfuscate malicious payloads. Verify the usage is appropriate.
**Matches (first 20):**
\`\`\`
${B64_HITS}
\`\`\`
"
fi
# --- exec/eval with string arguments ---
EXEC_HITS=$(echo "$DIFF" | grep -n '^\+' | grep -E '(exec|eval)\s*\(' | grep -v '^\+\s*#' | grep -v 'test_\|mock\|assert\|# ' | head -20 || true)
if [ -n "$EXEC_HITS" ]; then
FINDINGS="${FINDINGS}
### ⚠️ WARNING: exec() or eval() usage
Dynamic code execution can hide malicious behavior, especially when combined with base64 or network fetches.
**Matches (first 20):**
\`\`\`
${EXEC_HITS}
\`\`\`
"
fi
# --- subprocess with encoded/obfuscated commands ---
PROC_HITS=$(echo "$DIFF" | grep -n '^\+' | grep -E 'subprocess\.(Popen|call|run)\s*\(' | grep -iE 'base64|decode|encode|\\x|chr\(' | head -10 || true)
if [ -n "$PROC_HITS" ]; then
CRITICAL=true
FINDINGS="${FINDINGS}
### 🚨 CRITICAL: subprocess with encoded/obfuscated command
Subprocess calls with encoded arguments are a strong indicator of payload execution.
**Matches:**
\`\`\`
${PROC_HITS}
\`\`\`
"
fi
# --- Network calls to non-standard domains ---
EXFIL_HITS=$(echo "$DIFF" | grep -n '^\+' | grep -iE 'requests\.(post|put)\(|httpx\.(post|put)\(|urllib\.request\.urlopen' | grep -v '^\+\s*#' | grep -v 'test_\|mock\|assert' | head -10 || true)
if [ -n "$EXFIL_HITS" ]; then
FINDINGS="${FINDINGS}
### ⚠️ WARNING: Outbound network calls (POST/PUT)
Outbound POST/PUT requests in new code could be data exfiltration. Verify the destination URLs are legitimate.
**Matches (first 10):**
\`\`\`
${EXFIL_HITS}
\`\`\`
"
fi
# --- setup.py / setup.cfg install hooks ---
SETUP_HITS=$(git diff --name-only "$BASE".."$HEAD" | grep -E '(setup\.py|setup\.cfg|__init__\.pth|sitecustomize\.py|usercustomize\.py)$' || true)
if [ -n "$SETUP_HITS" ]; then
FINDINGS="${FINDINGS}
### ⚠️ WARNING: Install hook files modified
These files can execute code during package installation or interpreter startup.
**Files:**
\`\`\`
${SETUP_HITS}
\`\`\`
"
fi
# --- Compile/marshal/pickle (code object injection) ---
MARSHAL_HITS=$(echo "$DIFF" | grep -n '^\+' | grep -iE 'marshal\.loads|pickle\.loads|compile\(' | grep -v '^\+\s*#' | grep -v 'test_\|re\.compile\|ast\.compile' | head -10 || true)
if [ -n "$MARSHAL_HITS" ]; then
FINDINGS="${FINDINGS}
### ⚠️ WARNING: marshal/pickle/compile usage
These can deserialize or construct executable code objects.
**Matches:**
\`\`\`
${MARSHAL_HITS}
\`\`\`
"
fi
# --- Output results ---
if [ -n "$FINDINGS" ]; then
echo "found=true" >> "$GITHUB_OUTPUT"
if [ "$CRITICAL" = true ]; then
echo "critical=true" >> "$GITHUB_OUTPUT"
else
echo "critical=false" >> "$GITHUB_OUTPUT"
fi
# Write findings to a file (multiline env vars are fragile)
echo "$FINDINGS" > /tmp/findings.md
else
echo "found=false" >> "$GITHUB_OUTPUT"
echo "critical=false" >> "$GITHUB_OUTPUT"
fi
- name: Post warning comment
if: steps.scan.outputs.found == 'true'
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
SEVERITY="⚠️ Supply Chain Risk Detected"
if [ "${{ steps.scan.outputs.critical }}" = "true" ]; then
SEVERITY="🚨 CRITICAL Supply Chain Risk Detected"
fi
BODY="## ${SEVERITY}
This PR contains patterns commonly associated with supply chain attacks. This does **not** mean the PR is malicious — but these patterns require careful human review before merging.
$(cat /tmp/findings.md)
---
*Automated scan triggered by [supply-chain-audit](/.github/workflows/supply-chain-audit.yml). If this is a false positive, a maintainer can approve after manual review.*"
gh pr comment "${{ github.event.pull_request.number }}" --body "$BODY"
- name: Fail on critical findings
if: steps.scan.outputs.critical == 'true'
run: |
echo "::error::CRITICAL supply chain risk patterns detected in this PR. See the PR comment for details."
exit 1
+1
View File
@@ -53,3 +53,4 @@ environments/benchmarks/evals/
# Release script temp files
.release_notes.md
mini-swe-agent/
-3
View File
@@ -1,6 +1,3 @@
[submodule "mini-swe-agent"]
path = mini-swe-agent
url = https://github.com/SWE-agent/mini-swe-agent
[submodule "tinker-atropos"]
path = tinker-atropos
url = https://github.com/nousresearch/tinker-atropos
+1
View File
@@ -38,6 +38,7 @@ hermes-agent/
│ ├── tools_config.py # `hermes tools` — enable/disable tools per platform
│ ├── skills_hub.py # `/skills` slash command (search, browse, install)
│ ├── models.py # Model catalog, provider model lists
│ ├── model_switch.py # Shared /model switch pipeline (CLI + gateway)
│ └── auth.py # Provider credential resolution
├── tools/ # Tool implementations (one file per tool)
│ ├── registry.py # Central tool registry (schemas, handlers, dispatch)
+3 -2
View File
@@ -72,8 +72,9 @@ export VIRTUAL_ENV="$(pwd)/venv"
# Install with all extras (messaging, cron, CLI menus, dev tools)
uv pip install -e ".[all,dev]"
uv pip install -e "./mini-swe-agent"
uv pip install -e "./tinker-atropos"
# Optional: RL training submodule
# git submodule update --init tinker-atropos && uv pip install -e "./tinker-atropos"
# Optional: browser tools
npm install
+1 -3
View File
@@ -144,16 +144,14 @@ Quick start for contributors:
```bash
git clone https://github.com/NousResearch/hermes-agent.git
cd hermes-agent
git submodule update --init mini-swe-agent # required terminal backend
curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv venv --python 3.11
source venv/bin/activate
uv pip install -e ".[all,dev]"
uv pip install -e "./mini-swe-agent"
python -m pytest tests/ -q
```
> **RL Training (optional):** To work on the RL/Tinker-Atropos integration, also run:
> **RL Training (optional):** To work on the RL/Tinker-Atropos integration:
> ```bash
> git submodule update --init tinker-atropos
> uv pip install -e "./tinker-atropos"
+400
View File
@@ -0,0 +1,400 @@
# Hermes Agent v0.4.0 (v2026.3.23)
**Release Date:** March 23, 2026
> The platform expansion release — OpenAI-compatible API server, 6 new messaging adapters, 4 new inference providers, MCP server management with OAuth 2.1, @ context references, gateway prompt caching, streaming enabled by default, and a sweeping reliability pass with 200+ bug fixes.
---
## ✨ Highlights
- **OpenAI-compatible API server** — Expose Hermes as an `/v1/chat/completions` endpoint with a new `/api/jobs` REST API for cron job management, hardened with input limits, field whitelists, SQLite-backed response persistence, and CORS origin protection ([#1756](https://github.com/NousResearch/hermes-agent/pull/1756), [#2450](https://github.com/NousResearch/hermes-agent/pull/2450), [#2456](https://github.com/NousResearch/hermes-agent/pull/2456), [#2451](https://github.com/NousResearch/hermes-agent/pull/2451), [#2472](https://github.com/NousResearch/hermes-agent/pull/2472))
- **6 new messaging platform adapters** — Signal, DingTalk, SMS (Twilio), Mattermost, Matrix, and Webhook adapters join Telegram, Discord, and WhatsApp. Gateway auto-reconnects failed platforms with exponential backoff ([#2206](https://github.com/NousResearch/hermes-agent/pull/2206), [#1685](https://github.com/NousResearch/hermes-agent/pull/1685), [#1688](https://github.com/NousResearch/hermes-agent/pull/1688), [#1683](https://github.com/NousResearch/hermes-agent/pull/1683), [#2166](https://github.com/NousResearch/hermes-agent/pull/2166), [#2584](https://github.com/NousResearch/hermes-agent/pull/2584))
- **@ context references** — Claude Code-style `@file` and `@url` context injection with tab completions in the CLI ([#2343](https://github.com/NousResearch/hermes-agent/pull/2343), [#2482](https://github.com/NousResearch/hermes-agent/pull/2482))
- **4 new inference providers** — GitHub Copilot (OAuth + token validation), Alibaba Cloud / DashScope, Kilo Code, and OpenCode Zen/Go ([#1924](https://github.com/NousResearch/hermes-agent/pull/1924), [#1879](https://github.com/NousResearch/hermes-agent/pull/1879) by @mchzimm, [#1673](https://github.com/NousResearch/hermes-agent/pull/1673), [#1666](https://github.com/NousResearch/hermes-agent/pull/1666), [#1650](https://github.com/NousResearch/hermes-agent/pull/1650))
- **MCP server management CLI** — `hermes mcp` commands for installing, configuring, and authenticating MCP servers with full OAuth 2.1 PKCE flow ([#2465](https://github.com/NousResearch/hermes-agent/pull/2465))
- **Gateway prompt caching** — Cache AIAgent instances per session, preserving Anthropic prompt cache across turns for dramatic cost reduction on long conversations ([#2282](https://github.com/NousResearch/hermes-agent/pull/2282), [#2284](https://github.com/NousResearch/hermes-agent/pull/2284), [#2361](https://github.com/NousResearch/hermes-agent/pull/2361))
- **Context compression overhaul** — Structured summaries with iterative updates, token-budget tail protection, configurable summary endpoint, and fallback model support ([#2323](https://github.com/NousResearch/hermes-agent/pull/2323), [#1727](https://github.com/NousResearch/hermes-agent/pull/1727), [#2224](https://github.com/NousResearch/hermes-agent/pull/2224))
- **Streaming enabled by default** — CLI streaming on by default with proper spinner/tool progress display during streaming mode, plus extensive linebreak and concatenation fixes ([#2340](https://github.com/NousResearch/hermes-agent/pull/2340), [#2161](https://github.com/NousResearch/hermes-agent/pull/2161), [#2258](https://github.com/NousResearch/hermes-agent/pull/2258))
---
## 🖥️ CLI & User Experience
### New Commands & Interactions
- **@ context completions** — Tab-completable `@file`/`@url` references that inject file content or web pages into the conversation ([#2482](https://github.com/NousResearch/hermes-agent/pull/2482), [#2343](https://github.com/NousResearch/hermes-agent/pull/2343))
- **`/statusbar`** — Toggle a persistent config bar showing model + provider info in the prompt ([#2240](https://github.com/NousResearch/hermes-agent/pull/2240), [#1917](https://github.com/NousResearch/hermes-agent/pull/1917))
- **`/queue`** — Queue prompts for the agent without interrupting the current run ([#2191](https://github.com/NousResearch/hermes-agent/pull/2191), [#2469](https://github.com/NousResearch/hermes-agent/pull/2469))
- **`/permission`** — Switch approval mode dynamically during a session ([#2207](https://github.com/NousResearch/hermes-agent/pull/2207))
- **`/browser`** — Interactive browser sessions from the CLI ([#2273](https://github.com/NousResearch/hermes-agent/pull/2273), [#1814](https://github.com/NousResearch/hermes-agent/pull/1814))
- **`/cost`** — Live pricing and usage tracking in gateway mode ([#2180](https://github.com/NousResearch/hermes-agent/pull/2180))
- **`/approve` and `/deny`** — Replaced bare text approval in gateway with explicit commands ([#2002](https://github.com/NousResearch/hermes-agent/pull/2002))
### Streaming & Display
- Streaming enabled by default in CLI ([#2340](https://github.com/NousResearch/hermes-agent/pull/2340))
- Show spinners and tool progress during streaming mode ([#2161](https://github.com/NousResearch/hermes-agent/pull/2161))
- Show reasoning/thinking blocks when `show_reasoning` enabled ([#2118](https://github.com/NousResearch/hermes-agent/pull/2118))
- Context pressure warnings for CLI and gateway ([#2159](https://github.com/NousResearch/hermes-agent/pull/2159))
- Fix: streaming chunks concatenated without whitespace ([#2258](https://github.com/NousResearch/hermes-agent/pull/2258))
- Fix: iteration boundary linebreak prevents stream concatenation ([#2413](https://github.com/NousResearch/hermes-agent/pull/2413))
- Fix: defer streaming linebreak to prevent blank line stacking ([#2473](https://github.com/NousResearch/hermes-agent/pull/2473))
- Fix: suppress spinner animation in non-TTY environments ([#2216](https://github.com/NousResearch/hermes-agent/pull/2216))
- Fix: display provider and endpoint in API error messages ([#2266](https://github.com/NousResearch/hermes-agent/pull/2266))
- Fix: resolve garbled ANSI escape codes in status printouts ([#2448](https://github.com/NousResearch/hermes-agent/pull/2448))
- Fix: update gold ANSI color to true-color format ([#2246](https://github.com/NousResearch/hermes-agent/pull/2246))
- Fix: normalize toolset labels and use skin colors in banner ([#1912](https://github.com/NousResearch/hermes-agent/pull/1912))
### CLI Polish
- Fix: prevent 'Press ENTER to continue...' on exit ([#2555](https://github.com/NousResearch/hermes-agent/pull/2555))
- Fix: flush stdout during agent loop to prevent macOS display freeze ([#1654](https://github.com/NousResearch/hermes-agent/pull/1654))
- Fix: show human-readable error when `hermes setup` hits permissions error ([#2196](https://github.com/NousResearch/hermes-agent/pull/2196))
- Fix: `/stop` command crash + UnboundLocalError in streaming media delivery ([#2463](https://github.com/NousResearch/hermes-agent/pull/2463))
- Fix: allow custom/local endpoints without API key ([#2556](https://github.com/NousResearch/hermes-agent/pull/2556))
- Fix: Kitty keyboard protocol Shift+Enter for Ghostty/WezTerm (attempted + reverted due to prompt_toolkit crash) ([#2345](https://github.com/NousResearch/hermes-agent/pull/2345), [#2349](https://github.com/NousResearch/hermes-agent/pull/2349))
### Configuration
- **`${ENV_VAR}` substitution** in config.yaml ([#2684](https://github.com/NousResearch/hermes-agent/pull/2684))
- **Real-time config reload** — config.yaml changes apply without restart ([#2210](https://github.com/NousResearch/hermes-agent/pull/2210))
- **`custom_models.yaml`** for user-managed model additions ([#2214](https://github.com/NousResearch/hermes-agent/pull/2214))
- **Priority-based context file selection** + CLAUDE.md support ([#2301](https://github.com/NousResearch/hermes-agent/pull/2301))
- **Merge nested YAML sections** instead of replacing on config update ([#2213](https://github.com/NousResearch/hermes-agent/pull/2213))
- Fix: config.yaml provider key overrides env var silently ([#2272](https://github.com/NousResearch/hermes-agent/pull/2272))
- Fix: log warning instead of silently swallowing config.yaml errors ([#2683](https://github.com/NousResearch/hermes-agent/pull/2683))
- Fix: disabled toolsets re-enable themselves after `hermes tools` ([#2268](https://github.com/NousResearch/hermes-agent/pull/2268))
- Fix: platform default toolsets silently override tool deselection ([#2624](https://github.com/NousResearch/hermes-agent/pull/2624))
- Fix: honor bare YAML `approvals.mode: off` ([#2620](https://github.com/NousResearch/hermes-agent/pull/2620))
- Fix: `hermes update` use `.[all]` extras with fallback ([#1728](https://github.com/NousResearch/hermes-agent/pull/1728))
- Fix: `hermes update` prompt before resetting working tree on stash conflicts ([#2390](https://github.com/NousResearch/hermes-agent/pull/2390))
- Fix: use git pull --rebase in update/install to avoid divergent branch error ([#2274](https://github.com/NousResearch/hermes-agent/pull/2274))
- Fix: add zprofile fallback and create zshrc on fresh macOS installs ([#2320](https://github.com/NousResearch/hermes-agent/pull/2320))
- Fix: remove `ANTHROPIC_BASE_URL` env var to avoid collisions ([#1675](https://github.com/NousResearch/hermes-agent/pull/1675))
- Fix: don't ask IMAP password if already in keyring or env ([#2212](https://github.com/NousResearch/hermes-agent/pull/2212))
- Fix: OpenCode Zen/Go show OpenRouter models instead of their own ([#2277](https://github.com/NousResearch/hermes-agent/pull/2277))
---
## 🏗️ Core Agent & Architecture
### New Providers
- **GitHub Copilot** — Full OAuth auth, API routing, token validation, and 400k context. ([#1924](https://github.com/NousResearch/hermes-agent/pull/1924), [#1896](https://github.com/NousResearch/hermes-agent/pull/1896), [#1879](https://github.com/NousResearch/hermes-agent/pull/1879) by @mchzimm, [#2507](https://github.com/NousResearch/hermes-agent/pull/2507))
- **Alibaba Cloud / DashScope** — Full integration with DashScope v1 runtime, model dot preservation, and 401 auth fixes ([#1673](https://github.com/NousResearch/hermes-agent/pull/1673), [#2332](https://github.com/NousResearch/hermes-agent/pull/2332), [#2459](https://github.com/NousResearch/hermes-agent/pull/2459))
- **Kilo Code** — First-class inference provider ([#1666](https://github.com/NousResearch/hermes-agent/pull/1666))
- **OpenCode Zen and OpenCode Go** — New provider backends ([#1650](https://github.com/NousResearch/hermes-agent/pull/1650), [#2393](https://github.com/NousResearch/hermes-agent/pull/2393) by @0xbyt4)
- **NeuTTS** — Local TTS provider backend with built-in setup flow, replacing the old optional skill ([#1657](https://github.com/NousResearch/hermes-agent/pull/1657), [#1664](https://github.com/NousResearch/hermes-agent/pull/1664))
### Provider Improvements
- **Eager fallback** to backup model on rate-limit errors ([#1730](https://github.com/NousResearch/hermes-agent/pull/1730))
- **Endpoint metadata** for custom model context and pricing; query local servers for actual context window size ([#1906](https://github.com/NousResearch/hermes-agent/pull/1906), [#2091](https://github.com/NousResearch/hermes-agent/pull/2091) by @dusterbloom)
- **Context length detection overhaul** — models.dev integration, provider-aware resolution, fuzzy matching for custom endpoints, `/v1/props` for llama.cpp ([#2158](https://github.com/NousResearch/hermes-agent/pull/2158), [#2051](https://github.com/NousResearch/hermes-agent/pull/2051), [#2403](https://github.com/NousResearch/hermes-agent/pull/2403))
- **Model catalog updates** — gpt-5.4-mini, gpt-5.4-nano, healer-alpha, haiku-4.5, minimax-m2.7, claude 4.6 at 1M context ([#1913](https://github.com/NousResearch/hermes-agent/pull/1913), [#1915](https://github.com/NousResearch/hermes-agent/pull/1915), [#1900](https://github.com/NousResearch/hermes-agent/pull/1900), [#2155](https://github.com/NousResearch/hermes-agent/pull/2155), [#2474](https://github.com/NousResearch/hermes-agent/pull/2474))
- **Custom endpoint improvements** — `model.base_url` in config.yaml, `api_mode` override for responses API, allow endpoints without API key, fail fast on missing keys ([#2330](https://github.com/NousResearch/hermes-agent/pull/2330), [#1651](https://github.com/NousResearch/hermes-agent/pull/1651), [#2556](https://github.com/NousResearch/hermes-agent/pull/2556), [#2445](https://github.com/NousResearch/hermes-agent/pull/2445), [#1994](https://github.com/NousResearch/hermes-agent/pull/1994), [#1998](https://github.com/NousResearch/hermes-agent/pull/1998))
- Inject model and provider into system prompt ([#1929](https://github.com/NousResearch/hermes-agent/pull/1929))
- Tie `api_mode` to provider config instead of env var ([#1656](https://github.com/NousResearch/hermes-agent/pull/1656))
- Fix: prevent Anthropic token leaking to third-party `anthropic_messages` providers ([#2389](https://github.com/NousResearch/hermes-agent/pull/2389))
- Fix: prevent Anthropic fallback from inheriting non-Anthropic `base_url` ([#2388](https://github.com/NousResearch/hermes-agent/pull/2388))
- Fix: `auxiliary_is_nous` flag never resets — leaked Nous tags to other providers ([#1713](https://github.com/NousResearch/hermes-agent/pull/1713))
- Fix: Anthropic `tool_choice 'none'` still allowed tool calls ([#1714](https://github.com/NousResearch/hermes-agent/pull/1714))
- Fix: Mistral parser nested JSON fallback extraction ([#2335](https://github.com/NousResearch/hermes-agent/pull/2335))
- Fix: MiniMax 401 auth resolved by defaulting to `anthropic_messages` ([#2103](https://github.com/NousResearch/hermes-agent/pull/2103))
- Fix: case-insensitive model family matching ([#2350](https://github.com/NousResearch/hermes-agent/pull/2350))
- Fix: ignore placeholder provider keys in activation checks ([#2358](https://github.com/NousResearch/hermes-agent/pull/2358))
- Fix: Preserve Ollama model:tag colons in context length detection ([#2149](https://github.com/NousResearch/hermes-agent/pull/2149))
- Fix: recognize Claude Code OAuth credentials in startup gate ([#1663](https://github.com/NousResearch/hermes-agent/pull/1663))
- Fix: detect Claude Code version dynamically for OAuth user-agent ([#1670](https://github.com/NousResearch/hermes-agent/pull/1670))
- Fix: OAuth flag stale after refresh/fallback ([#1890](https://github.com/NousResearch/hermes-agent/pull/1890))
- Fix: auxiliary client skips expired Codex JWT ([#2397](https://github.com/NousResearch/hermes-agent/pull/2397))
### Agent Loop
- **Gateway prompt caching** — Cache AIAgent per session, keep assistant turns, fix session restore ([#2282](https://github.com/NousResearch/hermes-agent/pull/2282), [#2284](https://github.com/NousResearch/hermes-agent/pull/2284), [#2361](https://github.com/NousResearch/hermes-agent/pull/2361))
- **Context compression overhaul** — Structured summaries, iterative updates, token-budget tail protection, configurable `summary_base_url` ([#2323](https://github.com/NousResearch/hermes-agent/pull/2323), [#1727](https://github.com/NousResearch/hermes-agent/pull/1727), [#2224](https://github.com/NousResearch/hermes-agent/pull/2224))
- **Pre-call sanitization and post-call tool guardrails** ([#1732](https://github.com/NousResearch/hermes-agent/pull/1732))
- **Auto-recover** from provider-rejected `tool_choice` by retrying without ([#2174](https://github.com/NousResearch/hermes-agent/pull/2174))
- **Background memory/skill review** replaces inline nudges ([#2235](https://github.com/NousResearch/hermes-agent/pull/2235))
- **SOUL.md as primary agent identity** instead of hardcoded default ([#1922](https://github.com/NousResearch/hermes-agent/pull/1922))
- Fix: prevent silent tool result loss during context compression ([#1993](https://github.com/NousResearch/hermes-agent/pull/1993))
- Fix: handle empty/null function arguments in tool call recovery ([#2163](https://github.com/NousResearch/hermes-agent/pull/2163))
- Fix: handle API refusal responses gracefully instead of crashing ([#2156](https://github.com/NousResearch/hermes-agent/pull/2156))
- Fix: prevent stuck agent loop on malformed tool calls ([#2114](https://github.com/NousResearch/hermes-agent/pull/2114))
- Fix: return JSON parse error to model instead of dispatching with empty args ([#2342](https://github.com/NousResearch/hermes-agent/pull/2342))
- Fix: consecutive assistant message merge drops content on mixed types ([#1703](https://github.com/NousResearch/hermes-agent/pull/1703))
- Fix: message role alternation violations in JSON recovery and error handler ([#1722](https://github.com/NousResearch/hermes-agent/pull/1722))
- Fix: `compression_attempts` resets each iteration — allowed unlimited compressions ([#1723](https://github.com/NousResearch/hermes-agent/pull/1723))
- Fix: `length_continue_retries` never resets — later truncations got fewer retries ([#1717](https://github.com/NousResearch/hermes-agent/pull/1717))
- Fix: compressor summary role violated consecutive-role constraint ([#1720](https://github.com/NousResearch/hermes-agent/pull/1720), [#1743](https://github.com/NousResearch/hermes-agent/pull/1743))
- Fix: remove hardcoded `gemini-3-flash-preview` as default summary model ([#2464](https://github.com/NousResearch/hermes-agent/pull/2464))
- Fix: correctly handle empty tool results ([#2201](https://github.com/NousResearch/hermes-agent/pull/2201))
- Fix: crash on None entry in `tool_calls` list ([#2209](https://github.com/NousResearch/hermes-agent/pull/2209) by @0xbyt4, [#2316](https://github.com/NousResearch/hermes-agent/pull/2316))
- Fix: per-thread persistent event loops in worker threads ([#2214](https://github.com/NousResearch/hermes-agent/pull/2214) by @jquesnelle)
- Fix: prevent 'event loop already running' when async tools run in parallel ([#2207](https://github.com/NousResearch/hermes-agent/pull/2207))
- Fix: strip ANSI at the source — clean terminal output before it reaches the model ([#2115](https://github.com/NousResearch/hermes-agent/pull/2115))
- Fix: skip top-level `cache_control` on role:tool for OpenRouter ([#2391](https://github.com/NousResearch/hermes-agent/pull/2391))
- Fix: delegate tool — save parent tool names before child construction mutates global ([#2083](https://github.com/NousResearch/hermes-agent/pull/2083) by @ygd58, [#1894](https://github.com/NousResearch/hermes-agent/pull/1894))
- Fix: only strip last assistant message if empty string ([#2326](https://github.com/NousResearch/hermes-agent/pull/2326))
### Session & Memory
- **Session search** and management slash commands ([#2198](https://github.com/NousResearch/hermes-agent/pull/2198))
- **Auto session titles** and `.hermes.md` project config ([#1712](https://github.com/NousResearch/hermes-agent/pull/1712))
- Fix: concurrent memory writes silently drop entries — added file locking ([#1726](https://github.com/NousResearch/hermes-agent/pull/1726))
- Fix: search all sources by default in `session_search` ([#1892](https://github.com/NousResearch/hermes-agent/pull/1892))
- Fix: handle hyphenated FTS5 queries and preserve quoted literals ([#1776](https://github.com/NousResearch/hermes-agent/pull/1776))
- Fix: skip corrupt lines in `load_transcript` instead of crashing ([#1744](https://github.com/NousResearch/hermes-agent/pull/1744))
- Fix: normalize session keys to prevent case-sensitive duplicates ([#2157](https://github.com/NousResearch/hermes-agent/pull/2157))
- Fix: prevent `session_search` crash when no sessions exist ([#2194](https://github.com/NousResearch/hermes-agent/pull/2194))
- Fix: reset token counters on new session for accurate usage display ([#2101](https://github.com/NousResearch/hermes-agent/pull/2101) by @InB4DevOps)
- Fix: prevent stale memory overwrites by flush agent ([#2687](https://github.com/NousResearch/hermes-agent/pull/2687))
- Fix: remove synthetic error message injection, fix session resume after repeated failures ([#2303](https://github.com/NousResearch/hermes-agent/pull/2303))
- Fix: quiet mode with `--resume` now passes conversation_history ([#2357](https://github.com/NousResearch/hermes-agent/pull/2357))
- Fix: unify resume logic in batch mode ([#2331](https://github.com/NousResearch/hermes-agent/pull/2331))
### Honcho Memory
- Honcho config fixes and @ context reference integration ([#2343](https://github.com/NousResearch/hermes-agent/pull/2343))
- Self-hosted / Docker configuration documentation ([#2475](https://github.com/NousResearch/hermes-agent/pull/2475))
---
## 📱 Messaging Platforms (Gateway)
### New Platform Adapters
- **Signal Messenger** — Full adapter with attachment handling, group message filtering, and Note to Self echo-back protection ([#2206](https://github.com/NousResearch/hermes-agent/pull/2206), [#2400](https://github.com/NousResearch/hermes-agent/pull/2400), [#2297](https://github.com/NousResearch/hermes-agent/pull/2297), [#2156](https://github.com/NousResearch/hermes-agent/pull/2156))
- **DingTalk** — Adapter with gateway wiring and setup docs ([#1685](https://github.com/NousResearch/hermes-agent/pull/1685), [#1690](https://github.com/NousResearch/hermes-agent/pull/1690), [#1692](https://github.com/NousResearch/hermes-agent/pull/1692))
- **SMS (Twilio)** ([#1688](https://github.com/NousResearch/hermes-agent/pull/1688))
- **Mattermost** — With @-mention-only channel filter ([#1683](https://github.com/NousResearch/hermes-agent/pull/1683), [#2443](https://github.com/NousResearch/hermes-agent/pull/2443))
- **Matrix** — With vision support and image caching ([#1683](https://github.com/NousResearch/hermes-agent/pull/1683), [#2520](https://github.com/NousResearch/hermes-agent/pull/2520))
- **Webhook** — Platform adapter for external event triggers ([#2166](https://github.com/NousResearch/hermes-agent/pull/2166))
- **OpenAI-compatible API server** — `/v1/chat/completions` endpoint with `/api/jobs` cron management ([#1756](https://github.com/NousResearch/hermes-agent/pull/1756), [#2450](https://github.com/NousResearch/hermes-agent/pull/2450), [#2456](https://github.com/NousResearch/hermes-agent/pull/2456))
### Telegram Improvements
- MarkdownV2 support — strikethrough, spoiler, blockquotes, escape parentheses/braces/backslashes/backticks ([#2199](https://github.com/NousResearch/hermes-agent/pull/2199), [#2200](https://github.com/NousResearch/hermes-agent/pull/2200) by @llbn, [#2386](https://github.com/NousResearch/hermes-agent/pull/2386))
- Auto-detect HTML tags and use `parse_mode=HTML` ([#1709](https://github.com/NousResearch/hermes-agent/pull/1709))
- Telegram group vision support + thread-based sessions ([#2153](https://github.com/NousResearch/hermes-agent/pull/2153))
- Auto-reconnect polling after network interruption ([#2517](https://github.com/NousResearch/hermes-agent/pull/2517))
- Aggregate split text messages before dispatching ([#1674](https://github.com/NousResearch/hermes-agent/pull/1674))
- Fix: streaming config bridge, not-modified, flood control ([#1782](https://github.com/NousResearch/hermes-agent/pull/1782), [#1783](https://github.com/NousResearch/hermes-agent/pull/1783))
- Fix: edited_message event crashes ([#2074](https://github.com/NousResearch/hermes-agent/pull/2074))
- Fix: retry 409 polling conflicts before giving up ([#2312](https://github.com/NousResearch/hermes-agent/pull/2312))
- Fix: topic delivery via `platform:chat_id:thread_id` format ([#2455](https://github.com/NousResearch/hermes-agent/pull/2455))
### Discord Improvements
- Document caching and text-file injection ([#2503](https://github.com/NousResearch/hermes-agent/pull/2503))
- Persistent typing indicator for DMs ([#2468](https://github.com/NousResearch/hermes-agent/pull/2468))
- Discord DM vision — inline images + attachment analysis ([#2186](https://github.com/NousResearch/hermes-agent/pull/2186))
- Persist thread participation across gateway restarts ([#1661](https://github.com/NousResearch/hermes-agent/pull/1661))
- Fix: gateway crash on non-ASCII guild names ([#2302](https://github.com/NousResearch/hermes-agent/pull/2302))
- Fix: thread permission errors ([#2073](https://github.com/NousResearch/hermes-agent/pull/2073))
- Fix: slash event routing in threads ([#2460](https://github.com/NousResearch/hermes-agent/pull/2460))
- Fix: remove bugged followup messages + `/ask` command ([#1836](https://github.com/NousResearch/hermes-agent/pull/1836))
- Fix: graceful WebSocket reconnection ([#2127](https://github.com/NousResearch/hermes-agent/pull/2127))
- Fix: voice channel TTS when streaming enabled ([#2322](https://github.com/NousResearch/hermes-agent/pull/2322))
### WhatsApp & Other Adapters
- WhatsApp: outbound `send_message` routing ([#1769](https://github.com/NousResearch/hermes-agent/pull/1769) by @sai-samarth), LID format self-chat ([#1667](https://github.com/NousResearch/hermes-agent/pull/1667)), `reply_prefix` config fix ([#1923](https://github.com/NousResearch/hermes-agent/pull/1923)), restart on bridge child exit ([#2334](https://github.com/NousResearch/hermes-agent/pull/2334)), image/bridge improvements ([#2181](https://github.com/NousResearch/hermes-agent/pull/2181))
- Matrix: correct `reply_to_message_id` parameter ([#1895](https://github.com/NousResearch/hermes-agent/pull/1895)), bare media types fix ([#1736](https://github.com/NousResearch/hermes-agent/pull/1736))
- Mattermost: MIME types for media attachments ([#2329](https://github.com/NousResearch/hermes-agent/pull/2329))
### Gateway Core
- **Auto-reconnect** failed platforms with exponential backoff ([#2584](https://github.com/NousResearch/hermes-agent/pull/2584))
- **Notify users when session auto-resets** ([#2519](https://github.com/NousResearch/hermes-agent/pull/2519))
- **Reply-to message context** for out-of-session replies ([#1662](https://github.com/NousResearch/hermes-agent/pull/1662))
- **Ignore unauthorized DMs** config option ([#1919](https://github.com/NousResearch/hermes-agent/pull/1919))
- Fix: `/reset` in thread-mode resets global session instead of thread ([#2254](https://github.com/NousResearch/hermes-agent/pull/2254))
- Fix: deliver MEDIA: files after streaming responses ([#2382](https://github.com/NousResearch/hermes-agent/pull/2382))
- Fix: cap interrupt recursion depth to prevent resource exhaustion ([#1659](https://github.com/NousResearch/hermes-agent/pull/1659))
- Fix: detect stopped processes and release stale locks on `--replace` ([#2406](https://github.com/NousResearch/hermes-agent/pull/2406), [#1908](https://github.com/NousResearch/hermes-agent/pull/1908))
- Fix: PID-based wait with force-kill for gateway restart ([#1902](https://github.com/NousResearch/hermes-agent/pull/1902))
- Fix: prevent `--replace` mode from killing the caller process ([#2185](https://github.com/NousResearch/hermes-agent/pull/2185))
- Fix: `/model` shows active fallback model instead of config default ([#1660](https://github.com/NousResearch/hermes-agent/pull/1660))
- Fix: `/title` command fails when session doesn't exist in SQLite yet ([#2379](https://github.com/NousResearch/hermes-agent/pull/2379) by @ten-jampa)
- Fix: process `/queue`'d messages after agent completion ([#2469](https://github.com/NousResearch/hermes-agent/pull/2469))
- Fix: strip orphaned `tool_results` + let `/reset` bypass running agent ([#2180](https://github.com/NousResearch/hermes-agent/pull/2180))
- Fix: prevent agents from starting gateway outside systemd management ([#2617](https://github.com/NousResearch/hermes-agent/pull/2617))
- Fix: prevent systemd restart storm on gateway connection failure ([#2327](https://github.com/NousResearch/hermes-agent/pull/2327))
- Fix: include resolved node path in systemd unit ([#1767](https://github.com/NousResearch/hermes-agent/pull/1767) by @sai-samarth)
- Fix: send error details to user in gateway outer exception handler ([#1966](https://github.com/NousResearch/hermes-agent/pull/1966))
- Fix: improve error handling for 429 usage limits and 500 context overflow ([#1839](https://github.com/NousResearch/hermes-agent/pull/1839))
- Fix: add all missing platform allowlist env vars to startup warning check ([#2628](https://github.com/NousResearch/hermes-agent/pull/2628))
- Fix: media delivery fails for file paths containing spaces ([#2621](https://github.com/NousResearch/hermes-agent/pull/2621))
- Fix: duplicate session-key collision in multi-platform gateway ([#2171](https://github.com/NousResearch/hermes-agent/pull/2171))
- Fix: Matrix and Mattermost never report as connected ([#1711](https://github.com/NousResearch/hermes-agent/pull/1711))
- Fix: PII redaction config never read — missing yaml import ([#1701](https://github.com/NousResearch/hermes-agent/pull/1701))
- Fix: NameError on skill slash commands ([#1697](https://github.com/NousResearch/hermes-agent/pull/1697))
- Fix: persist watcher metadata in checkpoint for crash recovery ([#1706](https://github.com/NousResearch/hermes-agent/pull/1706))
- Fix: pass `message_thread_id` in send_image_file, send_document, send_video ([#2339](https://github.com/NousResearch/hermes-agent/pull/2339))
- Fix: media-group aggregation on rapid successive photo messages ([#2160](https://github.com/NousResearch/hermes-agent/pull/2160))
---
## 🔧 Tool System
### MCP Enhancements
- **MCP server management CLI** + OAuth 2.1 PKCE auth ([#2465](https://github.com/NousResearch/hermes-agent/pull/2465))
- **Expose MCP servers as standalone toolsets** ([#1907](https://github.com/NousResearch/hermes-agent/pull/1907))
- **Interactive MCP tool configuration** in `hermes tools` ([#1694](https://github.com/NousResearch/hermes-agent/pull/1694))
- Fix: MCP-OAuth port mismatch, path traversal, and shared handler state ([#2552](https://github.com/NousResearch/hermes-agent/pull/2552))
- Fix: preserve MCP tool registrations across session resets ([#2124](https://github.com/NousResearch/hermes-agent/pull/2124))
- Fix: concurrent file access crash + duplicate MCP registration ([#2154](https://github.com/NousResearch/hermes-agent/pull/2154))
- Fix: normalise MCP schemas + expand session list columns ([#2102](https://github.com/NousResearch/hermes-agent/pull/2102))
- Fix: `tool_choice` `mcp_` prefix handling ([#1775](https://github.com/NousResearch/hermes-agent/pull/1775))
### Web Tool Backends
- **Tavily** as web search/extract/crawl backend ([#1731](https://github.com/NousResearch/hermes-agent/pull/1731))
- **Parallel** as alternative web search/extract backend ([#1696](https://github.com/NousResearch/hermes-agent/pull/1696))
- **Configurable web backend** — Firecrawl/BeautifulSoup/Playwright selection ([#2256](https://github.com/NousResearch/hermes-agent/pull/2256))
- Fix: whitespace-only env vars bypass web backend detection ([#2341](https://github.com/NousResearch/hermes-agent/pull/2341))
### New Tools
- **IMAP email** reading and sending ([#2173](https://github.com/NousResearch/hermes-agent/pull/2173))
- **STT (speech-to-text)** tool using Whisper API ([#2072](https://github.com/NousResearch/hermes-agent/pull/2072))
- **Route-aware pricing estimates** ([#1695](https://github.com/NousResearch/hermes-agent/pull/1695))
### Tool Improvements
- TTS: `base_url` support for OpenAI TTS provider ([#2064](https://github.com/NousResearch/hermes-agent/pull/2064) by @hanai)
- Vision: configurable timeout, tilde expansion in file paths, DM vision with multi-image and base64 fallback ([#2480](https://github.com/NousResearch/hermes-agent/pull/2480), [#2585](https://github.com/NousResearch/hermes-agent/pull/2585), [#2211](https://github.com/NousResearch/hermes-agent/pull/2211))
- Browser: race condition fix in session creation ([#1721](https://github.com/NousResearch/hermes-agent/pull/1721)), TypeError on unexpected LLM params ([#1735](https://github.com/NousResearch/hermes-agent/pull/1735))
- File tools: strip ANSI escape codes from write_file and patch content ([#2532](https://github.com/NousResearch/hermes-agent/pull/2532)), include pagination args in repeated search key ([#1824](https://github.com/NousResearch/hermes-agent/pull/1824) by @cutepawss), improve fuzzy matching accuracy + position calculation refactor ([#2096](https://github.com/NousResearch/hermes-agent/pull/2096), [#1681](https://github.com/NousResearch/hermes-agent/pull/1681))
- Code execution: resource leak and double socket close fix ([#2381](https://github.com/NousResearch/hermes-agent/pull/2381))
- Delegate: thread safety for concurrent subagent delegation ([#1672](https://github.com/NousResearch/hermes-agent/pull/1672)), preserve parent agent's tool list after delegation ([#1778](https://github.com/NousResearch/hermes-agent/pull/1778))
- Fix: make concurrent tool batching path-aware for file mutations ([#1914](https://github.com/NousResearch/hermes-agent/pull/1914))
- Fix: chunk long messages in `send_message_tool` before platform dispatch ([#1646](https://github.com/NousResearch/hermes-agent/pull/1646))
- Fix: add missing 'messaging' toolset ([#1718](https://github.com/NousResearch/hermes-agent/pull/1718))
- Fix: prevent unavailable tool names from leaking into model schemas ([#2072](https://github.com/NousResearch/hermes-agent/pull/2072))
- Fix: pass visited set by reference to prevent diamond dependency duplication ([#2311](https://github.com/NousResearch/hermes-agent/pull/2311))
- Fix: Daytona sandbox lookup migrated from `find_one` to `get/list` ([#2063](https://github.com/NousResearch/hermes-agent/pull/2063) by @rovle)
---
## 🧩 Skills Ecosystem
### Skills System Improvements
- **Agent-created skills** — Caution-level findings allowed, dangerous skills ask instead of block ([#1840](https://github.com/NousResearch/hermes-agent/pull/1840), [#2446](https://github.com/NousResearch/hermes-agent/pull/2446))
- **`--yes` flag** to bypass confirmation in `/skills install` and uninstall ([#1647](https://github.com/NousResearch/hermes-agent/pull/1647))
- **Disabled skills respected** across banner, system prompt, and slash commands ([#1897](https://github.com/NousResearch/hermes-agent/pull/1897))
- Fix: skills custom_tools import crash + sandbox file_tools integration ([#2239](https://github.com/NousResearch/hermes-agent/pull/2239))
- Fix: agent-created skills with pip requirements crash on install ([#2145](https://github.com/NousResearch/hermes-agent/pull/2145))
- Fix: race condition in `Skills.__init__` when `hub.yaml` missing ([#2242](https://github.com/NousResearch/hermes-agent/pull/2242))
- Fix: validate skill metadata before install and block duplicates ([#2241](https://github.com/NousResearch/hermes-agent/pull/2241))
- Fix: skills hub inspect/resolve — 4 bugs in inspect, redirects, discovery, tap list ([#2447](https://github.com/NousResearch/hermes-agent/pull/2447))
- Fix: agent-created skills keep working after session reset ([#2121](https://github.com/NousResearch/hermes-agent/pull/2121))
### New Skills
- **OCR-and-documents** — PDF/DOCX/XLS/PPTX/image OCR with optional GPU ([#2236](https://github.com/NousResearch/hermes-agent/pull/2236), [#2461](https://github.com/NousResearch/hermes-agent/pull/2461))
- **Huggingface-hub** bundled skill ([#1921](https://github.com/NousResearch/hermes-agent/pull/1921))
- **Sherlock OSINT** username search ([#1671](https://github.com/NousResearch/hermes-agent/pull/1671))
- **Meme-generation** — Image generator with Pillow ([#2344](https://github.com/NousResearch/hermes-agent/pull/2344))
- **Bioinformatics** gateway skill — index to 400+ bio skills ([#2387](https://github.com/NousResearch/hermes-agent/pull/2387))
- **Inference.sh** skill (terminal-based) ([#1686](https://github.com/NousResearch/hermes-agent/pull/1686))
- **Base blockchain** optional skill ([#1643](https://github.com/NousResearch/hermes-agent/pull/1643))
- **3D-model-viewer** optional skill ([#2226](https://github.com/NousResearch/hermes-agent/pull/2226))
- **FastMCP** optional skill ([#2113](https://github.com/NousResearch/hermes-agent/pull/2113))
- **Hermes-agent-setup** skill ([#1905](https://github.com/NousResearch/hermes-agent/pull/1905))
---
## 🔌 Plugin System Enhancements
- **TUI extension hooks** — Build custom CLIs on top of Hermes ([#2333](https://github.com/NousResearch/hermes-agent/pull/2333))
- **`hermes plugins install/remove/list`** commands ([#2337](https://github.com/NousResearch/hermes-agent/pull/2337))
- **Slash command registration** for plugins ([#2359](https://github.com/NousResearch/hermes-agent/pull/2359))
- **`session:end` lifecycle event** hook ([#1725](https://github.com/NousResearch/hermes-agent/pull/1725))
- Fix: require opt-in for project plugin discovery ([#2215](https://github.com/NousResearch/hermes-agent/pull/2215))
---
## 🔒 Security & Reliability
### Security
- **SSRF protection** for vision_tools and web_tools ([#2679](https://github.com/NousResearch/hermes-agent/pull/2679))
- **Shell injection prevention** in `_expand_path` via `~user` path suffix ([#2685](https://github.com/NousResearch/hermes-agent/pull/2685))
- **Block untrusted browser-origin** API server access ([#2451](https://github.com/NousResearch/hermes-agent/pull/2451))
- **Block sandbox backend creds** from subprocess env ([#1658](https://github.com/NousResearch/hermes-agent/pull/1658))
- **Block @ references** from reading secrets outside workspace ([#2601](https://github.com/NousResearch/hermes-agent/pull/2601) by @Gutslabs)
- **Malicious code pattern pre-exec scanner** for terminal_tool ([#2245](https://github.com/NousResearch/hermes-agent/pull/2245))
- **Harden terminal safety** and sandbox file writes ([#1653](https://github.com/NousResearch/hermes-agent/pull/1653))
- **PKCE verifier leak** fix + OAuth refresh Content-Type ([#1775](https://github.com/NousResearch/hermes-agent/pull/1775))
- **Eliminate SQL string formatting** in `execute()` calls ([#2061](https://github.com/NousResearch/hermes-agent/pull/2061) by @dusterbloom)
- **Harden jobs API** — input limits, field whitelist, startup check ([#2456](https://github.com/NousResearch/hermes-agent/pull/2456))
### Reliability
- Thread locks on 4 SessionDB methods ([#1704](https://github.com/NousResearch/hermes-agent/pull/1704))
- File locking for concurrent memory writes ([#1726](https://github.com/NousResearch/hermes-agent/pull/1726))
- Handle OpenRouter errors gracefully ([#2112](https://github.com/NousResearch/hermes-agent/pull/2112))
- Guard print() calls against OSError ([#1668](https://github.com/NousResearch/hermes-agent/pull/1668))
- Safely handle non-string inputs in redacting formatter ([#2392](https://github.com/NousResearch/hermes-agent/pull/2392), [#1700](https://github.com/NousResearch/hermes-agent/pull/1700))
- ACP: preserve session provider on model switch, persist sessions to disk ([#2380](https://github.com/NousResearch/hermes-agent/pull/2380), [#2071](https://github.com/NousResearch/hermes-agent/pull/2071))
- API server: persist ResponseStore to SQLite across restarts ([#2472](https://github.com/NousResearch/hermes-agent/pull/2472))
- Fix: `fetch_nous_models` always TypeError from positional args ([#1699](https://github.com/NousResearch/hermes-agent/pull/1699))
- Fix: resolve merge conflict markers in cli.py breaking startup ([#2347](https://github.com/NousResearch/hermes-agent/pull/2347))
- Fix: `minisweagent_path.py` missing from wheel ([#2098](https://github.com/NousResearch/hermes-agent/pull/2098) by @JiwaniZakir)
### Cron System
- **`[SILENT]` response** — cron agents can suppress delivery ([#1833](https://github.com/NousResearch/hermes-agent/pull/1833))
- **Scale missed-job grace window** with schedule frequency ([#2449](https://github.com/NousResearch/hermes-agent/pull/2449))
- **Recover recent one-shot jobs** ([#1918](https://github.com/NousResearch/hermes-agent/pull/1918))
- Fix: normalize `repeat<=0` to None — jobs deleted after first run when LLM passes -1 ([#2612](https://github.com/NousResearch/hermes-agent/pull/2612) by @Mibayy)
- Fix: Matrix added to scheduler delivery platform_map ([#2167](https://github.com/NousResearch/hermes-agent/pull/2167) by @buntingszn)
- Fix: naive ISO timestamps without timezone — jobs fire at wrong time ([#1729](https://github.com/NousResearch/hermes-agent/pull/1729))
- Fix: `get_due_jobs` reads `jobs.json` twice — race condition ([#1716](https://github.com/NousResearch/hermes-agent/pull/1716))
- Fix: silent jobs return empty response for delivery skip ([#2442](https://github.com/NousResearch/hermes-agent/pull/2442))
- Fix: stop injecting cron outputs into gateway session history ([#2313](https://github.com/NousResearch/hermes-agent/pull/2313))
- Fix: close abandoned coroutine when `asyncio.run()` raises RuntimeError ([#2317](https://github.com/NousResearch/hermes-agent/pull/2317))
---
## 🧪 Testing
- Resolve all consistently failing tests ([#2488](https://github.com/NousResearch/hermes-agent/pull/2488))
- Replace `FakePath` with `monkeypatch` for Python 3.12 compat ([#2444](https://github.com/NousResearch/hermes-agent/pull/2444))
- Align Hermes setup and full-suite expectations ([#1710](https://github.com/NousResearch/hermes-agent/pull/1710))
---
## 📚 Documentation
- Comprehensive docs update for recent features ([#1693](https://github.com/NousResearch/hermes-agent/pull/1693), [#2183](https://github.com/NousResearch/hermes-agent/pull/2183))
- Alibaba Cloud and DingTalk setup guides ([#1687](https://github.com/NousResearch/hermes-agent/pull/1687), [#1692](https://github.com/NousResearch/hermes-agent/pull/1692))
- Detailed skills documentation ([#2244](https://github.com/NousResearch/hermes-agent/pull/2244))
- Honcho self-hosted / Docker configuration ([#2475](https://github.com/NousResearch/hermes-agent/pull/2475))
- Context length detection FAQ and quickstart references ([#2179](https://github.com/NousResearch/hermes-agent/pull/2179))
- Fix docs inconsistencies across reference and user guides ([#1995](https://github.com/NousResearch/hermes-agent/pull/1995))
- Fix MCP install commands — use uv, not bare pip ([#1909](https://github.com/NousResearch/hermes-agent/pull/1909))
- Replace ASCII diagrams with Mermaid/lists ([#2402](https://github.com/NousResearch/hermes-agent/pull/2402))
- Gemini OAuth provider implementation plan ([#2467](https://github.com/NousResearch/hermes-agent/pull/2467))
- Discord Server Members Intent marked as required ([#2330](https://github.com/NousResearch/hermes-agent/pull/2330))
- Fix MDX build error in api-server.md ([#1787](https://github.com/NousResearch/hermes-agent/pull/1787))
- Align venv path to match installer ([#2114](https://github.com/NousResearch/hermes-agent/pull/2114))
- New skills added to hub index ([#2281](https://github.com/NousResearch/hermes-agent/pull/2281))
---
## 👥 Contributors
### Core
- **@teknium1** (Teknium) — 280 PRs
### Community Contributors
- **@mchzimm** (to_the_max) — GitHub Copilot provider integration ([#1879](https://github.com/NousResearch/hermes-agent/pull/1879))
- **@jquesnelle** (Jeffrey Quesnelle) — Per-thread persistent event loops fix ([#2214](https://github.com/NousResearch/hermes-agent/pull/2214))
- **@llbn** (lbn) — Telegram MarkdownV2 strikethrough, spoiler, blockquotes, and escape fixes ([#2199](https://github.com/NousResearch/hermes-agent/pull/2199), [#2200](https://github.com/NousResearch/hermes-agent/pull/2200))
- **@dusterbloom** — SQL injection prevention + local server context window querying ([#2061](https://github.com/NousResearch/hermes-agent/pull/2061), [#2091](https://github.com/NousResearch/hermes-agent/pull/2091))
- **@0xbyt4** — Anthropic tool_calls None guard + OpenCode-Go provider config fix ([#2209](https://github.com/NousResearch/hermes-agent/pull/2209), [#2393](https://github.com/NousResearch/hermes-agent/pull/2393))
- **@sai-samarth** (Saisamarth) — WhatsApp send_message routing + systemd node path ([#1769](https://github.com/NousResearch/hermes-agent/pull/1769), [#1767](https://github.com/NousResearch/hermes-agent/pull/1767))
- **@Gutslabs** (Guts) — Block @ references from reading secrets ([#2601](https://github.com/NousResearch/hermes-agent/pull/2601))
- **@Mibayy** (Mibay) — Cron job repeat normalization ([#2612](https://github.com/NousResearch/hermes-agent/pull/2612))
- **@ten-jampa** (Tenzin Jampa) — Gateway /title command fix ([#2379](https://github.com/NousResearch/hermes-agent/pull/2379))
- **@cutepawss** (lila) — File tools search pagination fix ([#1824](https://github.com/NousResearch/hermes-agent/pull/1824))
- **@hanai** (Hanai) — OpenAI TTS base_url support ([#2064](https://github.com/NousResearch/hermes-agent/pull/2064))
- **@rovle** (Lovre Pešut) — Daytona sandbox API migration ([#2063](https://github.com/NousResearch/hermes-agent/pull/2063))
- **@buntingszn** (bunting szn) — Matrix cron delivery support ([#2167](https://github.com/NousResearch/hermes-agent/pull/2167))
- **@InB4DevOps** — Token counter reset on new session ([#2101](https://github.com/NousResearch/hermes-agent/pull/2101))
- **@JiwaniZakir** (Zakir Jiwani) — Missing file in wheel fix ([#2098](https://github.com/NousResearch/hermes-agent/pull/2098))
- **@ygd58** (buray) — Delegate tool parent tool names fix ([#2083](https://github.com/NousResearch/hermes-agent/pull/2083))
---
**Full Changelog**: [v2026.3.17...v2026.3.23](https://github.com/NousResearch/hermes-agent/compare/v2026.3.17...v2026.3.23)
+32 -13
View File
@@ -35,14 +35,12 @@ SUMMARY_PREFIX = (
)
LEGACY_SUMMARY_PREFIX = "[CONTEXT SUMMARY]:"
# Minimum / maximum tokens for the summary output
# Minimum tokens for the summary output
_MIN_SUMMARY_TOKENS = 2000
_MAX_SUMMARY_TOKENS = 8000
# Proportion of compressed content to allocate for summary
_SUMMARY_RATIO = 0.20
# Token budget for tail protection (keep most-recent context)
_DEFAULT_TAIL_TOKEN_BUDGET = 20_000
# Absolute ceiling for summary tokens (even on very large context windows)
_SUMMARY_TOKENS_CEILING = 12_000
# Placeholder used when pruning old tool results
_PRUNED_TOOL_PLACEHOLDER = "[Old tool output cleared to save context space]"
@@ -67,8 +65,8 @@ class ContextCompressor:
model: str,
threshold_percent: float = 0.50,
protect_first_n: int = 3,
protect_last_n: int = 4,
summary_target_tokens: int = 2500,
protect_last_n: int = 20,
summary_target_ratio: float = 0.20,
quiet_mode: bool = False,
summary_model_override: str = None,
base_url: str = "",
@@ -83,7 +81,7 @@ class ContextCompressor:
self.threshold_percent = threshold_percent
self.protect_first_n = protect_first_n
self.protect_last_n = protect_last_n
self.summary_target_tokens = summary_target_tokens
self.summary_target_ratio = max(0.10, min(summary_target_ratio, 0.80))
self.quiet_mode = quiet_mode
self.context_length = get_model_context_length(
@@ -94,12 +92,22 @@ class ContextCompressor:
self.threshold_tokens = int(self.context_length * threshold_percent)
self.compression_count = 0
# Derive token budgets: ratio is relative to the threshold, not total context
target_tokens = int(self.threshold_tokens * self.summary_target_ratio)
self.tail_token_budget = target_tokens
self.max_summary_tokens = min(
int(self.context_length * 0.05), _SUMMARY_TOKENS_CEILING,
)
if not quiet_mode:
logger.info(
"Context compressor initialized: model=%s context_length=%d "
"threshold=%d (%.0f%%) provider=%s base_url=%s",
"threshold=%d (%.0f%%) target_ratio=%.0f%% tail_budget=%d "
"provider=%s base_url=%s",
model, self.context_length, self.threshold_tokens,
threshold_percent * 100, provider or "none", base_url or "none",
threshold_percent * 100, self.summary_target_ratio * 100,
self.tail_token_budget,
provider or "none", base_url or "none",
)
self._context_probed = False # True after a step-down from context error
@@ -179,10 +187,15 @@ class ContextCompressor:
# ------------------------------------------------------------------
def _compute_summary_budget(self, turns_to_summarize: List[Dict[str, Any]]) -> int:
"""Scale summary token budget with the amount of content being compressed."""
"""Scale summary token budget with the amount of content being compressed.
The maximum scales with the model's context window (5% of context,
capped at ``_SUMMARY_TOKENS_CEILING``) so large-context models get
richer summaries instead of being hard-capped at 8K tokens.
"""
content_tokens = estimate_messages_tokens_rough(turns_to_summarize)
budget = int(content_tokens * _SUMMARY_RATIO)
return max(_MIN_SUMMARY_TOKENS, min(budget, _MAX_SUMMARY_TOKENS))
return max(_MIN_SUMMARY_TOKENS, min(budget, self.max_summary_tokens))
def _serialize_for_summary(self, turns: List[Dict[str, Any]]) -> str:
"""Serialize conversation turns into labeled text for the summarizer.
@@ -477,14 +490,20 @@ Write only the summary body. Do not include any preamble or prefix."""
def _find_tail_cut_by_tokens(
self, messages: List[Dict[str, Any]], head_end: int,
token_budget: int = _DEFAULT_TAIL_TOKEN_BUDGET,
token_budget: int | None = None,
) -> int:
"""Walk backward from the end of messages, accumulating tokens until
the budget is reached. Returns the index where the tail starts.
``token_budget`` defaults to ``self.tail_token_budget`` which is
derived from ``summary_target_ratio * context_length``, so it
scales automatically with the model's context window.
Never cuts inside a tool_call/result group. Falls back to the old
``protect_last_n`` if the budget would protect fewer messages.
"""
if token_budget is None:
token_budget = self.tail_token_budget
n = len(messages)
min_tail = self.protect_last_n
accumulated = 0
+9 -23
View File
@@ -657,10 +657,6 @@ def format_context_pressure(
The bar and percentage show progress toward the compaction threshold,
NOT the raw context window. 100% = compaction fires.
Uses ANSI colors:
- cyan at ~60% to compaction = informational
- bold yellow at ~85% to compaction = warning
Args:
compaction_progress: How close to compaction (0.01.0, 1.0 = fires).
threshold_tokens: Compaction threshold in tokens.
@@ -674,18 +670,12 @@ def format_context_pressure(
threshold_k = f"{threshold_tokens // 1000}k" if threshold_tokens >= 1000 else str(threshold_tokens)
threshold_pct_int = int(threshold_percent * 100)
# Tier styling
if compaction_progress >= 0.85:
color = f"{_BOLD}{_YELLOW}"
icon = ""
if compression_enabled:
hint = "compaction imminent"
else:
hint = "no auto-compaction"
color = f"{_BOLD}{_YELLOW}"
icon = ""
if compression_enabled:
hint = "compaction approaching"
else:
color = _CYAN
icon = ""
hint = "approaching compaction"
hint = "no auto-compaction"
return (
f" {color}{icon} context {bar} {pct_int}% to compaction{_ANSI_RESET}"
@@ -709,14 +699,10 @@ def format_context_pressure_gateway(
threshold_pct_int = int(threshold_percent * 100)
if compaction_progress >= 0.85:
icon = "⚠️"
if compression_enabled:
hint = f"Context compaction is imminent (threshold: {threshold_pct_int}% of window)."
else:
hint = "Auto-compaction is disabled — context may be truncated."
icon = "⚠️"
if compression_enabled:
hint = f"Context compaction approaching (threshold: {threshold_pct_int}% of window)."
else:
icon = ""
hint = f"Compaction threshold is at {threshold_pct_int}% of context window."
hint = "Auto-compaction is disabled — context may be truncated."
return f"{icon} Context: {bar} {pct_int}% to compaction\n{hint}"
+18 -3
View File
@@ -232,19 +232,34 @@ browser:
# 1. Tracks actual token usage from API responses (not estimates)
# 2. When prompt_tokens >= threshold% of model's context_length, triggers compression
# 3. Protects first 3 turns (system prompt, initial request, first response)
# 4. Protects last 4 turns (recent context is most relevant)
# 4. Protects last N turns (default 20 messages = ~10 full turns of recent context)
# 5. Summarizes middle turns using a fast/cheap model
# 6. Inserts summary as a user message, continues conversation seamlessly
#
# Post-compression tail budget is target_ratio × threshold × context_length:
# 200K context, threshold 0.50, ratio 0.20 → 20K tokens of recent tail preserved
# 1M context, threshold 0.50, ratio 0.20 → 100K tokens of recent tail preserved
#
compression:
# Enable automatic context compression (default: true)
# Set to false if you prefer to manage context manually or want errors on overflow
enabled: true
# Trigger compression at this % of model's context limit (default: 0.85 = 85%)
# Trigger compression at this % of model's context limit (default: 0.50 = 50%)
# Lower values = more aggressive compression, higher values = compress later
threshold: 0.85
threshold: 0.50
# Fraction of the threshold to preserve as recent tail (default: 0.20 = 20%)
# e.g. 20% of 50% threshold = 10% of total context kept as recent messages.
# Summary output is separately capped at 12K tokens (Gemini output limit).
# Range: 0.10 - 0.80
target_ratio: 0.20
# Number of most-recent messages to always preserve (default: 20 ≈ 10 full turns)
# Higher values keep more recent conversation intact at the cost of more aggressive
# compression of older turns.
protect_last_n: 20
# Model to use for generating summaries (fast/cheap recommended)
# This model compresses the middle turns into a concise summary.
# IMPORTANT: it receives the full middle section of the conversation, so it
+94 -86
View File
@@ -31,7 +31,6 @@ from typing import List, Dict, Any, Optional
logger = logging.getLogger(__name__)
# Suppress startup messages for clean CLI experience
os.environ["MSWEA_SILENT_STARTUP"] = "1" # mini-swe-agent
os.environ["HERMES_QUIET"] = "1" # Our own modules
import yaml
@@ -78,8 +77,6 @@ _hermes_home = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
_project_env = Path(__file__).parent / '.env'
load_hermes_dotenv(hermes_home=_hermes_home, project_env=_project_env)
# Point mini-swe-agent at ~/.hermes/ so it shares our config
os.environ.setdefault("MSWEA_GLOBAL_CONFIG_DIR", str(_hermes_home))
# =============================================================================
# Configuration Loading
@@ -301,7 +298,11 @@ def load_cli_config() -> Dict[str, Any]:
defaults["agent"]["max_turns"] = file_config["max_turns"]
except Exception as e:
logger.warning("Failed to load cli-config.yaml: %s", e)
# Expand ${ENV_VAR} references in config values before bridging to env vars.
from hermes_cli.config import _expand_env_vars
defaults = _expand_env_vars(defaults)
# Apply terminal config to environment variables (so terminal_tool picks them up)
terminal_config = defaults.get("terminal", {})
@@ -1508,10 +1509,14 @@ class HermesCLI:
self._reasoning_buf = getattr(self, "_reasoning_buf", "") + text
# Emit complete lines
# Emit complete lines, and force-flush long partial lines so
# reasoning is visible in real-time even without newlines.
while "\n" in self._reasoning_buf:
line, self._reasoning_buf = self._reasoning_buf.split("\n", 1)
_cprint(f"{_DIM}{line}{_RST}")
if len(self._reasoning_buf) > 80:
_cprint(f"{_DIM}{self._reasoning_buf}{_RST}")
self._reasoning_buf = ""
def _close_reasoning_box(self) -> None:
"""Close the live reasoning box if it's open."""
@@ -1934,6 +1939,7 @@ class HermesCLI:
pass_session_id=self.pass_session_id,
tool_progress_callback=self._on_tool_progress,
stream_delta_callback=self._stream_delta if self.streaming_enabled else None,
tool_gen_callback=self._on_tool_gen_start if self.streaming_enabled else None,
)
# Route agent status output through prompt_toolkit so ANSI escape
# sequences aren't garbled by patch_stdout's StdoutProxy (#2262).
@@ -3557,103 +3563,85 @@ class HermesCLI:
# Use original case so model names like "Anthropic/Claude-Opus-4" are preserved
parts = cmd_original.split(maxsplit=1)
if len(parts) > 1:
from hermes_cli.auth import resolve_provider
from hermes_cli.models import (
parse_model_input,
validate_requested_model,
_PROVIDER_LABELS,
)
from hermes_cli.model_switch import switch_model, switch_to_custom_provider
raw_input = parts[1].strip()
# Parse provider:model syntax (e.g. "openrouter:anthropic/claude-sonnet-4.5")
# Handle bare "/model custom" — switch to custom provider
# and auto-detect the model from the endpoint.
if raw_input.strip().lower() == "custom":
result = switch_to_custom_provider()
if result.success:
self.model = result.model
self.requested_provider = "custom"
self.provider = "custom"
self.api_key = result.api_key
self.base_url = result.base_url
self.agent = None
save_config_value("model.default", result.model)
save_config_value("model.provider", "custom")
save_config_value("model.base_url", result.base_url)
print(f"(^_^)b Model changed to: {result.model} [provider: Custom]")
print(f" Endpoint: {result.base_url}")
print(f" Status: connected (model auto-detected)")
else:
print(f"(>_<) {result.error_message}")
return True
# Core model-switching pipeline (shared with gateway)
current_provider = self.provider or self.requested_provider or "openrouter"
target_provider, new_model = parse_model_input(raw_input, current_provider)
# Auto-detect provider when no explicit provider:model syntax was used.
# Skip auto-detection for custom providers — the model name might
# coincidentally match a known provider's catalog, but the user
# intends to use it on their custom endpoint. Require explicit
# provider:model syntax (e.g. /model openai-codex:gpt-5.2-codex)
# to switch away from a custom endpoint.
_base = self.base_url or ""
is_custom = current_provider == "custom" or (
"localhost" in _base or "127.0.0.1" in _base
result = switch_model(
raw_input,
current_provider,
current_base_url=self.base_url or "",
current_api_key=self.api_key or "",
)
if target_provider == current_provider and not is_custom:
from hermes_cli.models import detect_provider_for_model
detected = detect_provider_for_model(new_model, current_provider)
if detected:
target_provider, new_model = detected
provider_changed = target_provider != current_provider
# If provider is changing, re-resolve credentials for the new provider
api_key_for_probe = self.api_key
base_url_for_probe = self.base_url
if provider_changed:
try:
from hermes_cli.runtime_provider import resolve_runtime_provider
runtime = resolve_runtime_provider(requested=target_provider)
api_key_for_probe = runtime.get("api_key", "")
base_url_for_probe = runtime.get("base_url", "")
except Exception as e:
provider_label = _PROVIDER_LABELS.get(target_provider, target_provider)
if target_provider == "custom":
print(f"(>_<) Custom endpoint not configured. Set OPENAI_BASE_URL and OPENAI_API_KEY,")
print(f" or run: hermes setup → Custom OpenAI-compatible endpoint")
else:
print(f"(>_<) Could not resolve credentials for provider '{provider_label}': {e}")
print(f"(^_^) Current model unchanged: {self.model}")
return True
try:
validation = validate_requested_model(
new_model,
target_provider,
api_key=api_key_for_probe,
base_url=base_url_for_probe,
)
except Exception:
validation = {"accepted": True, "persist": True, "recognized": False, "message": None}
if not validation.get("accepted"):
print(f"(>_<) {validation.get('message')}")
print(f" Model unchanged: {self.model}")
if "Did you mean" not in (validation.get("message") or ""):
print(" Tip: Use /model to see available models, /provider to see providers")
if not result.success:
print(f"(>_<) {result.error_message}")
if "Did you mean" not in result.error_message:
print(f" Model unchanged: {self.model}")
if "credentials" not in result.error_message.lower():
print(" Tip: Use /model to see available models, /provider to see providers")
else:
self.model = new_model
self.model = result.new_model
self.agent = None # Force re-init
if provider_changed:
self.requested_provider = target_provider
self.provider = target_provider
self.api_key = api_key_for_probe
self.base_url = base_url_for_probe
if result.provider_changed:
self.requested_provider = result.target_provider
self.provider = result.target_provider
self.api_key = result.api_key
self.base_url = result.base_url
provider_label = _PROVIDER_LABELS.get(target_provider, target_provider)
provider_note = f" [provider: {provider_label}]" if provider_changed else ""
provider_note = f" [provider: {result.provider_label}]" if result.provider_changed else ""
if validation.get("persist"):
saved_model = save_config_value("model.default", new_model)
if provider_changed:
save_config_value("model.provider", target_provider)
if result.persist:
saved_model = save_config_value("model.default", result.new_model)
if result.provider_changed:
save_config_value("model.provider", result.target_provider)
# Persist base_url for custom endpoints; clear
# when switching away from custom (#2562 Phase 2).
if result.base_url and "openrouter.ai" not in (result.base_url or ""):
save_config_value("model.base_url", result.base_url)
else:
save_config_value("model.base_url", None)
if saved_model:
print(f"(^_^)b Model changed to: {new_model}{provider_note} (saved to config)")
print(f"(^_^)b Model changed to: {result.new_model}{provider_note} (saved to config)")
else:
print(f"(^_^) Model changed to: {new_model}{provider_note} (this session only)")
print(f"(^_^) Model changed to: {result.new_model}{provider_note} (this session only)")
else:
message = validation.get("message") or ""
print(f"(^_^) Model changed to: {new_model}{provider_note} (this session only)")
if message:
print(f" Reason: {message}")
print(f"(^_^) Model changed to: {result.new_model}{provider_note} (this session only)")
if result.warning_message:
print(f" Reason: {result.warning_message}")
print(" Note: Model will revert on restart. Use a verified model to save to config.")
# Helpful hint when staying on a custom endpoint
if is_custom and not provider_changed:
endpoint = self.base_url or "custom endpoint"
# Show endpoint info for custom providers
if result.is_custom_target:
endpoint = result.base_url or self.base_url or "custom endpoint"
print(f" Endpoint: {endpoint}")
print(f" Tip: To switch providers, use /model provider:model")
print(f" e.g. /model openai-codex:gpt-5.2-codex")
if not result.provider_changed:
print(f" Tip: To switch providers, use /model provider:model")
print(f" e.g. /model openai-codex:gpt-5.2-codex")
else:
self._show_model_and_providers()
elif canonical == "provider":
@@ -4457,7 +4445,7 @@ class HermesCLI:
logging.getLogger(noisy).setLevel(logging.WARNING)
else:
logging.getLogger().setLevel(logging.INFO)
for quiet_logger in ('tools', 'minisweagent', 'run_agent', 'trajectory_compressor', 'cron', 'hermes_cli'):
for quiet_logger in ('tools', 'run_agent', 'trajectory_compressor', 'cron', 'hermes_cli'):
logging.getLogger(quiet_logger).setLevel(logging.ERROR)
def _show_insights(self, command: str = "/insights"):
@@ -4629,6 +4617,26 @@ class HermesCLI:
except Exception as e:
print(f" ❌ MCP reload failed: {e}")
# ====================================================================
# Tool-call generation indicator (shown during streaming)
# ====================================================================
def _on_tool_gen_start(self, tool_name: str) -> None:
"""Called when the model begins generating tool-call arguments.
Closes any open streaming boxes (reasoning / response) exactly once,
then prints a short status line so the user sees activity instead of
a frozen screen while a large payload (e.g. 45 KB write_file) streams.
"""
if getattr(self, "_stream_box_opened", False):
self._flush_stream()
self._stream_box_opened = False
self._close_reasoning_box()
from agent.display import get_tool_emoji
emoji = get_tool_emoji(tool_name, default="")
_cprint(f"{emoji} preparing {tool_name}")
# ====================================================================
# Tool progress callback (audio cues for voice mode)
# ====================================================================
+1 -1
View File
@@ -101,7 +101,7 @@ Available methods:
### Patches (`patches.py`)
**Problem**: Some hermes-agent tools use `asyncio.run()` internally (e.g., mini-swe-agent's Modal backend via SWE-ReX). This crashes when called from inside Atropos's event loop because `asyncio.run()` cannot be nested.
**Problem**: Some hermes-agent tools use `asyncio.run()` internally (e.g., the Modal backend via SWE-ReX). This crashes when called from inside Atropos's event loop because `asyncio.run()` cannot be nested.
**Solution**: `patches.py` monkey-patches `SwerexModalEnvironment` to use a dedicated background thread (`_AsyncWorker`) with its own event loop. The calling code sees the same sync interface, but internally the async work happens on a separate thread that doesn't conflict with Atropos's loop.
+1 -1
View File
@@ -23,7 +23,7 @@ from typing import Any, Dict, List, Optional, Set
from model_tools import handle_function_call
# Thread pool for running sync tool calls that internally use asyncio.run()
# (e.g., mini-swe-agent's modal/docker/daytona backends). Running them in a separate
# (e.g., the Modal/Docker/Daytona terminal backends). Running them in a separate
# thread gives them a clean event loop so they don't deadlock inside Atropos's loop.
# Size must be large enough for concurrent eval tasks (e.g., 89 TB2 tasks all
# making tool calls). Too small = thread pool starvation, tasks queue for minutes.
+12 -174
View File
@@ -2,203 +2,41 @@
Monkey patches for making hermes-agent tools work inside async frameworks (Atropos).
Problem:
Some tools use asyncio.run() internally (e.g., mini-swe-agent's Modal backend,
Some tools use asyncio.run() internally (e.g., Modal backend via SWE-ReX,
web_extract). This crashes when called from inside Atropos's event loop because
asyncio.run() can't be nested.
Solution:
Replace the problematic methods with versions that use a dedicated background
thread with its own event loop. The calling code sees the same sync interface --
call a function, get a result -- but internally the async work happens on a
separate thread that doesn't conflict with Atropos's loop.
The Modal environment (tools/environments/modal.py) now uses a dedicated
_AsyncWorker thread internally, making it safe for both CLI and Atropos use.
No monkey-patching is required.
These patches are safe for normal CLI use too: when there's no running event
loop, the behavior is identical (the background thread approach works regardless).
What gets patched:
- SwerexModalEnvironment.__init__ -- creates Modal deployment on a background thread
- SwerexModalEnvironment.execute -- runs commands on the same background thread
- SwerexModalEnvironment.stop -- stops deployment on the background thread
This module is kept for backward compatibility — apply_patches() is now a no-op.
Usage:
Call apply_patches() once at import time (done automatically by hermes_base_env.py).
This is idempotent -- calling it multiple times is safe.
This is idempotent calling it multiple times is safe.
"""
import asyncio
import logging
import threading
from typing import Any
logger = logging.getLogger(__name__)
_patches_applied = False
class _AsyncWorker:
"""
A dedicated background thread with its own event loop.
Allows sync code to submit async coroutines and block for results,
even when called from inside another running event loop. Used to
bridge sync tool interfaces with async backends (Modal, SWE-ReX).
"""
def __init__(self):
self._loop: asyncio.AbstractEventLoop = None
self._thread: threading.Thread = None
self._started = threading.Event()
def start(self):
"""Start the background event loop thread."""
self._thread = threading.Thread(target=self._run_loop, daemon=True)
self._thread.start()
self._started.wait(timeout=30)
def _run_loop(self):
"""Background thread entry point -- runs the event loop forever."""
self._loop = asyncio.new_event_loop()
asyncio.set_event_loop(self._loop)
self._started.set()
self._loop.run_forever()
def run_coroutine(self, coro, timeout=600):
"""
Submit a coroutine to the background loop and block until it completes.
Safe to call from any thread, including threads that already have
a running event loop.
"""
if self._loop is None or self._loop.is_closed():
raise RuntimeError("AsyncWorker loop is not running")
future = asyncio.run_coroutine_threadsafe(coro, self._loop)
return future.result(timeout=timeout)
def stop(self):
"""Stop the background event loop and join the thread."""
if self._loop and self._loop.is_running():
self._loop.call_soon_threadsafe(self._loop.stop)
if self._thread:
self._thread.join(timeout=10)
def _patch_swerex_modal():
"""
Monkey patch SwerexModalEnvironment to use a background thread event loop
instead of asyncio.run(). This makes it safe to call from inside Atropos's
async event loop.
The patched methods have the exact same interface and behavior -- the only
difference is HOW the async work is executed internally.
"""
try:
from minisweagent.environments.extra.swerex_modal import (
SwerexModalEnvironment,
SwerexModalEnvironmentConfig,
)
from swerex.deployment.modal import ModalDeployment
from swerex.runtime.abstract import Command as RexCommand
except ImportError:
# mini-swe-agent or swe-rex not installed -- nothing to patch
logger.debug("mini-swe-agent Modal backend not available, skipping patch")
return
# Save original methods so we can refer to config handling
_original_init = SwerexModalEnvironment.__init__
def _patched_init(self, **kwargs):
"""Patched __init__: creates Modal deployment on a background thread."""
self.config = SwerexModalEnvironmentConfig(**kwargs)
# Start a dedicated event loop thread for all Modal async operations
self._worker = _AsyncWorker()
self._worker.start()
# Pre-build a modal.Image with pip fix for Modal's legacy image builder.
# Modal requires `python -m pip` to work during image build, but some
# task images (e.g., TBLite's broken-python) have intentionally broken pip.
# Fix: remove stale pip dist-info and reinstall via ensurepip before Modal
# tries to use it. This is a no-op for images where pip already works.
import modal as _modal
image_spec = self.config.image
if isinstance(image_spec, str):
image_spec = _modal.Image.from_registry(
image_spec,
setup_dockerfile_commands=[
"RUN rm -rf /usr/local/lib/python*/site-packages/pip* 2>/dev/null; "
"python -m ensurepip --upgrade --default-pip 2>/dev/null || true",
],
)
# Create AND start the deployment entirely on the worker's loop/thread
# so all gRPC channels and async state are bound to that loop
async def _create_and_start():
deployment = ModalDeployment(
image=image_spec,
startup_timeout=self.config.startup_timeout,
runtime_timeout=self.config.runtime_timeout,
deployment_timeout=self.config.deployment_timeout,
install_pipx=self.config.install_pipx,
modal_sandbox_kwargs=self.config.modal_sandbox_kwargs,
)
await deployment.start()
return deployment
self.deployment = self._worker.run_coroutine(_create_and_start())
def _patched_execute(self, command: str, cwd: str = "", *, timeout: int | None = None) -> dict[str, Any]:
"""Patched execute: runs commands on the background thread's loop."""
async def _do_execute():
return await self.deployment.runtime.execute(
RexCommand(
command=command,
shell=True,
check=False,
cwd=cwd or self.config.cwd,
timeout=timeout or self.config.timeout,
merge_output_streams=True,
env=self.config.env if self.config.env else None,
)
)
output = self._worker.run_coroutine(_do_execute())
return {
"output": output.stdout,
"returncode": output.exit_code,
}
def _patched_stop(self):
"""Patched stop: stops deployment on the background thread, then stops the thread."""
try:
self._worker.run_coroutine(
asyncio.wait_for(self.deployment.stop(), timeout=10),
timeout=15,
)
except Exception:
pass
finally:
self._worker.stop()
# Apply the patches
SwerexModalEnvironment.__init__ = _patched_init
SwerexModalEnvironment.execute = _patched_execute
SwerexModalEnvironment.stop = _patched_stop
logger.debug("Patched SwerexModalEnvironment for async-safe operation")
def apply_patches():
"""
Apply all monkey patches needed for Atropos compatibility.
"""Apply all monkey patches needed for Atropos compatibility.
Safe to call multiple times -- patches are only applied once.
Safe for normal CLI use -- patched code works identically when
there is no running event loop.
Now a no-op — Modal async safety is built directly into ModalEnvironment.
Safe to call multiple times.
"""
global _patches_applied
if _patches_applied:
return
_patch_swerex_modal()
# Modal async-safety is now built into tools/environments/modal.py
# via the _AsyncWorker class. No monkey-patching needed.
logger.debug("apply_patches() called — no patches needed (async safety is built-in)")
_patches_applied = True
+7 -2
View File
@@ -523,8 +523,13 @@ def load_gateway_config() -> GatewayConfig:
os.environ["DISCORD_FREE_RESPONSE_CHANNELS"] = str(frc)
if "auto_thread" in discord_cfg and not os.getenv("DISCORD_AUTO_THREAD"):
os.environ["DISCORD_AUTO_THREAD"] = str(discord_cfg["auto_thread"]).lower()
except Exception:
pass
except Exception as e:
logger.warning(
"Failed to process config.yaml — falling back to .env / gateway.json values. "
"Check %s for syntax errors. Error: %s",
_home / "config.yaml",
e,
)
config = GatewayConfig.from_dict(gw_data)
+1 -1
View File
@@ -721,7 +721,7 @@ class BasePlatformAdapter(ABC):
# Extract MEDIA:<path> tags, allowing optional whitespace after the colon
# and quoted/backticked paths for LLM-formatted outputs.
media_pattern = re.compile(
r'''[`"']?MEDIA:\s*(?P<path>`[^`\n]+`|"[^"\n]+"|'[^'\n]+'|\S+)[`"']?'''
r'''[`"']?MEDIA:\s*(?P<path>`[^`\n]+`|"[^"\n]+"|'[^'\n]+'|(?:~/|/)\S+(?:[^\S\n]+\S+)*?\.(?:png|jpe?g|gif|webp|mp4|mov|avi|mkv|webm|ogg|opus|mp3|wav|m4a)(?=[\s`"',;:)\]}]|$)|\S+)[`"']?'''
)
for match in media_pattern.finditer(content):
path = match.group("path").strip()
+127 -88
View File
@@ -93,6 +93,9 @@ if _config_path.exists():
import yaml as _yaml
with open(_config_path, encoding="utf-8") as _f:
_cfg = _yaml.safe_load(_f) or {}
# Expand ${ENV_VAR} references before bridging to env vars.
from hermes_cli.config import _expand_env_vars
_cfg = _expand_env_vars(_cfg)
# Top-level simple values (fallback only — don't override .env)
for _key, _val in _cfg.items():
if isinstance(_val, (str, int, float, bool)) and _key not in os.environ:
@@ -525,6 +528,12 @@ class GatewayRunner:
Synchronous worker meant to be called via run_in_executor from
an async context so it doesn't block the event loop.
"""
# Skip cron sessions — they run headless with no meaningful user
# conversation to extract memories from.
if old_session_id and old_session_id.startswith("cron_"):
logger.debug("Skipping memory flush for cron session: %s", old_session_id)
return
try:
history = self.session_store.load_transcript(old_session_id)
if not history or len(history) < 4:
@@ -557,6 +566,23 @@ class GatewayRunner:
if m.get("role") in ("user", "assistant") and m.get("content")
]
# Read live memory state from disk so the flush agent can see
# what's already saved and avoid overwriting newer entries.
_current_memory = ""
try:
from tools.memory_tool import MEMORY_DIR
for fname, label in [
("MEMORY.md", "MEMORY (your personal notes)"),
("USER.md", "USER PROFILE (who the user is)"),
]:
fpath = MEMORY_DIR / fname
if fpath.exists():
content = fpath.read_text(encoding="utf-8").strip()
if content:
_current_memory += f"\n\n## Current {label}:\n{content}"
except Exception:
pass # Non-fatal — flush still works, just without the guard
# Give the agent a real turn to think about what to save
flush_prompt = (
"[System: This session is about to be automatically reset due to "
@@ -568,6 +594,20 @@ class GatewayRunner:
"2. If you discovered a reusable workflow or solved a non-trivial "
"problem, consider saving it as a skill.\n"
"3. If nothing is worth saving, that's fine — just skip.\n\n"
)
if _current_memory:
flush_prompt += (
"IMPORTANT — here is the current live state of memory. Other "
"sessions, cron jobs, or the user may have updated it since this "
"conversation ended. Do NOT overwrite or remove entries unless "
"the conversation above reveals something that genuinely "
"supersedes them. Only add new information that is not already "
"captured below."
f"{_current_memory}\n\n"
)
flush_prompt += (
"Do NOT respond to the user. Just use the memory and skill_manage "
"tools if needed, then stop.]"
)
@@ -904,7 +944,9 @@ class GatewayRunner:
os.getenv(v)
for v in ("TELEGRAM_ALLOWED_USERS", "DISCORD_ALLOWED_USERS",
"WHATSAPP_ALLOWED_USERS", "SLACK_ALLOWED_USERS",
"SMS_ALLOWED_USERS",
"SIGNAL_ALLOWED_USERS", "EMAIL_ALLOWED_USERS",
"SMS_ALLOWED_USERS", "MATTERMOST_ALLOWED_USERS",
"MATRIX_ALLOWED_USERS", "DINGTALK_ALLOWED_USERS",
"GATEWAY_ALLOWED_USERS")
)
_allow_all = os.getenv("GATEWAY_ALLOW_ALL_USERS", "").lower() in ("true", "1", "yes")
@@ -2809,70 +2851,13 @@ class GatewayRunner:
lines.append("Switch provider: `/model provider-name` or `/model provider:model-name`")
return "\n".join(lines)
# Parse provider:model syntax
target_provider, new_model = parse_model_input(args, current_provider)
# Detect custom/local provider — skip auto-detection to prevent
# silently accepting an OpenRouter model name on a localhost endpoint.
# Users must use explicit provider:model syntax to switch away.
_resolved_base = ""
try:
from hermes_cli.runtime_provider import resolve_runtime_provider as _rtp
_resolved_base = _rtp(requested=current_provider).get("base_url", "")
except Exception:
pass
is_custom = current_provider == "custom" or (
"localhost" in _resolved_base or "127.0.0.1" in _resolved_base
)
# Auto-detect provider when no explicit provider:model syntax was used
if target_provider == current_provider and not is_custom:
from hermes_cli.models import detect_provider_for_model
detected = detect_provider_for_model(new_model, current_provider)
if detected:
target_provider, new_model = detected
provider_changed = target_provider != current_provider
# Resolve credentials for the target provider (for API probe)
api_key = os.getenv("OPENROUTER_API_KEY") or os.getenv("OPENAI_API_KEY") or ""
base_url = "https://openrouter.ai/api/v1"
if provider_changed:
try:
from hermes_cli.runtime_provider import resolve_runtime_provider
runtime = resolve_runtime_provider(requested=target_provider)
api_key = runtime.get("api_key", "")
base_url = runtime.get("base_url", "")
except Exception as e:
provider_label = _PROVIDER_LABELS.get(target_provider, target_provider)
return f"⚠️ Could not resolve credentials for provider '{provider_label}': {e}"
else:
# Use current provider's base_url from config or registry
try:
from hermes_cli.runtime_provider import resolve_runtime_provider
runtime = resolve_runtime_provider(requested=current_provider)
api_key = runtime.get("api_key", "")
base_url = runtime.get("base_url", "")
except Exception:
pass
# Validate the model against the live API
try:
validation = validate_requested_model(
new_model,
target_provider,
api_key=api_key,
base_url=base_url,
)
except Exception:
validation = {"accepted": True, "persist": True, "recognized": False, "message": None}
if not validation.get("accepted"):
msg = validation.get("message", "Invalid model")
tip = "\n\nUse `/model` to see available models, `/provider` to see providers" if "Did you mean" not in msg else ""
return f"⚠️ {msg}{tip}"
# Persist to config only if validation approves
if validation.get("persist"):
# Handle bare "/model custom" — switch to custom provider
# and auto-detect the model from the endpoint.
if args.strip().lower() == "custom":
from hermes_cli.model_switch import switch_to_custom_provider
cust_result = switch_to_custom_provider()
if not cust_result.success:
return f"⚠️ {cust_result.error_message}"
try:
user_config = {}
if config_path.exists():
@@ -2880,45 +2865,99 @@ class GatewayRunner:
user_config = yaml.safe_load(f) or {}
if "model" not in user_config or not isinstance(user_config["model"], dict):
user_config["model"] = {}
user_config["model"]["default"] = new_model
if provider_changed:
user_config["model"]["provider"] = target_provider
user_config["model"]["default"] = cust_result.model
user_config["model"]["provider"] = "custom"
user_config["model"]["base_url"] = cust_result.base_url
with open(config_path, 'w', encoding="utf-8") as f:
yaml.dump(user_config, f, default_flow_style=False, sort_keys=False)
except Exception as e:
return f"⚠️ Failed to save model change: {e}"
os.environ["HERMES_MODEL"] = cust_result.model
os.environ["HERMES_INFERENCE_PROVIDER"] = "custom"
self._effective_model = None
self._effective_provider = None
return (
f"🤖 Model changed to `{cust_result.model}` (saved to config)\n"
f"**Provider:** Custom\n"
f"**Endpoint:** `{cust_result.base_url}`\n"
f"_Model auto-detected from endpoint. Takes effect on next message._"
)
# Core model-switching pipeline (shared with CLI)
from hermes_cli.model_switch import switch_model
# Resolve current base_url for is_custom detection
_resolved_base = ""
try:
from hermes_cli.runtime_provider import resolve_runtime_provider as _rtp
_resolved_base = _rtp(requested=current_provider).get("base_url", "")
except Exception:
pass
result = switch_model(
args,
current_provider,
current_base_url=_resolved_base,
current_api_key=os.getenv("OPENROUTER_API_KEY") or os.getenv("OPENAI_API_KEY") or "",
)
if not result.success:
msg = result.error_message
tip = "\n\nUse `/model` to see available models, `/provider` to see providers" if "Did you mean" not in msg else ""
return f"⚠️ {msg}{tip}"
# Persist to config only if validation approves
if result.persist:
try:
user_config = {}
if config_path.exists():
with open(config_path, encoding="utf-8") as f:
user_config = yaml.safe_load(f) or {}
if "model" not in user_config or not isinstance(user_config["model"], dict):
user_config["model"] = {}
user_config["model"]["default"] = result.new_model
if result.provider_changed:
user_config["model"]["provider"] = result.target_provider
# Persist base_url for custom endpoints; clear when
# switching away from custom (#2562 Phase 2).
if result.base_url and "openrouter.ai" not in (result.base_url or ""):
user_config["model"]["base_url"] = result.base_url
else:
user_config["model"].pop("base_url", None)
with open(config_path, 'w', encoding="utf-8") as f:
yaml.dump(user_config, f, default_flow_style=False, sort_keys=False)
except Exception as e:
return f"⚠️ Failed to save model change: {e}"
# Set env vars so the next agent run picks up the change
os.environ["HERMES_MODEL"] = new_model
if provider_changed:
os.environ["HERMES_INFERENCE_PROVIDER"] = target_provider
os.environ["HERMES_MODEL"] = result.new_model
if result.provider_changed:
os.environ["HERMES_INFERENCE_PROVIDER"] = result.target_provider
provider_label = _PROVIDER_LABELS.get(target_provider, target_provider)
provider_note = f"\n**Provider:** {provider_label}" if provider_changed else ""
provider_note = f"\n**Provider:** {result.provider_label}" if result.provider_changed else ""
warning = ""
if validation.get("message"):
warning = f"\n⚠️ {validation['message']}"
if result.warning_message:
warning = f"\n⚠️ {result.warning_message}"
persist_note = "saved to config" if result.persist else "this session only — will revert on restart"
if validation.get("persist"):
persist_note = "saved to config"
else:
persist_note = "this session only — will revert on restart"
# Clear fallback state since user explicitly chose a model
self._effective_model = None
self._effective_provider = None
# Helpful hint when staying on a custom/local endpoint
# Show endpoint info for custom providers
custom_hint = ""
if is_custom and not provider_changed:
endpoint = _resolved_base or "custom endpoint"
custom_hint = (
f"\n**Endpoint:** `{endpoint}`"
"\n_To switch providers, use_ `/model provider:model`"
"\n_e.g._ `/model openrouter:anthropic/claude-sonnet-4`"
)
if result.is_custom_target:
endpoint = result.base_url or _resolved_base or "custom endpoint"
custom_hint = f"\n**Endpoint:** `{endpoint}`"
if not result.provider_changed:
custom_hint += (
"\n_To switch providers, use_ `/model provider:model`"
"\n_e.g._ `/model openrouter:anthropic/claude-sonnet-4`"
)
return f"🤖 Model changed to `{new_model}` ({persist_note}){provider_note}{warning}{custom_hint}\n_(takes effect on next message)_"
return f"🤖 Model changed to `{result.new_model}` ({persist_note}){provider_note}{warning}{custom_hint}\n_(takes effect on next message)_"
async def _handle_provider_command(self, event: MessageEvent) -> str:
"""Handle /provider command - show available providers."""
+1 -1
View File
@@ -12,4 +12,4 @@ Provides subcommands for:
"""
__version__ = "0.4.0"
__release_date__ = "2026.3.18"
__release_date__ = "2026.3.23"
+3 -1
View File
@@ -690,8 +690,10 @@ def resolve_provider(
}
normalized = _PROVIDER_ALIASES.get(normalized, normalized)
if normalized in {"openrouter", "custom"}:
if normalized == "openrouter":
return "openrouter"
if normalized == "custom":
return "custom"
if normalized in PROVIDER_REGISTRY:
return normalized
if normalized != "auto":
+32 -3
View File
@@ -119,6 +119,10 @@ DEFAULT_CONFIG = {
"backend": "local",
"cwd": ".", # Use current directory
"timeout": 180,
# Environment variables to pass through to sandboxed execution
# (terminal and execute_code). Skill-declared required_environment_variables
# are passed through automatically; this list is for non-skill use cases.
"env_passthrough": [],
"docker_image": "nikolaik/python-nodejs:python3.11-nodejs20",
"docker_forward_env": [],
"singularity_image": "docker://nikolaik/python-nodejs:python3.11-nodejs20",
@@ -145,6 +149,7 @@ DEFAULT_CONFIG = {
"browser": {
"inactivity_timeout": 120,
"command_timeout": 30, # Timeout for browser commands in seconds (screenshot, navigate, etc.)
"record_sessions": False, # Auto-record browser sessions as WebM videos
},
@@ -158,8 +163,10 @@ DEFAULT_CONFIG = {
"compression": {
"enabled": True,
"threshold": 0.50,
"summary_model": "", # empty = use main configured model
"threshold": 0.50, # compress when context usage exceeds this ratio
"target_ratio": 0.20, # fraction of threshold to preserve as recent tail
"protect_last_n": 20, # minimum recent messages to keep uncompressed
"summary_model": "", # empty = use main configured model
"summary_provider": "auto",
"summary_base_url": None,
},
@@ -1172,6 +1179,26 @@ def _deep_merge(base: dict, override: dict) -> dict:
return result
def _expand_env_vars(obj):
"""Recursively expand ``${VAR}`` references in config values.
Only string values are processed; dict keys, numbers, booleans, and
None are left untouched. Unresolved references (variable not in
``os.environ``) are kept verbatim so callers can detect them.
"""
if isinstance(obj, str):
return re.sub(
r"\${([^}]+)}",
lambda m: os.environ.get(m.group(1), m.group(0)),
obj,
)
if isinstance(obj, dict):
return {k: _expand_env_vars(v) for k, v in obj.items()}
if isinstance(obj, list):
return [_expand_env_vars(item) for item in obj]
return obj
def _normalize_max_turns_config(config: Dict[str, Any]) -> Dict[str, Any]:
"""Normalize legacy root-level max_turns into agent.max_turns."""
config = dict(config)
@@ -1213,7 +1240,7 @@ def load_config() -> Dict[str, Any]:
except Exception as e:
print(f"Warning: Failed to load config: {e}")
return _normalize_max_turns_config(config)
return _expand_env_vars(_normalize_max_turns_config(config))
_SECURITY_COMMENT = """
@@ -1660,6 +1687,8 @@ def show_config():
print(f" Enabled: {'yes' if enabled else 'no'}")
if enabled:
print(f" Threshold: {compression.get('threshold', 0.50) * 100:.0f}%")
print(f" Target ratio: {compression.get('target_ratio', 0.20) * 100:.0f}% of threshold preserved")
print(f" Protect last: {compression.get('protect_last_n', 20)} messages")
_sm = compression.get('summary_model', '') or '(main model)'
print(f" Model: {_sm}")
comp_provider = compression.get('summary_provider', 'auto')
-16
View File
@@ -26,10 +26,6 @@ if _env_path.exists():
# Also try project .env as dev fallback
load_dotenv(PROJECT_ROOT / ".env", override=False, encoding="utf-8")
# Point mini-swe-agent at ~/.hermes/ so it shares our config
os.environ.setdefault("MSWEA_GLOBAL_CONFIG_DIR", str(HERMES_HOME))
os.environ.setdefault("MSWEA_SILENT_STARTUP", "1")
from hermes_cli.colors import Colors, color
from hermes_constants import OPENROUTER_MODELS_URL
@@ -618,18 +614,6 @@ def run_doctor(args):
print()
print(color("◆ Submodules", Colors.CYAN, Colors.BOLD))
# mini-swe-agent (terminal tool backend)
mini_swe_dir = PROJECT_ROOT / "mini-swe-agent"
if mini_swe_dir.exists() and (mini_swe_dir / "pyproject.toml").exists():
try:
__import__("minisweagent")
check_ok("mini-swe-agent", "(terminal backend)")
except ImportError:
check_warn("mini-swe-agent found but not installed", "(run: uv pip install -e ./mini-swe-agent)")
issues.append("Install mini-swe-agent: uv pip install -e ./mini-swe-agent")
else:
check_warn("mini-swe-agent not found", "(run: git submodule update --init --recursive)")
# tinker-atropos (RL training backend)
tinker_dir = PROJECT_ROOT / "tinker-atropos"
if tinker_dir.exists() and (tinker_dir / "pyproject.toml").exists():
+33 -8
View File
@@ -371,13 +371,37 @@ def print_systemd_linger_guidance() -> None:
def get_launchd_plist_path() -> Path:
return Path.home() / "Library" / "LaunchAgents" / "ai.hermes.gateway.plist"
def _detect_venv_dir() -> Path | None:
"""Detect the active virtualenv directory.
Checks ``sys.prefix`` first (works regardless of the directory name),
then falls back to probing common directory names under PROJECT_ROOT.
Returns ``None`` when no virtualenv can be found.
"""
# If we're running inside a virtualenv, sys.prefix points to it.
if sys.prefix != sys.base_prefix:
venv = Path(sys.prefix)
if venv.is_dir():
return venv
# Fallback: check common virtualenv directory names under the project root.
for candidate in (".venv", "venv"):
venv = PROJECT_ROOT / candidate
if venv.is_dir():
return venv
return None
def get_python_path() -> str:
if is_windows():
venv_python = PROJECT_ROOT / "venv" / "Scripts" / "python.exe"
else:
venv_python = PROJECT_ROOT / "venv" / "bin" / "python"
if venv_python.exists():
return str(venv_python)
venv = _detect_venv_dir()
if venv is not None:
if is_windows():
venv_python = venv / "Scripts" / "python.exe"
else:
venv_python = venv / "bin" / "python"
if venv_python.exists():
return str(venv_python)
return sys.executable
def get_hermes_cli_path() -> str:
@@ -399,8 +423,9 @@ def get_hermes_cli_path() -> str:
def generate_systemd_unit(system: bool = False, run_as_user: str | None = None) -> str:
python_path = get_python_path()
working_dir = str(PROJECT_ROOT)
venv_dir = str(PROJECT_ROOT / "venv")
venv_bin = str(PROJECT_ROOT / "venv" / "bin")
detected_venv = _detect_venv_dir()
venv_dir = str(detected_venv) if detected_venv else str(PROJECT_ROOT / "venv")
venv_bin = str(detected_venv / "bin") if detected_venv else str(PROJECT_ROOT / "venv" / "bin")
node_bin = str(PROJECT_ROOT / "node_modules" / ".bin")
path_entries = [venv_bin, node_bin]
-3
View File
@@ -60,9 +60,6 @@ from hermes_cli.config import get_hermes_home
from hermes_cli.env_loader import load_hermes_dotenv
load_hermes_dotenv(project_env=PROJECT_ROOT / '.env')
# Point mini-swe-agent at ~/.hermes/ so it shares our config
os.environ.setdefault("MSWEA_GLOBAL_CONFIG_DIR", str(get_hermes_home()))
os.environ.setdefault("MSWEA_SILENT_STARTUP", "1")
import logging
import time as _time
+234
View File
@@ -0,0 +1,234 @@
"""Shared model-switching logic for CLI and gateway /model commands.
Both the CLI (cli.py) and gateway (gateway/run.py) /model handlers
share the same core pipeline:
parse_model_input is_custom detection auto-detect provider
credential resolution validate model return result
This module extracts that shared pipeline into pure functions that
return result objects. The callers handle all platform-specific
concerns: state mutation, config persistence, output formatting.
"""
from __future__ import annotations
import os
from dataclasses import dataclass, field
from typing import Optional
@dataclass
class ModelSwitchResult:
"""Result of a model switch attempt."""
success: bool
new_model: str = ""
target_provider: str = ""
provider_changed: bool = False
api_key: str = ""
base_url: str = ""
persist: bool = False
error_message: str = ""
warning_message: str = ""
is_custom_target: bool = False
provider_label: str = ""
@dataclass
class CustomAutoResult:
"""Result of switching to bare 'custom' provider with auto-detect."""
success: bool
model: str = ""
base_url: str = ""
api_key: str = ""
error_message: str = ""
def switch_model(
raw_input: str,
current_provider: str,
current_base_url: str = "",
current_api_key: str = "",
) -> ModelSwitchResult:
"""Core model-switching pipeline shared between CLI and gateway.
Handles parsing, provider detection, credential resolution, and
model validation. Does NOT handle config persistence, state
mutation, or output formatting those are caller responsibilities.
Args:
raw_input: The user's model input (e.g. "claude-sonnet-4",
"zai:glm-5", "custom:local:qwen").
current_provider: The currently active provider.
current_base_url: The currently active base URL (used for
is_custom detection).
current_api_key: The currently active API key.
Returns:
ModelSwitchResult with all information the caller needs to
apply the switch and format output.
"""
from hermes_cli.models import (
parse_model_input,
detect_provider_for_model,
validate_requested_model,
_PROVIDER_LABELS,
)
from hermes_cli.runtime_provider import resolve_runtime_provider
# Step 1: Parse provider:model syntax
target_provider, new_model = parse_model_input(raw_input, current_provider)
# Step 2: Detect if we're currently on a custom endpoint
_base = current_base_url or ""
is_custom = current_provider == "custom" or (
"localhost" in _base or "127.0.0.1" in _base
)
# Step 3: Auto-detect provider when no explicit provider:model syntax
# was used. Skip for custom providers — the model name might
# coincidentally match a known provider's catalog.
if target_provider == current_provider and not is_custom:
detected = detect_provider_for_model(new_model, current_provider)
if detected:
target_provider, new_model = detected
provider_changed = target_provider != current_provider
# Step 4: Resolve credentials for target provider
api_key = current_api_key
base_url = current_base_url
if provider_changed:
try:
runtime = resolve_runtime_provider(requested=target_provider)
api_key = runtime.get("api_key", "")
base_url = runtime.get("base_url", "")
except Exception as e:
provider_label = _PROVIDER_LABELS.get(target_provider, target_provider)
if target_provider == "custom":
return ModelSwitchResult(
success=False,
target_provider=target_provider,
error_message=(
"No custom endpoint configured. Set model.base_url "
"in config.yaml, or set OPENAI_BASE_URL in .env, "
"or run: hermes setup → Custom OpenAI-compatible endpoint"
),
)
return ModelSwitchResult(
success=False,
target_provider=target_provider,
error_message=(
f"Could not resolve credentials for provider "
f"'{provider_label}': {e}"
),
)
else:
# Gateway also resolves for unchanged provider to get accurate
# base_url for validation probing.
try:
runtime = resolve_runtime_provider(requested=current_provider)
api_key = runtime.get("api_key", "")
base_url = runtime.get("base_url", "")
except Exception:
pass
# Step 5: Validate the model
try:
validation = validate_requested_model(
new_model,
target_provider,
api_key=api_key,
base_url=base_url,
)
except Exception:
validation = {
"accepted": True,
"persist": True,
"recognized": False,
"message": None,
}
if not validation.get("accepted"):
msg = validation.get("message", "Invalid model")
return ModelSwitchResult(
success=False,
new_model=new_model,
target_provider=target_provider,
error_message=msg,
)
# Step 6: Build result
provider_label = _PROVIDER_LABELS.get(target_provider, target_provider)
is_custom_target = target_provider == "custom" or (
base_url
and "openrouter.ai" not in (base_url or "")
and ("localhost" in (base_url or "") or "127.0.0.1" in (base_url or ""))
)
return ModelSwitchResult(
success=True,
new_model=new_model,
target_provider=target_provider,
provider_changed=provider_changed,
api_key=api_key,
base_url=base_url,
persist=bool(validation.get("persist")),
warning_message=validation.get("message") or "",
is_custom_target=is_custom_target,
provider_label=provider_label,
)
def switch_to_custom_provider() -> CustomAutoResult:
"""Handle bare '/model custom' — resolve endpoint and auto-detect model.
Returns a result object; the caller handles persistence and output.
"""
from hermes_cli.runtime_provider import (
resolve_runtime_provider,
_auto_detect_local_model,
)
try:
runtime = resolve_runtime_provider(requested="custom")
except Exception as e:
return CustomAutoResult(
success=False,
error_message=f"Could not resolve custom endpoint: {e}",
)
cust_base = runtime.get("base_url", "")
cust_key = runtime.get("api_key", "")
if not cust_base or "openrouter.ai" in cust_base:
return CustomAutoResult(
success=False,
error_message=(
"No custom endpoint configured. "
"Set model.base_url in config.yaml, or set OPENAI_BASE_URL "
"in .env, or run: hermes setup → Custom OpenAI-compatible endpoint"
),
)
detected_model = _auto_detect_local_model(cust_base)
if not detected_model:
return CustomAutoResult(
success=False,
base_url=cust_base,
api_key=cust_key,
error_message=(
f"Custom endpoint at {cust_base} is reachable but no single "
f"model was auto-detected. Specify the model explicitly: "
f"/model custom:<model-name>"
),
)
return CustomAutoResult(
success=True,
model=detected_model,
base_url=cust_base,
api_key=cust_key,
)
+9
View File
@@ -345,6 +345,15 @@ def parse_model_input(raw: str, current_provider: str) -> tuple[str, str]:
provider_part = stripped[:colon].strip().lower()
model_part = stripped[colon + 1:].strip()
if provider_part and model_part and provider_part in _KNOWN_PROVIDER_NAMES:
# Support custom:name:model triple syntax for named custom
# providers. ``custom:local:qwen`` → ("custom:local", "qwen").
# Single colon ``custom:qwen`` → ("custom", "qwen") as before.
if provider_part == "custom" and ":" in model_part:
second_colon = model_part.find(":")
custom_name = model_part[:second_colon].strip()
actual_model = model_part[second_colon + 1:].strip()
if custom_name and actual_model:
return (f"custom:{custom_name}", actual_model)
return (normalize_provider(provider_part), model_part)
return (current_provider, stripped)
+10 -2
View File
@@ -198,7 +198,7 @@ def _resolve_named_custom_runtime(
api_key = next((candidate for candidate in api_key_candidates if has_usable_secret(candidate)), "")
return {
"provider": "openrouter",
"provider": "custom",
"api_mode": custom_provider.get("api_mode")
or _detect_api_mode_for_url(base_url)
or "chat_completions",
@@ -279,8 +279,16 @@ def _resolve_openrouter_runtime(
source = "explicit" if (explicit_api_key or explicit_base_url) else "env/config"
# When "custom" was explicitly requested, preserve that as the provider
# name instead of silently relabeling to "openrouter" (#2562).
# Also provide a placeholder API key for local servers that don't require
# authentication — the OpenAI SDK requires a non-empty api_key string.
effective_provider = "custom" if requested_norm == "custom" else "openrouter"
if effective_provider == "custom" and not api_key and not _is_openrouter_url:
api_key = "no-key-required"
return {
"provider": "openrouter",
"provider": effective_provider,
"api_mode": _parse_api_mode(model_cfg.get("api_mode"))
or _detect_api_mode_for_url(base_url)
or "chat_completions",
+77 -77
View File
@@ -873,9 +873,9 @@ def setup_model_provider(config: dict):
keep_label = None # No provider configured — don't show "Keep current"
provider_choices = [
"OpenRouter API key (100+ models, pay-per-use)",
"Login with Nous Portal (Nous Research subscription — OAuth)",
"Login with OpenAI Codex",
"OpenRouter API key (100+ models, pay-per-use)",
"Custom OpenAI-compatible endpoint (self-hosted / VLLM / etc.)",
"Z.AI / GLM (Zhipu AI models)",
"Kimi / Moonshot (Kimi coding models)",
@@ -894,7 +894,7 @@ def setup_model_provider(config: dict):
provider_choices.append(keep_label)
# Default to "Keep current" if a provider exists, otherwise OpenRouter (most common)
default_provider = len(provider_choices) - 1 if has_any_provider else 2
default_provider = len(provider_choices) - 1 if has_any_provider else 0
if not has_any_provider:
print_warning("An inference provider is required for Hermes to work.")
@@ -911,81 +911,7 @@ def setup_model_provider(config: dict):
selected_base_url = None # deferred until after model selection
nous_models = [] # populated if Nous login succeeds
if provider_idx == 0: # Nous Portal (OAuth)
selected_provider = "nous"
print()
print_header("Nous Portal Login")
print_info("This will open your browser to authenticate with Nous Portal.")
print_info("You'll need a Nous Research account with an active subscription.")
print()
try:
from hermes_cli.auth import _login_nous, ProviderConfig
import argparse
mock_args = argparse.Namespace(
portal_url=None,
inference_url=None,
client_id=None,
scope=None,
no_browser=False,
timeout=15.0,
ca_bundle=None,
insecure=False,
)
pconfig = PROVIDER_REGISTRY["nous"]
_login_nous(mock_args, pconfig)
_sync_model_from_disk(config)
# Fetch models for the selection step
try:
creds = resolve_nous_runtime_credentials(
min_key_ttl_seconds=5 * 60,
timeout_seconds=15.0,
)
nous_models = fetch_nous_models(
inference_base_url=creds.get("base_url", ""),
api_key=creds.get("api_key", ""),
)
except Exception as e:
logger.debug("Could not fetch Nous models after login: %s", e)
except SystemExit:
print_warning("Nous Portal login was cancelled or failed.")
print_info("You can try again later with: hermes model")
selected_provider = None
except Exception as e:
print_error(f"Login failed: {e}")
print_info("You can try again later with: hermes model")
selected_provider = None
elif provider_idx == 1: # OpenAI Codex
selected_provider = "openai-codex"
print()
print_header("OpenAI Codex Login")
print()
try:
import argparse
mock_args = argparse.Namespace()
_login_openai_codex(mock_args, PROVIDER_REGISTRY["openai-codex"])
# Clear custom endpoint vars that would override provider routing.
if existing_custom:
save_env_value("OPENAI_BASE_URL", "")
save_env_value("OPENAI_API_KEY", "")
_update_config_for_provider("openai-codex", DEFAULT_CODEX_BASE_URL)
_set_model_provider(config, "openai-codex", DEFAULT_CODEX_BASE_URL)
except SystemExit:
print_warning("OpenAI Codex login was cancelled or failed.")
print_info("You can try again later with: hermes model")
selected_provider = None
except Exception as e:
print_error(f"Login failed: {e}")
print_info("You can try again later with: hermes model")
selected_provider = None
elif provider_idx == 2: # OpenRouter
if provider_idx == 0: # OpenRouter
selected_provider = "openrouter"
print()
print_header("OpenRouter API Key")
@@ -1040,6 +966,80 @@ def setup_model_provider(config: dict):
except Exception as e:
logger.debug("Could not save provider to config.yaml: %s", e)
elif provider_idx == 1: # Nous Portal (OAuth)
selected_provider = "nous"
print()
print_header("Nous Portal Login")
print_info("This will open your browser to authenticate with Nous Portal.")
print_info("You'll need a Nous Research account with an active subscription.")
print()
try:
from hermes_cli.auth import _login_nous, ProviderConfig
import argparse
mock_args = argparse.Namespace(
portal_url=None,
inference_url=None,
client_id=None,
scope=None,
no_browser=False,
timeout=15.0,
ca_bundle=None,
insecure=False,
)
pconfig = PROVIDER_REGISTRY["nous"]
_login_nous(mock_args, pconfig)
_sync_model_from_disk(config)
# Fetch models for the selection step
try:
creds = resolve_nous_runtime_credentials(
min_key_ttl_seconds=5 * 60,
timeout_seconds=15.0,
)
nous_models = fetch_nous_models(
inference_base_url=creds.get("base_url", ""),
api_key=creds.get("api_key", ""),
)
except Exception as e:
logger.debug("Could not fetch Nous models after login: %s", e)
except SystemExit:
print_warning("Nous Portal login was cancelled or failed.")
print_info("You can try again later with: hermes model")
selected_provider = None
except Exception as e:
print_error(f"Login failed: {e}")
print_info("You can try again later with: hermes model")
selected_provider = None
elif provider_idx == 2: # OpenAI Codex
selected_provider = "openai-codex"
print()
print_header("OpenAI Codex Login")
print()
try:
import argparse
mock_args = argparse.Namespace()
_login_openai_codex(mock_args, PROVIDER_REGISTRY["openai-codex"])
# Clear custom endpoint vars that would override provider routing.
if existing_custom:
save_env_value("OPENAI_BASE_URL", "")
save_env_value("OPENAI_API_KEY", "")
_update_config_for_provider("openai-codex", DEFAULT_CODEX_BASE_URL)
_set_model_provider(config, "openai-codex", DEFAULT_CODEX_BASE_URL)
except SystemExit:
print_warning("OpenAI Codex login was cancelled or failed.")
print_info("You can try again later with: hermes model")
selected_provider = None
except Exception as e:
print_error(f"Login failed: {e}")
print_info("You can try again later with: hermes model")
selected_provider = None
elif provider_idx == 3: # Custom endpoint
selected_provider = "custom"
print()
+30 -13
View File
@@ -391,18 +391,29 @@ def _get_platform_tools(config: dict, platform: str) -> Set[str]:
default_ts = PLATFORMS[platform]["default_toolset"]
toolset_names = [default_ts]
# Resolve to individual tool names, then map back to which
# configurable toolsets are covered
all_tool_names = set()
for ts_name in toolset_names:
all_tool_names.update(resolve_toolset(ts_name))
configurable_keys = {ts_key for ts_key, _, _ in CONFIGURABLE_TOOLSETS}
# Map individual tool names back to configurable toolset keys
enabled_toolsets = set()
for ts_key, _, _ in CONFIGURABLE_TOOLSETS:
ts_tools = set(resolve_toolset(ts_key))
if ts_tools and ts_tools.issubset(all_tool_names):
enabled_toolsets.add(ts_key)
# If the saved list contains any configurable keys directly, the user
# has explicitly configured this platform — use direct membership.
# This avoids the subset-inference bug where composite toolsets like
# "hermes-cli" (which include all _HERMES_CORE_TOOLS) cause disabled
# toolsets to re-appear as enabled.
has_explicit_config = any(ts in configurable_keys for ts in toolset_names)
if has_explicit_config:
enabled_toolsets = {ts for ts in toolset_names if ts in configurable_keys}
else:
# No explicit config — fall back to resolving composite toolset names
# (e.g. "hermes-cli") to individual tool names and reverse-mapping.
all_tool_names = set()
for ts_name in toolset_names:
all_tool_names.update(resolve_toolset(ts_name))
enabled_toolsets = set()
for ts_key, _, _ in CONFIGURABLE_TOOLSETS:
ts_tools = set(resolve_toolset(ts_key))
if ts_tools and ts_tools.issubset(all_tool_names):
enabled_toolsets.add(ts_key)
# Plugin toolsets: enabled by default unless explicitly disabled.
# A plugin toolset is "known" for a platform once `hermes tools`
@@ -437,15 +448,21 @@ def _save_platform_tools(config: dict, platform: str, enabled_toolset_keys: Set[
plugin_keys = _get_plugin_toolset_keys()
configurable_keys |= plugin_keys
# Also exclude platform default toolsets (hermes-cli, hermes-telegram, etc.)
# These are "super" toolsets that resolve to ALL tools, so preserving them
# would silently override the user's unchecked selections on the next read.
platform_default_keys = {p["default_toolset"] for p in PLATFORMS.values()}
# Get existing toolsets for this platform
existing_toolsets = config.get("platform_toolsets", {}).get(platform, [])
if not isinstance(existing_toolsets, list):
existing_toolsets = []
# Preserve any entries that are NOT configurable toolsets (i.e. MCP server names)
# Preserve any entries that are NOT configurable toolsets and NOT platform
# defaults (i.e. only MCP server names should be preserved)
preserved_entries = {
entry for entry in existing_toolsets
if entry not in configurable_keys
if entry not in configurable_keys and entry not in platform_default_keys
}
# Merge preserved entries with new enabled toolsets
Submodule mini-swe-agent deleted from 07aa6a7385
+14 -18
View File
@@ -1,13 +1,13 @@
#!/usr/bin/env python3
"""
Mini-SWE-Agent Runner with Hermes Trajectory Format
SWE Runner with Hermes Trajectory Format
This module provides a runner that uses mini-swe-agent's execution environments
(local, docker, modal) but outputs trajectories in the Hermes-Agent format
A runner that uses Hermes-Agent's built-in execution environments
(local, docker, modal) and outputs trajectories in the Hermes-Agent format
compatible with batch_runner.py and trajectory_compressor.py.
Features:
- Uses mini-swe-agent's Docker, Modal, or Local environments for command execution
- Uses Hermes-Agent's Docker, Modal, or Local environments for command execution
- Outputs trajectories in Hermes format (from/value pairs with <tool_call>/<tool_response> XML)
- Compatible with the trajectory compression pipeline
- Supports batch processing from JSONL prompt files
@@ -42,11 +42,7 @@ from dotenv import load_dotenv
# Load environment variables
load_dotenv()
# Add mini-swe-agent to path if not installed. In git worktrees the populated
# submodule may live in the main checkout rather than the worktree itself.
from minisweagent_path import ensure_minisweagent_on_path
ensure_minisweagent_on_path(Path(__file__).resolve().parent)
# ============================================================================
@@ -110,7 +106,7 @@ def create_environment(
**kwargs
):
"""
Create an execution environment from mini-swe-agent.
Create an execution environment using Hermes-Agent's built-in backends.
Args:
env_type: One of "local", "docker", "modal"
@@ -120,19 +116,19 @@ def create_environment(
**kwargs: Additional environment-specific options
Returns:
Environment instance with execute() method
Environment instance with execute() and cleanup() methods
"""
if env_type == "local":
from minisweagent.environments.local import LocalEnvironment
from tools.environments.local import LocalEnvironment
return LocalEnvironment(cwd=cwd, timeout=timeout)
elif env_type == "docker":
from minisweagent.environments.docker import DockerEnvironment
from tools.environments.docker import DockerEnvironment
return DockerEnvironment(image=image, cwd=cwd, timeout=timeout, **kwargs)
elif env_type == "modal":
from minisweagent.environments.extra.swerex_modal import SwerexModalEnvironment
return SwerexModalEnvironment(image=image, cwd=cwd, timeout=timeout, **kwargs)
from tools.environments.modal import ModalEnvironment
return ModalEnvironment(image=image, cwd=cwd, timeout=timeout, **kwargs)
else:
raise ValueError(f"Unknown environment type: {env_type}. Use 'local', 'docker', or 'modal'")
@@ -144,8 +140,8 @@ def create_environment(
class MiniSWERunner:
"""
Agent runner that uses mini-swe-agent environments but outputs
trajectories in Hermes-Agent format.
Agent runner that uses Hermes-Agent's built-in execution environments
and outputs trajectories in Hermes-Agent format.
"""
def __init__(
@@ -618,7 +614,7 @@ Complete the user's task step by step."""
def main(
task: str = None,
prompts_file: str = None,
output_file: str = "mini-swe-agent-test1.jsonl",
output_file: str = "swe-runner-test1.jsonl",
model: str = "claude-sonnet-4-20250514",
base_url: str = None,
api_key: str = None,
@@ -630,7 +626,7 @@ def main(
verbose: bool = False,
):
"""
Run mini-swe-agent tasks with Hermes trajectory format output.
Run SWE tasks with Hermes trajectory format output.
Args:
task: Single task to run (use this OR prompts_file)
-92
View File
@@ -1,92 +0,0 @@
"""Helpers for locating the mini-swe-agent source tree.
Hermes often runs from git worktrees. In that layout the worktree root may have
an empty ``mini-swe-agent/`` placeholder while the real populated submodule
lives under the main checkout that owns the shared ``.git`` directory.
These helpers locate a usable ``mini-swe-agent/src`` directory and optionally
prepend it to ``sys.path`` so imports like ``import minisweagent`` work from
both normal checkouts and worktrees.
"""
from __future__ import annotations
import importlib.util
import sys
from pathlib import Path
from typing import Optional
def _read_gitdir(repo_root: Path) -> Optional[Path]:
"""Resolve the gitdir referenced by ``repo_root/.git`` when it is a file."""
git_marker = repo_root / ".git"
if not git_marker.is_file():
return None
try:
raw = git_marker.read_text(encoding="utf-8").strip()
except OSError:
return None
prefix = "gitdir:"
if not raw.lower().startswith(prefix):
return None
target = raw[len(prefix):].strip()
gitdir = Path(target)
if not gitdir.is_absolute():
gitdir = (repo_root / gitdir).resolve()
else:
gitdir = gitdir.resolve()
return gitdir
def discover_minisweagent_src(repo_root: Optional[Path] = None) -> Optional[Path]:
"""Return the best available ``mini-swe-agent/src`` path, if any.
Search order:
1. Current checkout/worktree root
2. Main checkout that owns the shared ``.git`` directory (for worktrees)
"""
repo_root = (repo_root or Path(__file__).resolve().parent).resolve()
candidates: list[Path] = [repo_root / "mini-swe-agent" / "src"]
gitdir = _read_gitdir(repo_root)
if gitdir is not None:
# Worktree layout: <main>/.git/worktrees/<name>
if len(gitdir.parents) >= 3 and gitdir.parent.name == "worktrees":
candidates.append(gitdir.parents[2] / "mini-swe-agent" / "src")
# Direct checkout with .git file pointing elsewhere
elif gitdir.name == ".git":
candidates.append(gitdir.parent / "mini-swe-agent" / "src")
seen = set()
for candidate in candidates:
candidate = candidate.resolve()
if candidate in seen:
continue
seen.add(candidate)
if candidate.exists() and candidate.is_dir():
return candidate
return None
def ensure_minisweagent_on_path(repo_root: Optional[Path] = None) -> Optional[Path]:
"""Ensure ``minisweagent`` is importable by prepending its src dir to sys.path.
Returns the inserted/discovered path, or ``None`` if the package is already
importable or no local source tree could be found.
"""
if importlib.util.find_spec("minisweagent") is not None:
return None
src = discover_minisweagent_src(repo_root)
if src is None:
return None
src_str = str(src)
if src_str not in sys.path:
sys.path.insert(0, src_str)
return src
+41 -45
View File
@@ -11,64 +11,60 @@ requires-python = ">=3.11"
authors = [{ name = "Nous Research" }]
license = { text = "MIT" }
dependencies = [
# Core
"openai",
"anthropic>=0.39.0",
"python-dotenv",
"fire",
"httpx",
"rich",
"tenacity",
"pyyaml",
"requests",
"jinja2",
"pydantic>=2.0",
# Core — pinned to known-good ranges to limit supply chain attack surface
"openai>=2.21.0,<3",
"anthropic>=0.39.0,<1",
"python-dotenv>=1.2.1,<2",
"fire>=0.7.1,<1",
"httpx>=0.28.1,<1",
"rich>=14.3.3,<15",
"tenacity>=9.1.4,<10",
"pyyaml>=6.0.2,<7",
"requests>=2.32.3,<3",
"jinja2>=3.1.5,<4",
"pydantic>=2.12.5,<3",
# Interactive CLI (prompt_toolkit is used directly by cli.py)
"prompt_toolkit",
"prompt_toolkit>=3.0.52,<4",
# Tools
"firecrawl-py",
"parallel-web>=0.4.2",
"fal-client",
"firecrawl-py>=4.16.0,<5",
"parallel-web>=0.4.2,<1",
"fal-client>=0.13.1,<1",
# Text-to-speech (Edge TTS is free, no API key needed)
"edge-tts",
"faster-whisper>=1.0.0",
# mini-swe-agent deps (terminal tool)
"litellm>=1.75.5",
"typer",
"platformdirs",
"edge-tts>=7.2.7,<8",
"faster-whisper>=1.0.0,<2",
# Skills Hub (GitHub App JWT auth — optional, only needed for bot identity)
"PyJWT[crypto]",
"PyJWT[crypto]>=2.10.1,<3",
]
[project.optional-dependencies]
modal = ["swe-rex[modal]>=1.4.0"]
daytona = ["daytona>=0.148.0"]
dev = ["pytest", "pytest-asyncio", "pytest-xdist", "mcp>=1.2.0"]
messaging = ["python-telegram-bot>=20.0", "discord.py[voice]>=2.0", "aiohttp>=3.9.0", "slack-bolt>=1.18.0", "slack-sdk>=3.27.0"]
cron = ["croniter"]
slack = ["slack-bolt>=1.18.0", "slack-sdk>=3.27.0"]
matrix = ["matrix-nio[e2e]>=0.24.0"]
cli = ["simple-term-menu"]
tts-premium = ["elevenlabs"]
voice = ["sounddevice>=0.4.6", "numpy>=1.24.0"]
modal = ["swe-rex[modal]>=1.4.0,<2"]
daytona = ["daytona>=0.148.0,<1"]
dev = ["pytest>=9.0.2,<10", "pytest-asyncio>=1.3.0,<2", "pytest-xdist>=3.0,<4", "mcp>=1.2.0,<2"]
messaging = ["python-telegram-bot>=22.6,<23", "discord.py[voice]>=2.7.1,<3", "aiohttp>=3.13.3,<4", "slack-bolt>=1.18.0,<2", "slack-sdk>=3.27.0,<4"]
cron = ["croniter>=6.0.0,<7"]
slack = ["slack-bolt>=1.18.0,<2", "slack-sdk>=3.27.0,<4"]
matrix = ["matrix-nio[e2e]>=0.24.0,<1"]
cli = ["simple-term-menu>=1.0,<2"]
tts-premium = ["elevenlabs>=1.0,<2"]
voice = ["sounddevice>=0.4.6,<1", "numpy>=1.24.0,<3"]
pty = [
"ptyprocess>=0.7.0; sys_platform != 'win32'",
"pywinpty>=2.0.0; sys_platform == 'win32'",
"ptyprocess>=0.7.0,<1; sys_platform != 'win32'",
"pywinpty>=2.0.0,<3; sys_platform == 'win32'",
]
honcho = ["honcho-ai>=2.0.1"]
mcp = ["mcp>=1.2.0"]
homeassistant = ["aiohttp>=3.9.0"]
sms = ["aiohttp>=3.9.0"]
honcho = ["honcho-ai>=2.0.1,<3"]
mcp = ["mcp>=1.2.0,<2"]
homeassistant = ["aiohttp>=3.9.0,<4"]
sms = ["aiohttp>=3.9.0,<4"]
acp = ["agent-client-protocol>=0.8.1,<1.0"]
dingtalk = ["dingtalk-stream>=0.1.0"]
dingtalk = ["dingtalk-stream>=0.1.0,<1"]
rl = [
"atroposlib @ git+https://github.com/NousResearch/atropos.git",
"tinker @ git+https://github.com/thinking-machines-lab/tinker.git",
"fastapi>=0.104.0",
"uvicorn[standard]>=0.24.0",
"wandb>=0.15.0",
"fastapi>=0.104.0,<1",
"uvicorn[standard]>=0.24.0,<1",
"wandb>=0.15.0,<1",
]
yc-bench = ["yc-bench @ git+https://github.com/collinear-ai/yc-bench.git"]
yc-bench = ["yc-bench @ git+https://github.com/collinear-ai/yc-bench.git ; python_version >= '3.12'"]
all = [
"hermes-agent[modal]",
"hermes-agent[daytona]",
@@ -94,7 +90,7 @@ hermes-agent = "run_agent:main"
hermes-acp = "acp_adapter.entry:main"
[tool.setuptools]
py-modules = ["run_agent", "model_tools", "toolsets", "batch_runner", "trajectory_compressor", "toolset_distributions", "cli", "hermes_constants", "hermes_state", "hermes_time", "mini_swe_runner", "minisweagent_path", "rl_cli", "utils"]
py-modules = ["run_agent", "model_tools", "toolsets", "batch_runner", "trajectory_compressor", "toolset_distributions", "cli", "hermes_constants", "hermes_state", "hermes_time", "rl_cli", "utils"]
[tool.setuptools.packages.find]
include = ["agent", "tools", "tools.*", "hermes_cli", "gateway", "gateway.*", "cron", "honcho_integration", "acp_adapter"]
-6
View File
@@ -23,12 +23,6 @@ parallel-web>=0.4.2
# Image generation
fal-client
# mini-swe-agent dependencies (for terminal tool)
# Note: Install mini-swe-agent itself with: pip install -e ./mini-swe-agent
litellm>=1.75.5
typer
platformdirs
# Text-to-speech (Edge TTS is free, no API key needed)
edge-tts
+147 -41
View File
@@ -58,9 +58,6 @@ if _loaded_env_paths:
else:
logger.info("No .env file found. Using system environment variables.")
# Point mini-swe-agent at ~/.hermes/ so it shares our config
os.environ.setdefault("MSWEA_GLOBAL_CONFIG_DIR", str(_hermes_home))
os.environ.setdefault("MSWEA_SILENT_STARTUP", "1")
# Import our tool system
from model_tools import get_tool_definitions, handle_function_call, check_toolset_requirements
@@ -405,6 +402,7 @@ class AIAgent:
clarify_callback: callable = None,
step_callback: callable = None,
stream_delta_callback: callable = None,
tool_gen_callback: callable = None,
status_callback: callable = None,
max_tokens: int = None,
reasoning_config: Dict[str, Any] = None,
@@ -534,6 +532,7 @@ class AIAgent:
self.step_callback = step_callback
self.stream_delta_callback = stream_delta_callback
self.status_callback = status_callback
self.tool_gen_callback = tool_gen_callback
self._last_reported_tool = None # Track for "new tool" mode
# Tool execution state — allows _vprint during tool execution
@@ -586,8 +585,7 @@ class AIAgent:
# Context pressure warnings: notify the USER (not the LLM) as context
# fills up. Purely informational — displayed in CLI output and sent via
# status_callback for gateway platforms. Does NOT inject into messages.
self._context_50_warned = False
self._context_70_warned = False
self._context_pressure_warned = False
# Persistent error log -- always writes WARNING+ to ~/.hermes/logs/errors.log
# so tool failures, API errors, etc. are inspectable after the fact.
@@ -659,7 +657,7 @@ class AIAgent:
# INFO/WARNING messages just clutter it.
for quiet_logger in [
'tools', # all tools.* (terminal, browser, web, file, etc.)
'minisweagent', # mini-swe-agent execution backend
'run_agent', # agent runner internals
'trajectory_compressor',
'cron', # scheduler (only relevant in daemon mode)
@@ -1014,6 +1012,8 @@ class AIAgent:
compression_threshold = float(_compression_cfg.get("threshold", 0.50))
compression_enabled = str(_compression_cfg.get("enabled", True)).lower() in ("true", "1", "yes")
compression_summary_model = _compression_cfg.get("summary_model") or None
compression_target_ratio = float(_compression_cfg.get("target_ratio", 0.20))
compression_protect_last = int(_compression_cfg.get("protect_last_n", 20))
# Read explicit context_length override from model config
_model_cfg = _agent_cfg.get("model", {})
@@ -1052,8 +1052,8 @@ class AIAgent:
model=self.model,
threshold_percent=compression_threshold,
protect_first_n=3,
protect_last_n=4,
summary_target_tokens=500,
protect_last_n=compression_protect_last,
summary_target_ratio=compression_target_ratio,
summary_model_override=compression_summary_model,
quiet_mode=self.quiet_mode,
base_url=self.base_url,
@@ -2363,7 +2363,13 @@ class AIAgent:
prompt_parts.append(skills_prompt)
if not self.skip_context_files:
context_files_prompt = build_context_files_prompt(skip_soul=_soul_loaded)
# Use TERMINAL_CWD for context file discovery when set (gateway
# mode). The gateway process runs from the hermes-agent install
# dir, so os.getcwd() would pick up the repo's AGENTS.md and
# other dev files — inflating token usage by ~10k for no benefit.
_context_cwd = os.getenv("TERMINAL_CWD") or None
context_files_prompt = build_context_files_prompt(
cwd=_context_cwd, skip_soul=_soul_loaded)
if context_files_prompt:
prompt_parts.append(context_files_prompt)
@@ -3513,6 +3519,21 @@ class AIAgent:
except Exception:
pass
def _fire_tool_gen_started(self, tool_name: str) -> None:
"""Notify display layer that the model is generating tool call arguments.
Fires once per tool name when the streaming response begins producing
tool_call / tool_use tokens. Gives the TUI a chance to show a spinner
or status line so the user isn't staring at a frozen screen while a
large tool payload (e.g. a 45 KB write_file) is being generated.
"""
cb = self.tool_gen_callback
if cb is not None:
try:
cb(tool_name)
except Exception:
pass
def _has_stream_consumers(self) -> bool:
"""Return True if any streaming consumer is registered."""
return (
@@ -3564,7 +3585,20 @@ class AIAgent:
def _call_chat_completions():
"""Stream a chat completions response."""
stream_kwargs = {**api_kwargs, "stream": True, "stream_options": {"include_usage": True}}
import httpx as _httpx
_base_timeout = float(os.getenv("HERMES_API_TIMEOUT", 900.0))
_stream_read_timeout = float(os.getenv("HERMES_STREAM_READ_TIMEOUT", 60.0))
stream_kwargs = {
**api_kwargs,
"stream": True,
"stream_options": {"include_usage": True},
"timeout": _httpx.Timeout(
connect=30.0,
read=_stream_read_timeout,
write=_base_timeout,
pool=30.0,
),
}
request_client_holder["client"] = self._create_request_openai_client(
reason="chat_completion_stream_request"
)
@@ -3572,6 +3606,7 @@ class AIAgent:
content_parts: list = []
tool_calls_acc: dict = {}
tool_gen_notified: set = set()
finish_reason = None
model_name = None
role = "assistant"
@@ -3598,6 +3633,7 @@ class AIAgent:
reasoning_text = getattr(delta, "reasoning_content", None) or getattr(delta, "reasoning", None)
if reasoning_text:
reasoning_parts.append(reasoning_text)
_fire_first_delta()
self._fire_reasoning_delta(reasoning_text)
# Accumulate text content — fire callback only when no tool calls
@@ -3608,7 +3644,7 @@ class AIAgent:
self._fire_stream_delta(delta.content)
deltas_were_sent["yes"] = True
# Accumulate tool call deltas (silently, no callback)
# Accumulate tool call deltas — notify display on first name
if delta and delta.tool_calls:
for tc_delta in delta.tool_calls:
idx = tc_delta.index if tc_delta.index is not None else 0
@@ -3626,6 +3662,12 @@ class AIAgent:
entry["function"]["name"] += tc_delta.function.name
if tc_delta.function.arguments:
entry["function"]["arguments"] += tc_delta.function.arguments
# Fire once per tool when the full name is available
name = entry["function"]["name"]
if name and idx not in tool_gen_notified:
tool_gen_notified.add(idx)
_fire_first_delta()
self._fire_tool_gen_started(name)
if chunk.choices[0].finish_reason:
finish_reason = chunk.choices[0].finish_reason
@@ -3691,6 +3733,10 @@ class AIAgent:
block = getattr(event, "content_block", None)
if block and getattr(block, "type", None) == "tool_use":
has_tool_use = True
tool_name = getattr(block, "name", None)
if tool_name:
_fire_first_delta()
self._fire_tool_gen_started(tool_name)
elif event_type == "content_block_delta":
delta = getattr(event, "delta", None)
@@ -3704,35 +3750,91 @@ class AIAgent:
elif delta_type == "thinking_delta":
thinking_text = getattr(delta, "thinking", "")
if thinking_text:
_fire_first_delta()
self._fire_reasoning_delta(thinking_text)
# Return the native Anthropic Message for downstream processing
return stream.get_final_message()
def _call():
import httpx as _httpx
_max_stream_retries = int(os.getenv("HERMES_STREAM_RETRIES", 2))
try:
if self.api_mode == "anthropic_messages":
self._try_refresh_anthropic_client_credentials()
result["response"] = _call_anthropic()
else:
result["response"] = _call_chat_completions()
except Exception as e:
if deltas_were_sent["yes"]:
# Streaming failed AFTER some tokens were already delivered
# to consumers. Don't fall back — that would cause
# double-delivery (partial streamed + full non-streamed).
# Let the error propagate; the partial content already
# reached the user via the stream.
logger.warning("Streaming failed after partial delivery, not falling back: %s", e)
result["error"] = e
else:
# Streaming failed before any tokens reached consumers.
# Safe to fall back to the standard non-streaming path.
logger.info("Streaming failed before delivery, falling back to non-streaming: %s", e)
for _stream_attempt in range(_max_stream_retries + 1):
try:
result["response"] = self._interruptible_api_call(api_kwargs)
except Exception as fallback_err:
result["error"] = fallback_err
if self.api_mode == "anthropic_messages":
self._try_refresh_anthropic_client_credentials()
result["response"] = _call_anthropic()
else:
result["response"] = _call_chat_completions()
return # success
except Exception as e:
if deltas_were_sent["yes"]:
# Streaming failed AFTER some tokens were already
# delivered. Don't retry or fall back — partial
# content already reached the user.
logger.warning(
"Streaming failed after partial delivery, not retrying: %s", e
)
result["error"] = e
return
_is_timeout = isinstance(
e, (_httpx.ReadTimeout, _httpx.ConnectTimeout, _httpx.PoolTimeout)
)
_is_conn_err = isinstance(
e, (_httpx.ConnectError, _httpx.RemoteProtocolError, ConnectionError)
)
if _is_timeout or _is_conn_err:
# Transient network / timeout error. Retry the
# streaming request with a fresh connection rather
# than falling back to non-streaming (which would
# hang for up to 15 min on the same dead server).
if _stream_attempt < _max_stream_retries:
logger.info(
"Streaming attempt %s/%s failed (%s: %s), "
"retrying with fresh connection...",
_stream_attempt + 1,
_max_stream_retries + 1,
type(e).__name__,
e,
)
# Close the stale request client before retry
stale = request_client_holder.get("client")
if stale is not None:
self._close_request_openai_client(
stale, reason="stream_retry_cleanup"
)
request_client_holder["client"] = None
continue
# Exhausted retries — propagate to outer loop
logger.warning(
"Streaming exhausted %s retries on transient error: %s",
_max_stream_retries + 1,
e,
)
result["error"] = e
return
# Non-transient error (e.g. "streaming not supported",
# auth error, 4xx). Fall back to non-streaming once.
err_msg = str(e).lower()
if "stream" in err_msg and "not supported" in err_msg:
logger.info(
"Streaming not supported, falling back to non-streaming: %s", e
)
try:
result["response"] = self._interruptible_api_call(api_kwargs)
except Exception as fallback_err:
result["error"] = fallback_err
return
# Unknown error — propagate to outer retry loop
result["error"] = e
return
finally:
request_client = request_client_holder.get("client")
if request_client is not None:
@@ -4584,9 +4686,17 @@ class AIAgent:
except Exception as e:
logger.debug("Session DB compression split failed: %s", e)
# Reset context pressure warnings — usage drops after compaction
self._context_50_warned = False
self._context_70_warned = False
# Reset context pressure warning and token estimate — usage drops
# after compaction. Without this, the stale last_prompt_tokens from
# the previous API call causes the pressure calculation to stay at
# >1000% and spam warnings / re-trigger compression in a loop.
self._context_pressure_warned = False
_compressed_est = (
estimate_tokens_rough(new_system_prompt)
+ estimate_messages_tokens_rough(compressed)
)
self.context_compressor.last_prompt_tokens = _compressed_est
self.context_compressor.last_completion_tokens = 0
return compressed, new_system_prompt
@@ -6819,12 +6929,8 @@ class AIAgent:
# and fires status_callback for gateway platforms.
if _compressor.threshold_tokens > 0:
_compaction_progress = _estimated_next_prompt / _compressor.threshold_tokens
if _compaction_progress >= 0.85 and not self._context_70_warned:
self._context_70_warned = True
self._context_50_warned = True # skip first tier if we jumped past it
self._emit_context_pressure(_compaction_progress, _compressor)
elif _compaction_progress >= 0.60 and not self._context_50_warned:
self._context_50_warned = True
if _compaction_progress >= 0.85 and not self._context_pressure_warned:
self._context_pressure_warned = True
self._emit_context_pressure(_compaction_progress, _compressor)
if self.compression_enabled and _compressor.should_compress(_estimated_next_prompt):
+2 -14
View File
@@ -505,7 +505,7 @@ function Install-Repository {
git -c windows.appendAtomically=false config windows.appendAtomically false 2>$null
# Ensure submodules are initialized and updated
Write-Info "Initializing submodules (mini-swe-agent, tinker-atropos)..."
Write-Info "Initializing submodules..."
git -c windows.appendAtomically=false submodule update --init --recursive 2>$null
if ($LASTEXITCODE -ne 0) {
Write-Warn "Submodule init failed (terminal/RL tools may need manual setup)"
@@ -559,19 +559,7 @@ function Install-Dependencies {
Write-Success "Main package installed"
# Install submodules
Write-Info "Installing mini-swe-agent (terminal tool backend)..."
if (Test-Path "mini-swe-agent\pyproject.toml") {
try {
& $UvCmd pip install -e ".\mini-swe-agent" 2>&1 | Out-Null
Write-Success "mini-swe-agent installed"
} catch {
Write-Warn "mini-swe-agent install failed (terminal tools may not work)"
}
} else {
Write-Warn "mini-swe-agent not found (run: git submodule update --init)"
}
# Install optional submodules
Write-Info "Installing tinker-atropos (RL training backend)..."
if (Test-Path "tinker-atropos\pyproject.toml") {
try {
-16
View File
@@ -637,13 +637,6 @@ clone_repo() {
cd "$INSTALL_DIR"
# Only init mini-swe-agent (terminal tool backend — required).
# tinker-atropos (RL training) is optional and heavy — users can opt in later
# with: git submodule update --init tinker-atropos && uv pip install -e ./tinker-atropos
log_info "Initializing mini-swe-agent submodule (terminal backend)..."
git submodule update --init mini-swe-agent
log_success "Submodule ready"
log_success "Repository ready"
}
@@ -718,15 +711,6 @@ install_deps() {
log_success "Main package installed"
# Install submodules
log_info "Installing mini-swe-agent (terminal tool backend)..."
if [ -d "mini-swe-agent" ] && [ -f "mini-swe-agent/pyproject.toml" ]; then
$UV_CMD pip install -e "./mini-swe-agent" || log_warn "mini-swe-agent install failed (terminal tools may not work)"
log_success "mini-swe-agent installed"
else
log_warn "mini-swe-agent not found (run: git submodule update --init)"
fi
# tinker-atropos (RL training) is optional — skip by default.
# To enable RL tools: git submodule update --init tinker-atropos && uv pip install -e "./tinker-atropos"
if [ -d "tinker-atropos" ] && [ -f "tinker-atropos/pyproject.toml" ]; then
+15 -13
View File
@@ -116,24 +116,26 @@ export VIRTUAL_ENV="$SCRIPT_DIR/venv"
echo -e "${CYAN}${NC} Installing dependencies..."
$UV_CMD pip install -e ".[all]" || $UV_CMD pip install -e "."
echo -e "${GREEN}${NC} Dependencies installed"
# Prefer uv sync with lockfile (hash-verified installs) when available,
# fall back to pip install for compatibility or when lockfile is stale.
if [ -f "uv.lock" ]; then
echo -e "${CYAN}${NC} Using uv.lock for hash-verified installation..."
UV_PROJECT_ENVIRONMENT="$SCRIPT_DIR/venv" $UV_CMD sync --all-extras --locked 2>/dev/null && \
echo -e "${GREEN}${NC} Dependencies installed (lockfile verified)" || {
echo -e "${YELLOW}${NC} Lockfile install failed (may be outdated), falling back to pip install..."
$UV_CMD pip install -e ".[all]" || $UV_CMD pip install -e "."
echo -e "${GREEN}${NC} Dependencies installed"
}
else
$UV_CMD pip install -e ".[all]" || $UV_CMD pip install -e "."
echo -e "${GREEN}${NC} Dependencies installed"
fi
# ============================================================================
# Submodules (terminal backend + RL training)
# ============================================================================
echo -e "${CYAN}${NC} Installing submodules..."
# mini-swe-agent (terminal tool backend)
if [ -d "mini-swe-agent" ] && [ -f "mini-swe-agent/pyproject.toml" ]; then
$UV_CMD pip install -e "./mini-swe-agent" && \
echo -e "${GREEN}${NC} mini-swe-agent installed" || \
echo -e "${YELLOW}${NC} mini-swe-agent install failed (terminal tools may not work)"
else
echo -e "${YELLOW}${NC} mini-swe-agent not found (run: git submodule update --init --recursive)"
fi
echo -e "${CYAN}${NC} Installing optional submodules..."
# tinker-atropos (RL training backend)
if [ -d "tinker-atropos" ] && [ -f "tinker-atropos/pyproject.toml" ]; then
+50 -1
View File
@@ -217,7 +217,7 @@ class TestCompressWithClient:
mock_client.chat.completions.create.return_value = mock_response
with patch("agent.context_compressor.get_model_context_length", return_value=100000):
c = ContextCompressor(model="test", quiet_mode=True)
c = ContextCompressor(model="test", quiet_mode=True, protect_first_n=2, protect_last_n=2)
msgs = [{"role": "user" if i % 2 == 0 else "assistant", "content": f"msg {i}"} for i in range(10)]
with patch("agent.context_compressor.call_llm", return_value=mock_response):
@@ -513,3 +513,52 @@ class TestCompressWithClient:
for msg in result:
if msg.get("role") == "tool" and msg.get("tool_call_id"):
assert msg["tool_call_id"] in called_ids
class TestSummaryTargetRatio:
"""Verify that summary_target_ratio properly scales budgets with context window."""
def test_tail_budget_scales_with_context(self):
"""Tail token budget should be threshold_tokens * summary_target_ratio."""
with patch("agent.context_compressor.get_model_context_length", return_value=200_000):
c = ContextCompressor(model="test", quiet_mode=True, summary_target_ratio=0.40)
# 200K * 0.50 threshold * 0.40 ratio = 40K
assert c.tail_token_budget == 40_000
with patch("agent.context_compressor.get_model_context_length", return_value=1_000_000):
c = ContextCompressor(model="test", quiet_mode=True, summary_target_ratio=0.40)
# 1M * 0.50 threshold * 0.40 ratio = 200K
assert c.tail_token_budget == 200_000
def test_summary_cap_scales_with_context(self):
"""Max summary tokens should be 5% of context, capped at 12K."""
with patch("agent.context_compressor.get_model_context_length", return_value=200_000):
c = ContextCompressor(model="test", quiet_mode=True)
assert c.max_summary_tokens == 10_000 # 200K * 0.05
with patch("agent.context_compressor.get_model_context_length", return_value=1_000_000):
c = ContextCompressor(model="test", quiet_mode=True)
assert c.max_summary_tokens == 12_000 # capped at 12K ceiling
def test_ratio_clamped(self):
"""Ratio should be clamped to [0.10, 0.80]."""
with patch("agent.context_compressor.get_model_context_length", return_value=100_000):
c = ContextCompressor(model="test", quiet_mode=True, summary_target_ratio=0.05)
assert c.summary_target_ratio == 0.10
with patch("agent.context_compressor.get_model_context_length", return_value=100_000):
c = ContextCompressor(model="test", quiet_mode=True, summary_target_ratio=0.95)
assert c.summary_target_ratio == 0.80
def test_default_threshold_is_50_percent(self):
"""Default compression threshold should be 50%."""
with patch("agent.context_compressor.get_model_context_length", return_value=100_000):
c = ContextCompressor(model="test", quiet_mode=True)
assert c.threshold_percent == 0.50
assert c.threshold_tokens == 50_000
def test_default_protect_last_n_is_20(self):
"""Default protect_last_n should be 20."""
with patch("agent.context_compressor.get_model_context_length", return_value=100_000):
c = ContextCompressor(model="test", quiet_mode=True)
assert c.protect_last_n == 20
@@ -0,0 +1,167 @@
"""Tests for memory flush stale-overwrite prevention (#2670).
Verifies that:
1. Cron sessions are skipped (no flush for headless cron runs)
2. Current memory state is injected into the flush prompt so the
flush agent can see what's already saved and avoid overwrites
3. The flush still works normally when memory files don't exist
"""
import pytest
from pathlib import Path
from unittest.mock import MagicMock, patch, call
def _make_runner():
from gateway.run import GatewayRunner
runner = object.__new__(GatewayRunner)
runner._honcho_managers = {}
runner._honcho_configs = {}
runner._running_agents = {}
runner._pending_messages = {}
runner._pending_approvals = {}
runner.adapters = {}
runner.hooks = MagicMock()
runner.session_store = MagicMock()
return runner
_TRANSCRIPT_4_MSGS = [
{"role": "user", "content": "hello"},
{"role": "assistant", "content": "hi there"},
{"role": "user", "content": "remember my name is Alice"},
{"role": "assistant", "content": "Got it, Alice!"},
]
class TestCronSessionBypass:
"""Cron sessions should never trigger a memory flush."""
def test_cron_session_skipped(self):
runner = _make_runner()
runner._flush_memories_for_session("cron_job123_20260323_120000")
# session_store.load_transcript should never be called
runner.session_store.load_transcript.assert_not_called()
def test_cron_session_with_honcho_key_skipped(self):
runner = _make_runner()
runner._flush_memories_for_session("cron_daily_20260323", "some-honcho-key")
runner.session_store.load_transcript.assert_not_called()
def test_non_cron_session_proceeds(self):
"""Non-cron sessions should still attempt the flush."""
runner = _make_runner()
runner.session_store.load_transcript.return_value = []
runner._flush_memories_for_session("session_abc123")
runner.session_store.load_transcript.assert_called_once_with("session_abc123")
class TestMemoryInjection:
"""The flush prompt should include current memory state from disk."""
def test_memory_content_injected_into_flush_prompt(self, tmp_path):
"""When memory files exist, their content appears in the flush prompt."""
runner = _make_runner()
runner.session_store.load_transcript.return_value = _TRANSCRIPT_4_MSGS
tmp_agent = MagicMock()
memory_dir = tmp_path / "memories"
memory_dir.mkdir()
(memory_dir / "MEMORY.md").write_text("Agent knows Python\n§\nUser prefers dark mode")
(memory_dir / "USER.md").write_text("Name: Alice\n§\nTimezone: PST")
with (
patch("gateway.run._resolve_runtime_agent_kwargs", return_value={"api_key": "k"}),
patch("gateway.run._resolve_gateway_model", return_value="test-model"),
patch("run_agent.AIAgent", return_value=tmp_agent),
# Intercept `from tools.memory_tool import MEMORY_DIR` inside the function
patch.dict("sys.modules", {"tools.memory_tool": MagicMock(MEMORY_DIR=memory_dir)}),
):
runner._flush_memories_for_session("session_123")
tmp_agent.run_conversation.assert_called_once()
call_kwargs = tmp_agent.run_conversation.call_args.kwargs
flush_prompt = call_kwargs.get("user_message", "")
# Verify both memory sections appear in the prompt
assert "Agent knows Python" in flush_prompt
assert "User prefers dark mode" in flush_prompt
assert "Name: Alice" in flush_prompt
assert "Timezone: PST" in flush_prompt
# Verify the stale-overwrite warning is present
assert "Do NOT overwrite or remove entries" in flush_prompt
assert "current live state of memory" in flush_prompt
def test_flush_works_without_memory_files(self, tmp_path):
"""When no memory files exist, flush still runs without the guard."""
runner = _make_runner()
runner.session_store.load_transcript.return_value = _TRANSCRIPT_4_MSGS
tmp_agent = MagicMock()
empty_dir = tmp_path / "no_memories"
empty_dir.mkdir()
with (
patch("gateway.run._resolve_runtime_agent_kwargs", return_value={"api_key": "k"}),
patch("gateway.run._resolve_gateway_model", return_value="test-model"),
patch("run_agent.AIAgent", return_value=tmp_agent),
patch.dict("sys.modules", {"tools.memory_tool": MagicMock(MEMORY_DIR=empty_dir)}),
):
runner._flush_memories_for_session("session_456")
# Should still run, just without the memory guard section
tmp_agent.run_conversation.assert_called_once()
flush_prompt = tmp_agent.run_conversation.call_args.kwargs.get("user_message", "")
assert "Do NOT overwrite or remove entries" not in flush_prompt
assert "Review the conversation above" in flush_prompt
def test_empty_memory_files_no_injection(self, tmp_path):
"""Empty memory files should not trigger the guard section."""
runner = _make_runner()
runner.session_store.load_transcript.return_value = _TRANSCRIPT_4_MSGS
tmp_agent = MagicMock()
memory_dir = tmp_path / "memories"
memory_dir.mkdir()
(memory_dir / "MEMORY.md").write_text("")
(memory_dir / "USER.md").write_text(" \n ") # whitespace only
with (
patch("gateway.run._resolve_runtime_agent_kwargs", return_value={"api_key": "k"}),
patch("gateway.run._resolve_gateway_model", return_value="test-model"),
patch("run_agent.AIAgent", return_value=tmp_agent),
patch.dict("sys.modules", {"tools.memory_tool": MagicMock(MEMORY_DIR=memory_dir)}),
):
runner._flush_memories_for_session("session_789")
tmp_agent.run_conversation.assert_called_once()
flush_prompt = tmp_agent.run_conversation.call_args.kwargs.get("user_message", "")
# No memory content → no guard section
assert "current live state of memory" not in flush_prompt
class TestFlushPromptStructure:
"""Verify the flush prompt retains its core instructions."""
def test_core_instructions_present(self):
"""The flush prompt should still contain the original guidance."""
runner = _make_runner()
runner.session_store.load_transcript.return_value = _TRANSCRIPT_4_MSGS
tmp_agent = MagicMock()
with (
patch("gateway.run._resolve_runtime_agent_kwargs", return_value={"api_key": "k"}),
patch("gateway.run._resolve_gateway_model", return_value="test-model"),
patch("run_agent.AIAgent", return_value=tmp_agent),
# Make the import fail gracefully so we test without memory files
patch.dict("sys.modules", {"tools.memory_tool": MagicMock(MEMORY_DIR=Path("/nonexistent"))}),
):
runner._flush_memories_for_session("session_struct")
flush_prompt = tmp_agent.run_conversation.call_args.kwargs.get("user_message", "")
assert "automatically reset" in flush_prompt
assert "Save any important facts" in flush_prompt
assert "consider saving it as a skill" in flush_prompt
assert "Do NOT respond to the user" in flush_prompt
+72
View File
@@ -282,6 +282,78 @@ class TestGatewaySystemServiceRouting:
assert run_calls == []
class TestDetectVenvDir:
"""Tests for _detect_venv_dir() virtualenv detection."""
def test_detects_active_virtualenv_via_sys_prefix(self, tmp_path, monkeypatch):
venv_path = tmp_path / "my-custom-venv"
venv_path.mkdir()
monkeypatch.setattr("sys.prefix", str(venv_path))
monkeypatch.setattr("sys.base_prefix", "/usr")
result = gateway_cli._detect_venv_dir()
assert result == venv_path
def test_falls_back_to_dot_venv_directory(self, tmp_path, monkeypatch):
# Not inside a virtualenv
monkeypatch.setattr("sys.prefix", "/usr")
monkeypatch.setattr("sys.base_prefix", "/usr")
monkeypatch.setattr(gateway_cli, "PROJECT_ROOT", tmp_path)
dot_venv = tmp_path / ".venv"
dot_venv.mkdir()
result = gateway_cli._detect_venv_dir()
assert result == dot_venv
def test_falls_back_to_venv_directory(self, tmp_path, monkeypatch):
monkeypatch.setattr("sys.prefix", "/usr")
monkeypatch.setattr("sys.base_prefix", "/usr")
monkeypatch.setattr(gateway_cli, "PROJECT_ROOT", tmp_path)
venv = tmp_path / "venv"
venv.mkdir()
result = gateway_cli._detect_venv_dir()
assert result == venv
def test_prefers_dot_venv_over_venv(self, tmp_path, monkeypatch):
monkeypatch.setattr("sys.prefix", "/usr")
monkeypatch.setattr("sys.base_prefix", "/usr")
monkeypatch.setattr(gateway_cli, "PROJECT_ROOT", tmp_path)
(tmp_path / ".venv").mkdir()
(tmp_path / "venv").mkdir()
result = gateway_cli._detect_venv_dir()
assert result == tmp_path / ".venv"
def test_returns_none_when_no_virtualenv(self, tmp_path, monkeypatch):
monkeypatch.setattr("sys.prefix", "/usr")
monkeypatch.setattr("sys.base_prefix", "/usr")
monkeypatch.setattr(gateway_cli, "PROJECT_ROOT", tmp_path)
result = gateway_cli._detect_venv_dir()
assert result is None
class TestGeneratedUnitUsesDetectedVenv:
def test_systemd_unit_uses_dot_venv_when_detected(self, tmp_path, monkeypatch):
dot_venv = tmp_path / ".venv"
dot_venv.mkdir()
(dot_venv / "bin").mkdir()
monkeypatch.setattr(gateway_cli, "_detect_venv_dir", lambda: dot_venv)
monkeypatch.setattr(gateway_cli, "get_python_path", lambda: str(dot_venv / "bin" / "python"))
unit = gateway_cli.generate_systemd_unit(system=False)
assert f"VIRTUAL_ENV={dot_venv}" in unit
assert f"{dot_venv}/bin" in unit
# Must NOT contain a hardcoded /venv/ path
assert "/venv/" not in unit or "/.venv/" in unit
class TestEnsureUserSystemdEnv:
"""Tests for _ensure_user_systemd_env() D-Bus session bus auto-detection."""
+25
View File
@@ -92,6 +92,31 @@ class TestParseModelInput:
assert provider == "openrouter"
assert model == "http://localhost:8080/model"
def test_custom_colon_model_single(self):
"""custom:model-name → anonymous custom provider."""
provider, model = parse_model_input("custom:qwen-2.5", "openrouter")
assert provider == "custom"
assert model == "qwen-2.5"
def test_custom_triple_syntax(self):
"""custom:name:model → named custom provider."""
provider, model = parse_model_input("custom:local-server:qwen-2.5", "openrouter")
assert provider == "custom:local-server"
assert model == "qwen-2.5"
def test_custom_triple_spaces(self):
"""Triple syntax should handle whitespace."""
provider, model = parse_model_input("custom: my-server : my-model ", "openrouter")
assert provider == "custom:my-server"
assert model == "my-model"
def test_custom_triple_empty_model_falls_back(self):
"""custom:name: with no model → treated as custom:name (bare)."""
provider, model = parse_model_input("custom:name:", "openrouter")
# Empty model after second colon → no triple match, falls through
assert provider == "custom"
assert model == "name:"
# -- curated_models_for_provider ---------------------------------------------
+2 -2
View File
@@ -34,7 +34,7 @@ def test_nous_oauth_setup_keeps_current_model_when_syncing_disk_provider(
def fake_prompt_choice(question, choices, default=0):
if question == "Select your inference provider:":
return 0
return 1 # Nous Portal
if question == "Configure vision:":
return len(choices) - 1
if question == "Select default model:":
@@ -135,7 +135,7 @@ def test_codex_setup_uses_runtime_access_token_for_live_model_list(tmp_path, mon
def fake_prompt_choice(question, choices, default=0):
if question == "Select your inference provider:":
return 1
return 2 # OpenAI Codex
if question == "Select default model:":
return 0
tts_idx = _maybe_keep_current_tts(question, choices)
@@ -401,7 +401,7 @@ def test_setup_switch_custom_to_codex_clears_custom_endpoint_and_updates_config(
def fake_prompt_choice(question, choices, default=0):
if question == "Select your inference provider:":
return 1
return 2 # OpenAI Codex
if question == "Select default model:":
return 0
tts_idx = _maybe_keep_current_tts(question, choices)
+104
View File
@@ -100,3 +100,107 @@ def test_save_platform_tools_handles_invalid_existing_config():
saved_toolsets = config["platform_toolsets"]["cli"]
assert "web" in saved_toolsets
def test_save_platform_tools_does_not_preserve_platform_default_toolsets():
"""Platform default toolsets (hermes-cli, hermes-telegram, etc.) must NOT
be preserved across saves.
These "super" toolsets resolve to ALL tools, so if they survive in the
config, they silently override any tools the user unchecked. Previously,
the preserve filter only excluded configurable toolset keys (web, browser,
terminal, etc.) and treated platform defaults as unknown custom entries
(like MCP server names), causing them to be kept unconditionally.
Regression test: user unchecks image_gen and homeassistant via
``hermes tools``, but hermes-cli stays in the config and re-enables
everything on the next read.
"""
config = {
"platform_toolsets": {
"cli": [
"browser", "clarify", "code_execution", "cronjob",
"delegation", "file", "hermes-cli", # <-- the culprit
"memory", "session_search", "skills", "terminal",
"todo", "tts", "vision", "web",
]
}
}
# User unchecks image_gen, homeassistant, moa — keeps the rest
new_selection = {
"browser", "clarify", "code_execution", "cronjob",
"delegation", "file", "memory", "session_search",
"skills", "terminal", "todo", "tts", "vision", "web",
}
with patch("hermes_cli.tools_config.save_config"):
_save_platform_tools(config, "cli", new_selection)
saved = config["platform_toolsets"]["cli"]
# hermes-cli must NOT survive — it's a platform default, not an MCP server
assert "hermes-cli" not in saved
# The individual toolset keys the user selected must be present
assert "web" in saved
assert "terminal" in saved
assert "browser" in saved
# Tools the user unchecked must NOT be present
assert "image_gen" not in saved
assert "homeassistant" not in saved
assert "moa" not in saved
def test_save_platform_tools_does_not_preserve_hermes_telegram():
"""Same bug for Telegram — hermes-telegram must not be preserved."""
config = {
"platform_toolsets": {
"telegram": [
"browser", "file", "hermes-telegram", "terminal", "web",
]
}
}
new_selection = {"browser", "file", "terminal", "web"}
with patch("hermes_cli.tools_config.save_config"):
_save_platform_tools(config, "telegram", new_selection)
saved = config["platform_toolsets"]["telegram"]
assert "hermes-telegram" not in saved
assert "web" in saved
def test_save_platform_tools_still_preserves_mcp_with_platform_default_present():
"""MCP server names must still be preserved even when platform defaults
are being stripped out."""
config = {
"platform_toolsets": {
"cli": [
"web", "terminal", "hermes-cli", "my-mcp-server", "github-tools",
]
}
}
new_selection = {"web", "browser"}
with patch("hermes_cli.tools_config.save_config"):
_save_platform_tools(config, "cli", new_selection)
saved = config["platform_toolsets"]["cli"]
# MCP servers preserved
assert "my-mcp-server" in saved
assert "github-tools" in saved
# Platform default stripped
assert "hermes-cli" not in saved
# User selections present
assert "web" in saved
assert "browser" in saved
# Deselected configurable toolset removed
assert "terminal" not in saved
-1
View File
@@ -41,7 +41,6 @@ except ImportError:
# Add project root to path for imports
parent_dir = Path(__file__).parent.parent.parent
sys.path.insert(0, str(parent_dir))
sys.path.insert(0, str(parent_dir / "mini-swe-agent" / "src"))
# Import terminal_tool module directly using importlib to avoid tools/__init__.py
import importlib.util
+7 -2
View File
@@ -111,8 +111,13 @@ class TestModelCommand:
assert cli_obj.model == "glm-5"
assert cli_obj.provider == "zai"
assert cli_obj.base_url == "https://api.z.ai/api/paas/v4"
# Both model and provider should be saved
assert save_mock.call_count == 2
# Model, provider, and base_url should be saved
assert save_mock.call_count == 3
save_calls = [c.args for c in save_mock.call_args_list]
assert ("model.default", "glm-5") in save_calls
assert ("model.provider", "zai") in save_calls
# base_url is also persisted on provider change (Phase 2 fix)
assert any(c[0] == "model.base_url" for c in save_calls)
def test_provider_switch_fails_on_bad_credentials(self, capsys):
cli_obj = self._make_cli()
+132
View File
@@ -0,0 +1,132 @@
"""Tests for ${ENV_VAR} substitution in config.yaml values."""
import os
import pytest
from hermes_cli.config import _expand_env_vars, load_config
from unittest.mock import patch as mock_patch
class TestExpandEnvVars:
def test_simple_substitution(self):
with pytest.MonkeyPatch().context() as mp:
mp.setenv("MY_KEY", "secret123")
assert _expand_env_vars("${MY_KEY}") == "secret123"
def test_missing_var_kept_verbatim(self):
with pytest.MonkeyPatch().context() as mp:
mp.delenv("UNDEFINED_VAR_XYZ", raising=False)
assert _expand_env_vars("${UNDEFINED_VAR_XYZ}") == "${UNDEFINED_VAR_XYZ}"
def test_no_placeholder_unchanged(self):
assert _expand_env_vars("plain-value") == "plain-value"
def test_dict_recursive(self):
with pytest.MonkeyPatch().context() as mp:
mp.setenv("TOKEN", "tok-abc")
result = _expand_env_vars({"key": "${TOKEN}", "other": "literal"})
assert result == {"key": "tok-abc", "other": "literal"}
def test_nested_dict(self):
with pytest.MonkeyPatch().context() as mp:
mp.setenv("API_KEY", "sk-xyz")
result = _expand_env_vars({"model": {"api_key": "${API_KEY}"}})
assert result["model"]["api_key"] == "sk-xyz"
def test_list_items(self):
with pytest.MonkeyPatch().context() as mp:
mp.setenv("VAL", "hello")
result = _expand_env_vars(["${VAL}", "literal", 42])
assert result == ["hello", "literal", 42]
def test_non_string_values_untouched(self):
assert _expand_env_vars(42) == 42
assert _expand_env_vars(3.14) == 3.14
assert _expand_env_vars(True) is True
assert _expand_env_vars(None) is None
def test_multiple_placeholders_in_one_string(self):
with pytest.MonkeyPatch().context() as mp:
mp.setenv("HOST", "localhost")
mp.setenv("PORT", "5432")
assert _expand_env_vars("${HOST}:${PORT}") == "localhost:5432"
def test_dict_keys_not_expanded(self):
with pytest.MonkeyPatch().context() as mp:
mp.setenv("KEY", "value")
result = _expand_env_vars({"${KEY}": "no-expand-key"})
assert "${KEY}" in result
class TestLoadConfigExpansion:
def test_load_config_expands_env_vars(self, tmp_path, monkeypatch):
config_yaml = (
"model:\n"
" api_key: ${GOOGLE_API_KEY}\n"
"platforms:\n"
" telegram:\n"
" token: ${TELEGRAM_BOT_TOKEN}\n"
"plain: no-substitution\n"
)
config_file = tmp_path / "config.yaml"
config_file.write_text(config_yaml)
monkeypatch.setenv("GOOGLE_API_KEY", "gsk-test-key")
monkeypatch.setenv("TELEGRAM_BOT_TOKEN", "1234567:ABC-token")
monkeypatch.setattr("hermes_cli.config.get_config_path", lambda: config_file)
config = load_config()
assert config["model"]["api_key"] == "gsk-test-key"
assert config["platforms"]["telegram"]["token"] == "1234567:ABC-token"
assert config["plain"] == "no-substitution"
def test_load_config_unresolved_kept_verbatim(self, tmp_path, monkeypatch):
config_yaml = "model:\n api_key: ${NOT_SET_XYZ_123}\n"
config_file = tmp_path / "config.yaml"
config_file.write_text(config_yaml)
monkeypatch.delenv("NOT_SET_XYZ_123", raising=False)
monkeypatch.setattr("hermes_cli.config.get_config_path", lambda: config_file)
config = load_config()
assert config["model"]["api_key"] == "${NOT_SET_XYZ_123}"
class TestLoadCliConfigExpansion:
"""Verify that load_cli_config() also expands ${VAR} references."""
def test_cli_config_expands_auxiliary_api_key(self, tmp_path, monkeypatch):
config_yaml = (
"auxiliary:\n"
" vision:\n"
" api_key: ${TEST_VISION_KEY_XYZ}\n"
)
config_file = tmp_path / "config.yaml"
config_file.write_text(config_yaml)
monkeypatch.setenv("TEST_VISION_KEY_XYZ", "vis-key-123")
# Patch the hermes home so load_cli_config finds our test config
monkeypatch.setattr("cli._hermes_home", tmp_path)
from cli import load_cli_config
config = load_cli_config()
assert config["auxiliary"]["vision"]["api_key"] == "vis-key-123"
def test_cli_config_unresolved_kept_verbatim(self, tmp_path, monkeypatch):
config_yaml = (
"auxiliary:\n"
" vision:\n"
" api_key: ${UNSET_CLI_VAR_ABC}\n"
)
config_file = tmp_path / "config.yaml"
config_file.write_text(config_yaml)
monkeypatch.delenv("UNSET_CLI_VAR_ABC", raising=False)
monkeypatch.setattr("cli._hermes_home", tmp_path)
from cli import load_cli_config
config = load_cli_config()
assert config["auxiliary"]["vision"]["api_key"] == "${UNSET_CLI_VAR_ABC}"
+34 -44
View File
@@ -29,40 +29,36 @@ class TestFormatContextPressure:
raw context window. 60% = 60% of the way to compaction.
"""
def test_60_percent_uses_info_icon(self):
line = format_context_pressure(0.60, 100_000, 0.50)
assert "" in line
assert "60% to compaction" in line
def test_85_percent_uses_warning_icon(self):
line = format_context_pressure(0.85, 100_000, 0.50)
def test_80_percent_uses_warning_icon(self):
line = format_context_pressure(0.80, 100_000, 0.50)
assert "" in line
assert "85% to compaction" in line
assert "80% to compaction" in line
def test_90_percent_uses_warning_icon(self):
line = format_context_pressure(0.90, 100_000, 0.50)
assert "" in line
assert "90% to compaction" in line
def test_bar_length_scales_with_progress(self):
line_60 = format_context_pressure(0.60, 100_000, 0.50)
line_85 = format_context_pressure(0.85, 100_000, 0.50)
assert line_85.count("") > line_60.count("")
line_80 = format_context_pressure(0.80, 100_000, 0.50)
line_95 = format_context_pressure(0.95, 100_000, 0.50)
assert line_95.count("") > line_80.count("")
def test_shows_threshold_tokens(self):
line = format_context_pressure(0.60, 100_000, 0.50)
line = format_context_pressure(0.80, 100_000, 0.50)
assert "100k" in line
def test_small_threshold(self):
line = format_context_pressure(0.60, 500, 0.50)
line = format_context_pressure(0.80, 500, 0.50)
assert "500" in line
def test_shows_threshold_percent(self):
line = format_context_pressure(0.85, 100_000, 0.50)
assert "50%" in line # threshold percent shown
line = format_context_pressure(0.80, 100_000, 0.50)
assert "50%" in line
def test_imminent_hint_at_85(self):
line = format_context_pressure(0.85, 100_000, 0.50)
assert "compaction imminent" in line
def test_approaching_hint_below_85(self):
line = format_context_pressure(0.60, 100_000, 0.80)
assert "approaching compaction" in line
def test_approaching_hint(self):
line = format_context_pressure(0.80, 100_000, 0.50)
assert "compaction approaching" in line
def test_no_compaction_when_disabled(self):
line = format_context_pressure(0.85, 100_000, 0.50, compression_enabled=False)
@@ -82,26 +78,26 @@ class TestFormatContextPressure:
class TestFormatContextPressureGateway:
"""Gateway (plain text) context pressure display."""
def test_60_percent_informational(self):
msg = format_context_pressure_gateway(0.60, 0.50)
assert "60% to compaction" in msg
assert "50%" in msg # threshold shown
def test_80_percent_warning(self):
msg = format_context_pressure_gateway(0.80, 0.50)
assert "80% to compaction" in msg
assert "50%" in msg
def test_85_percent_warning(self):
msg = format_context_pressure_gateway(0.85, 0.50)
assert "85% to compaction" in msg
assert "imminent" in msg
def test_90_percent_warning(self):
msg = format_context_pressure_gateway(0.90, 0.50)
assert "90% to compaction" in msg
assert "approaching" in msg
def test_no_compaction_warning(self):
msg = format_context_pressure_gateway(0.85, 0.50, compression_enabled=False)
assert "disabled" in msg
def test_no_ansi_codes(self):
msg = format_context_pressure_gateway(0.85, 0.50)
msg = format_context_pressure_gateway(0.80, 0.50)
assert "\033[" not in msg
def test_has_progress_bar(self):
msg = format_context_pressure_gateway(0.85, 0.50)
msg = format_context_pressure_gateway(0.80, 0.50)
assert "" in msg
@@ -145,9 +141,8 @@ def agent():
class TestContextPressureFlags:
"""Context pressure warning flag tracking on AIAgent."""
def test_flags_initialized_false(self, agent):
assert agent._context_50_warned is False
assert agent._context_70_warned is False
def test_flag_initialized_false(self, agent):
assert agent._context_pressure_warned is False
def test_emit_calls_status_callback(self, agent):
"""status_callback should be invoked with event type and message."""
@@ -204,13 +199,11 @@ class TestContextPressureFlags:
captured = capsys.readouterr()
assert "" not in captured.out
def test_flags_reset_on_compression(self, agent):
"""After _compress_context, context pressure flags should reset."""
agent._context_50_warned = True
agent._context_70_warned = True
def test_flag_reset_on_compression(self, agent):
"""After _compress_context, context pressure flag should reset."""
agent._context_pressure_warned = True
agent.compression_enabled = True
# Mock the compressor's compress method to return minimal valid output
agent.context_compressor = MagicMock()
agent.context_compressor.compress.return_value = [
{"role": "user", "content": "Summary of conversation so far."}
@@ -218,11 +211,9 @@ class TestContextPressureFlags:
agent.context_compressor.context_length = 200_000
agent.context_compressor.threshold_tokens = 100_000
# Mock _todo_store
agent._todo_store = MagicMock()
agent._todo_store.format_for_injection.return_value = None
# Mock _build_system_prompt
agent._build_system_prompt = MagicMock(return_value="system prompt")
agent._cached_system_prompt = "old system prompt"
agent._session_db = None
@@ -233,8 +224,7 @@ class TestContextPressureFlags:
]
agent._compress_context(messages, "system prompt")
assert agent._context_50_warned is False
assert agent._context_70_warned is False
assert agent._context_pressure_warned is False
def test_emit_callback_error_handled(self, agent):
"""If status_callback raises, it should be caught gracefully."""
+2 -34
View File
@@ -1,34 +1,2 @@
"""Tests for minisweagent_path.py."""
from pathlib import Path
from minisweagent_path import discover_minisweagent_src
def test_discover_minisweagent_src_in_current_checkout(tmp_path):
repo = tmp_path / "repo"
src = repo / "mini-swe-agent" / "src"
src.mkdir(parents=True)
assert discover_minisweagent_src(repo) == src.resolve()
def test_discover_minisweagent_src_falls_back_from_worktree_to_main_checkout(tmp_path):
main_repo = tmp_path / "main-repo"
(main_repo / ".git" / "worktrees" / "wt1").mkdir(parents=True)
main_src = main_repo / "mini-swe-agent" / "src"
main_src.mkdir(parents=True)
worktree = tmp_path / "worktree"
worktree.mkdir()
(worktree / ".git").write_text(f"gitdir: {main_repo / '.git' / 'worktrees' / 'wt1'}\n", encoding="utf-8")
(worktree / "mini-swe-agent").mkdir() # empty placeholder, no src/
assert discover_minisweagent_src(worktree) == main_src.resolve()
def test_discover_minisweagent_src_returns_none_when_missing(tmp_path):
repo = tmp_path / "repo"
repo.mkdir()
assert discover_minisweagent_src(repo) is None
# This file intentionally left empty.
# minisweagent_path.py was removed — see PR #2804.
+79 -1
View File
@@ -267,7 +267,7 @@ def test_named_custom_provider_uses_saved_credentials(monkeypatch):
resolved = rp.resolve_runtime_provider(requested="local")
assert resolved["provider"] == "openrouter"
assert resolved["provider"] == "custom"
assert resolved["api_mode"] == "chat_completions"
assert resolved["base_url"] == "http://1.2.3.4:1234/v1"
assert resolved["api_key"] == "local-provider-key"
@@ -579,3 +579,81 @@ def test_named_custom_provider_anthropic_api_mode(monkeypatch):
assert resolved["api_mode"] == "anthropic_messages"
assert resolved["base_url"] == "https://proxy.example.com/anthropic"
# ------------------------------------------------------------------
# fix #2562 — resolve_provider("custom") must not remap to "openrouter"
# ------------------------------------------------------------------
def test_resolve_provider_custom_returns_custom():
"""resolve_provider('custom') must return 'custom', not 'openrouter'."""
from hermes_cli.auth import resolve_provider
assert resolve_provider("custom") == "custom"
def test_resolve_provider_openrouter_unchanged():
"""resolve_provider('openrouter') must still return 'openrouter'."""
from hermes_cli.auth import resolve_provider
assert resolve_provider("openrouter") == "openrouter"
def test_custom_provider_runtime_preserves_provider_name(monkeypatch):
"""resolve_runtime_provider with provider='custom' must return provider='custom'."""
monkeypatch.delenv("OPENAI_API_KEY", raising=False)
monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
monkeypatch.delenv("OPENAI_BASE_URL", raising=False)
monkeypatch.delenv("OPENROUTER_BASE_URL", raising=False)
monkeypatch.setattr(
rp,
"load_config",
lambda: {
"model": {
"provider": "custom",
"base_url": "http://localhost:8080/v1",
"api_key": "test-key-123",
}
},
)
resolved = rp.resolve_runtime_provider(requested="custom")
assert resolved["provider"] == "custom", (
f"Expected provider='custom', got provider='{resolved['provider']}'"
)
assert resolved["base_url"] == "http://localhost:8080/v1"
assert resolved["api_key"] == "test-key-123"
def test_custom_provider_no_key_gets_placeholder(monkeypatch):
"""Local server with no API key should get 'no-key-required' placeholder."""
monkeypatch.delenv("OPENAI_API_KEY", raising=False)
monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
monkeypatch.delenv("OPENAI_BASE_URL", raising=False)
monkeypatch.delenv("OPENROUTER_BASE_URL", raising=False)
monkeypatch.setattr(
rp,
"load_config",
lambda: {
"model": {
"provider": "custom",
"base_url": "http://localhost:8080/v1",
}
},
)
resolved = rp.resolve_runtime_provider(requested="custom")
assert resolved["provider"] == "custom"
assert resolved["api_key"] == "no-key-required"
assert resolved["base_url"] == "http://localhost:8080/v1"
def test_openrouter_provider_not_affected_by_custom_fix(monkeypatch):
"""Fixing custom must not change openrouter behavior."""
monkeypatch.delenv("OPENAI_API_KEY", raising=False)
monkeypatch.delenv("OPENAI_BASE_URL", raising=False)
monkeypatch.delenv("OPENROUTER_BASE_URL", raising=False)
monkeypatch.setenv("OPENROUTER_API_KEY", "test-or-key")
monkeypatch.setattr(rp, "load_config", lambda: {})
resolved = rp.resolve_runtime_provider(requested="openrouter")
assert resolved["provider"] == "openrouter"
+168
View File
@@ -0,0 +1,168 @@
"""Comprehensive tests for ANSI escape sequence stripping (ECMA-48).
The strip_ansi function in tools/ansi_strip.py is the source-level fix for
ANSI codes leaking into the model's context via terminal/execute_code output.
It must strip ALL terminal escape sequences while preserving legitimate text.
"""
from tools.ansi_strip import strip_ansi
class TestStripAnsiBasicSGR:
"""Select Graphic Rendition — the most common ANSI sequences."""
def test_reset(self):
assert strip_ansi("\x1b[0m") == ""
def test_color(self):
assert strip_ansi("\x1b[31;1m") == ""
def test_truecolor_semicolon(self):
assert strip_ansi("\x1b[38;2;255;0;0m") == ""
def test_truecolor_colon_separated(self):
"""Modern terminals use colon-separated SGR params."""
assert strip_ansi("\x1b[38:2:255:0:0m") == ""
assert strip_ansi("\x1b[48:2:0:255:0m") == ""
class TestStripAnsiCSIPrivateMode:
"""CSI sequences with ? prefix (DEC private modes)."""
def test_cursor_show_hide(self):
assert strip_ansi("\x1b[?25h") == ""
assert strip_ansi("\x1b[?25l") == ""
def test_alt_screen(self):
assert strip_ansi("\x1b[?1049h") == ""
assert strip_ansi("\x1b[?1049l") == ""
def test_bracketed_paste(self):
assert strip_ansi("\x1b[?2004h") == ""
class TestStripAnsiCSIIntermediate:
"""CSI sequences with intermediate bytes (space, etc.)."""
def test_cursor_shape(self):
assert strip_ansi("\x1b[0 q") == ""
assert strip_ansi("\x1b[2 q") == ""
assert strip_ansi("\x1b[6 q") == ""
class TestStripAnsiOSC:
"""Operating System Command sequences."""
def test_bel_terminator(self):
assert strip_ansi("\x1b]0;title\x07") == ""
def test_st_terminator(self):
assert strip_ansi("\x1b]0;title\x1b\\") == ""
def test_hyperlink_preserves_text(self):
assert strip_ansi(
"\x1b]8;;https://example.com\x1b\\click\x1b]8;;\x1b\\"
) == "click"
class TestStripAnsiDECPrivate:
"""DEC private / Fp escape sequences."""
def test_save_restore_cursor(self):
assert strip_ansi("\x1b7") == ""
assert strip_ansi("\x1b8") == ""
def test_keypad_modes(self):
assert strip_ansi("\x1b=") == ""
assert strip_ansi("\x1b>") == ""
class TestStripAnsiFe:
"""Fe (C1 as 7-bit) escape sequences."""
def test_reverse_index(self):
assert strip_ansi("\x1bM") == ""
def test_reset_terminal(self):
assert strip_ansi("\x1bc") == ""
def test_index_and_newline(self):
assert strip_ansi("\x1bD") == ""
assert strip_ansi("\x1bE") == ""
class TestStripAnsiNF:
"""nF (character set selection) sequences."""
def test_charset_selection(self):
assert strip_ansi("\x1b(A") == ""
assert strip_ansi("\x1b(B") == ""
assert strip_ansi("\x1b(0") == ""
class TestStripAnsiDCS:
"""Device Control String sequences."""
def test_dcs(self):
assert strip_ansi("\x1bP+q\x1b\\") == ""
class TestStripAnsi8BitC1:
"""8-bit C1 control characters."""
def test_8bit_csi(self):
assert strip_ansi("\x9b31m") == ""
assert strip_ansi("\x9b38;2;255;0;0m") == ""
def test_8bit_standalone(self):
assert strip_ansi("\x9c") == ""
assert strip_ansi("\x9d") == ""
assert strip_ansi("\x90") == ""
class TestStripAnsiRealWorld:
"""Real-world contamination scenarios from bug reports."""
def test_colored_shebang(self):
"""The original reported bug: shebang corrupted by color codes."""
assert strip_ansi(
"\x1b[32m#!/usr/bin/env python3\x1b[0m\nprint('hello')"
) == "#!/usr/bin/env python3\nprint('hello')"
def test_stacked_sgr(self):
assert strip_ansi(
"\x1b[1m\x1b[31m\x1b[42mhello\x1b[0m"
) == "hello"
def test_ansi_mid_code(self):
assert strip_ansi(
"def foo(\x1b[33m):\x1b[0m\n return 42"
) == "def foo():\n return 42"
class TestStripAnsiPassthrough:
"""Clean content must pass through unmodified."""
def test_plain_text(self):
assert strip_ansi("normal text") == "normal text"
def test_empty(self):
assert strip_ansi("") == ""
def test_none(self):
assert strip_ansi(None) is None
def test_whitespace_preserved(self):
assert strip_ansi("line1\nline2\ttab") == "line1\nline2\ttab"
def test_unicode_safe(self):
assert strip_ansi("emoji 🎉 and ñ café") == "emoji 🎉 and ñ café"
def test_backslash_in_code(self):
code = "path = 'C:\\\\Users\\\\test'"
assert strip_ansi(code) == code
def test_square_brackets_in_code(self):
"""Array indexing must not be confused with CSI."""
code = "arr[0] = arr[31]"
assert strip_ansi(code) == code
+259
View File
@@ -0,0 +1,259 @@
"""Tests for macOS Homebrew PATH discovery in browser_tool.py."""
import json
import os
import subprocess
from pathlib import Path
from unittest.mock import patch, MagicMock, mock_open
import pytest
from tools.browser_tool import (
_discover_homebrew_node_dirs,
_find_agent_browser,
_run_browser_command,
_SANE_PATH,
)
class TestSanePath:
"""Verify _SANE_PATH includes Homebrew directories."""
def test_includes_homebrew_bin(self):
assert "/opt/homebrew/bin" in _SANE_PATH
def test_includes_homebrew_sbin(self):
assert "/opt/homebrew/sbin" in _SANE_PATH
def test_includes_standard_dirs(self):
assert "/usr/local/bin" in _SANE_PATH
assert "/usr/bin" in _SANE_PATH
assert "/bin" in _SANE_PATH
class TestDiscoverHomebrewNodeDirs:
"""Tests for _discover_homebrew_node_dirs()."""
def test_returns_empty_when_no_homebrew(self):
"""Non-macOS systems without /opt/homebrew/opt should return empty."""
with patch("os.path.isdir", return_value=False):
assert _discover_homebrew_node_dirs() == []
def test_finds_versioned_node_dirs(self):
"""Should discover node@20/bin, node@24/bin etc."""
entries = ["node@20", "node@24", "openssl", "node", "python@3.12"]
def mock_isdir(p):
if p == "/opt/homebrew/opt":
return True
# node@20/bin and node@24/bin exist
if p in (
"/opt/homebrew/opt/node@20/bin",
"/opt/homebrew/opt/node@24/bin",
):
return True
return False
with patch("os.path.isdir", side_effect=mock_isdir), \
patch("os.listdir", return_value=entries):
result = _discover_homebrew_node_dirs()
assert len(result) == 2
assert "/opt/homebrew/opt/node@20/bin" in result
assert "/opt/homebrew/opt/node@24/bin" in result
def test_excludes_plain_node(self):
"""'node' (unversioned) should be excluded — covered by /opt/homebrew/bin."""
with patch("os.path.isdir", return_value=True), \
patch("os.listdir", return_value=["node"]):
result = _discover_homebrew_node_dirs()
assert result == []
def test_handles_oserror_gracefully(self):
"""Should return empty list if listdir raises OSError."""
with patch("os.path.isdir", return_value=True), \
patch("os.listdir", side_effect=OSError("Permission denied")):
assert _discover_homebrew_node_dirs() == []
class TestFindAgentBrowser:
"""Tests for _find_agent_browser() Homebrew path search."""
def test_finds_in_current_path(self):
"""Should return result from shutil.which if available on current PATH."""
with patch("shutil.which", return_value="/usr/local/bin/agent-browser"):
assert _find_agent_browser() == "/usr/local/bin/agent-browser"
def test_finds_in_homebrew_bin(self):
"""Should search Homebrew dirs when not found on current PATH."""
def mock_which(cmd, path=None):
if path and "/opt/homebrew/bin" in path and cmd == "agent-browser":
return "/opt/homebrew/bin/agent-browser"
return None
with patch("shutil.which", side_effect=mock_which), \
patch("os.path.isdir", return_value=True), \
patch(
"tools.browser_tool._discover_homebrew_node_dirs",
return_value=[],
):
result = _find_agent_browser()
assert result == "/opt/homebrew/bin/agent-browser"
def test_finds_npx_in_homebrew(self):
"""Should find npx in Homebrew paths as a fallback."""
def mock_which(cmd, path=None):
if cmd == "agent-browser":
return None
if cmd == "npx":
if path and "/opt/homebrew/bin" in path:
return "/opt/homebrew/bin/npx"
return None
return None
# Mock Path.exists() to prevent the local node_modules check from matching
original_path_exists = Path.exists
def mock_path_exists(self):
if "node_modules" in str(self) and "agent-browser" in str(self):
return False
return original_path_exists(self)
with patch("shutil.which", side_effect=mock_which), \
patch("os.path.isdir", return_value=True), \
patch.object(Path, "exists", mock_path_exists), \
patch(
"tools.browser_tool._discover_homebrew_node_dirs",
return_value=[],
):
result = _find_agent_browser()
assert result == "npx agent-browser"
def test_raises_when_not_found(self):
"""Should raise FileNotFoundError when nothing works."""
original_path_exists = Path.exists
def mock_path_exists(self):
if "node_modules" in str(self) and "agent-browser" in str(self):
return False
return original_path_exists(self)
with patch("shutil.which", return_value=None), \
patch("os.path.isdir", return_value=False), \
patch.object(Path, "exists", mock_path_exists), \
patch(
"tools.browser_tool._discover_homebrew_node_dirs",
return_value=[],
):
with pytest.raises(FileNotFoundError, match="agent-browser CLI not found"):
_find_agent_browser()
class TestRunBrowserCommandPathConstruction:
"""Verify _run_browser_command() includes Homebrew node dirs in subprocess PATH."""
def test_subprocess_path_includes_homebrew_node_dirs(self, tmp_path):
"""When _discover_homebrew_node_dirs returns dirs, they should appear
in the subprocess env PATH passed to Popen."""
captured_env = {}
# Create a mock Popen that captures the env dict
mock_proc = MagicMock()
mock_proc.returncode = 0
mock_proc.wait.return_value = 0
def capture_popen(cmd, **kwargs):
captured_env.update(kwargs.get("env", {}))
return mock_proc
fake_session = {
"session_name": "test-session",
"session_id": "test-id",
"cdp_url": None,
}
# Write fake JSON output to the stdout temp file
fake_json = json.dumps({"success": True})
stdout_file = tmp_path / "stdout"
stdout_file.write_text(fake_json)
fake_homebrew_dirs = [
"/opt/homebrew/opt/node@24/bin",
"/opt/homebrew/opt/node@20/bin",
]
# We need os.path.isdir to return True for our fake dirs
# but we also need real isdir for tmp_path operations
real_isdir = os.path.isdir
def selective_isdir(p):
if p in fake_homebrew_dirs or p.startswith(str(tmp_path)):
return True
if "/opt/homebrew/" in p:
return True # _SANE_PATH dirs
return real_isdir(p)
with patch("tools.browser_tool._find_agent_browser", return_value="/usr/local/bin/agent-browser"), \
patch("tools.browser_tool._get_session_info", return_value=fake_session), \
patch("tools.browser_tool._socket_safe_tmpdir", return_value=str(tmp_path)), \
patch("tools.browser_tool._discover_homebrew_node_dirs", return_value=fake_homebrew_dirs), \
patch("os.path.isdir", side_effect=selective_isdir), \
patch("subprocess.Popen", side_effect=capture_popen), \
patch("os.open", return_value=99), \
patch("os.close"), \
patch("tools.interrupt.is_interrupted", return_value=False), \
patch.dict(os.environ, {"PATH": "/usr/bin:/bin", "HOME": "/home/test"}, clear=True):
# The function reads from temp files for stdout/stderr
with patch("builtins.open", mock_open(read_data=fake_json)):
_run_browser_command("test-task", "navigate", ["https://example.com"])
# Verify Homebrew node dirs made it into the subprocess PATH
result_path = captured_env.get("PATH", "")
assert "/opt/homebrew/opt/node@24/bin" in result_path
assert "/opt/homebrew/opt/node@20/bin" in result_path
assert "/opt/homebrew/bin" in result_path # from _SANE_PATH
def test_subprocess_path_includes_sane_path_homebrew(self, tmp_path):
"""_SANE_PATH Homebrew entries should appear even without versioned node dirs."""
captured_env = {}
mock_proc = MagicMock()
mock_proc.returncode = 0
mock_proc.wait.return_value = 0
def capture_popen(cmd, **kwargs):
captured_env.update(kwargs.get("env", {}))
return mock_proc
fake_session = {
"session_name": "test-session",
"session_id": "test-id",
"cdp_url": None,
}
fake_json = json.dumps({"success": True})
real_isdir = os.path.isdir
def selective_isdir(p):
if "/opt/homebrew/" in p:
return True
if p.startswith(str(tmp_path)):
return True
return real_isdir(p)
with patch("tools.browser_tool._find_agent_browser", return_value="/usr/local/bin/agent-browser"), \
patch("tools.browser_tool._get_session_info", return_value=fake_session), \
patch("tools.browser_tool._socket_safe_tmpdir", return_value=str(tmp_path)), \
patch("tools.browser_tool._discover_homebrew_node_dirs", return_value=[]), \
patch("os.path.isdir", side_effect=selective_isdir), \
patch("subprocess.Popen", side_effect=capture_popen), \
patch("os.open", return_value=99), \
patch("os.close"), \
patch("tools.interrupt.is_interrupted", return_value=False), \
patch.dict(os.environ, {"PATH": "/usr/bin:/bin", "HOME": "/home/test"}, clear=True):
with patch("builtins.open", mock_open(read_data=fake_json)):
_run_browser_command("test-task", "navigate", ["https://example.com"])
result_path = captured_env.get("PATH", "")
assert "/opt/homebrew/bin" in result_path
assert "/opt/homebrew/sbin" in result_path
+2 -2
View File
@@ -131,9 +131,9 @@ class TestExecuteCode(unittest.TestCase):
def test_repo_root_modules_are_importable(self):
"""Sandboxed scripts can import modules that live at the repo root."""
result = self._run('import minisweagent_path; print(minisweagent_path.__file__)')
result = self._run('import hermes_constants; print(hermes_constants.__file__)')
self.assertEqual(result["status"], "success")
self.assertIn("minisweagent_path.py", result["output"])
self.assertIn("hermes_constants.py", result["output"])
def test_single_tool_call(self):
"""Script calls terminal and prints the result."""
+45 -97
View File
@@ -9,25 +9,24 @@ import pytest
from tools.environments import docker as docker_env
def _install_fake_minisweagent(monkeypatch, captured_run_args):
class MockInnerDocker:
container_id = "fake-container"
config = type("Config", (), {"executable": "/usr/bin/docker", "forward_env": [], "env": {}})()
def _mock_subprocess_run(monkeypatch):
"""Mock subprocess.run to intercept docker run -d and docker version calls.
def __init__(self, **kwargs):
captured_run_args.extend(kwargs.get("run_args", []))
Returns a list of captured (cmd, kwargs) tuples for inspection.
"""
calls = []
def cleanup(self):
pass
def _run(cmd, **kwargs):
calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
if isinstance(cmd, list) and len(cmd) >= 2:
if cmd[1] == "version":
return subprocess.CompletedProcess(cmd, 0, stdout="Docker version", stderr="")
if cmd[1] == "run":
return subprocess.CompletedProcess(cmd, 0, stdout="fake-container-id\n", stderr="")
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
minisweagent_mod = types.ModuleType("minisweagent")
environments_mod = types.ModuleType("minisweagent.environments")
docker_mod = types.ModuleType("minisweagent.environments.docker")
docker_mod.DockerEnvironment = MockInnerDocker
monkeypatch.setitem(sys.modules, "minisweagent", minisweagent_mod)
monkeypatch.setitem(sys.modules, "minisweagent.environments", environments_mod)
monkeypatch.setitem(sys.modules, "minisweagent.environments.docker", docker_mod)
monkeypatch.setattr(docker_env.subprocess, "run", _run)
return calls
def _make_dummy_env(**kwargs):
@@ -49,7 +48,7 @@ def _make_dummy_env(**kwargs):
def test_ensure_docker_available_logs_and_raises_when_not_found(monkeypatch, caplog):
"""When docker cannot be found, raise a clear error before mini-swe setup."""
"""When docker cannot be found, raise a clear error before container setup."""
monkeypatch.setattr(docker_env, "find_docker", lambda: None)
monkeypatch.setattr(
@@ -118,14 +117,8 @@ def test_auto_mount_host_cwd_adds_volume(monkeypatch, tmp_path):
project_dir = tmp_path / "my-project"
project_dir.mkdir()
def _run_docker_version(*args, **kwargs):
return subprocess.CompletedProcess(args[0], 0, stdout="Docker version", stderr="")
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
monkeypatch.setattr(docker_env.subprocess, "run", _run_docker_version)
captured_run_args = []
_install_fake_minisweagent(monkeypatch, captured_run_args)
calls = _mock_subprocess_run(monkeypatch)
_make_dummy_env(
cwd="/workspace",
@@ -133,7 +126,10 @@ def test_auto_mount_host_cwd_adds_volume(monkeypatch, tmp_path):
auto_mount_cwd=True,
)
run_args_str = " ".join(captured_run_args)
# Find the docker run call and check its args
run_calls = [c for c in calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "run"]
assert run_calls, "docker run should have been called"
run_args_str = " ".join(run_calls[0][0])
assert f"{project_dir}:/workspace" in run_args_str
@@ -142,14 +138,8 @@ def test_auto_mount_disabled_by_default(monkeypatch, tmp_path):
project_dir = tmp_path / "my-project"
project_dir.mkdir()
def _run_docker_version(*args, **kwargs):
return subprocess.CompletedProcess(args[0], 0, stdout="Docker version", stderr="")
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
monkeypatch.setattr(docker_env.subprocess, "run", _run_docker_version)
captured_run_args = []
_install_fake_minisweagent(monkeypatch, captured_run_args)
calls = _mock_subprocess_run(monkeypatch)
_make_dummy_env(
cwd="/root",
@@ -157,7 +147,9 @@ def test_auto_mount_disabled_by_default(monkeypatch, tmp_path):
auto_mount_cwd=False,
)
run_args_str = " ".join(captured_run_args)
run_calls = [c for c in calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "run"]
assert run_calls, "docker run should have been called"
run_args_str = " ".join(run_calls[0][0])
assert f"{project_dir}:/workspace" not in run_args_str
@@ -168,14 +160,8 @@ def test_auto_mount_skipped_when_workspace_already_mounted(monkeypatch, tmp_path
other_dir = tmp_path / "other"
other_dir.mkdir()
def _run_docker_version(*args, **kwargs):
return subprocess.CompletedProcess(args[0], 0, stdout="Docker version", stderr="")
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
monkeypatch.setattr(docker_env.subprocess, "run", _run_docker_version)
captured_run_args = []
_install_fake_minisweagent(monkeypatch, captured_run_args)
calls = _mock_subprocess_run(monkeypatch)
_make_dummy_env(
cwd="/workspace",
@@ -184,7 +170,9 @@ def test_auto_mount_skipped_when_workspace_already_mounted(monkeypatch, tmp_path
volumes=[f"{other_dir}:/workspace"],
)
run_args_str = " ".join(captured_run_args)
run_calls = [c for c in calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "run"]
assert run_calls, "docker run should have been called"
run_args_str = " ".join(run_calls[0][0])
assert f"{other_dir}:/workspace" in run_args_str
assert run_args_str.count(":/workspace") == 1
@@ -194,14 +182,8 @@ def test_auto_mount_replaces_persistent_workspace_bind(monkeypatch, tmp_path):
project_dir = tmp_path / "my-project"
project_dir.mkdir()
def _run_docker_version(*args, **kwargs):
return subprocess.CompletedProcess(args[0], 0, stdout="Docker version", stderr="")
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
monkeypatch.setattr(docker_env.subprocess, "run", _run_docker_version)
captured_run_args = []
_install_fake_minisweagent(monkeypatch, captured_run_args)
calls = _mock_subprocess_run(monkeypatch)
_make_dummy_env(
cwd="/workspace",
@@ -211,28 +193,23 @@ def test_auto_mount_replaces_persistent_workspace_bind(monkeypatch, tmp_path):
task_id="test-persistent-auto-mount",
)
run_args_str = " ".join(captured_run_args)
run_calls = [c for c in calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "run"]
assert run_calls, "docker run should have been called"
run_args_str = " ".join(run_calls[0][0])
assert f"{project_dir}:/workspace" in run_args_str
assert "/sandboxes/docker/test-persistent-auto-mount/workspace:/workspace" not in run_args_str
def test_non_persistent_cleanup_removes_container(monkeypatch):
"""When container_persistent=false, cleanup() must run docker rm -f so the container is removed (Fixes #1679)."""
run_calls = []
def _run(cmd, **kwargs):
run_calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
if cmd and getattr(cmd[0], "__str__", None) and "docker" in str(cmd[0]):
if len(cmd) >= 2 and cmd[1] == "run":
return subprocess.CompletedProcess(cmd, 0, stdout="abc123container\n", stderr="")
return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
"""When persistent=false, cleanup() must schedule docker stop + rm."""
monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
monkeypatch.setattr(docker_env.subprocess, "run", _run)
monkeypatch.setattr(docker_env.subprocess, "Popen", lambda *a, **k: type("P", (), {"poll": lambda: None, "wait": lambda **kw: None, "returncode": 0, "stdout": iter([]), "stdin": None})())
calls = _mock_subprocess_run(monkeypatch)
captured_run_args = []
_install_fake_minisweagent(monkeypatch, captured_run_args)
popen_cmds = []
monkeypatch.setattr(
docker_env.subprocess, "Popen",
lambda cmd, **kw: (popen_cmds.append(cmd), type("P", (), {"poll": lambda s: 0, "wait": lambda s, **k: None, "returncode": 0, "stdout": iter([]), "stdin": None})())[1],
)
env = _make_dummy_env(persistent_filesystem=False, task_id="ephemeral-task")
assert env._container_id
@@ -240,8 +217,9 @@ def test_non_persistent_cleanup_removes_container(monkeypatch):
env.cleanup()
rm_calls = [c for c in run_calls if isinstance(c[0], list) and len(c[0]) >= 4 and c[0][1:4] == ["rm", "-f", container_id]]
assert len(rm_calls) >= 1, "cleanup() should run docker rm -f <container_id> when container_persistent=false"
# Should have stop and rm calls via Popen
stop_cmds = [c for c in popen_cmds if container_id in str(c) and "stop" in str(c)]
assert len(stop_cmds) >= 1, f"cleanup() should schedule docker stop for {container_id}"
class _FakePopen:
@@ -263,10 +241,8 @@ def _make_execute_only_env(forward_env=None):
env._forward_env = forward_env or []
env._prepare_command = lambda command: (command, None)
env._timeout_result = lambda timeout: {"output": f"timed out after {timeout}", "returncode": 124}
env._inner = type("Inner", (), {
"container_id": "test-container",
"config": type("Cfg", (), {"executable": "/usr/bin/docker", "env": {}})(),
})()
env._container_id = "test-container"
env._docker_exe = "/usr/bin/docker"
return env
@@ -304,31 +280,3 @@ def test_execute_prefers_shell_env_over_hermes_dotenv(monkeypatch):
assert "GITHUB_TOKEN=value_from_shell" in popen_calls[0]
assert "GITHUB_TOKEN=value_from_dotenv" not in popen_calls[0]
def test_non_persistent_cleanup_removes_container(monkeypatch):
"""When container_persistent=false, cleanup() must run docker rm -f so the container is removed (Fixes #1679)."""
run_calls = []
def _run(cmd, **kwargs):
run_calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
if cmd and getattr(cmd[0], '__str__', None) and 'docker' in str(cmd[0]):
if len(cmd) >= 2 and cmd[1] == 'run':
return subprocess.CompletedProcess(cmd, 0, stdout="abc123container\n", stderr="")
return subprocess.CompletedProcess(cmd, 0, stdout='', stderr='')
monkeypatch.setattr(docker_env, 'find_docker', lambda: '/usr/bin/docker')
monkeypatch.setattr(docker_env.subprocess, 'run', _run)
monkeypatch.setattr(docker_env.subprocess, 'Popen', lambda *a, **k: type('P', (), {'poll': lambda: None, 'wait': lambda **kw: None, 'returncode': 0, 'stdout': iter([]), 'stdin': None})())
captured_run_args = []
_install_fake_minisweagent(monkeypatch, captured_run_args)
env = _make_dummy_env(persistent_filesystem=False, task_id='ephemeral-task')
assert env._container_id
container_id = env._container_id
env.cleanup()
rm_calls = [c for c in run_calls if isinstance(c[0], list) and len(c[0]) >= 4 and c[0][1:4] == ['rm', '-f', container_id]]
assert len(rm_calls) >= 1, 'cleanup() should run docker rm -f <container_id> when container_persistent=false'
+199
View File
@@ -0,0 +1,199 @@
"""Tests for tools.env_passthrough — skill and config env var passthrough."""
import os
import pytest
import yaml
from tools.env_passthrough import (
clear_env_passthrough,
get_all_passthrough,
is_env_passthrough,
register_env_passthrough,
reset_config_cache,
)
@pytest.fixture(autouse=True)
def _clean_passthrough():
"""Ensure a clean passthrough state for every test."""
clear_env_passthrough()
reset_config_cache()
yield
clear_env_passthrough()
reset_config_cache()
class TestSkillScopedPassthrough:
def test_register_and_check(self):
assert not is_env_passthrough("TENOR_API_KEY")
register_env_passthrough(["TENOR_API_KEY"])
assert is_env_passthrough("TENOR_API_KEY")
def test_register_multiple(self):
register_env_passthrough(["FOO_TOKEN", "BAR_SECRET"])
assert is_env_passthrough("FOO_TOKEN")
assert is_env_passthrough("BAR_SECRET")
assert not is_env_passthrough("OTHER_KEY")
def test_clear(self):
register_env_passthrough(["TENOR_API_KEY"])
assert is_env_passthrough("TENOR_API_KEY")
clear_env_passthrough()
assert not is_env_passthrough("TENOR_API_KEY")
def test_get_all(self):
register_env_passthrough(["A_KEY", "B_TOKEN"])
result = get_all_passthrough()
assert "A_KEY" in result
assert "B_TOKEN" in result
def test_strips_whitespace(self):
register_env_passthrough([" SPACED_KEY "])
assert is_env_passthrough("SPACED_KEY")
def test_skips_empty(self):
register_env_passthrough(["", " ", "VALID_KEY"])
assert is_env_passthrough("VALID_KEY")
assert not is_env_passthrough("")
class TestConfigPassthrough:
def test_reads_from_config(self, tmp_path, monkeypatch):
config = {"terminal": {"env_passthrough": ["MY_CUSTOM_KEY", "ANOTHER_TOKEN"]}}
config_path = tmp_path / "config.yaml"
config_path.write_text(yaml.dump(config))
monkeypatch.setenv("HERMES_HOME", str(tmp_path))
reset_config_cache()
assert is_env_passthrough("MY_CUSTOM_KEY")
assert is_env_passthrough("ANOTHER_TOKEN")
assert not is_env_passthrough("UNRELATED_VAR")
def test_empty_config(self, tmp_path, monkeypatch):
config = {"terminal": {"env_passthrough": []}}
config_path = tmp_path / "config.yaml"
config_path.write_text(yaml.dump(config))
monkeypatch.setenv("HERMES_HOME", str(tmp_path))
reset_config_cache()
assert not is_env_passthrough("ANYTHING")
def test_missing_config_key(self, tmp_path, monkeypatch):
config = {"terminal": {"backend": "local"}}
config_path = tmp_path / "config.yaml"
config_path.write_text(yaml.dump(config))
monkeypatch.setenv("HERMES_HOME", str(tmp_path))
reset_config_cache()
assert not is_env_passthrough("ANYTHING")
def test_no_config_file(self, tmp_path, monkeypatch):
monkeypatch.setenv("HERMES_HOME", str(tmp_path))
reset_config_cache()
assert not is_env_passthrough("ANYTHING")
def test_union_of_skill_and_config(self, tmp_path, monkeypatch):
config = {"terminal": {"env_passthrough": ["CONFIG_KEY"]}}
config_path = tmp_path / "config.yaml"
config_path.write_text(yaml.dump(config))
monkeypatch.setenv("HERMES_HOME", str(tmp_path))
reset_config_cache()
register_env_passthrough(["SKILL_KEY"])
all_pt = get_all_passthrough()
assert "CONFIG_KEY" in all_pt
assert "SKILL_KEY" in all_pt
class TestExecuteCodeIntegration:
"""Verify that the passthrough is checked in execute_code's env filtering."""
def test_secret_substring_blocked_by_default(self):
"""TENOR_API_KEY should be blocked without passthrough."""
_SAFE_ENV_PREFIXES = ("PATH", "HOME", "USER", "LANG", "LC_", "TERM",
"TMPDIR", "TMP", "TEMP", "SHELL", "LOGNAME",
"XDG_", "PYTHONPATH", "VIRTUAL_ENV", "CONDA")
_SECRET_SUBSTRINGS = ("KEY", "TOKEN", "SECRET", "PASSWORD", "CREDENTIAL",
"PASSWD", "AUTH")
test_env = {"PATH": "/usr/bin", "TENOR_API_KEY": "test123", "HOME": "/home/user"}
child_env = {}
for k, v in test_env.items():
if is_env_passthrough(k):
child_env[k] = v
continue
if any(s in k.upper() for s in _SECRET_SUBSTRINGS):
continue
if any(k.startswith(p) for p in _SAFE_ENV_PREFIXES):
child_env[k] = v
assert "PATH" in child_env
assert "HOME" in child_env
assert "TENOR_API_KEY" not in child_env
def test_passthrough_allows_secret_through(self):
"""TENOR_API_KEY should pass through when registered."""
_SAFE_ENV_PREFIXES = ("PATH", "HOME", "USER", "LANG", "LC_", "TERM",
"TMPDIR", "TMP", "TEMP", "SHELL", "LOGNAME",
"XDG_", "PYTHONPATH", "VIRTUAL_ENV", "CONDA")
_SECRET_SUBSTRINGS = ("KEY", "TOKEN", "SECRET", "PASSWORD", "CREDENTIAL",
"PASSWD", "AUTH")
register_env_passthrough(["TENOR_API_KEY"])
test_env = {"PATH": "/usr/bin", "TENOR_API_KEY": "test123", "HOME": "/home/user"}
child_env = {}
for k, v in test_env.items():
if is_env_passthrough(k):
child_env[k] = v
continue
if any(s in k.upper() for s in _SECRET_SUBSTRINGS):
continue
if any(k.startswith(p) for p in _SAFE_ENV_PREFIXES):
child_env[k] = v
assert "PATH" in child_env
assert "HOME" in child_env
assert "TENOR_API_KEY" in child_env
assert child_env["TENOR_API_KEY"] == "test123"
class TestTerminalIntegration:
"""Verify that the passthrough is checked in terminal's env sanitizers."""
def test_blocklisted_var_blocked_by_default(self):
from tools.environments.local import _sanitize_subprocess_env, _HERMES_PROVIDER_ENV_BLOCKLIST
# Pick a var we know is in the blocklist
blocked_var = next(iter(_HERMES_PROVIDER_ENV_BLOCKLIST))
env = {blocked_var: "secret_value", "PATH": "/usr/bin"}
result = _sanitize_subprocess_env(env)
assert blocked_var not in result
assert "PATH" in result
def test_passthrough_allows_blocklisted_var(self):
from tools.environments.local import _sanitize_subprocess_env, _HERMES_PROVIDER_ENV_BLOCKLIST
blocked_var = next(iter(_HERMES_PROVIDER_ENV_BLOCKLIST))
register_env_passthrough([blocked_var])
env = {blocked_var: "secret_value", "PATH": "/usr/bin"}
result = _sanitize_subprocess_env(env)
assert blocked_var in result
assert result[blocked_var] == "secret_value"
def test_make_run_env_passthrough(self, monkeypatch):
from tools.environments.local import _make_run_env, _HERMES_PROVIDER_ENV_BLOCKLIST
blocked_var = next(iter(_HERMES_PROVIDER_ENV_BLOCKLIST))
monkeypatch.setenv(blocked_var, "secret_value")
# Without passthrough — blocked
result_before = _make_run_env({})
assert blocked_var not in result_before
# With passthrough — allowed
register_env_passthrough([blocked_var])
result_after = _make_run_env({})
assert blocked_var in result_after
+3
View File
@@ -309,3 +309,6 @@ class TestSearchHints:
raw = search_tool(pattern="foo", offset=50, limit=50)
assert "[Hint:" in raw
assert "offset=100" in raw
+31
View File
@@ -288,3 +288,34 @@ class TestBlocklistCoverage:
"DAYTONA_API_KEY",
}
assert extras.issubset(_HERMES_PROVIDER_ENV_BLOCKLIST)
class TestSanePathIncludesHomebrew:
"""Verify _SANE_PATH includes macOS Homebrew directories."""
def test_sane_path_includes_homebrew_bin(self):
from tools.environments.local import _SANE_PATH
assert "/opt/homebrew/bin" in _SANE_PATH
def test_sane_path_includes_homebrew_sbin(self):
from tools.environments.local import _SANE_PATH
assert "/opt/homebrew/sbin" in _SANE_PATH
def test_make_run_env_appends_homebrew_on_minimal_path(self):
"""When PATH is minimal (no /usr/bin), _make_run_env should append
_SANE_PATH which now includes Homebrew dirs."""
from tools.environments.local import _make_run_env
minimal_env = {"PATH": "/some/custom/bin"}
with patch.dict(os.environ, minimal_env, clear=True):
result = _make_run_env({})
assert "/opt/homebrew/bin" in result["PATH"]
assert "/opt/homebrew/sbin" in result["PATH"]
def test_make_run_env_does_not_duplicate_on_full_path(self):
"""When PATH already has /usr/bin, _make_run_env should not append."""
from tools.environments.local import _make_run_env
full_env = {"PATH": "/usr/bin:/bin"}
with patch.dict(os.environ, full_env, clear=True):
result = _make_run_env({})
# Should keep existing PATH unchanged
assert result["PATH"] == "/usr/bin:/bin"
+18 -33
View File
@@ -1,11 +1,11 @@
"""Tests for Modal sandbox infrastructure fixes (TBLite baseline).
Covers the 9 bugs discovered while setting up TBLite evaluation:
1. Tool resolution terminal + file tools load with minisweagent
Covers the bugs discovered while setting up TBLite evaluation:
1. Tool resolution terminal + file tools load correctly
2. CWD fix host paths get replaced with /root for container backends
3. ephemeral_disk version check
4. Tilde ~ replaced with /root for container backends
5. ensurepip fix in patches.py for Modal image builder
5. ensurepip fix in Modal image builder
6. install_pipx stays True for swerex-remote
7. /home/ added to host prefix check
"""
@@ -36,17 +36,8 @@ except ImportError:
class TestToolResolution:
"""Verify get_tool_definitions returns all expected tools for eval."""
def _has_minisweagent(self):
try:
import minisweagent # noqa: F401
return True
except ImportError:
return False
def test_terminal_and_file_toolsets_resolve_all_tools(self):
"""enabled_toolsets=['terminal', 'file'] should produce 6 tools."""
if not self._has_minisweagent():
pytest.skip("minisweagent not installed (git submodule update --init)")
from model_tools import get_tool_definitions
tools = get_tool_definitions(
enabled_toolsets=["terminal", "file"],
@@ -58,18 +49,13 @@ class TestToolResolution:
def test_terminal_tool_present(self):
"""The terminal tool must be present (not silently dropped)."""
if not self._has_minisweagent():
pytest.skip("minisweagent not installed (git submodule update --init)")
from model_tools import get_tool_definitions
tools = get_tool_definitions(
enabled_toolsets=["terminal", "file"],
quiet_mode=True,
)
names = [t["function"]["name"] for t in tools]
assert "terminal" in names, (
f"terminal tool missing! Only got: {names}. "
"Check that minisweagent is installed (git submodule update --init)."
)
assert "terminal" in names, f"terminal tool missing! Only got: {names}."
# =========================================================================
@@ -269,38 +255,37 @@ class TestModalEnvironmentDefaults:
# =========================================================================
class TestEnsurepipFix:
"""Verify the pip fix is applied in the patched Modal init."""
"""Verify the pip fix is applied in the ModalEnvironment init."""
def test_patched_init_creates_image_with_setup_commands(self):
"""The patched __init__ should create a modal.Image with pip fix."""
def test_modal_environment_creates_image_with_setup_commands(self):
"""ModalEnvironment.__init__ should create a modal.Image with pip fix."""
try:
from environments.patches import _patch_swerex_modal
from tools.environments.modal import ModalEnvironment
except ImportError:
pytest.skip("environments.patches not importable")
pytest.skip("tools.environments.modal not importable")
# Check that the patch code references ensurepip
import inspect
source = inspect.getsource(_patch_swerex_modal)
source = inspect.getsource(ModalEnvironment.__init__)
assert "ensurepip" in source, (
"patches._patch_swerex_modal should include ensurepip fix "
"ModalEnvironment should include ensurepip fix "
"for Modal's legacy image builder"
)
assert "setup_dockerfile_commands" in source, (
"patches._patch_swerex_modal should use setup_dockerfile_commands "
"ModalEnvironment should use setup_dockerfile_commands "
"to fix pip before Modal's bootstrap"
)
def test_patched_init_uses_install_pipx_from_config(self):
"""The patched init should respect install_pipx from config."""
def test_modal_environment_uses_install_pipx(self):
"""ModalEnvironment should pass install_pipx to ModalDeployment."""
try:
from environments.patches import _patch_swerex_modal
from tools.environments.modal import ModalEnvironment
except ImportError:
pytest.skip("environments.patches not importable")
pytest.skip("tools.environments.modal not importable")
import inspect
source = inspect.getsource(_patch_swerex_modal)
source = inspect.getsource(ModalEnvironment.__init__)
assert "install_pipx" in source, (
"patches._patch_swerex_modal should pass install_pipx to ModalDeployment"
"ModalEnvironment should pass install_pipx to ModalDeployment"
)
+105
View File
@@ -0,0 +1,105 @@
"""Test that skill_view registers required env vars in the passthrough registry."""
import json
import os
from pathlib import Path
from unittest.mock import patch
import pytest
from tools.env_passthrough import clear_env_passthrough, is_env_passthrough, reset_config_cache
@pytest.fixture(autouse=True)
def _clean_passthrough():
clear_env_passthrough()
reset_config_cache()
yield
clear_env_passthrough()
reset_config_cache()
def _create_skill(tmp_path, name, frontmatter_extra=""):
"""Create a minimal skill directory with SKILL.md."""
skill_dir = tmp_path / name
skill_dir.mkdir(parents=True, exist_ok=True)
(skill_dir / "SKILL.md").write_text(
f"---\n"
f"name: {name}\n"
f"description: Test skill\n"
f"{frontmatter_extra}"
f"---\n\n"
f"# {name}\n\n"
f"Test content.\n"
)
return skill_dir
class TestSkillViewRegistersPassthrough:
def test_available_env_vars_registered(self, tmp_path, monkeypatch):
"""When a skill declares required_environment_variables and the var IS set,
it should be registered in the passthrough."""
_create_skill(
tmp_path,
"test-skill",
frontmatter_extra=(
"required_environment_variables:\n"
" - name: TENOR_API_KEY\n"
" prompt: Enter your Tenor API key\n"
),
)
monkeypatch.setattr(
"tools.skills_tool.SKILLS_DIR", tmp_path
)
# Set the env var so it's "available"
monkeypatch.setenv("TENOR_API_KEY", "test-value-123")
# Patch the secret capture callback to not prompt
with patch("tools.skills_tool._secret_capture_callback", None):
from tools.skills_tool import skill_view
result = json.loads(skill_view(name="test-skill"))
assert result["success"] is True
assert is_env_passthrough("TENOR_API_KEY")
def test_missing_env_vars_not_registered(self, tmp_path, monkeypatch):
"""When a skill declares required_environment_variables but the var is NOT set,
it should NOT be registered in the passthrough."""
_create_skill(
tmp_path,
"test-skill",
frontmatter_extra=(
"required_environment_variables:\n"
" - name: NONEXISTENT_SKILL_KEY_XYZ\n"
" prompt: Enter your key\n"
),
)
monkeypatch.setattr(
"tools.skills_tool.SKILLS_DIR", tmp_path
)
monkeypatch.delenv("NONEXISTENT_SKILL_KEY_XYZ", raising=False)
with patch("tools.skills_tool._secret_capture_callback", None):
from tools.skills_tool import skill_view
result = json.loads(skill_view(name="test-skill"))
assert result["success"] is True
assert not is_env_passthrough("NONEXISTENT_SKILL_KEY_XYZ")
def test_no_env_vars_skill_no_registration(self, tmp_path, monkeypatch):
"""Skills without required_environment_variables shouldn't register anything."""
_create_skill(tmp_path, "simple-skill")
monkeypatch.setattr(
"tools.skills_tool.SKILLS_DIR", tmp_path
)
with patch("tools.skills_tool._secret_capture_callback", None):
from tools.skills_tool import skill_view
result = json.loads(skill_view(name="simple-skill"))
assert result["success"] is True
from tools.env_passthrough import get_all_passthrough
assert len(get_all_passthrough()) == 0
+3 -4
View File
@@ -18,9 +18,8 @@ def _clear_terminal_env(monkeypatch):
monkeypatch.delenv(key, raising=False)
def test_local_terminal_requirements_do_not_depend_on_minisweagent(monkeypatch, caplog):
"""Local backend uses Hermes' own LocalEnvironment wrapper and should not
be marked unavailable just because `minisweagent` isn't importable."""
def test_local_terminal_requirements(monkeypatch, caplog):
"""Local backend uses Hermes' own LocalEnvironment wrapper."""
_clear_terminal_env(monkeypatch)
monkeypatch.setenv("TERMINAL_ENV", "local")
@@ -64,7 +63,7 @@ def test_modal_backend_without_token_or_config_logs_specific_error(monkeypatch,
monkeypatch.setenv("TERMINAL_ENV", "modal")
monkeypatch.setenv("HOME", str(tmp_path))
monkeypatch.setenv("USERPROFILE", str(tmp_path))
monkeypatch.setattr(terminal_tool_module, "ensure_minisweagent_on_path", lambda *_args, **_kwargs: None)
# Pretend swerex is installed
monkeypatch.setattr(terminal_tool_module.importlib.util, "find_spec", lambda _name: object())
with caplog.at_level(logging.ERROR):
@@ -8,7 +8,7 @@ terminal_tool_module = importlib.import_module("tools.terminal_tool")
class TestTerminalRequirements:
def test_local_backend_does_not_require_minisweagent_package(self, monkeypatch):
def test_local_backend_requirements(self, monkeypatch):
monkeypatch.setattr(
terminal_tool_module,
"_get_env_config",
+176
View File
@@ -0,0 +1,176 @@
"""Tests for SSRF protection in url_safety module."""
import socket
from unittest.mock import patch
from tools.url_safety import is_safe_url, _is_blocked_ip
import ipaddress
import pytest
class TestIsSafeUrl:
def test_public_url_allowed(self):
with patch("socket.getaddrinfo", return_value=[
(2, 1, 6, "", ("93.184.216.34", 0)),
]):
assert is_safe_url("https://example.com/image.png") is True
def test_localhost_blocked(self):
with patch("socket.getaddrinfo", return_value=[
(2, 1, 6, "", ("127.0.0.1", 0)),
]):
assert is_safe_url("http://localhost:8080/secret") is False
def test_loopback_ip_blocked(self):
with patch("socket.getaddrinfo", return_value=[
(2, 1, 6, "", ("127.0.0.1", 0)),
]):
assert is_safe_url("http://127.0.0.1/admin") is False
def test_private_10_blocked(self):
with patch("socket.getaddrinfo", return_value=[
(2, 1, 6, "", ("10.0.0.1", 0)),
]):
assert is_safe_url("http://internal-service.local/api") is False
def test_private_172_blocked(self):
with patch("socket.getaddrinfo", return_value=[
(2, 1, 6, "", ("172.16.0.1", 0)),
]):
assert is_safe_url("http://private.corp/data") is False
def test_private_192_blocked(self):
with patch("socket.getaddrinfo", return_value=[
(2, 1, 6, "", ("192.168.1.1", 0)),
]):
assert is_safe_url("http://router.local") is False
def test_link_local_169_254_blocked(self):
with patch("socket.getaddrinfo", return_value=[
(2, 1, 6, "", ("169.254.169.254", 0)),
]):
assert is_safe_url("http://169.254.169.254/latest/meta-data/") is False
def test_metadata_google_internal_blocked(self):
assert is_safe_url("http://metadata.google.internal/computeMetadata/v1/") is False
def test_ipv6_loopback_blocked(self):
with patch("socket.getaddrinfo", return_value=[
(10, 1, 6, "", ("::1", 0, 0, 0)),
]):
assert is_safe_url("http://[::1]:8080/") is False
def test_dns_failure_blocked(self):
"""DNS failures now fail closed — block the request."""
with patch("socket.getaddrinfo", side_effect=socket.gaierror("Name resolution failed")):
assert is_safe_url("https://nonexistent.example.com") is False
def test_empty_url_blocked(self):
assert is_safe_url("") is False
def test_no_hostname_blocked(self):
assert is_safe_url("http://") is False
def test_public_ip_allowed(self):
with patch("socket.getaddrinfo", return_value=[
(2, 1, 6, "", ("93.184.216.34", 0)),
]):
assert is_safe_url("https://example.com") is True
# ── New tests for hardened SSRF protection ──
def test_cgnat_100_64_blocked(self):
"""100.64.0.0/10 (CGNAT/Shared Address Space) is NOT covered by
ipaddress.is_private must be blocked explicitly."""
with patch("socket.getaddrinfo", return_value=[
(2, 1, 6, "", ("100.64.0.1", 0)),
]):
assert is_safe_url("http://some-cgnat-host.example/") is False
def test_cgnat_100_127_blocked(self):
"""Upper end of CGNAT range (100.127.255.255)."""
with patch("socket.getaddrinfo", return_value=[
(2, 1, 6, "", ("100.127.255.254", 0)),
]):
assert is_safe_url("http://tailscale-peer.example/") is False
def test_multicast_blocked(self):
"""Multicast addresses (224.0.0.0/4) not caught by is_private."""
with patch("socket.getaddrinfo", return_value=[
(2, 1, 6, "", ("224.0.0.251", 0)),
]):
assert is_safe_url("http://mdns-host.local/") is False
def test_multicast_ipv6_blocked(self):
with patch("socket.getaddrinfo", return_value=[
(10, 1, 6, "", ("ff02::1", 0, 0, 0)),
]):
assert is_safe_url("http://[ff02::1]/") is False
def test_ipv4_mapped_ipv6_loopback_blocked(self):
"""::ffff:127.0.0.1 — IPv4-mapped IPv6 loopback."""
with patch("socket.getaddrinfo", return_value=[
(10, 1, 6, "", ("::ffff:127.0.0.1", 0, 0, 0)),
]):
assert is_safe_url("http://[::ffff:127.0.0.1]/") is False
def test_ipv4_mapped_ipv6_metadata_blocked(self):
"""::ffff:169.254.169.254 — IPv4-mapped IPv6 cloud metadata."""
with patch("socket.getaddrinfo", return_value=[
(10, 1, 6, "", ("::ffff:169.254.169.254", 0, 0, 0)),
]):
assert is_safe_url("http://[::ffff:169.254.169.254]/") is False
def test_unspecified_address_blocked(self):
"""0.0.0.0 — unspecified address, can bind to all interfaces."""
with patch("socket.getaddrinfo", return_value=[
(2, 1, 6, "", ("0.0.0.0", 0)),
]):
assert is_safe_url("http://0.0.0.0/") is False
def test_unexpected_error_fails_closed(self):
"""Unexpected exceptions should block, not allow."""
with patch("tools.url_safety.urlparse", side_effect=ValueError("bad url")):
assert is_safe_url("http://evil.com/") is False
def test_metadata_goog_blocked(self):
assert is_safe_url("http://metadata.goog/computeMetadata/v1/") is False
def test_ipv6_unique_local_blocked(self):
"""fc00::/7 — IPv6 unique local addresses."""
with patch("socket.getaddrinfo", return_value=[
(10, 1, 6, "", ("fd12::1", 0, 0, 0)),
]):
assert is_safe_url("http://[fd12::1]/internal") is False
def test_non_cgnat_100_allowed(self):
"""100.0.0.1 is NOT in CGNAT range (100.64.0.0/10), should be allowed."""
with patch("socket.getaddrinfo", return_value=[
(2, 1, 6, "", ("100.0.0.1", 0)),
]):
# 100.0.0.1 is a global IP, not in CGNAT range
assert is_safe_url("http://legit-host.example/") is True
class TestIsBlockedIp:
"""Direct tests for the _is_blocked_ip helper."""
@pytest.mark.parametrize("ip_str", [
"127.0.0.1", "10.0.0.1", "172.16.0.1", "192.168.1.1",
"169.254.169.254", "0.0.0.0", "224.0.0.1", "255.255.255.255",
"100.64.0.1", "100.100.100.100", "100.127.255.254",
"::1", "fe80::1", "fc00::1", "fd12::1", "ff02::1",
"::ffff:127.0.0.1", "::ffff:169.254.169.254",
])
def test_blocked_ips(self, ip_str):
ip = ipaddress.ip_address(ip_str)
assert _is_blocked_ip(ip) is True, f"{ip_str} should be blocked"
@pytest.mark.parametrize("ip_str", [
"8.8.8.8", "93.184.216.34", "1.1.1.1", "100.0.0.1",
"2606:4700::1", "2001:4860:4860::8888",
])
def test_allowed_ips(self, ip_str):
ip = ipaddress.ip_address(ip_str)
assert _is_blocked_ip(ip) is False, f"{ip_str} should be allowed"
+17 -4
View File
@@ -33,17 +33,30 @@ class TestValidateImageUrl:
assert _validate_image_url("https://example.com/image.jpg") is True
def test_valid_http_url(self):
assert _validate_image_url("http://cdn.example.org/photo.png") is True
with patch("tools.url_safety.socket.getaddrinfo", return_value=[
(2, 1, 6, "", ("93.184.216.34", 0)),
]):
assert _validate_image_url("http://cdn.example.org/photo.png") is True
def test_valid_url_without_extension(self):
"""CDN endpoints that redirect to images should still pass."""
assert _validate_image_url("https://cdn.example.com/abcdef123") is True
with patch("tools.url_safety.socket.getaddrinfo", return_value=[
(2, 1, 6, "", ("93.184.216.34", 0)),
]):
assert _validate_image_url("https://cdn.example.com/abcdef123") is True
def test_valid_url_with_query_params(self):
assert _validate_image_url("https://img.example.com/pic?w=200&h=200") is True
with patch("tools.url_safety.socket.getaddrinfo", return_value=[
(2, 1, 6, "", ("93.184.216.34", 0)),
]):
assert _validate_image_url("https://img.example.com/pic?w=200&h=200") is True
def test_localhost_url_blocked_by_ssrf(self):
"""localhost URLs are now blocked by SSRF protection."""
assert _validate_image_url("http://localhost:8080/image.png") is False
def test_valid_url_with_port(self):
assert _validate_image_url("http://localhost:8080/image.png") is True
assert _validate_image_url("http://example.com:8080/image.png") is True
def test_valid_url_with_path_only(self):
assert _validate_image_url("https://example.com/") is True
+9
View File
@@ -343,6 +343,8 @@ def test_browser_navigate_allows_when_shared_file_missing(monkeypatch, tmp_path)
async def test_web_extract_short_circuits_blocked_url(monkeypatch):
from tools import web_tools
# Allow test URLs past SSRF check so website policy is what gets tested
monkeypatch.setattr(web_tools, "is_safe_url", lambda url: True)
monkeypatch.setattr(
web_tools,
"check_website_access",
@@ -389,6 +391,9 @@ def test_check_website_access_fails_open_on_malformed_config(tmp_path, monkeypat
async def test_web_extract_blocks_redirected_final_url(monkeypatch):
from tools import web_tools
# Allow test URLs past SSRF check so website policy is what gets tested
monkeypatch.setattr(web_tools, "is_safe_url", lambda url: True)
def fake_check(url):
if url == "https://allowed.test":
return None
@@ -428,6 +433,8 @@ async def test_web_crawl_short_circuits_blocked_url(monkeypatch):
# web_crawl_tool checks for Firecrawl env before website policy
monkeypatch.setenv("FIRECRAWL_API_KEY", "fake-key")
# Allow test URLs past SSRF check so website policy is what gets tested
monkeypatch.setattr(web_tools, "is_safe_url", lambda url: True)
monkeypatch.setattr(
web_tools,
"check_website_access",
@@ -457,6 +464,8 @@ async def test_web_crawl_blocks_redirected_final_url(monkeypatch):
# web_crawl_tool checks for Firecrawl env before website policy
monkeypatch.setenv("FIRECRAWL_API_KEY", "fake-key")
# Allow test URLs past SSRF check so website policy is what gets tested
monkeypatch.setattr(web_tools, "is_safe_url", lambda url: True)
def fake_check(url):
if url == "https://allowed.test":
+3 -3
View File
@@ -6,7 +6,7 @@ This package contains all the specific tool implementations for the Hermes Agent
Each module provides specialized functionality for different capabilities:
- web_tools: Web search, content extraction, and crawling
- terminal_tool: Command execution using mini-swe-agent (local/docker/modal/daytona backends)
- terminal_tool: Command execution (local/docker/modal/daytona/ssh/singularity backends)
- vision_tools: Image analysis and understanding
- mixture_of_agents_tool: Multi-model collaborative reasoning
- image_generation_tool: Text-to-image generation with upscaling
@@ -23,7 +23,7 @@ from .web_tools import (
check_firecrawl_api_key
)
# Primary terminal tool (mini-swe-agent backend: local/docker/singularity/modal/daytona)
# Primary terminal tool (local/docker/singularity/modal/daytona/ssh)
from .terminal_tool import (
terminal_tool,
check_terminal_requirements,
@@ -166,7 +166,7 @@ __all__ = [
'web_extract_tool',
'web_crawl_tool',
'check_firecrawl_api_key',
# Terminal tools (mini-swe-agent backend)
# Terminal tools
'terminal_tool',
'check_terminal_requirements',
'cleanup_vm',
+44
View File
@@ -0,0 +1,44 @@
"""Strip ANSI escape sequences from subprocess output.
Used by terminal_tool, code_execution_tool, and process_registry to clean
command output before returning it to the model. This prevents ANSI codes
from entering the model's context — which is the root cause of models
copying escape sequences into file writes.
Covers the full ECMA-48 spec: CSI (including private-mode ``?`` prefix,
colon-separated params, intermediate bytes), OSC (BEL and ST terminators),
DCS/SOS/PM/APC string sequences, nF multi-byte escapes, Fp/Fe/Fs
single-byte escapes, and 8-bit C1 control characters.
"""
import re
_ANSI_ESCAPE_RE = re.compile(
r"\x1b"
r"(?:"
r"\[[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e]" # CSI sequence
r"|\][\s\S]*?(?:\x07|\x1b\\)" # OSC (BEL or ST terminator)
r"|[PX^_][\s\S]*?(?:\x1b\\)" # DCS/SOS/PM/APC strings
r"|[\x20-\x2f]+[\x30-\x7e]" # nF escape sequences
r"|[\x30-\x7e]" # Fp/Fe/Fs single-byte
r")"
r"|\x9b[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e]" # 8-bit CSI
r"|\x9d[\s\S]*?(?:\x07|\x9c)" # 8-bit OSC
r"|[\x80-\x9f]", # Other 8-bit C1 controls
re.DOTALL,
)
# Fast-path check — skip full regex when no escape-like bytes are present.
_HAS_ESCAPE = re.compile(r"[\x1b\x80-\x9f]")
def strip_ansi(text: str) -> str:
"""Remove ANSI escape sequences from text.
Returns the input unchanged (fast path) when no ESC or C1 bytes are
present. Safe to call on any string clean text passes through
with negligible overhead.
"""
if not text or not _HAS_ESCAPE.search(text):
return text
return _ANSI_ESCAPE_RE.sub("", text)
+103 -11
View File
@@ -76,8 +76,35 @@ from tools.browser_providers.browser_use import BrowserUseProvider
logger = logging.getLogger(__name__)
# Standard PATH entries for environments with minimal PATH (e.g. systemd services)
_SANE_PATH = "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
# Standard PATH entries for environments with minimal PATH (e.g. systemd services).
# Includes macOS Homebrew paths (/opt/homebrew/* for Apple Silicon).
_SANE_PATH = (
"/opt/homebrew/bin:/opt/homebrew/sbin:"
"/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
)
def _discover_homebrew_node_dirs() -> list[str]:
"""Find Homebrew versioned Node.js bin directories (e.g. node@20, node@24).
When Node is installed via ``brew install node@24`` and NOT linked into
/opt/homebrew/bin, the binary lives only in /opt/homebrew/opt/node@24/bin/.
This function discovers those paths so they can be added to subprocess PATH.
"""
dirs: list[str] = []
homebrew_opt = "/opt/homebrew/opt"
if not os.path.isdir(homebrew_opt):
return dirs
try:
for entry in os.listdir(homebrew_opt):
if entry.startswith("node") and entry != "node":
# e.g. node@20, node@24
bin_dir = os.path.join(homebrew_opt, entry, "bin")
if os.path.isdir(bin_dir):
dirs.append(bin_dir)
except OSError:
pass
return dirs
# Throttle screenshot cleanup to avoid repeated full directory scans.
_last_screenshot_cleanup_by_dir: dict[str, float] = {}
@@ -96,6 +123,27 @@ DEFAULT_SESSION_TIMEOUT = 300
SNAPSHOT_SUMMARIZE_THRESHOLD = 8000
def _get_command_timeout() -> int:
"""Return the configured browser command timeout from config.yaml.
Reads ``config["browser"]["command_timeout"]`` and falls back to
``DEFAULT_COMMAND_TIMEOUT`` (30s) if unset or unreadable.
"""
try:
hermes_home = Path(os.environ.get("HERMES_HOME", Path.home() / ".hermes"))
config_path = hermes_home / "config.yaml"
if config_path.exists():
import yaml
with open(config_path) as f:
cfg = yaml.safe_load(f) or {}
val = cfg.get("browser", {}).get("command_timeout")
if val is not None:
return max(int(val), 5) # Floor at 5s to avoid instant kills
except Exception as e:
logger.debug("Could not read command_timeout from config: %s", e)
return DEFAULT_COMMAND_TIMEOUT
def _get_vision_model() -> Optional[str]:
"""Model for browser_vision (screenshot analysis — multimodal)."""
return os.getenv("AUXILIARY_VISION_MODEL", "").strip() or None
@@ -619,7 +667,8 @@ def _find_agent_browser() -> str:
"""
Find the agent-browser CLI executable.
Checks in order: PATH, local node_modules/.bin/, npx fallback.
Checks in order: current PATH, Homebrew/common bin dirs, Hermes-managed
node, local node_modules/.bin/, npx fallback.
Returns:
Path to agent-browser executable
@@ -632,15 +681,36 @@ def _find_agent_browser() -> str:
which_result = shutil.which("agent-browser")
if which_result:
return which_result
# Build an extended search PATH including Homebrew and Hermes-managed dirs.
# This covers macOS where the process PATH may not include Homebrew paths.
extra_dirs: list[str] = []
for d in ["/opt/homebrew/bin", "/usr/local/bin"]:
if os.path.isdir(d):
extra_dirs.append(d)
extra_dirs.extend(_discover_homebrew_node_dirs())
hermes_home = Path(os.environ.get("HERMES_HOME", Path.home() / ".hermes"))
hermes_node_bin = str(hermes_home / "node" / "bin")
if os.path.isdir(hermes_node_bin):
extra_dirs.append(hermes_node_bin)
if extra_dirs:
extended_path = os.pathsep.join(extra_dirs)
which_result = shutil.which("agent-browser", path=extended_path)
if which_result:
return which_result
# Check local node_modules/.bin/ (npm install in repo root)
repo_root = Path(__file__).parent.parent
local_bin = repo_root / "node_modules" / ".bin" / "agent-browser"
if local_bin.exists():
return str(local_bin)
# Check common npx locations
# Check common npx locations (also search extended dirs)
npx_path = shutil.which("npx")
if not npx_path and extra_dirs:
npx_path = shutil.which("npx", path=os.pathsep.join(extra_dirs))
if npx_path:
return "npx agent-browser"
@@ -676,7 +746,7 @@ def _run_browser_command(
task_id: str,
command: str,
args: List[str] = None,
timeout: int = DEFAULT_COMMAND_TIMEOUT
timeout: Optional[int] = None,
) -> Dict[str, Any]:
"""
Run an agent-browser CLI command using our pre-created Browserbase session.
@@ -685,11 +755,14 @@ def _run_browser_command(
task_id: Task identifier to get the right session
command: The command to run (e.g., "open", "click")
args: Additional arguments for the command
timeout: Command timeout in seconds
timeout: Command timeout in seconds. ``None`` reads
``browser.command_timeout`` from config (default 30s).
Returns:
Parsed JSON response from agent-browser
"""
if timeout is None:
timeout = _get_command_timeout()
args = args or []
# Build the command
@@ -742,13 +815,18 @@ def _run_browser_command(
browser_env = {**os.environ}
# Ensure PATH includes Hermes-managed Node first, then standard system dirs.
# Ensure PATH includes Hermes-managed Node first, Homebrew versioned
# node dirs (for macOS ``brew install node@24``), then standard system dirs.
hermes_home = Path(os.environ.get("HERMES_HOME", Path.home() / ".hermes"))
hermes_node_bin = str(hermes_home / "node" / "bin")
existing_path = browser_env.get("PATH", "")
path_parts = [p for p in existing_path.split(":") if p]
candidate_dirs = [hermes_node_bin] + [p for p in _SANE_PATH.split(":") if p]
candidate_dirs = (
[hermes_node_bin]
+ _discover_homebrew_node_dirs()
+ [p for p in _SANE_PATH.split(":") if p]
)
for part in reversed(candidate_dirs):
if os.path.isdir(part) and part not in path_parts:
@@ -968,7 +1046,7 @@ def browser_navigate(url: str, task_id: Optional[str] = None) -> str:
session_info["_first_nav"] = False
_maybe_start_recording(effective_task_id)
result = _run_browser_command(effective_task_id, "open", [url], timeout=60)
result = _run_browser_command(effective_task_id, "open", [url], timeout=max(_get_command_timeout(), 60))
if result.get("success"):
data = result.get("data", {})
@@ -1442,7 +1520,6 @@ def browser_vision(question: str, annotate: bool = False, task_id: Optional[str]
effective_task_id,
"screenshot",
screenshot_args,
timeout=30
)
if not result.get("success"):
@@ -1490,6 +1567,20 @@ def browser_vision(question: str, annotate: bool = False, task_id: Optional[str]
vision_model = _get_vision_model()
logger.debug("browser_vision: analysing screenshot (%d bytes)",
len(image_data))
# Read vision timeout from config (auxiliary.vision.timeout), default 120s.
# Local vision models (llama.cpp, ollama) can take well over 30s for
# screenshot analysis, so the default must be generous.
vision_timeout = 120.0
try:
from hermes_cli.config import load_config
_cfg = load_config()
_vt = _cfg.get("auxiliary", {}).get("vision", {}).get("timeout")
if _vt is not None:
vision_timeout = float(_vt)
except Exception:
pass
call_kwargs = {
"task": "vision",
"messages": [
@@ -1503,6 +1594,7 @@ def browser_vision(question: str, annotate: bool = False, task_id: Optional[str]
],
"max_tokens": 2000,
"temperature": 0.1,
"timeout": vision_timeout,
}
if vision_model:
call_kwargs["model"] = vision_model
+20 -1
View File
@@ -428,21 +428,34 @@ def execute_code(
# Build a minimal environment for the child. We intentionally exclude
# API keys and tokens to prevent credential exfiltration from LLM-
# generated scripts. The child accesses tools via RPC, not direct API.
# Exception: env vars declared by loaded skills (via env_passthrough
# registry) or explicitly allowed by the user in config.yaml
# (terminal.env_passthrough) are passed through.
_SAFE_ENV_PREFIXES = ("PATH", "HOME", "USER", "LANG", "LC_", "TERM",
"TMPDIR", "TMP", "TEMP", "SHELL", "LOGNAME",
"XDG_", "PYTHONPATH", "VIRTUAL_ENV", "CONDA")
_SECRET_SUBSTRINGS = ("KEY", "TOKEN", "SECRET", "PASSWORD", "CREDENTIAL",
"PASSWD", "AUTH")
try:
from tools.env_passthrough import is_env_passthrough as _is_passthrough
except Exception:
_is_passthrough = lambda _: False # noqa: E731
child_env = {}
for k, v in os.environ.items():
# Passthrough vars (skill-declared or user-configured) always pass.
if _is_passthrough(k):
child_env[k] = v
continue
# Block vars with secret-like names.
if any(s in k.upper() for s in _SECRET_SUBSTRINGS):
continue
# Allow vars with known safe prefixes.
if any(k.startswith(p) for p in _SAFE_ENV_PREFIXES):
child_env[k] = v
child_env["HERMES_RPC_SOCKET"] = sock_path
child_env["PYTHONDONTWRITEBYTECODE"] = "1"
# Ensure the hermes-agent root is importable in the sandbox so
# modules like minisweagent_path are available to child scripts.
# repo-root modules are available to child scripts.
_hermes_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
_existing_pp = child_env.get("PYTHONPATH", "")
child_env["PYTHONPATH"] = _hermes_root + (os.pathsep + _existing_pp if _existing_pp else "")
@@ -577,6 +590,12 @@ def execute_code(
server_sock = None # prevent double close in finally
rpc_thread.join(timeout=3)
# Strip ANSI escape sequences so the model never sees terminal
# formatting — prevents it from copying escapes into file writes.
from tools.ansi_strip import strip_ansi
stdout_text = strip_ansi(stdout_text)
stderr_text = strip_ansi(stderr_text)
# Build response
result: Dict[str, Any] = {
"status": status,
+99
View File
@@ -0,0 +1,99 @@
"""Environment variable passthrough registry.
Skills that declare ``required_environment_variables`` in their frontmatter
need those vars available in sandboxed execution environments (execute_code,
terminal). By default both sandboxes strip secrets from the child process
environment for security. This module provides a session-scoped allowlist
so skill-declared vars (and user-configured overrides) pass through.
Two sources feed the allowlist:
1. **Skill declarations** when a skill is loaded via ``skill_view``, its
``required_environment_variables`` are registered here automatically.
2. **User config** ``terminal.env_passthrough`` in config.yaml lets users
explicitly allowlist vars for non-skill use cases.
Both ``code_execution_tool.py`` and ``tools/environments/local.py`` consult
:func:`is_env_passthrough` before stripping a variable.
"""
from __future__ import annotations
import logging
import os
from pathlib import Path
from typing import Iterable
logger = logging.getLogger(__name__)
# Session-scoped set of env var names that should pass through to sandboxes.
_allowed_env_vars: set[str] = set()
# Cache for the config-based allowlist (loaded once per process).
_config_passthrough: frozenset[str] | None = None
def register_env_passthrough(var_names: Iterable[str]) -> None:
"""Register environment variable names as allowed in sandboxed environments.
Typically called when a skill declares ``required_environment_variables``.
"""
for name in var_names:
name = name.strip()
if name:
_allowed_env_vars.add(name)
logger.debug("env passthrough: registered %s", name)
def _load_config_passthrough() -> frozenset[str]:
"""Load ``tools.env_passthrough`` from config.yaml (cached)."""
global _config_passthrough
if _config_passthrough is not None:
return _config_passthrough
result: set[str] = set()
try:
hermes_home = Path(os.environ.get("HERMES_HOME", Path.home() / ".hermes"))
config_path = hermes_home / "config.yaml"
if config_path.exists():
import yaml
with open(config_path) as f:
cfg = yaml.safe_load(f) or {}
passthrough = cfg.get("terminal", {}).get("env_passthrough")
if isinstance(passthrough, list):
for item in passthrough:
if isinstance(item, str) and item.strip():
result.add(item.strip())
except Exception as e:
logger.debug("Could not read tools.env_passthrough from config: %s", e)
_config_passthrough = frozenset(result)
return _config_passthrough
def is_env_passthrough(var_name: str) -> bool:
"""Check whether *var_name* is allowed to pass through to sandboxes.
Returns ``True`` if the variable was registered by a skill or listed in
the user's ``tools.env_passthrough`` config.
"""
if var_name in _allowed_env_vars:
return True
return var_name in _load_config_passthrough()
def get_all_passthrough() -> frozenset[str]:
"""Return the union of skill-registered and config-based passthrough vars."""
return frozenset(_allowed_env_vars) | _load_config_passthrough()
def clear_env_passthrough() -> None:
"""Reset the skill-scoped allowlist (e.g. on session reset)."""
_allowed_env_vars.clear()
def reset_config_cache() -> None:
"""Force re-read of config on next access (for testing)."""
global _config_passthrough
_config_passthrough = None
+44 -29
View File
@@ -1,6 +1,6 @@
"""Docker execution environment wrapping mini-swe-agent's DockerEnvironment.
"""Docker execution environment for sandboxed command execution.
Adds security hardening (cap-drop ALL, no-new-privileges, PID limits),
Security hardened (cap-drop ALL, no-new-privileges, PID limits),
configurable resource limits (CPU, memory, disk), and optional filesystem
persistence via bind mounts.
"""
@@ -13,6 +13,7 @@ import subprocess
import sys
import threading
import time
import uuid
from typing import Optional
from tools.environments.base import BaseEnvironment
@@ -227,12 +228,9 @@ class DockerEnvironment(BaseEnvironment):
logger.warning(f"docker_volumes config is not a list: {volumes!r}")
volumes = []
# Fail fast if Docker is not available rather than surfacing a cryptic
# FileNotFoundError deep inside the mini-swe-agent stack.
# Fail fast if Docker is not available.
_ensure_docker_available()
from minisweagent.environments.docker import DockerEnvironment as _Docker
# Build resource limit args
resource_args = []
if cpu > 0:
@@ -320,14 +318,28 @@ class DockerEnvironment(BaseEnvironment):
# Resolve the docker executable once so it works even when
# /usr/local/bin is not in PATH (common on macOS gateway/service).
docker_exe = find_docker() or "docker"
self._docker_exe = find_docker() or "docker"
self._inner = _Docker(
image=image, cwd=cwd, timeout=timeout,
run_args=all_run_args,
executable=docker_exe,
# Start the container directly via `docker run -d`.
container_name = f"hermes-{uuid.uuid4().hex[:8]}"
run_cmd = [
self._docker_exe, "run", "-d",
"--name", container_name,
"-w", cwd,
*all_run_args,
image,
"sleep", "2h",
]
logger.debug(f"Starting container: {' '.join(run_cmd)}")
result = subprocess.run(
run_cmd,
capture_output=True,
text=True,
timeout=120, # image pull may take a while
check=True,
)
self._container_id = self._inner.container_id
self._container_id = result.stdout.strip()
logger.info(f"Started container {container_name} ({self._container_id[:12]})")
@staticmethod
def _storage_opt_supported() -> bool:
@@ -389,8 +401,8 @@ class DockerEnvironment(BaseEnvironment):
exec_command = f"cd {work_dir} && {exec_command}"
work_dir = "/"
assert self._inner.container_id, "Container not started"
cmd = [self._inner.config.executable, "exec"]
assert self._container_id, "Container not started"
cmd = [self._docker_exe, "exec"]
if effective_stdin is not None:
cmd.append("-i")
cmd.extend(["-w", work_dir])
@@ -401,9 +413,7 @@ class DockerEnvironment(BaseEnvironment):
value = hermes_env.get(key)
if value is not None:
cmd.extend(["-e", f"{key}={value}"])
for key, value in self._inner.config.env.items():
cmd.extend(["-e", f"{key}={value}"])
cmd.extend([self._inner.container_id, "bash", "-lc", exec_command])
cmd.extend([self._container_id, "bash", "-lc", exec_command])
try:
_output_chunks = []
@@ -456,24 +466,29 @@ class DockerEnvironment(BaseEnvironment):
def cleanup(self):
"""Stop and remove the container. Bind-mount dirs persist if persistent=True."""
self._inner.cleanup()
if not self._persistent and self._container_id:
# Inner cleanup only runs `docker stop` in background; container is left
# as stopped. When container_persistent=false we must remove it.
docker_exe = find_docker() or self._inner.config.executable
if self._container_id:
try:
subprocess.run(
[docker_exe, "rm", "-f", self._container_id],
capture_output=True,
timeout=30,
# Stop in background so cleanup doesn't block
stop_cmd = (
f"(timeout 60 {self._docker_exe} stop {self._container_id} || "
f"{self._docker_exe} rm -f {self._container_id}) >/dev/null 2>&1 &"
)
subprocess.Popen(stop_cmd, shell=True)
except Exception as e:
logger.warning("Failed to remove non-persistent container %s: %s", self._container_id, e)
logger.warning("Failed to stop container %s: %s", self._container_id, e)
if not self._persistent:
# Also schedule removal (stop only leaves it as stopped)
try:
subprocess.Popen(
f"sleep 3 && {self._docker_exe} rm -f {self._container_id} >/dev/null 2>&1 &",
shell=True,
)
except Exception:
pass
self._container_id = None
if not self._persistent:
import shutil
for d in (self._workspace_dir, self._home_dir):
if d:
shutil.rmtree(d, ignore_errors=True)
+22 -5
View File
@@ -135,21 +135,28 @@ def _sanitize_subprocess_env(base_env: dict | None, extra_env: dict | None = Non
"""Filter Hermes-managed secrets from a subprocess environment.
`_HERMES_FORCE_<VAR>` entries in ``extra_env`` opt a blocked variable back in
intentionally for callers that truly need it.
intentionally for callers that truly need it. Vars registered via
:mod:`tools.env_passthrough` (skill-declared or user-configured) also
bypass the blocklist.
"""
try:
from tools.env_passthrough import is_env_passthrough as _is_passthrough
except Exception:
_is_passthrough = lambda _: False # noqa: E731
sanitized: dict[str, str] = {}
for key, value in (base_env or {}).items():
if key.startswith(_HERMES_PROVIDER_ENV_FORCE_PREFIX):
continue
if key not in _HERMES_PROVIDER_ENV_BLOCKLIST:
if key not in _HERMES_PROVIDER_ENV_BLOCKLIST or _is_passthrough(key):
sanitized[key] = value
for key, value in (extra_env or {}).items():
if key.startswith(_HERMES_PROVIDER_ENV_FORCE_PREFIX):
real_key = key[len(_HERMES_PROVIDER_ENV_FORCE_PREFIX):]
sanitized[real_key] = value
elif key not in _HERMES_PROVIDER_ENV_BLOCKLIST:
elif key not in _HERMES_PROVIDER_ENV_BLOCKLIST or _is_passthrough(key):
sanitized[key] = value
return sanitized
@@ -254,18 +261,28 @@ def _clean_shell_noise(output: str) -> str:
return result
_SANE_PATH = "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
# Standard PATH entries for environments with minimal PATH (e.g. systemd services).
# Includes macOS Homebrew paths (/opt/homebrew/* for Apple Silicon).
_SANE_PATH = (
"/opt/homebrew/bin:/opt/homebrew/sbin:"
"/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
)
def _make_run_env(env: dict) -> dict:
"""Build a run environment with a sane PATH and provider-var stripping."""
try:
from tools.env_passthrough import is_env_passthrough as _is_passthrough
except Exception:
_is_passthrough = lambda _: False # noqa: E731
merged = dict(os.environ | env)
run_env = {}
for k, v in merged.items():
if k.startswith(_HERMES_PROVIDER_ENV_FORCE_PREFIX):
real_key = k[len(_HERMES_PROVIDER_ENV_FORCE_PREFIX):]
run_env[real_key] = v
elif k not in _HERMES_PROVIDER_ENV_BLOCKLIST:
elif k not in _HERMES_PROVIDER_ENV_BLOCKLIST or _is_passthrough(k):
run_env[k] = v
existing_path = run_env.get("PATH", "")
if "/usr/bin" not in existing_path.split(":"):
+124 -51
View File
@@ -1,14 +1,14 @@
"""Modal cloud execution environment wrapping mini-swe-agent's SwerexModalEnvironment.
"""Modal cloud execution environment using SWE-ReX directly.
Supports persistent filesystem snapshots: when enabled, the sandbox's filesystem
is snapshotted on cleanup and restored on next creation, so installed packages,
project files, and config changes survive across sessions.
"""
import asyncio
import json
import logging
import threading
import time
import uuid
from pathlib import Path
from typing import Any, Dict, Optional
@@ -38,15 +38,49 @@ def _save_snapshots(data: Dict[str, str]) -> None:
_SNAPSHOT_STORE.write_text(json.dumps(data, indent=2))
class ModalEnvironment(BaseEnvironment):
"""Modal cloud execution via mini-swe-agent.
class _AsyncWorker:
"""Background thread with its own event loop for async-safe swe-rex calls.
Wraps SwerexModalEnvironment and adds sudo -S support, configurable
resources (CPU, memory, disk), and optional filesystem persistence
via Modal's snapshot_filesystem() API.
Allows sync code to submit async coroutines and block for results,
even when called from inside another running event loop (e.g. Atropos).
"""
_patches_applied = False
def __init__(self):
self._loop: Optional[asyncio.AbstractEventLoop] = None
self._thread: Optional[threading.Thread] = None
self._started = threading.Event()
def start(self):
self._thread = threading.Thread(target=self._run_loop, daemon=True)
self._thread.start()
self._started.wait(timeout=30)
def _run_loop(self):
self._loop = asyncio.new_event_loop()
asyncio.set_event_loop(self._loop)
self._started.set()
self._loop.run_forever()
def run_coroutine(self, coro, timeout=600):
if self._loop is None or self._loop.is_closed():
raise RuntimeError("AsyncWorker loop is not running")
future = asyncio.run_coroutine_threadsafe(coro, self._loop)
return future.result(timeout=timeout)
def stop(self):
if self._loop and self._loop.is_running():
self._loop.call_soon_threadsafe(self._loop.stop)
if self._thread:
self._thread.join(timeout=10)
class ModalEnvironment(BaseEnvironment):
"""Modal cloud execution via SWE-ReX.
Uses swe-rex's ModalDeployment directly for sandbox management.
Adds sudo -S support, configurable resources (CPU, memory, disk),
and optional filesystem persistence via Modal's snapshot API.
"""
def __init__(
self,
@@ -59,17 +93,11 @@ class ModalEnvironment(BaseEnvironment):
):
super().__init__(cwd=cwd, timeout=timeout)
if not ModalEnvironment._patches_applied:
try:
from environments.patches import apply_patches
apply_patches()
except ImportError:
pass
ModalEnvironment._patches_applied = True
self._persistent = persistent_filesystem
self._task_id = task_id
self._base_image = image
self._deployment = None
self._worker = _AsyncWorker()
sandbox_kwargs = dict(modal_sandbox_kwargs or {})
@@ -88,16 +116,37 @@ class ModalEnvironment(BaseEnvironment):
effective_image = restored_image if restored_image else image
from minisweagent.environments.extra.swerex_modal import SwerexModalEnvironment
self._inner = SwerexModalEnvironment(
image=effective_image,
cwd=cwd,
timeout=timeout,
startup_timeout=180.0,
runtime_timeout=3600.0,
modal_sandbox_kwargs=sandbox_kwargs,
install_pipx=True, # Required: installs pipx + swe-rex runtime (swerex-remote)
)
# Pre-build a modal.Image with pip fix for Modal's legacy image builder.
# Some task images have broken pip; fix via ensurepip before Modal uses it.
import modal as _modal
if isinstance(effective_image, str):
effective_image = _modal.Image.from_registry(
effective_image,
setup_dockerfile_commands=[
"RUN rm -rf /usr/local/lib/python*/site-packages/pip* 2>/dev/null; "
"python -m ensurepip --upgrade --default-pip 2>/dev/null || true",
],
)
# Start the async worker thread and create the deployment on it
# so all gRPC channels are bound to the worker's event loop.
self._worker.start()
from swerex.deployment.modal import ModalDeployment
async def _create_and_start():
deployment = ModalDeployment(
image=effective_image,
startup_timeout=180.0,
runtime_timeout=3600.0,
deployment_timeout=3600.0,
install_pipx=True,
modal_sandbox_kwargs=sandbox_kwargs,
)
await deployment.start()
return deployment
self._deployment = self._worker.run_coroutine(_create_and_start())
def execute(self, command: str, cwd: str = "", *,
timeout: int | None = None,
@@ -114,21 +163,39 @@ class ModalEnvironment(BaseEnvironment):
# subprocess stdin directly the way a local Popen can. When a sudo
# password is present, use a shell-level pipe from printf so that the
# password feeds sudo -S without appearing as an echo argument embedded
# in the shell string. The password is still visible in the remote
# sandbox's command line, but it is not exposed on the user's local
# machine — which is the primary threat being mitigated.
# in the shell string.
if sudo_stdin is not None:
import shlex
exec_command = (
f"printf '%s\\n' {shlex.quote(sudo_stdin.rstrip())} | {exec_command}"
)
from swerex.runtime.abstract import Command as RexCommand
effective_cwd = cwd or self.cwd
effective_timeout = timeout or self.timeout
# Run in a background thread so we can poll for interrupts
result_holder = {"value": None, "error": None}
def _run():
try:
result_holder["value"] = self._inner.execute(exec_command, cwd=cwd, timeout=timeout)
async def _do_execute():
return await self._deployment.runtime.execute(
RexCommand(
command=exec_command,
shell=True,
check=False,
cwd=effective_cwd,
timeout=effective_timeout,
merge_output_streams=True,
)
)
output = self._worker.run_coroutine(_do_execute())
result_holder["value"] = {
"output": output.stdout,
"returncode": output.exit_code,
}
except Exception as e:
result_holder["error"] = e
@@ -138,7 +205,10 @@ class ModalEnvironment(BaseEnvironment):
t.join(timeout=0.2)
if is_interrupted():
try:
self._inner.stop()
self._worker.run_coroutine(
asyncio.wait_for(self._deployment.stop(), timeout=10),
timeout=15,
)
except Exception:
pass
return {
@@ -152,35 +222,38 @@ class ModalEnvironment(BaseEnvironment):
def cleanup(self):
"""Snapshot the filesystem (if persistent) then stop the sandbox."""
# Check if _inner was ever set (init may have failed)
if not hasattr(self, '_inner') or self._inner is None:
if self._deployment is None:
return
if self._persistent:
try:
sandbox = getattr(self._inner, 'deployment', None)
sandbox = getattr(sandbox, '_sandbox', None) if sandbox else None
sandbox = getattr(self._deployment, '_sandbox', None)
if sandbox:
import asyncio
async def _snapshot():
img = await sandbox.snapshot_filesystem.aio()
return img.object_id
try:
snapshot_id = asyncio.run(_snapshot())
except RuntimeError:
import concurrent.futures
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
snapshot_id = pool.submit(
asyncio.run, _snapshot()
).result(timeout=60)
snapshots = _load_snapshots()
snapshots[self._task_id] = snapshot_id
_save_snapshots(snapshots)
logger.info("Modal: saved filesystem snapshot %s for task %s",
snapshot_id[:20], self._task_id)
try:
snapshot_id = self._worker.run_coroutine(_snapshot(), timeout=60)
except Exception:
snapshot_id = None
if snapshot_id:
snapshots = _load_snapshots()
snapshots[self._task_id] = snapshot_id
_save_snapshots(snapshots)
logger.info("Modal: saved filesystem snapshot %s for task %s",
snapshot_id[:20], self._task_id)
except Exception as e:
logger.warning("Modal: filesystem snapshot failed: %s", e)
if hasattr(self._inner, 'stop'):
self._inner.stop()
try:
self._worker.run_coroutine(
asyncio.wait_for(self._deployment.stop(), timeout=10),
timeout=15,
)
except Exception:
pass
finally:
self._worker.stop()
self._deployment = None
+6 -2
View File
@@ -433,9 +433,13 @@ class ShellFileOperations(FileOperations):
slash_idx = rest.find('/')
username = rest[:slash_idx] if slash_idx >= 0 else rest
if username and re.fullmatch(r'[a-zA-Z0-9._-]+', username):
expand_result = self._exec(f"echo {path}")
# Only expand ~username (not the full path) to avoid shell
# injection via path suffixes like "~user/$(malicious)".
expand_result = self._exec(f"echo ~{username}")
if expand_result.exit_code == 0 and expand_result.stdout.strip():
return expand_result.stdout.strip()
user_home = expand_result.stdout.strip()
suffix = path[1 + len(username):] # e.g. "/rest/of/path"
return user_home + suffix
return path
-16
View File
@@ -5,7 +5,6 @@ import errno
import json
import logging
import os
import re
import threading
from typing import Optional
from tools.file_operations import ShellFileOperations
@@ -13,17 +12,6 @@ from agent.redact import redact_sensitive_text
logger = logging.getLogger(__name__)
# Regex to match ANSI escape sequences (CSI codes, OSC codes, simple escapes).
# Models occasionally copy these from terminal output into file content.
_ANSI_ESCAPE_RE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]|\x1b\][^\x07]*\x07|\x1b[()][A-B012]|\x1b[=>]")
def _strip_ansi(text: str) -> str:
"""Remove ANSI escape sequences from text destined for file writes."""
if not text or "\x1b" not in text:
return text
return _ANSI_ESCAPE_RE.sub("", text)
_EXPECTED_WRITE_ERRNOS = {errno.EACCES, errno.EPERM, errno.EROFS}
@@ -301,7 +289,6 @@ def notify_other_tool_call(task_id: str = "default"):
def write_file_tool(path: str, content: str, task_id: str = "default") -> str:
"""Write content to a file."""
try:
content = _strip_ansi(content)
file_ops = _get_file_ops(task_id)
result = file_ops.write_file(path, content)
return json.dumps(result.to_dict(), ensure_ascii=False)
@@ -325,13 +312,10 @@ def patch_tool(mode: str = "replace", path: str = None, old_string: str = None,
return json.dumps({"error": "path required"})
if old_string is None or new_string is None:
return json.dumps({"error": "old_string and new_string required"})
old_string = _strip_ansi(old_string)
new_string = _strip_ansi(new_string)
result = file_ops.patch_replace(path, old_string, new_string, replace_all)
elif mode == "patch":
if not patch:
return json.dumps({"error": "patch content required"})
patch = _strip_ansi(patch)
result = file_ops.patch_v4a(patch)
else:
return json.dumps({"error": f"Unknown mode: {mode}"})
+10 -5
View File
@@ -426,12 +426,14 @@ class ProcessRegistry:
def poll(self, session_id: str) -> dict:
"""Check status and get new output for a background process."""
from tools.ansi_strip import strip_ansi
session = self.get(session_id)
if session is None:
return {"status": "not_found", "error": f"No process with ID {session_id}"}
with session._lock:
output_preview = session.output_buffer[-1000:] if session.output_buffer else ""
output_preview = strip_ansi(session.output_buffer[-1000:]) if session.output_buffer else ""
result = {
"session_id": session.id,
@@ -450,12 +452,14 @@ class ProcessRegistry:
def read_log(self, session_id: str, offset: int = 0, limit: int = 200) -> dict:
"""Read the full output log with optional pagination by lines."""
from tools.ansi_strip import strip_ansi
session = self.get(session_id)
if session is None:
return {"status": "not_found", "error": f"No process with ID {session_id}"}
with session._lock:
full_output = session.output_buffer
full_output = strip_ansi(session.output_buffer)
lines = full_output.splitlines()
total_lines = len(lines)
@@ -486,6 +490,7 @@ class ProcessRegistry:
dict with status ("exited", "timeout", "interrupted", "not_found")
and output snapshot.
"""
from tools.ansi_strip import strip_ansi
from tools.terminal_tool import _interrupt_event
default_timeout = int(os.getenv("TERMINAL_TIMEOUT", "180"))
@@ -513,7 +518,7 @@ class ProcessRegistry:
result = {
"status": "exited",
"exit_code": session.exit_code,
"output": session.output_buffer[-2000:],
"output": strip_ansi(session.output_buffer[-2000:]),
}
if timeout_note:
result["timeout_note"] = timeout_note
@@ -522,7 +527,7 @@ class ProcessRegistry:
if _interrupt_event.is_set():
result = {
"status": "interrupted",
"output": session.output_buffer[-1000:],
"output": strip_ansi(session.output_buffer[-1000:]),
"note": "User sent a new message -- wait interrupted",
}
if timeout_note:
@@ -533,7 +538,7 @@ class ProcessRegistry:
result = {
"status": "timeout",
"output": session.output_buffer[-1000:],
"output": strip_ansi(session.output_buffer[-1000:]),
}
if timeout_note:
result["timeout_note"] = timeout_note
+67 -6
View File
@@ -179,6 +179,58 @@ async def _summarize_session(
return None
def _list_recent_sessions(db, limit: int, current_session_id: str = None) -> str:
"""Return metadata for the most recent sessions (no LLM calls)."""
try:
sessions = db.list_sessions_rich(limit=limit + 5) # fetch extra to skip current
# Resolve current session lineage to exclude it
current_root = None
if current_session_id:
try:
sid = current_session_id
visited = set()
while sid and sid not in visited:
visited.add(sid)
s = db.get_session(sid)
parent = s.get("parent_session_id") if s else None
sid = parent if parent else None
current_root = max(visited, key=len) if visited else current_session_id
except Exception:
current_root = current_session_id
results = []
for s in sessions:
sid = s.get("id", "")
if current_root and (sid == current_root or sid == current_session_id):
continue
# Skip child/delegation sessions (they have parent_session_id)
if s.get("parent_session_id"):
continue
results.append({
"session_id": sid,
"title": s.get("title") or None,
"source": s.get("source", ""),
"started_at": s.get("started_at", ""),
"last_active": s.get("last_active", ""),
"message_count": s.get("message_count", 0),
"preview": s.get("preview", ""),
})
if len(results) >= limit:
break
return json.dumps({
"success": True,
"mode": "recent",
"results": results,
"count": len(results),
"message": f"Showing {len(results)} most recent sessions. Use a keyword query to search specific topics.",
}, ensure_ascii=False)
except Exception as e:
logging.error("Error listing recent sessions: %s", e, exc_info=True)
return json.dumps({"success": False, "error": f"Failed to list recent sessions: {e}"}, ensure_ascii=False)
def session_search(
query: str,
role_filter: str = None,
@@ -195,11 +247,14 @@ def session_search(
if db is None:
return json.dumps({"success": False, "error": "Session database not available."}, ensure_ascii=False)
limit = min(limit, 5) # Cap at 5 sessions to avoid excessive LLM calls
# Recent sessions mode: when query is empty, return metadata for recent sessions.
# No LLM calls — just DB queries for titles, previews, timestamps.
if not query or not query.strip():
return json.dumps({"success": False, "error": "Query cannot be empty."}, ensure_ascii=False)
return _list_recent_sessions(db, limit, current_session_id)
query = query.strip()
limit = min(limit, 5) # Cap at 5 sessions to avoid excessive LLM calls
try:
# Parse role filter
@@ -364,8 +419,14 @@ def check_session_search_requirements() -> bool:
SESSION_SEARCH_SCHEMA = {
"name": "session_search",
"description": (
"Search your long-term memory of past conversations. This is your recall -- "
"Search your long-term memory of past conversations, or browse recent sessions. This is your recall -- "
"every past session is searchable, and this tool summarizes what happened.\n\n"
"TWO MODES:\n"
"1. Recent sessions (no query): Call with no arguments to see what was worked on recently. "
"Returns titles, previews, and timestamps. Zero LLM cost, instant. "
"Start here when the user asks what were we working on or what did we do recently.\n"
"2. Keyword search (with query): Search for specific topics across all past sessions. "
"Returns LLM-generated summaries of matching sessions.\n\n"
"USE THIS PROACTIVELY when:\n"
"- The user says 'we did this before', 'remember when', 'last time', 'as I mentioned'\n"
"- The user asks about a topic you worked on before but don't have in current context\n"
@@ -385,7 +446,7 @@ SESSION_SEARCH_SCHEMA = {
"properties": {
"query": {
"type": "string",
"description": "Search query — keywords, phrases, or boolean expressions to find in past sessions.",
"description": "Search query — keywords, phrases, or boolean expressions to find in past sessions. Omit this parameter entirely to browse recent sessions instead (returns titles, previews, timestamps with no LLM cost).",
},
"role_filter": {
"type": "string",
@@ -397,7 +458,7 @@ SESSION_SEARCH_SCHEMA = {
"default": 3,
},
},
"required": ["query"],
"required": [],
},
}
@@ -410,7 +471,7 @@ registry.register(
toolset="session_search",
schema=SESSION_SEARCH_SCHEMA,
handler=lambda args, **kw: session_search(
query=args.get("query", ""),
query=args.get("query") or "",
role_filter=args.get("role_filter"),
limit=args.get("limit", 3),
db=kw.get("db"),
+3
View File
@@ -1050,6 +1050,9 @@ def _get_configured_model() -> str:
def _resolve_trust_level(source: str) -> str:
"""Map a source identifier to a trust level."""
# Agent-created skills get their own permissive trust level
if source == "agent-created":
return "agent-created"
# Official optional skills shipped with the repo
if source.startswith("official/") or source == "official":
return "builtin"
+20
View File
@@ -1146,6 +1146,26 @@ def skill_view(name: str, file_path: str = None, task_id: str = None) -> str:
)
setup_needed = bool(remaining_missing_required_envs)
# Register available skill env vars so they pass through to sandboxed
# execution environments (execute_code, terminal). Only vars that are
# actually set get registered — missing ones are reported as setup_needed.
available_env_names = [
e["name"]
for e in required_env_vars
if e["name"] not in remaining_missing_required_envs
]
if available_env_names:
try:
from tools.env_passthrough import register_env_passthrough
register_env_passthrough(available_env_names)
except Exception:
logger.debug(
"Could not register env passthrough for skill %s",
skill_name,
exc_info=True,
)
result = {
"success": True,
"name": skill_name,
+20 -35
View File
@@ -1,8 +1,8 @@
#!/usr/bin/env python3
"""
Terminal Tool Module (mini-swe-agent backend)
Terminal Tool Module
A terminal tool that executes commands using mini-swe-agent's execution environments.
A terminal tool that executes commands in local, Docker, Modal, SSH, Singularity, and Daytona environments.
Supports local execution, Docker containers, and Modal cloud sandboxes.
Environment Selection (via TERMINAL_ENV environment variable):
@@ -51,13 +51,6 @@ logger = logging.getLogger(__name__)
from tools.interrupt import is_interrupted, _interrupt_event
# Add mini-swe-agent to path if not installed. In git worktrees the populated
# submodule may live in the main checkout rather than the worktree itself.
from minisweagent_path import ensure_minisweagent_on_path
ensure_minisweagent_on_path(Path(__file__).resolve().parent.parent)
# =============================================================================
# Custom Singularity Environment with more space
# =============================================================================
@@ -490,10 +483,12 @@ def _get_env_config() -> Dict[str, Any]:
host_cwd = candidate
cwd = "/workspace"
elif env_type in ("modal", "docker", "singularity", "daytona") and cwd:
# Host paths that won't exist inside containers
if any(cwd.startswith(p) for p in host_prefixes) and cwd != default_cwd:
# Host paths and relative paths that won't work inside containers
is_host_path = any(cwd.startswith(p) for p in host_prefixes)
is_relative = not os.path.isabs(cwd) # e.g. "." or "src/"
if (is_host_path or is_relative) and cwd != default_cwd:
logger.info("Ignoring TERMINAL_CWD=%r for %s backend "
"(host path won't exist in sandbox). Using %r instead.",
"(host/relative path won't work in sandbox). Using %r instead.",
cwd, env_type, default_cwd)
cwd = default_cwd
@@ -537,7 +532,7 @@ def _create_environment(env_type: str, image: str, cwd: str, timeout: int,
task_id: str = "default",
host_cwd: str = None):
"""
Create an execution environment from mini-swe-agent.
Create an execution environment for sandboxed command execution.
Args:
env_type: One of "local", "docker", "singularity", "modal", "daytona", "ssh"
@@ -852,7 +847,7 @@ def terminal_tool(
pty: bool = False,
) -> str:
"""
Execute a command using mini-swe-agent's execution environments.
Execute a command in the configured terminal environment.
Args:
command: The command to execute
@@ -987,7 +982,7 @@ def terminal_tool(
return json.dumps({
"output": "",
"exit_code": -1,
"error": f"Terminal tool disabled: mini-swe-agent not available ({e})",
"error": f"Terminal tool disabled: environment creation failed ({e})",
"status": "disabled"
}, ensure_ascii=False)
@@ -1163,6 +1158,11 @@ def terminal_tool(
)
output = output[:head_chars] + truncated_notice + output[-tail_chars:]
# Strip ANSI escape sequences so the model never sees terminal
# formatting — prevents it from copying escapes into file writes.
from tools.ansi_strip import strip_ansi
output = strip_ansi(output)
# Redact secrets from command output (catches env/printenv leaking keys)
from agent.redact import redact_sensitive_text
output = redact_sensitive_text(output.strip()) if output else ""
@@ -1183,27 +1183,15 @@ def terminal_tool(
def check_terminal_requirements() -> bool:
"""Check if all requirements for the terminal tool are met.
Important: local and singularity backends now use Hermes' own environment
wrappers directly and do not require the ``minisweagent`` Python package to
be installed. Docker and Modal still rely on mini-swe-agent internals.
"""
"""Check if all requirements for the terminal tool are met."""
config = _get_env_config()
env_type = config["env_type"]
try:
if env_type == "local":
# Local execution uses Hermes' own LocalEnvironment wrapper and does
# not depend on minisweagent being importable.
return True
elif env_type == "docker":
ensure_minisweagent_on_path(Path(__file__).resolve().parent.parent)
if importlib.util.find_spec("minisweagent") is None:
logger.error("mini-swe-agent is required for docker terminal backend but is not importable")
return False
# Check if docker is available (use find_docker for macOS PATH issues)
from tools.environments.docker import find_docker
docker = find_docker()
if not docker:
@@ -1220,7 +1208,6 @@ def check_terminal_requirements() -> bool:
return False
elif env_type == "ssh":
# Check that host and user are configured
if not config.get("ssh_host") or not config.get("ssh_user"):
logger.error(
"SSH backend selected but TERMINAL_SSH_HOST and TERMINAL_SSH_USER "
@@ -1230,11 +1217,9 @@ def check_terminal_requirements() -> bool:
return True
elif env_type == "modal":
ensure_minisweagent_on_path(Path(__file__).resolve().parent.parent)
if importlib.util.find_spec("minisweagent") is None:
logger.error("mini-swe-agent is required for modal terminal backend but is not importable")
if importlib.util.find_spec("swerex") is None:
logger.error("swe-rex is required for modal terminal backend: pip install 'swe-rex[modal]'")
return False
# Check for modal token
has_token = os.getenv("MODAL_TOKEN_ID") is not None
has_config = Path.home().joinpath(".modal.toml").exists()
if not (has_token or has_config):
@@ -1264,7 +1249,7 @@ def check_terminal_requirements() -> bool:
if __name__ == "__main__":
# Simple test when run directly
print("Terminal Tool Module (mini-swe-agent backend)")
print("Terminal Tool Module")
print("=" * 50)
config = _get_env_config()
@@ -1282,7 +1267,7 @@ if __name__ == "__main__":
print("\n✅ All requirements met!")
print("\nAvailable Tool:")
print(" - terminal_tool: Execute commands using mini-swe-agent environments")
print(" - terminal_tool: Execute commands in sandboxed environments")
print("\nUsage Examples:")
print(" # Execute a command")
+96
View File
@@ -0,0 +1,96 @@
"""URL safety checks — blocks requests to private/internal network addresses.
Prevents SSRF (Server-Side Request Forgery) where a malicious prompt or
skill could trick the agent into fetching internal resources like cloud
metadata endpoints (169.254.169.254), localhost services, or private
network hosts.
Limitations (documented, not fixable at pre-flight level):
- DNS rebinding (TOCTOU): an attacker-controlled DNS server with TTL=0
can return a public IP for the check, then a private IP for the actual
connection. Fixing this requires connection-level validation (e.g.
Python's Champion library or an egress proxy like Stripe's Smokescreen).
- Redirect-based bypass in vision_tools is mitigated by an httpx event
hook that re-validates each redirect target. Web tools use third-party
SDKs (Firecrawl/Tavily) where redirect handling is on their servers.
"""
import ipaddress
import logging
import socket
from urllib.parse import urlparse
logger = logging.getLogger(__name__)
# Hostnames that should always be blocked regardless of IP resolution
_BLOCKED_HOSTNAMES = frozenset({
"metadata.google.internal",
"metadata.goog",
})
# 100.64.0.0/10 (CGNAT / Shared Address Space, RFC 6598) is NOT covered by
# ipaddress.is_private — it returns False for both is_private and is_global.
# Must be blocked explicitly. Used by carrier-grade NAT, Tailscale/WireGuard
# VPNs, and some cloud internal networks.
_CGNAT_NETWORK = ipaddress.ip_network("100.64.0.0/10")
def _is_blocked_ip(ip: ipaddress.IPv4Address | ipaddress.IPv6Address) -> bool:
"""Return True if the IP should be blocked for SSRF protection."""
if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
return True
if ip.is_multicast or ip.is_unspecified:
return True
# CGNAT range not covered by is_private
if ip in _CGNAT_NETWORK:
return True
return False
def is_safe_url(url: str) -> bool:
"""Return True if the URL target is not a private/internal address.
Resolves the hostname to an IP and checks against private ranges.
Fails closed: DNS errors and unexpected exceptions block the request.
"""
try:
parsed = urlparse(url)
hostname = (parsed.hostname or "").strip().lower()
if not hostname:
return False
# Block known internal hostnames
if hostname in _BLOCKED_HOSTNAMES:
logger.warning("Blocked request to internal hostname: %s", hostname)
return False
# Try to resolve and check IP
try:
addr_info = socket.getaddrinfo(hostname, None, socket.AF_UNSPEC, socket.SOCK_STREAM)
except socket.gaierror:
# DNS resolution failed — fail closed. If DNS can't resolve it,
# the HTTP client will also fail, so blocking loses nothing.
logger.warning("Blocked request — DNS resolution failed for: %s", hostname)
return False
for family, _, _, _, sockaddr in addr_info:
ip_str = sockaddr[0]
try:
ip = ipaddress.ip_address(ip_str)
except ValueError:
continue
if _is_blocked_ip(ip):
logger.warning(
"Blocked request to private/internal address: %s -> %s",
hostname, ip_str,
)
return False
return True
except Exception as exc:
# Fail closed on unexpected errors — don't let parsing edge cases
# become SSRF bypass vectors
logger.warning("Blocked request — URL safety check error for %s: %s", url, exc)
return False
+38 -4
View File
@@ -69,7 +69,12 @@ def _validate_image_url(url: str) -> bool:
if not parsed.netloc:
return False
return True # Allow all well-formed HTTP/HTTPS URLs for flexibility
# Block private/internal addresses to prevent SSRF
from tools.url_safety import is_safe_url
if not is_safe_url(url):
return False
return True
async def _download_image(image_url: str, destination: Path, max_retries: int = 3) -> Path:
@@ -92,12 +97,33 @@ async def _download_image(image_url: str, destination: Path, max_retries: int =
# Create parent directories if they don't exist
destination.parent.mkdir(parents=True, exist_ok=True)
async def _ssrf_redirect_guard(response):
"""Re-validate each redirect target to prevent redirect-based SSRF.
Without this, an attacker can host a public URL that 302-redirects
to http://169.254.169.254/ and bypass the pre-flight is_safe_url check.
Must be async because httpx.AsyncClient awaits event hooks.
"""
if response.is_redirect and response.next_request:
redirect_url = str(response.next_request.url)
from tools.url_safety import is_safe_url
if not is_safe_url(redirect_url):
raise ValueError(
f"Blocked redirect to private/internal address: {redirect_url}"
)
last_error = None
for attempt in range(max_retries):
try:
# Download the image with appropriate headers using async httpx
# Enable follow_redirects to handle image CDNs that redirect (e.g., Imgur, Picsum)
async with httpx.AsyncClient(timeout=30.0, follow_redirects=True) as client:
# SSRF: event_hooks validates each redirect target against private IP ranges
async with httpx.AsyncClient(
timeout=30.0,
follow_redirects=True,
event_hooks={"response": [_ssrf_redirect_guard]},
) as client:
response = await client.get(
image_url,
headers={
@@ -299,8 +325,9 @@ async def vision_analyze_tool(
logger.info("Processing image with vision model...")
# Call the vision API via centralized router.
# Read timeout from config.yaml (auxiliary.vision.timeout), default 30s.
vision_timeout = 30.0
# Read timeout from config.yaml (auxiliary.vision.timeout), default 120s.
# Local vision models (llama.cpp, ollama) can take well over 30s.
vision_timeout = 120.0
try:
from hermes_cli.config import load_config
_cfg = load_config()
@@ -349,6 +376,13 @@ async def vision_analyze_tool(
# so it can inform the user instead of a cryptic API error.
err_str = str(e).lower()
if any(hint in err_str for hint in (
"402", "insufficient", "payment required", "credits", "billing",
)):
analysis = (
"Insufficient credits or payment required. Please top up your "
f"API provider account and try again. Error: {e}"
)
elif any(hint in err_str for hint in (
"does not support", "not support image", "invalid_request",
"content_policy", "image_url", "multimodal",
"unrecognized request argument", "image input",
+148 -118
View File
@@ -46,6 +46,7 @@ import httpx
from firecrawl import Firecrawl
from agent.auxiliary_client import async_call_llm
from tools.debug_helpers import DebugSession
from tools.url_safety import is_safe_url
from tools.website_policy import check_website_access
logger = logging.getLogger(__name__)
@@ -861,136 +862,155 @@ async def web_extract_tool(
try:
logger.info("Extracting content from %d URL(s)", len(urls))
# Dispatch to the configured backend
backend = _get_backend()
if backend == "parallel":
results = await _parallel_extract(urls)
elif backend == "tavily":
logger.info("Tavily extract: %d URL(s)", len(urls))
raw = _tavily_request("extract", {
"urls": urls,
"include_images": False,
})
results = _normalize_tavily_documents(raw, fallback_url=urls[0] if urls else "")
else:
# ── Firecrawl extraction ──
# Determine requested formats for Firecrawl v2
formats: List[str] = []
if format == "markdown":
formats = ["markdown"]
elif format == "html":
formats = ["html"]
# ── SSRF protection — filter out private/internal URLs before any backend ──
safe_urls = []
ssrf_blocked: List[Dict[str, Any]] = []
for url in urls:
if not is_safe_url(url):
ssrf_blocked.append({
"url": url, "title": "", "content": "",
"error": "Blocked: URL targets a private or internal network address",
})
else:
# Default: request markdown for LLM-readiness and include html as backup
formats = ["markdown", "html"]
safe_urls.append(url)
# Always use individual scraping for simplicity and reliability
# Batch scraping adds complexity without much benefit for small numbers of URLs
results: List[Dict[str, Any]] = []
# Dispatch only safe URLs to the configured backend
if not safe_urls:
results = []
else:
backend = _get_backend()
from tools.interrupt import is_interrupted as _is_interrupted
for url in urls:
if _is_interrupted():
results.append({"url": url, "error": "Interrupted", "title": ""})
continue
if backend == "parallel":
results = await _parallel_extract(safe_urls)
elif backend == "tavily":
logger.info("Tavily extract: %d URL(s)", len(safe_urls))
raw = _tavily_request("extract", {
"urls": safe_urls,
"include_images": False,
})
results = _normalize_tavily_documents(raw, fallback_url=safe_urls[0] if safe_urls else "")
else:
# ── Firecrawl extraction ──
# Determine requested formats for Firecrawl v2
formats: List[str] = []
if format == "markdown":
formats = ["markdown"]
elif format == "html":
formats = ["html"]
else:
# Default: request markdown for LLM-readiness and include html as backup
formats = ["markdown", "html"]
# Website policy check — block before fetching
blocked = check_website_access(url)
if blocked:
logger.info("Blocked web_extract for %s by rule %s", blocked["host"], blocked["rule"])
results.append({
"url": url, "title": "", "content": "",
"error": blocked["message"],
"blocked_by_policy": {"host": blocked["host"], "rule": blocked["rule"], "source": blocked["source"]},
})
continue
# Always use individual scraping for simplicity and reliability
# Batch scraping adds complexity without much benefit for small numbers of URLs
results: List[Dict[str, Any]] = []
try:
logger.info("Scraping: %s", url)
scrape_result = _get_firecrawl_client().scrape(
url=url,
formats=formats
)
from tools.interrupt import is_interrupted as _is_interrupted
for url in safe_urls:
if _is_interrupted():
results.append({"url": url, "error": "Interrupted", "title": ""})
continue
# Process the result - properly handle object serialization
metadata = {}
title = ""
content_markdown = None
content_html = None
# Extract data from the scrape result
if hasattr(scrape_result, 'model_dump'):
# Pydantic model - use model_dump to get dict
result_dict = scrape_result.model_dump()
content_markdown = result_dict.get('markdown')
content_html = result_dict.get('html')
metadata = result_dict.get('metadata', {})
elif hasattr(scrape_result, '__dict__'):
# Regular object with attributes
content_markdown = getattr(scrape_result, 'markdown', None)
content_html = getattr(scrape_result, 'html', None)
# Handle metadata - convert to dict if it's an object
metadata_obj = getattr(scrape_result, 'metadata', {})
if hasattr(metadata_obj, 'model_dump'):
metadata = metadata_obj.model_dump()
elif hasattr(metadata_obj, '__dict__'):
metadata = metadata_obj.__dict__
elif isinstance(metadata_obj, dict):
metadata = metadata_obj
else:
metadata = {}
elif isinstance(scrape_result, dict):
# Already a dictionary
content_markdown = scrape_result.get('markdown')
content_html = scrape_result.get('html')
metadata = scrape_result.get('metadata', {})
# Ensure metadata is a dict (not an object)
if not isinstance(metadata, dict):
if hasattr(metadata, 'model_dump'):
metadata = metadata.model_dump()
elif hasattr(metadata, '__dict__'):
metadata = metadata.__dict__
else:
metadata = {}
# Get title from metadata
title = metadata.get("title", "")
# Re-check final URL after redirect
final_url = metadata.get("sourceURL", url)
final_blocked = check_website_access(final_url)
if final_blocked:
logger.info("Blocked redirected web_extract for %s by rule %s", final_blocked["host"], final_blocked["rule"])
# Website policy check — block before fetching
blocked = check_website_access(url)
if blocked:
logger.info("Blocked web_extract for %s by rule %s", blocked["host"], blocked["rule"])
results.append({
"url": final_url, "title": title, "content": "", "raw_content": "",
"error": final_blocked["message"],
"blocked_by_policy": {"host": final_blocked["host"], "rule": final_blocked["rule"], "source": final_blocked["source"]},
"url": url, "title": "", "content": "",
"error": blocked["message"],
"blocked_by_policy": {"host": blocked["host"], "rule": blocked["rule"], "source": blocked["source"]},
})
continue
# Choose content based on requested format
chosen_content = content_markdown if (format == "markdown" or (format is None and content_markdown)) else content_html or content_markdown or ""
try:
logger.info("Scraping: %s", url)
scrape_result = _get_firecrawl_client().scrape(
url=url,
formats=formats
)
results.append({
"url": final_url,
"title": title,
"content": chosen_content,
"raw_content": chosen_content,
"metadata": metadata # Now guaranteed to be a dict
})
# Process the result - properly handle object serialization
metadata = {}
title = ""
content_markdown = None
content_html = None
except Exception as scrape_err:
logger.debug("Scrape failed for %s: %s", url, scrape_err)
results.append({
"url": url,
"title": "",
"content": "",
"raw_content": "",
"error": str(scrape_err)
})
# Extract data from the scrape result
if hasattr(scrape_result, 'model_dump'):
# Pydantic model - use model_dump to get dict
result_dict = scrape_result.model_dump()
content_markdown = result_dict.get('markdown')
content_html = result_dict.get('html')
metadata = result_dict.get('metadata', {})
elif hasattr(scrape_result, '__dict__'):
# Regular object with attributes
content_markdown = getattr(scrape_result, 'markdown', None)
content_html = getattr(scrape_result, 'html', None)
# Handle metadata - convert to dict if it's an object
metadata_obj = getattr(scrape_result, 'metadata', {})
if hasattr(metadata_obj, 'model_dump'):
metadata = metadata_obj.model_dump()
elif hasattr(metadata_obj, '__dict__'):
metadata = metadata_obj.__dict__
elif isinstance(metadata_obj, dict):
metadata = metadata_obj
else:
metadata = {}
elif isinstance(scrape_result, dict):
# Already a dictionary
content_markdown = scrape_result.get('markdown')
content_html = scrape_result.get('html')
metadata = scrape_result.get('metadata', {})
# Ensure metadata is a dict (not an object)
if not isinstance(metadata, dict):
if hasattr(metadata, 'model_dump'):
metadata = metadata.model_dump()
elif hasattr(metadata, '__dict__'):
metadata = metadata.__dict__
else:
metadata = {}
# Get title from metadata
title = metadata.get("title", "")
# Re-check final URL after redirect
final_url = metadata.get("sourceURL", url)
final_blocked = check_website_access(final_url)
if final_blocked:
logger.info("Blocked redirected web_extract for %s by rule %s", final_blocked["host"], final_blocked["rule"])
results.append({
"url": final_url, "title": title, "content": "", "raw_content": "",
"error": final_blocked["message"],
"blocked_by_policy": {"host": final_blocked["host"], "rule": final_blocked["rule"], "source": final_blocked["source"]},
})
continue
# Choose content based on requested format
chosen_content = content_markdown if (format == "markdown" or (format is None and content_markdown)) else content_html or content_markdown or ""
results.append({
"url": final_url,
"title": title,
"content": chosen_content,
"raw_content": chosen_content,
"metadata": metadata # Now guaranteed to be a dict
})
except Exception as scrape_err:
logger.debug("Scrape failed for %s: %s", url, scrape_err)
results.append({
"url": url,
"title": "",
"content": "",
"raw_content": "",
"error": str(scrape_err)
})
# Merge any SSRF-blocked results back in
if ssrf_blocked:
results = ssrf_blocked + results
response = {"results": results}
@@ -1173,6 +1193,11 @@ async def web_crawl_tool(
if not url.startswith(('http://', 'https://')):
url = f'https://{url}'
# SSRF protection — block private/internal addresses
if not is_safe_url(url):
return json.dumps({"results": [{"url": url, "title": "", "content": "",
"error": "Blocked: URL targets a private or internal network address"}]}, ensure_ascii=False)
# Website policy check
blocked = check_website_access(url)
if blocked:
@@ -1258,6 +1283,11 @@ async def web_crawl_tool(
instructions_text = f" with instructions: '{instructions}'" if instructions else ""
logger.info("Crawling %s%s", url, instructions_text)
# SSRF protection — block private/internal addresses
if not is_safe_url(url):
return json.dumps({"results": [{"url": url, "title": "", "content": "",
"error": "Blocked: URL targets a private or internal network address"}]}, ensure_ascii=False)
# Website policy check — block before crawling
blocked = check_website_access(url)
if blocked:
Generated
+2494 -451
View File
File diff suppressed because it is too large Load Diff
+3 -1
View File
@@ -75,7 +75,9 @@ Concurrent tool execution preserves message/result ordering when reinserting too
- `reasoning_callback`
- `clarify_callback`
- `step_callback`
- `message_callback`
- `stream_delta_callback`
- `tool_gen_callback`
- `status_callback`
These are how the CLI, gateway, and ACP integrations stream intermediate progress and interactive approval/clarification flows.
@@ -49,7 +49,6 @@ export VIRTUAL_ENV="$(pwd)/venv"
# Install with all extras (messaging, cron, CLI menus, dev tools)
uv pip install -e ".[all,dev]"
uv pip install -e "./mini-swe-agent"
uv pip install -e "./tinker-atropos"
# Optional: browser tools
@@ -57,6 +57,15 @@ metadata:
hermes:
tags: [Category, Subcategory, Keywords]
related_skills: [other-skill-name]
requires_toolsets: [web] # Optional — only show when these toolsets are active
requires_tools: [web_search] # Optional — only show when these tools are available
fallback_for_toolsets: [browser] # Optional — hide when these toolsets are active
fallback_for_tools: [browser_navigate] # Optional — hide when these tools exist
required_environment_variables: # Optional — env vars the skill needs
- name: MY_API_KEY
prompt: "Enter your API key"
help: "Get one at https://example.com"
required_for: "API access"
---
# Skill Title
@@ -91,6 +100,57 @@ platforms: [windows] # Windows only
When set, the skill is automatically hidden from the system prompt, `skills_list()`, and slash commands on incompatible platforms. If omitted or empty, the skill loads on all platforms (backward compatible).
### Conditional Skill Activation
Skills can declare dependencies on specific tools or toolsets. This controls whether the skill appears in the system prompt for a given session.
```yaml
metadata:
hermes:
requires_toolsets: [web] # Hide if the web toolset is NOT active
requires_tools: [web_search] # Hide if web_search tool is NOT available
fallback_for_toolsets: [browser] # Hide if the browser toolset IS active
fallback_for_tools: [browser_navigate] # Hide if browser_navigate IS available
```
| Field | Behavior |
|-------|----------|
| `requires_toolsets` | Skill is **hidden** when ANY listed toolset is **not** available |
| `requires_tools` | Skill is **hidden** when ANY listed tool is **not** available |
| `fallback_for_toolsets` | Skill is **hidden** when ANY listed toolset **is** available |
| `fallback_for_tools` | Skill is **hidden** when ANY listed tool **is** available |
**Use case for `fallback_for_*`:** Create a skill that serves as a workaround when a primary tool isn't available. For example, a `duckduckgo-search` skill with `fallback_for_tools: [web_search]` only shows when the web search tool (which requires an API key) is not configured.
**Use case for `requires_*`:** Create a skill that only makes sense when certain tools are present. For example, a web scraping workflow skill with `requires_toolsets: [web]` won't clutter the prompt when web tools are disabled.
### Environment Variable Requirements
Skills can declare environment variables they need. When a skill is loaded via `skill_view`, its required vars are automatically registered for passthrough into sandboxed execution environments (terminal, execute_code).
```yaml
required_environment_variables:
- name: TENOR_API_KEY
prompt: "Tenor API key" # Shown when prompting user
help: "Get your key at https://tenor.com" # Help text or URL
required_for: "GIF search functionality" # What needs this var
```
Each entry supports:
- `name` (required) — the environment variable name
- `prompt` (optional) — prompt text when asking the user for the value
- `help` (optional) — help text or URL for obtaining the value
- `required_for` (optional) — describes which feature needs this variable
Users can also manually configure passthrough variables in `config.yaml`:
```yaml
terminal:
env_passthrough:
- MY_CUSTOM_VAR
- ANOTHER_VAR
```
See `skills/apple/` for examples of macOS-only skills.
## Secure Setup on Load
@@ -107,6 +167,10 @@ required_environment_variables:
The user can skip setup and keep loading the skill. Hermes never exposes the raw secret value to the model. Gateway and messaging sessions show local setup guidance instead of collecting secrets in-band.
:::tip Sandbox Passthrough
When your skill is loaded, any declared `required_environment_variables` that are set are **automatically passed through** to `execute_code` and `terminal` sandboxes. Your skill's scripts can access `$TENOR_API_KEY` (or `os.environ["TENOR_API_KEY"]` in Python) without the user needing to configure anything extra. See [Environment Variable Passthrough](/docs/user-guide/security#environment-variable-passthrough) for details.
:::
Legacy `prerequisites.env_vars` remains supported as a backward-compatible alias.
## Skill Guidelines
@@ -58,11 +58,12 @@ Local memory and user profile data are injected as frozen snapshots at session s
## Context files
`agent/prompt_builder.py` scans and sanitizes:
`agent/prompt_builder.py` scans and sanitizes project context files using a **priority system** — only one type is loaded (first match wins):
- `AGENTS.md`
- `.cursorrules`
- `.cursor/rules/*.mdc`
1. `.hermes.md` / `HERMES.md` (walks to git root)
2. `AGENTS.md` (recursive directory walk)
3. `CLAUDE.md` (CWD only)
4. `.cursorrules` / `.cursor/rules/*.mdc` (CWD only)
`SOUL.md` is loaded separately via `load_soul_md()` for the identity slot. When it loads successfully, `build_context_files_prompt(skip_soul=True)` prevents it from appearing twice.
@@ -16,9 +16,10 @@ Hermes has a shared provider runtime resolver used across:
Primary implementation:
- `hermes_cli/runtime_provider.py`
- `hermes_cli/auth.py`
- `agent/auxiliary_client.py`
- `hermes_cli/runtime_provider.py` — credential resolution, `_resolve_custom_runtime()`
- `hermes_cli/auth.py` — provider registry, `resolve_provider()`
- `hermes_cli/model_switch.py` — shared `/model` switch pipeline (CLI + gateway)
- `agent/auxiliary_client.py` — auxiliary model routing
If you are trying to add a new first-class inference provider, read [Adding Providers](./adding-providers.md) alongside this page.
@@ -46,7 +47,8 @@ Current provider families include:
- Kimi / Moonshot
- MiniMax
- MiniMax China
- custom OpenAI-compatible endpoints
- Custom (`provider: custom`) — first-class provider for any OpenAI-compatible endpoint
- Named custom providers (`custom_providers` list in config.yaml)
## Output of runtime resolution
+2 -6
View File
@@ -132,13 +132,10 @@ You can combine extras: `uv pip install -e ".[messaging,cron]"`
</details>
### Step 4: Install Submodule Packages
### Step 4: Install Optional Submodules (if needed)
```bash
# Terminal tool backend (required for terminal/command-execution)
uv pip install -e "./mini-swe-agent"
# RL training backend
# RL training backend (optional)
uv pip install -e "./tinker-atropos"
```
@@ -238,7 +235,6 @@ export VIRTUAL_ENV="$(pwd)/venv"
# Install everything
uv pip install -e ".[all]"
uv pip install -e "./mini-swe-agent"
uv pip install -e "./tinker-atropos"
npm install # optional, for browser tools and WhatsApp
+1 -1
View File
@@ -139,7 +139,7 @@ hermes gateway setup # Interactive platform configuration
Want microphone input in the CLI or spoken replies in messaging?
```bash
pip install hermes-agent[voice]
pip install "hermes-agent[voice]"
# Optional but recommended for free local speech-to-text
pip install faster-whisper
-1
View File
@@ -44,7 +44,6 @@ git submodule update --init --recursive
# Reinstall (picks up new dependencies)
uv pip install -e ".[all]"
uv pip install -e "./mini-swe-agent"
uv pip install -e "./tinker-atropos"
# Check for new config options
+7 -5
View File
@@ -29,12 +29,14 @@ Create `plugin.yaml`:
name: calculator
version: 1.0.0
description: Math calculator — evaluate expressions and convert units
provides:
tools: true
hooks: true
provides_tools:
- calculate
- unit_convert
provides_hooks:
- post_tool_call
```
This tells Hermes: "I'm a plugin called calculator, I provide tools and hooks." That's all the manifest needs.
This tells Hermes: "I'm a plugin called calculator, I provide tools and hooks." The `provides_tools` and `provides_hooks` fields are lists of what the plugin registers.
Optional fields you could add:
```yaml
@@ -232,7 +234,7 @@ def register(ctx):
- Called exactly once at startup
- `ctx.register_tool()` puts your tool in the registry — the model sees it immediately
- `ctx.register_hook()` subscribes to lifecycle events
- `ctx.register_command()` adds a slash command to `/help`, autocomplete, and gateway dispatch
- `ctx.register_command()` — _planned but not yet implemented_
- If this function crashes, the plugin is disabled but Hermes continues fine
## Step 6: Test it
+1 -1
View File
@@ -170,7 +170,7 @@ Instead of manually collecting user IDs for allowlists, enable DM pairing. When
Use `/verbose` to control how much tool activity you see. In messaging platforms, less is usually more — keep it on "new" to see just new tool calls. In the CLI, "all" gives you a satisfying live view of everything the agent does.
:::tip
On messaging platforms, sessions auto-reset after idle time (default: 120 min) or daily at 4 AM. Adjust per-platform in `~/.hermes/gateway.json` if you need longer sessions.
On messaging platforms, sessions auto-reset after idle time (default: 24 hours) or daily at 4 AM. Adjust per-platform in `~/.hermes/config.yaml` if you need longer sessions.
:::
## Security
@@ -57,19 +57,19 @@ If that is not solid yet, fix text mode first.
### CLI microphone + playback
```bash
pip install hermes-agent[voice]
pip install "hermes-agent[voice]"
```
### Messaging platforms
```bash
pip install hermes-agent[messaging]
pip install "hermes-agent[messaging]"
```
### Premium ElevenLabs TTS
```bash
pip install hermes-agent[tts-premium]
pip install "hermes-agent[tts-premium]"
```
### Local NeuTTS (optional)
@@ -81,7 +81,7 @@ python -m pip install -U neutts[all]
### Everything
```bash
pip install hermes-agent[all]
pip install "hermes-agent[all]"
```
## Step 3: install system dependencies
+54 -1
View File
@@ -66,7 +66,8 @@ Common options:
| `-q`, `--query "..."` | One-shot, non-interactive prompt. |
| `-m`, `--model <model>` | Override the model for this run. |
| `-t`, `--toolsets <csv>` | Enable a comma-separated set of toolsets. |
| `--provider <provider>` | Force a provider: `auto`, `openrouter`, `nous`, `openai-codex`, `copilot`, `copilot-acp`, `anthropic`, `zai`, `kimi-coding`, `minimax`, `minimax-cn`, `opencode-zen`, `opencode-go`, `ai-gateway`, `kilocode`, `alibaba`. |
| `--provider <provider>` | Force a provider: `auto`, `openrouter`, `nous`, `openai-codex`, `copilot`, `copilot-acp`, `anthropic`, `zai`, `kimi-coding`, `minimax`, `minimax-cn`, `kilocode`. |
| `-s`, `--skills <name>` | Preload one or more skills for the session (can be repeated or comma-separated). |
| `-v`, `--verbose` | Verbose output. |
| `-Q`, `--quiet` | Programmatic mode: suppress banner/spinner/tool previews. |
| `--resume <session>` / `--continue [name]` | Resume a session directly from `chat`. |
@@ -98,8 +99,25 @@ Use this when you want to:
- switch default providers
- log into OAuth-backed providers during model selection
- pick from provider-specific model lists
- configure a custom/self-hosted endpoint
- save the new default into config
### `/model` slash command (mid-session)
Switch models without leaving a session:
```
/model # Show current model and available options
/model claude-sonnet-4 # Switch model (auto-detects provider)
/model zai:glm-5 # Switch provider and model
/model custom:qwen-2.5 # Use model on your custom endpoint
/model custom # Auto-detect model from custom endpoint
/model custom:local:qwen-2.5 # Use a named custom provider
/model openrouter:anthropic/claude-sonnet-4 # Switch back to cloud
```
Provider and base URL changes are persisted to `config.yaml` automatically. When switching away from a custom endpoint, the stale base URL is cleared to prevent it leaking into other providers.
## `hermes gateway`
```bash
@@ -326,6 +344,41 @@ pip install -e '.[acp]'
See [ACP Editor Integration](../user-guide/features/acp.md) and [ACP Internals](../developer-guide/acp-internals.md).
## `hermes mcp`
```bash
hermes mcp <subcommand>
```
Manage MCP (Model Context Protocol) server configurations.
| Subcommand | Description |
|------------|-------------|
| `add <name> [--url URL] [--command CMD] [--args ...] [--auth oauth\|header]` | Add an MCP server with automatic tool discovery. |
| `remove <name>` (alias: `rm`) | Remove an MCP server from config. |
| `list` (alias: `ls`) | List configured MCP servers. |
| `test <name>` | Test connection to an MCP server. |
| `configure <name>` (alias: `config`) | Toggle tool selection for a server. |
See [MCP Config Reference](./mcp-config-reference.md) and [Use MCP with Hermes](../guides/use-mcp-with-hermes.md).
## `hermes plugins`
```bash
hermes plugins <subcommand>
```
Manage Hermes Agent plugins.
| Subcommand | Description |
|------------|-------------|
| `install <identifier> [--force]` | Install a plugin from a Git URL or `owner/repo`. |
| `update <name>` | Pull latest changes for an installed plugin. |
| `remove <name>` (aliases: `rm`, `uninstall`) | Remove an installed plugin. |
| `list` (alias: `ls`) | List installed plugins. |
See [Plugins](../user-guide/features/plugins.md) and [Build a Hermes Plugin](../guides/build-a-hermes-plugin.md).
## `hermes tools`
```bash
@@ -61,7 +61,7 @@ For native Anthropic auth, Hermes prefers Claude Code's own credential files whe
| Variable | Description |
|----------|-------------|
| `HERMES_INFERENCE_PROVIDER` | Override provider selection: `auto`, `openrouter`, `nous`, `openai-codex`, `copilot`, `anthropic`, `zai`, `kimi-coding`, `minimax`, `minimax-cn`, `kilocode`, `alibaba` (default: `auto`) |
| `HERMES_INFERENCE_PROVIDER` | Override provider selection: `auto`, `openrouter`, `nous`, `openai-codex`, `copilot`, `copilot-acp`, `anthropic`, `zai`, `kimi-coding`, `minimax`, `minimax-cn`, `kilocode` (default: `auto`) |
| `HERMES_PORTAL_BASE_URL` | Override Nous Portal URL (for development/testing) |
| `NOUS_INFERENCE_BASE_URL` | Override Nous inference API URL |
| `HERMES_NOUS_MIN_KEY_TTL_SECONDS` | Min agent key TTL before re-mint (default: 1800 = 30min) |
+20 -14
View File
@@ -53,7 +53,16 @@ hermes model
# Context length: 32768 ← set this to match your server's actual context window
```
Hermes persists the endpoint in `config.yaml` and prompts for the context window size so compression triggers at the right time. If you leave context length blank, Hermes auto-detects it from the server's `/models` endpoint or [models.dev](https://models.dev).
Or configure it directly in `config.yaml`:
```yaml
model:
default: qwen3.5:27b
provider: custom
base_url: http://localhost:11434/v1
```
Hermes persists the endpoint, provider, and base URL in `config.yaml` so it survives restarts. If your local server has exactly one model loaded, `/model custom` auto-detects it. You can also set `provider: custom` in config.yaml — it's a first-class provider, not an alias for anything else.
This works with Ollama, vLLM, llama.cpp server, SGLang, LocalAI, and others. See the [Configuration guide](../user-guide/configuration.md) for details.
@@ -84,7 +93,7 @@ Yes. Import the `AIAgent` class and use Hermes programmatically:
from hermes.agent import AIAgent
agent = AIAgent(model="openrouter/nous/hermes-3-llama-3.1-70b")
response = await agent.chat("Explain quantum computing briefly")
response = agent.chat("Explain quantum computing briefly")
```
See the [Python Library guide](../user-guide/features/code-execution.md) for full API usage.
@@ -166,8 +175,8 @@ curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scri
**Solution:**
```bash
# Check which keys are set
hermes config get OPENROUTER_API_KEY
# Check your configuration
hermes config show
# Re-configure your provider
hermes model
@@ -187,7 +196,7 @@ Make sure the key matches the provider. An OpenAI key won't work with OpenRouter
**Solution:**
```bash
# List available models for your provider
hermes models
hermes model
# Set a valid model
hermes config set HERMES_MODEL openrouter/nous/hermes-3-llama-3.1-70b
@@ -223,10 +232,7 @@ hermes chat --model openrouter/google/gemini-2.0-flash-001
If this happens on the first long conversation, Hermes may have the wrong context length for your model. Check what it detected:
```bash
# Look at the status bar — it shows the detected context length
/context
```
Look at the CLI startup line — it shows the detected context length (e.g., `📊 Context limit: 128000 tokens`). You can also check with `/usage` during a session.
To fix context detection, set it explicitly:
@@ -309,7 +315,7 @@ hermes gateway status
hermes gateway start
# Check logs for errors
hermes gateway logs
cat ~/.hermes/logs/gateway.log | tail -50
```
#### Messages not delivering
@@ -318,7 +324,7 @@ hermes gateway logs
**Solution:**
- Verify your bot token is valid with `hermes gateway setup`
- Check gateway logs: `hermes gateway logs`
- Check gateway logs: `cat ~/.hermes/logs/gateway.log | tail -50`
- For webhook-based platforms (Slack, WhatsApp), ensure your server is publicly accessible
#### Allowlist confusion — who can talk to the bot?
@@ -342,7 +348,7 @@ Configure in `~/.hermes/config.yaml` under your gateway's settings. See the [Mes
**Solution:**
```bash
# Install messaging dependencies
pip install hermes-agent[telegram] # or [discord], [slack], [whatsapp]
pip install "hermes-agent[telegram]" # or [discord], [slack], [whatsapp]
# Check for port conflicts
lsof -i :8080
@@ -374,8 +380,8 @@ hermes config show
# Compress the conversation to reduce tokens
/compress
# Check session token count
/stats
# Check session token usage
/usage
```
:::tip
@@ -18,8 +18,16 @@ Official optional skills live in the repository under `optional-skills/`. Instal
| Skill | Description | Path |
|-------|-------------|------|
| `base` | Query Base (Ethereum L2) blockchain data with USD pricing — wallet balances, token info, transaction details, gas analysis, contract inspection. | `blockchain/base` |
| `solana` | Query Solana blockchain data with USD pricing — wallet balances, token portfolios with values, transaction details, NFTs, whale detection, and live network stats. Uses Solana RPC + CoinGecko. No API key required. | `blockchain/solana` |
## creative
| Skill | Description | Path |
|-------|-------------|------|
| `blender-mcp` | Control Blender directly from Hermes via socket connection to the blender-mcp addon. Create 3D objects, materials, animations, and run arbitrary Blender Python. | `creative/blender-mcp` |
| `meme-generation` | Generate real meme images by picking a template and overlaying text with Pillow. Produces actual .png meme files. | `creative/meme-generation` |
## email
| Skill | Description | Path |
@@ -32,16 +40,29 @@ Official optional skills live in the repository under `optional-skills/`. Instal
|-------|-------------|------|
| `neuroskill-bci` | Connect to a running NeuroSkill instance and incorporate the user's real-time cognitive and emotional state (focus, relaxation, mood, cognitive load, drowsiness, heart rate, HRV, sleep staging, and 40+ derived EXG scores) into responses. Requires a BCI wearable (Muse 2/S or Open… | `health/neuroskill-bci` |
## mcp
| Skill | Description | Path |
|-------|-------------|------|
| `fastmcp` | Build, test, inspect, install, and deploy MCP servers with FastMCP in Python. | `mcp/fastmcp` |
## migration
| Skill | Description | Path |
|-------|-------------|------|
| `openclaw-migration` | Migrate a user's OpenClaw customization footprint into Hermes Agent. Imports Hermes-compatible memories, SOUL.md, command allowlists, user skills, and selected workspace assets from ~/.openclaw, then reports exactly what could not be migrated and why. | `migration/openclaw-migration` |
## productivity
| Skill | Description | Path |
|-------|-------------|------|
| `telephony` | Give Hermes phone capabilities — provision a Twilio number, send/receive SMS/MMS, make direct calls, and place AI-driven outbound calls through Bland.ai or Vapi. | `productivity/telephony` |
## research
| Skill | Description | Path |
|-------|-------------|------|
| `bioinformatics` | Gateway to 400+ bioinformatics skills from bioSkills and ClawBio. Covers genomics, transcriptomics, single-cell, variant calling, pharmacogenomics, metagenomics, structural biology. | `research/bioinformatics` |
| `qmd` | Search personal knowledge bases, notes, docs, and meeting transcripts locally using qmd — a hybrid retrieval engine with BM25, vector search, and LLM reranking. Supports CLI and MCP integration. | `research/qmd` |
## security
@@ -49,3 +70,5 @@ Official optional skills live in the repository under `optional-skills/`. Instal
| Skill | Description | Path |
|-------|-------------|------|
| `1password` | Set up and use 1Password CLI (op). Use when installing the CLI, enabling desktop app integration, signing in, and reading/injecting secrets for commands. | `security/1password` |
| `oss-forensics` | Supply chain investigation, evidence recovery, and forensic analysis for GitHub repositories. Covers deleted commit recovery, force-push detection, IOC extraction. | `security/oss-forensics` |
| `sherlock` | OSINT username search across 400+ social networks. Hunt down social media accounts by username. | `security/sherlock` |
+36 -1
View File
@@ -30,6 +30,14 @@ Skills for spawning and orchestrating autonomous AI coding agents and multi-agen
| `hermes-agent-spawning` | Spawn additional Hermes Agent instances as autonomous subprocesses for independent long-running tasks. Supports non-interactive one-shot mode (-q) and interactive PTY mode for multi-turn collaboration. Different from delegate_task — this runs a full separate hermes process. | `autonomous-ai-agents/hermes-agent` |
| `opencode` | Delegate coding tasks to OpenCode CLI agent for feature implementation, refactoring, PR review, and long-running autonomous sessions. Requires the opencode CLI installed and authenticated. | `autonomous-ai-agents/opencode` |
## data-science
Skills for data science workflows — interactive exploration, Jupyter notebooks, data analysis, and visualization.
| Skill | Description | Path |
|-------|-------------|------|
| `jupyter-live-kernel` | Use a live Jupyter kernel for stateful, iterative Python execution via hamelnb. Load this skill when the task involves exploration, iteration, or inspecting intermediate results. | `data-science/jupyter-live-kernel` |
## creative
Creative content generation — ASCII art, hand-drawn style diagrams, and visual design tools.
@@ -44,7 +52,8 @@ Creative content generation — ASCII art, hand-drawn style diagrams, and visual
| Skill | Description | Path |
|-------|-------------|------|
| `dogfood` | Systematic exploratory QA testing of web applications — find bugs, capture evidence, and generate structured reports | `dogfood` |
| `dogfood` | Systematic exploratory QA testing of web applications — find bugs, capture evidence, and generate structured reports. | `dogfood/dogfood` |
| `hermes-agent-setup` | Help users configure Hermes Agent — CLI usage, setup wizard, model/provider selection, tools, skills, voice/STT/TTS, gateway, and troubleshooting. | `dogfood/hermes-agent-setup` |
## email
@@ -76,6 +85,14 @@ GitHub workflow skills for managing repositories, pull requests, code reviews, i
| `github-pr-workflow` | Full pull request lifecycle — create branches, commit changes, open PRs, monitor CI status, auto-fix failures, and merge. Works with gh CLI or falls back to git + GitHub REST API via curl. | `github/github-pr-workflow` |
| `github-repo-management` | Clone, create, fork, configure, and manage GitHub repositories. Manage remotes, secrets, releases, and workflows. Works with gh CLI or falls back to git + GitHub REST API via curl. | `github/github-repo-management` |
## inference-sh
Skills for AI app execution via inference.sh cloud platform.
| Skill | Description | Path |
|-------|-------------|------|
| `inference-sh-cli` | Run 150+ AI apps via inference.sh CLI (infsh) — image generation, video creation, LLMs, search, 3D, social automation. | `inference-sh/cli` |
## leisure
| Skill | Description | Path |
@@ -102,6 +119,14 @@ Skills for working with media content — YouTube transcripts, GIF search, music
| `songsee` | Generate spectrograms and audio feature visualizations (mel, chroma, MFCC, tempogram, etc.) from audio files via CLI. Useful for audio analysis, music production debugging, and visual documentation. | `media/songsee` |
| `youtube-content` | Fetch YouTube video transcripts and transform them into structured content (chapters, summaries, threads, blog posts). | `media/youtube-content` |
## mlops
General-purpose ML operations tools — model hub management, dataset operations, and workflow orchestration.
| Skill | Description | Path |
|-------|-------------|------|
| `huggingface-hub` | Hugging Face Hub CLI (hf) — search, download, and upload models and datasets, manage repos, deploy inference endpoints. | `mlops/huggingface-hub` |
## mlops/cloud
GPU cloud providers and serverless compute platforms for ML workloads.
@@ -205,6 +230,7 @@ Skills for document creation, presentations, spreadsheets, and other productivit
| Skill | Description | Path |
|-------|-------------|------|
| `google-workspace` | Gmail, Calendar, Drive, Contacts, Sheets, and Docs integration via Python. Uses OAuth2 with automatic token refresh. No external binaries needed — runs entirely with Google's Python client libraries in the Hermes venv. | `productivity/google-workspace` |
| `linear` | Manage Linear issues, projects, and teams via the GraphQL API. Create, update, search, and organize issues. | `productivity/linear` |
| `nano-pdf` | Edit PDFs with natural-language instructions using the nano-pdf CLI. Modify text, fix typos, update titles, and make content changes to specific pages without manual editing. | `productivity/nano-pdf` |
| `notion` | Notion API for creating and managing pages, databases, and blocks via curl. Search, create, update, and query Notion workspaces directly from the terminal. | `productivity/notion` |
| `ocr-and-documents` | Extract text from PDFs and scanned documents. Use web_extract for remote URLs, pymupdf for local text-based PDFs, marker-pdf for OCR/scanned docs. For DOCX use python-docx, for PPTX see the powerpoint skill. | `productivity/ocr-and-documents` |
@@ -220,6 +246,7 @@ Skills for academic research, paper discovery, literature review, domain reconna
| `blogwatcher` | Monitor blogs and RSS/Atom feeds for updates using the blogwatcher CLI. Add blogs, scan for new articles, and track what you've read. | `research/blogwatcher` |
| `domain-intel` | Passive domain reconnaissance using Python stdlib. Subdomain discovery, SSL certificate inspection, WHOIS lookups, DNS records, domain availability checks, and bulk multi-domain analysis. No API keys required. | `research/domain-intel` |
| `duckduckgo-search` | Free web search via DuckDuckGo — text, news, images, videos. No API key needed. Use the Python DDGS library or CLI to search, then web_extract for full content. | `research/duckduckgo-search` |
| `parallel-cli` | Optional vendor skill for Parallel CLI — agent-native web search, extraction, deep research, enrichment, FindAll, and monitoring. | `research/parallel-cli` |
| `ml-paper-writing` | Write publication-ready ML/AI papers for NeurIPS, ICML, ICLR, ACL, AAAI, COLM. Use when drafting papers from research repos, structuring arguments, verifying citations, or preparing camera-ready submissions. Includes LaTeX templates, reviewer guidelines, and citation verificatio… | `research/ml-paper-writing` |
| `polymarket` | Query Polymarket prediction market data — search markets, get prices, orderbooks, and price history. Read-only via public REST APIs, no API key needed. | `research/polymarket` |
@@ -231,6 +258,14 @@ Skills for controlling smart home devices — lights, switches, sensors, and hom
|-------|-------------|------|
| `openhue` | Control Philips Hue lights, rooms, and scenes via the OpenHue CLI. Turn lights on/off, adjust brightness, color, color temperature, and activate scenes. | `smart-home/openhue` |
## social-media
Skills for interacting with social platforms — posting, reading, monitoring, and account operations.
| Skill | Description | Path |
|-------|-------------|------|
| `xitter` | Interact with X/Twitter via the x-cli terminal client using official X API credentials. | `social-media/xitter` |
## software-development
| Skill | Description | Path |
+5 -3
View File
@@ -31,6 +31,8 @@ Type `/` in the CLI to open the autocomplete menu. Built-in commands are case-in
| `/compress` | Manually compress conversation context (flush memories + summarize) |
| `/rollback` | List or restore filesystem checkpoints (usage: /rollback [number]) |
| `/stop` | Kill all running background processes |
| `/queue <prompt>` (alias: `/q`) | Queue a prompt for the next turn (doesn't interrupt the current agent response) |
| `/resume [name]` | Resume a previously-named session |
| `/statusbar` (alias: `/sb`) | Toggle the context/model status bar on or off |
| `/background <prompt>` | Run a prompt in a separate background session. The agent processes your prompt independently — your current session stays free for other work. Results appear as a panel when the task finishes. See [CLI Background Sessions](/docs/user-guide/cli#background-sessions). |
| `/plan [request]` | Load the bundled `plan` skill to write a markdown plan instead of executing the work. Plans are saved under `.hermes/plans/` relative to the active workspace/backend working directory. |
@@ -40,7 +42,7 @@ Type `/` in the CLI to open the autocomplete menu. Built-in commands are case-in
| Command | Description |
|---------|-------------|
| `/config` | Show current configuration |
| `/model` | Show or change the current model |
| `/model [model-name]` | Show or change the current model. Supports: `/model claude-sonnet-4`, `/model provider:model` (switch providers), `/model custom:model` (custom endpoint), `/model custom:name:model` (named custom provider), `/model custom` (auto-detect from endpoint) |
| `/provider` | Show available providers and current provider |
| `/prompt` | View/set custom system prompt |
| `/personality` | Set a predefined personality |
@@ -98,7 +100,7 @@ The messaging gateway supports the following built-in commands inside Telegram,
| `/reset` | Reset conversation history. |
| `/status` | Show session info. |
| `/stop` | Kill all running background processes and interrupt the running agent. |
| `/model [provider:model]` | Show or change the model, including provider switches. |
| `/model [provider:model]` | Show or change the model. Supports provider switches (`/model zai:glm-5`), custom endpoints (`/model custom:model`), named custom providers (`/model custom:local:qwen`), and auto-detect (`/model custom`). |
| `/provider` | Show provider availability and auth status. |
| `/personality [name]` | Set a personality overlay for the session. |
| `/retry` | Retry the last message. |
@@ -115,7 +117,7 @@ The messaging gateway supports the following built-in commands inside Telegram,
| `/background <prompt>` | Run a prompt in a separate background session. Results are delivered back to the same chat when the task finishes. See [Messaging Background Sessions](/docs/user-guide/messaging/#background-sessions). |
| `/plan [request]` | Load the bundled `plan` skill to write a markdown plan instead of executing the work. Plans are saved under `.hermes/plans/` relative to the active workspace/backend working directory. |
| `/reload-mcp` | Reload MCP servers from config. |
| `/approve` | Approve and execute a pending dangerous command (terminal commands flagged for review). |
| `/approve [session\|always]` | Approve and execute a pending dangerous command. `session` approves for this session only; `always` adds to permanent allowlist. |
| `/deny` | Reject a pending dangerous command. |
| `/update` | Update Hermes Agent to the latest version. |
| `/help` | Show messaging help. |

Some files were not shown because too many files have changed in this diff Show More