79799c80f576f111b92cedfcfbdbecee950cdff8
527 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
ba3c450914 |
fix(security): block read_file on project-local .env files
get_read_block_error() only blocked internal Hermes cache files but allowed reading project-local secret-bearing environment files (.env, .env.production, .env.local, etc.) through both read_file and ACP fs/read_text_file paths. Add a basename deny set for common secret-bearing .env variants. .env.example remains readable as documentation. Fixes #20734 |
||
|
|
2d422720b5 |
fix(codex): size and propagate timeouts for Responses-API requests; lower stale defaults
Codex / Responses-API requests had three latent timeout bugs that combined into the long silent hangs reported on #21444: 1. The non-stream stale-call detector estimated context tokens from ``api_kwargs["messages"]`` only. Codex / Responses-API payloads carry their conversational load in ``input`` (with ``instructions`` and ``tools``), so every Codex turn logged ``context=~0 tokens`` and the detector never applied its >50k / >100k tier bumps. 2. ``providers.<id>.request_timeout_seconds`` was silently dropped on the main Codex path. The chat_completions path and the auxiliary Codex adapter both forwarded it; the main path skipped it through three places (``build_api_kwargs``, ``ResponsesApiTransport.build_kwargs``, ``_preflight_codex_api_kwargs``). 3. The streaming stale detector had the same payload-shape bug for ``codex_responses`` requests, which route through the non-streaming detector (it's the path that emits the user-facing "No response from provider for 300s (non-streaming, ...)" warning that reporters keep pasting). This commit: - Adds ``estimate_request_context_tokens`` in ``chat_completion_helpers``, used by both the non-stream and stream detectors. Handles ``messages`` (Chat Completions), ``input + instructions + tools`` (Responses API), bare lists, and an unknown-dict fallback. - Forwards ``timeout`` through ``ResponsesApiTransport.build_kwargs`` and ``_preflight_codex_api_kwargs`` (with guards against zero/negative/inf/bool values), and wires ``_resolved_api_call_timeout()`` into the Codex branch of ``build_api_kwargs``. - Lowers the implicit non-stream stale defaults so fallback providers kick in faster when upstream stalls: * base 300s -> 90s * >50k 450s -> 150s * >100k 600s -> 240s These only apply when the user has *not* set ``providers.<id>.stale_timeout_seconds`` or ``HERMES_API_CALL_STALE_TIMEOUT``. Explicit config still wins. - Adds regression tests for the estimator shapes, the new defaults, the context-tier scaling, transport timeout pass-through, and preflight timeout pass-through / rejection of invalid values. Closes #21444 Supersedes #21652 #24126 #31855 Co-authored-by: Hoang V. Pham <26063003+hehehe0803@users.noreply.github.com> |
||
|
|
2cd952e110 |
feat(stt): add register_transcription_provider() plugin hook
Add an opt-in Python plugin surface for speech-to-text backends,
mirroring the TTS hook pattern. New backends (OpenRouter, SenseAudio,
Gemini-STT, custom proprietary engines) can be implemented as plugins
without modifying tools/transcription_tools.py.
Built-ins always win
--------------------
The 6 built-in STT providers (local/faster-whisper, local_command,
groq, openai, mistral, xai) keep their native handlers. Plugins
attempting to register under a built-in name are rejected at
registration time with a warning and re-checked defensively at
dispatch.
Resolution order
----------------
1. stt.provider matches a built-in → built-in dispatch (unchanged)
2. stt.provider matches a registered plugin →
a. if plugin.is_available() returns False → unavailability envelope
identifying the plugin (not the generic "No STT provider"
message — the user explicitly opted into this plugin)
b. otherwise plugin.transcribe() with model + language forwarded
from stt.<provider>.{model,language} config
3. No match → legacy "No STT provider available" error (unchanged)
Per-provider config namespace
-----------------------------
Plugins read their config from stt.<provider> in config.yaml, mirroring
how built-ins read stt.openai.model / stt.mistral.model. The dispatcher
forwards `model` and `language` from this section. Caller's explicit
`model=` argument overrides the config-set model.
Files
-----
- agent/transcription_provider.py: TranscriptionProvider ABC
- agent/transcription_registry.py: register/get/list providers,
built-in shadow guard, _reset_for_tests
- hermes_cli/plugins.py: register_transcription_provider() on
PluginContext
- tools/transcription_tools.py: BUILTIN_STT_PROVIDERS frozenset,
_dispatch_to_plugin_provider() with availability gate, wire-in
after xai branch and before "No STT provider" error
- tests/agent/test_transcription_registry.py: 27 tests
- tests/hermes_cli/test_plugins_transcription_registration.py: 3 tests
- tests/tools/test_transcription_plugin_dispatch.py: 28 tests
(covering built-in short-circuit, plugin dispatch, exception
envelope, non-dict guard, availability gate, language forwarding)
- tests/plugins/transcription/check_parity_vs_main.py: 10-scenario
subprocess-pinned parity harness vs origin/main
- website/docs/user-guide/features/{tts,plugins}.md: docs
Behavior parity
---------------
10 scenarios, 8 OK + 2 expected DIFFs:
no_provider_error → plugin (plugin-installed scenario)
no_provider_error → plugin_unavailable (plugin-installed-unavailable
scenario; PR returns cleaner envelope)
Zero behavior change for users not opting into a plugin.
Issue follow-up to #30398.
|
||
|
|
d7c5d5dee5 | fix: avoid persisting borrowed credential secrets (#31416) | ||
|
|
031f9c9edc |
fix(image_gen): cache xAI ephemeral URL responses to disk (#26942) (#31759)
xAI's grok-imagine-image API returns ephemeral imgen.x.ai/xai-tmp-* URLs that 404 within minutes — long before downstream consumers (Telegram send_photo, browser preview, multi-tier delivery fallback) get a chance to fetch them. The xAI image_gen provider was passing those URLs through unchanged on the elif url: branch; b64 responses were already cached locally via save_b64_image. Result: every image_generate call on a Telegram-routed xai-oauth profile delivered no image, falling through to text-only. Adds agent.image_gen_provider.save_url_image() — a sibling helper to save_b64_image that downloads URL bytes to $HERMES_HOME/cache/images/. Content-type-aware extension inference with URL-suffix fallback; oversize cap (25MB default) with partial-write cleanup; empty-body refusal. Mirrors the audio_cache pattern used by text_to_speech. Wires save_url_image into both the xAI and OpenAI providers' URL branches. When the download fails (network blip, 404 in-flight) we log a warning and fall back to the bare URL rather than turning the tool call into a hard error — the gateway's existing URL-send fallback then gets a chance to surface the original error legibly. Test plan: - tests/agent/test_save_url_image.py — 8 direct tests against a real in-process HTTP server: bytes round-trip, content-type → extension, URL-suffix fallback, default-to-png, 404 propagation, empty-body refusal, oversize cap + cleanup, filename uniqueness. - tests/plugins/image_gen/test_xai_provider.py — flip test_successful_url_response (was asserting the bug), add test_url_response_falls_back_to_bare_url_when_download_fails. - tests/plugins/image_gen/test_openai_provider.py — symmetric pair. 160/160 in the broader image_gen test surface. |
||
|
|
00ec0b617c |
feat(tts): add register_tts_provider() plugin hook (closes #30398)
Adds a `TTSProvider(ABC)` + `register_tts_provider()` extension point to the plugin context API, **alongside** the existing config-driven `tts.providers.<name>: type: command` registry from PR #17843. This is additive — the command-provider surface stays as the primary way to add a TTS backend. The hook covers cases the shell-template grammar can't reasonably express: - Native Python SDKs without a CLI (Cartesia, Fish Audio, etc.) - Streaming synthesis (chunked Opus → voice-bubble delivery) - Voice metadata API for the `hermes tools` picker - OAuth-refreshing auth flows None of the 10 inline built-in providers (`edge`, `openai`, `elevenlabs`, `minimax`, `gemini`, `mistral`, `xai`, `piper`, `kittentts`, `neutts`) are migrated to plugins. They stay inline. The hook is for *new* engines that aren't built-in. ## Resolution order The dispatcher's resolution order is the load-bearing invariant: 1. `tts.provider` is a built-in name → built-in dispatch. **Always wins.** 2. `tts.provider` matches `tts.providers.<name>` with `command:` set → command-provider dispatch (PR #17843). 3. `tts.provider` matches a plugin-registered `TTSProvider` → plugin dispatch (new). 4. No match → falls through to Edge TTS default (legacy behavior). Built-ins-always-win is enforced at THREE layers: - Registry: `register_provider()` rejects shadowing names with a warning. - Dispatcher: `_dispatch_to_plugin_provider()` short-circuits built-in names defensively before consulting the registry. - Picker: `_plugin_tts_providers()` filters built-in shadows out of the `hermes tools` row list defensively. Command-providers-win-over-plugins is enforced at TWO layers: - The caller in `text_to_speech_tool` checks `_resolve_command_provider_config` first. - `_dispatch_to_plugin_provider` re-checks for a same-name command config defensively so a refactor of the caller can't silently break the invariant. ## New files - `agent/tts_provider.py` — `TTSProvider(ABC)` with `synthesize()` (required), `list_voices()`, `list_models()`, `get_setup_schema()`, `stream()`, `voice_compatible` (all optional with sane defaults). Mirrors `agent/image_gen_provider.py` shape. - `agent/tts_registry.py` — `register_provider`/`get_provider`/`list_providers` with `_BUILTIN_NAMES` reject-shadowing invariant. Mirrors `agent/image_gen_registry.py` shape. - `plugins/tts/...` directory ready for community plugins (none shipped). ## Modified files - `hermes_cli/plugins.py` — `register_tts_provider()` method on `PluginContext`. Matches the gating shape of `register_image_gen_provider()` / `register_browser_provider()`. - `tools/tts_tool.py` — `_dispatch_to_plugin_provider()` + `_plugin_provider_is_voice_compatible()` + walrus-elif wiring into the main dispatcher. Built-in elif chain untouched. - `hermes_cli/tools_config.py` — `_plugin_tts_providers()` injects plugin rows into the Text-to-Speech picker category alongside the 10 hardcoded built-in rows. ## Tests - `tests/agent/test_tts_registry.py` — 47 tests covering registration, lookup, ABC contract, helpers, AND a `TestBuiltinSync` regression test that fails if `agent.tts_registry._BUILTIN_NAMES` drifts from `tools.tts_tool.BUILTIN_TTS_PROVIDERS` (kept duplicated due to circular import constraints). - `tests/tools/test_tts_plugin_dispatch.py` — 35 tests covering built-in-always-wins, command-wins-over-plugin, plugin dispatch, exception passthrough, voice_compatible helper. - `tests/hermes_cli/test_tts_picker.py` — 10 tests covering the picker surface, builtin shadowing defense, integration with `_visible_providers`. - `tests/hermes_cli/test_plugins_tts_registration.py` — 3 end-to-end tests via `PluginManager.discover_and_load()`. - `tests/plugins/tts/check_parity_vs_main.py` — 9-scenario subprocess parity harness vs `origin/main`. The only intentional diff is `fallback_edge → plugin` for the `plugin-installed` scenario. ## Verification - 95/95 new tests pass. - 170/170 pre-existing TTS tests (test_tts_command_providers, test_tts_max_text_length, test_tts_speed, etc.) pass unchanged. - Parity harness against `origin/main`: 8 OK + 1 expected DIFF. - E2E smoke: a registered plugin's `synthesize()` is called via `text_to_speech_tool` with the standard JSON envelope returned. - Ruff clean on all touched files. ## Docs - `website/docs/user-guide/features/tts.md` — new "Python plugin providers" section with a decision table (command-provider vs plugin), minimal plugin example, and the optional-hook reference. - `website/docs/user-guide/features/plugins.md` — TTS row updated to mention both surfaces (command-provider primary, plugin for SDK/streaming). Closes #30398 |
||
|
|
223a3971c0 |
fix(security): close TOCTOU window when saving Claude Code OAuth credentials (#21152)
_write_claude_code_credentials wrote ~/.claude/.credentials.json via Path.write_text + replace + post-write chmod(0o600). Both the temp file and the destination briefly inherited the process umask (commonly 0o644 = world-readable) between create/replace and chmod, exposing the OAuth access/refresh tokens to other local users on multi-user hosts. Use os.open with O_WRONLY|O_CREAT|O_EXCL and an explicit S_IRUSR|S_IWUSR mode so the temp file is created atomically at 0o600. After os.replace, the destination inherits the temp's mode, so the post-write chmod is no longer needed. The temp name also gains a per-process random suffix to avoid collisions between concurrent writers and stale leftovers from a crashed prior write. Parent dir (~/.claude/) is owned by Claude Code itself and shared with its native auth, so we deliberately don't tighten its mode here (unlike the mcp_oauth fix which owns its own subtree under HERMES_HOME). Mirrors the fix shipped for agent/google_oauth.py in #19673 and the parallel fix for tools/mcp_oauth.py in #21148. Adds a regression test in TestWriteClaudeCodeCredentials asserting the resulting file mode is 0o600 (skipped on Windows where POSIX mode bits aren't enforced). |
||
|
|
bba76f3dcd | fix(file-safety): deny reads of Google OAuth tokens (#30972) | ||
|
|
514f5020c7 | fix(debug): redact BlueBubbles webhook secrets | ||
|
|
eea9553a9c |
fix(anthropic): skip mcp_ prefix on outgoing tool schemas when already prefixed
Companion to the GH-25255 incoming-strip fix from @hayka-pacha. Without this, build_anthropic_kwargs unconditionally added 'mcp_' to every tool name in step 3, so a native MCP server tool registered as 'mcp_composio_X' was sent as 'mcp_mcp_composio_X' on the wire. The incoming strip only removes ONE prefix, which still worked on first call, but on subsequent calls the model pattern-matched the single-prefixed form from message history and produced names that stripped to 'composio_X' — registry miss, dispatch fail. The history-rewrite block (#4) already has this guard. Apply the same guard to the schema-rewrite block (#3) so round-trip is symmetric. Added 4 outgoing-side tests. Existing 7 incoming-side tests still pass. Author map: hayka-pacha added for PR #25270 salvage attribution. Refs GH-25255. |
||
|
|
2f91a8406c |
fix(agent): only strip mcp_ prefix for OAuth-injected tools (GH-25255)
When strip_tool_prefix=True (Anthropic OAuth path), normalize_response unconditionally stripped the mcp_ prefix from ALL tool names starting with mcp_. This broke Hermes-native MCP server tools (registered under their full mcp_<server>_<tool> name in the registry) because the stripped name doesn't match any registry entry. Fix: check the tool registry before stripping. Only strip when: - The stripped name EXISTS in the registry (OAuth-injected tool) - The full name does NOT exist in the registry This preserves backward compatibility for OAuth-injected tools while protecting native MCP server tools from incorrect prefix removal. 7 new tests covering: OAuth strip, native preserve, no-flag, non-mcp, unknown tools, mixed responses, and dual-registration edge case. Signed-off-by: HKPA <hayka-pacha@users.noreply.github.com> |
||
|
|
6212e9ade8 |
fix(error-classifier): treat 5xx request-validation errors as non-retryable
Standard OpenAI returns request-validation failures (unknown/ unsupported parameter, malformed request) as 4xx. Some OpenAI-compatible gateways return them as 5xx instead — codex.nekos.me returns 502 for an unknown parameter. The generic '5xx -> retryable server_error' rule then misfires: the error is deterministic (every retry gets the identical rejection), so the retry loop burns all 3 attempts, the transport-recovery path resets the counter and burns 3 more, and the result is a request flood against a request that can never succeed. Fix: when a 500/502 body carries an unambiguous request-validation signal — 'unknown parameter' / 'unsupported parameter' / 'invalid_request_error' in the message text, or invalid_request_error / unknown_parameter / unsupported_parameter as the structured error code — classify as a non-retryable format_error so the loop fails fast and falls back. Genuine 502 Bad Gateway with no such signal stays retryable as before. Origin: local-author Upstream-PR: none Patch-State: local-only |
||
|
|
775a17284f |
fix(transport): strip Hermes-internal scaffolding keys before chat.completions
The empty-response recovery path in run_agent.py appends synthetic
messages tagged with _empty_recovery_synthetic (and the agent loop uses
_thinking_prefill / _empty_terminal_sentinel similarly). These are
internal bookkeeping markers — they must never reach the wire.
chat_completions' convert_messages only stripped Codex Responses leak
fields (codex_reasoning_items, call_id, etc.), not these _-prefixed
markers. Permissive providers (real OpenAI, Anthropic) silently ignore
unknown message keys so the bug stayed hidden, but strict
OpenAI-compatible gateways reject them outright. Observed against
codex.nekos.me:
502: [ObjectParam] [input[617]._empty_recovery_synthetic]
[unknown_parameter] Unknown parameter:
'_empty_recovery_synthetic'
Because the synthetic messages persist in the session, every
subsequent request in that session carries the poisoned key and
fails identically — a deterministic 502 the retry loop mistakes for
a transient server error.
Fix: convert_messages now drops any top-level message key starting
with '_'. OpenAI's message schema has no '_'-prefixed fields, so this
is safe and future-proofs against new internal markers.
Origin: local-author
Upstream-PR: none
Patch-State: local-only
|
||
|
|
3d66787a04 |
fix(vision): route auxiliary.vision.provider=openai to api.openai.com, skip text-only main (#31452)
* fix(vision): route auxiliary.vision.provider=openai to api.openai.com, skip text-only main for vision Fixes #31179. Three coupled fixes so a configured aux vision backend actually serves vision tasks instead of silently routing images to the user's main provider: 1. agent/auxiliary_client.py: `auxiliary.<task>.provider: openai` resolves to `custom` + `https://api.openai.com/v1`. "openai" was not in PROVIDER_REGISTRY (we have `openai-codex` for OAuth and `custom` for manual base_url), so the obvious config name silently failed to build a client. User-supplied base_url is still preserved; only the provider name normalises to `custom` so resolution doesn't hit the PROVIDER_REGISTRY-only path. 2. agent/auxiliary_client.py: the vision auto-detect chain now skips the user's main provider when models.dev reports `supports_vision=False`. Without this guard, a misconfigured aux provider would fall back to `auto`, which happily returned the main-provider client. The caller would then send image content to e.g. api.deepseek.com with model `gpt-4o-mini` and get a cryptic `unknown variant 'image_url', expected 'text'` from the provider's parser. 3. tools/vision_tools.py + tools/browser_tool.py: `check_vision_requirements` now mirrors the runtime fallback chain (explicit provider, then auto), so `vision_analyze` shows up whenever vision is actually serviceable. `browser_vision` gets a new `check_browser_vision_requirements` check_fn that AND-gates browser + vision availability, so it doesn't get advertised to the model when the call would fail at runtime. Reproduction (config from the bug report): model.provider: deepseek model.default: deepseek-v4-pro auxiliary.vision.provider: openai auxiliary.vision.model: gpt-4o-mini Before: resolve_vision_provider_client() returns None for the explicit provider, fallback auto returns the deepseek client with model='gpt-4o-mini', image hits api.deepseek.com → 'unknown variant image_url'. vision_analyze hidden from tool list; browser_vision exposed but fails at call time. After: resolves to custom + api.openai.com/v1 with model gpt-4o-mini. vision_analyze and browser_vision both gate correctly on capability. Tests: tests/agent/test_vision_routing_31179.py covers all three fixes (12 cases including the user's exact scenario, base_url preservation, text-only-main skip, capability-unknown permissive fallback, and tool gating parity). Existing 382 tests across auxiliary/vision/image_routing suites still pass. * test(vision): use exact hostname check to silence CodeQL substring-sanitization alert * fix(auxiliary): drop model name from vision-skip debug log to silence CodeQL The new `logger.debug(...)` added in the previous commit interpolated both `main_provider` and `vision_model` (a public model slug \u2014 not sensitive). CodeQL's `py/clear-text-logging-sensitive-data` heuristic re-flagged it twice because the rule mis-detects multi-value interpolations near tainted-via-config provider strings. Drop the model from the log args (provider alone is enough to diagnose the skip; the same sibling branch a few lines up already logs provider only). Behavior unchanged; CodeQL false positive cleared. |
||
|
|
d3c167b644 |
fix(profiles): cross-profile soft guard on file-write tools + system-prompt hint (#31290)
* fix(profiles): cross-profile soft guard on file-write tools + system-prompt hint
Adds a soft guard so an agent running under one Hermes profile cannot
silently edit a different profile's skills/plugins/cron/memories.
Three layers:
A. agent/file_safety.classify_cross_profile_target
Classifies a write target against the active HERMES_HOME. Returns
a {active_profile, target_profile, area, target_path} dict when the
path lands in another profile's scoped area. PROFILE_SCOPED_AREAS =
(skills, plugins, cron, memories). get_cross_profile_warning()
wraps it into a model-facing error string that names both profiles,
names the area, and points at the cross_profile=True bypass.
Defense-in-depth, NOT a security boundary — the terminal tool runs
as the same OS user and can write any of these paths directly. The
guard exists to prevent confused-agent corruption, not to stop a
determined attacker. SECURITY.md §3.2 (terminal-bypass posture)
still applies.
Wired into tools/file_tools.write_file_tool and patch_tool with a
cross_profile=False kwarg. WRITE_FILE_SCHEMA and PATCH_SCHEMA both
advertise cross_profile so the model can pass it after explicit
user direction. patch_tool extracts target paths from V4A patch
bodies before checking (same shape as the existing sensitive-path
check).
skill_manage is already scoped to the active profile's SKILLS_DIR
by construction, so no extra guard wiring is needed there. The
D-side error message (below) still names other profiles when the
skill exists elsewhere.
B. agent/system_prompt
One deterministic line near the environment-hints block names the
active profile and tells the model not to modify another profile's
skills/plugins/cron/memories without explicit direction. Profile
name is stable for the lifetime of the AIAgent, so the line is
prompt-cache-safe.
D. tools/skill_manager_tool._skill_not_found_error
Replaces the bare "Skill 'X' not found." with a message that:
- names the active profile,
- searches OTHER profiles' skills dirs for the same name,
- names the profile(s) where the skill exists and the path,
- suggests `hermes -p <name>` to switch profiles, or
cross_profile=True for an explicit edit.
All 5 "not found" sites in skill_manager_tool (edit, patch, delete,
write_file, remove_file) now go through the helper.
Reference incident (May 2026): a hermes-security profile session
edited skills under both ~/.hermes/profiles/hermes-security/skills/
AND ~/.hermes/skills/ (the default profile's skills) without
realizing the second path belonged to a different profile. Three of
the four skill files needed manual restoration afterward.
What this PR does NOT do:
* No hard block. The terminal tool can still touch any of these
paths with no guard — same posture as the dangerous-command
approval flow. SECURITY.md §3.2 applies.
* No regex sweep on terminal commands for cross-profile paths.
That direction is a Skills-Guard-style arms race (cd + relative
paths, base64, etc.) and would false-positive on legitimate
cross-profile reads. Filed as a follow-up.
* No on-disk path migration. ~/.hermes/skills/ remains the
default profile's skills dir; this PR is about telling the
agent about that boundary, not changing the layout.
Tests:
tests/agent/test_file_safety_cross_profile.py (16 tests)
- _resolve_active_profile_name covers default/named/failure paths
- classify_cross_profile_target covers all four scoped areas,
both directions (default → named, named → default, named → named),
non-Hermes paths, and root-level config files
- get_cross_profile_warning covers in-profile no-op, cross-profile
message shape, and the defense-in-depth self-documentation
tests/tools/test_cross_profile_guard.py (12 tests)
- write_file: in-profile allow, cross-profile block, cross_profile=True
bypass, non-Hermes pass-through
- patch: replace-mode block, cross_profile=True bypass, V4A patch
path extraction
- skill_manage: error names the other profile (single + multiple),
missing-everywhere falls back to skills_list hint
- system prompt: contract-level checks (both branches present,
cross_profile=True mentioned, ~/.hermes/profiles/ referenced)
All 207 existing tests in file_safety/file_operations/skill_manager
still pass. 10 system-prompt tests still pass.
E2E verified: the exact incident scenario (security profile editing
default's hermes-agent-dev skill) is now blocked with the warning
message; cross_profile=True unblocks.
* fix(code_execution): add cross_profile to write_file/patch stubs
The cross_profile kwarg added to write_file_tool/patch_tool needs to
flow through the execute_code sandbox stubs in _TOOL_STUBS so the
test_stubs_cover_all_schema_params drift test passes. Without this,
scripts running inside execute_code couldn't pass cross_profile=True
through hermes_tools.write_file().
Caught by CI on PR #31290.
|
||
|
|
2b10024ee8 |
test(display): cover failure-suffix rendering + update scrollback test
The original PR #17194 description claimed test_display_tool_preview.py but only ever shipped test_display_todo_progress.py. Add the missing coverage for the failure-suffix path: - _trim_error: whitespace strip, length cap, File-not-found path collapse - _detect_tool_failure: terminal exit codes, memory full, structured {error}/{message} extraction, malformed JSON, None result - get_cute_tool_message E2E: read_file failure, terminal exit-only, terminal stderr message, memory full, success path, no-result path Also update test_tool_progress_scrollback.test_error_suffix_on_failed_tool to reflect the new behavior: the generic '[error]' fallback in cli.py has been removed; failure suffixes now come from the result-aware _detect_tool_failure (e.g. '[exit 1]', '[File not found: x]'). |
||
|
|
ffde8b7b09 |
feat(cli): show todo progress as done/total fraction
Parse the todo_tool result summary to display completion progress in CLI tool preview lines: Read: ┊ 📋 plan 3/4 task(s) 0.5s Update: ┊ 📋 plan update 3/4 ✓ 0.5s Create: falls back to plain count when no completed tasks Falls back gracefully to the existing 'N task(s)' format when the result is missing, malformed, or has no completed items. Originally proposed in PR #17194 by Albert.Zhou; salvaged onto current main. Co-authored-by: Albert.Zhou <albert748@gmail.com> |
||
|
|
83f6a83b24 | fix(tui): handle images with codex app-server | ||
|
|
e752c9454e |
feat(plugins): add register_auxiliary_task() to PluginContext API
Auxiliary LLM tasks (vision, compression, web_extract, etc.) currently
require modifications to core files for any plugin that needs its own
task slot — specifically the _AUX_TASKS list in hermes_cli/main.py and
the hardcoded env-var bridging dict in gateway/run.py. This violates
the 'plugins must not modify core files' rule and forces every memory
or context plugin that wants its own auxiliary task to either fork
core or open a coupled core+plugin PR.
This change adds a generic plugin surface for auxiliary task
registration:
ctx.register_auxiliary_task(
key='memory_retain_filter',
display_name='Memory retain filter',
description='hindsight pre-retain dedup/extract',
defaults={'timeout': 30, 'extra_body': {'reasoning_effort': 'low'}},
)
After registration, the task automatically:
- Appears in 'hermes model → Configure auxiliary models' picker via
a new _all_aux_tasks() merge of built-in + plugin tasks
- Has its provider/model/base_url/api_key bridged from config.yaml
to AUXILIARY_<KEY_UPPER>_* env vars at gateway startup
(gateway/run.py now uses a dynamic bridged-keys set instead of
a hardcoded per-task dict)
- Gets plugin-declared defaults (timeout, extra_body, etc.) layered
underneath user config so unconfigured plugin tasks still work
(agent/auxiliary_client._get_auxiliary_task_config)
- Resets to auto via 'Reset all to auto' alongside built-ins
Validation:
- Rejects shadowing of built-in keys (vision, compression, etc.)
- Rejects invalid key shapes (must match [A-Za-z0-9_]+)
- Rejects cross-plugin collisions (clear error)
- Allows same-plugin re-registration (idempotent updates)
Plugin discovery failures (rare) fall back gracefully — the aux
config UI still shows built-in tasks if get_plugin_auxiliary_tasks()
raises, and gateway env-var bridging keeps working for built-ins.
Built-in tasks remain hardcoded in _AUX_TASKS for stability — they're
the baseline UX, and DEFAULT_CONFIG already ships their defaults.
Plugin tasks layer on top.
Tests: 15 new tests in test_plugin_auxiliary_tasks.py covering API
validation, manager state lifecycle, helper sort order, _all_aux_tasks
merge semantics, _reset_aux_to_auto inclusion of plugin tasks, and
default-layering in auxiliary_client.
Updates the gateway-bridge code-parity test (test_auxiliary_config_bridge)
to assert the new dynamic shape rather than the hardcoded literal env
var names which no longer appear post-refactor.
Motivation: this unblocks PR #20262 (hindsight smart retain pipeline)
and similar plugins that need a dedicated aux task slot. The change
is non-breaking — built-in env vars (AUXILIARY_VISION_PROVIDER, etc.)
keep working since they're produced by the same f-string template
that built the hardcoded names.
|
||
|
|
8b2adead78 | fix(compressor): ABC compliance — total_tokens, api_mode, logger consistency | ||
|
|
71291d83cd | test: keep tirith checks hermetic | ||
|
|
97e975edd2 |
fix(file-safety): widen read-deny to .env, mcp-tokens/, webhook secrets, root
Extends @briandevans's PR #17659 from {auth.json, auth.lock, .anthropic_oauth.json} to also cover: - HERMES_HOME/.env (provider API keys) - HERMES_HOME/webhook_subscriptions.json (per-route HMAC secrets) - HERMES_HOME/mcp-tokens/ (OAuth token directory; dir + everything inside) …AND iterates over both _hermes_home_path() AND _hermes_root_path() so profile-mode runs (HERMES_HOME = <root>/profiles/<name>) also block <root>/{auth.json, .env, mcp-tokens/, ...}. Same widening shape as the write-deny side already does (#15981, #14157). Explicitly NOT a security boundary. Per the personal-assistant trust model, the terminal tool runs as the same OS user and can `cat auth.json` directly. This read-deny exists as defense-in-depth: - Models that respect tool denials empirically tend to stop rather than reach for the shell. - The denial surfaces an audit trail when something tries to read credentials — easier to spot in logs than a generic `cat`. Docstring + error message both flag this as defense-in-depth so future contributors don't mistake it for a real security boundary and don't re-decline reports that propose the same fix shape. Absorbs the .env and mcp-tokens/ coverage from @tomqiaozc's parallel PR #8055 (closed-as-duplicate, credited). Co-authored-by: Tom Qiao <zqiao@microsoft.com> |
||
|
|
567ea61298 |
fix(file-safety): block auth.json read via TERMINAL_CWD relative path
read_file_tool resolves relative paths against TERMINAL_CWD (or the task's live terminal cwd), but the prior call passed the original unresolved string to get_read_block_error. That function's own resolve() is anchored at the Python process cwd, so when a task's TERMINAL_CWD pointed at HERMES_HOME and the agent issued read_file on the relative path "auth.json", the credential-store denylist was never reached and the file was read normally. Pass the already-resolved absolute path string at the file_tools call site, document the contract on get_read_block_error, and add a read_file_tool-level regression test that pins the relative-path case under TERMINAL_CWD == HERMES_HOME. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
056e00a77e |
fix(file-safety): block read_file on HERMES_HOME credential stores (#17656)
`get_read_block_error` previously only denied reads inside
`${HERMES_HOME}/skills/.hub`, which left `auth.json` (provider OAuth
state + plaintext API keys) and `.anthropic_oauth.json` (Anthropic PKCE
tokens) directly readable by the agent. A prompt-injection reaching
`read_file` could exfiltrate active provider credentials in plaintext.
Mode-0600 file permissions only protect against *other Unix users* —
the agent runs as the file's owner, so `read_file` is unaffected.
Extend the existing deny list with the three credential paths
identified in #17656 (`auth.json`, `auth.lock`, `.anthropic_oauth.json`).
The check uses the same `Path.resolve()` pattern as `skills/.hub`, so
symlink/path-traversal indirection is caught too. The agent doesn't
need to read these directly — `auxiliary_client` and `credential_pool`
consume them through process env / OAuth flows that bypass `read_file`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
09afafb87e | fix(xai): resolve Grok Build context for OAuth | ||
|
|
c769be344a |
fix(agent): recover from providers rejecting list-type tool content (#27344) (#30259)
Some providers (Xiaomi MiMo, some Alibaba endpoints, a long tail of OpenAI-compatible servers) follow the OpenAI spec strictly and require tool message `content` to be a string — they reject our list-type content (text + image_url parts) with HTTP 400 'text is not set' / 'tool message content must be a string'. Instead of an allowlist of known-good providers (maintenance burden, guaranteed to miss aggregators like OpenRouter where the underlying model determines support, not the aggregator name), this lands a reactive recovery: 1. New `FailoverReason.multimodal_tool_content_unsupported` with a small pattern list covering the common 400 wordings. 2. `AIAgent._try_strip_image_parts_from_tool_messages` walks the API message list, downgrades any `role:tool` message whose content is list-with-image to a plain text summary (preserves text parts) in place, AND records the active (provider, model) in a session-scoped `_no_list_tool_content_models` set. 3. `_tool_result_content_for_active_model` short-circuits to a text summary when (provider, model) is in the cache — so after the first 400 + retry, subsequent screenshots in the same session skip the round trip entirely. 4. Retry hook in `agent.conversation_loop` mirrors the existing `image_too_large` recovery: detect the reason, run the helper, retry once, fall through to the normal error path if no list-type tool content was actually present. Cache is transient (per-session) by design — next session retries in case the provider added support, no persistent state to maintain. Fixes #27344. Closes #27351 (allowlist approach superseded by reactive recovery). |
||
|
|
e77f1ed5f7 |
fix(agent): widen toolset gate to context engine tools (#5544 sibling)
The memory-provider gate added in the prior commit closes one of two blind-injection sites in agent_init.py. The context engine block (lines ~1445) follows the identical pattern: agent.context_compressor.get_tool_schemas() (lcm_grep, lcm_describe, lcm_expand) was appended to agent.tools unconditionally, ignoring enabled_toolsets. Same bug class, same local-model latency penalty, same one-line gate — using 'context_engine' as the toolset name (matches the existing plugin-system convention in plugins.py, plugins_cmd.py, etc.). Also adds Lempkey to scripts/release.py AUTHOR_MAP for the prior commit's authorship. |
||
|
|
4c61fb6cf6 |
fix(agent): gate memory tool injection on enabled_toolsets (#5544)
MemoryManager.get_all_tool_schemas() output was appended to AIAgent.tools unconditionally — bypassing the enabled_toolsets / platform_toolsets filter. Setting `platform_toolsets: telegram: []` had no effect: fact_store and other memory provider tools still leaked into the tool surface on every session. Impact on local models (per @thundercat49's benchmarks on Qwen3-30B-A3B Q4_K_M / RTX 3090): tool-formatted prompts process at 134 tok/s vs 1,230 tok/s for plain text. With 8 memory tool schemas injected, a simple 'hello' on Telegram took ~42s instead of ~1.7s. Small models also entered tool-call loops when memory tools were the only tools present. Gate condition (matches the natural meaning of enabled_toolsets): None → no filter, inject (backward compat) contains 'memory' → user opted in, inject otherwise (including []) → skip injection Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com> |
||
|
|
9896e43db5 |
fix(skills): load Linux-tagged skills on Termux (android sys.platform)
Reported by @LikiusInik in Discord: on Termux only 3 built-in skills
appeared and /gh-pr-workflow + every other slash-skill from
github/productivity/mlops was missing.
Root cause: skill_matches_platform() compares sys.platform.startswith()
against the skill's platforms list. Termux is a Linux userland on
Android, but Python 3.13+ reports sys.platform == "android" instead of
"linux" — so the ~60 built-in skills tagged platforms:[linux,macos,
windows] (github-pr-workflow, google-workspace, github-auth,
huggingface-hub, etc.) all got filtered out at the listing step in
tools/skills_tool.py:_find_all_skills and never appeared as /slash
commands or in skill_view.
Fix: when is_termux() detects we're running inside Termux, accept
"linux" platform tags regardless of whether sys.platform is "linux"
(pre-3.13) or "android" (3.13+). Also accept explicit
platforms:[termux] / [android] tags. macOS-only and Windows-only
skills correctly remain excluded.
E2E (simulated TERMUX_VERSION=set + sys.platform="android"):
Before: _find_all_skills() returned ~3 skills.
After: _find_all_skills() returns 84 skills including
github-pr-workflow, google-workspace, github-auth,
huggingface-hub. Apple-only skills remain excluded.
Non-Termux Linux/macOS/Windows behavior unchanged (verified).
Tests: tests/agent/test_skill_utils.py — 9 new cases covering
android-as-Termux, the [linux,macos,windows] case, macOS-only
exclusion, explicit termux/android tags, non-Termux Android safety,
and unchanged behavior on real Linux/macOS.
|
||
|
|
3fde8c153d |
fix(skills): prune dependency/venv dirs from all skill scanners (#30042)
* fix(skills): skip dependency dirs in skill scan * fix(skills): widen sibling rglob scanners to use shared exclusion set Follow-up to PR #29968. The contributor's PR widened EXCLUDED_SKILL_DIRS in the canonical walker (iter_skill_index_files), which fixes the user-visible discovery path. This commit sweeps the ~12 other rglob('SKILL.md') sites that did their own ad-hoc filtering — most only checked .git/.hub, some had no filter at all — so dependency dirs (.venv, node_modules, site-packages, etc.) cannot leak ghost skills through the secondary paths. Adds agent.skill_utils.is_excluded_skill_path(path) helper. Migrates all 13 sites to use it. Removes 3 hardcoded duplicate filter sets. Sites touched: agent/curator_backup.py - skill backup file count gateway/run.py - disabled-skill response (2 sites) hermes_cli/dump.py - skill count in env dump hermes_cli/profile_describer.py- profile description (2 sites) hermes_cli/profile_distribution.py - profile install count hermes_cli/profiles.py - profile skill count hermes_cli/skills_hub.py - category detection tools/skill_manager_tool.py - skill name lookup (already used set, now uses helper) tools/skill_usage.py - usage tracking + skill dir lookup (2 sites) tools/skills_hub.py - optional skills find + scan (2 sites) tools/skills_sync.py - bundled skills sync E2E verified with the exact reported shape (bring/scripts/.venv/.../typer/.agents/skills/typer/SKILL.md): no sibling site picks up the ghost skill, all five legit-skill counts still return 1. * chore(infographic): retro-pop-grid bento for PR #30042 skill-scanner sweep --------- Co-authored-by: helix4u <4317663+helix4u@users.noreply.github.com> |
||
|
|
ba9964ff0d |
fix(custom): pass custom provider extra body
Allow custom OpenAI-compatible providers declared under `custom_providers:`
to set provider-specific `extra_body` fields and have Hermes merge them into
chat-completions requests when the matching custom endpoint is active.
This is a manual per-provider override rather than a model-name heuristic.
OpenAI-compatible Gemma thinking support is real, but the on-wire payload
shape is backend-specific: some servers want top-level `enable_thinking`,
while vLLM Gemma and NIM-style endpoints expect `chat_template_kwargs`.
A per-provider override is safer than picking one assumed payload.
Example config:
```yaml
custom_providers:
- name: gemma-local
base_url: http://localhost:8080/v1
model: google/gemma-4-31b-it
extra_body:
enable_thinking: true
reasoning_effort: high
```
For vLLM Gemma or NIM-style endpoints, use the nested shape those servers
expect:
```yaml
extra_body:
chat_template_kwargs:
enable_thinking: true
```
Changes:
- `hermes_cli/config.py`: preserve `extra_body` in normalized
`custom_providers:` entries and allow it in the validated field set.
- `hermes_cli/runtime_provider.py`: propagate custom-provider `extra_body`
as `request_overrides.extra_body` for named custom runtime resolution,
including credential-pool paths.
- `agent/agent_init.py`: at agent init, locate the matching custom-provider
entry by `base_url` (+ optional model) and merge its `extra_body` into
`AIAgent.request_overrides`, with caller-provided overrides winning on
conflicting top-level keys.
- `plugins/model-providers/custom/__init__.py`: keep existing CustomProfile
behavior (Ollama `num_ctx`, `think=False` when reasoning disabled);
user-configured `extra_body` flows through `request_overrides`.
- `website/docs/integrations/providers.md`: document the explicit
`extra_body` override and the vLLM/Gemma `chat_template_kwargs` variant.
- Tests cover config normalization, runtime propagation, model matching,
trailing-slash equivalence, fallback when no `model` field is set, and
caller-override merging precedence.
Verified end-to-end against `CustomProfile` via `ChatCompletionsTransport`:
configured `extra_body` reaches `kwargs.extra_body` on the wire request,
and coexists with profile-generated entries (Ollama `num_ctx`, `think=False`)
without clobber.
Salvaged from #29022 onto current `main`. Cosmetic typing edit in
`plugins/model-providers/custom/__init__.py` and a stale-base docs revert
in `providers.md` were dropped during cherry-pick.
Closes #29022
|
||
|
|
48be2e0e4d |
test: use subprocesses for each test file (#29016)
* ci(tests): install ripgrep from prebuilt tarball instead of apt
apt-get update + install of ripgrep takes ~4 min on the GHA Ubuntu
runners (the apt-get update against archive.ubuntu.com is the slow
part; ripgrep itself is small). Switching to the upstream musl
binary tarball cuts the step to a few seconds.
- Pinned to ripgrep 15.1.0 with sha256 verification (same hash as
published in the releases sha256 sidecar file).
- Drops the `rg` binary into /usr/local/bin so it is on PATH for
every subsequent step without GITHUB_PATH manipulation.
- Applied to both the test and e2e jobs in tests.yml.
* fix(cli): compile syntax check to tempdir, not source __pycache__
`_validate_critical_files_syntax` runs `py_compile.compile()` on each
critical bootstrap file after a successful `git pull`. The default
`py_compile` writes the resulting `.pyc` next to the source under
`__pycache__/`, which causes two real problems:
1. Parallel test workers walking the same source tree (e.g. running
the suite under per-file process isolation) can race against each
other on the `__pycache__` write — manifests as flaky 'directory
not empty' errors during teardown.
2. In production, the post-pull syntax check leaves a `.pyc` behind
that the next interpreter run might pick up — fine when the
interpreter version matches, sketchy if it doesn't.
Fix: write the compiled output to a `tempfile.TemporaryDirectory()`
that's discarded on function exit. We only care about the compile-or-not
signal, not the artifact.
* test(runner): per-file process isolation, drop manual state reset + xdist
Replace fragile manual _reset_module_state test fixtures with robust
per-file subprocess isolation. Each test file runs in a fresh
`python -m pytest <file>` subprocess via ThreadPoolExecutor. No xdist,
no custom pytest plugin, no shared worker state.
Key changes:
* scripts/run_tests_parallel.py — new runner: discovers test files,
runs N in parallel via ThreadPoolExecutor, captures stdout per file,
treats exit code 5 (no tests collected) as pass, kills all children
on exit. Change from cpu_count to cpu_count*2. The runner is
I/O-bound (waiting on subprocess.communicate() from pytest children)
The parent process does almost no CPU work, so 2x oversubscription
keeps more pipes full. When a file fails, immediately show the last
30 lines of pytest output (stack traces + FAILED summary) plus a
ready-to-copy repro command:
python -m pytest tests/agent/test_auxiliary_client.py
* scripts/run_tests.sh — delegates to run_tests_parallel.py
* .github/workflows/tests.yml — test step: python
scripts/run_tests_parallel.py
* pyproject.toml — drop pytest-xdist, pytest-split; simplify addopts
* tests/conftest.py — remove ~200 lines of manual state-reset fixtures
* AGENTS.md — update Testing section for per-file design
* test(runner): speed gateway test antipattern scan up
* fix(test): web search provider plugin test missing xai
* fix(tests): make 14 test files pass under per-file subprocess isolation
Tests that relied on cross-file state pollution from xdist workers
fail when run in isolation (per-file subprocess model). Root causes
and fixes:
Tool registry not populated:
- test_video_generation_tool_surface_matrix: add discover_builtin_tools()
- test_web_providers_brave_free/ddgs/searxng/general: autouse fixtures
registering all 8 bundled web providers, reset after each test
- test_website_policy: same provider registration pattern
- test_web_tools_tavily: same pattern across 3 dispatch test classes
- Also add is_safe_url/check_website_access mocks where SSRF check
blocks example.com (DNS resolution fails in isolated envs)
Stale check_fn cache:
- test_kanban_tools: invalidate_check_fn_cache() + _clear_tool_defs_cache()
in both kanban guidance tests (prior test cached False for kanban_show)
- test_discord_tool: cache invalidation in setup/teardown
- test_homeassistant_tool: invalidate_check_fn_cache() before registry queries
Module-level state pollution:
- test_auxiliary_client: autouse fixture clearing _aux_unhealthy_until cache
- test_skill_commands: set_session_vars() instead of patch.dict(os.environ)
(ContextVar takes precedence over os.environ)
- test_dm_topics: overwrite sys.modules + separate telegram.constants mock
+ force-reimport of gateway.platforms.telegram
- test_terminal_tool_requirements: removed duplicate class declaration,
autouse _clear_caches fixture
* change(tests): run_tests.sh explicitly includes env vars
instead of manually dropping some vars, now we just only include some
* fix(tests): 5 more isolation/NixOS fixes
- test_approval_plugin_hooks: isolate HERMES_HOME so real user's
command_allowlist doesn't short-circuit the approval path
- test_google_chat: skipif when Platform.GOOGLE_CHAT not in enum
(feature not merged on this branch)
- test_write_deny: test systemd prefix against tmp_path instead of
/etc/systemd which resolves to /nix/store on NixOS
- test_pty_bridge: use shutil.which('cat') instead of /bin/cat
(doesn't exist on NixOS)
- profiles.py: rmtree onexc handler chmod's parent dirs too, fixing
profile deletion when copytree preserved read-only modes from
nix store
* fix(tests): clear unhealthy cache in autouse fixture for auxiliary_client
* fix(tests): skip send_message when telegram not installed; handle missing worker_id in browser_supervisor
* fix: py3.11 rmtree onexc compat + belt-and-suspenders unhealthy cache clear for expired codex test
* fix: address PR #29016 review feedback
- Remove tracked .pytest-cache/ artifact and add to .gitignore
- Fix stale 'xdist worker' comment in conftest.py
- Deduplicate web provider registration into tests/tools/conftest.py
shared helper (register_all_web_providers), replacing 8 copy-pasted
blocks across 6 test files
- Update PR description: remove stale recovered-test-files claim,
fix worker count to match code (cpu_count*2)
* fix: eliminate race in stale-cache achievements test
The background scan thread could complete and overwrite _SNAPSHOT_CACHE
before evaluate_all() returned the stale data — only 10 fake sessions
made the scan finish instantly. Added scan_delay param to _FakeSessionDB
and set it to 2s in the stale-cache test so the background thread can't
win the race.
|
||
|
|
32aea113f0 |
fix(agent): consult supports_vision override in auto-mode routing
The contributor PR (#17936) only patched the strip path in `_model_supports_vision()`. The auto-mode router in `agent/image_routing._lookup_supports_vision` still only read models.dev, so a custom-provider model declared as vision-capable would still get its images routed through vision_analyze in the default `agent.image_input_mode: auto` setting. Users had to set both `supports_vision: true` AND `image_input_mode: native` to bypass the text pipeline. Single-knob behavior now: `supports_vision: true` alone is enough in auto mode. The strip path and the routing path consult the same resolver. - Extract override resolution into `_supports_vision_override()` in agent/image_routing.py and wire it into `_lookup_supports_vision()`. - Refactor `run_agent._model_supports_vision` to call the same helper (DRY, single source of truth for the resolution order). - Strict YAML boolean coercion: `supports_vision: "false"` (quoted — a common YAML mistake) no longer coerces to True via bool() truthiness. Recognised tokens: true/false/yes/no/on/off/1/0 plus real bools and 0/1. Unrecognised values return None and fall through to models.dev. - Add @CNSeniorious000 to AUTHOR_MAP for release attribution. Tests: 26 new (TestCoerceCapabilityBool, TestSupportsVisionOverride, TestLookupSupportsVisionOverride, TestAutoModeRespectsOverride). Existing contributor tests + image_routing + vision_native_fast_path + native_image_buffer_isolation all green (92/92). |
||
|
|
b4afc6546e |
fix(xai): restore encrypted reasoning replay across turns
xAI partner integration requires Hermes to thread `encrypted_content` reasoning items back to the Responses API on every turn so Grok can maintain cross-turn reasoning coherence. PR #26644 (May 15) gated this off for `is_xai_responses` on the theory that the OAuth/SuperGrok surface rejected replayed encrypted blobs and produced the multi-turn "Expected to have received \`response.created\` before \`error\`" failure. That diagnosis was wrong — the prelude-SSE fallback added in the same PR is what actually fixed that failure mode. Suppressing the replay was an unnecessary side-effect that broke the whole point of xAI's partnership integration. Changes: - agent/codex_responses_adapter.py — drop the `is_xai_responses` gate in `_chat_messages_to_responses_input`. Keep the kwarg in the signature for transport compatibility; update the docstring to document the May 2026 reversal. - agent/transports/codex.py — restore `kwargs["include"] = ["reasoning.encrypted_content"]` on the xAI Responses path so xAI echoes encrypted reasoning back to us. - tests/run_agent/test_codex_xai_oauth_recovery.py — flip the three xAI assertions (now: xAI MUST receive replayed reasoning AND we MUST include encrypted_content in the request). - tests/agent/transports/test_codex_transport.py — flip the `include` assertions on `test_xai_reasoning_effort_passed` and `test_xai_grok_4_omits_reasoning_effort`; update the allowlist block comment. The prelude-SSE fallback and the entitlement-403 surfacing fixes from #26644 are untouched — they were independent fixes that happened to ride along with the reasoning-replay gate. Validation: - Targeted: tests/run_agent/test_codex_xai_oauth_recovery.py + tests/agent/transports/test_codex_transport.py → 65/65 pass - Broader: tests/agent/transports/ + tests/run_agent/ → 1674 passed, 3 skipped, 0 failures - E2E (real imports, isolated HERMES_HOME, ResponsesApiTransport build_kwargs): turn-1 request carries `include: ["reasoning.encrypted_content"]`; turn-2 input replays the encrypted_content blob from turn-1's `codex_reasoning_items`; native Codex unchanged. |
||
|
|
258965663c |
fix(chat_completions): strip tool_name from messages for strict providers
The 'tool_name' key on role=tool messages is an internal Hermes field (stored in the messages.tool_name SQLite column for FTS indexing) that is not part of the OpenAI Chat Completions schema. Strict OpenAI-compatible providers — notably Moonshot AI (Kimi) — reject it with HTTP 400: Error from provider: Extra inputs are not permitted, field: 'messages[N].tool_name', value: 'execute_code' Add 'tool_name' to the sanitize block in ChatCompletionsTransport.convert_messages alongside the existing Codex Responses API fields (codex_reasoning_items, codex_message_items) so it is popped before the request is sent. Reproducer: hermes chat --model kimi-k2.6 > list the top 5 Hacker News stories -> assistant emits tool_call(execute_code) -> tool result message gets tool_name='execute_code' -> next turn's payload includes messages[N].tool_name -> 400 Permissive backends (MiniMax, OpenRouter on most routes) ignore the extra field and were masking the bug. |
||
|
|
5e743559e0 |
fix(lint): skip per-file shell linter when LSP will handle the file (#29054)
* fix(lint): skip per-file shell linter when LSP will handle the file `_check_lint` ran `npx tsc --noEmit FILE.ts` after every `.ts`/`.tsx` edit. `tsc` ignores `tsconfig.json` when given an explicit file argument (documented quirk) and defaults to no-lib / ES5, so every ES2015+ stdlib reference reports as missing: - `Cannot find global value 'Promise'` - `Cannot find name 'Map' / 'Set' / 'ReadonlySet' / 'Iterable'` - `Property 'isFinite' does not exist on type 'NumberConstructor'` - `Module 'phaser' can only be default-imported using esModuleInterop` - `import.meta is only allowed when --module is es2020+` On real TypeScript projects this floods the `lint` field on WriteResult / PatchResult with up to 25K tokens of false positives per edit. The delta filter in `_check_lint_delta` is supposed to mask them, but a tiny edit shifts line numbers and every phantom resurfaces as "introduced by this edit". The result is a 1MB+ phantom-error dump on every patch that eats the agent's context budget. Same shape for `.go` (`go vet` outside a module) and `.rs` (`rustfmt --check` outside a Cargo project). PR #24168 added an LSP tier on top of this — real `tsserver` / `gopls` / `rust-analyzer` diagnostics surface in the separate `lsp_diagnostics` field. But the broken shell linter kept running underneath, so the phantom-error dump kept happening even when LSP was giving us a clean authoritative signal. This change short-circuits the shell linter for the structurally-broken extensions (`.ts`, `.tsx`, `.go`, `.rs`) when an LSP server is active and claims the file via `LSPService.enabled_for(path)`. The LSP tier runs as before and carries the real diagnostics in `lsp_diagnostics`. Other shell linters (`py_compile`, `node --check`) keep running unconditionally — they're fast, file-local, and correct. Default behavior (LSP disabled, LSP misconfigured, remote backend, file outside a workspace) is unchanged — the existing fallback paths trigger when `_lsp_will_handle` returns False, so users who haven't opted into LSP get the same shell-linter behavior they had before. Drive-by: `.tsx` was missing from the `LINTERS` table entirely, so TS React files got no post-edit syntax check at all. Added it for symmetry; in practice it now hits the LSP-skip path. Tests: - `tests/agent/lsp/test_shell_linter_lsp_skip.py` — 14 tests covering: * skip happens for each redundant extension when LSP claims the file (asserted by patching `_exec` to raise on any shell-linter call) * shell linter still runs when LSP is inactive (regression guard) * `.py` / `.js` continue to run unconditionally even with LSP active * `_lsp_will_handle` is exception-safe: returns False on None service, remote backend, or `enabled_for` raising * `.tsx` is in both `LINTERS` and `_SHELL_LINTER_LSP_REDUNDANT` - All pre-existing tests in `tests/agent/lsp/` and `tests/tools/test_file_operations*.py` still pass (233/233). * fix(lint): address Copilot review on #29054 Two fixes from copilot-pull-request-reviewer on PR #29054: 1. `.tsx` regression with LSP disabled (https://github.com/NousResearch/hermes-agent/pull/29054#discussion_r3271017282) The first revision added `.tsx` to the `LINTERS` table so that TypeScript React files would hit the LSP skip path. Side effect: when LSP is *disabled* (the default), `.tsx` edits would suddenly run `npx tsc --noEmit FILE.tsx` and inherit the same phantom-error dump this PR is supposed to fix. Pre-PR behavior was implicit `skipped` (no `LINTERS` entry); restore that. - Remove `.tsx` from `LINTERS`. - Remove `.tsx` from `_SHELL_LINTER_LSP_REDUNDANT` (the skip path is unreachable without a `LINTERS` entry — falls through to `ext not in LINTERS` first). - When LSP IS enabled, `.tsx` is still covered by the LSP tier via `_maybe_lsp_diagnostics` (typescript-language-server's `extensions` tuple includes `.tsx`), so the diagnostics still surface — just on the `lsp_diagnostics` channel, not `lint`. - Update test_shell_linter_lsp_skip.py to reflect this contract (drop `.tsx` from the parametrize lists; add `test_tsx_stays_out_of_linters_table_for_default_compatibility` and `test_tsx_default_check_lint_returns_skipped`). 2. V4A patches dropped `WriteResult.lsp_diagnostics` (https://github.com/NousResearch/hermes-agent/pull/29054#discussion_r3271017295) `tools/patch_parser.py::apply_v4a_operations` calls `file_ops.write_file()` per operation, then calls `_check_lint()` directly afterwards — but never propagates `WriteResult.lsp_diagnostics` to the `PatchResult`. The shell-linter skip introduced in this PR makes the gap visible: a `.ts` / `.go` / `.rs` V4A patch with LSP active would return `lint = {f: {skipped: True}}` and zero diagnostics from any channel. - `_apply_add` and `_apply_update` now return `Tuple[bool, str, Optional[str]]` where the third element is `WriteResult.lsp_diagnostics` (or `None` on failure / no diags). - `_apply_delete` and `_apply_move` stay 2-tuples — they don't produce diagnostics, no write goes through `write_file`. - `apply_v4a_operations` accumulates per-file diagnostics blocks and surfaces a combined block on `PatchResult.lsp_diagnostics`. Each block already carries its `<diagnostics file="...">` header from `LSPService.report_for_file`, so concatenation preserves per-file attribution. Tests added (`test_patch_parser.py::TestV4ALspDiagnosticsPropagation`): - ADD op: `WriteResult.lsp_diagnostics` flows to `PatchResult` - UPDATE op: same - No diagnostics → `PatchResult.lsp_diagnostics is None` (not "") - Multi-file patch: combined block contains every per-file block Verification: - Targeted test scope: 257/257 pass (tests/agent/lsp/, tests/tools/test_file_operations*.py, tests/tools/test_patch_parser.py) - Wider sweep: 5400 pass; 11 failures all pre-existing on origin/main (file_staleness / file_read_guards / file_state_registry — unrelated macOS /var/folders tmp-path sensitivity issues, confirmed by re-running on a clean origin/main checkout) * docs(test): align shell-linter LSP skip docstring with .tsx behavior Copilot review feedback (review #4324947616, comment #3271049036): the test module docstring still listed .tsx alongside .ts/.go/.rs in the skip contract, but .tsx is now intentionally NOT in LINTERS or _SHELL_LINTER_LSP_REDUNDANT. Updated the bullet list to drop .tsx from the skip contract and added a paragraph documenting why .tsx is left out (preserves pre-PR implicit-skip behavior for LSP-disabled users; LSP coverage still happens via _maybe_lsp_diagnostics). * test(lsp): drop unused tmp_path from _make_fops helper Copilot review #3271069484: the helper accepted tmp_path but never used it. Callers still need tmp_path themselves for the file they're asserting against, so we just drop the helper's parameter. |
||
|
|
d759a67c0f | fix: add recovery hints to loop guard warnings | ||
|
|
b5c1fe78aa |
feat(skills): add skill bundles — alias /<name> loads multiple skills (#28373)
Skill bundles are tiny YAML files in ~/.hermes/skill-bundles/ that
group several skills under one slash command. Invoking /<bundle-name>
from any surface (CLI, TUI, dashboard, any gateway platform) loads
every referenced skill into a single combined user message.
Use cases:
- /backend-dev → loads github-code-review + test-driven-development
+ github-pr-workflow as one bundle.
- /research → loads several research skills together.
- Team task profiles shared via dotfiles.
Behavior:
- Bundles take precedence over individual skills when slugs collide.
- Missing skills are skipped with a note, not fatal.
- No system-prompt mutation — bundles generate a fresh user message
at invocation time, the same way /<skill> does. Prompt cache stays
intact.
- Works in CLI dispatch, gateway dispatch, autocomplete (CLI + TUI),
/help display.
Schema (~/.hermes/skill-bundles/<slug>.yaml):
name: backend-dev
description: Backend feature work.
skills:
- github-code-review
- test-driven-development
instruction: |
Optional extra guidance prepended to the loaded skills.
New module: agent/skill_bundles.py — load, scan, resolve, build
invocation message, save, delete. yaml.safe_load only; broken
bundles log a warning and are skipped, never raise.
New CLI subcommand: hermes bundles {list,show,create,delete,reload}.
Implementation in hermes_cli/bundles.py; wired in hermes_cli/main.py.
'bundles' added to _BUILTIN_SUBCOMMANDS so plugin discovery skips it.
New in-session slash command: /bundles lists installed bundles in
both CLI and gateway. /<bundle-name> dispatch added to CLI (cli.py)
and gateway (gateway/run.py) before the existing /<skill-name> path.
Autocomplete: SlashCommandCompleter gained an optional
skill_bundles_provider parameter that defaults to None — the prompt
shows '▣ <description> (N skills)' for bundles vs '⚡' for skills.
Tests:
- tests/agent/test_skill_bundles.py — 33 tests covering slugify,
scan/cache freshness, resolve (including underscore→hyphen
Telegram alias), build_bundle_invocation_message (loading, missing
skills, user/bundle instruction injection, dedup), save/delete,
reload diff, list sort.
- tests/hermes_cli/test_bundles.py — 8 tests for the CLI
subcommand (create/list/show/delete/reload, --force, missing
bundle errors).
- tests/gateway/test_bundles_command.py — 4 tests for the gateway
handler and bundle resolution priority.
Live E2E: verified subprocess invocations of hermes bundles
{list,create,show,reload,delete} round-trip correctly against an
isolated HERMES_HOME.
Docs:
- website/docs/user-guide/features/skills.md — new 'Skill Bundles'
section with quick example, YAML schema, management commands,
behavior notes.
- website/docs/reference/cli-commands.md — 'hermes bundles' added to
the top-level command table and given its own subcommand section.
|
||
|
|
756900723a |
fix(agent): add qwen and deepseek to TOOL_USE_ENFORCEMENT_MODELS
Qwen3.x and DeepSeek-V3.x default to chatty/hallucinatory tool use without enforcement steering — agents narrate "calling tool X" without actually emitting a tool call, or run partial loops. Both model families fit the same failure pattern TOOL_USE_ENFORCEMENT_GUIDANCE was already injected for (gpt, codex, gemini, gemma, grok, glm). Co-authored-by: briandevans <252620095+briandevans@users.noreply.github.com> Squashed salvage of: - |
||
|
|
25e0f4d465 |
fix: wrap _pool_may_recover_from_rate_limit call through run_agent namespace
The conversation_loop.py references _pool_may_recover_from_rate_limit which was defined in run_agent.py. After the conversation-loop extraction refactor, the helper was no longer in the same module scope. Wrap the call as _ra()._pool_may_recover_from_rate_limit() to route through the run_agent monkeypatch namespace where the helper is available. Adds regression test in test_gemini_fast_fallback.py. Fixes: MAILROOM Email Triage NameError, OPS Execution Monitor NameError. |
||
|
|
50e93f23f2 | 🐛 fix(memory): require newline after context tag | ||
|
|
341c8d3030 | 🐛 fix(memory): keep inline memory-context mentions visible | ||
|
|
b570e0fdd0 |
fix(codex-oauth): quarantine terminal refresh errors so dead tokens are not replayed across sessions
When a Codex OAuth refresh token is permanently invalidated (HTTP 400/401/403,
token revoked or reused), _mark_exhausted was called but auth.json was left with
the dead credentials. On the next session, _seed_from_singletons re-read
auth.json and re-seeded the pool with the same revoked token, triggering the
same terminal failure in a loop.
Add _is_terminal_codex_oauth_refresh_error to auth.py and a matching quarantine
block in _refresh_entry: when a terminal error is detected and auth.json holds
no newer tokens, clear access_token/refresh_token from auth.json and remove all
device_code-sourced pool entries from memory. Mirrors the Nous quarantine added
in
|
||
|
|
9aae59feab |
fix(compress): make abort-on-summary-failure opt-in via config flag (#28117)
PR #28102 made the summary-failure abort path the unconditional default, changing established behavior. Gate it behind config.yaml flag `compression.abort_on_summary_failure` (default False = historical fallback-placeholder behavior). - hermes_cli/config.py: new `compression.abort_on_summary_failure` key, default False, documented inline. - agent/agent_init.py: read the flag from compression config and pass to ContextCompressor. - agent/context_compressor.py: `__init__` accepts `abort_on_summary_failure` (default False). `compress()` failure branch gates the abort on the flag; when False, falls through to the restored legacy fallback path (static "summary unavailable" placeholder + drop middle window). - tests: restore original fallback expectations as default; add new TestAbortOnSummaryFailure class for the opt-in mode. Gateway/CLI plumbing (force=True on /compress, hygiene/handler abort detection, locale `gateway.compress.aborted` key) from PR #28102 stays intact — those paths only fire when `_last_compress_aborted` is True, which now only happens when the flag is enabled. |
||
|
|
5e40f83cb7 |
fix(xai-oauth): quarantine terminal refresh errors so dead tokens are not replayed across sessions
When refresh_xai_oauth_pure raises a terminal error (HTTP 400/401/403, i.e. revoked or reused refresh token), _refresh_entry's existing race- recovery path re-syncs from auth.json and returns if another process has already rotated the tokens. If auth.json still holds the same stale token pair, the function fell through to _mark_exhausted — leaving the dead credentials in auth.json. On the next Hermes startup _seed_from_singletons re-seeded the pool from those stale tokens, causing the same failure loop on every session. Fix: after the auth.json re-sync check in the xAI-oauth error handler, detect terminal errors with the new _is_terminal_xai_oauth_refresh_error helper and apply a quarantine: - Clear access_token and refresh_token from providers["xai-oauth"]["tokens"] in auth.json so they are not re-seeded. - Write a last_auth_error entry for hermes doctor / auth status diagnostics. - Remove all loopback_pkce entries from the in-memory pool so the current session stops retrying with the dead credentials. Mirrors the identical quarantine already in place for Nous OAuth ( |
||
|
|
5613dfea93 |
fix(security): redact xAI (Grok) API keys in logs
xAI is a first-class provider in hermes-agent with its own credential
pool entry (XAI_API_KEY / xai-oauth). API keys follow the format
xai-<60+ alphanumeric chars> and were absent from _PREFIX_PATTERNS in
agent/redact.py.
When a key appears raw in log output, tool results, or error messages,
it passed through completely unmasked. The ENV-assignment and Bearer
header patterns catch the most common cases, but a raw token in a
stack trace or debug print had no protection.
Verified before fix:
redact_sensitive_text("using key xai-ABCD...rstu to call xAI", force=True)
# "using key xai-ABCD...rstu to call xAI" <- exposed
After fix:
# "using key xai-AB...rstu to call xAI" <- masked
Five unit tests added to TestXaiToken covering bare token masking,
env assignment, short-prefix false positive, company name false
positive, and visible prefix in masked output.
|
||
|
|
1634397ddb |
fix(compress): abort instead of dropping messages when summary LLM fails (#28102)
When auxiliary compression's summary generation returns None (aux model errored, returned non-JSON, timed out, etc.) the compressor previously still dropped every middle message between compress_start..compress_end and replaced them with a static 'Summary generation was unavailable' placeholder. The session kept going but the user silently lost N turns of context for nothing. New behavior: on summary failure, compress() aborts entirely — returns the input messages unchanged and sets _last_compress_aborted=True. The existing _summary_failure_cooldown_until gate (30-60s) keeps the aux model from being burned on every turn. Auto-compress callers detect the no-op (len(after) == len(before)) and stop looping. The chat is 'frozen' at its current size until the next /compress or /new. Manual /compress (CLI + gateway) now passes force=True which clears the cooldown so users can retry immediately after an auto-abort. If the manual retry also fails, the user gets a visible warning telling them nothing was dropped and how to retry. - agent/context_compressor.py: compress() gains force= kwarg; failure branch sets _last_compress_aborted and returns messages unchanged instead of inserting placeholder. - run_agent.py: _compress_context() detects abort, surfaces warning, skips session-rotation entirely, returns messages unchanged. - cli.py + gateway/run.py: manual /compress paths pass force=True. - gateway/run.py: hygiene + /compress handlers detect _last_compress_aborted and emit the new 'Compression aborted' warning (gateway.compress.aborted) instead of the old 'N historical messages were removed' message. - locales/*.yaml: new gateway.compress.aborted key in all 16 locales. - tests: updated to assert the abort contract (messages preserved, compression_count not incremented, abort flag set, no placeholder leaked). New test_force_true_bypasses_failure_cooldown covers the manual-retry path. |
||
|
|
9df9816dab |
feat(azure-foundry): add Microsoft Entra ID auth
Use azure-identity DefaultAzureCredential for keyless Foundry auth. Preserve refreshable callable credentials through OpenAI and Anthropic client paths. Add setup, doctor, auth status, docs, and tests for Entra auth. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> |
||
|
|
f0c6d59148 |
fix(anthropic): scope MiniMax beta-strip to MiniMax only
Cherry-pick of @sharziki's #27022 routed Azure Foundry through _requires_bearer_auth, which also triggered the MiniMax-specific beta-strip in _common_betas_for_base_url — dropping the 1M-context beta from Azure even though Azure needs it for 1M context. Split the strip predicate: introduce _is_minimax_anthropic_endpoint so the fine-grained-tool-streaming and context-1m strips only fire for MiniMax hosts, leaving Azure's bearer-auth header swap intact without losing 1M context. Also add a regression test that asserts Azure gets Bearer auth, the api-version query param, and the context-1m-2025-08-07 beta. |
||
|
|
ff078738ea | fix(skills): load symlinked skill slash commands |