main
554 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
769ee86cd2 |
feat(kanban): attach images referenced in task bodies to worker vision (#34210)
Kanban workers now scan the task body for local image paths and http(s) image URLs and attach them to the worker's first user turn — matching the CLI/gateway behaviour for inbound images. Before, a user pasting `/home/me/screenshot.png` or `https://example.com/img.png` into a kanban task description had it sent to the model as plain text and the pixels were never seen. How it works: * agent/image_routing.py gains extract_image_refs(text) → (paths, urls) that mirrors gateway/platforms/base.py:extract_local_files (absolute / ~-relative paths, image extensions only, ignores fenced/inline code). * build_native_content_parts() accepts an optional image_urls= kwarg and emits passthrough image_url parts for remote URLs alongside the base64 data: URLs used for local paths. * cli.py (single-query/quiet branch — the path every dispatcher-spawned worker takes) detects HERMES_KANBAN_TASK, reads the task body via kanban_db.get_task, runs extract_image_refs, and threads the results into the existing image-routing decision (native vs text). Best-effort: enrichment failures never block worker startup. Tested: * tests/agent/test_image_routing.py — 22 new tests for extract_image_refs and URL pass-through in build_native_content_parts. * tests/hermes_cli/test_kanban_worker_image_extraction.py — 10 new tests driving real kanban_db round-trip (create task → read body → extract refs → build parts). * E2E: created a fake kanban task with a body referencing both a local PNG and an https URL; verified the worker pipeline produces a multimodal user turn with 1 text part + 2 image_url parts (data URL for the local file, passthrough URL for the remote). |
||
|
|
5a95fb2e14 |
feat: expose completed-turn message context to memory providers
Adds an optional `messages` keyword to the `MemoryProvider.sync_turn` contract so external/community memory plugins can receive the OpenAI-style conversation message list for the completed turn — including assistant tool calls and tool result content — not just the final assistant text. Dispatch uses signature inspection (`_provider_sync_accepts_messages`): only providers that declare a `messages` parameter (or `**kwargs`) receive it; all existing in-tree providers keep their legacy text-only signature and are called unchanged. No structured-trace envelope is added to core — providers reconstruct whatever they need from the standard message list. Also documents Memori as a standalone community memory provider. Salvaged from #28065 — rebased onto current main. Co-authored-by: Dave Heritage <david@memorilabs.ai> |
||
|
|
5f66c36470 |
fix(redact): pass web URLs through unchanged (#34029)
* fix(redact): pass web URLs through unchanged Magic-link checkout URLs, OAuth callbacks the agent is meant to follow, and pre-signed share URLs were getting `?token=***` / `?code=***` / `?signature=***` blanket-redacted by parameter NAME, which breaks any skill that has to round-trip a URL through history (the model's tool call arguments get sanitized before persistence — the live call fires with the real URL, but the next turn sees `***`). Joe Rinaldi Johnson hit this with a checkout-acceleration skill that uses magic links in URLs. Drops three call sites from `redact_sensitive_text`: - `_redact_url_query_params` (was redacting `access_token`, `token`, `api_key`, `code`, `signature`, `key`, `auth`, etc.) - `_redact_url_userinfo` (was redacting `https://user:pass@host`) - `_redact_http_request_target_query_params` (was redacting access-log request targets like `"POST /hook?password=... HTTP/1.1"`) The helpers themselves are kept in the module — still importable by anything that wants to opt in explicitly. Still redacted (unchanged): - Vendor-prefix credential shapes (sk-, ghp_, AKIA, gAAAA, etc.) anywhere they appear, including inside URLs — see the `test_known_prefix_inside_url_still_redacted` case. - JWTs (`eyJ...`) - DB connection-string passwords (`postgres://admin:pw@host`) — these are connection strings, not web URLs the agent navigates to. - Authorization headers, ENV assignments, JSON `apiKey`/`token` fields, Telegram bot tokens, private key blocks, Discord mentions, E.164 phone numbers, and form-urlencoded bodies (request bodies, not URLs). Tests: replaces `TestUrlQueryParamRedaction` + `TestUrlUserinfoRedaction` with `TestWebUrlsNotRedacted`, asserting representative URLs (OAuth callback, magic link, S3 pre-signed, websocket, userinfo, access log) pass through unchanged. Adds positive cases proving the prefix and DB connstr nets still fire. 74 redact tests + 10 browser-exfil + 16 PII redaction tests all pass. * test(codex_app_server): drop URL-query assertion from stderr-tail redaction test The test bundled (a) sk-live-* credential-prefix redaction with (b) URL query-param redaction. (a) is still in effect via _PREFIX_RE; (b) was the contract we just removed in the parent commit so the 'querysecret12345' assertion stopped holding. Keep the credential-shape assertion, drop the URL-query one. Send-message tool's local _URL_SECRET_QUERY_RE in tools/send_message_tool.py is independent of agent/redact.py and unchanged — its tests (test_top_level_send_failure_redacts_query_token, test_http_error_redacts_access_token_in_exception_text) still pass. |
||
|
|
1a74795735 |
feat: add claude-opus-4.8 and claude-opus-4.8-fast (#34003)
Anthropic released Claude Opus 4.8 on 2026-05-27, available on OpenRouter, Anthropic, Amazon Bedrock, and Claude Platform on AWS: - https://openrouter.ai/anthropic/claude-opus-4.8 - https://openrouter.ai/anthropic/claude-opus-4.8-fast The fast-mode variant is a separate model ID (anthropic/claude-opus-4.8-fast) priced at 2x of the base model — a notable improvement over the 6x premium on older Opus generations (4.6/4.7). It is NOT a `speed: "fast"` request parameter like Opus 4.6; Anthropic's native fast-mode beta still only covers Opus 4.6. Changes: hermes_cli/models.py - Add anthropic/claude-opus-4.8 + anthropic/claude-opus-4.8-fast to the OpenRouter fallback snapshot and the Nous Portal curated list (live catalogs surface them automatically when reachable; the fallback list matters when the manifest fetch fails). - Add claude-opus-4-8 to the Anthropic-native picker list. agent/model_metadata.py - Register claude-opus-4-8 / claude-opus-4.8 in DEFAULT_CONTEXT_LENGTHS with 1M tokens (matches 4.6/4.7). agent/anthropic_adapter.py - Extend _XHIGH_EFFORT_SUBSTRINGS, _ADAPTIVE_THINKING_SUBSTRINGS, and _NO_SAMPLING_PARAMS_SUBSTRINGS with "4-8"/"4.8". 4.8 inherits the Opus 4.7 API contract: adaptive thinking only, xhigh effort level supported, sampling parameters (temperature/top_p/top_k) return 400. - Add claude-opus-4-8 to _ANTHROPIC_OUTPUT_LIMITS (128k max output, same as 4.7). Matches by substring so claude-opus-4-8-fast and date-stamped variants resolve correctly. agent/usage_pricing.py - Add anthropic/claude-opus-4-8: $5/$25 per MTok input/output, $0.50 cache read, $6.25 cache write (same as 4.6/4.7). - Add anthropic/claude-opus-4-8-fast: $10/$50 per MTok (2x), $1.00 cache read, $12.50 cache write. Per OpenRouter, the 2x premium is the only differentiator from regular Opus 4.8. - OpenRouter routes still pull pricing from the live /models API, so no static OpenRouter entry is needed. tests/agent/test_model_metadata.py - Extend the Claude 4.6+ context-length tag list with 4.8/4-8. website/static/api/model-catalog.json - Regenerated via `python scripts/build_model_catalog.py` to pick up the new entries in the OpenRouter and Nous Portal fallback lists. E2E verification (isolated sys.path import against the worktree): - _supports_adaptive_thinking, _supports_xhigh_effort, _forbids_sampling_params all return True for claude-opus-4.8 and claude-opus-4.8-fast. - _supports_fast_mode (the `speed: "fast"` request-parameter gate) stays False for 4.8 — fast mode is a separate model ID on OpenRouter, not a parameter Anthropic accepts on the base model. - DEFAULT_CONTEXT_LENGTHS resolves 1M for both notations. - resolve_billing_route + _lookup_official_docs_pricing resolve the correct $5/$25 (regular) and $10/$50 (fast) pricing for both dot-notation and dash-notation inputs. - 4.7 and 4.6 regression: behavior unchanged. Unit tests: 305 passed across tests/agent/test_usage_pricing.py, test_model_metadata.py, tests/hermes_cli/test_model_catalog.py, test_models.py, test_model_validation.py, test_models_dev_preferred_merge.py. |
||
|
|
0554ef1aa3 |
fix(agent): fallback immediately on provider content-policy blocks (#33883)
* fix(agent): fallback immediately on provider content-policy blocks
Provider safety-filter refusals (e.g. OpenAI Codex 'flagged for possible
cybersecurity risk', OpenAI moderation 'violates our usage policies',
Anthropic safety-system rejections, Azure content_filter) are
deterministic decisions about a specific prompt. Retrying the same
prompt up to api_max_retries times just reproduces the same refusal and
burns paid attempts before surfacing the generic 'API failed after 3
retries — <provider message>' to Telegram / cron with no indication that
the failure came from the model provider rather than Hermes itself.
Classify these as a new FailoverReason.content_policy_blocked
(non-retryable, should_fallback=True) and route them through the
existing is_client_error path so the loop:
- skips the 3x retry backoff
- activates a configured fallback model immediately
- emits a clear provider-safety message to the user (not the generic
'Non-retryable error (HTTP None)') and surfaces actionable guidance
when no fallback is configured (rephrase, narrow context, or set
fallback_model in hermes config)
- returns a final_response that explicitly tells the user this came
from the model provider, so gateway delivery is unambiguous and
cron last_status reflects the safety block rather than a vague
'agent reported failure'
Patterns are intentionally narrow — verbatim refusal phrasings keyed to
specific provider safety pipelines, not generic words like 'policy' or
'violation' that would collide with billing / format / auth errors.
Regression guards in test_18028_content_policy_blocked.py verify
billing 402s, generic 400s, and OpenRouter account-level
provider_policy_blocked remain distinct classifications.
Salvaged from #18164 onto current main (file restructure: loop logic
moved from run_agent.py to agent/conversation_loop.py, _emit_status →
_buffer_status), broadened patterns beyond the original OpenAI Codex
cybersecurity case to cover OpenAI moderation, Anthropic safety system,
and Azure content_filter; added user-actionable guidance and a clear
final_response so cron/gateway surfaces the policy block instead of a
generic non-retryable error, and added a regression-guard test module
mirroring the is_client_error predicate.
Addresses #18028.
Co-authored-by: Kuan-Chieh Huang <kchuang1015@users.noreply.github.com>
* chore: add kchuang1015 to AUTHOR_MAP
---------
Co-authored-by: Kuan-Chieh Huang <kchuang1015@users.noreply.github.com>
|
||
|
|
67011cc0d7 |
feat(agent): buffer retry/fallback status, surface only on terminal failure (#33816)
Users report that the CLI/gateway floods them with confusing retry chatter
during transient failures: a single 429 can produce 10+ "Provider/Endpoint/
Retrying in 5s..." lines before the request eventually succeeds. The same
firehose hits Telegram, Discord, Slack, etc. via _emit_status.
This patch defers all retry/fallback/compression status messages until we
know the outcome:
- if the turn ultimately succeeds (any path: primary recovers, fallback
activates, compression unsticks the request), the buffer is silently
dropped — the user sees nothing.
- if every retry and fallback exhausts and the turn fails, the buffer
is flushed at the terminal-failure return so the user sees the full
retry trace alongside the final error.
Backend logging (agent.log) is unchanged — every emission site still
writes to logger.warning/info, so post-mortem diagnosis is intact.
## What changed
run_agent.py: four new methods on AIAgent:
_buffer_status(msg) — defer an _emit_status call
_buffer_vprint(msg) — defer a _vprint(force=True) line
_clear_status_buffer() — drop pending messages on success
_flush_status_buffer() — replay pending messages on terminal failure
agent/conversation_loop.py:
- converted ~30 mid-process emit/vprint sites in the retry, fallback,
compression, empty-response, and stream-watchdog paths to the buffered
helpers
- added _flush_status_buffer() at every terminal-failure return so users
still see the trace when it actually matters
- added _clear_status_buffer() at the "non-empty assistant content"
point (NOT at "API call returned bytes" — empty responses still loop
through the empty-retry path and would otherwise lose their trace
between iterations)
- silenced the two "(´;ω;`) oops, retrying..." / "(╥_╥) error,
retrying..." spinner final-frame messages — the spinner now stops
cleanly so retries leave no visible residue
agent/chat_completion_helpers.py: same conversion for codex TTFB / stale-
stream / fallback-activation status messages.
agent/stream_diag.py: _emit_stream_drop now buffers instead of emitting
directly.
## Tests
tests/run_agent/test_retry_status_buffer.py: 7 unit tests covering
accumulate→flush, clear-on-success, mixed kinds, empty-buffer no-op,
re-buffer after flush, exception swallowing.
Updated 3 existing tests that mocked _emit_status to also mock (or use)
_buffer_status:
- tests/run_agent/test_run_agent.py::test_empty_response_emits_status_for_gateway
- tests/run_agent/test_stream_drop_logging.py (2 tests)
- tests/agent/test_codex_ttfb_watchdog.py (TTFB hint test)
## Validation
Live test: hermes chat -q against an unreachable endpoint with no fallback
exhausts retries and prints the full trace at the end. Same flow against
a working endpoint prints zero retry chatter.
|
||
|
|
9b5dae17a5 |
feat(context-engine): host contract for external context engines
Condenses the substance of PRs #16453, #17453, #16451, #17600, and #13373 into a minimal generic host contract that external context engine plugins (e.g. hermes-lcm) need to integrate cleanly. Drops scaffolding that duplicated existing infrastructure or had marginal value. Five concrete changes: 1. `_transition_context_engine_session()` on AIAgent — generic lifecycle helper that fires on_session_end → on_session_reset → on_session_start → optional carry_over_new_session_context. Engines implement only the hooks they need; missing hooks are skipped. Built-in compressor keeps its existing reset-only behavior because callers default to no metadata. `reset_session_state()` now optionally accepts previous_messages / old_session_id / carry_over_context and delegates to the transition helper when provided. (#16453) 2. `conversation_id` passed to `on_session_start()` — both the agent-init call site and the compression-boundary call site now forward `self._gateway_session_key` so plugin engines have a stable conversation identity that survives session_id rotation (compression splits, /new, resume). The key already existed on AIAgent; it just wasn't reaching engines. (#16453) 3. Canonical cache buckets forwarded to engines — the usage dict passed to `update_from_response()` now includes input_tokens, output_tokens, cache_read_tokens, cache_write_tokens, and reasoning_tokens on top of the legacy prompt/completion/total keys. Engines can make decisions on cache-hit ratios and reasoning costs instead of only aggregates. ABC docstring updated. (#17453) 4. Plugin-registered context engines visible in the picker — `_discover_context_engines()` in plugins_cmd.py now also includes engines registered via `ctx.register_context_engine()` from plugin manifests, deduplicating by name so repo-shipped descriptions win on collision. (#16451) 5. `_EngineCollector.register_command()` — context engines using the standard `register(ctx)` pattern can now expose slash commands (e.g. `/lcm`). Routes to the global plugin command registry with the same conflict-rejection policy regular plugins use (no shadowing built-ins, no clobbering other plugins). Previously these calls hit a no-op and the slash commands silently never appeared. (#17600) Dropped from the original 5 PRs: - Compression boundary signal (`boundary_reason="compression"`) from #16453 — already on main at `agent/conversation_compression.py:412-424`, landed via the bg-review extraction. - `discover_plugins()` before fallback in run_agent.py from #16451 — redundant: `get_plugin_context_engine()` already routes through `_ensure_plugins_discovered()` which is idempotent. - Runtime identity diagnostics method + helpers from #13373 (+251 LOC) — operators can already read engine state via `engine.get_status()`; the diagnostics view added marginal value relative to its surface area. - The 553-LOC slash-command machinery from #17600 — replaced with a 20-LOC `register_command` method on the collector that reuses the existing plugin command registry instead of building a parallel one. Net: ~215 LOC of host-contract changes + 282 LOC of focused tests, vs ~1,176 LOC across the original 5 PRs. Co-authored-by: Tosko4 <1294707+Tosko4@users.noreply.github.com> Closes #16453. Closes #17453. Closes #16451. Closes #17600. Closes #13373. Related: stephenschoettler/hermes-lcm#68. |
||
|
|
459d7694d3 | fix(agent): preload jiter native parser | ||
|
|
406901b27d | feat(auth) normalise the way in which we check whether a user has free/paid access to nous portal so we can expose behaviour and error messages accordingly. | ||
|
|
4a6f1863ac |
test: cover ci-unblocker production regressions
Snapshot review_agent._session_messages before teardown so close() can clean per-session state without dropping the user-visible self-improvement summary. Adds two regressions: - bg-review summarizer receives captured review-agent tool messages after review_agent.close() runs - context-compressor protected-head handoff rehydration populates _previous_summary and keeps the old handoff out of newly summarized turns Salvaged from PR #26039 onto current main after agent/background_review.py extraction. Original commit 63eaf6055; bg-review test updated to patch the module-level summarize_background_review_actions in agent.background_review instead of the now-forwarder AIAgent._summarize_background_review_actions. |
||
|
|
ee80dfdea0 | fix: preserve skill packages during curator consolidation | ||
|
|
283bb810e7 | fix(agent): tolerate large codex stream prefill | ||
|
|
486d632cc2 |
fix(auxiliary): coerce None final.output to empty list in Codex aux adapter
Closes #33368. `_CodexCompletionsAdapter.create()` iterates `final.output` from the Codex Responses stream. The event-driven consumer (introduced in #33042) always sets `final.output` to a list, so this shape can't come from our own code path. But: - Mocked clients in tests can return a typed Response with `output=None` - Third-party shims / compatibility layers that bypass the consumer can do the same - A future code path that wraps a different consumer could regress The old code `getattr(final, "output", [])` returns `None` (not the default `[]`) when the attribute EXISTS but is `None`. Iterating `None` then raises `TypeError: 'NoneType' object is not iterable` — the exact error logged by title-generation when this fires. Fix: `getattr(final, "output", None) or []` — single-line defensive coerce. Cheap; zero risk. Regression test asserts the auxiliary path handles a final whose `.output` is `None` (via monkey-patched consumer) without raising and returns the expected chat.completions-shaped response. Reporter: @pavegrid-1 (issue #33368). |
||
|
|
b4eea187d5 |
fix(xai-oauth): gate slash-enum strip on model name + add regression tests (#28490)
Three additions on top of @Nami4D's salvage:
1. Gate the preflight slash-enum strip on the model name pattern
(grok-* / x-ai/grok-*). The original PR stripped slash-containing
enum values from every codex_responses request, but native Codex
(OpenAI) and GitHub Models DO accept slash enums — stripping them
there would silently degrade tool-schema constraints. xAI is the
only Responses-API surface that rejects the shape.
2. Resolve the merge conflict in agent/transports/codex.py by
preserving both the timeout-forwarding block that landed on main
between the PR's branch point and now AND the new service_tier
strip. Behavioural intent of both is preserved.
3. Six new tests in tests/agent/transports/test_codex_transport.py
covering:
- TestCodexTransportXaiServiceTierStrip (3 tests): xAI strips
service_tier from request_overrides; non-xAI codex_responses
and GitHub Models both KEEP service_tier (regression guards
so the strip stays xAI-only).
- TestPreflightSlashEnumStrip (3 tests): Grok and aggregator-
prefixed Grok model names both trigger the safety-net strip;
non-Grok models preserve slash enums as a regression guard
against the strip becoming too broad.
51/51 in tests/agent/transports/test_codex_transport.py.
Co-authored-by: Nami4D <hello@nami4d.tech>
|
||
|
|
b1a46b3047 | fix(codex): drop transient rs_tmp reasoning replay state | ||
|
|
4243b6dc45 | fix(codex): update silent-hang workaround hint | ||
|
|
febc4cfec0 |
remove Vercel AI Gateway and Vercel Sandbox (#33067)
* remove Vercel AI Gateway provider and Vercel Sandbox terminal backend Both Vercel-hosted integrations are removed end-to-end. Users on the AI Gateway should switch to OpenRouter or one of the other aggregators (Nous Portal, Kilo Code). Users on the Vercel Sandbox backend should switch to Docker, Modal, Daytona, or SSH. What's removed: - `plugins/model-providers/ai-gateway/` provider plugin - `hermes_cli/vercel_auth.py` Vercel-Sandbox auth helper - `tools/environments/vercel_sandbox.py` terminal backend - `ai-gateway` provider wiring across auth, doctor, setup, models, config, status, providers, main, web_server, model_normalize, dump - `vercel_sandbox` backend wiring across terminal_tool, file_tools, code_execution_tool, file_operations, approval, skills_tool, environments/local, credential_files, lazy_deps, prompt_builder, cli, gateway/run - `AI_GATEWAY_BASE_URL` constant, `_AI_GATEWAY_HEADERS` auxiliary-client header set, run_agent base-URL header/reasoning special-cases - `[vercel]` pyproject extra and `vercel`/`vercel-workers` from uv.lock - env vars: `AI_GATEWAY_API_KEY`, `AI_GATEWAY_BASE_URL`, `VERCEL_TOKEN`, `VERCEL_PROJECT_ID`, `VERCEL_TEAM_ID`, `VERCEL_OIDC_TOKEN`, `TERMINAL_VERCEL_RUNTIME` - Tests: deletes test_ai_gateway_models.py and test_vercel_sandbox_environment.py; scrubs references across 23 surviving test files (no entire tests deleted unless they were dedicated to AI Gateway / Sandbox) - Docs: provider tables, env-var reference, setup guides, security notes, tool config, terminal-backend tables — English plus zh-Hans i18n parity - `hermes-agent` skill: provider table entry and remote-backend list What stays (intentional): - `popular-web-designs/templates/vercel.md` — CSS design reference, unrelated to Vercel-the-AI-product - `x-vercel-id` in `stream_diag.py` headers — generic Vercel CDN response header, useful diag signal on any Vercel-hosted endpoint - `vercel-labs/agent-browser` URL in browser config — lightpanda browser project, different OSS effort - `userStories.json` historical contributor entry mentioning Vercel Sandbox — archive, not active docs Validation: - 1153 tests in the 22 targeted files pass (`scripts/run_tests.sh`) - Full repo `py_compile` clean - Live import of every touched module + invariant check (no `ai-gateway` in `PROVIDER_REGISTRY`, no `_AI_GATEWAY_HEADERS`, no `vercel_sandbox` in `_REMOTE_TERMINAL_BACKENDS`) * test: convert profile-count check from change-detector to invariant The hardcoded "== 34" assertion broke when ai-gateway was removed. Per AGENTS.md change-detector-test guidance, assert the relationship (registry count >= number of plugin dirs) instead of a literal count. Counts shift when providers are added/removed; that's expected. |
||
|
|
cb38ce28cb |
refactor(codex): drop SDK responses.stream() helper; consume events directly (#33042)
* refactor(codex): drop SDK responses.stream() helper; consume events directly The OpenAI Python SDK's high-level `client.responses.stream(...)` helper does post-hoc typed reconstruction from the terminal `response.completed.response.output` field. The chatgpt.com Codex backend has been observed (today, gpt-5.5) to ship `response.output = null` on terminal frames, which crashes the SDK with `TypeError: 'NoneType' object is not iterable` mid-iteration. Carlton's #32963 patched the symptom by wrapping the helper in try/except and recovering from the same per-event accumulator the SDK was supposed to populate. This PR removes the helper from the call path entirely: we now use `client.responses.create(stream=True)` (raw AsyncIterable of SSE events) and assemble the final response object ourselves from `response.output_item.done` events as they arrive. The terminal event's `output` field is never read for content. Same strategy OpenClaw uses for the same backend. This makes Hermes structurally immune to the bug class, not patched. The next time OpenAI ships a shape change to chatgpt.com's terminal frame, our consumer keeps working because it doesn't read that frame for content — only for usage/status/id. Changes - `agent/codex_runtime.py`: new `_consume_codex_event_stream()` shared consumer; `run_codex_stream()` uses `responses.create(stream=True)`; `run_codex_create_stream_fallback()` collapses into a thin alias since the primary path now does what the fallback used to do. - `agent/auxiliary_client.py`: `_CodexCompletionsAdapter` uses the same consumer; old null-output recovery helpers deleted as unreferenced. - Tests migrated: fixtures that mocked `responses.stream` now mock `responses.create` returning a raw iterable. New regression test asserts the auxiliary path returns streamed items even when the terminal event's `output` is literally `null`. Validation - Live: tested against fresh OAuth on `chatgpt.com/backend-api/codex` with `gpt-5.5` — response built correctly with `response.output=null` on the terminal frame, all events consumed, usage/reasoning tokens propagated. - `tests/run_agent/test_run_agent_codex_responses.py` + `tests/agent/test_auxiliary_client.py`: 242 passed. * test+fix(codex): migrate streaming tests, raise on truncated streams CI surfaced 10 test failures across tests/run_agent/test_streaming.py and tests/run_agent/test_codex_xai_oauth_recovery.py — both files had their own `responses.stream(...)` mocks I missed in the first sweep. agent/codex_runtime.py: _consume_codex_event_stream() now raises "Codex Responses stream did not emit a terminal response" when the stream ends without any terminal frame AND no usable content. This preserves the signal callers used to get from the SDK's high-level helper, which they distinguished from "completed with empty body" in error handling. Tests migrated: - test_streaming.py: text-delta callback, activity-touch, and remote-protocol-error tests all switch from mocking responses.stream to responses.create returning an iterable of events. - test_codex_xai_oauth_recovery.py: prelude-error tests are recast as wire-error-event tests (the new path raises _StreamErrorEvent directly when the wire emits type=error, which is strictly better than the old two-phase "SDK RuntimeError → retry → fallback"). The retry-on-transport-error test moves from responses.stream side-effect to responses.create side-effect. Verified live against chatgpt.com Codex with gpt-5.5 — AIAgent.chat() through the full codex_responses path returns correctly, 319/319 targeted tests passing. |
||
|
|
b6ca56f651 |
fix(codex-responses): gracefully recover from invalid_encrypted_content (salvage #10144) (#33035)
* fix(codex-responses): gracefully recover from invalid_encrypted_content (salvage #10144) When an OpenAI-compatible Responses API surface accepts an initial request but later rejects the replayed `codex_reasoning_items` encrypted blob with HTTP 400 `invalid_encrypted_content`, the session previously got stuck retrying the same poisoned payload. Recovery: classify the error as a dedicated FailoverReason, and on the first hit disable encrypted reasoning replay for the rest of the session, strip cached items from message history, and retry once. Changes: * error_classifier: add FailoverReason.invalid_encrypted_content branch in _classify_400 (before context_overflow so the messages that mention 'encrypted content … could not be verified' don't trip context heuristics), in _classify_by_error_code, and extend _extract_error_code to peek inside wrapped JSON in error.message and ignore the bare '400' as a code. * agent_init: initialize `_codex_reasoning_replay_enabled = True` on every agent. * run_agent: add AIAgent._disable_codex_reasoning_replay() helper that flips the flag and pops cached items. * codex_responses_adapter: thread a `replay_encrypted_reasoning` kwarg through _chat_messages_to_responses_input so that when the flag is False we don't replay codex_reasoning_items. * transports/codex.py: read `replay_encrypted_reasoning` from params, thread it into the adapter, and gate the `include=['reasoning.encrypted_content']` request hint on it. * chat_completion_helpers: pass the agent's replay flag through to the transport. * conversation_loop: in the retry loop, add an invalid_encrypted_content recovery branch that fires once per session, only when api_mode == codex_responses, only when replay is still enabled, and only when at least one assistant message in history actually carries cached reasoning items (otherwise the 400 has nothing to do with our cache and the normal retry path handles it). Tests: * test_error_classifier: new wrapped-JSON _extract_error_code case; new TestClassifyApiError cases proving the 400 is retryable with no fallback, that the broad message match doesn't catch a generic 'parsed' message, and that the error code match is case-insensitive. * test_run_agent_codex_responses: end-to-end test of the recovery branch firing once and disabling replay, plus a sibling test that proves the branch does *not* fire (and the flag stays True) when history has no cached reasoning items. Salvages PR #10144 onto the post-refactor module layout (error_classifier / codex_responses_adapter / transports/codex / conversation_loop / agent_init) since the original diff was written against the pre-refactor monolithic run_agent.py. * chore(release): map victorGPT in AUTHOR_MAP for #10144 salvage --------- Co-authored-by: victorGPT <wuxuebin1993@gmail.com> |
||
|
|
43a3f119fc | fix(agent): recover Codex streams with null output | ||
|
|
f4953bc648 |
fix(subdirectory_hints): prevent loading AGENTS.md outside workspace
SubdirectoryHintTracker was scanning directories outside the active working directory, allowing files like ~/.codex/AGENTS.md or ~/.claude/CLAUDE.md to be loaded and injected into the agent context. This causes cross-agent context contamination and instruction mixup. Add _is_ancestor_or_same() helper and a path boundary check in _is_valid_subdir(): only directories within the working directory tree (i.e. path.is_relative_to(working_dir)) are allowed. Also add exist_ok=True to mkdir() calls in new tests to prevent pytest-xdist race conditions when workers share the same tmp_path parent. Tests added: - test_outside_working_dir_rejected: verifies sibling dirs are blocked - test_outside_working_dir_absolute_path_rejected: verifies ~/.codex paths blocked - test_inside_workspace_subdir_allowed: verifies normal subdir access unaffected - test_sibling_repo_not_loaded_via_ancestor_walk: ancestor walk stays within workspace |
||
|
|
e3236e99a4 |
fix(anthropic): API-key path skips OAuth autodiscovery + prunes stale entries
When the user picks 'Anthropic API key' at `hermes setup` (vs 'Claude
Pro/Max subscription'), `save_anthropic_api_key()` writes ANTHROPIC_API_KEY
to ~/.hermes/.env and zeros ANTHROPIC_TOKEN. That env-var pattern is the
user's explicit choice of auth method — API key, not OAuth.
But the anthropic credential pool's autodiscovery (_seed_from_singletons)
unconditionally read ~/.claude/.credentials.json from the Claude Code CLI
and any saved hermes_pkce creds, and added them to the SAME anthropic
pool as the user's API key. Two problems:
1. Even with the API key at higher priority, a 401/429 on the API key
would rotate the session onto an autodiscovered OAuth credential,
silently flipping the agent into the Claude Code masquerade
mid-conversation: 'You are Claude Code' system block, every tool
renamed to mcp_*, claude-cli User-Agent header.
2. Switching OAuth → API key at `hermes setup` cleared the env vars
but left previously-seeded OAuth entries dormant in auth.json,
where rotation could revive them.
The user picking the API-key path is explicitly opting OUT of the
masquerade. Mixing OAuth credentials into their pool defeats that
choice.
Fix: in `_seed_from_singletons` for provider='anthropic', detect the
API-key path (ANTHROPIC_API_KEY set in env, no OAuth env var set) and:
- Skip calling read_claude_code_credentials() and
read_hermes_oauth_credentials() entirely
- Prune any stale hermes_pkce / claude_code entries that may already
be in the on-disk pool
OAuth-path users (ANTHROPIC_TOKEN set) are unaffected — autodiscovery
continues to fire as before.
Tests: 3 new regression tests (api-key skips autodiscovery, api-key
prunes stale entries, oauth path still autodiscovers). Full file 70/70.
|
||
|
|
0dee92df22 |
feat(security): promptware defense — shared threat patterns + memory load-time scan + tool-result delimiters (#32269)
Hardens the context window against Brainworm-class promptware attacks (see #496). Three changes: 1. tools/threat_patterns.py — single source of truth for injection/promptware patterns. Replaces the duplicated pattern lists in prompt_builder.py and memory_tool.py. Adds ~15 new Brainworm/C2 patterns (node registration, heartbeat/beacon, pull tasking, anti-forensic disk avoidance, identity override, known framework names). Three scopes — 'all' (narrow, classic injection), 'context' (adds promptware/role-play, broader detection), 'strict' (adds persistence/SSH-backdoor patterns for user-mediated writes). 2. MemoryStore.load_from_disk() now scans entries at snapshot-build time. Poisoned entries are replaced with [BLOCKED: ...] placeholders in the frozen system-prompt snapshot. Live state keeps the original so the user can still inspect + remove via memory(action=read/remove). Scan is deterministic from disk bytes — prefix-cache invariant holds. 3. make_tool_result_message() wraps results from high-risk tools (web_extract, web_search, browser_*, mcp_*) in <untrusted_tool_result source="...">...</untrusted_tool_result> delimiters with framing prose telling the model the content is data, not instructions. Architectural defense against indirect injection from poisoned web pages, GitHub issues, MCP responses — does NOT regex-scan tool results (pattern arms race + per-iteration latency). Multimodal content lists pass through unwrapped to preserve adapter compatibility. Pattern philosophy: anchor on C2-specific vocabulary or unambiguous attack behavior, NOT on bossy English. Dropped patterns suggested in #496 that would have tripped legitimate content: standalone 'you are obligated to', 'do not respond immediately', 'you must X' without a C2-verb anchor. Validation: - 257/257 targeted tests pass (test_threat_patterns + test_memory_tool + test_tool_dispatch_helpers + test_prompt_builder) - E2E run with real Brainworm payload: blocked from AGENTS.md context-file path, blocked from MEMORY.md snapshot, wrapped in delimiters when arriving via web_extract. Legitimate 'you must follow conventions' phrasing not flagged. Explicitly NOT in this PR (per #496 discussion): - Per-tool-result regex scanning (pattern arms race) - SessionBehaviorMonitor / polling-loop detection (wrong layer) - Outbound network gating (Docker backend already covers this) - security.context_scanning warn|block knob (current behavior is always block-with-placeholder — there's no warn mode that makes sense) Closes #496 for Phase 1 + the architectural delimiter piece of Phase 2. Phase 3 stays in tracking issue territory. |
||
|
|
bbc8f2f961 |
chore(models): drop retired grok-4-1-fast from metadata, tests, docs
xAI retired grok-4-1-fast. hermes_cli/models.py already removed it from the static fallback in an earlier commit, but the context-length metadata, the tests pinning those values, and the provider doc still referenced the retired ID. Clean those up so retired model names stop appearing in user-facing output. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
0d137f1039 |
feat(errors): actionable guidance for Nous OAuth 401s (#32082)
Nous Portal is OAuth-only (auth_type=oauth_device_code, no API key path), but the non-retryable-401 guidance branch only covered openai-codex and xai-oauth. A Nous 401 fell through to the generic 'Your API key was rejected... run hermes setup' message, which is wrong advice — the user needs hermes auth add nous --type oauth, not an API key. Also flag the case where the failing model slug ends in :free (OpenRouter syntax) while provider is nous. Without that hint, users re-OAuth successfully and then hit the same 401 on the next message because Nous Portal doesn't carry the OpenRouter free-tier slug. Reported by ashh — debug dump showed Nous device_code exhausted + deepseek/deepseek-v4-flash:free as the model. |
||
|
|
dbe5d84972 |
fix(auxiliary): universal main-model fallback for aux tasks (#31845)
Aux callers (title generation, vision, session search, etc.) can reach
resolve_provider_client() without an explicit model when the user
picked their main provider via 'hermes model' and didn't bother
configuring a per-task auxiliary.<task>.model override. The
expectation in that case is universal: 'use my main model for side
tasks too.'
Before, the OAuth providers (xai-oauth, openai-codex) silently
returned (None, None) on an empty model — both lack a catalog default
because their accepted-model lists drift on the backend. That caused
_resolve_auto to drop to its Step-2 fallback chain (OpenRouter /
Nous / etc.), so aux tasks billed against the wrong subscription
without warning.
The fix is at the top of resolve_provider_client() — a single
3-step universal fallback that runs before any provider branch, so
no provider-specific empty-model guards are needed (now or for any
future provider we add):
1. caller-passed model (caller knew what they wanted)
2. provider's catalog default (cheap aux model, if registered)
3. user's main model from config.yaml
Behaviour by provider class:
- OAuth providers (xai-oauth, openai-codex) — no catalog default, so
step 3 applies. Title gen runs on grok-4.3 / gpt-5.4 against the
user's actual subscription instead of leaking to OpenRouter.
- API-key providers (anthropic, gemini, kimi-coding, etc.) — catalog
default wins at step 2, preserving the original 'cheap aux model'
behaviour. Anthropic users still get claude-haiku-4-5 for titles,
not opus.
- Explicit-model callers (auxiliary.<task>.model config, programmatic
callers) — caller wins at step 1, no surprise switching.
Salvaged from @wysie's PR #31845 which fixed the xai-oauth branch
specifically. The universal shape supersedes the per-branch fix
and covers openai-codex (same bug class) plus any future OAuth
providers.
4 new tests in TestResolveProviderClientUniversalModelFallback:
- empty_model_for_oauth_provider_falls_back_to_main_model
- empty_model_for_codex_also_uses_main_model
- empty_model_for_catalog_provider_uses_catalog_default
- explicit_model_takes_precedence_over_fallbacks
365/365 across tests/agent/test_auxiliary_*, tests/run_agent/test_codex_xai_oauth_recovery.py, tests/hermes_cli/test_auth_xai_oauth_provider.py, and tests/hermes_cli/test_plugin_auxiliary_tasks.py.
Co-authored-by: wysie <wysie@users.noreply.github.com>
|
||
|
|
8601c4d44c |
fix(codex): add time-to-first-byte watchdog for stalled Codex streams
The chatgpt.com/backend-api/codex endpoint has an intermittent failure mode where it accepts the connection but never emits a single stream event — the socket just hangs. Direct sequential probing reproduces it (0 events, no HTTP status), and a fresh reconnect then succeeds in ~2s. Today the only guard is the wall-clock stale timeout in interruptible_api_call, so a dead-on-arrival connection is held for the full stale window (90-900s depending on context / config) before the retry loop can reconnect — minutes of wasted wall time per stall, at a rate of ~20% of calls during affected windows. Add a TTFB watchdog scoped to the codex_responses path: - codex_runtime.run_codex_stream stamps agent._codex_stream_last_event_ts on *every* stream event (not just output-text deltas), so reasoning-only and tool-call-only turns are not mistaken for a stall. - interruptible_api_call resets that marker before the worker starts and, while it is still None, kills the connection once elapsed exceeds the TTFB cutoff (default 45s, tunable via HERMES_CODEX_TTFB_TIMEOUT_SECONDS, 0 disables). The raised TimeoutError flows through the existing retry path unchanged. Once any event has arrived the stream is healthy and only the existing wall-clock stale timeout applies, so legitimate long generations are never interrupted. Gated to codex_responses; the chat_completions non-stream, anthropic and bedrock branches have no first-event signal and are untouched. Adds tests/agent/test_codex_ttfb_watchdog.py covering the stall kill, the events-flowing pass-through, and the env-disable path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
ba3c450914 |
fix(security): block read_file on project-local .env files
get_read_block_error() only blocked internal Hermes cache files but allowed reading project-local secret-bearing environment files (.env, .env.production, .env.local, etc.) through both read_file and ACP fs/read_text_file paths. Add a basename deny set for common secret-bearing .env variants. .env.example remains readable as documentation. Fixes #20734 |
||
|
|
2d422720b5 |
fix(codex): size and propagate timeouts for Responses-API requests; lower stale defaults
Codex / Responses-API requests had three latent timeout bugs that combined into the long silent hangs reported on #21444: 1. The non-stream stale-call detector estimated context tokens from ``api_kwargs["messages"]`` only. Codex / Responses-API payloads carry their conversational load in ``input`` (with ``instructions`` and ``tools``), so every Codex turn logged ``context=~0 tokens`` and the detector never applied its >50k / >100k tier bumps. 2. ``providers.<id>.request_timeout_seconds`` was silently dropped on the main Codex path. The chat_completions path and the auxiliary Codex adapter both forwarded it; the main path skipped it through three places (``build_api_kwargs``, ``ResponsesApiTransport.build_kwargs``, ``_preflight_codex_api_kwargs``). 3. The streaming stale detector had the same payload-shape bug for ``codex_responses`` requests, which route through the non-streaming detector (it's the path that emits the user-facing "No response from provider for 300s (non-streaming, ...)" warning that reporters keep pasting). This commit: - Adds ``estimate_request_context_tokens`` in ``chat_completion_helpers``, used by both the non-stream and stream detectors. Handles ``messages`` (Chat Completions), ``input + instructions + tools`` (Responses API), bare lists, and an unknown-dict fallback. - Forwards ``timeout`` through ``ResponsesApiTransport.build_kwargs`` and ``_preflight_codex_api_kwargs`` (with guards against zero/negative/inf/bool values), and wires ``_resolved_api_call_timeout()`` into the Codex branch of ``build_api_kwargs``. - Lowers the implicit non-stream stale defaults so fallback providers kick in faster when upstream stalls: * base 300s -> 90s * >50k 450s -> 150s * >100k 600s -> 240s These only apply when the user has *not* set ``providers.<id>.stale_timeout_seconds`` or ``HERMES_API_CALL_STALE_TIMEOUT``. Explicit config still wins. - Adds regression tests for the estimator shapes, the new defaults, the context-tier scaling, transport timeout pass-through, and preflight timeout pass-through / rejection of invalid values. Closes #21444 Supersedes #21652 #24126 #31855 Co-authored-by: Hoang V. Pham <26063003+hehehe0803@users.noreply.github.com> |
||
|
|
2cd952e110 |
feat(stt): add register_transcription_provider() plugin hook
Add an opt-in Python plugin surface for speech-to-text backends,
mirroring the TTS hook pattern. New backends (OpenRouter, SenseAudio,
Gemini-STT, custom proprietary engines) can be implemented as plugins
without modifying tools/transcription_tools.py.
Built-ins always win
--------------------
The 6 built-in STT providers (local/faster-whisper, local_command,
groq, openai, mistral, xai) keep their native handlers. Plugins
attempting to register under a built-in name are rejected at
registration time with a warning and re-checked defensively at
dispatch.
Resolution order
----------------
1. stt.provider matches a built-in → built-in dispatch (unchanged)
2. stt.provider matches a registered plugin →
a. if plugin.is_available() returns False → unavailability envelope
identifying the plugin (not the generic "No STT provider"
message — the user explicitly opted into this plugin)
b. otherwise plugin.transcribe() with model + language forwarded
from stt.<provider>.{model,language} config
3. No match → legacy "No STT provider available" error (unchanged)
Per-provider config namespace
-----------------------------
Plugins read their config from stt.<provider> in config.yaml, mirroring
how built-ins read stt.openai.model / stt.mistral.model. The dispatcher
forwards `model` and `language` from this section. Caller's explicit
`model=` argument overrides the config-set model.
Files
-----
- agent/transcription_provider.py: TranscriptionProvider ABC
- agent/transcription_registry.py: register/get/list providers,
built-in shadow guard, _reset_for_tests
- hermes_cli/plugins.py: register_transcription_provider() on
PluginContext
- tools/transcription_tools.py: BUILTIN_STT_PROVIDERS frozenset,
_dispatch_to_plugin_provider() with availability gate, wire-in
after xai branch and before "No STT provider" error
- tests/agent/test_transcription_registry.py: 27 tests
- tests/hermes_cli/test_plugins_transcription_registration.py: 3 tests
- tests/tools/test_transcription_plugin_dispatch.py: 28 tests
(covering built-in short-circuit, plugin dispatch, exception
envelope, non-dict guard, availability gate, language forwarding)
- tests/plugins/transcription/check_parity_vs_main.py: 10-scenario
subprocess-pinned parity harness vs origin/main
- website/docs/user-guide/features/{tts,plugins}.md: docs
Behavior parity
---------------
10 scenarios, 8 OK + 2 expected DIFFs:
no_provider_error → plugin (plugin-installed scenario)
no_provider_error → plugin_unavailable (plugin-installed-unavailable
scenario; PR returns cleaner envelope)
Zero behavior change for users not opting into a plugin.
Issue follow-up to #30398.
|
||
|
|
d7c5d5dee5 | fix: avoid persisting borrowed credential secrets (#31416) | ||
|
|
031f9c9edc |
fix(image_gen): cache xAI ephemeral URL responses to disk (#26942) (#31759)
xAI's grok-imagine-image API returns ephemeral imgen.x.ai/xai-tmp-* URLs that 404 within minutes — long before downstream consumers (Telegram send_photo, browser preview, multi-tier delivery fallback) get a chance to fetch them. The xAI image_gen provider was passing those URLs through unchanged on the elif url: branch; b64 responses were already cached locally via save_b64_image. Result: every image_generate call on a Telegram-routed xai-oauth profile delivered no image, falling through to text-only. Adds agent.image_gen_provider.save_url_image() — a sibling helper to save_b64_image that downloads URL bytes to $HERMES_HOME/cache/images/. Content-type-aware extension inference with URL-suffix fallback; oversize cap (25MB default) with partial-write cleanup; empty-body refusal. Mirrors the audio_cache pattern used by text_to_speech. Wires save_url_image into both the xAI and OpenAI providers' URL branches. When the download fails (network blip, 404 in-flight) we log a warning and fall back to the bare URL rather than turning the tool call into a hard error — the gateway's existing URL-send fallback then gets a chance to surface the original error legibly. Test plan: - tests/agent/test_save_url_image.py — 8 direct tests against a real in-process HTTP server: bytes round-trip, content-type → extension, URL-suffix fallback, default-to-png, 404 propagation, empty-body refusal, oversize cap + cleanup, filename uniqueness. - tests/plugins/image_gen/test_xai_provider.py — flip test_successful_url_response (was asserting the bug), add test_url_response_falls_back_to_bare_url_when_download_fails. - tests/plugins/image_gen/test_openai_provider.py — symmetric pair. 160/160 in the broader image_gen test surface. |
||
|
|
00ec0b617c |
feat(tts): add register_tts_provider() plugin hook (closes #30398)
Adds a `TTSProvider(ABC)` + `register_tts_provider()` extension point to the plugin context API, **alongside** the existing config-driven `tts.providers.<name>: type: command` registry from PR #17843. This is additive — the command-provider surface stays as the primary way to add a TTS backend. The hook covers cases the shell-template grammar can't reasonably express: - Native Python SDKs without a CLI (Cartesia, Fish Audio, etc.) - Streaming synthesis (chunked Opus → voice-bubble delivery) - Voice metadata API for the `hermes tools` picker - OAuth-refreshing auth flows None of the 10 inline built-in providers (`edge`, `openai`, `elevenlabs`, `minimax`, `gemini`, `mistral`, `xai`, `piper`, `kittentts`, `neutts`) are migrated to plugins. They stay inline. The hook is for *new* engines that aren't built-in. ## Resolution order The dispatcher's resolution order is the load-bearing invariant: 1. `tts.provider` is a built-in name → built-in dispatch. **Always wins.** 2. `tts.provider` matches `tts.providers.<name>` with `command:` set → command-provider dispatch (PR #17843). 3. `tts.provider` matches a plugin-registered `TTSProvider` → plugin dispatch (new). 4. No match → falls through to Edge TTS default (legacy behavior). Built-ins-always-win is enforced at THREE layers: - Registry: `register_provider()` rejects shadowing names with a warning. - Dispatcher: `_dispatch_to_plugin_provider()` short-circuits built-in names defensively before consulting the registry. - Picker: `_plugin_tts_providers()` filters built-in shadows out of the `hermes tools` row list defensively. Command-providers-win-over-plugins is enforced at TWO layers: - The caller in `text_to_speech_tool` checks `_resolve_command_provider_config` first. - `_dispatch_to_plugin_provider` re-checks for a same-name command config defensively so a refactor of the caller can't silently break the invariant. ## New files - `agent/tts_provider.py` — `TTSProvider(ABC)` with `synthesize()` (required), `list_voices()`, `list_models()`, `get_setup_schema()`, `stream()`, `voice_compatible` (all optional with sane defaults). Mirrors `agent/image_gen_provider.py` shape. - `agent/tts_registry.py` — `register_provider`/`get_provider`/`list_providers` with `_BUILTIN_NAMES` reject-shadowing invariant. Mirrors `agent/image_gen_registry.py` shape. - `plugins/tts/...` directory ready for community plugins (none shipped). ## Modified files - `hermes_cli/plugins.py` — `register_tts_provider()` method on `PluginContext`. Matches the gating shape of `register_image_gen_provider()` / `register_browser_provider()`. - `tools/tts_tool.py` — `_dispatch_to_plugin_provider()` + `_plugin_provider_is_voice_compatible()` + walrus-elif wiring into the main dispatcher. Built-in elif chain untouched. - `hermes_cli/tools_config.py` — `_plugin_tts_providers()` injects plugin rows into the Text-to-Speech picker category alongside the 10 hardcoded built-in rows. ## Tests - `tests/agent/test_tts_registry.py` — 47 tests covering registration, lookup, ABC contract, helpers, AND a `TestBuiltinSync` regression test that fails if `agent.tts_registry._BUILTIN_NAMES` drifts from `tools.tts_tool.BUILTIN_TTS_PROVIDERS` (kept duplicated due to circular import constraints). - `tests/tools/test_tts_plugin_dispatch.py` — 35 tests covering built-in-always-wins, command-wins-over-plugin, plugin dispatch, exception passthrough, voice_compatible helper. - `tests/hermes_cli/test_tts_picker.py` — 10 tests covering the picker surface, builtin shadowing defense, integration with `_visible_providers`. - `tests/hermes_cli/test_plugins_tts_registration.py` — 3 end-to-end tests via `PluginManager.discover_and_load()`. - `tests/plugins/tts/check_parity_vs_main.py` — 9-scenario subprocess parity harness vs `origin/main`. The only intentional diff is `fallback_edge → plugin` for the `plugin-installed` scenario. ## Verification - 95/95 new tests pass. - 170/170 pre-existing TTS tests (test_tts_command_providers, test_tts_max_text_length, test_tts_speed, etc.) pass unchanged. - Parity harness against `origin/main`: 8 OK + 1 expected DIFF. - E2E smoke: a registered plugin's `synthesize()` is called via `text_to_speech_tool` with the standard JSON envelope returned. - Ruff clean on all touched files. ## Docs - `website/docs/user-guide/features/tts.md` — new "Python plugin providers" section with a decision table (command-provider vs plugin), minimal plugin example, and the optional-hook reference. - `website/docs/user-guide/features/plugins.md` — TTS row updated to mention both surfaces (command-provider primary, plugin for SDK/streaming). Closes #30398 |
||
|
|
223a3971c0 |
fix(security): close TOCTOU window when saving Claude Code OAuth credentials (#21152)
_write_claude_code_credentials wrote ~/.claude/.credentials.json via Path.write_text + replace + post-write chmod(0o600). Both the temp file and the destination briefly inherited the process umask (commonly 0o644 = world-readable) between create/replace and chmod, exposing the OAuth access/refresh tokens to other local users on multi-user hosts. Use os.open with O_WRONLY|O_CREAT|O_EXCL and an explicit S_IRUSR|S_IWUSR mode so the temp file is created atomically at 0o600. After os.replace, the destination inherits the temp's mode, so the post-write chmod is no longer needed. The temp name also gains a per-process random suffix to avoid collisions between concurrent writers and stale leftovers from a crashed prior write. Parent dir (~/.claude/) is owned by Claude Code itself and shared with its native auth, so we deliberately don't tighten its mode here (unlike the mcp_oauth fix which owns its own subtree under HERMES_HOME). Mirrors the fix shipped for agent/google_oauth.py in #19673 and the parallel fix for tools/mcp_oauth.py in #21148. Adds a regression test in TestWriteClaudeCodeCredentials asserting the resulting file mode is 0o600 (skipped on Windows where POSIX mode bits aren't enforced). |
||
|
|
bba76f3dcd | fix(file-safety): deny reads of Google OAuth tokens (#30972) | ||
|
|
514f5020c7 | fix(debug): redact BlueBubbles webhook secrets | ||
|
|
eea9553a9c |
fix(anthropic): skip mcp_ prefix on outgoing tool schemas when already prefixed
Companion to the GH-25255 incoming-strip fix from @hayka-pacha. Without this, build_anthropic_kwargs unconditionally added 'mcp_' to every tool name in step 3, so a native MCP server tool registered as 'mcp_composio_X' was sent as 'mcp_mcp_composio_X' on the wire. The incoming strip only removes ONE prefix, which still worked on first call, but on subsequent calls the model pattern-matched the single-prefixed form from message history and produced names that stripped to 'composio_X' — registry miss, dispatch fail. The history-rewrite block (#4) already has this guard. Apply the same guard to the schema-rewrite block (#3) so round-trip is symmetric. Added 4 outgoing-side tests. Existing 7 incoming-side tests still pass. Author map: hayka-pacha added for PR #25270 salvage attribution. Refs GH-25255. |
||
|
|
2f91a8406c |
fix(agent): only strip mcp_ prefix for OAuth-injected tools (GH-25255)
When strip_tool_prefix=True (Anthropic OAuth path), normalize_response unconditionally stripped the mcp_ prefix from ALL tool names starting with mcp_. This broke Hermes-native MCP server tools (registered under their full mcp_<server>_<tool> name in the registry) because the stripped name doesn't match any registry entry. Fix: check the tool registry before stripping. Only strip when: - The stripped name EXISTS in the registry (OAuth-injected tool) - The full name does NOT exist in the registry This preserves backward compatibility for OAuth-injected tools while protecting native MCP server tools from incorrect prefix removal. 7 new tests covering: OAuth strip, native preserve, no-flag, non-mcp, unknown tools, mixed responses, and dual-registration edge case. Signed-off-by: HKPA <hayka-pacha@users.noreply.github.com> |
||
|
|
6212e9ade8 |
fix(error-classifier): treat 5xx request-validation errors as non-retryable
Standard OpenAI returns request-validation failures (unknown/ unsupported parameter, malformed request) as 4xx. Some OpenAI-compatible gateways return them as 5xx instead — codex.nekos.me returns 502 for an unknown parameter. The generic '5xx -> retryable server_error' rule then misfires: the error is deterministic (every retry gets the identical rejection), so the retry loop burns all 3 attempts, the transport-recovery path resets the counter and burns 3 more, and the result is a request flood against a request that can never succeed. Fix: when a 500/502 body carries an unambiguous request-validation signal — 'unknown parameter' / 'unsupported parameter' / 'invalid_request_error' in the message text, or invalid_request_error / unknown_parameter / unsupported_parameter as the structured error code — classify as a non-retryable format_error so the loop fails fast and falls back. Genuine 502 Bad Gateway with no such signal stays retryable as before. Origin: local-author Upstream-PR: none Patch-State: local-only |
||
|
|
775a17284f |
fix(transport): strip Hermes-internal scaffolding keys before chat.completions
The empty-response recovery path in run_agent.py appends synthetic
messages tagged with _empty_recovery_synthetic (and the agent loop uses
_thinking_prefill / _empty_terminal_sentinel similarly). These are
internal bookkeeping markers — they must never reach the wire.
chat_completions' convert_messages only stripped Codex Responses leak
fields (codex_reasoning_items, call_id, etc.), not these _-prefixed
markers. Permissive providers (real OpenAI, Anthropic) silently ignore
unknown message keys so the bug stayed hidden, but strict
OpenAI-compatible gateways reject them outright. Observed against
codex.nekos.me:
502: [ObjectParam] [input[617]._empty_recovery_synthetic]
[unknown_parameter] Unknown parameter:
'_empty_recovery_synthetic'
Because the synthetic messages persist in the session, every
subsequent request in that session carries the poisoned key and
fails identically — a deterministic 502 the retry loop mistakes for
a transient server error.
Fix: convert_messages now drops any top-level message key starting
with '_'. OpenAI's message schema has no '_'-prefixed fields, so this
is safe and future-proofs against new internal markers.
Origin: local-author
Upstream-PR: none
Patch-State: local-only
|
||
|
|
3d66787a04 |
fix(vision): route auxiliary.vision.provider=openai to api.openai.com, skip text-only main (#31452)
* fix(vision): route auxiliary.vision.provider=openai to api.openai.com, skip text-only main for vision Fixes #31179. Three coupled fixes so a configured aux vision backend actually serves vision tasks instead of silently routing images to the user's main provider: 1. agent/auxiliary_client.py: `auxiliary.<task>.provider: openai` resolves to `custom` + `https://api.openai.com/v1`. "openai" was not in PROVIDER_REGISTRY (we have `openai-codex` for OAuth and `custom` for manual base_url), so the obvious config name silently failed to build a client. User-supplied base_url is still preserved; only the provider name normalises to `custom` so resolution doesn't hit the PROVIDER_REGISTRY-only path. 2. agent/auxiliary_client.py: the vision auto-detect chain now skips the user's main provider when models.dev reports `supports_vision=False`. Without this guard, a misconfigured aux provider would fall back to `auto`, which happily returned the main-provider client. The caller would then send image content to e.g. api.deepseek.com with model `gpt-4o-mini` and get a cryptic `unknown variant 'image_url', expected 'text'` from the provider's parser. 3. tools/vision_tools.py + tools/browser_tool.py: `check_vision_requirements` now mirrors the runtime fallback chain (explicit provider, then auto), so `vision_analyze` shows up whenever vision is actually serviceable. `browser_vision` gets a new `check_browser_vision_requirements` check_fn that AND-gates browser + vision availability, so it doesn't get advertised to the model when the call would fail at runtime. Reproduction (config from the bug report): model.provider: deepseek model.default: deepseek-v4-pro auxiliary.vision.provider: openai auxiliary.vision.model: gpt-4o-mini Before: resolve_vision_provider_client() returns None for the explicit provider, fallback auto returns the deepseek client with model='gpt-4o-mini', image hits api.deepseek.com → 'unknown variant image_url'. vision_analyze hidden from tool list; browser_vision exposed but fails at call time. After: resolves to custom + api.openai.com/v1 with model gpt-4o-mini. vision_analyze and browser_vision both gate correctly on capability. Tests: tests/agent/test_vision_routing_31179.py covers all three fixes (12 cases including the user's exact scenario, base_url preservation, text-only-main skip, capability-unknown permissive fallback, and tool gating parity). Existing 382 tests across auxiliary/vision/image_routing suites still pass. * test(vision): use exact hostname check to silence CodeQL substring-sanitization alert * fix(auxiliary): drop model name from vision-skip debug log to silence CodeQL The new `logger.debug(...)` added in the previous commit interpolated both `main_provider` and `vision_model` (a public model slug \u2014 not sensitive). CodeQL's `py/clear-text-logging-sensitive-data` heuristic re-flagged it twice because the rule mis-detects multi-value interpolations near tainted-via-config provider strings. Drop the model from the log args (provider alone is enough to diagnose the skip; the same sibling branch a few lines up already logs provider only). Behavior unchanged; CodeQL false positive cleared. |
||
|
|
d3c167b644 |
fix(profiles): cross-profile soft guard on file-write tools + system-prompt hint (#31290)
* fix(profiles): cross-profile soft guard on file-write tools + system-prompt hint
Adds a soft guard so an agent running under one Hermes profile cannot
silently edit a different profile's skills/plugins/cron/memories.
Three layers:
A. agent/file_safety.classify_cross_profile_target
Classifies a write target against the active HERMES_HOME. Returns
a {active_profile, target_profile, area, target_path} dict when the
path lands in another profile's scoped area. PROFILE_SCOPED_AREAS =
(skills, plugins, cron, memories). get_cross_profile_warning()
wraps it into a model-facing error string that names both profiles,
names the area, and points at the cross_profile=True bypass.
Defense-in-depth, NOT a security boundary — the terminal tool runs
as the same OS user and can write any of these paths directly. The
guard exists to prevent confused-agent corruption, not to stop a
determined attacker. SECURITY.md §3.2 (terminal-bypass posture)
still applies.
Wired into tools/file_tools.write_file_tool and patch_tool with a
cross_profile=False kwarg. WRITE_FILE_SCHEMA and PATCH_SCHEMA both
advertise cross_profile so the model can pass it after explicit
user direction. patch_tool extracts target paths from V4A patch
bodies before checking (same shape as the existing sensitive-path
check).
skill_manage is already scoped to the active profile's SKILLS_DIR
by construction, so no extra guard wiring is needed there. The
D-side error message (below) still names other profiles when the
skill exists elsewhere.
B. agent/system_prompt
One deterministic line near the environment-hints block names the
active profile and tells the model not to modify another profile's
skills/plugins/cron/memories without explicit direction. Profile
name is stable for the lifetime of the AIAgent, so the line is
prompt-cache-safe.
D. tools/skill_manager_tool._skill_not_found_error
Replaces the bare "Skill 'X' not found." with a message that:
- names the active profile,
- searches OTHER profiles' skills dirs for the same name,
- names the profile(s) where the skill exists and the path,
- suggests `hermes -p <name>` to switch profiles, or
cross_profile=True for an explicit edit.
All 5 "not found" sites in skill_manager_tool (edit, patch, delete,
write_file, remove_file) now go through the helper.
Reference incident (May 2026): a hermes-security profile session
edited skills under both ~/.hermes/profiles/hermes-security/skills/
AND ~/.hermes/skills/ (the default profile's skills) without
realizing the second path belonged to a different profile. Three of
the four skill files needed manual restoration afterward.
What this PR does NOT do:
* No hard block. The terminal tool can still touch any of these
paths with no guard — same posture as the dangerous-command
approval flow. SECURITY.md §3.2 applies.
* No regex sweep on terminal commands for cross-profile paths.
That direction is a Skills-Guard-style arms race (cd + relative
paths, base64, etc.) and would false-positive on legitimate
cross-profile reads. Filed as a follow-up.
* No on-disk path migration. ~/.hermes/skills/ remains the
default profile's skills dir; this PR is about telling the
agent about that boundary, not changing the layout.
Tests:
tests/agent/test_file_safety_cross_profile.py (16 tests)
- _resolve_active_profile_name covers default/named/failure paths
- classify_cross_profile_target covers all four scoped areas,
both directions (default → named, named → default, named → named),
non-Hermes paths, and root-level config files
- get_cross_profile_warning covers in-profile no-op, cross-profile
message shape, and the defense-in-depth self-documentation
tests/tools/test_cross_profile_guard.py (12 tests)
- write_file: in-profile allow, cross-profile block, cross_profile=True
bypass, non-Hermes pass-through
- patch: replace-mode block, cross_profile=True bypass, V4A patch
path extraction
- skill_manage: error names the other profile (single + multiple),
missing-everywhere falls back to skills_list hint
- system prompt: contract-level checks (both branches present,
cross_profile=True mentioned, ~/.hermes/profiles/ referenced)
All 207 existing tests in file_safety/file_operations/skill_manager
still pass. 10 system-prompt tests still pass.
E2E verified: the exact incident scenario (security profile editing
default's hermes-agent-dev skill) is now blocked with the warning
message; cross_profile=True unblocks.
* fix(code_execution): add cross_profile to write_file/patch stubs
The cross_profile kwarg added to write_file_tool/patch_tool needs to
flow through the execute_code sandbox stubs in _TOOL_STUBS so the
test_stubs_cover_all_schema_params drift test passes. Without this,
scripts running inside execute_code couldn't pass cross_profile=True
through hermes_tools.write_file().
Caught by CI on PR #31290.
|
||
|
|
2b10024ee8 |
test(display): cover failure-suffix rendering + update scrollback test
The original PR #17194 description claimed test_display_tool_preview.py but only ever shipped test_display_todo_progress.py. Add the missing coverage for the failure-suffix path: - _trim_error: whitespace strip, length cap, File-not-found path collapse - _detect_tool_failure: terminal exit codes, memory full, structured {error}/{message} extraction, malformed JSON, None result - get_cute_tool_message E2E: read_file failure, terminal exit-only, terminal stderr message, memory full, success path, no-result path Also update test_tool_progress_scrollback.test_error_suffix_on_failed_tool to reflect the new behavior: the generic '[error]' fallback in cli.py has been removed; failure suffixes now come from the result-aware _detect_tool_failure (e.g. '[exit 1]', '[File not found: x]'). |
||
|
|
ffde8b7b09 |
feat(cli): show todo progress as done/total fraction
Parse the todo_tool result summary to display completion progress in CLI tool preview lines: Read: ┊ 📋 plan 3/4 task(s) 0.5s Update: ┊ 📋 plan update 3/4 ✓ 0.5s Create: falls back to plain count when no completed tasks Falls back gracefully to the existing 'N task(s)' format when the result is missing, malformed, or has no completed items. Originally proposed in PR #17194 by Albert.Zhou; salvaged onto current main. Co-authored-by: Albert.Zhou <albert748@gmail.com> |
||
|
|
83f6a83b24 | fix(tui): handle images with codex app-server | ||
|
|
e752c9454e |
feat(plugins): add register_auxiliary_task() to PluginContext API
Auxiliary LLM tasks (vision, compression, web_extract, etc.) currently
require modifications to core files for any plugin that needs its own
task slot — specifically the _AUX_TASKS list in hermes_cli/main.py and
the hardcoded env-var bridging dict in gateway/run.py. This violates
the 'plugins must not modify core files' rule and forces every memory
or context plugin that wants its own auxiliary task to either fork
core or open a coupled core+plugin PR.
This change adds a generic plugin surface for auxiliary task
registration:
ctx.register_auxiliary_task(
key='memory_retain_filter',
display_name='Memory retain filter',
description='hindsight pre-retain dedup/extract',
defaults={'timeout': 30, 'extra_body': {'reasoning_effort': 'low'}},
)
After registration, the task automatically:
- Appears in 'hermes model → Configure auxiliary models' picker via
a new _all_aux_tasks() merge of built-in + plugin tasks
- Has its provider/model/base_url/api_key bridged from config.yaml
to AUXILIARY_<KEY_UPPER>_* env vars at gateway startup
(gateway/run.py now uses a dynamic bridged-keys set instead of
a hardcoded per-task dict)
- Gets plugin-declared defaults (timeout, extra_body, etc.) layered
underneath user config so unconfigured plugin tasks still work
(agent/auxiliary_client._get_auxiliary_task_config)
- Resets to auto via 'Reset all to auto' alongside built-ins
Validation:
- Rejects shadowing of built-in keys (vision, compression, etc.)
- Rejects invalid key shapes (must match [A-Za-z0-9_]+)
- Rejects cross-plugin collisions (clear error)
- Allows same-plugin re-registration (idempotent updates)
Plugin discovery failures (rare) fall back gracefully — the aux
config UI still shows built-in tasks if get_plugin_auxiliary_tasks()
raises, and gateway env-var bridging keeps working for built-ins.
Built-in tasks remain hardcoded in _AUX_TASKS for stability — they're
the baseline UX, and DEFAULT_CONFIG already ships their defaults.
Plugin tasks layer on top.
Tests: 15 new tests in test_plugin_auxiliary_tasks.py covering API
validation, manager state lifecycle, helper sort order, _all_aux_tasks
merge semantics, _reset_aux_to_auto inclusion of plugin tasks, and
default-layering in auxiliary_client.
Updates the gateway-bridge code-parity test (test_auxiliary_config_bridge)
to assert the new dynamic shape rather than the hardcoded literal env
var names which no longer appear post-refactor.
Motivation: this unblocks PR #20262 (hindsight smart retain pipeline)
and similar plugins that need a dedicated aux task slot. The change
is non-breaking — built-in env vars (AUXILIARY_VISION_PROVIDER, etc.)
keep working since they're produced by the same f-string template
that built the hardcoded names.
|
||
|
|
8b2adead78 | fix(compressor): ABC compliance — total_tokens, api_mode, logger consistency | ||
|
|
71291d83cd | test: keep tirith checks hermetic | ||
|
|
97e975edd2 |
fix(file-safety): widen read-deny to .env, mcp-tokens/, webhook secrets, root
Extends @briandevans's PR #17659 from {auth.json, auth.lock, .anthropic_oauth.json} to also cover: - HERMES_HOME/.env (provider API keys) - HERMES_HOME/webhook_subscriptions.json (per-route HMAC secrets) - HERMES_HOME/mcp-tokens/ (OAuth token directory; dir + everything inside) …AND iterates over both _hermes_home_path() AND _hermes_root_path() so profile-mode runs (HERMES_HOME = <root>/profiles/<name>) also block <root>/{auth.json, .env, mcp-tokens/, ...}. Same widening shape as the write-deny side already does (#15981, #14157). Explicitly NOT a security boundary. Per the personal-assistant trust model, the terminal tool runs as the same OS user and can `cat auth.json` directly. This read-deny exists as defense-in-depth: - Models that respect tool denials empirically tend to stop rather than reach for the shell. - The denial surfaces an audit trail when something tries to read credentials — easier to spot in logs than a generic `cat`. Docstring + error message both flag this as defense-in-depth so future contributors don't mistake it for a real security boundary and don't re-decline reports that propose the same fix shape. Absorbs the .env and mcp-tokens/ coverage from @tomqiaozc's parallel PR #8055 (closed-as-duplicate, credited). Co-authored-by: Tom Qiao <zqiao@microsoft.com> |
||
|
|
567ea61298 |
fix(file-safety): block auth.json read via TERMINAL_CWD relative path
read_file_tool resolves relative paths against TERMINAL_CWD (or the task's live terminal cwd), but the prior call passed the original unresolved string to get_read_block_error. That function's own resolve() is anchored at the Python process cwd, so when a task's TERMINAL_CWD pointed at HERMES_HOME and the agent issued read_file on the relative path "auth.json", the credential-store denylist was never reached and the file was read normally. Pass the already-resolved absolute path string at the file_tools call site, document the contract on get_read_block_error, and add a read_file_tool-level regression test that pins the relative-path case under TERMINAL_CWD == HERMES_HOME. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |