Compare commits

..

11 Commits

Author SHA1 Message Date
alt-glitch 415043315f refactor: remove config TypedDicts and fix ImportError propagation in clipboard
Remove 44 TypedDict classes from config.py — they were already stale
(11 missing keys) and load_config() still returns Dict[str, Any], so
they provided zero type-checking value. Keep the int() coercions and
Dict[str, Any] annotations which are real fixes.

Fix _wayland_save() swallowing ImportError at DEBUG level by adding
an explicit except ImportError: raise before the broad except Exception.
2026-04-23 17:41:25 +05:30
alt-glitch 98eb32f39a Clean up TODO comment in auxiliary_client.py
Remove the unnecessary nudge about agent refactoring; the TODO describes
the actual work that needs to be done.
2026-04-23 17:41:18 +05:30
alt-glitch 2df306e6cd Add helpful ImportError messages for optional dependencies
When optional dependencies are missing, raise ImportError with
installation
instructions pointing to the relevant extras group (e.g. `[messaging]`,
`[cli]`, `[mcp]`, etc.) instead of letting the import fail silently.
2026-04-23 17:41:18 +05:30
alt-glitch 79a5f03f92 refactor(types): simplify pass on P1 batch
Follow-up to 15ac253b per /simplify review:

- gateway/platforms/discord.py:3638 - move self.resolved = True *after*
  the `if interaction.data is None: return` guard. Previously the view
  was marked resolved before the None-guard, so a None data payload
  silently rejected the user's next click.
- agent/display.py:732 - replace `if self.start_time is None: continue`
  with `assert self.start_time is not None`. start() sets start_time
  before the animate thread starts, so the None branch was dead; the
  `continue` form would have busy-looped (skipping the 0.12s sleep).
- tests/hermes_cli/test_config_shapes.py - drop __total__ dunder
  restatement test (it just echoes the class declaration); trim commit
  narration from module docstring.
- tests/agent/test_credential_pool.py, tests/tools/test_rl_training_tool.py -
  drop "added in commit ..." banners (narrates the change per CLAUDE.md).
2026-04-23 17:41:18 +05:30
alt-glitch 527ca7d238 fix(types): batch P1 ty hotfixes + run_agent.py annotation pass
15 P1 ship-stopper runtime bugs from the ty triage plus the cross-bucket
cleanup in run_agent.py. Net: -138 ty diagnostics (1953 -> 1815). Major
wins on not-subscriptable (-34), unresolved-attribute (-29),
invalid-argument-type (-26), invalid-type-form (-20),
unsupported-operator
(-18), invalid-key (-9).

Missing refs (structural):
- tools/rl_training_tool.py: RunState dataclass gains api_log_file,
  trainer_log_file, env_log_file fields; stop-run was closing undeclared
  handles.
- agent/credential_pool.py: remove_entry(entry_id) added, symmetric with
  add_entry; used by hermes_cli/web_server.py OAuth dashboard cleanup.
- hermes_cli/config.py: _CamofoxConfig TypedDict defined (was referenced
  by _BrowserConfig but never declared).
- hermes_cli/gateway.py: _setup_wecom_callback() added, mirroring
  _setup_wecom().
- tui_gateway/server.py: skills_hub imports corrected from
  hermes_cli.skills_hub -> tools.skills_hub.

Typo / deprecation:
- tools/transcription_tools.py: os.sys.modules -> sys.modules.
- gateway/platforms/bluebubbles.py: datetime.utcnow() ->
  datetime.now(timezone.utc).

None-guards:
- gateway/platforms/telegram.py:~2798 - msg.sticker None guard.
- gateway/platforms/discord.py:3602/3637 - interaction.data None +
  SelectMenu narrowing; :3009 - thread_id None before `in`; :1893 -
  guild.member_count None.
- gateway/platforms/matrix.py:2174/2185 - walrus-narrow
  re.search().group().
- agent/display.py:732 - start_time None before elapsed subtraction.
- gateway/run.py:10334 - assert _agent_timeout is not None before `//
  60`.

Platform override signature match:
- gateway/platforms/email.py: send_image accepts metadata kwarg;
  send_document accepts **kwargs (matches base class).

run_agent.py annotation pass:
- callable/any -> Callable/Any in annotation position (15 sites in
  run_agent.py + 5 in cli.py, toolset_distributions.py,
  tools/delegate_tool.py, hermes_cli/dingtalk_auth.py,
  tui_gateway/server.py).
- conversation_history param widened to list[dict[str, Any]] | None.
- OMIT_TEMPERATURE sentinel guarded from leaking into
  call_llm(temperature): kwargs-dict pattern at run_agent.py:7337 +
  scripts/trajectory_compressor.py:618/688.
- build_anthropic_client(timeout) widened to Optional[float].

Tests:
- tests/agent/test_credential_pool.py: remove_entry (id match,
  unknown-id, priority renumbering).
- tests/hermes_cli/test_config_shapes.py: _CamofoxConfig shape +
  nesting.
- tests/tools/test_rl_training_tool.py: RunState log_file fields.
2026-04-23 17:41:18 +05:30
alt-glitch b11e53e34f fix: resolve not-subscriptable ty diagnostics across codebase
Add TypedDicts for DEFAULT_CONFIG, CLI state dicts (_ModelPickerState,
_ApprovalState, _ClarifyState), and OPTIONAL_ENV_VARS so ty can resolve
nested dict subscripts.  Guard Optional returns before subscripting
(toolsets, cron/scheduler, delegate_tool), coerce str|None to str before
slicing (gateway/run, run_agent), split ternary for isinstance narrowing
(wecom), and suppress discord interaction.data access with ty: ignore.
2026-04-23 17:41:10 +05:30
alt-glitch 1e7a598bac fix: declare undeclared soft deps in extras and remove silent import guards
Previously mutagen, aiohttp-socks, tiktoken, Pillow, psutil, datasets,
neutts, and soundfile were used behind try/except ImportError with silent
fallbacks, masking broken functionality at runtime.  Declare each in its
natural extra (messaging, cli, mcp, rl, new tts-local) so they get
installed, and remove the guards so missing deps crash loudly.
2026-04-23 17:41:10 +05:30
alt-glitch 3eddabf53b fix: resolve all call-non-callable ty diagnostics across codebase
Replace hasattr() duck-typing with isinstance() checks for DiscordAdapter
in gateway/run.py, add TypedDict for IMAGEGEN_BACKENDS in tools_config.py,
properly type fal_client getattr'd callables in image_generation_tool.py,
fix dict[str, object] → Callable annotation in approval.py, use
isinstance(BaseModel) in web_tools.py, capture _message_handler to local
in base.py, rename shadowed list_distributions parameter in batch_runner.py,
and remove dead queue_message branch.
2026-04-23 17:40:52 +05:30
alt-glitch 971542d254 refactor: move standalone scripts to scripts/ directory
Move batch_runner, trajectory_compressor, mini_swe_runner, and rl_cli
from the project root into scripts/, update all imports, logger names,
pyproject.toml, and downstream test references.
2026-04-23 17:40:52 +05:30
alt-glitch 4a95029e6c fix: resolve all invalid-return-type ty diagnostics across codebase
Widen return type annotations to match actual control flow, add
unreachable assertions after retry loops ty cannot prove terminate,
split ambiguous union returns (auth.py credential pool), and remove
the AIOHTTP_AVAILABLE conditional-import guard from api_server.py.
2026-04-23 17:40:52 +05:30
alt-glitch 432614591a Add TYPE_CHECKING imports to fix unresolved-reference type bugs 2026-04-23 17:40:52 +05:30
642 changed files with 9150 additions and 102970 deletions
-3
View File
@@ -53,9 +53,6 @@ jobs:
- name: Extract skill metadata for dashboard
run: python3 website/scripts/extract-skills.py
- name: Regenerate per-skill docs pages + catalogs
run: python3 website/scripts/generate-skill-docs.py
- name: Build skills index (if not already present)
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-3
View File
@@ -36,9 +36,6 @@ jobs:
- name: Extract skill metadata for dashboard
run: python3 website/scripts/extract-skills.py
- name: Regenerate per-skill docs pages + catalogs
run: python3 website/scripts/generate-skill-docs.py
- name: Lint docs diagrams
run: npm run lint:diagrams
working-directory: website
+77 -224
View File
@@ -5,61 +5,78 @@ Instructions for AI coding assistants and developers working on the hermes-agent
## Development Environment
```bash
# Prefer .venv; fall back to venv if that's what your checkout has.
source .venv/bin/activate # or: source venv/bin/activate
source venv/bin/activate # ALWAYS activate before running Python
```
`scripts/run_tests.sh` probes `.venv` first, then `venv`, then
`$HOME/.hermes/hermes-agent/venv` (for worktrees that share a venv with the
main checkout).
## Project Structure
File counts shift constantly — don't treat the tree below as exhaustive.
The canonical source is the filesystem. The notes call out the load-bearing
entry points you'll actually edit.
```
hermes-agent/
├── run_agent.py # AIAgent class — core conversation loop (~12k LOC)
├── run_agent.py # AIAgent class — core conversation loop
├── model_tools.py # Tool orchestration, discover_builtin_tools(), handle_function_call()
├── toolsets.py # Toolset definitions, _HERMES_CORE_TOOLS list
├── cli.py # HermesCLI class — interactive CLI orchestrator (~11k LOC)
├── cli.py # HermesCLI class — interactive CLI orchestrator
├── hermes_state.py # SessionDB — SQLite session store (FTS5 search)
├── hermes_constants.py # get_hermes_home(), display_hermes_home() — profile-aware paths
├── hermes_logging.py # setup_logging() — agent.log / errors.log / gateway.log (profile-aware)
├── batch_runner.py # Parallel batch processing
├── agent/ # Agent internals (provider adapters, memory, caching, compression, etc.)
├── hermes_cli/ # CLI subcommands, setup wizard, plugins loader, skin engine
├── tools/ # Tool implementations — auto-discovered via tools/registry.py
├── agent/ # Agent internals
│ ├── prompt_builder.py # System prompt assembly
│ ├── context_compressor.py # Auto context compression
│ ├── prompt_caching.py # Anthropic prompt caching
│ ├── auxiliary_client.py # Auxiliary LLM client (vision, summarization)
│ ├── model_metadata.py # Model context lengths, token estimation
│ ├── models_dev.py # models.dev registry integration (provider-aware context)
│ ├── display.py # KawaiiSpinner, tool preview formatting
│ ├── skill_commands.py # Skill slash commands (shared CLI/gateway)
│ └── trajectory.py # Trajectory saving helpers
├── hermes_cli/ # CLI subcommands and setup
│ ├── main.py # Entry point — all `hermes` subcommands
│ ├── config.py # DEFAULT_CONFIG, OPTIONAL_ENV_VARS, migration
│ ├── commands.py # Slash command definitions + SlashCommandCompleter
│ ├── callbacks.py # Terminal callbacks (clarify, sudo, approval)
│ ├── setup.py # Interactive setup wizard
│ ├── skin_engine.py # Skin/theme engine — CLI visual customization
│ ├── skills_config.py # `hermes skills` — enable/disable skills per platform
│ ├── tools_config.py # `hermes tools` — enable/disable tools per platform
│ ├── skills_hub.py # `/skills` slash command (search, browse, install)
│ ├── models.py # Model catalog, provider model lists
│ ├── model_switch.py # Shared /model switch pipeline (CLI + gateway)
│ └── auth.py # Provider credential resolution
├── tools/ # Tool implementations (one file per tool)
│ ├── registry.py # Central tool registry (schemas, handlers, dispatch)
│ ├── approval.py # Dangerous command detection
│ ├── terminal_tool.py # Terminal orchestration
│ ├── process_registry.py # Background process management
│ ├── file_tools.py # File read/write/search/patch
│ ├── web_tools.py # Web search/extract (Parallel + Firecrawl)
│ ├── browser_tool.py # Browserbase browser automation
│ ├── code_execution_tool.py # execute_code sandbox
│ ├── delegate_tool.py # Subagent delegation
│ ├── mcp_tool.py # MCP client (~1050 lines)
│ └── environments/ # Terminal backends (local, docker, ssh, modal, daytona, singularity)
├── gateway/ # Messaging gateway — run.py + session.py + platforms/
│ ├── platforms/ # Adapter per platform (telegram, discord, slack, whatsapp,
│ # homeassistant, signal, matrix, mattermost, email, sms,
│ # dingtalk, wecom, weixin, feishu, qqbot, bluebubbles,
│ │ # webhook, api_server, ...). See ADDING_A_PLATFORM.md.
│ └── builtin_hooks/ # Always-registered gateway hooks (boot-md, ...)
├── plugins/ # Plugin system (see "Plugins" section below)
│ ├── memory/ # Memory-provider plugins (honcho, mem0, supermemory, ...)
│ ├── context_engine/ # Context-engine plugins
│ └── <others>/ # Dashboard, image-gen, disk-cleanup, examples, ...
├── optional-skills/ # Heavier/niche skills shipped but NOT active by default
├── skills/ # Built-in skills bundled with the repo
├── gateway/ # Messaging platform gateway
│ ├── run.py # Main loop, slash commands, message dispatch
├── session.py # SessionStore — conversation persistence
└── platforms/ # Adapters: telegram, discord, slack, whatsapp, homeassistant, signal, qqbot
├── ui-tui/ # Ink (React) terminal UI — `hermes --tui`
── src/ # entry.tsx, app.tsx, gatewayClient.ts + app/components/hooks/lib
── src/entry.tsx # TTY gate + render()
│ ├── src/app.tsx # Main state machine and UI
│ ├── src/gatewayClient.ts # Child process + JSON-RPC bridge
│ ├── src/app/ # Decomposed app logic (event handler, slash handler, stores, hooks)
│ ├── src/components/ # Ink components (branding, markdown, prompts, pickers, etc.)
│ ├── src/hooks/ # useCompletion, useInputHistory, useQueue, useVirtualHistory
│ └── src/lib/ # Pure helpers (history, osc52, text, rpc, messages)
├── tui_gateway/ # Python JSON-RPC backend for the TUI
│ ├── entry.py # stdio entrypoint
│ ├── server.py # RPC handlers and session logic
│ ├── render.py # Optional rich/ANSI bridge
│ └── slash_worker.py # Persistent HermesCLI subprocess for slash commands
├── acp_adapter/ # ACP server (VS Code / Zed / JetBrains integration)
├── cron/ # Scheduler jobs.py, scheduler.py
├── cron/ # Scheduler (jobs.py, scheduler.py)
├── environments/ # RL training environments (Atropos)
├── scripts/ # run_tests.sh, release.py, auxiliary scripts
── website/ # Docusaurus docs site
└── tests/ # Pytest suite (~15k tests across ~700 files as of Apr 2026)
├── tests/ # Pytest suite (~3000 tests)
── batch_runner.py # Parallel batch processing
```
**User config:** `~/.hermes/config.yaml` (settings), `~/.hermes/.env` (API keys only).
**Logs:** `~/.hermes/logs/``agent.log` (INFO+), `errors.log` (WARNING+),
`gateway.log` when running the gateway. Profile-aware via `get_hermes_home()`.
Browse with `hermes logs [--follow] [--level ...] [--session ...]`.
**User config:** `~/.hermes/config.yaml` (settings), `~/.hermes/.env` (API keys)
## File Dependency Chain
@@ -77,30 +94,20 @@ run_agent.py, cli.py, batch_runner.py, environments/
## AIAgent Class (run_agent.py)
The real `AIAgent.__init__` takes ~60 parameters (credentials, routing, callbacks,
session context, budget, credential pool, etc.). The signature below is the
minimum subset you'll usually touch — read `run_agent.py` for the full list.
```python
class AIAgent:
def __init__(self,
base_url: str = None,
api_key: str = None,
provider: str = None,
api_mode: str = None, # "chat_completions" | "codex_responses" | ...
model: str = "", # empty → resolved from config/provider later
max_iterations: int = 90, # tool-calling iterations (shared with subagents)
model: str = "anthropic/claude-opus-4.6",
max_iterations: int = 90,
enabled_toolsets: list = None,
disabled_toolsets: list = None,
quiet_mode: bool = False,
save_trajectories: bool = False,
platform: str = None, # "cli", "telegram", etc.
platform: str = None, # "cli", "telegram", etc.
session_id: str = None,
skip_context_files: bool = False,
skip_memory: bool = False,
credential_pool=None,
# ... plus callbacks, thread/user/chat IDs, iteration_budget, fallback_model,
# checkpoints config, prefill_messages, service_tier, reasoning_config, etc.
# ... plus provider, api_mode, callbacks, routing params
): ...
def chat(self, message: str) -> str:
@@ -113,13 +120,10 @@ class AIAgent:
### Agent Loop
The core loop is inside `run_conversation()` — entirely synchronous, with
interrupt checks, budget tracking, and a one-turn grace call:
The core loop is inside `run_conversation()` — entirely synchronous:
```python
while (api_call_count < self.max_iterations and self.iteration_budget.remaining > 0) \
or self._budget_grace_call:
if self._interrupt_requested: break
while api_call_count < self.max_iterations and self.iteration_budget.remaining > 0:
response = client.chat.completions.create(model=model, messages=messages, tools=tool_schemas)
if response.tool_calls:
for tool_call in response.tool_calls:
@@ -130,8 +134,7 @@ while (api_call_count < self.max_iterations and self.iteration_budget.remaining
return response.content
```
Messages follow OpenAI format: `{"role": "system/user/assistant/tool", ...}`.
Reasoning content is stored in `assistant_msg["reasoning"]`.
Messages follow OpenAI format: `{"role": "system/user/assistant/tool", ...}`. Reasoning content is stored in `assistant_msg["reasoning"]`.
---
@@ -240,19 +243,6 @@ npm run fmt # prettier
npm test # vitest
```
### TUI in the Dashboard (`hermes dashboard` → `/chat`)
The dashboard embeds the real `hermes --tui`**not** a rewrite. See `hermes_cli/pty_bridge.py` + the `@app.websocket("/api/pty")` endpoint in `hermes_cli/web_server.py`.
- Browser loads `web/src/pages/ChatPage.tsx`, which mounts xterm.js's `Terminal` with the WebGL renderer, `@xterm/addon-fit` for container-driven resize, and `@xterm/addon-unicode11` for modern wide-character widths.
- `/api/pty?token=…` upgrades to a WebSocket; auth uses the same ephemeral `_SESSION_TOKEN` as REST, via query param (browsers can't set `Authorization` on WS upgrade).
- The server spawns whatever `hermes --tui` would spawn, through `ptyprocess` (POSIX PTY — WSL works, native Windows does not).
- Frames: raw PTY bytes each direction; resize via `\x1b[RESIZE:<cols>;<rows>]` intercepted on the server and applied with `TIOCSWINSZ`.
**Do not re-implement the primary chat experience in React.** The main transcript, composer/input flow (including slash-command behavior), and PTY-backed terminal belong to the embedded `hermes --tui` — anything new you add to Ink shows up in the dashboard automatically. If you find yourself rebuilding the transcript or composer for the dashboard, stop and extend Ink instead.
**Structured React UI around the TUI is allowed when it is not a second chat surface.** Sidebar widgets, inspectors, summaries, status panels, and similar supporting views (e.g. `ChatSidebar`, `ModelPickerDialog`, `ToolCall`) are fine when they complement the embedded TUI rather than replacing the transcript / composer / terminal. Keep their state independent of the PTY child's session and surface their failures non-destructively so the terminal pane keeps working unimpaired.
---
## Adding New Tools
@@ -290,7 +280,7 @@ The registry handles schema collection, dispatch, availability checking, and err
**State files**: If a tool stores persistent state (caches, logs, checkpoints), use `get_hermes_home()` for the base directory — never `Path.home() / ".hermes"`. This ensures each profile gets its own state.
**Agent-level tools** (todo, memory): intercepted by `run_agent.py` before `handle_function_call()`. See `tools/todo_tool.py` for the pattern.
**Agent-level tools** (todo, memory): intercepted by `run_agent.py` before `handle_function_call()`. See `todo_tool.py` for the pattern.
---
@@ -298,13 +288,9 @@ The registry handles schema collection, dispatch, availability checking, and err
### config.yaml options:
1. Add to `DEFAULT_CONFIG` in `hermes_cli/config.py`
2. Bump `_config_version` (check the current value at the top of `DEFAULT_CONFIG`)
ONLY if you need to actively migrate/transform existing user config
(renaming keys, changing structure). Adding a new key to an existing
section is handled automatically by the deep-merge and does NOT require
a version bump.
2. Bump `_config_version` (currently 5) to trigger migration for existing users
### .env variables (SECRETS ONLY — API keys, tokens, passwords):
### .env variables:
1. Add to `OPTIONAL_ENV_VARS` in `hermes_cli/config.py` with metadata:
```python
"NEW_API_KEY": {
@@ -316,29 +302,13 @@ The registry handles schema collection, dispatch, availability checking, and err
},
```
Non-secret settings (timeouts, thresholds, feature flags, paths, display
preferences) belong in `config.yaml`, not `.env`. If internal code needs an
env var mirror for backward compatibility, bridge it from `config.yaml` to
the env var in code (see `gateway_timeout`, `terminal.cwd``TERMINAL_CWD`).
### Config loaders (three paths — know which one you're in):
### Config loaders (two separate systems):
| Loader | Used by | Location |
|--------|---------|----------|
| `load_cli_config()` | CLI mode | `cli.py` — merges CLI-specific defaults + user YAML |
| `load_config()` | `hermes tools`, `hermes setup`, most CLI subcommands | `hermes_cli/config.py` — merges `DEFAULT_CONFIG` + user YAML |
| Direct YAML load | Gateway runtime | `gateway/run.py` + `gateway/config.py` — reads user YAML raw |
If you add a new key and the CLI sees it but the gateway doesn't (or vice
versa), you're on the wrong loader. Check `DEFAULT_CONFIG` coverage.
### Working directory:
- **CLI** — uses the process's current directory (`os.getcwd()`).
- **Messaging** — uses `terminal.cwd` from `config.yaml`. The gateway bridges this
to the `TERMINAL_CWD` env var for child tools. **`MESSAGING_CWD` has been
removed** — the config loader prints a deprecation warning if it's set in
`.env`. Same for `TERMINAL_CWD` in `.env`; the canonical setting is
`terminal.cwd` in `config.yaml`.
| `load_cli_config()` | CLI mode | `cli.py` |
| `load_config()` | `hermes tools`, `hermes setup` | `hermes_cli/config.py` |
| Direct YAML load | Gateway | `gateway/run.py` |
---
@@ -431,95 +401,7 @@ Activate with `/skin cyberpunk` or `display.skin: cyberpunk` in config.yaml.
---
## Plugins
Hermes has two plugin surfaces. Both live under `plugins/` in the repo so
repo-shipped plugins can be discovered alongside user-installed ones in
`~/.hermes/plugins/` and pip-installed entry points.
### General plugins (`hermes_cli/plugins.py` + `plugins/<name>/`)
`PluginManager` discovers plugins from `~/.hermes/plugins/`, `./.hermes/plugins/`,
and pip entry points. Each plugin exposes a `register(ctx)` function that
can:
- Register Python-callback lifecycle hooks:
`pre_tool_call`, `post_tool_call`, `pre_llm_call`, `post_llm_call`,
`on_session_start`, `on_session_end`
- Register new tools via `ctx.register_tool(...)`
- Register CLI subcommands via `ctx.register_cli_command(...)` — the
plugin's argparse tree is wired into `hermes` at startup so
`hermes <pluginname> <subcmd>` works with no change to `main.py`
Hooks are invoked from `model_tools.py` (pre/post tool) and `run_agent.py`
(lifecycle). **Discovery timing pitfall:** `discover_plugins()` only runs
as a side effect of importing `model_tools.py`. Code paths that read plugin
state without importing `model_tools.py` first must call `discover_plugins()`
explicitly (it's idempotent).
### Memory-provider plugins (`plugins/memory/<name>/`)
Separate discovery system for pluggable memory backends. Current built-in
providers include **honcho, mem0, supermemory, byterover, hindsight,
holographic, openviking, retaindb**.
Each provider implements the `MemoryProvider` ABC (see `agent/memory_provider.py`)
and is orchestrated by `agent/memory_manager.py`. Lifecycle hooks include
`sync_turn(turn_messages)`, `prefetch(query)`, `shutdown()`, and optional
`post_setup(hermes_home, config)` for setup-wizard integration.
**CLI commands via `plugins/memory/<name>/cli.py`:** if a memory plugin
defines `register_cli(subparser)`, `discover_plugin_cli_commands()` finds
it at argparse setup time and wires it into `hermes <plugin>`. The
framework only exposes CLI commands for the **currently active** memory
provider (read from `memory.provider` in config.yaml), so disabled
providers don't clutter `hermes --help`.
**Rule (Teknium, May 2026):** plugins MUST NOT modify core files
(`run_agent.py`, `cli.py`, `gateway/run.py`, `hermes_cli/main.py`, etc.).
If a plugin needs a capability the framework doesn't expose, expand the
generic plugin surface (new hook, new ctx method) — never hardcode
plugin-specific logic into core. PR #5295 removed 95 lines of hardcoded
honcho argparse from `main.py` for exactly this reason.
### Dashboard / context-engine / image-gen plugin directories
`plugins/context_engine/`, `plugins/image_gen/`, `plugins/example-dashboard/`,
etc. follow the same pattern (ABC + orchestrator + per-plugin directory).
Context engines plug into `agent/context_engine.py`; image-gen providers
into `agent/image_gen_provider.py`.
---
## Skills
Two parallel surfaces:
- **`skills/`** — built-in skills shipped and loadable by default.
Organized by category directories (e.g. `skills/github/`, `skills/mlops/`).
- **`optional-skills/`** — heavier or niche skills shipped with the repo but
NOT active by default. Installed explicitly via
`hermes skills install official/<category>/<skill>`. Adapter lives in
`tools/skills_hub.py` (`OptionalSkillSource`). Categories include
`autonomous-ai-agents`, `blockchain`, `communication`, `creative`,
`devops`, `email`, `health`, `mcp`, `migration`, `mlops`, `productivity`,
`research`, `security`, `web-development`.
When reviewing skill PRs, check which directory they target — heavy-dep or
niche skills belong in `optional-skills/`.
### SKILL.md frontmatter
Standard fields: `name`, `description`, `version`, `platforms`
(OS-gating list: `[macos]`, `[linux, macos]`, ...),
`metadata.hermes.tags`, `metadata.hermes.category`,
`metadata.hermes.config` (config.yaml settings the skill needs — stored
under `skills.config.<key>`, prompted during setup, injected at load time).
---
## Important Policies
### Prompt Caching Must Not Break
Hermes-Agent ensures caching remains valid throughout a conversation. **Do NOT implement changes that would:**
@@ -529,10 +411,9 @@ Hermes-Agent ensures caching remains valid throughout a conversation. **Do NOT i
Cache-breaking forces dramatically higher costs. The ONLY time we alter context is during context compression.
Slash commands that mutate system-prompt state (skills, tools, memory, etc.)
must be **cache-aware**: default to deferred invalidation (change takes
effect next session), with an opt-in `--now` flag for immediate
invalidation. See `/skills install --now` for the canonical pattern.
### Working Directory Behavior
- **CLI**: Uses current directory (`.``os.getcwd()`)
- **Messaging**: Uses `MESSAGING_CWD` env var (default: home directory)
### Background Process Notifications (Gateway)
@@ -554,7 +435,7 @@ Hermes supports **profiles** — multiple fully isolated instances, each with it
`HERMES_HOME` directory (config, API keys, memory, sessions, skills, gateway, etc.).
The core mechanism: `_apply_profile_override()` in `hermes_cli/main.py` sets
`HERMES_HOME` before any module imports. All `get_hermes_home()` references
`HERMES_HOME` before any module imports. All 119+ references to `get_hermes_home()`
automatically scope to the active profile.
### Rules for profile-safe code
@@ -611,12 +492,8 @@ Use `get_hermes_home()` from `hermes_constants` for code paths. Use `display_her
for user-facing print/log messages. Hardcoding `~/.hermes` breaks profiles — each profile
has its own `HERMES_HOME` directory. This was the source of 5 bugs fixed in PR #3575.
### DO NOT introduce new `simple_term_menu` usage
Existing call sites in `hermes_cli/main.py` remain for legacy fallback only;
the preferred UI is curses (stdlib) because `simple_term_menu` has
ghost-duplication rendering bugs in tmux/iTerm2 with arrow keys. New
interactive menus must use `hermes_cli/curses_ui.py` — see
`hermes_cli/tools_config.py` for the canonical pattern.
### DO NOT use `simple_term_menu` for interactive menus
Rendering bugs in tmux/iTerm2 — ghosting on scroll. Use `curses` (stdlib) instead. See `hermes_cli/tools_config.py` for the pattern.
### DO NOT use `\033[K` (ANSI erase-to-EOL) in spinner/display code
Leaks as literal `?[K` text under `prompt_toolkit`'s `patch_stdout`. Use space-padding: `f"\r{line}{' ' * pad}"`.
@@ -627,30 +504,6 @@ Leaks as literal `?[K` text under `prompt_toolkit`'s `patch_stdout`. Use space-p
### DO NOT hardcode cross-tool references in schema descriptions
Tool schema descriptions must not mention tools from other toolsets by name (e.g., `browser_navigate` saying "prefer web_search"). Those tools may be unavailable (missing API keys, disabled toolset), causing the model to hallucinate calls to non-existent tools. If a cross-reference is needed, add it dynamically in `get_tool_definitions()` in `model_tools.py` — see the `browser_navigate` / `execute_code` post-processing blocks for the pattern.
### The gateway has TWO message guards — both must bypass approval/control commands
When an agent is running, messages pass through two sequential guards:
(1) **base adapter** (`gateway/platforms/base.py`) queues messages in
`_pending_messages` when `session_key in self._active_sessions`, and
(2) **gateway runner** (`gateway/run.py`) intercepts `/stop`, `/new`,
`/queue`, `/status`, `/approve`, `/deny` before they reach
`running_agent.interrupt()`. Any new command that must reach the runner
while the agent is blocked (e.g. approval prompts) MUST bypass BOTH
guards and be dispatched inline, not via `_process_message_background()`
(which races session lifecycle).
### Squash merges from stale branches silently revert recent fixes
Before squash-merging a PR, ensure the branch is up to date with `main`
(`git fetch origin main && git reset --hard origin/main` in the worktree,
then re-apply the PR's commits). A stale branch's version of an unrelated
file will silently overwrite recent fixes on main when squashed. Verify
with `git diff HEAD~1..HEAD` after merging — unexpected deletions are a
red flag.
### Don't wire in dead code without E2E validation
Unused code that was never shipped was dead for a reason. Before wiring an
unused module into a live code path, E2E test the real resolution chain
with actual imports (not mocks) against a temp `HERMES_HOME`.
### Tests must not write to `~/.hermes/`
The `_isolate_hermes_home` autouse fixture in `tests/conftest.py` redirects `HERMES_HOME` to a temp dir. Never hardcode `~/.hermes/` paths in tests.
@@ -706,7 +559,7 @@ If you can't use the wrapper (e.g. on Windows or inside an IDE that shells
pytest directly), at minimum activate the venv and pass `-n 4`:
```bash
source .venv/bin/activate # or: source venv/bin/activate
source venv/bin/activate
python -m pytest tests/ -q -n 4
```
+3 -3
View File
@@ -9,7 +9,7 @@ Thank you for contributing to Hermes Agent! This guide covers everything you nee
We value contributions in this order:
1. **Bug fixes** — crashes, incorrect behavior, data loss. Always top priority.
2. **Cross-platform compatibility** — macOS, different Linux distros, and WSL2 on Windows. We want Hermes to work everywhere.
2. **Cross-platform compatibility** Windows, macOS, different Linux distros, different terminal emulators. We want Hermes to work everywhere.
3. **Security hardening** — shell injection, prompt injection, path traversal, privilege escalation. See [Security](#security-considerations).
4. **Performance and robustness** — retry logic, error handling, graceful degradation.
5. **New skills** — but only broadly useful ones. See [Should it be a Skill or a Tool?](#should-it-be-a-skill-or-a-tool)
@@ -515,7 +515,7 @@ See `hermes_cli/skin_engine.py` for the full schema and existing skins as exampl
## Cross-Platform Compatibility
Hermes runs on Linux, macOS, and WSL2 on Windows. When writing code that touches the OS:
Hermes runs on Linux, macOS, and Windows. When writing code that touches the OS:
### Critical rules
@@ -597,7 +597,7 @@ refactor/description # Code restructuring
1. **Run tests**: `pytest tests/ -v`
2. **Test manually**: Run `hermes` and exercise the code path you changed
3. **Check cross-platform impact**: If you touch file I/O, process management, or terminal handling, consider macOS, Linux, and WSL2
3. **Check cross-platform impact**: If you touch file I/O, process management, or terminal handling, consider Windows and macOS
4. **Keep PRs focused**: One logical change per PR. Don't mix a bug fix with a refactor with a new feature.
### PR description
+4 -12
View File
@@ -10,11 +10,9 @@ ENV PYTHONUNBUFFERED=1
ENV PLAYWRIGHT_BROWSERS_PATH=/opt/hermes/.playwright
# Install system dependencies in one layer, clear APT cache
# tini reaps orphaned zombie processes (MCP stdio subprocesses, git, bun, etc.)
# that would otherwise accumulate when hermes runs as PID 1. See #15012.
RUN apt-get update && \
apt-get install -y --no-install-recommends \
build-essential nodejs npm python3 ripgrep ffmpeg gcc python3-dev libffi-dev procps git openssh-client docker-cli tini && \
build-essential nodejs npm python3 ripgrep ffmpeg gcc python3-dev libffi-dev procps git openssh-client docker-cli && \
rm -rf /var/lib/apt/lists/*
# Non-root user for runtime; UID can be overridden via HERMES_UID at runtime
@@ -43,15 +41,9 @@ COPY --chown=hermes:hermes . .
# Build web dashboard (Vite outputs to hermes_cli/web_dist/)
RUN cd web && npm run build
# ---------- Permissions ----------
# Make install dir world-readable so any HERMES_UID can read it at runtime.
# The venv needs to be traversable too.
USER root
RUN chmod -R a+rX /opt/hermes
# Start as root so the entrypoint can usermod/groupmod + gosu.
# If HERMES_UID is unset, the entrypoint drops to the default hermes user (10000).
# ---------- Python virtualenv ----------
RUN chown hermes:hermes /opt/hermes
USER hermes
RUN uv venv && \
uv pip install --no-cache-dir -e ".[all]"
@@ -60,4 +52,4 @@ ENV HERMES_WEB_DIST=/opt/hermes/hermes_cli/web_dist
ENV HERMES_HOME=/opt/data
ENV PATH="/opt/data/.local/bin:${PATH}"
VOLUME [ "/opt/data" ]
ENTRYPOINT [ "/usr/bin/tini", "-g", "--", "/opt/hermes/docker/entrypoint.sh" ]
ENTRYPOINT [ "/opt/hermes/docker/entrypoint.sh" ]
+7 -3
View File
@@ -76,7 +76,7 @@ Hermes has two entry points: start the terminal UI with `hermes`, or run the gat
| Set a personality | `/personality [name]` | `/personality [name]` |
| Retry or undo the last turn | `/retry`, `/undo` | `/retry`, `/undo` |
| Compress context / check usage | `/compress`, `/usage`, `/insights [--days N]` | `/compress`, `/usage`, `/insights [days]` |
| Browse skills | `/skills` or `/<skill-name>` | `/<skill-name>` |
| Browse skills | `/skills` or `/<skill-name>` | `/skills` or `/<skill-name>` |
| Interrupt current work | `Ctrl+C` or send a new message | `/stop` or send a new message |
| Platform-specific status | `/platforms` | `/status`, `/sethome` |
@@ -157,10 +157,14 @@ curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv venv --python 3.11
source venv/bin/activate
uv pip install -e ".[all,dev]"
scripts/run_tests.sh
python -m pytest tests/ -q
```
> **RL Training (optional):** The RL/Atropos integration (`environments/`) ships via the `atroposlib` and `tinker` dependencies pulled in by `.[all,dev]` — no submodule setup required.
> **RL Training (optional):** To work on the RL/Tinker-Atropos integration:
> ```bash
> git submodule update --init tinker-atropos
> uv pip install -e "./tinker-atropos"
> ```
---
-453
View File
@@ -1,453 +0,0 @@
# Hermes Agent v0.11.0 (v2026.4.23)
**Release Date:** April 23, 2026
**Since v0.9.0:** 1,556 commits · 761 merged PRs · 1,314 files changed · 224,174 insertions · 29 community contributors (290 including co-authors)
> The Interface release — a full React/Ink rewrite of the interactive CLI, a pluggable transport architecture underneath every provider, native AWS Bedrock support, five new inference paths, a 17th messaging platform (QQBot), a dramatically expanded plugin surface, and GPT-5.5 via Codex OAuth.
This release also folds in all the highlights deferred from v0.10.0 (which shipped only the Nous Tool Gateway) — so it covers roughly two weeks of work across the whole stack.
---
## ✨ Highlights
- **New Ink-based TUI** — `hermes --tui` is now a full React/Ink rewrite of the interactive CLI, with a Python JSON-RPC backend (`tui_gateway`). Sticky composer, live streaming with OSC-52 clipboard support, stable picker keys, status bar with per-turn stopwatch and git branch, `/clear` confirm, light-theme preset, and a subagent spawn observability overlay. ~310 commits to `ui-tui/` + `tui_gateway/`. (@OutThisLife + Teknium)
- **Transport ABC + Native AWS Bedrock** — Format conversion and HTTP transport were extracted from `run_agent.py` into a pluggable `agent/transports/` layer. `AnthropicTransport`, `ChatCompletionsTransport`, `ResponsesApiTransport`, and `BedrockTransport` each own their own format conversion and API shape. Native AWS Bedrock support via the Converse API ships on top of the new abstraction. ([#10549](https://github.com/NousResearch/hermes-agent/pull/10549), [#13347](https://github.com/NousResearch/hermes-agent/pull/13347), [#13366](https://github.com/NousResearch/hermes-agent/pull/13366), [#13430](https://github.com/NousResearch/hermes-agent/pull/13430), [#13805](https://github.com/NousResearch/hermes-agent/pull/13805), [#13814](https://github.com/NousResearch/hermes-agent/pull/13814) — @kshitijk4poor + Teknium)
- **Five new inference paths** — Native NVIDIA NIM ([#11774](https://github.com/NousResearch/hermes-agent/pull/11774)), Arcee AI ([#9276](https://github.com/NousResearch/hermes-agent/pull/9276)), Step Plan ([#13893](https://github.com/NousResearch/hermes-agent/pull/13893)), Google Gemini CLI OAuth ([#11270](https://github.com/NousResearch/hermes-agent/pull/11270)), and Vercel ai-gateway with pricing + dynamic discovery ([#13223](https://github.com/NousResearch/hermes-agent/pull/13223) — @jerilynzheng). Plus Gemini routed through the native AI Studio API for better performance ([#12674](https://github.com/NousResearch/hermes-agent/pull/12674)).
- **GPT-5.5 over Codex OAuth** — OpenAI's new GPT-5.5 reasoning model is now available through your ChatGPT Codex OAuth, with live model discovery wired into the model picker so new OpenAI releases show up without catalog updates. ([#14720](https://github.com/NousResearch/hermes-agent/pull/14720))
- **QQBot — 17th supported platform** — Native QQBot adapter via QQ Official API v2, with QR scan-to-configure setup wizard, streaming cursor, emoji reactions, and DM/group policy gating that matches WeCom/Weixin parity. ([#9364](https://github.com/NousResearch/hermes-agent/pull/9364), [#11831](https://github.com/NousResearch/hermes-agent/pull/11831))
- **Plugin surface expanded** — Plugins can now register slash commands (`register_command`), dispatch tools directly (`dispatch_tool`), block tool execution from hooks (`pre_tool_call` can veto), rewrite tool results (`transform_tool_result`), transform terminal output (`transform_terminal_output`), ship image_gen backends, and add custom dashboard tabs. The bundled disk-cleanup plugin is opt-in by default as a reference implementation. ([#9377](https://github.com/NousResearch/hermes-agent/pull/9377), [#10626](https://github.com/NousResearch/hermes-agent/pull/10626), [#10763](https://github.com/NousResearch/hermes-agent/pull/10763), [#10951](https://github.com/NousResearch/hermes-agent/pull/10951), [#12929](https://github.com/NousResearch/hermes-agent/pull/12929), [#12944](https://github.com/NousResearch/hermes-agent/pull/12944), [#12972](https://github.com/NousResearch/hermes-agent/pull/12972), [#13799](https://github.com/NousResearch/hermes-agent/pull/13799), [#14175](https://github.com/NousResearch/hermes-agent/pull/14175))
- **`/steer` — mid-run agent nudges** — `/steer <prompt>` injects a note that the running agent sees after its next tool call, without interrupting the turn or breaking prompt cache. For when you want to course-correct an agent in-flight. ([#12116](https://github.com/NousResearch/hermes-agent/pull/12116))
- **Shell hooks** — Wire any shell script as a Hermes lifecycle hook (pre_tool_call, post_tool_call, on_session_start, etc.) without writing a Python plugin. ([#13296](https://github.com/NousResearch/hermes-agent/pull/13296))
- **Webhook direct-delivery mode** — Webhook subscriptions can now forward payloads straight to a platform chat without going through the agent — zero-LLM push notifications for alerting, uptime checks, and event streams. ([#12473](https://github.com/NousResearch/hermes-agent/pull/12473))
- **Smarter delegation** — Subagents now have an explicit `orchestrator` role that can spawn their own workers, with configurable `max_spawn_depth` (default flat). Concurrent sibling subagents share filesystem state through a file-coordination layer so they don't clobber each other's edits. ([#13691](https://github.com/NousResearch/hermes-agent/pull/13691), [#13718](https://github.com/NousResearch/hermes-agent/pull/13718))
- **Auxiliary models — configurable UI + main-model-first** — `hermes model` has a dedicated "Configure auxiliary models" screen for per-task overrides (compression, vision, session_search, title_generation). `auto` routing now defaults to the main model for side tasks across all users (previously aggregator users were silently routed to a cheap provider-side default). ([#11891](https://github.com/NousResearch/hermes-agent/pull/11891), [#11900](https://github.com/NousResearch/hermes-agent/pull/11900))
- **Dashboard plugin system + live theme switching** — The web dashboard is now extensible. Third-party plugins can add custom tabs, widgets, and views without forking. Paired with a live-switching theme system — themes now control colors, fonts, layout, and density — so users can hot-swap the dashboard look without a reload. Same theming discipline the CLI has, now on the web. ([#10951](https://github.com/NousResearch/hermes-agent/pull/10951), [#10687](https://github.com/NousResearch/hermes-agent/pull/10687), [#14725](https://github.com/NousResearch/hermes-agent/pull/14725))
- **Dashboard polish** — i18n (English + Chinese), react-router sidebar layout, mobile-responsive, Vercel deployment, real per-session API call tracking, and one-click update + gateway restart buttons. ([#9228](https://github.com/NousResearch/hermes-agent/pull/9228), [#9370](https://github.com/NousResearch/hermes-agent/pull/9370), [#9453](https://github.com/NousResearch/hermes-agent/pull/9453), [#10686](https://github.com/NousResearch/hermes-agent/pull/10686), [#13526](https://github.com/NousResearch/hermes-agent/pull/13526), [#14004](https://github.com/NousResearch/hermes-agent/pull/14004) — @austinpickett + @DeployFaith + Teknium)
---
## 🏗️ Core Agent & Architecture
### Transport Layer (NEW)
- **Transport ABC** abstracts format conversion and HTTP transport from `run_agent.py` into `agent/transports/` ([#13347](https://github.com/NousResearch/hermes-agent/pull/13347))
- **AnthropicTransport** — Anthropic Messages API path ([#13366](https://github.com/NousResearch/hermes-agent/pull/13366), @kshitijk4poor)
- **ChatCompletionsTransport** — default path for OpenAI-compatible providers ([#13805](https://github.com/NousResearch/hermes-agent/pull/13805))
- **ResponsesApiTransport** — OpenAI Responses API + Codex build_kwargs wiring ([#13430](https://github.com/NousResearch/hermes-agent/pull/13430), @kshitijk4poor)
- **BedrockTransport** — AWS Bedrock Converse API transport ([#13814](https://github.com/NousResearch/hermes-agent/pull/13814))
### Provider & Model Support
- **Native AWS Bedrock provider** via Converse API ([#10549](https://github.com/NousResearch/hermes-agent/pull/10549))
- **NVIDIA NIM native provider** (salvage of #11703) ([#11774](https://github.com/NousResearch/hermes-agent/pull/11774))
- **Arcee AI direct provider** ([#9276](https://github.com/NousResearch/hermes-agent/pull/9276))
- **Step Plan provider** (salvage #6005) ([#13893](https://github.com/NousResearch/hermes-agent/pull/13893), @kshitijk4poor)
- **Google Gemini CLI OAuth** inference provider ([#11270](https://github.com/NousResearch/hermes-agent/pull/11270))
- **Vercel ai-gateway** with pricing, attribution, and dynamic discovery ([#13223](https://github.com/NousResearch/hermes-agent/pull/13223), @jerilynzheng)
- **GPT-5.5 over Codex OAuth** with live model discovery in the picker ([#14720](https://github.com/NousResearch/hermes-agent/pull/14720))
- **Gemini routed through native AI Studio API** ([#12674](https://github.com/NousResearch/hermes-agent/pull/12674))
- **xAI Grok upgraded to Responses API** ([#10783](https://github.com/NousResearch/hermes-agent/pull/10783))
- **Ollama improvements** — Cloud provider support, GLM continuation, `think=false` control, surrogate sanitization, `/v1` hint ([#10782](https://github.com/NousResearch/hermes-agent/pull/10782))
- **Kimi K2.6** across OpenRouter, Nous Portal, native Kimi, and HuggingFace ([#13148](https://github.com/NousResearch/hermes-agent/pull/13148), [#13152](https://github.com/NousResearch/hermes-agent/pull/13152), [#13169](https://github.com/NousResearch/hermes-agent/pull/13169))
- **Kimi K2.5** promoted to first position in all model suggestion lists ([#11745](https://github.com/NousResearch/hermes-agent/pull/11745), @kshitijk4poor)
- **Xiaomi MiMo v2.5-pro + v2.5** on OpenRouter, Nous Portal, and native ([#14184](https://github.com/NousResearch/hermes-agent/pull/14184), [#14635](https://github.com/NousResearch/hermes-agent/pull/14635), @kshitijk4poor)
- **GLM-5V-Turbo** for coding plan ([#9907](https://github.com/NousResearch/hermes-agent/pull/9907))
- **Claude Opus 4.7** in Nous Portal catalog ([#11398](https://github.com/NousResearch/hermes-agent/pull/11398))
- **OpenRouter elephant-alpha** in curated lists ([#9378](https://github.com/NousResearch/hermes-agent/pull/9378))
- **OpenCode-Go** — Kimi K2.6 and Qwen3.5/3.6 Plus in curated catalog ([#13429](https://github.com/NousResearch/hermes-agent/pull/13429))
- **minimax/minimax-m2.5:free** in OpenRouter catalog ([#13836](https://github.com/NousResearch/hermes-agent/pull/13836))
- **`/model` merges models.dev entries** for lesser-loved providers ([#14221](https://github.com/NousResearch/hermes-agent/pull/14221))
- **Per-provider + per-model `request_timeout_seconds`** config ([#12652](https://github.com/NousResearch/hermes-agent/pull/12652))
- **Configurable API retry count** via `agent.api_max_retries` ([#14730](https://github.com/NousResearch/hermes-agent/pull/14730))
- **ctx_size context length key** for Lemonade server (salvage #8536) ([#14215](https://github.com/NousResearch/hermes-agent/pull/14215))
- **Custom provider display name prompt** ([#9420](https://github.com/NousResearch/hermes-agent/pull/9420))
- **Recommendation badges** on tool provider selection ([#9929](https://github.com/NousResearch/hermes-agent/pull/9929))
- Fix: correct GPT-5 family context lengths in fallback defaults ([#9309](https://github.com/NousResearch/hermes-agent/pull/9309))
- Fix: clamp `minimal` reasoning effort to `low` on Responses API ([#9429](https://github.com/NousResearch/hermes-agent/pull/9429))
- Fix: strip reasoning item IDs from Responses API input when `store=False` ([#10217](https://github.com/NousResearch/hermes-agent/pull/10217))
- Fix: OpenViking correct account default + commit session on `/new` and compress ([#10463](https://github.com/NousResearch/hermes-agent/pull/10463))
- Fix: Kimi `/coding` thinking block survival + empty reasoning_content + block ordering (multiple PRs)
- Fix: don't send Anthropic thinking to api.kimi.com/coding ([#13826](https://github.com/NousResearch/hermes-agent/pull/13826))
- Fix: send `max_tokens`, `reasoning_effort`, and `thinking` for Kimi/Moonshot
- Fix: stream reasoning content through OpenAI-compatible providers that emit it
### Agent Loop & Conversation
- **`/steer <prompt>`** — mid-run agent nudges after next tool call ([#12116](https://github.com/NousResearch/hermes-agent/pull/12116))
- **Orchestrator role + configurable spawn depth** for `delegate_task` (default flat) ([#13691](https://github.com/NousResearch/hermes-agent/pull/13691))
- **Cross-agent file state coordination** for concurrent subagents ([#13718](https://github.com/NousResearch/hermes-agent/pull/13718))
- **Compressor smart collapse, dedup, anti-thrashing**, template upgrade, hardening ([#10088](https://github.com/NousResearch/hermes-agent/pull/10088))
- **Compression summaries respect the conversation's language** ([#12556](https://github.com/NousResearch/hermes-agent/pull/12556))
- **Compression model falls back to main model** on permanent 503/404 ([#10093](https://github.com/NousResearch/hermes-agent/pull/10093))
- **Auto-continue interrupted agent work** after gateway restart ([#9934](https://github.com/NousResearch/hermes-agent/pull/9934))
- **Activity heartbeats** prevent false gateway inactivity timeouts ([#10501](https://github.com/NousResearch/hermes-agent/pull/10501))
- **Auxiliary models UI** — dedicated screen for per-task overrides ([#11891](https://github.com/NousResearch/hermes-agent/pull/11891))
- **Auxiliary auto routing defaults to main model** for all users ([#11900](https://github.com/NousResearch/hermes-agent/pull/11900))
- **PLATFORM_HINTS for Matrix, Mattermost, Feishu** ([#14428](https://github.com/NousResearch/hermes-agent/pull/14428), @alt-glitch)
- Fix: reset retry counters after compression; stop poisoning conversation history ([#10055](https://github.com/NousResearch/hermes-agent/pull/10055))
- Fix: break compression-exhaustion infinite loop and auto-reset session ([#10063](https://github.com/NousResearch/hermes-agent/pull/10063))
- Fix: stale agent timeout, uv venv detection, empty response after tools ([#10065](https://github.com/NousResearch/hermes-agent/pull/10065))
- Fix: prevent premature loop exit when weak models return empty after substantive tool calls ([#10472](https://github.com/NousResearch/hermes-agent/pull/10472))
- Fix: preserve pre-start terminal interrupts ([#10504](https://github.com/NousResearch/hermes-agent/pull/10504))
- Fix: improve interrupt responsiveness during concurrent tool execution ([#10935](https://github.com/NousResearch/hermes-agent/pull/10935))
- Fix: word-wrap spinner, interruptable agent join, and delegate_task interrupt ([#10940](https://github.com/NousResearch/hermes-agent/pull/10940))
- Fix: `/stop` no longer resets the session ([#9224](https://github.com/NousResearch/hermes-agent/pull/9224))
- Fix: honor interrupts during MCP tool waits ([#9382](https://github.com/NousResearch/hermes-agent/pull/9382), @helix4u)
- Fix: break stuck session resume loops after repeated restarts ([#9941](https://github.com/NousResearch/hermes-agent/pull/9941))
- Fix: empty response nudge crash + placeholder leak to cron targets ([#11021](https://github.com/NousResearch/hermes-agent/pull/11021))
- Fix: streaming cursor sanitization to prevent message truncation (multiple PRs)
- Fix: resolve `context_length` for plugin context engines ([#9238](https://github.com/NousResearch/hermes-agent/pull/9238))
### Session & Memory
- **Auto-prune old sessions + VACUUM state.db** at startup ([#13861](https://github.com/NousResearch/hermes-agent/pull/13861))
- **Honcho overhaul** — context injection, 5-tool surface, cost safety, session isolation ([#10619](https://github.com/NousResearch/hermes-agent/pull/10619))
- **Hindsight richer session-scoped retain metadata** (salvage of #6290) ([#13987](https://github.com/NousResearch/hermes-agent/pull/13987))
- Fix: deduplicate memory provider tools to prevent 400 on strict providers ([#10511](https://github.com/NousResearch/hermes-agent/pull/10511))
- Fix: discover user-installed memory providers from `$HERMES_HOME/plugins/` ([#10529](https://github.com/NousResearch/hermes-agent/pull/10529))
- Fix: add `on_memory_write` bridge to sequential tool execution path ([#10507](https://github.com/NousResearch/hermes-agent/pull/10507))
- Fix: preserve `session_id` across `previous_response_id` chains in `/v1/responses` ([#10059](https://github.com/NousResearch/hermes-agent/pull/10059))
---
## 🖥️ New Ink-based TUI
A full React/Ink rewrite of the interactive CLI — invoked via `hermes --tui` or `HERMES_TUI=1`. Shipped across ~310 commits to `ui-tui/` and `tui_gateway/`.
### TUI Foundations
- New TUI based on Ink + Python JSON-RPC backend
- Prettier + ESLint + vitest tooling for `ui-tui/`
- Entry split between `src/entry.tsx` (TTY gate) and `src/app.tsx` (state machine)
- Persistent `_SlashWorker` subprocess for slash command dispatch
### UX & Features
- **Stable picker keys, /clear confirm, light-theme preset** ([#12312](https://github.com/NousResearch/hermes-agent/pull/12312), @OutThisLife)
- **Git branch in status bar** cwd label ([#12305](https://github.com/NousResearch/hermes-agent/pull/12305), @OutThisLife)
- **Per-turn elapsed stopwatch in FaceTicker + done-in sys line** ([#13105](https://github.com/NousResearch/hermes-agent/pull/13105), @OutThisLife)
- **Subagent spawn observability overlay** ([#14045](https://github.com/NousResearch/hermes-agent/pull/14045), @OutThisLife)
- **Per-prompt elapsed stopwatch in status bar** ([#12948](https://github.com/NousResearch/hermes-agent/pull/12948))
- Sticky composer that freezes during scroll
- OSC-52 clipboard support for copy across SSH sessions
- Virtualized history rendering for performance
- Slash command autocomplete via `complete.slash` RPC
- Path autocomplete via `complete.path` RPC
- Dozens of resize/ghosting/sticky-prompt fixes landed through the week
### Structural Refactors
- Decomposed `app.tsx` into `app/event-handler`, `app/slash-handler`, `app/stores`, `app/hooks` ([#14640](https://github.com/NousResearch/hermes-agent/pull/14640) and surrounding)
- Component split: `branding.tsx`, `markdown.tsx`, `prompts.tsx`, `sessionPicker.tsx`, `messageLine.tsx`, `thinking.tsx`, `maskedPrompt.tsx`
- Hook split: `useCompletion`, `useInputHistory`, `useQueue`, `useVirtualHistory`
---
## 📱 Messaging Platforms (Gateway)
### New Platforms
- **QQBot (17th platform)** — QQ Official API v2 adapter with QR setup, streaming, package split ([#9364](https://github.com/NousResearch/hermes-agent/pull/9364), [#11831](https://github.com/NousResearch/hermes-agent/pull/11831))
### Telegram
- **Dedicated `TELEGRAM_PROXY` env var + config.yaml proxy support** (closes #9414, #6530, #9074, #7786) ([#10681](https://github.com/NousResearch/hermes-agent/pull/10681))
- **`ignored_threads` config** for Telegram groups ([#9530](https://github.com/NousResearch/hermes-agent/pull/9530))
- **Config option to disable link previews** (closes #8728) ([#10610](https://github.com/NousResearch/hermes-agent/pull/10610))
- **Auto-wrap markdown tables** in code blocks ([#11794](https://github.com/NousResearch/hermes-agent/pull/11794))
- Fix: prevent duplicate replies when stream task is cancelled ([#9319](https://github.com/NousResearch/hermes-agent/pull/9319))
- Fix: prevent streaming cursor (▉) from appearing as standalone messages ([#9538](https://github.com/NousResearch/hermes-agent/pull/9538))
- Fix: retry transient tool sends + cold-boot budget ([#10947](https://github.com/NousResearch/hermes-agent/pull/10947))
- Fix: Markdown special char escaping in `send_exec_approval`
- Fix: parentheses in URLs during MarkdownV2 link conversion
- Fix: Unicode dash normalization in model switch (closes iOS smart-punctuation issue)
- Many platform hint / streaming / session-key fixes
### Discord
- **Forum channel support** (salvage of #10145 + media + polish) ([#11920](https://github.com/NousResearch/hermes-agent/pull/11920))
- **`DISCORD_ALLOWED_ROLES`** for role-based access control ([#11608](https://github.com/NousResearch/hermes-agent/pull/11608))
- **Config option to disable slash commands** (salvage #13130) ([#14315](https://github.com/NousResearch/hermes-agent/pull/14315))
- **Native `send_animation`** for inline GIF playback ([#10283](https://github.com/NousResearch/hermes-agent/pull/10283))
- **`send_message` Discord media attachments** ([#10246](https://github.com/NousResearch/hermes-agent/pull/10246))
- **`/skill` command group** with category subcommands ([#9909](https://github.com/NousResearch/hermes-agent/pull/9909))
- **Extract reply text from message references** ([#9781](https://github.com/NousResearch/hermes-agent/pull/9781))
### Feishu
- **Intelligent reply on document comments** with 3-tier access control ([#11898](https://github.com/NousResearch/hermes-agent/pull/11898))
- **Show processing state via reactions** on user messages ([#12927](https://github.com/NousResearch/hermes-agent/pull/12927))
- **Preserve @mention context for agent consumption** (salvage #13874) ([#14167](https://github.com/NousResearch/hermes-agent/pull/14167))
### DingTalk
- **`require_mention` + `allowed_users` gating** (parity with Slack/Telegram/Discord) ([#11564](https://github.com/NousResearch/hermes-agent/pull/11564))
- **QR-code device-flow authorization** for setup wizard ([#11574](https://github.com/NousResearch/hermes-agent/pull/11574))
- **AI Cards streaming, emoji reactions, and media handling** (salvage of #10985) ([#11910](https://github.com/NousResearch/hermes-agent/pull/11910))
### WhatsApp
- **`send_voice`** — native audio message delivery ([#13002](https://github.com/NousResearch/hermes-agent/pull/13002))
- **`dm_policy` and `group_policy`** parity with WeCom/Weixin/QQ adapters ([#13151](https://github.com/NousResearch/hermes-agent/pull/13151))
### WeCom / Weixin
- **WeCom QR-scan bot creation + interactive setup wizard** (salvage #13923) ([#13961](https://github.com/NousResearch/hermes-agent/pull/13961))
### Signal
- **Media delivery support** via `send_message` ([#13178](https://github.com/NousResearch/hermes-agent/pull/13178))
### Slack
- **Per-thread sessions for DMs by default** ([#10987](https://github.com/NousResearch/hermes-agent/pull/10987))
### BlueBubbles (iMessage)
- Group chat session separation, webhook registration & auth fixes ([#9806](https://github.com/NousResearch/hermes-agent/pull/9806))
### Gateway Core
- **Gateway proxy mode** — forward messages to a remote API server ([#9787](https://github.com/NousResearch/hermes-agent/pull/9787))
- **Per-channel ephemeral prompts** (Discord, Telegram, Slack, Mattermost) ([#10564](https://github.com/NousResearch/hermes-agent/pull/10564))
- **Surface plugin slash commands** natively on all platforms + decision-capable command hook ([#14175](https://github.com/NousResearch/hermes-agent/pull/14175))
- **Support document/archive extensions in MEDIA: tag extraction** (salvage #8255) ([#14307](https://github.com/NousResearch/hermes-agent/pull/14307))
- **Recognize `.pdf` in MEDIA: tag extraction** ([#13683](https://github.com/NousResearch/hermes-agent/pull/13683))
- **`--all` flag for `gateway start` and `restart`** ([#10043](https://github.com/NousResearch/hermes-agent/pull/10043))
- **Notify active sessions on gateway shutdown** + update health check ([#9850](https://github.com/NousResearch/hermes-agent/pull/9850))
- **Block agent from self-destructing the gateway** via terminal (closes #6666) ([#9895](https://github.com/NousResearch/hermes-agent/pull/9895))
- Fix: suppress duplicate replies on interrupt and streaming flood control ([#10235](https://github.com/NousResearch/hermes-agent/pull/10235))
- Fix: close temporary agents after one-off tasks ([#11028](https://github.com/NousResearch/hermes-agent/pull/11028), @kshitijk4poor)
- Fix: busy-session ack when user messages during active agent run ([#10068](https://github.com/NousResearch/hermes-agent/pull/10068))
- Fix: route watch-pattern notifications to the originating session ([#10460](https://github.com/NousResearch/hermes-agent/pull/10460))
- Fix: preserve notify context in executor threads ([#10921](https://github.com/NousResearch/hermes-agent/pull/10921), @kshitijk4poor)
- Fix: avoid duplicate replies after interrupted long tasks ([#11018](https://github.com/NousResearch/hermes-agent/pull/11018))
- Fix: unlink stale PID + lock files on cleanup
- Fix: force-unlink stale PID file after `--replace` takeover
---
## 🔧 Tool System
### Plugin Surface (major expansion)
- **`register_command()`** — plugins can now add slash commands ([#10626](https://github.com/NousResearch/hermes-agent/pull/10626))
- **`dispatch_tool()`** — plugins can invoke tools from their code ([#10763](https://github.com/NousResearch/hermes-agent/pull/10763))
- **`pre_tool_call` blocking** — plugins can veto tool execution ([#9377](https://github.com/NousResearch/hermes-agent/pull/9377))
- **`transform_tool_result`** — plugins rewrite tool results generically ([#12972](https://github.com/NousResearch/hermes-agent/pull/12972))
- **`transform_terminal_output`** — plugins rewrite terminal tool output ([#12929](https://github.com/NousResearch/hermes-agent/pull/12929))
- **Namespaced skill registration** for plugin skill bundles ([#9786](https://github.com/NousResearch/hermes-agent/pull/9786))
- **Opt-in-by-default + bundled disk-cleanup plugin** (salvage #12212) ([#12944](https://github.com/NousResearch/hermes-agent/pull/12944))
- **Pluggable `image_gen` backends + OpenAI provider** ([#13799](https://github.com/NousResearch/hermes-agent/pull/13799))
- **`openai-codex` image_gen plugin** (gpt-image-2 via Codex OAuth) ([#14317](https://github.com/NousResearch/hermes-agent/pull/14317))
- **Shell hooks** — wire shell scripts as hook callbacks ([#13296](https://github.com/NousResearch/hermes-agent/pull/13296))
### Browser
- **`browser_cdp` raw DevTools Protocol passthrough** ([#12369](https://github.com/NousResearch/hermes-agent/pull/12369))
- Camofox hardening + connection stability across the window
### Execute Code
- **Project/strict execution modes** (default: project) ([#11971](https://github.com/NousResearch/hermes-agent/pull/11971))
### Image Generation
- **Multi-model FAL support** with picker in `hermes tools` ([#11265](https://github.com/NousResearch/hermes-agent/pull/11265))
- **Recraft V3 → V4 Pro, Nano Banana → Pro upgrades** ([#11406](https://github.com/NousResearch/hermes-agent/pull/11406))
- **GPT Image 2** in FAL catalog ([#13677](https://github.com/NousResearch/hermes-agent/pull/13677))
- **xAI image generation provider** (grok-imagine-image) ([#14765](https://github.com/NousResearch/hermes-agent/pull/14765))
### TTS / STT / Voice
- **Google Gemini TTS provider** ([#11229](https://github.com/NousResearch/hermes-agent/pull/11229))
- **xAI Grok STT provider** ([#14473](https://github.com/NousResearch/hermes-agent/pull/14473))
- **xAI TTS** (shipped with Responses API upgrade) ([#10783](https://github.com/NousResearch/hermes-agent/pull/10783))
- **KittenTTS local provider** (salvage of #2109) ([#13395](https://github.com/NousResearch/hermes-agent/pull/13395))
- **CLI record beep toggle** ([#13247](https://github.com/NousResearch/hermes-agent/pull/13247), @helix4u)
### Webhook / Cron
- **Webhook direct-delivery mode** — zero-LLM push notifications ([#12473](https://github.com/NousResearch/hermes-agent/pull/12473))
- **Cron `wakeAgent` gate** — scripts can skip the agent entirely ([#12373](https://github.com/NousResearch/hermes-agent/pull/12373))
- **Cron per-job `enabled_toolsets`** — cap token overhead + cost per job ([#14767](https://github.com/NousResearch/hermes-agent/pull/14767))
### Delegate
- **Orchestrator role** + configurable spawn depth (default flat) ([#13691](https://github.com/NousResearch/hermes-agent/pull/13691))
- **Cross-agent file state coordination** ([#13718](https://github.com/NousResearch/hermes-agent/pull/13718))
### File / Patch
- **`patch` — "did you mean?" feedback** when patch fails to match ([#13435](https://github.com/NousResearch/hermes-agent/pull/13435))
### API Server
- **Stream `/v1/responses` SSE tool events** (salvage #9779) ([#10049](https://github.com/NousResearch/hermes-agent/pull/10049))
- **Inline image inputs** on `/v1/chat/completions` and `/v1/responses` ([#12969](https://github.com/NousResearch/hermes-agent/pull/12969))
### Docker / Podman
- **Entry-level Podman support** — `find_docker()` + rootless entrypoint ([#10066](https://github.com/NousResearch/hermes-agent/pull/10066))
- **Add docker-cli to Docker image** (salvage #10096) ([#14232](https://github.com/NousResearch/hermes-agent/pull/14232))
- **File-sync back to host on teardown** (salvage of #8189 + hardening) ([#11291](https://github.com/NousResearch/hermes-agent/pull/11291))
### MCP
- 12 MCP improvements across the window (status, timeout handling, tool-call forwarding, etc.)
---
## 🧩 Skills Ecosystem
### Skill System
- **Namespaced skill registration** for plugin bundles ([#9786](https://github.com/NousResearch/hermes-agent/pull/9786))
- **`hermes skills reset`** to un-stick bundled skills ([#11468](https://github.com/NousResearch/hermes-agent/pull/11468))
- **Skills guard opt-in** — `config.skills.guard_agent_created` (default off) ([#14557](https://github.com/NousResearch/hermes-agent/pull/14557))
- **Bundled skill scripts runnable out of the box** ([#13384](https://github.com/NousResearch/hermes-agent/pull/13384))
- **`xitter` replaced with `xurl`** — the official X API CLI ([#12303](https://github.com/NousResearch/hermes-agent/pull/12303))
- **MiniMax-AI/cli as default skill tap** (salvage #7501) ([#14493](https://github.com/NousResearch/hermes-agent/pull/14493))
- **Fuzzy `@` file completions + mtime sorting** ([#9467](https://github.com/NousResearch/hermes-agent/pull/9467))
### New Skills
- **concept-diagrams** (salvage of #11045, @v1k22) ([#11363](https://github.com/NousResearch/hermes-agent/pull/11363))
- **architecture-diagram** (Cocoon AI port) ([#9906](https://github.com/NousResearch/hermes-agent/pull/9906))
- **pixel-art** with hardware palettes and video animation ([#12663](https://github.com/NousResearch/hermes-agent/pull/12663), [#12725](https://github.com/NousResearch/hermes-agent/pull/12725))
- **baoyu-comic** ([#13257](https://github.com/NousResearch/hermes-agent/pull/13257), @JimLiu)
- **baoyu-infographic** — 21 layouts × 21 styles (salvage #9901) ([#12254](https://github.com/NousResearch/hermes-agent/pull/12254))
- **page-agent** — embed Alibaba's in-page GUI agent in your webapp ([#13976](https://github.com/NousResearch/hermes-agent/pull/13976))
- **fitness-nutrition** optional skill + optional env var support ([#9355](https://github.com/NousResearch/hermes-agent/pull/9355))
- **drug-discovery** — ChEMBL, PubChem, OpenFDA, ADMET ([#9443](https://github.com/NousResearch/hermes-agent/pull/9443))
- **touchdesigner-mcp** (salvage of #10081) ([#12298](https://github.com/NousResearch/hermes-agent/pull/12298))
- **adversarial-ux-test** optional skill (salvage of #2494, @omnissiah-comelse) ([#13425](https://github.com/NousResearch/hermes-agent/pull/13425))
- **maps** — added `guest_house`, `camp_site`, and dual-key bakery lookup ([#13398](https://github.com/NousResearch/hermes-agent/pull/13398))
- **llm-wiki** — port provenance markers, source hashing, and quality signals ([#13700](https://github.com/NousResearch/hermes-agent/pull/13700))
---
## 📊 Web Dashboard
- **i18n (English + Chinese) language switcher** ([#9453](https://github.com/NousResearch/hermes-agent/pull/9453))
- **Live-switching theme system** ([#10687](https://github.com/NousResearch/hermes-agent/pull/10687))
- **Dashboard plugin system** — extend the web UI with custom tabs ([#10951](https://github.com/NousResearch/hermes-agent/pull/10951))
- **react-router, sidebar layout, sticky header, dropdown component** ([#9370](https://github.com/NousResearch/hermes-agent/pull/9370), @austinpickett)
- **Responsive for mobile** ([#9228](https://github.com/NousResearch/hermes-agent/pull/9228), @DeployFaith)
- **Vercel deployment** ([#10686](https://github.com/NousResearch/hermes-agent/pull/10686), [#11061](https://github.com/NousResearch/hermes-agent/pull/11061), @austinpickett)
- **Context window config support** ([#9357](https://github.com/NousResearch/hermes-agent/pull/9357))
- **HTTP health probe for cross-container gateway detection** ([#9894](https://github.com/NousResearch/hermes-agent/pull/9894))
- **Update + restart gateway buttons** ([#13526](https://github.com/NousResearch/hermes-agent/pull/13526), @austinpickett)
- **Real API call count per session** (salvages #10140) ([#14004](https://github.com/NousResearch/hermes-agent/pull/14004))
---
## 🖱️ CLI & User Experience
- **Dynamic shell completion for bash, zsh, and fish** ([#9785](https://github.com/NousResearch/hermes-agent/pull/9785))
- **Light-mode skins + skin-aware completion menus** ([#9461](https://github.com/NousResearch/hermes-agent/pull/9461))
- **Numbered keyboard shortcuts** on approval and clarify prompts ([#13416](https://github.com/NousResearch/hermes-agent/pull/13416))
- **Markdown stripping, compact multiline previews, external editor** ([#12934](https://github.com/NousResearch/hermes-agent/pull/12934))
- **`--ignore-user-config` and `--ignore-rules` flags** (port codex#18646) ([#14277](https://github.com/NousResearch/hermes-agent/pull/14277))
- **Account limits section in `/usage`** ([#13428](https://github.com/NousResearch/hermes-agent/pull/13428))
- **Doctor: Command Installation check** for `hermes` bin symlink ([#10112](https://github.com/NousResearch/hermes-agent/pull/10112))
- **ESC cancels secret/sudo prompts**, clearer skip messaging ([#9902](https://github.com/NousResearch/hermes-agent/pull/9902))
- Fix: agent-facing text uses `display_hermes_home()` instead of hardcoded `~/.hermes` ([#10285](https://github.com/NousResearch/hermes-agent/pull/10285))
- Fix: enforce `config.yaml` as sole CWD source + deprecate `.env` CWD vars + add `hermes memory reset` ([#11029](https://github.com/NousResearch/hermes-agent/pull/11029))
---
## 🔒 Security & Reliability
- **Global toggle to allow private/internal URL resolution** ([#14166](https://github.com/NousResearch/hermes-agent/pull/14166))
- **Block agent from self-destructing the gateway** via terminal (closes #6666) ([#9895](https://github.com/NousResearch/hermes-agent/pull/9895))
- **Telegram callback authorization** on update prompts ([#10536](https://github.com/NousResearch/hermes-agent/pull/10536))
- **SECURITY.md** added ([#10532](https://github.com/NousResearch/hermes-agent/pull/10532), @I3eg1nner)
- **Warn about legacy hermes.service units** during `hermes update` ([#11918](https://github.com/NousResearch/hermes-agent/pull/11918))
- **Complete ASCII-locale UnicodeEncodeError recovery** for `api_messages`/`reasoning_content` (closes #6843) ([#10537](https://github.com/NousResearch/hermes-agent/pull/10537))
- **Prevent stale `os.environ` leak** after `clear_session_vars` ([#10527](https://github.com/NousResearch/hermes-agent/pull/10527))
- **Prevent agent hang when backgrounding processes** via terminal tool ([#10584](https://github.com/NousResearch/hermes-agent/pull/10584))
- Many smaller session-resume, interrupt, streaming, and memory-race fixes throughout the window
---
## 🐛 Notable Bug Fixes
The `fix:` category in this window covers 482 PRs. Highlights:
- Streaming cursor artifacts filtered from Matrix, Telegram, WhatsApp, Discord (multiple PRs)
- `<think>` and `<thought>` blocks filtered from gateway stream consumers ([#9408](https://github.com/NousResearch/hermes-agent/pull/9408))
- Gateway display.streaming root-config override regression ([#9799](https://github.com/NousResearch/hermes-agent/pull/9799))
- Context `session_search` coerces limit to int (prevents TypeError) ([#10522](https://github.com/NousResearch/hermes-agent/pull/10522))
- Memory tool stays available when `fcntl` is unavailable (Windows) ([#9783](https://github.com/NousResearch/hermes-agent/pull/9783))
- Trajectory compressor credentials load from `HERMES_HOME/.env` ([#9632](https://github.com/NousResearch/hermes-agent/pull/9632), @Dusk1e)
- `@_context_completions` no longer crashes on `@` mention ([#9683](https://github.com/NousResearch/hermes-agent/pull/9683), @kshitijk4poor)
- Group session `user_id` no longer treated as `thread_id` in shutdown notifications ([#10546](https://github.com/NousResearch/hermes-agent/pull/10546))
- Telegram `platform_hint` — markdown is supported (closes #8261) ([#10612](https://github.com/NousResearch/hermes-agent/pull/10612))
- Doctor checks for Kimi China credentials fixed
- Streaming: don't suppress final response when commentary message is sent ([#10540](https://github.com/NousResearch/hermes-agent/pull/10540))
- Rapid Telegram follow-ups no longer get cut off
---
## 🧪 Testing & CI
- **Contributor attribution CI check** on PRs ([#9376](https://github.com/NousResearch/hermes-agent/pull/9376))
- Hermetic test parity (`scripts/run_tests.sh`) held across this window
- Test count stabilized post-Transport refactor; CI matrix held green through the transport rollout
---
## 📚 Documentation
- Atropos + wandb links in user guide
- ACP / VS Code / Zed / JetBrains integration docs refresh
- Webhook subscription docs updated for direct-delivery mode
- Plugin author guide expanded for new hooks (`register_command`, `dispatch_tool`, `transform_tool_result`)
- Transport layer developer guide added
- Website removed Discussions link from README
---
## 👥 Contributors
### Core
- **@teknium1** (Teknium)
### Top Community Contributors (by merged PR count)
- **@kshitijk4poor** — 49 PRs · Transport refactor (AnthropicTransport, ResponsesApiTransport), Step Plan provider, Xiaomi MiMo v2.5 support, numerous gateway fixes, promoted Kimi K2.5, @ mention crash fix
- **@OutThisLife** (Brooklyn) — 31 PRs · TUI polish, git branch in status bar, per-turn stopwatch, stable picker keys, `/clear` confirm, light-theme preset, subagent spawn observability overlay
- **@helix4u** — 11 PRs · Voice CLI record beep, MCP tool interrupt handling, assorted stability fixes
- **@austinpickett** — 8 PRs · Dashboard react-router + sidebar + sticky header + dropdown, Vercel deployment, update + restart buttons
- **@alt-glitch** — 8 PRs · PLATFORM_HINTS for Matrix/Mattermost/Feishu, Matrix fixes
- **@ethernet8023** — 3 PRs
- **@benbarclay** — 3 PRs
- **@Aslaaen** — 2 PRs
### Also contributing
@jerilynzheng (ai-gateway pricing), @JimLiu (baoyu-comic skill), @Dusk1e (trajectory compressor credentials), @DeployFaith (mobile-responsive dashboard), @LeonSGP43, @v1k22 (concept-diagrams), @omnissiah-comelse (adversarial-ux-test), @coekfung (Telegram MarkdownV2 expandable blockquotes), @liftaris (TUI provider resolution), @arihantsethia (skill analytics dashboard), @topcheer + @xing8star (QQBot foundation), @kovyrin, @I3eg1nner (SECURITY.md), @PeterBerthelsen, @lengxii, @priveperfumes, @sjz-ks, @cuyua9, @Disaster-Terminator, @leozeli, @LehaoLin, @trevthefoolish, @loongfay, @MrNiceRicee, @WideLee, @bluefishs, @malaiwah, @bobashopcashier, @dsocolobsky, @iamagenius00, @IAvecilla, @aniruddhaadak80, @Es1la, @asheriif, @walli, @jquesnelle (original Tool Gateway work).
### All Contributors (alphabetical)
@0xyg3n, @10ishq, @A-afflatus, @Abnertheforeman, @admin28980, @adybag14-cyber, @akhater, @alexzhu0,
@AllardQuek, @alt-glitch, @aniruddhaadak80, @anna-oake, @anniesurla, @anthhub, @areu01or00, @arihantsethia,
@arthurbr11, @asheriif, @Aslaaen, @Asunfly, @austinpickett, @AviArora02-commits, @AxDSan, @azhengbot, @Bartok9,
@benbarclay, @bennytimz, @bernylinville, @bingo906, @binhnt92, @bkadish, @bluefishs, @bobashopcashier,
@brantzh6, @BrennerSpear, @brianclemens, @briandevans, @brooklynnicholson, @bugkill3r, @buray, @burtenshaw,
@cdanis, @cgarwood82, @ChimingLiu, @chongweiliu, @christopherwoodall, @coekfung, @cola-runner, @corazzione,
@counterposition, @cresslank, @cuyua9, @cypres0099, @danieldoderlein, @davetist, @davidvv, @DeployFaith,
@Dev-Mriganka, @devorun, @dieutx, @Disaster-Terminator, @dodo-reach, @draix, @DrStrangerUJN, @dsocolobsky,
@Dusk1e, @dyxushuai, @elkimek, @elmatadorgh, @emozilla, @entropidelic, @Erosika, @erosika, @Es1la, @etcircle,
@etherman-os, @ethernet8023, @fancydirty, @farion1231, @fatinghenji, @Fatty911, @fengtianyu88, @Feranmi10,
@flobo3, @francip, @fuleinist, @g-guthrie, @GenKoKo, @gianfrancopiana, @gnanam1990, @GuyCui, @haileymarshall,
@haimu0x, @handsdiff, @hansnow, @hedgeho9X, @helix4u, @hengm3467, @HenkDz, @heykb, @hharry11, @HiddenPuppy,
@honghua, @houko, @houziershi, @hsy5571616, @huangke19, @hxp-plus, @Hypn0sis, @I3eg1nner, @iacker,
@iamagenius00, @IAvecilla, @iborazzi, @Ifkellx, @ifrederico, @imink, @isaachuangGMICLOUD, @ismell0992-afk,
@j0sephz, @Jaaneek, @jackjin1997, @JackTheGit, @jaffarkeikei, @jerilynzheng, @JiaDe-Wu, @Jiawen-lee, @JimLiu,
@jinzheng8115, @jneeee, @jplew, @jquesnelle, @Julientalbot, @Junass1, @jvcl, @kagura-agent, @keifergu,
@kevinskysunny, @keyuyuan, @konsisumer, @kovyrin, @kshitijk4poor, @leeyang1990, @LehaoLin, @lengxii,
@LeonSGP43, @leozeli, @li0near, @liftaris, @Lind3ey, @Linux2010, @liujinkun2025, @LLQWQ, @Llugaes, @lmoncany,
@longsizhuo, @lrawnsley, @Lubrsy706, @lumenradley, @luyao618, @lvnilesh, @LVT382009, @m0n5t3r, @Magaav,
@MagicRay1217, @malaiwah, @manuelschipper, @Marvae, @MassiveMassimo, @mavrickdeveloper, @maxchernin, @memosr,
@meng93, @mengjian-github, @MestreY0d4-Uninter, @Mibayy, @MikeFac, @mikewaters, @milkoor, @minorgod,
@MrNiceRicee, @ms-alan, @mvanhorn, @n-WN, @N0nb0at, @Nan93, @NIDNASSER-Abdelmajid, @nish3451, @niyoh120,
@nocoo, @nosleepcassette, @NousResearch, @ogzerber, @omnissiah-comelse, @Only-Code-A, @opriz, @OwenYWT, @pedh,
@pefontana, @PeterBerthelsen, @phpoh, @pinion05, @plgonzalezrx8, @pradeep7127, @priveperfumes,
@projectadmin-dev, @PStarH, @rnijhara, @Roy-oss1, @roytian1217, @RucchiZ, @Ruzzgar, @RyanLee-Dev, @Salt-555,
@Sanjays2402, @sgaofen, @sharziki, @shenuu, @shin4, @SHL0MS, @shushuzn, @sicnuyudidi, @simon-gtcl,
@simon-marcus, @sirEven, @Sisyphus, @sjz-ks, @snreynolds, @Societus, @Somme4096, @sontianye, @sprmn24,
@StefanIsMe, @stephenschoettler, @Swift42, @taeng0204, @taeuk178, @tannerfokkens-maker, @TaroballzChen,
@ten-ltw, @teyrebaz33, @Tianworld, @topcheer, @Tranquil-Flow, @trevthefoolish, @TroyMitchell911, @UNLINEARITY,
@v1k22, @vivganes, @vominh1919, @vrinek, @VTRiot, @WadydX, @walli, @wenhao7, @WhiteWorld, @WideLee, @wujhsu,
@WuTianyi123, @Wysie, @xandersbell, @xiaoqiang243, @xiayh0107, @xinpengdr, @Xowiek, @ycbai, @yeyitech, @ygd58,
@youngDoo, @yudaiyan, @Yukipukii1, @yule975, @yyq4193, @yzx9, @ZaynJarvis, @zhang9w0v5, @zhanggttry,
@zhangxicen, @zhongyueming1121, @zhouxiaoya12, @zons-zhaozhy
Also: @maelrx, @Marco Rutsch, @MaxsolcuCrypto, @Mind-Dragon, @Paul Bergeron, @say8hi, @whitehatjr1001.
---
**Full Changelog**: [v2026.4.13...v2026.4.23](https://github.com/NousResearch/hermes-agent/compare/v2026.4.13...v2026.4.23)
+3 -9
View File
@@ -60,7 +60,7 @@ from acp_adapter.events import (
make_tool_progress_cb,
)
from acp_adapter.permissions import make_approval_callback
from acp_adapter.session import SessionManager, SessionState, _expand_acp_enabled_toolsets
from acp_adapter.session import SessionManager, SessionState
logger = logging.getLogger(__name__)
@@ -287,11 +287,7 @@ class HermesACPAgent(acp.Agent):
try:
from model_tools import get_tool_definitions
enabled_toolsets = _expand_acp_enabled_toolsets(
getattr(state.agent, "enabled_toolsets", None) or ["hermes-acp"],
mcp_server_names=[server.name for server in mcp_servers],
)
state.agent.enabled_toolsets = enabled_toolsets
enabled_toolsets = getattr(state.agent, "enabled_toolsets", None) or ["hermes-acp"]
disabled_toolsets = getattr(state.agent, "disabled_toolsets", None)
state.agent.tools = get_tool_definitions(
enabled_toolsets=enabled_toolsets,
@@ -758,9 +754,7 @@ class HermesACPAgent(acp.Agent):
def _cmd_tools(self, args: str, state: SessionState) -> str:
try:
from model_tools import get_tool_definitions
toolsets = _expand_acp_enabled_toolsets(
getattr(state.agent, "enabled_toolsets", None) or ["hermes-acp"]
)
toolsets = getattr(state.agent, "enabled_toolsets", None) or ["hermes-acp"]
tools = get_tool_definitions(enabled_toolsets=toolsets, quiet_mode=True)
if not tools:
return "No tools available."
+1 -28
View File
@@ -106,24 +106,6 @@ def _register_task_cwd(task_id: str, cwd: str) -> None:
logger.debug("Failed to register ACP task cwd override", exc_info=True)
def _expand_acp_enabled_toolsets(
toolsets: List[str] | None = None,
mcp_server_names: List[str] | None = None,
) -> List[str]:
"""Return ACP toolsets plus explicit MCP server toolsets for this session."""
expanded: List[str] = []
for name in list(toolsets or ["hermes-acp"]):
if name and name not in expanded:
expanded.append(name)
for server_name in list(mcp_server_names or []):
toolset_name = f"mcp-{server_name}"
if server_name and toolset_name not in expanded:
expanded.append(toolset_name)
return expanded
def _clear_task_cwd(task_id: str) -> None:
"""Remove task-specific cwd overrides for an ACP session."""
if not task_id:
@@ -555,18 +537,9 @@ class SessionManager:
elif isinstance(model_cfg, str) and model_cfg.strip():
default_model = model_cfg.strip()
configured_mcp_servers = [
name
for name, cfg in (config.get("mcp_servers") or {}).items()
if not isinstance(cfg, dict) or cfg.get("enabled", True) is not False
]
kwargs = {
"platform": "acp",
"enabled_toolsets": _expand_acp_enabled_toolsets(
["hermes-acp"],
mcp_server_names=configured_mcp_servers,
),
"enabled_toolsets": ["hermes-acp"],
"quiet_mode": True,
"session_id": session_id,
"model": model or default_model,
+8 -113
View File
@@ -14,8 +14,6 @@ import copy
import json
import logging
import os
import platform
import subprocess
from pathlib import Path
from hermes_constants import get_hermes_home
@@ -279,9 +277,8 @@ def _is_oauth_token(key: str) -> bool:
Positively identifies Anthropic OAuth tokens by their key format:
- ``sk-ant-`` prefix (but NOT ``sk-ant-api``) → setup tokens, managed keys
- ``eyJ`` prefix → JWTs from the Anthropic OAuth flow
- ``cc-`` prefix → Claude Code OAuth access tokens (from CLAUDE_CODE_OAUTH_TOKEN)
Non-Anthropic keys (MiniMax, Alibaba, etc.) don't match any pattern
Non-Anthropic keys (MiniMax, Alibaba, etc.) don't match either pattern
and correctly return False.
"""
if not key:
@@ -295,9 +292,6 @@ def _is_oauth_token(key: str) -> bool:
# JWTs from Anthropic OAuth flow
if key.startswith("eyJ"):
return True
# Claude Code OAuth access tokens (opaque, from CLAUDE_CODE_OAUTH_TOKEN)
if key.startswith("cc-"):
return True
return False
@@ -363,7 +357,7 @@ def _common_betas_for_base_url(base_url: str | None) -> list[str]:
return _COMMON_BETAS
def build_anthropic_client(api_key: str, base_url: str = None, timeout: float = None):
def build_anthropic_client(api_key: str, base_url: str = None, timeout: Optional[float] = None):
"""Create an Anthropic client, auto-detecting setup-tokens vs API keys.
If *timeout* is provided it overrides the default 900s read timeout. The
@@ -467,72 +461,8 @@ def build_anthropic_bedrock_client(region: str):
)
def _read_claude_code_credentials_from_keychain() -> Optional[Dict[str, Any]]:
"""Read Claude Code OAuth credentials from the macOS Keychain.
Claude Code >=2.1.114 stores credentials in the macOS Keychain under the
service name "Claude Code-credentials" rather than (or in addition to)
the JSON file at ~/.claude/.credentials.json.
The password field contains a JSON string with the same claudeAiOauth
structure as the JSON file.
Returns dict with {accessToken, refreshToken?, expiresAt?} or None.
"""
import platform
import subprocess
if platform.system() != "Darwin":
return None
try:
# Read the "Claude Code-credentials" generic password entry
result = subprocess.run(
["security", "find-generic-password",
"-s", "Claude Code-credentials",
"-w"],
capture_output=True,
text=True,
timeout=5,
)
except (OSError, subprocess.TimeoutExpired):
logger.debug("Keychain: security command not available or timed out")
return None
if result.returncode != 0:
logger.debug("Keychain: no entry found for 'Claude Code-credentials'")
return None
raw = result.stdout.strip()
if not raw:
return None
try:
data = json.loads(raw)
except json.JSONDecodeError:
logger.debug("Keychain: credentials payload is not valid JSON")
return None
oauth_data = data.get("claudeAiOauth")
if oauth_data and isinstance(oauth_data, dict):
access_token = oauth_data.get("accessToken", "")
if access_token:
return {
"accessToken": access_token,
"refreshToken": oauth_data.get("refreshToken", ""),
"expiresAt": oauth_data.get("expiresAt", 0),
"source": "macos_keychain",
}
return None
def read_claude_code_credentials() -> Optional[Dict[str, Any]]:
"""Read refreshable Claude Code OAuth credentials.
Checks two sources in order:
1. macOS Keychain (Darwin only) — "Claude Code-credentials" entry
2. ~/.claude/.credentials.json file
"""Read refreshable Claude Code OAuth credentials from ~/.claude/.credentials.json.
This intentionally excludes ~/.claude.json primaryApiKey. Opencode's
subscription flow is OAuth/setup-token based with refreshable credentials,
@@ -541,12 +471,6 @@ def read_claude_code_credentials() -> Optional[Dict[str, Any]]:
Returns dict with {accessToken, refreshToken?, expiresAt?} or None.
"""
# Try macOS Keychain first (covers Claude Code >=2.1.114)
kc_creds = _read_claude_code_credentials_from_keychain()
if kc_creds:
return kc_creds
# Fall back to JSON file
cred_path = Path.home() / ".claude" / ".credentials.json"
if cred_path.exists():
try:
@@ -717,9 +641,7 @@ def _write_claude_code_credentials(
existing["claudeAiOauth"] = oauth_data
cred_path.parent.mkdir(parents=True, exist_ok=True)
_tmp_cred = cred_path.with_suffix(".tmp")
_tmp_cred.write_text(json.dumps(existing, indent=2), encoding="utf-8")
_tmp_cred.replace(cred_path)
cred_path.write_text(json.dumps(existing, indent=2), encoding="utf-8")
# Restrict permissions (credentials file)
cred_path.chmod(0o600)
except (OSError, IOError) as e:
@@ -986,26 +908,6 @@ def read_hermes_oauth_credentials() -> Optional[Dict[str, Any]]:
# ---------------------------------------------------------------------------
def _is_bedrock_model_id(model: str) -> bool:
"""Detect AWS Bedrock model IDs that use dots as namespace separators.
Bedrock model IDs come in two forms:
- Bare: ``anthropic.claude-opus-4-7``
- Regional (inference profiles): ``us.anthropic.claude-sonnet-4-5-v1:0``
In both cases the dots separate namespace components, not version
numbers, and must be preserved verbatim for the Bedrock API.
"""
lower = model.lower()
# Regional inference-profile prefixes
if any(lower.startswith(p) for p in ("global.", "us.", "eu.", "ap.", "jp.")):
return True
# Bare Bedrock model IDs: provider.model-family
if lower.startswith("anthropic."):
return True
return False
def normalize_model_name(model: str, preserve_dots: bool = False) -> str:
"""Normalize a model name for the Anthropic API.
@@ -1013,19 +915,11 @@ def normalize_model_name(model: str, preserve_dots: bool = False) -> str:
- Converts dots to hyphens in version numbers (OpenRouter uses dots,
Anthropic uses hyphens: claude-opus-4.6 → claude-opus-4-6), unless
preserve_dots is True (e.g. for Alibaba/DashScope: qwen3.5-plus).
- Preserves Bedrock model IDs (``anthropic.claude-opus-4-7``) and
regional inference profiles (``us.anthropic.claude-*``) whose dots
are namespace separators, not version separators.
"""
lower = model.lower()
if lower.startswith("anthropic/"):
model = model[len("anthropic/"):]
if not preserve_dots:
# Bedrock model IDs use dots as namespace separators
# (e.g. "anthropic.claude-opus-4-7", "us.anthropic.claude-*").
# These must not be converted to hyphens. See issue #12295.
if _is_bedrock_model_id(model):
return model
# OpenRouter uses dots for version separators (claude-opus-4.6),
# Anthropic uses hyphens (claude-opus-4-6). Convert dots to hyphens.
model = model.replace(".", "-")
@@ -1680,9 +1574,9 @@ def build_anthropic_kwargs(
# ── Strip sampling params on 4.7+ ─────────────────────────────────
# Opus 4.7 rejects any non-default temperature/top_p/top_k with a 400.
# Callers (auxiliary_client, etc.) may set these for older models;
# drop them here as a safety net so upstream 4.6 → 4.7 migrations
# don't require coordinated edits everywhere.
# Callers (auxiliary_client, flush_memories, etc.) may set these for
# older models; drop them here as a safety net so upstream 4.6 → 4.7
# migrations don't require coordinated edits everywhere.
if _forbids_sampling_params(model):
for _sampling_key in ("temperature", "top_p", "top_k"):
kwargs.pop(_sampling_key, None)
@@ -1704,3 +1598,4 @@ def build_anthropic_kwargs(
return kwargs
+27 -359
View File
@@ -41,10 +41,13 @@ import threading
import time
from pathlib import Path # noqa: F401 — used by test mocks
from types import SimpleNamespace
from typing import Any, Dict, List, Optional, Tuple
from typing import TYPE_CHECKING, Any, Dict, List, Optional, Tuple, Union
from openai import OpenAI
if TYPE_CHECKING:
from agent.gemini_native_adapter import GeminiNativeClient
from agent.credential_pool import load_pool
from hermes_cli.config import get_hermes_home
from hermes_constants import OPENROUTER_BASE_URL
@@ -74,12 +77,6 @@ _PROVIDER_ALIASES = {
"minimax_cn": "minimax-cn",
"claude": "anthropic",
"claude-code": "anthropic",
"github": "copilot",
"github-copilot": "copilot",
"github-model": "copilot",
"github-models": "copilot",
"github-copilot-acp": "copilot-acp",
"copilot-acp-agent": "copilot-acp",
}
@@ -95,11 +92,10 @@ def _normalize_aux_provider(provider: Optional[str]) -> str:
if normalized == "main":
# Resolve to the user's actual main provider so named custom providers
# and non-aggregator providers (DeepSeek, Alibaba, etc.) work correctly.
main_prov = (_read_main_provider() or "").strip().lower()
main_prov = _read_main_provider()
if main_prov and main_prov not in ("auto", "main", ""):
normalized = main_prov
else:
return "custom"
return main_prov
return "custom"
return _PROVIDER_ALIASES.get(normalized, normalized)
@@ -158,7 +154,7 @@ _API_KEY_PROVIDER_AUX_MODELS: Dict[str, str] = {
# differs from their main chat model, map it here. The vision auto-detect
# "exotic provider" branch checks this before falling back to the main model.
_PROVIDER_VISION_MODELS: Dict[str, str] = {
"xiaomi": "mimo-v2.5",
"xiaomi": "mimo-v2-omni",
"zai": "glm-5v-turbo",
}
@@ -390,7 +386,7 @@ class _CodexCompletionsAdapter:
# Note: the Codex endpoint (chatgpt.com/backend-api/codex) does NOT
# support max_output_tokens or temperature — omit to avoid 400 errors.
# Tools support for auxiliary callers (e.g. skills_hub) that pass function schemas
# Tools support for flush_memories and similar callers
tools = kwargs.get("tools")
if tools:
converted = []
@@ -817,7 +813,11 @@ def _read_codex_access_token() -> Optional[str]:
return None
def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
# TODO(refactor): This function has messy types and duplicated logic (pool vs direct creds).
# Ideal fix: (1) define an AuxiliaryClient Protocol both OpenAI/GeminiNativeClient satisfy,
# (2) return a NamedTuple or dataclass instead of raw tuple, (3) extract the repeated
# Gemini/Kimi/Copilot client-building into a helper.
def _resolve_api_key_provider() -> Tuple[Optional[Union[OpenAI, "GeminiNativeClient"]], Optional[str]]:
"""Try each API-key provider in PROVIDER_REGISTRY order.
Returns (client, model) for the first provider with usable runtime
@@ -923,19 +923,6 @@ def _try_openrouter() -> Tuple[Optional[OpenAI], Optional[str]]:
default_headers=_OR_HEADERS), _OPENROUTER_MODEL
def _describe_openrouter_unavailable() -> str:
"""Return a more precise OpenRouter auth failure reason for logs."""
pool_present, entry = _select_pool_entry("openrouter")
if pool_present:
if entry is None:
return "OpenRouter credential pool has no usable entries (credentials may be exhausted)"
if not _pool_runtime_api_key(entry):
return "OpenRouter credential pool entry is missing a runtime API key"
if not str(os.getenv("OPENROUTER_API_KEY") or "").strip():
return "OPENROUTER_API_KEY not set"
return "no usable OpenRouter credentials found"
def _try_nous(vision: bool = False) -> Tuple[Optional[OpenAI], Optional[str]]:
# Check cross-session rate limit guard before attempting Nous —
# if another session already recorded a 429, skip Nous entirely
@@ -1349,111 +1336,6 @@ def _is_auth_error(exc: Exception) -> bool:
return "error code: 401" in err_lower or "authenticationerror" in type(exc).__name__.lower()
def _is_unsupported_parameter_error(exc: Exception, param: str) -> bool:
"""Detect provider 400s for an unsupported request parameter.
Different OpenAI-compatible endpoints phrase the same class of error a few
ways: ``Unsupported parameter: X``, ``unsupported_parameter`` with a
``param`` field, ``X is not supported``, ``unknown parameter: X``,
``unrecognized request argument: X``. We match on both the parameter
name and a generic "unsupported/unknown/unrecognized parameter" marker so
call sites can reactively retry without the offending key instead of
surfacing a noisy auxiliary failure.
Generalizes the temperature-specific detector that originally shipped
with PR #15621 so the same retry strategy can cover ``max_tokens``,
``seed``, ``top_p``, and any future quirk. Credit @nicholasrae (PR #15416)
for the generalization pattern.
"""
param_lower = (param or "").lower()
if not param_lower:
return False
err_lower = str(exc).lower()
if param_lower not in err_lower:
return False
return any(marker in err_lower for marker in (
"unsupported parameter",
"unsupported_parameter",
"not supported",
"does not support",
"unknown parameter",
"unrecognized request argument",
"unrecognized parameter",
"invalid parameter",
))
def _is_unsupported_temperature_error(exc: Exception) -> bool:
"""Back-compat wrapper: detect API errors where the model rejects ``temperature``.
Delegates to :func:`_is_unsupported_parameter_error`; kept as a separate
public symbol because existing tests and call sites import it by name.
"""
return _is_unsupported_parameter_error(exc, "temperature")
def _evict_cached_clients(provider: str) -> None:
"""Drop cached auxiliary clients for a provider so fresh creds are used."""
normalized = _normalize_aux_provider(provider)
with _client_cache_lock:
stale_keys = [
key for key in _client_cache
if _normalize_aux_provider(str(key[0])) == normalized
]
for key in stale_keys:
client = _client_cache.get(key, (None, None, None))[0]
if client is not None:
_force_close_async_httpx(client)
try:
close_fn = getattr(client, "close", None)
if callable(close_fn):
close_fn()
except Exception:
pass
_client_cache.pop(key, None)
def _refresh_provider_credentials(provider: str) -> bool:
"""Refresh short-lived credentials for OAuth-backed auxiliary providers."""
normalized = _normalize_aux_provider(provider)
try:
if normalized == "openai-codex":
from hermes_cli.auth import resolve_codex_runtime_credentials
creds = resolve_codex_runtime_credentials(force_refresh=True)
if not str(creds.get("api_key", "") or "").strip():
return False
_evict_cached_clients(normalized)
return True
if normalized == "nous":
from hermes_cli.auth import resolve_nous_runtime_credentials
creds = resolve_nous_runtime_credentials(
min_key_ttl_seconds=max(60, int(os.getenv("HERMES_NOUS_MIN_KEY_TTL_SECONDS", "1800"))),
timeout_seconds=float(os.getenv("HERMES_NOUS_TIMEOUT_SECONDS", "15")),
force_mint=True,
)
if not str(creds.get("api_key", "") or "").strip():
return False
_evict_cached_clients(normalized)
return True
if normalized == "anthropic":
from agent.anthropic_adapter import read_claude_code_credentials, _refresh_oauth_token, resolve_anthropic_token
creds = read_claude_code_credentials()
token = _refresh_oauth_token(creds) if isinstance(creds, dict) and creds.get("refreshToken") else None
if not str(token or "").strip():
token = resolve_anthropic_token()
if not str(token or "").strip():
return False
_evict_cached_clients(normalized)
return True
except Exception as exc:
logger.debug("Auxiliary provider credential refresh failed for %s: %s", normalized, exc)
return False
return False
def _try_payment_fallback(
failed_provider: str,
task: str = None,
@@ -1752,10 +1634,8 @@ def resolve_provider_client(
if provider == "openrouter":
client, default = _try_openrouter()
if client is None:
logger.warning(
"resolve_provider_client: openrouter requested but %s",
_describe_openrouter_unavailable(),
)
logger.warning("resolve_provider_client: openrouter requested "
"but OPENROUTER_API_KEY not set")
return None, None
final_model = _normalize_resolved_model(model or default, provider)
return (_to_async_client(client, final_model) if async_mode
@@ -1848,7 +1728,7 @@ def resolve_provider_client(
"but no endpoint credentials found")
return None, None
# ── Named custom providers (config.yaml providers dict / custom_providers list) ───
# ── Named custom providers (config.yaml custom_providers list) ───
try:
from hermes_cli.runtime_provider import _get_named_custom_provider
custom_entry = _get_named_custom_provider(provider)
@@ -1859,51 +1739,16 @@ def resolve_provider_client(
if not custom_key and custom_key_env:
custom_key = os.getenv(custom_key_env, "").strip()
custom_key = custom_key or "no-key-required"
# An explicit per-task api_mode override (from _resolve_task_provider_model)
# wins; otherwise fall back to what the provider entry declared.
entry_api_mode = (api_mode or custom_entry.get("api_mode") or "").strip()
if custom_base:
final_model = _normalize_resolved_model(
model or custom_entry.get("model") or _read_main_model() or "gpt-4o-mini",
provider,
)
logger.debug(
"resolve_provider_client: named custom provider %r (%s, api_mode=%s)",
provider, final_model, entry_api_mode or "chat_completions")
# anthropic_messages: route through the Anthropic Messages API
# via AnthropicAuxiliaryClient. Mirrors the anonymous-custom
# branch in _try_custom_endpoint(). See #15033.
if entry_api_mode == "anthropic_messages":
try:
from agent.anthropic_adapter import build_anthropic_client
real_client = build_anthropic_client(custom_key, custom_base)
except ImportError:
logger.warning(
"Named custom provider %r declares api_mode="
"anthropic_messages but the anthropic SDK is not "
"installed — falling back to OpenAI-wire.",
provider,
)
client = OpenAI(api_key=custom_key, base_url=custom_base)
return (_to_async_client(client, final_model) if async_mode
else (client, final_model))
sync_anthropic = AnthropicAuxiliaryClient(
real_client, final_model, custom_key, custom_base, is_oauth=False,
)
if async_mode:
return AsyncAnthropicAuxiliaryClient(sync_anthropic), final_model
return sync_anthropic, final_model
client = OpenAI(api_key=custom_key, base_url=custom_base)
# codex_responses or inherited auto-detect (via _wrap_if_needed).
# _wrap_if_needed reads the closed-over `api_mode` (the task-level
# override). Named-provider entry api_mode=codex_responses also
# flows through here.
if entry_api_mode == "codex_responses" and not isinstance(
client, CodexAuxiliaryClient
):
client = CodexAuxiliaryClient(client, final_model)
else:
client = _wrap_if_needed(client, final_model, custom_base)
client = _wrap_if_needed(client, final_model, custom_base)
logger.debug(
"resolve_provider_client: named custom provider %r (%s)",
provider, final_model)
return (_to_async_client(client, final_model) if async_mode
else (client, final_model))
logger.warning(
@@ -2036,39 +1881,6 @@ def resolve_provider_client(
"directly supported", provider)
return None, None
elif pconfig.auth_type == "aws_sdk":
# AWS SDK providers (Bedrock) — use the Anthropic Bedrock client via
# boto3's credential chain (IAM roles, SSO, env vars, instance metadata).
try:
from agent.bedrock_adapter import has_aws_credentials, resolve_bedrock_region
from agent.anthropic_adapter import build_anthropic_bedrock_client
except ImportError:
logger.warning("resolve_provider_client: bedrock requested but "
"boto3 or anthropic SDK not installed")
return None, None
if not has_aws_credentials():
logger.debug("resolve_provider_client: bedrock requested but "
"no AWS credentials found")
return None, None
region = resolve_bedrock_region()
default_model = "anthropic.claude-haiku-4-5-20251001-v1:0"
final_model = _normalize_resolved_model(model or default_model, provider)
try:
real_client = build_anthropic_bedrock_client(region)
except ImportError as exc:
logger.warning("resolve_provider_client: cannot create Bedrock "
"client: %s", exc)
return None, None
client = AnthropicAuxiliaryClient(
real_client, final_model, api_key="aws-sdk",
base_url=f"https://bedrock-runtime.{region}.amazonaws.com",
)
logger.debug("resolve_provider_client: bedrock (%s, %s)", final_model, region)
return (_to_async_client(client, final_model) if async_mode
else (client, final_model))
elif pconfig.auth_type in ("oauth_device_code", "oauth_external"):
# OAuth providers — route through their specific try functions
if provider == "nous":
@@ -2803,8 +2615,8 @@ def _build_call_kwargs(
temperature = fixed_temperature
# Opus 4.7+ rejects any non-default temperature/top_p/top_k — silently
# drop here so auxiliary callers that hardcode temperature (e.g. 0 on
# structured-JSON extraction) don't 400 the moment
# drop here so auxiliary callers that hardcode temperature (e.g. 0.3 on
# flush_memories, 0 on structured-JSON extraction) don't 400 the moment
# the aux model is flipped to 4.7.
if temperature is not None:
from agent.anthropic_adapter import _forbids_sampling_params
@@ -2892,7 +2704,7 @@ def call_llm(
Args:
task: Auxiliary task name ("compression", "vision", "web_extract",
"session_search", "skills_hub", "mcp", "title_generation").
"session_search", "skills_hub", "mcp", "flush_memories").
Reads provider:model from config/env. Ignored if provider is set.
provider: Explicit provider override.
model: Explicit model override.
@@ -2995,45 +2807,13 @@ def call_llm(
if _is_anthropic_compat_endpoint(resolved_provider, _client_base):
kwargs["messages"] = _convert_openai_images_to_anthropic(kwargs["messages"])
# Handle unsupported temperature, max_tokens vs max_completion_tokens retry,
# then payment fallback.
# Handle max_tokens vs max_completion_tokens retry, then payment fallback.
try:
return _validate_llm_response(
client.chat.completions.create(**kwargs), task)
except Exception as first_err:
if "temperature" in kwargs and _is_unsupported_temperature_error(first_err):
retry_kwargs = dict(kwargs)
retry_kwargs.pop("temperature", None)
logger.info(
"Auxiliary %s: provider rejected temperature; retrying once without it",
task or "call",
)
try:
return _validate_llm_response(
client.chat.completions.create(**retry_kwargs), task)
except Exception as retry_err:
retry_err_str = str(retry_err)
# If retry still fails, fall through to the max_tokens /
# payment / auth chains below using the temperature-stripped
# kwargs. Re-raise only if the retry hit something those
# chains won't handle.
if not (
_is_payment_error(retry_err)
or _is_connection_error(retry_err)
or _is_auth_error(retry_err)
or "max_tokens" in retry_err_str
or "unsupported_parameter" in retry_err_str
):
raise
first_err = retry_err
kwargs = retry_kwargs
err_str = str(first_err)
if max_tokens is not None and (
"max_tokens" in err_str
or "unsupported_parameter" in err_str
or _is_unsupported_parameter_error(first_err, "max_tokens")
):
if "max_tokens" in err_str or "unsupported_parameter" in err_str:
kwargs.pop("max_tokens", None)
kwargs["max_completion_tokens"] = max_tokens
try:
@@ -3069,49 +2849,6 @@ def call_llm(
return _validate_llm_response(
refreshed_client.chat.completions.create(**kwargs), task)
# ── Auth refresh retry ───────────────────────────────────────
if (_is_auth_error(first_err)
and resolved_provider not in ("auto", "", None)
and not client_is_nous):
if _refresh_provider_credentials(resolved_provider):
logger.info(
"Auxiliary %s: refreshed %s credentials after auth error, retrying",
task or "call", resolved_provider,
)
retry_client, retry_model = (
resolve_vision_provider_client(
provider=resolved_provider,
model=final_model,
async_mode=False,
)[1:]
if task == "vision"
else _get_cached_client(
resolved_provider,
resolved_model,
base_url=resolved_base_url,
api_key=resolved_api_key,
api_mode=resolved_api_mode,
main_runtime=main_runtime,
)
)
if retry_client is not None:
retry_kwargs = _build_call_kwargs(
resolved_provider,
retry_model or final_model,
messages,
temperature=temperature,
max_tokens=max_tokens,
tools=tools,
timeout=effective_timeout,
extra_body=effective_extra_body,
base_url=resolved_base_url,
)
_retry_base = str(getattr(retry_client, "base_url", "") or "")
if _is_anthropic_compat_endpoint(resolved_provider, _retry_base):
retry_kwargs["messages"] = _convert_openai_images_to_anthropic(retry_kwargs["messages"])
return _validate_llm_response(
retry_client.chat.completions.create(**retry_kwargs), task)
# ── Payment / credit exhaustion fallback ──────────────────────
# When the resolved provider returns 402 or a credit-related error,
# try alternative providers instead of giving up. This handles the
@@ -3296,35 +3033,8 @@ async def async_call_llm(
return _validate_llm_response(
await client.chat.completions.create(**kwargs), task)
except Exception as first_err:
if "temperature" in kwargs and _is_unsupported_temperature_error(first_err):
retry_kwargs = dict(kwargs)
retry_kwargs.pop("temperature", None)
logger.info(
"Auxiliary %s (async): provider rejected temperature; retrying once without it",
task or "call",
)
try:
return _validate_llm_response(
await client.chat.completions.create(**retry_kwargs), task)
except Exception as retry_err:
retry_err_str = str(retry_err)
if not (
_is_payment_error(retry_err)
or _is_connection_error(retry_err)
or _is_auth_error(retry_err)
or "max_tokens" in retry_err_str
or "unsupported_parameter" in retry_err_str
):
raise
first_err = retry_err
kwargs = retry_kwargs
err_str = str(first_err)
if max_tokens is not None and (
"max_tokens" in err_str
or "unsupported_parameter" in err_str
or _is_unsupported_parameter_error(first_err, "max_tokens")
):
if "max_tokens" in err_str or "unsupported_parameter" in err_str:
kwargs.pop("max_tokens", None)
kwargs["max_completion_tokens"] = max_tokens
try:
@@ -3359,48 +3069,6 @@ async def async_call_llm(
return _validate_llm_response(
await refreshed_client.chat.completions.create(**kwargs), task)
# ── Auth refresh retry (mirrors sync call_llm) ───────────────
if (_is_auth_error(first_err)
and resolved_provider not in ("auto", "", None)
and not client_is_nous):
if _refresh_provider_credentials(resolved_provider):
logger.info(
"Auxiliary %s (async): refreshed %s credentials after auth error, retrying",
task or "call", resolved_provider,
)
if task == "vision":
_, retry_client, retry_model = resolve_vision_provider_client(
provider=resolved_provider,
model=final_model,
async_mode=True,
)
else:
retry_client, retry_model = _get_cached_client(
resolved_provider,
resolved_model,
async_mode=True,
base_url=resolved_base_url,
api_key=resolved_api_key,
api_mode=resolved_api_mode,
)
if retry_client is not None:
retry_kwargs = _build_call_kwargs(
resolved_provider,
retry_model or final_model,
messages,
temperature=temperature,
max_tokens=max_tokens,
tools=tools,
timeout=effective_timeout,
extra_body=effective_extra_body,
base_url=resolved_base_url,
)
_retry_base = str(getattr(retry_client, "base_url", "") or "")
if _is_anthropic_compat_endpoint(resolved_provider, _retry_base):
retry_kwargs["messages"] = _convert_openai_images_to_anthropic(retry_kwargs["messages"])
return _validate_llm_response(
await retry_client.chat.completions.create(**retry_kwargs), task)
# ── Payment / connection fallback (mirrors sync call_llm) ─────
should_fallback = _is_payment_error(first_err) or _is_connection_error(first_err)
is_auto = resolved_provider in ("auto", "", None)
+2 -130
View File
@@ -87,114 +87,6 @@ def reset_client_cache():
_bedrock_control_client_cache.clear()
def invalidate_runtime_client(region: str) -> bool:
"""Evict the cached ``bedrock-runtime`` client for a single region.
Per-region counterpart to :func:`reset_client_cache`. Used by the converse
call wrappers to discard clients whose underlying HTTP connection has
gone stale, so the next call allocates a fresh client (with a fresh
connection pool) instead of reusing a dead socket.
Returns True if a cached entry was evicted, False if the region was not
cached.
"""
existed = region in _bedrock_runtime_client_cache
_bedrock_runtime_client_cache.pop(region, None)
return existed
# ---------------------------------------------------------------------------
# Stale-connection detection
# ---------------------------------------------------------------------------
#
# boto3 caches its HTTPS connection pool inside the client object. When a
# pooled connection is killed out from under us (NAT timeout, VPN flap,
# server-side TCP RST, proxy idle cull, etc.), the next use surfaces as
# one of a handful of low-level exceptions — most commonly
# ``botocore.exceptions.ConnectionClosedError`` or
# ``urllib3.exceptions.ProtocolError``. urllib3 also trips an internal
# ``assert`` in a couple of paths (connection pool state checks, chunked
# response readers) which bubbles up as a bare ``AssertionError`` with an
# empty ``str(exc)``.
#
# In all of these cases the client is the problem, not the request: retrying
# with the same cached client reproduces the failure until the process
# restarts. The fix is to evict the region's cached client so the next
# attempt builds a new one.
_STALE_LIB_MODULE_PREFIXES = (
"urllib3.",
"botocore.",
"boto3.",
)
def _traceback_frames_modules(exc: BaseException):
"""Yield ``__name__``-style module strings for each frame in exc's traceback."""
tb = getattr(exc, "__traceback__", None)
while tb is not None:
frame = tb.tb_frame
module = frame.f_globals.get("__name__", "")
yield module or ""
tb = tb.tb_next
def is_stale_connection_error(exc: BaseException) -> bool:
"""Return True if ``exc`` indicates a dead/stale Bedrock HTTP connection.
Matches:
* ``botocore.exceptions.ConnectionError`` and subclasses
(``ConnectionClosedError``, ``EndpointConnectionError``,
``ReadTimeoutError``, ``ConnectTimeoutError``).
* ``urllib3.exceptions.ProtocolError`` / ``NewConnectionError`` /
``ConnectionError`` (best-effort import — urllib3 is a transitive
dependency of botocore so it is always available in practice).
* Bare ``AssertionError`` raised from a frame inside urllib3, botocore,
or boto3. These are internal-invariant failures (typically triggered
by corrupted connection-pool state after a dropped socket) and are
recoverable by swapping the client.
Non-library ``AssertionError``s (from application code or tests) are
intentionally not matched — only library-internal asserts signal stale
connection state.
"""
# botocore: the canonical signal — HTTPClientError is the umbrella for
# ConnectionClosedError, ReadTimeoutError, EndpointConnectionError,
# ConnectTimeoutError, and ProxyConnectionError. ConnectionError covers
# the same family via a different branch of the hierarchy.
try:
from botocore.exceptions import (
ConnectionError as BotoConnectionError,
HTTPClientError,
)
botocore_errors: tuple = (BotoConnectionError, HTTPClientError)
except ImportError: # pragma: no cover — botocore always present with boto3
botocore_errors = ()
if botocore_errors and isinstance(exc, botocore_errors):
return True
# urllib3: low-level transport failures
try:
from urllib3.exceptions import (
ProtocolError,
NewConnectionError,
ConnectionError as Urllib3ConnectionError,
)
urllib3_errors = (ProtocolError, NewConnectionError, Urllib3ConnectionError)
except ImportError: # pragma: no cover
urllib3_errors = ()
if urllib3_errors and isinstance(exc, urllib3_errors):
return True
# Library-internal AssertionError (urllib3 / botocore / boto3)
if isinstance(exc, AssertionError):
for module in _traceback_frames_modules(exc):
if any(module.startswith(prefix) for prefix in _STALE_LIB_MODULE_PREFIXES):
return True
return False
# ---------------------------------------------------------------------------
# AWS credential detection
# ---------------------------------------------------------------------------
@@ -895,17 +787,7 @@ def call_converse(
guardrail_config=guardrail_config,
)
try:
response = client.converse(**kwargs)
except Exception as exc:
if is_stale_connection_error(exc):
logger.warning(
"bedrock: stale-connection error on converse(region=%s, model=%s): "
"%s — evicting cached client so the next call reconnects.",
region, model, type(exc).__name__,
)
invalidate_runtime_client(region)
raise
response = client.converse(**kwargs)
return normalize_converse_response(response)
@@ -937,17 +819,7 @@ def call_converse_stream(
guardrail_config=guardrail_config,
)
try:
response = client.converse_stream(**kwargs)
except Exception as exc:
if is_stale_connection_error(exc):
logger.warning(
"bedrock: stale-connection error on converse_stream(region=%s, "
"model=%s): %s — evicting cached client so the next call reconnects.",
region, model, type(exc).__name__,
)
invalidate_runtime_client(region)
raise
response = client.converse_stream(**kwargs)
return normalize_converse_stream_events(response)
+10 -73
View File
@@ -23,52 +23,26 @@ from agent.prompt_builder import DEFAULT_AGENT_IDENTITY
logger = logging.getLogger(__name__)
# Matches Codex/Harmony tool-call serialization that occasionally leaks into
# assistant-message content when the model fails to emit a structured
# ``function_call`` item. Accepts the common forms:
#
# to=functions.exec_command
# assistant to=functions.exec_command
# <|channel|>commentary to=functions.exec_command
#
# ``to=functions.<name>`` is the stable marker — the optional ``assistant`` or
# Harmony channel prefix varies by degeneration mode. Case-insensitive to
# cover lowercase/uppercase ``assistant`` variants.
_TOOL_CALL_LEAK_PATTERN = re.compile(
r"(?:^|[\s>|])to=functions\.[A-Za-z_][\w.]*",
re.IGNORECASE,
)
# ---------------------------------------------------------------------------
# Multimodal content helpers
# ---------------------------------------------------------------------------
def _chat_content_to_responses_parts(content: Any, *, role: str = "user") -> List[Dict[str, Any]]:
def _chat_content_to_responses_parts(content: Any) -> List[Dict[str, Any]]:
"""Convert chat-style multimodal content to Responses API input parts.
Input: ``[{"type":"text"|"image_url", ...}]`` (native OpenAI Chat format)
Output: ``[{"type":"input_text"|"output_text"|"input_image", ...}]`` (Responses format)
The ``role`` parameter controls the text content type:
- ``"user"`` (default) → ``"input_text"``
- ``"assistant"`` → ``"output_text"``
The Responses API rejects ``input_text`` inside assistant messages and
``output_text`` inside user messages, so callers MUST pass the correct
role for the message being converted.
Output: ``[{"type":"input_text"|"input_image", ...}]`` (Responses format)
Returns an empty list when ``content`` is not a list or contains no
recognized parts — callers fall back to the string path.
"""
text_type = "output_text" if role == "assistant" else "input_text"
if not isinstance(content, list):
return []
converted: List[Dict[str, Any]] = []
for part in content:
if isinstance(part, str):
if part:
converted.append({"type": text_type, "text": part})
converted.append({"type": "input_text", "text": part})
continue
if not isinstance(part, dict):
continue
@@ -76,7 +50,7 @@ def _chat_content_to_responses_parts(content: Any, *, role: str = "user") -> Lis
if ptype in {"text", "input_text", "output_text"}:
text = part.get("text")
if isinstance(text, str) and text:
converted.append({"type": text_type, "text": text})
converted.append({"type": "input_text", "text": text})
continue
if ptype in {"image_url", "input_image"}:
image_ref = part.get("image_url")
@@ -242,10 +216,9 @@ def _chat_messages_to_responses_input(messages: List[Dict[str, Any]]) -> List[Di
if role in {"user", "assistant"}:
content = msg.get("content", "")
if isinstance(content, list):
content_parts = _chat_content_to_responses_parts(content, role=role)
text_type = "output_text" if role == "assistant" else "input_text"
content_parts = _chat_content_to_responses_parts(content)
content_text = "".join(
p.get("text", "") for p in content_parts if p.get("type") == text_type
p.get("text", "") for p in content_parts if p.get("type") == "input_text"
)
else:
content_parts = []
@@ -439,16 +412,13 @@ def _preflight_codex_input_items(raw_items: Any) -> List[Dict[str, Any]]:
content = ""
if isinstance(content, list):
# Multimodal content from ``_chat_messages_to_responses_input``
# is already in Responses format (``input_text`` / ``output_text``
# / ``input_image``). Validate each part and pass through.
# Use the correct text type for the role — ``output_text`` for
# assistant messages, ``input_text`` for user messages.
text_type = "output_text" if role == "assistant" else "input_text"
# is already in Responses format (``input_text`` / ``input_image``).
# Validate each part and pass through.
validated: List[Dict[str, Any]] = []
for part_idx, part in enumerate(content):
if isinstance(part, str):
if part:
validated.append({"type": text_type, "text": part})
validated.append({"type": "input_text", "text": part})
continue
if not isinstance(part, dict):
raise ValueError(
@@ -459,7 +429,7 @@ def _preflight_codex_input_items(raw_items: Any) -> List[Dict[str, Any]]:
text = part.get("text", "")
if not isinstance(text, str):
text = str(text or "")
validated.append({"type": text_type, "text": text})
validated.append({"type": "input_text", "text": text})
elif ptype in {"input_image", "image_url"}:
image_ref = part.get("image_url", "")
detail = part.get("detail")
@@ -817,37 +787,6 @@ def _normalize_codex_response(response: Any) -> tuple[Any, str]:
if isinstance(out_text, str):
final_text = out_text.strip()
# ── Tool-call leak recovery ──────────────────────────────────
# gpt-5.x on the Codex Responses API sometimes degenerates and emits
# what should be a structured `function_call` item as plain assistant
# text using the Harmony/Codex serialization (``to=functions.foo
# {json}`` or ``assistant to=functions.foo {json}``). The model
# intended to call a tool, but the intent never made it into
# ``response.output`` as a ``function_call`` item, so ``tool_calls``
# is empty here. If we pass this through, the parent sees a
# confident-looking summary with no audit trail (empty ``tool_trace``)
# and no tools actually ran — the Taiwan-embassy-email incident.
#
# Detection: leaked tokens always contain ``to=functions.<name>`` and
# the assistant message has no real tool calls. Treat it as incomplete
# so the existing Codex-incomplete continuation path (3 retries,
# handled in run_agent.py) gets a chance to re-elicit a proper
# ``function_call`` item. The existing loop already handles message
# append, dedup, and retry budget.
leaked_tool_call_text = False
if final_text and not tool_calls and _TOOL_CALL_LEAK_PATTERN.search(final_text):
leaked_tool_call_text = True
logger.warning(
"Codex response contains leaked tool-call text in assistant content "
"(no structured function_call items). Treating as incomplete so the "
"continuation path can re-elicit a proper tool call. Leaked snippet: %r",
final_text[:300],
)
# Clear the text so downstream code doesn't surface the garbage as
# a summary. The encrypted reasoning items (if any) are preserved
# so the model keeps its chain-of-thought on the retry.
final_text = ""
assistant_message = SimpleNamespace(
content=final_text,
tool_calls=tool_calls,
@@ -859,8 +798,6 @@ def _normalize_codex_response(response: Any) -> tuple[Any, str]:
if tool_calls:
finish_reason = "tool_calls"
elif leaked_tool_call_text:
finish_reason = "incomplete"
elif has_incomplete_items or (saw_commentary_phase and not saw_final_answer_phase):
finish_reason = "incomplete"
elif reasoning_items_raw and not final_text:
-30
View File
@@ -294,7 +294,6 @@ class ContextCompressor(ContextEngine):
self._context_probed = False
self._context_probe_persistable = False
self._previous_summary = None
self._last_summary_error = None
self._last_compression_savings_pct = 100.0
self._ineffective_compression_count = 0
@@ -318,13 +317,6 @@ class ContextCompressor(ContextEngine):
int(context_length * self.threshold_percent),
MINIMUM_CONTEXT_LENGTH,
)
# Recalculate token budgets for the new context length so the
# compressor stays calibrated after a model switch (e.g. 200K → 32K).
target_tokens = int(self.threshold_tokens * self.summary_target_ratio)
self.tail_token_budget = target_tokens
self.max_summary_tokens = min(
int(context_length * 0.05), _SUMMARY_TOKENS_CEILING,
)
def __init__(
self,
@@ -397,7 +389,6 @@ class ContextCompressor(ContextEngine):
self._last_compression_savings_pct: float = 100.0
self._ineffective_compression_count: int = 0
self._summary_failure_cooldown_until: float = 0.0
self._last_summary_error: Optional[str] = None
def update_from_response(self, usage: Dict[str, Any]):
"""Update tracked token usage from API response."""
@@ -821,12 +812,10 @@ The user has requested that this compaction PRIORITISE preserving all informatio
self._previous_summary = summary
self._summary_failure_cooldown_until = 0.0
self._summary_model_fallen_back = False
self._last_summary_error = None
return self._with_summary_prefix(summary)
except RuntimeError:
# No provider configured — long cooldown, unlikely to self-resolve
self._summary_failure_cooldown_until = time.monotonic() + _SUMMARY_FAILURE_COOLDOWN_SECONDS
self._last_summary_error = "no auxiliary LLM provider configured"
logging.warning("Context compression: no provider available for "
"summary. Middle turns will be dropped without summary "
"for %d seconds.",
@@ -864,10 +853,6 @@ The user has requested that this compaction PRIORITISE preserving all informatio
# Transient errors (timeout, rate limit, network) — shorter cooldown
_transient_cooldown = 60
self._summary_failure_cooldown_until = time.monotonic() + _transient_cooldown
err_text = str(e).strip() or e.__class__.__name__
if len(err_text) > 220:
err_text = err_text[:217].rstrip() + "..."
self._last_summary_error = err_text
logging.warning(
"Failed to generate context summary: %s. "
"Further summary attempts paused for %d seconds.",
@@ -1114,21 +1099,6 @@ The user has requested that this compaction PRIORITISE preserving all informatio
return max(cut_idx, head_end + 1)
# ------------------------------------------------------------------
# ContextEngine: manual /compress preflight
# ------------------------------------------------------------------
def has_content_to_compress(self, messages: List[Dict[str, Any]]) -> bool:
"""Return True if there is a non-empty middle region to compact.
Overrides the ABC default so the gateway ``/compress`` guard can
skip the LLM call when the transcript is still entirely inside
the protected head/tail.
"""
compress_start = self._align_boundary_forward(messages, self.protect_first_n)
compress_end = self._find_tail_cut_by_tokens(messages, compress_start)
return compress_start < compress_end
# ------------------------------------------------------------------
# Main compression entry point
# ------------------------------------------------------------------
-22
View File
@@ -78,7 +78,6 @@ class ContextEngine(ABC):
self,
messages: List[Dict[str, Any]],
current_tokens: int = None,
focus_topic: str = None,
) -> List[Dict[str, Any]]:
"""Compact the message list and return the new message list.
@@ -87,12 +86,6 @@ class ContextEngine(ABC):
context budget. The implementation is free to summarize, build a
DAG, or do anything else — as long as the returned list is a valid
OpenAI-format message sequence.
Args:
focus_topic: Optional topic string from manual ``/compress <focus>``.
Engines that support guided compression should prioritise
preserving information related to this topic. Engines that
don't support it may simply ignore this argument.
"""
# -- Optional: pre-flight check ----------------------------------------
@@ -105,21 +98,6 @@ class ContextEngine(ABC):
"""
return False
# -- Optional: manual /compress preflight ------------------------------
def has_content_to_compress(self, messages: List[Dict[str, Any]]) -> bool:
"""Quick check: is there anything in ``messages`` that can be compacted?
Used by the gateway ``/compress`` command as a preflight guard —
returning False lets the gateway report "nothing to compress yet"
without making an LLM call.
Default returns True (always attempt). Engines with a cheap way
to introspect their own head/tail boundaries should override this
to return False when the transcript is still entirely protected.
"""
return True
# -- Optional: session lifecycle ---------------------------------------
def on_session_start(self, session_id: str, **kwargs) -> None:
-42
View File
@@ -46,47 +46,6 @@ def _resolve_args() -> list[str]:
return shlex.split(raw)
def _resolve_home_dir() -> str:
"""Return a stable HOME for child ACP processes."""
try:
from hermes_constants import get_subprocess_home
profile_home = get_subprocess_home()
if profile_home:
return profile_home
except Exception:
pass
home = os.environ.get("HOME", "").strip()
if home:
return home
expanded = os.path.expanduser("~")
if expanded and expanded != "~":
return expanded
try:
import pwd
resolved = pwd.getpwuid(os.getuid()).pw_dir.strip()
if resolved:
return resolved
except Exception:
pass
# Last resort: /tmp (writable on any POSIX system). Avoids crashing the
# subprocess with no HOME; callers can set HERMES_HOME explicitly if they
# need a different writable dir.
return "/tmp"
def _build_subprocess_env() -> dict[str, str]:
env = os.environ.copy()
env["HOME"] = _resolve_home_dir()
return env
def _jsonrpc_error(message_id: Any, code: int, message: str) -> dict[str, Any]:
return {
"jsonrpc": "2.0",
@@ -423,7 +382,6 @@ class CopilotACPClient:
text=True,
bufsize=1,
cwd=self._acp_cwd,
env=_build_subprocess_env(),
)
except FileNotFoundError as exc:
raise RuntimeError(
+20 -110
View File
@@ -29,6 +29,7 @@ from hermes_cli.auth import (
_save_auth_store,
_save_provider_state,
read_credential_pool,
read_provider_credentials,
write_credential_pool,
)
@@ -321,7 +322,7 @@ def get_custom_provider_pool_key(base_url: str) -> Optional[str]:
def list_custom_pool_providers() -> List[str]:
"""Return all 'custom:*' pool keys that have entries in auth.json."""
pool_data = read_credential_pool(None)
pool_data = read_credential_pool()
return sorted(
key for key in pool_data
if key.startswith(CUSTOM_POOL_PREFIX)
@@ -455,61 +456,6 @@ class CredentialPool:
logger.debug("Failed to sync from credentials file: %s", exc)
return entry
def _sync_nous_entry_from_auth_store(self, entry: PooledCredential) -> PooledCredential:
"""Sync a Nous pool entry from auth.json if tokens differ.
Nous OAuth refresh tokens are single-use. When another process
(e.g. a concurrent cron) refreshes the token via
``resolve_nous_runtime_credentials``, it writes fresh tokens to
auth.json under ``_auth_store_lock``. The pool entry's tokens
become stale. This method detects that and adopts the newer pair,
avoiding a "refresh token reuse" revocation on the Nous Portal.
"""
if self.provider != "nous" or entry.source != "device_code":
return entry
try:
with _auth_store_lock():
auth_store = _load_auth_store()
state = _load_provider_state(auth_store, "nous")
if not state:
return entry
store_refresh = state.get("refresh_token", "")
store_access = state.get("access_token", "")
if store_refresh and store_refresh != entry.refresh_token:
logger.debug(
"Pool entry %s: syncing tokens from auth.json (Nous refresh token changed)",
entry.id,
)
field_updates: Dict[str, Any] = {
"access_token": store_access,
"refresh_token": store_refresh,
"last_status": None,
"last_status_at": None,
"last_error_code": None,
}
if state.get("expires_at"):
field_updates["expires_at"] = state["expires_at"]
if state.get("agent_key"):
field_updates["agent_key"] = state["agent_key"]
if state.get("agent_key_expires_at"):
field_updates["agent_key_expires_at"] = state["agent_key_expires_at"]
if state.get("inference_base_url"):
field_updates["inference_base_url"] = state["inference_base_url"]
extra_updates = dict(entry.extra)
for extra_key in ("obtained_at", "expires_in", "agent_key_id",
"agent_key_expires_in", "agent_key_reused",
"agent_key_obtained_at"):
val = state.get(extra_key)
if val is not None:
extra_updates[extra_key] = val
updated = replace(entry, extra=extra_updates, **field_updates)
self._replace_entry(entry, updated)
self._persist()
return updated
except Exception as exc:
logger.debug("Failed to sync Nous entry from auth.json: %s", exc)
return entry
def _sync_device_code_entry_to_auth_store(self, entry: PooledCredential) -> None:
"""Write refreshed pool entry tokens back to auth.json providers.
@@ -616,9 +562,6 @@ class CredentialPool:
last_refresh=refreshed.get("last_refresh"),
)
elif self.provider == "nous":
synced = self._sync_nous_entry_from_auth_store(entry)
if synced is not entry:
entry = synced
nous_state = {
"access_token": entry.access_token,
"refresh_token": entry.refresh_token,
@@ -693,26 +636,6 @@ class CredentialPool:
# Credentials file had a valid (non-expired) token — use it directly
logger.debug("Credentials file has valid token, using without refresh")
return synced
# For nous: another process may have consumed the refresh token
# between our proactive sync and the HTTP call. Re-sync from
# auth.json and adopt the fresh tokens if available.
if self.provider == "nous":
synced = self._sync_nous_entry_from_auth_store(entry)
if synced.refresh_token != entry.refresh_token:
logger.debug("Nous refresh failed but auth.json has newer tokens — adopting")
updated = replace(
synced,
last_status=STATUS_OK,
last_status_at=None,
last_error_code=None,
last_error_reason=None,
last_error_message=None,
last_error_reset_at=None,
)
self._replace_entry(synced, updated)
self._persist()
self._sync_device_code_entry_to_auth_store(updated)
return updated
self._mark_exhausted(entry, None)
return None
@@ -776,17 +699,6 @@ class CredentialPool:
if synced is not entry:
entry = synced
cleared_any = True
# For nous entries, sync from auth.json before status checks.
# Another process may have successfully refreshed via
# resolve_nous_runtime_credentials(), making this entry's
# exhausted status stale.
if (self.provider == "nous"
and entry.source == "device_code"
and entry.last_status == STATUS_EXHAUSTED):
synced = self._sync_nous_entry_from_auth_store(entry)
if synced is not entry:
entry = synced
cleared_any = True
if entry.last_status == STATUS_EXHAUSTED:
exhausted_until = _exhausted_until(entry)
if exhausted_until is not None and now < exhausted_until:
@@ -828,11 +740,8 @@ class CredentialPool:
if self._strategy == STRATEGY_LEAST_USED and len(available) > 1:
entry = min(available, key=lambda e: e.request_count)
# Increment usage counter so subsequent selections distribute load
updated = replace(entry, request_count=entry.request_count + 1)
self._replace_entry(entry, updated)
self._current_id = entry.id
return updated
return entry
if self._strategy == STRATEGY_ROUND_ROBIN and len(available) > 1:
entry = available[0]
@@ -967,6 +876,20 @@ class CredentialPool:
self._current_id = None
return removed
def remove_entry(self, entry_id: str) -> Optional[PooledCredential]:
for idx, entry in enumerate(self._entries):
if entry.id == entry_id:
removed = self._entries.pop(idx)
self._entries = [
replace(e, priority=new_priority)
for new_priority, e in enumerate(self._entries)
]
self._persist()
if self._current_id == removed.id:
self._current_id = None
return removed
return None
def resolve_target(self, target: Any) -> Tuple[Optional[int], Optional[PooledCredential], Optional[str]]:
raw = str(target or "").strip()
if not raw:
@@ -1148,18 +1071,6 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup
"inference_base_url": state.get("inference_base_url"),
"agent_key": state.get("agent_key"),
"agent_key_expires_at": state.get("agent_key_expires_at"),
# Carry the mint/refresh timestamps into the pool so
# freshness-sensitive consumers (self-heal hooks, pool
# pruning by age) can distinguish just-minted credentials
# from stale ones. Without these, fresh device_code
# entries get obtained_at=None and look older than they
# are (#15099).
"obtained_at": state.get("obtained_at"),
"expires_in": state.get("expires_in"),
"agent_key_id": state.get("agent_key_id"),
"agent_key_expires_in": state.get("agent_key_expires_in"),
"agent_key_reused": state.get("agent_key_reused"),
"agent_key_obtained_at": state.get("agent_key_obtained_at"),
"tls": state.get("tls") if isinstance(state.get("tls"), dict) else None,
"label": seeded_label,
},
@@ -1170,10 +1081,9 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup
# env vars (COPILOT_GITHUB_TOKEN / GH_TOKEN). They don't live in
# the auth store or credential pool, so we resolve them here.
try:
from hermes_cli.copilot_auth import resolve_copilot_token, get_copilot_api_token
from hermes_cli.copilot_auth import resolve_copilot_token
token, source = resolve_copilot_token()
if token:
api_token = get_copilot_api_token(token)
source_name = "gh_cli" if "gh" in source.lower() else f"env:{source}"
if not _is_suppressed(provider, source_name):
active_sources.add(source_name)
@@ -1185,7 +1095,7 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup
{
"source": source_name,
"auth_type": AUTH_TYPE_API_KEY,
"access_token": api_token,
"access_token": token,
"base_url": pconfig.inference_base_url if pconfig else "",
"label": source,
},
@@ -1430,7 +1340,7 @@ def _seed_custom_pool(pool_key: str, entries: List[PooledCredential]) -> Tuple[b
def load_pool(provider: str) -> CredentialPool:
provider = (provider or "").strip().lower()
raw_entries = read_credential_pool(provider)
raw_entries = read_provider_credentials(provider)
entries = [PooledCredential.from_dict(provider, payload) for payload in raw_entries]
if provider.startswith(CUSTOM_POOL_PREFIX):
+1
View File
@@ -729,6 +729,7 @@ class KawaiiSpinner:
time.sleep(0.1)
continue
frame = self.spinner_frames[self.frame_idx % len(self.spinner_frames)]
assert self.start_time is not None # start() sets it before thread starts
elapsed = time.time() - self.start_time
if wings:
left, right = wings[self.frame_idx % len(wings)]
-55
View File
@@ -45,7 +45,6 @@ class FailoverReason(enum.Enum):
# Model
model_not_found = "model_not_found" # 404 or invalid model — fallback to different model
provider_policy_blocked = "provider_policy_blocked" # Aggregator (e.g. OpenRouter) blocked the only endpoint due to account data/privacy policy
# Request format
format_error = "format_error" # 400 bad request — abort or strip + retry
@@ -195,29 +194,6 @@ _MODEL_NOT_FOUND_PATTERNS = [
"unsupported model",
]
# OpenRouter aggregator policy-block patterns.
#
# When a user's OpenRouter account privacy setting (or a per-request
# `provider.data_collection: deny` preference) excludes the only endpoint
# serving a model, OpenRouter returns 404 with a *specific* message that is
# distinct from "model not found":
#
# "No endpoints available matching your guardrail restrictions and
# data policy. Configure: https://openrouter.ai/settings/privacy"
#
# We classify this as `provider_policy_blocked` rather than
# `model_not_found` because:
# - The model *exists* — model_not_found is misleading in logs
# - Provider fallback won't help: the account-level setting applies to
# every call on the same OpenRouter account
# - The error body already contains the fix URL, so the user gets
# actionable guidance without us rewriting the message
_PROVIDER_POLICY_BLOCKED_PATTERNS = [
"no endpoints available matching your guardrail",
"no endpoints available matching your data policy",
"no endpoints found matching your data policy",
]
# Auth patterns (non-status-code signals)
_AUTH_PATTERNS = [
"invalid api key",
@@ -343,11 +319,6 @@ def classify_api_error(
"""
status_code = _extract_status_code(error)
error_type = type(error).__name__
# Copilot/GitHub Models RateLimitError may not set .status_code; force 429
# so downstream rate-limit handling (classifier reason, pool rotation,
# fallback gating) fires correctly instead of misclassifying as generic.
if status_code is None and error_type == "RateLimitError":
status_code = 429
body = _extract_error_body(error)
error_code = _extract_error_code(body)
@@ -552,17 +523,6 @@ def _classify_by_status(
return _classify_402(error_msg, result_fn)
if status_code == 404:
# OpenRouter policy-block 404 — distinct from "model not found".
# The model exists; the user's account privacy setting excludes the
# only endpoint serving it. Falling back to another provider won't
# help (same account setting applies). The error body already
# contains the fix URL, so just surface it.
if any(p in error_msg for p in _PROVIDER_POLICY_BLOCKED_PATTERNS):
return result_fn(
FailoverReason.provider_policy_blocked,
retryable=False,
should_fallback=False,
)
if any(p in error_msg for p in _MODEL_NOT_FOUND_PATTERNS):
return result_fn(
FailoverReason.model_not_found,
@@ -680,12 +640,6 @@ def _classify_400(
)
# Some providers return model-not-found as 400 instead of 404 (e.g. OpenRouter).
if any(p in error_msg for p in _PROVIDER_POLICY_BLOCKED_PATTERNS):
return result_fn(
FailoverReason.provider_policy_blocked,
retryable=False,
should_fallback=False,
)
if any(p in error_msg for p in _MODEL_NOT_FOUND_PATTERNS):
return result_fn(
FailoverReason.model_not_found,
@@ -858,15 +812,6 @@ def _classify_by_message(
should_fallback=True,
)
# Provider policy-block (aggregator-side guardrail) — check before
# model_not_found so we don't mis-label as a missing model.
if any(p in error_msg for p in _PROVIDER_POLICY_BLOCKED_PATTERNS):
return result_fn(
FailoverReason.provider_policy_blocked,
retryable=False,
should_fallback=False,
)
# Model not found patterns
if any(p in error_msg for p in _MODEL_NOT_FOUND_PATTERNS):
return result_fn(
-104
View File
@@ -44,97 +44,6 @@ def is_native_gemini_base_url(base_url: str) -> bool:
return not normalized.endswith("/openai")
def probe_gemini_tier(
api_key: str,
base_url: str = DEFAULT_GEMINI_BASE_URL,
*,
model: str = "gemini-2.5-flash",
timeout: float = 10.0,
) -> str:
"""Probe a Google AI Studio API key and return its tier.
Returns one of:
- ``"free"`` -- key is on the free tier (unusable with Hermes)
- ``"paid"`` -- key is on a paid tier
- ``"unknown"`` -- probe failed; callers should proceed without blocking.
"""
key = (api_key or "").strip()
if not key:
return "unknown"
normalized_base = str(base_url or DEFAULT_GEMINI_BASE_URL).strip().rstrip("/")
if not normalized_base:
normalized_base = DEFAULT_GEMINI_BASE_URL
if normalized_base.lower().endswith("/openai"):
normalized_base = normalized_base[: -len("/openai")]
url = f"{normalized_base}/models/{model}:generateContent"
payload = {
"contents": [{"role": "user", "parts": [{"text": "hi"}]}],
"generationConfig": {"maxOutputTokens": 1},
}
try:
with httpx.Client(timeout=timeout) as client:
resp = client.post(
url,
params={"key": key},
json=payload,
headers={"Content-Type": "application/json"},
)
except Exception as exc:
logger.debug("probe_gemini_tier: network error: %s", exc)
return "unknown"
headers_lower = {k.lower(): v for k, v in resp.headers.items()}
rpd_header = headers_lower.get("x-ratelimit-limit-requests-per-day")
if rpd_header:
try:
rpd_val = int(rpd_header)
except (TypeError, ValueError):
rpd_val = None
# Published free-tier daily caps (Dec 2025):
# gemini-2.5-pro: 100, gemini-2.5-flash: 250, flash-lite: 1000
# Tier 1 starts at ~1500+ for Flash. We treat <= 1000 as free.
if rpd_val is not None and rpd_val <= 1000:
return "free"
if rpd_val is not None and rpd_val > 1000:
return "paid"
if resp.status_code == 429:
body_text = ""
try:
body_text = resp.text or ""
except Exception:
body_text = ""
if "free_tier" in body_text.lower():
return "free"
return "paid"
if 200 <= resp.status_code < 300:
return "paid"
return "unknown"
def is_free_tier_quota_error(error_message: str) -> bool:
"""Return True when a Gemini 429 message indicates free-tier exhaustion."""
if not error_message:
return False
return "free_tier" in error_message.lower()
_FREE_TIER_GUIDANCE = (
"\n\nYour Google API key is on the free tier (<= 250 requests/day for "
"gemini-2.5-flash). Hermes typically makes 3-10 API calls per user turn, "
"so the free tier is exhausted in a handful of messages and cannot sustain "
"an agent session. Enable billing on your Google Cloud project and "
"regenerate the key in a billing-enabled project: "
"https://aistudio.google.com/apikey"
)
class GeminiAPIError(Exception):
"""Error shape compatible with Hermes retry/error classification."""
@@ -741,12 +650,6 @@ def gemini_http_error(response: httpx.Response) -> GeminiAPIError:
else:
message = f"Gemini returned HTTP {status}: {body_text[:500]}"
# Free-tier quota exhaustion -> append actionable guidance so users who
# bypassed the setup wizard (direct GOOGLE_API_KEY in .env) still learn
# that the free tier cannot sustain an agent session.
if status == 429 and is_free_tier_quota_error(err_message or body_text):
message = message + _FREE_TIER_GUIDANCE
return GeminiAPIError(
message,
code=code,
@@ -801,13 +704,6 @@ class GeminiNativeClient:
http_client: Optional[httpx.Client] = None,
**_: Any,
) -> None:
if not (api_key or "").strip():
raise RuntimeError(
"Gemini native client requires an API key, but none was provided. "
"Set GOOGLE_API_KEY or GEMINI_API_KEY in your environment / ~/.hermes/.env "
"(get one at https://aistudio.google.com/app/apikey), or run `hermes setup` "
"to configure the Google provider."
)
self.api_key = api_key
normalized_base = (base_url or DEFAULT_GEMINI_BASE_URL).rstrip("/")
if normalized_base.endswith("/openai"):
-14
View File
@@ -73,20 +73,6 @@ def sanitize_gemini_schema(schema: Any) -> Dict[str, Any]:
]
continue
cleaned[key] = value
# Gemini's Schema validator requires every ``enum`` entry to be a string,
# even when the parent ``type`` is ``integer`` / ``number`` / ``boolean``.
# OpenAI / OpenRouter / Anthropic accept typed enums (e.g. Discord's
# ``auto_archive_duration: {type: integer, enum: [60, 1440, 4320, 10080]}``),
# so we only drop the ``enum`` when it would collide with Gemini's rule.
# Keeping ``type: integer`` plus the human-readable description gives the
# model enough guidance; the tool handler still validates the value.
enum_val = cleaned.get("enum")
type_val = cleaned.get("type")
if isinstance(enum_val, list) and type_val in {"integer", "number", "boolean"}:
if any(not isinstance(item, str) for item in enum_val):
cleaned.pop("enum", None)
return cleaned
+2 -43
View File
@@ -31,7 +31,6 @@ from __future__ import annotations
import json
import logging
import re
import inspect
from typing import Any, Dict, List, Optional
from agent.memory_provider import MemoryProvider
@@ -313,39 +312,7 @@ class MemoryManager:
)
return "\n\n".join(parts)
@staticmethod
def _provider_memory_write_metadata_mode(provider: MemoryProvider) -> str:
"""Return how to pass metadata to a provider's memory-write hook."""
try:
signature = inspect.signature(provider.on_memory_write)
except (TypeError, ValueError):
return "keyword"
params = list(signature.parameters.values())
if any(p.kind == inspect.Parameter.VAR_KEYWORD for p in params):
return "keyword"
if "metadata" in signature.parameters:
return "keyword"
accepted = [
p for p in params
if p.kind in (
inspect.Parameter.POSITIONAL_ONLY,
inspect.Parameter.POSITIONAL_OR_KEYWORD,
inspect.Parameter.KEYWORD_ONLY,
)
]
if len(accepted) >= 4:
return "positional"
return "legacy"
def on_memory_write(
self,
action: str,
target: str,
content: str,
metadata: Optional[Dict[str, Any]] = None,
) -> None:
def on_memory_write(self, action: str, target: str, content: str) -> None:
"""Notify external providers when the built-in memory tool writes.
Skips the builtin provider itself (it's the source of the write).
@@ -354,15 +321,7 @@ class MemoryManager:
if provider.name == "builtin":
continue
try:
metadata_mode = self._provider_memory_write_metadata_mode(provider)
if metadata_mode == "keyword":
provider.on_memory_write(
action, target, content, metadata=dict(metadata or {})
)
elif metadata_mode == "positional":
provider.on_memory_write(action, target, content, dict(metadata or {}))
else:
provider.on_memory_write(action, target, content)
provider.on_memory_write(action, target, content)
except Exception as e:
logger.debug(
"Memory provider '%s' on_memory_write failed: %s",
+3 -12
View File
@@ -26,7 +26,7 @@ Optional hooks (override to opt in):
on_turn_start(turn, message, **kwargs) per-turn tick with runtime context
on_session_end(messages) end-of-session extraction
on_pre_compress(messages) -> str extract before context compression
on_memory_write(action, target, content, metadata=None) mirror built-in memory writes
on_memory_write(action, target, content) mirror built-in memory writes
on_delegation(task, result, **kwargs) parent-side observation of subagent work
"""
@@ -34,7 +34,7 @@ from __future__ import annotations
import logging
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional
from typing import Any, Dict, List
logger = logging.getLogger(__name__)
@@ -220,21 +220,12 @@ class MemoryProvider(ABC):
should all have ``env_var`` set and this method stays no-op).
"""
def on_memory_write(
self,
action: str,
target: str,
content: str,
metadata: Optional[Dict[str, Any]] = None,
) -> None:
def on_memory_write(self, action: str, target: str, content: str) -> None:
"""Called when the built-in memory tool writes an entry.
action: 'add', 'replace', or 'remove'
target: 'memory' or 'user'
content: the entry content
metadata: structured provenance for the write, when available. Common
keys include ``write_origin``, ``execution_context``, ``session_id``,
``parent_session_id``, ``platform``, and ``tool_name``.
Use to mirror built-in memory writes to your backend.
"""
+25 -222
View File
@@ -6,7 +6,6 @@ and run_agent.py for pre-flight context checks.
import ipaddress
import logging
import os
import re
import time
from pathlib import Path
@@ -22,25 +21,6 @@ from hermes_constants import OPENROUTER_MODELS_URL
logger = logging.getLogger(__name__)
def _resolve_requests_verify() -> bool | str:
"""Resolve SSL verify setting for `requests` calls from env vars.
The `requests` library only honours REQUESTS_CA_BUNDLE / CURL_CA_BUNDLE
by default. Hermes also honours HERMES_CA_BUNDLE (its own convention)
and SSL_CERT_FILE (used by the stdlib `ssl` module and by httpx), so
that a single env var can cover both `requests` and `httpx` callsites
inside the same process.
Returns either a filesystem path to a CA bundle, or True to defer to
the requests default (certifi).
"""
for env_var in ("HERMES_CA_BUNDLE", "REQUESTS_CA_BUNDLE", "SSL_CERT_FILE"):
val = os.getenv(env_var)
if val and os.path.isfile(val):
return val
return True
# Provider names that can appear as a "provider:" prefix before a model ID.
# Only these are stripped — Ollama-style "model:tag" colons (e.g. "qwen3.5:27b")
# are preserved so the full model name reaches cache lookups and server queries.
@@ -143,10 +123,6 @@ DEFAULT_CONTEXT_LENGTHS = {
"claude": 200000,
# OpenAI — GPT-5 family (most have 400k; specific overrides first)
# Source: https://developers.openai.com/api/docs/models
# GPT-5.5 (launched Apr 23 2026). 400k is the fallback for providers we
# can't probe live. ChatGPT Codex OAuth actually caps lower (272k as of
# Apr 2026) and is resolved via _resolve_codex_oauth_context_length().
"gpt-5.5": 400000,
"gpt-5.4-nano": 400000, # 400k (not 1.05M like full 5.4)
"gpt-5.4-mini": 400000, # 400k (not 1.05M like full 5.4)
"gpt-5.4": 1050000, # GPT-5.4, GPT-5.4 Pro (1.05M context)
@@ -207,12 +183,12 @@ DEFAULT_CONTEXT_LENGTHS = {
"moonshotai/Kimi-K2.6": 262144,
"moonshotai/Kimi-K2-Thinking": 262144,
"MiniMaxAI/MiniMax-M2.5": 204800,
"XiaomiMiMo/MiMo-V2-Flash": 262144,
"mimo-v2-pro": 1048576,
"mimo-v2.5-pro": 1048576,
"mimo-v2.5": 1048576,
"mimo-v2-omni": 262144,
"mimo-v2-flash": 262144,
"XiaomiMiMo/MiMo-V2-Flash": 256000,
"mimo-v2-pro": 1000000,
"mimo-v2-omni": 256000,
"mimo-v2-flash": 256000,
"mimo-v2.5-pro": 1000000,
"mimo-v2.5": 1000000,
"zai-org/GLM-5": 202752,
}
@@ -515,7 +491,7 @@ def fetch_model_metadata(force_refresh: bool = False) -> Dict[str, Dict[str, Any
return _model_metadata_cache
try:
response = requests.get(OPENROUTER_MODELS_URL, timeout=10, verify=_resolve_requests_verify())
response = requests.get(OPENROUTER_MODELS_URL, timeout=10)
response.raise_for_status()
data = response.json()
@@ -582,7 +558,6 @@ def fetch_endpoint_model_metadata(
server_url.rstrip("/") + "/api/v1/models",
headers=headers,
timeout=10,
verify=_resolve_requests_verify(),
)
response.raise_for_status()
payload = response.json()
@@ -631,7 +606,7 @@ def fetch_endpoint_model_metadata(
for candidate in candidates:
url = candidate.rstrip("/") + "/models"
try:
response = requests.get(url, headers=headers, timeout=10, verify=_resolve_requests_verify())
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
payload = response.json()
cache: Dict[str, Dict[str, Any]] = {}
@@ -662,10 +637,9 @@ def fetch_endpoint_model_metadata(
try:
# Try /v1/props first (current llama.cpp); fall back to /props for older builds
base = candidate.rstrip("/").replace("/v1", "")
_verify = _resolve_requests_verify()
props_resp = requests.get(base + "/v1/props", headers=headers, timeout=5, verify=_verify)
props_resp = requests.get(base + "/v1/props", headers=headers, timeout=5)
if not props_resp.ok:
props_resp = requests.get(base + "/props", headers=headers, timeout=5, verify=_verify)
props_resp = requests.get(base + "/props", headers=headers, timeout=5)
if props_resp.ok:
props = props_resp.json()
gen_settings = props.get("default_generation_settings", {})
@@ -737,22 +711,6 @@ def get_cached_context_length(model: str, base_url: str) -> Optional[int]:
return cache.get(key)
def _invalidate_cached_context_length(model: str, base_url: str) -> None:
"""Drop a stale cache entry so it gets re-resolved on the next lookup."""
key = f"{model}@{base_url}"
cache = _load_context_cache()
if key not in cache:
return
del cache[key]
path = _get_context_cache_path()
try:
path.parent.mkdir(parents=True, exist_ok=True)
with open(path, "w") as f:
yaml.dump({"context_lengths": cache}, f, default_flow_style=False)
except Exception as e:
logger.debug("Failed to invalidate context length cache entry %s: %s", key, e)
def get_next_probe_tier(current_length: int) -> Optional[int]:
"""Return the next lower probe tier, or None if already at minimum."""
for tier in CONTEXT_PROBE_TIERS:
@@ -1030,7 +988,7 @@ def _query_anthropic_context_length(model: str, base_url: str, api_key: str) ->
"x-api-key": api_key,
"anthropic-version": "2023-06-01",
}
resp = requests.get(url, headers=headers, timeout=10, verify=_resolve_requests_verify())
resp = requests.get(url, headers=headers, timeout=10)
if resp.status_code != 200:
return None
data = resp.json()
@@ -1044,116 +1002,6 @@ def _query_anthropic_context_length(model: str, base_url: str, api_key: str) ->
return None
# Known ChatGPT Codex OAuth context windows (observed via live
# chatgpt.com/backend-api/codex/models probe, Apr 2026). These are the
# `context_window` values, which are what Codex actually enforces — the
# direct OpenAI API has larger limits for the same slugs, but Codex OAuth
# caps lower (e.g. gpt-5.5 is 1.05M on the API, 272K on Codex).
#
# Used as a fallback when the live probe fails (no token, network error).
# Longest keys first so substring match picks the most specific entry.
_CODEX_OAUTH_CONTEXT_FALLBACK: Dict[str, int] = {
"gpt-5.1-codex-max": 272_000,
"gpt-5.1-codex-mini": 272_000,
"gpt-5.3-codex": 272_000,
"gpt-5.2-codex": 272_000,
"gpt-5.4-mini": 272_000,
"gpt-5.5": 272_000,
"gpt-5.4": 272_000,
"gpt-5.2": 272_000,
"gpt-5": 272_000,
}
_codex_oauth_context_cache: Dict[str, int] = {}
_codex_oauth_context_cache_time: float = 0.0
_CODEX_OAUTH_CONTEXT_CACHE_TTL = 3600 # 1 hour
def _fetch_codex_oauth_context_lengths(access_token: str) -> Dict[str, int]:
"""Probe the ChatGPT Codex /models endpoint for per-slug context windows.
Codex OAuth imposes its own context limits that differ from the direct
OpenAI API (e.g. gpt-5.5 is 1.05M on the API, 272K on Codex). The
`context_window` field in each model entry is the authoritative source.
Returns a ``{slug: context_window}`` dict. Empty on failure.
"""
global _codex_oauth_context_cache, _codex_oauth_context_cache_time
now = time.time()
if (
_codex_oauth_context_cache
and now - _codex_oauth_context_cache_time < _CODEX_OAUTH_CONTEXT_CACHE_TTL
):
return _codex_oauth_context_cache
try:
resp = requests.get(
"https://chatgpt.com/backend-api/codex/models?client_version=1.0.0",
headers={"Authorization": f"Bearer {access_token}"},
timeout=10,
verify=_resolve_requests_verify(),
)
if resp.status_code != 200:
logger.debug(
"Codex /models probe returned HTTP %s; falling back to hardcoded defaults",
resp.status_code,
)
return {}
data = resp.json()
except Exception as exc:
logger.debug("Codex /models probe failed: %s", exc)
return {}
entries = data.get("models", []) if isinstance(data, dict) else []
result: Dict[str, int] = {}
for item in entries:
if not isinstance(item, dict):
continue
slug = item.get("slug")
ctx = item.get("context_window")
if isinstance(slug, str) and isinstance(ctx, int) and ctx > 0:
result[slug.strip()] = ctx
if result:
_codex_oauth_context_cache = result
_codex_oauth_context_cache_time = now
return result
def _resolve_codex_oauth_context_length(
model: str, access_token: str = ""
) -> Optional[int]:
"""Resolve a Codex OAuth model's real context window.
Prefers a live probe of chatgpt.com/backend-api/codex/models (when we
have a bearer token), then falls back to ``_CODEX_OAUTH_CONTEXT_FALLBACK``.
"""
model_bare = _strip_provider_prefix(model).strip()
if not model_bare:
return None
if access_token:
live = _fetch_codex_oauth_context_lengths(access_token)
if model_bare in live:
return live[model_bare]
# Case-insensitive match in case casing drifts
model_lower = model_bare.lower()
for slug, ctx in live.items():
if slug.lower() == model_lower:
return ctx
# Fallback: longest-key-first substring match over hardcoded defaults.
model_lower = model_bare.lower()
for slug, ctx in sorted(
_CODEX_OAUTH_CONTEXT_FALLBACK.items(), key=lambda x: len(x[0]), reverse=True
):
if slug in model_lower:
return ctx
return None
def _resolve_nous_context_length(model: str) -> Optional[int]:
"""Resolve Nous Portal model context length via OpenRouter metadata.
@@ -1199,7 +1047,6 @@ def get_model_context_length(
Resolution order:
0. Explicit config override (model.context_length or custom_providers per-model)
1. Persistent cache (previously discovered via probing)
1b. AWS Bedrock static table (must precede custom-endpoint probe)
2. Active endpoint metadata (/models for explicit custom endpoints)
3. Local server query (for local endpoints)
4. Anthropic /v1/models API (API-key users only, not OAuth)
@@ -1222,41 +1069,7 @@ def get_model_context_length(
if base_url:
cached = get_cached_context_length(model, base_url)
if cached is not None:
# Invalidate stale Codex OAuth cache entries: pre-PR #14935 builds
# resolved gpt-5.x to the direct-API value (e.g. 1.05M) via
# models.dev and persisted it. Codex OAuth caps at 272K for every
# slug, so any cached Codex entry at or above 400K is a leftover
# from the old resolution path. Drop it and fall through to the
# live /models probe in step 5 below.
if provider == "openai-codex" and cached >= 400_000:
logger.info(
"Dropping stale Codex cache entry %s@%s -> %s (pre-fix value); "
"re-resolving via live /models probe",
model, base_url, f"{cached:,}",
)
_invalidate_cached_context_length(model, base_url)
else:
return cached
# 1b. AWS Bedrock — use static context length table.
# Bedrock's ListFoundationModels API doesn't expose context window sizes,
# so we maintain a curated table in bedrock_adapter.py that reflects
# AWS-imposed limits (e.g. 200K for Claude models vs 1M on the native
# Anthropic API). This must run BEFORE the custom-endpoint probe at
# step 2 — bedrock-runtime.<region>.amazonaws.com is not in
# _URL_TO_PROVIDER, so it would otherwise be treated as a custom endpoint,
# fail the /models probe (Bedrock doesn't expose that shape), and fall
# back to the 128K default before reaching the original step 4b branch.
if provider == "bedrock" or (
base_url
and base_url_hostname(base_url).startswith("bedrock-runtime.")
and base_url_host_matches(base_url, "amazonaws.com")
):
try:
from agent.bedrock_adapter import get_bedrock_context_length
return get_bedrock_context_length(model)
except ImportError:
pass # boto3 not installed — fall through to generic resolution
return cached
# 2. Active endpoint metadata for truly custom/unknown endpoints.
# Known providers (Copilot, OpenAI, Anthropic, etc.) skip this — their
@@ -1303,7 +1116,19 @@ def get_model_context_length(
if ctx:
return ctx
# 4b. (Bedrock handled earlier at step 1b — before custom-endpoint probe.)
# 4b. AWS Bedrock — use static context length table.
# Bedrock's ListFoundationModels doesn't expose context window sizes,
# so we maintain a curated table in bedrock_adapter.py.
if provider == "bedrock" or (
base_url
and base_url_hostname(base_url).startswith("bedrock-runtime.")
and base_url_host_matches(base_url, "amazonaws.com")
):
try:
from agent.bedrock_adapter import get_bedrock_context_length
return get_bedrock_context_length(model)
except ImportError:
pass # boto3 not installed — fall through to generic resolution
# 5. Provider-aware lookups (before generic OpenRouter cache)
# These are provider-specific and take priority over the generic OR cache,
@@ -1317,32 +1142,10 @@ def get_model_context_length(
if inferred:
effective_provider = inferred
# 5a. Copilot live /models API — max_prompt_tokens from the user's account.
# This catches account-specific models (e.g. claude-opus-4.6-1m) that
# don't exist in models.dev. For models that ARE in models.dev, this
# returns the provider-enforced limit which is what users can actually use.
if effective_provider in ("copilot", "copilot-acp", "github-copilot"):
try:
from hermes_cli.models import get_copilot_model_context
ctx = get_copilot_model_context(model, api_key=api_key)
if ctx:
return ctx
except Exception:
pass # Fall through to models.dev
if effective_provider == "nous":
ctx = _resolve_nous_context_length(model)
if ctx:
return ctx
if effective_provider == "openai-codex":
# Codex OAuth enforces lower context limits than the direct OpenAI
# API for the same slug (e.g. gpt-5.5 is 1.05M on the API but 272K
# on Codex). Authoritative source is Codex's own /models endpoint.
codex_ctx = _resolve_codex_oauth_context_length(model, access_token=api_key or "")
if codex_ctx:
if base_url:
save_context_length(model, base_url, codex_ctx)
return codex_ctx
if effective_provider:
from agent.models_dev import lookup_models_dev_context
ctx = lookup_models_dev_context(effective_provider, model)
-190
View File
@@ -1,190 +0,0 @@
"""Helpers for translating OpenAI-style tool schemas to Moonshot's schema subset.
Moonshot (Kimi) accepts a stricter subset of JSON Schema than standard OpenAI
tool calling. Requests that violate it fail with HTTP 400:
tools.function.parameters is not a valid moonshot flavored json schema,
details: <...>
Known rejection modes documented at
https://forum.moonshot.ai/t/tool-calling-specification-violation-on-moonshot-api/102
and MoonshotAI/kimi-cli#1595:
1. Every property schema must carry a ``type``. Standard JSON Schema allows
type to be omitted (the value is then unconstrained); Moonshot refuses.
2. When ``anyOf`` is used, ``type`` must be on the ``anyOf`` children, not
the parent. Presence of both causes "type should be defined in anyOf
items instead of the parent schema".
The ``#/definitions/...`` → ``#/$defs/...`` rewrite for draft-07 refs is
handled separately in ``tools/mcp_tool._normalize_mcp_input_schema`` so it
applies at MCP registration time for all providers.
"""
from __future__ import annotations
import copy
from typing import Any, Dict, List
# Keys whose values are maps of name → schema (not schemas themselves).
# When we recurse, we walk the values of these maps as schemas, but we do
# NOT apply the missing-type repair to the map itself.
_SCHEMA_MAP_KEYS = frozenset({"properties", "patternProperties", "$defs", "definitions"})
# Keys whose values are lists of schemas.
_SCHEMA_LIST_KEYS = frozenset({"anyOf", "oneOf", "allOf", "prefixItems"})
# Keys whose values are a single nested schema.
_SCHEMA_NODE_KEYS = frozenset({"items", "contains", "not", "additionalProperties", "propertyNames"})
def _repair_schema(node: Any, is_schema: bool = True) -> Any:
"""Recursively apply Moonshot repairs to a schema node.
``is_schema=True`` means this dict is a JSON Schema node and gets the
missing-type + anyOf-parent repairs applied. ``is_schema=False`` means
it's a container map (e.g. the value of ``properties``) and we only
recurse into its values.
"""
if isinstance(node, list):
# Lists only show up under schema-list keys (anyOf/oneOf/allOf), so
# every element is itself a schema.
return [_repair_schema(item, is_schema=True) for item in node]
if not isinstance(node, dict):
return node
# Walk the dict, deciding per-key whether recursion is into a schema
# node, a container map, or a scalar.
repaired: Dict[str, Any] = {}
for key, value in node.items():
if key in _SCHEMA_MAP_KEYS and isinstance(value, dict):
# Map of name → schema. Don't treat the map itself as a schema
# (it has no type / properties of its own), but each value is.
repaired[key] = {
sub_key: _repair_schema(sub_val, is_schema=True)
for sub_key, sub_val in value.items()
}
elif key in _SCHEMA_LIST_KEYS and isinstance(value, list):
repaired[key] = [_repair_schema(v, is_schema=True) for v in value]
elif key in _SCHEMA_NODE_KEYS:
# items / not / additionalProperties: single nested schema.
# additionalProperties can also be a bool — leave those alone.
if isinstance(value, dict):
repaired[key] = _repair_schema(value, is_schema=True)
else:
repaired[key] = value
else:
# Scalars (description, title, format, enum values, etc.) pass through.
repaired[key] = value
if not is_schema:
return repaired
# Rule 2: when anyOf is present, type belongs only on the children.
if "anyOf" in repaired and isinstance(repaired["anyOf"], list):
repaired.pop("type", None)
return repaired
# Rule 1: property schemas without type need one. $ref nodes are exempt
# — their type comes from the referenced definition.
if "$ref" in repaired:
return repaired
return _fill_missing_type(repaired)
def _fill_missing_type(node: Dict[str, Any]) -> Dict[str, Any]:
"""Infer a reasonable ``type`` if this schema node has none."""
if "type" in node and node["type"] not in (None, ""):
return node
# Heuristic: presence of ``properties`` → object, ``items`` → array, ``enum``
# → type of first enum value, else fall back to ``string`` (safest scalar).
if "properties" in node or "required" in node or "additionalProperties" in node:
inferred = "object"
elif "items" in node or "prefixItems" in node:
inferred = "array"
elif "enum" in node and isinstance(node["enum"], list) and node["enum"]:
sample = node["enum"][0]
if isinstance(sample, bool):
inferred = "boolean"
elif isinstance(sample, int):
inferred = "integer"
elif isinstance(sample, float):
inferred = "number"
else:
inferred = "string"
else:
inferred = "string"
return {**node, "type": inferred}
def sanitize_moonshot_tool_parameters(parameters: Any) -> Dict[str, Any]:
"""Normalize tool parameters to a Moonshot-compatible object schema.
Returns a deep-copied schema with the two flavored-JSON-Schema repairs
applied. Input is not mutated.
"""
if not isinstance(parameters, dict):
return {"type": "object", "properties": {}}
repaired = _repair_schema(copy.deepcopy(parameters), is_schema=True)
if not isinstance(repaired, dict):
return {"type": "object", "properties": {}}
# Top-level must be an object schema
if repaired.get("type") != "object":
repaired["type"] = "object"
if "properties" not in repaired:
repaired["properties"] = {}
return repaired
def sanitize_moonshot_tools(tools: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
"""Apply ``sanitize_moonshot_tool_parameters`` to every tool's parameters."""
if not tools:
return tools
sanitized: List[Dict[str, Any]] = []
any_change = False
for tool in tools:
if not isinstance(tool, dict):
sanitized.append(tool)
continue
fn = tool.get("function")
if not isinstance(fn, dict):
sanitized.append(tool)
continue
params = fn.get("parameters")
repaired = sanitize_moonshot_tool_parameters(params)
if repaired is not params:
any_change = True
new_fn = {**fn, "parameters": repaired}
sanitized.append({**tool, "function": new_fn})
else:
sanitized.append(tool)
return sanitized if any_change else tools
def is_moonshot_model(model: str | None) -> bool:
"""True for any Kimi / Moonshot model slug, regardless of aggregator prefix.
Matches bare names (``kimi-k2.6``, ``moonshotai/Kimi-K2.6``) and aggregator-
prefixed slugs (``nous/moonshotai/kimi-k2.6``, ``openrouter/moonshotai/...``).
Detection by model name covers Nous / OpenRouter / other aggregators that
route to Moonshot's inference, where the base URL is the aggregator's, not
``api.moonshot.ai``.
"""
if not model:
return False
bare = model.strip().lower()
# Last path segment (covers aggregator-prefixed slugs)
tail = bare.rsplit("/", 1)[-1]
if tail.startswith("kimi-") or tail == "kimi":
return True
# Vendor-prefixed forms commonly used on aggregators
if "moonshot" in bare or "/kimi" in bare or bare.startswith("kimi"):
return True
return False
+135 -12
View File
@@ -1,29 +1,154 @@
"""Shared slash command helpers for skills.
"""Shared slash command helpers for skills and built-in prompt-style modes.
Shared between CLI (cli.py) and gateway (gateway/run.py) so both surfaces
can invoke skills via /skill-name commands.
can invoke skills via /skill-name commands and prompt-only built-ins like
/plan.
"""
import json
import logging
import re
import subprocess
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, Optional
from hermes_constants import display_hermes_home
from agent.skill_preprocessing import (
expand_inline_shell as _expand_inline_shell,
load_skills_config as _load_skills_config,
substitute_template_vars as _substitute_template_vars,
)
logger = logging.getLogger(__name__)
_skill_commands: Dict[str, Dict[str, Any]] = {}
_PLAN_SLUG_RE = re.compile(r"[^a-z0-9]+")
# Patterns for sanitizing skill names into clean hyphen-separated slugs.
_SKILL_INVALID_CHARS = re.compile(r"[^a-z0-9-]")
_SKILL_MULTI_HYPHEN = re.compile(r"-{2,}")
# Matches ${HERMES_SKILL_DIR} / ${HERMES_SESSION_ID} tokens in SKILL.md.
# Tokens that don't resolve (e.g. ${HERMES_SESSION_ID} with no session) are
# left as-is so the user can debug them.
_SKILL_TEMPLATE_RE = re.compile(r"\$\{(HERMES_SKILL_DIR|HERMES_SESSION_ID)\}")
# Matches inline shell snippets like: !`date +%Y-%m-%d`
# Non-greedy, single-line only — no newlines inside the backticks.
_INLINE_SHELL_RE = re.compile(r"!`([^`\n]+)`")
# Cap inline-shell output so a runaway command can't blow out the context.
_INLINE_SHELL_MAX_OUTPUT = 4000
def _load_skills_config() -> dict:
"""Load the ``skills`` section of config.yaml (best-effort)."""
try:
from hermes_cli.config import load_config
cfg = load_config() or {}
skills_cfg = cfg.get("skills")
if isinstance(skills_cfg, dict):
return skills_cfg
except Exception:
logger.debug("Could not read skills config", exc_info=True)
return {}
def _substitute_template_vars(
content: str,
skill_dir: Path | None,
session_id: str | None,
) -> str:
"""Replace ${HERMES_SKILL_DIR} / ${HERMES_SESSION_ID} in skill content.
Only substitutes tokens for which a concrete value is available
unresolved tokens are left in place so the author can spot them.
"""
if not content:
return content
skill_dir_str = str(skill_dir) if skill_dir else None
def _replace(match: re.Match) -> str:
token = match.group(1)
if token == "HERMES_SKILL_DIR" and skill_dir_str:
return skill_dir_str
if token == "HERMES_SESSION_ID" and session_id:
return str(session_id)
return match.group(0)
return _SKILL_TEMPLATE_RE.sub(_replace, content)
def _run_inline_shell(command: str, cwd: Path | None, timeout: int) -> str:
"""Execute a single inline-shell snippet and return its stdout (trimmed).
Failures return a short ``[inline-shell error: ...]`` marker instead of
raising, so one bad snippet can't wreck the whole skill message.
"""
try:
completed = subprocess.run(
["bash", "-c", command],
cwd=str(cwd) if cwd else None,
capture_output=True,
text=True,
timeout=max(1, int(timeout)),
check=False,
)
except subprocess.TimeoutExpired:
return f"[inline-shell timeout after {timeout}s: {command}]"
except FileNotFoundError:
return f"[inline-shell error: bash not found]"
except Exception as exc:
return f"[inline-shell error: {exc}]"
output = (completed.stdout or "").rstrip("\n")
if not output and completed.stderr:
output = completed.stderr.rstrip("\n")
if len(output) > _INLINE_SHELL_MAX_OUTPUT:
output = output[:_INLINE_SHELL_MAX_OUTPUT] + "…[truncated]"
return output
def _expand_inline_shell(
content: str,
skill_dir: Path | None,
timeout: int,
) -> str:
"""Replace every !`cmd` snippet in ``content`` with its stdout.
Runs each snippet with the skill directory as CWD so relative paths in
the snippet work the way the author expects.
"""
if "!`" not in content:
return content
def _replace(match: re.Match) -> str:
cmd = match.group(1).strip()
if not cmd:
return ""
return _run_inline_shell(cmd, skill_dir, timeout)
return _INLINE_SHELL_RE.sub(_replace, content)
def build_plan_path(
user_instruction: str = "",
*,
now: datetime | None = None,
) -> Path:
"""Return the default workspace-relative markdown path for a /plan invocation.
Relative paths are intentional: file tools are task/backend-aware and resolve
them against the active working directory for local, docker, ssh, modal,
daytona, and similar terminal backends. That keeps the plan with the active
workspace instead of the Hermes host's global home directory.
"""
slug_source = (user_instruction or "").strip().splitlines()[0] if user_instruction else ""
slug = _PLAN_SLUG_RE.sub("-", slug_source.lower()).strip("-")
if slug:
slug = "-".join(part for part in slug.split("-")[:8] if part)[:48].strip("-")
slug = slug or "conversation-plan"
timestamp = (now or datetime.now()).strftime("%Y-%m-%d_%H%M%S")
return Path(".hermes") / "plans" / f"{timestamp}-{slug}.md"
def _load_skill_payload(skill_identifier: str, task_id: str | None = None) -> tuple[dict[str, Any], Path | None, str] | None:
"""Load a skill by name/path and return (loaded_payload, skill_dir, display_name)."""
raw_identifier = (skill_identifier or "").strip()
@@ -42,9 +167,7 @@ def _load_skill_payload(skill_identifier: str, task_id: str | None = None) -> tu
else:
normalized = raw_identifier.lstrip("/")
loaded_skill = json.loads(
skill_view(normalized, task_id=task_id, preprocess=False)
)
loaded_skill = json.loads(skill_view(normalized, task_id=task_id))
except Exception:
return None
@@ -222,7 +345,7 @@ def scan_skill_commands() -> Dict[str, Dict[str, Any]]:
_skill_commands = {}
try:
from tools.skills_tool import SKILLS_DIR, _parse_frontmatter, skill_matches_platform, _get_disabled_skill_names
from agent.skill_utils import get_external_skills_dirs, iter_skill_index_files
from agent.skill_utils import get_external_skills_dirs
disabled = _get_disabled_skill_names()
seen_names: set = set()
@@ -233,7 +356,7 @@ def scan_skill_commands() -> Dict[str, Dict[str, Any]]:
dirs_to_scan.extend(get_external_skills_dirs())
for scan_dir in dirs_to_scan:
for skill_md in iter_skill_index_files(scan_dir, "SKILL.md"):
for skill_md in scan_dir.rglob("SKILL.md"):
if any(part in ('.git', '.github', '.hub') for part in skill_md.parts):
continue
try:
-131
View File
@@ -1,131 +0,0 @@
"""Shared SKILL.md preprocessing helpers."""
import logging
import re
import subprocess
from pathlib import Path
logger = logging.getLogger(__name__)
# Matches ${HERMES_SKILL_DIR} / ${HERMES_SESSION_ID} tokens in SKILL.md.
# Tokens that don't resolve (e.g. ${HERMES_SESSION_ID} with no session) are
# left as-is so the user can debug them.
_SKILL_TEMPLATE_RE = re.compile(r"\$\{(HERMES_SKILL_DIR|HERMES_SESSION_ID)\}")
# Matches inline shell snippets like: !`date +%Y-%m-%d`
# Non-greedy, single-line only -- no newlines inside the backticks.
_INLINE_SHELL_RE = re.compile(r"!`([^`\n]+)`")
# Cap inline-shell output so a runaway command can't blow out the context.
_INLINE_SHELL_MAX_OUTPUT = 4000
def load_skills_config() -> dict:
"""Load the ``skills`` section of config.yaml (best-effort)."""
try:
from hermes_cli.config import load_config
cfg = load_config() or {}
skills_cfg = cfg.get("skills")
if isinstance(skills_cfg, dict):
return skills_cfg
except Exception:
logger.debug("Could not read skills config", exc_info=True)
return {}
def substitute_template_vars(
content: str,
skill_dir: Path | None,
session_id: str | None,
) -> str:
"""Replace ${HERMES_SKILL_DIR} / ${HERMES_SESSION_ID} in skill content.
Only substitutes tokens for which a concrete value is available --
unresolved tokens are left in place so the author can spot them.
"""
if not content:
return content
skill_dir_str = str(skill_dir) if skill_dir else None
def _replace(match: re.Match) -> str:
token = match.group(1)
if token == "HERMES_SKILL_DIR" and skill_dir_str:
return skill_dir_str
if token == "HERMES_SESSION_ID" and session_id:
return str(session_id)
return match.group(0)
return _SKILL_TEMPLATE_RE.sub(_replace, content)
def run_inline_shell(command: str, cwd: Path | None, timeout: int) -> str:
"""Execute a single inline-shell snippet and return its stdout (trimmed).
Failures return a short ``[inline-shell error: ...]`` marker instead of
raising, so one bad snippet can't wreck the whole skill message.
"""
try:
completed = subprocess.run(
["bash", "-c", command],
cwd=str(cwd) if cwd else None,
capture_output=True,
text=True,
timeout=max(1, int(timeout)),
check=False,
)
except subprocess.TimeoutExpired:
return f"[inline-shell timeout after {timeout}s: {command}]"
except FileNotFoundError:
return "[inline-shell error: bash not found]"
except Exception as exc:
return f"[inline-shell error: {exc}]"
output = (completed.stdout or "").rstrip("\n")
if not output and completed.stderr:
output = completed.stderr.rstrip("\n")
if len(output) > _INLINE_SHELL_MAX_OUTPUT:
output = output[:_INLINE_SHELL_MAX_OUTPUT] + "...[truncated]"
return output
def expand_inline_shell(
content: str,
skill_dir: Path | None,
timeout: int,
) -> str:
"""Replace every !`cmd` snippet in ``content`` with its stdout.
Runs each snippet with the skill directory as CWD so relative paths in
the snippet work the way the author expects.
"""
if "!`" not in content:
return content
def _replace(match: re.Match) -> str:
cmd = match.group(1).strip()
if not cmd:
return ""
return run_inline_shell(cmd, skill_dir, timeout)
return _INLINE_SHELL_RE.sub(_replace, content)
def preprocess_skill_content(
content: str,
skill_dir: Path | None,
session_id: str | None = None,
skills_cfg: dict | None = None,
) -> str:
"""Apply configured SKILL.md template and inline-shell preprocessing."""
if not content:
return content
cfg = skills_cfg if isinstance(skills_cfg, dict) else load_skills_config()
if cfg.get("template_vars", True):
content = substitute_template_vars(content, skill_dir, session_id)
if cfg.get("inline_shell", False):
timeout = int(cfg.get("inline_shell_timeout", 10) or 10)
content = expand_inline_shell(content, skill_dir, timeout)
return content
+2 -1
View File
@@ -455,7 +455,8 @@ def parse_qualified_name(name: str) -> Tuple[Optional[str], str]:
"""
if ":" not in name:
return None, name
return tuple(name.split(":", 1)) # type: ignore[return-value]
ns, bare = name.split(":", 1)
return ns, bare
def is_valid_namespace(candidate: Optional[str]) -> bool:
-6
View File
@@ -12,7 +12,6 @@ reasoning configuration, temperature handling, and extra_body assembly.
import copy
from typing import Any, Dict, List, Optional
from agent.moonshot_schema import is_moonshot_model, sanitize_moonshot_tools
from agent.prompt_builder import DEVELOPER_ROLE_MODELS
from agent.transports.base import ProviderTransport
from agent.transports.types import NormalizedResponse, ToolCall, Usage
@@ -173,11 +172,6 @@ class ChatCompletionsTransport(ProviderTransport):
# Tools
if tools:
# Moonshot/Kimi uses a stricter flavored JSON Schema. Rewriting
# tool parameters here keeps aggregator routes (Nous, OpenRouter,
# etc.) compatible, in addition to direct moonshot.ai endpoints.
if is_moonshot_model(model):
tools = sanitize_moonshot_tools(tools)
api_kwargs["tools"] = tools
# max_tokens resolution — priority: ephemeral > user > provider default
-14
View File
@@ -61,20 +61,6 @@ class ToolCall:
"""Codex response_item_id from provider_data."""
return (self.provider_data or {}).get("response_item_id")
@property
def extra_content(self) -> Optional[Dict[str, Any]]:
"""Gemini extra_content (thought_signature) from provider_data.
Gemini 3 thinking models attach ``extra_content`` with a
``thought_signature`` to each tool call. This signature must be
replayed on subsequent API calls without it the API rejects the
request with HTTP 400. The chat_completions transport stores this
in ``provider_data["extra_content"]``; this property exposes it so
``_build_assistant_message`` can ``getattr(tc, "extra_content")``
uniformly.
"""
return (self.provider_data or {}).get("extra_content")
@dataclass
class Usage:
-58
View File
@@ -1,58 +0,0 @@
# Hermes Apps
Platform apps live here. The first app is a cross-platform GUI shell around the
existing Hermes dashboard; it should not fork chat, config, logs, or session UI.
## Shape
```text
apps/
gui/ # cross-platform app shell: dev Chrome shell now, Tauri native next
shared/ # runtime bundle notes/scripts used by Windows + macOS packaging
```
## Desktop Dev
The backend-only GUI mode is:
```bash
hermes dashboard --gui
```
The fast GUI shell is:
```powershell
cd \\wsl$\Ubuntu\home\bb\hermes-agent\apps\gui
npm run dev
```
The native Tauri shell is:
```powershell
cd \\wsl$\Ubuntu\home\bb\hermes-agent\apps\gui
npm run dev:tauri
```
`--gui` implies the embedded TUI; do not pass `--tui` separately for GUI mode.
## MVP Boundary
Included:
- bundled Python runtime
- bundled Node/TUI runtime
- CLI install to PATH
- profile picker and first-run setup
- dashboard health/reconnect state
- tray controls
- desktop notifications
- Windows installer
Deferred:
- code signing
- native self-updater
- store distribution
For MVP updates, the desktop UI should run the existing `hermes update` flow and
surface progress/finish notifications.
-102
View File
@@ -1,102 +0,0 @@
# Hermes GUI
Cross-platform GUI shell for the Hermes dashboard.
## Fast Dev Shell
This gets a GUI window on Windows/WSL today by launching Chrome in app mode:
```bash
cd apps/gui
npm run dev
```
It starts `hermes dashboard --gui --no-open --port 9120`, waits for
`/api/health`, then opens a standalone app window at `http://127.0.0.1:9120`.
## Native Shell
The native Tauri shell is still scaffolded:
```bash
cd apps/gui
npm run dev:tauri
```
From Windows PowerShell on a `\\wsl$` path, use PowerShell `npm`, not
`npm.cmd`:
```powershell
Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass -Force
cd \\wsl$\Ubuntu\home\bb\hermes-agent\apps\gui
npm run dev:tauri
```
`npm.cmd` goes through `cmd.exe`, and `cmd.exe` cannot use UNC paths as the
current directory.
If `npm run` still falls through `cmd.exe`, bypass npm entirely:
```powershell
\\wsl$\Ubuntu\home\bb\hermes-agent\apps\gui\dev-tauri.ps1
```
The launcher builds into `%LOCALAPPDATA%\Hermes\cargo-target\gui` instead of
`\\wsl$` because Windows Cargo incremental locks do not work reliably on UNC
WSL filesystems.
In dev, either start Hermes yourself:
```bash
hermes dashboard --gui --no-open --port 9120
```
or let the native shell start it. The tray menu owns:
- Open Hermes
- Open in Browser
- Restart Hermes Runtime
- Quit Hermes
The native shell reuses a healthy GUI runtime when one is already running.
Otherwise it picks the first free port from `9120..9139`, passes that port into
the WSL/backend process, and navigates the Tauri window there. Set
`HERMES_GUI_PORT` to force a starting port.
## Fresh Install Emulation
Use an isolated Hermes home without touching your real `~/.hermes`:
```powershell
powershell.exe -ExecutionPolicy Bypass -File \\wsl$\Ubuntu\home\bb\hermes-agent\apps\gui\dev-tauri.ps1 -Fresh
```
Reset that disposable home and run again:
```powershell
powershell.exe -ExecutionPolicy Bypass -File \\wsl$\Ubuntu\home\bb\hermes-agent\apps\gui\dev-tauri.ps1 -Fresh -ResetFresh
```
Fresh mode stores state in `%LOCALAPPDATA%\Hermes\fresh-install-home` and starts
from port `9140` so it does not collide with your normal GUI dev session.
Set `HERMES_GUI_MIN_SPLASH_MS` only when debugging the startup screen; default
startup is instant once the backend is healthy.
## Boundary
GUI owns:
- app shell/window
- startup state
- sidecar process lifecycle
- future tray/notifications/installers
Hermes owns:
- dashboard UI
- auth/session token
- profiles/config/env
- TUI/PTT chat bridge
- tools/skills/gateway
- update flow
-57
View File
@@ -1,57 +0,0 @@
param(
[string]$Command = "dev",
[switch]$Fresh,
[switch]$ResetFresh
)
$ErrorActionPreference = "Stop"
Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass -Force
$AppRoot = Split-Path -Parent $MyInvocation.MyCommand.Path
$Script = Join-Path $AppRoot "scripts\tauri.mjs"
if (-not (Get-Command node -ErrorAction SilentlyContinue)) {
throw "Windows Node.js was not found. Install it with: winget install OpenJS.NodeJS.LTS"
}
if (-not (Get-Command rustc -ErrorAction SilentlyContinue)) {
throw "Windows Rust was not found. Install it with: winget install Rustlang.Rustup"
}
$Tauri = Get-Command tauri -ErrorAction SilentlyContinue
$CargoTauri = Get-Command cargo-tauri -ErrorAction SilentlyContinue
if (-not $Tauri -and -not $CargoTauri) {
throw "Tauri CLI not found. Install it with: npm install -g @tauri-apps/cli (run from a normal Windows path, not \\wsl$)"
}
$env:CARGO_INCREMENTAL = "0"
$env:CARGO_TARGET_DIR = Join-Path $env:LOCALAPPDATA "Hermes\cargo-target\gui"
New-Item -ItemType Directory -Force -Path $env:CARGO_TARGET_DIR | Out-Null
if ($Fresh) {
$FreshHome = Join-Path $env:LOCALAPPDATA "Hermes\fresh-install-home"
if ($ResetFresh -and (Test-Path $FreshHome)) {
Remove-Item -Recurse -Force $FreshHome
}
New-Item -ItemType Directory -Force -Path $FreshHome | Out-Null
$env:HERMES_HOME = $FreshHome
$env:HERMES_GUI_PORT = "9140"
$env:HERMES_GUI_FRESH = "1"
Write-Host "Fresh GUI mode"
Write-Host " HERMES_HOME=$FreshHome"
Write-Host " HERMES_GUI_PORT=$env:HERMES_GUI_PORT"
}
Push-Location $AppRoot
try {
if ($Tauri) {
& tauri $Command
}
else {
& cargo tauri $Command
}
}
finally {
Pop-Location
}
-13
View File
@@ -1,13 +0,0 @@
{
"name": "@hermes/gui",
"version": "0.0.0",
"private": true,
"type": "module",
"scripts": {
"dev": "node scripts/dev-shell.mjs",
"dev:tauri": "node scripts/tauri.mjs dev",
"build": "node scripts/tauri.mjs build",
"dashboard": "node scripts/start-dashboard.mjs",
"tauri": "node scripts/tauri.mjs"
}
}
-156
View File
@@ -1,156 +0,0 @@
import { spawn, spawnSync } from "node:child_process";
import { createServer } from "node:net";
import { dirname, resolve } from "node:path";
import { setTimeout as delay } from "node:timers/promises";
import { fileURLToPath } from "node:url";
const here = dirname(fileURLToPath(import.meta.url));
const repoRoot = resolve(here, "../../..");
const python = process.env.HERMES_PYTHON || "python";
let port = process.env.HERMES_GUI_PORT || "9120";
let url = `http://127.0.0.1:${port}`;
let dashboard = null;
function stop() {
if (dashboard && !dashboard.killed) dashboard.kill();
}
process.on("SIGINT", () => {
stop();
process.exit(130);
});
process.on("SIGTERM", () => {
stop();
process.exit(143);
});
process.on("exit", stop);
async function waitForHealth() {
for (let i = 0; i < 120; i += 1) {
if (await isHealthy()) return true;
await delay(500);
}
return false;
}
async function isHealthy() {
try {
const res = await fetch(`${url}/api/health`, {
signal: AbortSignal.timeout(1000),
});
const data = await res.json();
return res.ok && data.status === "ok";
} catch {
return false;
}
}
function canBind(candidate) {
return new Promise((resolveBind) => {
const server = createServer();
server.once("error", () => resolveBind(false));
server.listen(Number(candidate), "127.0.0.1", () => {
server.close(() => resolveBind(true));
});
});
}
async function choosePort() {
if (process.env.HERMES_GUI_PORT) return;
let candidate = Number(port);
for (let i = 0; i < 20; i += 1) {
if (await canBind(candidate)) {
port = String(candidate);
url = `http://127.0.0.1:${port}`;
return;
}
candidate += 1;
}
}
function startDashboard() {
dashboard = spawn(
python,
[
"-m",
"hermes_cli.main",
"dashboard",
"--gui",
"--no-open",
"--host",
"127.0.0.1",
"--port",
port,
],
{
cwd: repoRoot,
env: {
...process.env,
HERMES_GUI: "1",
},
stdio: "inherit",
},
);
dashboard.on("exit", (code) => {
process.exit(code ?? 0);
});
}
function run(command, args) {
return (
spawnSync(command, args, {
shell: process.platform === "win32",
stdio: "ignore",
}).status === 0
);
}
function openGuiWindow() {
if (process.platform === "win32") {
return (
run("cmd.exe", ["/C", "start", "", "chrome", `--app=${url}`]) ||
run("cmd.exe", ["/C", "start", "", "msedge", `--app=${url}`]) ||
run("cmd.exe", ["/C", "start", "", url])
);
}
if (process.env.WSL_DISTRO_NAME) {
return (
run("cmd.exe", ["/C", "start", "", "chrome", `--app=${url}`]) ||
run("cmd.exe", ["/C", "start", "", "msedge", `--app=${url}`]) ||
run("cmd.exe", ["/C", "start", "", url])
);
}
if (process.platform === "darwin") {
return (
run("open", ["-na", "Google Chrome", "--args", `--app=${url}`]) ||
run("open", [url])
);
}
return (
run("google-chrome", [`--app=${url}`]) ||
run("chromium", [`--app=${url}`]) ||
run("xdg-open", [url])
);
}
if (await isHealthy()) {
console.log(`Hermes GUI already running -> ${url}`);
openGuiWindow();
process.exit(0);
}
await choosePort();
startDashboard();
if (await waitForHealth()) {
console.log(`Hermes GUI -> ${url}`);
openGuiWindow();
} else {
console.error(`Hermes GUI did not become healthy at ${url}`);
}
-95
View File
@@ -1,95 +0,0 @@
import { spawn } from "node:child_process";
import { dirname, resolve } from "node:path";
import { fileURLToPath } from "node:url";
const here = dirname(fileURLToPath(import.meta.url));
const repoRoot = resolve(here, "../../..");
const python = process.env.HERMES_PYTHON || "python";
const port = process.env.HERMES_GUI_PORT || "9120";
const url = `http://127.0.0.1:${port}`;
async function isHealthy() {
try {
const res = await fetch(`${url}/api/health`, {
signal: AbortSignal.timeout(1000),
});
const data = await res.json();
return res.ok && data.status === "ok";
} catch {
return false;
}
}
function wslRepoRoot() {
const normalized = repoRoot.replaceAll("\\", "/");
const parts = normalized.split("/");
const host = parts[2]?.toLowerCase();
if (process.platform !== "win32") return null;
if (host !== "wsl$" && host !== "wsl.localhost") return null;
const distro = parts[3];
const path = `/${parts.slice(4).join("/")}`;
return distro && path !== "/" ? { distro, path } : null;
}
function spawnDashboard() {
const wsl = wslRepoRoot();
if (wsl) {
return spawn(
"wsl.exe",
[
"-d",
wsl.distro,
"--cd",
wsl.path,
"env",
"HERMES_GUI=1",
process.env.HERMES_WSL_PYTHON || "python",
"-m",
"hermes_cli.main",
"dashboard",
"--gui",
"--no-open",
"--host",
"127.0.0.1",
"--port",
port,
],
{ stdio: "inherit" },
);
}
return spawn(
python,
[
"-m",
"hermes_cli.main",
"dashboard",
"--gui",
"--no-open",
"--host",
"127.0.0.1",
"--port",
port,
],
{
cwd: repoRoot,
env: {
...process.env,
HERMES_GUI: "1",
},
stdio: "inherit",
},
);
}
if (await isHealthy()) {
console.log(`Hermes GUI already running -> ${url}`);
process.exit(0);
}
const child = spawnDashboard();
child.on("exit", (code, signal) => {
if (signal) process.kill(process.pid, signal);
process.exit(code ?? 0);
});
-90
View File
@@ -1,90 +0,0 @@
import { spawnSync } from "node:child_process";
import { existsSync } from "node:fs";
import { dirname, resolve } from "node:path";
import { fileURLToPath } from "node:url";
const here = dirname(fileURLToPath(import.meta.url));
const appRoot = resolve(here, "..");
const bin = process.platform === "win32" ? "tauri.cmd" : "tauri";
const localTauri = resolve(appRoot, "node_modules", ".bin", bin);
const args = process.argv.slice(2);
function isWsl() {
return process.platform === "linux" && !!process.env.WSL_DISTRO_NAME;
}
function quotePs(value) {
return `'${value.replaceAll("'", "''")}'`;
}
function dispatchToWindows() {
const pathResult = spawnSync("wslpath", ["-w", appRoot], {
encoding: "utf8",
});
const windowsPath = pathResult.stdout.trim();
if (!windowsPath) return false;
const command = [
"$ErrorActionPreference = 'Stop'",
"Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass -Force",
"if (-not (Get-Command npm -ErrorAction SilentlyContinue)) {",
' Write-Error "Windows npm was not found. Install Windows Node.js first: winget install OpenJS.NodeJS.LTS"',
"}",
"if (-not (Get-Command rustc -ErrorAction SilentlyContinue)) {",
' Write-Error "Windows Rust was not found. Install Rust first: winget install Rustlang.Rustup"',
"}",
`Set-Location -LiteralPath ${quotePs(windowsPath)}`,
"& npm run dev:tauri",
].join("; ");
const result = spawnSync(
"powershell.exe",
["-NoProfile", "-ExecutionPolicy", "Bypass", "-Command", command],
{ stdio: "inherit" },
);
process.exit(result.status ?? 1);
}
function run(command, commandArgs, { exit = true } = {}) {
if (process.platform === "win32") {
const psCommand = [
"$ErrorActionPreference = 'Stop'",
"Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass -Force",
`Set-Location -LiteralPath ${quotePs(appRoot)}`,
`& ${quotePs(command)} ${commandArgs.map(quotePs).join(" ")}`,
].join("; ");
const result = spawnSync(
"powershell.exe",
["-NoProfile", "-ExecutionPolicy", "Bypass", "-Command", psCommand],
{ stdio: "inherit" },
);
if (result.error && result.error.code === "ENOENT") return false;
if (exit) process.exit(result.status ?? 1);
return result.status === 0;
}
const result = spawnSync(command, commandArgs, {
cwd: appRoot,
env: process.env,
stdio: "inherit",
});
if (result.error && result.error.code === "ENOENT") return false;
if (exit) process.exit(result.status ?? 1);
return result.status === 0;
}
if (isWsl() && process.env.HERMES_GUI_TAURI_WSL !== "1") {
console.log("Launching native Windows Tauri from WSL...");
dispatchToWindows();
console.error(
"Could not hand off to Windows PowerShell. Run this from Windows PowerShell instead:",
);
console.error(" cd \\\\wsl$\\Ubuntu\\home\\bb\\hermes-agent\\apps\\gui");
console.error(" npm run dev:tauri");
process.exit(1);
}
if (existsSync(localTauri)) run(localTauri, args);
if (run("tauri", args, { exit: false })) process.exit(0);
if (run("cargo", ["tauri", ...args], { exit: false })) process.exit(0);
run("npx", ["--yes", "@tauri-apps/cli@latest", ...args]);
-1
View File
@@ -1 +0,0 @@
/target/
-5579
View File
File diff suppressed because it is too large Load Diff
-17
View File
@@ -1,17 +0,0 @@
[package]
name = "hermes-gui"
version = "0.0.0"
description = "Hermes GUI shell"
edition = "2021"
[lib]
name = "hermes_gui_lib"
crate-type = ["staticlib", "cdylib", "rlib"]
[build-dependencies]
tauri-build = { version = "2", features = [] }
[dependencies]
tauri = { version = "2", features = ["tray-icon"] }
tauri-plugin-notification = "2"
tauri-plugin-opener = "2"
-3
View File
@@ -1,3 +0,0 @@
fn main() {
tauri_build::build();
}
@@ -1,7 +0,0 @@
{
"$schema": "../gen/schemas/desktop-schema.json",
"identifier": "default",
"description": "Default Hermes GUI permissions",
"windows": ["main"],
"permissions": ["core:default", "notification:default", "opener:default"]
}
File diff suppressed because one or more lines are too long
@@ -1 +0,0 @@
{"default":{"identifier":"default","description":"Default Hermes GUI permissions","local":true,"windows":["main"],"permissions":["core:default","notification:default","opener:default"]}}
File diff suppressed because it is too large Load Diff
File diff suppressed because it is too large Load Diff
Binary file not shown.

Before

Width:  |  Height:  |  Size: 135 B

Binary file not shown.

Before

Width:  |  Height:  |  Size: 1.1 KiB

-4
View File
@@ -1,4 +0,0 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">
<rect width="100" height="100" rx="18" fill="#071313"/>
<text x="50" y="70" text-anchor="middle" font-size="68" fill="#f0e6d2"></text>
</svg>

Before

Width:  |  Height:  |  Size: 212 B

-1
View File
@@ -1 +0,0 @@
-433
View File
@@ -1,433 +0,0 @@
use std::{
io::{Read, Write},
net::{TcpListener, TcpStream},
process::{Child, Command, Stdio},
sync::Mutex,
time::{Duration, Instant},
};
use tauri::{
image::Image,
menu::{Menu, MenuItem, PredefinedMenuItem},
tray::{MouseButton, MouseButtonState, TrayIconBuilder, TrayIconEvent},
App, AppHandle, Manager, WebviewWindow,
};
const GUI_HOST: &str = "127.0.0.1";
const DEFAULT_GUI_PORT: u16 = 9120;
const MIN_SPLASH_MS: u64 = 0;
const SPLASH_URL: &str = "data:text/html,%3C!doctype%20html%3E%3Cmeta%20charset%3Dutf-8%3E%3Cstyle%3Ebody%7Bmargin%3A0%3Bheight%3A100vh%3Bdisplay%3Agrid%3Bplace-items%3Acenter%3Bbackground%3A%23071313%3Bcolor%3A%23f0e6d2%3Bfont%3A14px%20monospace%3Bletter-spacing%3A.08em%3Btext-transform%3Auppercase%7D%3C%2Fstyle%3E%3Cbody%3EStarting%20Hermes%E2%80%A6%3C%2Fbody%3E";
struct GuiState {
child: Mutex<Option<Child>>,
port: Mutex<u16>,
}
fn gui_url(port: u16) -> String {
format!("http://{GUI_HOST}:{port}")
}
fn check_health(port: u16) -> bool {
let Ok(mut stream) = TcpStream::connect_timeout(
&format!("{GUI_HOST}:{port}").parse().unwrap(),
Duration::from_secs(1),
) else {
return false;
};
let _ = stream.set_read_timeout(Some(Duration::from_secs(1)));
let request =
format!("GET /api/health HTTP/1.1\r\nHost: {GUI_HOST}:{port}\r\nConnection: close\r\n\r\n");
if stream.write_all(request.as_bytes()).is_err() {
return false;
}
let mut response = String::new();
let _ = stream.read_to_string(&mut response);
response.contains("200 OK")
&& response.contains("\"status\":\"ok\"")
&& response.contains("\"mode\":\"gui\"")
}
fn can_bind(port: u16) -> bool {
TcpListener::bind((GUI_HOST, port)).is_ok()
}
fn base_port() -> u16 {
std::env::var("HERMES_GUI_PORT")
.ok()
.and_then(|raw| raw.parse().ok())
.unwrap_or(DEFAULT_GUI_PORT)
}
fn select_port() -> u16 {
let start = base_port();
for port in start..start.saturating_add(20) {
if check_health(port) || can_bind(port) {
return port;
}
}
start
}
fn repo_root() -> std::path::PathBuf {
std::path::PathBuf::from(env!("CARGO_MANIFEST_DIR"))
.join("../../..")
.canonicalize()
.unwrap_or_else(|_| std::path::PathBuf::from("."))
}
fn runtime_dir() -> Option<std::path::PathBuf> {
std::env::var_os("HERMES_GUI_RUNTIME_DIR").map(std::path::PathBuf::from)
}
fn runtime_python(runtime: &std::path::Path) -> std::path::PathBuf {
if cfg!(target_os = "windows") {
runtime.join("venv").join("Scripts").join("python.exe")
} else {
runtime.join("venv").join("bin").join("python")
}
}
fn wsl_path(root: &std::path::Path) -> Option<(String, String)> {
let raw = root.to_string_lossy().replace('\\', "/");
let parts: Vec<&str> = raw.split('/').collect();
let host = parts.get(2)?.to_ascii_lowercase();
if host != "wsl$" && host != "wsl.localhost" {
return None;
}
let distro = parts.get(3)?.to_string();
let path = format!("/{}", parts.get(4..)?.join("/"));
Some((distro, path))
}
fn start_dashboard(port: u16) -> std::io::Result<Child> {
if let Some(runtime) = runtime_dir() {
let python = runtime_python(&runtime);
let web_dist = runtime.join("web_dist");
let tui_dir = runtime.join("ui-tui");
let port = port.to_string();
return Command::new(python)
.args([
"-m",
"hermes_cli.main",
"dashboard",
"--gui",
"--no-open",
"--host",
GUI_HOST,
"--port",
&port,
])
.env("HERMES_GUI", "1")
.env("HERMES_GUI_PORT", &port)
.env("HERMES_WEB_DIST", web_dist)
.env("HERMES_TUI_DIR", tui_dir)
.envs(
std::env::vars()
.filter(|(key, _)| matches!(key.as_str(), "HERMES_HOME" | "HERMES_GUI_FRESH")),
)
.stdin(Stdio::null())
.stdout(Stdio::null())
.stderr(Stdio::null())
.spawn();
}
let root = repo_root();
let port = port.to_string();
if let Some((distro, path)) = wsl_path(&root) {
let port_env = format!("HERMES_GUI_PORT={port}");
let mut env_args = vec!["HERMES_GUI=1".to_string(), port_env];
if let Ok(home) = std::env::var("HERMES_HOME") {
env_args.push(format!("HERMES_HOME={home}"));
}
if let Ok(fresh) = std::env::var("HERMES_GUI_FRESH") {
env_args.push(format!("HERMES_GUI_FRESH={fresh}"));
}
let mut args = vec![
"-d".to_string(),
distro,
"--cd".to_string(),
path,
"env".to_string(),
];
args.extend(env_args);
args.extend([
"python".to_string(),
"-m".to_string(),
"hermes_cli.main".to_string(),
"dashboard".to_string(),
"--gui".to_string(),
"--no-open".to_string(),
"--host".to_string(),
GUI_HOST.to_string(),
"--port".to_string(),
port.clone(),
]);
return Command::new("wsl.exe")
.args(args)
.stdin(Stdio::null())
.stdout(Stdio::null())
.stderr(Stdio::null())
.spawn();
}
Command::new("python")
.args([
"-m",
"hermes_cli.main",
"dashboard",
"--gui",
"--no-open",
"--host",
GUI_HOST,
"--port",
&port,
])
.current_dir(root)
.env("HERMES_GUI", "1")
.env("HERMES_GUI_PORT", &port)
.envs(
std::env::vars()
.filter(|(key, _)| matches!(key.as_str(), "HERMES_HOME" | "HERMES_GUI_FRESH")),
)
.stdin(Stdio::null())
.stdout(Stdio::null())
.stderr(Stdio::null())
.spawn()
}
fn stop_owned_dashboard(state: &GuiState) {
let Some(mut child) = state.child.lock().expect("gui child lock poisoned").take() else {
return;
};
let _ = child.kill();
let _ = child.wait();
}
fn current_port(state: &GuiState) -> u16 {
*state.port.lock().expect("gui port lock poisoned")
}
fn ensure_dashboard(state: &GuiState) -> Result<(), String> {
let current = current_port(state);
if check_health(current) {
return Ok(());
}
let port = select_port();
*state.port.lock().expect("gui port lock poisoned") = port;
if check_health(port) {
return Ok(());
}
let child = start_dashboard(port).map_err(|err| {
format!(
"Could not auto-start Hermes dashboard ({err}). Start it manually with: hermes dashboard --gui --no-open --port {port}"
)
})?;
*state.child.lock().expect("gui child lock poisoned") = Some(child);
Ok(())
}
fn navigate_when_ready(window: WebviewWindow, port: u16) {
std::thread::spawn(move || {
let started = Instant::now();
while started.elapsed() < Duration::from_secs(60) {
if check_health(port) {
let min_splash = std::env::var("HERMES_GUI_MIN_SPLASH_MS")
.ok()
.and_then(|raw| raw.parse::<u64>().ok())
.unwrap_or(MIN_SPLASH_MS);
let elapsed = started.elapsed();
if elapsed < Duration::from_millis(min_splash) {
std::thread::sleep(Duration::from_millis(min_splash) - elapsed);
}
if let Ok(url) = tauri::Url::parse(&gui_url(port)) {
let _ = window.navigate(url);
let _ = window.show();
let _ = window.set_focus();
}
return;
}
std::thread::sleep(Duration::from_millis(500));
}
});
}
fn show_main_window(app: &AppHandle) {
if let Some(window) = app.get_webview_window("main") {
let _ = window.show();
let _ = window.set_focus();
}
}
fn open_browser(port: u16) {
let url = gui_url(port);
#[cfg(target_os = "windows")]
let _ = Command::new("cmd")
.args(["/C", "start", "", &url])
.stdin(Stdio::null())
.stdout(Stdio::null())
.stderr(Stdio::null())
.spawn();
#[cfg(target_os = "macos")]
let _ = Command::new("open").arg(&url).spawn();
#[cfg(all(unix, not(target_os = "macos")))]
let _ = Command::new("xdg-open").arg(&url).spawn();
}
fn tray_icon() -> Image<'static> {
let width = 32;
let height = 32;
let mut rgba = Vec::with_capacity(width * height * 4);
for y in 0..height {
for x in 0..width {
let mark = (14..=17).contains(&x) && (5..=26).contains(&y)
|| (8..=23).contains(&x) && (13..=16).contains(&y)
|| (10..=21).contains(&x) && (y == 5 || y == 26);
if mark {
rgba.extend_from_slice(&[0xF0, 0xE6, 0xD2, 0xFF]);
} else {
rgba.extend_from_slice(&[0x07, 0x13, 0x13, 0xFF]);
}
}
}
Image::new_owned(rgba, width as u32, height as u32)
}
fn restart_runtime(app: &AppHandle) -> Result<(), String> {
let state = app.state::<GuiState>();
stop_owned_dashboard(&state);
ensure_dashboard(&state)?;
if let Some(window) = app.get_webview_window("main") {
if let Ok(url) = tauri::Url::parse(SPLASH_URL) {
let _ = window.navigate(url);
}
let port = current_port(&state);
navigate_when_ready(window, port);
}
Ok(())
}
fn setup_tray(app: &App) -> tauri::Result<()> {
let open_item = MenuItem::with_id(app, "open", "Open Hermes", true, None::<&str>)?;
let browser_item = MenuItem::with_id(app, "browser", "Open in Browser", true, None::<&str>)?;
let restart_item =
MenuItem::with_id(app, "restart", "Restart Hermes Runtime", true, None::<&str>)?;
let status_item = MenuItem::with_id(app, "status", "Local runtime", false, None::<&str>)?;
let separator = PredefinedMenuItem::separator(app)?;
let separator2 = PredefinedMenuItem::separator(app)?;
let quit_item = MenuItem::with_id(app, "quit", "Quit Hermes", true, None::<&str>)?;
let menu = Menu::with_items(
app,
&[
&open_item,
&browser_item,
&restart_item,
&separator,
&status_item,
&separator2,
&quit_item,
],
)?;
let icon = tray_icon();
let _tray = TrayIconBuilder::new()
.icon(icon)
.menu(&menu)
.tooltip("Hermes")
.on_menu_event(|app, event| match event.id.as_ref() {
"open" => show_main_window(app),
"browser" => {
let state = app.state::<GuiState>();
open_browser(current_port(&state));
}
"restart" => {
if let Err(err) = restart_runtime(app) {
eprintln!("Failed to restart Hermes runtime: {err}");
}
}
"quit" => {
let state = app.state::<GuiState>();
stop_owned_dashboard(&state);
app.exit(0);
}
_ => {}
})
.on_tray_icon_event(|tray, event| {
if let TrayIconEvent::Click {
button: MouseButton::Left,
button_state: MouseButtonState::Up,
..
} = event
{
show_main_window(&tray.app_handle());
}
})
.build(app)?;
Ok(())
}
#[tauri::command]
fn runtime_running(app: AppHandle) -> bool {
let state = app.state::<GuiState>();
check_health(current_port(&state))
}
#[tauri::command]
fn restart_runtime_command(app: AppHandle) -> Result<(), String> {
restart_runtime(&app)
}
pub fn run() {
tauri::Builder::default()
.plugin(tauri_plugin_notification::init())
.plugin(tauri_plugin_opener::init())
.manage(GuiState {
child: Mutex::new(None),
port: Mutex::new(base_port()),
})
.invoke_handler(tauri::generate_handler![
runtime_running,
restart_runtime_command
])
.setup(|app| {
setup_tray(app)?;
if let Some(window) = app.get_webview_window("main") {
if let Ok(url) = tauri::Url::parse(SPLASH_URL) {
let _ = window.navigate(url);
}
let state = app.state::<GuiState>();
if let Err(err) = ensure_dashboard(&state) {
eprintln!("{err}");
}
let port = current_port(&state);
navigate_when_ready(window, port);
}
Ok(())
})
.on_window_event(|window, event| {
if let tauri::WindowEvent::CloseRequested { api, .. } = event {
api.prevent_close();
let _ = window.hide();
}
})
.run(tauri::generate_context!())
.expect("failed to run Hermes GUI");
}
-5
View File
@@ -1,5 +0,0 @@
#![cfg_attr(not(debug_assertions), windows_subsystem = "windows")]
fn main() {
hermes_gui_lib::run();
}
-38
View File
@@ -1,38 +0,0 @@
{
"$schema": "https://schema.tauri.app/config/2",
"productName": "Hermes",
"version": "0.0.0",
"identifier": "ai.nous.hermes.gui",
"build": {
"beforeDevCommand": "",
"beforeBuildCommand": "",
"devUrl": "http://127.0.0.1:9120",
"frontendDist": "../dist"
},
"app": {
"withGlobalTauri": true,
"windows": [
{
"label": "main",
"title": "Hermes",
"width": 1400,
"height": 900,
"minWidth": 900,
"minHeight": 600,
"resizable": true,
"center": true
}
],
"security": {
"csp": "default-src 'self' http://127.0.0.1:* http://localhost:*; connect-src 'self' http://127.0.0.1:* http://localhost:* ws://127.0.0.1:* ws://localhost:*; img-src 'self' data: blob: http://127.0.0.1:* http://localhost:*; style-src 'self' 'unsafe-inline' http://127.0.0.1:* http://localhost:*; script-src 'self' 'unsafe-inline' 'unsafe-eval' http://127.0.0.1:* http://localhost:*"
}
},
"bundle": {
"active": true,
"icon": ["icons/32x32.png", "icons/icon.ico", "icons/icon.svg"],
"targets": ["nsis", "dmg", "app"],
"resources": {
"sidecars": "sidecars/"
}
}
}
-5
View File
@@ -1,5 +0,0 @@
// Browser-side GUI bridge entry.
//
// The dashboard remains in `web/`; this file is reserved for future shell-only
// glue if we need pre-navigation scripts or native event wiring.
export {};
-44
View File
@@ -1,44 +0,0 @@
param(
[string]$Out = "$PSScriptRoot\..\gui\src-tauri\sidecars\hermes-runtime",
[string]$Python = "python"
)
$Root = Resolve-Path "$PSScriptRoot\..\.."
Write-Host "Bundling Hermes GUI runtime"
Write-Host "repo: $Root"
Write-Host "out: $Out"
if (Test-Path $Out) {
Remove-Item -Recurse -Force $Out
}
New-Item -ItemType Directory -Force -Path $Out | Out-Null
Write-Host "-> Building dashboard"
npm --prefix "$Root\web" ci
npm --prefix "$Root\web" run build
Copy-Item -Recurse "$Root\web\dist" "$Out\web_dist"
Write-Host "-> Building TUI"
npm --prefix "$Root\ui-tui" ci
npm --prefix "$Root\ui-tui" run build
New-Item -ItemType Directory -Force -Path "$Out\ui-tui" | Out-Null
Copy-Item -Recurse "$Root\ui-tui\dist" "$Out\ui-tui\dist"
Copy-Item "$Root\ui-tui\package.json" "$Out\ui-tui\package.json"
Copy-Item "$Root\ui-tui\package-lock.json" "$Out\ui-tui\package-lock.json"
Copy-Item -Recurse "$Root\ui-tui\node_modules" "$Out\ui-tui\node_modules"
Write-Host "-> Creating Python runtime"
& $Python -m venv "$Out\venv"
& "$Out\venv\Scripts\python.exe" -m pip install --upgrade pip
& "$Out\venv\Scripts\python.exe" -m pip install -e "$Root[web,pty]"
@"
# Hermes GUI Runtime
Generated by apps/shared/bundle-runtime.ps1.
Set HERMES_GUI_RUNTIME_DIR to this directory before launching the Tauri shell.
"@ | Set-Content "$Out\README.md"
Write-Host "Runtime bundle ready: $Out"
-41
View File
@@ -1,41 +0,0 @@
#!/usr/bin/env bash
set -euo pipefail
ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/../.." && pwd)"
OUT="${1:-"$ROOT/apps/gui/src-tauri/sidecars/hermes-runtime"}"
PYTHON="${PYTHON:-python}"
echo "Bundling Hermes GUI runtime"
echo "repo: $ROOT"
echo "out: $OUT"
rm -rf "$OUT"
mkdir -p "$OUT"
echo "→ Building dashboard"
npm --prefix "$ROOT/web" ci
npm --prefix "$ROOT/web" run build
cp -a "$ROOT/web/dist" "$OUT/web_dist"
echo "→ Building TUI"
npm --prefix "$ROOT/ui-tui" ci
npm --prefix "$ROOT/ui-tui" run build
mkdir -p "$OUT/ui-tui"
cp -a "$ROOT/ui-tui/dist" "$OUT/ui-tui/dist"
cp -a "$ROOT/ui-tui/package.json" "$ROOT/ui-tui/package-lock.json" "$OUT/ui-tui/"
cp -a "$ROOT/ui-tui/node_modules" "$OUT/ui-tui/node_modules"
echo "→ Creating Python runtime"
"$PYTHON" -m venv "$OUT/venv"
"$OUT/venv/bin/python" -m pip install --upgrade pip
"$OUT/venv/bin/python" -m pip install -e "$ROOT[web,pty]"
cat > "$OUT/README.md" <<EOF
# Hermes GUI Runtime
Generated by apps/shared/bundle-runtime.sh.
Set HERMES_GUI_RUNTIME_DIR to this directory before launching the Tauri shell.
EOF
echo "✓ Runtime bundle ready: $OUT"
-33
View File
@@ -1,33 +0,0 @@
# GUI Runtime Contract
The GUI shell starts Hermes with a small, explicit environment.
## Environment
```text
HERMES_GUI=1
HERMES_WEB_DIST=<bundled web dist>
HERMES_TUI_DIR=<bundled ui-tui dir>
```
The native shell uses `127.0.0.1:9120` as its initial GUI port during dev.
Bundled builds should keep the port private to the local machine and expose it
through `/api/health` and `/api/runtime`.
The shell should also pass the selected profile through the normal Hermes CLI
profile mechanism once the profile picker is wired.
## Ports
Use `127.0.0.1` only. Start with the GUI default port, then fall back to a
free port if occupied. Show the chosen port in the tray menu.
## User Data
The installer owns app files. Hermes owns user state under `HERMES_HOME`.
Uninstallers must not delete user state unless the user explicitly asks.
## Update Model
MVP does not use Tauri's native updater. GUI runs `hermes update`, tails the
action log, notifies completion, then offers to restart the runtime.
+2 -26
View File
@@ -326,16 +326,6 @@ compression:
# To pin a specific model/provider for compression summaries, use the
# auxiliary section below (auxiliary.compression.provider / model).
# =============================================================================
# Anthropic prompt caching TTL
# =============================================================================
# When prompt caching is active (Claude via OpenRouter or native Anthropic),
# Anthropic supports two TTL tiers for cached prefixes: "5m" (default) and
# "1h". Other values are ignored and "5m" is used.
#
prompt_caching:
cache_ttl: "5m" # use "1h" for long sessions with pauses between turns
# =============================================================================
# Auxiliary Models (Advanced — Experimental)
# =============================================================================
@@ -517,13 +507,6 @@ agent:
# finish, then interrupts anything still running after this timeout.
# 0 = no drain, interrupt immediately.
# restart_drain_timeout: 60
# Max app-level retry attempts for API errors (connection drops, provider
# timeouts, 5xx, etc.) before the agent surfaces the failure. Lower this
# to 1 if you use fallback providers and want fast failover on flaky
# primaries (default 3). The OpenAI SDK does its own low-level retries
# underneath this wrapper — this is the Hermes-level loop.
# api_max_retries: 3
# Enable verbose logging
verbose: false
@@ -790,16 +773,9 @@ code_execution:
# Supports single tasks and batch mode (default 3 parallel, configurable).
delegation:
max_iterations: 50 # Max tool-calling turns per child (default: 50)
# max_concurrent_children: 3 # Max parallel child agents per batch (default: 3, floor: 1, no ceiling).
# WARNING: values above 10 multiply API cost linearly.
# max_spawn_depth: 1 # Delegation tree depth cap (range: 1-3, default: 1 = flat).
# Raise to 2 to allow workers to spawn their own subagents.
# Requires role="orchestrator" on intermediate agents.
# max_concurrent_children: 3 # Max parallel child agents (default: 3)
# max_spawn_depth: 1 # Tree depth cap (1-3, default: 1 = flat). Raise to 2 or 3 to allow orchestrator children to spawn their own workers.
# orchestrator_enabled: true # Kill switch for role="orchestrator" children (default: true).
# subagent_auto_approve: false # When a subagent hits a dangerous-command approval prompt, auto-deny (default: false)
# or auto-approve "once" (true) instead of blocking on stdin.
# The parent TUI owns stdin, so blocking would deadlock; non-interactive resolution is required.
# Both choices emit a logger.warning audit line. Flip to true only for cron/batch pipelines.
# inherit_mcp_toolsets: true # When explicit child toolsets are narrowed, also keep the parent's MCP toolsets (default: true). Set false for strict intersection.
# model: "google/gemini-3-flash-preview" # Override model for subagents (empty = inherit parent)
# provider: "openrouter" # Override provider for subagents (empty = inherit parent)
+213 -205
View File
@@ -30,7 +30,7 @@ from urllib.parse import unquote, urlparse
from contextlib import contextmanager
from pathlib import Path
from datetime import datetime
from typing import List, Dict, Any, Optional
from typing import List, Dict, Any, Optional, TypedDict
logger = logging.getLogger(__name__)
@@ -84,6 +84,34 @@ _project_env = Path(__file__).parent / '.env'
load_hermes_dotenv(hermes_home=_hermes_home, project_env=_project_env)
class _ModelPickerState(TypedDict, total=False):
stage: str
providers: List[Dict[str, Any]]
selected: int
current_model: str
current_provider: str
user_provs: Optional[Dict[str, Any]]
custom_provs: Optional[Dict[str, Any]]
provider_data: Dict[str, Any]
model_list: List[str]
class _ApprovalState(TypedDict, total=False):
command: str
description: str
choices: List[str]
selected: int
response_queue: "queue.Queue[str]"
show_full: bool
class _ClarifyState(TypedDict, total=False):
question: str
choices: List[str]
selected: int
response_queue: "queue.Queue[str]"
_REASONING_TAGS = (
"REASONING_SCRATCHPAD",
"think",
@@ -1688,6 +1716,7 @@ def _looks_like_slash_command(text: str) -> bool:
from agent.skill_commands import (
scan_skill_commands,
build_skill_invocation_message,
build_plan_path,
build_preloaded_skills_prompt,
)
@@ -1727,7 +1756,7 @@ def _parse_skills_argument(skills: str | list[str] | tuple[str, ...] | None) ->
return parsed
def save_config_value(key_path: str, value: any) -> bool:
def save_config_value(key_path: str, value: Any) -> bool:
"""
Save a value to the active config file at the specified key path.
@@ -2064,16 +2093,16 @@ class HermesCLI:
self._interrupt_queue = queue.Queue()
self._should_exit = False
self._last_ctrl_c_time = 0
self._clarify_state = None
self._clarify_state: Optional[_ClarifyState] = None
self._clarify_freetext = False
self._clarify_deadline = 0
self._sudo_state = None
self._sudo_deadline = 0
self._modal_input_snapshot = None
self._approval_state = None
self._approval_state: Optional[_ApprovalState] = None
self._approval_deadline = 0
self._approval_lock = threading.Lock()
self._model_picker_state = None
self._model_picker_state: Optional[_ModelPickerState] = None
self._secret_state = None
self._secret_deadline = 0
self._spinner_text: str = "" # thinking spinner text for TUI
@@ -3083,8 +3112,6 @@ class HermesCLI:
format_runtime_provider_error,
)
_primary_exc = None
runtime = None
try:
runtime = resolve_runtime_provider(
requested=self.requested_provider,
@@ -3092,34 +3119,7 @@ class HermesCLI:
explicit_base_url=self._explicit_base_url,
)
except Exception as exc:
_primary_exc = exc
# Primary provider auth failed — try fallback providers before giving up.
if runtime is None and _primary_exc is not None:
from hermes_cli.auth import AuthError
if isinstance(_primary_exc, AuthError):
_fb_chain = self._fallback_model if isinstance(self._fallback_model, list) else []
for _fb in _fb_chain:
_fb_provider = (_fb.get("provider") or "").strip().lower()
_fb_model = (_fb.get("model") or "").strip()
if not _fb_provider or not _fb_model:
continue
try:
runtime = resolve_runtime_provider(requested=_fb_provider)
logger.warning(
"Primary provider auth failed (%s). Falling through to fallback: %s/%s",
_primary_exc, _fb_provider, _fb_model,
)
_cprint(f"⚠️ Primary auth failed — switching to fallback: {_fb_provider} / {_fb_model}")
self.requested_provider = _fb_provider
self.model = _fb_model
_primary_exc = None
break
except Exception:
continue
if runtime is None:
message = format_runtime_provider_error(_primary_exc) if _primary_exc else "Provider resolution failed."
message = format_runtime_provider_error(exc)
ChatConsole().print(f"[bold red]{message}[/]")
return False
@@ -3176,14 +3176,7 @@ class HermesCLI:
# the configured model (e.g. "qwen3.6-plus"), causing 400 errors.
runtime_model = runtime.get("model")
if runtime_model and isinstance(runtime_model, str):
# Only use runtime model if: model is unset, or model equals provider name
should_use_runtime_model = (
not self.model or # No model configured yet
self.model == self.provider or # Model is the provider slug
self.model == runtime.get("name") # Model matches provider display name
)
if should_use_runtime_model:
self.model = runtime_model
self.model = runtime_model
# If model is still empty (e.g. user ran `hermes auth add openai-codex`
# without `hermes model`), fall back to the provider's first catalog
@@ -3289,23 +3282,6 @@ class HermesCLI:
_cprint(f"\033[1;31mSession not found: {self.session_id}{_RST}")
_cprint(f"{_DIM}Use a session ID from a previous CLI run (hermes sessions list).{_RST}")
return False
# If the requested session is the (empty) head of a compression
# chain, walk to the descendant that actually holds the messages.
# See #15000 and SessionDB.resolve_resume_session_id.
try:
resolved_id = self._session_db.resolve_resume_session_id(self.session_id)
except Exception:
resolved_id = self.session_id
if resolved_id and resolved_id != self.session_id:
ChatConsole().print(
f"[{_DIM}]Session {_escape(self.session_id)} was compressed into "
f"{_escape(resolved_id)}; resuming the descendant with your "
f"transcript.[/]"
)
self.session_id = resolved_id
resolved_meta = self._session_db.get_session(self.session_id)
if resolved_meta:
session_meta = resolved_meta
restored = self._session_db.get_messages_as_conversation(self.session_id)
if restored:
restored = [m for m in restored if m.get("role") != "session_meta"]
@@ -3524,22 +3500,6 @@ class HermesCLI:
)
return False
# If the requested session is the (empty) head of a compression chain,
# walk to the descendant that actually holds the messages. See #15000.
try:
resolved_id = self._session_db.resolve_resume_session_id(self.session_id)
except Exception:
resolved_id = self.session_id
if resolved_id and resolved_id != self.session_id:
self._console_print(
f"[dim]Session {self.session_id} was compressed into "
f"{resolved_id}; resuming the descendant with your transcript.[/]"
)
self.session_id = resolved_id
resolved_meta = self._session_db.get_session(self.session_id)
if resolved_meta:
session_meta = resolved_meta
restored = self._session_db.get_messages_as_conversation(self.session_id)
if restored:
restored = [m for m in restored if m.get("role") != "session_meta"]
@@ -4668,6 +4628,10 @@ class HermesCLI:
def new_session(self, silent=False):
"""Start a fresh session with a new session ID and cleared agent state."""
if self.agent and self.conversation_history:
try:
self.agent.flush_memories(self.conversation_history)
except (Exception, KeyboardInterrupt):
pass
# Trigger memory extraction on the old session before session_id rotates.
self.agent.commit_memory_session(self.conversation_history)
self._notify_session_boundary("on_session_finalize")
@@ -4750,22 +4714,6 @@ class HermesCLI:
_cprint(" Use /history or `hermes sessions list` to see available sessions.")
return
# If the target is the empty head of a compression chain, redirect to
# the descendant that actually holds the transcript. See #15000.
try:
resolved_id = self._session_db.resolve_resume_session_id(target_id)
except Exception:
resolved_id = target_id
if resolved_id and resolved_id != target_id:
_cprint(
f" Session {target_id} was compressed into {resolved_id}; "
f"resuming the descendant with your transcript."
)
target_id = resolved_id
resolved_meta = self._session_db.get_session(target_id)
if resolved_meta:
session_meta = resolved_meta
if target_id == self.session_id:
_cprint(" Already on that session.")
return
@@ -5377,26 +5325,29 @@ class HermesCLI:
_cprint(f" ✓ Model switched: {result.new_model}")
_cprint(f" Provider: {provider_label}")
# Context: always resolve via the provider-aware chain so Codex OAuth,
# Copilot, and Nous-enforced caps win over the raw models.dev entry
# (e.g. gpt-5.5 is 1.05M on openai but 272K on Codex OAuth).
# Rich metadata from models.dev
mi = result.model_info
from hermes_cli.model_switch import resolve_display_context_length
ctx = resolve_display_context_length(
result.new_model,
result.target_provider,
base_url=result.base_url or self.base_url or "",
api_key=result.api_key or self.api_key or "",
model_info=mi,
)
if ctx:
_cprint(f" Context: {ctx:,} tokens")
if mi:
if mi.context_window:
_cprint(f" Context: {mi.context_window:,} tokens")
if mi.max_output:
_cprint(f" Max output: {mi.max_output:,} tokens")
if mi.has_cost_data():
_cprint(f" Cost: {mi.format_cost()}")
_cprint(f" Capabilities: {mi.format_capabilities()}")
else:
# Fallback to old context length lookup
try:
from agent.model_metadata import get_model_context_length
ctx = get_model_context_length(
result.new_model,
base_url=result.base_url or self.base_url,
api_key=result.api_key or self.api_key,
provider=result.target_provider,
)
_cprint(f" Context: {ctx:,} tokens")
except Exception:
pass
# Cache notice
cache_enabled = (
@@ -5455,6 +5406,79 @@ class HermesCLI:
except Exception:
return False
def _show_model_and_providers(self):
"""Show current model + provider and list all authenticated providers.
Shows current model + provider, then lists all authenticated
providers with their available models.
"""
from hermes_cli.models import (
curated_models_for_provider, list_available_providers,
normalize_provider, _PROVIDER_LABELS,
get_pricing_for_provider, format_model_pricing_table,
)
from hermes_cli.auth import resolve_provider as _resolve_provider
# Resolve current provider
raw_provider = normalize_provider(self.provider)
if raw_provider == "auto":
try:
current = _resolve_provider(
self.requested_provider,
explicit_api_key=self._explicit_api_key,
explicit_base_url=self._explicit_base_url,
)
except Exception:
current = "openrouter"
else:
current = raw_provider
current_label = _PROVIDER_LABELS.get(current, current)
print(f"\n Current: {self.model} via {current_label}")
print()
# Show all authenticated providers with their models
providers = list_available_providers()
authed = [p for p in providers if p["authenticated"]]
unauthed = [p for p in providers if not p["authenticated"]]
if authed:
print(" Authenticated providers & models:")
for p in authed:
is_active = p["id"] == current
marker = " ← active" if is_active else ""
print(f" [{p['id']}]{marker}")
curated = curated_models_for_provider(p["id"])
# Fetch pricing for providers that support it (openrouter, nous)
pricing_map = get_pricing_for_provider(p["id"]) if p["id"] in ("openrouter", "nous") else {}
if curated and pricing_map:
cur_model = self.model if is_active else ""
for line in format_model_pricing_table(curated, pricing_map, current_model=cur_model):
print(line)
elif curated:
for mid, desc in curated:
current_marker = " ← current" if (is_active and mid == self.model) else ""
print(f" {mid}{current_marker}")
elif p["id"] == "custom":
from hermes_cli.models import _get_custom_base_url
custom_url = _get_custom_base_url()
if custom_url:
print(f" endpoint: {custom_url}")
if is_active:
print(f" model: {self.model} ← current")
print(" (use hermes model to change)")
else:
print(" (use hermes model to change)")
print()
if unauthed:
names = ", ".join(p["label"] for p in unauthed)
print(f" Not configured: {names}")
print(" Run: hermes setup")
print()
print(" To change model or provider, use: hermes model")
def _output_console(self):
"""Use prompt_toolkit-safe Rich rendering once the TUI is live."""
if getattr(self, "_app", None):
@@ -6030,12 +6054,16 @@ class HermesCLI:
self._handle_resume_command(cmd_original)
elif canonical == "model":
self._handle_model_switch(cmd_original)
elif canonical == "provider":
self._show_model_and_providers()
elif canonical == "gquota":
self._handle_gquota_command(cmd_original)
elif canonical == "personality":
# Use original case (handler lowercases the personality name itself)
self._handle_personality_command(cmd_original)
elif canonical == "plan":
self._handle_plan_command(cmd_original)
elif canonical == "retry":
retry_msg = self.retry_last()
if retry_msg and hasattr(self, '_pending_input'):
@@ -6165,8 +6193,6 @@ class HermesCLI:
self._handle_skin_command(cmd_original)
elif canonical == "voice":
self._handle_voice_command(cmd_original)
elif canonical == "busy":
self._handle_busy_command(cmd_original)
else:
# Check for user-defined quick commands (bypass agent loop, no LLM call)
base_cmd = cmd_lower.split()[0]
@@ -6272,6 +6298,32 @@ class HermesCLI:
return True
def _handle_plan_command(self, cmd: str):
"""Handle /plan [request] — load the bundled plan skill."""
parts = cmd.strip().split(maxsplit=1)
user_instruction = parts[1].strip() if len(parts) > 1 else ""
plan_path = build_plan_path(user_instruction)
msg = build_skill_invocation_message(
"/plan",
user_instruction,
task_id=self.session_id,
runtime_note=(
"Save the markdown plan with write_file to this exact relative path "
f"inside the active workspace/backend cwd: {plan_path}"
),
)
if not msg:
ChatConsole().print("[bold red]Failed to load the bundled /plan skill[/]")
return
_cprint(f" 📝 Plan mode queued via skill. Markdown plan target: {plan_path}")
if hasattr(self, '_pending_input'):
self._pending_input.put(msg)
else:
ChatConsole().print("[bold red]Plan mode unavailable: input queue not initialized[/]")
def _handle_background_command(self, cmd: str):
"""Handle /background <prompt> — run a prompt in a separate background session.
@@ -6661,13 +6713,6 @@ class HermesCLI:
print(f" ⚠ Port {_port} is not reachable at {cdp_url}")
os.environ["BROWSER_CDP_URL"] = cdp_url
# Eagerly start the CDP supervisor so pending_dialogs + frame_tree
# show up in the next browser_snapshot. No-op if already started.
try:
from tools.browser_tool import _ensure_cdp_supervisor # type: ignore[import-not-found]
_ensure_cdp_supervisor("default")
except Exception:
pass
print()
print("🌐 Browser connected to live Chrome via CDP")
print(f" Endpoint: {cdp_url}")
@@ -6689,8 +6734,7 @@ class HermesCLI:
if current:
os.environ.pop("BROWSER_CDP_URL", None)
try:
from tools.browser_tool import cleanup_all_browsers, _stop_cdp_supervisor
_stop_cdp_supervisor("default")
from tools.browser_tool import cleanup_all_browsers
cleanup_all_browsers()
except Exception:
pass
@@ -6903,36 +6947,6 @@ class HermesCLI:
else:
_cprint(f" {_ACCENT}✓ Reasoning effort set to '{arg}' (session only){_RST}")
def _handle_busy_command(self, cmd: str):
"""Handle /busy — control what Enter does while Hermes is working.
Usage:
/busy Show current busy input mode
/busy status Show current busy input mode
/busy queue Queue input for the next turn instead of interrupting
/busy interrupt Interrupt the current run on Enter (default)
"""
parts = cmd.strip().split(maxsplit=1)
if len(parts) < 2 or parts[1].strip().lower() == "status":
_cprint(f" {_ACCENT}Busy input mode: {self.busy_input_mode}{_RST}")
_cprint(f" {_DIM}Enter while busy: {'queues for next turn' if self.busy_input_mode == 'queue' else 'interrupts current run'}{_RST}")
_cprint(f" {_DIM}Usage: /busy [queue|interrupt|status]{_RST}")
return
arg = parts[1].strip().lower()
if arg not in {"queue", "interrupt"}:
_cprint(f" {_DIM}(._.) Unknown argument: {arg}{_RST}")
_cprint(f" {_DIM}Usage: /busy [queue|interrupt|status]{_RST}")
return
self.busy_input_mode = arg
if save_config_value("display.busy_input_mode", arg):
behavior = "Enter will queue follow-up input while Hermes is busy." if arg == "queue" else "Enter will interrupt the current run while Hermes is busy."
_cprint(f" {_ACCENT}✓ Busy input mode set to '{arg}' (saved to config){_RST}")
_cprint(f" {_DIM}{behavior}{_RST}")
else:
_cprint(f" {_ACCENT}✓ Busy input mode set to '{arg}' (session only){_RST}")
def _handle_fast_command(self, cmd: str):
"""Handle /fast — toggle fast mode (OpenAI Priority Processing / Anthropic Fast Mode)."""
if not self._fast_command_available():
@@ -7011,52 +7025,51 @@ class HermesCLI:
focus_topic = parts[1].strip()
original_count = len(self.conversation_history)
with self._busy_command("Compressing context..."):
try:
from agent.model_metadata import estimate_messages_tokens_rough
from agent.manual_compression_feedback import summarize_manual_compression
original_history = list(self.conversation_history)
approx_tokens = estimate_messages_tokens_rough(original_history)
if focus_topic:
print(f"🗜️ Compressing {original_count} messages (~{approx_tokens:,} tokens), "
f"focus: \"{focus_topic}\"...")
else:
print(f"🗜️ Compressing {original_count} messages (~{approx_tokens:,} tokens)...")
try:
from agent.model_metadata import estimate_messages_tokens_rough
from agent.manual_compression_feedback import summarize_manual_compression
original_history = list(self.conversation_history)
approx_tokens = estimate_messages_tokens_rough(original_history)
if focus_topic:
print(f"🗜️ Compressing {original_count} messages (~{approx_tokens:,} tokens), "
f"focus: \"{focus_topic}\"...")
else:
print(f"🗜️ Compressing {original_count} messages (~{approx_tokens:,} tokens)...")
compressed, _ = self.agent._compress_context(
original_history,
self.agent._cached_system_prompt or "",
approx_tokens=approx_tokens,
focus_topic=focus_topic or None,
)
self.conversation_history = compressed
# _compress_context ends the old session and creates a new child
# session on the agent (run_agent.py::_compress_context). Sync the
# CLI's session_id so /status, /resume, exit summary, and title
# generation all point at the live continuation session, not the
# ended parent. Without this, subsequent end_session() calls target
# the already-closed parent and the child is orphaned.
if (
getattr(self.agent, "session_id", None)
and self.agent.session_id != self.session_id
):
self.session_id = self.agent.session_id
self._pending_title = None
new_tokens = estimate_messages_tokens_rough(self.conversation_history)
summary = summarize_manual_compression(
original_history,
self.conversation_history,
approx_tokens,
new_tokens,
)
icon = "🗜️" if summary["noop"] else ""
print(f" {icon} {summary['headline']}")
print(f" {summary['token_line']}")
if summary["note"]:
print(f" {summary['note']}")
compressed, _ = self.agent._compress_context(
original_history,
self.agent._cached_system_prompt or "",
approx_tokens=approx_tokens,
focus_topic=focus_topic or None,
)
self.conversation_history = compressed
# _compress_context ends the old session and creates a new child
# session on the agent (run_agent.py::_compress_context). Sync the
# CLI's session_id so /status, /resume, exit summary, and title
# generation all point at the live continuation session, not the
# ended parent. Without this, subsequent end_session() calls target
# the already-closed parent and the child is orphaned.
if (
getattr(self.agent, "session_id", None)
and self.agent.session_id != self.session_id
):
self.session_id = self.agent.session_id
self._pending_title = None
new_tokens = estimate_messages_tokens_rough(self.conversation_history)
summary = summarize_manual_compression(
original_history,
self.conversation_history,
approx_tokens,
new_tokens,
)
icon = "🗜️" if summary["noop"] else ""
print(f" {icon} {summary['headline']}")
print(f" {summary['token_line']}")
if summary["note"]:
print(f" {summary['note']}")
except Exception as e:
print(f" ❌ Compression failed: {e}")
except Exception as e:
print(f" ❌ Compression failed: {e}")
def _handle_debug_command(self):
"""Handle /debug — upload debug report + logs and print paste URLs."""
@@ -7171,7 +7184,7 @@ class HermesCLI:
logging.getLogger(noisy).setLevel(logging.WARNING)
else:
logging.getLogger().setLevel(logging.INFO)
for quiet_logger in ('tools', 'run_agent', 'trajectory_compressor', 'cron', 'hermes_cli'):
for quiet_logger in ('tools', 'run_agent', 'scripts.trajectory_compressor', 'cron', 'hermes_cli'):
logging.getLogger(quiet_logger).setLevel(logging.ERROR)
def _show_insights(self, command: str = "/insights"):
@@ -9558,20 +9571,9 @@ class HermesCLI:
@kb.add('c-d')
def handle_ctrl_d(event):
"""Ctrl+D: delete char under cursor (standard readline behaviour).
Only exit when the input is empty same as bash/zsh. Pending
attached images count as input and block the EOF-exit so the
user doesn't lose them silently.
"""
buf = event.app.current_buffer
if buf.text:
buf.delete()
elif self._attached_images:
# Empty text but pending attachments — no-op, don't exit.
return
else:
self._should_exit = True
event.app.exit()
"""Handle Ctrl+D - exit."""
self._should_exit = True
event.app.exit()
_modal_prompt_active = Condition(
lambda: bool(self._secret_state or self._sudo_state)
@@ -10784,6 +10786,12 @@ class HermesCLI:
self.agent.interrupt()
except Exception:
pass
# Flush memories before exit (only for substantial conversations)
if self.agent and self.conversation_history:
try:
self.agent.flush_memories(self.conversation_history)
except (Exception, KeyboardInterrupt):
pass
# Shut down voice recorder (release persistent audio stream)
if hasattr(self, '_voice_recorder') and self._voice_recorder:
try:
+1 -72
View File
@@ -16,7 +16,7 @@ import uuid
from datetime import datetime, timedelta
from pathlib import Path
from hermes_constants import get_hermes_home
from typing import Optional, Dict, List, Any, Union
from typing import Optional, Dict, List, Any
logger = logging.getLogger(__name__)
@@ -371,39 +371,6 @@ def save_jobs(jobs: List[Dict[str, Any]]):
raise
def _normalize_workdir(workdir: Optional[str]) -> Optional[str]:
"""Normalize and validate a cron job workdir.
Rules:
- Empty / None None (feature off, preserves old behaviour).
- ``~`` is expanded. Relative paths are rejected cron jobs run detached
from any shell cwd, so relative paths have no stable meaning.
- The path must exist and be a directory at create/update time. We do
NOT re-check at run time (a user might briefly unmount the dir; the
scheduler will just fall back to old behaviour with a logged warning).
Returns the absolute path string, or None when disabled.
Raises ValueError on invalid input.
"""
if workdir is None:
return None
raw = str(workdir).strip()
if not raw:
return None
expanded = Path(raw).expanduser()
if not expanded.is_absolute():
raise ValueError(
f"Cron workdir must be an absolute path (got {raw!r}). "
f"Cron jobs run detached from any shell cwd, so relative paths are ambiguous."
)
resolved = expanded.resolve()
if not resolved.exists():
raise ValueError(f"Cron workdir does not exist: {resolved}")
if not resolved.is_dir():
raise ValueError(f"Cron workdir is not a directory: {resolved}")
return str(resolved)
def create_job(
prompt: str,
schedule: str,
@@ -417,9 +384,6 @@ def create_job(
provider: Optional[str] = None,
base_url: Optional[str] = None,
script: Optional[str] = None,
context_from: Optional[Union[str, List[str]]] = None,
enabled_toolsets: Optional[List[str]] = None,
workdir: Optional[str] = None,
) -> Dict[str, Any]:
"""
Create a new cron job.
@@ -439,18 +403,6 @@ def create_job(
script: Optional path to a Python script whose stdout is injected into the
prompt each run. The script runs before the agent turn, and its output
is prepended as context. Useful for data collection / change detection.
context_from: Optional job ID (or list of job IDs) whose most recent output
is injected into the prompt as context before each run.
Useful for chaining cron jobs: job A finds data, job B processes it.
enabled_toolsets: Optional list of toolset names to restrict the agent to.
When set, only tools from these toolsets are loaded, reducing
token overhead. When omitted, all default tools are loaded.
workdir: Optional absolute path. When set, the job runs as if launched
from that directory: AGENTS.md / CLAUDE.md / .cursorrules from
that directory are injected into the system prompt, and the
terminal/file/code_exec tools use it as their working directory
(via TERMINAL_CWD). When unset, the old behaviour is preserved
(no context files injected, tools use the scheduler's cwd).
Returns:
The created job dict
@@ -481,17 +433,6 @@ def create_job(
normalized_base_url = normalized_base_url or None
normalized_script = str(script).strip() if isinstance(script, str) else None
normalized_script = normalized_script or None
normalized_toolsets = [str(t).strip() for t in enabled_toolsets if str(t).strip()] if enabled_toolsets else None
normalized_toolsets = normalized_toolsets or None
normalized_workdir = _normalize_workdir(workdir)
# Normalize context_from: accept str or list of str, store as list or None
if isinstance(context_from, str):
context_from = [context_from.strip()] if context_from.strip() else None
elif isinstance(context_from, list):
context_from = [str(j).strip() for j in context_from if str(j).strip()] or None
else:
context_from = None
label_source = (prompt or (normalized_skills[0] if normalized_skills else None)) or "cron job"
job = {
@@ -504,7 +445,6 @@ def create_job(
"provider": normalized_provider,
"base_url": normalized_base_url,
"script": normalized_script,
"context_from": context_from,
"schedule": parsed_schedule,
"schedule_display": parsed_schedule.get("display", schedule),
"repeat": {
@@ -524,8 +464,6 @@ def create_job(
# Delivery configuration
"deliver": deliver,
"origin": origin, # Tracks where job was created for "origin" delivery
"enabled_toolsets": normalized_toolsets,
"workdir": normalized_workdir,
}
jobs = load_jobs()
@@ -559,15 +497,6 @@ def update_job(job_id: str, updates: Dict[str, Any]) -> Optional[Dict[str, Any]]
if job["id"] != job_id:
continue
# Validate / normalize workdir if present in updates. Empty string or
# None both mean "clear the field" (restore old behaviour).
if "workdir" in updates:
_wd = updates["workdir"]
if _wd in (None, "", False):
updates["workdir"] = None
else:
updates["workdir"] = _normalize_workdir(_wd)
updated = _apply_skill_fields({**job, **updates})
schedule_changed = "schedule" in updates
+12 -156
View File
@@ -40,37 +40,6 @@ from hermes_time import now as _hermes_now
logger = logging.getLogger(__name__)
def _resolve_cron_enabled_toolsets(job: dict, cfg: dict) -> list[str] | None:
"""Resolve the toolset list for a cron job.
Precedence:
1. Per-job ``enabled_toolsets`` (set via ``cronjob`` tool on create/update).
Keeps the agent's job-scoped toolset override intact — #6130.
2. Per-platform ``hermes tools`` config for the ``cron`` platform.
Mirrors gateway behavior (``_get_platform_tools(cfg, platform_key)``)
so users can gate cron toolsets globally without recreating every job.
3. ``None`` on any lookup failure AIAgent loads the full default set
(legacy behavior before this change, preserved as the safety net).
_DEFAULT_OFF_TOOLSETS ({moa, homeassistant, rl}) are removed by
``_get_platform_tools`` for unconfigured platforms, so fresh installs
get cron WITHOUT ``moa`` by default (issue reported by Norbert
surprise $4.63 run).
"""
per_job = job.get("enabled_toolsets")
if per_job:
return per_job
try:
from hermes_cli.tools_config import _get_platform_tools # lazy: avoid heavy import at cron module load
return sorted(_get_platform_tools(cfg or {}, "cron"))
except Exception as exc:
logger.warning(
"Cron toolset resolution failed, falling back to full default toolset: %s",
exc,
)
return None
# Valid delivery platforms — used to validate user-supplied platform names
# in cron delivery targets, preventing env var enumeration via crafted names.
_KNOWN_DELIVERY_PLATFORMS = frozenset({
@@ -470,8 +439,9 @@ def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> Option
delivery_errors.append(msg)
continue
if result and result.get("error"):
msg = f"delivery error: {result['error']}"
error = result.get("error") if result else None
if error:
msg = f"delivery error: {error}"
logger.error("Job '%s': %s", job["id"], msg)
delivery_errors.append(msg)
continue
@@ -671,47 +641,6 @@ def _build_job_prompt(job: dict, prerun_script: Optional[tuple] = None) -> str:
f"{prompt}"
)
# Inject output from referenced cron jobs as context.
context_from = job.get("context_from")
if context_from:
from cron.jobs import OUTPUT_DIR
if isinstance(context_from, str):
context_from = [context_from]
for source_job_id in context_from:
# Guard against path traversal — valid job IDs are 12-char hex strings
if not source_job_id or not all(c in "0123456789abcdef" for c in source_job_id):
logger.warning("context_from: skipping invalid job_id %r", source_job_id)
continue
try:
job_output_dir = OUTPUT_DIR / source_job_id
if not job_output_dir.exists():
continue # silent skip — no output yet
output_files = sorted(
job_output_dir.glob("*.md"),
key=lambda f: f.stat().st_mtime,
reverse=True,
)
if not output_files:
continue # silent skip — no output yet
latest_output = output_files[0].read_text(encoding="utf-8").strip()
# Truncate to 8K characters to avoid prompt bloat
_MAX_CONTEXT_CHARS = 8000
if len(latest_output) > _MAX_CONTEXT_CHARS:
latest_output = latest_output[:_MAX_CONTEXT_CHARS] + "\n\n[... output truncated ...]"
if latest_output:
prompt = (
f"## Output from job '{source_job_id}'\n"
"The following is the most recent output from a preceding "
"cron job. Use it as context for your analysis.\n\n"
f"```\n{latest_output}\n```\n\n"
f"{prompt}"
)
else:
continue # silent skip — empty output
except (OSError, PermissionError) as e:
logger.warning("context_from: failed to read output for job %r: %s", source_job_id, e)
# silent skip — do not pollute the prompt with error messages
# Always prepend cron execution guidance so the agent knows how
# delivery works and can suppress delivery when appropriate.
cron_hint = (
@@ -836,30 +765,6 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
chat_name=origin.get("chat_name", "") if origin else "",
)
# Per-job working directory. When set (and validated at create/update
# time), we point TERMINAL_CWD at it so:
# - build_context_files_prompt() picks up AGENTS.md / CLAUDE.md /
# .cursorrules from the job's project dir, AND
# - the terminal, file, and code-exec tools run commands from there.
#
# tick() serializes workdir-jobs outside the parallel pool, so mutating
# os.environ["TERMINAL_CWD"] here is safe for those jobs. For workdir-less
# jobs we leave TERMINAL_CWD untouched — preserves the original behaviour
# (skip_context_files=True, tools use whatever cwd the scheduler has).
_job_workdir = (job.get("workdir") or "").strip() or None
if _job_workdir and not Path(_job_workdir).is_dir():
# Directory was removed between create-time validation and now. Log
# and drop back to old behaviour rather than crashing the job.
logger.warning(
"Job '%s': configured workdir %r no longer exists — running without it",
job_id, _job_workdir,
)
_job_workdir = None
_prior_terminal_cwd = os.environ.get("TERMINAL_CWD", "_UNSET_")
if _job_workdir:
os.environ["TERMINAL_CWD"] = _job_workdir
logger.info("Job '%s': using workdir %s", job_id, _job_workdir)
try:
# Re-read .env and config.yaml fresh every run so provider/key
# changes take effect without a gateway restart.
@@ -936,7 +841,6 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
resolve_runtime_provider,
format_runtime_provider_error,
)
from hermes_cli.auth import AuthError
try:
runtime_kwargs = {
"requested": job.get("provider") or os.getenv("HERMES_INFERENCE_PROVIDER"),
@@ -944,28 +848,6 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
if job.get("base_url"):
runtime_kwargs["explicit_base_url"] = job.get("base_url")
runtime = resolve_runtime_provider(**runtime_kwargs)
except AuthError as auth_exc:
# Primary provider auth failed — try fallback chain before giving up.
logger.warning("Job '%s': primary auth failed (%s), trying fallback", job_id, auth_exc)
fb = _cfg.get("fallback_providers") or _cfg.get("fallback_model")
fb_list = (fb if isinstance(fb, list) else [fb]) if fb else []
runtime = None
for entry in fb_list:
if not isinstance(entry, dict):
continue
try:
fb_kwargs = {"requested": entry.get("provider")}
if entry.get("base_url"):
fb_kwargs["explicit_base_url"] = entry["base_url"]
if entry.get("api_key"):
fb_kwargs["explicit_api_key"] = entry["api_key"]
runtime = resolve_runtime_provider(**fb_kwargs)
logger.info("Job '%s': fallback resolved to %s", job_id, runtime.get("provider"))
break
except Exception as fb_exc:
logger.debug("Job '%s': fallback %s failed: %s", job_id, entry.get("provider"), fb_exc)
if runtime is None:
raise RuntimeError(format_runtime_provider_error(auth_exc)) from auth_exc
except Exception as exc:
message = format_runtime_provider_error(exc)
raise RuntimeError(message) from exc
@@ -1005,13 +887,9 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
providers_ignored=pr.get("ignore"),
providers_order=pr.get("order"),
provider_sort=pr.get("sort"),
enabled_toolsets=_resolve_cron_enabled_toolsets(job, _cfg),
disabled_toolsets=["cronjob", "messaging", "clarify"],
quiet_mode=True,
# When a workdir is configured, inject AGENTS.md / CLAUDE.md /
# .cursorrules from that directory; otherwise preserve the old
# behaviour (don't inject SOUL.md/AGENTS.md from the scheduler cwd).
skip_context_files=not bool(_job_workdir),
skip_context_files=True, # Don't inject SOUL.md/AGENTS.md from scheduler cwd
skip_memory=True, # Cron system prompts would corrupt user representations
platform="cron",
session_id=_cron_session_id,
@@ -1150,14 +1028,6 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
return False, output, "", error_msg
finally:
# Restore TERMINAL_CWD to whatever it was before this job ran. We
# only ever mutate it when the job has a workdir; see the setup block
# at the top of run_job for the serialization guarantee.
if _job_workdir:
if _prior_terminal_cwd == "_UNSET_":
os.environ.pop("TERMINAL_CWD", None)
else:
os.environ["TERMINAL_CWD"] = _prior_terminal_cwd
# Clean up ContextVar session/delivery state for this job.
clear_session_vars(_ctx_tokens)
if _session_db:
@@ -1285,28 +1155,14 @@ def tick(verbose: bool = True, adapters=None, loop=None) -> int:
mark_job_run(job["id"], False, str(e))
return False
# Partition due jobs: those with a per-job workdir mutate
# os.environ["TERMINAL_CWD"] inside run_job, which is process-global —
# so they MUST run sequentially to avoid corrupting each other. Jobs
# without a workdir leave env untouched and stay parallel-safe.
workdir_jobs = [j for j in due_jobs if (j.get("workdir") or "").strip()]
parallel_jobs = [j for j in due_jobs if not (j.get("workdir") or "").strip()]
_results: list = []
# Sequential pass for workdir jobs.
for job in workdir_jobs:
_ctx = contextvars.copy_context()
_results.append(_ctx.run(_process_job, job))
# Parallel pass for the rest — same behaviour as before.
if parallel_jobs:
with concurrent.futures.ThreadPoolExecutor(max_workers=_max_workers) as _tick_pool:
_futures = []
for job in parallel_jobs:
_ctx = contextvars.copy_context()
_futures.append(_tick_pool.submit(_ctx.run, _process_job, job))
_results.extend(f.result() for f in _futures)
# Run all due jobs concurrently, each in its own ContextVar copy
# so session/delivery state stays isolated per-thread.
with concurrent.futures.ThreadPoolExecutor(max_workers=_max_workers) as _tick_pool:
_futures = []
for job in due_jobs:
_ctx = contextvars.copy_context()
_futures.append(_tick_pool.submit(_ctx.run, _process_job, job))
_results = [f.result() for f in _futures]
return sum(_results)
finally:
+1 -1
View File
@@ -29,7 +29,7 @@ echo "📝 Logging to: $LOG_FILE"
# Point to the example dataset in this directory
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
python batch_runner.py \
python scripts/batch_runner.py \
--dataset_file="$SCRIPT_DIR/example_browser_tasks.jsonl" \
--batch_size=5 \
--run_name="browser_tasks_example" \
+1 -1
View File
@@ -4,7 +4,7 @@
# Generates tool-calling trajectories for multi-step web research tasks.
#
# Usage:
# python batch_runner.py \
# python scripts/batch_runner.py \
# --config datagen-config-examples/web_research.yaml \
# --run_name web_research_v1
-52
View File
@@ -1,52 +0,0 @@
#
# docker-compose.yml for Hermes Agent
#
# Usage:
# HERMES_UID=$(id -u) HERMES_GID=$(id -g) docker compose up -d
#
# Set HERMES_UID / HERMES_GID to the host user that owns ~/.hermes so
# files created inside the container stay readable/writable on the host.
# The entrypoint remaps the internal `hermes` user to these values via
# usermod/groupmod + gosu.
#
# Security notes:
# - The dashboard service binds to 127.0.0.1 by default. It stores API
# keys; exposing it on LAN without auth is unsafe. If you want remote
# access, use an SSH tunnel or put it behind a reverse proxy that
# adds authentication — do NOT pass --insecure --host 0.0.0.0.
# - The gateway's API server is off unless you uncomment API_SERVER_KEY
# and API_SERVER_HOST. See docs/user-guide/api-server.md before doing
# this on an internet-facing host.
#
services:
gateway:
build: .
image: hermes-agent
container_name: hermes
restart: unless-stopped
network_mode: host
volumes:
- ~/.hermes:/opt/data
environment:
- HERMES_UID=${HERMES_UID:-10000}
- HERMES_GID=${HERMES_GID:-10000}
# To expose the OpenAI-compatible API server beyond localhost,
# uncomment BOTH lines (API_SERVER_KEY is mandatory for auth):
# - API_SERVER_HOST=0.0.0.0
# - API_SERVER_KEY=${API_SERVER_KEY}
command: ["gateway", "run"]
dashboard:
image: hermes-agent
container_name: hermes-dashboard
restart: unless-stopped
network_mode: host
depends_on:
- gateway
volumes:
- ~/.hermes:/opt/data
environment:
- HERMES_UID=${HERMES_UID:-10000}
- HERMES_GID=${HERMES_GID:-10000}
# Localhost-only. For remote access, tunnel via `ssh -L 9119:localhost:9119`.
command: ["dashboard", "--host", "127.0.0.1", "--no-open"]
+2 -11
View File
@@ -22,18 +22,9 @@ if [ "$(id -u)" = "0" ]; then
groupmod -o -g "$HERMES_GID" hermes 2>/dev/null || true
fi
# Fix ownership of the data volume. When HERMES_UID remaps the hermes user,
# files created by previous runs (under the old UID) become inaccessible.
# Always chown -R when UID was remapped; otherwise only if top-level is wrong.
actual_hermes_uid=$(id -u hermes)
needs_chown=false
if [ -n "$HERMES_UID" ] && [ "$HERMES_UID" != "10000" ]; then
needs_chown=true
elif [ "$(stat -c %u "$HERMES_HOME" 2>/dev/null)" != "$actual_hermes_uid" ]; then
needs_chown=true
fi
if [ "$needs_chown" = true ]; then
echo "Fixing ownership of $HERMES_HOME to hermes ($actual_hermes_uid)"
if [ "$(stat -c %u "$HERMES_HOME" 2>/dev/null)" != "$actual_hermes_uid" ]; then
echo "$HERMES_HOME is not owned by $actual_hermes_uid, fixing"
# In rootless Podman the container's "root" is mapped to an unprivileged
# host UID — chown will fail. That's fine: the volume is already owned
# by the mapped user on the host side.
+4 -1
View File
@@ -18,7 +18,10 @@ import logging
import os
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Set
from typing import Any, Dict, List, Optional, Set, TYPE_CHECKING
if TYPE_CHECKING:
from tools.budget_config import BudgetConfig
from model_tools import handle_function_call
from tools.terminal_tool import get_active_env
+3 -8
View File
@@ -135,7 +135,7 @@ class SessionResetPolicy:
mode=mode if mode is not None else "both",
at_hour=at_hour if at_hour is not None else 4,
idle_minutes=idle_minutes if idle_minutes is not None else 1440,
notify=_coerce_bool(notify, True),
notify=notify if notify is not None else True,
notify_exclude_platforms=tuple(exclude) if exclude is not None else ("api_server", "webhook"),
)
@@ -178,7 +178,7 @@ class PlatformConfig:
home_channel = HomeChannel.from_dict(data["home_channel"])
return cls(
enabled=_coerce_bool(data.get("enabled"), False),
enabled=data.get("enabled", False),
token=data.get("token"),
api_key=data.get("api_key"),
home_channel=home_channel,
@@ -435,7 +435,7 @@ class GatewayConfig:
reset_triggers=data.get("reset_triggers", ["/new", "/reset"]),
quick_commands=quick_commands,
sessions_dir=sessions_dir,
always_log_local=_coerce_bool(data.get("always_log_local"), True),
always_log_local=data.get("always_log_local", True),
stt_enabled=_coerce_bool(stt_enabled, True),
group_sessions_per_user=_coerce_bool(group_sessions_per_user, True),
thread_sessions_per_user=_coerce_bool(thread_sessions_per_user, False),
@@ -687,11 +687,6 @@ def load_gateway_config() -> GatewayConfig:
os.environ["TELEGRAM_REACTIONS"] = str(telegram_cfg["reactions"]).lower()
if "proxy_url" in telegram_cfg and not os.getenv("TELEGRAM_PROXY"):
os.environ["TELEGRAM_PROXY"] = str(telegram_cfg["proxy_url"]).strip()
if "group_allowed_chats" in telegram_cfg and not os.getenv("TELEGRAM_GROUP_ALLOWED_USERS"):
gac = telegram_cfg["group_allowed_chats"]
if isinstance(gac, list):
gac = ",".join(str(v) for v in gac)
os.environ["TELEGRAM_GROUP_ALLOWED_USERS"] = str(gac)
if "disable_link_previews" in telegram_cfg:
plat_data = platforms_data.setdefault(Platform.TELEGRAM.value, {})
if not isinstance(plat_data, dict):
+62 -168
View File
@@ -32,14 +32,7 @@ import sqlite3
import time
import uuid
from typing import Any, Dict, List, Optional
try:
from aiohttp import web
AIOHTTP_AVAILABLE = True
except ImportError:
AIOHTTP_AVAILABLE = False
web = None # type: ignore[assignment]
from aiohttp import web
from gateway.config import Platform, PlatformConfig
from gateway.platforms.base import (
BasePlatformAdapter,
@@ -270,12 +263,6 @@ def _multimodal_validation_error(exc: ValueError, *, param: str) -> "web.Respons
status=400,
)
def check_api_server_requirements() -> bool:
"""Check if API server dependencies are available."""
return AIOHTTP_AVAILABLE
class ResponseStore:
"""
SQLite-backed LRU store for Responses API state.
@@ -391,30 +378,26 @@ _CORS_HEADERS = {
}
if AIOHTTP_AVAILABLE:
@web.middleware
async def cors_middleware(request, handler):
"""Add CORS headers for explicitly allowed origins; handle OPTIONS preflight."""
adapter = request.app.get("api_server_adapter")
origin = request.headers.get("Origin", "")
cors_headers = None
if adapter is not None:
if not adapter._origin_allowed(origin):
return web.Response(status=403)
cors_headers = adapter._cors_headers_for_origin(origin)
@web.middleware
async def cors_middleware(request, handler):
"""Add CORS headers for explicitly allowed origins; handle OPTIONS preflight."""
adapter = request.app.get("api_server_adapter")
origin = request.headers.get("Origin", "")
cors_headers = None
if adapter is not None:
if not adapter._origin_allowed(origin):
return web.Response(status=403)
cors_headers = adapter._cors_headers_for_origin(origin)
if request.method == "OPTIONS":
if cors_headers is None:
return web.Response(status=403)
return web.Response(status=200, headers=cors_headers)
response = await handler(request)
if cors_headers is not None:
response.headers.update(cors_headers)
return response
else:
cors_middleware = None # type: ignore[assignment]
if request.method == "OPTIONS":
if cors_headers is None:
return web.Response(status=403)
return web.Response(status=200, headers=cors_headers)
response = await handler(request)
if cors_headers is not None:
response.headers.update(cors_headers)
return response
def _openai_error(message: str, err_type: str = "invalid_request_error", param: str = None, code: str = None) -> Dict[str, Any]:
"""OpenAI-style error envelope."""
@@ -428,21 +411,18 @@ def _openai_error(message: str, err_type: str = "invalid_request_error", param:
}
if AIOHTTP_AVAILABLE:
@web.middleware
async def body_limit_middleware(request, handler):
"""Reject overly large request bodies early based on Content-Length."""
if request.method in ("POST", "PUT", "PATCH"):
cl = request.headers.get("Content-Length")
if cl is not None:
try:
if int(cl) > MAX_REQUEST_BYTES:
return web.json_response(_openai_error("Request body too large.", code="body_too_large"), status=413)
except ValueError:
return web.json_response(_openai_error("Invalid Content-Length header.", code="invalid_content_length"), status=400)
return await handler(request)
else:
body_limit_middleware = None # type: ignore[assignment]
@web.middleware
async def body_limit_middleware(request, handler):
"""Reject overly large request bodies early based on Content-Length."""
if request.method in ("POST", "PUT", "PATCH"):
cl = request.headers.get("Content-Length")
if cl is not None:
try:
if int(cl) > MAX_REQUEST_BYTES:
return web.json_response(_openai_error("Request body too large.", code="body_too_large"), status=413)
except ValueError:
return web.json_response(_openai_error("Invalid Content-Length header.", code="invalid_content_length"), status=400)
return await handler(request)
_SECURITY_HEADERS = {
"X-Content-Type-Options": "nosniff",
@@ -450,16 +430,13 @@ _SECURITY_HEADERS = {
}
if AIOHTTP_AVAILABLE:
@web.middleware
async def security_headers_middleware(request, handler):
"""Add security headers to all responses (including errors)."""
response = await handler(request)
for k, v in _SECURITY_HEADERS.items():
response.headers.setdefault(k, v)
return response
else:
security_headers_middleware = None # type: ignore[assignment]
@web.middleware
async def security_headers_middleware(request, handler):
"""Add security headers to all responses (including errors)."""
response = await handler(request)
for k, v in _SECURITY_HEADERS.items():
response.headers.setdefault(k, v)
return response
class _IdempotencyCache:
@@ -804,7 +781,7 @@ class APIServerAdapter(BasePlatformAdapter):
],
})
async def _handle_chat_completions(self, request: "web.Request") -> "web.Response":
async def _handle_chat_completions(self, request: "web.Request") -> "web.StreamResponse":
"""POST /v1/chat/completions — OpenAI Chat Completions format."""
auth_err = self._check_auth(request)
if auth_err:
@@ -1204,12 +1181,10 @@ class APIServerAdapter(BasePlatformAdapter):
If the client disconnects mid-stream, ``agent.interrupt()`` is
called so the agent stops issuing upstream LLM calls, then the
asyncio task is cancelled. When ``store=True`` an initial
``in_progress`` snapshot is persisted immediately after
``response.created`` and disconnects update it to an
``incomplete`` snapshot so GET /v1/responses/{id} and
``previous_response_id`` chaining still have something to
recover from.
asyncio task is cancelled. When ``store=True`` the full response
is persisted to the ResponseStore in a ``finally`` block so GET
/v1/responses/{id} and ``previous_response_id`` chaining work the
same as the batch path.
"""
import queue as _q
@@ -1271,60 +1246,6 @@ class APIServerAdapter(BasePlatformAdapter):
final_response_text = ""
agent_error: Optional[str] = None
usage: Dict[str, int] = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
terminal_snapshot_persisted = False
def _persist_response_snapshot(
response_env: Dict[str, Any],
*,
conversation_history_snapshot: Optional[List[Dict[str, Any]]] = None,
) -> None:
if not store:
return
if conversation_history_snapshot is None:
conversation_history_snapshot = list(conversation_history)
conversation_history_snapshot.append({"role": "user", "content": user_message})
self._response_store.put(response_id, {
"response": response_env,
"conversation_history": conversation_history_snapshot,
"instructions": instructions,
"session_id": session_id,
})
if conversation:
self._response_store.set_conversation(conversation, response_id)
def _persist_incomplete_if_needed() -> None:
"""Persist an ``incomplete`` snapshot if no terminal one was written.
Called from both the client-disconnect (``ConnectionResetError``)
and server-cancellation (``asyncio.CancelledError``) paths so
GET /v1/responses/{id} and ``previous_response_id`` chaining keep
working after abrupt stream termination.
"""
if not store or terminal_snapshot_persisted:
return
incomplete_text = "".join(final_text_parts) or final_response_text
incomplete_items: List[Dict[str, Any]] = list(emitted_items)
if incomplete_text:
incomplete_items.append({
"type": "message",
"role": "assistant",
"content": [{"type": "output_text", "text": incomplete_text}],
})
incomplete_env = _envelope("incomplete")
incomplete_env["output"] = incomplete_items
incomplete_env["usage"] = {
"input_tokens": usage.get("input_tokens", 0),
"output_tokens": usage.get("output_tokens", 0),
"total_tokens": usage.get("total_tokens", 0),
}
incomplete_history = list(conversation_history)
incomplete_history.append({"role": "user", "content": user_message})
if incomplete_text:
incomplete_history.append({"role": "assistant", "content": incomplete_text})
_persist_response_snapshot(
incomplete_env,
conversation_history_snapshot=incomplete_history,
)
try:
# response.created — initial envelope, status=in_progress
@@ -1334,7 +1255,6 @@ class APIServerAdapter(BasePlatformAdapter):
"type": "response.created",
"response": created_env,
})
_persist_response_snapshot(created_env)
last_activity = time.monotonic()
async def _open_message_item() -> None:
@@ -1591,18 +1511,6 @@ class APIServerAdapter(BasePlatformAdapter):
"output_tokens": usage.get("output_tokens", 0),
"total_tokens": usage.get("total_tokens", 0),
}
_failed_history = list(conversation_history)
_failed_history.append({"role": "user", "content": user_message})
if final_response_text or agent_error:
_failed_history.append({
"role": "assistant",
"content": final_response_text or agent_error,
})
_persist_response_snapshot(
failed_env,
conversation_history_snapshot=_failed_history,
)
terminal_snapshot_persisted = True
await _write_event("response.failed", {
"type": "response.failed",
"response": failed_env,
@@ -1615,24 +1523,30 @@ class APIServerAdapter(BasePlatformAdapter):
"output_tokens": usage.get("output_tokens", 0),
"total_tokens": usage.get("total_tokens", 0),
}
full_history = list(conversation_history)
full_history.append({"role": "user", "content": user_message})
if isinstance(result, dict) and result.get("messages"):
full_history.extend(result["messages"])
else:
full_history.append({"role": "assistant", "content": final_response_text})
_persist_response_snapshot(
completed_env,
conversation_history_snapshot=full_history,
)
terminal_snapshot_persisted = True
await _write_event("response.completed", {
"type": "response.completed",
"response": completed_env,
})
# Persist for future chaining / GET retrieval, mirroring
# the batch path behavior.
if store:
full_history = list(conversation_history)
full_history.append({"role": "user", "content": user_message})
if isinstance(result, dict) and result.get("messages"):
full_history.extend(result["messages"])
else:
full_history.append({"role": "assistant", "content": final_response_text})
self._response_store.put(response_id, {
"response": completed_env,
"conversation_history": full_history,
"instructions": instructions,
"session_id": session_id,
})
if conversation:
self._response_store.set_conversation(conversation, response_id)
except (ConnectionResetError, ConnectionAbortedError, BrokenPipeError, OSError):
_persist_incomplete_if_needed()
# Client disconnected — interrupt the agent so it stops
# making upstream LLM calls, then cancel the task.
agent = agent_ref[0] if agent_ref else None
@@ -1648,26 +1562,10 @@ class APIServerAdapter(BasePlatformAdapter):
except (asyncio.CancelledError, Exception):
pass
logger.info("SSE client disconnected; interrupted agent task %s", response_id)
except asyncio.CancelledError:
# Server-side cancellation (e.g. shutdown, request timeout) —
# persist an incomplete snapshot so GET /v1/responses/{id} and
# previous_response_id chaining still work, then re-raise so the
# runtime's cancellation semantics are respected.
_persist_incomplete_if_needed()
agent = agent_ref[0] if agent_ref else None
if agent is not None:
try:
agent.interrupt("SSE task cancelled")
except Exception:
pass
if not agent_task.done():
agent_task.cancel()
logger.info("SSE task cancelled; persisted incomplete snapshot for %s", response_id)
raise
return response
async def _handle_responses(self, request: "web.Request") -> "web.Response":
async def _handle_responses(self, request: "web.Request") -> "web.StreamResponse":
"""POST /v1/responses — OpenAI Responses API format."""
auth_err = self._check_auth(request)
if auth_err:
@@ -2561,10 +2459,6 @@ class APIServerAdapter(BasePlatformAdapter):
async def connect(self) -> bool:
"""Start the aiohttp web server."""
if not AIOHTTP_AVAILABLE:
logger.warning("[%s] aiohttp not installed", self.name)
return False
try:
mws = [mw for mw in (cors_middleware, body_limit_middleware, security_headers_middleware) if mw is not None]
self._app = web.Application(middlewares=mws)
+43 -397
View File
@@ -148,102 +148,7 @@ def _detect_macos_system_proxy() -> str | None:
return None
def _split_host_port(value: str) -> tuple[str, int | None]:
raw = str(value or "").strip()
if not raw:
return "", None
if "://" in raw:
parsed = urlsplit(raw)
return (parsed.hostname or "").lower().rstrip("."), parsed.port
if raw.startswith("[") and "]" in raw:
host, _, rest = raw[1:].partition("]")
port = None
if rest.startswith(":") and rest[1:].isdigit():
port = int(rest[1:])
return host.lower().rstrip("."), port
if raw.count(":") == 1:
host, _, maybe_port = raw.rpartition(":")
if maybe_port.isdigit():
return host.lower().rstrip("."), int(maybe_port)
return raw.lower().strip("[]").rstrip("."), None
def _no_proxy_entries() -> list[str]:
entries: list[str] = []
for key in ("NO_PROXY", "no_proxy"):
raw = os.environ.get(key, "")
entries.extend(part.strip() for part in raw.split(",") if part.strip())
return entries
def _no_proxy_entry_matches(entry: str, host: str, port: int | None = None) -> bool:
token = str(entry or "").strip().lower()
if not token:
return False
if token == "*":
return True
token_host, token_port = _split_host_port(token)
if token_port is not None and port is not None and token_port != port:
return False
if token_port is not None and port is None:
return False
if not token_host:
return False
try:
network = ipaddress.ip_network(token_host, strict=False)
try:
return ipaddress.ip_address(host) in network
except ValueError:
return False
except ValueError:
pass
try:
token_ip = ipaddress.ip_address(token_host)
try:
return ipaddress.ip_address(host) == token_ip
except ValueError:
return False
except ValueError:
pass
if token_host.startswith("*."):
suffix = token_host[1:]
return host.endswith(suffix)
if token_host.startswith("."):
return host == token_host[1:] or host.endswith(token_host)
return host == token_host or host.endswith(f".{token_host}")
def should_bypass_proxy(target_hosts: str | list[str] | tuple[str, ...] | set[str] | None) -> bool:
"""Return True when NO_PROXY/no_proxy matches at least one target host.
Supports exact hosts, domain suffixes, wildcard suffixes, IP literals,
CIDR ranges, optional host:port entries, and ``*``.
"""
entries = _no_proxy_entries()
if not entries or not target_hosts:
return False
if isinstance(target_hosts, str):
candidates = [target_hosts]
else:
candidates = list(target_hosts)
for candidate in candidates:
host, port = _split_host_port(str(candidate))
if not host:
continue
if any(_no_proxy_entry_matches(entry, host, port) for entry in entries):
return True
return False
def resolve_proxy_url(
platform_env_var: str | None = None,
*,
target_hosts: str | list[str] | tuple[str, ...] | set[str] | None = None,
) -> str | None:
def resolve_proxy_url(platform_env_var: str | None = None) -> str | None:
"""Return a proxy URL from env vars, or macOS system proxy.
Check order:
@@ -251,26 +156,18 @@ def resolve_proxy_url(
1. HTTPS_PROXY / HTTP_PROXY / ALL_PROXY (and lowercase variants)
2. macOS system proxy via ``scutil --proxy`` (auto-detect)
Returns *None* if no proxy is found, or if NO_PROXY/no_proxy matches one
of ``target_hosts``.
Returns *None* if no proxy is found.
"""
if platform_env_var:
value = (os.environ.get(platform_env_var) or "").strip()
if value:
if should_bypass_proxy(target_hosts):
return None
return normalize_proxy_url(value)
for key in ("HTTPS_PROXY", "HTTP_PROXY", "ALL_PROXY",
"https_proxy", "http_proxy", "all_proxy"):
value = (os.environ.get(key) or "").strip()
if value:
if should_bypass_proxy(target_hosts):
return None
return normalize_proxy_url(value)
detected = normalize_proxy_url(_detect_macos_system_proxy())
if detected and should_bypass_proxy(target_hosts):
return None
return detected
return normalize_proxy_url(_detect_macos_system_proxy())
def proxy_kwargs_for_bot(proxy_url: str | None) -> dict:
@@ -290,16 +187,14 @@ def proxy_kwargs_for_bot(proxy_url: str | None) -> dict:
if proxy_url.lower().startswith("socks"):
try:
from aiohttp_socks import ProxyConnector
connector = ProxyConnector.from_url(proxy_url, rdns=True)
return {"connector": connector}
except ImportError:
logger.warning(
"aiohttp_socks not installed — SOCKS proxy %s ignored. "
"Run: pip install aiohttp-socks",
proxy_url,
)
return {}
raise ImportError(
"aiohttp-socks is required for SOCKS proxy support. "
"Install with: pip install hermes-agent[messaging]"
) from None
connector = ProxyConnector.from_url(proxy_url, rdns=True)
return {"connector": connector}
return {"proxy": proxy_url}
@@ -323,16 +218,14 @@ def proxy_kwargs_for_aiohttp(proxy_url: str | None) -> tuple[dict, dict]:
if proxy_url.lower().startswith("socks"):
try:
from aiohttp_socks import ProxyConnector
connector = ProxyConnector.from_url(proxy_url, rdns=True)
return {"connector": connector}, {}
except ImportError:
logger.warning(
"aiohttp_socks not installed — SOCKS proxy %s ignored. "
"Run: pip install aiohttp-socks",
proxy_url,
)
return {}, {}
raise ImportError(
"aiohttp-socks is required for SOCKS proxy support. "
"Install with: pip install hermes-agent[messaging]"
) from None
connector = ProxyConnector.from_url(proxy_url, rdns=True)
return {"connector": connector}, {}
return {}, {"proxy": proxy_url}
@@ -531,6 +424,7 @@ async def cache_image_from_url(url: str, ext: str = ".jpg", retries: int = 2) ->
await asyncio.sleep(wait)
continue
raise
raise AssertionError("unreachable: retry loop exhausted")
def cleanup_image_cache(max_age_hours: int = 24) -> int:
@@ -645,6 +539,7 @@ async def cache_audio_from_url(url: str, ext: str = ".ogg", retries: int = 2) ->
await asyncio.sleep(wait)
continue
raise
raise AssertionError("unreachable: retry loop exhausted")
# ---------------------------------------------------------------------------
@@ -1003,16 +898,10 @@ class BasePlatformAdapter(ABC):
self._fatal_error_retryable = True
self._fatal_error_handler: Optional[Callable[["BasePlatformAdapter"], Awaitable[None] | None]] = None
# Track active message handlers per session for interrupt support.
# _active_sessions stores the per-session interrupt Event; _session_tasks
# maps session → the specific Task currently processing it so that
# session-terminating commands (/stop, /new, /reset) can cancel the
# right task and release the adapter-level guard deterministically.
# Without the owner-task map, an old task's finally block could delete
# a newer task's guard, leaving stale busy state.
# Track active message handlers per session for interrupt support
# Key: session_key (e.g., chat_id), Value: (event, asyncio.Event for interrupt)
self._active_sessions: Dict[str, asyncio.Event] = {}
self._pending_messages: Dict[str, MessageEvent] = {}
self._session_tasks: Dict[str, asyncio.Task] = {}
# Background message-processing tasks spawned by handle_message().
# Gateway shutdown cancels these so an old gateway instance doesn't keep
# working on a task after --replace or manual restarts.
@@ -1789,222 +1678,6 @@ class BasePlatformAdapter(ABC):
return f"{existing_text}\n\n{new_text}".strip()
return existing_text
# ------------------------------------------------------------------
# Session task + guard ownership helpers
# ------------------------------------------------------------------
# These were introduced together with the _session_tasks owner map to
# make session lifecycle reconciliation deterministic across (a) the
# normal completion path, (b) /stop/ /new/ /reset bypass commands,
# and (c) stale-lock self-heal on the next inbound message.
def _release_session_guard(
self,
session_key: str,
*,
guard: Optional[asyncio.Event] = None,
) -> None:
"""Release the adapter-level guard for a session.
When ``guard`` is provided, only release the entry if it still points
at that exact Event. This lets reset-like commands swap in a temporary
guard while the old processing task unwinds, without having the old
task's cleanup accidentally clear the replacement guard.
"""
current_guard = self._active_sessions.get(session_key)
if current_guard is None:
return
if guard is not None and current_guard is not guard:
return
del self._active_sessions[session_key]
def _session_task_is_stale(self, session_key: str) -> bool:
"""Return True if the owner task for ``session_key`` is done/cancelled.
A lock is "stale" when the adapter still has ``_active_sessions[key]``
AND a known owner task in ``_session_tasks`` that has already exited.
When there is no owner task at all, that usually means the guard was
installed by some path other than handle_message() (tests sometimes
install guards directly) don't treat that as stale. The on-entry
self-heal only needs to handle the production split-brain case where
an owner task was recorded, then exited without clearing its guard.
"""
task = self._session_tasks.get(session_key)
if task is None:
return False
done = getattr(task, "done", None)
return bool(done and done())
def _heal_stale_session_lock(self, session_key: str) -> bool:
"""Clear a stale session lock if the owner task is already gone.
Returns True if a stale lock was healed. Returns False if there is
no lock, or the owner task is still alive (the normal busy case).
This is the on-entry safety net sidbin's issue #11016 analysis calls
for: without it, a split-brain adapter still thinks the session is
active, but nothing is actually processing traps the chat in
infinite "Interrupting current task..." until the gateway is
restarted.
"""
if session_key not in self._active_sessions:
return False
if not self._session_task_is_stale(session_key):
return False
logger.warning(
"[%s] Healing stale session lock for %s (owner task is done/absent)",
self.name,
session_key,
)
self._active_sessions.pop(session_key, None)
self._pending_messages.pop(session_key, None)
self._session_tasks.pop(session_key, None)
return True
def _start_session_processing(
self,
event: MessageEvent,
session_key: str,
*,
interrupt_event: Optional[asyncio.Event] = None,
) -> bool:
"""Spawn a background processing task under the given session guard.
Returns True on success. If the runtime stubs ``create_task`` with a
non-Task sentinel (some tests do this), the guard is rolled back and
False is returned so the caller isn't left holding a half-installed
session lock.
"""
guard = interrupt_event or asyncio.Event()
self._active_sessions[session_key] = guard
task = asyncio.create_task(self._process_message_background(event, session_key))
self._session_tasks[session_key] = task
try:
self._background_tasks.add(task)
except TypeError:
# Tests stub create_task() with lightweight sentinels that are not
# hashable and do not support lifecycle callbacks.
self._session_tasks.pop(session_key, None)
self._release_session_guard(session_key, guard=guard)
return False
if hasattr(task, "add_done_callback"):
task.add_done_callback(self._background_tasks.discard)
task.add_done_callback(self._expected_cancelled_tasks.discard)
return True
async def cancel_session_processing(
self,
session_key: str,
*,
release_guard: bool = True,
discard_pending: bool = True,
) -> None:
"""Cancel in-flight processing for a single session.
``release_guard=False`` keeps the adapter-level session guard in place
so reset-like commands can finish atomically before follow-up messages
are allowed to start a fresh background task.
"""
task = self._session_tasks.pop(session_key, None)
if task is not None and not task.done():
logger.debug(
"[%s] Cancelling active processing for session %s",
self.name,
session_key,
)
self._expected_cancelled_tasks.add(task)
task.cancel()
try:
await task
except asyncio.CancelledError:
pass
except Exception:
logger.debug(
"[%s] Session cancellation raised while unwinding %s",
self.name,
session_key,
exc_info=True,
)
if discard_pending:
self._pending_messages.pop(session_key, None)
if release_guard:
self._release_session_guard(session_key)
async def _drain_pending_after_session_command(
self,
session_key: str,
command_guard: asyncio.Event,
) -> None:
"""Resume the latest queued follow-up once a session command completes.
Called at the tail of /stop, /new, and /reset dispatch. Releases the
command-scoped guard, then if a follow-up message landed while the
command was running spawns a fresh processing task for it.
"""
pending_event = self._pending_messages.pop(session_key, None)
self._release_session_guard(session_key, guard=command_guard)
if pending_event is None:
return
self._start_session_processing(pending_event, session_key)
async def _dispatch_active_session_command(
self,
event: MessageEvent,
session_key: str,
cmd: str,
) -> None:
"""Dispatch a reset-like bypass command while preserving guard ordering.
/stop, /new, and /reset must:
1. Keep the session guard installed while the runner processes the
command (so a racing follow-up message stays queued, not
dispatched as a second parallel run).
2. Cancel the old in-flight adapter task only AFTER the runner has
finished handling the command (so the runner sees consistent
state and its response is sent in order).
3. Release the command-scoped guard and drain the latest queued
follow-up exactly once, after 1 and 2 complete.
"""
logger.debug(
"[%s] Command '/%s' bypassing active-session guard for %s",
self.name,
cmd,
session_key,
)
current_guard = self._active_sessions.get(session_key)
command_guard = asyncio.Event()
self._active_sessions[session_key] = command_guard
thread_meta = {"thread_id": event.source.thread_id} if event.source.thread_id else None
try:
response = await self._message_handler(event)
# Old adapter task (if any) is cancelled AFTER the runner has
# fully handled the command — keeps ordering deterministic.
await self.cancel_session_processing(
session_key,
release_guard=False,
discard_pending=False,
)
if response:
await self._send_with_retry(
chat_id=event.source.chat_id,
content=response,
reply_to=event.message_id,
metadata=thread_meta,
)
except Exception:
# On failure, restore the original guard if one still exists so
# we don't leave the session in a half-reset state.
if self._active_sessions.get(session_key) is command_guard:
if session_key in self._session_tasks and current_guard is not None:
self._active_sessions[session_key] = current_guard
else:
self._release_session_guard(session_key, guard=command_guard)
raise
await self._drain_pending_after_session_command(session_key, command_guard)
async def handle_message(self, event: MessageEvent) -> None:
"""
Process an incoming message.
@@ -2021,15 +1694,7 @@ class BasePlatformAdapter(ABC):
group_sessions_per_user=self.config.extra.get("group_sessions_per_user", True),
thread_sessions_per_user=self.config.extra.get("thread_sessions_per_user", False),
)
# On-entry self-heal: if the adapter still has an _active_sessions
# entry for this key but the owner task has already exited (done or
# cancelled), the lock is stale. Clear it and fall through to
# normal dispatch so the user isn't trapped behind a dead guard —
# this is the split-brain tail described in issue #11016.
if session_key in self._active_sessions:
self._heal_stale_session_lock(session_key)
# Check if there's already an active handler for this session
if session_key in self._active_sessions:
# Certain commands must bypass the active-session guard and be
@@ -2046,23 +1711,6 @@ class BasePlatformAdapter(ABC):
from hermes_cli.commands import should_bypass_active_session
if should_bypass_active_session(cmd):
# /stop, /new, /reset must cancel the in-flight adapter task
# and preserve ordering of queued follow-ups. Route those
# through the dedicated handoff path that serializes
# cancellation + runner response + pending drain.
if cmd in ("stop", "new", "reset"):
try:
await self._dispatch_active_session_command(event, session_key, cmd)
except Exception as e:
logger.error(
"[%s] Command '/%s' dispatch failed: %s",
self.name, cmd, e, exc_info=True,
)
return
# Other bypass commands (/approve, /deny, /status,
# /background, /restart) just need direct dispatch — they
# don't cancel the running task.
logger.debug(
"[%s] Command '/%s' bypassing active-session guard for %s",
self.name, cmd, session_key,
@@ -2108,9 +1756,19 @@ class BasePlatformAdapter(ABC):
# starts would also pass the _active_sessions check and spawn a
# duplicate task. (grammY sequentialize / aiogram EventIsolation
# pattern — set the guard synchronously, not inside the task.)
# _start_session_processing installs the guard AND the owner-task
# mapping atomically so stale-lock detection works.
self._start_session_processing(event, session_key)
self._active_sessions[session_key] = asyncio.Event()
# Spawn background task to process this message
task = asyncio.create_task(self._process_message_background(event, session_key))
try:
self._background_tasks.add(task)
except TypeError:
# Some tests stub create_task() with lightweight sentinels that are not
# hashable and do not support lifecycle callbacks.
return
if hasattr(task, "add_done_callback"):
task.add_done_callback(self._background_tasks.discard)
task.add_done_callback(self._expected_cancelled_tasks.discard)
@staticmethod
def _get_human_delay() -> float:
@@ -2171,8 +1829,11 @@ class BasePlatformAdapter(ABC):
try:
await self._run_processing_hook("on_processing_start", event)
# Call the handler (this can take a while with tool calls)
response = await self._message_handler(event)
handler = self._message_handler
if handler is None:
return
response = await handler(event)
# Send response if any. A None/empty response is normal when
# streaming already delivered the text (already_sent=True) or
@@ -2470,9 +2131,6 @@ class BasePlatformAdapter(ABC):
drain_task = asyncio.create_task(
self._process_message_background(late_pending, session_key)
)
# Hand ownership of the session to the drain task so stale-lock
# detection keeps working while it runs.
self._session_tasks[session_key] = drain_task
try:
self._background_tasks.add(drain_task)
drain_task.add_done_callback(self._background_tasks.discard)
@@ -2482,14 +2140,9 @@ class BasePlatformAdapter(ABC):
# Leave _active_sessions[session_key] populated — the drain
# task's own lifecycle will clean it up.
else:
# Clean up session tracking. Guard-match both deletes so a
# reset-like command that already swapped in its own
# command_guard (and cancelled us) can't be accidentally
# cleared by our unwind. The command owns the session now.
current_task = asyncio.current_task()
if current_task is not None and self._session_tasks.get(session_key) is current_task:
del self._session_tasks[session_key]
self._release_session_guard(session_key, guard=interrupt_event)
# Clean up session tracking
if session_key in self._active_sessions:
del self._active_sessions[session_key]
async def cancel_background_tasks(self) -> None:
"""Cancel any in-flight background message-processing tasks.
@@ -2519,7 +2172,6 @@ class BasePlatformAdapter(ABC):
# will be in self._background_tasks now. Re-check.
self._background_tasks.clear()
self._expected_cancelled_tasks.clear()
self._session_tasks.clear()
self._pending_messages.clear()
self._active_sessions.clear()
@@ -2543,9 +2195,6 @@ class BasePlatformAdapter(ABC):
user_id_alt: Optional[str] = None,
chat_id_alt: Optional[str] = None,
is_bot: bool = False,
guild_id: Optional[str] = None,
parent_chat_id: Optional[str] = None,
message_id: Optional[str] = None,
) -> SessionSource:
"""Helper to build a SessionSource for this platform."""
# Normalize empty topic to None
@@ -2563,9 +2212,6 @@ class BasePlatformAdapter(ABC):
user_id_alt=user_id_alt,
chat_id_alt=chat_id_alt,
is_bot=is_bot,
guild_id=str(guild_id) if guild_id else None,
parent_chat_id=str(parent_chat_id) if parent_chat_id else None,
message_id=str(message_id) if message_id else None,
)
@abstractmethod
+5 -22
View File
@@ -14,7 +14,7 @@ import logging
import os
import re
import uuid
from datetime import datetime
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional
from urllib.parse import quote
@@ -99,7 +99,6 @@ def _normalize_server_url(raw: str) -> str:
class BlueBubblesAdapter(BasePlatformAdapter):
platform = Platform.BLUEBUBBLES
SUPPORTS_MESSAGE_EDITING = False
MAX_MESSAGE_LENGTH = MAX_TEXT_LENGTH
def __init__(self, config: PlatformConfig):
@@ -378,7 +377,7 @@ class BlueBubblesAdapter(BasePlatformAdapter):
payload = {
"addresses": [address],
"message": message,
"tempGuid": f"temp-{datetime.utcnow().timestamp()}",
"tempGuid": f"temp-{datetime.now(timezone.utc).timestamp()}",
}
try:
res = await self._api_post("/api/v1/chat/new", payload)
@@ -392,13 +391,6 @@ class BlueBubblesAdapter(BasePlatformAdapter):
# Text sending
# ------------------------------------------------------------------
@staticmethod
def truncate_message(content: str, max_length: int = MAX_TEXT_LENGTH) -> List[str]:
# Use the base splitter but skip pagination indicators — iMessage
# bubbles flow naturally without "(1/3)" suffixes.
chunks = BasePlatformAdapter.truncate_message(content, max_length)
return [re.sub(r"\s*\(\d+/\d+\)$", "", c) for c in chunks]
async def send(
self,
chat_id: str,
@@ -406,19 +398,10 @@ class BlueBubblesAdapter(BasePlatformAdapter):
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
text = self.format_message(content)
text = strip_markdown(content or "")
if not text:
return SendResult(success=False, error="BlueBubbles send requires text")
# Split on paragraph breaks first (double newlines) so each thought
# becomes its own iMessage bubble, then truncate any that are still
# too long.
paragraphs = [p.strip() for p in re.split(r'\n\s*\n', text) if p.strip()]
chunks: List[str] = []
for para in (paragraphs or [text]):
if len(para) <= self.MAX_MESSAGE_LENGTH:
chunks.append(para)
else:
chunks.extend(self.truncate_message(para, max_length=self.MAX_MESSAGE_LENGTH))
chunks = self.truncate_message(text, max_length=self.MAX_MESSAGE_LENGTH)
last = SendResult(success=True)
for chunk in chunks:
guid = await self._resolve_chat_guid(chat_id)
@@ -434,7 +417,7 @@ class BlueBubblesAdapter(BasePlatformAdapter):
)
payload: Dict[str, Any] = {
"chatGuid": guid,
"tempGuid": f"temp-{datetime.utcnow().timestamp()}",
"tempGuid": f"temp-{datetime.now(timezone.utc).timestamp()}",
"message": chunk,
}
if reply_to and self._private_api_enabled and self._helper_connected:
+30 -233
View File
@@ -23,7 +23,6 @@ from typing import Callable, Dict, Optional, Any
logger = logging.getLogger(__name__)
VALID_THREAD_AUTO_ARCHIVE_MINUTES = {60, 1440, 4320, 10080}
_DISCORD_COMMAND_SYNC_POLICIES = {"safe", "bulk", "off"}
try:
import discord
@@ -803,27 +802,8 @@ class DiscordAdapter(BasePlatformAdapter):
if not self._client:
return
try:
sync_policy = self._get_discord_command_sync_policy()
if sync_policy == "off":
logger.info("[%s] Skipping Discord slash command sync (policy=off)", self.name)
return
if sync_policy == "bulk":
synced = await asyncio.wait_for(self._client.tree.sync(), timeout=30)
logger.info("[%s] Synced %d slash command(s) via bulk tree sync", self.name, len(synced))
return
summary = await asyncio.wait_for(self._safe_sync_slash_commands(), timeout=30)
logger.info(
"[%s] Safely reconciled %d slash command(s): unchanged=%d updated=%d recreated=%d created=%d deleted=%d",
self.name,
summary["total"],
summary["unchanged"],
summary["updated"],
summary["recreated"],
summary["created"],
summary["deleted"],
)
synced = await asyncio.wait_for(self._client.tree.sync(), timeout=30)
logger.info("[%s] Synced %d slash command(s)", self.name, len(synced))
except asyncio.TimeoutError:
logger.warning("[%s] Slash command sync timed out after 30s", self.name)
except asyncio.CancelledError:
@@ -831,183 +811,6 @@ class DiscordAdapter(BasePlatformAdapter):
except Exception as e: # pragma: no cover - defensive logging
logger.warning("[%s] Slash command sync failed: %s", self.name, e, exc_info=True)
def _get_discord_command_sync_policy(self) -> str:
raw = str(os.getenv("DISCORD_COMMAND_SYNC_POLICY", "safe") or "").strip().lower()
if raw in _DISCORD_COMMAND_SYNC_POLICIES:
return raw
if raw:
logger.warning(
"[%s] Invalid DISCORD_COMMAND_SYNC_POLICY=%r; falling back to 'safe'",
self.name,
raw,
)
return "safe"
def _canonicalize_app_command_payload(self, payload: Dict[str, Any]) -> Dict[str, Any]:
"""Reduce command payloads to the semantic fields Hermes manages."""
contexts = payload.get("contexts")
integration_types = payload.get("integration_types")
return {
"type": int(payload.get("type", 1) or 1),
"name": str(payload.get("name", "") or ""),
"description": str(payload.get("description", "") or ""),
"default_member_permissions": self._normalize_permissions(
payload.get("default_member_permissions")
),
"dm_permission": bool(payload.get("dm_permission", True)),
"nsfw": bool(payload.get("nsfw", False)),
"contexts": sorted(int(c) for c in contexts) if contexts else None,
"integration_types": (
sorted(int(i) for i in integration_types) if integration_types else None
),
"options": [
self._canonicalize_app_command_option(item)
for item in payload.get("options", []) or []
if isinstance(item, dict)
],
}
@staticmethod
def _normalize_permissions(value: Any) -> Optional[str]:
"""Discord emits default_member_permissions as str server-side but discord.py
sets it as int locally. Normalize to str-or-None so the comparison is stable."""
if value is None:
return None
return str(value)
def _existing_command_to_payload(self, command: Any) -> Dict[str, Any]:
"""Build a canonical-ready dict from an AppCommand.
discord.py's AppCommand.to_dict() does NOT include nsfw,
dm_permission, or default_member_permissions (they live only on the
attributes). Pull them from the attributes so the canonicalizer sees
the real server-side values instead of defaults otherwise any
command using non-default permissions would diff on every startup.
"""
payload = dict(command.to_dict())
nsfw = getattr(command, "nsfw", None)
if nsfw is not None:
payload["nsfw"] = bool(nsfw)
guild_only = getattr(command, "guild_only", None)
if guild_only is not None:
payload["dm_permission"] = not bool(guild_only)
default_permissions = getattr(command, "default_member_permissions", None)
if default_permissions is not None:
payload["default_member_permissions"] = getattr(
default_permissions, "value", default_permissions
)
return payload
def _canonicalize_app_command_option(self, payload: Dict[str, Any]) -> Dict[str, Any]:
return {
"type": int(payload.get("type", 0) or 0),
"name": str(payload.get("name", "") or ""),
"description": str(payload.get("description", "") or ""),
"required": bool(payload.get("required", False)),
"autocomplete": bool(payload.get("autocomplete", False)),
"choices": [
{
"name": str(choice.get("name", "") or ""),
"value": choice.get("value"),
}
for choice in payload.get("choices", []) or []
if isinstance(choice, dict)
],
"channel_types": list(payload.get("channel_types", []) or []),
"min_value": payload.get("min_value"),
"max_value": payload.get("max_value"),
"min_length": payload.get("min_length"),
"max_length": payload.get("max_length"),
"options": [
self._canonicalize_app_command_option(item)
for item in payload.get("options", []) or []
if isinstance(item, dict)
],
}
def _patchable_app_command_payload(self, payload: Dict[str, Any]) -> Dict[str, Any]:
"""Fields supported by discord.py's edit_global_command route."""
canonical = self._canonicalize_app_command_payload(payload)
return {
"name": canonical["name"],
"description": canonical["description"],
"options": canonical["options"],
}
async def _safe_sync_slash_commands(self) -> Dict[str, int]:
"""Diff existing global commands and only mutate the commands that changed."""
if not self._client:
return {
"total": 0,
"unchanged": 0,
"updated": 0,
"recreated": 0,
"created": 0,
"deleted": 0,
}
tree = self._client.tree
app_id = getattr(self._client, "application_id", None) or getattr(getattr(self._client, "user", None), "id", None)
if not app_id:
raise RuntimeError("Discord application ID is unavailable for slash command sync")
desired_payloads = [command.to_dict(tree) for command in tree.get_commands()]
desired_by_key = {
(int(payload.get("type", 1) or 1), str(payload.get("name", "") or "").lower()): payload
for payload in desired_payloads
}
existing_commands = await tree.fetch_commands()
existing_by_key = {
(
int(getattr(getattr(command, "type", None), "value", getattr(command, "type", 1)) or 1),
str(command.name or "").lower(),
): command
for command in existing_commands
}
unchanged = 0
updated = 0
recreated = 0
created = 0
deleted = 0
http = self._client.http
for key, desired in desired_by_key.items():
current = existing_by_key.pop(key, None)
if current is None:
await http.upsert_global_command(app_id, desired)
created += 1
continue
current_existing_payload = self._existing_command_to_payload(current)
current_payload = self._canonicalize_app_command_payload(current_existing_payload)
desired_payload = self._canonicalize_app_command_payload(desired)
if current_payload == desired_payload:
unchanged += 1
continue
if self._patchable_app_command_payload(current_existing_payload) == self._patchable_app_command_payload(desired):
await http.delete_global_command(app_id, current.id)
await http.upsert_global_command(app_id, desired)
recreated += 1
continue
await http.edit_global_command(app_id, current.id, desired)
updated += 1
for current in existing_by_key.values():
await http.delete_global_command(app_id, current.id)
deleted += 1
return {
"total": len(desired_payloads),
"unchanged": unchanged,
"updated": updated,
"recreated": recreated,
"created": created,
"deleted": deleted,
}
async def _add_reaction(self, message: Any, emoji: str) -> bool:
"""Add an emoji reaction to a Discord message."""
if not message or not hasattr(message, "add_reaction"):
@@ -1393,9 +1196,16 @@ class DiscordAdapter(BasePlatformAdapter):
try:
import base64
duration_secs = 5.0
try:
from mutagen.oggopus import OggOpus
except ImportError:
raise ImportError(
"mutagen is required for Discord voice messages. "
"Install with: pip install hermes-agent[messaging]"
) from None
duration_secs = 5.0
try:
info = OggOpus(audio_path)
duration_secs = info.info.length
except Exception:
@@ -2088,7 +1898,7 @@ class DiscordAdapter(BasePlatformAdapter):
# Fetch full member list (requires members intent)
try:
members = guild.members
if len(members) < guild.member_count:
if guild.member_count is not None and len(members) < guild.member_count:
members = [m async for m in guild.fetch_members(limit=None)]
except Exception as e:
logger.warning("Failed to fetch members for guild %s: %s", guild.name, e)
@@ -2246,6 +2056,10 @@ class DiscordAdapter(BasePlatformAdapter):
async def slash_usage(interaction: discord.Interaction):
await self._run_simple_slash(interaction, "/usage")
@tree.command(name="provider", description="Show available providers")
async def slash_provider(interaction: discord.Interaction):
await self._run_simple_slash(interaction, "/provider")
@tree.command(name="help", description="Show available commands")
async def slash_help(interaction: discord.Interaction):
await self._run_simple_slash(interaction, "/help")
@@ -2697,7 +2511,7 @@ class DiscordAdapter(BasePlatformAdapter):
if isinstance(skills, str):
return [skills]
if isinstance(skills, list) and skills:
return list(dict.fromkeys(skills)) # dedup, preserve order
return list(dict.fromkeys(skills)) # ty: ignore[invalid-return-type] # dedup, preserve order
return None
def _resolve_channel_prompt(self, channel_id: str, parent_id: str | None = None) -> str | None:
@@ -2715,12 +2529,7 @@ class DiscordAdapter(BasePlatformAdapter):
return os.getenv("DISCORD_REQUIRE_MENTION", "true").lower() not in ("false", "0", "no", "off")
def _discord_free_response_channels(self) -> set:
"""Return Discord channel IDs where no bot mention is required.
A single ``"*"`` entry (either from a list or a comma-separated
string) is preserved in the returned set so callers can short-circuit
on wildcard membership, consistent with ``allowed_channels``.
"""
"""Return Discord channel IDs where no bot mention is required."""
raw = self.config.extra.get("free_response_channels")
if raw is None:
raw = os.getenv("DISCORD_FREE_RESPONSE_CHANNELS", "")
@@ -3213,14 +3022,14 @@ class DiscordAdapter(BasePlatformAdapter):
allowed_channels_raw = os.getenv("DISCORD_ALLOWED_CHANNELS", "")
if allowed_channels_raw:
allowed_channels = {ch.strip() for ch in allowed_channels_raw.split(",") if ch.strip()}
if "*" not in allowed_channels and not (channel_ids & allowed_channels):
if not (channel_ids & allowed_channels):
logger.debug("[%s] Ignoring message in non-allowed channel: %s", self.name, channel_ids)
return
# Check ignored channels - never respond even when mentioned
ignored_channels_raw = os.getenv("DISCORD_IGNORED_CHANNELS", "")
ignored_channels = {ch.strip() for ch in ignored_channels_raw.split(",") if ch.strip()}
if "*" in ignored_channels or (channel_ids & ignored_channels):
if channel_ids & ignored_channels:
logger.debug("[%s] Ignoring message in ignored channel: %s", self.name, channel_ids)
return
@@ -3234,15 +3043,11 @@ class DiscordAdapter(BasePlatformAdapter):
voice_linked_ids = {str(ch_id) for ch_id in self._voice_text_channels.values()}
current_channel_id = str(message.channel.id)
is_voice_linked_channel = current_channel_id in voice_linked_ids
is_free_channel = (
"*" in free_channels
or bool(channel_ids & free_channels)
or is_voice_linked_channel
)
is_free_channel = bool(channel_ids & free_channels) or is_voice_linked_channel
# Skip the mention check if the message is in a thread where
# the bot has previously participated (auto-created or replied in).
in_bot_thread = is_thread and thread_id in self._threads
in_bot_thread = is_thread and thread_id is not None and thread_id in self._threads
if require_mention and not is_free_channel and not in_bot_thread:
if self._client.user not in message.mentions and not mention_prefix:
@@ -3261,7 +3066,6 @@ class DiscordAdapter(BasePlatformAdapter):
if auto_thread and not skip_thread and not is_voice_linked_channel and not is_reply_message:
thread = await self._auto_create_thread(message)
if thread:
parent_channel_id = str(message.channel.id)
is_thread = True
thread_id = str(thread.id)
auto_threaded_channel = thread
@@ -3321,9 +3125,6 @@ class DiscordAdapter(BasePlatformAdapter):
thread_id=thread_id,
chat_topic=chat_topic,
is_bot=getattr(message.author, "bot", False),
guild_id=str(message.guild.id) if message.guild else None,
parent_chat_id=parent_channel_id,
message_id=str(message.id),
)
# Build media URLs -- download image attachments to local cache so the
@@ -3839,7 +3640,9 @@ if DISCORD_AVAILABLE:
)
return
provider_slug = interaction.data["values"][0]
if interaction.data is None:
return
provider_slug = interaction.data["values"][0] # ty: ignore[invalid-key]
self._selected_provider = provider_slug
provider = next(
(p for p in self.providers if p["slug"] == provider_slug), None
@@ -3873,17 +3676,10 @@ if DISCORD_AVAILABLE:
)
return
if interaction.data is None:
return
self.resolved = True
model_id = interaction.data["values"][0]
self.clear_items()
await interaction.response.edit_message(
embed=discord.Embed(
title="⚙ Switching Model",
description=f"Switching to `{model_id}`...",
color=discord.Color.blue(),
),
view=None,
)
model_id = interaction.data["values"][0] # ty: ignore[invalid-key]
try:
result_text = await self.on_model_selected(
@@ -3894,13 +3690,14 @@ if DISCORD_AVAILABLE:
except Exception as exc:
result_text = f"Error switching model: {exc}"
await interaction.edit_original_response(
self.clear_items()
await interaction.response.edit_message(
embed=discord.Embed(
title="⚙ Model Switched",
description=result_text,
color=discord.Color.green(),
),
view=None,
view=self,
)
async def _on_back(self, interaction: discord.Interaction):
+1
View File
@@ -532,6 +532,7 @@ class EmailAdapter(BasePlatformAdapter):
image_url: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Send an image URL as part of an email body."""
text = caption or ""
+4 -18
View File
@@ -532,20 +532,6 @@ class MatrixAdapter(BasePlatformAdapter):
)
await crypto_store.open()
# Bind the store to the runtime device_id before any
# put_account() runs. PgCryptoStore defaults _device_id
# to "" and its crypto_account UPSERT never updates the
# device_id column on conflict — so once put_account
# writes blank, it stays blank forever. That breaks
# every downstream device-scoped olm operation: peer
# to-device ciphertext can't find our identity key and
# no megolm sessions ever land. Setting _device_id here
# (in-memory; the on-disk row may not exist yet) makes
# the first put_account write the correct value.
# DeviceID is a NewType(str) so plain str works at runtime.
if client.device_id:
await crypto_store.put_device_id(client.device_id)
crypto_state = _CryptoStateStore(state_store, self._joined_rooms)
olm = OlmMachine(client, crypto_store, crypto_state)
@@ -2184,8 +2170,8 @@ class MatrixAdapter(BasePlatformAdapter):
ul_match = re.match(r"^[\s]*[-*+]\s+(.+)$", line)
if ul_match:
items = []
while i < len(lines) and re.match(r"^[\s]*[-*+]\s+(.+)$", lines[i]):
items.append(re.match(r"^[\s]*[-*+]\s+(.+)$", lines[i]).group(1))
while i < len(lines) and (m := re.match(r"^[\s]*[-*+]\s+(.+)$", lines[i])):
items.append(m.group(1))
i += 1
li = "".join(f"<li>{item}</li>" for item in items)
out_lines.append(f"<ul>{li}</ul>")
@@ -2195,8 +2181,8 @@ class MatrixAdapter(BasePlatformAdapter):
ol_match = re.match(r"^[\s]*\d+[.)]\s+(.+)$", line)
if ol_match:
items = []
while i < len(lines) and re.match(r"^[\s]*\d+[.)]\s+(.+)$", lines[i]):
items.append(re.match(r"^[\s]*\d+[.)]\s+(.+)$", lines[i]).group(1))
while i < len(lines) and (m := re.match(r"^[\s]*\d+[.)]\s+(.+)$", lines[i])):
items.append(m.group(1))
i += 1
li = "".join(f"<li>{item}</li>" for item in items)
out_lines.append(f"<ol>{li}</ol>")
+1
View File
@@ -1842,6 +1842,7 @@ class QQAdapter(BasePlatformAdapter):
await asyncio.sleep(1.5 * (attempt + 1))
else:
raise
raise AssertionError("unreachable: retry loop exhausted")
# Maximum time (seconds) to wait for reconnection before giving up on send.
_RECONNECT_WAIT_SECONDS = 15.0
+2
View File
@@ -1690,6 +1690,7 @@ class SlackAdapter(BasePlatformAdapter):
await asyncio.sleep(1.5 * (attempt + 1))
continue
raise
raise AssertionError("unreachable: retry loop exhausted")
async def _download_slack_file_bytes(self, url: str, team_id: str = "") -> bytes:
"""Download a Slack file and return raw bytes, with retry."""
@@ -1715,6 +1716,7 @@ class SlackAdapter(BasePlatformAdapter):
await asyncio.sleep(1.5 * (attempt + 1))
continue
raise
raise AssertionError("unreachable: retry loop exhausted")
# ── Channel mention gating ─────────────────────────────────────────────
+4 -1
View File
@@ -25,7 +25,10 @@ import hmac
import logging
import os
import urllib.parse
from typing import Any, Dict, Optional
from typing import Any, Dict, Optional, TYPE_CHECKING
if TYPE_CHECKING:
import aiohttp
from gateway.config import Platform, PlatformConfig
from gateway.platforms.base import (
+3 -2
View File
@@ -703,6 +703,7 @@ class TelegramAdapter(BasePlatformAdapter):
"write_timeout": _env_float("HERMES_TELEGRAM_HTTP_WRITE_TIMEOUT", 20.0),
}
proxy_url = resolve_proxy_url("TELEGRAM_PROXY")
disable_fallback = (os.getenv("HERMES_TELEGRAM_DISABLE_FALLBACK_IPS", "").strip().lower() in ("1", "true", "yes", "on"))
fallback_ips = self._fallback_ips()
if not fallback_ips:
@@ -713,8 +714,6 @@ class TelegramAdapter(BasePlatformAdapter):
", ".join(fallback_ips),
)
proxy_targets = ["api.telegram.org", *fallback_ips]
proxy_url = resolve_proxy_url("TELEGRAM_PROXY", target_hosts=proxy_targets)
if fallback_ips and not proxy_url and not disable_fallback:
logger.info(
"[%s] Telegram fallback IPs active: %s",
@@ -2821,6 +2820,8 @@ class TelegramAdapter(BasePlatformAdapter):
)
sticker = msg.sticker
if sticker is None:
return
emoji = sticker.emoji or ""
set_name = sticker.set_name or ""
+4 -4
View File
@@ -43,10 +43,10 @@ _DOH_PROVIDERS: list[dict] = [
_SEED_FALLBACK_IPS: list[str] = ["149.154.167.220"]
def _resolve_proxy_url(target_hosts=None) -> str | None:
def _resolve_proxy_url() -> str | None:
# Delegate to shared implementation (env vars + macOS system proxy detection)
from gateway.platforms.base import resolve_proxy_url
return resolve_proxy_url("TELEGRAM_PROXY", target_hosts=target_hosts)
return resolve_proxy_url("TELEGRAM_PROXY")
class TelegramFallbackTransport(httpx.AsyncBaseTransport):
@@ -60,7 +60,7 @@ class TelegramFallbackTransport(httpx.AsyncBaseTransport):
def __init__(self, fallback_ips: Iterable[str], **transport_kwargs):
self._fallback_ips = [ip for ip in dict.fromkeys(_normalize_fallback_ips(fallback_ips))]
proxy_url = _resolve_proxy_url(target_hosts=[_TELEGRAM_API_HOST, *self._fallback_ips])
proxy_url = _resolve_proxy_url()
if proxy_url and "proxy" not in transport_kwargs:
transport_kwargs["proxy"] = proxy_url
self._primary = httpx.AsyncHTTPTransport(**transport_kwargs)
@@ -151,7 +151,7 @@ def _resolve_system_dns() -> set[str]:
"""Return the IPv4 addresses that the OS resolver gives for api.telegram.org."""
try:
results = socket.getaddrinfo(_TELEGRAM_API_HOST, 443, socket.AF_INET)
return {addr[4][0] for addr in results}
return {str(addr[4][0]) for addr in results}
except Exception:
return set()
+2 -1
View File
@@ -703,7 +703,8 @@ class WeComAdapter(BasePlatformAdapter):
elif isinstance(appmsg.get("image"), dict):
refs.append(("image", appmsg["image"]))
quote = body.get("quote") if isinstance(body.get("quote"), dict) else {}
raw_quote = body.get("quote")
quote = raw_quote if isinstance(raw_quote, dict) else {}
quote_type = str(quote.get("msgtype") or "").lower()
if quote_type == "image" and isinstance(quote.get("image"), dict):
refs.append(("image", quote["image"]))
+4 -1
View File
@@ -25,7 +25,10 @@ import subprocess
_IS_WINDOWS = platform.system() == "Windows"
from pathlib import Path
from typing import Dict, Optional, Any
from typing import Dict, Optional, Any, TYPE_CHECKING
if TYPE_CHECKING:
import aiohttp
from hermes_constants import get_hermes_dir
+422 -366
View File
File diff suppressed because it is too large Load Diff
+16 -97
View File
@@ -60,10 +60,6 @@ from .config import (
SessionResetPolicy, # noqa: F401 — re-exported via gateway/__init__.py
HomeChannel,
)
from .whatsapp_identity import (
canonical_whatsapp_identifier,
normalize_whatsapp_identifier,
)
@dataclass
@@ -87,9 +83,6 @@ class SessionSource:
user_id_alt: Optional[str] = None # Platform-specific stable alt ID (Signal UUID, Feishu union_id)
chat_id_alt: Optional[str] = None # Signal group internal ID
is_bot: bool = False # True when the message author is a bot/webhook (Discord)
guild_id: Optional[str] = None # Discord guild / Slack workspace / Matrix server scope
parent_chat_id: Optional[str] = None # Parent channel when chat_id refers to a thread
message_id: Optional[str] = None # ID of the triggering message (for pin/reply/react)
@property
def description(self) -> str:
@@ -127,14 +120,8 @@ class SessionSource:
d["user_id_alt"] = self.user_id_alt
if self.chat_id_alt:
d["chat_id_alt"] = self.chat_id_alt
if self.guild_id:
d["guild_id"] = self.guild_id
if self.parent_chat_id:
d["parent_chat_id"] = self.parent_chat_id
if self.message_id:
d["message_id"] = self.message_id
return d
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "SessionSource":
return cls(
@@ -148,9 +135,6 @@ class SessionSource:
chat_topic=data.get("chat_topic"),
user_id_alt=data.get("user_id_alt"),
chat_id_alt=data.get("chat_id_alt"),
guild_id=data.get("guild_id"),
parent_chat_id=data.get("parent_chat_id"),
message_id=data.get("message_id"),
)
@@ -202,31 +186,6 @@ that requires raw IDs). Discord is excluded because mentions use ``<@user_id>``
and the LLM needs the real ID to tag users."""
def _discord_tools_loaded() -> bool:
"""True iff the agent will actually have Discord tools this session.
Two conditions must hold:
1. The `discord` or `discord_admin` toolset is enabled for the
Discord platform via `hermes tools` (opt-in, default OFF).
2. `DISCORD_BOT_TOKEN` is set the tool's `check_fn` gates on it
at registry time, so the toolset being enabled in config is not
enough if the token isn't configured.
Returns False (safe default keeps the stale-API disclaimer) on any
error so a bad config can't silently promise tools the agent lacks.
"""
if not (os.environ.get("DISCORD_BOT_TOKEN") or "").strip():
return False
try:
from hermes_cli.config import load_config
from hermes_cli.tools_config import _get_platform_tools
cfg = load_config()
enabled = _get_platform_tools(cfg, "discord", include_default_mcp_servers=False)
return "discord" in enabled or "discord_admin" in enabled
except Exception:
return False
def build_session_context_prompt(
context: SessionContext,
*,
@@ -314,44 +273,13 @@ def build_session_context_prompt(
"that you can only read messages sent directly to you and respond."
)
elif context.source.platform == Platform.DISCORD:
# Inject the Discord IDs block only when the agent actually has
# Discord tools loaded this session — i.e. the user opted into
# `discord` / `discord_admin` via `hermes tools` AND the bot
# token is configured. Otherwise keep the stale-API disclaimer
# honest so we never promise tools the agent lacks.
if _discord_tools_loaded():
src = context.source
id_lines = ["", "**Discord IDs (for the `discord` / `discord_admin` tools):**"]
if src.guild_id:
id_lines.append(f" - Guild: `{src.guild_id}`")
if src.thread_id and src.parent_chat_id:
id_lines.append(f" - Parent channel: `{src.parent_chat_id}`")
id_lines.append(f" - Thread: `{src.thread_id}` (use as `channel_id` for fetch_messages etc.)")
else:
id_lines.append(f" - Channel: `{src.chat_id}`")
if src.message_id:
id_lines.append(f" - Triggering message: `{src.message_id}`")
lines.extend(id_lines)
else:
lines.append("")
lines.append(
"**Platform notes:** You are running inside Discord. "
"You do NOT have access to Discord-specific APIs — you cannot search "
"channel history, pin messages, manage roles, or list server members. "
"Do not promise to perform these actions. If the user asks, explain "
"that you can only read messages sent directly to you and respond."
)
elif context.source.platform == Platform.BLUEBUBBLES:
lines.append("")
lines.append(
"**Platform notes:** You are responding via iMessage. "
"Keep responses short and conversational — think texts, not essays. "
"Structure longer replies as separate short thoughts, each separated "
"by a blank line (double newline). Each block between blank lines "
"will be delivered as its own iMessage bubble, so write accordingly: "
"one idea per bubble, 13 sentences each. "
"If the user needs a detailed answer, give the short version first "
"and offer to elaborate."
"**Platform notes:** You are running inside Discord. "
"You do NOT have access to Discord-specific APIs — you cannot search "
"channel history, pin messages, manage roles, or list server members. "
"Do not promise to perform these actions. If the user asks, explain "
"that you can only read messages sent directly to you and respond."
)
# Connected platforms
@@ -439,11 +367,11 @@ class SessionEntry:
auto_reset_reason: Optional[str] = None # "idle" or "daily"
reset_had_activity: bool = False # whether the expired session had any messages
# Set by the background expiry watcher after it finalizes an expired
# session (invoking on_session_finalize hooks and evicting the cached
# agent). Persisted to sessions.json so the flag survives gateway
# restarts — prevents redundant finalization runs.
expiry_finalized: bool = False
# Set by the background expiry watcher after it successfully flushes
# memories for this session. Persisted to sessions.json so the flag
# survives gateway restarts (the old in-memory _pre_flushed_sessions
# set was lost on restart, causing redundant re-flushes).
memory_flushed: bool = False
# When True the next call to get_or_create_session() will auto-reset
# this session (create a new session_id) so the user starts fresh.
@@ -479,7 +407,7 @@ class SessionEntry:
"last_prompt_tokens": self.last_prompt_tokens,
"estimated_cost_usd": self.estimated_cost_usd,
"cost_status": self.cost_status,
"expiry_finalized": self.expiry_finalized,
"memory_flushed": self.memory_flushed,
"suspended": self.suspended,
"resume_pending": self.resume_pending,
"resume_reason": self.resume_reason,
@@ -531,7 +459,7 @@ class SessionEntry:
last_prompt_tokens=data.get("last_prompt_tokens", 0),
estimated_cost_usd=data.get("estimated_cost_usd", 0.0),
cost_status=data.get("cost_status", "unknown"),
expiry_finalized=data.get("expiry_finalized", data.get("memory_flushed", False)),
memory_flushed=data.get("memory_flushed", False),
suspended=data.get("suspended", False),
resume_pending=data.get("resume_pending", False),
resume_reason=data.get("resume_reason"),
@@ -590,24 +518,15 @@ def build_session_key(
"""
platform = source.platform.value
if source.chat_type == "dm":
dm_chat_id = source.chat_id
if source.platform == Platform.WHATSAPP:
dm_chat_id = canonical_whatsapp_identifier(source.chat_id)
if dm_chat_id:
if source.chat_id:
if source.thread_id:
return f"agent:main:{platform}:dm:{dm_chat_id}:{source.thread_id}"
return f"agent:main:{platform}:dm:{dm_chat_id}"
return f"agent:main:{platform}:dm:{source.chat_id}:{source.thread_id}"
return f"agent:main:{platform}:dm:{source.chat_id}"
if source.thread_id:
return f"agent:main:{platform}:dm:{source.thread_id}"
return f"agent:main:{platform}:dm"
participant_id = source.user_id_alt or source.user_id
if participant_id and source.platform == Platform.WHATSAPP:
# Same JID/LID-flip bug as the DM case: without canonicalisation, a
# single group member gets two isolated per-user sessions when the
# bridge reshuffles alias forms.
participant_id = canonical_whatsapp_identifier(str(participant_id)) or participant_id
key_parts = ["agent:main", platform, source.chat_type]
if source.chat_id:
+3 -34
View File
@@ -113,11 +113,6 @@ def _get_process_start_time(pid: int) -> Optional[int]:
return None
def get_process_start_time(pid: int) -> Optional[int]:
"""Public wrapper for retrieving a process start time when available."""
return _get_process_start_time(pid)
def _read_process_cmdline(pid: int) -> Optional[str]:
"""Return the process command line as a space-separated string."""
cmdline_path = Path(f"/proc/{pid}/cmdline")
@@ -567,43 +562,17 @@ def release_scoped_lock(scope: str, identity: str) -> None:
pass
def release_all_scoped_locks(
*,
owner_pid: Optional[int] = None,
owner_start_time: Optional[int] = None,
) -> int:
"""Remove scoped lock files in the lock directory.
def release_all_scoped_locks() -> int:
"""Remove all scoped lock files in the lock directory.
Called during --replace to clean up stale locks left by stopped/killed
gateway processes that did not release their locks gracefully. When an
``owner_pid`` is provided, only lock records belonging to that gateway
process are removed. ``owner_start_time`` further narrows the match to
protect against PID reuse.
When no owner is provided, preserves the legacy behavior and removes every
scoped lock file in the directory.
gateway processes that did not release their locks gracefully.
Returns the number of lock files removed.
"""
lock_dir = _get_lock_dir()
removed = 0
if lock_dir.exists():
for lock_file in lock_dir.glob("*.lock"):
if owner_pid is not None:
record = _read_json_file(lock_file)
if not isinstance(record, dict):
continue
try:
record_pid = int(record.get("pid"))
except (TypeError, ValueError):
continue
if record_pid != owner_pid:
continue
if (
owner_start_time is not None
and record.get("start_time") != owner_start_time
):
continue
try:
lock_file.unlink(missing_ok=True)
removed += 1
-135
View File
@@ -1,135 +0,0 @@
"""Shared helpers for canonicalising WhatsApp sender identity.
WhatsApp's bridge can surface the same human under two different JID shapes
within a single conversation:
- LID form: ``999999999999999@lid``
- Phone form: ``15551234567@s.whatsapp.net``
Both the authorisation path (:mod:`gateway.run`) and the session-key path
(:mod:`gateway.session`) need to collapse these aliases to a single stable
identity. This module is the single source of truth for that resolution so
the two paths can never drift apart.
Public helpers:
- :func:`normalize_whatsapp_identifier` strip JID/LID/device/plus syntax
down to the bare numeric identifier.
- :func:`canonical_whatsapp_identifier` walk the bridge's
``lid-mapping-*.json`` files and return a stable canonical identity
across phone/LID variants.
- :func:`expand_whatsapp_aliases` return the full alias set for an
identifier. Used by authorisation code that needs to match any known
form of a sender against an allow-list.
Plugins that need per-sender behaviour on WhatsApp (role-based routing,
per-contact authorisation, policy gating in a gateway hook) should use
``canonical_whatsapp_identifier`` so their bookkeeping lines up with
Hermes' own session keys.
"""
from __future__ import annotations
import json
from typing import Set
from hermes_constants import get_hermes_home
def normalize_whatsapp_identifier(value: str) -> str:
"""Strip WhatsApp JID/LID syntax down to its stable numeric identifier.
Accepts any of the identifier shapes the WhatsApp bridge may emit:
``"60123456789@s.whatsapp.net"``, ``"60123456789:47@s.whatsapp.net"``,
``"60123456789@lid"``, or a bare ``"+601****6789"`` / ``"60123456789"``.
Returns just the numeric identifier (``"60123456789"``) suitable for
equality comparisons.
Useful for plugins that want to match sender IDs against
user-supplied config (phone numbers in ``config.yaml``) without
worrying about which variant the bridge happens to deliver.
"""
return (
str(value or "")
.strip()
.replace("+", "", 1)
.split(":", 1)[0]
.split("@", 1)[0]
)
def expand_whatsapp_aliases(identifier: str) -> Set[str]:
"""Resolve WhatsApp phone/LID aliases via bridge session mapping files.
Returns the set of all identifiers transitively reachable through the
bridge's ``$HERMES_HOME/whatsapp/session/lid-mapping-*.json`` files,
starting from ``identifier``. The result always includes the
normalized input itself, so callers can safely ``in`` check against
the return value without a separate fallback branch.
Returns an empty set if ``identifier`` normalizes to empty.
"""
normalized = normalize_whatsapp_identifier(identifier)
if not normalized:
return set()
session_dir = get_hermes_home() / "whatsapp" / "session"
resolved: Set[str] = set()
queue = [normalized]
while queue:
current = queue.pop(0)
if not current or current in resolved:
continue
resolved.add(current)
for suffix in ("", "_reverse"):
mapping_path = session_dir / f"lid-mapping-{current}{suffix}.json"
if not mapping_path.exists():
continue
try:
mapped = normalize_whatsapp_identifier(
json.loads(mapping_path.read_text(encoding="utf-8"))
)
except Exception:
continue
if mapped and mapped not in resolved:
queue.append(mapped)
return resolved
def canonical_whatsapp_identifier(identifier: str) -> str:
"""Return a stable WhatsApp sender identity across phone-JID/LID variants.
WhatsApp may surface the same person under either a phone-format JID
(``60123456789@s.whatsapp.net``) or a LID (``1234567890@lid``). This
applies to a DM ``chat_id`` *and* to the ``participant_id`` of a
member inside a group chat both represent a user identity, and the
bridge may flip between the two for the same human.
This helper reads the bridge's ``whatsapp/session/lid-mapping-*.json``
files, walks the mapping transitively, and picks the shortest
(numeric-preferred) alias as the canonical identity.
:func:`gateway.session.build_session_key` uses this for both WhatsApp
DM chat_ids and WhatsApp group participant_ids, so callers get the
same session-key identity Hermes itself uses.
Plugins that need per-sender behaviour (role-based routing,
authorisation, per-contact policy) should use this so their
bookkeeping lines up with Hermes' session bookkeeping even when
the bridge reshuffles aliases.
Returns an empty string if ``identifier`` normalizes to empty. If no
mapping files exist yet (fresh bridge install), returns the
normalized input unchanged.
"""
normalized = normalize_whatsapp_identifier(identifier)
if not normalized:
return ""
# expand_whatsapp_aliases always includes `normalized` itself in the
# returned set, so the min() below degrades gracefully to `normalized`
# when no lid-mapping files are present.
aliases = expand_whatsapp_aliases(normalized)
return min(aliases, key=lambda candidate: (len(candidate), candidate))
+2 -2
View File
@@ -11,5 +11,5 @@ Provides subcommands for:
- hermes cron - Manage cron jobs
"""
__version__ = "0.11.0"
__release_date__ = "2026.4.23"
__version__ = "0.10.0"
__release_date__ = "2026.4.16"
+110 -947
View File
File diff suppressed because it is too large Load Diff
+3 -72
View File
@@ -110,40 +110,18 @@ def _display_source(source: str) -> str:
return source.split(":", 1)[1] if source.startswith("manual:") else source
def _classify_exhausted_status(entry) -> tuple[str, bool]:
code = getattr(entry, "last_error_code", None)
reason = str(getattr(entry, "last_error_reason", "") or "").strip().lower()
message = str(getattr(entry, "last_error_message", "") or "").strip().lower()
if code == 429 or any(token in reason for token in ("rate_limit", "usage_limit", "quota", "exhausted")) or any(
token in message for token in ("rate limit", "usage limit", "quota", "too many requests")
):
return "rate-limited", True
if code in {401, 403} or any(token in reason for token in ("invalid_token", "invalid_grant", "unauthorized", "forbidden", "auth")) or any(
token in message for token in ("unauthorized", "forbidden", "expired", "revoked", "invalid token", "authentication")
):
return "auth failed", False
return "exhausted", True
def _format_exhausted_status(entry) -> str:
if entry.last_status != STATUS_EXHAUSTED:
return ""
label, show_retry_window = _classify_exhausted_status(entry)
reason = getattr(entry, "last_error_reason", None)
reason_text = f" {reason}" if isinstance(reason, str) and reason.strip() else ""
code = f" ({entry.last_error_code})" if entry.last_error_code else ""
if not show_retry_window:
return f" {label}{reason_text}{code} (re-auth may be required)"
exhausted_until = _exhausted_until(entry)
if exhausted_until is None:
return f" {label}{reason_text}{code}"
return f" exhausted{reason_text}{code}"
remaining = max(0, int(math.ceil(exhausted_until - time.time())))
if remaining <= 0:
return f" {label}{reason_text}{code} (ready to retry)"
return f" exhausted{reason_text}{code} (ready to retry)"
minutes, seconds = divmod(remaining, 60)
hours, minutes = divmod(minutes, 60)
days, hours = divmod(hours, 24)
@@ -155,7 +133,7 @@ def _format_exhausted_status(entry) -> str:
wait = f"{minutes}m {seconds}s"
else:
wait = f"{seconds}s"
return f" {label}{reason_text}{code} ({wait} left)"
return f" exhausted{reason_text}{code} ({wait} left)"
def auth_add_command(args) -> None:
@@ -408,44 +386,6 @@ def auth_reset_command(args) -> None:
print(f"Reset status on {count} {provider} credentials")
def auth_status_command(args) -> None:
provider = _normalize_provider(getattr(args, "provider", "") or "")
if not provider:
raise SystemExit("Provider is required. Example: `hermes auth status spotify`.")
status = auth_mod.get_auth_status(provider)
if not status.get("logged_in"):
reason = status.get("error")
if reason:
print(f"{provider}: logged out ({reason})")
else:
print(f"{provider}: logged out")
return
print(f"{provider}: logged in")
for key in ("auth_type", "client_id", "redirect_uri", "scope", "expires_at", "api_base_url"):
value = status.get(key)
if value:
print(f" {key}: {value}")
def auth_logout_command(args) -> None:
auth_mod.logout_command(SimpleNamespace(provider=getattr(args, "provider", None)))
def auth_spotify_command(args) -> None:
action = str(getattr(args, "spotify_action", "") or "login").strip().lower()
if action in {"", "login"}:
auth_mod.login_spotify_command(args)
return
if action == "status":
auth_status_command(SimpleNamespace(provider="spotify"))
return
if action == "logout":
auth_logout_command(SimpleNamespace(provider="spotify"))
return
raise SystemExit(f"Unknown Spotify auth action: {action}")
def _interactive_auth() -> None:
"""Interactive credential pool management when `hermes auth` is called bare."""
# Show current pool status first
@@ -643,14 +583,5 @@ def auth_command(args) -> None:
if action == "reset":
auth_reset_command(args)
return
if action == "status":
auth_status_command(args)
return
if action == "logout":
auth_logout_command(args)
return
if action == "spotify":
auth_spotify_command(args)
return
# No subcommand — launch interactive mode
_interactive_auth()
+1 -54
View File
@@ -238,52 +238,6 @@ def get_git_banner_state(repo_dir: Optional[Path] = None) -> Optional[dict]:
return {"upstream": upstream, "local": local, "ahead": max(ahead, 0)}
_RELEASE_URL_BASE = "https://github.com/NousResearch/hermes-agent/releases/tag"
_latest_release_cache: Optional[tuple] = None # (tag, url) once resolved
def get_latest_release_tag(repo_dir: Optional[Path] = None) -> Optional[tuple]:
"""Return ``(tag, release_url)`` for the latest git tag, or None.
Local-only runs ``git describe --tags --abbrev=0`` against the
Hermes checkout. Cached per-process. Release URL always points at the
canonical NousResearch/hermes-agent repo (forks don't get a link).
"""
global _latest_release_cache
if _latest_release_cache is not None:
return _latest_release_cache or None
repo_dir = repo_dir or _resolve_repo_dir()
if repo_dir is None:
_latest_release_cache = () # falsy sentinel — skip future lookups
return None
try:
result = subprocess.run(
["git", "describe", "--tags", "--abbrev=0"],
capture_output=True,
text=True,
timeout=3,
cwd=str(repo_dir),
)
except Exception:
_latest_release_cache = ()
return None
if result.returncode != 0:
_latest_release_cache = ()
return None
tag = (result.stdout or "").strip()
if not tag:
_latest_release_cache = ()
return None
url = f"{_RELEASE_URL_BASE}/{tag}"
_latest_release_cache = (tag, url)
return _latest_release_cache
def format_banner_version_label() -> str:
"""Return the version label shown in the startup banner title."""
base = f"Hermes Agent v{VERSION} ({RELEASE_DATE})"
@@ -565,16 +519,9 @@ def build_welcome_banner(console: Console, model: str, cwd: str,
agent_name = _skin_branding("agent_name", "Hermes Agent")
title_color = _skin_color("banner_title", "#FFD700")
border_color = _skin_color("banner_border", "#CD7F32")
version_label = format_banner_version_label()
release_info = get_latest_release_tag()
if release_info:
_tag, _url = release_info
title_markup = f"[bold {title_color}][link={_url}]{version_label}[/link][/]"
else:
title_markup = f"[bold {title_color}]{version_label}[/]"
outer_panel = Panel(
layout_table,
title=title_markup,
title=f"[bold {title_color}]{format_banner_version_label()}[/]",
border_style=border_color,
padding=(0, 2),
)
+9 -4
View File
@@ -276,7 +276,7 @@ def _get_ps_exe() -> str | None:
global _ps_exe
if _ps_exe is False:
_ps_exe = _find_powershell()
return _ps_exe
return _ps_exe if isinstance(_ps_exe, str) else None
def _windows_has_image() -> bool:
@@ -387,6 +387,8 @@ def _wayland_save(dest: Path) -> bool:
except FileNotFoundError:
logger.debug("wl-paste not installed — Wayland clipboard unavailable")
except ImportError:
raise
except Exception as e:
logger.debug("wl-paste clipboard extraction failed: %s", e)
dest.unlink(missing_ok=True)
@@ -395,14 +397,17 @@ def _wayland_save(dest: Path) -> bool:
def _convert_to_png(path: Path) -> bool:
"""Convert an image file to PNG in-place (requires Pillow or ImageMagick)."""
# Try Pillow first (likely installed in the venv)
try:
from PIL import Image
except ImportError:
raise ImportError(
"Pillow is required for clipboard image conversion. "
"Install with: pip install hermes-agent[cli]"
) from None
try:
img = Image.open(path)
img.save(path, "PNG")
return True
except ImportError:
pass
except Exception as e:
logger.debug("Pillow BMP→PNG conversion failed: %s", e)
-2
View File
@@ -12,7 +12,6 @@ import os
logger = logging.getLogger(__name__)
DEFAULT_CODEX_MODELS: List[str] = [
"gpt-5.5",
"gpt-5.4-mini",
"gpt-5.4",
"gpt-5.3-codex",
@@ -22,7 +21,6 @@ DEFAULT_CODEX_MODELS: List[str] = [
]
_FORWARD_COMPAT_TEMPLATE_MODELS: List[tuple[str, tuple[str, ...]]] = [
("gpt-5.5", ("gpt-5.4", "gpt-5.4-mini", "gpt-5.3-codex")),
("gpt-5.4-mini", ("gpt-5.3-codex", "gpt-5.2-codex")),
("gpt-5.4", ("gpt-5.3-codex", "gpt-5.2-codex")),
("gpt-5.3-codex", ("gpt-5.2-codex",)),
+8 -12
View File
@@ -77,7 +77,7 @@ COMMAND_REGISTRY: list[CommandDef] = [
CommandDef("rollback", "List or restore filesystem checkpoints", "Session",
args_hint="[number]"),
CommandDef("snapshot", "Create or restore state snapshots of Hermes config/state", "Session",
cli_only=True, aliases=("snap",), args_hint="[create|restore <id>|prune]"),
aliases=("snap",), args_hint="[create|restore <id>|prune]"),
CommandDef("stop", "Kill all running background processes", "Session"),
CommandDef("approve", "Approve a pending dangerous command", "Session",
gateway_only=True, args_hint="[session|always]"),
@@ -103,10 +103,10 @@ COMMAND_REGISTRY: list[CommandDef] = [
# Configuration
CommandDef("config", "Show current configuration", "Configuration",
cli_only=True),
CommandDef("model", "Switch model for this session", "Configuration",
aliases=("provider",), args_hint="[model] [--provider name] [--global]"),
CommandDef("gquota", "Show Google Gemini Code Assist quota usage", "Info",
cli_only=True),
CommandDef("model", "Switch model for this session", "Configuration", args_hint="[model] [--provider name] [--global]"),
CommandDef("provider", "Show available providers and current provider",
"Configuration"),
CommandDef("gquota", "Show Google Gemini Code Assist quota usage", "Info"),
CommandDef("personality", "Set a predefined personality", "Configuration",
args_hint="[name]"),
@@ -124,12 +124,9 @@ COMMAND_REGISTRY: list[CommandDef] = [
args_hint="[normal|fast|status]",
subcommands=("normal", "fast", "status", "on", "off")),
CommandDef("skin", "Show or change the display skin/theme", "Configuration",
cli_only=True, args_hint="[name]"),
args_hint="[name]"),
CommandDef("voice", "Toggle voice mode", "Configuration",
args_hint="[on|off|tts|status]", subcommands=("on", "off", "tts", "status")),
CommandDef("busy", "Control what Enter does while Hermes is working", "Configuration",
cli_only=True, args_hint="[queue|interrupt|status]",
subcommands=("queue", "interrupt", "status")),
# Tools & Skills
CommandDef("tools", "Manage tools: /tools [list|disable|enable] [name...]", "Tools & Skills",
@@ -142,8 +139,7 @@ COMMAND_REGISTRY: list[CommandDef] = [
CommandDef("cron", "Manage scheduled tasks", "Tools & Skills",
cli_only=True, args_hint="[subcommand]",
subcommands=("list", "add", "create", "edit", "pause", "resume", "run", "remove")),
CommandDef("reload", "Reload .env variables into the running session", "Tools & Skills",
cli_only=True),
CommandDef("reload", "Reload .env variables into the running session", "Tools & Skills"),
CommandDef("reload-mcp", "Reload MCP servers from config", "Tools & Skills",
aliases=("reload_mcp",)),
CommandDef("browser", "Connect browser tools to your live Chrome via CDP", "Tools & Skills",
@@ -321,7 +317,7 @@ def should_bypass_active_session(command_name: str | None) -> bool:
safety net in gateway.run discards any command text that reaches
the pending queue which meant a mid-run /model (or /reasoning,
/voice, /insights, /title, /resume, /retry, /undo, /compress,
/usage, /reload-mcp, /sethome, /reset) would silently
/usage, /provider, /reload-mcp, /sethome, /reset) would silently
interrupt the agent AND get discarded, producing a zero-char
response. See issue #5057 / PRs #6252, #10370, #4665.
+23 -90
View File
@@ -361,15 +361,6 @@ DEFAULT_CONFIG = {
# to finish, then interrupts any remaining runs after the timeout.
# 0 = no drain, interrupt immediately.
"restart_drain_timeout": 60,
# Max app-level retry attempts for API errors (connection drops,
# provider timeouts, 5xx, etc.) before the agent surfaces the
# failure. The OpenAI SDK already does its own low-level retries
# (max_retries=2 default) for transient network errors; this is
# the Hermes-level retry loop that wraps the whole call. Lower
# this to 1 if you use fallback providers and want fast failover
# on flaky primaries; raise it if you prefer to tolerate longer
# provider hiccups on a single provider.
"api_max_retries": 3,
"service_tier": "",
# Tool-use enforcement: injects system prompt guidance that tells the
# model to actually call tools instead of describing intended actions.
@@ -384,11 +375,7 @@ DEFAULT_CONFIG = {
# Periodic "still working" notification interval (seconds).
# Sends a status message every N seconds so the user knows the
# agent hasn't died during long tasks. 0 = disable notifications.
# Lower values mean faster feedback on slow tasks but more chat
# noise; 180s is a compromise that catches spinning weak-model runs
# (60+ tool iterations with tiny output) before users assume the
# bot is dead and /restart.
"gateway_notify_interval": 180,
"gateway_notify_interval": 600,
},
"terminal": {
@@ -407,23 +394,17 @@ DEFAULT_CONFIG = {
# (bash doesn't source bashrc in non-interactive login mode) or
# zsh-specific files like ``~/.zshrc`` / ``~/.zprofile``.
# Paths support ``~`` / ``${VAR}``. Missing files are silently
# skipped. When empty, Hermes auto-sources ``~/.profile``,
# ``~/.bash_profile``, and ``~/.bashrc`` (in that order) if the
# skipped. When empty, Hermes auto-appends ``~/.bashrc`` if the
# snapshot shell is bash (this is the ``auto_source_bashrc``
# behaviour — disable with that key if you want strict login-only
# semantics).
"shell_init_files": [],
# When true (default), Hermes sources the user's shell rc files
# (``~/.profile``, ``~/.bash_profile``, ``~/.bashrc``) in the
# login shell used to build the environment snapshot. This
# captures PATH additions, shell functions, and aliases — which a
# plain ``bash -l -c`` would otherwise miss because bash skips
# bashrc in non-interactive login mode, and because a default
# Debian/Ubuntu ``~/.bashrc`` short-circuits on non-interactive
# sources. ``~/.profile`` and ``~/.bash_profile`` are tried first
# because ``n`` / ``nvm`` / ``asdf`` installers typically write
# their PATH exports there without an interactivity guard. Turn
# this off if your rc files misbehave when sourced
# When true (default), Hermes sources ``~/.bashrc`` in the login
# shell used to build the environment snapshot. This captures
# PATH additions, shell functions, and aliases defined in the
# user's bashrc — which a plain ``bash -l -c`` would otherwise
# miss because bash skips bashrc in non-interactive login mode.
# Turn this off if you have a bashrc that misbehaves when sourced
# non-interactively (e.g. one that hard-exits on TTY checks).
"auto_source_bashrc": True,
"docker_image": "nikolaik/python-nodejs:python3.11-nodejs20",
@@ -466,12 +447,6 @@ DEFAULT_CONFIG = {
"record_sessions": False, # Auto-record browser sessions as WebM videos
"allow_private_urls": False, # Allow navigating to private/internal IPs (localhost, 192.168.x.x, etc.)
"cdp_url": "", # Optional persistent CDP endpoint for attaching to an existing Chromium/Chrome
# CDP supervisor — dialog + frame detection via a persistent WebSocket.
# Active only when a CDP-capable backend is attached (Browserbase or
# local Chrome via /browser connect). See
# website/docs/developer-guide/browser-supervisor.md.
"dialog_policy": "must_respond", # must_respond | auto_dismiss | auto_accept
"dialog_timeout_s": 300, # Safety auto-dismiss after N seconds under must_respond
"camofox": {
# When true, Hermes sends a stable profile-scoped userId to Camofox
# so the server maps it to a persistent Firefox profile automatically.
@@ -492,27 +467,7 @@ DEFAULT_CONFIG = {
# exceed this are rejected with guidance to use offset+limit.
# 100K chars ≈ 2535K tokens across typical tokenisers.
"file_read_max_chars": 100_000,
# Tool-output truncation thresholds. When terminal output or a
# single read_file page exceeds these limits, Hermes truncates the
# payload sent to the model (keeping head + tail for terminal,
# enforcing pagination for read_file). Tuning these trades context
# footprint against how much raw output the model can see in one
# shot. Ported from anomalyco/opencode PR #23770.
#
# - max_bytes: terminal_tool output cap, in chars
# (default 50_000 ≈ 12-15K tokens).
# - max_lines: read_file pagination cap — the maximum `limit`
# a single read_file call can request before
# being clamped (default 2000).
# - max_line_length: per-line cap applied when read_file emits a
# line-numbered view (default 2000 chars).
"tool_output": {
"max_bytes": 50_000,
"max_lines": 2000,
"max_line_length": 2000,
},
"compression": {
"enabled": True,
"threshold": 0.50, # compress when context usage exceeds this ratio
@@ -521,12 +476,6 @@ DEFAULT_CONFIG = {
},
# Anthropic prompt caching (Claude via OpenRouter or native Anthropic API).
# cache_ttl must be "5m" or "1h" (Anthropic-supported tiers); other values are ignored.
"prompt_caching": {
"cache_ttl": "5m",
},
# AWS Bedrock provider configuration.
# Only used when model.provider is "bedrock".
"bedrock": {
@@ -612,6 +561,14 @@ DEFAULT_CONFIG = {
"timeout": 30,
"extra_body": {},
},
"flush_memories": {
"provider": "auto",
"model": "",
"base_url": "",
"api_key": "",
"timeout": 30,
"extra_body": {},
},
"title_generation": {
"provider": "auto",
"model": "",
@@ -763,10 +720,6 @@ DEFAULT_CONFIG = {
"inherit_mcp_toolsets": True,
"max_iterations": 50, # per-subagent iteration cap (each subagent gets its own budget,
# independent of the parent's max_iterations)
"child_timeout_seconds": 600, # wall-clock timeout for each child agent (floor 30s,
# no ceiling). High-reasoning models on large tasks
# (e.g. gpt-5.5 xhigh, opus-4.6) need generous budgets;
# raise if children time out before producing output.
"reasoning_effort": "", # reasoning effort for subagents: "xhigh", "high", "medium",
# "low", "minimal", "none" (empty = inherit parent's level)
"max_concurrent_children": 3, # max parallel children per batch; floor of 1 enforced, no ceiling
@@ -775,15 +728,6 @@ DEFAULT_CONFIG = {
# warning log if out of range.
"max_spawn_depth": 1, # depth cap (1 = flat [default], 2 = orchestrator→leaf, 3 = three-level)
"orchestrator_enabled": True, # kill switch for role="orchestrator"
# When a subagent hits a dangerous-command approval prompt, the parent's
# prompt_toolkit TUI owns stdin — a thread-local input() call from the
# subagent worker would deadlock the parent UI. To avoid the deadlock,
# subagent threads ALWAYS resolve approvals non-interactively:
# false (default) → auto-deny with a logger.warning audit line (safe)
# true → auto-approve "once" with a logger.warning audit line
# Flip to true only if you trust delegated work to run dangerous cmds
# without human review (cron pipelines, batch automation, etc.).
"subagent_auto_approve": False,
},
# Ephemeral prefill messages file — JSON list of {role, content} dicts
@@ -810,17 +754,6 @@ DEFAULT_CONFIG = {
"inline_shell": False,
# Timeout (seconds) for each !`cmd` snippet when inline_shell is on.
"inline_shell_timeout": 10,
# Run the keyword/pattern security scanner on skills the agent
# writes via skill_manage (create/edit/patch). Off by default
# because the agent can already execute the same code paths via
# terminal() with no gate, so the scan adds friction (blocks
# skills that mention risky keywords in prose) without meaningful
# security. Turn on if you want the belt-and-suspenders — a
# dangerous verdict will then surface as a tool error to the
# agent, which can retry with the flagged content removed.
# External hub installs (trusted/community sources) are always
# scanned regardless of this setting.
"guard_agent_created": False,
},
# Honcho AI-native memory -- reads ~/.honcho/config.json as single source of truth.
@@ -840,7 +773,7 @@ DEFAULT_CONFIG = {
"auto_thread": True, # Auto-create threads on @mention in channels (like Slack)
"reactions": True, # Add 👀/✅/❌ reactions to messages during processing
"channel_prompts": {}, # Per-channel ephemeral system prompts (forum parents apply to child threads)
# discord / discord_admin tools: restrict which actions the agent may call.
# discord_server tool: restrict which actions the agent may call.
# Default (empty) = all actions allowed (subject to bot privileged intents).
# Accepts comma-separated string ("list_guilds,list_channels,fetch_messages")
# or YAML list. Unknown names are dropped with a warning at load time.
@@ -1341,7 +1274,7 @@ OPTIONAL_ENV_VARS = {
"advanced": True,
},
"XIAOMI_API_KEY": {
"description": "Xiaomi MiMo API key for MiMo models (mimo-v2.5-pro, mimo-v2.5, mimo-v2-pro, mimo-v2-omni, mimo-v2-flash)",
"description": "Xiaomi MiMo API key for MiMo models (mimo-v2-pro, mimo-v2-omni, mimo-v2-flash)",
"prompt": "Xiaomi MiMo API Key",
"url": "https://platform.xiaomimimo.com",
"password": True,
@@ -1971,7 +1904,7 @@ def get_missing_config_fields() -> List[Dict[str, Any]]:
config = load_config()
missing = []
def _check(defaults: dict, current: dict, prefix: str = ""):
def _check(defaults: Dict[str, Any], current: Dict[str, Any], prefix: str = ""):
for key, default_value in defaults.items():
if key.startswith('_'):
continue
@@ -2213,8 +2146,8 @@ def check_config_version() -> Tuple[int, int]:
Returns (current_version, latest_version).
"""
config = load_config()
current = config.get("_config_version", 0)
latest = DEFAULT_CONFIG.get("_config_version", 1)
current = int(config.get("_config_version", 0))
latest = int(DEFAULT_CONFIG.get("_config_version", 1))
return current, latest
@@ -2934,7 +2867,7 @@ def migrate_config(interactive: bool = True, quiet: bool = False) -> Dict[str, A
return results
def _deep_merge(base: dict, override: dict) -> dict:
def _deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
"""Recursively merge *override* into *base*, preserving nested defaults.
Keys in *override* take precedence. If both values are dicts the merge
-93
View File
@@ -275,99 +275,6 @@ def copilot_device_code_login(
return None
# ─── Copilot Token Exchange ────────────────────────────────────────────────
# Module-level cache for exchanged Copilot API tokens.
# Maps raw_token_fingerprint -> (api_token, expires_at_epoch).
_jwt_cache: dict[str, tuple[str, float]] = {}
_JWT_REFRESH_MARGIN_SECONDS = 120 # refresh 2 min before expiry
# Token exchange endpoint and headers (matching VS Code / Copilot CLI)
_TOKEN_EXCHANGE_URL = "https://api.github.com/copilot_internal/v2/token"
_EDITOR_VERSION = "vscode/1.104.1"
_EXCHANGE_USER_AGENT = "GitHubCopilotChat/0.26.7"
def _token_fingerprint(raw_token: str) -> str:
"""Short fingerprint of a raw token for cache keying (avoids storing full token)."""
import hashlib
return hashlib.sha256(raw_token.encode()).hexdigest()[:16]
def exchange_copilot_token(raw_token: str, *, timeout: float = 10.0) -> tuple[str, float]:
"""Exchange a raw GitHub token for a short-lived Copilot API token.
Calls ``GET https://api.github.com/copilot_internal/v2/token`` with
the raw GitHub token and returns ``(api_token, expires_at)``.
The returned token is a semicolon-separated string (not a standard JWT)
used as ``Authorization: Bearer <token>`` for Copilot API requests.
Results are cached in-process and reused until close to expiry.
Raises ``ValueError`` on failure.
"""
import urllib.request
fp = _token_fingerprint(raw_token)
# Check cache first
cached = _jwt_cache.get(fp)
if cached:
api_token, expires_at = cached
if time.time() < expires_at - _JWT_REFRESH_MARGIN_SECONDS:
return api_token, expires_at
req = urllib.request.Request(
_TOKEN_EXCHANGE_URL,
method="GET",
headers={
"Authorization": f"token {raw_token}",
"User-Agent": _EXCHANGE_USER_AGENT,
"Accept": "application/json",
"Editor-Version": _EDITOR_VERSION,
},
)
try:
with urllib.request.urlopen(req, timeout=timeout) as resp:
data = json.loads(resp.read().decode())
except Exception as exc:
raise ValueError(f"Copilot token exchange failed: {exc}") from exc
api_token = data.get("token", "")
expires_at = data.get("expires_at", 0)
if not api_token:
raise ValueError("Copilot token exchange returned empty token")
# Convert expires_at to float if needed
expires_at = float(expires_at) if expires_at else time.time() + 1800
_jwt_cache[fp] = (api_token, expires_at)
logger.debug(
"Copilot token exchanged, expires_at=%s",
expires_at,
)
return api_token, expires_at
def get_copilot_api_token(raw_token: str) -> str:
"""Exchange a raw GitHub token for a Copilot API token, with fallback.
Convenience wrapper: returns the exchanged token on success, or the
raw token unchanged if the exchange fails (e.g. network error, unsupported
account type). This preserves existing behaviour for accounts that don't
need exchange while enabling access to internal-only models for those that do.
"""
if not raw_token:
return raw_token
try:
api_token, _ = exchange_copilot_token(raw_token)
return api_token
except Exception as exc:
logger.debug("Copilot token exchange failed, using raw token: %s", exc)
return raw_token
# ─── Copilot API Headers ───────────────────────────────────────────────────
def copilot_request_headers(
-9
View File
@@ -93,9 +93,6 @@ def cron_list(show_all: bool = False):
script = job.get("script")
if script:
print(f" Script: {script}")
workdir = job.get("workdir")
if workdir:
print(f" Workdir: {workdir}")
# Execution history
last_status = job.get("last_status")
@@ -171,7 +168,6 @@ def cron_create(args):
skill=getattr(args, "skill", None),
skills=_normalize_skills(getattr(args, "skill", None), getattr(args, "skills", None)),
script=getattr(args, "script", None),
workdir=getattr(args, "workdir", None),
)
if not result.get("success"):
print(color(f"Failed to create job: {result.get('error', 'unknown error')}", Colors.RED))
@@ -184,8 +180,6 @@ def cron_create(args):
job_data = result.get("job", {})
if job_data.get("script"):
print(f" Script: {job_data['script']}")
if job_data.get("workdir"):
print(f" Workdir: {job_data['workdir']}")
print(f" Next run: {result['next_run_at']}")
return 0
@@ -224,7 +218,6 @@ def cron_edit(args):
repeat=getattr(args, "repeat", None),
skills=final_skills,
script=getattr(args, "script", None),
workdir=getattr(args, "workdir", None),
)
if not result.get("success"):
print(color(f"Failed to update job: {result.get('error', 'unknown error')}", Colors.RED))
@@ -240,8 +233,6 @@ def cron_edit(args):
print(" Skills: none")
if updated.get("script"):
print(f" Script: {updated['script']}")
if updated.get("workdir"):
print(f" Workdir: {updated['workdir']}")
return 0
+2 -2
View File
@@ -18,7 +18,7 @@ import os
import sys
import time
import logging
from typing import Optional, Tuple
from typing import Any, Callable, Optional, Tuple
import requests
@@ -108,7 +108,7 @@ def wait_for_registration_success(
device_code: str,
interval: int = 3,
expires_in: int = 7200,
on_waiting: Optional[callable] = None,
on_waiting: Optional[Callable[..., Any]] = None,
) -> Tuple[str, str]:
"""Block until the registration succeeds or times out.
+8 -29
View File
@@ -29,7 +29,6 @@ if _env_path.exists():
load_dotenv(PROJECT_ROOT / ".env", override=False, encoding="utf-8")
from hermes_cli.colors import Colors, color
from hermes_cli.models import _HERMES_USER_AGENT
from hermes_constants import OPENROUTER_MODELS_URL
from utils import base_url_host_matches
@@ -296,33 +295,16 @@ def run_doctor(args):
except Exception:
pass
try:
from hermes_cli.config import get_compatible_custom_providers as _compatible_custom_providers
from hermes_cli.providers import resolve_provider_full as _resolve_provider_full
from hermes_cli.auth import resolve_provider as _resolve_provider
except Exception:
_compatible_custom_providers = None
_resolve_provider_full = None
custom_providers = []
if _compatible_custom_providers is not None:
try:
custom_providers = _compatible_custom_providers(cfg)
except Exception:
custom_providers = []
user_providers = cfg.get("providers")
if isinstance(user_providers, dict):
known_providers.update(str(name).strip().lower() for name in user_providers if str(name).strip())
for entry in custom_providers:
if not isinstance(entry, dict):
continue
name = str(entry.get("name") or "").strip()
if name:
known_providers.add("custom:" + name.lower().replace(" ", "-"))
_resolve_provider = None
canonical_provider = provider
if provider and _resolve_provider_full is not None and provider != "auto":
provider_def = _resolve_provider_full(provider, user_providers, custom_providers)
canonical_provider = provider_def.id if provider_def is not None else None
if provider and _resolve_provider is not None and provider != "auto":
try:
canonical_provider = _resolve_provider(provider)
except Exception:
canonical_provider = None
if provider and provider != "auto":
if canonical_provider is None or (known_providers and canonical_provider not in known_providers):
@@ -975,10 +957,7 @@ def run_doctor(args):
if base_url_host_matches(_base, "api.kimi.com") and _base.rstrip("/").endswith("/coding"):
_base = _base.rstrip("/") + "/v1"
_url = (_base.rstrip("/") + "/models") if _base else _default_url
_headers = {
"Authorization": f"Bearer {_key}",
"User-Agent": _HERMES_USER_AGENT,
}
_headers = {"Authorization": f"Bearer {_key}"}
if base_url_host_matches(_base, "api.kimi.com"):
_headers["User-Agent"] = "claude-code/0.1.0"
_resp = httpx.get(
-2
View File
@@ -267,8 +267,6 @@ def run_dump(args):
("ANTHROPIC_API_KEY", "anthropic"),
("ANTHROPIC_TOKEN", "anthropic_token"),
("NOUS_API_KEY", "nous"),
("GOOGLE_API_KEY", "google/gemini"),
("GEMINI_API_KEY", "gemini"),
("GLM_API_KEY", "glm/zai"),
("ZAI_API_KEY", "zai"),
("KIMI_API_KEY", "kimi"),
+7 -62
View File
@@ -175,60 +175,6 @@ def _request_gateway_self_restart(pid: int) -> bool:
return True
def _graceful_restart_via_sigusr1(pid: int, drain_timeout: float) -> bool:
"""Send SIGUSR1 to a gateway PID and wait for it to exit gracefully.
SIGUSR1 is wired in gateway/run.py to ``request_restart(via_service=True)``
which drains in-flight agent runs (up to ``agent.restart_drain_timeout``
seconds), then exits with code 75. Both systemd (``Restart=on-failure``
+ ``RestartForceExitStatus=75``) and launchd (``KeepAlive.SuccessfulExit
= false``) relaunch the process after the graceful exit.
This is the drain-aware alternative to ``systemctl restart`` / ``SIGTERM``,
which SIGKILL in-flight agents after a short timeout.
Args:
pid: Gateway process PID (systemd MainPID, launchd PID, or bare
process PID).
drain_timeout: Seconds to wait for the process to exit after sending
SIGUSR1. Should be slightly larger than the gateway's
``agent.restart_drain_timeout`` to allow the drain loop to
finish cleanly.
Returns:
True if the PID was signalled and exited within the timeout.
False if SIGUSR1 couldn't be sent or the process didn't exit in
time (caller should fall back to a harder restart path).
"""
if not hasattr(signal, "SIGUSR1"):
return False
if pid <= 0:
return False
try:
os.kill(pid, signal.SIGUSR1)
except ProcessLookupError:
# Already gone — nothing to drain.
return True
except (PermissionError, OSError):
return False
import time as _time
deadline = _time.monotonic() + max(drain_timeout, 1.0)
while _time.monotonic() < deadline:
try:
os.kill(pid, 0) # signal 0 — probe liveness
except ProcessLookupError:
return True
except PermissionError:
# Process still exists but we can't signal it. Treat as alive
# so the caller falls back.
pass
_time.sleep(0.5)
# Drain didn't finish in time.
return False
def _append_unique_pid(pids: list[int], pid: int | None, exclude_pids: set[int]) -> None:
if pid is None or pid <= 0:
return
@@ -1523,14 +1469,7 @@ def generate_systemd_unit(system: bool = False, run_as_user: str | None = None)
path_entries.append(resolved_node_dir)
common_bin_paths = ["/usr/local/sbin", "/usr/local/bin", "/usr/sbin", "/usr/bin", "/sbin", "/bin"]
# systemd's TimeoutStopSec must exceed the gateway's drain_timeout so
# there's budget left for post-interrupt cleanup (tool subprocess kill,
# adapter disconnect, session DB close) before systemd escalates to
# SIGKILL on the cgroup — otherwise bash/sleep tool-call children left
# by a force-interrupted agent get reaped by systemd instead of us
# (#8202). 30s of headroom covers the worst case we've observed.
_drain_timeout = int(_get_restart_drain_timeout() or 0)
restart_timeout = max(60, _drain_timeout) + 30
restart_timeout = max(60, int(_get_restart_drain_timeout() or 0))
if system:
username, group_name, home_dir = _system_service_identity(run_as_user)
@@ -3108,6 +3047,12 @@ def _setup_wecom():
print_success("💬 WeCom configured!")
def _setup_wecom_callback():
"""Configure WeCom Callback (self-built app) via the standard platform setup."""
wecom_platform = next(p for p in _PLATFORMS if p["key"] == "wecom_callback")
_setup_standard_platform(wecom_platform)
def _is_service_installed() -> bool:
"""Check if the gateway is installed as a system service."""
if supports_systemd_services():
+100 -624
View File
File diff suppressed because it is too large Load Diff
+9 -48
View File
@@ -12,12 +12,8 @@ Different LLM providers expect model identifiers in different formats:
model IDs, but Claude still uses hyphenated native names like
``claude-sonnet-4-6``.
- **OpenCode Go** preserves dots in model names: ``minimax-m2.7``.
- **DeepSeek** accepts ``deepseek-chat`` (V3), ``deepseek-reasoner``
(R1-family), and the first-class V-series IDs (``deepseek-v4-pro``,
``deepseek-v4-flash``, and any future ``deepseek-v<N>-*``). Older
Hermes revisions folded every non-reasoner input into
``deepseek-chat``, which on aggregators routes to V3 so a user
picking V4 Pro was silently downgraded.
- **DeepSeek** only accepts two model identifiers:
``deepseek-chat`` and ``deepseek-reasoner``.
- **Custom** and remaining providers pass the name through as-is.
This module centralises that translation so callers can simply write::
@@ -29,7 +25,6 @@ Inspired by Clawdbot's ``normalizeAnthropicModelId`` pattern.
from __future__ import annotations
import re
from typing import Optional
# ---------------------------------------------------------------------------
@@ -105,15 +100,6 @@ _MATCHING_PREFIX_STRIP_PROVIDERS: frozenset[str] = frozenset({
"custom",
})
# Providers whose APIs require lowercase model IDs. Xiaomi's
# ``api.xiaomimimo.com`` rejects mixed-case names like ``MiMo-V2.5-Pro``
# that users might copy from marketing docs — it only accepts
# ``mimo-v2.5-pro``. After stripping a matching provider prefix, these
# providers also get ``.lower()`` applied.
_LOWERCASE_MODEL_PROVIDERS: frozenset[str] = frozenset({
"xiaomi",
})
# ---------------------------------------------------------------------------
# DeepSeek special handling
# ---------------------------------------------------------------------------
@@ -129,30 +115,17 @@ _DEEPSEEK_REASONER_KEYWORDS: frozenset[str] = frozenset({
})
_DEEPSEEK_CANONICAL_MODELS: frozenset[str] = frozenset({
"deepseek-chat", # V3 on DeepSeek direct and most aggregators
"deepseek-reasoner", # R1-family reasoning model
"deepseek-v4-pro", # V4 Pro — first-class model ID
"deepseek-v4-flash", # V4 Flash — first-class model ID
"deepseek-chat",
"deepseek-reasoner",
})
# First-class V-series IDs (``deepseek-v4-pro``, ``deepseek-v4-flash``,
# future ``deepseek-v5-*``, dated variants like ``deepseek-v4-flash-20260423``).
# Verified empirically 2026-04-24: DeepSeek's Chat Completions API returns
# ``provider: DeepSeek`` / ``model: deepseek-v4-flash-20260423`` when called
# with ``model=deepseek/deepseek-v4-flash``, so these names are not aliases
# of ``deepseek-chat`` and must not be folded into it.
_DEEPSEEK_V_SERIES_RE = re.compile(r"^deepseek-v\d+([-.].+)?$")
def _normalize_for_deepseek(model_name: str) -> str:
"""Map a model input to a DeepSeek-accepted identifier.
"""Map any model input to one of DeepSeek's two accepted identifiers.
Rules:
- Already a known canonical (``deepseek-chat``/``deepseek-reasoner``/
``deepseek-v4-pro``/``deepseek-v4-flash``) -> pass through.
- Matches the V-series pattern ``deepseek-v<digit>...`` -> pass through
(covers future ``deepseek-v5-*`` and dated variants without a release).
- Contains a reasoner keyword (r1, think, reasoning, cot, reasoner)
- Already ``deepseek-chat`` or ``deepseek-reasoner`` -> pass through.
- Contains any reasoner keyword (r1, think, reasoning, cot, reasoner)
-> ``deepseek-reasoner``.
- Everything else -> ``deepseek-chat``.
@@ -160,17 +133,13 @@ def _normalize_for_deepseek(model_name: str) -> str:
model_name: The bare model name (vendor prefix already stripped).
Returns:
A DeepSeek-accepted model identifier.
One of ``"deepseek-chat"`` or ``"deepseek-reasoner"``.
"""
bare = _strip_vendor_prefix(model_name).lower()
if bare in _DEEPSEEK_CANONICAL_MODELS:
return bare
# V-series first-class IDs (v4-pro, v4-flash, future v5-*, dated variants)
if _DEEPSEEK_V_SERIES_RE.match(bare):
return bare
# Check for reasoner-like keywords anywhere in the name
for keyword in _DEEPSEEK_REASONER_KEYWORDS:
if keyword in bare:
@@ -378,9 +347,6 @@ def normalize_model_for_provider(model_input: str, target_provider: str) -> str:
>>> normalize_model_for_provider("claude-sonnet-4.6", "zai")
'claude-sonnet-4.6'
>>> normalize_model_for_provider("MiMo-V2.5-Pro", "xiaomi")
'mimo-v2.5-pro'
"""
name = (model_input or "").strip()
if not name:
@@ -444,12 +410,7 @@ def normalize_model_for_provider(model_input: str, target_provider: str) -> str:
# --- Direct providers: repair matching provider prefixes only ---
if provider in _MATCHING_PREFIX_STRIP_PROVIDERS:
result = _strip_matching_provider_prefix(name, provider)
# Some providers require lowercase model IDs (e.g. Xiaomi's API
# rejects "MiMo-V2.5-Pro" but accepts "mimo-v2.5-pro").
if provider in _LOWERCASE_MODEL_PROVIDERS:
result = result.lower()
return result
return _strip_matching_provider_prefix(name, provider)
# --- Authoritative native providers: preserve user-facing slugs as-is ---
if provider in _AUTHORITATIVE_NATIVE_PROVIDERS:

Some files were not shown because too many files have changed in this diff Show More