Compare commits

..

1 Commits

Author SHA1 Message Date
Brooklyn Nicholson 25ba6783b8 feat(tui-gateway): WebSocket transport + /chat web UI, wire-compatible with Ink
Extracts the JSON-RPC transport from stdio into an abstraction so the same
dispatcher drives Ink over stdio AND browser/iOS clients over WebSocket
without duplicating handler logic. Adds a Chat page to the existing web
dashboard that exercises the full surface — streaming, tool calls, slash
commands, model picker, session resume.

Backend
-------
* tui_gateway/transport.py — Transport protocol + contextvar binding + the
  module-level StdioTransport. Stream is resolved through a callback so
  tests that monkeypatch `_real_stdout` keep working.
* tui_gateway/server.py — write_json and dispatch are now transport-aware.
  Backward compatible: no transport bound = legacy stdio path, so entry.py
  (Ink's stdio entrypoint) is unchanged externally.
* tui_gateway/ws.py — WSTransport + handle_ws coroutine. Safe to call from
  any thread: detects loop-thread deadlock and fire-and-forget schedules
  when needed, blocking run_coroutine_threadsafe + future.result otherwise.
* hermes_cli/web_server.py — mounts /api/ws on the existing FastAPI app,
  gated by the same ephemeral session token used for REST. Adds
  HERMES_DASHBOARD_DEV_TOKEN env override so Vite HMR dev can share the
  token with the backend.

Frontend
--------
* web/src/lib/gatewayClient.ts — browser WebSocket JSON-RPC client that
  mirrors ui-tui/src/gatewayClient.ts.
* web/src/lib/slashExec.ts — slash command pipeline (slash.exec with
  command.dispatch fallback + exec/plugin/alias/skill/send directive
  handling), mirrors ui-tui/src/app/createSlashHandler.ts.
* web/src/pages/ChatPage.tsx — transcript + composer driven entirely by
  the WS.
* web/src/components/SlashPopover.tsx — autocomplete popover above the
  composer, debounced complete.slash.
* web/src/components/ModelPickerDialog.tsx — two-stage provider/model
  picker; confirms by emitting /model through the slash pipeline.
* web/src/components/ToolCall.tsx — expandable tool call row (Ink-style
  chevron + context + summary/error/diff).
* web/src/App.tsx — logo links to /, Chat entry added to nav.
* web/src/pages/SessionsPage.tsx — every session row gets an Open-in-chat
  button that navigates to /chat?resume=<id> (uses session.resume).
* web/vite.config.ts — /api proxy configured with ws: true so WebSocket
  upgrades forward in dev mode; injectDevToken plugin reads
  HERMES_DASHBOARD_DEV_TOKEN and injects it into the served index.html so
  Vite HMR can authenticate against FastAPI without a separate flow.

Tests
-----
tests/hermes_cli/test_web_server.py picks up three new classes:

* TestTuiGatewayWebSocket — handshake, auth rejection, parse errors,
  unknown methods, inline + pool handler round-trips, session event
  routing, disconnect cleanup.
* TestTuiGatewayTransportParity — byte-identical envelopes for the same
  RPC over stdio vs WS (unknown method, inline handler, error envelope,
  explicit stdio transport).
* TestTuiGatewayE2EAnyPort — scripted multi-RPC conversation driven
  identically via handle_request and via WebSocket; order + shape must
  match. This is the "hermes --tui in any port" check.

Existing tests under tests/tui_gateway/ and tests/test_tui_gateway_server.py
all still pass unchanged — backward compat preserved.

Try it
------
    hermes dashboard          # builds web, serves on :9119, click Chat

Dev with HMR:

    export HERMES_DASHBOARD_DEV_TOKEN="dev-\$(openssl rand -hex 16)"
    hermes dashboard --no-open
    cd web && npm run dev     # :5173, /api + /api/ws proxied to :9119

fix(chat): insert tool rows before the streaming assistant message

Transcript used to read "user → empty assistant bubble → tool → bubble
filling in", which is disorienting: the streaming cursor sits at the top
while the "work" rows appear below it chronologically.

Now tool.start inserts the row just before the current streaming
assistant message, so the order reads "user → tools → final message".
If no streaming assistant exists yet (rare), tools still append at the
end; tool.progress / tool.complete match by id regardless of position.

fix(web-chat): font, composer, streaming caret + port GoodVibesHeart

- ChatPage root opts out of App's `font-mondwest uppercase` (dashboard
  chrome style) — adds `font-courier normal-case` so transcript prose is
  readable mono mixed-case instead of pixel-display caps.
- Composer: textarea + send button wrapped as one bordered unit with
  `focus-within` ring; `font-sans` dropped (it mapped to `Collapse`
  display). Heights stretch together via `items-stretch`; button is a
  flush cap with `border-l` divider.
- Streaming caret no longer wraps to a new line when the assistant
  renders a block element. Markdown now takes a `streaming` prop and
  injects the caret inside the last block (paragraph, list item, code)
  so it hugs the trailing character. Caret sized in em units.
- EmptyState gets a blinking caret + <kbd> shortcut chips.
- Port ui-tui's GoodVibesHeart easter egg to the web: typing "thanks" /
  "ty" / "ily" / "good bot" flashes a Lucide heart next to the
  connection badge (same regex, same 650ms beat, same palette as
  ui-tui/src/app/useMainApp.ts).
2026-04-23 21:11:04 -04:00
540 changed files with 8263 additions and 91696 deletions
-3
View File
@@ -53,9 +53,6 @@ jobs:
- name: Extract skill metadata for dashboard
run: python3 website/scripts/extract-skills.py
- name: Regenerate per-skill docs pages + catalogs
run: python3 website/scripts/generate-skill-docs.py
- name: Build skills index (if not already present)
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-3
View File
@@ -36,9 +36,6 @@ jobs:
- name: Extract skill metadata for dashboard
run: python3 website/scripts/extract-skills.py
- name: Regenerate per-skill docs pages + catalogs
run: python3 website/scripts/generate-skill-docs.py
- name: Lint docs diagrams
run: npm run lint:diagrams
working-directory: website
-13
View File
@@ -240,19 +240,6 @@ npm run fmt # prettier
npm test # vitest
```
### TUI in the Dashboard (`hermes dashboard` → `/chat`)
The dashboard embeds the real `hermes --tui`**not** a rewrite. See `hermes_cli/pty_bridge.py` + the `@app.websocket("/api/pty")` endpoint in `hermes_cli/web_server.py`.
- Browser loads `web/src/pages/ChatPage.tsx`, which mounts xterm.js's `Terminal` with the WebGL renderer, `@xterm/addon-fit` for container-driven resize, and `@xterm/addon-unicode11` for modern wide-character widths.
- `/api/pty?token=…` upgrades to a WebSocket; auth uses the same ephemeral `_SESSION_TOKEN` as REST, via query param (browsers can't set `Authorization` on WS upgrade).
- The server spawns whatever `hermes --tui` would spawn, through `ptyprocess` (POSIX PTY — WSL works, native Windows does not).
- Frames: raw PTY bytes each direction; resize via `\x1b[RESIZE:<cols>;<rows>]` intercepted on the server and applied with `TIOCSWINSZ`.
**Do not re-implement the primary chat experience in React.** The main transcript, composer/input flow (including slash-command behavior), and PTY-backed terminal belong to the embedded `hermes --tui` — anything new you add to Ink shows up in the dashboard automatically. If you find yourself rebuilding the transcript or composer for the dashboard, stop and extend Ink instead.
**Structured React UI around the TUI is allowed when it is not a second chat surface.** Sidebar widgets, inspectors, summaries, status panels, and similar supporting views (e.g. `ChatSidebar`, `ModelPickerDialog`, `ToolCall`) are fine when they complement the embedded TUI rather than replacing the transcript / composer / terminal. Keep their state independent of the PTY child's session and surface their failures non-destructively so the terminal pane keeps working unimpaired.
---
## Adding New Tools
+4 -12
View File
@@ -10,11 +10,9 @@ ENV PYTHONUNBUFFERED=1
ENV PLAYWRIGHT_BROWSERS_PATH=/opt/hermes/.playwright
# Install system dependencies in one layer, clear APT cache
# tini reaps orphaned zombie processes (MCP stdio subprocesses, git, bun, etc.)
# that would otherwise accumulate when hermes runs as PID 1. See #15012.
RUN apt-get update && \
apt-get install -y --no-install-recommends \
build-essential nodejs npm python3 ripgrep ffmpeg gcc python3-dev libffi-dev procps git openssh-client docker-cli tini && \
build-essential nodejs npm python3 ripgrep ffmpeg gcc python3-dev libffi-dev procps git openssh-client docker-cli && \
rm -rf /var/lib/apt/lists/*
# Non-root user for runtime; UID can be overridden via HERMES_UID at runtime
@@ -43,15 +41,9 @@ COPY --chown=hermes:hermes . .
# Build web dashboard (Vite outputs to hermes_cli/web_dist/)
RUN cd web && npm run build
# ---------- Permissions ----------
# Make install dir world-readable so any HERMES_UID can read it at runtime.
# The venv needs to be traversable too.
USER root
RUN chmod -R a+rX /opt/hermes
# Start as root so the entrypoint can usermod/groupmod + gosu.
# If HERMES_UID is unset, the entrypoint drops to the default hermes user (10000).
# ---------- Python virtualenv ----------
RUN chown hermes:hermes /opt/hermes
USER hermes
RUN uv venv && \
uv pip install --no-cache-dir -e ".[all]"
@@ -60,4 +52,4 @@ ENV HERMES_WEB_DIST=/opt/hermes/hermes_cli/web_dist
ENV HERMES_HOME=/opt/data
ENV PATH="/opt/data/.local/bin:${PATH}"
VOLUME [ "/opt/data" ]
ENTRYPOINT [ "/usr/bin/tini", "-g", "--", "/opt/hermes/docker/entrypoint.sh" ]
ENTRYPOINT [ "/opt/hermes/docker/entrypoint.sh" ]
+3 -9
View File
@@ -60,7 +60,7 @@ from acp_adapter.events import (
make_tool_progress_cb,
)
from acp_adapter.permissions import make_approval_callback
from acp_adapter.session import SessionManager, SessionState, _expand_acp_enabled_toolsets
from acp_adapter.session import SessionManager, SessionState
logger = logging.getLogger(__name__)
@@ -287,11 +287,7 @@ class HermesACPAgent(acp.Agent):
try:
from model_tools import get_tool_definitions
enabled_toolsets = _expand_acp_enabled_toolsets(
getattr(state.agent, "enabled_toolsets", None) or ["hermes-acp"],
mcp_server_names=[server.name for server in mcp_servers],
)
state.agent.enabled_toolsets = enabled_toolsets
enabled_toolsets = getattr(state.agent, "enabled_toolsets", None) or ["hermes-acp"]
disabled_toolsets = getattr(state.agent, "disabled_toolsets", None)
state.agent.tools = get_tool_definitions(
enabled_toolsets=enabled_toolsets,
@@ -758,9 +754,7 @@ class HermesACPAgent(acp.Agent):
def _cmd_tools(self, args: str, state: SessionState) -> str:
try:
from model_tools import get_tool_definitions
toolsets = _expand_acp_enabled_toolsets(
getattr(state.agent, "enabled_toolsets", None) or ["hermes-acp"]
)
toolsets = getattr(state.agent, "enabled_toolsets", None) or ["hermes-acp"]
tools = get_tool_definitions(enabled_toolsets=toolsets, quiet_mode=True)
if not tools:
return "No tools available."
+1 -28
View File
@@ -106,24 +106,6 @@ def _register_task_cwd(task_id: str, cwd: str) -> None:
logger.debug("Failed to register ACP task cwd override", exc_info=True)
def _expand_acp_enabled_toolsets(
toolsets: List[str] | None = None,
mcp_server_names: List[str] | None = None,
) -> List[str]:
"""Return ACP toolsets plus explicit MCP server toolsets for this session."""
expanded: List[str] = []
for name in list(toolsets or ["hermes-acp"]):
if name and name not in expanded:
expanded.append(name)
for server_name in list(mcp_server_names or []):
toolset_name = f"mcp-{server_name}"
if server_name and toolset_name not in expanded:
expanded.append(toolset_name)
return expanded
def _clear_task_cwd(task_id: str) -> None:
"""Remove task-specific cwd overrides for an ACP session."""
if not task_id:
@@ -555,18 +537,9 @@ class SessionManager:
elif isinstance(model_cfg, str) and model_cfg.strip():
default_model = model_cfg.strip()
configured_mcp_servers = [
name
for name, cfg in (config.get("mcp_servers") or {}).items()
if not isinstance(cfg, dict) or cfg.get("enabled", True) is not False
]
kwargs = {
"platform": "acp",
"enabled_toolsets": _expand_acp_enabled_toolsets(
["hermes-acp"],
mcp_server_names=configured_mcp_servers,
),
"enabled_toolsets": ["hermes-acp"],
"quiet_mode": True,
"session_id": session_id,
"model": model or default_model,
+7 -112
View File
@@ -14,8 +14,6 @@ import copy
import json
import logging
import os
import platform
import subprocess
from pathlib import Path
from hermes_constants import get_hermes_home
@@ -279,9 +277,8 @@ def _is_oauth_token(key: str) -> bool:
Positively identifies Anthropic OAuth tokens by their key format:
- ``sk-ant-`` prefix (but NOT ``sk-ant-api``) → setup tokens, managed keys
- ``eyJ`` prefix → JWTs from the Anthropic OAuth flow
- ``cc-`` prefix → Claude Code OAuth access tokens (from CLAUDE_CODE_OAUTH_TOKEN)
Non-Anthropic keys (MiniMax, Alibaba, etc.) don't match any pattern
Non-Anthropic keys (MiniMax, Alibaba, etc.) don't match either pattern
and correctly return False.
"""
if not key:
@@ -295,9 +292,6 @@ def _is_oauth_token(key: str) -> bool:
# JWTs from Anthropic OAuth flow
if key.startswith("eyJ"):
return True
# Claude Code OAuth access tokens (opaque, from CLAUDE_CODE_OAUTH_TOKEN)
if key.startswith("cc-"):
return True
return False
@@ -467,72 +461,8 @@ def build_anthropic_bedrock_client(region: str):
)
def _read_claude_code_credentials_from_keychain() -> Optional[Dict[str, Any]]:
"""Read Claude Code OAuth credentials from the macOS Keychain.
Claude Code >=2.1.114 stores credentials in the macOS Keychain under the
service name "Claude Code-credentials" rather than (or in addition to)
the JSON file at ~/.claude/.credentials.json.
The password field contains a JSON string with the same claudeAiOauth
structure as the JSON file.
Returns dict with {accessToken, refreshToken?, expiresAt?} or None.
"""
import platform
import subprocess
if platform.system() != "Darwin":
return None
try:
# Read the "Claude Code-credentials" generic password entry
result = subprocess.run(
["security", "find-generic-password",
"-s", "Claude Code-credentials",
"-w"],
capture_output=True,
text=True,
timeout=5,
)
except (OSError, subprocess.TimeoutExpired):
logger.debug("Keychain: security command not available or timed out")
return None
if result.returncode != 0:
logger.debug("Keychain: no entry found for 'Claude Code-credentials'")
return None
raw = result.stdout.strip()
if not raw:
return None
try:
data = json.loads(raw)
except json.JSONDecodeError:
logger.debug("Keychain: credentials payload is not valid JSON")
return None
oauth_data = data.get("claudeAiOauth")
if oauth_data and isinstance(oauth_data, dict):
access_token = oauth_data.get("accessToken", "")
if access_token:
return {
"accessToken": access_token,
"refreshToken": oauth_data.get("refreshToken", ""),
"expiresAt": oauth_data.get("expiresAt", 0),
"source": "macos_keychain",
}
return None
def read_claude_code_credentials() -> Optional[Dict[str, Any]]:
"""Read refreshable Claude Code OAuth credentials.
Checks two sources in order:
1. macOS Keychain (Darwin only) — "Claude Code-credentials" entry
2. ~/.claude/.credentials.json file
"""Read refreshable Claude Code OAuth credentials from ~/.claude/.credentials.json.
This intentionally excludes ~/.claude.json primaryApiKey. Opencode's
subscription flow is OAuth/setup-token based with refreshable credentials,
@@ -541,12 +471,6 @@ def read_claude_code_credentials() -> Optional[Dict[str, Any]]:
Returns dict with {accessToken, refreshToken?, expiresAt?} or None.
"""
# Try macOS Keychain first (covers Claude Code >=2.1.114)
kc_creds = _read_claude_code_credentials_from_keychain()
if kc_creds:
return kc_creds
# Fall back to JSON file
cred_path = Path.home() / ".claude" / ".credentials.json"
if cred_path.exists():
try:
@@ -717,9 +641,7 @@ def _write_claude_code_credentials(
existing["claudeAiOauth"] = oauth_data
cred_path.parent.mkdir(parents=True, exist_ok=True)
_tmp_cred = cred_path.with_suffix(".tmp")
_tmp_cred.write_text(json.dumps(existing, indent=2), encoding="utf-8")
_tmp_cred.replace(cred_path)
cred_path.write_text(json.dumps(existing, indent=2), encoding="utf-8")
# Restrict permissions (credentials file)
cred_path.chmod(0o600)
except (OSError, IOError) as e:
@@ -986,26 +908,6 @@ def read_hermes_oauth_credentials() -> Optional[Dict[str, Any]]:
# ---------------------------------------------------------------------------
def _is_bedrock_model_id(model: str) -> bool:
"""Detect AWS Bedrock model IDs that use dots as namespace separators.
Bedrock model IDs come in two forms:
- Bare: ``anthropic.claude-opus-4-7``
- Regional (inference profiles): ``us.anthropic.claude-sonnet-4-5-v1:0``
In both cases the dots separate namespace components, not version
numbers, and must be preserved verbatim for the Bedrock API.
"""
lower = model.lower()
# Regional inference-profile prefixes
if any(lower.startswith(p) for p in ("global.", "us.", "eu.", "ap.", "jp.")):
return True
# Bare Bedrock model IDs: provider.model-family
if lower.startswith("anthropic."):
return True
return False
def normalize_model_name(model: str, preserve_dots: bool = False) -> str:
"""Normalize a model name for the Anthropic API.
@@ -1013,19 +915,11 @@ def normalize_model_name(model: str, preserve_dots: bool = False) -> str:
- Converts dots to hyphens in version numbers (OpenRouter uses dots,
Anthropic uses hyphens: claude-opus-4.6 → claude-opus-4-6), unless
preserve_dots is True (e.g. for Alibaba/DashScope: qwen3.5-plus).
- Preserves Bedrock model IDs (``anthropic.claude-opus-4-7``) and
regional inference profiles (``us.anthropic.claude-*``) whose dots
are namespace separators, not version separators.
"""
lower = model.lower()
if lower.startswith("anthropic/"):
model = model[len("anthropic/"):]
if not preserve_dots:
# Bedrock model IDs use dots as namespace separators
# (e.g. "anthropic.claude-opus-4-7", "us.anthropic.claude-*").
# These must not be converted to hyphens. See issue #12295.
if _is_bedrock_model_id(model):
return model
# OpenRouter uses dots for version separators (claude-opus-4.6),
# Anthropic uses hyphens (claude-opus-4-6). Convert dots to hyphens.
model = model.replace(".", "-")
@@ -1680,9 +1574,9 @@ def build_anthropic_kwargs(
# ── Strip sampling params on 4.7+ ─────────────────────────────────
# Opus 4.7 rejects any non-default temperature/top_p/top_k with a 400.
# Callers (auxiliary_client, etc.) may set these for older models;
# drop them here as a safety net so upstream 4.6 → 4.7 migrations
# don't require coordinated edits everywhere.
# Callers (auxiliary_client, flush_memories, etc.) may set these for
# older models; drop them here as a safety net so upstream 4.6 → 4.7
# migrations don't require coordinated edits everywhere.
if _forbids_sampling_params(model):
for _sampling_key in ("temperature", "top_p", "top_k"):
kwargs.pop(_sampling_key, None)
@@ -1704,3 +1598,4 @@ def build_anthropic_kwargs(
return kwargs
+15 -339
View File
@@ -74,12 +74,6 @@ _PROVIDER_ALIASES = {
"minimax_cn": "minimax-cn",
"claude": "anthropic",
"claude-code": "anthropic",
"github": "copilot",
"github-copilot": "copilot",
"github-model": "copilot",
"github-models": "copilot",
"github-copilot-acp": "copilot-acp",
"copilot-acp-agent": "copilot-acp",
}
@@ -95,11 +89,10 @@ def _normalize_aux_provider(provider: Optional[str]) -> str:
if normalized == "main":
# Resolve to the user's actual main provider so named custom providers
# and non-aggregator providers (DeepSeek, Alibaba, etc.) work correctly.
main_prov = (_read_main_provider() or "").strip().lower()
main_prov = _read_main_provider()
if main_prov and main_prov not in ("auto", "main", ""):
normalized = main_prov
else:
return "custom"
return main_prov
return "custom"
return _PROVIDER_ALIASES.get(normalized, normalized)
@@ -390,7 +383,7 @@ class _CodexCompletionsAdapter:
# Note: the Codex endpoint (chatgpt.com/backend-api/codex) does NOT
# support max_output_tokens or temperature — omit to avoid 400 errors.
# Tools support for auxiliary callers (e.g. skills_hub) that pass function schemas
# Tools support for flush_memories and similar callers
tools = kwargs.get("tools")
if tools:
converted = []
@@ -1349,111 +1342,6 @@ def _is_auth_error(exc: Exception) -> bool:
return "error code: 401" in err_lower or "authenticationerror" in type(exc).__name__.lower()
def _is_unsupported_parameter_error(exc: Exception, param: str) -> bool:
"""Detect provider 400s for an unsupported request parameter.
Different OpenAI-compatible endpoints phrase the same class of error a few
ways: ``Unsupported parameter: X``, ``unsupported_parameter`` with a
``param`` field, ``X is not supported``, ``unknown parameter: X``,
``unrecognized request argument: X``. We match on both the parameter
name and a generic "unsupported/unknown/unrecognized parameter" marker so
call sites can reactively retry without the offending key instead of
surfacing a noisy auxiliary failure.
Generalizes the temperature-specific detector that originally shipped
with PR #15621 so the same retry strategy can cover ``max_tokens``,
``seed``, ``top_p``, and any future quirk. Credit @nicholasrae (PR #15416)
for the generalization pattern.
"""
param_lower = (param or "").lower()
if not param_lower:
return False
err_lower = str(exc).lower()
if param_lower not in err_lower:
return False
return any(marker in err_lower for marker in (
"unsupported parameter",
"unsupported_parameter",
"not supported",
"does not support",
"unknown parameter",
"unrecognized request argument",
"unrecognized parameter",
"invalid parameter",
))
def _is_unsupported_temperature_error(exc: Exception) -> bool:
"""Back-compat wrapper: detect API errors where the model rejects ``temperature``.
Delegates to :func:`_is_unsupported_parameter_error`; kept as a separate
public symbol because existing tests and call sites import it by name.
"""
return _is_unsupported_parameter_error(exc, "temperature")
def _evict_cached_clients(provider: str) -> None:
"""Drop cached auxiliary clients for a provider so fresh creds are used."""
normalized = _normalize_aux_provider(provider)
with _client_cache_lock:
stale_keys = [
key for key in _client_cache
if _normalize_aux_provider(str(key[0])) == normalized
]
for key in stale_keys:
client = _client_cache.get(key, (None, None, None))[0]
if client is not None:
_force_close_async_httpx(client)
try:
close_fn = getattr(client, "close", None)
if callable(close_fn):
close_fn()
except Exception:
pass
_client_cache.pop(key, None)
def _refresh_provider_credentials(provider: str) -> bool:
"""Refresh short-lived credentials for OAuth-backed auxiliary providers."""
normalized = _normalize_aux_provider(provider)
try:
if normalized == "openai-codex":
from hermes_cli.auth import resolve_codex_runtime_credentials
creds = resolve_codex_runtime_credentials(force_refresh=True)
if not str(creds.get("api_key", "") or "").strip():
return False
_evict_cached_clients(normalized)
return True
if normalized == "nous":
from hermes_cli.auth import resolve_nous_runtime_credentials
creds = resolve_nous_runtime_credentials(
min_key_ttl_seconds=max(60, int(os.getenv("HERMES_NOUS_MIN_KEY_TTL_SECONDS", "1800"))),
timeout_seconds=float(os.getenv("HERMES_NOUS_TIMEOUT_SECONDS", "15")),
force_mint=True,
)
if not str(creds.get("api_key", "") or "").strip():
return False
_evict_cached_clients(normalized)
return True
if normalized == "anthropic":
from agent.anthropic_adapter import read_claude_code_credentials, _refresh_oauth_token, resolve_anthropic_token
creds = read_claude_code_credentials()
token = _refresh_oauth_token(creds) if isinstance(creds, dict) and creds.get("refreshToken") else None
if not str(token or "").strip():
token = resolve_anthropic_token()
if not str(token or "").strip():
return False
_evict_cached_clients(normalized)
return True
except Exception as exc:
logger.debug("Auxiliary provider credential refresh failed for %s: %s", normalized, exc)
return False
return False
def _try_payment_fallback(
failed_provider: str,
task: str = None,
@@ -1848,7 +1736,7 @@ def resolve_provider_client(
"but no endpoint credentials found")
return None, None
# ── Named custom providers (config.yaml providers dict / custom_providers list) ───
# ── Named custom providers (config.yaml custom_providers list) ───
try:
from hermes_cli.runtime_provider import _get_named_custom_provider
custom_entry = _get_named_custom_provider(provider)
@@ -1859,51 +1747,16 @@ def resolve_provider_client(
if not custom_key and custom_key_env:
custom_key = os.getenv(custom_key_env, "").strip()
custom_key = custom_key or "no-key-required"
# An explicit per-task api_mode override (from _resolve_task_provider_model)
# wins; otherwise fall back to what the provider entry declared.
entry_api_mode = (api_mode or custom_entry.get("api_mode") or "").strip()
if custom_base:
final_model = _normalize_resolved_model(
model or custom_entry.get("model") or _read_main_model() or "gpt-4o-mini",
provider,
)
logger.debug(
"resolve_provider_client: named custom provider %r (%s, api_mode=%s)",
provider, final_model, entry_api_mode or "chat_completions")
# anthropic_messages: route through the Anthropic Messages API
# via AnthropicAuxiliaryClient. Mirrors the anonymous-custom
# branch in _try_custom_endpoint(). See #15033.
if entry_api_mode == "anthropic_messages":
try:
from agent.anthropic_adapter import build_anthropic_client
real_client = build_anthropic_client(custom_key, custom_base)
except ImportError:
logger.warning(
"Named custom provider %r declares api_mode="
"anthropic_messages but the anthropic SDK is not "
"installed — falling back to OpenAI-wire.",
provider,
)
client = OpenAI(api_key=custom_key, base_url=custom_base)
return (_to_async_client(client, final_model) if async_mode
else (client, final_model))
sync_anthropic = AnthropicAuxiliaryClient(
real_client, final_model, custom_key, custom_base, is_oauth=False,
)
if async_mode:
return AsyncAnthropicAuxiliaryClient(sync_anthropic), final_model
return sync_anthropic, final_model
client = OpenAI(api_key=custom_key, base_url=custom_base)
# codex_responses or inherited auto-detect (via _wrap_if_needed).
# _wrap_if_needed reads the closed-over `api_mode` (the task-level
# override). Named-provider entry api_mode=codex_responses also
# flows through here.
if entry_api_mode == "codex_responses" and not isinstance(
client, CodexAuxiliaryClient
):
client = CodexAuxiliaryClient(client, final_model)
else:
client = _wrap_if_needed(client, final_model, custom_base)
client = _wrap_if_needed(client, final_model, custom_base)
logger.debug(
"resolve_provider_client: named custom provider %r (%s)",
provider, final_model)
return (_to_async_client(client, final_model) if async_mode
else (client, final_model))
logger.warning(
@@ -2036,39 +1889,6 @@ def resolve_provider_client(
"directly supported", provider)
return None, None
elif pconfig.auth_type == "aws_sdk":
# AWS SDK providers (Bedrock) — use the Anthropic Bedrock client via
# boto3's credential chain (IAM roles, SSO, env vars, instance metadata).
try:
from agent.bedrock_adapter import has_aws_credentials, resolve_bedrock_region
from agent.anthropic_adapter import build_anthropic_bedrock_client
except ImportError:
logger.warning("resolve_provider_client: bedrock requested but "
"boto3 or anthropic SDK not installed")
return None, None
if not has_aws_credentials():
logger.debug("resolve_provider_client: bedrock requested but "
"no AWS credentials found")
return None, None
region = resolve_bedrock_region()
default_model = "anthropic.claude-haiku-4-5-20251001-v1:0"
final_model = _normalize_resolved_model(model or default_model, provider)
try:
real_client = build_anthropic_bedrock_client(region)
except ImportError as exc:
logger.warning("resolve_provider_client: cannot create Bedrock "
"client: %s", exc)
return None, None
client = AnthropicAuxiliaryClient(
real_client, final_model, api_key="aws-sdk",
base_url=f"https://bedrock-runtime.{region}.amazonaws.com",
)
logger.debug("resolve_provider_client: bedrock (%s, %s)", final_model, region)
return (_to_async_client(client, final_model) if async_mode
else (client, final_model))
elif pconfig.auth_type in ("oauth_device_code", "oauth_external"):
# OAuth providers — route through their specific try functions
if provider == "nous":
@@ -2803,8 +2623,8 @@ def _build_call_kwargs(
temperature = fixed_temperature
# Opus 4.7+ rejects any non-default temperature/top_p/top_k — silently
# drop here so auxiliary callers that hardcode temperature (e.g. 0 on
# structured-JSON extraction) don't 400 the moment
# drop here so auxiliary callers that hardcode temperature (e.g. 0.3 on
# flush_memories, 0 on structured-JSON extraction) don't 400 the moment
# the aux model is flipped to 4.7.
if temperature is not None:
from agent.anthropic_adapter import _forbids_sampling_params
@@ -2892,7 +2712,7 @@ def call_llm(
Args:
task: Auxiliary task name ("compression", "vision", "web_extract",
"session_search", "skills_hub", "mcp", "title_generation").
"session_search", "skills_hub", "mcp", "flush_memories").
Reads provider:model from config/env. Ignored if provider is set.
provider: Explicit provider override.
model: Explicit model override.
@@ -2995,45 +2815,13 @@ def call_llm(
if _is_anthropic_compat_endpoint(resolved_provider, _client_base):
kwargs["messages"] = _convert_openai_images_to_anthropic(kwargs["messages"])
# Handle unsupported temperature, max_tokens vs max_completion_tokens retry,
# then payment fallback.
# Handle max_tokens vs max_completion_tokens retry, then payment fallback.
try:
return _validate_llm_response(
client.chat.completions.create(**kwargs), task)
except Exception as first_err:
if "temperature" in kwargs and _is_unsupported_temperature_error(first_err):
retry_kwargs = dict(kwargs)
retry_kwargs.pop("temperature", None)
logger.info(
"Auxiliary %s: provider rejected temperature; retrying once without it",
task or "call",
)
try:
return _validate_llm_response(
client.chat.completions.create(**retry_kwargs), task)
except Exception as retry_err:
retry_err_str = str(retry_err)
# If retry still fails, fall through to the max_tokens /
# payment / auth chains below using the temperature-stripped
# kwargs. Re-raise only if the retry hit something those
# chains won't handle.
if not (
_is_payment_error(retry_err)
or _is_connection_error(retry_err)
or _is_auth_error(retry_err)
or "max_tokens" in retry_err_str
or "unsupported_parameter" in retry_err_str
):
raise
first_err = retry_err
kwargs = retry_kwargs
err_str = str(first_err)
if max_tokens is not None and (
"max_tokens" in err_str
or "unsupported_parameter" in err_str
or _is_unsupported_parameter_error(first_err, "max_tokens")
):
if "max_tokens" in err_str or "unsupported_parameter" in err_str:
kwargs.pop("max_tokens", None)
kwargs["max_completion_tokens"] = max_tokens
try:
@@ -3069,49 +2857,6 @@ def call_llm(
return _validate_llm_response(
refreshed_client.chat.completions.create(**kwargs), task)
# ── Auth refresh retry ───────────────────────────────────────
if (_is_auth_error(first_err)
and resolved_provider not in ("auto", "", None)
and not client_is_nous):
if _refresh_provider_credentials(resolved_provider):
logger.info(
"Auxiliary %s: refreshed %s credentials after auth error, retrying",
task or "call", resolved_provider,
)
retry_client, retry_model = (
resolve_vision_provider_client(
provider=resolved_provider,
model=final_model,
async_mode=False,
)[1:]
if task == "vision"
else _get_cached_client(
resolved_provider,
resolved_model,
base_url=resolved_base_url,
api_key=resolved_api_key,
api_mode=resolved_api_mode,
main_runtime=main_runtime,
)
)
if retry_client is not None:
retry_kwargs = _build_call_kwargs(
resolved_provider,
retry_model or final_model,
messages,
temperature=temperature,
max_tokens=max_tokens,
tools=tools,
timeout=effective_timeout,
extra_body=effective_extra_body,
base_url=resolved_base_url,
)
_retry_base = str(getattr(retry_client, "base_url", "") or "")
if _is_anthropic_compat_endpoint(resolved_provider, _retry_base):
retry_kwargs["messages"] = _convert_openai_images_to_anthropic(retry_kwargs["messages"])
return _validate_llm_response(
retry_client.chat.completions.create(**retry_kwargs), task)
# ── Payment / credit exhaustion fallback ──────────────────────
# When the resolved provider returns 402 or a credit-related error,
# try alternative providers instead of giving up. This handles the
@@ -3296,35 +3041,8 @@ async def async_call_llm(
return _validate_llm_response(
await client.chat.completions.create(**kwargs), task)
except Exception as first_err:
if "temperature" in kwargs and _is_unsupported_temperature_error(first_err):
retry_kwargs = dict(kwargs)
retry_kwargs.pop("temperature", None)
logger.info(
"Auxiliary %s (async): provider rejected temperature; retrying once without it",
task or "call",
)
try:
return _validate_llm_response(
await client.chat.completions.create(**retry_kwargs), task)
except Exception as retry_err:
retry_err_str = str(retry_err)
if not (
_is_payment_error(retry_err)
or _is_connection_error(retry_err)
or _is_auth_error(retry_err)
or "max_tokens" in retry_err_str
or "unsupported_parameter" in retry_err_str
):
raise
first_err = retry_err
kwargs = retry_kwargs
err_str = str(first_err)
if max_tokens is not None and (
"max_tokens" in err_str
or "unsupported_parameter" in err_str
or _is_unsupported_parameter_error(first_err, "max_tokens")
):
if "max_tokens" in err_str or "unsupported_parameter" in err_str:
kwargs.pop("max_tokens", None)
kwargs["max_completion_tokens"] = max_tokens
try:
@@ -3359,48 +3077,6 @@ async def async_call_llm(
return _validate_llm_response(
await refreshed_client.chat.completions.create(**kwargs), task)
# ── Auth refresh retry (mirrors sync call_llm) ───────────────
if (_is_auth_error(first_err)
and resolved_provider not in ("auto", "", None)
and not client_is_nous):
if _refresh_provider_credentials(resolved_provider):
logger.info(
"Auxiliary %s (async): refreshed %s credentials after auth error, retrying",
task or "call", resolved_provider,
)
if task == "vision":
_, retry_client, retry_model = resolve_vision_provider_client(
provider=resolved_provider,
model=final_model,
async_mode=True,
)
else:
retry_client, retry_model = _get_cached_client(
resolved_provider,
resolved_model,
async_mode=True,
base_url=resolved_base_url,
api_key=resolved_api_key,
api_mode=resolved_api_mode,
)
if retry_client is not None:
retry_kwargs = _build_call_kwargs(
resolved_provider,
retry_model or final_model,
messages,
temperature=temperature,
max_tokens=max_tokens,
tools=tools,
timeout=effective_timeout,
extra_body=effective_extra_body,
base_url=resolved_base_url,
)
_retry_base = str(getattr(retry_client, "base_url", "") or "")
if _is_anthropic_compat_endpoint(resolved_provider, _retry_base):
retry_kwargs["messages"] = _convert_openai_images_to_anthropic(retry_kwargs["messages"])
return _validate_llm_response(
await retry_client.chat.completions.create(**retry_kwargs), task)
# ── Payment / connection fallback (mirrors sync call_llm) ─────
should_fallback = _is_payment_error(first_err) or _is_connection_error(first_err)
is_auto = resolved_provider in ("auto", "", None)
+2 -130
View File
@@ -87,114 +87,6 @@ def reset_client_cache():
_bedrock_control_client_cache.clear()
def invalidate_runtime_client(region: str) -> bool:
"""Evict the cached ``bedrock-runtime`` client for a single region.
Per-region counterpart to :func:`reset_client_cache`. Used by the converse
call wrappers to discard clients whose underlying HTTP connection has
gone stale, so the next call allocates a fresh client (with a fresh
connection pool) instead of reusing a dead socket.
Returns True if a cached entry was evicted, False if the region was not
cached.
"""
existed = region in _bedrock_runtime_client_cache
_bedrock_runtime_client_cache.pop(region, None)
return existed
# ---------------------------------------------------------------------------
# Stale-connection detection
# ---------------------------------------------------------------------------
#
# boto3 caches its HTTPS connection pool inside the client object. When a
# pooled connection is killed out from under us (NAT timeout, VPN flap,
# server-side TCP RST, proxy idle cull, etc.), the next use surfaces as
# one of a handful of low-level exceptions — most commonly
# ``botocore.exceptions.ConnectionClosedError`` or
# ``urllib3.exceptions.ProtocolError``. urllib3 also trips an internal
# ``assert`` in a couple of paths (connection pool state checks, chunked
# response readers) which bubbles up as a bare ``AssertionError`` with an
# empty ``str(exc)``.
#
# In all of these cases the client is the problem, not the request: retrying
# with the same cached client reproduces the failure until the process
# restarts. The fix is to evict the region's cached client so the next
# attempt builds a new one.
_STALE_LIB_MODULE_PREFIXES = (
"urllib3.",
"botocore.",
"boto3.",
)
def _traceback_frames_modules(exc: BaseException):
"""Yield ``__name__``-style module strings for each frame in exc's traceback."""
tb = getattr(exc, "__traceback__", None)
while tb is not None:
frame = tb.tb_frame
module = frame.f_globals.get("__name__", "")
yield module or ""
tb = tb.tb_next
def is_stale_connection_error(exc: BaseException) -> bool:
"""Return True if ``exc`` indicates a dead/stale Bedrock HTTP connection.
Matches:
* ``botocore.exceptions.ConnectionError`` and subclasses
(``ConnectionClosedError``, ``EndpointConnectionError``,
``ReadTimeoutError``, ``ConnectTimeoutError``).
* ``urllib3.exceptions.ProtocolError`` / ``NewConnectionError`` /
``ConnectionError`` (best-effort import — urllib3 is a transitive
dependency of botocore so it is always available in practice).
* Bare ``AssertionError`` raised from a frame inside urllib3, botocore,
or boto3. These are internal-invariant failures (typically triggered
by corrupted connection-pool state after a dropped socket) and are
recoverable by swapping the client.
Non-library ``AssertionError``s (from application code or tests) are
intentionally not matched — only library-internal asserts signal stale
connection state.
"""
# botocore: the canonical signal — HTTPClientError is the umbrella for
# ConnectionClosedError, ReadTimeoutError, EndpointConnectionError,
# ConnectTimeoutError, and ProxyConnectionError. ConnectionError covers
# the same family via a different branch of the hierarchy.
try:
from botocore.exceptions import (
ConnectionError as BotoConnectionError,
HTTPClientError,
)
botocore_errors: tuple = (BotoConnectionError, HTTPClientError)
except ImportError: # pragma: no cover — botocore always present with boto3
botocore_errors = ()
if botocore_errors and isinstance(exc, botocore_errors):
return True
# urllib3: low-level transport failures
try:
from urllib3.exceptions import (
ProtocolError,
NewConnectionError,
ConnectionError as Urllib3ConnectionError,
)
urllib3_errors = (ProtocolError, NewConnectionError, Urllib3ConnectionError)
except ImportError: # pragma: no cover
urllib3_errors = ()
if urllib3_errors and isinstance(exc, urllib3_errors):
return True
# Library-internal AssertionError (urllib3 / botocore / boto3)
if isinstance(exc, AssertionError):
for module in _traceback_frames_modules(exc):
if any(module.startswith(prefix) for prefix in _STALE_LIB_MODULE_PREFIXES):
return True
return False
# ---------------------------------------------------------------------------
# AWS credential detection
# ---------------------------------------------------------------------------
@@ -895,17 +787,7 @@ def call_converse(
guardrail_config=guardrail_config,
)
try:
response = client.converse(**kwargs)
except Exception as exc:
if is_stale_connection_error(exc):
logger.warning(
"bedrock: stale-connection error on converse(region=%s, model=%s): "
"%s — evicting cached client so the next call reconnects.",
region, model, type(exc).__name__,
)
invalidate_runtime_client(region)
raise
response = client.converse(**kwargs)
return normalize_converse_response(response)
@@ -937,17 +819,7 @@ def call_converse_stream(
guardrail_config=guardrail_config,
)
try:
response = client.converse_stream(**kwargs)
except Exception as exc:
if is_stale_connection_error(exc):
logger.warning(
"bedrock: stale-connection error on converse_stream(region=%s, "
"model=%s): %s — evicting cached client so the next call reconnects.",
region, model, type(exc).__name__,
)
invalidate_runtime_client(region)
raise
response = client.converse_stream(**kwargs)
return normalize_converse_stream_events(response)
+10 -73
View File
@@ -23,52 +23,26 @@ from agent.prompt_builder import DEFAULT_AGENT_IDENTITY
logger = logging.getLogger(__name__)
# Matches Codex/Harmony tool-call serialization that occasionally leaks into
# assistant-message content when the model fails to emit a structured
# ``function_call`` item. Accepts the common forms:
#
# to=functions.exec_command
# assistant to=functions.exec_command
# <|channel|>commentary to=functions.exec_command
#
# ``to=functions.<name>`` is the stable marker — the optional ``assistant`` or
# Harmony channel prefix varies by degeneration mode. Case-insensitive to
# cover lowercase/uppercase ``assistant`` variants.
_TOOL_CALL_LEAK_PATTERN = re.compile(
r"(?:^|[\s>|])to=functions\.[A-Za-z_][\w.]*",
re.IGNORECASE,
)
# ---------------------------------------------------------------------------
# Multimodal content helpers
# ---------------------------------------------------------------------------
def _chat_content_to_responses_parts(content: Any, *, role: str = "user") -> List[Dict[str, Any]]:
def _chat_content_to_responses_parts(content: Any) -> List[Dict[str, Any]]:
"""Convert chat-style multimodal content to Responses API input parts.
Input: ``[{"type":"text"|"image_url", ...}]`` (native OpenAI Chat format)
Output: ``[{"type":"input_text"|"output_text"|"input_image", ...}]`` (Responses format)
The ``role`` parameter controls the text content type:
- ``"user"`` (default) → ``"input_text"``
- ``"assistant"`` → ``"output_text"``
The Responses API rejects ``input_text`` inside assistant messages and
``output_text`` inside user messages, so callers MUST pass the correct
role for the message being converted.
Output: ``[{"type":"input_text"|"input_image", ...}]`` (Responses format)
Returns an empty list when ``content`` is not a list or contains no
recognized parts — callers fall back to the string path.
"""
text_type = "output_text" if role == "assistant" else "input_text"
if not isinstance(content, list):
return []
converted: List[Dict[str, Any]] = []
for part in content:
if isinstance(part, str):
if part:
converted.append({"type": text_type, "text": part})
converted.append({"type": "input_text", "text": part})
continue
if not isinstance(part, dict):
continue
@@ -76,7 +50,7 @@ def _chat_content_to_responses_parts(content: Any, *, role: str = "user") -> Lis
if ptype in {"text", "input_text", "output_text"}:
text = part.get("text")
if isinstance(text, str) and text:
converted.append({"type": text_type, "text": text})
converted.append({"type": "input_text", "text": text})
continue
if ptype in {"image_url", "input_image"}:
image_ref = part.get("image_url")
@@ -242,10 +216,9 @@ def _chat_messages_to_responses_input(messages: List[Dict[str, Any]]) -> List[Di
if role in {"user", "assistant"}:
content = msg.get("content", "")
if isinstance(content, list):
content_parts = _chat_content_to_responses_parts(content, role=role)
text_type = "output_text" if role == "assistant" else "input_text"
content_parts = _chat_content_to_responses_parts(content)
content_text = "".join(
p.get("text", "") for p in content_parts if p.get("type") == text_type
p.get("text", "") for p in content_parts if p.get("type") == "input_text"
)
else:
content_parts = []
@@ -439,16 +412,13 @@ def _preflight_codex_input_items(raw_items: Any) -> List[Dict[str, Any]]:
content = ""
if isinstance(content, list):
# Multimodal content from ``_chat_messages_to_responses_input``
# is already in Responses format (``input_text`` / ``output_text``
# / ``input_image``). Validate each part and pass through.
# Use the correct text type for the role — ``output_text`` for
# assistant messages, ``input_text`` for user messages.
text_type = "output_text" if role == "assistant" else "input_text"
# is already in Responses format (``input_text`` / ``input_image``).
# Validate each part and pass through.
validated: List[Dict[str, Any]] = []
for part_idx, part in enumerate(content):
if isinstance(part, str):
if part:
validated.append({"type": text_type, "text": part})
validated.append({"type": "input_text", "text": part})
continue
if not isinstance(part, dict):
raise ValueError(
@@ -459,7 +429,7 @@ def _preflight_codex_input_items(raw_items: Any) -> List[Dict[str, Any]]:
text = part.get("text", "")
if not isinstance(text, str):
text = str(text or "")
validated.append({"type": text_type, "text": text})
validated.append({"type": "input_text", "text": text})
elif ptype in {"input_image", "image_url"}:
image_ref = part.get("image_url", "")
detail = part.get("detail")
@@ -817,37 +787,6 @@ def _normalize_codex_response(response: Any) -> tuple[Any, str]:
if isinstance(out_text, str):
final_text = out_text.strip()
# ── Tool-call leak recovery ──────────────────────────────────
# gpt-5.x on the Codex Responses API sometimes degenerates and emits
# what should be a structured `function_call` item as plain assistant
# text using the Harmony/Codex serialization (``to=functions.foo
# {json}`` or ``assistant to=functions.foo {json}``). The model
# intended to call a tool, but the intent never made it into
# ``response.output`` as a ``function_call`` item, so ``tool_calls``
# is empty here. If we pass this through, the parent sees a
# confident-looking summary with no audit trail (empty ``tool_trace``)
# and no tools actually ran — the Taiwan-embassy-email incident.
#
# Detection: leaked tokens always contain ``to=functions.<name>`` and
# the assistant message has no real tool calls. Treat it as incomplete
# so the existing Codex-incomplete continuation path (3 retries,
# handled in run_agent.py) gets a chance to re-elicit a proper
# ``function_call`` item. The existing loop already handles message
# append, dedup, and retry budget.
leaked_tool_call_text = False
if final_text and not tool_calls and _TOOL_CALL_LEAK_PATTERN.search(final_text):
leaked_tool_call_text = True
logger.warning(
"Codex response contains leaked tool-call text in assistant content "
"(no structured function_call items). Treating as incomplete so the "
"continuation path can re-elicit a proper tool call. Leaked snippet: %r",
final_text[:300],
)
# Clear the text so downstream code doesn't surface the garbage as
# a summary. The encrypted reasoning items (if any) are preserved
# so the model keeps its chain-of-thought on the retry.
final_text = ""
assistant_message = SimpleNamespace(
content=final_text,
tool_calls=tool_calls,
@@ -859,8 +798,6 @@ def _normalize_codex_response(response: Any) -> tuple[Any, str]:
if tool_calls:
finish_reason = "tool_calls"
elif leaked_tool_call_text:
finish_reason = "incomplete"
elif has_incomplete_items or (saw_commentary_phase and not saw_final_answer_phase):
finish_reason = "incomplete"
elif reasoning_items_raw and not final_text:
-30
View File
@@ -294,7 +294,6 @@ class ContextCompressor(ContextEngine):
self._context_probed = False
self._context_probe_persistable = False
self._previous_summary = None
self._last_summary_error = None
self._last_compression_savings_pct = 100.0
self._ineffective_compression_count = 0
@@ -318,13 +317,6 @@ class ContextCompressor(ContextEngine):
int(context_length * self.threshold_percent),
MINIMUM_CONTEXT_LENGTH,
)
# Recalculate token budgets for the new context length so the
# compressor stays calibrated after a model switch (e.g. 200K → 32K).
target_tokens = int(self.threshold_tokens * self.summary_target_ratio)
self.tail_token_budget = target_tokens
self.max_summary_tokens = min(
int(context_length * 0.05), _SUMMARY_TOKENS_CEILING,
)
def __init__(
self,
@@ -397,7 +389,6 @@ class ContextCompressor(ContextEngine):
self._last_compression_savings_pct: float = 100.0
self._ineffective_compression_count: int = 0
self._summary_failure_cooldown_until: float = 0.0
self._last_summary_error: Optional[str] = None
def update_from_response(self, usage: Dict[str, Any]):
"""Update tracked token usage from API response."""
@@ -821,12 +812,10 @@ The user has requested that this compaction PRIORITISE preserving all informatio
self._previous_summary = summary
self._summary_failure_cooldown_until = 0.0
self._summary_model_fallen_back = False
self._last_summary_error = None
return self._with_summary_prefix(summary)
except RuntimeError:
# No provider configured — long cooldown, unlikely to self-resolve
self._summary_failure_cooldown_until = time.monotonic() + _SUMMARY_FAILURE_COOLDOWN_SECONDS
self._last_summary_error = "no auxiliary LLM provider configured"
logging.warning("Context compression: no provider available for "
"summary. Middle turns will be dropped without summary "
"for %d seconds.",
@@ -864,10 +853,6 @@ The user has requested that this compaction PRIORITISE preserving all informatio
# Transient errors (timeout, rate limit, network) — shorter cooldown
_transient_cooldown = 60
self._summary_failure_cooldown_until = time.monotonic() + _transient_cooldown
err_text = str(e).strip() or e.__class__.__name__
if len(err_text) > 220:
err_text = err_text[:217].rstrip() + "..."
self._last_summary_error = err_text
logging.warning(
"Failed to generate context summary: %s. "
"Further summary attempts paused for %d seconds.",
@@ -1114,21 +1099,6 @@ The user has requested that this compaction PRIORITISE preserving all informatio
return max(cut_idx, head_end + 1)
# ------------------------------------------------------------------
# ContextEngine: manual /compress preflight
# ------------------------------------------------------------------
def has_content_to_compress(self, messages: List[Dict[str, Any]]) -> bool:
"""Return True if there is a non-empty middle region to compact.
Overrides the ABC default so the gateway ``/compress`` guard can
skip the LLM call when the transcript is still entirely inside
the protected head/tail.
"""
compress_start = self._align_boundary_forward(messages, self.protect_first_n)
compress_end = self._find_tail_cut_by_tokens(messages, compress_start)
return compress_start < compress_end
# ------------------------------------------------------------------
# Main compression entry point
# ------------------------------------------------------------------
-22
View File
@@ -78,7 +78,6 @@ class ContextEngine(ABC):
self,
messages: List[Dict[str, Any]],
current_tokens: int = None,
focus_topic: str = None,
) -> List[Dict[str, Any]]:
"""Compact the message list and return the new message list.
@@ -87,12 +86,6 @@ class ContextEngine(ABC):
context budget. The implementation is free to summarize, build a
DAG, or do anything else — as long as the returned list is a valid
OpenAI-format message sequence.
Args:
focus_topic: Optional topic string from manual ``/compress <focus>``.
Engines that support guided compression should prioritise
preserving information related to this topic. Engines that
don't support it may simply ignore this argument.
"""
# -- Optional: pre-flight check ----------------------------------------
@@ -105,21 +98,6 @@ class ContextEngine(ABC):
"""
return False
# -- Optional: manual /compress preflight ------------------------------
def has_content_to_compress(self, messages: List[Dict[str, Any]]) -> bool:
"""Quick check: is there anything in ``messages`` that can be compacted?
Used by the gateway ``/compress`` command as a preflight guard —
returning False lets the gateway report "nothing to compress yet"
without making an LLM call.
Default returns True (always attempt). Engines with a cheap way
to introspect their own head/tail boundaries should override this
to return False when the transcript is still entirely protected.
"""
return True
# -- Optional: session lifecycle ---------------------------------------
def on_session_start(self, session_id: str, **kwargs) -> None:
-42
View File
@@ -46,47 +46,6 @@ def _resolve_args() -> list[str]:
return shlex.split(raw)
def _resolve_home_dir() -> str:
"""Return a stable HOME for child ACP processes."""
try:
from hermes_constants import get_subprocess_home
profile_home = get_subprocess_home()
if profile_home:
return profile_home
except Exception:
pass
home = os.environ.get("HOME", "").strip()
if home:
return home
expanded = os.path.expanduser("~")
if expanded and expanded != "~":
return expanded
try:
import pwd
resolved = pwd.getpwuid(os.getuid()).pw_dir.strip()
if resolved:
return resolved
except Exception:
pass
# Last resort: /tmp (writable on any POSIX system). Avoids crashing the
# subprocess with no HOME; callers can set HERMES_HOME explicitly if they
# need a different writable dir.
return "/tmp"
def _build_subprocess_env() -> dict[str, str]:
env = os.environ.copy()
env["HOME"] = _resolve_home_dir()
return env
def _jsonrpc_error(message_id: Any, code: int, message: str) -> dict[str, Any]:
return {
"jsonrpc": "2.0",
@@ -423,7 +382,6 @@ class CopilotACPClient:
text=True,
bufsize=1,
cwd=self._acp_cwd,
env=_build_subprocess_env(),
)
except FileNotFoundError as exc:
raise RuntimeError(
+3 -108
View File
@@ -455,61 +455,6 @@ class CredentialPool:
logger.debug("Failed to sync from credentials file: %s", exc)
return entry
def _sync_nous_entry_from_auth_store(self, entry: PooledCredential) -> PooledCredential:
"""Sync a Nous pool entry from auth.json if tokens differ.
Nous OAuth refresh tokens are single-use. When another process
(e.g. a concurrent cron) refreshes the token via
``resolve_nous_runtime_credentials``, it writes fresh tokens to
auth.json under ``_auth_store_lock``. The pool entry's tokens
become stale. This method detects that and adopts the newer pair,
avoiding a "refresh token reuse" revocation on the Nous Portal.
"""
if self.provider != "nous" or entry.source != "device_code":
return entry
try:
with _auth_store_lock():
auth_store = _load_auth_store()
state = _load_provider_state(auth_store, "nous")
if not state:
return entry
store_refresh = state.get("refresh_token", "")
store_access = state.get("access_token", "")
if store_refresh and store_refresh != entry.refresh_token:
logger.debug(
"Pool entry %s: syncing tokens from auth.json (Nous refresh token changed)",
entry.id,
)
field_updates: Dict[str, Any] = {
"access_token": store_access,
"refresh_token": store_refresh,
"last_status": None,
"last_status_at": None,
"last_error_code": None,
}
if state.get("expires_at"):
field_updates["expires_at"] = state["expires_at"]
if state.get("agent_key"):
field_updates["agent_key"] = state["agent_key"]
if state.get("agent_key_expires_at"):
field_updates["agent_key_expires_at"] = state["agent_key_expires_at"]
if state.get("inference_base_url"):
field_updates["inference_base_url"] = state["inference_base_url"]
extra_updates = dict(entry.extra)
for extra_key in ("obtained_at", "expires_in", "agent_key_id",
"agent_key_expires_in", "agent_key_reused",
"agent_key_obtained_at"):
val = state.get(extra_key)
if val is not None:
extra_updates[extra_key] = val
updated = replace(entry, extra=extra_updates, **field_updates)
self._replace_entry(entry, updated)
self._persist()
return updated
except Exception as exc:
logger.debug("Failed to sync Nous entry from auth.json: %s", exc)
return entry
def _sync_device_code_entry_to_auth_store(self, entry: PooledCredential) -> None:
"""Write refreshed pool entry tokens back to auth.json providers.
@@ -616,9 +561,6 @@ class CredentialPool:
last_refresh=refreshed.get("last_refresh"),
)
elif self.provider == "nous":
synced = self._sync_nous_entry_from_auth_store(entry)
if synced is not entry:
entry = synced
nous_state = {
"access_token": entry.access_token,
"refresh_token": entry.refresh_token,
@@ -693,26 +635,6 @@ class CredentialPool:
# Credentials file had a valid (non-expired) token — use it directly
logger.debug("Credentials file has valid token, using without refresh")
return synced
# For nous: another process may have consumed the refresh token
# between our proactive sync and the HTTP call. Re-sync from
# auth.json and adopt the fresh tokens if available.
if self.provider == "nous":
synced = self._sync_nous_entry_from_auth_store(entry)
if synced.refresh_token != entry.refresh_token:
logger.debug("Nous refresh failed but auth.json has newer tokens — adopting")
updated = replace(
synced,
last_status=STATUS_OK,
last_status_at=None,
last_error_code=None,
last_error_reason=None,
last_error_message=None,
last_error_reset_at=None,
)
self._replace_entry(synced, updated)
self._persist()
self._sync_device_code_entry_to_auth_store(updated)
return updated
self._mark_exhausted(entry, None)
return None
@@ -776,17 +698,6 @@ class CredentialPool:
if synced is not entry:
entry = synced
cleared_any = True
# For nous entries, sync from auth.json before status checks.
# Another process may have successfully refreshed via
# resolve_nous_runtime_credentials(), making this entry's
# exhausted status stale.
if (self.provider == "nous"
and entry.source == "device_code"
and entry.last_status == STATUS_EXHAUSTED):
synced = self._sync_nous_entry_from_auth_store(entry)
if synced is not entry:
entry = synced
cleared_any = True
if entry.last_status == STATUS_EXHAUSTED:
exhausted_until = _exhausted_until(entry)
if exhausted_until is not None and now < exhausted_until:
@@ -828,11 +739,8 @@ class CredentialPool:
if self._strategy == STRATEGY_LEAST_USED and len(available) > 1:
entry = min(available, key=lambda e: e.request_count)
# Increment usage counter so subsequent selections distribute load
updated = replace(entry, request_count=entry.request_count + 1)
self._replace_entry(entry, updated)
self._current_id = entry.id
return updated
return entry
if self._strategy == STRATEGY_ROUND_ROBIN and len(available) > 1:
entry = available[0]
@@ -1148,18 +1056,6 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup
"inference_base_url": state.get("inference_base_url"),
"agent_key": state.get("agent_key"),
"agent_key_expires_at": state.get("agent_key_expires_at"),
# Carry the mint/refresh timestamps into the pool so
# freshness-sensitive consumers (self-heal hooks, pool
# pruning by age) can distinguish just-minted credentials
# from stale ones. Without these, fresh device_code
# entries get obtained_at=None and look older than they
# are (#15099).
"obtained_at": state.get("obtained_at"),
"expires_in": state.get("expires_in"),
"agent_key_id": state.get("agent_key_id"),
"agent_key_expires_in": state.get("agent_key_expires_in"),
"agent_key_reused": state.get("agent_key_reused"),
"agent_key_obtained_at": state.get("agent_key_obtained_at"),
"tls": state.get("tls") if isinstance(state.get("tls"), dict) else None,
"label": seeded_label,
},
@@ -1170,10 +1066,9 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup
# env vars (COPILOT_GITHUB_TOKEN / GH_TOKEN). They don't live in
# the auth store or credential pool, so we resolve them here.
try:
from hermes_cli.copilot_auth import resolve_copilot_token, get_copilot_api_token
from hermes_cli.copilot_auth import resolve_copilot_token
token, source = resolve_copilot_token()
if token:
api_token = get_copilot_api_token(token)
source_name = "gh_cli" if "gh" in source.lower() else f"env:{source}"
if not _is_suppressed(provider, source_name):
active_sources.add(source_name)
@@ -1185,7 +1080,7 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup
{
"source": source_name,
"auth_type": AUTH_TYPE_API_KEY,
"access_token": api_token,
"access_token": token,
"base_url": pconfig.inference_base_url if pconfig else "",
"label": source,
},
-55
View File
@@ -45,7 +45,6 @@ class FailoverReason(enum.Enum):
# Model
model_not_found = "model_not_found" # 404 or invalid model — fallback to different model
provider_policy_blocked = "provider_policy_blocked" # Aggregator (e.g. OpenRouter) blocked the only endpoint due to account data/privacy policy
# Request format
format_error = "format_error" # 400 bad request — abort or strip + retry
@@ -195,29 +194,6 @@ _MODEL_NOT_FOUND_PATTERNS = [
"unsupported model",
]
# OpenRouter aggregator policy-block patterns.
#
# When a user's OpenRouter account privacy setting (or a per-request
# `provider.data_collection: deny` preference) excludes the only endpoint
# serving a model, OpenRouter returns 404 with a *specific* message that is
# distinct from "model not found":
#
# "No endpoints available matching your guardrail restrictions and
# data policy. Configure: https://openrouter.ai/settings/privacy"
#
# We classify this as `provider_policy_blocked` rather than
# `model_not_found` because:
# - The model *exists* — model_not_found is misleading in logs
# - Provider fallback won't help: the account-level setting applies to
# every call on the same OpenRouter account
# - The error body already contains the fix URL, so the user gets
# actionable guidance without us rewriting the message
_PROVIDER_POLICY_BLOCKED_PATTERNS = [
"no endpoints available matching your guardrail",
"no endpoints available matching your data policy",
"no endpoints found matching your data policy",
]
# Auth patterns (non-status-code signals)
_AUTH_PATTERNS = [
"invalid api key",
@@ -343,11 +319,6 @@ def classify_api_error(
"""
status_code = _extract_status_code(error)
error_type = type(error).__name__
# Copilot/GitHub Models RateLimitError may not set .status_code; force 429
# so downstream rate-limit handling (classifier reason, pool rotation,
# fallback gating) fires correctly instead of misclassifying as generic.
if status_code is None and error_type == "RateLimitError":
status_code = 429
body = _extract_error_body(error)
error_code = _extract_error_code(body)
@@ -552,17 +523,6 @@ def _classify_by_status(
return _classify_402(error_msg, result_fn)
if status_code == 404:
# OpenRouter policy-block 404 — distinct from "model not found".
# The model exists; the user's account privacy setting excludes the
# only endpoint serving it. Falling back to another provider won't
# help (same account setting applies). The error body already
# contains the fix URL, so just surface it.
if any(p in error_msg for p in _PROVIDER_POLICY_BLOCKED_PATTERNS):
return result_fn(
FailoverReason.provider_policy_blocked,
retryable=False,
should_fallback=False,
)
if any(p in error_msg for p in _MODEL_NOT_FOUND_PATTERNS):
return result_fn(
FailoverReason.model_not_found,
@@ -680,12 +640,6 @@ def _classify_400(
)
# Some providers return model-not-found as 400 instead of 404 (e.g. OpenRouter).
if any(p in error_msg for p in _PROVIDER_POLICY_BLOCKED_PATTERNS):
return result_fn(
FailoverReason.provider_policy_blocked,
retryable=False,
should_fallback=False,
)
if any(p in error_msg for p in _MODEL_NOT_FOUND_PATTERNS):
return result_fn(
FailoverReason.model_not_found,
@@ -858,15 +812,6 @@ def _classify_by_message(
should_fallback=True,
)
# Provider policy-block (aggregator-side guardrail) — check before
# model_not_found so we don't mis-label as a missing model.
if any(p in error_msg for p in _PROVIDER_POLICY_BLOCKED_PATTERNS):
return result_fn(
FailoverReason.provider_policy_blocked,
retryable=False,
should_fallback=False,
)
# Model not found patterns
if any(p in error_msg for p in _MODEL_NOT_FOUND_PATTERNS):
return result_fn(
-104
View File
@@ -44,97 +44,6 @@ def is_native_gemini_base_url(base_url: str) -> bool:
return not normalized.endswith("/openai")
def probe_gemini_tier(
api_key: str,
base_url: str = DEFAULT_GEMINI_BASE_URL,
*,
model: str = "gemini-2.5-flash",
timeout: float = 10.0,
) -> str:
"""Probe a Google AI Studio API key and return its tier.
Returns one of:
- ``"free"`` -- key is on the free tier (unusable with Hermes)
- ``"paid"`` -- key is on a paid tier
- ``"unknown"`` -- probe failed; callers should proceed without blocking.
"""
key = (api_key or "").strip()
if not key:
return "unknown"
normalized_base = str(base_url or DEFAULT_GEMINI_BASE_URL).strip().rstrip("/")
if not normalized_base:
normalized_base = DEFAULT_GEMINI_BASE_URL
if normalized_base.lower().endswith("/openai"):
normalized_base = normalized_base[: -len("/openai")]
url = f"{normalized_base}/models/{model}:generateContent"
payload = {
"contents": [{"role": "user", "parts": [{"text": "hi"}]}],
"generationConfig": {"maxOutputTokens": 1},
}
try:
with httpx.Client(timeout=timeout) as client:
resp = client.post(
url,
params={"key": key},
json=payload,
headers={"Content-Type": "application/json"},
)
except Exception as exc:
logger.debug("probe_gemini_tier: network error: %s", exc)
return "unknown"
headers_lower = {k.lower(): v for k, v in resp.headers.items()}
rpd_header = headers_lower.get("x-ratelimit-limit-requests-per-day")
if rpd_header:
try:
rpd_val = int(rpd_header)
except (TypeError, ValueError):
rpd_val = None
# Published free-tier daily caps (Dec 2025):
# gemini-2.5-pro: 100, gemini-2.5-flash: 250, flash-lite: 1000
# Tier 1 starts at ~1500+ for Flash. We treat <= 1000 as free.
if rpd_val is not None and rpd_val <= 1000:
return "free"
if rpd_val is not None and rpd_val > 1000:
return "paid"
if resp.status_code == 429:
body_text = ""
try:
body_text = resp.text or ""
except Exception:
body_text = ""
if "free_tier" in body_text.lower():
return "free"
return "paid"
if 200 <= resp.status_code < 300:
return "paid"
return "unknown"
def is_free_tier_quota_error(error_message: str) -> bool:
"""Return True when a Gemini 429 message indicates free-tier exhaustion."""
if not error_message:
return False
return "free_tier" in error_message.lower()
_FREE_TIER_GUIDANCE = (
"\n\nYour Google API key is on the free tier (<= 250 requests/day for "
"gemini-2.5-flash). Hermes typically makes 3-10 API calls per user turn, "
"so the free tier is exhausted in a handful of messages and cannot sustain "
"an agent session. Enable billing on your Google Cloud project and "
"regenerate the key in a billing-enabled project: "
"https://aistudio.google.com/apikey"
)
class GeminiAPIError(Exception):
"""Error shape compatible with Hermes retry/error classification."""
@@ -741,12 +650,6 @@ def gemini_http_error(response: httpx.Response) -> GeminiAPIError:
else:
message = f"Gemini returned HTTP {status}: {body_text[:500]}"
# Free-tier quota exhaustion -> append actionable guidance so users who
# bypassed the setup wizard (direct GOOGLE_API_KEY in .env) still learn
# that the free tier cannot sustain an agent session.
if status == 429 and is_free_tier_quota_error(err_message or body_text):
message = message + _FREE_TIER_GUIDANCE
return GeminiAPIError(
message,
code=code,
@@ -801,13 +704,6 @@ class GeminiNativeClient:
http_client: Optional[httpx.Client] = None,
**_: Any,
) -> None:
if not (api_key or "").strip():
raise RuntimeError(
"Gemini native client requires an API key, but none was provided. "
"Set GOOGLE_API_KEY or GEMINI_API_KEY in your environment / ~/.hermes/.env "
"(get one at https://aistudio.google.com/app/apikey), or run `hermes setup` "
"to configure the Google provider."
)
self.api_key = api_key
normalized_base = (base_url or DEFAULT_GEMINI_BASE_URL).rstrip("/")
if normalized_base.endswith("/openai"):
-14
View File
@@ -73,20 +73,6 @@ def sanitize_gemini_schema(schema: Any) -> Dict[str, Any]:
]
continue
cleaned[key] = value
# Gemini's Schema validator requires every ``enum`` entry to be a string,
# even when the parent ``type`` is ``integer`` / ``number`` / ``boolean``.
# OpenAI / OpenRouter / Anthropic accept typed enums (e.g. Discord's
# ``auto_archive_duration: {type: integer, enum: [60, 1440, 4320, 10080]}``),
# so we only drop the ``enum`` when it would collide with Gemini's rule.
# Keeping ``type: integer`` plus the human-readable description gives the
# model enough guidance; the tool handler still validates the value.
enum_val = cleaned.get("enum")
type_val = cleaned.get("type")
if isinstance(enum_val, list) and type_val in {"integer", "number", "boolean"}:
if any(not isinstance(item, str) for item in enum_val):
cleaned.pop("enum", None)
return cleaned
+2 -43
View File
@@ -31,7 +31,6 @@ from __future__ import annotations
import json
import logging
import re
import inspect
from typing import Any, Dict, List, Optional
from agent.memory_provider import MemoryProvider
@@ -313,39 +312,7 @@ class MemoryManager:
)
return "\n\n".join(parts)
@staticmethod
def _provider_memory_write_metadata_mode(provider: MemoryProvider) -> str:
"""Return how to pass metadata to a provider's memory-write hook."""
try:
signature = inspect.signature(provider.on_memory_write)
except (TypeError, ValueError):
return "keyword"
params = list(signature.parameters.values())
if any(p.kind == inspect.Parameter.VAR_KEYWORD for p in params):
return "keyword"
if "metadata" in signature.parameters:
return "keyword"
accepted = [
p for p in params
if p.kind in (
inspect.Parameter.POSITIONAL_ONLY,
inspect.Parameter.POSITIONAL_OR_KEYWORD,
inspect.Parameter.KEYWORD_ONLY,
)
]
if len(accepted) >= 4:
return "positional"
return "legacy"
def on_memory_write(
self,
action: str,
target: str,
content: str,
metadata: Optional[Dict[str, Any]] = None,
) -> None:
def on_memory_write(self, action: str, target: str, content: str) -> None:
"""Notify external providers when the built-in memory tool writes.
Skips the builtin provider itself (it's the source of the write).
@@ -354,15 +321,7 @@ class MemoryManager:
if provider.name == "builtin":
continue
try:
metadata_mode = self._provider_memory_write_metadata_mode(provider)
if metadata_mode == "keyword":
provider.on_memory_write(
action, target, content, metadata=dict(metadata or {})
)
elif metadata_mode == "positional":
provider.on_memory_write(action, target, content, dict(metadata or {}))
else:
provider.on_memory_write(action, target, content)
provider.on_memory_write(action, target, content)
except Exception as e:
logger.debug(
"Memory provider '%s' on_memory_write failed: %s",
+3 -12
View File
@@ -26,7 +26,7 @@ Optional hooks (override to opt in):
on_turn_start(turn, message, **kwargs) per-turn tick with runtime context
on_session_end(messages) end-of-session extraction
on_pre_compress(messages) -> str extract before context compression
on_memory_write(action, target, content, metadata=None) mirror built-in memory writes
on_memory_write(action, target, content) mirror built-in memory writes
on_delegation(task, result, **kwargs) parent-side observation of subagent work
"""
@@ -34,7 +34,7 @@ from __future__ import annotations
import logging
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional
from typing import Any, Dict, List
logger = logging.getLogger(__name__)
@@ -220,21 +220,12 @@ class MemoryProvider(ABC):
should all have ``env_var`` set and this method stays no-op).
"""
def on_memory_write(
self,
action: str,
target: str,
content: str,
metadata: Optional[Dict[str, Any]] = None,
) -> None:
def on_memory_write(self, action: str, target: str, content: str) -> None:
"""Called when the built-in memory tool writes an entry.
action: 'add', 'replace', or 'remove'
target: 'memory' or 'user'
content: the entry content
metadata: structured provenance for the write, when available. Common
keys include ``write_origin``, ``execution_context``, ``session_id``,
``parent_session_id``, ``platform``, and ``tool_name``.
Use to mirror built-in memory writes to your backend.
"""
+21 -215
View File
@@ -6,7 +6,6 @@ and run_agent.py for pre-flight context checks.
import ipaddress
import logging
import os
import re
import time
from pathlib import Path
@@ -22,25 +21,6 @@ from hermes_constants import OPENROUTER_MODELS_URL
logger = logging.getLogger(__name__)
def _resolve_requests_verify() -> bool | str:
"""Resolve SSL verify setting for `requests` calls from env vars.
The `requests` library only honours REQUESTS_CA_BUNDLE / CURL_CA_BUNDLE
by default. Hermes also honours HERMES_CA_BUNDLE (its own convention)
and SSL_CERT_FILE (used by the stdlib `ssl` module and by httpx), so
that a single env var can cover both `requests` and `httpx` callsites
inside the same process.
Returns either a filesystem path to a CA bundle, or True to defer to
the requests default (certifi).
"""
for env_var in ("HERMES_CA_BUNDLE", "REQUESTS_CA_BUNDLE", "SSL_CERT_FILE"):
val = os.getenv(env_var)
if val and os.path.isfile(val):
return val
return True
# Provider names that can appear as a "provider:" prefix before a model ID.
# Only these are stripped — Ollama-style "model:tag" colons (e.g. "qwen3.5:27b")
# are preserved so the full model name reaches cache lookups and server queries.
@@ -143,9 +123,8 @@ DEFAULT_CONTEXT_LENGTHS = {
"claude": 200000,
# OpenAI — GPT-5 family (most have 400k; specific overrides first)
# Source: https://developers.openai.com/api/docs/models
# GPT-5.5 (launched Apr 23 2026). 400k is the fallback for providers we
# can't probe live. ChatGPT Codex OAuth actually caps lower (272k as of
# Apr 2026) and is resolved via _resolve_codex_oauth_context_length().
# GPT-5.5 (launched Apr 23 2026). Verified via live ChatGPT codex/models
# endpoint: bare slug `gpt-5.5`, no -pro/-mini variants. 400k context on Codex.
"gpt-5.5": 400000,
"gpt-5.4-nano": 400000, # 400k (not 1.05M like full 5.4)
"gpt-5.4-mini": 400000, # 400k (not 1.05M like full 5.4)
@@ -515,7 +494,7 @@ def fetch_model_metadata(force_refresh: bool = False) -> Dict[str, Dict[str, Any
return _model_metadata_cache
try:
response = requests.get(OPENROUTER_MODELS_URL, timeout=10, verify=_resolve_requests_verify())
response = requests.get(OPENROUTER_MODELS_URL, timeout=10)
response.raise_for_status()
data = response.json()
@@ -582,7 +561,6 @@ def fetch_endpoint_model_metadata(
server_url.rstrip("/") + "/api/v1/models",
headers=headers,
timeout=10,
verify=_resolve_requests_verify(),
)
response.raise_for_status()
payload = response.json()
@@ -631,7 +609,7 @@ def fetch_endpoint_model_metadata(
for candidate in candidates:
url = candidate.rstrip("/") + "/models"
try:
response = requests.get(url, headers=headers, timeout=10, verify=_resolve_requests_verify())
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
payload = response.json()
cache: Dict[str, Dict[str, Any]] = {}
@@ -662,10 +640,9 @@ def fetch_endpoint_model_metadata(
try:
# Try /v1/props first (current llama.cpp); fall back to /props for older builds
base = candidate.rstrip("/").replace("/v1", "")
_verify = _resolve_requests_verify()
props_resp = requests.get(base + "/v1/props", headers=headers, timeout=5, verify=_verify)
props_resp = requests.get(base + "/v1/props", headers=headers, timeout=5)
if not props_resp.ok:
props_resp = requests.get(base + "/props", headers=headers, timeout=5, verify=_verify)
props_resp = requests.get(base + "/props", headers=headers, timeout=5)
if props_resp.ok:
props = props_resp.json()
gen_settings = props.get("default_generation_settings", {})
@@ -737,22 +714,6 @@ def get_cached_context_length(model: str, base_url: str) -> Optional[int]:
return cache.get(key)
def _invalidate_cached_context_length(model: str, base_url: str) -> None:
"""Drop a stale cache entry so it gets re-resolved on the next lookup."""
key = f"{model}@{base_url}"
cache = _load_context_cache()
if key not in cache:
return
del cache[key]
path = _get_context_cache_path()
try:
path.parent.mkdir(parents=True, exist_ok=True)
with open(path, "w") as f:
yaml.dump({"context_lengths": cache}, f, default_flow_style=False)
except Exception as e:
logger.debug("Failed to invalidate context length cache entry %s: %s", key, e)
def get_next_probe_tier(current_length: int) -> Optional[int]:
"""Return the next lower probe tier, or None if already at minimum."""
for tier in CONTEXT_PROBE_TIERS:
@@ -1030,7 +991,7 @@ def _query_anthropic_context_length(model: str, base_url: str, api_key: str) ->
"x-api-key": api_key,
"anthropic-version": "2023-06-01",
}
resp = requests.get(url, headers=headers, timeout=10, verify=_resolve_requests_verify())
resp = requests.get(url, headers=headers, timeout=10)
if resp.status_code != 200:
return None
data = resp.json()
@@ -1044,116 +1005,6 @@ def _query_anthropic_context_length(model: str, base_url: str, api_key: str) ->
return None
# Known ChatGPT Codex OAuth context windows (observed via live
# chatgpt.com/backend-api/codex/models probe, Apr 2026). These are the
# `context_window` values, which are what Codex actually enforces — the
# direct OpenAI API has larger limits for the same slugs, but Codex OAuth
# caps lower (e.g. gpt-5.5 is 1.05M on the API, 272K on Codex).
#
# Used as a fallback when the live probe fails (no token, network error).
# Longest keys first so substring match picks the most specific entry.
_CODEX_OAUTH_CONTEXT_FALLBACK: Dict[str, int] = {
"gpt-5.1-codex-max": 272_000,
"gpt-5.1-codex-mini": 272_000,
"gpt-5.3-codex": 272_000,
"gpt-5.2-codex": 272_000,
"gpt-5.4-mini": 272_000,
"gpt-5.5": 272_000,
"gpt-5.4": 272_000,
"gpt-5.2": 272_000,
"gpt-5": 272_000,
}
_codex_oauth_context_cache: Dict[str, int] = {}
_codex_oauth_context_cache_time: float = 0.0
_CODEX_OAUTH_CONTEXT_CACHE_TTL = 3600 # 1 hour
def _fetch_codex_oauth_context_lengths(access_token: str) -> Dict[str, int]:
"""Probe the ChatGPT Codex /models endpoint for per-slug context windows.
Codex OAuth imposes its own context limits that differ from the direct
OpenAI API (e.g. gpt-5.5 is 1.05M on the API, 272K on Codex). The
`context_window` field in each model entry is the authoritative source.
Returns a ``{slug: context_window}`` dict. Empty on failure.
"""
global _codex_oauth_context_cache, _codex_oauth_context_cache_time
now = time.time()
if (
_codex_oauth_context_cache
and now - _codex_oauth_context_cache_time < _CODEX_OAUTH_CONTEXT_CACHE_TTL
):
return _codex_oauth_context_cache
try:
resp = requests.get(
"https://chatgpt.com/backend-api/codex/models?client_version=1.0.0",
headers={"Authorization": f"Bearer {access_token}"},
timeout=10,
verify=_resolve_requests_verify(),
)
if resp.status_code != 200:
logger.debug(
"Codex /models probe returned HTTP %s; falling back to hardcoded defaults",
resp.status_code,
)
return {}
data = resp.json()
except Exception as exc:
logger.debug("Codex /models probe failed: %s", exc)
return {}
entries = data.get("models", []) if isinstance(data, dict) else []
result: Dict[str, int] = {}
for item in entries:
if not isinstance(item, dict):
continue
slug = item.get("slug")
ctx = item.get("context_window")
if isinstance(slug, str) and isinstance(ctx, int) and ctx > 0:
result[slug.strip()] = ctx
if result:
_codex_oauth_context_cache = result
_codex_oauth_context_cache_time = now
return result
def _resolve_codex_oauth_context_length(
model: str, access_token: str = ""
) -> Optional[int]:
"""Resolve a Codex OAuth model's real context window.
Prefers a live probe of chatgpt.com/backend-api/codex/models (when we
have a bearer token), then falls back to ``_CODEX_OAUTH_CONTEXT_FALLBACK``.
"""
model_bare = _strip_provider_prefix(model).strip()
if not model_bare:
return None
if access_token:
live = _fetch_codex_oauth_context_lengths(access_token)
if model_bare in live:
return live[model_bare]
# Case-insensitive match in case casing drifts
model_lower = model_bare.lower()
for slug, ctx in live.items():
if slug.lower() == model_lower:
return ctx
# Fallback: longest-key-first substring match over hardcoded defaults.
model_lower = model_bare.lower()
for slug, ctx in sorted(
_CODEX_OAUTH_CONTEXT_FALLBACK.items(), key=lambda x: len(x[0]), reverse=True
):
if slug in model_lower:
return ctx
return None
def _resolve_nous_context_length(model: str) -> Optional[int]:
"""Resolve Nous Portal model context length via OpenRouter metadata.
@@ -1199,7 +1050,6 @@ def get_model_context_length(
Resolution order:
0. Explicit config override (model.context_length or custom_providers per-model)
1. Persistent cache (previously discovered via probing)
1b. AWS Bedrock static table (must precede custom-endpoint probe)
2. Active endpoint metadata (/models for explicit custom endpoints)
3. Local server query (for local endpoints)
4. Anthropic /v1/models API (API-key users only, not OAuth)
@@ -1222,41 +1072,7 @@ def get_model_context_length(
if base_url:
cached = get_cached_context_length(model, base_url)
if cached is not None:
# Invalidate stale Codex OAuth cache entries: pre-PR #14935 builds
# resolved gpt-5.x to the direct-API value (e.g. 1.05M) via
# models.dev and persisted it. Codex OAuth caps at 272K for every
# slug, so any cached Codex entry at or above 400K is a leftover
# from the old resolution path. Drop it and fall through to the
# live /models probe in step 5 below.
if provider == "openai-codex" and cached >= 400_000:
logger.info(
"Dropping stale Codex cache entry %s@%s -> %s (pre-fix value); "
"re-resolving via live /models probe",
model, base_url, f"{cached:,}",
)
_invalidate_cached_context_length(model, base_url)
else:
return cached
# 1b. AWS Bedrock — use static context length table.
# Bedrock's ListFoundationModels API doesn't expose context window sizes,
# so we maintain a curated table in bedrock_adapter.py that reflects
# AWS-imposed limits (e.g. 200K for Claude models vs 1M on the native
# Anthropic API). This must run BEFORE the custom-endpoint probe at
# step 2 — bedrock-runtime.<region>.amazonaws.com is not in
# _URL_TO_PROVIDER, so it would otherwise be treated as a custom endpoint,
# fail the /models probe (Bedrock doesn't expose that shape), and fall
# back to the 128K default before reaching the original step 4b branch.
if provider == "bedrock" or (
base_url
and base_url_hostname(base_url).startswith("bedrock-runtime.")
and base_url_host_matches(base_url, "amazonaws.com")
):
try:
from agent.bedrock_adapter import get_bedrock_context_length
return get_bedrock_context_length(model)
except ImportError:
pass # boto3 not installed — fall through to generic resolution
return cached
# 2. Active endpoint metadata for truly custom/unknown endpoints.
# Known providers (Copilot, OpenAI, Anthropic, etc.) skip this — their
@@ -1303,7 +1119,19 @@ def get_model_context_length(
if ctx:
return ctx
# 4b. (Bedrock handled earlier at step 1b — before custom-endpoint probe.)
# 4b. AWS Bedrock — use static context length table.
# Bedrock's ListFoundationModels doesn't expose context window sizes,
# so we maintain a curated table in bedrock_adapter.py.
if provider == "bedrock" or (
base_url
and base_url_hostname(base_url).startswith("bedrock-runtime.")
and base_url_host_matches(base_url, "amazonaws.com")
):
try:
from agent.bedrock_adapter import get_bedrock_context_length
return get_bedrock_context_length(model)
except ImportError:
pass # boto3 not installed — fall through to generic resolution
# 5. Provider-aware lookups (before generic OpenRouter cache)
# These are provider-specific and take priority over the generic OR cache,
@@ -1317,32 +1145,10 @@ def get_model_context_length(
if inferred:
effective_provider = inferred
# 5a. Copilot live /models API — max_prompt_tokens from the user's account.
# This catches account-specific models (e.g. claude-opus-4.6-1m) that
# don't exist in models.dev. For models that ARE in models.dev, this
# returns the provider-enforced limit which is what users can actually use.
if effective_provider in ("copilot", "copilot-acp", "github-copilot"):
try:
from hermes_cli.models import get_copilot_model_context
ctx = get_copilot_model_context(model, api_key=api_key)
if ctx:
return ctx
except Exception:
pass # Fall through to models.dev
if effective_provider == "nous":
ctx = _resolve_nous_context_length(model)
if ctx:
return ctx
if effective_provider == "openai-codex":
# Codex OAuth enforces lower context limits than the direct OpenAI
# API for the same slug (e.g. gpt-5.5 is 1.05M on the API but 272K
# on Codex). Authoritative source is Codex's own /models endpoint.
codex_ctx = _resolve_codex_oauth_context_length(model, access_token=api_key or "")
if codex_ctx:
if base_url:
save_context_length(model, base_url, codex_ctx)
return codex_ctx
if effective_provider:
from agent.models_dev import lookup_models_dev_context
ctx = lookup_models_dev_context(effective_provider, model)
+133 -10
View File
@@ -1,29 +1,154 @@
"""Shared slash command helpers for skills.
"""Shared slash command helpers for skills and built-in prompt-style modes.
Shared between CLI (cli.py) and gateway (gateway/run.py) so both surfaces
can invoke skills via /skill-name commands.
can invoke skills via /skill-name commands and prompt-only built-ins like
/plan.
"""
import json
import logging
import re
import subprocess
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, Optional
from hermes_constants import display_hermes_home
from agent.skill_preprocessing import (
expand_inline_shell as _expand_inline_shell,
load_skills_config as _load_skills_config,
substitute_template_vars as _substitute_template_vars,
)
logger = logging.getLogger(__name__)
_skill_commands: Dict[str, Dict[str, Any]] = {}
_PLAN_SLUG_RE = re.compile(r"[^a-z0-9]+")
# Patterns for sanitizing skill names into clean hyphen-separated slugs.
_SKILL_INVALID_CHARS = re.compile(r"[^a-z0-9-]")
_SKILL_MULTI_HYPHEN = re.compile(r"-{2,}")
# Matches ${HERMES_SKILL_DIR} / ${HERMES_SESSION_ID} tokens in SKILL.md.
# Tokens that don't resolve (e.g. ${HERMES_SESSION_ID} with no session) are
# left as-is so the user can debug them.
_SKILL_TEMPLATE_RE = re.compile(r"\$\{(HERMES_SKILL_DIR|HERMES_SESSION_ID)\}")
# Matches inline shell snippets like: !`date +%Y-%m-%d`
# Non-greedy, single-line only — no newlines inside the backticks.
_INLINE_SHELL_RE = re.compile(r"!`([^`\n]+)`")
# Cap inline-shell output so a runaway command can't blow out the context.
_INLINE_SHELL_MAX_OUTPUT = 4000
def _load_skills_config() -> dict:
"""Load the ``skills`` section of config.yaml (best-effort)."""
try:
from hermes_cli.config import load_config
cfg = load_config() or {}
skills_cfg = cfg.get("skills")
if isinstance(skills_cfg, dict):
return skills_cfg
except Exception:
logger.debug("Could not read skills config", exc_info=True)
return {}
def _substitute_template_vars(
content: str,
skill_dir: Path | None,
session_id: str | None,
) -> str:
"""Replace ${HERMES_SKILL_DIR} / ${HERMES_SESSION_ID} in skill content.
Only substitutes tokens for which a concrete value is available
unresolved tokens are left in place so the author can spot them.
"""
if not content:
return content
skill_dir_str = str(skill_dir) if skill_dir else None
def _replace(match: re.Match) -> str:
token = match.group(1)
if token == "HERMES_SKILL_DIR" and skill_dir_str:
return skill_dir_str
if token == "HERMES_SESSION_ID" and session_id:
return str(session_id)
return match.group(0)
return _SKILL_TEMPLATE_RE.sub(_replace, content)
def _run_inline_shell(command: str, cwd: Path | None, timeout: int) -> str:
"""Execute a single inline-shell snippet and return its stdout (trimmed).
Failures return a short ``[inline-shell error: ...]`` marker instead of
raising, so one bad snippet can't wreck the whole skill message.
"""
try:
completed = subprocess.run(
["bash", "-c", command],
cwd=str(cwd) if cwd else None,
capture_output=True,
text=True,
timeout=max(1, int(timeout)),
check=False,
)
except subprocess.TimeoutExpired:
return f"[inline-shell timeout after {timeout}s: {command}]"
except FileNotFoundError:
return f"[inline-shell error: bash not found]"
except Exception as exc:
return f"[inline-shell error: {exc}]"
output = (completed.stdout or "").rstrip("\n")
if not output and completed.stderr:
output = completed.stderr.rstrip("\n")
if len(output) > _INLINE_SHELL_MAX_OUTPUT:
output = output[:_INLINE_SHELL_MAX_OUTPUT] + "…[truncated]"
return output
def _expand_inline_shell(
content: str,
skill_dir: Path | None,
timeout: int,
) -> str:
"""Replace every !`cmd` snippet in ``content`` with its stdout.
Runs each snippet with the skill directory as CWD so relative paths in
the snippet work the way the author expects.
"""
if "!`" not in content:
return content
def _replace(match: re.Match) -> str:
cmd = match.group(1).strip()
if not cmd:
return ""
return _run_inline_shell(cmd, skill_dir, timeout)
return _INLINE_SHELL_RE.sub(_replace, content)
def build_plan_path(
user_instruction: str = "",
*,
now: datetime | None = None,
) -> Path:
"""Return the default workspace-relative markdown path for a /plan invocation.
Relative paths are intentional: file tools are task/backend-aware and resolve
them against the active working directory for local, docker, ssh, modal,
daytona, and similar terminal backends. That keeps the plan with the active
workspace instead of the Hermes host's global home directory.
"""
slug_source = (user_instruction or "").strip().splitlines()[0] if user_instruction else ""
slug = _PLAN_SLUG_RE.sub("-", slug_source.lower()).strip("-")
if slug:
slug = "-".join(part for part in slug.split("-")[:8] if part)[:48].strip("-")
slug = slug or "conversation-plan"
timestamp = (now or datetime.now()).strftime("%Y-%m-%d_%H%M%S")
return Path(".hermes") / "plans" / f"{timestamp}-{slug}.md"
def _load_skill_payload(skill_identifier: str, task_id: str | None = None) -> tuple[dict[str, Any], Path | None, str] | None:
"""Load a skill by name/path and return (loaded_payload, skill_dir, display_name)."""
raw_identifier = (skill_identifier or "").strip()
@@ -42,9 +167,7 @@ def _load_skill_payload(skill_identifier: str, task_id: str | None = None) -> tu
else:
normalized = raw_identifier.lstrip("/")
loaded_skill = json.loads(
skill_view(normalized, task_id=task_id, preprocess=False)
)
loaded_skill = json.loads(skill_view(normalized, task_id=task_id))
except Exception:
return None
-131
View File
@@ -1,131 +0,0 @@
"""Shared SKILL.md preprocessing helpers."""
import logging
import re
import subprocess
from pathlib import Path
logger = logging.getLogger(__name__)
# Matches ${HERMES_SKILL_DIR} / ${HERMES_SESSION_ID} tokens in SKILL.md.
# Tokens that don't resolve (e.g. ${HERMES_SESSION_ID} with no session) are
# left as-is so the user can debug them.
_SKILL_TEMPLATE_RE = re.compile(r"\$\{(HERMES_SKILL_DIR|HERMES_SESSION_ID)\}")
# Matches inline shell snippets like: !`date +%Y-%m-%d`
# Non-greedy, single-line only -- no newlines inside the backticks.
_INLINE_SHELL_RE = re.compile(r"!`([^`\n]+)`")
# Cap inline-shell output so a runaway command can't blow out the context.
_INLINE_SHELL_MAX_OUTPUT = 4000
def load_skills_config() -> dict:
"""Load the ``skills`` section of config.yaml (best-effort)."""
try:
from hermes_cli.config import load_config
cfg = load_config() or {}
skills_cfg = cfg.get("skills")
if isinstance(skills_cfg, dict):
return skills_cfg
except Exception:
logger.debug("Could not read skills config", exc_info=True)
return {}
def substitute_template_vars(
content: str,
skill_dir: Path | None,
session_id: str | None,
) -> str:
"""Replace ${HERMES_SKILL_DIR} / ${HERMES_SESSION_ID} in skill content.
Only substitutes tokens for which a concrete value is available --
unresolved tokens are left in place so the author can spot them.
"""
if not content:
return content
skill_dir_str = str(skill_dir) if skill_dir else None
def _replace(match: re.Match) -> str:
token = match.group(1)
if token == "HERMES_SKILL_DIR" and skill_dir_str:
return skill_dir_str
if token == "HERMES_SESSION_ID" and session_id:
return str(session_id)
return match.group(0)
return _SKILL_TEMPLATE_RE.sub(_replace, content)
def run_inline_shell(command: str, cwd: Path | None, timeout: int) -> str:
"""Execute a single inline-shell snippet and return its stdout (trimmed).
Failures return a short ``[inline-shell error: ...]`` marker instead of
raising, so one bad snippet can't wreck the whole skill message.
"""
try:
completed = subprocess.run(
["bash", "-c", command],
cwd=str(cwd) if cwd else None,
capture_output=True,
text=True,
timeout=max(1, int(timeout)),
check=False,
)
except subprocess.TimeoutExpired:
return f"[inline-shell timeout after {timeout}s: {command}]"
except FileNotFoundError:
return "[inline-shell error: bash not found]"
except Exception as exc:
return f"[inline-shell error: {exc}]"
output = (completed.stdout or "").rstrip("\n")
if not output and completed.stderr:
output = completed.stderr.rstrip("\n")
if len(output) > _INLINE_SHELL_MAX_OUTPUT:
output = output[:_INLINE_SHELL_MAX_OUTPUT] + "...[truncated]"
return output
def expand_inline_shell(
content: str,
skill_dir: Path | None,
timeout: int,
) -> str:
"""Replace every !`cmd` snippet in ``content`` with its stdout.
Runs each snippet with the skill directory as CWD so relative paths in
the snippet work the way the author expects.
"""
if "!`" not in content:
return content
def _replace(match: re.Match) -> str:
cmd = match.group(1).strip()
if not cmd:
return ""
return run_inline_shell(cmd, skill_dir, timeout)
return _INLINE_SHELL_RE.sub(_replace, content)
def preprocess_skill_content(
content: str,
skill_dir: Path | None,
session_id: str | None = None,
skills_cfg: dict | None = None,
) -> str:
"""Apply configured SKILL.md template and inline-shell preprocessing."""
if not content:
return content
cfg = skills_cfg if isinstance(skills_cfg, dict) else load_skills_config()
if cfg.get("template_vars", True):
content = substitute_template_vars(content, skill_dir, session_id)
if cfg.get("inline_shell", False):
timeout = int(cfg.get("inline_shell_timeout", 10) or 10)
content = expand_inline_shell(content, skill_dir, timeout)
return content
-58
View File
@@ -1,58 +0,0 @@
# Hermes Apps
Platform apps live here. The first app is a cross-platform GUI shell around the
existing Hermes dashboard; it should not fork chat, config, logs, or session UI.
## Shape
```text
apps/
gui/ # cross-platform app shell: dev Chrome shell now, Tauri native next
shared/ # runtime bundle notes/scripts used by Windows + macOS packaging
```
## Desktop Dev
The backend-only GUI mode is:
```bash
hermes dashboard --gui
```
The fast GUI shell is:
```powershell
cd \\wsl$\Ubuntu\home\bb\hermes-agent\apps\gui
npm run dev
```
The native Tauri shell is:
```powershell
cd \\wsl$\Ubuntu\home\bb\hermes-agent\apps\gui
npm run dev:tauri
```
`--gui` implies the embedded TUI; do not pass `--tui` separately for GUI mode.
## MVP Boundary
Included:
- bundled Python runtime
- bundled Node/TUI runtime
- CLI install to PATH
- profile picker and first-run setup
- dashboard health/reconnect state
- tray controls
- desktop notifications
- Windows installer
Deferred:
- code signing
- native self-updater
- store distribution
For MVP updates, the desktop UI should run the existing `hermes update` flow and
surface progress/finish notifications.
-102
View File
@@ -1,102 +0,0 @@
# Hermes GUI
Cross-platform GUI shell for the Hermes dashboard.
## Fast Dev Shell
This gets a GUI window on Windows/WSL today by launching Chrome in app mode:
```bash
cd apps/gui
npm run dev
```
It starts `hermes dashboard --gui --no-open --port 9120`, waits for
`/api/health`, then opens a standalone app window at `http://127.0.0.1:9120`.
## Native Shell
The native Tauri shell is still scaffolded:
```bash
cd apps/gui
npm run dev:tauri
```
From Windows PowerShell on a `\\wsl$` path, use PowerShell `npm`, not
`npm.cmd`:
```powershell
Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass -Force
cd \\wsl$\Ubuntu\home\bb\hermes-agent\apps\gui
npm run dev:tauri
```
`npm.cmd` goes through `cmd.exe`, and `cmd.exe` cannot use UNC paths as the
current directory.
If `npm run` still falls through `cmd.exe`, bypass npm entirely:
```powershell
\\wsl$\Ubuntu\home\bb\hermes-agent\apps\gui\dev-tauri.ps1
```
The launcher builds into `%LOCALAPPDATA%\Hermes\cargo-target\gui` instead of
`\\wsl$` because Windows Cargo incremental locks do not work reliably on UNC
WSL filesystems.
In dev, either start Hermes yourself:
```bash
hermes dashboard --gui --no-open --port 9120
```
or let the native shell start it. The tray menu owns:
- Open Hermes
- Open in Browser
- Restart Hermes Runtime
- Quit Hermes
The native shell reuses a healthy GUI runtime when one is already running.
Otherwise it picks the first free port from `9120..9139`, passes that port into
the WSL/backend process, and navigates the Tauri window there. Set
`HERMES_GUI_PORT` to force a starting port.
## Fresh Install Emulation
Use an isolated Hermes home without touching your real `~/.hermes`:
```powershell
powershell.exe -ExecutionPolicy Bypass -File \\wsl$\Ubuntu\home\bb\hermes-agent\apps\gui\dev-tauri.ps1 -Fresh
```
Reset that disposable home and run again:
```powershell
powershell.exe -ExecutionPolicy Bypass -File \\wsl$\Ubuntu\home\bb\hermes-agent\apps\gui\dev-tauri.ps1 -Fresh -ResetFresh
```
Fresh mode stores state in `%LOCALAPPDATA%\Hermes\fresh-install-home` and starts
from port `9140` so it does not collide with your normal GUI dev session.
Set `HERMES_GUI_MIN_SPLASH_MS` only when debugging the startup screen; default
startup is instant once the backend is healthy.
## Boundary
GUI owns:
- app shell/window
- startup state
- sidecar process lifecycle
- future tray/notifications/installers
Hermes owns:
- dashboard UI
- auth/session token
- profiles/config/env
- TUI/PTT chat bridge
- tools/skills/gateway
- update flow
-57
View File
@@ -1,57 +0,0 @@
param(
[string]$Command = "dev",
[switch]$Fresh,
[switch]$ResetFresh
)
$ErrorActionPreference = "Stop"
Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass -Force
$AppRoot = Split-Path -Parent $MyInvocation.MyCommand.Path
$Script = Join-Path $AppRoot "scripts\tauri.mjs"
if (-not (Get-Command node -ErrorAction SilentlyContinue)) {
throw "Windows Node.js was not found. Install it with: winget install OpenJS.NodeJS.LTS"
}
if (-not (Get-Command rustc -ErrorAction SilentlyContinue)) {
throw "Windows Rust was not found. Install it with: winget install Rustlang.Rustup"
}
$Tauri = Get-Command tauri -ErrorAction SilentlyContinue
$CargoTauri = Get-Command cargo-tauri -ErrorAction SilentlyContinue
if (-not $Tauri -and -not $CargoTauri) {
throw "Tauri CLI not found. Install it with: npm install -g @tauri-apps/cli (run from a normal Windows path, not \\wsl$)"
}
$env:CARGO_INCREMENTAL = "0"
$env:CARGO_TARGET_DIR = Join-Path $env:LOCALAPPDATA "Hermes\cargo-target\gui"
New-Item -ItemType Directory -Force -Path $env:CARGO_TARGET_DIR | Out-Null
if ($Fresh) {
$FreshHome = Join-Path $env:LOCALAPPDATA "Hermes\fresh-install-home"
if ($ResetFresh -and (Test-Path $FreshHome)) {
Remove-Item -Recurse -Force $FreshHome
}
New-Item -ItemType Directory -Force -Path $FreshHome | Out-Null
$env:HERMES_HOME = $FreshHome
$env:HERMES_GUI_PORT = "9140"
$env:HERMES_GUI_FRESH = "1"
Write-Host "Fresh GUI mode"
Write-Host " HERMES_HOME=$FreshHome"
Write-Host " HERMES_GUI_PORT=$env:HERMES_GUI_PORT"
}
Push-Location $AppRoot
try {
if ($Tauri) {
& tauri $Command
}
else {
& cargo tauri $Command
}
}
finally {
Pop-Location
}
-13
View File
@@ -1,13 +0,0 @@
{
"name": "@hermes/gui",
"version": "0.0.0",
"private": true,
"type": "module",
"scripts": {
"dev": "node scripts/dev-shell.mjs",
"dev:tauri": "node scripts/tauri.mjs dev",
"build": "node scripts/tauri.mjs build",
"dashboard": "node scripts/start-dashboard.mjs",
"tauri": "node scripts/tauri.mjs"
}
}
-156
View File
@@ -1,156 +0,0 @@
import { spawn, spawnSync } from "node:child_process";
import { createServer } from "node:net";
import { dirname, resolve } from "node:path";
import { setTimeout as delay } from "node:timers/promises";
import { fileURLToPath } from "node:url";
const here = dirname(fileURLToPath(import.meta.url));
const repoRoot = resolve(here, "../../..");
const python = process.env.HERMES_PYTHON || "python";
let port = process.env.HERMES_GUI_PORT || "9120";
let url = `http://127.0.0.1:${port}`;
let dashboard = null;
function stop() {
if (dashboard && !dashboard.killed) dashboard.kill();
}
process.on("SIGINT", () => {
stop();
process.exit(130);
});
process.on("SIGTERM", () => {
stop();
process.exit(143);
});
process.on("exit", stop);
async function waitForHealth() {
for (let i = 0; i < 120; i += 1) {
if (await isHealthy()) return true;
await delay(500);
}
return false;
}
async function isHealthy() {
try {
const res = await fetch(`${url}/api/health`, {
signal: AbortSignal.timeout(1000),
});
const data = await res.json();
return res.ok && data.status === "ok";
} catch {
return false;
}
}
function canBind(candidate) {
return new Promise((resolveBind) => {
const server = createServer();
server.once("error", () => resolveBind(false));
server.listen(Number(candidate), "127.0.0.1", () => {
server.close(() => resolveBind(true));
});
});
}
async function choosePort() {
if (process.env.HERMES_GUI_PORT) return;
let candidate = Number(port);
for (let i = 0; i < 20; i += 1) {
if (await canBind(candidate)) {
port = String(candidate);
url = `http://127.0.0.1:${port}`;
return;
}
candidate += 1;
}
}
function startDashboard() {
dashboard = spawn(
python,
[
"-m",
"hermes_cli.main",
"dashboard",
"--gui",
"--no-open",
"--host",
"127.0.0.1",
"--port",
port,
],
{
cwd: repoRoot,
env: {
...process.env,
HERMES_GUI: "1",
},
stdio: "inherit",
},
);
dashboard.on("exit", (code) => {
process.exit(code ?? 0);
});
}
function run(command, args) {
return (
spawnSync(command, args, {
shell: process.platform === "win32",
stdio: "ignore",
}).status === 0
);
}
function openGuiWindow() {
if (process.platform === "win32") {
return (
run("cmd.exe", ["/C", "start", "", "chrome", `--app=${url}`]) ||
run("cmd.exe", ["/C", "start", "", "msedge", `--app=${url}`]) ||
run("cmd.exe", ["/C", "start", "", url])
);
}
if (process.env.WSL_DISTRO_NAME) {
return (
run("cmd.exe", ["/C", "start", "", "chrome", `--app=${url}`]) ||
run("cmd.exe", ["/C", "start", "", "msedge", `--app=${url}`]) ||
run("cmd.exe", ["/C", "start", "", url])
);
}
if (process.platform === "darwin") {
return (
run("open", ["-na", "Google Chrome", "--args", `--app=${url}`]) ||
run("open", [url])
);
}
return (
run("google-chrome", [`--app=${url}`]) ||
run("chromium", [`--app=${url}`]) ||
run("xdg-open", [url])
);
}
if (await isHealthy()) {
console.log(`Hermes GUI already running -> ${url}`);
openGuiWindow();
process.exit(0);
}
await choosePort();
startDashboard();
if (await waitForHealth()) {
console.log(`Hermes GUI -> ${url}`);
openGuiWindow();
} else {
console.error(`Hermes GUI did not become healthy at ${url}`);
}
-95
View File
@@ -1,95 +0,0 @@
import { spawn } from "node:child_process";
import { dirname, resolve } from "node:path";
import { fileURLToPath } from "node:url";
const here = dirname(fileURLToPath(import.meta.url));
const repoRoot = resolve(here, "../../..");
const python = process.env.HERMES_PYTHON || "python";
const port = process.env.HERMES_GUI_PORT || "9120";
const url = `http://127.0.0.1:${port}`;
async function isHealthy() {
try {
const res = await fetch(`${url}/api/health`, {
signal: AbortSignal.timeout(1000),
});
const data = await res.json();
return res.ok && data.status === "ok";
} catch {
return false;
}
}
function wslRepoRoot() {
const normalized = repoRoot.replaceAll("\\", "/");
const parts = normalized.split("/");
const host = parts[2]?.toLowerCase();
if (process.platform !== "win32") return null;
if (host !== "wsl$" && host !== "wsl.localhost") return null;
const distro = parts[3];
const path = `/${parts.slice(4).join("/")}`;
return distro && path !== "/" ? { distro, path } : null;
}
function spawnDashboard() {
const wsl = wslRepoRoot();
if (wsl) {
return spawn(
"wsl.exe",
[
"-d",
wsl.distro,
"--cd",
wsl.path,
"env",
"HERMES_GUI=1",
process.env.HERMES_WSL_PYTHON || "python",
"-m",
"hermes_cli.main",
"dashboard",
"--gui",
"--no-open",
"--host",
"127.0.0.1",
"--port",
port,
],
{ stdio: "inherit" },
);
}
return spawn(
python,
[
"-m",
"hermes_cli.main",
"dashboard",
"--gui",
"--no-open",
"--host",
"127.0.0.1",
"--port",
port,
],
{
cwd: repoRoot,
env: {
...process.env,
HERMES_GUI: "1",
},
stdio: "inherit",
},
);
}
if (await isHealthy()) {
console.log(`Hermes GUI already running -> ${url}`);
process.exit(0);
}
const child = spawnDashboard();
child.on("exit", (code, signal) => {
if (signal) process.kill(process.pid, signal);
process.exit(code ?? 0);
});
-90
View File
@@ -1,90 +0,0 @@
import { spawnSync } from "node:child_process";
import { existsSync } from "node:fs";
import { dirname, resolve } from "node:path";
import { fileURLToPath } from "node:url";
const here = dirname(fileURLToPath(import.meta.url));
const appRoot = resolve(here, "..");
const bin = process.platform === "win32" ? "tauri.cmd" : "tauri";
const localTauri = resolve(appRoot, "node_modules", ".bin", bin);
const args = process.argv.slice(2);
function isWsl() {
return process.platform === "linux" && !!process.env.WSL_DISTRO_NAME;
}
function quotePs(value) {
return `'${value.replaceAll("'", "''")}'`;
}
function dispatchToWindows() {
const pathResult = spawnSync("wslpath", ["-w", appRoot], {
encoding: "utf8",
});
const windowsPath = pathResult.stdout.trim();
if (!windowsPath) return false;
const command = [
"$ErrorActionPreference = 'Stop'",
"Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass -Force",
"if (-not (Get-Command npm -ErrorAction SilentlyContinue)) {",
' Write-Error "Windows npm was not found. Install Windows Node.js first: winget install OpenJS.NodeJS.LTS"',
"}",
"if (-not (Get-Command rustc -ErrorAction SilentlyContinue)) {",
' Write-Error "Windows Rust was not found. Install Rust first: winget install Rustlang.Rustup"',
"}",
`Set-Location -LiteralPath ${quotePs(windowsPath)}`,
"& npm run dev:tauri",
].join("; ");
const result = spawnSync(
"powershell.exe",
["-NoProfile", "-ExecutionPolicy", "Bypass", "-Command", command],
{ stdio: "inherit" },
);
process.exit(result.status ?? 1);
}
function run(command, commandArgs, { exit = true } = {}) {
if (process.platform === "win32") {
const psCommand = [
"$ErrorActionPreference = 'Stop'",
"Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass -Force",
`Set-Location -LiteralPath ${quotePs(appRoot)}`,
`& ${quotePs(command)} ${commandArgs.map(quotePs).join(" ")}`,
].join("; ");
const result = spawnSync(
"powershell.exe",
["-NoProfile", "-ExecutionPolicy", "Bypass", "-Command", psCommand],
{ stdio: "inherit" },
);
if (result.error && result.error.code === "ENOENT") return false;
if (exit) process.exit(result.status ?? 1);
return result.status === 0;
}
const result = spawnSync(command, commandArgs, {
cwd: appRoot,
env: process.env,
stdio: "inherit",
});
if (result.error && result.error.code === "ENOENT") return false;
if (exit) process.exit(result.status ?? 1);
return result.status === 0;
}
if (isWsl() && process.env.HERMES_GUI_TAURI_WSL !== "1") {
console.log("Launching native Windows Tauri from WSL...");
dispatchToWindows();
console.error(
"Could not hand off to Windows PowerShell. Run this from Windows PowerShell instead:",
);
console.error(" cd \\\\wsl$\\Ubuntu\\home\\bb\\hermes-agent\\apps\\gui");
console.error(" npm run dev:tauri");
process.exit(1);
}
if (existsSync(localTauri)) run(localTauri, args);
if (run("tauri", args, { exit: false })) process.exit(0);
if (run("cargo", ["tauri", ...args], { exit: false })) process.exit(0);
run("npx", ["--yes", "@tauri-apps/cli@latest", ...args]);
-1
View File
@@ -1 +0,0 @@
/target/
-5579
View File
File diff suppressed because it is too large Load Diff
-17
View File
@@ -1,17 +0,0 @@
[package]
name = "hermes-gui"
version = "0.0.0"
description = "Hermes GUI shell"
edition = "2021"
[lib]
name = "hermes_gui_lib"
crate-type = ["staticlib", "cdylib", "rlib"]
[build-dependencies]
tauri-build = { version = "2", features = [] }
[dependencies]
tauri = { version = "2", features = ["tray-icon"] }
tauri-plugin-notification = "2"
tauri-plugin-opener = "2"
-3
View File
@@ -1,3 +0,0 @@
fn main() {
tauri_build::build();
}
@@ -1,7 +0,0 @@
{
"$schema": "../gen/schemas/desktop-schema.json",
"identifier": "default",
"description": "Default Hermes GUI permissions",
"windows": ["main"],
"permissions": ["core:default", "notification:default", "opener:default"]
}
File diff suppressed because one or more lines are too long
@@ -1 +0,0 @@
{"default":{"identifier":"default","description":"Default Hermes GUI permissions","local":true,"windows":["main"],"permissions":["core:default","notification:default","opener:default"]}}
File diff suppressed because it is too large Load Diff
File diff suppressed because it is too large Load Diff
Binary file not shown.

Before

Width:  |  Height:  |  Size: 135 B

Binary file not shown.

Before

Width:  |  Height:  |  Size: 1.1 KiB

-4
View File
@@ -1,4 +0,0 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">
<rect width="100" height="100" rx="18" fill="#071313"/>
<text x="50" y="70" text-anchor="middle" font-size="68" fill="#f0e6d2"></text>
</svg>

Before

Width:  |  Height:  |  Size: 212 B

-1
View File
@@ -1 +0,0 @@
-433
View File
@@ -1,433 +0,0 @@
use std::{
io::{Read, Write},
net::{TcpListener, TcpStream},
process::{Child, Command, Stdio},
sync::Mutex,
time::{Duration, Instant},
};
use tauri::{
image::Image,
menu::{Menu, MenuItem, PredefinedMenuItem},
tray::{MouseButton, MouseButtonState, TrayIconBuilder, TrayIconEvent},
App, AppHandle, Manager, WebviewWindow,
};
const GUI_HOST: &str = "127.0.0.1";
const DEFAULT_GUI_PORT: u16 = 9120;
const MIN_SPLASH_MS: u64 = 0;
const SPLASH_URL: &str = "data:text/html,%3C!doctype%20html%3E%3Cmeta%20charset%3Dutf-8%3E%3Cstyle%3Ebody%7Bmargin%3A0%3Bheight%3A100vh%3Bdisplay%3Agrid%3Bplace-items%3Acenter%3Bbackground%3A%23071313%3Bcolor%3A%23f0e6d2%3Bfont%3A14px%20monospace%3Bletter-spacing%3A.08em%3Btext-transform%3Auppercase%7D%3C%2Fstyle%3E%3Cbody%3EStarting%20Hermes%E2%80%A6%3C%2Fbody%3E";
struct GuiState {
child: Mutex<Option<Child>>,
port: Mutex<u16>,
}
fn gui_url(port: u16) -> String {
format!("http://{GUI_HOST}:{port}")
}
fn check_health(port: u16) -> bool {
let Ok(mut stream) = TcpStream::connect_timeout(
&format!("{GUI_HOST}:{port}").parse().unwrap(),
Duration::from_secs(1),
) else {
return false;
};
let _ = stream.set_read_timeout(Some(Duration::from_secs(1)));
let request =
format!("GET /api/health HTTP/1.1\r\nHost: {GUI_HOST}:{port}\r\nConnection: close\r\n\r\n");
if stream.write_all(request.as_bytes()).is_err() {
return false;
}
let mut response = String::new();
let _ = stream.read_to_string(&mut response);
response.contains("200 OK")
&& response.contains("\"status\":\"ok\"")
&& response.contains("\"mode\":\"gui\"")
}
fn can_bind(port: u16) -> bool {
TcpListener::bind((GUI_HOST, port)).is_ok()
}
fn base_port() -> u16 {
std::env::var("HERMES_GUI_PORT")
.ok()
.and_then(|raw| raw.parse().ok())
.unwrap_or(DEFAULT_GUI_PORT)
}
fn select_port() -> u16 {
let start = base_port();
for port in start..start.saturating_add(20) {
if check_health(port) || can_bind(port) {
return port;
}
}
start
}
fn repo_root() -> std::path::PathBuf {
std::path::PathBuf::from(env!("CARGO_MANIFEST_DIR"))
.join("../../..")
.canonicalize()
.unwrap_or_else(|_| std::path::PathBuf::from("."))
}
fn runtime_dir() -> Option<std::path::PathBuf> {
std::env::var_os("HERMES_GUI_RUNTIME_DIR").map(std::path::PathBuf::from)
}
fn runtime_python(runtime: &std::path::Path) -> std::path::PathBuf {
if cfg!(target_os = "windows") {
runtime.join("venv").join("Scripts").join("python.exe")
} else {
runtime.join("venv").join("bin").join("python")
}
}
fn wsl_path(root: &std::path::Path) -> Option<(String, String)> {
let raw = root.to_string_lossy().replace('\\', "/");
let parts: Vec<&str> = raw.split('/').collect();
let host = parts.get(2)?.to_ascii_lowercase();
if host != "wsl$" && host != "wsl.localhost" {
return None;
}
let distro = parts.get(3)?.to_string();
let path = format!("/{}", parts.get(4..)?.join("/"));
Some((distro, path))
}
fn start_dashboard(port: u16) -> std::io::Result<Child> {
if let Some(runtime) = runtime_dir() {
let python = runtime_python(&runtime);
let web_dist = runtime.join("web_dist");
let tui_dir = runtime.join("ui-tui");
let port = port.to_string();
return Command::new(python)
.args([
"-m",
"hermes_cli.main",
"dashboard",
"--gui",
"--no-open",
"--host",
GUI_HOST,
"--port",
&port,
])
.env("HERMES_GUI", "1")
.env("HERMES_GUI_PORT", &port)
.env("HERMES_WEB_DIST", web_dist)
.env("HERMES_TUI_DIR", tui_dir)
.envs(
std::env::vars()
.filter(|(key, _)| matches!(key.as_str(), "HERMES_HOME" | "HERMES_GUI_FRESH")),
)
.stdin(Stdio::null())
.stdout(Stdio::null())
.stderr(Stdio::null())
.spawn();
}
let root = repo_root();
let port = port.to_string();
if let Some((distro, path)) = wsl_path(&root) {
let port_env = format!("HERMES_GUI_PORT={port}");
let mut env_args = vec!["HERMES_GUI=1".to_string(), port_env];
if let Ok(home) = std::env::var("HERMES_HOME") {
env_args.push(format!("HERMES_HOME={home}"));
}
if let Ok(fresh) = std::env::var("HERMES_GUI_FRESH") {
env_args.push(format!("HERMES_GUI_FRESH={fresh}"));
}
let mut args = vec![
"-d".to_string(),
distro,
"--cd".to_string(),
path,
"env".to_string(),
];
args.extend(env_args);
args.extend([
"python".to_string(),
"-m".to_string(),
"hermes_cli.main".to_string(),
"dashboard".to_string(),
"--gui".to_string(),
"--no-open".to_string(),
"--host".to_string(),
GUI_HOST.to_string(),
"--port".to_string(),
port.clone(),
]);
return Command::new("wsl.exe")
.args(args)
.stdin(Stdio::null())
.stdout(Stdio::null())
.stderr(Stdio::null())
.spawn();
}
Command::new("python")
.args([
"-m",
"hermes_cli.main",
"dashboard",
"--gui",
"--no-open",
"--host",
GUI_HOST,
"--port",
&port,
])
.current_dir(root)
.env("HERMES_GUI", "1")
.env("HERMES_GUI_PORT", &port)
.envs(
std::env::vars()
.filter(|(key, _)| matches!(key.as_str(), "HERMES_HOME" | "HERMES_GUI_FRESH")),
)
.stdin(Stdio::null())
.stdout(Stdio::null())
.stderr(Stdio::null())
.spawn()
}
fn stop_owned_dashboard(state: &GuiState) {
let Some(mut child) = state.child.lock().expect("gui child lock poisoned").take() else {
return;
};
let _ = child.kill();
let _ = child.wait();
}
fn current_port(state: &GuiState) -> u16 {
*state.port.lock().expect("gui port lock poisoned")
}
fn ensure_dashboard(state: &GuiState) -> Result<(), String> {
let current = current_port(state);
if check_health(current) {
return Ok(());
}
let port = select_port();
*state.port.lock().expect("gui port lock poisoned") = port;
if check_health(port) {
return Ok(());
}
let child = start_dashboard(port).map_err(|err| {
format!(
"Could not auto-start Hermes dashboard ({err}). Start it manually with: hermes dashboard --gui --no-open --port {port}"
)
})?;
*state.child.lock().expect("gui child lock poisoned") = Some(child);
Ok(())
}
fn navigate_when_ready(window: WebviewWindow, port: u16) {
std::thread::spawn(move || {
let started = Instant::now();
while started.elapsed() < Duration::from_secs(60) {
if check_health(port) {
let min_splash = std::env::var("HERMES_GUI_MIN_SPLASH_MS")
.ok()
.and_then(|raw| raw.parse::<u64>().ok())
.unwrap_or(MIN_SPLASH_MS);
let elapsed = started.elapsed();
if elapsed < Duration::from_millis(min_splash) {
std::thread::sleep(Duration::from_millis(min_splash) - elapsed);
}
if let Ok(url) = tauri::Url::parse(&gui_url(port)) {
let _ = window.navigate(url);
let _ = window.show();
let _ = window.set_focus();
}
return;
}
std::thread::sleep(Duration::from_millis(500));
}
});
}
fn show_main_window(app: &AppHandle) {
if let Some(window) = app.get_webview_window("main") {
let _ = window.show();
let _ = window.set_focus();
}
}
fn open_browser(port: u16) {
let url = gui_url(port);
#[cfg(target_os = "windows")]
let _ = Command::new("cmd")
.args(["/C", "start", "", &url])
.stdin(Stdio::null())
.stdout(Stdio::null())
.stderr(Stdio::null())
.spawn();
#[cfg(target_os = "macos")]
let _ = Command::new("open").arg(&url).spawn();
#[cfg(all(unix, not(target_os = "macos")))]
let _ = Command::new("xdg-open").arg(&url).spawn();
}
fn tray_icon() -> Image<'static> {
let width = 32;
let height = 32;
let mut rgba = Vec::with_capacity(width * height * 4);
for y in 0..height {
for x in 0..width {
let mark = (14..=17).contains(&x) && (5..=26).contains(&y)
|| (8..=23).contains(&x) && (13..=16).contains(&y)
|| (10..=21).contains(&x) && (y == 5 || y == 26);
if mark {
rgba.extend_from_slice(&[0xF0, 0xE6, 0xD2, 0xFF]);
} else {
rgba.extend_from_slice(&[0x07, 0x13, 0x13, 0xFF]);
}
}
}
Image::new_owned(rgba, width as u32, height as u32)
}
fn restart_runtime(app: &AppHandle) -> Result<(), String> {
let state = app.state::<GuiState>();
stop_owned_dashboard(&state);
ensure_dashboard(&state)?;
if let Some(window) = app.get_webview_window("main") {
if let Ok(url) = tauri::Url::parse(SPLASH_URL) {
let _ = window.navigate(url);
}
let port = current_port(&state);
navigate_when_ready(window, port);
}
Ok(())
}
fn setup_tray(app: &App) -> tauri::Result<()> {
let open_item = MenuItem::with_id(app, "open", "Open Hermes", true, None::<&str>)?;
let browser_item = MenuItem::with_id(app, "browser", "Open in Browser", true, None::<&str>)?;
let restart_item =
MenuItem::with_id(app, "restart", "Restart Hermes Runtime", true, None::<&str>)?;
let status_item = MenuItem::with_id(app, "status", "Local runtime", false, None::<&str>)?;
let separator = PredefinedMenuItem::separator(app)?;
let separator2 = PredefinedMenuItem::separator(app)?;
let quit_item = MenuItem::with_id(app, "quit", "Quit Hermes", true, None::<&str>)?;
let menu = Menu::with_items(
app,
&[
&open_item,
&browser_item,
&restart_item,
&separator,
&status_item,
&separator2,
&quit_item,
],
)?;
let icon = tray_icon();
let _tray = TrayIconBuilder::new()
.icon(icon)
.menu(&menu)
.tooltip("Hermes")
.on_menu_event(|app, event| match event.id.as_ref() {
"open" => show_main_window(app),
"browser" => {
let state = app.state::<GuiState>();
open_browser(current_port(&state));
}
"restart" => {
if let Err(err) = restart_runtime(app) {
eprintln!("Failed to restart Hermes runtime: {err}");
}
}
"quit" => {
let state = app.state::<GuiState>();
stop_owned_dashboard(&state);
app.exit(0);
}
_ => {}
})
.on_tray_icon_event(|tray, event| {
if let TrayIconEvent::Click {
button: MouseButton::Left,
button_state: MouseButtonState::Up,
..
} = event
{
show_main_window(&tray.app_handle());
}
})
.build(app)?;
Ok(())
}
#[tauri::command]
fn runtime_running(app: AppHandle) -> bool {
let state = app.state::<GuiState>();
check_health(current_port(&state))
}
#[tauri::command]
fn restart_runtime_command(app: AppHandle) -> Result<(), String> {
restart_runtime(&app)
}
pub fn run() {
tauri::Builder::default()
.plugin(tauri_plugin_notification::init())
.plugin(tauri_plugin_opener::init())
.manage(GuiState {
child: Mutex::new(None),
port: Mutex::new(base_port()),
})
.invoke_handler(tauri::generate_handler![
runtime_running,
restart_runtime_command
])
.setup(|app| {
setup_tray(app)?;
if let Some(window) = app.get_webview_window("main") {
if let Ok(url) = tauri::Url::parse(SPLASH_URL) {
let _ = window.navigate(url);
}
let state = app.state::<GuiState>();
if let Err(err) = ensure_dashboard(&state) {
eprintln!("{err}");
}
let port = current_port(&state);
navigate_when_ready(window, port);
}
Ok(())
})
.on_window_event(|window, event| {
if let tauri::WindowEvent::CloseRequested { api, .. } = event {
api.prevent_close();
let _ = window.hide();
}
})
.run(tauri::generate_context!())
.expect("failed to run Hermes GUI");
}
-5
View File
@@ -1,5 +0,0 @@
#![cfg_attr(not(debug_assertions), windows_subsystem = "windows")]
fn main() {
hermes_gui_lib::run();
}
-38
View File
@@ -1,38 +0,0 @@
{
"$schema": "https://schema.tauri.app/config/2",
"productName": "Hermes",
"version": "0.0.0",
"identifier": "ai.nous.hermes.gui",
"build": {
"beforeDevCommand": "",
"beforeBuildCommand": "",
"devUrl": "http://127.0.0.1:9120",
"frontendDist": "../dist"
},
"app": {
"withGlobalTauri": true,
"windows": [
{
"label": "main",
"title": "Hermes",
"width": 1400,
"height": 900,
"minWidth": 900,
"minHeight": 600,
"resizable": true,
"center": true
}
],
"security": {
"csp": "default-src 'self' http://127.0.0.1:* http://localhost:*; connect-src 'self' http://127.0.0.1:* http://localhost:* ws://127.0.0.1:* ws://localhost:*; img-src 'self' data: blob: http://127.0.0.1:* http://localhost:*; style-src 'self' 'unsafe-inline' http://127.0.0.1:* http://localhost:*; script-src 'self' 'unsafe-inline' 'unsafe-eval' http://127.0.0.1:* http://localhost:*"
}
},
"bundle": {
"active": true,
"icon": ["icons/32x32.png", "icons/icon.ico", "icons/icon.svg"],
"targets": ["nsis", "dmg", "app"],
"resources": {
"sidecars": "sidecars/"
}
}
}
-5
View File
@@ -1,5 +0,0 @@
// Browser-side GUI bridge entry.
//
// The dashboard remains in `web/`; this file is reserved for future shell-only
// glue if we need pre-navigation scripts or native event wiring.
export {};
-44
View File
@@ -1,44 +0,0 @@
param(
[string]$Out = "$PSScriptRoot\..\gui\src-tauri\sidecars\hermes-runtime",
[string]$Python = "python"
)
$Root = Resolve-Path "$PSScriptRoot\..\.."
Write-Host "Bundling Hermes GUI runtime"
Write-Host "repo: $Root"
Write-Host "out: $Out"
if (Test-Path $Out) {
Remove-Item -Recurse -Force $Out
}
New-Item -ItemType Directory -Force -Path $Out | Out-Null
Write-Host "-> Building dashboard"
npm --prefix "$Root\web" ci
npm --prefix "$Root\web" run build
Copy-Item -Recurse "$Root\web\dist" "$Out\web_dist"
Write-Host "-> Building TUI"
npm --prefix "$Root\ui-tui" ci
npm --prefix "$Root\ui-tui" run build
New-Item -ItemType Directory -Force -Path "$Out\ui-tui" | Out-Null
Copy-Item -Recurse "$Root\ui-tui\dist" "$Out\ui-tui\dist"
Copy-Item "$Root\ui-tui\package.json" "$Out\ui-tui\package.json"
Copy-Item "$Root\ui-tui\package-lock.json" "$Out\ui-tui\package-lock.json"
Copy-Item -Recurse "$Root\ui-tui\node_modules" "$Out\ui-tui\node_modules"
Write-Host "-> Creating Python runtime"
& $Python -m venv "$Out\venv"
& "$Out\venv\Scripts\python.exe" -m pip install --upgrade pip
& "$Out\venv\Scripts\python.exe" -m pip install -e "$Root[web,pty]"
@"
# Hermes GUI Runtime
Generated by apps/shared/bundle-runtime.ps1.
Set HERMES_GUI_RUNTIME_DIR to this directory before launching the Tauri shell.
"@ | Set-Content "$Out\README.md"
Write-Host "Runtime bundle ready: $Out"
-41
View File
@@ -1,41 +0,0 @@
#!/usr/bin/env bash
set -euo pipefail
ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/../.." && pwd)"
OUT="${1:-"$ROOT/apps/gui/src-tauri/sidecars/hermes-runtime"}"
PYTHON="${PYTHON:-python}"
echo "Bundling Hermes GUI runtime"
echo "repo: $ROOT"
echo "out: $OUT"
rm -rf "$OUT"
mkdir -p "$OUT"
echo "→ Building dashboard"
npm --prefix "$ROOT/web" ci
npm --prefix "$ROOT/web" run build
cp -a "$ROOT/web/dist" "$OUT/web_dist"
echo "→ Building TUI"
npm --prefix "$ROOT/ui-tui" ci
npm --prefix "$ROOT/ui-tui" run build
mkdir -p "$OUT/ui-tui"
cp -a "$ROOT/ui-tui/dist" "$OUT/ui-tui/dist"
cp -a "$ROOT/ui-tui/package.json" "$ROOT/ui-tui/package-lock.json" "$OUT/ui-tui/"
cp -a "$ROOT/ui-tui/node_modules" "$OUT/ui-tui/node_modules"
echo "→ Creating Python runtime"
"$PYTHON" -m venv "$OUT/venv"
"$OUT/venv/bin/python" -m pip install --upgrade pip
"$OUT/venv/bin/python" -m pip install -e "$ROOT[web,pty]"
cat > "$OUT/README.md" <<EOF
# Hermes GUI Runtime
Generated by apps/shared/bundle-runtime.sh.
Set HERMES_GUI_RUNTIME_DIR to this directory before launching the Tauri shell.
EOF
echo "✓ Runtime bundle ready: $OUT"
-33
View File
@@ -1,33 +0,0 @@
# GUI Runtime Contract
The GUI shell starts Hermes with a small, explicit environment.
## Environment
```text
HERMES_GUI=1
HERMES_WEB_DIST=<bundled web dist>
HERMES_TUI_DIR=<bundled ui-tui dir>
```
The native shell uses `127.0.0.1:9120` as its initial GUI port during dev.
Bundled builds should keep the port private to the local machine and expose it
through `/api/health` and `/api/runtime`.
The shell should also pass the selected profile through the normal Hermes CLI
profile mechanism once the profile picker is wired.
## Ports
Use `127.0.0.1` only. Start with the GUI default port, then fall back to a
free port if occupied. Show the chosen port in the tray menu.
## User Data
The installer owns app files. Hermes owns user state under `HERMES_HOME`.
Uninstallers must not delete user state unless the user explicitly asks.
## Update Model
MVP does not use Tauri's native updater. GUI runs `hermes update`, tails the
action log, notifies completion, then offers to restart the runtime.
+6 -2
View File
@@ -951,9 +951,13 @@ class BatchRunner:
root_logger.setLevel(original_level)
# Aggregate all batch statistics and update checkpoint
all_completed_prompts = list(completed_prompts_set)
total_reasoning_stats = {"total_assistant_turns": 0, "turns_with_reasoning": 0, "turns_without_reasoning": 0}
for batch_result in results:
# Add newly completed prompts
all_completed_prompts.extend(batch_result.get("completed_prompts", []))
# Aggregate tool stats
for tool_name, stats in batch_result.get("tool_stats", {}).items():
if tool_name not in total_tool_stats:
@@ -973,7 +977,7 @@ class BatchRunner:
# Save final checkpoint (best-effort; incremental writes already happened)
try:
checkpoint_data["completed_prompts"] = sorted(completed_prompts_set)
checkpoint_data["completed_prompts"] = all_completed_prompts
self._save_checkpoint(checkpoint_data, lock=checkpoint_lock)
except Exception as ckpt_err:
print(f"⚠️ Warning: Failed to save final checkpoint: {ckpt_err}")
+2 -19
View File
@@ -326,16 +326,6 @@ compression:
# To pin a specific model/provider for compression summaries, use the
# auxiliary section below (auxiliary.compression.provider / model).
# =============================================================================
# Anthropic prompt caching TTL
# =============================================================================
# When prompt caching is active (Claude via OpenRouter or native Anthropic),
# Anthropic supports two TTL tiers for cached prefixes: "5m" (default) and
# "1h". Other values are ignored and "5m" is used.
#
prompt_caching:
cache_ttl: "5m" # use "1h" for long sessions with pauses between turns
# =============================================================================
# Auxiliary Models (Advanced — Experimental)
# =============================================================================
@@ -790,16 +780,9 @@ code_execution:
# Supports single tasks and batch mode (default 3 parallel, configurable).
delegation:
max_iterations: 50 # Max tool-calling turns per child (default: 50)
# max_concurrent_children: 3 # Max parallel child agents per batch (default: 3, floor: 1, no ceiling).
# WARNING: values above 10 multiply API cost linearly.
# max_spawn_depth: 1 # Delegation tree depth cap (range: 1-3, default: 1 = flat).
# Raise to 2 to allow workers to spawn their own subagents.
# Requires role="orchestrator" on intermediate agents.
# max_concurrent_children: 3 # Max parallel child agents (default: 3)
# max_spawn_depth: 1 # Tree depth cap (1-3, default: 1 = flat). Raise to 2 or 3 to allow orchestrator children to spawn their own workers.
# orchestrator_enabled: true # Kill switch for role="orchestrator" children (default: true).
# subagent_auto_approve: false # When a subagent hits a dangerous-command approval prompt, auto-deny (default: false)
# or auto-approve "once" (true) instead of blocking on stdin.
# The parent TUI owns stdin, so blocking would deadlock; non-interactive resolution is required.
# Both choices emit a logger.warning audit line. Flip to true only for cron/batch pipelines.
# inherit_mcp_toolsets: true # When explicit child toolsets are narrowed, also keep the parent's MCP toolsets (default: true). Set false for strict intersection.
# model: "google/gemini-3-flash-preview" # Override model for subagents (empty = inherit parent)
# provider: "openrouter" # Override provider for subagents (empty = inherit parent)
+179 -199
View File
@@ -1688,6 +1688,7 @@ def _looks_like_slash_command(text: str) -> bool:
from agent.skill_commands import (
scan_skill_commands,
build_skill_invocation_message,
build_plan_path,
build_preloaded_skills_prompt,
)
@@ -3083,8 +3084,6 @@ class HermesCLI:
format_runtime_provider_error,
)
_primary_exc = None
runtime = None
try:
runtime = resolve_runtime_provider(
requested=self.requested_provider,
@@ -3092,34 +3091,7 @@ class HermesCLI:
explicit_base_url=self._explicit_base_url,
)
except Exception as exc:
_primary_exc = exc
# Primary provider auth failed — try fallback providers before giving up.
if runtime is None and _primary_exc is not None:
from hermes_cli.auth import AuthError
if isinstance(_primary_exc, AuthError):
_fb_chain = self._fallback_model if isinstance(self._fallback_model, list) else []
for _fb in _fb_chain:
_fb_provider = (_fb.get("provider") or "").strip().lower()
_fb_model = (_fb.get("model") or "").strip()
if not _fb_provider or not _fb_model:
continue
try:
runtime = resolve_runtime_provider(requested=_fb_provider)
logger.warning(
"Primary provider auth failed (%s). Falling through to fallback: %s/%s",
_primary_exc, _fb_provider, _fb_model,
)
_cprint(f"⚠️ Primary auth failed — switching to fallback: {_fb_provider} / {_fb_model}")
self.requested_provider = _fb_provider
self.model = _fb_model
_primary_exc = None
break
except Exception:
continue
if runtime is None:
message = format_runtime_provider_error(_primary_exc) if _primary_exc else "Provider resolution failed."
message = format_runtime_provider_error(exc)
ChatConsole().print(f"[bold red]{message}[/]")
return False
@@ -3176,14 +3148,7 @@ class HermesCLI:
# the configured model (e.g. "qwen3.6-plus"), causing 400 errors.
runtime_model = runtime.get("model")
if runtime_model and isinstance(runtime_model, str):
# Only use runtime model if: model is unset, or model equals provider name
should_use_runtime_model = (
not self.model or # No model configured yet
self.model == self.provider or # Model is the provider slug
self.model == runtime.get("name") # Model matches provider display name
)
if should_use_runtime_model:
self.model = runtime_model
self.model = runtime_model
# If model is still empty (e.g. user ran `hermes auth add openai-codex`
# without `hermes model`), fall back to the provider's first catalog
@@ -3289,23 +3254,6 @@ class HermesCLI:
_cprint(f"\033[1;31mSession not found: {self.session_id}{_RST}")
_cprint(f"{_DIM}Use a session ID from a previous CLI run (hermes sessions list).{_RST}")
return False
# If the requested session is the (empty) head of a compression
# chain, walk to the descendant that actually holds the messages.
# See #15000 and SessionDB.resolve_resume_session_id.
try:
resolved_id = self._session_db.resolve_resume_session_id(self.session_id)
except Exception:
resolved_id = self.session_id
if resolved_id and resolved_id != self.session_id:
ChatConsole().print(
f"[{_DIM}]Session {_escape(self.session_id)} was compressed into "
f"{_escape(resolved_id)}; resuming the descendant with your "
f"transcript.[/]"
)
self.session_id = resolved_id
resolved_meta = self._session_db.get_session(self.session_id)
if resolved_meta:
session_meta = resolved_meta
restored = self._session_db.get_messages_as_conversation(self.session_id)
if restored:
restored = [m for m in restored if m.get("role") != "session_meta"]
@@ -3524,22 +3472,6 @@ class HermesCLI:
)
return False
# If the requested session is the (empty) head of a compression chain,
# walk to the descendant that actually holds the messages. See #15000.
try:
resolved_id = self._session_db.resolve_resume_session_id(self.session_id)
except Exception:
resolved_id = self.session_id
if resolved_id and resolved_id != self.session_id:
self._console_print(
f"[dim]Session {self.session_id} was compressed into "
f"{resolved_id}; resuming the descendant with your transcript.[/]"
)
self.session_id = resolved_id
resolved_meta = self._session_db.get_session(self.session_id)
if resolved_meta:
session_meta = resolved_meta
restored = self._session_db.get_messages_as_conversation(self.session_id)
if restored:
restored = [m for m in restored if m.get("role") != "session_meta"]
@@ -4668,6 +4600,10 @@ class HermesCLI:
def new_session(self, silent=False):
"""Start a fresh session with a new session ID and cleared agent state."""
if self.agent and self.conversation_history:
try:
self.agent.flush_memories(self.conversation_history)
except (Exception, KeyboardInterrupt):
pass
# Trigger memory extraction on the old session before session_id rotates.
self.agent.commit_memory_session(self.conversation_history)
self._notify_session_boundary("on_session_finalize")
@@ -4750,22 +4686,6 @@ class HermesCLI:
_cprint(" Use /history or `hermes sessions list` to see available sessions.")
return
# If the target is the empty head of a compression chain, redirect to
# the descendant that actually holds the transcript. See #15000.
try:
resolved_id = self._session_db.resolve_resume_session_id(target_id)
except Exception:
resolved_id = target_id
if resolved_id and resolved_id != target_id:
_cprint(
f" Session {target_id} was compressed into {resolved_id}; "
f"resuming the descendant with your transcript."
)
target_id = resolved_id
resolved_meta = self._session_db.get_session(target_id)
if resolved_meta:
session_meta = resolved_meta
if target_id == self.session_id:
_cprint(" Already on that session.")
return
@@ -5377,26 +5297,29 @@ class HermesCLI:
_cprint(f" ✓ Model switched: {result.new_model}")
_cprint(f" Provider: {provider_label}")
# Context: always resolve via the provider-aware chain so Codex OAuth,
# Copilot, and Nous-enforced caps win over the raw models.dev entry
# (e.g. gpt-5.5 is 1.05M on openai but 272K on Codex OAuth).
# Rich metadata from models.dev
mi = result.model_info
from hermes_cli.model_switch import resolve_display_context_length
ctx = resolve_display_context_length(
result.new_model,
result.target_provider,
base_url=result.base_url or self.base_url or "",
api_key=result.api_key or self.api_key or "",
model_info=mi,
)
if ctx:
_cprint(f" Context: {ctx:,} tokens")
if mi:
if mi.context_window:
_cprint(f" Context: {mi.context_window:,} tokens")
if mi.max_output:
_cprint(f" Max output: {mi.max_output:,} tokens")
if mi.has_cost_data():
_cprint(f" Cost: {mi.format_cost()}")
_cprint(f" Capabilities: {mi.format_capabilities()}")
else:
# Fallback to old context length lookup
try:
from agent.model_metadata import get_model_context_length
ctx = get_model_context_length(
result.new_model,
base_url=result.base_url or self.base_url,
api_key=result.api_key or self.api_key,
provider=result.target_provider,
)
_cprint(f" Context: {ctx:,} tokens")
except Exception:
pass
# Cache notice
cache_enabled = (
@@ -5455,6 +5378,79 @@ class HermesCLI:
except Exception:
return False
def _show_model_and_providers(self):
"""Show current model + provider and list all authenticated providers.
Shows current model + provider, then lists all authenticated
providers with their available models.
"""
from hermes_cli.models import (
curated_models_for_provider, list_available_providers,
normalize_provider, _PROVIDER_LABELS,
get_pricing_for_provider, format_model_pricing_table,
)
from hermes_cli.auth import resolve_provider as _resolve_provider
# Resolve current provider
raw_provider = normalize_provider(self.provider)
if raw_provider == "auto":
try:
current = _resolve_provider(
self.requested_provider,
explicit_api_key=self._explicit_api_key,
explicit_base_url=self._explicit_base_url,
)
except Exception:
current = "openrouter"
else:
current = raw_provider
current_label = _PROVIDER_LABELS.get(current, current)
print(f"\n Current: {self.model} via {current_label}")
print()
# Show all authenticated providers with their models
providers = list_available_providers()
authed = [p for p in providers if p["authenticated"]]
unauthed = [p for p in providers if not p["authenticated"]]
if authed:
print(" Authenticated providers & models:")
for p in authed:
is_active = p["id"] == current
marker = " ← active" if is_active else ""
print(f" [{p['id']}]{marker}")
curated = curated_models_for_provider(p["id"])
# Fetch pricing for providers that support it (openrouter, nous)
pricing_map = get_pricing_for_provider(p["id"]) if p["id"] in ("openrouter", "nous") else {}
if curated and pricing_map:
cur_model = self.model if is_active else ""
for line in format_model_pricing_table(curated, pricing_map, current_model=cur_model):
print(line)
elif curated:
for mid, desc in curated:
current_marker = " ← current" if (is_active and mid == self.model) else ""
print(f" {mid}{current_marker}")
elif p["id"] == "custom":
from hermes_cli.models import _get_custom_base_url
custom_url = _get_custom_base_url()
if custom_url:
print(f" endpoint: {custom_url}")
if is_active:
print(f" model: {self.model} ← current")
print(" (use hermes model to change)")
else:
print(" (use hermes model to change)")
print()
if unauthed:
names = ", ".join(p["label"] for p in unauthed)
print(f" Not configured: {names}")
print(" Run: hermes setup")
print()
print(" To change model or provider, use: hermes model")
def _output_console(self):
"""Use prompt_toolkit-safe Rich rendering once the TUI is live."""
if getattr(self, "_app", None):
@@ -6030,12 +6026,16 @@ class HermesCLI:
self._handle_resume_command(cmd_original)
elif canonical == "model":
self._handle_model_switch(cmd_original)
elif canonical == "provider":
self._show_model_and_providers()
elif canonical == "gquota":
self._handle_gquota_command(cmd_original)
elif canonical == "personality":
# Use original case (handler lowercases the personality name itself)
self._handle_personality_command(cmd_original)
elif canonical == "plan":
self._handle_plan_command(cmd_original)
elif canonical == "retry":
retry_msg = self.retry_last()
if retry_msg and hasattr(self, '_pending_input'):
@@ -6165,8 +6165,6 @@ class HermesCLI:
self._handle_skin_command(cmd_original)
elif canonical == "voice":
self._handle_voice_command(cmd_original)
elif canonical == "busy":
self._handle_busy_command(cmd_original)
else:
# Check for user-defined quick commands (bypass agent loop, no LLM call)
base_cmd = cmd_lower.split()[0]
@@ -6272,6 +6270,32 @@ class HermesCLI:
return True
def _handle_plan_command(self, cmd: str):
"""Handle /plan [request] — load the bundled plan skill."""
parts = cmd.strip().split(maxsplit=1)
user_instruction = parts[1].strip() if len(parts) > 1 else ""
plan_path = build_plan_path(user_instruction)
msg = build_skill_invocation_message(
"/plan",
user_instruction,
task_id=self.session_id,
runtime_note=(
"Save the markdown plan with write_file to this exact relative path "
f"inside the active workspace/backend cwd: {plan_path}"
),
)
if not msg:
ChatConsole().print("[bold red]Failed to load the bundled /plan skill[/]")
return
_cprint(f" 📝 Plan mode queued via skill. Markdown plan target: {plan_path}")
if hasattr(self, '_pending_input'):
self._pending_input.put(msg)
else:
ChatConsole().print("[bold red]Plan mode unavailable: input queue not initialized[/]")
def _handle_background_command(self, cmd: str):
"""Handle /background <prompt> — run a prompt in a separate background session.
@@ -6661,13 +6685,6 @@ class HermesCLI:
print(f" ⚠ Port {_port} is not reachable at {cdp_url}")
os.environ["BROWSER_CDP_URL"] = cdp_url
# Eagerly start the CDP supervisor so pending_dialogs + frame_tree
# show up in the next browser_snapshot. No-op if already started.
try:
from tools.browser_tool import _ensure_cdp_supervisor # type: ignore[import-not-found]
_ensure_cdp_supervisor("default")
except Exception:
pass
print()
print("🌐 Browser connected to live Chrome via CDP")
print(f" Endpoint: {cdp_url}")
@@ -6689,8 +6706,7 @@ class HermesCLI:
if current:
os.environ.pop("BROWSER_CDP_URL", None)
try:
from tools.browser_tool import cleanup_all_browsers, _stop_cdp_supervisor
_stop_cdp_supervisor("default")
from tools.browser_tool import cleanup_all_browsers
cleanup_all_browsers()
except Exception:
pass
@@ -6903,36 +6919,6 @@ class HermesCLI:
else:
_cprint(f" {_ACCENT}✓ Reasoning effort set to '{arg}' (session only){_RST}")
def _handle_busy_command(self, cmd: str):
"""Handle /busy — control what Enter does while Hermes is working.
Usage:
/busy Show current busy input mode
/busy status Show current busy input mode
/busy queue Queue input for the next turn instead of interrupting
/busy interrupt Interrupt the current run on Enter (default)
"""
parts = cmd.strip().split(maxsplit=1)
if len(parts) < 2 or parts[1].strip().lower() == "status":
_cprint(f" {_ACCENT}Busy input mode: {self.busy_input_mode}{_RST}")
_cprint(f" {_DIM}Enter while busy: {'queues for next turn' if self.busy_input_mode == 'queue' else 'interrupts current run'}{_RST}")
_cprint(f" {_DIM}Usage: /busy [queue|interrupt|status]{_RST}")
return
arg = parts[1].strip().lower()
if arg not in {"queue", "interrupt"}:
_cprint(f" {_DIM}(._.) Unknown argument: {arg}{_RST}")
_cprint(f" {_DIM}Usage: /busy [queue|interrupt|status]{_RST}")
return
self.busy_input_mode = arg
if save_config_value("display.busy_input_mode", arg):
behavior = "Enter will queue follow-up input while Hermes is busy." if arg == "queue" else "Enter will interrupt the current run while Hermes is busy."
_cprint(f" {_ACCENT}✓ Busy input mode set to '{arg}' (saved to config){_RST}")
_cprint(f" {_DIM}{behavior}{_RST}")
else:
_cprint(f" {_ACCENT}✓ Busy input mode set to '{arg}' (session only){_RST}")
def _handle_fast_command(self, cmd: str):
"""Handle /fast — toggle fast mode (OpenAI Priority Processing / Anthropic Fast Mode)."""
if not self._fast_command_available():
@@ -7011,52 +6997,51 @@ class HermesCLI:
focus_topic = parts[1].strip()
original_count = len(self.conversation_history)
with self._busy_command("Compressing context..."):
try:
from agent.model_metadata import estimate_messages_tokens_rough
from agent.manual_compression_feedback import summarize_manual_compression
original_history = list(self.conversation_history)
approx_tokens = estimate_messages_tokens_rough(original_history)
if focus_topic:
print(f"🗜️ Compressing {original_count} messages (~{approx_tokens:,} tokens), "
f"focus: \"{focus_topic}\"...")
else:
print(f"🗜️ Compressing {original_count} messages (~{approx_tokens:,} tokens)...")
try:
from agent.model_metadata import estimate_messages_tokens_rough
from agent.manual_compression_feedback import summarize_manual_compression
original_history = list(self.conversation_history)
approx_tokens = estimate_messages_tokens_rough(original_history)
if focus_topic:
print(f"🗜️ Compressing {original_count} messages (~{approx_tokens:,} tokens), "
f"focus: \"{focus_topic}\"...")
else:
print(f"🗜️ Compressing {original_count} messages (~{approx_tokens:,} tokens)...")
compressed, _ = self.agent._compress_context(
original_history,
self.agent._cached_system_prompt or "",
approx_tokens=approx_tokens,
focus_topic=focus_topic or None,
)
self.conversation_history = compressed
# _compress_context ends the old session and creates a new child
# session on the agent (run_agent.py::_compress_context). Sync the
# CLI's session_id so /status, /resume, exit summary, and title
# generation all point at the live continuation session, not the
# ended parent. Without this, subsequent end_session() calls target
# the already-closed parent and the child is orphaned.
if (
getattr(self.agent, "session_id", None)
and self.agent.session_id != self.session_id
):
self.session_id = self.agent.session_id
self._pending_title = None
new_tokens = estimate_messages_tokens_rough(self.conversation_history)
summary = summarize_manual_compression(
original_history,
self.conversation_history,
approx_tokens,
new_tokens,
)
icon = "🗜️" if summary["noop"] else ""
print(f" {icon} {summary['headline']}")
print(f" {summary['token_line']}")
if summary["note"]:
print(f" {summary['note']}")
compressed, _ = self.agent._compress_context(
original_history,
self.agent._cached_system_prompt or "",
approx_tokens=approx_tokens,
focus_topic=focus_topic or None,
)
self.conversation_history = compressed
# _compress_context ends the old session and creates a new child
# session on the agent (run_agent.py::_compress_context). Sync the
# CLI's session_id so /status, /resume, exit summary, and title
# generation all point at the live continuation session, not the
# ended parent. Without this, subsequent end_session() calls target
# the already-closed parent and the child is orphaned.
if (
getattr(self.agent, "session_id", None)
and self.agent.session_id != self.session_id
):
self.session_id = self.agent.session_id
self._pending_title = None
new_tokens = estimate_messages_tokens_rough(self.conversation_history)
summary = summarize_manual_compression(
original_history,
self.conversation_history,
approx_tokens,
new_tokens,
)
icon = "🗜️" if summary["noop"] else ""
print(f" {icon} {summary['headline']}")
print(f" {summary['token_line']}")
if summary["note"]:
print(f" {summary['note']}")
except Exception as e:
print(f" ❌ Compression failed: {e}")
except Exception as e:
print(f" ❌ Compression failed: {e}")
def _handle_debug_command(self):
"""Handle /debug — upload debug report + logs and print paste URLs."""
@@ -9558,20 +9543,9 @@ class HermesCLI:
@kb.add('c-d')
def handle_ctrl_d(event):
"""Ctrl+D: delete char under cursor (standard readline behaviour).
Only exit when the input is empty same as bash/zsh. Pending
attached images count as input and block the EOF-exit so the
user doesn't lose them silently.
"""
buf = event.app.current_buffer
if buf.text:
buf.delete()
elif self._attached_images:
# Empty text but pending attachments — no-op, don't exit.
return
else:
self._should_exit = True
event.app.exit()
"""Handle Ctrl+D - exit."""
self._should_exit = True
event.app.exit()
_modal_prompt_active = Condition(
lambda: bool(self._secret_state or self._sudo_state)
@@ -10784,6 +10758,12 @@ class HermesCLI:
self.agent.interrupt()
except Exception:
pass
# Flush memories before exit (only for substantial conversations)
if self.agent and self.conversation_history:
try:
self.agent.flush_memories(self.conversation_history)
except (Exception, KeyboardInterrupt):
pass
# Shut down voice recorder (release persistent audio stream)
if hasattr(self, '_voice_recorder') and self._voice_recorder:
try:
+1 -65
View File
@@ -16,7 +16,7 @@ import uuid
from datetime import datetime, timedelta
from pathlib import Path
from hermes_constants import get_hermes_home
from typing import Optional, Dict, List, Any, Union
from typing import Optional, Dict, List, Any
logger = logging.getLogger(__name__)
@@ -371,39 +371,6 @@ def save_jobs(jobs: List[Dict[str, Any]]):
raise
def _normalize_workdir(workdir: Optional[str]) -> Optional[str]:
"""Normalize and validate a cron job workdir.
Rules:
- Empty / None None (feature off, preserves old behaviour).
- ``~`` is expanded. Relative paths are rejected cron jobs run detached
from any shell cwd, so relative paths have no stable meaning.
- The path must exist and be a directory at create/update time. We do
NOT re-check at run time (a user might briefly unmount the dir; the
scheduler will just fall back to old behaviour with a logged warning).
Returns the absolute path string, or None when disabled.
Raises ValueError on invalid input.
"""
if workdir is None:
return None
raw = str(workdir).strip()
if not raw:
return None
expanded = Path(raw).expanduser()
if not expanded.is_absolute():
raise ValueError(
f"Cron workdir must be an absolute path (got {raw!r}). "
f"Cron jobs run detached from any shell cwd, so relative paths are ambiguous."
)
resolved = expanded.resolve()
if not resolved.exists():
raise ValueError(f"Cron workdir does not exist: {resolved}")
if not resolved.is_dir():
raise ValueError(f"Cron workdir is not a directory: {resolved}")
return str(resolved)
def create_job(
prompt: str,
schedule: str,
@@ -417,9 +384,7 @@ def create_job(
provider: Optional[str] = None,
base_url: Optional[str] = None,
script: Optional[str] = None,
context_from: Optional[Union[str, List[str]]] = None,
enabled_toolsets: Optional[List[str]] = None,
workdir: Optional[str] = None,
) -> Dict[str, Any]:
"""
Create a new cron job.
@@ -439,18 +404,9 @@ def create_job(
script: Optional path to a Python script whose stdout is injected into the
prompt each run. The script runs before the agent turn, and its output
is prepended as context. Useful for data collection / change detection.
context_from: Optional job ID (or list of job IDs) whose most recent output
is injected into the prompt as context before each run.
Useful for chaining cron jobs: job A finds data, job B processes it.
enabled_toolsets: Optional list of toolset names to restrict the agent to.
When set, only tools from these toolsets are loaded, reducing
token overhead. When omitted, all default tools are loaded.
workdir: Optional absolute path. When set, the job runs as if launched
from that directory: AGENTS.md / CLAUDE.md / .cursorrules from
that directory are injected into the system prompt, and the
terminal/file/code_exec tools use it as their working directory
(via TERMINAL_CWD). When unset, the old behaviour is preserved
(no context files injected, tools use the scheduler's cwd).
Returns:
The created job dict
@@ -483,15 +439,6 @@ def create_job(
normalized_script = normalized_script or None
normalized_toolsets = [str(t).strip() for t in enabled_toolsets if str(t).strip()] if enabled_toolsets else None
normalized_toolsets = normalized_toolsets or None
normalized_workdir = _normalize_workdir(workdir)
# Normalize context_from: accept str or list of str, store as list or None
if isinstance(context_from, str):
context_from = [context_from.strip()] if context_from.strip() else None
elif isinstance(context_from, list):
context_from = [str(j).strip() for j in context_from if str(j).strip()] or None
else:
context_from = None
label_source = (prompt or (normalized_skills[0] if normalized_skills else None)) or "cron job"
job = {
@@ -504,7 +451,6 @@ def create_job(
"provider": normalized_provider,
"base_url": normalized_base_url,
"script": normalized_script,
"context_from": context_from,
"schedule": parsed_schedule,
"schedule_display": parsed_schedule.get("display", schedule),
"repeat": {
@@ -525,7 +471,6 @@ def create_job(
"deliver": deliver,
"origin": origin, # Tracks where job was created for "origin" delivery
"enabled_toolsets": normalized_toolsets,
"workdir": normalized_workdir,
}
jobs = load_jobs()
@@ -559,15 +504,6 @@ def update_job(job_id: str, updates: Dict[str, Any]) -> Optional[Dict[str, Any]]
if job["id"] != job_id:
continue
# Validate / normalize workdir if present in updates. Empty string or
# None both mean "clear the field" (restore old behaviour).
if "workdir" in updates:
_wd = updates["workdir"]
if _wd in (None, "", False):
updates["workdir"] = None
else:
updates["workdir"] = _normalize_workdir(_wd)
updated = _apply_skill_fields({**job, **updates})
schedule_changed = "schedule" in updates
+9 -122
View File
@@ -671,47 +671,6 @@ def _build_job_prompt(job: dict, prerun_script: Optional[tuple] = None) -> str:
f"{prompt}"
)
# Inject output from referenced cron jobs as context.
context_from = job.get("context_from")
if context_from:
from cron.jobs import OUTPUT_DIR
if isinstance(context_from, str):
context_from = [context_from]
for source_job_id in context_from:
# Guard against path traversal — valid job IDs are 12-char hex strings
if not source_job_id or not all(c in "0123456789abcdef" for c in source_job_id):
logger.warning("context_from: skipping invalid job_id %r", source_job_id)
continue
try:
job_output_dir = OUTPUT_DIR / source_job_id
if not job_output_dir.exists():
continue # silent skip — no output yet
output_files = sorted(
job_output_dir.glob("*.md"),
key=lambda f: f.stat().st_mtime,
reverse=True,
)
if not output_files:
continue # silent skip — no output yet
latest_output = output_files[0].read_text(encoding="utf-8").strip()
# Truncate to 8K characters to avoid prompt bloat
_MAX_CONTEXT_CHARS = 8000
if len(latest_output) > _MAX_CONTEXT_CHARS:
latest_output = latest_output[:_MAX_CONTEXT_CHARS] + "\n\n[... output truncated ...]"
if latest_output:
prompt = (
f"## Output from job '{source_job_id}'\n"
"The following is the most recent output from a preceding "
"cron job. Use it as context for your analysis.\n\n"
f"```\n{latest_output}\n```\n\n"
f"{prompt}"
)
else:
continue # silent skip — empty output
except (OSError, PermissionError) as e:
logger.warning("context_from: failed to read output for job %r: %s", source_job_id, e)
# silent skip — do not pollute the prompt with error messages
# Always prepend cron execution guidance so the agent knows how
# delivery works and can suppress delivery when appropriate.
cron_hint = (
@@ -836,30 +795,6 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
chat_name=origin.get("chat_name", "") if origin else "",
)
# Per-job working directory. When set (and validated at create/update
# time), we point TERMINAL_CWD at it so:
# - build_context_files_prompt() picks up AGENTS.md / CLAUDE.md /
# .cursorrules from the job's project dir, AND
# - the terminal, file, and code-exec tools run commands from there.
#
# tick() serializes workdir-jobs outside the parallel pool, so mutating
# os.environ["TERMINAL_CWD"] here is safe for those jobs. For workdir-less
# jobs we leave TERMINAL_CWD untouched — preserves the original behaviour
# (skip_context_files=True, tools use whatever cwd the scheduler has).
_job_workdir = (job.get("workdir") or "").strip() or None
if _job_workdir and not Path(_job_workdir).is_dir():
# Directory was removed between create-time validation and now. Log
# and drop back to old behaviour rather than crashing the job.
logger.warning(
"Job '%s': configured workdir %r no longer exists — running without it",
job_id, _job_workdir,
)
_job_workdir = None
_prior_terminal_cwd = os.environ.get("TERMINAL_CWD", "_UNSET_")
if _job_workdir:
os.environ["TERMINAL_CWD"] = _job_workdir
logger.info("Job '%s': using workdir %s", job_id, _job_workdir)
try:
# Re-read .env and config.yaml fresh every run so provider/key
# changes take effect without a gateway restart.
@@ -936,7 +871,6 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
resolve_runtime_provider,
format_runtime_provider_error,
)
from hermes_cli.auth import AuthError
try:
runtime_kwargs = {
"requested": job.get("provider") or os.getenv("HERMES_INFERENCE_PROVIDER"),
@@ -944,28 +878,6 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
if job.get("base_url"):
runtime_kwargs["explicit_base_url"] = job.get("base_url")
runtime = resolve_runtime_provider(**runtime_kwargs)
except AuthError as auth_exc:
# Primary provider auth failed — try fallback chain before giving up.
logger.warning("Job '%s': primary auth failed (%s), trying fallback", job_id, auth_exc)
fb = _cfg.get("fallback_providers") or _cfg.get("fallback_model")
fb_list = (fb if isinstance(fb, list) else [fb]) if fb else []
runtime = None
for entry in fb_list:
if not isinstance(entry, dict):
continue
try:
fb_kwargs = {"requested": entry.get("provider")}
if entry.get("base_url"):
fb_kwargs["explicit_base_url"] = entry["base_url"]
if entry.get("api_key"):
fb_kwargs["explicit_api_key"] = entry["api_key"]
runtime = resolve_runtime_provider(**fb_kwargs)
logger.info("Job '%s': fallback resolved to %s", job_id, runtime.get("provider"))
break
except Exception as fb_exc:
logger.debug("Job '%s': fallback %s failed: %s", job_id, entry.get("provider"), fb_exc)
if runtime is None:
raise RuntimeError(format_runtime_provider_error(auth_exc)) from auth_exc
except Exception as exc:
message = format_runtime_provider_error(exc)
raise RuntimeError(message) from exc
@@ -1008,10 +920,7 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
enabled_toolsets=_resolve_cron_enabled_toolsets(job, _cfg),
disabled_toolsets=["cronjob", "messaging", "clarify"],
quiet_mode=True,
# When a workdir is configured, inject AGENTS.md / CLAUDE.md /
# .cursorrules from that directory; otherwise preserve the old
# behaviour (don't inject SOUL.md/AGENTS.md from the scheduler cwd).
skip_context_files=not bool(_job_workdir),
skip_context_files=True, # Don't inject SOUL.md/AGENTS.md from scheduler cwd
skip_memory=True, # Cron system prompts would corrupt user representations
platform="cron",
session_id=_cron_session_id,
@@ -1150,14 +1059,6 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
return False, output, "", error_msg
finally:
# Restore TERMINAL_CWD to whatever it was before this job ran. We
# only ever mutate it when the job has a workdir; see the setup block
# at the top of run_job for the serialization guarantee.
if _job_workdir:
if _prior_terminal_cwd == "_UNSET_":
os.environ.pop("TERMINAL_CWD", None)
else:
os.environ["TERMINAL_CWD"] = _prior_terminal_cwd
# Clean up ContextVar session/delivery state for this job.
clear_session_vars(_ctx_tokens)
if _session_db:
@@ -1285,28 +1186,14 @@ def tick(verbose: bool = True, adapters=None, loop=None) -> int:
mark_job_run(job["id"], False, str(e))
return False
# Partition due jobs: those with a per-job workdir mutate
# os.environ["TERMINAL_CWD"] inside run_job, which is process-global —
# so they MUST run sequentially to avoid corrupting each other. Jobs
# without a workdir leave env untouched and stay parallel-safe.
workdir_jobs = [j for j in due_jobs if (j.get("workdir") or "").strip()]
parallel_jobs = [j for j in due_jobs if not (j.get("workdir") or "").strip()]
_results: list = []
# Sequential pass for workdir jobs.
for job in workdir_jobs:
_ctx = contextvars.copy_context()
_results.append(_ctx.run(_process_job, job))
# Parallel pass for the rest — same behaviour as before.
if parallel_jobs:
with concurrent.futures.ThreadPoolExecutor(max_workers=_max_workers) as _tick_pool:
_futures = []
for job in parallel_jobs:
_ctx = contextvars.copy_context()
_futures.append(_tick_pool.submit(_ctx.run, _process_job, job))
_results.extend(f.result() for f in _futures)
# Run all due jobs concurrently, each in its own ContextVar copy
# so session/delivery state stays isolated per-thread.
with concurrent.futures.ThreadPoolExecutor(max_workers=_max_workers) as _tick_pool:
_futures = []
for job in due_jobs:
_ctx = contextvars.copy_context()
_futures.append(_tick_pool.submit(_ctx.run, _process_job, job))
_results = [f.result() for f in _futures]
return sum(_results)
finally:
-52
View File
@@ -1,52 +0,0 @@
#
# docker-compose.yml for Hermes Agent
#
# Usage:
# HERMES_UID=$(id -u) HERMES_GID=$(id -g) docker compose up -d
#
# Set HERMES_UID / HERMES_GID to the host user that owns ~/.hermes so
# files created inside the container stay readable/writable on the host.
# The entrypoint remaps the internal `hermes` user to these values via
# usermod/groupmod + gosu.
#
# Security notes:
# - The dashboard service binds to 127.0.0.1 by default. It stores API
# keys; exposing it on LAN without auth is unsafe. If you want remote
# access, use an SSH tunnel or put it behind a reverse proxy that
# adds authentication — do NOT pass --insecure --host 0.0.0.0.
# - The gateway's API server is off unless you uncomment API_SERVER_KEY
# and API_SERVER_HOST. See docs/user-guide/api-server.md before doing
# this on an internet-facing host.
#
services:
gateway:
build: .
image: hermes-agent
container_name: hermes
restart: unless-stopped
network_mode: host
volumes:
- ~/.hermes:/opt/data
environment:
- HERMES_UID=${HERMES_UID:-10000}
- HERMES_GID=${HERMES_GID:-10000}
# To expose the OpenAI-compatible API server beyond localhost,
# uncomment BOTH lines (API_SERVER_KEY is mandatory for auth):
# - API_SERVER_HOST=0.0.0.0
# - API_SERVER_KEY=${API_SERVER_KEY}
command: ["gateway", "run"]
dashboard:
image: hermes-agent
container_name: hermes-dashboard
restart: unless-stopped
network_mode: host
depends_on:
- gateway
volumes:
- ~/.hermes:/opt/data
environment:
- HERMES_UID=${HERMES_UID:-10000}
- HERMES_GID=${HERMES_GID:-10000}
# Localhost-only. For remote access, tunnel via `ssh -L 9119:localhost:9119`.
command: ["dashboard", "--host", "127.0.0.1", "--no-open"]
+2 -11
View File
@@ -22,18 +22,9 @@ if [ "$(id -u)" = "0" ]; then
groupmod -o -g "$HERMES_GID" hermes 2>/dev/null || true
fi
# Fix ownership of the data volume. When HERMES_UID remaps the hermes user,
# files created by previous runs (under the old UID) become inaccessible.
# Always chown -R when UID was remapped; otherwise only if top-level is wrong.
actual_hermes_uid=$(id -u hermes)
needs_chown=false
if [ -n "$HERMES_UID" ] && [ "$HERMES_UID" != "10000" ]; then
needs_chown=true
elif [ "$(stat -c %u "$HERMES_HOME" 2>/dev/null)" != "$actual_hermes_uid" ]; then
needs_chown=true
fi
if [ "$needs_chown" = true ]; then
echo "Fixing ownership of $HERMES_HOME to hermes ($actual_hermes_uid)"
if [ "$(stat -c %u "$HERMES_HOME" 2>/dev/null)" != "$actual_hermes_uid" ]; then
echo "$HERMES_HOME is not owned by $actual_hermes_uid, fixing"
# In rootless Podman the container's "root" is mapped to an unprivileged
# host UID — chown will fail. That's fine: the volume is already owned
# by the mapped user on the host side.
+3 -8
View File
@@ -135,7 +135,7 @@ class SessionResetPolicy:
mode=mode if mode is not None else "both",
at_hour=at_hour if at_hour is not None else 4,
idle_minutes=idle_minutes if idle_minutes is not None else 1440,
notify=_coerce_bool(notify, True),
notify=notify if notify is not None else True,
notify_exclude_platforms=tuple(exclude) if exclude is not None else ("api_server", "webhook"),
)
@@ -178,7 +178,7 @@ class PlatformConfig:
home_channel = HomeChannel.from_dict(data["home_channel"])
return cls(
enabled=_coerce_bool(data.get("enabled"), False),
enabled=data.get("enabled", False),
token=data.get("token"),
api_key=data.get("api_key"),
home_channel=home_channel,
@@ -435,7 +435,7 @@ class GatewayConfig:
reset_triggers=data.get("reset_triggers", ["/new", "/reset"]),
quick_commands=quick_commands,
sessions_dir=sessions_dir,
always_log_local=_coerce_bool(data.get("always_log_local"), True),
always_log_local=data.get("always_log_local", True),
stt_enabled=_coerce_bool(stt_enabled, True),
group_sessions_per_user=_coerce_bool(group_sessions_per_user, True),
thread_sessions_per_user=_coerce_bool(thread_sessions_per_user, False),
@@ -687,11 +687,6 @@ def load_gateway_config() -> GatewayConfig:
os.environ["TELEGRAM_REACTIONS"] = str(telegram_cfg["reactions"]).lower()
if "proxy_url" in telegram_cfg and not os.getenv("TELEGRAM_PROXY"):
os.environ["TELEGRAM_PROXY"] = str(telegram_cfg["proxy_url"]).strip()
if "group_allowed_chats" in telegram_cfg and not os.getenv("TELEGRAM_GROUP_ALLOWED_USERS"):
gac = telegram_cfg["group_allowed_chats"]
if isinstance(gac, list):
gac = ",".join(str(v) for v in gac)
os.environ["TELEGRAM_GROUP_ALLOWED_USERS"] = str(gac)
if "disable_link_previews" in telegram_cfg:
plat_data = platforms_data.setdefault(Platform.TELEGRAM.value, {})
if not isinstance(plat_data, dict):
+22 -101
View File
@@ -1204,12 +1204,10 @@ class APIServerAdapter(BasePlatformAdapter):
If the client disconnects mid-stream, ``agent.interrupt()`` is
called so the agent stops issuing upstream LLM calls, then the
asyncio task is cancelled. When ``store=True`` an initial
``in_progress`` snapshot is persisted immediately after
``response.created`` and disconnects update it to an
``incomplete`` snapshot so GET /v1/responses/{id} and
``previous_response_id`` chaining still have something to
recover from.
asyncio task is cancelled. When ``store=True`` the full response
is persisted to the ResponseStore in a ``finally`` block so GET
/v1/responses/{id} and ``previous_response_id`` chaining work the
same as the batch path.
"""
import queue as _q
@@ -1271,60 +1269,6 @@ class APIServerAdapter(BasePlatformAdapter):
final_response_text = ""
agent_error: Optional[str] = None
usage: Dict[str, int] = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
terminal_snapshot_persisted = False
def _persist_response_snapshot(
response_env: Dict[str, Any],
*,
conversation_history_snapshot: Optional[List[Dict[str, Any]]] = None,
) -> None:
if not store:
return
if conversation_history_snapshot is None:
conversation_history_snapshot = list(conversation_history)
conversation_history_snapshot.append({"role": "user", "content": user_message})
self._response_store.put(response_id, {
"response": response_env,
"conversation_history": conversation_history_snapshot,
"instructions": instructions,
"session_id": session_id,
})
if conversation:
self._response_store.set_conversation(conversation, response_id)
def _persist_incomplete_if_needed() -> None:
"""Persist an ``incomplete`` snapshot if no terminal one was written.
Called from both the client-disconnect (``ConnectionResetError``)
and server-cancellation (``asyncio.CancelledError``) paths so
GET /v1/responses/{id} and ``previous_response_id`` chaining keep
working after abrupt stream termination.
"""
if not store or terminal_snapshot_persisted:
return
incomplete_text = "".join(final_text_parts) or final_response_text
incomplete_items: List[Dict[str, Any]] = list(emitted_items)
if incomplete_text:
incomplete_items.append({
"type": "message",
"role": "assistant",
"content": [{"type": "output_text", "text": incomplete_text}],
})
incomplete_env = _envelope("incomplete")
incomplete_env["output"] = incomplete_items
incomplete_env["usage"] = {
"input_tokens": usage.get("input_tokens", 0),
"output_tokens": usage.get("output_tokens", 0),
"total_tokens": usage.get("total_tokens", 0),
}
incomplete_history = list(conversation_history)
incomplete_history.append({"role": "user", "content": user_message})
if incomplete_text:
incomplete_history.append({"role": "assistant", "content": incomplete_text})
_persist_response_snapshot(
incomplete_env,
conversation_history_snapshot=incomplete_history,
)
try:
# response.created — initial envelope, status=in_progress
@@ -1334,7 +1278,6 @@ class APIServerAdapter(BasePlatformAdapter):
"type": "response.created",
"response": created_env,
})
_persist_response_snapshot(created_env)
last_activity = time.monotonic()
async def _open_message_item() -> None:
@@ -1591,18 +1534,6 @@ class APIServerAdapter(BasePlatformAdapter):
"output_tokens": usage.get("output_tokens", 0),
"total_tokens": usage.get("total_tokens", 0),
}
_failed_history = list(conversation_history)
_failed_history.append({"role": "user", "content": user_message})
if final_response_text or agent_error:
_failed_history.append({
"role": "assistant",
"content": final_response_text or agent_error,
})
_persist_response_snapshot(
failed_env,
conversation_history_snapshot=_failed_history,
)
terminal_snapshot_persisted = True
await _write_event("response.failed", {
"type": "response.failed",
"response": failed_env,
@@ -1615,24 +1546,30 @@ class APIServerAdapter(BasePlatformAdapter):
"output_tokens": usage.get("output_tokens", 0),
"total_tokens": usage.get("total_tokens", 0),
}
full_history = list(conversation_history)
full_history.append({"role": "user", "content": user_message})
if isinstance(result, dict) and result.get("messages"):
full_history.extend(result["messages"])
else:
full_history.append({"role": "assistant", "content": final_response_text})
_persist_response_snapshot(
completed_env,
conversation_history_snapshot=full_history,
)
terminal_snapshot_persisted = True
await _write_event("response.completed", {
"type": "response.completed",
"response": completed_env,
})
# Persist for future chaining / GET retrieval, mirroring
# the batch path behavior.
if store:
full_history = list(conversation_history)
full_history.append({"role": "user", "content": user_message})
if isinstance(result, dict) and result.get("messages"):
full_history.extend(result["messages"])
else:
full_history.append({"role": "assistant", "content": final_response_text})
self._response_store.put(response_id, {
"response": completed_env,
"conversation_history": full_history,
"instructions": instructions,
"session_id": session_id,
})
if conversation:
self._response_store.set_conversation(conversation, response_id)
except (ConnectionResetError, ConnectionAbortedError, BrokenPipeError, OSError):
_persist_incomplete_if_needed()
# Client disconnected — interrupt the agent so it stops
# making upstream LLM calls, then cancel the task.
agent = agent_ref[0] if agent_ref else None
@@ -1648,22 +1585,6 @@ class APIServerAdapter(BasePlatformAdapter):
except (asyncio.CancelledError, Exception):
pass
logger.info("SSE client disconnected; interrupted agent task %s", response_id)
except asyncio.CancelledError:
# Server-side cancellation (e.g. shutdown, request timeout) —
# persist an incomplete snapshot so GET /v1/responses/{id} and
# previous_response_id chaining still work, then re-raise so the
# runtime's cancellation semantics are respected.
_persist_incomplete_if_needed()
agent = agent_ref[0] if agent_ref else None
if agent is not None:
try:
agent.interrupt("SSE task cancelled")
except Exception:
pass
if not agent_task.done():
agent_task.cancel()
logger.info("SSE task cancelled; persisted incomplete snapshot for %s", response_id)
raise
return response
+3 -112
View File
@@ -148,102 +148,7 @@ def _detect_macos_system_proxy() -> str | None:
return None
def _split_host_port(value: str) -> tuple[str, int | None]:
raw = str(value or "").strip()
if not raw:
return "", None
if "://" in raw:
parsed = urlsplit(raw)
return (parsed.hostname or "").lower().rstrip("."), parsed.port
if raw.startswith("[") and "]" in raw:
host, _, rest = raw[1:].partition("]")
port = None
if rest.startswith(":") and rest[1:].isdigit():
port = int(rest[1:])
return host.lower().rstrip("."), port
if raw.count(":") == 1:
host, _, maybe_port = raw.rpartition(":")
if maybe_port.isdigit():
return host.lower().rstrip("."), int(maybe_port)
return raw.lower().strip("[]").rstrip("."), None
def _no_proxy_entries() -> list[str]:
entries: list[str] = []
for key in ("NO_PROXY", "no_proxy"):
raw = os.environ.get(key, "")
entries.extend(part.strip() for part in raw.split(",") if part.strip())
return entries
def _no_proxy_entry_matches(entry: str, host: str, port: int | None = None) -> bool:
token = str(entry or "").strip().lower()
if not token:
return False
if token == "*":
return True
token_host, token_port = _split_host_port(token)
if token_port is not None and port is not None and token_port != port:
return False
if token_port is not None and port is None:
return False
if not token_host:
return False
try:
network = ipaddress.ip_network(token_host, strict=False)
try:
return ipaddress.ip_address(host) in network
except ValueError:
return False
except ValueError:
pass
try:
token_ip = ipaddress.ip_address(token_host)
try:
return ipaddress.ip_address(host) == token_ip
except ValueError:
return False
except ValueError:
pass
if token_host.startswith("*."):
suffix = token_host[1:]
return host.endswith(suffix)
if token_host.startswith("."):
return host == token_host[1:] or host.endswith(token_host)
return host == token_host or host.endswith(f".{token_host}")
def should_bypass_proxy(target_hosts: str | list[str] | tuple[str, ...] | set[str] | None) -> bool:
"""Return True when NO_PROXY/no_proxy matches at least one target host.
Supports exact hosts, domain suffixes, wildcard suffixes, IP literals,
CIDR ranges, optional host:port entries, and ``*``.
"""
entries = _no_proxy_entries()
if not entries or not target_hosts:
return False
if isinstance(target_hosts, str):
candidates = [target_hosts]
else:
candidates = list(target_hosts)
for candidate in candidates:
host, port = _split_host_port(str(candidate))
if not host:
continue
if any(_no_proxy_entry_matches(entry, host, port) for entry in entries):
return True
return False
def resolve_proxy_url(
platform_env_var: str | None = None,
*,
target_hosts: str | list[str] | tuple[str, ...] | set[str] | None = None,
) -> str | None:
def resolve_proxy_url(platform_env_var: str | None = None) -> str | None:
"""Return a proxy URL from env vars, or macOS system proxy.
Check order:
@@ -251,26 +156,18 @@ def resolve_proxy_url(
1. HTTPS_PROXY / HTTP_PROXY / ALL_PROXY (and lowercase variants)
2. macOS system proxy via ``scutil --proxy`` (auto-detect)
Returns *None* if no proxy is found, or if NO_PROXY/no_proxy matches one
of ``target_hosts``.
Returns *None* if no proxy is found.
"""
if platform_env_var:
value = (os.environ.get(platform_env_var) or "").strip()
if value:
if should_bypass_proxy(target_hosts):
return None
return normalize_proxy_url(value)
for key in ("HTTPS_PROXY", "HTTP_PROXY", "ALL_PROXY",
"https_proxy", "http_proxy", "all_proxy"):
value = (os.environ.get(key) or "").strip()
if value:
if should_bypass_proxy(target_hosts):
return None
return normalize_proxy_url(value)
detected = normalize_proxy_url(_detect_macos_system_proxy())
if detected and should_bypass_proxy(target_hosts):
return None
return detected
return normalize_proxy_url(_detect_macos_system_proxy())
def proxy_kwargs_for_bot(proxy_url: str | None) -> dict:
@@ -2543,9 +2440,6 @@ class BasePlatformAdapter(ABC):
user_id_alt: Optional[str] = None,
chat_id_alt: Optional[str] = None,
is_bot: bool = False,
guild_id: Optional[str] = None,
parent_chat_id: Optional[str] = None,
message_id: Optional[str] = None,
) -> SessionSource:
"""Helper to build a SessionSource for this platform."""
# Normalize empty topic to None
@@ -2563,9 +2457,6 @@ class BasePlatformAdapter(ABC):
user_id_alt=user_id_alt,
chat_id_alt=chat_id_alt,
is_bot=is_bot,
guild_id=str(guild_id) if guild_id else None,
parent_chat_id=str(parent_chat_id) if parent_chat_id else None,
message_id=str(message_id) if message_id else None,
)
@abstractmethod
+2 -19
View File
@@ -99,7 +99,6 @@ def _normalize_server_url(raw: str) -> str:
class BlueBubblesAdapter(BasePlatformAdapter):
platform = Platform.BLUEBUBBLES
SUPPORTS_MESSAGE_EDITING = False
MAX_MESSAGE_LENGTH = MAX_TEXT_LENGTH
def __init__(self, config: PlatformConfig):
@@ -392,13 +391,6 @@ class BlueBubblesAdapter(BasePlatformAdapter):
# Text sending
# ------------------------------------------------------------------
@staticmethod
def truncate_message(content: str, max_length: int = MAX_TEXT_LENGTH) -> List[str]:
# Use the base splitter but skip pagination indicators — iMessage
# bubbles flow naturally without "(1/3)" suffixes.
chunks = BasePlatformAdapter.truncate_message(content, max_length)
return [re.sub(r"\s*\(\d+/\d+\)$", "", c) for c in chunks]
async def send(
self,
chat_id: str,
@@ -406,19 +398,10 @@ class BlueBubblesAdapter(BasePlatformAdapter):
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
text = self.format_message(content)
text = strip_markdown(content or "")
if not text:
return SendResult(success=False, error="BlueBubbles send requires text")
# Split on paragraph breaks first (double newlines) so each thought
# becomes its own iMessage bubble, then truncate any that are still
# too long.
paragraphs = [p.strip() for p in re.split(r'\n\s*\n', text) if p.strip()]
chunks: List[str] = []
for para in (paragraphs or [text]):
if len(para) <= self.MAX_MESSAGE_LENGTH:
chunks.append(para)
else:
chunks.extend(self.truncate_message(para, max_length=self.MAX_MESSAGE_LENGTH))
chunks = self.truncate_message(text, max_length=self.MAX_MESSAGE_LENGTH)
last = SendResult(success=True)
for chunk in chunks:
guid = await self._resolve_chat_guid(chat_id)
+11 -28
View File
@@ -2246,6 +2246,10 @@ class DiscordAdapter(BasePlatformAdapter):
async def slash_usage(interaction: discord.Interaction):
await self._run_simple_slash(interaction, "/usage")
@tree.command(name="provider", description="Show available providers")
async def slash_provider(interaction: discord.Interaction):
await self._run_simple_slash(interaction, "/provider")
@tree.command(name="help", description="Show available commands")
async def slash_help(interaction: discord.Interaction):
await self._run_simple_slash(interaction, "/help")
@@ -2715,12 +2719,7 @@ class DiscordAdapter(BasePlatformAdapter):
return os.getenv("DISCORD_REQUIRE_MENTION", "true").lower() not in ("false", "0", "no", "off")
def _discord_free_response_channels(self) -> set:
"""Return Discord channel IDs where no bot mention is required.
A single ``"*"`` entry (either from a list or a comma-separated
string) is preserved in the returned set so callers can short-circuit
on wildcard membership, consistent with ``allowed_channels``.
"""
"""Return Discord channel IDs where no bot mention is required."""
raw = self.config.extra.get("free_response_channels")
if raw is None:
raw = os.getenv("DISCORD_FREE_RESPONSE_CHANNELS", "")
@@ -3213,14 +3212,14 @@ class DiscordAdapter(BasePlatformAdapter):
allowed_channels_raw = os.getenv("DISCORD_ALLOWED_CHANNELS", "")
if allowed_channels_raw:
allowed_channels = {ch.strip() for ch in allowed_channels_raw.split(",") if ch.strip()}
if "*" not in allowed_channels and not (channel_ids & allowed_channels):
if not (channel_ids & allowed_channels):
logger.debug("[%s] Ignoring message in non-allowed channel: %s", self.name, channel_ids)
return
# Check ignored channels - never respond even when mentioned
ignored_channels_raw = os.getenv("DISCORD_IGNORED_CHANNELS", "")
ignored_channels = {ch.strip() for ch in ignored_channels_raw.split(",") if ch.strip()}
if "*" in ignored_channels or (channel_ids & ignored_channels):
if channel_ids & ignored_channels:
logger.debug("[%s] Ignoring message in ignored channel: %s", self.name, channel_ids)
return
@@ -3234,11 +3233,7 @@ class DiscordAdapter(BasePlatformAdapter):
voice_linked_ids = {str(ch_id) for ch_id in self._voice_text_channels.values()}
current_channel_id = str(message.channel.id)
is_voice_linked_channel = current_channel_id in voice_linked_ids
is_free_channel = (
"*" in free_channels
or bool(channel_ids & free_channels)
or is_voice_linked_channel
)
is_free_channel = bool(channel_ids & free_channels) or is_voice_linked_channel
# Skip the mention check if the message is in a thread where
# the bot has previously participated (auto-created or replied in).
@@ -3261,7 +3256,6 @@ class DiscordAdapter(BasePlatformAdapter):
if auto_thread and not skip_thread and not is_voice_linked_channel and not is_reply_message:
thread = await self._auto_create_thread(message)
if thread:
parent_channel_id = str(message.channel.id)
is_thread = True
thread_id = str(thread.id)
auto_threaded_channel = thread
@@ -3321,9 +3315,6 @@ class DiscordAdapter(BasePlatformAdapter):
thread_id=thread_id,
chat_topic=chat_topic,
is_bot=getattr(message.author, "bot", False),
guild_id=str(message.guild.id) if message.guild else None,
parent_chat_id=parent_channel_id,
message_id=str(message.id),
)
# Build media URLs -- download image attachments to local cache so the
@@ -3875,15 +3866,6 @@ if DISCORD_AVAILABLE:
self.resolved = True
model_id = interaction.data["values"][0]
self.clear_items()
await interaction.response.edit_message(
embed=discord.Embed(
title="⚙ Switching Model",
description=f"Switching to `{model_id}`...",
color=discord.Color.blue(),
),
view=None,
)
try:
result_text = await self.on_model_selected(
@@ -3894,13 +3876,14 @@ if DISCORD_AVAILABLE:
except Exception as exc:
result_text = f"Error switching model: {exc}"
await interaction.edit_original_response(
self.clear_items()
await interaction.response.edit_message(
embed=discord.Embed(
title="⚙ Model Switched",
description=result_text,
color=discord.Color.green(),
),
view=None,
view=self,
)
async def _on_back(self, interaction: discord.Interaction):
-14
View File
@@ -532,20 +532,6 @@ class MatrixAdapter(BasePlatformAdapter):
)
await crypto_store.open()
# Bind the store to the runtime device_id before any
# put_account() runs. PgCryptoStore defaults _device_id
# to "" and its crypto_account UPSERT never updates the
# device_id column on conflict — so once put_account
# writes blank, it stays blank forever. That breaks
# every downstream device-scoped olm operation: peer
# to-device ciphertext can't find our identity key and
# no megolm sessions ever land. Setting _device_id here
# (in-memory; the on-disk row may not exist yet) makes
# the first put_account write the correct value.
# DeviceID is a NewType(str) so plain str works at runtime.
if client.device_id:
await crypto_store.put_device_id(client.device_id)
crypto_state = _CryptoStateStore(state_store, self._joined_rooms)
olm = OlmMachine(client, crypto_store, crypto_state)
+1 -2
View File
@@ -703,6 +703,7 @@ class TelegramAdapter(BasePlatformAdapter):
"write_timeout": _env_float("HERMES_TELEGRAM_HTTP_WRITE_TIMEOUT", 20.0),
}
proxy_url = resolve_proxy_url("TELEGRAM_PROXY")
disable_fallback = (os.getenv("HERMES_TELEGRAM_DISABLE_FALLBACK_IPS", "").strip().lower() in ("1", "true", "yes", "on"))
fallback_ips = self._fallback_ips()
if not fallback_ips:
@@ -713,8 +714,6 @@ class TelegramAdapter(BasePlatformAdapter):
", ".join(fallback_ips),
)
proxy_targets = ["api.telegram.org", *fallback_ips]
proxy_url = resolve_proxy_url("TELEGRAM_PROXY", target_hosts=proxy_targets)
if fallback_ips and not proxy_url and not disable_fallback:
logger.info(
"[%s] Telegram fallback IPs active: %s",
+3 -3
View File
@@ -43,10 +43,10 @@ _DOH_PROVIDERS: list[dict] = [
_SEED_FALLBACK_IPS: list[str] = ["149.154.167.220"]
def _resolve_proxy_url(target_hosts=None) -> str | None:
def _resolve_proxy_url() -> str | None:
# Delegate to shared implementation (env vars + macOS system proxy detection)
from gateway.platforms.base import resolve_proxy_url
return resolve_proxy_url("TELEGRAM_PROXY", target_hosts=target_hosts)
return resolve_proxy_url("TELEGRAM_PROXY")
class TelegramFallbackTransport(httpx.AsyncBaseTransport):
@@ -60,7 +60,7 @@ class TelegramFallbackTransport(httpx.AsyncBaseTransport):
def __init__(self, fallback_ips: Iterable[str], **transport_kwargs):
self._fallback_ips = [ip for ip in dict.fromkeys(_normalize_fallback_ips(fallback_ips))]
proxy_url = _resolve_proxy_url(target_hosts=[_TELEGRAM_API_HOST, *self._fallback_ips])
proxy_url = _resolve_proxy_url()
if proxy_url and "proxy" not in transport_kwargs:
transport_kwargs["proxy"] = proxy_url
self._primary = httpx.AsyncHTTPTransport(**transport_kwargs)
+356 -221
View File
@@ -14,7 +14,6 @@ Usage:
"""
import asyncio
import dataclasses
import json
import logging
import os
@@ -298,16 +297,50 @@ from gateway.restart import (
)
from gateway.whatsapp_identity import (
canonical_whatsapp_identifier as _canonical_whatsapp_identifier, # noqa: F401
expand_whatsapp_aliases as _expand_whatsapp_auth_aliases,
normalize_whatsapp_identifier as _normalize_whatsapp_identifier,
)
def _normalize_whatsapp_identifier(value: str) -> str:
"""Strip WhatsApp JID/LID syntax down to its stable numeric identifier."""
return (
str(value or "")
.strip()
.replace("+", "", 1)
.split(":", 1)[0]
.split("@", 1)[0]
)
def _expand_whatsapp_auth_aliases(identifier: str) -> set:
"""Resolve WhatsApp phone/LID aliases using bridge session mapping files."""
normalized = _normalize_whatsapp_identifier(identifier)
if not normalized:
return set()
session_dir = _hermes_home / "whatsapp" / "session"
resolved = set()
queue = [normalized]
while queue:
current = queue.pop(0)
if not current or current in resolved:
continue
resolved.add(current)
for suffix in ("", "_reverse"):
mapping_path = session_dir / f"lid-mapping-{current}{suffix}.json"
if not mapping_path.exists():
continue
try:
mapped = _normalize_whatsapp_identifier(
json.loads(mapping_path.read_text(encoding="utf-8"))
)
except Exception:
continue
if mapped and mapped not in resolved:
queue.append(mapped)
return resolved
logger = logging.getLogger(__name__)
# Sentinel placed into _running_agents immediately when a session starts
# processing, *before* any await. Prevents a second message for the same
# session from bypassing the "already running" guard during the async gap
@@ -316,30 +349,16 @@ _AGENT_PENDING_SENTINEL = object()
def _resolve_runtime_agent_kwargs() -> dict:
"""Resolve provider credentials for gateway-created AIAgent instances.
If the primary provider fails with an authentication error, attempt to
resolve credentials using the fallback provider chain from config.yaml
before giving up.
"""
"""Resolve provider credentials for gateway-created AIAgent instances."""
from hermes_cli.runtime_provider import (
resolve_runtime_provider,
format_runtime_provider_error,
)
from hermes_cli.auth import AuthError
try:
runtime = resolve_runtime_provider(
requested=os.getenv("HERMES_INFERENCE_PROVIDER"),
)
except AuthError as auth_exc:
# Primary provider auth failed (expired token, revoked key, etc.).
# Try the fallback provider chain before raising.
logger.warning("Primary provider auth failed: %s — trying fallback", auth_exc)
fb_config = _try_resolve_fallback_provider()
if fb_config is not None:
return fb_config
raise RuntimeError(format_runtime_provider_error(auth_exc)) from auth_exc
except Exception as exc:
raise RuntimeError(format_runtime_provider_error(exc)) from exc
@@ -354,48 +373,6 @@ def _resolve_runtime_agent_kwargs() -> dict:
}
def _try_resolve_fallback_provider() -> dict | None:
"""Attempt to resolve credentials from the fallback_model/fallback_providers config."""
from hermes_cli.runtime_provider import resolve_runtime_provider
try:
import yaml as _y
cfg_path = _hermes_home / "config.yaml"
if not cfg_path.exists():
return None
with open(cfg_path, encoding="utf-8") as _f:
cfg = _y.safe_load(_f) or {}
fb = cfg.get("fallback_providers") or cfg.get("fallback_model")
if not fb:
return None
# Normalize to list
fb_list = fb if isinstance(fb, list) else [fb]
for entry in fb_list:
if not isinstance(entry, dict):
continue
try:
runtime = resolve_runtime_provider(
requested=entry.get("provider"),
explicit_base_url=entry.get("base_url"),
explicit_api_key=entry.get("api_key"),
)
logger.info("Fallback provider resolved: %s", runtime.get("provider"))
return {
"api_key": runtime.get("api_key"),
"base_url": runtime.get("base_url"),
"provider": runtime.get("provider"),
"api_mode": runtime.get("api_mode"),
"command": runtime.get("command"),
"args": list(runtime.get("args") or []),
"credential_pool": runtime.get("credential_pool"),
}
except Exception as fb_exc:
logger.debug("Fallback entry %s failed: %s", entry.get("provider"), fb_exc)
continue
except Exception:
pass
return None
def _build_media_placeholder(event) -> str:
"""Build a text placeholder for media-only events so they aren't dropped.
@@ -524,7 +501,7 @@ def _load_gateway_config() -> dict:
def _resolve_gateway_model(config: dict | None = None) -> str:
"""Read model from config.yaml — single source of truth.
Without this, temporary AIAgent instances (e.g. /compress) fall
Without this, temporary AIAgent instances (memory flush, /compress) fall
back to the hardcoded default which fails when the active provider is
openai-codex.
"""
@@ -915,6 +892,129 @@ class GatewayRunner:
e,
)
# -----------------------------------------------------------------
def _flush_memories_for_session(
self,
old_session_id: str,
session_key: Optional[str] = None,
):
"""Prompt the agent to save memories/skills before context is lost.
Synchronous worker meant to be called via run_in_executor from
an async context so it doesn't block the event loop.
"""
# Skip cron sessions — they run headless with no meaningful user
# conversation to extract memories from.
if old_session_id and old_session_id.startswith("cron_"):
logger.debug("Skipping memory flush for cron session: %s", old_session_id)
return
try:
history = self.session_store.load_transcript(old_session_id)
if not history or len(history) < 4:
return
from run_agent import AIAgent
model, runtime_kwargs = self._resolve_session_agent_runtime(
session_key=session_key,
)
if not runtime_kwargs.get("api_key"):
return
tmp_agent = AIAgent(
**runtime_kwargs,
model=model,
max_iterations=8,
quiet_mode=True,
skip_memory=True, # Flush agent — no memory provider
enabled_toolsets=["memory", "skills"],
session_id=old_session_id,
)
try:
# Fully silence the flush agent — quiet_mode only suppresses init
# messages; tool call output still leaks to the terminal through
# _safe_print → _print_fn. Set a no-op to prevent that.
tmp_agent._print_fn = lambda *a, **kw: None
# Build conversation history from transcript
msgs = [
{"role": m.get("role"), "content": m.get("content")}
for m in history
if m.get("role") in ("user", "assistant") and m.get("content")
]
# Read live memory state from disk so the flush agent can see
# what's already saved and avoid overwriting newer entries.
_current_memory = ""
try:
from tools.memory_tool import get_memory_dir
_mem_dir = get_memory_dir()
for fname, label in [
("MEMORY.md", "MEMORY (your personal notes)"),
("USER.md", "USER PROFILE (who the user is)"),
]:
fpath = _mem_dir / fname
if fpath.exists():
content = fpath.read_text(encoding="utf-8").strip()
if content:
_current_memory += f"\n\n## Current {label}:\n{content}"
except Exception:
pass # Non-fatal — flush still works, just without the guard
# Give the agent a real turn to think about what to save
flush_prompt = (
"[System: This session is about to be automatically reset due to "
"inactivity or a scheduled daily reset. The conversation context "
"will be cleared after this turn.\n\n"
"Review the conversation above and:\n"
"1. Save any important facts, preferences, or decisions to memory "
"(user profile or your notes) that would be useful in future sessions.\n"
"2. If you discovered a reusable workflow or solved a non-trivial "
"problem, consider saving it as a skill.\n"
"3. If nothing is worth saving, that's fine — just skip.\n\n"
)
if _current_memory:
flush_prompt += (
"IMPORTANT — here is the current live state of memory. Other "
"sessions, cron jobs, or the user may have updated it since this "
"conversation ended. Do NOT overwrite or remove entries unless "
"the conversation above reveals something that genuinely "
"supersedes them. Only add new information that is not already "
"captured below."
f"{_current_memory}\n\n"
)
flush_prompt += (
"Do NOT respond to the user. Just use the memory and skill_manage "
"tools if needed, then stop.]"
)
tmp_agent.run_conversation(
user_message=flush_prompt,
conversation_history=msgs,
)
finally:
self._cleanup_agent_resources(tmp_agent)
logger.info("Pre-reset memory flush completed for session %s", old_session_id)
except Exception as e:
logger.debug("Pre-reset memory flush failed for session %s: %s", old_session_id, e)
async def _async_flush_memories(
self,
old_session_id: str,
session_key: Optional[str] = None,
):
"""Run the sync memory flush in a thread pool so it won't block the event loop."""
loop = asyncio.get_running_loop()
await loop.run_in_executor(
None,
self._flush_memories_for_session,
old_session_id,
session_key,
)
@property
def should_exit_cleanly(self) -> bool:
return self._exit_cleanly
@@ -980,7 +1080,7 @@ class GatewayRunner:
if override_runtime.get("api_key"):
logger.debug(
"Session model override (fast): session=%s config_model=%s -> override_model=%s provider=%s",
resolved_session_key or "", model, override_model,
(resolved_session_key or "")[:30], model, override_model,
override_runtime.get("provider"),
)
return override_model, override_runtime
@@ -988,12 +1088,12 @@ class GatewayRunner:
# resolution and apply model/provider from the override on top.
logger.debug(
"Session model override (no api_key, fallback): session=%s config_model=%s override_model=%s",
resolved_session_key or "", model, override_model,
(resolved_session_key or "")[:30], model, override_model,
)
else:
logger.debug(
"No session model override: session=%s config_model=%s override_keys=%s",
resolved_session_key or "", model,
(resolved_session_key or "")[:30], model,
list(self._session_model_overrides.keys())[:5] if self._session_model_overrides else "[]",
)
@@ -1564,7 +1664,7 @@ class GatewayRunner:
continue
try:
agent.interrupt(reason)
logger.debug("Interrupted running agent for session %s during shutdown", session_key)
logger.debug("Interrupted running agent for session %s during shutdown", session_key[:20])
except Exception as e:
logger.debug("Failed interrupting agent during shutdown: %s", e)
@@ -1736,7 +1836,7 @@ class GatewayRunner:
logger.warning(
"Auto-suspended stuck session %s (active across %d "
"consecutive restarts — likely a stuck loop)",
session_key, counts[session_key],
session_key[:30], counts[session_key],
)
except Exception:
pass
@@ -2149,7 +2249,7 @@ class GatewayRunner:
except Exception as e:
logger.error("Recovered watcher setup error: %s", e)
# Start background session expiry watcher to finalize expired sessions
# Start background session expiry watcher for proactive memory flushing
asyncio.create_task(self._session_expiry_watcher())
# Start background reconnection watcher for platforms that failed at startup
@@ -2166,24 +2266,25 @@ class GatewayRunner:
return True
async def _session_expiry_watcher(self, interval: int = 300):
"""Background task that finalizes expired sessions.
"""Background task that proactively flushes memories for expired sessions.
Runs every `interval` seconds (default 5 min). For each session that
has expired according to its reset policy, flushes memories in a thread
pool and marks the session so it won't be flushed again.
Runs every ``interval`` seconds (default 5 min). For each session
whose reset policy has expired, invokes ``on_session_finalize``
hooks, cleans up the cached AIAgent's tool resources, evicts the
cache entry so it can be garbage-collected, and marks the session
so it won't be finalized again.
This means memories are already saved by the time the user sends their
next message, so there's no blocking delay.
"""
await asyncio.sleep(60) # initial delay — let the gateway fully start
_finalize_failures: dict[str, int] = {} # session_id -> consecutive failure count
_MAX_FINALIZE_RETRIES = 3
_flush_failures: dict[str, int] = {} # session_id -> consecutive failure count
_MAX_FLUSH_RETRIES = 3
while self._running:
try:
self.session_store._ensure_loaded()
# Collect expired sessions first, then log a single summary.
_expired_entries = []
for key, entry in list(self.session_store._entries.items()):
if entry.expiry_finalized:
if entry.memory_flushed:
continue
if not self.session_store._is_session_expired(entry):
continue
@@ -2201,23 +2302,13 @@ class GatewayRunner:
f"{p}:{c}" for p, c in sorted(_platforms.items())
)
logger.info(
"Session expiry: %d sessions to finalize (%s)",
"Session expiry: %d sessions to flush (%s)",
len(_expired_entries), _plat_summary,
)
for key, entry in _expired_entries:
try:
try:
from hermes_cli.plugins import invoke_hook as _invoke_hook
_parts = key.split(":")
_platform = _parts[2] if len(_parts) > 2 else ""
_invoke_hook(
"on_session_finalize",
session_id=entry.session_id,
platform=_platform,
)
except Exception:
pass
await self._async_flush_memories(entry.session_id, key)
# Shut down memory provider and close tool resources
# on the cached agent. Idle agents live in
# _agent_cache (not _running_agents), so look there.
@@ -2238,48 +2329,48 @@ class GatewayRunner:
# be garbage-collected. Otherwise the cache grows
# unbounded across the gateway's lifetime.
self._evict_cached_agent(key)
# Mark as finalized and persist to disk so the flag
# Mark as flushed and persist to disk so the flag
# survives gateway restarts.
with self.session_store._lock:
entry.expiry_finalized = True
entry.memory_flushed = True
self.session_store._save()
logger.debug(
"Session expiry finalized for %s",
"Memory flush completed for session %s",
entry.session_id,
)
_finalize_failures.pop(entry.session_id, None)
_flush_failures.pop(entry.session_id, None)
except Exception as e:
failures = _finalize_failures.get(entry.session_id, 0) + 1
_finalize_failures[entry.session_id] = failures
if failures >= _MAX_FINALIZE_RETRIES:
failures = _flush_failures.get(entry.session_id, 0) + 1
_flush_failures[entry.session_id] = failures
if failures >= _MAX_FLUSH_RETRIES:
logger.warning(
"Session finalize gave up after %d attempts for %s: %s. "
"Marking as finalized to prevent infinite retry loop.",
"Memory flush gave up after %d attempts for %s: %s. "
"Marking as flushed to prevent infinite retry loop.",
failures, entry.session_id, e,
)
with self.session_store._lock:
entry.expiry_finalized = True
entry.memory_flushed = True
self.session_store._save()
_finalize_failures.pop(entry.session_id, None)
_flush_failures.pop(entry.session_id, None)
else:
logger.debug(
"Session finalize failed (%d/%d) for %s: %s",
failures, _MAX_FINALIZE_RETRIES, entry.session_id, e,
"Memory flush failed (%d/%d) for %s: %s",
failures, _MAX_FLUSH_RETRIES, entry.session_id, e,
)
if _expired_entries:
_done = sum(
1 for _, e in _expired_entries if e.expiry_finalized
_flushed = sum(
1 for _, e in _expired_entries if e.memory_flushed
)
_failed = len(_expired_entries) - _done
_failed = len(_expired_entries) - _flushed
if _failed:
logger.info(
"Session expiry done: %d finalized, %d pending retry",
_done, _failed,
"Session expiry done: %d flushed, %d pending retry",
_flushed, _failed,
)
else:
logger.info(
"Session expiry done: %d finalized", _done,
"Session expiry done: %d flushed", _flushed,
)
# Sweep agents that have been idle beyond the TTL regardless
@@ -2556,7 +2647,7 @@ class GatewayRunner:
except Exception as _e:
logger.debug(
"mark_resume_pending failed for %s: %s",
_sk, _e,
_sk[:20], _e,
)
self._interrupt_running_agents(
_INTERRUPT_REASON_GATEWAY_RESTART if self._restart_requested else _INTERRUPT_REASON_GATEWAY_SHUTDOWN
@@ -2878,7 +2969,6 @@ class GatewayRunner:
Platform.QQBOT: "QQ_ALLOWED_USERS",
}
platform_group_env_map = {
Platform.TELEGRAM: "TELEGRAM_GROUP_ALLOWED_USERS",
Platform.QQBOT: "QQ_GROUP_ALLOWED_USERS",
}
platform_allow_all_map = {
@@ -2935,7 +3025,7 @@ class GatewayRunner:
# Check platform-specific and global allowlists
platform_allowlist = os.getenv(platform_env_map.get(source.platform, ""), "").strip()
group_allowlist = ""
if source.chat_type in {"group", "forum"}:
if source.chat_type == "group":
group_allowlist = os.getenv(platform_group_env_map.get(source.platform, ""), "").strip()
global_allowlist = os.getenv("GATEWAY_ALLOWED_USERS", "").strip()
@@ -2944,7 +3034,7 @@ class GatewayRunner:
return os.getenv("GATEWAY_ALLOW_ALL_USERS", "").lower() in ("true", "1", "yes")
# Some platforms authorize group traffic by chat ID rather than sender ID.
if group_allowlist and source.chat_type in {"group", "forum"} and source.chat_id:
if group_allowlist and source.chat_type == "group" and source.chat_id:
allowed_group_ids = {
chat_id.strip() for chat_id in group_allowlist.split(",") if chat_id.strip()
}
@@ -3055,50 +3145,7 @@ class GatewayRunner:
# Internal events (e.g. background-process completion notifications)
# are system-generated and must skip user authorization.
is_internal = bool(getattr(event, "internal", False))
# Fire pre_gateway_dispatch plugin hook for user-originated messages.
# Plugins receive the MessageEvent and may return a dict influencing flow:
# {"action": "skip", "reason": ...} -> drop (no reply, plugin handled)
# {"action": "rewrite", "text": ...} -> replace event.text, continue
# {"action": "allow"} / None -> normal dispatch
# Hook runs BEFORE auth so plugins can handle unauthorized senders
# (e.g. customer handover ingest) without triggering the pairing flow.
if not is_internal:
try:
from hermes_cli.plugins import invoke_hook as _invoke_hook
_hook_results = _invoke_hook(
"pre_gateway_dispatch",
event=event,
gateway=self,
session_store=self.session_store,
)
except Exception as _hook_exc:
logger.warning("pre_gateway_dispatch invocation failed: %s", _hook_exc)
_hook_results = []
for _result in _hook_results:
if not isinstance(_result, dict):
continue
_action = _result.get("action")
if _action == "skip":
logger.info(
"pre_gateway_dispatch skip: reason=%s platform=%s chat=%s",
_result.get("reason"),
source.platform.value if source.platform else "unknown",
source.chat_id or "unknown",
)
return None
if _action == "rewrite":
_new_text = _result.get("text")
if isinstance(_new_text, str):
event = dataclasses.replace(event, text=_new_text)
source = event.source
break
if _action == "allow":
break
if is_internal:
if getattr(event, "internal", False):
pass
elif source.user_id is None:
# Messages with no user identity (Telegram service messages,
@@ -3222,7 +3269,7 @@ class GatewayRunner:
logger.warning(
"Evicting stale _running_agents entry for %s "
"(age: %.0fs, idle: %.0fs, timeout: %.0fs)%s",
_quick_key, _stale_age, _stale_idle,
_quick_key[:30], _stale_age, _stale_idle,
_raw_stale_timeout, _stale_detail,
)
self._invalidate_session_run_generation(
@@ -3258,7 +3305,7 @@ class GatewayRunner:
interrupt_reason=_INTERRUPT_REASON_STOP,
invalidation_reason="stop_command",
)
logger.info("STOP for session %s — agent interrupted, session lock released", _quick_key)
logger.info("STOP for session %s — agent interrupted, session lock released", _quick_key[:20])
return "⚡ Stopped. You can continue this session."
# /reset and /new must bypass the running-agent guard so they
@@ -3324,7 +3371,7 @@ class GatewayRunner:
try:
accepted = running_agent.steer(steer_text)
except Exception as exc:
logger.warning("Steer failed for session %s: %s", _quick_key, exc)
logger.warning("Steer failed for session %s: %s", _quick_key[:20], exc)
return f"⚠️ Steer failed: {exc}"
if accepted:
preview = steer_text[:60] + ("..." if len(steer_text) > 60 else "")
@@ -3395,7 +3442,7 @@ class GatewayRunner:
# running-agent guard. Reject gracefully rather than falling
# through to interrupt + discard. Without this, commands
# like /model, /reasoning, /voice, /insights, /title,
# /resume, /retry, /undo, /compress, /usage,
# /resume, /retry, /undo, /compress, /usage, /provider,
# /reload-mcp, /sethome, /reset (all registered as Discord
# slash commands) would interrupt the agent AND get
# silently discarded by the slash-command safety net,
@@ -3407,7 +3454,7 @@ class GatewayRunner:
)
if event.message_type == MessageType.PHOTO:
logger.debug("PRIORITY photo follow-up for session %s — queueing without interrupt", _quick_key)
logger.debug("PRIORITY photo follow-up for session %s — queueing without interrupt", _quick_key[:20])
adapter = self.adapters.get(source.platform)
if adapter:
merge_pending_message_event(adapter._pending_messages, _quick_key, event)
@@ -3427,7 +3474,7 @@ class GatewayRunner:
logger.debug(
"Telegram follow-up arrived %.2fs after run start for %s — queueing without interrupt",
time.time() - _started_at,
_quick_key,
_quick_key[:20],
)
adapter = self.adapters.get(source.platform)
if adapter:
@@ -3445,7 +3492,7 @@ class GatewayRunner:
if event.get_command() == "stop":
# Force-clean the sentinel so the session is unlocked.
self._release_running_agent_state(_quick_key)
logger.info("HARD STOP (pending) for session %s — sentinel cleared", _quick_key)
logger.info("HARD STOP (pending) for session %s — sentinel cleared", _quick_key[:20])
return "⚡ Force-stopped. The agent was still starting — session unlocked."
# Queue the message so it will be picked up after the
# agent starts.
@@ -3466,11 +3513,7 @@ class GatewayRunner:
if self._queue_during_drain_enabled()
else f"⏳ Gateway is {self._status_action_gerund()} and is not accepting another turn right now."
)
if self._busy_input_mode == "queue":
logger.debug("PRIORITY queue follow-up for session %s", _quick_key)
self._queue_or_replace_pending_event(_quick_key, event)
return None
logger.debug("PRIORITY interrupt for session %s", _quick_key)
logger.debug("PRIORITY interrupt for session %s", _quick_key[:20])
running_agent.interrupt(event.text)
if _quick_key in self._pending_messages:
self._pending_messages[_quick_key] += "\n" + event.text
@@ -3586,9 +3629,34 @@ class GatewayRunner:
if canonical == "model":
return await self._handle_model_command(event)
if canonical == "provider":
return await self._handle_provider_command(event)
if canonical == "personality":
return await self._handle_personality_command(event)
if canonical == "plan":
try:
from agent.skill_commands import build_plan_path, build_skill_invocation_message
user_instruction = event.get_command_args().strip()
plan_path = build_plan_path(user_instruction)
event.text = build_skill_invocation_message(
"/plan",
user_instruction,
task_id=_quick_key,
runtime_note=(
"Save the markdown plan with write_file to this exact relative path "
f"inside the active workspace/backend cwd: {plan_path}"
),
)
if not event.text:
return "Failed to load the bundled /plan skill."
canonical = None
except Exception as e:
logger.exception("Failed to prepare /plan command")
return f"Failed to enter plan mode: {e}"
if canonical == "retry":
return await self._handle_retry_command(event)
@@ -4468,7 +4536,7 @@ class GatewayRunner:
if not self._is_session_run_current(_quick_key, run_generation):
logger.info(
"Discarding stale agent result for %s — generation %d is no longer current",
_quick_key or "?",
_quick_key[:20] if _quick_key else "?",
run_generation,
)
_stale_adapter = self.adapters.get(source.platform)
@@ -4519,7 +4587,7 @@ class GatewayRunner:
except Exception as _e:
logger.debug(
"clear_resume_pending failed for %s: %s",
session_key, _e,
session_key[:20], _e,
)
# Surface error details when the agent failed silently (final_response=None)
@@ -4896,11 +4964,19 @@ class GatewayRunner:
# Get existing session key
session_key = self._session_key_for_source(source)
self._invalidate_session_run_generation(session_key, reason="session_reset")
# Snapshot the old entry so on_session_finalize can report the
# expiring session id before reset_session() rotates it.
old_entry = self.session_store._entries.get(session_key)
# Flush memories in the background (fire-and-forget) so the user
# gets the "Session reset!" response immediately.
try:
old_entry = self.session_store._entries.get(session_key)
if old_entry:
_flush_task = asyncio.create_task(
self._async_flush_memories(old_entry.session_id, session_key)
)
self._background_tasks.add(_flush_task)
_flush_task.add_done_callback(self._background_tasks.discard)
except Exception as e:
logger.debug("Gateway memory flush on reset failed: %s", e)
# Close tool resources on the old agent (terminal sandboxes, browser
# daemons, background processes) before evicting from cache.
# Guard with getattr because test fixtures may skip __init__.
@@ -5158,7 +5234,7 @@ class GatewayRunner:
interrupt_reason=_INTERRUPT_REASON_STOP,
invalidation_reason="stop_command_pending",
)
logger.info("STOP (pending) for session %s — sentinel cleared", session_key)
logger.info("STOP (pending) for session %s — sentinel cleared", session_key[:20])
return "⚡ Stopped. The agent hadn't started yet — you can continue this session."
if agent:
# Force-clean the session lock so a truly hung agent doesn't
@@ -5526,17 +5602,9 @@ class GatewayRunner:
lines = [f"Model switched to `{result.new_model}`"]
lines.append(f"Provider: {plabel}")
mi = result.model_info
from hermes_cli.model_switch import resolve_display_context_length
ctx = resolve_display_context_length(
result.new_model,
result.target_provider,
base_url=result.base_url or current_base_url or "",
api_key=result.api_key or current_api_key or "",
model_info=mi,
)
if ctx:
lines.append(f"Context: {ctx:,} tokens")
if mi:
if mi.context_window:
lines.append(f"Context: {mi.context_window:,} tokens")
if mi.max_output:
lines.append(f"Max output: {mi.max_output:,} tokens")
if mi.has_cost_data():
@@ -5670,25 +5738,28 @@ class GatewayRunner:
lines = [f"Model switched to `{result.new_model}`"]
lines.append(f"Provider: {provider_label}")
# Context: always resolve via the provider-aware chain so Codex OAuth,
# Copilot, and Nous-enforced caps win over the raw models.dev entry.
# Rich metadata from models.dev
mi = result.model_info
from hermes_cli.model_switch import resolve_display_context_length
ctx = resolve_display_context_length(
result.new_model,
result.target_provider,
base_url=result.base_url or current_base_url or "",
api_key=result.api_key or current_api_key or "",
model_info=mi,
)
if ctx:
lines.append(f"Context: {ctx:,} tokens")
if mi:
if mi.context_window:
lines.append(f"Context: {mi.context_window:,} tokens")
if mi.max_output:
lines.append(f"Max output: {mi.max_output:,} tokens")
if mi.has_cost_data():
lines.append(f"Cost: {mi.format_cost()}")
lines.append(f"Capabilities: {mi.format_capabilities()}")
else:
try:
from agent.model_metadata import get_model_context_length
ctx = get_model_context_length(
result.new_model,
base_url=result.base_url or current_base_url,
api_key=result.api_key or current_api_key,
provider=result.target_provider,
)
lines.append(f"Context: {ctx:,} tokens")
except Exception:
pass
# Cache notice
cache_enabled = (
@@ -5708,6 +5779,63 @@ class GatewayRunner:
return "\n".join(lines)
async def _handle_provider_command(self, event: MessageEvent) -> str:
"""Handle /provider command - show available providers."""
import yaml
from hermes_cli.models import (
list_available_providers,
normalize_provider,
_PROVIDER_LABELS,
)
# Resolve current provider from config
current_provider = "openrouter"
model_cfg = {}
config_path = _hermes_home / 'config.yaml'
try:
if config_path.exists():
with open(config_path, encoding="utf-8") as f:
cfg = yaml.safe_load(f) or {}
model_cfg = cfg.get("model", {})
if isinstance(model_cfg, dict):
current_provider = model_cfg.get("provider", current_provider)
except Exception:
pass
current_provider = normalize_provider(current_provider)
if current_provider == "auto":
try:
from hermes_cli.auth import resolve_provider as _resolve_provider
current_provider = _resolve_provider(current_provider)
except Exception:
current_provider = "openrouter"
# Detect custom endpoint from config base_url
if current_provider == "openrouter":
_cfg_base = model_cfg.get("base_url", "") if isinstance(model_cfg, dict) else ""
if _cfg_base and "openrouter.ai" not in _cfg_base:
current_provider = "custom"
current_label = _PROVIDER_LABELS.get(current_provider, current_provider)
lines = [
f"🔌 **Current provider:** {current_label} (`{current_provider}`)",
"",
"**Available providers:**",
]
providers = list_available_providers()
for p in providers:
marker = " ← active" if p["id"] == current_provider else ""
auth = "" if p["authenticated"] else ""
aliases = f" _(also: {', '.join(p['aliases'])})_" if p["aliases"] else ""
lines.append(f"{auth} `{p['id']}` — {p['label']}{aliases}{marker}")
lines.append("")
lines.append("Switch: `/model provider:model-name`")
lines.append("Setup: `hermes setup`")
return "\n".join(lines)
async def _handle_personality_command(self, event: MessageEvent) -> str:
"""Handle /personality command - list or set a personality."""
import yaml
@@ -6974,7 +7102,10 @@ class GatewayRunner:
tmp_agent._print_fn = lambda *a, **kw: None
compressor = tmp_agent.context_compressor
if not compressor.has_content_to_compress(msgs):
compress_start = compressor.protect_first_n
compress_start = compressor._align_boundary_forward(msgs, compress_start)
compress_end = compressor._find_tail_cut_by_tokens(msgs, compress_start)
if compress_start >= compress_end:
return "Nothing to compress yet (the transcript is still all protected context)."
loop = asyncio.get_running_loop()
@@ -7100,25 +7231,29 @@ class GatewayRunner:
logger.debug("Failed to list titled sessions: %s", e)
return f"Could not list sessions: {e}"
# Resolve the name to a session ID.
# Resolve the name to a session ID
target_id = self._session_db.resolve_session_by_title(name)
if not target_id:
return (
f"No session found matching '**{name}**'.\n"
"Use `/resume` with no arguments to see available sessions."
)
# Compression creates child continuations that hold the live transcript.
# Follow that chain so gateway /resume matches CLI behavior (#15000).
try:
target_id = self._session_db.resolve_resume_session_id(target_id)
except Exception as e:
logger.debug("Failed to resolve resume continuation for %s: %s", target_id, e)
# Check if already on that session
current_entry = self.session_store.get_or_create_session(source)
if current_entry.session_id == target_id:
return f"📌 Already on session **{name}**."
# Flush memories for current session before switching
try:
_flush_task = asyncio.create_task(
self._async_flush_memories(current_entry.session_id, session_key)
)
self._background_tasks.add(_flush_task)
_flush_task.add_done_callback(self._background_tasks.discard)
except Exception as e:
logger.debug("Memory flush on resume failed: %s", e)
# Clear any running agent for this session key
self._release_running_agent_state(session_key)
@@ -8655,7 +8790,7 @@ class GatewayRunner:
if reason:
logger.info(
"Invalidated run generation for %s%d (%s)",
session_key,
session_key[:20],
generation,
reason,
)
@@ -9062,7 +9197,7 @@ class GatewayRunner:
if not _run_still_current():
logger.info(
"Discarding stale proxy stream for %s — generation %d is no longer current",
session_key or "?",
session_key[:20] if session_key else "?",
run_generation or 0,
)
return {
@@ -9126,7 +9261,7 @@ class GatewayRunner:
if not _run_still_current():
logger.info(
"Discarding stale proxy result for %s — generation %d is no longer current",
session_key or "?",
session_key[:20] if session_key else "?",
run_generation or 0,
)
return {
@@ -9568,7 +9703,7 @@ class GatewayRunner:
)
logger.debug(
"run_agent resolved: model=%s provider=%s session=%s",
model, runtime_kwargs.get("provider"), session_key or "",
model, runtime_kwargs.get("provider"), (session_key or "")[:30],
)
except Exception as exc:
return {
@@ -10179,7 +10314,7 @@ class GatewayRunner:
):
logger.info(
"Skipping stale agent promotion for %s — generation %s is no longer current",
session_key or "",
(session_key or "")[:20],
run_generation,
)
return
@@ -10326,7 +10461,7 @@ class GatewayRunner:
logger.info(
"Backup interrupt detected for session %s "
"(monitor task state: %s)",
session_key,
session_key[:20],
"done" if interrupt_monitor.done() else "running",
)
_backup_agent.interrupt(_bp_text)
@@ -10386,7 +10521,7 @@ class GatewayRunner:
logger.info(
"Backup interrupt detected for session %s "
"(monitor task state: %s)",
session_key,
session_key[:20],
"done" if interrupt_monitor.done() else "running",
)
_backup_agent.interrupt(_bp_text)
@@ -10488,7 +10623,7 @@ class GatewayRunner:
if _is_control_interrupt_message(interrupt_message):
logger.info(
"Ignoring control interrupt message for session %s: %s",
session_key or "?",
session_key[:20] if session_key else "?",
interrupt_message,
)
else:
@@ -10532,7 +10667,7 @@ class GatewayRunner:
if self._draining and (pending_event or pending):
logger.info(
"Discarding pending follow-up for session %s during gateway %s",
session_key or "?",
session_key[:20] if session_key else "?",
self._status_action_label(),
)
pending_event = None
@@ -10589,7 +10724,7 @@ class GatewayRunner:
try:
logger.info(
"Queued follow-up for session %s: final stream delivery not confirmed; sending first response before continuing.",
session_key or "?",
session_key[:20] if session_key else "?",
)
await adapter.send(
source.chat_id,
@@ -10601,7 +10736,7 @@ class GatewayRunner:
elif first_response:
logger.info(
"Queued follow-up for session %s: skipping resend because final streamed delivery was confirmed.",
session_key or "?",
session_key[:20] if session_key else "?",
)
# Release deferred bg-review notifications now that the
# first response has been delivered. Pop from the
@@ -10736,7 +10871,7 @@ class GatewayRunner:
if not _is_empty_sentinel and (_streamed or _previewed):
logger.info(
"Suppressing normal final send for session %s: final delivery already confirmed (streamed=%s previewed=%s).",
session_key or "?",
session_key[:20] if session_key else "?",
_streamed,
_previewed,
)
+16 -97
View File
@@ -60,10 +60,6 @@ from .config import (
SessionResetPolicy, # noqa: F401 — re-exported via gateway/__init__.py
HomeChannel,
)
from .whatsapp_identity import (
canonical_whatsapp_identifier,
normalize_whatsapp_identifier,
)
@dataclass
@@ -87,9 +83,6 @@ class SessionSource:
user_id_alt: Optional[str] = None # Platform-specific stable alt ID (Signal UUID, Feishu union_id)
chat_id_alt: Optional[str] = None # Signal group internal ID
is_bot: bool = False # True when the message author is a bot/webhook (Discord)
guild_id: Optional[str] = None # Discord guild / Slack workspace / Matrix server scope
parent_chat_id: Optional[str] = None # Parent channel when chat_id refers to a thread
message_id: Optional[str] = None # ID of the triggering message (for pin/reply/react)
@property
def description(self) -> str:
@@ -127,14 +120,8 @@ class SessionSource:
d["user_id_alt"] = self.user_id_alt
if self.chat_id_alt:
d["chat_id_alt"] = self.chat_id_alt
if self.guild_id:
d["guild_id"] = self.guild_id
if self.parent_chat_id:
d["parent_chat_id"] = self.parent_chat_id
if self.message_id:
d["message_id"] = self.message_id
return d
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "SessionSource":
return cls(
@@ -148,9 +135,6 @@ class SessionSource:
chat_topic=data.get("chat_topic"),
user_id_alt=data.get("user_id_alt"),
chat_id_alt=data.get("chat_id_alt"),
guild_id=data.get("guild_id"),
parent_chat_id=data.get("parent_chat_id"),
message_id=data.get("message_id"),
)
@@ -202,31 +186,6 @@ that requires raw IDs). Discord is excluded because mentions use ``<@user_id>``
and the LLM needs the real ID to tag users."""
def _discord_tools_loaded() -> bool:
"""True iff the agent will actually have Discord tools this session.
Two conditions must hold:
1. The `discord` or `discord_admin` toolset is enabled for the
Discord platform via `hermes tools` (opt-in, default OFF).
2. `DISCORD_BOT_TOKEN` is set the tool's `check_fn` gates on it
at registry time, so the toolset being enabled in config is not
enough if the token isn't configured.
Returns False (safe default keeps the stale-API disclaimer) on any
error so a bad config can't silently promise tools the agent lacks.
"""
if not (os.environ.get("DISCORD_BOT_TOKEN") or "").strip():
return False
try:
from hermes_cli.config import load_config
from hermes_cli.tools_config import _get_platform_tools
cfg = load_config()
enabled = _get_platform_tools(cfg, "discord", include_default_mcp_servers=False)
return "discord" in enabled or "discord_admin" in enabled
except Exception:
return False
def build_session_context_prompt(
context: SessionContext,
*,
@@ -314,44 +273,13 @@ def build_session_context_prompt(
"that you can only read messages sent directly to you and respond."
)
elif context.source.platform == Platform.DISCORD:
# Inject the Discord IDs block only when the agent actually has
# Discord tools loaded this session — i.e. the user opted into
# `discord` / `discord_admin` via `hermes tools` AND the bot
# token is configured. Otherwise keep the stale-API disclaimer
# honest so we never promise tools the agent lacks.
if _discord_tools_loaded():
src = context.source
id_lines = ["", "**Discord IDs (for the `discord` / `discord_admin` tools):**"]
if src.guild_id:
id_lines.append(f" - Guild: `{src.guild_id}`")
if src.thread_id and src.parent_chat_id:
id_lines.append(f" - Parent channel: `{src.parent_chat_id}`")
id_lines.append(f" - Thread: `{src.thread_id}` (use as `channel_id` for fetch_messages etc.)")
else:
id_lines.append(f" - Channel: `{src.chat_id}`")
if src.message_id:
id_lines.append(f" - Triggering message: `{src.message_id}`")
lines.extend(id_lines)
else:
lines.append("")
lines.append(
"**Platform notes:** You are running inside Discord. "
"You do NOT have access to Discord-specific APIs — you cannot search "
"channel history, pin messages, manage roles, or list server members. "
"Do not promise to perform these actions. If the user asks, explain "
"that you can only read messages sent directly to you and respond."
)
elif context.source.platform == Platform.BLUEBUBBLES:
lines.append("")
lines.append(
"**Platform notes:** You are responding via iMessage. "
"Keep responses short and conversational — think texts, not essays. "
"Structure longer replies as separate short thoughts, each separated "
"by a blank line (double newline). Each block between blank lines "
"will be delivered as its own iMessage bubble, so write accordingly: "
"one idea per bubble, 13 sentences each. "
"If the user needs a detailed answer, give the short version first "
"and offer to elaborate."
"**Platform notes:** You are running inside Discord. "
"You do NOT have access to Discord-specific APIs — you cannot search "
"channel history, pin messages, manage roles, or list server members. "
"Do not promise to perform these actions. If the user asks, explain "
"that you can only read messages sent directly to you and respond."
)
# Connected platforms
@@ -439,11 +367,11 @@ class SessionEntry:
auto_reset_reason: Optional[str] = None # "idle" or "daily"
reset_had_activity: bool = False # whether the expired session had any messages
# Set by the background expiry watcher after it finalizes an expired
# session (invoking on_session_finalize hooks and evicting the cached
# agent). Persisted to sessions.json so the flag survives gateway
# restarts — prevents redundant finalization runs.
expiry_finalized: bool = False
# Set by the background expiry watcher after it successfully flushes
# memories for this session. Persisted to sessions.json so the flag
# survives gateway restarts (the old in-memory _pre_flushed_sessions
# set was lost on restart, causing redundant re-flushes).
memory_flushed: bool = False
# When True the next call to get_or_create_session() will auto-reset
# this session (create a new session_id) so the user starts fresh.
@@ -479,7 +407,7 @@ class SessionEntry:
"last_prompt_tokens": self.last_prompt_tokens,
"estimated_cost_usd": self.estimated_cost_usd,
"cost_status": self.cost_status,
"expiry_finalized": self.expiry_finalized,
"memory_flushed": self.memory_flushed,
"suspended": self.suspended,
"resume_pending": self.resume_pending,
"resume_reason": self.resume_reason,
@@ -531,7 +459,7 @@ class SessionEntry:
last_prompt_tokens=data.get("last_prompt_tokens", 0),
estimated_cost_usd=data.get("estimated_cost_usd", 0.0),
cost_status=data.get("cost_status", "unknown"),
expiry_finalized=data.get("expiry_finalized", data.get("memory_flushed", False)),
memory_flushed=data.get("memory_flushed", False),
suspended=data.get("suspended", False),
resume_pending=data.get("resume_pending", False),
resume_reason=data.get("resume_reason"),
@@ -590,24 +518,15 @@ def build_session_key(
"""
platform = source.platform.value
if source.chat_type == "dm":
dm_chat_id = source.chat_id
if source.platform == Platform.WHATSAPP:
dm_chat_id = canonical_whatsapp_identifier(source.chat_id)
if dm_chat_id:
if source.chat_id:
if source.thread_id:
return f"agent:main:{platform}:dm:{dm_chat_id}:{source.thread_id}"
return f"agent:main:{platform}:dm:{dm_chat_id}"
return f"agent:main:{platform}:dm:{source.chat_id}:{source.thread_id}"
return f"agent:main:{platform}:dm:{source.chat_id}"
if source.thread_id:
return f"agent:main:{platform}:dm:{source.thread_id}"
return f"agent:main:{platform}:dm"
participant_id = source.user_id_alt or source.user_id
if participant_id and source.platform == Platform.WHATSAPP:
# Same JID/LID-flip bug as the DM case: without canonicalisation, a
# single group member gets two isolated per-user sessions when the
# bridge reshuffles alias forms.
participant_id = canonical_whatsapp_identifier(str(participant_id)) or participant_id
key_parts = ["agent:main", platform, source.chat_type]
if source.chat_id:
-135
View File
@@ -1,135 +0,0 @@
"""Shared helpers for canonicalising WhatsApp sender identity.
WhatsApp's bridge can surface the same human under two different JID shapes
within a single conversation:
- LID form: ``999999999999999@lid``
- Phone form: ``15551234567@s.whatsapp.net``
Both the authorisation path (:mod:`gateway.run`) and the session-key path
(:mod:`gateway.session`) need to collapse these aliases to a single stable
identity. This module is the single source of truth for that resolution so
the two paths can never drift apart.
Public helpers:
- :func:`normalize_whatsapp_identifier` strip JID/LID/device/plus syntax
down to the bare numeric identifier.
- :func:`canonical_whatsapp_identifier` walk the bridge's
``lid-mapping-*.json`` files and return a stable canonical identity
across phone/LID variants.
- :func:`expand_whatsapp_aliases` return the full alias set for an
identifier. Used by authorisation code that needs to match any known
form of a sender against an allow-list.
Plugins that need per-sender behaviour on WhatsApp (role-based routing,
per-contact authorisation, policy gating in a gateway hook) should use
``canonical_whatsapp_identifier`` so their bookkeeping lines up with
Hermes' own session keys.
"""
from __future__ import annotations
import json
from typing import Set
from hermes_constants import get_hermes_home
def normalize_whatsapp_identifier(value: str) -> str:
"""Strip WhatsApp JID/LID syntax down to its stable numeric identifier.
Accepts any of the identifier shapes the WhatsApp bridge may emit:
``"60123456789@s.whatsapp.net"``, ``"60123456789:47@s.whatsapp.net"``,
``"60123456789@lid"``, or a bare ``"+601****6789"`` / ``"60123456789"``.
Returns just the numeric identifier (``"60123456789"``) suitable for
equality comparisons.
Useful for plugins that want to match sender IDs against
user-supplied config (phone numbers in ``config.yaml``) without
worrying about which variant the bridge happens to deliver.
"""
return (
str(value or "")
.strip()
.replace("+", "", 1)
.split(":", 1)[0]
.split("@", 1)[0]
)
def expand_whatsapp_aliases(identifier: str) -> Set[str]:
"""Resolve WhatsApp phone/LID aliases via bridge session mapping files.
Returns the set of all identifiers transitively reachable through the
bridge's ``$HERMES_HOME/whatsapp/session/lid-mapping-*.json`` files,
starting from ``identifier``. The result always includes the
normalized input itself, so callers can safely ``in`` check against
the return value without a separate fallback branch.
Returns an empty set if ``identifier`` normalizes to empty.
"""
normalized = normalize_whatsapp_identifier(identifier)
if not normalized:
return set()
session_dir = get_hermes_home() / "whatsapp" / "session"
resolved: Set[str] = set()
queue = [normalized]
while queue:
current = queue.pop(0)
if not current or current in resolved:
continue
resolved.add(current)
for suffix in ("", "_reverse"):
mapping_path = session_dir / f"lid-mapping-{current}{suffix}.json"
if not mapping_path.exists():
continue
try:
mapped = normalize_whatsapp_identifier(
json.loads(mapping_path.read_text(encoding="utf-8"))
)
except Exception:
continue
if mapped and mapped not in resolved:
queue.append(mapped)
return resolved
def canonical_whatsapp_identifier(identifier: str) -> str:
"""Return a stable WhatsApp sender identity across phone-JID/LID variants.
WhatsApp may surface the same person under either a phone-format JID
(``60123456789@s.whatsapp.net``) or a LID (``1234567890@lid``). This
applies to a DM ``chat_id`` *and* to the ``participant_id`` of a
member inside a group chat both represent a user identity, and the
bridge may flip between the two for the same human.
This helper reads the bridge's ``whatsapp/session/lid-mapping-*.json``
files, walks the mapping transitively, and picks the shortest
(numeric-preferred) alias as the canonical identity.
:func:`gateway.session.build_session_key` uses this for both WhatsApp
DM chat_ids and WhatsApp group participant_ids, so callers get the
same session-key identity Hermes itself uses.
Plugins that need per-sender behaviour (role-based routing,
authorisation, per-contact policy) should use this so their
bookkeeping lines up with Hermes' session bookkeeping even when
the bridge reshuffles aliases.
Returns an empty string if ``identifier`` normalizes to empty. If no
mapping files exist yet (fresh bridge install), returns the
normalized input unchanged.
"""
normalized = normalize_whatsapp_identifier(identifier)
if not normalized:
return ""
# expand_whatsapp_aliases always includes `normalized` itself in the
# returned set, so the min() below degrades gracefully to `normalized`
# when no lid-mapping files are present.
aliases = expand_whatsapp_aliases(normalized)
return min(aliases, key=lambda candidate: (len(candidate), candidate))
+99 -922
View File
File diff suppressed because it is too large Load Diff
+3 -72
View File
@@ -110,40 +110,18 @@ def _display_source(source: str) -> str:
return source.split(":", 1)[1] if source.startswith("manual:") else source
def _classify_exhausted_status(entry) -> tuple[str, bool]:
code = getattr(entry, "last_error_code", None)
reason = str(getattr(entry, "last_error_reason", "") or "").strip().lower()
message = str(getattr(entry, "last_error_message", "") or "").strip().lower()
if code == 429 or any(token in reason for token in ("rate_limit", "usage_limit", "quota", "exhausted")) or any(
token in message for token in ("rate limit", "usage limit", "quota", "too many requests")
):
return "rate-limited", True
if code in {401, 403} or any(token in reason for token in ("invalid_token", "invalid_grant", "unauthorized", "forbidden", "auth")) or any(
token in message for token in ("unauthorized", "forbidden", "expired", "revoked", "invalid token", "authentication")
):
return "auth failed", False
return "exhausted", True
def _format_exhausted_status(entry) -> str:
if entry.last_status != STATUS_EXHAUSTED:
return ""
label, show_retry_window = _classify_exhausted_status(entry)
reason = getattr(entry, "last_error_reason", None)
reason_text = f" {reason}" if isinstance(reason, str) and reason.strip() else ""
code = f" ({entry.last_error_code})" if entry.last_error_code else ""
if not show_retry_window:
return f" {label}{reason_text}{code} (re-auth may be required)"
exhausted_until = _exhausted_until(entry)
if exhausted_until is None:
return f" {label}{reason_text}{code}"
return f" exhausted{reason_text}{code}"
remaining = max(0, int(math.ceil(exhausted_until - time.time())))
if remaining <= 0:
return f" {label}{reason_text}{code} (ready to retry)"
return f" exhausted{reason_text}{code} (ready to retry)"
minutes, seconds = divmod(remaining, 60)
hours, minutes = divmod(minutes, 60)
days, hours = divmod(hours, 24)
@@ -155,7 +133,7 @@ def _format_exhausted_status(entry) -> str:
wait = f"{minutes}m {seconds}s"
else:
wait = f"{seconds}s"
return f" {label}{reason_text}{code} ({wait} left)"
return f" exhausted{reason_text}{code} ({wait} left)"
def auth_add_command(args) -> None:
@@ -408,44 +386,6 @@ def auth_reset_command(args) -> None:
print(f"Reset status on {count} {provider} credentials")
def auth_status_command(args) -> None:
provider = _normalize_provider(getattr(args, "provider", "") or "")
if not provider:
raise SystemExit("Provider is required. Example: `hermes auth status spotify`.")
status = auth_mod.get_auth_status(provider)
if not status.get("logged_in"):
reason = status.get("error")
if reason:
print(f"{provider}: logged out ({reason})")
else:
print(f"{provider}: logged out")
return
print(f"{provider}: logged in")
for key in ("auth_type", "client_id", "redirect_uri", "scope", "expires_at", "api_base_url"):
value = status.get(key)
if value:
print(f" {key}: {value}")
def auth_logout_command(args) -> None:
auth_mod.logout_command(SimpleNamespace(provider=getattr(args, "provider", None)))
def auth_spotify_command(args) -> None:
action = str(getattr(args, "spotify_action", "") or "login").strip().lower()
if action in {"", "login"}:
auth_mod.login_spotify_command(args)
return
if action == "status":
auth_status_command(SimpleNamespace(provider="spotify"))
return
if action == "logout":
auth_logout_command(SimpleNamespace(provider="spotify"))
return
raise SystemExit(f"Unknown Spotify auth action: {action}")
def _interactive_auth() -> None:
"""Interactive credential pool management when `hermes auth` is called bare."""
# Show current pool status first
@@ -643,14 +583,5 @@ def auth_command(args) -> None:
if action == "reset":
auth_reset_command(args)
return
if action == "status":
auth_status_command(args)
return
if action == "logout":
auth_logout_command(args)
return
if action == "spotify":
auth_spotify_command(args)
return
# No subcommand — launch interactive mode
_interactive_auth()
+1 -54
View File
@@ -238,52 +238,6 @@ def get_git_banner_state(repo_dir: Optional[Path] = None) -> Optional[dict]:
return {"upstream": upstream, "local": local, "ahead": max(ahead, 0)}
_RELEASE_URL_BASE = "https://github.com/NousResearch/hermes-agent/releases/tag"
_latest_release_cache: Optional[tuple] = None # (tag, url) once resolved
def get_latest_release_tag(repo_dir: Optional[Path] = None) -> Optional[tuple]:
"""Return ``(tag, release_url)`` for the latest git tag, or None.
Local-only runs ``git describe --tags --abbrev=0`` against the
Hermes checkout. Cached per-process. Release URL always points at the
canonical NousResearch/hermes-agent repo (forks don't get a link).
"""
global _latest_release_cache
if _latest_release_cache is not None:
return _latest_release_cache or None
repo_dir = repo_dir or _resolve_repo_dir()
if repo_dir is None:
_latest_release_cache = () # falsy sentinel — skip future lookups
return None
try:
result = subprocess.run(
["git", "describe", "--tags", "--abbrev=0"],
capture_output=True,
text=True,
timeout=3,
cwd=str(repo_dir),
)
except Exception:
_latest_release_cache = ()
return None
if result.returncode != 0:
_latest_release_cache = ()
return None
tag = (result.stdout or "").strip()
if not tag:
_latest_release_cache = ()
return None
url = f"{_RELEASE_URL_BASE}/{tag}"
_latest_release_cache = (tag, url)
return _latest_release_cache
def format_banner_version_label() -> str:
"""Return the version label shown in the startup banner title."""
base = f"Hermes Agent v{VERSION} ({RELEASE_DATE})"
@@ -565,16 +519,9 @@ def build_welcome_banner(console: Console, model: str, cwd: str,
agent_name = _skin_branding("agent_name", "Hermes Agent")
title_color = _skin_color("banner_title", "#FFD700")
border_color = _skin_color("banner_border", "#CD7F32")
version_label = format_banner_version_label()
release_info = get_latest_release_tag()
if release_info:
_tag, _url = release_info
title_markup = f"[bold {title_color}][link={_url}]{version_label}[/link][/]"
else:
title_markup = f"[bold {title_color}]{version_label}[/]"
outer_panel = Panel(
layout_table,
title=title_markup,
title=f"[bold {title_color}]{format_banner_version_label()}[/]",
border_style=border_color,
padding=(0, 2),
)
+8 -12
View File
@@ -77,7 +77,7 @@ COMMAND_REGISTRY: list[CommandDef] = [
CommandDef("rollback", "List or restore filesystem checkpoints", "Session",
args_hint="[number]"),
CommandDef("snapshot", "Create or restore state snapshots of Hermes config/state", "Session",
cli_only=True, aliases=("snap",), args_hint="[create|restore <id>|prune]"),
aliases=("snap",), args_hint="[create|restore <id>|prune]"),
CommandDef("stop", "Kill all running background processes", "Session"),
CommandDef("approve", "Approve a pending dangerous command", "Session",
gateway_only=True, args_hint="[session|always]"),
@@ -103,10 +103,10 @@ COMMAND_REGISTRY: list[CommandDef] = [
# Configuration
CommandDef("config", "Show current configuration", "Configuration",
cli_only=True),
CommandDef("model", "Switch model for this session", "Configuration",
aliases=("provider",), args_hint="[model] [--provider name] [--global]"),
CommandDef("gquota", "Show Google Gemini Code Assist quota usage", "Info",
cli_only=True),
CommandDef("model", "Switch model for this session", "Configuration", args_hint="[model] [--provider name] [--global]"),
CommandDef("provider", "Show available providers and current provider",
"Configuration"),
CommandDef("gquota", "Show Google Gemini Code Assist quota usage", "Info"),
CommandDef("personality", "Set a predefined personality", "Configuration",
args_hint="[name]"),
@@ -124,12 +124,9 @@ COMMAND_REGISTRY: list[CommandDef] = [
args_hint="[normal|fast|status]",
subcommands=("normal", "fast", "status", "on", "off")),
CommandDef("skin", "Show or change the display skin/theme", "Configuration",
cli_only=True, args_hint="[name]"),
args_hint="[name]"),
CommandDef("voice", "Toggle voice mode", "Configuration",
args_hint="[on|off|tts|status]", subcommands=("on", "off", "tts", "status")),
CommandDef("busy", "Control what Enter does while Hermes is working", "Configuration",
cli_only=True, args_hint="[queue|interrupt|status]",
subcommands=("queue", "interrupt", "status")),
# Tools & Skills
CommandDef("tools", "Manage tools: /tools [list|disable|enable] [name...]", "Tools & Skills",
@@ -142,8 +139,7 @@ COMMAND_REGISTRY: list[CommandDef] = [
CommandDef("cron", "Manage scheduled tasks", "Tools & Skills",
cli_only=True, args_hint="[subcommand]",
subcommands=("list", "add", "create", "edit", "pause", "resume", "run", "remove")),
CommandDef("reload", "Reload .env variables into the running session", "Tools & Skills",
cli_only=True),
CommandDef("reload", "Reload .env variables into the running session", "Tools & Skills"),
CommandDef("reload-mcp", "Reload MCP servers from config", "Tools & Skills",
aliases=("reload_mcp",)),
CommandDef("browser", "Connect browser tools to your live Chrome via CDP", "Tools & Skills",
@@ -321,7 +317,7 @@ def should_bypass_active_session(command_name: str | None) -> bool:
safety net in gateway.run discards any command text that reaches
the pending queue which meant a mid-run /model (or /reasoning,
/voice, /insights, /title, /resume, /retry, /undo, /compress,
/usage, /reload-mcp, /sethome, /reset) would silently
/usage, /provider, /reload-mcp, /sethome, /reset) would silently
interrupt the agent AND get discarded, producing a zero-char
response. See issue #5057 / PRs #6252, #10370, #4665.
+10 -43
View File
@@ -466,12 +466,6 @@ DEFAULT_CONFIG = {
"record_sessions": False, # Auto-record browser sessions as WebM videos
"allow_private_urls": False, # Allow navigating to private/internal IPs (localhost, 192.168.x.x, etc.)
"cdp_url": "", # Optional persistent CDP endpoint for attaching to an existing Chromium/Chrome
# CDP supervisor — dialog + frame detection via a persistent WebSocket.
# Active only when a CDP-capable backend is attached (Browserbase or
# local Chrome via /browser connect). See
# website/docs/developer-guide/browser-supervisor.md.
"dialog_policy": "must_respond", # must_respond | auto_dismiss | auto_accept
"dialog_timeout_s": 300, # Safety auto-dismiss after N seconds under must_respond
"camofox": {
# When true, Hermes sends a stable profile-scoped userId to Camofox
# so the server maps it to a persistent Firefox profile automatically.
@@ -492,27 +486,7 @@ DEFAULT_CONFIG = {
# exceed this are rejected with guidance to use offset+limit.
# 100K chars ≈ 2535K tokens across typical tokenisers.
"file_read_max_chars": 100_000,
# Tool-output truncation thresholds. When terminal output or a
# single read_file page exceeds these limits, Hermes truncates the
# payload sent to the model (keeping head + tail for terminal,
# enforcing pagination for read_file). Tuning these trades context
# footprint against how much raw output the model can see in one
# shot. Ported from anomalyco/opencode PR #23770.
#
# - max_bytes: terminal_tool output cap, in chars
# (default 50_000 ≈ 12-15K tokens).
# - max_lines: read_file pagination cap — the maximum `limit`
# a single read_file call can request before
# being clamped (default 2000).
# - max_line_length: per-line cap applied when read_file emits a
# line-numbered view (default 2000 chars).
"tool_output": {
"max_bytes": 50_000,
"max_lines": 2000,
"max_line_length": 2000,
},
"compression": {
"enabled": True,
"threshold": 0.50, # compress when context usage exceeds this ratio
@@ -521,12 +495,6 @@ DEFAULT_CONFIG = {
},
# Anthropic prompt caching (Claude via OpenRouter or native Anthropic API).
# cache_ttl must be "5m" or "1h" (Anthropic-supported tiers); other values are ignored.
"prompt_caching": {
"cache_ttl": "5m",
},
# AWS Bedrock provider configuration.
# Only used when model.provider is "bedrock".
"bedrock": {
@@ -612,6 +580,14 @@ DEFAULT_CONFIG = {
"timeout": 30,
"extra_body": {},
},
"flush_memories": {
"provider": "auto",
"model": "",
"base_url": "",
"api_key": "",
"timeout": 30,
"extra_body": {},
},
"title_generation": {
"provider": "auto",
"model": "",
@@ -775,15 +751,6 @@ DEFAULT_CONFIG = {
# warning log if out of range.
"max_spawn_depth": 1, # depth cap (1 = flat [default], 2 = orchestrator→leaf, 3 = three-level)
"orchestrator_enabled": True, # kill switch for role="orchestrator"
# When a subagent hits a dangerous-command approval prompt, the parent's
# prompt_toolkit TUI owns stdin — a thread-local input() call from the
# subagent worker would deadlock the parent UI. To avoid the deadlock,
# subagent threads ALWAYS resolve approvals non-interactively:
# false (default) → auto-deny with a logger.warning audit line (safe)
# true → auto-approve "once" with a logger.warning audit line
# Flip to true only if you trust delegated work to run dangerous cmds
# without human review (cron pipelines, batch automation, etc.).
"subagent_auto_approve": False,
},
# Ephemeral prefill messages file — JSON list of {role, content} dicts
@@ -840,7 +807,7 @@ DEFAULT_CONFIG = {
"auto_thread": True, # Auto-create threads on @mention in channels (like Slack)
"reactions": True, # Add 👀/✅/❌ reactions to messages during processing
"channel_prompts": {}, # Per-channel ephemeral system prompts (forum parents apply to child threads)
# discord / discord_admin tools: restrict which actions the agent may call.
# discord_server tool: restrict which actions the agent may call.
# Default (empty) = all actions allowed (subject to bot privileged intents).
# Accepts comma-separated string ("list_guilds,list_channels,fetch_messages")
# or YAML list. Unknown names are dropped with a warning at load time.
-93
View File
@@ -275,99 +275,6 @@ def copilot_device_code_login(
return None
# ─── Copilot Token Exchange ────────────────────────────────────────────────
# Module-level cache for exchanged Copilot API tokens.
# Maps raw_token_fingerprint -> (api_token, expires_at_epoch).
_jwt_cache: dict[str, tuple[str, float]] = {}
_JWT_REFRESH_MARGIN_SECONDS = 120 # refresh 2 min before expiry
# Token exchange endpoint and headers (matching VS Code / Copilot CLI)
_TOKEN_EXCHANGE_URL = "https://api.github.com/copilot_internal/v2/token"
_EDITOR_VERSION = "vscode/1.104.1"
_EXCHANGE_USER_AGENT = "GitHubCopilotChat/0.26.7"
def _token_fingerprint(raw_token: str) -> str:
"""Short fingerprint of a raw token for cache keying (avoids storing full token)."""
import hashlib
return hashlib.sha256(raw_token.encode()).hexdigest()[:16]
def exchange_copilot_token(raw_token: str, *, timeout: float = 10.0) -> tuple[str, float]:
"""Exchange a raw GitHub token for a short-lived Copilot API token.
Calls ``GET https://api.github.com/copilot_internal/v2/token`` with
the raw GitHub token and returns ``(api_token, expires_at)``.
The returned token is a semicolon-separated string (not a standard JWT)
used as ``Authorization: Bearer <token>`` for Copilot API requests.
Results are cached in-process and reused until close to expiry.
Raises ``ValueError`` on failure.
"""
import urllib.request
fp = _token_fingerprint(raw_token)
# Check cache first
cached = _jwt_cache.get(fp)
if cached:
api_token, expires_at = cached
if time.time() < expires_at - _JWT_REFRESH_MARGIN_SECONDS:
return api_token, expires_at
req = urllib.request.Request(
_TOKEN_EXCHANGE_URL,
method="GET",
headers={
"Authorization": f"token {raw_token}",
"User-Agent": _EXCHANGE_USER_AGENT,
"Accept": "application/json",
"Editor-Version": _EDITOR_VERSION,
},
)
try:
with urllib.request.urlopen(req, timeout=timeout) as resp:
data = json.loads(resp.read().decode())
except Exception as exc:
raise ValueError(f"Copilot token exchange failed: {exc}") from exc
api_token = data.get("token", "")
expires_at = data.get("expires_at", 0)
if not api_token:
raise ValueError("Copilot token exchange returned empty token")
# Convert expires_at to float if needed
expires_at = float(expires_at) if expires_at else time.time() + 1800
_jwt_cache[fp] = (api_token, expires_at)
logger.debug(
"Copilot token exchanged, expires_at=%s",
expires_at,
)
return api_token, expires_at
def get_copilot_api_token(raw_token: str) -> str:
"""Exchange a raw GitHub token for a Copilot API token, with fallback.
Convenience wrapper: returns the exchanged token on success, or the
raw token unchanged if the exchange fails (e.g. network error, unsupported
account type). This preserves existing behaviour for accounts that don't
need exchange while enabling access to internal-only models for those that do.
"""
if not raw_token:
return raw_token
try:
api_token, _ = exchange_copilot_token(raw_token)
return api_token
except Exception as exc:
logger.debug("Copilot token exchange failed, using raw token: %s", exc)
return raw_token
# ─── Copilot API Headers ───────────────────────────────────────────────────
def copilot_request_headers(
-9
View File
@@ -93,9 +93,6 @@ def cron_list(show_all: bool = False):
script = job.get("script")
if script:
print(f" Script: {script}")
workdir = job.get("workdir")
if workdir:
print(f" Workdir: {workdir}")
# Execution history
last_status = job.get("last_status")
@@ -171,7 +168,6 @@ def cron_create(args):
skill=getattr(args, "skill", None),
skills=_normalize_skills(getattr(args, "skill", None), getattr(args, "skills", None)),
script=getattr(args, "script", None),
workdir=getattr(args, "workdir", None),
)
if not result.get("success"):
print(color(f"Failed to create job: {result.get('error', 'unknown error')}", Colors.RED))
@@ -184,8 +180,6 @@ def cron_create(args):
job_data = result.get("job", {})
if job_data.get("script"):
print(f" Script: {job_data['script']}")
if job_data.get("workdir"):
print(f" Workdir: {job_data['workdir']}")
print(f" Next run: {result['next_run_at']}")
return 0
@@ -224,7 +218,6 @@ def cron_edit(args):
repeat=getattr(args, "repeat", None),
skills=final_skills,
script=getattr(args, "script", None),
workdir=getattr(args, "workdir", None),
)
if not result.get("success"):
print(color(f"Failed to update job: {result.get('error', 'unknown error')}", Colors.RED))
@@ -240,8 +233,6 @@ def cron_edit(args):
print(" Skills: none")
if updated.get("script"):
print(f" Script: {updated['script']}")
if updated.get("workdir"):
print(f" Workdir: {updated['workdir']}")
return 0
+8 -29
View File
@@ -29,7 +29,6 @@ if _env_path.exists():
load_dotenv(PROJECT_ROOT / ".env", override=False, encoding="utf-8")
from hermes_cli.colors import Colors, color
from hermes_cli.models import _HERMES_USER_AGENT
from hermes_constants import OPENROUTER_MODELS_URL
from utils import base_url_host_matches
@@ -296,33 +295,16 @@ def run_doctor(args):
except Exception:
pass
try:
from hermes_cli.config import get_compatible_custom_providers as _compatible_custom_providers
from hermes_cli.providers import resolve_provider_full as _resolve_provider_full
from hermes_cli.auth import resolve_provider as _resolve_provider
except Exception:
_compatible_custom_providers = None
_resolve_provider_full = None
custom_providers = []
if _compatible_custom_providers is not None:
try:
custom_providers = _compatible_custom_providers(cfg)
except Exception:
custom_providers = []
user_providers = cfg.get("providers")
if isinstance(user_providers, dict):
known_providers.update(str(name).strip().lower() for name in user_providers if str(name).strip())
for entry in custom_providers:
if not isinstance(entry, dict):
continue
name = str(entry.get("name") or "").strip()
if name:
known_providers.add("custom:" + name.lower().replace(" ", "-"))
_resolve_provider = None
canonical_provider = provider
if provider and _resolve_provider_full is not None and provider != "auto":
provider_def = _resolve_provider_full(provider, user_providers, custom_providers)
canonical_provider = provider_def.id if provider_def is not None else None
if provider and _resolve_provider is not None and provider != "auto":
try:
canonical_provider = _resolve_provider(provider)
except Exception:
canonical_provider = None
if provider and provider != "auto":
if canonical_provider is None or (known_providers and canonical_provider not in known_providers):
@@ -975,10 +957,7 @@ def run_doctor(args):
if base_url_host_matches(_base, "api.kimi.com") and _base.rstrip("/").endswith("/coding"):
_base = _base.rstrip("/") + "/v1"
_url = (_base.rstrip("/") + "/models") if _base else _default_url
_headers = {
"Authorization": f"Bearer {_key}",
"User-Agent": _HERMES_USER_AGENT,
}
_headers = {"Authorization": f"Bearer {_key}"}
if base_url_host_matches(_base, "api.kimi.com"):
_headers["User-Agent"] = "claude-code/0.1.0"
_resp = httpx.get(
-2
View File
@@ -267,8 +267,6 @@ def run_dump(args):
("ANTHROPIC_API_KEY", "anthropic"),
("ANTHROPIC_TOKEN", "anthropic_token"),
("NOUS_API_KEY", "nous"),
("GOOGLE_API_KEY", "google/gemini"),
("GEMINI_API_KEY", "gemini"),
("GLM_API_KEY", "glm/zai"),
("ZAI_API_KEY", "zai"),
("KIMI_API_KEY", "kimi"),
+81 -508
View File
@@ -51,7 +51,6 @@ import sys
from pathlib import Path
from typing import Optional
def _add_accept_hooks_flag(parser) -> None:
"""Attach the ``--accept-hooks`` flag. Shared across every agent
subparser so the flag works regardless of CLI position."""
@@ -167,28 +166,6 @@ from hermes_cli.env_loader import load_hermes_dotenv
load_hermes_dotenv(project_env=PROJECT_ROOT / ".env")
# Bridge security.redact_secrets from config.yaml → HERMES_REDACT_SECRETS env
# var BEFORE hermes_logging imports agent.redact (which snapshots the flag at
# module-import time). Without this, config.yaml's toggle is ignored because
# the setup_logging() call below imports agent.redact, which reads the env var
# exactly once. Env var in .env still wins — this is config.yaml fallback only.
try:
if "HERMES_REDACT_SECRETS" not in os.environ:
import yaml as _yaml_early
_cfg_path = get_hermes_home() / "config.yaml"
if _cfg_path.exists():
with open(_cfg_path, encoding="utf-8") as _f:
_early_sec_cfg = (_yaml_early.safe_load(_f) or {}).get("security", {})
if isinstance(_early_sec_cfg, dict):
_early_redact = _early_sec_cfg.get("redact_secrets")
if _early_redact is not None:
os.environ["HERMES_REDACT_SECRETS"] = str(_early_redact).lower()
del _early_sec_cfg
del _cfg_path
except Exception:
pass # best-effort — redaction stays at default (enabled) on config errors
# Initialize centralized file logging early — all `hermes` subcommands
# (chat, setup, gateway, config, etc.) write to agent.log + errors.log.
try:
@@ -841,8 +818,6 @@ def _find_bundled_tui(tui_dir: Path) -> Optional[Path]:
def _tui_build_needed(tui_dir: Path) -> bool:
if _hermes_ink_bundle_stale(tui_dir):
return True
entry = tui_dir / "dist" / "entry.js"
if not entry.exists():
return True
@@ -1030,12 +1005,7 @@ def _make_tui_argv(tui_dir: Path, tui_dev: bool) -> tuple[list[str], Path]:
return [node, str(root / "dist" / "entry.js")], root
def _launch_tui(
resume_session_id: Optional[str] = None,
tui_dev: bool = False,
model: Optional[str] = None,
provider: Optional[str] = None,
):
def _launch_tui(resume_session_id: Optional[str] = None, tui_dev: bool = False):
"""Replace current process with the TUI."""
tui_dir = PROJECT_ROOT / "ui-tui"
@@ -1045,12 +1015,6 @@ def _launch_tui(
)
env.setdefault("HERMES_PYTHON", sys.executable)
env.setdefault("HERMES_CWD", os.getcwd())
if model:
env["HERMES_MODEL"] = model
env["HERMES_INFERENCE_MODEL"] = model
if provider:
env["HERMES_TUI_PROVIDER"] = provider
env["HERMES_INFERENCE_PROVIDER"] = provider
# Guarantee an 8GB V8 heap + exposed GC for the TUI. Default node cap is
# ~1.54GB depending on version and can fatal-OOM on long sessions with
# large transcripts / reasoning blobs. Token-level merge: respect any
@@ -1189,8 +1153,6 @@ def cmd_chat(args):
_launch_tui(
getattr(args, "resume", None),
tui_dev=getattr(args, "tui_dev", False),
model=getattr(args, "model", None),
provider=getattr(args, "provider", None),
)
# Import and run the CLI
@@ -1342,9 +1304,7 @@ def cmd_whatsapp(args):
return
if not (bridge_dir / "node_modules").exists():
print(
"\n→ Installing WhatsApp bridge dependencies (this can take a few minutes)..."
)
print("\n→ Installing WhatsApp bridge dependencies (this can take a few minutes)...")
npm = shutil.which("npm")
if not npm:
print(" ✗ npm not found on PATH — install Node.js first")
@@ -1469,7 +1429,6 @@ def select_provider_and_model(args=None):
load_config,
get_env_value,
)
from hermes_cli.providers import resolve_provider_full
config = load_config()
current_model = config.get("model")
@@ -1487,30 +1446,14 @@ def select_provider_and_model(args=None):
effective_provider = (
config_provider or os.getenv("HERMES_INFERENCE_PROVIDER") or "auto"
)
compatible_custom_providers = get_compatible_custom_providers(config)
active = None
if effective_provider != "auto":
active_def = resolve_provider_full(
effective_provider,
config.get("providers"),
compatible_custom_providers,
)
if active_def is not None:
active = active_def.id
else:
warning = (
f"Unknown provider '{effective_provider}'. Check 'hermes model' for "
"available providers, or run 'hermes doctor' to diagnose config "
"issues."
)
print(f"Warning: {warning} Falling back to auto provider detection.")
if active is None:
try:
active = resolve_provider(effective_provider)
except AuthError as exc:
warning = format_auth_error(exc)
print(f"Warning: {warning} Falling back to auto provider detection.")
try:
active = resolve_provider("auto")
except AuthError as exc:
if effective_provider == "auto":
warning = format_auth_error(exc)
print(f"Warning: {warning} Falling back to auto provider detection.")
except AuthError:
active = None # no provider yet; default to first in list
# Detect custom endpoint
@@ -1720,14 +1663,15 @@ def _clear_stale_openai_base_url():
# (task_key, display_name, short_description)
_AUX_TASKS: list[tuple[str, str, str]] = [
("vision", "Vision", "image/screenshot analysis"),
("compression", "Compression", "context summarization"),
("web_extract", "Web extract", "web page summarization"),
("session_search", "Session search", "past-conversation recall"),
("approval", "Approval", "smart command approval"),
("mcp", "MCP", "MCP tool reasoning"),
("vision", "Vision", "image/screenshot analysis"),
("compression", "Compression", "context summarization"),
("web_extract", "Web extract", "web page summarization"),
("session_search", "Session search", "past-conversation recall"),
("approval", "Approval", "smart command approval"),
("mcp", "MCP", "MCP tool reasoning"),
("flush_memories", "Flush memories", "memory consolidation"),
("title_generation", "Title generation", "session titles"),
("skills_hub", "Skills hub", "skills search/install"),
("skills_hub", "Skills hub", "skills search/install"),
]
@@ -1826,7 +1770,7 @@ def _aux_config_menu() -> None:
print(" Auxiliary models — side-task routing")
print()
print(" Side tasks (vision, compression, web extraction, etc.) default")
print(' to your main chat model. "auto" means "use my main model"')
print(" to your main chat model. \"auto\" means \"use my main model\"")
print(" Hermes only falls back to a lightweight backend (OpenRouter,")
print(" Nous Portal) if the main model is unavailable. Override a")
print(" task below if you want it pinned to a specific provider/model.")
@@ -1837,20 +1781,15 @@ def _aux_config_menu() -> None:
desc_col = max(len(desc) for _, _, desc in _AUX_TASKS) + 4
entries: list[tuple[str, str]] = []
for task_key, name, desc in _AUX_TASKS:
task_cfg = (
aux.get(task_key, {}) if isinstance(aux.get(task_key), dict) else {}
)
task_cfg = aux.get(task_key, {}) if isinstance(aux.get(task_key), dict) else {}
current = _format_aux_current(task_cfg)
label = (
f"{name.ljust(name_col)}{('(' + desc + ')').ljust(desc_col)}{current}"
)
label = f"{name.ljust(name_col)}{('(' + desc + ')').ljust(desc_col)}{current}"
entries.append((task_key, label))
entries.append(("__reset__", "Reset all to auto"))
entries.append(("__back__", "Back"))
entries.append(("__back__", "Back"))
idx = _prompt_provider_choice(
[label for _, label in entries],
default=0,
[label for _, label in entries], default=0,
)
if idx is None:
return
@@ -1898,9 +1837,7 @@ def _aux_select_for_task(task: str) -> None:
entries: list[tuple[str, str, list[str]]] = [] # (slug, label, models)
# "auto" always first
auto_marker = (
" ← current" if current_provider == "auto" and not current_base_url else ""
)
auto_marker = " ← current" if current_provider == "auto" and not current_base_url else ""
entries.append(("__auto__", f"auto (recommended){auto_marker}", []))
for p in providers:
@@ -1909,9 +1846,7 @@ def _aux_select_for_task(task: str) -> None:
total = p.get("total_models", 0)
models = p.get("models") or []
model_hint = f"{total} models" if total else ""
marker = (
" ← current" if slug == current_provider and not current_base_url else ""
)
marker = " ← current" if slug == current_provider and not current_base_url else ""
entries.append((slug, f"{name}{model_hint}{marker}", list(models)))
# Custom endpoint (raw base_url)
@@ -1979,17 +1914,14 @@ def _aux_flow_provider_model(
selected = val or ""
else:
selected = _prompt_model_selection(
model_list,
current_model=current_model,
pricing=pricing,
model_list, current_model=current_model, pricing=pricing,
)
if selected is None:
print("No change.")
return
_save_aux_choice(
task, provider=provider_slug, model=selected or "", base_url="", api_key=""
)
_save_aux_choice(task, provider=provider_slug, model=selected or "",
base_url="", api_key="")
if selected:
print(f"{display_name}: {provider_slug} · {selected}")
else:
@@ -2009,9 +1941,7 @@ def _aux_flow_custom_endpoint(task: str, task_cfg: dict) -> None:
print(" Provide an OpenAI-compatible base URL (e.g. http://localhost:11434/v1)")
print()
try:
url_prompt = (
f"Base URL [{current_base_url}]: " if current_base_url else "Base URL: "
)
url_prompt = f"Base URL [{current_base_url}]: " if current_base_url else "Base URL: "
url = input(url_prompt).strip()
except (KeyboardInterrupt, EOFError):
print()
@@ -2021,30 +1951,20 @@ def _aux_flow_custom_endpoint(task: str, task_cfg: dict) -> None:
print("No URL provided. No change.")
return
try:
model_prompt = (
f"Model slug (optional) [{current_model}]: "
if current_model
else "Model slug (optional): "
)
model_prompt = f"Model slug (optional) [{current_model}]: " if current_model else "Model slug (optional): "
model = input(model_prompt).strip()
except (KeyboardInterrupt, EOFError):
print()
return
model = model or current_model
try:
api_key = getpass.getpass(
"API key (optional, blank = use OPENAI_API_KEY): "
).strip()
api_key = getpass.getpass("API key (optional, blank = use OPENAI_API_KEY): ").strip()
except (KeyboardInterrupt, EOFError):
print()
return
_save_aux_choice(
task,
provider="custom",
model=model,
base_url=url,
api_key=api_key,
task, provider="custom", model=model, base_url=url, api_key=api_key,
)
short_url = url.replace("https://", "").replace("http://", "").rstrip("/")
print(f"{display_name}: custom ({short_url})" + (f" · {model}" if model else ""))
@@ -2160,9 +2080,7 @@ def _model_flow_ai_gateway(config, current_model=""):
api_key = get_env_value("AI_GATEWAY_API_KEY")
if not api_key:
print("No Vercel AI Gateway API key configured.")
print(
"Create API key here: https://vercel.com/d?to=%2F%5Bteam%5D%2F%7E%2Fai-gateway&title=AI+Gateway"
)
print("Create API key here: https://vercel.com/d?to=%2F%5Bteam%5D%2F%7E%2Fai-gateway&title=AI+Gateway")
print("Add a payment method to get $5 in free credits.")
print()
try:
@@ -2393,41 +2311,7 @@ def _model_flow_openai_codex(config, current_model=""):
from hermes_cli.codex_models import get_codex_model_ids
status = get_codex_auth_status()
if status.get("logged_in"):
print(" OpenAI Codex credentials: ✓")
print()
print(" 1. Use existing credentials")
print(" 2. Reauthenticate (new OAuth login)")
print(" 3. Cancel")
print()
try:
choice = input(" Choice [1/2/3]: ").strip()
except (KeyboardInterrupt, EOFError):
choice = "1"
if choice == "2":
print("Starting a fresh OpenAI Codex login...")
print()
try:
mock_args = argparse.Namespace()
_login_openai_codex(
mock_args,
PROVIDER_REGISTRY["openai-codex"],
force_new_login=True,
)
except SystemExit:
print("Login cancelled or failed.")
return
except Exception as exc:
print(f"Login failed: {exc}")
return
status = get_codex_auth_status()
if not status.get("logged_in"):
print("Login failed.")
return
elif choice == "3":
return
else:
if not status.get("logged_in"):
print("Not logged into OpenAI Codex. Starting login...")
print()
try:
@@ -2944,16 +2828,11 @@ def _model_flow_named_custom(config, provider_info):
name = provider_info["name"]
base_url = provider_info["base_url"]
api_mode = provider_info.get("api_mode", "")
api_key = provider_info.get("api_key", "")
key_env = provider_info.get("key_env", "")
saved_model = provider_info.get("model", "")
provider_key = (provider_info.get("provider_key") or "").strip()
# Resolve key from env var if api_key not set directly
if not api_key and key_env:
api_key = os.environ.get(key_env, "")
print(f" Provider: {name}")
print(f" URL: {base_url}")
if saved_model:
@@ -2961,12 +2840,7 @@ def _model_flow_named_custom(config, provider_info):
print()
print("Fetching available models...")
models = fetch_api_models(
api_key,
base_url,
timeout=8.0,
api_mode=api_mode or None,
)
models = fetch_api_models(api_key, base_url, timeout=8.0)
if models:
default_idx = 0
@@ -3635,12 +3509,7 @@ def _model_flow_stepfun(config, current_model=""):
_save_model_choice,
deactivate_provider,
)
from hermes_cli.config import (
get_env_value,
save_env_value,
load_config,
save_config,
)
from hermes_cli.config import get_env_value, save_env_value, load_config, save_config
from hermes_cli.models import fetch_api_models
provider_id = "stepfun"
@@ -3659,7 +3528,6 @@ def _model_flow_stepfun(config, current_model=""):
if key_env:
try:
import getpass
new_key = getpass.getpass(f"{key_env} (or Enter to cancel): ").strip()
except (KeyboardInterrupt, EOFError):
print()
@@ -3685,10 +3553,7 @@ def _model_flow_stepfun(config, current_model=""):
current_region = _infer_stepfun_region(current_base or pconfig.inference_base_url)
region_choices = [
(
"international",
f"International ({_stepfun_base_url_for_region('international')})",
),
("international", f"International ({_stepfun_base_url_for_region('international')})"),
("china", f"China ({_stepfun_base_url_for_region('china')})"),
]
ordered_regions = []
@@ -4065,71 +3930,12 @@ def _model_flow_api_key_provider(config, provider_id, current_model=""):
print("Cancelled.")
return
save_env_value(key_env, new_key)
existing_key = new_key
print("API key saved.")
print()
else:
print(f" {pconfig.name} API key: {existing_key[:8]}... ✓")
print()
# Gemini free-tier gate: free-tier daily quotas (<= 250 RPD for Flash)
# are exhausted in a handful of agent turns, so refuse to wire up the
# provider with a free-tier key. Probe is best-effort; network or auth
# errors fall through without blocking.
if provider_id == "gemini" and existing_key:
try:
from agent.gemini_native_adapter import probe_gemini_tier
except Exception:
probe_gemini_tier = None
if probe_gemini_tier is not None:
print(" Checking Gemini API tier...")
probe_base = (
(get_env_value(base_url_env) if base_url_env else "")
or os.getenv(base_url_env or "", "")
or pconfig.inference_base_url
)
tier = probe_gemini_tier(existing_key, probe_base)
if tier == "free":
print()
print(
"❌ This Google API key is on the free tier "
"(<= 250 requests/day for gemini-2.5-flash)."
)
print(
" Hermes typically makes 3-10 API calls per user turn "
"(tool iterations + auxiliary tasks),"
)
print(
" so the free tier is exhausted after a handful of "
"messages and cannot sustain"
)
print(" an agent session.")
print()
print(
" To use Gemini with Hermes, enable billing on your "
"Google Cloud project and regenerate"
)
print(
" the key in a billing-enabled project: "
"https://aistudio.google.com/apikey"
)
print()
print(
" Alternatives with workable free usage: DeepSeek, "
"OpenRouter (free models), Groq, Nous."
)
print()
print("Not saving Gemini as the default provider.")
return
if tier == "paid":
print(" Tier check: paid ✓")
else:
# "unknown" -- network issue, auth problem, unexpected response.
# Don't block; the runtime 429 handler will surface free-tier
# guidance if the key turns out to be free tier.
print(" Tier check: could not verify (proceeding anyway).")
print()
# Optional base URL override
current_base = ""
if base_url_env:
@@ -4371,8 +4177,6 @@ def _model_flow_anthropic(config, current_model=""):
from agent.anthropic_adapter import (
read_claude_code_credentials,
is_claude_code_token_valid,
_is_oauth_token,
_resolve_claude_code_token_from_credentials,
)
cc_creds = read_claude_code_credentials()
@@ -4381,14 +4185,7 @@ def _model_flow_anthropic(config, current_model=""):
except Exception:
pass
# Stale-OAuth guard: if the only existing cred is an expired OAuth token
# (no valid cc_creds to fall back on), treat it as missing so the re-auth
# path is offered instead of silently accepting a broken token.
existing_is_stale_oauth = False
if existing_key and _is_oauth_token(existing_key) and not cc_available:
existing_is_stale_oauth = True
has_creds = (bool(existing_key) and not existing_is_stale_oauth) or cc_available
has_creds = bool(existing_key) or cc_available
needs_auth = not has_creds
if has_creds:
@@ -4531,7 +4328,6 @@ def cmd_webhook(args):
def cmd_hooks(args):
"""Shell-hook inspection and management."""
from hermes_cli.hooks import hooks_command
hooks_command(args)
@@ -6102,86 +5898,6 @@ def _cmd_update_impl(args, gateway_mode: bool):
)
import signal as _signal
def _wait_for_service_active(
scope_cmd_: list,
svc_name_: str,
timeout: float = 10.0,
) -> bool:
"""Poll ``systemctl is-active`` until the unit reports active.
systemd's Stopped -> Started transition after a graceful exit
(or a hard restart) is not instantaneous; a one-shot check
races that window and falsely reports the unit as down.
Poll every 0.5s up to ``timeout`` seconds before giving up.
"""
deadline = _time.monotonic() + max(timeout, 0.5)
while True:
try:
_verify = subprocess.run(
scope_cmd_ + ["is-active", svc_name_],
capture_output=True,
text=True,
timeout=5,
)
if _verify.stdout.strip() == "active":
return True
except (FileNotFoundError, subprocess.TimeoutExpired):
pass
if _time.monotonic() >= deadline:
return False
_time.sleep(0.5)
def _service_restart_sec(
scope_cmd_: list,
svc_name_: str,
default: float = 0.0,
) -> float:
"""Read the unit's ``RestartUSec`` (RestartSec) in seconds.
After a graceful exit-75, systemd waits ``RestartSec`` before
respawning the unit. Callers that poll for ``is-active``
must use a timeout >= ``RestartSec`` + transition slack, or
they'll give up *during* the cooldown window and wrongly
conclude the unit didn't relaunch.
"""
try:
_show = subprocess.run(
scope_cmd_
+ [
"show",
svc_name_,
"--property=RestartUSec",
"--value",
],
capture_output=True,
text=True,
timeout=5,
)
except (FileNotFoundError, subprocess.TimeoutExpired):
return default
raw = (_show.stdout or "").strip()
# systemd emits values like "30s", "100ms", "1min 30s", or
# "infinity". Parse conservatively; on any miss return default.
if not raw or raw == "infinity":
return default
total = 0.0
matched = False
for part in raw.split():
for _suf, _mult in (
("ms", 0.001),
("us", 0.000001),
("min", 60.0),
("s", 1.0),
):
if part.endswith(_suf):
try:
total += float(part[: -len(_suf)]) * _mult
matched = True
except ValueError:
pass
break
return total if matched else default
# Drain budget for graceful SIGUSR1 restarts. The gateway drains
# for up to ``agent.restart_drain_timeout`` (default 60s) before
# exiting with code 75; we wait slightly longer so the drain
@@ -6197,17 +5913,12 @@ def _cmd_update_impl(args, gateway_mode: bool):
_cfg_drain = None
try:
from hermes_cli.config import load_config
_cfg_agent = load_config().get("agent") or {}
_cfg_agent = (load_config().get("agent") or {})
_cfg_drain = _cfg_agent.get("restart_drain_timeout")
except Exception:
pass
try:
_drain_budget = (
float(_cfg_drain)
if _cfg_drain is not None
else float(_DEFAULT_DRAIN)
)
_drain_budget = float(_cfg_drain) if _cfg_drain is not None else float(_DEFAULT_DRAIN)
except (TypeError, ValueError):
_drain_budget = float(_DEFAULT_DRAIN)
# Add a 15s margin so the drain loop + final exit finish before
@@ -6272,23 +5983,14 @@ def _cmd_update_impl(args, gateway_mode: bool):
_main_pid = 0
try:
_show = subprocess.run(
scope_cmd
+ [
"show",
svc_name,
"--property=MainPID",
"--value",
scope_cmd + [
"show", svc_name,
"--property=MainPID", "--value",
],
capture_output=True,
text=True,
timeout=5,
capture_output=True, text=True, timeout=5,
)
_main_pid = int((_show.stdout or "").strip() or 0)
except (
ValueError,
subprocess.TimeoutExpired,
FileNotFoundError,
):
except (ValueError, subprocess.TimeoutExpired, FileNotFoundError):
_main_pid = 0
_graceful_ok = False
@@ -6297,33 +5999,19 @@ def _cmd_update_impl(args, gateway_mode: bool):
f"{svc_name}: draining (up to {int(_drain_budget)}s)..."
)
_graceful_ok = _graceful_restart_via_sigusr1(
_main_pid,
drain_timeout=_drain_budget,
_main_pid, drain_timeout=_drain_budget,
)
if _graceful_ok:
# Gateway exited 75; systemd should relaunch
# via Restart=on-failure. The unit's
# RestartSec (default 30s on ours) gates the
# respawn — poll past that + slack so we
# don't give up mid-cooldown and falsely
# print "drained but didn't relaunch". For
# units without RestartSec set we fall back
# to the original 10s budget.
_restart_sec = _service_restart_sec(
scope_cmd,
svc_name,
default=0.0,
# via Restart=on-failure. Verify the new
# process came up.
_time.sleep(3)
verify = subprocess.run(
scope_cmd + ["is-active", svc_name],
capture_output=True, text=True, timeout=5,
)
_post_drain_timeout = max(
10.0,
_restart_sec + 10.0,
)
if _wait_for_service_active(
scope_cmd,
svc_name,
timeout=_post_drain_timeout,
):
if verify.stdout.strip() == "active":
restarted_services.append(svc_name)
continue
# Process exited but wasn't respawned (older
@@ -6349,11 +6037,14 @@ def _cmd_update_impl(args, gateway_mode: bool):
# Verify the service actually survived the
# restart. systemctl restart returns 0 even
# if the new process crashes immediately.
if _wait_for_service_active(
scope_cmd,
svc_name,
timeout=10.0,
):
_time.sleep(3)
verify = subprocess.run(
scope_cmd + ["is-active", svc_name],
capture_output=True,
text=True,
timeout=5,
)
if verify.stdout.strip() == "active":
restarted_services.append(svc_name)
else:
# Retry once — transient startup failures
@@ -6368,11 +6059,14 @@ def _cmd_update_impl(args, gateway_mode: bool):
text=True,
timeout=15,
)
if _wait_for_service_active(
scope_cmd,
svc_name,
timeout=10.0,
):
_time.sleep(3)
verify2 = subprocess.run(
scope_cmd + ["is-active", svc_name],
capture_output=True,
text=True,
timeout=5,
)
if verify2.stdout.strip() == "active":
restarted_services.append(svc_name)
print(f"{svc_name} recovered on retry")
else:
@@ -6873,15 +6567,9 @@ def cmd_dashboard(args):
try:
import fastapi # noqa: F401
import uvicorn # noqa: F401
except ImportError as e:
print("Web UI dependencies not installed (need fastapi + uvicorn).")
print(
f"Re-install the package into this interpreter so metadata updates apply:\n"
f" cd {PROJECT_ROOT}\n"
f" {sys.executable} -m pip install -e .\n"
"If `pip` is missing in this venv, use: uv pip install -e ."
)
print(f"Import error: {e}")
except ImportError:
print("Web UI dependencies not installed.")
print(f"Install them with: {sys.executable} -m pip install 'fastapi' 'uvicorn[standard]'")
sys.exit(1)
if "HERMES_WEB_DIST" not in os.environ:
@@ -6890,17 +6578,11 @@ def cmd_dashboard(args):
from hermes_cli.web_server import start_server
gui_mode = getattr(args, "gui", False)
embedded_chat = (
gui_mode or args.tui or os.environ.get("HERMES_DASHBOARD_TUI") == "1"
)
start_server(
host=args.host,
port=args.port,
open_browser=not args.no_open,
allow_public=getattr(args, "insecure", False),
embedded_chat=embedded_chat,
gui_mode=gui_mode,
)
@@ -6983,40 +6665,6 @@ For more help on a command:
parser.add_argument(
"--version", "-V", action="store_true", help="Show version and exit"
)
parser.add_argument(
"-z",
"--oneshot",
metavar="PROMPT",
default=None,
help=(
"One-shot mode: send a single prompt and print ONLY the final "
"response text to stdout. No banner, no spinner, no tool "
"previews, no session_id line. Tools, memory, rules, and "
"AGENTS.md in the CWD are loaded as normal; approvals are "
"auto-bypassed. Intended for scripts / pipes."
),
)
# --model / --provider are accepted at the top level so they can pair
# with -z without needing the `chat` subcommand. If neither -z nor a
# subcommand consumes them, they fall through harmlessly as None.
# Mirrors `hermes chat --model ... --provider ...` semantics.
parser.add_argument(
"-m",
"--model",
default=None,
help=(
"Model override for this invocation (e.g. anthropic/claude-sonnet-4.6). "
"Applies to -z/--oneshot and --tui. Also settable via HERMES_INFERENCE_MODEL env var."
),
)
parser.add_argument(
"--provider",
default=None,
help=(
"Provider override for this invocation (e.g. openrouter, anthropic). "
"Applies to -z/--oneshot and --tui. Also settable via HERMES_INFERENCE_PROVIDER env var."
),
)
parser.add_argument(
"--resume",
"-r",
@@ -7537,7 +7185,7 @@ For more help on a command:
)
logout_parser.add_argument(
"--provider",
choices=["nous", "openai-codex", "spotify"],
choices=["nous", "openai-codex"],
default=None,
help="Provider to log out from (default: active provider)",
)
@@ -7594,39 +7242,6 @@ For more help on a command:
"reset", help="Clear exhaustion status for all credentials for a provider"
)
auth_reset.add_argument("provider", help="Provider id")
auth_status = auth_subparsers.add_parser(
"status", help="Show auth status for a provider"
)
auth_status.add_argument("provider", help="Provider id")
auth_logout = auth_subparsers.add_parser(
"logout", help="Log out a provider and clear stored auth state"
)
auth_logout.add_argument("provider", help="Provider id")
auth_spotify = auth_subparsers.add_parser(
"spotify", help="Authenticate Hermes with Spotify via PKCE"
)
auth_spotify.add_argument(
"spotify_action",
nargs="?",
choices=["login", "status", "logout"],
default="login",
)
auth_spotify.add_argument(
"--client-id", help="Spotify app client_id (or set HERMES_SPOTIFY_CLIENT_ID)"
)
auth_spotify.add_argument(
"--redirect-uri",
help="Allow-listed localhost redirect URI for your Spotify app",
)
auth_spotify.add_argument("--scope", help="Override requested Spotify scopes")
auth_spotify.add_argument(
"--no-browser",
action="store_true",
help="Do not attempt to open the browser automatically",
)
auth_spotify.add_argument(
"--timeout", type=float, help="Callback/token exchange timeout in seconds"
)
auth_parser.set_defaults(func=cmd_auth)
# =========================================================================
@@ -7683,10 +7298,6 @@ For more help on a command:
"--script",
help="Path to a Python script whose stdout is injected into the prompt each run",
)
cron_create.add_argument(
"--workdir",
help="Absolute path for the job to run from. Injects AGENTS.md / CLAUDE.md / .cursorrules from that directory and uses it as the cwd for terminal/file/code_exec tools. Omit to preserve old behaviour (no project context files).",
)
# cron edit
cron_edit = cron_subparsers.add_parser(
@@ -7725,10 +7336,6 @@ For more help on a command:
"--script",
help="Path to a Python script whose stdout is injected into the prompt each run. Pass empty string to clear.",
)
cron_edit.add_argument(
"--workdir",
help="Absolute path for the job to run from (injects AGENTS.md etc. and sets terminal cwd). Pass empty string to clear.",
)
# lifecycle actions
cron_pause = cron_subparsers.add_parser("pause", help="Pause a scheduled job")
@@ -7836,8 +7443,7 @@ For more help on a command:
hooks_subparsers = hooks_parser.add_subparsers(dest="hooks_action")
hooks_subparsers.add_parser(
"list",
aliases=["ls"],
"list", aliases=["ls"],
help="List configured hooks with matcher, timeout, and consent status",
)
@@ -7850,18 +7456,14 @@ For more help on a command:
help="Hook event name (e.g. pre_tool_call, pre_llm_call, subagent_stop)",
)
_hk_test.add_argument(
"--for-tool",
dest="for_tool",
default=None,
"--for-tool", dest="for_tool", default=None,
help=(
"Only fire hooks whose matcher matches this tool name "
"(used for pre_tool_call / post_tool_call)"
),
)
_hk_test.add_argument(
"--payload-file",
dest="payload_file",
default=None,
"--payload-file", dest="payload_file", default=None,
help=(
"Path to a JSON file whose contents are merged into the "
"synthetic payload before execution"
@@ -7869,8 +7471,7 @@ For more help on a command:
)
_hk_revoke = hooks_subparsers.add_parser(
"revoke",
aliases=["remove", "rm"],
"revoke", aliases=["remove", "rm"],
help="Remove a command's allowlist entries (takes effect on next restart)",
)
_hk_revoke.add_argument(
@@ -9148,19 +8749,6 @@ Examples:
action="store_true",
help="Allow binding to non-localhost (DANGEROUS: exposes API keys on the network)",
)
dashboard_parser.add_argument(
"--tui",
action="store_true",
help=(
"Expose the in-browser Chat tab (embedded `hermes --tui` via PTY/WebSocket). "
"Alternatively set HERMES_DASHBOARD_TUI=1."
),
)
dashboard_parser.add_argument(
"--gui",
action="store_true",
help="Run dashboard in GUI-shell mode; implies --tui",
)
dashboard_parser.set_defaults(func=cmd_dashboard)
# =========================================================================
@@ -9303,28 +8891,26 @@ Examples:
# the nested subcommand (dest varies by parser).
_AGENT_COMMANDS = {None, "chat", "acp", "rl"}
_AGENT_SUBCOMMANDS = {
"cron": ("cron_command", {"run", "tick"}),
"cron": ("cron_command", {"run", "tick"}),
"gateway": ("gateway_command", {"run"}),
"mcp": ("mcp_action", {"serve"}),
"mcp": ("mcp_action", {"serve"}),
}
_sub_attr, _sub_set = _AGENT_SUBCOMMANDS.get(args.command, (None, None))
if args.command in _AGENT_COMMANDS or (
_sub_attr and getattr(args, _sub_attr, None) in _sub_set
if (
args.command in _AGENT_COMMANDS
or (_sub_attr and getattr(args, _sub_attr, None) in _sub_set)
):
_accept_hooks = bool(getattr(args, "accept_hooks", False))
try:
from hermes_cli.plugins import discover_plugins
discover_plugins()
except Exception:
logger.debug(
"plugin discovery failed at CLI startup",
exc_info=True,
"plugin discovery failed at CLI startup", exc_info=True,
)
try:
from hermes_cli.config import load_config
from agent.shell_hooks import register_from_config
register_from_config(load_config(), accept_hooks=_accept_hooks)
except Exception:
logger.debug(
@@ -9332,19 +8918,6 @@ Examples:
exc_info=True,
)
# Handle top-level --oneshot / -z: single-shot mode, stdout = final
# response only, nothing else. Bypasses cli.py entirely.
if getattr(args, "oneshot", None):
from hermes_cli.oneshot import run_oneshot
sys.exit(
run_oneshot(
args.oneshot,
model=getattr(args, "model", None),
provider=getattr(args, "provider", None),
)
)
# Handle top-level --resume / --continue as shortcut to chat
if (args.resume or args.continue_last) and args.command is None:
args.command = "chat"
+9 -48
View File
@@ -12,12 +12,8 @@ Different LLM providers expect model identifiers in different formats:
model IDs, but Claude still uses hyphenated native names like
``claude-sonnet-4-6``.
- **OpenCode Go** preserves dots in model names: ``minimax-m2.7``.
- **DeepSeek** accepts ``deepseek-chat`` (V3), ``deepseek-reasoner``
(R1-family), and the first-class V-series IDs (``deepseek-v4-pro``,
``deepseek-v4-flash``, and any future ``deepseek-v<N>-*``). Older
Hermes revisions folded every non-reasoner input into
``deepseek-chat``, which on aggregators routes to V3 so a user
picking V4 Pro was silently downgraded.
- **DeepSeek** only accepts two model identifiers:
``deepseek-chat`` and ``deepseek-reasoner``.
- **Custom** and remaining providers pass the name through as-is.
This module centralises that translation so callers can simply write::
@@ -29,7 +25,6 @@ Inspired by Clawdbot's ``normalizeAnthropicModelId`` pattern.
from __future__ import annotations
import re
from typing import Optional
# ---------------------------------------------------------------------------
@@ -105,15 +100,6 @@ _MATCHING_PREFIX_STRIP_PROVIDERS: frozenset[str] = frozenset({
"custom",
})
# Providers whose APIs require lowercase model IDs. Xiaomi's
# ``api.xiaomimimo.com`` rejects mixed-case names like ``MiMo-V2.5-Pro``
# that users might copy from marketing docs — it only accepts
# ``mimo-v2.5-pro``. After stripping a matching provider prefix, these
# providers also get ``.lower()`` applied.
_LOWERCASE_MODEL_PROVIDERS: frozenset[str] = frozenset({
"xiaomi",
})
# ---------------------------------------------------------------------------
# DeepSeek special handling
# ---------------------------------------------------------------------------
@@ -129,30 +115,17 @@ _DEEPSEEK_REASONER_KEYWORDS: frozenset[str] = frozenset({
})
_DEEPSEEK_CANONICAL_MODELS: frozenset[str] = frozenset({
"deepseek-chat", # V3 on DeepSeek direct and most aggregators
"deepseek-reasoner", # R1-family reasoning model
"deepseek-v4-pro", # V4 Pro — first-class model ID
"deepseek-v4-flash", # V4 Flash — first-class model ID
"deepseek-chat",
"deepseek-reasoner",
})
# First-class V-series IDs (``deepseek-v4-pro``, ``deepseek-v4-flash``,
# future ``deepseek-v5-*``, dated variants like ``deepseek-v4-flash-20260423``).
# Verified empirically 2026-04-24: DeepSeek's Chat Completions API returns
# ``provider: DeepSeek`` / ``model: deepseek-v4-flash-20260423`` when called
# with ``model=deepseek/deepseek-v4-flash``, so these names are not aliases
# of ``deepseek-chat`` and must not be folded into it.
_DEEPSEEK_V_SERIES_RE = re.compile(r"^deepseek-v\d+([-.].+)?$")
def _normalize_for_deepseek(model_name: str) -> str:
"""Map a model input to a DeepSeek-accepted identifier.
"""Map any model input to one of DeepSeek's two accepted identifiers.
Rules:
- Already a known canonical (``deepseek-chat``/``deepseek-reasoner``/
``deepseek-v4-pro``/``deepseek-v4-flash``) -> pass through.
- Matches the V-series pattern ``deepseek-v<digit>...`` -> pass through
(covers future ``deepseek-v5-*`` and dated variants without a release).
- Contains a reasoner keyword (r1, think, reasoning, cot, reasoner)
- Already ``deepseek-chat`` or ``deepseek-reasoner`` -> pass through.
- Contains any reasoner keyword (r1, think, reasoning, cot, reasoner)
-> ``deepseek-reasoner``.
- Everything else -> ``deepseek-chat``.
@@ -160,17 +133,13 @@ def _normalize_for_deepseek(model_name: str) -> str:
model_name: The bare model name (vendor prefix already stripped).
Returns:
A DeepSeek-accepted model identifier.
One of ``"deepseek-chat"`` or ``"deepseek-reasoner"``.
"""
bare = _strip_vendor_prefix(model_name).lower()
if bare in _DEEPSEEK_CANONICAL_MODELS:
return bare
# V-series first-class IDs (v4-pro, v4-flash, future v5-*, dated variants)
if _DEEPSEEK_V_SERIES_RE.match(bare):
return bare
# Check for reasoner-like keywords anywhere in the name
for keyword in _DEEPSEEK_REASONER_KEYWORDS:
if keyword in bare:
@@ -378,9 +347,6 @@ def normalize_model_for_provider(model_input: str, target_provider: str) -> str:
>>> normalize_model_for_provider("claude-sonnet-4.6", "zai")
'claude-sonnet-4.6'
>>> normalize_model_for_provider("MiMo-V2.5-Pro", "xiaomi")
'mimo-v2.5-pro'
"""
name = (model_input or "").strip()
if not name:
@@ -444,12 +410,7 @@ def normalize_model_for_provider(model_input: str, target_provider: str) -> str:
# --- Direct providers: repair matching provider prefixes only ---
if provider in _MATCHING_PREFIX_STRIP_PROVIDERS:
result = _strip_matching_provider_prefix(name, provider)
# Some providers require lowercase model IDs (e.g. Xiaomi's API
# rejects "MiMo-V2.5-Pro" but accepts "mimo-v2.5-pro").
if provider in _LOWERCASE_MODEL_PROVIDERS:
result = result.lower()
return result
return _strip_matching_provider_prefix(name, provider)
# --- Authoritative native providers: preserve user-facing slugs as-is ---
if provider in _AUTHORITATIVE_NATIVE_PROVIDERS:
+8 -71
View File
@@ -527,42 +527,6 @@ def _resolve_alias_fallback(
return None
def resolve_display_context_length(
model: str,
provider: str,
base_url: str = "",
api_key: str = "",
model_info: Optional[ModelInfo] = None,
) -> Optional[int]:
"""Resolve the context length to show in /model output.
models.dev reports per-vendor context (e.g. gpt-5.5 = 1.05M on openai)
but provider-enforced limits can be lower (e.g. Codex OAuth caps the
same slug at 272k). The authoritative source is
``agent.model_metadata.get_model_context_length`` which already knows
about Codex OAuth, Copilot, Nous, and falls back to models.dev for the
rest.
Prefer the provider-aware value; fall back to ``model_info.context_window``
only if the resolver returns nothing.
"""
try:
from agent.model_metadata import get_model_context_length
ctx = get_model_context_length(
model,
base_url=base_url or "",
api_key=api_key or "",
provider=provider or None,
)
if ctx:
return int(ctx)
except Exception:
pass
if model_info is not None and model_info.context_window:
return int(model_info.context_window)
return None
# ---------------------------------------------------------------------------
# Core model-switching pipeline
# ---------------------------------------------------------------------------
@@ -807,10 +771,7 @@ def switch_model(
if provider_changed or explicit_provider:
try:
runtime = resolve_runtime_provider(
requested=target_provider,
target_model=new_model,
)
runtime = resolve_runtime_provider(requested=target_provider)
api_key = runtime.get("api_key", "")
base_url = runtime.get("base_url", "")
api_mode = runtime.get("api_mode", "")
@@ -827,10 +788,7 @@ def switch_model(
)
else:
try:
runtime = resolve_runtime_provider(
requested=current_provider,
target_model=new_model,
)
runtime = resolve_runtime_provider(requested=current_provider)
api_key = runtime.get("api_key", "")
base_url = runtime.get("base_url", "")
api_mode = runtime.get("api_mode", "")
@@ -857,7 +815,6 @@ def switch_model(
target_provider,
api_key=api_key,
base_url=base_url,
api_mode=api_mode or None,
)
except Exception as e:
validation = {
@@ -979,7 +936,7 @@ def list_authenticated_providers(
from hermes_cli.auth import PROVIDER_REGISTRY
from hermes_cli.models import (
OPENROUTER_MODELS, _PROVIDER_MODELS,
_MODELS_DEV_PREFERRED, _merge_with_models_dev, provider_model_ids,
_MODELS_DEV_PREFERRED, _merge_with_models_dev,
)
results: List[dict] = []
@@ -1027,14 +984,6 @@ def list_authenticated_providers(
# Check if any env var is set
has_creds = any(os.environ.get(ev) for ev in env_vars)
if not has_creds:
try:
from hermes_cli.auth import _load_auth_store
store = _load_auth_store()
if store and hermes_id in store.get("credential_pool", {}):
has_creds = True
except Exception:
pass
if not has_creds:
continue
@@ -1146,14 +1095,11 @@ def list_authenticated_providers(
if not has_creds:
continue
if hermes_slug in {"copilot", "copilot-acp"}:
model_ids = provider_model_ids(hermes_slug)
else:
# Use curated list — look up by Hermes slug, fall back to overlay key
model_ids = curated.get(hermes_slug, []) or curated.get(pid, [])
# Merge with models.dev for preferred providers (same rationale as above).
if hermes_slug in _MODELS_DEV_PREFERRED:
model_ids = _merge_with_models_dev(hermes_slug, model_ids)
# Use curated list — look up by Hermes slug, fall back to overlay key
model_ids = curated.get(hermes_slug, []) or curated.get(pid, [])
# Merge with models.dev for preferred providers (same rationale as above).
if hermes_slug in _MODELS_DEV_PREFERRED:
model_ids = _merge_with_models_dev(hermes_slug, model_ids)
total = len(model_ids)
top = model_ids[:max_models]
@@ -1276,15 +1222,6 @@ def list_authenticated_providers(
if m and m not in models_list:
models_list.append(m)
# Official OpenAI API rows in providers: often have base_url but no
# explicit models: dict — avoid a misleading zero count in /model.
if not models_list:
url_lower = str(api_url).strip().lower()
if "api.openai.com" in url_lower:
fb = curated.get("openai") or []
if fb:
models_list = list(fb)
# Try to probe /v1/models if URL is set (but don't block on it)
# For now just show what we know from config
results.append({
+87 -335
View File
@@ -33,8 +33,6 @@ COPILOT_REASONING_EFFORTS_O_SERIES = ["low", "medium", "high"]
# (model_id, display description shown in menus)
OPENROUTER_MODELS: list[tuple[str, str]] = [
("moonshotai/kimi-k2.6", "recommended"),
("deepseek/deepseek-v4-pro", ""),
("deepseek/deepseek-v4-flash", ""),
("anthropic/claude-opus-4.7", ""),
("anthropic/claude-opus-4.6", ""),
("anthropic/claude-sonnet-4.6", ""),
@@ -42,7 +40,7 @@ OPENROUTER_MODELS: list[tuple[str, str]] = [
("anthropic/claude-sonnet-4.5", ""),
("anthropic/claude-haiku-4.5", ""),
("openrouter/elephant-alpha", "free"),
("openai/gpt-5.5", ""),
("openai/gpt-5.4", ""),
("openai/gpt-5.4-mini", ""),
("xiaomi/mimo-v2.5-pro", ""),
("xiaomi/mimo-v2.5", ""),
@@ -65,7 +63,7 @@ OPENROUTER_MODELS: list[tuple[str, str]] = [
("nvidia/nemotron-3-super-120b-a12b:free", "free"),
("arcee-ai/trinity-large-preview:free", "free"),
("arcee-ai/trinity-large-thinking", ""),
("openai/gpt-5.5-pro", ""),
("openai/gpt-5.4-pro", ""),
("openai/gpt-5.4-nano", ""),
]
@@ -111,8 +109,6 @@ def _codex_curated_models() -> list[str]:
_PROVIDER_MODELS: dict[str, list[str]] = {
"nous": [
"moonshotai/kimi-k2.6",
"deepseek/deepseek-v4-pro",
"deepseek/deepseek-v4-flash",
"xiaomi/mimo-v2.5-pro",
"xiaomi/mimo-v2.5",
"anthropic/claude-opus-4.7",
@@ -120,7 +116,7 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"anthropic/claude-sonnet-4.6",
"anthropic/claude-sonnet-4.5",
"anthropic/claude-haiku-4.5",
"openai/gpt-5.5",
"openai/gpt-5.4",
"openai/gpt-5.4-mini",
"openai/gpt-5.3-codex",
"google/gemini-3-pro-preview",
@@ -139,21 +135,9 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"x-ai/grok-4.20-beta",
"nvidia/nemotron-3-super-120b-a12b",
"arcee-ai/trinity-large-thinking",
"openai/gpt-5.5-pro",
"openai/gpt-5.4-pro",
"openai/gpt-5.4-nano",
],
# Native OpenAI Chat Completions (api.openai.com). Used by /model counts and
# provider_model_ids fallback when /v1/models is unavailable.
"openai": [
"gpt-5.4",
"gpt-5.4-mini",
"gpt-5-mini",
"gpt-5.3-codex",
"gpt-5.2-codex",
"gpt-4.1",
"gpt-4o",
"gpt-4o-mini",
],
"openai-codex": _codex_curated_models(),
"copilot-acp": [
"copilot-acp",
@@ -167,13 +151,10 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"gpt-4.1",
"gpt-4o",
"gpt-4o-mini",
"claude-opus-4.6",
"claude-sonnet-4.6",
"claude-sonnet-4",
"claude-sonnet-4.5",
"claude-haiku-4.5",
"gemini-3.1-pro-preview",
"gemini-3-pro-preview",
"gemini-3-flash-preview",
"gemini-2.5-pro",
"grok-code-fast-1",
],
@@ -265,8 +246,6 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"claude-haiku-4-5-20251001",
],
"deepseek": [
"deepseek-v4-pro",
"deepseek-v4-flash",
"deepseek-chat",
"deepseek-reasoner",
],
@@ -697,7 +676,7 @@ def get_nous_recommended_aux_model(
# ---------------------------------------------------------------------------
# Canonical provider list — single source of truth for provider identity.
# Every code path that lists, displays, or iterates providers derives from
# this list: hermes model, /model, list_authenticated_providers.
# this list: hermes model, /model, /provider, list_authenticated_providers.
#
# Fields:
# slug — internal provider ID (used in config.yaml, --provider flag)
@@ -1125,10 +1104,7 @@ def fetch_models_with_pricing(
return _pricing_cache[cache_key]
url = cache_key.rstrip("/") + "/v1/models"
headers: dict[str, str] = {
"Accept": "application/json",
"User-Agent": _HERMES_USER_AGENT,
}
headers: dict[str, str] = {"Accept": "application/json"}
if api_key:
headers["Authorization"] = f"Bearer {api_key}"
@@ -1379,124 +1355,6 @@ def curated_models_for_provider(
return [(m, "") for m in models]
def _provider_keys(provider: str) -> set[str]:
key = (provider or "").strip().lower()
normalized = normalize_provider(provider)
return {k for k in (key, normalized) if k}
def _model_in_provider_catalog(name_lower: str, providers: set[str]) -> bool:
return any(
name_lower == model.lower()
for provider in providers
for model in _PROVIDER_MODELS.get(provider, [])
)
_AGGREGATOR_PROVIDERS = frozenset(
{"nous", "openrouter", "ai-gateway", "copilot", "kilocode"}
)
def _resolve_static_model_alias(
name_lower: str,
current_keys: set[str],
) -> Optional[tuple[str, str]]:
"""Resolve short aliases (e.g. sonnet/opus) using static catalogs only."""
try:
from hermes_cli.model_switch import MODEL_ALIASES
except Exception:
return None
identity = MODEL_ALIASES.get(name_lower)
if identity is None:
return None
vendor = identity.vendor
family = identity.family
def _match(provider: str) -> Optional[str]:
models = _PROVIDER_MODELS.get(provider, [])
if not models:
return None
prefix = (
f"{vendor}/{family}"
if provider in _AGGREGATOR_PROVIDERS
else family
).lower()
for model in models:
if model.lower().startswith(prefix):
return model
return None
for provider in current_keys:
if matched := _match(provider):
return provider, matched
for provider in _PROVIDER_MODELS:
if provider in current_keys or provider in _AGGREGATOR_PROVIDERS:
continue
if matched := _match(provider):
return provider, matched
for provider in _AGGREGATOR_PROVIDERS:
if provider in current_keys and (matched := _match(provider)):
return provider, matched
return None
def detect_static_provider_for_model(
model_name: str,
current_provider: str,
) -> Optional[tuple[str, str]]:
"""Auto-detect a provider from static catalogs only.
Returns ``(provider_id, model_name)``. The model name may be remapped
when a static alias or bare provider name resolves to a catalog default.
Returns ``None`` when no confident match is found.
"""
name = (model_name or "").strip()
if not name:
return None
name_lower = name.lower()
current_keys = _provider_keys(current_provider)
alias_match = _resolve_static_model_alias(name_lower, current_keys)
if alias_match:
return alias_match
# --- Step 0: bare provider name typed as model ---
# If someone types `/model nous` or `/model anthropic`, treat it as a
# provider switch and pick the first model from that provider's catalog.
# Skip "custom" and "openrouter" — custom has no model catalog, and
# openrouter requires an explicit model name to be useful.
resolved_provider = _PROVIDER_ALIASES.get(name_lower, name_lower)
if resolved_provider not in {"custom", "openrouter"}:
default_models = _PROVIDER_MODELS.get(resolved_provider, [])
if (
resolved_provider in _PROVIDER_LABELS
and default_models
and resolved_provider not in current_keys
):
return (resolved_provider, default_models[0])
# Aggregators list other providers' models — never auto-switch TO them
# If the model belongs to the current provider's catalog, don't suggest switching
if _model_in_provider_catalog(name_lower, current_keys):
return None
# --- Step 1: check static provider catalogs for a direct match ---
for pid, models in _PROVIDER_MODELS.items():
if pid in current_keys or pid in _AGGREGATOR_PROVIDERS:
continue
if any(name_lower == m.lower() for m in models):
return (pid, name)
return None
def detect_provider_for_model(
model_name: str,
current_provider: str,
@@ -1509,19 +1367,86 @@ def detect_provider_for_model(
Priority:
0. Bare provider name switch to that provider's default model
1. Direct provider static catalog match
2. OpenRouter catalog match
1. Direct provider with credentials (highest)
2. Direct provider without credentials remap to OpenRouter slug
3. OpenRouter catalog match
"""
name = (model_name or "").strip()
if not name:
return None
static_match = detect_static_provider_for_model(name, current_provider)
if static_match:
return static_match
if _model_in_provider_catalog(name.lower(), _provider_keys(current_provider)):
name_lower = name.lower()
# --- Step 0: bare provider name typed as model ---
# If someone types `/model nous` or `/model anthropic`, treat it as a
# provider switch and pick the first model from that provider's catalog.
# Skip "custom" and "openrouter" — custom has no model catalog, and
# openrouter requires an explicit model name to be useful.
resolved_provider = _PROVIDER_ALIASES.get(name_lower, name_lower)
if resolved_provider not in {"custom", "openrouter"}:
default_models = _PROVIDER_MODELS.get(resolved_provider, [])
if (
resolved_provider in _PROVIDER_LABELS
and default_models
and resolved_provider != normalize_provider(current_provider)
):
return (resolved_provider, default_models[0])
# Aggregators list other providers' models — never auto-switch TO them
_AGGREGATORS = {"nous", "openrouter", "ai-gateway", "copilot", "kilocode"}
# If the model belongs to the current provider's catalog, don't suggest switching
current_models = _PROVIDER_MODELS.get(current_provider, [])
if any(name_lower == m.lower() for m in current_models):
return None
# --- Step 1: check static provider catalogs for a direct match ---
direct_match: Optional[str] = None
for pid, models in _PROVIDER_MODELS.items():
if pid == current_provider or pid in _AGGREGATORS:
continue
if any(name_lower == m.lower() for m in models):
direct_match = pid
break
if direct_match:
# Check if we have credentials for this provider — env vars,
# credential pool, or auth store entries.
has_creds = False
try:
from hermes_cli.auth import PROVIDER_REGISTRY
pconfig = PROVIDER_REGISTRY.get(direct_match)
if pconfig:
for env_var in pconfig.api_key_env_vars:
if os.getenv(env_var, "").strip():
has_creds = True
break
except Exception:
pass
# Also check credential pool and auth store — covers OAuth,
# Claude Code tokens, and other non-env-var credentials (#10300).
if not has_creds:
try:
from agent.credential_pool import load_pool
pool = load_pool(direct_match)
if pool.has_credentials():
has_creds = True
except Exception:
pass
if not has_creds:
try:
from hermes_cli.auth import _load_auth_store
store = _load_auth_store()
if direct_match in store.get("providers", {}) or direct_match in store.get("credential_pool", {}):
has_creds = True
except Exception:
pass
# Always return the direct provider match. If credentials are
# missing, the client init will give a clear error rather than
# silently routing through the wrong provider (#10300).
return (direct_match, name)
# --- Step 2: check OpenRouter catalog ---
# First try exact match (handles provider/model format)
or_slug = _find_openrouter_slug(name)
@@ -1811,17 +1736,6 @@ def provider_model_ids(provider: Optional[str], *, force_refresh: bool = False)
live = fetch_ollama_cloud_models(force_refresh=force_refresh)
if live:
return live
if normalized == "openai":
api_key = os.getenv("OPENAI_API_KEY", "").strip()
if api_key:
base_raw = os.getenv("OPENAI_BASE_URL", "").strip().rstrip("/")
base = base_raw or "https://api.openai.com/v1"
try:
live = fetch_api_models(api_key, base)
if live:
return live
except Exception:
pass
if normalized == "custom":
base_url = _get_custom_base_url()
if base_url:
@@ -1976,51 +1890,6 @@ def fetch_github_model_catalog(
return None
# ─── Copilot catalog context-window helpers ─────────────────────────────────
# Module-level cache: {model_id: max_prompt_tokens}
_copilot_context_cache: dict[str, int] = {}
_copilot_context_cache_time: float = 0.0
_COPILOT_CONTEXT_CACHE_TTL = 3600 # 1 hour
def get_copilot_model_context(model_id: str, api_key: Optional[str] = None) -> Optional[int]:
"""Look up max_prompt_tokens for a Copilot model from the live /models API.
Results are cached in-process for 1 hour to avoid repeated API calls.
Returns the token limit or None if not found.
"""
global _copilot_context_cache, _copilot_context_cache_time
# Serve from cache if fresh
if _copilot_context_cache and (time.time() - _copilot_context_cache_time < _COPILOT_CONTEXT_CACHE_TTL):
if model_id in _copilot_context_cache:
return _copilot_context_cache[model_id]
# Cache is fresh but model not in it — don't re-fetch
return None
# Fetch and populate cache
catalog = fetch_github_model_catalog(api_key=api_key)
if not catalog:
return None
cache: dict[str, int] = {}
for item in catalog:
mid = str(item.get("id") or "").strip()
if not mid:
continue
caps = item.get("capabilities") or {}
limits = caps.get("limits") or {}
max_prompt = limits.get("max_prompt_tokens")
if isinstance(max_prompt, int) and max_prompt > 0:
cache[mid] = max_prompt
_copilot_context_cache = cache
_copilot_context_cache_time = time.time()
return cache.get(model_id)
def _is_github_models_base_url(base_url: Optional[str]) -> bool:
normalized = (base_url or "").strip().rstrip("/").lower()
return (
@@ -2054,7 +1923,6 @@ _COPILOT_MODEL_ALIASES = {
"openai/o4-mini": "gpt-5-mini",
"anthropic/claude-opus-4.6": "claude-opus-4.6",
"anthropic/claude-sonnet-4.6": "claude-sonnet-4.6",
"anthropic/claude-sonnet-4": "claude-sonnet-4",
"anthropic/claude-sonnet-4.5": "claude-sonnet-4.5",
"anthropic/claude-haiku-4.5": "claude-haiku-4.5",
# Dash-notation fallbacks: Hermes' default Claude IDs elsewhere use
@@ -2064,12 +1932,10 @@ _COPILOT_MODEL_ALIASES = {
# "model_not_supported". See issue #6879.
"claude-opus-4-6": "claude-opus-4.6",
"claude-sonnet-4-6": "claude-sonnet-4.6",
"claude-sonnet-4-0": "claude-sonnet-4",
"claude-sonnet-4-5": "claude-sonnet-4.5",
"claude-haiku-4-5": "claude-haiku-4.5",
"anthropic/claude-opus-4-6": "claude-opus-4.6",
"anthropic/claude-sonnet-4-6": "claude-sonnet-4.6",
"anthropic/claude-sonnet-4-0": "claude-sonnet-4",
"anthropic/claude-sonnet-4-5": "claude-sonnet-4.5",
"anthropic/claude-haiku-4-5": "claude-haiku-4.5",
}
@@ -2294,15 +2160,8 @@ def probe_api_models(
api_key: Optional[str],
base_url: Optional[str],
timeout: float = 5.0,
api_mode: Optional[str] = None,
) -> dict[str, Any]:
"""Probe a ``/models`` endpoint with light URL heuristics.
For ``anthropic_messages`` mode, uses ``x-api-key`` and
``anthropic-version`` headers (Anthropic's native auth) instead of
``Authorization: Bearer``. The response shape (``data[].id``) is
identical, so the same parser works for both.
"""
"""Probe an OpenAI-compatible ``/models`` endpoint with light URL heuristics."""
normalized = (base_url or "").strip().rstrip("/")
if not normalized:
return {
@@ -2334,10 +2193,7 @@ def probe_api_models(
tried: list[str] = []
headers: dict[str, str] = {"User-Agent": _HERMES_USER_AGENT}
if api_key and api_mode == "anthropic_messages":
headers["x-api-key"] = api_key
headers["anthropic-version"] = "2023-06-01"
elif api_key:
if api_key:
headers["Authorization"] = f"Bearer {api_key}"
if normalized.startswith(COPILOT_BASE_URL):
headers.update(copilot_default_headers())
@@ -2379,10 +2235,7 @@ def _fetch_ai_gateway_models(timeout: float = 5.0) -> Optional[list[str]]:
base_url = AI_GATEWAY_BASE_URL
url = base_url.rstrip("/") + "/models"
headers: dict[str, str] = {
"Authorization": f"Bearer {api_key}",
"User-Agent": _HERMES_USER_AGENT,
}
headers: dict[str, str] = {"Authorization": f"Bearer {api_key}"}
req = urllib.request.Request(url, headers=headers)
try:
with urllib.request.urlopen(req, timeout=timeout) as resp:
@@ -2402,14 +2255,13 @@ def fetch_api_models(
api_key: Optional[str],
base_url: Optional[str],
timeout: float = 5.0,
api_mode: Optional[str] = None,
) -> Optional[list[str]]:
"""Fetch the list of available model IDs from the provider's ``/models`` endpoint.
Returns a list of model ID strings, or ``None`` if the endpoint could not
be reached (network error, timeout, auth failure, etc.).
"""
return probe_api_models(api_key, base_url, timeout=timeout, api_mode=api_mode).get("models")
return probe_api_models(api_key, base_url, timeout=timeout).get("models")
# ---------------------------------------------------------------------------
@@ -2537,7 +2389,6 @@ def validate_requested_model(
*,
api_key: Optional[str] = None,
base_url: Optional[str] = None,
api_mode: Optional[str] = None,
) -> dict[str, Any]:
"""
Validate a ``/model`` value for the active provider.
@@ -2579,11 +2430,7 @@ def validate_requested_model(
}
if normalized == "custom":
# Try probing with correct auth for the api_mode.
if api_mode == "anthropic_messages":
probe = probe_api_models(api_key, base_url, api_mode=api_mode)
else:
probe = probe_api_models(api_key, base_url)
probe = probe_api_models(api_key, base_url)
api_models = probe.get("models")
if api_models is not None:
if requested_for_lookup in set(api_models):
@@ -2632,17 +2479,12 @@ def validate_requested_model(
f"Note: could not reach this custom endpoint's model listing at `{probe.get('probed_url')}`. "
f"Hermes will still save `{requested}`, but the endpoint should expose `/models` for verification."
)
if api_mode == "anthropic_messages":
message += (
"\n Many Anthropic-compatible proxies do not implement the Models API "
"(GET /v1/models). The model name has been accepted without verification."
)
if probe.get("suggested_base_url"):
message += f"\n If this server expects `/v1`, try base URL: `{probe.get('suggested_base_url')}`"
return {
"accepted": api_mode == "anthropic_messages",
"persist": True,
"accepted": False,
"persist": False,
"recognized": False,
"message": message,
}
@@ -2730,100 +2572,10 @@ def validate_requested_model(
),
}
# Native Anthropic provider: /v1/models requires x-api-key (or Bearer for
# OAuth) plus anthropic-version headers. The generic OpenAI-style probe
# below uses plain Bearer auth and 401s against Anthropic, so dispatch to
# the native fetcher which handles both API keys and Claude-Code OAuth
# tokens. (The api_mode=="anthropic_messages" branch below handles the
# Messages-API transport case separately.)
if normalized == "anthropic":
anthropic_models = _fetch_anthropic_models()
if anthropic_models is not None:
if requested_for_lookup in set(anthropic_models):
return {
"accepted": True,
"persist": True,
"recognized": True,
"message": None,
}
auto = get_close_matches(requested_for_lookup, anthropic_models, n=1, cutoff=0.9)
if auto:
return {
"accepted": True,
"persist": True,
"recognized": True,
"corrected_model": auto[0],
"message": f"Auto-corrected `{requested}` → `{auto[0]}`",
}
suggestions = get_close_matches(requested, anthropic_models, n=3, cutoff=0.5)
suggestion_text = ""
if suggestions:
suggestion_text = "\n Similar models: " + ", ".join(f"`{s}`" for s in suggestions)
# Accept anyway — Anthropic sometimes gates newer/preview models
# (e.g. snapshot IDs, early-access releases) behind accounts
# even though they aren't listed on /v1/models.
return {
"accepted": True,
"persist": True,
"recognized": False,
"message": (
f"Note: `{requested}` was not found in Anthropic's /v1/models listing. "
f"It may still work if you have early-access or snapshot IDs."
f"{suggestion_text}"
),
}
# _fetch_anthropic_models returned None — no token resolvable or
# network failure. Fall through to the generic warning below.
# Anthropic Messages API: many proxies don't implement /v1/models.
# Try probing with correct auth; if it fails, accept with a warning.
if api_mode == "anthropic_messages":
api_models = fetch_api_models(api_key, base_url, api_mode=api_mode)
if api_models is not None:
if requested_for_lookup in set(api_models):
return {
"accepted": True,
"persist": True,
"recognized": True,
"message": None,
}
auto = get_close_matches(requested_for_lookup, api_models, n=1, cutoff=0.9)
if auto:
return {
"accepted": True,
"persist": True,
"recognized": True,
"corrected_model": auto[0],
"message": f"Auto-corrected `{requested}` → `{auto[0]}`",
}
# Probe failed or model not found — accept anyway (proxy likely
# doesn't implement the Anthropic Models API).
return {
"accepted": True,
"persist": True,
"recognized": False,
"message": (
f"Note: could not verify `{requested}` against this endpoint's "
f"model listing. Many Anthropic-compatible proxies do not "
f"implement GET /v1/models. The model name has been accepted "
f"without verification."
),
}
# Probe the live API to check if the model actually exists
api_models = fetch_api_models(api_key, base_url)
if api_models is not None:
# Gemini's OpenAI-compat /v1beta/openai/models endpoint returns IDs
# prefixed with "models/" (e.g. "models/gemini-2.5-flash") — native
# Gemini-API convention. Our curated list and user input both use
# the bare ID, so a direct set-membership check drops every known
# Gemini model. Strip the prefix before comparison. See #12532.
if normalized == "gemini":
api_models = [
m[len("models/"):] if isinstance(m, str) and m.startswith("models/") else m
for m in api_models
]
if requested_for_lookup in set(api_models):
# API confirmed the model exists
return {
-202
View File
@@ -1,202 +0,0 @@
"""Oneshot (-z) mode: send a prompt, get the final content block, exit.
Bypasses cli.py entirely. No banner, no spinner, no session_id line,
no stderr chatter. Just the agent's final text to stdout.
Toolsets = whatever the user has configured for "cli" in `hermes tools`.
Rules / memory / AGENTS.md / preloaded skills = same as a normal chat turn.
Approvals = auto-bypassed (HERMES_YOLO_MODE=1 is set for the call).
Working directory = the user's CWD (AGENTS.md etc. resolve from there as usual).
Model / provider selection mirrors `hermes chat`:
- Both optional. If omitted, use the user's configured default.
- If both given, pair them exactly as given.
- If only --model given, auto-detect the provider that serves it.
- If only --provider given, error out (ambiguous caller must pick a model).
Env var fallbacks (used when the corresponding arg is not passed):
- HERMES_INFERENCE_MODEL
- HERMES_INFERENCE_PROVIDER (already read by resolve_runtime_provider)
"""
from __future__ import annotations
import logging
import os
import sys
from contextlib import redirect_stderr, redirect_stdout
from typing import Optional
def run_oneshot(
prompt: str,
model: Optional[str] = None,
provider: Optional[str] = None,
) -> int:
"""Execute a single prompt and print only the final content block.
Args:
prompt: The user message to send.
model: Optional model override. Falls back to HERMES_INFERENCE_MODEL
env var, then config.yaml's model.default / model.model.
provider: Optional provider override. Falls back to
HERMES_INFERENCE_PROVIDER env var, then config.yaml's model.provider,
then "auto".
Returns the exit code. Caller should sys.exit() with the return.
"""
# Silence every stdlib logger for the duration. AIAgent, tools, and
# provider adapters all log to stderr through the root logger; file
# handlers added by setup_logging() keep working (they're attached to
# the root logger's handler list, not affected by level), but no
# bytes reach the terminal.
logging.disable(logging.CRITICAL)
# --provider without --model is ambiguous: carrying the user's configured
# model across to a different provider is usually wrong (that provider may
# not host it), and silently picking the provider's catalog default hides
# the mismatch. Require the caller to be explicit. Validate BEFORE the
# stderr redirect so the message actually reaches the terminal.
env_model_early = os.getenv("HERMES_INFERENCE_MODEL", "").strip()
if provider and not ((model or "").strip() or env_model_early):
sys.stderr.write(
"hermes -z: --provider requires --model (or HERMES_INFERENCE_MODEL). "
"Pass both explicitly, or neither to use your configured defaults.\n"
)
return 2
# Auto-approve any shell / tool approvals. Non-interactive by
# definition — a prompt would hang forever.
os.environ["HERMES_YOLO_MODE"] = "1"
os.environ["HERMES_ACCEPT_HOOKS"] = "1"
# Redirect stderr AND stdout to devnull for the entire call tree.
# We'll print the final response to the real stdout at the end.
real_stdout = sys.stdout
devnull = open(os.devnull, "w")
try:
with redirect_stdout(devnull), redirect_stderr(devnull):
response = _run_agent(prompt, model=model, provider=provider)
finally:
try:
devnull.close()
except Exception:
pass
if response:
real_stdout.write(response)
if not response.endswith("\n"):
real_stdout.write("\n")
real_stdout.flush()
return 0
def _run_agent(
prompt: str,
model: Optional[str] = None,
provider: Optional[str] = None,
) -> str:
"""Build an AIAgent exactly like a normal CLI chat turn would, then
run a single conversation. Returns the final response string."""
# Imports are local so they don't run when hermes is invoked for
# other commands (keeps top-level CLI startup cheap).
from hermes_cli.config import load_config
from hermes_cli.models import detect_provider_for_model
from hermes_cli.runtime_provider import resolve_runtime_provider
from hermes_cli.tools_config import _get_platform_tools
from run_agent import AIAgent
cfg = load_config()
# Resolve effective model: explicit arg → env var → config.
model_cfg = cfg.get("model") or {}
if isinstance(model_cfg, str):
cfg_model = model_cfg
else:
cfg_model = model_cfg.get("default") or model_cfg.get("model") or ""
env_model = os.getenv("HERMES_INFERENCE_MODEL", "").strip()
effective_model = (model or "").strip() or env_model or cfg_model
# Resolve effective provider: explicit arg → (auto-detect from model if
# model was explicit) → env / config (handled inside resolve_runtime_provider).
#
# When --model is given without --provider, auto-detect the provider that
# serves that model — same semantic as `/model <name>` in an interactive
# session. Without this, resolve_runtime_provider() would fall back to
# the user's configured default provider, which may not host the model
# the caller just asked for.
effective_provider = (provider or "").strip() or None
if effective_provider is None and (model or env_model):
# Only auto-detect when the model was explicitly requested via arg or
# env var (not when it came from config — that's the "use my defaults"
# path and the configured provider is already correct).
explicit_model = (model or "").strip() or env_model
if explicit_model:
cfg_provider = ""
if isinstance(model_cfg, dict):
cfg_provider = str(model_cfg.get("provider") or "").strip().lower()
current_provider = (
cfg_provider
or os.getenv("HERMES_INFERENCE_PROVIDER", "").strip().lower()
or "auto"
)
detected = detect_provider_for_model(explicit_model, current_provider)
if detected:
effective_provider, effective_model = detected
runtime = resolve_runtime_provider(
requested=effective_provider,
target_model=effective_model or None,
)
# Pull in whatever toolsets the user has enabled for "cli".
# sorted() gives stable ordering; set→list for AIAgent's signature.
toolsets_list = sorted(_get_platform_tools(cfg, "cli"))
agent = AIAgent(
api_key=runtime.get("api_key"),
base_url=runtime.get("base_url"),
provider=runtime.get("provider"),
api_mode=runtime.get("api_mode"),
model=effective_model,
enabled_toolsets=toolsets_list,
quiet_mode=True,
platform="cli",
credential_pool=runtime.get("credential_pool"),
# Interactive callbacks are intentionally NOT wired beyond this
# one. In oneshot mode there's no user sitting at a terminal:
# - clarify → returns a synthetic "pick a default" instruction
# so the agent continues instead of stalling on
# the tool's built-in "not available" error
# - sudo password prompt → terminal_tool gates on
# HERMES_INTERACTIVE which we never set
# - shell-hook approval → auto-approved via HERMES_ACCEPT_HOOKS=1
# (set above); also falls back to deny on non-tty
# - dangerous-command approval → bypassed via HERMES_YOLO_MODE=1
# - skill secret capture → returns gracefully when no callback set
clarify_callback=_oneshot_clarify_callback,
)
# Belt-and-braces: make sure AIAgent doesn't invoke any streaming
# display callbacks that would bypass our stdout capture.
agent.suppress_status_output = True
agent.stream_delta_callback = None
agent.tool_gen_callback = None
return agent.chat(prompt) or ""
def _oneshot_clarify_callback(question: str, choices=None) -> str:
"""Clarify is disabled in oneshot mode — tell the agent to pick a
default and proceed instead of stalling or erroring."""
if choices:
return (
f"[oneshot mode: no user available. Pick the best option from "
f"{choices} using your own judgment and continue.]"
)
return (
"[oneshot mode: no user available. Make the most reasonable "
"assumption you can and continue.]"
)
-8
View File
@@ -71,14 +71,6 @@ VALID_HOOKS: Set[str] = {
"on_session_finalize",
"on_session_reset",
"subagent_stop",
# Gateway pre-dispatch hook. Fired once per incoming MessageEvent
# after the internal-event guard but BEFORE auth/pairing and agent
# dispatch. Plugins may return a dict to influence flow:
# {"action": "skip", "reason": "..."} -> drop message (no reply)
# {"action": "rewrite", "text": "..."} -> replace event.text, continue
# {"action": "allow"} / None -> normal dispatch
# Kwargs: event: MessageEvent, gateway: GatewayRunner, session_store.
"pre_gateway_dispatch",
}
ENTRY_POINTS_GROUP = "hermes_agent.plugins"
-7
View File
@@ -116,10 +116,6 @@ HERMES_OVERLAYS: Dict[str, HermesOverlay] = {
transport="openai_chat",
base_url_env_var="DASHSCOPE_BASE_URL",
),
"alibaba-coding-plan": HermesOverlay(
transport="openai_chat",
base_url_env_var="ALIBABA_CODING_PLAN_BASE_URL",
),
"vercel": HermesOverlay(
transport="openai_chat",
is_aggregator=True,
@@ -263,9 +259,6 @@ ALIASES: Dict[str, str] = {
"aliyun": "alibaba",
"qwen": "alibaba",
"alibaba-cloud": "alibaba",
"alibaba_coding": "alibaba-coding-plan",
"alibaba-coding": "alibaba-coding-plan",
"alibaba_coding_plan": "alibaba-coding-plan",
# google-gemini-cli (OAuth + Code Assist)
"gemini-cli": "google-gemini-cli",
-229
View File
@@ -1,229 +0,0 @@
"""PTY bridge for `hermes dashboard` chat tab.
Wraps a child process behind a pseudo-terminal so its ANSI output can be
streamed to a browser-side terminal emulator (xterm.js) and typed
keystrokes can be fed back in. The only caller today is the
``/api/pty`` WebSocket endpoint in ``hermes_cli.web_server``.
Design constraints:
* **POSIX-only.** Hermes Agent supports Windows exclusively via WSL, which
exposes a native POSIX PTY via ``openpty(3)``. Native Windows Python
has no PTY; :class:`PtyUnavailableError` is raised with a user-readable
install/platform message so the dashboard can render a banner instead of
crashing.
* **Zero Node dependency on the server side.** We use :mod:`ptyprocess`,
which is a pure-Python wrapper around the OS calls. The browser talks
to the same ``hermes --tui`` binary it would launch from the CLI, so
every TUI feature (slash popover, model picker, tool rows, markdown,
skin engine, clarify/sudo/approval prompts) ships automatically.
* **Byte-safe I/O.** Reads and writes go through the PTY master fd
directly we avoid :class:`ptyprocess.PtyProcessUnicode` because
streaming ANSI is inherently byte-oriented and UTF-8 boundaries may land
mid-read.
"""
from __future__ import annotations
import errno
import fcntl
import os
import select
import signal
import struct
import sys
import termios
import time
from typing import Optional, Sequence
try:
import ptyprocess # type: ignore
_PTY_AVAILABLE = not sys.platform.startswith("win")
except ImportError: # pragma: no cover - dev env without ptyprocess
ptyprocess = None # type: ignore
_PTY_AVAILABLE = False
__all__ = ["PtyBridge", "PtyUnavailableError"]
class PtyUnavailableError(RuntimeError):
"""Raised when a PTY cannot be created on this platform.
Today this means native Windows (no ConPTY bindings) or a dev
environment missing the ``ptyprocess`` dependency. The dashboard
surfaces the message to the user as a chat-tab banner.
"""
class PtyBridge:
"""Thin wrapper around ``ptyprocess.PtyProcess`` for byte streaming.
Not thread-safe. A single bridge is owned by the WebSocket handler
that spawned it; the reader runs in an executor thread while writes
happen on the event-loop thread. Both sides are OK because the
kernel PTY is the actual synchronization point we never call
:mod:`ptyprocess` methods concurrently, we only call ``os.read`` and
``os.write`` on the master fd, which is safe.
"""
def __init__(self, proc: "ptyprocess.PtyProcess"): # type: ignore[name-defined]
self._proc = proc
self._fd: int = proc.fd
self._closed = False
# -- lifecycle --------------------------------------------------------
@classmethod
def is_available(cls) -> bool:
"""True if a PTY can be spawned on this platform."""
return bool(_PTY_AVAILABLE)
@classmethod
def spawn(
cls,
argv: Sequence[str],
*,
cwd: Optional[str] = None,
env: Optional[dict] = None,
cols: int = 80,
rows: int = 24,
) -> "PtyBridge":
"""Spawn ``argv`` behind a new PTY and return a bridge.
Raises :class:`PtyUnavailableError` if the platform can't host a
PTY. Raises :class:`FileNotFoundError` or :class:`OSError` for
ordinary exec failures (missing binary, bad cwd, etc.).
"""
if not _PTY_AVAILABLE:
if sys.platform.startswith("win"):
raise PtyUnavailableError(
"Pseudo-terminals are unavailable on this platform. "
"Hermes Agent supports Windows only via WSL."
)
if ptyprocess is None:
raise PtyUnavailableError(
"The `ptyprocess` package is missing. "
"Install with: pip install ptyprocess "
"(or pip install -e '.[pty]')."
)
raise PtyUnavailableError("Pseudo-terminals are unavailable.")
# Let caller-supplied env fully override inheritance; if they pass
# None we inherit the server's env (same semantics as subprocess).
spawn_env = os.environ.copy() if env is None else env
proc = ptyprocess.PtyProcess.spawn( # type: ignore[union-attr]
list(argv),
cwd=cwd,
env=spawn_env,
dimensions=(rows, cols),
)
return cls(proc)
@property
def pid(self) -> int:
return int(self._proc.pid)
def is_alive(self) -> bool:
if self._closed:
return False
try:
return bool(self._proc.isalive())
except Exception:
return False
# -- I/O --------------------------------------------------------------
def read(self, timeout: float = 0.2) -> Optional[bytes]:
"""Read up to 64 KiB of raw bytes from the PTY master.
Returns:
* bytes zero or more bytes of child output
* empty bytes (``b""``) no data available within ``timeout``
* None child has exited and the master fd is at EOF
Never blocks longer than ``timeout`` seconds. Safe to call after
:meth:`close`; returns ``None`` in that case.
"""
if self._closed:
return None
try:
readable, _, _ = select.select([self._fd], [], [], timeout)
except (OSError, ValueError):
return None
if not readable:
return b""
try:
data = os.read(self._fd, 65536)
except OSError as exc:
# EIO on Linux = slave side closed. EBADF = already closed.
if exc.errno in (errno.EIO, errno.EBADF):
return None
raise
if not data:
return None
return data
def write(self, data: bytes) -> None:
"""Write raw bytes to the PTY master (i.e. the child's stdin)."""
if self._closed or not data:
return
# os.write can return a short write under load; loop until drained.
view = memoryview(data)
while view:
try:
n = os.write(self._fd, view)
except OSError as exc:
if exc.errno in (errno.EIO, errno.EBADF, errno.EPIPE):
return
raise
if n <= 0:
return
view = view[n:]
def resize(self, cols: int, rows: int) -> None:
"""Forward a terminal resize to the child via ``TIOCSWINSZ``."""
if self._closed:
return
# struct winsize: rows, cols, xpixel, ypixel (all unsigned short)
winsize = struct.pack("HHHH", max(1, rows), max(1, cols), 0, 0)
try:
fcntl.ioctl(self._fd, termios.TIOCSWINSZ, winsize)
except OSError:
pass
# -- teardown ---------------------------------------------------------
def close(self) -> None:
"""Terminate the child (SIGTERM → 0.5s grace → SIGKILL) and close fds.
Idempotent. Reaping the child is important so we don't leak
zombies across the lifetime of the dashboard process.
"""
if self._closed:
return
self._closed = True
# SIGHUP is the conventional "your terminal went away" signal.
# We escalate if the child ignores it.
for sig in (signal.SIGHUP, signal.SIGTERM, signal.SIGKILL):
if not self._proc.isalive():
break
try:
self._proc.kill(sig)
except Exception:
pass
deadline = time.monotonic() + 0.5
while self._proc.isalive() and time.monotonic() < deadline:
time.sleep(0.02)
try:
self._proc.close(force=True)
except Exception:
pass
# Context-manager sugar — handy in tests and ad-hoc scripts.
def __enter__(self) -> "PtyBridge":
return self
def __exit__(self, *_exc) -> None:
self.close()
+6 -64
View File
@@ -36,29 +36,6 @@ def _normalize_custom_provider_name(value: str) -> str:
return value.strip().lower().replace(" ", "-")
def _loopback_hostname(host: str) -> bool:
h = (host or "").lower().rstrip(".")
return h in {"localhost", "127.0.0.1", "::1", "0.0.0.0"}
def _config_base_url_trustworthy_for_bare_custom(cfg_base_url: str, cfg_provider: str) -> bool:
"""Decide whether ``model.base_url`` may back bare ``custom`` runtime resolution.
GitHub #14676: the model picker can select Custom while ``model.provider`` still reflects a
previous provider. Reject non-loopback URLs unless the YAML provider is already ``custom``,
so a stale OpenRouter/Z.ai base_url cannot hijack local ``custom`` sessions.
"""
cfg_provider_norm = (cfg_provider or "").strip().lower()
bu = (cfg_base_url or "").strip()
if not bu:
return False
if cfg_provider_norm == "custom":
return True
if base_url_host_matches(bu, "openrouter.ai"):
return False
return _loopback_hostname(base_url_hostname(bu))
def _detect_api_mode_for_url(base_url: str) -> Optional[str]:
"""Auto-detect api_mode from the resolved base URL.
@@ -183,16 +160,8 @@ def _resolve_runtime_from_pool_entry(
requested_provider: str,
model_cfg: Optional[Dict[str, Any]] = None,
pool: Optional[CredentialPool] = None,
target_model: Optional[str] = None,
) -> Dict[str, Any]:
model_cfg = model_cfg or _get_model_config()
# When the caller is resolving for a specific target model (e.g. a /model
# mid-session switch), prefer that over the persisted model.default. This
# prevents api_mode being computed from a stale config default that no
# longer matches the model actually being used — the bug that caused
# opencode-zen /v1 to be stripped for chat_completions requests when
# config.default was still a Claude model.
effective_model = (target_model or model_cfg.get("default") or "")
base_url = (getattr(entry, "runtime_base_url", None) or getattr(entry, "base_url", None) or "").rstrip("/")
api_key = getattr(entry, "runtime_api_key", None) or getattr(entry, "access_token", "")
api_mode = "chat_completions"
@@ -238,7 +207,7 @@ def _resolve_runtime_from_pool_entry(
api_mode = configured_mode
elif provider in ("opencode-zen", "opencode-go"):
from hermes_cli.models import opencode_model_api_mode
api_mode = opencode_model_api_mode(provider, effective_model)
api_mode = opencode_model_api_mode(provider, model_cfg.get("default", ""))
else:
# Auto-detect Anthropic-compatible endpoints (/anthropic suffix,
# Kimi /coding, api.openai.com → codex_responses, api.x.ai →
@@ -354,16 +323,12 @@ def _get_named_custom_provider(requested_provider: str) -> Optional[Dict[str, An
# Found match by provider key
base_url = entry.get("api") or entry.get("url") or entry.get("base_url") or ""
if base_url:
result = {
return {
"name": entry.get("name", ep_name),
"base_url": base_url.strip(),
"api_key": resolved_api_key,
"model": entry.get("default_model", ""),
}
api_mode = _parse_api_mode(entry.get("api_mode"))
if api_mode:
result["api_mode"] = api_mode
return result
# Also check the 'name' field if present
display_name = entry.get("name", "")
if display_name:
@@ -372,16 +337,12 @@ def _get_named_custom_provider(requested_provider: str) -> Optional[Dict[str, An
# Found match by display name
base_url = entry.get("api") or entry.get("url") or entry.get("base_url") or ""
if base_url:
result = {
return {
"name": display_name,
"base_url": base_url.strip(),
"api_key": resolved_api_key,
"model": entry.get("default_model", ""),
}
api_mode = _parse_api_mode(entry.get("api_mode"))
if api_mode:
result["api_mode"] = api_mode
return result
# Fall back to custom_providers: list (legacy format)
custom_providers = config.get("custom_providers")
@@ -503,7 +464,6 @@ def _resolve_openrouter_runtime(
cfg_provider = cfg_provider.strip().lower()
env_openrouter_base_url = os.getenv("OPENROUTER_BASE_URL", "").strip()
env_custom_base_url = os.getenv("CUSTOM_BASE_URL", "").strip()
# Use config base_url when available and the provider context matches.
# OPENAI_BASE_URL env var is no longer consulted — config.yaml is
@@ -513,14 +473,11 @@ def _resolve_openrouter_runtime(
if requested_norm == "auto":
if not cfg_provider or cfg_provider == "auto":
use_config_base_url = True
elif requested_norm == "custom" and _config_base_url_trustworthy_for_bare_custom(
cfg_base_url, cfg_provider
):
elif requested_norm == "custom" and cfg_provider == "custom":
use_config_base_url = True
base_url = (
(explicit_base_url or "").strip()
or env_custom_base_url
or (cfg_base_url.strip() if use_config_base_url else "")
or env_openrouter_base_url
or OPENROUTER_BASE_URL
@@ -732,18 +689,8 @@ def resolve_runtime_provider(
requested: Optional[str] = None,
explicit_api_key: Optional[str] = None,
explicit_base_url: Optional[str] = None,
target_model: Optional[str] = None,
) -> Dict[str, Any]:
"""Resolve runtime provider credentials for agent execution.
target_model: Optional override for model_cfg.get("default") when
computing provider-specific api_mode (e.g. OpenCode Zen/Go where different
models route through different API surfaces). Callers performing an
explicit mid-session model switch should pass the new model here so
api_mode is derived from the model they are switching TO, not the stale
persisted default. Other callers can leave it None to preserve existing
behavior (api_mode derived from config).
"""
"""Resolve runtime provider credentials for agent execution."""
requested_provider = resolve_requested_provider(requested)
custom_runtime = _resolve_named_custom_runtime(
@@ -825,7 +772,6 @@ def resolve_runtime_provider(
requested_provider=requested_provider,
model_cfg=model_cfg,
pool=pool,
target_model=target_model,
)
if provider == "nous":
@@ -1044,11 +990,7 @@ def resolve_runtime_provider(
api_mode = configured_mode
elif provider in ("opencode-zen", "opencode-go"):
from hermes_cli.models import opencode_model_api_mode
# Prefer the target_model from the caller (explicit mid-session
# switch) over the stale model.default; see _resolve_runtime_from_pool_entry
# for the same rationale.
_effective = target_model or model_cfg.get("default", "")
api_mode = opencode_model_api_mode(provider, _effective)
api_mode = opencode_model_api_mode(provider, model_cfg.get("default", ""))
else:
# Auto-detect Anthropic-compatible endpoints by URL convention
# (e.g. https://api.minimax.io/anthropic, https://dashscope.../anthropic)
-9
View File
@@ -500,15 +500,6 @@ def _print_setup_summary(config: dict, hermes_home):
if get_env_value("HASS_TOKEN"):
tool_status.append(("Smart Home (Home Assistant)", True, None))
# Spotify (OAuth via hermes auth spotify — check auth.json, not env vars)
try:
from hermes_cli.auth import get_provider_auth_state
_spotify_state = get_provider_auth_state("spotify") or {}
if _spotify_state.get("access_token") or _spotify_state.get("refresh_token"):
tool_status.append(("Spotify (PKCE OAuth)", True, None))
except Exception:
pass
# Skills Hub
if get_env_value("GITHUB_TOKEN"):
tool_status.append(("Skills Hub (GitHub)", True, None))
+6 -13
View File
@@ -164,26 +164,19 @@ def show_status(args):
qwen_status = {}
nous_logged_in = bool(nous_status.get("logged_in"))
nous_error = nous_status.get("error")
nous_label = "logged in" if nous_logged_in else "not logged in (run: hermes auth add nous --type oauth)"
print(
f" {'Nous Portal':<12} {check_mark(nous_logged_in)} "
f"{nous_label}"
f"{'logged in' if nous_logged_in else 'not logged in (run: hermes model)'}"
)
portal_url = nous_status.get("portal_base_url") or "(unknown)"
access_exp = _format_iso_timestamp(nous_status.get("access_expires_at"))
key_exp = _format_iso_timestamp(nous_status.get("agent_key_expires_at"))
refresh_label = "yes" if nous_status.get("has_refresh_token") else "no"
if nous_logged_in or portal_url != "(unknown)" or nous_error:
if nous_logged_in:
portal_url = nous_status.get("portal_base_url") or "(unknown)"
access_exp = _format_iso_timestamp(nous_status.get("access_expires_at"))
key_exp = _format_iso_timestamp(nous_status.get("agent_key_expires_at"))
refresh_label = "yes" if nous_status.get("has_refresh_token") else "no"
print(f" Portal URL: {portal_url}")
if nous_logged_in or nous_status.get("access_expires_at"):
print(f" Access exp: {access_exp}")
if nous_logged_in or nous_status.get("agent_key_expires_at"):
print(f" Key exp: {key_exp}")
if nous_logged_in or nous_status.get("has_refresh_token"):
print(f" Refresh: {refresh_label}")
if nous_error and not nous_logged_in:
print(f" Error: {nous_error}")
codex_logged_in = bool(codex_status.get("logged_in"))
print(
+1 -1
View File
@@ -127,7 +127,7 @@ TIPS = [
# --- Tools & Capabilities ---
"execute_code runs Python scripts that call Hermes tools programmatically — results stay out of context.",
"delegate_task spawns up to 3 concurrent sub-agents by default (delegation.max_concurrent_children) with isolated contexts for parallel work.",
"delegate_task spawns up to 3 concurrent sub-agents by default (configurable via delegation.max_concurrent_children) with isolated contexts for parallel work.",
"web_extract works on PDF URLs — pass any PDF link and it converts to markdown.",
"search_files is ripgrep-backed and faster than grep — use it instead of terminal grep.",
"patch uses 9 fuzzy matching strategies so minor whitespace differences won't break edits.",
+13 -172
View File
@@ -67,59 +67,25 @@ CONFIGURABLE_TOOLSETS = [
("messaging", "📨 Cross-Platform Messaging", "send_message"),
("rl", "🧪 RL Training", "Tinker-Atropos training tools"),
("homeassistant", "🏠 Home Assistant", "smart home device control"),
("spotify", "🎵 Spotify", "playback, search, playlists, library"),
("discord", "💬 Discord (read/participate)", "fetch messages, search members, create thread"),
("discord_admin", "🛡️ Discord Server Admin", "list channels/roles, pin, assign roles"),
]
# Toolsets that are OFF by default for new installs.
# They're still in _HERMES_CORE_TOOLS (available at runtime if enabled),
# but the setup checklist won't pre-select them for first-time users.
_DEFAULT_OFF_TOOLSETS = {"moa", "homeassistant", "rl", "spotify", "discord", "discord_admin"}
# Platform-scoped toolsets: only appear in the `hermes tools` checklist for
# these platforms, and only resolve/save for these platforms. A toolset
# absent from this map is available on every platform (current behaviour).
#
# Use this for tools whose APIs only make sense on one platform (Discord
# server admin, Slack workspace admin, etc.). Keeps every other platform's
# checklist from filling up with irrelevant toggles.
_TOOLSET_PLATFORM_RESTRICTIONS: Dict[str, Set[str]] = {
"discord": {"discord"},
"discord_admin": {"discord"},
}
def _toolset_allowed_for_platform(ts_key: str, platform: str) -> bool:
"""Return True if ``ts_key`` is configurable on ``platform``.
Toolsets without a restriction entry are allowed everywhere (the default).
"""
allowed = _TOOLSET_PLATFORM_RESTRICTIONS.get(ts_key)
return allowed is None or platform in allowed
_DEFAULT_OFF_TOOLSETS = {"moa", "homeassistant", "rl"}
def _get_effective_configurable_toolsets():
"""Return CONFIGURABLE_TOOLSETS + any plugin-provided toolsets.
Plugin toolsets are appended at the end so they appear after the
built-in toolsets in the TUI checklist. A plugin whose toolset key
already appears in ``CONFIGURABLE_TOOLSETS`` is skipped bundled
plugins (e.g. ``plugins/spotify``) share their toolset key with the
built-in entry, and we want the built-in label/description to win.
Without the dedupe, ``hermes tools`` "reconfigure existing" would
list the same toolset twice.
built-in toolsets in the TUI checklist.
"""
result = list(CONFIGURABLE_TOOLSETS)
seen = {ts_key for ts_key, _, _ in result}
try:
from hermes_cli.plugins import discover_plugins, get_plugin_toolsets
discover_plugins() # idempotent — ensures plugins are loaded
for entry in get_plugin_toolsets():
if entry[0] in seen:
continue
seen.add(entry[0])
result.append(entry)
result.extend(get_plugin_toolsets())
except Exception:
pass
return result
@@ -395,18 +361,6 @@ TOOL_CATEGORIES = {
},
],
},
"spotify": {
"name": "Spotify",
"icon": "🎵",
"providers": [
{
"name": "Spotify Web API",
"tag": "PKCE OAuth — opens the setup wizard",
"env_vars": [],
"post_setup": "spotify",
},
],
},
"rl": {
"name": "RL Training",
"icon": "🧪",
@@ -507,35 +461,6 @@ def _run_post_setup(post_setup_key: str):
_print_warning(" kittentts install timed out (>5min)")
_print_info(f" Run manually: python -m pip install -U '{wheel_url}' soundfile")
elif post_setup_key == "spotify":
# Run the full `hermes auth spotify` flow — if the user has no
# client_id yet, this drops them into the interactive wizard
# (opens the Spotify dashboard, prompts for client_id, persists
# to ~/.hermes/.env), then continues straight into PKCE. If they
# already have an app, it skips the wizard and just does OAuth.
from types import SimpleNamespace
try:
from hermes_cli.auth import login_spotify_command
except Exception as exc:
_print_warning(f" Could not load Spotify auth: {exc}")
_print_info(" Run manually: hermes auth spotify")
return
_print_info(" Starting Spotify login...")
try:
login_spotify_command(SimpleNamespace(
client_id=None, redirect_uri=None, scope=None,
no_browser=False, timeout=None,
))
_print_success(" Spotify authenticated")
except SystemExit as exc:
# User aborted the wizard, or OAuth failed — don't fail the
# toolset enable; they can retry with `hermes auth spotify`.
_print_warning(f" Spotify login did not complete: {exc}")
_print_info(" Run later: hermes auth spotify")
except Exception as exc:
_print_warning(f" Spotify login failed: {exc}")
_print_info(" Run manually: hermes auth spotify")
elif post_setup_key == "rl_training":
try:
__import__("tinker_atropos")
@@ -624,7 +549,7 @@ def _get_platform_tools(
include_default_mcp_servers: bool = True,
) -> Set[str]:
"""Resolve which individual toolset names are enabled for a platform."""
from toolsets import resolve_toolset, TOOLSETS
from toolsets import resolve_toolset
platform_toolsets = config.get("platform_toolsets") or {}
toolset_names = platform_toolsets.get(platform)
@@ -638,8 +563,6 @@ def _get_platform_tools(
toolset_names = [str(ts) for ts in toolset_names]
configurable_keys = {ts_key for ts_key, _, _ in CONFIGURABLE_TOOLSETS}
plugin_ts_keys = _get_plugin_toolset_keys()
platform_default_keys = {p["default_toolset"] for p in PLATFORMS.values()}
# If the saved list contains any configurable keys directly, the user
# has explicitly configured this platform — use direct membership.
@@ -649,10 +572,7 @@ def _get_platform_tools(
has_explicit_config = any(ts in configurable_keys for ts in toolset_names)
if has_explicit_config:
enabled_toolsets = {
ts for ts in toolset_names
if ts in configurable_keys and _toolset_allowed_for_platform(ts, platform)
}
enabled_toolsets = {ts for ts in toolset_names if ts in configurable_keys}
else:
# No explicit config — fall back to resolving composite toolset names
# (e.g. "hermes-cli") to individual tool names and reverse-mapping.
@@ -662,59 +582,19 @@ def _get_platform_tools(
enabled_toolsets = set()
for ts_key, _, _ in CONFIGURABLE_TOOLSETS:
if not _toolset_allowed_for_platform(ts_key, platform):
continue
ts_tools = set(resolve_toolset(ts_key))
if ts_tools and ts_tools.issubset(all_tool_names):
enabled_toolsets.add(ts_key)
default_off = set(_DEFAULT_OFF_TOOLSETS)
# Legacy safety: if the platform's own name matches a default-off
# toolset (e.g. `homeassistant` platform + `homeassistant` toolset),
# keep that toolset enabled on first install. Skip this dodge for
# platform-restricted toolsets — those are always opt-in even on
# their own platform (e.g. `discord` + `discord` should stay OFF).
if platform in default_off and platform not in _TOOLSET_PLATFORM_RESTRICTIONS:
if platform in default_off:
default_off.remove(platform)
enabled_toolsets -= default_off
# Recover non-configurable platform toolsets (e.g. discord, feishu_doc,
# feishu_drive). These are part of the platform's default composite but
# absent from CONFIGURABLE_TOOLSETS, so they can't appear in the TUI
# checklist or in a user-saved config. Must run in BOTH branches —
# otherwise saving via `hermes tools` (which flips has_explicit_config
# to True) silently drops them.
platform_tool_universe = set(resolve_toolset(PLATFORMS[platform]["default_toolset"]))
configurable_tool_universe = set()
for ck in configurable_keys:
configurable_tool_universe.update(resolve_toolset(ck))
claimed = set()
for ts_key in enabled_toolsets:
claimed.update(resolve_toolset(ts_key))
skip = configurable_keys | plugin_ts_keys | platform_default_keys
skip |= {k for k in TOOLSETS if k.startswith("hermes-")}
skip |= set(_DEFAULT_OFF_TOOLSETS) - {platform}
for ts_key, ts_def in TOOLSETS.items():
if ts_key in skip:
continue
if ts_def.get("includes"):
continue
ts_tools = set(resolve_toolset(ts_key))
if not ts_tools or not ts_tools.issubset(platform_tool_universe):
continue
if ts_tools.issubset(configurable_tool_universe):
continue
if not ts_tools.issubset(claimed):
enabled_toolsets.add(ts_key)
claimed.update(ts_tools)
# Plugin toolsets: enabled by default unless explicitly disabled, or
# unless the toolset is in _DEFAULT_OFF_TOOLSETS (e.g. spotify —
# shipped as a bundled plugin but user must opt in via `hermes tools`
# so we don't ship 7 Spotify tool schemas to users who don't use it).
# Plugin toolsets: enabled by default unless explicitly disabled.
# A plugin toolset is "known" for a platform once `hermes tools`
# has been saved for that platform (tracked via known_plugin_toolsets).
# Unknown plugins default to enabled; known-but-absent = disabled.
plugin_ts_keys = _get_plugin_toolset_keys()
if plugin_ts_keys:
known_map = config.get("known_plugin_toolsets", {})
known_for_platform = set(known_map.get(platform, []))
@@ -722,9 +602,6 @@ def _get_platform_tools(
if pts in toolset_names:
# Explicitly listed in config — enabled
enabled_toolsets.add(pts)
elif pts in _DEFAULT_OFF_TOOLSETS:
# Opt-in plugin toolset — stay off until user picks it
continue
elif pts not in known_for_platform:
# New plugin not yet seen by hermes tools — default enabled
enabled_toolsets.add(pts)
@@ -732,6 +609,7 @@ def _get_platform_tools(
# Preserve any explicit non-configurable toolset entries (for example,
# custom toolsets or MCP server names saved in platform_toolsets).
platform_default_keys = {p["default_toolset"] for p in PLATFORMS.values()}
explicit_passthrough = {
ts
for ts in toolset_names
@@ -777,14 +655,6 @@ def _save_platform_tools(config: dict, platform: str, enabled_toolset_keys: Set[
"""
config.setdefault("platform_toolsets", {})
# Drop platform-scoped toolsets that don't apply here. Prevents the
# "Configure all platforms" checklist (or a hand-edited config.yaml)
# from turning on, say, the `discord` toolset for Telegram.
enabled_toolset_keys = {
ts for ts in enabled_toolset_keys
if _toolset_allowed_for_platform(ts, platform)
}
# Get the set of all configurable toolset keys (built-in + plugin)
configurable_keys = {ts_key for ts_key, _, _ in CONFIGURABLE_TOOLSETS}
plugin_keys = _get_plugin_toolset_keys()
@@ -799,7 +669,6 @@ def _save_platform_tools(config: dict, platform: str, enabled_toolset_keys: Set[
existing_toolsets = config.get("platform_toolsets", {}).get(platform, [])
if not isinstance(existing_toolsets, list):
existing_toolsets = []
existing_toolsets = [str(ts) for ts in existing_toolsets]
# Preserve any entries that are NOT configurable toolsets and NOT platform
# defaults (i.e. only MCP server names should be preserved)
@@ -807,11 +676,6 @@ def _save_platform_tools(config: dict, platform: str, enabled_toolset_keys: Set[
entry for entry in existing_toolsets
if entry not in configurable_keys and entry not in platform_default_keys
}
# Opening `hermes tools` is the user's opt-in to reconfigure tools, so treat
# saving from the picker as consent to clear the "no_mcp" sentinel. The
# picker has no checkbox for no_mcp, so without this users who once set it
# by hand could never re-enable MCP servers through the UI.
preserved_entries.discard("no_mcp")
# Merge preserved entries with new enabled toolsets
config["platform_toolsets"][platform] = sorted(enabled_toolset_keys | preserved_entries)
@@ -919,7 +783,7 @@ def _estimate_tool_tokens() -> Dict[str, int]:
return _tool_token_cache
def _prompt_toolset_checklist(platform_label: str, enabled: Set[str], platform: str = "cli") -> Set[str]:
def _prompt_toolset_checklist(platform_label: str, enabled: Set[str]) -> Set[str]:
"""Multi-select checklist of toolsets. Returns set of selected toolset keys."""
from hermes_cli.curses_ui import curses_checklist
from toolsets import resolve_toolset
@@ -927,12 +791,7 @@ def _prompt_toolset_checklist(platform_label: str, enabled: Set[str], platform:
# Pre-compute per-tool token counts (cached after first call).
tool_tokens = _estimate_tool_tokens()
effective_all = _get_effective_configurable_toolsets()
# Drop platform-scoped toolsets that don't apply to this platform.
effective = [
(k, l, d) for (k, l, d) in effective_all
if _toolset_allowed_for_platform(k, platform)
]
effective = _get_effective_configurable_toolsets()
labels = []
for ts_key, ts_label, ts_desc in effective:
@@ -1846,7 +1705,7 @@ def tools_command(args=None, first_install: bool = False, config: dict = None):
checklist_preselected = current_enabled - _DEFAULT_OFF_TOOLSETS
# Show checklist
new_enabled = _prompt_toolset_checklist(pinfo["label"], checklist_preselected, pkey)
new_enabled = _prompt_toolset_checklist(pinfo["label"], checklist_preselected)
added = new_enabled - current_enabled
removed = current_enabled - new_enabled
@@ -2202,11 +2061,7 @@ def _apply_mcp_change(config: dict, targets: List[str], action: str) -> Set[str]
def _print_tools_list(enabled_toolsets: set, mcp_servers: dict, platform: str = "cli"):
"""Print a summary of enabled/disabled toolsets and MCP tool filters."""
effective_all = _get_effective_configurable_toolsets()
effective = [
(k, l, d) for (k, l, d) in effective_all
if _toolset_allowed_for_platform(k, platform)
]
effective = _get_effective_configurable_toolsets()
builtin_keys = {ts_key for ts_key, _, _ in CONFIGURABLE_TOOLSETS}
print(f"Built-in toolsets ({platform}):")
@@ -2272,20 +2127,6 @@ def tools_disable_enable_command(args):
_print_error(f"Unknown toolset '{name}'")
toolset_targets = [t for t in toolset_targets if t in valid_toolsets]
# Reject platform-scoped toolsets on platforms that don't allow them.
restricted_targets = [
t for t in toolset_targets
if not _toolset_allowed_for_platform(t, platform)
]
if restricted_targets:
for name in restricted_targets:
allowed = sorted(_TOOLSET_PLATFORM_RESTRICTIONS.get(name) or set())
_print_error(
f"Toolset '{name}' is not available on platform '{platform}' "
f"(only: {', '.join(allowed)})"
)
toolset_targets = [t for t in toolset_targets if t not in restricted_targets]
if toolset_targets:
_apply_toolset_change(config, platform, toolset_targets, action)
+185 -893
View File
File diff suppressed because it is too large Load Diff
-65
View File
@@ -1039,71 +1039,6 @@ class SessionDB:
result.append(msg)
return result
def resolve_resume_session_id(self, session_id: str) -> str:
"""Redirect a resume target to the descendant session that holds the messages.
Context compression ends the current session and forks a new child session
(linked via ``parent_session_id``). The flush cursor is reset, so the
child is where new messages actually land the parent ends up with
``message_count = 0`` rows unless messages had already been flushed to
it before compression. See #15000.
This helper walks ``parent_session_id`` forward from ``session_id`` and
returns the first descendant in the chain that has at least one message
row. If the original session already has messages, or no descendant
has any, the original ``session_id`` is returned unchanged.
The chain is always walked via the child whose ``started_at`` is
latest; that matches the single-chain shape that compression creates.
A depth cap (32) guards against accidental loops in malformed data.
"""
if not session_id:
return session_id
with self._lock:
# If this session already has messages, nothing to redirect.
try:
row = self._conn.execute(
"SELECT 1 FROM messages WHERE session_id = ? LIMIT 1",
(session_id,),
).fetchone()
except Exception:
return session_id
if row is not None:
return session_id
# Walk descendants: at each step, pick the most-recently-started
# child session; stop once we find one with messages.
current = session_id
seen = {current}
for _ in range(32):
try:
child_row = self._conn.execute(
"SELECT id FROM sessions "
"WHERE parent_session_id = ? "
"ORDER BY started_at DESC, id DESC LIMIT 1",
(current,),
).fetchone()
except Exception:
return session_id
if child_row is None:
return session_id
child_id = child_row["id"] if hasattr(child_row, "keys") else child_row[0]
if not child_id or child_id in seen:
return session_id
seen.add(child_id)
try:
msg_row = self._conn.execute(
"SELECT 1 FROM messages WHERE session_id = ? LIMIT 1",
(child_id,),
).fetchone()
except Exception:
return session_id
if msg_row is not None:
return child_id
current = child_id
return session_id
def get_messages_as_conversation(self, session_id: str) -> List[Dict[str, Any]]:
"""
Load messages in the OpenAI conversation format (role + content dicts).
+25 -41
View File
@@ -288,34 +288,30 @@ def get_tool_definitions(
filtered_tools[i] = {"type": "function", "function": dynamic_schema}
break
# Rebuild discord / discord_admin schemas based on the bot's privileged
# intents (detected from GET /applications/@me) and the user's action
# allowlist in config. Hides actions the bot's intents don't support so
# the model never attempts them, and annotates fetch_messages when the
# Rebuild discord_server schema based on the bot's privileged intents
# (detected from GET /applications/@me) and the user's action allowlist
# in config. Hides actions the bot's intents don't support so the
# model never attempts them, and annotates fetch_messages when the
# MESSAGE_CONTENT intent is missing.
_discord_schema_fns = {
"discord": "get_dynamic_schema_core",
"discord_admin": "get_dynamic_schema_admin",
}
for discord_tool_name in _discord_schema_fns:
if discord_tool_name in available_tool_names:
try:
from tools import discord_tool as _dt
schema_fn = getattr(_dt, _discord_schema_fns[discord_tool_name])
dynamic = schema_fn()
except Exception:
dynamic = None
if dynamic is None:
filtered_tools = [
t for t in filtered_tools
if t.get("function", {}).get("name") != discord_tool_name
]
available_tool_names.discard(discord_tool_name)
else:
for i, td in enumerate(filtered_tools):
if td.get("function", {}).get("name") == discord_tool_name:
filtered_tools[i] = {"type": "function", "function": dynamic}
break
if "discord_server" in available_tool_names:
try:
from tools.discord_tool import get_dynamic_schema
dynamic = get_dynamic_schema()
except Exception: # pragma: no cover — defensive, fall back to static
dynamic = None
if dynamic is None:
# Tool filtered out entirely (empty allowlist or detection disabled
# the only remaining actions). Drop it from the schema list.
filtered_tools = [
t for t in filtered_tools
if t.get("function", {}).get("name") != "discord_server"
]
available_tool_names.discard("discord_server")
else:
for i, td in enumerate(filtered_tools):
if td.get("function", {}).get("name") == "discord_server":
filtered_tools[i] = {"type": "function", "function": dynamic}
break
# Strip web tool cross-references from browser_navigate description when
# web_search / web_extract are not available. The static schema says
@@ -347,18 +343,6 @@ def get_tool_definitions(
global _last_resolved_tool_names
_last_resolved_tool_names = [t["function"]["name"] for t in filtered_tools]
# Sanitize schemas for broad backend compatibility. llama.cpp's
# json-schema-to-grammar converter (used by its OAI server to build
# GBNF tool-call parsers) rejects some shapes that cloud providers
# silently accept — bare "type": "object" with no properties,
# string-valued schema nodes from malformed MCP servers, etc. This
# is a no-op for schemas that are already well-formed.
try:
from tools.schema_sanitizer import sanitize_tool_schemas
filtered_tools = sanitize_tool_schemas(filtered_tools)
except Exception as e: # pragma: no cover — defensive
logger.warning("Schema sanitization skipped: %s", e)
return filtered_tools
@@ -468,9 +452,9 @@ def _coerce_number(value: str, integer_only: bool = False):
f = float(value)
except (ValueError, OverflowError):
return value
# Guard against inf/nan — not JSON-serializable, keep original string
# Guard against inf/nan before int() conversion
if f != f or f == float("inf") or f == float("-inf"):
return value
return f
# If it looks like an integer (no fractional part), return int
if f == int(f):
return int(f)
+1 -1
View File
@@ -156,7 +156,7 @@
for entry in "''${ENTRIES[@]}"; do
IFS=":" read -r ATTR FOLDER NIX_FILE <<< "$entry"
echo "==> .#$ATTR ($FOLDER -> $NIX_FILE)"
OUTPUT=$(nix build ".#$ATTR.npmDeps" --no-link --rebuild --print-build-logs 2>&1)
OUTPUT=$(nix build ".#$ATTR.npmDeps" --no-link --print-build-logs 2>&1)
STATUS=$?
if [ "$STATUS" -eq 0 ]; then
echo " ok"
+1 -1
View File
@@ -4,7 +4,7 @@ let
src = ../web;
npmDeps = pkgs.fetchNpmDeps {
inherit src;
hash = "sha256-4Z8KQ69QhO83X6zff+5urWBv6MME686MhTTMdwSl65o=";
hash = "sha256-TS/vrCHbdvXkPcAPxImKzAd2pdDCrKlgYZkXBMQ+TEg=";
};
npm = hermesNpmLib.mkNpmPassthru { folder = "web"; attr = "web"; pname = "hermes-web"; };
-25
View File
@@ -91,29 +91,4 @@
// Register this plugin — the dashboard picks it up automatically.
window.__HERMES_PLUGINS__.register("example", ExamplePage);
// ─────────────────────────────────────────────────────────────────────
// Page-scoped slot demo: inject a small banner at the top of /sessions.
//
// Built-in pages expose named slots (<page>:top, <page>:bottom) that
// plugins can populate without overriding the whole route. The
// manifest lists the slots we use in its `slots` array so the shell
// knows to render <PluginSlot name="sessions:top" /> there.
// ─────────────────────────────────────────────────────────────────────
function SessionsTopBanner() {
return React.createElement(Card, {
className: "border-dashed",
},
React.createElement(CardContent, { className: "flex items-center gap-3 py-2" },
React.createElement(Badge, { variant: "outline" }, "Example"),
React.createElement("span", {
className: "text-xs text-muted-foreground",
}, "This banner was injected into the Sessions page by the example plugin via the ",
React.createElement("code", { className: "font-courier" }, "sessions:top"),
" slot."),
),
);
}
window.__HERMES_PLUGINS__.registerSlot("example", "sessions:top", SessionsTopBanner);
})();
@@ -8,7 +8,6 @@
"path": "/example",
"position": "after:skills"
},
"slots": ["sessions:top"],
"entry": "dist/index.js",
"api": "plugin_api.py"
}
+1 -2
View File
@@ -59,8 +59,7 @@ Config file: `~/.hermes/hindsight/config.json`
| Key | Default | Description |
|-----|---------|-------------|
| `bank_id` | `hermes` | Memory bank name (static fallback used when `bank_id_template` is unset or resolves empty) |
| `bank_id_template` | — | Optional template to derive the bank name dynamically. Placeholders: `{profile}`, `{workspace}`, `{platform}`, `{user}`, `{session}`. Example: `hermes-{profile}` isolates memory per active Hermes profile. Empty placeholders collapse cleanly (e.g. `hermes-{user}` with no user becomes `hermes`). |
| `bank_id` | `hermes` | Memory bank name |
| `bank_mission` | — | Reflect mission (identity/framing for reflect reasoning). Applied via Banks API. |
| `bank_retain_mission` | — | Retain mission (steers what gets extracted). Applied via Banks API. |
+69 -286
View File
@@ -3,8 +3,6 @@
Long-term memory with knowledge graph, entity resolution, and multi-strategy
retrieval. Supports cloud (API key) and local modes.
Configurable timeout via HINDSIGHT_TIMEOUT env var or config.json.
Original PR #1811 by benfrank241, adapted to MemoryProvider ABC.
Config via environment variables:
@@ -13,7 +11,6 @@ Config via environment variables:
HINDSIGHT_BUDGET recall budget: low/mid/high (default: mid)
HINDSIGHT_API_URL API endpoint
HINDSIGHT_MODE cloud or local (default: cloud)
HINDSIGHT_TIMEOUT API request timeout in seconds (default: 120)
HINDSIGHT_RETAIN_TAGS comma-separated tags attached to retained memories
HINDSIGHT_RETAIN_SOURCE metadata source value attached to retained memories
HINDSIGHT_RETAIN_USER_PREFIX label used before user turns in retained transcripts
@@ -26,7 +23,6 @@ Or via $HERMES_HOME/hindsight/config.json (profile-scoped), falling back to
from __future__ import annotations
import asyncio
import importlib
import json
import logging
import os
@@ -44,7 +40,6 @@ logger = logging.getLogger(__name__)
_DEFAULT_API_URL = "https://api.hindsight.vectorize.io"
_DEFAULT_LOCAL_URL = "http://localhost:8888"
_MIN_CLIENT_VERSION = "0.4.22"
_DEFAULT_TIMEOUT = 120 # seconds — cloud API can take 30-40s per request
_VALID_BUDGETS = {"low", "mid", "high"}
_PROVIDER_DEFAULT_MODELS = {
"openai": "gpt-4o-mini",
@@ -59,22 +54,6 @@ _PROVIDER_DEFAULT_MODELS = {
}
def _check_local_runtime() -> tuple[bool, str | None]:
"""Return whether local embedded Hindsight imports cleanly.
On older CPUs, importing the local Hindsight stack can raise a runtime
error from NumPy before the daemon starts. Treat that as "unavailable"
so Hermes can degrade gracefully instead of repeatedly trying to start
a broken local memory backend.
"""
try:
importlib.import_module("hindsight")
importlib.import_module("hindsight_embed.daemon_embed_manager")
return True, None
except Exception as exc:
return False, str(exc)
# ---------------------------------------------------------------------------
# Dedicated event loop for Hindsight async calls (one per process, reused).
# Avoids creating ephemeral loops that leak aiohttp sessions.
@@ -102,18 +81,13 @@ def _get_loop() -> asyncio.AbstractEventLoop:
return _loop
def _run_sync(coro, timeout: float = _DEFAULT_TIMEOUT):
def _run_sync(coro, timeout: float = 120.0):
"""Schedule *coro* on the shared loop and block until done."""
loop = _get_loop()
future = asyncio.run_coroutine_threadsafe(coro, loop)
return future.result(timeout=timeout)
# ---------------------------------------------------------------------------
# Backward-compatible alias — instances use self._run_sync() instead.
# ---------------------------------------------------------------------------
# ---------------------------------------------------------------------------
# Tool schemas
# ---------------------------------------------------------------------------
@@ -259,126 +233,6 @@ def _utc_timestamp() -> str:
return datetime.now(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z")
def _embedded_profile_name(config: dict[str, Any]) -> str:
"""Return the Hindsight embedded profile name for this Hermes config."""
profile = config.get("profile", "hermes")
return str(profile or "hermes")
def _load_simple_env(path) -> dict[str, str]:
"""Parse a simple KEY=VALUE env file, ignoring comments and blank lines."""
if not path.exists():
return {}
values: dict[str, str] = {}
for line in path.read_text(encoding="utf-8").splitlines():
if not line or line.startswith("#") or "=" not in line:
continue
key, value = line.split("=", 1)
values[key.strip()] = value.strip()
return values
def _build_embedded_profile_env(config: dict[str, Any], *, llm_api_key: str | None = None) -> dict[str, str]:
"""Build the profile-scoped env file that standalone hindsight-embed consumes."""
current_key = llm_api_key
if current_key is None:
current_key = (
config.get("llmApiKey")
or config.get("llm_api_key")
or os.environ.get("HINDSIGHT_LLM_API_KEY", "")
)
current_provider = config.get("llm_provider", "")
current_model = config.get("llm_model", "")
current_base_url = config.get("llm_base_url") or os.environ.get("HINDSIGHT_API_LLM_BASE_URL", "")
# The embedded daemon expects OpenAI wire format for these providers.
daemon_provider = "openai" if current_provider in ("openai_compatible", "openrouter") else current_provider
env_values = {
"HINDSIGHT_API_LLM_PROVIDER": str(daemon_provider),
"HINDSIGHT_API_LLM_API_KEY": str(current_key or ""),
"HINDSIGHT_API_LLM_MODEL": str(current_model),
"HINDSIGHT_API_LOG_LEVEL": "info",
}
if current_base_url:
env_values["HINDSIGHT_API_LLM_BASE_URL"] = str(current_base_url)
return env_values
def _embedded_profile_env_path(config: dict[str, Any]):
from pathlib import Path
return Path.home() / ".hindsight" / "profiles" / f"{_embedded_profile_name(config)}.env"
def _materialize_embedded_profile_env(config: dict[str, Any], *, llm_api_key: str | None = None):
"""Write the profile-scoped env file that standalone hindsight-embed uses."""
profile_env = _embedded_profile_env_path(config)
profile_env.parent.mkdir(parents=True, exist_ok=True)
env_values = _build_embedded_profile_env(config, llm_api_key=llm_api_key)
profile_env.write_text(
"".join(f"{key}={value}\n" for key, value in env_values.items()),
encoding="utf-8",
)
return profile_env
def _sanitize_bank_segment(value: str) -> str:
"""Sanitize a bank_id_template placeholder value.
Bank IDs should be safe for URL paths and filesystem use. Replaces any
character that isn't alphanumeric, dash, or underscore with a dash, and
collapses runs of dashes.
"""
if not value:
return ""
out = []
prev_dash = False
for ch in str(value):
if ch.isalnum() or ch == "-" or ch == "_":
out.append(ch)
prev_dash = False
else:
if not prev_dash:
out.append("-")
prev_dash = True
return "".join(out).strip("-_")
def _resolve_bank_id_template(template: str, fallback: str, **placeholders: str) -> str:
"""Resolve a bank_id template string with the given placeholders.
Supported placeholders (each is sanitized before substitution):
{profile} active Hermes profile name (from agent_identity)
{workspace} Hermes workspace name (from agent_workspace)
{platform} "cli", "telegram", "discord", etc.
{user} platform user id (gateway sessions)
{session} current session id
Missing/empty placeholders are rendered as the empty string and then
collapsed e.g. ``hermes-{user}`` with no user becomes ``hermes``.
If the template is empty, resolution falls back to *fallback*.
Returns the sanitized bank id.
"""
if not template:
return fallback
sanitized = {k: _sanitize_bank_segment(v) for k, v in placeholders.items()}
try:
rendered = template.format(**sanitized)
except (KeyError, IndexError) as exc:
logger.warning("Invalid bank_id_template %r: %s — using fallback %r",
template, exc, fallback)
return fallback
while "--" in rendered:
rendered = rendered.replace("--", "-")
while "__" in rendered:
rendered = rendered.replace("__", "_")
rendered = rendered.strip("-_")
return rendered or fallback
# ---------------------------------------------------------------------------
# MemoryProvider implementation
# ---------------------------------------------------------------------------
@@ -408,17 +262,13 @@ class HindsightMemoryProvider(MemoryProvider):
self._chat_type = ""
self._thread_id = ""
self._agent_identity = ""
self._agent_workspace = ""
self._turn_index = 0
self._client = None
self._timeout = _DEFAULT_TIMEOUT
self._prefetch_result = ""
self._prefetch_lock = threading.Lock()
self._prefetch_thread = None
self._sync_thread = None
self._session_id = ""
self._parent_session_id = ""
self._document_id = ""
# Tags
self._tags: list[str] | None = None
@@ -443,7 +293,6 @@ class HindsightMemoryProvider(MemoryProvider):
# Bank
self._bank_mission = ""
self._bank_retain_mission: str | None = None
self._bank_id_template = ""
@property
def name(self) -> str:
@@ -453,16 +302,9 @@ class HindsightMemoryProvider(MemoryProvider):
try:
cfg = _load_config()
mode = cfg.get("mode", "cloud")
if mode in ("local", "local_embedded"):
available, _ = _check_local_runtime()
return available
if mode == "local_external":
if mode in ("local", "local_embedded", "local_external"):
return True
has_key = bool(
cfg.get("apiKey")
or cfg.get("api_key")
or os.environ.get("HINDSIGHT_API_KEY", "")
)
has_key = bool(cfg.get("apiKey") or os.environ.get("HINDSIGHT_API_KEY", ""))
has_url = bool(cfg.get("api_url") or os.environ.get("HINDSIGHT_API_URL", ""))
return has_key or has_url
except Exception:
@@ -521,7 +363,7 @@ class HindsightMemoryProvider(MemoryProvider):
else:
deps_to_install = [cloud_dep]
print("\n Checking dependencies...")
print(f"\n Checking dependencies...")
uv_path = shutil.which("uv")
if not uv_path:
print(" ⚠ uv not found — install it: curl -LsSf https://astral.sh/uv/install.sh | sh")
@@ -532,14 +374,14 @@ class HindsightMemoryProvider(MemoryProvider):
[uv_path, "pip", "install", "--python", sys.executable, "--quiet", "--upgrade"] + deps_to_install,
check=True, timeout=120, capture_output=True,
)
print(" ✓ Dependencies up to date")
print(f" ✓ Dependencies up to date")
except Exception as e:
print(f" ⚠ Install failed: {e}")
print(f" Run manually: uv pip install --python {sys.executable} {' '.join(deps_to_install)}")
# Step 3: Mode-specific config
if mode == "cloud":
print("\n Get your API key at https://ui.hindsight.vectorize.io\n")
print(f"\n Get your API key at https://ui.hindsight.vectorize.io\n")
existing_key = os.environ.get("HINDSIGHT_API_KEY", "")
if existing_key:
masked = f"...{existing_key[-4:]}" if len(existing_key) > 4 else "set"
@@ -592,19 +434,13 @@ class HindsightMemoryProvider(MemoryProvider):
sys.stdout.write(" LLM API key: ")
sys.stdout.flush()
llm_key = getpass.getpass(prompt="") if sys.stdin.isatty() else sys.stdin.readline().strip()
# Always write explicitly (including empty) so the provider sees ""
# rather than a missing variable. The daemon reads from .env at
# startup and fails when HINDSIGHT_LLM_API_KEY is unset.
env_writes["HINDSIGHT_LLM_API_KEY"] = llm_key
if llm_key:
env_writes["HINDSIGHT_LLM_API_KEY"] = llm_key
# Step 4: Save everything
provider_config["bank_id"] = "hermes"
provider_config["recall_budget"] = "mid"
# Read existing timeout from config if present, otherwise use default
existing_timeout = self._config.get("timeout") if self._config else None
timeout_val = existing_timeout if existing_timeout else _DEFAULT_TIMEOUT
provider_config["timeout"] = timeout_val
env_writes["HINDSIGHT_TIMEOUT"] = str(timeout_val)
bank_id = "hermes"
config["memory"]["provider"] = "hindsight"
save_config(config)
@@ -630,32 +466,10 @@ class HindsightMemoryProvider(MemoryProvider):
new_lines.append(f"{k}={v}")
env_path.write_text("\n".join(new_lines) + "\n")
if mode == "local_embedded":
materialized_config = dict(provider_config)
config_path = Path(hermes_home) / "hindsight" / "config.json"
try:
materialized_config = json.loads(config_path.read_text(encoding="utf-8"))
except Exception:
pass
llm_api_key = env_writes.get("HINDSIGHT_LLM_API_KEY", "")
if not llm_api_key:
llm_api_key = _load_simple_env(Path(hermes_home) / ".env").get("HINDSIGHT_LLM_API_KEY", "")
if not llm_api_key:
llm_api_key = _load_simple_env(_embedded_profile_env_path(materialized_config)).get(
"HINDSIGHT_API_LLM_API_KEY",
"",
)
_materialize_embedded_profile_env(
materialized_config,
llm_api_key=llm_api_key or None,
)
print(f"\n ✓ Hindsight memory configured ({mode} mode)")
if env_writes:
print(" API keys saved to .env")
print("\n Start a new session to activate.\n")
print(f" API keys saved to .env")
print(f"\n Start a new session to activate.\n")
def get_config_schema(self):
return [
@@ -671,8 +485,7 @@ class HindsightMemoryProvider(MemoryProvider):
{"key": "llm_base_url", "description": "Endpoint URL (e.g. http://192.168.1.10:8080/v1)", "default": "", "when": {"mode": "local_embedded", "llm_provider": "openai_compatible"}},
{"key": "llm_api_key", "description": "LLM API key (optional for openai_compatible)", "secret": True, "env_var": "HINDSIGHT_LLM_API_KEY", "when": {"mode": "local_embedded"}},
{"key": "llm_model", "description": "LLM model", "default": "gpt-4o-mini", "default_from": {"field": "llm_provider", "map": _PROVIDER_DEFAULT_MODELS}, "when": {"mode": "local_embedded"}},
{"key": "bank_id", "description": "Memory bank name (static fallback when bank_id_template is unset)", "default": "hermes"},
{"key": "bank_id_template", "description": "Optional template to derive bank_id dynamically. Placeholders: {profile}, {workspace}, {platform}, {user}, {session}. Example: hermes-{profile}", "default": ""},
{"key": "bank_id", "description": "Memory bank name", "default": "hermes"},
{"key": "bank_mission", "description": "Mission/purpose description for the memory bank"},
{"key": "bank_retain_mission", "description": "Custom extraction prompt for memory retention"},
{"key": "recall_budget", "description": "Recall thoroughness", "default": "mid", "choices": ["low", "mid", "high"]},
@@ -692,19 +505,12 @@ class HindsightMemoryProvider(MemoryProvider):
{"key": "recall_max_tokens", "description": "Maximum tokens for recall results", "default": 4096},
{"key": "recall_max_input_chars", "description": "Maximum input query length for auto-recall", "default": 800},
{"key": "recall_prompt_preamble", "description": "Custom preamble for recalled memories in context"},
{"key": "timeout", "description": "API request timeout in seconds", "default": _DEFAULT_TIMEOUT},
]
def _get_client(self):
"""Return the cached Hindsight client (created once, reused)."""
if self._client is None:
if self._mode == "local_embedded":
available, reason = _check_local_runtime()
if not available:
raise RuntimeError(
"Hindsight local runtime is unavailable"
+ (f": {reason}" if reason else "")
)
from hindsight import HindsightEmbedded
HindsightEmbedded.__del__ = lambda self: None
llm_provider = self._config.get("llm_provider", "")
@@ -723,30 +529,16 @@ class HindsightMemoryProvider(MemoryProvider):
self._client = HindsightEmbedded(**kwargs)
else:
from hindsight_client import Hindsight
timeout = self._timeout or _DEFAULT_TIMEOUT
kwargs = {"base_url": self._api_url, "timeout": float(timeout)}
kwargs = {"base_url": self._api_url, "timeout": 30.0}
if self._api_key:
kwargs["api_key"] = self._api_key
logger.debug("Creating Hindsight cloud client (url=%s, has_key=%s, timeout=%s)",
self._api_url, bool(self._api_key), kwargs["timeout"])
logger.debug("Creating Hindsight cloud client (url=%s, has_key=%s)",
self._api_url, bool(self._api_key))
self._client = Hindsight(**kwargs)
return self._client
def _run_sync(self, coro):
"""Schedule *coro* on the shared loop using the configured timeout."""
return _run_sync(coro, timeout=self._timeout)
def initialize(self, session_id: str, **kwargs) -> None:
self._session_id = str(session_id or "").strip()
self._parent_session_id = str(kwargs.get("parent_session_id", "") or "").strip()
# Each process lifecycle gets its own document_id. Reusing session_id
# alone caused overwrites on /resume — the reloaded session starts
# with an empty _session_turns, so the next retain would replace the
# previously stored content. session_id stays in tags so processes
# for the same session remain filterable together.
start_ts = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
self._document_id = f"{self._session_id}-{start_ts}"
# Check client version and auto-upgrade if needed
try:
@@ -756,9 +548,7 @@ class HindsightMemoryProvider(MemoryProvider):
if Version(installed) < Version(_MIN_CLIENT_VERSION):
logger.warning("hindsight-client %s is outdated (need >=%s), attempting upgrade...",
installed, _MIN_CLIENT_VERSION)
import shutil
import subprocess
import sys
import shutil, subprocess, sys
uv_path = shutil.which("uv")
if uv_path:
try:
@@ -785,41 +575,19 @@ class HindsightMemoryProvider(MemoryProvider):
self._chat_type = str(kwargs.get("chat_type") or "").strip()
self._thread_id = str(kwargs.get("thread_id") or "").strip()
self._agent_identity = str(kwargs.get("agent_identity") or "").strip()
self._agent_workspace = str(kwargs.get("agent_workspace") or "").strip()
self._turn_index = 0
self._session_turns = []
self._mode = self._config.get("mode", "cloud")
# Read timeout from config or env var, fall back to default
self._timeout = self._config.get("timeout") or int(os.environ.get("HINDSIGHT_TIMEOUT", str(_DEFAULT_TIMEOUT)))
# "local" is a legacy alias for "local_embedded"
if self._mode == "local":
self._mode = "local_embedded"
if self._mode == "local_embedded":
available, reason = _check_local_runtime()
if not available:
logger.warning(
"Hindsight local mode disabled because its runtime could not be imported: %s",
reason,
)
self._mode = "disabled"
return
self._api_key = self._config.get("apiKey") or self._config.get("api_key") or os.environ.get("HINDSIGHT_API_KEY", "")
default_url = _DEFAULT_LOCAL_URL if self._mode in ("local_embedded", "local_external") else _DEFAULT_API_URL
self._api_url = self._config.get("api_url") or os.environ.get("HINDSIGHT_API_URL", default_url)
self._llm_base_url = self._config.get("llm_base_url", "")
banks = self._config.get("banks", {}).get("hermes", {})
static_bank_id = self._config.get("bank_id") or banks.get("bankId", "hermes")
self._bank_id_template = self._config.get("bank_id_template", "") or ""
self._bank_id = _resolve_bank_id_template(
self._bank_id_template,
fallback=static_bank_id,
profile=self._agent_identity,
workspace=self._agent_workspace,
platform=self._platform,
user=self._user_id,
session=self._session_id,
)
self._bank_id = self._config.get("bank_id") or banks.get("bankId", "hermes")
budget = self._config.get("recall_budget") or self._config.get("budget") or banks.get("budget", "mid")
self._budget = budget if budget in _VALID_BUDGETS else "mid"
@@ -872,10 +640,6 @@ class HindsightMemoryProvider(MemoryProvider):
pass
logger.info("Hindsight initialized: mode=%s, api_url=%s, bank=%s, budget=%s, memory_mode=%s, prefetch_method=%s, client=%s",
self._mode, self._api_url, self._bank_id, self._budget, self._memory_mode, self._prefetch_method, _client_version)
if self._bank_id_template:
logger.debug("Hindsight bank resolved from template %r: profile=%s workspace=%s platform=%s user=%s -> bank=%s",
self._bank_id_template, self._agent_identity, self._agent_workspace,
self._platform, self._user_id, self._bank_id)
logger.debug("Hindsight config: auto_retain=%s, auto_recall=%s, retain_every_n=%d, "
"retain_async=%s, retain_context=%s, recall_max_tokens=%d, recall_max_input_chars=%d, tags=%s, recall_tags=%s",
self._auto_retain, self._auto_recall, self._retain_every_n_turns,
@@ -905,13 +669,42 @@ class HindsightMemoryProvider(MemoryProvider):
# Update the profile .env to match our current config so
# the daemon always starts with the right settings.
# If the config changed and the daemon is running, stop it.
profile_env = _embedded_profile_env_path(self._config)
expected_env = _build_embedded_profile_env(self._config)
saved = _load_simple_env(profile_env)
config_changed = saved != expected_env
from pathlib import Path as _Path
profile_env = _Path.home() / ".hindsight" / "profiles" / f"{profile}.env"
current_key = self._config.get("llm_api_key") or os.environ.get("HINDSIGHT_LLM_API_KEY", "")
current_provider = self._config.get("llm_provider", "")
current_model = self._config.get("llm_model", "")
current_base_url = self._config.get("llm_base_url") or os.environ.get("HINDSIGHT_API_LLM_BASE_URL", "")
# Map openai_compatible/openrouter → openai for the daemon (OpenAI wire format)
daemon_provider = "openai" if current_provider in ("openai_compatible", "openrouter") else current_provider
# Read saved profile config
saved = {}
if profile_env.exists():
for line in profile_env.read_text().splitlines():
if "=" in line and not line.startswith("#"):
k, v = line.split("=", 1)
saved[k.strip()] = v.strip()
config_changed = (
saved.get("HINDSIGHT_API_LLM_PROVIDER") != daemon_provider or
saved.get("HINDSIGHT_API_LLM_MODEL") != current_model or
saved.get("HINDSIGHT_API_LLM_API_KEY") != current_key or
saved.get("HINDSIGHT_API_LLM_BASE_URL", "") != current_base_url
)
if config_changed:
profile_env = _materialize_embedded_profile_env(self._config)
# Write updated profile .env
profile_env.parent.mkdir(parents=True, exist_ok=True)
env_lines = (
f"HINDSIGHT_API_LLM_PROVIDER={daemon_provider}\n"
f"HINDSIGHT_API_LLM_API_KEY={current_key}\n"
f"HINDSIGHT_API_LLM_MODEL={current_model}\n"
f"HINDSIGHT_API_LOG_LEVEL=info\n"
)
if current_base_url:
env_lines += f"HINDSIGHT_API_LLM_BASE_URL={current_base_url}\n"
profile_env.write_text(env_lines)
if client._manager.is_running(profile):
with open(log_path, "a") as f:
f.write("\n=== Config changed, restarting daemon ===\n")
@@ -984,7 +777,7 @@ class HindsightMemoryProvider(MemoryProvider):
client = self._get_client()
if self._prefetch_method == "reflect":
logger.debug("Prefetch: calling reflect (bank=%s, query_len=%d)", self._bank_id, len(query))
resp = self._run_sync(client.areflect(bank_id=self._bank_id, query=query, budget=self._budget))
resp = _run_sync(client.areflect(bank_id=self._bank_id, query=query, budget=self._budget))
text = resp.text or ""
else:
recall_kwargs: dict = {
@@ -998,7 +791,7 @@ class HindsightMemoryProvider(MemoryProvider):
recall_kwargs["types"] = self._recall_types
logger.debug("Prefetch: calling recall (bank=%s, query_len=%d, budget=%s)",
self._bank_id, len(query), self._budget)
resp = self._run_sync(client.arecall(**recall_kwargs))
resp = _run_sync(client.arecall(**recall_kwargs))
num_results = len(resp.results) if resp.results else 0
logger.debug("Prefetch: recall returned %d results", num_results)
text = "\n".join(f"- {r.text}" for r in resp.results if r.text) if resp.results else ""
@@ -1095,7 +888,7 @@ class HindsightMemoryProvider(MemoryProvider):
if session_id:
self._session_id = str(session_id).strip()
turn = json.dumps(self._build_turn_messages(user_content, assistant_content), ensure_ascii=False)
turn = json.dumps(self._build_turn_messages(user_content, assistant_content))
self._session_turns.append(turn)
self._turn_counter += 1
self._turn_index = self._turn_counter
@@ -1109,12 +902,6 @@ class HindsightMemoryProvider(MemoryProvider):
len(self._session_turns), sum(len(t) for t in self._session_turns))
content = "[" + ",".join(self._session_turns) + "]"
lineage_tags: list[str] = []
if self._session_id:
lineage_tags.append(f"session:{self._session_id}")
if self._parent_session_id:
lineage_tags.append(f"parent:{self._parent_session_id}")
def _sync():
try:
client = self._get_client()
@@ -1125,16 +912,15 @@ class HindsightMemoryProvider(MemoryProvider):
message_count=len(self._session_turns) * 2,
turn_index=self._turn_index,
),
tags=lineage_tags or None,
)
item.pop("bank_id", None)
item.pop("retain_async", None)
logger.debug("Hindsight retain: bank=%s, doc=%s, async=%s, content_len=%d, num_turns=%d",
self._bank_id, self._document_id, self._retain_async, len(content), len(self._session_turns))
self._run_sync(client.aretain_batch(
self._bank_id, self._session_id, self._retain_async, len(content), len(self._session_turns))
_run_sync(client.aretain_batch(
bank_id=self._bank_id,
items=[item],
document_id=self._document_id,
document_id=self._session_id,
retain_async=self._retain_async,
))
logger.debug("Hindsight retain succeeded")
@@ -1171,7 +957,7 @@ class HindsightMemoryProvider(MemoryProvider):
)
logger.debug("Tool hindsight_retain: bank=%s, content_len=%d, context=%s",
self._bank_id, len(content), context)
self._run_sync(client.aretain(**retain_kwargs))
_run_sync(client.aretain(**retain_kwargs))
logger.debug("Tool hindsight_retain: success")
return json.dumps({"result": "Memory stored successfully."})
except Exception as e:
@@ -1194,7 +980,7 @@ class HindsightMemoryProvider(MemoryProvider):
recall_kwargs["types"] = self._recall_types
logger.debug("Tool hindsight_recall: bank=%s, query_len=%d, budget=%s",
self._bank_id, len(query), self._budget)
resp = self._run_sync(client.arecall(**recall_kwargs))
resp = _run_sync(client.arecall(**recall_kwargs))
num_results = len(resp.results) if resp.results else 0
logger.debug("Tool hindsight_recall: %d results", num_results)
if not resp.results:
@@ -1212,7 +998,7 @@ class HindsightMemoryProvider(MemoryProvider):
try:
logger.debug("Tool hindsight_reflect: bank=%s, query_len=%d, budget=%s",
self._bank_id, len(query), self._budget)
resp = self._run_sync(client.areflect(
resp = _run_sync(client.areflect(
bank_id=self._bank_id, query=query, budget=self._budget
))
logger.debug("Tool hindsight_reflect: response_len=%d", len(resp.text or ""))
@@ -1225,6 +1011,7 @@ class HindsightMemoryProvider(MemoryProvider):
def shutdown(self) -> None:
logger.debug("Hindsight shutdown: waiting for background threads")
global _loop, _loop_thread
for t in (self._prefetch_thread, self._sync_thread):
if t and t.is_alive():
t.join(timeout=5.0)
@@ -1239,21 +1026,17 @@ class HindsightMemoryProvider(MemoryProvider):
except RuntimeError:
pass
else:
self._run_sync(self._client.aclose())
_run_sync(self._client.aclose())
except Exception:
pass
self._client = None
# The module-global background event loop (_loop / _loop_thread)
# is intentionally NOT stopped here. It is shared across every
# HindsightMemoryProvider instance in the process — the plugin
# loader creates a new provider per AIAgent, and the gateway
# creates one AIAgent per concurrent chat session. Stopping the
# loop from one provider's shutdown() strands the aiohttp
# ClientSession + TCPConnector owned by every sibling provider
# on a dead loop, which surfaces as the "Unclosed client session"
# / "Unclosed connector" warnings reported in #11923. The loop
# runs on a daemon thread and is reclaimed on process exit;
# per-session cleanup happens via self._client.aclose() above.
# Stop the background event loop so no tasks are pending at exit
if _loop is not None and _loop.is_running():
_loop.call_soon_threadsafe(_loop.stop)
if _loop_thread is not None:
_loop_thread.join(timeout=5.0)
_loop = None
_loop_thread = None
def register(ctx) -> None:
+1 -1
View File
@@ -43,7 +43,7 @@ _TIMEOUT = 30.0
# ---------------------------------------------------------------------------
# Process-level atexit safety net — ensures pending sessions are committed
# even if shutdown_memory_provider is never called (e.g. gateway crash,
# SIGKILL, or exception in the session expiry watcher preventing shutdown).
# SIGKILL, or exception in _async_flush_memories preventing shutdown).
# ---------------------------------------------------------------------------
_last_active_provider: Optional["OpenVikingMemoryProvider"] = None
-66
View File
@@ -1,66 +0,0 @@
"""Spotify integration plugin — bundled, auto-loaded.
Registers 7 tools (playback, devices, queue, search, playlists, albums,
library) into the ``spotify`` toolset. Each tool's handler is gated by
``_check_spotify_available()`` when the user has not run ``hermes auth
spotify``, the tools remain registered (so they appear in ``hermes
tools``) but the runtime check prevents dispatch.
Why a plugin instead of a top-level ``tools/`` file?
- ``plugins/`` is where third-party service integrations live (see
``plugins/image_gen/`` for the backend-provider pattern, ``plugins/
disk-cleanup/`` for the standalone pattern). ``tools/`` is reserved
for foundational capabilities (terminal, read_file, web_search, etc.).
- Mirroring the image_gen plugin layout (``plugins/<category>/<backend>/``
for categories, flat ``plugins/<name>/`` for standalones) makes new
service integrations a pattern contributors can copy.
- Bundled + ``kind: backend`` auto-loads on startup just like image_gen
backends no user opt-in needed, no ``plugins.enabled`` config.
The Spotify auth flow (``hermes auth spotify``), CLI plumbing, and docs
are unchanged. This move is purely structural.
"""
from __future__ import annotations
from plugins.spotify.tools import (
SPOTIFY_ALBUMS_SCHEMA,
SPOTIFY_DEVICES_SCHEMA,
SPOTIFY_LIBRARY_SCHEMA,
SPOTIFY_PLAYBACK_SCHEMA,
SPOTIFY_PLAYLISTS_SCHEMA,
SPOTIFY_QUEUE_SCHEMA,
SPOTIFY_SEARCH_SCHEMA,
_check_spotify_available,
_handle_spotify_albums,
_handle_spotify_devices,
_handle_spotify_library,
_handle_spotify_playback,
_handle_spotify_playlists,
_handle_spotify_queue,
_handle_spotify_search,
)
_TOOLS = (
("spotify_playback", SPOTIFY_PLAYBACK_SCHEMA, _handle_spotify_playback, "🎵"),
("spotify_devices", SPOTIFY_DEVICES_SCHEMA, _handle_spotify_devices, "🔈"),
("spotify_queue", SPOTIFY_QUEUE_SCHEMA, _handle_spotify_queue, "📻"),
("spotify_search", SPOTIFY_SEARCH_SCHEMA, _handle_spotify_search, "🔎"),
("spotify_playlists", SPOTIFY_PLAYLISTS_SCHEMA, _handle_spotify_playlists, "📚"),
("spotify_albums", SPOTIFY_ALBUMS_SCHEMA, _handle_spotify_albums, "💿"),
("spotify_library", SPOTIFY_LIBRARY_SCHEMA, _handle_spotify_library, "❤️"),
)
def register(ctx) -> None:
"""Register all Spotify tools. Called once by the plugin loader."""
for name, schema, handler, emoji in _TOOLS:
ctx.register_tool(
name=name,
toolset="spotify",
schema=schema,
handler=handler,
check_fn=_check_spotify_available,
emoji=emoji,
)

Some files were not shown because too many files have changed in this diff Show More