Compare commits
28 Commits
| SHA1 |
|---|
| f90afa03cc |
| c6974fd108 |
| c6dba918b3 |
| 3eade90b39 |
| 34d06a9802 |
| 2772d99085 |
| ee16416c7b |
| 3007174a61 |
| 2f0a83dd12 |
| 110cdd573a |
| 4d1b988070 |
| 019c11d07e |
| fce23e8024 |
| 1ec1f6a68a |
| 637ad443bf |
| a8b85bb887 |
| d9753720f3 |
| dbc11abcb6 |
| 268ee6bdce |
| 173289b64f |
| 1a3ae6ac6e |
| 78e6b06518 |
| b650957b40 |
| ad06bfccf0 |
| 8dfc96dbbb |
| 3c8ec7037c |
| 161c2c4da4 |
| e22416dd9b |
@@ -27,8 +27,8 @@ jobs:
         with:
           python-version: '3.11'
 
-      - name: Install Python dependencies
-        run: python -m pip install ascii-guard pyyaml
+      - name: Install ascii-guard
+        run: python -m pip install ascii-guard==2.3.0 pyyaml==6.0.3
 
       - name: Extract skill metadata for dashboard
         run: python3 website/scripts/extract-skills.py
@@ -27,8 +27,8 @@ jobs:
     timeout-minutes: 30
     steps:
       - uses: actions/checkout@v4
-      - uses: DeterminateSystems/nix-installer-action@main
-      - uses: DeterminateSystems/magic-nix-cache-action@main
+      - uses: DeterminateSystems/nix-installer-action@ef8a148080ab6020fd15196c2084a2eea5ff2d25 # v22
+      - uses: DeterminateSystems/magic-nix-cache-action@565684385bcd71bad329742eefe8d12f2e765b39 # v13
       - name: Check flake
         if: runner.os == 'Linux'
         run: nix flake check --print-build-logs
@@ -1,5 +1,8 @@
 FROM debian:13.4
 
+# Disable Python stdout buffering to ensure logs are printed immediately
+ENV PYTHONUNBUFFERED=1
+
 # Install system dependencies in one layer, clear APT cache
 RUN apt-get update && \
     apt-get install -y --no-install-recommends \
@@ -1238,10 +1238,27 @@ def build_anthropic_kwargs(
 ) -> Dict[str, Any]:
     """Build kwargs for anthropic.messages.create().
 
-    When *max_tokens* is None, the model's native output limit is used
-    (e.g. 128K for Opus 4.6, 64K for Sonnet 4.6). If *context_length*
-    is provided, the effective limit is clamped so it doesn't exceed
-    the context window.
+    Naming note — two distinct concepts, easily confused:
+      max_tokens     = OUTPUT token cap for a single response.
+                       Anthropic's API calls this "max_tokens" but it only
+                       limits the *output*. Anthropic's own native SDK
+                       renamed it "max_output_tokens" for clarity.
+      context_length = TOTAL context window (input tokens + output tokens).
+                       The API enforces: input_tokens + max_tokens ≤ context_length.
+                       Stored on the ContextCompressor; reduced on overflow errors.
+
+    When *max_tokens* is None the model's native output ceiling is used
+    (e.g. 128K for Opus 4.6, 64K for Sonnet 4.6).
+
+    When *context_length* is provided and the model's native output ceiling
+    exceeds it (e.g. a local endpoint with an 8K window), the output cap is
+    clamped to context_length − 1. This only kicks in for unusually small
+    context windows; for full-size models the native output cap is always
+    smaller than the context window so no clamping happens.
+    NOTE: this clamping does not account for prompt size — if the prompt is
+    large, Anthropic may still reject the request. The caller must detect
+    "max_tokens too large given prompt" errors and retry with a smaller cap
+    (see parse_available_output_tokens_from_error + _ephemeral_max_output_tokens).
 
     When *is_oauth* is True, applies Claude Code compatibility transforms:
     system prompt prefix, tool name prefixing, and prompt sanitization.
@@ -1256,10 +1273,14 @@ def build_anthropic_kwargs(
     anthropic_tools = convert_tools_to_anthropic(tools) if tools else []
 
     model = normalize_model_name(model, preserve_dots=preserve_dots)
+    # effective_max_tokens = output cap for this call (≠ total context window)
     effective_max_tokens = max_tokens or _get_anthropic_max_output(model)
 
-    # Clamp to context window if the user set a lower context_length
-    # (e.g. custom endpoint with limited capacity).
+    # Clamp output cap to fit inside the total context window.
+    # Only matters for small custom endpoints where context_length < native
+    # output ceiling. For standard Anthropic models context_length (e.g.
+    # 200K) is always larger than the output ceiling (e.g. 128K), so this
+    # branch is not taken.
     if context_length and effective_max_tokens > context_length:
         effective_max_tokens = max(context_length - 1, 1)
 
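Review note: a minimal sketch of the clamp in the hunk above, pulled out into a standalone function so the edge cases are easy to check. `clamp_output_cap` and its parameter names are hypothetical; `native_output_cap` stands in for whatever `_get_anthropic_max_output(model)` returns, which this diff does not show.

```python
from typing import Optional


def clamp_output_cap(
    max_tokens: Optional[int],
    native_output_cap: int,
    context_length: Optional[int],
) -> int:
    """Hypothetical helper mirroring the clamp shown in the hunk."""
    # Fall back to the model's native output ceiling when no cap is given.
    effective = max_tokens or native_output_cap
    # Only clamp when the total window is smaller than the output cap.
    if context_length and effective > context_length:
        effective = max(context_length - 1, 1)
    return effective


print(clamp_output_cap(None, 128_000, 200_000))  # full-size window, no clamp
print(clamp_output_cap(None, 128_000, 8_192))    # small window, clamped
print(clamp_output_cap(4_096, 128_000, 8_192))   # explicit cap already fits
```

As the new docstring says, this clamp ignores prompt size, so a request with a large prompt can still be rejected and has to be retried with a smaller cap.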
@@ -702,7 +702,7 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
         logger.debug("Auxiliary text client: %s (%s) via pool", pconfig.name, model)
         extra = {}
         if "api.kimi.com" in base_url.lower():
-            extra["default_headers"] = {"User-Agent": "KimiCLI/1.0"}
+            extra["default_headers"] = {"User-Agent": "KimiCLI/1.3"}
         elif "api.githubcopilot.com" in base_url.lower():
             from hermes_cli.models import copilot_default_headers
 
@@ -721,7 +721,7 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
         logger.debug("Auxiliary text client: %s (%s)", pconfig.name, model)
         extra = {}
         if "api.kimi.com" in base_url.lower():
-            extra["default_headers"] = {"User-Agent": "KimiCLI/1.0"}
+            extra["default_headers"] = {"User-Agent": "KimiCLI/1.3"}
         elif "api.githubcopilot.com" in base_url.lower():
             from hermes_cli.models import copilot_default_headers
 
@@ -1047,6 +1047,32 @@ def _is_payment_error(exc: Exception) -> bool:
     return False
 
 
+def _is_connection_error(exc: Exception) -> bool:
+    """Detect connection/network errors that warrant provider fallback.
+
+    Returns True for errors indicating the provider endpoint is unreachable
+    (DNS failure, connection refused, TLS errors, timeouts). These are
+    distinct from API errors (4xx/5xx) which indicate the provider IS
+    reachable but returned an error.
+    """
+    from openai import APIConnectionError, APITimeoutError
+
+    if isinstance(exc, (APIConnectionError, APITimeoutError)):
+        return True
+    # urllib3 / httpx / httpcore connection errors
+    err_type = type(exc).__name__
+    if any(kw in err_type for kw in ("Connection", "Timeout", "DNS", "SSL")):
+        return True
+    err_lower = str(exc).lower()
+    if any(kw in err_lower for kw in (
+        "connection refused", "name or service not known",
+        "no route to host", "network is unreachable",
+        "timed out", "connection reset",
+    )):
+        return True
+    return False
+
+
 def _try_payment_fallback(
     failed_provider: str,
     task: str = None,
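Review note: the name and message heuristics in `_is_connection_error` can be exercised without the `openai` package. The sketch below is a hypothetical stand-in (`looks_like_connection_error` is not in the diff) that keeps only the two substring checks and drops the `isinstance` check against the SDK classes. `ConnectionPoolConfigError` is a made-up class, included to show that matching on the type name is deliberately broad.

```python
_NAME_HINTS = ("Connection", "Timeout", "DNS", "SSL")
_MESSAGE_HINTS = (
    "connection refused", "name or service not known",
    "no route to host", "network is unreachable",
    "timed out", "connection reset",
)


def looks_like_connection_error(exc: Exception) -> bool:
    """Hypothetical stand-in: type-name check first, then message check."""
    if any(hint in type(exc).__name__ for hint in _NAME_HINTS):
        return True
    return any(hint in str(exc).lower() for hint in _MESSAGE_HINTS)


class ConnectionPoolConfigError(ValueError):
    """Made-up class: a config mistake whose name contains 'Connection'."""


for exc in (
    ConnectionRefusedError("refused"),
    OSError("Network is unreachable"),
    ValueError("invalid api key"),
    ConnectionPoolConfigError("pool size must be positive"),
):
    print(type(exc).__name__, looks_like_connection_error(exc))
```

The last case is classified as a connection error purely because of its class name, so any exception type named this way would trigger provider fallback under these heuristics.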
@@ -1111,7 +1137,7 @@ def _resolve_auto() -> Tuple[Optional[OpenAI], Optional[str]]:
     main_model = _read_main_model()
     if (main_provider and main_model
             and main_provider not in _AGGREGATOR_PROVIDERS
-            and main_provider not in ("auto", "custom", "")):
+            and main_provider not in ("auto", "")):
         client, resolved = resolve_provider_client(main_provider, main_model)
         if client is not None:
             logger.info("Auxiliary auto-detect: using main provider %s (%s)",
@@ -1169,7 +1195,7 @@ def _to_async_client(sync_client, model: str):
 
         async_kwargs["default_headers"] = copilot_default_headers()
     elif "api.kimi.com" in base_lower:
-        async_kwargs["default_headers"] = {"User-Agent": "KimiCLI/1.0"}
+        async_kwargs["default_headers"] = {"User-Agent": "KimiCLI/1.3"}
     return AsyncOpenAI(**async_kwargs), model
 
 
@@ -1289,7 +1315,13 @@ def resolve_provider_client(
             )
             return None, None
         final_model = model or _read_main_model() or "gpt-4o-mini"
-        client = OpenAI(api_key=custom_key, base_url=custom_base)
+        extra = {}
+        if "api.kimi.com" in custom_base.lower():
+            extra["default_headers"] = {"User-Agent": "KimiCLI/1.3"}
+        elif "api.githubcopilot.com" in custom_base.lower():
+            from hermes_cli.models import copilot_default_headers
+            extra["default_headers"] = copilot_default_headers()
+        client = OpenAI(api_key=custom_key, base_url=custom_base, **extra)
         return (_to_async_client(client, final_model) if async_mode
                 else (client, final_model))
         # Try custom first, then codex, then API-key providers
@@ -1368,7 +1400,7 @@ def resolve_provider_client(
         # Provider-specific headers
         headers = {}
         if "api.kimi.com" in base_url.lower():
-            headers["User-Agent"] = "KimiCLI/1.0"
+            headers["User-Agent"] = "KimiCLI/1.3"
         elif "api.githubcopilot.com" in base_url.lower():
             from hermes_cli.models import copilot_default_headers
 
@@ -2093,7 +2125,18 @@ def call_llm(
         # try alternative providers instead of giving up. This handles the
         # common case where a user runs out of OpenRouter credits but has
         # Codex OAuth or another provider available.
-        if _is_payment_error(first_err):
+        #
+        # ── Connection error fallback ────────────────────────────────
+        # When a provider endpoint is unreachable (DNS failure, connection
+        # refused, timeout), try alternative providers. This handles stale
+        # Codex/OAuth tokens that authenticate but whose endpoint is down,
+        # and providers the user never configured that got picked up by
+        # the auto-detection chain.
+        should_fallback = _is_payment_error(first_err) or _is_connection_error(first_err)
+        if should_fallback:
+            reason = "payment error" if _is_payment_error(first_err) else "connection error"
+            logger.info("Auxiliary %s: %s on %s (%s), trying fallback",
+                        task or "call", reason, resolved_provider, first_err)
             fb_client, fb_model, fb_label = _try_payment_fallback(
                 resolved_provider, task)
             if fb_client is not None:
@@ -18,12 +18,14 @@ import hermes_cli.auth as auth_mod
 from hermes_cli.auth import (
     CODEX_ACCESS_TOKEN_REFRESH_SKEW_SECONDS,
     DEFAULT_AGENT_KEY_MIN_TTL_SECONDS,
+    KIMI_CODE_BASE_URL,
     PROVIDER_REGISTRY,
     _codex_access_token_is_expiring,
     _decode_jwt_claims,
     _import_codex_cli_tokens,
     _load_auth_store,
     _load_provider_state,
+    _resolve_kimi_base_url,
     _resolve_zai_base_url,
     read_credential_pool,
     write_credential_pool,
@@ -511,6 +513,13 @@ class CredentialPool:
                     except Exception as wexc:
                         logger.debug("Failed to write refreshed token to credentials file: %s", wexc)
             elif self.provider == "openai-codex":
+                # Proactively sync from ~/.codex/auth.json before refresh.
+                # The Codex CLI (or another Hermes profile) may have already
+                # consumed our refresh_token. Syncing first avoids a
+                # "refresh_token_reused" error when the CLI has a newer pair.
+                synced = self._sync_codex_entry_from_cli(entry)
+                if synced is not entry:
+                    entry = synced
                 refreshed = auth_mod.refresh_codex_oauth_pure(
                     entry.access_token,
                     entry.refresh_token,
@@ -596,6 +605,35 @@ class CredentialPool:
                     # Credentials file had a valid (non-expired) token — use it directly
                     logger.debug("Credentials file has valid token, using without refresh")
                     return synced
+            # For openai-codex: the refresh_token may have been consumed by
+            # the Codex CLI between our proactive sync and the refresh call.
+            # Re-sync and retry once.
+            if self.provider == "openai-codex":
+                synced = self._sync_codex_entry_from_cli(entry)
+                if synced.refresh_token != entry.refresh_token:
+                    logger.debug("Retrying Codex refresh with synced token from ~/.codex/auth.json")
+                    try:
+                        refreshed = auth_mod.refresh_codex_oauth_pure(
+                            synced.access_token,
+                            synced.refresh_token,
+                        )
+                        updated = replace(
+                            synced,
+                            access_token=refreshed["access_token"],
+                            refresh_token=refreshed["refresh_token"],
+                            last_refresh=refreshed.get("last_refresh"),
+                            last_status=STATUS_OK,
+                            last_status_at=None,
+                            last_error_code=None,
+                        )
+                        self._replace_entry(synced, updated)
+                        self._persist()
+                        return updated
+                    except Exception as retry_exc:
+                        logger.debug("Codex retry refresh also failed: %s", retry_exc)
+                elif not self._entry_needs_refresh(synced):
+                    logger.debug("Codex CLI has valid token, using without refresh")
+                    return synced
             self._mark_exhausted(entry, None)
             return None
 
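Review note: the retry added here is a "re-sync, then retry once" pattern for single-use refresh tokens. The sketch below shows only that control flow with injected callables. `Entry`, `refresh_with_resync`, and `fake_refresh` are hypothetical names, and the sketch leaves out the branch where the synced entry is already valid and is returned without a refresh.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Entry:
    access_token: str
    refresh_token: str


def refresh_with_resync(
    entry: Entry,
    refresh: Callable[[Entry], Dict[str, str]],
    sync_from_cli: Callable[[Entry], Entry],
) -> Optional[Entry]:
    """Try a refresh; on failure re-sync the token pair and retry once."""
    try:
        return replace(entry, **refresh(entry))
    except Exception:
        pass  # the refresh token may already have been consumed elsewhere
    synced = sync_from_cli(entry)
    if synced.refresh_token == entry.refresh_token:
        return None  # nothing newer to retry with
    try:
        return replace(synced, **refresh(synced))
    except Exception:
        return None


def fake_refresh(entry: Entry) -> Dict[str, str]:
    # Simulates a provider that rejects an already-used refresh token.
    if entry.refresh_token == "r1":
        raise RuntimeError("refresh_token_reused")
    return {"access_token": "a3", "refresh_token": "r3"}


stale = Entry("a1", "r1")
print(refresh_with_resync(stale, fake_refresh, lambda e: Entry("a2", "r2")))
print(refresh_with_resync(stale, fake_refresh, lambda e: e))
```

Retrying only when the synced refresh token differs avoids sending the same consumed token twice.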
@@ -1084,7 +1122,9 @@ def _seed_from_env(provider: str, entries: List[PooledCredential]) -> Tuple[bool
         active_sources.add(source)
         auth_type = AUTH_TYPE_OAUTH if provider == "anthropic" and not token.startswith("sk-ant-api") else AUTH_TYPE_API_KEY
         base_url = env_url or pconfig.inference_base_url
-        if provider == "zai":
+        if provider == "kimi-coding":
+            base_url = _resolve_kimi_base_url(token, pconfig.inference_base_url, env_url)
+        elif provider == "zai":
             base_url = _resolve_zai_base_url(token, pconfig.inference_base_url, env_url)
         changed |= _upsert_entry(
             entries,
@@ -0,0 +1,792 @@
+"""API error classification for smart failover and recovery.
+
+Provides a structured taxonomy of API errors and a priority-ordered
+classification pipeline that determines the correct recovery action
+(retry, rotate credential, fallback to another provider, compress
+context, or abort).
+
+Replaces scattered inline string-matching with a centralized classifier
+that the main retry loop in run_agent.py consults for every API failure.
+"""
+
+from __future__ import annotations
+
+import enum
+import logging
+import re
+from dataclasses import dataclass, field
+from typing import Any, Dict, Optional
+
+logger = logging.getLogger(__name__)
+
+
+# ── Error taxonomy ──────────────────────────────────────────────────────
+
+class FailoverReason(enum.Enum):
+    """Why an API call failed — determines recovery strategy."""
+
+    # Authentication / authorization
+    auth = "auth"  # Transient auth (401/403) — refresh/rotate
+    auth_permanent = "auth_permanent"  # Auth failed after refresh — abort
+
+    # Billing / quota
+    billing = "billing"  # 402 or confirmed credit exhaustion — rotate immediately
+    rate_limit = "rate_limit"  # 429 or quota-based throttling — backoff then rotate
+
+    # Server-side
+    overloaded = "overloaded"  # 503/529 — provider overloaded, backoff
+    server_error = "server_error"  # 500/502 — internal server error, retry
+
+    # Transport
+    timeout = "timeout"  # Connection/read timeout — rebuild client + retry
+
+    # Context / payload
+    context_overflow = "context_overflow"  # Context too large — compress, not failover
+    payload_too_large = "payload_too_large"  # 413 — compress payload
+
+    # Model
+    model_not_found = "model_not_found"  # 404 or invalid model — fallback to different model
+
+    # Request format
+    format_error = "format_error"  # 400 bad request — abort or strip + retry
+
+    # Provider-specific
+    thinking_signature = "thinking_signature"  # Anthropic thinking block sig invalid
+    long_context_tier = "long_context_tier"  # Anthropic "extra usage" tier gate
+
+    # Catch-all
+    unknown = "unknown"  # Unclassifiable — retry with backoff
+
+
+# ── Classification result ───────────────────────────────────────────────
+
+@dataclass
+class ClassifiedError:
+    """Structured classification of an API error with recovery hints."""
+
+    reason: FailoverReason
+    status_code: Optional[int] = None
+    provider: Optional[str] = None
+    model: Optional[str] = None
+    message: str = ""
+    error_context: Dict[str, Any] = field(default_factory=dict)
+
+    # Recovery action hints — the retry loop checks these instead of
+    # re-classifying the error itself.
+    retryable: bool = True
+    should_compress: bool = False
+    should_rotate_credential: bool = False
+    should_fallback: bool = False
+
+    @property
+    def is_auth(self) -> bool:
+        return self.reason in (FailoverReason.auth, FailoverReason.auth_permanent)
+
+    @property
+    def is_transient(self) -> bool:
+        """Error is expected to resolve on retry (with or without backoff)."""
+        return self.reason in (
+            FailoverReason.rate_limit,
+            FailoverReason.overloaded,
+            FailoverReason.server_error,
+            FailoverReason.timeout,
+            FailoverReason.unknown,
+        )
+
+
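Review note: the retry loop in run_agent.py that consumes these hints is not part of this diff, so the sketch below is only a guess at how a caller might turn the four flags into one action. `Reason`, `Hints`, and `next_action` are hypothetical stand-ins, and the priority order (compress, rotate, fallback, retry, abort) is an assumption rather than the project's documented behavior.

```python
import enum
from dataclasses import dataclass


class Reason(enum.Enum):
    rate_limit = "rate_limit"
    context_overflow = "context_overflow"
    billing = "billing"


@dataclass
class Hints:
    # Same flag names and defaults as ClassifiedError in the diff.
    reason: Reason
    retryable: bool = True
    should_compress: bool = False
    should_rotate_credential: bool = False
    should_fallback: bool = False


def next_action(hints: Hints, has_spare_credential: bool, has_fallback: bool) -> str:
    """Pick one recovery action; the ordering here is an assumption."""
    if hints.should_compress:
        return "compress"
    if hints.should_rotate_credential and has_spare_credential:
        return "rotate"
    if hints.should_fallback and has_fallback:
        return "fallback"
    return "retry" if hints.retryable else "abort"


billing = Hints(Reason.billing, retryable=False,
                should_rotate_credential=True, should_fallback=True)
print(next_action(Hints(Reason.context_overflow, should_compress=True), False, False))
print(next_action(billing, has_spare_credential=False, has_fallback=True))
print(next_action(billing, has_spare_credential=False, has_fallback=False))
```

Keeping the flags on the result object means the loop never has to re-inspect the raw exception, which is the stated purpose of the hint fields.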
+# ── Provider-specific patterns ──────────────────────────────────────────
+
+# Patterns that indicate billing exhaustion (not transient rate limit)
+_BILLING_PATTERNS = [
+    "insufficient credits",
+    "insufficient_quota",
+    "credit balance",
+    "credits have been exhausted",
+    "top up your credits",
+    "payment required",
+    "billing hard limit",
+    "exceeded your current quota",
+    "account is deactivated",
+    "plan does not include",
+]
+
+# Patterns that indicate rate limiting (transient, will resolve)
+_RATE_LIMIT_PATTERNS = [
+    "rate limit",
+    "rate_limit",
+    "too many requests",
+    "throttled",
+    "requests per minute",
+    "tokens per minute",
+    "requests per day",
+    "try again in",
+    "please retry after",
+    "resource_exhausted",
+]
+
+# Usage-limit patterns that need disambiguation (could be billing OR rate_limit)
+_USAGE_LIMIT_PATTERNS = [
+    "usage limit",
+    "quota",
+    "limit exceeded",
+    "key limit exceeded",
+]
+
+# Patterns confirming usage limit is transient (not billing)
+_USAGE_LIMIT_TRANSIENT_SIGNALS = [
+    "try again",
+    "retry",
+    "resets at",
+    "reset in",
+    "wait",
+    "requests remaining",
+    "periodic",
+    "window",
+]
+
+# Payload-too-large patterns detected from message text (no status_code attr).
+# Proxies and some backends embed the HTTP status in the error message.
+_PAYLOAD_TOO_LARGE_PATTERNS = [
+    "request entity too large",
+    "payload too large",
+    "error code: 413",
+]
+
+# Context overflow patterns
+_CONTEXT_OVERFLOW_PATTERNS = [
+    "context length",
+    "context size",
+    "maximum context",
+    "token limit",
+    "too many tokens",
+    "reduce the length",
+    "exceeds the limit",
+    "context window",
+    "prompt is too long",
+    "prompt exceeds max length",
+    "max_tokens",
+    "maximum number of tokens",
+    # Chinese error messages (some providers return these)
+    "超过最大长度",
+    "上下文长度",
+]
+
+# Model not found patterns
+_MODEL_NOT_FOUND_PATTERNS = [
+    "is not a valid model",
+    "invalid model",
+    "model not found",
+    "model_not_found",
+    "does not exist",
+    "no such model",
+    "unknown model",
+    "unsupported model",
+]
+
+# Auth patterns (non-status-code signals)
+_AUTH_PATTERNS = [
+    "invalid api key",
+    "invalid_api_key",
+    "authentication",
+    "unauthorized",
+    "forbidden",
+    "invalid token",
+    "token expired",
+    "token revoked",
+    "access denied",
+]
+
+# Anthropic thinking block signature patterns
+_THINKING_SIG_PATTERNS = [
+    "signature",  # Combined with "thinking" check
+]
+
+# Transport error type names
+_TRANSPORT_ERROR_TYPES = frozenset({
+    "ReadTimeout", "ConnectTimeout", "PoolTimeout",
+    "ConnectError", "RemoteProtocolError",
+    "ConnectionError", "ConnectionResetError",
+    "ConnectionAbortedError", "BrokenPipeError",
+    "TimeoutError", "ReadError",
+    "ServerDisconnectedError",
+    # OpenAI SDK errors (not subclasses of Python builtins)
+    "APIConnectionError",
+    "APITimeoutError",
+})
+
+# Server disconnect patterns (no status code, but transport-level)
+_SERVER_DISCONNECT_PATTERNS = [
+    "server disconnected",
+    "peer closed connection",
+    "connection reset by peer",
+    "connection was closed",
+    "network connection lost",
+    "unexpected eof",
+    "incomplete chunked read",
+]
+
+
+# ── Classification pipeline ─────────────────────────────────────────────
+
+def classify_api_error(
+    error: Exception,
+    *,
+    provider: str = "",
+    model: str = "",
+    approx_tokens: int = 0,
+    context_length: int = 200000,
+    num_messages: int = 0,
+) -> ClassifiedError:
+    """Classify an API error into a structured recovery recommendation.
+
+    Priority-ordered pipeline:
+      1. Special-case provider-specific patterns (thinking sigs, tier gates)
+      2. HTTP status code + message-aware refinement
+      3. Error code classification (from body)
+      4. Message pattern matching (billing vs rate_limit vs context vs auth)
+      5. Transport error heuristics
+      6. Server disconnect + large session → context overflow
+      7. Fallback: unknown (retryable with backoff)
+
+    Args:
+        error: The exception from the API call.
+        provider: Current provider name (e.g. "openrouter", "anthropic").
+        model: Current model slug.
+        approx_tokens: Approximate token count of the current context.
+        context_length: Maximum context length for the current model.
+
+    Returns:
+        ClassifiedError with reason and recovery action hints.
+    """
+    status_code = _extract_status_code(error)
+    error_type = type(error).__name__
+    body = _extract_error_body(error)
+    error_code = _extract_error_code(body)
+
+    # Build a comprehensive error message string for pattern matching.
+    # str(error) alone may not include the body message (e.g. OpenAI SDK's
+    # APIStatusError.__str__ returns the first arg, not the body). Append
+    # the body message so patterns like "try again" in 402 disambiguation
+    # are detected even when only present in the structured body.
+    #
+    # Also extract metadata.raw — OpenRouter wraps upstream provider errors
+    # inside {"error": {"message": "Provider returned error", "metadata":
+    # {"raw": "<actual error JSON>"}}} and the real error message (e.g.
+    # "context length exceeded") is only in the inner JSON.
+    _raw_msg = str(error).lower()
+    _body_msg = ""
+    _metadata_msg = ""
+    if isinstance(body, dict):
+        _err_obj = body.get("error", {})
+        if isinstance(_err_obj, dict):
+            _body_msg = (_err_obj.get("message") or "").lower()
+            # Parse metadata.raw for wrapped provider errors
+            _metadata = _err_obj.get("metadata", {})
+            if isinstance(_metadata, dict):
+                _raw_json = _metadata.get("raw") or ""
+                if isinstance(_raw_json, str) and _raw_json.strip():
+                    try:
+                        import json
+                        _inner = json.loads(_raw_json)
+                        if isinstance(_inner, dict):
+                            _inner_err = _inner.get("error", {})
+                            if isinstance(_inner_err, dict):
+                                _metadata_msg = (_inner_err.get("message") or "").lower()
+                    except (json.JSONDecodeError, TypeError):
+                        pass
+        if not _body_msg:
+            _body_msg = (body.get("message") or "").lower()
+    # Combine all message sources for pattern matching
+    parts = [_raw_msg]
+    if _body_msg and _body_msg not in _raw_msg:
+        parts.append(_body_msg)
+    if _metadata_msg and _metadata_msg not in _raw_msg and _metadata_msg not in _body_msg:
+        parts.append(_metadata_msg)
+    error_msg = " ".join(parts)
+    provider_lower = (provider or "").strip().lower()
+    model_lower = (model or "").strip().lower()
+
+    def _result(reason: FailoverReason, **overrides) -> ClassifiedError:
+        defaults = {
+            "reason": reason,
+            "status_code": status_code,
+            "provider": provider,
+            "model": model,
+            "message": _extract_message(error, body),
+        }
+        defaults.update(overrides)
+        return ClassifiedError(**defaults)
+
+    # ── 1. Provider-specific patterns (highest priority) ────────────
+
+    # Anthropic thinking block signature invalid (400).
+    # Don't gate on provider — OpenRouter proxies Anthropic errors, so the
+    # provider may be "openrouter" even though the error is Anthropic-specific.
+    # The message pattern ("signature" + "thinking") is unique enough.
+    if (
+        status_code == 400
+        and "signature" in error_msg
+        and "thinking" in error_msg
+    ):
+        return _result(
+            FailoverReason.thinking_signature,
+            retryable=True,
+            should_compress=False,
+        )
+
+    # Anthropic long-context tier gate (429 "extra usage" + "long context")
+    if (
+        status_code == 429
+        and "extra usage" in error_msg
+        and "long context" in error_msg
+    ):
+        return _result(
+            FailoverReason.long_context_tier,
+            retryable=True,
+            should_compress=True,
+        )
+
+    # ── 2. HTTP status code classification ──────────────────────────
+
+    if status_code is not None:
+        classified = _classify_by_status(
+            status_code, error_msg, error_code, body,
+            provider=provider_lower, model=model_lower,
+            approx_tokens=approx_tokens, context_length=context_length,
+            num_messages=num_messages,
+            result_fn=_result,
+        )
+        if classified is not None:
+            return classified
+
+    # ── 3. Error code classification ────────────────────────────────
+
+    if error_code:
+        classified = _classify_by_error_code(error_code, error_msg, _result)
+        if classified is not None:
+            return classified
+
+    # ── 4. Message pattern matching (no status code) ────────────────
+
+    classified = _classify_by_message(
+        error_msg, error_type,
+        approx_tokens=approx_tokens,
+        context_length=context_length,
+        result_fn=_result,
+    )
+    if classified is not None:
+        return classified
+
+    # ── 5. Server disconnect + large session → context overflow ─────
+    # Must come BEFORE generic transport error catch — a disconnect on
+    # a large session is more likely context overflow than a transient
+    # transport hiccup. Without this ordering, RemoteProtocolError
+    # always maps to timeout regardless of session size.
+
+    is_disconnect = any(p in error_msg for p in _SERVER_DISCONNECT_PATTERNS)
+    if is_disconnect and not status_code:
+        is_large = approx_tokens > context_length * 0.6 or approx_tokens > 120000 or num_messages > 200
+        if is_large:
+            return _result(
+                FailoverReason.context_overflow,
+                retryable=True,
+                should_compress=True,
+            )
+        return _result(FailoverReason.timeout, retryable=True)
+
+    # ── 6. Transport / timeout heuristics ───────────────────────────
+
+    if error_type in _TRANSPORT_ERROR_TYPES or isinstance(error, (TimeoutError, ConnectionError, OSError)):
+        return _result(FailoverReason.timeout, retryable=True)
+
+    # ── 7. Fallback: unknown ────────────────────────────────────────
+
+    return _result(FailoverReason.unknown, retryable=True)
+
+
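Review note: the `metadata.raw` handling near the top of `classify_api_error` is easier to follow in isolation. The sketch below is a hypothetical helper (`collect_error_messages` is not in the diff) that pulls the outer message and the wrapped inner message out of an error body shaped like the OpenRouter example in the comment. The payload is made up to match that described shape.

```python
import json
from typing import Any, Dict, List


def collect_error_messages(body: Dict[str, Any]) -> List[str]:
    """Return lowercased messages: the outer one, then the wrapped inner one."""
    messages: List[str] = []
    err = body.get("error")
    if not isinstance(err, dict):
        return messages
    outer = (err.get("message") or "").lower()
    if outer:
        messages.append(outer)
    metadata = err.get("metadata")
    raw = metadata.get("raw") if isinstance(metadata, dict) else None
    if isinstance(raw, str) and raw.strip():
        try:
            inner = json.loads(raw)
        except json.JSONDecodeError:
            return messages  # raw was not JSON; keep what we have
        inner_err = inner.get("error") if isinstance(inner, dict) else None
        if isinstance(inner_err, dict):
            inner_msg = (inner_err.get("message") or "").lower()
            if inner_msg and inner_msg not in messages:
                messages.append(inner_msg)
    return messages


wrapped = {
    "error": {
        "message": "Provider returned error",
        "metadata": {"raw": json.dumps({"error": {"message": "Context length exceeded"}})},
    }
}
print(collect_error_messages(wrapped))
```

Without the inner message, a wrapped context overflow would only expose "provider returned error" to the pattern lists and could not be matched as `context_overflow`.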
+# ── Status code classification ──────────────────────────────────────────
+
+def _classify_by_status(
+    status_code: int,
+    error_msg: str,
+    error_code: str,
+    body: dict,
+    *,
+    provider: str,
+    model: str,
+    approx_tokens: int,
+    context_length: int,
+    num_messages: int = 0,
+    result_fn,
+) -> Optional[ClassifiedError]:
+    """Classify based on HTTP status code with message-aware refinement."""
+
+    if status_code == 401:
+        # Not retryable on its own — credential pool rotation and
+        # provider-specific refresh (Codex, Anthropic, Nous) run before
+        # the retryability check in run_agent.py. If those succeed, the
+        # loop `continue`s. If they fail, retryable=False ensures we
+        # hit the client-error abort path (which tries fallback first).
+        return result_fn(
+            FailoverReason.auth,
+            retryable=False,
+            should_rotate_credential=True,
+            should_fallback=True,
+        )
+
+    if status_code == 403:
+        # OpenRouter 403 "key limit exceeded" is actually billing
+        if "key limit exceeded" in error_msg or "spending limit" in error_msg:
+            return result_fn(
+                FailoverReason.billing,
+                retryable=False,
+                should_rotate_credential=True,
+                should_fallback=True,
+            )
+        return result_fn(
+            FailoverReason.auth,
+            retryable=False,
+            should_fallback=True,
+        )
+
+    if status_code == 402:
+        return _classify_402(error_msg, result_fn)
+
+    if status_code == 404:
+        if any(p in error_msg for p in _MODEL_NOT_FOUND_PATTERNS):
+            return result_fn(
+                FailoverReason.model_not_found,
+                retryable=False,
+                should_fallback=True,
+            )
+        # Generic 404 — could be model or endpoint
+        return result_fn(
+            FailoverReason.model_not_found,
+            retryable=False,
+            should_fallback=True,
+        )
+
+    if status_code == 413:
+        return result_fn(
+            FailoverReason.payload_too_large,
+            retryable=True,
+            should_compress=True,
+        )
+
+    if status_code == 429:
+        # Already checked long_context_tier above; this is a normal rate limit
+        return result_fn(
+            FailoverReason.rate_limit,
+            retryable=True,
+            should_rotate_credential=True,
+            should_fallback=True,
+        )
+
+    if status_code == 400:
+        return _classify_400(
+            error_msg, error_code, body,
+            provider=provider, model=model,
+            approx_tokens=approx_tokens,
+            context_length=context_length,
+            num_messages=num_messages,
+            result_fn=result_fn,
+        )
+
+    if status_code in (500, 502):
+        return result_fn(FailoverReason.server_error, retryable=True)
+
+    if status_code in (503, 529):
+        return result_fn(FailoverReason.overloaded, retryable=True)
+
+    # Other 4xx — non-retryable
+    if 400 <= status_code < 500:
+        return result_fn(
+            FailoverReason.format_error,
+            retryable=False,
+            should_fallback=True,
+        )
+
+    # Other 5xx — retryable
+    if 500 <= status_code < 600:
+        return result_fn(FailoverReason.server_error, retryable=True)
+
+    return None
+
+
def _classify_402(error_msg: str, result_fn) -> ClassifiedError:
    """Disambiguate 402: billing exhaustion vs transient usage limit.

    The key insight from OpenClaw: some 402s are transient rate limits
    disguised as payment errors. "Usage limit, try again in 5 minutes"
    is NOT a billing problem; it's a periodic quota that resets.
    """
    # Check for transient usage-limit signals first
    has_usage_limit = any(p in error_msg for p in _USAGE_LIMIT_PATTERNS)
    has_transient_signal = any(p in error_msg for p in _USAGE_LIMIT_TRANSIENT_SIGNALS)

    if has_usage_limit and has_transient_signal:
        # Transient quota: treat as rate limit, not billing
        return result_fn(
            FailoverReason.rate_limit,
            retryable=True,
            should_rotate_credential=True,
            should_fallback=True,
        )

    # Confirmed billing exhaustion
    return result_fn(
        FailoverReason.billing,
        retryable=False,
        should_rotate_credential=True,
        should_fallback=True,
    )
|
||||
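Review note: the 402 disambiguation can be exercised in isolation. The sketch below is a minimal stand-in, not the module's code: the `Reason` enum, the `Classified` dataclass, and both pattern tuples are hypothetical (the real `_USAGE_LIMIT_PATTERNS` and `_USAGE_LIMIT_TRANSIENT_SIGNALS` are not visible in this diff), and it assumes messages are lowercased before matching.

```python
from dataclasses import dataclass
from enum import Enum


class Reason(Enum):
    rate_limit = "rate_limit"
    billing = "billing"


@dataclass
class Classified:
    reason: Reason
    retryable: bool


# Hypothetical pattern lists, chosen only for illustration.
USAGE_LIMIT_PATTERNS = ("usage limit", "quota")
TRANSIENT_SIGNALS = ("try again", "resets")


def classify_402(error_msg: str) -> Classified:
    """Treat a 402 as a rate limit only when BOTH signal families match."""
    msg = error_msg.lower()
    has_usage_limit = any(p in msg for p in USAGE_LIMIT_PATTERNS)
    has_transient_signal = any(p in msg for p in TRANSIENT_SIGNALS)
    if has_usage_limit and has_transient_signal:
        return Classified(Reason.rate_limit, retryable=True)
    # Anything else is assumed to be real billing exhaustion.
    return Classified(Reason.billing, retryable=False)


transient = classify_402("Usage limit, try again in 5 minutes")
exhausted = classify_402("Insufficient credits on this account")
ambiguous = classify_402("Usage limit reached")  # no transient signal
print(transient.reason.name, exhausted.reason.name, ambiguous.reason.name)
```

Requiring both signals keeps the conservative default: a message that mentions a usage limit but gives no hint of a reset is still treated as billing.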
|
||||
|
||||
def _classify_400(
|
||||
error_msg: str,
|
||||
error_code: str,
|
||||
body: dict,
|
||||
*,
|
||||
provider: str,
|
||||
model: str,
|
||||
approx_tokens: int,
|
||||
context_length: int,
|
||||
num_messages: int = 0,
|
||||
result_fn,
|
||||
) -> ClassifiedError:
|
||||
"""Classify 400 Bad Request — context overflow, format error, or generic."""
|
||||
|
||||
# Context overflow from 400
|
||||
if any(p in error_msg for p in _CONTEXT_OVERFLOW_PATTERNS):
|
||||
return result_fn(
|
||||
FailoverReason.context_overflow,
|
||||
retryable=True,
|
||||
should_compress=True,
|
||||
)
|
||||
|
||||
# Some providers return model-not-found as 400 instead of 404 (e.g. OpenRouter).
|
||||
if any(p in error_msg for p in _MODEL_NOT_FOUND_PATTERNS):
|
||||
return result_fn(
|
||||
FailoverReason.model_not_found,
|
||||
retryable=False,
|
||||
should_fallback=True,
|
||||
)
|
||||
|
||||
# Some providers return rate limit / billing errors as 400 instead of 429/402.
|
||||
# Check these patterns before falling through to format_error.
|
||||
if any(p in error_msg for p in _RATE_LIMIT_PATTERNS):
|
||||
return result_fn(
|
||||
FailoverReason.rate_limit,
|
||||
retryable=True,
|
||||
should_rotate_credential=True,
|
||||
should_fallback=True,
|
||||
)
|
||||
if any(p in error_msg for p in _BILLING_PATTERNS):
|
||||
return result_fn(
|
||||
FailoverReason.billing,
|
||||
retryable=False,
|
||||
should_rotate_credential=True,
|
||||
should_fallback=True,
|
||||
)
|
||||
|
||||
# Generic 400 + large session → probable context overflow
|
||||
# Anthropic sometimes returns a bare "Error" message when context is too large
|
||||
err_body_msg = ""
|
||||
if isinstance(body, dict):
|
||||
err_obj = body.get("error", {})
|
||||
if isinstance(err_obj, dict):
|
||||
err_body_msg = (err_obj.get("message") or "").strip().lower()
|
||||
# Responses API (and some providers) use flat body: {"message": "..."}
|
||||
if not err_body_msg:
|
||||
err_body_msg = (body.get("message") or "").strip().lower()
|
||||
is_generic = len(err_body_msg) < 30 or err_body_msg in ("error", "")
|
||||
is_large = approx_tokens > context_length * 0.4 or approx_tokens > 80000 or num_messages > 80
|
||||
|
||||
if is_generic and is_large:
|
||||
return result_fn(
|
||||
FailoverReason.context_overflow,
|
||||
retryable=True,
|
||||
should_compress=True,
|
||||
)
|
||||
|
||||
# Non-retryable format error
|
||||
return result_fn(
|
||||
FailoverReason.format_error,
|
||||
retryable=False,
|
||||
should_fallback=True,
|
||||
)
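Review note: the "generic 400 + large session" fallback is easiest to reason about with concrete numbers. The following sketch restates only that heuristic with the thresholds from the hunk above; the function name `looks_like_silent_overflow` is hypothetical.

```python
def looks_like_silent_overflow(
    err_body_msg: str,
    approx_tokens: int,
    context_length: int,
    num_messages: int = 0,
) -> bool:
    """Return True when a vague 400 most likely hides a context overflow."""
    msg = err_body_msg.strip().lower()
    # A short or empty message carries no diagnostic detail.
    is_generic = len(msg) < 30 or msg in ("error", "")
    # Any one of the three size signals marks the session as large.
    is_large = (
        approx_tokens > context_length * 0.4
        or approx_tokens > 80000
        or num_messages > 80
    )
    return is_generic and is_large


bare_large = looks_like_silent_overflow("Error", 120_000, 200_000)
bare_small = looks_like_silent_overflow("Error", 5_000, 200_000)
detailed_large = looks_like_silent_overflow(
    "messages.3.content: field required and must be a list", 120_000, 200_000
)
print(bare_large, bare_small, detailed_large)
```

A detailed validation message on a large session still falls through to `format_error`, because the provider has already said what is wrong.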
|
||||
|
||||
|
||||
# ── Error code classification ───────────────────────────────────────────
|
||||
|
||||
def _classify_by_error_code(
|
||||
error_code: str, error_msg: str, result_fn,
|
||||
) -> Optional[ClassifiedError]:
|
||||
"""Classify by structured error codes from the response body."""
|
||||
code_lower = error_code.lower()
|
||||
|
||||
if code_lower in ("resource_exhausted", "throttled", "rate_limit_exceeded"):
|
||||
return result_fn(
|
||||
FailoverReason.rate_limit,
|
||||
retryable=True,
|
||||
should_rotate_credential=True,
|
||||
)
|
||||
|
||||
if code_lower in ("insufficient_quota", "billing_not_active", "payment_required"):
|
||||
return result_fn(
|
||||
FailoverReason.billing,
|
||||
retryable=False,
|
||||
should_rotate_credential=True,
|
||||
should_fallback=True,
|
||||
)
|
||||
|
||||
if code_lower in ("model_not_found", "model_not_available", "invalid_model"):
|
||||
return result_fn(
|
||||
FailoverReason.model_not_found,
|
||||
retryable=False,
|
||||
should_fallback=True,
|
||||
)
|
||||
|
||||
if code_lower in ("context_length_exceeded", "max_tokens_exceeded"):
|
||||
return result_fn(
|
||||
FailoverReason.context_overflow,
|
||||
retryable=True,
|
||||
should_compress=True,
|
||||
)
|
||||
|
||||
return None
|
||||
|
||||
|
||||
# ── Message pattern classification ──────────────────────────────────────
|
||||
|
||||
def _classify_by_message(
|
||||
error_msg: str,
|
||||
error_type: str,
|
||||
*,
|
||||
approx_tokens: int,
|
||||
context_length: int,
|
||||
result_fn,
|
||||
) -> Optional[ClassifiedError]:
|
||||
"""Classify based on error message patterns when no status code is available."""
|
||||
|
||||
# Payload-too-large patterns (from message text when no status_code)
|
||||
if any(p in error_msg for p in _PAYLOAD_TOO_LARGE_PATTERNS):
|
||||
return result_fn(
|
||||
FailoverReason.payload_too_large,
|
||||
retryable=True,
|
||||
should_compress=True,
|
||||
)
|
||||
|
||||
# Billing patterns
|
||||
if any(p in error_msg for p in _BILLING_PATTERNS):
|
||||
return result_fn(
|
||||
FailoverReason.billing,
|
||||
retryable=False,
|
||||
should_rotate_credential=True,
|
||||
should_fallback=True,
|
||||
)
|
||||
|
||||
# Rate limit patterns
|
||||
if any(p in error_msg for p in _RATE_LIMIT_PATTERNS):
|
||||
return result_fn(
|
||||
FailoverReason.rate_limit,
|
||||
retryable=True,
|
||||
should_rotate_credential=True,
|
||||
should_fallback=True,
|
||||
)
|
||||
|
||||
# Context overflow patterns
|
||||
if any(p in error_msg for p in _CONTEXT_OVERFLOW_PATTERNS):
|
||||
return result_fn(
|
||||
FailoverReason.context_overflow,
|
||||
retryable=True,
|
||||
should_compress=True,
|
||||
)
|
||||
|
||||
# Auth patterns
|
||||
if any(p in error_msg for p in _AUTH_PATTERNS):
|
||||
return result_fn(
|
||||
FailoverReason.auth,
|
||||
retryable=True,
|
||||
should_rotate_credential=True,
|
||||
)
|
||||
|
||||
# Model not found patterns
|
||||
if any(p in error_msg for p in _MODEL_NOT_FOUND_PATTERNS):
|
||||
return result_fn(
|
||||
FailoverReason.model_not_found,
|
||||
retryable=False,
|
||||
should_fallback=True,
|
||||
)
|
||||
|
||||
return None
|
||||
|
||||
|
||||
# ── Helpers ─────────────────────────────────────────────────────────────
|
||||
|
||||
def _extract_status_code(error: Exception) -> Optional[int]:
|
||||
"""Walk the error and its cause chain to find an HTTP status code."""
|
||||
current = error
|
||||
for _ in range(5): # Max depth to prevent infinite loops
|
||||
code = getattr(current, "status_code", None)
|
||||
if isinstance(code, int):
|
||||
return code
|
||||
# Some SDKs use .status instead of .status_code
|
||||
code = getattr(current, "status", None)
|
||||
if isinstance(code, int) and 100 <= code < 600:
|
||||
return code
|
||||
# Walk cause chain
|
||||
cause = getattr(current, "__cause__", None) or getattr(current, "__context__", None)
|
||||
if cause is None or cause is current:
|
||||
break
|
||||
current = cause
|
||||
return None
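Review note: a small reproduction of the cause-chain walk, assuming an exception wrapped with `raise ... from ...`. `FakeHTTPError` is a hypothetical stand-in for an SDK exception that carries `status_code`; the sketch omits the `.status` fallback for brevity.

```python
from typing import Optional


class FakeHTTPError(Exception):
    """Hypothetical SDK error exposing an HTTP status code."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def extract_status_code(error: BaseException, max_depth: int = 5) -> Optional[int]:
    """Follow __cause__/__context__ links until a status code is found."""
    current = error
    for _ in range(max_depth):  # bounded so a cyclic chain cannot loop forever
        code = getattr(current, "status_code", None)
        if isinstance(code, int):
            return code
        cause = current.__cause__ or current.__context__
        if cause is None or cause is current:
            break
        current = cause
    return None


try:
    try:
        raise FakeHTTPError(429)
    except FakeHTTPError as inner:
        raise RuntimeError("request failed") from inner
except RuntimeError as outer:
    wrapped = outer  # keep a reference; 'outer' is unbound after the block

found = extract_status_code(wrapped)
missing = extract_status_code(ValueError("no status here"))
print(found, missing)
```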
|
||||
|
||||
|
||||
def _extract_error_body(error: Exception) -> dict:
|
||||
"""Extract the structured error body from an SDK exception."""
|
||||
body = getattr(error, "body", None)
|
||||
if isinstance(body, dict):
|
||||
return body
|
||||
# Some errors have .response.json()
|
||||
response = getattr(error, "response", None)
|
||||
if response is not None:
|
||||
try:
|
||||
json_body = response.json()
|
||||
if isinstance(json_body, dict):
|
||||
return json_body
|
||||
except Exception:
|
||||
pass
|
||||
return {}
|
||||
|
||||
|
||||
def _extract_error_code(body: dict) -> str:
|
||||
"""Extract an error code string from the response body."""
|
||||
if not body:
|
||||
return ""
|
||||
error_obj = body.get("error", {})
|
||||
if isinstance(error_obj, dict):
|
||||
code = error_obj.get("code") or error_obj.get("type") or ""
|
||||
if isinstance(code, str) and code.strip():
|
||||
return code.strip()
|
||||
# Top-level code
|
||||
code = body.get("code") or body.get("error_code") or ""
|
||||
if isinstance(code, (str, int)):
|
||||
return str(code).strip()
|
||||
return ""
|
||||
|
||||
|
||||
def _extract_message(error: Exception, body: dict) -> str:
|
||||
"""Extract the most informative error message."""
|
||||
# Try structured body first
|
||||
if body:
|
||||
error_obj = body.get("error", {})
|
||||
if isinstance(error_obj, dict):
|
||||
msg = error_obj.get("message", "")
|
||||
if isinstance(msg, str) and msg.strip():
|
||||
return msg.strip()[:500]
|
||||
msg = body.get("message", "")
|
||||
if isinstance(msg, str) and msg.strip():
|
||||
return msg.strip()[:500]
|
||||
# Fallback to str(error)
|
||||
return str(error)[:500]
|
||||
@@ -603,6 +603,49 @@ def parse_context_limit_from_error(error_msg: str) -> Optional[int]:
|
||||
return None
|
||||
|
||||
|
||||
def parse_available_output_tokens_from_error(error_msg: str) -> Optional[int]:
    """Detect an "output cap too large" error and return how many output tokens are available.

    Background (two distinct context errors exist):
      1. "Prompt too long": the INPUT itself exceeds the context window.
         Fix: compress history and/or halve context_length.
      2. "max_tokens too large": input is fine, but input + requested_output > window.
         Fix: reduce max_tokens (the output cap) for this call.
         Do NOT touch context_length; the window hasn't shrunk.

    Anthropic's API returns errors like:
      "max_tokens: 32768 > context_window: 200000 - input_tokens: 190000 = available_tokens: 10000"

    Returns the number of output tokens that would fit (e.g. 10000 above), or None if
    the error does not look like a max_tokens-too-large error.
    """
    error_lower = error_msg.lower()

    # Must look like an output-cap error, not a prompt-length error.
    is_output_cap_error = (
        "max_tokens" in error_lower
        and ("available_tokens" in error_lower or "available tokens" in error_lower)
    )
    if not is_output_cap_error:
        return None

    # Extract the available_tokens figure.
    # Anthropic format: "… = available_tokens: 10000"
    patterns = [
        r'available_tokens[:\s]+(\d+)',
        r'available\s+tokens[:\s]+(\d+)',
        # fallback: last number after "=" in expressions like "200000 - 190000 = 10000"
        r'=\s*(\d+)\s*$',
    ]
    for pattern in patterns:
        match = re.search(pattern, error_lower)
        if match:
            tokens = int(match.group(1))
            if tokens >= 1:
                return tokens
    return None
|
||||
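Review note: the docstring's sample message makes a convenient worked example. The sketch below is a reduced version of the parser (primary pattern only, hypothetical name `available_output_tokens`) and shows how a caller might pick a retry value; how the real caller applies the result is not shown in this diff.

```python
import re
from typing import Optional


def available_output_tokens(error_msg: str) -> Optional[int]:
    """Return the 'available_tokens' figure from an output-cap error, if any."""
    lower = error_msg.lower()
    if "max_tokens" not in lower or "available_tokens" not in lower:
        return None  # not an output-cap error
    match = re.search(r"available_tokens[:\s]+(\d+)", lower)
    if match is None:
        return None
    tokens = int(match.group(1))
    return tokens if tokens >= 1 else None


msg = (
    "max_tokens: 32768 > context_window: 200000 "
    "- input_tokens: 190000 = available_tokens: 10000"
)
available = available_output_tokens(msg)

requested = 32768
# Shrink only the output cap; the context window itself is unchanged.
retry_max_tokens = min(requested, available) if available else requested
print(available, retry_max_tokens)
```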
|
||||
|
||||
def _model_id_matches(candidate_id: str, lookup_model: str) -> bool:
|
||||
"""Return True if *candidate_id* (from server) matches *lookup_model* (configured).
|
||||
|
||||
|
||||
@@ -0,0 +1,242 @@
|
||||
"""Rate limit tracking for inference API responses.
|
||||
|
||||
Captures x-ratelimit-* headers from provider responses and provides
|
||||
formatted display for the /usage slash command. Currently supports
|
||||
the Nous Portal header format (also used by OpenRouter and OpenAI-compatible
|
||||
APIs that follow the same convention).
|
||||
|
||||
Header schema (12 headers total):
|
||||
x-ratelimit-limit-requests RPM cap
|
||||
x-ratelimit-limit-requests-1h RPH cap
|
||||
x-ratelimit-limit-tokens TPM cap
|
||||
x-ratelimit-limit-tokens-1h TPH cap
|
||||
x-ratelimit-remaining-requests requests left in minute window
|
||||
x-ratelimit-remaining-requests-1h requests left in hour window
|
||||
x-ratelimit-remaining-tokens tokens left in minute window
|
||||
x-ratelimit-remaining-tokens-1h tokens left in hour window
|
||||
x-ratelimit-reset-requests seconds until minute request window resets
|
||||
x-ratelimit-reset-requests-1h seconds until hour request window resets
|
||||
x-ratelimit-reset-tokens seconds until minute token window resets
|
||||
x-ratelimit-reset-tokens-1h seconds until hour token window resets
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import time
|
||||
from dataclasses import dataclass, field
|
||||
from typing import Any, Dict, Mapping, Optional
|
||||
|
||||
|
||||
@dataclass
|
||||
class RateLimitBucket:
|
||||
"""One rate-limit window (e.g. requests per minute)."""
|
||||
|
||||
limit: int = 0
|
||||
remaining: int = 0
|
||||
reset_seconds: float = 0.0
|
||||
captured_at: float = 0.0 # time.time() when this was captured
|
||||
|
||||
@property
|
||||
def used(self) -> int:
|
||||
return max(0, self.limit - self.remaining)
|
||||
|
||||
@property
|
||||
def usage_pct(self) -> float:
|
||||
if self.limit <= 0:
|
||||
return 0.0
|
||||
return (self.used / self.limit) * 100.0
|
||||
|
||||
@property
|
||||
def remaining_seconds_now(self) -> float:
|
||||
"""Estimated seconds remaining until reset, adjusted for elapsed time."""
|
||||
elapsed = time.time() - self.captured_at
|
||||
return max(0.0, self.reset_seconds - elapsed)
|
||||
|
||||
|
||||
@dataclass
|
||||
class RateLimitState:
|
||||
"""Full rate-limit state parsed from response headers."""
|
||||
|
||||
requests_min: RateLimitBucket = field(default_factory=RateLimitBucket)
|
||||
requests_hour: RateLimitBucket = field(default_factory=RateLimitBucket)
|
||||
tokens_min: RateLimitBucket = field(default_factory=RateLimitBucket)
|
||||
tokens_hour: RateLimitBucket = field(default_factory=RateLimitBucket)
|
||||
captured_at: float = 0.0 # when the headers were captured
|
||||
provider: str = ""
|
||||
|
||||
@property
|
||||
def has_data(self) -> bool:
|
||||
return self.captured_at > 0
|
||||
|
||||
@property
|
||||
def age_seconds(self) -> float:
|
||||
if not self.has_data:
|
||||
return float("inf")
|
||||
return time.time() - self.captured_at
|
||||
|
||||
|
||||
def _safe_int(value: Any, default: int = 0) -> int:
|
||||
try:
|
||||
return int(float(value))
|
||||
except (TypeError, ValueError):
|
||||
return default
|
||||
|
||||
|
||||
def _safe_float(value: Any, default: float = 0.0) -> float:
|
||||
try:
|
||||
return float(value)
|
||||
except (TypeError, ValueError):
|
||||
return default
|
||||
|
||||
|
||||
def parse_rate_limit_headers(
|
||||
headers: Mapping[str, str],
|
||||
provider: str = "",
|
||||
) -> Optional[RateLimitState]:
|
||||
"""Parse x-ratelimit-* headers into a RateLimitState.
|
||||
|
||||
Returns None if no rate limit headers are present.
|
||||
"""
|
||||
# Quick check: at least one rate limit header must exist
|
||||
has_any = any(k.lower().startswith("x-ratelimit-") for k in headers)
|
||||
if not has_any:
|
||||
return None
|
||||
|
||||
now = time.time()
|
||||
|
||||
def _bucket(resource: str, suffix: str = "") -> RateLimitBucket:
|
||||
# e.g. resource="requests", suffix="" -> per-minute
|
||||
# resource="tokens", suffix="-1h" -> per-hour
|
||||
tag = f"{resource}{suffix}"
|
||||
return RateLimitBucket(
|
||||
limit=_safe_int(headers.get(f"x-ratelimit-limit-{tag}")),
|
||||
remaining=_safe_int(headers.get(f"x-ratelimit-remaining-{tag}")),
|
||||
reset_seconds=_safe_float(headers.get(f"x-ratelimit-reset-{tag}")),
|
||||
captured_at=now,
|
||||
)
|
||||
|
||||
return RateLimitState(
|
||||
requests_min=_bucket("requests"),
|
||||
requests_hour=_bucket("requests", "-1h"),
|
||||
tokens_min=_bucket("tokens"),
|
||||
tokens_hour=_bucket("tokens", "-1h"),
|
||||
captured_at=now,
|
||||
provider=provider,
|
||||
)
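Review note: one thing worth checking is header-name casing. The presence check above lowercases keys, but the `headers.get(...)` lookups use lowercase names directly, so a plain `dict` with mixed-case keys would pass the check and still yield empty buckets (case-insensitive header mappings are unaffected). The sketch below normalizes keys first; `read_bucket` and the sample header values are hypothetical, and it skips the safe-parse helpers for brevity.

```python
from typing import Dict, Mapping, Union


def read_bucket(
    headers: Mapping[str, str], resource: str, suffix: str = ""
) -> Dict[str, Union[int, float]]:
    """Read one limit/remaining/reset triple from x-ratelimit-* headers."""
    # Normalize once so lookups work for any header-name casing.
    normalized = {key.lower(): value for key, value in headers.items()}
    tag = f"{resource}{suffix}"
    limit = int(float(normalized.get(f"x-ratelimit-limit-{tag}", 0)))
    remaining = int(float(normalized.get(f"x-ratelimit-remaining-{tag}", 0)))
    reset_seconds = float(normalized.get(f"x-ratelimit-reset-{tag}", 0.0))
    used = max(0, limit - remaining)
    usage_pct = (used / limit) * 100.0 if limit > 0 else 0.0
    return {
        "limit": limit,
        "remaining": remaining,
        "used": used,
        "usage_pct": usage_pct,
        "reset_seconds": reset_seconds,
    }


sample_headers = {
    "X-RateLimit-Limit-Requests": "800",
    "X-RateLimit-Remaining-Requests": "600",
    "X-RateLimit-Reset-Requests": "42.5",
    "x-ratelimit-limit-tokens-1h": "8000000",
    "x-ratelimit-remaining-tokens-1h": "7999856",
    "x-ratelimit-reset-tokens-1h": "3540",
}
rpm = read_bucket(sample_headers, "requests")
tph = read_bucket(sample_headers, "tokens", "-1h")
rph = read_bucket(sample_headers, "requests", "-1h")  # absent: all zeros
print(rpm["used"], rpm["usage_pct"], tph["used"], rph["limit"])
```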
|
||||
|
||||
|
||||
# ── Formatting ──────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
def _fmt_count(n: int) -> str:
    """Human-friendly number: 7999856 -> '8.0M', 33599 -> '33.6K', 799 -> '799'."""
    if n >= 1_000_000:
        return f"{n / 1_000_000:.1f}M"
    if n >= 10_000:
        return f"{n / 1_000:.1f}K"
    if n >= 1_000:
        return f"{n / 1_000:.1f}K"
    return str(n)


def _fmt_seconds(seconds: float) -> str:
    """Seconds -> human-friendly duration: '58s', '2m 14s', '58m 57s', '1h 2m'."""
    s = max(0, int(seconds))
    if s < 60:
        return f"{s}s"
    if s < 3600:
        m, sec = divmod(s, 60)
        return f"{m}m {sec}s" if sec else f"{m}m"
    h, remainder = divmod(s, 3600)
    m = remainder // 60
    return f"{h}h {m}m" if m else f"{h}h"


def _bar(pct: float, width: int = 20) -> str:
    """Unicode progress bar, e.g. pct=40 -> [████████░░░░░░░░░░░░]."""
    filled = int(pct / 100.0 * width)
    filled = max(0, min(width, filled))
    empty = width - filled
    return f"[{'█' * filled}{'░' * empty}]"
|
||||
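Review note: the three formatting helpers are pure functions, so their docstring examples can be checked directly. The sketch below restates them under public names (`fmt_count`, `fmt_seconds`, `bar` are hypothetical) with the two identical "K" branches of the count formatter folded into one, which should not change any output.

```python
def fmt_count(n: int) -> str:
    """Abbreviate large counts with one decimal place."""
    if n >= 1_000_000:
        return f"{n / 1_000_000:.1f}M"
    if n >= 1_000:
        return f"{n / 1_000:.1f}K"
    return str(n)


def fmt_seconds(seconds: float) -> str:
    """Render a duration, dropping zero-valued trailing units."""
    s = max(0, int(seconds))
    if s < 60:
        return f"{s}s"
    if s < 3600:
        m, sec = divmod(s, 60)
        return f"{m}m {sec}s" if sec else f"{m}m"
    h, remainder = divmod(s, 3600)
    m = remainder // 60
    return f"{h}h {m}m" if m else f"{h}h"


def bar(pct: float, width: int = 20) -> str:
    """Draw a fixed-width bar; out-of-range percentages are clamped."""
    filled = max(0, min(width, int(pct / 100.0 * width)))
    return f"[{'█' * filled}{'░' * (width - filled)}]"


print(fmt_count(7999856), fmt_count(33599), fmt_count(799))
print(fmt_seconds(58), fmt_seconds(134), fmt_seconds(3720))
print(bar(40), bar(150))
```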
|
||||
|
||||
def _bucket_line(label: str, bucket: RateLimitBucket, label_width: int = 14) -> str:
|
||||
"""Format one bucket as a single line."""
|
||||
if bucket.limit <= 0:
|
||||
return f" {label:<{label_width}} (no data)"
|
||||
|
||||
pct = bucket.usage_pct
|
||||
used = _fmt_count(bucket.used)
|
||||
limit = _fmt_count(bucket.limit)
|
||||
remaining = _fmt_count(bucket.remaining)
|
||||
reset = _fmt_seconds(bucket.remaining_seconds_now)
|
||||
|
||||
bar = _bar(pct)
|
||||
return f" {label:<{label_width}} {bar} {pct:5.1f}% {used}/{limit} used ({remaining} left, resets in {reset})"
|
||||
|
||||
|
||||
def format_rate_limit_display(state: RateLimitState) -> str:
|
||||
"""Format rate limit state for terminal/chat display."""
|
||||
if not state.has_data:
|
||||
return "No rate limit data yet — make an API request first."
|
||||
|
||||
age = state.age_seconds
|
||||
if age < 5:
|
||||
freshness = "just now"
|
||||
elif age < 60:
|
||||
freshness = f"{int(age)}s ago"
|
||||
else:
|
||||
freshness = f"{_fmt_seconds(age)} ago"
|
||||
|
||||
provider_label = state.provider.title() if state.provider else "Provider"
|
||||
|
||||
lines = [
|
||||
f"{provider_label} Rate Limits (captured {freshness}):",
|
||||
"",
|
||||
_bucket_line("Requests/min", state.requests_min),
|
||||
_bucket_line("Requests/hr", state.requests_hour),
|
||||
"",
|
||||
_bucket_line("Tokens/min", state.tokens_min),
|
||||
_bucket_line("Tokens/hr", state.tokens_hour),
|
||||
]
|
||||
|
||||
# Add warnings if any bucket is getting hot
|
||||
warnings = []
|
||||
for label, bucket in [
|
||||
("requests/min", state.requests_min),
|
||||
("requests/hr", state.requests_hour),
|
||||
("tokens/min", state.tokens_min),
|
||||
("tokens/hr", state.tokens_hour),
|
||||
]:
|
||||
if bucket.limit > 0 and bucket.usage_pct >= 80:
|
||||
reset = _fmt_seconds(bucket.remaining_seconds_now)
|
||||
warnings.append(f" ⚠ {label} at {bucket.usage_pct:.0f}% — resets in {reset}")
|
||||
|
||||
if warnings:
|
||||
lines.append("")
|
||||
lines.extend(warnings)
|
||||
|
||||
return "\n".join(lines)
|
||||
|
||||
|
||||
def format_rate_limit_compact(state: RateLimitState) -> str:
|
||||
"""One-line compact summary for status bars / gateway messages."""
|
||||
if not state.has_data:
|
||||
return "No rate limit data."
|
||||
|
||||
rm = state.requests_min
|
||||
tm = state.tokens_min
|
||||
rh = state.requests_hour
|
||||
th = state.tokens_hour
|
||||
|
||||
parts = []
|
||||
if rm.limit > 0:
|
||||
parts.append(f"RPM: {rm.remaining}/{rm.limit}")
|
||||
if rh.limit > 0:
|
||||
parts.append(f"RPH: {_fmt_count(rh.remaining)}/{_fmt_count(rh.limit)} (resets {_fmt_seconds(rh.remaining_seconds_now)})")
|
||||
if tm.limit > 0:
|
||||
parts.append(f"TPM: {_fmt_count(tm.remaining)}/{_fmt_count(tm.limit)}")
|
||||
if th.limit > 0:
|
||||
parts.append(f"TPH: {_fmt_count(th.remaining)}/{_fmt_count(th.limit)} (resets {_fmt_seconds(th.remaining_seconds_now)})")
|
||||
|
||||
return " | ".join(parts)
|
||||
@@ -159,7 +159,10 @@ class SubdirectoryHintTracker:
|
||||
|
||||
def _is_valid_subdir(self, path: Path) -> bool:
|
||||
"""Check if path is a valid directory to scan for hints."""
|
||||
if not path.is_dir():
|
||||
try:
|
||||
if not path.is_dir():
|
||||
return False
|
||||
except OSError:
|
||||
return False
|
||||
if path in self._loaded_dirs:
|
||||
return False
|
||||
@@ -172,7 +175,10 @@ class SubdirectoryHintTracker:
|
||||
found_hints = []
|
||||
for filename in _HINT_FILENAMES:
|
||||
hint_path = directory / filename
|
||||
if not hint_path.is_file():
|
||||
try:
|
||||
if not hint_path.is_file():
|
||||
continue
|
||||
except OSError:
|
||||
continue
|
||||
try:
|
||||
content = hint_path.read_text(encoding="utf-8").strip()
|
||||
|
||||
+27
-2
@@ -48,6 +48,25 @@ model:
|
||||
# api_key: "your-key-here" # Uncomment to set here instead of .env
|
||||
base_url: "https://openrouter.ai/api/v1"
|
||||
|
||||
# ── Token limits — two settings, easy to confuse ──────────────────────────
|
||||
#
|
||||
# context_length: TOTAL context window (input + output tokens combined).
|
||||
# Controls when Hermes compresses history and validates requests.
|
||||
# Leave unset — Hermes auto-detects the correct value from the provider.
|
||||
# Set manually only when auto-detection is wrong (e.g. a local server with
|
||||
# a custom num_ctx, or a proxy that doesn't expose /v1/models).
|
||||
#
|
||||
# context_length: 131072
|
||||
#
|
||||
# max_tokens: OUTPUT cap — maximum tokens the model may generate per response.
|
||||
# Unrelated to how long your conversation history can be.
|
||||
# The OpenAI-standard name "max_tokens" is a misnomer; newer OpenAI APIs
# call it "max_completion_tokens" or "max_output_tokens" for clarity.
|
||||
# Leave unset to use the model's native output ceiling (recommended).
|
||||
# Set only if you want to deliberately limit individual response length.
|
||||
#
|
||||
# max_tokens: 8192
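Review note: the relationship between the two settings is simple arithmetic, sketched below. `output_budget` is a hypothetical helper, and treating an unset `max_tokens` as "whatever room is left" is an assumption for illustration; the comment above says the real default is the model's native output ceiling.

```python
from typing import Optional


def output_budget(
    context_length: int, input_tokens: int, max_tokens: Optional[int] = None
) -> int:
    """Tokens the model can still generate without exceeding the window."""
    # The window is shared: whatever the input uses is unavailable for output.
    available = max(0, context_length - input_tokens)
    if max_tokens is None:
        return available
    # An explicit cap can only lower the budget, never raise it.
    return min(max_tokens, available)


roomy = output_budget(131072, 100_000, max_tokens=8192)
tight = output_budget(131072, 128_000, max_tokens=8192)
uncapped = output_budget(131072, 100_000)
print(roomy, tight, uncapped)
```

With 128,000 input tokens in a 131,072-token window, only 3,072 tokens remain, so an 8,192-token cap cannot be honored in full.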
|
||||
|
||||
# =============================================================================
|
||||
# OpenRouter Provider Routing (only applies when using OpenRouter)
|
||||
# =============================================================================
|
||||
@@ -117,7 +136,8 @@ terminal:
|
||||
timeout: 180
|
||||
docker_mount_cwd_to_workspace: false # SECURITY: off by default. Opt in to mount the launch cwd into Docker /workspace.
|
||||
lifetime_seconds: 300
|
||||
# sudo_password: "" # Enable sudo commands (pipes via sudo -S) - SECURITY WARNING: plaintext!
|
||||
# sudo_password: "hunter2" # Optional: pipe a sudo password via sudo -S. SECURITY WARNING: plaintext.
|
||||
# sudo_password: "" # Explicit empty password: try empty and never open the interactive sudo prompt.
|
||||
|
||||
# -----------------------------------------------------------------------------
|
||||
# OPTION 2: SSH remote execution
|
||||
@@ -208,13 +228,18 @@ terminal:
|
||||
#
|
||||
# SECURITY WARNING: Password stored in plaintext!
|
||||
#
|
||||
# INTERACTIVE PROMPT: If no sudo_password is set and the CLI is running,
|
||||
# INTERACTIVE PROMPT: If sudo_password is unset and the CLI is running,
|
||||
# you'll be prompted to enter your password when sudo is needed:
|
||||
# - 45-second timeout (auto-skips if no input)
|
||||
# - Press Enter to skip (command fails gracefully)
|
||||
# - Password is hidden while typing
|
||||
# - Password is cached for the session
|
||||
#
|
||||
# EMPTY PASSWORDS: Setting sudo_password to an explicit empty string is different
|
||||
# from leaving it unset. Hermes will try an empty password via `sudo -S` and
|
||||
# will not open the interactive prompt. This is useful for passwordless sudo,
|
||||
# Touch ID sudo setups, and environments where prompting is just noise.
|
||||
#
|
||||
# ALTERNATIVES:
|
||||
# - SSH backend: Configure passwordless sudo on the remote server
|
||||
# - Containers: Run as root inside the container (no sudo needed)
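Review note: the unset-versus-empty distinction comes down to `None` versus `""` once the config is loaded. The sketch below is an illustration only: `sudo_strategy` and its return labels are hypothetical, and the "skip" result for a non-interactive run is an assumption not stated in the comments above.

```python
def sudo_strategy(terminal_cfg: dict, interactive: bool) -> str:
    """Pick how to satisfy sudo based on the sudo_password setting."""
    password = terminal_cfg.get("sudo_password")  # None when the key is absent
    if password is None:
        # Unset: fall back to the interactive prompt when a CLI is attached.
        return "prompt" if interactive else "skip"
    if password == "":
        # Explicit empty string: try an empty password, never prompt.
        return "try-empty"
    return "use-configured"


unset = sudo_strategy({}, interactive=True)
empty = sudo_strategy({"sudo_password": ""}, interactive=True)
configured = sudo_strategy({"sudo_password": "hunter2"}, interactive=True)
print(unset, empty, configured)
```

A truthiness check such as `if not password` would collapse the first two cases, which is exactly the behavior this change avoids.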
|
||||
|
||||
@@ -1546,6 +1546,7 @@ class HermesCLI:
|
||||
self._clarify_deadline = 0
|
||||
self._sudo_state = None
|
||||
self._sudo_deadline = 0
|
||||
self._modal_input_snapshot = None
|
||||
self._approval_state = None
|
||||
self._approval_deadline = 0
|
||||
self._approval_lock = threading.Lock()
|
||||
@@ -1602,7 +1603,12 @@ class HermesCLI:
|
||||
return f"[{('█' * filled) + ('░' * max(0, width - filled))}]"
|
||||
|
||||
def _get_status_bar_snapshot(self) -> Dict[str, Any]:
|
||||
model_name = self.model or "unknown"
|
||||
# Prefer the agent's model name — it updates on fallback.
|
||||
# self.model reflects the originally configured model and never
|
||||
# changes mid-session, so the TUI would show a stale name after
|
||||
# _try_activate_fallback() switches provider/model.
|
||||
agent = getattr(self, "agent", None)
|
||||
model_name = (getattr(agent, "model", None) or self.model or "unknown")
|
||||
model_short = model_name.split("/")[-1] if "/" in model_name else model_name
|
||||
if model_short.endswith(".gguf"):
|
||||
model_short = model_short[:-5]
|
||||
@@ -1628,7 +1634,6 @@ class HermesCLI:
|
||||
"compressions": 0,
|
||||
}
|
||||
|
||||
agent = getattr(self, "agent", None)
|
||||
if not agent:
|
||||
return snapshot
|
||||
|
||||
@@ -4003,59 +4008,7 @@ class HermesCLI:
|
||||
|
||||
print(" To change model or provider, use: hermes model")
|
||||
|
||||
def _handle_prompt_command(self, cmd: str):
|
||||
"""Handle the /prompt command to view or set system prompt."""
|
||||
parts = cmd.split(maxsplit=1)
|
||||
|
||||
if len(parts) > 1:
|
||||
# Set new prompt
|
||||
new_prompt = parts[1].strip()
|
||||
|
||||
if new_prompt.lower() == "clear":
|
||||
self.system_prompt = ""
|
||||
self.agent = None # Force re-init
|
||||
if save_config_value("agent.system_prompt", ""):
|
||||
print("(^_^)b System prompt cleared (saved to config)")
|
||||
else:
|
||||
print("(^_^) System prompt cleared (session only)")
|
||||
else:
|
||||
self.system_prompt = new_prompt
|
||||
self.agent = None # Force re-init
|
||||
if save_config_value("agent.system_prompt", new_prompt):
|
||||
print("(^_^)b System prompt set (saved to config)")
|
||||
else:
|
||||
print("(^_^) System prompt set (session only)")
|
||||
print(f" \"{new_prompt[:60]}{'...' if len(new_prompt) > 60 else ''}\"")
|
||||
else:
|
||||
# Show current prompt
|
||||
print()
|
||||
print("+" + "-" * 50 + "+")
|
||||
print("|" + " " * 15 + "(^_^) System Prompt" + " " * 15 + "|")
|
||||
print("+" + "-" * 50 + "+")
|
||||
print()
|
||||
if self.system_prompt:
|
||||
# Word wrap the prompt for display
|
||||
words = self.system_prompt.split()
|
||||
lines = []
|
||||
current_line = ""
|
||||
for word in words:
|
||||
if len(current_line) + len(word) + 1 <= 50:
|
||||
current_line += (" " if current_line else "") + word
|
||||
else:
|
||||
lines.append(current_line)
|
||||
current_line = word
|
||||
if current_line:
|
||||
lines.append(current_line)
|
||||
for line in lines:
|
||||
print(f" {line}")
|
||||
else:
|
||||
print(" (no custom prompt set - using default)")
|
||||
print()
|
||||
print(" Usage:")
|
||||
print(" /prompt <text> - Set a custom system prompt")
|
||||
print(" /prompt clear - Remove custom prompt")
|
||||
print(" /personality - Use a predefined personality")
|
||||
print()
|
||||
|
||||
|
||||
|
||||
@staticmethod
|
||||
@@ -4555,9 +4508,7 @@ class HermesCLI:
|
||||
self._handle_model_switch(cmd_original)
|
||||
elif canonical == "provider":
|
||||
self._show_model_and_providers()
|
||||
elif canonical == "prompt":
|
||||
# Use original case so prompt text isn't lowercased
|
||||
self._handle_prompt_command(cmd_original)
|
||||
|
||||
elif canonical == "personality":
|
||||
# Use original case (handler lowercases the personality name itself)
|
||||
self._handle_personality_command(cmd_original)
|
||||
@@ -5408,12 +5359,27 @@ class HermesCLI:
|
||||
print(f" ❌ Compression failed: {e}")
|
||||
|
||||
def _show_usage(self):
|
||||
"""Show cumulative token usage for the current session."""
|
||||
"""Show rate limits (if available) and session token usage."""
|
||||
if not self.agent:
|
||||
print("(._.) No active agent -- send a message first.")
|
||||
return
|
||||
|
||||
agent = self.agent
|
||||
calls = agent.session_api_calls
|
||||
|
||||
if calls == 0:
|
||||
print("(._.) No API calls made yet in this session.")
|
||||
return
|
||||
|
||||
# ── Rate limits (shown first when available) ────────────────
|
||||
rl_state = agent.get_rate_limit_state()
|
||||
if rl_state and rl_state.has_data:
|
||||
from agent.rate_limit_tracker import format_rate_limit_display
|
||||
print()
|
||||
print(format_rate_limit_display(rl_state))
|
||||
print()
|
||||
|
||||
# ── Session token usage ─────────────────────────────────────
|
||||
input_tokens = getattr(agent, "session_input_tokens", 0) or 0
|
||||
output_tokens = getattr(agent, "session_output_tokens", 0) or 0
|
||||
cache_read_tokens = getattr(agent, "session_cache_read_tokens", 0) or 0
|
||||
@@ -5421,13 +5387,7 @@ class HermesCLI:
|
||||
prompt = agent.session_prompt_tokens
|
||||
completion = agent.session_completion_tokens
|
||||
total = agent.session_total_tokens
|
||||
calls = agent.session_api_calls
|
||||
|
||||
if calls == 0:
|
||||
print("(._.) No API calls made yet in this session.")
|
||||
return
|
||||
|
||||
# Current context window state
|
||||
compressor = agent.context_compressor
|
||||
last_prompt = compressor.last_prompt_tokens
|
||||
ctx_len = compressor.context_length
|
||||
@@ -6205,6 +6165,7 @@ class HermesCLI:
|
||||
timeout = 45
|
||||
response_queue = queue.Queue()
|
||||
|
||||
self._capture_modal_input_snapshot()
|
||||
self._sudo_state = {
|
||||
"response_queue": response_queue,
|
||||
}
|
||||
@@ -6217,6 +6178,7 @@ class HermesCLI:
|
||||
result = response_queue.get(timeout=1)
|
||||
self._sudo_state = None
|
||||
self._sudo_deadline = 0
|
||||
self._restore_modal_input_snapshot()
|
||||
self._invalidate()
|
||||
if result:
|
||||
_cprint(f"\n{_DIM} ✓ Password received (cached for session){_RST}")
|
||||
@@ -6231,6 +6193,7 @@ class HermesCLI:
|
||||
|
||||
self._sudo_state = None
|
||||
self._sudo_deadline = 0
|
||||
self._restore_modal_input_snapshot()
|
||||
self._invalidate()
|
||||
_cprint(f"\n{_DIM} ⏱ Timeout — continuing without sudo{_RST}")
|
||||
return ""
|
||||
@@ -6403,6 +6366,33 @@ class HermesCLI:
|
||||
def _secret_capture_callback(self, var_name: str, prompt: str, metadata=None) -> dict:
|
||||
return prompt_for_secret(self, var_name, prompt, metadata)
|
||||
|
||||
def _capture_modal_input_snapshot(self) -> None:
|
||||
"""Temporarily clear the input buffer and save the user's in-progress draft."""
|
||||
if self._modal_input_snapshot is not None or not getattr(self, "_app", None):
|
||||
return
|
||||
try:
|
||||
buf = self._app.current_buffer
|
||||
self._modal_input_snapshot = {
|
||||
"text": buf.text,
|
||||
"cursor_position": buf.cursor_position,
|
||||
}
|
||||
buf.reset()
|
||||
except Exception:
|
||||
self._modal_input_snapshot = None
|
||||
|
||||
def _restore_modal_input_snapshot(self) -> None:
|
||||
"""Restore any draft text that was present before a modal prompt opened."""
|
||||
snapshot = self._modal_input_snapshot
|
||||
self._modal_input_snapshot = None
|
||||
if not snapshot or not getattr(self, "_app", None):
|
||||
return
|
||||
try:
|
||||
buf = self._app.current_buffer
|
||||
buf.text = snapshot.get("text", "")
|
||||
buf.cursor_position = min(snapshot.get("cursor_position", 0), len(buf.text))
|
||||
except Exception:
|
||||
pass
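Review note: the capture/restore pair can be tested without a running TUI. The sketch below uses a hypothetical `FakeBuffer` exposing only the three members the hunk relies on (`text`, `cursor_position`, `reset()`); `ModalDraftGuard` is likewise a made-up wrapper, not a class from this change.

```python
class FakeBuffer:
    """Hypothetical stand-in for the input buffer used by the CLI."""

    def __init__(self, text: str = "") -> None:
        self.text = text
        self.cursor_position = len(text)

    def reset(self) -> None:
        self.text = ""
        self.cursor_position = 0


class ModalDraftGuard:
    """Save the user's draft while a modal prompt borrows the input line."""

    def __init__(self, buffer: FakeBuffer) -> None:
        self.buffer = buffer
        self._snapshot = None

    def capture(self) -> None:
        if self._snapshot is not None:
            return  # already captured; keep the first draft
        self._snapshot = {
            "text": self.buffer.text,
            "cursor_position": self.buffer.cursor_position,
        }
        self.buffer.reset()

    def restore(self) -> None:
        snapshot, self._snapshot = self._snapshot, None
        if not snapshot:
            return
        self.buffer.text = snapshot["text"]
        # Clamp in case the saved cursor no longer fits the text.
        self.buffer.cursor_position = min(
            snapshot["cursor_position"], len(self.buffer.text)
        )


buf = FakeBuffer("git commit -m 'wip")
guard = ModalDraftGuard(buf)
guard.capture()
text_during_modal = buf.text   # empty while the prompt is open
buf.text = "s3cret"            # user types a password into the same buffer
guard.restore()
print(repr(text_during_modal), repr(buf.text))
```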
|
||||
|
||||
def _submit_secret_response(self, value: str) -> None:
|
||||
if not self._secret_state:
|
||||
return
|
||||
@@ -7130,6 +7120,7 @@ class HermesCLI:
|
||||
# Sudo password prompt state (similar mechanism to clarify)
|
||||
self._sudo_state = None # dict with response_queue when active
|
||||
self._sudo_deadline = 0
|
||||
self._modal_input_snapshot = None
|
||||
|
||||
# Dangerous command approval state (similar mechanism to clarify)
|
||||
self._approval_state = None # dict with command, description, choices, selected, response_queue
|
||||
@@ -7201,7 +7192,6 @@ class HermesCLI:
|
||||
text = event.app.current_buffer.text
|
||||
self._sudo_state["response_queue"].put(text)
|
||||
self._sudo_state = None
|
||||
event.app.current_buffer.reset()
|
||||
event.app.invalidate()
|
||||
return
|
||||
|
||||
@@ -7406,7 +7396,6 @@ class HermesCLI:
|
||||
if self._sudo_state:
|
||||
self._sudo_state["response_queue"].put("")
|
||||
self._sudo_state = None
|
||||
event.app.current_buffer.reset()
|
||||
event.app.invalidate()
|
||||
return
|
||||
|
||||
|
||||
Generated
+4
-4
@@ -22,16 +22,16 @@
     },
     "nixpkgs": {
       "locked": {
-        "lastModified": 1751274312,
-        "narHash": "sha256-/bVBlRpECLVzjV19t5KMdMFWSwKLtb5RyXdjz3LJT+g=",
+        "lastModified": 1775036866,
+        "narHash": "sha256-ZojAnPuCdy657PbTq5V0Y+AHKhZAIwSIT2cb8UgAz/U=",
         "owner": "NixOS",
         "repo": "nixpkgs",
-        "rev": "50ab793786d9de88ee30ec4e4c24fb4236fc2674",
+        "rev": "6201e203d09599479a3b3450ed24fa81537ebc4e",
         "type": "github"
       },
       "original": {
         "owner": "NixOS",
-        "ref": "nixos-24.11",
+        "ref": "nixos-unstable",
         "repo": "nixpkgs",
         "type": "github"
       }
|
||||
@@ -2,7 +2,7 @@
   description = "Hermes Agent - AI agent framework by Nous Research";

   inputs = {
-    nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.11";
+    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
     flake-parts = {
       url = "github:hercules-ci/flake-parts";
       inputs.nixpkgs-lib.follows = "nixpkgs";
|
||||
+16
-7
@@ -5280,19 +5280,28 @@ class GatewayRunner:
|
||||
|
||||
agent = self._running_agents.get(session_key)
|
||||
if agent and hasattr(agent, "session_total_tokens") and agent.session_api_calls > 0:
|
||||
lines = [
|
||||
"📊 **Session Token Usage**",
|
||||
f"Prompt (input): {agent.session_prompt_tokens:,}",
|
||||
f"Completion (output): {agent.session_completion_tokens:,}",
|
||||
f"Total: {agent.session_total_tokens:,}",
|
||||
f"API calls: {agent.session_api_calls}",
|
||||
]
|
||||
lines = []
|
||||
|
||||
# Rate limits first (when available from provider headers)
|
||||
rl_state = agent.get_rate_limit_state()
|
||||
if rl_state and rl_state.has_data:
|
||||
from agent.rate_limit_tracker import format_rate_limit_compact
|
||||
lines.append(f"⏱️ **Rate Limits:** {format_rate_limit_compact(rl_state)}")
|
||||
lines.append("")
|
||||
|
||||
# Session token usage
|
||||
lines.append("📊 **Session Token Usage**")
|
||||
lines.append(f"Prompt (input): {agent.session_prompt_tokens:,}")
|
||||
lines.append(f"Completion (output): {agent.session_completion_tokens:,}")
|
||||
lines.append(f"Total: {agent.session_total_tokens:,}")
|
||||
lines.append(f"API calls: {agent.session_api_calls}")
|
||||
ctx = agent.context_compressor
|
||||
if ctx.last_prompt_tokens:
|
||||
pct = min(100, ctx.last_prompt_tokens / ctx.context_length * 100) if ctx.context_length else 0
|
||||
lines.append(f"Context: {ctx.last_prompt_tokens:,} / {ctx.context_length:,} ({pct:.0f}%)")
|
||||
if ctx.compression_count:
|
||||
lines.append(f"Compressions: {ctx.compression_count}")
|
||||
|
||||
return "\n".join(lines)
|
||||
|
||||
# No running agent -- check session history for a rough count
|
||||
|
||||
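The context line in the usage report above clamps the percentage at 100 and guards against a zero context length. A small sketch of that arithmetic follows; the function names are illustrative and do not appear in the gateway code.

```python
def context_usage_percent(last_prompt_tokens: int, context_length: int) -> float:
    """Share of the context window in use, capped at 100; 0 if length is unknown."""
    if not context_length:
        return 0
    return min(100, last_prompt_tokens / context_length * 100)


def format_context_line(last_prompt_tokens: int, context_length: int) -> str:
    """Render the same 'Context: used / total (pct%)' shape as the hunk above."""
    pct = context_usage_percent(last_prompt_tokens, context_length)
    return f"Context: {last_prompt_tokens:,} / {context_length:,} ({pct:.0f}%)"


line = format_context_line(50_000, 200_000)   # "Context: 50,000 / 200,000 (25%)"
capped = context_usage_percent(300_000, 200_000)
unknown = context_usage_percent(1_000, 0)
```

The cap matters because the last prompt can momentarily exceed the nominal window before compression runs.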
+15
-6
@@ -250,7 +250,7 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
 # Kimi Code Endpoint Detection
 # =============================================================================

-# Kimi Code (platform.kimi.ai) issues keys prefixed "sk-kimi-" that only work
+# Kimi Code (kimi.com/code) issues keys prefixed "sk-kimi-" that only work
 # on api.kimi.com/coding/v1.  Legacy keys from platform.moonshot.ai work on
 # api.moonshot.ai/v1 (the default).  Auto-detect when user hasn't set
 # KIMI_BASE_URL explicitly.
@@ -3017,12 +3017,15 @@ def _login_nous(args, pconfig: ProviderConfig) -> None:
|
||||
_save_provider_state(auth_store, "nous", auth_state)
|
||||
saved_to = _save_auth_store(auth_store)
|
||||
|
||||
config_path = _update_config_for_provider("nous", inference_base_url)
|
||||
print()
|
||||
print("Login successful!")
|
||||
print(f" Auth state: {saved_to}")
|
||||
print(f" Config updated: {config_path} (model.provider=nous)")
|
||||
|
||||
# Resolve model BEFORE writing provider to config.yaml so we never
|
||||
# leave the config in a half-updated state (provider=nous but model
|
||||
# still set to the previous provider's model, e.g. opus from
|
||||
# OpenRouter). The auth.json active_provider was already set above.
|
||||
selected_model = None
|
||||
try:
|
||||
runtime_key = auth_state.get("agent_key") or auth_state.get("access_token")
|
||||
if not isinstance(runtime_key, str) or not runtime_key:
|
||||
@@ -3056,9 +3059,6 @@ def _login_nous(args, pconfig: ProviderConfig) -> None:
|
||||
unavailable_models=unavailable_models,
|
||||
portal_url=_portal,
|
||||
)
|
||||
if selected_model:
|
||||
_save_model_choice(selected_model)
|
||||
print(f"Default model set to: {selected_model}")
|
||||
elif unavailable_models:
|
||||
_url = (_portal or DEFAULT_NOUS_PORTAL_URL).rstrip("/")
|
||||
print("No free models currently available.")
|
||||
@@ -3070,6 +3070,15 @@ def _login_nous(args, pconfig: ProviderConfig) -> None:
|
||||
print()
|
||||
print(f"Login succeeded, but could not fetch available models. Reason: {message}")
|
||||
|
||||
# Write provider + model atomically so config is never mismatched.
|
||||
config_path = _update_config_for_provider(
|
||||
"nous", inference_base_url, default_model=selected_model,
|
||||
)
|
||||
if selected_model:
|
||||
_save_model_choice(selected_model)
|
||||
print(f"Default model set to: {selected_model}")
|
||||
print(f" Config updated: {config_path} (model.provider=nous)")
|
||||
|
||||
except KeyboardInterrupt:
|
||||
print("\nLogin cancelled.")
|
||||
raise SystemExit(130)
|
||||
|
||||
@@ -87,8 +87,7 @@ COMMAND_REGISTRY: list[CommandDef] = [
|
||||
CommandDef("model", "Switch model for this session", "Configuration", args_hint="[model] [--global]"),
|
||||
CommandDef("provider", "Show available providers and current provider",
|
||||
"Configuration"),
|
||||
CommandDef("prompt", "View/set custom system prompt", "Configuration",
|
||||
cli_only=True, args_hint="[text]", subcommands=("clear",)),
|
||||
|
||||
CommandDef("personality", "Set a predefined personality", "Configuration",
|
||||
args_hint="[name]"),
|
||||
CommandDef("statusbar", "Toggle the context/model status bar", "Configuration",
|
||||
@@ -129,7 +128,7 @@ COMMAND_REGISTRY: list[CommandDef] = [
     CommandDef("commands", "Browse all commands and skills (paginated)", "Info",
                gateway_only=True, args_hint="[page]"),
     CommandDef("help", "Show available commands", "Info"),
-    CommandDef("usage", "Show token usage for the current session", "Info"),
+    CommandDef("usage", "Show token usage and rate limits for the current session", "Info"),
     CommandDef("insights", "Show usage insights and analytics", "Info",
                args_hint="[days]"),
     CommandDef("platforms", "Show gateway/messaging platform status", "Info",
|
||||
+17
-2
@@ -569,7 +569,7 @@ DEFAULT_CONFIG = {
     },

     # Config schema version - bump this when adding new required fields
-    "_config_version": 12,
+    "_config_version": 13,
 }

 # =============================================================================
@@ -1217,7 +1217,7 @@ OPTIONAL_ENV_VARS = {
         "category": "setting",
     },
     "SUDO_PASSWORD": {
-        "description": "Sudo password for terminal commands requiring root access",
+        "description": "Sudo password for terminal commands requiring root access; set to an explicit empty string to try empty without prompting",
         "prompt": "Sudo password",
         "url": None,
         "password": True,
@@ -1701,6 +1701,21 @@ def migrate_config(interactive: bool = True, quiet: bool = False) -> Dict[str, A
|
||||
ep = providers_dict[key]
|
||||
print(f" → {key}: {ep.get('api', '')}")
|
||||
|
||||
# ── Version 12 → 13: clear dead LLM_MODEL / OPENAI_MODEL from .env ──
|
||||
# These env vars were written by the old setup wizard but nothing reads
|
||||
# them anymore (config.yaml is the sole source of truth since March 2026).
|
||||
# Stale entries cause user confusion — see issue report.
|
||||
if current_ver < 13:
|
||||
for dead_var in ("LLM_MODEL", "OPENAI_MODEL"):
|
||||
try:
|
||||
old_val = get_env_value(dead_var)
|
||||
if old_val:
|
||||
save_env_value(dead_var, "")
|
||||
if not quiet:
|
||||
print(f" ✓ Cleared {dead_var} from .env (no longer used — config.yaml is source of truth)")
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
if current_ver < latest_ver and not quiet:
|
||||
print(f"Config version: {current_ver} → {latest_ver}")
|
||||
|
||||
|
||||
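The version 12 → 13 step above follows a common pattern: gate each migration on the stored schema version and make the step safe to repeat. A rough sketch under stated assumptions is shown here; `migrate_env` is a hypothetical name, and a plain dict stands in for the `.env` file that the real code reads and writes through `get_env_value` / `save_env_value`.

```python
def migrate_env(env: dict, current_ver: int, quiet: bool = True) -> list:
    """Blank out env vars that nothing reads anymore; return the names cleared."""
    cleared = []
    if current_ver < 13:
        for dead_var in ("LLM_MODEL", "OPENAI_MODEL"):
            # Only touch entries that actually hold a value, so reruns are no-ops.
            if env.get(dead_var):
                env[dead_var] = ""
                cleared.append(dead_var)
                if not quiet:
                    print(f"  Cleared {dead_var} (no longer used)")
    return cleared


env = {"LLM_MODEL": "some-model", "OPENAI_API_KEY": "sk-example"}
first_run = migrate_env(env, current_ver=12)
second_run = migrate_env(env, current_ver=12)    # nothing left to clear
already_new = migrate_env({"LLM_MODEL": "x"}, current_ver=13)
```

Unrelated keys such as `OPENAI_API_KEY` are left alone; only the two named variables are cleared.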
@@ -0,0 +1,337 @@
|
||||
"""
|
||||
Dump command for hermes CLI.
|
||||
|
||||
Outputs a compact, plain-text summary of the user's Hermes setup
|
||||
that can be copy-pasted into Discord/GitHub/Telegram for support context.
|
||||
No ANSI colors, no checkmarks — just data.
|
||||
"""
|
||||
|
||||
import json
|
||||
import os
|
||||
import platform
|
||||
import subprocess
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
from hermes_cli.config import get_hermes_home, get_env_path, get_project_root, load_config
|
||||
from hermes_constants import display_hermes_home
|
||||
|
||||
|
||||
def _get_git_commit(project_root: Path) -> str:
|
||||
"""Return short git commit hash, or '(unknown)'."""
|
||||
try:
|
||||
result = subprocess.run(
|
||||
["git", "rev-parse", "--short=8", "HEAD"],
|
||||
capture_output=True, text=True, timeout=5,
|
||||
cwd=str(project_root),
|
||||
)
|
||||
if result.returncode == 0:
|
||||
return result.stdout.strip()
|
||||
except Exception:
|
||||
pass
|
||||
return "(unknown)"
|
||||
|
||||
|
||||
def _key_present(name: str) -> str:
|
||||
"""Return 'set' or 'not set' for an env var."""
|
||||
return "set" if os.getenv(name) else "not set"
|
||||
|
||||
|
||||
def _redact(value: str) -> str:
|
||||
"""Redact all but first 4 and last 4 chars."""
|
||||
if not value:
|
||||
return ""
|
||||
if len(value) < 12:
|
||||
return "***"
|
||||
return value[:4] + "..." + value[-4:]
|
||||
|
||||
|
||||
def _gateway_status() -> str:
|
||||
"""Return a short gateway status string."""
|
||||
if sys.platform.startswith("linux"):
|
||||
try:
|
||||
from hermes_cli.gateway import get_service_name
|
||||
svc = get_service_name()
|
||||
except Exception:
|
||||
svc = "hermes-gateway"
|
||||
try:
|
||||
r = subprocess.run(
|
||||
["systemctl", "--user", "is-active", svc],
|
||||
capture_output=True, text=True, timeout=5,
|
||||
)
|
||||
return "running (systemd)" if r.stdout.strip() == "active" else "stopped"
|
||||
except Exception:
|
||||
return "unknown"
|
||||
elif sys.platform == "darwin":
|
||||
try:
|
||||
from hermes_cli.gateway import get_launchd_label
|
||||
r = subprocess.run(
|
||||
["launchctl", "list", get_launchd_label()],
|
||||
capture_output=True, text=True, timeout=5,
|
||||
)
|
||||
return "loaded (launchd)" if r.returncode == 0 else "not loaded"
|
||||
except Exception:
|
||||
return "unknown"
|
||||
return "N/A"
|
||||
|
||||
|
||||
def _count_skills(hermes_home: Path) -> int:
|
||||
"""Count installed skills."""
|
||||
skills_dir = hermes_home / "skills"
|
||||
if not skills_dir.is_dir():
|
||||
return 0
|
||||
count = 0
|
||||
for item in skills_dir.rglob("SKILL.md"):
|
||||
count += 1
|
||||
return count
|
||||
|
||||
|
||||
def _count_mcp_servers(config: dict) -> int:
|
||||
"""Count configured MCP servers."""
|
||||
mcp = config.get("mcp", {})
|
||||
servers = mcp.get("servers", {})
|
||||
return len(servers)
|
||||
|
||||
|
||||
def _cron_summary(hermes_home: Path) -> str:
|
||||
"""Return cron jobs summary."""
|
||||
jobs_file = hermes_home / "cron" / "jobs.json"
|
||||
if not jobs_file.exists():
|
||||
return "0"
|
||||
try:
|
||||
with open(jobs_file, encoding="utf-8") as f:
|
||||
data = json.load(f)
|
||||
jobs = data.get("jobs", [])
|
||||
active = sum(1 for j in jobs if j.get("enabled", True))
|
||||
return f"{active} active / {len(jobs)} total"
|
||||
except Exception:
|
||||
return "(error reading)"
|
||||
|
||||
|
||||
def _configured_platforms() -> list[str]:
|
||||
"""Return list of configured messaging platform names."""
|
||||
checks = {
|
||||
"telegram": "TELEGRAM_BOT_TOKEN",
|
||||
"discord": "DISCORD_BOT_TOKEN",
|
||||
"slack": "SLACK_BOT_TOKEN",
|
||||
"whatsapp": "WHATSAPP_ENABLED",
|
||||
"signal": "SIGNAL_HTTP_URL",
|
||||
"email": "EMAIL_ADDRESS",
|
||||
"sms": "TWILIO_ACCOUNT_SID",
|
||||
"matrix": "MATRIX_HOMESERVER_URL",
|
||||
"mattermost": "MATTERMOST_URL",
|
||||
"homeassistant": "HASS_TOKEN",
|
||||
"dingtalk": "DINGTALK_CLIENT_ID",
|
||||
"feishu": "FEISHU_APP_ID",
|
||||
"wecom": "WECOM_BOT_ID",
|
||||
}
|
||||
return [name for name, env in checks.items() if os.getenv(env)]
|
||||
|
||||
|
||||
def _memory_provider(config: dict) -> str:
|
||||
"""Return the active memory provider name."""
|
||||
mem = config.get("memory", {})
|
||||
provider = mem.get("provider", "")
|
||||
return provider if provider else "built-in"
|
||||
|
||||
|
||||
def _get_model_and_provider(config: dict) -> tuple[str, str]:
|
||||
"""Extract model and provider from config."""
|
||||
model_cfg = config.get("model", "")
|
||||
if isinstance(model_cfg, dict):
|
||||
model = model_cfg.get("default") or model_cfg.get("model") or model_cfg.get("name") or "(not set)"
|
||||
provider = model_cfg.get("provider") or "(auto)"
|
||||
elif isinstance(model_cfg, str):
|
||||
model = model_cfg or "(not set)"
|
||||
provider = "(auto)"
|
||||
else:
|
||||
model = "(not set)"
|
||||
provider = "(auto)"
|
||||
return model, provider
|
||||
|
||||
|
||||
def _config_overrides(config: dict) -> dict[str, str]:
|
||||
"""Find non-default config values worth reporting.
|
||||
|
||||
Returns a flat dict of dotpath -> value for interesting overrides.
|
||||
"""
|
||||
from hermes_cli.config import DEFAULT_CONFIG
|
||||
|
||||
overrides = {}
|
||||
|
||||
# Sections with interesting user-facing overrides
|
||||
interesting_paths = [
|
||||
("agent", "max_turns"),
|
||||
("agent", "gateway_timeout"),
|
||||
("agent", "tool_use_enforcement"),
|
||||
("terminal", "backend"),
|
||||
("terminal", "docker_image"),
|
||||
("terminal", "persistent_shell"),
|
||||
("browser", "allow_private_urls"),
|
||||
("compression", "enabled"),
|
||||
("compression", "threshold"),
|
||||
("display", "streaming"),
|
||||
("display", "skin"),
|
||||
("display", "show_reasoning"),
|
||||
("smart_model_routing", "enabled"),
|
||||
("privacy", "redact_pii"),
|
||||
("tts", "provider"),
|
||||
]
|
||||
|
||||
for section, key in interesting_paths:
|
||||
default_section = DEFAULT_CONFIG.get(section, {})
|
||||
user_section = config.get(section, {})
|
||||
if not isinstance(default_section, dict) or not isinstance(user_section, dict):
|
||||
continue
|
||||
default_val = default_section.get(key)
|
||||
user_val = user_section.get(key)
|
||||
if user_val is not None and user_val != default_val:
|
||||
overrides[f"{section}.{key}"] = str(user_val)
|
||||
|
||||
# Toolsets (if different from default)
|
||||
default_toolsets = DEFAULT_CONFIG.get("toolsets", [])
|
||||
user_toolsets = config.get("toolsets", [])
|
||||
if user_toolsets != default_toolsets:
|
||||
overrides["toolsets"] = str(user_toolsets)
|
||||
|
||||
# Fallback providers
|
||||
fallbacks = config.get("fallback_providers", [])
|
||||
if fallbacks:
|
||||
overrides["fallback_providers"] = str(fallbacks)
|
||||
|
||||
return overrides
|
||||
|
||||
|
||||
def run_dump(args):
|
||||
"""Output a compact, copy-pasteable setup summary."""
|
||||
show_keys = getattr(args, "show_keys", False)
|
||||
|
||||
# Load env from .env file so key checks work
|
||||
from dotenv import load_dotenv
|
||||
env_path = get_env_path()
|
||||
if env_path.exists():
|
||||
try:
|
||||
load_dotenv(env_path, encoding="utf-8")
|
||||
except UnicodeDecodeError:
|
||||
load_dotenv(env_path, encoding="latin-1")
|
||||
# Also try project .env as dev fallback
|
||||
load_dotenv(get_project_root() / ".env", override=False, encoding="utf-8")
|
||||
|
||||
project_root = get_project_root()
|
||||
hermes_home = get_hermes_home()
|
||||
|
||||
try:
|
||||
from hermes_cli import __version__, __release_date__
|
||||
except ImportError:
|
||||
__version__ = "(unknown)"
|
||||
__release_date__ = ""
|
||||
|
||||
commit = _get_git_commit(project_root)
|
||||
|
||||
try:
|
||||
config = load_config()
|
||||
except Exception:
|
||||
config = {}
|
||||
|
||||
model, provider = _get_model_and_provider(config)
|
||||
|
||||
# Profile
|
||||
try:
|
||||
from hermes_cli.profiles import get_active_profile_name
|
||||
profile = get_active_profile_name() or "(default)"
|
||||
except Exception:
|
||||
profile = "(default)"
|
||||
|
||||
# Terminal backend
|
||||
terminal_cfg = config.get("terminal", {})
|
||||
backend = terminal_cfg.get("backend", "local")
|
||||
|
||||
# OpenAI SDK version
|
||||
try:
|
||||
import openai
|
||||
openai_ver = openai.__version__
|
||||
except ImportError:
|
||||
openai_ver = "not installed"
|
||||
|
||||
# OS info
|
||||
os_info = f"{platform.system()} {platform.release()} {platform.machine()}"
|
||||
|
||||
lines = []
|
||||
lines.append("--- hermes dump ---")
|
||||
ver_str = f"{__version__}"
|
||||
if __release_date__:
|
||||
ver_str += f" ({__release_date__})"
|
||||
ver_str += f" [{commit}]"
|
||||
lines.append(f"version: {ver_str}")
|
||||
lines.append(f"os: {os_info}")
|
||||
lines.append(f"python: {sys.version.split()[0]}")
|
||||
lines.append(f"openai_sdk: {openai_ver}")
|
||||
lines.append(f"profile: {profile}")
|
||||
lines.append(f"hermes_home: {display_hermes_home()}")
|
||||
lines.append(f"model: {model}")
|
||||
lines.append(f"provider: {provider}")
|
||||
lines.append(f"terminal: {backend}")
|
||||
|
||||
# API keys
|
||||
lines.append("")
|
||||
lines.append("api_keys:")
|
||||
api_keys = [
|
||||
("OPENROUTER_API_KEY", "openrouter"),
|
||||
("OPENAI_API_KEY", "openai"),
|
||||
("ANTHROPIC_API_KEY", "anthropic"),
|
||||
("ANTHROPIC_TOKEN", "anthropic_token"),
|
||||
("NOUS_API_KEY", "nous"),
|
||||
("GLM_API_KEY", "glm/zai"),
|
||||
("ZAI_API_KEY", "zai"),
|
||||
("KIMI_API_KEY", "kimi"),
|
||||
("MINIMAX_API_KEY", "minimax"),
|
||||
("DEEPSEEK_API_KEY", "deepseek"),
|
||||
("DASHSCOPE_API_KEY", "dashscope"),
|
||||
("HF_TOKEN", "huggingface"),
|
||||
("AI_GATEWAY_API_KEY", "ai_gateway"),
|
||||
("OPENCODE_ZEN_API_KEY", "opencode_zen"),
|
||||
("OPENCODE_GO_API_KEY", "opencode_go"),
|
||||
("KILOCODE_API_KEY", "kilocode"),
|
||||
("FIRECRAWL_API_KEY", "firecrawl"),
|
||||
("TAVILY_API_KEY", "tavily"),
|
||||
("BROWSERBASE_API_KEY", "browserbase"),
|
||||
("FAL_KEY", "fal"),
|
||||
("ELEVENLABS_API_KEY", "elevenlabs"),
|
||||
("GITHUB_TOKEN", "github"),
|
||||
]
|
||||
|
||||
for env_var, label in api_keys:
|
||||
val = os.getenv(env_var, "")
|
||||
if show_keys and val:
|
||||
display = _redact(val)
|
||||
else:
|
||||
display = "set" if val else "not set"
|
||||
lines.append(f" {label:<20} {display}")
|
||||
|
||||
# Features summary
|
||||
lines.append("")
|
||||
lines.append("features:")
|
||||
|
||||
toolsets = config.get("toolsets", ["hermes-cli"])
|
||||
lines.append(f" toolsets: {', '.join(toolsets) if toolsets else '(default)'}")
|
||||
lines.append(f" mcp_servers: {_count_mcp_servers(config)}")
|
||||
lines.append(f" memory_provider: {_memory_provider(config)}")
|
||||
lines.append(f" gateway: {_gateway_status()}")
|
||||
|
||||
platforms = _configured_platforms()
|
||||
lines.append(f" platforms: {', '.join(platforms) if platforms else 'none'}")
|
||||
lines.append(f" cron_jobs: {_cron_summary(hermes_home)}")
|
||||
lines.append(f" skills: {_count_skills(hermes_home)}")
|
||||
|
||||
# Config overrides (non-default values)
|
||||
overrides = _config_overrides(config)
|
||||
if overrides:
|
||||
lines.append("")
|
||||
lines.append("config_overrides:")
|
||||
for key, val in overrides.items():
|
||||
lines.append(f" {key}: {val}")
|
||||
|
||||
lines.append("--- end dump ---")
|
||||
|
||||
output = "\n".join(lines)
|
||||
print(output)
|
||||
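To make the `--show-keys` output concrete, the sketch below repeats the `_redact` helper from the new file and runs it on made-up values. The sample key strings are invented for illustration.

```python
def _redact(value: str) -> str:
    """Redact all but first 4 and last 4 chars (same logic as in dump.py above)."""
    if not value:
        return ""
    if len(value) < 12:
        return "***"
    return value[:4] + "..." + value[-4:]


long_key = _redact("sk-abcdefghijklmnop")   # 19 chars: prefix and suffix are kept
short_key = _redact("short")                # under 12 chars: fully masked
empty_key = _redact("")                     # unset value stays empty
boundary = _redact("abcdefghijkl")          # exactly 12 chars
```

Note that a value of exactly 12 characters still exposes eight of them, so the threshold is a lower bound on masking rather than a guarantee that most of the key is hidden.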
@@ -2643,6 +2643,12 @@ def cmd_doctor(args):
|
||||
run_doctor(args)
|
||||
|
||||
|
||||
def cmd_dump(args):
|
||||
"""Dump setup summary for support/debugging."""
|
||||
from hermes_cli.dump import run_dump
|
||||
run_dump(args)
|
||||
|
||||
|
||||
def cmd_config(args):
|
||||
"""Configuration management."""
|
||||
from hermes_cli.config import config_command
|
||||
@@ -4724,6 +4730,22 @@ For more help on a command:
|
||||
help="Attempt to fix issues automatically"
|
||||
)
|
||||
doctor_parser.set_defaults(func=cmd_doctor)
|
||||
|
||||
# =========================================================================
|
||||
# dump command
|
||||
# =========================================================================
|
||||
dump_parser = subparsers.add_parser(
|
||||
"dump",
|
||||
help="Dump setup summary for support/debugging",
|
||||
description="Output a compact, plain-text summary of your Hermes setup "
|
||||
"that can be copy-pasted into Discord/GitHub for support context"
|
||||
)
|
||||
dump_parser.add_argument(
|
||||
"--show-keys",
|
||||
action="store_true",
|
||||
help="Show redacted API key prefixes (first/last 4 chars) instead of just set/not set"
|
||||
)
|
||||
dump_parser.set_defaults(func=cmd_dump)
|
||||
|
||||
# =========================================================================
|
||||
# config command
|
||||
|
||||
@@ -733,6 +733,7 @@ def list_authenticated_providers(
|
||||
fetch_models_dev,
|
||||
get_provider_info as _mdev_pinfo,
|
||||
)
|
||||
from hermes_cli.auth import PROVIDER_REGISTRY
|
||||
from hermes_cli.models import OPENROUTER_MODELS, _PROVIDER_MODELS
|
||||
|
||||
results: List[dict] = []
|
||||
@@ -753,9 +754,16 @@ def list_authenticated_providers(
|
||||
if not isinstance(pdata, dict):
|
||||
continue
|
||||
|
||||
env_vars = pdata.get("env", [])
|
||||
if not isinstance(env_vars, list):
|
||||
continue
|
||||
# Prefer auth.py PROVIDER_REGISTRY for env var names — it's our
|
||||
# source of truth. models.dev can have wrong mappings (e.g.
|
||||
# minimax-cn → MINIMAX_API_KEY instead of MINIMAX_CN_API_KEY).
|
||||
pconfig = PROVIDER_REGISTRY.get(hermes_id)
|
||||
if pconfig and pconfig.api_key_env_vars:
|
||||
env_vars = list(pconfig.api_key_env_vars)
|
||||
else:
|
||||
env_vars = pdata.get("env", [])
|
||||
if not isinstance(env_vars, list):
|
||||
continue
|
||||
|
||||
# Check if any env var is set
|
||||
has_creds = any(os.environ.get(ev) for ev in env_vars)
|
||||
|
||||
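The lookup order introduced in this hunk (local registry first, models.dev data as fallback) can be sketched as a small pure function. `ProviderConfig` here is a reduced, hypothetical stand-in for the real class, and `resolve_env_vars` is an illustrative name; returning `None` plays the role of the `continue` in the original loop.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProviderConfig:
    """Reduced stand-in: only the field this sketch needs."""
    api_key_env_vars: tuple = ()


def resolve_env_vars(hermes_id: str, registry: dict, pdata: dict) -> Optional[list]:
    """Prefer the local registry's env var names; fall back to models.dev data."""
    pconfig = registry.get(hermes_id)
    if pconfig and pconfig.api_key_env_vars:
        return list(pconfig.api_key_env_vars)
    env_vars = pdata.get("env", [])
    if not isinstance(env_vars, list):
        return None  # malformed upstream entry: caller should skip this provider
    return env_vars


registry = {"minimax-cn": ProviderConfig(("MINIMAX_CN_API_KEY",))}
upstream = {"env": ["MINIMAX_API_KEY"]}   # the wrong mapping cited in the comment

from_registry = resolve_env_vars("minimax-cn", registry, upstream)
from_upstream = resolve_env_vars("other", registry, upstream)
malformed = resolve_env_vars("other", registry, {"env": "MINIMAX_API_KEY"})
```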
@@ -102,7 +102,7 @@ _RESERVED_NAMES = frozenset({
 # Hermes subcommands that cannot be used as profile names/aliases
 _HERMES_SUBCOMMANDS = frozenset({
     "chat", "model", "gateway", "setup", "whatsapp", "login", "logout",
-    "status", "cron", "doctor", "config", "pairing", "skills", "tools",
+    "status", "cron", "doctor", "dump", "config", "pairing", "skills", "tools",
     "mcp", "sessions", "insights", "version", "update", "uninstall",
     "profile", "plugins", "honcho", "acp",
 })
@@ -1007,7 +1007,7 @@ _hermes_completion() {

     # Top-level subcommands
     if [[ "$COMP_CWORD" == 1 ]]; then
-        local commands="chat model gateway setup status cron doctor config skills tools mcp sessions profile update version"
+        local commands="chat model gateway setup status cron doctor dump config skills tools mcp sessions profile update version"
         COMPREPLY=($(compgen -W "$commands" -- "$cur"))
     fi
 }
@@ -1032,7 +1032,7 @@ _hermes() {
     _arguments \\
         '-p[Profile name]:profile:($profiles)' \\
         '--profile[Profile name]:profile:($profiles)' \\
-        '1:command:(chat model gateway setup status cron doctor config skills tools mcp sessions profile update version)' \\
+        '1:command:(chat model gateway setup status cron doctor dump config skills tools mcp sessions profile update version)' \\
         '*::arg:->args'

     case $words[1] in
|
||||
+171
-23
@@ -2572,9 +2572,120 @@ _OPENCLAW_SCRIPT = (
|
||||
)
|
||||
|
||||
|
||||
def _load_openclaw_migration_module():
|
||||
"""Load the openclaw_to_hermes migration script as a module.
|
||||
|
||||
Returns the loaded module, or None if the script can't be loaded.
|
||||
"""
|
||||
if not _OPENCLAW_SCRIPT.exists():
|
||||
return None
|
||||
|
||||
spec = importlib.util.spec_from_file_location(
|
||||
"openclaw_to_hermes", _OPENCLAW_SCRIPT
|
||||
)
|
||||
if spec is None or spec.loader is None:
|
||||
return None
|
||||
|
||||
mod = importlib.util.module_from_spec(spec)
|
||||
# Register in sys.modules so @dataclass can resolve the module
|
||||
# (Python 3.11+ requires this for dynamically loaded modules)
|
||||
import sys as _sys
|
||||
_sys.modules[spec.name] = mod
|
||||
try:
|
||||
spec.loader.exec_module(mod)
|
||||
except Exception:
|
||||
_sys.modules.pop(spec.name, None)
|
||||
raise
|
||||
return mod
|
||||
|
||||
|
||||
# Item kinds that represent high-impact changes warranting explicit warnings.
|
||||
# Gateway tokens/channels can hijack messaging platforms from the old agent.
|
||||
# Config values may have different semantics between OpenClaw and Hermes.
|
||||
# Instruction/context files (.md) can contain incompatible setup procedures.
|
||||
_HIGH_IMPACT_KIND_KEYWORDS = {
|
||||
"gateway": "⚠ Gateway/messaging — this will configure Hermes to use your OpenClaw messaging channels",
|
||||
"telegram": "⚠ Telegram — this will point Hermes at your OpenClaw Telegram bot",
|
||||
"slack": "⚠ Slack — this will point Hermes at your OpenClaw Slack workspace",
|
||||
"discord": "⚠ Discord — this will point Hermes at your OpenClaw Discord bot",
|
||||
"whatsapp": "⚠ WhatsApp — this will point Hermes at your OpenClaw WhatsApp connection",
|
||||
"config": "⚠ Config values — OpenClaw settings may not map 1:1 to Hermes equivalents",
|
||||
"soul": "⚠ Instruction file — may contain OpenClaw-specific setup/restart procedures",
|
||||
"memory": "⚠ Memory/context file — may reference OpenClaw-specific infrastructure",
|
||||
"context": "⚠ Context file — may contain OpenClaw-specific instructions",
|
||||
}
|
||||
|
||||
|
||||
def _print_migration_preview(report: dict):
|
||||
"""Print a detailed dry-run preview of what migration would do.
|
||||
|
||||
Groups items by category and adds explicit warnings for high-impact
|
||||
changes like gateway token takeover and config value differences.
|
||||
"""
|
||||
items = report.get("items", [])
|
||||
if not items:
|
||||
print_info("Nothing to migrate.")
|
||||
return
|
||||
|
||||
migrated_items = [i for i in items if i.get("status") == "migrated"]
|
||||
conflict_items = [i for i in items if i.get("status") == "conflict"]
|
||||
skipped_items = [i for i in items if i.get("status") == "skipped"]
|
||||
|
||||
warnings_shown = set()
|
||||
|
||||
if migrated_items:
|
||||
print(color(" Would import:", Colors.GREEN))
|
||||
for item in migrated_items:
|
||||
kind = item.get("kind", "unknown")
|
||||
dest = item.get("destination", "")
|
||||
if dest:
|
||||
dest_short = str(dest).replace(str(Path.home()), "~")
|
||||
print(f" {kind:<22s} → {dest_short}")
|
||||
else:
|
||||
print(f" {kind}")
|
||||
|
||||
# Check for high-impact items and collect warnings
|
||||
kind_lower = kind.lower()
|
||||
dest_lower = str(dest).lower()
|
||||
for keyword, warning in _HIGH_IMPACT_KIND_KEYWORDS.items():
|
||||
if keyword in kind_lower or keyword in dest_lower:
|
||||
warnings_shown.add(warning)
|
||||
print()
|
||||
|
||||
if conflict_items:
|
||||
print(color(" Would overwrite (conflicts with existing Hermes config):", Colors.YELLOW))
|
||||
for item in conflict_items:
|
||||
kind = item.get("kind", "unknown")
|
||||
reason = item.get("reason", "already exists")
|
||||
print(f" {kind:<22s} {reason}")
|
||||
print()
|
||||
|
||||
if skipped_items:
|
||||
print(color(" Would skip:", Colors.DIM))
|
||||
for item in skipped_items:
|
||||
kind = item.get("kind", "unknown")
|
||||
reason = item.get("reason", "")
|
||||
print(f" {kind:<22s} {reason}")
|
||||
print()
|
||||
|
||||
# Print collected warnings
|
||||
if warnings_shown:
|
||||
print(color(" ── Warnings ──", Colors.YELLOW))
|
||||
for warning in sorted(warnings_shown):
|
||||
print(color(f" {warning}", Colors.YELLOW))
|
||||
print()
|
||||
print(color(" Note: OpenClaw config values may have different semantics in Hermes.", Colors.YELLOW))
|
||||
print(color(" For example, OpenClaw's tool_call_execution: \"auto\" ≠ Hermes's yolo mode.", Colors.YELLOW))
|
||||
print(color(" Instruction files (.md) from OpenClaw may contain incompatible procedures.", Colors.YELLOW))
|
||||
print()
|
||||
|
||||
|
||||
def _offer_openclaw_migration(hermes_home: Path) -> bool:
|
||||
"""Detect ~/.openclaw and offer to migrate during first-time setup.
|
||||
|
||||
Runs a dry-run first to show the user exactly what would be imported,
|
||||
overwritten, or taken over. Only executes after explicit confirmation.
|
||||
|
||||
Returns True if migration ran successfully, False otherwise.
|
||||
"""
|
||||
openclaw_dir = Path.home() / ".openclaw"
|
||||
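The loader extracted into `_load_openclaw_migration_module` uses the standard `importlib.util` recipe for importing a file by path. A self-contained sketch of the same steps is below; `load_module_from_path` and the temporary `demo_script.py` are illustrative names, not part of Hermes.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path


def load_module_from_path(name: str, path: Path):
    """Import a Python file by path; return None if it cannot be located."""
    if not path.exists():
        return None
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        return None
    mod = importlib.util.module_from_spec(spec)
    # Register before executing, as the original does, so code in the module
    # (the source comment cites @dataclass) can look itself up in sys.modules.
    sys.modules[spec.name] = mod
    try:
        spec.loader.exec_module(mod)
    except Exception:
        # Do not leave a half-initialised module registered.
        sys.modules.pop(spec.name, None)
        raise
    return mod


with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "demo_script.py"
    script.write_text(
        "from dataclasses import dataclass\n\n"
        "@dataclass\n"
        "class Item:\n"
        "    kind: str\n",
        encoding="utf-8",
    )
    mod = load_module_from_path("demo_script", script)
    item = mod.Item(kind="config")
    missing = load_module_from_path("nope", Path(tmp) / "missing.py")
```

Returning `None` for a missing file while re-raising execution errors lets the caller distinguish "script not shipped" from "script is broken", which is how the setup code above reports the two cases.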
@@ -2587,12 +2698,12 @@ def _offer_openclaw_migration(hermes_home: Path) -> bool:
|
||||
print()
|
||||
print_header("OpenClaw Installation Detected")
|
||||
print_info(f"Found OpenClaw data at {openclaw_dir}")
|
||||
print_info("Hermes can import your settings, memories, skills, and API keys.")
|
||||
print_info("Hermes can preview what would be imported before making any changes.")
|
||||
print()
|
||||
|
||||
if not prompt_yes_no("Would you like to import from OpenClaw?", default=True):
|
||||
if not prompt_yes_no("Would you like to see what can be imported?", default=True):
|
||||
print_info(
|
||||
"Skipping migration. You can run it later via the openclaw-migration skill."
|
||||
"Skipping migration. You can run it later with: hermes claw migrate --dry-run"
|
||||
)
|
||||
return False
|
||||
|
||||
@@ -2601,34 +2712,71 @@ def _offer_openclaw_migration(hermes_home: Path) -> bool:
|
||||
if not config_path.exists():
|
||||
save_config(load_config())
|
||||
|
||||
# Dynamically load the migration script
|
||||
# Load the migration module
|
||||
try:
|
||||
spec = importlib.util.spec_from_file_location(
|
||||
"openclaw_to_hermes", _OPENCLAW_SCRIPT
|
||||
)
|
||||
if spec is None or spec.loader is None:
|
||||
mod = _load_openclaw_migration_module()
|
||||
if mod is None:
|
||||
print_warning("Could not load migration script.")
|
||||
return False
|
||||
except Exception as e:
|
||||
print_warning(f"Could not load migration script: {e}")
|
||||
logger.debug("OpenClaw migration module load error", exc_info=True)
|
||||
return False
|
||||
|
||||
mod = importlib.util.module_from_spec(spec)
|
||||
# Register in sys.modules so @dataclass can resolve the module
|
||||
# (Python 3.11+ requires this for dynamically loaded modules)
|
||||
import sys as _sys
|
||||
_sys.modules[spec.name] = mod
|
||||
try:
|
||||
spec.loader.exec_module(mod)
|
||||
except Exception:
|
||||
_sys.modules.pop(spec.name, None)
|
||||
raise
|
||||
|
||||
# Run migration with the "full" preset, execute mode, no overwrite
|
||||
# ── Phase 1: Dry-run preview ──
|
||||
try:
|
||||
selected = mod.resolve_selected_options(None, None, preset="full")
|
||||
dry_migrator = mod.Migrator(
|
||||
source_root=openclaw_dir.resolve(),
|
||||
target_root=hermes_home.resolve(),
|
||||
execute=False, # dry-run — no files modified
|
||||
workspace_target=None,
|
||||
overwrite=True, # show everything including conflicts
|
||||
migrate_secrets=True,
|
||||
output_dir=None,
|
||||
selected_options=selected,
|
||||
preset_name="full",
|
||||
)
|
||||
preview_report = dry_migrator.migrate()
|
||||
except Exception as e:
|
||||
print_warning(f"Migration preview failed: {e}")
|
||||
logger.debug("OpenClaw migration preview error", exc_info=True)
|
||||
return False
|
||||
|
||||
# Display the full preview
|
||||
preview_summary = preview_report.get("summary", {})
|
||||
preview_count = preview_summary.get("migrated", 0)
|
||||
|
||||
if preview_count == 0:
|
||||
print()
|
||||
print_info("Nothing to import from OpenClaw.")
|
||||
return False
|
||||
|
||||
print()
|
||||
print_header(f"Migration Preview — {preview_count} item(s) would be imported")
|
||||
print_info("No changes have been made yet. Review the list below:")
|
||||
print()
|
||||
_print_migration_preview(preview_report)
|
||||
|
||||
# ── Phase 2: Confirm and execute ──
|
||||
if not prompt_yes_no("Proceed with migration?", default=False):
|
||||
print_info(
|
||||
"Migration cancelled. You can run it later with: hermes claw migrate"
|
||||
)
|
||||
print_info(
|
||||
"Use --dry-run to preview again, or --preset minimal for a lighter import."
|
||||
)
|
||||
return False
|
||||
|
||||
# Execute the migration — overwrite=False so existing Hermes configs are
|
||||
# preserved. The user saw the preview; conflicts are skipped by default.
|
||||
try:
|
||||
migrator = mod.Migrator(
|
||||
source_root=openclaw_dir.resolve(),
|
||||
target_root=hermes_home.resolve(),
|
||||
execute=True,
|
||||
workspace_target=None,
|
||||
- overwrite=True,
|
||||
+ overwrite=False, # preserve existing Hermes config
|
||||
migrate_secrets=True,
|
||||
output_dir=None,
|
||||
selected_options=selected,
|
||||
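The preview-then-confirm flow in this hunk can be sketched without the real migration script. `DemoMigrator` and `run_two_phase` are hypothetical stand-ins; only the ordering (dry run with `overwrite=True`, confirmation, real run with `overwrite=False`) mirrors the change.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class DemoMigrator:
    """Hypothetical stand-in for the real Migrator; it only copies dict keys."""
    source: Dict[str, str]
    target: Dict[str, str]
    execute: bool = False
    overwrite: bool = False

    def migrate(self) -> dict:
        migrated = conflicts = 0
        for key, value in self.source.items():
            if key in self.target and not self.overwrite:
                conflicts += 1
                continue
            migrated += 1
            if self.execute:
                self.target[key] = value
        return {"summary": {"migrated": migrated, "conflict": conflicts}}


def run_two_phase(source, target, confirm: Callable[[str], bool]) -> bool:
    # Phase 1: dry run with overwrite=True so conflicting items are listed too.
    preview = DemoMigrator(source, target, execute=False, overwrite=True).migrate()
    if preview["summary"]["migrated"] == 0:
        return False
    # Phase 2: touch the target only after explicit confirmation.
    if not confirm("Proceed with migration?"):
        return False
    DemoMigrator(source, target, execute=True, overwrite=False).migrate()
    return True


target = {"model": "existing"}
ran = run_two_phase({"model": "a", "skills": "b"}, target, confirm=lambda _q: True)

untouched = {"model": "existing"}
declined = run_two_phase({"model": "a"}, untouched, confirm=lambda _q: False)
```

With `overwrite=False` in the second phase, the existing `model` entry survives and only `skills` is added.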
@@ -2640,7 +2788,7 @@ def _offer_openclaw_migration(hermes_home: Path) -> bool:
|
||||
logger.debug("OpenClaw migration error", exc_info=True)
|
||||
return False
|
||||
|
||||
- # Print summary
|
||||
+ # Print final summary
|
||||
summary = report.get("summary", {})
|
||||
migrated = summary.get("migrated", 0)
|
||||
skipped = summary.get("skipped", 0)
|
||||
@@ -2651,7 +2799,7 @@ def _offer_openclaw_migration(hermes_home: Path) -> bool:
|
||||
if migrated:
|
||||
print_success(f"Imported {migrated} item(s) from OpenClaw.")
|
||||
if conflicts:
|
||||
- print_info(f"Skipped {conflicts} item(s) that already exist in Hermes.")
|
||||
+ print_info(f"Skipped {conflicts} item(s) that already exist in Hermes (use hermes claw migrate --overwrite to force).")
|
||||
if skipped:
|
||||
print_info(f"Skipped {skipped} item(s) (not found or unchanged).")
|
||||
if errors:
|
||||
|
||||
@@ -569,7 +569,7 @@
|
||||
|
||||
# ── Activation: link config + auth + documents ────────────────────
|
||||
{
|
||||
- system.activationScripts."hermes-agent-setup" = lib.stringAfter [ "users" "setupSecrets" ] ''
|
||||
+ system.activationScripts."hermes-agent-setup" = lib.stringAfter ([ "users" ] ++ lib.optional (config.system.activationScripts ? setupSecrets) "setupSecrets") ''
|
||||
# Ensure directories exist (activation runs before tmpfiles)
|
||||
mkdir -p ${cfg.stateDir}/.hermes
|
||||
mkdir -p ${cfg.stateDir}/home
|
||||
|
||||
+1
-1
@@ -14,7 +14,7 @@
|
||||
};
|
||||
|
||||
runtimeDeps = with pkgs; [
|
||||
- nodejs_20 ripgrep git openssh ffmpeg
|
||||
+ nodejs_20 ripgrep git openssh ffmpeg tirith
|
||||
];
|
||||
|
||||
runtimePath = pkgs.lib.makeBinPath runtimeDeps;
|
||||
|
||||
@@ -1803,30 +1803,34 @@ class Migrator:
|
||||
def migrate_cron_jobs(self, config: Optional[Dict[str, Any]] = None) -> None:
|
||||
config = config or self.load_openclaw_config()
|
||||
cron = config.get("cron") or {}
|
||||
if not cron:
|
||||
self.record("cron-jobs", None, None, "skipped", "No cron configuration found")
|
||||
return
|
||||
|
||||
# Archive the full cron config
|
||||
if self.archive_dir and self.execute:
|
||||
self.archive_dir.mkdir(parents=True, exist_ok=True)
|
||||
dest = self.archive_dir / "cron-config.json"
|
||||
dest.write_text(json.dumps(cron, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
|
||||
self.record("cron-jobs", "openclaw.json cron.*", str(dest), "archived",
|
||||
"Cron config archived. Use 'hermes cron' to recreate jobs manually.")
|
||||
else:
|
||||
self.record("cron-jobs", "openclaw.json cron.*", "archive/cron-config.json",
|
||||
"archived", "Would archive cron config")
|
||||
|
||||
# Also check for cron store files
|
||||
cron_store = self.source_root / "cron"
|
||||
found_any = False
|
||||
|
||||
# Archive the full cron config when present
|
||||
if cron:
|
||||
found_any = True
|
||||
if self.archive_dir and self.execute:
|
||||
self.archive_dir.mkdir(parents=True, exist_ok=True)
|
||||
dest = self.archive_dir / "cron-config.json"
|
||||
dest.write_text(json.dumps(cron, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
|
||||
self.record("cron-jobs", "openclaw.json cron.*", str(dest), "archived",
|
||||
"Cron config archived. Use 'hermes cron' to recreate jobs manually.")
|
||||
else:
|
||||
self.record("cron-jobs", "openclaw.json cron.*", "archive/cron-config.json",
|
||||
"archived", "Would archive cron config")
|
||||
|
||||
# Also check for cron store files even when config.cron is missing
|
||||
if cron_store.is_dir() and self.archive_dir:
|
||||
found_any = True
|
||||
dest_cron = self.archive_dir / "cron-store"
|
||||
if self.execute:
|
||||
shutil.copytree(cron_store, dest_cron, dirs_exist_ok=True)
|
||||
self.record("cron-jobs", str(cron_store), str(dest_cron), "archived",
|
||||
"Cron job store archived")
|
||||
|
||||
if not found_any:
|
||||
self.record("cron-jobs", None, None, "skipped", "No cron configuration found")
|
||||
|
||||
# ── Hooks ─────────────────────────────────────────────────
|
||||
def migrate_hooks_config(self, config: Optional[Dict[str, Any]] = None) -> None:
|
||||
config = config or self.load_openclaw_config()
|
||||
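The reworked `migrate_cron_jobs` checks two independent sources and records "skipped" only when neither exists. Below is a rough sketch of that control flow, assuming a simplified record format of `(status, destination)` tuples; `archive_cron` is a hypothetical function, not the method from the diff.

```python
import json
import shutil
import tempfile
from pathlib import Path
from typing import List, Optional, Tuple


def archive_cron(config: dict, source_root: Path,
                 archive_dir: Path) -> List[Tuple[str, Optional[str]]]:
    records: List[Tuple[str, Optional[str]]] = []
    found_any = False

    cron = config.get("cron") or {}
    if cron:
        found_any = True
        archive_dir.mkdir(parents=True, exist_ok=True)
        dest = archive_dir / "cron-config.json"
        dest.write_text(json.dumps(cron, indent=2) + "\n", encoding="utf-8")
        records.append(("archived", str(dest)))

    # The store directory is checked even when the config has no cron key.
    cron_store = source_root / "cron"
    if cron_store.is_dir():
        found_any = True
        dest_store = archive_dir / "cron-store"
        shutil.copytree(cron_store, dest_store, dirs_exist_ok=True)
        records.append(("archived", str(dest_store)))

    if not found_any:
        records.append(("skipped", None))
    return records


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "openclaw"
    (root / "cron").mkdir(parents=True)
    (root / "cron" / "jobs.json").write_text("[]", encoding="utf-8")
    # No "cron" key in the config, but the store directory exists.
    store_only = archive_cron({}, root, Path(tmp) / "archive")
    copied = (Path(tmp) / "archive" / "cron-store" / "jobs.json").is_file()
    # Neither source present: a single "skipped" record.
    nothing = archive_cron({}, Path(tmp) / "empty", Path(tmp) / "archive2")
```

Before this change, the early return on an empty `cron` config meant a store directory on its own was never archived.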
@@ -2454,6 +2458,15 @@ class Migrator:
|
||||
notes.append(f"- **{item.kind}**: {item.reason}")
|
||||
notes.append("")
|
||||
|
||||
has_cron_config_archive = any(
|
||||
i.kind == "cron-jobs" and i.status == "archived" and i.destination and i.destination.endswith("cron-config.json")
|
||||
for i in self.items
|
||||
)
|
||||
has_cron_store_archive = any(
|
||||
i.kind == "cron-jobs" and i.status == "archived" and i.destination and i.destination.endswith("cron-store")
|
||||
for i in self.items
|
||||
)
|
||||
|
||||
notes.extend([
|
||||
"## IMPORTANT: Archive the OpenClaw Directory",
|
||||
"",
|
||||
@@ -2475,7 +2488,14 @@ class Migrator:
|
||||
"- Run `hermes claw cleanup` to archive the OpenClaw directory (prevents state confusion)",
|
||||
"- Run `hermes setup` to configure any remaining settings",
|
||||
"- Run `hermes mcp list` to verify MCP servers were imported correctly",
|
||||
"- Run `hermes cron` to recreate scheduled tasks (see archive/cron-config.json)",
|
||||
])
|
||||
|
||||
if has_cron_config_archive:
|
||||
notes.append("- Run `hermes cron` to recreate scheduled tasks (see archive/cron-config.json)")
|
||||
elif has_cron_store_archive:
|
||||
notes.append("- Run `hermes cron` to recreate scheduled tasks (see archived cron-store)")
|
||||
|
||||
notes.extend([
|
||||
"- Run `hermes gateway install` if you need the gateway service",
|
||||
"- Review `~/.hermes/config.yaml` for any adjustments",
|
||||
"",
|
||||
|
||||
+218
-125
@@ -77,6 +77,7 @@ from hermes_constants import OPENROUTER_BASE_URL
|
||||
# Agent internals extracted to agent/ package for modularity
|
||||
from agent.memory_manager import build_memory_context_block
|
||||
from agent.retry_utils import jittered_backoff
|
||||
from agent.error_classifier import classify_api_error, FailoverReason
|
||||
from agent.prompt_builder import (
|
||||
DEFAULT_AGENT_IDENTITY, PLATFORM_HINTS,
|
||||
MEMORY_GUIDANCE, SESSION_SEARCH_GUIDANCE, SKILLS_GUIDANCE,
|
||||
@@ -86,6 +87,7 @@ from agent.model_metadata import (
|
||||
fetch_model_metadata,
|
||||
estimate_tokens_rough, estimate_messages_tokens_rough, estimate_request_tokens_rough,
|
||||
get_next_probe_tier, parse_context_limit_from_error,
|
||||
parse_available_output_tokens_from_error,
|
||||
save_context_length, is_local_endpoint,
|
||||
query_ollama_num_ctx,
|
||||
)
|
||||
@@ -692,6 +694,10 @@ class AIAgent:
|
||||
self._current_tool: str | None = None
|
||||
self._api_call_count: int = 0
|
||||
|
||||
# Rate limit tracking — updated from x-ratelimit-* response headers
|
||||
# after each API call. Accessed by /usage slash command.
|
||||
self._rate_limit_state: Optional["RateLimitState"] = None
|
||||
|
||||
# Centralized logging — agent.log (INFO+) and errors.log (WARNING+)
|
||||
# both live under ~/.hermes/logs/. Idempotent, so gateway mode
|
||||
# (which creates a new AIAgent per message) won't duplicate handlers.
|
||||
@@ -2545,6 +2551,29 @@ class AIAgent:
|
||||
self._last_activity_ts = time.time()
|
||||
self._last_activity_desc = desc
|
||||
|
||||
def _capture_rate_limits(self, http_response: Any) -> None:
|
||||
"""Parse x-ratelimit-* headers from an HTTP response and cache the state.
|
||||
|
||||
Called after each streaming API call. The httpx Response object is
|
||||
available on the OpenAI SDK Stream via ``stream.response``.
|
||||
"""
|
||||
if http_response is None:
|
||||
return
|
||||
headers = getattr(http_response, "headers", None)
|
||||
if not headers:
|
||||
return
|
||||
try:
|
||||
from agent.rate_limit_tracker import parse_rate_limit_headers
|
||||
state = parse_rate_limit_headers(headers, provider=self.provider)
|
||||
if state is not None:
|
||||
self._rate_limit_state = state
|
||||
except Exception:
|
||||
pass # Never let header parsing break the agent loop
|
||||
|
||||
def get_rate_limit_state(self):
|
||||
"""Return the last captured RateLimitState, or None."""
|
||||
return self._rate_limit_state
|
||||
|
||||
def get_activity_summary(self) -> dict:
|
||||
"""Return a snapshot of the agent's current activity for diagnostics.
|
||||
|
||||
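The diff does not show `agent.rate_limit_tracker`, so the following is only a sketch of what parsing `x-ratelimit-*` headers might look like. `RateLimitSnapshot` and `parse_headers` are hypothetical names, and the header names assume OpenAI-style conventions; other providers may differ.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitSnapshot:
    """Hypothetical container; the real RateLimitState is not shown in the diff."""
    limit_requests: Optional[int] = None
    remaining_requests: Optional[int] = None
    limit_tokens: Optional[int] = None
    remaining_tokens: Optional[int] = None


def _to_int(value: Optional[str]) -> Optional[int]:
    try:
        return int(value) if value is not None else None
    except (TypeError, ValueError):
        return None


def parse_headers(headers: Mapping[str, str]) -> Optional[RateLimitSnapshot]:
    # HTTP header names are case-insensitive, so normalise the keys first.
    lowered = {k.lower(): v for k, v in headers.items()}
    snapshot = RateLimitSnapshot(
        limit_requests=_to_int(lowered.get("x-ratelimit-limit-requests")),
        remaining_requests=_to_int(lowered.get("x-ratelimit-remaining-requests")),
        limit_tokens=_to_int(lowered.get("x-ratelimit-limit-tokens")),
        remaining_tokens=_to_int(lowered.get("x-ratelimit-remaining-tokens")),
    )
    # Return None when no usable rate-limit header was present.
    if all(v is None for v in vars(snapshot).values()):
        return None
    return snapshot


state = parse_headers({
    "X-RateLimit-Limit-Requests": "60",
    "x-ratelimit-remaining-requests": "59",
})
```

Returning `None` for responses without usable headers lets the caller keep the previously cached state, which matches the `if state is not None` guard in `_capture_rate_limits`.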
@@ -4399,6 +4428,11 @@ class AIAgent:
|
||||
self._touch_activity("waiting for provider response (streaming)")
|
||||
stream = request_client_holder["client"].chat.completions.create(**stream_kwargs)
|
||||
|
||||
# Capture rate limit headers from the initial HTTP response.
|
||||
# The OpenAI SDK Stream object exposes the underlying httpx
|
||||
# response via .response before any chunks are consumed.
|
||||
self._capture_rate_limits(getattr(stream, "response", None))
|
||||
|
||||
content_parts: list = []
|
||||
tool_calls_acc: dict = {}
|
||||
tool_gen_notified: set = set()
|
||||
@@ -4935,9 +4969,21 @@ class AIAgent:
|
||||
# Swap OpenAI client and config in-place
|
||||
self.api_key = fb_client.api_key
|
||||
self.client = fb_client
|
||||
# Preserve provider-specific headers that
|
||||
# resolve_provider_client() may have baked into
|
||||
# fb_client via the default_headers kwarg. The OpenAI
|
||||
# SDK stores these in _custom_headers. Without this,
|
||||
# subsequent request-client rebuilds (via
|
||||
# _create_request_openai_client) drop the headers,
|
||||
# causing 403s from providers like Kimi Coding that
|
||||
# require a User-Agent sentinel.
|
||||
fb_headers = getattr(fb_client, "_custom_headers", None)
|
||||
if not fb_headers:
|
||||
fb_headers = getattr(fb_client, "default_headers", None)
|
||||
self._client_kwargs = {
|
||||
"api_key": fb_client.api_key,
|
||||
"base_url": fb_base_url,
|
||||
**({"default_headers": dict(fb_headers)} if fb_headers else {}),
|
||||
}
|
||||
|
||||
# Re-evaluate prompt caching for the new provider/model
|
||||
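The header-preservation logic can be exercised with a stub object. `FakeClient` and `build_client_kwargs` are hypothetical; the attribute lookups (`_custom_headers`, then `default_headers`) are the ones the hunk itself uses.

```python
from typing import Any, Dict, Optional


class FakeClient:
    """Hypothetical stand-in for an SDK client object."""

    def __init__(self, api_key: str, custom_headers: Optional[dict] = None):
        self.api_key = api_key
        self._custom_headers = custom_headers


def build_client_kwargs(client: Any, base_url: str) -> Dict[str, Any]:
    headers = getattr(client, "_custom_headers", None)
    if not headers:
        headers = getattr(client, "default_headers", None)
    return {
        "api_key": client.api_key,
        "base_url": base_url,
        # Include the key only when there is something to carry over.
        **({"default_headers": dict(headers)} if headers else {}),
    }


sentinel = {"User-Agent": "example-agent/1.0"}
with_headers = build_client_kwargs(FakeClient("k", sentinel), "https://example.invalid/v1")
without_headers = build_client_kwargs(FakeClient("k"), "https://example.invalid/v1")
```

Copying with `dict(headers)` keeps later mutations of the stored kwargs from leaking back into the client object.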
@@ -5352,15 +5398,22 @@ class AIAgent:
|
||||
if self.api_mode == "anthropic_messages":
|
||||
from agent.anthropic_adapter import build_anthropic_kwargs
|
||||
anthropic_messages = self._prepare_anthropic_messages_for_api(api_messages)
|
||||
# Pass context_length so the adapter can clamp max_tokens if the
|
||||
# user configured a smaller context window than the model's output limit.
|
||||
# Pass context_length (total input+output window) so the adapter can
|
||||
# clamp max_tokens (output cap) when the user configured a smaller
|
||||
# context window than the model's native output limit.
|
||||
ctx_len = getattr(self, "context_compressor", None)
|
||||
ctx_len = ctx_len.context_length if ctx_len else None
|
||||
# _ephemeral_max_output_tokens is set for one call when the API
|
||||
# returns "max_tokens too large given prompt" — it caps output to
|
||||
# the available window space without touching context_length.
|
||||
ephemeral_out = getattr(self, "_ephemeral_max_output_tokens", None)
|
||||
if ephemeral_out is not None:
|
||||
self._ephemeral_max_output_tokens = None # consume immediately
|
||||
return build_anthropic_kwargs(
|
||||
model=self.model,
|
||||
messages=anthropic_messages,
|
||||
tools=self.tools,
|
||||
- max_tokens=self.max_tokens,
|
||||
+ max_tokens=ephemeral_out if ephemeral_out is not None else self.max_tokens,
|
||||
reasoning_config=self.reasoning_config,
|
||||
is_oauth=self._is_anthropic_oauth,
|
||||
preserve_dots=self._anthropic_preserve_dots(),
|
||||
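The one-shot override introduced here is a read-and-clear pattern. A minimal sketch, using a hypothetical `OutputCap` class instead of `AIAgent`:

```python
from typing import Optional


class OutputCap:
    """Hypothetical holder for a default cap plus a one-call override."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self._ephemeral_max_output_tokens: Optional[int] = None

    def next_max_tokens(self) -> int:
        ephemeral = self._ephemeral_max_output_tokens
        if ephemeral is not None:
            # Consume immediately so the override applies to one call only.
            self._ephemeral_max_output_tokens = None
        return ephemeral if ephemeral is not None else self.max_tokens


cap = OutputCap(max_tokens=8192)
cap._ephemeral_max_output_tokens = 1000
first = cap.next_max_tokens()
second = cap.next_max_tokens()
```

Clearing the value at read time means a retry that fails for an unrelated reason falls back to the configured cap rather than silently keeping the reduced one.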
@@ -7249,6 +7302,7 @@ class AIAgent:
|
||||
length_continue_retries = 0
|
||||
truncated_response_prefix = ""
|
||||
compression_attempts = 0
|
||||
_turn_exit_reason = "unknown" # Diagnostic: why the loop ended
|
||||
|
||||
# Clear any stale interrupt state at start
|
||||
self.clear_interrupt()
|
||||
@@ -7273,6 +7327,7 @@ class AIAgent:
|
||||
# Check for interrupt request (e.g., user sent new message)
|
||||
if self._interrupt_requested:
|
||||
interrupted = True
|
||||
_turn_exit_reason = "interrupted_by_user"
|
||||
if not self.quiet_mode:
|
||||
self._safe_print("\n⚡ Breaking out of tool loop due to interrupt...")
|
||||
break
|
||||
@@ -7281,6 +7336,7 @@ class AIAgent:
|
||||
self._api_call_count = api_call_count
|
||||
self._touch_activity(f"starting API call #{api_call_count}")
|
||||
if not self.iteration_budget.consume():
|
||||
_turn_exit_reason = "budget_exhausted"
|
||||
if not self.quiet_mode:
|
||||
self._safe_print(f"\n⚠️ Iteration budget exhausted ({self.iteration_budget.used}/{self.iteration_budget.max_total} iterations used)")
|
||||
break
|
||||
@@ -7985,6 +8041,25 @@ class AIAgent:
|
||||
|
||||
status_code = getattr(api_error, "status_code", None)
|
||||
error_context = self._extract_api_error_context(api_error)
|
||||
|
||||
# ── Classify the error for structured recovery decisions ──
|
||||
_compressor = getattr(self, "context_compressor", None)
|
||||
_ctx_len = getattr(_compressor, "context_length", 200000) if _compressor else 200000
|
||||
classified = classify_api_error(
|
||||
api_error,
|
||||
provider=getattr(self, "provider", "") or "",
|
||||
model=getattr(self, "model", "") or "",
|
||||
approx_tokens=approx_tokens,
|
||||
context_length=_ctx_len,
|
||||
num_messages=len(api_messages) if api_messages else 0,
|
||||
)
|
||||
logger.debug(
|
||||
"Error classified: reason=%s status=%s retryable=%s compress=%s rotate=%s fallback=%s",
|
||||
classified.reason.value, classified.status_code,
|
||||
classified.retryable, classified.should_compress,
|
||||
classified.should_rotate_credential, classified.should_fallback,
|
||||
)
|
||||
|
||||
recovered_with_pool, has_retried_429 = self._recover_with_credential_pool(
|
||||
status_code=status_code,
|
||||
has_retried_429=has_retried_429,
|
||||
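`agent/error_classifier.py` itself is not part of this section, so the sketch below only illustrates the general shape of a structured classifier. `Reason`, `Classified` and `classify` are hypothetical; the generic-400 thresholds (message shorter than 30 characters, more than 40% of the context window or more than 80 messages) are taken from the inline code this PR removes.

```python
import enum
from dataclasses import dataclass
from typing import Optional


class Reason(enum.Enum):
    rate_limit = "rate_limit"
    payload_too_large = "payload_too_large"
    context_overflow = "context_overflow"
    auth = "auth"
    unknown = "unknown"


@dataclass
class Classified:
    reason: Reason
    status_code: Optional[int]
    retryable: bool
    should_compress: bool = False


def classify(error: Exception, approx_tokens: int = 0,
             context_length: int = 200000, num_messages: int = 0) -> Classified:
    status = getattr(error, "status_code", None)
    msg = str(error).lower()
    if status == 413:
        return Classified(Reason.payload_too_large, status, True, should_compress=True)
    if status == 429:
        return Classified(Reason.rate_limit, status, True)
    if any(p in msg for p in ("context length", "prompt is too long", "too many tokens")):
        return Classified(Reason.context_overflow, status, True, should_compress=True)
    # Generic 400 with a very short message on a large session: probable overflow.
    large = approx_tokens > context_length * 0.4 or num_messages > 80
    if status == 400 and len(msg.strip()) < 30 and large:
        return Classified(Reason.context_overflow, status, True, should_compress=True)
    if status in (401, 403):
        return Classified(Reason.auth, status, False)
    return Classified(Reason.unknown, status, True)


class DemoError(Exception):
    """Hypothetical error carrying an HTTP status code."""

    def __init__(self, message: str, status_code: Optional[int] = None):
        super().__init__(message)
        self.status_code = status_code


overflow = classify(DemoError("Error", 400), approx_tokens=150000)
```

Returning one object with the reason and recovery flags lets each call site test a single field where the old code repeated string matches.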
@@ -8047,27 +8122,24 @@ class AIAgent:
|
||||
# from all messages so the next retry sends no thinking
|
||||
# blocks at all. One-shot — don't retry infinitely.
|
||||
if (
|
||||
self.api_mode == "anthropic_messages"
|
||||
and status_code == 400
|
||||
classified.reason == FailoverReason.thinking_signature
|
||||
and not thinking_sig_retry_attempted
|
||||
):
|
||||
_err_msg_lower = str(api_error).lower()
|
||||
if "signature" in _err_msg_lower and "thinking" in _err_msg_lower:
|
||||
thinking_sig_retry_attempted = True
|
||||
for _m in messages:
|
||||
if isinstance(_m, dict):
|
||||
_m.pop("reasoning_details", None)
|
||||
self._vprint(
|
||||
f"{self.log_prefix}⚠️ Thinking block signature invalid — "
|
||||
f"stripped all thinking blocks, retrying...",
|
||||
force=True,
|
||||
)
|
||||
logging.warning(
|
||||
"%sThinking block signature recovery: stripped "
|
||||
"reasoning_details from %d messages",
|
||||
self.log_prefix, len(messages),
|
||||
)
|
||||
continue
|
||||
thinking_sig_retry_attempted = True
|
||||
for _m in messages:
|
||||
if isinstance(_m, dict):
|
||||
_m.pop("reasoning_details", None)
|
||||
self._vprint(
|
||||
f"{self.log_prefix}⚠️ Thinking block signature invalid — "
|
||||
f"stripped all thinking blocks, retrying...",
|
||||
force=True,
|
||||
)
|
||||
logging.warning(
|
||||
"%sThinking block signature recovery: stripped "
|
||||
"reasoning_details from %d messages",
|
||||
self.log_prefix, len(messages),
|
||||
)
|
||||
continue
|
||||
|
||||
retry_count += 1
|
||||
elapsed_time = time.time() - api_start_time
|
||||
@@ -8124,14 +8196,7 @@ class AIAgent:
|
||||
# is NOT a transient rate limit — retrying or switching
|
||||
# credentials won't help. Reduce context to 200k (the
|
||||
# standard tier) and compress.
|
||||
# Only applies to Sonnet — Opus 1M is general access.
|
||||
_is_long_context_tier_error = (
|
||||
status_code == 429
|
||||
and "extra usage" in error_msg
|
||||
and "long context" in error_msg
|
||||
and "sonnet" in self.model.lower()
|
||||
)
|
||||
if _is_long_context_tier_error:
|
||||
if classified.reason == FailoverReason.long_context_tier:
|
||||
_reduced_ctx = 200000
|
||||
compressor = self.context_compressor
|
||||
old_ctx = compressor.context_length
|
||||
@@ -8176,13 +8241,9 @@ class AIAgent:
|
||||
# When a fallback model is configured, switch immediately instead
|
||||
# of burning through retries with exponential backoff -- the
|
||||
# primary provider won't recover within the retry window.
|
||||
is_rate_limited = (
|
||||
status_code == 429
|
||||
or "rate limit" in error_msg
|
||||
or "too many requests" in error_msg
|
||||
or "rate_limit" in error_msg
|
||||
or "usage limit" in error_msg
|
||||
or "quota" in error_msg
|
||||
is_rate_limited = classified.reason in (
|
||||
FailoverReason.rate_limit,
|
||||
FailoverReason.billing,
|
||||
)
|
||||
if is_rate_limited and self._fallback_index < len(self._fallback_chain):
|
||||
# Don't eagerly fallback if credential pool rotation may
|
||||
@@ -8198,10 +8259,7 @@ class AIAgent:
|
||||
continue
|
||||
|
||||
is_payload_too_large = (
|
||||
status_code == 413
|
||||
or 'request entity too large' in error_msg
|
||||
or 'payload too large' in error_msg
|
||||
or 'error code: 413' in error_msg
|
||||
classified.reason == FailoverReason.payload_too_large
|
||||
)
|
||||
|
||||
if is_payload_too_large:
|
||||
@@ -8245,69 +8303,59 @@ class AIAgent:
|
||||
}
|
||||
|
||||
# Check for context-length errors BEFORE generic 4xx handler.
|
||||
# Local backends (LM Studio, Ollama, llama.cpp) often return
|
||||
# HTTP 400 with messages like "Context size has been exceeded"
|
||||
# which must trigger compression, not an immediate abort.
|
||||
is_context_length_error = any(phrase in error_msg for phrase in [
|
||||
'context length', 'context size', 'maximum context',
|
||||
'token limit', 'too many tokens', 'reduce the length',
|
||||
'exceeds the limit', 'context window',
|
||||
'request entity too large', # OpenRouter/Nous 413 safety net
|
||||
'prompt is too long', # Anthropic: "prompt is too long: N tokens > M maximum"
|
||||
'prompt exceeds max length', # Z.AI / GLM: generic 400 overflow wording
|
||||
])
|
||||
|
||||
# Fallback heuristic: Anthropic sometimes returns a generic
|
||||
# 400 invalid_request_error with just "Error" as the message
|
||||
# when the context is too large. If the error message is very
|
||||
# short/generic AND the session is large, treat it as a
|
||||
# probable context-length error and attempt compression rather
|
||||
# than aborting. This prevents an infinite failure loop where
|
||||
# each failed message gets persisted, making the session even
|
||||
# larger. (#1630)
|
||||
if not is_context_length_error and status_code == 400:
|
||||
ctx_len = getattr(getattr(self, 'context_compressor', None), 'context_length', 200000)
|
||||
is_large_session = approx_tokens > ctx_len * 0.4 or len(api_messages) > 80
|
||||
is_generic_error = len(error_msg.strip()) < 30 # e.g. just "error"
|
||||
if is_large_session and is_generic_error:
|
||||
is_context_length_error = True
|
||||
self._vprint(
|
||||
f"{self.log_prefix}⚠️ Generic 400 with large session "
|
||||
f"(~{approx_tokens:,} tokens, {len(api_messages)} msgs) — "
|
||||
f"treating as probable context overflow.",
|
||||
force=True,
|
||||
)
|
||||
|
||||
# Server disconnects on large sessions are often caused by
|
||||
# the request exceeding the provider's context/payload limit
|
||||
# without a proper HTTP error response. Treat these as
|
||||
# context-length errors to trigger compression rather than
|
||||
# burning through retries that will all fail the same way.
|
||||
# This breaks the death spiral: disconnect → no token data
|
||||
# → no compression → bigger session → more disconnects.
|
||||
# (#2153)
|
||||
if not is_context_length_error and not status_code:
|
||||
_is_server_disconnect = (
|
||||
'server disconnected' in error_msg
|
||||
or 'peer closed connection' in error_msg
|
||||
or error_type in ('ReadError', 'RemoteProtocolError', 'ServerDisconnectedError')
|
||||
)
|
||||
if _is_server_disconnect:
|
||||
ctx_len = getattr(getattr(self, 'context_compressor', None), 'context_length', 200000)
|
||||
_is_large = approx_tokens > ctx_len * 0.6 or len(api_messages) > 200
|
||||
if _is_large:
|
||||
is_context_length_error = True
|
||||
self._vprint(
|
||||
f"{self.log_prefix}⚠️ Server disconnected with large session "
|
||||
f"(~{approx_tokens:,} tokens, {len(api_messages)} msgs) — "
|
||||
f"treating as context-length error, attempting compression.",
|
||||
force=True,
|
||||
)
|
||||
# The classifier detects context overflow from: explicit error
|
||||
# messages, generic 400 + large session heuristic (#1630), and
|
||||
# server disconnect + large session pattern (#2153).
|
||||
is_context_length_error = (
|
||||
classified.reason == FailoverReason.context_overflow
|
||||
)
|
||||
|
||||
if is_context_length_error:
|
||||
compressor = self.context_compressor
|
||||
old_ctx = compressor.context_length
|
||||
|
||||
# ── Distinguish two very different errors ───────────
|
||||
# 1. "Prompt too long": the INPUT exceeds the context window.
|
||||
# Fix: reduce context_length + compress history.
|
||||
# 2. "max_tokens too large": input is fine, but
|
||||
# input_tokens + requested max_tokens > context_window.
|
||||
# Fix: reduce max_tokens (the OUTPUT cap) for this call.
|
||||
# Do NOT shrink context_length — the window is unchanged.
|
||||
#
|
||||
# Note: max_tokens = output token cap (one response).
|
||||
# context_length = total window (input + output combined).
|
||||
available_out = parse_available_output_tokens_from_error(error_msg)
|
||||
if available_out is not None:
|
||||
# Error is purely about the output cap being too large.
|
||||
# Cap output to the available space and retry without
|
||||
# touching context_length or triggering compression.
|
||||
safe_out = max(1, available_out - 64) # small safety margin
|
||||
self._ephemeral_max_output_tokens = safe_out
|
||||
self._vprint(
|
||||
f"{self.log_prefix}⚠️ Output cap too large for current prompt — "
|
||||
f"retrying with max_tokens={safe_out:,} "
|
||||
f"(available_tokens={available_out:,}; context_length unchanged at {old_ctx:,})",
|
||||
force=True,
|
||||
)
|
||||
# Still count against compression_attempts so we don't
|
||||
# loop forever if the error keeps recurring.
|
||||
compression_attempts += 1
|
||||
if compression_attempts > max_compression_attempts:
|
||||
self._vprint(f"{self.log_prefix}❌ Max compression attempts ({max_compression_attempts}) reached.", force=True)
|
||||
self._vprint(f"{self.log_prefix} 💡 Try /new to start a fresh conversation, or /compress to retry compression.", force=True)
|
||||
logging.error(f"{self.log_prefix}Context compression failed after {max_compression_attempts} attempts.")
|
||||
self._persist_session(messages, conversation_history)
|
||||
return {
|
||||
"messages": messages,
|
||||
"completed": False,
|
||||
"api_calls": api_call_count,
|
||||
"error": f"Context length exceeded: max compression attempts ({max_compression_attempts}) reached.",
|
||||
"partial": True
|
||||
}
|
||||
restart_with_compressed_messages = True
|
||||
break
|
||||
|
||||
# Error is about the INPUT being too large — reduce context_length.
|
||||
# Try to parse the actual limit from the error message
|
||||
parsed_limit = parse_context_limit_from_error(error_msg)
|
||||
if parsed_limit and parsed_limit < old_ctx:
|
||||
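The two failure modes lead to different retries: cap the output, or shrink and compress the input. This sketch assumes an error message containing `available_tokens: N`; that format, `parse_available_tokens` and `plan_retry` are all hypothetical, and the real `parse_available_output_tokens_from_error` may match something else.

```python
import re
from typing import Optional

# Assumed message shape; the provider's actual wording is not shown in the diff.
_AVAILABLE_RE = re.compile(r"available_tokens\D{0,3}(\d+)")


def parse_available_tokens(error_msg: str) -> Optional[int]:
    match = _AVAILABLE_RE.search(error_msg.lower())
    return int(match.group(1)) if match else None


def plan_retry(error_msg: str, context_length: int) -> dict:
    available = parse_available_tokens(error_msg)
    if available is not None:
        # Output cap problem: leave the window alone, shrink max_tokens
        # with the same 64-token safety margin used in the hunk.
        return {
            "action": "cap_output",
            "max_tokens": max(1, available - 64),
            "context_length": context_length,
        }
    # Input problem: the caller should reduce context_length and compress.
    return {"action": "compress", "max_tokens": None, "context_length": context_length}


capped = plan_retry("max_tokens too large: available_tokens: 10000", 200000)
tiny = plan_retry("available_tokens=30", 200000)
too_long = plan_retry("prompt is too long: 210000 tokens > 200000 maximum", 200000)
```

The `max(1, ...)` floor keeps the retry valid even when the remaining space is smaller than the safety margin.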
@@ -8374,35 +8422,30 @@ class AIAgent:
|
||||
"partial": True
|
||||
}
|
||||
|
||||
# Check for non-retryable client errors (4xx HTTP status codes).
|
||||
# These indicate a problem with the request itself (bad model ID,
|
||||
# invalid API key, forbidden, etc.) and will never succeed on retry.
|
||||
# Note: 413 and context-length errors are excluded — handled above.
|
||||
# 429 (rate limit) is transient and MUST be retried with backoff.
|
||||
# 529 (Anthropic overloaded) is also transient.
|
||||
# Also catch local validation errors (ValueError, TypeError) — these
|
||||
# are programming bugs, not transient failures.
|
||||
# Exclude UnicodeEncodeError — it's a ValueError subclass but is
|
||||
# handled separately by the surrogate sanitization path above.
|
||||
_RETRYABLE_STATUS_CODES = {413, 429, 529}
|
||||
# Check for non-retryable client errors. The classifier
|
||||
# already accounts for 413, 429, 529 (transient), context
|
||||
# overflow, and generic-400 heuristics. Local validation
|
||||
# errors (ValueError, TypeError) are programming bugs.
|
||||
is_local_validation_error = (
|
||||
isinstance(api_error, (ValueError, TypeError))
|
||||
and not isinstance(api_error, UnicodeEncodeError)
|
||||
)
|
||||
# Detect generic 400s from Anthropic OAuth (transient server-side failures).
|
||||
# Real invalid_request_error responses include a descriptive message;
|
||||
# transient ones contain only "Error" or are empty. (ref: issue #1608)
|
||||
_err_body = getattr(api_error, "body", None) or {}
|
||||
_err_message = (_err_body.get("error", {}).get("message", "") if isinstance(_err_body, dict) else "")
|
||||
_is_generic_400 = (status_code == 400 and _err_message.strip().lower() in ("error", ""))
|
||||
is_client_status_error = isinstance(status_code, int) and 400 <= status_code < 500 and status_code not in _RETRYABLE_STATUS_CODES and not _is_generic_400
|
||||
is_client_error = (is_local_validation_error or is_client_status_error or any(phrase in error_msg for phrase in [
|
||||
'error code: 401', 'error code: 403',
|
||||
'error code: 404', 'error code: 422',
|
||||
'is not a valid model', 'invalid model', 'model not found',
|
||||
'invalid api key', 'invalid_api_key', 'authentication',
|
||||
'unauthorized', 'forbidden', 'not found',
|
||||
])) and not is_context_length_error
|
||||
is_client_error = (
|
||||
is_local_validation_error
|
||||
or (
|
||||
not classified.retryable
|
||||
and not classified.should_compress
|
||||
and classified.reason not in (
|
||||
FailoverReason.rate_limit,
|
||||
FailoverReason.billing,
|
||||
FailoverReason.overloaded,
|
||||
FailoverReason.context_overflow,
|
||||
FailoverReason.payload_too_large,
|
||||
FailoverReason.long_context_tier,
|
||||
FailoverReason.thinking_signature,
|
||||
)
|
||||
)
|
||||
) and not is_context_length_error
|
||||
|
||||
if is_client_error:
|
||||
# Try fallback before aborting — a different provider
|
||||
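The new `is_client_error` expression reads more easily as a small predicate. In this sketch the classifier's fields are passed as plain arguments and reasons are plain strings; both are simplifications, and the function name is hypothetical.

```python
# Reasons that have their own recovery path and must not abort the turn.
RECOVERABLE = {
    "rate_limit", "billing", "overloaded", "context_overflow",
    "payload_too_large", "long_context_tier", "thinking_signature",
}


def is_client_error(error: Exception, *, retryable: bool,
                    should_compress: bool, reason: str) -> bool:
    # UnicodeEncodeError is a ValueError subclass but is handled elsewhere.
    local_validation = (
        isinstance(error, (ValueError, TypeError))
        and not isinstance(error, UnicodeEncodeError)
    )
    classified_fatal = (
        not retryable and not should_compress and reason not in RECOVERABLE
    )
    # Context overflow always goes to the compression path instead.
    return (local_validation or classified_fatal) and reason != "context_overflow"


bad_args = is_client_error(ValueError("bad"), retryable=True,
                           should_compress=False, reason="unknown")
surrogate = is_client_error(UnicodeEncodeError("utf-8", "x", 0, 1, "surrogates"),
                            retryable=True, should_compress=False, reason="unknown")
auth = is_client_error(RuntimeError("401"), retryable=False,
                       should_compress=False, reason="auth")
billing = is_client_error(RuntimeError("402"), retryable=False,
                          should_compress=False, reason="billing")
```

Here `bad_args` and `auth` are treated as fatal, while `surrogate` and `billing` are left to their own handlers.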
@@ -8422,7 +8465,7 @@ class AIAgent:
|
||||
self._vprint(f"{self.log_prefix} 🔌 Provider: {_provider} Model: {_model}", force=True)
|
||||
self._vprint(f"{self.log_prefix} 🌐 Endpoint: {_base}", force=True)
|
||||
# Actionable guidance for common auth errors
|
||||
if status_code in (401, 403) or "unauthorized" in error_msg or "forbidden" in error_msg or "permission" in error_msg:
|
||||
if classified.is_auth or classified.reason == FailoverReason.billing:
|
||||
if _provider == "openai-codex" and status_code == 401:
|
||||
self._vprint(f"{self.log_prefix} 💡 Codex OAuth token was rejected (HTTP 401). Your token may have been", force=True)
|
||||
self._vprint(f"{self.log_prefix} refreshed by another client (Codex CLI, VS Code). To fix:", force=True)
|
||||
@@ -8582,6 +8625,7 @@ class AIAgent:
|
||||
|
||||
# If the API call was interrupted, skip response processing
|
||||
if interrupted:
|
||||
_turn_exit_reason = "interrupted_during_api_call"
|
||||
break
|
||||
|
||||
if restart_with_compressed_messages:
|
||||
@@ -8601,6 +8645,7 @@ class AIAgent:
|
||||
# (e.g. repeated context-length errors that exhausted retry_count),
|
||||
# the `response` variable is still None. Break out cleanly.
|
||||
if response is None:
|
||||
_turn_exit_reason = "all_retries_exhausted_no_response"
|
||||
print(f"{self.log_prefix}❌ All API retries exhausted with no successful response.")
|
||||
self._persist_session(messages, conversation_history)
|
||||
break
|
||||
@@ -9064,6 +9109,7 @@ class AIAgent:
|
||||
# instead of wasting API calls on retries that won't help.
|
||||
fallback = getattr(self, '_last_content_with_tools', None)
|
||||
if fallback:
|
||||
_turn_exit_reason = "fallback_prior_turn_content"
|
||||
logger.debug("Empty follow-up after tool calls — using prior turn content as final response")
|
||||
self._last_content_with_tools = None
|
||||
self._empty_content_retries = 0
|
||||
@@ -9130,6 +9176,7 @@ class AIAgent:
|
||||
# Exhausted prefill attempts, empty retries, or
|
||||
# structured reasoning with no content —
|
||||
# fall through to "(empty)" terminal.
|
||||
_turn_exit_reason = "empty_response_exhausted"
|
||||
reasoning_text = self._extract_reasoning(assistant_message)
|
||||
assistant_msg = self._build_assistant_message(assistant_message, finish_reason)
|
||||
assistant_msg["content"] = "(empty)"
|
||||
@@ -9201,6 +9248,7 @@ class AIAgent:
|
||||
|
||||
messages.append(final_msg)
|
||||
|
||||
_turn_exit_reason = f"text_response(finish_reason={finish_reason})"
|
||||
if not self.quiet_mode:
|
||||
self._safe_print(f"🎉 Conversation completed after {api_call_count} OpenAI-compatible API call(s)")
|
||||
break
|
||||
@@ -9250,6 +9298,7 @@ class AIAgent:
|
||||
|
||||
# If we're near the limit, break to avoid infinite loops
|
||||
if api_call_count >= self.max_iterations - 1:
|
||||
_turn_exit_reason = f"error_near_max_iterations({error_msg[:80]})"
|
||||
final_response = f"I apologize, but I encountered repeated errors: {error_msg}"
|
||||
# Append as assistant so the history stays valid for
|
||||
# session resume (avoids consecutive user messages).
|
||||
@@ -9260,6 +9309,7 @@ class AIAgent:
|
||||
api_call_count >= self.max_iterations
|
||||
or self.iteration_budget.remaining <= 0
|
||||
):
|
||||
_turn_exit_reason = f"max_iterations_reached({api_call_count}/{self.max_iterations})"
|
||||
if self.iteration_budget.remaining <= 0 and not self.quiet_mode:
|
||||
print(f"\n⚠️ Iteration budget exhausted ({self.iteration_budget.used}/{self.iteration_budget.max_total} iterations used)")
|
||||
final_response = self._handle_max_iterations(messages, api_call_count)
|
||||
@@ -9276,6 +9326,49 @@ class AIAgent:
|
||||
# Persist session to both JSON log and SQLite
|
||||
self._persist_session(messages, conversation_history)
|
||||
|
||||
# ── Turn-exit diagnostic log ─────────────────────────────────────
|
||||
# Always logged at INFO so agent.log captures WHY every turn ended.
|
||||
# When the last message is a tool result (agent was mid-work), log
|
||||
# at WARNING — this is the "just stops" scenario users report.
|
||||
_last_msg_role = messages[-1].get("role") if messages else None
|
||||
_last_tool_name = None
|
||||
if _last_msg_role == "tool":
|
||||
# Walk back to find the assistant message with the tool call
|
||||
for _m in reversed(messages):
|
||||
if _m.get("role") == "assistant" and _m.get("tool_calls"):
|
||||
_tcs = _m["tool_calls"]
|
||||
if _tcs and isinstance(_tcs[0], dict):
|
||||
_last_tool_name = _tcs[-1].get("function", {}).get("name")
|
||||
break
|
||||
|
||||
_turn_tool_count = sum(
|
||||
1 for m in messages
|
||||
if isinstance(m, dict) and m.get("role") == "assistant" and m.get("tool_calls")
|
||||
)
|
||||
_resp_len = len(final_response) if final_response else 0
|
||||
_budget_used = self.iteration_budget.used if self.iteration_budget else 0
|
||||
_budget_max = self.iteration_budget.max_total if self.iteration_budget else 0
|
||||
|
||||
_diag_msg = (
|
||||
"Turn ended: reason=%s model=%s api_calls=%d/%d budget=%d/%d "
|
||||
"tool_turns=%d last_msg_role=%s response_len=%d session=%s"
|
||||
)
|
||||
_diag_args = (
|
||||
_turn_exit_reason, self.model, api_call_count, self.max_iterations,
|
||||
_budget_used, _budget_max,
|
||||
_turn_tool_count, _last_msg_role, _resp_len,
|
||||
self.session_id or "none",
|
||||
)
|
||||
|
||||
if _last_msg_role == "tool" and not interrupted:
|
||||
# Agent was mid-work — this is the "just stops" case.
|
||||
logger.warning(
|
||||
"Turn ended with pending tool result (agent may appear stuck). "
|
||||
+ _diag_msg + " last_tool=%s",
|
||||
*_diag_args, _last_tool_name,
|
||||
)
|
||||
else:
|
||||
logger.info(_diag_msg, *_diag_args)
|
||||
|
||||
# Plugin hook: post_llm_call
|
||||
# Fired once per turn after the tool-calling loop completes.
|
||||
|
||||
@@ -249,7 +249,6 @@ Type these during an interactive chat session.
|
||||
/config Show config (CLI)
|
||||
/model [name] Show or change model
|
||||
/provider Show provider info
|
||||
/prompt [text] View/set system prompt (CLI)
|
||||
/personality [name] Set personality
|
||||
/reasoning [level] Set reasoning (none|low|medium|high|xhigh|show|hide)
|
||||
/verbose Cycle: off → new → all → verbose
|
||||
|
||||
@@ -0,0 +1,782 @@
|
||||
"""Tests for agent.error_classifier — structured API error classification."""
|
||||
|
||||
import pytest
|
||||
from agent.error_classifier import (
|
||||
ClassifiedError,
|
||||
FailoverReason,
|
||||
classify_api_error,
|
||||
_extract_status_code,
|
||||
_extract_error_body,
|
||||
_extract_error_code,
|
||||
_classify_402,
|
||||
)
|
||||
|
||||
|
||||
# ── Helper: mock API errors ────────────────────────────────────────────
|
||||
|
||||
class MockAPIError(Exception):
|
||||
"""Simulates an OpenAI SDK APIStatusError."""
|
||||
def __init__(self, message, status_code=None, body=None):
|
||||
super().__init__(message)
|
||||
self.status_code = status_code
|
||||
self.body = body or {}
|
||||
|
||||
|
||||
class MockTransportError(Exception):
|
||||
"""Simulates a transport-level error with a specific type name."""
|
||||
pass
|
||||
|
||||
|
||||
class ReadTimeout(MockTransportError):
|
||||
pass
|
||||
|
||||
|
||||
class ConnectError(MockTransportError):
|
||||
pass
|
||||
|
||||
|
||||
class RemoteProtocolError(MockTransportError):
|
||||
pass
|
||||
|
||||
|
||||
class ServerDisconnectedError(MockTransportError):
|
||||
pass
|
||||
|
||||
|
||||
# ── Test: FailoverReason enum ──────────────────────────────────────────
|
||||
|
||||
class TestFailoverReason:
|
||||
def test_all_reasons_have_string_values(self):
|
||||
for reason in FailoverReason:
|
||||
assert isinstance(reason.value, str)
|
||||
|
||||
def test_enum_members_exist(self):
|
||||
expected = {
|
||||
"auth", "auth_permanent", "billing", "rate_limit",
|
||||
"overloaded", "server_error", "timeout",
|
||||
"context_overflow", "payload_too_large",
|
||||
"model_not_found", "format_error",
|
||||
"thinking_signature", "long_context_tier", "unknown",
|
||||
}
|
||||
actual = {r.value for r in FailoverReason}
|
||||
assert expected == actual
|
||||
|
||||
|
||||
# ── Test: ClassifiedError ──────────────────────────────────────────────
|
||||
|
||||
class TestClassifiedError:
|
||||
def test_is_auth_property(self):
|
||||
e1 = ClassifiedError(reason=FailoverReason.auth)
|
||||
assert e1.is_auth is True
|
||||
|
||||
e2 = ClassifiedError(reason=FailoverReason.auth_permanent)
|
||||
assert e2.is_auth is True
|
||||
|
||||
e3 = ClassifiedError(reason=FailoverReason.billing)
|
||||
assert e3.is_auth is False
|
||||
|
||||
def test_is_transient_property(self):
|
||||
transient_reasons = [
|
||||
FailoverReason.rate_limit,
|
||||
FailoverReason.overloaded,
|
||||
FailoverReason.server_error,
|
||||
FailoverReason.timeout,
|
||||
FailoverReason.unknown,
|
||||
]
|
||||
for reason in transient_reasons:
|
||||
e = ClassifiedError(reason=reason)
|
||||
assert e.is_transient is True, f"{reason} should be transient"
|
||||
|
||||
non_transient = [
|
||||
FailoverReason.auth,
|
||||
FailoverReason.billing,
|
||||
FailoverReason.model_not_found,
|
||||
FailoverReason.format_error,
|
||||
]
|
||||
for reason in non_transient:
|
||||
e = ClassifiedError(reason=reason)
|
||||
assert e.is_transient is False, f"{reason} should NOT be transient"
|
||||
|
||||
def test_defaults(self):
|
||||
e = ClassifiedError(reason=FailoverReason.unknown)
|
||||
assert e.retryable is True
|
||||
assert e.should_compress is False
|
||||
assert e.should_rotate_credential is False
|
||||
assert e.should_fallback is False
|
||||
assert e.status_code is None
|
||||
assert e.message == ""
|
||||
|
||||
|
||||
# ── Test: Status code extraction ───────────────────────────────────────
|
||||
|
||||
class TestExtractStatusCode:
|
||||
def test_from_status_code_attr(self):
|
||||
e = MockAPIError("fail", status_code=429)
|
||||
assert _extract_status_code(e) == 429
|
||||
|
||||
def test_from_status_attr(self):
|
||||
class ErrWithStatus(Exception):
|
||||
status = 503
|
||||
assert _extract_status_code(ErrWithStatus()) == 503
|
||||
|
||||
def test_from_cause_chain(self):
|
||||
inner = MockAPIError("inner", status_code=401)
|
||||
outer = Exception("outer")
|
||||
outer.__cause__ = inner
|
||||
assert _extract_status_code(outer) == 401
|
||||
|
||||
def test_none_when_missing(self):
|
||||
assert _extract_status_code(Exception("generic")) is None
|
||||
|
||||
def test_rejects_non_http_status(self):
|
||||
"""Integers outside 100-599 on .status should be ignored."""
|
||||
class ErrWeirdStatus(Exception):
|
||||
status = 42
|
||||
assert _extract_status_code(ErrWeirdStatus()) is None
|
||||
|
||||
|
||||
# ── Test: Error body extraction ────────────────────────────────────────
|
||||
|
||||
class TestExtractErrorBody:
|
||||
def test_from_body_attr(self):
|
||||
e = MockAPIError("fail", body={"error": {"message": "bad"}})
|
||||
assert _extract_error_body(e) == {"error": {"message": "bad"}}
|
||||
|
||||
def test_empty_when_no_body(self):
|
||||
assert _extract_error_body(Exception("generic")) == {}
|
||||
|
||||
|
||||
# ── Test: Error code extraction ────────────────────────────────────────
|
||||
|
||||
class TestExtractErrorCode:
|
||||
def test_from_nested_error_code(self):
|
||||
body = {"error": {"code": "rate_limit_exceeded"}}
|
||||
assert _extract_error_code(body) == "rate_limit_exceeded"
|
||||
|
||||
def test_from_nested_error_type(self):
|
||||
body = {"error": {"type": "invalid_request_error"}}
|
||||
assert _extract_error_code(body) == "invalid_request_error"
|
||||
|
||||
def test_from_top_level_code(self):
|
||||
body = {"code": "model_not_found"}
|
||||
assert _extract_error_code(body) == "model_not_found"
|
||||
|
||||
def test_empty_when_no_code(self):
|
||||
assert _extract_error_code({}) == ""
|
||||
assert _extract_error_code({"error": {"message": "oops"}}) == ""
|
||||
|
||||
|
||||
# ── Test: 402 disambiguation ───────────────────────────────────────────
|
||||
|
||||
class TestClassify402:
|
||||
"""The critical 402 billing vs rate_limit disambiguation."""
|
||||
|
||||
def test_billing_exhaustion(self):
|
||||
"""Plain 402 = billing."""
|
||||
result = _classify_402(
|
||||
"payment required",
|
||||
lambda reason, **kw: ClassifiedError(reason=reason, **kw),
|
||||
)
|
||||
assert result.reason == FailoverReason.billing
|
||||
assert result.should_rotate_credential is True
|
||||
|
||||
def test_transient_usage_limit(self):
|
||||
"""402 with 'usage limit' + 'try again' = rate limit, not billing."""
|
||||
result = _classify_402(
|
||||
"usage limit exceeded. try again in 5 minutes",
|
||||
lambda reason, **kw: ClassifiedError(reason=reason, **kw),
|
||||
)
|
||||
assert result.reason == FailoverReason.rate_limit
|
||||
assert result.should_rotate_credential is True
|
||||
|
||||
def test_quota_with_retry(self):
|
||||
"""402 with 'quota' + 'retry' = rate limit."""
|
||||
result = _classify_402(
|
||||
"quota exceeded, please retry after the window resets",
|
||||
lambda reason, **kw: ClassifiedError(reason=reason, **kw),
|
||||
)
|
||||
assert result.reason == FailoverReason.rate_limit
|
||||
|
||||
def test_quota_without_retry(self):
|
||||
"""402 with just 'quota' but no transient signal = billing."""
|
||||
result = _classify_402(
|
||||
"quota exceeded",
|
||||
lambda reason, **kw: ClassifiedError(reason=reason, **kw),
|
||||
)
|
||||
assert result.reason == FailoverReason.billing
|
||||
|
||||
def test_insufficient_credits(self):
|
||||
result = _classify_402(
|
||||
"insufficient credits to complete request",
|
||||
lambda reason, **kw: ClassifiedError(reason=reason, **kw),
|
||||
)
|
||||
assert result.reason == FailoverReason.billing
|
||||
|
||||
|
||||
# ── Test: Full classification pipeline ─────────────────────────────────
|
||||
|
||||
class TestClassifyApiError:
|
||||
"""End-to-end classification tests."""
|
||||
|
||||
# ── Auth errors ──
|
||||
|
||||
def test_401_classified_as_auth(self):
|
||||
e = MockAPIError("Unauthorized", status_code=401)
|
||||
result = classify_api_error(e, provider="openrouter")
|
||||
assert result.reason == FailoverReason.auth
|
||||
assert result.should_rotate_credential is True
|
||||
# 401 is non-retryable on its own — credential rotation runs
|
||||
# before the retryability check in the agent loop.
|
||||
assert result.retryable is False
|
||||
assert result.should_fallback is True
|
||||
|
||||
def test_403_classified_as_auth(self):
|
||||
e = MockAPIError("Forbidden", status_code=403)
|
||||
result = classify_api_error(e, provider="anthropic")
|
||||
assert result.reason == FailoverReason.auth
|
||||
assert result.should_fallback is True
|
||||
|
||||
def test_403_key_limit_classified_as_billing(self):
|
||||
"""OpenRouter 403 'key limit exceeded' is billing, not auth."""
|
||||
e = MockAPIError("Key limit exceeded for this key", status_code=403)
|
||||
result = classify_api_error(e, provider="openrouter")
|
||||
assert result.reason == FailoverReason.billing
|
||||
assert result.should_rotate_credential is True
|
||||
assert result.should_fallback is True
|
||||
|
||||
def test_403_spending_limit_classified_as_billing(self):
|
||||
e = MockAPIError("spending limit reached", status_code=403)
|
||||
result = classify_api_error(e, provider="openrouter")
|
||||
assert result.reason == FailoverReason.billing
|
||||
|
||||
# ── Billing ──
|
||||
|
||||
def test_402_plain_billing(self):
|
||||
e = MockAPIError("Payment Required", status_code=402)
|
||||
result = classify_api_error(e)
|
||||
assert result.reason == FailoverReason.billing
|
||||
assert result.retryable is False
|
||||
|
||||
def test_402_transient_usage_limit(self):
|
||||
e = MockAPIError("usage limit exceeded, try again later", status_code=402)
|
||||
result = classify_api_error(e)
|
||||
assert result.reason == FailoverReason.rate_limit
|
||||
assert result.retryable is True
|
||||
|
||||
# ── Rate limit ──
|
||||
|
||||
def test_429_rate_limit(self):
|
||||
e = MockAPIError("Too Many Requests", status_code=429)
|
||||
result = classify_api_error(e)
|
||||
assert result.reason == FailoverReason.rate_limit
|
||||
assert result.should_fallback is True
|
||||
|
||||
# ── Server errors ──
|
||||
|
||||
def test_500_server_error(self):
|
||||
e = MockAPIError("Internal Server Error", status_code=500)
|
||||
result = classify_api_error(e)
|
||||
assert result.reason == FailoverReason.server_error
|
||||
assert result.retryable is True
|
||||
|
||||
def test_502_server_error(self):
|
||||
e = MockAPIError("Bad Gateway", status_code=502)
|
||||
result = classify_api_error(e)
|
||||
assert result.reason == FailoverReason.server_error
|
||||
|
||||
def test_503_overloaded(self):
|
||||
e = MockAPIError("Service Unavailable", status_code=503)
|
||||
result = classify_api_error(e)
|
||||
assert result.reason == FailoverReason.overloaded
|
||||
|
||||
def test_529_anthropic_overloaded(self):
|
||||
e = MockAPIError("Overloaded", status_code=529)
|
||||
result = classify_api_error(e)
|
||||
assert result.reason == FailoverReason.overloaded
|
||||
|
||||
# ── Model not found ──
|
||||
|
||||
def test_404_model_not_found(self):
|
||||
e = MockAPIError("model not found", status_code=404)
|
||||
result = classify_api_error(e)
|
||||
assert result.reason == FailoverReason.model_not_found
|
||||
assert result.should_fallback is True
|
||||
assert result.retryable is False
|
||||
|
||||
def test_404_generic(self):
|
||||
e = MockAPIError("Not Found", status_code=404)
|
||||
result = classify_api_error(e)
|
||||
assert result.reason == FailoverReason.model_not_found
|
||||
|
||||
# ── Payload too large ──
|
||||
|
||||
def test_413_payload_too_large(self):
|
||||
e = MockAPIError("Request Entity Too Large", status_code=413)
|
||||
result = classify_api_error(e)
|
||||
assert result.reason == FailoverReason.payload_too_large
|
||||
assert result.should_compress is True
|
||||
|
||||
# ── Context overflow ──
|
||||
|
||||
def test_400_context_length(self):
|
||||
e = MockAPIError("context length exceeded: 250000 > 200000", status_code=400)
|
||||
result = classify_api_error(e)
|
||||
assert result.reason == FailoverReason.context_overflow
|
||||
assert result.should_compress is True
|
||||
|
||||
def test_400_too_many_tokens(self):
|
||||
e = MockAPIError("This model's maximum context is 128000 tokens, too many tokens", status_code=400)
|
||||
result = classify_api_error(e)
|
||||
assert result.reason == FailoverReason.context_overflow
|
||||
|
||||
def test_400_prompt_too_long(self):
|
||||
e = MockAPIError("prompt is too long: 300000 tokens > 200000 maximum", status_code=400)
|
||||
result = classify_api_error(e)
|
||||
assert result.reason == FailoverReason.context_overflow
|
||||
|
||||
def test_400_generic_large_session(self):
|
||||
"""Generic 400 with large session → context overflow heuristic."""
|
||||
e = MockAPIError(
|
||||
"Error",
|
||||
status_code=400,
|
||||
body={"error": {"message": "Error"}},
|
||||
)
|
||||
result = classify_api_error(e, approx_tokens=100000, context_length=200000)
|
||||
assert result.reason == FailoverReason.context_overflow
|
||||
|
||||
def test_400_generic_small_session_is_format_error(self):
|
||||
"""Generic 400 with small session → format error, not context overflow."""
|
||||
e = MockAPIError(
|
||||
"Error",
|
||||
status_code=400,
|
||||
body={"error": {"message": "Error"}},
|
||||
)
|
||||
result = classify_api_error(e, approx_tokens=1000, context_length=200000)
|
||||
assert result.reason == FailoverReason.format_error
|
||||
|
||||
# ── Server disconnect + large session ──
|
||||
|
||||
def test_disconnect_large_session_context_overflow(self):
|
||||
"""Server disconnect with large session → context overflow."""
|
||||
e = Exception("server disconnected without sending complete message")
|
||||
result = classify_api_error(e, approx_tokens=150000, context_length=200000)
|
||||
assert result.reason == FailoverReason.context_overflow
|
||||
assert result.should_compress is True
|
||||
|
||||
def test_disconnect_small_session_timeout(self):
|
||||
"""Server disconnect with small session → timeout."""
|
||||
e = Exception("server disconnected without sending complete message")
|
||||
result = classify_api_error(e, approx_tokens=5000, context_length=200000)
|
||||
assert result.reason == FailoverReason.timeout
|
||||
|
||||
# ── Provider-specific: Anthropic thinking signature ──
|
||||
|
||||
def test_anthropic_thinking_signature(self):
|
||||
e = MockAPIError(
|
||||
"thinking block has invalid signature",
|
||||
status_code=400,
|
||||
)
|
||||
result = classify_api_error(e, provider="anthropic")
|
||||
assert result.reason == FailoverReason.thinking_signature
|
||||
assert result.retryable is True
|
||||
|
||||
def test_non_anthropic_400_with_signature_not_classified_as_thinking(self):
|
||||
"""400 with 'signature' but no 'thinking' in the message → not thinking_signature."""
|
||||
e = MockAPIError("invalid signature", status_code=400)
|
||||
result = classify_api_error(e, provider="openrouter", approx_tokens=0)
|
||||
# Without "thinking" in the message, it shouldn't be thinking_signature
|
||||
assert result.reason != FailoverReason.thinking_signature
|
||||
|
||||
# ── Provider-specific: Anthropic long-context tier ──
|
||||
|
||||
def test_anthropic_long_context_tier(self):
|
||||
e = MockAPIError(
|
||||
"Extra usage is required for long context requests over 200k tokens",
|
||||
status_code=429,
|
||||
)
|
||||
result = classify_api_error(e, provider="anthropic", model="claude-sonnet-4")
|
||||
assert result.reason == FailoverReason.long_context_tier
|
||||
assert result.should_compress is True
|
||||
|
||||
def test_normal_429_not_long_context(self):
|
||||
"""Normal 429 without 'extra usage' + 'long context' → rate_limit."""
|
||||
e = MockAPIError("Too Many Requests", status_code=429)
|
||||
result = classify_api_error(e, provider="anthropic")
|
||||
assert result.reason == FailoverReason.rate_limit
|
||||
|
||||
# ── Transport errors ──
|
||||
|
||||
def test_read_timeout(self):
|
||||
e = ReadTimeout("Read timed out")
|
||||
result = classify_api_error(e)
|
||||
assert result.reason == FailoverReason.timeout
|
||||
assert result.retryable is True
|
||||
|
||||
def test_connect_error(self):
|
||||
e = ConnectError("Connection refused")
|
||||
result = classify_api_error(e)
|
||||
assert result.reason == FailoverReason.timeout
|
||||
|
||||
def test_connection_error_builtin(self):
|
||||
e = ConnectionError("Connection reset by peer")
|
||||
result = classify_api_error(e)
|
||||
assert result.reason == FailoverReason.timeout
|
||||
|
||||
def test_timeout_error_builtin(self):
|
||||
e = TimeoutError("timed out")
|
||||
result = classify_api_error(e)
|
||||
assert result.reason == FailoverReason.timeout
|
||||
|
||||
# ── Error code classification ──
|
||||
|
||||
def test_error_code_resource_exhausted(self):
|
||||
e = MockAPIError(
|
||||
"Resource exhausted",
|
||||
body={"error": {"code": "resource_exhausted", "message": "Too many requests"}},
|
||||
)
|
||||
result = classify_api_error(e)
|
||||
assert result.reason == FailoverReason.rate_limit
|
||||
|
||||
def test_error_code_model_not_found(self):
|
||||
e = MockAPIError(
|
||||
"Model not available",
|
||||
body={"error": {"code": "model_not_found"}},
|
||||
)
|
||||
result = classify_api_error(e)
|
||||
assert result.reason == FailoverReason.model_not_found
|
||||
|
||||
def test_error_code_context_length_exceeded(self):
|
||||
e = MockAPIError(
|
||||
"Context too large",
|
||||
body={"error": {"code": "context_length_exceeded"}},
|
||||
)
|
||||
result = classify_api_error(e)
|
||||
assert result.reason == FailoverReason.context_overflow
|
||||
|
||||
# ── Message-only patterns (no status code) ──
|
||||
|
||||
def test_message_billing_pattern(self):
|
||||
e = Exception("insufficient credits to complete this request")
|
||||
result = classify_api_error(e)
|
||||
assert result.reason == FailoverReason.billing
|
||||
|
||||
def test_message_rate_limit_pattern(self):
|
||||
e = Exception("rate limit reached for this model")
|
||||
result = classify_api_error(e)
|
||||
assert result.reason == FailoverReason.rate_limit
|
||||
|
||||
def test_message_auth_pattern(self):
|
||||
e = Exception("invalid api key provided")
|
||||
result = classify_api_error(e)
|
||||
assert result.reason == FailoverReason.auth
|
||||
|
||||
def test_message_model_not_found_pattern(self):
|
||||
e = Exception("gpt-99 is not a valid model")
|
||||
result = classify_api_error(e)
|
||||
assert result.reason == FailoverReason.model_not_found
|
||||
|
||||
def test_message_context_overflow_pattern(self):
|
||||
e = Exception("maximum context length exceeded")
|
||||
result = classify_api_error(e)
|
||||
assert result.reason == FailoverReason.context_overflow
|
||||
|
||||
# ── Unknown / fallback ──
|
||||
|
||||
def test_generic_exception_is_unknown(self):
|
||||
e = Exception("something weird happened")
|
||||
result = classify_api_error(e)
|
||||
assert result.reason == FailoverReason.unknown
|
||||
assert result.retryable is True
|
||||
|
||||
# ── Format error ──
|
||||
|
||||
def test_400_descriptive_format_error(self):
|
||||
"""400 with descriptive message (not context overflow) → format error."""
|
||||
e = MockAPIError(
|
||||
"Invalid value for parameter 'temperature': must be between 0 and 2",
|
||||
status_code=400,
|
||||
body={"error": {"message": "Invalid value for parameter 'temperature': must be between 0 and 2"}},
|
||||
)
|
||||
result = classify_api_error(e, approx_tokens=1000)
|
||||
assert result.reason == FailoverReason.format_error
|
||||
assert result.retryable is False
|
||||
|
||||
def test_422_format_error(self):
|
||||
e = MockAPIError("Unprocessable Entity", status_code=422)
|
||||
result = classify_api_error(e)
|
||||
assert result.reason == FailoverReason.format_error
|
||||
assert result.retryable is False
|
||||
|
||||
def test_400_flat_body_descriptive_not_context_overflow(self):
|
||||
"""Responses API flat body with descriptive error + large session → format error.
|
||||
|
||||
The Codex Responses API returns errors in flat body format:
|
||||
{"message": "...", "type": "..."} without an "error" wrapper.
|
||||
A descriptive 400 must NOT be misclassified as context overflow
|
||||
just because the session is large.
|
||||
"""
|
||||
e = MockAPIError(
|
||||
"Invalid 'input[index].name': string does not match pattern.",
|
||||
status_code=400,
|
||||
body={"message": "Invalid 'input[index].name': string does not match pattern.",
|
||||
"type": "invalid_request_error"},
|
||||
)
|
||||
result = classify_api_error(e, approx_tokens=200000, context_length=400000, num_messages=500)
|
||||
assert result.reason == FailoverReason.format_error
|
||||
assert result.retryable is False
|
||||
|
||||
def test_400_flat_body_generic_large_session_still_context_overflow(self):
|
||||
"""Flat body with generic 'Error' message + large session → context overflow.
|
||||
|
||||
Regression: the flat-body fallback must not break the existing heuristic
|
||||
for genuinely generic errors from providers that use flat bodies.
|
||||
"""
|
||||
e = MockAPIError(
|
||||
"Error",
|
||||
status_code=400,
|
||||
body={"message": "Error"},
|
||||
)
|
||||
result = classify_api_error(e, approx_tokens=100000, context_length=200000)
|
||||
assert result.reason == FailoverReason.context_overflow
|
||||
|
||||
# ── Peer closed + large session ──
|
||||
|
||||
def test_peer_closed_large_session(self):
|
||||
e = Exception("peer closed connection without sending complete message")
|
||||
result = classify_api_error(e, approx_tokens=130000, context_length=200000)
|
||||
assert result.reason == FailoverReason.context_overflow
|
||||
|
||||
# ── Chinese error messages ──
|
||||
|
||||
def test_chinese_context_overflow(self):
|
||||
e = MockAPIError("超过最大长度限制", status_code=400)
|
||||
result = classify_api_error(e)
|
||||
assert result.reason == FailoverReason.context_overflow
|
||||
|
||||
# ── Result metadata ──
|
||||
|
||||
def test_provider_and_model_in_result(self):
|
||||
e = MockAPIError("fail", status_code=500)
|
||||
result = classify_api_error(e, provider="openrouter", model="gpt-5")
|
||||
assert result.provider == "openrouter"
|
||||
assert result.model == "gpt-5"
|
||||
assert result.status_code == 500
|
||||
|
||||
def test_message_extracted(self):
|
||||
e = MockAPIError(
|
||||
"outer",
|
||||
status_code=500,
|
||||
body={"error": {"message": "Internal server error occurred"}},
|
||||
)
|
||||
result = classify_api_error(e)
|
||||
assert result.message == "Internal server error occurred"
|
||||
|
||||
|
||||
# ── Test: Adversarial / edge cases (from live testing) ─────────────────
|
||||
|
||||
class TestAdversarialEdgeCases:
|
||||
"""Edge cases discovered during live testing with real SDK objects."""
|
||||
|
||||
def test_empty_exception_message(self):
|
||||
result = classify_api_error(Exception(""))
|
||||
assert result.reason == FailoverReason.unknown
|
||||
assert result.retryable is True
|
||||
|
||||
def test_500_with_none_body(self):
|
||||
e = MockAPIError("fail", status_code=500, body=None)
|
||||
result = classify_api_error(e)
|
||||
assert result.reason == FailoverReason.server_error
|
||||
|
||||
def test_non_dict_body(self):
|
||||
"""Some providers return strings instead of JSON."""
|
||||
class StringBodyError(Exception):
|
||||
status_code = 400
|
||||
body = "just a string"
|
||||
result = classify_api_error(StringBodyError("bad"))
|
||||
assert result.reason == FailoverReason.format_error
|
||||
|
||||
def test_list_body(self):
|
||||
class ListBodyError(Exception):
|
||||
status_code = 500
|
||||
body = [{"error": "something"}]
|
||||
result = classify_api_error(ListBodyError("server error"))
|
||||
assert result.reason == FailoverReason.server_error
|
||||
|
||||
def test_circular_cause_chain(self):
|
||||
"""Must not infinite-loop on circular __cause__."""
|
||||
e = Exception("circular")
|
||||
e.__cause__ = e
|
||||
result = classify_api_error(e)
|
||||
assert result.reason == FailoverReason.unknown
|
||||
|
||||
def test_three_level_cause_chain(self):
|
||||
inner = MockAPIError("inner", status_code=429)
|
||||
middle = Exception("middle")
|
||||
middle.__cause__ = inner
|
||||
outer = RuntimeError("outer")
|
||||
outer.__cause__ = middle
|
||||
result = classify_api_error(outer)
|
||||
assert result.status_code == 429
|
||||
assert result.reason == FailoverReason.rate_limit
|
||||
|
||||
def test_400_with_rate_limit_text(self):
|
||||
"""Some providers send rate limits as 400 instead of 429."""
|
||||
e = MockAPIError(
|
||||
"rate limit policy",
|
||||
status_code=400,
|
||||
body={"error": {"message": "rate limit exceeded on this model"}},
|
||||
)
|
||||
result = classify_api_error(e, provider="openrouter")
|
||||
assert result.reason == FailoverReason.rate_limit
|
||||
|
||||
def test_400_with_billing_text(self):
|
||||
"""Some providers send billing errors as 400."""
|
||||
e = MockAPIError(
|
||||
"billing",
|
||||
status_code=400,
|
||||
body={"error": {"message": "insufficient credits for this request"}},
|
||||
)
|
||||
result = classify_api_error(e)
|
||||
assert result.reason == FailoverReason.billing
|
||||
|
||||
def test_200_with_error_body(self):
|
||||
"""200 status with error in body — should be unknown, not crash."""
|
||||
class WeirdSuccess(Exception):
|
||||
status_code = 200
|
||||
body = {"error": {"message": "loading"}}
|
||||
result = classify_api_error(WeirdSuccess("model loading"))
|
||||
assert result.reason == FailoverReason.unknown
|
||||
|
||||
def test_ollama_context_size_exceeded(self):
|
||||
e = MockAPIError(
|
||||
"Error",
|
||||
status_code=400,
|
||||
body={"error": {"message": "context size has been exceeded"}},
|
||||
)
|
||||
result = classify_api_error(e, provider="ollama")
|
||||
assert result.reason == FailoverReason.context_overflow
|
||||
|
||||
def test_connection_refused_error(self):
|
||||
e = ConnectionRefusedError("Connection refused: localhost:11434")
|
||||
result = classify_api_error(e, provider="ollama")
|
||||
assert result.reason == FailoverReason.timeout
|
||||
|
||||
def test_body_message_enrichment(self):
|
||||
"""Body message must be included in pattern matching even when
|
||||
str(error) doesn't contain it (OpenAI SDK APIStatusError)."""
|
||||
e = MockAPIError(
|
||||
"Usage limit",  # str(e) == "Usage limit"
|
||||
status_code=402,
|
||||
body={"error": {"message": "Usage limit reached, try again in 5 minutes"}},
|
||||
)
|
||||
result = classify_api_error(e)
|
||||
# "try again" is only in body, not in str(e)
|
||||
assert result.reason == FailoverReason.rate_limit
|
||||
|
||||
def test_disconnect_pattern_ordering(self):
|
||||
"""Disconnect + large session must beat generic transport catch."""
|
||||
class FakeRemoteProtocol(Exception):
|
||||
pass
|
||||
# Type name isn't in _TRANSPORT_ERROR_TYPES but message has disconnect pattern
|
||||
e = Exception("peer closed connection without sending complete message")
|
||||
result = classify_api_error(e, approx_tokens=150000, context_length=200000)
|
||||
assert result.reason == FailoverReason.context_overflow
|
||||
assert result.should_compress is True
|
||||
|
||||
def test_credit_balance_too_low(self):
|
||||
e = MockAPIError(
|
||||
"Credits low",
|
||||
status_code=402,
|
||||
body={"error": {"message": "Your credit balance is too low"}},
|
||||
)
|
||||
result = classify_api_error(e, provider="anthropic")
|
||||
assert result.reason == FailoverReason.billing
|
||||
|
||||
def test_deepseek_402_chinese(self):
|
||||
"""Chinese billing message with a 402 status should still classify as billing."""
|
||||
# "余额不足" doesn't match English billing patterns, but 402 defaults to billing
|
||||
e = MockAPIError("余额不足", status_code=402)
|
||||
result = classify_api_error(e, provider="deepseek")
|
||||
assert result.reason == FailoverReason.billing
|
||||
|
||||
def test_openrouter_wrapped_context_overflow_in_metadata_raw(self):
|
||||
"""OpenRouter wraps provider errors in metadata.raw JSON string."""
|
||||
e = MockAPIError(
|
||||
"Provider returned error",
|
||||
status_code=400,
|
||||
body={
|
||||
"error": {
|
||||
"message": "Provider returned error",
|
||||
"code": 400,
|
||||
"metadata": {
|
||||
"raw": '{"error":{"message":"context length exceeded: 50000 > 32768"}}'
|
||||
}
|
||||
}
|
||||
},
|
||||
)
|
||||
result = classify_api_error(e, provider="openrouter", approx_tokens=10000)
|
||||
assert result.reason == FailoverReason.context_overflow
|
||||
assert result.should_compress is True
|
||||
|
||||
def test_openrouter_wrapped_rate_limit_in_metadata_raw(self):
|
||||
e = MockAPIError(
|
||||
"Provider returned error",
|
||||
status_code=400,
|
||||
body={
|
||||
"error": {
|
||||
"message": "Provider returned error",
|
||||
"metadata": {
|
||||
"raw": '{"error":{"message":"Rate limit exceeded. Please retry after 30s."}}'
|
||||
}
|
||||
}
|
||||
},
|
||||
)
|
||||
result = classify_api_error(e, provider="openrouter")
|
||||
assert result.reason == FailoverReason.rate_limit
|
||||
|
||||
def test_thinking_signature_via_openrouter(self):
|
||||
"""Thinking signature errors proxied through OpenRouter must be caught."""
|
||||
e = MockAPIError(
|
||||
"thinking block has invalid signature",
|
||||
status_code=400,
|
||||
)
|
||||
# provider is openrouter, not anthropic — old code missed this
|
||||
result = classify_api_error(e, provider="openrouter", model="anthropic/claude-sonnet-4")
|
||||
assert result.reason == FailoverReason.thinking_signature
|
||||
|
||||
def test_generic_400_large_by_message_count(self):
|
||||
"""Many small messages (>80) should trigger context overflow heuristic."""
|
||||
e = MockAPIError(
|
||||
"Error",
|
||||
status_code=400,
|
||||
body={"error": {"message": "Error"}},
|
||||
)
|
||||
# Low token count but high message count
|
||||
result = classify_api_error(
|
||||
e, approx_tokens=5000, context_length=200000, num_messages=100,
|
||||
)
|
||||
assert result.reason == FailoverReason.context_overflow
|
||||
|
||||
def test_disconnect_large_by_message_count(self):
|
||||
"""Server disconnect with 200+ messages should trigger context overflow."""
|
||||
e = Exception("server disconnected without sending complete message")
|
||||
result = classify_api_error(
|
||||
e, approx_tokens=5000, context_length=200000, num_messages=250,
|
||||
)
|
||||
assert result.reason == FailoverReason.context_overflow
|
||||
|
||||
def test_openrouter_wrapped_model_not_found_in_metadata_raw(self):
|
||||
e = MockAPIError(
|
||||
"Provider returned error",
|
||||
status_code=400,
|
||||
body={
|
||||
"error": {
|
||||
"message": "Provider returned error",
|
||||
"metadata": {
|
||||
"raw": '{"error":{"message":"The model gpt-99 does not exist"}}'
|
||||
}
|
||||
}
|
||||
},
|
||||
)
|
||||
result = classify_api_error(e, provider="openrouter")
|
||||
assert result.reason == FailoverReason.model_not_found
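The cause-chain tests in this file (`test_from_cause_chain`, `test_circular_cause_chain`, `test_rejects_non_http_status`) describe a status-code lookup that follows `__cause__`, ignores values outside 100-599, and terminates on a cycle. One way such a lookup could be written is sketched below. The name `extract_status_code_sketch` and the `max_depth` parameter are assumptions for illustration; the real `_extract_status_code` is not shown in this diff and may work differently.

```python
from typing import Optional


def extract_status_code_sketch(
    error: BaseException, max_depth: int = 5
) -> Optional[int]:
    """Find an HTTP status code on an exception or along its cause chain."""
    seen = set()  # ids of exceptions already visited, guards against cycles
    current: Optional[BaseException] = error
    depth = 0
    while current is not None and id(current) not in seen and depth < max_depth:
        seen.add(id(current))
        for attr in ("status_code", "status"):
            value = getattr(current, attr, None)
            # Accept only real ints in the HTTP range (bool is an int subclass).
            if (
                isinstance(value, int)
                and not isinstance(value, bool)
                and 100 <= value <= 599
            ):
                return value
        # Prefer the explicit cause, fall back to the implicit context.
        current = current.__cause__ or current.__context__
        depth += 1
    return None
```

The `seen` set is what makes a self-referential `__cause__` safe, and `max_depth` is an extra bound for very long chains.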
|
||||
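The `TestClassify402` cases above pin down a rule: a 402 counts as a transient rate limit only when the message contains both a usage or quota hint and a retry hint, and otherwise defaults to billing. A minimal sketch of that rule follows. The `Reason` enum, the hint tuples, and `classify_402_sketch` are hypothetical names inferred from the test docstrings, not the actual `_classify_402` in `agent.error_classifier`.

```python
from enum import Enum


class Reason(Enum):
    """Two-member stand-in for the FailoverReason values these tests use."""
    billing = "billing"
    rate_limit = "rate_limit"


# Hint lists are assumptions inferred from the test docstrings above.
_USAGE_HINTS = ("usage limit", "quota")
_TRANSIENT_HINTS = ("try again", "retry")


def classify_402_sketch(message: str) -> Reason:
    """Disambiguate a 402: transient usage limit versus real billing failure."""
    text = message.lower()
    has_usage = any(hint in text for hint in _USAGE_HINTS)
    has_transient = any(hint in text for hint in _TRANSIENT_HINTS)
    # Both signals are required; a bare "quota exceeded" stays billing.
    if has_usage and has_transient:
        return Reason.rate_limit
    return Reason.billing
```

Defaulting to billing means a message in another language, such as the DeepSeek case in `test_deepseek_402_chinese`, still gets a sensible classification without any pattern matching.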
@@ -0,0 +1,212 @@
"""Tests for agent.rate_limit_tracker: header parsing and formatting."""

import time
import pytest
from agent.rate_limit_tracker import (
    RateLimitBucket,
    RateLimitState,
    parse_rate_limit_headers,
    format_rate_limit_display,
    format_rate_limit_compact,
    _fmt_count,
    _fmt_seconds,
    _bar,
)


# ── Sample headers from Nous inference API ──────────────────────────────

NOUS_HEADERS = {
    "x-ratelimit-limit-requests": "800",
    "x-ratelimit-limit-requests-1h": "33600",
    "x-ratelimit-limit-tokens": "8000000",
    "x-ratelimit-limit-tokens-1h": "336000000",
    "x-ratelimit-remaining-requests": "795",
    "x-ratelimit-remaining-requests-1h": "33590",
    "x-ratelimit-remaining-tokens": "7999500",
    "x-ratelimit-remaining-tokens-1h": "335999000",
    "x-ratelimit-reset-requests": "45.5",
    "x-ratelimit-reset-requests-1h": "3500.0",
    "x-ratelimit-reset-tokens": "42.3",
    "x-ratelimit-reset-tokens-1h": "3490.0",
}


class TestParseHeaders:
    def test_basic_parsing(self):
        state = parse_rate_limit_headers(NOUS_HEADERS, provider="nous")
        assert state is not None
        assert state.provider == "nous"
        assert state.has_data

        assert state.requests_min.limit == 800
        assert state.requests_min.remaining == 795
        assert state.requests_min.reset_seconds == 45.5

        assert state.requests_hour.limit == 33600
        assert state.requests_hour.remaining == 33590

        assert state.tokens_min.limit == 8000000
        assert state.tokens_min.remaining == 7999500

        assert state.tokens_hour.limit == 336000000
        assert state.tokens_hour.remaining == 335999000
        assert state.tokens_hour.reset_seconds == 3490.0

    def test_no_headers(self):
        state = parse_rate_limit_headers({})
        assert state is None

    def test_partial_headers(self):
        headers = {
            "x-ratelimit-limit-requests": "100",
            "x-ratelimit-remaining-requests": "50",
        }
        state = parse_rate_limit_headers(headers)
        assert state is not None
        assert state.requests_min.limit == 100
        assert state.requests_min.remaining == 50
        # Missing fields default to 0
        assert state.tokens_min.limit == 0

    def test_non_rate_limit_headers_ignored(self):
        headers = {
            "content-type": "application/json",
            "server": "nginx",
        }
        state = parse_rate_limit_headers(headers)
        assert state is None

    def test_malformed_values(self):
        headers = {
            "x-ratelimit-limit-requests": "not-a-number",
            "x-ratelimit-remaining-requests": "",
            "x-ratelimit-reset-requests": "abc",
        }
        state = parse_rate_limit_headers(headers)
        assert state is not None
        assert state.requests_min.limit == 0
        assert state.requests_min.remaining == 0
        assert state.requests_min.reset_seconds == 0.0


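The header tests above pin down a tolerant parsing contract: no `x-ratelimit-*` headers means no state, missing fields default to zero, and malformed values fall back to zero instead of raising. A minimal sketch of a parser with that behaviour might look like the following. The names `parse_headers_sketch`, `_safe_int` and `_safe_float` are hypothetical and the dict return shape is an assumption for illustration; this is not the code in `agent.rate_limit_tracker`.

```python
from typing import Mapping, Optional


def _safe_int(value, default: int = 0) -> int:
    """Convert a header value to int, falling back to default on bad input."""
    try:
        return int(float(value))
    except (TypeError, ValueError, OverflowError):
        return default


def _safe_float(value, default: float = 0.0) -> float:
    """Convert a header value to float, falling back to default on bad input."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return default


def parse_headers_sketch(headers: Mapping[str, str]) -> Optional[dict]:
    """Hypothetical parser: returns None when no rate limit headers exist."""
    lowered = {k.lower(): v for k, v in headers.items()}
    if not any(k.startswith("x-ratelimit-") for k in lowered):
        return None
    buckets = {}
    for resource in ("requests", "tokens"):
        # The per-minute window has no suffix; the hourly window uses "-1h".
        for label, suffix in (("min", ""), ("hour", "-1h")):
            tag = f"{resource}{suffix}"
            buckets[f"{resource}_{label}"] = {
                "limit": _safe_int(lowered.get(f"x-ratelimit-limit-{tag}")),
                "remaining": _safe_int(lowered.get(f"x-ratelimit-remaining-{tag}")),
                "reset_seconds": _safe_float(lowered.get(f"x-ratelimit-reset-{tag}")),
            }
    return buckets
```

Routing every conversion through a helper that swallows bad input keeps the display path from crashing on a provider that sends an empty or non-numeric header.
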
class TestBucket:
    def test_used(self):
        b = RateLimitBucket(limit=800, remaining=795, reset_seconds=45.0, captured_at=time.time())
        assert b.used == 5

    def test_usage_pct(self):
        b = RateLimitBucket(limit=100, remaining=20, reset_seconds=30.0, captured_at=time.time())
        assert b.usage_pct == pytest.approx(80.0)

    def test_usage_pct_zero_limit(self):
        b = RateLimitBucket(limit=0, remaining=0)
        assert b.usage_pct == 0.0

    def test_remaining_seconds_now(self):
        now = time.time()
        b = RateLimitBucket(limit=800, remaining=795, reset_seconds=60.0, captured_at=now - 10)
        # ~50 seconds should remain
        assert 49 <= b.remaining_seconds_now <= 51

    def test_remaining_seconds_expired(self):
        b = RateLimitBucket(limit=800, remaining=795, reset_seconds=30.0, captured_at=time.time() - 60)
        assert b.remaining_seconds_now == 0.0


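The bucket tests above describe three derived values: how much of the limit is used, the usage percentage (guarded against a zero limit), and the time left until reset, adjusted for how long ago the headers were captured. A small sketch of a dataclass with those properties, assuming the field names seen in the tests, could be written as below. `BucketSketch` is a hypothetical stand-in, not the project's `RateLimitBucket`.

```python
import time
from dataclasses import dataclass


@dataclass
class BucketSketch:
    """Hypothetical stand-in for a single rate limit window."""

    limit: int = 0
    remaining: int = 0
    reset_seconds: float = 0.0
    captured_at: float = 0.0  # time.time() when the headers were read

    @property
    def used(self) -> int:
        # Never report negative usage if the server sends odd values.
        return max(0, self.limit - self.remaining)

    @property
    def usage_pct(self) -> float:
        # A zero limit means "no data", so avoid dividing by zero.
        if self.limit <= 0:
            return 0.0
        return self.used / self.limit * 100.0

    @property
    def remaining_seconds_now(self) -> float:
        # Subtract the time elapsed since capture, clamping at zero.
        elapsed = time.time() - self.captured_at
        return max(0.0, self.reset_seconds - elapsed)
```

Storing the capture timestamp next to the reset value lets a display that is rendered later still show an accurate countdown.
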
class TestFormatting:
    def test_fmt_count_millions(self):
        assert _fmt_count(8000000) == "8.0M"
        assert _fmt_count(336000000) == "336.0M"

    def test_fmt_count_thousands(self):
        assert _fmt_count(33600) == "33.6K"
        assert _fmt_count(1500) == "1.5K"

    def test_fmt_count_small(self):
        assert _fmt_count(800) == "800"
        assert _fmt_count(0) == "0"

    def test_fmt_seconds_short(self):
        assert _fmt_seconds(45) == "45s"
        assert _fmt_seconds(0) == "0s"

    def test_fmt_seconds_minutes(self):
        assert _fmt_seconds(125) == "2m 5s"
        assert _fmt_seconds(120) == "2m"

    def test_fmt_seconds_hours(self):
        assert _fmt_seconds(3660) == "1h 1m"
        assert _fmt_seconds(3600) == "1h"

    def test_bar(self):
        bar = _bar(50.0, width=10)
        assert bar == "[█████░░░░░]"
        assert _bar(0.0, width=10) == "[░░░░░░░░░░]"
        assert _bar(100.0, width=10) == "[██████████]"

    def test_format_display_no_data(self):
        state = RateLimitState()
        result = format_rate_limit_display(state)
        assert "No rate limit data" in result

    def test_format_display_with_data(self):
        state = parse_rate_limit_headers(NOUS_HEADERS, provider="nous")
        result = format_rate_limit_display(state)
        assert "Nous" in result
        assert "Requests/min" in result
        assert "Requests/hr" in result
        assert "Tokens/min" in result
        assert "Tokens/hr" in result
        assert "resets in" in result

    def test_format_display_warning_on_high_usage(self):
        headers = {
            **NOUS_HEADERS,
            "x-ratelimit-remaining-requests": "50",  # 750/800 used = 93.75%
        }
        state = parse_rate_limit_headers(headers)
        result = format_rate_limit_display(state)
        assert "⚠" in result

    def test_format_compact(self):
        state = parse_rate_limit_headers(NOUS_HEADERS, provider="nous")
        result = format_rate_limit_compact(state)
        assert "RPM:" in result
        assert "RPH:" in result
        assert "TPM:" in result
        assert "TPH:" in result
        assert "resets" in result

    def test_format_compact_no_data(self):
        state = RateLimitState()
        result = format_rate_limit_compact(state)
        assert "No rate limit data" in result


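The expected strings in the formatting tests above are enough to sketch what the three helpers could look like. The functions below are hypothetical re-implementations written only to satisfy those expectations (one decimal for K/M counts, omitted zero components in durations, a fixed-width block bar). The real `_fmt_count`, `_fmt_seconds` and `_bar` may differ in rounding or edge cases.

```python
def fmt_count_sketch(n: int) -> str:
    """Abbreviate large counts: 1500 -> '1.5K', 8000000 -> '8.0M'."""
    if n >= 1_000_000:
        return f"{n / 1_000_000:.1f}M"
    if n >= 1_000:
        return f"{n / 1_000:.1f}K"
    return str(n)


def fmt_seconds_sketch(seconds: float) -> str:
    """Render a duration compactly, dropping zero-valued trailing units."""
    s = int(seconds)
    if s < 60:
        return f"{s}s"
    if s < 3600:
        minutes, secs = divmod(s, 60)
        return f"{minutes}m {secs}s" if secs else f"{minutes}m"
    hours, rem = divmod(s, 3600)
    minutes = rem // 60
    return f"{hours}h {minutes}m" if minutes else f"{hours}h"


def bar_sketch(pct: float, width: int = 10) -> str:
    """Draw a usage bar with `width` cells, filled proportionally to pct."""
    filled = int(round(pct / 100.0 * width))
    filled = max(0, min(width, filled))  # clamp out-of-range percentages
    return "[" + "█" * filled + "░" * (width - filled) + "]"
```
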
class TestAgentIntegration:
    """Test that AIAgent captures rate limit state correctly."""

    def test_capture_rate_limits_from_headers(self):
        """Simulate the header capture path without a real API call."""
        import sys
        import os
        # Use a mock httpx-like response
        class MockResponse:
            headers = NOUS_HEADERS

        # Import AIAgent minimally
        from unittest.mock import MagicMock, patch

        # Test the parsing directly
        state = parse_rate_limit_headers(MockResponse.headers, provider="nous")
        assert state is not None
        assert state.requests_min.limit == 800
        assert state.tokens_hour.limit == 336000000

    def test_capture_rate_limits_none_response(self):
        """_capture_rate_limits should handle None gracefully."""
        from agent.rate_limit_tracker import parse_rate_limit_headers
        # None should not crash
        result = parse_rate_limit_headers({})
        assert result is None
@@ -3,6 +3,7 @@
|
||||
import os
|
||||
import pytest
|
||||
from pathlib import Path
|
||||
from unittest.mock import patch
|
||||
|
||||
from agent.subdirectory_hints import SubdirectoryHintTracker
|
||||
|
||||
@@ -189,3 +190,45 @@ class TestSubdirectoryHintTracker:
            "terminal", {"command": "curl https://example.com/frontend/api"}
        )
        assert result is None


class TestPermissionErrorHandling:
    """Regression tests for PermissionError in filesystem checks (ref #6214)."""

    def test_is_valid_subdir_permission_error(self, tmp_path):
        """_is_valid_subdir should return False when is_dir() raises PermissionError."""
        tracker = SubdirectoryHintTracker(working_dir=str(tmp_path))
        restricted = tmp_path / "restricted"
        restricted.mkdir()
        with patch.object(Path, "is_dir", side_effect=PermissionError("Permission denied")):
            assert tracker._is_valid_subdir(restricted) is False

    def test_load_hints_permission_error_on_is_file(self, tmp_path):
        """_load_hints_for_directory should skip files when is_file() raises PermissionError."""
        tracker = SubdirectoryHintTracker(working_dir=str(tmp_path))
        restricted = tmp_path / "restricted"
        restricted.mkdir()
        original_is_file = Path.is_file
        def patched_is_file(self):
            if "restricted" in str(self):
                raise PermissionError("Permission denied")
            return original_is_file(self)
        with patch.object(Path, "is_file", patched_is_file):
            result = tracker._load_hints_for_directory(restricted)
        assert result is None

    def test_check_tool_call_survives_inaccessible_path(self, project):
        """Full check_tool_call should not crash when a path is inaccessible."""
        tracker = SubdirectoryHintTracker(working_dir=str(project))
        original_is_dir = Path.is_dir
        def patched_is_dir(self):
            if "backend" in str(self) and "src" not in str(self):
                raise PermissionError("Permission denied")
            return original_is_dir(self)
        with patch.object(Path, "is_dir", patched_is_dir):
            # Should not raise; gracefully skip the inaccessible directory
            result = tracker.check_tool_call(
                "read_file", {"path": str(project / "backend" / "src" / "main.py")}
            )
        # Result may be None (backend skipped); the key point is no crash
        assert result is None or isinstance(result, str)

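The regression tests above expect filesystem probes to treat an inaccessible path as "not usable" rather than letting `PermissionError` propagate. One way such a guard could be written is sketched below. The function name `is_valid_subdir_sketch` is hypothetical and the body is an assumption about the shape of the fix, not the actual `_is_valid_subdir` implementation.

```python
from pathlib import Path


def is_valid_subdir_sketch(path: Path) -> bool:
    """Return True only if `path` is a directory we are able to inspect."""
    try:
        return path.is_dir()
    except OSError:
        # PermissionError is a subclass of OSError; treat any filesystem
        # failure as "skip this directory" instead of crashing the caller.
        return False
```

Catching `OSError` rather than only `PermissionError` is a design choice in this sketch: it also covers cases such as overly long paths, at the cost of hiding other filesystem errors from the caller.
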
@@ -2,22 +2,65 @@ import queue
|
||||
import threading
|
||||
import time
|
||||
from types import SimpleNamespace
|
||||
from unittest.mock import MagicMock
|
||||
from unittest.mock import MagicMock, patch
|
||||
|
||||
import cli as cli_module
|
||||
from cli import HermesCLI
|
||||
|
||||
|
||||
class _FakeBuffer:
|
||||
def __init__(self, text="", cursor_position=None):
|
||||
self.text = text
|
||||
self.cursor_position = len(text) if cursor_position is None else cursor_position
|
||||
|
||||
def reset(self, append_to_history=False):
|
||||
self.text = ""
|
||||
self.cursor_position = 0
|
||||
|
||||
|
||||
def _make_cli_stub():
|
||||
cli = HermesCLI.__new__(HermesCLI)
|
||||
cli._approval_state = None
|
||||
cli._approval_deadline = 0
|
||||
cli._approval_lock = threading.Lock()
|
||||
cli._sudo_state = None
|
||||
cli._sudo_deadline = 0
|
||||
cli._modal_input_snapshot = None
|
||||
cli._invalidate = MagicMock()
|
||||
cli._app = SimpleNamespace(invalidate=MagicMock())
|
||||
cli._app = SimpleNamespace(invalidate=MagicMock(), current_buffer=_FakeBuffer())
|
||||
return cli
|
||||
|
||||
|
||||
class TestCliApprovalUi:
|
||||
def test_sudo_prompt_restores_existing_draft_after_response(self):
|
||||
cli = _make_cli_stub()
|
||||
cli._app.current_buffer = _FakeBuffer("draft command", cursor_position=5)
|
||||
result = {}
|
||||
|
||||
def _run_callback():
|
||||
result["value"] = cli._sudo_password_callback()
|
||||
|
||||
with patch.object(cli_module, "_cprint"):
|
||||
thread = threading.Thread(target=_run_callback, daemon=True)
|
||||
thread.start()
|
||||
|
||||
deadline = time.time() + 2
|
||||
while cli._sudo_state is None and time.time() < deadline:
|
||||
time.sleep(0.01)
|
||||
|
||||
assert cli._sudo_state is not None
|
||||
assert cli._app.current_buffer.text == ""
|
||||
|
||||
cli._app.current_buffer.text = "secret"
|
||||
cli._app.current_buffer.cursor_position = len("secret")
|
||||
cli._sudo_state["response_queue"].put("secret")
|
||||
|
||||
thread.join(timeout=2)
|
||||
|
||||
assert result["value"] == "secret"
|
||||
assert cli._app.current_buffer.text == "draft command"
|
||||
assert cli._app.current_buffer.cursor_position == 5
|
||||
|
||||
def test_approval_callback_includes_view_for_long_commands(self):
|
||||
cli = _make_cli_stub()
|
||||
command = "sudo dd if=/tmp/githubcli-keyring.gpg of=/usr/share/keyrings/githubcli-archive-keyring.gpg bs=4M status=progress"
|
||||
|
||||
@@ -41,6 +41,7 @@ def _attach_agent(
|
||||
session_completion_tokens=completion_tokens,
|
||||
session_total_tokens=total_tokens,
|
||||
session_api_calls=api_calls,
|
||||
get_rate_limit_state=lambda: None,
|
||||
context_compressor=SimpleNamespace(
|
||||
last_prompt_tokens=context_tokens,
|
||||
context_length=context_length,
|
||||
|
||||
@@ -38,6 +38,8 @@ def _isolate_hermes_home(tmp_path, monkeypatch):
|
||||
monkeypatch.delenv("HERMES_SESSION_CHAT_ID", raising=False)
|
||||
monkeypatch.delenv("HERMES_SESSION_CHAT_NAME", raising=False)
|
||||
monkeypatch.delenv("HERMES_GATEWAY_SESSION", raising=False)
|
||||
# Avoid making real calls during tests if this key is set in the env files
|
||||
monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
|
||||
|
||||
|
||||
@pytest.fixture()
|
||||
|
||||
@@ -38,10 +38,11 @@ def _make_timeout_error() -> httpx.TimeoutException:
|
||||
# cache_image_from_url (base.py)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
@patch("tools.url_safety.is_safe_url", return_value=True)
|
||||
class TestCacheImageFromUrl:
|
||||
"""Tests for gateway.platforms.base.cache_image_from_url"""
|
||||
|
||||
def test_success_on_first_attempt(self, tmp_path, monkeypatch):
|
||||
def test_success_on_first_attempt(self, _mock_safe, tmp_path, monkeypatch):
|
||||
"""A clean 200 response caches the image and returns a path."""
|
||||
monkeypatch.setattr("gateway.platforms.base.IMAGE_CACHE_DIR", tmp_path / "img")
|
||||
|
||||
@@ -65,7 +66,7 @@ class TestCacheImageFromUrl:
|
||||
assert path.endswith(".jpg")
|
||||
mock_client.get.assert_called_once()
|
||||
|
||||
def test_retries_on_timeout_then_succeeds(self, tmp_path, monkeypatch):
|
||||
def test_retries_on_timeout_then_succeeds(self, _mock_safe, tmp_path, monkeypatch):
|
||||
"""A timeout on the first attempt is retried; second attempt succeeds."""
|
||||
monkeypatch.setattr("gateway.platforms.base.IMAGE_CACHE_DIR", tmp_path / "img")
|
||||
|
||||
@@ -95,7 +96,7 @@ class TestCacheImageFromUrl:
|
||||
assert mock_client.get.call_count == 2
|
||||
mock_sleep.assert_called_once()
|
||||
|
||||
def test_retries_on_429_then_succeeds(self, tmp_path, monkeypatch):
|
||||
def test_retries_on_429_then_succeeds(self, _mock_safe, tmp_path, monkeypatch):
|
||||
"""A 429 response on the first attempt is retried; second attempt succeeds."""
|
||||
monkeypatch.setattr("gateway.platforms.base.IMAGE_CACHE_DIR", tmp_path / "img")
|
||||
|
||||
@@ -122,7 +123,7 @@ class TestCacheImageFromUrl:
|
||||
assert path.endswith(".jpg")
|
||||
assert mock_client.get.call_count == 2
|
||||
|
||||
def test_raises_after_max_retries_exhausted(self, tmp_path, monkeypatch):
|
||||
def test_raises_after_max_retries_exhausted(self, _mock_safe, tmp_path, monkeypatch):
|
||||
"""Timeout on every attempt raises after all retries are consumed."""
|
||||
monkeypatch.setattr("gateway.platforms.base.IMAGE_CACHE_DIR", tmp_path / "img")
|
||||
|
||||
@@ -145,7 +146,7 @@ class TestCacheImageFromUrl:
|
||||
# 3 total calls: initial + 2 retries
|
||||
assert mock_client.get.call_count == 3
|
||||
|
||||
def test_non_retryable_4xx_raises_immediately(self, tmp_path, monkeypatch):
|
||||
def test_non_retryable_4xx_raises_immediately(self, _mock_safe, tmp_path, monkeypatch):
|
||||
"""A 404 (non-retryable) is raised immediately without any retry."""
|
||||
monkeypatch.setattr("gateway.platforms.base.IMAGE_CACHE_DIR", tmp_path / "img")
|
||||
|
||||
@@ -175,10 +176,11 @@ class TestCacheImageFromUrl:
|
||||
# cache_audio_from_url (base.py)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
@patch("tools.url_safety.is_safe_url", return_value=True)
|
||||
class TestCacheAudioFromUrl:
|
||||
"""Tests for gateway.platforms.base.cache_audio_from_url"""
|
||||
|
||||
def test_success_on_first_attempt(self, tmp_path, monkeypatch):
|
||||
def test_success_on_first_attempt(self, _mock_safe, tmp_path, monkeypatch):
|
||||
"""A clean 200 response caches the audio and returns a path."""
|
||||
monkeypatch.setattr("gateway.platforms.base.AUDIO_CACHE_DIR", tmp_path / "audio")
|
||||
|
||||
@@ -202,7 +204,7 @@ class TestCacheAudioFromUrl:
|
||||
assert path.endswith(".ogg")
|
||||
mock_client.get.assert_called_once()
|
||||
|
||||
def test_retries_on_timeout_then_succeeds(self, tmp_path, monkeypatch):
|
||||
def test_retries_on_timeout_then_succeeds(self, _mock_safe, tmp_path, monkeypatch):
|
||||
"""A timeout on the first attempt is retried; second attempt succeeds."""
|
||||
monkeypatch.setattr("gateway.platforms.base.AUDIO_CACHE_DIR", tmp_path / "audio")
|
||||
|
||||
@@ -232,7 +234,7 @@ class TestCacheAudioFromUrl:
|
||||
assert mock_client.get.call_count == 2
|
||||
mock_sleep.assert_called_once()
|
||||
|
||||
def test_retries_on_429_then_succeeds(self, tmp_path, monkeypatch):
|
||||
def test_retries_on_429_then_succeeds(self, _mock_safe, tmp_path, monkeypatch):
|
||||
"""A 429 response on the first attempt is retried; second attempt succeeds."""
|
||||
monkeypatch.setattr("gateway.platforms.base.AUDIO_CACHE_DIR", tmp_path / "audio")
|
||||
|
||||
@@ -259,7 +261,7 @@ class TestCacheAudioFromUrl:
|
||||
assert path.endswith(".ogg")
|
||||
assert mock_client.get.call_count == 2
|
||||
|
||||
def test_retries_on_500_then_succeeds(self, tmp_path, monkeypatch):
|
||||
def test_retries_on_500_then_succeeds(self, _mock_safe, tmp_path, monkeypatch):
|
||||
"""A 500 response on the first attempt is retried; second attempt succeeds."""
|
||||
monkeypatch.setattr("gateway.platforms.base.AUDIO_CACHE_DIR", tmp_path / "audio")
|
||||
|
||||
@@ -286,7 +288,7 @@ class TestCacheAudioFromUrl:
|
||||
assert path.endswith(".ogg")
|
||||
assert mock_client.get.call_count == 2
|
||||
|
||||
def test_raises_after_max_retries_exhausted(self, tmp_path, monkeypatch):
|
||||
def test_raises_after_max_retries_exhausted(self, _mock_safe, tmp_path, monkeypatch):
|
||||
"""Timeout on every attempt raises after all retries are consumed."""
|
||||
monkeypatch.setattr("gateway.platforms.base.AUDIO_CACHE_DIR", tmp_path / "audio")
|
||||
|
||||
@@ -309,7 +311,7 @@ class TestCacheAudioFromUrl:
|
||||
# 3 total calls: initial + 2 retries
|
||||
assert mock_client.get.call_count == 3
|
||||
|
||||
def test_non_retryable_4xx_raises_immediately(self, tmp_path, monkeypatch):
|
||||
def test_non_retryable_4xx_raises_immediately(self, _mock_safe, tmp_path, monkeypatch):
|
||||
"""A 404 (non-retryable) is raised immediately without any retry."""
|
||||
monkeypatch.setattr("gateway.platforms.base.AUDIO_CACHE_DIR", tmp_path / "audio")
|
||||
|
||||
|
||||
@@ -4,7 +4,7 @@ import base64
|
||||
import os
|
||||
from pathlib import Path
|
||||
from types import SimpleNamespace
|
||||
from unittest.mock import AsyncMock
|
||||
from unittest.mock import AsyncMock, patch
|
||||
|
||||
import pytest
|
||||
|
||||
@@ -355,7 +355,8 @@ class TestMediaUpload:
|
||||
assert calls[3][1]["chunk_index"] == 2
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_download_remote_bytes_rejects_large_content_length(self):
|
||||
@patch("tools.url_safety.is_safe_url", return_value=True)
|
||||
async def test_download_remote_bytes_rejects_large_content_length(self, _mock_safe):
|
||||
from gateway.platforms.wecom import WeComAdapter
|
||||
|
||||
class FakeResponse:
|
||||
|
||||
@@ -628,14 +628,21 @@ class TestHasAnyProviderConfigured:
|
||||
def test_claude_code_creds_ignored_on_fresh_install(self, monkeypatch, tmp_path):
|
||||
"""Claude Code credentials should NOT skip the wizard when Hermes is unconfigured."""
|
||||
from hermes_cli import config as config_module
|
||||
from hermes_cli.auth import PROVIDER_REGISTRY
|
||||
hermes_home = tmp_path / ".hermes"
|
||||
hermes_home.mkdir()
|
||||
monkeypatch.setattr(config_module, "get_env_path", lambda: hermes_home / ".env")
|
||||
monkeypatch.setattr(config_module, "get_hermes_home", lambda: hermes_home)
|
||||
# Clear all provider env vars so earlier checks don't short-circuit
|
||||
for var in ("OPENROUTER_API_KEY", "OPENAI_API_KEY", "ANTHROPIC_API_KEY",
|
||||
"ANTHROPIC_TOKEN", "OPENAI_BASE_URL"):
|
||||
_all_vars = {"OPENROUTER_API_KEY", "OPENAI_API_KEY", "ANTHROPIC_API_KEY",
|
||||
"ANTHROPIC_TOKEN", "OPENAI_BASE_URL"}
|
||||
for pconfig in PROVIDER_REGISTRY.values():
|
||||
if pconfig.auth_type == "api_key":
|
||||
_all_vars.update(pconfig.api_key_env_vars)
|
||||
for var in _all_vars:
|
||||
monkeypatch.delenv(var, raising=False)
|
||||
# Prevent gh-cli / copilot auth fallback from leaking in
|
||||
monkeypatch.setattr("hermes_cli.auth.get_auth_status", lambda _pid: {})
|
||||
# Simulate valid Claude Code credentials
|
||||
monkeypatch.setattr(
|
||||
"agent.anthropic_adapter.read_claude_code_credentials",
|
||||
@@ -710,6 +717,7 @@ class TestHasAnyProviderConfigured:
|
||||
"""config.yaml model dict with empty default and no creds stays false."""
|
||||
import yaml
|
||||
from hermes_cli import config as config_module
|
||||
from hermes_cli.auth import PROVIDER_REGISTRY
|
||||
hermes_home = tmp_path / ".hermes"
|
||||
hermes_home.mkdir()
|
||||
config_file = hermes_home / "config.yaml"
|
||||
@@ -719,9 +727,15 @@ class TestHasAnyProviderConfigured:
|
||||
monkeypatch.setattr(config_module, "get_env_path", lambda: hermes_home / ".env")
|
||||
monkeypatch.setattr(config_module, "get_hermes_home", lambda: hermes_home)
|
||||
monkeypatch.setenv("HERMES_HOME", str(hermes_home))
|
||||
for var in ("OPENROUTER_API_KEY", "OPENAI_API_KEY", "ANTHROPIC_API_KEY",
|
||||
"ANTHROPIC_TOKEN", "OPENAI_BASE_URL"):
|
||||
_all_vars = {"OPENROUTER_API_KEY", "OPENAI_API_KEY", "ANTHROPIC_API_KEY",
|
||||
"ANTHROPIC_TOKEN", "OPENAI_BASE_URL"}
|
||||
for pconfig in PROVIDER_REGISTRY.values():
|
||||
if pconfig.auth_type == "api_key":
|
||||
_all_vars.update(pconfig.api_key_env_vars)
|
||||
for var in _all_vars:
|
||||
monkeypatch.delenv(var, raising=False)
|
||||
# Prevent gh-cli / copilot auth fallback from leaking in
|
||||
monkeypatch.setattr("hermes_cli.auth.get_auth_status", lambda _pid: {})
|
||||
from hermes_cli.main import _has_any_provider_configured
|
||||
assert _has_any_provider_configured() is False
|
||||
|
||||
@@ -941,9 +955,10 @@ class TestHuggingFaceModels:
|
||||
"""Every HF model should have a context length entry."""
|
||||
from hermes_cli.models import _PROVIDER_MODELS
|
||||
from agent.model_metadata import DEFAULT_CONTEXT_LENGTHS
|
||||
lower_keys = {k.lower() for k in DEFAULT_CONTEXT_LENGTHS}
|
||||
hf_models = _PROVIDER_MODELS["huggingface"]
|
||||
for model in hf_models:
|
||||
assert model in DEFAULT_CONTEXT_LENGTHS, (
|
||||
assert model.lower() in lower_keys, (
|
||||
f"HF model {model!r} missing from DEFAULT_CONTEXT_LENGTHS"
|
||||
)
|
||||
|
||||
|
||||
@@ -425,8 +425,8 @@ class TestSlashCommandCompleter:
|
||||
class TestSubcommands:
|
||||
def test_explicit_subcommands_extracted(self):
|
||||
"""Commands with explicit subcommands on CommandDef are extracted."""
|
||||
assert "/prompt" in SUBCOMMANDS
|
||||
assert "clear" in SUBCOMMANDS["/prompt"]
|
||||
assert "/skills" in SUBCOMMANDS
|
||||
assert "install" in SUBCOMMANDS["/skills"]
|
||||
|
||||
def test_reasoning_has_subcommands(self):
|
||||
assert "/reasoning" in SUBCOMMANDS
|
||||
|
||||
@@ -44,7 +44,7 @@ class TestOfferOpenclawMigration:
|
||||
assert setup_mod._offer_openclaw_migration(tmp_path / ".hermes") is False
|
||||
|
||||
def test_runs_migration_when_user_accepts(self, tmp_path):
|
||||
"""Should dynamically load the script and run the Migrator."""
|
||||
"""Should run dry-run preview first, then execute after confirmation."""
|
||||
openclaw_dir = tmp_path / ".openclaw"
|
||||
openclaw_dir.mkdir()
|
||||
|
||||
@@ -60,6 +60,7 @@ class TestOfferOpenclawMigration:
|
||||
fake_migrator = MagicMock()
|
||||
fake_migrator.migrate.return_value = {
|
||||
"summary": {"migrated": 3, "skipped": 1, "conflict": 0, "error": 0},
|
||||
"items": [{"kind": "config", "status": "migrated", "destination": "/tmp/x"}],
|
||||
"output_dir": str(hermes_home / "migration"),
|
||||
}
|
||||
fake_mod.Migrator = MagicMock(return_value=fake_migrator)
|
||||
@@ -70,6 +71,7 @@ class TestOfferOpenclawMigration:
|
||||
with (
|
||||
patch("hermes_cli.setup.Path.home", return_value=tmp_path),
|
||||
patch.object(setup_mod, "_OPENCLAW_SCRIPT", script),
|
||||
# Both prompts answered Yes: preview offer + proceed confirmation
|
||||
patch.object(setup_mod, "prompt_yes_no", return_value=True),
|
||||
patch.object(setup_mod, "get_config_path", return_value=config_path),
|
||||
patch("importlib.util.spec_from_file_location") as mock_spec_fn,
|
||||
@@ -91,13 +93,75 @@ class TestOfferOpenclawMigration:
|
||||
fake_mod.resolve_selected_options.assert_called_once_with(
|
||||
None, None, preset="full"
|
||||
)
|
||||
fake_mod.Migrator.assert_called_once()
|
||||
call_kwargs = fake_mod.Migrator.call_args[1]
|
||||
assert call_kwargs["execute"] is True
|
||||
assert call_kwargs["overwrite"] is True
|
||||
assert call_kwargs["migrate_secrets"] is True
|
||||
assert call_kwargs["preset_name"] == "full"
|
||||
fake_migrator.migrate.assert_called_once()
|
||||
# Migrator called twice: once for dry-run preview, once for execution
|
||||
assert fake_mod.Migrator.call_count == 2
|
||||
|
||||
# First call: dry-run preview (execute=False, overwrite=True to show all)
|
||||
preview_kwargs = fake_mod.Migrator.call_args_list[0][1]
|
||||
assert preview_kwargs["execute"] is False
|
||||
assert preview_kwargs["overwrite"] is True
|
||||
assert preview_kwargs["migrate_secrets"] is True
|
||||
assert preview_kwargs["preset_name"] == "full"
|
||||
|
||||
# Second call: actual execution (execute=True, overwrite=False to preserve)
|
||||
exec_kwargs = fake_mod.Migrator.call_args_list[1][1]
|
||||
assert exec_kwargs["execute"] is True
|
||||
assert exec_kwargs["overwrite"] is False
|
||||
assert exec_kwargs["migrate_secrets"] is True
|
||||
assert exec_kwargs["preset_name"] == "full"
|
||||
|
||||
# migrate() called twice (once per Migrator instance)
|
||||
assert fake_migrator.migrate.call_count == 2
|
||||
|
||||
def test_user_declines_after_preview(self, tmp_path):
|
||||
"""Should return False when user sees preview but declines to proceed."""
|
||||
openclaw_dir = tmp_path / ".openclaw"
|
||||
openclaw_dir.mkdir()
|
||||
|
||||
hermes_home = tmp_path / ".hermes"
|
||||
hermes_home.mkdir()
|
||||
config_path = hermes_home / "config.yaml"
|
||||
config_path.write_text("agent:\n max_turns: 90\n")
|
||||
|
||||
fake_mod = ModuleType("openclaw_to_hermes")
|
||||
fake_mod.resolve_selected_options = MagicMock(return_value={"soul", "memory"})
|
||||
fake_migrator = MagicMock()
|
||||
fake_migrator.migrate.return_value = {
|
||||
"summary": {"migrated": 3, "skipped": 0, "conflict": 0, "error": 0},
|
||||
"items": [{"kind": "config", "status": "migrated", "destination": "/tmp/x"}],
|
||||
}
|
||||
fake_mod.Migrator = MagicMock(return_value=fake_migrator)
|
||||
|
||||
script = tmp_path / "openclaw_to_hermes.py"
|
||||
script.write_text("# placeholder")
|
||||
|
||||
# First prompt (preview): Yes, Second prompt (proceed): No
|
||||
prompt_responses = iter([True, False])
|
||||
|
||||
with (
|
||||
patch("hermes_cli.setup.Path.home", return_value=tmp_path),
|
||||
patch.object(setup_mod, "_OPENCLAW_SCRIPT", script),
|
||||
patch.object(setup_mod, "prompt_yes_no", side_effect=prompt_responses),
|
||||
patch.object(setup_mod, "get_config_path", return_value=config_path),
|
||||
patch("importlib.util.spec_from_file_location") as mock_spec_fn,
|
||||
):
|
||||
mock_spec = MagicMock()
|
||||
mock_spec.loader = MagicMock()
|
||||
mock_spec_fn.return_value = mock_spec
|
||||
|
||||
def exec_module(mod):
|
||||
mod.resolve_selected_options = fake_mod.resolve_selected_options
|
||||
mod.Migrator = fake_mod.Migrator
|
||||
|
||||
mock_spec.loader.exec_module = exec_module
|
||||
|
||||
result = setup_mod._offer_openclaw_migration(hermes_home)
|
||||
|
||||
assert result is False
|
||||
# Only dry-run Migrator was created, not the execute one
|
||||
assert fake_mod.Migrator.call_count == 1
|
||||
preview_kwargs = fake_mod.Migrator.call_args[1]
|
||||
assert preview_kwargs["execute"] is False
|
||||
|
||||
def test_handles_migration_error_gracefully(self, tmp_path):
|
||||
"""Should catch exceptions and return False."""
|
||||
|
||||
@@ -354,6 +354,14 @@ def test_first_install_nous_auto_configures_managed_defaults(monkeypatch):
|
||||
lambda *args, **kwargs: {"web", "image_gen", "tts", "browser"},
|
||||
)
|
||||
monkeypatch.setattr("hermes_cli.tools_config.save_config", lambda config: None)
|
||||
# Prevent leaked platform tokens (e.g. DISCORD_BOT_TOKEN from gateway.run
|
||||
# import) from adding extra platforms. The loop in tools_command runs
|
||||
# apply_nous_managed_defaults per platform; a second iteration sees values
|
||||
# set by the first as "explicit" and skips them.
|
||||
monkeypatch.setattr(
|
||||
"hermes_cli.tools_config._get_enabled_platforms",
|
||||
lambda: ["cli"],
|
||||
)
|
||||
monkeypatch.setattr(
|
||||
"hermes_cli.nous_subscription.get_nous_auth_status",
|
||||
lambda: {"logged_in": True},
|
||||
|
||||
@@ -368,6 +368,9 @@ class TestCmdUpdateLaunchdRestart:
|
||||
monkeypatch.setattr(
|
||||
gateway_cli, "is_macos", lambda: False,
|
||||
)
|
||||
monkeypatch.setattr(
|
||||
gateway_cli, "is_linux", lambda: True,
|
||||
)
|
||||
|
||||
mock_run.side_effect = _make_run_side_effect(
|
||||
commit_count="3",
|
||||
|
||||
@@ -658,6 +658,47 @@ def test_workspace_agents_records_skip_when_missing(tmp_path: Path):
|
||||
assert wa_items[0]["status"] == "skipped"
|
||||
|
||||
|
||||
def test_cron_store_is_archived_without_config_cron_section(tmp_path: Path):
|
||||
"""Bug fix: archive cron store even when openclaw.json has no top-level cron config."""
|
||||
mod = load_module()
|
||||
source = tmp_path / ".openclaw"
|
||||
target = tmp_path / ".hermes"
|
||||
output_dir = target / "migration-report"
|
||||
source.mkdir()
|
||||
target.mkdir()
|
||||
|
||||
(source / "openclaw.json").write_text(json.dumps({"channels": {}}), encoding="utf-8")
|
||||
(source / "cron").mkdir(parents=True)
|
||||
(source / "cron" / "jobs.json").write_text(
|
||||
json.dumps({"version": 1, "jobs": [{"id": "job-1", "name": "demo"}]}),
|
||||
encoding="utf-8",
|
||||
)
|
||||
|
||||
migrator = mod.Migrator(
|
||||
source_root=source,
|
||||
target_root=target,
|
||||
execute=True,
|
||||
workspace_target=None,
|
||||
overwrite=False,
|
||||
migrate_secrets=False,
|
||||
output_dir=output_dir,
|
||||
selected_options={"cron-jobs"},
|
||||
)
|
||||
report = migrator.migrate()
|
||||
|
||||
cron_items = [item for item in report["items"] if item["kind"] == "cron-jobs"]
|
||||
archived_store = next(
|
||||
(item for item in cron_items if item["destination"] and item["destination"].endswith("archive/cron-store")),
|
||||
None,
|
||||
)
|
||||
assert archived_store is not None
|
||||
assert Path(archived_store["destination"]).joinpath("jobs.json").exists()
|
||||
|
||||
notes_text = (output_dir / "MIGRATION_NOTES.md").read_text(encoding="utf-8")
|
||||
assert "Run `hermes cron` to recreate scheduled tasks" in notes_text
|
||||
assert "archive/cron-config.json" not in notes_text
|
||||
|
||||
|
||||
def test_skill_installs_cleanly_under_skills_guard():
|
||||
skills_guard = load_skills_guard()
|
||||
result = skills_guard.scan_skill(
|
||||
|
||||
@@ -0,0 +1,319 @@
|
||||
"""Tests for the context-halving bugfix.
|
||||
|
||||
Background
----------
When the API returns "max_tokens too large given prompt" (input is fine,
but input_tokens + requested max_tokens > context_window), the old code
incorrectly halved context_length via get_next_probe_tier().

The fix introduces:
  * parse_available_output_tokens_from_error(): detects this specific
    error class and returns the available output token budget.
  * _ephemeral_max_output_tokens on AIAgent: a one-shot override that
    caps the output for one retry without touching context_length.

Naming note
-----------
max_tokens     = OUTPUT token cap (a single response).
context_length = TOTAL context window (input + output combined).
These are different and the old code conflated them; the fix keeps them
separate.
|
||||
"""
|
||||
|
||||
import sys
|
||||
import os
|
||||
from unittest.mock import MagicMock, patch, PropertyMock
|
||||
|
||||
sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
|
||||
|
||||
import pytest
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# parse_available_output_tokens_from_error — unit tests
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
class TestParseAvailableOutputTokens:
|
||||
"""Pure-function tests; no I/O required."""
|
||||
|
||||
def _parse(self, msg):
|
||||
from agent.model_metadata import parse_available_output_tokens_from_error
|
||||
return parse_available_output_tokens_from_error(msg)
|
||||
|
||||
# ── Should detect and extract ────────────────────────────────────────
|
||||
|
||||
def test_anthropic_canonical_format(self):
|
||||
"""Canonical Anthropic error: max_tokens: X > context_window: Y - input_tokens: Z = available_tokens: W"""
|
||||
msg = (
|
||||
"max_tokens: 32768 > context_window: 200000 "
|
||||
"- input_tokens: 190000 = available_tokens: 10000"
|
||||
)
|
||||
assert self._parse(msg) == 10000
|
||||
|
||||
def test_anthropic_format_large_numbers(self):
|
||||
msg = (
|
||||
"max_tokens: 128000 > context_window: 200000 "
|
||||
"- input_tokens: 180000 = available_tokens: 20000"
|
||||
)
|
||||
assert self._parse(msg) == 20000
|
||||
|
||||
def test_available_tokens_variant_spacing(self):
|
||||
"""Handles extra spaces around the colon."""
|
||||
msg = "max_tokens: 32768 > 200000 available_tokens : 5000"
|
||||
assert self._parse(msg) == 5000
|
||||
|
||||
def test_available_tokens_natural_language(self):
|
||||
"""'available tokens: N' wording (no underscore)."""
|
||||
msg = "max_tokens must be at most 10000 given your prompt (available tokens: 10000)"
|
||||
assert self._parse(msg) == 10000
|
||||
|
||||
def test_single_token_available(self):
|
||||
"""Edge case: only 1 token left."""
|
||||
msg = "max_tokens: 9999 > context_window: 10000 - input_tokens: 9999 = available_tokens: 1"
|
||||
assert self._parse(msg) == 1
|
||||
|
||||
# ── Should NOT detect (returns None) ─────────────────────────────────
|
||||
|
||||
def test_prompt_too_long_is_not_output_cap_error(self):
|
||||
"""'prompt is too long' errors must NOT be caught — they need context halving."""
|
||||
msg = "prompt is too long: 205000 tokens > 200000 maximum"
|
||||
assert self._parse(msg) is None
|
||||
|
||||
def test_generic_context_window_exceeded(self):
|
||||
"""Generic context window errors without available_tokens should not match."""
|
||||
msg = "context window exceeded: maximum is 32768 tokens"
|
||||
assert self._parse(msg) is None
|
||||
|
||||
def test_context_length_exceeded(self):
|
||||
msg = "context_length_exceeded: prompt has 131073 tokens, limit is 131072"
|
||||
assert self._parse(msg) is None
|
||||
|
||||
def test_no_max_tokens_keyword(self):
|
||||
"""Error not related to max_tokens at all."""
|
||||
msg = "invalid_api_key: the API key is invalid"
|
||||
assert self._parse(msg) is None
|
||||
|
||||
def test_empty_string(self):
|
||||
assert self._parse("") is None
|
||||
|
||||
def test_rate_limit_error(self):
|
||||
msg = "rate_limit_error: too many requests per minute"
|
||||
assert self._parse(msg) is None
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# build_anthropic_kwargs — output cap clamping
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
class TestBuildAnthropicKwargsClamping:
|
||||
"""The context_length clamp only fires when output ceiling > window.
|
||||
For standard Anthropic models (output ceiling < window) it must not fire.
|
||||
"""
|
||||
|
||||
def _build(self, model, max_tokens=None, context_length=None):
|
||||
from agent.anthropic_adapter import build_anthropic_kwargs
|
||||
return build_anthropic_kwargs(
|
||||
model=model,
|
||||
messages=[{"role": "user", "content": "hi"}],
|
||||
tools=None,
|
||||
max_tokens=max_tokens,
|
||||
reasoning_config=None,
|
||||
context_length=context_length,
|
||||
)
|
||||
|
||||
def test_no_clamping_when_output_ceiling_fits_in_window(self):
|
||||
"""Opus 4.6 native output (128K) < context window (200K) — no clamping."""
|
||||
kwargs = self._build("claude-opus-4-6", context_length=200_000)
|
||||
assert kwargs["max_tokens"] == 128_000
|
||||
|
||||
def test_clamping_fires_for_tiny_custom_window(self):
|
||||
"""When context_length is 8K (local model), output cap is clamped to 7999."""
|
||||
kwargs = self._build("claude-opus-4-6", context_length=8_000)
|
||||
assert kwargs["max_tokens"] == 7_999
|
||||
|
||||
def test_explicit_max_tokens_respected_when_within_window(self):
|
||||
"""Explicit max_tokens smaller than window passes through unchanged."""
|
||||
kwargs = self._build("claude-opus-4-6", max_tokens=4096, context_length=200_000)
|
||||
assert kwargs["max_tokens"] == 4096
|
||||
|
||||
def test_explicit_max_tokens_clamped_when_exceeds_window(self):
|
||||
"""Explicit max_tokens larger than a small window is clamped."""
|
||||
kwargs = self._build("claude-opus-4-6", max_tokens=32_768, context_length=16_000)
|
||||
assert kwargs["max_tokens"] == 15_999
|
||||
|
||||
def test_no_context_length_uses_native_ceiling(self):
|
||||
"""Without context_length the native output ceiling is used directly."""
|
||||
kwargs = self._build("claude-sonnet-4-6")
|
||||
assert kwargs["max_tokens"] == 64_000
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Ephemeral max_tokens mechanism — _build_api_kwargs
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
class TestEphemeralMaxOutputTokens:
|
||||
"""_build_api_kwargs consumes _ephemeral_max_output_tokens exactly once
|
||||
and falls back to self.max_tokens on subsequent calls.
|
||||
"""
|
||||
|
||||
def _make_agent(self):
|
||||
"""Return a minimal AIAgent with api_mode='anthropic_messages' and
|
||||
a stubbed context_compressor, bypassing full __init__ cost."""
|
||||
from run_agent import AIAgent
|
||||
agent = object.__new__(AIAgent)
|
||||
# Minimal attributes used by _build_api_kwargs
|
||||
agent.api_mode = "anthropic_messages"
|
||||
agent.model = "claude-opus-4-6"
|
||||
agent.tools = []
|
||||
agent.max_tokens = None
|
||||
agent.reasoning_config = None
|
||||
agent._is_anthropic_oauth = False
|
||||
agent._ephemeral_max_output_tokens = None
|
||||
|
||||
compressor = MagicMock()
|
||||
compressor.context_length = 200_000
|
||||
agent.context_compressor = compressor
|
||||
|
||||
# Stub out the internal message-preparation helper
|
||||
agent._prepare_anthropic_messages_for_api = MagicMock(
|
||||
return_value=[{"role": "user", "content": "hi"}]
|
||||
)
|
||||
agent._anthropic_preserve_dots = MagicMock(return_value=False)
|
||||
return agent
|
||||
|
||||
def test_ephemeral_override_is_used_on_first_call(self):
|
||||
"""When _ephemeral_max_output_tokens is set, it overrides self.max_tokens."""
|
||||
agent = self._make_agent()
|
||||
agent._ephemeral_max_output_tokens = 5_000
|
||||
|
||||
kwargs = agent._build_api_kwargs([{"role": "user", "content": "hi"}])
|
||||
assert kwargs["max_tokens"] == 5_000
|
||||
|
||||
def test_ephemeral_override_is_consumed_after_one_call(self):
|
||||
"""After one call the ephemeral override is cleared to None."""
|
||||
agent = self._make_agent()
|
||||
agent._ephemeral_max_output_tokens = 5_000
|
||||
|
||||
agent._build_api_kwargs([{"role": "user", "content": "hi"}])
|
||||
assert agent._ephemeral_max_output_tokens is None
|
||||
|
||||
def test_subsequent_call_uses_self_max_tokens(self):
|
||||
"""A second _build_api_kwargs call uses the normal max_tokens path."""
|
||||
agent = self._make_agent()
|
||||
agent._ephemeral_max_output_tokens = 5_000
|
||||
agent.max_tokens = None # will resolve to native ceiling (128K for Opus 4.6)
|
||||
|
||||
agent._build_api_kwargs([{"role": "user", "content": "hi"}])
|
||||
# Second call — ephemeral is gone
|
||||
kwargs2 = agent._build_api_kwargs([{"role": "user", "content": "hi"}])
|
||||
assert kwargs2["max_tokens"] == 128_000 # Opus 4.6 native ceiling
|
||||
|
||||
def test_no_ephemeral_uses_self_max_tokens_directly(self):
|
||||
"""Without an ephemeral override, self.max_tokens is used normally."""
|
||||
agent = self._make_agent()
|
||||
agent.max_tokens = 8_192
|
||||
|
||||
kwargs = agent._build_api_kwargs([{"role": "user", "content": "hi"}])
|
||||
assert kwargs["max_tokens"] == 8_192
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Integration: error handler does NOT halve context_length for output-cap errors
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
class TestContextNotHalvedOnOutputCapError:
|
||||
"""When the API returns 'max_tokens too large given prompt', the handler
|
||||
must set _ephemeral_max_output_tokens and NOT modify context_length.
|
||||
"""
|
||||
|
||||
def _make_agent_with_compressor(self, context_length=200_000):
|
||||
from run_agent import AIAgent
|
||||
from agent.context_compressor import ContextCompressor
|
||||
|
||||
agent = object.__new__(AIAgent)
|
||||
agent.api_mode = "anthropic_messages"
|
||||
agent.model = "claude-opus-4-6"
|
||||
agent.base_url = "https://api.anthropic.com"
|
||||
agent.tools = []
|
||||
agent.max_tokens = None
|
||||
agent.reasoning_config = None
|
||||
agent._is_anthropic_oauth = False
|
||||
agent._ephemeral_max_output_tokens = None
|
||||
agent.log_prefix = ""
|
||||
agent.quiet_mode = True
|
||||
agent.verbose_logging = False
|
||||
|
||||
compressor = MagicMock(spec=ContextCompressor)
|
||||
compressor.context_length = context_length
|
||||
compressor.threshold_percent = 0.75
|
||||
agent.context_compressor = compressor
|
||||
|
||||
agent._prepare_anthropic_messages_for_api = MagicMock(
|
||||
return_value=[{"role": "user", "content": "hi"}]
|
||||
)
|
||||
agent._anthropic_preserve_dots = MagicMock(return_value=False)
|
||||
agent._vprint = MagicMock()
|
||||
return agent
|
||||
|
||||
def test_output_cap_error_sets_ephemeral_not_context_length(self):
|
||||
"""On 'max_tokens too large' error, _ephemeral_max_output_tokens is set
|
||||
and compressor.context_length is left unchanged."""
|
||||
from agent.model_metadata import parse_available_output_tokens_from_error
|
||||
from agent.model_metadata import get_next_probe_tier
|
||||
|
||||
error_msg = (
|
||||
"max_tokens: 128000 > context_window: 200000 "
|
||||
"- input_tokens: 180000 = available_tokens: 20000"
|
||||
)
|
||||
|
||||
# Simulate the handler logic from run_agent.py
|
||||
agent = self._make_agent_with_compressor(context_length=200_000)
|
||||
old_ctx = agent.context_compressor.context_length
|
||||
|
||||
available_out = parse_available_output_tokens_from_error(error_msg)
|
||||
assert available_out == 20_000, "parser must detect the error"
|
||||
|
||||
# The fix: set ephemeral, skip context_length modification
|
||||
agent._ephemeral_max_output_tokens = max(1, available_out - 64)
|
||||
|
||||
# context_length must be untouched
|
||||
assert agent.context_compressor.context_length == old_ctx
|
||||
assert agent._ephemeral_max_output_tokens == 19_936
|
||||
|
||||
def test_prompt_too_long_still_triggers_probe_tier(self):
|
||||
"""Genuine prompt-too-long errors must still use get_next_probe_tier."""
|
||||
from agent.model_metadata import parse_available_output_tokens_from_error
|
||||
from agent.model_metadata import get_next_probe_tier
|
||||
|
||||
error_msg = "prompt is too long: 205000 tokens > 200000 maximum"
|
||||
|
||||
available_out = parse_available_output_tokens_from_error(error_msg)
|
||||
assert available_out is None, "prompt-too-long must not be caught by output-cap parser"
|
||||
|
||||
# The old halving path is still used for this class of error
|
||||
new_ctx = get_next_probe_tier(200_000)
|
||||
assert new_ctx == 128_000
|
||||
|
||||
def test_output_cap_error_safety_margin(self):
|
||||
"""The ephemeral value includes a 64-token safety margin below available_out."""
|
||||
from agent.model_metadata import parse_available_output_tokens_from_error
|
||||
|
||||
error_msg = (
|
||||
"max_tokens: 32768 > context_window: 200000 "
|
||||
"- input_tokens: 190000 = available_tokens: 10000"
|
||||
)
|
||||
available_out = parse_available_output_tokens_from_error(error_msg)
|
||||
safe_out = max(1, available_out - 64)
|
||||
assert safe_out == 9_936
|
||||
|
||||
def test_safety_margin_never_goes_below_one(self):
|
||||
"""When available_out is very small, safe_out must be at least 1."""
|
||||
from agent.model_metadata import parse_available_output_tokens_from_error
|
||||
|
||||
error_msg = (
|
||||
"max_tokens: 10 > context_window: 200000 "
|
||||
"- input_tokens: 199990 = available_tokens: 1"
|
||||
)
|
||||
available_out = parse_available_output_tokens_from_error(error_msg)
|
||||
safe_out = max(1, available_out - 64)
|
||||
assert safe_out == 1
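
The parsing and safety-margin behaviour these tests describe can be sketched as follows. This is a minimal sketch under stated assumptions, not the project's implementation: `parse_available_tokens`, `safe_output_cap`, and the regular expression are hypothetical stand-ins chosen to match the message formats exercised above. Only the 64-token margin and the floor of 1 come from the tests themselves.

```python
import re
from typing import Optional

# Hypothetical pattern: matches "available_tokens: N" and "available tokens: N",
# tolerating extra whitespace around the colon.
_AVAILABLE_RE = re.compile(r"available[_ ]tokens\s*:\s*(\d+)", re.IGNORECASE)


def parse_available_tokens(message: str) -> Optional[int]:
    """Return the advertised output budget, or None if the message has none."""
    match = _AVAILABLE_RE.search(message)
    return int(match.group(1)) if match else None


def safe_output_cap(available: int, margin: int = 64) -> int:
    """Leave a small margin below the budget, but never go below 1 token."""
    return max(1, available - margin)


msg = (
    "max_tokens: 32768 > context_window: 200000 "
    "- input_tokens: 190000 = available_tokens: 10000"
)
budget = parse_available_tokens(msg)
if budget is not None:
    print(safe_output_cap(budget))  # 9936
```

A message such as "prompt is too long: 205000 tokens > 200000 maximum" carries no available-token figure, so the sketch returns `None` and the caller can fall through to the context-reduction path.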
|
||||
@@ -63,4 +63,4 @@ class TestCamofoxConfigDefaults:
|
||||
from hermes_cli.config import DEFAULT_CONFIG
|
||||
|
||||
# managed_persistence is auto-merged by _deep_merge, no version bump needed
|
||||
-        assert DEFAULT_CONFIG["_config_version"] == 12
+        assert DEFAULT_CONFIG["_config_version"] == 13
|
||||
|
||||
@@ -258,28 +258,30 @@ def _make_execute_only_env(forward_env=None):
|
||||
|
||||
 def test_init_env_args_uses_hermes_dotenv_for_allowlisted_env(monkeypatch):
     """_build_init_env_args picks up forwarded env vars from .env file at init time."""
-    env = _make_execute_only_env(["GITHUB_TOKEN"])
+    # Use a var that is NOT in _HERMES_PROVIDER_ENV_BLOCKLIST (GITHUB_TOKEN
+    # is in the copilot provider's api_key_env_vars and gets stripped).
+    env = _make_execute_only_env(["DATABASE_URL"])

-    monkeypatch.delenv("GITHUB_TOKEN", raising=False)
-    monkeypatch.setattr(docker_env, "_load_hermes_env_vars", lambda: {"GITHUB_TOKEN": "value_from_dotenv"})
+    monkeypatch.delenv("DATABASE_URL", raising=False)
+    monkeypatch.setattr(docker_env, "_load_hermes_env_vars", lambda: {"DATABASE_URL": "value_from_dotenv"})

     args = env._build_init_env_args()
     args_str = " ".join(args)

-    assert "GITHUB_TOKEN=value_from_dotenv" in args_str
+    assert "DATABASE_URL=value_from_dotenv" in args_str


 def test_init_env_args_prefers_shell_env_over_hermes_dotenv(monkeypatch):
     """Shell env vars take priority over .env file values in init env args."""
-    env = _make_execute_only_env(["GITHUB_TOKEN"])
+    env = _make_execute_only_env(["DATABASE_URL"])

-    monkeypatch.setenv("GITHUB_TOKEN", "value_from_shell")
-    monkeypatch.setattr(docker_env, "_load_hermes_env_vars", lambda: {"GITHUB_TOKEN": "value_from_dotenv"})
+    monkeypatch.setenv("DATABASE_URL", "value_from_shell")
+    monkeypatch.setattr(docker_env, "_load_hermes_env_vars", lambda: {"DATABASE_URL": "value_from_dotenv"})

     args = env._build_init_env_args()
     args_str = " ".join(args)

-    assert "GITHUB_TOKEN=value_from_shell" in args_str
+    assert "DATABASE_URL=value_from_shell" in args_str
     assert "value_from_dotenv" not in args_str
|
||||
|
||||
|
||||
|
||||
@@ -147,7 +147,7 @@ class TestBaseEnvCompatibility:
|
||||
"""Hermes wires parser selection through ServerManager.tool_parser."""
|
||||
import ast
|
||||
|
||||
-        base_env_path = Path(__file__).parent.parent / "environments" / "hermes_base_env.py"
+        base_env_path = Path(__file__).parent.parent.parent / "environments" / "hermes_base_env.py"
|
||||
source = base_env_path.read_text()
|
||||
tree = ast.parse(source)
|
||||
|
||||
@@ -171,7 +171,7 @@ class TestBaseEnvCompatibility:
|
||||
|
||||
def test_hermes_base_env_uses_config_tool_call_parser(self):
|
||||
"""Verify hermes_base_env uses the config field rather than a local parser instance."""
|
||||
-        base_env_path = Path(__file__).parent.parent / "environments" / "hermes_base_env.py"
+        base_env_path = Path(__file__).parent.parent.parent / "environments" / "hermes_base_env.py"
|
||||
source = base_env_path.read_text()
|
||||
|
||||
assert 'tool_call_parser: str = Field(' in source
|
||||
|
||||
@@ -125,7 +125,9 @@ class TestSendMatrix:
|
||||
url = call_kwargs[0][0]
|
||||
assert url.startswith("https://matrix.example.com/_matrix/client/v3/rooms/!room:example.com/send/m.room.message/")
|
||||
assert call_kwargs[1]["headers"]["Authorization"] == "Bearer syt_tok"
|
||||
assert call_kwargs[1]["json"] == {"msgtype": "m.text", "body": "hello matrix"}
|
||||
payload = call_kwargs[1]["json"]
|
||||
assert payload["msgtype"] == "m.text"
|
||||
assert payload["body"] == "hello matrix"
|
||||
|
||||
def test_http_error(self):
|
||||
resp = _make_aiohttp_resp(403, text_data="Forbidden")
|
||||
|
||||
@@ -0,0 +1,90 @@
|
||||
"""Regression tests for sudo detection and sudo password handling."""
|
||||
|
||||
import tools.terminal_tool as terminal_tool
|
||||
|
||||
|
||||
def setup_function():
|
||||
terminal_tool._cached_sudo_password = ""
|
||||
|
||||
|
||||
def teardown_function():
|
||||
terminal_tool._cached_sudo_password = ""
|
||||
|
||||
|
||||
def test_searching_for_sudo_does_not_trigger_rewrite(monkeypatch):
|
||||
monkeypatch.delenv("SUDO_PASSWORD", raising=False)
|
||||
monkeypatch.delenv("HERMES_INTERACTIVE", raising=False)
|
||||
|
||||
command = "rg --line-number --no-heading --with-filename 'sudo' . | head -n 20"
|
||||
transformed, sudo_stdin = terminal_tool._transform_sudo_command(command)
|
||||
|
||||
assert transformed == command
|
||||
assert sudo_stdin is None
|
||||
|
||||
|
||||
def test_printf_literal_sudo_does_not_trigger_rewrite(monkeypatch):
|
||||
monkeypatch.delenv("SUDO_PASSWORD", raising=False)
|
||||
monkeypatch.delenv("HERMES_INTERACTIVE", raising=False)
|
||||
|
||||
command = "printf '%s\\n' sudo"
|
||||
transformed, sudo_stdin = terminal_tool._transform_sudo_command(command)
|
||||
|
||||
assert transformed == command
|
||||
assert sudo_stdin is None
|
||||
|
||||
|
||||
def test_non_command_argument_named_sudo_does_not_trigger_rewrite(monkeypatch):
|
||||
monkeypatch.delenv("SUDO_PASSWORD", raising=False)
|
||||
monkeypatch.delenv("HERMES_INTERACTIVE", raising=False)
|
||||
|
||||
command = "grep -n sudo README.md"
|
||||
transformed, sudo_stdin = terminal_tool._transform_sudo_command(command)
|
||||
|
||||
assert transformed == command
|
||||
assert sudo_stdin is None
|
||||
|
||||
|
||||
def test_actual_sudo_command_uses_configured_password(monkeypatch):
|
||||
monkeypatch.setenv("SUDO_PASSWORD", "testpass")
|
||||
monkeypatch.delenv("HERMES_INTERACTIVE", raising=False)
|
||||
|
||||
transformed, sudo_stdin = terminal_tool._transform_sudo_command("sudo apt install -y ripgrep")
|
||||
|
||||
assert transformed == "sudo -S -p '' apt install -y ripgrep"
|
||||
assert sudo_stdin == "testpass\n"
|
||||
|
||||
|
||||
def test_actual_sudo_after_leading_env_assignment_is_rewritten(monkeypatch):
|
||||
monkeypatch.setenv("SUDO_PASSWORD", "testpass")
|
||||
monkeypatch.delenv("HERMES_INTERACTIVE", raising=False)
|
||||
|
||||
transformed, sudo_stdin = terminal_tool._transform_sudo_command("DEBUG=1 sudo whoami")
|
||||
|
||||
assert transformed == "DEBUG=1 sudo -S -p '' whoami"
|
||||
assert sudo_stdin == "testpass\n"
|
||||
|
||||
|
||||
def test_explicit_empty_sudo_password_tries_empty_without_prompt(monkeypatch):
|
||||
monkeypatch.setenv("SUDO_PASSWORD", "")
|
||||
monkeypatch.setenv("HERMES_INTERACTIVE", "1")
|
||||
|
||||
def _fail_prompt(*_args, **_kwargs):
|
||||
raise AssertionError("interactive sudo prompt should not run for explicit empty password")
|
||||
|
||||
monkeypatch.setattr(terminal_tool, "_prompt_for_sudo_password", _fail_prompt)
|
||||
|
||||
transformed, sudo_stdin = terminal_tool._transform_sudo_command("sudo true")
|
||||
|
||||
assert transformed == "sudo -S -p '' true"
|
||||
assert sudo_stdin == "\n"
|
||||
|
||||
|
||||
def test_cached_sudo_password_is_used_when_env_is_unset(monkeypatch):
|
||||
monkeypatch.delenv("SUDO_PASSWORD", raising=False)
|
||||
monkeypatch.delenv("HERMES_INTERACTIVE", raising=False)
|
||||
terminal_tool._cached_sudo_password = "cached-pass"
|
||||
|
||||
transformed, sudo_stdin = terminal_tool._transform_sudo_command("echo ok && sudo whoami")
|
||||
|
||||
assert transformed == "echo ok && sudo -S -p '' whoami"
|
||||
assert sudo_stdin == "cached-pass\n"
|
||||
@@ -30,7 +30,10 @@ class TestValidateImageUrl:
|
||||
"""Tests for URL validation, including urlparse-based netloc check."""
|
||||
|
||||
def test_valid_https_url(self):
|
||||
-        assert _validate_image_url("https://example.com/image.jpg") is True
+        with patch("tools.url_safety.socket.getaddrinfo", return_value=[
+            (2, 1, 6, "", ("93.184.216.34", 0)),
+        ]):
+            assert _validate_image_url("https://example.com/image.jpg") is True
|
||||
|
||||
def test_valid_http_url(self):
|
||||
with patch("tools.url_safety.socket.getaddrinfo", return_value=[
|
||||
@@ -56,10 +59,16 @@ class TestValidateImageUrl:
|
||||
assert _validate_image_url("http://localhost:8080/image.png") is False
|
||||
|
||||
def test_valid_url_with_port(self):
|
||||
-        assert _validate_image_url("http://example.com:8080/image.png") is True
+        with patch("tools.url_safety.socket.getaddrinfo", return_value=[
+            (2, 1, 6, "", ("93.184.216.34", 0)),
+        ]):
+            assert _validate_image_url("http://example.com:8080/image.png") is True
|
||||
|
||||
def test_valid_url_with_path_only(self):
|
||||
-        assert _validate_image_url("https://example.com/") is True
+        with patch("tools.url_safety.socket.getaddrinfo", return_value=[
+            (2, 1, 6, "", ("93.184.216.34", 0)),
+        ]):
+            assert _validate_image_url("https://example.com/") is True
|
||||
|
||||
def test_rejects_empty_string(self):
|
||||
assert _validate_image_url("") is False
|
||||
@@ -441,6 +450,11 @@ class TestVisionRequirements:
|
||||
(tmp_path / "auth.json").write_text(
|
||||
'{"active_provider":"openai-codex","providers":{"openai-codex":{"tokens":{"access_token":"codex-access-token","refresh_token":"codex-refresh-token"}}}}'
|
||||
)
|
||||
# config.yaml must reference the codex provider so vision auto-detect
|
||||
# falls back to the active provider via _read_main_provider().
|
||||
(tmp_path / "config.yaml").write_text(
|
||||
'model:\n default: gpt-4o\n provider: openai-codex\n'
|
||||
)
|
||||
monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
|
||||
monkeypatch.delenv("OPENAI_BASE_URL", raising=False)
|
||||
monkeypatch.delenv("OPENAI_API_KEY", raising=False)
|
||||
|
||||
@@ -225,6 +225,7 @@ class TestWebCrawlTavily:
|
||||
patch.dict(os.environ, {"TAVILY_API_KEY": "tvly-test"}), \
|
||||
patch("tools.web_tools.httpx.post", return_value=mock_response), \
|
||||
patch("tools.web_tools.check_website_access", return_value=None), \
|
||||
patch("tools.web_tools.is_safe_url", return_value=True), \
|
||||
patch("tools.interrupt.is_interrupted", return_value=False):
|
||||
from tools.web_tools import web_crawl_tool
|
||||
result = json.loads(asyncio.get_event_loop().run_until_complete(
|
||||
@@ -244,6 +245,7 @@ class TestWebCrawlTavily:
|
||||
patch.dict(os.environ, {"TAVILY_API_KEY": "tvly-test"}), \
|
||||
patch("tools.web_tools.httpx.post", return_value=mock_response) as mock_post, \
|
||||
patch("tools.web_tools.check_website_access", return_value=None), \
|
||||
patch("tools.web_tools.is_safe_url", return_value=True), \
|
||||
patch("tools.interrupt.is_interrupted", return_value=False):
|
||||
from tools.web_tools import web_crawl_tool
|
||||
asyncio.get_event_loop().run_until_complete(
|
||||
|
||||
+117
-28
@@ -326,7 +326,6 @@ def _prompt_for_sudo_password(timeout_seconds: int = 45) -> str:
|
||||
if "HERMES_SPINNER_PAUSE" in os.environ:
|
||||
del os.environ["HERMES_SPINNER_PAUSE"]
|
||||
|
||||
|
||||
def _safe_command_preview(command: Any, limit: int = 200) -> str:
|
||||
"""Return a log-safe preview for possibly-invalid command values."""
|
||||
if command is None:
|
||||
@@ -338,6 +337,110 @@ def _safe_command_preview(command: Any, limit: int = 200) -> str:
|
||||
except Exception:
|
||||
return f"<{type(command).__name__}>"
|
||||
|
||||
def _looks_like_env_assignment(token: str) -> bool:
    """Return True when *token* is a leading shell environment assignment."""
    if "=" not in token or token.startswith("="):
        return False
    name, _value = token.split("=", 1)
    return bool(re.match(r"^[A-Za-z_][A-Za-z0-9_]*$", name))
|
||||
|
||||
|
||||
def _read_shell_token(command: str, start: int) -> tuple[str, int]:
    """Read one shell token, preserving quotes/escapes, starting at *start*."""
    i = start
    n = len(command)

    while i < n:
        ch = command[i]
        if ch.isspace() or ch in ";|&()":
            break
        if ch == "'":
            i += 1
            while i < n and command[i] != "'":
                i += 1
            if i < n:
                i += 1
            continue
        if ch == '"':
            i += 1
            while i < n:
                inner = command[i]
                if inner == "\\" and i + 1 < n:
                    i += 2
                    continue
                if inner == '"':
                    i += 1
                    break
                i += 1
            continue
        if ch == "\\" and i + 1 < n:
            i += 2
            continue
        i += 1

    return command[start:i], i
|
||||
|
||||
|
||||
def _rewrite_real_sudo_invocations(command: str) -> tuple[str, bool]:
    """Rewrite only real unquoted sudo command words, not plain text mentions."""
    out: list[str] = []
    i = 0
    n = len(command)
    command_start = True
    found = False

    while i < n:
        ch = command[i]

        if ch.isspace():
            out.append(ch)
            if ch == "\n":
                command_start = True
            i += 1
            continue

        if ch == "#" and command_start:
            comment_end = command.find("\n", i)
            if comment_end == -1:
                out.append(command[i:])
                break
            out.append(command[i:comment_end])
            i = comment_end
            continue

        if command.startswith("&&", i) or command.startswith("||", i) or command.startswith(";;", i):
            out.append(command[i:i + 2])
            i += 2
            command_start = True
            continue

        if ch in ";|&(":
            out.append(ch)
            i += 1
            command_start = True
            continue

        if ch == ")":
            out.append(ch)
            i += 1
            command_start = False
            continue

        token, next_i = _read_shell_token(command, i)
        if command_start and token == "sudo":
            out.append("sudo -S -p ''")
            found = True
        else:
            out.append(token)

        if command_start and _looks_like_env_assignment(token):
            command_start = True
        else:
            command_start = False
        i = next_i

    return "".join(out), found
|
||||
|
||||
|
||||
def _transform_sudo_command(command: str | None) -> tuple[str | None, str | None]:
|
||||
"""
|
||||
@@ -374,40 +477,26 @@ def _transform_sudo_command(command: str | None) -> tuple[str | None, str | None
|
||||
Command runs as-is (fails gracefully with "sudo: a password is required").
|
||||
"""
|
||||
global _cached_sudo_password
|
||||
import re
|
||||
|
||||
# Check if command even contains sudo
|
||||
if command is None:
|
||||
return None, None
|
||||
transformed, has_real_sudo = _rewrite_real_sudo_invocations(command)
|
||||
if not has_real_sudo:
|
||||
return command, None
|
||||
|
||||
if not re.search(r'\bsudo\b', command):
|
||||
return command, None # No sudo in command, nothing to do
|
||||
has_configured_password = "SUDO_PASSWORD" in os.environ
|
||||
sudo_password = os.environ.get("SUDO_PASSWORD", "") if has_configured_password else _cached_sudo_password
|
||||
|
||||
# Try to get password from: env var -> session cache -> interactive prompt
|
||||
sudo_password = os.getenv("SUDO_PASSWORD", "") or _cached_sudo_password
|
||||
if not has_configured_password and not sudo_password and os.getenv("HERMES_INTERACTIVE"):
|
||||
sudo_password = _prompt_for_sudo_password(timeout_seconds=45)
|
||||
if sudo_password:
|
||||
_cached_sudo_password = sudo_password
|
||||
|
||||
if not sudo_password:
|
||||
# No password configured - check if we're in interactive mode
|
||||
if os.getenv("HERMES_INTERACTIVE"):
|
||||
# Prompt user for password
|
||||
sudo_password = _prompt_for_sudo_password(timeout_seconds=45)
|
||||
if sudo_password:
|
||||
_cached_sudo_password = sudo_password # Cache for session
|
||||
if has_configured_password or sudo_password:
|
||||
# Trailing newline is required: sudo -S reads one line for the password.
|
||||
return transformed, sudo_password + "\n"
|
||||
|
||||
if not sudo_password:
|
||||
return command, None # No password, let it fail gracefully
|
||||
|
||||
def replace_sudo(match):
|
||||
# Replace bare 'sudo' with 'sudo -S -p ""'.
|
||||
# The password is returned as sudo_stdin and must be written to the
|
||||
# process's stdin pipe by the caller — it never appears in any
|
||||
# command-line argument or shell string.
|
||||
return "sudo -S -p ''"
|
||||
|
||||
# Match 'sudo' at word boundaries (not 'visudo' or 'sudoers')
|
||||
transformed = re.sub(r'\bsudo\b', replace_sudo, command)
|
||||
# Trailing newline is required: sudo -S reads one line for the password.
|
||||
return transformed, sudo_password + "\n"
|
||||
return command, None
|
||||
|
||||
|
||||
# Environment classes now live in tools/environments/
|
||||
|
||||
@@ -74,7 +74,7 @@ This module requires NixOS. For non-NixOS systems (macOS, other Linux distros),
|
||||
# /etc/nixos/flake.nix (or your system flake)
|
||||
{
|
||||
inputs = {
|
||||
-    nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.11";
+    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
|
||||
hermes-agent.url = "github:NousResearch/hermes-agent";
|
||||
};
|
||||
|
||||
|
||||
@@ -230,7 +230,7 @@ model:
|
||||
```
|
||||
|
||||
:::warning Legacy env vars
|
||||
-`OPENAI_BASE_URL` and `LLM_MODEL` in `.env` are **deprecated**. `OPENAI_BASE_URL` is no longer consulted for endpoint resolution; `config.yaml` is the single source of truth. The CLI ignores `LLM_MODEL` entirely (only the gateway reads it as a fallback). Use `hermes model` or edit `config.yaml` directly; both persist correctly across restarts and Docker containers.
+`OPENAI_BASE_URL` and `LLM_MODEL` in `.env` are **removed**. Neither is read by any part of Hermes; `config.yaml` is the single source of truth for model and endpoint configuration. If you have stale entries in your `.env`, they are automatically cleared on the next `hermes setup` or config migration. Use `hermes model` or edit `config.yaml` directly.
|
||||
:::
|
||||
|
||||
Both approaches persist to `config.yaml`, which is the source of truth for model, provider, and base URL.
|
||||
@@ -657,8 +657,8 @@ model:
|
||||
#### Responses get cut off mid-sentence
|
||||
|
||||
**Possible causes:**
|
||||
-1. **Low `max_tokens` on the server**: SGLang defaults to 128 tokens per response. Set `--default-max-tokens` on the server or configure Hermes with `model.max_tokens` in config.yaml.
-2. **Context exhaustion**: The model filled its context window. Increase context length or enable [context compression](/docs/user-guide/configuration#context-compression) in Hermes.
+1. **Low output cap (`max_tokens`) on the server**: SGLang defaults to 128 tokens per response. Set `--default-max-tokens` on the server or configure Hermes with `model.max_tokens` in config.yaml. Note: `max_tokens` controls response length only; it is unrelated to how long your conversation history can be (that is `context_length`).
+2. **Context exhaustion**: The model filled its context window. Increase `model.context_length` or enable [context compression](/docs/user-guide/configuration#context-compression) in Hermes.
|
||||
|
||||
---
|
||||
|
||||
@@ -751,6 +751,15 @@ model:
|
||||
|
||||
### Context Length Detection
|
||||
|
||||
:::note Two settings, easy to confuse
|
||||
**`context_length`** is the **total context window** — the combined budget for input *and* output tokens (e.g. 200,000 for Claude Opus 4.6). Hermes uses this to decide when to compress history and to validate API requests.
|
||||
|
||||
**`model.max_tokens`** is the **output cap**: the maximum number of tokens the model may generate in a *single response*. It has nothing to do with how long your conversation history can be. The industry-standard name `max_tokens` is a common source of confusion; some newer APIs, such as OpenAI's Responses API, call the same setting `max_output_tokens` for clarity.
|
||||
|
||||
Set `context_length` when auto-detection gets the window size wrong.
|
||||
Set `model.max_tokens` only when you need to limit how long individual responses can be.
|
||||
:::
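The relationship between the two settings can be shown with a little arithmetic. The snippet below is only an illustrative sketch: `input_budget` is a hypothetical helper, not a Hermes function, and it assumes the provider reserves the full `max_tokens` out of the total window.

```python
def input_budget(context_length: int, max_tokens: int) -> int:
    """Tokens left for the prompt and history once the output cap is reserved.

    Hypothetical helper for illustration; Hermes's real compression logic
    may compute its thresholds differently.
    """
    if max_tokens >= context_length:
        raise ValueError("max_tokens must be smaller than context_length")
    return context_length - max_tokens


# A 200,000-token window with an 8,192-token output cap leaves
# 191,808 tokens for system prompt, history, and tool results.
print(input_budget(200_000, 8_192))
```

Raising `max_tokens` therefore shrinks the room available for history; it never enlarges the window.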
|
||||
|
||||
Hermes uses a multi-source resolution chain to detect the correct context window for your model and provider:
|
||||
|
||||
1. **Config override** — `model.context_length` in config.yaml (highest priority)
|
||||
|
||||
@@ -43,6 +43,8 @@ hermes [global-options] <command> [subcommand/options]
|
||||
| `hermes cron` | Inspect and tick the cron scheduler. |
|
||||
| `hermes webhook` | Manage dynamic webhook subscriptions for event-driven activation. |
|
||||
| `hermes doctor` | Diagnose config and dependency issues. |
|
||||
| `hermes dump` | Copy-pasteable setup summary for support/debugging. |
|
||||
| `hermes logs` | View, tail, and filter agent/gateway/error log files. |
|
||||
| `hermes config` | Show, edit, migrate, and query configuration files. |
|
||||
| `hermes pairing` | Approve or revoke messaging pairing codes. |
|
||||
| `hermes skills` | Browse, install, publish, audit, and configure skills. |
|
||||
@@ -272,6 +274,149 @@ hermes doctor [--fix]
|
||||
|--------|-------------|
|
||||
| `--fix` | Attempt automatic repairs where possible. |
|
||||
|
||||
## `hermes dump`
|
||||
|
||||
```bash
|
||||
hermes dump [--show-keys]
|
||||
```
|
||||
|
||||
Outputs a compact, plain-text summary of your entire Hermes setup. Designed to be copy-pasted into Discord, GitHub issues, or Telegram when asking for support — no ANSI colors, no special formatting, just data.
|
||||
|
||||
| Option | Description |
|
||||
|--------|-------------|
|
||||
| `--show-keys` | Show a redacted form of each API key (first and last 4 characters only) instead of just `set`/`not set`. |
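As a rough illustration of the redaction described for `--show-keys`, the sketch below keeps the first and last 4 characters of a key. The function name `redact_key`, the `...` separator, and the short-key fallback are assumptions made for this example, not Hermes's actual implementation.

```python
from typing import Optional


def redact_key(key: Optional[str], show: bool = False) -> str:
    """Summarise an API key without exposing it (hypothetical helper)."""
    if not key:
        return "not set"
    if not show:
        return "set"
    # Assumption: keys of 8 characters or fewer would be fully exposed by
    # showing 4 + 4 characters, so fall back to the plain presence marker.
    if len(key) <= 8:
        return "set"
    return f"{key[:4]}...{key[-4:]}"


print(redact_key("sk-or-v1-abcdef123456", show=True))  # sk-o...3456
print(redact_key("sk-or-v1-abcdef123456"))             # set
print(redact_key(None))                                # not set
```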
|
||||
|
||||
### What it includes
|
||||
|
||||
| Section | Details |
|
||||
|---------|---------|
|
||||
| **Header** | Hermes version, release date, git commit hash |
|
||||
| **Environment** | OS, Python version, OpenAI SDK version |
|
||||
| **Identity** | Active profile name, HERMES_HOME path |
|
||||
| **Model** | Configured default model and provider |
|
||||
| **Terminal** | Backend type (local, docker, ssh, etc.) |
|
||||
| **API keys** | Presence check for all 22 provider/tool API keys |
|
||||
| **Features** | Enabled toolsets, MCP server count, memory provider |
|
||||
| **Services** | Gateway status, configured messaging platforms |
|
||||
| **Workload** | Cron job counts, installed skill count |
|
||||
| **Config overrides** | Any config values that differ from defaults |
|
||||
|
||||
### Example output
|
||||
|
||||
```
|
||||
--- hermes dump ---
|
||||
version: 0.8.0 (2026.4.8) [af4abd2f]
|
||||
os: Linux 6.14.0-37-generic x86_64
|
||||
python: 3.11.14
|
||||
openai_sdk: 2.24.0
|
||||
profile: default
|
||||
hermes_home: ~/.hermes
|
||||
model: anthropic/claude-opus-4.6
|
||||
provider: openrouter
|
||||
terminal: local
|
||||
|
||||
api_keys:
|
||||
openrouter set
|
||||
openai not set
|
||||
anthropic set
|
||||
nous not set
|
||||
firecrawl set
|
||||
...
|
||||
|
||||
features:
|
||||
toolsets: all
|
||||
mcp_servers: 0
|
||||
memory_provider: built-in
|
||||
gateway: running (systemd)
|
||||
platforms: telegram, discord
|
||||
cron_jobs: 3 active / 5 total
|
||||
skills: 42
|
||||
|
||||
config_overrides:
|
||||
agent.max_turns: 250
|
||||
compression.threshold: 0.85
|
||||
display.streaming: True
|
||||
--- end dump ---
|
||||
```
|
||||
|
||||
### When to use
|
||||
|
||||
- Reporting a bug on GitHub — paste the dump into your issue
|
||||
- Asking for help in Discord — share it in a code block
|
||||
- Comparing your setup to someone else's
|
||||
- Quick sanity check when something isn't working
|
||||
|
||||
:::tip
|
||||
`hermes dump` is specifically designed for sharing. For interactive diagnostics, use `hermes doctor`. For a visual overview, use `hermes status`.
|
||||
:::
|
||||
|
||||
## `hermes logs`
|
||||
|
||||
```bash
|
||||
hermes logs [log_name] [options]
|
||||
```
|
||||
|
||||
View, tail, and filter Hermes log files. All logs are stored in `~/.hermes/logs/` (or `<profile>/logs/` for non-default profiles).
|
||||
|
||||
### Log files
|
||||
|
||||
| Name | File | What it captures |
|
||||
|------|------|-----------------|
|
||||
| `agent` (default) | `agent.log` | All agent activity — API calls, tool dispatch, session lifecycle (INFO and above) |
|
||||
| `errors` | `errors.log` | Warnings and errors only — a filtered subset of agent.log |
|
||||
| `gateway` | `gateway.log` | Messaging gateway activity — platform connections, message dispatch, webhook events |
|
||||
|
||||
### Options
|
||||
|
||||
| Option | Description |
|
||||
|--------|-------------|
|
||||
| `log_name` | Which log to view: `agent` (default), `errors`, `gateway`, or `list` to show available files with sizes. |
|
||||
| `-n`, `--lines <N>` | Number of lines to show (default: 50). |
|
||||
| `-f`, `--follow` | Follow the log in real time, like `tail -f`. Press Ctrl+C to stop. |
|
||||
| `--level <LEVEL>` | Minimum log level to show: `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`. |
|
||||
| `--session <ID>` | Filter lines containing a session ID substring. |
|
||||
| `--since <TIME>` | Show only lines newer than a relative time: `30m`, `1h`, `2d`, etc. Supports `s` (seconds), `m` (minutes), `h` (hours), `d` (days). |
|
||||
|
||||
### Examples
|
||||
|
||||
```bash
|
||||
# View the last 50 lines of agent.log (default)
|
||||
hermes logs
|
||||
|
||||
# Follow agent.log in real time
|
||||
hermes logs -f
|
||||
|
||||
# View the last 100 lines of gateway.log
|
||||
hermes logs gateway -n 100
|
||||
|
||||
# Show only warnings and errors from the last hour
|
||||
hermes logs --level WARNING --since 1h
|
||||
|
||||
# Filter by a specific session
|
||||
hermes logs --session abc123
|
||||
|
||||
# Follow errors.log, starting from 30 minutes ago
|
||||
hermes logs errors --since 30m -f
|
||||
|
||||
# List all log files with their sizes
|
||||
hermes logs list
|
||||
```
|
||||
|
||||
### Filtering
|
||||
|
||||
Filters can be combined. When multiple filters are active, a log line must pass **all** of them to be shown:
|
||||
|
||||
```bash
|
||||
# WARNING+ lines from the last 2 hours containing session "tg-12345"
|
||||
hermes logs --level WARNING --since 2h --session tg-12345
|
||||
```
|
||||
|
||||
Lines without a parseable timestamp are included when `--since` is active (they may be continuation lines from a multi-line log entry). Lines without a detectable level are included when `--level` is active.
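The filtering rules above can be sketched in a few lines of Python. This is a minimal sketch, not the Hermes source: the helper names (`parse_since`, `keep_line`) are hypothetical, and it assumes log lines start with a `YYYY-MM-DD HH:MM:SS` timestamp followed by a standard level name.

```python
import re
from datetime import datetime, timedelta

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}
_LEVELS = {"DEBUG": 10, "INFO": 20, "WARNING": 30, "ERROR": 40, "CRITICAL": 50}
# Assumed line format: "2026-04-08 11:30:00,123 WARNING message..."
_TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")
_LEVEL_RE = re.compile(r"\b(DEBUG|INFO|WARNING|ERROR|CRITICAL)\b")


def parse_since(spec: str) -> timedelta:
    """Turn '30m', '1h', '2d' or '45s' into a timedelta."""
    m = re.fullmatch(r"(\d+)([smhd])", spec.strip())
    if not m:
        raise ValueError(f"unsupported --since value: {spec!r}")
    return timedelta(**{_UNITS[m.group(2)]: int(m.group(1))})


def keep_line(line, *, now, min_level=None, since=None, session=None):
    """Return True only if the line passes every active filter."""
    if session is not None and session not in line:
        return False
    if min_level is not None:
        m = _LEVEL_RE.search(line)
        # Lines with no detectable level are kept.
        if m and _LEVELS[m.group(1)] < _LEVELS[min_level]:
            return False
    if since is not None:
        m = _TS_RE.match(line)
        # Lines with no parseable timestamp (continuation lines) are kept.
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            if ts < now - since:
                return False
    return True
```

Each filter can only reject a line, so combining them behaves as a logical AND, which matches the "must pass all of them" rule.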
|
||||
|
||||
### Log rotation
|
||||
|
||||
Hermes uses Python's `RotatingFileHandler`. Old logs are rotated automatically — look for `agent.log.1`, `agent.log.2`, etc. The `hermes logs list` subcommand shows all log files including rotated ones.
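To see where the numbered files come from, here is a small self-contained demonstration of `RotatingFileHandler` from the standard library. The size limit (200 bytes), the backup count (2), and the logger name are arbitrary values chosen so that rotation happens quickly; they are assumptions for this example and not Hermes's actual settings.

```python
import logging
import tempfile
from logging.handlers import RotatingFileHandler
from pathlib import Path


def make_rotating_logger(log_dir: Path, max_bytes: int = 200, backups: int = 2) -> logging.Logger:
    """Create a logger that rotates agent.log once it would exceed max_bytes."""
    logger = logging.getLogger(f"rotation-demo-{log_dir}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep demo output out of the root logger
    handler = RotatingFileHandler(log_dir / "agent.log", maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


def demo() -> list[str]:
    """Write enough lines to force several rotations and list the resulting files."""
    with tempfile.TemporaryDirectory() as tmp:
        log_dir = Path(tmp)
        logger = make_rotating_logger(log_dir)
        for i in range(50):
            logger.info("message %d", i)
        # Close handlers before the temporary directory is removed.
        for handler in list(logger.handlers):
            handler.close()
            logger.removeHandler(handler)
        return sorted(p.name for p in log_dir.iterdir())


print(demo())
```

The newest entries are always in `agent.log`; `agent.log.1` holds the previous segment, and segments beyond `backupCount` are deleted.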
|
||||
|
||||
## `hermes config`
|
||||
|
||||
```bash
|
||||
|
||||
@@ -53,8 +53,7 @@ All variables go in `~/.hermes/.env`. You can also set them with `hermes config
|
||||
| `OPENCODE_GO_API_KEY` | OpenCode Go API key — $10/month subscription for open models ([opencode.ai](https://opencode.ai/auth)) |
|
||||
| `OPENCODE_GO_BASE_URL` | Override OpenCode Go base URL |
|
||||
| `CLAUDE_CODE_OAUTH_TOKEN` | Explicit Claude Code token override if you export one manually |
|
||||
| `HERMES_MODEL` | Preferred model name (checked before `LLM_MODEL`, used by gateway) |
|
||||
| `LLM_MODEL` | Default model name (fallback when not set in config.yaml) |
|
||||
| `HERMES_MODEL` | Override model name at process level (used by cron scheduler; prefer `config.yaml` for normal use) |
|
||||
| `VOICE_TOOLS_OPENAI_KEY` | Preferred OpenAI key for OpenAI speech-to-text and text-to-speech providers |
|
||||
| `HERMES_LOCAL_STT_COMMAND` | Optional local speech-to-text command template. Supports `{input_path}`, `{output_dir}`, `{language}`, and `{model}` placeholders |
|
||||
| `HERMES_LOCAL_STT_LANGUAGE` | Default language passed to `HERMES_LOCAL_STT_COMMAND` or auto-detected local `whisper` CLI fallback (default: `en`) |
|
||||
|
||||
@@ -46,7 +46,6 @@ Type `/` in the CLI to open the autocomplete menu. Built-in commands are case-in
|
||||
| `/config` | Show current configuration |
|
||||
| `/model [model-name]` | Show or change the current model. Supports: `/model claude-sonnet-4`, `/model provider:model` (switch providers), `/model custom:model` (custom endpoint), `/model custom:name:model` (named custom provider), `/model custom` (auto-detect from endpoint) |
|
||||
| `/provider` | Show available providers and current provider |
|
||||
| `/prompt` | View/set custom system prompt |
|
||||
| `/personality` | Set a predefined personality |
|
||||
| `/verbose` | Cycle tool progress display: off → new → all → verbose. Can be [enabled for messaging](#notes) via config. |
|
||||
| `/reasoning` | Manage reasoning effort and display (usage: /reasoning [level\|show\|hide]) |
|
||||
@@ -144,7 +143,7 @@ The messaging gateway supports the following built-in commands inside Telegram,
|
||||
|
||||
## Notes
|
||||
|
||||
- `/skin`, `/tools`, `/toolsets`, `/browser`, `/config`, `/prompt`, `/cron`, `/skills`, `/platforms`, `/paste`, `/statusbar`, and `/plugins` are **CLI-only** commands.
|
||||
- `/skin`, `/tools`, `/toolsets`, `/browser`, `/config`, `/cron`, `/skills`, `/platforms`, `/paste`, `/statusbar`, and `/plugins` are **CLI-only** commands.
|
||||
- `/verbose` is **CLI-only by default**, but can be enabled for messaging platforms by setting `display.tool_progress_command: true` in `config.yaml`. When enabled, it cycles the `display.tool_progress` mode and saves to config.
|
||||
- `/status`, `/sethome`, `/update`, `/approve`, `/deny`, and `/commands` are **messaging-only** commands.
|
||||
- `/background`, `/voice`, `/reload-mcp`, `/rollback`, and `/yolo` work in **both** the CLI and the messaging gateway.
|
||||
|
||||
@@ -43,17 +43,18 @@ BLUEBUBBLES_PASSWORD=your-server-password
|
||||
Choose one approach:
|
||||
|
||||
**DM Pairing (recommended):**
|
||||
When someone messages your iMessage, Hermes automatically sends them a pairing code. Approve it with:
|
||||
```bash
|
||||
hermes pairing generate bluebubbles
|
||||
hermes pairing approve bluebubbles <CODE>
|
||||
```
|
||||
Share the pairing code — the user sends it via iMessage to get approved.
|
||||
Use `hermes pairing list` to see pending codes and approved users.
|
||||
|
||||
**Pre-authorize specific users:**
|
||||
**Pre-authorize specific users** (in `~/.hermes/.env`):
|
||||
```bash
|
||||
BLUEBUBBLES_ALLOWED_USERS=user@icloud.com,+15551234567
|
||||
```
|
||||
|
||||
**Open access:**
|
||||
**Open access** (in `~/.hermes/.env`):
|
||||
```bash
|
||||
BLUEBUBBLES_ALLOW_ALL_USERS=true
|
||||
```
|
||||
|
||||