Compare commits

1 Commit

Author SHA1 Message Date
kshitijk4poor 44596731c8 refactor: unify transport dispatch + collapse normalize shims
Consolidate 4 per-transport lazy singleton helpers (_get_anthropic_transport,
_get_codex_transport, _get_chat_completions_transport, _get_bedrock_transport)
into one generic _get_transport(api_mode) with a shared dict cache.

Collapse the 65-line main normalize block (3 api_mode branches, each with
its own SimpleNamespace shim) into 7 lines: one _get_transport() call +
one _nr_to_assistant_message() shared shim. The shim extracts provider_data
fields (codex_reasoning_items, reasoning_details, call_id, response_item_id)
into the SimpleNamespace shape downstream code expects.

Wire chat_completions and bedrock_converse normalize through their transports
for the first time — these were previously falling into the raw
response.choices[0].message else branch.

Remove 8 dead codex adapter imports that have zero callers after PRs 1-6.

Transport lifecycle improvements:
- Eagerly warm transport cache at __init__ (surfaces import errors early)
- Invalidate transport cache on api_mode change (switch_model, fallback
  activation, fallback restore, transport recovery) — prevents stale
  transport after mid-session provider switch

run_agent.py: -32 net lines (11,988 -> 11,956).

PR 7 of the provider transport refactor.
2026-04-22 14:30:09 +05:30
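The consolidation described in the commit message (one generic `_get_transport(api_mode)` backed by a shared dict cache, warmed at `__init__` and invalidated when `api_mode` changes) can be sketched roughly as follows. This is a minimal illustration, not the code in `run_agent.py`: `TransportCache`, `DemoTransport`, and the factory table are hypothetical names, and the `api_mode` keys are placeholders.

```python
from typing import Callable, Dict


class DemoTransport:
    """Hypothetical stand-in for a real provider transport class."""

    def __init__(self, api_mode: str) -> None:
        self.api_mode = api_mode


# api_mode -> zero-argument factory. Keys are illustrative only; a real
# factory would probably import its transport module lazily.
DEMO_FACTORIES: Dict[str, Callable[[], DemoTransport]] = {
    "anthropic": lambda: DemoTransport("anthropic"),
    "codex": lambda: DemoTransport("codex"),
    "chat_completions": lambda: DemoTransport("chat_completions"),
    "bedrock_converse": lambda: DemoTransport("bedrock_converse"),
}


class TransportCache:
    """One shared cache instead of four per-transport lazy singletons."""

    def __init__(self, factories: Dict[str, Callable[[], DemoTransport]]) -> None:
        self._factories = dict(factories)
        self._cache: Dict[str, DemoTransport] = {}

    def get(self, api_mode: str) -> DemoTransport:
        """Return the cached transport for api_mode, building it on first use."""
        transport = self._cache.get(api_mode)
        if transport is None:
            try:
                factory = self._factories[api_mode]
            except KeyError:
                raise ValueError(f"unknown api_mode: {api_mode!r}") from None
            transport = self._cache[api_mode] = factory()
        return transport

    def warm(self, api_mode: str) -> None:
        """Build eagerly so construction or import errors surface early."""
        self.get(api_mode)

    def invalidate(self) -> None:
        """Drop cached transports, e.g. after a mid-session provider switch."""
        self._cache.clear()
```

A caller would typically run `cache.warm(api_mode)` during construction and `cache.invalidate()` wherever `api_mode` is reassigned, so that the next `get()` rebuilds the transport for the new mode.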
73 changed files with 502 additions and 3461 deletions
+1 -1
@@ -13,7 +13,7 @@
**The self-improving AI agent built by [Nous Research](https://nousresearch.com).** It's the only agent with a built-in learning loop — it creates skills from experience, improves them during use, nudges itself to persist knowledge, searches its own past conversations, and builds a deepening model of who you are across sessions. Run it on a $5 VPS, a GPU cluster, or serverless infrastructure that costs nearly nothing when idle. It's not tied to your laptop — talk to it from Telegram while it works on a cloud VM.
Use any model you want — [Nous Portal](https://portal.nousresearch.com), [OpenRouter](https://openrouter.ai) (200+ models), [Volcengine](https://www.volcengine.com/product/ark), [BytePlus](https://www.byteplus.com/en/product/modelark), [NVIDIA NIM](https://build.nvidia.com) (Nemotron), [Xiaomi MiMo](https://platform.xiaomimimo.com), [z.ai/GLM](https://z.ai), [Kimi/Moonshot](https://platform.moonshot.ai), [MiniMax](https://www.minimax.io), [Hugging Face](https://huggingface.co), OpenAI, or your own endpoint. Switch with `hermes model` — no code changes, no lock-in.
Use any model you want — [Nous Portal](https://portal.nousresearch.com), [OpenRouter](https://openrouter.ai) (200+ models), [NVIDIA NIM](https://build.nvidia.com) (Nemotron), [Xiaomi MiMo](https://platform.xiaomimimo.com), [z.ai/GLM](https://z.ai), [Kimi/Moonshot](https://platform.moonshot.ai), [MiniMax](https://www.minimax.io), [Hugging Face](https://huggingface.co), OpenAI, or your own endpoint. Switch with `hermes model` — no code changes, no lock-in.
<table>
<tr><td><b>A real terminal interface</b></td><td>Full TUI with multiline editing, slash-command autocomplete, conversation history, interrupt-and-redirect, and streaming tool output.</td></tr>
+1 -84
@@ -1083,31 +1083,6 @@ def convert_messages_to_anthropic(
"name": fn.get("name", ""),
"input": parsed_args,
})
# Kimi's /coding endpoint (Anthropic protocol) requires assistant
# tool-call messages to carry reasoning_content when thinking is
# enabled server-side. Preserve it as a thinking block so Kimi
# can validate the message history. See hermes-agent#13848.
#
# Accept empty string "" — _copy_reasoning_content_for_api()
# injects "" as a tier-3 fallback for Kimi tool-call messages
# that had no reasoning. Kimi requires the field to exist, even
# if empty.
#
# Prepend (not append): Anthropic protocol requires thinking
# blocks before text and tool_use blocks.
#
# Guard: only add when reasoning_details didn't already contribute
# thinking blocks. On native Anthropic, reasoning_details produces
# signed thinking blocks — adding another unsigned one from
# reasoning_content would create a duplicate (same text) that gets
# downgraded to a spurious text block on the last assistant message.
reasoning_content = m.get("reasoning_content")
_already_has_thinking = any(
isinstance(b, dict) and b.get("type") in ("thinking", "redacted_thinking")
for b in blocks
)
if isinstance(reasoning_content, str) and not _already_has_thinking:
blocks.insert(0, {"type": "thinking", "thinking": reasoning_content})
# Anthropic rejects empty assistant content
effective = blocks or content
if not effective or effective == "":
@@ -1263,7 +1238,6 @@ def convert_messages_to_anthropic(
# cache markers can interfere with signature validation.
_THINKING_TYPES = frozenset(("thinking", "redacted_thinking"))
_is_third_party = _is_third_party_anthropic_endpoint(base_url)
_is_kimi = _is_kimi_coding_endpoint(base_url)
last_assistant_idx = None
for i in range(len(result) - 1, -1, -1):
@@ -1275,25 +1249,7 @@ def convert_messages_to_anthropic(
if m.get("role") != "assistant" or not isinstance(m.get("content"), list):
continue
if _is_kimi:
# Kimi's /coding endpoint enables thinking server-side and
# requires unsigned thinking blocks on replayed assistant
# tool-call messages. Strip signed Anthropic blocks (Kimi
# can't validate signatures) but preserve the unsigned ones
# we synthesised from reasoning_content above.
new_content = []
for b in m["content"]:
if not isinstance(b, dict) or b.get("type") not in _THINKING_TYPES:
new_content.append(b)
continue
if b.get("signature") or b.get("data"):
# Anthropic-signed block — Kimi can't validate, strip
continue
# Unsigned thinking (synthesised from reasoning_content) —
# keep it: Kimi needs it for message-history validation.
new_content.append(b)
m["content"] = new_content or [{"type": "text", "text": "(empty)"}]
elif _is_third_party or idx != last_assistant_idx:
if _is_third_party or idx != last_assistant_idx:
# Third-party endpoint: strip ALL thinking blocks from every
# assistant message — signatures are Anthropic-proprietary.
# Direct Anthropic: strip from non-latest assistant messages only.
@@ -1604,42 +1560,3 @@ def normalize_anthropic_response(
),
finish_reason,
)
def normalize_anthropic_response_v2(
response,
strip_tool_prefix: bool = False,
) -> "NormalizedResponse":
"""Normalize Anthropic response to NormalizedResponse.
Wraps the existing normalize_anthropic_response() and maps its output
to the shared transport types. This allows incremental migration —
one call site at a time — without changing the original function.
"""
from agent.transports.types import NormalizedResponse, build_tool_call
assistant_msg, finish_reason = normalize_anthropic_response(response, strip_tool_prefix)
tool_calls = None
if assistant_msg.tool_calls:
tool_calls = [
build_tool_call(
id=tc.id,
name=tc.function.name,
arguments=tc.function.arguments,
)
for tc in assistant_msg.tool_calls
]
provider_data = {}
if getattr(assistant_msg, "reasoning_details", None):
provider_data["reasoning_details"] = assistant_msg.reasoning_details
return NormalizedResponse(
content=assistant_msg.content,
tool_calls=tool_calls,
finish_reason=finish_reason,
reasoning=getattr(assistant_msg, "reasoning", None),
usage=None, # Anthropic usage is on the raw response, not the normaliser
provider_data=provider_data or None,
)
-5
@@ -74,10 +74,6 @@ _PROVIDER_ALIASES = {
"minimax_cn": "minimax-cn",
"claude": "anthropic",
"claude-code": "anthropic",
"volcengine-coding-plan": "volcengine",
"volcengine_coding_plan": "volcengine",
"byteplus-coding-plan": "byteplus",
"byteplus_coding_plan": "byteplus",
}
@@ -138,7 +134,6 @@ _API_KEY_PROVIDER_AUX_MODELS: Dict[str, str] = {
"gemini": "gemini-3-flash-preview",
"zai": "glm-4.5-flash",
"kimi-coding": "kimi-k2-turbo-preview",
"stepfun": "step-3.5-flash",
"kimi-coding-cn": "kimi-k2-turbo-preview",
"minimax": "MiniMax-M2.7",
"minimax-cn": "MiniMax-M2.7",
+4 -9
@@ -470,16 +470,11 @@ def _classify_by_status(
retryable=False,
should_fallback=True,
)
# Generic 404 with no "model not found" signal — could be a wrong
# endpoint path (common with local llama.cpp / Ollama / vLLM when
# the URL is slightly misconfigured), a proxy routing glitch, or
# a transient backend issue. Classifying these as model_not_found
# silently falls back to a different provider and tells the model
# the model is missing, which is wrong and wastes a turn. Treat
# as unknown so the retry loop surfaces the real error instead.
# Generic 404 — could be model or endpoint
return result_fn(
FailoverReason.unknown,
retryable=True,
FailoverReason.model_not_found,
retryable=False,
should_fallback=True,
)
if status_code == 413:
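The hunk above contrasts two ways of classifying a 404 that carries no explicit "model not found" signal: as `FailoverReason.unknown` (retryable) or as `FailoverReason.model_not_found` (not retryable). A small sketch of that decision, under stated assumptions, is shown here. The `Classification` dataclass, the hint strings, and the `generic_as_unknown` switch are hypothetical and exist only to make the two behaviours comparable side by side.

```python
from dataclasses import dataclass
from enum import Enum


class FailoverReason(Enum):
    unknown = "unknown"
    model_not_found = "model_not_found"


@dataclass(frozen=True)
class Classification:
    reason: FailoverReason
    retryable: bool
    should_fallback: bool


# Hypothetical substrings that would count as an explicit signal.
_MODEL_NOT_FOUND_HINTS = ("model not found", "unknown model", "does not exist")


def classify_404(message: str, generic_as_unknown: bool) -> Classification:
    """Classify an HTTP 404 body.

    generic_as_unknown=True mirrors the variant that treats a signal-free
    404 as an unknown, retryable error; False mirrors the variant that
    reports model_not_found and does not retry.
    """
    text = message.lower()
    if any(hint in text for hint in _MODEL_NOT_FOUND_HINTS):
        return Classification(FailoverReason.model_not_found, False, True)
    if generic_as_unknown:
        return Classification(FailoverReason.unknown, True, True)
    return Classification(FailoverReason.model_not_found, False, True)
```

With a misconfigured local endpoint (llama.cpp, Ollama, vLLM), the body is often a bare "Not Found", which is exactly the case where the two variants diverge.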
+3 -19
@@ -14,8 +14,8 @@ from urllib.parse import urlparse
import requests
import yaml
from hermes_cli.volcengine_byteplus import model_context_window
from utils import base_url_host_matches, base_url_hostname
from hermes_constants import OPENROUTER_MODELS_URL
logger = logging.getLogger(__name__)
@@ -25,22 +25,18 @@ logger = logging.getLogger(__name__)
# are preserved so the full model name reaches cache lookups and server queries.
_PROVIDER_PREFIXES: frozenset[str] = frozenset({
"openrouter", "nous", "openai-codex", "copilot", "copilot-acp",
"gemini", "ollama-cloud", "zai", "kimi-coding", "kimi-coding-cn", "stepfun", "minimax", "minimax-cn", "anthropic", "deepseek",
"gemini", "ollama-cloud", "zai", "kimi-coding", "kimi-coding-cn", "minimax", "minimax-cn", "anthropic", "deepseek",
"opencode-zen", "opencode-go", "ai-gateway", "kilocode", "alibaba",
"qwen-oauth",
"xiaomi",
"arcee",
"volcengine",
"volcengine-coding-plan",
"byteplus",
"byteplus-coding-plan",
"custom", "local",
# Common aliases
"google", "google-gemini", "google-ai-studio",
"glm", "z-ai", "z.ai", "zhipu", "github", "github-copilot",
"github-models", "kimi", "moonshot", "kimi-cn", "moonshot-cn", "claude", "deep-seek",
"ollama",
"stepfun", "opencode", "zen", "go", "vercel", "kilo", "dashscope", "aliyun", "qwen",
"opencode", "zen", "go", "vercel", "kilo", "dashscope", "aliyun", "qwen",
"mimo", "xiaomi-mimo",
"arcee-ai", "arceeai",
"xai", "x-ai", "x.ai", "grok",
@@ -241,8 +237,6 @@ _URL_TO_PROVIDER: Dict[str, str] = {
"api.moonshot.ai": "kimi-coding",
"api.moonshot.cn": "kimi-coding-cn",
"api.kimi.com": "kimi-coding",
"api.stepfun.ai": "stepfun",
"api.stepfun.com": "stepfun",
"api.arcee.ai": "arcee",
"api.minimax": "minimax",
"dashscope.aliyuncs.com": "alibaba",
@@ -261,8 +255,6 @@ _URL_TO_PROVIDER: Dict[str, str] = {
"api.xiaomimimo.com": "xiaomi",
"xiaomimimo.com": "xiaomi",
"ollama.com": "ollama-cloud",
"ark.cn-beijing.volces.com": "volcengine",
"ark.ap-southeast.bytepluses.com": "byteplus",
}
@@ -1125,20 +1117,12 @@ def get_model_context_length(
ctx = _resolve_nous_context_length(model)
if ctx:
return ctx
if effective_provider in {"volcengine", "byteplus"}:
ctx = model_context_window(model)
if ctx:
return ctx
if effective_provider:
from agent.models_dev import lookup_models_dev_context
ctx = lookup_models_dev_context(effective_provider, model)
if ctx:
return ctx
ctx = model_context_window(model)
if ctx:
return ctx
# 6. OpenRouter live API metadata (provider-unaware fallback)
metadata = fetch_model_metadata()
if model in metadata:
-1
@@ -146,7 +146,6 @@ PROVIDER_TO_MODELS_DEV: Dict[str, str] = {
"openai-codex": "openai",
"zai": "zai",
"kimi-coding": "kimi-for-coding",
"stepfun": "stepfun",
"kimi-coding-cn": "kimi-for-coding",
"minimax": "minimax",
"minimax-cn": "minimax-cn",
+25 -4
@@ -78,13 +78,34 @@ class AnthropicTransport(ProviderTransport):
def normalize_response(self, response: Any, **kwargs) -> NormalizedResponse:
"""Normalize Anthropic response to NormalizedResponse.
kwargs:
strip_tool_prefix: bool — strip 'mcp_mcp_' prefixes from tool names.
Calls the adapter's v1 normalize and maps the (SimpleNamespace, finish_reason)
tuple to the shared NormalizedResponse type.
"""
from agent.anthropic_adapter import normalize_anthropic_response_v2
from agent.anthropic_adapter import normalize_anthropic_response
from agent.transports.types import build_tool_call
strip_tool_prefix = kwargs.get("strip_tool_prefix", False)
return normalize_anthropic_response_v2(response, strip_tool_prefix=strip_tool_prefix)
assistant_msg, finish_reason = normalize_anthropic_response(response, strip_tool_prefix)
tool_calls = None
if assistant_msg.tool_calls:
tool_calls = [
build_tool_call(id=tc.id, name=tc.function.name, arguments=tc.function.arguments)
for tc in assistant_msg.tool_calls
]
provider_data = {}
if getattr(assistant_msg, "reasoning_details", None):
provider_data["reasoning_details"] = assistant_msg.reasoning_details
return NormalizedResponse(
content=assistant_msg.content,
tool_calls=tool_calls,
finish_reason=finish_reason,
reasoning=getattr(assistant_msg, "reasoning", None),
usage=None,
provider_data=provider_data or None,
)
def validate_response(self, response: Any) -> bool:
"""Check Anthropic response structure is valid."""
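The commit message also describes the reverse direction: a shared `_nr_to_assistant_message()` shim that turns a `NormalizedResponse` back into the `SimpleNamespace` shape downstream code expects, pulling `codex_reasoning_items`, `reasoning_details`, `call_id`, and `response_item_id` out of `provider_data`. The sketch below is one plausible shape for such a shim, not the actual implementation. The two dataclasses are simplified stand-ins for the types in `agent.transports.types`, and placing `call_id` / `response_item_id` on each tool call's own `provider_data` is an assumption.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Dict, List, Optional


@dataclass
class ToolCall:
    """Simplified stand-in for the shared tool-call type."""
    id: str
    name: str
    arguments: str
    provider_data: Optional[Dict[str, Any]] = None


@dataclass
class NormalizedResponse:
    """Simplified stand-in for the shared response type."""
    content: Optional[str]
    tool_calls: Optional[List[ToolCall]] = None
    finish_reason: Optional[str] = None
    reasoning: Optional[str] = None
    provider_data: Optional[Dict[str, Any]] = None


def nr_to_assistant_message(nr: NormalizedResponse) -> SimpleNamespace:
    """Map a NormalizedResponse onto an assistant-message-like namespace."""
    pd = nr.provider_data or {}
    tool_calls = None
    if nr.tool_calls:
        tool_calls = []
        for tc in nr.tool_calls:
            tc_pd = tc.provider_data or {}  # assumed location of per-call ids
            tool_calls.append(SimpleNamespace(
                id=tc.id,
                type="function",
                function=SimpleNamespace(name=tc.name, arguments=tc.arguments),
                call_id=tc_pd.get("call_id"),
                response_item_id=tc_pd.get("response_item_id"),
            ))
    return SimpleNamespace(
        content=nr.content,
        tool_calls=tool_calls,
        reasoning=nr.reasoning,
        reasoning_details=pd.get("reasoning_details"),
        codex_reasoning_items=pd.get("codex_reasoning_items"),
    )
```

Keeping the provider-specific fields in an optional dict means transports that never produce them need no special casing: the shim simply yields `None` for the missing attributes.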
+1 -33
@@ -914,32 +914,6 @@ def _cleanup_worktree(info: Dict[str, str] = None) -> None:
print(f"\033[32m✓ Worktree cleaned up: {wt_path}\033[0m")
def _run_state_db_auto_maintenance(session_db) -> None:
"""Call ``SessionDB.maybe_auto_prune_and_vacuum`` using current config.
Reads the ``sessions:`` section from config.yaml via
:func:`hermes_cli.config.load_config` (the authoritative loader that
deep-merges DEFAULT_CONFIG, so unmigrated configs still get default
values). Honours ``auto_prune`` / ``retention_days`` /
``vacuum_after_prune`` / ``min_interval_hours``, and delegates to the
DB. Never raises: maintenance must never block interactive startup.
"""
if session_db is None:
return
try:
from hermes_cli.config import load_config as _load_full_config
cfg = (_load_full_config().get("sessions") or {})
if not cfg.get("auto_prune", False):
return
session_db.maybe_auto_prune_and_vacuum(
retention_days=int(cfg.get("retention_days", 90)),
min_interval_hours=int(cfg.get("min_interval_hours", 24)),
vacuum=bool(cfg.get("vacuum_after_prune", True)),
)
except Exception as exc:
logger.debug("state.db auto-maintenance skipped: %s", exc)
def _prune_stale_worktrees(repo_root: str, max_age_hours: int = 24) -> None:
"""Remove stale worktrees and orphaned branches on startup.
@@ -1987,13 +1961,7 @@ class HermesCLI:
self._session_db = SessionDB()
except Exception as e:
logger.warning("Failed to initialize SessionDB — session will NOT be indexed for search: %s", e)
# Opportunistic state.db maintenance — runs at most once per
# min_interval_hours, tracked via state_meta in state.db itself so
# it's shared across all Hermes processes for this HERMES_HOME.
# Never blocks startup on failure.
_run_state_db_auto_maintenance(self._session_db)
# Deferred title: stored in memory until the session is created in the DB
self._pending_title: Optional[str] = None
-2
@@ -616,8 +616,6 @@ def load_gateway_config() -> GatewayConfig:
if isinstance(frc, list):
frc = ",".join(str(v) for v in frc)
os.environ["SLACK_FREE_RESPONSE_CHANNELS"] = str(frc)
if "reactions" in slack_cfg and not os.getenv("SLACK_REACTIONS"):
os.environ["SLACK_REACTIONS"] = str(slack_cfg["reactions"]).lower()
# Discord settings → env vars (env vars take precedence)
discord_cfg = yaml_cfg.get("discord", {})
+4 -2
@@ -26,8 +26,9 @@ from .adapter import ( # noqa: F401
# -- Onboard (QR-code scan-to-configure) -----------------------------------
from .onboard import ( # noqa: F401
BindStatus,
create_bind_task,
poll_bind_result,
build_connect_url,
qr_register,
)
from .crypto import decrypt_secret, generate_bind_key # noqa: F401
@@ -43,8 +44,9 @@ __all__ = [
"_ssrf_redirect_guard",
# onboard
"BindStatus",
"create_bind_task",
"poll_bind_result",
"build_connect_url",
"qr_register",
# crypto
"decrypt_secret",
"generate_bind_key",
+21 -117
@@ -1,10 +1,6 @@
"""
QQBot scan-to-configure (QR code onboard) module.
Mirrors the Feishu onboarding pattern: synchronous HTTP + a single public
entry-point ``qr_register()`` that handles the full flow (create task →
display QR code → poll → decrypt credentials).
Calls the ``q.qq.com`` ``create_bind_task`` / ``poll_bind_result`` APIs to
generate a QR-code URL and poll for scan completion. On success the caller
receives the bot's *app_id*, *client_secret* (decrypted locally), and the
@@ -16,20 +12,18 @@ Reference: https://bot.q.qq.com/wiki/develop/api-v2/
from __future__ import annotations
import logging
import time
from enum import IntEnum
from typing import Optional, Tuple
from typing import Tuple
from urllib.parse import quote
from .constants import (
ONBOARD_API_TIMEOUT,
ONBOARD_CREATE_PATH,
ONBOARD_POLL_INTERVAL,
ONBOARD_POLL_PATH,
PORTAL_HOST,
QR_URL_TEMPLATE,
)
from .crypto import decrypt_secret, generate_bind_key
from .crypto import generate_bind_key
from .utils import get_api_headers
logger = logging.getLogger(__name__)
@@ -41,7 +35,7 @@ logger = logging.getLogger(__name__)
class BindStatus(IntEnum):
"""Status codes returned by ``_poll_bind_result``."""
"""Status codes returned by ``poll_bind_result``."""
NONE = 0
PENDING = 1
@@ -50,40 +44,18 @@ class BindStatus(IntEnum):
# ---------------------------------------------------------------------------
# QR rendering
# ---------------------------------------------------------------------------
try:
import qrcode as _qrcode_mod
except (ImportError, TypeError):
_qrcode_mod = None # type: ignore[assignment]
def _render_qr(url: str) -> bool:
"""Try to render a QR code in the terminal. Returns True if successful."""
if _qrcode_mod is None:
return False
try:
qr = _qrcode_mod.QRCode(
error_correction=_qrcode_mod.constants.ERROR_CORRECT_M,
border=2,
)
qr.add_data(url)
qr.make(fit=True)
qr.print_ascii(invert=True)
return True
except Exception:
return False
# ---------------------------------------------------------------------------
# Synchronous HTTP helpers (mirrors Feishu _post_registration pattern)
# Public API
# ---------------------------------------------------------------------------
def _create_bind_task(timeout: float = ONBOARD_API_TIMEOUT) -> Tuple[str, str]:
async def create_bind_task(
timeout: float = ONBOARD_API_TIMEOUT,
) -> Tuple[str, str]:
"""Create a bind task and return *(task_id, aes_key_base64)*.
The AES key is generated locally and sent to the server so it can
encrypt the bot credentials before returning them.
Raises:
RuntimeError: If the API returns a non-zero ``retcode``.
"""
@@ -92,8 +64,8 @@ def _create_bind_task(timeout: float = ONBOARD_API_TIMEOUT) -> Tuple[str, str]:
url = f"https://{PORTAL_HOST}{ONBOARD_CREATE_PATH}"
key = generate_bind_key()
with httpx.Client(timeout=timeout, follow_redirects=True) as client:
resp = client.post(url, json={"key": key}, headers=get_api_headers())
async with httpx.AsyncClient(timeout=timeout, follow_redirects=True) as client:
resp = await client.post(url, json={"key": key}, headers=get_api_headers())
resp.raise_for_status()
data = resp.json()
@@ -108,7 +80,7 @@ def _create_bind_task(timeout: float = ONBOARD_API_TIMEOUT) -> Tuple[str, str]:
return task_id, key
def _poll_bind_result(
async def poll_bind_result(
task_id: str,
timeout: float = ONBOARD_API_TIMEOUT,
) -> Tuple[BindStatus, str, str, str]:
@@ -117,6 +89,12 @@ def _poll_bind_result(
Returns:
A 4-tuple of ``(status, bot_appid, bot_encrypt_secret, user_openid)``.
* ``bot_encrypt_secret`` is AES-256-GCM encrypted — decrypt it with
:func:`~gateway.platforms.qqbot.crypto.decrypt_secret` using the
key from :func:`create_bind_task`.
* ``user_openid`` is the OpenID of the person who scanned the code
(available when ``status == COMPLETED``).
Raises:
RuntimeError: If the API returns a non-zero ``retcode``.
"""
@@ -124,8 +102,8 @@ def _poll_bind_result(
url = f"https://{PORTAL_HOST}{ONBOARD_POLL_PATH}"
with httpx.Client(timeout=timeout, follow_redirects=True) as client:
resp = client.post(url, json={"task_id": task_id}, headers=get_api_headers())
async with httpx.AsyncClient(timeout=timeout, follow_redirects=True) as client:
resp = await client.post(url, json={"task_id": task_id}, headers=get_api_headers())
resp.raise_for_status()
data = resp.json()
@@ -144,77 +122,3 @@ def _poll_bind_result(
def build_connect_url(task_id: str) -> str:
"""Build the QR-code target URL for a given *task_id*."""
return QR_URL_TEMPLATE.format(task_id=quote(task_id))
# ---------------------------------------------------------------------------
# Public entry-point
# ---------------------------------------------------------------------------
_MAX_REFRESHES = 3
def qr_register(timeout_seconds: int = 600) -> Optional[dict]:
"""Run the QQBot scan-to-configure QR registration flow.
Mirrors ``feishu.qr_register()``: handles create → display → poll →
decrypt in one call. Unexpected errors propagate to the caller.
:returns:
``{"app_id": ..., "client_secret": ..., "user_openid": ...}`` on
success, or ``None`` on failure / expiry / cancellation.
"""
deadline = time.monotonic() + timeout_seconds
for refresh_count in range(_MAX_REFRESHES + 1):
# ── Create bind task ──
try:
task_id, aes_key = _create_bind_task()
except Exception as exc:
logger.warning("[QQBot onboard] Failed to create bind task: %s", exc)
return None
url = build_connect_url(task_id)
# ── Display QR code + URL ──
print()
if _render_qr(url):
print(f" Scan the QR code above, or open this URL directly:\n {url}")
else:
print(f" Open this URL in QQ on your phone:\n {url}")
print(" Tip: pip install qrcode to display a scannable QR code here")
print()
# ── Poll loop ──
while time.monotonic() < deadline:
try:
status, app_id, encrypted_secret, user_openid = _poll_bind_result(task_id)
except Exception:
time.sleep(ONBOARD_POLL_INTERVAL)
continue
if status == BindStatus.COMPLETED:
client_secret = decrypt_secret(encrypted_secret, aes_key)
print()
print(f" QR scan complete! (App ID: {app_id})")
if user_openid:
print(f" Scanner's OpenID: {user_openid}")
return {
"app_id": app_id,
"client_secret": client_secret,
"user_openid": user_openid,
}
if status == BindStatus.EXPIRED:
if refresh_count >= _MAX_REFRESHES:
logger.warning("[QQBot onboard] QR code expired %d times — giving up", _MAX_REFRESHES)
return None
print(f"\n QR code expired, refreshing... ({refresh_count + 1}/{_MAX_REFRESHES})")
break # next for-loop iteration creates a new task
time.sleep(ONBOARD_POLL_INTERVAL)
else:
# deadline reached without completing
logger.warning("[QQBot onboard] Poll timed out after %ds", timeout_seconds)
return None
return None
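With the async variant of this module, `create_bind_task()` and `poll_bind_result()` are coroutines and the create, poll, and decrypt steps are left to the caller. A caller-side loop might look roughly like the sketch below. It is an illustration under assumptions: `wait_for_bind` is a hypothetical helper, the three callables are injected so the example stays self-contained, the `COMPLETED` / `EXPIRED` enum values are guesses (only `NONE = 0` and `PENDING = 1` appear in the hunk), and QR rendering and refresh-on-expiry are omitted.

```python
import asyncio
from enum import IntEnum
from typing import Awaitable, Callable, Optional, Tuple


class BindStatus(IntEnum):
    NONE = 0
    PENDING = 1
    COMPLETED = 2  # assumed value
    EXPIRED = 3    # assumed value


CreateFn = Callable[[], Awaitable[Tuple[str, str]]]
PollFn = Callable[[str], Awaitable[Tuple[BindStatus, str, str, str]]]
DecryptFn = Callable[[str, str], str]


async def wait_for_bind(
    create: CreateFn,
    poll: PollFn,
    decrypt: DecryptFn,
    *,
    poll_interval: float = 2.0,
    timeout_seconds: float = 600.0,
) -> Optional[dict]:
    """Create a bind task, poll until it completes, then decrypt the secret.

    Returns a credentials dict on success, or None on expiry or timeout.
    """
    task_id, aes_key = await create()
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout_seconds
    while loop.time() < deadline:
        status, app_id, encrypted_secret, user_openid = await poll(task_id)
        if status == BindStatus.COMPLETED:
            return {
                "app_id": app_id,
                "client_secret": decrypt(encrypted_secret, aes_key),
                "user_openid": user_openid,
            }
        if status == BindStatus.EXPIRED:
            return None
        await asyncio.sleep(poll_interval)
    return None
```

In real use the three arguments would be the module's `create_bind_task`, `poll_bind_result`, and `decrypt_secret`; injecting them also makes the loop easy to exercise with fakes.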
+7 -57
@@ -38,7 +38,6 @@ from gateway.platforms.base import (
BasePlatformAdapter,
MessageEvent,
MessageType,
ProcessingOutcome,
SendResult,
SUPPORTED_DOCUMENT_TYPES,
safe_url_for_log,
@@ -114,11 +113,6 @@ class SlackAdapter(BasePlatformAdapter):
# Cache for _fetch_thread_context results: cache_key → _ThreadContextCache
self._thread_context_cache: Dict[str, _ThreadContextCache] = {}
self._THREAD_CACHE_TTL = 60.0
# Track message IDs that should get reaction lifecycle (DMs / @mentions).
self._reacting_message_ids: set = set()
# Track active assistant thread status indicators so stop_typing can
# clear them (chat_id → thread_ts).
self._active_status_threads: Dict[str, str] = {}
async def connect(self) -> bool:
"""Connect to Slack via Socket Mode."""
@@ -368,7 +362,6 @@ class SlackAdapter(BasePlatformAdapter):
if not thread_ts:
return # Can only set status in a thread context
self._active_status_threads[chat_id] = thread_ts
try:
await self._get_client(chat_id).assistant_threads_setStatus(
channel_id=chat_id,
@@ -380,22 +373,6 @@ class SlackAdapter(BasePlatformAdapter):
# in an assistant-enabled context. Falls back to reactions.
logger.debug("[Slack] assistant.threads.setStatus failed: %s", e)
async def stop_typing(self, chat_id: str) -> None:
"""Clear the assistant thread status indicator."""
if not self._app:
return
thread_ts = self._active_status_threads.pop(chat_id, None)
if not thread_ts:
return
try:
await self._get_client(chat_id).assistant_threads_setStatus(
channel_id=chat_id,
thread_ts=thread_ts,
status="",
)
except Exception as e:
logger.debug("[Slack] assistant.threads.setStatus clear failed: %s", e)
def _dm_top_level_threads_as_sessions(self) -> bool:
"""Whether top-level Slack DMs get per-message session threads.
@@ -607,38 +584,6 @@ class SlackAdapter(BasePlatformAdapter):
logger.debug("[Slack] reactions.remove failed (%s): %s", emoji, e)
return False
def _reactions_enabled(self) -> bool:
"""Check if message reactions are enabled via config/env."""
return os.getenv("SLACK_REACTIONS", "true").lower() not in ("false", "0", "no")
async def on_processing_start(self, event: MessageEvent) -> None:
"""Add an in-progress reaction when message processing begins."""
if not self._reactions_enabled():
return
ts = getattr(event, "message_id", None)
if not ts or ts not in self._reacting_message_ids:
return
channel_id = getattr(event.source, "chat_id", None)
if channel_id:
await self._add_reaction(channel_id, ts, "eyes")
async def on_processing_complete(self, event: MessageEvent, outcome: ProcessingOutcome) -> None:
"""Swap the in-progress reaction for a final success/failure reaction."""
if not self._reactions_enabled():
return
ts = getattr(event, "message_id", None)
if not ts or ts not in self._reacting_message_ids:
return
self._reacting_message_ids.discard(ts)
channel_id = getattr(event.source, "chat_id", None)
if not channel_id:
return
await self._remove_reaction(channel_id, ts, "eyes")
if outcome == ProcessingOutcome.SUCCESS:
await self._add_reaction(channel_id, ts, "white_check_mark")
elif outcome == ProcessingOutcome.FAILURE:
await self._add_reaction(channel_id, ts, "x")
# ----- User identity resolution -----
async def _resolve_user_name(self, user_id: str, chat_id: str = "") -> str:
@@ -1268,12 +1213,17 @@ class SlackAdapter(BasePlatformAdapter):
# Only react when bot is directly addressed (DM or @mention).
# In listen-all channels (require_mention=false), reacting to every
# casual message would be noisy.
_should_react = (is_dm or is_mentioned) and self._reactions_enabled()
_should_react = is_dm or is_mentioned
if _should_react:
self._reacting_message_ids.add(ts)
await self._add_reaction(channel_id, ts, "eyes")
await self.handle_message(msg_event)
if _should_react:
await self._remove_reaction(channel_id, ts, "eyes")
await self._add_reaction(channel_id, ts, "white_check_mark")
# ----- Approval button support (Block Kit) -----
async def send_exec_approval(
-131
@@ -1464,134 +1464,3 @@ class WeComAdapter(BasePlatformAdapter):
"name": chat_id,
"type": "group" if chat_id and chat_id.lower().startswith("group") else "dm",
}
# ------------------------------------------------------------------
# QR code scan flow for obtaining bot credentials
# ------------------------------------------------------------------
_QR_GENERATE_URL = "https://work.weixin.qq.com/ai/qc/generate"
_QR_QUERY_URL = "https://work.weixin.qq.com/ai/qc/query_result"
_QR_CODE_PAGE = "https://work.weixin.qq.com/ai/qc/gen?source=hermes&scode="
_QR_POLL_INTERVAL = 3 # seconds
_QR_POLL_TIMEOUT = 300 # 5 minutes
def qr_scan_for_bot_info(
*,
timeout_seconds: int = _QR_POLL_TIMEOUT,
) -> Optional[Dict[str, str]]:
"""Run the WeCom QR scan flow to obtain bot_id and secret.
Fetches a QR code from WeCom, renders it in the terminal, and polls
until the user scans it or the timeout expires.
Returns ``{"bot_id": ..., "secret": ...}`` on success, ``None`` on
failure or timeout.
Note: the ``work.weixin.qq.com/ai/qc/{generate,query_result}`` endpoints
used here are not part of WeCom's public developer API — they back the
admin-console web UI's bot-creation flow and may change without notice.
The same pattern is used by the feishu/dingtalk QR setup wizards.
"""
try:
import urllib.request
import urllib.parse
except ImportError: # pragma: no cover
logger.error("urllib is required for WeCom QR scan")
return None
generate_url = f"{_QR_GENERATE_URL}?source=hermes"
# ── Step 1: Fetch QR code ──
print(" Connecting to WeCom...", end="", flush=True)
try:
req = urllib.request.Request(generate_url, headers={"User-Agent": "HermesAgent/1.0"})
with urllib.request.urlopen(req, timeout=15) as resp:
raw = json.loads(resp.read().decode("utf-8"))
except Exception as exc:
logger.error("WeCom QR: failed to fetch QR code: %s", exc)
print(f" failed: {exc}")
return None
data = raw.get("data") or {}
scode = str(data.get("scode") or "").strip()
auth_url = str(data.get("auth_url") or "").strip()
if not scode or not auth_url:
logger.error("WeCom QR: unexpected response format: %s", raw)
print(" failed: unexpected response format")
return None
print(" done.")
# ── Step 2: Render QR code in terminal ──
print()
qr_rendered = False
try:
import qrcode as _qrcode
qr = _qrcode.QRCode()
qr.add_data(auth_url)
qr.make(fit=True)
qr.print_ascii(invert=True)
qr_rendered = True
except ImportError:
pass
except Exception:
pass
page_url = f"{_QR_CODE_PAGE}{urllib.parse.quote(scode)}"
if qr_rendered:
print(f"\n Scan the QR code above, or open this URL directly:\n {page_url}")
else:
print(f" Open this URL in WeCom on your phone:\n\n {page_url}\n")
print(" Tip: pip install qrcode to display a scannable QR code here next time")
print()
print(" Fetching configuration results...", end="", flush=True)
# ── Step 3: Poll for result ──
import time
deadline = time.time() + timeout_seconds
query_url = f"{_QR_QUERY_URL}?scode={urllib.parse.quote(scode)}"
poll_count = 0
while time.time() < deadline:
try:
req = urllib.request.Request(query_url, headers={"User-Agent": "HermesAgent/1.0"})
with urllib.request.urlopen(req, timeout=10) as resp:
result = json.loads(resp.read().decode("utf-8"))
except Exception as exc:
logger.debug("WeCom QR poll error: %s", exc)
time.sleep(_QR_POLL_INTERVAL)
continue
poll_count += 1
# Print a dot on every poll so progress is visible within 3s.
print(".", end="", flush=True)
result_data = result.get("data") or {}
status = str(result_data.get("status") or "").lower()
if status == "success":
print() # newline after "Fetching configuration results..." dots
bot_info = result_data.get("bot_info") or {}
bot_id = str(bot_info.get("botid") or bot_info.get("bot_id") or "").strip()
secret = str(bot_info.get("secret") or "").strip()
if bot_id and secret:
return {"bot_id": bot_id, "secret": secret}
logger.warning(
"WeCom QR: scan reported success but bot_info missing or incomplete: %s",
result_data,
)
print(
" QR scan reported success but no bot credentials were returned.\n"
" This usually means the bot was not actually created on the WeCom side.\n"
" Falling back to manual credential entry."
)
return None
time.sleep(_QR_POLL_INTERVAL)
print() # newline after dots
print(f" QR scan timed out ({timeout_seconds // 60} minutes). Please try again.")
return None
+2 -36
@@ -710,26 +710,7 @@ class GatewayRunner:
self._session_db = SessionDB()
except Exception as e:
logger.debug("SQLite session store not available: %s", e)
# Opportunistic state.db maintenance: prune ended sessions older
# than sessions.retention_days + optional VACUUM. Tracks last-run
# in state_meta so it only actually executes once per
# sessions.min_interval_hours. Gateway is long-lived so blocking
# a few seconds once per day is acceptable; failures are logged
# but never raised.
if self._session_db is not None:
try:
from hermes_cli.config import load_config as _load_full_config
_sess_cfg = (_load_full_config().get("sessions") or {})
if _sess_cfg.get("auto_prune", False):
self._session_db.maybe_auto_prune_and_vacuum(
retention_days=int(_sess_cfg.get("retention_days", 90)),
min_interval_hours=int(_sess_cfg.get("min_interval_hours", 24)),
vacuum=bool(_sess_cfg.get("vacuum_after_prune", True)),
)
except Exception as exc:
logger.debug("state.db auto-maintenance skipped: %s", exc)
# DM pairing store for code-based user authorization
from gateway.pairing import PairingStore
self.pairing_store = PairingStore()
@@ -5690,7 +5671,6 @@ class GatewayRunner:
from hermes_cli.models import (
list_available_providers,
normalize_provider,
provider_for_base_url,
_PROVIDER_LABELS,
)
@@ -5719,10 +5699,7 @@ class GatewayRunner:
# Detect custom endpoint from config base_url
if current_provider == "openrouter":
_cfg_base = model_cfg.get("base_url", "") if isinstance(model_cfg, dict) else ""
inferred_provider = provider_for_base_url(_cfg_base)
if inferred_provider:
current_provider = inferred_provider
elif _cfg_base and "openrouter.ai" not in _cfg_base:
if _cfg_base and "openrouter.ai" not in _cfg_base:
current_provider = "custom"
current_label = _PROVIDER_LABELS.get(current_provider, current_provider)
@@ -6479,11 +6456,6 @@ class GatewayRunner:
session_id=task_id,
platform=platform_key,
user_id=source.user_id,
user_name=source.user_name,
chat_id=source.chat_id,
chat_name=source.chat_name,
chat_type=source.chat_type,
thread_id=source.thread_id,
session_db=self._session_db,
fallback_model=self._fallback_model,
)
@@ -7244,7 +7216,6 @@ class GatewayRunner:
tool_calls=msg.get("tool_calls"),
tool_call_id=msg.get("tool_call_id"),
reasoning=msg.get("reasoning"),
reasoning_content=msg.get("reasoning_content"),
)
except Exception:
pass # Best-effort copy
@@ -9727,11 +9698,6 @@ class GatewayRunner:
session_id=session_id,
platform=platform_key,
user_id=source.user_id,
user_name=source.user_name,
chat_id=source.chat_id,
chat_name=source.chat_name,
chat_type=source.chat_type,
thread_id=source.thread_id,
gateway_session_key=session_key,
session_db=self._session_db,
fallback_model=self._fallback_model,
-5
@@ -1147,10 +1147,6 @@ class SessionStore:
tool_name=message.get("tool_name"),
tool_calls=message.get("tool_calls"),
tool_call_id=message.get("tool_call_id"),
reasoning=message.get("reasoning") if message.get("role") == "assistant" else None,
reasoning_content=message.get("reasoning_content") if message.get("role") == "assistant" else None,
reasoning_details=message.get("reasoning_details") if message.get("role") == "assistant" else None,
codex_reasoning_items=message.get("codex_reasoning_items") if message.get("role") == "assistant" else None,
)
except Exception as e:
logger.debug("Session DB operation failed: %s", e)
@@ -1180,7 +1176,6 @@ class SessionStore:
tool_calls=msg.get("tool_calls"),
tool_call_id=msg.get("tool_call_id"),
reasoning=msg.get("reasoning") if role == "assistant" else None,
reasoning_content=msg.get("reasoning_content") if role == "assistant" else None,
reasoning_details=msg.get("reasoning_details") if role == "assistant" else None,
codex_reasoning_items=msg.get("codex_reasoning_items") if role == "assistant" else None,
)
+2 -61
@@ -39,13 +39,6 @@ import httpx
import yaml
from hermes_cli.config import get_hermes_home, get_config_path, read_raw_config
from hermes_cli.volcengine_byteplus import (
VOLCENGINE_PROVIDER,
BYTEPLUS_PROVIDER,
VOLCENGINE_STANDARD_BASE_URL,
BYTEPLUS_STANDARD_BASE_URL,
base_url_for_provider_model,
)
from hermes_constants import OPENROUTER_BASE_URL
logger = logging.getLogger(__name__)
@@ -79,8 +72,6 @@ DEFAULT_QWEN_BASE_URL = "https://portal.qwen.ai/v1"
DEFAULT_GITHUB_MODELS_BASE_URL = "https://api.githubcopilot.com"
DEFAULT_COPILOT_ACP_BASE_URL = "acp://copilot"
DEFAULT_OLLAMA_CLOUD_BASE_URL = "https://ollama.com/v1"
STEPFUN_STEP_PLAN_INTL_BASE_URL = "https://api.stepfun.ai/step_plan/v1"
STEPFUN_STEP_PLAN_CN_BASE_URL = "https://api.stepfun.com/step_plan/v1"
CODEX_OAUTH_CLIENT_ID = "app_EMoamEEZ73f0CkXaXp7hrann"
CODEX_OAUTH_TOKEN_URL = "https://auth.openai.com/oauth/token"
CODEX_ACCESS_TOKEN_REFRESH_SKEW_SECONDS = 120
@@ -191,14 +182,6 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
inference_base_url="https://api.moonshot.cn/v1",
api_key_env_vars=("KIMI_CN_API_KEY",),
),
"stepfun": ProviderConfig(
id="stepfun",
name="StepFun Step Plan",
auth_type="api_key",
inference_base_url=STEPFUN_STEP_PLAN_INTL_BASE_URL,
api_key_env_vars=("STEPFUN_API_KEY",),
base_url_env_var="STEPFUN_BASE_URL",
),
"arcee": ProviderConfig(
id="arcee",
name="Arcee AI",
@@ -314,20 +297,6 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
api_key_env_vars=("XIAOMI_API_KEY",),
base_url_env_var="XIAOMI_BASE_URL",
),
"volcengine": ProviderConfig(
id=VOLCENGINE_PROVIDER,
name="Volcengine",
auth_type="api_key",
inference_base_url=VOLCENGINE_STANDARD_BASE_URL,
api_key_env_vars=("VOLCENGINE_API_KEY",),
),
"byteplus": ProviderConfig(
id=BYTEPLUS_PROVIDER,
name="BytePlus",
auth_type="api_key",
inference_base_url=BYTEPLUS_STANDARD_BASE_URL,
api_key_env_vars=("BYTEPLUS_API_KEY",),
),
"ollama-cloud": ProviderConfig(
id="ollama-cloud",
name="Ollama Cloud",
@@ -1023,7 +992,6 @@ def resolve_provider(
"x-ai": "xai", "x.ai": "xai", "grok": "xai",
"kimi": "kimi-coding", "kimi-for-coding": "kimi-coding", "moonshot": "kimi-coding",
"kimi-cn": "kimi-coding-cn", "moonshot-cn": "kimi-coding-cn",
"step": "stepfun", "stepfun-coding-plan": "stepfun",
"arcee-ai": "arcee", "arceeai": "arcee",
"minimax-china": "minimax-cn", "minimax_cn": "minimax-cn",
"claude": "anthropic", "claude-code": "anthropic",
@@ -1036,10 +1004,6 @@ def resolve_provider(
"hf": "huggingface", "hugging-face": "huggingface", "huggingface-hub": "huggingface",
"mimo": "xiaomi", "xiaomi-mimo": "xiaomi",
"aws": "bedrock", "aws-bedrock": "bedrock", "amazon-bedrock": "bedrock", "amazon": "bedrock",
"volcengine-coding-plan": "volcengine",
"volcengine_coding_plan": "volcengine",
"byteplus-coding-plan": "byteplus",
"byteplus_coding_plan": "byteplus",
"go": "opencode-go", "opencode-go-sub": "opencode-go",
"kilo": "kilocode", "kilo-code": "kilocode", "kilo-gateway": "kilocode",
# Local server aliases — route through the generic custom provider
@@ -1182,21 +1146,6 @@ def _qwen_cli_auth_path() -> Path:
return Path.home() / ".qwen" / "oauth_creds.json"
def _current_model_for_provider(provider_id: str) -> str:
"""Return the currently configured model when it belongs to the provider."""
try:
config = read_raw_config()
except Exception:
return ""
model_cfg = config.get("model")
if isinstance(model_cfg, dict):
configured_provider = str(model_cfg.get("provider") or "").strip().lower()
if configured_provider == provider_id:
return str(model_cfg.get("default") or model_cfg.get("model") or "").strip()
return ""
def _read_qwen_cli_tokens() -> Dict[str, Any]:
auth_path = _qwen_cli_auth_path()
if not auth_path.exists():
@@ -2595,11 +2544,7 @@ def get_api_key_provider_status(provider_id: str) -> Dict[str, Any]:
if pconfig.base_url_env_var:
env_url = os.getenv(pconfig.base_url_env_var, "").strip()
active_model = _current_model_for_provider(provider_id)
if provider_id in {VOLCENGINE_PROVIDER, BYTEPLUS_PROVIDER}:
base_url = base_url_for_provider_model(provider_id, active_model) or pconfig.inference_base_url
elif provider_id in ("kimi-coding", "kimi-coding-cn"):
if provider_id in ("kimi-coding", "kimi-coding-cn"):
base_url = _resolve_kimi_base_url(api_key, pconfig.inference_base_url, env_url)
elif env_url:
base_url = env_url
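The hunk above keeps the general precedence in which a provider's `base_url_env_var`, when set, overrides the registry's `inference_base_url`. A minimal sketch of that precedence, under the assumption that the registry default is the final fallback (the trailing `else` branch is not visible in this hunk); `resolve_base_url` is a hypothetical helper, not a function from `hermes_cli.auth`.

```python
import os


def resolve_base_url(default_url, env_var=None, environ=None):
    """Prefer a non-empty environment override, else the registry default."""
    environ = os.environ if environ is None else environ
    # Providers without a base_url_env_var never consult the environment.
    env_url = environ.get(env_var, "").strip() if env_var else ""
    return env_url or default_url
```

Provider-specific resolvers such as `_resolve_kimi_base_url` sit in front of this generic rule in the real code and are not modelled here.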
@@ -2694,11 +2639,7 @@ def resolve_api_key_provider_credentials(provider_id: str) -> Dict[str, Any]:
if pconfig.base_url_env_var:
env_url = os.getenv(pconfig.base_url_env_var, "").strip()
active_model = _current_model_for_provider(provider_id)
if provider_id in {VOLCENGINE_PROVIDER, BYTEPLUS_PROVIDER}:
base_url = base_url_for_provider_model(provider_id, active_model) or pconfig.inference_base_url
elif provider_id in ("kimi-coding", "kimi-coding-cn"):
if provider_id in ("kimi-coding", "kimi-coding-cn"):
base_url = _resolve_kimi_base_url(api_key, pconfig.inference_base_url, env_url)
elif provider_id == "zai":
base_url = _resolve_zai_base_url(api_key, pconfig.inference_base_url, env_url)
-59
@@ -893,34 +893,6 @@ DEFAULT_CONFIG = {
"force_ipv4": False,
},
# Session storage — controls automatic cleanup of ~/.hermes/state.db.
# state.db accumulates every session, message, tool call, and FTS5 index
# entry forever. Without auto-pruning, a heavy user (gateway + cron)
# reports 384MB+ databases with 68K+ messages, which slows down FTS5
# inserts, /resume listing, and insights queries.
"sessions": {
# When true, prune ended sessions older than retention_days once
# per (roughly) min_interval_hours at CLI/gateway/cron startup.
# Only touches ended sessions — active sessions are always preserved.
# Default false: session history is valuable for search recall, and
# silently deleting it could surprise users. Opt in explicitly.
"auto_prune": False,
# How many days of ended-session history to keep. Matches the
# default of ``hermes sessions prune``.
"retention_days": 90,
# VACUUM after a prune that actually deleted rows. SQLite does not
# reclaim disk space on DELETE — freed pages are just reused on
# subsequent INSERTs — so without VACUUM the file stays bloated
# even after pruning. VACUUM blocks writes for a few seconds per
# 100MB, so it only runs at startup, and only when prune deleted
# ≥1 session.
"vacuum_after_prune": True,
# Minimum hours between auto-maintenance runs (avoids repeating
# the sweep on every CLI invocation). Tracked via state_meta in
# state.db itself, so it's shared across all processes.
"min_interval_hours": 24,
},
# Config schema version - bump this when adding new required fields
"_config_version": 22,
}
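The removed `sessions` comments explain that SQLite does not shrink the file on `DELETE`, so a `VACUUM` is run only when the prune actually removed rows. A minimal sketch of that prune-then-vacuum step, assuming a simplified `sessions(id, ended_at)` table; the schema and the name `prune_and_vacuum` are illustrative and do not reflect the real `state.db` layout.

```python
import sqlite3
import time


def prune_and_vacuum(conn, retention_days, vacuum=True, now=None):
    """Delete ended sessions older than the cutoff; VACUUM only if rows were removed."""
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400
    cur = conn.execute(
        # Active sessions (ended_at IS NULL) are never touched.
        "DELETE FROM sessions WHERE ended_at IS NOT NULL AND ended_at < ?",
        (cutoff,),
    )
    deleted = cur.rowcount
    conn.commit()  # VACUUM cannot run inside an open transaction
    if vacuum and deleted > 0:
        conn.execute("VACUUM")
    return deleted
```

Skipping `VACUUM` when nothing was deleted avoids rewriting the whole database file for no gain.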
@@ -1078,22 +1050,6 @@ OPTIONAL_ENV_VARS = {
"category": "provider",
"advanced": True,
},
"STEPFUN_API_KEY": {
"description": "StepFun Step Plan API key",
"prompt": "StepFun Step Plan API key",
"url": "https://platform.stepfun.com/",
"password": True,
"category": "provider",
"advanced": True,
},
"STEPFUN_BASE_URL": {
"description": "StepFun Step Plan base URL override",
"prompt": "StepFun Step Plan base URL (leave empty for default)",
"url": None,
"password": False,
"category": "provider",
"advanced": True,
},
"ARCEEAI_API_KEY": {
"description": "Arcee AI API key",
"prompt": "Arcee AI API key",
@@ -1281,20 +1237,6 @@ OPTIONAL_ENV_VARS = {
"category": "provider",
"advanced": True,
},
"VOLCENGINE_API_KEY": {
"description": "Volcengine API key for Doubao / Seed models (standard + Coding Plan catalogs)",
"prompt": "Volcengine API Key",
"url": "https://www.volcengine.com/product/ark",
"password": True,
"category": "provider",
},
"BYTEPLUS_API_KEY": {
"description": "BytePlus API key for Seed / Dola models (standard + Coding Plan catalogs)",
"prompt": "BytePlus API Key",
"url": "https://www.byteplus.com/en/product/modelark",
"password": True,
"category": "provider",
},
"AWS_REGION": {
"description": "AWS region for Bedrock API calls (e.g. us-east-1, eu-central-1)",
"prompt": "AWS Region",
@@ -2160,7 +2102,6 @@ _KNOWN_ROOT_KEYS = {
"fallback_providers", "credential_pool_strategies", "toolsets",
"agent", "terminal", "display", "compression", "delegation",
"auxiliary", "custom_providers", "context", "memory", "gateway",
"sessions",
}
# Valid fields inside a custom_providers list entry
-1
@@ -912,7 +912,6 @@ def run_doctor(args):
_apikey_providers = [
("Z.AI / GLM", ("GLM_API_KEY", "ZAI_API_KEY", "Z_AI_API_KEY"), "https://api.z.ai/api/paas/v4/models", "GLM_BASE_URL", True),
("Kimi / Moonshot", ("KIMI_API_KEY",), "https://api.moonshot.ai/v1/models", "KIMI_BASE_URL", True),
("StepFun Step Plan", ("STEPFUN_API_KEY",), "https://api.stepfun.ai/step_plan/v1/models", "STEPFUN_BASE_URL", True),
("Kimi / Moonshot (China)", ("KIMI_CN_API_KEY",), "https://api.moonshot.cn/v1/models", None, True),
("Arcee AI", ("ARCEEAI_API_KEY",), "https://api.arcee.ai/api/v1/models", "ARCEE_BASE_URL", True),
("DeepSeek", ("DEEPSEEK_API_KEY",), "https://api.deepseek.com/v1/models", "DEEPSEEK_BASE_URL", True),
-2
@@ -160,8 +160,6 @@ def load_hermes_dotenv(
# Fix corrupted .env files before python-dotenv parses them (#8908).
if user_env.exists():
_sanitize_env_file_if_needed(user_env)
if project_env_path and project_env_path.exists():
_sanitize_env_file_if_needed(project_env_path)
if user_env.exists():
_load_dotenv_with_fallback(user_env, override=True)
+104 -118
@@ -2639,120 +2639,9 @@ def _setup_dingtalk():
def _setup_wecom():
"""Interactive setup for WeCom — scan QR code or manual credential input."""
print()
print(color(" ─── 💬 WeCom (Enterprise WeChat) Setup ───", Colors.CYAN))
existing_bot_id = get_env_value("WECOM_BOT_ID")
existing_secret = get_env_value("WECOM_SECRET")
if existing_bot_id and existing_secret:
print()
print_success("WeCom is already configured.")
if not prompt_yes_no(" Reconfigure WeCom?", False):
return
# ── Choose setup method ──
print()
method_choices = [
"Scan QR code to obtain Bot ID and Secret automatically (recommended)",
"Enter existing Bot ID and Secret manually",
]
method_idx = prompt_choice(" How would you like to set up WeCom?", method_choices, 0)
bot_id = None
secret = None
if method_idx == 0:
# ── QR scan flow ──
try:
from gateway.platforms.wecom import qr_scan_for_bot_info
except Exception as exc:
print_error(f" WeCom QR scan import failed: {exc}")
qr_scan_for_bot_info = None
if qr_scan_for_bot_info is not None:
try:
credentials = qr_scan_for_bot_info()
except KeyboardInterrupt:
print()
print_warning(" WeCom setup cancelled.")
return
except Exception as exc:
print_warning(f" QR scan failed: {exc}")
credentials = None
if credentials:
bot_id = credentials.get("bot_id", "")
secret = credentials.get("secret", "")
print_success(" ✔ QR scan successful! Bot ID and Secret obtained.")
if not bot_id or not secret:
print_info(" QR scan did not complete. Continuing with manual input.")
bot_id = None
secret = None
# ── Manual credential input ──
if not bot_id or not secret:
print()
print_info(" 1. Go to WeCom Application → Workspace → Smart Robot → Create smart robots")
print_info(" 2. Select API Mode")
print_info(" 3. Copy the Bot ID and Secret from the bot's credentials info")
print_info(" 4. The bot connects via WebSocket — no public endpoint needed")
print()
bot_id = prompt(" Bot ID", password=False)
if not bot_id:
print_warning(" Skipped — WeCom won't work without a Bot ID.")
return
secret = prompt(" Secret", password=True)
if not secret:
print_warning(" Skipped — WeCom won't work without a Secret.")
return
# ── Save core credentials ──
save_env_value("WECOM_BOT_ID", bot_id)
save_env_value("WECOM_SECRET", secret)
# ── Allowed users (deny-by-default security) ──
print()
print_info(" The gateway DENIES all users by default for security.")
print_info(" Enter user IDs to create an allowlist, or leave empty.")
allowed = prompt(" Allowed user IDs (comma-separated, or empty)", password=False)
if allowed:
cleaned = allowed.replace(" ", "")
save_env_value("WECOM_ALLOWED_USERS", cleaned)
print_success(" Saved — only these users can interact with the bot.")
else:
print()
access_choices = [
"Enable open access (anyone can message the bot)",
"Use DM pairing (unknown users request access, you approve with 'hermes pairing approve')",
"Disable direct messages",
"Skip for now (bot will deny all users until configured)",
]
access_idx = prompt_choice(" How should unauthorized users be handled?", access_choices, 1)
if access_idx == 0:
save_env_value("WECOM_DM_POLICY", "open")
save_env_value("GATEWAY_ALLOW_ALL_USERS", "true")
print_warning(" Open access enabled — anyone can use your bot!")
elif access_idx == 1:
save_env_value("WECOM_DM_POLICY", "pairing")
print_success(" DM pairing mode — users will receive a code to request access.")
print_info(" Approve with: hermes pairing approve <platform> <code>")
elif access_idx == 2:
save_env_value("WECOM_DM_POLICY", "disabled")
print_warning(" Direct messages disabled.")
else:
print_info(" Skipped — configure later with 'hermes gateway setup'")
# ── Home channel (optional) ──
print()
print_info(" Chat ID for scheduled results and notifications.")
home = prompt(" Home chat ID (optional, for cron/notifications)", password=False)
if home:
save_env_value("WECOM_HOME_CHANNEL", home)
print_success(f" Home channel set to {home}")
print()
print_success("💬 WeCom configured!")
"""Configure WeCom (Enterprise WeChat) via the standard platform setup."""
wecom_platform = next(p for p in _PLATFORMS if p["key"] == "wecom")
_setup_standard_platform(wecom_platform)
def _is_service_installed() -> bool:
@@ -3132,8 +3021,7 @@ def _setup_qqbot():
if method_idx == 0:
# ── QR scan-to-configure ──
try:
from gateway.platforms.qqbot import qr_register
credentials = qr_register()
credentials = _qqbot_qr_flow()
except KeyboardInterrupt:
print()
print_warning(" QQ Bot setup cancelled.")
@@ -3215,6 +3103,106 @@ def _setup_qqbot():
print_info(f" App ID: {credentials['app_id']}")
def _qqbot_render_qr(url: str) -> bool:
"""Try to render a QR code in the terminal. Returns True if successful."""
try:
import qrcode as _qr
qr = _qr.QRCode(border=1, error_correction=_qr.constants.ERROR_CORRECT_L)
qr.add_data(url)
qr.make(fit=True)
qr.print_ascii(invert=True)
return True
except Exception:
return False
def _qqbot_qr_flow():
"""Run the QR-code scan-to-configure flow.
Returns a dict with app_id, client_secret, user_openid on success,
or None on failure/cancel.
"""
try:
from gateway.platforms.qqbot import (
create_bind_task, poll_bind_result, build_connect_url,
decrypt_secret, BindStatus,
)
from gateway.platforms.qqbot.constants import ONBOARD_POLL_INTERVAL
except Exception as exc:
print_error(f" QQBot onboard import failed: {exc}")
return None
import asyncio
import time
MAX_REFRESHES = 3
refresh_count = 0
while refresh_count <= MAX_REFRESHES:
loop = asyncio.new_event_loop()
# ── Create bind task ──
try:
task_id, aes_key = loop.run_until_complete(create_bind_task())
except Exception as e:
print_warning(f" Failed to create bind task: {e}")
loop.close()
return None
url = build_connect_url(task_id)
# ── Display QR code + URL ──
print()
if _qqbot_render_qr(url):
print(f" Scan the QR code above, or open this URL directly:\n {url}")
else:
print(f" Open this URL in QQ on your phone:\n {url}")
print_info(" Tip: pip install qrcode to show a scannable QR code here")
# ── Poll loop (silent — keep QR visible at bottom) ──
try:
while True:
try:
status, app_id, encrypted_secret, user_openid = loop.run_until_complete(
poll_bind_result(task_id)
)
except Exception:
time.sleep(ONBOARD_POLL_INTERVAL)
continue
if status == BindStatus.COMPLETED:
client_secret = decrypt_secret(encrypted_secret, aes_key)
print()
print_success(f" QR scan complete! (App ID: {app_id})")
if user_openid:
print_info(f" Scanner's OpenID: {user_openid}")
return {
"app_id": app_id,
"client_secret": client_secret,
"user_openid": user_openid,
}
if status == BindStatus.EXPIRED:
refresh_count += 1
if refresh_count > MAX_REFRESHES:
print()
print_warning(f" QR code expired {MAX_REFRESHES} times — giving up.")
return None
print()
print_warning(f" QR code expired, refreshing... ({refresh_count}/{MAX_REFRESHES})")
loop.close()
break # outer while creates a new task
time.sleep(ONBOARD_POLL_INTERVAL)
except KeyboardInterrupt:
loop.close()
raise
finally:
loop.close()
return None
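The control flow of `_qqbot_qr_flow` is easier to see with the network and asyncio parts removed: poll until the bind completes, and on expiry create a fresh task, giving up after a bounded number of refreshes. The sketch below reproduces only that shape. `Status`, `run_bind_flow`, and the injected `create_task` / `poll` callables are hypothetical stand-ins, not the `gateway.platforms.qqbot` API.

```python
from enum import Enum


class Status(Enum):  # hypothetical stand-in for BindStatus
    PENDING = 0
    COMPLETED = 1
    EXPIRED = 2


def run_bind_flow(create_task, poll, max_refreshes=3, sleep=lambda: None):
    """Poll until COMPLETED; on EXPIRED start a new task, up to max_refreshes times."""
    refreshes = 0
    while True:
        task_id = create_task()
        while True:
            status = poll(task_id)
            if status is Status.COMPLETED:
                return task_id
            if status is Status.EXPIRED:
                break  # leave the poll loop so a new task is created
            sleep()
        refreshes += 1
        if refreshes > max_refreshes:
            return None
```

With `max_refreshes=3` this creates at most four tasks (the initial one plus three refreshes), which is the same bound as the `refresh_count` check in the function above.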
def _setup_signal():
"""Interactive setup for Signal messenger."""
import shutil
@@ -3402,8 +3390,6 @@ def gateway_setup():
_setup_feishu()
elif platform["key"] == "qqbot":
_setup_qqbot()
elif platform["key"] == "wecom":
_setup_wecom()
else:
_setup_standard_platform(platform)
+4 -210
@@ -1566,12 +1566,8 @@ def select_provider_and_model(args=None):
_model_flow_anthropic(config, current_model)
elif selected_provider == "kimi-coding":
_model_flow_kimi(config, current_model)
elif selected_provider == "stepfun":
_model_flow_stepfun(config, current_model)
elif selected_provider == "bedrock":
_model_flow_bedrock(config, current_model)
elif selected_provider in ("volcengine", "byteplus"):
_model_flow_contract_provider(config, selected_provider, current_model)
elif selected_provider in (
"gemini",
"deepseek",
@@ -1956,7 +1952,7 @@ def _aux_flow_custom_endpoint(task: str, task_cfg: dict) -> None:
print(f"{display_name}: custom ({short_url})" + (f" · {model}" if model else ""))
def _prompt_provider_choice(choices, *, default=0, title="Select provider:"):
def _prompt_provider_choice(choices, *, default=0):
"""Show provider selection menu with curses arrow-key navigation.
Falls back to a numbered list when curses is unavailable (e.g. piped
@@ -1965,7 +1961,8 @@ def _prompt_provider_choice(choices, *, default=0, title="Select provider:"):
"""
try:
from hermes_cli.setup import _curses_prompt_choice
idx = _curses_prompt_choice(title, choices, default)
idx = _curses_prompt_choice("Select provider:", choices, default)
if idx >= 0:
print()
return idx
@@ -1973,7 +1970,7 @@ def _prompt_provider_choice(choices, *, default=0, title="Select provider:"):
pass
# Fallback: numbered list
print(title)
print("Select provider:")
for i, c in enumerate(choices, 1):
marker = "" if i - 1 == default else " "
print(f" {marker} {i}. {c}")
@@ -2945,10 +2942,6 @@ def _model_flow_named_custom(config, provider_info):
# Curated model lists for direct API-key providers — single source in models.py
from hermes_cli.models import _PROVIDER_MODELS
from hermes_cli.volcengine_byteplus import (
base_url_for_provider_model,
provider_models,
)
def _current_reasoning_effort(config) -> str:
@@ -3469,140 +3462,6 @@ def _model_flow_kimi(config, current_model=""):
print("No change.")
def _infer_stepfun_region(base_url: str) -> str:
"""Infer the current StepFun region from the configured endpoint."""
normalized = (base_url or "").strip().lower()
if "api.stepfun.com" in normalized:
return "china"
return "international"
def _stepfun_base_url_for_region(region: str) -> str:
from hermes_cli.auth import (
STEPFUN_STEP_PLAN_CN_BASE_URL,
STEPFUN_STEP_PLAN_INTL_BASE_URL,
)
return (
STEPFUN_STEP_PLAN_CN_BASE_URL
if region == "china"
else STEPFUN_STEP_PLAN_INTL_BASE_URL
)
def _model_flow_stepfun(config, current_model=""):
"""StepFun Step Plan flow with region-specific endpoints."""
from hermes_cli.auth import (
PROVIDER_REGISTRY,
_prompt_model_selection,
_save_model_choice,
deactivate_provider,
)
from hermes_cli.config import get_env_value, save_env_value, load_config, save_config
from hermes_cli.models import fetch_api_models
provider_id = "stepfun"
pconfig = PROVIDER_REGISTRY[provider_id]
key_env = pconfig.api_key_env_vars[0] if pconfig.api_key_env_vars else ""
base_url_env = pconfig.base_url_env_var or ""
existing_key = ""
for ev in pconfig.api_key_env_vars:
existing_key = get_env_value(ev) or os.getenv(ev, "")
if existing_key:
break
if not existing_key:
print(f"No {pconfig.name} API key configured.")
if key_env:
try:
import getpass
new_key = getpass.getpass(f"{key_env} (or Enter to cancel): ").strip()
except (KeyboardInterrupt, EOFError):
print()
return
if not new_key:
print("Cancelled.")
return
save_env_value(key_env, new_key)
existing_key = new_key
print("API key saved.")
print()
else:
print(f" {pconfig.name} API key: {existing_key[:8]}... ✓")
print()
current_base = ""
if base_url_env:
current_base = get_env_value(base_url_env) or os.getenv(base_url_env, "")
if not current_base:
model_cfg = config.get("model")
if isinstance(model_cfg, dict):
current_base = str(model_cfg.get("base_url") or "").strip()
current_region = _infer_stepfun_region(current_base or pconfig.inference_base_url)
region_choices = [
("international", f"International ({_stepfun_base_url_for_region('international')})"),
("china", f"China ({_stepfun_base_url_for_region('china')})"),
]
ordered_regions = []
for region_key, label in region_choices:
if region_key == current_region:
ordered_regions.insert(0, (region_key, f"{label} ← currently active"))
else:
ordered_regions.append((region_key, label))
ordered_regions.append(("cancel", "Cancel"))
region_idx = _prompt_provider_choice([label for _, label in ordered_regions])
if region_idx is None or ordered_regions[region_idx][0] == "cancel":
print("No change.")
return
selected_region = ordered_regions[region_idx][0]
effective_base = _stepfun_base_url_for_region(selected_region)
if base_url_env:
save_env_value(base_url_env, effective_base)
live_models = fetch_api_models(existing_key, effective_base)
if live_models:
model_list = live_models
print(f" Found {len(model_list)} model(s) from {pconfig.name} API")
else:
model_list = _PROVIDER_MODELS.get(provider_id, [])
if model_list:
print(
f" Could not auto-detect models from {pconfig.name} API — "
"showing Step Plan fallback catalog."
)
if model_list:
selected = _prompt_model_selection(model_list, current_model=current_model)
else:
try:
selected = input("Model name: ").strip()
except (KeyboardInterrupt, EOFError):
selected = None
if selected:
_save_model_choice(selected)
cfg = load_config()
model = cfg.get("model")
if not isinstance(model, dict):
model = {"default": model} if model else {}
cfg["model"] = model
model["provider"] = provider_id
model["base_url"] = effective_base
model.pop("api_mode", None)
save_config(cfg)
deactivate_provider()
config["model"] = dict(model)
print(f"Default model set to: {selected} (via {pconfig.name})")
else:
print("No change.")
def _model_flow_bedrock_api_key(config, region, current_model=""):
"""Bedrock API Key mode — uses the OpenAI-compatible bedrock-mantle endpoint.
@@ -4038,70 +3897,6 @@ def _model_flow_api_key_provider(config, provider_id, current_model=""):
print("No change.")
def _model_flow_contract_provider(config, provider_id, current_model=""):
"""Provider flow for Volcengine / BytePlus contract-backed catalogs."""
from hermes_cli.auth import (
PROVIDER_REGISTRY,
_prompt_model_selection,
_save_model_choice,
deactivate_provider,
)
from hermes_cli.config import get_env_value, load_config, save_config, save_env_value
pconfig = PROVIDER_REGISTRY[provider_id]
key_env = pconfig.api_key_env_vars[0] if pconfig.api_key_env_vars else ""
existing_key = ""
for env_var in pconfig.api_key_env_vars:
existing_key = get_env_value(env_var) or os.getenv(env_var, "")
if existing_key:
break
if not existing_key:
print(f"No {pconfig.name} API key configured.")
if key_env:
try:
import getpass
new_key = getpass.getpass(f"{key_env} (or Enter to cancel): ").strip()
except (KeyboardInterrupt, EOFError):
print()
return
if not new_key:
print("Cancelled.")
return
save_env_value(key_env, new_key)
print("API key saved.")
print()
else:
print(f" {pconfig.name} API key: {existing_key[:8]}... ✓")
print()
model_list = provider_models(provider_id)
if not model_list:
print(f"No curated model catalog found for {pconfig.name}.")
return
selected = _prompt_model_selection(model_list, current_model=current_model)
if not selected:
print("No change.")
return
_save_model_choice(selected)
cfg = load_config()
model = cfg.get("model")
if not isinstance(model, dict):
model = {"default": model} if model else {}
cfg["model"] = model
model["provider"] = provider_id
model["base_url"] = base_url_for_provider_model(provider_id, selected)
model.pop("api_mode", None)
save_config(cfg)
deactivate_provider()
print(f"Default model set to: {selected} (via {pconfig.name})")
def _run_anthropic_oauth_flow(save_env_value):
"""Run the Claude OAuth setup-token flow. Returns True if credentials were saved."""
from agent.anthropic_adapter import (
@@ -6735,7 +6530,6 @@ For more help on a command:
"zai",
"kimi-coding",
"kimi-coding-cn",
"stepfun",
"minimax",
"minimax-cn",
"kilocode",
+1 -2
@@ -97,8 +97,6 @@ _MATCHING_PREFIX_STRIP_PROVIDERS: frozenset[str] = frozenset({
"xiaomi",
"arcee",
"ollama-cloud",
"volcengine",
"byteplus",
"custom",
})
@@ -425,3 +423,4 @@ def normalize_model_for_provider(model_input: str, target_provider: str) -> str:
# ---------------------------------------------------------------------------
# Batch / convenience helpers
# ---------------------------------------------------------------------------
+1 -1
@@ -143,7 +143,7 @@ MODEL_ALIASES: dict[str, ModelIdentity] = {
# Z.AI / GLM
"glm": ModelIdentity("z-ai", "glm"),
# Step Plan (StepFun)
# StepFun
"step": ModelIdentity("stepfun", "step"),
# Xiaomi
+3 -58
@@ -22,12 +22,6 @@ from hermes_cli import __version__ as _HERMES_VERSION
# Check (error 1010) don't reject the default ``Python-urllib/*`` signature.
_HERMES_USER_AGENT = f"hermes-cli/{_HERMES_VERSION}"
from hermes_cli.volcengine_byteplus import (
BYTEPLUS_PROVIDER,
VOLCENGINE_PROVIDER,
provider_models,
)
COPILOT_BASE_URL = "https://api.githubcopilot.com"
COPILOT_MODELS_URL = f"{COPILOT_BASE_URL}/models"
COPILOT_EDITOR_VERSION = "vscode/1.104.1"
@@ -216,10 +210,6 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"kimi-k2-turbo-preview",
"kimi-k2-0905-preview",
],
"stepfun": [
"step-3.5-flash",
"step-3.5-flash-2603",
],
"moonshot": [
"kimi-k2.6",
"kimi-k2.5",
@@ -362,8 +352,6 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"us.meta.llama4-maverick-17b-instruct-v1:0",
"us.meta.llama4-scout-17b-instruct-v1:0",
],
VOLCENGINE_PROVIDER: provider_models(VOLCENGINE_PROVIDER),
BYTEPLUS_PROVIDER: provider_models(BYTEPLUS_PROVIDER),
}
# Vercel AI Gateway: derive the bare-model-id catalog from the curated
@@ -698,8 +686,6 @@ CANONICAL_PROVIDERS: list[ProviderEntry] = [
ProviderEntry("ai-gateway", "Vercel AI Gateway", "Vercel AI Gateway (200+ models, $5 free credit, no markup)"),
ProviderEntry("anthropic", "Anthropic", "Anthropic (Claude models — API key or Claude Code)"),
ProviderEntry("openai-codex", "OpenAI Codex", "OpenAI Codex"),
ProviderEntry(VOLCENGINE_PROVIDER, "Volcengine", "Volcengine (standard + Coding Plan catalogs)"),
ProviderEntry(BYTEPLUS_PROVIDER, "BytePlus", "BytePlus (standard + Coding Plan catalogs)"),
ProviderEntry("xiaomi", "Xiaomi MiMo", "Xiaomi MiMo (MiMo-V2 models — pro, omni, flash)"),
ProviderEntry("nvidia", "NVIDIA NIM", "NVIDIA NIM (Nemotron models — build.nvidia.com or local NIM)"),
ProviderEntry("qwen-oauth", "Qwen OAuth (Portal)", "Qwen OAuth (reuses local Qwen CLI login)"),
@@ -713,7 +699,6 @@ CANONICAL_PROVIDERS: list[ProviderEntry] = [
ProviderEntry("zai", "Z.AI / GLM", "Z.AI / GLM (Zhipu AI direct API)"),
ProviderEntry("kimi-coding", "Kimi / Kimi Coding Plan", "Kimi Coding Plan (api.kimi.com) & Moonshot API"),
ProviderEntry("kimi-coding-cn", "Kimi / Moonshot (China)", "Kimi / Moonshot China (Moonshot CN direct API)"),
ProviderEntry("stepfun", "StepFun Step Plan", "StepFun Step Plan (agent/coding models via Step Plan API)"),
ProviderEntry("minimax", "MiniMax", "MiniMax (global direct API)"),
ProviderEntry("minimax-cn", "MiniMax (China)", "MiniMax China (domestic direct API)"),
ProviderEntry("alibaba", "Alibaba Cloud (DashScope)","Alibaba Cloud / DashScope Coding (Qwen + multi-provider)"),
@@ -729,6 +714,7 @@ CANONICAL_PROVIDERS: list[ProviderEntry] = [
_PROVIDER_LABELS = {p.slug: p.label for p in CANONICAL_PROVIDERS}
_PROVIDER_LABELS["custom"] = "Custom endpoint" # special case: not a named provider
_PROVIDER_ALIASES = {
"glm": "zai",
"z-ai": "zai",
@@ -747,8 +733,6 @@ _PROVIDER_ALIASES = {
"moonshot": "kimi-coding",
"kimi-cn": "kimi-coding-cn",
"moonshot-cn": "kimi-coding-cn",
"step": "stepfun",
"stepfun-coding-plan": "stepfun",
"arcee-ai": "arcee",
"arceeai": "arcee",
"minimax-china": "minimax-cn",
@@ -791,10 +775,6 @@ _PROVIDER_ALIASES = {
"nemotron": "nvidia",
"ollama": "custom", # bare "ollama" = local; use "ollama-cloud" for cloud
"ollama_cloud": "ollama-cloud",
"volcengine-coding-plan": VOLCENGINE_PROVIDER,
"volcengine_coding_plan": VOLCENGINE_PROVIDER,
"byteplus-coding-plan": BYTEPLUS_PROVIDER,
"byteplus_coding_plan": BYTEPLUS_PROVIDER,
}
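For readers unfamiliar with how an alias table like `_PROVIDER_ALIASES` is typically consumed, here is a small sketch of two operations: normalizing a user-supplied name to its canonical slug, and building the reverse map that `list_available_providers` starts to construct in the next hunk. The `ALIASES` excerpt is copied from entries visible in this diff; `normalize` and `reverse_aliases` are illustrative names, and the real `normalize_provider` may behave differently.

```python
ALIASES = {  # small excerpt of entries visible in the diff
    "glm": "zai",
    "z-ai": "zai",
    "moonshot": "kimi-coding",
    "kimi-cn": "kimi-coding-cn",
}


def normalize(name):
    """Map an alias to its canonical slug; unknown names pass through lowercased."""
    key = name.strip().lower()
    return ALIASES.get(key, key)


def reverse_aliases(aliases):
    """Group aliases by canonical slug, preserving insertion order."""
    out = {}
    for alias, canonical in aliases.items():
        out.setdefault(canonical, []).append(alias)
    return out
```

Removing a provider therefore means removing both its registry entry and every alias that points at it, which is what the paired deletions in `resolve_provider` and `_PROVIDER_ALIASES` do for `stepfun`, `volcengine`, and `byteplus`.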
@@ -1255,6 +1235,7 @@ def list_available_providers() -> list[dict[str, str]]:
"""
# Derive display order from canonical list + custom
provider_order = [p.slug for p in CANONICAL_PROVIDERS] + ["custom"]
# Build reverse alias map
aliases_for: dict[str, list[str]] = {}
for alias, canonical in _PROVIDER_ALIASES.items():
@@ -1270,7 +1251,7 @@ def list_available_providers() -> list[dict[str, str]]:
from hermes_cli.auth import get_auth_status, has_usable_secret
if pid == "custom":
custom_base_url = _get_custom_base_url() or ""
has_creds = bool(custom_base_url.strip()) and provider_for_base_url(custom_base_url) is None
has_creds = bool(custom_base_url.strip())
elif pid == "openrouter":
has_creds = has_usable_secret(os.getenv("OPENROUTER_API_KEY", ""))
else:
@@ -1336,29 +1317,6 @@ def _get_custom_base_url() -> str:
return ""
def provider_for_base_url(base_url: str) -> Optional[str]:
"""Return a known built-in provider for a configured base URL, if any.
Uses the canonical _URL_TO_PROVIDER mapping from model_metadata plus
additional entries for providers not in that dict.
"""
normalized = str(base_url or "").strip().rstrip("/")
if not normalized or "openrouter.ai" in normalized.lower():
return None
url_lower = normalized.lower()
# Primary source — shared with context-length resolution
from agent.model_metadata import _URL_TO_PROVIDER
for host, provider_id in _URL_TO_PROVIDER.items():
if host in url_lower:
canonical = normalize_provider(provider_id)
if canonical in _PROVIDER_LABELS and canonical != "custom":
return canonical
return None
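Note on the helper above: a minimal sketch of the host-substring lookup it performs, assuming a small stand-in mapping. `URL_TO_PROVIDER` and `KNOWN_PROVIDERS` here are hypothetical placeholders for `_URL_TO_PROVIDER` and `_PROVIDER_LABELS`, whose real contents are not shown in this diff.

```python
from typing import Optional

# Hypothetical stand-ins; the real tables live in agent.model_metadata
# and hermes_cli.models and are not reproduced here.
URL_TO_PROVIDER = {
    "api.anthropic.com": "anthropic",
    "api.openai.com": "openai",
}
KNOWN_PROVIDERS = {"anthropic", "openai"}


def provider_for_base_url(base_url: str) -> Optional[str]:
    """Map a configured base URL to a built-in provider slug, if any."""
    normalized = str(base_url or "").strip().rstrip("/")
    # OpenRouter URLs are deliberately never claimed by this lookup.
    if not normalized or "openrouter.ai" in normalized.lower():
        return None
    url_lower = normalized.lower()
    for host, provider_id in URL_TO_PROVIDER.items():
        # Substring match: any URL containing the host counts as that provider.
        if host in url_lower and provider_id in KNOWN_PROVIDERS:
            return provider_id
    return None
```

With the helper removed, the `custom` branch in `list_available_providers` appears to treat any non-empty custom base URL as usable credentials, including URLs that point at a built-in provider's host.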
def curated_models_for_provider(
provider: Optional[str],
*,
@@ -1655,19 +1613,6 @@ def provider_model_ids(provider: Optional[str], *, force_refresh: bool = False)
return live
except Exception:
pass
if normalized == "stepfun":
try:
from hermes_cli.auth import resolve_api_key_provider_credentials
creds = resolve_api_key_provider_credentials("stepfun")
api_key = str(creds.get("api_key") or "").strip()
base_url = str(creds.get("base_url") or "").strip()
if api_key and base_url:
live = fetch_api_models(api_key, base_url)
if live:
return live
except Exception:
pass
if normalized == "anthropic":
live = _fetch_anthropic_models()
if live:
-24
@@ -734,30 +734,6 @@ class PluginManager:
)
kind = "standalone"
# Auto-coerce user-installed memory providers to kind="exclusive"
# so they're routed to plugins/memory discovery instead of being
# loaded by the general PluginManager (which has no
# register_memory_provider on PluginContext). Mirrors the
# heuristic in plugins/memory/__init__.py:_is_memory_provider_dir.
# Bundled memory providers are already skipped via skip_names.
if kind == "standalone" and "kind" not in data:
init_file = plugin_dir / "__init__.py"
if init_file.exists():
try:
source_text = init_file.read_text(errors="replace")[:8192]
if (
"register_memory_provider" in source_text
or "MemoryProvider" in source_text
):
kind = "exclusive"
logger.debug(
"Plugin %s: detected memory provider, "
"treating as kind='exclusive'",
key,
)
except Exception:
pass
return PluginManifest(
name=name,
version=str(data.get("version", "")),
-33
@@ -23,12 +23,6 @@ import logging
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple
from hermes_cli.volcengine_byteplus import (
BYTEPLUS_PROVIDER,
BYTEPLUS_STANDARD_BASE_URL,
VOLCENGINE_PROVIDER,
VOLCENGINE_STANDARD_BASE_URL,
)
from utils import base_url_host_matches, base_url_hostname
logger = logging.getLogger(__name__)
@@ -100,12 +94,6 @@ HERMES_OVERLAYS: Dict[str, HermesOverlay] = {
transport="openai_chat",
base_url_env_var="KIMI_BASE_URL",
),
"stepfun": HermesOverlay(
transport="openai_chat",
extra_env_vars=("STEPFUN_API_KEY",),
base_url_override="https://api.stepfun.ai/step_plan/v1",
base_url_env_var="STEPFUN_BASE_URL",
),
"minimax": HermesOverlay(
transport="anthropic_messages",
base_url_env_var="MINIMAX_BASE_URL",
@@ -169,16 +157,6 @@ HERMES_OVERLAYS: Dict[str, HermesOverlay] = {
transport="openai_chat",
base_url_env_var="OLLAMA_BASE_URL",
),
VOLCENGINE_PROVIDER: HermesOverlay(
transport="openai_chat",
extra_env_vars=("VOLCENGINE_API_KEY",),
base_url_override=VOLCENGINE_STANDARD_BASE_URL,
),
BYTEPLUS_PROVIDER: HermesOverlay(
transport="openai_chat",
extra_env_vars=("BYTEPLUS_API_KEY",),
base_url_override=BYTEPLUS_STANDARD_BASE_URL,
),
}
@@ -232,10 +210,6 @@ ALIASES: Dict[str, str] = {
"kimi-coding-cn": "kimi-for-coding",
"moonshot": "kimi-for-coding",
# stepfun
"step": "stepfun",
"stepfun-coding-plan": "stepfun",
# minimax-cn
"minimax-china": "minimax-cn",
"minimax_cn": "minimax-cn",
@@ -289,10 +263,6 @@ ALIASES: Dict[str, str] = {
# xiaomi
"mimo": "xiaomi",
"xiaomi-mimo": "xiaomi",
"volcengine-coding-plan": VOLCENGINE_PROVIDER,
"volcengine_coding_plan": VOLCENGINE_PROVIDER,
"byteplus-coding-plan": BYTEPLUS_PROVIDER,
"byteplus_coding_plan": BYTEPLUS_PROVIDER,
# bedrock
"aws": "bedrock",
@@ -324,10 +294,7 @@ _LABEL_OVERRIDES: Dict[str, str] = {
"nous": "Nous Portal",
"openai-codex": "OpenAI Codex",
"copilot-acp": "GitHub Copilot ACP",
"stepfun": "StepFun Step Plan",
"xiaomi": "Xiaomi MiMo",
VOLCENGINE_PROVIDER: "Volcengine",
BYTEPLUS_PROVIDER: "BytePlus",
"local": "Local endpoint",
"bedrock": "AWS Bedrock",
"ollama-cloud": "Ollama Cloud",
+1 -1
@@ -643,7 +643,7 @@ def _resolve_explicit_runtime(
base_url = explicit_base_url
if not base_url:
if provider in ("kimi-coding", "kimi-coding-cn", "volcengine", "byteplus"):
if provider in ("kimi-coding", "kimi-coding-cn"):
creds = resolve_api_key_provider_credentials(provider)
base_url = creds.get("base_url", "").rstrip("/")
else:
-2
@@ -96,7 +96,6 @@ _DEFAULT_PROVIDER_MODELS = {
"zai": ["glm-5.1", "glm-5", "glm-4.7", "glm-4.5", "glm-4.5-flash"],
"kimi-coding": ["kimi-k2.6", "kimi-k2.5", "kimi-k2-thinking", "kimi-k2-turbo-preview"],
"kimi-coding-cn": ["kimi-k2.6", "kimi-k2.5", "kimi-k2-thinking", "kimi-k2-turbo-preview"],
"stepfun": ["step-3.5-flash", "step-3.5-flash-2603"],
"arcee": ["trinity-large-thinking", "trinity-large-preview", "trinity-mini"],
"minimax": ["MiniMax-M2.7", "MiniMax-M2.5", "MiniMax-M2.1", "MiniMax-M2"],
"minimax-cn": ["MiniMax-M2.7", "MiniMax-M2.5", "MiniMax-M2.1", "MiniMax-M2"],
@@ -805,7 +804,6 @@ def setup_model_provider(config: dict, *, quick: bool = False):
"zai": "Z.AI / GLM",
"kimi-coding": "Kimi / Moonshot",
"kimi-coding-cn": "Kimi / Moonshot (China)",
"stepfun": "StepFun Step Plan",
"minimax": "MiniMax",
"minimax-cn": "MiniMax CN",
"anthropic": "Anthropic",
-2
@@ -122,7 +122,6 @@ def show_status(args):
"OpenAI": "OPENAI_API_KEY",
"Z.AI/GLM": "GLM_API_KEY",
"Kimi": "KIMI_API_KEY",
"StepFun Step Plan": "STEPFUN_API_KEY",
"MiniMax": "MINIMAX_API_KEY",
"MiniMax-CN": "MINIMAX_CN_API_KEY",
"Firecrawl": "FIRECRAWL_API_KEY",
@@ -253,7 +252,6 @@ def show_status(args):
apikey_providers = {
"Z.AI / GLM": ("GLM_API_KEY", "ZAI_API_KEY", "Z_AI_API_KEY"),
"Kimi / Moonshot": ("KIMI_API_KEY",),
"StepFun Step Plan": ("STEPFUN_API_KEY",),
"MiniMax": ("MINIMAX_API_KEY",),
"MiniMax (China)": ("MINIMAX_CN_API_KEY",),
}
-134
@@ -1,134 +0,0 @@
"""Source-of-truth contracts for built-in providers without models.dev catalogs."""
from __future__ import annotations
from typing import Dict, List, Tuple
VOLCENGINE_PROVIDER = "volcengine"
BYTEPLUS_PROVIDER = "byteplus"
VOLCENGINE_STANDARD_BASE_URL = "https://ark.cn-beijing.volces.com/api/v3"
VOLCENGINE_CODING_PLAN_BASE_URL = "https://ark.cn-beijing.volces.com/api/coding/v3"
BYTEPLUS_STANDARD_BASE_URL = "https://ark.ap-southeast.bytepluses.com/api/v3"
BYTEPLUS_CODING_PLAN_BASE_URL = "https://ark.ap-southeast.bytepluses.com/api/coding/v3"
VOLCENGINE_STANDARD_MODELS: Tuple[str, ...] = (
"doubao-seed-2-0-pro-260215",
"doubao-seed-2-0-lite-260215",
"doubao-seed-2-0-mini-260215",
"doubao-seed-2-0-code-preview-260215",
"kimi-k2-5-260127",
"glm-4-7-251222",
"deepseek-v3-2-251201",
)
VOLCENGINE_CODING_PLAN_MODELS: Tuple[str, ...] = (
"doubao-seed-2.0-code",
"doubao-seed-2.0-pro",
"doubao-seed-2.0-lite",
"doubao-seed-code",
"minimax-m2.5",
"glm-4.7",
"deepseek-v3.2",
"kimi-k2.5",
)
BYTEPLUS_STANDARD_MODELS: Tuple[str, ...] = (
"seed-2-0-pro-260328",
"seed-2-0-lite-260228",
"seed-2-0-mini-260215",
"kimi-k2-5-260127",
"glm-4-7-251222",
)
BYTEPLUS_CODING_PLAN_MODELS: Tuple[str, ...] = (
"dola-seed-2.0-pro",
"dola-seed-2.0-lite",
"bytedance-seed-code",
"glm-4.7",
"kimi-k2.5",
"gpt-oss-120b",
)
VOLCENGINE_STANDARD_MODEL_REFS: Tuple[str, ...] = tuple(
f"{VOLCENGINE_PROVIDER}/{model_id}" for model_id in VOLCENGINE_STANDARD_MODELS
)
VOLCENGINE_CODING_PLAN_MODEL_REFS: Tuple[str, ...] = tuple(
f"{VOLCENGINE_PROVIDER}-coding-plan/{model_id}" for model_id in VOLCENGINE_CODING_PLAN_MODELS
)
BYTEPLUS_STANDARD_MODEL_REFS: Tuple[str, ...] = tuple(
f"{BYTEPLUS_PROVIDER}/{model_id}" for model_id in BYTEPLUS_STANDARD_MODELS
)
BYTEPLUS_CODING_PLAN_MODEL_REFS: Tuple[str, ...] = tuple(
f"{BYTEPLUS_PROVIDER}-coding-plan/{model_id}" for model_id in BYTEPLUS_CODING_PLAN_MODELS
)
PROVIDER_MODEL_CATALOGS: Dict[str, Tuple[str, ...]] = {
VOLCENGINE_PROVIDER: VOLCENGINE_STANDARD_MODEL_REFS + VOLCENGINE_CODING_PLAN_MODEL_REFS,
BYTEPLUS_PROVIDER: BYTEPLUS_STANDARD_MODEL_REFS + BYTEPLUS_CODING_PLAN_MODEL_REFS,
}
MODEL_CONTEXT_WINDOWS: Dict[str, int] = {
"doubao-seed-2-0-pro-260215": 256000,
"doubao-seed-2-0-lite-260215": 256000,
"doubao-seed-2-0-mini-260215": 256000,
"doubao-seed-2-0-code-preview-260215": 256000,
"kimi-k2-5-260127": 256000,
"glm-4-7-251222": 200000,
"deepseek-v3-2-251201": 128000,
"doubao-seed-2.0-code": 256000,
"doubao-seed-2.0-pro": 256000,
"doubao-seed-2.0-lite": 256000,
"doubao-seed-code": 256000,
"minimax-m2.5": 200000,
"glm-4.7": 200000,
"deepseek-v3.2": 128000,
"kimi-k2.5": 256000,
"seed-2-0-pro-260328": 256000,
"seed-2-0-lite-260228": 256000,
"seed-2-0-mini-260215": 256000,
}
def provider_models(provider_id: str) -> List[str]:
"""Return the full user-facing model catalog for a provider."""
return list(PROVIDER_MODEL_CATALOGS.get(provider_id, ()))
def _bare_model_name(model_name: str) -> str:
value = (model_name or "").strip()
if not value:
return ""
if "/" in value:
return value.split("/", 1)[1].strip()
return value
def is_coding_plan_model(provider_id: str, model_name: str) -> bool:
"""Return True when a model belongs to the coding-plan catalog."""
raw = (model_name or "").strip()
bare = _bare_model_name(raw)
if provider_id == VOLCENGINE_PROVIDER:
return raw in VOLCENGINE_CODING_PLAN_MODEL_REFS or bare in VOLCENGINE_CODING_PLAN_MODELS
if provider_id == BYTEPLUS_PROVIDER:
return raw in BYTEPLUS_CODING_PLAN_MODEL_REFS or bare in BYTEPLUS_CODING_PLAN_MODELS
return False
def base_url_for_provider_model(provider_id: str, model_name: str) -> str:
"""Resolve the source-of-truth base URL for a provider+model pair."""
if provider_id == VOLCENGINE_PROVIDER:
if is_coding_plan_model(provider_id, model_name):
return VOLCENGINE_CODING_PLAN_BASE_URL
return VOLCENGINE_STANDARD_BASE_URL
if provider_id == BYTEPLUS_PROVIDER:
if is_coding_plan_model(provider_id, model_name):
return BYTEPLUS_CODING_PLAN_BASE_URL
return BYTEPLUS_STANDARD_BASE_URL
return ""
def model_context_window(model_name: str) -> int | None:
"""Return a known context window for a model, if specified by the contract."""
bare = _bare_model_name(model_name)
return MODEL_CONTEXT_WINDOWS.get(bare)
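Note on the module above: a minimal sketch of how the per-model base URL routing works for one provider, assuming only a two-entry subset of the coding-plan catalog. `base_url_for_model` is a hypothetical simplification of `base_url_for_provider_model`; the constants are copied from the listing.

```python
VOLCENGINE_STANDARD_BASE_URL = "https://ark.cn-beijing.volces.com/api/v3"
VOLCENGINE_CODING_PLAN_BASE_URL = "https://ark.cn-beijing.volces.com/api/coding/v3"

# Subset of VOLCENGINE_CODING_PLAN_MODELS, kept short for illustration.
CODING_PLAN_MODELS = ("doubao-seed-2.0-code", "glm-4.7")


def bare_model_name(model_name: str) -> str:
    """Strip an optional 'provider/' prefix from a model reference."""
    value = (model_name or "").strip()
    if "/" in value:
        return value.split("/", 1)[1].strip()
    return value


def base_url_for_model(model_name: str) -> str:
    """Route coding-plan models to the coding endpoint, all others to standard."""
    if bare_model_name(model_name) in CODING_PLAN_MODELS:
        return VOLCENGINE_CODING_PLAN_BASE_URL
    return VOLCENGINE_STANDARD_BASE_URL
```

Because the check falls back to the bare model name, a reference such as `volcengine/glm-4.7` resolves to the coding-plan endpoint even without the `-coding-plan` prefix.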
+3 -6
@@ -2189,8 +2189,7 @@ async def get_usage_analytics(days: int = 30):
SUM(reasoning_tokens) as reasoning_tokens,
COALESCE(SUM(estimated_cost_usd), 0) as estimated_cost,
COALESCE(SUM(actual_cost_usd), 0) as actual_cost,
COUNT(*) as sessions,
SUM(COALESCE(api_call_count, 0)) as api_calls
COUNT(*) as sessions
FROM sessions WHERE started_at > ?
GROUP BY day ORDER BY day
""", (cutoff,))
@@ -2201,8 +2200,7 @@ async def get_usage_analytics(days: int = 30):
SUM(input_tokens) as input_tokens,
SUM(output_tokens) as output_tokens,
COALESCE(SUM(estimated_cost_usd), 0) as estimated_cost,
COUNT(*) as sessions,
SUM(COALESCE(api_call_count, 0)) as api_calls
COUNT(*) as sessions
FROM sessions WHERE started_at > ? AND model IS NOT NULL
GROUP BY model ORDER BY SUM(input_tokens) + SUM(output_tokens) DESC
""", (cutoff,))
@@ -2215,8 +2213,7 @@ async def get_usage_analytics(days: int = 30):
SUM(reasoning_tokens) as total_reasoning,
COALESCE(SUM(estimated_cost_usd), 0) as total_estimated_cost,
COALESCE(SUM(actual_cost_usd), 0) as total_actual_cost,
COUNT(*) as total_sessions,
SUM(COALESCE(api_call_count, 0)) as total_api_calls
COUNT(*) as total_sessions
FROM sessions WHERE started_at > ?
""", (cutoff,))
totals = dict(cur3.fetchone())
+6 -154
@@ -31,7 +31,7 @@ T = TypeVar("T")
DEFAULT_DB_PATH = get_hermes_home() / "state.db"
SCHEMA_VERSION = 8
SCHEMA_VERSION = 6
SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS schema_version (
@@ -65,7 +65,6 @@ CREATE TABLE IF NOT EXISTS sessions (
cost_source TEXT,
pricing_version TEXT,
title TEXT,
api_call_count INTEGER DEFAULT 0,
FOREIGN KEY (parent_session_id) REFERENCES sessions(id)
);
@@ -81,16 +80,10 @@ CREATE TABLE IF NOT EXISTS messages (
token_count INTEGER,
finish_reason TEXT,
reasoning TEXT,
reasoning_content TEXT,
reasoning_details TEXT,
codex_reasoning_items TEXT
);
CREATE TABLE IF NOT EXISTS state_meta (
key TEXT PRIMARY KEY,
value TEXT
);
CREATE INDEX IF NOT EXISTS idx_sessions_source ON sessions(source);
CREATE INDEX IF NOT EXISTS idx_sessions_parent ON sessions(parent_session_id);
CREATE INDEX IF NOT EXISTS idx_sessions_started ON sessions(started_at DESC);
@@ -336,26 +329,6 @@ class SessionDB:
except sqlite3.OperationalError:
pass # Column already exists
cursor.execute("UPDATE schema_version SET version = 6")
if current_version < 7:
# v7: preserve provider-native reasoning_content separately from
# normalized reasoning text. Kimi/Moonshot replay can require
# this field on assistant tool-call messages when thinking is on.
try:
cursor.execute('ALTER TABLE messages ADD COLUMN "reasoning_content" TEXT')
except sqlite3.OperationalError:
pass # Column already exists
cursor.execute("UPDATE schema_version SET version = 7")
if current_version < 8:
# v8: add api_call_count column to sessions — tracks the number
# of individual LLM API calls made within a session (as opposed
# to the session count itself).
try:
cursor.execute(
'ALTER TABLE sessions ADD COLUMN "api_call_count" INTEGER DEFAULT 0'
)
except sqlite3.OperationalError:
pass # Column already exists
cursor.execute("UPDATE schema_version SET version = 8")
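Note on the migration steps above: a minimal sketch of the "try ALTER TABLE, ignore if the column exists" pattern, run against an in-memory database. `add_column_if_missing` is a hypothetical helper name, not part of `SessionDB`.

```python
import sqlite3


def add_column_if_missing(conn: sqlite3.Connection, table: str, column: str, decl: str) -> bool:
    """Add a column; return False if SQLite rejects the ALTER TABLE."""
    try:
        conn.execute(f'ALTER TABLE {table} ADD COLUMN "{column}" {decl}')
        return True
    except sqlite3.OperationalError:
        # Raised for a duplicate column, but also for other failures
        # (for example a missing table), which this pattern hides.
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY)")

first = add_column_if_missing(conn, "sessions", "api_call_count", "INTEGER DEFAULT 0")
second = add_column_if_missing(conn, "sessions", "api_call_count", "INTEGER DEFAULT 0")
columns = [row[1] for row in conn.execute("PRAGMA table_info(sessions)")]
```

Running the step twice is harmless, which is what makes each version block safe to re-enter. The trade-off is that any `OperationalError` is swallowed, not only the duplicate-column case.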
# Unique title index — always ensure it exists (safe to run after migrations
# since the title column is guaranteed to exist at this point)
@@ -462,7 +435,6 @@ class SessionDB:
billing_provider: Optional[str] = None,
billing_base_url: Optional[str] = None,
billing_mode: Optional[str] = None,
api_call_count: int = 0,
absolute: bool = False,
) -> None:
"""Update token counters and backfill model if not already set.
@@ -492,8 +464,7 @@ class SessionDB:
billing_provider = COALESCE(billing_provider, ?),
billing_base_url = COALESCE(billing_base_url, ?),
billing_mode = COALESCE(billing_mode, ?),
model = COALESCE(model, ?),
api_call_count = ?
model = COALESCE(model, ?)
WHERE id = ?"""
else:
sql = """UPDATE sessions SET
@@ -513,8 +484,7 @@ class SessionDB:
billing_provider = COALESCE(billing_provider, ?),
billing_base_url = COALESCE(billing_base_url, ?),
billing_mode = COALESCE(billing_mode, ?),
model = COALESCE(model, ?),
api_call_count = COALESCE(api_call_count, 0) + ?
model = COALESCE(model, ?)
WHERE id = ?"""
params = (
input_tokens,
@@ -532,7 +502,6 @@ class SessionDB:
billing_base_url,
billing_mode,
model,
api_call_count,
session_id,
)
def _do(conn):
@@ -953,7 +922,6 @@ class SessionDB:
token_count: int = None,
finish_reason: str = None,
reasoning: str = None,
reasoning_content: str = None,
reasoning_details: Any = None,
codex_reasoning_items: Any = None,
) -> int:
@@ -983,8 +951,8 @@ class SessionDB:
cursor = conn.execute(
"""INSERT INTO messages (session_id, role, content, tool_call_id,
tool_calls, tool_name, timestamp, token_count, finish_reason,
reasoning, reasoning_content, reasoning_details, codex_reasoning_items)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
reasoning, reasoning_details, codex_reasoning_items)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
(
session_id,
role,
@@ -996,7 +964,6 @@ class SessionDB:
token_count,
finish_reason,
reasoning,
reasoning_content,
reasoning_details_json,
codex_items_json,
),
@@ -1047,7 +1014,7 @@ class SessionDB:
with self._lock:
cursor = self._conn.execute(
"SELECT role, content, tool_call_id, tool_calls, tool_name, "
"reasoning, reasoning_content, reasoning_details, codex_reasoning_items "
"reasoning, reasoning_details, codex_reasoning_items "
"FROM messages WHERE session_id = ? ORDER BY timestamp, id",
(session_id,),
)
@@ -1071,8 +1038,6 @@ class SessionDB:
if row["role"] == "assistant":
if row["reasoning"]:
msg["reasoning"] = row["reasoning"]
if row["reasoning_content"] is not None:
msg["reasoning_content"] = row["reasoning_content"]
if row["reasoning_details"]:
try:
msg["reasoning_details"] = json.loads(row["reasoning_details"])
@@ -1476,116 +1441,3 @@ class SessionDB:
return len(session_ids)
return self._execute_write(_do)
# ── Meta key/value (for scheduler bookkeeping) ──
def get_meta(self, key: str) -> Optional[str]:
"""Read a value from the state_meta key/value store."""
with self._lock:
row = self._conn.execute(
"SELECT value FROM state_meta WHERE key = ?", (key,)
).fetchone()
if row is None:
return None
return row["value"] if isinstance(row, sqlite3.Row) else row[0]
def set_meta(self, key: str, value: str) -> None:
"""Write a value to the state_meta key/value store."""
def _do(conn):
conn.execute(
"INSERT INTO state_meta (key, value) VALUES (?, ?) "
"ON CONFLICT(key) DO UPDATE SET value = excluded.value",
(key, value),
)
self._execute_write(_do)
# ── Space reclamation ──
def vacuum(self) -> None:
"""Run VACUUM to reclaim disk space after large deletes.
SQLite does not shrink the database file when rows are deleted;
freed pages just get reused on the next insert. After a prune that
removed hundreds of sessions, the file stays bloated unless we
explicitly VACUUM.
VACUUM rewrites the entire DB, so it's expensive (seconds per
100MB) and cannot run inside a transaction. It also acquires an
exclusive lock, so callers must ensure no other writers are
active. Safe to call at startup before the gateway/CLI starts
serving traffic.
"""
# VACUUM cannot be executed inside a transaction.
with self._lock:
# Best-effort WAL checkpoint first, then VACUUM.
try:
self._conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")
except Exception:
pass
self._conn.execute("VACUUM")
def maybe_auto_prune_and_vacuum(
self,
retention_days: int = 90,
min_interval_hours: int = 24,
vacuum: bool = True,
) -> Dict[str, Any]:
"""Idempotent auto-maintenance: prune old sessions + optional VACUUM.
Records the last run timestamp in state_meta so subsequent calls
within ``min_interval_hours`` no-op. Designed to be called once at
startup from long-lived entrypoints (CLI, gateway, cron scheduler).
Never raises. On any failure, logs a warning and returns a dict
with ``"error"`` set.
Returns a dict with keys:
- ``"skipped"`` (bool): true if within min_interval_hours of last run
- ``"pruned"`` (int): number of sessions deleted
- ``"vacuumed"`` (bool): true if VACUUM ran
- ``"error"`` (str, optional): present only on failure
"""
result: Dict[str, Any] = {"skipped": False, "pruned": 0, "vacuumed": False}
try:
# Skip if another process/call did maintenance recently.
last_raw = self.get_meta("last_auto_prune")
now = time.time()
if last_raw:
try:
last_ts = float(last_raw)
if now - last_ts < min_interval_hours * 3600:
result["skipped"] = True
return result
except (TypeError, ValueError):
pass # corrupt meta; treat as no prior run
pruned = self.prune_sessions(older_than_days=retention_days)
result["pruned"] = pruned
# Only VACUUM if we actually freed rows; VACUUM on a tight DB
# is wasted I/O. The pruned > 0 check keeps unchanged DBs from paying the cost.
if vacuum and pruned > 0:
try:
self.vacuum()
result["vacuumed"] = True
except Exception as exc:
logger.warning("state.db VACUUM failed: %s", exc)
# Record the attempt even if pruned == 0, so we don't retry
# every startup within the min_interval_hours window.
self.set_meta("last_auto_prune", str(now))
if pruned > 0:
logger.info(
"state.db auto-maintenance: pruned %d session(s) older than %d days%s",
pruned,
retention_days,
" + VACUUM" if result["vacuumed"] else "",
)
except Exception as exc:
# Maintenance must never block startup. Log and return error marker.
logger.warning("state.db auto-maintenance failed: %s", exc)
result["error"] = str(exc)
return result
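Note on the maintenance gate above: a minimal sketch of the `state_meta` upsert plus the interval check, using an in-memory database. `should_run` is a hypothetical name for the gating logic inside `maybe_auto_prune_and_vacuum`, and the upsert syntax assumes SQLite 3.24 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE state_meta (key TEXT PRIMARY KEY, value TEXT)")


def set_meta(key: str, value: str) -> None:
    """Insert or overwrite a key in the key/value table."""
    conn.execute(
        "INSERT INTO state_meta (key, value) VALUES (?, ?) "
        "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
        (key, value),
    )


def get_meta(key: str):
    row = conn.execute(
        "SELECT value FROM state_meta WHERE key = ?", (key,)
    ).fetchone()
    return None if row is None else row[0]


def should_run(now: float, min_interval_hours: int = 24) -> bool:
    """Return False when the last recorded run is inside the interval."""
    last_raw = get_meta("last_auto_prune")
    if last_raw:
        try:
            if now - float(last_raw) < min_interval_hours * 3600:
                return False
        except (TypeError, ValueError):
            pass  # corrupt value; treat as no prior run
    return True
```

Storing the timestamp as text keeps the table schema generic, at the cost of a parse step and the corrupt-value fallback shown here.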
@@ -1,5 +0,0 @@
# Web Development
Optional skills for client-side web development workflows — embedding agents, copilots, and AI-native UX patterns into user-facing web apps.
These are distinct from Hermes' own browser automation (Browserbase, Camofox), which operate *on* websites from outside. Web-development skills here help users build *into* their own websites.
@@ -1,189 +0,0 @@
---
name: page-agent
description: Embed alibaba/page-agent into your own web application — a pure-JavaScript in-page GUI agent that ships as a single <script> tag or npm package and lets end-users of your site drive the UI with natural language ("click login, fill username as John"). No Python, no headless browser, no extension required. Use this skill when the user is a web developer who wants to add an AI copilot to their SaaS / admin panel / B2B tool, make a legacy web app accessible via natural language, or evaluate page-agent against a local (Ollama) or cloud (Qwen / OpenAI / OpenRouter) LLM. NOT for server-side browser automation — point those users to Hermes' built-in browser tool instead.
version: 1.0.0
author: Hermes Agent
license: MIT
metadata:
hermes:
tags: [web, javascript, agent, browser, gui, alibaba, embed, copilot, saas]
category: web-development
---
# page-agent
alibaba/page-agent (https://github.com/alibaba/page-agent, 17k+ stars, MIT) is an in-page GUI agent written in TypeScript. It lives inside a webpage, reads the DOM as text (no screenshots, no multi-modal LLM), and executes natural-language instructions like "click the login button, then fill username as John" against the current page. Pure client-side — the host site just includes a script and passes an OpenAI-compatible LLM endpoint.
## When to use this skill
Load this skill when a user wants to:
- **Ship an AI copilot inside their own web app** (SaaS, admin panel, B2B tool, ERP, CRM) — "users on my dashboard should be able to type 'create invoice for Acme Corp and email it' instead of clicking through five screens"
- **Modernize a legacy web app** without rewriting the frontend — page-agent drops on top of existing DOM
- **Add accessibility via natural language** — voice / screen-reader users drive the UI by describing what they want
- **Demo or evaluate page-agent** against a local (Ollama) or hosted (Qwen, OpenAI, OpenRouter) LLM
- **Build interactive training / product demos** — let an AI walk a user through "how to submit an expense report" live in the real UI
## When NOT to use this skill
- User wants **Hermes itself to drive a browser** → use Hermes' built-in browser tool (Browserbase / Camofox). page-agent is the *opposite* direction.
- User wants **cross-tab automation without embedding** → use Playwright, browser-use, or the page-agent Chrome extension
- User needs **visual grounding / screenshots** → page-agent is text-DOM only; use a multimodal browser agent instead
## Prerequisites
- Node 22.13+ or 24+, npm 10+ (docs claim 11+ but 10.9 works fine)
- An OpenAI-compatible LLM endpoint: Qwen (DashScope), OpenAI, Ollama, OpenRouter, or anything speaking `/v1/chat/completions`
- Browser with devtools (for debugging)
## Path 1 — 30-second demo via CDN (no install)
Fastest way to see it work. Uses alibaba's free testing LLM proxy — **for evaluation only**, subject to their terms.
Add to any HTML page:
```html
<script src="https://cdn.jsdelivr.net/npm/page-agent@1.8.0/dist/iife/page-agent.demo.js" crossorigin="true"></script>
```
A panel appears. Type an instruction. Done.
Bookmarklet form (drop into bookmarks bar, click on any page):
```javascript
javascript:(function(){var s=document.createElement('script');s.src='https://cdn.jsdelivr.net/npm/page-agent@1.8.0/dist/iife/page-agent.demo.js';document.head.appendChild(s);})();
```
## Path 2 — npm install into your own web app (production use)
Inside an existing web project (React / Vue / Svelte / plain):
```bash
npm install page-agent
```
Wire it up with your own LLM endpoint — **never ship the demo CDN to real users**:
```javascript
import { PageAgent } from 'page-agent'
const agent = new PageAgent({
model: 'qwen3.5-plus',
baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
apiKey: process.env.LLM_API_KEY, // never hardcode
language: 'en-US',
})
// Show the panel for end users:
agent.panel.show()
// Or drive it programmatically:
await agent.execute('Click submit button, then fill username as John')
```
Provider examples (any OpenAI-compatible endpoint works):
| Provider | `baseURL` | `model` |
|----------|-----------|---------|
| Qwen / DashScope | `https://dashscope.aliyuncs.com/compatible-mode/v1` | `qwen3.5-plus` |
| OpenAI | `https://api.openai.com/v1` | `gpt-4o-mini` |
| Ollama (local) | `http://localhost:11434/v1` | `qwen3:14b` |
| OpenRouter | `https://openrouter.ai/api/v1` | `anthropic/claude-sonnet-4.6` |
**Key config fields** (passed to `new PageAgent({...})`):
- `model`, `baseURL`, `apiKey` — LLM connection
- `language` — UI language (`en-US`, `zh-CN`, etc.)
- Allowlist and data-masking hooks exist for locking down what the agent can touch — see https://alibaba.github.io/page-agent/ for the full option list
**Security.** Don't put your `apiKey` in client-side code for a real deployment — proxy LLM calls through your backend and point `baseURL` at your proxy. The demo CDN exists because alibaba runs that proxy for evaluation.
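A minimal Python sketch of the backend side of that proxy, assuming an OpenAI-compatible upstream. `build_upstream_request` is a hypothetical helper, not part of page-agent; it only shows where the API key gets attached, and the HTTP server wiring around it is left out.

```python
import json
import urllib.request


def build_upstream_request(client_body: bytes, api_key: str, base_url: str) -> urllib.request.Request:
    """Re-wrap a browser request body with the server-held API key."""
    payload = json.loads(client_body)  # raises ValueError on non-JSON input
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # The key is added here, on the server, and never sent to the browser.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

A backend route would call this with the incoming body and a key read from its own environment, send the result with `urllib.request.urlopen`, and relay the response. The page then sets `baseURL` to that route and passes a placeholder `apiKey`.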
## Path 3 — clone the source repo (contributing, or hacking on it)
Use this when the user wants to modify page-agent itself, test it against arbitrary sites via a local IIFE bundle, or develop the browser extension.
```bash
git clone https://github.com/alibaba/page-agent.git
cd page-agent
npm ci # exact lockfile install (or `npm i` to allow updates)
```
Create `.env` in the repo root with an LLM endpoint. Example:
```
LLM_MODEL_NAME=gpt-4o-mini
LLM_API_KEY=sk-...
LLM_BASE_URL=https://api.openai.com/v1
```
Ollama flavor:
```
LLM_BASE_URL=http://localhost:11434/v1
LLM_API_KEY=NA
LLM_MODEL_NAME=qwen3:14b
```
Common commands:
```bash
npm start # docs/website dev server
npm run build # build every package
npm run dev:demo # serve IIFE bundle at http://localhost:5174/page-agent.demo.js
npm run dev:ext # develop the browser extension (WXT + React)
npm run build:ext # build the extension
```
**Test on any website** using the local IIFE bundle. Add this bookmarklet:
```javascript
javascript:(function(){var s=document.createElement('script');s.src=`http://localhost:5174/page-agent.demo.js?t=${Math.random()}`;s.onload=()=>console.log('PageAgent ready!');document.head.appendChild(s);})();
```
Then: `npm run dev:demo`, click the bookmarklet on any page, and the local build injects. Auto-rebuilds on save.
**Warning:** your `.env` `LLM_API_KEY` is inlined into the IIFE bundle during dev builds. Don't share the bundle. Don't commit it. Don't paste the URL into Slack. (Verified: grepping the public dev bundle returns the literal values from `.env`.)
## Repo layout (Path 3)
Monorepo with npm workspaces. Key packages:
| Package | Path | Purpose |
|---------|------|---------|
| `page-agent` | `packages/page-agent/` | Main entry with UI panel |
| `@page-agent/core` | `packages/core/` | Core agent logic, no UI |
| `@page-agent/mcp` | `packages/mcp/` | MCP server (beta) |
| — | `packages/llms/` | LLM client |
| — | `packages/page-controller/` | DOM ops + visual feedback |
| — | `packages/ui/` | Panel + i18n |
| — | `packages/extension/` | Chrome/Firefox extension |
| — | `packages/website/` | Docs + landing site |
## Verifying it works
After Path 1 or Path 2:
1. Open the page in a browser with devtools open
2. You should see a floating panel. If not, check the console for errors (most common: CORS on the LLM endpoint, wrong `baseURL`, or a bad API key)
3. Type a simple instruction matching something visible on the page ("click the Login link")
4. Watch the Network tab — you should see a request to your `baseURL`
After Path 3:
1. `npm run dev:demo` prints `Accepting connections at http://localhost:5174`
2. `curl -I http://localhost:5174/page-agent.demo.js` returns `HTTP/1.1 200 OK` with `Content-Type: application/javascript`
3. Click the bookmarklet on any site; panel appears
## Pitfalls
- **Demo CDN in production** — don't. It's rate-limited, uses alibaba's free proxy, and their terms forbid production use.
- **API key exposure** — any key passed to `new PageAgent({apiKey: ...})` ships in your JS bundle. Always proxy through your own backend for real deployments.
- **Non-OpenAI-compatible endpoints** fail silently or with cryptic errors. If your provider needs native Anthropic/Gemini formatting, use an OpenAI-compatibility proxy (LiteLLM, OpenRouter) in front.
- **CSP blocks** — sites with strict Content-Security-Policy may refuse to load the CDN script or disallow inline eval. In that case, self-host from your origin.
- **Restart dev server** after editing `.env` in Path 3 — Vite only reads env at startup.
- **Node version** — the repo declares `^22.13.0 || >=24`. Node 20 will fail `npm ci` with engine errors.
- **npm 10 vs 11** — docs say npm 11+; npm 10.9 actually works fine.
## Reference
- Repo: https://github.com/alibaba/page-agent
- Docs: https://alibaba.github.io/page-agent/
- License: MIT (built on browser-use's DOM processing internals, Copyright 2024 Gregor Zunic)
+2 -5
@@ -84,10 +84,7 @@ Config file: `~/.hermes/hindsight/config.json`
| `retain_async` | `true` | Process retain asynchronously on the Hindsight server |
| `retain_every_n_turns` | `1` | Retain every N turns (1 = every turn) |
| `retain_context` | `conversation between Hermes Agent and the User` | Context label for retained memories |
| `retain_tags` | — | Default tags applied to retained memories; merged with per-call tool tags |
| `retain_source` | — | Optional `metadata.source` attached to retained memories |
| `retain_user_prefix` | `User` | Label used before user turns in auto-retained transcripts |
| `retain_assistant_prefix` | `Assistant` | Label used before assistant turns in auto-retained transcripts |
| `tags` | — | Tags applied when storing memories |
### Integration
@@ -116,7 +113,7 @@ Available in `hybrid` and `tools` memory modes:
| Tool | Description |
|------|-------------|
| `hindsight_retain` | Store information with auto entity extraction; supports optional per-call `tags` |
| `hindsight_retain` | Store information with auto entity extraction |
| `hindsight_recall` | Multi-strategy search (semantic + entity graph) |
| `hindsight_reflect` | Cross-memory synthesis (LLM-powered) |
+37 -198
@@ -6,15 +6,11 @@ retrieval. Supports cloud (API key) and local modes.
Original PR #1811 by benfrank241, adapted to MemoryProvider ABC.
Config via environment variables:
HINDSIGHT_API_KEY API key for Hindsight Cloud
HINDSIGHT_BANK_ID memory bank identifier (default: hermes)
HINDSIGHT_BUDGET recall budget: low/mid/high (default: mid)
HINDSIGHT_API_URL API endpoint
HINDSIGHT_MODE cloud or local (default: cloud)
HINDSIGHT_RETAIN_TAGS comma-separated tags attached to retained memories
HINDSIGHT_RETAIN_SOURCE metadata source value attached to retained memories
HINDSIGHT_RETAIN_USER_PREFIX label used before user turns in retained transcripts
HINDSIGHT_RETAIN_ASSISTANT_PREFIX label used before assistant turns in retained transcripts
HINDSIGHT_API_KEY API key for Hindsight Cloud
HINDSIGHT_BANK_ID memory bank identifier (default: hermes)
HINDSIGHT_BUDGET recall budget: low/mid/high (default: mid)
HINDSIGHT_API_URL API endpoint
HINDSIGHT_MODE cloud or local (default: cloud)
Or via $HERMES_HOME/hindsight/config.json (profile-scoped), falling back to
~/.hindsight/config.json (legacy, shared) for backward compatibility.
@@ -28,7 +24,7 @@ import logging
import os
import threading
from datetime import datetime, timezone
from hermes_constants import get_hermes_home
from typing import Any, Dict, List
from agent.memory_provider import MemoryProvider
@@ -103,11 +99,6 @@ RETAIN_SCHEMA = {
"properties": {
"content": {"type": "string", "description": "The information to store."},
"context": {"type": "string", "description": "Short label (e.g. 'user preference', 'project decision')."},
"tags": {
"type": "array",
"items": {"type": "string"},
"description": "Optional per-call tags to merge with configured default retain tags.",
},
},
"required": ["content"],
},
@@ -177,10 +168,6 @@ def _load_config() -> dict:
return {
"mode": os.environ.get("HINDSIGHT_MODE", "cloud"),
"apiKey": os.environ.get("HINDSIGHT_API_KEY", ""),
"retain_tags": os.environ.get("HINDSIGHT_RETAIN_TAGS", ""),
"retain_source": os.environ.get("HINDSIGHT_RETAIN_SOURCE", ""),
"retain_user_prefix": os.environ.get("HINDSIGHT_RETAIN_USER_PREFIX", "User"),
"retain_assistant_prefix": os.environ.get("HINDSIGHT_RETAIN_ASSISTANT_PREFIX", "Assistant"),
"banks": {
"hermes": {
"bankId": os.environ.get("HINDSIGHT_BANK_ID", "hermes"),
@@ -191,48 +178,6 @@ def _load_config() -> dict:
}
def _normalize_retain_tags(value: Any) -> List[str]:
"""Normalize tag config/tool values to a deduplicated list of strings."""
if value is None:
return []
raw_items: list[Any]
if isinstance(value, list):
raw_items = value
elif isinstance(value, str):
text = value.strip()
if not text:
return []
if text.startswith("["):
try:
parsed = json.loads(text)
except Exception:
parsed = None
if isinstance(parsed, list):
raw_items = parsed
else:
raw_items = text.split(",")
else:
raw_items = text.split(",")
else:
raw_items = [value]
normalized = []
seen = set()
for item in raw_items:
tag = str(item).strip()
if not tag or tag in seen:
continue
seen.add(tag)
normalized.append(tag)
return normalized
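The helper above accepts a list, a comma-separated string, or a JSON array string, and returns a deduplicated list that preserves first-seen order. A minimal standalone sketch of the same behaviour, for illustration only: the name `normalize_tags` is hypothetical, and it narrows the broad `except Exception` to `ValueError`, which is an assumption about what `json.loads` can raise on string input.

```python
import json
from typing import Any, List


def normalize_tags(value: Any) -> List[str]:
    """Standalone sketch mirroring _normalize_retain_tags (hypothetical name)."""
    if value is None:
        return []
    if isinstance(value, list):
        raw_items = value
    elif isinstance(value, str):
        text = value.strip()
        if not text:
            return []
        # Default: treat the string as comma-separated.
        raw_items = text.split(",")
        if text.startswith("["):
            # Looks like a JSON array; fall back to comma-splitting if it is not.
            try:
                parsed = json.loads(text)
            except ValueError:  # json.JSONDecodeError subclasses ValueError
                parsed = None
            if isinstance(parsed, list):
                raw_items = parsed
    else:
        # Any other scalar becomes a single tag.
        raw_items = [value]

    normalized: List[str] = []
    seen = set()
    for item in raw_items:
        tag = str(item).strip()
        if tag and tag not in seen:
            seen.add(tag)
            normalized.append(tag)
    return normalized


print(normalize_tags("alpha, beta, alpha"))
print(normalize_tags('["x", "y", "x"]'))
print(normalize_tags(42))
```

Malformed JSON such as `"[a, b"` is not rejected; it falls through to the comma split, so the bracket stays attached to the first tag.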
def _utc_timestamp() -> str:
"""Return current UTC timestamp in ISO-8601 with milliseconds and Z suffix."""
return datetime.now(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z")
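To make the timestamp format concrete, here is a small sketch of the same expression applied to a fixed instant. The `now` parameter and the name `utc_timestamp` are additions for the example so the result is reproducible; the helper in the diff takes no arguments.

```python
from datetime import datetime, timezone
from typing import Optional


def utc_timestamp(now: Optional[datetime] = None) -> str:
    """ISO-8601 UTC timestamp with millisecond precision and a Z suffix."""
    # `now` is injectable only for this sketch; default is the current time.
    now = now or datetime.now(timezone.utc)
    return now.isoformat(timespec="milliseconds").replace("+00:00", "Z")


fixed = datetime(2026, 4, 22, 9, 0, 9, 123456, tzinfo=timezone.utc)
print(utc_timestamp(fixed))  # 2026-04-22T09:00:09.123Z
```

`timespec="milliseconds"` truncates the microseconds rather than rounding them.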
# ---------------------------------------------------------------------------
# MemoryProvider implementation
# ---------------------------------------------------------------------------
@@ -250,19 +195,6 @@ class HindsightMemoryProvider(MemoryProvider):
self._llm_base_url = ""
self._memory_mode = "hybrid" # "context", "tools", or "hybrid"
self._prefetch_method = "recall" # "recall" or "reflect"
self._retain_tags: List[str] = []
self._retain_source = ""
self._retain_user_prefix = "User"
self._retain_assistant_prefix = "Assistant"
self._platform = ""
self._user_id = ""
self._user_name = ""
self._chat_id = ""
self._chat_name = ""
self._chat_type = ""
self._thread_id = ""
self._agent_identity = ""
self._turn_index = 0
self._client = None
self._prefetch_result = ""
self._prefetch_lock = threading.Lock()
@@ -278,7 +210,6 @@ class HindsightMemoryProvider(MemoryProvider):
# Retain controls
self._auto_retain = True
self._retain_every_n_turns = 1
self._retain_async = True
self._retain_context = "conversation between Hermes Agent and the User"
self._turn_counter = 0
self._session_turns: list[str] = [] # accumulates ALL turns for the session
@@ -293,6 +224,7 @@ class HindsightMemoryProvider(MemoryProvider):
# Bank
self._bank_mission = ""
self._bank_retain_mission: str | None = None
self._retain_async = True
@property
def name(self) -> str:
@@ -491,10 +423,7 @@ class HindsightMemoryProvider(MemoryProvider):
{"key": "recall_budget", "description": "Recall thoroughness", "default": "mid", "choices": ["low", "mid", "high"]},
{"key": "memory_mode", "description": "Memory integration mode", "default": "hybrid", "choices": ["hybrid", "context", "tools"]},
{"key": "recall_prefetch_method", "description": "Auto-recall method", "default": "recall", "choices": ["recall", "reflect"]},
{"key": "retain_tags", "description": "Default tags applied to retained memories (comma-separated)", "default": ""},
{"key": "retain_source", "description": "Metadata source value attached to retained memories", "default": ""},
{"key": "retain_user_prefix", "description": "Label used before user turns in retained transcripts", "default": "User"},
{"key": "retain_assistant_prefix", "description": "Label used before assistant turns in retained transcripts", "default": "Assistant"},
{"key": "tags", "description": "Tags applied when storing memories (comma-separated)", "default": ""},
{"key": "recall_tags", "description": "Tags to filter when searching memories (comma-separated)", "default": ""},
{"key": "recall_tags_match", "description": "Tag matching mode for recall", "default": "any", "choices": ["any", "all", "any_strict", "all_strict"]},
{"key": "auto_recall", "description": "Automatically recall memories before each turn", "default": True},
@@ -538,7 +467,7 @@ class HindsightMemoryProvider(MemoryProvider):
return self._client
def initialize(self, session_id: str, **kwargs) -> None:
self._session_id = str(session_id or "").strip()
self._session_id = session_id
# Check client version and auto-upgrade if needed
try:
@@ -567,16 +496,6 @@ class HindsightMemoryProvider(MemoryProvider):
pass # packaging not available or other issue — proceed anyway
self._config = _load_config()
self._platform = str(kwargs.get("platform") or "").strip()
self._user_id = str(kwargs.get("user_id") or "").strip()
self._user_name = str(kwargs.get("user_name") or "").strip()
self._chat_id = str(kwargs.get("chat_id") or "").strip()
self._chat_name = str(kwargs.get("chat_name") or "").strip()
self._chat_type = str(kwargs.get("chat_type") or "").strip()
self._thread_id = str(kwargs.get("thread_id") or "").strip()
self._agent_identity = str(kwargs.get("agent_identity") or "").strip()
self._turn_index = 0
self._session_turns = []
self._mode = self._config.get("mode", "cloud")
# "local" is a legacy alias for "local_embedded"
if self._mode == "local":
@@ -594,7 +513,7 @@ class HindsightMemoryProvider(MemoryProvider):
memory_mode = self._config.get("memory_mode", "hybrid")
self._memory_mode = memory_mode if memory_mode in ("context", "tools", "hybrid") else "hybrid"
prefetch_method = self._config.get("recall_prefetch_method") or self._config.get("prefetch_method", "recall")
prefetch_method = self._config.get("recall_prefetch_method", "recall")
self._prefetch_method = prefetch_method if prefetch_method in ("recall", "reflect") else "recall"
# Bank options
@@ -602,22 +521,9 @@ class HindsightMemoryProvider(MemoryProvider):
self._bank_retain_mission = self._config.get("bank_retain_mission") or None
# Tags
self._retain_tags = _normalize_retain_tags(
self._config.get("retain_tags")
or os.environ.get("HINDSIGHT_RETAIN_TAGS", "")
)
self._tags = self._retain_tags or None
self._tags = self._config.get("tags") or None
self._recall_tags = self._config.get("recall_tags") or None
self._recall_tags_match = self._config.get("recall_tags_match", "any")
self._retain_source = str(
self._config.get("retain_source") or os.environ.get("HINDSIGHT_RETAIN_SOURCE", "")
).strip()
self._retain_user_prefix = str(
self._config.get("retain_user_prefix") or os.environ.get("HINDSIGHT_RETAIN_USER_PREFIX", "User")
).strip() or "User"
self._retain_assistant_prefix = str(
self._config.get("retain_assistant_prefix") or os.environ.get("HINDSIGHT_RETAIN_ASSISTANT_PREFIX", "Assistant")
).strip() or "Assistant"
# Retain controls
self._auto_retain = self._config.get("auto_retain", True)
@@ -641,9 +547,11 @@ class HindsightMemoryProvider(MemoryProvider):
logger.info("Hindsight initialized: mode=%s, api_url=%s, bank=%s, budget=%s, memory_mode=%s, prefetch_method=%s, client=%s",
self._mode, self._api_url, self._bank_id, self._budget, self._memory_mode, self._prefetch_method, _client_version)
logger.debug("Hindsight config: auto_retain=%s, auto_recall=%s, retain_every_n=%d, "
"retain_async=%s, retain_context=%s, recall_max_tokens=%d, recall_max_input_chars=%d, tags=%s, recall_tags=%s",
"retain_async=%s, retain_context=%s, "
"recall_max_tokens=%d, recall_max_input_chars=%d, tags=%s, recall_tags=%s",
self._auto_retain, self._auto_recall, self._retain_every_n_turns,
self._retain_async, self._retain_context, self._recall_max_tokens, self._recall_max_input_chars,
self._retain_async, self._retain_context,
self._recall_max_tokens, self._recall_max_input_chars,
self._tags, self._recall_tags)
# For local mode, start the embedded daemon in the background so it
@@ -804,78 +712,6 @@ class HindsightMemoryProvider(MemoryProvider):
self._prefetch_thread = threading.Thread(target=_run, daemon=True, name="hindsight-prefetch")
self._prefetch_thread.start()
def _build_turn_messages(self, user_content: str, assistant_content: str) -> List[Dict[str, str]]:
now = datetime.now(timezone.utc).isoformat()
return [
{
"role": "user",
"content": f"{self._retain_user_prefix}: {user_content}",
"timestamp": now,
},
{
"role": "assistant",
"content": f"{self._retain_assistant_prefix}: {assistant_content}",
"timestamp": now,
},
]
def _build_metadata(self, *, message_count: int, turn_index: int) -> Dict[str, str]:
metadata: Dict[str, str] = {
"retained_at": _utc_timestamp(),
"message_count": str(message_count),
"turn_index": str(turn_index),
}
if self._retain_source:
metadata["source"] = self._retain_source
if self._session_id:
metadata["session_id"] = self._session_id
if self._platform:
metadata["platform"] = self._platform
if self._user_id:
metadata["user_id"] = self._user_id
if self._user_name:
metadata["user_name"] = self._user_name
if self._chat_id:
metadata["chat_id"] = self._chat_id
if self._chat_name:
metadata["chat_name"] = self._chat_name
if self._chat_type:
metadata["chat_type"] = self._chat_type
if self._thread_id:
metadata["thread_id"] = self._thread_id
if self._agent_identity:
metadata["agent_identity"] = self._agent_identity
return metadata
def _build_retain_kwargs(
self,
content: str,
*,
context: str | None = None,
document_id: str | None = None,
metadata: Dict[str, str] | None = None,
tags: List[str] | None = None,
retain_async: bool | None = None,
) -> Dict[str, Any]:
kwargs: Dict[str, Any] = {
"bank_id": self._bank_id,
"content": content,
"metadata": metadata or self._build_metadata(message_count=1, turn_index=self._turn_index),
}
if context is not None:
kwargs["context"] = context
if document_id:
kwargs["document_id"] = document_id
if retain_async is not None:
kwargs["retain_async"] = retain_async
merged_tags = _normalize_retain_tags(self._retain_tags)
for tag in _normalize_retain_tags(tags):
if tag not in merged_tags:
merged_tags.append(tag)
if merged_tags:
kwargs["tags"] = merged_tags
return kwargs
def sync_turn(self, user_content: str, assistant_content: str, *, session_id: str = "") -> None:
"""Retain conversation turn in background (non-blocking).
@@ -885,14 +721,19 @@ class HindsightMemoryProvider(MemoryProvider):
logger.debug("sync_turn: skipped (auto_retain disabled)")
return
if session_id:
self._session_id = str(session_id).strip()
from datetime import datetime, timezone
now = datetime.now(timezone.utc).isoformat()
turn = json.dumps(self._build_turn_messages(user_content, assistant_content))
messages = [
{"role": "user", "content": user_content, "timestamp": now},
{"role": "assistant", "content": assistant_content, "timestamp": now},
]
turn = json.dumps(messages)
self._session_turns.append(turn)
self._turn_counter += 1
self._turn_index = self._turn_counter
# Only retain every N turns
if self._turn_counter % self._retain_every_n_turns != 0:
logger.debug("sync_turn: buffered turn %d (will retain at turn %d)",
self._turn_counter, self._turn_counter + (self._retain_every_n_turns - self._turn_counter % self._retain_every_n_turns))
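The buffering in `sync_turn` can be sketched in isolation: every turn is serialized and appended, and only every Nth turn triggers a retain of the whole session as one JSON array. This is a simplified illustration, not the provider's code. `TurnBuffer` and `add_turn` are hypothetical names, timestamps are omitted, and the `max(1, ...)` guard against a zero interval is an assumption that does not appear in the diff.

```python
import json
from typing import List, Optional


class TurnBuffer:
    """Hypothetical stand-in for the buffering part of sync_turn."""

    def __init__(self, retain_every_n_turns: int = 1) -> None:
        # Guard against zero or negative intervals (assumption, not in the diff).
        self.retain_every_n_turns = max(1, retain_every_n_turns)
        self.turn_counter = 0
        self.session_turns: List[str] = []  # one JSON string per turn

    def add_turn(self, user: str, assistant: str) -> Optional[str]:
        """Buffer a turn; return the full-session payload when it is time to retain."""
        turn = json.dumps([
            {"role": "user", "content": user},
            {"role": "assistant", "content": assistant},
        ])
        self.session_turns.append(turn)
        self.turn_counter += 1
        if self.turn_counter % self.retain_every_n_turns != 0:
            return None  # buffered only
        # Each element is already valid JSON, so joining yields a JSON array.
        return "[" + ",".join(self.session_turns) + "]"


buf = TurnBuffer(retain_every_n_turns=2)
first = buf.add_turn("hi", "hello")
second = buf.add_turn("what changed?", "the transport cache")
print(first)
print(len(json.loads(second)))
```

Because the payload always contains the entire session, the receiving side needs a stable document identifier to replace the earlier copy instead of storing duplicates, which is the role the session id plays in the retain call.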
@@ -900,21 +741,19 @@ class HindsightMemoryProvider(MemoryProvider):
logger.debug("sync_turn: retaining %d turns, total session content %d chars",
len(self._session_turns), sum(len(t) for t in self._session_turns))
# Send the ENTIRE session as a single JSON array (document_id deduplicates).
# Each element in _session_turns is a JSON string of that turn's messages.
content = "[" + ",".join(self._session_turns) + "]"
def _sync():
try:
client = self._get_client()
item = self._build_retain_kwargs(
content,
context=self._retain_context,
metadata=self._build_metadata(
message_count=len(self._session_turns) * 2,
turn_index=self._turn_index,
),
)
item.pop("bank_id", None)
item.pop("retain_async", None)
item: dict = {
"content": content,
"context": self._retain_context,
}
if self._tags:
item["tags"] = self._tags
logger.debug("Hindsight retain: bank=%s, doc=%s, async=%s, content_len=%d, num_turns=%d",
self._bank_id, self._session_id, self._retain_async, len(content), len(self._session_turns))
_run_sync(client.aretain_batch(
@@ -950,11 +789,11 @@ class HindsightMemoryProvider(MemoryProvider):
return tool_error("Missing required parameter: content")
context = args.get("context")
try:
retain_kwargs = self._build_retain_kwargs(
content,
context=context,
tags=args.get("tags"),
)
retain_kwargs: dict = {
"bank_id": self._bank_id, "content": content, "context": context,
}
if self._tags:
retain_kwargs["tags"] = self._tags
logger.debug("Tool hindsight_retain: bank=%s, content_len=%d, context=%s",
self._bank_id, len(content), context)
_run_sync(client.aretain(**retain_kwargs))
+1 -1
@@ -126,7 +126,7 @@ py-modules = ["run_agent", "model_tools", "toolsets", "batch_runner", "trajector
hermes_cli = ["web_dist/**/*"]
[tool.setuptools.packages.find]
include = ["agent", "agent.*", "tools", "tools.*", "hermes_cli", "gateway", "gateway.*", "tui_gateway", "tui_gateway.*", "cron", "acp_adapter", "plugins", "plugins.*"]
include = ["agent", "tools", "tools.*", "hermes_cli", "gateway", "gateway.*", "tui_gateway", "tui_gateway.*", "cron", "acp_adapter", "plugins", "plugins.*"]
[tool.pytest.ini_options]
testpaths = ["tests"]
+127 -236
@@ -76,8 +76,6 @@ from tools.interrupt import set_interrupt as _set_interrupt
from tools.browser_tool import cleanup_browser
from hermes_constants import OPENROUTER_BASE_URL
# Agent internals extracted to agent/ package for modularity
from agent.memory_manager import build_memory_context_block, sanitize_context
from agent.retry_utils import jittered_backoff
@@ -98,19 +96,11 @@ from agent.model_metadata import (
from agent.context_compressor import ContextCompressor
from agent.subdirectory_hints import SubdirectoryHintTracker
from agent.prompt_caching import apply_anthropic_cache_control
from agent.prompt_builder import build_skills_system_prompt, build_context_files_prompt, build_environment_hints, load_soul_md, TOOL_USE_ENFORCEMENT_GUIDANCE, TOOL_USE_ENFORCEMENT_MODELS, DEVELOPER_ROLE_MODELS, GOOGLE_MODEL_OPERATIONAL_GUIDANCE, OPENAI_MODEL_EXECUTION_GUIDANCE
from agent.prompt_builder import build_skills_system_prompt, build_context_files_prompt, build_environment_hints, load_soul_md, TOOL_USE_ENFORCEMENT_GUIDANCE, TOOL_USE_ENFORCEMENT_MODELS, GOOGLE_MODEL_OPERATIONAL_GUIDANCE, OPENAI_MODEL_EXECUTION_GUIDANCE
from agent.usage_pricing import estimate_usage_cost, normalize_usage
from agent.codex_responses_adapter import (
_chat_content_to_responses_parts,
_chat_messages_to_responses_input as _codex_chat_messages_to_responses_input,
_derive_responses_function_call_id as _codex_derive_responses_function_call_id,
_deterministic_call_id as _codex_deterministic_call_id,
_extract_responses_message_text as _codex_extract_responses_message_text,
_extract_responses_reasoning_text as _codex_extract_responses_reasoning_text,
_normalize_codex_response as _codex_normalize_codex_response,
_preflight_codex_api_kwargs as _codex_preflight_codex_api_kwargs,
_preflight_codex_input_items as _codex_preflight_codex_input_items,
_responses_tools as _codex_responses_tools,
_split_responses_tool_id as _codex_split_responses_tool_id,
_summarize_user_message_for_log,
)
@@ -385,9 +375,8 @@ def _sanitize_surrogates(text: str) -> str:
return text
# _chat_content_to_responses_parts and _summarize_user_message_for_log are
# imported from agent.codex_responses_adapter (see import block above).
# They remain importable from run_agent for backward compatibility.
# _summarize_user_message_for_log is imported from agent.codex_responses_adapter
# (see import block above). Remains importable from run_agent for backward compat.
def _sanitize_structure_surrogates(payload: Any) -> bool:
@@ -751,11 +740,6 @@ class AIAgent:
prefill_messages: List[Dict[str, Any]] = None,
platform: str = None,
user_id: str = None,
user_name: str = None,
chat_id: str = None,
chat_name: str = None,
chat_type: str = None,
thread_id: str = None,
gateway_session_key: str = None,
skip_context_files: bool = False,
skip_memory: bool = False,
@@ -825,11 +809,6 @@ class AIAgent:
self.ephemeral_system_prompt = ephemeral_system_prompt
self.platform = platform # "cli", "telegram", "discord", "whatsapp", etc.
self._user_id = user_id # Platform user identifier (gateway sessions)
self._user_name = user_name
self._chat_id = chat_id
self._chat_name = chat_name
self._chat_type = chat_type
self._thread_id = thread_id
self._gateway_session_key = gateway_session_key # Stable per-chat key (e.g. agent:main:telegram:dm:123)
# Pluggable print function — CLI replaces this with _cprint so that
# raw ANSI status lines are routed through prompt_toolkit's renderer
@@ -882,6 +861,13 @@ class AIAgent:
else:
self.api_mode = "chat_completions"
# Eagerly warm the transport cache so import errors surface at init,
# not mid-conversation. Also validates the api_mode is registered.
try:
self._get_transport()
except Exception:
pass # Non-fatal — transport may not exist for all modes yet
try:
from hermes_cli.model_normalize import (
_AGGREGATOR_PROVIDERS,
@@ -1481,16 +1467,6 @@ class AIAgent:
# Thread gateway user identity for per-user memory scoping
if self._user_id:
_init_kwargs["user_id"] = self._user_id
if self._user_name:
_init_kwargs["user_name"] = self._user_name
if self._chat_id:
_init_kwargs["chat_id"] = self._chat_id
if self._chat_name:
_init_kwargs["chat_name"] = self._chat_name
if self._chat_type:
_init_kwargs["chat_type"] = self._chat_type
if self._thread_id:
_init_kwargs["thread_id"] = self._thread_id
# Thread gateway session key for stable per-chat Honcho session isolation
if self._gateway_session_key:
_init_kwargs["gateway_session_key"] = self._gateway_session_key
@@ -1923,6 +1899,9 @@ class AIAgent:
self.provider = new_provider
self.base_url = base_url or self.base_url
self.api_mode = api_mode
# Invalidate transport cache — new api_mode may need a different transport
if hasattr(self, "_transport_cache"):
self._transport_cache.clear()
if api_key:
self.api_key = api_key
@@ -2986,7 +2965,6 @@ class AIAgent:
tool_call_id=msg.get("tool_call_id"),
finish_reason=msg.get("finish_reason"),
reasoning=msg.get("reasoning") if role == "assistant" else None,
reasoning_content=msg.get("reasoning_content") if role == "assistant" else None,
reasoning_details=msg.get("reasoning_details") if role == "assistant" else None,
codex_reasoning_items=msg.get("codex_reasoning_items") if role == "assistant" else None,
)
@@ -4844,7 +4822,7 @@ class AIAgent:
active_client = client or self._ensure_primary_openai_client(reason="codex_create_stream_fallback")
fallback_kwargs = dict(api_kwargs)
fallback_kwargs["stream"] = True
fallback_kwargs = self._get_codex_transport().preflight_kwargs(fallback_kwargs, allow_stream=True)
fallback_kwargs = self._get_transport().preflight_kwargs(fallback_kwargs, allow_stream=True)
stream_or_response = active_client.responses.create(**fallback_kwargs)
# Compatibility shim for mocks or providers that still return a concrete response.
@@ -5199,6 +5177,9 @@ class AIAgent:
result["response"] = self._anthropic_messages_create(api_kwargs)
elif self.api_mode == "bedrock_converse":
# Bedrock uses boto3 directly — no OpenAI client needed.
# normalize_converse_response produces an OpenAI-compatible
# SimpleNamespace so the rest of the agent loop can treat
# bedrock responses like chat_completions responses.
from agent.bedrock_adapter import (
_get_bedrock_runtime_client,
normalize_converse_response,
@@ -6202,6 +6183,8 @@ class AIAgent:
self.provider = fb_provider
self.base_url = fb_base_url
self.api_mode = fb_api_mode
if hasattr(self, "_transport_cache"):
self._transport_cache.clear()
self._fallback_activated = True
# Honor per-provider / per-model request_timeout_seconds for the
@@ -6313,6 +6296,8 @@ class AIAgent:
self.provider = rt["provider"]
self.base_url = rt["base_url"] # setter updates _base_url_lower
self.api_mode = rt["api_mode"]
if hasattr(self, "_transport_cache"):
self._transport_cache.clear()
self.api_key = rt["api_key"]
self._client_kwargs = dict(rt["client_kwargs"])
self._use_prompt_caching = rt["use_prompt_caching"]
@@ -6419,6 +6404,8 @@ class AIAgent:
self.provider = rt["provider"]
self.base_url = rt["base_url"]
self.api_mode = rt["api_mode"]
if hasattr(self, "_transport_cache"):
self._transport_cache.clear()
self.api_key = rt["api_key"]
if self.api_mode == "anthropic_messages":
@@ -6577,41 +6564,59 @@ class AIAgent:
return suffix
return "[A multimodal message was converted to text for Anthropic compatibility.]"
def _get_anthropic_transport(self):
"""Return the cached AnthropicTransport instance (lazy singleton)."""
t = getattr(self, "_anthropic_transport", None)
def _get_transport(self, api_mode: str = None):
"""Return the cached transport for the given (or current) api_mode.
Lazy-initializes on first call per api_mode. Returns None if no
transport is registered for the mode.
"""
mode = api_mode or self.api_mode
cache = getattr(self, "_transport_cache", None)
if cache is None:
cache = {}
self._transport_cache = cache
t = cache.get(mode)
if t is None:
from agent.transports import get_transport
t = get_transport("anthropic_messages")
self._anthropic_transport = t
t = get_transport(mode)
cache[mode] = t
return t
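The per-mode cache and its invalidation on `api_mode` change can be illustrated with a small self-contained sketch. Everything here is a stand-in: `_REGISTRY`, `lookup_transport`, `Agent`, and `switch_mode` are hypothetical names, and the real lookup goes through `agent.transports.get_transport`, whose behaviour for unknown modes is assumed to be returning `None` as the docstring states.

```python
from types import SimpleNamespace
from typing import Callable, Dict, Optional

# Hypothetical registry standing in for agent.transports.
_REGISTRY: Dict[str, Callable[[], object]] = {
    "chat_completions": lambda: SimpleNamespace(kind="chat_completions"),
    "anthropic_messages": lambda: SimpleNamespace(kind="anthropic_messages"),
}


def lookup_transport(mode: str) -> Optional[object]:
    """Build a transport for the mode, or return None if none is registered."""
    factory = _REGISTRY.get(mode)
    return factory() if factory else None


class Agent:
    def __init__(self, api_mode: str) -> None:
        self.api_mode = api_mode
        self._transport_cache: Dict[str, Optional[object]] = {}

    def get_transport(self, api_mode: Optional[str] = None) -> Optional[object]:
        """Return the cached transport for the given (or current) mode."""
        mode = api_mode or self.api_mode
        transport = self._transport_cache.get(mode)
        if transport is None:
            transport = lookup_transport(mode)
            self._transport_cache[mode] = transport
        return transport

    def switch_mode(self, api_mode: str) -> None:
        """Change mode and drop cached transports so none is reused stale."""
        self.api_mode = api_mode
        self._transport_cache.clear()


agent = Agent("chat_completions")
same_instance = agent.get_transport() is agent.get_transport()
agent.switch_mode("anthropic_messages")
print(same_instance, agent.get_transport().kind)
```

One consequence of caching by `t is None`: a mode with no registered transport is looked up again on every call, since `None` is indistinguishable from a cache miss.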
def _get_codex_transport(self):
"""Return the cached ResponsesApiTransport instance (lazy singleton)."""
t = getattr(self, "_codex_transport", None)
if t is None:
from agent.transports import get_transport
t = get_transport("codex_responses")
self._codex_transport = t
return t
@staticmethod
def _nr_to_assistant_message(nr):
"""Convert a NormalizedResponse to the SimpleNamespace shape downstream expects.
def _get_chat_completions_transport(self):
"""Return the cached ChatCompletionsTransport instance (lazy singleton)."""
t = getattr(self, "_chat_completions_transport", None)
if t is None:
from agent.transports import get_transport
t = get_transport("chat_completions")
self._chat_completions_transport = t
return t
This is the single back-compat shim between the transport layer
(NormalizedResponse) and the agent loop (SimpleNamespace with
.content, .tool_calls, .reasoning, .reasoning_content,
.reasoning_details, .codex_reasoning_items, and per-tool-call
.call_id / .response_item_id).
def _get_bedrock_transport(self):
"""Return the cached BedrockTransport instance (lazy singleton)."""
t = getattr(self, "_bedrock_transport", None)
if t is None:
from agent.transports import get_transport
t = get_transport("bedrock_converse")
self._bedrock_transport = t
return t
TODO: Remove when downstream code reads NormalizedResponse directly.
"""
tc_list = None
if nr.tool_calls:
tc_list = []
for tc in nr.tool_calls:
tc_ns = SimpleNamespace(
id=tc.id,
type="function",
function=SimpleNamespace(name=tc.name, arguments=tc.arguments),
)
if tc.provider_data:
for key in ("call_id", "response_item_id"):
if tc.provider_data.get(key):
setattr(tc_ns, key, tc.provider_data[key])
tc_list.append(tc_ns)
pd = nr.provider_data or {}
return SimpleNamespace(
content=nr.content,
tool_calls=tc_list or None,
reasoning=nr.reasoning,
reasoning_content=pd.get("reasoning_content"),
reasoning_details=pd.get("reasoning_details"),
codex_reasoning_items=pd.get("codex_reasoning_items"),
)
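A usage sketch of the shim's mapping follows, with stand-in dataclasses. `ToolCall` and `NormalizedResponse` here are hypothetical reconstructions based only on the attributes the shim reads (`id`, `name`, `arguments`, `provider_data`, `content`, `reasoning`); the real classes in the transport layer may carry more fields.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Dict, List, Optional


@dataclass
class ToolCall:  # hypothetical stand-in for the transport-layer type
    id: str
    name: str
    arguments: str
    provider_data: Optional[Dict[str, Any]] = None


@dataclass
class NormalizedResponse:  # hypothetical stand-in for the transport-layer type
    content: Optional[str] = None
    tool_calls: Optional[List[ToolCall]] = None
    reasoning: Optional[str] = None
    provider_data: Optional[Dict[str, Any]] = None


def to_assistant_message(nr: NormalizedResponse) -> SimpleNamespace:
    """Map a normalized response to the OpenAI-style message shape."""
    tc_list = []
    for tc in nr.tool_calls or []:
        ns = SimpleNamespace(
            id=tc.id,
            type="function",
            function=SimpleNamespace(name=tc.name, arguments=tc.arguments),
        )
        # Provider-specific ids are attached only when present and non-empty.
        for key in ("call_id", "response_item_id"):
            value = (tc.provider_data or {}).get(key)
            if value:
                setattr(ns, key, value)
        tc_list.append(ns)
    pd = nr.provider_data or {}
    return SimpleNamespace(
        content=nr.content,
        tool_calls=tc_list or None,
        reasoning=nr.reasoning,
        reasoning_content=pd.get("reasoning_content"),
        reasoning_details=pd.get("reasoning_details"),
        codex_reasoning_items=pd.get("codex_reasoning_items"),
    )


nr = NormalizedResponse(
    content="done",
    tool_calls=[ToolCall("call_1", "memory", '{"action": "add"}', {"call_id": "c1"})],
    provider_data={"reasoning_details": [{"type": "summary"}]},
)
msg = to_assistant_message(nr)
print(msg.tool_calls[0].function.name, msg.tool_calls[0].call_id)
```

Downstream code that probes with `hasattr(tc, "response_item_id")` keeps working, because the attribute is absent rather than set to `None` when the provider did not supply it.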
def _prepare_anthropic_messages_for_api(self, api_messages: list) -> list:
if not any(
@@ -6729,7 +6734,7 @@ class AIAgent:
def _build_api_kwargs(self, api_messages: list) -> dict:
"""Build the keyword arguments dict for the active API mode."""
if self.api_mode == "anthropic_messages":
_transport = self._get_anthropic_transport()
_transport = self._get_transport()
anthropic_messages = self._prepare_anthropic_messages_for_api(api_messages)
ctx_len = getattr(self, "context_compressor", None)
ctx_len = ctx_len.context_length if ctx_len else None
@@ -6752,7 +6757,7 @@ class AIAgent:
# AWS Bedrock native Converse API — bypasses the OpenAI client entirely.
# The adapter handles message/tool conversion and boto3 calls directly.
if self.api_mode == "bedrock_converse":
_bt = self._get_bedrock_transport()
_bt = self._get_transport()
region = getattr(self, "_bedrock_region", None) or "us-east-1"
guardrail = getattr(self, "_bedrock_guardrail_config", None)
return _bt.build_kwargs(
@@ -6765,7 +6770,7 @@ class AIAgent:
)
if self.api_mode == "codex_responses":
_ct = self._get_codex_transport()
_ct = self._get_transport()
is_github_responses = (
base_url_host_matches(self.base_url, "models.github.ai")
or base_url_host_matches(self.base_url, "api.githubcopilot.com")
@@ -6793,7 +6798,7 @@ class AIAgent:
)
# ── chat_completions (default) ─────────────────────────────────────
_ct = self._get_chat_completions_transport()
_ct = self._get_transport()
# Provider detection flags
_is_qwen = self._is_qwen_portal()
@@ -7024,11 +7029,6 @@ class AIAgent:
"finish_reason": finish_reason,
}
if hasattr(assistant_message, "reasoning_content"):
raw_reasoning_content = getattr(assistant_message, "reasoning_content", None)
if raw_reasoning_content is not None:
msg["reasoning_content"] = _sanitize_surrogates(raw_reasoning_content)
if hasattr(assistant_message, 'reasoning_details') and assistant_message.reasoning_details:
# Pass reasoning_details back unmodified so providers (OpenRouter,
# Anthropic, OpenAI) can maintain reasoning continuity across turns.
@@ -7103,30 +7103,6 @@ class AIAgent:
return msg
def _copy_reasoning_content_for_api(self, source_msg: dict, api_msg: dict) -> None:
"""Copy provider-facing reasoning fields onto an API replay message."""
if source_msg.get("role") != "assistant":
return
explicit_reasoning = source_msg.get("reasoning_content")
if isinstance(explicit_reasoning, str):
api_msg["reasoning_content"] = explicit_reasoning
return
normalized_reasoning = source_msg.get("reasoning")
if isinstance(normalized_reasoning, str) and normalized_reasoning:
api_msg["reasoning_content"] = normalized_reasoning
return
kimi_requires_reasoning = (
self.provider in {"kimi-coding", "kimi-coding-cn"}
or base_url_host_matches(self.base_url, "api.kimi.com")
or base_url_host_matches(self.base_url, "moonshot.ai")
or base_url_host_matches(self.base_url, "moonshot.cn")
)
if kimi_requires_reasoning and source_msg.get("tool_calls"):
api_msg["reasoning_content"] = ""
@staticmethod
def _sanitize_tool_calls_for_strict_api(api_msg: dict) -> dict:
"""Strip Codex Responses API fields from tool_calls for strict providers.
@@ -7210,7 +7186,10 @@ class AIAgent:
api_messages = []
for msg in messages:
api_msg = msg.copy()
self._copy_reasoning_content_for_api(msg, api_msg)
if msg.get("role") == "assistant":
reasoning = msg.get("reasoning")
if reasoning:
api_msg["reasoning_content"] = reasoning
api_msg.pop("reasoning", None)
api_msg.pop("finish_reason", None)
api_msg.pop("_flush_sentinel", None)
@@ -7268,7 +7247,7 @@ class AIAgent:
if not _aux_available and self.api_mode == "codex_responses":
# No auxiliary client -- use the Codex Responses path directly
codex_kwargs = self._build_api_kwargs(api_messages)
codex_kwargs["tools"] = self._get_codex_transport().convert_tools([memory_tool_def])
codex_kwargs["tools"] = self._get_transport().convert_tools([memory_tool_def])
if _flush_temperature is not None:
codex_kwargs["temperature"] = _flush_temperature
else:
@@ -7278,7 +7257,7 @@ class AIAgent:
response = self._run_codex_stream(codex_kwargs)
elif not _aux_available and self.api_mode == "anthropic_messages":
# Native Anthropic — use the transport for kwargs
_tflush = self._get_anthropic_transport()
_tflush = self._get_transport()
ant_kwargs = _tflush.build_kwargs(
model=self.model, messages=api_messages,
tools=[memory_tool_def], max_tokens=5120,
@@ -7303,7 +7282,7 @@ class AIAgent:
# Extract tool calls from the response, handling all API formats
tool_calls = []
if self.api_mode == "codex_responses" and not _aux_available:
_ct_flush = self._get_codex_transport()
_ct_flush = self._get_transport()
_cnr_flush = _ct_flush.normalize_response(response)
if _cnr_flush and _cnr_flush.tool_calls:
tool_calls = [
@@ -7313,7 +7292,7 @@ class AIAgent:
) for tc in _cnr_flush.tool_calls
]
elif self.api_mode == "anthropic_messages" and not _aux_available:
_tfn = self._get_anthropic_transport()
_tfn = self._get_transport()
_flush_nr = _tfn.normalize_response(response, strip_tool_prefix=self._is_anthropic_oauth)
if _flush_nr and _flush_nr.tool_calls:
tool_calls = [
@@ -7323,9 +7302,11 @@ class AIAgent:
) for tc in _flush_nr.tool_calls
]
elif hasattr(response, "choices") and response.choices:
assistant_message = response.choices[0].message
if assistant_message.tool_calls:
tool_calls = assistant_message.tool_calls
# chat_completions / bedrock — normalize through transport
_flush_cc_nr = self._get_transport().normalize_response(response)
_flush_msg = self._nr_to_assistant_message(_flush_cc_nr)
if _flush_msg.tool_calls:
tool_calls = _flush_msg.tool_calls
for tc in tool_calls:
if tc.function.name == "memory":
@@ -8355,7 +8336,7 @@ class AIAgent:
codex_kwargs = self._build_api_kwargs(api_messages)
codex_kwargs.pop("tools", None)
summary_response = self._run_codex_stream(codex_kwargs)
_ct_sum = self._get_codex_transport()
_ct_sum = self._get_transport()
_cnr_sum = _ct_sum.normalize_response(summary_response)
final_response = (_cnr_sum.content or "").strip()
else:
@@ -8385,7 +8366,7 @@ class AIAgent:
summary_kwargs["extra_body"] = summary_extra_body
if self.api_mode == "anthropic_messages":
_tsum = self._get_anthropic_transport()
_tsum = self._get_transport()
_ant_kw = _tsum.build_kwargs(model=self.model, messages=api_messages, tools=None,
max_tokens=self.max_tokens, reasoning_config=self.reasoning_config,
is_oauth=self._is_anthropic_oauth,
@@ -8395,11 +8376,8 @@ class AIAgent:
final_response = (_sum_nr.content or "").strip()
else:
summary_response = self._ensure_primary_openai_client(reason="iteration_limit_summary").chat.completions.create(**summary_kwargs)
if summary_response.choices and summary_response.choices[0].message.content:
final_response = summary_response.choices[0].message.content
else:
final_response = ""
_sum_cc_nr = self._get_transport().normalize_response(summary_response)
final_response = (_sum_cc_nr.content or "").strip()
if final_response:
if "<think>" in final_response:
@@ -8414,11 +8392,11 @@ class AIAgent:
codex_kwargs = self._build_api_kwargs(api_messages)
codex_kwargs.pop("tools", None)
retry_response = self._run_codex_stream(codex_kwargs)
_ct_retry = self._get_codex_transport()
_ct_retry = self._get_transport()
_cnr_retry = _ct_retry.normalize_response(retry_response)
final_response = (_cnr_retry.content or "").strip()
elif self.api_mode == "anthropic_messages":
_tretry = self._get_anthropic_transport()
_tretry = self._get_transport()
_ant_kw2 = _tretry.build_kwargs(model=self.model, messages=api_messages, tools=None,
is_oauth=self._is_anthropic_oauth,
max_tokens=self.max_tokens, reasoning_config=self.reasoning_config,
@@ -8439,11 +8417,8 @@ class AIAgent:
summary_kwargs["extra_body"] = summary_extra_body
summary_response = self._ensure_primary_openai_client(reason="iteration_limit_summary_retry").chat.completions.create(**summary_kwargs)
if summary_response.choices and summary_response.choices[0].message.content:
final_response = summary_response.choices[0].message.content
else:
final_response = ""
_retry_cc_nr = self._get_transport().normalize_response(summary_response)
final_response = (_retry_cc_nr.content or "").strip()
if final_response:
if "<think>" in final_response:
@@ -8970,7 +8945,11 @@ class AIAgent:
# For ALL assistant messages, pass reasoning back to the API
# This ensures multi-turn reasoning context is preserved
self._copy_reasoning_content_for_api(msg, api_msg)
if msg.get("role") == "assistant":
reasoning_text = msg.get("reasoning")
if reasoning_text:
# Add reasoning_content for API compatibility (Moonshot AI, Novita, OpenRouter)
api_msg["reasoning_content"] = reasoning_text
# Remove 'reasoning' field - it's for trajectory storage only
# We've copied it to 'reasoning_content' for the API above
@@ -9174,7 +9153,7 @@ class AIAgent:
if self._force_ascii_payload:
_sanitize_structure_non_ascii(api_kwargs)
if self.api_mode == "codex_responses":
api_kwargs = self._get_codex_transport().preflight_kwargs(api_kwargs, allow_stream=False)
api_kwargs = self._get_transport().preflight_kwargs(api_kwargs, allow_stream=False)
try:
from hermes_cli.plugins import invoke_hook as _invoke_hook
@@ -9262,7 +9241,7 @@ class AIAgent:
response_invalid = False
error_details = []
if self.api_mode == "codex_responses":
_ct_v = self._get_codex_transport()
_ct_v = self._get_transport()
if not _ct_v.validate_response(response):
if response is None:
response_invalid = True
@@ -9291,7 +9270,7 @@ class AIAgent:
response_invalid = True
error_details.append("response.output is empty")
elif self.api_mode == "anthropic_messages":
_tv = self._get_anthropic_transport()
_tv = self._get_transport()
if not _tv.validate_response(response):
response_invalid = True
if response is None:
@@ -9299,7 +9278,7 @@ class AIAgent:
else:
error_details.append("response.content invalid (not a non-empty list)")
elif self.api_mode == "bedrock_converse":
_btv = self._get_bedrock_transport()
_btv = self._get_transport()
if not _btv.validate_response(response):
response_invalid = True
if response is None:
@@ -9307,7 +9286,7 @@ class AIAgent:
else:
error_details.append("Bedrock response invalid (no output or choices)")
else:
_ctv = self._get_chat_completions_transport()
_ctv = self._get_transport()
if not _ctv.validate_response(response):
response_invalid = True
if response is None:
@@ -9467,15 +9446,18 @@ class AIAgent:
else:
finish_reason = "stop"
elif self.api_mode == "anthropic_messages":
_tfr = self._get_anthropic_transport()
_tfr = self._get_transport()
finish_reason = _tfr.map_finish_reason(response.stop_reason)
elif self.api_mode == "bedrock_converse":
# Bedrock response is already normalized at dispatch — finish_reason
# is already in OpenAI format via normalize_converse_response()
finish_reason = response.choices[0].finish_reason if hasattr(response, "choices") and response.choices else "stop"
# Bedrock response already normalized at dispatch — use transport
_bt_fr = self._get_transport()
_bt_fr_nr = _bt_fr.normalize_response(response)
finish_reason = _bt_fr_nr.finish_reason
else:
finish_reason = response.choices[0].finish_reason
assistant_message = response.choices[0].message
_cc_fr = self._get_transport()
_cc_fr_nr = _cc_fr.normalize_response(response)
finish_reason = _cc_fr_nr.finish_reason
assistant_message = self._nr_to_assistant_message(_cc_fr_nr)
if self._should_treat_stop_as_truncated(
finish_reason,
assistant_message,
@@ -9498,27 +9480,14 @@ class AIAgent:
# interim assistant message is byte-identical to what
# would have been appended in the non-truncated path.
_trunc_msg = None
if self.api_mode in ("chat_completions", "bedrock_converse"):
_trunc_msg = response.choices[0].message if (hasattr(response, "choices") and response.choices) else None
elif self.api_mode == "anthropic_messages":
_trunc_nr = self._get_anthropic_transport().normalize_response(
_trunc_transport = self._get_transport()
if self.api_mode == "anthropic_messages":
_trunc_nr = _trunc_transport.normalize_response(
response, strip_tool_prefix=self._is_anthropic_oauth
)
_trunc_msg = SimpleNamespace(
content=_trunc_nr.content,
tool_calls=[
SimpleNamespace(
id=tc.id, type="function",
function=SimpleNamespace(name=tc.name, arguments=tc.arguments),
) for tc in (_trunc_nr.tool_calls or [])
] or None,
reasoning=_trunc_nr.reasoning,
reasoning_content=None,
reasoning_details=(
_trunc_nr.provider_data.get("reasoning_details")
if _trunc_nr.provider_data else None
),
)
else:
_trunc_nr = _trunc_transport.normalize_response(response)
_trunc_msg = self._nr_to_assistant_message(_trunc_nr)
_trunc_content = getattr(_trunc_msg, "content", None) if _trunc_msg else None
_trunc_has_tool_calls = bool(getattr(_trunc_msg, "tool_calls", None)) if _trunc_msg else False
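Review note: every branch in the hunks above now calls the same `_get_transport()` helper, which the commit message describes as a generic lookup backed by a shared dict cache, warmed at `__init__` and invalidated when `api_mode` changes. The helper's body is not part of this diff, so the following is only a minimal sketch of that pattern. The class name `TransportCache`, the `factories` mapping, and the `switch_mode` method are hypothetical names chosen for illustration, not the project's API.

```python
class TransportCache:
    """Hypothetical stand-in for the agent's transport lookup (not the real class)."""

    def __init__(self, factories, api_mode):
        # factories: api_mode -> zero-argument constructor (an assumption of this sketch)
        self._factories = factories
        self._transports = {}  # shared cache, keyed by api_mode
        self.api_mode = api_mode
        # Warm the cache eagerly so construction errors surface at init time.
        self.get()

    def get(self, api_mode=None):
        """Return the cached transport for the mode, building it on first use."""
        mode = api_mode or self.api_mode
        if mode not in self._transports:
            self._transports[mode] = self._factories[mode]()
        return self._transports[mode]

    def switch_mode(self, api_mode):
        """Change the active mode and drop cached transports so none goes stale."""
        if api_mode != self.api_mode:
            self._transports.clear()
            self.api_mode = api_mode
```

With a cache keyed by mode, clearing on a switch is stricter than needed for correctness of the lookup itself; the sketch keeps it because the commit message lists cache invalidation on provider switch as an explicit goal.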
@@ -9767,7 +9736,6 @@ class AIAgent:
billing_mode="subscription_included"
if cost_result.status == "included" else None,
model=self.model,
api_call_count=1,
)
except Exception:
pass # never block the agent loop
@@ -10044,27 +10012,6 @@ class AIAgent:
if self._try_refresh_nous_client_credentials(force=True):
print(f"{self.log_prefix}🔐 Nous agent key refreshed after 401. Retrying request...")
continue
# Credential refresh didn't help — show diagnostic info.
# Most common causes: Portal OAuth expired/revoked,
# account out of credits, or agent key blocked.
from hermes_constants import display_hermes_home as _dhh_fn
_dhh = _dhh_fn()
_body_text = ""
try:
_body = getattr(api_error, "body", None) or getattr(api_error, "response", None)
if _body is not None:
_body_text = str(_body)[:200]
except Exception:
pass
print(f"{self.log_prefix}🔐 Nous 401 — Portal authentication failed.")
if _body_text:
print(f"{self.log_prefix} Response: {_body_text}")
print(f"{self.log_prefix} Most likely: Portal OAuth expired, account out of credits, or agent key revoked.")
print(f"{self.log_prefix} Troubleshooting:")
print(f"{self.log_prefix} • Re-authenticate: hermes login --provider nous")
print(f"{self.log_prefix} • Check credits / billing: https://portal.nousresearch.com")
print(f"{self.log_prefix} • Verify stored credentials: {_dhh}/auth.json")
print(f"{self.log_prefix} • Switch providers temporarily: /model <model> --provider openrouter")
if (
self.api_mode == "anthropic_messages"
and status_code == 401
@@ -10749,69 +10696,13 @@ class AIAgent:
break
try:
if self.api_mode == "codex_responses":
_ct = self._get_codex_transport()
_cnr = _ct.normalize_response(response)
# Back-compat shim: downstream expects SimpleNamespace with
# codex-specific fields (.codex_reasoning_items, .reasoning_details,
# and .call_id/.response_item_id on tool calls).
_tc_list = None
if _cnr.tool_calls:
_tc_list = []
for tc in _cnr.tool_calls:
_tc_ns = SimpleNamespace(
id=tc.id, type="function",
function=SimpleNamespace(name=tc.name, arguments=tc.arguments),
)
if tc.provider_data:
if tc.provider_data.get("call_id"):
_tc_ns.call_id = tc.provider_data["call_id"]
if tc.provider_data.get("response_item_id"):
_tc_ns.response_item_id = tc.provider_data["response_item_id"]
_tc_list.append(_tc_ns)
assistant_message = SimpleNamespace(
content=_cnr.content,
tool_calls=_tc_list or None,
reasoning=_cnr.reasoning,
reasoning_content=None,
codex_reasoning_items=(
_cnr.provider_data.get("codex_reasoning_items")
if _cnr.provider_data else None
),
reasoning_details=(
_cnr.provider_data.get("reasoning_details")
if _cnr.provider_data else None
),
)
finish_reason = _cnr.finish_reason
elif self.api_mode == "anthropic_messages":
_transport = self._get_anthropic_transport()
_nr = _transport.normalize_response(
response, strip_tool_prefix=self._is_anthropic_oauth
)
# Back-compat shim: downstream code expects SimpleNamespace with
# .content, .tool_calls, .reasoning, .reasoning_content,
# .reasoning_details attributes.
assistant_message = SimpleNamespace(
content=_nr.content,
tool_calls=[
SimpleNamespace(
id=tc.id,
type="function",
function=SimpleNamespace(name=tc.name, arguments=tc.arguments),
)
for tc in (_nr.tool_calls or [])
] or None,
reasoning=_nr.reasoning,
reasoning_content=None,
reasoning_details=(
_nr.provider_data.get("reasoning_details")
if _nr.provider_data else None
),
)
finish_reason = _nr.finish_reason
else:
assistant_message = response.choices[0].message
_transport = self._get_transport()
_normalize_kwargs = {}
if self.api_mode == "anthropic_messages":
_normalize_kwargs["strip_tool_prefix"] = self._is_anthropic_oauth
_nr = _transport.normalize_response(response, **_normalize_kwargs)
assistant_message = self._nr_to_assistant_message(_nr)
finish_reason = _nr.finish_reason
# Normalize content to string — some OpenAI-compatible servers
# (llama-server, etc.) return content as a dict or list instead
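Review note: the hunk above replaces two hand-written `SimpleNamespace` shims with a single `_nr_to_assistant_message()` call whose body is not shown in this diff. Based on the fields the removed shims populated, it might look roughly like the sketch below. The two dataclasses are simplified stand-ins for `agent.transports.types.NormalizedResponse` and `ToolCall`, the free function name is hypothetical, and setting `codex_reasoning_items` to `None` for non-Codex responses is an assumption (the removed Anthropic shim did not set that attribute at all).

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Optional


@dataclass
class ToolCall:  # simplified stand-in, not the project's class
    id: str
    name: str
    arguments: str
    provider_data: Optional[dict] = None


@dataclass
class NormalizedResponse:  # simplified stand-in, not the project's class
    content: Optional[str]
    tool_calls: Optional[list] = None
    finish_reason: str = "stop"
    reasoning: Optional[str] = None
    provider_data: Optional[dict] = None


def nr_to_assistant_message(nr):
    """Convert a normalized response into the namespace shape downstream code reads."""
    provider_data = nr.provider_data or {}
    tool_calls = []
    for tc in nr.tool_calls or []:
        ns = SimpleNamespace(
            id=tc.id,
            type="function",
            function=SimpleNamespace(name=tc.name, arguments=tc.arguments),
        )
        # Codex-specific identifiers are copied only when the transport supplied them.
        tc_data = tc.provider_data or {}
        for key in ("call_id", "response_item_id"):
            if tc_data.get(key):
                setattr(ns, key, tc_data[key])
        tool_calls.append(ns)
    return SimpleNamespace(
        content=nr.content,
        tool_calls=tool_calls or None,
        reasoning=nr.reasoning,
        reasoning_content=None,
        codex_reasoning_items=provider_data.get("codex_reasoning_items"),
        reasoning_details=provider_data.get("reasoning_details"),
    )
```

Keeping provider-specific extras in `provider_data` and copying them out in one place is what lets the three per-mode branches collapse into the single call shown in the diff.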
-7
@@ -50,10 +50,7 @@ AUTHOR_MAP = {
"71184274+MassiveMassimo@users.noreply.github.com": "MassiveMassimo",
"massivemassimo@users.noreply.github.com": "MassiveMassimo",
"82637225+kshitijk4poor@users.noreply.github.com": "kshitijk4poor",
"keifergu@tencent.com": "keifergu",
"kshitijk4poor@users.noreply.github.com": "kshitijk4poor",
"abner.the.foreman@agentmail.to": "Abnertheforeman",
"harryykyle1@gmail.com": "hharry11",
"kshitijk4poor@gmail.com": "kshitijk4poor",
"16443023+stablegenius49@users.noreply.github.com": "stablegenius49",
"185121704+stablegenius49@users.noreply.github.com": "stablegenius49",
@@ -95,8 +92,6 @@ AUTHOR_MAP = {
"135070653+sgaofen@users.noreply.github.com": "sgaofen",
"nocoo@users.noreply.github.com": "nocoo",
"30841158+n-WN@users.noreply.github.com": "n-WN",
"tsuijinglei@gmail.com": "hiddenpuppy",
"jerome@clawwork.ai": "HiddenPuppy",
"leoyuan0099@gmail.com": "keyuyuan",
"bxzt2006@163.com": "Only-Code-A",
"i@troy-y.org": "TroyMitchell911",
@@ -104,8 +99,6 @@ AUTHOR_MAP = {
"hansnow@users.noreply.github.com": "hansnow",
"134848055+UNLINEARITY@users.noreply.github.com": "UNLINEARITY",
"ben.burtenshaw@gmail.com": "burtenshaw",
"roopaknijhara@gmail.com": "rnijhara",
"Maaannnn@users.noreply.github.com": "Maaannnn",
# contributors (manual mapping from git names)
"ahmedsherif95@gmail.com": "asheriif",
"liujinkun@bytedance.com": "liujinkun2025",
-238
@@ -1,238 +0,0 @@
"""Regression tests: normalize_anthropic_response_v2 vs v1.
Constructs mock Anthropic responses and asserts that the v2 function
(returning NormalizedResponse) produces identical field values to the
original v1 function (returning SimpleNamespace + finish_reason).
"""
import json
import pytest
from types import SimpleNamespace
from agent.anthropic_adapter import (
normalize_anthropic_response,
normalize_anthropic_response_v2,
)
from agent.transports.types import NormalizedResponse, ToolCall
# ---------------------------------------------------------------------------
# Helpers to build mock Anthropic SDK responses
# ---------------------------------------------------------------------------
def _text_block(text: str):
return SimpleNamespace(type="text", text=text)
def _thinking_block(thinking: str, signature: str = "sig_abc"):
return SimpleNamespace(type="thinking", thinking=thinking, signature=signature)
def _tool_use_block(id: str, name: str, input: dict):
return SimpleNamespace(type="tool_use", id=id, name=name, input=input)
def _response(content_blocks, stop_reason="end_turn"):
return SimpleNamespace(
content=content_blocks,
stop_reason=stop_reason,
usage=SimpleNamespace(
input_tokens=10,
output_tokens=5,
),
)
# ---------------------------------------------------------------------------
# Tests
# ---------------------------------------------------------------------------
class TestTextOnly:
"""Text-only response — no tools, no thinking."""
def setup_method(self):
self.resp = _response([_text_block("Hello world")])
self.v1_msg, self.v1_finish = normalize_anthropic_response(self.resp)
self.v2 = normalize_anthropic_response_v2(self.resp)
def test_type(self):
assert isinstance(self.v2, NormalizedResponse)
def test_content_matches(self):
assert self.v2.content == self.v1_msg.content
def test_finish_reason_matches(self):
assert self.v2.finish_reason == self.v1_finish
def test_no_tool_calls(self):
assert self.v2.tool_calls is None
assert self.v1_msg.tool_calls is None
def test_no_reasoning(self):
assert self.v2.reasoning is None
assert self.v1_msg.reasoning is None
class TestWithToolCalls:
"""Response with tool calls."""
def setup_method(self):
self.resp = _response(
[
_text_block("I'll check that"),
_tool_use_block("toolu_abc", "terminal", {"command": "ls"}),
_tool_use_block("toolu_def", "read_file", {"path": "/tmp"}),
],
stop_reason="tool_use",
)
self.v1_msg, self.v1_finish = normalize_anthropic_response(self.resp)
self.v2 = normalize_anthropic_response_v2(self.resp)
def test_finish_reason(self):
assert self.v2.finish_reason == "tool_calls"
assert self.v1_finish == "tool_calls"
def test_tool_call_count(self):
assert len(self.v2.tool_calls) == 2
assert len(self.v1_msg.tool_calls) == 2
def test_tool_call_ids_match(self):
for i in range(2):
assert self.v2.tool_calls[i].id == self.v1_msg.tool_calls[i].id
def test_tool_call_names_match(self):
assert self.v2.tool_calls[0].name == "terminal"
assert self.v2.tool_calls[1].name == "read_file"
for i in range(2):
assert self.v2.tool_calls[i].name == self.v1_msg.tool_calls[i].function.name
def test_tool_call_arguments_match(self):
for i in range(2):
assert self.v2.tool_calls[i].arguments == self.v1_msg.tool_calls[i].function.arguments
def test_content_preserved(self):
assert self.v2.content == self.v1_msg.content
assert "check that" in self.v2.content
class TestWithThinking:
"""Response with thinking blocks (Claude 3.5+ extended thinking)."""
def setup_method(self):
self.resp = _response([
_thinking_block("Let me think about this carefully..."),
_text_block("The answer is 42."),
])
self.v1_msg, self.v1_finish = normalize_anthropic_response(self.resp)
self.v2 = normalize_anthropic_response_v2(self.resp)
def test_reasoning_matches(self):
assert self.v2.reasoning == self.v1_msg.reasoning
assert "think about this" in self.v2.reasoning
def test_reasoning_details_in_provider_data(self):
v1_details = self.v1_msg.reasoning_details
v2_details = self.v2.provider_data.get("reasoning_details") if self.v2.provider_data else None
assert v1_details is not None
assert v2_details is not None
assert len(v2_details) == len(v1_details)
def test_content_excludes_thinking(self):
assert self.v2.content == "The answer is 42."
class TestMixed:
"""Response with thinking + text + tool calls."""
def setup_method(self):
self.resp = _response(
[
_thinking_block("Planning my approach..."),
_text_block("I'll run the command"),
_tool_use_block("toolu_xyz", "terminal", {"command": "pwd"}),
],
stop_reason="tool_use",
)
self.v1_msg, self.v1_finish = normalize_anthropic_response(self.resp)
self.v2 = normalize_anthropic_response_v2(self.resp)
def test_all_fields_present(self):
assert self.v2.content is not None
assert self.v2.tool_calls is not None
assert self.v2.reasoning is not None
assert self.v2.finish_reason == "tool_calls"
def test_content_matches(self):
assert self.v2.content == self.v1_msg.content
def test_reasoning_matches(self):
assert self.v2.reasoning == self.v1_msg.reasoning
def test_tool_call_matches(self):
assert self.v2.tool_calls[0].id == self.v1_msg.tool_calls[0].id
assert self.v2.tool_calls[0].name == self.v1_msg.tool_calls[0].function.name
class TestStopReasons:
"""Verify finish_reason mapping matches between v1 and v2."""
@pytest.mark.parametrize("stop_reason,expected", [
("end_turn", "stop"),
("tool_use", "tool_calls"),
("max_tokens", "length"),
("stop_sequence", "stop"),
("refusal", "content_filter"),
("model_context_window_exceeded", "length"),
("unknown_future_reason", "stop"),
])
def test_stop_reason_mapping(self, stop_reason, expected):
resp = _response([_text_block("x")], stop_reason=stop_reason)
v1_msg, v1_finish = normalize_anthropic_response(resp)
v2 = normalize_anthropic_response_v2(resp)
assert v2.finish_reason == v1_finish == expected
class TestStripToolPrefix:
"""Verify mcp_ prefix stripping works identically."""
def test_prefix_stripped(self):
resp = _response(
[_tool_use_block("toolu_1", "mcp_terminal", {"cmd": "ls"})],
stop_reason="tool_use",
)
v1_msg, _ = normalize_anthropic_response(resp, strip_tool_prefix=True)
v2 = normalize_anthropic_response_v2(resp, strip_tool_prefix=True)
assert v1_msg.tool_calls[0].function.name == "terminal"
assert v2.tool_calls[0].name == "terminal"
def test_prefix_kept(self):
resp = _response(
[_tool_use_block("toolu_1", "mcp_terminal", {"cmd": "ls"})],
stop_reason="tool_use",
)
v1_msg, _ = normalize_anthropic_response(resp, strip_tool_prefix=False)
v2 = normalize_anthropic_response_v2(resp, strip_tool_prefix=False)
assert v1_msg.tool_calls[0].function.name == "mcp_terminal"
assert v2.tool_calls[0].name == "mcp_terminal"
class TestEdgeCases:
"""Edge cases: empty content, no blocks, etc."""
def test_empty_content_blocks(self):
resp = _response([])
v1_msg, v1_finish = normalize_anthropic_response(resp)
v2 = normalize_anthropic_response_v2(resp)
assert v2.content == v1_msg.content
assert v2.content is None
def test_no_reasoning_details_means_none_provider_data(self):
resp = _response([_text_block("hi")])
v2 = normalize_anthropic_response_v2(resp)
assert v2.provider_data is None
def test_v2_returns_dataclass_not_namespace(self):
resp = _response([_text_block("hi")])
v2 = normalize_anthropic_response_v2(resp)
assert isinstance(v2, NormalizedResponse)
assert not isinstance(v2, SimpleNamespace)
-39
@@ -782,45 +782,6 @@ def test_resolve_api_key_provider_skips_unconfigured_anthropic(monkeypatch):
# ---------------------------------------------------------------------------
class TestModelDefaultElimination:
"""_resolve_api_key_provider must skip providers without known aux models."""
def test_unknown_provider_skipped(self, monkeypatch):
"""Providers not in _API_KEY_PROVIDER_AUX_MODELS are skipped, not sent model='default'."""
from agent.auxiliary_client import _API_KEY_PROVIDER_AUX_MODELS
# Verify our known providers have entries
assert "gemini" in _API_KEY_PROVIDER_AUX_MODELS
assert "kimi-coding" in _API_KEY_PROVIDER_AUX_MODELS
# A random provider_id not in the dict should return None
assert _API_KEY_PROVIDER_AUX_MODELS.get("totally-unknown-provider") is None
def test_known_provider_gets_real_model(self):
"""Known providers get a real model name, not 'default'."""
from agent.auxiliary_client import _API_KEY_PROVIDER_AUX_MODELS
for provider_id, model in _API_KEY_PROVIDER_AUX_MODELS.items():
assert model != "default", f"{provider_id} should not map to 'default'"
assert isinstance(model, str) and model.strip(), \
f"{provider_id} should have a non-empty model string"
def test_volcengine_byteplus_use_main_model_first(self):
"""Volcengine/BytePlus use main-model-first — no entry in _API_KEY_PROVIDER_AUX_MODELS."""
from agent.auxiliary_client import _API_KEY_PROVIDER_AUX_MODELS
assert "volcengine" not in _API_KEY_PROVIDER_AUX_MODELS
assert "byteplus" not in _API_KEY_PROVIDER_AUX_MODELS
class TestContractProviderAliases:
def test_coding_plan_aliases_normalize_to_canonical_provider(self):
from agent.auxiliary_client import _normalize_aux_provider
assert _normalize_aux_provider("volcengine-coding-plan") == "volcengine"
assert _normalize_aux_provider("byteplus-coding-plan") == "byteplus"
# ---------------------------------------------------------------------------
# _try_payment_fallback reason parameter (#7512 bug 3)
# ---------------------------------------------------------------------------
+1 -7
@@ -298,15 +298,9 @@ class TestClassifyApiError:
assert result.retryable is False
def test_404_generic(self):
# Generic 404 with no "model not found" signal — common for local
# llama.cpp/Ollama/vLLM endpoints with slightly wrong paths. Treat
# as unknown (retryable) so the real error surfaces, rather than
# claiming the model is missing and silently falling back.
e = MockAPIError("Not Found", status_code=404)
result = classify_api_error(e)
assert result.reason == FailoverReason.unknown
assert result.retryable is True
assert result.should_fallback is False
assert result.reason == FailoverReason.model_not_found
# ── Payload too large ──
-23
@@ -79,28 +79,6 @@ class TestMemoryManagerUserIdThreading:
assert p._init_kwargs.get("platform") == "telegram"
assert p._init_session_id == "sess-123"
def test_chat_context_forwarded_to_provider(self):
mgr = MemoryManager()
p = RecordingProvider()
mgr.add_provider(p)
mgr.initialize_all(
session_id="sess-chat",
platform="discord",
user_id="discord_u_7",
user_name="fakeusername",
chat_id="1485316232612941897",
chat_name="fakeassistantname-forums",
chat_type="thread",
thread_id="1491249007475949698",
)
assert p._init_kwargs.get("user_name") == "fakeusername"
assert p._init_kwargs.get("chat_id") == "1485316232612941897"
assert p._init_kwargs.get("chat_name") == "fakeassistantname-forums"
assert p._init_kwargs.get("chat_type") == "thread"
assert p._init_kwargs.get("thread_id") == "1491249007475949698"
def test_no_user_id_when_cli(self):
"""CLI sessions should not have user_id in kwargs."""
mgr = MemoryManager()
@@ -356,4 +334,3 @@ class TestAIAgentUserIdPropagation:
agent = object.__new__(AIAgent)
agent._user_id = None
assert agent._user_id is None
-17
@@ -222,22 +222,6 @@ class TestGetModelContextLength:
mock_fetch.return_value = {}
assert get_model_context_length("unknown/never-heard-of-this") == CONTEXT_PROBE_TIERS[0]
@patch("agent.model_metadata.fetch_model_metadata")
def test_volcengine_contract_model_uses_contract_context_length(self, mock_fetch):
mock_fetch.return_value = {}
assert get_model_context_length(
"volcengine/doubao-seed-2-0-pro-260215",
provider="volcengine",
) == 256000
@patch("agent.model_metadata.fetch_model_metadata")
def test_byteplus_contract_model_infers_provider_from_url(self, mock_fetch):
mock_fetch.return_value = {}
assert get_model_context_length(
"byteplus-coding-plan/kimi-k2.5",
base_url="https://ark.ap-southeast.bytepluses.com/api/coding/v3",
) == 256000
@patch("agent.model_metadata.fetch_model_metadata")
def test_partial_match_in_defaults(self, mock_fetch):
mock_fetch.return_value = {}
@@ -401,7 +385,6 @@ class TestStripProviderPrefix:
assert _strip_provider_prefix("local:my-model") == "my-model"
assert _strip_provider_prefix("openrouter:anthropic/claude-sonnet-4") == "anthropic/claude-sonnet-4"
assert _strip_provider_prefix("anthropic:claude-sonnet-4") == "claude-sonnet-4"
assert _strip_provider_prefix("stepfun:step-3.5-flash") == "step-3.5-flash"
def test_ollama_model_tag_preserved(self):
"""Ollama model:tag format must NOT be stripped."""
-1
@@ -82,7 +82,6 @@ class TestProviderMapping:
def test_known_providers_mapped(self):
assert PROVIDER_TO_MODELS_DEV["anthropic"] == "anthropic"
assert PROVIDER_TO_MODELS_DEV["copilot"] == "github-copilot"
assert PROVIDER_TO_MODELS_DEV["stepfun"] == "stepfun"
assert PROVIDER_TO_MODELS_DEV["kilocode"] == "kilo"
assert PROVIDER_TO_MODELS_DEV["ai-gateway"] == "vercel"
-3
@@ -1059,7 +1059,6 @@ class TestRewriteTranscriptPreservesReasoning:
role="assistant",
content="The answer is 42.",
reasoning="I need to think step by step.",
reasoning_content="provider scratchpad",
reasoning_details=[{"type": "summary", "text": "step by step"}],
codex_reasoning_items=[{"id": "r1", "type": "reasoning"}],
)
@@ -1067,7 +1066,6 @@ class TestRewriteTranscriptPreservesReasoning:
# Verify all three were stored
before = db.get_messages_as_conversation(session_id)
assert before[0].get("reasoning") == "I need to think step by step."
assert before[0].get("reasoning_content") == "provider scratchpad"
assert before[0].get("reasoning_details") == [{"type": "summary", "text": "step by step"}]
assert before[0].get("codex_reasoning_items") == [{"id": "r1", "type": "reasoning"}]
@@ -1084,6 +1082,5 @@ class TestRewriteTranscriptPreservesReasoning:
# Load again — all three reasoning fields must survive
after = db.get_messages_as_conversation(session_id)
assert after[0].get("reasoning") == "I need to think step by step."
assert after[0].get("reasoning_content") == "provider scratchpad"
assert after[0].get("reasoning_details") == [{"type": "summary", "text": "step by step"}]
assert after[0].get("codex_reasoning_items") == [{"id": "r1", "type": "reasoning"}]
+3 -135
@@ -1031,7 +1031,7 @@ class TestReactions:
@pytest.mark.asyncio
async def test_reactions_in_message_flow(self, adapter):
"""Reactions should be bracketed around actual processing via hooks."""
"""Reactions should be added on receipt and swapped on completion."""
adapter._app.client.reactions_add = AsyncMock()
adapter._app.client.reactions_remove = AsyncMock()
adapter._app.client.users_info = AsyncMock(return_value={
@@ -1047,147 +1047,15 @@ class TestReactions:
}
await adapter._handle_slack_message(event)
# _handle_slack_message should register the message for reactions
assert "1234567890.000001" in adapter._reacting_message_ids
# Simulate the base class calling on_processing_start
from gateway.platforms.base import MessageEvent, MessageType, SessionSource
from gateway.config import Platform
source = SessionSource(
platform=Platform.SLACK,
chat_id="C123",
chat_type="dm",
user_id="U_USER",
)
msg_event = MessageEvent(
text="hello",
message_type=MessageType.TEXT,
source=source,
message_id="1234567890.000001",
)
await adapter.on_processing_start(msg_event)
add_calls = adapter._app.client.reactions_add.call_args_list
assert len(add_calls) == 1
assert add_calls[0].kwargs["name"] == "eyes"
# Simulate the base class calling on_processing_complete
from gateway.platforms.base import ProcessingOutcome
await adapter.on_processing_complete(msg_event, ProcessingOutcome.SUCCESS)
# Should have added 👀, then removed 👀, then added ✅
add_calls = adapter._app.client.reactions_add.call_args_list
remove_calls = adapter._app.client.reactions_remove.call_args_list
assert len(add_calls) == 2
assert add_calls[0].kwargs["name"] == "eyes"
assert add_calls[1].kwargs["name"] == "white_check_mark"
assert len(remove_calls) == 1
assert remove_calls[0].kwargs["name"] == "eyes"
# Message ID should be cleaned up
assert "1234567890.000001" not in adapter._reacting_message_ids
@pytest.mark.asyncio
async def test_reactions_failure_outcome(self, adapter):
"""Failed processing should add :x: instead of :white_check_mark:."""
adapter._app.client.reactions_add = AsyncMock()
adapter._app.client.reactions_remove = AsyncMock()
from gateway.platforms.base import MessageEvent, MessageType, SessionSource, ProcessingOutcome
from gateway.config import Platform
source = SessionSource(
platform=Platform.SLACK,
chat_id="C123",
chat_type="dm",
user_id="U_USER",
)
adapter._reacting_message_ids.add("1234567890.000002")
msg_event = MessageEvent(
text="hello",
message_type=MessageType.TEXT,
source=source,
message_id="1234567890.000002",
)
await adapter.on_processing_complete(msg_event, ProcessingOutcome.FAILURE)
add_calls = adapter._app.client.reactions_add.call_args_list
remove_calls = adapter._app.client.reactions_remove.call_args_list
assert len(add_calls) == 1
assert add_calls[0].kwargs["name"] == "x"
assert len(remove_calls) == 1
assert remove_calls[0].kwargs["name"] == "eyes"
@pytest.mark.asyncio
async def test_reactions_skipped_for_non_dm_non_mention(self, adapter):
"""Non-DM, non-mention messages should not get reactions."""
adapter._app.client.reactions_add = AsyncMock()
adapter._app.client.reactions_remove = AsyncMock()
adapter._app.client.users_info = AsyncMock(return_value={
"user": {"profile": {"display_name": "Tyler"}}
})
event = {
"text": "hello",
"user": "U_USER",
"channel": "C123",
"channel_type": "channel",
"ts": "1234567890.000003",
}
await adapter._handle_slack_message(event)
# Should NOT register for reactions when not mentioned in a channel
assert "1234567890.000003" not in adapter._reacting_message_ids
adapter._app.client.reactions_add.assert_not_called()
adapter._app.client.reactions_remove.assert_not_called()
@pytest.mark.asyncio
async def test_reactions_disabled_via_env(self, adapter, monkeypatch):
"""SLACK_REACTIONS=false should suppress all reaction lifecycle."""
monkeypatch.setenv("SLACK_REACTIONS", "false")
adapter._app.client.reactions_add = AsyncMock()
adapter._app.client.reactions_remove = AsyncMock()
adapter._app.client.users_info = AsyncMock(return_value={
"user": {"profile": {"display_name": "Tyler"}}
})
event = {
"text": "hello",
"user": "U_USER",
"channel": "C123",
"channel_type": "im",
"ts": "1234567890.000004",
}
await adapter._handle_slack_message(event)
# Should NOT register for reactions when toggle is off
assert "1234567890.000004" not in adapter._reacting_message_ids
# Hooks should also be no-ops when disabled
from gateway.platforms.base import MessageEvent, MessageType, SessionSource, ProcessingOutcome
from gateway.config import Platform
source = SessionSource(
platform=Platform.SLACK,
chat_id="C123",
chat_type="dm",
user_id="U_USER",
)
msg_event = MessageEvent(
text="hello",
message_type=MessageType.TEXT,
source=source,
message_id="1234567890.000004",
)
# Force-add to verify hooks respect the toggle independently
adapter._reacting_message_ids.add("1234567890.000004")
await adapter.on_processing_start(msg_event)
await adapter.on_processing_complete(msg_event, ProcessingOutcome.SUCCESS)
adapter._app.client.reactions_add.assert_not_called()
adapter._app.client.reactions_remove.assert_not_called()
@pytest.mark.asyncio
async def test_reactions_enabled_by_default(self, adapter):
"""SLACK_REACTIONS defaults to true (matches existing behavior)."""
assert adapter._reactions_enabled() is True
# ---------------------------------------------------------------------------
# TestThreadReplyHandling
+1 -110
@@ -15,8 +15,6 @@ from hermes_cli.auth import (
get_auth_status,
AuthError,
KIMI_CODE_BASE_URL,
STEPFUN_STEP_PLAN_INTL_BASE_URL,
STEPFUN_STEP_PLAN_CN_BASE_URL,
_resolve_kimi_base_url,
)
from hermes_cli.copilot_auth import _try_gh_cli_token
@@ -37,13 +35,10 @@ class TestProviderRegistry:
("xai", "xAI", "api_key"),
("nvidia", "NVIDIA NIM", "api_key"),
("kimi-coding", "Kimi / Moonshot", "api_key"),
("stepfun", "StepFun Step Plan", "api_key"),
("minimax", "MiniMax", "api_key"),
("minimax-cn", "MiniMax (China)", "api_key"),
("ai-gateway", "Vercel AI Gateway", "api_key"),
("kilocode", "Kilo Code", "api_key"),
("volcengine", "Volcengine", "api_key"),
("byteplus", "BytePlus", "api_key"),
])
def test_provider_registered(self, provider_id, name, auth_type):
assert provider_id in PROVIDER_REGISTRY
@@ -88,11 +83,6 @@ class TestProviderRegistry:
assert pconfig.api_key_env_vars == ("MINIMAX_API_KEY",)
assert pconfig.base_url_env_var == "MINIMAX_BASE_URL"
def test_stepfun_env_vars(self):
pconfig = PROVIDER_REGISTRY["stepfun"]
assert pconfig.api_key_env_vars == ("STEPFUN_API_KEY",)
assert pconfig.base_url_env_var == "STEPFUN_BASE_URL"
def test_minimax_cn_env_vars(self):
pconfig = PROVIDER_REGISTRY["minimax-cn"]
assert pconfig.api_key_env_vars == ("MINIMAX_CN_API_KEY",)
@@ -113,29 +103,16 @@ class TestProviderRegistry:
assert pconfig.api_key_env_vars == ("HF_TOKEN",)
assert pconfig.base_url_env_var == "HF_BASE_URL"
def test_volcengine_env_vars(self):
pconfig = PROVIDER_REGISTRY["volcengine"]
assert pconfig.api_key_env_vars == ("VOLCENGINE_API_KEY",)
assert pconfig.base_url_env_var == ""
def test_byteplus_env_vars(self):
pconfig = PROVIDER_REGISTRY["byteplus"]
assert pconfig.api_key_env_vars == ("BYTEPLUS_API_KEY",)
assert pconfig.base_url_env_var == ""
def test_base_urls(self):
assert PROVIDER_REGISTRY["copilot"].inference_base_url == "https://api.githubcopilot.com"
assert PROVIDER_REGISTRY["copilot-acp"].inference_base_url == "acp://copilot"
assert PROVIDER_REGISTRY["zai"].inference_base_url == "https://api.z.ai/api/paas/v4"
assert PROVIDER_REGISTRY["kimi-coding"].inference_base_url == "https://api.moonshot.ai/v1"
assert PROVIDER_REGISTRY["stepfun"].inference_base_url == STEPFUN_STEP_PLAN_INTL_BASE_URL
assert PROVIDER_REGISTRY["minimax"].inference_base_url == "https://api.minimax.io/anthropic"
assert PROVIDER_REGISTRY["minimax-cn"].inference_base_url == "https://api.minimaxi.com/anthropic"
assert PROVIDER_REGISTRY["ai-gateway"].inference_base_url == "https://ai-gateway.vercel.sh/v1"
assert PROVIDER_REGISTRY["kilocode"].inference_base_url == "https://api.kilo.ai/api/gateway"
assert PROVIDER_REGISTRY["huggingface"].inference_base_url == "https://router.huggingface.co/v1"
assert PROVIDER_REGISTRY["volcengine"].inference_base_url == "https://ark.cn-beijing.volces.com/api/v3"
assert PROVIDER_REGISTRY["byteplus"].inference_base_url == "https://ark.ap-southeast.bytepluses.com/api/v3"
def test_oauth_providers_unchanged(self):
"""Ensure we didn't break the existing OAuth providers."""
@@ -153,15 +130,13 @@ PROVIDER_ENV_VARS = (
"OPENROUTER_API_KEY", "OPENAI_API_KEY", "ANTHROPIC_API_KEY", "ANTHROPIC_TOKEN",
"CLAUDE_CODE_OAUTH_TOKEN",
"GLM_API_KEY", "ZAI_API_KEY", "Z_AI_API_KEY",
"KIMI_API_KEY", "KIMI_BASE_URL", "STEPFUN_API_KEY", "STEPFUN_BASE_URL",
"MINIMAX_API_KEY", "MINIMAX_CN_API_KEY",
"KIMI_API_KEY", "KIMI_BASE_URL", "MINIMAX_API_KEY", "MINIMAX_CN_API_KEY",
"AI_GATEWAY_API_KEY", "AI_GATEWAY_BASE_URL",
"KILOCODE_API_KEY", "KILOCODE_BASE_URL",
"DASHSCOPE_API_KEY", "OPENCODE_ZEN_API_KEY", "OPENCODE_GO_API_KEY",
"NOUS_API_KEY", "GITHUB_TOKEN", "GH_TOKEN",
"OPENAI_BASE_URL", "HERMES_COPILOT_ACP_COMMAND", "COPILOT_CLI_PATH",
"HERMES_COPILOT_ACP_ARGS", "COPILOT_ACP_BASE_URL",
"VOLCENGINE_API_KEY", "BYTEPLUS_API_KEY",
)
@@ -181,9 +156,6 @@ class TestResolveProvider:
def test_explicit_kimi_coding(self):
assert resolve_provider("kimi-coding") == "kimi-coding"
def test_explicit_stepfun(self):
assert resolve_provider("stepfun") == "stepfun"
def test_explicit_minimax(self):
assert resolve_provider("minimax") == "minimax"
@@ -208,9 +180,6 @@ class TestResolveProvider:
def test_alias_moonshot(self):
assert resolve_provider("moonshot") == "kimi-coding"
def test_alias_step(self):
assert resolve_provider("step") == "stepfun"
def test_alias_minimax_underscore(self):
assert resolve_provider("minimax_cn") == "minimax-cn"
@@ -247,14 +216,6 @@ class TestResolveProvider:
assert resolve_provider("github-copilot-acp") == "copilot-acp"
assert resolve_provider("copilot-acp-agent") == "copilot-acp"
def test_alias_volcengine_coding_plan(self):
assert resolve_provider("volcengine-coding-plan") == "volcengine"
assert resolve_provider("volcengine_coding_plan") == "volcengine"
def test_alias_byteplus_coding_plan(self):
assert resolve_provider("byteplus-coding-plan") == "byteplus"
assert resolve_provider("byteplus_coding_plan") == "byteplus"
def test_explicit_huggingface(self):
assert resolve_provider("huggingface") == "huggingface"
@@ -287,10 +248,6 @@ class TestResolveProvider:
monkeypatch.setenv("KIMI_API_KEY", "test-kimi-key")
assert resolve_provider("auto") == "kimi-coding"
def test_auto_detects_stepfun_key(self, monkeypatch):
monkeypatch.setenv("STEPFUN_API_KEY", "test-stepfun-key")
assert resolve_provider("auto") == "stepfun"
def test_auto_detects_minimax_key(self, monkeypatch):
monkeypatch.setenv("MINIMAX_API_KEY", "test-mm-key")
assert resolve_provider("auto") == "minimax"
@@ -355,30 +312,6 @@ class TestApiKeyProviderStatus:
status = get_api_key_provider_status("kimi-coding")
assert status["base_url"] == "https://custom.kimi.example/v1"
def test_stepfun_status_uses_configured_base_url(self, monkeypatch):
monkeypatch.setenv("STEPFUN_API_KEY", "stepfun-key")
monkeypatch.setenv("STEPFUN_BASE_URL", STEPFUN_STEP_PLAN_CN_BASE_URL)
status = get_api_key_provider_status("stepfun")
assert status["configured"] is True
assert status["base_url"] == STEPFUN_STEP_PLAN_CN_BASE_URL
def test_volcengine_status_uses_coding_plan_base_url(self, monkeypatch):
monkeypatch.setenv("VOLCENGINE_API_KEY", "volc-test-key")
monkeypatch.setattr(
"hermes_cli.auth.read_raw_config",
lambda: {
"model": {
"provider": "volcengine",
"default": "volcengine-coding-plan/doubao-seed-2.0-code",
}
},
)
status = get_api_key_provider_status("volcengine")
assert status["configured"] is True
assert status["base_url"] == "https://ark.cn-beijing.volces.com/api/coding/v3"
def test_copilot_status_uses_gh_cli_token(self, monkeypatch):
monkeypatch.setattr("hermes_cli.copilot_auth._try_gh_cli_token", lambda: "gho_gh_cli_token")
status = get_api_key_provider_status("copilot")
@@ -434,25 +367,6 @@ class TestResolveApiKeyProviderCredentials:
assert creds["base_url"] == "https://api.z.ai/api/paas/v4"
assert creds["source"] == "GLM_API_KEY"
def test_resolve_byteplus_with_coding_plan_model_uses_coding_base_url(self, monkeypatch):
monkeypatch.setenv("BYTEPLUS_API_KEY", "byteplus-secret-key")
monkeypatch.setattr(
"hermes_cli.auth.read_raw_config",
lambda: {
"model": {
"provider": "byteplus",
"default": "byteplus-coding-plan/dola-seed-2.0-pro",
}
},
)
creds = resolve_api_key_provider_credentials("byteplus")
assert creds["provider"] == "byteplus"
assert creds["api_key"] == "byteplus-secret-key"
assert creds["base_url"] == "https://ark.ap-southeast.bytepluses.com/api/coding/v3"
assert creds["source"] == "BYTEPLUS_API_KEY"
def test_resolve_copilot_with_github_token(self, monkeypatch):
monkeypatch.setenv("GITHUB_TOKEN", "gh-env-secret")
creds = resolve_api_key_provider_credentials("copilot")
@@ -515,19 +429,6 @@ class TestResolveApiKeyProviderCredentials:
assert creds["api_key"] == "kimi-secret-key"
assert creds["base_url"] == "https://api.moonshot.ai/v1"
def test_resolve_stepfun_with_key(self, monkeypatch):
monkeypatch.setenv("STEPFUN_API_KEY", "stepfun-secret-key")
creds = resolve_api_key_provider_credentials("stepfun")
assert creds["provider"] == "stepfun"
assert creds["api_key"] == "stepfun-secret-key"
assert creds["base_url"] == STEPFUN_STEP_PLAN_INTL_BASE_URL
def test_resolve_stepfun_custom_base_url(self, monkeypatch):
monkeypatch.setenv("STEPFUN_API_KEY", "stepfun-secret-key")
monkeypatch.setenv("STEPFUN_BASE_URL", STEPFUN_STEP_PLAN_CN_BASE_URL)
creds = resolve_api_key_provider_credentials("stepfun")
assert creds["base_url"] == STEPFUN_STEP_PLAN_CN_BASE_URL
def test_resolve_minimax_with_key(self, monkeypatch):
monkeypatch.setenv("MINIMAX_API_KEY", "mm-secret-key")
creds = resolve_api_key_provider_credentials("minimax")
@@ -618,16 +519,6 @@ class TestRuntimeProviderResolution:
assert result["api_mode"] == "chat_completions"
assert result["api_key"] == "kimi-key"
def test_runtime_stepfun(self, monkeypatch):
monkeypatch.setenv("STEPFUN_API_KEY", "stepfun-key")
monkeypatch.setenv("STEPFUN_BASE_URL", STEPFUN_STEP_PLAN_CN_BASE_URL)
from hermes_cli.runtime_provider import resolve_runtime_provider
result = resolve_runtime_provider(requested="stepfun")
assert result["provider"] == "stepfun"
assert result["api_mode"] == "chat_completions"
assert result["api_key"] == "stepfun-key"
assert result["base_url"] == STEPFUN_STEP_PLAN_CN_BASE_URL
def test_runtime_minimax(self, monkeypatch):
monkeypatch.setenv("MINIMAX_API_KEY", "mm-key")
from hermes_cli.runtime_provider import resolve_runtime_provider
-19
@@ -33,25 +33,6 @@ def test_project_env_overrides_stale_shell_values_when_user_env_missing(tmp_path
assert os.getenv("OPENAI_BASE_URL") == "https://project.example/v1"
def test_project_env_is_sanitized_before_loading(tmp_path, monkeypatch):
home = tmp_path / "hermes"
project_env = tmp_path / ".env"
project_env.write_text(
"TELEGRAM_BOT_TOKEN=8356550917:AAGGEkzg06Hrc3Hjb3Sa1jkGVDOdU_lYy2Q"
"ANTHROPIC_API_KEY=sk-ant-test123\n",
encoding="utf-8",
)
monkeypatch.delenv("TELEGRAM_BOT_TOKEN", raising=False)
monkeypatch.delenv("ANTHROPIC_API_KEY", raising=False)
loaded = load_hermes_dotenv(hermes_home=home, project_env=project_env)
assert loaded == [project_env]
assert os.getenv("TELEGRAM_BOT_TOKEN") == "8356550917:AAGGEkzg06Hrc3Hjb3Sa1jkGVDOdU_lYy2Q"
assert os.getenv("ANTHROPIC_API_KEY") == "sk-ant-test123"
def test_user_env_takes_precedence_over_project_env(tmp_path, monkeypatch):
home = tmp_path / "hermes"
home.mkdir()
-13
@@ -179,19 +179,6 @@ class TestIssue6211NativeProviderPrefixNormalization:
assert normalize_model_for_provider(model, target_provider) == expected
class TestContractProviderPrefixNormalization:
@pytest.mark.parametrize("model,target_provider,expected", [
("volcengine/doubao-seed-2-0-pro-260215", "volcengine", "doubao-seed-2-0-pro-260215"),
("volcengine-coding-plan/doubao-seed-2.0-code", "volcengine", "doubao-seed-2.0-code"),
("byteplus/seed-2-0-pro-260328", "byteplus", "seed-2-0-pro-260328"),
("byteplus-coding-plan/dola-seed-2.0-pro", "byteplus", "dola-seed-2.0-pro"),
])
def test_contract_provider_prefixes_strip_to_native_model(
self, model, target_provider, expected
):
assert normalize_model_for_provider(model, target_provider) == expected
# ── detect_vendor ──────────────────────────────────────────────────────
class TestDetectVendor:
@@ -32,8 +32,6 @@ def config_home(tmp_path, monkeypatch):
monkeypatch.delenv("OPENAI_BASE_URL", raising=False)
monkeypatch.delenv("OPENAI_API_KEY", raising=False)
monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
monkeypatch.delenv("STEPFUN_API_KEY", raising=False)
monkeypatch.delenv("STEPFUN_BASE_URL", raising=False)
return home
@@ -102,31 +100,6 @@ class TestProviderPersistsAfterModelSave:
)
assert model.get("default") == "kimi-k2.5"
def test_volcengine_contract_provider_persists_coding_plan_model(self, config_home, monkeypatch):
"""Volcengine should persist a prefixed coding-plan model and matching base URL."""
monkeypatch.setenv("VOLCENGINE_API_KEY", "volc-test-key")
from hermes_cli.main import _model_flow_contract_provider
from hermes_cli.config import load_config
with patch(
"hermes_cli.auth._prompt_model_selection",
return_value="volcengine-coding-plan/doubao-seed-2.0-code",
), patch(
"hermes_cli.auth.deactivate_provider",
):
_model_flow_contract_provider(load_config(), "volcengine", "old-model")
import yaml
config = yaml.safe_load((config_home / "config.yaml").read_text()) or {}
model = config.get("model")
assert isinstance(model, dict), f"model should be dict, got {type(model)}"
assert model.get("provider") == "volcengine"
assert model.get("default") == "volcengine-coding-plan/doubao-seed-2.0-code"
assert model.get("base_url") == "https://ark.cn-beijing.volces.com/api/coding/v3"
assert "api_mode" not in model
def test_copilot_provider_saved_when_selected(self, config_home):
"""_model_flow_copilot should persist provider/base_url/model together."""
from hermes_cli.main import _model_flow_copilot
@@ -357,33 +330,3 @@ class TestBaseUrlValidation:
saved = get_env_value("GLM_BASE_URL") or ""
assert saved == "", "Empty input should not save a base URL"
def test_stepfun_provider_saved_with_selected_region(self, config_home, monkeypatch):
from hermes_cli.main import _model_flow_stepfun
from hermes_cli.config import load_config, get_env_value
monkeypatch.setenv("STEPFUN_API_KEY", "stepfun-test-key")
with patch(
"hermes_cli.main._prompt_provider_choice",
return_value=1,
), patch(
"hermes_cli.models.fetch_api_models",
return_value=["step-3.5-flash", "step-3-agent-lite"],
), patch(
"hermes_cli.auth._prompt_model_selection",
return_value="step-3-agent-lite",
), patch(
"hermes_cli.auth.deactivate_provider",
):
_model_flow_stepfun(load_config(), "old-model")
import yaml
config = yaml.safe_load((config_home / "config.yaml").read_text()) or {}
model = config.get("model")
assert isinstance(model, dict)
assert model.get("provider") == "stepfun"
assert model.get("default") == "step-3-agent-lite"
assert model.get("base_url") == "https://api.stepfun.com/step_plan/v1"
assert get_env_value("STEPFUN_BASE_URL") == "https://api.stepfun.com/step_plan/v1"
-17
@@ -63,11 +63,6 @@ class TestParseModelInput:
assert provider == "zai"
assert model == "glm-5"
def test_stepfun_alias_resolved(self):
provider, model = parse_model_input("step:step-3.5-flash", "openrouter")
assert provider == "stepfun"
assert model == "step-3.5-flash"
def test_no_slash_no_colon_keeps_provider(self):
provider, model = parse_model_input("gpt-5.4", "openrouter")
assert provider == "openrouter"
@@ -159,7 +154,6 @@ class TestNormalizeProvider:
assert normalize_provider("glm") == "zai"
assert normalize_provider("kimi") == "kimi-coding"
assert normalize_provider("moonshot") == "kimi-coding"
assert normalize_provider("step") == "stepfun"
assert normalize_provider("github-copilot") == "copilot"
def test_case_insensitive(self):
@@ -170,7 +164,6 @@ class TestProviderLabel:
def test_known_labels_and_auto(self):
assert provider_label("anthropic") == "Anthropic"
assert provider_label("kimi") == "Kimi / Kimi Coding Plan"
assert provider_label("stepfun") == "StepFun Step Plan"
assert provider_label("copilot") == "GitHub Copilot"
assert provider_label("copilot-acp") == "GitHub Copilot ACP"
assert provider_label("auto") == "Auto"
@@ -200,16 +193,6 @@ class TestProviderModelIds:
def test_zai_returns_glm_models(self):
assert "glm-5" in provider_model_ids("zai")
def test_stepfun_prefers_live_catalog(self):
with patch(
"hermes_cli.auth.resolve_api_key_provider_credentials",
return_value={"api_key": "***", "base_url": "https://api.stepfun.com/step_plan/v1"},
), patch(
"hermes_cli.models.fetch_api_models",
return_value=["step-3.5-flash", "step-3-agent-lite"],
):
assert provider_model_ids("stepfun") == ["step-3.5-flash", "step-3-agent-lite"]
def test_copilot_prefers_live_catalog(self):
with patch("hermes_cli.auth.resolve_api_key_provider_credentials", return_value={"api_key": "gh-token"}), \
patch("hermes_cli.models._fetch_github_models", return_value=["gpt-5.4", "claude-sonnet-4.6"]):
-36
@@ -6,7 +6,6 @@ from hermes_cli.models import (
OPENROUTER_MODELS, fetch_openrouter_models, model_ids, detect_provider_for_model,
is_nous_free_tier, partition_nous_models_by_tier,
check_nous_free_tier, _FREE_TIER_CACHE_TTL,
list_available_providers, provider_for_base_url,
)
import hermes_cli.models as _models_mod
@@ -292,41 +291,6 @@ class TestDetectProviderForModel:
assert result is not None
assert result[0] not in ("nous",) # nous has claude models but shouldn't be suggested
def test_volcengine_coding_plan_model_detected(self):
result = detect_provider_for_model(
"volcengine-coding-plan/doubao-seed-2.0-code",
"openrouter",
)
assert result == ("volcengine", "volcengine-coding-plan/doubao-seed-2.0-code")
def test_byteplus_standard_model_detected(self):
result = detect_provider_for_model(
"byteplus/seed-2-0-pro-260328",
"openrouter",
)
assert result == ("byteplus", "byteplus/seed-2-0-pro-260328")
class TestConfiguredBaseUrlProviderDetection:
def test_provider_for_base_url_detects_volcengine(self):
assert provider_for_base_url("https://ark.cn-beijing.volces.com/api/v3") == "volcengine"
def test_provider_for_base_url_detects_byteplus_coding(self):
assert provider_for_base_url("https://ark.ap-southeast.bytepluses.com/api/coding/v3") == "byteplus"
def test_known_builtin_endpoint_is_not_listed_as_custom(self, monkeypatch):
monkeypatch.setattr("hermes_cli.models._get_custom_base_url", lambda: "https://ark.cn-beijing.volces.com/api/v3")
monkeypatch.setattr(
"hermes_cli.auth.get_auth_status",
lambda pid: {"configured": pid == "volcengine", "logged_in": pid == "volcengine"},
)
monkeypatch.setattr("hermes_cli.auth.has_usable_secret", lambda value: False)
providers = {p["id"]: p for p in list_available_providers()}
assert providers["volcengine"]["authenticated"] is True
assert providers["custom"]["authenticated"] is False
class TestIsNousFreeTier:
"""Tests for is_nous_free_tier — account tier detection."""
-67
@@ -250,73 +250,6 @@ class TestPluginLoading:
assert "hermes_plugins.ns_plugin" in sys.modules
def test_user_memory_plugin_auto_coerced_to_exclusive(self, tmp_path, monkeypatch):
"""User-installed memory plugins must NOT be loaded by the general
PluginManager; they belong to plugins/memory discovery.
Regression test for the mempalace crash:
'PluginContext' object has no attribute 'register_memory_provider'
A plugin that calls ``ctx.register_memory_provider`` in its
``__init__.py`` should be auto-detected and treated as
``kind: exclusive`` so the general loader records the manifest but
does not import/register() it. The real activation happens through
``plugins/memory/__init__.py`` via ``memory.provider`` config.
"""
plugins_dir = tmp_path / "hermes_test" / "plugins"
plugin_dir = plugins_dir / "mempalace"
plugin_dir.mkdir(parents=True)
# No explicit `kind:` — the heuristic should kick in.
(plugin_dir / "plugin.yaml").write_text(yaml.dump({"name": "mempalace"}))
(plugin_dir / "__init__.py").write_text(
"class MemPalaceProvider:\n"
" pass\n"
"def register(ctx):\n"
" ctx.register_memory_provider('mempalace', MemPalaceProvider)\n"
)
# Even if the user explicitly enables it in config, the loader
# should still treat it as exclusive and skip general loading.
hermes_home = tmp_path / "hermes_test"
(hermes_home / "config.yaml").write_text(
yaml.safe_dump({"plugins": {"enabled": ["mempalace"]}})
)
monkeypatch.setenv("HERMES_HOME", str(hermes_home))
mgr = PluginManager()
mgr.discover_and_load()
assert "mempalace" in mgr._plugins
entry = mgr._plugins["mempalace"]
assert entry.manifest.kind == "exclusive", (
f"Expected auto-coerced kind='exclusive', got {entry.manifest.kind}"
)
# Not loaded by general manager (no register() call, no AttributeError).
assert not entry.enabled
assert entry.module is None
assert "exclusive" in (entry.error or "").lower()
def test_explicit_standalone_kind_not_coerced(self, tmp_path, monkeypatch):
"""If a plugin explicitly declares ``kind: standalone`` in its
manifest, the memory-provider heuristic must NOT override it
even if the source happens to mention ``MemoryProvider``.
"""
plugins_dir = tmp_path / "hermes_test" / "plugins"
plugin_dir = plugins_dir / "not_memory"
plugin_dir.mkdir(parents=True)
(plugin_dir / "plugin.yaml").write_text(
yaml.dump({"name": "not_memory", "kind": "standalone"})
)
(plugin_dir / "__init__.py").write_text(
"# This plugin inspects MemoryProvider docs but isn't one.\n"
"def register(ctx):\n pass\n"
)
monkeypatch.setenv("HERMES_HOME", str(tmp_path / "hermes_test"))
mgr = PluginManager()
mgr.discover_and_load()
assert mgr._plugins["not_memory"].manifest.kind == "standalone"
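The two plugin tests above describe a kind-inference rule: an explicit `kind:` in the manifest always wins, and otherwise a plugin whose source looks like a memory provider is treated as `exclusive`. A minimal sketch of that rule follows. The function name `infer_plugin_kind_sketch`, the marker strings, and the `"standalone"` default are assumptions for illustration, not the actual `PluginManager` code.

```python
def infer_plugin_kind_sketch(manifest, init_source):
    """Hypothetical helper: decide a plugin's kind from its manifest and source.

    `manifest` is the parsed plugin.yaml dict; `init_source` is the text of
    the plugin's __init__.py. Neither name comes from the real loader.
    """
    # An explicit declaration in the manifest is never overridden.
    explicit = manifest.get("kind")
    if explicit:
        return explicit

    # Assumed heuristic: source that mentions the memory-provider API is
    # treated as an exclusive (memory) plugin.
    markers = ("register_memory_provider", "MemoryProvider")
    if any(marker in init_source for marker in markers):
        return "exclusive"

    # Assumed default for everything else.
    return "standalone"
```

Under these assumptions, the `mempalace` fixture would resolve to `exclusive` and the `not_memory` fixture would stay `standalone`, which matches what the two tests assert.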
# ── TestPluginHooks ────────────────────────────────────────────────────────
-1
@@ -706,7 +706,6 @@ class TestNewEndpoints:
assert "skills" in data
assert isinstance(data["daily"], list)
assert "total_sessions" in data["totals"]
assert "total_api_calls" in data["totals"]
assert data["skills"] == {
"summary": {
"total_skill_loads": 0,
+116 -132
@@ -6,7 +6,6 @@ turn counting, tags), and schema completeness.
"""
import json
import re
import threading
from types import SimpleNamespace
from unittest.mock import AsyncMock, MagicMock, patch
@@ -19,7 +18,6 @@ from plugins.memory.hindsight import (
REFLECT_SCHEMA,
RETAIN_SCHEMA,
_load_config,
_normalize_retain_tags,
)
@@ -34,30 +32,14 @@ def _clean_env(monkeypatch):
for key in (
"HINDSIGHT_API_KEY", "HINDSIGHT_API_URL", "HINDSIGHT_BANK_ID",
"HINDSIGHT_BUDGET", "HINDSIGHT_MODE", "HINDSIGHT_LLM_API_KEY",
"HINDSIGHT_RETAIN_TAGS", "HINDSIGHT_RETAIN_SOURCE",
"HINDSIGHT_RETAIN_USER_PREFIX", "HINDSIGHT_RETAIN_ASSISTANT_PREFIX",
):
monkeypatch.delenv(key, raising=False)
def _make_mock_client():
"""Create a mock Hindsight client with async methods."""
async def _aretain(
bank_id,
content,
timestamp=None,
context=None,
document_id=None,
metadata=None,
entities=None,
tags=None,
update_mode=None,
retain_async=None,
):
return SimpleNamespace(ok=True)
client = MagicMock()
client.aretain = AsyncMock(side_effect=_aretain)
client.aretain = AsyncMock()
client.arecall = AsyncMock(
return_value=SimpleNamespace(
results=[
@@ -74,14 +56,6 @@ def _make_mock_client():
return client
class _FakeSessionDB:
def __init__(self, messages=None):
self._messages = list(messages or [])
def get_messages_as_conversation(self, session_id):
return list(self._messages)
@pytest.fixture()
def provider(tmp_path, monkeypatch):
"""Create an initialized HindsightMemoryProvider with a mock client."""
@@ -135,18 +109,6 @@ def provider_with_config(tmp_path, monkeypatch):
return _make
def test_normalize_retain_tags_accepts_csv_and_dedupes():
assert _normalize_retain_tags("agent:fakeassistantname, source_system:hermes-agent, agent:fakeassistantname") == [
"agent:fakeassistantname",
"source_system:hermes-agent",
]
def test_normalize_retain_tags_accepts_json_array_string():
value = json.dumps(["agent:fakeassistantname", "source_system:hermes-agent"])
assert _normalize_retain_tags(value) == ["agent:fakeassistantname", "source_system:hermes-agent"]
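The two tests above pin down the input forms that tag normalization accepts: a comma-separated string and a JSON array string, both deduplicated in first-seen order. A minimal sketch of a function with that behaviour follows. `normalize_tags_sketch` is a hypothetical stand-in, not the plugin's `_normalize_retain_tags`, and its handling of `None` and of list input is an assumption.

```python
import json


def normalize_tags_sketch(value):
    """Hypothetical stand-in for tag normalization (not the plugin's code)."""
    if value is None:
        return []

    if isinstance(value, str):
        text = value.strip()
        if not text:
            return []
        items = None
        # Try JSON first when the string looks like an array.
        if text.startswith("["):
            try:
                parsed = json.loads(text)
            except json.JSONDecodeError:
                parsed = None
            if isinstance(parsed, list):
                items = parsed
        # Otherwise fall back to comma-separated values.
        if items is None:
            items = text.split(",")
    else:
        # Assumed: any other iterable (list, tuple) is taken as-is.
        items = list(value)

    # Strip whitespace, drop empties, dedupe while preserving order.
    seen = set()
    result = []
    for item in items:
        tag = str(item).strip()
        if tag and tag not in seen:
            seen.add(tag)
            result.append(tag)
    return result
```

The same order-preserving dedupe would also account for the merge behaviour tested further down, where config tags `["pref", "ui"]` plus per-call tags `["client:x", "ui"]` yield `["pref", "ui", "client:x"]`.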
# ---------------------------------------------------------------------------
# Schema tests
# ---------------------------------------------------------------------------
@@ -156,7 +118,6 @@ class TestSchemas:
def test_retain_schema_has_content(self):
assert RETAIN_SCHEMA["name"] == "hindsight_retain"
assert "content" in RETAIN_SCHEMA["parameters"]["properties"]
assert "tags" in RETAIN_SCHEMA["parameters"]["properties"]
assert "content" in RETAIN_SCHEMA["parameters"]["required"]
def test_recall_schema_has_query(self):
@@ -199,10 +160,7 @@ class TestConfig:
def test_custom_config_values(self, provider_with_config):
p = provider_with_config(
retain_tags=["tag1", "tag2"],
retain_source="hermes",
retain_user_prefix="User (fakeusername)",
retain_assistant_prefix="Assistant (fakeassistantname)",
tags=["tag1", "tag2"],
recall_tags=["recall-tag"],
recall_tags_match="all",
auto_retain=False,
@@ -217,10 +175,6 @@ class TestConfig:
bank_mission="Test agent mission",
)
assert p._tags == ["tag1", "tag2"]
assert p._retain_tags == ["tag1", "tag2"]
assert p._retain_source == "hermes"
assert p._retain_user_prefix == "User (fakeusername)"
assert p._retain_assistant_prefix == "Assistant (fakeassistantname)"
assert p._recall_tags == ["recall-tag"]
assert p._recall_tags_match == "all"
assert p._auto_retain is False
@@ -268,20 +222,11 @@ class TestToolHandlers:
assert call_kwargs["content"] == "user likes dark mode"
def test_retain_with_tags(self, provider_with_config):
p = provider_with_config(retain_tags=["pref", "ui"])
p = provider_with_config(tags=["pref", "ui"])
p.handle_tool_call("hindsight_retain", {"content": "likes dark mode"})
call_kwargs = p._client.aretain.call_args.kwargs
assert call_kwargs["tags"] == ["pref", "ui"]
def test_retain_merges_per_call_tags_with_config_tags(self, provider_with_config):
p = provider_with_config(retain_tags=["pref", "ui"])
p.handle_tool_call(
"hindsight_retain",
{"content": "likes dark mode", "tags": ["client:x", "ui"]},
)
call_kwargs = p._client.aretain.call_args.kwargs
assert call_kwargs["tags"] == ["pref", "ui", "client:x"]
def test_retain_without_tags(self, provider):
provider.handle_tool_call("hindsight_retain", {"content": "hello"})
call_kwargs = provider._client.aretain.call_args.kwargs
@@ -444,58 +389,38 @@ class TestPrefetch:
class TestSyncTurn:
def test_sync_turn_retains_metadata_rich_turn(self, provider_with_config):
p = provider_with_config(
retain_tags=["conv", "session1"],
retain_source="hermes",
retain_user_prefix="User (fakeusername)",
retain_assistant_prefix="Assistant (fakeassistantname)",
)
p.initialize(
session_id="session-1",
platform="discord",
user_id="fakeusername-123",
user_name="fakeusername",
chat_id="1485316232612941897",
chat_name="fakeassistantname-forums",
chat_type="thread",
thread_id="1491249007475949698",
agent_identity="fakeassistantname",
)
p._client = _make_mock_client()
def _get_retain_kwargs(self, provider):
"""Helper to get the kwargs from the aretain_batch call."""
return provider._client.aretain_batch.call_args.kwargs
p.sync_turn("hello", "hi there")
p._sync_thread.join(timeout=5.0)
def _get_retain_content(self, provider):
"""Helper to get the raw content string from the first item."""
kwargs = self._get_retain_kwargs(provider)
return kwargs["items"][0]["content"]
p._client.aretain_batch.assert_called_once()
call_kwargs = p._client.aretain_batch.call_args.kwargs
assert call_kwargs["bank_id"] == "test-bank"
assert call_kwargs["document_id"] == "session-1"
assert call_kwargs["retain_async"] is True
assert len(call_kwargs["items"]) == 1
item = call_kwargs["items"][0]
assert item["context"] == "conversation between Hermes Agent and the User"
assert item["tags"] == ["conv", "session1"]
content = json.loads(item["content"])
assert len(content) == 1
assert content[0][0]["role"] == "user"
assert content[0][0]["content"] == "User (fakeusername): hello"
assert content[0][1]["role"] == "assistant"
assert content[0][1]["content"] == "Assistant (fakeassistantname): hi there"
assert item["metadata"]["source"] == "hermes"
assert item["metadata"]["session_id"] == "session-1"
assert item["metadata"]["platform"] == "discord"
assert item["metadata"]["user_id"] == "fakeusername-123"
assert item["metadata"]["user_name"] == "fakeusername"
assert item["metadata"]["chat_id"] == "1485316232612941897"
assert item["metadata"]["chat_name"] == "fakeassistantname-forums"
assert item["metadata"]["chat_type"] == "thread"
assert item["metadata"]["thread_id"] == "1491249007475949698"
assert item["metadata"]["agent_identity"] == "fakeassistantname"
assert item["metadata"]["turn_index"] == "1"
assert item["metadata"]["message_count"] == "2"
assert re.fullmatch(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?\+00:00", content[0][0]["timestamp"])
assert re.fullmatch(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z", item["metadata"]["retained_at"])
def _get_retain_messages(self, provider):
"""Helper to parse the first turn's messages from retained content.
Content is a JSON array of turns: [[msgs...], [msgs...], ...]
For single-turn tests, returns the first turn's messages.
"""
content = self._get_retain_content(provider)
turns = json.loads(content)
return turns[0] if len(turns) == 1 else turns
def test_sync_turn_retains(self, provider):
provider.sync_turn("hello", "hi there")
if provider._sync_thread:
provider._sync_thread.join(timeout=5.0)
provider._client.aretain_batch.assert_called_once()
messages = self._get_retain_messages(provider)
assert len(messages) == 2
assert messages[0]["role"] == "user"
assert messages[0]["content"] == "hello"
assert "timestamp" in messages[0]
assert messages[1]["role"] == "assistant"
assert messages[1]["content"] == "hi there"
assert "timestamp" in messages[1]
def test_sync_turn_skipped_when_auto_retain_off(self, provider_with_config):
p = provider_with_config(auto_retain=False)
@@ -503,33 +428,93 @@ class TestSyncTurn:
assert p._sync_thread is None
p._client.aretain_batch.assert_not_called()
def test_sync_turn_every_n_turns(self, provider_with_config):
p = provider_with_config(retain_every_n_turns=3, retain_async=False)
p.sync_turn("turn1-user", "turn1-asst")
assert p._sync_thread is None
p.sync_turn("turn2-user", "turn2-asst")
assert p._sync_thread is None
p.sync_turn("turn3-user", "turn3-asst")
p._sync_thread.join(timeout=5.0)
p._client.aretain_batch.assert_called_once()
call_kwargs = p._client.aretain_batch.call_args.kwargs
def test_sync_turn_with_tags(self, provider_with_config):
p = provider_with_config(tags=["conv", "session1"])
p.sync_turn("hello", "hi")
if p._sync_thread:
p._sync_thread.join(timeout=5.0)
item = p._client.aretain_batch.call_args.kwargs["items"][0]
assert item["tags"] == ["conv", "session1"]
def test_sync_turn_uses_aretain_batch(self, provider):
"""sync_turn should use aretain_batch with retain_async."""
provider.sync_turn("hello", "hi")
if provider._sync_thread:
provider._sync_thread.join(timeout=5.0)
provider._client.aretain_batch.assert_called_once()
call_kwargs = provider._client.aretain_batch.call_args.kwargs
assert call_kwargs["document_id"] == "test-session"
assert call_kwargs["retain_async"] is True
assert len(call_kwargs["items"]) == 1
assert call_kwargs["items"][0]["context"] == "conversation between Hermes Agent and the User"
def test_sync_turn_custom_context(self, provider_with_config):
p = provider_with_config(retain_context="my-agent")
p.sync_turn("hello", "hi")
if p._sync_thread:
p._sync_thread.join(timeout=5.0)
item = p._client.aretain_batch.call_args.kwargs["items"][0]
assert item["context"] == "my-agent"
def test_sync_turn_every_n_turns(self, provider_with_config):
"""With retain_every_n_turns=3, only retains on every 3rd turn."""
p = provider_with_config(retain_every_n_turns=3)
p.sync_turn("turn1-user", "turn1-asst")
assert p._sync_thread is None # not retained yet
p.sync_turn("turn2-user", "turn2-asst")
assert p._sync_thread is None # not retained yet
p.sync_turn("turn3-user", "turn3-asst")
assert p._sync_thread is not None # retained!
p._sync_thread.join(timeout=5.0)
p._client.aretain_batch.assert_called_once()
content = p._client.aretain_batch.call_args.kwargs["items"][0]["content"]
# Should contain all 3 turns
assert "turn1-user" in content
assert "turn2-user" in content
assert "turn3-user" in content
def test_sync_turn_accumulates_full_session(self, provider_with_config):
"""Each retain sends the ENTIRE session, not just the latest batch."""
p = provider_with_config(retain_every_n_turns=2)
p.sync_turn("turn1-user", "turn1-asst")
p.sync_turn("turn2-user", "turn2-asst")
if p._sync_thread:
p._sync_thread.join(timeout=5.0)
p._client.aretain_batch.reset_mock()
p.sync_turn("turn3-user", "turn3-asst")
p.sync_turn("turn4-user", "turn4-asst")
if p._sync_thread:
p._sync_thread.join(timeout=5.0)
content = p._client.aretain_batch.call_args.kwargs["items"][0]["content"]
# Should contain ALL turns from the session
assert "turn1-user" in content
assert "turn2-user" in content
assert "turn3-user" in content
assert "turn4-user" in content
def test_sync_turn_passes_document_id(self, provider):
"""sync_turn should pass session_id as document_id for dedup."""
provider.sync_turn("hello", "hi")
if provider._sync_thread:
provider._sync_thread.join(timeout=5.0)
call_kwargs = provider._client.aretain_batch.call_args.kwargs
assert call_kwargs["document_id"] == "test-session"
assert call_kwargs["retain_async"] is False
item = call_kwargs["items"][0]
content = json.loads(item["content"])
assert len(content) == 3
assert content[-1][0]["role"] == "user"
assert content[-1][0]["content"] == "User: turn3-user"
assert content[-1][1]["role"] == "assistant"
assert content[-1][1]["content"] == "Assistant: turn3-asst"
assert item["metadata"]["turn_index"] == "3"
assert item["metadata"]["message_count"] == "6"
def test_sync_turn_error_does_not_raise(self, provider):
"""Errors in sync_turn should be swallowed (non-blocking)."""
provider._client.aretain_batch.side_effect = RuntimeError("network error")
provider.sync_turn("hello", "hi")
if provider._sync_thread:
provider._sync_thread.join(timeout=5.0)
# Should not raise
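The `TestSyncTurn` cases above rely on two behaviours: turns are retained only on every Nth call, and each retain carries the whole session as a JSON array of turns (`[[msgs...], [msgs...], ...]`). A minimal sketch of that accumulation logic follows. `TurnBufferSketch` and its method names are hypothetical; the real provider runs the retain on a background thread and sends it through `aretain_batch`, which this sketch leaves out.

```python
import json
from datetime import datetime, timezone


class TurnBufferSketch:
    """Hypothetical buffer: accumulates turns, emits a payload every N turns."""

    def __init__(self, retain_every_n_turns=1):
        self.retain_every_n_turns = max(1, retain_every_n_turns)
        self.turns = []  # full session history, never cleared

    def add_turn(self, user_text, assistant_text):
        """Record one turn; return the JSON payload when a retain is due, else None."""
        now = datetime.now(timezone.utc).isoformat()
        self.turns.append([
            {"role": "user", "content": user_text, "timestamp": now},
            {"role": "assistant", "content": assistant_text, "timestamp": now},
        ])
        if len(self.turns) % self.retain_every_n_turns != 0:
            return None
        # The payload is the ENTIRE session so far, not just the latest batch.
        return json.dumps(self.turns)
```

Sending the full session under a stable `document_id` lets the server replace the earlier document rather than store overlapping fragments, which is presumably why the tests check that `document_id` equals the session id.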
# ---------------------------------------------------------------------------
@@ -570,11 +555,10 @@ class TestConfigSchema:
"mode", "api_url", "api_key", "llm_provider", "llm_api_key",
"llm_model", "bank_id", "bank_mission", "bank_retain_mission",
"recall_budget", "memory_mode", "recall_prefetch_method",
"retain_tags", "retain_source",
"retain_user_prefix", "retain_assistant_prefix",
"recall_tags", "recall_tags_match",
"tags", "recall_tags", "recall_tags_match",
"auto_recall", "auto_retain",
"retain_every_n_turns", "retain_async", "retain_context",
"retain_every_n_turns", "retain_async",
"retain_context",
"recall_max_tokens", "recall_max_input_chars",
"recall_prompt_preamble",
}
-93
@@ -1216,15 +1216,6 @@ class TestBuildAssistantMessage:
result = agent._build_assistant_message(msg, "stop")
assert result["reasoning"] == "thinking"
def test_reasoning_content_preserved_separately(self, agent):
msg = _mock_assistant_msg(
content="answer",
reasoning="summary",
reasoning_content="provider scratchpad",
)
result = agent._build_assistant_message(msg, "stop")
assert result["reasoning_content"] == "provider scratchpad"
def test_with_tool_calls(self, agent):
tc = _mock_tool_call(name="web_search", arguments='{"q":"test"}', call_id="c1")
msg = _mock_assistant_msg(content="", tool_calls=[tc])
@@ -4197,90 +4188,6 @@ class TestPersistUserMessageOverride:
assert first_db_write["content"] == "Hello there"
class TestReasoningReplayForStrictProviders:
"""Assistant replay must preserve provider-native reasoning fields."""
def _setup_agent(self, agent):
agent._cached_system_prompt = "You are helpful."
agent._use_prompt_caching = False
agent.tool_delay = 0
agent.compression_enabled = False
agent.save_trajectories = False
def test_kimi_tool_replay_includes_empty_reasoning_content(self, agent):
self._setup_agent(agent)
agent.base_url = "https://api.kimi.com/coding/v1"
agent._base_url_lower = agent.base_url.lower()
agent.provider = "kimi-coding"
prior_assistant = {
"role": "assistant",
"content": "",
"tool_calls": [
{
"id": "c1",
"type": "function",
"function": {"name": "terminal", "arguments": "{\"command\":\"date\"}"},
}
],
}
tool_result = {"role": "tool", "tool_call_id": "c1", "content": "Tue Apr 21"}
final_resp = _mock_response(content="done", finish_reason="stop")
agent.client.chat.completions.create.return_value = final_resp
with (
patch.object(agent, "_persist_session"),
patch.object(agent, "_save_trajectory"),
patch.object(agent, "_cleanup_task_resources"),
):
result = agent.run_conversation(
"next step",
conversation_history=[prior_assistant, tool_result],
)
assert result["completed"] is True
sent_messages = agent.client.chat.completions.create.call_args.kwargs["messages"]
replayed_assistant = next(msg for msg in sent_messages if msg.get("role") == "assistant")
assert replayed_assistant["role"] == "assistant"
assert replayed_assistant["tool_calls"][0]["function"]["name"] == "terminal"
assert "reasoning_content" in replayed_assistant
assert replayed_assistant["reasoning_content"] == ""
def test_explicit_reasoning_content_beats_normalized_reasoning_on_replay(self, agent):
self._setup_agent(agent)
prior_assistant = {
"role": "assistant",
"content": "",
"tool_calls": [
{
"id": "c1",
"type": "function",
"function": {"name": "web_search", "arguments": "{\"q\":\"test\"}"},
}
],
"reasoning": "summary reasoning",
"reasoning_content": "provider-native scratchpad",
}
tool_result = {"role": "tool", "tool_call_id": "c1", "content": "ok"}
final_resp = _mock_response(content="done", finish_reason="stop")
agent.client.chat.completions.create.return_value = final_resp
with (
patch.object(agent, "_persist_session"),
patch.object(agent, "_save_trajectory"),
patch.object(agent, "_cleanup_task_resources"),
):
result = agent.run_conversation(
"next step",
conversation_history=[prior_assistant, tool_result],
)
assert result["completed"] is True
sent_messages = agent.client.chat.completions.create.call_args.kwargs["messages"]
replayed_assistant = next(msg for msg in sent_messages if msg.get("role") == "assistant")
assert replayed_assistant["reasoning_content"] == "provider-native scratchpad"
# ---------------------------------------------------------------------------
# Bugfix: _vprint force=True on error messages during TTS
# ---------------------------------------------------------------------------
@@ -13,7 +13,8 @@ They do NOT boot the full AIAgent — the prologue-fix guarantees are pure
function contracts at module scope.
"""
from run_agent import _chat_content_to_responses_parts, _summarize_user_message_for_log
from run_agent import _summarize_user_message_for_log
from agent.codex_responses_adapter import _chat_content_to_responses_parts
class TestSummarizeUserMessageForLog:
+3 -183
@@ -93,27 +93,6 @@ class TestSessionLifecycle:
assert session["input_tokens"] == 300
assert session["output_tokens"] == 150
def test_update_token_counts_tracks_api_call_count(self, db):
"""api_call_count increments with each update_token_counts call."""
db.create_session(session_id="s1", source="cli")
db.update_token_counts("s1", input_tokens=100, output_tokens=50, api_call_count=1)
db.update_token_counts("s1", input_tokens=100, output_tokens=50, api_call_count=1)
db.update_token_counts("s1", input_tokens=100, output_tokens=50, api_call_count=1)
session = db.get_session("s1")
assert session["api_call_count"] == 3
def test_update_token_counts_api_call_count_absolute(self, db):
"""absolute mode sets api_call_count directly."""
db.create_session(session_id="s1", source="cli")
db.update_token_counts("s1", input_tokens=100, output_tokens=50, api_call_count=1)
db.update_token_counts("s1", input_tokens=300, output_tokens=150,
api_call_count=5, absolute=True)
session = db.get_session("s1")
assert session["api_call_count"] == 5
assert session["input_tokens"] == 300
def test_update_token_counts_backfills_model_when_null(self, db):
db.create_session(session_id="s1", source="telegram")
db.update_token_counts("s1", input_tokens=10, output_tokens=5, model="openai/gpt-5.4")
@@ -276,38 +255,6 @@ class TestMessageStorage:
assert msg["reasoning"] == "Thinking about what to say"
assert msg["reasoning_details"] == details
def test_reasoning_content_persisted_and_restored(self, db):
"""reasoning_content must survive session replay as its own field."""
db.create_session(session_id="s1", source="cli")
db.append_message(
"s1",
role="assistant",
content="Hello",
reasoning="Short summary",
reasoning_content="Longer provider-native scratchpad",
)
conv = db.get_messages_as_conversation("s1")
assert len(conv) == 1
assert conv[0]["reasoning"] == "Short summary"
assert conv[0]["reasoning_content"] == "Longer provider-native scratchpad"
def test_reasoning_content_empty_string_restored_for_assistant(self, db):
"""Empty reasoning_content still needs to round-trip for strict replays."""
db.create_session(session_id="s1", source="cli")
db.append_message(
"s1",
role="assistant",
content="",
tool_calls=[{"id": "c1", "type": "function", "function": {"name": "date", "arguments": "{}"}}],
reasoning_content="",
)
conv = db.get_messages_as_conversation("s1")
assert len(conv) == 1
assert "reasoning_content" in conv[0]
assert conv[0]["reasoning_content"] == ""
def test_reasoning_not_set_for_non_assistant(self, db):
"""reasoning is never leaked onto user or tool messages."""
db.create_session(session_id="s1", source="telegram")
@@ -1173,7 +1120,7 @@ class TestSchemaInit:
def test_schema_version(self, db):
cursor = db._conn.execute("SELECT version FROM schema_version")
version = cursor.fetchone()[0]
assert version == 8
assert version == 6
def test_title_column_exists(self, db):
"""Verify the title column was created in the sessions table."""
@@ -1229,24 +1176,18 @@ class TestSchemaInit:
conn.commit()
conn.close()
# Open with SessionDB — should migrate to v8
# Open with SessionDB — should migrate to v6
migrated_db = SessionDB(db_path=db_path)
# Verify migration
cursor = migrated_db._conn.execute("SELECT version FROM schema_version")
assert cursor.fetchone()[0] == 8
assert cursor.fetchone()[0] == 6
# Verify title column exists and is NULL for existing sessions
session = migrated_db.get_session("existing")
assert session is not None
assert session["title"] is None
# Verify api_call_count column was added with default 0
cursor = migrated_db._conn.execute(
"SELECT api_call_count FROM sessions WHERE id = 'existing'"
)
assert cursor.fetchone()[0] == 0
# Verify we can set title on migrated session
assert migrated_db.set_session_title("existing", "Migrated Title") is True
session = migrated_db.get_session("existing")
@@ -1791,124 +1732,3 @@ class TestConcurrentWriteSafety:
assert "30" in src, (
"SQLite timeout should be at least 30s to handle CLI/gateway lock contention"
)
# =========================================================================
# Auto-maintenance: state_meta + vacuum + maybe_auto_prune_and_vacuum
# =========================================================================
class TestStateMeta:
def test_get_meta_missing_returns_none(self, db):
assert db.get_meta("nonexistent") is None
def test_set_then_get_meta(self, db):
db.set_meta("foo", "bar")
assert db.get_meta("foo") == "bar"
def test_set_meta_upsert(self, db):
"""set_meta overwrites existing value (ON CONFLICT DO UPDATE)."""
db.set_meta("key", "v1")
db.set_meta("key", "v2")
assert db.get_meta("key") == "v2"
class TestVacuum:
def test_vacuum_runs_without_error(self, db):
"""VACUUM must succeed on a fresh DB (no rows to reclaim)."""
db.create_session(session_id="s1", source="cli")
db.append_message(session_id="s1", role="user", content="hi")
# Should not raise, even though there's nothing significant to reclaim.
db.vacuum()
class TestAutoMaintenance:
def _make_old_ended(self, db, sid: str, days_old: int = 100):
"""Create a session that is ended and was started `days_old` days ago."""
db.create_session(session_id=sid, source="cli")
db.end_session(sid, end_reason="done")
db._conn.execute(
"UPDATE sessions SET started_at = ? WHERE id = ?",
(time.time() - days_old * 86400, sid),
)
db._conn.commit()
def test_first_run_prunes_and_vacuums(self, db):
self._make_old_ended(db, "old1", days_old=100)
self._make_old_ended(db, "old2", days_old=100)
db.create_session(session_id="new", source="cli") # active, must survive
result = db.maybe_auto_prune_and_vacuum(retention_days=90)
assert result["skipped"] is False
assert result["pruned"] == 2
assert result["vacuumed"] is True
assert result.get("error") is None
assert db.get_session("old1") is None
assert db.get_session("old2") is None
assert db.get_session("new") is not None
def test_second_call_within_interval_skips(self, db):
self._make_old_ended(db, "old", days_old=100)
first = db.maybe_auto_prune_and_vacuum(
retention_days=90, min_interval_hours=24
)
assert first["skipped"] is False
assert first["pruned"] == 1
# Create another prunable session; a second call within
# min_interval_hours should still skip without touching it.
self._make_old_ended(db, "old2", days_old=100)
second = db.maybe_auto_prune_and_vacuum(
retention_days=90, min_interval_hours=24
)
assert second["skipped"] is True
assert second["pruned"] == 0
assert db.get_session("old2") is not None # untouched
def test_second_call_after_interval_runs_again(self, db):
self._make_old_ended(db, "old", days_old=100)
db.maybe_auto_prune_and_vacuum(retention_days=90, min_interval_hours=24)
# Backdate the last-run marker to force another run.
db.set_meta("last_auto_prune", str(time.time() - 48 * 3600))
self._make_old_ended(db, "old2", days_old=100)
result = db.maybe_auto_prune_and_vacuum(
retention_days=90, min_interval_hours=24
)
assert result["skipped"] is False
assert result["pruned"] == 1
assert db.get_session("old2") is None
def test_no_prunable_sessions_no_vacuum(self, db):
"""When prune deletes 0 rows, VACUUM is skipped (wasted I/O)."""
db.create_session(session_id="fresh", source="cli") # too recent
result = db.maybe_auto_prune_and_vacuum(retention_days=90)
assert result["skipped"] is False
assert result["pruned"] == 0
assert result["vacuumed"] is False
# But last-run is still recorded so we don't retry immediately.
assert db.get_meta("last_auto_prune") is not None
def test_vacuum_disabled_via_flag(self, db):
self._make_old_ended(db, "old", days_old=100)
result = db.maybe_auto_prune_and_vacuum(retention_days=90, vacuum=False)
assert result["pruned"] == 1
assert result["vacuumed"] is False
def test_corrupt_last_run_marker_treated_as_no_prior_run(self, db):
"""A non-numeric marker must not break maintenance."""
db.set_meta("last_auto_prune", "not-a-timestamp")
self._make_old_ended(db, "old", days_old=100)
result = db.maybe_auto_prune_and_vacuum(retention_days=90)
assert result["skipped"] is False
assert result["pruned"] == 1
def test_state_meta_survives_vacuum(self, db):
"""Marker written just before VACUUM must still be readable after."""
self._make_old_ended(db, "old", days_old=100)
db.maybe_auto_prune_and_vacuum(retention_days=90)
marker = db.get_meta("last_auto_prune")
assert marker is not None
# Should parse as a float timestamp close to now.
assert abs(float(marker) - time.time()) < 60
-13
@@ -19,8 +19,6 @@ from tools.file_operations import (
BINARY_EXTENSIONS,
IMAGE_EXTENSIONS,
MAX_LINE_LENGTH,
normalize_read_pagination,
normalize_search_pagination,
)
@@ -194,17 +192,6 @@ def file_ops(mock_env):
class TestShellFileOpsHelpers:
def test_normalize_read_pagination_clamps_invalid_values(self):
assert normalize_read_pagination(offset=0, limit=0) == (1, 1)
assert normalize_read_pagination(offset=-10, limit=-5) == (1, 1)
assert normalize_read_pagination(offset="bad", limit="bad") == (1, 500)
assert normalize_read_pagination(offset=2, limit=999999) == (2, 2000)
def test_normalize_search_pagination_clamps_invalid_values(self):
assert normalize_search_pagination(offset=-10, limit=-5) == (0, 1)
assert normalize_search_pagination(offset="bad", limit="bad") == (0, 50)
assert normalize_search_pagination(offset=3, limit=0) == (3, 1)
def test_escape_shell_arg_simple(self, file_ops):
assert file_ops._escape_shell_arg("hello") == "'hello'"
@@ -146,61 +146,3 @@ class TestCheckLintBracePaths:
assert result.success is False
assert "SyntaxError" in result.output
# =========================================================================
# Pagination bounds
# =========================================================================
class TestPaginationBounds:
"""Invalid pagination inputs should not leak into shell commands."""
def test_read_file_clamps_offset_and_limit_before_building_sed_range(self):
env = MagicMock()
env.cwd = "/tmp"
ops = ShellFileOperations(env)
commands = []
def fake_exec(command, *args, **kwargs):
commands.append(command)
if command.startswith("wc -c"):
return MagicMock(exit_code=0, stdout="12")
if command.startswith("head -c"):
return MagicMock(exit_code=0, stdout="line1\nline2\n")
if command.startswith("sed -n"):
return MagicMock(exit_code=0, stdout="line1\n")
if command.startswith("wc -l"):
return MagicMock(exit_code=0, stdout="2")
return MagicMock(exit_code=0, stdout="")
with patch.object(ops, "_exec", side_effect=fake_exec):
result = ops.read_file("notes.txt", offset=0, limit=0)
assert result.error is None
assert " 1|line1" in result.content
sed_commands = [cmd for cmd in commands if cmd.startswith("sed -n")]
assert sed_commands == ["sed -n '1,1p' 'notes.txt'"]
def test_search_clamps_offset_and_limit_before_building_head_pipeline(self):
env = MagicMock()
env.cwd = "/tmp"
ops = ShellFileOperations(env)
commands = []
def fake_exec(command, *args, **kwargs):
commands.append(command)
if command.startswith("test -e"):
return MagicMock(exit_code=0, stdout="exists")
if command.startswith("rg --files"):
return MagicMock(exit_code=0, stdout="a.py\n")
return MagicMock(exit_code=0, stdout="")
with patch.object(ops, "_has_command", side_effect=lambda cmd: cmd == "rg"), \
patch.object(ops, "_exec", side_effect=fake_exec):
result = ops.search("*.py", target="files", path=".", offset=-4, limit=-2)
assert result.files == ["a.py"]
rg_commands = [cmd for cmd in commands if cmd.startswith("rg --files")]
assert rg_commands
assert "| head -n 1" in rg_commands[0]
-28
@@ -45,19 +45,6 @@ class TestReadFileHandler:
read_file_tool("/tmp/big.txt", offset=10, limit=20)
mock_ops.read_file.assert_called_once_with("/tmp/big.txt", 10, 20)
@patch("tools.file_tools._get_file_ops")
def test_invalid_offset_and_limit_are_normalized_before_dispatch(self, mock_get):
mock_ops = MagicMock()
result_obj = MagicMock()
result_obj.content = "line1"
result_obj.to_dict.return_value = {"content": "line1", "total_lines": 1}
mock_ops.read_file.return_value = result_obj
mock_get.return_value = mock_ops
from tools.file_tools import read_file_tool
read_file_tool("/tmp/big.txt", offset=0, limit=0)
mock_ops.read_file.assert_called_once_with("/tmp/big.txt", 1, 1)
@patch("tools.file_tools._get_file_ops")
def test_exception_returns_error_json(self, mock_get):
mock_get.side_effect = RuntimeError("terminal not available")
@@ -204,21 +191,6 @@ class TestSearchHandler:
limit=10, offset=5, output_mode="count", context=2,
)
@patch("tools.file_tools._get_file_ops")
def test_search_normalizes_invalid_pagination_before_dispatch(self, mock_get):
mock_ops = MagicMock()
result_obj = MagicMock()
result_obj.to_dict.return_value = {"files": []}
mock_ops.search.return_value = result_obj
mock_get.return_value = mock_ops
from tools.file_tools import search_tool
search_tool(pattern="class", target="files", path="/src", limit=-5, offset=-2)
mock_ops.search.assert_called_once_with(
pattern="class", path="/src", target="files", file_glob=None,
limit=1, offset=0, output_mode="content", context=0,
)
@patch("tools.file_tools._get_file_ops")
def test_search_exception_returns_error(self, mock_get):
mock_get.side_effect = RuntimeError("no terminal")
+2 -37
@@ -271,40 +271,6 @@ LINTERS = {
MAX_LINES = 2000
MAX_LINE_LENGTH = 2000
MAX_FILE_SIZE = 50 * 1024 # 50KB
DEFAULT_READ_OFFSET = 1
DEFAULT_READ_LIMIT = 500
DEFAULT_SEARCH_OFFSET = 0
DEFAULT_SEARCH_LIMIT = 50
def _coerce_int(value: Any, default: int) -> int:
"""Best-effort integer coercion for tool pagination inputs."""
try:
return int(value)
except (TypeError, ValueError):
return default
def normalize_read_pagination(offset: Any = DEFAULT_READ_OFFSET,
limit: Any = DEFAULT_READ_LIMIT) -> tuple[int, int]:
"""Return safe read_file pagination bounds.
Tool schemas declare minimum/maximum values, but not every caller or
provider enforces schemas before dispatch. Clamp here so invalid values
cannot leak into sed ranges like ``0,-1p``.
"""
normalized_offset = max(1, _coerce_int(offset, DEFAULT_READ_OFFSET))
normalized_limit = _coerce_int(limit, DEFAULT_READ_LIMIT)
normalized_limit = max(1, min(normalized_limit, MAX_LINES))
return normalized_offset, normalized_limit
def normalize_search_pagination(offset: Any = DEFAULT_SEARCH_OFFSET,
limit: Any = DEFAULT_SEARCH_LIMIT) -> tuple[int, int]:
"""Return safe search pagination bounds for shell head/tail pipelines."""
normalized_offset = max(0, _coerce_int(offset, DEFAULT_SEARCH_OFFSET))
normalized_limit = max(1, _coerce_int(limit, DEFAULT_SEARCH_LIMIT))
return normalized_offset, normalized_limit
class ShellFileOperations(FileOperations):
@@ -495,7 +461,8 @@ class ShellFileOperations(FileOperations):
# Expand ~ and other shell paths
path = self._expand_path(path)
offset, limit = normalize_read_pagination(offset, limit)
# Clamp limit
limit = min(limit, MAX_LINES)
# Check if file exists and get size (wc -c is POSIX, works on Linux + macOS)
stat_cmd = f"wc -c < {self._escape_shell_arg(path)} 2>/dev/null"
@@ -899,8 +866,6 @@ class ShellFileOperations(FileOperations):
Returns:
SearchResult with matches or file list
"""
offset, limit = normalize_search_pagination(offset, limit)
# Expand ~ and other shell paths
path = self._expand_path(path)
+1 -9
@@ -11,11 +11,7 @@ from typing import Optional
from agent.file_safety import get_read_block_error
from tools.binary_extensions import has_binary_extension
from tools.file_operations import (
ShellFileOperations,
normalize_read_pagination,
normalize_search_pagination,
)
from tools.file_operations import ShellFileOperations
from tools import file_state
from agent.redact import redact_sensitive_text
@@ -355,8 +351,6 @@ def clear_file_ops_cache(task_id: str = None):
def read_file_tool(path: str, offset: int = 1, limit: int = 500, task_id: str = "default") -> str:
"""Read a file with pagination and line numbers."""
try:
offset, limit = normalize_read_pagination(offset, limit)
# ── Device path guard ─────────────────────────────────────────
# Block paths that would hang the process (infinite output,
# blocking on input). Pure path check — no I/O.
@@ -768,8 +762,6 @@ def search_tool(pattern: str, target: str = "content", path: str = ".",
task_id: str = "default") -> str:
"""Search for content or files."""
try:
offset, limit = normalize_search_pagination(offset, limit)
# Track searches to detect *consecutive* repeated search loops.
# Include pagination args so users can page through truncated
# results without tripping the repeated-search guard.
-3
@@ -314,7 +314,6 @@ export interface AnalyticsDailyEntry {
estimated_cost: number;
actual_cost: number;
sessions: number;
api_calls: number;
}
export interface AnalyticsModelEntry {
@@ -323,7 +322,6 @@ export interface AnalyticsModelEntry {
output_tokens: number;
estimated_cost: number;
sessions: number;
api_calls: number;
}
export interface AnalyticsSkillEntry {
@@ -353,7 +351,6 @@ export interface AnalyticsResponse {
total_estimated_cost: number;
total_actual_cost: number;
total_sessions: number;
total_api_calls: number;
};
skills: {
summary: AnalyticsSkillsSummary;
+1 -1
@@ -347,7 +347,7 @@ export default function AnalyticsPage() {
<SummaryCard
icon={TrendingUp}
label={t.analytics.apiCalls}
value={String(data.totals.total_api_calls ?? data.daily.reduce((sum, d) => sum + d.sessions, 0))}
value={String(data.daily.reduce((sum, d) => sum + d.sessions, 0))}
sub={t.analytics.acrossModels.replace("{count}", String(data.by_model.length))}
/>
</div>
+1 -45
@@ -30,8 +30,6 @@ You need at least one way to connect to an LLM. Use `hermes model` to switch pro
| **Alibaba Cloud** | `DASHSCOPE_API_KEY` in `~/.hermes/.env` (provider: `alibaba`, aliases: `dashscope`, `qwen`) |
| **Kilo Code** | `KILOCODE_API_KEY` in `~/.hermes/.env` (provider: `kilocode`) |
| **Xiaomi MiMo** | `XIAOMI_API_KEY` in `~/.hermes/.env` (provider: `xiaomi`, aliases: `mimo`, `xiaomi-mimo`) |
| **Volcengine** | `hermes model` or `VOLCENGINE_API_KEY` in `~/.hermes/.env` (provider: `volcengine`) |
| **BytePlus** | `hermes model` or `BYTEPLUS_API_KEY` in `~/.hermes/.env` (provider: `byteplus`) |
| **OpenCode Zen** | `OPENCODE_ZEN_API_KEY` in `~/.hermes/.env` (provider: `opencode-zen`) |
| **OpenCode Go** | `OPENCODE_GO_API_KEY` in `~/.hermes/.env` (provider: `opencode-go`) |
| **DeepSeek** | `DEEPSEEK_API_KEY` in `~/.hermes/.env` (provider: `deepseek`) |
@@ -276,59 +274,17 @@ hermes chat --provider xiaomi --model mimo-v2-pro
# Arcee AI (Trinity models)
hermes chat --provider arcee --model trinity-large-thinking
# Requires: ARCEEAI_API_KEY in ~/.hermes/.env
# Volcengine
hermes chat --provider volcengine --model volcengine/doubao-seed-2-0-pro-260215
# Requires: VOLCENGINE_API_KEY in ~/.hermes/.env
# Volcengine Coding Plan catalog (same provider, same API key)
hermes chat --provider volcengine --model volcengine-coding-plan/doubao-seed-2.0-code
# BytePlus
hermes chat --provider byteplus --model byteplus/seed-2-0-pro-260328
# Requires: BYTEPLUS_API_KEY in ~/.hermes/.env
# BytePlus Coding Plan catalog (same provider, same API key)
hermes chat --provider byteplus --model byteplus-coding-plan/dola-seed-2.0-pro
```
Or set the provider permanently in `config.yaml`:
```yaml
model:
provider: "zai" # or: kimi-coding, kimi-coding-cn, minimax, minimax-cn, alibaba, xiaomi, arcee, volcengine, byteplus
provider: "zai" # or: kimi-coding, kimi-coding-cn, minimax, minimax-cn, alibaba, xiaomi, arcee
default: "glm-5"
```
Base URLs can be overridden with `GLM_BASE_URL`, `KIMI_BASE_URL`, `MINIMAX_BASE_URL`, `MINIMAX_CN_BASE_URL`, `DASHSCOPE_BASE_URL`, or `XIAOMI_BASE_URL` environment variables.
### Volcengine and BytePlus Contract Catalogs
Hermes exposes **two** built-in providers for these integrations:
- `volcengine`
- `byteplus`
Each provider includes both its standard catalog and its Coding Plan catalog. The selected model ID determines the runtime base URL automatically:
- `volcengine/...` -> `https://ark.cn-beijing.volces.com/api/v3`
- `volcengine-coding-plan/...` -> `https://ark.cn-beijing.volces.com/api/coding/v3`
- `byteplus/...` -> `https://ark.ap-southeast.bytepluses.com/api/v3`
- `byteplus-coding-plan/...` -> `https://ark.ap-southeast.bytepluses.com/api/coding/v3`
In `hermes model`, the setup flow is:
1. Enter API key
2. Select a model
If you pick a `volcengine-coding-plan/...` or `byteplus-coding-plan/...` model, Hermes automatically uses the corresponding coding-plan base URL.
The API key is shared per provider:
- `VOLCENGINE_API_KEY` works for both `volcengine/...` and `volcengine-coding-plan/...`
- `BYTEPLUS_API_KEY` works for both `byteplus/...` and `byteplus-coding-plan/...`
Use `hermes model` to pick from the built-in curated catalogs. Hermes saves the canonical prefixed model ID in `config.yaml`, so standard and Coding Plan variants remain unambiguous.
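The prefix-based routing described above can be sketched in a few lines. This is only an illustration under stated assumptions: the names `CATALOG_BASE_URLS`, `base_url_for_model`, and `api_key_env_for_model` are hypothetical and are not taken from the Hermes source; only the prefixes, URLs, and environment variable names come from the text.

```python
# Hypothetical sketch: these names are illustrative, not Hermes internals.
# The prefixes and URLs are the ones listed in the section above.
CATALOG_BASE_URLS = {
    "volcengine": "https://ark.cn-beijing.volces.com/api/v3",
    "volcengine-coding-plan": "https://ark.cn-beijing.volces.com/api/coding/v3",
    "byteplus": "https://ark.ap-southeast.bytepluses.com/api/v3",
    "byteplus-coding-plan": "https://ark.ap-southeast.bytepluses.com/api/coding/v3",
}

CODING_PLAN_SUFFIX = "-coding-plan"


def _catalog_prefix(model_id: str) -> str:
    """Return the catalog prefix of a canonical 'prefix/model-name' ID."""
    prefix, sep, name = model_id.partition("/")
    if not sep or not name or prefix not in CATALOG_BASE_URLS:
        raise ValueError(f"not a known prefixed model ID: {model_id!r}")
    return prefix


def base_url_for_model(model_id: str) -> str:
    """Pick the runtime base URL from the model ID's catalog prefix."""
    return CATALOG_BASE_URLS[_catalog_prefix(model_id)]


def api_key_env_for_model(model_id: str) -> str:
    """Standard and Coding Plan catalogs share one API key per provider."""
    provider = _catalog_prefix(model_id)
    if provider.endswith(CODING_PLAN_SUFFIX):
        provider = provider[: -len(CODING_PLAN_SUFFIX)]
    return f"{provider.upper()}_API_KEY"
```

Keeping the canonical prefixed ID in `config.yaml` is what makes a lookup like this unambiguous: the same provider and key can serve either catalog, and the prefix alone selects the endpoint.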
:::note Z.AI Endpoint Auto-Detection
When using the Z.AI / GLM provider, Hermes automatically probes multiple endpoints (global, China, coding variants) to find one that accepts your API key. You don't need to set `GLM_BASE_URL` manually — the working endpoint is detected and cached automatically.
:::
@@ -44,8 +44,6 @@ All variables go in `~/.hermes/.env`. You can also set them with `hermes config
| `KILOCODE_BASE_URL` | Override Kilo Code base URL (default: `https://api.kilo.ai/api/gateway`) |
| `XIAOMI_API_KEY` | Xiaomi MiMo API key ([platform.xiaomimimo.com](https://platform.xiaomimimo.com)) |
| `XIAOMI_BASE_URL` | Override Xiaomi MiMo base URL (default: `https://api.xiaomimimo.com/v1`) |
| `VOLCENGINE_API_KEY` | Volcengine API key for Doubao / Seed models ([volcengine.com/product/ark](https://www.volcengine.com/product/ark)) |
| `BYTEPLUS_API_KEY` | BytePlus API key for Seed / Dola models ([byteplus.com/en/product/modelark](https://www.byteplus.com/en/product/modelark)) |
| `HF_TOKEN` | Hugging Face token for Inference Providers ([huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)) |
| `HF_BASE_URL` | Override Hugging Face base URL (default: `https://router.huggingface.co/v1`) |
| `GOOGLE_API_KEY` | Google AI Studio API key ([aistudio.google.com/app/apikey](https://aistudio.google.com/app/apikey)) |
+1 -1
@@ -628,7 +628,7 @@ Every model slot in Hermes — auxiliary tasks, compression, fallback — uses t
When `base_url` is set, Hermes ignores the provider and calls that endpoint directly (using `api_key` or `OPENAI_API_KEY` for auth). When only `provider` is set, Hermes uses that provider's built-in auth and base URL.
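The precedence rule in the paragraph above can be read as a small decision function. The sketch below is an assumption-laden illustration, not the Hermes implementation: `resolve_slot` and the returned dictionary shape are hypothetical, and the case where neither key is set is not covered by the text, so the sketch simply rejects it.

```python
import os
from typing import Mapping, Optional


def resolve_slot(slot: Mapping[str, str],
                 env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: decide how a model slot reaches its endpoint."""
    env = os.environ if env is None else env

    base_url = slot.get("base_url")
    if base_url:
        # base_url wins: the provider field is ignored and the endpoint is
        # called directly, authenticating with api_key or OPENAI_API_KEY.
        return {
            "mode": "direct",
            "base_url": base_url,
            "api_key": slot.get("api_key") or env.get("OPENAI_API_KEY"),
        }

    provider = slot.get("provider")
    if provider:
        # Only provider is set: rely on that provider's built-in auth and URL.
        return {"mode": "provider", "provider": provider}

    # Assumption: the text does not describe this case, so the sketch fails loudly.
    raise ValueError("slot needs either base_url or provider")
```

For example, `resolve_slot({"provider": "zai", "base_url": "http://localhost:8000/v1"})` takes the direct branch, which mirrors the statement that a set `base_url` overrides the provider.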
Available providers for auxiliary tasks: `auto`, `main`, plus any provider in the [provider registry](/docs/reference/environment-variables) — `openrouter`, `nous`, `openai-codex`, `copilot`, `copilot-acp`, `anthropic`, `gemini`, `google-gemini-cli`, `qwen-oauth`, `zai`, `kimi-coding`, `kimi-coding-cn`, `minimax`, `minimax-cn`, `deepseek`, `nvidia`, `xai`, `ollama-cloud`, `alibaba`, `bedrock`, `huggingface`, `arcee`, `xiaomi`, `volcengine`, `byteplus`, `kilocode`, `opencode-zen`, `opencode-go`, `ai-gateway` — or any named custom provider from your `custom_providers` list (e.g. `provider: "beans"`).
Available providers for auxiliary tasks: `auto`, `main`, plus any provider in the [provider registry](/docs/reference/environment-variables) — `openrouter`, `nous`, `openai-codex`, `copilot`, `copilot-acp`, `anthropic`, `gemini`, `google-gemini-cli`, `qwen-oauth`, `zai`, `kimi-coding`, `kimi-coding-cn`, `minimax`, `minimax-cn`, `deepseek`, `nvidia`, `xai`, `ollama-cloud`, `alibaba`, `bedrock`, `huggingface`, `arcee`, `xiaomi`, `kilocode`, `opencode-zen`, `opencode-go`, `ai-gateway` — or any named custom provider from your `custom_providers` list (e.g. `provider: "beans"`).
:::warning `"main"` is for auxiliary tasks only
The `"main"` provider option means "use whatever provider my main agent uses" — it's only valid inside `auxiliary:`, `compression:`, and `fallback_model:` configs. It is **not** a valid value for your top-level `model.provider` setting. If you use a custom OpenAI-compatible endpoint, set `provider: custom` in your `model:` section. See [AI Providers](/docs/integrations/providers) for all main model provider options.
@@ -58,8 +58,6 @@ Both `provider` and `model` are **required**. If either is missing, the fallback
| OpenCode Go | `opencode-go` | `OPENCODE_GO_API_KEY` |
| Kilo Code | `kilocode` | `KILOCODE_API_KEY` |
| Xiaomi MiMo | `xiaomi` | `XIAOMI_API_KEY` |
| Volcengine | `volcengine` | `VOLCENGINE_API_KEY` |
| BytePlus | `byteplus` | `BYTEPLUS_API_KEY` |
| Arcee AI | `arcee` | `ARCEEAI_API_KEY` |
| Alibaba / DashScope | `alibaba` | `DASHSCOPE_API_KEY` |
| Hugging Face | `huggingface` | `HF_TOKEN` |
@@ -359,11 +359,7 @@ The setup wizard installs dependencies automatically and only installs what's ne
| `auto_retain` | `true` | Automatically retain conversation turns |
| `auto_recall` | `true` | Automatically recall memories before each turn |
| `retain_async` | `true` | Process retain asynchronously on the server |
| `retain_context` | `conversation between Hermes Agent and the User` | Context label for retained memories |
| `retain_tags` | — | Default tags applied to retained memories; merged with per-call tool tags |
| `retain_source` | — | Optional `metadata.source` attached to retained memories |
| `retain_user_prefix` | `User` | Label used before user turns in auto-retained transcripts |
| `retain_assistant_prefix` | `Assistant` | Label used before assistant turns in auto-retained transcripts |
| `tags` | — | Tags applied when storing memories |
| `recall_tags` | — | Tags to filter on recall |
See [plugin README](https://github.com/NousResearch/hermes-agent/blob/main/plugins/memory/hindsight/README.md) for the full configuration reference.
+6 -34
@@ -17,52 +17,24 @@ Connect Hermes to [WeCom](https://work.weixin.qq.com/) (企业微信), Tencent's
## Setup
### Step 1: Create an AI Bot
#### Recommended: Scan-to-Create (one command)
```bash
hermes gateway setup
```
Select **WeCom** and scan the QR code with your WeCom mobile app. Hermes will automatically create a bot application with the correct permissions and save the credentials.
The setup wizard will:
1. Display a QR code in your terminal
2. Wait for you to scan it with the WeCom mobile app
3. Automatically retrieve the Bot ID and Secret
4. Guide you through access control configuration
#### Alternative: Manual Setup
If scan-to-create is not available, the wizard falls back to manual input:
### 1. Create an AI Bot
1. Log in to the [WeCom Admin Console](https://work.weixin.qq.com/wework_admin/frame)
2. Navigate to **Applications** → **Create Application** → **AI Bot**
3. Configure the bot name and description
4. Copy the **Bot ID** and **Secret** from the credentials page
5. Run `hermes gateway setup`, select **WeCom**, and enter the credentials when prompted
:::warning
Keep the Bot Secret private. Anyone with it can impersonate your bot.
:::
### 2. Configure Hermes
### Step 2: Configure Hermes
#### Option A: Interactive Setup (Recommended)
Run the interactive setup:
```bash
hermes gateway setup
```
Select **WeCom** and follow the prompts. The wizard will guide you through:
- Bot credentials (via QR scan or manual entry)
- Access control settings (allowlist, pairing mode, or open access)
- Home channel for notifications
Select **WeCom** and enter your Bot ID and Secret.
#### Option B: Manual Configuration
Add the following to `~/.hermes/.env`:
Or set environment variables in `~/.hermes/.env`:
```bash
WECOM_BOT_ID=your-bot-id
@@ -75,7 +47,7 @@ WECOM_ALLOWED_USERS=user_id_1,user_id_2
WECOM_HOME_CHANNEL=chat_id
```
### Step 3: Start the gateway
### 3. Start the gateway
```bash
hermes gateway
+2 -16
@@ -386,21 +386,7 @@ Key tables in `state.db`:
- Gateway sessions auto-reset based on the configured reset policy
- Before reset, the agent saves memories and skills from the expiring session
- Opt-in auto-pruning: when `sessions.auto_prune` is `true`, ended sessions older than `sessions.retention_days` (default 90) are pruned at CLI/gateway startup
- After a prune that actually removed rows, `state.db` is `VACUUM`ed to reclaim disk space (SQLite does not shrink the file on plain DELETE)
- Pruning runs at most once per `sessions.min_interval_hours` (default 24); the last-run timestamp is tracked inside `state.db` itself so it's shared across every Hermes process in the same `HERMES_HOME`
Default is **off** — session history is valuable for `session_search` recall, and silently deleting it could surprise users. Enable in `~/.hermes/config.yaml`:
```yaml
sessions:
auto_prune: true # opt in — default is false
retention_days: 90 # keep ended sessions this many days
vacuum_after_prune: true # reclaim disk space after a pruning sweep
min_interval_hours: 24 # don't re-run the sweep more often than this
```
Active sessions are never auto-pruned, regardless of age.
- Ended sessions remain in the database until pruned
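For the opt-in variant of this section (the one with `sessions.min_interval_hours`), the once-per-interval gate can be sketched as a pure function. This is a minimal illustration under assumptions: `sweep_is_due` is a hypothetical name, and treating a missing or unparsable last-run marker as "no prior run" follows the behaviour exercised by the accompanying tests rather than a documented guarantee.

```python
import time
from typing import Optional


def sweep_is_due(last_run_marker: Optional[str],
                 min_interval_hours: float = 24,
                 now: Optional[float] = None) -> bool:
    """Hypothetical gate: should the prune sweep run again?

    last_run_marker is the stored last-run timestamp as text, or None if
    no sweep has been recorded yet.
    """
    now = time.time() if now is None else now
    if last_run_marker is None:
        return True  # first run: nothing recorded yet
    try:
        last_run = float(last_run_marker)
    except ValueError:
        return True  # corrupt marker is treated like no prior run
    return now - last_run >= min_interval_hours * 3600
```

Storing the marker in the database itself, as the text describes, means every process sharing the same `HERMES_HOME` consults the same timestamp before sweeping.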
### Manual Cleanup
@@ -417,5 +403,5 @@ hermes sessions prune --older-than 30 --yes
```
:::tip
The database grows slowly (typical: 10-15 MB for hundreds of sessions) and session history powers `session_search` recall across past conversations, so auto-prune ships disabled. Enable it if you're running a heavy gateway/cron workload where `state.db` is meaningfully affecting performance (observed failure mode: 384 MB state.db with ~1000 sessions slowing down FTS5 inserts and `/resume` listing). Use `hermes sessions prune` for one-off cleanup without turning on the automatic sweep.
The database grows slowly (typical: 10-15 MB for hundreds of sessions). Pruning is mainly useful for removing old conversations you no longer need for search recall.
:::