Compare commits

..

1 Commits

Author SHA1 Message Date
Shannon Sands bad9fe2452 add generic gateway startup readiness checks 2026-04-15 10:03:23 +10:00
205 changed files with 2336 additions and 19300 deletions
-4
View File
@@ -145,10 +145,6 @@
# Only override here if you need to force a backend without touching config.yaml:
# TERMINAL_ENV=local
# Override the container runtime binary (e.g. to use Podman instead of Docker).
# Useful on systems where Docker's storage driver is broken or unavailable.
# HERMES_DOCKER_BINARY=/usr/local/bin/podman
# Container images (for singularity/docker/modal backends)
# TERMINAL_DOCKER_IMAGE=nikolaik/python-nodejs:python3.11-nodejs20
# TERMINAL_SINGULARITY_IMAGE=docker://nikolaik/python-nodejs:python3.11-nodejs20
-1
View File
@@ -105,4 +105,3 @@ tesseracttars-creator <tesseracttars@gmail.com> <tesseracttars@gmail.com>
xinbenlv <zzn+pa@zzn.im> <zzn+pa@zzn.im>
SaulJWu <saul.jj.wu@gmail.com> <saul.jj.wu@gmail.com>
angelos <angelos@oikos.lan.home.malaiwah.com> <angelos@oikos.lan.home.malaiwah.com>
MestreY0d4-Uninter <241404605+MestreY0d4-Uninter@users.noreply.github.com> <MestreY0d4-Uninter@users.noreply.github.com>
+4 -4
View File
@@ -13,7 +13,7 @@ source venv/bin/activate # ALWAYS activate before running Python
```
hermes-agent/
├── run_agent.py # AIAgent class — core conversation loop
├── model_tools.py # Tool orchestration, discover_builtin_tools(), handle_function_call()
├── model_tools.py # Tool orchestration, _discover_tools(), handle_function_call()
├── toolsets.py # Toolset definitions, _HERMES_CORE_TOOLS list
├── cli.py # HermesCLI class — interactive CLI orchestrator
├── hermes_state.py # SessionDB — SQLite session store (FTS5 search)
@@ -181,7 +181,7 @@ if canonical == "mycommand":
## Adding New Tools
Requires changes in **2 files**:
Requires changes in **3 files**:
**1. Create `tools/your_tool.py`:**
```python
@@ -204,9 +204,9 @@ registry.register(
)
```
**2. Add to `toolsets.py`** — either `_HERMES_CORE_TOOLS` (all platforms) or a new toolset.
**2. Add import** in `model_tools.py` `_discover_tools()` list.
Auto-discovery: any `tools/*.py` file with a top-level `registry.register()` call is imported automatically — no manual import list to maintain.
**3. Add to `toolsets.py`** — either `_HERMES_CORE_TOOLS` (all platforms) or a new toolset.
The registry handles schema collection, dispatch, availability checking, and error wrapping. All handlers MUST return a JSON string.
-84
View File
@@ -1,84 +0,0 @@
# Hermes Agent Security Policy
This document outlines the security protocols, trust model, and deployment hardening guidelines for the **Hermes Agent** project.
## 1. Vulnerability Reporting
Hermes Agent does **not** operate a bug bounty program. Security issues should be reported via [GitHub Security Advisories (GHSA)](https://github.com/NousResearch/hermes-agent/security/advisories/new) or by emailing **security@nousresearch.com**. Do not open public issues for security vulnerabilities.
### Required Submission Details
- **Title & Severity:** Concise description and CVSS score/rating.
- **Affected Component:** Exact file path and line range (e.g., `tools/approval.py:120-145`).
- **Environment:** Output of `hermes version`, commit SHA, OS, and Python version.
- **Reproduction:** Step-by-step Proof-of-Concept (PoC) against `main` or the latest release.
- **Impact:** Explanation of what trust boundary was crossed.
---
## 2. Trust Model
The core assumption is that Hermes is a **personal agent** with one trusted operator.
### Operator & Session Trust
- **Single Tenant:** The system protects the operator from LLM actions, not from malicious co-tenants. Multi-user isolation must happen at the OS/host level.
- **Gateway Security:** Authorized callers (Telegram, Discord, Slack, etc.) receive equal trust. Session keys are used for routing, not as authorization boundaries.
- **Execution:** Defaults to `terminal.backend: local` (direct host execution). Container isolation (Docker, Modal, Daytona) is opt-in for sandboxing.
### Dangerous Command Approval
The approval system (`tools/approval.py`) is a core security boundary. Terminal commands, file operations, and other potentially destructive actions are gated behind explicit user confirmation before execution. The approval mode is configurable via `approvals.mode` in `config.yaml`:
- `"on"` (default) — prompts the user to approve dangerous commands.
- `"auto"` — auto-approves after a configurable delay.
- `"off"` — disables the gate entirely (break-glass; see Section 3).
### Output Redaction
`agent/redact.py` strips secret-like patterns (API keys, tokens, credentials) from all display output before it reaches the terminal or gateway platform. This prevents accidental credential leakage in chat logs, tool previews, and response text. Redaction operates on the display layer only — underlying values remain intact for internal agent operations.
### Skills vs. MCP Servers
- **Installed Skills:** High trust. Equivalent to local host code; skills can read environment variables and run arbitrary commands.
- **MCP Servers:** Lower trust. MCP subprocesses receive a filtered environment (`_build_safe_env()` in `tools/mcp_tool.py`) — only safe baseline variables (`PATH`, `HOME`, `XDG_*`) plus variables explicitly declared in the server's `env` config block are passed through. Host credentials are stripped by default. Additionally, packages invoked via `npx`/`uvx` are checked against the OSV malware database before spawning.
### Code Execution Sandbox
The `execute_code` tool (`tools/code_execution_tool.py`) runs LLM-generated Python scripts in a child process with API keys and tokens stripped from the environment to prevent credential exfiltration. Only environment variables explicitly declared by loaded skills (via `env_passthrough`) or by the user in `config.yaml` (`terminal.env_passthrough`) are passed through. The child accesses Hermes tools via RPC, not direct API calls.
### Subagents
- **No recursive delegation:** The `delegate_task` tool is disabled for child agents.
- **Depth limit:** `MAX_DEPTH = 2` — parent (depth 0) can spawn a child (depth 1); grandchildren are rejected.
- **Memory isolation:** Subagents run with `skip_memory=True` and do not have access to the parent's persistent memory provider. The parent receives only the task prompt and final response as an observation.
---
## 3. Out of Scope (Non-Vulnerabilities)
The following scenarios are **not** considered security breaches:
- **Prompt Injection:** Unless it results in a concrete bypass of the approval system, toolset restrictions, or container sandbox.
- **Public Exposure:** Deploying the gateway to the public internet without external authentication or network protection.
- **Trusted State Access:** Reports that require pre-existing write access to `~/.hermes/`, `.env`, or `config.yaml` (these are operator-owned files).
- **Default Behavior:** Host-level command execution when `terminal.backend` is set to `local` — this is the documented default, not a vulnerability.
- **Configuration Trade-offs:** Intentional break-glass settings such as `approvals.mode: "off"` or `terminal.backend: local` in production.
- **Tool-level read/access restrictions:** The agent has unrestricted shell access via the `terminal` tool by design. Reports that a specific tool (e.g., `read_file`) can access a resource are not vulnerabilities if the same access is available through `terminal`. Tool-level deny lists only constitute a meaningful security boundary when paired with equivalent restrictions on the terminal side (as with write operations, where `WRITE_DENIED_PATHS` is paired with the dangerous command approval system).
---
## 4. Deployment Hardening & Best Practices
### Filesystem & Network
- **Production sandboxing:** Use container backends (`docker`, `modal`, `daytona`) instead of `local` for untrusted workloads.
- **File permissions:** Run as non-root (the Docker image uses UID 10000); protect credentials with `chmod 600 ~/.hermes/.env` on local installs.
- **Network exposure:** Do not expose the gateway or API server to the public internet without VPN, Tailscale, or firewall protection. SSRF protection is enabled by default across all gateway platform adapters (Telegram, Discord, Slack, Matrix, Mattermost, etc.) with redirect validation. Note: the local terminal backend does not apply SSRF filtering, as it operates within the trusted operator's environment.
### Skills & Supply Chain
- **Skill installation:** Review Skills Guard reports (`tools/skills_guard.py`) before installing third-party skills. The audit log at `~/.hermes/skills/.hub/audit.log` tracks every install and removal.
- **MCP safety:** OSV malware checking runs automatically for `npx`/`uvx` packages before MCP server processes are spawned.
- **CI/CD:** GitHub Actions are pinned to full commit SHAs. The `supply-chain-audit.yml` workflow blocks PRs containing `.pth` files or suspicious `base64`+`exec` patterns.
### Credential Storage
- API keys and tokens belong exclusively in `~/.hermes/.env` — never in `config.yaml` or checked into version control.
- The credential pool system (`agent/credential_pool.py`) handles key rotation and fallback. Credentials are resolved from environment variables, not stored in plaintext databases.
---
## 5. Disclosure Process
- **Coordinated Disclosure:** 90-day window or until a fix is released, whichever comes first.
- **Communication:** All updates occur via the GHSA thread or email correspondence with security@nousresearch.com.
- **Credits:** Reporters are credited in release notes unless anonymity is requested.
-27
View File
@@ -298,33 +298,6 @@ def build_anthropic_client(api_key: str, base_url: str = None):
return _anthropic_sdk.Anthropic(**kwargs)
def build_anthropic_bedrock_client(region: str):
"""Create an AnthropicBedrock client for Bedrock Claude models.
Uses the Anthropic SDK's native Bedrock adapter, which provides full
Claude feature parity: prompt caching, thinking budgets, adaptive
thinking, fast mode — features not available via the Converse API.
Auth uses the boto3 default credential chain (IAM roles, SSO, env vars).
"""
if _anthropic_sdk is None:
raise ImportError(
"The 'anthropic' package is required for the Bedrock provider. "
"Install it with: pip install 'anthropic>=0.39.0'"
)
if not hasattr(_anthropic_sdk, "AnthropicBedrock"):
raise ImportError(
"anthropic.AnthropicBedrock not available. "
"Upgrade with: pip install 'anthropic>=0.39.0'"
)
from httpx import Timeout
return _anthropic_sdk.AnthropicBedrock(
aws_region=region,
timeout=Timeout(timeout=900.0, connect=10.0),
)
def read_claude_code_credentials() -> Optional[Dict[str, Any]]:
"""Read refreshable Claude Code OAuth credentials from ~/.claude/.credentials.json.
+18 -101
View File
@@ -775,21 +775,6 @@ def _try_openrouter() -> Tuple[Optional[OpenAI], Optional[str]]:
def _try_nous(vision: bool = False) -> Tuple[Optional[OpenAI], Optional[str]]:
# Check cross-session rate limit guard before attempting Nous —
# if another session already recorded a 429, skip Nous entirely
# to avoid piling more requests onto the tapped RPH bucket.
try:
from agent.nous_rate_guard import nous_rate_limit_remaining
_remaining = nous_rate_limit_remaining()
if _remaining is not None and _remaining > 0:
logger.debug(
"Auxiliary: skipping Nous Portal (rate-limited, resets in %.0fs)",
_remaining,
)
return None, None
except Exception:
pass
nous = _read_nous_auth()
if not nous:
return None, None
@@ -914,51 +899,6 @@ def _current_custom_base_url() -> str:
return custom_base or ""
def _validate_proxy_env_urls() -> None:
"""Fail fast with a clear error when proxy env vars have malformed URLs.
Common cause: shell config (e.g. .zshrc) with a typo like
``export HTTP_PROXY=http://127.0.0.1:6153export NEXT_VAR=...``
which concatenates 'export' into the port number. Without this
check the OpenAI/httpx client raises a cryptic ``Invalid port``
error that doesn't name the offending env var.
"""
from urllib.parse import urlparse
for key in ("HTTPS_PROXY", "HTTP_PROXY", "ALL_PROXY",
"https_proxy", "http_proxy", "all_proxy"):
value = str(os.environ.get(key) or "").strip()
if not value:
continue
try:
parsed = urlparse(value)
if parsed.scheme:
_ = parsed.port # raises ValueError for e.g. '6153export'
except ValueError as exc:
raise RuntimeError(
f"Malformed proxy environment variable {key}={value!r}. "
"Fix or unset your proxy settings and try again."
) from exc
def _validate_base_url(base_url: str) -> None:
"""Reject obviously broken custom endpoint URLs before they reach httpx."""
from urllib.parse import urlparse
candidate = str(base_url or "").strip()
if not candidate or candidate.startswith("acp://"):
return
try:
parsed = urlparse(candidate)
if parsed.scheme in {"http", "https"}:
_ = parsed.port # raises ValueError for malformed ports
except ValueError as exc:
raise RuntimeError(
f"Malformed custom endpoint URL: {candidate!r}. "
"Run `hermes setup` or `hermes model` and enter a valid http(s) base URL."
) from exc
def _try_custom_endpoint() -> Tuple[Optional[OpenAI], Optional[str]]:
runtime = _resolve_custom_runtime()
if len(runtime) == 2:
@@ -1359,7 +1299,6 @@ def resolve_provider_client(
Returns:
(client, resolved_model) or (None, None) if auth is unavailable.
"""
_validate_proxy_env_urls()
# Normalise aliases
provider = _normalize_aux_provider(provider)
@@ -1896,15 +1835,9 @@ def auxiliary_max_tokens_param(value: int) -> dict:
# Every auxiliary LLM consumer should use these instead of manually
# constructing clients and calling .chat.completions.create().
# Client cache: (provider, async_mode, base_url, api_key, api_mode, runtime_key) -> (client, default_model, loop)
# NOTE: loop identity is NOT part of the key. On async cache hits we check
# whether the cached loop is the *current* loop; if not, the stale entry is
# replaced in-place. This bounds cache growth to one entry per unique
# provider config rather than one per (config × event-loop), which previously
# caused unbounded fd accumulation in long-running gateway processes (#10200).
# Client cache: (provider, async_mode, base_url, api_key) -> (client, default_model)
_client_cache: Dict[tuple, tuple] = {}
_client_cache_lock = threading.Lock()
_CLIENT_CACHE_MAX_SIZE = 64 # safety belt — evict oldest when exceeded
def neuter_async_httpx_del() -> None:
@@ -2037,49 +1970,39 @@ def _get_cached_client(
Async clients (AsyncOpenAI) use httpx.AsyncClient internally, which
binds to the event loop that was current when the client was created.
Using such a client on a *different* loop causes deadlocks or
RuntimeError. To prevent cross-loop issues, the cache validates on
every async hit that the cached loop is the *current, open* loop.
If the loop changed (e.g. a new gateway worker-thread loop), the stale
entry is replaced in-place rather than creating an additional entry.
This keeps cache size bounded to one entry per unique provider config,
preventing the fd-exhaustion that previously occurred in long-running
gateways where recycled worker threads created unbounded entries (#10200).
RuntimeError. To prevent cross-loop issues (especially in gateway
mode where _run_async() may spawn fresh loops in worker threads), the
cache key for async clients includes the current event loop's identity
so each loop gets its own client instance.
"""
# Resolve the current event loop for async clients so we can validate
# cached entries. Loop identity is NOT in the cache key — instead we
# check at hit time whether the cached loop is still current and open.
# This prevents unbounded cache growth from recycled worker-thread loops
# while still guaranteeing we never reuse a client on the wrong loop
# (which causes deadlocks, see #2681).
# Include loop identity for async clients to prevent cross-loop reuse.
# httpx.AsyncClient (inside AsyncOpenAI) is bound to the loop where it
# was created — reusing it on a different loop causes deadlocks (#2681).
loop_id = 0
current_loop = None
if async_mode:
try:
import asyncio as _aio
current_loop = _aio.get_event_loop()
loop_id = id(current_loop)
except RuntimeError:
pass
runtime = _normalize_main_runtime(main_runtime)
runtime_key = tuple(runtime.get(field, "") for field in _MAIN_RUNTIME_FIELDS) if provider == "auto" else ()
cache_key = (provider, async_mode, base_url or "", api_key or "", api_mode or "", runtime_key)
cache_key = (provider, async_mode, base_url or "", api_key or "", api_mode or "", loop_id, runtime_key)
with _client_cache_lock:
if cache_key in _client_cache:
cached_client, cached_default, cached_loop = _client_cache[cache_key]
if async_mode:
# Validate: the cached client must be bound to the CURRENT,
# OPEN loop. If the loop changed or was closed, the httpx
# transport inside is dead — force-close and replace.
loop_ok = (
cached_loop is not None
and cached_loop is current_loop
and not cached_loop.is_closed()
)
if loop_ok:
# A cached async client whose loop has been closed will raise
# "Event loop is closed" when httpx tries to clean up its
# transport. Discard the stale client and create a fresh one.
if cached_loop is not None and cached_loop.is_closed():
_force_close_async_httpx(cached_client)
del _client_cache[cache_key]
else:
effective = _compat_model(cached_client, model, cached_default)
return cached_client, effective
# Stale — evict and fall through to create a new client.
_force_close_async_httpx(cached_client)
del _client_cache[cache_key]
else:
effective = _compat_model(cached_client, model, cached_default)
return cached_client, effective
@@ -2099,12 +2022,6 @@ def _get_cached_client(
bound_loop = current_loop
with _client_cache_lock:
if cache_key not in _client_cache:
# Safety belt: if the cache has grown beyond the max, evict
# the oldest entries (FIFO — dict preserves insertion order).
while len(_client_cache) >= _CLIENT_CACHE_MAX_SIZE:
evict_key, evict_entry = next(iter(_client_cache.items()))
_force_close_async_httpx(evict_entry[0])
del _client_cache[evict_key]
_client_cache[cache_key] = (client, default_model, bound_loop)
else:
client, default_model, _ = _client_cache[cache_key]
File diff suppressed because it is too large Load Diff
+36 -307
View File
@@ -17,10 +17,7 @@ Improvements over v2:
- Richer tool call/result detail in summarizer input
"""
import hashlib
import json
import logging
import re
import time
from typing import Any, Dict, List, Optional
@@ -60,128 +57,6 @@ _CHARS_PER_TOKEN = 4
_SUMMARY_FAILURE_COOLDOWN_SECONDS = 600
def _summarize_tool_result(tool_name: str, tool_args: str, tool_content: str) -> str:
"""Create an informative 1-line summary of a tool call + result.
Used during the pre-compression pruning pass to replace large tool
outputs with a short but useful description of what the tool did,
rather than a generic placeholder that carries zero information.
Returns strings like::
[terminal] ran `npm test` -> exit 0, 47 lines output
[read_file] read config.py from line 1 (1,200 chars)
[search_files] content search for 'compress' in agent/ -> 12 matches
"""
try:
args = json.loads(tool_args) if tool_args else {}
except (json.JSONDecodeError, TypeError):
args = {}
content = tool_content or ""
content_len = len(content)
line_count = content.count("\n") + 1 if content.strip() else 0
if tool_name == "terminal":
cmd = args.get("command", "")
if len(cmd) > 80:
cmd = cmd[:77] + "..."
exit_match = re.search(r'"exit_code"\s*:\s*(-?\d+)', content)
exit_code = exit_match.group(1) if exit_match else "?"
return f"[terminal] ran `{cmd}` -> exit {exit_code}, {line_count} lines output"
if tool_name == "read_file":
path = args.get("path", "?")
offset = args.get("offset", 1)
return f"[read_file] read {path} from line {offset} ({content_len:,} chars)"
if tool_name == "write_file":
path = args.get("path", "?")
written_lines = args.get("content", "").count("\n") + 1 if args.get("content") else "?"
return f"[write_file] wrote to {path} ({written_lines} lines)"
if tool_name == "search_files":
pattern = args.get("pattern", "?")
path = args.get("path", ".")
target = args.get("target", "content")
match_count = re.search(r'"total_count"\s*:\s*(\d+)', content)
count = match_count.group(1) if match_count else "?"
return f"[search_files] {target} search for '{pattern}' in {path} -> {count} matches"
if tool_name == "patch":
path = args.get("path", "?")
mode = args.get("mode", "replace")
return f"[patch] {mode} in {path} ({content_len:,} chars result)"
if tool_name in ("browser_navigate", "browser_click", "browser_snapshot",
"browser_type", "browser_scroll", "browser_vision"):
url = args.get("url", "")
ref = args.get("ref", "")
detail = f" {url}" if url else (f" ref={ref}" if ref else "")
return f"[{tool_name}]{detail} ({content_len:,} chars)"
if tool_name == "web_search":
query = args.get("query", "?")
return f"[web_search] query='{query}' ({content_len:,} chars result)"
if tool_name == "web_extract":
urls = args.get("urls", [])
url_desc = urls[0] if isinstance(urls, list) and urls else "?"
if isinstance(urls, list) and len(urls) > 1:
url_desc += f" (+{len(urls) - 1} more)"
return f"[web_extract] {url_desc} ({content_len:,} chars)"
if tool_name == "delegate_task":
goal = args.get("goal", "")
if len(goal) > 60:
goal = goal[:57] + "..."
return f"[delegate_task] '{goal}' ({content_len:,} chars result)"
if tool_name == "execute_code":
code_preview = (args.get("code") or "")[:60].replace("\n", " ")
if len(args.get("code", "")) > 60:
code_preview += "..."
return f"[execute_code] `{code_preview}` ({line_count} lines output)"
if tool_name in ("skill_view", "skills_list", "skill_manage"):
name = args.get("name", "?")
return f"[{tool_name}] name={name} ({content_len:,} chars)"
if tool_name == "vision_analyze":
question = args.get("question", "")[:50]
return f"[vision_analyze] '{question}' ({content_len:,} chars)"
if tool_name == "memory":
action = args.get("action", "?")
target = args.get("target", "?")
return f"[memory] {action} on {target}"
if tool_name == "todo":
return "[todo] updated task list"
if tool_name == "clarify":
return "[clarify] asked user a question"
if tool_name == "text_to_speech":
return f"[text_to_speech] generated audio ({content_len:,} chars)"
if tool_name == "cronjob":
action = args.get("action", "?")
return f"[cronjob] {action}"
if tool_name == "process":
action = args.get("action", "?")
sid = args.get("session_id", "?")
return f"[process] {action} session={sid}"
# Generic fallback
first_arg = ""
for k, v in list(args.items())[:2]:
sv = str(v)[:40]
first_arg += f" {k}={sv}"
return f"[{tool_name}]{first_arg} ({content_len:,} chars result)"
class ContextCompressor(ContextEngine):
"""Default context engine — compresses conversation context via lossy summarization.
@@ -203,8 +78,6 @@ class ContextCompressor(ContextEngine):
self._context_probed = False
self._context_probe_persistable = False
self._previous_summary = None
self._last_compression_savings_pct = 100.0
self._ineffective_compression_count = 0
def update_model(
self,
@@ -294,9 +167,6 @@ class ContextCompressor(ContextEngine):
# Stores the previous compaction summary for iterative updates
self._previous_summary: Optional[str] = None
# Anti-thrashing: track whether last compression was effective
self._last_compression_savings_pct: float = 100.0
self._ineffective_compression_count: int = 0
self._summary_failure_cooldown_until: float = 0.0
def update_from_response(self, usage: Dict[str, Any]):
@@ -305,26 +175,9 @@ class ContextCompressor(ContextEngine):
self.last_completion_tokens = usage.get("completion_tokens", 0)
def should_compress(self, prompt_tokens: int = None) -> bool:
"""Check if context exceeds the compression threshold.
Includes anti-thrashing protection: if the last two compressions
each saved less than 10%, skip compression to avoid infinite loops
where each pass removes only 1-2 messages.
"""
"""Check if context exceeds the compression threshold."""
tokens = prompt_tokens if prompt_tokens is not None else self.last_prompt_tokens
if tokens < self.threshold_tokens:
return False
# Anti-thrashing: back off if recent compressions were ineffective
if self._ineffective_compression_count >= 2:
if not self.quiet_mode:
logger.warning(
"Compression skipped — last %d compressions saved <10%% each. "
"Consider /new to start a fresh session, or /compress <topic> "
"for focused compression.",
self._ineffective_compression_count,
)
return False
return True
return tokens >= self.threshold_tokens
# ------------------------------------------------------------------
# Tool output pruning (cheap pre-pass, no LLM call)
@@ -334,16 +187,7 @@ class ContextCompressor(ContextEngine):
self, messages: List[Dict[str, Any]], protect_tail_count: int,
protect_tail_tokens: int | None = None,
) -> tuple[List[Dict[str, Any]], int]:
"""Replace old tool result contents with informative 1-line summaries.
Instead of a generic placeholder, generates a summary like::
[terminal] ran `npm test` -> exit 0, 47 lines output
[read_file] read config.py from line 1 (3,400 chars)
Also deduplicates identical tool results (e.g. reading the same file
5x keeps only the newest full copy) and truncates large tool_call
arguments in assistant messages outside the protected tail.
"""Replace old tool result contents with a short placeholder.
Walks backward from the end, protecting the most recent messages that
fall within ``protect_tail_tokens`` (when provided) OR the last
@@ -359,22 +203,6 @@ class ContextCompressor(ContextEngine):
result = [m.copy() for m in messages]
pruned = 0
# Build index: tool_call_id -> (tool_name, arguments_json)
call_id_to_tool: Dict[str, tuple] = {}
for msg in result:
if msg.get("role") == "assistant":
for tc in msg.get("tool_calls") or []:
if isinstance(tc, dict):
cid = tc.get("id", "")
fn = tc.get("function", {})
call_id_to_tool[cid] = (fn.get("name", "unknown"), fn.get("arguments", ""))
else:
cid = getattr(tc, "id", "") or ""
fn = getattr(tc, "function", None)
name = getattr(fn, "name", "unknown") if fn else "unknown"
args_str = getattr(fn, "arguments", "") if fn else ""
call_id_to_tool[cid] = (name, args_str)
# Determine the prune boundary
if protect_tail_tokens is not None and protect_tail_tokens > 0:
# Token-budget approach: walk backward accumulating tokens
@@ -383,8 +211,7 @@ class ContextCompressor(ContextEngine):
min_protect = min(protect_tail_count, len(result) - 1)
for i in range(len(result) - 1, -1, -1):
msg = result[i]
raw_content = msg.get("content") or ""
content_len = sum(len(p.get("text", "")) for p in raw_content) if isinstance(raw_content, list) else len(raw_content)
content_len = len(msg.get("content") or "")
msg_tokens = content_len // _CHARS_PER_TOKEN + 10
for tc in msg.get("tool_calls") or []:
if isinstance(tc, dict):
@@ -399,69 +226,18 @@ class ContextCompressor(ContextEngine):
else:
prune_boundary = len(result) - protect_tail_count
# Pass 1: Deduplicate identical tool results.
# When the same file is read multiple times, keep only the most recent
# full copy and replace older duplicates with a back-reference.
content_hashes: dict = {} # hash -> (index, tool_call_id)
for i in range(len(result) - 1, -1, -1):
msg = result[i]
if msg.get("role") != "tool":
continue
content = msg.get("content") or ""
# Skip multimodal content (list of content blocks)
if isinstance(content, list):
continue
if len(content) < 200:
continue
h = hashlib.md5(content.encode("utf-8", errors="replace")).hexdigest()[:12]
if h in content_hashes:
# This is an older duplicate — replace with back-reference
result[i] = {**msg, "content": "[Duplicate tool output — same content as a more recent call]"}
pruned += 1
else:
content_hashes[h] = (i, msg.get("tool_call_id", "?"))
# Pass 2: Replace old tool results with informative summaries
for i in range(prune_boundary):
msg = result[i]
if msg.get("role") != "tool":
continue
content = msg.get("content", "")
# Skip multimodal content (list of content blocks)
if isinstance(content, list):
continue
if not content or content == _PRUNED_TOOL_PLACEHOLDER:
continue
# Skip already-deduplicated or previously-summarized results
if content.startswith("[Duplicate tool output"):
continue
# Only prune if the content is substantial (>200 chars)
if len(content) > 200:
call_id = msg.get("tool_call_id", "")
tool_name, tool_args = call_id_to_tool.get(call_id, ("unknown", ""))
summary = _summarize_tool_result(tool_name, tool_args, content)
result[i] = {**msg, "content": summary}
result[i] = {**msg, "content": _PRUNED_TOOL_PLACEHOLDER}
pruned += 1
# Pass 3: Truncate large tool_call arguments in assistant messages
# outside the protected tail. write_file with 50KB content, for
# example, survives pruning entirely without this.
for i in range(prune_boundary):
msg = result[i]
if msg.get("role") != "assistant" or not msg.get("tool_calls"):
continue
new_tcs = []
modified = False
for tc in msg["tool_calls"]:
if isinstance(tc, dict):
args = tc.get("function", {}).get("arguments", "")
if len(args) > 500:
tc = {**tc, "function": {**tc["function"], "arguments": args[:200] + "...[truncated]"}}
modified = True
new_tcs.append(tc)
if modified:
result[i] = {**msg, "tool_calls": new_tcs}
return result, pruned
# ------------------------------------------------------------------
@@ -581,37 +357,29 @@ class ContextCompressor(ContextEngine):
)
# Shared structured template (used by both paths).
# Key changes vs v1:
# - "Pending User Asks" section (from Claude Code) explicitly tracks
# unanswered questions so the model knows what's resolved vs open
# - "Remaining Work" replaces "Next Steps" to avoid reading as active
# instructions
# - "Resolved Questions" makes it clear which questions were already
# answered (prevents model from re-answering them)
_template_sections = f"""## Goal
[What the user is trying to accomplish]
## Constraints & Preferences
[User preferences, coding style, constraints, important decisions]
## Completed Actions
[Numbered list of concrete actions taken — include tool used, target, and outcome.
Format each as: N. ACTION target — outcome [tool: name]
Example:
1. READ config.py:45 — found `==` should be `!=` [tool: read_file]
2. PATCH config.py:45 — changed `==` to `!=` [tool: patch]
3. TEST `pytest tests/` — 3/50 failed: test_parse, test_validate, test_edge [tool: terminal]
Be specific with file paths, commands, line numbers, and results.]
## Active State
[Current working state — include:
- Working directory and branch (if applicable)
- Modified/created files with brief note on each
- Test status (X/Y passing)
- Any running processes or servers
- Environment details that matter]
## In Progress
[Work currently underway — what was being done when compaction fired]
## Blocked
[Any blockers, errors, or issues not yet resolved. Include exact error messages.]
## Progress
### Done
[Completed work — include specific file paths, commands run, results obtained]
### In Progress
[Work currently underway]
### Blocked
[Any blockers or issues encountered]
## Key Decisions
[Important technical decisions and WHY they were made]
[Important technical decisions and why they were made]
## Resolved Questions
[Questions the user asked that were ALREADY answered — include the answer so the next assistant does not re-answer them]
@@ -628,7 +396,10 @@ Be specific with file paths, commands, line numbers, and results.]
## Critical Context
[Any specific values, error messages, configuration details, or data that would be lost without explicit preservation]
Target ~{summary_budget} tokens. Be CONCRETE — include file paths, command outputs, error messages, line numbers, and specific values. Avoid vague descriptions like "made some changes" — say exactly what changed.
## Tools & Patterns
[Which tools were used, how they were used effectively, and any tool-specific discoveries]
Target ~{summary_budget} tokens. Be specific — include file paths, command outputs, error messages, and concrete values rather than vague descriptions.
Write only the summary body. Do not include any preamble or prefix."""
@@ -644,7 +415,7 @@ PREVIOUS SUMMARY:
NEW TURNS TO INCORPORATE:
{content_to_summarize}
Update the summary using this exact structure. PRESERVE all existing information that is still relevant. ADD new completed actions to the numbered list (continue numbering). Move items from "In Progress" to "Completed Actions" when done. Move answered questions to "Resolved Questions". Update "Active State" to reflect current state. Remove information only if it is clearly obsolete.
Update the summary using this exact structure. PRESERVE all existing information that is still relevant. ADD new progress. Move items from "In Progress" to "Done" when completed. Move answered questions to "Resolved Questions". Remove information only if it is clearly obsolete.
{_template_sections}"""
else:
@@ -679,7 +450,7 @@ The user has requested that this compaction PRIORITISE preserving all informatio
"api_mode": self.api_mode,
},
"messages": [{"role": "user", "content": prompt}],
"max_tokens": int(summary_budget * 1.3),
"max_tokens": summary_budget * 2,
# timeout resolved from auxiliary.compression.timeout config by call_llm
}
if self.summary_model:
@@ -693,10 +464,8 @@ The user has requested that this compaction PRIORITISE preserving all informatio
# Store for iterative updates on next compaction
self._previous_summary = summary
self._summary_failure_cooldown_until = 0.0
self._summary_model_fallen_back = False
return self._with_summary_prefix(summary)
except RuntimeError:
# No provider configured — long cooldown, unlikely to self-resolve
self._summary_failure_cooldown_until = time.monotonic() + _SUMMARY_FAILURE_COOLDOWN_SECONDS
logging.warning("Context compression: no provider available for "
"summary. Middle turns will be dropped without summary "
@@ -704,42 +473,12 @@ The user has requested that this compaction PRIORITISE preserving all informatio
_SUMMARY_FAILURE_COOLDOWN_SECONDS)
return None
except Exception as e:
# If the summary model is different from the main model and the
# error looks permanent (model not found, 503, 404), fall back to
# using the main model instead of entering cooldown that leaves
# context growing unbounded. (#8620 sub-issue 4)
_status = getattr(e, "status_code", None) or getattr(getattr(e, "response", None), "status_code", None)
_err_str = str(e).lower()
_is_model_not_found = (
_status in (404, 503)
or "model_not_found" in _err_str
or "does not exist" in _err_str
or "no available channel" in _err_str
)
if (
_is_model_not_found
and self.summary_model
and self.summary_model != self.model
and not getattr(self, "_summary_model_fallen_back", False)
):
self._summary_model_fallen_back = True
logging.warning(
"Summary model '%s' not available (%s). "
"Falling back to main model '%s' for compression.",
self.summary_model, e, self.model,
)
self.summary_model = "" # empty = use main model
self._summary_failure_cooldown_until = 0.0 # no cooldown
return self._generate_summary(messages, summary_budget) # retry immediately
# Transient errors (timeout, rate limit, network) — shorter cooldown
_transient_cooldown = 60
self._summary_failure_cooldown_until = time.monotonic() + _transient_cooldown
self._summary_failure_cooldown_until = time.monotonic() + _SUMMARY_FAILURE_COOLDOWN_SECONDS
logging.warning(
"Failed to generate context summary: %s. "
"Further summary attempts paused for %d seconds.",
e,
_transient_cooldown,
_SUMMARY_FAILURE_COOLDOWN_SECONDS,
)
return None
@@ -1005,11 +744,11 @@ The user has requested that this compaction PRIORITISE preserving all informatio
compressed = []
for i in range(compress_start):
msg = messages[i].copy()
if i == 0 and msg.get("role") == "system":
existing = msg.get("content") or ""
_compression_note = "[Note: Some earlier conversation turns have been compacted into a handoff summary to preserve context space. The current session state may still reflect earlier work, so build on that summary and state rather than re-doing work.]"
if _compression_note not in existing:
msg["content"] = existing + "\n\n" + _compression_note
if i == 0 and msg.get("role") == "system" and self.compression_count == 0:
msg["content"] = (
(msg.get("content") or "")
+ "\n\n[Note: Some earlier conversation turns have been compacted into a handoff summary to preserve context space. The current session state may still reflect earlier work, so build on that summary and state rather than re-doing work.]"
)
compressed.append(msg)
# If LLM summary failed, insert a static fallback so the model
@@ -1067,24 +806,14 @@ The user has requested that this compaction PRIORITISE preserving all informatio
compressed = self._sanitize_tool_pairs(compressed)
new_estimate = estimate_messages_tokens_rough(compressed)
saved_estimate = display_tokens - new_estimate
# Anti-thrashing: track compression effectiveness
savings_pct = (saved_estimate / display_tokens * 100) if display_tokens > 0 else 0
self._last_compression_savings_pct = savings_pct
if savings_pct < 10:
self._ineffective_compression_count += 1
else:
self._ineffective_compression_count = 0
if not self.quiet_mode:
new_estimate = estimate_messages_tokens_rough(compressed)
saved_estimate = display_tokens - new_estimate
logger.info(
"Compressed: %d -> %d messages (~%d tokens saved, %.0f%%)",
"Compressed: %d -> %d messages (~%d tokens saved)",
n_messages,
len(compressed),
saved_estimate,
savings_pct,
)
logger.info("Compression #%d complete", self.compression_count)
-2
View File
@@ -1162,7 +1162,6 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup
if token:
source_name = "gh_cli" if "gh" in source.lower() else f"env:{source}"
active_sources.add(source_name)
pconfig = PROVIDER_REGISTRY.get(provider)
changed |= _upsert_entry(
entries,
provider,
@@ -1171,7 +1170,6 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup
"source": source_name,
"auth_type": AUTH_TYPE_API_KEY,
"access_token": token,
"base_url": pconfig.inference_base_url if pconfig else "",
"label": source,
},
)
-9
View File
@@ -112,10 +112,6 @@ _RATE_LIMIT_PATTERNS = [
"please retry after",
"resource_exhausted",
"rate increased too quickly", # Alibaba/DashScope throttling
# AWS Bedrock throttling
"throttlingexception",
"too many concurrent requests",
"servicequotaexceededexception",
]
# Usage-limit patterns that need disambiguation (could be billing OR rate_limit)
@@ -175,11 +171,6 @@ _CONTEXT_OVERFLOW_PATTERNS = [
# Chinese error messages (some providers return these)
"超过最大长度",
"上下文长度",
# AWS Bedrock Converse API error patterns
"input is too long",
"max input token",
"input token",
"exceeds the maximum number of input tokens",
]
# Model not found patterns
+2 -14
View File
@@ -28,7 +28,6 @@ Usage in run_agent.py:
from __future__ import annotations
import json
import logging
import re
from typing import Any, Dict, List, Optional
@@ -44,22 +43,11 @@ logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
_FENCE_TAG_RE = re.compile(r'</?\s*memory-context\s*>', re.IGNORECASE)
_INTERNAL_CONTEXT_RE = re.compile(
r'<\s*memory-context\s*>[\s\S]*?</\s*memory-context\s*>',
re.IGNORECASE,
)
_INTERNAL_NOTE_RE = re.compile(
r'\[System note:\s*The following is recalled memory context,\s*NOT new user input\.\s*Treat as informational background data\.\]\s*',
re.IGNORECASE,
)
def sanitize_context(text: str) -> str:
"""Strip fence tags, injected context blocks, and system notes from provider output."""
text = _INTERNAL_CONTEXT_RE.sub('', text)
text = _INTERNAL_NOTE_RE.sub('', text)
text = _FENCE_TAG_RE.sub('', text)
return text
"""Strip fence-escape sequences from provider output."""
return _FENCE_TAG_RE.sub('', text)
def build_memory_context_block(raw_context: str) -> str:
-11
View File
@@ -36,7 +36,6 @@ _PROVIDER_PREFIXES: frozenset[str] = frozenset({
"opencode", "zen", "go", "vercel", "kilo", "dashscope", "aliyun", "qwen",
"mimo", "xiaomi-mimo",
"arcee-ai", "arceeai",
"xai", "x-ai", "x.ai", "grok",
"qwen-portal",
})
@@ -1012,16 +1011,6 @@ def get_model_context_length(
if ctx:
return ctx
# 4b. AWS Bedrock — use static context length table.
# Bedrock's ListFoundationModels doesn't expose context window sizes,
# so we maintain a curated table in bedrock_adapter.py.
if provider == "bedrock" or (base_url and "bedrock-runtime" in base_url):
try:
from agent.bedrock_adapter import get_bedrock_context_length
return get_bedrock_context_length(model)
except ImportError:
pass # boto3 not installed — fall through to generic resolution
# 5. Provider-aware lookups (before generic OpenRouter cache)
# These are provider-specific and take priority over the generic OR cache,
# since the same model can have different context limits per provider
-182
View File
@@ -1,182 +0,0 @@
"""Cross-session rate limit guard for Nous Portal.
Writes rate limit state to a shared file so all sessions (CLI, gateway,
cron, auxiliary) can check whether Nous Portal is currently rate-limited
before making requests. Prevents retry amplification when RPH is tapped.
Each 429 from Nous triggers up to 9 API calls per conversation turn
(3 SDK retries x 3 Hermes retries), and every one of those calls counts
against RPH. By recording the rate limit state on first 429 and checking
it before subsequent attempts, we eliminate the amplification effect.
"""
from __future__ import annotations
import json
import logging
import os
import tempfile
import time
from typing import Any, Mapping, Optional
logger = logging.getLogger(__name__)
_STATE_SUBDIR = "rate_limits"
_STATE_FILENAME = "nous.json"
def _state_path() -> str:
"""Return the path to the Nous rate limit state file."""
try:
from hermes_constants import get_hermes_home
base = get_hermes_home()
except ImportError:
base = os.path.join(os.path.expanduser("~"), ".hermes")
return os.path.join(base, _STATE_SUBDIR, _STATE_FILENAME)
def _parse_reset_seconds(headers: Optional[Mapping[str, str]]) -> Optional[float]:
"""Extract the best available reset-time estimate from response headers.
Priority:
1. x-ratelimit-reset-requests-1h (hourly RPH window — most useful)
2. x-ratelimit-reset-requests (per-minute RPM window)
3. retry-after (generic HTTP header)
Returns seconds-from-now, or None if no usable header found.
"""
if not headers:
return None
lowered = {k.lower(): v for k, v in headers.items()}
for key in (
"x-ratelimit-reset-requests-1h",
"x-ratelimit-reset-requests",
"retry-after",
):
raw = lowered.get(key)
if raw is not None:
try:
val = float(raw)
if val > 0:
return val
except (TypeError, ValueError):
pass
return None
def record_nous_rate_limit(
*,
headers: Optional[Mapping[str, str]] = None,
error_context: Optional[dict[str, Any]] = None,
default_cooldown: float = 300.0,
) -> None:
"""Record that Nous Portal is rate-limited.
Parses the reset time from response headers or error context.
Falls back to ``default_cooldown`` (5 minutes) if no reset info
is available. Writes to a shared file that all sessions can read.
Args:
headers: HTTP response headers from the 429 error.
error_context: Structured error context from _extract_api_error_context().
default_cooldown: Fallback cooldown in seconds when no header data.
"""
now = time.time()
reset_at = None
# Try headers first (most accurate)
header_seconds = _parse_reset_seconds(headers)
if header_seconds is not None:
reset_at = now + header_seconds
# Try error_context reset_at (from body parsing)
if reset_at is None and isinstance(error_context, dict):
ctx_reset = error_context.get("reset_at")
if isinstance(ctx_reset, (int, float)) and ctx_reset > now:
reset_at = float(ctx_reset)
# Default cooldown
if reset_at is None:
reset_at = now + default_cooldown
path = _state_path()
try:
state_dir = os.path.dirname(path)
os.makedirs(state_dir, exist_ok=True)
state = {
"reset_at": reset_at,
"recorded_at": now,
"reset_seconds": reset_at - now,
}
# Atomic write: write to temp file + rename
fd, tmp_path = tempfile.mkstemp(dir=state_dir, suffix=".tmp")
try:
with os.fdopen(fd, "w") as f:
json.dump(state, f)
os.replace(tmp_path, path)
except Exception:
# Clean up temp file on failure
try:
os.unlink(tmp_path)
except OSError:
pass
raise
logger.info(
"Nous rate limit recorded: resets in %.0fs (at %.0f)",
reset_at - now, reset_at,
)
except Exception as exc:
logger.debug("Failed to write Nous rate limit state: %s", exc)
def nous_rate_limit_remaining() -> Optional[float]:
"""Check if Nous Portal is currently rate-limited.
Returns:
Seconds remaining until reset, or None if not rate-limited.
"""
path = _state_path()
try:
with open(path) as f:
state = json.load(f)
reset_at = state.get("reset_at", 0)
remaining = reset_at - time.time()
if remaining > 0:
return remaining
# Expired — clean up
try:
os.unlink(path)
except OSError:
pass
return None
except (FileNotFoundError, json.JSONDecodeError, KeyError, TypeError):
return None
def clear_nous_rate_limit() -> None:
"""Clear the rate limit state (e.g., after a successful Nous request)."""
try:
os.unlink(_state_path())
except FileNotFoundError:
pass
except OSError as exc:
logger.debug("Failed to clear Nous rate limit state: %s", exc)
def format_remaining(seconds: float) -> str:
"""Format seconds remaining into human-readable duration."""
s = max(0, int(seconds))
if s < 60:
return f"{s}s"
if s < 3600:
m, sec = divmod(s, 60)
return f"{m}m {sec}s" if sec else f"{m}m"
h, remainder = divmod(s, 3600)
m = remainder // 60
return f"{h}h {m}m" if m else f"{h}h"
+1 -3
View File
@@ -295,9 +295,7 @@ PLATFORM_HINTS = {
),
"telegram": (
"You are on a text messaging communication platform, Telegram. "
"Standard markdown is automatically converted to Telegram format. "
"Supported: **bold**, *italic*, ~~strikethrough~~, ||spoiler||, "
"`inline code`, ```code blocks```, [links](url), and ## headers. "
"Please do not use markdown as it does not render. "
"You can send media files natively: to deliver a file to the user, "
"include MEDIA:/absolute/path/to/file in your response. Images "
"(.png, .jpg, .webp) appear as photos, audio (.ogg) sends as voice "
-17
View File
@@ -93,17 +93,6 @@ _DB_CONNSTR_RE = re.compile(
re.IGNORECASE,
)
# JWT tokens: header.payload[.signature] — always start with "eyJ" (base64 for "{")
# Matches 1-part (header only), 2-part (header.payload), and full 3-part JWTs.
_JWT_RE = re.compile(
r"eyJ[A-Za-z0-9_-]{10,}" # Header (always starts with eyJ)
r"(?:\.[A-Za-z0-9_=-]{4,}){0,2}" # Optional payload and/or signature
)
# Discord user/role mentions: <@123456789012345678> or <@!123456789012345678>
# Snowflake IDs are 17-20 digit integers that resolve to specific Discord accounts.
_DISCORD_MENTION_RE = re.compile(r"<@!?(\d{17,20})>")
# E.164 phone numbers: +<country><number>, 7-15 digits
# Negative lookahead prevents matching hex strings or identifiers
_SIGNAL_PHONE_RE = re.compile(r"(\+[1-9]\d{6,14})(?![A-Za-z0-9])")
@@ -170,12 +159,6 @@ def redact_sensitive_text(text: str) -> str:
# Database connection string passwords
text = _DB_CONNSTR_RE.sub(lambda m: f"{m.group(1)}***{m.group(3)}", text)
# JWT tokens (eyJ... — base64-encoded JSON headers)
text = _JWT_RE.sub(lambda m: _mask_token(m.group(0)), text)
# Discord user/role mentions (<@snowflake_id>)
text = _DISCORD_MENTION_RE.sub(lambda m: f"<@{'!' if '!' in m.group(0) else ''}***>", text)
# E.164 phone numbers (Signal, WhatsApp)
def _redact_phone(m):
phone = m.group(1)
+2 -11
View File
@@ -12,8 +12,6 @@ from datetime import datetime
from pathlib import Path
from typing import Any, Dict, Optional
from hermes_constants import display_hermes_home
logger = logging.getLogger(__name__)
_skill_commands: Dict[str, Dict[str, Any]] = {}
@@ -72,14 +70,7 @@ def _load_skill_payload(skill_identifier: str, task_id: str | None = None) -> tu
skill_name = str(loaded_skill.get("name") or normalized)
skill_path = str(loaded_skill.get("path") or "")
skill_dir = None
# Prefer the absolute skill_dir returned by skill_view() — this is
# correct for both local and external skills. Fall back to the old
# SKILLS_DIR-relative reconstruction only when skill_dir is absent
# (e.g. legacy skill_view responses).
abs_skill_dir = loaded_skill.get("skill_dir")
if abs_skill_dir:
skill_dir = Path(abs_skill_dir)
elif skill_path:
if skill_path:
try:
skill_dir = SKILLS_DIR / Path(skill_path).parent
except Exception:
@@ -117,7 +108,7 @@ def _inject_skill_config(loaded_skill: dict[str, Any], parts: list[str]) -> None
if not resolved:
return
lines = ["", f"[Skill config (from {display_hermes_home()}/config.yaml):"]
lines = ["", "[Skill config (from ~/.hermes/config.yaml):"]
for key, value in resolved.items():
display_val = str(value) if value else "(not set)"
lines.append(f" {key} = {display_val}")
-74
View File
@@ -284,80 +284,6 @@ _OFFICIAL_DOCS_PRICING: Dict[tuple[str, str], PricingEntry] = {
source_url="https://ai.google.dev/pricing",
pricing_version="google-pricing-2026-03-16",
),
# AWS Bedrock — pricing per the Bedrock pricing page.
# Bedrock charges the same per-token rates as the model provider but
# through AWS billing. These are the on-demand prices (no commitment).
# Source: https://aws.amazon.com/bedrock/pricing/
(
"bedrock",
"anthropic.claude-opus-4-6",
): PricingEntry(
input_cost_per_million=Decimal("15.00"),
output_cost_per_million=Decimal("75.00"),
source="official_docs_snapshot",
source_url="https://aws.amazon.com/bedrock/pricing/",
pricing_version="bedrock-pricing-2026-04",
),
(
"bedrock",
"anthropic.claude-sonnet-4-6",
): PricingEntry(
input_cost_per_million=Decimal("3.00"),
output_cost_per_million=Decimal("15.00"),
source="official_docs_snapshot",
source_url="https://aws.amazon.com/bedrock/pricing/",
pricing_version="bedrock-pricing-2026-04",
),
(
"bedrock",
"anthropic.claude-sonnet-4-5",
): PricingEntry(
input_cost_per_million=Decimal("3.00"),
output_cost_per_million=Decimal("15.00"),
source="official_docs_snapshot",
source_url="https://aws.amazon.com/bedrock/pricing/",
pricing_version="bedrock-pricing-2026-04",
),
(
"bedrock",
"anthropic.claude-haiku-4-5",
): PricingEntry(
input_cost_per_million=Decimal("0.80"),
output_cost_per_million=Decimal("4.00"),
source="official_docs_snapshot",
source_url="https://aws.amazon.com/bedrock/pricing/",
pricing_version="bedrock-pricing-2026-04",
),
(
"bedrock",
"amazon.nova-pro",
): PricingEntry(
input_cost_per_million=Decimal("0.80"),
output_cost_per_million=Decimal("3.20"),
source="official_docs_snapshot",
source_url="https://aws.amazon.com/bedrock/pricing/",
pricing_version="bedrock-pricing-2026-04",
),
(
"bedrock",
"amazon.nova-lite",
): PricingEntry(
input_cost_per_million=Decimal("0.06"),
output_cost_per_million=Decimal("0.24"),
source="official_docs_snapshot",
source_url="https://aws.amazon.com/bedrock/pricing/",
pricing_version="bedrock-pricing-2026-04",
),
(
"bedrock",
"amazon.nova-micro",
): PricingEntry(
input_cost_per_million=Decimal("0.035"),
output_cost_per_million=Decimal("0.14"),
source="official_docs_snapshot",
source_url="https://aws.amazon.com/bedrock/pricing/",
pricing_version="bedrock-pricing-2026-04",
),
}
+1 -13
View File
@@ -16,7 +16,7 @@ model:
# "nous" - Nous Portal OAuth (requires: hermes login)
# "nous-api" - Nous Portal API key (requires: NOUS_API_KEY)
# "anthropic" - Direct Anthropic API (requires: ANTHROPIC_API_KEY)
# "openai-codex" - OpenAI Codex (requires: hermes auth)
# "openai-codex" - OpenAI Codex (requires: hermes login --provider openai-codex)
# "copilot" - GitHub Copilot / GitHub Models (requires: GITHUB_TOKEN)
# "gemini" - Use Google AI Studio direct (requires: GOOGLE_API_KEY or GEMINI_API_KEY)
# "zai" - Use z.ai / ZhipuAI GLM models (requires: GLM_API_KEY)
@@ -564,18 +564,6 @@ platform_toolsets:
homeassistant: [hermes-homeassistant]
qqbot: [hermes-qqbot]
# =============================================================================
# Gateway Platform Settings
# =============================================================================
# Optional per-platform messaging settings.
# Platform-specific knobs live under `extra`.
#
# platforms:
# telegram:
# reply_to_mode: "first" # off | first | all
# extra:
# disable_link_previews: false # Set true to suppress Telegram URL previews in bot messages
# ─────────────────────────────────────────────────────────────────────────────
# Available toolsets (use these names in platform_toolsets or the toolsets list)
#
+33 -31
View File
@@ -989,7 +989,6 @@ def _prune_orphaned_branches(repo_root: str) -> None:
_ACCENT_ANSI_DEFAULT = "\033[1;38;2;255;215;0m" # True-color #FFD700 bold — fallback
_BOLD = "\033[1m"
_RST = "\033[0m"
_STREAM_PAD = " " # 4-space indent for streamed response text (matches Panel padding)
def _hex_to_ansi(hex_color: str, *, bold: bool = False) -> str:
@@ -1713,9 +1712,9 @@ class HermesCLI:
# Parse and validate toolsets
self.enabled_toolsets = toolsets
if toolsets and "all" not in toolsets and "*" not in toolsets:
# Validate each toolset — MCP server names are resolved via
# live registry aliases (registered during discover_mcp_tools),
# but discovery hasn't run yet at this point, so exclude them.
# Validate each toolset — MCP server names are added by
# _get_platform_tools() but aren't registered in TOOLSETS yet
# (that happens later in _sync_mcp_toolsets), so exclude them.
mcp_names = set((CLI_CONFIG.get("mcp_servers") or {}).keys())
invalid = [t for t in toolsets if not validate_toolset(t) and t not in mcp_names]
if invalid:
@@ -2581,7 +2580,7 @@ class HermesCLI:
_tc = getattr(self, "_stream_text_ansi", "")
while "\n" in self._stream_buf:
line, self._stream_buf = self._stream_buf.split("\n", 1)
_cprint(f"{_STREAM_PAD}{_tc}{line}{_RST}" if _tc else f"{_STREAM_PAD}{line}")
_cprint(f"{_tc}{line}{_RST}" if _tc else line)
def _flush_stream(self) -> None:
"""Emit any remaining partial line from the stream buffer and close the box."""
@@ -2598,7 +2597,7 @@ class HermesCLI:
if self._stream_buf:
_tc = getattr(self, "_stream_text_ansi", "")
_cprint(f"{_STREAM_PAD}{_tc}{self._stream_buf}{_RST}" if _tc else f"{_STREAM_PAD}{self._stream_buf}")
_cprint(f"{_tc}{self._stream_buf}{_RST}" if _tc else self._stream_buf)
self._stream_buf = ""
# Close the response box
@@ -3897,14 +3896,23 @@ class HermesCLI:
def _handle_profile_command(self):
"""Display active profile name and home directory."""
from hermes_constants import display_hermes_home
from hermes_cli.profiles import get_active_profile_name
from hermes_constants import get_hermes_home, display_hermes_home
home = get_hermes_home()
display = display_hermes_home()
profile_name = get_active_profile_name()
profiles_parent = Path.home() / ".hermes" / "profiles"
try:
rel = home.relative_to(profiles_parent)
profile_name = str(rel).split("/")[0]
except ValueError:
profile_name = None
print()
print(f" Profile: {profile_name}")
if profile_name:
print(f" Profile: {profile_name}")
else:
print(" Profile: default")
print(f" Home: {display}")
print()
@@ -4091,8 +4099,6 @@ class HermesCLI:
self.agent.flush_memories(self.conversation_history)
except (Exception, KeyboardInterrupt):
pass
# Trigger memory extraction on the old session before session_id rotates.
self.agent.commit_memory_session(self.conversation_history)
self._notify_session_boundary("on_session_finalize")
elif self.agent:
# First session or empty history — still finalize the old session
@@ -4581,19 +4587,16 @@ class HermesCLI:
self._close_model_picker()
return
provider_data = providers[selected]
# Use the curated model list from list_authenticated_providers()
# (same lists as `hermes model` and gateway pickers).
# Only fall back to the live provider catalog when the curated
# list is empty (e.g. user-defined endpoints with no curated list).
model_list = provider_data.get("models", [])
model_list = []
try:
from hermes_cli.models import provider_model_ids
live = provider_model_ids(provider_data["slug"])
if live:
model_list = live
except Exception:
pass
if not model_list:
try:
from hermes_cli.models import provider_model_ids
live = provider_model_ids(provider_data["slug"])
if live:
model_list = live
except Exception:
pass
model_list = provider_data.get("models", [])
state["stage"] = "model"
state["provider_data"] = provider_data
state["model_list"] = model_list
@@ -5484,8 +5487,7 @@ class HermesCLI:
version = f" v{p['version']}" if p["version"] else ""
tools = f"{p['tools']} tools" if p["tools"] else ""
hooks = f"{p['hooks']} hooks" if p["hooks"] else ""
commands = f"{p['commands']} commands" if p.get("commands") else ""
parts = [x for x in [tools, hooks, commands] if x]
parts = [x for x in [tools, hooks] if x]
detail = f" ({', '.join(parts)})" if parts else ""
error = f"{p['error']}" if p["error"] else ""
print(f" {status} {p['name']}{version}{detail}{error}")
@@ -5759,7 +5761,7 @@ class HermesCLI:
border_style=_resp_color,
style=_resp_text,
box=rich_box.HORIZONTALS,
padding=(1, 4),
padding=(1, 2),
))
else:
_cprint(" (No response generated)")
@@ -5883,7 +5885,7 @@ class HermesCLI:
title_align="left",
border_style=_resp_color,
box=rich_box.HORIZONTALS,
padding=(1, 4),
padding=(1, 2),
))
else:
_cprint(" 💬 /btw: (no response)")
@@ -5950,7 +5952,7 @@ class HermesCLI:
parts = cmd.strip().split(None, 1)
sub = parts[1].lower().strip() if len(parts) > 1 else "status"
_DEFAULT_CDP = "http://127.0.0.1:9222"
_DEFAULT_CDP = "http://localhost:9222"
current = os.environ.get("BROWSER_CDP_URL", "").strip()
if sub.startswith("connect"):
@@ -7646,7 +7648,7 @@ class HermesCLI:
label = " ⚕ Hermes "
fill = w - 2 - len(label)
_cprint(f"\n{_ACCENT}╭─{label}{'' * max(fill - 1, 0)}{_RST}")
_cprint(f"{_STREAM_PAD}{sentence.rstrip()}")
_cprint(sentence.rstrip())
tts_thread = threading.Thread(
target=stream_tts_to_speaker,
@@ -7877,7 +7879,7 @@ class HermesCLI:
border_style=_resp_color,
style=_resp_text,
box=rich_box.HORIZONTALS,
padding=(1, 4),
padding=(1, 2),
))
-6
View File
@@ -501,12 +501,6 @@ def update_job(job_id: str, updates: Dict[str, Any]) -> Optional[Dict[str, Any]]
if schedule_changed:
updated_schedule = updated["schedule"]
# The API may pass schedule as a raw string (e.g. "every 10m")
# instead of a pre-parsed dict. Normalize it the same way
# create_job() does so downstream code can call .get() safely.
if isinstance(updated_schedule, str):
updated_schedule = parse_schedule(updated_schedule)
updated["schedule"] = updated_schedule
updated["schedule_display"] = updates.get(
"schedule_display",
updated_schedule.get("display", updated.get("schedule_display")),
+2 -9
View File
@@ -10,7 +10,6 @@ runs at a time if multiple processes overlap.
import asyncio
import concurrent.futures
import contextvars
import json
import logging
import os
@@ -289,13 +288,11 @@ def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> Option
if wrap_response:
task_name = job.get("name", job["id"])
job_id = job.get("id", "")
delivery_content = (
f"Cronjob Response: {task_name}\n"
f"(job_id: {job_id})\n"
f"-------------\n\n"
f"{content}\n\n"
f"To stop or manage this job, send me a new message (e.g. \"stop reminder {task_name}\")."
f"Note: The agent cannot see this message, and therefore cannot respond to it."
)
else:
delivery_content = content
@@ -771,11 +768,7 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
_cron_inactivity_limit = _cron_timeout if _cron_timeout > 0 else None
_POLL_INTERVAL = 5.0
_cron_pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
# Preserve scheduler-scoped ContextVar state (for example skill-declared
# env passthrough registrations) when the cron run hops into the worker
# thread used for inactivity timeout monitoring.
_cron_context = contextvars.copy_context()
_cron_future = _cron_pool.submit(_cron_context.run, agent.run_conversation, prompt)
_cron_future = _cron_pool.submit(agent.run_conversation, prompt)
_inactivity_timeout = False
try:
if _cron_inactivity_limit is None:
Executable → Regular
+6 -13
View File
@@ -1,14 +1,13 @@
#!/bin/bash
# Docker/Podman entrypoint: bootstrap config files into the mounted volume, then run hermes.
# Docker entrypoint: bootstrap config files into the mounted volume, then run hermes.
set -e
HERMES_HOME="${HERMES_HOME:-/opt/data}"
HERMES_HOME="/opt/data"
INSTALL_DIR="/opt/hermes"
# --- Privilege dropping via gosu ---
# When started as root (the default for Docker, or fakeroot in rootless Podman),
# optionally remap the hermes user/group to match host-side ownership, fix volume
# permissions, then re-exec as hermes.
# When started as root (the default), optionally remap the hermes user/group
# to match host-side ownership, fix volume permissions, then re-exec as hermes.
if [ "$(id -u)" = "0" ]; then
if [ -n "$HERMES_UID" ] && [ "$HERMES_UID" != "$(id -u hermes)" ]; then
echo "Changing hermes UID to $HERMES_UID"
@@ -17,19 +16,13 @@ if [ "$(id -u)" = "0" ]; then
if [ -n "$HERMES_GID" ] && [ "$HERMES_GID" != "$(id -g hermes)" ]; then
echo "Changing hermes GID to $HERMES_GID"
# -o allows non-unique GID (e.g. macOS GID 20 "staff" may already exist
# as "dialout" in the Debian-based container image)
groupmod -o -g "$HERMES_GID" hermes 2>/dev/null || true
groupmod -g "$HERMES_GID" hermes
fi
actual_hermes_uid=$(id -u hermes)
if [ "$(stat -c %u "$HERMES_HOME" 2>/dev/null)" != "$actual_hermes_uid" ]; then
echo "$HERMES_HOME is not owned by $actual_hermes_uid, fixing"
# In rootless Podman the container's "root" is mapped to an unprivileged
# host UID — chown will fail. That's fine: the volume is already owned
# by the mapped user on the host side.
chown -R hermes:hermes "$HERMES_HOME" 2>/dev/null || \
echo "Warning: chown failed (rootless container?) — continuing anyway"
chown -R hermes:hermes "$HERMES_HOME"
fi
echo "Dropping root privileges"
-18
View File
@@ -554,12 +554,6 @@ def load_gateway_config() -> GatewayConfig:
bridged["mention_patterns"] = platform_cfg["mention_patterns"]
if plat == Platform.DISCORD and "channel_skill_bindings" in platform_cfg:
bridged["channel_skill_bindings"] = platform_cfg["channel_skill_bindings"]
if "channel_prompts" in platform_cfg:
channel_prompts = platform_cfg["channel_prompts"]
if isinstance(channel_prompts, dict):
bridged["channel_prompts"] = {str(k): v for k, v in channel_prompts.items()}
else:
bridged["channel_prompts"] = channel_prompts
if not bridged:
continue
plat_data = platforms_data.setdefault(plat.value, {})
@@ -638,18 +632,6 @@ def load_gateway_config() -> GatewayConfig:
os.environ["TELEGRAM_IGNORED_THREADS"] = str(ignored_threads)
if "reactions" in telegram_cfg and not os.getenv("TELEGRAM_REACTIONS"):
os.environ["TELEGRAM_REACTIONS"] = str(telegram_cfg["reactions"]).lower()
if "proxy_url" in telegram_cfg and not os.getenv("TELEGRAM_PROXY"):
os.environ["TELEGRAM_PROXY"] = str(telegram_cfg["proxy_url"]).strip()
if "disable_link_previews" in telegram_cfg:
plat_data = platforms_data.setdefault(Platform.TELEGRAM.value, {})
if not isinstance(plat_data, dict):
plat_data = {}
platforms_data[Platform.TELEGRAM.value] = plat_data
extra = plat_data.setdefault("extra", {})
if not isinstance(extra, dict):
extra = {}
plat_data["extra"] = extra
extra["disable_link_previews"] = telegram_cfg["disable_link_previews"]
whatsapp_cfg = yaml_cfg.get("whatsapp", {})
if isinstance(whatsapp_cfg, dict):
+25 -1
View File
@@ -3,11 +3,12 @@ Event Hook System
A lightweight event-driven system that fires handlers at key lifecycle points.
Hooks are discovered from ~/.hermes/hooks/ directories, each containing:
- HOOK.yaml (metadata: name, description, events list)
- HOOK.yaml (metadata: name, description, events list, optional startup_readiness)
- handler.py (Python handler with async def handle(event_type, context))
Events:
- gateway:startup -- Gateway process starts
- gateway:shutdown -- Gateway process is shutting down
- session:start -- New session created (first message of a new session)
- session:end -- Session ends (user ran /new or /reset)
- session:reset -- Session reset completed (new session entry created)
@@ -31,6 +32,26 @@ from hermes_cli.config import get_hermes_home
HOOKS_DIR = get_hermes_home() / "hooks"
def _normalize_startup_readiness(hook_name: str, manifest: dict[str, Any]) -> Optional[dict[str, Any]]:
"""Validate and normalize optional startup readiness metadata."""
readiness = manifest.get("startup_readiness")
if readiness is None:
return None
if not isinstance(readiness, dict):
print(f"[hooks] Ignoring startup_readiness for {hook_name}: expected mapping", flush=True)
return None
check_id = str(readiness.get("id", "")).strip()
if not check_id:
print(f"[hooks] Ignoring startup_readiness for {hook_name}: missing id", flush=True)
return None
return {
"id": check_id,
"required": bool(readiness.get("required", True)),
}
class HookRegistry:
"""
Discovers, loads, and fires event hooks.
@@ -62,6 +83,7 @@ class HookRegistry:
"description": "Run ~/.hermes/BOOT.md on gateway startup",
"events": ["gateway:startup"],
"path": "(builtin)",
"startup_readiness": None,
})
except Exception as e:
print(f"[hooks] Could not load built-in boot-md hook: {e}", flush=True)
@@ -102,6 +124,7 @@ class HookRegistry:
if not events:
print(f"[hooks] Skipping {hook_name}: no events declared", flush=True)
continue
startup_readiness = _normalize_startup_readiness(hook_name, manifest)
# Dynamically load the handler module
spec = importlib.util.spec_from_file_location(
@@ -128,6 +151,7 @@ class HookRegistry:
"description": manifest.get("description", ""),
"events": events,
"path": str(hook_dir),
"startup_readiness": startup_readiness,
})
print(f"[hooks] Loaded hook '{hook_name}' for events: {events}", flush=True)
+3 -512
View File
@@ -515,8 +515,6 @@ class APIServerAdapter(BasePlatformAdapter):
session_id: Optional[str] = None,
stream_delta_callback=None,
tool_progress_callback=None,
tool_start_callback=None,
tool_complete_callback=None,
) -> Any:
"""
Create an AIAgent instance using the gateway's runtime config.
@@ -555,8 +553,6 @@ class APIServerAdapter(BasePlatformAdapter):
platform="api_server",
stream_delta_callback=stream_delta_callback,
tool_progress_callback=tool_progress_callback,
tool_start_callback=tool_start_callback,
tool_complete_callback=tool_complete_callback,
session_db=self._ensure_session_db(),
fallback_model=fallback_model,
)
@@ -969,427 +965,6 @@ class APIServerAdapter(BasePlatformAdapter):
return response
async def _write_sse_responses(
self,
request: "web.Request",
response_id: str,
model: str,
created_at: int,
stream_q,
agent_task,
agent_ref,
conversation_history: List[Dict[str, str]],
user_message: str,
instructions: Optional[str],
conversation: Optional[str],
store: bool,
session_id: str,
) -> "web.StreamResponse":
"""Write an SSE stream for POST /v1/responses (OpenAI Responses API).
Emits spec-compliant event types as the agent runs:
- ``response.created`` initial envelope (status=in_progress)
- ``response.output_text.delta`` / ``response.output_text.done``
streamed assistant text
- ``response.output_item.added`` / ``response.output_item.done``
with ``item.type == "function_call"`` when the agent invokes a
tool (both events fire; the ``done`` event carries the finalized
``arguments`` string)
- ``response.output_item.added`` with
``item.type == "function_call_output"`` tool result with
``{call_id, output, status}``
- ``response.completed`` terminal event carrying the full
response object with all output items + usage (same payload
shape as the non-streaming path for parity)
- ``response.failed`` terminal event on agent error
If the client disconnects mid-stream, ``agent.interrupt()`` is
called so the agent stops issuing upstream LLM calls, then the
asyncio task is cancelled. When ``store=True`` the full response
is persisted to the ResponseStore in a ``finally`` block so GET
/v1/responses/{id} and ``previous_response_id`` chaining work the
same as the batch path.
"""
import queue as _q
sse_headers = {
"Content-Type": "text/event-stream",
"Cache-Control": "no-cache",
"X-Accel-Buffering": "no",
}
origin = request.headers.get("Origin", "")
cors = self._cors_headers_for_origin(origin) if origin else None
if cors:
sse_headers.update(cors)
if session_id:
sse_headers["X-Hermes-Session-Id"] = session_id
response = web.StreamResponse(status=200, headers=sse_headers)
await response.prepare(request)
# State accumulated during the stream
final_text_parts: List[str] = []
# Track open function_call items by name so we can emit a matching
# ``done`` event when the tool completes. Order preserved.
pending_tool_calls: List[Dict[str, Any]] = []
# Output items we've emitted so far (used to build the terminal
# response.completed payload). Kept in the order they appeared.
emitted_items: List[Dict[str, Any]] = []
# Monotonic counter for output_index (spec requires it).
output_index = 0
# Monotonic counter for call_id generation if the agent doesn't
# provide one (it doesn't, from tool_progress_callback).
call_counter = 0
# Canonical Responses SSE events include a monotonically increasing
# sequence_number. Add it server-side for every emitted event so
# clients that validate the OpenAI event schema can parse our stream.
sequence_number = 0
# Track the assistant message item id + content index for text
# delta events — the spec ties deltas to a specific item.
message_item_id = f"msg_{uuid.uuid4().hex[:24]}"
message_output_index: Optional[int] = None
message_opened = False
async def _write_event(event_type: str, data: Dict[str, Any]) -> None:
nonlocal sequence_number
if "sequence_number" not in data:
data["sequence_number"] = sequence_number
sequence_number += 1
payload = f"event: {event_type}\ndata: {json.dumps(data)}\n\n"
await response.write(payload.encode())
def _envelope(status: str) -> Dict[str, Any]:
env: Dict[str, Any] = {
"id": response_id,
"object": "response",
"status": status,
"created_at": created_at,
"model": model,
}
return env
final_response_text = ""
agent_error: Optional[str] = None
usage: Dict[str, int] = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
try:
# response.created — initial envelope, status=in_progress
created_env = _envelope("in_progress")
created_env["output"] = []
await _write_event("response.created", {
"type": "response.created",
"response": created_env,
})
last_activity = time.monotonic()
async def _open_message_item() -> None:
"""Emit response.output_item.added for the assistant message
the first time any text delta arrives."""
nonlocal message_opened, message_output_index, output_index
if message_opened:
return
message_opened = True
message_output_index = output_index
output_index += 1
item = {
"id": message_item_id,
"type": "message",
"status": "in_progress",
"role": "assistant",
"content": [],
}
await _write_event("response.output_item.added", {
"type": "response.output_item.added",
"output_index": message_output_index,
"item": item,
})
async def _emit_text_delta(delta_text: str) -> None:
await _open_message_item()
final_text_parts.append(delta_text)
await _write_event("response.output_text.delta", {
"type": "response.output_text.delta",
"item_id": message_item_id,
"output_index": message_output_index,
"content_index": 0,
"delta": delta_text,
"logprobs": [],
})
async def _emit_tool_started(payload: Dict[str, Any]) -> str:
"""Emit response.output_item.added for a function_call.
Returns the call_id so the matching completion event can
reference it. Prefer the real ``tool_call_id`` from the
agent when available; fall back to a generated call id for
safety in tests or older code paths.
"""
nonlocal output_index, call_counter
call_counter += 1
call_id = payload.get("tool_call_id") or f"call_{response_id[5:]}_{call_counter}"
args = payload.get("arguments", {})
if isinstance(args, dict):
arguments_str = json.dumps(args)
else:
arguments_str = str(args)
item = {
"id": f"fc_{uuid.uuid4().hex[:24]}",
"type": "function_call",
"status": "in_progress",
"name": payload.get("name", ""),
"call_id": call_id,
"arguments": arguments_str,
}
idx = output_index
output_index += 1
pending_tool_calls.append({
"call_id": call_id,
"name": payload.get("name", ""),
"arguments": arguments_str,
"item_id": item["id"],
"output_index": idx,
})
emitted_items.append({
"type": "function_call",
"name": payload.get("name", ""),
"arguments": arguments_str,
"call_id": call_id,
})
await _write_event("response.output_item.added", {
"type": "response.output_item.added",
"output_index": idx,
"item": item,
})
return call_id
async def _emit_tool_completed(payload: Dict[str, Any]) -> None:
"""Emit response.output_item.done (function_call) followed
by response.output_item.added (function_call_output)."""
nonlocal output_index
call_id = payload.get("tool_call_id")
result = payload.get("result", "")
pending = None
if call_id:
for i, p in enumerate(pending_tool_calls):
if p["call_id"] == call_id:
pending = pending_tool_calls.pop(i)
break
if pending is None:
# Completion without a matching start — skip to avoid
# emitting orphaned done events.
return
# function_call done
done_item = {
"id": pending["item_id"],
"type": "function_call",
"status": "completed",
"name": pending["name"],
"call_id": pending["call_id"],
"arguments": pending["arguments"],
}
await _write_event("response.output_item.done", {
"type": "response.output_item.done",
"output_index": pending["output_index"],
"item": done_item,
})
# function_call_output added (result)
result_str = result if isinstance(result, str) else json.dumps(result)
output_parts = [{"type": "input_text", "text": result_str}]
output_item = {
"id": f"fco_{uuid.uuid4().hex[:24]}",
"type": "function_call_output",
"call_id": pending["call_id"],
"output": output_parts,
"status": "completed",
}
idx = output_index
output_index += 1
emitted_items.append({
"type": "function_call_output",
"call_id": pending["call_id"],
"output": output_parts,
})
await _write_event("response.output_item.added", {
"type": "response.output_item.added",
"output_index": idx,
"item": output_item,
})
await _write_event("response.output_item.done", {
"type": "response.output_item.done",
"output_index": idx,
"item": output_item,
})
# Main drain loop — thread-safe queue fed by agent callbacks.
async def _dispatch(it) -> None:
"""Route a queue item to the correct SSE emitter.
Plain strings are text deltas. Tagged tuples with
``__tool_started__`` / ``__tool_completed__`` prefixes
are tool lifecycle events.
"""
if isinstance(it, tuple) and len(it) == 2 and isinstance(it[0], str):
tag, payload = it
if tag == "__tool_started__":
await _emit_tool_started(payload)
elif tag == "__tool_completed__":
await _emit_tool_completed(payload)
# Unknown tags are silently ignored (forward-compat).
elif isinstance(it, str):
await _emit_text_delta(it)
# Other types (non-string, non-tuple) are silently dropped.
loop = asyncio.get_event_loop()
while True:
try:
item = await loop.run_in_executor(None, lambda: stream_q.get(timeout=0.5))
except _q.Empty:
if agent_task.done():
# Drain remaining
while True:
try:
item = stream_q.get_nowait()
if item is None:
break
await _dispatch(item)
last_activity = time.monotonic()
except _q.Empty:
break
break
if time.monotonic() - last_activity >= CHAT_COMPLETIONS_SSE_KEEPALIVE_SECONDS:
await response.write(b": keepalive\n\n")
last_activity = time.monotonic()
continue
if item is None: # EOS sentinel
break
await _dispatch(item)
last_activity = time.monotonic()
# Pick up agent result + usage from the completed task
try:
result, agent_usage = await agent_task
usage = agent_usage or usage
# If the agent produced a final_response but no text
# deltas were streamed (e.g. some providers only emit
# the full response at the end), emit a single fallback
# delta so Responses clients still receive a live text part.
agent_final = result.get("final_response", "") if isinstance(result, dict) else ""
if agent_final and not final_text_parts:
await _emit_text_delta(agent_final)
if agent_final and not final_response_text:
final_response_text = agent_final
if isinstance(result, dict) and result.get("error") and not final_response_text:
agent_error = result["error"]
except Exception as e: # noqa: BLE001
logger.error("Error running agent for streaming responses: %s", e, exc_info=True)
agent_error = str(e)
# Close the message item if it was opened
final_response_text = "".join(final_text_parts) or final_response_text
if message_opened:
await _write_event("response.output_text.done", {
"type": "response.output_text.done",
"item_id": message_item_id,
"output_index": message_output_index,
"content_index": 0,
"text": final_response_text,
"logprobs": [],
})
msg_done_item = {
"id": message_item_id,
"type": "message",
"status": "completed",
"role": "assistant",
"content": [
{"type": "output_text", "text": final_response_text}
],
}
await _write_event("response.output_item.done", {
"type": "response.output_item.done",
"output_index": message_output_index,
"item": msg_done_item,
})
# Always append a final message item in the completed
# response envelope so clients that only parse the terminal
# payload still see the assistant text. This mirrors the
# shape produced by _extract_output_items in the batch path.
final_items: List[Dict[str, Any]] = list(emitted_items)
final_items.append({
"type": "message",
"role": "assistant",
"content": [
{"type": "output_text", "text": final_response_text or (agent_error or "")}
],
})
if agent_error:
failed_env = _envelope("failed")
failed_env["output"] = final_items
failed_env["error"] = {"message": agent_error, "type": "server_error"}
failed_env["usage"] = {
"input_tokens": usage.get("input_tokens", 0),
"output_tokens": usage.get("output_tokens", 0),
"total_tokens": usage.get("total_tokens", 0),
}
await _write_event("response.failed", {
"type": "response.failed",
"response": failed_env,
})
else:
completed_env = _envelope("completed")
completed_env["output"] = final_items
completed_env["usage"] = {
"input_tokens": usage.get("input_tokens", 0),
"output_tokens": usage.get("output_tokens", 0),
"total_tokens": usage.get("total_tokens", 0),
}
await _write_event("response.completed", {
"type": "response.completed",
"response": completed_env,
})
# Persist for future chaining / GET retrieval, mirroring
# the batch path behavior.
if store:
full_history = list(conversation_history)
full_history.append({"role": "user", "content": user_message})
if isinstance(result, dict) and result.get("messages"):
full_history.extend(result["messages"])
else:
full_history.append({"role": "assistant", "content": final_response_text})
self._response_store.put(response_id, {
"response": completed_env,
"conversation_history": full_history,
"instructions": instructions,
"session_id": session_id,
})
if conversation:
self._response_store.set_conversation(conversation, response_id)
except (ConnectionResetError, ConnectionAbortedError, BrokenPipeError, OSError):
# Client disconnected — interrupt the agent so it stops
# making upstream LLM calls, then cancel the task.
agent = agent_ref[0] if agent_ref else None
if agent is not None:
try:
agent.interrupt("SSE client disconnected")
except Exception:
pass
if not agent_task.done():
agent_task.cancel()
try:
await agent_task
except (asyncio.CancelledError, Exception):
pass
logger.info("SSE client disconnected; interrupted agent task %s", response_id)
return response
async def _handle_responses(self, request: "web.Request") -> "web.Response":
"""POST /v1/responses — OpenAI Responses API format."""
auth_err = self._check_auth(request)
@@ -1460,13 +1035,11 @@ class APIServerAdapter(BasePlatformAdapter):
if previous_response_id:
logger.debug("Both conversation_history and previous_response_id provided; using conversation_history")
stored_session_id = None
if not conversation_history and previous_response_id:
stored = self._response_store.get(previous_response_id)
if stored is None:
return web.json_response(_openai_error(f"Previous response not found: {previous_response_id}"), status=404)
conversation_history = list(stored.get("conversation_history", []))
stored_session_id = stored.get("session_id")
# If no instructions provided, carry forward from previous
if instructions is None:
instructions = stored.get("instructions")
@@ -1484,83 +1057,8 @@ class APIServerAdapter(BasePlatformAdapter):
if body.get("truncation") == "auto" and len(conversation_history) > 100:
conversation_history = conversation_history[-100:]
# Reuse session from previous_response_id chain so the dashboard
# groups the entire conversation under one session entry.
session_id = stored_session_id or str(uuid.uuid4())
stream = bool(body.get("stream", False))
if stream:
# Streaming branch — emit OpenAI Responses SSE events as the
# agent runs so frontends can render text deltas and tool
# calls in real time. See _write_sse_responses for details.
import queue as _q
_stream_q: _q.Queue = _q.Queue()
def _on_delta(delta):
# None from the agent is a CLI box-close signal, not EOS.
# Forwarding would kill the SSE stream prematurely; the
# SSE writer detects completion via agent_task.done().
if delta is not None:
_stream_q.put(delta)
def _on_tool_progress(event_type, name, preview, args, **kwargs):
"""Queue non-start tool progress events if needed in future.
The structured Responses stream uses ``tool_start_callback``
and ``tool_complete_callback`` for exact call-id correlation,
so progress events are currently ignored here.
"""
return
def _on_tool_start(tool_call_id, function_name, function_args):
"""Queue a started tool for live function_call streaming."""
_stream_q.put(("__tool_started__", {
"tool_call_id": tool_call_id,
"name": function_name,
"arguments": function_args or {},
}))
def _on_tool_complete(tool_call_id, function_name, function_args, function_result):
"""Queue a completed tool result for live function_call_output streaming."""
_stream_q.put(("__tool_completed__", {
"tool_call_id": tool_call_id,
"name": function_name,
"arguments": function_args or {},
"result": function_result,
}))
agent_ref = [None]
agent_task = asyncio.ensure_future(self._run_agent(
user_message=user_message,
conversation_history=conversation_history,
ephemeral_system_prompt=instructions,
session_id=session_id,
stream_delta_callback=_on_delta,
tool_progress_callback=_on_tool_progress,
tool_start_callback=_on_tool_start,
tool_complete_callback=_on_tool_complete,
agent_ref=agent_ref,
))
response_id = f"resp_{uuid.uuid4().hex[:28]}"
model_name = body.get("model", self._model_name)
created_at = int(time.time())
return await self._write_sse_responses(
request=request,
response_id=response_id,
model=model_name,
created_at=created_at,
stream_q=_stream_q,
agent_task=agent_task,
agent_ref=agent_ref,
conversation_history=conversation_history,
user_message=user_message,
instructions=instructions,
conversation=conversation,
store=store,
session_id=session_id,
)
# Run the agent (with Idempotency-Key support)
session_id = str(uuid.uuid4())
async def _compute_response():
return await self._run_agent(
@@ -1635,7 +1133,6 @@ class APIServerAdapter(BasePlatformAdapter):
"response": response_data,
"conversation_history": full_history,
"instructions": instructions,
"session_id": session_id,
})
# Update conversation mapping so the next request with the same
# conversation name automatically chains to this response
@@ -1989,8 +1486,6 @@ class APIServerAdapter(BasePlatformAdapter):
session_id: Optional[str] = None,
stream_delta_callback=None,
tool_progress_callback=None,
tool_start_callback=None,
tool_complete_callback=None,
agent_ref: Optional[list] = None,
) -> tuple:
"""
@@ -2012,8 +1507,6 @@ class APIServerAdapter(BasePlatformAdapter):
session_id=session_id,
stream_delta_callback=stream_delta_callback,
tool_progress_callback=tool_progress_callback,
tool_start_callback=tool_start_callback,
tool_complete_callback=tool_complete_callback,
)
if agent_ref is not None:
agent_ref[0] = agent
@@ -2150,12 +1643,10 @@ class APIServerAdapter(BasePlatformAdapter):
if previous_response_id:
logger.debug("Both conversation_history and previous_response_id provided; using conversation_history")
stored_session_id = None
if not conversation_history and previous_response_id:
stored = self._response_store.get(previous_response_id)
if stored:
conversation_history = list(stored.get("conversation_history", []))
stored_session_id = stored.get("session_id")
if instructions is None:
instructions = stored.get("instructions")
@@ -2174,7 +1665,7 @@ class APIServerAdapter(BasePlatformAdapter):
)
conversation_history.append({"role": msg["role"], "content": str(content)})
session_id = body.get("session_id") or stored_session_id or run_id
session_id = body.get("session_id") or run_id
ephemeral_system_prompt = instructions
async def _run_and_close():
-62
View File
@@ -682,10 +682,6 @@ class MessageEvent:
# Auto-loaded skill(s) for topic/channel bindings (e.g., Telegram DM Topics,
# Discord channel_skill_bindings). A single name or ordered list.
auto_skill: Optional[str | list[str]] = None
# Per-channel ephemeral system prompt (e.g. Discord channel_prompts).
# Applied at API call time and never persisted to transcript history.
channel_prompt: Optional[str] = None
# Internal flag — set for synthetic events (e.g. background process
# completion notifications) that must bypass user authorization checks.
@@ -780,36 +776,6 @@ _RETRYABLE_ERROR_PATTERNS = (
MessageHandler = Callable[[MessageEvent], Awaitable[Optional[str]]]
def resolve_channel_prompt(
config_extra: dict,
channel_id: str,
parent_id: str | None = None,
) -> str | None:
"""Resolve a per-channel ephemeral prompt from platform config.
Looks up ``channel_prompts`` in the adapter's ``config.extra`` dict.
Prefers an exact match on *channel_id*; falls back to *parent_id*
(useful for forum threads / child channels inheriting a parent prompt).
Returns the prompt string, or None if no match is found. Blank/whitespace-
only prompts are treated as absent.
"""
prompts = config_extra.get("channel_prompts") or {}
if not isinstance(prompts, dict):
return None
for key in (channel_id, parent_id):
if not key:
continue
prompt = prompts.get(key)
if prompt is None:
continue
prompt = str(prompt).strip()
if prompt:
return prompt
return None
class BasePlatformAdapter(ABC):
"""
Base class for platform adapters.
@@ -839,11 +805,6 @@ class BasePlatformAdapter(ABC):
# Gateway shutdown cancels these so an old gateway instance doesn't keep
# working on a task after --replace or manual restarts.
self._background_tasks: set[asyncio.Task] = set()
# One-shot callbacks to fire after the main response is delivered.
# Keyed by session_key. GatewayRunner uses this to defer
# background-review notifications ("💾 Skill created") until the
# primary reply has been sent.
self._post_delivery_callbacks: Dict[str, Callable] = {}
self._expected_cancelled_tasks: set[asyncio.Task] = set()
self._busy_session_handler: Optional[Callable[[MessageEvent, str], Awaitable[bool]]] = None
# Chats where auto-TTS on voice input is disabled (set by /voice off)
@@ -1663,21 +1624,6 @@ class BasePlatformAdapter(ABC):
# streaming already delivered the text (already_sent=True) or
# when the message was queued behind an active agent. Log at
# DEBUG to avoid noisy warnings for expected behavior.
#
# Suppress stale response when the session was interrupted by a
# new message that hasn't been consumed yet. The pending message
# is processed by the pending-message handler below (#8221/#2483).
if (
response
and interrupt_event.is_set()
and session_key in self._pending_messages
):
logger.info(
"[%s] Suppressing stale response for interrupted session %s",
self.name,
session_key,
)
response = None
if not response:
logger.debug("[%s] Handler returned empty/None response for %s", self.name, event.source.chat_id)
if response:
@@ -1899,14 +1845,6 @@ class BasePlatformAdapter(ABC):
except Exception:
pass # Last resort — don't let error reporting crash the handler
finally:
# Fire any one-shot post-delivery callback registered for this
# session (e.g. deferred background-review notifications).
_post_cb = getattr(self, "_post_delivery_callbacks", {}).pop(session_key, None)
if callable(_post_cb):
try:
_post_cb()
except Exception:
pass
# Stop typing indicator
typing_task.cancel()
try:
+1 -150
View File
@@ -1379,68 +1379,6 @@ class DiscordAdapter(BasePlatformAdapter):
)
return await super().send_image(chat_id, image_url, caption, reply_to)
async def send_animation(
self,
chat_id: str,
animation_url: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Send an animated GIF natively as a Discord file attachment."""
if not self._client:
return SendResult(success=False, error="Not connected")
if not is_safe_url(animation_url):
logger.warning("[%s] Blocked unsafe animation URL during Discord send_animation", self.name)
return await super().send_animation(chat_id, animation_url, caption, reply_to, metadata=metadata)
try:
import aiohttp
channel = self._client.get_channel(int(chat_id))
if not channel:
channel = await self._client.fetch_channel(int(chat_id))
if not channel:
return SendResult(success=False, error=f"Channel {chat_id} not found")
# Download the GIF and send as a Discord file attachment
# (Discord renders .gif attachments as auto-playing animations inline)
from gateway.platforms.base import resolve_proxy_url, proxy_kwargs_for_aiohttp
_proxy = resolve_proxy_url(platform_env_var="DISCORD_PROXY")
_sess_kw, _req_kw = proxy_kwargs_for_aiohttp(_proxy)
async with aiohttp.ClientSession(**_sess_kw) as session:
async with session.get(animation_url, timeout=aiohttp.ClientTimeout(total=30), **_req_kw) as resp:
if resp.status != 200:
raise Exception(f"Failed to download animation: HTTP {resp.status}")
animation_data = await resp.read()
import io
file = discord.File(io.BytesIO(animation_data), filename="animation.gif")
msg = await channel.send(
content=caption if caption else None,
file=file,
)
return SendResult(success=True, message_id=str(msg.id))
except ImportError:
logger.warning(
"[%s] aiohttp not installed, falling back to URL. Run: pip install aiohttp",
self.name,
exc_info=True,
)
return await super().send_animation(chat_id, animation_url, caption, reply_to, metadata=metadata)
except Exception as e: # pragma: no cover - defensive logging
logger.error(
"[%s] Failed to send animation attachment, falling back to URL: %s",
self.name,
e,
exc_info=True,
)
return await super().send_animation(chat_id, animation_url, caption, reply_to, metadata=metadata)
async def send_video(
self,
chat_id: str,
@@ -1758,10 +1696,6 @@ class DiscordAdapter(BasePlatformAdapter):
async def slash_update(interaction: discord.Interaction):
await self._run_simple_slash(interaction, "/update", "Update initiated~")
@tree.command(name="restart", description="Gracefully restart the Hermes gateway")
async def slash_restart(interaction: discord.Interaction):
await self._run_simple_slash(interaction, "/restart", "Restart requested~")
@tree.command(name="approve", description="Approve a pending dangerous command")
@discord.app_commands.describe(scope="Optional: 'all', 'session', 'always', 'all session', 'all always'")
async def slash_approve(interaction: discord.Interaction, scope: str = ""):
@@ -1802,76 +1736,6 @@ class DiscordAdapter(BasePlatformAdapter):
async def slash_btw(interaction: discord.Interaction, question: str):
await self._run_simple_slash(interaction, f"/btw {question}")
# ── Auto-register any gateway-available commands not yet on the tree ──
# This ensures new commands added to COMMAND_REGISTRY in
# hermes_cli/commands.py automatically appear as Discord slash
# commands without needing a manual entry here.
try:
from hermes_cli.commands import COMMAND_REGISTRY, _is_gateway_available, _resolve_config_gates
already_registered = set()
try:
already_registered = {cmd.name for cmd in tree.get_commands()}
except Exception:
pass
config_overrides = _resolve_config_gates()
for cmd_def in COMMAND_REGISTRY:
if not _is_gateway_available(cmd_def, config_overrides):
continue
# Discord command names: lowercase, hyphens OK, max 32 chars.
discord_name = cmd_def.name.lower()[:32]
if discord_name in already_registered:
continue
# Skip aliases that overlap with already-registered names
# (aliases for explicitly registered commands are handled above).
desc = (cmd_def.description or f"Run /{cmd_def.name}")[:100]
has_args = bool(cmd_def.args_hint)
if has_args:
# Command takes optional arguments — create handler with
# an optional ``args`` string parameter.
def _make_args_handler(_name: str, _hint: str):
@discord.app_commands.describe(args=f"Arguments: {_hint}"[:100])
async def _handler(interaction: discord.Interaction, args: str = ""):
await self._run_simple_slash(
interaction, f"/{_name} {args}".strip()
)
_handler.__name__ = f"auto_slash_{_name.replace('-', '_')}"
return _handler
handler = _make_args_handler(cmd_def.name, cmd_def.args_hint)
else:
# Parameterless command.
def _make_simple_handler(_name: str):
async def _handler(interaction: discord.Interaction):
await self._run_simple_slash(interaction, f"/{_name}")
_handler.__name__ = f"auto_slash_{_name.replace('-', '_')}"
return _handler
handler = _make_simple_handler(cmd_def.name)
auto_cmd = discord.app_commands.Command(
name=discord_name,
description=desc,
callback=handler,
)
try:
tree.add_command(auto_cmd)
already_registered.add(discord_name)
except Exception:
# Silently skip commands that fail registration (e.g.
# name conflict with a subcommand group).
pass
logger.debug(
"Discord auto-registered %d commands from COMMAND_REGISTRY",
len(already_registered),
)
except Exception as e:
logger.warning("Discord auto-register from COMMAND_REGISTRY failed: %s", e)
# Register skills under a single /skill command group with category
# subcommand groups. This uses 1 top-level slot instead of N,
# supporting up to 25 categories × 25 skills = 625 skills.
@@ -1992,14 +1856,11 @@ class DiscordAdapter(BasePlatformAdapter):
)
msg_type = MessageType.COMMAND if text.startswith("/") else MessageType.TEXT
channel_id = str(interaction.channel_id)
parent_id = str(getattr(getattr(interaction, "channel", None), "parent_id", "") or "")
return MessageEvent(
text=text,
message_type=msg_type,
source=source,
raw_message=interaction,
channel_prompt=self._resolve_channel_prompt(channel_id, parent_id or None),
)
# ------------------------------------------------------------------
@@ -2070,17 +1931,14 @@ class DiscordAdapter(BasePlatformAdapter):
chat_topic=chat_topic,
)
_parent_channel = self._thread_parent_channel(getattr(interaction, "channel", None))
_parent_id = str(getattr(_parent_channel, "id", "") or "")
_parent_id = str(getattr(getattr(interaction, "channel", None), "parent_id", "") or "")
_skills = self._resolve_channel_skills(thread_id, _parent_id or None)
_channel_prompt = self._resolve_channel_prompt(thread_id, _parent_id or None)
event = MessageEvent(
text=text,
message_type=MessageType.TEXT,
source=source,
raw_message=interaction,
auto_skill=_skills,
channel_prompt=_channel_prompt,
)
await self.handle_message(event)
@@ -2109,11 +1967,6 @@ class DiscordAdapter(BasePlatformAdapter):
return list(dict.fromkeys(skills)) # dedup, preserve order
return None
def _resolve_channel_prompt(self, channel_id: str, parent_id: str | None = None) -> str | None:
"""Resolve a Discord per-channel prompt, preferring the exact channel over its parent."""
from gateway.platforms.base import resolve_channel_prompt
return resolve_channel_prompt(self.config.extra, channel_id, parent_id)
def _thread_parent_channel(self, channel: Any) -> Any:
"""Return the parent text channel when invoked from a thread."""
return getattr(channel, "parent", None) or channel
@@ -2665,7 +2518,6 @@ class DiscordAdapter(BasePlatformAdapter):
_parent_id = str(getattr(_chan, "parent_id", "") or "")
_chan_id = str(getattr(_chan, "id", ""))
_skills = self._resolve_channel_skills(_chan_id, _parent_id or None)
_channel_prompt = self._resolve_channel_prompt(_chan_id, _parent_id or None)
reply_to_id = None
reply_to_text = None
@@ -2686,7 +2538,6 @@ class DiscordAdapter(BasePlatformAdapter):
reply_to_text=reply_to_text,
timestamp=message.created_at,
auto_skill=_skills,
channel_prompt=_channel_prompt,
)
# Track thread participation so the bot won't require @mention for
+1 -4
View File
@@ -49,10 +49,7 @@ class MessageDeduplicator:
return False
now = time.time()
if msg_id in self._seen:
if now - self._seen[msg_id] < self._ttl:
return True
# Entry has expired — remove it and treat as new
del self._seen[msg_id]
return True
self._seen[msg_id] = now
if len(self._seen) > self._max_size:
cutoff = now - self._ttl
-8
View File
@@ -729,14 +729,6 @@ class MatrixAdapter(BasePlatformAdapter):
except Exception:
pass
async def stop_typing(self, chat_id: str) -> None:
"""Stop the Matrix typing indicator."""
if self._client:
try:
await self._client.set_typing(RoomID(chat_id), timeout=0)
except Exception:
pass
async def edit_message(
self, chat_id: str, message_id: str, content: str
) -> SendResult:
-7
View File
@@ -718,12 +718,6 @@ class MattermostAdapter(BasePlatformAdapter):
thread_id=thread_id,
)
# Per-channel ephemeral prompt
from gateway.platforms.base import resolve_channel_prompt
_channel_prompt = resolve_channel_prompt(
self.config.extra, channel_id, None,
)
msg_event = MessageEvent(
text=message_text,
message_type=msg_type,
@@ -732,7 +726,6 @@ class MattermostAdapter(BasePlatformAdapter):
message_id=post_id,
media_urls=media_urls if media_urls else None,
media_types=media_types if media_types else None,
channel_prompt=_channel_prompt,
)
await self.handle_message(msg_event)
-7
View File
@@ -1167,12 +1167,6 @@ class SlackAdapter(BasePlatformAdapter):
thread_id=thread_ts,
)
# Per-channel ephemeral prompt
from gateway.platforms.base import resolve_channel_prompt
_channel_prompt = resolve_channel_prompt(
self.config.extra, channel_id, None,
)
msg_event = MessageEvent(
text=text,
message_type=msg_type,
@@ -1182,7 +1176,6 @@ class SlackAdapter(BasePlatformAdapter):
media_urls=media_urls,
media_types=media_types,
reply_to_message_id=thread_ts if thread_ts != ts else None,
channel_prompt=_channel_prompt,
)
# Only react when bot is directly addressed (DM or @mention).
+9 -63
View File
@@ -18,10 +18,6 @@ logger = logging.getLogger(__name__)
try:
from telegram import Update, Bot, Message, InlineKeyboardButton, InlineKeyboardMarkup
try:
from telegram import LinkPreviewOptions
except ImportError:
LinkPreviewOptions = None
from telegram.ext import (
Application,
CommandHandler,
@@ -40,7 +36,6 @@ except ImportError:
Message = Any
InlineKeyboardButton = Any
InlineKeyboardMarkup = Any
LinkPreviewOptions = None
Application = Any
CommandHandler = Any
CallbackQueryHandler = Any
@@ -142,7 +137,6 @@ class TelegramAdapter(BasePlatformAdapter):
self._webhook_mode: bool = False
self._mention_patterns = self._compile_mention_patterns()
self._reply_to_mode: str = getattr(config, 'reply_to_mode', 'first') or 'first'
self._disable_link_previews: bool = self._coerce_bool_extra("disable_link_previews", False)
# Buffer rapid/album photo updates so Telegram image bursts are handled
# as a single MessageEvent instead of self-interrupting multiple turns.
self._media_batch_delay_seconds = float(os.getenv("HERMES_TELEGRAM_MEDIA_BATCH_DELAY_SECONDS", "0.8"))
@@ -169,15 +163,6 @@ class TelegramAdapter(BasePlatformAdapter):
# Approval button state: message_id → session_key
self._approval_state: Dict[int, str] = {}
@staticmethod
def _is_callback_user_authorized(user_id: str) -> bool:
"""Return whether a Telegram inline-button caller may perform gated actions."""
allowed_csv = os.getenv("TELEGRAM_ALLOWED_USERS", "").strip()
if not allowed_csv:
return True
allowed_ids = {uid.strip() for uid in allowed_csv.split(",") if uid.strip()}
return "*" in allowed_ids or user_id in allowed_ids
def _fallback_ips(self) -> list[str]:
"""Return validated fallback IPs from config (populated by _apply_env_overrides)."""
configured = self.config.extra.get("fallback_ips", []) if getattr(self.config, "extra", None) else []
@@ -208,26 +193,6 @@ class TelegramAdapter(BasePlatformAdapter):
pass
return isinstance(error, OSError)
def _coerce_bool_extra(self, key: str, default: bool = False) -> bool:
value = self.config.extra.get(key) if getattr(self.config, "extra", None) else None
if value is None:
return default
if isinstance(value, str):
lowered = value.strip().lower()
if lowered in ("true", "1", "yes", "on"):
return True
if lowered in ("false", "0", "no", "off"):
return False
return default
return bool(value)
def _link_preview_kwargs(self) -> Dict[str, Any]:
if not getattr(self, "_disable_link_previews", False):
return {}
if LinkPreviewOptions is not None:
return {"link_preview_options": LinkPreviewOptions(is_disabled=True)}
return {"disable_web_page_preview": True}
async def _handle_polling_network_error(self, error: Exception) -> None:
"""Reconnect polling after a transient network interruption.
@@ -575,7 +540,7 @@ class TelegramAdapter(BasePlatformAdapter):
"write_timeout": _env_float("HERMES_TELEGRAM_HTTP_WRITE_TIMEOUT", 20.0),
}
proxy_url = resolve_proxy_url("TELEGRAM_PROXY")
proxy_url = resolve_proxy_url()
disable_fallback = (os.getenv("HERMES_TELEGRAM_DISABLE_FALLBACK_IPS", "").strip().lower() in ("1", "true", "yes", "on"))
fallback_ips = self._fallback_ips()
if not fallback_ips:
@@ -882,7 +847,6 @@ class TelegramAdapter(BasePlatformAdapter):
parse_mode=ParseMode.MARKDOWN_V2,
reply_to_message_id=reply_to_id,
message_thread_id=effective_thread_id,
**self._link_preview_kwargs(),
)
except Exception as md_error:
# Markdown parsing failed, try plain text
@@ -895,7 +859,6 @@ class TelegramAdapter(BasePlatformAdapter):
parse_mode=None,
reply_to_message_id=reply_to_id,
message_thread_id=effective_thread_id,
**self._link_preview_kwargs(),
)
else:
raise
@@ -1083,7 +1046,6 @@ class TelegramAdapter(BasePlatformAdapter):
text=text,
parse_mode=ParseMode.MARKDOWN,
reply_markup=keyboard,
**self._link_preview_kwargs(),
)
return SendResult(success=True, message_id=str(msg.message_id))
except Exception as e:
@@ -1105,13 +1067,10 @@ class TelegramAdapter(BasePlatformAdapter):
try:
cmd_preview = command[:3800] + "..." if len(command) > 3800 else command
# Escape backticks that would break Markdown v1 inline code parsing
safe_cmd = cmd_preview.replace("`", "'")
safe_desc = description.replace("`", "'").replace("*", "")
text = (
f"⚠️ *Command Approval Required*\n\n"
f"`{safe_cmd}`\n\n"
f"Reason: {safe_desc}"
f"`{cmd_preview}`\n\n"
f"Reason: {description}"
)
# Resolve thread context for thread replies
@@ -1143,7 +1102,6 @@ class TelegramAdapter(BasePlatformAdapter):
"text": text,
"parse_mode": ParseMode.MARKDOWN,
"reply_markup": keyboard,
**self._link_preview_kwargs(),
}
if thread_id:
kwargs["message_thread_id"] = int(thread_id)
@@ -1214,7 +1172,6 @@ class TelegramAdapter(BasePlatformAdapter):
parse_mode=ParseMode.MARKDOWN,
reply_markup=keyboard,
message_thread_id=int(thread_id) if thread_id else None,
**self._link_preview_kwargs(),
)
# Store picker state keyed by chat_id
@@ -1483,9 +1440,12 @@ class TelegramAdapter(BasePlatformAdapter):
# Only authorized users may click approval buttons.
caller_id = str(getattr(query.from_user, "id", ""))
if not self._is_callback_user_authorized(caller_id):
await query.answer(text="⛔ You are not authorized to approve commands.")
return
allowed_csv = os.getenv("TELEGRAM_ALLOWED_USERS", "").strip()
if allowed_csv:
allowed_ids = {uid.strip() for uid in allowed_csv.split(",") if uid.strip()}
if "*" not in allowed_ids and caller_id not in allowed_ids:
await query.answer(text="⛔ You are not authorized to approve commands.")
return
session_key = self._approval_state.pop(approval_id, None)
if not session_key:
@@ -1530,10 +1490,6 @@ class TelegramAdapter(BasePlatformAdapter):
if not data.startswith("update_prompt:"):
return
answer = data.split(":", 1)[1] # "y" or "n"
caller_id = str(getattr(query.from_user, "id", ""))
if not self._is_callback_user_authorized(caller_id):
await query.answer(text="⛔ You are not authorized to answer update prompts.")
return
await query.answer(text=f"Sent '{answer}' to the update process.")
# Edit the message to show the choice and remove buttons
label = "Yes" if answer == "y" else "No"
@@ -2809,15 +2765,6 @@ class TelegramAdapter(BasePlatformAdapter):
reply_to_id = str(message.reply_to_message.message_id)
reply_to_text = message.reply_to_message.text or message.reply_to_message.caption or None
# Per-channel/topic ephemeral prompt
from gateway.platforms.base import resolve_channel_prompt
_chat_id_str = str(chat.id)
_channel_prompt = resolve_channel_prompt(
self.config.extra,
thread_id_str or _chat_id_str,
_chat_id_str if thread_id_str else None,
)
return MessageEvent(
text=message.text or "",
message_type=msg_type,
@@ -2827,7 +2774,6 @@ class TelegramAdapter(BasePlatformAdapter):
reply_to_message_id=reply_to_id,
reply_to_text=reply_to_text,
auto_skill=topic_skill,
channel_prompt=_channel_prompt,
timestamp=message.date,
)
+1 -1
View File
@@ -46,7 +46,7 @@ _SEED_FALLBACK_IPS: list[str] = ["149.154.167.220"]
def _resolve_proxy_url() -> str | None:
# Delegate to shared implementation (env vars + macOS system proxy detection)
from gateway.platforms.base import resolve_proxy_url
return resolve_proxy_url("TELEGRAM_PROXY")
return resolve_proxy_url()
class TelegramFallbackTransport(httpx.AsyncBaseTransport):
-14
View File
@@ -258,20 +258,6 @@ class WecomCallbackAdapter(BasePlatformAdapter):
)
event = self._build_event(app, decrypted)
if event is not None:
# Deduplicate: WeCom retries callbacks on timeout,
# producing duplicate inbound messages (#10305).
if event.message_id:
now = time.time()
if event.message_id in self._seen_messages:
if now - self._seen_messages[event.message_id] < MESSAGE_DEDUP_TTL_SECONDS:
logger.debug("[WecomCallback] Duplicate MsgId %s, skipping", event.message_id)
return web.Response(text="success", content_type="text/plain")
del self._seen_messages[event.message_id]
self._seen_messages[event.message_id] = now
# Prune expired entries when cache grows large
if len(self._seen_messages) > 2000:
cutoff = now - MESSAGE_DEDUP_TTL_SECONDS
self._seen_messages = {k: v for k, v in self._seen_messages.items() if v > cutoff}
# Record which app this user belongs to.
if event.source and event.source.user_id:
map_key = self._user_app_key(
+137 -504
View File
@@ -482,32 +482,6 @@ def _resolve_hermes_bin() -> Optional[list[str]]:
return None
def _parse_session_key(session_key: str) -> "dict | None":
"""Parse a session key into its component parts.
Session keys follow the format
``agent:main:{platform}:{chat_type}:{chat_id}[:{extra}...]``.
Returns a dict with ``platform``, ``chat_type``, ``chat_id``, and
optionally ``thread_id`` keys, or None if the key doesn't match.
The 6th element is only returned as ``thread_id`` for chat types where
it is unambiguous (``dm`` and ``thread``). For group/channel sessions
the suffix may be a user_id (per-user isolation) rather than a
thread_id, so we leave ``thread_id`` out to avoid mis-routing.
"""
parts = session_key.split(":")
if len(parts) >= 5 and parts[0] == "agent" and parts[1] == "main":
result = {
"platform": parts[2],
"chat_type": parts[3],
"chat_id": parts[4],
}
if len(parts) > 5 and parts[3] in ("dm", "thread"):
result["thread_id"] = parts[5]
return result
return None
def _format_gateway_process_notification(evt: dict) -> "str | None":
"""Format a watch pattern event from completion_queue into a [SYSTEM:] message."""
evt_type = evt.get("type", "completion")
@@ -599,7 +573,6 @@ class GatewayRunner:
self._running_agents: Dict[str, Any] = {}
self._running_agents_ts: Dict[str, float] = {} # start timestamp per session
self._pending_messages: Dict[str, str] = {} # Queued messages during interrupt
self._busy_ack_ts: Dict[str, float] = {} # last busy-ack timestamp per session (debounce)
# Cache AIAgent instances per session to preserve prompt caching.
# Without this, a new AIAgent is created per message, rebuilding the
@@ -1356,100 +1329,26 @@ class GatewayRunner:
merge_pending_message_event(adapter._pending_messages, session_key, event)
async def _handle_active_session_busy_message(self, event: MessageEvent, session_key: str) -> bool:
# --- Draining case (gateway restarting/stopping) ---
if self._draining:
adapter = self.adapters.get(event.source.platform)
if not adapter:
return True
thread_meta = {"thread_id": event.source.thread_id} if event.source.thread_id else None
if self._queue_during_drain_enabled():
self._queue_or_replace_pending_event(session_key, event)
message = f"⏳ Gateway {self._status_action_gerund()} — queued for the next turn after it comes back."
else:
message = f"⏳ Gateway is {self._status_action_gerund()} and is not accepting another turn right now."
await adapter._send_with_retry(
chat_id=event.source.chat_id,
content=message,
reply_to=event.message_id,
metadata=thread_meta,
)
return True
# --- Normal busy case (agent actively running a task) ---
# The user sent a message while the agent is working. Interrupt the
# agent immediately so it stops the current tool-calling loop and
# processes the new message. The pending message is stored in the
# adapter so the base adapter picks it up once the interrupted run
# returns. A brief ack tells the user what's happening (debounced
# to avoid spam when they fire multiple messages quickly).
if not self._draining:
return False
adapter = self.adapters.get(event.source.platform)
if not adapter:
return False # let default path handle it
# Store the message so it's processed as the next turn after the
# interrupt causes the current run to exit.
from gateway.platforms.base import merge_pending_message_event
merge_pending_message_event(adapter._pending_messages, session_key, event)
# Interrupt the running agent — this aborts in-flight tool calls and
# causes the agent loop to exit at the next check point.
running_agent = self._running_agents.get(session_key)
if running_agent and running_agent is not _AGENT_PENDING_SENTINEL:
try:
running_agent.interrupt(event.text)
except Exception:
pass # don't let interrupt failure block the ack
# Debounce: only send an acknowledgment once every 30 seconds per session
# to avoid spamming the user when they send multiple messages quickly
_BUSY_ACK_COOLDOWN = 30
now = time.time()
last_ack = self._busy_ack_ts.get(session_key, 0)
if now - last_ack < _BUSY_ACK_COOLDOWN:
return True # interrupt sent, ack already delivered recently
self._busy_ack_ts[session_key] = now
# Build a status-rich acknowledgment
status_parts = []
if running_agent and running_agent is not _AGENT_PENDING_SENTINEL:
try:
summary = running_agent.get_activity_summary()
iteration = summary.get("api_call_count", 0)
max_iter = summary.get("max_iterations", 0)
current_tool = summary.get("current_tool")
start_ts = self._running_agents_ts.get(session_key, 0)
if start_ts:
elapsed_min = int((now - start_ts) / 60)
if elapsed_min > 0:
status_parts.append(f"{elapsed_min} min elapsed")
if max_iter:
status_parts.append(f"iteration {iteration}/{max_iter}")
if current_tool:
status_parts.append(f"running: {current_tool}")
except Exception:
pass
status_detail = f" ({', '.join(status_parts)})" if status_parts else ""
message = (
f"⚡ Interrupting current task{status_detail}. "
f"I'll respond to your message shortly."
)
return True
thread_meta = {"thread_id": event.source.thread_id} if event.source.thread_id else None
try:
await adapter._send_with_retry(
chat_id=event.source.chat_id,
content=message,
reply_to=event.message_id,
metadata=thread_meta,
)
except Exception as e:
logger.debug("Failed to send busy-ack: %s", e)
if self._queue_during_drain_enabled():
self._queue_or_replace_pending_event(session_key, event)
message = f"⏳ Gateway {self._status_action_gerund()} — queued for the next turn after it comes back."
else:
message = f"⏳ Gateway is {self._status_action_gerund()} and is not accepting another turn right now."
await adapter._send_with_retry(
chat_id=event.source.chat_id,
content=message,
reply_to=event.message_id,
metadata=thread_meta,
)
return True
async def _drain_active_agents(self, timeout: float) -> tuple[Dict[str, Any], bool]:
@@ -1506,7 +1405,7 @@ class GatewayRunner:
action = "restarting" if self._restart_requested else "shutting down"
hint = (
"Your current task will be interrupted. "
"Send any message after restart to resume where it left off."
"Use /retry after restart to continue."
if self._restart_requested
else "Your current task will be interrupted."
)
@@ -1515,11 +1414,12 @@ class GatewayRunner:
notified: set = set()
for session_key in active:
# Parse platform + chat_id from the session key.
_parsed = _parse_session_key(session_key)
if not _parsed:
# Format: agent:main:{platform}:{chat_type}:{chat_id}[:{extra}...]
parts = session_key.split(":")
if len(parts) < 5:
continue
platform_str = _parsed["platform"]
chat_id = _parsed["chat_id"]
platform_str = parts[2]
chat_id = parts[4]
# Deduplicate: one notification per chat, even if multiple
# sessions (different users/threads) share the same chat.
@@ -1535,7 +1435,7 @@ class GatewayRunner:
# Include thread_id if present so the message lands in the
# correct forum topic / thread.
thread_id = _parsed.get("thread_id")
thread_id = parts[5] if len(parts) > 5 else None
metadata = {"thread_id": thread_id} if thread_id else None
await adapter.send(chat_id, msg, metadata=metadata)
@@ -1575,106 +1475,6 @@ class GatewayRunner:
except Exception:
pass
_STUCK_LOOP_THRESHOLD = 3 # restarts while active before auto-suspend
_STUCK_LOOP_FILE = ".restart_failure_counts"
def _increment_restart_failure_counts(self, active_session_keys: set) -> None:
"""Increment restart-failure counters for sessions active at shutdown.
Persists to a JSON file so counters survive across restarts.
Sessions NOT in active_session_keys are removed (they completed
successfully, so the loop is broken).
"""
import json
path = _hermes_home / self._STUCK_LOOP_FILE
try:
counts = json.loads(path.read_text()) if path.exists() else {}
except Exception:
counts = {}
# Increment active sessions, remove inactive ones (loop broken)
new_counts = {}
for key in active_session_keys:
new_counts[key] = counts.get(key, 0) + 1
# Keep any entries that are still above 0 even if not active now
# (they might become active again next restart)
try:
path.write_text(json.dumps(new_counts))
except Exception:
pass
def _suspend_stuck_loop_sessions(self) -> int:
"""Suspend sessions that have been active across too many restarts.
Returns the number of sessions suspended. Called on gateway startup
AFTER suspend_recently_active() to catch the stuck-loop pattern:
session loads agent gets stuck gateway restarts repeat.
"""
import json
path = _hermes_home / self._STUCK_LOOP_FILE
if not path.exists():
return 0
try:
counts = json.loads(path.read_text())
except Exception:
return 0
suspended = 0
stuck_keys = [k for k, v in counts.items() if v >= self._STUCK_LOOP_THRESHOLD]
for session_key in stuck_keys:
try:
entry = self.session_store._entries.get(session_key)
if entry and not entry.suspended:
entry.suspended = True
suspended += 1
logger.warning(
"Auto-suspended stuck session %s (active across %d "
"consecutive restarts — likely a stuck loop)",
session_key[:30], counts[session_key],
)
except Exception:
pass
if suspended:
try:
self.session_store._save()
except Exception:
pass
# Clear the file — counters start fresh after suspension
try:
path.unlink(missing_ok=True)
except Exception:
pass
return suspended
def _clear_restart_failure_count(self, session_key: str) -> None:
"""Clear the restart-failure counter for a session that completed OK.
Called after a successful agent turn to signal the loop is broken.
"""
import json
path = _hermes_home / self._STUCK_LOOP_FILE
if not path.exists():
return
try:
counts = json.loads(path.read_text())
if session_key in counts:
del counts[session_key]
if counts:
path.write_text(json.dumps(counts))
else:
path.unlink(missing_ok=True)
except Exception:
pass
async def _launch_detached_restart_command(self) -> None:
import shutil
import subprocess
@@ -1740,7 +1540,7 @@ class GatewayRunner:
pass
try:
from gateway.status import write_runtime_status
write_runtime_status(gateway_state="starting", exit_reason=None)
write_runtime_status(gateway_state="starting", exit_reason=None, startup_checks={})
except Exception:
pass
@@ -1782,8 +1582,23 @@ class GatewayRunner:
"or configure platform allowlists (e.g., TELEGRAM_ALLOWED_USERS=your_id)."
)
# Discover plugins before hooks so plugin-owned hook bundles can
# participate in this same startup cycle.
try:
from hermes_cli.plugins import discover_plugins
discover_plugins()
except Exception as e:
logger.warning("Plugin discovery during gateway startup failed: %s", e)
# Discover and load event hooks
self.hooks.discover_and_load()
try:
from gateway.status import reset_startup_checks
reset_startup_checks(self.hooks.loaded_hooks)
except Exception as e:
logger.warning("Startup readiness initialization failed: %s", e)
# Recover background processes from checkpoint (crash recovery)
try:
@@ -1818,17 +1633,6 @@ class GatewayRunner:
except Exception as e:
logger.warning("Session suspension on startup failed: %s", e)
# Stuck-loop detection (#7536): if a session has been active across
# 3+ consecutive restarts, it's probably stuck in a loop (the same
# history keeps causing the agent to hang). Auto-suspend it so the
# user gets a clean slate on the next message.
try:
stuck = self._suspend_stuck_loop_sessions()
if stuck:
logger.warning("Auto-suspended %d stuck-loop session(s)", stuck)
except Exception as e:
logger.debug("Stuck-loop detection failed: %s", e)
connected_count = 0
enabled_platform_count = 0
startup_nonretryable_errors: list[str] = []
@@ -2315,6 +2119,11 @@ class GatewayRunner:
logger.error("Failed to launch detached gateway restart: %s", e)
self._finalize_shutdown_agents(active_agents)
await self.hooks.emit("gateway:shutdown", {
"restart": self._restart_requested,
"service_restart": self._restart_via_service,
"detached_restart": self._restart_detached,
})
for platform, adapter in list(self.adapters.items()):
try:
@@ -2337,8 +2146,6 @@ class GatewayRunner:
self._running_agents.clear()
self._pending_messages.clear()
self._pending_approvals.clear()
if hasattr(self, '_busy_ack_ts'):
self._busy_ack_ts.clear()
self._shutdown_event.set()
# Global cleanup: kill any remaining tool subprocesses not tied
@@ -2382,14 +2189,6 @@ class GatewayRunner:
"active sessions."
)
# Track sessions that were active at shutdown for stuck-loop
# detection (#7536). On each restart, the counter increments
# for sessions that were running. If a session hits the
# threshold (3 consecutive restarts while active), the next
# startup auto-suspends it — breaking the loop.
if active_agents:
self._increment_restart_failure_counts(set(active_agents.keys()))
if self._restart_requested and self._restart_via_service:
self._exit_code = GATEWAY_SERVICE_RESTART_EXIT_CODE
self._exit_reason = self._exit_reason or "Gateway restart requested"
@@ -2823,7 +2622,6 @@ class GatewayRunner:
)
del self._running_agents[_quick_key]
self._running_agents_ts.pop(_quick_key, None)
self._busy_ack_ts.pop(_quick_key, None)
if _quick_key in self._running_agents:
if event.get_command() == "status":
@@ -2891,7 +2689,6 @@ class GatewayRunner:
message_type=_MT.TEXT,
source=event.source,
message_id=event.message_id,
channel_prompt=event.channel_prompt,
)
adapter._pending_messages[_quick_key] = queued_event
return "Queued for the next turn."
@@ -3739,7 +3536,6 @@ class GatewayRunner:
model=_hyg_model,
max_iterations=4,
quiet_mode=True,
skip_memory=True,
enabled_toolsets=["memory"],
session_id=session_entry.session_id,
)
@@ -3870,7 +3666,6 @@ class GatewayRunner:
session_id=session_entry.session_id,
session_key=session_key,
event_message_id=event.message_id,
channel_prompt=event.channel_prompt,
)
# Stop persistent typing indicator now that the agent is done
@@ -3882,18 +3677,6 @@ class GatewayRunner:
pass
response = agent_result.get("final_response") or ""
# Convert the agent's internal "(empty)" sentinel into a
# user-friendly message. "(empty)" means the model failed to
# produce visible content after exhausting all retries (nudge,
# prefill, empty-retry, fallback). Sending the raw sentinel
# looks like a bug; a short explanation is more helpful.
if response == "(empty)":
response = (
"⚠️ The model returned no response after processing tool "
"results. This can happen with some models — try again or "
"rephrase your question."
)
agent_messages = agent_result.get("messages", [])
_response_time = time.time() - _msg_start_time
_api_calls = agent_result.get("api_calls", 0)
@@ -3904,12 +3687,6 @@ class GatewayRunner:
_response_time, _api_calls, _resp_len,
)
# Successful turn — clear any stuck-loop counter for this session.
# This ensures the counter only accumulates across CONSECUTIVE
# restarts where the session was active (never completed).
if session_key:
self._clear_restart_failure_count(session_key)
# Surface error details when the agent failed silently (final_response=None)
if not response and agent_result.get("failed"):
error_detail = agent_result.get("error", "unknown error")
@@ -3998,7 +3775,7 @@ class GatewayRunner:
synth_text = _format_gateway_process_notification(evt)
if synth_text:
try:
await self._inject_watch_notification(synth_text, evt)
await self._inject_watch_notification(synth_text, event)
except Exception as e2:
logger.error("Watch notification injection error: %s", e2)
except Exception as e:
@@ -4016,11 +3793,14 @@ class GatewayRunner:
# intermediate reasoning) so sessions can be resumed with full context
# and transcripts are useful for debugging and training data.
#
# IMPORTANT: When the agent failed (e.g. context-overflow 400,
# compression exhausted), do NOT persist the user's message.
# IMPORTANT: When the agent failed before producing any response
# (e.g. context-overflow 400), do NOT persist the user's message.
# Persisting it would make the session even larger, causing the
# same failure on the next attempt — an infinite loop. (#1630, #9893)
agent_failed_early = bool(agent_result.get("failed"))
# same failure on the next attempt — an infinite loop. (#1630)
agent_failed_early = (
agent_result.get("failed")
and not agent_result.get("final_response")
)
if agent_failed_early:
logger.info(
"Skipping transcript persistence for failed request in "
@@ -4028,24 +3808,6 @@ class GatewayRunner:
session_entry.session_id,
)
# When compression is exhausted, the session is permanently too
# large to process. Auto-reset it so the next message starts
# fresh instead of replaying the same oversized context in an
# infinite fail loop. (#9893)
if agent_result.get("compression_exhausted") and session_entry and session_key:
logger.info(
"Auto-resetting session %s after compression exhaustion.",
session_entry.session_id,
)
self.session_store.reset_session(session_key)
self._evict_cached_agent(session_key)
self._session_model_overrides.pop(session_key, None)
response = (response or "") + (
"\n\n🔄 Session auto-reset — the conversation exceeded the "
"maximum context size and could not be compressed further. "
"Your next message will start a fresh session."
)
ts = datetime.now().isoformat()
# If this is a fresh session (no history), write the full tool
@@ -4153,8 +3915,6 @@ class GatewayRunner:
_hist_len = len(history) if 'history' in locals() else 0
if status_code == 401:
status_hint = " Check your API key or run `claude /login` to refresh OAuth credentials."
elif status_code == 402:
status_hint = " Your API balance or quota is exhausted. Check your provider dashboard."
elif status_code == 429:
# Check if this is a plan usage limit (resets on a schedule) vs a transient rate limit
_err_body = getattr(e, "response", None)
@@ -4394,16 +4154,31 @@ class GatewayRunner:
async def _handle_profile_command(self, event: MessageEvent) -> str:
"""Handle /profile — show active profile name and home directory."""
from hermes_constants import display_hermes_home
from hermes_cli.profiles import get_active_profile_name
from hermes_constants import get_hermes_home, display_hermes_home
from pathlib import Path
home = get_hermes_home()
display = display_hermes_home()
profile_name = get_active_profile_name()
lines = [
f"👤 **Profile:** `{profile_name}`",
f"📂 **Home:** `{display}`",
]
# Detect profile name from HERMES_HOME path
# Profile paths look like: ~/.hermes/profiles/<name>
profiles_parent = Path.home() / ".hermes" / "profiles"
try:
rel = home.relative_to(profiles_parent)
profile_name = str(rel).split("/")[0]
except ValueError:
profile_name = None
if profile_name:
lines = [
f"👤 **Profile:** `{profile_name}`",
f"📂 **Home:** `{display}`",
]
else:
lines = [
"👤 **Profile:** default",
f"📂 **Home:** `{display}`",
]
return "\n".join(lines)
@@ -4976,7 +4751,6 @@ class GatewayRunner:
async def _handle_personality_command(self, event: MessageEvent) -> str:
"""Handle /personality command - list or set a personality."""
import yaml
from hermes_constants import display_hermes_home
args = event.get_command_args().strip().lower()
config_path = _hermes_home / 'config.yaml'
@@ -4994,7 +4768,7 @@ class GatewayRunner:
personalities = {}
if not personalities:
return f"No personalities configured in `{display_hermes_home()}/config.yaml`"
return "No personalities configured in `~/.hermes/config.yaml`"
if not args:
lines = ["🎭 **Available Personalities**\n"]
@@ -5078,7 +4852,6 @@ class GatewayRunner:
message_type=MessageType.TEXT,
source=source,
raw_message=event.raw_message,
channel_prompt=event.channel_prompt,
)
# Let the normal message handler process it
@@ -6222,7 +5995,6 @@ class GatewayRunner:
model=model,
max_iterations=4,
quiet_mode=True,
skip_memory=True,
enabled_toolsets=["memory"],
session_id=session_entry.session_id,
)
@@ -6588,11 +6360,6 @@ class GatewayRunner:
import asyncio as _asyncio
args = event.get_command_args().strip()
# Normalize Unicode dashes (Telegram/iOS auto-converts -- to em/en dash)
import re as _re
args = _re.sub(r'[\u2012\u2013\u2014\u2015](days|source)', r'--\1', args)
days = 30
source = None
@@ -6816,17 +6583,11 @@ class GatewayRunner:
})
async def _handle_debug_command(self, event: MessageEvent) -> str:
"""Handle /debug — upload debug report (summary only) and return paste URLs.
Gateway uploads ONLY the summary report (system info + log tails),
NOT full log files, to protect conversation privacy. Users who need
full log uploads should use ``hermes debug share`` from the CLI.
"""
"""Handle /debug — upload debug report + logs and return paste URLs."""
import asyncio
from hermes_cli.debug import (
_capture_dump, collect_debug_report,
upload_to_pastebin, _schedule_auto_delete,
_GATEWAY_PRIVACY_NOTICE,
_capture_dump, collect_debug_report, _read_full_log,
upload_to_pastebin,
)
loop = asyncio.get_running_loop()
@@ -6835,25 +6596,43 @@ class GatewayRunner:
def _collect_and_upload():
dump_text = _capture_dump()
report = collect_debug_report(log_lines=200, dump_text=dump_text)
agent_log = _read_full_log("agent")
gateway_log = _read_full_log("gateway")
if agent_log:
agent_log = dump_text + "\n\n--- full agent.log ---\n" + agent_log
if gateway_log:
gateway_log = dump_text + "\n\n--- full gateway.log ---\n" + gateway_log
urls = {}
failures = []
try:
urls["Report"] = upload_to_pastebin(report)
except Exception as exc:
return f"✗ Failed to upload debug report: {exc}"
# Schedule auto-deletion after 1 hour
_schedule_auto_delete(list(urls.values()))
if agent_log:
try:
urls["agent.log"] = upload_to_pastebin(agent_log)
except Exception:
failures.append("agent.log")
lines = [_GATEWAY_PRIVACY_NOTICE, "", "**Debug report uploaded:**", ""]
if gateway_log:
try:
urls["gateway.log"] = upload_to_pastebin(gateway_log)
except Exception:
failures.append("gateway.log")
lines = ["**Debug report uploaded:**", ""]
label_width = max(len(k) for k in urls)
for label, url in urls.items():
lines.append(f"`{label:<{label_width}}` {url}")
lines.append("")
lines.append("⏱ Pastes will auto-delete in 1 hour.")
lines.append("For full log uploads, use `hermes debug share` from the CLI.")
lines.append("Share these links with the Hermes team for support.")
if failures:
lines.append(f"\n_(failed to upload: {', '.join(failures)})_")
lines.append("\nShare these links with the Hermes team for support.")
return "\n".join(lines)
return await loop.run_in_executor(None, _collect_and_upload)
@@ -7473,75 +7252,14 @@ class GatewayRunner:
return prefix
return user_text
def _build_process_event_source(self, evt: dict):
"""Resolve the canonical source for a synthetic background-process event.
Prefer the persisted session-store origin for the event's session key.
Falling back to the currently active foreground event is what causes
cross-topic bleed, so don't do that.
"""
from gateway.session import SessionSource
session_key = str(evt.get("session_key") or "").strip()
derived_platform = ""
derived_chat_type = ""
derived_chat_id = ""
if session_key:
try:
self.session_store._ensure_loaded()
entry = self.session_store._entries.get(session_key)
if entry and getattr(entry, "origin", None):
return entry.origin
except Exception as exc:
logger.debug(
"Synthetic process-event session-store lookup failed for %s: %s",
session_key,
exc,
)
_parsed = _parse_session_key(session_key)
if _parsed:
derived_platform = _parsed["platform"]
derived_chat_type = _parsed["chat_type"]
derived_chat_id = _parsed["chat_id"]
platform_name = str(evt.get("platform") or derived_platform or "").strip().lower()
chat_type = str(evt.get("chat_type") or derived_chat_type or "").strip().lower()
chat_id = str(evt.get("chat_id") or derived_chat_id or "").strip()
if not platform_name or not chat_type or not chat_id:
return None
try:
platform = Platform(platform_name)
except Exception:
logger.warning(
"Synthetic process event has invalid platform metadata: %r",
platform_name,
)
return None
return SessionSource(
platform=platform,
chat_id=chat_id,
chat_type=chat_type,
thread_id=str(evt.get("thread_id") or "").strip() or None,
user_id=str(evt.get("user_id") or "").strip() or None,
user_name=str(evt.get("user_name") or "").strip() or None,
)
async def _inject_watch_notification(self, synth_text: str, evt: dict) -> None:
async def _inject_watch_notification(self, synth_text: str, original_event) -> None:
"""Inject a watch-pattern notification as a synthetic message event.
Routing must come from the queued watch event itself, not from whatever
foreground message happened to be active when the queue was drained.
Uses the source from the original user event to route the notification
back to the correct chat/adapter.
"""
source = self._build_process_event_source(evt)
source = getattr(original_event, "source", None)
if not source:
logger.warning(
"Dropping watch notification with no routing metadata for process %s",
evt.get("session_id", "unknown"),
)
return
platform_name = source.platform.value if hasattr(source.platform, "value") else str(source.platform)
adapter = None
@@ -7559,12 +7277,7 @@ class GatewayRunner:
source=source,
internal=True,
)
logger.info(
"Watch pattern notification — injecting for %s chat=%s thread=%s",
platform_name,
source.chat_id,
source.thread_id,
)
logger.info("Watch pattern notification — injecting for %s", platform_name)
await adapter.handle_message(synth_event)
except Exception as e:
logger.error("Watch notification injection error: %s", e)
@@ -7634,42 +7347,33 @@ class GatewayRunner:
f"Command: {session.command}\n"
f"Output:\n{_out}]"
)
source = self._build_process_event_source({
"session_id": session_id,
"session_key": session_key,
"platform": platform_name,
"chat_id": chat_id,
"thread_id": thread_id,
"user_id": user_id,
"user_name": user_name,
})
if not source:
logger.warning(
"Dropping completion notification with no routing metadata for process %s",
session_id,
)
break
adapter = None
for p, a in self.adapters.items():
if p == source.platform:
if p.value == platform_name:
adapter = a
break
if adapter and source.chat_id:
if adapter and chat_id:
try:
from gateway.platforms.base import MessageEvent, MessageType
from gateway.session import SessionSource
from gateway.config import Platform
_platform_enum = Platform(platform_name)
_source = SessionSource(
platform=_platform_enum,
chat_id=chat_id,
thread_id=thread_id or None,
user_id=user_id or None,
user_name=user_name or None,
)
synth_event = MessageEvent(
text=synth_text,
message_type=MessageType.TEXT,
source=source,
source=_source,
internal=True,
)
logger.info(
"Process %s finished — injecting agent notification for session %s chat=%s thread=%s",
session_id,
session_key,
source.chat_id,
source.thread_id,
"Process %s finished — injecting agent notification for session %s",
session_id, session_key,
)
await adapter.handle_message(synth_event)
except Exception as e:
@@ -8065,7 +7769,6 @@ class GatewayRunner:
session_key: str = None,
_interrupt_depth: int = 0,
event_message_id: Optional[str] = None,
channel_prompt: Optional[str] = None,
) -> Dict[str, Any]:
"""
Run the agent with the given message and context.
@@ -8420,12 +8123,8 @@ class GatewayRunner:
# Platform.LOCAL ("local") maps to "cli"; others pass through as-is.
platform_key = "cli" if source.platform == Platform.LOCAL else source.platform.value
# Combine platform context, per-channel context, and the user-configured
# ephemeral system prompt.
# Combine platform context with user-configured ephemeral system prompt
combined_ephemeral = context_prompt or ""
event_channel_prompt = (channel_prompt or "").strip()
if event_channel_prompt:
combined_ephemeral = (combined_ephemeral + "\n\n" + event_channel_prompt).strip()
if self._ephemeral_system_prompt:
combined_ephemeral = (combined_ephemeral + "\n\n" + self._ephemeral_system_prompt).strip()
@@ -8559,12 +8258,6 @@ class GatewayRunner:
cached = _cache.get(session_key)
if cached and cached[1] == _sig:
agent = cached[0]
# Reset activity timestamp so the inactivity timeout
# handler doesn't see stale idle time from the previous
# turn and immediately kill this agent. (#9051)
agent._last_activity_ts = time.time()
agent._last_activity_desc = "starting new turn (cached)"
agent._api_call_count = 0
logger.debug("Reusing cached agent for session %s", session_key)
if agent is None:
@@ -8590,7 +8283,6 @@ class GatewayRunner:
session_id=session_id,
platform=platform_key,
user_id=source.user_id,
gateway_session_key=session_key,
session_db=self._session_db,
fallback_model=self._fallback_model,
)
@@ -8610,11 +8302,8 @@ class GatewayRunner:
agent.service_tier = self._service_tier
agent.request_overrides = turn_route.get("request_overrides")
_bg_review_release = threading.Event()
_bg_review_pending: list[str] = []
_bg_review_pending_lock = threading.Lock()
def _deliver_bg_review_message(message: str) -> None:
# Background review delivery — send "💾 Memory updated" etc. to user
def _bg_review_send(message: str) -> None:
if not _status_adapter:
return
try:
@@ -8629,32 +8318,7 @@ class GatewayRunner:
except Exception as _e:
logger.debug("background_review_callback error: %s", _e)
def _release_bg_review_messages() -> None:
_bg_review_release.set()
with _bg_review_pending_lock:
pending = list(_bg_review_pending)
_bg_review_pending.clear()
for queued in pending:
_deliver_bg_review_message(queued)
# Background review delivery — send "💾 Memory updated" etc. to user
def _bg_review_send(message: str) -> None:
if not _status_adapter:
return
if not _bg_review_release.is_set():
with _bg_review_pending_lock:
if not _bg_review_release.is_set():
_bg_review_pending.append(message)
return
_deliver_bg_review_message(message)
agent.background_review_callback = _bg_review_send
# Register the release hook on the adapter so base.py's finally
# block can fire it after delivering the main response.
if _status_adapter and session_key:
_pdc = getattr(_status_adapter, "_post_delivery_callbacks", None)
if _pdc is not None:
_pdc[session_key] = _release_bg_review_messages
# Store agent reference for interrupt support
agent_holder[0] = agent
@@ -8806,21 +8470,6 @@ class GatewayRunner:
if _msn:
message = _msn + "\n\n" + message
# Auto-continue: if the loaded history ends with a tool result,
# the previous agent turn was interrupted mid-work (gateway
# restart, crash, SIGTERM). Prepend a system note so the model
# finishes processing the pending tool results before addressing
# the user's new message. (#4493)
if agent_history and agent_history[-1].get("role") == "tool":
message = (
"[System note: Your previous turn was interrupted before you could "
"process the last tool result(s). The conversation history contains "
"tool outputs you haven't responded to yet. Please finish processing "
"those results and summarize what was accomplished, then address the "
"user's new message below.]\n\n"
+ message
)
_approval_session_key = session_key or ""
_approval_session_token = set_current_session_key(_approval_session_key)
register_gateway_notify(_approval_session_key, _approval_notify_sync)
@@ -8855,8 +8504,6 @@ class GatewayRunner:
"final_response": error_msg,
"messages": result.get("messages", []),
"api_calls": result.get("api_calls", 0),
"failed": result.get("failed", False),
"compression_exhausted": result.get("compression_exhausted", False),
"tools": tools_holder[0] or [],
"history_offset": len(agent_history),
"last_prompt_tokens": _last_prompt_toks,
@@ -9361,11 +9008,15 @@ class GatewayRunner:
pass
except Exception as e:
logger.debug("Stream consumer wait before queued message failed: %s", e)
_response_previewed = bool(result.get("response_previewed"))
_already_streamed = bool(
_sc
and (
getattr(_sc, "final_response_sent", False)
or getattr(_sc, "already_sent", False)
or (
_response_previewed
and getattr(_sc, "already_sent", False)
)
)
)
first_response = result.get("final_response", "")
@@ -9378,17 +9029,6 @@ class GatewayRunner:
)
except Exception as e:
logger.warning("Failed to send first response before queued message: %s", e)
# Release deferred bg-review notifications now that the
# first response has been delivered. Pop from the
# adapter's callback dict (prevents double-fire in
# base.py's finally block) and call it.
if adapter and hasattr(adapter, "_post_delivery_callbacks"):
_bg_cb = adapter._post_delivery_callbacks.pop(session_key, None)
if callable(_bg_cb):
try:
_bg_cb()
except Exception:
pass
# else: interrupted — discard the interrupted response ("Operation
# interrupted." is just noise; the user already knows they sent a
# new message).
@@ -9417,7 +9057,6 @@ class GatewayRunner:
session_key=session_key,
_interrupt_depth=_interrupt_depth + 1,
event_message_id=next_message_id,
channel_prompt=pending_event.channel_prompt,
)
finally:
# Stop progress sender, interrupt monitor, and notification task
@@ -9459,21 +9098,15 @@ class GatewayRunner:
# BUT: never suppress delivery when the agent failed — the error
# message is new content the user hasn't seen, and it must reach
# them even if streaming had sent earlier partial output.
#
# Also never suppress when the final response is "(empty)" — this
# means the model failed to produce content after tool calls (common
# with mimo-v2-pro, GLM-5, etc.). The stream consumer may have
# sent intermediate text ("Let me search for that…") alongside the
# tool call, setting already_sent=True, but that text is NOT the
# final answer. Suppressing delivery here leaves the user staring
# at silence. (#10xxx — "agent stops after web search")
_sc = stream_consumer_holder[0]
if _sc and isinstance(response, dict) and not response.get("failed"):
_final = response.get("final_response") or ""
_is_empty_sentinel = not _final or _final == "(empty)"
if not _is_empty_sentinel and (
_response_previewed = bool(response.get("response_previewed"))
if (
getattr(_sc, "final_response_sent", False)
or getattr(_sc, "already_sent", False)
or (
_response_previewed
and getattr(_sc, "already_sent", False)
)
):
response["already_sent"] = True
@@ -9782,9 +9415,9 @@ def main():
config = None
if args.config:
import yaml
import json
with open(args.config, encoding="utf-8") as f:
data = yaml.safe_load(f)
data = json.load(f)
config = GatewayConfig.from_dict(data)
# Run the gateway - exit with code 1 if no platforms connected,
+2 -6
View File
@@ -301,8 +301,6 @@ def build_session_context_prompt(
lines.append("")
lines.append("**Delivery options for scheduled tasks:**")
from hermes_constants import display_hermes_home
# Origin delivery
if context.source.platform == Platform.LOCAL:
lines.append("- `\"origin\"` → Local output (saved to files)")
@@ -311,11 +309,9 @@ def build_session_context_prompt(
_hash_chat_id(context.source.chat_id) if redact_pii else context.source.chat_id
)
lines.append(f"- `\"origin\"` → Back to this chat ({_origin_label})")
# Local always available
lines.append(
f"- `\"local\"` → Save to local files only ({display_hermes_home()}/cron/output/)"
)
lines.append("- `\"local\"` → Save to local files only (~/.hermes/cron/output/)")
# Platform home channels
for platform, home in context.home_channels.items():
+17 -34
View File
@@ -37,24 +37,18 @@ needs to replace the import + call site:
"""
from contextvars import ContextVar
from typing import Any
# Sentinel to distinguish "never set in this context" from "explicitly set to empty".
# When a contextvar holds _UNSET, we fall back to os.environ (CLI/cron compat).
# When it holds "" (after clear_session_vars resets it), we return "" — no fallback.
_UNSET: Any = object()
# ---------------------------------------------------------------------------
# Per-task session variables
# ---------------------------------------------------------------------------
_SESSION_PLATFORM: ContextVar = ContextVar("HERMES_SESSION_PLATFORM", default=_UNSET)
_SESSION_CHAT_ID: ContextVar = ContextVar("HERMES_SESSION_CHAT_ID", default=_UNSET)
_SESSION_CHAT_NAME: ContextVar = ContextVar("HERMES_SESSION_CHAT_NAME", default=_UNSET)
_SESSION_THREAD_ID: ContextVar = ContextVar("HERMES_SESSION_THREAD_ID", default=_UNSET)
_SESSION_USER_ID: ContextVar = ContextVar("HERMES_SESSION_USER_ID", default=_UNSET)
_SESSION_USER_NAME: ContextVar = ContextVar("HERMES_SESSION_USER_NAME", default=_UNSET)
_SESSION_KEY: ContextVar = ContextVar("HERMES_SESSION_KEY", default=_UNSET)
_SESSION_PLATFORM: ContextVar[str] = ContextVar("HERMES_SESSION_PLATFORM", default="")
_SESSION_CHAT_ID: ContextVar[str] = ContextVar("HERMES_SESSION_CHAT_ID", default="")
_SESSION_CHAT_NAME: ContextVar[str] = ContextVar("HERMES_SESSION_CHAT_NAME", default="")
_SESSION_THREAD_ID: ContextVar[str] = ContextVar("HERMES_SESSION_THREAD_ID", default="")
_SESSION_USER_ID: ContextVar[str] = ContextVar("HERMES_SESSION_USER_ID", default="")
_SESSION_USER_NAME: ContextVar[str] = ContextVar("HERMES_SESSION_USER_NAME", default="")
_SESSION_KEY: ContextVar[str] = ContextVar("HERMES_SESSION_KEY", default="")
_VAR_MAP = {
"HERMES_SESSION_PLATFORM": _SESSION_PLATFORM,
@@ -97,17 +91,10 @@ def set_session_vars(
def clear_session_vars(tokens: list) -> None:
"""Mark session context variables as explicitly cleared.
Sets all variables to ``""`` so that ``get_session_env`` returns an empty
string instead of falling back to (potentially stale) ``os.environ``
values. The *tokens* argument is accepted for API compatibility with
callers that saved the return value of ``set_session_vars``, but the
actual clearing uses ``var.set("")`` rather than ``var.reset(token)``
to ensure the "explicitly cleared" state is distinguishable from
"never set" (which holds the ``_UNSET`` sentinel).
"""
for var in (
"""Restore session context variables to their pre-handler values."""
if not tokens:
return
vars_in_order = [
_SESSION_PLATFORM,
_SESSION_CHAT_ID,
_SESSION_CHAT_NAME,
@@ -115,8 +102,9 @@ def clear_session_vars(tokens: list) -> None:
_SESSION_USER_ID,
_SESSION_USER_NAME,
_SESSION_KEY,
):
var.set("")
]
for var, token in zip(vars_in_order, tokens):
var.reset(token)
def get_session_env(name: str, default: str = "") -> str:
@@ -125,13 +113,8 @@ def get_session_env(name: str, default: str = "") -> str:
Drop-in replacement for ``os.getenv("HERMES_SESSION_*", default)``.
Resolution order:
1. Context variable (set by the gateway for concurrency-safe access).
If the variable was explicitly set (even to ``""``) via
``set_session_vars`` or ``clear_session_vars``, that value is
returned **no fallback to os.environ**.
2. ``os.environ`` (only when the context variable was never set in
this context i.e. CLI, cron scheduler, and test processes that
don't use ``set_session_vars`` at all).
1. Context variable (set by the gateway for concurrency-safe access)
2. ``os.environ`` (used by CLI, cron scheduler, and tests)
3. *default*
"""
import os
@@ -139,7 +122,7 @@ def get_session_env(name: str, default: str = "") -> str:
var = _VAR_MAP.get(name)
if var is not None:
value = var.get()
if value is not _UNSET:
if value:
return value
# Fall back to os.environ for CLI, cron, and test compatibility
return os.getenv(name, default)
+135 -1
View File
@@ -27,6 +27,7 @@ _RUNTIME_STATUS_FILE = "gateway_state.json"
_LOCKS_DIRNAME = "gateway-locks"
_IS_WINDOWS = sys.platform == "win32"
_UNSET = object()
_VALID_STARTUP_CHECK_STATES = {"pending", "ready", "failed"}
def _get_pid_path() -> Path:
@@ -162,11 +163,39 @@ def _build_runtime_status_record() -> dict[str, Any]:
"restart_requested": False,
"active_agents": 0,
"platforms": {},
"startup_checks": {},
"updated_at": _utc_now_iso(),
})
return payload
def _normalize_startup_check_entries(
startup_checks: Optional[dict[str, Any]],
) -> dict[str, dict[str, Any]]:
"""Normalize persisted startup readiness entries."""
if not isinstance(startup_checks, dict):
return {}
now = _utc_now_iso()
normalized: dict[str, dict[str, Any]] = {}
for raw_id, raw_payload in startup_checks.items():
check_id = str(raw_id).strip()
if not check_id:
continue
payload = raw_payload if isinstance(raw_payload, dict) else {}
state = str(payload.get("state", "pending")).strip().lower()
if state not in _VALID_STARTUP_CHECK_STATES:
state = "pending"
normalized[check_id] = {
"state": state,
"required": bool(payload.get("required", True)),
"source": payload.get("source"),
"detail": payload.get("detail"),
"updated_at": payload.get("updated_at") or now,
}
return normalized
def _read_json_file(path: Path) -> Optional[dict[str, Any]]:
if not path.exists():
return None
@@ -223,6 +252,7 @@ def write_runtime_status(
exit_reason: Any = _UNSET,
restart_requested: Any = _UNSET,
active_agents: Any = _UNSET,
startup_checks: Any = _UNSET,
platform: Any = _UNSET,
platform_state: Any = _UNSET,
error_code: Any = _UNSET,
@@ -245,6 +275,8 @@ def write_runtime_status(
payload["restart_requested"] = bool(restart_requested)
if active_agents is not _UNSET:
payload["active_agents"] = max(0, int(active_agents))
if startup_checks is not _UNSET:
payload["startup_checks"] = _normalize_startup_check_entries(startup_checks)
if platform is not _UNSET:
platform_payload = payload["platforms"].get(platform, {})
@@ -262,7 +294,109 @@ def write_runtime_status(
def read_runtime_status() -> Optional[dict[str, Any]]:
"""Read the persisted gateway runtime health/status information."""
return _read_json_file(_get_runtime_status_path())
payload = _read_json_file(_get_runtime_status_path())
if payload is None:
return None
payload.setdefault("platforms", {})
payload["startup_checks"] = _normalize_startup_check_entries(payload.get("startup_checks"))
return payload
def reset_startup_checks(checks: Optional[list[dict[str, Any]]] = None) -> dict[str, dict[str, Any]]:
"""Replace persisted startup readiness checks for the current run."""
normalized: dict[str, dict[str, Any]] = {}
now = _utc_now_iso()
for hook in checks or []:
if not isinstance(hook, dict):
continue
readiness = hook.get("startup_readiness")
if not isinstance(readiness, dict):
continue
check_id = str(readiness.get("id", "")).strip()
if not check_id:
continue
normalized[check_id] = {
"state": "pending",
"required": bool(readiness.get("required", True)),
"source": hook.get("name"),
"detail": None,
"updated_at": now,
}
write_runtime_status(startup_checks=normalized)
return normalized
def update_startup_check(
check_id: str,
state: str,
*,
detail: Any = _UNSET,
required: Any = _UNSET,
source: Any = _UNSET,
) -> dict[str, Any]:
"""Update a single startup readiness check in the runtime status file."""
normalized_id = str(check_id).strip()
if not normalized_id:
raise ValueError("startup readiness check id is required")
normalized_state = str(state).strip().lower()
if normalized_state not in _VALID_STARTUP_CHECK_STATES:
raise ValueError(f"invalid startup readiness state: {state}")
path = _get_runtime_status_path()
payload = _read_json_file(path) or _build_runtime_status_record()
checks = _normalize_startup_check_entries(payload.get("startup_checks"))
existing = checks.get(normalized_id, {})
now = _utc_now_iso()
checks[normalized_id] = {
"state": normalized_state,
"required": bool(existing.get("required", True) if required is _UNSET else required),
"source": existing.get("source") if source is _UNSET else source,
"detail": existing.get("detail") if detail is _UNSET else detail,
"updated_at": now,
}
payload["startup_checks"] = checks
payload.setdefault("platforms", {})
payload.setdefault("kind", _GATEWAY_KIND)
payload["pid"] = os.getpid()
payload["start_time"] = _get_process_start_time(os.getpid())
payload["updated_at"] = now
_write_json_file(path, payload)
return checks[normalized_id]
def mark_startup_check_pending(
check_id: str,
*,
detail: Any = _UNSET,
required: Any = _UNSET,
source: Any = _UNSET,
) -> dict[str, Any]:
return update_startup_check(check_id, "pending", detail=detail, required=required, source=source)
def mark_startup_check_ready(
check_id: str,
*,
detail: Any = _UNSET,
required: Any = _UNSET,
source: Any = _UNSET,
) -> dict[str, Any]:
return update_startup_check(check_id, "ready", detail=detail, required=required, source=source)
def mark_startup_check_failed(
check_id: str,
*,
detail: Any = _UNSET,
required: Any = _UNSET,
source: Any = _UNSET,
) -> dict[str, Any]:
return update_startup_check(check_id, "failed", detail=detail, required=required, source=source)
def remove_pid_file() -> None:
+4 -7
View File
@@ -609,15 +609,12 @@ class GatewayStreamConsumer:
content=text,
metadata=self.metadata,
)
# Note: do NOT set _already_sent = True here.
# Commentary messages are interim status updates (e.g. "Using browser
# tool..."), not the final response. Setting already_sent would cause
# the final response to be incorrectly suppressed when there are
# multiple tool calls. See: https://github.com/NousResearch/hermes-agent/issues/10454
return result.success
if result.success:
self._already_sent = True
return True
except Exception as e:
logger.error("Commentary send error: %s", e)
return False
return False
async def _send_or_edit(self, text: str) -> bool:
"""Send or edit the streaming message.
+2 -27
View File
@@ -274,14 +274,6 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
api_key_env_vars=("XIAOMI_API_KEY",),
base_url_env_var="XIAOMI_BASE_URL",
),
"bedrock": ProviderConfig(
id="bedrock",
name="AWS Bedrock",
auth_type="aws_sdk",
inference_base_url="https://bedrock-runtime.us-east-1.amazonaws.com",
api_key_env_vars=(),
base_url_env_var="BEDROCK_BASE_URL",
),
}
@@ -932,7 +924,6 @@ def resolve_provider(
"qwen-portal": "qwen-oauth", "qwen-cli": "qwen-oauth", "qwen-oauth": "qwen-oauth",
"hf": "huggingface", "hugging-face": "huggingface", "huggingface-hub": "huggingface",
"mimo": "xiaomi", "xiaomi-mimo": "xiaomi",
"aws": "bedrock", "aws-bedrock": "bedrock", "amazon-bedrock": "bedrock", "amazon": "bedrock",
"go": "opencode-go", "opencode-go-sub": "opencode-go",
"kilo": "kilocode", "kilo-code": "kilocode", "kilo-gateway": "kilocode",
# Local server aliases — route through the generic custom provider
@@ -989,15 +980,6 @@ def resolve_provider(
if has_usable_secret(os.getenv(env_var, "")):
return pid
# AWS Bedrock — detect via boto3 credential chain (IAM roles, SSO, env vars).
# This runs after API-key providers so explicit keys always win.
try:
from agent.bedrock_adapter import has_aws_credentials
if has_aws_credentials():
return "bedrock"
except ImportError:
pass # boto3 not installed — skip Bedrock auto-detection
raise AuthError(
"No inference provider configured. Run 'hermes model' to choose a "
"provider and model, or set an API key (OPENROUTER_API_KEY, "
@@ -2402,7 +2384,7 @@ def get_api_key_provider_status(provider_id: str) -> Dict[str, Any]:
if pconfig.base_url_env_var:
env_url = os.getenv(pconfig.base_url_env_var, "").strip()
if provider_id in ("kimi-coding", "kimi-coding-cn"):
if provider_id == "kimi-coding":
base_url = _resolve_kimi_base_url(api_key, pconfig.inference_base_url, env_url)
elif env_url:
base_url = env_url
@@ -2464,13 +2446,6 @@ def get_auth_status(provider_id: Optional[str] = None) -> Dict[str, Any]:
pconfig = PROVIDER_REGISTRY.get(target)
if pconfig and pconfig.auth_type == "api_key":
return get_api_key_provider_status(target)
# AWS SDK providers (Bedrock) — check via boto3 credential chain
if pconfig and pconfig.auth_type == "aws_sdk":
try:
from agent.bedrock_adapter import has_aws_credentials
return {"logged_in": has_aws_credentials(), "provider": target}
except ImportError:
return {"logged_in": False, "provider": target, "error": "boto3 not installed"}
return {"logged_in": False}
@@ -2495,7 +2470,7 @@ def resolve_api_key_provider_credentials(provider_id: str) -> Dict[str, Any]:
if pconfig.base_url_env_var:
env_url = os.getenv(pconfig.base_url_env_var, "").strip()
if provider_id in ("kimi-coding", "kimi-coding-cn"):
if provider_id == "kimi-coding":
base_url = _resolve_kimi_base_url(api_key, pconfig.inference_base_url, env_url)
elif provider_id == "zai":
base_url = _resolve_zai_base_url(api_key, pconfig.inference_base_url, env_url)
+1 -26
View File
@@ -4,7 +4,6 @@ from __future__ import annotations
from getpass import getpass
import math
import sys
import time
from types import SimpleNamespace
import uuid
@@ -161,10 +160,7 @@ def auth_add_command(args) -> None:
default_label = _api_key_default_label(len(pool.entries()) + 1)
label = (getattr(args, "label", None) or "").strip()
if not label:
if sys.stdin.isatty():
label = input(f"Label (optional, default: {default_label}): ").strip() or default_label
else:
label = default_label
label = input(f"Label (optional, default: {default_label}): ").strip() or default_label
entry = PooledCredential(
provider=provider,
id=uuid.uuid4().hex[:6],
@@ -372,27 +368,6 @@ def _interactive_auth() -> None:
print("=" * 50)
auth_list_command(SimpleNamespace(provider=None))
# Show AWS Bedrock credential status (not in the pool — uses boto3 chain)
try:
from agent.bedrock_adapter import has_aws_credentials, resolve_aws_auth_env_var, resolve_bedrock_region
if has_aws_credentials():
auth_source = resolve_aws_auth_env_var() or "unknown"
region = resolve_bedrock_region()
print(f"bedrock (AWS SDK credential chain):")
print(f" Auth: {auth_source}")
print(f" Region: {region}")
try:
import boto3
sts = boto3.client("sts", region_name=region)
identity = sts.get_caller_identity()
arn = identity.get("Arn", "unknown")
print(f" Identity: {arn}")
except Exception:
print(f" Identity: (could not resolve — boto3 STS call failed)")
print()
except ImportError:
pass # boto3 or bedrock_adapter not available
print()
# Main menu
+4 -19
View File
@@ -164,7 +164,7 @@ COMMAND_REGISTRY: list[CommandDef] = [
# Exit
CommandDef("quit", "Exit the CLI", "Exit",
cli_only=True, aliases=("exit",)),
cli_only=True, aliases=("exit", "q")),
]
@@ -450,7 +450,7 @@ def _collect_gateway_skill_entries(
name = sanitize_name(cmd_name) if sanitize_name else cmd_name
if not name:
continue
desc = plugin_cmds[cmd_name].get("description", "Plugin command")
desc = "Plugin command"
if len(desc) > desc_limit:
desc = desc[:desc_limit - 3] + "..."
plugin_pairs.append((name, desc))
@@ -844,7 +844,8 @@ class SlashCommandCompleter(Completer):
return None
return word
def _context_completions(self, word: str, limit: int = 30):
@staticmethod
def _context_completions(word: str, limit: int = 30):
"""Yield Claude Code-style @ context completions.
Bare ``@`` or ``@partial`` shows static references and matching
@@ -1139,22 +1140,6 @@ class SlashCommandCompleter(Completer):
display_meta=f"{short_desc}",
)
# Plugin-registered slash commands
try:
from hermes_cli.plugins import get_plugin_commands
for cmd_name, cmd_info in get_plugin_commands().items():
if cmd_name.startswith(word):
desc = str(cmd_info.get("description", "Plugin command"))
short_desc = desc[:50] + ("..." if len(desc) > 50 else "")
yield Completion(
self._completion_text(cmd_name, word),
start_position=-len(word),
display=f"/{cmd_name}",
display_meta=f"🔌 {short_desc}",
)
except Exception:
pass
# ---------------------------------------------------------------------------
# Inline auto-suggest (ghost text) for slash commands
+9 -156
View File
@@ -241,41 +241,13 @@ def _secure_dir(path):
pass
def _is_container() -> bool:
"""Detect if we're running inside a Docker/Podman/LXC container.
When Hermes runs in a container with volume-mounted config files, forcing
0o600 permissions breaks multi-process setups where the gateway and
dashboard run as different UIDs or the volume mount requires broader
permissions.
"""
# Explicit opt-out
if os.environ.get("HERMES_CONTAINER") or os.environ.get("HERMES_SKIP_CHMOD"):
return True
# Docker / Podman marker file
if os.path.exists("/.dockerenv"):
return True
# LXC / cgroup-based detection
try:
with open("/proc/1/cgroup", "r") as f:
cgroup_content = f.read()
if "docker" in cgroup_content or "lxc" in cgroup_content or "kubepods" in cgroup_content:
return True
except (OSError, IOError):
pass
return False
def _secure_file(path):
"""Set file to owner-only read/write (0600). No-op on Windows.
Skipped in managed mode the NixOS activation script sets
group-readable permissions (0640) on config files.
Skipped in containers Docker/Podman volume mounts often need broader
permissions. Set HERMES_SKIP_CHMOD=1 to force-skip on other systems.
"""
if is_managed() or _is_container():
if is_managed():
return
try:
if os.path.exists(str(path)):
@@ -447,27 +419,6 @@ DEFAULT_CONFIG = {
"protect_last_n": 20, # minimum recent messages to keep uncompressed
},
# AWS Bedrock provider configuration.
# Only used when model.provider is "bedrock".
"bedrock": {
"region": "", # AWS region for Bedrock API calls (empty = AWS_REGION env var → us-east-1)
"discovery": {
"enabled": True, # Auto-discover models via ListFoundationModels
"provider_filter": [], # Only show models from these providers (e.g. ["anthropic", "amazon"])
"refresh_interval": 3600, # Cache discovery results for this many seconds
},
"guardrail": {
# Amazon Bedrock Guardrails — content filtering and safety policies.
# Create a guardrail in the Bedrock console, then set the ID and version here.
# See: https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails.html
"guardrail_identifier": "", # e.g. "abc123def456"
"guardrail_version": "", # e.g. "1" or "DRAFT"
"stream_processing_mode": "async", # "sync" or "async"
"trace": "disabled", # "enabled", "disabled", or "enabled_full"
},
},
"smart_model_routing": {
"enabled": False,
"max_simple_chars": 160,
@@ -687,7 +638,6 @@ DEFAULT_CONFIG = {
"allowed_channels": "", # If set, bot ONLY responds in these channel IDs (whitelist)
"auto_thread": True, # Auto-create threads on @mention in channels (like Slack)
"reactions": True, # Add 👀/✅/❌ reactions to messages during processing
"channel_prompts": {}, # Per-channel ephemeral system prompts (forum parents apply to child threads)
},
# WhatsApp platform settings (gateway mode)
@@ -698,21 +648,6 @@ DEFAULT_CONFIG = {
# Supports \n for newlines, e.g. "🤖 *My Bot*\n──────\n"
},
# Telegram platform settings (gateway mode)
"telegram": {
"channel_prompts": {}, # Per-chat/topic ephemeral system prompts (topics inherit from parent group)
},
# Slack platform settings (gateway mode)
"slack": {
"channel_prompts": {}, # Per-channel ephemeral system prompts
},
# Mattermost platform settings (gateway mode)
"mattermost": {
"channel_prompts": {}, # Per-channel ephemeral system prompts
},
# Approval mode for dangerous commands:
# manual — always prompt the user (default)
# smart — use auxiliary LLM to auto-approve low-risk commands, prompt for high-risk
@@ -768,7 +703,7 @@ DEFAULT_CONFIG = {
},
# Config schema version - bump this when adding new required fields
"_config_version": 18,
"_config_version": 17,
}
# =============================================================================
@@ -1039,22 +974,6 @@ OPTIONAL_ENV_VARS = {
"category": "provider",
"advanced": True,
},
"AWS_REGION": {
"description": "AWS region for Bedrock API calls (e.g. us-east-1, eu-central-1)",
"prompt": "AWS Region",
"url": "https://docs.aws.amazon.com/bedrock/latest/userguide/bedrock-regions.html",
"password": False,
"category": "provider",
"advanced": True,
},
"AWS_PROFILE": {
"description": "AWS named profile for Bedrock authentication (from ~/.aws/credentials)",
"prompt": "AWS Profile",
"url": None,
"password": False,
"category": "provider",
"advanced": True,
},
# ── Tool API keys ──
"EXA_API_KEY": {
@@ -1252,12 +1171,6 @@ OPTIONAL_ENV_VARS = {
"password": False,
"category": "messaging",
},
"TELEGRAM_PROXY": {
"description": "Proxy URL for Telegram connections (overrides HTTPS_PROXY). Supports http://, https://, socks5://",
"prompt": "Telegram proxy URL (optional)",
"password": False,
"category": "messaging",
},
"DISCORD_BOT_TOKEN": {
"description": "Discord bot token from Developer Portal",
"prompt": "Discord bot token",
@@ -2853,47 +2766,6 @@ def sanitize_env_file() -> int:
return fixes
def _check_non_ascii_credential(key: str, value: str) -> str:
"""Warn and strip non-ASCII characters from credential values.
API keys and tokens must be pure ASCII they are sent as HTTP header
values which httpx/httpcore encode as ASCII. Non-ASCII characters
(commonly introduced by copy-pasting from rich-text editors or PDFs
that substitute lookalike Unicode glyphs for ASCII letters) cause
``UnicodeEncodeError: 'ascii' codec can't encode character`` at
request time.
Returns the sanitized (ASCII-only) value. Prints a warning if any
non-ASCII characters were found and removed.
"""
try:
value.encode("ascii")
return value # all ASCII — nothing to do
except UnicodeEncodeError:
pass
# Build a readable list of the offending characters
bad_chars: list[str] = []
for i, ch in enumerate(value):
if ord(ch) > 127:
bad_chars.append(f" position {i}: {ch!r} (U+{ord(ch):04X})")
sanitized = value.encode("ascii", errors="ignore").decode("ascii")
import sys
print(
f"\n Warning: {key} contains non-ASCII characters that will break API requests.\n"
f" This usually happens when copy-pasting from a PDF, rich-text editor,\n"
f" or web page that substitutes lookalike Unicode glyphs for ASCII letters.\n"
f"\n"
+ "\n".join(f" {line}" for line in bad_chars[:5])
+ ("\n ... and more" if len(bad_chars) > 5 else "")
+ f"\n\n The non-ASCII characters have been stripped automatically.\n"
f" If authentication fails, re-copy the key from the provider's dashboard.\n",
file=sys.stderr,
)
return sanitized
def save_env_value(key: str, value: str):
"""Save or update a value in ~/.hermes/.env."""
if is_managed():
@@ -2902,8 +2774,6 @@ def save_env_value(key: str, value: str):
if not _ENV_VAR_NAME_RE.match(key):
raise ValueError(f"Invalid environment variable name: {key!r}")
value = value.replace("\n", "").replace("\r", "")
# API keys / tokens must be ASCII — strip non-ASCII with a warning.
value = _check_non_ascii_credential(key, value)
ensure_hermes_home()
env_path = get_env_path()
@@ -2934,25 +2804,12 @@ def save_env_value(key: str, value: str):
lines.append(f"{key}={value}\n")
fd, tmp_path = tempfile.mkstemp(dir=str(env_path.parent), suffix='.tmp', prefix='.env_')
# Preserve original permissions so Docker volume mounts aren't clobbered.
original_mode = None
if env_path.exists():
try:
original_mode = stat.S_IMODE(env_path.stat().st_mode)
except OSError:
pass
try:
with os.fdopen(fd, 'w', **write_kw) as f:
f.writelines(lines)
f.flush()
os.fsync(f.fileno())
os.replace(tmp_path, env_path)
# Restore original permissions before _secure_file may tighten them.
if original_mode is not None:
try:
os.chmod(env_path, original_mode)
except OSError:
pass
except BaseException:
try:
os.unlink(tmp_path)
@@ -2963,6 +2820,13 @@ def save_env_value(key: str, value: str):
os.environ[key] = value
# Restrict .env permissions to owner-only (contains API keys)
if not _IS_WINDOWS:
try:
os.chmod(env_path, stat.S_IRUSR | stat.S_IWUSR)
except OSError:
pass
def remove_env_value(key: str) -> bool:
"""Remove a key from ~/.hermes/.env and os.environ.
@@ -2991,23 +2855,12 @@ def remove_env_value(key: str) -> bool:
if found:
fd, tmp_path = tempfile.mkstemp(dir=str(env_path.parent), suffix='.tmp', prefix='.env_')
# Preserve original permissions so Docker volume mounts aren't clobbered.
original_mode = None
try:
original_mode = stat.S_IMODE(env_path.stat().st_mode)
except OSError:
pass
try:
with os.fdopen(fd, 'w', **write_kw) as f:
f.writelines(new_lines)
f.flush()
os.fsync(f.fileno())
os.replace(tmp_path, env_path)
if original_mode is not None:
try:
os.chmod(env_path, original_mode)
except OSError:
pass
except BaseException:
try:
os.unlink(tmp_path)
+2 -143
View File
@@ -27,110 +27,6 @@ _DPASTE_COM_URL = "https://dpaste.com/api/"
# paste.rs caps at ~1 MB; we stay under that with headroom.
_MAX_LOG_BYTES = 512_000
# Auto-delete pastes after this many seconds (1 hour).
_AUTO_DELETE_SECONDS = 3600
# ---------------------------------------------------------------------------
# Privacy / delete helpers
# ---------------------------------------------------------------------------
_PRIVACY_NOTICE = """\
This will upload the following to a public paste service:
System info (OS, Python version, Hermes version, provider, which API keys
are configured NOT the actual keys)
Recent log lines (agent.log, errors.log, gateway.log may contain
conversation fragments and file paths)
Full agent.log and gateway.log (up to 512 KB each likely contains
conversation content, tool outputs, and file paths)
Pastes auto-delete after 1 hour.
"""
_GATEWAY_PRIVACY_NOTICE = (
"⚠️ **Privacy notice:** This uploads system info + recent log tails "
"(may contain conversation fragments) to a public paste service. "
"Full logs are NOT included from the gateway — use `hermes debug share` "
"from the CLI for full log uploads.\n"
"Pastes auto-delete after 1 hour."
)
def _extract_paste_id(url: str) -> Optional[str]:
"""Extract the paste ID from a paste.rs or dpaste.com URL.
Returns the ID string, or None if the URL doesn't match a known service.
"""
url = url.strip().rstrip("/")
for prefix in ("https://paste.rs/", "http://paste.rs/"):
if url.startswith(prefix):
return url[len(prefix):]
return None
def delete_paste(url: str) -> bool:
"""Delete a paste from paste.rs. Returns True on success.
Only paste.rs supports unauthenticated DELETE. dpaste.com pastes
expire automatically but cannot be deleted via API.
"""
paste_id = _extract_paste_id(url)
if not paste_id:
raise ValueError(
f"Cannot delete: only paste.rs URLs are supported. Got: {url}"
)
target = f"{_PASTE_RS_URL}{paste_id}"
req = urllib.request.Request(
target, method="DELETE",
headers={"User-Agent": "hermes-agent/debug-share"},
)
with urllib.request.urlopen(req, timeout=30) as resp:
return 200 <= resp.status < 300
def _schedule_auto_delete(urls: list[str], delay_seconds: int = _AUTO_DELETE_SECONDS):
"""Spawn a detached process to delete paste.rs pastes after *delay_seconds*.
The child process is fully detached (``start_new_session=True``) so it
survives the parent exiting (important for CLI mode). Only paste.rs
URLs are attempted dpaste.com pastes auto-expire on their own.
"""
import subprocess
paste_rs_urls = [u for u in urls if _extract_paste_id(u)]
if not paste_rs_urls:
return
# Build a tiny inline Python script. No imports beyond stdlib.
url_list = ", ".join(f'"{u}"' for u in paste_rs_urls)
script = (
"import time, urllib.request; "
f"time.sleep({delay_seconds}); "
f"[urllib.request.urlopen(urllib.request.Request(u, method='DELETE', "
f"headers={{'User-Agent': 'hermes-agent/auto-delete'}}), timeout=15) "
f"for u in [{url_list}]]"
)
try:
subprocess.Popen(
[sys.executable, "-c", script],
start_new_session=True,
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
)
except Exception:
pass # Best-effort; manual delete still available.
def _delete_hint(url: str) -> str:
"""Return a one-liner delete command for the given paste URL."""
paste_id = _extract_paste_id(url)
if paste_id:
return f"hermes debug delete {url}"
# dpaste.com — no API delete, expires on its own.
return "(auto-expires per dpaste.com policy)"
def _upload_paste_rs(content: str) -> str:
"""Upload to paste.rs. Returns the paste URL.
@@ -354,9 +250,6 @@ def run_debug_share(args):
expiry = getattr(args, "expire", 7)
local_only = getattr(args, "local", False)
if not local_only:
print(_PRIVACY_NOTICE)
print("Collecting debug report...")
# Capture dump once — prepended to every paste for context.
@@ -422,56 +315,22 @@ def run_debug_share(args):
if failures:
print(f"\n (failed to upload: {', '.join(failures)})")
# Schedule auto-deletion after 1 hour
_schedule_auto_delete(list(urls.values()))
print(f"\n⏱ Pastes will auto-delete in 1 hour.")
# Manual delete fallback
print(f"To delete now: hermes debug delete <url>")
print(f"\nShare these links with the Hermes team for support.")
def run_debug_delete(args):
"""Delete one or more paste URLs uploaded by /debug."""
urls = getattr(args, "urls", [])
if not urls:
print("Usage: hermes debug delete <url> [<url> ...]")
print(" Deletes paste.rs pastes uploaded by 'hermes debug share'.")
return
for url in urls:
try:
ok = delete_paste(url)
if ok:
print(f" ✓ Deleted: {url}")
else:
print(f" ✗ Failed to delete: {url} (unexpected response)")
except ValueError as exc:
print(f"{exc}")
except Exception as exc:
print(f" ✗ Could not delete {url}: {exc}")
def run_debug(args):
"""Route debug subcommands."""
subcmd = getattr(args, "debug_command", None)
if subcmd == "share":
run_debug_share(args)
elif subcmd == "delete":
run_debug_delete(args)
else:
# Default: show help
print("Usage: hermes debug <command>")
print("Usage: hermes debug share [--lines N] [--expire N] [--local]")
print()
print("Commands:")
print(" share Upload debug report to a paste service and print URL")
print(" delete Delete a previously uploaded paste")
print()
print("Options (share):")
print("Options:")
print(" --lines N Number of log lines to include (default: 200)")
print(" --expire N Paste expiry in days (default: 7)")
print(" --local Print report locally instead of uploading")
print()
print("Options (delete):")
print(" <url> ... One or more paste URLs to delete")
+2 -109
View File
@@ -8,7 +8,6 @@ import os
import sys
import subprocess
import shutil
from pathlib import Path
from hermes_cli.config import get_project_root, get_hermes_home, get_env_path
from hermes_constants import display_hermes_home
@@ -514,87 +513,7 @@ def run_doctor(args):
pass
_check_gateway_service_linger(issues)
# =========================================================================
# Check: Command installation (hermes bin symlink)
# =========================================================================
if sys.platform != "win32":
print()
print(color("◆ Command Installation", Colors.CYAN, Colors.BOLD))
# Determine the venv entry point location
_venv_bin = None
for _venv_name in ("venv", ".venv"):
_candidate = PROJECT_ROOT / _venv_name / "bin" / "hermes"
if _candidate.exists():
_venv_bin = _candidate
break
# Determine the expected command link directory (mirrors install.sh logic)
_prefix = os.environ.get("PREFIX", "")
_is_termux_env = bool(os.environ.get("TERMUX_VERSION")) or "com.termux/files/usr" in _prefix
if _is_termux_env and _prefix:
_cmd_link_dir = Path(_prefix) / "bin"
_cmd_link_display = "$PREFIX/bin"
else:
_cmd_link_dir = Path.home() / ".local" / "bin"
_cmd_link_display = "~/.local/bin"
_cmd_link = _cmd_link_dir / "hermes"
if _venv_bin is None:
check_warn(
"Venv entry point not found",
"(hermes not in venv/bin/ or .venv/bin/ — reinstall with pip install -e '.[all]')"
)
manual_issues.append(
f"Reinstall entry point: cd {PROJECT_ROOT} && source venv/bin/activate && pip install -e '.[all]'"
)
else:
check_ok(f"Venv entry point exists ({_venv_bin.relative_to(PROJECT_ROOT)})")
# Check the symlink at the command link location
if _cmd_link.is_symlink():
_target = _cmd_link.resolve()
_expected = _venv_bin.resolve()
if _target == _expected:
check_ok(f"{_cmd_link_display}/hermes → correct target")
else:
check_warn(
f"{_cmd_link_display}/hermes points to wrong target",
f"(→ {_target}, expected → {_expected})"
)
if should_fix:
_cmd_link.unlink()
_cmd_link.symlink_to(_venv_bin)
check_ok(f"Fixed symlink: {_cmd_link_display}/hermes → {_venv_bin}")
fixed_count += 1
else:
issues.append(f"Broken symlink at {_cmd_link_display}/hermes — run 'hermes doctor --fix'")
elif _cmd_link.exists():
# It's a regular file, not a symlink — possibly a wrapper script
check_ok(f"{_cmd_link_display}/hermes exists (non-symlink)")
else:
check_fail(
f"{_cmd_link_display}/hermes not found",
"(hermes command may not work outside the venv)"
)
if should_fix:
_cmd_link_dir.mkdir(parents=True, exist_ok=True)
_cmd_link.symlink_to(_venv_bin)
check_ok(f"Created symlink: {_cmd_link_display}/hermes → {_venv_bin}")
fixed_count += 1
# Check if the link dir is on PATH
_path_dirs = os.environ.get("PATH", "").split(os.pathsep)
if str(_cmd_link_dir) not in _path_dirs:
check_warn(
f"{_cmd_link_display} is not on your PATH",
"(add it to your shell config: export PATH=\"$HOME/.local/bin:$PATH\")"
)
manual_issues.append(f"Add {_cmd_link_display} to your PATH")
else:
issues.append(f"Missing {_cmd_link_display}/hermes symlink — run 'hermes doctor --fix'")
# =========================================================================
# Check: External tools
# =========================================================================
@@ -814,8 +733,7 @@ def run_doctor(args):
("Vercel AI Gateway", ("AI_GATEWAY_API_KEY",), "https://ai-gateway.vercel.sh/v1/models", "AI_GATEWAY_BASE_URL", True),
("Kilo Code", ("KILOCODE_API_KEY",), "https://api.kilo.ai/api/gateway/models", "KILOCODE_BASE_URL", True),
("OpenCode Zen", ("OPENCODE_ZEN_API_KEY",), "https://opencode.ai/zen/v1/models", "OPENCODE_ZEN_BASE_URL", True),
# OpenCode Go has no shared /models endpoint; skip the health check.
("OpenCode Go", ("OPENCODE_GO_API_KEY",), None, "OPENCODE_GO_BASE_URL", False),
("OpenCode Go", ("OPENCODE_GO_API_KEY",), "https://opencode.ai/zen/go/v1/models", "OPENCODE_GO_BASE_URL", True),
]
for _pname, _env_vars, _default_url, _base_env, _supports_health_check in _apikey_providers:
_key = ""
@@ -860,31 +778,6 @@ def run_doctor(args):
except Exception as _e:
print(f"\r {color('', Colors.YELLOW)} {_label} {color(f'({_e})', Colors.DIM)} ")
# -- AWS Bedrock --
# Bedrock uses the AWS SDK credential chain, not API keys.
try:
from agent.bedrock_adapter import has_aws_credentials, resolve_aws_auth_env_var, resolve_bedrock_region
if has_aws_credentials():
_auth_var = resolve_aws_auth_env_var()
_region = resolve_bedrock_region()
_label = "AWS Bedrock".ljust(20)
print(f" Checking AWS Bedrock...", end="", flush=True)
try:
import boto3
_br_client = boto3.client("bedrock", region_name=_region)
_br_resp = _br_client.list_foundation_models()
_model_count = len(_br_resp.get("modelSummaries", []))
print(f"\r {color('', Colors.GREEN)} {_label} {color(f'({_auth_var}, {_region}, {_model_count} models)', Colors.DIM)} ")
except ImportError:
print(f"\r {color('', Colors.YELLOW)} {_label} {color('(boto3 not installed — pip install hermes-agent[bedrock])', Colors.DIM)} ")
issues.append("Install boto3 for Bedrock: pip install hermes-agent[bedrock]")
except Exception as _e:
_err_name = type(_e).__name__
print(f"\r {color('', Colors.YELLOW)} {_label} {color(f'({_err_name}: {_e})', Colors.DIM)} ")
issues.append(f"AWS Bedrock: {_err_name} — check IAM permissions for bedrock:ListFoundationModels")
except ImportError:
pass # bedrock_adapter not available — skip silently
# =========================================================================
# Check: Submodules
# =========================================================================
-29
View File
@@ -8,40 +8,11 @@ from pathlib import Path
from dotenv import load_dotenv
# Env var name suffixes that indicate credential values. These are the
# only env vars whose values we sanitize on load — we must not silently
# alter arbitrary user env vars, but credentials are known to require
# pure ASCII (they become HTTP header values).
_CREDENTIAL_SUFFIXES = ("_API_KEY", "_TOKEN", "_SECRET", "_KEY")
def _sanitize_loaded_credentials() -> None:
"""Strip non-ASCII characters from credential env vars in os.environ.
Called after dotenv loads so the rest of the codebase never sees
non-ASCII API keys. Only touches env vars whose names end with
known credential suffixes (``_API_KEY``, ``_TOKEN``, etc.).
"""
for key, value in list(os.environ.items()):
if not any(key.endswith(suffix) for suffix in _CREDENTIAL_SUFFIXES):
continue
try:
value.encode("ascii")
except UnicodeEncodeError:
os.environ[key] = value.encode("ascii", errors="ignore").decode("ascii")
def _load_dotenv_with_fallback(path: Path, *, override: bool) -> None:
try:
load_dotenv(dotenv_path=path, override=override, encoding="utf-8")
except UnicodeDecodeError:
load_dotenv(dotenv_path=path, override=override, encoding="latin-1")
# Strip non-ASCII characters from credential env vars that were just
# loaded. API keys must be pure ASCII since they're sent as HTTP
# header values (httpx encodes headers as ASCII). Non-ASCII chars
# typically come from copy-pasting keys from PDFs or rich-text editors
# that substitute Unicode lookalike glyphs (e.g. ʋ U+028B for v).
_sanitize_loaded_credentials()
def _sanitize_env_file_if_needed(path: Path) -> None:
+127 -111
View File
@@ -10,6 +10,7 @@ import shutil
import signal
import subprocess
import sys
import time
from pathlib import Path
PROJECT_ROOT = Path(__file__).parent.parent.resolve()
@@ -37,6 +38,10 @@ from hermes_cli.setup import (
from hermes_cli.colors import Colors, color
_SERVICE_READINESS_TIMEOUT = 30.0
_SERVICE_READINESS_POLL_INTERVAL = 0.2
# =============================================================================
# Process Management (for manual gateway runs)
# =============================================================================
@@ -222,7 +227,7 @@ def find_gateway_pids(exclude_pids: set | None = None, all_profiles: bool = Fals
current_cmd = ""
else:
result = subprocess.run(
["ps", "-A", "eww", "-o", "pid=,command="],
["ps", "eww", "-ax", "-o", "pid=,command="],
capture_output=True,
text=True,
timeout=10,
@@ -715,9 +720,7 @@ def _detect_venv_dir() -> Path | None:
"""Detect the active virtualenv directory.
Checks ``sys.prefix`` first (works regardless of the directory name),
then ``VIRTUAL_ENV`` env var (covers uv-managed environments where
sys.prefix == sys.base_prefix), then falls back to probing common
directory names under PROJECT_ROOT.
then falls back to probing common directory names under PROJECT_ROOT.
Returns ``None`` when no virtualenv can be found.
"""
# If we're running inside a virtualenv, sys.prefix points to it.
@@ -726,15 +729,6 @@ def _detect_venv_dir() -> Path | None:
if venv.is_dir():
return venv
# uv and some other tools set VIRTUAL_ENV without changing sys.prefix.
# This catches `uv run` where sys.prefix == sys.base_prefix but the
# environment IS a venv. (#8620)
_virtual_env = os.environ.get("VIRTUAL_ENV")
if _virtual_env:
venv = Path(_virtual_env)
if venv.is_dir():
return venv
# Fallback: check common virtualenv directory names under the project root.
for candidate in (".venv", "venv"):
venv = PROJECT_ROOT / candidate
@@ -1111,12 +1105,123 @@ def systemd_uninstall(system: bool = False):
print(f"{_service_scope_label(system).capitalize()} service uninstalled")
def _describe_startup_check(check_id: str, check: dict) -> str:
source = check.get("source")
detail = check.get("detail")
label = f"{check_id} ({source})" if source and source != check_id else check_id
return f"{label}: {detail}" if detail else label
def _classify_startup_checks(state: dict | None) -> tuple[list[str], list[str], list[str]]:
checks = (state or {}).get("startup_checks") or {}
pending_required: list[str] = []
failed_required: list[str] = []
optional_warnings: list[str] = []
if not isinstance(checks, dict):
return pending_required, failed_required, optional_warnings
for check_id, raw_check in checks.items():
check = raw_check if isinstance(raw_check, dict) else {}
label = _describe_startup_check(str(check_id), check)
check_state = str(check.get("state", "pending")).strip().lower()
required = bool(check.get("required", True))
if check_state == "ready":
continue
if required:
if check_state == "failed":
failed_required.append(label)
else:
pending_required.append(label)
else:
prefix = "failed" if check_state == "failed" else "pending"
optional_warnings.append(f"{prefix}: {label}")
return pending_required, failed_required, optional_warnings
def _wait_for_service_readiness(
*,
action: str,
previous_pid: int | None = None,
timeout: float = _SERVICE_READINESS_TIMEOUT,
poll_interval: float = _SERVICE_READINESS_POLL_INTERVAL,
) -> list[str]:
from gateway.status import get_running_pid, read_runtime_status
deadline = time.monotonic() + timeout
last_pending: list[str] = []
while time.monotonic() < deadline:
live_pid = get_running_pid()
if live_pid is None or (previous_pid is not None and live_pid == previous_pid):
time.sleep(poll_interval)
continue
runtime = read_runtime_status() or {}
try:
runtime_pid = int(runtime.get("pid"))
except (TypeError, ValueError):
runtime_pid = None
if runtime_pid != live_pid:
time.sleep(poll_interval)
continue
gateway_state = runtime.get("gateway_state")
pending_required, failed_required, optional_warnings = _classify_startup_checks(runtime)
last_pending = pending_required
if gateway_state == "startup_failed":
reason = runtime.get("exit_reason") or f"gateway {action} failed during startup"
raise RuntimeError(reason)
if failed_required:
raise RuntimeError(
"required startup checks failed: " + "; ".join(failed_required)
)
if gateway_state == "running" and not pending_required:
return optional_warnings
time.sleep(poll_interval)
if last_pending:
raise RuntimeError(
"timed out waiting for required startup checks: " + "; ".join(last_pending)
)
if previous_pid is not None:
raise RuntimeError(
f"timed out waiting for gateway {action}; previous process is still active or no new runtime became ready"
)
raise RuntimeError(f"timed out waiting for gateway {action} readiness")
def _await_service_ready_or_exit(
*,
action: str,
previous_pid: int | None = None,
timeout: float = _SERVICE_READINESS_TIMEOUT,
) -> None:
try:
optional_warnings = _wait_for_service_readiness(
action=action,
previous_pid=previous_pid,
timeout=timeout,
)
except RuntimeError as exc:
print_error(f" Gateway {action} did not become ready: {exc}")
raise SystemExit(1) from exc
for warning in optional_warnings:
print_warning(f" Optional startup check {warning}")
def systemd_start(system: bool = False):
system = _select_systemd_scope(system)
if system:
_require_root_for_system_service("start")
refresh_systemd_unit_if_needed(system=system)
_run_systemctl(["start", get_service_name()], system=system, check=True, timeout=30)
_await_service_ready_or_exit(action="start")
print(f"{_service_scope_label(system).capitalize()} service started")
@@ -1139,64 +1244,11 @@ def systemd_restart(system: bool = False):
pid = get_running_pid()
if pid is not None and _request_gateway_self_restart(pid):
# SIGUSR1 sent — the gateway will drain active agents, exit with
# code 75, and systemd will restart it after RestartSec (30s).
# Wait for the old process to die and the new one to become active
# so the CLI doesn't return while the service is still restarting.
import time
scope_label = _service_scope_label(system).capitalize()
svc = get_service_name()
scope_cmd = _systemctl_cmd(system)
# Phase 1: wait for old process to exit (drain + shutdown)
print(f"{scope_label} service draining active work...")
deadline = time.time() + 90
while time.time() < deadline:
try:
os.kill(pid, 0)
time.sleep(1)
except (ProcessLookupError, PermissionError):
break # old process is gone
else:
print(f"⚠ Old process (PID {pid}) still alive after 90s")
# Phase 2: wait for systemd to start the new process
print(f"⏳ Waiting for {svc} to restart...")
deadline = time.time() + 60
while time.time() < deadline:
try:
result = subprocess.run(
scope_cmd + ["is-active", svc],
capture_output=True, text=True, timeout=5,
)
if result.stdout.strip() == "active":
# Verify it's a NEW process, not the old one somehow
new_pid = get_running_pid()
if new_pid and new_pid != pid:
print(f"{scope_label} service restarted (PID {new_pid})")
return
except (subprocess.TimeoutExpired, FileNotFoundError):
pass
time.sleep(2)
# Timed out — check final state
try:
result = subprocess.run(
scope_cmd + ["is-active", svc],
capture_output=True, text=True, timeout=5,
)
if result.stdout.strip() == "active":
print(f"{scope_label} service restarted")
return
except Exception:
pass
print(
f"{scope_label} service did not become active within 60s.\n"
f" Check status: {'sudo ' if system else ''}hermes gateway status\n"
f" Check logs: journalctl {'--user ' if not system else ''}-u {svc} --since '2 min ago'"
)
_await_service_ready_or_exit(action="restart", previous_pid=pid)
print(f"{_service_scope_label(system).capitalize()} service restarted")
return
_run_systemctl(["reload-or-restart", get_service_name()], system=system, check=True, timeout=90)
_await_service_ready_or_exit(action="restart", previous_pid=pid)
print(f"{_service_scope_label(system).capitalize()} service restarted")
@@ -1455,6 +1507,7 @@ def launchd_start():
plist_path.write_text(generate_launchd_plist(), encoding="utf-8")
subprocess.run(["launchctl", "bootstrap", _launchd_domain(), str(plist_path)], check=True, timeout=30)
subprocess.run(["launchctl", "kickstart", f"{_launchd_domain()}/{label}"], check=True, timeout=30)
_await_service_ready_or_exit(action="start")
print("✓ Service started")
return
@@ -1467,6 +1520,7 @@ def launchd_start():
print("↻ launchd job was unloaded; reloading service definition")
subprocess.run(["launchctl", "bootstrap", _launchd_domain(), str(plist_path)], check=True, timeout=30)
subprocess.run(["launchctl", "kickstart", f"{_launchd_domain()}/{label}"], check=True, timeout=30)
_await_service_ready_or_exit(action="start")
print("✓ Service started")
def launchd_stop():
@@ -1537,7 +1591,8 @@ def launchd_restart():
try:
pid = get_running_pid()
if pid is not None and _request_gateway_self_restart(pid):
print("✓ Service restart requested")
_await_service_ready_or_exit(action="restart", previous_pid=pid)
print("✓ Service restarted")
return
if pid is not None:
try:
@@ -1549,6 +1604,7 @@ def launchd_restart():
if not exited:
print(f"⚠ Gateway drain timed out after {drain_timeout:.0f}s — forcing launchd restart")
subprocess.run(["launchctl", "kickstart", "-k", target], check=True, timeout=90)
_await_service_ready_or_exit(action="restart", previous_pid=pid)
print("✓ Service restarted")
except subprocess.CalledProcessError as e:
if e.returncode not in (3, 113):
@@ -1558,6 +1614,7 @@ def launchd_restart():
plist_path = get_launchd_plist_path()
subprocess.run(["launchctl", "bootstrap", _launchd_domain(), str(plist_path)], check=True, timeout=30)
subprocess.run(["launchctl", "kickstart", target], check=True, timeout=30)
_await_service_ready_or_exit(action="restart", previous_pid=pid)
print("✓ Service restarted")
def launchd_status(deep: bool = False):
@@ -2930,15 +2987,6 @@ def gateway_command(args):
elif subcmd == "start":
system = getattr(args, 'system', False)
start_all = getattr(args, 'all', False)
if start_all:
# Kill all stale gateway processes across all profiles before starting
killed = kill_gateway_processes(all_profiles=True)
if killed:
print(f"✓ Killed {killed} stale gateway process(es) across all profiles")
_wait_for_gateway_exit(timeout=10.0, force_after=5.0)
if is_termux():
print("Gateway service start is not supported on Termux because there is no system service manager.")
print("Run manually: hermes gateway")
@@ -3024,39 +3072,7 @@ def gateway_command(args):
# Try service first, fall back to killing and restarting
service_available = False
system = getattr(args, 'system', False)
restart_all = getattr(args, 'all', False)
service_configured = False
if restart_all:
# --all: stop every gateway process across all profiles, then start fresh
service_stopped = False
if supports_systemd_services() and (get_systemd_unit_path(system=False).exists() or get_systemd_unit_path(system=True).exists()):
try:
systemd_stop(system=system)
service_stopped = True
except subprocess.CalledProcessError:
pass
elif is_macos() and get_launchd_plist_path().exists():
try:
launchd_stop()
service_stopped = True
except subprocess.CalledProcessError:
pass
killed = kill_gateway_processes(all_profiles=True)
total = killed + (1 if service_stopped else 0)
if total:
print(f"✓ Stopped {total} gateway process(es) across all profiles")
_wait_for_gateway_exit(timeout=10.0, force_after=5.0)
# Start the current profile's service fresh
print("Starting gateway...")
if supports_systemd_services() and (get_systemd_unit_path(system=False).exists() or get_systemd_unit_path(system=True).exists()):
systemd_start(system=system)
elif is_macos() and get_launchd_plist_path().exists():
launchd_start()
else:
run_gateway(verbose=0)
return
if supports_systemd_services() and (get_systemd_unit_path(system=False).exists() or get_systemd_unit_path(system=True).exists()):
service_configured = True
+1 -295
View File
@@ -1139,8 +1139,6 @@ def select_provider_and_model(args=None):
_model_flow_anthropic(config, current_model)
elif selected_provider == "kimi-coding":
_model_flow_kimi(config, current_model)
elif selected_provider == "bedrock":
_model_flow_bedrock(config, current_model)
elif selected_provider in ("gemini", "deepseek", "xai", "zai", "kimi-coding-cn", "minimax", "minimax-cn", "kilocode", "opencode-zen", "opencode-go", "ai-gateway", "alibaba", "huggingface", "xiaomi", "arcee"):
_model_flow_api_key_provider(config, selected_provider, current_model)
@@ -2427,252 +2425,6 @@ def _model_flow_kimi(config, current_model=""):
print("No change.")
def _model_flow_bedrock_api_key(config, region, current_model=""):
"""Bedrock API Key mode — uses the OpenAI-compatible bedrock-mantle endpoint.
For developers who don't have an AWS account but received a Bedrock API Key
from their AWS admin. Works like any OpenAI-compatible endpoint.
"""
from hermes_cli.auth import _prompt_model_selection, _save_model_choice, deactivate_provider
from hermes_cli.config import load_config, save_config, get_env_value, save_env_value
from hermes_cli.models import _PROVIDER_MODELS
mantle_base_url = f"https://bedrock-mantle.{region}.api.aws/v1"
# Prompt for API key
existing_key = get_env_value("AWS_BEARER_TOKEN_BEDROCK") or ""
if existing_key:
print(f" Bedrock API Key: {existing_key[:12]}... ✓")
else:
print(f" Endpoint: {mantle_base_url}")
print()
try:
import getpass
api_key = getpass.getpass(" Bedrock API Key: ").strip()
except (KeyboardInterrupt, EOFError):
print()
return
if not api_key:
print(" Cancelled.")
return
save_env_value("AWS_BEARER_TOKEN_BEDROCK", api_key)
existing_key = api_key
print(" ✓ API key saved.")
print()
# Model selection — use static list (mantle doesn't need boto3 for discovery)
model_list = _PROVIDER_MODELS.get("bedrock", [])
print(f" Showing {len(model_list)} curated models")
if model_list:
selected = _prompt_model_selection(model_list, current_model=current_model)
else:
try:
selected = input(" Model ID: ").strip()
except (KeyboardInterrupt, EOFError):
selected = None
if selected:
_save_model_choice(selected)
# Save as custom provider pointing to bedrock-mantle
cfg = load_config()
model = cfg.get("model")
if not isinstance(model, dict):
model = {"default": model} if model else {}
cfg["model"] = model
model["provider"] = "custom"
model["base_url"] = mantle_base_url
model.pop("api_mode", None) # chat_completions is the default
# Also save region in bedrock config for reference
bedrock_cfg = cfg.get("bedrock", {})
if not isinstance(bedrock_cfg, dict):
bedrock_cfg = {}
bedrock_cfg["region"] = region
cfg["bedrock"] = bedrock_cfg
# Save the API key env var name so hermes knows where to find it
save_env_value("OPENAI_API_KEY", existing_key)
save_env_value("OPENAI_BASE_URL", mantle_base_url)
save_config(cfg)
deactivate_provider()
print(f" Default model set to: {selected} (via Bedrock API Key, {region})")
print(f" Endpoint: {mantle_base_url}")
else:
print(" No change.")
def _model_flow_bedrock(config, current_model=""):
"""AWS Bedrock provider: verify credentials, pick region, discover models.
Uses the native Converse API via boto3 not the OpenAI-compatible endpoint.
Auth is handled by the AWS SDK default credential chain (env vars, profile,
instance role), so no API key prompt is needed.
"""
from hermes_cli.auth import _prompt_model_selection, _save_model_choice, deactivate_provider
from hermes_cli.config import load_config, save_config
from hermes_cli.models import _PROVIDER_MODELS
# 1. Check for AWS credentials
try:
from agent.bedrock_adapter import (
has_aws_credentials,
resolve_aws_auth_env_var,
resolve_bedrock_region,
discover_bedrock_models,
)
except ImportError:
print(" ✗ boto3 is not installed. Install it with:")
print(" pip install boto3")
print()
return
if not has_aws_credentials():
print(" ⚠ No AWS credentials detected via environment variables.")
print(" Bedrock will use boto3's default credential chain (IMDS, SSO, etc.)")
print()
auth_var = resolve_aws_auth_env_var()
if auth_var:
print(f" AWS credentials: {auth_var}")
else:
print(" AWS credentials: boto3 default chain (instance role / SSO)")
print()
# 2. Region selection
current_region = resolve_bedrock_region()
try:
region_input = input(f" AWS Region [{current_region}]: ").strip()
except (KeyboardInterrupt, EOFError):
print()
return
region = region_input or current_region
# 2b. Authentication mode
print(" Choose authentication method:")
print()
print(" 1. IAM credential chain (recommended)")
print(" Works with EC2 instance roles, SSO, env vars, aws configure")
print(" 2. Bedrock API Key")
print(" Enter your Bedrock API Key directly — also supports")
print(" team scenarios where an admin distributes keys")
print()
try:
auth_choice = input(" Choice [1]: ").strip()
except (KeyboardInterrupt, EOFError):
print()
return
if auth_choice == "2":
_model_flow_bedrock_api_key(config, region, current_model)
return
# 3. Model discovery — try live API first, fall back to static list
print(f" Discovering models in {region}...")
live_models = discover_bedrock_models(region)
if live_models:
_EXCLUDE_PREFIXES = (
"stability.", "cohere.embed", "twelvelabs.", "us.stability.",
"us.cohere.embed", "us.twelvelabs.", "global.cohere.embed",
"global.twelvelabs.",
)
_EXCLUDE_SUBSTRINGS = ("safeguard", "voxtral", "palmyra-vision")
filtered = []
for m in live_models:
mid = m["id"]
if any(mid.startswith(p) for p in _EXCLUDE_PREFIXES):
continue
if any(s in mid.lower() for s in _EXCLUDE_SUBSTRINGS):
continue
filtered.append(m)
# Deduplicate: prefer inference profiles (us.*, global.*) over bare
# foundation model IDs.
profile_base_ids = set()
for m in filtered:
mid = m["id"]
if mid.startswith(("us.", "global.")):
base = mid.split(".", 1)[1] if "." in mid[3:] else mid
profile_base_ids.add(base)
deduped = []
for m in filtered:
mid = m["id"]
if not mid.startswith(("us.", "global.")) and mid in profile_base_ids:
continue
deduped.append(m)
_RECOMMENDED = [
"us.anthropic.claude-sonnet-4-6",
"us.anthropic.claude-opus-4-6",
"us.anthropic.claude-haiku-4-5",
"us.amazon.nova-pro",
"us.amazon.nova-lite",
"us.amazon.nova-micro",
"deepseek.v3",
"us.meta.llama4-maverick",
"us.meta.llama4-scout",
]
def _sort_key(m):
mid = m["id"]
for i, rec in enumerate(_RECOMMENDED):
if mid.startswith(rec):
return (0, i, mid)
if mid.startswith("global."):
return (1, 0, mid)
return (2, 0, mid)
deduped.sort(key=_sort_key)
model_list = [m["id"] for m in deduped]
print(f" Found {len(model_list)} text model(s) (filtered from {len(live_models)} total)")
else:
model_list = _PROVIDER_MODELS.get("bedrock", [])
if model_list:
print(f" Using {len(model_list)} curated models (live discovery unavailable)")
else:
print(" No models found. Check IAM permissions for bedrock:ListFoundationModels.")
return
# 4. Model selection
if model_list:
selected = _prompt_model_selection(model_list, current_model=current_model)
else:
try:
selected = input(" Model ID: ").strip()
except (KeyboardInterrupt, EOFError):
selected = None
if selected:
_save_model_choice(selected)
cfg = load_config()
model = cfg.get("model")
if not isinstance(model, dict):
model = {"default": model} if model else {}
cfg["model"] = model
model["provider"] = "bedrock"
model["base_url"] = f"https://bedrock-runtime.{region}.amazonaws.com"
model.pop("api_mode", None) # bedrock_converse is auto-detected
bedrock_cfg = cfg.get("bedrock", {})
if not isinstance(bedrock_cfg, dict):
bedrock_cfg = {}
bedrock_cfg["region"] = region
cfg["bedrock"] = bedrock_cfg
save_config(cfg)
deactivate_provider()
print(f" Default model set to: {selected} (via AWS Bedrock, {region})")
else:
print(" No change.")
def _model_flow_api_key_provider(config, provider_id, current_model=""):
"""Generic flow for API-key providers (z.ai, MiniMax, OpenCode, etc.)."""
from hermes_cli.auth import (
@@ -4997,7 +4749,6 @@ For more help on a command:
# gateway start
gateway_start = gateway_subparsers.add_parser("start", help="Start the installed systemd/launchd background service")
gateway_start.add_argument("--system", action="store_true", help="Target the Linux system-level gateway service")
gateway_start.add_argument("--all", action="store_true", help="Kill ALL stale gateway processes across all profiles before starting")
# gateway stop
gateway_stop = gateway_subparsers.add_parser("stop", help="Stop gateway service")
@@ -5007,7 +4758,6 @@ For more help on a command:
# gateway restart
gateway_restart = gateway_subparsers.add_parser("restart", help="Restart gateway service")
gateway_restart.add_argument("--system", action="store_true", help="Target the Linux system-level gateway service")
gateway_restart.add_argument("--all", action="store_true", help="Kill ALL gateway processes across all profiles before restarting")
# gateway status
gateway_status = gateway_subparsers.add_parser("status", help="Show gateway status")
@@ -5321,7 +5071,6 @@ Examples:
hermes debug share --lines 500 Include more log lines
hermes debug share --expire 30 Keep paste for 30 days
hermes debug share --local Print report locally (no upload)
hermes debug delete <url> Delete a previously uploaded paste
""",
)
debug_sub = debug_parser.add_subparsers(dest="debug_command")
@@ -5341,14 +5090,6 @@ Examples:
"--local", action="store_true",
help="Print the report locally instead of uploading",
)
delete_parser = debug_sub.add_parser(
"delete",
help="Delete a paste uploaded by 'hermes debug share'",
)
delete_parser.add_argument(
"urls", nargs="*", default=[],
help="One or more paste URLs to delete (e.g. https://paste.rs/abc123)",
)
debug_parser.set_defaults(func=cmd_debug)
# =========================================================================
@@ -6303,42 +6044,7 @@ Examples:
sys.exit(1)
_processed_argv = _coalesce_session_name_args(sys.argv[1:])
# ── Defensive subparser routing (bpo-9338 workaround) ───────────
# On some Python versions (notably <3.11), argparse fails to route
# subcommand tokens when the parent parser has nargs='?' optional
# arguments (--continue). The symptom: "unrecognized arguments: model"
# even though 'model' is a registered subcommand.
#
# Fix: when argv contains a token matching a known subcommand, set
# subparsers.required=True to force deterministic routing. If that
# fails (e.g. 'hermes -c model' where 'model' is consumed as the
# session name for --continue), fall back to the default behaviour.
import io as _io
_known_cmds = set(subparsers.choices.keys()) if hasattr(subparsers, "choices") else set()
_has_cmd_token = any(t in _known_cmds for t in _processed_argv if not t.startswith("-"))
if _has_cmd_token:
subparsers.required = True
_saved_stderr = sys.stderr
try:
sys.stderr = _io.StringIO()
args = parser.parse_args(_processed_argv)
sys.stderr = _saved_stderr
except SystemExit as exc:
sys.stderr = _saved_stderr
# Help/version flags (exit code 0) already printed output —
# re-raise immediately to avoid a second parse_args printing
# the same help text again (#10230).
if exc.code == 0:
raise
# Subcommand name was consumed as a flag value (e.g. -c model).
# Fall back to optional subparsers so argparse handles it normally.
subparsers.required = False
args = parser.parse_args(_processed_argv)
else:
subparsers.required = False
args = parser.parse_args(_processed_argv)
args = parser.parse_args(_processed_argv)
# Handle --version flag
if args.version:
+2 -4
View File
@@ -58,11 +58,9 @@ def _prompt(label: str, default: str | None = None, secret: bool = False) -> str
def _install_dependencies(provider_name: str) -> None:
"""Install pip dependencies declared in plugin.yaml."""
import subprocess
from plugins.memory import find_provider_dir
from pathlib import Path as _Path
plugin_dir = find_provider_dir(provider_name)
if not plugin_dir:
return
plugin_dir = _Path(__file__).parent.parent / "plugins" / "memory" / provider_name
yaml_path = plugin_dir / "plugin.yaml"
if not yaml_path.exists():
return
+10 -22
View File
@@ -274,11 +274,6 @@ def parse_model_flags(raw_args: str) -> tuple[str, str, bool]:
is_global = False
explicit_provider = ""
# Normalize Unicode dashes (Telegram/iOS auto-converts -- to em/en dash)
# A single Unicode dash before a flag keyword becomes "--"
import re as _re
raw_args = _re.sub(r'[\u2012\u2013\u2014\u2015](provider|global)', r'--\1', raw_args)
# Extract --global
if "--global" in raw_args:
is_global = True
@@ -791,8 +786,7 @@ def list_authenticated_providers(
from hermes_cli.models import OPENROUTER_MODELS, _PROVIDER_MODELS
results: List[dict] = []
seen_slugs: set = set() # lowercase-normalized to catch case variants (#9545)
seen_mdev_ids: set = set() # prevent duplicate entries for aliases (e.g. kimi-coding + kimi-coding-cn)
seen_slugs: set = set()
data = fetch_models_dev()
@@ -805,11 +799,6 @@ def list_authenticated_providers(
# --- 1. Check Hermes-mapped providers ---
for hermes_id, mdev_id in PROVIDER_TO_MODELS_DEV.items():
# Skip aliases that map to the same models.dev provider (e.g.
# kimi-coding and kimi-coding-cn both → kimi-for-coding).
# The first one with valid credentials wins (#10526).
if mdev_id in seen_mdev_ids:
continue
pdata = data.get(mdev_id)
if not isinstance(pdata, dict):
continue
@@ -848,8 +837,7 @@ def list_authenticated_providers(
"total_models": total,
"source": "built-in",
})
seen_slugs.add(slug.lower())
seen_mdev_ids.add(mdev_id)
seen_slugs.add(slug)
# --- 2. Check Hermes-only providers (nous, openai-codex, copilot, opencode-go) ---
from hermes_cli.providers import HERMES_OVERLAYS
@@ -861,12 +849,12 @@ def list_authenticated_providers(
_mdev_to_hermes = {v: k for k, v in PROVIDER_TO_MODELS_DEV.items()}
for pid, overlay in HERMES_OVERLAYS.items():
if pid.lower() in seen_slugs:
if pid in seen_slugs:
continue
# Resolve Hermes slug — e.g. "github-copilot" → "copilot"
hermes_slug = _mdev_to_hermes.get(pid, pid)
if hermes_slug.lower() in seen_slugs:
if hermes_slug in seen_slugs:
continue
# Check if credentials exist
@@ -947,8 +935,8 @@ def list_authenticated_providers(
"total_models": total,
"source": "hermes",
})
seen_slugs.add(pid.lower())
seen_slugs.add(hermes_slug.lower())
seen_slugs.add(pid)
seen_slugs.add(hermes_slug)
# --- 2b. Cross-check canonical provider list ---
# Catches providers that are in CANONICAL_PROVIDERS but weren't found
@@ -960,7 +948,7 @@ def list_authenticated_providers(
_canon_provs = []
for _cp in _canon_provs:
if _cp.slug.lower() in seen_slugs:
if _cp.slug in seen_slugs:
continue
# Check credentials via PROVIDER_REGISTRY (auth.py)
@@ -1007,7 +995,7 @@ def list_authenticated_providers(
"total_models": _cp_total,
"source": "canonical",
})
seen_slugs.add(_cp.slug.lower())
seen_slugs.add(_cp.slug)
# --- 3. User-defined endpoints from config ---
if user_providers and isinstance(user_providers, dict):
@@ -1080,7 +1068,7 @@ def list_authenticated_providers(
groups[slug]["models"].append(default_model)
for slug, grp in groups.items():
if slug.lower() in seen_slugs:
if slug in seen_slugs:
continue
results.append({
"slug": slug,
@@ -1092,7 +1080,7 @@ def list_authenticated_providers(
"source": "user-config",
"api_url": grp["api_url"],
})
seen_slugs.add(slug.lower())
seen_slugs.add(slug)
# Sort: current provider first, then by model count descending
results.sort(key=lambda r: (not r["is_current"], -r["total_models"]))
+1 -58
View File
@@ -303,22 +303,6 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"XiaomiMiMo/MiMo-V2-Flash",
"moonshotai/Kimi-K2-Thinking",
],
# AWS Bedrock — static fallback list used when dynamic discovery is
# unavailable (no boto3, no credentials, or API error). The agent
# prefers live discovery via ListFoundationModels + ListInferenceProfiles.
# Use inference profile IDs (us.*) since most models require them.
"bedrock": [
"us.anthropic.claude-sonnet-4-6",
"us.anthropic.claude-opus-4-6-v1",
"us.anthropic.claude-haiku-4-5-20251001-v1:0",
"us.anthropic.claude-sonnet-4-5-20250929-v1:0",
"us.amazon.nova-pro-v1:0",
"us.amazon.nova-lite-v1:0",
"us.amazon.nova-micro-v1:0",
"deepseek.v3.2",
"us.meta.llama4-maverick-17b-instruct-v1:0",
"us.meta.llama4-scout-17b-instruct-v1:0",
],
}
# ---------------------------------------------------------------------------
@@ -542,7 +526,7 @@ CANONICAL_PROVIDERS: list[ProviderEntry] = [
ProviderEntry("deepseek", "DeepSeek", "DeepSeek (DeepSeek-V3, R1, coder — direct API)"),
ProviderEntry("xai", "xAI", "xAI (Grok models — direct API)"),
ProviderEntry("zai", "Z.AI / GLM", "Z.AI / GLM (Zhipu AI direct API)"),
ProviderEntry("kimi-coding", "Kimi / Kimi Coding Plan", "Kimi Coding Plan (api.kimi.com) & Moonshot API"),
ProviderEntry("kimi-coding", "Kimi / Moonshot", "Kimi / Moonshot (Moonshot AI direct API)"),
ProviderEntry("kimi-coding-cn", "Kimi / Moonshot (China)", "Kimi / Moonshot China (Moonshot CN direct API)"),
ProviderEntry("minimax", "MiniMax", "MiniMax (global direct API)"),
ProviderEntry("minimax-cn", "MiniMax (China)", "MiniMax China (domestic direct API)"),
@@ -552,7 +536,6 @@ CANONICAL_PROVIDERS: list[ProviderEntry] = [
ProviderEntry("opencode-zen", "OpenCode Zen", "OpenCode Zen (35+ curated models, pay-as-you-go)"),
ProviderEntry("opencode-go", "OpenCode Go", "OpenCode Go (open models, $10/month subscription)"),
ProviderEntry("ai-gateway", "Vercel AI Gateway", "Vercel AI Gateway (200+ models, pay-per-use)"),
ProviderEntry("bedrock", "AWS Bedrock", "AWS Bedrock (Claude, Nova, Llama, DeepSeek — IAM or API key)"),
]
# Derived dicts — used throughout the codebase
@@ -604,10 +587,6 @@ _PROVIDER_ALIASES = {
"huggingface-hub": "huggingface",
"mimo": "xiaomi",
"xiaomi-mimo": "xiaomi",
"aws": "bedrock",
"aws-bedrock": "bedrock",
"amazon-bedrock": "bedrock",
"amazon": "bedrock",
"grok": "xai",
"x-ai": "xai",
"x.ai": "xai",
@@ -1978,42 +1957,6 @@ def validate_requested_model(
# api_models is None — couldn't reach API. Accept and persist,
# but warn so typos don't silently break things.
# Bedrock: use our own discovery instead of HTTP /models endpoint.
# Bedrock's bedrock-runtime URL doesn't support /models — it uses the
# AWS SDK control plane (ListFoundationModels + ListInferenceProfiles).
if normalized == "bedrock":
try:
from agent.bedrock_adapter import discover_bedrock_models, resolve_bedrock_region
region = resolve_bedrock_region()
discovered = discover_bedrock_models(region)
discovered_ids = {m["id"] for m in discovered}
if requested in discovered_ids:
return {
"accepted": True,
"persist": True,
"recognized": True,
"message": None,
}
# Not in discovered list — still accept (user may have custom
# inference profiles or cross-account access), but warn.
suggestions = get_close_matches(requested, list(discovered_ids), n=3, cutoff=0.4)
suggestion_text = ""
if suggestions:
suggestion_text = "\n Similar models: " + ", ".join(f"`{s}`" for s in suggestions)
return {
"accepted": True,
"persist": True,
"recognized": False,
"message": (
f"Note: `{requested}` was not found in Bedrock model discovery for {region}. "
f"It may still work with custom inference profiles or cross-account access."
f"{suggestion_text}"
),
}
except Exception:
pass # Fall through to generic warning
provider_label = _PROVIDER_LABELS.get(normalized, normalized)
return {
"accepted": True,
-99
View File
@@ -112,7 +112,6 @@ class LoadedPlugin:
module: Optional[types.ModuleType] = None
tools_registered: List[str] = field(default_factory=list)
hooks_registered: List[str] = field(default_factory=list)
commands_registered: List[str] = field(default_factory=list)
enabled: bool = False
error: Optional[str] = None
@@ -212,84 +211,6 @@ class PluginContext:
}
logger.debug("Plugin %s registered CLI command: %s", self.manifest.name, name)
# -- slash command registration -------------------------------------------
def register_command(
self,
name: str,
handler: Callable,
description: str = "",
) -> None:
"""Register a slash command (e.g. ``/lcm``) available in CLI and gateway sessions.
The handler signature is ``fn(raw_args: str) -> str | None``.
It may also be an async callable the gateway dispatch handles both.
Unlike ``register_cli_command()`` (which creates ``hermes <subcommand>``
terminal commands), this registers in-session slash commands that users
invoke during a conversation.
Names conflicting with built-in commands are rejected with a warning.
"""
clean = name.lower().strip().lstrip("/").replace(" ", "-")
if not clean:
logger.warning(
"Plugin '%s' tried to register a command with an empty name.",
self.manifest.name,
)
return
# Reject if it conflicts with a built-in command
try:
from hermes_cli.commands import resolve_command
if resolve_command(clean) is not None:
logger.warning(
"Plugin '%s' tried to register command '/%s' which conflicts "
"with a built-in command. Skipping.",
self.manifest.name, clean,
)
return
except Exception:
pass # If commands module isn't available, skip the check
self._manager._plugin_commands[clean] = {
"handler": handler,
"description": description or "Plugin command",
"plugin": self.manifest.name,
}
logger.debug("Plugin %s registered command: /%s", self.manifest.name, clean)
# -- tool dispatch -------------------------------------------------------
def dispatch_tool(self, tool_name: str, args: dict, **kwargs) -> str:
"""Dispatch a tool call through the registry, with parent agent context.
This is the public interface for plugin slash commands that need to call
tools like ``delegate_task`` without reaching into framework internals.
The parent agent (if available) is resolved automatically plugins never
need to access the agent directly.
Args:
tool_name: Registry name of the tool (e.g. ``"delegate_task"``).
args: Tool arguments dict (same as what the model would pass).
**kwargs: Extra keyword args forwarded to the registry dispatch.
Returns:
JSON string from the tool handler (same format as model tool calls).
"""
from tools.registry import registry
# Wire up parent agent context when available (CLI mode).
# In gateway mode _cli_ref is None — tools degrade gracefully
# (workspace hints fall back to TERMINAL_CWD, no spinner).
if "parent_agent" not in kwargs:
cli = self._manager._cli_ref
agent = getattr(cli, "agent", None) if cli else None
if agent is not None:
kwargs["parent_agent"] = agent
return registry.dispatch(tool_name, args, **kwargs)
# -- context engine registration -----------------------------------------
def register_context_engine(self, engine) -> None:
@@ -402,7 +323,6 @@ class PluginManager:
self._plugin_tool_names: Set[str] = set()
self._cli_commands: Dict[str, dict] = {}
self._context_engine = None # Set by a plugin via register_context_engine()
self._plugin_commands: Dict[str, dict] = {} # Slash commands registered by plugins
self._discovered: bool = False
self._cli_ref = None # Set by CLI after plugin discovery
# Plugin skill registry: qualified name → metadata dict.
@@ -565,10 +485,6 @@ class PluginManager:
for h in p.hooks_registered
}
)
loaded.commands_registered = [
c for c in self._plugin_commands
if self._plugin_commands[c].get("plugin") == manifest.name
]
loaded.enabled = True
except Exception as exc:
@@ -682,7 +598,6 @@ class PluginManager:
"enabled": loaded.enabled,
"tools": len(loaded.tools_registered),
"hooks": len(loaded.hooks_registered),
"commands": len(loaded.commands_registered),
"error": loaded.error,
}
)
@@ -784,20 +699,6 @@ def get_plugin_context_engine():
return get_plugin_manager()._context_engine
def get_plugin_command_handler(name: str) -> Optional[Callable]:
"""Return the handler for a plugin-registered slash command, or ``None``."""
entry = get_plugin_manager()._plugin_commands.get(name)
return entry["handler"] if entry else None
def get_plugin_commands() -> Dict[str, dict]:
"""Return the full plugin commands dict (name → {handler, description, plugin}).
Safe to call before discovery returns an empty dict if no plugins loaded.
"""
return get_plugin_manager()._plugin_commands
def get_plugin_toolsets() -> List[tuple]:
"""Return plugin toolsets as ``(key, label, description)`` tuples.
-14
View File
@@ -236,12 +236,6 @@ ALIASES: Dict[str, str] = {
"mimo": "xiaomi",
"xiaomi-mimo": "xiaomi",
# bedrock
"aws": "bedrock",
"aws-bedrock": "bedrock",
"amazon-bedrock": "bedrock",
"amazon": "bedrock",
# arcee
"arcee-ai": "arcee",
"arceeai": "arcee",
@@ -268,7 +262,6 @@ _LABEL_OVERRIDES: Dict[str, str] = {
"copilot-acp": "GitHub Copilot ACP",
"xiaomi": "Xiaomi MiMo",
"local": "Local endpoint",
"bedrock": "AWS Bedrock",
}
@@ -278,7 +271,6 @@ TRANSPORT_TO_API_MODE: Dict[str, str] = {
"openai_chat": "chat_completions",
"anthropic_messages": "anthropic_messages",
"codex_responses": "codex_responses",
"bedrock_converse": "bedrock_converse",
}
@@ -396,10 +388,6 @@ def determine_api_mode(provider: str, base_url: str = "") -> str:
if pdef is not None:
return TRANSPORT_TO_API_MODE.get(pdef.transport, "chat_completions")
# Direct provider checks for providers not in HERMES_OVERLAYS
if provider == "bedrock":
return "bedrock_converse"
# URL-based heuristics for custom / unknown providers
if base_url:
url_lower = base_url.rstrip("/").lower()
@@ -407,8 +395,6 @@ def determine_api_mode(provider: str, base_url: str = "") -> str:
return "anthropic_messages"
if "api.openai.com" in url_lower:
return "codex_responses"
if "bedrock-runtime" in url_lower and "amazonaws.com" in url_lower:
return "bedrock_converse"
return "chat_completions"
+1 -73
View File
@@ -124,7 +124,7 @@ def _copilot_runtime_api_mode(model_cfg: Dict[str, Any], api_key: str) -> str:
return "chat_completions"
_VALID_API_MODES = {"chat_completions", "codex_responses", "anthropic_messages", "bedrock_converse"}
_VALID_API_MODES = {"chat_completions", "codex_responses", "anthropic_messages"}
def _parse_api_mode(raw: Any) -> Optional[str]:
@@ -167,7 +167,6 @@ def _resolve_runtime_from_pool_entry(
api_mode = "chat_completions"
elif provider == "copilot":
api_mode = _copilot_runtime_api_mode(model_cfg, getattr(entry, "runtime_api_key", ""))
base_url = base_url or PROVIDER_REGISTRY["copilot"].inference_base_url
else:
configured_provider = str(model_cfg.get("provider") or "").strip().lower()
# Honour model.base_url from config.yaml when the configured provider
@@ -837,77 +836,6 @@ def resolve_runtime_provider(
"requested_provider": requested_provider,
}
# AWS Bedrock (native Converse API via boto3)
if provider == "bedrock":
from agent.bedrock_adapter import (
has_aws_credentials,
resolve_aws_auth_env_var,
resolve_bedrock_region,
is_anthropic_bedrock_model,
)
# When the user explicitly selected bedrock (not auto-detected),
# trust boto3's credential chain — it handles IMDS, ECS task roles,
# Lambda execution roles, SSO, and other implicit sources that our
# env-var check can't detect.
is_explicit = requested_provider in ("bedrock", "aws", "aws-bedrock", "amazon-bedrock", "amazon")
if not is_explicit and not has_aws_credentials():
raise AuthError(
"No AWS credentials found for Bedrock. Configure one of:\n"
" - AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY\n"
" - AWS_PROFILE (for SSO / named profiles)\n"
" - IAM instance role (EC2, ECS, Lambda)\n"
"Or run 'aws configure' to set up credentials.",
code="no_aws_credentials",
)
# Read bedrock-specific config from config.yaml
from hermes_cli.config import load_config as _load_bedrock_config
_bedrock_cfg = _load_bedrock_config().get("bedrock", {})
# Region priority: config.yaml bedrock.region → env var → us-east-1
region = (_bedrock_cfg.get("region") or "").strip() or resolve_bedrock_region()
auth_source = resolve_aws_auth_env_var() or "aws-sdk-default-chain"
# Build guardrail config if configured
_gr = _bedrock_cfg.get("guardrail", {})
guardrail_config = None
if _gr.get("guardrail_identifier") and _gr.get("guardrail_version"):
guardrail_config = {
"guardrailIdentifier": _gr["guardrail_identifier"],
"guardrailVersion": _gr["guardrail_version"],
}
if _gr.get("stream_processing_mode"):
guardrail_config["streamProcessingMode"] = _gr["stream_processing_mode"]
if _gr.get("trace"):
guardrail_config["trace"] = _gr["trace"]
# Dual-path routing: Claude models use AnthropicBedrock SDK for full
# feature parity (prompt caching, thinking budgets, adaptive thinking).
# Non-Claude models use the Converse API for multi-model support.
_current_model = str(model_cfg.get("default") or "").strip()
if is_anthropic_bedrock_model(_current_model):
# Claude on Bedrock → AnthropicBedrock SDK → anthropic_messages path
runtime = {
"provider": "bedrock",
"api_mode": "anthropic_messages",
"base_url": f"https://bedrock-runtime.{region}.amazonaws.com",
"api_key": "aws-sdk",
"source": auth_source,
"region": region,
"bedrock_anthropic": True, # Signal to use AnthropicBedrock client
"requested_provider": requested_provider,
}
else:
# Non-Claude (Nova, DeepSeek, Llama, etc.) → Converse API
runtime = {
"provider": "bedrock",
"api_mode": "bedrock_converse",
"base_url": f"https://bedrock-runtime.{region}.amazonaws.com",
"api_key": "aws-sdk",
"source": auth_source,
"region": region,
"requested_provider": requested_provider,
}
if guardrail_config:
runtime["guardrail_config"] = guardrail_config
return runtime
# API-key providers (z.ai/GLM, Kimi, MiniMax, MiniMax-CN)
pconfig = PROVIDER_REGISTRY.get(provider)
if pconfig and pconfig.auth_type == "api_key":
+3 -13
View File
@@ -1611,19 +1611,9 @@ def _setup_telegram():
return
print_info("Create a bot via @BotFather on Telegram")
import re
while True:
token = prompt("Telegram bot token", password=True)
if not token:
return
if not re.match(r"^\d+:[A-Za-z0-9_-]{30,}$", token):
print_error(
"Invalid token format. Expected: <numeric_id>:<alphanumeric_hash> "
"(e.g., 123456789:ABCdefGHI-jklMNOpqrSTUvwxYZ)"
)
continue
break
token = prompt("Telegram bot token", password=True)
if not token:
return
save_env_value("TELEGRAM_BOT_TOKEN", token)
print_success("Telegram token saved")
+16 -40
View File
@@ -48,14 +48,12 @@ from hermes_cli.cli_output import ( # noqa: E402 — late import block
# These map to keys in toolsets.py TOOLSETS dict.
CONFIGURABLE_TOOLSETS = [
("web", "🔍 Web Search & Scraping", "web_search, web_extract"),
("x_search", "🐦 X Search", "x_search"),
("browser", "🌐 Browser Automation", "navigate, click, type, scroll"),
("terminal", "💻 Terminal & Processes", "terminal, process"),
("file", "📁 File Operations", "read, write, patch, search"),
("code_execution", "⚡ Code Execution", "execute_code"),
("vision", "👁️ Vision / Image Analysis", "vision_analyze"),
("image_gen", "🎨 Image Generation", "image_generate"),
("video_gen", "🎬 Video Generation", "video_generate"),
("moa", "🧠 Mixture of Agents", "mixture_of_agents"),
("tts", "🔊 Text-to-Speech", "text_to_speech"),
("skills", "📚 Skills", "list, view, manage"),
@@ -65,7 +63,6 @@ CONFIGURABLE_TOOLSETS = [
("clarify", "❓ Clarifying Questions", "clarify"),
("delegation", "👥 Task Delegation", "delegate_task"),
("cronjob", "⏰ Cron Jobs", "create/list/update/pause/resume/run, with optional attached skills"),
("messaging", "📨 Cross-Platform Messaging", "send_message"),
("rl", "🧪 RL Training", "Tinker-Atropos training tools"),
("homeassistant", "🏠 Home Assistant", "smart home device control"),
]
@@ -124,7 +121,6 @@ TOOL_CATEGORIES = {
"providers": [
{
"name": "Nous Subscription",
"badge": "subscription",
"tag": "Managed OpenAI TTS billed to your subscription",
"env_vars": [],
"tts_provider": "openai",
@@ -134,15 +130,13 @@ TOOL_CATEGORIES = {
},
{
"name": "Microsoft Edge TTS",
"badge": "★ recommended · free",
"tag": "Good quality, no API key needed",
"tag": "Free - no API key needed",
"env_vars": [],
"tts_provider": "edge",
},
{
"name": "OpenAI TTS",
"badge": "paid",
"tag": "High quality voices",
"tag": "Premium - high quality voices",
"env_vars": [
{"key": "VOICE_TOOLS_OPENAI_KEY", "prompt": "OpenAI API key", "url": "https://platform.openai.com/api-keys"},
],
@@ -150,8 +144,7 @@ TOOL_CATEGORIES = {
},
{
"name": "ElevenLabs",
"badge": "paid",
"tag": "Most natural voices",
"tag": "Premium - most natural voices",
"env_vars": [
{"key": "ELEVENLABS_API_KEY", "prompt": "ElevenLabs API key", "url": "https://elevenlabs.io/app/settings/api-keys"},
],
@@ -159,8 +152,7 @@ TOOL_CATEGORIES = {
},
{
"name": "Mistral (Voxtral TTS)",
"badge": "paid",
"tag": "Multilingual, native Opus",
"tag": "Multilingual, native Opus, needs MISTRAL_API_KEY",
"env_vars": [
{"key": "MISTRAL_API_KEY", "prompt": "Mistral API key", "url": "https://console.mistral.ai/"},
],
@@ -176,7 +168,6 @@ TOOL_CATEGORIES = {
"providers": [
{
"name": "Nous Subscription",
"badge": "subscription",
"tag": "Managed Firecrawl billed to your subscription",
"web_backend": "firecrawl",
"env_vars": [],
@@ -186,8 +177,7 @@ TOOL_CATEGORIES = {
},
{
"name": "Firecrawl Cloud",
"badge": "★ recommended",
"tag": "Full-featured search, extract, and crawl",
"tag": "Hosted service - search, extract, and crawl",
"web_backend": "firecrawl",
"env_vars": [
{"key": "FIRECRAWL_API_KEY", "prompt": "Firecrawl API key", "url": "https://firecrawl.dev"},
@@ -195,8 +185,7 @@ TOOL_CATEGORIES = {
},
{
"name": "Exa",
"badge": "paid",
"tag": "Neural search with semantic understanding",
"tag": "AI-native search and contents",
"web_backend": "exa",
"env_vars": [
{"key": "EXA_API_KEY", "prompt": "Exa API key", "url": "https://exa.ai"},
@@ -204,8 +193,7 @@ TOOL_CATEGORIES = {
},
{
"name": "Parallel",
"badge": "paid",
"tag": "AI-powered search and extract",
"tag": "AI-native search and extract",
"web_backend": "parallel",
"env_vars": [
{"key": "PARALLEL_API_KEY", "prompt": "Parallel API key", "url": "https://parallel.ai"},
@@ -213,8 +201,7 @@ TOOL_CATEGORIES = {
},
{
"name": "Tavily",
"badge": "free tier",
"tag": "Search, extract, and crawl — 1000 free searches/mo",
"tag": "AI-native search, extract, and crawl",
"web_backend": "tavily",
"env_vars": [
{"key": "TAVILY_API_KEY", "prompt": "Tavily API key", "url": "https://app.tavily.com/home"},
@@ -222,8 +209,7 @@ TOOL_CATEGORIES = {
},
{
"name": "Firecrawl Self-Hosted",
"badge": "free · self-hosted",
"tag": "Run your own Firecrawl instance (Docker)",
"tag": "Free - run your own instance",
"web_backend": "firecrawl",
"env_vars": [
{"key": "FIRECRAWL_API_URL", "prompt": "Your Firecrawl instance URL (e.g., http://localhost:3002)"},
@@ -237,7 +223,6 @@ TOOL_CATEGORIES = {
"providers": [
{
"name": "Nous Subscription",
"badge": "subscription",
"tag": "Managed FAL image generation billed to your subscription",
"env_vars": [],
"requires_nous_auth": True,
@@ -246,7 +231,6 @@ TOOL_CATEGORIES = {
},
{
"name": "FAL.ai",
"badge": "paid",
"tag": "FLUX 2 Pro with auto-upscaling",
"env_vars": [
{"key": "FAL_KEY", "prompt": "FAL API key", "url": "https://fal.ai/dashboard/keys"},
@@ -260,7 +244,6 @@ TOOL_CATEGORIES = {
"providers": [
{
"name": "Nous Subscription (Browser Use cloud)",
"badge": "subscription",
"tag": "Managed Browser Use billed to your subscription",
"env_vars": [],
"browser_provider": "browser-use",
@@ -271,16 +254,14 @@ TOOL_CATEGORIES = {
},
{
"name": "Local Browser",
"badge": "★ recommended · free",
"tag": "Headless Chromium, no API key needed",
"tag": "Free headless Chromium (no API key needed)",
"env_vars": [],
"browser_provider": "local",
"post_setup": "agent_browser",
},
{
"name": "Browserbase",
"badge": "paid",
"tag": "Cloud browser with stealth and proxies",
"tag": "Cloud browser with stealth & proxies",
"env_vars": [
{"key": "BROWSERBASE_API_KEY", "prompt": "Browserbase API key", "url": "https://browserbase.com"},
{"key": "BROWSERBASE_PROJECT_ID", "prompt": "Browserbase project ID"},
@@ -290,7 +271,6 @@ TOOL_CATEGORIES = {
},
{
"name": "Browser Use",
"badge": "paid",
"tag": "Cloud browser with remote execution",
"env_vars": [
{"key": "BROWSER_USE_API_KEY", "prompt": "Browser Use API key", "url": "https://browser-use.com"},
@@ -300,7 +280,6 @@ TOOL_CATEGORIES = {
},
{
"name": "Firecrawl",
"badge": "paid",
"tag": "Cloud browser with remote execution",
"env_vars": [
{"key": "FIRECRAWL_API_KEY", "prompt": "Firecrawl API key", "url": "https://firecrawl.dev"},
@@ -310,8 +289,7 @@ TOOL_CATEGORIES = {
},
{
"name": "Camofox",
"badge": "free · local",
"tag": "Anti-detection browser (Firefox/Camoufox)",
"tag": "Local anti-detection browser (Firefox/Camoufox)",
"env_vars": [
{"key": "CAMOFOX_URL", "prompt": "Camofox server URL", "default": "http://localhost:9377",
"url": "https://github.com/jo-inc/camofox-browser"},
@@ -860,8 +838,7 @@ def _configure_tool_category(ts_key: str, cat: dict, config: dict):
# Plain text labels only (no ANSI codes in menu items)
provider_choices = []
for p in providers:
badge = f" [{p['badge']}]" if p.get("badge") else ""
tag = f"{p['tag']}" if p.get("tag") else ""
tag = f" ({p['tag']})" if p.get("tag") else ""
configured = ""
env_vars = p.get("env_vars", [])
if not env_vars or all(get_env_value(v["key"]) for v in env_vars):
@@ -871,7 +848,7 @@ def _configure_tool_category(ts_key: str, cat: dict, config: dict):
configured = ""
else:
configured = " [configured]"
provider_choices.append(f"{p['name']}{badge}{tag}{configured}")
provider_choices.append(f"{p['name']}{tag}{configured}")
# Add skip option
provider_choices.append("Skip — keep defaults / configure later")
@@ -1127,8 +1104,7 @@ def _configure_tool_category_for_reconfig(ts_key: str, cat: dict, config: dict):
provider_choices = []
for p in providers:
badge = f" [{p['badge']}]" if p.get("badge") else ""
tag = f"{p['tag']}" if p.get("tag") else ""
tag = f" ({p['tag']})" if p.get("tag") else ""
configured = ""
env_vars = p.get("env_vars", [])
if not env_vars or all(get_env_value(v["key"]) for v in env_vars):
@@ -1138,7 +1114,7 @@ def _configure_tool_category_for_reconfig(ts_key: str, cat: dict, config: dict):
configured = ""
else:
configured = " [configured]"
provider_choices.append(f"{p['name']}{badge}{tag}{configured}")
provider_choices.append(f"{p['name']}{tag}{configured}")
default_idx = _detect_active_provider_index(providers, config)
-1
View File
@@ -358,7 +358,6 @@ def _add_rotating_handler(
path.parent.mkdir(parents=True, exist_ok=True)
handler = _ManagedRotatingFileHandler(
str(path), maxBytes=max_bytes, backupCount=backup_count,
encoding="utf-8",
)
handler.setLevel(level)
handler.setFormatter(formatter)
+40 -2
View File
@@ -26,7 +26,7 @@ import logging
import threading
from typing import Dict, Any, List, Optional, Tuple
from tools.registry import discover_builtin_tools, registry
from tools.registry import registry
from toolsets import resolve_toolset, validate_toolset
logger = logging.getLogger(__name__)
@@ -129,7 +129,45 @@ def _run_async(coro):
# Tool Discovery (importing each module triggers its registry.register calls)
# =============================================================================
discover_builtin_tools()
def _discover_tools():
"""Import all tool modules to trigger their registry.register() calls.
Wrapped in a function so import errors in optional tools (e.g., fal_client
not installed) don't prevent the rest from loading.
"""
_modules = [
"tools.web_tools",
"tools.terminal_tool",
"tools.file_tools",
"tools.vision_tools",
"tools.mixture_of_agents_tool",
"tools.image_generation_tool",
"tools.skills_tool",
"tools.skill_manager_tool",
"tools.browser_tool",
"tools.cronjob_tools",
"tools.rl_training_tool",
"tools.tts_tool",
"tools.todo_tool",
"tools.memory_tool",
"tools.session_search_tool",
"tools.clarify_tool",
"tools.code_execution_tool",
"tools.delegate_tool",
"tools.process_registry",
"tools.send_message_tool",
# "tools.honcho_tools", # Removed — Honcho is now a memory provider plugin
"tools.homeassistant_tool",
]
import importlib
for mod_name in _modules:
try:
importlib.import_module(mod_name)
except Exception as e:
logger.warning("Could not import tool module %s: %s", mod_name, e)
_discover_tools()
# MCP tool discovery (external MCP servers from config)
try:
@@ -1,12 +1,12 @@
---
name: honcho
description: Configure and use Honcho memory with Hermes -- cross-session user modeling, multi-profile peer isolation, observation config, dialectic reasoning, session summaries, and context budget enforcement. Use when setting up Honcho, troubleshooting memory, managing profiles with Honcho peers, or tuning observation, recall, and dialectic settings.
version: 2.0.0
description: Configure and use Honcho memory with Hermes -- cross-session user modeling, multi-profile peer isolation, observation config, and dialectic reasoning. Use when setting up Honcho, troubleshooting memory, managing profiles with Honcho peers, or tuning observation and recall settings.
version: 1.0.0
author: Hermes Agent
license: MIT
metadata:
hermes:
tags: [Honcho, Memory, Profiles, Observation, Dialectic, User-Modeling, Session-Summary]
tags: [Honcho, Memory, Profiles, Observation, Dialectic, User-Modeling]
homepage: https://docs.honcho.dev
related_skills: [hermes-agent]
prerequisites:
@@ -22,9 +22,8 @@ Honcho provides AI-native cross-session user modeling. It learns who the user is
- Setting up Honcho (cloud or self-hosted)
- Troubleshooting memory not working / peers not syncing
- Creating multi-profile setups where each agent has its own Honcho peer
- Tuning observation, recall, dialectic depth, or write frequency settings
- Understanding what the 5 Honcho tools do and when to use them
- Configuring context budgets and session summary injection
- Tuning observation, recall, or write frequency settings
- Understanding what the 4 Honcho tools do and when to use them
## Setup
@@ -52,27 +51,6 @@ hermes honcho status # shows resolved config, connection test, peer info
## Architecture
### Base Context Injection
When Honcho injects context into the system prompt (in `hybrid` or `context` recall modes), it assembles the base context block in this order:
1. **Session summary** -- a short digest of the current session so far (placed first so the model has immediate conversational continuity)
2. **User representation** -- Honcho's accumulated model of the user (preferences, facts, patterns)
3. **AI peer card** -- the identity card for this Hermes profile's AI peer
The session summary is generated automatically by Honcho at the start of each turn (when a prior session exists). It gives the model a warm start without replaying full history.
### Cold / Warm Prompt Selection
Honcho automatically selects between two prompt strategies:
| Condition | Strategy | What happens |
|-----------|----------|--------------|
| No prior session or empty representation | **Cold start** | Lightweight intro prompt; skips summary injection; encourages the model to learn about the user |
| Existing representation and/or session history | **Warm start** | Full base context injection (summary → representation → card); richer system prompt |
You do not need to configure this -- it is automatic based on session state.
### Peers
Honcho models conversations as interactions between **peers**. Hermes creates two peers per session:
@@ -134,63 +112,6 @@ How the agent accesses Honcho memory:
| `context` | Yes | No (hidden) | Minimal token cost, no tool calls |
| `tools` | No | Yes | Agent controls all memory access explicitly |
## Three Orthogonal Knobs
Honcho's dialectic behavior is controlled by three independent dimensions. Each can be tuned without affecting the others:
### Cadence (when)
Controls **how often** dialectic and context calls happen.
| Key | Default | Description |
|-----|---------|-------------|
| `contextCadence` | `1` | Min turns between context API calls |
| `dialecticCadence` | `3` | Min turns between dialectic API calls |
| `injectionFrequency` | `every-turn` | `every-turn` or `first-turn` for base context injection |
Higher cadence values reduce API calls and cost. `dialecticCadence: 3` (default) means the dialectic engine fires at most every 3rd turn.
### Depth (how many)
Controls **how many rounds** of dialectic reasoning Honcho performs per query.
| Key | Default | Range | Description |
|-----|---------|-------|-------------|
| `dialecticDepth` | `1` | 1-3 | Number of dialectic reasoning rounds per query |
| `dialecticDepthLevels` | -- | array | Optional per-depth-round level overrides (see below) |
`dialecticDepth: 2` means Honcho runs two rounds of dialectic synthesis. The first round produces an initial answer; the second refines it.
`dialecticDepthLevels` lets you set the reasoning level for each round independently:
```json
{
"dialecticDepth": 3,
"dialecticDepthLevels": ["low", "medium", "high"]
}
```
If `dialecticDepthLevels` is omitted, rounds use **proportional levels** derived from `dialecticReasoningLevel` (the base):
| Depth | Pass levels |
|-------|-------------|
| 1 | [base] |
| 2 | [minimal, base] |
| 3 | [minimal, base, low] |
This keeps earlier passes cheap while using full depth on the final synthesis.
### Level (how hard)
Controls the **intensity** of each dialectic reasoning round.
| Key | Default | Description |
|-----|---------|-------------|
| `dialecticReasoningLevel` | `low` | `minimal`, `low`, `medium`, `high`, `max` |
| `dialecticDynamic` | `true` | When `true`, the model can pass `reasoning_level` to `honcho_reasoning` to override the default per-call. `false` = always use `dialecticReasoningLevel`, model overrides ignored |
Higher levels produce richer synthesis but cost more tokens on Honcho's backend.
## Multi-Profile Setup
Each Hermes profile gets its own Honcho AI peer while sharing the same workspace (user context). This means:
@@ -228,7 +149,6 @@ Override any setting in the host block:
"hermes.coder": {
"aiPeer": "coder",
"recallMode": "tools",
"dialecticDepth": 2,
"observation": {
"user": { "observeMe": true, "observeOthers": false },
"ai": { "observeMe": true, "observeOthers": true }
@@ -240,97 +160,19 @@ Override any setting in the host block:
## Tools
The agent has 5 bidirectional Honcho tools (hidden in `context` recall mode):
| Tool | LLM call? | Cost | Use when |
|------|-----------|------|----------|
| `honcho_profile` | No | minimal | Quick factual snapshot at conversation start or for fast name/role/pref lookups |
| `honcho_search` | No | low | Fetch specific past facts to reason over yourself — raw excerpts, no synthesis |
| `honcho_context` | No | low | Full session context snapshot: summary, representation, card, recent messages |
| `honcho_reasoning` | Yes | mediumhigh | Natural language question synthesized by Honcho's dialectic engine |
| `honcho_conclude` | No | minimal | Write or delete a persistent fact; pass `peer: "ai"` for AI self-knowledge |
The agent has 4 Honcho tools (hidden in `context` recall mode):
### `honcho_profile`
Read or update a peer card — curated key facts (name, role, preferences, communication style). Pass `card: [...]` to update; omit to read. No LLM call.
Quick factual snapshot of the user -- name, role, preferences, patterns. No LLM call, minimal cost. Use at conversation start or for fast lookups.
### `honcho_search`
Semantic search over stored context for a specific peer. Returns raw excerpts ranked by relevance, no synthesis. Default 800 tokens, max 2000. Good when you need specific past facts to reason over yourself rather than a synthesized answer.
Semantic search over stored context. Returns raw excerpts ranked by relevance, no LLM synthesis. Default 800 tokens, max 2000. Use when you want specific past facts to reason over yourself.
### `honcho_context`
Full session context snapshot from Honcho — session summary, peer representation, peer card, and recent messages. No LLM call. Use when you want to see everything Honcho knows about the current session and peer in one shot.
### `honcho_reasoning`
Natural language question answered by Honcho's dialectic reasoning engine (LLM call on Honcho's backend). Higher cost, higher quality. Pass `reasoning_level` to control depth: `minimal` (fast/cheap) → `low``medium``high``max` (thorough). Omit to use the configured default (`low`). Use for synthesized understanding of the user's patterns, goals, or current state.
Natural language question answered by Honcho's dialectic reasoning (LLM call on Honcho's backend). Higher cost, higher quality. Can query about user (default) or the AI peer.
### `honcho_conclude`
Write or delete a persistent conclusion about a peer. Pass `conclusion: "..."` to create. Pass `delete_id: "..."` to remove a conclusion (for PII removal — Honcho self-heals incorrect conclusions over time, so deletion is only needed for PII). You MUST pass exactly one of the two.
### Bidirectional peer targeting
All 5 tools accept an optional `peer` parameter:
- `peer: "user"` (default) — operates on the user peer
- `peer: "ai"` — operates on this profile's AI peer
- `peer: "<explicit-id>"` — any peer ID in the workspace
Examples:
```
honcho_profile # read user's card
honcho_profile peer="ai" # read AI peer's card
honcho_reasoning query="What does this user care about most?"
honcho_reasoning query="What are my interaction patterns?" peer="ai" reasoning_level="medium"
honcho_conclude conclusion="Prefers terse answers"
honcho_conclude conclusion="I tend to over-explain code" peer="ai"
honcho_conclude delete_id="abc123" # PII removal
```
## Agent Usage Patterns
Guidelines for Hermes when Honcho memory is active.
### On conversation start
```
1. honcho_profile → fast warmup, no LLM cost
2. If context looks thin → honcho_context (full snapshot, still no LLM)
3. If deep synthesis needed → honcho_reasoning (LLM call, use sparingly)
```
Do NOT call `honcho_reasoning` on every turn. Auto-injection already handles ongoing context refresh. Use the reasoning tool only when you genuinely need synthesized insight the base context doesn't provide.
### When the user shares something to remember
```
honcho_conclude conclusion="<specific, actionable fact>"
```
Good conclusions: "Prefers code examples over prose explanations", "Working on a Rust async project through April 2026"
Bad conclusions: "User said something about Rust" (too vague), "User seems technical" (already in representation)
### When the user asks about past context / you need to recall specifics
```
honcho_search query="<topic>" → fast, no LLM, good for specific facts
honcho_context → full snapshot with summary + messages
honcho_reasoning query="<question>" → synthesized answer, use when search isn't enough
```
### When to use `peer: "ai"`
Use AI peer targeting to build and query the agent's own self-knowledge:
- `honcho_conclude conclusion="I tend to be verbose when explaining architecture" peer="ai"` — self-correction
- `honcho_reasoning query="How do I typically handle ambiguous requests?" peer="ai"` — self-audit
- `honcho_profile peer="ai"` — review own identity card
### When NOT to call tools
In `hybrid` and `context` modes, base context (user representation + card + session summary) is auto-injected before every turn. Do not re-fetch what was already injected. Call tools only when:
- You need something the injected context doesn't have
- The user explicitly asks you to recall or check memory
- You're writing a conclusion about something new
### Cadence awareness
`honcho_reasoning` on the tool side shares the same cost as auto-injection dialectic. After an explicit tool call, the auto-injection cadence resets — avoiding double-charging the same turn.
Write a persistent fact about the user. Conclusions build the user's profile over time. Use when the user states a preference, corrects you, or shares something to remember.
## Config Reference
@@ -349,39 +191,18 @@ Config file: `$HERMES_HOME/honcho.json` (profile-local) or `~/.honcho/config.jso
| `observation` | all on | Per-peer `observeMe`/`observeOthers` booleans |
| `writeFrequency` | `async` | `async`, `turn`, `session`, or integer N |
| `sessionStrategy` | `per-directory` | `per-directory`, `per-repo`, `per-session`, `global` |
| `messageMaxChars` | `25000` | Max chars per message (chunked if exceeded) |
### Dialectic settings
| Key | Default | Description |
|-----|---------|-------------|
| `dialecticReasoningLevel` | `low` | `minimal`, `low`, `medium`, `high`, `max` |
| `dialecticDynamic` | `true` | Auto-bump reasoning by query complexity. `false` = fixed level |
| `dialecticDepth` | `1` | Number of dialectic rounds per query (1-3) |
| `dialecticDepthLevels` | -- | Optional array of per-round levels, e.g. `["low", "high"]` |
| `dialecticDynamic` | `true` | Auto-bump reasoning by query length. `false` = fixed level |
| `messageMaxChars` | `25000` | Max chars per message (chunked if exceeded) |
| `dialecticMaxInputChars` | `10000` | Max chars for dialectic query input |
### Context budget and injection
### Cost-awareness (advanced, root config only)
| Key | Default | Description |
|-----|---------|-------------|
| `contextTokens` | uncapped | Max tokens for the combined base context injection (summary + representation + card). Opt-in cap — omit to leave uncapped, set to an integer to bound injection size. |
| `injectionFrequency` | `every-turn` | `every-turn` or `first-turn` |
| `contextCadence` | `1` | Min turns between context API calls |
| `dialecticCadence` | `3` | Min turns between dialectic LLM calls |
The `contextTokens` budget is enforced at injection time. If the session summary + representation + card exceed the budget, Honcho trims the summary first, then the representation, preserving the card. This prevents context blowup in long sessions.
### Memory-context sanitization
Honcho sanitizes the `memory-context` block before injection to prevent prompt injection and malformed content:
- Strips XML/HTML tags from user-authored conclusions
- Normalizes whitespace and control characters
- Truncates individual conclusions that exceed `messageMaxChars`
- Escapes delimiter sequences that could break the system prompt structure
This fix addresses edge cases where raw user conclusions containing markup or special characters could corrupt the injected context block.
| `dialecticCadence` | `1` | Min turns between dialectic API calls |
## Troubleshooting
@@ -400,12 +221,6 @@ Observation config is synced from the server on each session init. Start a new s
### Messages truncated
Messages over `messageMaxChars` (default 25k) are automatically chunked with `[continued]` markers. If you're hitting this often, check if tool results or skill content is inflating message size.
### Context injection too large
If you see warnings about context budget exceeded, lower `contextTokens` or reduce `dialecticDepth`. The session summary is trimmed first when the budget is tight.
### Session summary missing
Session summary requires at least one prior turn in the current Honcho session. On cold start (new session, no history), the summary is omitted and Honcho uses the cold-start prompt strategy instead.
## CLI Commands
| Command | Description |
+26 -115
View File
@@ -1,22 +1,18 @@
"""Memory provider plugin discovery.
Scans two directories for memory provider plugins:
1. Bundled providers: ``plugins/memory/<name>/`` (shipped with hermes-agent)
2. User-installed providers: ``$HERMES_HOME/plugins/<name>/``
Scans ``plugins/memory/<name>/`` directories for memory provider plugins.
Each subdirectory must contain ``__init__.py`` with a class implementing
the MemoryProvider ABC. On name collisions, bundled providers take
precedence.
the MemoryProvider ABC.
Only ONE provider can be active at a time, selected via
``memory.provider`` in config.yaml.
Memory providers are separate from the general plugin system they live
in the repo and are always available without user installation. Only ONE
can be active at a time, selected via ``memory.provider`` in config.yaml.
Usage:
from plugins.memory import discover_memory_providers, load_memory_provider
available = discover_memory_providers() # [(name, desc, available), ...]
provider = load_memory_provider("mnemosyne") # MemoryProvider instance
provider = load_memory_provider("openviking") # MemoryProvider instance
"""
from __future__ import annotations
@@ -33,101 +29,24 @@ logger = logging.getLogger(__name__)
_MEMORY_PLUGINS_DIR = Path(__file__).parent
# ---------------------------------------------------------------------------
# Directory helpers
# ---------------------------------------------------------------------------
def _get_user_plugins_dir() -> Optional[Path]:
"""Return ``$HERMES_HOME/plugins/`` or None if unavailable."""
try:
from hermes_constants import get_hermes_home
d = get_hermes_home() / "plugins"
return d if d.is_dir() else None
except Exception:
return None
def _is_memory_provider_dir(path: Path) -> bool:
"""Heuristic: does *path* look like a memory provider plugin?
Checks for ``register_memory_provider`` or ``MemoryProvider`` in the
``__init__.py`` source. Cheap text scan no import needed.
"""
init_file = path / "__init__.py"
if not init_file.exists():
return False
try:
source = init_file.read_text(errors="replace")[:8192]
return "register_memory_provider" in source or "MemoryProvider" in source
except Exception:
return False
def _iter_provider_dirs() -> List[Tuple[str, Path]]:
"""Yield ``(name, path)`` for all discovered provider directories.
Scans bundled first, then user-installed. Bundled takes precedence
on name collisions (first-seen wins via ``seen`` set).
"""
seen: set = set()
dirs: List[Tuple[str, Path]] = []
# 1. Bundled providers (plugins/memory/<name>/)
if _MEMORY_PLUGINS_DIR.is_dir():
for child in sorted(_MEMORY_PLUGINS_DIR.iterdir()):
if not child.is_dir() or child.name.startswith(("_", ".")):
continue
if not (child / "__init__.py").exists():
continue
seen.add(child.name)
dirs.append((child.name, child))
# 2. User-installed providers ($HERMES_HOME/plugins/<name>/)
user_dir = _get_user_plugins_dir()
if user_dir:
for child in sorted(user_dir.iterdir()):
if not child.is_dir() or child.name.startswith(("_", ".")):
continue
if child.name in seen:
continue # bundled takes precedence
if not _is_memory_provider_dir(child):
continue # skip non-memory plugins
dirs.append((child.name, child))
return dirs
def find_provider_dir(name: str) -> Optional[Path]:
"""Resolve a provider name to its directory.
Checks bundled first, then user-installed.
"""
# Bundled
bundled = _MEMORY_PLUGINS_DIR / name
if bundled.is_dir() and (bundled / "__init__.py").exists():
return bundled
# User-installed
user_dir = _get_user_plugins_dir()
if user_dir:
user = user_dir / name
if user.is_dir() and _is_memory_provider_dir(user):
return user
return None
# ---------------------------------------------------------------------------
# Public API
# ---------------------------------------------------------------------------
def discover_memory_providers() -> List[Tuple[str, str, bool]]:
"""Scan bundled and user-installed directories for available providers.
"""Scan plugins/memory/ for available providers.
Returns list of (name, description, is_available) tuples.
Bundled providers take precedence on name collisions.
Does NOT import the providers just reads plugin.yaml for metadata
and does a lightweight availability check.
"""
results = []
if not _MEMORY_PLUGINS_DIR.is_dir():
return results
for child in sorted(_MEMORY_PLUGINS_DIR.iterdir()):
if not child.is_dir() or child.name.startswith(("_", ".")):
continue
init_file = child / "__init__.py"
if not init_file.exists():
continue
for name, child in _iter_provider_dirs():
# Read description from plugin.yaml if available
desc = ""
yaml_file = child / "plugin.yaml"
@@ -151,7 +70,7 @@ def discover_memory_providers() -> List[Tuple[str, str, bool]]:
except Exception:
available = False
results.append((name, desc, available))
results.append((child.name, desc, available))
return results
@@ -159,15 +78,11 @@ def discover_memory_providers() -> List[Tuple[str, str, bool]]:
def load_memory_provider(name: str) -> Optional["MemoryProvider"]:
"""Load and return a MemoryProvider instance by name.
Checks both bundled (``plugins/memory/<name>/``) and user-installed
(``$HERMES_HOME/plugins/<name>/``) directories. Bundled takes
precedence on name collisions.
Returns None if the provider is not found or fails to load.
"""
provider_dir = find_provider_dir(name)
if not provider_dir:
logger.debug("Memory provider '%s' not found in bundled or user plugins", name)
provider_dir = _MEMORY_PLUGINS_DIR / name
if not provider_dir.is_dir():
logger.debug("Memory provider '%s' not found in %s", name, _MEMORY_PLUGINS_DIR)
return None
try:
@@ -189,10 +104,7 @@ def _load_provider_from_dir(provider_dir: Path) -> Optional["MemoryProvider"]:
- A top-level class that extends MemoryProvider we instantiate it
"""
name = provider_dir.name
# Use a separate namespace for user-installed plugins so they don't
# collide with bundled providers in sys.modules.
_is_bundled = _MEMORY_PLUGINS_DIR in provider_dir.parents or provider_dir.parent == _MEMORY_PLUGINS_DIR
module_name = f"plugins.memory.{name}" if _is_bundled else f"_hermes_user_memory.{name}"
module_name = f"plugins.memory.{name}"
init_file = provider_dir / "__init__.py"
if not init_file.exists():
@@ -345,16 +257,15 @@ def discover_plugin_cli_commands() -> List[dict]:
return results
# Only look at the active provider's directory
plugin_dir = find_provider_dir(active_provider)
if not plugin_dir:
plugin_dir = _MEMORY_PLUGINS_DIR / active_provider
if not plugin_dir.is_dir():
return results
cli_file = plugin_dir / "cli.py"
if not cli_file.exists():
return results
_is_bundled = _MEMORY_PLUGINS_DIR in plugin_dir.parents or plugin_dir.parent == _MEMORY_PLUGINS_DIR
module_name = f"plugins.memory.{active_provider}.cli" if _is_bundled else f"_hermes_user_memory.{active_provider}.cli"
module_name = f"plugins.memory.{active_provider}.cli"
try:
# Import the CLI module (lightweight — no SDK needed)
if module_name in sys.modules:
+99 -207
View File
@@ -1,6 +1,6 @@
# Honcho Memory Provider
AI-native cross-session user modeling with multi-pass dialectic reasoning, session summaries, bidirectional peer tools, and persistent conclusions.
AI-native cross-session user modeling with dialectic Q&A, semantic search, peer cards, and persistent conclusions.
> **Honcho docs:** <https://docs.honcho.dev/v3/guides/integrations/hermes>
@@ -19,86 +19,9 @@ hermes memory setup # generic picker, also works
Or manually:
```bash
hermes config set memory.provider honcho
echo "HONCHO_API_KEY=***" >> ~/.hermes/.env
echo "HONCHO_API_KEY=your-key" >> ~/.hermes/.env
```
## Architecture Overview
### Two-Layer Context Injection
Context is injected into the **user message** at API-call time (not the system prompt) to preserve prompt caching. Only a static mode header goes in the system prompt. The injected block is wrapped in `<memory-context>` fences with a system note clarifying it's background data, not new user input.
Two independent layers, each on its own cadence:
**Layer 1 — Base context** (refreshed every `contextCadence` turns):
1. **SESSION SUMMARY** — from `session.context(summary=True)`, placed first
2. **User Representation** — Honcho's evolving model of the user
3. **User Peer Card** — key facts snapshot
4. **AI Self-Representation** — Honcho's model of the AI peer
5. **AI Identity Card** — AI peer facts
**Layer 2 — Dialectic supplement** (fired every `dialecticCadence` turns):
Multi-pass `.chat()` reasoning about the user, appended after base context.
Both layers are joined, then truncated to fit `contextTokens` budget via `_truncate_to_budget` (tokens × 4 chars, word-boundary safe).
### Cold Start vs Warm Session Prompts
Dialectic pass 0 automatically selects its prompt based on session state:
- **Cold** (no base context cached): "Who is this person? What are their preferences, goals, and working style? Focus on facts that would help an AI assistant be immediately useful."
- **Warm** (base context exists): "Given what's been discussed in this session so far, what context about this user is most relevant to the current conversation? Prioritize active context over biographical facts."
Not configurable — determined automatically.
### Dialectic Depth (Multi-Pass Reasoning)
`dialecticDepth` (13, clamped) controls how many `.chat()` calls fire per dialectic cycle:
| Depth | Passes | Behavior |
|-------|--------|----------|
| 1 | single `.chat()` | Base query only (cold or warm prompt) |
| 2 | audit + synthesis | Pass 0 result is self-audited; pass 1 does targeted synthesis. Conditional bail-out if pass 0 returns strong signal (>300 chars or structured with bullets/sections >100 chars) |
| 3 | audit + synthesis + reconciliation | Pass 2 reconciles contradictions across prior passes into a final synthesis |
### Proportional Reasoning Levels
When `dialecticDepthLevels` is not set, each pass uses a proportional level relative to `dialecticReasoningLevel` (the "base"):
| Depth | Pass levels |
|-------|-------------|
| 1 | [base] |
| 2 | [minimal, base] |
| 3 | [minimal, base, low] |
Override with `dialecticDepthLevels`: an explicit array of reasoning level strings per pass.
### Three Orthogonal Dialectic Knobs
| Knob | Controls | Type |
|------|----------|------|
| `dialecticCadence` | How often — minimum turns between dialectic firings | int |
| `dialecticDepth` | How many — passes per firing (13) | int |
| `dialecticReasoningLevel` | How hard — reasoning ceiling per `.chat()` call | string |
### Input Sanitization
`run_conversation` strips leaked `<memory-context>` blocks from user input before processing. When `saveMessages` persists a turn that included injected context, the block can reappear in subsequent turns via message history. The sanitizer removes `<memory-context>` blocks plus associated system notes.
## Tools
Five bidirectional tools. All accept an optional `peer` parameter (`"user"` or `"ai"`, default `"user"`).
| Tool | LLM call? | Description |
|------|-----------|-------------|
| `honcho_profile` | No | Peer card — key facts snapshot |
| `honcho_search` | No | Semantic search over stored context (800 tok default, 2000 max) |
| `honcho_context` | No | Full session context: summary, representation, card, messages |
| `honcho_reasoning` | Yes | LLM-synthesized answer via dialectic `.chat()` |
| `honcho_conclude` | No | Write a persistent fact/conclusion about the user |
Tool visibility depends on `recallMode`: hidden in `context` mode, always present in `tools` and `hybrid`.
## Config Resolution
Config is read from the first file that exists:
@@ -111,128 +34,42 @@ Config is read from the first file that exists:
Host key is derived from the active Hermes profile: `hermes` (default) or `hermes.<profile>`.
For every key, resolution order is: **host block > root > env var > default**.
## Tools
| Tool | LLM call? | Description |
|------|-----------|-------------|
| `honcho_profile` | No | User's peer card -- key facts snapshot |
| `honcho_search` | No | Semantic search over stored context (800 tok default, 2000 max) |
| `honcho_context` | Yes | LLM-synthesized answer via dialectic reasoning |
| `honcho_conclude` | No | Write a persistent fact about the user |
Tool availability depends on `recallMode`: hidden in `context` mode, always present in `tools` and `hybrid`.
## Full Configuration Reference
### Identity & Connection
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `apiKey` | string | | API key. Falls back to `HONCHO_API_KEY` env var |
| `baseUrl` | string | | Base URL for self-hosted Honcho. Local URLs auto-skip API key auth |
| `environment` | string | `"production"` | SDK environment mapping |
| `enabled` | bool | auto | Master toggle. Auto-enables when `apiKey` or `baseUrl` present |
| `workspace` | string | host key | Honcho workspace ID. Shared environment — all profiles in the same workspace can see the same user identity and related memories |
| `peerName` | string | | User peer identity |
| `aiPeer` | string | host key | AI peer identity |
| Key | Type | Default | Scope | Description |
|-----|------|---------|-------|-------------|
| `apiKey` | string | -- | root / host | API key. Falls back to `HONCHO_API_KEY` env var |
| `baseUrl` | string | -- | root | Base URL for self-hosted Honcho. Local URLs (`localhost`, `127.0.0.1`, `::1`) auto-skip API key auth |
| `environment` | string | `"production"` | root / host | SDK environment mapping |
| `enabled` | bool | auto | root / host | Master toggle. Auto-enables when `apiKey` or `baseUrl` present |
| `workspace` | string | host key | root / host | Honcho workspace ID |
| `peerName` | string | -- | root / host | User peer identity |
| `aiPeer` | string | host key | root / host | AI peer identity |
### Memory & Recall
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `recallMode` | string | `"hybrid"` | `"hybrid"` (auto-inject + tools), `"context"` (auto-inject only, tools hidden), `"tools"` (tools only, no injection). Legacy `"auto"` `"hybrid"` |
| `observationMode` | string | `"directional"` | Preset: `"directional"` (all on) or `"unified"` (shared pool). Use `observation` object for granular control |
| `observation` | object | | Per-peer observation config (see Observation section) |
| Key | Type | Default | Scope | Description |
|-----|------|---------|-------|-------------|
| `recallMode` | string | `"hybrid"` | root / host | `"hybrid"` (auto-inject + tools), `"context"` (auto-inject only, tools hidden), `"tools"` (tools only, no injection). Legacy `"auto"` normalizes to `"hybrid"` |
| `observationMode` | string | `"directional"` | root / host | Shorthand preset: `"directional"` (all on) or `"unified"` (shared pool). Use `observation` object for granular control |
| `observation` | object | -- | root / host | Per-peer observation config (see below) |
### Write Behavior
#### Observation (granular)
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `writeFrequency` | string/int | `"async"` | `"async"` (background), `"turn"` (sync per turn), `"session"` (batch on end), or integer N (every N turns) |
| `saveMessages` | bool | `true` | Persist messages to Honcho API |
### Session Resolution
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `sessionStrategy` | string | `"per-directory"` | `"per-directory"`, `"per-session"`, `"per-repo"` (git root), `"global"` |
| `sessionPeerPrefix` | bool | `false` | Prepend peer name to session keys |
| `sessions` | object | `{}` | Manual directory-to-session-name mappings |
#### Session Name Resolution
The Honcho session name determines which conversation bucket memory lands in. Resolution follows a priority chain — first match wins:
| Priority | Source | Example session name |
|----------|--------|---------------------|
| 1 | Manual map (`sessions` config) | `"myproject-main"` |
| 2 | `/title` command (mid-session rename) | `"refactor-auth"` |
| 3 | Gateway session key (Telegram, Discord, etc.) | `"agent-main-telegram-dm-8439114563"` |
| 4 | `per-session` strategy | Hermes session ID (`20260415_a3f2b1`) |
| 5 | `per-repo` strategy | Git root directory name (`hermes-agent`) |
| 6 | `per-directory` strategy | Current directory basename (`src`) |
| 7 | `global` strategy | Workspace name (`hermes`) |
Gateway platforms always resolve via priority 3 (per-chat isolation) regardless of `sessionStrategy`. The strategy setting only affects CLI sessions.
If `sessionPeerPrefix` is `true`, the peer name is prepended: `eri-hermes-agent`.
#### What each strategy produces
- **`per-directory`** — basename of `$PWD`. Opening hermes in `~/code/myapp` and `~/code/other` gives two separate sessions. Same directory = same session across runs.
- **`per-repo`** — git root directory name. All subdirectories within a repo share one session. Falls back to `per-directory` if not inside a git repo.
- **`per-session`** — Hermes session ID (timestamp + hex). Every `hermes` invocation starts a fresh Honcho session. Falls back to `per-directory` if no session ID is available.
- **`global`** — workspace name. One session for everything. Memory accumulates across all directories and runs.
### Multi-Profile Pattern
Multiple Hermes profiles can share one workspace while maintaining separate AI identities. Config resolution is **host block > root > env var > default** — host blocks inherit from root, so shared settings only need to be declared once:
```json
{
"apiKey": "***",
"workspace": "hermes",
"peerName": "yourname",
"hosts": {
"hermes": {
"aiPeer": "hermes",
"recallMode": "hybrid",
"sessionStrategy": "per-directory"
},
"hermes.coder": {
"aiPeer": "coder",
"recallMode": "tools",
"sessionStrategy": "per-repo"
}
}
}
```
Both profiles see the same user (`yourname`) in the same shared environment (`hermes`), but each AI peer builds its own observations, conclusions, and behavior patterns. The coder's memory stays code-oriented; the main agent's stays broad.
Host key is derived from the active Hermes profile: `hermes` (default) or `hermes.<profile>` (e.g. `hermes -p coder` → host key `hermes.coder`).
### Dialectic & Reasoning
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `dialecticDepth` | int | `1` | Passes per dialectic cycle (13, clamped). 1=single query, 2=audit+synthesis, 3=audit+synthesis+reconciliation |
| `dialecticDepthLevels` | array | — | Optional array of reasoning level strings per pass. Overrides proportional defaults. Example: `["minimal", "low", "medium"]` |
| `dialecticReasoningLevel` | string | `"low"` | Base reasoning level for `.chat()`: `"minimal"`, `"low"`, `"medium"`, `"high"`, `"max"` |
| `dialecticDynamic` | bool | `true` | When `true`, model can override reasoning level per-call via `honcho_reasoning` tool. When `false`, always uses `dialecticReasoningLevel` |
| `dialecticMaxChars` | int | `600` | Max chars of dialectic result injected into system prompt |
| `dialecticMaxInputChars` | int | `10000` | Max chars for dialectic query input to `.chat()`. Honcho cloud limit: 10k |
### Token Budgets
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `contextTokens` | int | SDK default | Token budget for `context()` API calls. Also gates prefetch truncation (tokens × 4 chars) |
| `messageMaxChars` | int | `25000` | Max chars per message sent via `add_messages()`. Exceeding this triggers chunking with `[continued]` markers. Honcho cloud limit: 25k |
### Cadence (Cost Control)
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `contextCadence` | int | `1` | Minimum turns between base context refreshes (session summary + representation + card) |
| `dialecticCadence` | int | `1` | Minimum turns between dialectic `.chat()` firings |
| `injectionFrequency` | string | `"every-turn"` | `"every-turn"` or `"first-turn"` (inject context on the first user message only, skip from turn 2 onward) |
| `reasoningLevelCap` | string | — | Hard cap on reasoning level: `"minimal"`, `"low"`, `"medium"`, `"high"` |
### Observation (Granular)
Maps 1:1 to Honcho's per-peer `SessionPeerConfig`. When present, overrides `observationMode` preset.
Maps 1:1 to Honcho's per-peer `SessionPeerConfig`. Set at root or per host block -- each profile can have different observation settings. When present, overrides `observationMode` preset.
```json
"observation": {
@@ -248,16 +85,74 @@ Maps 1:1 to Honcho's per-peer `SessionPeerConfig`. When present, overrides `obse
| `ai.observeMe` | `true` | AI peer self-observation (Honcho builds AI representation) |
| `ai.observeOthers` | `true` | AI peer observes user messages (enables cross-peer dialectic) |
Presets:
- `"directional"` (default): all four `true`
Presets for `observationMode`:
- `"directional"` (default): all four booleans `true`
- `"unified"`: user `observeMe=true`, AI `observeOthers=true`, rest `false`
### Hardcoded Limits
Per-profile example -- coder profile observes the user but user doesn't observe coder:
| Limit | Value |
|-------|-------|
| Search tool max tokens | 2000 (hard cap), 800 (default) |
| Peer card fetch tokens | 200 |
```json
"hosts": {
"hermes.coder": {
"observation": {
"user": { "observeMe": true, "observeOthers": false },
"ai": { "observeMe": true, "observeOthers": true }
}
}
}
```
Settings changed in the [Honcho dashboard](https://app.honcho.dev) are synced back on session init.
### Write Behavior
| Key | Type | Default | Scope | Description |
|-----|------|---------|-------|-------------|
| `writeFrequency` | string or int | `"async"` | root / host | `"async"` (background thread), `"turn"` (sync per turn), `"session"` (batch on end), or integer N (every N turns) |
| `saveMessages` | bool | `true` | root / host | Whether to persist messages to Honcho API |
### Session Resolution
| Key | Type | Default | Scope | Description |
|-----|------|---------|-------|-------------|
| `sessionStrategy` | string | `"per-directory"` | root / host | `"per-directory"`, `"per-session"` (new each run), `"per-repo"` (git root name), `"global"` (single session) |
| `sessionPeerPrefix` | bool | `false` | root / host | Prepend peer name to session keys |
| `sessions` | object | `{}` | root | Manual directory-to-session-name mappings: `{"/path/to/project": "my-session"}` |
### Token Budgets & Dialectic
| Key | Type | Default | Scope | Description |
|-----|------|---------|-------|-------------|
| `contextTokens` | int | SDK default | root / host | Token budget for `context()` API calls. Also gates prefetch truncation (tokens x 4 chars) |
| `dialecticReasoningLevel` | string | `"low"` | root / host | Base reasoning level for `peer.chat()`: `"minimal"`, `"low"`, `"medium"`, `"high"`, `"max"` |
| `dialecticDynamic` | bool | `true` | root / host | Auto-bump reasoning based on query length: `<120` chars = base level, `120-400` = +1, `>400` = +2 (capped at `"high"`). Set `false` to always use `dialecticReasoningLevel` as-is |
| `dialecticMaxChars` | int | `600` | root / host | Max chars of dialectic result injected into system prompt |
| `dialecticMaxInputChars` | int | `10000` | root / host | Max chars for dialectic query input to `peer.chat()`. Honcho cloud limit: 10k |
| `messageMaxChars` | int | `25000` | root / host | Max chars per message sent via `add_messages()`. Messages exceeding this are chunked with `[continued]` markers. Honcho cloud limit: 25k |
### Cost Awareness (Advanced)
These are read from the root config object, not the host block. Must be set manually in `honcho.json`.
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `injectionFrequency` | string | `"every-turn"` | `"every-turn"` or `"first-turn"` (inject context only on turn 0) |
| `contextCadence` | int | `1` | Minimum turns between `context()` API calls |
| `dialecticCadence` | int | `1` | Minimum turns between `peer.chat()` API calls |
| `reasoningLevelCap` | string | -- | Hard cap on auto-bumped reasoning: `"minimal"`, `"low"`, `"mid"`, `"high"` |
### Hardcoded Limits (Not Configurable)
| Limit | Value | Location |
|-------|-------|----------|
| Search tool max tokens | 2000 (hard cap), 800 (default) | `__init__.py` handle_tool_call |
| Peer card fetch tokens | 200 | `session.py` get_peer_card |
## Config Precedence
For every key, resolution order is: **host block > root > env var > default**.
Host key derivation: `HERMES_HONCHO_HOST` env > active profile (`hermes.<profile>`) > `"hermes"`.
## Environment Variables
@@ -287,16 +182,15 @@ Presets:
```json
{
"apiKey": "***",
"apiKey": "your-key",
"workspace": "hermes",
"peerName": "username",
"contextCadence": 2,
"dialecticCadence": 3,
"dialecticDepth": 2,
"peerName": "eri",
"hosts": {
"hermes": {
"enabled": true,
"aiPeer": "hermes",
"workspace": "hermes",
"peerName": "eri",
"recallMode": "hybrid",
"observation": {
"user": { "observeMe": true, "observeOthers": true },
@@ -305,16 +199,14 @@ Presets:
"writeFrequency": "async",
"sessionStrategy": "per-directory",
"dialecticReasoningLevel": "low",
"dialecticDepth": 2,
"dialecticMaxChars": 600,
"saveMessages": true
},
"hermes.coder": {
"enabled": true,
"aiPeer": "coder",
"sessionStrategy": "per-repo",
"dialecticDepth": 1,
"dialecticDepthLevels": ["low"],
"workspace": "hermes",
"peerName": "eri",
"observation": {
"user": { "observeMe": true, "observeOthers": false },
"ai": { "observeMe": true, "observeOthers": true }
+80 -411
View File
@@ -17,7 +17,6 @@ from __future__ import annotations
import json
import logging
import re
import threading
from typing import Any, Dict, List, Optional
@@ -34,33 +33,20 @@ logger = logging.getLogger(__name__)
PROFILE_SCHEMA = {
"name": "honcho_profile",
"description": (
"Retrieve or update a peer card from Honcho — a curated list of key facts "
"about that peer (name, role, preferences, communication style, patterns). "
"Pass `card` to update; omit `card` to read."
"Retrieve the user's peer card from Honcho — a curated list of key facts "
"about them (name, role, preferences, communication style, patterns). "
"Fast, no LLM reasoning, minimal cost. "
"Use this at conversation start or when you need a quick factual snapshot."
),
"parameters": {
"type": "object",
"properties": {
"peer": {
"type": "string",
"description": "Peer to query. Built-in aliases: 'user' (default), 'ai'. Or pass any peer ID from this workspace.",
},
"card": {
"type": "array",
"items": {"type": "string"},
"description": "New peer card as a list of fact strings. Omit to read the current card.",
},
},
"required": [],
},
"parameters": {"type": "object", "properties": {}, "required": []},
}
SEARCH_SCHEMA = {
"name": "honcho_search",
"description": (
"Semantic search over Honcho's stored context about a peer. "
"Semantic search over Honcho's stored context about the user. "
"Returns raw excerpts ranked by relevance — no LLM synthesis. "
"Cheaper and faster than honcho_reasoning. "
"Cheaper and faster than honcho_context. "
"Good when you want to find specific past facts and reason over them yourself."
),
"parameters": {
@@ -74,49 +60,6 @@ SEARCH_SCHEMA = {
"type": "integer",
"description": "Token budget for returned context (default 800, max 2000).",
},
"peer": {
"type": "string",
"description": "Peer to query. Built-in aliases: 'user' (default), 'ai'. Or pass any peer ID from this workspace.",
},
},
"required": ["query"],
},
}
REASONING_SCHEMA = {
"name": "honcho_reasoning",
"description": (
"Ask Honcho a natural language question and get a synthesized answer. "
"Uses Honcho's LLM (dialectic reasoning) — higher cost than honcho_profile or honcho_search. "
"Can query about any peer via alias or explicit peer ID. "
"Pass reasoning_level to control depth: minimal (fast/cheap), low (default), "
"medium, high, max (deep/expensive). Omit for configured default."
),
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "A natural language question.",
},
"reasoning_level": {
"type": "string",
"description": (
"Override the default reasoning depth. "
"Omit to use the configured default (typically low). "
"Guide:\n"
"- minimal: quick factual lookups (name, role, simple preference)\n"
"- low: straightforward questions with clear answers\n"
"- medium: multi-aspect questions requiring synthesis across observations\n"
"- high: complex behavioral patterns, contradictions, deep analysis\n"
"- max: thorough audit-level analysis, leave no stone unturned"
),
"enum": ["minimal", "low", "medium", "high", "max"],
},
"peer": {
"type": "string",
"description": "Peer to query. Built-in aliases: 'user' (default), 'ai'. Or pass any peer ID from this workspace.",
},
},
"required": ["query"],
},
@@ -125,61 +68,48 @@ REASONING_SCHEMA = {
CONTEXT_SCHEMA = {
"name": "honcho_context",
"description": (
"Retrieve full session context from Honcho — summary, peer representation, "
"peer card, and recent messages. No LLM synthesis. "
"Cheaper than honcho_reasoning. Use this to see what Honcho knows about "
"the current conversation and the specified peer."
"Ask Honcho a natural language question and get a synthesized answer. "
"Uses Honcho's LLM (dialectic reasoning) — higher cost than honcho_profile or honcho_search. "
"Can query about any peer: the user (default) or the AI assistant."
),
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Optional focus query to filter context. Omit for full session context snapshot.",
"description": "A natural language question.",
},
"peer": {
"type": "string",
"description": "Peer to query. Built-in aliases: 'user' (default), 'ai'. Or pass any peer ID from this workspace.",
"description": "Which peer to query about: 'user' (default) or 'ai'.",
},
},
"required": [],
"required": ["query"],
},
}
CONCLUDE_SCHEMA = {
"name": "honcho_conclude",
"description": (
"Write or delete a conclusion about a peer in Honcho's memory. "
"Conclusions are persistent facts that build a peer's profile. "
"You MUST pass exactly one of: `conclusion` (to create) or `delete_id` (to delete). "
"Passing neither is an error. "
"Deletion is only for PII removal — Honcho self-heals incorrect conclusions over time."
"Write a conclusion about the user back to Honcho's memory. "
"Conclusions are persistent facts that build the user's profile. "
"Use when the user states a preference, corrects you, or shares "
"something to remember across sessions."
),
"parameters": {
"type": "object",
"properties": {
"conclusion": {
"type": "string",
"description": "A factual statement to persist. Required when not using delete_id.",
},
"delete_id": {
"type": "string",
"description": "Conclusion ID to delete (for PII removal). Required when not using conclusion.",
},
"peer": {
"type": "string",
"description": "Peer to query. Built-in aliases: 'user' (default), 'ai'. Or pass any peer ID from this workspace.",
},
"description": "A factual statement about the user to persist.",
}
},
"anyOf": [
{"required": ["conclusion"]},
{"required": ["delete_id"]},
],
"required": ["conclusion"],
},
}
ALL_TOOL_SCHEMAS = [PROFILE_SCHEMA, SEARCH_SCHEMA, REASONING_SCHEMA, CONTEXT_SCHEMA, CONCLUDE_SCHEMA]
ALL_TOOL_SCHEMAS = [PROFILE_SCHEMA, SEARCH_SCHEMA, CONTEXT_SCHEMA, CONCLUDE_SCHEMA]
# ---------------------------------------------------------------------------
@@ -201,18 +131,16 @@ class HonchoMemoryProvider(MemoryProvider):
# B1: recall_mode — set during initialize from config
self._recall_mode = "hybrid" # "context", "tools", or "hybrid"
# Base context cache — refreshed on context_cadence, not frozen
self._base_context_cache: Optional[str] = None
self._base_context_lock = threading.Lock()
# B4: First-turn context baking
self._first_turn_context: Optional[str] = None
self._first_turn_lock = threading.Lock()
# B5: Cost-awareness turn counting and cadence
self._turn_count = 0
self._injection_frequency = "every-turn" # or "first-turn"
self._context_cadence = 1 # minimum turns between context API calls
self._dialectic_cadence = 3 # minimum turns between dialectic API calls
self._dialectic_depth = 1 # how many .chat() calls per dialectic cycle (1-3)
self._dialectic_depth_levels: list[str] | None = None # per-pass reasoning levels
self._reasoning_level_cap: Optional[str] = None # "minimal", "low", "medium", "high"
self._dialectic_cadence = 1 # minimum turns between dialectic API calls
self._reasoning_level_cap: Optional[str] = None # "minimal", "low", "mid", "high"
self._last_context_turn = -999
self._last_dialectic_turn = -999
@@ -308,11 +236,9 @@ class HonchoMemoryProvider(MemoryProvider):
raw = cfg.raw or {}
self._injection_frequency = raw.get("injectionFrequency", "every-turn")
self._context_cadence = int(raw.get("contextCadence", 1))
self._dialectic_cadence = int(raw.get("dialecticCadence", 3))
self._dialectic_depth = max(1, min(cfg.dialectic_depth, 3))
self._dialectic_depth_levels = cfg.dialectic_depth_levels
self._dialectic_cadence = int(raw.get("dialecticCadence", 1))
cap = raw.get("reasoningLevelCap")
if cap and cap in ("minimal", "low", "medium", "high"):
if cap and cap in ("minimal", "low", "mid", "high"):
self._reasoning_level_cap = cap
except Exception as e:
logger.debug("Honcho cost-awareness config parse error: %s", e)
@@ -325,7 +251,9 @@ class HonchoMemoryProvider(MemoryProvider):
# ----- Port #1957: lazy session init for tools-only mode -----
if self._recall_mode == "tools":
if cfg.init_on_session_start:
# Eager init even in tools mode (opt-in)
# Eager init: create session now so sync_turn() works from turn 1.
# Does NOT enable auto-injection — prefetch() still returns empty.
logger.debug("Honcho tools-only mode — eager session init (initOnSessionStart=true)")
self._do_session_init(cfg, session_id, **kwargs)
return
# Defer actual session creation until first tool call
@@ -359,13 +287,8 @@ class HonchoMemoryProvider(MemoryProvider):
# ----- B3: resolve_session_name -----
session_title = kwargs.get("session_title")
gateway_session_key = kwargs.get("gateway_session_key")
self._session_key = (
cfg.resolve_session_name(
session_title=session_title,
session_id=session_id,
gateway_session_key=gateway_session_key,
)
cfg.resolve_session_name(session_title=session_title, session_id=session_id)
or session_id
or "hermes-default"
)
@@ -376,21 +299,12 @@ class HonchoMemoryProvider(MemoryProvider):
self._session_initialized = True
# ----- B6: Memory file migration (one-time, for new sessions) -----
# Skip under per-session strategy: every Hermes run creates a fresh
# Honcho session by design, so uploading MEMORY.md/USER.md/SOUL.md to
# each one would flood the backend with short-lived duplicates instead
# of performing a one-time migration.
try:
if not session.messages and cfg.session_strategy != "per-session":
if not session.messages:
from hermes_constants import get_hermes_home
mem_dir = str(get_hermes_home() / "memories")
self._manager.migrate_memory_files(self._session_key, mem_dir)
logger.debug("Honcho memory file migration attempted for new session: %s", self._session_key)
elif cfg.session_strategy == "per-session":
logger.debug(
"Honcho memory file migration skipped: per-session strategy creates a fresh session per run (%s)",
self._session_key,
)
except Exception as e:
logger.debug("Honcho memory file migration skipped: %s", e)
@@ -433,11 +347,6 @@ class HonchoMemoryProvider(MemoryProvider):
"""Format the prefetch context dict into a readable system prompt block."""
parts = []
# Session summary — session-scoped context, placed first for relevance
summary = ctx.get("summary", "")
if summary:
parts.append(f"## Session Summary\n{summary}")
rep = ctx.get("representation", "")
if rep:
parts.append(f"## User Representation\n{rep}")
@@ -461,9 +370,9 @@ class HonchoMemoryProvider(MemoryProvider):
def system_prompt_block(self) -> str:
"""Return system prompt text, adapted by recall_mode.
Returns only the mode header and tool instructions static text
that doesn't change between turns (prompt-cache friendly).
Live context (representation, card) is injected via prefetch().
B4: On the FIRST call, fetch and bake the full Honcho context
(user representation, peer card, AI representation, continuity synthesis).
Subsequent calls return the cached block for prompt caching stability.
"""
if self._cron_skipped:
return ""
@@ -473,10 +382,24 @@ class HonchoMemoryProvider(MemoryProvider):
return (
"# Honcho Memory\n"
"Active (tools-only mode). Use honcho_profile, honcho_search, "
"honcho_reasoning, honcho_context, and honcho_conclude tools to access user memory."
"honcho_context, and honcho_conclude tools to access user memory."
)
return ""
# ----- B4: First-turn context baking -----
first_turn_block = ""
if self._recall_mode in ("context", "hybrid"):
with self._first_turn_lock:
if self._first_turn_context is None:
# First call — fetch and cache
try:
ctx = self._manager.get_prefetch_context(self._session_key)
self._first_turn_context = self._format_first_turn_context(ctx) if ctx else ""
except Exception as e:
logger.debug("Honcho first-turn context fetch failed: %s", e)
self._first_turn_context = ""
first_turn_block = self._first_turn_context
# ----- B1: adapt text based on recall_mode -----
if self._recall_mode == "context":
header = (
@@ -489,8 +412,7 @@ class HonchoMemoryProvider(MemoryProvider):
header = (
"# Honcho Memory\n"
"Active (tools-only mode). Use honcho_profile for a quick factual snapshot, "
"honcho_search for raw excerpts, honcho_context for raw peer context, "
"honcho_reasoning for synthesized answers, "
"honcho_search for raw excerpts, honcho_context for synthesized answers, "
"honcho_conclude to save facts about the user. "
"No automatic context injection — you must use tools to access memory."
)
@@ -499,19 +421,16 @@ class HonchoMemoryProvider(MemoryProvider):
"# Honcho Memory\n"
"Active (hybrid mode). Relevant context is auto-injected AND memory tools are available. "
"Use honcho_profile for a quick factual snapshot, "
"honcho_search for raw excerpts, honcho_context for raw peer context, "
"honcho_reasoning for synthesized answers, "
"honcho_search for raw excerpts, honcho_context for synthesized answers, "
"honcho_conclude to save facts about the user."
)
if first_turn_block:
return f"{header}\n\n{first_turn_block}"
return header
def prefetch(self, query: str, *, session_id: str = "") -> str:
"""Return base context (representation + card) plus dialectic supplement.
Assembles two layers:
1. Base context from peer.context() cached, refreshed on context_cadence
2. Dialectic supplement cached, refreshed on dialectic_cadence
"""Return prefetched dialectic context from background thread.
B1: Returns empty when recall_mode is "tools" (no injection).
B5: Respects injection_frequency "first-turn" returns cached/empty after turn 0.
@@ -524,95 +443,22 @@ class HonchoMemoryProvider(MemoryProvider):
if self._recall_mode == "tools":
return ""
# B5: injection_frequency — if "first-turn" and past first turn, return empty.
# _turn_count is 1-indexed (first user message = 1), so > 1 means "past first".
if self._injection_frequency == "first-turn" and self._turn_count > 1:
# B5: injection_frequency — if "first-turn" and past first turn, return empty
if self._injection_frequency == "first-turn" and self._turn_count > 0:
return ""
parts = []
# ----- Layer 1: Base context (representation + card) -----
# On first call, fetch synchronously so turn 1 isn't empty.
# After that, serve from cache and refresh in background on cadence.
with self._base_context_lock:
if self._base_context_cache is None:
# First call — synchronous fetch
try:
ctx = self._manager.get_prefetch_context(self._session_key)
self._base_context_cache = self._format_first_turn_context(ctx) if ctx else ""
self._last_context_turn = self._turn_count
except Exception as e:
logger.debug("Honcho base context fetch failed: %s", e)
self._base_context_cache = ""
base_context = self._base_context_cache
# Check if background context prefetch has a fresher result
if self._manager:
fresh_ctx = self._manager.pop_context_result(self._session_key)
if fresh_ctx:
formatted = self._format_first_turn_context(fresh_ctx)
if formatted:
with self._base_context_lock:
self._base_context_cache = formatted
base_context = formatted
if base_context:
parts.append(base_context)
# ----- Layer 2: Dialectic supplement -----
# On the very first turn, no queue_prefetch() has run yet so the
# dialectic result is empty. Run with a bounded timeout so a slow
# Honcho connection doesn't block the first response indefinitely.
# On timeout the result is skipped and queue_prefetch() will pick it
# up at the next cadence-allowed turn.
if self._last_dialectic_turn == -999 and query:
_first_turn_timeout = (
self._config.timeout if self._config and self._config.timeout else 8.0
)
_result_holder: list[str] = []
def _run_first_turn() -> None:
try:
_result_holder.append(self._run_dialectic_depth(query))
except Exception as exc:
logger.debug("Honcho first-turn dialectic failed: %s", exc)
_t = threading.Thread(target=_run_first_turn, daemon=True)
_t.start()
_t.join(timeout=_first_turn_timeout)
if not _t.is_alive():
first_turn_dialectic = _result_holder[0] if _result_holder else ""
if first_turn_dialectic and first_turn_dialectic.strip():
with self._prefetch_lock:
self._prefetch_result = first_turn_dialectic
self._last_dialectic_turn = self._turn_count
else:
logger.debug(
"Honcho first-turn dialectic timed out (%.1fs) — "
"will inject at next cadence-allowed turn",
_first_turn_timeout,
)
# Don't update _last_dialectic_turn: queue_prefetch() will
# retry at the next cadence-allowed turn via the async path.
if self._prefetch_thread and self._prefetch_thread.is_alive():
self._prefetch_thread.join(timeout=3.0)
with self._prefetch_lock:
dialectic_result = self._prefetch_result
result = self._prefetch_result
self._prefetch_result = ""
if dialectic_result and dialectic_result.strip():
parts.append(dialectic_result)
if not parts:
if not result:
return ""
result = "\n\n".join(parts)
# ----- Port #3265: token budget enforcement -----
result = self._truncate_to_budget(result)
return result
return f"## Honcho Context\n{result}"
def _truncate_to_budget(self, text: str) -> str:
"""Truncate text to fit within context_tokens budget if set."""
@@ -629,11 +475,9 @@ class HonchoMemoryProvider(MemoryProvider):
return truncated + ""
def queue_prefetch(self, query: str, *, session_id: str = "") -> None:
"""Fire background prefetch threads for the upcoming turn.
"""Fire a background dialectic query for the upcoming turn.
B5: Checks cadence independently for dialectic and context refresh.
Context refresh updates the base layer (representation + card).
Dialectic fires the LLM reasoning supplement.
B5: Checks cadence before firing background threads.
"""
if self._cron_skipped:
return
@@ -644,15 +488,6 @@ class HonchoMemoryProvider(MemoryProvider):
if self._recall_mode == "tools":
return
# ----- Context refresh (base layer) — independent cadence -----
if self._context_cadence <= 1 or (self._turn_count - self._last_context_turn) >= self._context_cadence:
self._last_context_turn = self._turn_count
try:
self._manager.prefetch_context(self._session_key, query)
except Exception as e:
logger.debug("Honcho context prefetch failed: %s", e)
# ----- Dialectic prefetch (supplement layer) -----
# B5: cadence check — skip if too soon since last dialectic call
if self._dialectic_cadence > 1:
if (self._turn_count - self._last_dialectic_turn) < self._dialectic_cadence:
@@ -664,7 +499,9 @@ class HonchoMemoryProvider(MemoryProvider):
def _run():
try:
result = self._run_dialectic_depth(query)
result = self._manager.dialectic_query(
self._session_key, query, peer="user"
)
if result and result.strip():
with self._prefetch_lock:
self._prefetch_result = result
@@ -676,140 +513,13 @@ class HonchoMemoryProvider(MemoryProvider):
)
self._prefetch_thread.start()
# ----- Dialectic depth: multi-pass .chat() with cold/warm prompts -----
# Proportional reasoning levels per depth/pass when dialecticDepthLevels
# is not configured. The base level is dialecticReasoningLevel.
# Index: (depth, pass) → level relative to base.
_PROPORTIONAL_LEVELS: dict[tuple[int, int], str] = {
# depth 1: single pass at base level
(1, 0): "base",
# depth 2: pass 0 lighter, pass 1 at base
(2, 0): "minimal",
(2, 1): "base",
# depth 3: pass 0 lighter, pass 1 at base, pass 2 one above minimal
(3, 0): "minimal",
(3, 1): "base",
(3, 2): "low",
}
_LEVEL_ORDER = ("minimal", "low", "medium", "high", "max")
def _resolve_pass_level(self, pass_idx: int) -> str:
"""Resolve reasoning level for a given pass index.
Uses dialecticDepthLevels if configured, otherwise proportional
defaults relative to dialecticReasoningLevel.
"""
if self._dialectic_depth_levels and pass_idx < len(self._dialectic_depth_levels):
return self._dialectic_depth_levels[pass_idx]
base = (self._config.dialectic_reasoning_level if self._config else "low")
mapping = self._PROPORTIONAL_LEVELS.get((self._dialectic_depth, pass_idx))
if mapping is None or mapping == "base":
return base
return mapping
def _build_dialectic_prompt(self, pass_idx: int, prior_results: list[str], is_cold: bool) -> str:
"""Build the prompt for a given dialectic pass.
Pass 0: cold start (general user query) or warm (session-scoped).
Pass 1: self-audit / targeted synthesis against gaps from pass 0.
Pass 2: reconciliation / contradiction check across prior passes.
"""
if pass_idx == 0:
if is_cold:
return (
"Who is this person? What are their preferences, goals, "
"and working style? Focus on facts that would help an AI "
"assistant be immediately useful."
)
return (
"Given what's been discussed in this session so far, what "
"context about this user is most relevant to the current "
"conversation? Prioritize active context over biographical facts."
)
elif pass_idx == 1:
prior = prior_results[-1] if prior_results else ""
return (
f"Given this initial assessment:\n\n{prior}\n\n"
"What gaps remain in your understanding that would help "
"going forward? Synthesize what you actually know about "
"the user's current state and immediate needs, grounded "
"in evidence from recent sessions."
)
else:
# pass 2: reconciliation
return (
f"Prior passes produced:\n\n"
f"Pass 1:\n{prior_results[0] if len(prior_results) > 0 else '(empty)'}\n\n"
f"Pass 2:\n{prior_results[1] if len(prior_results) > 1 else '(empty)'}\n\n"
"Do these assessments cohere? Reconcile any contradictions "
"and produce a final, concise synthesis of what matters most "
"for the current conversation."
)
@staticmethod
def _signal_sufficient(result: str) -> bool:
"""Check if a dialectic pass returned enough signal to skip further passes.
Heuristic: a response longer than 100 chars with some structure
(section headers, bullets, or an ordered list) is considered sufficient.
"""
if not result or len(result.strip()) < 100:
return False
# Structured output with sections/bullets is strong signal
if "\n" in result and (
"##" in result
or "" in result
or re.search(r"^[*-] ", result, re.MULTILINE)
or re.search(r"^\s*\d+\. ", result, re.MULTILINE)
):
return True
# Long enough even without structure
return len(result.strip()) > 300
def _run_dialectic_depth(self, query: str) -> str:
"""Execute up to dialecticDepth .chat() calls with conditional bail-out.
Cold start (no base context): general user-oriented query.
Warm session (base context exists): session-scoped query.
Each pass is conditional bails early if prior pass returned strong signal.
Returns the best (usually last) result.
"""
if not self._manager or not self._session_key:
return ""
is_cold = not self._base_context_cache
results: list[str] = []
for i in range(self._dialectic_depth):
if i == 0:
prompt = self._build_dialectic_prompt(0, results, is_cold)
else:
# Skip further passes if prior pass delivered strong signal
if results and self._signal_sufficient(results[-1]):
logger.debug("Honcho dialectic depth %d: pass %d skipped, prior signal sufficient",
self._dialectic_depth, i)
break
prompt = self._build_dialectic_prompt(i, results, is_cold)
level = self._resolve_pass_level(i)
logger.debug("Honcho dialectic depth %d: pass %d, level=%s, cold=%s",
self._dialectic_depth, i, level, is_cold)
result = self._manager.dialectic_query(
self._session_key, prompt,
reasoning_level=level,
peer="user",
)
results.append(result or "")
# Return the last non-empty result (deepest pass that ran)
for r in reversed(results):
if r and r.strip():
return r
return ""
# Also fire context prefetch if cadence allows
if self._context_cadence <= 1 or (self._turn_count - self._last_context_turn) >= self._context_cadence:
self._last_context_turn = self._turn_count
try:
self._manager.prefetch_context(self._session_key, query)
except Exception as e:
logger.debug("Honcho context prefetch failed: %s", e)
def on_turn_start(self, turn_number: int, message: str, **kwargs) -> None:
"""Track turn count for cadence and injection_frequency logic."""
@@ -949,14 +659,7 @@ class HonchoMemoryProvider(MemoryProvider):
try:
if tool_name == "honcho_profile":
peer = args.get("peer", "user")
card_update = args.get("card")
if card_update:
result = self._manager.set_peer_card(self._session_key, card_update, peer=peer)
if result is None:
return tool_error("Failed to update peer card.")
return json.dumps({"result": f"Peer card updated ({len(result)} facts).", "card": result})
card = self._manager.get_peer_card(self._session_key, peer=peer)
card = self._manager.get_peer_card(self._session_key)
if not card:
return json.dumps({"result": "No profile facts available yet."})
return json.dumps({"result": card})
@@ -966,64 +669,30 @@ class HonchoMemoryProvider(MemoryProvider):
if not query:
return tool_error("Missing required parameter: query")
max_tokens = min(int(args.get("max_tokens", 800)), 2000)
peer = args.get("peer", "user")
result = self._manager.search_context(
self._session_key, query, max_tokens=max_tokens, peer=peer
self._session_key, query, max_tokens=max_tokens
)
if not result:
return json.dumps({"result": "No relevant context found."})
return json.dumps({"result": result})
elif tool_name == "honcho_reasoning":
elif tool_name == "honcho_context":
query = args.get("query", "")
if not query:
return tool_error("Missing required parameter: query")
peer = args.get("peer", "user")
reasoning_level = args.get("reasoning_level")
result = self._manager.dialectic_query(
self._session_key, query,
reasoning_level=reasoning_level,
peer=peer,
self._session_key, query, peer=peer
)
# Update cadence tracker so auto-injection respects the gap after an explicit call
self._last_dialectic_turn = self._turn_count
return json.dumps({"result": result or "No result from Honcho."})
elif tool_name == "honcho_context":
peer = args.get("peer", "user")
ctx = self._manager.get_session_context(self._session_key, peer=peer)
if not ctx:
return json.dumps({"result": "No context available yet."})
parts = []
if ctx.get("summary"):
parts.append(f"## Summary\n{ctx['summary']}")
if ctx.get("representation"):
parts.append(f"## Representation\n{ctx['representation']}")
if ctx.get("card"):
parts.append(f"## Card\n{ctx['card']}")
if ctx.get("recent_messages"):
msgs = ctx["recent_messages"]
msg_str = "\n".join(
f" [{m['role']}] {m['content'][:200]}"
for m in msgs[-5:] # last 5 for brevity
)
parts.append(f"## Recent messages\n{msg_str}")
return json.dumps({"result": "\n\n".join(parts) or "No context available."})
elif tool_name == "honcho_conclude":
delete_id = args.get("delete_id")
peer = args.get("peer", "user")
if delete_id:
ok = self._manager.delete_conclusion(self._session_key, delete_id, peer=peer)
if ok:
return json.dumps({"result": f"Conclusion {delete_id} deleted."})
return tool_error(f"Failed to delete conclusion {delete_id}.")
conclusion = args.get("conclusion", "")
if not conclusion:
return tool_error("Missing required parameter: conclusion or delete_id")
ok = self._manager.create_conclusion(self._session_key, conclusion, peer=peer)
return tool_error("Missing required parameter: conclusion")
ok = self._manager.create_conclusion(self._session_key, conclusion)
if ok:
return json.dumps({"result": f"Conclusion saved for {peer}: {conclusion}"})
return json.dumps({"result": f"Conclusion saved: {conclusion}"})
return tool_error("Failed to save conclusion.")
return tool_error(f"Unknown tool: {tool_name}")
+16 -109
View File
@@ -440,43 +440,11 @@ def cmd_setup(args) -> None:
if new_recall in ("hybrid", "context", "tools"):
hermes_host["recallMode"] = new_recall
# --- 7. Context token budget ---
current_ctx_tokens = hermes_host.get("contextTokens") or cfg.get("contextTokens")
current_display = str(current_ctx_tokens) if current_ctx_tokens else "uncapped"
print("\n Context injection per turn (hybrid/context recall modes only):")
print(" uncapped -- no limit (default)")
print(" N -- token limit per turn (e.g. 1200)")
new_ctx_tokens = _prompt("Context tokens", default=current_display)
if new_ctx_tokens.strip().lower() in ("none", "uncapped", "no limit"):
hermes_host.pop("contextTokens", None)
elif new_ctx_tokens.strip() == "":
pass # keep current
else:
try:
val = int(new_ctx_tokens)
if val >= 0:
hermes_host["contextTokens"] = val
except (ValueError, TypeError):
pass # keep current
# --- 7b. Dialectic cadence ---
current_dialectic = str(hermes_host.get("dialecticCadence") or cfg.get("dialecticCadence") or "3")
print("\n Dialectic cadence:")
print(" How often Honcho rebuilds its user model (LLM call on Honcho backend).")
print(" 1 = every turn (aggressive), 3 = every 3 turns (recommended), 5+ = sparse.")
new_dialectic = _prompt("Dialectic cadence", default=current_dialectic)
try:
val = int(new_dialectic)
if val >= 1:
hermes_host["dialecticCadence"] = val
except (ValueError, TypeError):
hermes_host["dialecticCadence"] = 3
# --- 8. Session strategy ---
current_strat = hermes_host.get("sessionStrategy") or cfg.get("sessionStrategy", "per-session")
# --- 7. Session strategy ---
current_strat = hermes_host.get("sessionStrategy") or cfg.get("sessionStrategy", "per-directory")
print("\n Session strategy:")
print(" per-session -- each run starts clean, Honcho injects context automatically")
print(" per-directory -- reuses session per dir, prior context auto-injected each run")
print(" per-directory -- one session per working directory (default)")
print(" per-session -- new Honcho session each run")
print(" per-repo -- one session per git repository")
print(" global -- single session across all directories")
new_strat = _prompt("Session strategy", default=current_strat)
@@ -522,11 +490,10 @@ def cmd_setup(args) -> None:
print(f" Recall: {hcfg.recall_mode}")
print(f" Sessions: {hcfg.session_strategy}")
print("\n Honcho tools available in chat:")
print(" honcho_context -- session context: summary, representation, card, messages")
print(" honcho_search -- semantic search over history")
print(" honcho_profile -- peer card, key facts")
print(" honcho_reasoning -- ask Honcho a question, synthesized answer")
print(" honcho_conclude -- persist a user fact to memory")
print(" honcho_context -- ask Honcho about the user (LLM-synthesized)")
print(" honcho_search -- semantic search over history (no LLM)")
print(" honcho_profile -- peer card, key facts (no LLM)")
print(" honcho_conclude -- persist a user fact to memory (no LLM)")
print("\n Other commands:")
print(" hermes honcho status -- show full config")
print(" hermes honcho mode -- change recall/observation mode")
@@ -618,26 +585,13 @@ def cmd_status(args) -> None:
print(f" Enabled: {hcfg.enabled}")
print(f" API key: {masked}")
print(f" Workspace: {hcfg.workspace_id}")
# Config paths — show where config was read from and where writes go
global_path = Path.home() / ".honcho" / "config.json"
print(f" Config: {active_path}")
print(f" Config path: {active_path}")
if write_path != active_path:
print(f" Write to: {write_path} (profile-local)")
if active_path == global_path:
print(f" Fallback: (none — using global ~/.honcho/config.json)")
elif global_path.exists():
print(f" Fallback: {global_path} (exists, cross-app interop)")
print(f" Write path: {write_path} (instance-local)")
print(f" AI peer: {hcfg.ai_peer}")
print(f" User peer: {hcfg.peer_name or 'not set'}")
print(f" Session key: {hcfg.resolve_session_name()}")
print(f" Session strat: {hcfg.session_strategy}")
print(f" Recall mode: {hcfg.recall_mode}")
print(f" Context budget: {hcfg.context_tokens or '(uncapped)'} tokens")
raw = getattr(hcfg, "raw", None) or {}
dialectic_cadence = raw.get("dialecticCadence") or 3
print(f" Dialectic cad: every {dialectic_cadence} turn{'s' if dialectic_cadence != 1 else ''}")
print(f" Observation: user(me={hcfg.user_observe_me},others={hcfg.user_observe_others}) ai(me={hcfg.ai_observe_me},others={hcfg.ai_observe_others})")
print(f" Write freq: {hcfg.write_frequency}")
@@ -645,8 +599,8 @@ def cmd_status(args) -> None:
print("\n Connection... ", end="", flush=True)
try:
client = get_honcho_client(hcfg)
_show_peer_cards(hcfg, client)
print("OK")
_show_peer_cards(hcfg, client)
except Exception as e:
print(f"FAILED ({e})\n")
else:
@@ -870,41 +824,6 @@ def cmd_mode(args) -> None:
print(f" {label}Recall mode -> {mode_arg} ({MODES[mode_arg]})\n")
def cmd_strategy(args) -> None:
"""Show or set the session strategy."""
STRATEGIES = {
"per-session": "each run starts clean, Honcho injects context automatically",
"per-directory": "reuses session per dir, prior context auto-injected each run",
"per-repo": "one session per git repository",
"global": "single session across all directories",
}
cfg = _read_config()
strat_arg = getattr(args, "strategy", None)
if strat_arg is None:
current = (
(cfg.get("hosts") or {}).get(_host_key(), {}).get("sessionStrategy")
or cfg.get("sessionStrategy")
or "per-session"
)
print("\nHoncho session strategy\n" + "" * 40)
for s, desc in STRATEGIES.items():
marker = " <-" if s == current else ""
print(f" {s:<15} {desc}{marker}")
print(f"\n Set with: hermes honcho strategy [per-session|per-directory|per-repo|global]\n")
return
if strat_arg not in STRATEGIES:
print(f" Invalid strategy '{strat_arg}'. Options: {', '.join(STRATEGIES)}\n")
return
host = _host_key()
label = f"[{host}] " if host != "hermes" else ""
cfg.setdefault("hosts", {}).setdefault(host, {})["sessionStrategy"] = strat_arg
_write_config(cfg)
print(f" {label}Session strategy -> {strat_arg} ({STRATEGIES[strat_arg]})\n")
def cmd_tokens(args) -> None:
"""Show or set token budget settings."""
cfg = _read_config()
@@ -1224,11 +1143,10 @@ def cmd_migrate(args) -> None:
print(" automatically. Files become the seed, not the live store.")
print()
print(" Honcho tools (available to the agent during conversation)")
print(" honcho_context — session context: summary, representation, card, messages")
print(" honcho_search — semantic search over stored context")
print(" honcho_profile — fast peer card snapshot")
print(" honcho_reasoning — ask Honcho a question, synthesized answer")
print(" honcho_conclude — write a conclusion/fact back to memory")
print(" honcho_context — ask Honcho a question, get a synthesized answer (LLM)")
print(" honcho_search — semantic search over stored context (no LLM)")
print(" honcho_profile — fast peer card snapshot (no LLM)")
print(" honcho_conclude — write a conclusion/fact back to memory (no LLM)")
print()
print(" Session naming")
print(" OpenClaw: no persistent session concept — files are global.")
@@ -1279,8 +1197,6 @@ def honcho_command(args) -> None:
cmd_peer(args)
elif sub == "mode":
cmd_mode(args)
elif sub == "strategy":
cmd_strategy(args)
elif sub == "tokens":
cmd_tokens(args)
elif sub == "identity":
@@ -1295,7 +1211,7 @@ def honcho_command(args) -> None:
cmd_sync(args)
else:
print(f" Unknown honcho command: {sub}")
print(" Available: status, sessions, map, peer, mode, strategy, tokens, identity, migrate, enable, disable, sync\n")
print(" Available: status, sessions, map, peer, mode, tokens, identity, migrate, enable, disable, sync\n")
def register_cli(subparser) -> None:
@@ -1354,15 +1270,6 @@ def register_cli(subparser) -> None:
help="Recall mode to set (hybrid/context/tools). Omit to show current.",
)
strategy_parser = subs.add_parser(
"strategy", help="Show or set session strategy (per-session/per-directory/per-repo/global)",
)
strategy_parser.add_argument(
"strategy", nargs="?", metavar="STRATEGY",
choices=("per-session", "per-directory", "per-repo", "global"),
help="Session strategy to set. Omit to show current.",
)
tokens_parser = subs.add_parser(
"tokens", help="Show or set token budget for context and dialectic",
)
+16 -127
View File
@@ -94,68 +94,6 @@ def _resolve_bool(host_val, root_val, *, default: bool) -> bool:
return default
def _parse_context_tokens(host_val, root_val) -> int | None:
"""Parse contextTokens: host wins, then root, then None (uncapped)."""
for val in (host_val, root_val):
if val is not None:
try:
return int(val)
except (ValueError, TypeError):
pass
return None
def _parse_dialectic_depth(host_val, root_val) -> int:
"""Parse dialecticDepth: host wins, then root, then 1. Clamped to 1-3."""
for val in (host_val, root_val):
if val is not None:
try:
return max(1, min(int(val), 3))
except (ValueError, TypeError):
pass
return 1
_VALID_REASONING_LEVELS = ("minimal", "low", "medium", "high", "max")
def _parse_dialectic_depth_levels(host_val, root_val, depth: int) -> list[str] | None:
"""Parse dialecticDepthLevels: optional array of reasoning levels per pass.
Returns None when not configured (use proportional defaults).
When configured, validates each level and truncates/pads to match depth.
"""
for val in (host_val, root_val):
if val is not None and isinstance(val, list):
levels = [
lvl if lvl in _VALID_REASONING_LEVELS else "low"
for lvl in val[:depth]
]
# Pad with "low" if array is shorter than depth
while len(levels) < depth:
levels.append("low")
return levels
return None
def _resolve_optional_float(*values: Any) -> float | None:
"""Return the first non-empty value coerced to a positive float."""
for value in values:
if value is None:
continue
if isinstance(value, str):
value = value.strip()
if not value:
continue
try:
parsed = float(value)
except (TypeError, ValueError):
continue
if parsed > 0:
return parsed
return None
_VALID_OBSERVATION_MODES = {"unified", "directional"}
_OBSERVATION_MODE_ALIASES = {"shared": "unified", "separate": "directional", "cross": "directional"}
@@ -221,8 +159,6 @@ class HonchoClientConfig:
environment: str = "production"
# Optional base URL for self-hosted Honcho (overrides environment mapping)
base_url: str | None = None
# Optional request timeout in seconds for Honcho SDK HTTP calls
timeout: float | None = None
# Identity
peer_name: str | None = None
ai_peer: str = "hermes"
@@ -232,25 +168,17 @@ class HonchoClientConfig:
# Write frequency: "async" (background thread), "turn" (sync per turn),
# "session" (flush on session end), or int (every N turns)
write_frequency: str | int = "async"
# Prefetch budget (None = no cap; set to an integer to bound auto-injected context)
# Prefetch budget
context_tokens: int | None = None
# Dialectic (peer.chat) settings
# reasoning_level: "minimal" | "low" | "medium" | "high" | "max"
dialectic_reasoning_level: str = "low"
# When true, the model can override reasoning_level per-call via the
# honcho_reasoning tool param (agentic). When false, always uses
# dialecticReasoningLevel and ignores model-provided overrides.
# dynamic: auto-bump reasoning level based on query length
# true — low->medium (120+ chars), low->high (400+ chars), capped at "high"
# false — always use dialecticReasoningLevel as-is
dialectic_dynamic: bool = True
# Max chars of dialectic result to inject into Hermes system prompt
dialectic_max_chars: int = 600
# Dialectic depth: how many .chat() calls per dialectic cycle (1-3).
# Depth 1: single call. Depth 2: self-audit + targeted synthesis.
# Depth 3: self-audit + synthesis + reconciliation.
dialectic_depth: int = 1
# Optional per-pass reasoning level override. Array of reasoning levels
# matching dialectic_depth length. When None, uses proportional defaults
# derived from dialectic_reasoning_level.
dialectic_depth_levels: list[str] | None = None
# Honcho API limits — configurable for self-hosted instances
# Max chars per message sent via add_messages() (Honcho cloud: 25000)
message_max_chars: int = 25000
@@ -261,8 +189,10 @@ class HonchoClientConfig:
# "context" — auto-injected context only, Honcho tools removed
# "tools" — Honcho tools only, no auto-injected context
recall_mode: str = "hybrid"
# Eager init in tools mode — when true, initializes session during
# initialize() instead of deferring to first tool call
# When True and recallMode is "tools", create the Honcho session eagerly
# during initialize() instead of deferring to the first tool call.
# This ensures sync_turn() can write from the very first turn.
# Does NOT enable automatic context injection — only changes init timing.
init_on_session_start: bool = False
# Observation mode: legacy string shorthand ("directional" or "unified").
# Kept for backward compat; granular per-peer booleans below are preferred.
@@ -294,14 +224,12 @@ class HonchoClientConfig:
resolved_host = host or resolve_active_host()
api_key = os.environ.get("HONCHO_API_KEY")
base_url = os.environ.get("HONCHO_BASE_URL", "").strip() or None
timeout = _resolve_optional_float(os.environ.get("HONCHO_TIMEOUT"))
return cls(
host=resolved_host,
workspace_id=workspace_id,
api_key=api_key,
environment=os.environ.get("HONCHO_ENVIRONMENT", "production"),
base_url=base_url,
timeout=timeout,
ai_peer=resolved_host,
enabled=bool(api_key or base_url),
)
@@ -362,11 +290,6 @@ class HonchoClientConfig:
or os.environ.get("HONCHO_BASE_URL", "").strip()
or None
)
timeout = _resolve_optional_float(
raw.get("timeout"),
raw.get("requestTimeout"),
os.environ.get("HONCHO_TIMEOUT"),
)
# Auto-enable when API key or base_url is present (unless explicitly disabled)
# Host-level enabled wins, then root-level, then auto-enable if key/url exists.
@@ -412,16 +335,12 @@ class HonchoClientConfig:
api_key=api_key,
environment=environment,
base_url=base_url,
timeout=timeout,
peer_name=host_block.get("peerName") or raw.get("peerName"),
ai_peer=ai_peer,
enabled=enabled,
save_messages=save_messages,
write_frequency=write_frequency,
context_tokens=_parse_context_tokens(
host_block.get("contextTokens"),
raw.get("contextTokens"),
),
context_tokens=host_block.get("contextTokens") or raw.get("contextTokens"),
dialectic_reasoning_level=(
host_block.get("dialecticReasoningLevel")
or raw.get("dialecticReasoningLevel")
@@ -437,15 +356,6 @@ class HonchoClientConfig:
or raw.get("dialecticMaxChars")
or 600
),
dialectic_depth=_parse_dialectic_depth(
host_block.get("dialecticDepth"),
raw.get("dialecticDepth"),
),
dialectic_depth_levels=_parse_dialectic_depth_levels(
host_block.get("dialecticDepthLevels"),
raw.get("dialecticDepthLevels"),
depth=_parse_dialectic_depth(host_block.get("dialecticDepth"), raw.get("dialecticDepth")),
),
message_max_chars=int(
host_block.get("messageMaxChars")
or raw.get("messageMaxChars")
@@ -512,18 +422,16 @@ class HonchoClientConfig:
cwd: str | None = None,
session_title: str | None = None,
session_id: str | None = None,
gateway_session_key: str | None = None,
) -> str | None:
"""Resolve Honcho session name.
Resolution order:
1. Manual directory override from sessions map
2. Hermes session title (from /title command)
3. Gateway session key (stable per-chat identifier from gateway platforms)
4. per-session strategy Hermes session_id ({timestamp}_{hex})
5. per-repo strategy git repo root directory name
6. per-directory strategy directory basename
7. global strategy workspace name
3. per-session strategy Hermes session_id ({timestamp}_{hex})
4. per-repo strategy git repo root directory name
5. per-directory strategy directory basename
6. global strategy workspace name
"""
import re
@@ -537,22 +445,12 @@ class HonchoClientConfig:
# /title mid-session remap
if session_title:
sanitized = re.sub(r'[^a-zA-Z0-9_-]+', '-', session_title).strip('-')
sanitized = re.sub(r'[^a-zA-Z0-9_-]', '-', session_title).strip('-')
if sanitized:
if self.session_peer_prefix and self.peer_name:
return f"{self.peer_name}-{sanitized}"
return sanitized
# Gateway session key: stable per-chat identifier passed by the gateway
# (e.g. "agent:main:telegram:dm:8439114563"). Sanitize colons to hyphens
# for Honcho session ID compatibility. This takes priority over strategy-
# based resolution because gateway platforms need per-chat isolation that
# cwd-based strategies cannot provide.
if gateway_session_key:
sanitized = re.sub(r'[^a-zA-Z0-9_-]+', '-', gateway_session_key).strip('-')
if sanitized:
return sanitized
# per-session: inherit Hermes session_id (new Honcho session each run)
if self.session_strategy == "per-session" and session_id:
if self.session_peer_prefix and self.peer_name:
@@ -614,20 +512,13 @@ def get_honcho_client(config: HonchoClientConfig | None = None) -> Honcho:
# mapping, enabling remote self-hosted Honcho deployments without
# requiring the server to live on localhost.
resolved_base_url = config.base_url
resolved_timeout = config.timeout
if not resolved_base_url or resolved_timeout is None:
if not resolved_base_url:
try:
from hermes_cli.config import load_config
hermes_cfg = load_config()
honcho_cfg = hermes_cfg.get("honcho", {})
if isinstance(honcho_cfg, dict):
if not resolved_base_url:
resolved_base_url = honcho_cfg.get("base_url", "").strip() or None
if resolved_timeout is None:
resolved_timeout = _resolve_optional_float(
honcho_cfg.get("timeout"),
honcho_cfg.get("request_timeout"),
)
resolved_base_url = honcho_cfg.get("base_url", "").strip() or None
except Exception:
pass
@@ -662,8 +553,6 @@ def get_honcho_client(config: HonchoClientConfig | None = None) -> Honcho:
}
if resolved_base_url:
kwargs["base_url"] = resolved_base_url
if resolved_timeout is not None:
kwargs["timeout"] = resolved_timeout
_honcho_client = Honcho(**kwargs)
+75 -247
View File
@@ -486,9 +486,36 @@ class HonchoSessionManager:
_REASONING_LEVELS = ("minimal", "low", "medium", "high", "max")
def _default_reasoning_level(self) -> str:
"""Return the configured default reasoning level."""
return self._dialectic_reasoning_level
def _dynamic_reasoning_level(self, query: str) -> str:
"""
Pick a reasoning level for a dialectic query.
When dialecticDynamic is true (default), auto-bumps based on query
length so Honcho applies more inference where it matters:
< 120 chars -> configured default (typically "low")
120-400 chars -> +1 level above default (cap at "high")
> 400 chars -> +2 levels above default (cap at "high")
"max" is never selected automatically -- reserve it for explicit config.
When dialecticDynamic is false, always returns the configured level.
"""
if not self._dialectic_dynamic:
return self._dialectic_reasoning_level
levels = self._REASONING_LEVELS
default_idx = levels.index(self._dialectic_reasoning_level) if self._dialectic_reasoning_level in levels else 1
n = len(query)
if n < 120:
bump = 0
elif n < 400:
bump = 1
else:
bump = 2
# Cap at "high" (index 3) for auto-selection
idx = min(default_idx + bump, 3)
return levels[idx]
def dialectic_query(
self, session_key: str, query: str,
@@ -505,9 +532,8 @@ class HonchoSessionManager:
Args:
session_key: The session key to query against.
query: Natural language question.
reasoning_level: Override the configured default (dialecticReasoningLevel).
Only honored when dialecticDynamic is true.
If None or dialecticDynamic is false, uses the configured default.
reasoning_level: Override the config default. If None, uses
_dynamic_reasoning_level(query).
peer: Which peer to query "user" (default) or "ai".
Returns:
@@ -517,34 +543,29 @@ class HonchoSessionManager:
if not session:
return ""
target_peer_id = self._resolve_peer_id(session, peer)
if target_peer_id is None:
return ""
# Guard: truncate query to Honcho's dialectic input limit
if len(query) > self._dialectic_max_input_chars:
query = query[:self._dialectic_max_input_chars].rsplit(" ", 1)[0]
if self._dialectic_dynamic and reasoning_level:
level = reasoning_level
else:
level = self._default_reasoning_level()
level = reasoning_level or self._dynamic_reasoning_level(query)
try:
if self._ai_observe_others:
# AI peer can observe other peers — use assistant as observer.
ai_peer_obj = self._get_or_create_peer(session.assistant_peer_id)
if target_peer_id == session.assistant_peer_id:
# AI peer can observe user — use cross-observation routing
if peer == "ai":
ai_peer_obj = self._get_or_create_peer(session.assistant_peer_id)
result = ai_peer_obj.chat(query, reasoning_level=level) or ""
else:
ai_peer_obj = self._get_or_create_peer(session.assistant_peer_id)
result = ai_peer_obj.chat(
query,
target=target_peer_id,
target=session.user_peer_id,
reasoning_level=level,
) or ""
else:
# Without cross-observation, each peer queries its own context.
target_peer = self._get_or_create_peer(target_peer_id)
# AI can't observe others — each peer queries self
peer_id = session.assistant_peer_id if peer == "ai" else session.user_peer_id
target_peer = self._get_or_create_peer(peer_id)
result = target_peer.chat(query, reasoning_level=level) or ""
# Apply Hermes-side char cap before caching
@@ -626,11 +647,10 @@ class HonchoSessionManager:
"""
Pre-fetch user and AI peer context from Honcho.
Fetches peer_representation and peer_card for both peers, plus the
session summary when available. search_query is intentionally omitted
it would only affect additional excerpts that this code does not
consume, and passing the raw message exposes conversation content in
server access logs.
Fetches peer_representation and peer_card for both peers. search_query
is intentionally omitted it would only affect additional excerpts
that this code does not consume, and passing the raw message exposes
conversation content in server access logs.
Args:
session_key: The session key to get context for.
@@ -638,29 +658,15 @@ class HonchoSessionManager:
Returns:
Dictionary with 'representation', 'card', 'ai_representation',
'ai_card', and optionally 'summary' keys.
and 'ai_card' keys.
"""
session = self._cache.get(session_key)
if not session:
return {}
result: dict[str, str] = {}
# Session summary — provides session-scoped context.
# Fresh sessions (per-session cold start, or first-ever per-directory)
# return null summary — the guard below handles that gracefully.
# Per-directory returning sessions get their accumulated summary.
try:
honcho_session = self._sessions_cache.get(session.honcho_session_id)
if honcho_session:
ctx = honcho_session.context(summary=True)
if ctx.summary and getattr(ctx.summary, "content", None):
result["summary"] = ctx.summary.content
except Exception as e:
logger.debug("Failed to fetch session summary from Honcho: %s", e)
try:
user_ctx = self._fetch_peer_context(session.user_peer_id, target=session.user_peer_id)
user_ctx = self._fetch_peer_context(session.user_peer_id)
result["representation"] = user_ctx["representation"]
result["card"] = "\n".join(user_ctx["card"])
except Exception as e:
@@ -668,7 +674,7 @@ class HonchoSessionManager:
# Also fetch AI peer's own representation so Hermes knows itself.
try:
ai_ctx = self._fetch_peer_context(session.assistant_peer_id, target=session.assistant_peer_id)
ai_ctx = self._fetch_peer_context(session.assistant_peer_id)
result["ai_representation"] = ai_ctx["representation"]
result["ai_card"] = "\n".join(ai_ctx["card"])
except Exception as e:
@@ -856,7 +862,7 @@ class HonchoSessionManager:
return [str(item) for item in card if item]
return [str(card)]
def _fetch_peer_card(self, peer_id: str, *, target: str | None = None) -> list[str]:
def _fetch_peer_card(self, peer_id: str) -> list[str]:
"""Fetch a peer card directly from the peer object.
This avoids relying on session.context(), which can return an empty
@@ -866,33 +872,22 @@ class HonchoSessionManager:
peer = self._get_or_create_peer(peer_id)
getter = getattr(peer, "get_card", None)
if callable(getter):
return self._normalize_card(getter(target=target) if target is not None else getter())
return self._normalize_card(getter())
legacy_getter = getattr(peer, "card", None)
if callable(legacy_getter):
return self._normalize_card(legacy_getter(target=target) if target is not None else legacy_getter())
return self._normalize_card(legacy_getter())
return []
def _fetch_peer_context(
self,
peer_id: str,
search_query: str | None = None,
*,
target: str | None = None,
) -> dict[str, Any]:
def _fetch_peer_context(self, peer_id: str, search_query: str | None = None) -> dict[str, Any]:
"""Fetch representation + peer card directly from a peer object."""
peer = self._get_or_create_peer(peer_id)
representation = ""
card: list[str] = []
try:
context_kwargs: dict[str, Any] = {}
if target is not None:
context_kwargs["target"] = target
if search_query is not None:
context_kwargs["search_query"] = search_query
ctx = peer.context(**context_kwargs) if context_kwargs else peer.context()
ctx = peer.context(search_query=search_query) if search_query else peer.context()
representation = (
getattr(ctx, "representation", None)
or getattr(ctx, "peer_representation", None)
@@ -904,111 +899,24 @@ class HonchoSessionManager:
if not representation:
try:
representation = (
peer.representation(target=target) if target is not None else peer.representation()
) or ""
representation = peer.representation() or ""
except Exception as e:
logger.debug("Direct peer.representation() failed for '%s': %s", peer_id, e)
if not card:
try:
card = self._fetch_peer_card(peer_id, target=target)
card = self._fetch_peer_card(peer_id)
except Exception as e:
logger.debug("Direct peer card fetch failed for '%s': %s", peer_id, e)
return {"representation": representation, "card": card}
def get_session_context(self, session_key: str, peer: str = "user") -> dict[str, Any]:
"""Fetch full session context from Honcho including summary.
Uses the session-level context() API which returns summary,
peer_representation, peer_card, and messages.
def get_peer_card(self, session_key: str) -> list[str]:
"""
session = self._cache.get(session_key)
if not session:
return {}
honcho_session = self._sessions_cache.get(session.honcho_session_id)
if not honcho_session:
# Fall back to peer-level context, respecting the requested peer
peer_id = self._resolve_peer_id(session, peer)
if peer_id is None:
peer_id = session.user_peer_id
return self._fetch_peer_context(peer_id, target=peer_id)
try:
peer_id = self._resolve_peer_id(session, peer)
ctx = honcho_session.context(
summary=True,
peer_target=peer_id,
peer_perspective=session.user_peer_id if peer == "user" else session.assistant_peer_id,
)
result: dict[str, Any] = {}
# Summary
if ctx.summary:
result["summary"] = ctx.summary.content
# Peer representation and card
if ctx.peer_representation:
result["representation"] = ctx.peer_representation
if ctx.peer_card:
result["card"] = "\n".join(ctx.peer_card)
# Messages (last N for context)
if ctx.messages:
recent = ctx.messages[-10:] # last 10 messages
result["recent_messages"] = [
{"role": getattr(m, "peer_id", "unknown"), "content": (m.content or "")[:500]}
for m in recent
]
return result
except Exception as e:
logger.debug("Session context fetch failed: %s", e)
return {}
def _resolve_peer_id(self, session: HonchoSession, peer: str | None) -> str:
"""Resolve a peer alias or explicit peer ID to a concrete Honcho peer ID.
Always returns a non-empty string: either a known peer ID or a
sanitized version of the caller-supplied alias/ID.
"""
candidate = (peer or "user").strip()
if not candidate:
return session.user_peer_id
normalized = self._sanitize_id(candidate)
if normalized == self._sanitize_id("user"):
return session.user_peer_id
if normalized == self._sanitize_id("ai"):
return session.assistant_peer_id
return normalized
def _resolve_observer_target(
self,
session: HonchoSession,
peer: str | None,
) -> tuple[str, str | None]:
"""Resolve observer and target peer IDs for context/search/profile queries."""
target_peer_id = self._resolve_peer_id(session, peer)
if target_peer_id == session.assistant_peer_id:
return session.assistant_peer_id, session.assistant_peer_id
if self._ai_observe_others:
return session.assistant_peer_id, target_peer_id
return target_peer_id, None
def get_peer_card(self, session_key: str, peer: str = "user") -> list[str]:
"""
Fetch a peer card a curated list of key facts.
Fetch the user peer's card — a curated list of key facts.
Fast, no LLM reasoning. Returns raw structured facts Honcho has
inferred about the target peer (name, role, preferences, patterns).
inferred about the user (name, role, preferences, patterns).
Empty list if unavailable.
"""
session = self._cache.get(session_key)
@@ -1016,19 +924,12 @@ class HonchoSessionManager:
return []
try:
observer_peer_id, target_peer_id = self._resolve_observer_target(session, peer)
return self._fetch_peer_card(observer_peer_id, target=target_peer_id)
return self._fetch_peer_card(session.user_peer_id)
except Exception as e:
logger.debug("Failed to fetch peer card from Honcho: %s", e)
return []
def search_context(
self,
session_key: str,
query: str,
max_tokens: int = 800,
peer: str = "user",
) -> str:
def search_context(self, session_key: str, query: str, max_tokens: int = 800) -> str:
"""
Semantic search over Honcho session context.
@@ -1040,7 +941,6 @@ class HonchoSessionManager:
session_key: Session to search against.
query: Search query for semantic matching.
max_tokens: Token budget for returned content.
peer: Peer alias or explicit peer ID to search about.
Returns:
Relevant context excerpts as a string, or empty string if none.
@@ -1050,13 +950,7 @@ class HonchoSessionManager:
return ""
try:
observer_peer_id, target = self._resolve_observer_target(session, peer)
ctx = self._fetch_peer_context(
observer_peer_id,
search_query=query,
target=target,
)
ctx = self._fetch_peer_context(session.user_peer_id, search_query=query)
parts = []
if ctx["representation"]:
parts.append(ctx["representation"])
@@ -1068,17 +962,16 @@ class HonchoSessionManager:
logger.debug("Honcho search_context failed: %s", e)
return ""
def create_conclusion(self, session_key: str, content: str, peer: str = "user") -> bool:
"""Write a conclusion about a target peer back to Honcho.
def create_conclusion(self, session_key: str, content: str) -> bool:
"""Write a conclusion about the user back to Honcho.
Conclusions are facts a peer observes about another peer or itself
preferences, corrections, clarifications, and project context.
They feed into the target peer's card and representation.
Conclusions are facts the AI peer observes about the user
preferences, corrections, clarifications, project context.
They feed into the user's peer card and representation.
Args:
session_key: Session to associate the conclusion with.
content: The conclusion text.
peer: Peer alias or explicit peer ID. "user" is the default alias.
content: The conclusion text (e.g. "User prefers dark mode").
Returns:
True on success, False on failure.
@@ -1092,90 +985,25 @@ class HonchoSessionManager:
return False
try:
target_peer_id = self._resolve_peer_id(session, peer)
if target_peer_id is None:
logger.warning("Could not resolve conclusion peer '%s' for session '%s'", peer, session_key)
return False
if target_peer_id == session.assistant_peer_id:
if self._ai_observe_others:
# AI peer creates conclusion about user (cross-observation)
assistant_peer = self._get_or_create_peer(session.assistant_peer_id)
conclusions_scope = assistant_peer.conclusions_of(session.assistant_peer_id)
elif self._ai_observe_others:
assistant_peer = self._get_or_create_peer(session.assistant_peer_id)
conclusions_scope = assistant_peer.conclusions_of(target_peer_id)
conclusions_scope = assistant_peer.conclusions_of(session.user_peer_id)
else:
target_peer = self._get_or_create_peer(target_peer_id)
conclusions_scope = target_peer.conclusions_of(target_peer_id)
# AI can't observe others — user peer creates self-conclusion
user_peer = self._get_or_create_peer(session.user_peer_id)
conclusions_scope = user_peer.conclusions_of(session.user_peer_id)
conclusions_scope.create([{
"content": content.strip(),
"session_id": session.honcho_session_id,
}])
logger.info("Created conclusion about %s for %s: %s", target_peer_id, session_key, content[:80])
logger.info("Created conclusion for %s: %s", session_key, content[:80])
return True
except Exception as e:
logger.error("Failed to create conclusion: %s", e)
return False
def delete_conclusion(self, session_key: str, conclusion_id: str, peer: str = "user") -> bool:
"""Delete a conclusion by ID. Use only for PII removal.
Args:
session_key: Session key for peer resolution.
conclusion_id: The conclusion ID to delete.
peer: Peer alias or explicit peer ID.
Returns:
True on success, False on failure.
"""
session = self._cache.get(session_key)
if not session:
return False
try:
target_peer_id = self._resolve_peer_id(session, peer)
if target_peer_id == session.assistant_peer_id:
observer = self._get_or_create_peer(session.assistant_peer_id)
scope = observer.conclusions_of(session.assistant_peer_id)
elif self._ai_observe_others:
observer = self._get_or_create_peer(session.assistant_peer_id)
scope = observer.conclusions_of(target_peer_id)
else:
target_peer = self._get_or_create_peer(target_peer_id)
scope = target_peer.conclusions_of(target_peer_id)
scope.delete(conclusion_id)
logger.info("Deleted conclusion %s for %s", conclusion_id, session_key)
return True
except Exception as e:
logger.error("Failed to delete conclusion %s: %s", conclusion_id, e)
return False
def set_peer_card(self, session_key: str, card: list[str], peer: str = "user") -> list[str] | None:
"""Update a peer's card.
Args:
session_key: Session key for peer resolution.
card: New peer card as list of fact strings.
peer: Peer alias or explicit peer ID.
Returns:
Updated card on success, None on failure.
"""
session = self._cache.get(session_key)
if not session:
return None
try:
peer_id = self._resolve_peer_id(session, peer)
if peer_id is None:
logger.warning("Could not resolve peer '%s' for set_peer_card in session '%s'", peer, session_key)
return None
peer_obj = self._get_or_create_peer(peer_id)
result = peer_obj.set_card(card)
logger.info("Updated peer card for %s (%d facts)", peer_id, len(card))
return result
except Exception as e:
logger.error("Failed to set peer card: %s", e)
return None
def seed_ai_identity(self, session_key: str, content: str, source: str = "manual") -> bool:
"""
Seed the AI peer's Honcho representation from text content.
@@ -1233,7 +1061,7 @@ class HonchoSessionManager:
return {"representation": "", "card": ""}
try:
ctx = self._fetch_peer_context(session.assistant_peer_id, target=session.assistant_peer_id)
ctx = self._fetch_peer_context(session.assistant_peer_id)
return {
"representation": ctx["representation"] or "",
"card": "\n".join(ctx["card"]),
+9 -46
View File
@@ -10,9 +10,8 @@ lifecycle instead of read-only search endpoints.
Config via environment variables (profile-scoped via each profile's .env):
OPENVIKING_ENDPOINT Server URL (default: http://127.0.0.1:1933)
OPENVIKING_API_KEY API key (required for authenticated servers)
OPENVIKING_ACCOUNT Tenant account (default: default)
OPENVIKING_ACCOUNT Tenant account (default: root)
OPENVIKING_USER Tenant user (default: default)
OPENVIKING_AGENT Tenant agent (default: hermes)
Capabilities:
- Automatic memory extraction on session commit (6 categories)
@@ -81,12 +80,11 @@ class _VikingClient:
"""Thin HTTP client for the OpenViking REST API."""
def __init__(self, endpoint: str, api_key: str = "",
account: str = "", user: str = "", agent: str = ""):
account: str = "", user: str = ""):
self._endpoint = endpoint.rstrip("/")
self._api_key = api_key
self._account = account or os.environ.get("OPENVIKING_ACCOUNT", "default")
self._account = account or os.environ.get("OPENVIKING_ACCOUNT", "root")
self._user = user or os.environ.get("OPENVIKING_USER", "default")
self._agent = agent or os.environ.get("OPENVIKING_AGENT", "hermes")
self._httpx = _get_httpx()
if self._httpx is None:
raise ImportError("httpx is required for OpenViking: pip install httpx")
@@ -96,7 +94,6 @@ class _VikingClient:
"Content-Type": "application/json",
"X-OpenViking-Account": self._account,
"X-OpenViking-User": self._user,
"X-OpenViking-Agent": self._agent,
}
if self._api_key:
h["X-API-Key"] = self._api_key
@@ -285,44 +282,20 @@ class OpenVikingMemoryProvider(MemoryProvider):
},
{
"key": "api_key",
"description": "OpenViking API key (leave blank for local dev mode)",
"description": "OpenViking API key",
"secret": True,
"env_var": "OPENVIKING_API_KEY",
},
{
"key": "account",
"description": "OpenViking tenant account ID ([default], used when local mode, OPENVIKING_API_KEY is empty)",
"default": "default",
"env_var": "OPENVIKING_ACCOUNT",
},
{
"key": "user",
"description": "OpenViking user ID within the account ([default], used when local mode, OPENVIKING_API_KEY is empty)",
"default": "default",
"env_var": "OPENVIKING_USER",
},
{
"key": "agent",
"description": "OpenViking agent ID within the account ([hermes], useful in multi-agent mode)",
"default": "hermes",
"env_var": "OPENVIKING_AGENT",
},
]
def initialize(self, session_id: str, **kwargs) -> None:
self._endpoint = os.environ.get("OPENVIKING_ENDPOINT", _DEFAULT_ENDPOINT)
self._api_key = os.environ.get("OPENVIKING_API_KEY", "")
self._account = os.environ.get("OPENVIKING_ACCOUNT", "default")
self._user = os.environ.get("OPENVIKING_USER", "default")
self._agent = os.environ.get("OPENVIKING_AGENT", "hermes")
self._session_id = session_id
self._turn_count = 0
try:
self._client = _VikingClient(
self._endpoint, self._api_key,
account=self._account, user=self._user, agent=self._agent,
)
self._client = _VikingClient(self._endpoint, self._api_key)
if not self._client.health():
logger.warning("OpenViking server at %s is not reachable", self._endpoint)
self._client = None
@@ -352,8 +325,7 @@ class OpenVikingMemoryProvider(MemoryProvider):
"(abstract/overview/full), viking_browse to explore.\n"
"Use viking_remember to store facts, viking_add_resource to index URLs/docs."
)
except Exception as e:
logger.warning("OpenViking system_prompt_block failed: %s", e)
except Exception:
return (
"# OpenViking Knowledge Base\n"
f"Active. Endpoint: {self._endpoint}\n"
@@ -379,10 +351,7 @@ class OpenVikingMemoryProvider(MemoryProvider):
def _run():
try:
client = _VikingClient(
self._endpoint, self._api_key,
account=self._account, user=self._user, agent=self._agent,
)
client = _VikingClient(self._endpoint, self._api_key)
resp = client.post("/api/v1/search/find", {
"query": query,
"top_k": 5,
@@ -417,10 +386,7 @@ class OpenVikingMemoryProvider(MemoryProvider):
def _sync():
try:
client = _VikingClient(
self._endpoint, self._api_key,
account=self._account, user=self._user, agent=self._agent,
)
client = _VikingClient(self._endpoint, self._api_key)
sid = self._session_id
# Add user message
@@ -476,10 +442,7 @@ class OpenVikingMemoryProvider(MemoryProvider):
def _write():
try:
client = _VikingClient(
self._endpoint, self._api_key,
account=self._account, user=self._user, agent=self._agent,
)
client = _VikingClient(self._endpoint, self._api_key)
# Add as a user message with memory context so the commit
# picks it up as an explicit memory during extraction
client.post(f"/api/v1/sessions/{self._session_id}/messages", {
-3
View File
@@ -63,12 +63,10 @@ homeassistant = ["aiohttp>=3.9.0,<4"]
sms = ["aiohttp>=3.9.0,<4"]
acp = ["agent-client-protocol>=0.9.0,<1.0"]
mistral = ["mistralai>=2.3.0,<3"]
bedrock = ["boto3>=1.35.0,<2"]
termux = [
# Tested Android / Termux path: keeps the core CLI feature-rich while
# avoiding extras that currently depend on non-Android wheels (notably
# faster-whisper -> ctranslate2 via the voice extra).
"python-telegram-bot[webhooks]>=22.6,<23",
"hermes-agent[cron]",
"hermes-agent[cli]",
"hermes-agent[pty]",
@@ -110,7 +108,6 @@ all = [
"hermes-agent[dingtalk]",
"hermes-agent[feishu]",
"hermes-agent[mistral]",
"hermes-agent[bedrock]",
"hermes-agent[web]",
]
+84 -696
View File
File diff suppressed because it is too large Load Diff
+1 -6
View File
@@ -28,7 +28,7 @@ BOLD='\033[1m'
# Configuration
REPO_URL_SSH="git@github.com:NousResearch/hermes-agent.git"
REPO_URL_HTTPS="https://github.com/NousResearch/hermes-agent.git"
HERMES_HOME="${HERMES_HOME:-$HOME/.hermes}"
HERMES_HOME="$HOME/.hermes"
INSTALL_DIR="${HERMES_INSTALL_DIR:-$HERMES_HOME/hermes-agent}"
PYTHON_VERSION="3.11"
NODE_VERSION="22"
@@ -66,10 +66,6 @@ while [[ $# -gt 0 ]]; do
INSTALL_DIR="$2"
shift 2
;;
--hermes-home)
HERMES_HOME="$2"
shift 2
;;
-h|--help)
echo "Hermes Agent Installer"
echo ""
@@ -80,7 +76,6 @@ while [[ $# -gt 0 ]]; do
echo " --skip-setup Skip interactive setup wizard"
echo " --branch NAME Git branch to install (default: main)"
echo " --dir PATH Installation directory (default: ~/.hermes/hermes-agent)"
echo " --hermes-home PATH Data directory (default: ~/.hermes, or \$HERMES_HOME)"
echo " -h, --help Show this help"
exit 0
;;
+1 -15
View File
@@ -62,9 +62,7 @@ AUTHOR_MAP = {
"258577966+voidborne-d@users.noreply.github.com": "voidborne-d",
"70424851+insecurejezza@users.noreply.github.com": "insecurejezza",
"259807879+Bartok9@users.noreply.github.com": "Bartok9",
"241404605+MestreY0d4-Uninter@users.noreply.github.com": "MestreY0d4-Uninter",
"268667990+Roy-oss1@users.noreply.github.com": "Roy-oss1",
"241404605+MestreY0d4-Uninter@users.noreply.github.com": "MestreY0d4-Uninter",
# contributors (manual mapping from git names)
"dmayhem93@gmail.com": "dmahan93",
"samherring99@gmail.com": "samherring99",
@@ -77,13 +75,8 @@ AUTHOR_MAP = {
"abdullahfarukozden@gmail.com": "Farukest",
"lovre.pesut@gmail.com": "rovle",
"hakanerten02@hotmail.com": "teyrebaz33",
"ruzzgarcn@gmail.com": "Ruzzgar",
"alireza78.crypto@gmail.com": "alireza78a",
"brooklyn.bb.nicholson@gmail.com": "brooklynnicholson",
"4317663+helix4u@users.noreply.github.com": "helix4u",
"331214+counterposition@users.noreply.github.com": "counterposition",
"blspear@gmail.com": "BrennerSpear",
"239876380+handsdiff@users.noreply.github.com": "handsdiff",
"gpickett00@gmail.com": "gpickett00",
"mcosma@gmail.com": "wakamex",
"clawdia.nash@proton.me": "clawdia-nash",
@@ -102,9 +95,7 @@ AUTHOR_MAP = {
"vincentcharlebois@gmail.com": "vincentcharlebois",
"aryan@synvoid.com": "aryansingh",
"johnsonblake1@gmail.com": "blakejohnson",
"greer.guthrie@gmail.com": "g-guthrie",
"kennyx102@gmail.com": "bobashopcashier",
"shokatalishaikh95@gmail.com": "areu01or00",
"bryan@intertwinesys.com": "bryanyoung",
"christo.mitov@gmail.com": "christomitov",
"hermes@nousresearch.com": "NousResearch",
@@ -124,9 +115,6 @@ AUTHOR_MAP = {
"m@statecraft.systems": "mbierling",
"balyan.sid@gmail.com": "balyansid",
"oluwadareab12@gmail.com": "bennytimz",
"simon@simonmarcus.org": "simon-marcus",
"xowiekk@gmail.com": "Xowiek",
"1243352777@qq.com": "zons-zhaozhy",
# ── bulk addition: 75 emails resolved via API, PR salvage bodies, noreply
# crossref, and GH contributor list matching (April 2026 audit) ──
"1115117931@qq.com": "aaronagent",
@@ -199,14 +187,12 @@ AUTHOR_MAP = {
"yangzhi.see@gmail.com": "SeeYangZhi",
"yongtenglei@gmail.com": "yongtenglei",
"young@YoungdeMacBook-Pro.local": "YoungYang963",
"ysfalweshcan@gmail.com": "Junass1",
"ysfalweshcan@gmail.com": "Awsh1",
"ysfwaxlycan@gmail.com": "WAXLYY",
"yusufalweshdemir@gmail.com": "Dusk1e",
"zhouboli@gmail.com": "zhouboli",
"zqiao@microsoft.com": "tomqiaozc",
"zzn+pa@zzn.im": "xinbenlv",
"zaynjarvis@gmail.com": "ZaynJarvis",
"zhiheng.liu@bytedance.com": "ZaynJarvis",
}
@@ -313,7 +313,7 @@ Type these during an interactive chat session.
```
~/.hermes/config.yaml Main configuration
~/.hermes/.env API keys and secrets
$HERMES_HOME/skills/ Installed skills
~/.hermes/skills/ Installed skills
~/.hermes/sessions/ Session transcripts
~/.hermes/logs/ Gateway and error logs
~/.hermes/auth.json OAuth tokens and credential pools
@@ -351,8 +351,8 @@ Full config reference: https://hermes-agent.nousresearch.com/docs/user-guide/con
|----------|------|-------------|
| OpenRouter | API key | `OPENROUTER_API_KEY` |
| Anthropic | API key | `ANTHROPIC_API_KEY` |
| Nous Portal | OAuth | `hermes auth` |
| OpenAI Codex | OAuth | `hermes auth` |
| Nous Portal | OAuth | `hermes login --provider nous` |
| OpenAI Codex | OAuth | `hermes login --provider openai-codex` |
| GitHub Copilot | Token | `COPILOT_GITHUB_TOKEN` |
| Google Gemini | API key | `GOOGLE_API_KEY` or `GEMINI_API_KEY` |
| DeepSeek | API key | `DEEPSEEK_API_KEY` |
@@ -650,9 +650,9 @@ registry.register(
)
```
**2. Add to `toolsets.py`**`_HERMES_CORE_TOOLS` list.
**2. Add import** in `model_tools.py``_discover_tools()` list.
Auto-discovery: any `tools/*.py` file with a top-level `registry.register()` call is imported automatically — no manual list needed.
**3. Add to `toolsets.py`** → `_HERMES_CORE_TOOLS` list.
All handlers must return JSON strings. Use `get_hermes_home()` for paths, never hardcode `~/.hermes`.
+1 -1
View File
@@ -334,7 +334,7 @@ When the user asks you to "review PR #N", "look at this PR", or gives you a PR U
### Step 1: Set up environment
```bash
source "${HERMES_HOME:-$HOME/.hermes}/skills/github/github-auth/scripts/gh-env.sh"
source ~/.hermes/skills/github/github-auth/scripts/gh-env.sh
# Or run the inline setup block from the top of this skill
```
@@ -6,7 +6,7 @@ All requests need: `-H "Authorization: token $GITHUB_TOKEN"`
Use the `gh-env.sh` helper to set `$GITHUB_TOKEN`, `$GH_OWNER`, `$GH_REPO` automatically:
```bash
source "${HERMES_HOME:-$HOME/.hermes}/skills/github/github-auth/scripts/gh-env.sh"
source ~/.hermes/skills/github/github-auth/scripts/gh-env.sh
```
## Repositories
@@ -98,7 +98,7 @@ def find_nearby(lat: float, lon: float, types: list[str], radius: int = 1500, li
# Get coordinates (nodes have lat/lon directly, ways/relations use center)
plat = el.get("lat") or (el.get("center", {}) or {}).get("lat")
plon = el.get("lon") or (el.get("center", {}) or {}).get("lon")
if plat is None or plon is None:
if not plat or not plon:
continue
dist = haversine(lat, lon, plat, plon)
+105 -136
View File
@@ -1,19 +1,35 @@
---
name: google-workspace
description: Gmail, Calendar, Drive, Contacts, Sheets, and Docs integration for Hermes. Uses Hermes-managed OAuth2 setup, prefers the Google Workspace CLI (`gws`) when available for broader API coverage, and falls back to the Python client libraries otherwise.
version: 1.0.0
description: Gmail, Calendar, Drive, Contacts, Sheets, and Docs integration via gws CLI (googleworkspace/cli). Uses OAuth2 with automatic token refresh via bridge script. Requires gws binary.
version: 2.0.0
author: Nous Research
license: MIT
required_credential_files:
- path: google_token.json
description: Google OAuth2 token (created by setup script)
- path: google_client_secret.json
description: Google OAuth2 client credentials (downloaded from Google Cloud Console)
metadata:
hermes:
tags: [Google, Gmail, Calendar, Drive, Sheets, Docs, Contacts, Email, OAuth]
tags: [Google, Gmail, Calendar, Drive, Sheets, Docs, Contacts, Email, OAuth, gws]
homepage: https://github.com/NousResearch/hermes-agent
related_skills: [himalaya]
---
# Google Workspace
Gmail, Calendar, Drive, Contacts, Sheets, and Docs — through Hermes-managed OAuth and a thin CLI wrapper. When `gws` is installed, the skill uses it as the execution backend for broader Google Workspace coverage; otherwise it falls back to the bundled Python client implementation.
Gmail, Calendar, Drive, Contacts, Sheets, and Docs — powered by `gws` (Google's official Rust CLI). The skill provides a backward-compatible Python wrapper that handles OAuth token refresh and delegates to `gws`.
## Architecture
```
google_api.py → gws_bridge.py → gws CLI
(argparse compat) (token refresh) (Google APIs)
```
- `setup.py` handles OAuth2 (headless-compatible, works on CLI/Telegram/Discord)
- `gws_bridge.py` refreshes the Hermes token and injects it into `gws` via `GOOGLE_WORKSPACE_CLI_TOKEN`
- `google_api.py` provides the same CLI interface as v1 but delegates to `gws`
## References
@@ -22,7 +38,22 @@ Gmail, Calendar, Drive, Contacts, Sheets, and Docs — through Hermes-managed OA
## Scripts
- `scripts/setup.py` — OAuth2 setup (run once to authorize)
- `scripts/google_api.py` — compatibility wrapper CLI. It prefers `gws` for operations when available, while preserving Hermes' existing JSON output contract.
- `scripts/gws_bridge.py` — Token refresh bridge to gws CLI
- `scripts/google_api.py` — Backward-compatible API wrapper (delegates to gws)
## Prerequisites
Install `gws`:
```bash
cargo install google-workspace-cli
# or via npm (recommended, downloads prebuilt binary):
npm install -g @googleworkspace/cli
# or via Homebrew:
brew install googleworkspace-cli
```
Verify: `gws --version`
## First-Time Setup
@@ -32,7 +63,13 @@ on CLI, Telegram, Discord, or any platform.
Define a shorthand first:
```bash
GSETUP="python ${HERMES_HOME:-$HOME/.hermes}/skills/productivity/google-workspace/scripts/setup.py"
HERMES_HOME="${HERMES_HOME:-$HOME/.hermes}"
GWORKSPACE_SKILL_DIR="$HERMES_HOME/skills/productivity/google-workspace"
PYTHON_BIN="${HERMES_PYTHON:-python3}"
if [ -x "$HERMES_HOME/hermes-agent/venv/bin/python" ]; then
PYTHON_BIN="$HERMES_HOME/hermes-agent/venv/bin/python"
fi
GSETUP="$PYTHON_BIN $GWORKSPACE_SKILL_DIR/scripts/setup.py"
```
### Step 0: Check if already set up
@@ -45,166 +82,88 @@ If it prints `AUTHENTICATED`, skip to Usage — setup is already done.
### Step 1: Triage — ask the user what they need
Before starting OAuth setup, ask the user TWO questions:
**Question 1: "What Google services do you need? Just email, or also
Calendar/Drive/Sheets/Docs?"**
- **Email only** They don't need this skill at all. Use the `himalaya` skill
instead — it works with a Gmail App Password (Settings → Security → App
Passwords) and takes 2 minutes to set up. No Google Cloud project needed.
Load the himalaya skill and follow its setup instructions.
- **Email only** → Use the `himalaya` skill instead — simpler setup.
- **Calendar, Drive, Sheets, Docs (or email + these)** → Continue below.
- **Email + Calendar** → Continue with this skill, but use
`--services email,calendar` during auth so the consent screen only asks for
the scopes they actually need.
**Partial scopes**: Users can authorize only a subset of services. The setup
script accepts partial scopes and warns about missing ones.
- **Calendar/Drive/Sheets/Docs only** → Continue with this skill and use a
narrower `--services` set like `calendar,drive,sheets,docs`.
**Question 2: "Does your Google account use Advanced Protection?"**
- **Full Workspace access** → Continue with this skill and use the default
`all` service set.
**Question 2: "Does your Google account use Advanced Protection (hardware
security keys required to sign in)? If you're not sure, you probably don't
— it's something you would have explicitly enrolled in."**
- **No / Not sure** → Normal setup. Continue below.
- **Yes** → Their Workspace admin must add the OAuth client ID to the org's
allowed apps list before Step 4 will work. Let them know upfront.
- **No / Not sure** → Normal setup.
- **Yes** → Workspace admin must add the OAuth client ID to allowed apps first.
### Step 2: Create OAuth credentials (one-time, ~5 minutes)
Tell the user:
> You need a Google Cloud OAuth client. This is a one-time setup:
>
> 1. Create or select a project:
> https://console.cloud.google.com/projectselector2/home/dashboard
> 2. Enable the required APIs from the API Library:
> https://console.cloud.google.com/apis/library
> Enable: Gmail API, Google Calendar API, Google Drive API,
> Google Sheets API, Google Docs API, People API
> 3. Create the OAuth client here:
> https://console.cloud.google.com/apis/credentials
> Credentials → Create Credentials → OAuth 2.0 Client ID
> 4. Application type: "Desktop app" → Create
> 5. If the app is still in Testing, add the user's Google account as a test user here:
> https://console.cloud.google.com/auth/audience
> Audience → Test users → Add users
> 6. Download the JSON file and tell me the file path
>
> Important Hermes CLI note: if the file path starts with `/`, do NOT send only the bare path as its own message in the CLI, because it can be mistaken for a slash command. Send it in a sentence instead, like:
> `The JSON file path is: /home/user/Downloads/client_secret_....json`
Once they provide the path:
> 1. Go to https://console.cloud.google.com/apis/credentials
> 2. Create a project (or use an existing one)
> 3. Enable the APIs you need (Gmail, Calendar, Drive, Sheets, Docs, People)
> 4. Credentials → Create Credentials → OAuth 2.0 Client ID → Desktop app
> 5. Download JSON and tell me the file path
```bash
$GSETUP --client-secret /path/to/client_secret.json
```
If they paste the raw client ID / client secret values instead of a file path,
write a valid Desktop OAuth JSON file for them yourself, save it somewhere
explicit (for example `~/Downloads/hermes-google-client-secret.json`), then run
`--client-secret` against that file.
### Step 3: Get authorization URL
Use the service set chosen in Step 1. Examples:
```bash
$GSETUP --auth-url --services email,calendar --format json
$GSETUP --auth-url --services calendar,drive,sheets,docs --format json
$GSETUP --auth-url --services all --format json
$GSETUP --auth-url
```
This returns JSON with an `auth_url` field and also saves the exact URL to
`~/.hermes/google_oauth_last_url.txt`.
Agent rules for this step:
- Extract the `auth_url` field and send that exact URL to the user as a single line.
- Tell the user that the browser will likely fail on `http://localhost:1` after approval, and that this is expected.
- Tell them to copy the ENTIRE redirected URL from the browser address bar.
- If the user gets `Error 403: access_denied`, send them directly to `https://console.cloud.google.com/auth/audience` to add themselves as a test user.
Send the URL to the user. After authorizing, they paste back the redirect URL or code.
### Step 4: Exchange the code
The user will paste back either a URL like `http://localhost:1/?code=4/0A...&scope=...`
or just the code string. Either works. The `--auth-url` step stores a temporary
pending OAuth session locally so `--auth-code` can complete the PKCE exchange
later, even on headless systems:
```bash
$GSETUP --auth-code "THE_URL_OR_CODE_THE_USER_PASTED" --format json
$GSETUP --auth-code "THE_URL_OR_CODE_THE_USER_PASTED"
```
If `--auth-code` fails because the code expired, was already used, or came from
an older browser tab, it now returns a fresh `fresh_auth_url`. In that case,
immediately send the new URL to the user and have them retry with the newest
browser redirect only.
### Step 5: Verify
```bash
$GSETUP --check
```
Should print `AUTHENTICATED`. Setup is complete — token refreshes automatically from now on.
### Notes
- Token is stored at `~/.hermes/google_token.json` and auto-refreshes.
- Pending OAuth session state/verifier are stored temporarily at `~/.hermes/google_oauth_pending.json` until exchange completes.
- If `gws` is installed, `google_api.py` points it at the same `~/.hermes/google_token.json` credentials file. Users do not need to run a separate `gws auth login` flow.
- To revoke: `$GSETUP --revoke`
Should print `AUTHENTICATED`. Token refreshes automatically from now on.
## Usage
All commands go through the API script. Set `GAPI` as a shorthand:
All commands go through the API script:
```bash
GAPI="python ${HERMES_HOME:-$HOME/.hermes}/skills/productivity/google-workspace/scripts/google_api.py"
HERMES_HOME="${HERMES_HOME:-$HOME/.hermes}"
GWORKSPACE_SKILL_DIR="$HERMES_HOME/skills/productivity/google-workspace"
PYTHON_BIN="${HERMES_PYTHON:-python3}"
if [ -x "$HERMES_HOME/hermes-agent/venv/bin/python" ]; then
PYTHON_BIN="$HERMES_HOME/hermes-agent/venv/bin/python"
fi
GAPI="$PYTHON_BIN $GWORKSPACE_SKILL_DIR/scripts/google_api.py"
```
### Gmail
```bash
# Search (returns JSON array with id, from, subject, date, snippet)
$GAPI gmail search "is:unread" --max 10
$GAPI gmail search "from:boss@company.com newer_than:1d"
$GAPI gmail search "has:attachment filename:pdf newer_than:7d"
# Read full message (returns JSON with body text)
$GAPI gmail get MESSAGE_ID
# Send
$GAPI gmail send --to user@example.com --subject "Hello" --body "Message text"
$GAPI gmail send --to user@example.com --subject "Report" --body "<h1>Q4</h1><p>Details...</p>" --html
$GAPI gmail send --to user@example.com --subject "Hello" --from '"Research Agent" <user@example.com>' --body "Message text"
# Reply (automatically threads and sets In-Reply-To)
$GAPI gmail send --to user@example.com --subject "Report" --body "<h1>Q4</h1>" --html
$GAPI gmail reply MESSAGE_ID --body "Thanks, that works for me."
$GAPI gmail reply MESSAGE_ID --from '"Support Bot" <user@example.com>' --body "Thanks"
# Labels
$GAPI gmail labels
$GAPI gmail modify MESSAGE_ID --add-labels LABEL_ID
$GAPI gmail modify MESSAGE_ID --remove-labels UNREAD
```
### Calendar
```bash
# List events (defaults to next 7 days)
$GAPI calendar list
$GAPI calendar list --start 2026-03-01T00:00:00Z --end 2026-03-07T23:59:59Z
# Create event (ISO 8601 with timezone required)
$GAPI calendar create --summary "Team Standup" --start 2026-03-01T10:00:00-06:00 --end 2026-03-01T10:30:00-06:00
$GAPI calendar create --summary "Lunch" --start 2026-03-01T12:00:00Z --end 2026-03-01T13:00:00Z --location "Cafe"
$GAPI calendar create --summary "Review" --start 2026-03-01T14:00:00Z --end 2026-03-01T15:00:00Z --attendees "alice@co.com,bob@co.com"
# Delete event
$GAPI calendar create --summary "Standup" --start 2026-03-01T10:00:00+01:00 --end 2026-03-01T10:30:00+01:00
$GAPI calendar create --summary "Review" --start ... --end ... --attendees "alice@co.com,bob@co.com"
$GAPI calendar delete EVENT_ID
```
@@ -224,13 +183,8 @@ $GAPI contacts list --max 20
### Sheets
```bash
# Read
$GAPI sheets get SHEET_ID "Sheet1!A1:D10"
# Write
$GAPI sheets update SHEET_ID "Sheet1!A1:B2" --values '[["Name","Score"],["Alice","95"]]'
# Append rows
$GAPI sheets append SHEET_ID "Sheet1!A:C" --values '[["new","row","data"]]'
```
@@ -240,37 +194,52 @@ $GAPI sheets append SHEET_ID "Sheet1!A:C" --values '[["new","row","data"]]'
$GAPI docs get DOC_ID
```
### Direct gws access (advanced)
For operations not covered by the wrapper, use `gws_bridge.py` directly:
```bash
GBRIDGE="$PYTHON_BIN $GWORKSPACE_SKILL_DIR/scripts/gws_bridge.py"
$GBRIDGE calendar +agenda --today --format table
$GBRIDGE gmail +triage --labels --format json
$GBRIDGE drive +upload ./report.pdf
$GBRIDGE sheets +read --spreadsheet SHEET_ID --range "Sheet1!A1:D10"
```
## Output Format
All commands return JSON. Parse with `jq` or read directly. Key fields:
All commands return JSON via `gws --format json`. Key output shapes:
- **Gmail search**: `[{id, threadId, from, to, subject, date, snippet, labels}]`
- **Gmail get**: `{id, threadId, from, to, subject, date, labels, body}`
- **Gmail send/reply**: `{status: "sent", id, threadId}`
- **Calendar list**: `[{id, summary, start, end, location, description, htmlLink}]`
- **Calendar create**: `{status: "created", id, summary, htmlLink}`
- **Drive search**: `[{id, name, mimeType, modifiedTime, webViewLink}]`
- **Contacts list**: `[{name, emails: [...], phones: [...]}]`
- **Sheets get**: `[[cell, cell, ...], ...]`
- **Gmail search/triage**: Array of message summaries (sender, subject, date, snippet)
- **Gmail get/read**: Message object with headers and body text
- **Gmail send/reply**: Confirmation with message ID
- **Calendar list/agenda**: Array of event objects (summary, start, end, location)
- **Calendar create**: Confirmation with event ID and htmlLink
- **Drive search**: Array of file objects (id, name, mimeType, webViewLink)
- **Sheets get/read**: 2D array of cell values
- **Docs get**: Full document JSON (use `body.content` for text extraction)
- **Contacts list**: Array of person objects with names, emails, phones
Parse output with `jq` or read JSON directly.
## Rules
1. **Never send email or create/delete events without confirming with the user first.** Show the draft content and ask for approval.
2. **Check auth before first use** — run `setup.py --check`. If it fails, guide the user through setup.
3. **Use the Gmail search syntax reference** for complex queries — load it with `skill_view("google-workspace", file_path="references/gmail-search-syntax.md")`.
4. **Calendar times must include timezone** always use ISO 8601 with offset (e.g., `2026-03-01T10:00:00-06:00`) or UTC (`Z`).
5. **Respect rate limits** — avoid rapid-fire sequential API calls. Batch reads when possible.
1. **Never send email or create/delete events without confirming with the user first.**
2. **Check auth before first use** — run `setup.py --check`.
3. **Use the Gmail search syntax reference** for complex queries.
4. **Calendar times must include timezone** — ISO 8601 with offset or UTC.
5. **Respect rate limits** — avoid rapid-fire sequential API calls.
## Troubleshooting
| Problem | Fix |
|---------|-----|
| `NOT_AUTHENTICATED` | Run setup Steps 2-5 above |
| `REFRESH_FAILED` | Token revoked or expired — redo Steps 3-5 |
| `HttpError 403: Insufficient Permission` | Missing API scope — `$GSETUP --revoke` then redo Steps 3-5 |
| `HttpError 403: Access Not Configured` | API not enabled — user needs to enable it in Google Cloud Console |
| `ModuleNotFoundError` | Run `$GSETUP --install-deps` |
| Advanced Protection blocks auth | Workspace admin must allowlist the OAuth client ID |
| `NOT_AUTHENTICATED` | Run setup Steps 2-5 |
| `REFRESH_FAILED` | Token revoked — redo Steps 3-5 |
| `gws: command not found` | Install: `npm install -g @googleworkspace/cli` |
| `HttpError 403` | Missing scope — `$GSETUP --revoke` then redo Steps 3-5 |
| `HttpError 403: Access Not Configured` | Enable API in Google Cloud Console |
| Advanced Protection blocks auth | Admin must allowlist the OAuth client ID |
## Revoking Access
@@ -1,17 +1,17 @@
#!/usr/bin/env python3
"""Google Workspace API CLI for Hermes Agent.
Uses the Google Workspace CLI (`gws`) when available, but preserves the
existing Hermes-facing JSON contract and falls back to the Python client
libraries if `gws` is not installed.
Thin wrapper that delegates to gws (googleworkspace/cli) via gws_bridge.py.
Maintains the same CLI interface for backward compatibility with Hermes skills.
Usage:
python google_api.py gmail search "is:unread" [--max 10]
python google_api.py gmail get MESSAGE_ID
python google_api.py gmail send --to user@example.com --subject "Hi" --body "Hello"
python google_api.py gmail reply MESSAGE_ID --body "Thanks"
python google_api.py calendar list [--from DATE] [--to DATE] [--calendar primary]
python google_api.py calendar list [--start DATE] [--end DATE] [--calendar primary]
python google_api.py calendar create --summary "Meeting" --start DATETIME --end DATETIME
python google_api.py calendar delete EVENT_ID
python google_api.py drive search "budget report" [--max 10]
python google_api.py contacts list [--max 20]
python google_api.py sheets get SHEET_ID RANGE
@@ -21,396 +21,47 @@ Usage:
"""
import argparse
import base64
import json
import os
import shutil
import subprocess
import sys
from datetime import datetime, timedelta, timezone
from email.mime.text import MIMEText
from pathlib import Path
HERMES_HOME = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
TOKEN_PATH = HERMES_HOME / "google_token.json"
CLIENT_SECRET_PATH = HERMES_HOME / "google_client_secret.json"
SCOPES = [
"https://www.googleapis.com/auth/gmail.readonly",
"https://www.googleapis.com/auth/gmail.send",
"https://www.googleapis.com/auth/gmail.modify",
"https://www.googleapis.com/auth/calendar",
"https://www.googleapis.com/auth/drive.readonly",
"https://www.googleapis.com/auth/contacts.readonly",
"https://www.googleapis.com/auth/spreadsheets",
"https://www.googleapis.com/auth/documents.readonly",
]
BRIDGE = Path(__file__).parent / "gws_bridge.py"
PYTHON = sys.executable
def _ensure_authenticated():
if not TOKEN_PATH.exists():
print("Not authenticated. Run the setup script first:", file=sys.stderr)
print(f" python {Path(__file__).parent / 'setup.py'}", file=sys.stderr)
sys.exit(1)
def _stored_token_scopes() -> list[str]:
try:
data = json.loads(TOKEN_PATH.read_text())
except Exception:
return list(SCOPES)
scopes = data.get("scopes")
if isinstance(scopes, list) and scopes:
return scopes
return list(SCOPES)
def _gws_binary() -> str | None:
override = os.getenv("HERMES_GWS_BIN")
if override:
return override
return shutil.which("gws")
def _gws_env() -> dict[str, str]:
env = os.environ.copy()
env["GOOGLE_WORKSPACE_CLI_CREDENTIALS_FILE"] = str(TOKEN_PATH)
return env
def _run_gws(parts: list[str], *, params: dict | None = None, body: dict | None = None):
binary = _gws_binary()
if not binary:
raise RuntimeError("gws not installed")
_ensure_authenticated()
cmd = [binary, *parts]
if params is not None:
cmd.extend(["--params", json.dumps(params)])
if body is not None:
cmd.extend(["--json", json.dumps(body)])
def gws(*args: str) -> None:
"""Call gws via the bridge and exit with its return code."""
result = subprocess.run(
cmd,
capture_output=True,
text=True,
env=_gws_env(),
[PYTHON, str(BRIDGE)] + list(args),
env={**os.environ, "HERMES_HOME": os.environ.get("HERMES_HOME", str(Path.home() / ".hermes"))},
)
if result.returncode != 0:
err = result.stderr.strip() or result.stdout.strip() or "Unknown gws error"
print(err, file=sys.stderr)
sys.exit(result.returncode or 1)
stdout = result.stdout.strip()
if not stdout:
return {}
try:
return json.loads(stdout)
except json.JSONDecodeError:
print("ERROR: Unexpected non-JSON output from gws:", file=sys.stderr)
print(stdout, file=sys.stderr)
sys.exit(1)
sys.exit(result.returncode)
def _headers_dict(msg: dict) -> dict[str, str]:
return {h["name"]: h["value"] for h in msg.get("payload", {}).get("headers", [])}
def _extract_message_body(msg: dict) -> str:
body = ""
payload = msg.get("payload", {})
if payload.get("body", {}).get("data"):
body = base64.urlsafe_b64decode(payload["body"]["data"]).decode("utf-8", errors="replace")
elif payload.get("parts"):
for part in payload["parts"]:
if part.get("mimeType") == "text/plain" and part.get("body", {}).get("data"):
body = base64.urlsafe_b64decode(part["body"]["data"]).decode("utf-8", errors="replace")
break
if not body:
for part in payload["parts"]:
if part.get("mimeType") == "text/html" and part.get("body", {}).get("data"):
body = base64.urlsafe_b64decode(part["body"]["data"]).decode("utf-8", errors="replace")
break
return body
def _extract_doc_text(doc: dict) -> str:
text_parts = []
for element in doc.get("body", {}).get("content", []):
paragraph = element.get("paragraph", {})
for pe in paragraph.get("elements", []):
text_run = pe.get("textRun", {})
if text_run.get("content"):
text_parts.append(text_run["content"])
return "".join(text_parts)
def _datetime_with_timezone(value: str) -> str:
if not value:
return value
if "T" not in value:
return value
if value.endswith("Z"):
return value
tail = value[10:]
if "+" in tail or "-" in tail:
return value
return value + "Z"
def get_credentials():
"""Load and refresh credentials from token file."""
_ensure_authenticated()
from google.oauth2.credentials import Credentials
from google.auth.transport.requests import Request
creds = Credentials.from_authorized_user_file(str(TOKEN_PATH), _stored_token_scopes())
if creds.expired and creds.refresh_token:
creds.refresh(Request())
TOKEN_PATH.write_text(creds.to_json())
if not creds.valid:
print("Token is invalid. Re-run setup.", file=sys.stderr)
sys.exit(1)
return creds
def build_service(api, version):
from googleapiclient.discovery import build
return build(api, version, credentials=get_credentials())
# =========================================================================
# Gmail
# =========================================================================
# -- Gmail --
def gmail_search(args):
if _gws_binary():
results = _run_gws(
["gmail", "users", "messages", "list"],
params={"userId": "me", "q": args.query, "maxResults": args.max},
)
messages = results.get("messages", [])
output = []
for msg_meta in messages:
msg = _run_gws(
["gmail", "users", "messages", "get"],
params={
"userId": "me",
"id": msg_meta["id"],
"format": "metadata",
"metadataHeaders": ["From", "To", "Subject", "Date"],
},
)
headers = _headers_dict(msg)
output.append(
{
"id": msg["id"],
"threadId": msg["threadId"],
"from": headers.get("From", ""),
"to": headers.get("To", ""),
"subject": headers.get("Subject", ""),
"date": headers.get("Date", ""),
"snippet": msg.get("snippet", ""),
"labels": msg.get("labelIds", []),
}
)
print(json.dumps(output, indent=2, ensure_ascii=False))
return
service = build_service("gmail", "v1")
results = service.users().messages().list(
userId="me", q=args.query, maxResults=args.max
).execute()
messages = results.get("messages", [])
if not messages:
print("No messages found.")
return
output = []
for msg_meta in messages:
msg = service.users().messages().get(
userId="me", id=msg_meta["id"], format="metadata",
metadataHeaders=["From", "To", "Subject", "Date"],
).execute()
headers = _headers_dict(msg)
output.append({
"id": msg["id"],
"threadId": msg["threadId"],
"from": headers.get("From", ""),
"to": headers.get("To", ""),
"subject": headers.get("Subject", ""),
"date": headers.get("Date", ""),
"snippet": msg.get("snippet", ""),
"labels": msg.get("labelIds", []),
})
print(json.dumps(output, indent=2, ensure_ascii=False))
cmd = ["gmail", "+triage", "--query", args.query, "--max", str(args.max), "--format", "json"]
gws(*cmd)
def gmail_get(args):
if _gws_binary():
msg = _run_gws(
["gmail", "users", "messages", "get"],
params={"userId": "me", "id": args.message_id, "format": "full"},
)
headers = _headers_dict(msg)
result = {
"id": msg["id"],
"threadId": msg["threadId"],
"from": headers.get("From", ""),
"to": headers.get("To", ""),
"subject": headers.get("Subject", ""),
"date": headers.get("Date", ""),
"labels": msg.get("labelIds", []),
"body": _extract_message_body(msg),
}
print(json.dumps(result, indent=2, ensure_ascii=False))
return
service = build_service("gmail", "v1")
msg = service.users().messages().get(
userId="me", id=args.message_id, format="full"
).execute()
headers = _headers_dict(msg)
result = {
"id": msg["id"],
"threadId": msg["threadId"],
"from": headers.get("From", ""),
"to": headers.get("To", ""),
"subject": headers.get("Subject", ""),
"date": headers.get("Date", ""),
"labels": msg.get("labelIds", []),
"body": _extract_message_body(msg),
}
print(json.dumps(result, indent=2, ensure_ascii=False))
gws("gmail", "+read", "--id", args.message_id, "--headers", "--format", "json")
def gmail_send(args):
if _gws_binary():
message = MIMEText(args.body, "html" if args.html else "plain")
message["to"] = args.to
message["subject"] = args.subject
if args.cc:
message["cc"] = args.cc
if args.from_header:
message["from"] = args.from_header
raw = base64.urlsafe_b64encode(message.as_bytes()).decode()
body = {"raw": raw}
if args.thread_id:
body["threadId"] = args.thread_id
result = _run_gws(
["gmail", "users", "messages", "send"],
params={"userId": "me"},
body=body,
)
print(json.dumps({"status": "sent", "id": result["id"], "threadId": result.get("threadId", "")}, indent=2))
return
service = build_service("gmail", "v1")
message = MIMEText(args.body, "html" if args.html else "plain")
message["to"] = args.to
message["subject"] = args.subject
cmd = ["gmail", "+send", "--to", args.to, "--subject", args.subject, "--body", args.body, "--format", "json"]
if args.cc:
message["cc"] = args.cc
if args.from_header:
message["from"] = args.from_header
raw = base64.urlsafe_b64encode(message.as_bytes()).decode()
body = {"raw": raw}
if args.thread_id:
body["threadId"] = args.thread_id
result = service.users().messages().send(userId="me", body=body).execute()
print(json.dumps({"status": "sent", "id": result["id"], "threadId": result.get("threadId", "")}, indent=2))
cmd += ["--cc", args.cc]
if args.html:
cmd.append("--html")
gws(*cmd)
def gmail_reply(args):
if _gws_binary():
original = _run_gws(
["gmail", "users", "messages", "get"],
params={
"userId": "me",
"id": args.message_id,
"format": "metadata",
"metadataHeaders": ["From", "Subject", "Message-ID"],
},
)
headers = _headers_dict(original)
subject = headers.get("Subject", "")
if not subject.startswith("Re:"):
subject = f"Re: {subject}"
message = MIMEText(args.body)
message["to"] = headers.get("From", "")
message["subject"] = subject
if args.from_header:
message["from"] = args.from_header
if headers.get("Message-ID"):
message["In-Reply-To"] = headers["Message-ID"]
message["References"] = headers["Message-ID"]
raw = base64.urlsafe_b64encode(message.as_bytes()).decode()
result = _run_gws(
["gmail", "users", "messages", "send"],
params={"userId": "me"},
body={"raw": raw, "threadId": original["threadId"]},
)
print(json.dumps({"status": "sent", "id": result["id"], "threadId": result.get("threadId", "")}, indent=2))
return
service = build_service("gmail", "v1")
original = service.users().messages().get(
userId="me", id=args.message_id, format="metadata",
metadataHeaders=["From", "Subject", "Message-ID"],
).execute()
headers = _headers_dict(original)
subject = headers.get("Subject", "")
if not subject.startswith("Re:"):
subject = f"Re: {subject}"
message = MIMEText(args.body)
message["to"] = headers.get("From", "")
message["subject"] = subject
if args.from_header:
message["from"] = args.from_header
if headers.get("Message-ID"):
message["In-Reply-To"] = headers["Message-ID"]
message["References"] = headers["Message-ID"]
raw = base64.urlsafe_b64encode(message.as_bytes()).decode()
body = {"raw": raw, "threadId": original["threadId"]}
result = service.users().messages().send(userId="me", body=body).execute()
print(json.dumps({"status": "sent", "id": result["id"], "threadId": result.get("threadId", "")}, indent=2))
gws("gmail", "+reply", "--message-id", args.message_id, "--body", args.body, "--format", "json")
def gmail_labels(args):
if _gws_binary():
results = _run_gws(["gmail", "users", "labels", "list"], params={"userId": "me"})
labels = [{"id": l["id"], "name": l["name"], "type": l.get("type", "")} for l in results.get("labels", [])]
print(json.dumps(labels, indent=2))
return
service = build_service("gmail", "v1")
results = service.users().labels().list(userId="me").execute()
labels = [{"id": l["id"], "name": l["name"], "type": l.get("type", "")} for l in results.get("labels", [])]
print(json.dumps(labels, indent=2))
gws("gmail", "users", "labels", "list", "--params", json.dumps({"userId": "me"}), "--format", "json")
def gmail_modify(args):
body = {}
@@ -418,310 +69,145 @@ def gmail_modify(args):
body["addLabelIds"] = args.add_labels.split(",")
if args.remove_labels:
body["removeLabelIds"] = args.remove_labels.split(",")
if _gws_binary():
result = _run_gws(
["gmail", "users", "messages", "modify"],
params={"userId": "me", "id": args.message_id},
body=body,
)
print(json.dumps({"id": result["id"], "labels": result.get("labelIds", [])}, indent=2))
return
service = build_service("gmail", "v1")
result = service.users().messages().modify(userId="me", id=args.message_id, body=body).execute()
print(json.dumps({"id": result["id"], "labels": result.get("labelIds", [])}, indent=2))
gws(
"gmail", "users", "messages", "modify",
"--params", json.dumps({"userId": "me", "id": args.message_id}),
"--json", json.dumps(body),
"--format", "json",
)
# =========================================================================
# Calendar
# =========================================================================
# -- Calendar --
def calendar_list(args):
now = datetime.now(timezone.utc)
time_min = _datetime_with_timezone(args.start or now.isoformat())
time_max = _datetime_with_timezone(args.end or (now + timedelta(days=7)).isoformat())
if _gws_binary():
results = _run_gws(
["calendar", "events", "list"],
params={
if args.start or args.end:
# Specific date range — use raw Calendar API for precise timeMin/timeMax
from datetime import datetime, timedelta, timezone as tz
now = datetime.now(tz.utc)
time_min = args.start or now.isoformat()
time_max = args.end or (now + timedelta(days=7)).isoformat()
gws(
"calendar", "events", "list",
"--params", json.dumps({
"calendarId": args.calendar,
"timeMin": time_min,
"timeMax": time_max,
"maxResults": args.max,
"singleEvents": True,
"orderBy": "startTime",
},
}),
"--format", "json",
)
events = []
for e in results.get("items", []):
events.append({
"id": e["id"],
"summary": e.get("summary", "(no title)"),
"start": e.get("start", {}).get("dateTime", e.get("start", {}).get("date", "")),
"end": e.get("end", {}).get("dateTime", e.get("end", {}).get("date", "")),
"location": e.get("location", ""),
"description": e.get("description", ""),
"status": e.get("status", ""),
"htmlLink": e.get("htmlLink", ""),
})
print(json.dumps(events, indent=2, ensure_ascii=False))
return
service = build_service("calendar", "v3")
results = service.events().list(
calendarId=args.calendar, timeMin=time_min, timeMax=time_max,
maxResults=args.max, singleEvents=True, orderBy="startTime",
).execute()
events = []
for e in results.get("items", []):
events.append({
"id": e["id"],
"summary": e.get("summary", "(no title)"),
"start": e.get("start", {}).get("dateTime", e.get("start", {}).get("date", "")),
"end": e.get("end", {}).get("dateTime", e.get("end", {}).get("date", "")),
"location": e.get("location", ""),
"description": e.get("description", ""),
"status": e.get("status", ""),
"htmlLink": e.get("htmlLink", ""),
})
print(json.dumps(events, indent=2, ensure_ascii=False))
else:
# No date range — use +agenda helper (defaults to 7 days)
cmd = ["calendar", "+agenda", "--days", "7", "--format", "json"]
if args.calendar != "primary":
cmd += ["--calendar", args.calendar]
gws(*cmd)
def calendar_create(args):
event = {
"summary": args.summary,
"start": {"dateTime": args.start},
"end": {"dateTime": args.end},
}
cmd = [
"calendar", "+insert",
"--summary", args.summary,
"--start", args.start,
"--end", args.end,
"--format", "json",
]
if args.location:
event["location"] = args.location
cmd += ["--location", args.location]
if args.description:
event["description"] = args.description
cmd += ["--description", args.description]
if args.attendees:
event["attendees"] = [{"email": e.strip()} for e in args.attendees.split(",") if e.strip()]
if _gws_binary():
result = _run_gws(
["calendar", "events", "insert"],
params={"calendarId": args.calendar},
body=event,
)
print(json.dumps({
"status": "created",
"id": result["id"],
"summary": result.get("summary", ""),
"htmlLink": result.get("htmlLink", ""),
}, indent=2))
return
service = build_service("calendar", "v3")
result = service.events().insert(calendarId=args.calendar, body=event).execute()
print(json.dumps({
"status": "created",
"id": result["id"],
"summary": result.get("summary", ""),
"htmlLink": result.get("htmlLink", ""),
}, indent=2))
for email in args.attendees.split(","):
cmd += ["--attendee", email.strip()]
if args.calendar != "primary":
cmd += ["--calendar", args.calendar]
gws(*cmd)
def calendar_delete(args):
if _gws_binary():
_run_gws(["calendar", "events", "delete"], params={"calendarId": args.calendar, "eventId": args.event_id})
print(json.dumps({"status": "deleted", "eventId": args.event_id}))
return
service = build_service("calendar", "v3")
service.events().delete(calendarId=args.calendar, eventId=args.event_id).execute()
print(json.dumps({"status": "deleted", "eventId": args.event_id}))
gws(
"calendar", "events", "delete",
"--params", json.dumps({"calendarId": args.calendar, "eventId": args.event_id}),
"--format", "json",
)
# =========================================================================
# Drive
# =========================================================================
# -- Drive --
def drive_search(args):
query = args.query if args.raw_query else f"fullText contains '{args.query}'"
if _gws_binary():
results = _run_gws(
["drive", "files", "list"],
params={
"q": query,
"pageSize": args.max,
"fields": "files(id, name, mimeType, modifiedTime, webViewLink)",
},
)
print(json.dumps(results.get("files", []), indent=2, ensure_ascii=False))
return
service = build_service("drive", "v3")
results = service.files().list(
q=query, pageSize=args.max, fields="files(id, name, mimeType, modifiedTime, webViewLink)",
).execute()
files = results.get("files", [])
print(json.dumps(files, indent=2, ensure_ascii=False))
gws(
"drive", "files", "list",
"--params", json.dumps({
"q": query,
"pageSize": args.max,
"fields": "files(id,name,mimeType,modifiedTime,webViewLink)",
}),
"--format", "json",
)
# =========================================================================
# Contacts
# =========================================================================
# -- Contacts --
def contacts_list(args):
if _gws_binary():
results = _run_gws(
["people", "people", "connections", "list"],
params={
"resourceName": "people/me",
"pageSize": args.max,
"personFields": "names,emailAddresses,phoneNumbers",
},
)
contacts = []
for person in results.get("connections", []):
names = person.get("names", [{}])
emails = person.get("emailAddresses", [])
phones = person.get("phoneNumbers", [])
contacts.append({
"name": names[0].get("displayName", "") if names else "",
"emails": [e.get("value", "") for e in emails],
"phones": [p.get("value", "") for p in phones],
})
print(json.dumps(contacts, indent=2, ensure_ascii=False))
return
service = build_service("people", "v1")
results = service.people().connections().list(
resourceName="people/me",
pageSize=args.max,
personFields="names,emailAddresses,phoneNumbers",
).execute()
contacts = []
for person in results.get("connections", []):
names = person.get("names", [{}])
emails = person.get("emailAddresses", [])
phones = person.get("phoneNumbers", [])
contacts.append({
"name": names[0].get("displayName", "") if names else "",
"emails": [e.get("value", "") for e in emails],
"phones": [p.get("value", "") for p in phones],
})
print(json.dumps(contacts, indent=2, ensure_ascii=False))
gws(
"people", "people", "connections", "list",
"--params", json.dumps({
"resourceName": "people/me",
"pageSize": args.max,
"personFields": "names,emailAddresses,phoneNumbers",
}),
"--format", "json",
)
# =========================================================================
# Sheets
# =========================================================================
# -- Sheets --
def sheets_get(args):
if _gws_binary():
result = _run_gws(
["sheets", "spreadsheets", "values", "get"],
params={"spreadsheetId": args.sheet_id, "range": args.range},
)
print(json.dumps(result.get("values", []), indent=2, ensure_ascii=False))
return
service = build_service("sheets", "v4")
result = service.spreadsheets().values().get(
spreadsheetId=args.sheet_id, range=args.range,
).execute()
print(json.dumps(result.get("values", []), indent=2, ensure_ascii=False))
gws(
"sheets", "+read",
"--spreadsheet", args.sheet_id,
"--range", args.range,
"--format", "json",
)
def sheets_update(args):
values = json.loads(args.values)
body = {"values": values}
if _gws_binary():
result = _run_gws(
["sheets", "spreadsheets", "values", "update"],
params={
"spreadsheetId": args.sheet_id,
"range": args.range,
"valueInputOption": "USER_ENTERED",
},
body=body,
)
print(json.dumps({"updatedCells": result.get("updatedCells", 0), "updatedRange": result.get("updatedRange", "")}, indent=2))
return
service = build_service("sheets", "v4")
result = service.spreadsheets().values().update(
spreadsheetId=args.sheet_id, range=args.range,
valueInputOption="USER_ENTERED", body=body,
).execute()
print(json.dumps({"updatedCells": result.get("updatedCells", 0), "updatedRange": result.get("updatedRange", "")}, indent=2))
gws(
"sheets", "spreadsheets", "values", "update",
"--params", json.dumps({
"spreadsheetId": args.sheet_id,
"range": args.range,
"valueInputOption": "USER_ENTERED",
}),
"--json", json.dumps({"values": values}),
"--format", "json",
)
def sheets_append(args):
values = json.loads(args.values)
body = {"values": values}
if _gws_binary():
result = _run_gws(
["sheets", "spreadsheets", "values", "append"],
params={
"spreadsheetId": args.sheet_id,
"range": args.range,
"valueInputOption": "USER_ENTERED",
"insertDataOption": "INSERT_ROWS",
},
body=body,
)
print(json.dumps({"updatedCells": result.get("updates", {}).get("updatedCells", 0)}, indent=2))
return
service = build_service("sheets", "v4")
result = service.spreadsheets().values().append(
spreadsheetId=args.sheet_id, range=args.range,
valueInputOption="USER_ENTERED", insertDataOption="INSERT_ROWS", body=body,
).execute()
print(json.dumps({"updatedCells": result.get("updates", {}).get("updatedCells", 0)}, indent=2))
gws(
"sheets", "+append",
"--spreadsheet", args.sheet_id,
"--json-values", json.dumps(values),
"--format", "json",
)
# =========================================================================
# Docs
# =========================================================================
# -- Docs --
def docs_get(args):
if _gws_binary():
doc = _run_gws(["docs", "documents", "get"], params={"documentId": args.doc_id})
result = {
"title": doc.get("title", ""),
"documentId": doc.get("documentId", ""),
"body": _extract_doc_text(doc),
}
print(json.dumps(result, indent=2, ensure_ascii=False))
return
service = build_service("docs", "v1")
doc = service.documents().get(documentId=args.doc_id).execute()
result = {
"title": doc.get("title", ""),
"documentId": doc.get("documentId", ""),
"body": _extract_doc_text(doc),
}
print(json.dumps(result, indent=2, ensure_ascii=False))
gws(
"docs", "documents", "get",
"--params", json.dumps({"documentId": args.doc_id}),
"--format", "json",
)
# =========================================================================
# CLI parser
# =========================================================================
# -- CLI parser (backward-compatible interface) --
def main():
parser = argparse.ArgumentParser(description="Google Workspace API for Hermes Agent")
parser = argparse.ArgumentParser(description="Google Workspace API for Hermes Agent (gws backend)")
sub = parser.add_subparsers(dest="service", required=True)
# --- Gmail ---
@@ -742,15 +228,13 @@ def main():
p.add_argument("--subject", required=True)
p.add_argument("--body", required=True)
p.add_argument("--cc", default="")
p.add_argument("--from", dest="from_header", default="", help="Custom From header (e.g. '\"Agent Name\" <user@example.com>')")
p.add_argument("--html", action="store_true", help="Send body as HTML")
p.add_argument("--thread-id", default="", help="Thread ID for threading")
p.add_argument("--thread-id", default="", help="Thread ID (unused with gws, kept for compat)")
p.set_defaults(func=gmail_send)
p = gmail_sub.add_parser("reply")
p.add_argument("message_id", help="Message ID to reply to")
p.add_argument("--body", required=True)
p.add_argument("--from", dest="from_header", default="", help="Custom From header (e.g. '\"Agent Name\" <user@example.com>')")
p.set_defaults(func=gmail_reply)
p = gmail_sub.add_parser("labels")
@@ -25,13 +25,6 @@ def refresh_token(token_data: dict) -> dict:
import urllib.parse
import urllib.request
required_keys = ["client_id", "client_secret", "refresh_token", "token_uri"]
missing = [k for k in required_keys if k not in token_data]
if missing:
print(f"ERROR: google_token.json is missing required fields: {', '.join(missing)}", file=sys.stderr)
print("Please re-authenticate by running the Google Workspace setup script.", file=sys.stderr)
sys.exit(1)
params = urllib.parse.urlencode({
"client_id": token_data["client_id"],
"client_secret": token_data["client_secret"],
+3 -3
View File
@@ -60,7 +60,7 @@ The fastest path — auto-detect the model, test strategies, and lock in the win
# In execute_code — use the loader to avoid exec-scoping issues:
import os
exec(open(os.path.expanduser(
os.path.join(os.environ.get("HERMES_HOME", os.path.expanduser("~/.hermes")), "skills/red-teaming/godmode/scripts/load_godmode.py")
"~/.hermes/skills/red-teaming/godmode/scripts/load_godmode.py"
)).read())
# Auto-detect model from config and jailbreak it
@@ -192,7 +192,7 @@ python3 scripts/parseltongue.py "How do I hack into a WiFi network?" --tier stan
Or use `execute_code` inline:
```python
# Load the parseltongue module
exec(open(os.path.join(os.environ.get("HERMES_HOME", os.path.expanduser("~/.hermes")), "skills/red-teaming/godmode/scripts/parseltongue.py")).read())
exec(open(os.path.expanduser("~/.hermes/skills/red-teaming/godmode/scripts/parseltongue.py")).read())
query = "How do I hack into a WiFi network?"
variants = generate_variants(query, tier="standard")
@@ -229,7 +229,7 @@ Race multiple models against the same query, score responses, pick the winner:
```python
# Via execute_code
exec(open(os.path.join(os.environ.get("HERMES_HOME", os.path.expanduser("~/.hermes")), "skills/red-teaming/godmode/scripts/godmode_race.py")).read())
exec(open(os.path.expanduser("~/.hermes/skills/red-teaming/godmode/scripts/godmode_race.py")).read())
result = race_models(
query="Explain how SQL injection works with a practical example",
@@ -114,7 +114,7 @@ hermes
### Via the GODMODE CLASSIC racer script
```python
exec(open(os.path.join(os.environ.get("HERMES_HOME", os.path.expanduser("~/.hermes")), "skills/red-teaming/godmode/scripts/godmode_race.py")).read())
exec(open(os.path.expanduser("~/.hermes/skills/red-teaming/godmode/scripts/godmode_race.py")).read())
result = race_godmode_classic("Your query here")
print(f"Winner: {result['codename']} — Score: {result['score']}")
print(result['content'])
@@ -129,7 +129,7 @@ These don't auto-reject but reduce the response score:
## Using in Python
```python
exec(open(os.path.join(os.environ.get("HERMES_HOME", os.path.expanduser("~/.hermes")), "skills/red-teaming/godmode/scripts/godmode_race.py")).read())
exec(open(os.path.expanduser("~/.hermes/skills/red-teaming/godmode/scripts/godmode_race.py")).read())
# Check if a response is a refusal
text = "I'm sorry, but I can't assist with that request."
@@ -7,7 +7,7 @@ finds what works, and locks it in by writing config.yaml + prefill.json.
Usage in execute_code:
exec(open(os.path.expanduser(
os.path.join(os.environ.get("HERMES_HOME", os.path.expanduser("~/.hermes")), "skills/red-teaming/godmode/scripts/auto_jailbreak.py")
"~/.hermes/skills/red-teaming/godmode/scripts/auto_jailbreak.py"
)).read())
result = auto_jailbreak() # Uses current model from config
@@ -7,7 +7,7 @@ Queries multiple models in parallel via OpenRouter, scores responses
on quality/filteredness/speed, returns the best unfiltered answer.
Usage in execute_code:
exec(open(os.path.join(os.environ.get("HERMES_HOME", os.path.expanduser("~/.hermes")), "skills/red-teaming/godmode/scripts/godmode_race.py")).read())
exec(open(os.path.expanduser("~/.hermes/skills/red-teaming/godmode/scripts/godmode_race.py")).read())
result = race_models(
query="Your query here",
@@ -3,7 +3,7 @@ Loader for G0DM0D3 scripts. Handles the exec-scoping issues.
Usage in execute_code:
exec(open(os.path.expanduser(
os.path.join(os.environ.get("HERMES_HOME", os.path.expanduser("~/.hermes")), "skills/red-teaming/godmode/scripts/load_godmode.py")
"~/.hermes/skills/red-teaming/godmode/scripts/load_godmode.py"
)).read())
# Now all functions are available:
@@ -11,7 +11,7 @@ Usage:
python parseltongue.py "How do I hack a WiFi network?" --tier standard
# As a module in execute_code
exec(open(os.path.join(os.environ.get("HERMES_HOME", os.path.expanduser("~/.hermes")), "skills/red-teaming/godmode/scripts/parseltongue.py")).read())
exec(open("~/.hermes/skills/red-teaming/godmode/scripts/parseltongue.py").read())
variants = generate_variants("How do I hack a WiFi network?", tier="standard")
"""
+7 -16
View File
@@ -89,8 +89,7 @@ class TestReadCodexAccessToken:
hermes_home.mkdir(parents=True, exist_ok=True)
(hermes_home / "auth.json").write_text(json.dumps({"version": 1, "providers": {}}))
monkeypatch.setenv("HERMES_HOME", str(hermes_home))
with patch("agent.auxiliary_client._select_pool_entry", return_value=(False, None)):
result = _read_codex_access_token()
result = _read_codex_access_token()
assert result is None
def test_empty_token_returns_none(self, tmp_path, monkeypatch):
@@ -147,8 +146,7 @@ class TestReadCodexAccessToken:
},
}))
monkeypatch.setenv("HERMES_HOME", str(hermes_home))
with patch("agent.auxiliary_client._select_pool_entry", return_value=(False, None)):
result = _read_codex_access_token()
result = _read_codex_access_token()
assert result is None, "Expired JWT should return None"
def test_valid_jwt_returns_token(self, tmp_path, monkeypatch):
@@ -587,10 +585,7 @@ class TestGetTextAuxiliaryClient:
assert call_kwargs.kwargs["base_url"] == "http://localhost:1234/v1"
def test_codex_fallback_when_nothing_else(self, codex_auth_dir):
with patch("agent.auxiliary_client._try_openrouter", return_value=(None, None)), \
patch("agent.auxiliary_client._try_nous", return_value=(None, None)), \
patch("agent.auxiliary_client._try_custom_endpoint", return_value=(None, None)), \
patch("agent.auxiliary_client._read_main_provider", return_value="openrouter"), \
with patch("agent.auxiliary_client._read_nous_auth", return_value=None), \
patch("agent.auxiliary_client.OpenAI") as mock_openai:
client, model = get_text_auxiliary_client()
assert model == "gpt-5.2-codex"
@@ -628,21 +623,17 @@ class TestGetTextAuxiliaryClient:
monkeypatch.delenv("OPENAI_BASE_URL", raising=False)
monkeypatch.delenv("OPENAI_API_KEY", raising=False)
monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
with patch("agent.auxiliary_client._resolve_auto", return_value=(None, None)):
with patch("agent.auxiliary_client._read_nous_auth", return_value=None), \
patch("agent.auxiliary_client._read_codex_access_token", return_value=None), \
patch("agent.auxiliary_client._resolve_api_key_provider", return_value=(None, None)):
client, model = get_text_auxiliary_client()
assert client is None
assert model is None
def test_custom_endpoint_uses_codex_wrapper_when_runtime_requests_responses_api(self, monkeypatch):
monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
monkeypatch.delenv("OPENAI_API_KEY", raising=False)
monkeypatch.delenv("OPENAI_BASE_URL", raising=False)
def test_custom_endpoint_uses_codex_wrapper_when_runtime_requests_responses_api(self):
with patch("agent.auxiliary_client._resolve_custom_runtime",
return_value=("https://api.openai.com/v1", "sk-test", "codex_responses")), \
patch("agent.auxiliary_client._read_main_model", return_value="gpt-5.3-codex"), \
patch("agent.auxiliary_client._try_openrouter", return_value=(None, None)), \
patch("agent.auxiliary_client._try_nous", return_value=(None, None)), \
patch("agent.auxiliary_client._read_main_provider", return_value="openrouter"), \
patch("agent.auxiliary_client.OpenAI") as mock_openai:
client, model = get_text_auxiliary_client()
@@ -232,7 +232,7 @@ class TestResolveVisionProviderClientModelNormalization:
assert provider == "zai"
assert client is not None
assert model == "glm-5v-turbo" # zai has dedicated vision model in _PROVIDER_VISION_MODELS
assert model == "glm-5.1"
class TestVisionPathApiMode:
File diff suppressed because it is too large Load Diff
-269
View File
@@ -1,269 +0,0 @@
"""Integration tests for the AWS Bedrock provider wiring.
Verifies that the Bedrock provider is correctly registered in the
provider registry, model catalog, and runtime resolution pipeline.
These tests do NOT require AWS credentials or boto3 all AWS calls
are mocked.
Note: Tests that import ``hermes_cli.auth`` or ``hermes_cli.runtime_provider``
require Python 3.10+ due to ``str | None`` type syntax in the import chain.
"""
import os
from unittest.mock import MagicMock, patch
import pytest
class TestProviderRegistry:
"""Verify Bedrock is registered in PROVIDER_REGISTRY."""
def test_bedrock_in_registry(self):
from hermes_cli.auth import PROVIDER_REGISTRY
assert "bedrock" in PROVIDER_REGISTRY
def test_bedrock_auth_type_is_aws_sdk(self):
from hermes_cli.auth import PROVIDER_REGISTRY
pconfig = PROVIDER_REGISTRY["bedrock"]
assert pconfig.auth_type == "aws_sdk"
def test_bedrock_has_no_api_key_env_vars(self):
"""Bedrock uses the AWS SDK credential chain, not API keys."""
from hermes_cli.auth import PROVIDER_REGISTRY
pconfig = PROVIDER_REGISTRY["bedrock"]
assert pconfig.api_key_env_vars == ()
def test_bedrock_base_url_env_var(self):
from hermes_cli.auth import PROVIDER_REGISTRY
pconfig = PROVIDER_REGISTRY["bedrock"]
assert pconfig.base_url_env_var == "BEDROCK_BASE_URL"
class TestProviderAliases:
"""Verify Bedrock aliases resolve correctly."""
def test_aws_alias(self):
from hermes_cli.models import _PROVIDER_ALIASES
assert _PROVIDER_ALIASES.get("aws") == "bedrock"
def test_aws_bedrock_alias(self):
from hermes_cli.models import _PROVIDER_ALIASES
assert _PROVIDER_ALIASES.get("aws-bedrock") == "bedrock"
def test_amazon_bedrock_alias(self):
from hermes_cli.models import _PROVIDER_ALIASES
assert _PROVIDER_ALIASES.get("amazon-bedrock") == "bedrock"
def test_amazon_alias(self):
from hermes_cli.models import _PROVIDER_ALIASES
assert _PROVIDER_ALIASES.get("amazon") == "bedrock"
class TestProviderLabels:
"""Verify Bedrock appears in provider labels."""
def test_bedrock_label(self):
from hermes_cli.models import _PROVIDER_LABELS
assert _PROVIDER_LABELS.get("bedrock") == "AWS Bedrock"
class TestModelCatalog:
"""Verify Bedrock has a static model fallback list."""
def test_bedrock_has_curated_models(self):
from hermes_cli.models import _PROVIDER_MODELS
models = _PROVIDER_MODELS.get("bedrock", [])
assert len(models) > 0
def test_bedrock_models_include_claude(self):
from hermes_cli.models import _PROVIDER_MODELS
models = _PROVIDER_MODELS.get("bedrock", [])
claude_models = [m for m in models if "anthropic.claude" in m]
assert len(claude_models) > 0
def test_bedrock_models_include_nova(self):
from hermes_cli.models import _PROVIDER_MODELS
models = _PROVIDER_MODELS.get("bedrock", [])
nova_models = [m for m in models if "amazon.nova" in m]
assert len(nova_models) > 0
class TestResolveProvider:
"""Verify resolve_provider() handles bedrock correctly."""
def test_explicit_bedrock_resolves(self, monkeypatch):
"""When user explicitly requests 'bedrock', it should resolve."""
from hermes_cli.auth import PROVIDER_REGISTRY
# bedrock is in the registry, so resolve_provider should return it
from hermes_cli.auth import resolve_provider
result = resolve_provider("bedrock")
assert result == "bedrock"
def test_aws_alias_resolves_to_bedrock(self):
from hermes_cli.auth import resolve_provider
result = resolve_provider("aws")
assert result == "bedrock"
def test_amazon_bedrock_alias_resolves(self):
from hermes_cli.auth import resolve_provider
result = resolve_provider("amazon-bedrock")
assert result == "bedrock"
def test_auto_detect_with_aws_credentials(self, monkeypatch):
"""When AWS credentials are present and no other provider is configured,
auto-detect should find bedrock."""
from hermes_cli.auth import resolve_provider
# Clear all other provider env vars
for var in ["OPENAI_API_KEY", "OPENROUTER_API_KEY", "ANTHROPIC_API_KEY",
"ANTHROPIC_TOKEN", "GOOGLE_API_KEY", "DEEPSEEK_API_KEY"]:
monkeypatch.delenv(var, raising=False)
# Set AWS credentials
monkeypatch.setenv("AWS_ACCESS_KEY_ID", "AKIAIOSFODNN7EXAMPLE")
monkeypatch.setenv("AWS_SECRET_ACCESS_KEY", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
# Mock the auth store to have no active provider
with patch("hermes_cli.auth._load_auth_store", return_value={}):
result = resolve_provider("auto")
assert result == "bedrock"
class TestRuntimeProvider:
"""Verify resolve_runtime_provider() handles bedrock correctly."""
def test_bedrock_runtime_resolution(self, monkeypatch):
from hermes_cli.runtime_provider import resolve_runtime_provider
monkeypatch.setenv("AWS_ACCESS_KEY_ID", "AKIAIOSFODNN7EXAMPLE")
monkeypatch.setenv("AWS_SECRET_ACCESS_KEY", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
monkeypatch.setenv("AWS_REGION", "eu-west-1")
# Mock resolve_provider to return bedrock
with patch("hermes_cli.runtime_provider.resolve_provider", return_value="bedrock"), \
patch("hermes_cli.runtime_provider._get_model_config", return_value={"provider": "bedrock"}):
result = resolve_runtime_provider(requested="bedrock")
assert result["provider"] == "bedrock"
assert result["api_mode"] == "bedrock_converse"
assert result["region"] == "eu-west-1"
assert "bedrock-runtime.eu-west-1.amazonaws.com" in result["base_url"]
assert result["api_key"] == "aws-sdk"
def test_bedrock_runtime_default_region(self, monkeypatch):
from hermes_cli.runtime_provider import resolve_runtime_provider
monkeypatch.setenv("AWS_PROFILE", "default")
monkeypatch.delenv("AWS_REGION", raising=False)
monkeypatch.delenv("AWS_DEFAULT_REGION", raising=False)
with patch("hermes_cli.runtime_provider.resolve_provider", return_value="bedrock"), \
patch("hermes_cli.runtime_provider._get_model_config", return_value={"provider": "bedrock"}):
result = resolve_runtime_provider(requested="bedrock")
assert result["region"] == "us-east-1"
def test_bedrock_runtime_no_credentials_raises_on_auto_detect(self, monkeypatch):
"""When bedrock is auto-detected (not explicitly requested) and no
credentials are found, runtime resolution should raise AuthError."""
from hermes_cli.runtime_provider import resolve_runtime_provider
from hermes_cli.auth import AuthError
# Clear all AWS env vars
for var in ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_PROFILE",
"AWS_BEARER_TOKEN_BEDROCK", "AWS_CONTAINER_CREDENTIALS_RELATIVE_URI",
"AWS_WEB_IDENTITY_TOKEN_FILE"]:
monkeypatch.delenv(var, raising=False)
# Mock both the provider resolution and boto3's credential chain
mock_session = MagicMock()
mock_session.get_credentials.return_value = None
with patch("hermes_cli.runtime_provider.resolve_provider", return_value="bedrock"), \
patch("hermes_cli.runtime_provider._get_model_config", return_value={"provider": "bedrock"}), \
patch("hermes_cli.runtime_provider.resolve_requested_provider", return_value="auto"), \
patch.dict("sys.modules", {"botocore": MagicMock(), "botocore.session": MagicMock()}):
import botocore.session as _bs
_bs.get_session = MagicMock(return_value=mock_session)
with pytest.raises(AuthError, match="No AWS credentials"):
resolve_runtime_provider(requested="auto")
def test_bedrock_runtime_explicit_skips_credential_check(self, monkeypatch):
"""When user explicitly requests bedrock, trust boto3's credential chain
even if env-var detection finds nothing (covers IMDS, SSO, etc.)."""
from hermes_cli.runtime_provider import resolve_runtime_provider
# No AWS env vars set — but explicit bedrock request should not raise
for var in ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_PROFILE",
"AWS_BEARER_TOKEN_BEDROCK"]:
monkeypatch.delenv(var, raising=False)
with patch("hermes_cli.runtime_provider.resolve_provider", return_value="bedrock"), \
patch("hermes_cli.runtime_provider._get_model_config", return_value={"provider": "bedrock"}):
result = resolve_runtime_provider(requested="bedrock")
assert result["provider"] == "bedrock"
assert result["api_mode"] == "bedrock_converse"
# ---------------------------------------------------------------------------
# providers.py integration
# ---------------------------------------------------------------------------
class TestProvidersModule:
"""Verify bedrock is wired into hermes_cli/providers.py."""
def test_bedrock_alias_in_providers(self):
from hermes_cli.providers import ALIASES
assert ALIASES.get("bedrock") is None # "bedrock" IS the canonical name, not an alias
assert ALIASES.get("aws") == "bedrock"
assert ALIASES.get("aws-bedrock") == "bedrock"
def test_bedrock_transport_mapping(self):
from hermes_cli.providers import TRANSPORT_TO_API_MODE
assert TRANSPORT_TO_API_MODE.get("bedrock_converse") == "bedrock_converse"
def test_determine_api_mode_from_bedrock_url(self):
from hermes_cli.providers import determine_api_mode
assert determine_api_mode(
"unknown", "https://bedrock-runtime.us-east-1.amazonaws.com"
) == "bedrock_converse"
def test_label_override(self):
from hermes_cli.providers import _LABEL_OVERRIDES
assert _LABEL_OVERRIDES.get("bedrock") == "AWS Bedrock"
# ---------------------------------------------------------------------------
# Error classifier integration
# ---------------------------------------------------------------------------
class TestErrorClassifierBedrock:
"""Verify Bedrock error patterns are in the global error classifier."""
def test_throttling_in_rate_limit_patterns(self):
from agent.error_classifier import _RATE_LIMIT_PATTERNS
assert "throttlingexception" in _RATE_LIMIT_PATTERNS
def test_context_overflow_patterns(self):
from agent.error_classifier import _CONTEXT_OVERFLOW_PATTERNS
assert "input is too long" in _CONTEXT_OVERFLOW_PATTERNS
# ---------------------------------------------------------------------------
# pyproject.toml bedrock extra
# ---------------------------------------------------------------------------
class TestPackaging:
"""Verify bedrock optional dependency is declared."""
def test_bedrock_extra_exists(self):
import configparser
from pathlib import Path
# Read pyproject.toml to verify [bedrock] extra
toml_path = Path(__file__).parent.parent.parent / "pyproject.toml"
content = toml_path.read_text()
assert 'bedrock = ["boto3' in content
def test_bedrock_in_all_extra(self):
from pathlib import Path
content = (Path(__file__).parent.parent.parent / "pyproject.toml").read_text()
assert '"hermes-agent[bedrock]"' in content
-6
View File
@@ -252,11 +252,6 @@ def test_exhausted_402_entry_resets_after_one_hour(tmp_path, monkeypatch):
def test_explicit_reset_timestamp_overrides_default_429_ttl(tmp_path, monkeypatch):
monkeypatch.setenv("HERMES_HOME", str(tmp_path / "hermes"))
# Prevent auto-seeding from Codex CLI tokens on the host
monkeypatch.setattr(
"hermes_cli.auth._import_codex_cli_tokens",
lambda: None,
)
_write_auth_store(
tmp_path,
{
@@ -1096,7 +1091,6 @@ def test_load_pool_seeds_copilot_via_gh_auth_token(tmp_path, monkeypatch):
assert len(entries) == 1
assert entries[0].source == "gh_cli"
assert entries[0].access_token == "gho_fake_token_abc123"
assert entries[0].base_url == "https://api.githubcopilot.com"
def test_load_pool_does_not_seed_copilot_when_no_token(tmp_path, monkeypatch):
-315
View File
@@ -396,108 +396,6 @@ class TestPluginMemoryDiscovery:
assert load_memory_provider("nonexistent_provider") is None
class TestUserInstalledProviderDiscovery:
"""Memory providers installed to $HERMES_HOME/plugins/ should be found.
Regression test for issues #4956 and #9099: load_memory_provider() and
discover_memory_providers() only scanned the bundled plugins/memory/
directory, ignoring user-installed plugins.
"""
def _make_user_memory_plugin(self, tmp_path, name="myprovider"):
"""Create a minimal user memory provider plugin."""
plugin_dir = tmp_path / "plugins" / name
plugin_dir.mkdir(parents=True)
(plugin_dir / "__init__.py").write_text(
"from agent.memory_provider import MemoryProvider\n"
"class MyProvider(MemoryProvider):\n"
f" @property\n"
f" def name(self): return {name!r}\n"
" def is_available(self): return True\n"
" def initialize(self, **kw): pass\n"
" def sync_turn(self, *a, **kw): pass\n"
" def get_tool_schemas(self): return []\n"
" def handle_tool_call(self, *a, **kw): return '{}'\n"
)
(plugin_dir / "plugin.yaml").write_text(
f"name: {name}\ndescription: Test user provider\n"
)
return plugin_dir
def test_discover_finds_user_plugins(self, tmp_path, monkeypatch):
"""discover_memory_providers() includes user-installed plugins."""
from plugins.memory import discover_memory_providers, _get_user_plugins_dir
self._make_user_memory_plugin(tmp_path, "myexternal")
monkeypatch.setattr(
"plugins.memory._get_user_plugins_dir",
lambda: tmp_path / "plugins",
)
providers = discover_memory_providers()
names = [n for n, _, _ in providers]
assert "myexternal" in names
assert "holographic" in names # bundled still found
def test_load_user_plugin(self, tmp_path, monkeypatch):
"""load_memory_provider() can load from $HERMES_HOME/plugins/."""
from plugins.memory import load_memory_provider
self._make_user_memory_plugin(tmp_path, "myexternal")
monkeypatch.setattr(
"plugins.memory._get_user_plugins_dir",
lambda: tmp_path / "plugins",
)
p = load_memory_provider("myexternal")
assert p is not None
assert p.name == "myexternal"
assert p.is_available()
def test_bundled_takes_precedence(self, tmp_path, monkeypatch):
"""Bundled provider wins when user plugin has the same name."""
from plugins.memory import load_memory_provider, discover_memory_providers
# Create user plugin named "holographic" (same as bundled)
plugin_dir = tmp_path / "plugins" / "holographic"
plugin_dir.mkdir(parents=True)
(plugin_dir / "__init__.py").write_text(
"from agent.memory_provider import MemoryProvider\n"
"class Fake(MemoryProvider):\n"
" @property\n"
" def name(self): return 'holographic-FAKE'\n"
" def is_available(self): return True\n"
" def initialize(self, **kw): pass\n"
" def sync_turn(self, *a, **kw): pass\n"
" def get_tool_schemas(self): return []\n"
" def handle_tool_call(self, *a, **kw): return '{}'\n"
)
monkeypatch.setattr(
"plugins.memory._get_user_plugins_dir",
lambda: tmp_path / "plugins",
)
# Load should return bundled (name "holographic"), not user (name "holographic-FAKE")
p = load_memory_provider("holographic")
assert p is not None
assert p.name == "holographic" # bundled wins
# discover should not duplicate
providers = discover_memory_providers()
holo_count = sum(1 for n, _, _ in providers if n == "holographic")
assert holo_count == 1
def test_non_memory_user_plugins_excluded(self, tmp_path, monkeypatch):
"""User plugins that don't reference MemoryProvider are skipped."""
from plugins.memory import discover_memory_providers
plugin_dir = tmp_path / "plugins" / "notmemory"
plugin_dir.mkdir(parents=True)
(plugin_dir / "__init__.py").write_text(
"def register(ctx):\n ctx.register_tool('foo', 'bar', {}, lambda: None)\n"
)
monkeypatch.setattr(
"plugins.memory._get_user_plugins_dir",
lambda: tmp_path / "plugins",
)
providers = discover_memory_providers()
names = [n for n, _, _ in providers]
assert "notmemory" not in names
# ---------------------------------------------------------------------------
# Sequential dispatch routing tests
# ---------------------------------------------------------------------------
@@ -797,216 +695,3 @@ class TestMemoryContextFencing:
fence_end = combined.index("</memory-context>")
assert "Alice" in combined[fence_start:fence_end]
assert combined.index("weather") < fence_start
# ---------------------------------------------------------------------------
# AIAgent.commit_memory_session — routes to MemoryManager.on_session_end
# ---------------------------------------------------------------------------
class _CommitRecorder(FakeMemoryProvider):
"""Provider that records on_session_end calls for assertions."""
def __init__(self, name="recorder"):
super().__init__(name)
self.end_calls = []
def on_session_end(self, messages):
self.end_calls.append(list(messages or []))
class TestCommitMemorySessionRouting:
def test_on_session_end_fans_out(self):
mgr = MemoryManager()
builtin = _CommitRecorder("builtin")
external = _CommitRecorder("openviking")
mgr.add_provider(builtin)
mgr.add_provider(external)
msgs = [{"role": "user", "content": "hi"}]
mgr.on_session_end(msgs)
assert builtin.end_calls == [msgs]
assert external.end_calls == [msgs]
def test_on_session_end_tolerates_failure(self):
mgr = MemoryManager()
builtin = FakeMemoryProvider("builtin")
bad = _CommitRecorder("bad-provider")
bad.on_session_end = lambda m: (_ for _ in ()).throw(RuntimeError("boom"))
mgr.add_provider(builtin)
mgr.add_provider(bad)
mgr.on_session_end([]) # must not raise
# ---------------------------------------------------------------------------
# on_memory_write bridge — must fire from both concurrent AND sequential paths
# ---------------------------------------------------------------------------
class TestOnMemoryWriteBridge:
"""Verify that MemoryManager.on_memory_write is called when built-in
memory writes happen. This is a regression test for #10174 where the
sequential tool execution path (_execute_tool_calls_sequential) was
missing the bridge call, so single memory tool calls never notified
external memory providers.
"""
def test_on_memory_write_add(self):
"""on_memory_write fires for 'add' actions."""
mgr = MemoryManager()
p = FakeMemoryProvider("ext")
mgr.add_provider(p)
mgr.on_memory_write("add", "memory", "new fact")
assert p.memory_writes == [("add", "memory", "new fact")]
def test_on_memory_write_replace(self):
"""on_memory_write fires for 'replace' actions."""
mgr = MemoryManager()
p = FakeMemoryProvider("ext")
mgr.add_provider(p)
mgr.on_memory_write("replace", "user", "updated pref")
assert p.memory_writes == [("replace", "user", "updated pref")]
def test_on_memory_write_remove_not_bridged(self):
"""The bridge intentionally skips 'remove' — only add/replace notify."""
# This tests the contract that run_agent.py checks:
# function_args.get("action") in ("add", "replace")
mgr = MemoryManager()
p = FakeMemoryProvider("ext")
mgr.add_provider(p)
# Manager itself doesn't filter — run_agent.py does.
# But providers should handle remove gracefully.
mgr.on_memory_write("remove", "memory", "old fact")
assert p.memory_writes == [("remove", "memory", "old fact")]
def test_memory_manager_tool_injection_deduplicates(self):
"""Memory manager tools already in self.tools (from plugin registry)
must not be appended again. Duplicate function names cause 400 errors
on providers that enforce unique names (e.g. Xiaomi MiMo via Nous Portal).
Regression test for: duplicate mnemosyne_recall / mnemosyne_remember /
mnemosyne_stats in tools array 400 from Nous Portal.
"""
mgr = MemoryManager()
p = FakeMemoryProvider("ext", tools=[
{"name": "ext_recall", "description": "Recall", "parameters": {}},
{"name": "ext_remember", "description": "Remember", "parameters": {}},
])
mgr.add_provider(p)
# Simulate self.tools already containing one of the plugin tools
# (as if it was registered via ctx.register_tool → get_tool_definitions)
existing_tools = [
{"type": "function", "function": {"name": "ext_recall", "description": "Recall (from registry)", "parameters": {}}},
{"type": "function", "function": {"name": "web_search", "description": "Search", "parameters": {}}},
]
# Apply the same dedup logic from run_agent.py __init__
_existing_names = {
t.get("function", {}).get("name")
for t in existing_tools
if isinstance(t, dict)
}
for _schema in mgr.get_all_tool_schemas():
_tname = _schema.get("name", "")
if _tname and _tname in _existing_names:
continue
existing_tools.append({"type": "function", "function": _schema})
if _tname:
_existing_names.add(_tname)
# ext_recall should NOT be duplicated; ext_remember should be added
tool_names = [t["function"]["name"] for t in existing_tools]
assert tool_names.count("ext_recall") == 1, f"ext_recall duplicated: {tool_names}"
assert tool_names.count("ext_remember") == 1
assert tool_names.count("web_search") == 1
assert len(existing_tools) == 3 # web_search + ext_recall + ext_remember
def test_on_memory_write_tolerates_provider_failure(self):
"""If a provider's on_memory_write raises, others still get notified."""
mgr = MemoryManager()
bad = FakeMemoryProvider("builtin")
bad.on_memory_write = MagicMock(side_effect=RuntimeError("boom"))
good = FakeMemoryProvider("good")
mgr.add_provider(bad)
mgr.add_provider(good)
mgr.on_memory_write("add", "user", "test")
# Good provider still received the call despite bad provider crashing
assert good.memory_writes == [("add", "user", "test")]
class TestHonchoCadenceTracking:
"""Verify Honcho provider cadence gating depends on on_turn_start().
Bug: _turn_count was never updated because on_turn_start() was not called
from run_conversation(). This meant cadence checks always passed (every
turn fired both context refresh and dialectic). Fixed by calling
on_turn_start(self._user_turn_count, msg) before prefetch_all().
"""
def test_turn_count_updates_on_turn_start(self):
"""on_turn_start sets _turn_count, enabling cadence math."""
from plugins.memory.honcho import HonchoMemoryProvider
p = HonchoMemoryProvider()
assert p._turn_count == 0
p.on_turn_start(1, "hello")
assert p._turn_count == 1
p.on_turn_start(5, "world")
assert p._turn_count == 5
def test_queue_prefetch_respects_dialectic_cadence(self):
"""With dialecticCadence=3, dialectic should skip turns 2 and 3."""
from plugins.memory.honcho import HonchoMemoryProvider
p = HonchoMemoryProvider()
p._dialectic_cadence = 3
p._recall_mode = "context"
p._session_key = "test-session"
# Simulate a manager that records prefetch calls
class FakeManager:
def prefetch_context(self, key, query=None):
pass
def prefetch_dialectic(self, key, query):
pass
p._manager = FakeManager()
# Simulate turn 1: last_dialectic_turn = -999, so (1 - (-999)) >= 3 -> fires
p.on_turn_start(1, "turn 1")
p._last_dialectic_turn = 1 # simulate it fired
p._last_context_turn = 1
# Simulate turn 2: (2 - 1) = 1 < 3 -> should NOT fire dialectic
p.on_turn_start(2, "turn 2")
assert (p._turn_count - p._last_dialectic_turn) < p._dialectic_cadence
# Simulate turn 3: (3 - 1) = 2 < 3 -> should NOT fire dialectic
p.on_turn_start(3, "turn 3")
assert (p._turn_count - p._last_dialectic_turn) < p._dialectic_cadence
# Simulate turn 4: (4 - 1) = 3 >= 3 -> should fire dialectic
p.on_turn_start(4, "turn 4")
assert (p._turn_count - p._last_dialectic_turn) >= p._dialectic_cadence
def test_injection_frequency_first_turn_with_1indexed(self):
"""injection_frequency='first-turn' must inject on turn 1 (1-indexed)."""
from plugins.memory.honcho import HonchoMemoryProvider
p = HonchoMemoryProvider()
p._injection_frequency = "first-turn"
# Turn 1 should inject (not skip)
p.on_turn_start(1, "first message")
assert p._turn_count == 1
# The guard is `_turn_count > 1`, so turn 1 passes through
should_skip = p._injection_frequency == "first-turn" and p._turn_count > 1
assert not should_skip, "First turn (turn 1) should NOT be skipped"
# Turn 2 should skip
p.on_turn_start(2, "second message")
should_skip = p._injection_frequency == "first-turn" and p._turn_count > 1
assert should_skip, "Second turn (turn 2) SHOULD be skipped"
-253
View File
@@ -1,253 +0,0 @@
"""Tests for agent/nous_rate_guard.py — cross-session Nous Portal rate limit guard."""
import json
import os
import time
import pytest
@pytest.fixture
def rate_guard_env(tmp_path, monkeypatch):
"""Isolate rate guard state to a temp directory."""
hermes_home = str(tmp_path / ".hermes")
os.makedirs(hermes_home, exist_ok=True)
monkeypatch.setenv("HERMES_HOME", hermes_home)
# Clear any cached module-level imports
return hermes_home
class TestRecordNousRateLimit:
"""Test recording rate limit state."""
def test_records_with_header_reset(self, rate_guard_env):
from agent.nous_rate_guard import record_nous_rate_limit, _state_path
headers = {"x-ratelimit-reset-requests-1h": "1800"}
record_nous_rate_limit(headers=headers)
path = _state_path()
assert os.path.exists(path)
with open(path) as f:
state = json.load(f)
assert state["reset_seconds"] == pytest.approx(1800, abs=2)
assert state["reset_at"] > time.time()
def test_records_with_per_minute_header(self, rate_guard_env):
from agent.nous_rate_guard import record_nous_rate_limit, _state_path
headers = {"x-ratelimit-reset-requests": "45"}
record_nous_rate_limit(headers=headers)
with open(_state_path()) as f:
state = json.load(f)
assert state["reset_seconds"] == pytest.approx(45, abs=2)
def test_records_with_retry_after_header(self, rate_guard_env):
from agent.nous_rate_guard import record_nous_rate_limit, _state_path
headers = {"retry-after": "60"}
record_nous_rate_limit(headers=headers)
with open(_state_path()) as f:
state = json.load(f)
assert state["reset_seconds"] == pytest.approx(60, abs=2)
def test_prefers_hourly_over_per_minute(self, rate_guard_env):
from agent.nous_rate_guard import record_nous_rate_limit, _state_path
headers = {
"x-ratelimit-reset-requests-1h": "1800",
"x-ratelimit-reset-requests": "45",
}
record_nous_rate_limit(headers=headers)
with open(_state_path()) as f:
state = json.load(f)
# Should use the hourly value, not the per-minute one
assert state["reset_seconds"] == pytest.approx(1800, abs=2)
def test_falls_back_to_error_context_reset_at(self, rate_guard_env):
from agent.nous_rate_guard import record_nous_rate_limit, _state_path
future_reset = time.time() + 900
record_nous_rate_limit(
headers=None,
error_context={"reset_at": future_reset},
)
with open(_state_path()) as f:
state = json.load(f)
assert state["reset_at"] == pytest.approx(future_reset, abs=1)
def test_falls_back_to_default_cooldown(self, rate_guard_env):
from agent.nous_rate_guard import record_nous_rate_limit, _state_path
record_nous_rate_limit(headers=None)
with open(_state_path()) as f:
state = json.load(f)
# Default is 300 seconds (5 minutes)
assert state["reset_seconds"] == pytest.approx(300, abs=2)
def test_custom_default_cooldown(self, rate_guard_env):
from agent.nous_rate_guard import record_nous_rate_limit, _state_path
record_nous_rate_limit(headers=None, default_cooldown=120.0)
with open(_state_path()) as f:
state = json.load(f)
assert state["reset_seconds"] == pytest.approx(120, abs=2)
def test_creates_directory_if_missing(self, rate_guard_env):
from agent.nous_rate_guard import record_nous_rate_limit, _state_path
record_nous_rate_limit(headers={"retry-after": "10"})
assert os.path.exists(_state_path())
class TestNousRateLimitRemaining:
"""Test checking remaining rate limit time."""
def test_returns_none_when_no_file(self, rate_guard_env):
from agent.nous_rate_guard import nous_rate_limit_remaining
assert nous_rate_limit_remaining() is None
def test_returns_remaining_seconds_when_active(self, rate_guard_env):
from agent.nous_rate_guard import record_nous_rate_limit, nous_rate_limit_remaining
record_nous_rate_limit(headers={"x-ratelimit-reset-requests-1h": "600"})
remaining = nous_rate_limit_remaining()
assert remaining is not None
assert 595 < remaining <= 605 # ~600 seconds, allowing for test execution time
def test_returns_none_when_expired(self, rate_guard_env):
from agent.nous_rate_guard import nous_rate_limit_remaining, _state_path
# Write an already-expired state
state_dir = os.path.dirname(_state_path())
os.makedirs(state_dir, exist_ok=True)
with open(_state_path(), "w") as f:
json.dump({"reset_at": time.time() - 10, "recorded_at": time.time() - 100}, f)
assert nous_rate_limit_remaining() is None
# File should be cleaned up
assert not os.path.exists(_state_path())
def test_handles_corrupt_file(self, rate_guard_env):
from agent.nous_rate_guard import nous_rate_limit_remaining, _state_path
state_dir = os.path.dirname(_state_path())
os.makedirs(state_dir, exist_ok=True)
with open(_state_path(), "w") as f:
f.write("not valid json{{{")
assert nous_rate_limit_remaining() is None
class TestClearNousRateLimit:
"""Test clearing rate limit state."""
def test_clears_existing_file(self, rate_guard_env):
from agent.nous_rate_guard import (
record_nous_rate_limit,
clear_nous_rate_limit,
nous_rate_limit_remaining,
_state_path,
)
record_nous_rate_limit(headers={"retry-after": "600"})
assert nous_rate_limit_remaining() is not None
clear_nous_rate_limit()
assert nous_rate_limit_remaining() is None
assert not os.path.exists(_state_path())
def test_clear_when_no_file(self, rate_guard_env):
from agent.nous_rate_guard import clear_nous_rate_limit
# Should not raise
clear_nous_rate_limit()
class TestFormatRemaining:
"""Test human-readable duration formatting."""
def test_seconds(self):
from agent.nous_rate_guard import format_remaining
assert format_remaining(30) == "30s"
def test_minutes(self):
from agent.nous_rate_guard import format_remaining
assert format_remaining(125) == "2m 5s"
def test_exact_minutes(self):
from agent.nous_rate_guard import format_remaining
assert format_remaining(120) == "2m"
def test_hours(self):
from agent.nous_rate_guard import format_remaining
assert format_remaining(3720) == "1h 2m"
class TestParseResetSeconds:
"""Test header parsing for reset times."""
def test_case_insensitive_headers(self, rate_guard_env):
from agent.nous_rate_guard import _parse_reset_seconds
headers = {"X-Ratelimit-Reset-Requests-1h": "1200"}
assert _parse_reset_seconds(headers) == 1200.0
def test_returns_none_for_empty_headers(self):
from agent.nous_rate_guard import _parse_reset_seconds
assert _parse_reset_seconds(None) is None
assert _parse_reset_seconds({}) is None
def test_ignores_zero_values(self):
from agent.nous_rate_guard import _parse_reset_seconds
headers = {"x-ratelimit-reset-requests-1h": "0"}
assert _parse_reset_seconds(headers) is None
def test_ignores_invalid_values(self):
from agent.nous_rate_guard import _parse_reset_seconds
headers = {"x-ratelimit-reset-requests-1h": "not-a-number"}
assert _parse_reset_seconds(headers) is None
class TestAuxiliaryClientIntegration:
"""Test that the auxiliary client respects the rate guard."""
def test_try_nous_skips_when_rate_limited(self, rate_guard_env, monkeypatch):
from agent.nous_rate_guard import record_nous_rate_limit
# Record a rate limit
record_nous_rate_limit(headers={"retry-after": "600"})
# Mock _read_nous_auth to return valid creds (would normally succeed)
import agent.auxiliary_client as aux
monkeypatch.setattr(aux, "_read_nous_auth", lambda: {
"access_token": "test-token",
"inference_base_url": "https://api.nous.test/v1",
})
result = aux._try_nous()
assert result == (None, None)
def test_try_nous_works_when_not_rate_limited(self, rate_guard_env, monkeypatch):
import agent.auxiliary_client as aux
# No rate limit recorded — _try_nous should proceed normally
# (will return None because no real creds, but won't be blocked
# by the rate guard)
monkeypatch.setattr(aux, "_read_nous_auth", lambda: None)
result = aux._try_nous()
assert result == (None, None)
@@ -1,60 +0,0 @@
"""Tests for malformed proxy env var and base URL validation.
Salvaged from PR #6403 by MestreY0d4-Uninter — validates that the agent
surfaces clear errors instead of cryptic httpx ``Invalid port`` exceptions
when proxy env vars or custom endpoint URLs are malformed.
"""
from __future__ import annotations
import pytest
from agent.auxiliary_client import _validate_base_url, _validate_proxy_env_urls
# -- proxy env validation ------------------------------------------------
def test_proxy_env_accepts_normal_values(monkeypatch):
monkeypatch.setenv("HTTP_PROXY", "http://127.0.0.1:6153")
monkeypatch.setenv("HTTPS_PROXY", "https://proxy.example.com:8443")
monkeypatch.setenv("ALL_PROXY", "socks5://127.0.0.1:1080")
_validate_proxy_env_urls() # should not raise
def test_proxy_env_accepts_empty(monkeypatch):
monkeypatch.delenv("HTTP_PROXY", raising=False)
monkeypatch.delenv("HTTPS_PROXY", raising=False)
monkeypatch.delenv("ALL_PROXY", raising=False)
monkeypatch.delenv("http_proxy", raising=False)
monkeypatch.delenv("https_proxy", raising=False)
monkeypatch.delenv("all_proxy", raising=False)
_validate_proxy_env_urls() # should not raise
@pytest.mark.parametrize("key", [
"HTTP_PROXY", "HTTPS_PROXY", "ALL_PROXY",
"http_proxy", "https_proxy", "all_proxy",
])
def test_proxy_env_rejects_malformed_port(monkeypatch, key):
monkeypatch.setenv(key, "http://127.0.0.1:6153export")
with pytest.raises(RuntimeError, match=rf"Malformed proxy environment variable {key}=.*6153export"):
_validate_proxy_env_urls()
# -- base URL validation -------------------------------------------------
@pytest.mark.parametrize("url", [
"https://api.example.com/v1",
"http://127.0.0.1:6153/v1",
"acp://copilot",
"",
None,
])
def test_base_url_accepts_valid(url):
_validate_base_url(url) # should not raise
def test_base_url_rejects_malformed_port():
with pytest.raises(RuntimeError, match="Malformed custom endpoint URL"):
_validate_base_url("http://127.0.0.1:6153export")
-92
View File
@@ -284,95 +284,3 @@ class TestElevenLabsTavilyExaKeys:
assert "XYZ789abcdef" not in result
assert "HOME=/home/user" in result
assert "SHELL=/bin/bash" in result
class TestJWTTokens:
"""JWT tokens start with eyJ (base64 for '{') and have dot-separated parts."""
def test_full_3part_jwt(self):
text = (
"Token: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9"
".eyJpc3MiOiI0MjNiZDJkYjg4MjI0MDAwIn0"
".Gxgv0rru-_kS-I_60EJ7CENTnBh9UeuL3QhkMoQ-VnM"
)
result = redact_sensitive_text(text)
assert "Token:" in result
# Payload and signature must not survive
assert "eyJpc3Mi" not in result
assert "Gxgv0rru" not in result
def test_2part_jwt(self):
text = "eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiIxMjM0NTY3ODkwIn0"
result = redact_sensitive_text(text)
assert "eyJzdWIi" not in result
def test_standalone_jwt_header(self):
text = "leaked header: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9 here"
result = redact_sensitive_text(text)
assert "IkpXVCJ9" not in result
assert "leaked header:" in result
def test_jwt_with_base64_padding(self):
text = "eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiIxMjM0NTY3ODkwIn0=.abc123def456ghij"
result = redact_sensitive_text(text)
assert "abc123def456" not in result
def test_short_eyj_not_matched(self):
"""eyJ followed by fewer than 10 base64 chars should not match."""
text = "eyJust a normal word"
assert redact_sensitive_text(text) == text
def test_jwt_preserves_surrounding_text(self):
text = "before eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiIxMjM0NTY3ODkwIn0 after"
result = redact_sensitive_text(text)
assert result.startswith("before ")
assert result.endswith(" after")
def test_home_assistant_jwt_in_memory(self):
"""Real-world pattern: HA token stored in agent memory block."""
text = (
"Home Assistant API Token: "
"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9"
".eyJpc3MiOiJhYmNkZWYiLCJleHAiOjE3NzQ5NTcxMDN9"
".Gxgv0rru-_kS-I_60EJ7CENTnBh9UeuL3QhkMoQ-VnM"
)
result = redact_sensitive_text(text)
assert "Home Assistant API Token:" in result
assert "Gxgv0rru" not in result
assert "..." in result
class TestDiscordMentions:
"""Discord snowflake IDs in <@ID> or <@!ID> format."""
def test_normal_mention(self):
result = redact_sensitive_text("Hello <@222589316709220353>")
assert "222589316709220353" not in result
assert "<@***>" in result
def test_nickname_mention(self):
result = redact_sensitive_text("Ping <@!1331549159177846844>")
assert "1331549159177846844" not in result
assert "<@!***>" in result
def test_multiple_mentions(self):
text = "<@111111111111111111> and <@222222222222222222>"
result = redact_sensitive_text(text)
assert "111111111111111111" not in result
assert "222222222222222222" not in result
def test_short_id_not_matched(self):
"""IDs shorter than 17 digits are not Discord snowflakes."""
text = "<@12345>"
assert redact_sensitive_text(text) == text
def test_slack_mention_not_matched(self):
"""Slack mentions use letters, not pure digits."""
text = "<@U024BE7LH>"
assert redact_sensitive_text(text) == text
def test_preserves_surrounding_text(self):
text = "User <@222589316709220353> said hello"
result = redact_sensitive_text(text)
assert result.startswith("User ")
assert result.endswith(" said hello")
-16
View File
@@ -1,6 +1,5 @@
"""Tests for CLI /status command behavior."""
from datetime import datetime
from pathlib import Path
from types import SimpleNamespace
from unittest.mock import MagicMock, patch
@@ -84,18 +83,3 @@ def test_show_session_status_prints_gateway_style_summary():
_, kwargs = cli_obj.console.print.call_args
assert kwargs.get("highlight") is False
assert kwargs.get("markup") is False
def test_profile_command_reports_custom_root_profile(monkeypatch, tmp_path, capsys):
"""Profile detection works for custom-root deployments (not under ~/.hermes)."""
cli_obj = _make_cli()
profile_home = tmp_path / "profiles" / "coder"
monkeypatch.setenv("HERMES_HOME", str(profile_home))
monkeypatch.setattr(Path, "home", lambda: tmp_path / "unrelated-home")
cli_obj._handle_profile_command()
out = capsys.readouterr().out
assert "Profile: coder" in out
assert f"Home: {profile_home}" in out
-12
View File
@@ -144,18 +144,6 @@ class TestGatewayPersonalityNone:
assert "none" in result.lower()
@pytest.mark.asyncio
async def test_empty_personality_list_uses_profile_display_path(self, tmp_path):
runner = self._make_runner(personalities={})
(tmp_path / "config.yaml").write_text(yaml.dump({"agent": {"personalities": {}}}))
with patch("gateway.run._hermes_home", tmp_path), \
patch("hermes_constants.display_hermes_home", return_value="~/.hermes/profiles/coder"):
event = self._make_event("")
result = await runner._handle_personality_command(event)
assert result == "No personalities configured in `~/.hermes/profiles/coder/config.yaml`"
class TestPersonalityDictFormat:
"""Test dict-format custom personalities with description, tone, style."""
+1 -115
View File
@@ -8,8 +8,6 @@ from unittest.mock import AsyncMock, patch, MagicMock
import pytest
from cron.scheduler import _resolve_origin, _resolve_delivery_target, _deliver_result, _send_media_via_adapter, run_job, SILENT_MARKER, _build_job_prompt
from tools.env_passthrough import clear_env_passthrough
from tools.credential_files import clear_credential_files
class TestResolveOrigin:
@@ -235,10 +233,9 @@ class TestDeliverResultWrapping:
send_mock.assert_called_once()
sent_content = send_mock.call_args.kwargs.get("content") or send_mock.call_args[0][-1]
assert "Cronjob Response: daily-report" in sent_content
assert "(job_id: test-job)" in sent_content
assert "-------------" in sent_content
assert "Here is today's summary." in sent_content
assert "To stop or manage this job" in sent_content
assert "The agent cannot see this message" in sent_content
def test_delivery_uses_job_id_when_no_name(self):
"""When a job has no name, the wrapper should fall back to job id."""
@@ -879,117 +876,6 @@ class TestRunJobPerJobOverrides:
class TestRunJobSkillBacked:
def test_run_job_preserves_skill_env_passthrough_into_worker_thread(self, tmp_path):
job = {
"id": "skill-env-job",
"name": "skill env test",
"prompt": "Use the skill.",
"skill": "notion",
}
fake_db = MagicMock()
def _skill_view(name):
assert name == "notion"
from tools.env_passthrough import register_env_passthrough
register_env_passthrough(["NOTION_API_KEY"])
return json.dumps({"success": True, "content": "# notion\nUse Notion."})
def _run_conversation(prompt):
from tools.env_passthrough import get_all_passthrough
assert "NOTION_API_KEY" in get_all_passthrough()
return {"final_response": "ok"}
with patch("cron.scheduler._hermes_home", tmp_path), \
patch("cron.scheduler._resolve_origin", return_value=None), \
patch("dotenv.load_dotenv"), \
patch("hermes_state.SessionDB", return_value=fake_db), \
patch(
"hermes_cli.runtime_provider.resolve_runtime_provider",
return_value={
"api_key": "***",
"base_url": "https://example.invalid/v1",
"provider": "openrouter",
"api_mode": "chat_completions",
},
), \
patch("tools.skills_tool.skill_view", side_effect=_skill_view), \
patch("run_agent.AIAgent") as mock_agent_cls:
mock_agent = MagicMock()
mock_agent.run_conversation.side_effect = _run_conversation
mock_agent_cls.return_value = mock_agent
try:
success, output, final_response, error = run_job(job)
finally:
clear_env_passthrough()
assert success is True
assert error is None
assert final_response == "ok"
def test_run_job_preserves_credential_file_passthrough_into_worker_thread(self, tmp_path):
"""copy_context() also propagates credential_files ContextVar."""
job = {
"id": "cred-env-job",
"name": "cred file test",
"prompt": "Use the skill.",
"skill": "google-workspace",
}
fake_db = MagicMock()
# Create a credential file so register_credential_file succeeds
cred_dir = tmp_path / "credentials"
cred_dir.mkdir()
(cred_dir / "google_token.json").write_text('{"token": "t"}')
def _skill_view(name):
assert name == "google-workspace"
from tools.credential_files import register_credential_file
register_credential_file("credentials/google_token.json")
return json.dumps({"success": True, "content": "# google-workspace\nUse Google."})
def _run_conversation(prompt):
from tools.credential_files import _get_registered
registered = _get_registered()
assert registered, "credential files must be visible in worker thread"
assert any("google_token.json" in v for v in registered.values())
return {"final_response": "ok"}
with patch("cron.scheduler._hermes_home", tmp_path), \
patch("cron.scheduler._resolve_origin", return_value=None), \
patch("tools.credential_files._resolve_hermes_home", return_value=tmp_path), \
patch("dotenv.load_dotenv"), \
patch("hermes_state.SessionDB", return_value=fake_db), \
patch(
"hermes_cli.runtime_provider.resolve_runtime_provider",
return_value={
"api_key": "***",
"base_url": "https://example.invalid/v1",
"provider": "openrouter",
"api_mode": "chat_completions",
},
), \
patch("tools.skills_tool.skill_view", side_effect=_skill_view), \
patch("run_agent.AIAgent") as mock_agent_cls:
mock_agent = MagicMock()
mock_agent.run_conversation.side_effect = _run_conversation
mock_agent_cls.return_value = mock_agent
try:
success, output, final_response, error = run_job(job)
finally:
clear_credential_files()
assert success is True
assert error is None
assert final_response == "ok"
def test_run_job_loads_skill_and_disables_recursive_cron_tools(self, tmp_path):
job = {
"id": "skill-job",
-66
View File
@@ -1,66 +0,0 @@
"""Shared fixtures for gateway tests.
The ``_ensure_telegram_mock`` helper guarantees that a minimal mock of
the ``telegram`` package is registered in :data:`sys.modules` **before**
any test file triggers ``from gateway.platforms.telegram import ...``.
Without this, ``pytest-xdist`` workers that happen to collect
``test_telegram_caption_merge.py`` (bare top-level import, no per-file
mock) first will cache ``ChatType = None`` from the production
ImportError fallback, causing 30+ downstream test failures wherever
``ChatType.GROUP`` / ``ChatType.SUPERGROUP`` is accessed.
Individual test files may still call their own ``_ensure_telegram_mock``
it short-circuits when the mock is already present.
"""
import sys
from unittest.mock import MagicMock
def _ensure_telegram_mock() -> None:
"""Install a comprehensive telegram mock in sys.modules.
Idempotent skips when the real library is already imported.
Uses ``sys.modules[name] = mod`` (overwrite) instead of
``setdefault`` so it wins even if a partial/broken import
already cached a module with ``ChatType = None``.
"""
if "telegram" in sys.modules and hasattr(sys.modules["telegram"], "__file__"):
return # Real library is installed — nothing to mock
mod = MagicMock()
mod.ext.ContextTypes.DEFAULT_TYPE = type(None)
mod.constants.ParseMode.MARKDOWN = "Markdown"
mod.constants.ParseMode.MARKDOWN_V2 = "MarkdownV2"
mod.constants.ParseMode.HTML = "HTML"
mod.constants.ChatType.PRIVATE = "private"
mod.constants.ChatType.GROUP = "group"
mod.constants.ChatType.SUPERGROUP = "supergroup"
mod.constants.ChatType.CHANNEL = "channel"
# Real exception classes so ``except (NetworkError, ...)`` clauses
# in production code don't blow up with TypeError.
mod.error.NetworkError = type("NetworkError", (OSError,), {})
mod.error.TimedOut = type("TimedOut", (OSError,), {})
mod.error.BadRequest = type("BadRequest", (Exception,), {})
mod.error.Forbidden = type("Forbidden", (Exception,), {})
mod.error.InvalidToken = type("InvalidToken", (Exception,), {})
mod.error.RetryAfter = type("RetryAfter", (Exception,), {"retry_after": 1})
mod.error.Conflict = type("Conflict", (Exception,), {})
# Update.ALL_TYPES used in start_polling()
mod.Update.ALL_TYPES = []
for name in (
"telegram",
"telegram.ext",
"telegram.constants",
"telegram.request",
):
sys.modules[name] = mod
sys.modules["telegram.error"] = mod.error
# Run at collection time — before any test file's module-level imports.
_ensure_telegram_mock()
-169
View File
@@ -1016,47 +1016,6 @@ class TestResponsesEndpoint:
assert len(call_kwargs["conversation_history"]) > 0
assert call_kwargs["user_message"] == "Now add 1 more"
@pytest.mark.asyncio
async def test_previous_response_id_preserves_session(self, adapter):
"""Chained responses via previous_response_id reuse the same session_id."""
mock_result = {
"final_response": "ok",
"messages": [{"role": "assistant", "content": "ok"}],
"api_calls": 1,
}
usage = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
app = _create_app(adapter)
async with TestClient(TestServer(app)) as cli:
# First request — establishes a session
with patch.object(adapter, "_run_agent", new_callable=AsyncMock) as mock_run:
mock_run.return_value = (mock_result, usage)
resp1 = await cli.post(
"/v1/responses",
json={"model": "hermes-agent", "input": "Hello"},
)
assert resp1.status == 200
first_session_id = mock_run.call_args.kwargs["session_id"]
data1 = await resp1.json()
response_id = data1["id"]
# Second request — chains from the first
with patch.object(adapter, "_run_agent", new_callable=AsyncMock) as mock_run:
mock_run.return_value = (mock_result, usage)
resp2 = await cli.post(
"/v1/responses",
json={
"model": "hermes-agent",
"input": "Follow up",
"previous_response_id": response_id,
},
)
assert resp2.status == 200
second_session_id = mock_run.call_args.kwargs["session_id"]
# Session must be the same across the chain
assert first_session_id == second_session_id
@pytest.mark.asyncio
async def test_invalid_previous_response_id_returns_404(self, adapter):
app = _create_app(adapter)
@@ -1156,134 +1115,6 @@ class TestResponsesEndpoint:
assert resp.status == 400
class TestResponsesStreaming:
@pytest.mark.asyncio
async def test_stream_true_returns_responses_sse(self, adapter):
app = _create_app(adapter)
async with TestClient(TestServer(app)) as cli:
async def _mock_run_agent(**kwargs):
cb = kwargs.get("stream_delta_callback")
if cb:
cb("Hello")
cb(" world")
return (
{"final_response": "Hello world", "messages": [], "api_calls": 1},
{"input_tokens": 10, "output_tokens": 5, "total_tokens": 15},
)
with patch.object(adapter, "_run_agent", side_effect=_mock_run_agent):
resp = await cli.post(
"/v1/responses",
json={"model": "hermes-agent", "input": "hi", "stream": True},
)
assert resp.status == 200
assert "text/event-stream" in resp.headers.get("Content-Type", "")
body = await resp.text()
assert "event: response.created" in body
assert "event: response.output_text.delta" in body
assert "event: response.output_text.done" in body
assert "event: response.completed" in body
assert '"sequence_number":' in body
assert '"logprobs": []' in body
assert "Hello" in body
assert " world" in body
@pytest.mark.asyncio
async def test_stream_emits_function_call_and_output_items(self, adapter):
app = _create_app(adapter)
async with TestClient(TestServer(app)) as cli:
async def _mock_run_agent(**kwargs):
start_cb = kwargs.get("tool_start_callback")
complete_cb = kwargs.get("tool_complete_callback")
text_cb = kwargs.get("stream_delta_callback")
if start_cb:
start_cb("call_123", "read_file", {"path": "/tmp/test.txt"})
if complete_cb:
complete_cb("call_123", "read_file", {"path": "/tmp/test.txt"}, '{"content":"hello"}')
if text_cb:
text_cb("Done.")
return (
{
"final_response": "Done.",
"messages": [
{
"role": "assistant",
"tool_calls": [
{
"id": "call_123",
"function": {
"name": "read_file",
"arguments": '{"path":"/tmp/test.txt"}',
},
}
],
},
{
"role": "tool",
"tool_call_id": "call_123",
"content": '{"content":"hello"}',
},
],
"api_calls": 1,
},
{"input_tokens": 10, "output_tokens": 5, "total_tokens": 15},
)
with patch.object(adapter, "_run_agent", side_effect=_mock_run_agent):
resp = await cli.post(
"/v1/responses",
json={"model": "hermes-agent", "input": "read the file", "stream": True},
)
assert resp.status == 200
body = await resp.text()
assert "event: response.output_item.added" in body
assert "event: response.output_item.done" in body
assert body.count("event: response.output_item.done") >= 2
assert '"type": "function_call"' in body
assert '"type": "function_call_output"' in body
assert '"call_id": "call_123"' in body
assert '"name": "read_file"' in body
assert '"output": [{"type": "input_text", "text": "{\\"content\\":\\"hello\\"}"}]' in body
@pytest.mark.asyncio
async def test_streamed_response_is_stored_for_get(self, adapter):
app = _create_app(adapter)
async with TestClient(TestServer(app)) as cli:
async def _mock_run_agent(**kwargs):
cb = kwargs.get("stream_delta_callback")
if cb:
cb("Stored response")
return (
{"final_response": "Stored response", "messages": [], "api_calls": 1},
{"input_tokens": 1, "output_tokens": 2, "total_tokens": 3},
)
with patch.object(adapter, "_run_agent", side_effect=_mock_run_agent):
resp = await cli.post(
"/v1/responses",
json={"model": "hermes-agent", "input": "store this", "stream": True},
)
body = await resp.text()
response_id = None
for line in body.splitlines():
if line.startswith("data: "):
try:
payload = json.loads(line[len("data: "):])
except json.JSONDecodeError:
continue
if payload.get("type") == "response.completed":
response_id = payload["response"]["id"]
break
assert response_id
get_resp = await cli.get(f"/v1/responses/{response_id}")
assert get_resp.status == 200
data = await get_resp.json()
assert data["id"] == response_id
assert data["status"] == "completed"
assert data["output"][-1]["content"][0]["text"] == "Stored response"
# ---------------------------------------------------------------------------
# Auth on endpoints
# ---------------------------------------------------------------------------
-95
View File
@@ -1,95 +0,0 @@
"""Tests for the auto-continue feature (#4493).
When the gateway restarts mid-agent-work, the session transcript ends on a
tool result that the agent never processed. The auto-continue logic detects
this and prepends a system note to the next user message so the model
finishes the interrupted work before addressing the new input.
"""
import pytest
def _simulate_auto_continue(agent_history: list, user_message: str) -> str:
"""Reproduce the auto-continue injection logic from _run_agent().
This mirrors the exact code in gateway/run.py so we can test the
detection and message transformation without spinning up a full
gateway runner.
"""
message = user_message
if agent_history and agent_history[-1].get("role") == "tool":
message = (
"[System note: Your previous turn was interrupted before you could "
"process the last tool result(s). The conversation history contains "
"tool outputs you haven't responded to yet. Please finish processing "
"those results and summarize what was accomplished, then address the "
"user's new message below.]\n\n"
+ message
)
return message
class TestAutoDetection:
"""Test that trailing tool results are correctly detected."""
def test_trailing_tool_result_triggers_note(self):
history = [
{"role": "user", "content": "deploy the app"},
{"role": "assistant", "content": None, "tool_calls": [
{"id": "call_1", "function": {"name": "terminal", "arguments": "{}"}}
]},
{"role": "tool", "tool_call_id": "call_1", "content": "deployed successfully"},
]
result = _simulate_auto_continue(history, "what happened?")
assert "[System note:" in result
assert "interrupted" in result
assert "what happened?" in result
def test_trailing_assistant_message_no_note(self):
history = [
{"role": "user", "content": "hello"},
{"role": "assistant", "content": "Hi there!"},
]
result = _simulate_auto_continue(history, "how are you?")
assert "[System note:" not in result
assert result == "how are you?"
def test_empty_history_no_note(self):
result = _simulate_auto_continue([], "hello")
assert result == "hello"
def test_trailing_user_message_no_note(self):
"""Shouldn't happen in practice, but ensure no false positive."""
history = [
{"role": "user", "content": "hello"},
]
result = _simulate_auto_continue(history, "hello again")
assert result == "hello again"
def test_multiple_tool_results_still_triggers(self):
"""Multiple tool calls in a row — last one is still role=tool."""
history = [
{"role": "user", "content": "search and read"},
{"role": "assistant", "content": None, "tool_calls": [
{"id": "call_1", "function": {"name": "search", "arguments": "{}"}},
{"id": "call_2", "function": {"name": "read", "arguments": "{}"}},
]},
{"role": "tool", "tool_call_id": "call_1", "content": "found it"},
{"role": "tool", "tool_call_id": "call_2", "content": "file content here"},
]
result = _simulate_auto_continue(history, "continue")
assert "[System note:" in result
def test_original_message_preserved_after_note(self):
"""The user's actual message must appear after the system note."""
history = [
{"role": "assistant", "content": None, "tool_calls": [
{"id": "c1", "function": {"name": "t", "arguments": "{}"}}
]},
{"role": "tool", "tool_call_id": "c1", "content": "done"},
]
result = _simulate_auto_continue(history, "now do X")
# System note comes first, then user's message
note_end = result.index("]\n\n")
user_msg_start = result.index("now do X")
assert user_msg_start > note_end
@@ -14,7 +14,7 @@ from unittest.mock import AsyncMock, patch
import pytest
from gateway.config import GatewayConfig, Platform
from gateway.run import GatewayRunner, _parse_session_key
from gateway.run import GatewayRunner
# ---------------------------------------------------------------------------
@@ -45,7 +45,7 @@ def _build_runner(monkeypatch, tmp_path, mode: str) -> GatewayRunner:
monkeypatch.setattr(gateway_run, "_hermes_home", tmp_path)
runner = GatewayRunner(GatewayConfig())
adapter = SimpleNamespace(send=AsyncMock(), handle_message=AsyncMock())
adapter = SimpleNamespace(send=AsyncMock())
runner.adapters[Platform.TELEGRAM] = adapter
return runner
@@ -243,174 +243,3 @@ async def test_no_thread_id_sends_no_metadata(monkeypatch, tmp_path):
assert adapter.send.await_count == 1
_, kwargs = adapter.send.call_args
assert kwargs["metadata"] is None
@pytest.mark.asyncio
async def test_inject_watch_notification_routes_from_session_store_origin(monkeypatch, tmp_path):
from gateway.session import SessionSource
runner = _build_runner(monkeypatch, tmp_path, "all")
adapter = runner.adapters[Platform.TELEGRAM]
runner.session_store._entries["agent:main:telegram:group:-100:42"] = SimpleNamespace(
origin=SessionSource(
platform=Platform.TELEGRAM,
chat_id="-100",
chat_type="group",
thread_id="42",
user_id="123",
user_name="Emiliyan",
)
)
evt = {
"session_id": "proc_watch",
"session_key": "agent:main:telegram:group:-100:42",
}
await runner._inject_watch_notification("[SYSTEM: Background process matched]", evt)
adapter.handle_message.assert_awaited_once()
synth_event = adapter.handle_message.await_args.args[0]
assert synth_event.internal is True
assert synth_event.source.platform == Platform.TELEGRAM
assert synth_event.source.chat_id == "-100"
assert synth_event.source.chat_type == "group"
assert synth_event.source.thread_id == "42"
assert synth_event.source.user_id == "123"
assert synth_event.source.user_name == "Emiliyan"
def test_build_process_event_source_falls_back_to_session_key_chat_type(monkeypatch, tmp_path):
runner = _build_runner(monkeypatch, tmp_path, "all")
evt = {
"session_id": "proc_watch",
"session_key": "agent:main:telegram:group:-100:42",
"platform": "telegram",
"chat_id": "-100",
"thread_id": "42",
"user_id": "123",
"user_name": "Emiliyan",
}
source = runner._build_process_event_source(evt)
assert source is not None
assert source.platform == Platform.TELEGRAM
assert source.chat_id == "-100"
assert source.chat_type == "group"
assert source.thread_id == "42"
assert source.user_id == "123"
assert source.user_name == "Emiliyan"
@pytest.mark.asyncio
async def test_inject_watch_notification_ignores_foreground_event_source(monkeypatch, tmp_path):
"""Negative test: watch notification must NOT route to the foreground thread."""
from gateway.session import SessionSource
runner = _build_runner(monkeypatch, tmp_path, "all")
adapter = runner.adapters[Platform.TELEGRAM]
# Session store has the process's original thread (thread 42)
runner.session_store._entries["agent:main:telegram:group:-100:42"] = SimpleNamespace(
origin=SessionSource(
platform=Platform.TELEGRAM,
chat_id="-100",
chat_type="group",
thread_id="42",
user_id="proc_owner",
user_name="alice",
)
)
# The evt dict carries the correct session_key — NOT a foreground event
evt = {
"session_id": "proc_cross_thread",
"session_key": "agent:main:telegram:group:-100:42",
}
await runner._inject_watch_notification("[SYSTEM: watch match]", evt)
adapter.handle_message.assert_awaited_once()
synth_event = adapter.handle_message.await_args.args[0]
# Must route to thread 42 (process origin), NOT some other thread
assert synth_event.source.thread_id == "42"
assert synth_event.source.user_id == "proc_owner"
def test_build_process_event_source_returns_none_for_empty_evt(monkeypatch, tmp_path):
"""Missing session_key and no platform metadata → None (drop notification)."""
runner = _build_runner(monkeypatch, tmp_path, "all")
source = runner._build_process_event_source({"session_id": "proc_orphan"})
assert source is None
def test_build_process_event_source_returns_none_for_invalid_platform(monkeypatch, tmp_path):
"""Invalid platform string → None."""
runner = _build_runner(monkeypatch, tmp_path, "all")
evt = {
"session_id": "proc_bad",
"platform": "not_a_real_platform",
"chat_type": "dm",
"chat_id": "123",
}
source = runner._build_process_event_source(evt)
assert source is None
def test_build_process_event_source_returns_none_for_short_session_key(monkeypatch, tmp_path):
"""Session key with <5 parts doesn't parse, falls through to empty metadata → None."""
runner = _build_runner(monkeypatch, tmp_path, "all")
evt = {
"session_id": "proc_short",
"session_key": "agent:main:telegram", # Too few parts
}
source = runner._build_process_event_source(evt)
assert source is None
# ---------------------------------------------------------------------------
# _parse_session_key helper
# ---------------------------------------------------------------------------
def test_parse_session_key_valid():
result = _parse_session_key("agent:main:telegram:group:-100")
assert result == {"platform": "telegram", "chat_type": "group", "chat_id": "-100"}
def test_parse_session_key_with_extra_parts():
"""6th part in a group key may be a user_id, not a thread_id — omit it."""
result = _parse_session_key("agent:main:discord:group:chan123:thread456")
assert result == {"platform": "discord", "chat_type": "group", "chat_id": "chan123"}
def test_parse_session_key_with_user_id_part():
"""Group keys with per-user isolation have user_id as 6th part — don't return as thread_id."""
result = _parse_session_key("agent:main:telegram:group:chat1:user99")
assert result == {"platform": "telegram", "chat_type": "group", "chat_id": "chat1"}
def test_parse_session_key_dm_with_thread():
"""DM keys use parts[5] as thread_id unambiguously."""
result = _parse_session_key("agent:main:telegram:dm:chat1:topic42")
assert result == {"platform": "telegram", "chat_type": "dm", "chat_id": "chat1", "thread_id": "topic42"}
def test_parse_session_key_thread_chat_type():
"""Thread-typed keys use parts[5] as thread_id unambiguously."""
result = _parse_session_key("agent:main:discord:thread:chan1:thread99")
assert result == {"platform": "discord", "chat_type": "thread", "chat_id": "chan1", "thread_id": "thread99"}
def test_parse_session_key_too_short():
assert _parse_session_key("agent:main:telegram") is None
assert _parse_session_key("") is None
def test_parse_session_key_wrong_prefix():
assert _parse_session_key("cron:main:telegram:dm:123") is None
assert _parse_session_key("agent:cron:telegram:dm:123") is None

Some files were not shown because too many files have changed in this diff Show More