Compare commits

1 Commits

Author SHA1 Message Date
Shannon Sands bad9fe2452 add generic gateway startup readiness checks 2026-04-15 10:03:23 +10:00
302 changed files with 5856 additions and 23827 deletions
-13
@@ -24,15 +24,6 @@
# Optional base URL override (default: Google's OpenAI-compatible endpoint)
# GEMINI_BASE_URL=https://generativelanguage.googleapis.com/v1beta/openai
# =============================================================================
# LLM PROVIDER (Ollama Cloud)
# =============================================================================
# Cloud-hosted open models via Ollama's OpenAI-compatible endpoint.
# Get your key at: https://ollama.com/settings
# OLLAMA_API_KEY=your_ollama_key_here
# Optional base URL override (default: https://ollama.com/v1)
# OLLAMA_BASE_URL=https://ollama.com/v1
# =============================================================================
# LLM PROVIDER (z.ai / GLM)
# =============================================================================
@@ -154,10 +145,6 @@
# Only override here if you need to force a backend without touching config.yaml:
# TERMINAL_ENV=local
# Override the container runtime binary (e.g. to use Podman instead of Docker).
# Useful on systems where Docker's storage driver is broken or unavailable.
# HERMES_DOCKER_BINARY=/usr/local/bin/podman
# Container images (for singularity/docker/modal backends)
# TERMINAL_DOCKER_IMAGE=nikolaik/python-nodejs:python3.11-nodejs20
# TERMINAL_SINGULARITY_IMAGE=docker://nikolaik/python-nodejs:python3.11-nodejs20
+8 -10
@@ -1,12 +1,11 @@
name: Deploy Site
on:
release:
types: [published]
push:
branches: [main]
paths:
- 'website/**'
- 'landingpage/**'
- 'skills/**'
- 'optional-skills/**'
- '.github/workflows/deploy-site.yml'
@@ -21,14 +20,8 @@ concurrency:
cancel-in-progress: false
jobs:
deploy-vercel:
if: github.event_name == 'release'
runs-on: ubuntu-latest
steps:
- name: Trigger Vercel Deploy
run: curl -X POST "${{ secrets.VERCEL_DEPLOY_HOOK }}"
deploy-docs:
build-and-deploy:
# Only run on the upstream repository, not on forks
if: github.repository == 'NousResearch/hermes-agent'
runs-on: ubuntu-latest
environment:
@@ -72,7 +65,12 @@ jobs:
- name: Stage deployment
run: |
mkdir -p _site/docs
# Landing page at root
cp -r landingpage/* _site/
# Docusaurus at /docs/
cp -r website/build/* _site/docs/
# CNAME so GitHub Pages keeps the custom domain between deploys
echo "hermes-agent.nousresearch.com" > _site/CNAME
- name: Upload artifact
uses: actions/upload-pages-artifact@56afc609e74202658d3ffba0e8f6dda462b719fa # v3
-1
@@ -105,4 +105,3 @@ tesseracttars-creator <tesseracttars@gmail.com> <tesseracttars@gmail.com>
xinbenlv <zzn+pa@zzn.im> <zzn+pa@zzn.im>
SaulJWu <saul.jj.wu@gmail.com> <saul.jj.wu@gmail.com>
angelos <angelos@oikos.lan.home.malaiwah.com> <angelos@oikos.lan.home.malaiwah.com>
MestreY0d4-Uninter <241404605+MestreY0d4-Uninter@users.noreply.github.com> <MestreY0d4-Uninter@users.noreply.github.com>
+4 -4
@@ -13,7 +13,7 @@ source venv/bin/activate # ALWAYS activate before running Python
```
hermes-agent/
├── run_agent.py # AIAgent class — core conversation loop
├── model_tools.py # Tool orchestration, discover_builtin_tools(), handle_function_call()
├── model_tools.py # Tool orchestration, _discover_tools(), handle_function_call()
├── toolsets.py # Toolset definitions, _HERMES_CORE_TOOLS list
├── cli.py # HermesCLI class — interactive CLI orchestrator
├── hermes_state.py # SessionDB — SQLite session store (FTS5 search)
@@ -181,7 +181,7 @@ if canonical == "mycommand":
## Adding New Tools
Requires changes in **2 files**:
Requires changes in **3 files**:
**1. Create `tools/your_tool.py`:**
```python
@@ -204,9 +204,9 @@ registry.register(
)
```
**2. Add to `toolsets.py`** — either `_HERMES_CORE_TOOLS` (all platforms) or a new toolset.
**2. Add import** in `model_tools.py` `_discover_tools()` list.
Auto-discovery: any `tools/*.py` file with a top-level `registry.register()` call is imported automatically — no manual import list to maintain.
**3. Add to `toolsets.py`** — either `_HERMES_CORE_TOOLS` (all platforms) or a new toolset.
The registry handles schema collection, dispatch, availability checking, and error wrapping. All handlers MUST return a JSON string.
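A minimal sketch of a handler that honors the JSON-string contract described above. The name `word_count_handler` and its argument shape are hypothetical, and the `registry.register(...)` call is left out because its full signature is not shown here.

```python
import json


def word_count_handler(args: dict) -> str:
    """Hypothetical tool handler: counts words and always returns a JSON string."""
    try:
        text = args["text"]
        if not isinstance(text, str):
            raise TypeError("'text' must be a string")
        result = {"success": True, "word_count": len(text.split())}
    except (KeyError, TypeError) as exc:
        # Failures are reported as JSON too, so the caller never sees a raw exception.
        result = {"success": False, "error": f"{type(exc).__name__}: {exc}"}
    return json.dumps(result)
```

Returning a serialized error instead of raising keeps the return type uniform; whether the real registry expects this exact error shape is an assumption.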
-27
@@ -1,27 +0,0 @@
# Hermes Agent v0.10.0 (v2026.4.16)
**Release Date:** April 16, 2026
> The Tool Gateway release — paid Nous Portal subscribers can now use web search, image generation, text-to-speech, and browser automation through their existing subscription with zero additional API keys.
---
## ✨ Highlights
- **Nous Tool Gateway** — Paid [Nous Portal](https://portal.nousresearch.com) subscribers now get automatic access to **web search** (Firecrawl), **image generation** (FAL / FLUX 2 Pro), **text-to-speech** (OpenAI TTS), and **browser automation** (Browser Use) through their existing subscription. No separate API keys needed — just run `hermes model`, select Nous Portal, and pick which tools to enable. Per-tool opt-in via `use_gateway` config, full integration with `hermes tools` and `hermes status`, and the runtime correctly prefers the gateway even when direct API keys exist. Replaces the old hidden `HERMES_ENABLE_NOUS_MANAGED_TOOLS` env var with clean subscription-based detection. ([#11206](https://github.com/NousResearch/hermes-agent/pull/11206), based on work by @jquesnelle; docs: [#11208](https://github.com/NousResearch/hermes-agent/pull/11208))
---
## 🐛 Bug Fixes & Improvements
This release includes 180+ commits with numerous bug fixes, platform improvements, and reliability enhancements across the agent core, gateway, CLI, and tool system. Full details will be published in the v0.11.0 changelog.
---
## 👥 Contributors
- **@jquesnelle** (emozilla) — Original Tool Gateway implementation ([#10799](https://github.com/NousResearch/hermes-agent/pull/10799)), salvaged and shipped in this release
---
**Full Changelog**: [v2026.4.13...v2026.4.16](https://github.com/NousResearch/hermes-agent/compare/v2026.4.13...v2026.4.16)
-84
@@ -1,84 +0,0 @@
# Hermes Agent Security Policy
This document outlines the security protocols, trust model, and deployment hardening guidelines for the **Hermes Agent** project.
## 1. Vulnerability Reporting
Hermes Agent does **not** operate a bug bounty program. Security issues should be reported via [GitHub Security Advisories (GHSA)](https://github.com/NousResearch/hermes-agent/security/advisories/new) or by emailing **security@nousresearch.com**. Do not open public issues for security vulnerabilities.
### Required Submission Details
- **Title & Severity:** Concise description and CVSS score/rating.
- **Affected Component:** Exact file path and line range (e.g., `tools/approval.py:120-145`).
- **Environment:** Output of `hermes version`, commit SHA, OS, and Python version.
- **Reproduction:** Step-by-step Proof-of-Concept (PoC) against `main` or the latest release.
- **Impact:** Explanation of what trust boundary was crossed.
---
## 2. Trust Model
The core assumption is that Hermes is a **personal agent** with one trusted operator.
### Operator & Session Trust
- **Single Tenant:** The system protects the operator from LLM actions, not from malicious co-tenants. Multi-user isolation must happen at the OS/host level.
- **Gateway Security:** Authorized callers (Telegram, Discord, Slack, etc.) receive equal trust. Session keys are used for routing, not as authorization boundaries.
- **Execution:** Defaults to `terminal.backend: local` (direct host execution). Container isolation (Docker, Modal, Daytona) is opt-in for sandboxing.
### Dangerous Command Approval
The approval system (`tools/approval.py`) is a core security boundary. Terminal commands, file operations, and other potentially destructive actions are gated behind explicit user confirmation before execution. The approval mode is configurable via `approvals.mode` in `config.yaml`:
- `"on"` (default) — prompts the user to approve dangerous commands.
- `"auto"` — auto-approves after a configurable delay.
- `"off"` — disables the gate entirely (break-glass; see Section 3).
### Output Redaction
`agent/redact.py` strips secret-like patterns (API keys, tokens, credentials) from all display output before it reaches the terminal or gateway platform. This prevents accidental credential leakage in chat logs, tool previews, and response text. Redaction operates on the display layer only — underlying values remain intact for internal agent operations.
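As a rough illustration of display-layer redaction, the sketch below masks secret-like substrings in a copy of the text and leaves the original value untouched. The two patterns and the name `redact_for_display` are assumptions made for this example; the actual pattern list in `agent/redact.py` is not shown in this document.

```python
import re

# Hypothetical patterns, chosen only for illustration.
_SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{16,}"),               # "sk-" style API keys
    re.compile(r"(?i)(bearer\s+)[A-Za-z0-9._-]{16,}"),  # bearer tokens
]


def redact_for_display(text: str) -> str:
    """Return a copy of text with secret-like substrings masked."""
    for pattern in _SECRET_PATTERNS:
        # Keep a captured prefix (such as "Bearer ") and mask only the secret part.
        text = pattern.sub(
            lambda m: (m.group(1) if m.groups() else "") + "[REDACTED]", text
        )
    return text
```

Because the function returns a new string, a caller can log or display the redacted copy while still passing the original value to internal operations.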
### Skills vs. MCP Servers
- **Installed Skills:** High trust. Equivalent to local host code; skills can read environment variables and run arbitrary commands.
- **MCP Servers:** Lower trust. MCP subprocesses receive a filtered environment (`_build_safe_env()` in `tools/mcp_tool.py`) — only safe baseline variables (`PATH`, `HOME`, `XDG_*`) plus variables explicitly declared in the server's `env` config block are passed through. Host credentials are stripped by default. Additionally, packages invoked via `npx`/`uvx` are checked against the OSV malware database before spawning.
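The filtered-environment idea for MCP subprocesses can be sketched as follows. This is not the body of `_build_safe_env()`; the function name `build_filtered_env`, its parameters, and the exact allowlist are assumptions derived from the baseline variables named above (`PATH`, `HOME`, `XDG_*`).

```python
import os
from typing import Dict, Optional

# Assumed baseline, taken from the variables named in the policy text.
SAFE_NAMES = {"PATH", "HOME"}
SAFE_PREFIXES = ("XDG_",)


def build_filtered_env(
    declared: Dict[str, str],
    host_env: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Return a subprocess environment that omits undeclared host variables."""
    source = dict(os.environ) if host_env is None else host_env
    env = {
        name: value
        for name, value in source.items()
        if name in SAFE_NAMES or name.startswith(SAFE_PREFIXES)
    }
    # Variables explicitly declared for the server are passed through as-is.
    env.update(declared)
    return env
```

An allowlist is used here rather than a denylist of known credential names, so a newly added host secret is excluded by default.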
### Code Execution Sandbox
The `execute_code` tool (`tools/code_execution_tool.py`) runs LLM-generated Python scripts in a child process with API keys and tokens stripped from the environment to prevent credential exfiltration. Only environment variables explicitly declared by loaded skills (via `env_passthrough`) or by the user in `config.yaml` (`terminal.env_passthrough`) are passed through. The child accesses Hermes tools via RPC, not direct API calls.
### Subagents
- **No recursive delegation:** The `delegate_task` tool is disabled for child agents.
- **Depth limit:** `MAX_DEPTH = 2` — parent (depth 0) can spawn a child (depth 1); grandchildren are rejected.
- **Memory isolation:** Subagents run with `skip_memory=True` and do not have access to the parent's persistent memory provider. The parent receives only the task prompt and final response as an observation.
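The depth rule above reduces to a single comparison. In this sketch only `MAX_DEPTH = 2` comes from the policy text; the helper name `can_spawn_child` is hypothetical.

```python
MAX_DEPTH = 2  # value stated in the policy text


def can_spawn_child(parent_depth: int) -> bool:
    """A child runs at parent_depth + 1, which must stay below MAX_DEPTH."""
    return parent_depth + 1 < MAX_DEPTH
```

With this check a depth-0 parent may spawn a depth-1 child, while a depth-1 agent is refused because its child would sit at depth 2.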
---
## 3. Out of Scope (Non-Vulnerabilities)
The following scenarios are **not** considered security breaches:
- **Prompt Injection:** Unless it results in a concrete bypass of the approval system, toolset restrictions, or container sandbox.
- **Public Exposure:** Deploying the gateway to the public internet without external authentication or network protection.
- **Trusted State Access:** Reports that require pre-existing write access to `~/.hermes/`, `.env`, or `config.yaml` (these are operator-owned files).
- **Default Behavior:** Host-level command execution when `terminal.backend` is set to `local` — this is the documented default, not a vulnerability.
- **Configuration Trade-offs:** Intentional break-glass settings such as `approvals.mode: "off"` or `terminal.backend: local` in production.
- **Tool-level read/access restrictions:** The agent has unrestricted shell access via the `terminal` tool by design. Reports that a specific tool (e.g., `read_file`) can access a resource are not vulnerabilities if the same access is available through `terminal`. Tool-level deny lists only constitute a meaningful security boundary when paired with equivalent restrictions on the terminal side (as with write operations, where `WRITE_DENIED_PATHS` is paired with the dangerous command approval system).
---
## 4. Deployment Hardening & Best Practices
### Filesystem & Network
- **Production sandboxing:** Use container backends (`docker`, `modal`, `daytona`) instead of `local` for untrusted workloads.
- **File permissions:** Run as non-root (the Docker image uses UID 10000); protect credentials with `chmod 600 ~/.hermes/.env` on local installs.
- **Network exposure:** Do not expose the gateway or API server to the public internet without VPN, Tailscale, or firewall protection. SSRF protection is enabled by default across all gateway platform adapters (Telegram, Discord, Slack, Matrix, Mattermost, etc.) with redirect validation. Note: the local terminal backend does not apply SSRF filtering, as it operates within the trusted operator's environment.
### Skills & Supply Chain
- **Skill installation:** Review Skills Guard reports (`tools/skills_guard.py`) before installing third-party skills. The audit log at `~/.hermes/skills/.hub/audit.log` tracks every install and removal.
- **MCP safety:** OSV malware checking runs automatically for `npx`/`uvx` packages before MCP server processes are spawned.
- **CI/CD:** GitHub Actions are pinned to full commit SHAs. The `supply-chain-audit.yml` workflow blocks PRs containing `.pth` files or suspicious `base64`+`exec` patterns.
### Credential Storage
- API keys and tokens belong exclusively in `~/.hermes/.env` — never in `config.yaml` or checked into version control.
- The credential pool system (`agent/credential_pool.py`) handles key rotation and fallback. Credentials are resolved from environment variables, not stored in plaintext databases.
---
## 5. Disclosure Process
- **Coordinated Disclosure:** 90-day window or until a fix is released, whichever comes first.
- **Communication:** All updates occur via the GHSA thread or email correspondence with security@nousresearch.com.
- **Credits:** Reporters are credited in release notes unless anonymity is requested.
+11 -120
@@ -28,45 +28,19 @@ except ImportError:
logger = logging.getLogger(__name__)
THINKING_BUDGET = {"xhigh": 32000, "high": 16000, "medium": 8000, "low": 4000}
# Hermes effort → Anthropic adaptive-thinking effort (output_config.effort).
# Anthropic exposes 5 levels on 4.7+: low, medium, high, xhigh, max.
# Opus/Sonnet 4.6 only expose 4 levels: low, medium, high, max — no xhigh.
# We preserve xhigh as xhigh on 4.7+ (the recommended default for coding/
# agentic work) and downgrade it to max on pre-4.7 adaptive models (which
# is the strongest level they accept). "minimal" is a legacy alias that
# maps to low on every model. See:
# https://platform.claude.com/docs/en/about-claude/models/migration-guide
ADAPTIVE_EFFORT_MAP = {
"max": "max",
"xhigh": "xhigh",
"high": "high",
"medium": "medium",
"low": "low",
"xhigh": "max",
"high": "high",
"medium": "medium",
"low": "low",
"minimal": "low",
}
# Models that accept the "xhigh" output_config.effort level. Opus 4.7 added
# xhigh as a distinct level between high and max; older adaptive-thinking
# models (4.6) reject it with a 400. Keep this substring list in sync with
# the Anthropic migration guide as new model families ship.
_XHIGH_EFFORT_SUBSTRINGS = ("4-7", "4.7")
# Models where extended thinking is deprecated/removed (4.6+ behavior: adaptive
# is the only supported mode; 4.7 additionally forbids manual thinking entirely
# and drops temperature/top_p/top_k).
_ADAPTIVE_THINKING_SUBSTRINGS = ("4-6", "4.6", "4-7", "4.7")
# Models where temperature/top_p/top_k return 400 if set to non-default values.
# This is the Opus 4.7 contract; future 4.x+ models are expected to follow it.
_NO_SAMPLING_PARAMS_SUBSTRINGS = ("4-7", "4.7")
# ── Max output token limits per Anthropic model ───────────────────────
# Source: Anthropic docs + Cline model catalog. Anthropic's API requires
# max_tokens as a mandatory field. Previously we hardcoded 16384, which
# starves thinking-enabled models (thinking tokens count toward the limit).
_ANTHROPIC_OUTPUT_LIMITS = {
# Claude 4.7
"claude-opus-4-7": 128_000,
# Claude 4.6
"claude-opus-4-6": 128_000,
"claude-sonnet-4-6": 64_000,
@@ -117,37 +91,11 @@ def _get_anthropic_max_output(model: str) -> int:
def _supports_adaptive_thinking(model: str) -> bool:
"""Return True for Claude 4.6+ models that support adaptive thinking."""
return any(v in model for v in _ADAPTIVE_THINKING_SUBSTRINGS)
"""Return True for Claude 4.6 models that support adaptive thinking."""
return any(v in model for v in ("4-6", "4.6"))
def _supports_xhigh_effort(model: str) -> bool:
"""Return True for models that accept the 'xhigh' adaptive effort level.
Opus 4.7 introduced xhigh as a distinct level between high and max.
Pre-4.7 adaptive models (Opus/Sonnet 4.6) only accept low/medium/high/max
and reject xhigh with an HTTP 400. Callers should downgrade xhigh→max
when this returns False.
"""
return any(v in model for v in _XHIGH_EFFORT_SUBSTRINGS)
def _forbids_sampling_params(model: str) -> bool:
"""Return True for models that 400 on any non-default temperature/top_p/top_k.
Opus 4.7 explicitly rejects sampling parameters; later Claude releases are
expected to follow suit. Callers should omit these fields entirely rather
than passing zero/default values (the API rejects anything non-null).
"""
return any(v in model for v in _NO_SAMPLING_PARAMS_SUBSTRINGS)
# Beta headers for enhanced features (sent with ALL auth types).
# As of Opus 4.7 (2026-04-16), both of these are GA on Claude 4.6+ — the
# beta headers are still accepted (harmless no-op) but not required. Kept
# here so older Claude (4.5, 4.1) + third-party Anthropic-compat endpoints
# that still gate on the headers continue to get the enhanced features.
# Migration guide: remove these if you no longer support ≤4.5 models.
# Beta headers for enhanced features (sent with ALL auth types)
_COMMON_BETAS = [
"interleaved-thinking-2025-05-14",
"fine-grained-tool-streaming-2025-05-14",
@@ -350,33 +298,6 @@ def build_anthropic_client(api_key: str, base_url: str = None):
return _anthropic_sdk.Anthropic(**kwargs)
def build_anthropic_bedrock_client(region: str):
"""Create an AnthropicBedrock client for Bedrock Claude models.
Uses the Anthropic SDK's native Bedrock adapter, which provides full
Claude feature parity: prompt caching, thinking budgets, adaptive
thinking, fast mode — features not available via the Converse API.
Auth uses the boto3 default credential chain (IAM roles, SSO, env vars).
"""
if _anthropic_sdk is None:
raise ImportError(
"The 'anthropic' package is required for the Bedrock provider. "
"Install it with: pip install 'anthropic>=0.39.0'"
)
if not hasattr(_anthropic_sdk, "AnthropicBedrock"):
raise ImportError(
"anthropic.AnthropicBedrock not available. "
"Upgrade with: pip install 'anthropic>=0.39.0'"
)
from httpx import Timeout
return _anthropic_sdk.AnthropicBedrock(
aws_region=region,
timeout=Timeout(timeout=900.0, connect=10.0),
)
def read_claude_code_credentials() -> Optional[Dict[str, Any]]:
"""Read refreshable Claude Code OAuth credentials from ~/.claude/.credentials.json.
@@ -1393,31 +1314,18 @@ def build_anthropic_kwargs(
kwargs["tool_choice"] = {"type": "tool", "name": tool_choice}
# Map reasoning_config to Anthropic's thinking parameter.
# Claude 4.6+ models use adaptive thinking + output_config.effort.
# Claude 4.6 models use adaptive thinking + output_config.effort.
# Older models use manual thinking with budget_tokens.
# MiniMax Anthropic-compat endpoints support thinking (manual mode only,
# not adaptive). Haiku does NOT support extended thinking — skip entirely.
#
# On 4.7+ the `thinking.display` field defaults to "omitted", which
# silently hides reasoning text that Hermes surfaces in its CLI. We
# request "summarized" so the reasoning blocks stay populated — matching
# 4.6 behavior and preserving the activity-feed UX during long tool runs.
if reasoning_config and isinstance(reasoning_config, dict):
if reasoning_config.get("enabled") is not False and "haiku" not in model.lower():
effort = str(reasoning_config.get("effort", "medium")).lower()
budget = THINKING_BUDGET.get(effort, 8000)
if _supports_adaptive_thinking(model):
kwargs["thinking"] = {
"type": "adaptive",
"display": "summarized",
}
adaptive_effort = ADAPTIVE_EFFORT_MAP.get(effort, "medium")
# Downgrade xhigh→max on models that don't list xhigh as a
# supported level (Opus/Sonnet 4.6). Opus 4.7+ keeps xhigh.
if adaptive_effort == "xhigh" and not _supports_xhigh_effort(model):
adaptive_effort = "max"
kwargs["thinking"] = {"type": "adaptive"}
kwargs["output_config"] = {
"effort": adaptive_effort,
"effort": ADAPTIVE_EFFORT_MAP.get(effort, "medium")
}
else:
kwargs["thinking"] = {"type": "enabled", "budget_tokens": budget}
@@ -1425,15 +1333,6 @@ def build_anthropic_kwargs(
kwargs["temperature"] = 1
kwargs["max_tokens"] = max(effective_max_tokens, budget + 4096)
# ── Strip sampling params on 4.7+ ─────────────────────────────────
# Opus 4.7 rejects any non-default temperature/top_p/top_k with a 400.
# Callers (auxiliary_client, flush_memories, etc.) may set these for
# older models; drop them here as a safety net so upstream 4.6 → 4.7
# migrations don't require coordinated edits everywhere.
if _forbids_sampling_params(model):
for _sampling_key in ("temperature", "top_p", "top_k"):
kwargs.pop(_sampling_key, None)
# ── Fast mode (Opus 4.6 only) ────────────────────────────────────
# Adds extra_body.speed="fast" + the fast-mode beta header for ~2.5x
# output speed. Only for native Anthropic endpoints — third-party
@@ -1491,20 +1390,12 @@ def normalize_anthropic_response(
)
)
# Map Anthropic stop_reason to OpenAI finish_reason.
# Newer stop reasons added in Claude 4.5+ / 4.7:
# - refusal: the model declined to answer (cyber safeguards, CSAM, etc.)
# - model_context_window_exceeded: hit context limit (not max_tokens)
# Both need distinct handling upstream — a refusal should surface to the
# user with a clear message, and a context-window overflow should trigger
# compression/truncation rather than be treated as normal end-of-turn.
# Map Anthropic stop_reason to OpenAI finish_reason
stop_reason_map = {
"end_turn": "stop",
"tool_use": "tool_calls",
"max_tokens": "length",
"stop_sequence": "stop",
"refusal": "content_filter",
"model_context_window_exceeded": "length",
}
finish_reason = stop_reason_map.get(response.stop_reason, "stop")
+27 -128
@@ -58,9 +58,6 @@ _PROVIDER_ALIASES = {
"google": "gemini",
"google-gemini": "gemini",
"google-ai-studio": "gemini",
"x-ai": "xai",
"x.ai": "xai",
"grok": "xai",
"glm": "zai",
"z-ai": "zai",
"z.ai": "zai",
@@ -107,7 +104,6 @@ _API_KEY_PROVIDER_AUX_MODELS: Dict[str, str] = {
"opencode-zen": "gemini-3-flash",
"opencode-go": "glm-5",
"kilocode": "google/gemini-3-flash-preview",
"ollama-cloud": "nemotron-3-nano:30b",
}
# Vision-specific model overrides for direct providers.
@@ -518,13 +514,8 @@ class _AnthropicCompletionsAdapter:
tool_choice=normalized_tool_choice,
is_oauth=self._is_oauth,
)
# Opus 4.7+ rejects any non-default temperature/top_p/top_k; only set
# temperature for models that still accept it. build_anthropic_kwargs
# additionally strips these keys as a safety net — keep both layers.
if temperature is not None:
from agent.anthropic_adapter import _forbids_sampling_params
if not _forbids_sampling_params(model):
anthropic_kwargs["temperature"] = temperature
anthropic_kwargs["temperature"] = temperature
response = self._client.messages.create(**anthropic_kwargs)
assistant_message, finish_reason = normalize_anthropic_response(response)
@@ -784,21 +775,6 @@ def _try_openrouter() -> Tuple[Optional[OpenAI], Optional[str]]:
def _try_nous(vision: bool = False) -> Tuple[Optional[OpenAI], Optional[str]]:
# Check cross-session rate limit guard before attempting Nous —
# if another session already recorded a 429, skip Nous entirely
# to avoid piling more requests onto the tapped RPH bucket.
try:
from agent.nous_rate_guard import nous_rate_limit_remaining
_remaining = nous_rate_limit_remaining()
if _remaining is not None and _remaining > 0:
logger.debug(
"Auxiliary: skipping Nous Portal (rate-limited, resets in %.0fs)",
_remaining,
)
return None, None
except Exception:
pass
nous = _read_nous_auth()
if not nous:
return None, None
@@ -923,51 +899,6 @@ def _current_custom_base_url() -> str:
return custom_base or ""
def _validate_proxy_env_urls() -> None:
"""Fail fast with a clear error when proxy env vars have malformed URLs.
Common cause: shell config (e.g. .zshrc) with a typo like
``export HTTP_PROXY=http://127.0.0.1:6153export NEXT_VAR=...``
which concatenates 'export' into the port number. Without this
check the OpenAI/httpx client raises a cryptic ``Invalid port``
error that doesn't name the offending env var.
"""
from urllib.parse import urlparse
for key in ("HTTPS_PROXY", "HTTP_PROXY", "ALL_PROXY",
"https_proxy", "http_proxy", "all_proxy"):
value = str(os.environ.get(key) or "").strip()
if not value:
continue
try:
parsed = urlparse(value)
if parsed.scheme:
_ = parsed.port # raises ValueError for e.g. '6153export'
except ValueError as exc:
raise RuntimeError(
f"Malformed proxy environment variable {key}={value!r}. "
"Fix or unset your proxy settings and try again."
) from exc
def _validate_base_url(base_url: str) -> None:
"""Reject obviously broken custom endpoint URLs before they reach httpx."""
from urllib.parse import urlparse
candidate = str(base_url or "").strip()
if not candidate or candidate.startswith("acp://"):
return
try:
parsed = urlparse(candidate)
if parsed.scheme in {"http", "https"}:
_ = parsed.port # raises ValueError for malformed ports
except ValueError as exc:
raise RuntimeError(
f"Malformed custom endpoint URL: {candidate!r}. "
"Run `hermes setup` or `hermes model` and enter a valid http(s) base URL."
) from exc
def _try_custom_endpoint() -> Tuple[Optional[OpenAI], Optional[str]]:
runtime = _resolve_custom_runtime()
if len(runtime) == 2:
@@ -1368,7 +1299,6 @@ def resolve_provider_client(
Returns:
(client, resolved_model) or (None, None) if auth is unavailable.
"""
_validate_proxy_env_urls()
# Normalise aliases
provider = _normalize_aux_provider(provider)
@@ -1905,15 +1835,9 @@ def auxiliary_max_tokens_param(value: int) -> dict:
# Every auxiliary LLM consumer should use these instead of manually
# constructing clients and calling .chat.completions.create().
# Client cache: (provider, async_mode, base_url, api_key, api_mode, runtime_key) -> (client, default_model, loop)
# NOTE: loop identity is NOT part of the key. On async cache hits we check
# whether the cached loop is the *current* loop; if not, the stale entry is
# replaced in-place. This bounds cache growth to one entry per unique
# provider config rather than one per (config × event-loop), which previously
# caused unbounded fd accumulation in long-running gateway processes (#10200).
# Client cache: (provider, async_mode, base_url, api_key) -> (client, default_model)
_client_cache: Dict[tuple, tuple] = {}
_client_cache_lock = threading.Lock()
_CLIENT_CACHE_MAX_SIZE = 64 # safety belt — evict oldest when exceeded
def neuter_async_httpx_del() -> None:
@@ -2046,49 +1970,39 @@ def _get_cached_client(
Async clients (AsyncOpenAI) use httpx.AsyncClient internally, which
binds to the event loop that was current when the client was created.
Using such a client on a *different* loop causes deadlocks or
RuntimeError. To prevent cross-loop issues, the cache validates on
every async hit that the cached loop is the *current, open* loop.
If the loop changed (e.g. a new gateway worker-thread loop), the stale
entry is replaced in-place rather than creating an additional entry.
This keeps cache size bounded to one entry per unique provider config,
preventing the fd-exhaustion that previously occurred in long-running
gateways where recycled worker threads created unbounded entries (#10200).
RuntimeError. To prevent cross-loop issues (especially in gateway
mode where _run_async() may spawn fresh loops in worker threads), the
cache key for async clients includes the current event loop's identity
so each loop gets its own client instance.
"""
# Resolve the current event loop for async clients so we can validate
# cached entries. Loop identity is NOT in the cache key — instead we
# check at hit time whether the cached loop is still current and open.
# This prevents unbounded cache growth from recycled worker-thread loops
# while still guaranteeing we never reuse a client on the wrong loop
# (which causes deadlocks, see #2681).
# Include loop identity for async clients to prevent cross-loop reuse.
# httpx.AsyncClient (inside AsyncOpenAI) is bound to the loop where it
# was created — reusing it on a different loop causes deadlocks (#2681).
loop_id = 0
current_loop = None
if async_mode:
try:
import asyncio as _aio
current_loop = _aio.get_event_loop()
loop_id = id(current_loop)
except RuntimeError:
pass
runtime = _normalize_main_runtime(main_runtime)
runtime_key = tuple(runtime.get(field, "") for field in _MAIN_RUNTIME_FIELDS) if provider == "auto" else ()
cache_key = (provider, async_mode, base_url or "", api_key or "", api_mode or "", runtime_key)
cache_key = (provider, async_mode, base_url or "", api_key or "", api_mode or "", loop_id, runtime_key)
with _client_cache_lock:
if cache_key in _client_cache:
cached_client, cached_default, cached_loop = _client_cache[cache_key]
if async_mode:
# Validate: the cached client must be bound to the CURRENT,
# OPEN loop. If the loop changed or was closed, the httpx
# transport inside is dead — force-close and replace.
loop_ok = (
cached_loop is not None
and cached_loop is current_loop
and not cached_loop.is_closed()
)
if loop_ok:
# A cached async client whose loop has been closed will raise
# "Event loop is closed" when httpx tries to clean up its
# transport. Discard the stale client and create a fresh one.
if cached_loop is not None and cached_loop.is_closed():
_force_close_async_httpx(cached_client)
del _client_cache[cache_key]
else:
effective = _compat_model(cached_client, model, cached_default)
return cached_client, effective
# Stale — evict and fall through to create a new client.
_force_close_async_httpx(cached_client)
del _client_cache[cache_key]
else:
effective = _compat_model(cached_client, model, cached_default)
return cached_client, effective
@@ -2108,12 +2022,6 @@ def _get_cached_client(
bound_loop = current_loop
with _client_cache_lock:
if cache_key not in _client_cache:
# Safety belt: if the cache has grown beyond the max, evict
# the oldest entries (FIFO — dict preserves insertion order).
while len(_client_cache) >= _CLIENT_CACHE_MAX_SIZE:
evict_key, evict_entry = next(iter(_client_cache.items()))
_force_close_async_httpx(evict_entry[0])
del _client_cache[evict_key]
_client_cache[cache_key] = (client, default_model, bound_loop)
else:
client, default_model, _ = _client_cache[cache_key]
@@ -2293,15 +2201,6 @@ def _build_call_kwargs(
"timeout": timeout,
}
# Opus 4.7+ rejects any non-default temperature/top_p/top_k — silently
# drop here so auxiliary callers that hardcode temperature (e.g. 0.3 on
# flush_memories, 0 on structured-JSON extraction) don't 400 the moment
# the aux model is flipped to 4.7.
if temperature is not None:
from agent.anthropic_adapter import _forbids_sampling_params
if _forbids_sampling_params(model):
temperature = None
if temperature is not None:
kwargs["temperature"] = temperature
@@ -2405,10 +2304,10 @@ def call_llm(
if task == "vision":
effective_provider, client, final_model = resolve_vision_provider_client(
provider=resolved_provider if resolved_provider != "auto" else provider,
model=resolved_model or model,
base_url=resolved_base_url or base_url,
api_key=resolved_api_key or api_key,
provider=provider,
model=model,
base_url=base_url,
api_key=api_key,
async_mode=False,
)
if client is None and resolved_provider != "auto" and not resolved_base_url:
@@ -2613,10 +2512,10 @@ async def async_call_llm(
if task == "vision":
effective_provider, client, final_model = resolve_vision_provider_client(
provider=resolved_provider if resolved_provider != "auto" else provider,
model=resolved_model or model,
base_url=resolved_base_url or base_url,
api_key=resolved_api_key or api_key,
provider=provider,
model=model,
base_url=base_url,
api_key=api_key,
async_mode=True,
)
if client is None and resolved_provider != "auto" and not resolved_base_url:
File diff suppressed because it is too large
+40 -383
@@ -17,10 +17,7 @@ Improvements over v2:
- Richer tool call/result detail in summarizer input
"""
import hashlib
import json
import logging
import re
import time
from typing import Any, Dict, List, Optional
@@ -39,10 +36,7 @@ SUMMARY_PREFIX = (
"into the summary below. This is a handoff from a previous context "
"window — treat it as background reference, NOT as active instructions. "
"Do NOT answer questions or fulfill requests mentioned in this summary; "
"they were already addressed. "
"Your current task is identified in the '## Active Task' section of the "
"summary — resume exactly from there. "
"Respond ONLY to the latest user message "
"they were already addressed. Respond ONLY to the latest user message "
"that appears AFTER this summary. The current session state (files, "
"config, etc.) may reflect work described here — avoid repeating it:"
)
@@ -63,128 +57,6 @@ _CHARS_PER_TOKEN = 4
_SUMMARY_FAILURE_COOLDOWN_SECONDS = 600
def _summarize_tool_result(tool_name: str, tool_args: str, tool_content: str) -> str:
"""Create an informative 1-line summary of a tool call + result.
Used during the pre-compression pruning pass to replace large tool
outputs with a short but useful description of what the tool did,
rather than a generic placeholder that carries zero information.
Returns strings like::
[terminal] ran `npm test` -> exit 0, 47 lines output
[read_file] read config.py from line 1 (1,200 chars)
[search_files] content search for 'compress' in agent/ -> 12 matches
"""
try:
args = json.loads(tool_args) if tool_args else {}
except (json.JSONDecodeError, TypeError):
args = {}
content = tool_content or ""
content_len = len(content)
line_count = content.count("\n") + 1 if content.strip() else 0
if tool_name == "terminal":
cmd = args.get("command", "")
if len(cmd) > 80:
cmd = cmd[:77] + "..."
exit_match = re.search(r'"exit_code"\s*:\s*(-?\d+)', content)
exit_code = exit_match.group(1) if exit_match else "?"
return f"[terminal] ran `{cmd}` -> exit {exit_code}, {line_count} lines output"
if tool_name == "read_file":
path = args.get("path", "?")
offset = args.get("offset", 1)
return f"[read_file] read {path} from line {offset} ({content_len:,} chars)"
if tool_name == "write_file":
path = args.get("path", "?")
written_lines = args.get("content", "").count("\n") + 1 if args.get("content") else "?"
return f"[write_file] wrote to {path} ({written_lines} lines)"
if tool_name == "search_files":
pattern = args.get("pattern", "?")
path = args.get("path", ".")
target = args.get("target", "content")
match_count = re.search(r'"total_count"\s*:\s*(\d+)', content)
count = match_count.group(1) if match_count else "?"
return f"[search_files] {target} search for '{pattern}' in {path} -> {count} matches"
if tool_name == "patch":
path = args.get("path", "?")
mode = args.get("mode", "replace")
return f"[patch] {mode} in {path} ({content_len:,} chars result)"
if tool_name in ("browser_navigate", "browser_click", "browser_snapshot",
"browser_type", "browser_scroll", "browser_vision"):
url = args.get("url", "")
ref = args.get("ref", "")
detail = f" {url}" if url else (f" ref={ref}" if ref else "")
return f"[{tool_name}]{detail} ({content_len:,} chars)"
if tool_name == "web_search":
query = args.get("query", "?")
return f"[web_search] query='{query}' ({content_len:,} chars result)"
if tool_name == "web_extract":
urls = args.get("urls", [])
url_desc = urls[0] if isinstance(urls, list) and urls else "?"
if isinstance(urls, list) and len(urls) > 1:
url_desc += f" (+{len(urls) - 1} more)"
return f"[web_extract] {url_desc} ({content_len:,} chars)"
if tool_name == "delegate_task":
goal = args.get("goal", "")
if len(goal) > 60:
goal = goal[:57] + "..."
return f"[delegate_task] '{goal}' ({content_len:,} chars result)"
if tool_name == "execute_code":
code_preview = (args.get("code") or "")[:60].replace("\n", " ")
if len(args.get("code", "")) > 60:
code_preview += "..."
return f"[execute_code] `{code_preview}` ({line_count} lines output)"
if tool_name in ("skill_view", "skills_list", "skill_manage"):
name = args.get("name", "?")
return f"[{tool_name}] name={name} ({content_len:,} chars)"
if tool_name == "vision_analyze":
question = args.get("question", "")[:50]
return f"[vision_analyze] '{question}' ({content_len:,} chars)"
if tool_name == "memory":
action = args.get("action", "?")
target = args.get("target", "?")
return f"[memory] {action} on {target}"
if tool_name == "todo":
return "[todo] updated task list"
if tool_name == "clarify":
return "[clarify] asked user a question"
if tool_name == "text_to_speech":
return f"[text_to_speech] generated audio ({content_len:,} chars)"
if tool_name == "cronjob":
action = args.get("action", "?")
return f"[cronjob] {action}"
if tool_name == "process":
action = args.get("action", "?")
sid = args.get("session_id", "?")
return f"[process] {action} session={sid}"
# Generic fallback
first_arg = ""
for k, v in list(args.items())[:2]:
sv = str(v)[:40]
first_arg += f" {k}={sv}"
return f"[{tool_name}]{first_arg} ({content_len:,} chars result)"
class ContextCompressor(ContextEngine):
"""Default context engine — compresses conversation context via lossy summarization.
@@ -206,8 +78,6 @@ class ContextCompressor(ContextEngine):
self._context_probed = False
self._context_probe_persistable = False
self._previous_summary = None
self._last_compression_savings_pct = 100.0
self._ineffective_compression_count = 0
def update_model(
self,
@@ -297,9 +167,6 @@ class ContextCompressor(ContextEngine):
# Stores the previous compaction summary for iterative updates
self._previous_summary: Optional[str] = None
# Anti-thrashing: track whether last compression was effective
self._last_compression_savings_pct: float = 100.0
self._ineffective_compression_count: int = 0
self._summary_failure_cooldown_until: float = 0.0
def update_from_response(self, usage: Dict[str, Any]):
@@ -308,26 +175,9 @@ class ContextCompressor(ContextEngine):
self.last_completion_tokens = usage.get("completion_tokens", 0)
def should_compress(self, prompt_tokens: int = None) -> bool:
"""Check if context exceeds the compression threshold.
Includes anti-thrashing protection: if the last two compressions
each saved less than 10%, skip compression to avoid infinite loops
where each pass removes only 1-2 messages.
"""
"""Check if context exceeds the compression threshold."""
tokens = prompt_tokens if prompt_tokens is not None else self.last_prompt_tokens
if tokens < self.threshold_tokens:
return False
# Anti-thrashing: back off if recent compressions were ineffective
if self._ineffective_compression_count >= 2:
if not self.quiet_mode:
logger.warning(
"Compression skipped — last %d compressions saved <10%% each. "
"Consider /new to start a fresh session, or /compress <topic> "
"for focused compression.",
self._ineffective_compression_count,
)
return False
return True
return tokens >= self.threshold_tokens
# ------------------------------------------------------------------
# Tool output pruning (cheap pre-pass, no LLM call)
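Reviewer note: the anti-thrashing logic touched by this hunk combines a token threshold with a counter of consecutive ineffective compressions. The sketch below isolates that behaviour under the values visible in the diff (10% minimum savings, back off after 2 ineffective passes); `CompressionGate` and its method names are illustrative, not part of the codebase.

```python
class CompressionGate:
    """Decide whether to compress, backing off after repeated low-yield passes."""

    def __init__(self, threshold_tokens: int, min_savings_pct: float = 10.0,
                 max_ineffective: int = 2) -> None:
        self.threshold_tokens = threshold_tokens
        self.min_savings_pct = min_savings_pct
        self.max_ineffective = max_ineffective
        self.ineffective_count = 0

    def record(self, tokens_before: int, tokens_after: int) -> float:
        """Record one compression pass and return the savings percentage."""
        saved = tokens_before - tokens_after
        savings_pct = (saved / tokens_before * 100) if tokens_before > 0 else 0.0
        if savings_pct < self.min_savings_pct:
            self.ineffective_count += 1
        else:
            self.ineffective_count = 0  # any effective pass resets the counter
        return savings_pct

    def should_compress(self, prompt_tokens: int) -> bool:
        if prompt_tokens < self.threshold_tokens:
            return False
        # Over the threshold, but skip if recent passes barely helped.
        return self.ineffective_count < self.max_ineffective
```

Without the counter, `should_compress` reduces to the single comparison `tokens >= threshold_tokens`.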
@@ -337,16 +187,7 @@ class ContextCompressor(ContextEngine):
self, messages: List[Dict[str, Any]], protect_tail_count: int,
protect_tail_tokens: int | None = None,
) -> tuple[List[Dict[str, Any]], int]:
"""Replace old tool result contents with informative 1-line summaries.
Instead of a generic placeholder, generates a summary like::
[terminal] ran `npm test` -> exit 0, 47 lines output
[read_file] read config.py from line 1 (3,400 chars)
Also deduplicates identical tool results (e.g. reading the same file
5x keeps only the newest full copy) and truncates large tool_call
arguments in assistant messages outside the protected tail.
"""Replace old tool result contents with a short placeholder.
Walks backward from the end, protecting the most recent messages that
fall within ``protect_tail_tokens`` (when provided) OR the last
@@ -362,22 +203,6 @@ class ContextCompressor(ContextEngine):
result = [m.copy() for m in messages]
pruned = 0
# Build index: tool_call_id -> (tool_name, arguments_json)
call_id_to_tool: Dict[str, tuple] = {}
for msg in result:
if msg.get("role") == "assistant":
for tc in msg.get("tool_calls") or []:
if isinstance(tc, dict):
cid = tc.get("id", "")
fn = tc.get("function", {})
call_id_to_tool[cid] = (fn.get("name", "unknown"), fn.get("arguments", ""))
else:
cid = getattr(tc, "id", "") or ""
fn = getattr(tc, "function", None)
name = getattr(fn, "name", "unknown") if fn else "unknown"
args_str = getattr(fn, "arguments", "") if fn else ""
call_id_to_tool[cid] = (name, args_str)
# Determine the prune boundary
if protect_tail_tokens is not None and protect_tail_tokens > 0:
# Token-budget approach: walk backward accumulating tokens
@@ -386,8 +211,7 @@ class ContextCompressor(ContextEngine):
min_protect = min(protect_tail_count, len(result) - 1)
for i in range(len(result) - 1, -1, -1):
msg = result[i]
raw_content = msg.get("content") or ""
content_len = sum(len(p.get("text", "")) for p in raw_content) if isinstance(raw_content, list) else len(raw_content)
content_len = len(msg.get("content") or "")
msg_tokens = content_len // _CHARS_PER_TOKEN + 10
for tc in msg.get("tool_calls") or []:
if isinstance(tc, dict):
@@ -402,69 +226,18 @@ class ContextCompressor(ContextEngine):
else:
prune_boundary = len(result) - protect_tail_count
# Pass 1: Deduplicate identical tool results.
# When the same file is read multiple times, keep only the most recent
# full copy and replace older duplicates with a back-reference.
content_hashes: dict = {} # hash -> (index, tool_call_id)
for i in range(len(result) - 1, -1, -1):
msg = result[i]
if msg.get("role") != "tool":
continue
content = msg.get("content") or ""
# Skip multimodal content (list of content blocks)
if isinstance(content, list):
continue
if len(content) < 200:
continue
h = hashlib.md5(content.encode("utf-8", errors="replace")).hexdigest()[:12]
if h in content_hashes:
# This is an older duplicate — replace with back-reference
result[i] = {**msg, "content": "[Duplicate tool output — same content as a more recent call]"}
pruned += 1
else:
content_hashes[h] = (i, msg.get("tool_call_id", "?"))
# Pass 2: Replace old tool results with informative summaries
for i in range(prune_boundary):
msg = result[i]
if msg.get("role") != "tool":
continue
content = msg.get("content", "")
# Skip multimodal content (list of content blocks)
if isinstance(content, list):
continue
if not content or content == _PRUNED_TOOL_PLACEHOLDER:
continue
# Skip already-deduplicated or previously-summarized results
if content.startswith("[Duplicate tool output"):
continue
# Only prune if the content is substantial (>200 chars)
if len(content) > 200:
call_id = msg.get("tool_call_id", "")
tool_name, tool_args = call_id_to_tool.get(call_id, ("unknown", ""))
summary = _summarize_tool_result(tool_name, tool_args, content)
result[i] = {**msg, "content": summary}
result[i] = {**msg, "content": _PRUNED_TOOL_PLACEHOLDER}
pruned += 1
# Pass 3: Truncate large tool_call arguments in assistant messages
# outside the protected tail. write_file with 50KB content, for
# example, survives pruning entirely without this.
for i in range(prune_boundary):
msg = result[i]
if msg.get("role") != "assistant" or not msg.get("tool_calls"):
continue
new_tcs = []
modified = False
for tc in msg["tool_calls"]:
if isinstance(tc, dict):
args = tc.get("function", {}).get("arguments", "")
if len(args) > 500:
tc = {**tc, "function": {**tc["function"], "arguments": args[:200] + "...[truncated]"}}
modified = True
new_tcs.append(tc)
if modified:
result[i] = {**msg, "tool_calls": new_tcs}
return result, pruned
# ------------------------------------------------------------------
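Reviewer note: the deduplication pass in this hunk walks the messages newest-first and keeps only the most recent copy of each large tool output. The following is a minimal sketch of that one pass. It assumes plain-dict messages, uses SHA-256 where the diff uses a truncated MD5 digest, and `dedupe_tool_results` and `DUPLICATE_NOTE` are illustrative names.

```python
import hashlib
from typing import Any, Dict, List, Tuple

DUPLICATE_NOTE = "[Duplicate tool output: same content as a more recent call]"


def dedupe_tool_results(
    messages: List[Dict[str, Any]], min_chars: int = 200
) -> Tuple[List[Dict[str, Any]], int]:
    """Replace older duplicate tool outputs with a short back-reference."""
    result = [m.copy() for m in messages]  # never mutate the caller's list
    seen = set()
    replaced = 0
    # Walk backward so the newest copy is seen first and kept intact.
    for i in range(len(result) - 1, -1, -1):
        msg = result[i]
        if msg.get("role") != "tool":
            continue
        content = msg.get("content") or ""
        # Skip multimodal content (lists of blocks) and short outputs.
        if not isinstance(content, str) or len(content) < min_chars:
            continue
        digest = hashlib.sha256(content.encode("utf-8", errors="replace")).hexdigest()
        if digest in seen:
            result[i] = {**msg, "content": DUPLICATE_NOTE}
            replaced += 1
        else:
            seen.add(digest)
    return result, replaced
```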
@@ -584,45 +357,29 @@ class ContextCompressor(ContextEngine):
)
# Shared structured template (used by both paths).
_template_sections = f"""## Active Task
[THE SINGLE MOST IMPORTANT FIELD. Copy the user's most recent request or
task assignment verbatim — the exact words they used. If multiple tasks
were requested and only some are done, list only the ones NOT yet completed.
The next assistant must pick up exactly here. Example:
"User asked: 'Now refactor the auth module to use JWT instead of sessions'"
If no outstanding task exists, write "None."]
## Goal
[What the user is trying to accomplish overall]
# Key changes vs v1:
# - "Pending User Asks" section (from Claude Code) explicitly tracks
# unanswered questions so the model knows what's resolved vs open
# - "Remaining Work" replaces "Next Steps" to avoid reading as active
# instructions
# - "Resolved Questions" makes it clear which questions were already
# answered (prevents model from re-answering them)
_template_sections = f"""## Goal
[What the user is trying to accomplish]
## Constraints & Preferences
[User preferences, coding style, constraints, important decisions]
## Completed Actions
[Numbered list of concrete actions taken — include tool used, target, and outcome.
Format each as: N. ACTION target — outcome [tool: name]
Example:
1. READ config.py:45 — found `==` should be `!=` [tool: read_file]
2. PATCH config.py:45 — changed `==` to `!=` [tool: patch]
3. TEST `pytest tests/` — 3/50 failed: test_parse, test_validate, test_edge [tool: terminal]
Be specific with file paths, commands, line numbers, and results.]
## Active State
[Current working state — include:
- Working directory and branch (if applicable)
- Modified/created files with brief note on each
- Test status (X/Y passing)
- Any running processes or servers
- Environment details that matter]
## In Progress
[Work currently underway — what was being done when compaction fired]
## Blocked
[Any blockers, errors, or issues not yet resolved. Include exact error messages.]
## Progress
### Done
[Completed work — include specific file paths, commands run, results obtained]
### In Progress
[Work currently underway]
### Blocked
[Any blockers or issues encountered]
## Key Decisions
[Important technical decisions and WHY they were made]
[Important technical decisions and why they were made]
## Resolved Questions
[Questions the user asked that were ALREADY answered — include the answer so the next assistant does not re-answer them]
@@ -639,7 +396,10 @@ Be specific with file paths, commands, line numbers, and results.]
## Critical Context
[Any specific values, error messages, configuration details, or data that would be lost without explicit preservation]
Target ~{summary_budget} tokens. Be CONCRETE — include file paths, command outputs, error messages, line numbers, and specific values. Avoid vague descriptions like "made some changes" — say exactly what changed.
## Tools & Patterns
[Which tools were used, how they were used effectively, and any tool-specific discoveries]
Target ~{summary_budget} tokens. Be specific — include file paths, command outputs, error messages, and concrete values rather than vague descriptions.
Write only the summary body. Do not include any preamble or prefix."""
@@ -655,7 +415,7 @@ PREVIOUS SUMMARY:
NEW TURNS TO INCORPORATE:
{content_to_summarize}
Update the summary using this exact structure. PRESERVE all existing information that is still relevant. ADD new completed actions to the numbered list (continue numbering). Move items from "In Progress" to "Completed Actions" when done. Move answered questions to "Resolved Questions". Update "Active State" to reflect current state. Remove information only if it is clearly obsolete. CRITICAL: Update "## Active Task" to reflect the user's most recent unfulfilled request — this is the most important field for task continuity.
Update the summary using this exact structure. PRESERVE all existing information that is still relevant. ADD new progress. Move items from "In Progress" to "Done" when completed. Move answered questions to "Resolved Questions". Remove information only if it is clearly obsolete.
{_template_sections}"""
else:
@@ -690,7 +450,7 @@ The user has requested that this compaction PRIORITISE preserving all informatio
"api_mode": self.api_mode,
},
"messages": [{"role": "user", "content": prompt}],
"max_tokens": int(summary_budget * 1.3),
"max_tokens": summary_budget * 2,
# timeout resolved from auxiliary.compression.timeout config by call_llm
}
if self.summary_model:
@@ -704,10 +464,8 @@ The user has requested that this compaction PRIORITISE preserving all informatio
# Store for iterative updates on next compaction
self._previous_summary = summary
self._summary_failure_cooldown_until = 0.0
self._summary_model_fallen_back = False
return self._with_summary_prefix(summary)
except RuntimeError:
# No provider configured — long cooldown, unlikely to self-resolve
self._summary_failure_cooldown_until = time.monotonic() + _SUMMARY_FAILURE_COOLDOWN_SECONDS
logging.warning("Context compression: no provider available for "
"summary. Middle turns will be dropped without summary "
@@ -715,42 +473,12 @@ The user has requested that this compaction PRIORITISE preserving all informatio
_SUMMARY_FAILURE_COOLDOWN_SECONDS)
return None
except Exception as e:
# If the summary model is different from the main model and the
# error looks permanent (model not found, 503, 404), fall back to
# using the main model instead of entering cooldown that leaves
# context growing unbounded. (#8620 sub-issue 4)
_status = getattr(e, "status_code", None) or getattr(getattr(e, "response", None), "status_code", None)
_err_str = str(e).lower()
_is_model_not_found = (
_status in (404, 503)
or "model_not_found" in _err_str
or "does not exist" in _err_str
or "no available channel" in _err_str
)
if (
_is_model_not_found
and self.summary_model
and self.summary_model != self.model
and not getattr(self, "_summary_model_fallen_back", False)
):
self._summary_model_fallen_back = True
logging.warning(
"Summary model '%s' not available (%s). "
"Falling back to main model '%s' for compression.",
self.summary_model, e, self.model,
)
self.summary_model = "" # empty = use main model
self._summary_failure_cooldown_until = 0.0 # no cooldown
return self._generate_summary(messages, summary_budget) # retry immediately
# Transient errors (timeout, rate limit, network) — shorter cooldown
_transient_cooldown = 60
self._summary_failure_cooldown_until = time.monotonic() + _transient_cooldown
self._summary_failure_cooldown_until = time.monotonic() + _SUMMARY_FAILURE_COOLDOWN_SECONDS
logging.warning(
"Failed to generate context summary: %s. "
"Further summary attempts paused for %d seconds.",
e,
_transient_cooldown,
_SUMMARY_FAILURE_COOLDOWN_SECONDS,
)
return None
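Reviewer note: the error handling in this hunk distinguishes three outcomes after a failed summary call: a long cooldown when no provider is configured, an immediate retry on the main model when the summary model looks permanently unavailable, and a short cooldown for transient errors. The sketch below captures that decision as a pure function. The function names and the returned action strings are hypothetical; the status codes, substrings, and durations are the ones visible in the diff.

```python
from typing import Tuple

_UNAVAILABLE_MARKERS = ("model_not_found", "does not exist", "no available channel")


def looks_unavailable(exc: Exception) -> bool:
    """Heuristic: does this error suggest the model itself is missing?"""
    status = getattr(exc, "status_code", None) or getattr(
        getattr(exc, "response", None), "status_code", None
    )
    text = str(exc).lower()
    return status in (404, 503) or any(m in text for m in _UNAVAILABLE_MARKERS)


def plan_after_failure(
    exc: Exception,
    summary_model: str,
    main_model: str,
    already_fell_back: bool,
    transient_cooldown: float = 60.0,
    no_provider_cooldown: float = 600.0,
) -> Tuple[str, float]:
    """Return (action, cooldown_seconds) for a failed summary attempt."""
    if isinstance(exc, RuntimeError):
        # No provider configured: unlikely to resolve itself quickly.
        return ("cooldown", no_provider_cooldown)
    if (
        looks_unavailable(exc)
        and summary_model
        and summary_model != main_model
        and not already_fell_back
    ):
        # Retry once on the main model instead of pausing compression.
        return ("fallback_to_main", 0.0)
    return ("cooldown", transient_cooldown)
```

The `already_fell_back` flag is what keeps the retry from recursing indefinitely when the main model fails the same way.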
@@ -873,62 +601,6 @@ The user has requested that this compaction PRIORITISE preserving all informatio
# Tail protection by token budget
# ------------------------------------------------------------------
def _find_last_user_message_idx(
self, messages: List[Dict[str, Any]], head_end: int
) -> int:
"""Return the index of the last user-role message at or after *head_end*, or -1."""
for i in range(len(messages) - 1, head_end - 1, -1):
if messages[i].get("role") == "user":
return i
return -1
def _ensure_last_user_message_in_tail(
self,
messages: List[Dict[str, Any]],
cut_idx: int,
head_end: int,
) -> int:
"""Guarantee the most recent user message is in the protected tail.
Context compressor bug (#10896): ``_align_boundary_backward`` can pull
``cut_idx`` past a user message when it tries to keep tool_call/result
groups together. If the last user message ends up in the *compressed*
middle region the LLM summariser writes it into "Pending User Asks",
but ``SUMMARY_PREFIX`` tells the next model to respond only to user
messages *after* the summary — so the task effectively disappears from
the active context, causing the agent to stall, repeat completed work,
or silently drop the user's latest request.
Fix: if the last user-role message is not already in the tail
(``messages[cut_idx:]``), walk ``cut_idx`` back to include it. We
then re-align backward one more time to avoid splitting any
tool_call/result group that immediately precedes the user message.
"""
last_user_idx = self._find_last_user_message_idx(messages, head_end)
if last_user_idx < 0:
# No user message found beyond head — nothing to anchor.
return cut_idx
if last_user_idx >= cut_idx:
# Already in the tail; nothing to do.
return cut_idx
# The last user message is in the middle (compressed) region.
# Pull cut_idx back to it directly — a user message is already a
# clean boundary (no tool_call/result splitting risk), so there is no
# need to call _align_boundary_backward here; doing so would
# unnecessarily pull the cut further back into the preceding
# assistant + tool_calls group.
if not self.quiet_mode:
logger.debug(
"Anchoring tail cut to last user message at index %d "
"(was %d) to prevent active-task loss after compression",
last_user_idx,
cut_idx,
)
# Safety: never go back into the head region.
return max(last_user_idx, head_end + 1)
def _find_tail_cut_by_tokens(
self, messages: List[Dict[str, Any]], head_end: int,
token_budget: int | None = None,
@@ -946,8 +618,7 @@ The user has requested that this compaction PRIORITISE preserving all informatio
read, etc.). If even the minimum 3 messages exceed 1.5x the budget
the cut is placed right after the head so compression still runs.
Never cuts inside a tool_call/result group. Always ensures the most
recent user message is in the tail (see ``_ensure_last_user_message_in_tail``).
Never cuts inside a tool_call/result group.
"""
if token_budget is None:
token_budget = self.tail_token_budget
@@ -986,10 +657,6 @@ The user has requested that this compaction PRIORITISE preserving all informatio
# Align to avoid splitting tool groups
cut_idx = self._align_boundary_backward(messages, cut_idx)
# Ensure the most recent user message is always in the tail so the
# active task is never lost to compression (fixes #10896).
cut_idx = self._ensure_last_user_message_in_tail(messages, cut_idx, head_end)
return max(cut_idx, head_end + 1)
# ------------------------------------------------------------------
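Reviewer note: the tail-anchoring behaviour described in `_ensure_last_user_message_in_tail` can be checked with a standalone function. This is a sketch assuming plain-dict messages; `anchor_cut_to_last_user` is an illustrative name that folds the two helper methods from the diff into one.

```python
from typing import Any, Dict, List


def anchor_cut_to_last_user(
    messages: List[Dict[str, Any]], cut_idx: int, head_end: int
) -> int:
    """Move cut_idx back so the newest user message stays in messages[cut_idx:]."""
    for i in range(len(messages) - 1, head_end - 1, -1):
        if messages[i].get("role") == "user":
            if i >= cut_idx:
                return cut_idx  # already inside the protected tail
            # Pull the cut back to the user message, but never into the head.
            return max(i, head_end + 1)
    return cut_idx  # no user message beyond the head; nothing to anchor
```

For example, with roles `[system, user, assistant, user, assistant, tool, assistant]`, `head_end=2` and `cut_idx=5`, the cut moves back to index 3 so the latest user request is not summarized away.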
@@ -1077,11 +744,11 @@ The user has requested that this compaction PRIORITISE preserving all informatio
compressed = []
for i in range(compress_start):
msg = messages[i].copy()
if i == 0 and msg.get("role") == "system":
existing = msg.get("content") or ""
_compression_note = "[Note: Some earlier conversation turns have been compacted into a handoff summary to preserve context space. The current session state may still reflect earlier work, so build on that summary and state rather than re-doing work.]"
if _compression_note not in existing:
msg["content"] = existing + "\n\n" + _compression_note
if i == 0 and msg.get("role") == "system" and self.compression_count == 0:
msg["content"] = (
(msg.get("content") or "")
+ "\n\n[Note: Some earlier conversation turns have been compacted into a handoff summary to preserve context space. The current session state may still reflect earlier work, so build on that summary and state rather than re-doing work.]"
)
compressed.append(msg)
# If LLM summary failed, insert a static fallback so the model
@@ -1139,24 +806,14 @@ The user has requested that this compaction PRIORITISE preserving all informatio
compressed = self._sanitize_tool_pairs(compressed)
new_estimate = estimate_messages_tokens_rough(compressed)
saved_estimate = display_tokens - new_estimate
# Anti-thrashing: track compression effectiveness
savings_pct = (saved_estimate / display_tokens * 100) if display_tokens > 0 else 0
self._last_compression_savings_pct = savings_pct
if savings_pct < 10:
self._ineffective_compression_count += 1
else:
self._ineffective_compression_count = 0
if not self.quiet_mode:
new_estimate = estimate_messages_tokens_rough(compressed)
saved_estimate = display_tokens - new_estimate
logger.info(
"Compressed: %d -> %d messages (~%d tokens saved, %.0f%%)",
"Compressed: %d -> %d messages (~%d tokens saved)",
n_messages,
len(compressed),
saved_estimate,
savings_pct,
)
logger.info("Compression #%d complete", self.compression_count)
+1 -17
@@ -313,25 +313,9 @@ class CopilotACPClient:
tools=tools,
tool_choice=tool_choice,
)
# Normalise timeout: run_agent.py may pass an httpx.Timeout object
# (used natively by the OpenAI SDK) rather than a plain float.
if timeout is None:
_effective_timeout = _DEFAULT_TIMEOUT_SECONDS
elif isinstance(timeout, (int, float)):
_effective_timeout = float(timeout)
else:
# httpx.Timeout or similar — pick the largest component so the
# subprocess has enough wall-clock time for the full response.
_candidates = [
getattr(timeout, attr, None)
for attr in ("read", "write", "connect", "pool", "timeout")
]
_numeric = [float(v) for v in _candidates if isinstance(v, (int, float))]
_effective_timeout = max(_numeric) if _numeric else _DEFAULT_TIMEOUT_SECONDS
response_text, reasoning_text = self._run_prompt(
prompt_text,
timeout_seconds=_effective_timeout,
timeout_seconds=float(timeout or _DEFAULT_TIMEOUT_SECONDS),
)
tool_calls, cleaned_text = _extract_tool_calls_from_text(response_text)
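Reviewer note: the two timeout styles in this hunk behave differently when the caller passes a structured timeout object instead of a number, since `float()` raises `TypeError` on an object that does not define `__float__`. The sketch below reproduces the normalising variant using duck typing only. `DEFAULT_TIMEOUT_SECONDS` is a placeholder value, not the project's real default, and `normalize_timeout` is an illustrative name.

```python
from typing import Any

DEFAULT_TIMEOUT_SECONDS = 900.0  # placeholder; the real default is not shown in the diff


def normalize_timeout(timeout: Any, default: float = DEFAULT_TIMEOUT_SECONDS) -> float:
    """Collapse None, a number, or a timeout-like object into plain seconds."""
    if timeout is None:
        return default
    if isinstance(timeout, (int, float)):
        return float(timeout)
    # Timeout-like object: take the largest numeric component so the
    # subprocess gets enough wall-clock time for the full response.
    candidates = [getattr(timeout, attr, None) for attr in ("read", "write", "connect", "pool")]
    numeric = [float(v) for v in candidates if isinstance(v, (int, float))]
    return max(numeric) if numeric else default
```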
-2
@@ -1162,7 +1162,6 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup
if token:
source_name = "gh_cli" if "gh" in source.lower() else f"env:{source}"
active_sources.add(source_name)
pconfig = PROVIDER_REGISTRY.get(provider)
changed |= _upsert_entry(
entries,
provider,
@@ -1171,7 +1170,6 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup
"source": source_name,
"auth_type": AUTH_TYPE_API_KEY,
"access_token": token,
"base_url": pconfig.inference_base_url if pconfig else "",
"label": source,
},
)
+80 -39
@@ -600,45 +600,6 @@ class KawaiiSpinner:
"analyzing", "computing", "synthesizing", "formulating", "brainstorming",
]
@classmethod
def get_waiting_faces(cls) -> list:
"""Return waiting faces from the active skin, falling back to KAWAII_WAITING."""
try:
skin = _get_skin()
if skin:
faces = skin.spinner.get("waiting_faces", [])
if faces:
return faces
except Exception:
pass
return cls.KAWAII_WAITING
@classmethod
def get_thinking_faces(cls) -> list:
"""Return thinking faces from the active skin, falling back to KAWAII_THINKING."""
try:
skin = _get_skin()
if skin:
faces = skin.spinner.get("thinking_faces", [])
if faces:
return faces
except Exception:
pass
return cls.KAWAII_THINKING
@classmethod
def get_thinking_verbs(cls) -> list:
"""Return thinking verbs from the active skin, falling back to THINKING_VERBS."""
try:
skin = _get_skin()
if skin:
verbs = skin.spinner.get("thinking_verbs", [])
if verbs:
return verbs
except Exception:
pass
return cls.THINKING_VERBS
def __init__(self, message: str = "", spinner_type: str = 'dots', print_fn=None):
self.message = message
self.spinner_frames = self.SPINNERS.get(spinner_type, self.SPINNERS['dots'])
@@ -993,4 +954,84 @@ def get_cute_tool_message(
# Honcho session line (one-liner with clickable OSC 8 hyperlink)
# =========================================================================
_DIM = "\033[2m"
_SKY_BLUE = "\033[38;5;117m"
_ANSI_RESET = "\033[0m"
# =========================================================================
# Context pressure display (CLI user-facing warnings)
# =========================================================================
# ANSI color codes for context pressure tiers
_CYAN = "\033[36m"
_YELLOW = "\033[33m"
_BOLD = "\033[1m"
_DIM_ANSI = "\033[2m"
# Bar characters
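_BAR_FILLED = "▰"  # assumed glyph; the original character did not survive extraction
_BAR_EMPTY = "▱"  # assumed glyph; the original character did not survive extraction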
_BAR_WIDTH = 20
def format_context_pressure(
compaction_progress: float,
threshold_tokens: int,
threshold_percent: float,
compression_enabled: bool = True,
) -> str:
"""Build a formatted context pressure line for CLI display.
The bar and percentage show progress toward the compaction threshold,
NOT the raw context window. 100% = compaction fires.
Args:
compaction_progress: How close to compaction (0.0-1.0, 1.0 = fires).
threshold_tokens: Compaction threshold in tokens.
threshold_percent: Compaction threshold as a fraction of context window.
compression_enabled: Whether auto-compression is active.
"""
pct_int = min(int(compaction_progress * 100), 100)
filled = min(int(compaction_progress * _BAR_WIDTH), _BAR_WIDTH)
bar = _BAR_FILLED * filled + _BAR_EMPTY * (_BAR_WIDTH - filled)
threshold_k = f"{threshold_tokens // 1000}k" if threshold_tokens >= 1000 else str(threshold_tokens)
threshold_pct_int = int(threshold_percent * 100)
color = f"{_BOLD}{_YELLOW}"
icon = "⚠"  # assumed glyph; the original character did not survive extraction
if compression_enabled:
hint = "compaction approaching"
else:
hint = "no auto-compaction"
return (
f" {color}{icon} context {bar} {pct_int}% to compaction{_ANSI_RESET}"
f" {_DIM_ANSI}{threshold_k} threshold ({threshold_pct_int}%) · {hint}{_ANSI_RESET}"
)
def format_context_pressure_gateway(
compaction_progress: float,
threshold_percent: float,
compression_enabled: bool = True,
) -> str:
"""Build a plain-text context pressure notification for messaging platforms.
No ANSI — just Unicode and plain text suitable for Telegram/Discord/etc.
The percentage shows progress toward the compaction threshold.
"""
pct_int = min(int(compaction_progress * 100), 100)
filled = min(int(compaction_progress * _BAR_WIDTH), _BAR_WIDTH)
bar = _BAR_FILLED * filled + _BAR_EMPTY * (_BAR_WIDTH - filled)
threshold_pct_int = int(threshold_percent * 100)
icon = "⚠️"
if compression_enabled:
hint = f"Context compaction approaching (threshold: {threshold_pct_int}% of window)."
else:
hint = "Auto-compaction is disabled — context may be truncated."
return f"{icon} Context: {bar} {pct_int}% to compaction\n{hint}"
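Reviewer note: both formatter functions share the same bar arithmetic, which is easy to verify on its own. The sketch below uses ASCII placeholders for the bar glyphs and adds a lower clamp that the code in the diff does not have; `render_bar` and `format_threshold` are illustrative names.

```python
def render_bar(progress: float, width: int = 20, filled: str = "#", empty: str = "-") -> str:
    """Render a fixed-width bar for progress in [0.0, 1.0]; values outside are clamped."""
    n = max(0, min(int(progress * width), width))
    return filled * n + empty * (width - n)


def format_threshold(threshold_tokens: int) -> str:
    """Abbreviate a token count the way the CLI line does (for example 64000 becomes 64k)."""
    return f"{threshold_tokens // 1000}k" if threshold_tokens >= 1000 else str(threshold_tokens)
```

Because the fill count uses `int()`, the bar rounds down: a progress of 0.99 with width 20 shows 19 filled cells, not 20.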
-9
@@ -112,10 +112,6 @@ _RATE_LIMIT_PATTERNS = [
"please retry after",
"resource_exhausted",
"rate increased too quickly", # Alibaba/DashScope throttling
# AWS Bedrock throttling
"throttlingexception",
"too many concurrent requests",
"servicequotaexceededexception",
]
# Usage-limit patterns that need disambiguation (could be billing OR rate_limit)
@@ -175,11 +171,6 @@ _CONTEXT_OVERFLOW_PATTERNS = [
# Chinese error messages (some providers return these)
"超过最大长度",
"上下文长度",
# AWS Bedrock Converse API error patterns
"input is too long",
"max input token",
"input token",
"exceeds the maximum number of input tokens",
]
# Model not found patterns
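Reviewer note: the pattern lists edited in this file are matched as lowercase substrings of the provider's error text. A minimal sketch of that style of classification follows. The pattern subsets are copied from the diff, while the function name, the return labels, and the order in which the categories are checked are assumptions.

```python
RATE_LIMIT_PATTERNS = [
    "please retry after",
    "resource_exhausted",
    "throttlingexception",
    "too many concurrent requests",
]

CONTEXT_OVERFLOW_PATTERNS = [
    "input is too long",
    "exceeds the maximum number of input tokens",
]


def classify_error(message: str) -> str:
    """Classify an error message by case-insensitive substring match."""
    text = message.lower()
    if any(p in text for p in CONTEXT_OVERFLOW_PATTERNS):
        return "context_overflow"
    if any(p in text for p in RATE_LIMIT_PATTERNS):
        return "rate_limit"
    return "unknown"
```

Short patterns such as `"input token"` match very broadly under this scheme, which is worth keeping in mind when adding or removing entries.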
+2 -14
@@ -28,7 +28,6 @@ Usage in run_agent.py:
from __future__ import annotations
import json
import logging
import re
from typing import Any, Dict, List, Optional
@@ -44,22 +43,11 @@ logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
_FENCE_TAG_RE = re.compile(r'</?\s*memory-context\s*>', re.IGNORECASE)
_INTERNAL_CONTEXT_RE = re.compile(
r'<\s*memory-context\s*>[\s\S]*?</\s*memory-context\s*>',
re.IGNORECASE,
)
_INTERNAL_NOTE_RE = re.compile(
r'\[System note:\s*The following is recalled memory context,\s*NOT new user input\.\s*Treat as informational background data\.\]\s*',
re.IGNORECASE,
)
def sanitize_context(text: str) -> str:
"""Strip fence tags, injected context blocks, and system notes from provider output."""
text = _INTERNAL_CONTEXT_RE.sub('', text)
text = _INTERNAL_NOTE_RE.sub('', text)
text = _FENCE_TAG_RE.sub('', text)
return text
"""Strip fence-escape sequences from provider output."""
return _FENCE_TAG_RE.sub('', text)
def build_memory_context_block(raw_context: str) -> str:
+1 -16
@@ -23,7 +23,7 @@ logger = logging.getLogger(__name__)
# are preserved so the full model name reaches cache lookups and server queries.
_PROVIDER_PREFIXES: frozenset[str] = frozenset({
"openrouter", "nous", "openai-codex", "copilot", "copilot-acp",
"gemini", "ollama-cloud", "zai", "kimi-coding", "kimi-coding-cn", "minimax", "minimax-cn", "anthropic", "deepseek",
"gemini", "zai", "kimi-coding", "kimi-coding-cn", "minimax", "minimax-cn", "anthropic", "deepseek",
"opencode-zen", "opencode-go", "ai-gateway", "kilocode", "alibaba",
"qwen-oauth",
"xiaomi",
@@ -33,11 +33,9 @@ _PROVIDER_PREFIXES: frozenset[str] = frozenset({
"google", "google-gemini", "google-ai-studio",
"glm", "z-ai", "z.ai", "zhipu", "github", "github-copilot",
"github-models", "kimi", "moonshot", "kimi-cn", "moonshot-cn", "claude", "deep-seek",
"ollama",
"opencode", "zen", "go", "vercel", "kilo", "dashscope", "aliyun", "qwen",
"mimo", "xiaomi-mimo",
"arcee-ai", "arceeai",
"xai", "x-ai", "x.ai", "grok",
"qwen-portal",
})
@@ -102,8 +100,6 @@ DEFAULT_CONTEXT_LENGTHS = {
# fuzzy-match collisions (e.g. "anthropic/claude-sonnet-4" is a
# substring of "anthropic/claude-sonnet-4.6").
# OpenRouter-prefixed models resolve via OpenRouter live API or models.dev.
"claude-opus-4-7": 1000000,
"claude-opus-4.7": 1000000,
"claude-opus-4-6": 1000000,
"claude-sonnet-4-6": 1000000,
"claude-opus-4.6": 1000000,
@@ -242,7 +238,6 @@ _URL_TO_PROVIDER: Dict[str, str] = {
"api.x.ai": "xai",
"api.xiaomimimo.com": "xiaomi",
"xiaomimimo.com": "xiaomi",
"ollama.com": "ollama-cloud",
}
@@ -1016,16 +1011,6 @@ def get_model_context_length(
if ctx:
return ctx
# 4b. AWS Bedrock — use static context length table.
# Bedrock's ListFoundationModels doesn't expose context window sizes,
# so we maintain a curated table in bedrock_adapter.py.
if provider == "bedrock" or (base_url and "bedrock-runtime" in base_url):
try:
from agent.bedrock_adapter import get_bedrock_context_length
return get_bedrock_context_length(model)
except ImportError:
pass # boto3 not installed — fall through to generic resolution
# 5. Provider-aware lookups (before generic OpenRouter cache)
# These are provider-specific and take priority over the generic OR cache,
# since the same model can have different context limits per provider
-1
View File
@@ -169,7 +169,6 @@ PROVIDER_TO_MODELS_DEV: Dict[str, str] = {
"togetherai": "togetherai",
"perplexity": "perplexity",
"cohere": "cohere",
"ollama-cloud": "ollama-cloud",
}
# Reverse mapping: models.dev → Hermes (built lazily)
-182
View File
@@ -1,182 +0,0 @@
"""Cross-session rate limit guard for Nous Portal.
Writes rate limit state to a shared file so all sessions (CLI, gateway,
cron, auxiliary) can check whether Nous Portal is currently rate-limited
before making requests. Prevents retry amplification when RPH is tapped.
Each 429 from Nous triggers up to 9 API calls per conversation turn
(3 SDK retries x 3 Hermes retries), and every one of those calls counts
against RPH. By recording the rate limit state on first 429 and checking
it before subsequent attempts, we eliminate the amplification effect.
"""
from __future__ import annotations
import json
import logging
import os
import tempfile
import time
from typing import Any, Mapping, Optional
logger = logging.getLogger(__name__)
_STATE_SUBDIR = "rate_limits"
_STATE_FILENAME = "nous.json"
def _state_path() -> str:
    """Return the path to the Nous rate limit state file."""
    try:
        from hermes_constants import get_hermes_home
        base = get_hermes_home()
    except ImportError:
        base = os.path.join(os.path.expanduser("~"), ".hermes")
    return os.path.join(base, _STATE_SUBDIR, _STATE_FILENAME)
def _parse_reset_seconds(headers: Optional[Mapping[str, str]]) -> Optional[float]:
    """Extract the best available reset-time estimate from response headers.

    Priority:
      1. x-ratelimit-reset-requests-1h (hourly RPH window; most useful)
      2. x-ratelimit-reset-requests (per-minute RPM window)
      3. retry-after (generic HTTP header)

    Returns seconds-from-now, or None if no usable header found.
    """
    if not headers:
        return None
    lowered = {k.lower(): v for k, v in headers.items()}
    for key in (
        "x-ratelimit-reset-requests-1h",
        "x-ratelimit-reset-requests",
        "retry-after",
    ):
        raw = lowered.get(key)
        if raw is not None:
            try:
                val = float(raw)
                if val > 0:
                    return val
            except (TypeError, ValueError):
                pass
    return None
def record_nous_rate_limit(
    *,
    headers: Optional[Mapping[str, str]] = None,
    error_context: Optional[dict[str, Any]] = None,
    default_cooldown: float = 300.0,
) -> None:
    """Record that Nous Portal is rate-limited.

    Parses the reset time from response headers or error context.
    Falls back to ``default_cooldown`` (5 minutes) if no reset info
    is available. Writes to a shared file that all sessions can read.

    Args:
        headers: HTTP response headers from the 429 error.
        error_context: Structured error context from _extract_api_error_context().
        default_cooldown: Fallback cooldown in seconds when no header data.
    """
    now = time.time()
    reset_at = None
    # Try headers first (most accurate)
    header_seconds = _parse_reset_seconds(headers)
    if header_seconds is not None:
        reset_at = now + header_seconds
    # Try error_context reset_at (from body parsing)
    if reset_at is None and isinstance(error_context, dict):
        ctx_reset = error_context.get("reset_at")
        if isinstance(ctx_reset, (int, float)) and ctx_reset > now:
            reset_at = float(ctx_reset)
    # Default cooldown
    if reset_at is None:
        reset_at = now + default_cooldown
    path = _state_path()
    try:
        state_dir = os.path.dirname(path)
        os.makedirs(state_dir, exist_ok=True)
        state = {
            "reset_at": reset_at,
            "recorded_at": now,
            "reset_seconds": reset_at - now,
        }
        # Atomic write: write to temp file + rename
        fd, tmp_path = tempfile.mkstemp(dir=state_dir, suffix=".tmp")
        try:
            with os.fdopen(fd, "w") as f:
                json.dump(state, f)
            os.replace(tmp_path, path)
        except Exception:
            # Clean up temp file on failure
            try:
                os.unlink(tmp_path)
            except OSError:
                pass
            raise
        logger.info(
            "Nous rate limit recorded: resets in %.0fs (at %.0f)",
            reset_at - now, reset_at,
        )
    except Exception as exc:
        logger.debug("Failed to write Nous rate limit state: %s", exc)
def nous_rate_limit_remaining() -> Optional[float]:
    """Check if Nous Portal is currently rate-limited.

    Returns:
        Seconds remaining until reset, or None if not rate-limited.
    """
    path = _state_path()
    try:
        with open(path) as f:
            state = json.load(f)
        reset_at = state.get("reset_at", 0)
        remaining = reset_at - time.time()
        if remaining > 0:
            return remaining
        # Expired — clean up
        try:
            os.unlink(path)
        except OSError:
            pass
        return None
    except (FileNotFoundError, json.JSONDecodeError, KeyError, TypeError):
        return None
def clear_nous_rate_limit() -> None:
    """Clear the rate limit state (e.g., after a successful Nous request)."""
    try:
        os.unlink(_state_path())
    except FileNotFoundError:
        pass
    except OSError as exc:
        logger.debug("Failed to clear Nous rate limit state: %s", exc)
def format_remaining(seconds: float) -> str:
    """Format seconds remaining into human-readable duration."""
    s = max(0, int(seconds))
    if s < 60:
        return f"{s}s"
    if s < 3600:
        m, sec = divmod(s, 60)
        return f"{m}m {sec}s" if sec else f"{m}m"
    h, remainder = divmod(s, 3600)
    m = remainder // 60
    return f"{h}h {m}m" if m else f"{h}h"
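The removed module above relies on a write-to-temp-then-rename pattern so that concurrent sessions never read a half-written state file. The following is a minimal standalone sketch of that pattern, not the project's code: `write_state_atomically`, `seconds_remaining`, and the demo directory are hypothetical names chosen for illustration.

```python
import json
import os
import tempfile
import time


def write_state_atomically(path: str, state: dict) -> None:
    """Write JSON to a temp file in the target directory, then rename it over `path`."""
    state_dir = os.path.dirname(path)
    os.makedirs(state_dir, exist_ok=True)
    # The temp file lives in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=state_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, path)
    except Exception:
        # Remove the temp file if anything went wrong, then re-raise.
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        raise


def seconds_remaining(path: str):
    """Return seconds until `reset_at`, or None if absent, unreadable, or expired."""
    try:
        with open(path) as f:
            reset_at = json.load(f).get("reset_at", 0)
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    remaining = reset_at - time.time()
    return remaining if remaining > 0 else None


# Hypothetical demo location; the real module derives its path from the Hermes home.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "rate_limits", "nous.json")
write_state_atomically(demo_path, {"reset_at": time.time() + 120})
remaining = seconds_remaining(demo_path)
```

A reader that opens the file either sees the previous complete state or the new complete state, which is presumably why the module renames instead of writing in place.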
+1 -3
View File
@@ -295,9 +295,7 @@ PLATFORM_HINTS = {
),
"telegram": (
"You are on a text messaging communication platform, Telegram. "
"Standard markdown is automatically converted to Telegram format. "
"Supported: **bold**, *italic*, ~~strikethrough~~, ||spoiler||, "
"`inline code`, ```code blocks```, [links](url), and ## headers. "
"Please do not use markdown as it does not render. "
"You can send media files natively: to deliver a file to the user, "
"include MEDIA:/absolute/path/to/file in your response. Images "
"(.png, .jpg, .webp) appear as photos, audio (.ogg) sends as voice "
-17
View File
@@ -93,17 +93,6 @@ _DB_CONNSTR_RE = re.compile(
re.IGNORECASE,
)
# JWT tokens: header.payload[.signature] — always start with "eyJ" (base64 for "{")
# Matches 1-part (header only), 2-part (header.payload), and full 3-part JWTs.
_JWT_RE = re.compile(
r"eyJ[A-Za-z0-9_-]{10,}" # Header (always starts with eyJ)
r"(?:\.[A-Za-z0-9_=-]{4,}){0,2}" # Optional payload and/or signature
)
# Discord user/role mentions: <@123456789012345678> or <@!123456789012345678>
# Snowflake IDs are 17-20 digit integers that resolve to specific Discord accounts.
_DISCORD_MENTION_RE = re.compile(r"<@!?(\d{17,20})>")
# E.164 phone numbers: +<country><number>, 7-15 digits
# Negative lookahead prevents matching hex strings or identifiers
_SIGNAL_PHONE_RE = re.compile(r"(\+[1-9]\d{6,14})(?![A-Za-z0-9])")
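As a quick illustration of how the E.164 pattern above behaves, the sketch below runs the same regular expression against a few strings. The helper `find_phones` and the sample inputs are illustrative assumptions, not part of the redaction module.

```python
import re

# Same pattern as in the hunk above: "+", a non-zero digit, then 6-14 more
# digits, not followed by a letter or digit.
_SIGNAL_PHONE_RE = re.compile(r"(\+[1-9]\d{6,14})(?![A-Za-z0-9])")


def find_phones(text: str) -> list:
    """Return every substring the pattern treats as a phone number."""
    return [m.group(1) for m in _SIGNAL_PHONE_RE.finditer(text)]


matches = find_phones("call +14155552671 or +442071838750.")
# The negative lookahead rejects digit runs that continue into an identifier.
identifier_like = find_phones("id +1234567abc")
# Fewer than seven digits after "+" is too short to match.
too_short = find_phones("+12345")
```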
@@ -170,12 +159,6 @@ def redact_sensitive_text(text: str) -> str:
# Database connection string passwords
text = _DB_CONNSTR_RE.sub(lambda m: f"{m.group(1)}***{m.group(3)}", text)
# JWT tokens (eyJ... — base64-encoded JSON headers)
text = _JWT_RE.sub(lambda m: _mask_token(m.group(0)), text)
# Discord user/role mentions (<@snowflake_id>)
text = _DISCORD_MENTION_RE.sub(lambda m: f"<@{'!' if '!' in m.group(0) else ''}***>", text)
# E.164 phone numbers (Signal, WhatsApp)
def _redact_phone(m):
phone = m.group(1)
+2 -11
View File
@@ -12,8 +12,6 @@ from datetime import datetime
from pathlib import Path
from typing import Any, Dict, Optional
from hermes_constants import display_hermes_home
logger = logging.getLogger(__name__)
_skill_commands: Dict[str, Dict[str, Any]] = {}
@@ -72,14 +70,7 @@ def _load_skill_payload(skill_identifier: str, task_id: str | None = None) -> tu
skill_name = str(loaded_skill.get("name") or normalized)
skill_path = str(loaded_skill.get("path") or "")
skill_dir = None
# Prefer the absolute skill_dir returned by skill_view() — this is
# correct for both local and external skills. Fall back to the old
# SKILLS_DIR-relative reconstruction only when skill_dir is absent
# (e.g. legacy skill_view responses).
abs_skill_dir = loaded_skill.get("skill_dir")
if abs_skill_dir:
skill_dir = Path(abs_skill_dir)
elif skill_path:
if skill_path:
try:
skill_dir = SKILLS_DIR / Path(skill_path).parent
except Exception:
@@ -117,7 +108,7 @@ def _inject_skill_config(loaded_skill: dict[str, Any], parts: list[str]) -> None
if not resolved:
return
lines = ["", f"[Skill config (from {display_hermes_home()}/config.yaml):"]
lines = ["", "[Skill config (from ~/.hermes/config.yaml):"]
for key, value in resolved.items():
display_val = str(value) if value else "(not set)"
lines.append(f" {key} = {display_val}")
-74
View File
@@ -284,80 +284,6 @@ _OFFICIAL_DOCS_PRICING: Dict[tuple[str, str], PricingEntry] = {
source_url="https://ai.google.dev/pricing",
pricing_version="google-pricing-2026-03-16",
),
# AWS Bedrock — pricing per the Bedrock pricing page.
# Bedrock charges the same per-token rates as the model provider but
# through AWS billing. These are the on-demand prices (no commitment).
# Source: https://aws.amazon.com/bedrock/pricing/
(
"bedrock",
"anthropic.claude-opus-4-6",
): PricingEntry(
input_cost_per_million=Decimal("15.00"),
output_cost_per_million=Decimal("75.00"),
source="official_docs_snapshot",
source_url="https://aws.amazon.com/bedrock/pricing/",
pricing_version="bedrock-pricing-2026-04",
),
(
"bedrock",
"anthropic.claude-sonnet-4-6",
): PricingEntry(
input_cost_per_million=Decimal("3.00"),
output_cost_per_million=Decimal("15.00"),
source="official_docs_snapshot",
source_url="https://aws.amazon.com/bedrock/pricing/",
pricing_version="bedrock-pricing-2026-04",
),
(
"bedrock",
"anthropic.claude-sonnet-4-5",
): PricingEntry(
input_cost_per_million=Decimal("3.00"),
output_cost_per_million=Decimal("15.00"),
source="official_docs_snapshot",
source_url="https://aws.amazon.com/bedrock/pricing/",
pricing_version="bedrock-pricing-2026-04",
),
(
"bedrock",
"anthropic.claude-haiku-4-5",
): PricingEntry(
input_cost_per_million=Decimal("0.80"),
output_cost_per_million=Decimal("4.00"),
source="official_docs_snapshot",
source_url="https://aws.amazon.com/bedrock/pricing/",
pricing_version="bedrock-pricing-2026-04",
),
(
"bedrock",
"amazon.nova-pro",
): PricingEntry(
input_cost_per_million=Decimal("0.80"),
output_cost_per_million=Decimal("3.20"),
source="official_docs_snapshot",
source_url="https://aws.amazon.com/bedrock/pricing/",
pricing_version="bedrock-pricing-2026-04",
),
(
"bedrock",
"amazon.nova-lite",
): PricingEntry(
input_cost_per_million=Decimal("0.06"),
output_cost_per_million=Decimal("0.24"),
source="official_docs_snapshot",
source_url="https://aws.amazon.com/bedrock/pricing/",
pricing_version="bedrock-pricing-2026-04",
),
(
"bedrock",
"amazon.nova-micro",
): PricingEntry(
input_cost_per_million=Decimal("0.035"),
output_cost_per_million=Decimal("0.14"),
source="official_docs_snapshot",
source_url="https://aws.amazon.com/bedrock/pricing/",
pricing_version="bedrock-pricing-2026-04",
),
}
+1 -4
View File
@@ -561,10 +561,7 @@ class BatchRunner:
provider_sort (str): Sort providers by price/throughput/latency (optional)
max_tokens (int): Maximum tokens for model responses (optional, uses model default if not set)
reasoning_config (Dict): OpenRouter reasoning config override (e.g. {"effort": "none"} to disable thinking)
prefill_messages (List[Dict]): Messages to prepend as prefilled conversation context (few-shot priming).
NOTE: Anthropic Sonnet 4.6+ and Opus 4.6+ reject a trailing assistant-role prefill
(400 error). For those models use output_config.format or structured-output
schemas instead. Safe here for user-role priming and for older Claude / non-Claude models.
prefill_messages (List[Dict]): Messages to prepend as prefilled conversation context (few-shot priming)
max_samples (int): Only process the first N samples from the dataset (optional, processes all if not set)
"""
self.dataset_file = Path(dataset_file)
+7 -15
View File
@@ -16,7 +16,7 @@ model:
# "nous" - Nous Portal OAuth (requires: hermes login)
# "nous-api" - Nous Portal API key (requires: NOUS_API_KEY)
# "anthropic" - Direct Anthropic API (requires: ANTHROPIC_API_KEY)
# "openai-codex" - OpenAI Codex (requires: hermes auth)
# "openai-codex" - OpenAI Codex (requires: hermes login --provider openai-codex)
# "copilot" - GitHub Copilot / GitHub Models (requires: GITHUB_TOKEN)
# "gemini" - Use Google AI Studio direct (requires: GOOGLE_API_KEY or GEMINI_API_KEY)
# "zai" - Use z.ai / ZhipuAI GLM models (requires: GLM_API_KEY)
@@ -26,7 +26,6 @@ model:
# "huggingface" - Hugging Face Inference (requires: HF_TOKEN)
# "xiaomi" - Xiaomi MiMo (requires: XIAOMI_API_KEY)
# "arcee" - Arcee AI Trinity models (requires: ARCEEAI_API_KEY)
# "ollama-cloud" - Ollama Cloud (requires: OLLAMA_API_KEY — https://ollama.com/settings)
# "kilocode" - KiloCode gateway (requires: KILOCODE_API_KEY)
# "ai-gateway" - Vercel AI Gateway (requires: AI_GATEWAY_API_KEY)
#
@@ -38,6 +37,12 @@ model:
# base_url: "http://localhost:1234/v1"
# No API key needed — local servers typically ignore auth.
#
# For Ollama Cloud (https://ollama.com/pricing):
# provider: "custom"
# base_url: "https://ollama.com/v1"
# Set OLLAMA_API_KEY in .env — automatically picked up when base_url
# points to ollama.com.
#
# Can also be overridden with --provider flag or HERMES_INFERENCE_PROVIDER env var.
provider: "auto"
@@ -332,7 +337,6 @@ compression:
# "openrouter" - Force OpenRouter (requires OPENROUTER_API_KEY)
# "nous" - Force Nous Portal (requires: hermes login)
# "gemini" - Force Google AI Studio direct (requires: GOOGLE_API_KEY or GEMINI_API_KEY)
# "ollama-cloud" - Ollama Cloud (requires: OLLAMA_API_KEY)
# "codex" - Force Codex OAuth (requires: hermes model → Codex).
# Uses gpt-5.3-codex which supports vision.
# "main" - Use your custom endpoint (OPENAI_BASE_URL + OPENAI_API_KEY).
@@ -560,18 +564,6 @@ platform_toolsets:
homeassistant: [hermes-homeassistant]
qqbot: [hermes-qqbot]
# =============================================================================
# Gateway Platform Settings
# =============================================================================
# Optional per-platform messaging settings.
# Platform-specific knobs live under `extra`.
#
# platforms:
# telegram:
# reply_to_mode: "first" # off | first | all
# extra:
# disable_link_previews: false # Set true to suppress Telegram URL previews in bot messages
# ─────────────────────────────────────────────────────────────────────────────
# Available toolsets (use these names in platform_toolsets or the toolsets list)
#
+45 -107
View File
@@ -401,27 +401,14 @@ def load_cli_config() -> Dict[str, Any]:
# filesystem is directly accessible. For ALL remote/container backends
# (ssh, docker, modal, singularity), the host path doesn't exist on the
# target -- remove the key so terminal_tool.py uses its per-backend default.
#
# GUARD: If TERMINAL_CWD is already set to a real absolute path (by the
# gateway's config bridge earlier in the process), don't clobber it.
# This prevents a lazy import of cli.py during gateway runtime from
# rewriting TERMINAL_CWD to the service's working directory.
# See issue #10817.
_CWD_PLACEHOLDERS = (".", "auto", "cwd")
if terminal_config.get("cwd") in _CWD_PLACEHOLDERS:
_existing_cwd = os.environ.get("TERMINAL_CWD", "")
if _existing_cwd and _existing_cwd not in _CWD_PLACEHOLDERS and os.path.isabs(_existing_cwd):
# Gateway (or earlier startup) already resolved a real path — keep it
terminal_config["cwd"] = _existing_cwd
defaults["terminal"]["cwd"] = _existing_cwd
if terminal_config.get("cwd") in (".", "auto", "cwd"):
effective_backend = terminal_config.get("env_type", "local")
if effective_backend == "local":
terminal_config["cwd"] = os.getcwd()
defaults["terminal"]["cwd"] = terminal_config["cwd"]
else:
effective_backend = terminal_config.get("env_type", "local")
if effective_backend == "local":
terminal_config["cwd"] = os.getcwd()
defaults["terminal"]["cwd"] = terminal_config["cwd"]
else:
# Remove so TERMINAL_CWD stays unset → tool picks backend default
terminal_config.pop("cwd", None)
# Remove so TERMINAL_CWD stays unset → tool picks backend default
terminal_config.pop("cwd", None)
env_mappings = {
"env_type": "TERMINAL_ENV",
@@ -1002,7 +989,6 @@ def _prune_orphaned_branches(repo_root: str) -> None:
_ACCENT_ANSI_DEFAULT = "\033[1;38;2;255;215;0m" # True-color #FFD700 bold — fallback
_BOLD = "\033[1m"
_RST = "\033[0m"
_STREAM_PAD = " " # 4-space indent for streamed response text (matches Panel padding)
def _hex_to_ansi(hex_color: str, *, bold: bool = False) -> str:
@@ -1726,9 +1712,9 @@ class HermesCLI:
# Parse and validate toolsets
self.enabled_toolsets = toolsets
if toolsets and "all" not in toolsets and "*" not in toolsets:
# Validate each toolset — MCP server names are resolved via
# live registry aliases (registered during discover_mcp_tools),
# but discovery hasn't run yet at this point, so exclude them.
# Validate each toolset — MCP server names are added by
# _get_platform_tools() but aren't registered in TOOLSETS yet
# (that happens later in _sync_mcp_toolsets), so exclude them.
mcp_names = set((CLI_CONFIG.get("mcp_servers") or {}).keys())
invalid = [t for t in toolsets if not validate_toolset(t) and t not in mcp_names]
if invalid:
@@ -2026,17 +2012,7 @@ class HermesCLI:
"""Return the visible height for the spinner/status text line above the status bar."""
if not getattr(self, "_spinner_text", ""):
return 0
if self._use_minimal_tui_chrome(width=width):
return 0
# Compute how many lines the spinner text needs when wrapped.
# The rendered text is " {emoji} {label} ({elapsed})" — about
# len(_spinner_text) + 16 chars for indent + timer suffix.
width = width or self._get_tui_terminal_width()
if width and width > 10:
import math
text_len = len(self._spinner_text) + 16 # indent + timer
return max(1, math.ceil(text_len / width))
return 1
return 0 if self._use_minimal_tui_chrome(width=width) else 1
def _get_voice_status_fragments(self, width: Optional[int] = None):
"""Return the voice status bar fragments for the interactive TUI."""
@@ -2604,7 +2580,7 @@ class HermesCLI:
_tc = getattr(self, "_stream_text_ansi", "")
while "\n" in self._stream_buf:
line, self._stream_buf = self._stream_buf.split("\n", 1)
_cprint(f"{_STREAM_PAD}{_tc}{line}{_RST}" if _tc else f"{_STREAM_PAD}{line}")
_cprint(f"{_tc}{line}{_RST}" if _tc else line)
def _flush_stream(self) -> None:
"""Emit any remaining partial line from the stream buffer and close the box."""
@@ -2621,7 +2597,7 @@ class HermesCLI:
if self._stream_buf:
_tc = getattr(self, "_stream_text_ansi", "")
_cprint(f"{_STREAM_PAD}{_tc}{self._stream_buf}{_RST}" if _tc else f"{_STREAM_PAD}{self._stream_buf}")
_cprint(f"{_tc}{self._stream_buf}{_RST}" if _tc else self._stream_buf)
self._stream_buf = ""
# Close the response box
@@ -3920,14 +3896,23 @@ class HermesCLI:
def _handle_profile_command(self):
"""Display active profile name and home directory."""
from hermes_constants import display_hermes_home
from hermes_cli.profiles import get_active_profile_name
from hermes_constants import get_hermes_home, display_hermes_home
home = get_hermes_home()
display = display_hermes_home()
profile_name = get_active_profile_name()
profiles_parent = Path.home() / ".hermes" / "profiles"
try:
rel = home.relative_to(profiles_parent)
profile_name = str(rel).split("/")[0]
except ValueError:
profile_name = None
print()
print(f" Profile: {profile_name}")
if profile_name:
print(f" Profile: {profile_name}")
else:
print(" Profile: default")
print(f" Home: {display}")
print()
@@ -4114,8 +4099,6 @@ class HermesCLI:
self.agent.flush_memories(self.conversation_history)
except (Exception, KeyboardInterrupt):
pass
# Trigger memory extraction on the old session before session_id rotates.
self.agent.commit_memory_session(self.conversation_history)
self._notify_session_boundary("on_session_finalize")
elif self.agent:
# First session or empty history — still finalize the old session
@@ -4604,19 +4587,16 @@ class HermesCLI:
self._close_model_picker()
return
provider_data = providers[selected]
# Use the curated model list from list_authenticated_providers()
# (same lists as `hermes model` and gateway pickers).
# Only fall back to the live provider catalog when the curated
# list is empty (e.g. user-defined endpoints with no curated list).
model_list = provider_data.get("models", [])
model_list = []
try:
from hermes_cli.models import provider_model_ids
live = provider_model_ids(provider_data["slug"])
if live:
model_list = live
except Exception:
pass
if not model_list:
try:
from hermes_cli.models import provider_model_ids
live = provider_model_ids(provider_data["slug"])
if live:
model_list = live
except Exception:
pass
model_list = provider_data.get("models", [])
state["stage"] = "model"
state["provider_data"] = provider_data
state["model_list"] = model_list
@@ -5507,8 +5487,7 @@ class HermesCLI:
version = f" v{p['version']}" if p["version"] else ""
tools = f"{p['tools']} tools" if p["tools"] else ""
hooks = f"{p['hooks']} hooks" if p["hooks"] else ""
commands = f"{p['commands']} commands" if p.get("commands") else ""
parts = [x for x in [tools, hooks, commands] if x]
parts = [x for x in [tools, hooks] if x]
detail = f" ({', '.join(parts)})" if parts else ""
error = f"{p['error']}" if p["error"] else ""
print(f" {status} {p['name']}{version}{detail}{error}")
@@ -5782,7 +5761,7 @@ class HermesCLI:
border_style=_resp_color,
style=_resp_text,
box=rich_box.HORIZONTALS,
padding=(1, 4),
padding=(1, 2),
))
else:
_cprint(" (No response generated)")
@@ -5906,7 +5885,7 @@ class HermesCLI:
title_align="left",
border_style=_resp_color,
box=rich_box.HORIZONTALS,
padding=(1, 4),
padding=(1, 2),
))
else:
_cprint(" 💬 /btw: (no response)")
@@ -5973,7 +5952,7 @@ class HermesCLI:
parts = cmd.strip().split(None, 1)
sub = parts[1].lower().strip() if len(parts) > 1 else "status"
_DEFAULT_CDP = "http://127.0.0.1:9222"
_DEFAULT_CDP = "http://localhost:9222"
current = os.environ.get("BROWSER_CDP_URL", "").strip()
if sub.startswith("connect"):
@@ -6220,21 +6199,13 @@ class HermesCLI:
def _toggle_yolo(self):
"""Toggle YOLO mode — skip all dangerous command approval prompts."""
import os
from hermes_cli.colors import Colors as _Colors
current = bool(os.environ.get("HERMES_YOLO_MODE"))
if current:
os.environ.pop("HERMES_YOLO_MODE", None)
_cprint(
f" ⚠ YOLO mode {_Colors.BOLD}{_Colors.RED}OFF{_Colors.RESET}"
" — dangerous commands will require approval."
)
self.console.print(" ⚠ YOLO mode [bold red]OFF[/] — dangerous commands will require approval.")
else:
os.environ["HERMES_YOLO_MODE"] = "1"
_cprint(
f" ⚡ YOLO mode {_Colors.BOLD}{_Colors.GREEN}ON{_Colors.RESET}"
" — all commands auto-approved. Use with caution."
)
self.console.print(" ⚡ YOLO mode [bold green]ON[/] — all commands auto-approved. Use with caution.")
def _handle_reasoning_command(self, cmd: str):
"""Handle /reasoning — manage effort level and display toggle.
@@ -7677,7 +7648,7 @@ class HermesCLI:
label = " ⚕ Hermes "
fill = w - 2 - len(label)
_cprint(f"\n{_ACCENT}╭─{label}{'' * max(fill - 1, 0)}{_RST}")
_cprint(f"{_STREAM_PAD}{sentence.rstrip()}")
_cprint(sentence.rstrip())
tts_thread = threading.Thread(
target=stream_tts_to_speaker,
@@ -7781,33 +7752,7 @@ class HermesCLI:
# Fallback for non-interactive mode (e.g., single-query)
agent_thread.join(0.1)
# Wait for the agent thread to finish. After an interrupt the
# agent may take a few seconds to clean up (kill subprocess, persist
# session). Poll instead of a blocking join so the process_loop
# stays responsive — if the user sent another interrupt or the
# agent gets stuck, we can break out instead of freezing forever.
if interrupt_msg is not None:
# Interrupt path: poll briefly, then move on. The agent
# thread is daemon — it dies on process exit regardless.
for _wait_tick in range(50): # 50 * 0.2s = 10s max
agent_thread.join(timeout=0.2)
if not agent_thread.is_alive():
break
# Check if user fired ANOTHER interrupt (Ctrl+C sets
# _should_exit which process_loop checks on next pass).
if getattr(self, '_should_exit', False):
break
if agent_thread.is_alive():
logger.warning(
"Agent thread still alive after interrupt "
"(thread %s). Daemon thread will be cleaned up "
"on exit.",
agent_thread.ident,
)
else:
# Normal completion: agent thread should be done already,
# but guard against edge cases.
agent_thread.join(timeout=30)
agent_thread.join() # Ensure agent thread completes
# Proactively clean up async clients whose event loop is dead.
# The agent thread may have created AsyncOpenAI clients bound
@@ -7934,7 +7879,7 @@ class HermesCLI:
border_style=_resp_color,
style=_resp_text,
box=rich_box.HORIZONTALS,
padding=(1, 4),
padding=(1, 2),
))
@@ -9100,7 +9045,6 @@ class HermesCLI:
spinner_widget = Window(
content=FormattedTextControl(get_spinner_text),
height=get_spinner_height,
wrap_lines=True,
)
spacer = Window(
@@ -10060,11 +10004,6 @@ def main(
):
cli.agent.quiet_mode = True
cli.agent.suppress_status_output = True
# Suppress streaming display callbacks so stdout stays
# machine-readable (no styled "Hermes" box, no tool-gen
# status lines). The response is printed once below.
cli.agent.stream_delta_callback = None
cli.agent.tool_gen_callback = None
result = cli.agent.run_conversation(
user_message=effective_query,
conversation_history=cli.conversation_history,
@@ -10072,8 +10011,7 @@ def main(
response = result.get("final_response", "") if isinstance(result, dict) else str(result)
if response:
print(response)
# Session ID goes to stderr so piped stdout is clean.
print(f"\nsession_id: {cli.session_id}", file=sys.stderr)
print(f"\nsession_id: {cli.session_id}")
# Ensure proper exit code for automation wrappers
sys.exit(1 if isinstance(result, dict) and result.get("failed") else 0)
-6
View File
@@ -501,12 +501,6 @@ def update_job(job_id: str, updates: Dict[str, Any]) -> Optional[Dict[str, Any]]
if schedule_changed:
updated_schedule = updated["schedule"]
# The API may pass schedule as a raw string (e.g. "every 10m")
# instead of a pre-parsed dict. Normalize it the same way
# create_job() does so downstream code can call .get() safely.
if isinstance(updated_schedule, str):
updated_schedule = parse_schedule(updated_schedule)
updated["schedule"] = updated_schedule
updated["schedule_display"] = updates.get(
"schedule_display",
updated_schedule.get("display", updated.get("schedule_display")),
+2 -19
View File
@@ -10,7 +10,6 @@ runs at a time if multiple processes overlap.
import asyncio
import concurrent.futures
import contextvars
import json
import logging
import os
@@ -289,13 +288,11 @@ def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> Option
if wrap_response:
task_name = job.get("name", job["id"])
job_id = job.get("id", "")
delivery_content = (
f"Cronjob Response: {task_name}\n"
f"(job_id: {job_id})\n"
f"-------------\n\n"
f"{content}\n\n"
f"To stop or manage this job, send me a new message (e.g. \"stop reminder {task_name}\")."
f"Note: The agent cannot see this message, and therefore cannot respond to it."
)
else:
delivery_content = content
@@ -771,11 +768,7 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
_cron_inactivity_limit = _cron_timeout if _cron_timeout > 0 else None
_POLL_INTERVAL = 5.0
_cron_pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
# Preserve scheduler-scoped ContextVar state (for example skill-declared
# env passthrough registrations) when the cron run hops into the worker
# thread used for inactivity timeout monitoring.
_cron_context = contextvars.copy_context()
_cron_future = _cron_pool.submit(_cron_context.run, agent.run_conversation, prompt)
_cron_future = _cron_pool.submit(agent.run_conversation, prompt)
_inactivity_timeout = False
try:
if _cron_inactivity_limit is None:
@@ -837,9 +830,6 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
)
final_response = result.get("final_response", "") or ""
# Strip leaked placeholder text that upstream may inject on empty completions.
if final_response.strip() == "(No response generated)":
final_response = ""
# Use a separate variable for log display; keep final_response clean
# for delivery logic (empty response = no delivery).
logged_response = final_response if final_response else "(No response generated)"
@@ -979,13 +969,6 @@ def tick(verbose: bool = True, adapters=None, loop=None) -> int:
delivery_error = str(de)
logger.error("Delivery failed for job %s: %s", job["id"], de)
# Treat empty final_response as a soft failure so last_status
# is not "ok" — the agent ran but produced nothing useful.
# (issue #8585)
if success and not final_response:
success = False
error = "Agent completed but produced empty response (model error, timeout, or misconfiguration)"
mark_job_run(job["id"], success, error, delivery_error=delivery_error)
executed += 1
Executable → Regular
+6 -13
View File
@@ -1,14 +1,13 @@
#!/bin/bash
# Docker/Podman entrypoint: bootstrap config files into the mounted volume, then run hermes.
# Docker entrypoint: bootstrap config files into the mounted volume, then run hermes.
set -e
HERMES_HOME="${HERMES_HOME:-/opt/data}"
HERMES_HOME="/opt/data"
INSTALL_DIR="/opt/hermes"
# --- Privilege dropping via gosu ---
# When started as root (the default for Docker, or fakeroot in rootless Podman),
# optionally remap the hermes user/group to match host-side ownership, fix volume
# permissions, then re-exec as hermes.
# When started as root (the default), optionally remap the hermes user/group
# to match host-side ownership, fix volume permissions, then re-exec as hermes.
if [ "$(id -u)" = "0" ]; then
if [ -n "$HERMES_UID" ] && [ "$HERMES_UID" != "$(id -u hermes)" ]; then
echo "Changing hermes UID to $HERMES_UID"
@@ -17,19 +16,13 @@ if [ "$(id -u)" = "0" ]; then
if [ -n "$HERMES_GID" ] && [ "$HERMES_GID" != "$(id -g hermes)" ]; then
echo "Changing hermes GID to $HERMES_GID"
# -o allows non-unique GID (e.g. macOS GID 20 "staff" may already exist
# as "dialout" in the Debian-based container image)
groupmod -o -g "$HERMES_GID" hermes 2>/dev/null || true
groupmod -g "$HERMES_GID" hermes
fi
actual_hermes_uid=$(id -u hermes)
if [ "$(stat -c %u "$HERMES_HOME" 2>/dev/null)" != "$actual_hermes_uid" ]; then
echo "$HERMES_HOME is not owned by $actual_hermes_uid, fixing"
# In rootless Podman the container's "root" is mapped to an unprivileged
# host UID — chown will fail. That's fine: the volume is already owned
# by the mapped user on the host side.
chown -R hermes:hermes "$HERMES_HOME" 2>/dev/null || \
echo "Warning: chown failed (rootless container?) — continuing anyway"
chown -R hermes:hermes "$HERMES_HOME"
fi
echo "Dropping root privileges"
-18
View File
@@ -554,12 +554,6 @@ def load_gateway_config() -> GatewayConfig:
bridged["mention_patterns"] = platform_cfg["mention_patterns"]
if plat == Platform.DISCORD and "channel_skill_bindings" in platform_cfg:
bridged["channel_skill_bindings"] = platform_cfg["channel_skill_bindings"]
if "channel_prompts" in platform_cfg:
channel_prompts = platform_cfg["channel_prompts"]
if isinstance(channel_prompts, dict):
bridged["channel_prompts"] = {str(k): v for k, v in channel_prompts.items()}
else:
bridged["channel_prompts"] = channel_prompts
if not bridged:
continue
plat_data = platforms_data.setdefault(plat.value, {})
@@ -638,18 +632,6 @@ def load_gateway_config() -> GatewayConfig:
os.environ["TELEGRAM_IGNORED_THREADS"] = str(ignored_threads)
if "reactions" in telegram_cfg and not os.getenv("TELEGRAM_REACTIONS"):
os.environ["TELEGRAM_REACTIONS"] = str(telegram_cfg["reactions"]).lower()
if "proxy_url" in telegram_cfg and not os.getenv("TELEGRAM_PROXY"):
os.environ["TELEGRAM_PROXY"] = str(telegram_cfg["proxy_url"]).strip()
if "disable_link_previews" in telegram_cfg:
plat_data = platforms_data.setdefault(Platform.TELEGRAM.value, {})
if not isinstance(plat_data, dict):
plat_data = {}
platforms_data[Platform.TELEGRAM.value] = plat_data
extra = plat_data.setdefault("extra", {})
if not isinstance(extra, dict):
extra = {}
plat_data["extra"] = extra
extra["disable_link_previews"] = telegram_cfg["disable_link_previews"]
whatsapp_cfg = yaml_cfg.get("whatsapp", {})
if isinstance(whatsapp_cfg, dict):
+25 -1
@@ -3,11 +3,12 @@ Event Hook System
A lightweight event-driven system that fires handlers at key lifecycle points.
Hooks are discovered from ~/.hermes/hooks/ directories, each containing:
- HOOK.yaml (metadata: name, description, events list)
- HOOK.yaml (metadata: name, description, events list, optional startup_readiness)
- handler.py (Python handler with async def handle(event_type, context))
Events:
- gateway:startup -- Gateway process starts
- gateway:shutdown -- Gateway process is shutting down
- session:start -- New session created (first message of a new session)
- session:end -- Session ends (user ran /new or /reset)
- session:reset -- Session reset completed (new session entry created)
@@ -31,6 +32,26 @@ from hermes_cli.config import get_hermes_home
HOOKS_DIR = get_hermes_home() / "hooks"
def _normalize_startup_readiness(hook_name: str, manifest: dict[str, Any]) -> Optional[dict[str, Any]]:
"""Validate and normalize optional startup readiness metadata."""
readiness = manifest.get("startup_readiness")
if readiness is None:
return None
if not isinstance(readiness, dict):
print(f"[hooks] Ignoring startup_readiness for {hook_name}: expected mapping", flush=True)
return None
check_id = str(readiness.get("id", "")).strip()
if not check_id:
print(f"[hooks] Ignoring startup_readiness for {hook_name}: missing id", flush=True)
return None
return {
"id": check_id,
"required": bool(readiness.get("required", True)),
}
class HookRegistry:
"""
Discovers, loads, and fires event hooks.
@@ -62,6 +83,7 @@ class HookRegistry:
"description": "Run ~/.hermes/BOOT.md on gateway startup",
"events": ["gateway:startup"],
"path": "(builtin)",
"startup_readiness": None,
})
except Exception as e:
print(f"[hooks] Could not load built-in boot-md hook: {e}", flush=True)
@@ -102,6 +124,7 @@ class HookRegistry:
if not events:
print(f"[hooks] Skipping {hook_name}: no events declared", flush=True)
continue
startup_readiness = _normalize_startup_readiness(hook_name, manifest)
# Dynamically load the handler module
spec = importlib.util.spec_from_file_location(
@@ -128,6 +151,7 @@ class HookRegistry:
"description": manifest.get("description", ""),
"events": events,
"path": str(hook_dir),
"startup_readiness": startup_readiness,
})
print(f"[hooks] Loaded hook '{hook_name}' for events: {events}", flush=True)
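The validation rules in `_normalize_startup_readiness` above can be exercised in isolation. The following is a minimal sketch that copies the helper's logic into a standalone function; the hook names and manifest dicts are hypothetical stand-ins for what a parsed HOOK.yaml might contain, not values from the repository.

```python
from typing import Any, Optional


def normalize_startup_readiness(hook_name: str, manifest: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Standalone copy of the helper's logic, for illustration only."""
    readiness = manifest.get("startup_readiness")
    if readiness is None:
        return None  # key absent: the hook declares no readiness check
    if not isinstance(readiness, dict):
        print(f"[hooks] Ignoring startup_readiness for {hook_name}: expected mapping", flush=True)
        return None
    check_id = str(readiness.get("id", "")).strip()
    if not check_id:
        print(f"[hooks] Ignoring startup_readiness for {hook_name}: missing id", flush=True)
        return None
    # "required" defaults to True when omitted.
    return {"id": check_id, "required": bool(readiness.get("required", True))}


# Hypothetical manifests (assumed shapes, as if loaded from HOOK.yaml).
ok = normalize_startup_readiness("db-warmup", {"startup_readiness": {"id": " db ", "required": False}})
default_required = normalize_startup_readiness("cache", {"startup_readiness": {"id": "cache"}})
not_mapping = normalize_startup_readiness("bad", {"startup_readiness": "db"})
missing_id = normalize_startup_readiness("bad", {"startup_readiness": {"required": True}})
absent = normalize_startup_readiness("plain", {})
```

Because `required` is coerced with `bool()`, a quoted string such as `"false"` in the manifest would evaluate as true; an unquoted YAML boolean is not affected.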
+5 -514
@@ -515,8 +515,6 @@ class APIServerAdapter(BasePlatformAdapter):
session_id: Optional[str] = None,
stream_delta_callback=None,
tool_progress_callback=None,
tool_start_callback=None,
tool_complete_callback=None,
) -> Any:
"""
Create an AIAgent instance using the gateway's runtime config.
@@ -555,8 +553,6 @@ class APIServerAdapter(BasePlatformAdapter):
platform="api_server",
stream_delta_callback=stream_delta_callback,
tool_progress_callback=tool_progress_callback,
tool_start_callback=tool_start_callback,
tool_complete_callback=tool_complete_callback,
session_db=self._ensure_session_db(),
fallback_model=fallback_model,
)
@@ -902,7 +898,7 @@ class APIServerAdapter(BasePlatformAdapter):
return time.monotonic()
# Stream content chunks as they arrive from the agent
loop = asyncio.get_running_loop()
loop = asyncio.get_event_loop()
while True:
try:
delta = await loop.run_in_executor(None, lambda: stream_q.get(timeout=0.5))
@@ -969,427 +965,6 @@ class APIServerAdapter(BasePlatformAdapter):
return response
async def _write_sse_responses(
self,
request: "web.Request",
response_id: str,
model: str,
created_at: int,
stream_q,
agent_task,
agent_ref,
conversation_history: List[Dict[str, str]],
user_message: str,
instructions: Optional[str],
conversation: Optional[str],
store: bool,
session_id: str,
) -> "web.StreamResponse":
"""Write an SSE stream for POST /v1/responses (OpenAI Responses API).
Emits spec-compliant event types as the agent runs:
- ``response.created`` initial envelope (status=in_progress)
- ``response.output_text.delta`` / ``response.output_text.done``
streamed assistant text
- ``response.output_item.added`` / ``response.output_item.done``
with ``item.type == "function_call"`` when the agent invokes a
tool (both events fire; the ``done`` event carries the finalized
``arguments`` string)
- ``response.output_item.added`` with
``item.type == "function_call_output"`` tool result with
``{call_id, output, status}``
- ``response.completed`` terminal event carrying the full
response object with all output items + usage (same payload
shape as the non-streaming path for parity)
- ``response.failed`` terminal event on agent error
If the client disconnects mid-stream, ``agent.interrupt()`` is
called so the agent stops issuing upstream LLM calls, then the
asyncio task is cancelled. When ``store=True`` the full response
is persisted to the ResponseStore in a ``finally`` block so GET
/v1/responses/{id} and ``previous_response_id`` chaining work the
same as the batch path.
"""
import queue as _q
sse_headers = {
"Content-Type": "text/event-stream",
"Cache-Control": "no-cache",
"X-Accel-Buffering": "no",
}
origin = request.headers.get("Origin", "")
cors = self._cors_headers_for_origin(origin) if origin else None
if cors:
sse_headers.update(cors)
if session_id:
sse_headers["X-Hermes-Session-Id"] = session_id
response = web.StreamResponse(status=200, headers=sse_headers)
await response.prepare(request)
# State accumulated during the stream
final_text_parts: List[str] = []
# Track open function_call items by name so we can emit a matching
# ``done`` event when the tool completes. Order preserved.
pending_tool_calls: List[Dict[str, Any]] = []
# Output items we've emitted so far (used to build the terminal
# response.completed payload). Kept in the order they appeared.
emitted_items: List[Dict[str, Any]] = []
# Monotonic counter for output_index (spec requires it).
output_index = 0
# Monotonic counter for call_id generation if the agent doesn't
# provide one (it doesn't, from tool_progress_callback).
call_counter = 0
# Canonical Responses SSE events include a monotonically increasing
# sequence_number. Add it server-side for every emitted event so
# clients that validate the OpenAI event schema can parse our stream.
sequence_number = 0
# Track the assistant message item id + content index for text
# delta events — the spec ties deltas to a specific item.
message_item_id = f"msg_{uuid.uuid4().hex[:24]}"
message_output_index: Optional[int] = None
message_opened = False
async def _write_event(event_type: str, data: Dict[str, Any]) -> None:
nonlocal sequence_number
if "sequence_number" not in data:
data["sequence_number"] = sequence_number
sequence_number += 1
payload = f"event: {event_type}\ndata: {json.dumps(data)}\n\n"
await response.write(payload.encode())
def _envelope(status: str) -> Dict[str, Any]:
env: Dict[str, Any] = {
"id": response_id,
"object": "response",
"status": status,
"created_at": created_at,
"model": model,
}
return env
final_response_text = ""
agent_error: Optional[str] = None
usage: Dict[str, int] = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
try:
# response.created — initial envelope, status=in_progress
created_env = _envelope("in_progress")
created_env["output"] = []
await _write_event("response.created", {
"type": "response.created",
"response": created_env,
})
last_activity = time.monotonic()
async def _open_message_item() -> None:
"""Emit response.output_item.added for the assistant message
the first time any text delta arrives."""
nonlocal message_opened, message_output_index, output_index
if message_opened:
return
message_opened = True
message_output_index = output_index
output_index += 1
item = {
"id": message_item_id,
"type": "message",
"status": "in_progress",
"role": "assistant",
"content": [],
}
await _write_event("response.output_item.added", {
"type": "response.output_item.added",
"output_index": message_output_index,
"item": item,
})
async def _emit_text_delta(delta_text: str) -> None:
await _open_message_item()
final_text_parts.append(delta_text)
await _write_event("response.output_text.delta", {
"type": "response.output_text.delta",
"item_id": message_item_id,
"output_index": message_output_index,
"content_index": 0,
"delta": delta_text,
"logprobs": [],
})
async def _emit_tool_started(payload: Dict[str, Any]) -> str:
"""Emit response.output_item.added for a function_call.
Returns the call_id so the matching completion event can
reference it. Prefer the real ``tool_call_id`` from the
agent when available; fall back to a generated call id for
safety in tests or older code paths.
"""
nonlocal output_index, call_counter
call_counter += 1
call_id = payload.get("tool_call_id") or f"call_{response_id[5:]}_{call_counter}"
args = payload.get("arguments", {})
if isinstance(args, dict):
arguments_str = json.dumps(args)
else:
arguments_str = str(args)
item = {
"id": f"fc_{uuid.uuid4().hex[:24]}",
"type": "function_call",
"status": "in_progress",
"name": payload.get("name", ""),
"call_id": call_id,
"arguments": arguments_str,
}
idx = output_index
output_index += 1
pending_tool_calls.append({
"call_id": call_id,
"name": payload.get("name", ""),
"arguments": arguments_str,
"item_id": item["id"],
"output_index": idx,
})
emitted_items.append({
"type": "function_call",
"name": payload.get("name", ""),
"arguments": arguments_str,
"call_id": call_id,
})
await _write_event("response.output_item.added", {
"type": "response.output_item.added",
"output_index": idx,
"item": item,
})
return call_id
async def _emit_tool_completed(payload: Dict[str, Any]) -> None:
"""Emit response.output_item.done (function_call) followed
by response.output_item.added (function_call_output)."""
nonlocal output_index
call_id = payload.get("tool_call_id")
result = payload.get("result", "")
pending = None
if call_id:
for i, p in enumerate(pending_tool_calls):
if p["call_id"] == call_id:
pending = pending_tool_calls.pop(i)
break
if pending is None:
# Completion without a matching start — skip to avoid
# emitting orphaned done events.
return
# function_call done
done_item = {
"id": pending["item_id"],
"type": "function_call",
"status": "completed",
"name": pending["name"],
"call_id": pending["call_id"],
"arguments": pending["arguments"],
}
await _write_event("response.output_item.done", {
"type": "response.output_item.done",
"output_index": pending["output_index"],
"item": done_item,
})
# function_call_output added (result)
result_str = result if isinstance(result, str) else json.dumps(result)
output_parts = [{"type": "input_text", "text": result_str}]
output_item = {
"id": f"fco_{uuid.uuid4().hex[:24]}",
"type": "function_call_output",
"call_id": pending["call_id"],
"output": output_parts,
"status": "completed",
}
idx = output_index
output_index += 1
emitted_items.append({
"type": "function_call_output",
"call_id": pending["call_id"],
"output": output_parts,
})
await _write_event("response.output_item.added", {
"type": "response.output_item.added",
"output_index": idx,
"item": output_item,
})
await _write_event("response.output_item.done", {
"type": "response.output_item.done",
"output_index": idx,
"item": output_item,
})
# Main drain loop — thread-safe queue fed by agent callbacks.
async def _dispatch(it) -> None:
"""Route a queue item to the correct SSE emitter.
Plain strings are text deltas. Tagged tuples with
``__tool_started__`` / ``__tool_completed__`` prefixes
are tool lifecycle events.
"""
if isinstance(it, tuple) and len(it) == 2 and isinstance(it[0], str):
tag, payload = it
if tag == "__tool_started__":
await _emit_tool_started(payload)
elif tag == "__tool_completed__":
await _emit_tool_completed(payload)
# Unknown tags are silently ignored (forward-compat).
elif isinstance(it, str):
await _emit_text_delta(it)
# Other types (non-string, non-tuple) are silently dropped.
loop = asyncio.get_running_loop()
while True:
try:
item = await loop.run_in_executor(None, lambda: stream_q.get(timeout=0.5))
except _q.Empty:
if agent_task.done():
# Drain remaining
while True:
try:
item = stream_q.get_nowait()
if item is None:
break
await _dispatch(item)
last_activity = time.monotonic()
except _q.Empty:
break
break
if time.monotonic() - last_activity >= CHAT_COMPLETIONS_SSE_KEEPALIVE_SECONDS:
await response.write(b": keepalive\n\n")
last_activity = time.monotonic()
continue
if item is None: # EOS sentinel
break
await _dispatch(item)
last_activity = time.monotonic()
# Pick up agent result + usage from the completed task
try:
result, agent_usage = await agent_task
usage = agent_usage or usage
# If the agent produced a final_response but no text
# deltas were streamed (e.g. some providers only emit
# the full response at the end), emit a single fallback
# delta so Responses clients still receive a live text part.
agent_final = result.get("final_response", "") if isinstance(result, dict) else ""
if agent_final and not final_text_parts:
await _emit_text_delta(agent_final)
if agent_final and not final_response_text:
final_response_text = agent_final
if isinstance(result, dict) and result.get("error") and not final_response_text:
agent_error = result["error"]
except Exception as e: # noqa: BLE001
logger.error("Error running agent for streaming responses: %s", e, exc_info=True)
agent_error = str(e)
# Close the message item if it was opened
final_response_text = "".join(final_text_parts) or final_response_text
if message_opened:
await _write_event("response.output_text.done", {
"type": "response.output_text.done",
"item_id": message_item_id,
"output_index": message_output_index,
"content_index": 0,
"text": final_response_text,
"logprobs": [],
})
msg_done_item = {
"id": message_item_id,
"type": "message",
"status": "completed",
"role": "assistant",
"content": [
{"type": "output_text", "text": final_response_text}
],
}
await _write_event("response.output_item.done", {
"type": "response.output_item.done",
"output_index": message_output_index,
"item": msg_done_item,
})
# Always append a final message item in the completed
# response envelope so clients that only parse the terminal
# payload still see the assistant text. This mirrors the
# shape produced by _extract_output_items in the batch path.
final_items: List[Dict[str, Any]] = list(emitted_items)
final_items.append({
"type": "message",
"role": "assistant",
"content": [
{"type": "output_text", "text": final_response_text or (agent_error or "")}
],
})
if agent_error:
failed_env = _envelope("failed")
failed_env["output"] = final_items
failed_env["error"] = {"message": agent_error, "type": "server_error"}
failed_env["usage"] = {
"input_tokens": usage.get("input_tokens", 0),
"output_tokens": usage.get("output_tokens", 0),
"total_tokens": usage.get("total_tokens", 0),
}
await _write_event("response.failed", {
"type": "response.failed",
"response": failed_env,
})
else:
completed_env = _envelope("completed")
completed_env["output"] = final_items
completed_env["usage"] = {
"input_tokens": usage.get("input_tokens", 0),
"output_tokens": usage.get("output_tokens", 0),
"total_tokens": usage.get("total_tokens", 0),
}
await _write_event("response.completed", {
"type": "response.completed",
"response": completed_env,
})
# Persist for future chaining / GET retrieval, mirroring
# the batch path behavior.
if store:
full_history = list(conversation_history)
full_history.append({"role": "user", "content": user_message})
if isinstance(result, dict) and result.get("messages"):
full_history.extend(result["messages"])
else:
full_history.append({"role": "assistant", "content": final_response_text})
self._response_store.put(response_id, {
"response": completed_env,
"conversation_history": full_history,
"instructions": instructions,
"session_id": session_id,
})
if conversation:
self._response_store.set_conversation(conversation, response_id)
except (ConnectionResetError, ConnectionAbortedError, BrokenPipeError, OSError):
# Client disconnected — interrupt the agent so it stops
# making upstream LLM calls, then cancel the task.
agent = agent_ref[0] if agent_ref else None
if agent is not None:
try:
agent.interrupt("SSE client disconnected")
except Exception:
pass
if not agent_task.done():
agent_task.cancel()
try:
await agent_task
except (asyncio.CancelledError, Exception):
pass
logger.info("SSE client disconnected; interrupted agent task %s", response_id)
return response
async def _handle_responses(self, request: "web.Request") -> "web.Response":
"""POST /v1/responses — OpenAI Responses API format."""
auth_err = self._check_auth(request)
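The `_write_event` helper in the `_write_sse_responses` hunk above frames each event as an `event:` line, a JSON `data:` line, and a blank line, stamping a `sequence_number` on the payload. A minimal sketch of that framing without the network write follows. The class name `SSEEventFramer` is hypothetical, and incrementing the counter on every call is an assumption, since indentation is lost in this view.

```python
import json
from typing import Any, Dict


class SSEEventFramer:
    """Hypothetical helper that mirrors _write_event's framing, minus response.write()."""

    def __init__(self) -> None:
        self.sequence_number = 0

    def frame(self, event_type: str, data: Dict[str, Any]) -> bytes:
        # Copy instead of mutating the caller's dict (the original mutates in place).
        if "sequence_number" not in data:
            data = {**data, "sequence_number": self.sequence_number}
        # Assumption: the counter advances once per emitted event.
        self.sequence_number += 1
        payload = f"event: {event_type}\ndata: {json.dumps(data)}\n\n"
        return payload.encode()


framer = SSEEventFramer()
first = framer.frame("response.created", {"type": "response.created"})
second = framer.frame("response.output_text.delta", {"type": "response.output_text.delta", "delta": "Hi"})
```

Keeping the counter inside one object makes it easy to check that every event in a stream carries a strictly increasing `sequence_number`.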
@@ -1460,13 +1035,11 @@ class APIServerAdapter(BasePlatformAdapter):
if previous_response_id:
logger.debug("Both conversation_history and previous_response_id provided; using conversation_history")
stored_session_id = None
if not conversation_history and previous_response_id:
stored = self._response_store.get(previous_response_id)
if stored is None:
return web.json_response(_openai_error(f"Previous response not found: {previous_response_id}"), status=404)
conversation_history = list(stored.get("conversation_history", []))
stored_session_id = stored.get("session_id")
# If no instructions provided, carry forward from previous
if instructions is None:
instructions = stored.get("instructions")
@@ -1484,83 +1057,8 @@ class APIServerAdapter(BasePlatformAdapter):
if body.get("truncation") == "auto" and len(conversation_history) > 100:
conversation_history = conversation_history[-100:]
# Reuse session from previous_response_id chain so the dashboard
# groups the entire conversation under one session entry.
session_id = stored_session_id or str(uuid.uuid4())
stream = bool(body.get("stream", False))
if stream:
# Streaming branch — emit OpenAI Responses SSE events as the
# agent runs so frontends can render text deltas and tool
# calls in real time. See _write_sse_responses for details.
import queue as _q
_stream_q: _q.Queue = _q.Queue()
def _on_delta(delta):
# None from the agent is a CLI box-close signal, not EOS.
# Forwarding would kill the SSE stream prematurely; the
# SSE writer detects completion via agent_task.done().
if delta is not None:
_stream_q.put(delta)
def _on_tool_progress(event_type, name, preview, args, **kwargs):
"""Queue non-start tool progress events if needed in future.
The structured Responses stream uses ``tool_start_callback``
and ``tool_complete_callback`` for exact call-id correlation,
so progress events are currently ignored here.
"""
return
def _on_tool_start(tool_call_id, function_name, function_args):
"""Queue a started tool for live function_call streaming."""
_stream_q.put(("__tool_started__", {
"tool_call_id": tool_call_id,
"name": function_name,
"arguments": function_args or {},
}))
def _on_tool_complete(tool_call_id, function_name, function_args, function_result):
"""Queue a completed tool result for live function_call_output streaming."""
_stream_q.put(("__tool_completed__", {
"tool_call_id": tool_call_id,
"name": function_name,
"arguments": function_args or {},
"result": function_result,
}))
agent_ref = [None]
agent_task = asyncio.ensure_future(self._run_agent(
user_message=user_message,
conversation_history=conversation_history,
ephemeral_system_prompt=instructions,
session_id=session_id,
stream_delta_callback=_on_delta,
tool_progress_callback=_on_tool_progress,
tool_start_callback=_on_tool_start,
tool_complete_callback=_on_tool_complete,
agent_ref=agent_ref,
))
response_id = f"resp_{uuid.uuid4().hex[:28]}"
model_name = body.get("model", self._model_name)
created_at = int(time.time())
return await self._write_sse_responses(
request=request,
response_id=response_id,
model=model_name,
created_at=created_at,
stream_q=_stream_q,
agent_task=agent_task,
agent_ref=agent_ref,
conversation_history=conversation_history,
user_message=user_message,
instructions=instructions,
conversation=conversation,
store=store,
session_id=session_id,
)
# Run the agent (with Idempotency-Key support)
session_id = str(uuid.uuid4())
async def _compute_response():
return await self._run_agent(
@@ -1635,7 +1133,6 @@ class APIServerAdapter(BasePlatformAdapter):
"response": response_data,
"conversation_history": full_history,
"instructions": instructions,
"session_id": session_id,
})
# Update conversation mapping so the next request with the same
# conversation name automatically chains to this response
@@ -1989,8 +1486,6 @@ class APIServerAdapter(BasePlatformAdapter):
session_id: Optional[str] = None,
stream_delta_callback=None,
tool_progress_callback=None,
tool_start_callback=None,
tool_complete_callback=None,
agent_ref: Optional[list] = None,
) -> tuple:
"""
@@ -2004,7 +1499,7 @@ class APIServerAdapter(BasePlatformAdapter):
callers (e.g. the SSE writer) to call ``agent.interrupt()`` from
another thread to stop in-progress LLM calls.
"""
loop = asyncio.get_running_loop()
loop = asyncio.get_event_loop()
def _run():
agent = self._create_agent(
@@ -2012,8 +1507,6 @@ class APIServerAdapter(BasePlatformAdapter):
session_id=session_id,
stream_delta_callback=stream_delta_callback,
tool_progress_callback=tool_progress_callback,
tool_start_callback=tool_start_callback,
tool_complete_callback=tool_complete_callback,
)
if agent_ref is not None:
agent_ref[0] = agent
@@ -2150,12 +1643,10 @@ class APIServerAdapter(BasePlatformAdapter):
if previous_response_id:
logger.debug("Both conversation_history and previous_response_id provided; using conversation_history")
stored_session_id = None
if not conversation_history and previous_response_id:
stored = self._response_store.get(previous_response_id)
if stored:
conversation_history = list(stored.get("conversation_history", []))
stored_session_id = stored.get("session_id")
if instructions is None:
instructions = stored.get("instructions")
@@ -2174,7 +1665,7 @@ class APIServerAdapter(BasePlatformAdapter):
)
conversation_history.append({"role": msg["role"], "content": str(content)})
session_id = body.get("session_id") or stored_session_id or run_id
session_id = body.get("session_id") or run_id
ephemeral_system_prompt = instructions
async def _run_and_close():
+12 -105
@@ -682,10 +682,6 @@ class MessageEvent:
# Auto-loaded skill(s) for topic/channel bindings (e.g., Telegram DM Topics,
# Discord channel_skill_bindings). A single name or ordered list.
auto_skill: Optional[str | list[str]] = None
# Per-channel ephemeral system prompt (e.g. Discord channel_prompts).
# Applied at API call time and never persisted to transcript history.
channel_prompt: Optional[str] = None
# Internal flag — set for synthetic events (e.g. background process
# completion notifications) that must bypass user authorization checks.
@@ -734,56 +730,25 @@ def merge_pending_message_event(
pending_messages: Dict[str, MessageEvent],
session_key: str,
event: MessageEvent,
*,
merge_text: bool = False,
) -> None:
"""Store or merge a pending event for a session.
Photo bursts/albums often arrive as multiple near-simultaneous PHOTO
events. Merge those into the existing queued event so the next turn sees
the whole burst.
When ``merge_text`` is enabled, rapid follow-up TEXT events are appended
instead of replacing the pending turn. This is used for Telegram bursty
follow-ups so a multi-part user thought is not silently truncated to only
the last queued fragment.
the whole burst, while non-photo follow-ups still replace the pending
event normally.
"""
existing = pending_messages.get(session_key)
if existing:
existing_is_photo = getattr(existing, "message_type", None) == MessageType.PHOTO
incoming_is_photo = event.message_type == MessageType.PHOTO
existing_has_media = bool(existing.media_urls)
incoming_has_media = bool(event.media_urls)
if existing_is_photo and incoming_is_photo:
existing.media_urls.extend(event.media_urls)
existing.media_types.extend(event.media_types)
if event.text:
existing.text = BasePlatformAdapter._merge_caption(existing.text, event.text)
return
if existing_has_media or incoming_has_media:
if incoming_has_media:
existing.media_urls.extend(event.media_urls)
existing.media_types.extend(event.media_types)
if event.text:
if existing.text:
existing.text = BasePlatformAdapter._merge_caption(existing.text, event.text)
else:
existing.text = event.text
if existing_is_photo or incoming_is_photo:
existing.message_type = MessageType.PHOTO
return
if (
merge_text
and getattr(existing, "message_type", None) == MessageType.TEXT
and event.message_type == MessageType.TEXT
):
if event.text:
existing.text = f"{existing.text}\n{event.text}" if existing.text else event.text
return
if (
existing
and getattr(existing, "message_type", None) == MessageType.PHOTO
and event.message_type == MessageType.PHOTO
):
existing.media_urls.extend(event.media_urls)
existing.media_types.extend(event.media_types)
if event.text:
existing.text = BasePlatformAdapter._merge_caption(existing.text, event.text)
return
pending_messages[session_key] = event
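The simpler photo-only variant of `merge_pending_message_event` shown in the hunk above can be sketched with a stripped-down event type. `Event`, `merge_pending`, and the newline caption join are assumptions for illustration; the real code uses `MessageEvent` and `BasePlatformAdapter._merge_caption`, whose behavior is not shown here.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class MessageType(Enum):
    TEXT = "text"
    PHOTO = "photo"


@dataclass
class Event:
    """Hypothetical, minimal stand-in for MessageEvent."""
    message_type: MessageType
    text: str = ""
    media_urls: List[str] = field(default_factory=list)
    media_types: List[str] = field(default_factory=list)


def merge_pending(pending: Dict[str, Event], session_key: str, event: Event) -> None:
    existing = pending.get(session_key)
    if (
        existing
        and existing.message_type == MessageType.PHOTO
        and event.message_type == MessageType.PHOTO
    ):
        # Photo burst: fold the new media into the queued event.
        existing.media_urls.extend(event.media_urls)
        existing.media_types.extend(event.media_types)
        if event.text:
            # Assumed caption merge; the real helper may format differently.
            existing.text = f"{existing.text}\n{event.text}" if existing.text else event.text
        return
    # Any other combination replaces the pending event.
    pending[session_key] = event


pending: Dict[str, Event] = {}
merge_pending(pending, "s1", Event(MessageType.PHOTO, "first", ["a.jpg"], ["image/jpeg"]))
merge_pending(pending, "s1", Event(MessageType.PHOTO, "second", ["b.jpg"], ["image/jpeg"]))
burst_urls = list(pending["s1"].media_urls)
burst_text = pending["s1"].text
merge_pending(pending, "s1", Event(MessageType.TEXT, "never mind"))
```

In this variant a text follow-up discards the queued photo burst, which is the trade-off the docstring describes as "non-photo follow-ups still replace the pending event normally".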
@@ -811,36 +776,6 @@ _RETRYABLE_ERROR_PATTERNS = (
MessageHandler = Callable[[MessageEvent], Awaitable[Optional[str]]]
def resolve_channel_prompt(
config_extra: dict,
channel_id: str,
parent_id: str | None = None,
) -> str | None:
"""Resolve a per-channel ephemeral prompt from platform config.
Looks up ``channel_prompts`` in the adapter's ``config.extra`` dict.
Prefers an exact match on *channel_id*; falls back to *parent_id*
(useful for forum threads / child channels inheriting a parent prompt).
Returns the prompt string, or None if no match is found. Blank/whitespace-
only prompts are treated as absent.
"""
prompts = config_extra.get("channel_prompts") or {}
if not isinstance(prompts, dict):
return None
for key in (channel_id, parent_id):
if not key:
continue
prompt = prompts.get(key)
if prompt is None:
continue
prompt = str(prompt).strip()
if prompt:
return prompt
return None
class BasePlatformAdapter(ABC):
"""
Base class for platform adapters.
@@ -870,11 +805,6 @@ class BasePlatformAdapter(ABC):
# Gateway shutdown cancels these so an old gateway instance doesn't keep
# working on a task after --replace or manual restarts.
self._background_tasks: set[asyncio.Task] = set()
# One-shot callbacks to fire after the main response is delivered.
# Keyed by session_key. GatewayRunner uses this to defer
# background-review notifications ("💾 Skill created") until the
# primary reply has been sent.
self._post_delivery_callbacks: Dict[str, Callable] = {}
self._expected_cancelled_tasks: set[asyncio.Task] = set()
self._busy_session_handler: Optional[Callable[[MessageEvent, str], Awaitable[bool]]] = None
# Chats where auto-TTS on voice input is disabled (set by /voice off)
@@ -1694,21 +1624,6 @@ class BasePlatformAdapter(ABC):
# streaming already delivered the text (already_sent=True) or
# when the message was queued behind an active agent. Log at
# DEBUG to avoid noisy warnings for expected behavior.
#
# Suppress stale response when the session was interrupted by a
# new message that hasn't been consumed yet. The pending message
# is processed by the pending-message handler below (#8221/#2483).
if (
response
and interrupt_event.is_set()
and session_key in self._pending_messages
):
logger.info(
"[%s] Suppressing stale response for interrupted session %s",
self.name,
session_key,
)
response = None
if not response:
logger.debug("[%s] Handler returned empty/None response for %s", self.name, event.source.chat_id)
if response:
@@ -1930,14 +1845,6 @@ class BasePlatformAdapter(ABC):
except Exception:
pass # Last resort — don't let error reporting crash the handler
finally:
# Fire any one-shot post-delivery callback registered for this
# session (e.g. deferred background-review notifications).
_post_cb = getattr(self, "_post_delivery_callbacks", {}).pop(session_key, None)
if callable(_post_cb):
try:
_post_cb()
except Exception:
pass
# Stop typing indicator
typing_task.cancel()
try:
+1 -150
@@ -1379,68 +1379,6 @@ class DiscordAdapter(BasePlatformAdapter):
)
return await super().send_image(chat_id, image_url, caption, reply_to)
async def send_animation(
self,
chat_id: str,
animation_url: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Send an animated GIF natively as a Discord file attachment."""
if not self._client:
return SendResult(success=False, error="Not connected")
if not is_safe_url(animation_url):
logger.warning("[%s] Blocked unsafe animation URL during Discord send_animation", self.name)
return await super().send_animation(chat_id, animation_url, caption, reply_to, metadata=metadata)
try:
import aiohttp
channel = self._client.get_channel(int(chat_id))
if not channel:
channel = await self._client.fetch_channel(int(chat_id))
if not channel:
return SendResult(success=False, error=f"Channel {chat_id} not found")
# Download the GIF and send as a Discord file attachment
# (Discord renders .gif attachments as auto-playing animations inline)
from gateway.platforms.base import resolve_proxy_url, proxy_kwargs_for_aiohttp
_proxy = resolve_proxy_url(platform_env_var="DISCORD_PROXY")
_sess_kw, _req_kw = proxy_kwargs_for_aiohttp(_proxy)
async with aiohttp.ClientSession(**_sess_kw) as session:
async with session.get(animation_url, timeout=aiohttp.ClientTimeout(total=30), **_req_kw) as resp:
if resp.status != 200:
raise Exception(f"Failed to download animation: HTTP {resp.status}")
animation_data = await resp.read()
import io
file = discord.File(io.BytesIO(animation_data), filename="animation.gif")
msg = await channel.send(
content=caption if caption else None,
file=file,
)
return SendResult(success=True, message_id=str(msg.id))
except ImportError:
logger.warning(
"[%s] aiohttp not installed, falling back to URL. Run: pip install aiohttp",
self.name,
exc_info=True,
)
return await super().send_animation(chat_id, animation_url, caption, reply_to, metadata=metadata)
except Exception as e: # pragma: no cover - defensive logging
logger.error(
"[%s] Failed to send animation attachment, falling back to URL: %s",
self.name,
e,
exc_info=True,
)
return await super().send_animation(chat_id, animation_url, caption, reply_to, metadata=metadata)
async def send_video(
self,
chat_id: str,
@@ -1758,10 +1696,6 @@ class DiscordAdapter(BasePlatformAdapter):
async def slash_update(interaction: discord.Interaction):
await self._run_simple_slash(interaction, "/update", "Update initiated~")
@tree.command(name="restart", description="Gracefully restart the Hermes gateway")
async def slash_restart(interaction: discord.Interaction):
await self._run_simple_slash(interaction, "/restart", "Restart requested~")
@tree.command(name="approve", description="Approve a pending dangerous command")
@discord.app_commands.describe(scope="Optional: 'all', 'session', 'always', 'all session', 'all always'")
async def slash_approve(interaction: discord.Interaction, scope: str = ""):
@@ -1802,76 +1736,6 @@ class DiscordAdapter(BasePlatformAdapter):
async def slash_btw(interaction: discord.Interaction, question: str):
await self._run_simple_slash(interaction, f"/btw {question}")
# ── Auto-register any gateway-available commands not yet on the tree ──
# This ensures new commands added to COMMAND_REGISTRY in
# hermes_cli/commands.py automatically appear as Discord slash
# commands without needing a manual entry here.
try:
from hermes_cli.commands import COMMAND_REGISTRY, _is_gateway_available, _resolve_config_gates
already_registered = set()
try:
already_registered = {cmd.name for cmd in tree.get_commands()}
except Exception:
pass
config_overrides = _resolve_config_gates()
for cmd_def in COMMAND_REGISTRY:
if not _is_gateway_available(cmd_def, config_overrides):
continue
# Discord command names: lowercase, hyphens OK, max 32 chars.
discord_name = cmd_def.name.lower()[:32]
if discord_name in already_registered:
continue
# Skip aliases that overlap with already-registered names
# (aliases for explicitly registered commands are handled above).
desc = (cmd_def.description or f"Run /{cmd_def.name}")[:100]
has_args = bool(cmd_def.args_hint)
if has_args:
# Command takes optional arguments — create handler with
# an optional ``args`` string parameter.
def _make_args_handler(_name: str, _hint: str):
@discord.app_commands.describe(args=f"Arguments: {_hint}"[:100])
async def _handler(interaction: discord.Interaction, args: str = ""):
await self._run_simple_slash(
interaction, f"/{_name} {args}".strip()
)
_handler.__name__ = f"auto_slash_{_name.replace('-', '_')}"
return _handler
handler = _make_args_handler(cmd_def.name, cmd_def.args_hint)
else:
# Parameterless command.
def _make_simple_handler(_name: str):
async def _handler(interaction: discord.Interaction):
await self._run_simple_slash(interaction, f"/{_name}")
_handler.__name__ = f"auto_slash_{_name.replace('-', '_')}"
return _handler
handler = _make_simple_handler(cmd_def.name)
auto_cmd = discord.app_commands.Command(
name=discord_name,
description=desc,
callback=handler,
)
try:
tree.add_command(auto_cmd)
already_registered.add(discord_name)
except Exception:
# Silently skip commands that fail registration (e.g.
# name conflict with a subcommand group).
pass
logger.debug(
"Discord auto-registered %d commands from COMMAND_REGISTRY",
len(already_registered),
)
except Exception as e:
logger.warning("Discord auto-register from COMMAND_REGISTRY failed: %s", e)
# Register skills under a single /skill command group with category
# subcommand groups. This uses 1 top-level slot instead of N,
# supporting up to 25 categories × 25 skills = 625 skills.
@@ -1992,14 +1856,11 @@ class DiscordAdapter(BasePlatformAdapter):
)
msg_type = MessageType.COMMAND if text.startswith("/") else MessageType.TEXT
channel_id = str(interaction.channel_id)
parent_id = str(getattr(getattr(interaction, "channel", None), "parent_id", "") or "")
return MessageEvent(
text=text,
message_type=msg_type,
source=source,
raw_message=interaction,
channel_prompt=self._resolve_channel_prompt(channel_id, parent_id or None),
)
# ------------------------------------------------------------------
@@ -2070,17 +1931,14 @@ class DiscordAdapter(BasePlatformAdapter):
chat_topic=chat_topic,
)
_parent_channel = self._thread_parent_channel(getattr(interaction, "channel", None))
_parent_id = str(getattr(_parent_channel, "id", "") or "")
_parent_id = str(getattr(getattr(interaction, "channel", None), "parent_id", "") or "")
_skills = self._resolve_channel_skills(thread_id, _parent_id or None)
_channel_prompt = self._resolve_channel_prompt(thread_id, _parent_id or None)
event = MessageEvent(
text=text,
message_type=MessageType.TEXT,
source=source,
raw_message=interaction,
auto_skill=_skills,
channel_prompt=_channel_prompt,
)
await self.handle_message(event)
@@ -2109,11 +1967,6 @@ class DiscordAdapter(BasePlatformAdapter):
return list(dict.fromkeys(skills)) # dedup, preserve order
return None
def _resolve_channel_prompt(self, channel_id: str, parent_id: str | None = None) -> str | None:
"""Resolve a Discord per-channel prompt, preferring the exact channel over its parent."""
from gateway.platforms.base import resolve_channel_prompt
return resolve_channel_prompt(self.config.extra, channel_id, parent_id)
def _thread_parent_channel(self, channel: Any) -> Any:
"""Return the parent text channel when invoked from a thread."""
return getattr(channel, "parent", None) or channel
@@ -2665,7 +2518,6 @@ class DiscordAdapter(BasePlatformAdapter):
_parent_id = str(getattr(_chan, "parent_id", "") or "")
_chan_id = str(getattr(_chan, "id", ""))
_skills = self._resolve_channel_skills(_chan_id, _parent_id or None)
_channel_prompt = self._resolve_channel_prompt(_chan_id, _parent_id or None)
reply_to_id = None
reply_to_text = None
@@ -2686,7 +2538,6 @@ class DiscordAdapter(BasePlatformAdapter):
reply_to_text=reply_to_text,
timestamp=message.created_at,
auto_skill=_skills,
channel_prompt=_channel_prompt,
)
# Track thread participation so the bot won't require @mention for
+1 -4
View File
@@ -49,10 +49,7 @@ class MessageDeduplicator:
return False
now = time.time()
if msg_id in self._seen:
if now - self._seen[msg_id] < self._ttl:
return True
# Entry has expired — remove it and treat as new
del self._seen[msg_id]
return True
self._seen[msg_id] = now
if len(self._seen) > self._max_size:
cutoff = now - self._ttl
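Note on this hunk: the four removed lines are the per-lookup TTL check, so after this change a previously seen ID is reported as a duplicate until the size-based prune evicts it. A minimal standalone sketch of the removed TTL-aware behavior follows. The class name `TtlDeduplicator`, the `clock` parameter, and the default values are assumptions made for illustration and are not part of the project's `MessageDeduplicator`.

```python
import time
from typing import Callable, Dict


class TtlDeduplicator:
    """Hypothetical stand-in for illustration, not the project's class."""

    def __init__(
        self,
        ttl: float = 300.0,
        max_size: int = 2000,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._ttl = ttl
        self._max_size = max_size
        self._clock = clock  # injectable so tests can control time
        self._seen: Dict[str, float] = {}

    def is_duplicate(self, msg_id: str) -> bool:
        if not msg_id:
            return False
        now = self._clock()
        first_seen = self._seen.get(msg_id)
        if first_seen is not None:
            if now - first_seen < self._ttl:
                return True
            # Entry has expired: forget it and treat the message as new.
            del self._seen[msg_id]
        self._seen[msg_id] = now
        # Prune expired entries only once the cache grows past max_size.
        if len(self._seen) > self._max_size:
            cutoff = now - self._ttl
            self._seen = {k: v for k, v in self._seen.items() if v > cutoff}
        return False
```

With the TTL check in place, an ID seen again after `ttl` seconds is accepted as new; without it, the same ID stays blocked for as long as it remains in the cache.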
-8
View File
@@ -729,14 +729,6 @@ class MatrixAdapter(BasePlatformAdapter):
except Exception:
pass
async def stop_typing(self, chat_id: str) -> None:
"""Stop the Matrix typing indicator."""
if self._client:
try:
await self._client.set_typing(RoomID(chat_id), timeout=0)
except Exception:
pass
async def edit_message(
self, chat_id: str, message_id: str, content: str
) -> SendResult:
-7
View File
@@ -718,12 +718,6 @@ class MattermostAdapter(BasePlatformAdapter):
thread_id=thread_id,
)
# Per-channel ephemeral prompt
from gateway.platforms.base import resolve_channel_prompt
_channel_prompt = resolve_channel_prompt(
self.config.extra, channel_id, None,
)
msg_event = MessageEvent(
text=message_text,
message_type=msg_type,
@@ -732,7 +726,6 @@ class MattermostAdapter(BasePlatformAdapter):
message_id=post_id,
media_urls=media_urls if media_urls else None,
media_types=media_types if media_types else None,
channel_prompt=_channel_prompt,
)
await self.handle_message(msg_event)
+3 -28
View File
@@ -366,20 +366,6 @@ class SlackAdapter(BasePlatformAdapter):
# in an assistant-enabled context. Falls back to reactions.
logger.debug("[Slack] assistant.threads.setStatus failed: %s", e)
def _dm_top_level_threads_as_sessions(self) -> bool:
"""Whether top-level Slack DMs get per-message session threads.
Defaults to ``True`` so each visible DM reply thread is isolated as its
own Hermes session, matching the per-thread behavior channels already
have. Set ``platforms.slack.extra.dm_top_level_threads_as_sessions``
to ``false`` in config.yaml to revert to the legacy behavior where all
top-level DMs share one continuous session.
"""
raw = self.config.extra.get("dm_top_level_threads_as_sessions")
if raw is None:
return True # default: each DM thread is its own session
return str(raw).strip().lower() in ("1", "true", "yes", "on")
def _resolve_thread_ts(
self,
reply_to: Optional[str] = None,
@@ -1010,14 +996,10 @@ class SlackAdapter(BasePlatformAdapter):
# Build thread_ts for session keying.
# In channels: fall back to ts so each top-level @mention starts a
# new thread/session (the bot always replies in a thread).
# In DMs: fall back to ts so each top-level DM reply thread gets
# its own session key (matching channel behavior). Set
# dm_top_level_threads_as_sessions: false in config to revert to
# legacy single-session-per-DM-channel behavior.
# In DMs: only use the real thread_ts — top-level DMs should share
# one continuous session, threaded DMs get their own session.
if is_dm:
thread_ts = event.get("thread_ts") or assistant_meta.get("thread_ts")
if not thread_ts and self._dm_top_level_threads_as_sessions():
thread_ts = ts
thread_ts = event.get("thread_ts") or assistant_meta.get("thread_ts") # None for top-level DMs
else:
thread_ts = event.get("thread_ts") or ts # ts fallback for channels
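Note on this hunk: the two DM variants differ only in what a top-level DM (one with no real `thread_ts`) falls back to. The sketch below isolates that decision as a pure function. The function name `resolve_thread_ts` and its flat signature are assumptions for illustration; the adapter reads these values from the Slack event and `assistant_meta`, which are omitted here.

```python
from typing import Optional


def resolve_thread_ts(
    is_dm: bool,
    ts: str,
    thread_ts: Optional[str] = None,
    dm_top_level_threads_as_sessions: bool = True,
) -> Optional[str]:
    """Pick the thread timestamp used for session keying (illustrative only)."""
    if is_dm:
        if thread_ts:
            # Threaded DM: always keyed by its real thread.
            return thread_ts
        # Top-level DM: own session per message, or None for one shared session.
        return ts if dm_top_level_threads_as_sessions else None
    # Channels: each top-level mention starts its own thread/session.
    return thread_ts or ts
```

Passing `dm_top_level_threads_as_sessions=False` reproduces the single-line variant in the hunk, where top-level DMs resolve to `None` and therefore share one continuous session.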
@@ -1185,12 +1167,6 @@ class SlackAdapter(BasePlatformAdapter):
thread_id=thread_ts,
)
# Per-channel ephemeral prompt
from gateway.platforms.base import resolve_channel_prompt
_channel_prompt = resolve_channel_prompt(
self.config.extra, channel_id, None,
)
msg_event = MessageEvent(
text=text,
message_type=msg_type,
@@ -1200,7 +1176,6 @@ class SlackAdapter(BasePlatformAdapter):
media_urls=media_urls,
media_types=media_types,
reply_to_message_id=thread_ts if thread_ts != ts else None,
channel_prompt=_channel_prompt,
)
# Only react when bot is directly addressed (DM or @mention).
+43 -132
View File
@@ -11,7 +11,6 @@ import asyncio
import json
import logging
import os
import html as _html
import re
from typing import Dict, List, Optional, Any
@@ -19,10 +18,6 @@ logger = logging.getLogger(__name__)
try:
from telegram import Update, Bot, Message, InlineKeyboardButton, InlineKeyboardMarkup
try:
from telegram import LinkPreviewOptions
except ImportError:
LinkPreviewOptions = None
from telegram.ext import (
Application,
CommandHandler,
@@ -41,7 +36,6 @@ except ImportError:
Message = Any
InlineKeyboardButton = Any
InlineKeyboardMarkup = Any
LinkPreviewOptions = None
Application = Any
CommandHandler = Any
CallbackQueryHandler = Any
@@ -135,7 +129,6 @@ class TelegramAdapter(BasePlatformAdapter):
# When a chunk is near this limit, a continuation is almost certain.
_SPLIT_THRESHOLD = 4000
MEDIA_GROUP_WAIT_SECONDS = 0.8
_GENERAL_TOPIC_THREAD_ID = "1"
def __init__(self, config: PlatformConfig):
super().__init__(config, Platform.TELEGRAM)
@@ -144,7 +137,6 @@ class TelegramAdapter(BasePlatformAdapter):
self._webhook_mode: bool = False
self._mention_patterns = self._compile_mention_patterns()
self._reply_to_mode: str = getattr(config, 'reply_to_mode', 'first') or 'first'
self._disable_link_previews: bool = self._coerce_bool_extra("disable_link_previews", False)
# Buffer rapid/album photo updates so Telegram image bursts are handled
# as a single MessageEvent instead of self-interrupting multiple turns.
self._media_batch_delay_seconds = float(os.getenv("HERMES_TELEGRAM_MEDIA_BATCH_DELAY_SECONDS", "0.8"))
@@ -171,38 +163,6 @@ class TelegramAdapter(BasePlatformAdapter):
# Approval button state: message_id → session_key
self._approval_state: Dict[int, str] = {}
@staticmethod
def _is_callback_user_authorized(user_id: str) -> bool:
"""Return whether a Telegram inline-button caller may perform gated actions."""
allowed_csv = os.getenv("TELEGRAM_ALLOWED_USERS", "").strip()
if not allowed_csv:
return True
allowed_ids = {uid.strip() for uid in allowed_csv.split(",") if uid.strip()}
return "*" in allowed_ids or user_id in allowed_ids
@classmethod
def _metadata_thread_id(cls, metadata: Optional[Dict[str, Any]]) -> Optional[str]:
if not metadata:
return None
thread_id = metadata.get("thread_id") or metadata.get("message_thread_id")
return str(thread_id) if thread_id is not None else None
@classmethod
def _message_thread_id_for_send(cls, thread_id: Optional[str]) -> Optional[int]:
if not thread_id or str(thread_id) == cls._GENERAL_TOPIC_THREAD_ID:
return None
return int(thread_id)
@classmethod
def _message_thread_id_for_typing(cls, thread_id: Optional[str]) -> Optional[int]:
if not thread_id:
return None
return int(thread_id)
@staticmethod
def _is_thread_not_found_error(error: Exception) -> bool:
return "thread not found" in str(error).lower()
def _fallback_ips(self) -> list[str]:
"""Return validated fallback IPs from config (populated by _apply_env_overrides)."""
configured = self.config.extra.get("fallback_ips", []) if getattr(self.config, "extra", None) else []
@@ -233,26 +193,6 @@ class TelegramAdapter(BasePlatformAdapter):
pass
return isinstance(error, OSError)
def _coerce_bool_extra(self, key: str, default: bool = False) -> bool:
value = self.config.extra.get(key) if getattr(self.config, "extra", None) else None
if value is None:
return default
if isinstance(value, str):
lowered = value.strip().lower()
if lowered in ("true", "1", "yes", "on"):
return True
if lowered in ("false", "0", "no", "off"):
return False
return default
return bool(value)
def _link_preview_kwargs(self) -> Dict[str, Any]:
if not getattr(self, "_disable_link_previews", False):
return {}
if LinkPreviewOptions is not None:
return {"link_preview_options": LinkPreviewOptions(is_disabled=True)}
return {"disable_web_page_preview": True}
async def _handle_polling_network_error(self, error: Exception) -> None:
"""Reconnect polling after a transient network interruption.
@@ -600,7 +540,7 @@ class TelegramAdapter(BasePlatformAdapter):
"write_timeout": _env_float("HERMES_TELEGRAM_HTTP_WRITE_TIMEOUT", 20.0),
}
proxy_url = resolve_proxy_url("TELEGRAM_PROXY")
proxy_url = resolve_proxy_url()
disable_fallback = (os.getenv("HERMES_TELEGRAM_DISABLE_FALLBACK_IPS", "").strip().lower() in ("1", "true", "yes", "on"))
fallback_ips = self._fallback_ips()
if not fallback_ips:
@@ -666,14 +606,14 @@ class TelegramAdapter(BasePlatformAdapter):
from telegram.error import NetworkError, TimedOut
except ImportError:
NetworkError = TimedOut = OSError # type: ignore[misc,assignment]
_max_connect = 8
_max_connect = 3
for _attempt in range(_max_connect):
try:
await self._app.initialize()
break
except (NetworkError, TimedOut, OSError) as init_err:
if _attempt < _max_connect - 1:
wait = min(2 ** _attempt, 15)
wait = 2 ** _attempt
logger.warning(
"[%s] Connect attempt %d/%d failed: %s — retrying in %ds",
self.name, _attempt + 1, _max_connect, init_err, wait,
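Note on this hunk: the two variants give quite different total wait budgets before the connect loop gives up. The sketch below computes the sleep schedule for each, assuming (as the `_attempt < _max_connect - 1` guard suggests) that no sleep follows the final attempt. The helper name `backoff_schedule` is hypothetical and exists only for this illustration.

```python
from typing import List, Optional


def backoff_schedule(max_attempts: int, cap: Optional[float] = None) -> List[float]:
    """Return the delays slept between attempts (none after the last one)."""
    delays: List[float] = []
    for attempt in range(max_attempts - 1):
        wait = float(2 ** attempt)
        if cap is not None:
            wait = min(wait, cap)  # capped exponential backoff
        delays.append(wait)
    return delays


capped = backoff_schedule(8, cap=15)  # 8 attempts, each wait capped at 15 s
uncapped = backoff_schedule(3)        # 3 attempts, no cap
print(capped, sum(capped))
print(uncapped, sum(uncapped))
```

Under these assumptions, 8 capped attempts tolerate roughly a minute of network unavailability at startup, while 3 uncapped attempts give up after about 3 seconds of waiting.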
@@ -874,7 +814,7 @@ class TelegramAdapter(BasePlatformAdapter):
]
message_ids = []
thread_id = self._metadata_thread_id(metadata)
thread_id = metadata.get("thread_id") if metadata else None
try:
from telegram.error import NetworkError as _NetErr
@@ -894,7 +834,7 @@ class TelegramAdapter(BasePlatformAdapter):
for i, chunk in enumerate(chunks):
should_thread = self._should_thread_reply(reply_to, i)
reply_to_id = int(reply_to) if should_thread else None
effective_thread_id = self._message_thread_id_for_send(thread_id)
effective_thread_id = int(thread_id) if thread_id else None
msg = None
for _send_attempt in range(3):
@@ -907,7 +847,6 @@ class TelegramAdapter(BasePlatformAdapter):
parse_mode=ParseMode.MARKDOWN_V2,
reply_to_message_id=reply_to_id,
message_thread_id=effective_thread_id,
**self._link_preview_kwargs(),
)
except Exception as md_error:
# Markdown parsing failed, try plain text
@@ -920,7 +859,6 @@ class TelegramAdapter(BasePlatformAdapter):
parse_mode=None,
reply_to_message_id=reply_to_id,
message_thread_id=effective_thread_id,
**self._link_preview_kwargs(),
)
else:
raise
@@ -931,7 +869,8 @@ class TelegramAdapter(BasePlatformAdapter):
# (not transient network issues). Detect and handle
# specific cases instead of blindly retrying.
if _BadReq and isinstance(send_err, _BadReq):
if self._is_thread_not_found_error(send_err) and effective_thread_id is not None:
err_lower = str(send_err).lower()
if "thread not found" in err_lower and effective_thread_id is not None:
# Thread doesn't exist — retry without
# message_thread_id so the message still
# reaches the chat.
@@ -941,7 +880,6 @@ class TelegramAdapter(BasePlatformAdapter):
)
effective_thread_id = None
continue
err_lower = str(send_err).lower()
if "message to be replied not found" in err_lower and reply_to_id is not None:
# Original message was deleted before we
# could reply — clear reply target and retry
@@ -1108,7 +1046,6 @@ class TelegramAdapter(BasePlatformAdapter):
text=text,
parse_mode=ParseMode.MARKDOWN,
reply_markup=keyboard,
**self._link_preview_kwargs(),
)
return SendResult(success=True, message_id=str(msg.message_id))
except Exception as e:
@@ -1131,13 +1068,15 @@ class TelegramAdapter(BasePlatformAdapter):
try:
cmd_preview = command[:3800] + "..." if len(command) > 3800 else command
text = (
f"⚠️ <b>Command Approval Required</b>\n\n"
f"<pre>{_html.escape(cmd_preview)}</pre>\n\n"
f"Reason: {_html.escape(description)}"
f"⚠️ *Command Approval Required*\n\n"
f"`{cmd_preview}`\n\n"
f"Reason: {description}"
)
# Resolve thread context for thread replies
thread_id = self._metadata_thread_id(metadata)
thread_id = None
if metadata:
thread_id = metadata.get("thread_id") or metadata.get("message_thread_id")
# We'll use the message_id as part of callback_data to look up session_key
# Send a placeholder first, then update — or use a counter.
@@ -1161,13 +1100,11 @@ class TelegramAdapter(BasePlatformAdapter):
kwargs: Dict[str, Any] = {
"chat_id": int(chat_id),
"text": text,
"parse_mode": ParseMode.HTML,
"parse_mode": ParseMode.MARKDOWN,
"reply_markup": keyboard,
**self._link_preview_kwargs(),
}
message_thread_id = self._message_thread_id_for_send(thread_id)
if message_thread_id is not None:
kwargs["message_thread_id"] = message_thread_id
if thread_id:
kwargs["message_thread_id"] = int(thread_id)
msg = await self._bot.send_message(**kwargs)
@@ -1235,7 +1172,6 @@ class TelegramAdapter(BasePlatformAdapter):
parse_mode=ParseMode.MARKDOWN,
reply_markup=keyboard,
message_thread_id=int(thread_id) if thread_id else None,
**self._link_preview_kwargs(),
)
# Store picker state keyed by chat_id
@@ -1504,9 +1440,12 @@ class TelegramAdapter(BasePlatformAdapter):
# Only authorized users may click approval buttons.
caller_id = str(getattr(query.from_user, "id", ""))
if not self._is_callback_user_authorized(caller_id):
await query.answer(text="⛔ You are not authorized to approve commands.")
return
allowed_csv = os.getenv("TELEGRAM_ALLOWED_USERS", "").strip()
if allowed_csv:
allowed_ids = {uid.strip() for uid in allowed_csv.split(",") if uid.strip()}
if "*" not in allowed_ids and caller_id not in allowed_ids:
await query.answer(text="⛔ You are not authorized to approve commands.")
return
session_key = self._approval_state.pop(approval_id, None)
if not session_key:
@@ -1551,10 +1490,6 @@ class TelegramAdapter(BasePlatformAdapter):
if not data.startswith("update_prompt:"):
return
answer = data.split(":", 1)[1] # "y" or "n"
caller_id = str(getattr(query.from_user, "id", ""))
if not self._is_callback_user_authorized(caller_id):
await query.answer(text="⛔ You are not authorized to answer update prompts.")
return
await query.answer(text=f"Sent '{answer}' to the update process.")
# Edit the message to show the choice and remove buttons
label = "Yes" if answer == "y" else "No"
@@ -1600,23 +1535,23 @@ class TelegramAdapter(BasePlatformAdapter):
with open(audio_path, "rb") as audio_file:
# .ogg files -> send as voice (round playable bubble)
if audio_path.endswith((".ogg", ".opus")):
_voice_thread = self._metadata_thread_id(metadata)
_voice_thread = metadata.get("thread_id") if metadata else None
msg = await self._bot.send_voice(
chat_id=int(chat_id),
voice=audio_file,
caption=caption[:1024] if caption else None,
reply_to_message_id=int(reply_to) if reply_to else None,
message_thread_id=self._message_thread_id_for_send(_voice_thread),
message_thread_id=int(_voice_thread) if _voice_thread else None,
)
else:
# .mp3 and others -> send as audio file
_audio_thread = self._metadata_thread_id(metadata)
_audio_thread = metadata.get("thread_id") if metadata else None
msg = await self._bot.send_audio(
chat_id=int(chat_id),
audio=audio_file,
caption=caption[:1024] if caption else None,
reply_to_message_id=int(reply_to) if reply_to else None,
message_thread_id=self._message_thread_id_for_send(_audio_thread),
message_thread_id=int(_audio_thread) if _audio_thread else None,
)
return SendResult(success=True, message_id=str(msg.message_id))
except Exception as e:
@@ -1646,14 +1581,14 @@ class TelegramAdapter(BasePlatformAdapter):
if not os.path.exists(image_path):
return SendResult(success=False, error=f"Image file not found: {image_path}")
_thread = self._metadata_thread_id(metadata)
_thread = metadata.get("thread_id") if metadata else None
with open(image_path, "rb") as image_file:
msg = await self._bot.send_photo(
chat_id=int(chat_id),
photo=image_file,
caption=caption[:1024] if caption else None,
reply_to_message_id=int(reply_to) if reply_to else None,
message_thread_id=self._message_thread_id_for_send(_thread),
message_thread_id=int(_thread) if _thread else None,
)
return SendResult(success=True, message_id=str(msg.message_id))
except Exception as e:
@@ -1684,7 +1619,7 @@ class TelegramAdapter(BasePlatformAdapter):
return SendResult(success=False, error=f"File not found: {file_path}")
display_name = file_name or os.path.basename(file_path)
_thread = self._metadata_thread_id(metadata)
_thread = metadata.get("thread_id") if metadata else None
with open(file_path, "rb") as f:
msg = await self._bot.send_document(
@@ -1693,7 +1628,7 @@ class TelegramAdapter(BasePlatformAdapter):
filename=display_name,
caption=caption[:1024] if caption else None,
reply_to_message_id=int(reply_to) if reply_to else None,
message_thread_id=self._message_thread_id_for_send(_thread),
message_thread_id=int(_thread) if _thread else None,
)
return SendResult(success=True, message_id=str(msg.message_id))
except Exception as e:
@@ -1717,14 +1652,14 @@ class TelegramAdapter(BasePlatformAdapter):
if not os.path.exists(video_path):
return SendResult(success=False, error=f"Video file not found: {video_path}")
_thread = self._metadata_thread_id(metadata)
_thread = metadata.get("thread_id") if metadata else None
with open(video_path, "rb") as f:
msg = await self._bot.send_video(
chat_id=int(chat_id),
video=f,
caption=caption[:1024] if caption else None,
reply_to_message_id=int(reply_to) if reply_to else None,
message_thread_id=self._message_thread_id_for_send(_thread),
message_thread_id=int(_thread) if _thread else None,
)
return SendResult(success=True, message_id=str(msg.message_id))
except Exception as e:
@@ -1754,13 +1689,13 @@ class TelegramAdapter(BasePlatformAdapter):
try:
# Telegram can send photos directly from URLs (up to ~5MB)
_photo_thread = self._metadata_thread_id(metadata)
_photo_thread = metadata.get("thread_id") if metadata else None
msg = await self._bot.send_photo(
chat_id=int(chat_id),
photo=image_url,
caption=caption[:1024] if caption else None, # Telegram caption limit
reply_to_message_id=int(reply_to) if reply_to else None,
message_thread_id=self._message_thread_id_for_send(_photo_thread),
message_thread_id=int(_photo_thread) if _photo_thread else None,
)
return SendResult(success=True, message_id=str(msg.message_id))
except Exception as e:
@@ -1783,7 +1718,6 @@ class TelegramAdapter(BasePlatformAdapter):
photo=image_data,
caption=caption[:1024] if caption else None,
reply_to_message_id=int(reply_to) if reply_to else None,
message_thread_id=self._message_thread_id_for_send(_photo_thread),
)
return SendResult(success=True, message_id=str(msg.message_id))
except Exception as e2:
@@ -1809,13 +1743,13 @@ class TelegramAdapter(BasePlatformAdapter):
return SendResult(success=False, error="Not connected")
try:
_anim_thread = self._metadata_thread_id(metadata)
_anim_thread = metadata.get("thread_id") if metadata else None
msg = await self._bot.send_animation(
chat_id=int(chat_id),
animation=animation_url,
caption=caption[:1024] if caption else None,
reply_to_message_id=int(reply_to) if reply_to else None,
message_thread_id=self._message_thread_id_for_send(_anim_thread),
message_thread_id=int(_anim_thread) if _anim_thread else None,
)
return SendResult(success=True, message_id=str(msg.message_id))
except Exception as e:
@@ -1832,23 +1766,12 @@ class TelegramAdapter(BasePlatformAdapter):
"""Send typing indicator."""
if self._bot:
try:
_typing_thread = self._metadata_thread_id(metadata)
message_thread_id = self._message_thread_id_for_typing(_typing_thread)
try:
await self._bot.send_chat_action(
chat_id=int(chat_id),
action="typing",
message_thread_id=message_thread_id,
)
except Exception as e:
if message_thread_id is not None and self._is_thread_not_found_error(e):
await self._bot.send_chat_action(
chat_id=int(chat_id),
action="typing",
message_thread_id=None,
)
else:
raise
_typing_thread = metadata.get("thread_id") if metadata else None
await self._bot.send_chat_action(
chat_id=int(chat_id),
action="typing",
message_thread_id=int(_typing_thread) if _typing_thread else None,
)
except Exception as e:
# Typing failures are non-fatal; log at debug level only.
logger.debug(
@@ -2793,9 +2716,7 @@ class TelegramAdapter(BasePlatformAdapter):
# Resolve DM topic name and skill binding
thread_id_raw = message.message_thread_id
thread_id_str = str(thread_id_raw) if thread_id_raw is not None else None
if chat_type == "group" and thread_id_str is None and getattr(chat, "is_forum", False):
thread_id_str = self._GENERAL_TOPIC_THREAD_ID
thread_id_str = str(thread_id_raw) if thread_id_raw else None
chat_topic = None
topic_skill = None
@@ -2844,15 +2765,6 @@ class TelegramAdapter(BasePlatformAdapter):
reply_to_id = str(message.reply_to_message.message_id)
reply_to_text = message.reply_to_message.text or message.reply_to_message.caption or None
# Per-channel/topic ephemeral prompt
from gateway.platforms.base import resolve_channel_prompt
_chat_id_str = str(chat.id)
_channel_prompt = resolve_channel_prompt(
self.config.extra,
thread_id_str or _chat_id_str,
_chat_id_str if thread_id_str else None,
)
return MessageEvent(
text=message.text or "",
message_type=msg_type,
@@ -2862,7 +2774,6 @@ class TelegramAdapter(BasePlatformAdapter):
reply_to_message_id=reply_to_id,
reply_to_text=reply_to_text,
auto_skill=topic_skill,
channel_prompt=_channel_prompt,
timestamp=message.date,
)
+1 -1
View File
@@ -46,7 +46,7 @@ _SEED_FALLBACK_IPS: list[str] = ["149.154.167.220"]
def _resolve_proxy_url() -> str | None:
# Delegate to shared implementation (env vars + macOS system proxy detection)
from gateway.platforms.base import resolve_proxy_url
return resolve_proxy_url("TELEGRAM_PROXY")
return resolve_proxy_url()
class TelegramFallbackTransport(httpx.AsyncBaseTransport):
-14
View File
@@ -258,20 +258,6 @@ class WecomCallbackAdapter(BasePlatformAdapter):
)
event = self._build_event(app, decrypted)
if event is not None:
# Deduplicate: WeCom retries callbacks on timeout,
# producing duplicate inbound messages (#10305).
if event.message_id:
now = time.time()
if event.message_id in self._seen_messages:
if now - self._seen_messages[event.message_id] < MESSAGE_DEDUP_TTL_SECONDS:
logger.debug("[WecomCallback] Duplicate MsgId %s, skipping", event.message_id)
return web.Response(text="success", content_type="text/plain")
del self._seen_messages[event.message_id]
self._seen_messages[event.message_id] = now
# Prune expired entries when cache grows large
if len(self._seen_messages) > 2000:
cutoff = now - MESSAGE_DEDUP_TTL_SECONDS
self._seen_messages = {k: v for k, v in self._seen_messages.items() if v > cutoff}
# Record which app this user belongs to.
if event.source and event.source.user_id:
map_key = self._user_app_key(
+333 -788
View File
File diff suppressed because it is too large
+2 -6
View File
@@ -301,8 +301,6 @@ def build_session_context_prompt(
lines.append("")
lines.append("**Delivery options for scheduled tasks:**")
from hermes_constants import display_hermes_home
# Origin delivery
if context.source.platform == Platform.LOCAL:
lines.append("- `\"origin\"` → Local output (saved to files)")
@@ -311,11 +309,9 @@ def build_session_context_prompt(
_hash_chat_id(context.source.chat_id) if redact_pii else context.source.chat_id
)
lines.append(f"- `\"origin\"` → Back to this chat ({_origin_label})")
# Local always available
lines.append(
f"- `\"local\"` → Save to local files only ({display_hermes_home()}/cron/output/)"
)
lines.append("- `\"local\"` → Save to local files only (~/.hermes/cron/output/)")
# Platform home channels
for platform, home in context.home_channels.items():
+17 -34
View File
@@ -37,24 +37,18 @@ needs to replace the import + call site:
"""
from contextvars import ContextVar
from typing import Any
# Sentinel to distinguish "never set in this context" from "explicitly set to empty".
# When a contextvar holds _UNSET, we fall back to os.environ (CLI/cron compat).
# When it holds "" (after clear_session_vars resets it), we return "" — no fallback.
_UNSET: Any = object()
# ---------------------------------------------------------------------------
# Per-task session variables
# ---------------------------------------------------------------------------
_SESSION_PLATFORM: ContextVar = ContextVar("HERMES_SESSION_PLATFORM", default=_UNSET)
_SESSION_CHAT_ID: ContextVar = ContextVar("HERMES_SESSION_CHAT_ID", default=_UNSET)
_SESSION_CHAT_NAME: ContextVar = ContextVar("HERMES_SESSION_CHAT_NAME", default=_UNSET)
_SESSION_THREAD_ID: ContextVar = ContextVar("HERMES_SESSION_THREAD_ID", default=_UNSET)
_SESSION_USER_ID: ContextVar = ContextVar("HERMES_SESSION_USER_ID", default=_UNSET)
_SESSION_USER_NAME: ContextVar = ContextVar("HERMES_SESSION_USER_NAME", default=_UNSET)
_SESSION_KEY: ContextVar = ContextVar("HERMES_SESSION_KEY", default=_UNSET)
_SESSION_PLATFORM: ContextVar[str] = ContextVar("HERMES_SESSION_PLATFORM", default="")
_SESSION_CHAT_ID: ContextVar[str] = ContextVar("HERMES_SESSION_CHAT_ID", default="")
_SESSION_CHAT_NAME: ContextVar[str] = ContextVar("HERMES_SESSION_CHAT_NAME", default="")
_SESSION_THREAD_ID: ContextVar[str] = ContextVar("HERMES_SESSION_THREAD_ID", default="")
_SESSION_USER_ID: ContextVar[str] = ContextVar("HERMES_SESSION_USER_ID", default="")
_SESSION_USER_NAME: ContextVar[str] = ContextVar("HERMES_SESSION_USER_NAME", default="")
_SESSION_KEY: ContextVar[str] = ContextVar("HERMES_SESSION_KEY", default="")
_VAR_MAP = {
"HERMES_SESSION_PLATFORM": _SESSION_PLATFORM,
@@ -97,17 +91,10 @@ def set_session_vars(
def clear_session_vars(tokens: list) -> None:
"""Mark session context variables as explicitly cleared.
Sets all variables to ``""`` so that ``get_session_env`` returns an empty
string instead of falling back to (potentially stale) ``os.environ``
values. The *tokens* argument is accepted for API compatibility with
callers that saved the return value of ``set_session_vars``, but the
actual clearing uses ``var.set("")`` rather than ``var.reset(token)``
to ensure the "explicitly cleared" state is distinguishable from
"never set" (which holds the ``_UNSET`` sentinel).
"""
for var in (
"""Restore session context variables to their pre-handler values."""
if not tokens:
return
vars_in_order = [
_SESSION_PLATFORM,
_SESSION_CHAT_ID,
_SESSION_CHAT_NAME,
@@ -115,8 +102,9 @@ def clear_session_vars(tokens: list) -> None:
_SESSION_USER_ID,
_SESSION_USER_NAME,
_SESSION_KEY,
):
var.set("")
]
for var, token in zip(vars_in_order, tokens):
var.reset(token)
def get_session_env(name: str, default: str = "") -> str:
@@ -125,13 +113,8 @@ def get_session_env(name: str, default: str = "") -> str:
Drop-in replacement for ``os.getenv("HERMES_SESSION_*", default)``.
Resolution order:
1. Context variable (set by the gateway for concurrency-safe access).
If the variable was explicitly set (even to ``""``) via
``set_session_vars`` or ``clear_session_vars``, that value is
returned, with **no fallback to os.environ**.
2. ``os.environ`` (only when the context variable was never set in
this context, i.e. CLI, cron scheduler, and test processes that
don't use ``set_session_vars`` at all).
1. Context variable (set by the gateway for concurrency-safe access)
2. ``os.environ`` (used by CLI, cron scheduler, and tests)
3. *default*
"""
import os
@@ -139,7 +122,7 @@ def get_session_env(name: str, default: str = "") -> str:
var = _VAR_MAP.get(name)
if var is not None:
value = var.get()
if value is not _UNSET:
if value:
return value
# Fall back to os.environ for CLI, cron, and test compatibility
return os.getenv(name, default)
+135 -1
View File
@@ -27,6 +27,7 @@ _RUNTIME_STATUS_FILE = "gateway_state.json"
_LOCKS_DIRNAME = "gateway-locks"
_IS_WINDOWS = sys.platform == "win32"
_UNSET = object()
_VALID_STARTUP_CHECK_STATES = {"pending", "ready", "failed"}
def _get_pid_path() -> Path:
@@ -162,11 +163,39 @@ def _build_runtime_status_record() -> dict[str, Any]:
"restart_requested": False,
"active_agents": 0,
"platforms": {},
"startup_checks": {},
"updated_at": _utc_now_iso(),
})
return payload
def _normalize_startup_check_entries(
startup_checks: Optional[dict[str, Any]],
) -> dict[str, dict[str, Any]]:
"""Normalize persisted startup readiness entries."""
if not isinstance(startup_checks, dict):
return {}
now = _utc_now_iso()
normalized: dict[str, dict[str, Any]] = {}
for raw_id, raw_payload in startup_checks.items():
check_id = str(raw_id).strip()
if not check_id:
continue
payload = raw_payload if isinstance(raw_payload, dict) else {}
state = str(payload.get("state", "pending")).strip().lower()
if state not in _VALID_STARTUP_CHECK_STATES:
state = "pending"
normalized[check_id] = {
"state": state,
"required": bool(payload.get("required", True)),
"source": payload.get("source"),
"detail": payload.get("detail"),
"updated_at": payload.get("updated_at") or now,
}
return normalized
def _read_json_file(path: Path) -> Optional[dict[str, Any]]:
if not path.exists():
return None
@@ -223,6 +252,7 @@ def write_runtime_status(
exit_reason: Any = _UNSET,
restart_requested: Any = _UNSET,
active_agents: Any = _UNSET,
startup_checks: Any = _UNSET,
platform: Any = _UNSET,
platform_state: Any = _UNSET,
error_code: Any = _UNSET,
@@ -245,6 +275,8 @@ def write_runtime_status(
payload["restart_requested"] = bool(restart_requested)
if active_agents is not _UNSET:
payload["active_agents"] = max(0, int(active_agents))
if startup_checks is not _UNSET:
payload["startup_checks"] = _normalize_startup_check_entries(startup_checks)
if platform is not _UNSET:
platform_payload = payload["platforms"].get(platform, {})
@@ -262,7 +294,109 @@ def write_runtime_status(
def read_runtime_status() -> Optional[dict[str, Any]]:
"""Read the persisted gateway runtime health/status information."""
return _read_json_file(_get_runtime_status_path())
payload = _read_json_file(_get_runtime_status_path())
if payload is None:
return None
payload.setdefault("platforms", {})
payload["startup_checks"] = _normalize_startup_check_entries(payload.get("startup_checks"))
return payload
def reset_startup_checks(checks: Optional[list[dict[str, Any]]] = None) -> dict[str, dict[str, Any]]:
"""Replace persisted startup readiness checks for the current run."""
normalized: dict[str, dict[str, Any]] = {}
now = _utc_now_iso()
for hook in checks or []:
if not isinstance(hook, dict):
continue
readiness = hook.get("startup_readiness")
if not isinstance(readiness, dict):
continue
check_id = str(readiness.get("id", "")).strip()
if not check_id:
continue
normalized[check_id] = {
"state": "pending",
"required": bool(readiness.get("required", True)),
"source": hook.get("name"),
"detail": None,
"updated_at": now,
}
write_runtime_status(startup_checks=normalized)
return normalized
def update_startup_check(
check_id: str,
state: str,
*,
detail: Any = _UNSET,
required: Any = _UNSET,
source: Any = _UNSET,
) -> dict[str, Any]:
"""Update a single startup readiness check in the runtime status file."""
normalized_id = str(check_id).strip()
if not normalized_id:
raise ValueError("startup readiness check id is required")
normalized_state = str(state).strip().lower()
if normalized_state not in _VALID_STARTUP_CHECK_STATES:
raise ValueError(f"invalid startup readiness state: {state}")
path = _get_runtime_status_path()
payload = _read_json_file(path) or _build_runtime_status_record()
checks = _normalize_startup_check_entries(payload.get("startup_checks"))
existing = checks.get(normalized_id, {})
now = _utc_now_iso()
checks[normalized_id] = {
"state": normalized_state,
"required": bool(existing.get("required", True) if required is _UNSET else required),
"source": existing.get("source") if source is _UNSET else source,
"detail": existing.get("detail") if detail is _UNSET else detail,
"updated_at": now,
}
payload["startup_checks"] = checks
payload.setdefault("platforms", {})
payload.setdefault("kind", _GATEWAY_KIND)
payload["pid"] = os.getpid()
payload["start_time"] = _get_process_start_time(os.getpid())
payload["updated_at"] = now
_write_json_file(path, payload)
return checks[normalized_id]
def mark_startup_check_pending(
check_id: str,
*,
detail: Any = _UNSET,
required: Any = _UNSET,
source: Any = _UNSET,
) -> dict[str, Any]:
return update_startup_check(check_id, "pending", detail=detail, required=required, source=source)
def mark_startup_check_ready(
check_id: str,
*,
detail: Any = _UNSET,
required: Any = _UNSET,
source: Any = _UNSET,
) -> dict[str, Any]:
return update_startup_check(check_id, "ready", detail=detail, required=required, source=source)
def mark_startup_check_failed(
check_id: str,
*,
detail: Any = _UNSET,
required: Any = _UNSET,
source: Any = _UNSET,
) -> dict[str, Any]:
return update_startup_check(check_id, "failed", detail=detail, required=required, source=source)
def remove_pid_file() -> None:
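The normalization rules used for startup checks (blank ids dropped, unknown states coerced to `pending`, `required` defaulting to true) can be exercised in isolation with a sketch like the one below. `normalize_checks` mirrors the logic shown in the diff in reduced form; `overall_readiness` is a hypothetical aggregate that is not part of the diff and is included only to show how a consumer might interpret the entries.

```python
from typing import Any

VALID_STATES = {"pending", "ready", "failed"}


def normalize_checks(raw: Any) -> dict[str, dict[str, Any]]:
    """Coerce untrusted persisted entries into {id: {"state", "required"}}."""
    if not isinstance(raw, dict):
        return {}
    out: dict[str, dict[str, Any]] = {}
    for raw_id, entry in raw.items():
        check_id = str(raw_id).strip()
        if not check_id:
            continue  # entries without a usable id are dropped
        entry = entry if isinstance(entry, dict) else {}
        state = str(entry.get("state", "pending")).strip().lower()
        if state not in VALID_STATES:
            state = "pending"  # unknown states are treated as not yet ready
        out[check_id] = {
            "state": state,
            "required": bool(entry.get("required", True)),
        }
    return out


def overall_readiness(checks: dict[str, dict[str, Any]]) -> str:
    """Hypothetical aggregate over required checks: failed > pending > ready."""
    required_states = [c["state"] for c in checks.values() if c["required"]]
    if "failed" in required_states:
        return "failed"
    if "pending" in required_states:
        return "pending"
    return "ready"
```

Treating malformed entries as `pending` rather than raising keeps a hand-edited or partially written status file from breaking readers.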
+15 -28
@@ -403,20 +403,18 @@ class GatewayStreamConsumer:
except asyncio.CancelledError:
# Best-effort final edit on cancellation
_best_effort_ok = False
if self._accumulated and self._message_id:
try:
_best_effort_ok = bool(await self._send_or_edit(self._accumulated))
await self._send_or_edit(self._accumulated)
except Exception:
pass
# Only confirm final delivery if the best-effort send above
# actually succeeded OR if the final response was already
# confirmed before we were cancelled. Previously this
# promoted any partial send (already_sent=True) to
# final_response_sent — which suppressed the gateway's
# fallback send even when only intermediate text (e.g.
# "Let me search…") had been delivered, not the real answer.
if _best_effort_ok and not self._final_response_sent:
# If we delivered any content before being cancelled, mark the
# final response as sent so the gateway's already_sent check
# doesn't trigger a duplicate message. The 5-second
# stream_task timeout (gateway/run.py) can cancel us while
# waiting on a slow Telegram API call — without this flag the
# gateway falls through to the normal send path.
if self._already_sent:
self._final_response_sent = True
except Exception as e:
logger.error("Stream consumer error: %s", e)
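The cancellation path above can be reproduced with a small asyncio sketch. `DemoConsumer` and its attributes are hypothetical stand-ins for the real stream consumer, and the sketch assumes the task swallows the cancellation after its best-effort flush. It follows the variant that marks the final response as sent whenever anything was delivered; the other variant in this hunk sets the flag only when the best-effort send itself succeeds.

```python
import asyncio


class DemoConsumer:
    """Hypothetical stand-in for a streaming consumer."""

    def __init__(self) -> None:
        self.already_sent = False
        self.final_response_sent = False
        self.delivered: list[str] = []

    async def _send(self, text: str) -> bool:
        self.delivered.append(text)
        self.already_sent = True
        return True

    async def run(self) -> None:
        try:
            await self._send("partial")
            await asyncio.sleep(3600)  # stands in for waiting on more chunks
        except asyncio.CancelledError:
            # Best-effort final flush; delivery errors are ignored here.
            try:
                await self._send("partial + flushed tail")
            except Exception:
                pass
            if self.already_sent:
                self.final_response_sent = True


async def main() -> DemoConsumer:
    consumer = DemoConsumer()
    task = asyncio.create_task(consumer.run())
    await asyncio.sleep(0)  # let the task start and reach its long wait
    task.cancel()
    await task  # run() handles the cancellation, so this returns normally
    return consumer
```

The trade-off is visible in the flag: promoting any partial delivery to "final" avoids duplicate messages but can suppress a fallback send when only interim text reached the user.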
@@ -515,17 +513,9 @@ class GatewayStreamConsumer:
self._fallback_final_send = False
if not continuation.strip():
# Nothing new to send — the visible partial already matches final text.
# BUT: if final_text itself has meaningful content (e.g. a timeout
# message after a long tool call), the prefix-based continuation
# calculation may wrongly conclude "already shown" because the
# streamed prefix was from a *previous* segment (before the tool
# boundary). In that case, send the full final_text as-is (#10807).
if final_text.strip() and final_text != self._visible_prefix():
continuation = final_text
else:
self._already_sent = True
self._final_response_sent = True
return
self._already_sent = True
self._final_response_sent = True
return
raw_limit = getattr(self.adapter, "MAX_MESSAGE_LENGTH", 4096)
safe_limit = max(500, raw_limit - 100)
@@ -619,15 +609,12 @@ class GatewayStreamConsumer:
content=text,
metadata=self.metadata,
)
# Note: do NOT set _already_sent = True here.
# Commentary messages are interim status updates (e.g. "Using browser
# tool..."), not the final response. Setting already_sent would cause
# the final response to be incorrectly suppressed when there are
# multiple tool calls. See: https://github.com/NousResearch/hermes-agent/issues/10454
return result.success
if result.success:
self._already_sent = True
return True
except Exception as e:
logger.error("Commentary send error: %s", e)
return False
return False
async def _send_or_edit(self, text: str) -> bool:
"""Send or edit the streaming message.
+2 -2
@@ -11,5 +11,5 @@ Provides subcommands for:
- hermes cron - Manage cron jobs
"""
__version__ = "0.10.0"
__release_date__ = "2026.4.16"
__version__ = "0.9.0"
__release_date__ = "2026.4.13"
+3 -39
@@ -70,7 +70,6 @@ DEFAULT_CODEX_BASE_URL = "https://chatgpt.com/backend-api/codex"
DEFAULT_QWEN_BASE_URL = "https://portal.qwen.ai/v1"
DEFAULT_GITHUB_MODELS_BASE_URL = "https://api.githubcopilot.com"
DEFAULT_COPILOT_ACP_BASE_URL = "acp://copilot"
DEFAULT_OLLAMA_CLOUD_BASE_URL = "https://ollama.com/v1"
CODEX_OAUTH_CLIENT_ID = "app_EMoamEEZ73f0CkXaXp7hrann"
CODEX_OAUTH_TOKEN_URL = "https://auth.openai.com/oauth/token"
CODEX_ACCESS_TOKEN_REFRESH_SKEW_SECONDS = 120
@@ -275,22 +274,6 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
api_key_env_vars=("XIAOMI_API_KEY",),
base_url_env_var="XIAOMI_BASE_URL",
),
"ollama-cloud": ProviderConfig(
id="ollama-cloud",
name="Ollama Cloud",
auth_type="api_key",
inference_base_url=DEFAULT_OLLAMA_CLOUD_BASE_URL,
api_key_env_vars=("OLLAMA_API_KEY",),
base_url_env_var="OLLAMA_BASE_URL",
),
"bedrock": ProviderConfig(
id="bedrock",
name="AWS Bedrock",
auth_type="aws_sdk",
inference_base_url="https://bedrock-runtime.us-east-1.amazonaws.com",
api_key_env_vars=(),
base_url_env_var="BEDROCK_BASE_URL",
),
}
@@ -928,7 +911,6 @@ def resolve_provider(
_PROVIDER_ALIASES = {
"glm": "zai", "z-ai": "zai", "z.ai": "zai", "zhipu": "zai",
"google": "gemini", "google-gemini": "gemini", "google-ai-studio": "gemini",
"x-ai": "xai", "x.ai": "xai", "grok": "xai",
"kimi": "kimi-coding", "kimi-for-coding": "kimi-coding", "moonshot": "kimi-coding",
"kimi-cn": "kimi-coding-cn", "moonshot-cn": "kimi-coding-cn",
"arcee-ai": "arcee", "arceeai": "arcee",
@@ -942,13 +924,11 @@ def resolve_provider(
"qwen-portal": "qwen-oauth", "qwen-cli": "qwen-oauth", "qwen-oauth": "qwen-oauth",
"hf": "huggingface", "hugging-face": "huggingface", "huggingface-hub": "huggingface",
"mimo": "xiaomi", "xiaomi-mimo": "xiaomi",
"aws": "bedrock", "aws-bedrock": "bedrock", "amazon-bedrock": "bedrock", "amazon": "bedrock",
"go": "opencode-go", "opencode-go-sub": "opencode-go",
"kilo": "kilocode", "kilo-code": "kilocode", "kilo-gateway": "kilocode",
# Local server aliases — route through the generic custom provider
"lmstudio": "custom", "lm-studio": "custom", "lm_studio": "custom",
"ollama": "custom", "ollama_cloud": "ollama-cloud",
"vllm": "custom", "llamacpp": "custom",
"ollama": "custom", "vllm": "custom", "llamacpp": "custom",
"llama.cpp": "custom", "llama-cpp": "custom",
}
normalized = _PROVIDER_ALIASES.get(normalized, normalized)
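As a rough illustration of the alias step, the sketch below maps a user-supplied provider name to its canonical id and leaves unknown names unchanged. The table is a small subset copied from the diff; the `strip().lower()` pre-normalization and the function name `normalize_provider` are assumptions, since that part of `resolve_provider` is not shown here.

```python
# Subset of the alias table shown above; local servers route to "custom".
ALIASES = {
    "glm": "zai", "z.ai": "zai",
    "google": "gemini",
    "hf": "huggingface",
    "ollama": "custom", "vllm": "custom", "llama.cpp": "custom",
}


def normalize_provider(requested: str) -> str:
    """Return the canonical provider id for a requested name."""
    normalized = requested.strip().lower()  # assumed pre-normalization
    # dict.get with the key as its own default leaves unknown names untouched.
    return ALIASES.get(normalized, normalized)
```

Using the looked-up key as the fallback means canonical ids such as `gemini` pass through without needing an identity entry in the table.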
@@ -1000,15 +980,6 @@ def resolve_provider(
if has_usable_secret(os.getenv(env_var, "")):
return pid
# AWS Bedrock — detect via boto3 credential chain (IAM roles, SSO, env vars).
# This runs after API-key providers so explicit keys always win.
try:
from agent.bedrock_adapter import has_aws_credentials
if has_aws_credentials():
return "bedrock"
except ImportError:
pass # boto3 not installed — skip Bedrock auto-detection
raise AuthError(
"No inference provider configured. Run 'hermes model' to choose a "
"provider and model, or set an API key (OPENROUTER_API_KEY, "
@@ -2413,7 +2384,7 @@ def get_api_key_provider_status(provider_id: str) -> Dict[str, Any]:
if pconfig.base_url_env_var:
env_url = os.getenv(pconfig.base_url_env_var, "").strip()
if provider_id in ("kimi-coding", "kimi-coding-cn"):
if provider_id == "kimi-coding":
base_url = _resolve_kimi_base_url(api_key, pconfig.inference_base_url, env_url)
elif env_url:
base_url = env_url
@@ -2475,13 +2446,6 @@ def get_auth_status(provider_id: Optional[str] = None) -> Dict[str, Any]:
pconfig = PROVIDER_REGISTRY.get(target)
if pconfig and pconfig.auth_type == "api_key":
return get_api_key_provider_status(target)
# AWS SDK providers (Bedrock) — check via boto3 credential chain
if pconfig and pconfig.auth_type == "aws_sdk":
try:
from agent.bedrock_adapter import has_aws_credentials
return {"logged_in": has_aws_credentials(), "provider": target}
except ImportError:
return {"logged_in": False, "provider": target, "error": "boto3 not installed"}
return {"logged_in": False}
@@ -2506,7 +2470,7 @@ def resolve_api_key_provider_credentials(provider_id: str) -> Dict[str, Any]:
if pconfig.base_url_env_var:
env_url = os.getenv(pconfig.base_url_env_var, "").strip()
if provider_id in ("kimi-coding", "kimi-coding-cn"):
if provider_id == "kimi-coding":
base_url = _resolve_kimi_base_url(api_key, pconfig.inference_base_url, env_url)
elif provider_id == "zai":
base_url = _resolve_zai_base_url(api_key, pconfig.inference_base_url, env_url)
+1 -26
View File
@@ -4,7 +4,6 @@ from __future__ import annotations
from getpass import getpass
import math
import sys
import time
from types import SimpleNamespace
import uuid
@@ -161,10 +160,7 @@ def auth_add_command(args) -> None:
default_label = _api_key_default_label(len(pool.entries()) + 1)
label = (getattr(args, "label", None) or "").strip()
if not label:
if sys.stdin.isatty():
label = input(f"Label (optional, default: {default_label}): ").strip() or default_label
else:
label = default_label
label = input(f"Label (optional, default: {default_label}): ").strip() or default_label
entry = PooledCredential(
provider=provider,
id=uuid.uuid4().hex[:6],
@@ -372,27 +368,6 @@ def _interactive_auth() -> None:
print("=" * 50)
auth_list_command(SimpleNamespace(provider=None))
# Show AWS Bedrock credential status (not in the pool — uses boto3 chain)
try:
from agent.bedrock_adapter import has_aws_credentials, resolve_aws_auth_env_var, resolve_bedrock_region
if has_aws_credentials():
auth_source = resolve_aws_auth_env_var() or "unknown"
region = resolve_bedrock_region()
print(f"bedrock (AWS SDK credential chain):")
print(f" Auth: {auth_source}")
print(f" Region: {region}")
try:
import boto3
sts = boto3.client("sts", region_name=region)
identity = sts.get_caller_identity()
arn = identity.get("Arn", "unknown")
print(f" Identity: {arn}")
except Exception:
print(f" Identity: (could not resolve — boto3 STS call failed)")
print()
except ImportError:
pass # boto3 or bedrock_adapter not available
print()
# Main menu
+4 -19
View File
@@ -164,7 +164,7 @@ COMMAND_REGISTRY: list[CommandDef] = [
# Exit
CommandDef("quit", "Exit the CLI", "Exit",
cli_only=True, aliases=("exit",)),
cli_only=True, aliases=("exit", "q")),
]
@@ -450,7 +450,7 @@ def _collect_gateway_skill_entries(
name = sanitize_name(cmd_name) if sanitize_name else cmd_name
if not name:
continue
desc = plugin_cmds[cmd_name].get("description", "Plugin command")
desc = "Plugin command"
if len(desc) > desc_limit:
desc = desc[:desc_limit - 3] + "..."
plugin_pairs.append((name, desc))
@@ -844,7 +844,8 @@ class SlashCommandCompleter(Completer):
return None
return word
def _context_completions(self, word: str, limit: int = 30):
@staticmethod
def _context_completions(word: str, limit: int = 30):
"""Yield Claude Code-style @ context completions.
Bare ``@`` or ``@partial`` shows static references and matching
@@ -1139,22 +1140,6 @@ class SlashCommandCompleter(Completer):
display_meta=f"{short_desc}",
)
# Plugin-registered slash commands
try:
from hermes_cli.plugins import get_plugin_commands
for cmd_name, cmd_info in get_plugin_commands().items():
if cmd_name.startswith(word):
desc = str(cmd_info.get("description", "Plugin command"))
short_desc = desc[:50] + ("..." if len(desc) > 50 else "")
yield Completion(
self._completion_text(cmd_name, word),
start_position=-len(word),
display=f"/{cmd_name}",
display_meta=f"🔌 {short_desc}",
)
except Exception:
pass
# ---------------------------------------------------------------------------
# Inline auto-suggest (ghost text) for slash commands
+28 -251
View File
@@ -23,6 +23,7 @@ from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Any, Optional, List, Tuple
from tools.tool_backend_helpers import managed_nous_tools_enabled as _managed_nous_tools_enabled
_IS_WINDOWS = platform.system() == "Windows"
_ENV_VAR_NAME_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
@@ -240,41 +241,13 @@ def _secure_dir(path):
pass
def _is_container() -> bool:
"""Detect if we're running inside a Docker/Podman/LXC container.
When Hermes runs in a container with volume-mounted config files, forcing
0o600 permissions breaks multi-process setups where the gateway and
dashboard run as different UIDs or the volume mount requires broader
permissions.
"""
# Explicit opt-out
if os.environ.get("HERMES_CONTAINER") or os.environ.get("HERMES_SKIP_CHMOD"):
return True
# Docker / Podman marker file
if os.path.exists("/.dockerenv"):
return True
# LXC / cgroup-based detection
try:
with open("/proc/1/cgroup", "r") as f:
cgroup_content = f.read()
if "docker" in cgroup_content or "lxc" in cgroup_content or "kubepods" in cgroup_content:
return True
except (OSError, IOError):
pass
return False
def _secure_file(path):
"""Set file to owner-only read/write (0600). No-op on Windows.
Skipped in managed mode: the NixOS activation script sets
group-readable permissions (0640) on config files.
Skipped in containers: Docker/Podman volume mounts often need broader
permissions. Set HERMES_SKIP_CHMOD=1 to force-skip on other systems.
"""
if is_managed() or _is_container():
if is_managed():
return
try:
if os.path.exists(str(path)):
@@ -419,7 +392,8 @@ DEFAULT_CONFIG = {
"allow_private_urls": False, # Allow navigating to private/internal IPs (localhost, 192.168.x.x, etc.)
"camofox": {
# When true, Hermes sends a stable profile-scoped userId to Camofox
# so the server maps it to a persistent Firefox profile automatically.
# so the server can map it to a persistent browser profile directory.
# Requires Camofox server to be configured with CAMOFOX_PROFILE_DIR.
# When false (default), each session gets a random userId (ephemeral).
"managed_persistence": False,
},
@@ -445,27 +419,6 @@ DEFAULT_CONFIG = {
"protect_last_n": 20, # minimum recent messages to keep uncompressed
},
# AWS Bedrock provider configuration.
# Only used when model.provider is "bedrock".
"bedrock": {
"region": "", # AWS region for Bedrock API calls (empty = AWS_REGION env var → us-east-1)
"discovery": {
"enabled": True, # Auto-discover models via ListFoundationModels
"provider_filter": [], # Only show models from these providers (e.g. ["anthropic", "amazon"])
"refresh_interval": 3600, # Cache discovery results for this many seconds
},
"guardrail": {
# Amazon Bedrock Guardrails — content filtering and safety policies.
# Create a guardrail in the Bedrock console, then set the ID and version here.
# See: https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails.html
"guardrail_identifier": "", # e.g. "abc123def456"
"guardrail_version": "", # e.g. "1" or "DRAFT"
"stream_processing_mode": "async", # "sync" or "async"
"trace": "disabled", # "enabled", "disabled", or "enabled_full"
},
},
"smart_model_routing": {
"enabled": False,
"max_simple_chars": 160,
@@ -557,11 +510,6 @@ DEFAULT_CONFIG = {
"platforms": {}, # Per-platform display overrides: {"telegram": {"tool_progress": "all"}, "slack": {"tool_progress": "off"}}
},
# Web dashboard settings
"dashboard": {
"theme": "default", # Dashboard visual theme: "default", "midnight", "ember", "mono", "cyberpunk", "rose"
},
# Privacy settings
"privacy": {
"redact_pii": False, # When True, hash user IDs and strip phone numbers from LLM context
@@ -569,7 +517,7 @@ DEFAULT_CONFIG = {
# Text-to-speech configuration
"tts": {
"provider": "edge", # "edge" (free) | "elevenlabs" (premium) | "openai" | "xai" | "minimax" | "mistral" | "neutts" (local)
"provider": "edge", # "edge" (free) | "elevenlabs" (premium) | "openai" | "minimax" | "mistral" | "neutts" (local)
"edge": {
"voice": "en-US-AriaNeural",
# Popular: AriaNeural, JennyNeural, AndrewNeural, BrianNeural, SoniaNeural
@@ -583,12 +531,6 @@ DEFAULT_CONFIG = {
"voice": "alloy",
# Voices: alloy, echo, fable, onyx, nova, shimmer
},
"xai": {
"voice_id": "eve",
"language": "en",
"sample_rate": 24000,
"bit_rate": 128000,
},
"mistral": {
"model": "voxtral-mini-tts-2603",
"voice_id": "c69964a6-ab8b-4f8a-9465-ec0925096ec8", # Paul - Neutral
@@ -696,7 +638,6 @@ DEFAULT_CONFIG = {
"allowed_channels": "", # If set, bot ONLY responds in these channel IDs (whitelist)
"auto_thread": True, # Auto-create threads on @mention in channels (like Slack)
"reactions": True, # Add 👀/✅/❌ reactions to messages during processing
"channel_prompts": {}, # Per-channel ephemeral system prompts (forum parents apply to child threads)
},
# WhatsApp platform settings (gateway mode)
@@ -707,21 +648,6 @@ DEFAULT_CONFIG = {
# Supports \n for newlines, e.g. "🤖 *My Bot*\n──────\n"
},
# Telegram platform settings (gateway mode)
"telegram": {
"channel_prompts": {}, # Per-chat/topic ephemeral system prompts (topics inherit from parent group)
},
# Slack platform settings (gateway mode)
"slack": {
"channel_prompts": {}, # Per-channel ephemeral system prompts
},
# Mattermost platform settings (gateway mode)
"mattermost": {
"channel_prompts": {}, # Per-channel ephemeral system prompts
},
# Approval mode for dangerous commands:
# manual — always prompt the user (default)
# smart — use auxiliary LLM to auto-approve low-risk commands, prompt for high-risk
@@ -777,7 +703,7 @@ DEFAULT_CONFIG = {
},
# Config schema version - bump this when adding new required fields
"_config_version": 18,
"_config_version": 17,
}
# =============================================================================
@@ -845,22 +771,6 @@ OPTIONAL_ENV_VARS = {
"category": "provider",
"advanced": True,
},
"XAI_API_KEY": {
"description": "xAI API key",
"prompt": "xAI API key",
"url": "https://console.x.ai/",
"password": True,
"category": "provider",
"advanced": True,
},
"XAI_BASE_URL": {
"description": "xAI base URL override",
"prompt": "xAI base URL (leave empty for default)",
"url": None,
"password": False,
"category": "provider",
"advanced": True,
},
"GLM_API_KEY": {
"description": "Z.AI / GLM API key (also recognized as ZAI_API_KEY / Z_AI_API_KEY)",
"prompt": "Z.AI / GLM API key",
@@ -1049,22 +959,6 @@ OPTIONAL_ENV_VARS = {
"category": "provider",
"advanced": True,
},
"OLLAMA_API_KEY": {
"description": "Ollama Cloud API key (ollama.com — cloud-hosted open models)",
"prompt": "Ollama Cloud API key",
"url": "https://ollama.com/settings",
"password": True,
"category": "provider",
"advanced": True,
},
"OLLAMA_BASE_URL": {
"description": "Ollama Cloud base URL override (default: https://ollama.com/v1)",
"prompt": "Ollama base URL (leave empty for default)",
"url": None,
"password": False,
"category": "provider",
"advanced": True,
},
"XIAOMI_API_KEY": {
"description": "Xiaomi MiMo API key for MiMo models (mimo-v2-pro, mimo-v2-omni, mimo-v2-flash)",
"prompt": "Xiaomi MiMo API Key",
@@ -1080,22 +974,6 @@ OPTIONAL_ENV_VARS = {
"category": "provider",
"advanced": True,
},
"AWS_REGION": {
"description": "AWS region for Bedrock API calls (e.g. us-east-1, eu-central-1)",
"prompt": "AWS Region",
"url": "https://docs.aws.amazon.com/bedrock/latest/userguide/bedrock-regions.html",
"password": False,
"category": "provider",
"advanced": True,
},
"AWS_PROFILE": {
"description": "AWS named profile for Bedrock authentication (from ~/.aws/credentials)",
"prompt": "AWS Profile",
"url": None,
"password": False,
"category": "provider",
"advanced": True,
},
# ── Tool API keys ──
"EXA_API_KEY": {
@@ -1293,12 +1171,6 @@ OPTIONAL_ENV_VARS = {
"password": False,
"category": "messaging",
},
"TELEGRAM_PROXY": {
"description": "Proxy URL for Telegram connections (overrides HTTPS_PROXY). Supports http://, https://, socks5://",
"prompt": "Telegram proxy URL (optional)",
"password": False,
"category": "messaging",
},
"DISCORD_BOT_TOKEN": {
"description": "Discord bot token from Developer Portal",
"prompt": "Discord bot token",
@@ -1596,8 +1468,13 @@ OPTIONAL_ENV_VARS = {
},
# ── Agent settings ──
# NOTE: MESSAGING_CWD was removed here — use terminal.cwd in config.yaml
# instead. The gateway reads TERMINAL_CWD (bridged from terminal.cwd).
"MESSAGING_CWD": {
"description": "Working directory for terminal commands via messaging",
"prompt": "Messaging working directory (default: home)",
"url": None,
"password": False,
"category": "setting",
},
"SUDO_PASSWORD": {
"description": "Sudo password for terminal commands requiring root access; set to an explicit empty string to try empty without prompting",
"prompt": "Sudo password",
@@ -1645,8 +1522,14 @@ OPTIONAL_ENV_VARS = {
},
}
# Tool Gateway env vars are always visible — they're useful for
# self-hosted / custom gateway setups regardless of subscription state.
if not _managed_nous_tools_enabled():
for _hidden_var in (
"FIRECRAWL_GATEWAY_URL",
"TOOL_GATEWAY_DOMAIN",
"TOOL_GATEWAY_SCHEME",
"TOOL_GATEWAY_USER_TOKEN",
):
OPTIONAL_ENV_VARS.pop(_hidden_var, None)
def get_missing_env_vars(required_only: bool = False) -> List[Dict[str, Any]]:
@@ -2070,52 +1953,6 @@ def print_config_warnings(config: Optional[Dict[str, Any]] = None) -> None:
sys.stderr.write("\n".join(lines) + "\n\n")
def warn_deprecated_cwd_env_vars(config: Optional[Dict[str, Any]] = None) -> None:
"""Warn if MESSAGING_CWD or TERMINAL_CWD is set in .env instead of config.yaml.
These env vars are deprecated; the canonical setting is terminal.cwd
in config.yaml. Prints a migration hint to stderr.
"""
import os, sys
messaging_cwd = os.environ.get("MESSAGING_CWD")
terminal_cwd_env = os.environ.get("TERMINAL_CWD")
if config is None:
try:
config = load_config()
except Exception:
return
terminal_cfg = config.get("terminal", {})
config_cwd = terminal_cfg.get("cwd", ".") if isinstance(terminal_cfg, dict) else "."
# Only warn if config.yaml doesn't have an explicit path
config_has_explicit_cwd = config_cwd not in (".", "auto", "cwd", "")
lines: list[str] = []
if messaging_cwd:
lines.append(
f" \033[33m⚠\033[0m MESSAGING_CWD={messaging_cwd} found in .env — "
f"this is deprecated."
)
if terminal_cwd_env and not config_has_explicit_cwd:
# TERMINAL_CWD in env but not from config bridge — likely from .env
lines.append(
f" \033[33m⚠\033[0m TERMINAL_CWD={terminal_cwd_env} found in .env — "
f"this is deprecated."
)
if lines:
hint_path = os.environ.get("HERMES_HOME", "~/.hermes")
lines.insert(0, "\033[33m⚠ Deprecated .env settings detected:\033[0m")
lines.append(
f" \033[2mMove to config.yaml instead: "
f"terminal:\\n cwd: /your/project/path\033[0m"
)
lines.append(
f" \033[2mThen remove the old entries from {hint_path}/.env\033[0m"
)
sys.stderr.write("\n".join(lines) + "\n\n")
def migrate_config(interactive: bool = True, quiet: bool = False) -> Dict[str, Any]:
"""
Migrate config to latest version, prompting for new required fields.
@@ -2929,47 +2766,6 @@ def sanitize_env_file() -> int:
return fixes
def _check_non_ascii_credential(key: str, value: str) -> str:
"""Warn and strip non-ASCII characters from credential values.
API keys and tokens must be pure ASCII: they are sent as HTTP header
values which httpx/httpcore encode as ASCII. Non-ASCII characters
(commonly introduced by copy-pasting from rich-text editors or PDFs
that substitute lookalike Unicode glyphs for ASCII letters) cause
``UnicodeEncodeError: 'ascii' codec can't encode character`` at
request time.
Returns the sanitized (ASCII-only) value. Prints a warning if any
non-ASCII characters were found and removed.
"""
try:
value.encode("ascii")
return value # all ASCII — nothing to do
except UnicodeEncodeError:
pass
# Build a readable list of the offending characters
bad_chars: list[str] = []
for i, ch in enumerate(value):
if ord(ch) > 127:
bad_chars.append(f" position {i}: {ch!r} (U+{ord(ch):04X})")
sanitized = value.encode("ascii", errors="ignore").decode("ascii")
import sys
print(
f"\n Warning: {key} contains non-ASCII characters that will break API requests.\n"
f" This usually happens when copy-pasting from a PDF, rich-text editor,\n"
f" or web page that substitutes lookalike Unicode glyphs for ASCII letters.\n"
f"\n"
+ "\n".join(f" {line}" for line in bad_chars[:5])
+ ("\n ... and more" if len(bad_chars) > 5 else "")
+ f"\n\n The non-ASCII characters have been stripped automatically.\n"
f" If authentication fails, re-copy the key from the provider's dashboard.\n",
file=sys.stderr,
)
return sanitized
def save_env_value(key: str, value: str):
"""Save or update a value in ~/.hermes/.env."""
if is_managed():
@@ -2978,8 +2774,6 @@ def save_env_value(key: str, value: str):
if not _ENV_VAR_NAME_RE.match(key):
raise ValueError(f"Invalid environment variable name: {key!r}")
value = value.replace("\n", "").replace("\r", "")
# API keys / tokens must be ASCII — strip non-ASCII with a warning.
value = _check_non_ascii_credential(key, value)
ensure_hermes_home()
env_path = get_env_path()
@@ -3010,25 +2804,12 @@ def save_env_value(key: str, value: str):
lines.append(f"{key}={value}\n")
fd, tmp_path = tempfile.mkstemp(dir=str(env_path.parent), suffix='.tmp', prefix='.env_')
# Preserve original permissions so Docker volume mounts aren't clobbered.
original_mode = None
if env_path.exists():
try:
original_mode = stat.S_IMODE(env_path.stat().st_mode)
except OSError:
pass
try:
with os.fdopen(fd, 'w', **write_kw) as f:
f.writelines(lines)
f.flush()
os.fsync(f.fileno())
os.replace(tmp_path, env_path)
# Restore original permissions before _secure_file may tighten them.
if original_mode is not None:
try:
os.chmod(env_path, original_mode)
except OSError:
pass
except BaseException:
try:
os.unlink(tmp_path)
@@ -3039,6 +2820,13 @@ def save_env_value(key: str, value: str):
os.environ[key] = value
# Restrict .env permissions to owner-only (contains API keys)
if not _IS_WINDOWS:
try:
os.chmod(env_path, stat.S_IRUSR | stat.S_IWUSR)
except OSError:
pass
def remove_env_value(key: str) -> bool:
"""Remove a key from ~/.hermes/.env and os.environ.
@@ -3067,23 +2855,12 @@ def remove_env_value(key: str) -> bool:
if found:
fd, tmp_path = tempfile.mkstemp(dir=str(env_path.parent), suffix='.tmp', prefix='.env_')
# Preserve original permissions so Docker volume mounts aren't clobbered.
original_mode = None
try:
original_mode = stat.S_IMODE(env_path.stat().st_mode)
except OSError:
pass
try:
with os.fdopen(fd, 'w', **write_kw) as f:
f.writelines(new_lines)
f.flush()
os.fsync(f.fileno())
os.replace(tmp_path, env_path)
if original_mode is not None:
try:
os.chmod(env_path, original_mode)
except OSError:
pass
except BaseException:
try:
os.unlink(tmp_path)
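The write pattern used by `save_env_value` and `remove_env_value` (temp file in the same directory, flush and fsync, `os.replace`, then owner-only permissions) can be sketched on its own as below. `atomic_write_lines` is a hypothetical helper name, and the sketch follows the variant that always tightens the file to 0600 on non-Windows systems rather than restoring the previous mode.

```python
import os
import stat
import tempfile
from pathlib import Path


def atomic_write_lines(path: Path, lines: list[str]) -> None:
    """Replace path atomically with the given lines, then restrict it to 0600."""
    # The temp file must live in the target directory so os.replace stays
    # on one filesystem and the swap is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=str(path.parent), prefix=".env_", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.writelines(lines)
            f.flush()
            os.fsync(f.fileno())  # make sure the data is on disk before the swap
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file on any failure, then re-raise.
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        raise
    if os.name != "nt":
        try:
            os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)  # owner read/write only
        except OSError:
            pass
```

A reader of the file therefore sees either the old contents or the new contents, never a partially written file.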
+4 -25
View File
@@ -166,7 +166,6 @@ def curses_radiolist(
selected: int = 0,
*,
cancel_returns: int | None = None,
description: str | None = None,
) -> int:
"""Curses single-select radio list. Returns the selected index.
@@ -175,9 +174,6 @@ def curses_radiolist(
items: Display labels for each row.
selected: Index that starts selected (pre-selected).
cancel_returns: Returned on ESC/q. Defaults to the original *selected*.
description: Optional multi-line text shown between the title and
the item list. Useful for context that should survive the
curses screen clear.
"""
if cancel_returns is None:
cancel_returns = selected
@@ -185,10 +181,6 @@ def curses_radiolist(
if not sys.stdin.isatty():
return cancel_returns
desc_lines: list[str] = []
if description:
desc_lines = description.splitlines()
try:
import curses
result_holder: list = [None]
@@ -207,35 +199,22 @@ def curses_radiolist(
stdscr.clear()
max_y, max_x = stdscr.getmaxyx()
row = 0
# Header
try:
hattr = curses.A_BOLD
if curses.has_colors():
hattr |= curses.color_pair(2)
stdscr.addnstr(row, 0, title, max_x - 1, hattr)
row += 1
# Description lines
for dline in desc_lines:
if row >= max_y - 1:
break
stdscr.addnstr(row, 0, dline, max_x - 1, curses.A_NORMAL)
row += 1
stdscr.addnstr(0, 0, title, max_x - 1, hattr)
stdscr.addnstr(
row, 0,
1, 0,
" \u2191\u2193 navigate ENTER/SPACE select ESC cancel",
max_x - 1, curses.A_DIM,
)
row += 1
except curses.error:
pass
# Scrollable item list
items_start = row + 1
visible_rows = max_y - items_start - 1
visible_rows = max_y - 4
if cursor < scroll_offset:
scroll_offset = cursor
elif cursor >= scroll_offset + visible_rows:
@@ -244,7 +223,7 @@ def curses_radiolist(
for draw_i, i in enumerate(
range(scroll_offset, min(len(items), scroll_offset + visible_rows))
):
y = draw_i + items_start
y = draw_i + 3
if y >= max_y - 1:
break
radio = "\u25cf" if i == selected else "\u25cb"
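The scroll-window arithmetic in the radio list can be checked without a terminal using a sketch like this one. Both function names are hypothetical, and the body of the "cursor below the window" branch is not visible in this hunk, so `cursor - visible_rows + 1` is an assumption based on the usual way such lists keep the cursor on the last visible row.

```python
def clamp_scroll(cursor: int, scroll_offset: int, visible_rows: int) -> int:
    """Return a scroll offset that keeps the cursor inside the visible window."""
    if cursor < scroll_offset:
        return cursor  # cursor moved above the window: scroll up to it
    if cursor >= scroll_offset + visible_rows:
        return cursor - visible_rows + 1  # assumed: cursor becomes the last row
    return scroll_offset


def visible_indices(n_items: int, scroll_offset: int, visible_rows: int) -> list[int]:
    """Item indices drawn for the current window, as in the draw loop above."""
    return list(range(scroll_offset, min(n_items, scroll_offset + visible_rows)))
```

With a fixed header, `visible_rows` is a constant offset from the terminal height; once a variable-height description is drawn above the list, the first item row has to be computed from the number of header lines instead.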
+2 -143
View File
@@ -27,110 +27,6 @@ _DPASTE_COM_URL = "https://dpaste.com/api/"
# paste.rs caps at ~1 MB; we stay under that with headroom.
_MAX_LOG_BYTES = 512_000
# Auto-delete pastes after this many seconds (1 hour).
_AUTO_DELETE_SECONDS = 3600
# ---------------------------------------------------------------------------
# Privacy / delete helpers
# ---------------------------------------------------------------------------
_PRIVACY_NOTICE = """\
This will upload the following to a public paste service:
System info (OS, Python version, Hermes version, provider, which API keys
are configured, NOT the actual keys)
Recent log lines (agent.log, errors.log, gateway.log; may contain
conversation fragments and file paths)
Full agent.log and gateway.log (up to 512 KB each; likely contains
conversation content, tool outputs, and file paths)
Pastes auto-delete after 1 hour.
"""
_GATEWAY_PRIVACY_NOTICE = (
"⚠️ **Privacy notice:** This uploads system info + recent log tails "
"(may contain conversation fragments) to a public paste service. "
"Full logs are NOT included from the gateway — use `hermes debug share` "
"from the CLI for full log uploads.\n"
"Pastes auto-delete after 1 hour."
)
def _extract_paste_id(url: str) -> Optional[str]:
"""Extract the paste ID from a paste.rs or dpaste.com URL.
Returns the ID string, or None if the URL doesn't match a known service.
"""
url = url.strip().rstrip("/")
for prefix in ("https://paste.rs/", "http://paste.rs/"):
if url.startswith(prefix):
return url[len(prefix):]
return None
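The prefix matching in the removed helper above can be sketched in isolation. This is a minimal, re-indented illustration that assumes only the two paste.rs prefixes shown in the diff; the name `extract_paste_id` (without the leading underscore) and the sample URLs are chosen here for the example.

```python
from typing import Optional

# Prefixes copied from the removed helper; only paste.rs is recognised.
_PASTE_RS_PREFIXES = ("https://paste.rs/", "http://paste.rs/")


def extract_paste_id(url: str) -> Optional[str]:
    """Return the paste.rs ID, or None for any other URL."""
    url = url.strip().rstrip("/")
    for prefix in _PASTE_RS_PREFIXES:
        if url.startswith(prefix):
            return url[len(prefix):]
    return None


# Illustrative inputs (hypothetical paste IDs).
paste_rs_id = extract_paste_id("https://paste.rs/abc123/")
dpaste_id = extract_paste_id("https://dpaste.com/XYZ")
```

Because the trailing slash is stripped before matching, a bare `https://paste.rs/` does not match either prefix and yields None rather than an empty ID.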
def delete_paste(url: str) -> bool:
"""Delete a paste from paste.rs. Returns True on success.
Only paste.rs supports unauthenticated DELETE. dpaste.com pastes
expire automatically but cannot be deleted via API.
"""
paste_id = _extract_paste_id(url)
if not paste_id:
raise ValueError(
f"Cannot delete: only paste.rs URLs are supported. Got: {url}"
)
target = f"{_PASTE_RS_URL}{paste_id}"
req = urllib.request.Request(
target, method="DELETE",
headers={"User-Agent": "hermes-agent/debug-share"},
)
with urllib.request.urlopen(req, timeout=30) as resp:
return 200 <= resp.status < 300
def _schedule_auto_delete(urls: list[str], delay_seconds: int = _AUTO_DELETE_SECONDS):
"""Spawn a detached process to delete paste.rs pastes after *delay_seconds*.
The child process is fully detached (``start_new_session=True``) so it
survives the parent exiting (important for CLI mode). Only paste.rs
URLs are attempted; dpaste.com pastes auto-expire on their own.
"""
import subprocess
paste_rs_urls = [u for u in urls if _extract_paste_id(u)]
if not paste_rs_urls:
return
# Build a tiny inline Python script. No imports beyond stdlib.
url_list = ", ".join(f'"{u}"' for u in paste_rs_urls)
script = (
"import time, urllib.request; "
f"time.sleep({delay_seconds}); "
f"[urllib.request.urlopen(urllib.request.Request(u, method='DELETE', "
f"headers={{'User-Agent': 'hermes-agent/auto-delete'}}), timeout=15) "
f"for u in [{url_list}]]"
)
try:
subprocess.Popen(
[sys.executable, "-c", script],
start_new_session=True,
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
)
except Exception:
pass # Best-effort; manual delete still available.
def _delete_hint(url: str) -> str:
"""Return a one-liner delete command for the given paste URL."""
paste_id = _extract_paste_id(url)
if paste_id:
return f"hermes debug delete {url}"
# dpaste.com — no API delete, expires on its own.
return "(auto-expires per dpaste.com policy)"
def _upload_paste_rs(content: str) -> str:
"""Upload to paste.rs. Returns the paste URL.
@@ -354,9 +250,6 @@ def run_debug_share(args):
expiry = getattr(args, "expire", 7)
local_only = getattr(args, "local", False)
if not local_only:
print(_PRIVACY_NOTICE)
print("Collecting debug report...")
# Capture dump once — prepended to every paste for context.
@@ -422,56 +315,22 @@ def run_debug_share(args):
if failures:
print(f"\n (failed to upload: {', '.join(failures)})")
# Schedule auto-deletion after 1 hour
_schedule_auto_delete(list(urls.values()))
print(f"\n⏱ Pastes will auto-delete in 1 hour.")
# Manual delete fallback
print(f"To delete now: hermes debug delete <url>")
print(f"\nShare these links with the Hermes team for support.")
def run_debug_delete(args):
"""Delete one or more paste URLs uploaded by /debug."""
urls = getattr(args, "urls", [])
if not urls:
print("Usage: hermes debug delete <url> [<url> ...]")
print(" Deletes paste.rs pastes uploaded by 'hermes debug share'.")
return
for url in urls:
try:
ok = delete_paste(url)
if ok:
print(f" ✓ Deleted: {url}")
else:
print(f" ✗ Failed to delete: {url} (unexpected response)")
except ValueError as exc:
print(f"{exc}")
except Exception as exc:
print(f" ✗ Could not delete {url}: {exc}")
def run_debug(args):
"""Route debug subcommands."""
subcmd = getattr(args, "debug_command", None)
if subcmd == "share":
run_debug_share(args)
elif subcmd == "delete":
run_debug_delete(args)
else:
# Default: show help
print("Usage: hermes debug <command>")
print("Usage: hermes debug share [--lines N] [--expire N] [--local]")
print()
print("Commands:")
print(" share Upload debug report to a paste service and print URL")
print(" delete Delete a previously uploaded paste")
print()
print("Options (share):")
print("Options:")
print(" --lines N Number of log lines to include (default: 200)")
print(" --expire N Paste expiry in days (default: 7)")
print(" --local Print report locally instead of uploading")
print()
print("Options (delete):")
print(" <url> ... One or more paste URLs to delete")
+2 -109
@@ -8,7 +8,6 @@ import os
import sys
import subprocess
import shutil
from pathlib import Path
from hermes_cli.config import get_project_root, get_hermes_home, get_env_path
from hermes_constants import display_hermes_home
@@ -514,87 +513,7 @@ def run_doctor(args):
pass
_check_gateway_service_linger(issues)
# =========================================================================
# Check: Command installation (hermes bin symlink)
# =========================================================================
if sys.platform != "win32":
print()
print(color("◆ Command Installation", Colors.CYAN, Colors.BOLD))
# Determine the venv entry point location
_venv_bin = None
for _venv_name in ("venv", ".venv"):
_candidate = PROJECT_ROOT / _venv_name / "bin" / "hermes"
if _candidate.exists():
_venv_bin = _candidate
break
# Determine the expected command link directory (mirrors install.sh logic)
_prefix = os.environ.get("PREFIX", "")
_is_termux_env = bool(os.environ.get("TERMUX_VERSION")) or "com.termux/files/usr" in _prefix
if _is_termux_env and _prefix:
_cmd_link_dir = Path(_prefix) / "bin"
_cmd_link_display = "$PREFIX/bin"
else:
_cmd_link_dir = Path.home() / ".local" / "bin"
_cmd_link_display = "~/.local/bin"
_cmd_link = _cmd_link_dir / "hermes"
if _venv_bin is None:
check_warn(
"Venv entry point not found",
"(hermes not in venv/bin/ or .venv/bin/ — reinstall with pip install -e '.[all]')"
)
manual_issues.append(
f"Reinstall entry point: cd {PROJECT_ROOT} && source venv/bin/activate && pip install -e '.[all]'"
)
else:
check_ok(f"Venv entry point exists ({_venv_bin.relative_to(PROJECT_ROOT)})")
# Check the symlink at the command link location
if _cmd_link.is_symlink():
_target = _cmd_link.resolve()
_expected = _venv_bin.resolve()
if _target == _expected:
check_ok(f"{_cmd_link_display}/hermes → correct target")
else:
check_warn(
f"{_cmd_link_display}/hermes points to wrong target",
f"(→ {_target}, expected → {_expected})"
)
if should_fix:
_cmd_link.unlink()
_cmd_link.symlink_to(_venv_bin)
check_ok(f"Fixed symlink: {_cmd_link_display}/hermes → {_venv_bin}")
fixed_count += 1
else:
issues.append(f"Broken symlink at {_cmd_link_display}/hermes — run 'hermes doctor --fix'")
elif _cmd_link.exists():
# It's a regular file, not a symlink — possibly a wrapper script
check_ok(f"{_cmd_link_display}/hermes exists (non-symlink)")
else:
check_fail(
f"{_cmd_link_display}/hermes not found",
"(hermes command may not work outside the venv)"
)
if should_fix:
_cmd_link_dir.mkdir(parents=True, exist_ok=True)
_cmd_link.symlink_to(_venv_bin)
check_ok(f"Created symlink: {_cmd_link_display}/hermes → {_venv_bin}")
fixed_count += 1
# Check if the link dir is on PATH
_path_dirs = os.environ.get("PATH", "").split(os.pathsep)
if str(_cmd_link_dir) not in _path_dirs:
check_warn(
f"{_cmd_link_display} is not on your PATH",
"(add it to your shell config: export PATH=\"$HOME/.local/bin:$PATH\")"
)
manual_issues.append(f"Add {_cmd_link_display} to your PATH")
else:
issues.append(f"Missing {_cmd_link_display}/hermes symlink — run 'hermes doctor --fix'")
# =========================================================================
# Check: External tools
# =========================================================================
@@ -814,8 +733,7 @@ def run_doctor(args):
("Vercel AI Gateway", ("AI_GATEWAY_API_KEY",), "https://ai-gateway.vercel.sh/v1/models", "AI_GATEWAY_BASE_URL", True),
("Kilo Code", ("KILOCODE_API_KEY",), "https://api.kilo.ai/api/gateway/models", "KILOCODE_BASE_URL", True),
("OpenCode Zen", ("OPENCODE_ZEN_API_KEY",), "https://opencode.ai/zen/v1/models", "OPENCODE_ZEN_BASE_URL", True),
# OpenCode Go has no shared /models endpoint; skip the health check.
("OpenCode Go", ("OPENCODE_GO_API_KEY",), None, "OPENCODE_GO_BASE_URL", False),
("OpenCode Go", ("OPENCODE_GO_API_KEY",), "https://opencode.ai/zen/go/v1/models", "OPENCODE_GO_BASE_URL", True),
]
for _pname, _env_vars, _default_url, _base_env, _supports_health_check in _apikey_providers:
_key = ""
@@ -860,31 +778,6 @@ def run_doctor(args):
except Exception as _e:
print(f"\r {color('', Colors.YELLOW)} {_label} {color(f'({_e})', Colors.DIM)} ")
# -- AWS Bedrock --
# Bedrock uses the AWS SDK credential chain, not API keys.
try:
from agent.bedrock_adapter import has_aws_credentials, resolve_aws_auth_env_var, resolve_bedrock_region
if has_aws_credentials():
_auth_var = resolve_aws_auth_env_var()
_region = resolve_bedrock_region()
_label = "AWS Bedrock".ljust(20)
print(f" Checking AWS Bedrock...", end="", flush=True)
try:
import boto3
_br_client = boto3.client("bedrock", region_name=_region)
_br_resp = _br_client.list_foundation_models()
_model_count = len(_br_resp.get("modelSummaries", []))
print(f"\r {color('', Colors.GREEN)} {_label} {color(f'({_auth_var}, {_region}, {_model_count} models)', Colors.DIM)} ")
except ImportError:
print(f"\r {color('', Colors.YELLOW)} {_label} {color('(boto3 not installed — pip install hermes-agent[bedrock])', Colors.DIM)} ")
issues.append("Install boto3 for Bedrock: pip install hermes-agent[bedrock]")
except Exception as _e:
_err_name = type(_e).__name__
print(f"\r {color('', Colors.YELLOW)} {_label} {color(f'({_err_name}: {_e})', Colors.DIM)} ")
issues.append(f"AWS Bedrock: {_err_name} — check IAM permissions for bedrock:ListFoundationModels")
except ImportError:
pass # bedrock_adapter not available — skip silently
# =========================================================================
# Check: Submodules
# =========================================================================
-29
@@ -8,40 +8,11 @@ from pathlib import Path
from dotenv import load_dotenv
# Env var name suffixes that indicate credential values. These are the
# only env vars whose values we sanitize on load — we must not silently
# alter arbitrary user env vars, but credentials are known to require
# pure ASCII (they become HTTP header values).
_CREDENTIAL_SUFFIXES = ("_API_KEY", "_TOKEN", "_SECRET", "_KEY")
def _sanitize_loaded_credentials() -> None:
"""Strip non-ASCII characters from credential env vars in os.environ.
Called after dotenv loads so the rest of the codebase never sees
non-ASCII API keys. Only touches env vars whose names end with
known credential suffixes (``_API_KEY``, ``_TOKEN``, etc.).
"""
for key, value in list(os.environ.items()):
if not any(key.endswith(suffix) for suffix in _CREDENTIAL_SUFFIXES):
continue
try:
value.encode("ascii")
except UnicodeEncodeError:
os.environ[key] = value.encode("ascii", errors="ignore").decode("ascii")
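For reference, the credential sanitizing that this hunk removes can be sketched as a pure function. This is an illustration only: it works on a plain dict instead of `os.environ`, and the function name `sanitize_credentials` and the sample variable names are assumptions made for the example.

```python
# Suffixes copied from the removed module-level constant.
_CREDENTIAL_SUFFIXES = ("_API_KEY", "_TOKEN", "_SECRET", "_KEY")


def sanitize_credentials(env: dict) -> dict:
    """Return a copy of env with non-ASCII characters dropped from credentials.

    Only keys ending in a known credential suffix are touched; every other
    variable is passed through unchanged.
    """
    cleaned = dict(env)
    for key, value in env.items():
        if not key.endswith(_CREDENTIAL_SUFFIXES):
            continue
        if not value.isascii():
            cleaned[key] = value.encode("ascii", errors="ignore").decode("ascii")
    return cleaned


# Hypothetical input: a key with a Unicode lookalike (U+028B) pasted into it.
sample = {"DEMO_API_KEY": "sk-\u028babc", "GREETING": "h\u00e9llo"}
result = sanitize_credentials(sample)
```

Operating on a copy keeps the sketch easy to test; the removed code mutated `os.environ` in place after dotenv loading.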
def _load_dotenv_with_fallback(path: Path, *, override: bool) -> None:
try:
load_dotenv(dotenv_path=path, override=override, encoding="utf-8")
except UnicodeDecodeError:
load_dotenv(dotenv_path=path, override=override, encoding="latin-1")
# Strip non-ASCII characters from credential env vars that were just
# loaded. API keys must be pure ASCII since they're sent as HTTP
# header values (httpx encodes headers as ASCII). Non-ASCII chars
# typically come from copy-pasting keys from PDFs or rich-text editors
# that substitute Unicode lookalike glyphs (e.g. ʋ U+028B for v).
_sanitize_loaded_credentials()
def _sanitize_env_file_if_needed(path: Path) -> None:
+127 -111
@@ -10,6 +10,7 @@ import shutil
import signal
import subprocess
import sys
import time
from pathlib import Path
PROJECT_ROOT = Path(__file__).parent.parent.resolve()
@@ -37,6 +38,10 @@ from hermes_cli.setup import (
from hermes_cli.colors import Colors, color
_SERVICE_READINESS_TIMEOUT = 30.0
_SERVICE_READINESS_POLL_INTERVAL = 0.2
# =============================================================================
# Process Management (for manual gateway runs)
# =============================================================================
@@ -222,7 +227,7 @@ def find_gateway_pids(exclude_pids: set | None = None, all_profiles: bool = Fals
current_cmd = ""
else:
result = subprocess.run(
["ps", "-A", "eww", "-o", "pid=,command="],
["ps", "eww", "-ax", "-o", "pid=,command="],
capture_output=True,
text=True,
timeout=10,
@@ -715,9 +720,7 @@ def _detect_venv_dir() -> Path | None:
"""Detect the active virtualenv directory.
Checks ``sys.prefix`` first (works regardless of the directory name),
then ``VIRTUAL_ENV`` env var (covers uv-managed environments where
sys.prefix == sys.base_prefix), then falls back to probing common
directory names under PROJECT_ROOT.
then falls back to probing common directory names under PROJECT_ROOT.
Returns ``None`` when no virtualenv can be found.
"""
# If we're running inside a virtualenv, sys.prefix points to it.
@@ -726,15 +729,6 @@ def _detect_venv_dir() -> Path | None:
if venv.is_dir():
return venv
# uv and some other tools set VIRTUAL_ENV without changing sys.prefix.
# This catches `uv run` where sys.prefix == sys.base_prefix but the
# environment IS a venv. (#8620)
_virtual_env = os.environ.get("VIRTUAL_ENV")
if _virtual_env:
venv = Path(_virtual_env)
if venv.is_dir():
return venv
# Fallback: check common virtualenv directory names under the project root.
for candidate in (".venv", "venv"):
venv = PROJECT_ROOT / candidate
@@ -1111,12 +1105,123 @@ def systemd_uninstall(system: bool = False):
print(f"{_service_scope_label(system).capitalize()} service uninstalled")
def _describe_startup_check(check_id: str, check: dict) -> str:
source = check.get("source")
detail = check.get("detail")
label = f"{check_id} ({source})" if source and source != check_id else check_id
return f"{label}: {detail}" if detail else label
def _classify_startup_checks(state: dict | None) -> tuple[list[str], list[str], list[str]]:
checks = (state or {}).get("startup_checks") or {}
pending_required: list[str] = []
failed_required: list[str] = []
optional_warnings: list[str] = []
if not isinstance(checks, dict):
return pending_required, failed_required, optional_warnings
for check_id, raw_check in checks.items():
check = raw_check if isinstance(raw_check, dict) else {}
label = _describe_startup_check(str(check_id), check)
check_state = str(check.get("state", "pending")).strip().lower()
required = bool(check.get("required", True))
if check_state == "ready":
continue
if required:
if check_state == "failed":
failed_required.append(label)
else:
pending_required.append(label)
else:
prefix = "failed" if check_state == "failed" else "pending"
optional_warnings.append(f"{prefix}: {label}")
return pending_required, failed_required, optional_warnings
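Since the diff view drops indentation, the two helpers above are easier to follow when re-indented and run on a sample. The sketch below mirrors their logic under public names; the check IDs (`telegram`, `discord`, `webhook`, `metrics`) and their fields are hypothetical sample data, not values taken from the gateway.

```python
def describe_startup_check(check_id: str, check: dict) -> str:
    source = check.get("source")
    detail = check.get("detail")
    label = f"{check_id} ({source})" if source and source != check_id else check_id
    return f"{label}: {detail}" if detail else label


def classify_startup_checks(state):
    """Split non-ready checks into pending/failed required and optional warnings."""
    checks = (state or {}).get("startup_checks") or {}
    pending, failed, optional = [], [], []
    if not isinstance(checks, dict):
        return pending, failed, optional
    for check_id, raw in checks.items():
        check = raw if isinstance(raw, dict) else {}
        label = describe_startup_check(str(check_id), check)
        check_state = str(check.get("state", "pending")).strip().lower()
        if check_state == "ready":
            continue
        if bool(check.get("required", True)):
            # Anything that is neither ready nor failed counts as pending.
            (failed if check_state == "failed" else pending).append(label)
        else:
            prefix = "failed" if check_state == "failed" else "pending"
            optional.append(f"{prefix}: {label}")
    return pending, failed, optional


# Hypothetical runtime status payload.
sample_state = {
    "startup_checks": {
        "telegram": {"state": "ready"},
        "discord": {"state": "failed", "detail": "bad token"},
        "webhook": {"state": "pending", "source": "plugin"},
        "metrics": {"state": "failed", "required": False},
    }
}
pending, failed, optional = classify_startup_checks(sample_state)
```

Note that checks default to required and to the pending state, so a malformed or empty entry blocks readiness instead of being silently ignored.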
def _wait_for_service_readiness(
*,
action: str,
previous_pid: int | None = None,
timeout: float = _SERVICE_READINESS_TIMEOUT,
poll_interval: float = _SERVICE_READINESS_POLL_INTERVAL,
) -> list[str]:
from gateway.status import get_running_pid, read_runtime_status
deadline = time.monotonic() + timeout
last_pending: list[str] = []
while time.monotonic() < deadline:
live_pid = get_running_pid()
if live_pid is None or (previous_pid is not None and live_pid == previous_pid):
time.sleep(poll_interval)
continue
runtime = read_runtime_status() or {}
try:
runtime_pid = int(runtime.get("pid"))
except (TypeError, ValueError):
runtime_pid = None
if runtime_pid != live_pid:
time.sleep(poll_interval)
continue
gateway_state = runtime.get("gateway_state")
pending_required, failed_required, optional_warnings = _classify_startup_checks(runtime)
last_pending = pending_required
if gateway_state == "startup_failed":
reason = runtime.get("exit_reason") or f"gateway {action} failed during startup"
raise RuntimeError(reason)
if failed_required:
raise RuntimeError(
"required startup checks failed: " + "; ".join(failed_required)
)
if gateway_state == "running" and not pending_required:
return optional_warnings
time.sleep(poll_interval)
if last_pending:
raise RuntimeError(
"timed out waiting for required startup checks: " + "; ".join(last_pending)
)
if previous_pid is not None:
raise RuntimeError(
f"timed out waiting for gateway {action}; previous process is still active or no new runtime became ready"
)
raise RuntimeError(f"timed out waiting for gateway {action} readiness")
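The core of the wait loop above is a deadline-bounded poll on a monotonic clock. The sketch below reduces it to that pattern. It is a simplification, not the gateway code: the `probe` callable and its three return values are assumptions standing in for `get_running_pid()` and `read_runtime_status()`.

```python
import time
from typing import Callable, Optional


def wait_until_ready(
    probe: Callable[[], Optional[str]],
    *,
    timeout: float = 1.0,
    poll_interval: float = 0.01,
) -> None:
    """Poll probe() until it reports "ready", fails, or the deadline passes.

    probe is a hypothetical callable returning "ready", "failed", or None
    (still starting).
    """
    # time.monotonic() is unaffected by wall-clock adjustments.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = probe()
        if status == "ready":
            return
        if status == "failed":
            raise RuntimeError("startup failed")
        time.sleep(poll_interval)
    raise RuntimeError("timed out waiting for readiness")


# Simulated service that becomes ready on the third poll.
states = iter([None, None, "ready"])
wait_until_ready(lambda: next(states))
```

Failing fast on an explicit failure state, rather than waiting out the full timeout, is the main behavioural difference from the `is-active` polling that this commit replaces in `systemd_restart`.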
def _await_service_ready_or_exit(
*,
action: str,
previous_pid: int | None = None,
timeout: float = _SERVICE_READINESS_TIMEOUT,
) -> None:
try:
optional_warnings = _wait_for_service_readiness(
action=action,
previous_pid=previous_pid,
timeout=timeout,
)
except RuntimeError as exc:
print_error(f" Gateway {action} did not become ready: {exc}")
raise SystemExit(1) from exc
for warning in optional_warnings:
print_warning(f" Optional startup check {warning}")
def systemd_start(system: bool = False):
system = _select_systemd_scope(system)
if system:
_require_root_for_system_service("start")
refresh_systemd_unit_if_needed(system=system)
_run_systemctl(["start", get_service_name()], system=system, check=True, timeout=30)
_await_service_ready_or_exit(action="start")
print(f"{_service_scope_label(system).capitalize()} service started")
@@ -1139,64 +1244,11 @@ def systemd_restart(system: bool = False):
pid = get_running_pid()
if pid is not None and _request_gateway_self_restart(pid):
# SIGUSR1 sent — the gateway will drain active agents, exit with
# code 75, and systemd will restart it after RestartSec (30s).
# Wait for the old process to die and the new one to become active
# so the CLI doesn't return while the service is still restarting.
import time
scope_label = _service_scope_label(system).capitalize()
svc = get_service_name()
scope_cmd = _systemctl_cmd(system)
# Phase 1: wait for old process to exit (drain + shutdown)
print(f"{scope_label} service draining active work...")
deadline = time.time() + 90
while time.time() < deadline:
try:
os.kill(pid, 0)
time.sleep(1)
except (ProcessLookupError, PermissionError):
break # old process is gone
else:
print(f"⚠ Old process (PID {pid}) still alive after 90s")
# Phase 2: wait for systemd to start the new process
print(f"⏳ Waiting for {svc} to restart...")
deadline = time.time() + 60
while time.time() < deadline:
try:
result = subprocess.run(
scope_cmd + ["is-active", svc],
capture_output=True, text=True, timeout=5,
)
if result.stdout.strip() == "active":
# Verify it's a NEW process, not the old one somehow
new_pid = get_running_pid()
if new_pid and new_pid != pid:
print(f"{scope_label} service restarted (PID {new_pid})")
return
except (subprocess.TimeoutExpired, FileNotFoundError):
pass
time.sleep(2)
# Timed out — check final state
try:
result = subprocess.run(
scope_cmd + ["is-active", svc],
capture_output=True, text=True, timeout=5,
)
if result.stdout.strip() == "active":
print(f"{scope_label} service restarted")
return
except Exception:
pass
print(
f"{scope_label} service did not become active within 60s.\n"
f" Check status: {'sudo ' if system else ''}hermes gateway status\n"
f" Check logs: journalctl {'--user ' if not system else ''}-u {svc} --since '2 min ago'"
)
_await_service_ready_or_exit(action="restart", previous_pid=pid)
print(f"{_service_scope_label(system).capitalize()} service restarted")
return
_run_systemctl(["reload-or-restart", get_service_name()], system=system, check=True, timeout=90)
_await_service_ready_or_exit(action="restart", previous_pid=pid)
print(f"{_service_scope_label(system).capitalize()} service restarted")
@@ -1455,6 +1507,7 @@ def launchd_start():
plist_path.write_text(generate_launchd_plist(), encoding="utf-8")
subprocess.run(["launchctl", "bootstrap", _launchd_domain(), str(plist_path)], check=True, timeout=30)
subprocess.run(["launchctl", "kickstart", f"{_launchd_domain()}/{label}"], check=True, timeout=30)
_await_service_ready_or_exit(action="start")
print("✓ Service started")
return
@@ -1467,6 +1520,7 @@ def launchd_start():
print("↻ launchd job was unloaded; reloading service definition")
subprocess.run(["launchctl", "bootstrap", _launchd_domain(), str(plist_path)], check=True, timeout=30)
subprocess.run(["launchctl", "kickstart", f"{_launchd_domain()}/{label}"], check=True, timeout=30)
_await_service_ready_or_exit(action="start")
print("✓ Service started")
def launchd_stop():
@@ -1537,7 +1591,8 @@ def launchd_restart():
try:
pid = get_running_pid()
if pid is not None and _request_gateway_self_restart(pid):
print("✓ Service restart requested")
_await_service_ready_or_exit(action="restart", previous_pid=pid)
print("✓ Service restarted")
return
if pid is not None:
try:
@@ -1549,6 +1604,7 @@ def launchd_restart():
if not exited:
print(f"⚠ Gateway drain timed out after {drain_timeout:.0f}s — forcing launchd restart")
subprocess.run(["launchctl", "kickstart", "-k", target], check=True, timeout=90)
_await_service_ready_or_exit(action="restart", previous_pid=pid)
print("✓ Service restarted")
except subprocess.CalledProcessError as e:
if e.returncode not in (3, 113):
@@ -1558,6 +1614,7 @@ def launchd_restart():
plist_path = get_launchd_plist_path()
subprocess.run(["launchctl", "bootstrap", _launchd_domain(), str(plist_path)], check=True, timeout=30)
subprocess.run(["launchctl", "kickstart", target], check=True, timeout=30)
_await_service_ready_or_exit(action="restart", previous_pid=pid)
print("✓ Service restarted")
def launchd_status(deep: bool = False):
@@ -2930,15 +2987,6 @@ def gateway_command(args):
elif subcmd == "start":
system = getattr(args, 'system', False)
start_all = getattr(args, 'all', False)
if start_all:
# Kill all stale gateway processes across all profiles before starting
killed = kill_gateway_processes(all_profiles=True)
if killed:
print(f"✓ Killed {killed} stale gateway process(es) across all profiles")
_wait_for_gateway_exit(timeout=10.0, force_after=5.0)
if is_termux():
print("Gateway service start is not supported on Termux because there is no system service manager.")
print("Run manually: hermes gateway")
@@ -3024,39 +3072,7 @@ def gateway_command(args):
# Try service first, fall back to killing and restarting
service_available = False
system = getattr(args, 'system', False)
restart_all = getattr(args, 'all', False)
service_configured = False
if restart_all:
# --all: stop every gateway process across all profiles, then start fresh
service_stopped = False
if supports_systemd_services() and (get_systemd_unit_path(system=False).exists() or get_systemd_unit_path(system=True).exists()):
try:
systemd_stop(system=system)
service_stopped = True
except subprocess.CalledProcessError:
pass
elif is_macos() and get_launchd_plist_path().exists():
try:
launchd_stop()
service_stopped = True
except subprocess.CalledProcessError:
pass
killed = kill_gateway_processes(all_profiles=True)
total = killed + (1 if service_stopped else 0)
if total:
print(f"✓ Stopped {total} gateway process(es) across all profiles")
_wait_for_gateway_exit(timeout=10.0, force_after=5.0)
# Start the current profile's service fresh
print("Starting gateway...")
if supports_systemd_services() and (get_systemd_unit_path(system=False).exists() or get_systemd_unit_path(system=True).exists()):
systemd_start(system=system)
elif is_macos() and get_launchd_plist_path().exists():
launchd_start()
else:
run_gateway(verbose=0)
return
if supports_systemd_services() and (get_systemd_unit_path(system=False).exists() or get_systemd_unit_path(system=True).exists()):
service_configured = True
+47 -413
@@ -1139,9 +1139,7 @@ def select_provider_and_model(args=None):
_model_flow_anthropic(config, current_model)
elif selected_provider == "kimi-coding":
_model_flow_kimi(config, current_model)
elif selected_provider == "bedrock":
_model_flow_bedrock(config, current_model)
elif selected_provider in ("gemini", "deepseek", "xai", "zai", "kimi-coding-cn", "minimax", "minimax-cn", "kilocode", "opencode-zen", "opencode-go", "ai-gateway", "alibaba", "huggingface", "xiaomi", "arcee", "ollama-cloud"):
elif selected_provider in ("gemini", "deepseek", "xai", "zai", "kimi-coding-cn", "minimax", "minimax-cn", "kilocode", "opencode-zen", "opencode-go", "ai-gateway", "alibaba", "huggingface", "xiaomi", "arcee"):
_model_flow_api_key_provider(config, selected_provider, current_model)
# ── Post-switch cleanup: clear stale OPENAI_BASE_URL ──────────────
@@ -1277,8 +1275,11 @@ def _model_flow_nous(config, current_model="", args=None):
AuthError, format_auth_error,
_login_nous, PROVIDER_REGISTRY,
)
from hermes_cli.config import get_env_value, load_config, save_config, save_env_value
from hermes_cli.nous_subscription import prompt_enable_tool_gateway
from hermes_cli.config import get_env_value, save_config, save_env_value
from hermes_cli.nous_subscription import (
apply_nous_provider_defaults,
get_nous_subscription_explainer_lines,
)
import argparse
state = get_provider_auth_state("nous")
@@ -1297,12 +1298,9 @@ def _model_flow_nous(config, current_model="", args=None):
insecure=bool(getattr(args, "insecure", False)),
)
_login_nous(mock_args, PROVIDER_REGISTRY["nous"])
# Offer Tool Gateway enablement for paid subscribers
try:
_refreshed = load_config() or {}
prompt_enable_tool_gateway(_refreshed)
except Exception:
pass
print()
for line in get_nous_subscription_explainer_lines():
print(line)
except SystemExit:
print("Login cancelled or failed.")
return
@@ -1410,10 +1408,18 @@ def _model_flow_nous(config, current_model="", args=None):
if get_env_value("OPENAI_BASE_URL"):
save_env_value("OPENAI_BASE_URL", "")
save_env_value("OPENAI_API_KEY", "")
changed_defaults = apply_nous_provider_defaults(config)
save_config(config)
print(f"Default model set to: {selected} (via Nous Portal)")
# Offer Tool Gateway enablement for paid subscribers
prompt_enable_tool_gateway(config)
if "tts" in changed_defaults:
print("TTS provider set to: OpenAI TTS via your Nous subscription")
else:
current_tts = str(config.get("tts", {}).get("provider") or "edge")
if current_tts.lower() not in {"", "edge"}:
print(f"Keeping your existing TTS provider: {current_tts}")
print()
for line in get_nous_subscription_explainer_lines():
print(line)
else:
print("No change.")
@@ -1560,27 +1566,6 @@ def _model_flow_custom(config):
effective_key = api_key or current_key
# Hint: most local model servers (Ollama, vLLM, llama.cpp) require /v1
# in the base URL for OpenAI-compatible chat completions. Prompt the
# user if the URL looks like a local server without /v1.
_url_lower = effective_url.rstrip("/").lower()
_looks_local = any(h in _url_lower for h in ("localhost", "127.0.0.1", "0.0.0.0", ":11434", ":8080", ":5000"))
if _looks_local and not _url_lower.endswith("/v1"):
print()
print(f" Hint: Did you mean to add /v1 at the end?")
print(f" Most local model servers (Ollama, vLLM, llama.cpp) require it.")
print(f" e.g. {effective_url.rstrip('/')}/v1")
try:
_add_v1 = input(" Add /v1? [Y/n]: ").strip().lower()
except (KeyboardInterrupt, EOFError):
_add_v1 = "n"
if _add_v1 in ("", "y", "yes"):
effective_url = effective_url.rstrip("/") + "/v1"
if base_url:
base_url = effective_url
print(f" Updated URL: {effective_url}")
print()
from hermes_cli.models import probe_api_models
probe = probe_api_models(effective_key, effective_url)
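The removed /v1 hint above boils down to a small URL heuristic. A minimal sketch, assuming the same marker list as the diff; the function name `suggest_v1` and the example URLs are hypothetical, and the interactive prompt is left out.

```python
# Substrings copied from the removed heuristic for "looks like a local server".
_LOCAL_MARKERS = ("localhost", "127.0.0.1", "0.0.0.0", ":11434", ":8080", ":5000")


def suggest_v1(url: str):
    """Return the URL with /v1 appended if it looks local and lacks it, else None."""
    trimmed = url.rstrip("/")
    lowered = trimmed.lower()
    looks_local = any(marker in lowered for marker in _LOCAL_MARKERS)
    if looks_local and not lowered.endswith("/v1"):
        return trimmed + "/v1"
    return None


suggestion = suggest_v1("http://localhost:11434/")
```

Returning a suggestion instead of rewriting the URL leaves the decision with the caller, which matches the confirm-before-change behaviour of the removed prompt.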
@@ -2440,252 +2425,6 @@ def _model_flow_kimi(config, current_model=""):
print("No change.")
def _model_flow_bedrock_api_key(config, region, current_model=""):
"""Bedrock API Key mode — uses the OpenAI-compatible bedrock-mantle endpoint.
For developers who don't have an AWS account but received a Bedrock API Key
from their AWS admin. Works like any OpenAI-compatible endpoint.
"""
from hermes_cli.auth import _prompt_model_selection, _save_model_choice, deactivate_provider
from hermes_cli.config import load_config, save_config, get_env_value, save_env_value
from hermes_cli.models import _PROVIDER_MODELS
mantle_base_url = f"https://bedrock-mantle.{region}.api.aws/v1"
# Prompt for API key
existing_key = get_env_value("AWS_BEARER_TOKEN_BEDROCK") or ""
if existing_key:
print(f" Bedrock API Key: {existing_key[:12]}... ✓")
else:
print(f" Endpoint: {mantle_base_url}")
print()
try:
import getpass
api_key = getpass.getpass(" Bedrock API Key: ").strip()
except (KeyboardInterrupt, EOFError):
print()
return
if not api_key:
print(" Cancelled.")
return
save_env_value("AWS_BEARER_TOKEN_BEDROCK", api_key)
existing_key = api_key
print(" ✓ API key saved.")
print()
# Model selection — use static list (mantle doesn't need boto3 for discovery)
model_list = _PROVIDER_MODELS.get("bedrock", [])
print(f" Showing {len(model_list)} curated models")
if model_list:
selected = _prompt_model_selection(model_list, current_model=current_model)
else:
try:
selected = input(" Model ID: ").strip()
except (KeyboardInterrupt, EOFError):
selected = None
if selected:
_save_model_choice(selected)
# Save as custom provider pointing to bedrock-mantle
cfg = load_config()
model = cfg.get("model")
if not isinstance(model, dict):
model = {"default": model} if model else {}
cfg["model"] = model
model["provider"] = "custom"
model["base_url"] = mantle_base_url
model.pop("api_mode", None) # chat_completions is the default
# Also save region in bedrock config for reference
bedrock_cfg = cfg.get("bedrock", {})
if not isinstance(bedrock_cfg, dict):
bedrock_cfg = {}
bedrock_cfg["region"] = region
cfg["bedrock"] = bedrock_cfg
# Save the API key env var name so hermes knows where to find it
save_env_value("OPENAI_API_KEY", existing_key)
save_env_value("OPENAI_BASE_URL", mantle_base_url)
save_config(cfg)
deactivate_provider()
print(f" Default model set to: {selected} (via Bedrock API Key, {region})")
print(f" Endpoint: {mantle_base_url}")
else:
print(" No change.")
def _model_flow_bedrock(config, current_model=""):
"""AWS Bedrock provider: verify credentials, pick region, discover models.
Uses the native Converse API via boto3, not the OpenAI-compatible endpoint.
Auth is handled by the AWS SDK default credential chain (env vars, profile,
instance role), so no API key prompt is needed.
"""
from hermes_cli.auth import _prompt_model_selection, _save_model_choice, deactivate_provider
from hermes_cli.config import load_config, save_config
from hermes_cli.models import _PROVIDER_MODELS
# 1. Check for AWS credentials
try:
from agent.bedrock_adapter import (
has_aws_credentials,
resolve_aws_auth_env_var,
resolve_bedrock_region,
discover_bedrock_models,
)
except ImportError:
print(" ✗ boto3 is not installed. Install it with:")
print(" pip install boto3")
print()
return
if not has_aws_credentials():
print(" ⚠ No AWS credentials detected via environment variables.")
print(" Bedrock will use boto3's default credential chain (IMDS, SSO, etc.)")
print()
auth_var = resolve_aws_auth_env_var()
if auth_var:
print(f" AWS credentials: {auth_var}")
else:
print(" AWS credentials: boto3 default chain (instance role / SSO)")
print()
# 2. Region selection
current_region = resolve_bedrock_region()
try:
region_input = input(f" AWS Region [{current_region}]: ").strip()
except (KeyboardInterrupt, EOFError):
print()
return
region = region_input or current_region
# 2b. Authentication mode
print(" Choose authentication method:")
print()
print(" 1. IAM credential chain (recommended)")
print(" Works with EC2 instance roles, SSO, env vars, aws configure")
print(" 2. Bedrock API Key")
print(" Enter your Bedrock API Key directly — also supports")
print(" team scenarios where an admin distributes keys")
print()
try:
auth_choice = input(" Choice [1]: ").strip()
except (KeyboardInterrupt, EOFError):
print()
return
if auth_choice == "2":
_model_flow_bedrock_api_key(config, region, current_model)
return
# 3. Model discovery — try live API first, fall back to static list
print(f" Discovering models in {region}...")
live_models = discover_bedrock_models(region)
if live_models:
_EXCLUDE_PREFIXES = (
"stability.", "cohere.embed", "twelvelabs.", "us.stability.",
"us.cohere.embed", "us.twelvelabs.", "global.cohere.embed",
"global.twelvelabs.",
)
_EXCLUDE_SUBSTRINGS = ("safeguard", "voxtral", "palmyra-vision")
filtered = []
for m in live_models:
mid = m["id"]
if any(mid.startswith(p) for p in _EXCLUDE_PREFIXES):
continue
if any(s in mid.lower() for s in _EXCLUDE_SUBSTRINGS):
continue
filtered.append(m)
# Deduplicate: prefer inference profiles (us.*, global.*) over bare
# foundation model IDs.
profile_base_ids = set()
for m in filtered:
mid = m["id"]
if mid.startswith(("us.", "global.")):
base = mid.split(".", 1)[1] if "." in mid[3:] else mid
profile_base_ids.add(base)
deduped = []
for m in filtered:
mid = m["id"]
if not mid.startswith(("us.", "global.")) and mid in profile_base_ids:
continue
deduped.append(m)
_RECOMMENDED = [
"us.anthropic.claude-sonnet-4-6",
"us.anthropic.claude-opus-4-6",
"us.anthropic.claude-haiku-4-5",
"us.amazon.nova-pro",
"us.amazon.nova-lite",
"us.amazon.nova-micro",
"deepseek.v3",
"us.meta.llama4-maverick",
"us.meta.llama4-scout",
]
def _sort_key(m):
mid = m["id"]
for i, rec in enumerate(_RECOMMENDED):
if mid.startswith(rec):
return (0, i, mid)
if mid.startswith("global."):
return (1, 0, mid)
return (2, 0, mid)
deduped.sort(key=_sort_key)
model_list = [m["id"] for m in deduped]
print(f" Found {len(model_list)} text model(s) (filtered from {len(live_models)} total)")
else:
model_list = _PROVIDER_MODELS.get("bedrock", [])
if model_list:
print(f" Using {len(model_list)} curated models (live discovery unavailable)")
else:
print(" No models found. Check IAM permissions for bedrock:ListFoundationModels.")
return
# 4. Model selection
if model_list:
selected = _prompt_model_selection(model_list, current_model=current_model)
else:
try:
selected = input(" Model ID: ").strip()
except (KeyboardInterrupt, EOFError):
selected = None
if selected:
_save_model_choice(selected)
cfg = load_config()
model = cfg.get("model")
if not isinstance(model, dict):
model = {"default": model} if model else {}
cfg["model"] = model
model["provider"] = "bedrock"
model["base_url"] = f"https://bedrock-runtime.{region}.amazonaws.com"
model.pop("api_mode", None) # bedrock_converse is auto-detected
bedrock_cfg = cfg.get("bedrock", {})
if not isinstance(bedrock_cfg, dict):
bedrock_cfg = {}
bedrock_cfg["region"] = region
cfg["bedrock"] = bedrock_cfg
save_config(cfg)
deactivate_provider()
print(f" Default model set to: {selected} (via AWS Bedrock, {region})")
else:
print(" No change.")
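The deduplication and ordering steps in the Bedrock flow above can be sketched in isolation. This is a minimal illustration, not the project's implementation: the helper names `dedupe_prefer_profiles` and `recommended_sort_key` and the model IDs in the usage note are hypothetical, and only the `us.` / `global.` prefixes and the three-tier sort key are taken from the hunk.

```python
from typing import Dict, List, Sequence, Tuple

# Prefixes that mark an inference profile rather than a bare foundation model ID.
PROFILE_PREFIXES = ("us.", "global.")


def dedupe_prefer_profiles(models: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop bare model IDs when a prefixed inference profile covers the same model."""
    profile_base_ids = set()
    for m in models:
        mid = m["id"]
        if mid.startswith(PROFILE_PREFIXES):
            # "us.vendor.model" -> "vendor.model"
            profile_base_ids.add(mid.split(".", 1)[1])
    return [
        m for m in models
        if m["id"].startswith(PROFILE_PREFIXES) or m["id"] not in profile_base_ids
    ]


def recommended_sort_key(model_id: str, recommended: Sequence[str]) -> Tuple[int, int, str]:
    """Recommended prefixes first (in list order), then global.* profiles, then the rest."""
    for rank, prefix in enumerate(recommended):
        if model_id.startswith(prefix):
            return (0, rank, model_id)
    if model_id.startswith("global."):
        return (1, 0, model_id)
    return (2, 0, model_id)
```

For example, given the hypothetical IDs `vendor.model-a`, `us.vendor.model-a` and `other.model-b`, the bare `vendor.model-a` entry is dropped because the `us.` profile already covers it.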
def _model_flow_api_key_provider(config, provider_id, current_model=""):
"""Generic flow for API-key providers (z.ai, MiniMax, OpenCode, etc.)."""
from hermes_cli.auth import (
@@ -2747,43 +2486,34 @@ def _model_flow_api_key_provider(config, provider_id, current_model=""):
# 1. models.dev registry (cached, filtered for agentic/tool-capable models)
# 2. Curated static fallback list (offline insurance)
# 3. Live /models endpoint probe (small providers without models.dev data)
#
# Ollama Cloud: dedicated merged discovery (live API + models.dev + disk cache)
if provider_id == "ollama-cloud":
from hermes_cli.models import fetch_ollama_cloud_models
api_key_for_probe = existing_key or (get_env_value(key_env) if key_env else "")
model_list = fetch_ollama_cloud_models(api_key=api_key_for_probe, base_url=effective_base)
if model_list:
print(f" Found {len(model_list)} model(s) from Ollama Cloud")
curated = _PROVIDER_MODELS.get(provider_id, [])
# Try models.dev first — returns tool-capable models, filtered for noise
mdev_models: list = []
try:
from agent.models_dev import list_agentic_models
mdev_models = list_agentic_models(provider_id)
except Exception:
pass
if mdev_models:
model_list = mdev_models
print(f" Found {len(model_list)} model(s) from models.dev registry")
elif curated and len(curated) >= 8:
# Curated list is substantial — use it directly, skip live probe
model_list = curated
print(f" Showing {len(model_list)} curated models — use \"Enter custom model name\" for others.")
else:
curated = _PROVIDER_MODELS.get(provider_id, [])
# Try models.dev first — returns tool-capable models, filtered for noise
mdev_models: list = []
try:
from agent.models_dev import list_agentic_models
mdev_models = list_agentic_models(provider_id)
except Exception:
pass
if mdev_models:
model_list = mdev_models
print(f" Found {len(model_list)} model(s) from models.dev registry")
elif curated and len(curated) >= 8:
# Curated list is substantial — use it directly, skip live probe
model_list = curated
print(f" Showing {len(model_list)} curated models — use \"Enter custom model name\" for others.")
api_key_for_probe = existing_key or (get_env_value(key_env) if key_env else "")
live_models = fetch_api_models(api_key_for_probe, effective_base)
if live_models and len(live_models) >= len(curated):
model_list = live_models
print(f" Found {len(model_list)} model(s) from {pconfig.name} API")
else:
api_key_for_probe = existing_key or (get_env_value(key_env) if key_env else "")
live_models = fetch_api_models(api_key_for_probe, effective_base)
if live_models and len(live_models) >= len(curated):
model_list = live_models
print(f" Found {len(model_list)} model(s) from {pconfig.name} API")
else:
model_list = curated
if model_list:
print(f" Showing {len(model_list)} curated models — use \"Enter custom model name\" for others.")
# else: no defaults either, will fall through to raw input
model_list = curated
if model_list:
print(f" Showing {len(model_list)} curated models — use \"Enter custom model name\" for others.")
# else: no defaults either, will fall through to raw input
if provider_id in {"opencode-zen", "opencode-go"}:
model_list = [normalize_opencode_model_id(provider_id, mid) for mid in model_list]
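The fallback order used for generic API-key providers (registry, then a substantial curated list, then a live probe) can be written as one small function. This is a sketch under assumptions: `choose_model_list` and its callable parameters are hypothetical stand-ins for `list_agentic_models` and `fetch_api_models`; the threshold of 8 and the `len(live) >= len(curated)` check come from the hunk.

```python
from typing import Callable, List, Optional, Tuple


def choose_model_list(
    registry_lookup: Callable[[], List[str]],
    curated: List[str],
    live_probe: Callable[[], Optional[List[str]]],
    curated_threshold: int = 8,
) -> Tuple[List[str], str]:
    """Return (models, source) following the registry -> curated -> live order."""
    try:
        registry = registry_lookup()
    except Exception:
        # A registry failure is non-fatal; fall through to the other sources.
        registry = []
    if registry:
        return registry, "registry"
    if curated and len(curated) >= curated_threshold:
        # A substantial curated list is used directly; the live probe is skipped.
        return curated, "curated"
    live = live_probe() or []
    if live and len(live) >= len(curated):
        return live, "live"
    # May be empty, in which case the caller falls back to raw input.
    return curated, "curated"
```

Passing the sources as callables keeps the network probe lazy: it only runs when the registry and the curated list are both insufficient.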
@@ -4882,7 +4612,7 @@ For more help on a command:
)
chat_parser.add_argument(
"--provider",
choices=["auto", "openrouter", "nous", "openai-codex", "copilot-acp", "copilot", "anthropic", "gemini", "xai", "ollama-cloud", "huggingface", "zai", "kimi-coding", "kimi-coding-cn", "minimax", "minimax-cn", "kilocode", "xiaomi", "arcee"],
choices=["auto", "openrouter", "nous", "openai-codex", "copilot-acp", "copilot", "anthropic", "gemini", "huggingface", "zai", "kimi-coding", "kimi-coding-cn", "minimax", "minimax-cn", "kilocode", "xiaomi", "arcee"],
default=None,
help="Inference provider (default: auto)"
)
@@ -5019,7 +4749,6 @@ For more help on a command:
# gateway start
gateway_start = gateway_subparsers.add_parser("start", help="Start the installed systemd/launchd background service")
gateway_start.add_argument("--system", action="store_true", help="Target the Linux system-level gateway service")
gateway_start.add_argument("--all", action="store_true", help="Kill ALL stale gateway processes across all profiles before starting")
# gateway stop
gateway_stop = gateway_subparsers.add_parser("stop", help="Stop gateway service")
@@ -5029,7 +4758,6 @@ For more help on a command:
# gateway restart
gateway_restart = gateway_subparsers.add_parser("restart", help="Restart gateway service")
gateway_restart.add_argument("--system", action="store_true", help="Target the Linux system-level gateway service")
gateway_restart.add_argument("--all", action="store_true", help="Kill ALL gateway processes across all profiles before restarting")
# gateway status
gateway_status = gateway_subparsers.add_parser("status", help="Show gateway status")
@@ -5343,7 +5071,6 @@ Examples:
hermes debug share --lines 500 Include more log lines
hermes debug share --expire 30 Keep paste for 30 days
hermes debug share --local Print report locally (no upload)
hermes debug delete <url> Delete a previously uploaded paste
""",
)
debug_sub = debug_parser.add_subparsers(dest="debug_command")
@@ -5363,14 +5090,6 @@ Examples:
"--local", action="store_true",
help="Print the report locally instead of uploading",
)
delete_parser = debug_sub.add_parser(
"delete",
help="Delete a paste uploaded by 'hermes debug share'",
)
delete_parser.add_argument(
"urls", nargs="*", default=[],
help="One or more paste URLs to delete (e.g. https://paste.rs/abc123)",
)
debug_parser.set_defaults(func=cmd_debug)
# =========================================================================
@@ -5651,18 +5370,6 @@ Examples:
memory_sub.add_parser("setup", help="Interactive provider selection and configuration")
memory_sub.add_parser("status", help="Show current memory provider config")
memory_sub.add_parser("off", help="Disable external provider (built-in only)")
_reset_parser = memory_sub.add_parser(
"reset",
help="Erase all built-in memory (MEMORY.md and USER.md)",
)
_reset_parser.add_argument(
"--yes", "-y", action="store_true",
help="Skip confirmation prompt",
)
_reset_parser.add_argument(
"--target", choices=["all", "memory", "user"], default="all",
help="Which store to reset: 'all' (default), 'memory', or 'user'",
)
def cmd_memory(args):
sub = getattr(args, "memory_command", None)
@@ -5675,44 +5382,6 @@ Examples:
save_config(config)
print("\n ✓ Memory provider: built-in only")
print(" Saved to config.yaml\n")
elif sub == "reset":
from hermes_constants import get_hermes_home, display_hermes_home
mem_dir = get_hermes_home() / "memories"
target = getattr(args, "target", "all")
files_to_reset = []
if target in ("all", "memory"):
files_to_reset.append(("MEMORY.md", "agent notes"))
if target in ("all", "user"):
files_to_reset.append(("USER.md", "user profile"))
# Check what exists
existing = [(f, desc) for f, desc in files_to_reset if (mem_dir / f).exists()]
if not existing:
print(f"\n Nothing to reset — no memory files found in {display_hermes_home()}/memories/\n")
return
print(f"\n This will permanently erase the following memory files:")
for f, desc in existing:
path = mem_dir / f
size = path.stat().st_size
print(f"{f} ({desc}) — {size:,} bytes")
if not getattr(args, "yes", False):
try:
answer = input("\n Type 'yes' to confirm: ").strip().lower()
except (EOFError, KeyboardInterrupt):
print("\n Cancelled.\n")
return
if answer != "yes":
print(" Cancelled.\n")
return
for f, desc in existing:
(mem_dir / f).unlink()
print(f" ✓ Deleted {f} ({desc})")
print(f"\n Memory reset complete. New sessions will start with a blank slate.")
print(f" Files were in: {display_hermes_home()}/memories/\n")
else:
from hermes_cli.memory_setup import memory_command
memory_command(args)
@@ -6375,42 +6044,7 @@ Examples:
sys.exit(1)
_processed_argv = _coalesce_session_name_args(sys.argv[1:])
# ── Defensive subparser routing (bpo-9338 workaround) ───────────
# On some Python versions (notably <3.11), argparse fails to route
# subcommand tokens when the parent parser has nargs='?' optional
# arguments (--continue). The symptom: "unrecognized arguments: model"
# even though 'model' is a registered subcommand.
#
# Fix: when argv contains a token matching a known subcommand, set
# subparsers.required=True to force deterministic routing. If that
# fails (e.g. 'hermes -c model' where 'model' is consumed as the
# session name for --continue), fall back to the default behaviour.
import io as _io
_known_cmds = set(subparsers.choices.keys()) if hasattr(subparsers, "choices") else set()
_has_cmd_token = any(t in _known_cmds for t in _processed_argv if not t.startswith("-"))
if _has_cmd_token:
subparsers.required = True
_saved_stderr = sys.stderr
try:
sys.stderr = _io.StringIO()
args = parser.parse_args(_processed_argv)
sys.stderr = _saved_stderr
except SystemExit as exc:
sys.stderr = _saved_stderr
# Help/version flags (exit code 0) already printed output —
# re-raise immediately to avoid a second parse_args printing
# the same help text again (#10230).
if exc.code == 0:
raise
# Subcommand name was consumed as a flag value (e.g. -c model).
# Fall back to optional subparsers so argparse handles it normally.
subparsers.required = False
args = parser.parse_args(_processed_argv)
else:
subparsers.required = False
args = parser.parse_args(_processed_argv)
args = parser.parse_args(_processed_argv)
# Handle --version flag
if args.version:
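The defensive subparser routing shown in this hunk can be reproduced with a small standalone parser. The sketch below is an illustration only: the `demo` program, the `resume` dest and `parse_with_fallback` are hypothetical names, and `contextlib.redirect_stderr` replaces the manual `sys.stderr` swap used in the hunk.

```python
import argparse
import contextlib
import io


def build_parser():
    """Build a parser with a nargs='?' option and one subcommand (hypothetical CLI)."""
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("-c", "--continue", dest="resume", nargs="?", const=True, default=None)
    subparsers = parser.add_subparsers(dest="command")
    subparsers.add_parser("model")
    return parser, subparsers


def parse_with_fallback(parser, subparsers, argv):
    """Try strict subcommand routing first, then fall back to optional subparsers."""
    known = set(subparsers.choices)
    has_cmd_token = any(t in known for t in argv if not t.startswith("-"))
    if not has_cmd_token:
        subparsers.required = False
        return parser.parse_args(argv)
    subparsers.required = True
    try:
        # Suppress the error text of the strict attempt; the fallback reports real errors.
        with contextlib.redirect_stderr(io.StringIO()):
            return parser.parse_args(argv)
    except SystemExit as exc:
        if exc.code == 0:
            # Help/version already printed their output; do not parse a second time.
            raise
        subparsers.required = False
        return parser.parse_args(argv)
```

With `["-c", "model"]`, the strict attempt fails because `model` is consumed as the value of `-c`, and the fallback then accepts it as the session name.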
+2 -4
@@ -58,11 +58,9 @@ def _prompt(label: str, default: str | None = None, secret: bool = False) -> str
def _install_dependencies(provider_name: str) -> None:
"""Install pip dependencies declared in plugin.yaml."""
import subprocess
from plugins.memory import find_provider_dir
from pathlib import Path as _Path
plugin_dir = find_provider_dir(provider_name)
if not plugin_dir:
return
plugin_dir = _Path(__file__).parent.parent / "plugins" / "memory" / provider_name
yaml_path = plugin_dir / "plugin.yaml"
if not yaml_path.exists():
return
-1
@@ -96,7 +96,6 @@ _MATCHING_PREFIX_STRIP_PROVIDERS: frozenset[str] = frozenset({
"qwen-oauth",
"xiaomi",
"arcee",
"ollama-cloud",
"custom",
})
+13 -32
View File
@@ -274,11 +274,6 @@ def parse_model_flags(raw_args: str) -> tuple[str, str, bool]:
is_global = False
explicit_provider = ""
# Normalize Unicode dashes (Telegram/iOS auto-converts -- to em/en dash)
# A single Unicode dash before a flag keyword becomes "--"
import re as _re
raw_args = _re.sub(r'[\u2012\u2013\u2014\u2015](provider|global)', r'--\1', raw_args)
# Extract --global
if "--global" in raw_args:
is_global = True
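The Unicode dash normalization from this hunk is easy to check on its own. A minimal sketch, assuming the same character class and flag keywords as the hunk; the function name `normalize_flag_dashes` is hypothetical.

```python
import re

# Figure dash, en dash, em dash, horizontal bar: common auto-replacements for "--".
_UNICODE_DASHES = "\u2012\u2013\u2014\u2015"
_FLAG_DASH_RE = re.compile(f"[{_UNICODE_DASHES}](provider|global)")


def normalize_flag_dashes(raw_args: str) -> str:
    """Rewrite a single Unicode dash directly before a known flag keyword to '--'."""
    return _FLAG_DASH_RE.sub(r"--\1", raw_args)
```

Because the pattern requires the flag keyword immediately after the dash, a dash used as ordinary punctuation in the arguments is left untouched.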
@@ -457,7 +452,6 @@ def switch_model(
ModelSwitchResult with all information the caller needs.
"""
from hermes_cli.models import (
copilot_model_api_mode,
detect_provider_for_model,
validate_requested_model,
opencode_model_api_mode,
@@ -715,12 +709,8 @@ def switch_model(
if validation.get("corrected_model"):
new_model = validation["corrected_model"]
# --- Copilot api_mode override ---
if target_provider in {"copilot", "github-copilot"}:
api_mode = copilot_model_api_mode(new_model, api_key=api_key)
# --- OpenCode api_mode override ---
if target_provider in {"opencode-zen", "opencode-go", "opencode"}:
if target_provider in {"opencode-zen", "opencode-go", "opencode", "opencode-go"}:
api_mode = opencode_model_api_mode(target_provider, new_model)
# --- Determine api_mode if not already set ---
@@ -796,8 +786,7 @@ def list_authenticated_providers(
from hermes_cli.models import OPENROUTER_MODELS, _PROVIDER_MODELS
results: List[dict] = []
seen_slugs: set = set() # lowercase-normalized to catch case variants (#9545)
seen_mdev_ids: set = set() # prevent duplicate entries for aliases (e.g. kimi-coding + kimi-coding-cn)
seen_slugs: set = set()
data = fetch_models_dev()
@@ -807,18 +796,9 @@ def list_authenticated_providers(
# "nous" shares OpenRouter's curated list if not separately defined
if "nous" not in curated:
curated["nous"] = curated["openrouter"]
# Ollama Cloud uses dynamic discovery (no static curated list)
if "ollama-cloud" not in curated:
from hermes_cli.models import fetch_ollama_cloud_models
curated["ollama-cloud"] = fetch_ollama_cloud_models()
# --- 1. Check Hermes-mapped providers ---
for hermes_id, mdev_id in PROVIDER_TO_MODELS_DEV.items():
# Skip aliases that map to the same models.dev provider (e.g.
# kimi-coding and kimi-coding-cn both → kimi-for-coding).
# The first one with valid credentials wins (#10526).
if mdev_id in seen_mdev_ids:
continue
pdata = data.get(mdev_id)
if not isinstance(pdata, dict):
continue
@@ -857,8 +837,7 @@ def list_authenticated_providers(
"total_models": total,
"source": "built-in",
})
seen_slugs.add(slug.lower())
seen_mdev_ids.add(mdev_id)
seen_slugs.add(slug)
# --- 2. Check Hermes-only providers (nous, openai-codex, copilot, opencode-go) ---
from hermes_cli.providers import HERMES_OVERLAYS
@@ -870,12 +849,12 @@ def list_authenticated_providers(
_mdev_to_hermes = {v: k for k, v in PROVIDER_TO_MODELS_DEV.items()}
for pid, overlay in HERMES_OVERLAYS.items():
if pid.lower() in seen_slugs:
if pid in seen_slugs:
continue
# Resolve Hermes slug — e.g. "github-copilot" → "copilot"
hermes_slug = _mdev_to_hermes.get(pid, pid)
if hermes_slug.lower() in seen_slugs:
if hermes_slug in seen_slugs:
continue
# Check if credentials exist
@@ -956,8 +935,8 @@ def list_authenticated_providers(
"total_models": total,
"source": "hermes",
})
seen_slugs.add(pid.lower())
seen_slugs.add(hermes_slug.lower())
seen_slugs.add(pid)
seen_slugs.add(hermes_slug)
# --- 2b. Cross-check canonical provider list ---
# Catches providers that are in CANONICAL_PROVIDERS but weren't found
@@ -969,7 +948,7 @@ def list_authenticated_providers(
_canon_provs = []
for _cp in _canon_provs:
if _cp.slug.lower() in seen_slugs:
if _cp.slug in seen_slugs:
continue
# Check credentials via PROVIDER_REGISTRY (auth.py)
@@ -1016,7 +995,7 @@ def list_authenticated_providers(
"total_models": _cp_total,
"source": "canonical",
})
seen_slugs.add(_cp.slug.lower())
seen_slugs.add(_cp.slug)
# --- 3. User-defined endpoints from config ---
if user_providers and isinstance(user_providers, dict):
@@ -1089,7 +1068,7 @@ def list_authenticated_providers(
groups[slug]["models"].append(default_model)
for slug, grp in groups.items():
if slug.lower() in seen_slugs:
if slug in seen_slugs:
continue
results.append({
"slug": slug,
@@ -1101,9 +1080,11 @@ def list_authenticated_providers(
"source": "user-config",
"api_url": grp["api_url"],
})
seen_slugs.add(slug.lower())
seen_slugs.add(slug)
# Sort: current provider first, then by model count descending
results.sort(key=lambda r: (not r["is_current"], -r["total_models"]))
return results
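The two behaviours at stake in this hunk, case-insensitive slug deduplication and the final ordering, can be sketched together. This is an assumption-laden illustration: `dedupe_and_rank` is a hypothetical helper, and the entries only carry the three keys needed here.

```python
from typing import Dict, List


def dedupe_and_rank(entries: List[Dict]) -> List[Dict]:
    """Keep the first entry per lowercase slug, then sort current-first, largest-first."""
    seen_slugs = set()
    unique = []
    for entry in entries:
        key = entry["slug"].lower()  # lowercase so "OpenAI" and "openai" collide
        if key in seen_slugs:
            continue
        seen_slugs.add(key)
        unique.append(entry)
    # False sorts before True, so the current provider comes first.
    unique.sort(key=lambda r: (not r["is_current"], -r["total_models"]))
    return unique
```

Dropping the `.lower()` call, as the newer side of this diff does, makes `OpenAI` and `openai` count as distinct providers.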
+26 -220
View File
@@ -11,9 +11,7 @@ import json
import os
import urllib.request
import urllib.error
import time
from difflib import get_close_matches
from pathlib import Path
from typing import Any, NamedTuple, Optional
COPILOT_BASE_URL = "https://api.githubcopilot.com"
@@ -26,8 +24,7 @@ COPILOT_REASONING_EFFORTS_O_SERIES = ["low", "medium", "high"]
# Fallback OpenRouter snapshot used when the live catalog is unavailable.
# (model_id, display description shown in menus)
OPENROUTER_MODELS: list[tuple[str, str]] = [
("anthropic/claude-opus-4.7", "recommended"),
("anthropic/claude-opus-4.6", ""),
("anthropic/claude-opus-4.6", "recommended"),
("anthropic/claude-sonnet-4.6", ""),
("qwen/qwen3.6-plus", ""),
("anthropic/claude-sonnet-4.5", ""),
@@ -146,8 +143,17 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"glm-4.5-flash",
],
"xai": [
"grok-4.20-reasoning",
"grok-4.20-0309-reasoning",
"grok-4.20-0309-non-reasoning",
"grok-4.20-multi-agent-0309",
"grok-4-1-fast-reasoning",
"grok-4-1-fast-non-reasoning",
"grok-4-fast-reasoning",
"grok-4-fast-non-reasoning",
"grok-4-0709",
"grok-code-fast-1",
"grok-3",
"grok-3-mini",
],
"kimi-coding": [
"kimi-for-coding",
@@ -182,7 +188,6 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"MiniMax-M2",
],
"anthropic": [
"claude-opus-4-7",
"claude-opus-4-6",
"claude-sonnet-4-6",
"claude-opus-4-5-20251101",
@@ -298,22 +303,6 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"XiaomiMiMo/MiMo-V2-Flash",
"moonshotai/Kimi-K2-Thinking",
],
# AWS Bedrock — static fallback list used when dynamic discovery is
# unavailable (no boto3, no credentials, or API error). The agent
# prefers live discovery via ListFoundationModels + ListInferenceProfiles.
# Use inference profile IDs (us.*) since most models require them.
"bedrock": [
"us.anthropic.claude-sonnet-4-6",
"us.anthropic.claude-opus-4-6-v1",
"us.anthropic.claude-haiku-4-5-20251001-v1:0",
"us.anthropic.claude-sonnet-4-5-20250929-v1:0",
"us.amazon.nova-pro-v1:0",
"us.amazon.nova-lite-v1:0",
"us.amazon.nova-micro-v1:0",
"deepseek.v3.2",
"us.meta.llama4-maverick-17b-instruct-v1:0",
"us.meta.llama4-scout-17b-instruct-v1:0",
],
}
# ---------------------------------------------------------------------------
@@ -537,25 +526,22 @@ CANONICAL_PROVIDERS: list[ProviderEntry] = [
ProviderEntry("deepseek", "DeepSeek", "DeepSeek (DeepSeek-V3, R1, coder — direct API)"),
ProviderEntry("xai", "xAI", "xAI (Grok models — direct API)"),
ProviderEntry("zai", "Z.AI / GLM", "Z.AI / GLM (Zhipu AI direct API)"),
ProviderEntry("kimi-coding", "Kimi / Kimi Coding Plan", "Kimi Coding Plan (api.kimi.com) & Moonshot API"),
ProviderEntry("kimi-coding", "Kimi / Moonshot", "Kimi / Moonshot (Moonshot AI direct API)"),
ProviderEntry("kimi-coding-cn", "Kimi / Moonshot (China)", "Kimi / Moonshot China (Moonshot CN direct API)"),
ProviderEntry("minimax", "MiniMax", "MiniMax (global direct API)"),
ProviderEntry("minimax-cn", "MiniMax (China)", "MiniMax China (domestic direct API)"),
ProviderEntry("alibaba", "Alibaba Cloud (DashScope)","Alibaba Cloud / DashScope Coding (Qwen + multi-provider)"),
ProviderEntry("ollama-cloud", "Ollama Cloud", "Ollama Cloud (cloud-hosted open models — ollama.com)"),
ProviderEntry("arcee", "Arcee AI", "Arcee AI (Trinity models — direct API)"),
ProviderEntry("kilocode", "Kilo Code", "Kilo Code (Kilo Gateway API)"),
ProviderEntry("opencode-zen", "OpenCode Zen", "OpenCode Zen (35+ curated models, pay-as-you-go)"),
ProviderEntry("opencode-go", "OpenCode Go", "OpenCode Go (open models, $10/month subscription)"),
ProviderEntry("ai-gateway", "Vercel AI Gateway", "Vercel AI Gateway (200+ models, pay-per-use)"),
ProviderEntry("bedrock", "AWS Bedrock", "AWS Bedrock (Claude, Nova, Llama, DeepSeek — IAM or API key)"),
]
# Derived dicts — used throughout the codebase
_PROVIDER_LABELS = {p.slug: p.label for p in CANONICAL_PROVIDERS}
_PROVIDER_LABELS["custom"] = "Custom endpoint" # special case: not a named provider
_PROVIDER_ALIASES = {
"glm": "zai",
"z-ai": "zai",
@@ -601,15 +587,9 @@ _PROVIDER_ALIASES = {
"huggingface-hub": "huggingface",
"mimo": "xiaomi",
"xiaomi-mimo": "xiaomi",
"aws": "bedrock",
"aws-bedrock": "bedrock",
"amazon-bedrock": "bedrock",
"amazon": "bedrock",
"grok": "xai",
"x-ai": "xai",
"x.ai": "xai",
"ollama": "custom", # bare "ollama" = local; use "ollama-cloud" for cloud
"ollama_cloud": "ollama-cloud",
}
@@ -1046,7 +1026,7 @@ def detect_provider_for_model(
return (resolved_provider, default_models[0])
# Aggregators list other providers' models — never auto-switch TO them
_AGGREGATORS = {"nous", "openrouter", "ai-gateway", "copilot", "kilocode"}
_AGGREGATORS = {"nous", "openrouter"}
# If the model belongs to the current provider's catalog, don't suggest switching
current_models = _PROVIDER_MODELS.get(current_provider, [])
@@ -1063,8 +1043,7 @@ def detect_provider_for_model(
break
if direct_match:
# Check if we have credentials for this provider — env vars,
# credential pool, or auth store entries.
# Check if we have credentials for this provider
has_creds = False
try:
from hermes_cli.auth import PROVIDER_REGISTRY
@@ -1077,28 +1056,16 @@ def detect_provider_for_model(
break
except Exception:
pass
# Also check credential pool and auth store — covers OAuth,
# Claude Code tokens, and other non-env-var credentials (#10300).
if not has_creds:
try:
from agent.credential_pool import load_pool
pool = load_pool(direct_match)
if pool.has_credentials():
has_creds = True
except Exception:
pass
if not has_creds:
try:
from hermes_cli.auth import _load_auth_store
store = _load_auth_store()
if direct_match in store.get("providers", {}) or direct_match in store.get("credential_pool", {}):
has_creds = True
except Exception:
pass
# Always return the direct provider match. If credentials are
# missing, the client init will give a clear error rather than
# silently routing through the wrong provider (#10300).
if has_creds:
return (direct_match, name)
# No direct creds — try to find this model on OpenRouter instead
or_slug = _find_openrouter_slug(name)
if or_slug:
return ("openrouter", or_slug)
# Still return the direct provider — credential resolution will
# give a clear error rather than silently using the wrong provider
return (direct_match, name)
# --- Step 2: check OpenRouter catalog ---
@@ -1288,10 +1255,6 @@ def provider_model_ids(provider: Optional[str], *, force_refresh: bool = False)
live = _fetch_ai_gateway_models()
if live:
return live
if normalized == "ollama-cloud":
live = fetch_ollama_cloud_models(force_refresh=force_refresh)
if live:
return live
if normalized == "custom":
base_url = _get_custom_base_url()
if base_url:
@@ -1576,11 +1539,6 @@ def copilot_model_api_mode(
primary signal. Falls back to the catalog's ``supported_endpoints``
only for models not covered by the pattern check.
"""
# Fetch the catalog once so normalize + endpoint check share it
# (avoids two redundant network calls for non-GPT-5 models).
if catalog is None and api_key:
catalog = fetch_github_model_catalog(api_key=api_key)
normalized = normalize_copilot_model_id(model_id, catalog=catalog, api_key=api_key)
if not normalized:
return "chat_completions"
@@ -1590,6 +1548,9 @@ def copilot_model_api_mode(
return "codex_responses"
# Secondary: check catalog for non-GPT-5 models (Claude via /v1/messages, etc.)
if catalog is None and api_key:
catalog = fetch_github_model_catalog(api_key=api_key)
if catalog:
catalog_entry = next((item for item in catalog if item.get("id") == normalized), None)
if isinstance(catalog_entry, dict):
@@ -1804,125 +1765,6 @@ def fetch_api_models(
return probe_api_models(api_key, base_url, timeout=timeout).get("models")
# ---------------------------------------------------------------------------
# Ollama Cloud — merged model discovery with disk cache
# ---------------------------------------------------------------------------
_OLLAMA_CLOUD_CACHE_TTL = 3600 # 1 hour
def _ollama_cloud_cache_path() -> Path:
"""Return the path for the Ollama Cloud model cache."""
from hermes_constants import get_hermes_home
return get_hermes_home() / "ollama_cloud_models_cache.json"
def _load_ollama_cloud_cache(*, ignore_ttl: bool = False) -> Optional[dict]:
"""Load cached Ollama Cloud models from disk.
Args:
ignore_ttl: If True, return data even if the TTL has expired (stale fallback).
"""
try:
cache_path = _ollama_cloud_cache_path()
if not cache_path.exists():
return None
with open(cache_path, encoding="utf-8") as f:
data = json.load(f)
if not isinstance(data, dict):
return None
models = data.get("models")
if not (isinstance(models, list) and models):
return None
if not ignore_ttl:
cached_at = data.get("cached_at", 0)
if (time.time() - cached_at) > _OLLAMA_CLOUD_CACHE_TTL:
return None # stale
return data
except Exception:
pass
return None
def _save_ollama_cloud_cache(models: list[str]) -> None:
"""Persist the merged Ollama Cloud model list to disk."""
try:
from utils import atomic_json_write
cache_path = _ollama_cloud_cache_path()
cache_path.parent.mkdir(parents=True, exist_ok=True)
atomic_json_write(cache_path, {"models": models, "cached_at": time.time()}, indent=None)
except Exception:
pass
def fetch_ollama_cloud_models(
api_key: Optional[str] = None,
base_url: Optional[str] = None,
*,
force_refresh: bool = False,
) -> list[str]:
"""Fetch Ollama Cloud models by merging live API + models.dev, with disk cache.
Resolution order:
1. Disk cache (if fresh, < 1 hour, and not force_refresh)
2. Live ``/v1/models`` endpoint (primary: freshest source)
3. models.dev registry (secondary: fills gaps for unlisted models)
4. Merge: live models first, then models.dev additions (deduped)
Returns a list of model IDs (never None; an empty list on total failure).
"""
# 1. Check disk cache
if not force_refresh:
cached = _load_ollama_cloud_cache()
if cached is not None:
return cached["models"]
# 2. Live API probe
if not api_key:
api_key = os.getenv("OLLAMA_API_KEY", "")
if not base_url:
base_url = os.getenv("OLLAMA_BASE_URL", "") or "https://ollama.com/v1"
live_models: list[str] = []
if api_key:
result = fetch_api_models(api_key, base_url, timeout=8.0)
if result:
live_models = result
# 3. models.dev registry
mdev_models: list[str] = []
try:
from agent.models_dev import list_agentic_models
mdev_models = list_agentic_models("ollama-cloud")
except Exception:
pass
# 4. Merge: live first, then models.dev additions (deduped, order-preserving)
if live_models or mdev_models:
seen: set[str] = set()
merged: list[str] = []
for m in live_models:
if m and m not in seen:
seen.add(m)
merged.append(m)
for m in mdev_models:
if m and m not in seen:
seen.add(m)
merged.append(m)
if merged:
_save_ollama_cloud_cache(merged)
return merged
# Total failure — return stale cache if available (ignore TTL)
stale = _load_ollama_cloud_cache(ignore_ttl=True)
if stale is not None:
return stale["models"]
return []
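The cache-and-merge strategy used by the Ollama Cloud discovery above can be reduced to two small helpers. This is a simplified sketch, not the original code: `merge_preserving_order` and `load_cached_models` are hypothetical names, and the `now` parameter is added here only to make the TTL check testable.

```python
import json
import time
from pathlib import Path
from typing import List, Optional

CACHE_TTL_SECONDS = 3600  # one hour, matching the TTL in the hunk


def merge_preserving_order(primary: List[str], secondary: List[str]) -> List[str]:
    """Merge two ID lists, primary first, skipping empty and duplicate entries."""
    seen = set()
    merged = []
    for model_id in [*primary, *secondary]:
        if model_id and model_id not in seen:
            seen.add(model_id)
            merged.append(model_id)
    return merged


def load_cached_models(
    path: Path, *, ignore_ttl: bool = False, now: Optional[float] = None
) -> Optional[List[str]]:
    """Return cached model IDs, or None if the cache is missing, invalid, or stale."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return None
    if not isinstance(data, dict):
        return None
    models = data.get("models")
    if not isinstance(models, list) or not models:
        return None
    cached_at = data.get("cached_at", 0)
    if not isinstance(cached_at, (int, float)):
        cached_at = 0
    current = time.time() if now is None else now
    if not ignore_ttl and (current - cached_at) > CACHE_TTL_SECONDS:
        return None  # stale
    return models
```

Calling `load_cached_models(path, ignore_ttl=True)` gives the stale fallback: when both live sources fail, an expired cache is still preferable to an empty list.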
def validate_requested_model(
model_name: str,
provider: Optional[str],
@@ -2115,42 +1957,6 @@ def validate_requested_model(
# api_models is None — couldn't reach API. Accept and persist,
# but warn so typos don't silently break things.
# Bedrock: use our own discovery instead of HTTP /models endpoint.
# Bedrock's bedrock-runtime URL doesn't support /models — it uses the
# AWS SDK control plane (ListFoundationModels + ListInferenceProfiles).
if normalized == "bedrock":
try:
from agent.bedrock_adapter import discover_bedrock_models, resolve_bedrock_region
region = resolve_bedrock_region()
discovered = discover_bedrock_models(region)
discovered_ids = {m["id"] for m in discovered}
if requested in discovered_ids:
return {
"accepted": True,
"persist": True,
"recognized": True,
"message": None,
}
# Not in discovered list — still accept (user may have custom
# inference profiles or cross-account access), but warn.
suggestions = get_close_matches(requested, list(discovered_ids), n=3, cutoff=0.4)
suggestion_text = ""
if suggestions:
suggestion_text = "\n Similar models: " + ", ".join(f"`{s}`" for s in suggestions)
return {
"accepted": True,
"persist": True,
"recognized": False,
"message": (
f"Note: `{requested}` was not found in Bedrock model discovery for {region}. "
f"It may still work with custom inference profiles or cross-account access."
f"{suggestion_text}"
),
}
except Exception:
pass # Fall through to generic warning
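The accept-but-warn validation above relies on `difflib.get_close_matches` for suggestions. The sketch below isolates that logic; `check_against_discovered` is a hypothetical name, the model IDs used with it are made up, and the discovery call itself is replaced by a plain set of IDs.

```python
from difflib import get_close_matches
from typing import Dict, Iterable


def check_against_discovered(requested: str, discovered_ids: Iterable[str]) -> Dict:
    """Accept any model ID, but flag unknown ones and suggest close matches."""
    known = set(discovered_ids)
    if requested in known:
        return {"accepted": True, "persist": True, "recognized": True, "message": None}
    # Sort so suggestions do not depend on set iteration order.
    suggestions = get_close_matches(requested, sorted(known), n=3, cutoff=0.4)
    message = f"Note: `{requested}` was not found in model discovery."
    if suggestions:
        message += " Similar models: " + ", ".join(f"`{s}`" for s in suggestions)
    return {"accepted": True, "persist": True, "recognized": False, "message": message}
```

The unknown ID is still accepted and persisted, since custom inference profiles or cross-account access may make it valid even though discovery did not list it.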
provider_label = _PROVIDER_LABELS.get(normalized, normalized)
return {
"accepted": True,
+30 -277
View File
@@ -143,7 +143,6 @@ def _tts_label(current_provider: str) -> str:
"openai": "OpenAI TTS",
"elevenlabs": "ElevenLabs",
"edge": "Edge TTS",
"xai": "xAI TTS",
"mistral": "Mistral Voxtral TTS",
"neutts": "NeuTTS",
}
@@ -258,15 +257,6 @@ def get_nous_subscription_features(
terminal_cfg.get("modal_mode")
)
# use_gateway flags — when True, the user explicitly opted into the
# Tool Gateway via `hermes model`, so direct credentials should NOT
# prevent gateway routing.
web_use_gateway = bool(web_cfg.get("use_gateway"))
tts_use_gateway = bool(tts_cfg.get("use_gateway"))
browser_use_gateway = bool(browser_cfg.get("use_gateway"))
image_gen_cfg = config.get("image_gen") if isinstance(config.get("image_gen"), dict) else {}
image_use_gateway = bool(image_gen_cfg.get("use_gateway"))
direct_exa = bool(get_env_value("EXA_API_KEY"))
direct_firecrawl = bool(get_env_value("FIRECRAWL_API_KEY") or get_env_value("FIRECRAWL_API_URL"))
direct_parallel = bool(get_env_value("PARALLEL_API_KEY"))
@@ -279,21 +269,6 @@ def get_nous_subscription_features(
direct_browser_use = bool(get_env_value("BROWSER_USE_API_KEY"))
direct_modal = has_direct_modal_credentials()
# When use_gateway is set, suppress direct credentials for managed detection
if web_use_gateway:
direct_firecrawl = False
direct_exa = False
direct_parallel = False
direct_tavily = False
if image_use_gateway:
direct_fal = False
if tts_use_gateway:
direct_openai_tts = False
direct_elevenlabs = False
if browser_use_gateway:
direct_browser_use = False
direct_browserbase = False
managed_web_available = managed_tools_flag and nous_auth_present and is_managed_tool_gateway_ready("firecrawl")
managed_image_available = managed_tools_flag and nous_auth_present and is_managed_tool_gateway_ready("fal-queue")
managed_tts_available = managed_tools_flag and nous_auth_present and is_managed_tool_gateway_ready("openai-audio")
@@ -464,7 +439,37 @@ def get_nous_subscription_features(
)
def get_nous_subscription_explainer_lines() -> list[str]:
if not managed_nous_tools_enabled():
return []
return [
"Nous subscription enables managed web tools, image generation, OpenAI TTS, and browser automation by default.",
"Those managed tools bill to your Nous subscription. Modal execution is optional and can bill to your subscription too.",
"Change these later with: hermes setup tools, hermes setup terminal, or hermes status.",
]
def apply_nous_provider_defaults(config: Dict[str, object]) -> set[str]:
"""Apply provider-level Nous defaults shared by `hermes setup` and `hermes model`."""
if not managed_nous_tools_enabled():
return set()
features = get_nous_subscription_features(config)
if not features.provider_is_nous:
return set()
tts_cfg = config.get("tts")
if not isinstance(tts_cfg, dict):
tts_cfg = {}
config["tts"] = tts_cfg
current_tts = str(tts_cfg.get("provider") or "edge").strip().lower()
if current_tts not in {"", "edge"}:
return set()
tts_cfg["provider"] = "openai"
return {"tts"}
def apply_nous_managed_defaults(
@@ -524,255 +529,3 @@ def apply_nous_managed_defaults(
changed.add("image_gen")
return changed
# ---------------------------------------------------------------------------
# Tool Gateway offer — single Y/n prompt after model selection
# ---------------------------------------------------------------------------
_GATEWAY_TOOL_LABELS = {
"web": "Web search & extract (Firecrawl)",
"image_gen": "Image generation (FAL)",
"tts": "Text-to-speech (OpenAI TTS)",
"browser": "Browser automation (Browser Use)",
}
def _get_gateway_direct_credentials() -> Dict[str, bool]:
"""Return a dict of tool_key -> has_direct_credentials."""
return {
"web": bool(
get_env_value("FIRECRAWL_API_KEY")
or get_env_value("FIRECRAWL_API_URL")
or get_env_value("PARALLEL_API_KEY")
or get_env_value("TAVILY_API_KEY")
or get_env_value("EXA_API_KEY")
),
"image_gen": bool(get_env_value("FAL_KEY")),
"tts": bool(
resolve_openai_audio_api_key()
or get_env_value("ELEVENLABS_API_KEY")
),
"browser": bool(
get_env_value("BROWSER_USE_API_KEY")
or (get_env_value("BROWSERBASE_API_KEY") and get_env_value("BROWSERBASE_PROJECT_ID"))
),
}
_GATEWAY_DIRECT_LABELS = {
"web": "Firecrawl/Exa/Parallel/Tavily key",
"image_gen": "FAL key",
"tts": "OpenAI/ElevenLabs key",
"browser": "Browser Use/Browserbase key",
}
_ALL_GATEWAY_KEYS = ("web", "image_gen", "tts", "browser")
def get_gateway_eligible_tools(
config: Optional[Dict[str, object]] = None,
) -> tuple[list[str], list[str], list[str]]:
"""Return (unconfigured, has_direct, already_managed) tool key lists.
- unconfigured: tools with no direct credentials (easy switch)
- has_direct: tools where the user has their own API keys
- already_managed: tools already routed through the gateway
All lists are empty when the user is not a paid Nous subscriber or
is not using Nous as their provider.
"""
if not managed_nous_tools_enabled():
return [], [], []
if config is None:
from hermes_cli.config import load_config
config = load_config() or {}
# Quick provider check without the heavy get_nous_subscription_features call
model_cfg = config.get("model")
if not isinstance(model_cfg, dict) or str(model_cfg.get("provider") or "").strip().lower() != "nous":
return [], [], []
direct = _get_gateway_direct_credentials()
# Check which tools the user has explicitly opted into the gateway for.
# This is distinct from managed_by_nous which fires implicitly when
# no direct keys exist — we only skip the prompt for tools where
# use_gateway was explicitly set.
opted_in = {
"web": bool((config.get("web") if isinstance(config.get("web"), dict) else {}).get("use_gateway")),
"image_gen": bool((config.get("image_gen") if isinstance(config.get("image_gen"), dict) else {}).get("use_gateway")),
"tts": bool((config.get("tts") if isinstance(config.get("tts"), dict) else {}).get("use_gateway")),
"browser": bool((config.get("browser") if isinstance(config.get("browser"), dict) else {}).get("use_gateway")),
}
unconfigured: list[str] = []
has_direct: list[str] = []
already_managed: list[str] = []
for key in _ALL_GATEWAY_KEYS:
if opted_in.get(key):
already_managed.append(key)
elif direct.get(key):
has_direct.append(key)
else:
unconfigured.append(key)
return unconfigured, has_direct, already_managed
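The three-way split performed by `get_gateway_eligible_tools` can be sketched as follows. The names `section_flag` and `classify` are hypothetical and exist only for this illustration; the `direct` mapping stands in for the result of `_get_gateway_direct_credentials()`, and the subscription and provider checks are omitted.

```python
from typing import Dict, List, Tuple

ALL_KEYS = ("web", "image_gen", "tts", "browser")  # mirrors _ALL_GATEWAY_KEYS


def section_flag(config: Dict[str, object], section: str, flag: str) -> bool:
    """Read a boolean flag from a config section, tolerating non-dict values."""
    value = config.get(section)
    return bool(value.get(flag)) if isinstance(value, dict) else False


def classify(
    config: Dict[str, object], direct: Dict[str, bool]
) -> Tuple[List[str], List[str], List[str]]:
    """Return (unconfigured, has_direct, already_managed) tool keys."""
    unconfigured: List[str] = []
    has_direct: List[str] = []
    already_managed: List[str] = []
    for key in ALL_KEYS:
        if section_flag(config, key, "use_gateway"):
            # An explicit opt-in wins even when direct keys are present.
            already_managed.append(key)
        elif direct.get(key):
            has_direct.append(key)
        else:
            unconfigured.append(key)
    return unconfigured, has_direct, already_managed


config = {"web": {"use_gateway": True}, "tts": "not-a-dict"}
direct = {"web": True, "image_gen": True}
result = classify(config, direct)
print(result)  # (['tts', 'browser'], ['image_gen'], ['web'])
```

The order of the branches matters: checking the explicit `use_gateway` flag first is what keeps an opted-in tool out of the `has_direct` list.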
def apply_gateway_defaults(
config: Dict[str, object],
tool_keys: list[str],
) -> set[str]:
"""Apply Tool Gateway config for the given tool keys.
Sets ``use_gateway: true`` in each tool's config section so the
runtime prefers the gateway even when direct API keys are present.
Returns the set of tools that were actually changed.
"""
changed: set[str] = set()
web_cfg = config.get("web")
if not isinstance(web_cfg, dict):
web_cfg = {}
config["web"] = web_cfg
tts_cfg = config.get("tts")
if not isinstance(tts_cfg, dict):
tts_cfg = {}
config["tts"] = tts_cfg
browser_cfg = config.get("browser")
if not isinstance(browser_cfg, dict):
browser_cfg = {}
config["browser"] = browser_cfg
if "web" in tool_keys:
web_cfg["backend"] = "firecrawl"
web_cfg["use_gateway"] = True
changed.add("web")
if "tts" in tool_keys:
tts_cfg["provider"] = "openai"
tts_cfg["use_gateway"] = True
changed.add("tts")
if "browser" in tool_keys:
browser_cfg["cloud_provider"] = "browser-use"
browser_cfg["use_gateway"] = True
changed.add("browser")
if "image_gen" in tool_keys:
image_cfg = config.get("image_gen")
if not isinstance(image_cfg, dict):
image_cfg = {}
config["image_gen"] = image_cfg
image_cfg["use_gateway"] = True
changed.add("image_gen")
return changed
def prompt_enable_tool_gateway(config: Dict[str, object]) -> set[str]:
"""If eligible tools exist, prompt the user to enable the Tool Gateway.
Uses prompt_choice() with a description parameter so the curses TUI
shows the tool context alongside the choices.
Returns the set of tools that were enabled, or empty set if the user
declined or no tools were eligible.
"""
unconfigured, has_direct, already_managed = get_gateway_eligible_tools(config)
if not unconfigured and not has_direct:
return set()
try:
from hermes_cli.setup import prompt_choice
except Exception:
return set()
# Build description lines showing full status of all gateway tools
desc_parts: list[str] = [
"",
" The Tool Gateway gives you access to web search, image generation,",
" text-to-speech, and browser automation through your Nous subscription.",
" No need to sign up for separate API keys — just pick the tools you want.",
"",
]
if already_managed:
for k in already_managed:
desc_parts.append(f"{_GATEWAY_TOOL_LABELS[k]} — using Tool Gateway")
if unconfigured:
for k in unconfigured:
desc_parts.append(f"{_GATEWAY_TOOL_LABELS[k]} — not configured")
if has_direct:
for k in has_direct:
desc_parts.append(f"{_GATEWAY_TOOL_LABELS[k]} — using {_GATEWAY_DIRECT_LABELS[k]}")
# Build short choice labels — detail is in the description above
choices: list[str] = []
choice_keys: list[str] = [] # maps choice index -> action
if unconfigured and has_direct:
choices.append("Enable for all tools (existing keys kept, not used)")
choice_keys.append("all")
choices.append("Enable only for tools without existing keys")
choice_keys.append("unconfigured")
choices.append("Skip")
choice_keys.append("skip")
elif unconfigured:
choices.append("Enable Tool Gateway")
choice_keys.append("unconfigured")
choices.append("Skip")
choice_keys.append("skip")
else:
choices.append("Enable Tool Gateway (existing keys kept, not used)")
choice_keys.append("all")
choices.append("Skip")
choice_keys.append("skip")
description = "\n".join(desc_parts) if desc_parts else None
# Default to "Enable" when user has no direct keys (new user),
# default to "Skip" when they have existing keys to preserve.
default_idx = 0 if not has_direct else len(choices) - 1
try:
idx = prompt_choice(
"Your Nous subscription includes the Tool Gateway.",
choices,
default_idx,
description=description,
)
except (KeyboardInterrupt, EOFError, OSError, SystemExit):
return set()
action = choice_keys[idx]
if action == "skip":
return set()
if action == "all":
# Apply to switchable tools + ensure already-managed tools also
# have use_gateway persisted in config for consistency.
to_apply = list(_ALL_GATEWAY_KEYS)
else:
to_apply = unconfigured
changed = apply_gateway_defaults(config, to_apply)
if changed:
from hermes_cli.config import save_config
save_config(config)
# Only report the tools that actually switched (not already-managed ones)
newly_switched = changed - set(already_managed)
for key in sorted(newly_switched):
label = _GATEWAY_TOOL_LABELS.get(key, key)
print(f"{label}: enabled via Nous subscription")
if already_managed and not newly_switched:
print(" (all tools already using Tool Gateway)")
return changed
-99
@@ -112,7 +112,6 @@ class LoadedPlugin:
module: Optional[types.ModuleType] = None
tools_registered: List[str] = field(default_factory=list)
hooks_registered: List[str] = field(default_factory=list)
commands_registered: List[str] = field(default_factory=list)
enabled: bool = False
error: Optional[str] = None
@@ -212,84 +211,6 @@ class PluginContext:
}
logger.debug("Plugin %s registered CLI command: %s", self.manifest.name, name)
# -- slash command registration -------------------------------------------
def register_command(
self,
name: str,
handler: Callable,
description: str = "",
) -> None:
"""Register a slash command (e.g. ``/lcm``) available in CLI and gateway sessions.
The handler signature is ``fn(raw_args: str) -> str | None``.
It may also be an async callable; the gateway dispatch handles both.
Unlike ``register_cli_command()`` (which creates ``hermes <subcommand>``
terminal commands), this registers in-session slash commands that users
invoke during a conversation.
Names conflicting with built-in commands are rejected with a warning.
"""
clean = name.lower().strip().lstrip("/").replace(" ", "-")
if not clean:
logger.warning(
"Plugin '%s' tried to register a command with an empty name.",
self.manifest.name,
)
return
# Reject if it conflicts with a built-in command
try:
from hermes_cli.commands import resolve_command
if resolve_command(clean) is not None:
logger.warning(
"Plugin '%s' tried to register command '/%s' which conflicts "
"with a built-in command. Skipping.",
self.manifest.name, clean,
)
return
except Exception:
pass # If commands module isn't available, skip the check
self._manager._plugin_commands[clean] = {
"handler": handler,
"description": description or "Plugin command",
"plugin": self.manifest.name,
}
logger.debug("Plugin %s registered command: /%s", self.manifest.name, clean)
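The name normalization at the top of `register_command` is a chain of standard `str` methods, so its behavior can be checked in isolation. The wrapper name `normalize_command_name` below is hypothetical; only the chained expression comes from the hunk.

```python
def normalize_command_name(name: str) -> str:
    """Apply the same chain used in register_command (hypothetical wrapper)."""
    return name.lower().strip().lstrip("/").replace(" ", "-")


examples = {
    "/LCM": normalize_command_name("/LCM"),
    "  /My Command ": normalize_command_name("  /My Command "),
    "//": normalize_command_name("//"),
    "/ foo": normalize_command_name("/ foo"),
}
for raw, clean in examples.items():
    print(repr(raw), "->", repr(clean))
```

Two edge cases are worth noting: a name made only of slashes normalizes to the empty string, which the empty-name check then rejects, and a space directly after the slash survives `lstrip("/")` and becomes a leading hyphen.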
# -- tool dispatch -------------------------------------------------------
def dispatch_tool(self, tool_name: str, args: dict, **kwargs) -> str:
"""Dispatch a tool call through the registry, with parent agent context.
This is the public interface for plugin slash commands that need to call
tools like ``delegate_task`` without reaching into framework internals.
The parent agent (if available) is resolved automatically; plugins never
need to access the agent directly.
Args:
tool_name: Registry name of the tool (e.g. ``"delegate_task"``).
args: Tool arguments dict (same as what the model would pass).
**kwargs: Extra keyword args forwarded to the registry dispatch.
Returns:
JSON string from the tool handler (same format as model tool calls).
"""
from tools.registry import registry
# Wire up parent agent context when available (CLI mode).
# In gateway mode _cli_ref is None — tools degrade gracefully
# (workspace hints fall back to TERMINAL_CWD, no spinner).
if "parent_agent" not in kwargs:
cli = self._manager._cli_ref
agent = getattr(cli, "agent", None) if cli else None
if agent is not None:
kwargs["parent_agent"] = agent
return registry.dispatch(tool_name, args, **kwargs)
# -- context engine registration -----------------------------------------
def register_context_engine(self, engine) -> None:
@@ -402,7 +323,6 @@ class PluginManager:
self._plugin_tool_names: Set[str] = set()
self._cli_commands: Dict[str, dict] = {}
self._context_engine = None # Set by a plugin via register_context_engine()
self._plugin_commands: Dict[str, dict] = {} # Slash commands registered by plugins
self._discovered: bool = False
self._cli_ref = None # Set by CLI after plugin discovery
# Plugin skill registry: qualified name → metadata dict.
@@ -565,10 +485,6 @@ class PluginManager:
for h in p.hooks_registered
}
)
loaded.commands_registered = [
c for c in self._plugin_commands
if self._plugin_commands[c].get("plugin") == manifest.name
]
loaded.enabled = True
except Exception as exc:
@@ -682,7 +598,6 @@ class PluginManager:
"enabled": loaded.enabled,
"tools": len(loaded.tools_registered),
"hooks": len(loaded.hooks_registered),
"commands": len(loaded.commands_registered),
"error": loaded.error,
}
)
@@ -784,20 +699,6 @@ def get_plugin_context_engine():
return get_plugin_manager()._context_engine
def get_plugin_command_handler(name: str) -> Optional[Callable]:
"""Return the handler for a plugin-registered slash command, or ``None``."""
entry = get_plugin_manager()._plugin_commands.get(name)
return entry["handler"] if entry else None
def get_plugin_commands() -> Dict[str, dict]:
"""Return the full plugin commands dict (name → {handler, description, plugin}).
Safe to call before discovery; returns an empty dict if no plugins are loaded.
"""
return get_plugin_manager()._plugin_commands
def get_plugin_toolsets() -> List[tuple]:
"""Return plugin toolsets as ``(key, label, description)`` tuples.
+2 -22
@@ -128,7 +128,7 @@ HERMES_OVERLAYS: Dict[str, HermesOverlay] = {
base_url_env_var="HF_BASE_URL",
),
"xai": HermesOverlay(
transport="codex_responses",
transport="openai_chat",
base_url_override="https://api.x.ai/v1",
base_url_env_var="XAI_BASE_URL",
),
@@ -141,10 +141,6 @@ HERMES_OVERLAYS: Dict[str, HermesOverlay] = {
base_url_override="https://api.arcee.ai/api/v1",
base_url_env_var="ARCEE_BASE_URL",
),
"ollama-cloud": HermesOverlay(
transport="openai_chat",
base_url_env_var="OLLAMA_BASE_URL",
),
}
@@ -184,7 +180,6 @@ ALIASES: Dict[str, str] = {
# xai
"x-ai": "xai",
"x.ai": "xai",
"grok": "xai",
# kimi-for-coding (models.dev ID)
"kimi": "kimi-for-coding",
@@ -241,12 +236,6 @@ ALIASES: Dict[str, str] = {
"mimo": "xiaomi",
"xiaomi-mimo": "xiaomi",
# bedrock
"aws": "bedrock",
"aws-bedrock": "bedrock",
"amazon-bedrock": "bedrock",
"amazon": "bedrock",
# arcee
"arcee-ai": "arcee",
"arceeai": "arcee",
@@ -255,7 +244,7 @@ ALIASES: Dict[str, str] = {
"lmstudio": "lmstudio",
"lm-studio": "lmstudio",
"lm_studio": "lmstudio",
"ollama": "custom", # bare "ollama" = local; use "ollama-cloud" for cloud
"ollama": "ollama-cloud",
"vllm": "local",
"llamacpp": "local",
"llama.cpp": "local",
@@ -273,8 +262,6 @@ _LABEL_OVERRIDES: Dict[str, str] = {
"copilot-acp": "GitHub Copilot ACP",
"xiaomi": "Xiaomi MiMo",
"local": "Local endpoint",
"bedrock": "AWS Bedrock",
"ollama-cloud": "Ollama Cloud",
}
@@ -284,7 +271,6 @@ TRANSPORT_TO_API_MODE: Dict[str, str] = {
"openai_chat": "chat_completions",
"anthropic_messages": "anthropic_messages",
"codex_responses": "codex_responses",
"bedrock_converse": "bedrock_converse",
}
@@ -402,10 +388,6 @@ def determine_api_mode(provider: str, base_url: str = "") -> str:
if pdef is not None:
return TRANSPORT_TO_API_MODE.get(pdef.transport, "chat_completions")
# Direct provider checks for providers not in HERMES_OVERLAYS
if provider == "bedrock":
return "bedrock_converse"
# URL-based heuristics for custom / unknown providers
if base_url:
url_lower = base_url.rstrip("/").lower()
@@ -413,8 +395,6 @@ def determine_api_mode(provider: str, base_url: str = "") -> str:
return "anthropic_messages"
if "api.openai.com" in url_lower:
return "codex_responses"
if "bedrock-runtime" in url_lower and "amazonaws.com" in url_lower:
return "bedrock_converse"
return "chat_completions"
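The lookup order visible in `determine_api_mode` (overlay transport first, then URL heuristics, then a chat-completions fallback) can be sketched as below. This is an assumption-laden illustration: `Overlay`, `OVERLAYS`, `demo-provider` and `api_mode_for` are hypothetical stand-ins, how the real function obtains `pdef` is not shown in the hunk, and the URL check that returns `anthropic_messages` is left out because its condition is not visible here.

```python
from dataclasses import dataclass
from typing import Dict

TRANSPORT_TO_API_MODE: Dict[str, str] = {
    "openai_chat": "chat_completions",
    "anthropic_messages": "anthropic_messages",
    "codex_responses": "codex_responses",
}


@dataclass
class Overlay:
    """Hypothetical stand-in for HermesOverlay; only the transport is modelled."""
    transport: str


# Hypothetical registry with made-up provider names.
OVERLAYS: Dict[str, Overlay] = {
    "demo-provider": Overlay(transport="openai_chat"),
    "odd-provider": Overlay(transport="unknown_transport"),
}


def api_mode_for(provider: str, base_url: str = "") -> str:
    pdef = OVERLAYS.get(provider)
    if pdef is not None:
        # Unknown transports fall back to chat completions.
        return TRANSPORT_TO_API_MODE.get(pdef.transport, "chat_completions")

    # URL-based heuristic for custom / unknown providers.
    if base_url:
        url_lower = base_url.rstrip("/").lower()
        if "api.openai.com" in url_lower:
            return "codex_responses"

    return "chat_completions"


print(api_mode_for("demo-provider"))
print(api_mode_for("custom", "https://API.openai.com/v1/"))
print(api_mode_for("custom", "http://localhost:8000/v1"))
```

Because a known provider returns before the URL is inspected, a registered overlay always takes precedence over the base URL in this sketch.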
+1 -81
@@ -41,8 +41,6 @@ def _detect_api_mode_for_url(base_url: str) -> Optional[str]:
tool calls with reasoning (chat/completions returns 400).
"""
normalized = (base_url or "").strip().lower().rstrip("/")
if "api.x.ai" in normalized:
return "codex_responses"
if "api.openai.com" in normalized and "openrouter" not in normalized:
return "codex_responses"
return None
@@ -126,7 +124,7 @@ def _copilot_runtime_api_mode(model_cfg: Dict[str, Any], api_key: str) -> str:
return "chat_completions"
_VALID_API_MODES = {"chat_completions", "codex_responses", "anthropic_messages", "bedrock_converse"}
_VALID_API_MODES = {"chat_completions", "codex_responses", "anthropic_messages"}
def _parse_api_mode(raw: Any) -> Optional[str]:
@@ -165,13 +163,10 @@ def _resolve_runtime_from_pool_entry(
base_url = cfg_base_url or base_url or "https://api.anthropic.com"
elif provider == "openrouter":
base_url = base_url or OPENROUTER_BASE_URL
elif provider == "xai":
api_mode = "codex_responses"
elif provider == "nous":
api_mode = "chat_completions"
elif provider == "copilot":
api_mode = _copilot_runtime_api_mode(model_cfg, getattr(entry, "runtime_api_key", ""))
base_url = base_url or PROVIDER_REGISTRY["copilot"].inference_base_url
else:
configured_provider = str(model_cfg.get("provider") or "").strip().lower()
# Honour model.base_url from config.yaml when the configured provider
@@ -632,8 +627,6 @@ def _resolve_explicit_runtime(
api_mode = "chat_completions"
if provider == "copilot":
api_mode = _copilot_runtime_api_mode(model_cfg, api_key)
elif provider == "xai":
api_mode = "codex_responses"
else:
configured_mode = _parse_api_mode(model_cfg.get("api_mode"))
if configured_mode:
@@ -843,77 +836,6 @@ def resolve_runtime_provider(
"requested_provider": requested_provider,
}
# AWS Bedrock (native Converse API via boto3)
if provider == "bedrock":
from agent.bedrock_adapter import (
has_aws_credentials,
resolve_aws_auth_env_var,
resolve_bedrock_region,
is_anthropic_bedrock_model,
)
# When the user explicitly selected bedrock (not auto-detected),
# trust boto3's credential chain — it handles IMDS, ECS task roles,
# Lambda execution roles, SSO, and other implicit sources that our
# env-var check can't detect.
is_explicit = requested_provider in ("bedrock", "aws", "aws-bedrock", "amazon-bedrock", "amazon")
if not is_explicit and not has_aws_credentials():
raise AuthError(
"No AWS credentials found for Bedrock. Configure one of:\n"
" - AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY\n"
" - AWS_PROFILE (for SSO / named profiles)\n"
" - IAM instance role (EC2, ECS, Lambda)\n"
"Or run 'aws configure' to set up credentials.",
code="no_aws_credentials",
)
# Read bedrock-specific config from config.yaml
from hermes_cli.config import load_config as _load_bedrock_config
_bedrock_cfg = _load_bedrock_config().get("bedrock", {})
# Region priority: config.yaml bedrock.region → env var → us-east-1
region = (_bedrock_cfg.get("region") or "").strip() or resolve_bedrock_region()
auth_source = resolve_aws_auth_env_var() or "aws-sdk-default-chain"
# Build guardrail config if configured
_gr = _bedrock_cfg.get("guardrail", {})
guardrail_config = None
if _gr.get("guardrail_identifier") and _gr.get("guardrail_version"):
guardrail_config = {
"guardrailIdentifier": _gr["guardrail_identifier"],
"guardrailVersion": _gr["guardrail_version"],
}
if _gr.get("stream_processing_mode"):
guardrail_config["streamProcessingMode"] = _gr["stream_processing_mode"]
if _gr.get("trace"):
guardrail_config["trace"] = _gr["trace"]
# Dual-path routing: Claude models use AnthropicBedrock SDK for full
# feature parity (prompt caching, thinking budgets, adaptive thinking).
# Non-Claude models use the Converse API for multi-model support.
_current_model = str(model_cfg.get("default") or "").strip()
if is_anthropic_bedrock_model(_current_model):
# Claude on Bedrock → AnthropicBedrock SDK → anthropic_messages path
runtime = {
"provider": "bedrock",
"api_mode": "anthropic_messages",
"base_url": f"https://bedrock-runtime.{region}.amazonaws.com",
"api_key": "aws-sdk",
"source": auth_source,
"region": region,
"bedrock_anthropic": True, # Signal to use AnthropicBedrock client
"requested_provider": requested_provider,
}
else:
# Non-Claude (Nova, DeepSeek, Llama, etc.) → Converse API
runtime = {
"provider": "bedrock",
"api_mode": "bedrock_converse",
"base_url": f"https://bedrock-runtime.{region}.amazonaws.com",
"api_key": "aws-sdk",
"source": auth_source,
"region": region,
"requested_provider": requested_provider,
}
if guardrail_config:
runtime["guardrail_config"] = guardrail_config
return runtime
# API-key providers (z.ai/GLM, Kimi, MiniMax, MiniMax-CN)
pconfig = PROVIDER_REGISTRY.get(provider)
if pconfig and pconfig.auth_type == "api_key":
@@ -930,8 +852,6 @@ def resolve_runtime_provider(
api_mode = "chat_completions"
if provider == "copilot":
api_mode = _copilot_runtime_api_mode(model_cfg, creds.get("api_key", ""))
elif provider == "xai":
api_mode = "codex_responses"
else:
configured_provider = str(model_cfg.get("provider") or "").strip().lower()
# Only honor persisted api_mode when it belongs to the same provider family.
+20 -39
@@ -20,7 +20,10 @@ import copy
from pathlib import Path
from typing import Optional, Dict, Any
from hermes_cli.nous_subscription import get_nous_subscription_features
from hermes_cli.nous_subscription import (
apply_nous_provider_defaults,
get_nous_subscription_features,
)
from tools.tool_backend_helpers import managed_nous_tools_enabled
from hermes_constants import get_optional_skills_dir
@@ -210,20 +213,20 @@ def prompt(question: str, default: str = None, password: bool = False) -> str:
sys.exit(1)
def _curses_prompt_choice(question: str, choices: list, default: int = 0, description: str | None = None) -> int:
def _curses_prompt_choice(question: str, choices: list, default: int = 0) -> int:
"""Single-select menu using curses. Delegates to curses_radiolist."""
from hermes_cli.curses_ui import curses_radiolist
return curses_radiolist(question, choices, selected=default, cancel_returns=-1, description=description)
return curses_radiolist(question, choices, selected=default, cancel_returns=-1)
def prompt_choice(question: str, choices: list, default: int = 0, description: str | None = None) -> int:
def prompt_choice(question: str, choices: list, default: int = 0) -> int:
"""Prompt for a choice from a list with arrow key navigation.
Escape keeps the current default (skips the question).
Ctrl+C exits the wizard.
"""
idx = _curses_prompt_choice(question, choices, default, description=description)
idx = _curses_prompt_choice(question, choices, default)
if idx >= 0:
if idx == default:
print_info(" Skipped (keeping current)")
@@ -832,7 +835,14 @@ def setup_model_provider(config: dict, *, quick: bool = False):
print_info("Skipped — add later with 'hermes setup' or configure AUXILIARY_VISION_* settings")
# Tool Gateway prompt is already shown by _model_flow_nous() above.
if selected_provider == "nous" and nous_subscription_selected:
changed_defaults = apply_nous_provider_defaults(config)
current_tts = str(config.get("tts", {}).get("provider") or "edge")
if "tts" in changed_defaults:
print_success("TTS provider set to: OpenAI TTS via your Nous subscription")
else:
print_info(f"Keeping your existing TTS provider: {current_tts}")
save_config(config)
if not quick and selected_provider != "nous":
@@ -910,7 +920,6 @@ def _setup_tts_provider(config: dict):
"edge": "Edge TTS",
"elevenlabs": "ElevenLabs",
"openai": "OpenAI TTS",
"xai": "xAI TTS",
"minimax": "MiniMax TTS",
"mistral": "Mistral Voxtral TTS",
"neutts": "NeuTTS",
@@ -932,13 +941,12 @@ def _setup_tts_provider(config: dict):
"Edge TTS (free, cloud-based, no setup needed)",
"ElevenLabs (premium quality, needs API key)",
"OpenAI TTS (good quality, needs API key)",
"xAI TTS (Grok voices, needs API key)",
"MiniMax TTS (high quality with voice cloning, needs API key)",
"Mistral Voxtral TTS (multilingual, native Opus, needs API key)",
"NeuTTS (local on-device, free, ~300MB model download)",
]
)
providers.extend(["edge", "elevenlabs", "openai", "xai", "minimax", "mistral", "neutts"])
providers.extend(["edge", "elevenlabs", "openai", "minimax", "mistral", "neutts"])
choices.append(f"Keep current ({current_label})")
keep_current_idx = len(choices) - 1
idx = prompt_choice("Select TTS provider:", choices, keep_current_idx)
@@ -1004,23 +1012,6 @@ def _setup_tts_provider(config: dict):
print_warning("No API key provided. Falling back to Edge TTS.")
selected = "edge"
elif selected == "xai":
existing = get_env_value("XAI_API_KEY")
if not existing:
print()
api_key = prompt("xAI API key for TTS", password=True)
if api_key:
save_env_value("XAI_API_KEY", api_key)
print_success("xAI TTS API key saved")
else:
from hermes_constants import display_hermes_home as _dhh
print_warning(
"No xAI API key provided for TTS. Configure XAI_API_KEY via "
f"hermes setup model or {_dhh()}/.env to use xAI TTS. "
"Falling back to Edge TTS."
)
selected = "edge"
elif selected == "minimax":
existing = get_env_value("MINIMAX_API_KEY")
if not existing:
@@ -1620,19 +1611,9 @@ def _setup_telegram():
return
print_info("Create a bot via @BotFather on Telegram")
import re
while True:
token = prompt("Telegram bot token", password=True)
if not token:
return
if not re.match(r"^\d+:[A-Za-z0-9_-]{30,}$", token):
print_error(
"Invalid token format. Expected: <numeric_id>:<alphanumeric_hash> "
"(e.g., 123456789:ABCdefGHI-jklMNOpqrSTUvwxYZ)"
)
continue
break
token = prompt("Telegram bot token", password=True)
if not token:
return
save_env_value("TELEGRAM_BOT_TOKEN", token)
print_success("Telegram token saved")
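One side of this hunk validates the token with the regular expression `^\d+:[A-Za-z0-9_-]{30,}$` before saving it. A minimal sketch of that check, with the hypothetical helper name `looks_like_bot_token` and made-up token strings, might look like this:

```python
import re

# Pattern copied from the hunk: numeric id, colon, then at least 30
# characters drawn from letters, digits, underscore, and hyphen.
TOKEN_RE = re.compile(r"^\d+:[A-Za-z0-9_-]{30,}$")


def looks_like_bot_token(token: str) -> bool:
    """Return True when the token matches the format check (hypothetical helper)."""
    return bool(TOKEN_RE.match(token))


good = "123456789:" + "a" * 35          # made-up token with a long suffix
too_short = "123456789:short"           # suffix under 30 characters
bad_prefix = "abc:" + "a" * 35          # non-numeric id
sample = "123456789:ABCdefGHI-jklMNOpqrSTUvwxYZ"  # sample from the error message

for candidate in (good, too_short, bad_prefix, sample):
    print(candidate[:20], looks_like_bot_token(candidate))
```

The sample token quoted in the error message has a 27-character suffix, so this pattern would reject it even though the message offers it as an example of the expected format.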
+1 -3
@@ -708,9 +708,7 @@ def init_skin_from_config(config: dict) -> None:
Call this once during CLI init with the loaded config dict.
"""
display = config.get("display") or {}
if not isinstance(display, dict):
display = {}
display = config.get("display", {})
skin_name = display.get("skin", "default")
if isinstance(skin_name, str) and skin_name.strip():
set_active_skin(skin_name.strip())
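The two ways of reading the `display` section in this hunk differ when the key exists but holds `None` or a non-dict value, which can happen with an empty YAML section. The sketch below contrasts them; the helper name `read_display_section` is hypothetical.

```python
def read_display_section(config: dict) -> dict:
    """Defensive read: always returns a dict (hypothetical helper)."""
    display = config.get("display") or {}
    if not isinstance(display, dict):
        display = {}
    return display


empty_section = {"display": None}   # e.g. a bare `display:` key in YAML
wrong_type = {"display": "dark"}
normal = {"display": {"skin": "mono"}}

# dict.get only uses its default when the key is missing, so a stored
# None is returned as-is and a later .get("skin") would fail.
plain = empty_section.get("display", {})
print(plain)                                            # None

print(read_display_section(empty_section))              # {}
print(read_display_section(wrong_type))                 # {}
print(read_display_section(normal).get("skin", "default"))  # mono
```

The defensive form costs two extra lines but keeps the following `display.get("skin", "default")` call from raising `AttributeError` on malformed config.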
+1 -13
@@ -212,7 +212,7 @@ def show_status(args):
if managed_nous_tools_enabled():
features = get_nous_subscription_features(config)
print()
print(color("◆ Nous Tool Gateway", Colors.CYAN, Colors.BOLD))
print(color("◆ Nous Subscription Features", Colors.CYAN, Colors.BOLD))
if not features.nous_auth_present:
print(" Nous Portal ✗ not logged in")
else:
@@ -230,18 +230,6 @@ def show_status(args):
else:
state = "not configured"
print(f" {feature.label:<15} {check_mark(feature.available or feature.active or feature.managed_by_nous)} {state}")
elif nous_logged_in:
# Logged into Nous but on the free tier — show upgrade nudge
print()
print(color("◆ Nous Tool Gateway", Colors.CYAN, Colors.BOLD))
print(" Your free-tier Nous account does not include Tool Gateway access.")
print(" Upgrade your subscription to unlock managed web, image, TTS, and browser tools.")
try:
portal_url = nous_status.get("portal_base_url", "").rstrip("/")
if portal_url:
print(f" Upgrade: {portal_url}")
except Exception:
pass
# =========================================================================
# API-Key Providers
+30 -70
@@ -63,7 +63,6 @@ CONFIGURABLE_TOOLSETS = [
("clarify", "❓ Clarifying Questions", "clarify"),
("delegation", "👥 Task Delegation", "delegate_task"),
("cronjob", "⏰ Cron Jobs", "create/list/update/pause/resume/run, with optional attached skills"),
("messaging", "📨 Cross-Platform Messaging", "send_message"),
("rl", "🧪 RL Training", "Tinker-Atropos training tools"),
("homeassistant", "🏠 Home Assistant", "smart home device control"),
]
@@ -122,7 +121,6 @@ TOOL_CATEGORIES = {
"providers": [
{
"name": "Nous Subscription",
"badge": "subscription",
"tag": "Managed OpenAI TTS billed to your subscription",
"env_vars": [],
"tts_provider": "openai",
@@ -132,32 +130,21 @@ TOOL_CATEGORIES = {
},
{
"name": "Microsoft Edge TTS",
"badge": "★ recommended · free",
"tag": "Good quality, no API key needed",
"tag": "Free - no API key needed",
"env_vars": [],
"tts_provider": "edge",
},
{
"name": "OpenAI TTS",
"badge": "paid",
"tag": "High quality voices",
"tag": "Premium - high quality voices",
"env_vars": [
{"key": "VOICE_TOOLS_OPENAI_KEY", "prompt": "OpenAI API key", "url": "https://platform.openai.com/api-keys"},
],
"tts_provider": "openai",
},
{
"name": "xAI TTS",
"tag": "Grok voices - requires xAI API key",
"env_vars": [
{"key": "XAI_API_KEY", "prompt": "xAI API key", "url": "https://console.x.ai/"},
],
"tts_provider": "xai",
},
{
"name": "ElevenLabs",
"badge": "paid",
"tag": "Most natural voices",
"tag": "Premium - most natural voices",
"env_vars": [
{"key": "ELEVENLABS_API_KEY", "prompt": "ElevenLabs API key", "url": "https://elevenlabs.io/app/settings/api-keys"},
],
@@ -165,8 +152,7 @@ TOOL_CATEGORIES = {
},
{
"name": "Mistral (Voxtral TTS)",
"badge": "paid",
"tag": "Multilingual, native Opus",
"tag": "Multilingual, native Opus, needs MISTRAL_API_KEY",
"env_vars": [
{"key": "MISTRAL_API_KEY", "prompt": "Mistral API key", "url": "https://console.mistral.ai/"},
],
@@ -182,7 +168,6 @@ TOOL_CATEGORIES = {
"providers": [
{
"name": "Nous Subscription",
"badge": "subscription",
"tag": "Managed Firecrawl billed to your subscription",
"web_backend": "firecrawl",
"env_vars": [],
@@ -192,8 +177,7 @@ TOOL_CATEGORIES = {
},
{
"name": "Firecrawl Cloud",
"badge": "★ recommended",
"tag": "Full-featured search, extract, and crawl",
"tag": "Hosted service - search, extract, and crawl",
"web_backend": "firecrawl",
"env_vars": [
{"key": "FIRECRAWL_API_KEY", "prompt": "Firecrawl API key", "url": "https://firecrawl.dev"},
@@ -201,8 +185,7 @@ TOOL_CATEGORIES = {
},
{
"name": "Exa",
"badge": "paid",
"tag": "Neural search with semantic understanding",
"tag": "AI-native search and contents",
"web_backend": "exa",
"env_vars": [
{"key": "EXA_API_KEY", "prompt": "Exa API key", "url": "https://exa.ai"},
@@ -210,8 +193,7 @@ TOOL_CATEGORIES = {
},
{
"name": "Parallel",
"badge": "paid",
"tag": "AI-powered search and extract",
"tag": "AI-native search and extract",
"web_backend": "parallel",
"env_vars": [
{"key": "PARALLEL_API_KEY", "prompt": "Parallel API key", "url": "https://parallel.ai"},
@@ -219,8 +201,7 @@ TOOL_CATEGORIES = {
},
{
"name": "Tavily",
"badge": "free tier",
"tag": "Search, extract, and crawl — 1000 free searches/mo",
"tag": "AI-native search, extract, and crawl",
"web_backend": "tavily",
"env_vars": [
{"key": "TAVILY_API_KEY", "prompt": "Tavily API key", "url": "https://app.tavily.com/home"},
@@ -228,8 +209,7 @@ TOOL_CATEGORIES = {
},
{
"name": "Firecrawl Self-Hosted",
"badge": "free · self-hosted",
"tag": "Run your own Firecrawl instance (Docker)",
"tag": "Free - run your own instance",
"web_backend": "firecrawl",
"env_vars": [
{"key": "FIRECRAWL_API_URL", "prompt": "Your Firecrawl instance URL (e.g., http://localhost:3002)"},
@@ -243,7 +223,6 @@ TOOL_CATEGORIES = {
"providers": [
{
"name": "Nous Subscription",
"badge": "subscription",
"tag": "Managed FAL image generation billed to your subscription",
"env_vars": [],
"requires_nous_auth": True,
@@ -252,7 +231,6 @@ TOOL_CATEGORIES = {
},
{
"name": "FAL.ai",
"badge": "paid",
"tag": "FLUX 2 Pro with auto-upscaling",
"env_vars": [
{"key": "FAL_KEY", "prompt": "FAL API key", "url": "https://fal.ai/dashboard/keys"},
@@ -266,7 +244,6 @@ TOOL_CATEGORIES = {
"providers": [
{
"name": "Nous Subscription (Browser Use cloud)",
"badge": "subscription",
"tag": "Managed Browser Use billed to your subscription",
"env_vars": [],
"browser_provider": "browser-use",
@@ -277,16 +254,14 @@ TOOL_CATEGORIES = {
},
{
"name": "Local Browser",
"badge": "★ recommended · free",
"tag": "Headless Chromium, no API key needed",
"tag": "Free headless Chromium (no API key needed)",
"env_vars": [],
"browser_provider": "local",
"post_setup": "agent_browser",
},
{
"name": "Browserbase",
"badge": "paid",
"tag": "Cloud browser with stealth and proxies",
"tag": "Cloud browser with stealth & proxies",
"env_vars": [
{"key": "BROWSERBASE_API_KEY", "prompt": "Browserbase API key", "url": "https://browserbase.com"},
{"key": "BROWSERBASE_PROJECT_ID", "prompt": "Browserbase project ID"},
@@ -296,7 +271,6 @@ TOOL_CATEGORIES = {
},
{
"name": "Browser Use",
"badge": "paid",
"tag": "Cloud browser with remote execution",
"env_vars": [
{"key": "BROWSER_USE_API_KEY", "prompt": "Browser Use API key", "url": "https://browser-use.com"},
@@ -306,7 +280,6 @@ TOOL_CATEGORIES = {
},
{
"name": "Firecrawl",
"badge": "paid",
"tag": "Cloud browser with remote execution",
"env_vars": [
{"key": "FIRECRAWL_API_KEY", "prompt": "Firecrawl API key", "url": "https://firecrawl.dev"},
@@ -316,8 +289,7 @@ TOOL_CATEGORIES = {
},
{
"name": "Camofox",
"badge": "free · local",
"tag": "Anti-detection browser (Firefox/Camoufox)",
"tag": "Local anti-detection browser (Firefox/Camoufox)",
"env_vars": [
{"key": "CAMOFOX_URL", "prompt": "Camofox server URL", "default": "http://localhost:9377",
"url": "https://github.com/jo-inc/camofox-browser"},
@@ -866,8 +838,7 @@ def _configure_tool_category(ts_key: str, cat: dict, config: dict):
# Plain text labels only (no ANSI codes in menu items)
provider_choices = []
for p in providers:
badge = f" [{p['badge']}]" if p.get("badge") else ""
tag = f"{p['tag']}" if p.get("tag") else ""
tag = f" ({p['tag']})" if p.get("tag") else ""
configured = ""
env_vars = p.get("env_vars", [])
if not env_vars or all(get_env_value(v["key"]) for v in env_vars):
@@ -877,7 +848,7 @@ def _configure_tool_category(ts_key: str, cat: dict, config: dict):
configured = ""
else:
configured = " [configured]"
provider_choices.append(f"{p['name']}{badge}{tag}{configured}")
provider_choices.append(f"{p['name']}{tag}{configured}")
# Add skip option
provider_choices.append("Skip — keep defaults / configure later")
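The label-building loop in this hunk can be sketched in isolation. This is a minimal illustration, not the project's code: `build_provider_label` and the `is_set` callback are hypothetical names standing in for the inline loop and `get_env_value`, and the "configured" rule is simplified because part of the original condition falls outside the hunk.

```python
from typing import Any, Callable, Dict, List


def build_provider_label(provider: Dict[str, Any], is_set: Callable[[str], bool]) -> str:
    """Return a plain-text menu label: name, optional (tag), optional [configured]."""
    # Wrap the tag in parentheses only when one is present.
    tag = f" ({provider['tag']})" if provider.get("tag") else ""
    env_vars: List[Dict[str, Any]] = provider.get("env_vars", [])
    # Simplifying assumption: mark a provider as configured only when it
    # declares env vars and every one of them has a value.
    configured = (
        " [configured]"
        if env_vars and all(is_set(v["key"]) for v in env_vars)
        else ""
    )
    return f"{provider['name']}{tag}{configured}"
```

Keeping the label free of ANSI codes, as the comment in the hunk requires, means the same string can be passed to any menu widget without escaping.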
@@ -954,49 +925,34 @@ def _configure_provider(provider: dict, config: dict):
# Set TTS provider in config if applicable
if provider.get("tts_provider"):
tts_cfg = config.setdefault("tts", {})
tts_cfg["provider"] = provider["tts_provider"]
tts_cfg["use_gateway"] = bool(managed_feature)
config.setdefault("tts", {})["provider"] = provider["tts_provider"]
# Set browser cloud provider in config if applicable
if "browser_provider" in provider:
bp = provider["browser_provider"]
browser_cfg = config.setdefault("browser", {})
if bp == "local":
browser_cfg["cloud_provider"] = "local"
config.setdefault("browser", {})["cloud_provider"] = "local"
_print_success(" Browser set to local mode")
elif bp:
browser_cfg["cloud_provider"] = bp
config.setdefault("browser", {})["cloud_provider"] = bp
_print_success(f" Browser cloud provider set to: {bp}")
browser_cfg["use_gateway"] = bool(managed_feature)
# Set web search backend in config if applicable
if provider.get("web_backend"):
web_cfg = config.setdefault("web", {})
web_cfg["backend"] = provider["web_backend"]
web_cfg["use_gateway"] = bool(managed_feature)
config.setdefault("web", {})["backend"] = provider["web_backend"]
_print_success(f" Web backend set to: {provider['web_backend']}")
# For tools without a specific config key (e.g. image_gen), still
# track use_gateway so the runtime knows the user's intent.
if managed_feature and managed_feature not in ("web", "tts", "browser"):
config.setdefault(managed_feature, {})["use_gateway"] = True
elif not managed_feature:
# User picked a non-gateway provider — find which category this
# belongs to and clear use_gateway if it was previously set.
for cat_key, cat in TOOL_CATEGORIES.items():
if provider in cat.get("providers", []):
section = config.get(cat_key)
if isinstance(section, dict) and section.get("use_gateway"):
section["use_gateway"] = False
break
if not env_vars:
if provider.get("post_setup"):
_run_post_setup(provider["post_setup"])
_print_success(f" {provider['name']} - no configuration needed!")
if managed_feature:
_print_info(" Requests for this tool will be billed to your Nous subscription.")
override_envs = provider.get("override_env_vars", [])
if any(get_env_value(env_var) for env_var in override_envs):
_print_warning(
" Direct credentials are still configured and may take precedence until you remove them from ~/.hermes/.env."
)
return
# Prompt for each required env var
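The `use_gateway` bookkeeping on one side of this hunk can be summarised as a small pure function. This is a sketch under assumptions: `apply_provider_selection` is a hypothetical name, it covers only the `tts`, `browser`, and `web` sections, and it omits the printing and the category lookup performed by the real `_configure_provider`.

```python
from typing import Any, Dict, Optional


def apply_provider_selection(
    provider: Dict[str, Any],
    config: Dict[str, Any],
    managed_feature: Optional[str] = None,
) -> Dict[str, Any]:
    """Record the chosen provider and whether it routes through the gateway."""
    use_gateway = bool(managed_feature)

    if provider.get("tts_provider"):
        tts_cfg = config.setdefault("tts", {})
        tts_cfg["provider"] = provider["tts_provider"]
        tts_cfg["use_gateway"] = use_gateway

    if "browser_provider" in provider:
        browser_cfg = config.setdefault("browser", {})
        bp = provider["browser_provider"]
        if bp:  # "local" and cloud provider names are stored the same way
            browser_cfg["cloud_provider"] = bp
        browser_cfg["use_gateway"] = use_gateway

    if provider.get("web_backend"):
        web_cfg = config.setdefault("web", {})
        web_cfg["backend"] = provider["web_backend"]
        web_cfg["use_gateway"] = use_gateway

    return config
```

Binding each section to a local name via `setdefault` avoids repeating the lookup and makes it easy to set several keys on the same section.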
@@ -1148,8 +1104,7 @@ def _configure_tool_category_for_reconfig(ts_key: str, cat: dict, config: dict):
provider_choices = []
for p in providers:
badge = f" [{p['badge']}]" if p.get("badge") else ""
tag = f"{p['tag']}" if p.get("tag") else ""
tag = f" ({p['tag']})" if p.get("tag") else ""
configured = ""
env_vars = p.get("env_vars", [])
if not env_vars or all(get_env_value(v["key"]) for v in env_vars):
@@ -1159,7 +1114,7 @@ def _configure_tool_category_for_reconfig(ts_key: str, cat: dict, config: dict):
configured = ""
else:
configured = " [configured]"
provider_choices.append(f"{p['name']}{badge}{tag}{configured}")
provider_choices.append(f"{p['name']}{tag}{configured}")
default_idx = _detect_active_provider_index(providers, config)
@@ -1202,6 +1157,11 @@ def _reconfigure_provider(provider: dict, config: dict):
_print_success(f" {provider['name']} - no configuration needed!")
if managed_feature:
_print_info(" Requests for this tool will be billed to your Nous subscription.")
override_envs = provider.get("override_env_vars", [])
if any(get_env_value(env_var) for env_var in override_envs):
_print_warning(
" Direct credentials are still configured and may take precedence until you remove them from ~/.hermes/.env."
)
return
for var in env_vars:
+1 -242
View File
@@ -11,7 +11,6 @@ Usage:
import asyncio
import hmac
import importlib.util
import json
import logging
import os
@@ -97,9 +96,6 @@ _PUBLIC_API_PATHS: frozenset = frozenset({
"/api/config/defaults",
"/api/config/schema",
"/api/model/info",
"/api/dashboard/themes",
"/api/dashboard/plugins",
"/api/dashboard/plugins/rescan",
})
@@ -118,7 +114,7 @@ def _require_token(request: Request) -> None:
async def auth_middleware(request: Request, call_next):
"""Require the session token on all /api/ routes except the public list."""
path = request.url.path
if path.startswith("/api/") and path not in _PUBLIC_API_PATHS and not path.startswith("/api/plugins/"):
if path.startswith("/api/") and path not in _PUBLIC_API_PATHS:
auth = request.headers.get("authorization", "")
expected = f"Bearer {_SESSION_TOKEN}"
if not hmac.compare_digest(auth.encode(), expected.encode()):
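The middleware check above combines a public-path allowlist with a constant-time token comparison. The following is a framework-free sketch of that decision, assuming a hypothetical helper `is_authorized` and a shortened allowlist; the real middleware works on a FastAPI `Request` and returns a response rather than a boolean.

```python
import hmac

# Shortened, illustrative allowlist (the real set is _PUBLIC_API_PATHS).
PUBLIC_API_PATHS = frozenset({
    "/api/config/defaults",
    "/api/config/schema",
    "/api/model/info",
})


def is_authorized(path: str, authorization_header: str, session_token: str) -> bool:
    """Return True when the request may proceed without a 401."""
    # Non-API routes and explicitly public API routes skip the token check.
    if not path.startswith("/api/") or path in PUBLIC_API_PATHS:
        return True
    expected = f"Bearer {session_token}"
    # compare_digest runs in time independent of where the inputs differ,
    # which avoids leaking token prefixes through response timing.
    return hmac.compare_digest(authorization_header.encode(), expected.encode())
```

Encoding both strings to bytes before comparing matters because `hmac.compare_digest` rejects `str` arguments that contain non-ASCII characters.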
@@ -170,11 +166,6 @@ _SCHEMA_OVERRIDES: Dict[str, Dict[str, Any]] = {
"description": "CLI visual theme",
"options": ["default", "ares", "mono", "slate"],
},
"dashboard.theme": {
"type": "select",
"description": "Web dashboard visual theme",
"options": ["default", "midnight", "ember", "mono", "cyberpunk", "rose"],
},
"display.resume_display": {
"type": "select",
"description": "How resumed sessions display history",
@@ -233,7 +224,6 @@ _CATEGORY_MERGE: Dict[str, str] = {
"approvals": "security",
"human_delay": "display",
"smart_model_routing": "agent",
"dashboard": "display",
}
# Display order for tabs — unlisted categories sort alphabetically after these.
@@ -2078,237 +2068,6 @@ def mount_spa(application: FastAPI):
return _serve_index()
# ---------------------------------------------------------------------------
# Dashboard theme endpoints
# ---------------------------------------------------------------------------
# Built-in dashboard themes — label + description only. The actual color
# definitions live in the frontend (web/src/themes/presets.ts).
_BUILTIN_DASHBOARD_THEMES = [
{"name": "default", "label": "Hermes Teal", "description": "Classic dark teal — the canonical Hermes look"},
{"name": "midnight", "label": "Midnight", "description": "Deep blue-violet with cool accents"},
{"name": "ember", "label": "Ember", "description": "Warm crimson and bronze — forge vibes"},
{"name": "mono", "label": "Mono", "description": "Clean grayscale — minimal and focused"},
{"name": "cyberpunk", "label": "Cyberpunk", "description": "Neon green on black — matrix terminal"},
{"name": "rose", "label": "Rosé", "description": "Soft pink and warm ivory — easy on the eyes"},
]
def _discover_user_themes() -> list:
"""Scan ~/.hermes/dashboard-themes/*.yaml for user-created themes."""
themes_dir = get_hermes_home() / "dashboard-themes"
if not themes_dir.is_dir():
return []
result = []
for f in sorted(themes_dir.glob("*.yaml")):
try:
data = yaml.safe_load(f.read_text(encoding="utf-8"))
if isinstance(data, dict) and data.get("name"):
result.append({
"name": data["name"],
"label": data.get("label", data["name"]),
"description": data.get("description", ""),
})
except Exception:
continue
return result
@app.get("/api/dashboard/themes")
async def get_dashboard_themes():
"""Return available themes and the currently active one."""
config = load_config()
active = config.get("dashboard", {}).get("theme", "default")
user_themes = _discover_user_themes()
# Merge built-in + user, user themes override built-in by name.
seen = set()
themes = []
for t in _BUILTIN_DASHBOARD_THEMES:
seen.add(t["name"])
themes.append(t)
for t in user_themes:
if t["name"] not in seen:
themes.append(t)
seen.add(t["name"])
return {"themes": themes, "active": active}
class ThemeSetBody(BaseModel):
name: str
@app.put("/api/dashboard/theme")
async def set_dashboard_theme(body: ThemeSetBody):
"""Set the active dashboard theme (persists to config.yaml)."""
config = load_config()
if "dashboard" not in config:
config["dashboard"] = {}
config["dashboard"]["theme"] = body.name
save_config(config)
return {"ok": True, "theme": body.name}
# ---------------------------------------------------------------------------
# Dashboard plugin system
# ---------------------------------------------------------------------------
def _discover_dashboard_plugins() -> list:
"""Scan plugins/*/dashboard/manifest.json for dashboard extensions.
Checks three plugin sources (same as hermes_cli.plugins):
1. User plugins: ~/.hermes/plugins/<name>/dashboard/manifest.json
2. Bundled plugins: <repo>/plugins/<name>/dashboard/manifest.json (memory/, etc.)
3. Project plugins: ./.hermes/plugins/ (only if HERMES_ENABLE_PROJECT_PLUGINS)
"""
plugins = []
seen_names: set = set()
search_dirs = [
(get_hermes_home() / "plugins", "user"),
(PROJECT_ROOT / "plugins" / "memory", "bundled"),
(PROJECT_ROOT / "plugins", "bundled"),
]
if os.environ.get("HERMES_ENABLE_PROJECT_PLUGINS"):
search_dirs.append((Path.cwd() / ".hermes" / "plugins", "project"))
for plugins_root, source in search_dirs:
if not plugins_root.is_dir():
continue
for child in sorted(plugins_root.iterdir()):
if not child.is_dir():
continue
manifest_file = child / "dashboard" / "manifest.json"
if not manifest_file.exists():
continue
try:
data = json.loads(manifest_file.read_text(encoding="utf-8"))
name = data.get("name", child.name)
if name in seen_names:
continue
seen_names.add(name)
plugins.append({
"name": name,
"label": data.get("label", name),
"description": data.get("description", ""),
"icon": data.get("icon", "Puzzle"),
"version": data.get("version", "0.0.0"),
"tab": data.get("tab", {"path": f"/{name}", "position": "end"}),
"entry": data.get("entry", "dist/index.js"),
"css": data.get("css"),
"has_api": bool(data.get("api")),
"source": source,
"_dir": str(child / "dashboard"),
"_api_file": data.get("api"),
})
except Exception as exc:
_log.warning("Bad dashboard plugin manifest %s: %s", manifest_file, exc)
continue
return plugins
# Cache discovered plugins per-process (refresh on explicit re-scan).
_dashboard_plugins_cache: Optional[list] = None
def _get_dashboard_plugins(force_rescan: bool = False) -> list:
global _dashboard_plugins_cache
if _dashboard_plugins_cache is None or force_rescan:
_dashboard_plugins_cache = _discover_dashboard_plugins()
return _dashboard_plugins_cache
@app.get("/api/dashboard/plugins")
async def get_dashboard_plugins():
"""Return discovered dashboard plugins."""
plugins = _get_dashboard_plugins()
# Strip internal fields before sending to frontend.
return [
{k: v for k, v in p.items() if not k.startswith("_")}
for p in plugins
]
@app.get("/api/dashboard/plugins/rescan")
async def rescan_dashboard_plugins():
"""Force re-scan of dashboard plugins."""
plugins = _get_dashboard_plugins(force_rescan=True)
return {"ok": True, "count": len(plugins)}
@app.get("/dashboard-plugins/{plugin_name}/{file_path:path}")
async def serve_plugin_asset(plugin_name: str, file_path: str):
"""Serve static assets from a dashboard plugin directory.
Only serves files from the plugin's ``dashboard/`` subdirectory.
Path traversal is blocked by checking ``resolve().is_relative_to()``.
"""
plugins = _get_dashboard_plugins()
plugin = next((p for p in plugins if p["name"] == plugin_name), None)
if not plugin:
raise HTTPException(status_code=404, detail="Plugin not found")
base = Path(plugin["_dir"])
target = (base / file_path).resolve()
if not target.is_relative_to(base.resolve()):
raise HTTPException(status_code=403, detail="Path traversal blocked")
if not target.exists() or not target.is_file():
raise HTTPException(status_code=404, detail="File not found")
# Guess content type
suffix = target.suffix.lower()
content_types = {
".js": "application/javascript",
".mjs": "application/javascript",
".css": "text/css",
".json": "application/json",
".html": "text/html",
".svg": "image/svg+xml",
".png": "image/png",
".jpg": "image/jpeg",
".woff2": "font/woff2",
".woff": "font/woff",
}
media_type = content_types.get(suffix, "application/octet-stream")
return FileResponse(target, media_type=media_type)
def _mount_plugin_api_routes():
"""Import and mount backend API routes from plugins that declare them.
Each plugin's ``api`` field points to a Python file that must expose
a ``router`` (FastAPI APIRouter). Routes are mounted under
``/api/plugins/<name>/``.
"""
for plugin in _get_dashboard_plugins():
api_file_name = plugin.get("_api_file")
if not api_file_name:
continue
api_path = Path(plugin["_dir"]) / api_file_name
if not api_path.exists():
_log.warning("Plugin %s declares api=%s but file not found", plugin["name"], api_file_name)
continue
try:
spec = importlib.util.spec_from_file_location(
f"hermes_dashboard_plugin_{plugin['name']}", api_path,
)
if spec is None or spec.loader is None:
continue
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)
router = getattr(mod, "router", None)
if router is None:
_log.warning("Plugin %s api file has no 'router' attribute", plugin["name"])
continue
app.include_router(router, prefix=f"/api/plugins/{plugin['name']}")
_log.info("Mounted plugin API routes: /api/plugins/%s/", plugin["name"])
except Exception as exc:
_log.warning("Failed to load plugin %s API routes: %s", plugin["name"], exc)
# Mount plugin API routes before the SPA catch-all.
_mount_plugin_api_routes()
mount_spa(app)
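The `serve_plugin_asset` handler in this hunk blocks path traversal by resolving the requested path and checking that it stays under the plugin directory. A minimal standalone sketch of that guard follows; `resolve_inside` is a hypothetical helper name, and it returns `None` where the handler would raise an `HTTPException`. `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path
from typing import Optional


def resolve_inside(base: Path, requested: str) -> Optional[Path]:
    """Return the resolved file path if it lies inside base, else None."""
    base_resolved = base.resolve()
    # resolve() collapses ".." segments and follows symlinks, so the
    # containment check below sees the real filesystem location.
    target = (base_resolved / requested).resolve()
    if not target.is_relative_to(base_resolved):
        return None  # traversal attempt, e.g. "../secret.txt"
    return target if target.is_file() else None
```

Resolving both sides before comparing is the important design choice: comparing unresolved paths would let a symlink or a `..` segment escape the base directory.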
-1
@@ -358,7 +358,6 @@ def _add_rotating_handler(
path.parent.mkdir(parents=True, exist_ok=True)
handler = _ManagedRotatingFileHandler(
str(path), maxBytes=max_bytes, backupCount=backup_count,
encoding="utf-8",
)
handler.setLevel(level)
handler.setFormatter(formatter)
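The only change in this hunk is the `encoding="utf-8"` argument. Without it, the handler opens the log file with the platform's default encoding, which can fail or garble non-ASCII messages on some systems. The sketch below uses the standard-library `RotatingFileHandler` in place of the project's `_ManagedRotatingFileHandler`; `make_rotating_handler` and its defaults are illustrative assumptions.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def make_rotating_handler(
    path: Path, max_bytes: int = 1_000_000, backup_count: int = 3
) -> RotatingFileHandler:
    """Create a size-rotated file handler that always writes UTF-8."""
    path.parent.mkdir(parents=True, exist_ok=True)
    handler = RotatingFileHandler(
        str(path),
        maxBytes=max_bytes,
        backupCount=backup_count,
        encoding="utf-8",  # explicit, so output does not depend on the locale
    )
    handler.setLevel(logging.INFO)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    return handler
```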
Binary file not shown. Size: 28 KiB
Binary file not shown. Size: 870 B
Binary file not shown. Size: 2.5 KiB
Binary file not shown. Size: 7.9 KiB
Binary file not shown. Size: 12 KiB
Binary file not shown. Size: 29 KiB
Binary file not shown. Size: 134 KiB

+665
@@ -0,0 +1,665 @@
<!doctype html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Hermes Agent — An Agent That Grows With You</title>
<meta
name="description"
content="An open-source agent that grows with you — learns your projects, builds its own skills, and reaches you wherever you are. By Nous Research."
/>
<meta name="theme-color" content="#0A0E1A" />
<meta property="og:title" content="Hermes Agent — AI Agent Framework" />
<meta
property="og:description"
content="An open-source agent that grows with you. Install it, give it your messaging accounts, and it becomes a persistent personal agent — learning your projects, building its own skills, and reaching you wherever you are."
/>
<meta property="og:type" content="website" />
<meta property="og:url" content="https://hermes-agent.nousresearch.com" />
<meta
property="og:image"
content="https://hermes-agent.nousresearch.com/hermes-agent-banner.png"
/>
<link rel="preconnect" href="https://fonts.googleapis.com" />
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin />
<link
href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&family=JetBrains+Mono:wght@400;500&display=swap"
rel="stylesheet"
/>
<script
src="https://cdnjs.cloudflare.com/ajax/libs/three.js/r128/three.min.js"
defer
></script>
<link rel="stylesheet" href="style.css" />
<link rel="icon" type="image/x-icon" href="favicon.ico" />
<link rel="icon" type="image/png" sizes="32x32" href="favicon-32x32.png" />
<link rel="icon" type="image/png" sizes="16x16" href="favicon-16x16.png" />
<link rel="apple-touch-icon" sizes="180x180" href="apple-touch-icon.png" />
</head>
<body>
<canvas id="noise-overlay"></canvas>
<div class="ambient-glow glow-1"></div>
<div class="ambient-glow glow-2"></div>
<nav class="nav">
<div class="nav-inner">
<a href="#" class="nav-logo">
<img src="nous-logo.png" alt="Nous Research" class="nav-nous-logo" />
<span class="nav-brand"
>Hermes Agent <span class="nav-by">by Nous Research</span></span
>
</a>
<div class="nav-links">
<a href="#install">Install</a>
<a href="#features">Features</a>
<a href="/docs/">Docs</a>
<a
href="https://github.com/NousResearch/hermes-agent"
target="_blank"
rel="noopener"
>GitHub</a
>
<a
href="https://discord.gg/NousResearch"
target="_blank"
rel="noopener"
>Discord</a
>
</div>
<button
class="nav-hamburger"
id="nav-hamburger"
onclick="toggleMobileNav()"
aria-label="Toggle menu"
>
<span class="hamburger-bar"></span>
<span class="hamburger-bar"></span>
<span class="hamburger-bar"></span>
</button>
<div class="nav-mobile" id="nav-mobile">
<a href="#install" onclick="toggleMobileNav()">Install</a>
<a href="#features" onclick="toggleMobileNav()">Features</a>
<a href="/docs/">Docs</a>
<a
href="https://github.com/NousResearch/hermes-agent"
target="_blank"
rel="noopener"
>GitHub</a
>
<a
href="https://discord.gg/NousResearch"
target="_blank"
rel="noopener"
>Discord</a
>
</div>
</div>
</nav>
<section class="hero">
<div class="hero-content">
<div class="hero-badge">
<span class="badge-dot"></span>
Open Source &bull; MIT License
</div>
<!-- prettier-ignore -->
<pre class="hero-ascii" aria-hidden="true" style="font-family: monospace; line-height: 1.1">
██╗ ██╗███████╗██████╗ ███╗ ███╗███████╗███████╗ █████╗ ██████╗ ███████╗███╗ ██╗████████╗
██║ ██║██╔════╝██╔══██╗████╗ ████║██╔════╝██╔════╝ ██╔══██╗██╔════╝ ██╔════╝████╗ ██║╚══██╔══╝
███████║█████╗ ██████╔╝██╔████╔██║█████╗ ███████╗ ███████║██║ ███╗█████╗ ██╔██╗ ██║ ██║
██╔══██║██╔══╝ ██╔══██╗██║╚██╔╝██║██╔══╝ ╚════██║ ██╔══██║██║ ██║██╔══╝ ██║╚██╗██║ ██║
██║ ██║███████╗██║ ██║██║ ╚═╝ ██║███████╗███████║ ██║ ██║╚██████╔╝███████╗██║ ╚████║ ██║
╚═╝ ╚═╝╚══════╝╚═╝ ╚═╝╚═╝ ╚═╝╚══════╝╚══════╝ ╚═╝ ╚═╝ ╚═════╝ ╚══════╝╚═╝ ╚═══╝ ╚═╝
</pre>
<h1 class="hero-title">
An agent that<br />
<span class="hero-gradient">grows with you.</span>
</h1>
<p class="hero-subtitle">
It's not a coding copilot tethered to an IDE or a chatbot wrapper
around a single API. It's an <strong>autonomous agent</strong> that
lives on your server, remembers what it learns, and gets more capable
the longer it runs.
</p>
<div class="hero-install">
<div class="install-widget">
<div class="install-widget-header">
<div class="install-dots">
<span class="dot dot-red"></span>
<span class="dot dot-yellow"></span>
<span class="dot dot-green"></span>
</div>
<div class="install-tabs">
<button
class="install-tab active"
data-platform="linux"
onclick="switchPlatform('linux')"
>
Linux / macOS / WSL
</button>
</div>
</div>
<div class="install-widget-body">
<span class="install-prompt" id="install-prompt">$</span>
<code id="install-command"
>curl -fsSL
https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh
| bash</code
>
<button
class="copy-btn"
onclick="copyInstall()"
title="Copy to clipboard"
>
<svg
width="16"
height="16"
viewBox="0 0 24 24"
fill="none"
stroke="currentColor"
stroke-width="2"
stroke-linecap="round"
stroke-linejoin="round"
>
<rect x="9" y="9" width="13" height="13" rx="2" ry="2" />
<path
d="M5 15H4a2 2 0 0 1-2-2V4a2 2 0 0 1 2-2h9a2 2 0 0 1 2 2v1"
/>
</svg>
<span class="copy-text">Copy</span>
</button>
</div>
</div>
<p class="install-note" id="install-note">
Works on Linux, macOS & WSL2 · No prerequisites · Installs
everything automatically
</p>
</div>
<div class="hero-links">
<a
href="https://portal.nousresearch.com"
class="btn btn-primary"
target="_blank"
rel="noopener"
>
<svg
width="20"
height="20"
viewBox="0 0 24 24"
fill="none"
stroke="currentColor"
stroke-width="2"
stroke-linecap="round"
stroke-linejoin="round"
>
<path d="M15 3h4a2 2 0 0 1 2 2v14a2 2 0 0 1-2 2h-4" />
<polyline points="10 17 15 12 10 7" />
<line x1="15" y1="12" x2="3" y2="12" />
</svg>
Sign Up on Nous Portal
</a>
</div>
</div>
</section>
<section class="section section-install" id="install">
<div class="container">
<div class="section-header">
<h2>Get started in 60 seconds</h2>
</div>
<div class="install-steps">
<div class="install-step">
<div class="step-number">1</div>
<div class="step-content">
<h4>Install</h4>
<div class="code-block">
<div class="code-header">
<div class="code-tabs">
<button
class="code-tab active"
data-platform="linux"
onclick="switchStepPlatform('linux')"
>
Linux / macOS / WSL
</button>
</div>
<button
class="copy-btn"
id="step1-copy"
onclick="copyText(this)"
data-text="curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash"
>
Copy
</button>
</div>
<pre><code id="step1-command">curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash</code></pre>
</div>
<p class="step-note" id="step1-note">
Installs uv, Python 3.11, clones the repo, sets up everything.
No sudo needed.
</p>
</div>
</div>
<div class="install-step">
<div class="step-number">2</div>
<div class="step-content">
<h4>Configure</h4>
<div class="code-block">
<div class="code-header">
<span>bash</span>
<button
class="copy-btn"
onclick="copyText(this)"
data-text="hermes setup"
>
Copy
</button>
</div>
<pre><code><span class="code-comment"># Interactive setup wizard</span>
hermes setup
<span class="code-comment"># Or choose your model</span>
hermes model</code></pre>
</div>
<p class="step-note">
Connect to Nous Portal (OAuth), OpenRouter (API key), or your
own endpoint.
</p>
</div>
</div>
<div class="install-step">
<div class="step-number">3</div>
<div class="step-content">
<h4>Start chatting</h4>
<div class="code-block">
<div class="code-header">
<span>bash</span>
<button
class="copy-btn"
onclick="copyText(this)"
data-text="hermes"
>
Copy
</button>
</div>
<pre><code>hermes</code></pre>
</div>
<p class="step-note">
That's it. Full interactive CLI with tools, memory, and skills.
</p>
</div>
</div>
<div class="install-step">
<div class="step-number">4</div>
<div class="step-content">
<h4>
Go multi-platform <span class="step-optional">(optional)</span>
</h4>
<div class="code-block">
<div class="code-header">
<span>bash</span>
<button
class="copy-btn"
onclick="copyText(this)"
data-text="hermes gateway setup"
>
Copy
</button>
</div>
<pre><code><span class="code-comment"># Interactive gateway setup wizard</span>
hermes gateway setup
<span class="code-comment"># Start the messaging gateway</span>
hermes gateway
<span class="code-comment"># Install as a system service</span>
hermes gateway install</code></pre>
</div>
<p class="step-note">
Walk through connecting Telegram, Discord, Slack, or WhatsApp.
Runs as a systemd service.
</p>
</div>
</div>
<div class="install-step">
<div class="step-number">5</div>
<div class="step-content">
<h4>Keep it up to date</h4>
<div class="code-block">
<div class="code-header">
<span>bash</span>
<button
class="copy-btn"
onclick="copyText(this)"
data-text="hermes update"
>
Copy
</button>
</div>
<pre><code>hermes update</code></pre>
</div>
<p class="step-note">
Pulls the latest changes and reinstalls dependencies. Run
anytime to get new features and fixes.
</p>
</div>
</div>
</div>
<div class="install-windows">
<p>
Native Windows support is extremely experimental and unsupported.
Please install
<a
href="https://learn.microsoft.com/en-us/windows/wsl/install"
target="_blank"
rel="noopener"
>WSL2</a
>
and run Hermes Agent from there.
</p>
</div>
</div>
</section>
<!-- Terminal Demo -->
<section class="section section-demo" id="demo">
<div class="container">
<div class="section-header">
<h2>See it in action</h2>
</div>
<div class="terminal-window">
<div class="terminal-header">
<div class="terminal-dots">
<span class="dot dot-red"></span>
<span class="dot dot-yellow"></span>
<span class="dot dot-green"></span>
</div>
<span class="terminal-title">hermes</span>
</div>
<div class="terminal-body" id="terminal-demo"></div>
</div>
</div>
</section>
<!-- Features + Specs -->
<section class="section" id="features">
<div class="container">
<div class="section-header">
<h2>Features</h2>
</div>
<div class="features-grid">
<div class="feature-card">
<div class="feature-header">
<div class="feature-icon">
<svg
width="20"
height="20"
viewBox="0 0 24 24"
fill="none"
stroke="currentColor"
stroke-width="1.5"
stroke-linecap="round"
stroke-linejoin="round"
>
<path
d="M21 15a2 2 0 0 1-2 2H7l-4 4V5a2 2 0 0 1 2-2h14a2 2 0 0 1 2 2z"
/>
</svg>
</div>
<h3>Lives Where You Do</h3>
</div>
<p>
Telegram, Discord, Slack, WhatsApp, and CLI from a single gateway
— start on one, pick up on another.
</p>
</div>
<div class="feature-card">
<div class="feature-header">
<div class="feature-icon">
<svg
width="20"
height="20"
viewBox="0 0 24 24"
fill="none"
stroke="currentColor"
stroke-width="1.5"
stroke-linecap="round"
stroke-linejoin="round"
>
<polyline points="22 7 13.5 15.5 8.5 10.5 2 17" />
<polyline points="16 7 22 7 22 13" />
</svg>
</div>
<h3>Grows the Longer It Runs</h3>
</div>
<p>
Persistent memory and auto-generated skills — it learns your
projects and never forgets how it solved a problem.
</p>
</div>
<div class="feature-card">
<div class="feature-header">
<div class="feature-icon">
<svg
width="20"
height="20"
viewBox="0 0 24 24"
fill="none"
stroke="currentColor"
stroke-width="1.5"
stroke-linecap="round"
stroke-linejoin="round"
>
<circle cx="12" cy="12" r="10" />
<polyline points="12 6 12 12 16 14" />
</svg>
</div>
<h3>Scheduled Automations</h3>
</div>
<p>
Natural language cron scheduling for reports, backups, and
briefings — running unattended through the gateway.
</p>
</div>
<div class="feature-card">
<div class="feature-header">
<div class="feature-icon">
<svg
width="20"
height="20"
viewBox="0 0 24 24"
fill="none"
stroke="currentColor"
stroke-width="1.5"
stroke-linecap="round"
stroke-linejoin="round"
>
<circle cx="18" cy="18" r="3" />
<circle cx="6" cy="6" r="3" />
<path d="M6 21V9a9 9 0 0 0 9 9" />
<path d="M18 3v12a9 9 0 0 1-9-9" />
</svg>
</div>
<h3>Delegates & Parallelizes</h3>
</div>
<p>
Isolated subagents with their own conversations, terminals, and
Python RPC scripts for zero-context-cost pipelines.
</p>
</div>
<div class="feature-card">
<div class="feature-header">
<div class="feature-icon">
<svg
width="20"
height="20"
viewBox="0 0 24 24"
fill="none"
stroke="currentColor"
stroke-width="1.5"
stroke-linecap="round"
stroke-linejoin="round"
>
<rect x="3" y="11" width="18" height="11" rx="2" ry="2" />
<path d="M7 11V7a5 5 0 0 1 10 0v4" />
</svg>
</div>
<h3>Real Sandboxing</h3>
</div>
<p>
Five backends — local, Docker, SSH, Singularity, Modal — with
container hardening and namespace isolation.
</p>
</div>
<div class="feature-card">
<div class="feature-header">
<div class="feature-icon">
<svg
width="20"
height="20"
viewBox="0 0 24 24"
fill="none"
stroke="currentColor"
stroke-width="1.5"
stroke-linecap="round"
stroke-linejoin="round"
>
<circle cx="12" cy="12" r="10" />
<line x1="2" y1="12" x2="22" y2="12" />
<path
d="M12 2a15.3 15.3 0 0 1 4 10 15.3 15.3 0 0 1-4 10 15.3 15.3 0 0 1-4-10 15.3 15.3 0 0 1 4-10z"
/>
</svg>
</div>
<h3>Full Web & Browser Control</h3>
</div>
<p>
Web search, browser automation, vision, image generation,
text-to-speech, and multi-model reasoning.
</p>
</div>
</div>
<div class="features-more">
<button class="more-toggle" onclick="toggleSpecs()" id="specs-toggle">
<span class="toggle-label">More details</span>
<svg
class="more-chevron"
width="16"
height="16"
viewBox="0 0 24 24"
fill="none"
stroke="currentColor"
stroke-width="2"
stroke-linecap="round"
stroke-linejoin="round"
>
<polyline points="6 9 12 15 18 9" />
</svg>
</button>
</div>
<div class="specs-wrapper" id="specs-wrapper">
<div class="specs-list">
<div class="spec-row">
<h3 class="spec-label">Tools</h3>
<p class="spec-value">
40+ built-in — web search, terminal, file system, browser
automation, vision, image generation, text-to-speech, code
execution, subagent delegation, memory, task planning, cron
scheduling, multi-model reasoning, and more.
</p>
</div>
<div class="spec-row">
<h3 class="spec-label">Platforms</h3>
<p class="spec-value">
Telegram, Discord, Slack, WhatsApp, Signal, Email, and CLI — all
from a single gateway. Connect to
<a
href="https://portal.nousresearch.com"
target="_blank"
rel="noopener"
>Nous Portal</a
>, OpenRouter, or any OpenAI-compatible API.
</p>
</div>
<div class="spec-row">
<h3 class="spec-label">Environments</h3>
<p class="spec-value">
Run locally, in Docker, over SSH, on Modal, Daytona, or
Singularity. Container hardening with read-only root, dropped
capabilities, and namespace isolation.
</p>
</div>
<div class="spec-row">
<h3 class="spec-label">Skills</h3>
<p class="spec-value">
40+ bundled skills covering MLOps, GitHub workflows, research,
and more. The agent creates new skills on the fly and shares
them via the open
<a href="https://agentskills.io" target="_blank" rel="noopener"
>agentskills.io</a
>
format. Install community skills from
<a href="https://clawhub.ai" target="_blank" rel="noopener"
>ClawHub</a
>,
<a href="https://lobehub.com" target="_blank" rel="noopener"
>LobeHub</a
>, and GitHub.
</p>
</div>
<div class="spec-row">
<h3 class="spec-label">Research</h3>
<p class="spec-value">
Batch trajectory generation with parallel workers and
checkpointing. Atropos integration for RL training. Export to
ShareGPT for fine-tuning with trajectory compression.
</p>
</div>
</div>
</div>
</div>
</section>
<footer class="footer">
<div class="container">
<p class="footer-copy">
Built by
<a href="https://nousresearch.com" target="_blank" rel="noopener"
>Nous Research</a
>
&middot; MIT License &middot; 2026
</p>
</div>
</footer>
<script src="script.js"></script>
</body>
</html>
Binary file not shown. Size: 20 KiB

+521
@@ -0,0 +1,521 @@
// =========================================================================
// Hermes Agent Landing Page — Interactions
// =========================================================================
// --- Platform install commands ---
const PLATFORMS = {
linux: {
command:
"curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash",
prompt: "$",
note: "Works on Linux, macOS & WSL2 · No prerequisites · Installs everything automatically",
stepNote:
"Installs uv, Python 3.11, clones the repo, sets up everything. No sudo needed.",
},
};
function detectPlatform() {
return "linux";
}
function switchPlatform(platform) {
const cfg = PLATFORMS[platform];
if (!cfg) return;
// Update hero install widget
const commandEl = document.getElementById("install-command");
const promptEl = document.getElementById("install-prompt");
const noteEl = document.getElementById("install-note");
if (commandEl) commandEl.textContent = cfg.command;
if (promptEl) promptEl.textContent = cfg.prompt;
if (noteEl) noteEl.textContent = cfg.note;
// Update active tab in hero
document.querySelectorAll(".install-tab").forEach((tab) => {
tab.classList.toggle("active", tab.dataset.platform === platform);
});
// Sync the step section tabs too
switchStepPlatform(platform);
}
function switchStepPlatform(platform) {
const cfg = PLATFORMS[platform];
if (!cfg) return;
const commandEl = document.getElementById("step1-command");
const copyBtn = document.getElementById("step1-copy");
const noteEl = document.getElementById("step1-note");
if (commandEl) commandEl.textContent = cfg.command;
if (copyBtn) copyBtn.setAttribute("data-text", cfg.command);
if (noteEl) noteEl.textContent = cfg.stepNote;
// Update active tab in step section
document.querySelectorAll(".code-tab").forEach((tab) => {
tab.classList.toggle("active", tab.dataset.platform === platform);
});
}
function toggleMobileNav() {
document.getElementById("nav-mobile").classList.toggle("open");
document.getElementById("nav-hamburger").classList.toggle("open");
}
function toggleSpecs() {
const wrapper = document.getElementById("specs-wrapper");
const btn = document.getElementById("specs-toggle");
const label = btn.querySelector(".toggle-label");
const isOpen = wrapper.classList.contains("open");
if (isOpen) {
wrapper.style.maxHeight = wrapper.scrollHeight + "px";
requestAnimationFrame(() => {
wrapper.style.maxHeight = "0";
});
wrapper.classList.remove("open");
btn.classList.remove("open");
if (label) label.textContent = "More details";
} else {
wrapper.classList.add("open");
wrapper.style.maxHeight = wrapper.scrollHeight + "px";
btn.classList.add("open");
if (label) label.textContent = "Less";
wrapper.addEventListener(
"transitionend",
() => {
if (wrapper.classList.contains("open")) {
wrapper.style.maxHeight = "none";
}
},
{ once: true }
);
}
}
// --- Copy to clipboard ---
function copyInstall() {
const text = document.getElementById("install-command").textContent;
navigator.clipboard.writeText(text).then(() => {
const btn = document.querySelector(".install-widget-body .copy-btn");
const original = btn.querySelector(".copy-text").textContent;
btn.querySelector(".copy-text").textContent = "Copied!";
btn.style.color = "var(--primary-light)";
setTimeout(() => {
btn.querySelector(".copy-text").textContent = original;
btn.style.color = "";
}, 2000);
});
}
function copyText(btn) {
const text = btn.getAttribute("data-text");
navigator.clipboard.writeText(text).then(() => {
const original = btn.textContent;
btn.textContent = "Copied!";
btn.style.color = "var(--primary-light)";
setTimeout(() => {
btn.textContent = original;
btn.style.color = "";
}, 2000);
});
}
// --- Scroll-triggered fade-in ---
function initScrollAnimations() {
const elements = document.querySelectorAll(
".feature-card, .install-step, " +
".section-header, .terminal-window",
);
elements.forEach((el) => el.classList.add("fade-in"));
const observer = new IntersectionObserver(
(entries) => {
entries.forEach((entry) => {
if (entry.isIntersecting) {
// Stagger children within grids
const parent = entry.target.parentElement;
if (parent) {
const siblings = parent.querySelectorAll(".fade-in");
let idx = Array.from(siblings).indexOf(entry.target);
if (idx < 0) idx = 0;
setTimeout(() => {
entry.target.classList.add("visible");
}, idx * 60);
} else {
entry.target.classList.add("visible");
}
observer.unobserve(entry.target);
}
});
},
{ threshold: 0.1, rootMargin: "0px 0px -40px 0px" },
);
elements.forEach((el) => observer.observe(el));
}
// --- Terminal Demo ---
const CURSOR = '<span class="terminal-cursor">█</span>';
const demoSequence = [
{ type: "prompt", text: " " },
{
type: "type",
text: "Research the latest approaches to GRPO training and write a summary",
delay: 30,
},
{ type: "pause", ms: 600 },
{
type: "output",
lines: [
"",
'<span class="t-dim"> web_search "GRPO reinforcement learning 2026" 1.2s</span>',
],
},
{ type: "pause", ms: 400 },
{
type: "output",
lines: [
'<span class="t-dim"> web_extract arxiv.org/abs/2402.03300 3.1s</span>',
],
},
{ type: "pause", ms: 400 },
{
type: "output",
lines: [
'<span class="t-dim"> web_search "GRPO vs PPO ablation results" 0.9s</span>',
],
},
{ type: "pause", ms: 400 },
{
type: "output",
lines: [
'<span class="t-dim"> web_extract huggingface.co/blog/grpo 2.8s</span>',
],
},
{ type: "pause", ms: 400 },
{
type: "output",
lines: [
'<span class="t-dim"> write_file ~/research/grpo-summary.md 0.1s</span>',
],
},
{ type: "pause", ms: 500 },
{
type: "output",
lines: [
"",
'<span class="t-text">Done! I\'ve written a summary covering:</span>',
"",
'<span class="t-text"> <span class="t-green">✓</span> GRPO\'s group-relative advantage (no critic model needed)</span>',
'<span class="t-text"> <span class="t-green">✓</span> Comparison with PPO/DPO on reasoning benchmarks</span>',
'<span class="t-text"> <span class="t-green">✓</span> Implementation notes for Axolotl and TRL</span>',
"",
'<span class="t-text">Saved to</span> <span class="t-accent">~/research/grpo-summary.md</span>',
],
},
{ type: "pause", ms: 2500 },
{ type: "clear" },
{ type: "prompt", text: " " },
{
type: "type",
text: "Review the PR at NousResearch/hermes-agent#42 and fix any issues",
delay: 30,
},
{ type: "pause", ms: 600 },
{
type: "output",
lines: [
"",
'<span class="t-dim"> delegate_task "review PR #42 changes" 2.1s</span>',
],
},
{ type: "pause", ms: 500 },
{
type: "output",
lines: [
'<span class="t-dim"> git diff main..pr-42 0.4s</span>',
],
},
{ type: "pause", ms: 400 },
{
type: "output",
lines: [
'<span class="t-dim"> patch tools/registry.py 0.1s</span>',
],
},
{ type: "pause", ms: 400 },
{
type: "output",
lines: [
'<span class="t-dim"> python -m pytest tests/ -x 3.2s</span>',
],
},
{ type: "pause", ms: 400 },
{
type: "output",
lines: [
'<span class="t-dim"> git commit -m "fix: handle empty tool schemas" 0.3s</span>',
],
},
{ type: "pause", ms: 500 },
{
type: "output",
lines: [
"",
'<span class="t-text">Found 2 issues in the PR and fixed both:</span>',
"",
'<span class="t-text"> <span class="t-green">✓</span> Empty tool schema crash in registry.py — added guard</span>',
'<span class="t-text"> <span class="t-green">✓</span> Missing error handling in delegate_tool.py — added try/except</span>',
"",
'<span class="t-text">Tests pass. Committed the fix and pushed to the PR branch.</span>',
'<span class="t-text">I also saved a</span> <span class="t-accent">skill</span> <span class="t-text">for this PR review pattern.</span>',
],
},
{ type: "pause", ms: 2500 },
{ type: "clear" },
{ type: "prompt", text: " " },
{
type: "type",
text: "How did we fix that Docker networking issue?",
delay: 35,
},
{ type: "pause", ms: 500 },
{
type: "output",
lines: [
"",
'<span class="t-dim"> session_search "Docker networking" 1.4s</span>',
],
},
{ type: "pause", ms: 500 },
{
type: "output",
lines: [
"",
'<span class="t-text">Found it — from a session on February 12th:</span>',
"",
'<span class="t-text">The containers couldn\'t reach each other because the compose</span>',
'<span class="t-text">file was using the default bridge network. We switched to a</span>',
'<span class="t-text">custom network with</span> <span class="t-accent">driver: overlay</span><span class="t-text">, added explicit</span>',
'<span class="t-text">aliases, and set</span> <span class="t-accent">dns: 8.8.8.8</span> <span class="t-text">as a fallback.</span>',
"",
'<span class="t-text">The fix was committed in</span> <span class="t-accent">docker-compose.prod.yml</span><span class="t-text">.</span>',
],
},
{ type: "pause", ms: 3000 },
];
class TerminalDemo {
constructor(container) {
this.container = container;
this.running = false;
this.content = "";
}
async start() {
if (this.running) return;
this.running = true;
while (this.running) {
for (const step of demoSequence) {
if (!this.running) return;
await this.execute(step);
}
this.clear();
await this.sleep(1000);
}
}
stop() {
this.running = false;
}
async execute(step) {
switch (step.type) {
case "prompt":
this.append(`<span class="t-prompt">${step.text}</span>`);
break;
case "type":
for (const char of step.text) {
if (!this.running) return;
this.append(`<span class="t-cmd">${char}</span>`);
await this.sleep(step.delay || 30);
}
break;
case "output":
for (const line of step.lines) {
if (!this.running) return;
this.append("\n" + line);
await this.sleep(50);
}
break;
case "pause":
await this.sleep(step.ms);
break;
case "clear":
this.clear();
break;
}
}
append(html) {
this.content += html;
this.render();
}
render() {
this.container.innerHTML = this.content + CURSOR;
this.container.scrollTop = this.container.scrollHeight;
}
clear() {
this.content = "";
this.container.innerHTML = "";
}
sleep(ms) {
return new Promise((resolve) => setTimeout(resolve, ms));
}
}
// --- Noise Overlay (ported from hermes-chat NoiseOverlay) ---
function initNoiseOverlay() {
if (window.matchMedia("(prefers-reduced-motion: reduce)").matches) return;
if (typeof THREE === "undefined") return;
const canvas = document.getElementById("noise-overlay");
if (!canvas) return;
const vertexShader = `
varying vec2 vUv;
void main() {
vUv = uv;
gl_Position = projectionMatrix * modelViewMatrix * vec4(position, 1.0);
}
`;
const fragmentShader = `
uniform vec2 uRes;
uniform float uDpr, uSize, uDensity, uOpacity;
uniform vec3 uColor;
varying vec2 vUv;
float hash(vec2 p) {
vec3 p3 = fract(vec3(p.xyx) * 0.1031);
p3 += dot(p3, p3.yzx + 33.33);
return fract((p3.x + p3.y) * p3.z);
}
void main() {
float n = hash(floor(vUv * uRes / (uSize * uDpr)));
gl_FragColor = vec4(uColor, step(1.0 - uDensity, n)) * uOpacity;
}
`;
function hexToVec3(hex) {
const c = hex.replace("#", "");
return new THREE.Vector3(
parseInt(c.substring(0, 2), 16) / 255,
parseInt(c.substring(2, 4), 16) / 255,
parseInt(c.substring(4, 6), 16) / 255,
);
}
const renderer = new THREE.WebGLRenderer({
alpha: true,
canvas,
premultipliedAlpha: false,
});
renderer.setClearColor(0x000000, 0);
const scene = new THREE.Scene();
const camera = new THREE.OrthographicCamera(-1, 1, 1, -1, 0, 1);
const geo = new THREE.PlaneGeometry(2, 2);
const mat = new THREE.ShaderMaterial({
vertexShader,
fragmentShader,
transparent: true,
uniforms: {
uColor: { value: hexToVec3("#8090BB") },
uDensity: { value: 0.1 },
uDpr: { value: 1 },
uOpacity: { value: 0.4 },
uRes: { value: new THREE.Vector2() },
uSize: { value: 1.0 },
},
});
scene.add(new THREE.Mesh(geo, mat));
function resize() {
const dpr = window.devicePixelRatio;
const w = window.innerWidth;
const h = window.innerHeight;
renderer.setSize(w, h);
renderer.setPixelRatio(dpr);
mat.uniforms.uRes.value.set(w * dpr, h * dpr);
mat.uniforms.uDpr.value = dpr;
}
resize();
window.addEventListener("resize", resize);
function loop() {
requestAnimationFrame(loop);
renderer.render(scene, camera);
}
loop();
}
// --- Initialize ---
document.addEventListener("DOMContentLoaded", () => {
const detectedPlatform = detectPlatform();
switchPlatform(detectedPlatform);
initScrollAnimations();
initNoiseOverlay();
const terminalEl = document.getElementById("terminal-demo");
if (terminalEl) {
const demo = new TerminalDemo(terminalEl);
const observer = new IntersectionObserver(
(entries) => {
entries.forEach((entry) => {
if (entry.isIntersecting) {
demo.start();
} else {
demo.stop();
}
});
},
{ threshold: 0.3 },
);
observer.observe(document.querySelector(".terminal-window"));
}
const nav = document.querySelector(".nav");
let ticking = false;
window.addEventListener("scroll", () => {
if (!ticking) {
requestAnimationFrame(() => {
if (window.scrollY > 50) {
nav.style.borderBottomColor = "rgba(48, 80, 255, 0.15)";
} else {
nav.style.borderBottomColor = "";
}
ticking = false;
});
ticking = true;
}
});
});
File diff suppressed because it is too large
+40 -2
@@ -26,7 +26,7 @@ import logging
import threading
from typing import Dict, Any, List, Optional, Tuple
from tools.registry import discover_builtin_tools, registry
from tools.registry import registry
from toolsets import resolve_toolset, validate_toolset
logger = logging.getLogger(__name__)
@@ -129,7 +129,45 @@ def _run_async(coro):
# Tool Discovery (importing each module triggers its registry.register calls)
# =============================================================================
discover_builtin_tools()
def _discover_tools():
"""Import all tool modules to trigger their registry.register() calls.
Wrapped in a function so import errors in optional tools (e.g., fal_client
not installed) don't prevent the rest from loading.
"""
_modules = [
"tools.web_tools",
"tools.terminal_tool",
"tools.file_tools",
"tools.vision_tools",
"tools.mixture_of_agents_tool",
"tools.image_generation_tool",
"tools.skills_tool",
"tools.skill_manager_tool",
"tools.browser_tool",
"tools.cronjob_tools",
"tools.rl_training_tool",
"tools.tts_tool",
"tools.todo_tool",
"tools.memory_tool",
"tools.session_search_tool",
"tools.clarify_tool",
"tools.code_execution_tool",
"tools.delegate_tool",
"tools.process_registry",
"tools.send_message_tool",
# "tools.honcho_tools", # Removed — Honcho is now a memory provider plugin
"tools.homeassistant_tool",
]
import importlib
for mod_name in _modules:
try:
importlib.import_module(mod_name)
except Exception as e:
logger.warning("Could not import tool module %s: %s", mod_name, e)
_discover_tools()
# MCP tool discovery (external MCP servers from config)
try:
@@ -1,12 +1,12 @@
---
name: honcho
description: Configure and use Honcho memory with Hermes -- cross-session user modeling, multi-profile peer isolation, observation config, dialectic reasoning, session summaries, and context budget enforcement. Use when setting up Honcho, troubleshooting memory, managing profiles with Honcho peers, or tuning observation, recall, and dialectic settings.
version: 2.0.0
description: Configure and use Honcho memory with Hermes -- cross-session user modeling, multi-profile peer isolation, observation config, and dialectic reasoning. Use when setting up Honcho, troubleshooting memory, managing profiles with Honcho peers, or tuning observation and recall settings.
version: 1.0.0
author: Hermes Agent
license: MIT
metadata:
hermes:
tags: [Honcho, Memory, Profiles, Observation, Dialectic, User-Modeling, Session-Summary]
tags: [Honcho, Memory, Profiles, Observation, Dialectic, User-Modeling]
homepage: https://docs.honcho.dev
related_skills: [hermes-agent]
prerequisites:
@@ -22,9 +22,8 @@ Honcho provides AI-native cross-session user modeling. It learns who the user is
- Setting up Honcho (cloud or self-hosted)
- Troubleshooting memory not working / peers not syncing
- Creating multi-profile setups where each agent has its own Honcho peer
- Tuning observation, recall, dialectic depth, or write frequency settings
- Understanding what the 5 Honcho tools do and when to use them
- Configuring context budgets and session summary injection
- Tuning observation, recall, or write frequency settings
- Understanding what the 4 Honcho tools do and when to use them
## Setup
@@ -52,27 +51,6 @@ hermes honcho status # shows resolved config, connection test, peer info
## Architecture
### Base Context Injection
When Honcho injects context into the system prompt (in `hybrid` or `context` recall modes), it assembles the base context block in this order:
1. **Session summary** -- a short digest of the current session so far (placed first so the model has immediate conversational continuity)
2. **User representation** -- Honcho's accumulated model of the user (preferences, facts, patterns)
3. **AI peer card** -- the identity card for this Hermes profile's AI peer
The session summary is generated automatically by Honcho at the start of each turn (when a prior session exists). It gives the model a warm start without replaying full history.
### Cold / Warm Prompt Selection
Honcho automatically selects between two prompt strategies:
| Condition | Strategy | What happens |
|-----------|----------|--------------|
| No prior session or empty representation | **Cold start** | Lightweight intro prompt; skips summary injection; encourages the model to learn about the user |
| Existing representation and/or session history | **Warm start** | Full base context injection (summary → representation → card); richer system prompt |
You do not need to configure this -- it is automatic based on session state.
### Peers
Honcho models conversations as interactions between **peers**. Hermes creates two peers per session:
@@ -134,63 +112,6 @@ How the agent accesses Honcho memory:
| `context` | Yes | No (hidden) | Minimal token cost, no tool calls |
| `tools` | No | Yes | Agent controls all memory access explicitly |
## Three Orthogonal Knobs
Honcho's dialectic behavior is controlled by three independent dimensions. Each can be tuned without affecting the others:
### Cadence (when)
Controls **how often** dialectic and context calls happen.
| Key | Default | Description |
|-----|---------|-------------|
| `contextCadence` | `1` | Min turns between context API calls |
| `dialecticCadence` | `3` | Min turns between dialectic API calls |
| `injectionFrequency` | `every-turn` | `every-turn` or `first-turn` for base context injection |
Higher cadence values reduce API calls and cost. `dialecticCadence: 3` (default) means the dialectic engine fires at most every 3rd turn.
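The cadence rule above can be sketched as a small turn gate. This is an illustration only, not Honcho's implementation: the `CadenceGate` class and its method names are hypothetical, and it assumes "min turns between calls" means the difference between turn numbers must be at least the cadence value.

```python
from typing import Optional


class CadenceGate:
    """Hypothetical gate that allows a call at most once every `cadence` turns."""

    def __init__(self, cadence: int) -> None:
        # A cadence below 1 makes no sense; clamp so every turn is allowed.
        self.cadence = max(1, cadence)
        self.last_fired: Optional[int] = None

    def should_fire(self, turn: int) -> bool:
        """Return True (and record the turn) if enough turns have passed."""
        if self.last_fired is None or turn - self.last_fired >= self.cadence:
            self.last_fired = turn
            return True
        return False


gate = CadenceGate(cadence=3)
fired_turns = [turn for turn in range(1, 8) if gate.should_fire(turn)]
```

Under these assumptions, a cadence of 3 lets the call through on the first turn and then on every third turn after it, while a cadence of 1 lets every turn through.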
### Depth (how many)
Controls **how many rounds** of dialectic reasoning Honcho performs per query.
| Key | Default | Range | Description |
|-----|---------|-------|-------------|
| `dialecticDepth` | `1` | 1-3 | Number of dialectic reasoning rounds per query |
| `dialecticDepthLevels` | -- | array | Optional per-depth-round level overrides (see below) |
`dialecticDepth: 2` means Honcho runs two rounds of dialectic synthesis. The first round produces an initial answer; the second refines it.
`dialecticDepthLevels` lets you set the reasoning level for each round independently:
```json
{
"dialecticDepth": 3,
"dialecticDepthLevels": ["low", "medium", "high"]
}
```
If `dialecticDepthLevels` is omitted, rounds use **proportional levels** derived from `dialecticReasoningLevel` (the base):
| Depth | Pass levels |
|-------|-------------|
| 1 | [base] |
| 2 | [minimal, base] |
| 3 | [minimal, base, low] |
This keeps earlier passes cheap while using full depth on the final synthesis.
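The level selection described above can be sketched as a lookup that mirrors the table. This is a minimal sketch, not Honcho's code: `resolve_depth_levels` is a hypothetical helper, and rejecting an override list whose length differs from the depth is an assumption, since the text does not say how a mismatch is handled.

```python
from typing import List, Optional

VALID_LEVELS = ("minimal", "low", "medium", "high", "max")


def resolve_depth_levels(
    depth: int,
    base: str = "low",
    overrides: Optional[List[str]] = None,
) -> List[str]:
    """Return the reasoning level for each dialectic round (hypothetical helper)."""
    if not 1 <= depth <= 3:
        raise ValueError("dialecticDepth must be between 1 and 3")
    if base not in VALID_LEVELS:
        raise ValueError(f"unknown reasoning level: {base}")

    if overrides is not None:
        # Assumption: explicit per-round levels must match the depth exactly.
        if len(overrides) != depth or any(lv not in VALID_LEVELS for lv in overrides):
            raise ValueError("dialecticDepthLevels must list one valid level per round")
        return list(overrides)

    # Proportional levels, copied from the table above.
    proportional = {
        1: [base],
        2: ["minimal", base],
        3: ["minimal", base, "low"],
    }
    return proportional[depth]


levels = resolve_depth_levels(3, base="medium")
```

With `dialecticDepthLevels` omitted, the result depends only on the depth and on `dialecticReasoningLevel`; an explicit list replaces the proportional defaults entirely.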
### Level (how hard)
Controls the **intensity** of each dialectic reasoning round.
| Key | Default | Description |
|-----|---------|-------------|
| `dialecticReasoningLevel` | `low` | `minimal`, `low`, `medium`, `high`, `max` |
| `dialecticDynamic` | `true` | When `true`, the model can pass `reasoning_level` to `honcho_reasoning` to override the default per-call. `false` = always use `dialecticReasoningLevel`, model overrides ignored |
Higher levels produce richer synthesis but cost more tokens on Honcho's backend.
## Multi-Profile Setup
Each Hermes profile gets its own Honcho AI peer while sharing the same workspace (user context). This means:
@@ -228,7 +149,6 @@ Override any setting in the host block:
"hermes.coder": {
"aiPeer": "coder",
"recallMode": "tools",
"dialecticDepth": 2,
"observation": {
"user": { "observeMe": true, "observeOthers": false },
"ai": { "observeMe": true, "observeOthers": true }
@@ -240,97 +160,19 @@ Override any setting in the host block:
## Tools
The agent has 5 bidirectional Honcho tools (hidden in `context` recall mode):
| Tool | LLM call? | Cost | Use when |
|------|-----------|------|----------|
| `honcho_profile` | No | minimal | Quick factual snapshot at conversation start or for fast name/role/pref lookups |
| `honcho_search` | No | low | Fetch specific past facts to reason over yourself — raw excerpts, no synthesis |
| `honcho_context` | No | low | Full session context snapshot: summary, representation, card, recent messages |
| `honcho_reasoning` | Yes | medium-high | Natural language question synthesized by Honcho's dialectic engine |
| `honcho_conclude` | No | minimal | Write or delete a persistent fact; pass `peer: "ai"` for AI self-knowledge |
The agent has 4 Honcho tools (hidden in `context` recall mode):
### `honcho_profile`
Read or update a peer card — curated key facts (name, role, preferences, communication style). Pass `card: [...]` to update; omit to read. No LLM call.
Quick factual snapshot of the user -- name, role, preferences, patterns. No LLM call, minimal cost. Use at conversation start or for fast lookups.
### `honcho_search`
Semantic search over stored context for a specific peer. Returns raw excerpts ranked by relevance, no synthesis. Default 800 tokens, max 2000. Good when you need specific past facts to reason over yourself rather than a synthesized answer.
Semantic search over stored context. Returns raw excerpts ranked by relevance, no LLM synthesis. Default 800 tokens, max 2000. Use when you want specific past facts to reason over yourself.
### `honcho_context`
Full session context snapshot from Honcho — session summary, peer representation, peer card, and recent messages. No LLM call. Use when you want to see everything Honcho knows about the current session and peer in one shot.
### `honcho_reasoning`
Natural language question answered by Honcho's dialectic reasoning engine (LLM call on Honcho's backend). Higher cost, higher quality. Pass `reasoning_level` to control depth: `minimal` (fast/cheap) → `low` → `medium` → `high` → `max` (thorough). Omit to use the configured default (`low`). Use for synthesized understanding of the user's patterns, goals, or current state.
Natural language question answered by Honcho's dialectic reasoning (LLM call on Honcho's backend). Higher cost, higher quality. Can query about user (default) or the AI peer.
### `honcho_conclude`
Write or delete a persistent conclusion about a peer. Pass `conclusion: "..."` to create. Pass `delete_id: "..."` to remove a conclusion (for PII removal — Honcho self-heals incorrect conclusions over time, so deletion is only needed for PII). You MUST pass exactly one of the two.
### Bidirectional peer targeting
All 5 tools accept an optional `peer` parameter:
- `peer: "user"` (default) — operates on the user peer
- `peer: "ai"` — operates on this profile's AI peer
- `peer: "<explicit-id>"` — any peer ID in the workspace
Examples:
```
honcho_profile # read user's card
honcho_profile peer="ai" # read AI peer's card
honcho_reasoning query="What does this user care about most?"
honcho_reasoning query="What are my interaction patterns?" peer="ai" reasoning_level="medium"
honcho_conclude conclusion="Prefers terse answers"
honcho_conclude conclusion="I tend to over-explain code" peer="ai"
honcho_conclude delete_id="abc123" # PII removal
```
## Agent Usage Patterns
Guidelines for Hermes when Honcho memory is active.
### On conversation start
```
1. honcho_profile → fast warmup, no LLM cost
2. If context looks thin → honcho_context (full snapshot, still no LLM)
3. If deep synthesis needed → honcho_reasoning (LLM call, use sparingly)
```
Do NOT call `honcho_reasoning` on every turn. Auto-injection already handles ongoing context refresh. Use the reasoning tool only when you genuinely need synthesized insight the base context doesn't provide.
### When the user shares something to remember
```
honcho_conclude conclusion="<specific, actionable fact>"
```
Good conclusions: "Prefers code examples over prose explanations", "Working on a Rust async project through April 2026"
Bad conclusions: "User said something about Rust" (too vague), "User seems technical" (already in representation)
### When the user asks about past context / you need to recall specifics
```
honcho_search query="<topic>" → fast, no LLM, good for specific facts
honcho_context → full snapshot with summary + messages
honcho_reasoning query="<question>" → synthesized answer, use when search isn't enough
```
### When to use `peer: "ai"`
Use AI peer targeting to build and query the agent's own self-knowledge:
- `honcho_conclude conclusion="I tend to be verbose when explaining architecture" peer="ai"` — self-correction
- `honcho_reasoning query="How do I typically handle ambiguous requests?" peer="ai"` — self-audit
- `honcho_profile peer="ai"` — review own identity card
### When NOT to call tools
In `hybrid` and `context` modes, base context (user representation + card + session summary) is auto-injected before every turn. Do not re-fetch what was already injected. Call tools only when:
- You need something the injected context doesn't have
- The user explicitly asks you to recall or check memory
- You're writing a conclusion about something new
### Cadence awareness
`honcho_reasoning` on the tool side shares the same cost as auto-injection dialectic. After an explicit tool call, the auto-injection cadence resets — avoiding double-charging the same turn.
Write a persistent fact about the user. Conclusions build the user's profile over time. Use when the user states a preference, corrects you, or shares something to remember.
## Config Reference
@@ -349,39 +191,18 @@ Config file: `$HERMES_HOME/honcho.json` (profile-local) or `~/.honcho/config.jso
| `observation` | all on | Per-peer `observeMe`/`observeOthers` booleans |
| `writeFrequency` | `async` | `async`, `turn`, `session`, or integer N |
| `sessionStrategy` | `per-directory` | `per-directory`, `per-repo`, `per-session`, `global` |
| `messageMaxChars` | `25000` | Max chars per message (chunked if exceeded) |
### Dialectic settings
| Key | Default | Description |
|-----|---------|-------------|
| `dialecticReasoningLevel` | `low` | `minimal`, `low`, `medium`, `high`, `max` |
| `dialecticDynamic` | `true` | Auto-bump reasoning by query complexity. `false` = fixed level |
| `dialecticDepth` | `1` | Number of dialectic rounds per query (1-3) |
| `dialecticDepthLevels` | -- | Optional array of per-round levels, e.g. `["low", "high"]` |
| `dialecticDynamic` | `true` | Auto-bump reasoning by query length. `false` = fixed level |
| `messageMaxChars` | `25000` | Max chars per message (chunked if exceeded) |
| `dialecticMaxInputChars` | `10000` | Max chars for dialectic query input |
### Context budget and injection
### Cost-awareness (advanced, root config only)
| Key | Default | Description |
|-----|---------|-------------|
| `contextTokens` | uncapped | Max tokens for the combined base context injection (summary + representation + card). Opt-in cap — omit to leave uncapped, set to an integer to bound injection size. |
| `injectionFrequency` | `every-turn` | `every-turn` or `first-turn` |
| `contextCadence` | `1` | Min turns between context API calls |
| `dialecticCadence` | `3` | Min turns between dialectic LLM calls |
The `contextTokens` budget is enforced at injection time. If the session summary + representation + card exceed the budget, Honcho trims the summary first, then the representation, preserving the card. This prevents context blowup in long sessions.
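The trimming order described above (summary first, then representation, card preserved) can be sketched as follows. This is a rough illustration under stated assumptions: `trim_to_budget` is a hypothetical helper, and tokens are approximated by whitespace-separated words, which is not how a real tokenizer counts.

```python
from typing import Optional, Tuple


def _count(text: str) -> int:
    """Crude token estimate: whitespace-separated words (assumption)."""
    return len(text.split())


def _keep_first(text: str, n_words: int) -> str:
    """Keep only the first n_words words of text."""
    return " ".join(text.split()[: max(n_words, 0)])


def trim_to_budget(
    summary: str,
    representation: str,
    card: str,
    budget: Optional[int] = None,
) -> Tuple[str, str, str]:
    """Trim summary, then representation, until the total fits the budget."""
    if budget is None:
        # No contextTokens set: injection is left uncapped.
        return summary, representation, card

    overflow = _count(summary) + _count(representation) + _count(card) - budget
    if overflow <= 0:
        return summary, representation, card

    # 1. Trim the summary first.
    cut = min(overflow, _count(summary))
    summary = _keep_first(summary, _count(summary) - cut)
    overflow -= cut

    # 2. Then trim the representation. The card is never touched.
    cut = min(overflow, _count(representation))
    representation = _keep_first(representation, _count(representation) - cut)

    return summary, representation, card


trimmed = trim_to_budget("a b c d", "e f g", "h i", budget=6)
```

Note that if the card alone exceeds the budget, this sketch still returns it whole, which follows the "preserving the card" wording rather than enforcing a hard cap.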
### Memory-context sanitization
Honcho sanitizes the `memory-context` block before injection to prevent prompt injection and malformed content:
- Strips XML/HTML tags from user-authored conclusions
- Normalizes whitespace and control characters
- Truncates individual conclusions that exceed `messageMaxChars`
- Escapes delimiter sequences that could break the system prompt structure
This fix addresses edge cases where raw user conclusions containing markup or special characters could corrupt the injected context block.
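The four sanitization steps listed above could look roughly like this. It is a sketch, not the shipped code: `sanitize_conclusion` is a hypothetical name, the `---` delimiter is a placeholder because the text does not say which delimiter sequences are escaped, and the regexes are deliberately simple.

```python
import re

_TAG_RE = re.compile(r"<[^>]+>")
# Control characters except tab, newline and carriage return,
# which the whitespace step below collapses into single spaces.
_CONTROL_RE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")
_WHITESPACE_RE = re.compile(r"\s+")


def sanitize_conclusion(text: str, max_chars: int = 25000, delimiter: str = "---") -> str:
    """Clean one user-authored conclusion before injection (hypothetical helper)."""
    text = _TAG_RE.sub("", text)                    # strip XML/HTML tags
    text = _CONTROL_RE.sub("", text)                # drop control characters
    text = _WHITESPACE_RE.sub(" ", text).strip()    # normalize whitespace
    text = text[:max_chars]                         # truncate to messageMaxChars
    # Escape the (assumed) delimiter last, so truncation cannot split an escape.
    return text.replace(delimiter, "\\" + delimiter)


cleaned = sanitize_conclusion("<b>Prefers</b>\x00  terse\n\nanswers")
```

Escaping after truncation means the result can be slightly longer than `max_chars`; a stricter variant could reserve room for the escape characters.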
| `dialecticCadence` | `1` | Min turns between dialectic API calls |
## Troubleshooting
@@ -400,12 +221,6 @@ Observation config is synced from the server on each session init. Start a new s
### Messages truncated
Messages over `messageMaxChars` (default 25k) are automatically chunked with `[continued]` markers. If you're hitting this often, check if tool results or skill content is inflating message size.
### Context injection too large
If you see warnings about context budget exceeded, lower `contextTokens` or reduce `dialecticDepth`. The session summary is trimmed first when the budget is tight.
### Session summary missing
Session summary requires at least one prior turn in the current Honcho session. On cold start (new session, no history), the summary is omitted and Honcho uses the cold-start prompt strategy instead.
## CLI Commands
| Command | Description |
-94
@@ -1,94 +0,0 @@
/**
* Example Dashboard Plugin
*
* Demonstrates how to build a dashboard plugin using the Hermes Plugin SDK.
* No build step needed; this is a plain IIFE that uses globals from the SDK.
*/
(function () {
"use strict";
const SDK = window.__HERMES_PLUGIN_SDK__;
const { React } = SDK;
const { Card, CardHeader, CardTitle, CardContent, Badge, Button } = SDK.components;
const { useState, useEffect } = SDK.hooks;
const { cn } = SDK.utils;
function ExamplePage() {
const [greeting, setGreeting] = useState(null);
const [loading, setLoading] = useState(false);
function fetchGreeting() {
setLoading(true);
SDK.fetchJSON("/api/plugins/example/hello")
.then(function (data) { setGreeting(data.message); })
.catch(function () { setGreeting("(backend not available)"); })
.finally(function () { setLoading(false); });
}
return React.createElement("div", { className: "flex flex-col gap-6" },
// Header card
React.createElement(Card, null,
React.createElement(CardHeader, null,
React.createElement("div", { className: "flex items-center gap-3" },
React.createElement(CardTitle, { className: "text-lg" }, "Example Plugin"),
React.createElement(Badge, { variant: "outline" }, "v1.0.0"),
),
),
React.createElement(CardContent, { className: "flex flex-col gap-4" },
React.createElement("p", { className: "text-sm text-muted-foreground" },
"This is an example dashboard plugin. It demonstrates using the Plugin SDK to build ",
"custom tabs with React components, connect to backend API routes, and integrate with ",
"the existing Hermes UI system.",
),
React.createElement("div", { className: "flex items-center gap-3" },
React.createElement(Button, {
onClick: fetchGreeting,
disabled: loading,
className: cn(
"inline-flex items-center gap-2 border border-border bg-background/40 px-4 py-2",
"text-sm font-courier transition-colors hover:bg-foreground/10 cursor-pointer",
),
}, loading ? "Loading..." : "Call Backend API"),
greeting && React.createElement("span", {
className: "text-sm font-courier text-muted-foreground",
}, greeting),
),
),
),
// Info card about the SDK
React.createElement(Card, null,
React.createElement(CardHeader, null,
React.createElement(CardTitle, { className: "text-base" }, "Plugin SDK Reference"),
),
React.createElement(CardContent, null,
React.createElement("div", { className: "grid gap-3 text-sm" },
React.createElement("div", { className: "flex flex-col gap-1 border border-border p-3" },
React.createElement("span", { className: "font-medium" }, "window.__HERMES_PLUGIN_SDK__.React"),
React.createElement("span", { className: "text-muted-foreground text-xs" }, "React instance — use instead of importing react"),
),
React.createElement("div", { className: "flex flex-col gap-1 border border-border p-3" },
React.createElement("span", { className: "font-medium" }, "window.__HERMES_PLUGIN_SDK__.hooks"),
React.createElement("span", { className: "text-muted-foreground text-xs" }, "useState, useEffect, useCallback, useMemo, useRef, useContext, createContext"),
),
React.createElement("div", { className: "flex flex-col gap-1 border border-border p-3" },
React.createElement("span", { className: "font-medium" }, "window.__HERMES_PLUGIN_SDK__.components"),
React.createElement("span", { className: "text-muted-foreground text-xs" }, "Card, Badge, Button, Input, Label, Select, Separator, Tabs, etc."),
),
React.createElement("div", { className: "flex flex-col gap-1 border border-border p-3" },
React.createElement("span", { className: "font-medium" }, "window.__HERMES_PLUGIN_SDK__.api"),
React.createElement("span", { className: "text-muted-foreground text-xs" }, "Hermes API client — getStatus(), getSessions(), etc."),
),
React.createElement("div", { className: "flex flex-col gap-1 border border-border p-3" },
React.createElement("span", { className: "font-medium" }, "window.__HERMES_PLUGIN_SDK__.utils"),
React.createElement("span", { className: "text-muted-foreground text-xs" }, "cn(), timeAgo(), isoTimeAgo()"),
),
),
),
),
);
}
// Register this plugin — the dashboard picks it up automatically.
window.__HERMES_PLUGINS__.register("example", ExamplePage);
})();
@@ -1,13 +0,0 @@
{
"name": "example",
"label": "Example",
"description": "Example dashboard plugin — demonstrates the plugin SDK",
"icon": "Sparkles",
"version": "1.0.0",
"tab": {
"path": "/example",
"position": "after:skills"
},
"entry": "dist/index.js",
"api": "plugin_api.py"
}
@@ -1,14 +0,0 @@
"""Example dashboard plugin — backend API routes.
Mounted at /api/plugins/example/ by the dashboard plugin system.
"""
from fastapi import APIRouter
router = APIRouter()
@router.get("/hello")
async def hello():
"""Simple greeting endpoint to demonstrate plugin API routes."""
return {"message": "Hello from the example plugin!", "plugin": "example", "version": "1.0.0"}
+26 -115
@@ -1,22 +1,18 @@
"""Memory provider plugin discovery.
Scans two directories for memory provider plugins:
1. Bundled providers: ``plugins/memory/<name>/`` (shipped with hermes-agent)
2. User-installed providers: ``$HERMES_HOME/plugins/<name>/``
Scans ``plugins/memory/<name>/`` directories for memory provider plugins.
Each subdirectory must contain ``__init__.py`` with a class implementing
the MemoryProvider ABC. On name collisions, bundled providers take
precedence.
the MemoryProvider ABC.
Only ONE provider can be active at a time, selected via
``memory.provider`` in config.yaml.
Memory providers are separate from the general plugin system; they live
in the repo and are always available without user installation. Only ONE
can be active at a time, selected via ``memory.provider`` in config.yaml.
Usage:
from plugins.memory import discover_memory_providers, load_memory_provider
available = discover_memory_providers() # [(name, desc, available), ...]
provider = load_memory_provider("mnemosyne") # MemoryProvider instance
provider = load_memory_provider("openviking") # MemoryProvider instance
"""
from __future__ import annotations
@@ -33,101 +29,24 @@ logger = logging.getLogger(__name__)
_MEMORY_PLUGINS_DIR = Path(__file__).parent
# ---------------------------------------------------------------------------
# Directory helpers
# ---------------------------------------------------------------------------
def _get_user_plugins_dir() -> Optional[Path]:
"""Return ``$HERMES_HOME/plugins/`` or None if unavailable."""
try:
from hermes_constants import get_hermes_home
d = get_hermes_home() / "plugins"
return d if d.is_dir() else None
except Exception:
return None
def _is_memory_provider_dir(path: Path) -> bool:
"""Heuristic: does *path* look like a memory provider plugin?
Checks for ``register_memory_provider`` or ``MemoryProvider`` in the
``__init__.py`` source. Cheap text scan; no import needed.
"""
init_file = path / "__init__.py"
if not init_file.exists():
return False
try:
source = init_file.read_text(errors="replace")[:8192]
return "register_memory_provider" in source or "MemoryProvider" in source
except Exception:
return False
def _iter_provider_dirs() -> List[Tuple[str, Path]]:
"""Yield ``(name, path)`` for all discovered provider directories.
Scans bundled first, then user-installed. Bundled takes precedence
on name collisions (first-seen wins via ``seen`` set).
"""
seen: set = set()
dirs: List[Tuple[str, Path]] = []
# 1. Bundled providers (plugins/memory/<name>/)
if _MEMORY_PLUGINS_DIR.is_dir():
for child in sorted(_MEMORY_PLUGINS_DIR.iterdir()):
if not child.is_dir() or child.name.startswith(("_", ".")):
continue
if not (child / "__init__.py").exists():
continue
seen.add(child.name)
dirs.append((child.name, child))
# 2. User-installed providers ($HERMES_HOME/plugins/<name>/)
user_dir = _get_user_plugins_dir()
if user_dir:
for child in sorted(user_dir.iterdir()):
if not child.is_dir() or child.name.startswith(("_", ".")):
continue
if child.name in seen:
continue # bundled takes precedence
if not _is_memory_provider_dir(child):
continue # skip non-memory plugins
dirs.append((child.name, child))
return dirs
def find_provider_dir(name: str) -> Optional[Path]:
"""Resolve a provider name to its directory.
Checks bundled first, then user-installed.
"""
# Bundled
bundled = _MEMORY_PLUGINS_DIR / name
if bundled.is_dir() and (bundled / "__init__.py").exists():
return bundled
# User-installed
user_dir = _get_user_plugins_dir()
if user_dir:
user = user_dir / name
if user.is_dir() and _is_memory_provider_dir(user):
return user
return None
# ---------------------------------------------------------------------------
# Public API
# ---------------------------------------------------------------------------
def discover_memory_providers() -> List[Tuple[str, str, bool]]:
"""Scan bundled and user-installed directories for available providers.
"""Scan plugins/memory/ for available providers.
Returns list of (name, description, is_available) tuples.
Bundled providers take precedence on name collisions.
Does NOT import the providers; just reads plugin.yaml for metadata
and does a lightweight availability check.
"""
results = []
if not _MEMORY_PLUGINS_DIR.is_dir():
return results
for child in sorted(_MEMORY_PLUGINS_DIR.iterdir()):
if not child.is_dir() or child.name.startswith(("_", ".")):
continue
init_file = child / "__init__.py"
if not init_file.exists():
continue
for name, child in _iter_provider_dirs():
# Read description from plugin.yaml if available
desc = ""
yaml_file = child / "plugin.yaml"
@@ -151,7 +70,7 @@ def discover_memory_providers() -> List[Tuple[str, str, bool]]:
except Exception:
available = False
results.append((name, desc, available))
results.append((child.name, desc, available))
return results
@@ -159,15 +78,11 @@ def discover_memory_providers() -> List[Tuple[str, str, bool]]:
def load_memory_provider(name: str) -> Optional["MemoryProvider"]:
"""Load and return a MemoryProvider instance by name.
Checks both bundled (``plugins/memory/<name>/``) and user-installed
(``$HERMES_HOME/plugins/<name>/``) directories. Bundled takes
precedence on name collisions.
Returns None if the provider is not found or fails to load.
"""
provider_dir = find_provider_dir(name)
if not provider_dir:
logger.debug("Memory provider '%s' not found in bundled or user plugins", name)
provider_dir = _MEMORY_PLUGINS_DIR / name
if not provider_dir.is_dir():
logger.debug("Memory provider '%s' not found in %s", name, _MEMORY_PLUGINS_DIR)
return None
try:
@@ -189,10 +104,7 @@ def _load_provider_from_dir(provider_dir: Path) -> Optional["MemoryProvider"]:
- A top-level class that extends MemoryProvider; we instantiate it
"""
name = provider_dir.name
# Use a separate namespace for user-installed plugins so they don't
# collide with bundled providers in sys.modules.
_is_bundled = _MEMORY_PLUGINS_DIR in provider_dir.parents or provider_dir.parent == _MEMORY_PLUGINS_DIR
module_name = f"plugins.memory.{name}" if _is_bundled else f"_hermes_user_memory.{name}"
module_name = f"plugins.memory.{name}"
init_file = provider_dir / "__init__.py"
if not init_file.exists():
@@ -345,16 +257,15 @@ def discover_plugin_cli_commands() -> List[dict]:
return results
# Only look at the active provider's directory
plugin_dir = find_provider_dir(active_provider)
if not plugin_dir:
plugin_dir = _MEMORY_PLUGINS_DIR / active_provider
if not plugin_dir.is_dir():
return results
cli_file = plugin_dir / "cli.py"
if not cli_file.exists():
return results
_is_bundled = _MEMORY_PLUGINS_DIR in plugin_dir.parents or plugin_dir.parent == _MEMORY_PLUGINS_DIR
module_name = f"plugins.memory.{active_provider}.cli" if _is_bundled else f"_hermes_user_memory.{active_provider}.cli"
module_name = f"plugins.memory.{active_provider}.cli"
try:
# Import the CLI module (lightweight — no SDK needed)
if module_name in sys.modules:
+99 -207
@@ -1,6 +1,6 @@
# Honcho Memory Provider
AI-native cross-session user modeling with multi-pass dialectic reasoning, session summaries, bidirectional peer tools, and persistent conclusions.
AI-native cross-session user modeling with dialectic Q&A, semantic search, peer cards, and persistent conclusions.
> **Honcho docs:** <https://docs.honcho.dev/v3/guides/integrations/hermes>
@@ -19,86 +19,9 @@ hermes memory setup # generic picker, also works
Or manually:
```bash
hermes config set memory.provider honcho
echo "HONCHO_API_KEY=***" >> ~/.hermes/.env
echo "HONCHO_API_KEY=your-key" >> ~/.hermes/.env
```
## Architecture Overview
### Two-Layer Context Injection
Context is injected into the **user message** at API-call time (not the system prompt) to preserve prompt caching. Only a static mode header goes in the system prompt. The injected block is wrapped in `<memory-context>` fences with a system note clarifying it's background data, not new user input.
Two independent layers, each on its own cadence:
**Layer 1 — Base context** (refreshed every `contextCadence` turns):
1. **SESSION SUMMARY** — from `session.context(summary=True)`, placed first
2. **User Representation** — Honcho's evolving model of the user
3. **User Peer Card** — key facts snapshot
4. **AI Self-Representation** — Honcho's model of the AI peer
5. **AI Identity Card** — AI peer facts
**Layer 2 — Dialectic supplement** (fired every `dialecticCadence` turns):
Multi-pass `.chat()` reasoning about the user, appended after base context.
Both layers are joined, then truncated to fit `contextTokens` budget via `_truncate_to_budget` (tokens × 4 chars, word-boundary safe).
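
The budget truncation described above can be sketched roughly as follows. This is an illustrative sketch, not the provider's actual `_truncate_to_budget`: the function name `truncate_to_budget`, the `chars_per_token` parameter, and the exact word-boundary handling are assumptions. Only the 4-characters-per-token ratio and the word-boundary guarantee come from the text.

```python
def truncate_to_budget(text: str, context_tokens: int, chars_per_token: int = 4) -> str:
    """Trim text to roughly `context_tokens` tokens without splitting a word.

    Hypothetical helper: approximates tokens as `chars_per_token` characters each.
    """
    max_chars = context_tokens * chars_per_token
    if len(text) <= max_chars:
        return text  # already within budget

    cut = text[:max_chars]
    # If the cut landed in the middle of a word, back up to the previous space.
    if not text[max_chars].isspace():
        head, sep, _tail = cut.rpartition(" ")
        if sep:  # only back up when there is a space to back up to
            cut = head
    return cut.rstrip()
```

With a budget of 2 tokens (8 characters), `"alpha beta gamma"` is cut inside `beta`, so the sketch backs up and keeps only `alpha`.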
### Cold Start vs Warm Session Prompts
Dialectic pass 0 automatically selects its prompt based on session state:
- **Cold** (no base context cached): "Who is this person? What are their preferences, goals, and working style? Focus on facts that would help an AI assistant be immediately useful."
- **Warm** (base context exists): "Given what's been discussed in this session so far, what context about this user is most relevant to the current conversation? Prioritize active context over biographical facts."
Not configurable — determined automatically.
### Dialectic Depth (Multi-Pass Reasoning)
`dialecticDepth` (1-3, clamped) controls how many `.chat()` calls fire per dialectic cycle:
| Depth | Passes | Behavior |
|-------|--------|----------|
| 1 | single `.chat()` | Base query only (cold or warm prompt) |
| 2 | audit + synthesis | Pass 0 result is self-audited; pass 1 does targeted synthesis. Conditional bail-out if pass 0 returns strong signal (>300 chars or structured with bullets/sections >100 chars) |
| 3 | audit + synthesis + reconciliation | Pass 2 reconciles contradictions across prior passes into a final synthesis |
### Proportional Reasoning Levels
When `dialecticDepthLevels` is not set, each pass uses a proportional level relative to `dialecticReasoningLevel` (the "base"):
| Depth | Pass levels |
|-------|-------------|
| 1 | [base] |
| 2 | [minimal, base] |
| 3 | [minimal, base, low] |
Override with `dialecticDepthLevels`: an explicit array of reasoning level strings per pass.
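
The per-pass level selection above could look something like this. It is a minimal sketch under stated assumptions: `resolve_pass_levels` and `_PROPORTIONAL` are hypothetical names, and using only the first `depth` entries of an explicit override is a guess, since the text does not say what happens when the array length and the depth disagree.

```python
from typing import List, Optional

# Proportional defaults from the table above; "base" stands for dialecticReasoningLevel.
_PROPORTIONAL = {
    1: ["base"],
    2: ["minimal", "base"],
    3: ["minimal", "base", "low"],
}


def resolve_pass_levels(
    depth: int,
    base_level: str,
    depth_levels: Optional[List[str]] = None,
) -> List[str]:
    """Return one reasoning level per dialectic pass (hypothetical helper)."""
    depth = max(1, min(depth, 3))  # clamp to 1-3, as the README describes
    if depth_levels:
        # Explicit override wins. Assumption: extra entries beyond `depth` are ignored.
        return list(depth_levels[:depth])
    return [base_level if lvl == "base" else lvl for lvl in _PROPORTIONAL[depth]]
```

For example, depth 3 with a base of `"medium"` yields `["minimal", "medium", "low"]`, matching the last row of the table.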
### Three Orthogonal Dialectic Knobs
| Knob | Controls | Type |
|------|----------|------|
| `dialecticCadence` | How often — minimum turns between dialectic firings | int |
| `dialecticDepth` | How many — passes per firing (1-3) | int |
| `dialecticReasoningLevel` | How hard — reasoning ceiling per `.chat()` call | string |
### Input Sanitization
`run_conversation` strips leaked `<memory-context>` blocks from user input before processing. When `saveMessages` persists a turn that included injected context, the block can reappear in subsequent turns via message history. The sanitizer removes `<memory-context>` blocks plus associated system notes.
## Tools
Five bidirectional tools. All accept an optional `peer` parameter (`"user"` or `"ai"`, default `"user"`).
| Tool | LLM call? | Description |
|------|-----------|-------------|
| `honcho_profile` | No | Peer card — key facts snapshot |
| `honcho_search` | No | Semantic search over stored context (800 tok default, 2000 max) |
| `honcho_context` | No | Full session context: summary, representation, card, messages |
| `honcho_reasoning` | Yes | LLM-synthesized answer via dialectic `.chat()` |
| `honcho_conclude` | No | Write a persistent fact/conclusion about the user |
Tool visibility depends on `recallMode`: hidden in `context` mode, always present in `tools` and `hybrid`.
## Config Resolution
Config is read from the first file that exists:
@@ -111,128 +34,42 @@ Config is read from the first file that exists:
Host key is derived from the active Hermes profile: `hermes` (default) or `hermes.<profile>`.
For every key, resolution order is: **host block > root > env var > default**.
## Tools
| Tool | LLM call? | Description |
|------|-----------|-------------|
| `honcho_profile` | No | User's peer card -- key facts snapshot |
| `honcho_search` | No | Semantic search over stored context (800 tok default, 2000 max) |
| `honcho_context` | Yes | LLM-synthesized answer via dialectic reasoning |
| `honcho_conclude` | No | Write a persistent fact about the user |
Tool availability depends on `recallMode`: hidden in `context` mode, always present in `tools` and `hybrid`.
## Full Configuration Reference
### Identity & Connection
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `apiKey` | string | -- | API key. Falls back to `HONCHO_API_KEY` env var |
| `baseUrl` | string | -- | Base URL for self-hosted Honcho. Local URLs auto-skip API key auth |
| `environment` | string | `"production"` | SDK environment mapping |
| `enabled` | bool | auto | Master toggle. Auto-enables when `apiKey` or `baseUrl` present |
| `workspace` | string | host key | Honcho workspace ID. Shared environment — all profiles in the same workspace can see the same user identity and related memories |
| `peerName` | string | -- | User peer identity |
| `aiPeer` | string | host key | AI peer identity |
| Key | Type | Default | Scope | Description |
|-----|------|---------|-------|-------------|
| `apiKey` | string | -- | root / host | API key. Falls back to `HONCHO_API_KEY` env var |
| `baseUrl` | string | -- | root | Base URL for self-hosted Honcho. Local URLs (`localhost`, `127.0.0.1`, `::1`) auto-skip API key auth |
| `environment` | string | `"production"` | root / host | SDK environment mapping |
| `enabled` | bool | auto | root / host | Master toggle. Auto-enables when `apiKey` or `baseUrl` present |
| `workspace` | string | host key | root / host | Honcho workspace ID |
| `peerName` | string | -- | root / host | User peer identity |
| `aiPeer` | string | host key | root / host | AI peer identity |
### Memory & Recall
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `recallMode` | string | `"hybrid"` | `"hybrid"` (auto-inject + tools), `"context"` (auto-inject only, tools hidden), `"tools"` (tools only, no injection). Legacy `"auto"` normalizes to `"hybrid"` |
| `observationMode` | string | `"directional"` | Preset: `"directional"` (all on) or `"unified"` (shared pool). Use `observation` object for granular control |
| `observation` | object | -- | Per-peer observation config (see Observation section) |
| Key | Type | Default | Scope | Description |
|-----|------|---------|-------|-------------|
| `recallMode` | string | `"hybrid"` | root / host | `"hybrid"` (auto-inject + tools), `"context"` (auto-inject only, tools hidden), `"tools"` (tools only, no injection). Legacy `"auto"` normalizes to `"hybrid"` |
| `observationMode` | string | `"directional"` | root / host | Shorthand preset: `"directional"` (all on) or `"unified"` (shared pool). Use `observation` object for granular control |
| `observation` | object | -- | root / host | Per-peer observation config (see below) |
### Write Behavior
#### Observation (granular)
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `writeFrequency` | string/int | `"async"` | `"async"` (background), `"turn"` (sync per turn), `"session"` (batch on end), or integer N (every N turns) |
| `saveMessages` | bool | `true` | Persist messages to Honcho API |
### Session Resolution
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `sessionStrategy` | string | `"per-directory"` | `"per-directory"`, `"per-session"`, `"per-repo"` (git root), `"global"` |
| `sessionPeerPrefix` | bool | `false` | Prepend peer name to session keys |
| `sessions` | object | `{}` | Manual directory-to-session-name mappings |
#### Session Name Resolution
The Honcho session name determines which conversation bucket memory lands in. Resolution follows a priority chain — first match wins:
| Priority | Source | Example session name |
|----------|--------|---------------------|
| 1 | Manual map (`sessions` config) | `"myproject-main"` |
| 2 | `/title` command (mid-session rename) | `"refactor-auth"` |
| 3 | Gateway session key (Telegram, Discord, etc.) | `"agent-main-telegram-dm-8439114563"` |
| 4 | `per-session` strategy | Hermes session ID (`20260415_a3f2b1`) |
| 5 | `per-repo` strategy | Git root directory name (`hermes-agent`) |
| 6 | `per-directory` strategy | Current directory basename (`src`) |
| 7 | `global` strategy | Workspace name (`hermes`) |
Gateway platforms always resolve via priority 3 (per-chat isolation) regardless of `sessionStrategy`. The strategy setting only affects CLI sessions.
If `sessionPeerPrefix` is `true`, the peer name is prepended: `eri-hermes-agent`.
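
The priority chain can be read as a simple first-match-wins function. The sketch below is an assumption-laden illustration, not the real `resolve_session_name`: the function name, its parameters, and the idea that the caller has already located the git root (`repo_root`) are all hypothetical. It leaves out `sessionPeerPrefix` and returns the gateway key unchanged.

```python
import os
from typing import Dict, Optional


def sketch_resolve_session_name(
    strategy: str,
    cwd: str,
    sessions: Optional[Dict[str, str]] = None,
    session_title: Optional[str] = None,
    gateway_session_key: Optional[str] = None,
    session_id: Optional[str] = None,
    repo_root: Optional[str] = None,
    workspace: str = "hermes",
) -> str:
    """First match wins, following the priority table (hypothetical helper)."""
    sessions = sessions or {}
    if cwd in sessions:                            # 1. manual directory map
        return sessions[cwd]
    if session_title:                              # 2. /title rename
        return session_title
    if gateway_session_key:                        # 3. gateway platforms
        return gateway_session_key
    if strategy == "per-session" and session_id:   # 4. Hermes session ID
        return session_id
    if strategy == "per-repo" and repo_root:       # 5. git root directory name
        return os.path.basename(repo_root.rstrip("/"))
    if strategy == "global":                       # 7. workspace name
        return workspace
    # 6. per-directory, also the fallback for per-repo outside a git repo
    #    and for per-session without a session ID.
    return os.path.basename(cwd.rstrip("/"))
```

Under these assumptions, a gateway session key overrides whatever `sessionStrategy` is configured, which matches the note that the strategy only affects CLI sessions.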
#### What each strategy produces
- **`per-directory`** — basename of `$PWD`. Opening hermes in `~/code/myapp` and `~/code/other` gives two separate sessions. Same directory = same session across runs.
- **`per-repo`** — git root directory name. All subdirectories within a repo share one session. Falls back to `per-directory` if not inside a git repo.
- **`per-session`** — Hermes session ID (timestamp + hex). Every `hermes` invocation starts a fresh Honcho session. Falls back to `per-directory` if no session ID is available.
- **`global`** — workspace name. One session for everything. Memory accumulates across all directories and runs.
### Multi-Profile Pattern
Multiple Hermes profiles can share one workspace while maintaining separate AI identities. Config resolution is **host block > root > env var > default** — host blocks inherit from root, so shared settings only need to be declared once:
```json
{
"apiKey": "***",
"workspace": "hermes",
"peerName": "yourname",
"hosts": {
"hermes": {
"aiPeer": "hermes",
"recallMode": "hybrid",
"sessionStrategy": "per-directory"
},
"hermes.coder": {
"aiPeer": "coder",
"recallMode": "tools",
"sessionStrategy": "per-repo"
}
}
}
```
Both profiles see the same user (`yourname`) in the same shared environment (`hermes`), but each AI peer builds its own observations, conclusions, and behavior patterns. The coder's memory stays code-oriented; the main agent's stays broad.
Host key is derived from the active Hermes profile: `hermes` (default) or `hermes.<profile>` (e.g. `hermes -p coder` → host key `hermes.coder`).
### Dialectic & Reasoning
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `dialecticDepth` | int | `1` | Passes per dialectic cycle (1-3, clamped). 1=single query, 2=audit+synthesis, 3=audit+synthesis+reconciliation |
| `dialecticDepthLevels` | array | — | Optional array of reasoning level strings per pass. Overrides proportional defaults. Example: `["minimal", "low", "medium"]` |
| `dialecticReasoningLevel` | string | `"low"` | Base reasoning level for `.chat()`: `"minimal"`, `"low"`, `"medium"`, `"high"`, `"max"` |
| `dialecticDynamic` | bool | `true` | When `true`, model can override reasoning level per-call via `honcho_reasoning` tool. When `false`, always uses `dialecticReasoningLevel` |
| `dialecticMaxChars` | int | `600` | Max chars of dialectic result injected into system prompt |
| `dialecticMaxInputChars` | int | `10000` | Max chars for dialectic query input to `.chat()`. Honcho cloud limit: 10k |
### Token Budgets
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `contextTokens` | int | SDK default | Token budget for `context()` API calls. Also gates prefetch truncation (tokens × 4 chars) |
| `messageMaxChars` | int | `25000` | Max chars per message sent via `add_messages()`. Exceeding this triggers chunking with `[continued]` markers. Honcho cloud limit: 25k |
### Cadence (Cost Control)
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `contextCadence` | int | `1` | Minimum turns between base context refreshes (session summary + representation + card) |
| `dialecticCadence` | int | `1` | Minimum turns between dialectic `.chat()` firings |
| `injectionFrequency` | string | `"every-turn"` | `"every-turn"` or `"first-turn"` (inject context on the first user message only, skip from turn 2 onward) |
| `reasoningLevelCap` | string | — | Hard cap on reasoning level: `"minimal"`, `"low"`, `"medium"`, `"high"` |
### Observation (Granular)
Maps 1:1 to Honcho's per-peer `SessionPeerConfig`. When present, overrides `observationMode` preset.
Maps 1:1 to Honcho's per-peer `SessionPeerConfig`. Set at root or per host block -- each profile can have different observation settings. When present, overrides `observationMode` preset.
```json
"observation": {
@@ -248,16 +85,74 @@ Maps 1:1 to Honcho's per-peer `SessionPeerConfig`. When present, overrides `obse
| `ai.observeMe` | `true` | AI peer self-observation (Honcho builds AI representation) |
| `ai.observeOthers` | `true` | AI peer observes user messages (enables cross-peer dialectic) |
Presets:
- `"directional"` (default): all four `true`
Presets for `observationMode`:
- `"directional"` (default): all four booleans `true`
- `"unified"`: user `observeMe=true`, AI `observeOthers=true`, rest `false`
### Hardcoded Limits
Per-profile example -- coder profile observes the user but user doesn't observe coder:
| Limit | Value |
|-------|-------|
| Search tool max tokens | 2000 (hard cap), 800 (default) |
| Peer card fetch tokens | 200 |
```json
"hosts": {
"hermes.coder": {
"observation": {
"user": { "observeMe": true, "observeOthers": false },
"ai": { "observeMe": true, "observeOthers": true }
}
}
}
```
Settings changed in the [Honcho dashboard](https://app.honcho.dev) are synced back on session init.
### Write Behavior
| Key | Type | Default | Scope | Description |
|-----|------|---------|-------|-------------|
| `writeFrequency` | string or int | `"async"` | root / host | `"async"` (background thread), `"turn"` (sync per turn), `"session"` (batch on end), or integer N (every N turns) |
| `saveMessages` | bool | `true` | root / host | Whether to persist messages to Honcho API |
### Session Resolution
| Key | Type | Default | Scope | Description |
|-----|------|---------|-------|-------------|
| `sessionStrategy` | string | `"per-directory"` | root / host | `"per-directory"`, `"per-session"` (new each run), `"per-repo"` (git root name), `"global"` (single session) |
| `sessionPeerPrefix` | bool | `false` | root / host | Prepend peer name to session keys |
| `sessions` | object | `{}` | root | Manual directory-to-session-name mappings: `{"/path/to/project": "my-session"}` |
### Token Budgets & Dialectic
| Key | Type | Default | Scope | Description |
|-----|------|---------|-------|-------------|
| `contextTokens` | int | SDK default | root / host | Token budget for `context()` API calls. Also gates prefetch truncation (tokens x 4 chars) |
| `dialecticReasoningLevel` | string | `"low"` | root / host | Base reasoning level for `peer.chat()`: `"minimal"`, `"low"`, `"medium"`, `"high"`, `"max"` |
| `dialecticDynamic` | bool | `true` | root / host | Auto-bump reasoning based on query length: `<120` chars = base level, `120-400` = +1, `>400` = +2 (capped at `"high"`). Set `false` to always use `dialecticReasoningLevel` as-is |
| `dialecticMaxChars` | int | `600` | root / host | Max chars of dialectic result injected into system prompt |
| `dialecticMaxInputChars` | int | `10000` | root / host | Max chars for dialectic query input to `peer.chat()`. Honcho cloud limit: 10k |
| `messageMaxChars` | int | `25000` | root / host | Max chars per message sent via `add_messages()`. Messages exceeding this are chunked with `[continued]` markers. Honcho cloud limit: 25k |
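
The `dialecticDynamic` auto-bump in the table above can be sketched as follows. This is a rough illustration under assumptions: `bump_reasoning_level` and `LEVELS` are hypothetical names, and the choice to leave a base level of `"max"` untouched (rather than lowering it to the `"high"` ceiling) is a guess the text does not confirm.

```python
LEVELS = ["minimal", "low", "medium", "high", "max"]


def bump_reasoning_level(base_level: str, query: str, dynamic: bool = True) -> str:
    """Raise the reasoning level for longer queries (hypothetical helper)."""
    if not dynamic:
        return base_level  # dialecticDynamic=false: use the base level as-is

    n = len(query)
    if n < 120:
        bump = 0
    elif n <= 400:
        bump = 1
    else:
        bump = 2

    idx = LEVELS.index(base_level)
    ceiling = LEVELS.index("high")  # auto-bump is capped at "high"
    # Assumption: the cap limits bumping but never lowers the configured base.
    return LEVELS[max(idx, min(idx + bump, ceiling))]
```

With a base of `"low"`, a 200-character query would run at `"medium"` and a 500-character query at `"high"`.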
### Cost Awareness (Advanced)
These are read from the root config object, not the host block. Must be set manually in `honcho.json`.
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `injectionFrequency` | string | `"every-turn"` | `"every-turn"` or `"first-turn"` (inject context only on turn 0) |
| `contextCadence` | int | `1` | Minimum turns between `context()` API calls |
| `dialecticCadence` | int | `1` | Minimum turns between `peer.chat()` API calls |
| `reasoningLevelCap` | string | -- | Hard cap on auto-bumped reasoning: `"minimal"`, `"low"`, `"mid"`, `"high"` |
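
A "minimum turns between calls" cadence such as `contextCadence` or `dialecticCadence` can be modeled with a small gate. This is a sketch only: the `CadenceGate` class is hypothetical, and treating the cadence as "fire when at least N turns have passed since the last firing" is an assumption about the exact comparison. The `-999` starting value mirrors the sentinel the provider assigns to its last-turn counters.

```python
class CadenceGate:
    """Allow an API call at most once every `cadence` turns (hypothetical helper)."""

    def __init__(self, cadence: int = 1) -> None:
        self.cadence = max(1, cadence)
        self.last_turn = -999  # sentinel: guarantees the first turn fires

    def should_fire(self, turn: int) -> bool:
        """Return True and record the turn if enough turns have elapsed."""
        if turn - self.last_turn >= self.cadence:
            self.last_turn = turn
            return True
        return False
```

With a cadence of 3, turns 1, 4 and 7 fire and the turns in between are skipped; a cadence of 1 fires on every turn.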
### Hardcoded Limits (Not Configurable)
| Limit | Value | Location |
|-------|-------|----------|
| Search tool max tokens | 2000 (hard cap), 800 (default) | `__init__.py` handle_tool_call |
| Peer card fetch tokens | 200 | `session.py` get_peer_card |
## Config Precedence
For every key, resolution order is: **host block > root > env var > default**.
Host key derivation: `HERMES_HONCHO_HOST` env > active profile (`hermes.<profile>`) > `"hermes"`.
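
The precedence rule can be sketched as a lookup over a parsed `honcho.json`. This is a minimal illustration, not the provider's config loader: `resolve_key` and `derive_host_key` are hypothetical names, and the sketch ignores the per-key scope restrictions listed in the tables (some keys are root-only).

```python
import os
from typing import Any, Dict, Optional


def derive_host_key(profile: Optional[str] = None) -> str:
    """HERMES_HONCHO_HOST env > active profile > "hermes" (hypothetical helper)."""
    explicit = os.environ.get("HERMES_HONCHO_HOST")
    if explicit:
        return explicit
    if profile:
        return f"hermes.{profile}"
    return "hermes"


def resolve_key(
    config: Dict[str, Any],
    host_key: str,
    key: str,
    env_var: Optional[str] = None,
    default: Any = None,
) -> Any:
    """Resolve one setting: host block > root > env var > default."""
    host_block = config.get("hosts", {}).get(host_key, {})
    if key in host_block:
        return host_block[key]
    if key in config:
        return config[key]
    if env_var and os.environ.get(env_var):
        return os.environ[env_var]
    return default
```

Because host blocks fall through to the root object, a shared value such as `workspace` only needs to be declared once at the root.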
## Environment Variables
@@ -287,16 +182,15 @@ Presets:
```json
{
"apiKey": "***",
"apiKey": "your-key",
"workspace": "hermes",
"peerName": "username",
"contextCadence": 2,
"dialecticCadence": 3,
"dialecticDepth": 2,
"peerName": "eri",
"hosts": {
"hermes": {
"enabled": true,
"aiPeer": "hermes",
"workspace": "hermes",
"peerName": "eri",
"recallMode": "hybrid",
"observation": {
"user": { "observeMe": true, "observeOthers": true },
@@ -305,16 +199,14 @@ Presets:
"writeFrequency": "async",
"sessionStrategy": "per-directory",
"dialecticReasoningLevel": "low",
"dialecticDepth": 2,
"dialecticMaxChars": 600,
"saveMessages": true
},
"hermes.coder": {
"enabled": true,
"aiPeer": "coder",
"sessionStrategy": "per-repo",
"dialecticDepth": 1,
"dialecticDepthLevels": ["low"],
"workspace": "hermes",
"peerName": "eri",
"observation": {
"user": { "observeMe": true, "observeOthers": false },
"ai": { "observeMe": true, "observeOthers": true }
+82 -414
@@ -17,7 +17,6 @@ from __future__ import annotations
import json
import logging
import re
import threading
from typing import Any, Dict, List, Optional
@@ -34,33 +33,20 @@ logger = logging.getLogger(__name__)
PROFILE_SCHEMA = {
"name": "honcho_profile",
"description": (
"Retrieve or update a peer card from Honcho — a curated list of key facts "
"about that peer (name, role, preferences, communication style, patterns). "
"Pass `card` to update; omit `card` to read."
"Retrieve the user's peer card from Honcho — a curated list of key facts "
"about them (name, role, preferences, communication style, patterns). "
"Fast, no LLM reasoning, minimal cost. "
"Use this at conversation start or when you need a quick factual snapshot."
),
"parameters": {
"type": "object",
"properties": {
"peer": {
"type": "string",
"description": "Peer to query. Built-in aliases: 'user' (default), 'ai'. Or pass any peer ID from this workspace.",
},
"card": {
"type": "array",
"items": {"type": "string"},
"description": "New peer card as a list of fact strings. Omit to read the current card.",
},
},
"required": [],
},
"parameters": {"type": "object", "properties": {}, "required": []},
}
SEARCH_SCHEMA = {
"name": "honcho_search",
"description": (
"Semantic search over Honcho's stored context about a peer. "
"Semantic search over Honcho's stored context about the user. "
"Returns raw excerpts ranked by relevance — no LLM synthesis. "
"Cheaper and faster than honcho_reasoning. "
"Cheaper and faster than honcho_context. "
"Good when you want to find specific past facts and reason over them yourself."
),
"parameters": {
@@ -74,49 +60,6 @@ SEARCH_SCHEMA = {
"type": "integer",
"description": "Token budget for returned context (default 800, max 2000).",
},
"peer": {
"type": "string",
"description": "Peer to query. Built-in aliases: 'user' (default), 'ai'. Or pass any peer ID from this workspace.",
},
},
"required": ["query"],
},
}
REASONING_SCHEMA = {
"name": "honcho_reasoning",
"description": (
"Ask Honcho a natural language question and get a synthesized answer. "
"Uses Honcho's LLM (dialectic reasoning) — higher cost than honcho_profile or honcho_search. "
"Can query about any peer via alias or explicit peer ID. "
"Pass reasoning_level to control depth: minimal (fast/cheap), low (default), "
"medium, high, max (deep/expensive). Omit for configured default."
),
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "A natural language question.",
},
"reasoning_level": {
"type": "string",
"description": (
"Override the default reasoning depth. "
"Omit to use the configured default (typically low). "
"Guide:\n"
"- minimal: quick factual lookups (name, role, simple preference)\n"
"- low: straightforward questions with clear answers\n"
"- medium: multi-aspect questions requiring synthesis across observations\n"
"- high: complex behavioral patterns, contradictions, deep analysis\n"
"- max: thorough audit-level analysis, leave no stone unturned"
),
"enum": ["minimal", "low", "medium", "high", "max"],
},
"peer": {
"type": "string",
"description": "Peer to query. Built-in aliases: 'user' (default), 'ai'. Or pass any peer ID from this workspace.",
},
},
"required": ["query"],
},
@@ -125,58 +68,48 @@ REASONING_SCHEMA = {
CONTEXT_SCHEMA = {
"name": "honcho_context",
"description": (
"Retrieve full session context from Honcho — summary, peer representation, "
"peer card, and recent messages. No LLM synthesis. "
"Cheaper than honcho_reasoning. Use this to see what Honcho knows about "
"the current conversation and the specified peer."
"Ask Honcho a natural language question and get a synthesized answer. "
"Uses Honcho's LLM (dialectic reasoning) — higher cost than honcho_profile or honcho_search. "
"Can query about any peer: the user (default) or the AI assistant."
),
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Optional focus query to filter context. Omit for full session context snapshot.",
"description": "A natural language question.",
},
"peer": {
"type": "string",
"description": "Peer to query. Built-in aliases: 'user' (default), 'ai'. Or pass any peer ID from this workspace.",
"description": "Which peer to query about: 'user' (default) or 'ai'.",
},
},
"required": [],
"required": ["query"],
},
}
CONCLUDE_SCHEMA = {
"name": "honcho_conclude",
"description": (
"Write or delete a conclusion about a peer in Honcho's memory. "
"Conclusions are persistent facts that build a peer's profile. "
"You MUST pass exactly one of: `conclusion` (to create) or `delete_id` (to delete). "
"Passing neither is an error. "
"Deletion is only for PII removal — Honcho self-heals incorrect conclusions over time."
"Write a conclusion about the user back to Honcho's memory. "
"Conclusions are persistent facts that build the user's profile. "
"Use when the user states a preference, corrects you, or shares "
"something to remember across sessions."
),
"parameters": {
"type": "object",
"properties": {
"conclusion": {
"type": "string",
"description": "A factual statement to persist. Provide this when creating a conclusion. Do not send it together with delete_id.",
},
"delete_id": {
"type": "string",
"description": "Conclusion ID to delete for PII removal. Provide this when deleting a conclusion. Do not send it together with conclusion.",
},
"peer": {
"type": "string",
"description": "Peer to query. Built-in aliases: 'user' (default), 'ai'. Or pass any peer ID from this workspace.",
},
"description": "A factual statement about the user to persist.",
}
},
"required": [],
"required": ["conclusion"],
},
}
ALL_TOOL_SCHEMAS = [PROFILE_SCHEMA, SEARCH_SCHEMA, REASONING_SCHEMA, CONTEXT_SCHEMA, CONCLUDE_SCHEMA]
ALL_TOOL_SCHEMAS = [PROFILE_SCHEMA, SEARCH_SCHEMA, CONTEXT_SCHEMA, CONCLUDE_SCHEMA]
# ---------------------------------------------------------------------------
@@ -198,18 +131,16 @@ class HonchoMemoryProvider(MemoryProvider):
# B1: recall_mode — set during initialize from config
self._recall_mode = "hybrid" # "context", "tools", or "hybrid"
# Base context cache — refreshed on context_cadence, not frozen
self._base_context_cache: Optional[str] = None
self._base_context_lock = threading.Lock()
# B4: First-turn context baking
self._first_turn_context: Optional[str] = None
self._first_turn_lock = threading.Lock()
# B5: Cost-awareness turn counting and cadence
self._turn_count = 0
self._injection_frequency = "every-turn" # or "first-turn"
self._context_cadence = 1 # minimum turns between context API calls
self._dialectic_cadence = 3 # minimum turns between dialectic API calls
self._dialectic_depth = 1 # how many .chat() calls per dialectic cycle (1-3)
self._dialectic_depth_levels: list[str] | None = None # per-pass reasoning levels
self._reasoning_level_cap: Optional[str] = None # "minimal", "low", "medium", "high"
self._dialectic_cadence = 1 # minimum turns between dialectic API calls
self._reasoning_level_cap: Optional[str] = None # "minimal", "low", "mid", "high"
self._last_context_turn = -999
self._last_dialectic_turn = -999
@@ -305,11 +236,9 @@ class HonchoMemoryProvider(MemoryProvider):
raw = cfg.raw or {}
self._injection_frequency = raw.get("injectionFrequency", "every-turn")
self._context_cadence = int(raw.get("contextCadence", 1))
self._dialectic_cadence = int(raw.get("dialecticCadence", 3))
self._dialectic_depth = max(1, min(cfg.dialectic_depth, 3))
self._dialectic_depth_levels = cfg.dialectic_depth_levels
self._dialectic_cadence = int(raw.get("dialecticCadence", 1))
cap = raw.get("reasoningLevelCap")
if cap and cap in ("minimal", "low", "medium", "high"):
if cap and cap in ("minimal", "low", "mid", "high"):
self._reasoning_level_cap = cap
except Exception as e:
logger.debug("Honcho cost-awareness config parse error: %s", e)
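Reviewer note: the cost-awareness parsing in this hunk can be exercised in isolation. The following is a minimal sketch, not the provider's code: `parse_cost_settings`, `VALID_CAPS`, and the returned dict shape are hypothetical names introduced for illustration. It follows the variant of the hunk that defaults `dialecticCadence` to 3 and accepts `medium`; the other variant defaults to 1 and accepts `mid`.

```python
from typing import Any, Dict, Optional

# Assumed set of accepted cap values (the "medium" variant of this hunk).
VALID_CAPS = ("minimal", "low", "medium", "high")


def parse_cost_settings(raw: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: read cadence settings from a raw config dict."""
    raw = raw or {}
    settings: Dict[str, Any] = {
        "injection_frequency": raw.get("injectionFrequency", "every-turn"),
        "context_cadence": int(raw.get("contextCadence", 1)),
        "dialectic_cadence": int(raw.get("dialecticCadence", 3)),
        "reasoning_level_cap": None,
    }
    cap = raw.get("reasoningLevelCap")
    # Unknown cap values are ignored rather than raising.
    if cap in VALID_CAPS:
        settings["reasoning_level_cap"] = cap
    return settings
```

One consequence worth noting: a config written for one variant (for example `"reasoningLevelCap": "mid"`) is silently dropped by the other, because an unrecognized value leaves the cap unset.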
@@ -322,7 +251,9 @@ class HonchoMemoryProvider(MemoryProvider):
# ----- Port #1957: lazy session init for tools-only mode -----
if self._recall_mode == "tools":
if cfg.init_on_session_start:
# Eager init even in tools mode (opt-in)
# Eager init: create session now so sync_turn() works from turn 1.
# Does NOT enable auto-injection — prefetch() still returns empty.
logger.debug("Honcho tools-only mode — eager session init (initOnSessionStart=true)")
self._do_session_init(cfg, session_id, **kwargs)
return
# Defer actual session creation until first tool call
@@ -356,13 +287,8 @@ class HonchoMemoryProvider(MemoryProvider):
# ----- B3: resolve_session_name -----
session_title = kwargs.get("session_title")
gateway_session_key = kwargs.get("gateway_session_key")
self._session_key = (
cfg.resolve_session_name(
session_title=session_title,
session_id=session_id,
gateway_session_key=gateway_session_key,
)
cfg.resolve_session_name(session_title=session_title, session_id=session_id)
or session_id
or "hermes-default"
)
@@ -373,21 +299,12 @@ class HonchoMemoryProvider(MemoryProvider):
self._session_initialized = True
# ----- B6: Memory file migration (one-time, for new sessions) -----
# Skip under per-session strategy: every Hermes run creates a fresh
# Honcho session by design, so uploading MEMORY.md/USER.md/SOUL.md to
# each one would flood the backend with short-lived duplicates instead
# of performing a one-time migration.
try:
if not session.messages and cfg.session_strategy != "per-session":
if not session.messages:
from hermes_constants import get_hermes_home
mem_dir = str(get_hermes_home() / "memories")
self._manager.migrate_memory_files(self._session_key, mem_dir)
logger.debug("Honcho memory file migration attempted for new session: %s", self._session_key)
elif cfg.session_strategy == "per-session":
logger.debug(
"Honcho memory file migration skipped: per-session strategy creates a fresh session per run (%s)",
self._session_key,
)
except Exception as e:
logger.debug("Honcho memory file migration skipped: %s", e)
@@ -430,11 +347,6 @@ class HonchoMemoryProvider(MemoryProvider):
"""Format the prefetch context dict into a readable system prompt block."""
parts = []
# Session summary — session-scoped context, placed first for relevance
summary = ctx.get("summary", "")
if summary:
parts.append(f"## Session Summary\n{summary}")
rep = ctx.get("representation", "")
if rep:
parts.append(f"## User Representation\n{rep}")
@@ -458,9 +370,9 @@ class HonchoMemoryProvider(MemoryProvider):
def system_prompt_block(self) -> str:
"""Return system prompt text, adapted by recall_mode.
Returns only the mode header and tool instructions: static text
that doesn't change between turns (prompt-cache friendly).
Live context (representation, card) is injected via prefetch().
B4: On the FIRST call, fetch and bake the full Honcho context
(user representation, peer card, AI representation, continuity synthesis).
Subsequent calls return the cached block for prompt caching stability.
"""
if self._cron_skipped:
return ""
@@ -470,10 +382,24 @@ class HonchoMemoryProvider(MemoryProvider):
return (
"# Honcho Memory\n"
"Active (tools-only mode). Use honcho_profile, honcho_search, "
"honcho_reasoning, honcho_context, and honcho_conclude tools to access user memory."
"honcho_context, and honcho_conclude tools to access user memory."
)
return ""
# ----- B4: First-turn context baking -----
first_turn_block = ""
if self._recall_mode in ("context", "hybrid"):
with self._first_turn_lock:
if self._first_turn_context is None:
# First call — fetch and cache
try:
ctx = self._manager.get_prefetch_context(self._session_key)
self._first_turn_context = self._format_first_turn_context(ctx) if ctx else ""
except Exception as e:
logger.debug("Honcho first-turn context fetch failed: %s", e)
self._first_turn_context = ""
first_turn_block = self._first_turn_context
# ----- B1: adapt text based on recall_mode -----
if self._recall_mode == "context":
header = (
@@ -486,8 +412,7 @@ class HonchoMemoryProvider(MemoryProvider):
header = (
"# Honcho Memory\n"
"Active (tools-only mode). Use honcho_profile for a quick factual snapshot, "
"honcho_search for raw excerpts, honcho_context for raw peer context, "
"honcho_reasoning for synthesized answers, "
"honcho_search for raw excerpts, honcho_context for synthesized answers, "
"honcho_conclude to save facts about the user. "
"No automatic context injection — you must use tools to access memory."
)
@@ -496,19 +421,16 @@ class HonchoMemoryProvider(MemoryProvider):
"# Honcho Memory\n"
"Active (hybrid mode). Relevant context is auto-injected AND memory tools are available. "
"Use honcho_profile for a quick factual snapshot, "
"honcho_search for raw excerpts, honcho_context for raw peer context, "
"honcho_reasoning for synthesized answers, "
"honcho_search for raw excerpts, honcho_context for synthesized answers, "
"honcho_conclude to save facts about the user."
)
if first_turn_block:
return f"{header}\n\n{first_turn_block}"
return header
def prefetch(self, query: str, *, session_id: str = "") -> str:
"""Return base context (representation + card) plus dialectic supplement.
Assembles two layers:
1. Base context from peer.context() (cached, refreshed on context_cadence)
2. Dialectic supplement (cached, refreshed on dialectic_cadence)
"""Return prefetched dialectic context from background thread.
B1: Returns empty when recall_mode is "tools" (no injection).
B5: Respects injection_frequency: "first-turn" returns cached/empty after turn 0.
@@ -521,95 +443,22 @@ class HonchoMemoryProvider(MemoryProvider):
if self._recall_mode == "tools":
return ""
# B5: injection_frequency — if "first-turn" and past first turn, return empty.
# _turn_count is 1-indexed (first user message = 1), so > 1 means "past first".
if self._injection_frequency == "first-turn" and self._turn_count > 1:
# B5: injection_frequency — if "first-turn" and past first turn, return empty
if self._injection_frequency == "first-turn" and self._turn_count > 0:
return ""
parts = []
# ----- Layer 1: Base context (representation + card) -----
# On first call, fetch synchronously so turn 1 isn't empty.
# After that, serve from cache and refresh in background on cadence.
with self._base_context_lock:
if self._base_context_cache is None:
# First call — synchronous fetch
try:
ctx = self._manager.get_prefetch_context(self._session_key)
self._base_context_cache = self._format_first_turn_context(ctx) if ctx else ""
self._last_context_turn = self._turn_count
except Exception as e:
logger.debug("Honcho base context fetch failed: %s", e)
self._base_context_cache = ""
base_context = self._base_context_cache
# Check if background context prefetch has a fresher result
if self._manager:
fresh_ctx = self._manager.pop_context_result(self._session_key)
if fresh_ctx:
formatted = self._format_first_turn_context(fresh_ctx)
if formatted:
with self._base_context_lock:
self._base_context_cache = formatted
base_context = formatted
if base_context:
parts.append(base_context)
# ----- Layer 2: Dialectic supplement -----
# On the very first turn, no queue_prefetch() has run yet so the
# dialectic result is empty. Run with a bounded timeout so a slow
# Honcho connection doesn't block the first response indefinitely.
# On timeout the result is skipped and queue_prefetch() will pick it
# up at the next cadence-allowed turn.
if self._last_dialectic_turn == -999 and query:
_first_turn_timeout = (
self._config.timeout if self._config and self._config.timeout else 8.0
)
_result_holder: list[str] = []
def _run_first_turn() -> None:
try:
_result_holder.append(self._run_dialectic_depth(query))
except Exception as exc:
logger.debug("Honcho first-turn dialectic failed: %s", exc)
_t = threading.Thread(target=_run_first_turn, daemon=True)
_t.start()
_t.join(timeout=_first_turn_timeout)
if not _t.is_alive():
first_turn_dialectic = _result_holder[0] if _result_holder else ""
if first_turn_dialectic and first_turn_dialectic.strip():
with self._prefetch_lock:
self._prefetch_result = first_turn_dialectic
self._last_dialectic_turn = self._turn_count
else:
logger.debug(
"Honcho first-turn dialectic timed out (%.1fs) — "
"will inject at next cadence-allowed turn",
_first_turn_timeout,
)
# Don't update _last_dialectic_turn: queue_prefetch() will
# retry at the next cadence-allowed turn via the async path.
if self._prefetch_thread and self._prefetch_thread.is_alive():
self._prefetch_thread.join(timeout=3.0)
with self._prefetch_lock:
dialectic_result = self._prefetch_result
result = self._prefetch_result
self._prefetch_result = ""
if dialectic_result and dialectic_result.strip():
parts.append(dialectic_result)
if not parts:
if not result:
return ""
result = "\n\n".join(parts)
# ----- Port #3265: token budget enforcement -----
result = self._truncate_to_budget(result)
return result
return f"## Honcho Context\n{result}"
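Reviewer note: one variant of `prefetch()` runs the first-turn dialectic in a daemon thread and waits with a bounded `join`. The pattern can be sketched on its own as below. `run_with_timeout` is a hypothetical helper, not part of the provider; it only assumes the standard `threading` module.

```python
import threading
import time
from typing import Callable, List, Optional


def run_with_timeout(fn: Callable[[], str], timeout: float) -> Optional[str]:
    """Run fn in a daemon thread; return None if it has not finished in time."""
    holder: List[str] = []

    def _worker() -> None:
        try:
            holder.append(fn())
        except Exception:
            # A failing call is treated as "no result", as in the hunk above.
            pass

    thread = threading.Thread(target=_worker, daemon=True)
    thread.start()
    thread.join(timeout=timeout)
    if thread.is_alive():
        # Timed out: the worker keeps running, but its result is ignored here.
        return None
    return holder[0] if holder else ""
```

The thread is not cancelled on timeout; it is simply abandoned, which is why the worker is a daemon thread and why the caller must not assume the result list is final after a timeout.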
def _truncate_to_budget(self, text: str) -> str:
"""Truncate text to fit within context_tokens budget if set."""
@@ -626,11 +475,9 @@ class HonchoMemoryProvider(MemoryProvider):
return truncated + ""
def queue_prefetch(self, query: str, *, session_id: str = "") -> None:
"""Fire background prefetch threads for the upcoming turn.
"""Fire a background dialectic query for the upcoming turn.
B5: Checks cadence independently for dialectic and context refresh.
Context refresh updates the base layer (representation + card).
Dialectic fires the LLM reasoning supplement.
B5: Checks cadence before firing background threads.
"""
if self._cron_skipped:
return
@@ -641,15 +488,6 @@ class HonchoMemoryProvider(MemoryProvider):
if self._recall_mode == "tools":
return
# ----- Context refresh (base layer) — independent cadence -----
if self._context_cadence <= 1 or (self._turn_count - self._last_context_turn) >= self._context_cadence:
self._last_context_turn = self._turn_count
try:
self._manager.prefetch_context(self._session_key, query)
except Exception as e:
logger.debug("Honcho context prefetch failed: %s", e)
# ----- Dialectic prefetch (supplement layer) -----
# B5: cadence check — skip if too soon since last dialectic call
if self._dialectic_cadence > 1:
if (self._turn_count - self._last_dialectic_turn) < self._dialectic_cadence:
@@ -661,7 +499,9 @@ class HonchoMemoryProvider(MemoryProvider):
def _run():
try:
result = self._run_dialectic_depth(query)
result = self._manager.dialectic_query(
self._session_key, query, peer="user"
)
if result and result.strip():
with self._prefetch_lock:
self._prefetch_result = result
@@ -673,140 +513,13 @@ class HonchoMemoryProvider(MemoryProvider):
)
self._prefetch_thread.start()
# ----- Dialectic depth: multi-pass .chat() with cold/warm prompts -----
# Proportional reasoning levels per depth/pass when dialecticDepthLevels
# is not configured. The base level is dialecticReasoningLevel.
# Index: (depth, pass) → level relative to base.
_PROPORTIONAL_LEVELS: dict[tuple[int, int], str] = {
# depth 1: single pass at base level
(1, 0): "base",
# depth 2: pass 0 lighter, pass 1 at base
(2, 0): "minimal",
(2, 1): "base",
# depth 3: pass 0 lighter, pass 1 at base, pass 2 one above minimal
(3, 0): "minimal",
(3, 1): "base",
(3, 2): "low",
}
_LEVEL_ORDER = ("minimal", "low", "medium", "high", "max")
def _resolve_pass_level(self, pass_idx: int) -> str:
"""Resolve reasoning level for a given pass index.
Uses dialecticDepthLevels if configured, otherwise proportional
defaults relative to dialecticReasoningLevel.
"""
if self._dialectic_depth_levels and pass_idx < len(self._dialectic_depth_levels):
return self._dialectic_depth_levels[pass_idx]
base = (self._config.dialectic_reasoning_level if self._config else "low")
mapping = self._PROPORTIONAL_LEVELS.get((self._dialectic_depth, pass_idx))
if mapping is None or mapping == "base":
return base
return mapping
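Reviewer note: the per-pass level resolution above is easier to check as a pure function. This is a sketch under the assumption that the table and lookup behave exactly as shown in the hunk; `resolve_pass_level` and its parameters are hypothetical stand-ins for the method and the instance state it reads.

```python
from typing import Dict, List, Optional, Tuple

# (depth, pass index) -> level, where "base" means "use the configured base level".
PROPORTIONAL_LEVELS: Dict[Tuple[int, int], str] = {
    (1, 0): "base",
    (2, 0): "minimal",
    (2, 1): "base",
    (3, 0): "minimal",
    (3, 1): "base",
    (3, 2): "low",
}


def resolve_pass_level(
    depth: int,
    pass_idx: int,
    base: str = "low",
    explicit: Optional[List[str]] = None,
) -> str:
    """Explicit per-pass levels win; otherwise fall back to the table."""
    if explicit and pass_idx < len(explicit):
        return explicit[pass_idx]
    mapping = PROPORTIONAL_LEVELS.get((depth, pass_idx))
    if mapping is None or mapping == "base":
        return base
    return mapping
```

With a base of `high` and depth 3, the three passes resolve to `minimal`, `high`, `low`, so the final pass runs below the base level rather than above it.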
def _build_dialectic_prompt(self, pass_idx: int, prior_results: list[str], is_cold: bool) -> str:
"""Build the prompt for a given dialectic pass.
Pass 0: cold start (general user query) or warm (session-scoped).
Pass 1: self-audit / targeted synthesis against gaps from pass 0.
Pass 2: reconciliation / contradiction check across prior passes.
"""
if pass_idx == 0:
if is_cold:
return (
"Who is this person? What are their preferences, goals, "
"and working style? Focus on facts that would help an AI "
"assistant be immediately useful."
)
return (
"Given what's been discussed in this session so far, what "
"context about this user is most relevant to the current "
"conversation? Prioritize active context over biographical facts."
)
elif pass_idx == 1:
prior = prior_results[-1] if prior_results else ""
return (
f"Given this initial assessment:\n\n{prior}\n\n"
"What gaps remain in your understanding that would help "
"going forward? Synthesize what you actually know about "
"the user's current state and immediate needs, grounded "
"in evidence from recent sessions."
)
else:
# pass 2: reconciliation
return (
f"Prior passes produced:\n\n"
f"Pass 1:\n{prior_results[0] if len(prior_results) > 0 else '(empty)'}\n\n"
f"Pass 2:\n{prior_results[1] if len(prior_results) > 1 else '(empty)'}\n\n"
"Do these assessments cohere? Reconcile any contradictions "
"and produce a final, concise synthesis of what matters most "
"for the current conversation."
)
@staticmethod
def _signal_sufficient(result: str) -> bool:
"""Check if a dialectic pass returned enough signal to skip further passes.
Heuristic: a response longer than 100 chars with some structure
(section headers, bullets, or an ordered list) is considered sufficient.
"""
if not result or len(result.strip()) < 100:
return False
# Structured output with sections/bullets is strong signal
if "\n" in result and (
"##" in result
or "•" in result
or re.search(r"^[*-] ", result, re.MULTILINE)
or re.search(r"^\s*\d+\. ", result, re.MULTILINE)
):
return True
# Long enough even without structure
return len(result.strip()) > 300
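Reviewer note: the early-exit heuristic can be tested standalone. The sketch below mirrors the thresholds in the hunk (100 and 300 characters). The bullet glyph `•` is an assumption: the literal in the rendered diff is empty, and the docstring's mention of bullets suggests a character was lost. `signal_sufficient` is a hypothetical free-function version of the static method.

```python
import re


def signal_sufficient(result: str) -> bool:
    """Return True when a pass produced enough signal to skip later passes."""
    if not result or len(result.strip()) < 100:
        return False
    # Multi-line output with headers, bullets, or a numbered list counts as structured.
    if "\n" in result and (
        "##" in result
        or "•" in result  # assumed glyph, see note above
        or re.search(r"^[*-] ", result, re.MULTILINE)
        or re.search(r"^\s*\d+\. ", result, re.MULTILINE)
    ):
        return True
    # Unstructured text needs to be longer to qualify.
    return len(result.strip()) > 300
```

Note that with an empty string literal in place of the glyph, the membership test would always succeed, so any multi-line result of 100 characters or more would count as sufficient.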
def _run_dialectic_depth(self, query: str) -> str:
"""Execute up to dialecticDepth .chat() calls with conditional bail-out.
Cold start (no base context): general user-oriented query.
Warm session (base context exists): session-scoped query.
Each pass is conditional: it bails early if the prior pass returned a strong signal.
Returns the best (usually last) result.
"""
if not self._manager or not self._session_key:
return ""
is_cold = not self._base_context_cache
results: list[str] = []
for i in range(self._dialectic_depth):
if i == 0:
prompt = self._build_dialectic_prompt(0, results, is_cold)
else:
# Skip further passes if prior pass delivered strong signal
if results and self._signal_sufficient(results[-1]):
logger.debug("Honcho dialectic depth %d: pass %d skipped, prior signal sufficient",
self._dialectic_depth, i)
break
prompt = self._build_dialectic_prompt(i, results, is_cold)
level = self._resolve_pass_level(i)
logger.debug("Honcho dialectic depth %d: pass %d, level=%s, cold=%s",
self._dialectic_depth, i, level, is_cold)
result = self._manager.dialectic_query(
self._session_key, prompt,
reasoning_level=level,
peer="user",
)
results.append(result or "")
# Return the last non-empty result (deepest pass that ran)
for r in reversed(results):
if r and r.strip():
return r
return ""
# Also fire context prefetch if cadence allows
if self._context_cadence <= 1 or (self._turn_count - self._last_context_turn) >= self._context_cadence:
self._last_context_turn = self._turn_count
try:
self._manager.prefetch_context(self._session_key, query)
except Exception as e:
logger.debug("Honcho context prefetch failed: %s", e)
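Reviewer note: both cadence checks in `queue_prefetch()` follow the same rule, namely fire when the cadence is 1 or lower, or when enough turns have passed since the last call. A minimal sketch of that rule follows; `CadenceGate` is a hypothetical class, and the `-999` sentinel matches the initial value of `_last_context_turn` and `_last_dialectic_turn`.

```python
class CadenceGate:
    """Hypothetical helper: allow an action at most once every `cadence` turns."""

    def __init__(self, cadence: int, never: int = -999) -> None:
        self.cadence = cadence
        self.last_turn = never  # sentinel meaning "has never fired"

    def should_fire(self, turn: int) -> bool:
        if self.cadence <= 1 or (turn - self.last_turn) >= self.cadence:
            self.last_turn = turn
            return True
        return False
```

Usage: with a cadence of 3, turns 1 through 7 fire on turns 1, 4, and 7. The sentinel guarantees the first turn always fires regardless of the cadence.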
def on_turn_start(self, turn_number: int, message: str, **kwargs) -> None:
"""Track turn count for cadence and injection_frequency logic."""
@@ -946,14 +659,7 @@ class HonchoMemoryProvider(MemoryProvider):
try:
if tool_name == "honcho_profile":
peer = args.get("peer", "user")
card_update = args.get("card")
if card_update:
result = self._manager.set_peer_card(self._session_key, card_update, peer=peer)
if result is None:
return tool_error("Failed to update peer card.")
return json.dumps({"result": f"Peer card updated ({len(result)} facts).", "card": result})
card = self._manager.get_peer_card(self._session_key, peer=peer)
card = self._manager.get_peer_card(self._session_key)
if not card:
return json.dumps({"result": "No profile facts available yet."})
return json.dumps({"result": card})
@@ -963,68 +669,30 @@ class HonchoMemoryProvider(MemoryProvider):
if not query:
return tool_error("Missing required parameter: query")
max_tokens = min(int(args.get("max_tokens", 800)), 2000)
peer = args.get("peer", "user")
result = self._manager.search_context(
self._session_key, query, max_tokens=max_tokens, peer=peer
self._session_key, query, max_tokens=max_tokens
)
if not result:
return json.dumps({"result": "No relevant context found."})
return json.dumps({"result": result})
elif tool_name == "honcho_reasoning":
elif tool_name == "honcho_context":
query = args.get("query", "")
if not query:
return tool_error("Missing required parameter: query")
peer = args.get("peer", "user")
reasoning_level = args.get("reasoning_level")
result = self._manager.dialectic_query(
self._session_key, query,
reasoning_level=reasoning_level,
peer=peer,
self._session_key, query, peer=peer
)
# Update cadence tracker so auto-injection respects the gap after an explicit call
self._last_dialectic_turn = self._turn_count
return json.dumps({"result": result or "No result from Honcho."})
elif tool_name == "honcho_context":
peer = args.get("peer", "user")
ctx = self._manager.get_session_context(self._session_key, peer=peer)
if not ctx:
return json.dumps({"result": "No context available yet."})
parts = []
if ctx.get("summary"):
parts.append(f"## Summary\n{ctx['summary']}")
if ctx.get("representation"):
parts.append(f"## Representation\n{ctx['representation']}")
if ctx.get("card"):
parts.append(f"## Card\n{ctx['card']}")
if ctx.get("recent_messages"):
msgs = ctx["recent_messages"]
msg_str = "\n".join(
f" [{m['role']}] {m['content'][:200]}"
for m in msgs[-5:] # last 5 for brevity
)
parts.append(f"## Recent messages\n{msg_str}")
return json.dumps({"result": "\n\n".join(parts) or "No context available."})
elif tool_name == "honcho_conclude":
delete_id = (args.get("delete_id") or "").strip()
conclusion = args.get("conclusion", "").strip()
peer = args.get("peer", "user")
has_delete_id = bool(delete_id)
has_conclusion = bool(conclusion)
if has_delete_id == has_conclusion:
return tool_error("Exactly one of conclusion or delete_id must be provided.")
if has_delete_id:
ok = self._manager.delete_conclusion(self._session_key, delete_id, peer=peer)
if ok:
return json.dumps({"result": f"Conclusion {delete_id} deleted."})
return tool_error(f"Failed to delete conclusion {delete_id}.")
ok = self._manager.create_conclusion(self._session_key, conclusion, peer=peer)
conclusion = args.get("conclusion", "")
if not conclusion:
return tool_error("Missing required parameter: conclusion")
ok = self._manager.create_conclusion(self._session_key, conclusion)
if ok:
return json.dumps({"result": f"Conclusion saved for {peer}: {conclusion}"})
return json.dumps({"result": f"Conclusion saved: {conclusion}"})
return tool_error("Failed to save conclusion.")
return tool_error(f"Unknown tool: {tool_name}")
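Reviewer note: one variant of the `honcho_conclude` branch requires exactly one of `conclusion` or `delete_id`. The check reduces to comparing two booleans, sketched below. `validate_conclude_args` is a hypothetical helper that returns an error message or `None`; it does not call the provider's `tool_error`.

```python
from typing import Any, Dict, Optional


def validate_conclude_args(args: Dict[str, Any]) -> Optional[str]:
    """Return an error message unless exactly one argument is provided."""
    delete_id = (args.get("delete_id") or "").strip()
    conclusion = (args.get("conclusion") or "").strip()
    # Equal truthiness means both were given or neither was.
    if bool(delete_id) == bool(conclusion):
        return "Exactly one of conclusion or delete_id must be provided."
    return None
```

Whitespace-only values count as missing because both fields are stripped before the comparison.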
+16 -109
@@ -440,43 +440,11 @@ def cmd_setup(args) -> None:
if new_recall in ("hybrid", "context", "tools"):
hermes_host["recallMode"] = new_recall
# --- 7. Context token budget ---
current_ctx_tokens = hermes_host.get("contextTokens") or cfg.get("contextTokens")
current_display = str(current_ctx_tokens) if current_ctx_tokens else "uncapped"
print("\n Context injection per turn (hybrid/context recall modes only):")
print(" uncapped -- no limit (default)")
print(" N -- token limit per turn (e.g. 1200)")
new_ctx_tokens = _prompt("Context tokens", default=current_display)
if new_ctx_tokens.strip().lower() in ("none", "uncapped", "no limit"):
hermes_host.pop("contextTokens", None)
elif new_ctx_tokens.strip() == "":
pass # keep current
else:
try:
val = int(new_ctx_tokens)
if val >= 0:
hermes_host["contextTokens"] = val
except (ValueError, TypeError):
pass # keep current
# --- 7b. Dialectic cadence ---
current_dialectic = str(hermes_host.get("dialecticCadence") or cfg.get("dialecticCadence") or "3")
print("\n Dialectic cadence:")
print(" How often Honcho rebuilds its user model (LLM call on Honcho backend).")
print(" 1 = every turn (aggressive), 3 = every 3 turns (recommended), 5+ = sparse.")
new_dialectic = _prompt("Dialectic cadence", default=current_dialectic)
try:
val = int(new_dialectic)
if val >= 1:
hermes_host["dialecticCadence"] = val
except (ValueError, TypeError):
hermes_host["dialecticCadence"] = 3
# --- 8. Session strategy ---
current_strat = hermes_host.get("sessionStrategy") or cfg.get("sessionStrategy", "per-session")
# --- 7. Session strategy ---
current_strat = hermes_host.get("sessionStrategy") or cfg.get("sessionStrategy", "per-directory")
print("\n Session strategy:")
print(" per-session -- each run starts clean, Honcho injects context automatically")
print(" per-directory -- reuses session per dir, prior context auto-injected each run")
print(" per-directory -- one session per working directory (default)")
print(" per-session -- new Honcho session each run")
print(" per-repo -- one session per git repository")
print(" global -- single session across all directories")
new_strat = _prompt("Session strategy", default=current_strat)
@@ -522,11 +490,10 @@ def cmd_setup(args) -> None:
print(f" Recall: {hcfg.recall_mode}")
print(f" Sessions: {hcfg.session_strategy}")
print("\n Honcho tools available in chat:")
print(" honcho_context -- session context: summary, representation, card, messages")
print(" honcho_search -- semantic search over history")
print(" honcho_profile -- peer card, key facts")
print(" honcho_reasoning -- ask Honcho a question, synthesized answer")
print(" honcho_conclude -- persist a user fact to memory")
print(" honcho_context -- ask Honcho about the user (LLM-synthesized)")
print(" honcho_search -- semantic search over history (no LLM)")
print(" honcho_profile -- peer card, key facts (no LLM)")
print(" honcho_conclude -- persist a user fact to memory (no LLM)")
print("\n Other commands:")
print(" hermes honcho status -- show full config")
print(" hermes honcho mode -- change recall/observation mode")
@@ -618,26 +585,13 @@ def cmd_status(args) -> None:
print(f" Enabled: {hcfg.enabled}")
print(f" API key: {masked}")
print(f" Workspace: {hcfg.workspace_id}")
# Config paths — show where config was read from and where writes go
global_path = Path.home() / ".honcho" / "config.json"
print(f" Config: {active_path}")
print(f" Config path: {active_path}")
if write_path != active_path:
print(f" Write to: {write_path} (profile-local)")
if active_path == global_path:
print(f" Fallback: (none — using global ~/.honcho/config.json)")
elif global_path.exists():
print(f" Fallback: {global_path} (exists, cross-app interop)")
print(f" Write path: {write_path} (instance-local)")
print(f" AI peer: {hcfg.ai_peer}")
print(f" User peer: {hcfg.peer_name or 'not set'}")
print(f" Session key: {hcfg.resolve_session_name()}")
print(f" Session strat: {hcfg.session_strategy}")
print(f" Recall mode: {hcfg.recall_mode}")
print(f" Context budget: {hcfg.context_tokens or '(uncapped)'} tokens")
raw = getattr(hcfg, "raw", None) or {}
dialectic_cadence = raw.get("dialecticCadence") or 3
print(f" Dialectic cad: every {dialectic_cadence} turn{'s' if dialectic_cadence != 1 else ''}")
print(f" Observation: user(me={hcfg.user_observe_me},others={hcfg.user_observe_others}) ai(me={hcfg.ai_observe_me},others={hcfg.ai_observe_others})")
print(f" Write freq: {hcfg.write_frequency}")
@@ -645,8 +599,8 @@ def cmd_status(args) -> None:
print("\n Connection... ", end="", flush=True)
try:
client = get_honcho_client(hcfg)
_show_peer_cards(hcfg, client)
print("OK")
_show_peer_cards(hcfg, client)
except Exception as e:
print(f"FAILED ({e})\n")
else:
@@ -870,41 +824,6 @@ def cmd_mode(args) -> None:
print(f" {label}Recall mode -> {mode_arg} ({MODES[mode_arg]})\n")
def cmd_strategy(args) -> None:
"""Show or set the session strategy."""
STRATEGIES = {
"per-session": "each run starts clean, Honcho injects context automatically",
"per-directory": "reuses session per dir, prior context auto-injected each run",
"per-repo": "one session per git repository",
"global": "single session across all directories",
}
cfg = _read_config()
strat_arg = getattr(args, "strategy", None)
if strat_arg is None:
current = (
(cfg.get("hosts") or {}).get(_host_key(), {}).get("sessionStrategy")
or cfg.get("sessionStrategy")
or "per-session"
)
print("\nHoncho session strategy\n" + "─" * 40)
for s, desc in STRATEGIES.items():
marker = " <-" if s == current else ""
print(f" {s:<15} {desc}{marker}")
print(f"\n Set with: hermes honcho strategy [per-session|per-directory|per-repo|global]\n")
return
if strat_arg not in STRATEGIES:
print(f" Invalid strategy '{strat_arg}'. Options: {', '.join(STRATEGIES)}\n")
return
host = _host_key()
label = f"[{host}] " if host != "hermes" else ""
cfg.setdefault("hosts", {}).setdefault(host, {})["sessionStrategy"] = strat_arg
_write_config(cfg)
print(f" {label}Session strategy -> {strat_arg} ({STRATEGIES[strat_arg]})\n")
def cmd_tokens(args) -> None:
"""Show or set token budget settings."""
cfg = _read_config()
@@ -1224,11 +1143,10 @@ def cmd_migrate(args) -> None:
print(" automatically. Files become the seed, not the live store.")
print()
print(" Honcho tools (available to the agent during conversation)")
print(" honcho_context — session context: summary, representation, card, messages")
print(" honcho_search — semantic search over stored context")
print(" honcho_profile — fast peer card snapshot")
print(" honcho_reasoning — ask Honcho a question, synthesized answer")
print(" honcho_conclude — write a conclusion/fact back to memory")
print(" honcho_context — ask Honcho a question, get a synthesized answer (LLM)")
print(" honcho_search — semantic search over stored context (no LLM)")
print(" honcho_profile — fast peer card snapshot (no LLM)")
print(" honcho_conclude — write a conclusion/fact back to memory (no LLM)")
print()
print(" Session naming")
print(" OpenClaw: no persistent session concept — files are global.")
@@ -1279,8 +1197,6 @@ def honcho_command(args) -> None:
cmd_peer(args)
elif sub == "mode":
cmd_mode(args)
elif sub == "strategy":
cmd_strategy(args)
elif sub == "tokens":
cmd_tokens(args)
elif sub == "identity":
@@ -1295,7 +1211,7 @@ def honcho_command(args) -> None:
cmd_sync(args)
else:
print(f" Unknown honcho command: {sub}")
print(" Available: status, sessions, map, peer, mode, strategy, tokens, identity, migrate, enable, disable, sync\n")
print(" Available: status, sessions, map, peer, mode, tokens, identity, migrate, enable, disable, sync\n")
def register_cli(subparser) -> None:
@@ -1354,15 +1270,6 @@ def register_cli(subparser) -> None:
help="Recall mode to set (hybrid/context/tools). Omit to show current.",
)
strategy_parser = subs.add_parser(
"strategy", help="Show or set session strategy (per-session/per-directory/per-repo/global)",
)
strategy_parser.add_argument(
"strategy", nargs="?", metavar="STRATEGY",
choices=("per-session", "per-directory", "per-repo", "global"),
help="Session strategy to set. Omit to show current.",
)
tokens_parser = subs.add_parser(
"tokens", help="Show or set token budget for context and dialectic",
)
+16 -127
@@ -94,68 +94,6 @@ def _resolve_bool(host_val, root_val, *, default: bool) -> bool:
return default
def _parse_context_tokens(host_val, root_val) -> int | None:
"""Parse contextTokens: host wins, then root, then None (uncapped)."""
for val in (host_val, root_val):
if val is not None:
try:
return int(val)
except (ValueError, TypeError):
pass
return None
def _parse_dialectic_depth(host_val, root_val) -> int:
"""Parse dialecticDepth: host wins, then root, then 1. Clamped to 1-3."""
for val in (host_val, root_val):
if val is not None:
try:
return max(1, min(int(val), 3))
except (ValueError, TypeError):
pass
return 1
_VALID_REASONING_LEVELS = ("minimal", "low", "medium", "high", "max")
def _parse_dialectic_depth_levels(host_val, root_val, depth: int) -> list[str] | None:
"""Parse dialecticDepthLevels: optional array of reasoning levels per pass.
Returns None when not configured (use proportional defaults).
When configured, validates each level and truncates/pads to match depth.
"""
for val in (host_val, root_val):
if val is not None and isinstance(val, list):
levels = [
lvl if lvl in _VALID_REASONING_LEVELS else "low"
for lvl in val[:depth]
]
# Pad with "low" if array is shorter than depth
while len(levels) < depth:
levels.append("low")
return levels
return None
def _resolve_optional_float(*values: Any) -> float | None:
"""Return the first non-empty value coerced to a positive float."""
for value in values:
if value is None:
continue
if isinstance(value, str):
value = value.strip()
if not value:
continue
try:
parsed = float(value)
except (TypeError, ValueError):
continue
if parsed > 0:
return parsed
return None
_VALID_OBSERVATION_MODES = {"unified", "directional"}
_OBSERVATION_MODE_ALIASES = {"shared": "unified", "separate": "directional", "cross": "directional"}
@@ -221,8 +159,6 @@ class HonchoClientConfig:
environment: str = "production"
# Optional base URL for self-hosted Honcho (overrides environment mapping)
base_url: str | None = None
# Optional request timeout in seconds for Honcho SDK HTTP calls
timeout: float | None = None
# Identity
peer_name: str | None = None
ai_peer: str = "hermes"
@@ -232,25 +168,17 @@ class HonchoClientConfig:
# Write frequency: "async" (background thread), "turn" (sync per turn),
# "session" (flush on session end), or int (every N turns)
write_frequency: str | int = "async"
# Prefetch budget (None = no cap; set to an integer to bound auto-injected context)
# Prefetch budget
context_tokens: int | None = None
# Dialectic (peer.chat) settings
# reasoning_level: "minimal" | "low" | "medium" | "high" | "max"
dialectic_reasoning_level: str = "low"
# When true, the model can override reasoning_level per-call via the
# honcho_reasoning tool param (agentic). When false, always uses
# dialecticReasoningLevel and ignores model-provided overrides.
# dynamic: auto-bump reasoning level based on query length
# true — low->medium (120+ chars), low->high (400+ chars), capped at "high"
# false — always use dialecticReasoningLevel as-is
dialectic_dynamic: bool = True
# Max chars of dialectic result to inject into Hermes system prompt
dialectic_max_chars: int = 600
# Dialectic depth: how many .chat() calls per dialectic cycle (1-3).
# Depth 1: single call. Depth 2: self-audit + targeted synthesis.
# Depth 3: self-audit + synthesis + reconciliation.
dialectic_depth: int = 1
# Optional per-pass reasoning level override. Array of reasoning levels
# matching dialectic_depth length. When None, uses proportional defaults
# derived from dialectic_reasoning_level.
dialectic_depth_levels: list[str] | None = None
# Honcho API limits — configurable for self-hosted instances
# Max chars per message sent via add_messages() (Honcho cloud: 25000)
message_max_chars: int = 25000
@@ -261,8 +189,10 @@ class HonchoClientConfig:
# "context" — auto-injected context only, Honcho tools removed
# "tools" — Honcho tools only, no auto-injected context
recall_mode: str = "hybrid"
# Eager init in tools mode — when true, initializes session during
# initialize() instead of deferring to first tool call
# When True and recallMode is "tools", create the Honcho session eagerly
# during initialize() instead of deferring to the first tool call.
# This ensures sync_turn() can write from the very first turn.
# Does NOT enable automatic context injection — only changes init timing.
init_on_session_start: bool = False
# Observation mode: legacy string shorthand ("directional" or "unified").
# Kept for backward compat; granular per-peer booleans below are preferred.
@@ -294,14 +224,12 @@ class HonchoClientConfig:
resolved_host = host or resolve_active_host()
api_key = os.environ.get("HONCHO_API_KEY")
base_url = os.environ.get("HONCHO_BASE_URL", "").strip() or None
timeout = _resolve_optional_float(os.environ.get("HONCHO_TIMEOUT"))
return cls(
host=resolved_host,
workspace_id=workspace_id,
api_key=api_key,
environment=os.environ.get("HONCHO_ENVIRONMENT", "production"),
base_url=base_url,
timeout=timeout,
ai_peer=resolved_host,
enabled=bool(api_key or base_url),
)
@@ -362,11 +290,6 @@ class HonchoClientConfig:
or os.environ.get("HONCHO_BASE_URL", "").strip()
or None
)
timeout = _resolve_optional_float(
raw.get("timeout"),
raw.get("requestTimeout"),
os.environ.get("HONCHO_TIMEOUT"),
)
# Auto-enable when API key or base_url is present (unless explicitly disabled)
# Host-level enabled wins, then root-level, then auto-enable if key/url exists.
@@ -412,16 +335,12 @@ class HonchoClientConfig:
api_key=api_key,
environment=environment,
base_url=base_url,
timeout=timeout,
peer_name=host_block.get("peerName") or raw.get("peerName"),
ai_peer=ai_peer,
enabled=enabled,
save_messages=save_messages,
write_frequency=write_frequency,
context_tokens=_parse_context_tokens(
host_block.get("contextTokens"),
raw.get("contextTokens"),
),
context_tokens=host_block.get("contextTokens") or raw.get("contextTokens"),
dialectic_reasoning_level=(
host_block.get("dialecticReasoningLevel")
or raw.get("dialecticReasoningLevel")
@@ -437,15 +356,6 @@ class HonchoClientConfig:
or raw.get("dialecticMaxChars")
or 600
),
dialectic_depth=_parse_dialectic_depth(
host_block.get("dialecticDepth"),
raw.get("dialecticDepth"),
),
dialectic_depth_levels=_parse_dialectic_depth_levels(
host_block.get("dialecticDepthLevels"),
raw.get("dialecticDepthLevels"),
depth=_parse_dialectic_depth(host_block.get("dialecticDepth"), raw.get("dialecticDepth")),
),
message_max_chars=int(
host_block.get("messageMaxChars")
or raw.get("messageMaxChars")
@@ -512,18 +422,16 @@ class HonchoClientConfig:
cwd: str | None = None,
session_title: str | None = None,
session_id: str | None = None,
gateway_session_key: str | None = None,
) -> str | None:
"""Resolve Honcho session name.
Resolution order:
1. Manual directory override from sessions map
2. Hermes session title (from /title command)
3. Gateway session key (stable per-chat identifier from gateway platforms)
4. per-session strategy: Hermes session_id ({timestamp}_{hex})
5. per-repo strategy: git repo root directory name
6. per-directory strategy: directory basename
7. global strategy: workspace name
3. per-session strategy: Hermes session_id ({timestamp}_{hex})
4. per-repo strategy: git repo root directory name
5. per-directory strategy: directory basename
6. global strategy: workspace name
"""
import re
@@ -537,22 +445,12 @@ class HonchoClientConfig:
# /title mid-session remap
if session_title:
sanitized = re.sub(r'[^a-zA-Z0-9_-]+', '-', session_title).strip('-')
sanitized = re.sub(r'[^a-zA-Z0-9_-]', '-', session_title).strip('-')
if sanitized:
if self.session_peer_prefix and self.peer_name:
return f"{self.peer_name}-{sanitized}"
return sanitized
# Gateway session key: stable per-chat identifier passed by the gateway
# (e.g. "agent:main:telegram:dm:8439114563"). Sanitize colons to hyphens
# for Honcho session ID compatibility. This takes priority over strategy-
# based resolution because gateway platforms need per-chat isolation that
# cwd-based strategies cannot provide.
if gateway_session_key:
sanitized = re.sub(r'[^a-zA-Z0-9_-]+', '-', gateway_session_key).strip('-')
if sanitized:
return sanitized
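The two `re.sub` patterns in this hunk differ only by a trailing `+`, which changes how runs of disallowed characters collapse. The sketch below is an illustration only: the function names `sanitize_session_name` and `sanitize_per_char` are hypothetical and do not appear in the diff.

```python
import re


def sanitize_session_name(raw: str) -> str:
    """Collapse each run of disallowed characters into a single hyphen."""
    return re.sub(r"[^a-zA-Z0-9_-]+", "-", raw).strip("-")


def sanitize_per_char(raw: str) -> str:
    """Replace every disallowed character with its own hyphen (no '+')."""
    return re.sub(r"[^a-zA-Z0-9_-]", "-", raw).strip("-")


# Gateway-style key from the comment in this hunk
print(sanitize_session_name("agent:main:telegram:dm:8439114563"))
# -> agent-main-telegram-dm-8439114563

# Runs of disallowed characters show the difference between the patterns
print(sanitize_session_name("my  cool: title!"))  # -> my-cool-title
print(sanitize_per_char("my  cool: title!"))      # -> my--cool--title
```

With the `+` quantifier a title containing consecutive spaces or punctuation yields one hyphen per run, so the resulting session name has no doubled hyphens.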
# per-session: inherit Hermes session_id (new Honcho session each run)
if self.session_strategy == "per-session" and session_id:
if self.session_peer_prefix and self.peer_name:
@@ -614,20 +512,13 @@ def get_honcho_client(config: HonchoClientConfig | None = None) -> Honcho:
# mapping, enabling remote self-hosted Honcho deployments without
# requiring the server to live on localhost.
resolved_base_url = config.base_url
resolved_timeout = config.timeout
if not resolved_base_url or resolved_timeout is None:
if not resolved_base_url:
try:
from hermes_cli.config import load_config
hermes_cfg = load_config()
honcho_cfg = hermes_cfg.get("honcho", {})
if isinstance(honcho_cfg, dict):
if not resolved_base_url:
resolved_base_url = honcho_cfg.get("base_url", "").strip() or None
if resolved_timeout is None:
resolved_timeout = _resolve_optional_float(
honcho_cfg.get("timeout"),
honcho_cfg.get("request_timeout"),
)
resolved_base_url = honcho_cfg.get("base_url", "").strip() or None
except Exception:
pass
@@ -662,8 +553,6 @@ def get_honcho_client(config: HonchoClientConfig | None = None) -> Honcho:
}
if resolved_base_url:
kwargs["base_url"] = resolved_base_url
if resolved_timeout is not None:
kwargs["timeout"] = resolved_timeout
_honcho_client = Honcho(**kwargs)
+75 -247
@@ -486,9 +486,36 @@ class HonchoSessionManager:
_REASONING_LEVELS = ("minimal", "low", "medium", "high", "max")
def _default_reasoning_level(self) -> str:
"""Return the configured default reasoning level."""
return self._dialectic_reasoning_level
def _dynamic_reasoning_level(self, query: str) -> str:
"""
Pick a reasoning level for a dialectic query.
When dialecticDynamic is true (default), auto-bumps based on query
length so Honcho applies more inference where it matters:
< 120 chars -> configured default (typically "low")
120-400 chars -> +1 level above default (cap at "high")
> 400 chars -> +2 levels above default (cap at "high")
"max" is never selected automatically -- reserve it for explicit config.
When dialecticDynamic is false, always returns the configured level.
"""
if not self._dialectic_dynamic:
return self._dialectic_reasoning_level
levels = self._REASONING_LEVELS
default_idx = levels.index(self._dialectic_reasoning_level) if self._dialectic_reasoning_level in levels else 1
n = len(query)
if n < 120:
bump = 0
elif n < 400:
bump = 1
else:
bump = 2
# Cap at "high" (index 3) for auto-selection
idx = min(default_idx + bump, 3)
return levels[idx]
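The length-based bump above can be exercised in isolation. The following is a minimal standalone sketch that mirrors the method's logic; the free function `dynamic_reasoning_level` and its `default` and `dynamic` parameters are stand-ins for the instance attributes, introduced here for illustration.

```python
REASONING_LEVELS = ("minimal", "low", "medium", "high", "max")


def dynamic_reasoning_level(query: str, default: str = "low", dynamic: bool = True) -> str:
    """Pick a reasoning level from query length, capped at "high"."""
    if not dynamic:
        return default
    # Unknown defaults fall back to index 1 ("low"), as in the method above
    default_idx = REASONING_LEVELS.index(default) if default in REASONING_LEVELS else 1
    n = len(query)
    if n < 120:
        bump = 0
    elif n < 400:
        bump = 1
    else:
        bump = 2
    # Index 3 is "high": auto-selection never goes past it
    return REASONING_LEVELS[min(default_idx + bump, 3)]


print(dynamic_reasoning_level("x" * 50))    # -> low
print(dynamic_reasoning_level("x" * 120))   # -> medium
print(dynamic_reasoning_level("x" * 400))   # -> high
print(dynamic_reasoning_level("x" * 10, default="max"))  # -> high
```

Two boundary details follow from the code as written: a query of exactly 400 characters gets the +2 bump, and a configured default of "max" is itself capped to "high" whenever the dynamic path is taken.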
def dialectic_query(
self, session_key: str, query: str,
@@ -505,9 +532,8 @@ class HonchoSessionManager:
Args:
session_key: The session key to query against.
query: Natural language question.
reasoning_level: Override the configured default (dialecticReasoningLevel).
Only honored when dialecticDynamic is true.
If None or dialecticDynamic is false, uses the configured default.
reasoning_level: Override the config default. If None, uses
_dynamic_reasoning_level(query).
peer: Which peer to query: "user" (default) or "ai".
Returns:
@@ -517,34 +543,29 @@ class HonchoSessionManager:
if not session:
return ""
target_peer_id = self._resolve_peer_id(session, peer)
if target_peer_id is None:
return ""
# Guard: truncate query to Honcho's dialectic input limit
if len(query) > self._dialectic_max_input_chars:
query = query[:self._dialectic_max_input_chars].rsplit(" ", 1)[0]
if self._dialectic_dynamic and reasoning_level:
level = reasoning_level
else:
level = self._default_reasoning_level()
level = reasoning_level or self._dynamic_reasoning_level(query)
try:
if self._ai_observe_others:
# AI peer can observe other peers — use assistant as observer.
ai_peer_obj = self._get_or_create_peer(session.assistant_peer_id)
if target_peer_id == session.assistant_peer_id:
# AI peer can observe user — use cross-observation routing
if peer == "ai":
ai_peer_obj = self._get_or_create_peer(session.assistant_peer_id)
result = ai_peer_obj.chat(query, reasoning_level=level) or ""
else:
ai_peer_obj = self._get_or_create_peer(session.assistant_peer_id)
result = ai_peer_obj.chat(
query,
target=target_peer_id,
target=session.user_peer_id,
reasoning_level=level,
) or ""
else:
# Without cross-observation, each peer queries its own context.
target_peer = self._get_or_create_peer(target_peer_id)
# AI can't observe others — each peer queries self
peer_id = session.assistant_peer_id if peer == "ai" else session.user_peer_id
target_peer = self._get_or_create_peer(peer_id)
result = target_peer.chat(query, reasoning_level=level) or ""
# Apply Hermes-side char cap before caching
@@ -626,11 +647,10 @@ class HonchoSessionManager:
"""
Pre-fetch user and AI peer context from Honcho.
Fetches peer_representation and peer_card for both peers, plus the
session summary when available. search_query is intentionally omitted:
it would only affect additional excerpts that this code does not
consume, and passing the raw message exposes conversation content in
server access logs.
Fetches peer_representation and peer_card for both peers. search_query
is intentionally omitted: it would only affect additional excerpts
that this code does not consume, and passing the raw message exposes
conversation content in server access logs.
Args:
session_key: The session key to get context for.
@@ -638,29 +658,15 @@ class HonchoSessionManager:
Returns:
Dictionary with 'representation', 'card', 'ai_representation',
'ai_card', and optionally 'summary' keys.
and 'ai_card' keys.
"""
session = self._cache.get(session_key)
if not session:
return {}
result: dict[str, str] = {}
# Session summary — provides session-scoped context.
# Fresh sessions (per-session cold start, or first-ever per-directory)
# return null summary — the guard below handles that gracefully.
# Per-directory returning sessions get their accumulated summary.
try:
honcho_session = self._sessions_cache.get(session.honcho_session_id)
if honcho_session:
ctx = honcho_session.context(summary=True)
if ctx.summary and getattr(ctx.summary, "content", None):
result["summary"] = ctx.summary.content
except Exception as e:
logger.debug("Failed to fetch session summary from Honcho: %s", e)
try:
user_ctx = self._fetch_peer_context(session.user_peer_id, target=session.user_peer_id)
user_ctx = self._fetch_peer_context(session.user_peer_id)
result["representation"] = user_ctx["representation"]
result["card"] = "\n".join(user_ctx["card"])
except Exception as e:
@@ -668,7 +674,7 @@ class HonchoSessionManager:
# Also fetch AI peer's own representation so Hermes knows itself.
try:
ai_ctx = self._fetch_peer_context(session.assistant_peer_id, target=session.assistant_peer_id)
ai_ctx = self._fetch_peer_context(session.assistant_peer_id)
result["ai_representation"] = ai_ctx["representation"]
result["ai_card"] = "\n".join(ai_ctx["card"])
except Exception as e:
@@ -856,7 +862,7 @@ class HonchoSessionManager:
return [str(item) for item in card if item]
return [str(card)]
def _fetch_peer_card(self, peer_id: str, *, target: str | None = None) -> list[str]:
def _fetch_peer_card(self, peer_id: str) -> list[str]:
"""Fetch a peer card directly from the peer object.
This avoids relying on session.context(), which can return an empty
@@ -866,33 +872,22 @@ class HonchoSessionManager:
peer = self._get_or_create_peer(peer_id)
getter = getattr(peer, "get_card", None)
if callable(getter):
return self._normalize_card(getter(target=target) if target is not None else getter())
return self._normalize_card(getter())
legacy_getter = getattr(peer, "card", None)
if callable(legacy_getter):
return self._normalize_card(legacy_getter(target=target) if target is not None else legacy_getter())
return self._normalize_card(legacy_getter())
return []
def _fetch_peer_context(
self,
peer_id: str,
search_query: str | None = None,
*,
target: str | None = None,
) -> dict[str, Any]:
def _fetch_peer_context(self, peer_id: str, search_query: str | None = None) -> dict[str, Any]:
"""Fetch representation + peer card directly from a peer object."""
peer = self._get_or_create_peer(peer_id)
representation = ""
card: list[str] = []
try:
context_kwargs: dict[str, Any] = {}
if target is not None:
context_kwargs["target"] = target
if search_query is not None:
context_kwargs["search_query"] = search_query
ctx = peer.context(**context_kwargs) if context_kwargs else peer.context()
ctx = peer.context(search_query=search_query) if search_query else peer.context()
representation = (
getattr(ctx, "representation", None)
or getattr(ctx, "peer_representation", None)
@@ -904,111 +899,24 @@ class HonchoSessionManager:
if not representation:
try:
representation = (
peer.representation(target=target) if target is not None else peer.representation()
) or ""
representation = peer.representation() or ""
except Exception as e:
logger.debug("Direct peer.representation() failed for '%s': %s", peer_id, e)
if not card:
try:
card = self._fetch_peer_card(peer_id, target=target)
card = self._fetch_peer_card(peer_id)
except Exception as e:
logger.debug("Direct peer card fetch failed for '%s': %s", peer_id, e)
return {"representation": representation, "card": card}
def get_session_context(self, session_key: str, peer: str = "user") -> dict[str, Any]:
"""Fetch full session context from Honcho including summary.
Uses the session-level context() API which returns summary,
peer_representation, peer_card, and messages.
def get_peer_card(self, session_key: str) -> list[str]:
"""
session = self._cache.get(session_key)
if not session:
return {}
honcho_session = self._sessions_cache.get(session.honcho_session_id)
if not honcho_session:
# Fall back to peer-level context, respecting the requested peer
peer_id = self._resolve_peer_id(session, peer)
if peer_id is None:
peer_id = session.user_peer_id
return self._fetch_peer_context(peer_id, target=peer_id)
try:
peer_id = self._resolve_peer_id(session, peer)
ctx = honcho_session.context(
summary=True,
peer_target=peer_id,
peer_perspective=session.user_peer_id if peer == "user" else session.assistant_peer_id,
)
result: dict[str, Any] = {}
# Summary
if ctx.summary:
result["summary"] = ctx.summary.content
# Peer representation and card
if ctx.peer_representation:
result["representation"] = ctx.peer_representation
if ctx.peer_card:
result["card"] = "\n".join(ctx.peer_card)
# Messages (last N for context)
if ctx.messages:
recent = ctx.messages[-10:] # last 10 messages
result["recent_messages"] = [
{"role": getattr(m, "peer_id", "unknown"), "content": (m.content or "")[:500]}
for m in recent
]
return result
except Exception as e:
logger.debug("Session context fetch failed: %s", e)
return {}
def _resolve_peer_id(self, session: HonchoSession, peer: str | None) -> str:
"""Resolve a peer alias or explicit peer ID to a concrete Honcho peer ID.
Always returns a non-empty string: either a known peer ID or a
sanitized version of the caller-supplied alias/ID.
"""
candidate = (peer or "user").strip()
if not candidate:
return session.user_peer_id
normalized = self._sanitize_id(candidate)
if normalized == self._sanitize_id("user"):
return session.user_peer_id
if normalized == self._sanitize_id("ai"):
return session.assistant_peer_id
return normalized
def _resolve_observer_target(
self,
session: HonchoSession,
peer: str | None,
) -> tuple[str, str | None]:
"""Resolve observer and target peer IDs for context/search/profile queries."""
target_peer_id = self._resolve_peer_id(session, peer)
if target_peer_id == session.assistant_peer_id:
return session.assistant_peer_id, session.assistant_peer_id
if self._ai_observe_others:
return session.assistant_peer_id, target_peer_id
return target_peer_id, None
def get_peer_card(self, session_key: str, peer: str = "user") -> list[str]:
"""
Fetch a peer card: a curated list of key facts.
Fetch the user peer's card — a curated list of key facts.
Fast, no LLM reasoning. Returns raw structured facts Honcho has
inferred about the target peer (name, role, preferences, patterns).
inferred about the user (name, role, preferences, patterns).
Empty list if unavailable.
"""
session = self._cache.get(session_key)
@@ -1016,19 +924,12 @@ class HonchoSessionManager:
return []
try:
observer_peer_id, target_peer_id = self._resolve_observer_target(session, peer)
return self._fetch_peer_card(observer_peer_id, target=target_peer_id)
return self._fetch_peer_card(session.user_peer_id)
except Exception as e:
logger.debug("Failed to fetch peer card from Honcho: %s", e)
return []
def search_context(
self,
session_key: str,
query: str,
max_tokens: int = 800,
peer: str = "user",
) -> str:
def search_context(self, session_key: str, query: str, max_tokens: int = 800) -> str:
"""
Semantic search over Honcho session context.
@@ -1040,7 +941,6 @@ class HonchoSessionManager:
session_key: Session to search against.
query: Search query for semantic matching.
max_tokens: Token budget for returned content.
peer: Peer alias or explicit peer ID to search about.
Returns:
Relevant context excerpts as a string, or empty string if none.
@@ -1050,13 +950,7 @@ class HonchoSessionManager:
return ""
try:
observer_peer_id, target = self._resolve_observer_target(session, peer)
ctx = self._fetch_peer_context(
observer_peer_id,
search_query=query,
target=target,
)
ctx = self._fetch_peer_context(session.user_peer_id, search_query=query)
parts = []
if ctx["representation"]:
parts.append(ctx["representation"])
@@ -1068,17 +962,16 @@ class HonchoSessionManager:
logger.debug("Honcho search_context failed: %s", e)
return ""
def create_conclusion(self, session_key: str, content: str, peer: str = "user") -> bool:
"""Write a conclusion about a target peer back to Honcho.
def create_conclusion(self, session_key: str, content: str) -> bool:
"""Write a conclusion about the user back to Honcho.
Conclusions are facts a peer observes about another peer or itself:
preferences, corrections, clarifications, and project context.
They feed into the target peer's card and representation.
Conclusions are facts the AI peer observes about the user:
preferences, corrections, clarifications, project context.
They feed into the user's peer card and representation.
Args:
session_key: Session to associate the conclusion with.
content: The conclusion text.
peer: Peer alias or explicit peer ID. "user" is the default alias.
content: The conclusion text (e.g. "User prefers dark mode").
Returns:
True on success, False on failure.
@@ -1092,90 +985,25 @@ class HonchoSessionManager:
return False
try:
target_peer_id = self._resolve_peer_id(session, peer)
if target_peer_id is None:
logger.warning("Could not resolve conclusion peer '%s' for session '%s'", peer, session_key)
return False
if target_peer_id == session.assistant_peer_id:
if self._ai_observe_others:
# AI peer creates conclusion about user (cross-observation)
assistant_peer = self._get_or_create_peer(session.assistant_peer_id)
conclusions_scope = assistant_peer.conclusions_of(session.assistant_peer_id)
elif self._ai_observe_others:
assistant_peer = self._get_or_create_peer(session.assistant_peer_id)
conclusions_scope = assistant_peer.conclusions_of(target_peer_id)
conclusions_scope = assistant_peer.conclusions_of(session.user_peer_id)
else:
target_peer = self._get_or_create_peer(target_peer_id)
conclusions_scope = target_peer.conclusions_of(target_peer_id)
# AI can't observe others — user peer creates self-conclusion
user_peer = self._get_or_create_peer(session.user_peer_id)
conclusions_scope = user_peer.conclusions_of(session.user_peer_id)
conclusions_scope.create([{
"content": content.strip(),
"session_id": session.honcho_session_id,
}])
logger.info("Created conclusion about %s for %s: %s", target_peer_id, session_key, content[:80])
logger.info("Created conclusion for %s: %s", session_key, content[:80])
return True
except Exception as e:
logger.error("Failed to create conclusion: %s", e)
return False
def delete_conclusion(self, session_key: str, conclusion_id: str, peer: str = "user") -> bool:
"""Delete a conclusion by ID. Use only for PII removal.
Args:
session_key: Session key for peer resolution.
conclusion_id: The conclusion ID to delete.
peer: Peer alias or explicit peer ID.
Returns:
True on success, False on failure.
"""
session = self._cache.get(session_key)
if not session:
return False
try:
target_peer_id = self._resolve_peer_id(session, peer)
if target_peer_id == session.assistant_peer_id:
observer = self._get_or_create_peer(session.assistant_peer_id)
scope = observer.conclusions_of(session.assistant_peer_id)
elif self._ai_observe_others:
observer = self._get_or_create_peer(session.assistant_peer_id)
scope = observer.conclusions_of(target_peer_id)
else:
target_peer = self._get_or_create_peer(target_peer_id)
scope = target_peer.conclusions_of(target_peer_id)
scope.delete(conclusion_id)
logger.info("Deleted conclusion %s for %s", conclusion_id, session_key)
return True
except Exception as e:
logger.error("Failed to delete conclusion %s: %s", conclusion_id, e)
return False
def set_peer_card(self, session_key: str, card: list[str], peer: str = "user") -> list[str] | None:
"""Update a peer's card.
Args:
session_key: Session key for peer resolution.
card: New peer card as list of fact strings.
peer: Peer alias or explicit peer ID.
Returns:
Updated card on success, None on failure.
"""
session = self._cache.get(session_key)
if not session:
return None
try:
peer_id = self._resolve_peer_id(session, peer)
if peer_id is None:
logger.warning("Could not resolve peer '%s' for set_peer_card in session '%s'", peer, session_key)
return None
peer_obj = self._get_or_create_peer(peer_id)
result = peer_obj.set_card(card)
logger.info("Updated peer card for %s (%d facts)", peer_id, len(card))
return result
except Exception as e:
logger.error("Failed to set peer card: %s", e)
return None
def seed_ai_identity(self, session_key: str, content: str, source: str = "manual") -> bool:
"""
Seed the AI peer's Honcho representation from text content.
@@ -1233,7 +1061,7 @@ class HonchoSessionManager:
return {"representation": "", "card": ""}
try:
ctx = self._fetch_peer_context(session.assistant_peer_id, target=session.assistant_peer_id)
ctx = self._fetch_peer_context(session.assistant_peer_id)
return {
"representation": ctx["representation"] or "",
"card": "\n".join(ctx["card"]),
+9 -46
View File
@@ -10,9 +10,8 @@ lifecycle instead of read-only search endpoints.
Config via environment variables (profile-scoped via each profile's .env):
OPENVIKING_ENDPOINT Server URL (default: http://127.0.0.1:1933)
OPENVIKING_API_KEY API key (required for authenticated servers)
OPENVIKING_ACCOUNT Tenant account (default: default)
OPENVIKING_ACCOUNT Tenant account (default: root)
OPENVIKING_USER Tenant user (default: default)
OPENVIKING_AGENT Tenant agent (default: hermes)
Capabilities:
- Automatic memory extraction on session commit (6 categories)
@@ -81,12 +80,11 @@ class _VikingClient:
"""Thin HTTP client for the OpenViking REST API."""
def __init__(self, endpoint: str, api_key: str = "",
account: str = "", user: str = "", agent: str = ""):
account: str = "", user: str = ""):
self._endpoint = endpoint.rstrip("/")
self._api_key = api_key
self._account = account or os.environ.get("OPENVIKING_ACCOUNT", "default")
self._account = account or os.environ.get("OPENVIKING_ACCOUNT", "root")
self._user = user or os.environ.get("OPENVIKING_USER", "default")
self._agent = agent or os.environ.get("OPENVIKING_AGENT", "hermes")
self._httpx = _get_httpx()
if self._httpx is None:
raise ImportError("httpx is required for OpenViking: pip install httpx")
@@ -96,7 +94,6 @@ class _VikingClient:
"Content-Type": "application/json",
"X-OpenViking-Account": self._account,
"X-OpenViking-User": self._user,
"X-OpenViking-Agent": self._agent,
}
if self._api_key:
h["X-API-Key"] = self._api_key
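The tenant headers assembled in `_VikingClient` can be sketched as a single function. This is a rough illustration, not the client's code: `build_headers` is a hypothetical name, and the fallbacks ("default", "default", "hermes") follow the side of this diff that includes the agent header; the other side uses "root" for the account and sends no agent header.

```python
import os


def build_headers(
    api_key: str = "", account: str = "", user: str = "", agent: str = ""
) -> dict[str, str]:
    """Build OpenViking request headers, falling back to environment variables."""
    headers = {
        "Content-Type": "application/json",
        "X-OpenViking-Account": account or os.environ.get("OPENVIKING_ACCOUNT", "default"),
        "X-OpenViking-User": user or os.environ.get("OPENVIKING_USER", "default"),
        "X-OpenViking-Agent": agent or os.environ.get("OPENVIKING_AGENT", "hermes"),
    }
    # The API key header is only sent when a key is configured
    if api_key:
        headers["X-API-Key"] = api_key
    return headers


print(build_headers(account="acme", user="sam", agent="hermes"))
```

Explicit arguments take precedence over the environment, so a caller that passes all three tenant fields is unaffected by whatever the process environment contains.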
@@ -285,44 +282,20 @@ class OpenVikingMemoryProvider(MemoryProvider):
},
{
"key": "api_key",
"description": "OpenViking API key (leave blank for local dev mode)",
"description": "OpenViking API key",
"secret": True,
"env_var": "OPENVIKING_API_KEY",
},
{
"key": "account",
"description": "OpenViking tenant account ID ([default], used when local mode, OPENVIKING_API_KEY is empty)",
"default": "default",
"env_var": "OPENVIKING_ACCOUNT",
},
{
"key": "user",
"description": "OpenViking user ID within the account ([default], used when local mode, OPENVIKING_API_KEY is empty)",
"default": "default",
"env_var": "OPENVIKING_USER",
},
{
"key": "agent",
"description": "OpenViking agent ID within the account ([hermes], useful in multi-agent mode)",
"default": "hermes",
"env_var": "OPENVIKING_AGENT",
},
]
def initialize(self, session_id: str, **kwargs) -> None:
self._endpoint = os.environ.get("OPENVIKING_ENDPOINT", _DEFAULT_ENDPOINT)
self._api_key = os.environ.get("OPENVIKING_API_KEY", "")
self._account = os.environ.get("OPENVIKING_ACCOUNT", "default")
self._user = os.environ.get("OPENVIKING_USER", "default")
self._agent = os.environ.get("OPENVIKING_AGENT", "hermes")
self._session_id = session_id
self._turn_count = 0
try:
self._client = _VikingClient(
self._endpoint, self._api_key,
account=self._account, user=self._user, agent=self._agent,
)
self._client = _VikingClient(self._endpoint, self._api_key)
if not self._client.health():
logger.warning("OpenViking server at %s is not reachable", self._endpoint)
self._client = None
@@ -352,8 +325,7 @@ class OpenVikingMemoryProvider(MemoryProvider):
"(abstract/overview/full), viking_browse to explore.\n"
"Use viking_remember to store facts, viking_add_resource to index URLs/docs."
)
except Exception as e:
logger.warning("OpenViking system_prompt_block failed: %s", e)
except Exception:
return (
"# OpenViking Knowledge Base\n"
f"Active. Endpoint: {self._endpoint}\n"
@@ -379,10 +351,7 @@ class OpenVikingMemoryProvider(MemoryProvider):
def _run():
try:
client = _VikingClient(
self._endpoint, self._api_key,
account=self._account, user=self._user, agent=self._agent,
)
client = _VikingClient(self._endpoint, self._api_key)
resp = client.post("/api/v1/search/find", {
"query": query,
"top_k": 5,
@@ -417,10 +386,7 @@ class OpenVikingMemoryProvider(MemoryProvider):
def _sync():
try:
client = _VikingClient(
self._endpoint, self._api_key,
account=self._account, user=self._user, agent=self._agent,
)
client = _VikingClient(self._endpoint, self._api_key)
sid = self._session_id
# Add user message
@@ -476,10 +442,7 @@ class OpenVikingMemoryProvider(MemoryProvider):
def _write():
try:
client = _VikingClient(
self._endpoint, self._api_key,
account=self._account, user=self._user, agent=self._agent,
)
client = _VikingClient(self._endpoint, self._api_key)
# Add as a user message with memory context so the commit
# picks it up as an explicit memory during extraction
client.post(f"/api/v1/sessions/{self._session_id}/messages", {
+1 -4
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project]
name = "hermes-agent"
version = "0.10.0"
version = "0.9.0"
description = "The self-improving AI agent — creates skills from experience, improves them during use, and runs anywhere"
readme = "README.md"
requires-python = ">=3.11"
@@ -63,12 +63,10 @@ homeassistant = ["aiohttp>=3.9.0,<4"]
sms = ["aiohttp>=3.9.0,<4"]
acp = ["agent-client-protocol>=0.9.0,<1.0"]
mistral = ["mistralai>=2.3.0,<3"]
bedrock = ["boto3>=1.35.0,<2"]
termux = [
# Tested Android / Termux path: keeps the core CLI feature-rich while
# avoiding extras that currently depend on non-Android wheels (notably
# faster-whisper -> ctranslate2 via the voice extra).
"python-telegram-bot[webhooks]>=22.6,<23",
"hermes-agent[cron]",
"hermes-agent[cli]",
"hermes-agent[pty]",
@@ -110,7 +108,6 @@ all = [
"hermes-agent[dingtalk]",
"hermes-agent[feishu]",
"hermes-agent[mistral]",
"hermes-agent[bedrock]",
"hermes-agent[web]",
]
+200 -867
File diff suppressed because it is too large
+1 -6
@@ -28,7 +28,7 @@ BOLD='\033[1m'
# Configuration
REPO_URL_SSH="git@github.com:NousResearch/hermes-agent.git"
REPO_URL_HTTPS="https://github.com/NousResearch/hermes-agent.git"
HERMES_HOME="${HERMES_HOME:-$HOME/.hermes}"
HERMES_HOME="$HOME/.hermes"
INSTALL_DIR="${HERMES_INSTALL_DIR:-$HERMES_HOME/hermes-agent}"
PYTHON_VERSION="3.11"
NODE_VERSION="22"
@@ -66,10 +66,6 @@ while [[ $# -gt 0 ]]; do
INSTALL_DIR="$2"
shift 2
;;
--hermes-home)
HERMES_HOME="$2"
shift 2
;;
-h|--help)
echo "Hermes Agent Installer"
echo ""
@@ -80,7 +76,6 @@ while [[ $# -gt 0 ]]; do
echo " --skip-setup Skip interactive setup wizard"
echo " --branch NAME Git branch to install (default: main)"
echo " --dir PATH Installation directory (default: ~/.hermes/hermes-agent)"
echo " --hermes-home PATH Data directory (default: ~/.hermes, or \$HERMES_HOME)"
echo " -h, --help Show this help"
exit 0
;;
+1 -34
@@ -46,7 +46,6 @@ AUTHOR_MAP = {
# contributors (from noreply pattern)
"35742124+0xbyt4@users.noreply.github.com": "0xbyt4",
"82637225+kshitijk4poor@users.noreply.github.com": "kshitijk4poor",
"kshitijk4poor@users.noreply.github.com": "kshitijk4poor",
"16443023+stablegenius49@users.noreply.github.com": "stablegenius49",
"185121704+stablegenius49@users.noreply.github.com": "stablegenius49",
"101283333+batuhankocyigit@users.noreply.github.com": "batuhankocyigit",
@@ -63,11 +62,7 @@ AUTHOR_MAP = {
"258577966+voidborne-d@users.noreply.github.com": "voidborne-d",
"70424851+insecurejezza@users.noreply.github.com": "insecurejezza",
"259807879+Bartok9@users.noreply.github.com": "Bartok9",
"241404605+MestreY0d4-Uninter@users.noreply.github.com": "MestreY0d4-Uninter",
"268667990+Roy-oss1@users.noreply.github.com": "Roy-oss1",
"27917469+nosleepcassette@users.noreply.github.com": "nosleepcassette",
"241404605+MestreY0d4-Uninter@users.noreply.github.com": "MestreY0d4-Uninter",
"109555139+davetist@users.noreply.github.com": "davetist",
# contributors (manual mapping from git names)
"dmayhem93@gmail.com": "dmahan93",
"samherring99@gmail.com": "samherring99",
@@ -80,13 +75,8 @@ AUTHOR_MAP = {
"abdullahfarukozden@gmail.com": "Farukest",
"lovre.pesut@gmail.com": "rovle",
"hakanerten02@hotmail.com": "teyrebaz33",
"ruzzgarcn@gmail.com": "Ruzzgar",
"alireza78.crypto@gmail.com": "alireza78a",
"brooklyn.bb.nicholson@gmail.com": "brooklynnicholson",
"4317663+helix4u@users.noreply.github.com": "helix4u",
"331214+counterposition@users.noreply.github.com": "counterposition",
"blspear@gmail.com": "BrennerSpear",
"239876380+handsdiff@users.noreply.github.com": "handsdiff",
"gpickett00@gmail.com": "gpickett00",
"mcosma@gmail.com": "wakamex",
"clawdia.nash@proton.me": "clawdia-nash",
@@ -105,9 +95,7 @@ AUTHOR_MAP = {
"vincentcharlebois@gmail.com": "vincentcharlebois",
"aryan@synvoid.com": "aryansingh",
"johnsonblake1@gmail.com": "blakejohnson",
"greer.guthrie@gmail.com": "g-guthrie",
"kennyx102@gmail.com": "bobashopcashier",
"shokatalishaikh95@gmail.com": "areu01or00",
"bryan@intertwinesys.com": "bryanyoung",
"christo.mitov@gmail.com": "christomitov",
"hermes@nousresearch.com": "NousResearch",
@@ -127,9 +115,6 @@ AUTHOR_MAP = {
"m@statecraft.systems": "mbierling",
"balyan.sid@gmail.com": "balyansid",
"oluwadareab12@gmail.com": "bennytimz",
"simon@simonmarcus.org": "simon-marcus",
"xowiekk@gmail.com": "Xowiek",
"1243352777@qq.com": "zons-zhaozhy",
# ── bulk addition: 75 emails resolved via API, PR salvage bodies, noreply
# crossref, and GH contributor list matching (April 2026 audit) ──
"1115117931@qq.com": "aaronagent",
@@ -180,22 +165,6 @@ AUTHOR_MAP = {
"limars874@gmail.com": "limars874",
"lisicheng168@gmail.com": "lesterli",
"mingjwan@microsoft.com": "MagicRay1217",
"orangeko@gmail.com": "GenKoKo",
"82095453+iacker@users.noreply.github.com": "iacker",
"sontianye@users.noreply.github.com": "sontianye",
"jackjin1997@users.noreply.github.com": "jackjin1997",
"danieldoderlein@users.noreply.github.com": "danieldoderlein",
"lrawnsley@users.noreply.github.com": "lrawnsley",
"taeuk178@users.noreply.github.com": "taeuk178",
"ogzerber@users.noreply.github.com": "ogzerber",
"cola-runner@users.noreply.github.com": "cola-runner",
"ygd58@users.noreply.github.com": "ygd58",
"vominh1919@users.noreply.github.com": "vominh1919",
"trevmanthony@gmail.com": "trevthefoolish",
"ziliangpeng@users.noreply.github.com": "ziliangpeng",
"centripetal-star@users.noreply.github.com": "centripetal-star",
"LeonSGP43@users.noreply.github.com": "LeonSGP43",
"Lubrsy706@users.noreply.github.com": "Lubrsy706",
"niyant@spicefi.xyz": "spniyant",
"olafthiele@gmail.com": "olafthiele",
"oncuevtv@gmail.com": "sprmn24",
@@ -218,14 +187,12 @@ AUTHOR_MAP = {
"yangzhi.see@gmail.com": "SeeYangZhi",
"yongtenglei@gmail.com": "yongtenglei",
"young@YoungdeMacBook-Pro.local": "YoungYang963",
"ysfalweshcan@gmail.com": "Junass1",
"ysfalweshcan@gmail.com": "Awsh1",
"ysfwaxlycan@gmail.com": "WAXLYY",
"yusufalweshdemir@gmail.com": "Dusk1e",
"zhouboli@gmail.com": "zhouboli",
"zqiao@microsoft.com": "tomqiaozc",
"zzn+pa@zzn.im": "xinbenlv",
"zaynjarvis@gmail.com": "ZaynJarvis",
"zhiheng.liu@bytedance.com": "ZaynJarvis",
}
@@ -313,7 +313,7 @@ Type these during an interactive chat session.
```
~/.hermes/config.yaml Main configuration
~/.hermes/.env API keys and secrets
$HERMES_HOME/skills/ Installed skills
~/.hermes/skills/ Installed skills
~/.hermes/sessions/ Session transcripts
~/.hermes/logs/ Gateway and error logs
~/.hermes/auth.json OAuth tokens and credential pools
@@ -351,8 +351,8 @@ Full config reference: https://hermes-agent.nousresearch.com/docs/user-guide/con
|----------|------|-------------|
| OpenRouter | API key | `OPENROUTER_API_KEY` |
| Anthropic | API key | `ANTHROPIC_API_KEY` |
| Nous Portal | OAuth | `hermes auth` |
| OpenAI Codex | OAuth | `hermes auth` |
| Nous Portal | OAuth | `hermes login --provider nous` |
| OpenAI Codex | OAuth | `hermes login --provider openai-codex` |
| GitHub Copilot | Token | `COPILOT_GITHUB_TOKEN` |
| Google Gemini | API key | `GOOGLE_API_KEY` or `GEMINI_API_KEY` |
| DeepSeek | API key | `DEEPSEEK_API_KEY` |
@@ -650,9 +650,9 @@ registry.register(
)
```
**2. Add to `toolsets.py`** → `_HERMES_CORE_TOOLS` list.
**2. Add import** in `model_tools.py` → `_discover_tools()` list.
Auto-discovery: any `tools/*.py` file with a top-level `registry.register()` call is imported automatically — no manual list needed.
**3. Add to `toolsets.py`** → `_HERMES_CORE_TOOLS` list.
All handlers must return JSON strings. Use `get_hermes_home()` for paths, never hardcode `~/.hermes`.
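The two rules above (JSON-string return values, no hardcoded `~/.hermes`) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `list_skills_handler` is a hypothetical handler, and the `get_hermes_home()` defined here is only a stand-in for the project's helper of the same name, assumed to honour `$HERMES_HOME` and fall back to `~/.hermes` as the shell idiom `${HERMES_HOME:-$HOME/.hermes}` does.

```python
import json
import os
from pathlib import Path


def get_hermes_home() -> Path:
    """Stand-in for the project's helper (assumed behaviour, not the real code).

    Uses $HERMES_HOME when it is set and non-empty, otherwise ~/.hermes.
    """
    return Path(os.environ.get("HERMES_HOME") or Path.home() / ".hermes")


def list_skills_handler(args: dict) -> str:
    """Hypothetical tool handler that returns a JSON string, never a dict."""
    skills_dir = get_hermes_home() / "skills"
    try:
        names = sorted(p.name for p in skills_dir.iterdir() if p.is_dir())
        return json.dumps({"ok": True, "skills": names})
    except OSError as exc:
        # Failures are serialized as JSON too, so callers always get a string.
        return json.dumps({"ok": False, "error": str(exc)})
```

Resolving the directory through one helper keeps a relocated `HERMES_HOME` working without touching individual handlers, which appears to be the motivation for the rule.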
+1 -1
View File
@@ -334,7 +334,7 @@ When the user asks you to "review PR #N", "look at this PR", or gives you a PR U
### Step 1: Set up environment
```bash
source "${HERMES_HOME:-$HOME/.hermes}/skills/github/github-auth/scripts/gh-env.sh"
source ~/.hermes/skills/github/github-auth/scripts/gh-env.sh
# Or run the inline setup block from the top of this skill
```
@@ -6,7 +6,7 @@ All requests need: `-H "Authorization: token $GITHUB_TOKEN"`
Use the `gh-env.sh` helper to set `$GITHUB_TOKEN`, `$GH_OWNER`, `$GH_REPO` automatically:
```bash
source "${HERMES_HOME:-$HOME/.hermes}/skills/github/github-auth/scripts/gh-env.sh"
source ~/.hermes/skills/github/github-auth/scripts/gh-env.sh
```
## Repositories

Some files were not shown because too many files have changed in this diff.