Compare commits
3 Commits
custom_aux
...
feat/modul
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
fea3a5bdcf | ||
|
|
93dd869eab | ||
|
|
50ee4aa672 |
28
.env.example
28
.env.example
@@ -13,34 +13,6 @@ OPENROUTER_API_KEY=
|
||||
# Examples: anthropic/claude-opus-4.6, openai/gpt-4o, google/gemini-3-flash-preview, zhipuai/glm-4-plus
|
||||
LLM_MODEL=anthropic/claude-opus-4.6
|
||||
|
||||
# =============================================================================
|
||||
# LLM PROVIDER (z.ai / GLM)
|
||||
# =============================================================================
|
||||
# z.ai provides access to ZhipuAI GLM models (GLM-4-Plus, etc.)
|
||||
# Get your key at: https://z.ai or https://open.bigmodel.cn
|
||||
GLM_API_KEY=
|
||||
# GLM_BASE_URL=https://api.z.ai/api/paas/v4 # Override default base URL
|
||||
|
||||
# =============================================================================
|
||||
# LLM PROVIDER (Kimi / Moonshot)
|
||||
# =============================================================================
|
||||
# Kimi/Moonshot provides access to Moonshot AI coding models
|
||||
# Get your key at: https://platform.moonshot.ai
|
||||
KIMI_API_KEY=
|
||||
# KIMI_BASE_URL=https://api.moonshot.ai/v1 # Override default base URL
|
||||
|
||||
# =============================================================================
|
||||
# LLM PROVIDER (MiniMax)
|
||||
# =============================================================================
|
||||
# MiniMax provides access to MiniMax models (global endpoint)
|
||||
# Get your key at: https://www.minimax.io
|
||||
MINIMAX_API_KEY=
|
||||
# MINIMAX_BASE_URL=https://api.minimax.io/v1 # Override default base URL
|
||||
|
||||
# MiniMax China endpoint (for users in mainland China)
|
||||
MINIMAX_CN_API_KEY=
|
||||
# MINIMAX_CN_BASE_URL=https://api.minimaxi.com/v1 # Override default base URL
|
||||
|
||||
# =============================================================================
|
||||
# TOOL API KEYS
|
||||
# =============================================================================
|
||||
|
||||
@@ -204,7 +204,7 @@ Every installed skill in `~/.hermes/skills/` is automatically registered as a sl
|
||||
The skill name (from frontmatter or folder name) becomes the command: `axolotl` → `/axolotl`.
|
||||
|
||||
Implementation (`agent/skill_commands.py`, shared between CLI and gateway):
|
||||
1. `scan_skill_commands()` scans all SKILL.md files at startup, filtering out skills incompatible with the current OS platform (via the `platforms` frontmatter field)
|
||||
1. `scan_skill_commands()` scans all SKILL.md files at startup
|
||||
2. `build_skill_invocation_message()` loads the SKILL.md content and builds a user-turn message
|
||||
3. The message includes the full skill content, a list of supporting files (not loaded), and the user's instruction
|
||||
4. Supporting files can be loaded on demand via the `skill_view` tool
|
||||
@@ -657,7 +657,6 @@ SKILL.md files use YAML frontmatter (agentskills.io format):
|
||||
name: skill-name
|
||||
description: Brief description for listing
|
||||
version: 1.0.0
|
||||
platforms: [macos] # Optional — restrict to specific OS (macos/linux/windows)
|
||||
metadata:
|
||||
hermes:
|
||||
tags: [tag1, tag2]
|
||||
@@ -666,8 +665,6 @@ metadata:
|
||||
# Skill Content...
|
||||
```
|
||||
|
||||
**Platform filtering** — Skills with a `platforms` field are automatically excluded from the system prompt index, `skills_list()`, and slash commands on incompatible platforms. Skills without the field load everywhere (backward compatible). See `skills/apple/` for macOS-only examples (iMessage, Reminders, Notes, FindMy).
|
||||
|
||||
**Skills Hub** — user-driven skill search/install from online registries and official optional skills. Sources: official optional skills (shipped with repo, labeled "official"), GitHub (openai/skills, anthropics/skills, custom taps), ClawHub, Claude marketplace, LobeHub. Not exposed as an agent tool — the model cannot search for or install skills. Users manage skills via `hermes skills browse/search/install` CLI commands or the `/skills` slash command in chat.
|
||||
|
||||
Key files:
|
||||
|
||||
@@ -325,9 +325,6 @@ description: Brief description (shown in skill search results)
|
||||
version: 1.0.0
|
||||
author: Your Name
|
||||
license: MIT
|
||||
platforms: [macos, linux] # Optional — restrict to specific OS platforms
|
||||
# Valid: macos, linux, windows
|
||||
# Omit to load on all platforms (default)
|
||||
metadata:
|
||||
hermes:
|
||||
tags: [Category, Subcategory, Keywords]
|
||||
@@ -354,18 +351,6 @@ Known failure modes and how to handle them.
|
||||
How the agent confirms it worked.
|
||||
```
|
||||
|
||||
### Platform-specific skills
|
||||
|
||||
Skills can declare which OS platforms they support via the `platforms` frontmatter field. Skills with this field are automatically hidden from the system prompt, `skills_list()`, and slash commands on incompatible platforms.
|
||||
|
||||
```yaml
|
||||
platforms: [macos] # macOS only (e.g., iMessage, Apple Reminders)
|
||||
platforms: [macos, linux] # macOS and Linux
|
||||
platforms: [windows] # Windows only
|
||||
```
|
||||
|
||||
If the field is omitted or empty, the skill loads on all platforms (backward compatible). See `skills/apple/` for examples of macOS-only skills.
|
||||
|
||||
### Skill guidelines
|
||||
|
||||
- **No external dependencies unless absolutely necessary.** Prefer stdlib Python, curl, and existing Hermes tools (`web_extract`, `terminal`, `read_file`).
|
||||
|
||||
@@ -13,7 +13,7 @@
|
||||
|
||||
**The self-improving AI agent built by [Nous Research](https://nousresearch.com).** It's the only agent with a built-in learning loop — it creates skills from experience, improves them during use, nudges itself to persist knowledge, searches its own past conversations, and builds a deepening model of who you are across sessions. Run it on a $5 VPS, a GPU cluster, or serverless infrastructure that costs nearly nothing when idle. It's not tied to your laptop — talk to it from Telegram while it works on a cloud VM.
|
||||
|
||||
Use any model you want — [Nous Portal](https://portal.nousresearch.com), [OpenRouter](https://openrouter.ai) (200+ models), [z.ai/GLM](https://z.ai), [Kimi/Moonshot](https://platform.moonshot.ai), [MiniMax](https://www.minimax.io), OpenAI, or your own endpoint. Switch with `hermes model` — no code changes, no lock-in.
|
||||
Use any model you want — [Nous Portal](https://portal.nousresearch.com), [OpenRouter](https://openrouter.ai) (200+ models), OpenAI, or your own endpoint. Switch with `hermes model` — no code changes, no lock-in.
|
||||
|
||||
<table>
|
||||
<tr><td><b>A real terminal interface</b></td><td>Full TUI with multiline editing, slash-command autocomplete, conversation history, interrupt-and-redirect, and streaming tool output.</td></tr>
|
||||
|
||||
129
TODO.md
Normal file
129
TODO.md
Normal file
@@ -0,0 +1,129 @@
|
||||
# Hermes Agent - Future Improvements
|
||||
|
||||
---
|
||||
|
||||
|
||||
|
||||
## 3. Local Browser Control via CDP 🌐
|
||||
|
||||
**Status:** Not started (currently Browserbase cloud only)
|
||||
**Priority:** Medium
|
||||
|
||||
Support local Chrome/Chromium via Chrome DevTools Protocol alongside existing Browserbase cloud backend.
|
||||
|
||||
**What other agents do:**
|
||||
- **OpenClaw**: Full CDP-based Chrome control with snapshots, actions, uploads, profiles, file chooser, PDF save, console messages, tab management. Uses local Chrome for persistent login sessions.
|
||||
- **Cline**: Headless browser with Computer Use (click, type, scroll, screenshot, console logs)
|
||||
|
||||
**Our approach:**
|
||||
- Add a `local` backend option to `browser_tool.py` using Playwright or raw CDP
|
||||
- Config toggle: `browser.backend: local | browserbase | auto`
|
||||
- `auto` mode: try local first, fall back to Browserbase
|
||||
- Local advantages: free, persistent login sessions, no API key needed
|
||||
- Local disadvantages: no CAPTCHA solving, no stealth mode, requires Chrome installed
|
||||
- Reuse the same 10-tool interface -- just swap the backend
|
||||
- Later: Chrome profile management for persistent sessions across restarts
|
||||
|
||||
---
|
||||
|
||||
## 4. Signal Integration 📡
|
||||
|
||||
**Status:** Not started
|
||||
**Priority:** Low
|
||||
|
||||
New platform adapter using signal-cli daemon (JSON-RPC HTTP + SSE). Requires Java runtime and phone number registration.
|
||||
|
||||
**Reference:** OpenClaw has Signal support via signal-cli.
|
||||
|
||||
---
|
||||
|
||||
## 5. Plugin/Extension System 🔌
|
||||
|
||||
**Status:** Partially implemented (event hooks exist in `gateway/hooks.py`)
|
||||
**Priority:** Medium
|
||||
|
||||
Full Python plugin interface that goes beyond the current hook system.
|
||||
|
||||
**What other agents do:**
|
||||
- **OpenClaw**: Plugin SDK with tool-send capabilities, lifecycle phase hooks (before-agent-start, after-tool-call, model-override), plugin registry with install/uninstall.
|
||||
- **Pi**: Extensions are TypeScript modules that can register tools, commands, keyboard shortcuts, custom UI widgets, overlays, status lines, dialogs, compaction hooks, raw terminal input listeners. Extremely comprehensive.
|
||||
- **OpenCode**: MCP client support (stdio, SSE, StreamableHTTP), OAuth auth for MCP servers. Also has Copilot/Codex plugins.
|
||||
- **Codex**: Full MCP integration with skill dependencies.
|
||||
- **Cline**: MCP integration + lifecycle hooks with cancellation support.
|
||||
|
||||
**Our approach (phased):**
|
||||
|
||||
### Phase 1: Enhanced hooks
|
||||
- Expand the existing `gateway/hooks.py` to support more events: `before-tool-call`, `after-tool-call`, `before-response`, `context-compress`, `session-end`
|
||||
- Allow hooks to modify tool results (e.g., filter sensitive output)
|
||||
|
||||
### Phase 2: Plugin interface
|
||||
- `~/.hermes/plugins/<name>/plugin.yaml` + `handler.py`
|
||||
- Plugins can: register new tools, add CLI commands, subscribe to events, inject system prompt sections
|
||||
- `hermes plugin list|install|uninstall|create` CLI commands
|
||||
- Plugin discovery and validation on startup
|
||||
|
||||
### Phase 3: MCP support (industry standard) ✅ DONE
|
||||
- ✅ MCP client that connects to external MCP servers (stdio + HTTP/StreamableHTTP)
|
||||
- ✅ Config: `mcp_servers` in config.yaml with connection details
|
||||
- ✅ Each MCP server's tools auto-registered as a dynamic toolset
|
||||
- Future: Resources, Prompts, Progress notifications, `hermes mcp` CLI command
|
||||
|
||||
---
|
||||
|
||||
## 6. MCP (Model Context Protocol) Support 🔗 ✅ DONE
|
||||
|
||||
**Status:** Implemented (PR #301)
|
||||
**Priority:** Complete
|
||||
|
||||
Native MCP client support with stdio and HTTP/StreamableHTTP transports, auto-discovery, reconnection with exponential backoff, env var filtering, and credential stripping. See `docs/mcp.md` for full documentation.
|
||||
|
||||
**Still TODO:**
|
||||
- `hermes mcp` CLI subcommand (list/test/status)
|
||||
- `hermes tools` UI integration for MCP toolsets
|
||||
- MCP Resources and Prompts support
|
||||
- OAuth authentication for remote servers
|
||||
- Progress notifications for long-running tools
|
||||
|
||||
---
|
||||
|
||||
## 8. Filesystem Checkpointing / Rollback 🔄
|
||||
|
||||
**Status:** Not started
|
||||
**Priority:** Low-Medium
|
||||
|
||||
Automatic filesystem snapshots after each agent loop iteration so the user can roll back destructive changes to their project.
|
||||
|
||||
**What other agents do:**
|
||||
- **Cline**: Workspace checkpoints at each step with Compare/Restore UI
|
||||
- **OpenCode**: Git-backed workspace snapshots per step, with weekly gc
|
||||
- **Codex**: Sandboxed execution with commit-per-step, rollback on failure
|
||||
|
||||
**Our approach:**
|
||||
- After each tool call (or batch of tool calls in a single turn) that modifies files, create a lightweight checkpoint of the affected files
|
||||
- Git-based when the project is a repo: auto-commit to a detached/temporary branch (`hermes/checkpoints/<session>`) after each agent turn, squash or discard on session end
|
||||
- Non-git fallback: tar snapshots of changed files in `~/.hermes/checkpoints/<session_id>/`
|
||||
- `hermes rollback` CLI command to restore to a previous checkpoint
|
||||
- Agent-accessible via a `checkpoint` tool: `list` (show available restore points), `restore` (roll back to a named point), `diff` (show what changed since a checkpoint)
|
||||
- Configurable: off by default (opt-in via `config.yaml`), since auto-committing can be surprising
|
||||
- Cleanup: checkpoints expire after session ends (or configurable retention period)
|
||||
- Integration with the terminal backend: works with local, SSH, and Docker backends (snapshots happen on the execution host)
|
||||
|
||||
---
|
||||
|
||||
## Implementation Priority Order
|
||||
|
||||
### Tier 1: Next Up
|
||||
|
||||
1. ~~MCP Support -- #6~~ ✅ Done (PR #301)
|
||||
|
||||
### Tier 2: Quality of Life
|
||||
|
||||
3. Local Browser Control via CDP -- #3
|
||||
4. Plugin/Extension System -- #5
|
||||
|
||||
### Tier 3: Nice to Have
|
||||
|
||||
5. Session Branching / Checkpoints -- #7
|
||||
6. Filesystem Checkpointing / Rollback -- #8
|
||||
7. Signal Integration -- #4
|
||||
@@ -4,20 +4,18 @@ Provides a single resolution chain so every consumer (context compression,
|
||||
session search, web extraction, vision analysis, browser vision) picks up
|
||||
the best available backend without duplicating fallback logic.
|
||||
|
||||
Resolution order (same for text and vision tasks):
|
||||
Resolution order for text tasks:
|
||||
1. OpenRouter (OPENROUTER_API_KEY)
|
||||
2. Nous Portal (~/.hermes/auth.json active provider)
|
||||
3. Custom endpoint (OPENAI_BASE_URL + OPENAI_API_KEY)
|
||||
4. Codex OAuth (Responses API via chatgpt.com with gpt-5.3-codex,
|
||||
wrapped to look like a chat.completions client)
|
||||
5. Direct API-key providers (z.ai/GLM, Kimi/Moonshot, MiniMax, MiniMax-CN)
|
||||
— checked via PROVIDER_REGISTRY entries with auth_type='api_key'
|
||||
6. None
|
||||
5. None
|
||||
|
||||
Per-task provider overrides (e.g. AUXILIARY_VISION_PROVIDER,
|
||||
CONTEXT_COMPRESSION_PROVIDER) can force a specific provider for each task:
|
||||
"openrouter", "nous", or "main" (= steps 3-5).
|
||||
Default "auto" follows the full chain above.
|
||||
Resolution order for vision/multimodal tasks:
|
||||
1. OpenRouter
|
||||
2. Nous Portal
|
||||
3. None (custom endpoints can't substitute for Gemini multimodal)
|
||||
"""
|
||||
|
||||
import json
|
||||
@@ -33,14 +31,6 @@ from hermes_constants import OPENROUTER_BASE_URL
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Default auxiliary models for direct API-key providers (cheap/fast for side tasks)
|
||||
_API_KEY_PROVIDER_AUX_MODELS: Dict[str, str] = {
|
||||
"zai": "glm-4.5-flash",
|
||||
"kimi-coding": "kimi-k2-turbo-preview",
|
||||
"minimax": "MiniMax-M2.5-highspeed",
|
||||
"minimax-cn": "MiniMax-M2.5-highspeed",
|
||||
}
|
||||
|
||||
# OpenRouter app attribution headers
|
||||
_OR_HEADERS = {
|
||||
"HTTP-Referer": "https://github.com/NousResearch/hermes-agent",
|
||||
@@ -292,159 +282,53 @@ def _read_codex_access_token() -> Optional[str]:
|
||||
return None
|
||||
|
||||
|
||||
def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
|
||||
"""Try each API-key provider in PROVIDER_REGISTRY order.
|
||||
|
||||
Returns (client, model) for the first provider whose env var is set,
|
||||
or (None, None) if none are configured.
|
||||
"""
|
||||
try:
|
||||
from hermes_cli.auth import PROVIDER_REGISTRY
|
||||
except ImportError:
|
||||
logger.debug("Could not import PROVIDER_REGISTRY for API-key fallback")
|
||||
return None, None
|
||||
|
||||
for provider_id, pconfig in PROVIDER_REGISTRY.items():
|
||||
if pconfig.auth_type != "api_key":
|
||||
continue
|
||||
# Check if any of the provider's env vars are set
|
||||
api_key = ""
|
||||
for env_var in pconfig.api_key_env_vars:
|
||||
val = os.getenv(env_var, "").strip()
|
||||
if val:
|
||||
api_key = val
|
||||
break
|
||||
if not api_key:
|
||||
continue
|
||||
# Resolve base URL (with optional env-var override)
|
||||
base_url = pconfig.inference_base_url
|
||||
if pconfig.base_url_env_var:
|
||||
env_url = os.getenv(pconfig.base_url_env_var, "").strip()
|
||||
if env_url:
|
||||
base_url = env_url.rstrip("/")
|
||||
model = _API_KEY_PROVIDER_AUX_MODELS.get(provider_id, "default")
|
||||
logger.debug("Auxiliary text client: %s (%s)", pconfig.name, model)
|
||||
return OpenAI(api_key=api_key, base_url=base_url), model
|
||||
|
||||
return None, None
|
||||
|
||||
|
||||
# ── Provider resolution helpers ─────────────────────────────────────────────
|
||||
|
||||
def _get_auxiliary_provider(task: str = "") -> str:
|
||||
"""Read the provider override for a specific auxiliary task.
|
||||
|
||||
Checks AUXILIARY_{TASK}_PROVIDER first (e.g. AUXILIARY_VISION_PROVIDER),
|
||||
then CONTEXT_{TASK}_PROVIDER (for the compression section's summary_provider),
|
||||
then falls back to "auto". Returns one of: "auto", "openrouter", "nous", "main".
|
||||
"""
|
||||
if task:
|
||||
for prefix in ("AUXILIARY_", "CONTEXT_"):
|
||||
val = os.getenv(f"{prefix}{task.upper()}_PROVIDER", "").strip().lower()
|
||||
if val and val != "auto":
|
||||
return val
|
||||
return "auto"
|
||||
|
||||
|
||||
def _try_openrouter() -> Tuple[Optional[OpenAI], Optional[str]]:
|
||||
or_key = os.getenv("OPENROUTER_API_KEY")
|
||||
if not or_key:
|
||||
return None, None
|
||||
logger.debug("Auxiliary client: OpenRouter")
|
||||
return OpenAI(api_key=or_key, base_url=OPENROUTER_BASE_URL,
|
||||
default_headers=_OR_HEADERS), _OPENROUTER_MODEL
|
||||
|
||||
|
||||
def _try_nous() -> Tuple[Optional[OpenAI], Optional[str]]:
|
||||
nous = _read_nous_auth()
|
||||
if not nous:
|
||||
return None, None
|
||||
global auxiliary_is_nous
|
||||
auxiliary_is_nous = True
|
||||
logger.debug("Auxiliary client: Nous Portal")
|
||||
return (
|
||||
OpenAI(api_key=_nous_api_key(nous), base_url=_nous_base_url()),
|
||||
_NOUS_MODEL,
|
||||
)
|
||||
|
||||
|
||||
def _try_custom_endpoint() -> Tuple[Optional[OpenAI], Optional[str]]:
|
||||
custom_base = os.getenv("OPENAI_BASE_URL")
|
||||
custom_key = os.getenv("OPENAI_API_KEY")
|
||||
if not custom_base or not custom_key:
|
||||
return None, None
|
||||
model = os.getenv("OPENAI_MODEL") or os.getenv("LLM_MODEL") or "gpt-4o-mini"
|
||||
logger.debug("Auxiliary client: custom endpoint (%s)", model)
|
||||
return OpenAI(api_key=custom_key, base_url=custom_base), model
|
||||
|
||||
|
||||
def _try_codex() -> Tuple[Optional[Any], Optional[str]]:
|
||||
codex_token = _read_codex_access_token()
|
||||
if not codex_token:
|
||||
return None, None
|
||||
logger.debug("Auxiliary client: Codex OAuth (%s via Responses API)", _CODEX_AUX_MODEL)
|
||||
real_client = OpenAI(api_key=codex_token, base_url=_CODEX_AUX_BASE_URL)
|
||||
return CodexAuxiliaryClient(real_client, _CODEX_AUX_MODEL), _CODEX_AUX_MODEL
|
||||
|
||||
|
||||
def _resolve_forced_provider(forced: str) -> Tuple[Optional[OpenAI], Optional[str]]:
|
||||
"""Resolve a specific forced provider. Returns (None, None) if creds missing."""
|
||||
if forced == "openrouter":
|
||||
client, model = _try_openrouter()
|
||||
if client is None:
|
||||
logger.warning("auxiliary.provider=openrouter but OPENROUTER_API_KEY not set")
|
||||
return client, model
|
||||
|
||||
if forced == "nous":
|
||||
client, model = _try_nous()
|
||||
if client is None:
|
||||
logger.warning("auxiliary.provider=nous but Nous Portal not configured (run: hermes login)")
|
||||
return client, model
|
||||
|
||||
if forced == "main":
|
||||
# "main" = skip OpenRouter/Nous, use the main chat model's credentials.
|
||||
for try_fn in (_try_custom_endpoint, _try_codex, _resolve_api_key_provider):
|
||||
client, model = try_fn()
|
||||
if client is not None:
|
||||
return client, model
|
||||
logger.warning("auxiliary.provider=main but no main endpoint credentials found")
|
||||
return None, None
|
||||
|
||||
# Unknown provider name — fall through to auto
|
||||
logger.warning("Unknown auxiliary.provider=%r, falling back to auto", forced)
|
||||
return None, None
|
||||
|
||||
|
||||
def _resolve_auto() -> Tuple[Optional[OpenAI], Optional[str]]:
|
||||
"""Full auto-detection chain: OpenRouter → Nous → custom → Codex → API-key → None."""
|
||||
for try_fn in (_try_openrouter, _try_nous, _try_custom_endpoint,
|
||||
_try_codex, _resolve_api_key_provider):
|
||||
client, model = try_fn()
|
||||
if client is not None:
|
||||
return client, model
|
||||
logger.debug("Auxiliary client: none available")
|
||||
return None, None
|
||||
|
||||
|
||||
# ── Public API ──────────────────────────────────────────────────────────────
|
||||
|
||||
def get_text_auxiliary_client(task: str = "") -> Tuple[Optional[OpenAI], Optional[str]]:
|
||||
"""Return (client, default_model_slug) for text-only auxiliary tasks.
|
||||
def get_text_auxiliary_client() -> Tuple[Optional[OpenAI], Optional[str]]:
|
||||
"""Return (client, model_slug) for text-only auxiliary tasks.
|
||||
|
||||
Args:
|
||||
task: Optional task name ("compression", "web_extract") to check
|
||||
for a task-specific provider override.
|
||||
|
||||
Callers may override the returned model with a per-task env var
|
||||
(e.g. CONTEXT_COMPRESSION_MODEL, AUXILIARY_WEB_EXTRACT_MODEL).
|
||||
Falls through OpenRouter -> Nous Portal -> custom endpoint -> Codex OAuth -> (None, None).
|
||||
"""
|
||||
forced = _get_auxiliary_provider(task)
|
||||
if forced != "auto":
|
||||
return _resolve_forced_provider(forced)
|
||||
return _resolve_auto()
|
||||
# 1. OpenRouter
|
||||
or_key = os.getenv("OPENROUTER_API_KEY")
|
||||
if or_key:
|
||||
logger.debug("Auxiliary text client: OpenRouter")
|
||||
return OpenAI(api_key=or_key, base_url=OPENROUTER_BASE_URL,
|
||||
default_headers=_OR_HEADERS), _OPENROUTER_MODEL
|
||||
|
||||
# 2. Nous Portal
|
||||
nous = _read_nous_auth()
|
||||
if nous:
|
||||
global auxiliary_is_nous
|
||||
auxiliary_is_nous = True
|
||||
logger.debug("Auxiliary text client: Nous Portal")
|
||||
return (
|
||||
OpenAI(api_key=_nous_api_key(nous), base_url=_nous_base_url()),
|
||||
_NOUS_MODEL,
|
||||
)
|
||||
|
||||
# 3. Custom endpoint (both base URL and key must be set)
|
||||
custom_base = os.getenv("OPENAI_BASE_URL")
|
||||
custom_key = os.getenv("OPENAI_API_KEY")
|
||||
if custom_base and custom_key:
|
||||
model = os.getenv("OPENAI_MODEL") or os.getenv("LLM_MODEL") or "gpt-4o-mini"
|
||||
logger.debug("Auxiliary text client: custom endpoint (%s)", model)
|
||||
return OpenAI(api_key=custom_key, base_url=custom_base), model
|
||||
|
||||
# 4. Codex OAuth -- uses the Responses API (only endpoint the token
|
||||
# can access), wrapped to look like a chat.completions client.
|
||||
codex_token = _read_codex_access_token()
|
||||
if codex_token:
|
||||
logger.debug("Auxiliary text client: Codex OAuth (%s via Responses API)", _CODEX_AUX_MODEL)
|
||||
real_client = OpenAI(api_key=codex_token, base_url=_CODEX_AUX_BASE_URL)
|
||||
return CodexAuxiliaryClient(real_client, _CODEX_AUX_MODEL), _CODEX_AUX_MODEL
|
||||
|
||||
# 5. Nothing available
|
||||
logger.debug("Auxiliary text client: none available")
|
||||
return None, None
|
||||
|
||||
|
||||
def get_async_text_auxiliary_client(task: str = ""):
|
||||
def get_async_text_auxiliary_client():
|
||||
"""Return (async_client, model_slug) for async consumers.
|
||||
|
||||
For standard providers returns (AsyncOpenAI, model). For Codex returns
|
||||
@@ -453,7 +337,7 @@ def get_async_text_auxiliary_client(task: str = ""):
|
||||
"""
|
||||
from openai import AsyncOpenAI
|
||||
|
||||
sync_client, model = get_text_auxiliary_client(task)
|
||||
sync_client, model = get_text_auxiliary_client()
|
||||
if sync_client is None:
|
||||
return None, None
|
||||
|
||||
@@ -470,16 +354,30 @@ def get_async_text_auxiliary_client(task: str = ""):
|
||||
|
||||
|
||||
def get_vision_auxiliary_client() -> Tuple[Optional[OpenAI], Optional[str]]:
|
||||
"""Return (client, default_model_slug) for vision/multimodal auxiliary tasks.
|
||||
"""Return (client, model_slug) for vision/multimodal auxiliary tasks.
|
||||
|
||||
Checks AUXILIARY_VISION_PROVIDER for a forced provider, otherwise
|
||||
auto-detects. Callers may override the returned model with
|
||||
AUXILIARY_VISION_MODEL.
|
||||
Only OpenRouter and Nous Portal qualify — custom endpoints cannot
|
||||
substitute for Gemini multimodal.
|
||||
"""
|
||||
forced = _get_auxiliary_provider("vision")
|
||||
if forced != "auto":
|
||||
return _resolve_forced_provider(forced)
|
||||
return _resolve_auto()
|
||||
# 1. OpenRouter
|
||||
or_key = os.getenv("OPENROUTER_API_KEY")
|
||||
if or_key:
|
||||
logger.debug("Auxiliary vision client: OpenRouter")
|
||||
return OpenAI(api_key=or_key, base_url=OPENROUTER_BASE_URL,
|
||||
default_headers=_OR_HEADERS), _OPENROUTER_MODEL
|
||||
|
||||
# 2. Nous Portal
|
||||
nous = _read_nous_auth()
|
||||
if nous:
|
||||
logger.debug("Auxiliary vision client: Nous Portal")
|
||||
return (
|
||||
OpenAI(api_key=_nous_api_key(nous), base_url=_nous_base_url()),
|
||||
_NOUS_MODEL,
|
||||
)
|
||||
|
||||
# 3. Nothing suitable
|
||||
logger.debug("Auxiliary vision client: none available")
|
||||
return None, None
|
||||
|
||||
|
||||
def get_auxiliary_extra_body() -> dict:
|
||||
|
||||
@@ -53,7 +53,7 @@ class ContextCompressor:
|
||||
self.last_completion_tokens = 0
|
||||
self.last_total_tokens = 0
|
||||
|
||||
self.client, default_model = get_text_auxiliary_client("compression")
|
||||
self.client, default_model = get_text_auxiliary_client()
|
||||
self.summary_model = summary_model_override or default_model
|
||||
|
||||
def update_from_response(self, usage: Dict[str, Any]):
|
||||
@@ -196,111 +196,10 @@ Write only the summary, starting with "[CONTEXT SUMMARY]:" prefix."""
|
||||
logger.debug("Could not build fallback auxiliary client: %s", exc)
|
||||
return None, None
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Tool-call / tool-result pair integrity helpers
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
@staticmethod
|
||||
def _get_tool_call_id(tc) -> str:
|
||||
"""Extract the call ID from a tool_call entry (dict or SimpleNamespace)."""
|
||||
if isinstance(tc, dict):
|
||||
return tc.get("id", "")
|
||||
return getattr(tc, "id", "") or ""
|
||||
|
||||
def _sanitize_tool_pairs(self, messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
|
||||
"""Fix orphaned tool_call / tool_result pairs after compression.
|
||||
|
||||
Two failure modes:
|
||||
1. A tool *result* references a call_id whose assistant tool_call was
|
||||
removed (summarized/truncated). The API rejects this with
|
||||
"No tool call found for function call output with call_id ...".
|
||||
2. An assistant message has tool_calls whose results were dropped.
|
||||
The API rejects this because every tool_call must be followed by
|
||||
a tool result with the matching call_id.
|
||||
|
||||
This method removes orphaned results and inserts stub results for
|
||||
orphaned calls so the message list is always well-formed.
|
||||
"""
|
||||
surviving_call_ids: set = set()
|
||||
for msg in messages:
|
||||
if msg.get("role") == "assistant":
|
||||
for tc in msg.get("tool_calls") or []:
|
||||
cid = self._get_tool_call_id(tc)
|
||||
if cid:
|
||||
surviving_call_ids.add(cid)
|
||||
|
||||
result_call_ids: set = set()
|
||||
for msg in messages:
|
||||
if msg.get("role") == "tool":
|
||||
cid = msg.get("tool_call_id")
|
||||
if cid:
|
||||
result_call_ids.add(cid)
|
||||
|
||||
# 1. Remove tool results whose call_id has no matching assistant tool_call
|
||||
orphaned_results = result_call_ids - surviving_call_ids
|
||||
if orphaned_results:
|
||||
messages = [
|
||||
m for m in messages
|
||||
if not (m.get("role") == "tool" and m.get("tool_call_id") in orphaned_results)
|
||||
]
|
||||
if not self.quiet_mode:
|
||||
logger.info("Compression sanitizer: removed %d orphaned tool result(s)", len(orphaned_results))
|
||||
|
||||
# 2. Add stub results for assistant tool_calls whose results were dropped
|
||||
missing_results = surviving_call_ids - result_call_ids
|
||||
if missing_results:
|
||||
patched: List[Dict[str, Any]] = []
|
||||
for msg in messages:
|
||||
patched.append(msg)
|
||||
if msg.get("role") == "assistant":
|
||||
for tc in msg.get("tool_calls") or []:
|
||||
cid = self._get_tool_call_id(tc)
|
||||
if cid in missing_results:
|
||||
patched.append({
|
||||
"role": "tool",
|
||||
"content": "[Result from earlier conversation — see context summary above]",
|
||||
"tool_call_id": cid,
|
||||
})
|
||||
messages = patched
|
||||
if not self.quiet_mode:
|
||||
logger.info("Compression sanitizer: added %d stub tool result(s)", len(missing_results))
|
||||
|
||||
return messages
|
||||
|
||||
def _align_boundary_forward(self, messages: List[Dict[str, Any]], idx: int) -> int:
|
||||
"""Push a compress-start boundary forward past any orphan tool results.
|
||||
|
||||
If ``messages[idx]`` is a tool result, slide forward until we hit a
|
||||
non-tool message so we don't start the summarised region mid-group.
|
||||
"""
|
||||
while idx < len(messages) and messages[idx].get("role") == "tool":
|
||||
idx += 1
|
||||
return idx
|
||||
|
||||
def _align_boundary_backward(self, messages: List[Dict[str, Any]], idx: int) -> int:
|
||||
"""Pull a compress-end boundary backward to avoid splitting a
|
||||
tool_call / result group.
|
||||
|
||||
If the message just before ``idx`` is an assistant message with
|
||||
tool_calls, those tool results will start at ``idx`` and would be
|
||||
separated from their parent. Move backwards to include the whole
|
||||
group in the summarised region.
|
||||
"""
|
||||
if idx <= 0 or idx >= len(messages):
|
||||
return idx
|
||||
prev = messages[idx - 1]
|
||||
if prev.get("role") == "assistant" and prev.get("tool_calls"):
|
||||
# The results for this assistant turn sit at idx..idx+k.
|
||||
# Include the assistant message in the summarised region too.
|
||||
idx -= 1
|
||||
return idx
|
||||
|
||||
def compress(self, messages: List[Dict[str, Any]], current_tokens: int = None) -> List[Dict[str, Any]]:
|
||||
"""Compress conversation messages by summarizing middle turns.
|
||||
|
||||
Keeps first N + last N turns, summarizes everything in between.
|
||||
After compression, orphaned tool_call / tool_result pairs are cleaned
|
||||
up so the API never receives mismatched IDs.
|
||||
"""
|
||||
n_messages = len(messages)
|
||||
if n_messages <= self.protect_first_n + self.protect_last_n + 1:
|
||||
@@ -313,12 +212,6 @@ Write only the summary, starting with "[CONTEXT SUMMARY]:" prefix."""
|
||||
if compress_start >= compress_end:
|
||||
return messages
|
||||
|
||||
# Adjust boundaries to avoid splitting tool_call/result groups.
|
||||
compress_start = self._align_boundary_forward(messages, compress_start)
|
||||
compress_end = self._align_boundary_backward(messages, compress_end)
|
||||
if compress_start >= compress_end:
|
||||
return messages
|
||||
|
||||
turns_to_summarize = messages[compress_start:compress_end]
|
||||
display_tokens = current_tokens if current_tokens else self.last_prompt_tokens or estimate_messages_tokens_rough(messages)
|
||||
|
||||
@@ -340,7 +233,6 @@ Write only the summary, starting with "[CONTEXT SUMMARY]:" prefix."""
|
||||
tail = messages[-self.protect_last_n:]
|
||||
kept.extend(m.copy() for m in tail)
|
||||
self.compression_count += 1
|
||||
kept = self._sanitize_tool_pairs(kept)
|
||||
if not self.quiet_mode:
|
||||
print(f" ✂️ Truncated: {len(messages)} → {len(kept)} messages (dropped middle turns)")
|
||||
return kept
|
||||
@@ -364,8 +256,6 @@ Write only the summary, starting with "[CONTEXT SUMMARY]:" prefix."""
|
||||
|
||||
self.compression_count += 1
|
||||
|
||||
compressed = self._sanitize_tool_pairs(compressed)
|
||||
|
||||
if not self.quiet_mode:
|
||||
new_estimate = estimate_messages_tokens_rough(compressed)
|
||||
saved_estimate = display_tokens - new_estimate
|
||||
|
||||
@@ -55,20 +55,6 @@ MODEL_PRICING = {
|
||||
# Meta (via providers)
|
||||
"llama-4-maverick": {"input": 0.50, "output": 0.70},
|
||||
"llama-4-scout": {"input": 0.20, "output": 0.30},
|
||||
# Z.AI / GLM (direct provider — pricing not published externally, treat as local)
|
||||
"glm-5": {"input": 0.0, "output": 0.0},
|
||||
"glm-4.7": {"input": 0.0, "output": 0.0},
|
||||
"glm-4.5": {"input": 0.0, "output": 0.0},
|
||||
"glm-4.5-flash": {"input": 0.0, "output": 0.0},
|
||||
# Kimi / Moonshot (direct provider — pricing not published externally, treat as local)
|
||||
"kimi-k2.5": {"input": 0.0, "output": 0.0},
|
||||
"kimi-k2-thinking": {"input": 0.0, "output": 0.0},
|
||||
"kimi-k2-turbo-preview": {"input": 0.0, "output": 0.0},
|
||||
"kimi-k2-0905-preview": {"input": 0.0, "output": 0.0},
|
||||
# MiniMax (direct provider — pricing not published externally, treat as local)
|
||||
"MiniMax-M2.5": {"input": 0.0, "output": 0.0},
|
||||
"MiniMax-M2.5-highspeed": {"input": 0.0, "output": 0.0},
|
||||
"MiniMax-M2.1": {"input": 0.0, "output": 0.0},
|
||||
}
|
||||
|
||||
# Fallback: unknown/custom models get zero cost (we can't assume pricing
|
||||
|
||||
@@ -49,17 +49,6 @@ DEFAULT_CONTEXT_LENGTHS = {
|
||||
"meta-llama/llama-3.3-70b-instruct": 131072,
|
||||
"deepseek/deepseek-chat-v3": 65536,
|
||||
"qwen/qwen-2.5-72b-instruct": 32768,
|
||||
"glm-4.7": 202752,
|
||||
"glm-5": 202752,
|
||||
"glm-4.5": 131072,
|
||||
"glm-4.5-flash": 131072,
|
||||
"kimi-k2.5": 262144,
|
||||
"kimi-k2-thinking": 262144,
|
||||
"kimi-k2-turbo-preview": 262144,
|
||||
"kimi-k2-0905-preview": 131072,
|
||||
"MiniMax-M2.5": 204800,
|
||||
"MiniMax-M2.5-highspeed": 204800,
|
||||
"MiniMax-M2.1": 204800,
|
||||
}
|
||||
|
||||
|
||||
|
||||
@@ -142,28 +142,12 @@ def _read_skill_description(skill_file: Path, max_chars: int = 60) -> str:
|
||||
return ""
|
||||
|
||||
|
||||
def _skill_is_platform_compatible(skill_file: Path) -> bool:
|
||||
"""Quick check if a SKILL.md is compatible with the current OS platform.
|
||||
|
||||
Reads just enough to parse the ``platforms`` frontmatter field.
|
||||
Skills without the field (the vast majority) are always compatible.
|
||||
"""
|
||||
try:
|
||||
from tools.skills_tool import _parse_frontmatter, skill_matches_platform
|
||||
raw = skill_file.read_text(encoding="utf-8")[:2000]
|
||||
frontmatter, _ = _parse_frontmatter(raw)
|
||||
return skill_matches_platform(frontmatter)
|
||||
except Exception:
|
||||
return True # Err on the side of showing the skill
|
||||
|
||||
|
||||
def build_skills_system_prompt() -> str:
|
||||
"""Build a compact skill index for the system prompt.
|
||||
|
||||
Scans ~/.hermes/skills/ for SKILL.md files grouped by category.
|
||||
Includes per-skill descriptions from frontmatter so the model can
|
||||
match skills by meaning, not just name.
|
||||
Filters out skills incompatible with the current OS platform.
|
||||
"""
|
||||
hermes_home = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
|
||||
skills_dir = hermes_home / "skills"
|
||||
@@ -175,9 +159,6 @@ def build_skills_system_prompt() -> str:
|
||||
# Each entry: (skill_name, description)
|
||||
skills_by_category: dict[str, list[tuple[str, str]]] = {}
|
||||
for skill_file in skills_dir.rglob("SKILL.md"):
|
||||
# Skip skills incompatible with the current OS platform
|
||||
if not _skill_is_platform_compatible(skill_file):
|
||||
continue
|
||||
rel_path = skill_file.relative_to(skills_dir)
|
||||
parts = rel_path.parts
|
||||
if len(parts) >= 2:
|
||||
|
||||
@@ -22,7 +22,7 @@ def scan_skill_commands() -> Dict[str, Dict[str, Any]]:
|
||||
global _skill_commands
|
||||
_skill_commands = {}
|
||||
try:
|
||||
from tools.skills_tool import SKILLS_DIR, _parse_frontmatter, skill_matches_platform
|
||||
from tools.skills_tool import SKILLS_DIR, _parse_frontmatter
|
||||
if not SKILLS_DIR.exists():
|
||||
return _skill_commands
|
||||
for skill_md in SKILLS_DIR.rglob("SKILL.md"):
|
||||
@@ -31,9 +31,6 @@ def scan_skill_commands() -> Dict[str, Dict[str, Any]]:
|
||||
try:
|
||||
content = skill_md.read_text(encoding='utf-8')
|
||||
frontmatter, body = _parse_frontmatter(content)
|
||||
# Skip skills incompatible with the current OS platform
|
||||
if not skill_matches_platform(frontmatter):
|
||||
continue
|
||||
name = frontmatter.get('name', skill_md.parent.name)
|
||||
description = frontmatter.get('description', '')
|
||||
if not description:
|
||||
|
||||
@@ -13,10 +13,6 @@ model:
|
||||
# "auto" - Use Nous Portal if logged in, otherwise OpenRouter/env vars (default)
|
||||
# "openrouter" - Always use OpenRouter API key from OPENROUTER_API_KEY
|
||||
# "nous" - Always use Nous Portal (requires: hermes login)
|
||||
# "zai" - Use z.ai / ZhipuAI GLM models (requires: GLM_API_KEY)
|
||||
# "kimi-coding"- Use Kimi / Moonshot AI models (requires: KIMI_API_KEY)
|
||||
# "minimax" - Use MiniMax global endpoint (requires: MINIMAX_API_KEY)
|
||||
# "minimax-cn" - Use MiniMax China endpoint (requires: MINIMAX_CN_API_KEY)
|
||||
# Can also be overridden with --provider flag or HERMES_INFERENCE_PROVIDER env var.
|
||||
provider: "auto"
|
||||
|
||||
@@ -199,58 +195,8 @@ compression:
|
||||
threshold: 0.85
|
||||
|
||||
# Model to use for generating summaries (fast/cheap recommended)
|
||||
# This model compresses the middle turns into a concise summary.
|
||||
# IMPORTANT: it receives the full middle section of the conversation, so it
|
||||
# MUST support a context length at least as large as your main model's.
|
||||
# This model compresses the middle turns into a concise summary
|
||||
summary_model: "google/gemini-3-flash-preview"
|
||||
|
||||
# Provider for the summary model (default: "auto")
|
||||
# Options: "auto", "openrouter", "nous", "main"
|
||||
# summary_provider: "auto"
|
||||
|
||||
# =============================================================================
|
||||
# Auxiliary Models (Advanced — Experimental)
|
||||
# =============================================================================
|
||||
# Hermes uses lightweight "auxiliary" models for side tasks: image analysis,
|
||||
# browser screenshot analysis, web page summarization, and context compression.
|
||||
#
|
||||
# By default these use Gemini Flash via OpenRouter or Nous Portal and are
|
||||
# auto-detected from your credentials. You do NOT need to change anything
|
||||
# here for normal usage.
|
||||
#
|
||||
# WARNING: Overriding these with providers other than OpenRouter or Nous Portal
|
||||
# is EXPERIMENTAL and may not work. Not all models/providers support vision,
|
||||
# produce usable summaries, or accept the same API format. Change at your own
|
||||
# risk — if things break, reset to "auto" / empty values.
|
||||
#
|
||||
# Each task has its own provider + model pair so you can mix providers.
|
||||
# For example: OpenRouter for vision (needs multimodal), but your main
|
||||
# local endpoint for compression (just needs text).
|
||||
#
|
||||
# Provider options:
|
||||
# "auto" - Best available: OpenRouter → Nous Portal → main endpoint (default)
|
||||
# "openrouter" - Force OpenRouter (requires OPENROUTER_API_KEY)
|
||||
# "nous" - Force Nous Portal (requires: hermes login)
|
||||
# "main" - Use the same provider & credentials as your main chat model.
|
||||
# Skips OpenRouter/Nous and uses your custom endpoint
|
||||
# (OPENAI_BASE_URL), Codex OAuth, or API-key provider directly.
|
||||
# Useful if you run a local model and want auxiliary tasks to
|
||||
# use it too.
|
||||
#
|
||||
# Model: leave empty to use the provider's default. When empty, OpenRouter
|
||||
# uses "google/gemini-3-flash-preview" and Nous uses "gemini-3-flash".
|
||||
# Other providers pick a sensible default automatically.
|
||||
#
|
||||
# auxiliary:
|
||||
# # Image analysis: vision_analyze tool + browser screenshots
|
||||
# vision:
|
||||
# provider: "auto"
|
||||
# model: "" # e.g. "google/gemini-2.5-flash", "openai/gpt-4o"
|
||||
#
|
||||
# # Web page scraping / summarization + browser page text extraction
|
||||
# web_extract:
|
||||
# provider: "auto"
|
||||
# model: ""
|
||||
|
||||
# =============================================================================
|
||||
# Persistent Memory
|
||||
|
||||
34
cli.py
34
cli.py
@@ -169,7 +169,7 @@ def load_cli_config() -> Dict[str, Any]:
|
||||
"summary_model": "google/gemini-3-flash-preview", # Fast/cheap model for summaries
|
||||
},
|
||||
"agent": {
|
||||
"max_turns": 90, # Default max tool-calling iterations (shared with subagents)
|
||||
"max_turns": 60, # Default max tool-calling iterations
|
||||
"verbose": False,
|
||||
"system_prompt": "",
|
||||
"prefill_messages_file": "",
|
||||
@@ -332,36 +332,12 @@ def load_cli_config() -> Dict[str, Any]:
|
||||
"enabled": "CONTEXT_COMPRESSION_ENABLED",
|
||||
"threshold": "CONTEXT_COMPRESSION_THRESHOLD",
|
||||
"summary_model": "CONTEXT_COMPRESSION_MODEL",
|
||||
"summary_provider": "CONTEXT_COMPRESSION_PROVIDER",
|
||||
}
|
||||
|
||||
for config_key, env_var in compression_env_mappings.items():
|
||||
if config_key in compression_config:
|
||||
os.environ[env_var] = str(compression_config[config_key])
|
||||
|
||||
# Apply auxiliary model overrides to environment variables.
|
||||
# Vision and web_extract each have their own provider + model pair.
|
||||
# (Compression is handled in the compression section above.)
|
||||
# Only set env vars for non-empty / non-default values so auto-detection
|
||||
# still works.
|
||||
auxiliary_config = defaults.get("auxiliary", {})
|
||||
auxiliary_task_env = {
|
||||
# config key → (provider env var, model env var)
|
||||
"vision": ("AUXILIARY_VISION_PROVIDER", "AUXILIARY_VISION_MODEL"),
|
||||
"web_extract": ("AUXILIARY_WEB_EXTRACT_PROVIDER", "AUXILIARY_WEB_EXTRACT_MODEL"),
|
||||
}
|
||||
|
||||
for task_key, (prov_env, model_env) in auxiliary_task_env.items():
|
||||
task_cfg = auxiliary_config.get(task_key, {})
|
||||
if not isinstance(task_cfg, dict):
|
||||
continue
|
||||
prov = str(task_cfg.get("provider", "")).strip()
|
||||
model = str(task_cfg.get("model", "")).strip()
|
||||
if prov and prov != "auto":
|
||||
os.environ[prov_env] = prov
|
||||
if model:
|
||||
os.environ[model_env] = model
|
||||
|
||||
return defaults
|
||||
|
||||
# Load configuration at module startup
|
||||
@@ -857,10 +833,10 @@ class HermesCLI:
|
||||
Args:
|
||||
model: Model to use (default: from env or claude-sonnet)
|
||||
toolsets: List of toolsets to enable (default: all)
|
||||
provider: Inference provider ("auto", "openrouter", "nous", "openai-codex", "zai", "kimi-coding", "minimax", "minimax-cn")
|
||||
provider: Inference provider ("auto", "openrouter", "nous", "openai-codex")
|
||||
api_key: API key (default: from environment)
|
||||
base_url: API base URL (default: OpenRouter)
|
||||
max_turns: Maximum tool-calling iterations shared with subagents (default: 90)
|
||||
max_turns: Maximum tool-calling iterations (default: 60)
|
||||
verbose: Enable verbose logging
|
||||
compact: Use compact display mode
|
||||
resume: Session ID to resume (restores conversation history from SQLite)
|
||||
@@ -913,7 +889,7 @@ class HermesCLI:
|
||||
elif os.getenv("HERMES_MAX_ITERATIONS"):
|
||||
self.max_turns = int(os.getenv("HERMES_MAX_ITERATIONS"))
|
||||
else:
|
||||
self.max_turns = 90
|
||||
self.max_turns = 60
|
||||
|
||||
# Parse and validate toolsets
|
||||
self.enabled_toolsets = toolsets
|
||||
@@ -3253,7 +3229,7 @@ def main(
|
||||
q: Shorthand for --query
|
||||
toolsets: Comma-separated list of toolsets to enable (e.g., "web,terminal")
|
||||
model: Model to use (default: anthropic/claude-opus-4-20250514)
|
||||
provider: Inference provider ("auto", "openrouter", "nous", "openai-codex", "zai", "kimi-coding", "minimax", "minimax-cn")
|
||||
provider: Inference provider ("auto", "openrouter", "nous")
|
||||
api_key: API key for authentication
|
||||
base_url: Base URL for the API
|
||||
max_turns: Maximum tool-calling iterations (default: 60)
|
||||
|
||||
45
cron/jobs.py
45
cron/jobs.py
@@ -14,8 +14,6 @@ from datetime import datetime, timedelta
|
||||
from pathlib import Path
|
||||
from typing import Optional, Dict, List, Any
|
||||
|
||||
from hermes_time import now as _hermes_now
|
||||
|
||||
try:
|
||||
from croniter import croniter
|
||||
HAS_CRONITER = True
|
||||
@@ -130,7 +128,7 @@ def parse_schedule(schedule: str) -> Dict[str, Any]:
|
||||
# Duration like "30m", "2h", "1d" → one-shot from now
|
||||
try:
|
||||
minutes = parse_duration(schedule)
|
||||
run_at = _hermes_now() + timedelta(minutes=minutes)
|
||||
run_at = datetime.now() + timedelta(minutes=minutes)
|
||||
return {
|
||||
"kind": "once",
|
||||
"run_at": run_at.isoformat(),
|
||||
@@ -148,50 +146,37 @@ def parse_schedule(schedule: str) -> Dict[str, Any]:
|
||||
)
|
||||
|
||||
|
||||
def _ensure_aware(dt: datetime) -> datetime:
|
||||
"""Make a naive datetime tz-aware using the configured timezone.
|
||||
|
||||
Handles backward compatibility: timestamps stored before timezone support
|
||||
are naive (server-local). We assume they were in the same timezone as
|
||||
the current configuration so comparisons work without crashing.
|
||||
"""
|
||||
if dt.tzinfo is None:
|
||||
tz = _hermes_now().tzinfo
|
||||
return dt.replace(tzinfo=tz)
|
||||
return dt
|
||||
|
||||
|
||||
def compute_next_run(schedule: Dict[str, Any], last_run_at: Optional[str] = None) -> Optional[str]:
|
||||
"""
|
||||
Compute the next run time for a schedule.
|
||||
|
||||
|
||||
Returns ISO timestamp string, or None if no more runs.
|
||||
"""
|
||||
now = _hermes_now()
|
||||
|
||||
now = datetime.now()
|
||||
|
||||
if schedule["kind"] == "once":
|
||||
run_at = _ensure_aware(datetime.fromisoformat(schedule["run_at"]))
|
||||
run_at = datetime.fromisoformat(schedule["run_at"])
|
||||
# If in the future, return it; if in the past, no more runs
|
||||
return schedule["run_at"] if run_at > now else None
|
||||
|
||||
|
||||
elif schedule["kind"] == "interval":
|
||||
minutes = schedule["minutes"]
|
||||
if last_run_at:
|
||||
# Next run is last_run + interval
|
||||
last = _ensure_aware(datetime.fromisoformat(last_run_at))
|
||||
last = datetime.fromisoformat(last_run_at)
|
||||
next_run = last + timedelta(minutes=minutes)
|
||||
else:
|
||||
# First run is now + interval
|
||||
next_run = now + timedelta(minutes=minutes)
|
||||
return next_run.isoformat()
|
||||
|
||||
|
||||
elif schedule["kind"] == "cron":
|
||||
if not HAS_CRONITER:
|
||||
return None
|
||||
cron = croniter(schedule["expr"], now)
|
||||
next_run = cron.get_next(datetime)
|
||||
return next_run.isoformat()
|
||||
|
||||
|
||||
return None
|
||||
|
||||
|
||||
@@ -219,7 +204,7 @@ def save_jobs(jobs: List[Dict[str, Any]]):
|
||||
fd, tmp_path = tempfile.mkstemp(dir=str(JOBS_FILE.parent), suffix='.tmp', prefix='.jobs_')
|
||||
try:
|
||||
with os.fdopen(fd, 'w', encoding='utf-8') as f:
|
||||
json.dump({"jobs": jobs, "updated_at": _hermes_now().isoformat()}, f, indent=2)
|
||||
json.dump({"jobs": jobs, "updated_at": datetime.now().isoformat()}, f, indent=2)
|
||||
f.flush()
|
||||
os.fsync(f.fileno())
|
||||
os.replace(tmp_path, JOBS_FILE)
|
||||
@@ -264,7 +249,7 @@ def create_job(
|
||||
deliver = "origin" if origin else "local"
|
||||
|
||||
job_id = uuid.uuid4().hex[:12]
|
||||
now = _hermes_now().isoformat()
|
||||
now = datetime.now().isoformat()
|
||||
|
||||
job = {
|
||||
"id": job_id,
|
||||
@@ -343,7 +328,7 @@ def mark_job_run(job_id: str, success: bool, error: Optional[str] = None):
|
||||
jobs = load_jobs()
|
||||
for i, job in enumerate(jobs):
|
||||
if job["id"] == job_id:
|
||||
now = _hermes_now().isoformat()
|
||||
now = datetime.now().isoformat()
|
||||
job["last_run_at"] = now
|
||||
job["last_status"] = "ok" if success else "error"
|
||||
job["last_error"] = error if not success else None
|
||||
@@ -376,7 +361,7 @@ def mark_job_run(job_id: str, success: bool, error: Optional[str] = None):
|
||||
|
||||
def get_due_jobs() -> List[Dict[str, Any]]:
|
||||
"""Get all jobs that are due to run now."""
|
||||
now = _hermes_now()
|
||||
now = datetime.now()
|
||||
jobs = load_jobs()
|
||||
due = []
|
||||
|
||||
@@ -388,7 +373,7 @@ def get_due_jobs() -> List[Dict[str, Any]]:
|
||||
if not next_run:
|
||||
continue
|
||||
|
||||
next_run_dt = _ensure_aware(datetime.fromisoformat(next_run))
|
||||
next_run_dt = datetime.fromisoformat(next_run)
|
||||
if next_run_dt <= now:
|
||||
due.append(job)
|
||||
|
||||
@@ -401,7 +386,7 @@ def save_job_output(job_id: str, output: str):
|
||||
job_output_dir = OUTPUT_DIR / job_id
|
||||
job_output_dir.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
timestamp = _hermes_now().strftime("%Y-%m-%d_%H-%M-%S")
|
||||
timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
|
||||
output_file = job_output_dir / f"{timestamp}.md"
|
||||
|
||||
with open(output_file, 'w', encoding='utf-8') as f:
|
||||
|
||||
@@ -27,8 +27,6 @@ from datetime import datetime
|
||||
from pathlib import Path
|
||||
from typing import Optional
|
||||
|
||||
from hermes_time import now as _hermes_now
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Add parent directory to path for imports
|
||||
@@ -209,7 +207,7 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
|
||||
provider=runtime.get("provider"),
|
||||
api_mode=runtime.get("api_mode"),
|
||||
quiet_mode=True,
|
||||
session_id=f"cron_{job_id}_{_hermes_now().strftime('%Y%m%d_%H%M%S')}"
|
||||
session_id=f"cron_{job_id}_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
|
||||
)
|
||||
|
||||
result = agent.run_conversation(prompt)
|
||||
@@ -221,7 +219,7 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
|
||||
output = f"""# Cron Job: {job_name}
|
||||
|
||||
**Job ID:** {job_id}
|
||||
**Run Time:** {_hermes_now().strftime('%Y-%m-%d %H:%M:%S')}
|
||||
**Run Time:** {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}
|
||||
**Schedule:** {job.get('schedule_display', 'N/A')}
|
||||
|
||||
## Prompt
|
||||
@@ -243,7 +241,7 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
|
||||
output = f"""# Cron Job: {job_name} (FAILED)
|
||||
|
||||
**Job ID:** {job_id}
|
||||
**Run Time:** {_hermes_now().strftime('%Y-%m-%d %H:%M:%S')}
|
||||
**Run Time:** {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}
|
||||
**Schedule:** {job.get('schedule_display', 'N/A')}
|
||||
|
||||
## Prompt
|
||||
@@ -299,11 +297,11 @@ def tick(verbose: bool = True) -> int:
|
||||
due_jobs = get_due_jobs()
|
||||
|
||||
if verbose and not due_jobs:
|
||||
logger.info("%s - No jobs due", _hermes_now().strftime('%H:%M:%S'))
|
||||
logger.info("%s - No jobs due", datetime.now().strftime('%H:%M:%S'))
|
||||
return 0
|
||||
|
||||
if verbose:
|
||||
logger.info("%s - %s job(s) due", _hermes_now().strftime('%H:%M:%S'), len(due_jobs))
|
||||
logger.info("%s - %s job(s) due", datetime.now().strftime('%H:%M:%S'), len(due_jobs))
|
||||
|
||||
executed = 0
|
||||
for job in due_jobs:
|
||||
|
||||
@@ -195,12 +195,8 @@ environments/
|
||||
│ └── hermes_swe_env.py
|
||||
│
|
||||
└── benchmarks/ # Evaluation benchmarks
|
||||
├── terminalbench_2/ # 89 terminal tasks, Modal sandboxes
|
||||
│ └── terminalbench2_env.py
|
||||
├── tblite/ # 100 calibrated tasks (fast TB2 proxy)
|
||||
│ └── tblite_env.py
|
||||
└── yc_bench/ # Long-horizon strategic benchmark
|
||||
└── yc_bench_env.py
|
||||
└── terminalbench_2/
|
||||
└── terminalbench2_env.py
|
||||
```
|
||||
|
||||
## Concrete Environments
|
||||
|
||||
@@ -1,115 +0,0 @@
|
||||
# YC-Bench: Long-Horizon Agent Benchmark
|
||||
|
||||
[YC-Bench](https://github.com/collinear-ai/yc-bench) by [Collinear AI](https://collinear.ai/) is a deterministic, long-horizon benchmark that tests LLM agents' ability to act as a tech startup CEO. The agent manages a simulated company over 1-3 years, making compounding decisions about resource allocation, cash flow, task management, and prestige specialisation across 4 skill domains.
|
||||
|
||||
Unlike TerminalBench2 (which evaluates per-task coding ability with binary pass/fail), YC-Bench measures **long-term strategic coherence** — whether an agent can maintain consistent strategy, manage compounding consequences, and adapt plans over hundreds of turns.
|
||||
|
||||
## Setup
|
||||
|
||||
```bash
|
||||
# Install yc-bench (optional dependency)
|
||||
pip install "hermes-agent[yc-bench]"
|
||||
|
||||
# Or install from source
|
||||
git clone https://github.com/collinear-ai/yc-bench
|
||||
cd yc-bench && pip install -e .
|
||||
|
||||
# Verify
|
||||
yc-bench --help
|
||||
```
|
||||
|
||||
## Running
|
||||
|
||||
```bash
|
||||
# From the repo root:
|
||||
bash environments/benchmarks/yc_bench/run_eval.sh
|
||||
|
||||
# Or directly:
|
||||
python environments/benchmarks/yc_bench/yc_bench_env.py evaluate \
|
||||
--config environments/benchmarks/yc_bench/default.yaml
|
||||
|
||||
# Override model:
|
||||
bash environments/benchmarks/yc_bench/run_eval.sh \
|
||||
--openai.model_name anthropic/claude-opus-4-20250514
|
||||
|
||||
# Quick single-preset test:
|
||||
bash environments/benchmarks/yc_bench/run_eval.sh \
|
||||
--env.presets '["fast_test"]' --env.seeds '[1]'
|
||||
```
|
||||
|
||||
## How It Works
|
||||
|
||||
### Architecture
|
||||
|
||||
```
|
||||
HermesAgentLoop (our agent)
|
||||
-> terminal tool -> subprocess("yc-bench company status") -> JSON output
|
||||
-> terminal tool -> subprocess("yc-bench task accept --task-id X") -> JSON
|
||||
-> terminal tool -> subprocess("yc-bench sim resume") -> JSON (advance time)
|
||||
-> ... (100-500 turns per run)
|
||||
```
|
||||
|
||||
The environment initialises the simulation via `yc-bench sim init` (NOT `yc-bench run`, which would start yc-bench's own built-in agent loop). Our `HermesAgentLoop` then drives all interaction through CLI commands.
|
||||
|
||||
### Simulation Mechanics
|
||||
|
||||
- **4 skill domains**: research, inference, data_environment, training
|
||||
- **Prestige system** (1.0-10.0): Gates access to higher-paying tasks
|
||||
- **Employee management**: Junior/Mid/Senior with domain-specific skill rates
|
||||
- **Throughput splitting**: `effective_rate = base_rate / N` active tasks per employee
|
||||
- **Financial pressure**: Monthly payroll, bankruptcy = game over
|
||||
- **Deterministic**: SHA256-based RNG — same seed + preset = same world
|
||||
|
||||
### Difficulty Presets
|
||||
|
||||
| Preset | Employees | Tasks | Focus |
|
||||
|-----------|-----------|-------|-------|
|
||||
| tutorial | 3 | 50 | Basic loop mechanics |
|
||||
| easy | 5 | 100 | Throughput awareness |
|
||||
| **medium**| 5 | 150 | Prestige climbing + domain specialisation |
|
||||
| **hard** | 7 | 200 | Precise ETA reasoning |
|
||||
| nightmare | 8 | 300 | Sustained perfection under payroll pressure |
|
||||
| fast_test | (varies) | (varies) | Quick validation (~50 turns) |
|
||||
|
||||
Default eval runs **fast_test + medium + hard** × 3 seeds = 9 runs.
|
||||
|
||||
### Scoring
|
||||
|
||||
```
|
||||
composite = 0.5 × survival + 0.5 × normalised_funds
|
||||
```
|
||||
|
||||
- **Survival** (binary): Did the company avoid bankruptcy?
|
||||
- **Normalised funds** (0.0-1.0): Log-scale relative to initial $250K capital
|
||||
|
||||
## Configuration
|
||||
|
||||
Key fields in `default.yaml`:
|
||||
|
||||
| Field | Default | Description |
|
||||
|-------|---------|-------------|
|
||||
| `presets` | `["fast_test", "medium", "hard"]` | Which presets to evaluate |
|
||||
| `seeds` | `[1, 2, 3]` | RNG seeds per preset |
|
||||
| `max_agent_turns` | 200 | Max LLM calls per run |
|
||||
| `run_timeout` | 3600 | Wall-clock timeout per run (seconds) |
|
||||
| `survival_weight` | 0.5 | Weight of survival in composite score |
|
||||
| `funds_weight` | 0.5 | Weight of normalised funds in composite |
|
||||
| `horizon_years` | null | Override horizon (null = auto from preset) |
|
||||
|
||||
## Cost & Time Estimates
|
||||
|
||||
Each run is 100-500 LLM turns. Approximate costs per run at typical API rates:
|
||||
|
||||
| Preset | Turns | Time | Est. Cost |
|
||||
|--------|-------|------|-----------|
|
||||
| fast_test | ~50 | 5-10 min | $1-5 |
|
||||
| medium | ~200 | 20-40 min | $5-15 |
|
||||
| hard | ~300 | 30-60 min | $10-25 |
|
||||
|
||||
Full default eval (9 runs): ~3-6 hours, $50-200 depending on model.
|
||||
|
||||
## References
|
||||
|
||||
- [collinear-ai/yc-bench](https://github.com/collinear-ai/yc-bench) — Official repository
|
||||
- [Collinear AI](https://collinear.ai/) — Company behind yc-bench
|
||||
- [TerminalBench2](../terminalbench_2/) — Per-task coding benchmark (complementary)
|
||||
@@ -1,43 +0,0 @@
|
||||
# YC-Bench Evaluation -- Default Configuration
|
||||
#
|
||||
# Long-horizon agent benchmark: agent plays CEO of an AI startup over
|
||||
# a simulated 1-3 year run, interacting via yc-bench CLI subcommands.
|
||||
#
|
||||
# Requires: pip install "hermes-agent[yc-bench]"
|
||||
#
|
||||
# Usage:
|
||||
# python environments/benchmarks/yc_bench/yc_bench_env.py evaluate \
|
||||
# --config environments/benchmarks/yc_bench/default.yaml
|
||||
#
|
||||
# # Override model:
|
||||
# python environments/benchmarks/yc_bench/yc_bench_env.py evaluate \
|
||||
# --config environments/benchmarks/yc_bench/default.yaml \
|
||||
# --openai.model_name anthropic/claude-opus-4-20250514
|
||||
|
||||
env:
|
||||
enabled_toolsets: ["terminal"]
|
||||
max_agent_turns: 200
|
||||
max_token_length: 32000
|
||||
agent_temperature: 0.0
|
||||
terminal_backend: "local"
|
||||
terminal_timeout: 60
|
||||
presets: ["fast_test", "medium", "hard"]
|
||||
seeds: [1, 2, 3]
|
||||
run_timeout: 3600 # 60 min wall-clock per run, auto-FAIL if exceeded
|
||||
survival_weight: 0.5 # weight of binary survival in composite score
|
||||
funds_weight: 0.5 # weight of normalised final funds in composite score
|
||||
db_dir: "/tmp/yc_bench_dbs"
|
||||
company_name: "BenchCo"
|
||||
start_date: "01/01/2025" # MM/DD/YYYY (yc-bench convention)
|
||||
tokenizer_name: "NousResearch/Hermes-3-Llama-3.1-8B"
|
||||
use_wandb: true
|
||||
wandb_name: "yc-bench"
|
||||
ensure_scores_are_not_same: false
|
||||
data_dir_to_save_evals: "environments/benchmarks/evals/yc-bench"
|
||||
|
||||
openai:
|
||||
base_url: "https://openrouter.ai/api/v1"
|
||||
model_name: "anthropic/claude-sonnet-4.6"
|
||||
server_type: "openai"
|
||||
health_check: false
|
||||
# api_key loaded from OPENROUTER_API_KEY in .env
|
||||
@@ -1,34 +0,0 @@
|
||||
#!/bin/bash
|
||||
|
||||
# YC-Bench Evaluation
|
||||
#
|
||||
# Requires: pip install "hermes-agent[yc-bench]"
|
||||
#
|
||||
# Run from repo root:
|
||||
# bash environments/benchmarks/yc_bench/run_eval.sh
|
||||
#
|
||||
# Override model:
|
||||
# bash environments/benchmarks/yc_bench/run_eval.sh \
|
||||
# --openai.model_name anthropic/claude-opus-4-20250514
|
||||
#
|
||||
# Run a single preset:
|
||||
# bash environments/benchmarks/yc_bench/run_eval.sh \
|
||||
# --env.presets '["fast_test"]' --env.seeds '[1]'
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
mkdir -p logs evals/yc-bench
|
||||
LOG_FILE="logs/yc_bench_$(date +%Y%m%d_%H%M%S).log"
|
||||
|
||||
echo "YC-Bench Evaluation"
|
||||
echo "Log: $LOG_FILE"
|
||||
echo ""
|
||||
|
||||
PYTHONUNBUFFERED=1 LOGLEVEL="${LOGLEVEL:-INFO}" \
|
||||
python environments/benchmarks/yc_bench/yc_bench_env.py evaluate \
|
||||
--config environments/benchmarks/yc_bench/default.yaml \
|
||||
"$@" \
|
||||
2>&1 | tee "$LOG_FILE"
|
||||
|
||||
echo ""
|
||||
echo "Log saved to: $LOG_FILE"
|
||||
@@ -1,847 +0,0 @@
|
||||
"""
|
||||
YCBenchEvalEnv -- YC-Bench Long-Horizon Agent Benchmark Environment
|
||||
|
||||
Evaluates agentic LLMs on YC-Bench: a deterministic, long-horizon benchmark
|
||||
where the agent acts as CEO of an AI startup over a simulated 1-3 year run.
|
||||
The agent manages cash flow, employees, tasks, and prestige across 4 domains,
|
||||
interacting exclusively via CLI subprocess calls against a SQLite-backed
|
||||
discrete-event simulation.
|
||||
|
||||
Unlike TerminalBench2 (per-task binary pass/fail), YC-Bench measures sustained
|
||||
multi-turn strategic coherence -- whether an agent can manage compounding
|
||||
decisions over hundreds of turns without going bankrupt.
|
||||
|
||||
This is an eval-only environment. Run via:
|
||||
|
||||
python environments/benchmarks/yc_bench/yc_bench_env.py evaluate \
|
||||
--config environments/benchmarks/yc_bench/default.yaml
|
||||
|
||||
The evaluate flow:
|
||||
1. setup() -- Verifies yc-bench installed, builds eval matrix (preset x seed)
|
||||
2. evaluate() -- Iterates over all runs sequentially through:
|
||||
a. rollout_and_score_eval() -- Per-run agent loop
|
||||
- Initialises a fresh yc-bench simulation via `sim init` (NOT `run`)
|
||||
- Runs HermesAgentLoop with terminal tool only
|
||||
- Reads final SQLite DB to extract score
|
||||
- Returns survival (0/1) + normalised funds score
|
||||
b. Aggregates per-preset and overall metrics
|
||||
c. Logs results via evaluate_log() and wandb
|
||||
|
||||
Key features:
|
||||
- CLI-only interface: agent calls yc-bench subcommands via terminal tool
|
||||
- Deterministic: same seed + preset = same world (SHA256-based RNG)
|
||||
- Multi-dimensional scoring: survival + normalised final funds
|
||||
- Per-preset difficulty breakdown in results
|
||||
- Isolated SQLite DB per run (no cross-run state leakage)
|
||||
|
||||
Requires: pip install hermes-agent[yc-bench]
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import datetime
|
||||
import json
|
||||
import logging
|
||||
import math
|
||||
import os
|
||||
import sqlite3
|
||||
import subprocess
|
||||
import sys
|
||||
import threading
|
||||
import time
|
||||
import uuid
|
||||
from collections import defaultdict
|
||||
from pathlib import Path
|
||||
from typing import Any, Dict, List, Optional, Tuple
|
||||
|
||||
_repo_root = Path(__file__).resolve().parent.parent.parent.parent
|
||||
if str(_repo_root) not in sys.path:
|
||||
sys.path.insert(0, str(_repo_root))
|
||||
|
||||
from pydantic import Field
|
||||
|
||||
from atroposlib.envs.base import EvalHandlingEnum
|
||||
from atroposlib.envs.server_handling.server_manager import APIServerConfig
|
||||
|
||||
from environments.agent_loop import HermesAgentLoop
|
||||
from environments.hermes_base_env import HermesAgentBaseEnv, HermesAgentEnvConfig
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# =============================================================================
|
||||
# System prompt
|
||||
# =============================================================================
|
||||
|
||||
YC_BENCH_SYSTEM_PROMPT = """\
|
||||
You are the autonomous CEO of an early-stage AI startup in a deterministic
|
||||
business simulation. You manage the company exclusively through the `yc-bench`
|
||||
CLI tool. Your primary goal is to **survive** until the simulation horizon ends
|
||||
without going bankrupt, while **maximising final funds**.
|
||||
|
||||
## Simulation Mechanics
|
||||
|
||||
- **Funds**: You start with $250,000 seed capital. Revenue comes from completing
|
||||
tasks. Rewards scale with your prestige: `base × (1 + scale × (prestige − 1))`.
|
||||
- **Domains**: There are 4 skill domains: **research**, **inference**,
|
||||
**data_environment**, and **training**. Each has its own prestige level
|
||||
(1.0-10.0). Higher prestige unlocks better-paying tasks.
|
||||
- **Employees**: You have employees (Junior/Mid/Senior) with domain-specific
|
||||
skill rates. **Throughput splits**: `effective_rate = base_rate / N` where N
|
||||
is the number of active tasks assigned to that employee. Focus beats breadth.
|
||||
- **Payroll**: Deducted automatically on the first business day of each month.
|
||||
Running out of funds = bankruptcy = game over.
|
||||
- **Time**: The simulation runs on business days (Mon-Fri), 09:00-18:00.
|
||||
Time only advances when you call `yc-bench sim resume`.
|
||||
|
||||
## Task Lifecycle
|
||||
|
||||
1. Browse market tasks with `market browse`
|
||||
2. Accept a task with `task accept` (this sets its deadline)
|
||||
3. Assign employees with `task assign`
|
||||
4. Dispatch with `task dispatch` to start work
|
||||
5. Call `sim resume` to advance time and let employees make progress
|
||||
6. Tasks complete when all domain requirements are fulfilled
|
||||
|
||||
**Penalties for failure vary by difficulty preset.** Completing a task on time
|
||||
earns full reward + prestige gain. Missing a deadline or cancelling a task
|
||||
incurs prestige penalties -- cancelling is always more costly than letting a
|
||||
task fail, so cancel only as a last resort.
|
||||
|
||||
## CLI Commands
|
||||
|
||||
### Observe
|
||||
- `yc-bench company status` -- funds, prestige, runway
|
||||
- `yc-bench employee list` -- skills, salary, active tasks
|
||||
- `yc-bench market browse [--domain D] [--required-prestige-lte N]` -- available tasks
|
||||
- `yc-bench task list [--status active|planned]` -- your tasks
|
||||
- `yc-bench task inspect --task-id UUID` -- progress, deadline, assignments
|
||||
- `yc-bench finance ledger [--category monthly_payroll|task_reward]` -- transaction history
|
||||
- `yc-bench report monthly` -- monthly P&L
|
||||
|
||||
### Act
|
||||
- `yc-bench task accept --task-id UUID` -- accept from market
|
||||
- `yc-bench task assign --task-id UUID --employee-id UUID` -- assign employee
|
||||
- `yc-bench task dispatch --task-id UUID` -- start work (needs >=1 assignment)
|
||||
- `yc-bench task cancel --task-id UUID --reason "text"` -- cancel (prestige penalty)
|
||||
- `yc-bench sim resume` -- advance simulation clock
|
||||
|
||||
### Memory (persists across context truncation)
|
||||
- `yc-bench scratchpad read` -- read your persistent notes
|
||||
- `yc-bench scratchpad write --content "text"` -- overwrite notes
|
||||
- `yc-bench scratchpad append --content "text"` -- append to notes
|
||||
- `yc-bench scratchpad clear` -- clear notes
|
||||
|
||||
## Strategy Guidelines
|
||||
|
||||
1. **Specialise in 2-3 domains** to climb the prestige ladder faster and unlock
|
||||
high-reward tasks. Don't spread thin across all 4 domains early on.
|
||||
2. **Focus employees** -- assigning one employee to many tasks halves their
|
||||
throughput per additional task. Keep assignments concentrated.
|
||||
3. **Use the scratchpad** to track your strategy, upcoming deadlines, and
|
||||
employee assignments. This persists even if conversation context is truncated.
|
||||
4. **Monitor runway** -- always know how many months of payroll you can cover.
|
||||
Accept high-reward tasks before payroll dates.
|
||||
5. **Don't over-accept** -- taking too many tasks and missing deadlines cascades
|
||||
into prestige loss, locking you out of profitable contracts.
|
||||
6. Use `finance ledger` and `report monthly` to track revenue trends.
|
||||
|
||||
## Your Turn
|
||||
|
||||
Each turn:
|
||||
1. Call `yc-bench company status` and `yc-bench task list` to orient yourself.
|
||||
2. Check for completed tasks and pending deadlines.
|
||||
3. Browse market for profitable tasks within your prestige level.
|
||||
4. Accept, assign, and dispatch tasks strategically.
|
||||
5. Call `yc-bench sim resume` to advance time.
|
||||
6. Repeat until the simulation ends.
|
||||
|
||||
Think step by step before acting."""
|
||||
|
||||
# Starting funds in cents ($250,000)
|
||||
INITIAL_FUNDS_CENTS = 25_000_000
|
||||
|
||||
# Default horizon per preset (years)
|
||||
_PRESET_HORIZONS = {
|
||||
"tutorial": 1,
|
||||
"easy": 1,
|
||||
"medium": 1,
|
||||
"hard": 1,
|
||||
"nightmare": 1,
|
||||
"fast_test": 1,
|
||||
"default": 3,
|
||||
"high_reward": 1,
|
||||
}
|
||||
|
||||
|
||||
# =============================================================================
|
||||
# Configuration
|
||||
# =============================================================================
|
||||
|
||||
class YCBenchEvalConfig(HermesAgentEnvConfig):
|
||||
"""
|
||||
Configuration for the YC-Bench evaluation environment.
|
||||
|
||||
Extends HermesAgentEnvConfig with YC-Bench-specific settings for
|
||||
preset selection, seed control, scoring, and simulation parameters.
|
||||
"""
|
||||
|
||||
presets: List[str] = Field(
|
||||
default=["fast_test", "medium", "hard"],
|
||||
description="YC-Bench preset names to evaluate.",
|
||||
)
|
||||
seeds: List[int] = Field(
|
||||
default=[1, 2, 3],
|
||||
description="Random seeds -- each preset x seed = one run.",
|
||||
)
|
||||
run_timeout: int = Field(
|
||||
default=3600,
|
||||
description="Maximum wall-clock seconds per run. Default 60 minutes.",
|
||||
)
|
||||
survival_weight: float = Field(
|
||||
default=0.5,
|
||||
description="Weight of survival (0/1) in composite score.",
|
||||
)
|
||||
funds_weight: float = Field(
|
||||
default=0.5,
|
||||
description="Weight of normalised final funds in composite score.",
|
||||
)
|
||||
db_dir: str = Field(
|
||||
default="/tmp/yc_bench_dbs",
|
||||
description="Directory for per-run SQLite databases.",
|
||||
)
|
||||
horizon_years: Optional[int] = Field(
|
||||
default=None,
|
||||
description=(
|
||||
"Simulation horizon in years. If None (default), inferred from "
|
||||
"preset name (1 year for most, 3 for 'default')."
|
||||
),
|
||||
)
|
||||
company_name: str = Field(
|
||||
default="BenchCo",
|
||||
description="Name of the simulated company.",
|
||||
)
|
||||
start_date: str = Field(
|
||||
default="01/01/2025",
|
||||
description="Simulation start date in MM/DD/YYYY format (yc-bench convention).",
|
||||
)
|
||||
|
||||
|
||||
# =============================================================================
|
||||
# Scoring helpers
|
||||
# =============================================================================
|
||||
|
||||
def _read_final_score(db_path: str) -> Dict[str, Any]:
|
||||
"""
|
||||
Read final game state from a YC-Bench SQLite database.
|
||||
|
||||
Returns dict with final_funds_cents (int), survived (bool),
|
||||
terminal_reason (str).
|
||||
|
||||
Note: yc-bench table names are plural -- 'companies' not 'company',
|
||||
'sim_events' not 'simulation_log'.
|
||||
"""
|
||||
if not os.path.exists(db_path):
|
||||
logger.warning("DB not found at %s", db_path)
|
||||
return {
|
||||
"final_funds_cents": 0,
|
||||
"survived": False,
|
||||
"terminal_reason": "db_missing",
|
||||
}
|
||||
|
||||
conn = None
|
||||
try:
|
||||
conn = sqlite3.connect(db_path)
|
||||
cur = conn.cursor()
|
||||
|
||||
# Read final funds from the 'companies' table
|
||||
cur.execute("SELECT funds_cents FROM companies LIMIT 1")
|
||||
row = cur.fetchone()
|
||||
funds = row[0] if row else 0
|
||||
|
||||
# Determine terminal reason from 'sim_events' table
|
||||
terminal_reason = "unknown"
|
||||
try:
|
||||
cur.execute(
|
||||
"SELECT event_type FROM sim_events "
|
||||
"WHERE event_type IN ('bankruptcy', 'horizon_end') "
|
||||
"ORDER BY scheduled_at DESC LIMIT 1"
|
||||
)
|
||||
event_row = cur.fetchone()
|
||||
if event_row:
|
||||
terminal_reason = event_row[0]
|
||||
except sqlite3.OperationalError:
|
||||
# Table may not exist if simulation didn't progress
|
||||
pass
|
||||
|
||||
survived = funds >= 0 and terminal_reason != "bankruptcy"
|
||||
return {
|
||||
"final_funds_cents": funds,
|
||||
"survived": survived,
|
||||
"terminal_reason": terminal_reason,
|
||||
}
|
||||
|
||||
except Exception as e:
|
||||
logger.error("Failed to read DB %s: %s", db_path, e)
|
||||
return {
|
||||
"final_funds_cents": 0,
|
||||
"survived": False,
|
||||
"terminal_reason": f"db_error: {e}",
|
||||
}
|
||||
finally:
|
||||
if conn:
|
||||
conn.close()
|
||||
|
||||
|
||||
def _compute_composite_score(
|
||||
final_funds_cents: int,
|
||||
survived: bool,
|
||||
survival_weight: float = 0.5,
|
||||
funds_weight: float = 0.5,
|
||||
initial_funds_cents: int = INITIAL_FUNDS_CENTS,
|
||||
) -> float:
|
||||
"""
|
||||
Compute composite score from survival and final funds.
|
||||
|
||||
Score = survival_weight * survival_score
|
||||
+ funds_weight * normalised_funds_score
|
||||
|
||||
Normalised funds uses log-scale relative to initial capital:
|
||||
- funds <= 0: 0.0
|
||||
- funds == initial: ~0.15
|
||||
- funds == 10x: ~0.52
|
||||
- funds == 100x: 1.0
|
||||
"""
|
||||
survival_score = 1.0 if survived else 0.0
|
||||
|
||||
if final_funds_cents <= 0:
|
||||
funds_score = 0.0
|
||||
else:
|
||||
max_ratio = 100.0
|
||||
ratio = final_funds_cents / max(initial_funds_cents, 1)
|
||||
funds_score = min(math.log1p(ratio) / math.log1p(max_ratio), 1.0)
|
||||
|
||||
return survival_weight * survival_score + funds_weight * funds_score
|
||||
|
||||
|
||||
# =============================================================================
|
||||
# Main Environment
|
||||
# =============================================================================
|
||||
|
||||
class YCBenchEvalEnv(HermesAgentBaseEnv):
|
||||
"""
|
||||
YC-Bench long-horizon agent benchmark environment (eval-only).
|
||||
|
||||
Each eval item is a (preset, seed) pair. The environment initialises the
|
||||
simulation via ``yc-bench sim init`` (NOT ``yc-bench run`` which would start
|
||||
a competing built-in agent loop). The HermesAgentLoop then drives the
|
||||
interaction by calling individual yc-bench CLI commands via the terminal tool.
|
||||
|
||||
After the agent loop ends, the SQLite DB is read to extract the final score.
|
||||
|
||||
Scoring:
|
||||
composite = 0.5 * survival + 0.5 * normalised_funds
|
||||
"""
|
||||
|
||||
name = "yc-bench"
|
||||
env_config_cls = YCBenchEvalConfig
|
||||
|
||||
@classmethod
|
||||
def config_init(cls) -> Tuple[YCBenchEvalConfig, List[APIServerConfig]]:
|
||||
env_config = YCBenchEvalConfig(
|
||||
enabled_toolsets=["terminal"],
|
||||
disabled_toolsets=None,
|
||||
distribution=None,
|
||||
max_agent_turns=200,
|
||||
max_token_length=32000,
|
||||
agent_temperature=0.0,
|
||||
system_prompt=YC_BENCH_SYSTEM_PROMPT,
|
||||
terminal_backend="local",
|
||||
terminal_timeout=60,
|
||||
presets=["fast_test", "medium", "hard"],
|
||||
seeds=[1, 2, 3],
|
||||
run_timeout=3600,
|
||||
survival_weight=0.5,
|
||||
funds_weight=0.5,
|
||||
db_dir="/tmp/yc_bench_dbs",
|
||||
eval_handling=EvalHandlingEnum.STOP_TRAIN,
|
||||
group_size=1,
|
||||
steps_per_eval=1,
|
||||
total_steps=1,
|
||||
tokenizer_name="NousResearch/Hermes-3-Llama-3.1-8B",
|
||||
use_wandb=True,
|
||||
wandb_name="yc-bench",
|
||||
ensure_scores_are_not_same=False,
|
||||
)
|
||||
|
||||
server_configs = [
|
||||
APIServerConfig(
|
||||
base_url="https://openrouter.ai/api/v1",
|
||||
model_name="anthropic/claude-sonnet-4.6",
|
||||
server_type="openai",
|
||||
api_key=os.getenv("OPENROUTER_API_KEY", ""),
|
||||
health_check=False,
|
||||
)
|
||||
]
|
||||
|
||||
return env_config, server_configs
|
||||
|
||||
# =========================================================================
|
||||
# Setup
|
||||
# =========================================================================
|
||||
|
||||
async def setup(self):
|
||||
"""Verify yc-bench is installed and build the eval matrix."""
|
||||
# Verify yc-bench CLI is available
|
||||
try:
|
||||
result = subprocess.run(
|
||||
["yc-bench", "--help"], capture_output=True, text=True, timeout=10
|
||||
)
|
||||
if result.returncode != 0:
|
||||
raise FileNotFoundError
|
||||
except (FileNotFoundError, subprocess.TimeoutExpired):
|
||||
raise RuntimeError(
|
||||
"yc-bench CLI not found. Install with:\n"
|
||||
' pip install "hermes-agent[yc-bench]"\n'
|
||||
"Or: git clone https://github.com/collinear-ai/yc-bench "
|
||||
"&& cd yc-bench && pip install -e ."
|
||||
)
|
||||
print("yc-bench CLI verified.")
|
||||
|
||||
# Build eval matrix: preset x seed
|
||||
self.all_eval_items = [
|
||||
{"preset": preset, "seed": seed}
|
||||
for preset in self.config.presets
|
||||
for seed in self.config.seeds
|
||||
]
|
||||
self.iter = 0
|
||||
|
||||
os.makedirs(self.config.db_dir, exist_ok=True)
|
||||
self.eval_metrics: List[Tuple[str, float]] = []
|
||||
|
||||
# Streaming JSONL log for crash-safe result persistence
|
||||
log_dir = os.path.join(os.path.dirname(__file__), "logs")
|
||||
os.makedirs(log_dir, exist_ok=True)
|
||||
run_ts = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
|
||||
self._streaming_path = os.path.join(log_dir, f"samples_{run_ts}.jsonl")
|
||||
self._streaming_file = open(self._streaming_path, "w")
|
||||
self._streaming_lock = threading.Lock()
|
||||
|
||||
print(f"\nYC-Bench eval matrix: {len(self.all_eval_items)} runs")
|
||||
for item in self.all_eval_items:
|
||||
print(f" preset={item['preset']!r} seed={item['seed']}")
|
||||
print(f"Streaming results to: {self._streaming_path}\n")
|
||||
|
||||
def _save_result(self, result: Dict[str, Any]):
|
||||
"""Write a single run result to the streaming JSONL file immediately."""
|
||||
if not hasattr(self, "_streaming_file") or self._streaming_file.closed:
|
||||
return
|
||||
with self._streaming_lock:
|
||||
self._streaming_file.write(
|
||||
json.dumps(result, ensure_ascii=False, default=str) + "\n"
|
||||
)
|
||||
self._streaming_file.flush()
|
||||
|
||||
# =========================================================================
|
||||
# Training pipeline stubs (eval-only -- not used)
|
||||
# =========================================================================
|
||||
|
||||
async def get_next_item(self):
|
||||
item = self.all_eval_items[self.iter % len(self.all_eval_items)]
|
||||
self.iter += 1
|
||||
return item
|
||||
|
||||
def format_prompt(self, item: Dict[str, Any]) -> str:
|
||||
preset = item["preset"]
|
||||
seed = item["seed"]
|
||||
return (
|
||||
f"A new YC-Bench simulation has been initialized "
|
||||
f"(preset='{preset}', seed={seed}).\n"
|
||||
f"Your company '{self.config.company_name}' is ready.\n\n"
|
||||
"Begin by calling:\n"
|
||||
"1. `yc-bench company status` -- see your starting funds and prestige\n"
|
||||
"2. `yc-bench employee list` -- see your team and their skills\n"
|
||||
"3. `yc-bench market browse --required-prestige-lte 1` -- find tasks "
|
||||
"you can take\n\n"
|
||||
"Then accept 2-3 tasks, assign employees, dispatch them, and call "
|
||||
"`yc-bench sim resume` to advance time. Repeat this loop until the "
|
||||
"simulation ends (horizon reached or bankruptcy)."
|
||||
)
|
||||
|
||||
async def compute_reward(self, item, result, ctx) -> float:
|
||||
return 0.0
|
||||
|
||||
async def collect_trajectories(self, item):
|
||||
return None, []
|
||||
|
||||
async def score(self, rollout_group_data):
|
||||
return None
|
||||
|
||||
# =========================================================================
|
||||
# Per-run evaluation
|
||||
# =========================================================================
|
||||
|
||||
async def rollout_and_score_eval(self, eval_item: Dict[str, Any]) -> Dict:
|
||||
"""
|
||||
Evaluate a single (preset, seed) run.
|
||||
|
||||
1. Sets DATABASE_URL and YC_BENCH_EXPERIMENT env vars
|
||||
2. Initialises the simulation via ``yc-bench sim init`` (NOT ``run``)
|
||||
3. Runs HermesAgentLoop with terminal tool
|
||||
4. Reads SQLite DB to compute final score
|
||||
5. Returns result dict with survival, funds, and composite score
|
||||
"""
|
||||
preset = eval_item["preset"]
|
||||
seed = eval_item["seed"]
|
||||
run_id = str(uuid.uuid4())[:8]
|
||||
run_key = f"{preset}_seed{seed}_{run_id}"
|
||||
|
||||
from tqdm import tqdm
|
||||
tqdm.write(f" [START] preset={preset!r} seed={seed} (run_id={run_id})")
|
||||
run_start = time.time()
|
||||
|
||||
# Isolated DB per run -- prevents cross-run state leakage
|
||||
db_path = os.path.join(self.config.db_dir, f"yc_bench_{run_key}.db")
|
||||
os.environ["DATABASE_URL"] = f"sqlite:///{db_path}"
|
||||
os.environ["YC_BENCH_EXPERIMENT"] = preset
|
||||
|
||||
# Determine horizon: explicit config override > preset lookup > default 1
|
||||
horizon = self.config.horizon_years or _PRESET_HORIZONS.get(preset, 1)
|
||||
|
||||
try:
|
||||
# ----------------------------------------------------------
|
||||
# Step 1: Initialise the simulation via CLI
|
||||
# IMPORTANT: We use `sim init`, NOT `yc-bench run`.
|
||||
# `yc-bench run` starts yc-bench's own LLM agent loop (via
|
||||
# LiteLLM), which would compete with our HermesAgentLoop.
|
||||
# `sim init` just sets up the world and returns.
|
||||
# ----------------------------------------------------------
|
||||
init_cmd = [
|
||||
"yc-bench", "sim", "init",
|
||||
"--seed", str(seed),
|
||||
"--start-date", self.config.start_date,
|
||||
"--company-name", self.config.company_name,
|
||||
"--horizon-years", str(horizon),
|
||||
]
|
||||
init_result = subprocess.run(
|
||||
init_cmd, capture_output=True, text=True, timeout=30,
|
||||
)
|
||||
if init_result.returncode != 0:
|
||||
error_msg = (init_result.stderr or init_result.stdout).strip()
|
||||
raise RuntimeError(f"yc-bench sim init failed: {error_msg}")
|
||||
|
||||
tqdm.write(f" Simulation initialized (horizon={horizon}yr)")
|
||||
|
||||
# ----------------------------------------------------------
|
||||
# Step 2: Run the HermesAgentLoop
|
||||
# ----------------------------------------------------------
|
||||
tools, valid_names = self._resolve_tools_for_group()
|
||||
|
||||
messages: List[Dict[str, Any]] = [
|
||||
{"role": "system", "content": YC_BENCH_SYSTEM_PROMPT},
|
||||
{"role": "user", "content": self.format_prompt(eval_item)},
|
||||
]
|
||||
|
||||
agent = HermesAgentLoop(
|
||||
server=self.server,
|
||||
tool_schemas=tools,
|
||||
valid_tool_names=valid_names,
|
||||
max_turns=self.config.max_agent_turns,
|
||||
task_id=run_id,
|
||||
temperature=self.config.agent_temperature,
|
||||
max_tokens=self.config.max_token_length,
|
||||
extra_body=self.config.extra_body,
|
||||
)
|
||||
result = await agent.run(messages)
|
||||
|
||||
# ----------------------------------------------------------
|
||||
# Step 3: Read final score from the simulation DB
|
||||
# ----------------------------------------------------------
|
||||
score_data = _read_final_score(db_path)
|
||||
final_funds = score_data["final_funds_cents"]
|
||||
survived = score_data["survived"]
|
||||
terminal_reason = score_data["terminal_reason"]
|
||||
|
||||
composite = _compute_composite_score(
|
||||
final_funds_cents=final_funds,
|
||||
survived=survived,
|
||||
survival_weight=self.config.survival_weight,
|
||||
funds_weight=self.config.funds_weight,
|
||||
)
|
||||
|
||||
elapsed = time.time() - run_start
|
||||
status = "SURVIVED" if survived else "BANKRUPT"
|
||||
if final_funds >= 0:
|
||||
funds_str = f"${final_funds / 100:,.0f}"
|
||||
else:
|
||||
funds_str = f"-${abs(final_funds) / 100:,.0f}"
|
||||
|
||||
tqdm.write(
|
||||
f" [{status}] preset={preset!r} seed={seed} "
|
||||
f"funds={funds_str} score={composite:.3f} "
|
||||
f"turns={result.turns_used} ({elapsed:.0f}s)"
|
||||
)
|
||||
|
||||
out = {
|
||||
"preset": preset,
|
||||
"seed": seed,
|
||||
"survived": survived,
|
||||
"final_funds_cents": final_funds,
|
||||
"final_funds_usd": final_funds / 100,
|
||||
"terminal_reason": terminal_reason,
|
||||
"composite_score": composite,
|
||||
"turns_used": result.turns_used,
|
||||
"finished_naturally": result.finished_naturally,
|
||||
"elapsed_seconds": elapsed,
|
||||
"db_path": db_path,
|
||||
"messages": result.messages,
|
||||
}
|
||||
self._save_result(out)
|
||||
return out
|
||||
|
||||
except Exception as e:
|
||||
elapsed = time.time() - run_start
|
||||
logger.error("Run %s failed: %s", run_key, e, exc_info=True)
|
||||
tqdm.write(
|
||||
f" [ERROR] preset={preset!r} seed={seed}: {e} ({elapsed:.0f}s)"
|
||||
)
|
||||
out = {
|
||||
"preset": preset,
|
||||
"seed": seed,
|
||||
"survived": False,
|
||||
"final_funds_cents": 0,
|
||||
"final_funds_usd": 0.0,
|
||||
"terminal_reason": f"error: {e}",
|
||||
"composite_score": 0.0,
|
||||
"turns_used": 0,
|
||||
"error": str(e),
|
||||
"elapsed_seconds": elapsed,
|
||||
}
|
||||
self._save_result(out)
|
||||
return out
|
||||
|
||||
# =========================================================================
|
||||
# Evaluate
|
||||
# =========================================================================
|
||||
|
||||
async def _run_with_timeout(self, item: Dict[str, Any]) -> Dict:
|
||||
"""Wrap a single rollout with a wall-clock timeout."""
|
||||
preset = item["preset"]
|
||||
seed = item["seed"]
|
||||
try:
|
||||
return await asyncio.wait_for(
|
||||
self.rollout_and_score_eval(item),
|
||||
timeout=self.config.run_timeout,
|
||||
)
|
||||
except asyncio.TimeoutError:
|
||||
from tqdm import tqdm
|
||||
tqdm.write(
|
||||
f" [TIMEOUT] preset={preset!r} seed={seed} "
|
||||
f"(exceeded {self.config.run_timeout}s)"
|
||||
)
|
||||
out = {
|
||||
"preset": preset,
|
||||
"seed": seed,
|
||||
"survived": False,
|
||||
"final_funds_cents": 0,
|
||||
"final_funds_usd": 0.0,
|
||||
"terminal_reason": f"timeout ({self.config.run_timeout}s)",
|
||||
"composite_score": 0.0,
|
||||
"turns_used": 0,
|
||||
"error": "timeout",
|
||||
}
|
||||
self._save_result(out)
|
||||
return out
|
||||
|
||||
async def evaluate(self, *args, **kwargs) -> None:
|
||||
"""
|
||||
Run YC-Bench evaluation over all (preset, seed) combinations.
|
||||
|
||||
Runs sequentially -- each run is 100-500 turns, parallelising would
|
||||
be prohibitively expensive and cause env var conflicts.
|
||||
"""
|
||||
start_time = time.time()
|
||||
from tqdm import tqdm
|
||||
|
||||
# --- tqdm-compatible logging handler (TB2 pattern) ---
|
||||
class _TqdmHandler(logging.Handler):
|
||||
def emit(self, record):
|
||||
try:
|
||||
tqdm.write(self.format(record))
|
||||
except Exception:
|
||||
self.handleError(record)
|
||||
|
||||
root = logging.getLogger()
|
||||
handler = _TqdmHandler()
|
||||
handler.setFormatter(
|
||||
logging.Formatter("%(levelname)s %(name)s: %(message)s")
|
||||
)
|
||||
root.handlers = [handler]
|
||||
for noisy in ("httpx", "openai"):
|
||||
logging.getLogger(noisy).setLevel(logging.WARNING)
|
||||
|
||||
# --- Print config summary ---
|
||||
print(f"\n{'='*60}")
|
||||
print("Starting YC-Bench Evaluation")
|
||||
print(f"{'='*60}")
|
||||
print(f" Presets: {self.config.presets}")
|
||||
print(f" Seeds: {self.config.seeds}")
|
||||
print(f" Total runs: {len(self.all_eval_items)}")
|
||||
print(f" Max turns/run: {self.config.max_agent_turns}")
|
||||
print(f" Run timeout: {self.config.run_timeout}s")
|
||||
print(f"{'='*60}\n")
|
||||
|
||||
results = []
|
||||
pbar = tqdm(
|
||||
total=len(self.all_eval_items), desc="YC-Bench", dynamic_ncols=True
|
||||
)
|
||||
|
||||
try:
|
||||
for item in self.all_eval_items:
|
||||
result = await self._run_with_timeout(item)
|
||||
results.append(result)
|
||||
survived_count = sum(1 for r in results if r.get("survived"))
|
||||
pbar.set_postfix_str(
|
||||
f"survived={survived_count}/{len(results)}"
|
||||
)
|
||||
pbar.update(1)
|
||||
|
||||
except (KeyboardInterrupt, asyncio.CancelledError):
|
||||
tqdm.write("\n[INTERRUPTED] Stopping evaluation...")
|
||||
pbar.close()
|
||||
try:
|
||||
from tools.terminal_tool import cleanup_all_environments
|
||||
cleanup_all_environments()
|
||||
except Exception:
|
||||
pass
|
||||
if hasattr(self, "_streaming_file") and not self._streaming_file.closed:
|
||||
self._streaming_file.close()
|
||||
return
|
||||
|
||||
pbar.close()
|
||||
end_time = time.time()
|
||||
|
||||
# --- Compute metrics ---
|
||||
valid = [r for r in results if r is not None]
|
||||
if not valid:
|
||||
print("Warning: No valid results.")
|
||||
return
|
||||
|
||||
total = len(valid)
|
||||
survived_total = sum(1 for r in valid if r.get("survived"))
|
||||
survival_rate = survived_total / total if total else 0.0
|
||||
avg_score = (
|
||||
sum(r.get("composite_score", 0) for r in valid) / total
|
||||
if total
|
||||
else 0.0
|
||||
)
|
||||
|
||||
preset_results: Dict[str, List[Dict]] = defaultdict(list)
|
||||
for r in valid:
|
||||
preset_results[r["preset"]].append(r)
|
||||
|
||||
eval_metrics = {
|
||||
"eval/survival_rate": survival_rate,
|
||||
"eval/avg_composite_score": avg_score,
|
||||
"eval/total_runs": total,
|
||||
"eval/survived_runs": survived_total,
|
||||
"eval/evaluation_time_seconds": end_time - start_time,
|
||||
}
|
||||
|
||||
for preset, items in sorted(preset_results.items()):
|
||||
ps = sum(1 for r in items if r.get("survived"))
|
||||
pt = len(items)
|
||||
pa = (
|
||||
sum(r.get("composite_score", 0) for r in items) / pt
|
||||
if pt
|
||||
else 0
|
||||
)
|
||||
key = preset.replace("-", "_")
|
||||
eval_metrics[f"eval/survival_rate_{key}"] = ps / pt if pt else 0
|
||||
eval_metrics[f"eval/avg_score_{key}"] = pa
|
||||
|
||||
self.eval_metrics = [(k, v) for k, v in eval_metrics.items()]
|
||||
|
||||
# --- Print summary ---
|
||||
print(f"\n{'='*60}")
|
||||
print("YC-Bench Evaluation Results")
|
||||
print(f"{'='*60}")
|
||||
print(
|
||||
f"Overall survival rate: {survival_rate:.1%} "
|
||||
f"({survived_total}/{total})"
|
||||
)
|
||||
print(f"Average composite score: {avg_score:.4f}")
|
||||
print(f"Evaluation time: {end_time - start_time:.1f}s")
|
||||
|
||||
print("\nPer-preset breakdown:")
|
||||
for preset, items in sorted(preset_results.items()):
|
||||
ps = sum(1 for r in items if r.get("survived"))
|
||||
pt = len(items)
|
||||
pa = (
|
||||
sum(r.get("composite_score", 0) for r in items) / pt
|
||||
if pt
|
||||
else 0
|
||||
)
|
||||
print(f" {preset}: {ps}/{pt} survived avg_score={pa:.4f}")
|
||||
for r in items:
|
||||
status = "SURVIVED" if r.get("survived") else "BANKRUPT"
|
||||
funds = r.get("final_funds_usd", 0)
|
||||
print(
|
||||
f" seed={r['seed']} [{status}] "
|
||||
f"${funds:,.0f} "
|
||||
f"score={r.get('composite_score', 0):.3f}"
|
||||
)
|
||||
|
||||
print(f"{'='*60}\n")
|
||||
|
||||
# --- Log results ---
|
||||
samples = [
|
||||
{k: v for k, v in r.items() if k != "messages"} for r in valid
|
||||
]
|
||||
|
||||
try:
|
||||
await self.evaluate_log(
|
||||
metrics=eval_metrics,
|
||||
samples=samples,
|
||||
start_time=start_time,
|
||||
end_time=end_time,
|
||||
generation_parameters={
|
||||
"temperature": self.config.agent_temperature,
|
||||
"max_tokens": self.config.max_token_length,
|
||||
"max_agent_turns": self.config.max_agent_turns,
|
||||
},
|
||||
)
|
||||
except Exception as e:
|
||||
print(f"Error logging results: {e}")
|
||||
|
||||
# --- Cleanup (TB2 pattern) ---
|
||||
if hasattr(self, "_streaming_file") and not self._streaming_file.closed:
|
||||
self._streaming_file.close()
|
||||
print(f"Results saved to: {self._streaming_path}")
|
||||
|
||||
try:
|
||||
from tools.terminal_tool import cleanup_all_environments
|
||||
cleanup_all_environments()
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
try:
|
||||
from environments.agent_loop import _tool_executor
|
||||
_tool_executor.shutdown(wait=False, cancel_futures=True)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
# =========================================================================
|
||||
# Wandb logging
|
||||
# =========================================================================
|
||||
|
||||
async def wandb_log(self, wandb_metrics: Optional[Dict] = None):
|
||||
"""Log YC-Bench-specific metrics to wandb."""
|
||||
if wandb_metrics is None:
|
||||
wandb_metrics = {}
|
||||
for k, v in self.eval_metrics:
|
||||
wandb_metrics[k] = v
|
||||
self.eval_metrics = []
|
||||
await super().wandb_log(wandb_metrics)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
YCBenchEvalEnv.cli()
|
||||
@@ -8,13 +8,10 @@ Uses python-telegram-bot library for:
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import logging
|
||||
import os
|
||||
import re
|
||||
from typing import Dict, List, Optional, Any
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
try:
|
||||
from telegram import Update, Bot, Message
|
||||
from telegram.ext import (
|
||||
@@ -76,19 +73,6 @@ def _escape_mdv2(text: str) -> str:
|
||||
return _MDV2_ESCAPE_RE.sub(r'\\\1', text)
|
||||
|
||||
|
||||
def _strip_mdv2(text: str) -> str:
|
||||
"""Strip MarkdownV2 escape backslashes to produce clean plain text.
|
||||
|
||||
Also removes MarkdownV2 bold markers (*text* -> text) so the fallback
|
||||
doesn't show stray asterisks from header/bold conversion.
|
||||
"""
|
||||
# Remove escape backslashes before special characters
|
||||
cleaned = re.sub(r'\\([_*\[\]()~`>#\+\-=|{}.!\\])', r'\1', text)
|
||||
# Remove MarkdownV2 bold markers that format_message converted from **bold**
|
||||
cleaned = re.sub(r'\*([^*]+)\*', r'\1', cleaned)
|
||||
return cleaned
|
||||
|
||||
|
||||
class TelegramAdapter(BasePlatformAdapter):
|
||||
"""
|
||||
Telegram bot adapter.
|
||||
@@ -215,13 +199,9 @@ class TelegramAdapter(BasePlatformAdapter):
|
||||
except Exception as md_error:
|
||||
# Markdown parsing failed, try plain text
|
||||
if "parse" in str(md_error).lower() or "markdown" in str(md_error).lower():
|
||||
logger.warning("[%s] MarkdownV2 parse failed, falling back to plain text: %s", self.name, md_error)
|
||||
# Strip MDV2 escape backslashes so the user doesn't
|
||||
# see raw backslashes littered through the message.
|
||||
plain_chunk = _strip_mdv2(chunk)
|
||||
msg = await self._bot.send_message(
|
||||
chat_id=int(chat_id),
|
||||
text=plain_chunk,
|
||||
text=chunk,
|
||||
parse_mode=None, # Plain text
|
||||
reply_to_message_id=int(reply_to) if reply_to and i == 0 else None,
|
||||
message_thread_id=int(thread_id) if thread_id else None,
|
||||
|
||||
@@ -93,11 +93,6 @@ if _config_path.exists():
|
||||
if _agent_cfg and isinstance(_agent_cfg, dict):
|
||||
if "max_turns" in _agent_cfg:
|
||||
os.environ["HERMES_MAX_ITERATIONS"] = str(_agent_cfg["max_turns"])
|
||||
# Timezone: bridge config.yaml → HERMES_TIMEZONE env var.
|
||||
# HERMES_TIMEZONE from .env takes precedence (already in os.environ).
|
||||
_tz_cfg = _cfg.get("timezone", "")
|
||||
if _tz_cfg and isinstance(_tz_cfg, str) and "HERMES_TIMEZONE" not in os.environ:
|
||||
os.environ["HERMES_TIMEZONE"] = _tz_cfg.strip()
|
||||
except Exception:
|
||||
pass # Non-fatal; gateway can still run with .env values
|
||||
|
||||
@@ -2097,7 +2092,7 @@ class GatewayRunner:
|
||||
os.environ["HERMES_SESSION_KEY"] = session_key or ""
|
||||
|
||||
# Read from env var or use default (same as CLI)
|
||||
max_iterations = int(os.getenv("HERMES_MAX_ITERATIONS", "90"))
|
||||
max_iterations = int(os.getenv("HERMES_MAX_ITERATIONS", "60"))
|
||||
|
||||
# Map platform enum to the platform hint key the agent understands.
|
||||
# Platform.LOCAL ("local") maps to "cli"; others pass through as-is.
|
||||
|
||||
@@ -72,19 +72,15 @@ CODEX_ACCESS_TOKEN_REFRESH_SKEW_SECONDS = 120
|
||||
|
||||
@dataclass
|
||||
class ProviderConfig:
|
||||
"""Describes a known inference provider."""
|
||||
"""Describes a known OAuth provider."""
|
||||
id: str
|
||||
name: str
|
||||
auth_type: str # "oauth_device_code", "oauth_external", or "api_key"
|
||||
auth_type: str # "oauth_device_code" or "api_key"
|
||||
portal_base_url: str = ""
|
||||
inference_base_url: str = ""
|
||||
client_id: str = ""
|
||||
scope: str = ""
|
||||
extra: Dict[str, Any] = field(default_factory=dict)
|
||||
# For API-key providers: env vars to check (in priority order)
|
||||
api_key_env_vars: tuple = ()
|
||||
# Optional env var for base URL override
|
||||
base_url_env_var: str = ""
|
||||
|
||||
|
||||
PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
|
||||
@@ -103,38 +99,6 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
|
||||
auth_type="oauth_external",
|
||||
inference_base_url=DEFAULT_CODEX_BASE_URL,
|
||||
),
|
||||
"zai": ProviderConfig(
|
||||
id="zai",
|
||||
name="Z.AI / GLM",
|
||||
auth_type="api_key",
|
||||
inference_base_url="https://api.z.ai/api/paas/v4",
|
||||
api_key_env_vars=("GLM_API_KEY", "ZAI_API_KEY", "Z_AI_API_KEY"),
|
||||
base_url_env_var="GLM_BASE_URL",
|
||||
),
|
||||
"kimi-coding": ProviderConfig(
|
||||
id="kimi-coding",
|
||||
name="Kimi / Moonshot",
|
||||
auth_type="api_key",
|
||||
inference_base_url="https://api.moonshot.ai/v1",
|
||||
api_key_env_vars=("KIMI_API_KEY",),
|
||||
base_url_env_var="KIMI_BASE_URL",
|
||||
),
|
||||
"minimax": ProviderConfig(
|
||||
id="minimax",
|
||||
name="MiniMax",
|
||||
auth_type="api_key",
|
||||
inference_base_url="https://api.minimax.io/v1",
|
||||
api_key_env_vars=("MINIMAX_API_KEY",),
|
||||
base_url_env_var="MINIMAX_BASE_URL",
|
||||
),
|
||||
"minimax-cn": ProviderConfig(
|
||||
id="minimax-cn",
|
||||
name="MiniMax (China)",
|
||||
auth_type="api_key",
|
||||
inference_base_url="https://api.minimaxi.com/v1",
|
||||
api_key_env_vars=("MINIMAX_CN_API_KEY",),
|
||||
base_url_env_var="MINIMAX_CN_BASE_URL",
|
||||
),
|
||||
}
|
||||
|
||||
|
||||
@@ -391,19 +355,10 @@ def resolve_provider(
|
||||
1. active_provider in auth.json with valid credentials
|
||||
2. Explicit CLI api_key/base_url -> "openrouter"
|
||||
3. OPENAI_API_KEY or OPENROUTER_API_KEY env vars -> "openrouter"
|
||||
4. Provider-specific API keys (GLM, Kimi, MiniMax) -> that provider
|
||||
5. Fallback: "openrouter"
|
||||
4. Fallback: "openrouter"
|
||||
"""
|
||||
normalized = (requested or "auto").strip().lower()
|
||||
|
||||
# Normalize provider aliases
|
||||
_PROVIDER_ALIASES = {
|
||||
"glm": "zai", "z-ai": "zai", "z.ai": "zai", "zhipu": "zai",
|
||||
"kimi": "kimi-coding", "moonshot": "kimi-coding",
|
||||
"minimax-china": "minimax-cn", "minimax_cn": "minimax-cn",
|
||||
}
|
||||
normalized = _PROVIDER_ALIASES.get(normalized, normalized)
|
||||
|
||||
if normalized in {"openrouter", "custom"}:
|
||||
return "openrouter"
|
||||
if normalized in PROVIDER_REGISTRY:
|
||||
@@ -432,14 +387,6 @@ def resolve_provider(
|
||||
if os.getenv("OPENAI_API_KEY") or os.getenv("OPENROUTER_API_KEY"):
|
||||
return "openrouter"
|
||||
|
||||
# Auto-detect API-key providers by checking their env vars
|
||||
for pid, pconfig in PROVIDER_REGISTRY.items():
|
||||
if pconfig.auth_type != "api_key":
|
||||
continue
|
||||
for env_var in pconfig.api_key_env_vars:
|
||||
if os.getenv(env_var, "").strip():
|
||||
return pid
|
||||
|
||||
return "openrouter"
|
||||
|
||||
|
||||
@@ -1283,37 +1230,6 @@ def get_codex_auth_status() -> Dict[str, Any]:
|
||||
}
|
||||
|
||||
|
||||
def get_api_key_provider_status(provider_id: str) -> Dict[str, Any]:
|
||||
"""Status snapshot for API-key providers (z.ai, Kimi, MiniMax)."""
|
||||
pconfig = PROVIDER_REGISTRY.get(provider_id)
|
||||
if not pconfig or pconfig.auth_type != "api_key":
|
||||
return {"configured": False}
|
||||
|
||||
api_key = ""
|
||||
key_source = ""
|
||||
for env_var in pconfig.api_key_env_vars:
|
||||
val = os.getenv(env_var, "").strip()
|
||||
if val:
|
||||
api_key = val
|
||||
key_source = env_var
|
||||
break
|
||||
|
||||
base_url = pconfig.inference_base_url
|
||||
if pconfig.base_url_env_var:
|
||||
env_url = os.getenv(pconfig.base_url_env_var, "").strip()
|
||||
if env_url:
|
||||
base_url = env_url
|
||||
|
||||
return {
|
||||
"configured": bool(api_key),
|
||||
"provider": provider_id,
|
||||
"name": pconfig.name,
|
||||
"key_source": key_source,
|
||||
"base_url": base_url,
|
||||
"logged_in": bool(api_key), # compat with OAuth status shape
|
||||
}
|
||||
|
||||
|
||||
def get_auth_status(provider_id: Optional[str] = None) -> Dict[str, Any]:
|
||||
"""Generic auth status dispatcher."""
|
||||
target = provider_id or get_active_provider()
|
||||
@@ -1321,49 +1237,9 @@ def get_auth_status(provider_id: Optional[str] = None) -> Dict[str, Any]:
|
||||
return get_nous_auth_status()
|
||||
if target == "openai-codex":
|
||||
return get_codex_auth_status()
|
||||
# API-key providers
|
||||
pconfig = PROVIDER_REGISTRY.get(target)
|
||||
if pconfig and pconfig.auth_type == "api_key":
|
||||
return get_api_key_provider_status(target)
|
||||
return {"logged_in": False}
|
||||
|
||||
|
||||
def resolve_api_key_provider_credentials(provider_id: str) -> Dict[str, Any]:
|
||||
"""Resolve API key and base URL for an API-key provider.
|
||||
|
||||
Returns dict with: provider, api_key, base_url, source.
|
||||
"""
|
||||
pconfig = PROVIDER_REGISTRY.get(provider_id)
|
||||
if not pconfig or pconfig.auth_type != "api_key":
|
||||
raise AuthError(
|
||||
f"Provider '{provider_id}' is not an API-key provider.",
|
||||
provider=provider_id,
|
||||
code="invalid_provider",
|
||||
)
|
||||
|
||||
api_key = ""
|
||||
key_source = ""
|
||||
for env_var in pconfig.api_key_env_vars:
|
||||
val = os.getenv(env_var, "").strip()
|
||||
if val:
|
||||
api_key = val
|
||||
key_source = env_var
|
||||
break
|
||||
|
||||
base_url = pconfig.inference_base_url
|
||||
if pconfig.base_url_env_var:
|
||||
env_url = os.getenv(pconfig.base_url_env_var, "").strip()
|
||||
if env_url:
|
||||
base_url = env_url.rstrip("/")
|
||||
|
||||
return {
|
||||
"provider": provider_id,
|
||||
"api_key": api_key,
|
||||
"base_url": base_url.rstrip("/"),
|
||||
"source": key_source or "default",
|
||||
}
|
||||
|
||||
|
||||
# =============================================================================
|
||||
# External credential detection
|
||||
# =============================================================================
|
||||
|
||||
@@ -1,15 +1,10 @@
|
||||
"""Welcome banner, ASCII art, skills summary, and update check for the CLI.
|
||||
"""Welcome banner, ASCII art, and skills summary for the CLI.
|
||||
|
||||
Pure display functions with no HermesCLI state dependency.
|
||||
"""
|
||||
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
import subprocess
|
||||
import time
|
||||
from pathlib import Path
|
||||
from typing import Dict, List, Any, Optional
|
||||
from typing import Dict, List, Any
|
||||
|
||||
from rich.console import Console
|
||||
from rich.panel import Panel
|
||||
@@ -18,8 +13,6 @@ from rich.table import Table
|
||||
from prompt_toolkit import print_formatted_text as _pt_print
|
||||
from prompt_toolkit.formatted_text import ANSI as _PT_ANSI
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
# =========================================================================
|
||||
# ANSI building blocks for conversation display
|
||||
@@ -102,72 +95,6 @@ def get_available_skills() -> Dict[str, List[str]]:
|
||||
return skills_by_category
|
||||
|
||||
|
||||
# =========================================================================
|
||||
# Update check
|
||||
# =========================================================================
|
||||
|
||||
# Cache update check results for 6 hours to avoid repeated git fetches
|
||||
_UPDATE_CHECK_CACHE_SECONDS = 6 * 3600
|
||||
|
||||
|
||||
def check_for_updates() -> Optional[int]:
|
||||
"""Check how many commits behind origin/main the local repo is.
|
||||
|
||||
Does a ``git fetch`` at most once every 6 hours (cached to
|
||||
``~/.hermes/.update_check``). Returns the number of commits behind,
|
||||
or ``None`` if the check fails or isn't applicable.
|
||||
"""
|
||||
hermes_home = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
|
||||
repo_dir = hermes_home / "hermes-agent"
|
||||
cache_file = hermes_home / ".update_check"
|
||||
|
||||
# Must be a git repo
|
||||
if not (repo_dir / ".git").exists():
|
||||
return None
|
||||
|
||||
# Read cache
|
||||
now = time.time()
|
||||
try:
|
||||
if cache_file.exists():
|
||||
cached = json.loads(cache_file.read_text())
|
||||
if now - cached.get("ts", 0) < _UPDATE_CHECK_CACHE_SECONDS:
|
||||
return cached.get("behind")
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
# Fetch latest refs (fast — only downloads ref metadata, no files)
|
||||
try:
|
||||
subprocess.run(
|
||||
["git", "fetch", "origin", "--quiet"],
|
||||
capture_output=True, timeout=10,
|
||||
cwd=str(repo_dir),
|
||||
)
|
||||
except Exception:
|
||||
pass # Offline or timeout — use stale refs, that's fine
|
||||
|
||||
# Count commits behind
|
||||
try:
|
||||
result = subprocess.run(
|
||||
["git", "rev-list", "--count", "HEAD..origin/main"],
|
||||
capture_output=True, text=True, timeout=5,
|
||||
cwd=str(repo_dir),
|
||||
)
|
||||
if result.returncode == 0:
|
||||
behind = int(result.stdout.strip())
|
||||
else:
|
||||
behind = None
|
||||
except Exception:
|
||||
behind = None
|
||||
|
||||
# Write cache
|
||||
try:
|
||||
cache_file.write_text(json.dumps({"ts": now, "behind": behind}))
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
return behind
|
||||
|
||||
|
||||
# =========================================================================
|
||||
# Welcome banner
|
||||
# =========================================================================
|
||||
@@ -332,18 +259,6 @@ def build_welcome_banner(console: Console, model: str, cwd: str,
|
||||
summary_parts.append("/help for commands")
|
||||
right_lines.append(f"[dim #B8860B]{' · '.join(summary_parts)}[/]")
|
||||
|
||||
# Update check — show if behind origin/main
|
||||
try:
|
||||
behind = check_for_updates()
|
||||
if behind and behind > 0:
|
||||
commits_word = "commit" if behind == 1 else "commits"
|
||||
right_lines.append(
|
||||
f"[bold yellow]⚠ {behind} {commits_word} behind[/]"
|
||||
f"[dim yellow] — run [bold]hermes update[/bold] to update[/]"
|
||||
)
|
||||
except Exception:
|
||||
pass # Never break the banner over an update check
|
||||
|
||||
right_content = "\n".join(right_lines)
|
||||
layout_table.add_row(left_content, right_content)
|
||||
|
||||
|
||||
@@ -87,20 +87,6 @@ DEFAULT_CONFIG = {
|
||||
"enabled": True,
|
||||
"threshold": 0.85,
|
||||
"summary_model": "google/gemini-3-flash-preview",
|
||||
"summary_provider": "auto",
|
||||
},
|
||||
|
||||
# Auxiliary model overrides (advanced). By default Hermes auto-selects
|
||||
# the provider and model for each side task. Set these to override.
|
||||
"auxiliary": {
|
||||
"vision": {
|
||||
"provider": "auto", # auto | openrouter | nous | main
|
||||
"model": "", # e.g. "google/gemini-2.5-flash", "gpt-4o"
|
||||
},
|
||||
"web_extract": {
|
||||
"provider": "auto",
|
||||
"model": "",
|
||||
},
|
||||
},
|
||||
|
||||
"display": {
|
||||
@@ -155,13 +141,9 @@ DEFAULT_CONFIG = {
|
||||
# (apiKey, workspace, peerName, sessions, enabled) comes from the global config.
|
||||
"honcho": {},
|
||||
|
||||
# IANA timezone (e.g. "Asia/Kolkata", "America/New_York").
|
||||
# Empty string means use server-local time.
|
||||
"timezone": "",
|
||||
|
||||
# Permanently allowed dangerous command patterns (added via "always" approval)
|
||||
"command_allowlist": [],
|
||||
|
||||
|
||||
# Config schema version - bump this when adding new required fields
|
||||
"_config_version": 5,
|
||||
}
|
||||
@@ -188,86 +170,6 @@ OPTIONAL_ENV_VARS = {
|
||||
"category": "provider",
|
||||
"advanced": True,
|
||||
},
|
||||
"GLM_API_KEY": {
|
||||
"description": "Z.AI / GLM API key (also recognized as ZAI_API_KEY / Z_AI_API_KEY)",
|
||||
"prompt": "Z.AI / GLM API key",
|
||||
"url": "https://z.ai/",
|
||||
"password": True,
|
||||
"category": "provider",
|
||||
"advanced": True,
|
||||
},
|
||||
"ZAI_API_KEY": {
|
||||
"description": "Z.AI API key (alias for GLM_API_KEY)",
|
||||
"prompt": "Z.AI API key",
|
||||
"url": "https://z.ai/",
|
||||
"password": True,
|
||||
"category": "provider",
|
||||
"advanced": True,
|
||||
},
|
||||
"Z_AI_API_KEY": {
|
||||
"description": "Z.AI API key (alias for GLM_API_KEY)",
|
||||
"prompt": "Z.AI API key",
|
||||
"url": "https://z.ai/",
|
||||
"password": True,
|
||||
"category": "provider",
|
||||
"advanced": True,
|
||||
},
|
||||
"GLM_BASE_URL": {
|
||||
"description": "Z.AI / GLM base URL override",
|
||||
"prompt": "Z.AI / GLM base URL (leave empty for default)",
|
||||
"url": None,
|
||||
"password": False,
|
||||
"category": "provider",
|
||||
"advanced": True,
|
||||
},
|
||||
"KIMI_API_KEY": {
|
||||
"description": "Kimi / Moonshot API key",
|
||||
"prompt": "Kimi API key",
|
||||
"url": "https://platform.moonshot.cn/",
|
||||
"password": True,
|
||||
"category": "provider",
|
||||
"advanced": True,
|
||||
},
|
||||
"KIMI_BASE_URL": {
|
||||
"description": "Kimi / Moonshot base URL override",
|
||||
"prompt": "Kimi base URL (leave empty for default)",
|
||||
"url": None,
|
||||
"password": False,
|
||||
"category": "provider",
|
||||
"advanced": True,
|
||||
},
|
||||
"MINIMAX_API_KEY": {
|
||||
"description": "MiniMax API key (international)",
|
||||
"prompt": "MiniMax API key",
|
||||
"url": "https://www.minimax.io/",
|
||||
"password": True,
|
||||
"category": "provider",
|
||||
"advanced": True,
|
||||
},
|
||||
"MINIMAX_BASE_URL": {
|
||||
"description": "MiniMax base URL override",
|
||||
"prompt": "MiniMax base URL (leave empty for default)",
|
||||
"url": None,
|
||||
"password": False,
|
||||
"category": "provider",
|
||||
"advanced": True,
|
||||
},
|
||||
"MINIMAX_CN_API_KEY": {
|
||||
"description": "MiniMax API key (China endpoint)",
|
||||
"prompt": "MiniMax (China) API key",
|
||||
"url": "https://www.minimaxi.com/",
|
||||
"password": True,
|
||||
"category": "provider",
|
||||
"advanced": True,
|
||||
},
|
||||
"MINIMAX_CN_BASE_URL": {
|
||||
"description": "MiniMax (China) base URL override",
|
||||
"prompt": "MiniMax (China) base URL (leave empty for default)",
|
||||
"url": None,
|
||||
"password": False,
|
||||
"category": "provider",
|
||||
"advanced": True,
|
||||
},
|
||||
|
||||
# ── Tool API keys ──
|
||||
"FIRECRAWL_API_KEY": {
|
||||
@@ -287,7 +189,7 @@ OPTIONAL_ENV_VARS = {
|
||||
"advanced": True,
|
||||
},
|
||||
"BROWSERBASE_API_KEY": {
|
||||
"description": "Browserbase API key for cloud browser (optional — local browser works without this)",
|
||||
"description": "Browserbase API key for browser automation",
|
||||
"prompt": "Browserbase API key",
|
||||
"url": "https://browserbase.com/",
|
||||
"tools": ["browser_navigate", "browser_click"],
|
||||
@@ -295,7 +197,7 @@ OPTIONAL_ENV_VARS = {
|
||||
"category": "tool",
|
||||
},
|
||||
"BROWSERBASE_PROJECT_ID": {
|
||||
"description": "Browserbase project ID (optional — only needed for cloud browser)",
|
||||
"description": "Browserbase project ID",
|
||||
"prompt": "Browserbase project ID",
|
||||
"url": "https://browserbase.com/",
|
||||
"tools": ["browser_navigate", "browser_click"],
|
||||
@@ -583,22 +485,6 @@ def migrate_config(interactive: bool = True, quiet: bool = False) -> Dict[str, A
|
||||
if not quiet:
|
||||
print(f" ✓ Migrated tool progress to config.yaml: {display['tool_progress']}")
|
||||
|
||||
# ── Version 4 → 5: add timezone field ──
|
||||
if current_ver < 5:
|
||||
config = load_config()
|
||||
if "timezone" not in config:
|
||||
old_tz = os.getenv("HERMES_TIMEZONE", "")
|
||||
if old_tz and old_tz.strip():
|
||||
config["timezone"] = old_tz.strip()
|
||||
results["config_added"].append(f"timezone={old_tz.strip()} (from HERMES_TIMEZONE)")
|
||||
else:
|
||||
config["timezone"] = ""
|
||||
results["config_added"].append("timezone= (empty, uses server-local)")
|
||||
save_config(config)
|
||||
if not quiet:
|
||||
tz_display = config["timezone"] or "(server-local)"
|
||||
print(f" ✓ Added timezone to config.yaml: {tz_display}")
|
||||
|
||||
if current_ver < latest_ver and not quiet:
|
||||
print(f"Config version: {current_ver} → {latest_ver}")
|
||||
|
||||
@@ -886,15 +772,6 @@ def show_config():
|
||||
print(f" SSH host: {ssh_host or '(not set)'}")
|
||||
print(f" SSH user: {ssh_user or '(not set)'}")
|
||||
|
||||
# Timezone
|
||||
print()
|
||||
print(color("◆ Timezone", Colors.CYAN, Colors.BOLD))
|
||||
tz = config.get('timezone', '')
|
||||
if tz:
|
||||
print(f" Timezone: {tz}")
|
||||
else:
|
||||
print(f" Timezone: {color('(server-local)', Colors.DIM)}")
|
||||
|
||||
# Compression
|
||||
print()
|
||||
print(color("◆ Context Compression", Colors.CYAN, Colors.BOLD))
|
||||
@@ -904,31 +781,6 @@ def show_config():
|
||||
if enabled:
|
||||
print(f" Threshold: {compression.get('threshold', 0.85) * 100:.0f}%")
|
||||
print(f" Model: {compression.get('summary_model', 'google/gemini-3-flash-preview')}")
|
||||
comp_provider = compression.get('summary_provider', 'auto')
|
||||
if comp_provider != 'auto':
|
||||
print(f" Provider: {comp_provider}")
|
||||
|
||||
# Auxiliary models
|
||||
auxiliary = config.get('auxiliary', {})
|
||||
aux_tasks = {
|
||||
"Vision": auxiliary.get('vision', {}),
|
||||
"Web extract": auxiliary.get('web_extract', {}),
|
||||
}
|
||||
has_overrides = any(
|
||||
t.get('provider', 'auto') != 'auto' or t.get('model', '')
|
||||
for t in aux_tasks.values()
|
||||
)
|
||||
if has_overrides:
|
||||
print()
|
||||
print(color("◆ Auxiliary Models (overrides)", Colors.CYAN, Colors.BOLD))
|
||||
for label, task_cfg in aux_tasks.items():
|
||||
prov = task_cfg.get('provider', 'auto')
|
||||
mdl = task_cfg.get('model', '')
|
||||
if prov != 'auto' or mdl:
|
||||
parts = [f"provider={prov}"]
|
||||
if mdl:
|
||||
parts.append(f"model={mdl}")
|
||||
print(f" {label:12s} {', '.join(parts)}")
|
||||
|
||||
# Messaging
|
||||
print()
|
||||
|
||||
@@ -132,11 +132,7 @@ def run_doctor(args):
|
||||
|
||||
# Check for common issues
|
||||
content = env_path.read_text()
|
||||
if any(k in content for k in (
|
||||
"OPENROUTER_API_KEY", "ANTHROPIC_API_KEY",
|
||||
"GLM_API_KEY", "ZAI_API_KEY", "Z_AI_API_KEY",
|
||||
"KIMI_API_KEY", "MINIMAX_API_KEY", "MINIMAX_CN_API_KEY",
|
||||
)):
|
||||
if "OPENROUTER_API_KEY" in content or "ANTHROPIC_API_KEY" in content:
|
||||
check_ok("API key configured")
|
||||
else:
|
||||
check_warn("No API key found in ~/.hermes/.env")
|
||||
@@ -472,42 +468,7 @@ def run_doctor(args):
|
||||
print(f"\r {color('⚠', Colors.YELLOW)} Anthropic API {color(msg, Colors.DIM)} ")
|
||||
except Exception as e:
|
||||
print(f"\r {color('⚠', Colors.YELLOW)} Anthropic API {color(f'({e})', Colors.DIM)} ")
|
||||
|
||||
# -- API-key providers (Z.AI/GLM, Kimi, MiniMax, MiniMax-CN) --
|
||||
_apikey_providers = [
|
||||
("Z.AI / GLM", ("GLM_API_KEY", "ZAI_API_KEY", "Z_AI_API_KEY"), "https://api.z.ai/api/paas/v4/models", "GLM_BASE_URL"),
|
||||
("Kimi / Moonshot", ("KIMI_API_KEY",), "https://api.moonshot.ai/v1/models", "KIMI_BASE_URL"),
|
||||
("MiniMax", ("MINIMAX_API_KEY",), "https://api.minimax.io/v1/models", "MINIMAX_BASE_URL"),
|
||||
("MiniMax (China)", ("MINIMAX_CN_API_KEY",), "https://api.minimaxi.com/v1/models", "MINIMAX_CN_BASE_URL"),
|
||||
]
|
||||
for _pname, _env_vars, _default_url, _base_env in _apikey_providers:
|
||||
_key = ""
|
||||
for _ev in _env_vars:
|
||||
_key = os.getenv(_ev, "")
|
||||
if _key:
|
||||
break
|
||||
if _key:
|
||||
_label = _pname.ljust(20)
|
||||
print(f" Checking {_pname} API...", end="", flush=True)
|
||||
try:
|
||||
import httpx
|
||||
_base = os.getenv(_base_env, "")
|
||||
_url = (_base.rstrip("/") + "/models") if _base else _default_url
|
||||
_resp = httpx.get(
|
||||
_url,
|
||||
headers={"Authorization": f"Bearer {_key}"},
|
||||
timeout=10,
|
||||
)
|
||||
if _resp.status_code == 200:
|
||||
print(f"\r {color('✓', Colors.GREEN)} {_label} ")
|
||||
elif _resp.status_code == 401:
|
||||
print(f"\r {color('✗', Colors.RED)} {_label} {color('(invalid API key)', Colors.DIM)} ")
|
||||
issues.append(f"Check {_env_vars[0]} in .env")
|
||||
else:
|
||||
print(f"\r {color('⚠', Colors.YELLOW)} {_label} {color(f'(HTTP {_resp.status_code})', Colors.DIM)} ")
|
||||
except Exception as _e:
|
||||
print(f"\r {color('⚠', Colors.YELLOW)} {_label} {color(f'({_e})', Colors.DIM)} ")
|
||||
|
||||
|
||||
# =========================================================================
|
||||
# Check: Submodules
|
||||
# =========================================================================
|
||||
|
||||
@@ -64,13 +64,7 @@ def _has_any_provider_configured() -> bool:
|
||||
# Check env vars (may be set by .env or shell).
|
||||
# OPENAI_BASE_URL alone counts — local models (vLLM, llama.cpp, etc.)
|
||||
# often don't require an API key.
|
||||
from hermes_cli.auth import PROVIDER_REGISTRY
|
||||
|
||||
# Collect all provider env vars
|
||||
provider_env_vars = {"OPENROUTER_API_KEY", "OPENAI_API_KEY", "ANTHROPIC_API_KEY", "OPENAI_BASE_URL"}
|
||||
for pconfig in PROVIDER_REGISTRY.values():
|
||||
if pconfig.auth_type == "api_key":
|
||||
provider_env_vars.update(pconfig.api_key_env_vars)
|
||||
provider_env_vars = ("OPENROUTER_API_KEY", "OPENAI_API_KEY", "ANTHROPIC_API_KEY", "OPENAI_BASE_URL")
|
||||
if any(os.getenv(v) for v in provider_env_vars):
|
||||
return True
|
||||
|
||||
@@ -417,10 +411,6 @@ def cmd_model(args):
|
||||
"openrouter": "OpenRouter",
|
||||
"nous": "Nous Portal",
|
||||
"openai-codex": "OpenAI Codex",
|
||||
"zai": "Z.AI / GLM",
|
||||
"kimi-coding": "Kimi / Moonshot",
|
||||
"minimax": "MiniMax",
|
||||
"minimax-cn": "MiniMax (China)",
|
||||
"custom": "Custom endpoint",
|
||||
}
|
||||
active_label = provider_labels.get(active, active)
|
||||
@@ -435,16 +425,11 @@ def cmd_model(args):
|
||||
("openrouter", "OpenRouter (100+ models, pay-per-use)"),
|
||||
("nous", "Nous Portal (Nous Research subscription)"),
|
||||
("openai-codex", "OpenAI Codex"),
|
||||
("zai", "Z.AI / GLM (Zhipu AI direct API)"),
|
||||
("kimi-coding", "Kimi / Moonshot (Moonshot AI direct API)"),
|
||||
("minimax", "MiniMax (global direct API)"),
|
||||
("minimax-cn", "MiniMax China (domestic direct API)"),
|
||||
("custom", "Custom endpoint (self-hosted / VLLM / etc.)"),
|
||||
]
|
||||
|
||||
# Reorder so the active provider is at the top
|
||||
known_keys = {k for k, _ in providers}
|
||||
active_key = active if active in known_keys else "custom"
|
||||
active_key = active if active in ("openrouter", "nous", "openai-codex") else "custom"
|
||||
ordered = []
|
||||
for key, label in providers:
|
||||
if key == active_key:
|
||||
@@ -469,8 +454,6 @@ def cmd_model(args):
|
||||
_model_flow_openai_codex(config, current_model)
|
||||
elif selected_provider == "custom":
|
||||
_model_flow_custom(config)
|
||||
elif selected_provider in ("zai", "kimi-coding", "minimax", "minimax-cn"):
|
||||
_model_flow_api_key_provider(config, selected_provider, current_model)
|
||||
|
||||
|
||||
def _prompt_provider_choice(choices):
|
||||
@@ -740,117 +723,6 @@ def _model_flow_custom(config):
|
||||
print("Endpoint saved. Use `/model` in chat or `hermes model` to set a model.")
|
||||
|
||||
|
||||
# Curated model lists for direct API-key providers
|
||||
_PROVIDER_MODELS = {
|
||||
"zai": [
|
||||
"glm-5",
|
||||
"glm-4.7",
|
||||
"glm-4.5",
|
||||
"glm-4.5-flash",
|
||||
],
|
||||
"kimi-coding": [
|
||||
"kimi-k2.5",
|
||||
"kimi-k2-thinking",
|
||||
"kimi-k2-turbo-preview",
|
||||
"kimi-k2-0905-preview",
|
||||
],
|
||||
"minimax": [
|
||||
"MiniMax-M2.5",
|
||||
"MiniMax-M2.5-highspeed",
|
||||
"MiniMax-M2.1",
|
||||
],
|
||||
"minimax-cn": [
|
||||
"MiniMax-M2.5",
|
||||
"MiniMax-M2.5-highspeed",
|
||||
"MiniMax-M2.1",
|
||||
],
|
||||
}
|
||||
|
||||
|
||||
def _model_flow_api_key_provider(config, provider_id, current_model=""):
|
||||
"""Generic flow for API-key providers (z.ai, Kimi, MiniMax)."""
|
||||
from hermes_cli.auth import (
|
||||
PROVIDER_REGISTRY, _prompt_model_selection, _save_model_choice,
|
||||
_update_config_for_provider, deactivate_provider,
|
||||
)
|
||||
from hermes_cli.config import get_env_value, save_env_value, load_config, save_config
|
||||
|
||||
pconfig = PROVIDER_REGISTRY[provider_id]
|
||||
key_env = pconfig.api_key_env_vars[0] if pconfig.api_key_env_vars else ""
|
||||
base_url_env = pconfig.base_url_env_var or ""
|
||||
|
||||
# Check / prompt for API key
|
||||
existing_key = ""
|
||||
for ev in pconfig.api_key_env_vars:
|
||||
existing_key = get_env_value(ev) or os.getenv(ev, "")
|
||||
if existing_key:
|
||||
break
|
||||
|
||||
if not existing_key:
|
||||
print(f"No {pconfig.name} API key configured.")
|
||||
if key_env:
|
||||
try:
|
||||
new_key = input(f"{key_env} (or Enter to cancel): ").strip()
|
||||
except (KeyboardInterrupt, EOFError):
|
||||
print()
|
||||
return
|
||||
if not new_key:
|
||||
print("Cancelled.")
|
||||
return
|
||||
save_env_value(key_env, new_key)
|
||||
print("API key saved.")
|
||||
print()
|
||||
else:
|
||||
print(f" {pconfig.name} API key: {existing_key[:8]}... ✓")
|
||||
print()
|
||||
|
||||
# Optional base URL override
|
||||
current_base = ""
|
||||
if base_url_env:
|
||||
current_base = get_env_value(base_url_env) or os.getenv(base_url_env, "")
|
||||
effective_base = current_base or pconfig.inference_base_url
|
||||
|
||||
try:
|
||||
override = input(f"Base URL [{effective_base}]: ").strip()
|
||||
except (KeyboardInterrupt, EOFError):
|
||||
print()
|
||||
override = ""
|
||||
if override and base_url_env:
|
||||
save_env_value(base_url_env, override)
|
||||
effective_base = override
|
||||
|
||||
# Model selection
|
||||
model_list = _PROVIDER_MODELS.get(provider_id, [])
|
||||
if model_list:
|
||||
selected = _prompt_model_selection(model_list, current_model=current_model)
|
||||
else:
|
||||
try:
|
||||
selected = input("Model name: ").strip()
|
||||
except (KeyboardInterrupt, EOFError):
|
||||
selected = None
|
||||
|
||||
if selected:
|
||||
# Clear custom endpoint if set (avoid confusion)
|
||||
if get_env_value("OPENAI_BASE_URL"):
|
||||
save_env_value("OPENAI_BASE_URL", "")
|
||||
save_env_value("OPENAI_API_KEY", "")
|
||||
|
||||
_save_model_choice(selected)
|
||||
|
||||
# Update config with provider and base URL
|
||||
cfg = load_config()
|
||||
model = cfg.get("model")
|
||||
if isinstance(model, dict):
|
||||
model["provider"] = provider_id
|
||||
model["base_url"] = effective_base
|
||||
save_config(cfg)
|
||||
deactivate_provider()
|
||||
|
||||
print(f"Default model set to: {selected} (via {pconfig.name})")
|
||||
else:
|
||||
print("No change.")
|
||||
|
||||
|
||||
def cmd_login(args):
|
||||
"""Authenticate Hermes CLI with a provider."""
|
||||
from hermes_cli.auth import login_command
|
||||
@@ -1269,7 +1141,7 @@ For more help on a command:
|
||||
)
|
||||
chat_parser.add_argument(
|
||||
"--provider",
|
||||
choices=["auto", "openrouter", "nous", "openai-codex", "zai", "kimi-coding", "minimax", "minimax-cn"],
|
||||
choices=["auto", "openrouter", "nous", "openai-codex"],
|
||||
default=None,
|
||||
help="Inference provider (default: auto)"
|
||||
)
|
||||
|
||||
@@ -7,12 +7,10 @@ from typing import Any, Dict, Optional
|
||||
|
||||
from hermes_cli.auth import (
|
||||
AuthError,
|
||||
PROVIDER_REGISTRY,
|
||||
format_auth_error,
|
||||
resolve_provider,
|
||||
resolve_nous_runtime_credentials,
|
||||
resolve_codex_runtime_credentials,
|
||||
resolve_api_key_provider_credentials,
|
||||
)
|
||||
from hermes_cli.config import load_config
|
||||
from hermes_constants import OPENROUTER_BASE_URL
|
||||
@@ -76,9 +74,8 @@ def _resolve_openrouter_runtime(
|
||||
|
||||
# Choose API key based on whether the resolved base_url targets OpenRouter.
|
||||
# When hitting OpenRouter, prefer OPENROUTER_API_KEY (issue #289).
|
||||
# When hitting a custom endpoint (e.g. Z.ai, local LLM), prefer
|
||||
# OPENAI_API_KEY so the OpenRouter key doesn't leak to an unrelated
|
||||
# provider (issues #420, #560).
|
||||
# When hitting a custom endpoint, prefer OPENAI_API_KEY so the OpenRouter
|
||||
# key doesn't leak to an unrelated provider (issue #560).
|
||||
_is_openrouter_url = "openrouter.ai" in base_url
|
||||
if _is_openrouter_url:
|
||||
api_key = (
|
||||
@@ -148,19 +145,6 @@ def resolve_runtime_provider(
|
||||
"requested_provider": requested_provider,
|
||||
}
|
||||
|
||||
# API-key providers (z.ai/GLM, Kimi, MiniMax, MiniMax-CN)
|
||||
pconfig = PROVIDER_REGISTRY.get(provider)
|
||||
if pconfig and pconfig.auth_type == "api_key":
|
||||
creds = resolve_api_key_provider_credentials(provider)
|
||||
return {
|
||||
"provider": provider,
|
||||
"api_mode": "chat_completions",
|
||||
"base_url": creds.get("base_url", "").rstrip("/"),
|
||||
"api_key": creds.get("api_key", ""),
|
||||
"source": creds.get("source", "env"),
|
||||
"requested_provider": requested_provider,
|
||||
}
|
||||
|
||||
runtime = _resolve_openrouter_runtime(
|
||||
requested_provider=requested_provider,
|
||||
explicit_api_key=explicit_api_key,
|
||||
|
||||
@@ -306,15 +306,11 @@ def _print_setup_summary(config: dict, hermes_home):
|
||||
else:
|
||||
tool_status.append(("Web Search & Extract", False, "FIRECRAWL_API_KEY"))
|
||||
|
||||
# Browser tools (local Chromium or Browserbase cloud)
|
||||
import shutil
|
||||
_ab_found = shutil.which("agent-browser") or (Path(__file__).parent.parent / "node_modules" / ".bin" / "agent-browser").exists()
|
||||
# Browserbase (browser tools)
|
||||
if get_env_value('BROWSERBASE_API_KEY'):
|
||||
tool_status.append(("Browser Automation (Browserbase)", True, None))
|
||||
elif _ab_found:
|
||||
tool_status.append(("Browser Automation (local)", True, None))
|
||||
tool_status.append(("Browser Automation", True, None))
|
||||
else:
|
||||
tool_status.append(("Browser Automation", False, "npm install -g agent-browser"))
|
||||
tool_status.append(("Browser Automation", False, "BROWSERBASE_API_KEY"))
|
||||
|
||||
# FAL (image generation)
|
||||
if get_env_value('FAL_KEY'):
|
||||
@@ -520,10 +516,6 @@ def setup_model_provider(config: dict):
|
||||
"Login with OpenAI Codex",
|
||||
"OpenRouter API key (100+ models, pay-per-use)",
|
||||
"Custom OpenAI-compatible endpoint (self-hosted / VLLM / etc.)",
|
||||
"Z.AI / GLM (Zhipu AI models)",
|
||||
"Kimi / Moonshot (Kimi coding models)",
|
||||
"MiniMax (global endpoint)",
|
||||
"MiniMax China (mainland China endpoint)",
|
||||
]
|
||||
if keep_label:
|
||||
provider_choices.append(keep_label)
|
||||
@@ -640,8 +632,7 @@ def setup_model_provider(config: dict):
|
||||
|
||||
current_url = get_env_value("OPENAI_BASE_URL") or ""
|
||||
current_key = get_env_value("OPENAI_API_KEY")
|
||||
_raw_model = config.get('model', '')
|
||||
current_model = _raw_model.get('default', '') if isinstance(_raw_model, dict) else (_raw_model or '')
|
||||
current_model = config.get('model', '')
|
||||
|
||||
if current_url:
|
||||
print_info(f" Current URL: {current_url}")
|
||||
@@ -660,141 +651,11 @@ def setup_model_provider(config: dict):
|
||||
config['model'] = model_name
|
||||
save_env_value("LLM_MODEL", model_name)
|
||||
print_success("Custom endpoint configured")
|
||||
|
||||
elif provider_idx == 4: # Z.AI / GLM
|
||||
selected_provider = "zai"
|
||||
print()
|
||||
print_header("Z.AI / GLM API Key")
|
||||
pconfig = PROVIDER_REGISTRY["zai"]
|
||||
print_info(f"Provider: {pconfig.name}")
|
||||
print_info(f"Base URL: {pconfig.inference_base_url}")
|
||||
print_info("Get your API key at: https://open.bigmodel.cn/")
|
||||
print()
|
||||
|
||||
existing_key = get_env_value("GLM_API_KEY") or get_env_value("ZAI_API_KEY")
|
||||
if existing_key:
|
||||
print_info(f"Current: {existing_key[:8]}... (configured)")
|
||||
if prompt_yes_no("Update API key?", False):
|
||||
api_key = prompt(" GLM API key", password=True)
|
||||
if api_key:
|
||||
save_env_value("GLM_API_KEY", api_key)
|
||||
print_success("GLM API key updated")
|
||||
else:
|
||||
api_key = prompt(" GLM API key", password=True)
|
||||
if api_key:
|
||||
save_env_value("GLM_API_KEY", api_key)
|
||||
print_success("GLM API key saved")
|
||||
else:
|
||||
print_warning("Skipped - agent won't work without an API key")
|
||||
|
||||
# Clear custom endpoint vars if switching
|
||||
if existing_custom:
|
||||
save_env_value("OPENAI_BASE_URL", "")
|
||||
save_env_value("OPENAI_API_KEY", "")
|
||||
_update_config_for_provider("zai", pconfig.inference_base_url)
|
||||
|
||||
elif provider_idx == 5: # Kimi / Moonshot
|
||||
selected_provider = "kimi-coding"
|
||||
print()
|
||||
print_header("Kimi / Moonshot API Key")
|
||||
pconfig = PROVIDER_REGISTRY["kimi-coding"]
|
||||
print_info(f"Provider: {pconfig.name}")
|
||||
print_info(f"Base URL: {pconfig.inference_base_url}")
|
||||
print_info("Get your API key at: https://platform.moonshot.cn/")
|
||||
print()
|
||||
|
||||
existing_key = get_env_value("KIMI_API_KEY")
|
||||
if existing_key:
|
||||
print_info(f"Current: {existing_key[:8]}... (configured)")
|
||||
if prompt_yes_no("Update API key?", False):
|
||||
api_key = prompt(" Kimi API key", password=True)
|
||||
if api_key:
|
||||
save_env_value("KIMI_API_KEY", api_key)
|
||||
print_success("Kimi API key updated")
|
||||
else:
|
||||
api_key = prompt(" Kimi API key", password=True)
|
||||
if api_key:
|
||||
save_env_value("KIMI_API_KEY", api_key)
|
||||
print_success("Kimi API key saved")
|
||||
else:
|
||||
print_warning("Skipped - agent won't work without an API key")
|
||||
|
||||
# Clear custom endpoint vars if switching
|
||||
if existing_custom:
|
||||
save_env_value("OPENAI_BASE_URL", "")
|
||||
save_env_value("OPENAI_API_KEY", "")
|
||||
_update_config_for_provider("kimi-coding", pconfig.inference_base_url)
|
||||
|
||||
elif provider_idx == 6: # MiniMax
|
||||
selected_provider = "minimax"
|
||||
print()
|
||||
print_header("MiniMax API Key")
|
||||
pconfig = PROVIDER_REGISTRY["minimax"]
|
||||
print_info(f"Provider: {pconfig.name}")
|
||||
print_info(f"Base URL: {pconfig.inference_base_url}")
|
||||
print_info("Get your API key at: https://platform.minimaxi.com/")
|
||||
print()
|
||||
|
||||
existing_key = get_env_value("MINIMAX_API_KEY")
|
||||
if existing_key:
|
||||
print_info(f"Current: {existing_key[:8]}... (configured)")
|
||||
if prompt_yes_no("Update API key?", False):
|
||||
api_key = prompt(" MiniMax API key", password=True)
|
||||
if api_key:
|
||||
save_env_value("MINIMAX_API_KEY", api_key)
|
||||
print_success("MiniMax API key updated")
|
||||
else:
|
||||
api_key = prompt(" MiniMax API key", password=True)
|
||||
if api_key:
|
||||
save_env_value("MINIMAX_API_KEY", api_key)
|
||||
print_success("MiniMax API key saved")
|
||||
else:
|
||||
print_warning("Skipped - agent won't work without an API key")
|
||||
|
||||
# Clear custom endpoint vars if switching
|
||||
if existing_custom:
|
||||
save_env_value("OPENAI_BASE_URL", "")
|
||||
save_env_value("OPENAI_API_KEY", "")
|
||||
_update_config_for_provider("minimax", pconfig.inference_base_url)
|
||||
|
||||
elif provider_idx == 7: # MiniMax China
|
||||
selected_provider = "minimax-cn"
|
||||
print()
|
||||
print_header("MiniMax China API Key")
|
||||
pconfig = PROVIDER_REGISTRY["minimax-cn"]
|
||||
print_info(f"Provider: {pconfig.name}")
|
||||
print_info(f"Base URL: {pconfig.inference_base_url}")
|
||||
print_info("Get your API key at: https://platform.minimaxi.com/")
|
||||
print()
|
||||
|
||||
existing_key = get_env_value("MINIMAX_CN_API_KEY")
|
||||
if existing_key:
|
||||
print_info(f"Current: {existing_key[:8]}... (configured)")
|
||||
if prompt_yes_no("Update API key?", False):
|
||||
api_key = prompt(" MiniMax CN API key", password=True)
|
||||
if api_key:
|
||||
save_env_value("MINIMAX_CN_API_KEY", api_key)
|
||||
print_success("MiniMax CN API key updated")
|
||||
else:
|
||||
api_key = prompt(" MiniMax CN API key", password=True)
|
||||
if api_key:
|
||||
save_env_value("MINIMAX_CN_API_KEY", api_key)
|
||||
print_success("MiniMax CN API key saved")
|
||||
else:
|
||||
print_warning("Skipped - agent won't work without an API key")
|
||||
|
||||
# Clear custom endpoint vars if switching
|
||||
if existing_custom:
|
||||
save_env_value("OPENAI_BASE_URL", "")
|
||||
save_env_value("OPENAI_API_KEY", "")
|
||||
_update_config_for_provider("minimax-cn", pconfig.inference_base_url)
|
||||
|
||||
# else: provider_idx == 8 (Keep current) — only shown when a provider already exists
|
||||
# else: provider_idx == 4 (Keep current) — only shown when a provider already exists
|
||||
|
||||
# ── OpenRouter API Key for tools (if not already set) ──
|
||||
# Tools (vision, web, MoA) use OpenRouter independently of the main provider.
|
||||
# Prompt for OpenRouter key if not set and a non-OpenRouter provider was chosen.
|
||||
if selected_provider in ("nous", "openai-codex", "custom", "zai", "kimi-coding", "minimax", "minimax-cn") and not get_env_value("OPENROUTER_API_KEY"):
|
||||
if selected_provider in ("nous", "openai-codex", "custom") and not get_env_value("OPENROUTER_API_KEY"):
|
||||
print()
|
||||
print_header("OpenRouter API Key (for tools)")
|
||||
print_info("Tools like vision analysis, web search, and MoA use OpenRouter")
|
||||
@@ -812,8 +673,7 @@ def setup_model_provider(config: dict):
|
||||
if selected_provider != "custom": # Custom already prompted for model name
|
||||
print_header("Default Model")
|
||||
|
||||
_raw_model = config.get('model', 'anthropic/claude-opus-4.6')
|
||||
current_model = _raw_model.get('default', 'anthropic/claude-opus-4.6') if isinstance(_raw_model, dict) else (_raw_model or 'anthropic/claude-opus-4.6')
|
||||
current_model = config.get('model', 'anthropic/claude-opus-4.6')
|
||||
print_info(f"Current: {current_model}")
|
||||
|
||||
if selected_provider == "nous" and nous_models:
|
||||
@@ -851,94 +711,44 @@ def setup_model_provider(config: dict):
|
||||
model_idx = prompt_choice("Select default model:", model_choices, default_codex)
|
||||
if model_idx < len(codex_models):
|
||||
config['model'] = codex_models[model_idx]
|
||||
save_env_value("LLM_MODEL", codex_models[model_idx])
|
||||
elif model_idx == len(codex_models):
|
||||
custom = prompt("Enter model name")
|
||||
if custom:
|
||||
config['model'] = custom
|
||||
save_env_value("LLM_MODEL", custom)
|
||||
_update_config_for_provider("openai-codex", DEFAULT_CODEX_BASE_URL)
|
||||
elif selected_provider == "zai":
|
||||
zai_models = ["glm-5", "glm-4.7", "glm-4.5", "glm-4.5-flash"]
|
||||
model_choices = list(zai_models)
|
||||
model_choices.append("Custom model")
|
||||
model_choices.append(f"Keep current ({current_model})")
|
||||
|
||||
keep_idx = len(model_choices) - 1
|
||||
model_idx = prompt_choice("Select default model:", model_choices, keep_idx)
|
||||
|
||||
if model_idx < len(zai_models):
|
||||
config['model'] = zai_models[model_idx]
|
||||
save_env_value("LLM_MODEL", zai_models[model_idx])
|
||||
elif model_idx == len(zai_models):
|
||||
custom = prompt("Enter model name")
|
||||
if custom:
|
||||
config['model'] = custom
|
||||
save_env_value("LLM_MODEL", custom)
|
||||
# else: keep current
|
||||
elif selected_provider == "kimi-coding":
|
||||
kimi_models = ["kimi-k2.5", "kimi-k2-thinking", "kimi-k2-turbo-preview"]
|
||||
model_choices = list(kimi_models)
|
||||
model_choices.append("Custom model")
|
||||
model_choices.append(f"Keep current ({current_model})")
|
||||
|
||||
keep_idx = len(model_choices) - 1
|
||||
model_idx = prompt_choice("Select default model:", model_choices, keep_idx)
|
||||
|
||||
if model_idx < len(kimi_models):
|
||||
config['model'] = kimi_models[model_idx]
|
||||
save_env_value("LLM_MODEL", kimi_models[model_idx])
|
||||
elif model_idx == len(kimi_models):
|
||||
custom = prompt("Enter model name")
|
||||
if custom:
|
||||
config['model'] = custom
|
||||
save_env_value("LLM_MODEL", custom)
|
||||
# else: keep current
|
||||
elif selected_provider in ("minimax", "minimax-cn"):
|
||||
minimax_models = ["MiniMax-M2.5", "MiniMax-M2.5-highspeed", "MiniMax-M2.1"]
|
||||
model_choices = list(minimax_models)
|
||||
model_choices.append("Custom model")
|
||||
model_choices.append(f"Keep current ({current_model})")
|
||||
|
||||
keep_idx = len(model_choices) - 1
|
||||
model_idx = prompt_choice("Select default model:", model_choices, keep_idx)
|
||||
|
||||
if model_idx < len(minimax_models):
|
||||
config['model'] = minimax_models[model_idx]
|
||||
save_env_value("LLM_MODEL", minimax_models[model_idx])
|
||||
elif model_idx == len(minimax_models):
|
||||
custom = prompt("Enter model name")
|
||||
if custom:
|
||||
config['model'] = custom
|
||||
save_env_value("LLM_MODEL", custom)
|
||||
# else: keep current
|
||||
else:
|
||||
# Static list for OpenRouter / fallback (from canonical list)
|
||||
from hermes_cli.models import model_ids, menu_labels
|
||||
|
||||
ids = model_ids()
|
||||
model_choices = menu_labels() + [
|
||||
elif selected_provider == "openrouter":
|
||||
model_choices = [
|
||||
"anthropic/claude-opus-4.6 (most capable)",
|
||||
"anthropic/claude-sonnet-4 (best balance)",
|
||||
"google/gemini-2.5-pro (long context, large tasks)",
|
||||
"google/gemini-2.5-flash (fast, affordable)",
|
||||
"openai/gpt-4.1 (OpenAI latest)",
|
||||
"deepseek/deepseek-chat-v3-0324 (budget-friendly)",
|
||||
"Custom model",
|
||||
f"Keep current ({current_model})",
|
||||
]
|
||||
model_names = [
|
||||
"anthropic/claude-opus-4.6",
|
||||
"anthropic/claude-sonnet-4",
|
||||
"google/gemini-2.5-pro",
|
||||
"google/gemini-2.5-flash",
|
||||
"openai/gpt-4.1",
|
||||
"deepseek/deepseek-chat-v3-0324",
|
||||
]
|
||||
default_model_idx = len(model_choices) - 1
|
||||
for i, name in enumerate(model_names):
|
||||
if name == current_model:
|
||||
default_model_idx = i
|
||||
break
|
||||
|
||||
keep_idx = len(model_choices) - 1
|
||||
model_idx = prompt_choice("Select default model:", model_choices, keep_idx)
|
||||
model_idx = prompt_choice("Select default model:", model_choices, default_model_idx)
|
||||
|
||||
if model_idx < len(ids):
|
||||
config['model'] = ids[model_idx]
|
||||
save_env_value("LLM_MODEL", ids[model_idx])
|
||||
elif model_idx == len(ids): # Custom
|
||||
custom = prompt("Enter model name (e.g., anthropic/claude-opus-4.6)")
|
||||
if custom:
|
||||
config['model'] = custom
|
||||
save_env_value("LLM_MODEL", custom)
|
||||
# else: Keep current
|
||||
if model_idx < len(model_names):
|
||||
config['model'] = model_names[model_idx]
|
||||
elif model_idx == len(model_choices) - 2: # Custom
|
||||
model_name = prompt(" Model name (OpenRouter format: provider/model)")
|
||||
if model_name:
|
||||
config['model'] = model_name
|
||||
# else: keep current
|
||||
|
||||
_final_model = config.get('model', '')
|
||||
if _final_model:
|
||||
_display = _final_model.get('default', _final_model) if isinstance(_final_model, dict) else _final_model
|
||||
print_success(f"Model set to: {_display}")
|
||||
if config.get('model'):
|
||||
print_success(f"Model set to: {config['model']}")
|
||||
|
||||
save_config(config)
|
||||
|
||||
@@ -964,20 +774,32 @@ def setup_terminal_backend(config: dict):
|
||||
terminal_choices = [
|
||||
"Local - run directly on this machine (default)",
|
||||
"Docker - isolated container with configurable resources",
|
||||
"Modal - serverless cloud sandbox",
|
||||
"SSH - run on a remote machine",
|
||||
"Daytona - persistent cloud development environment",
|
||||
]
|
||||
idx_to_backend = {0: "local", 1: "docker", 2: "modal", 3: "ssh", 4: "daytona"}
|
||||
backend_to_idx = {"local": 0, "docker": 1, "modal": 2, "ssh": 3, "daytona": 4}
|
||||
idx_to_backend = {0: "local", 1: "docker"}
|
||||
backend_to_idx = {"local": 0, "docker": 1}
|
||||
|
||||
next_idx = 5
|
||||
next_idx = 2
|
||||
if is_linux:
|
||||
terminal_choices.append("Singularity/Apptainer - HPC-friendly container")
|
||||
idx_to_backend[next_idx] = "singularity"
|
||||
backend_to_idx["singularity"] = next_idx
|
||||
next_idx += 1
|
||||
|
||||
terminal_choices.append("Modal - serverless cloud sandbox")
|
||||
idx_to_backend[next_idx] = "modal"
|
||||
backend_to_idx["modal"] = next_idx
|
||||
next_idx += 1
|
||||
|
||||
terminal_choices.append("Daytona - persistent cloud development environment")
|
||||
idx_to_backend[next_idx] = "daytona"
|
||||
backend_to_idx["daytona"] = next_idx
|
||||
next_idx += 1
|
||||
|
||||
terminal_choices.append("SSH - run on a remote machine")
|
||||
idx_to_backend[next_idx] = "ssh"
|
||||
backend_to_idx["ssh"] = next_idx
|
||||
next_idx += 1
|
||||
|
||||
# Add keep current option
|
||||
keep_current_idx = next_idx
|
||||
terminal_choices.append(f"Keep current ({current_backend})")
|
||||
@@ -1072,7 +894,7 @@ def setup_terminal_backend(config: dict):
|
||||
uv_bin = shutil.which("uv")
|
||||
if uv_bin:
|
||||
result = subprocess.run(
|
||||
[uv_bin, "pip", "install", "--python", sys.executable, "swe-rex[modal]"],
|
||||
[uv_bin, "pip", "install", "swe-rex[modal]"],
|
||||
capture_output=True, text=True
|
||||
)
|
||||
else:
|
||||
@@ -1124,7 +946,7 @@ def setup_terminal_backend(config: dict):
|
||||
uv_bin = shutil.which("uv")
|
||||
if uv_bin:
|
||||
result = subprocess.run(
|
||||
[uv_bin, "pip", "install", "--python", sys.executable, "daytona"],
|
||||
[uv_bin, "pip", "install", "daytona"],
|
||||
capture_output=True, text=True
|
||||
)
|
||||
else:
|
||||
@@ -1136,8 +958,6 @@ def setup_terminal_backend(config: dict):
|
||||
print_success("daytona SDK installed")
|
||||
else:
|
||||
print_warning("Install failed — run manually: pip install daytona")
|
||||
if result.stderr:
|
||||
print_info(f" Error: {result.stderr.strip().splitlines()[-1]}")
|
||||
|
||||
# Daytona API key
|
||||
print()
|
||||
|
||||
@@ -79,12 +79,8 @@ def show_status(args):
|
||||
"OpenRouter": "OPENROUTER_API_KEY",
|
||||
"Anthropic": "ANTHROPIC_API_KEY",
|
||||
"OpenAI": "OPENAI_API_KEY",
|
||||
"Z.AI/GLM": "GLM_API_KEY",
|
||||
"Kimi": "KIMI_API_KEY",
|
||||
"MiniMax": "MINIMAX_API_KEY",
|
||||
"MiniMax-CN": "MINIMAX_CN_API_KEY",
|
||||
"Firecrawl": "FIRECRAWL_API_KEY",
|
||||
"Browserbase": "BROWSERBASE_API_KEY", # Optional — local browser works without this
|
||||
"Browserbase": "BROWSERBASE_API_KEY",
|
||||
"FAL": "FAL_KEY",
|
||||
"Tinker": "TINKER_API_KEY",
|
||||
"WandB": "WANDB_API_KEY",
|
||||
@@ -141,28 +137,6 @@ def show_status(args):
|
||||
if codex_status.get("error") and not codex_logged_in:
|
||||
print(f" Error: {codex_status.get('error')}")
|
||||
|
||||
# =========================================================================
|
||||
# API-Key Providers
|
||||
# =========================================================================
|
||||
print()
|
||||
print(color("◆ API-Key Providers", Colors.CYAN, Colors.BOLD))
|
||||
|
||||
apikey_providers = {
|
||||
"Z.AI / GLM": ("GLM_API_KEY", "ZAI_API_KEY", "Z_AI_API_KEY"),
|
||||
"Kimi / Moonshot": ("KIMI_API_KEY",),
|
||||
"MiniMax": ("MINIMAX_API_KEY",),
|
||||
"MiniMax (China)": ("MINIMAX_CN_API_KEY",),
|
||||
}
|
||||
for pname, env_vars in apikey_providers.items():
|
||||
key_val = ""
|
||||
for ev in env_vars:
|
||||
key_val = get_env_value(ev) or ""
|
||||
if key_val:
|
||||
break
|
||||
configured = bool(key_val)
|
||||
label = "configured" if configured else "not configured (run: hermes model)"
|
||||
print(f" {pname:<16} {check_mark(configured)} {label}")
|
||||
|
||||
# =========================================================================
|
||||
# Terminal Configuration
|
||||
# =========================================================================
|
||||
|
||||
@@ -177,15 +177,9 @@ TOOL_CATEGORIES = {
|
||||
"name": "Browser Automation",
|
||||
"icon": "🌐",
|
||||
"providers": [
|
||||
{
|
||||
"name": "Local Browser",
|
||||
"tag": "Free headless Chromium (no API key needed)",
|
||||
"env_vars": [],
|
||||
"post_setup": "browserbase", # Same npm install for agent-browser
|
||||
},
|
||||
{
|
||||
"name": "Browserbase",
|
||||
"tag": "Cloud browser with stealth & proxies",
|
||||
"tag": "Cloud browser with stealth mode",
|
||||
"env_vars": [
|
||||
{"key": "BROWSERBASE_API_KEY", "prompt": "Browserbase API key", "url": "https://browserbase.com"},
|
||||
{"key": "BROWSERBASE_PROJECT_ID", "prompt": "Browserbase project ID"},
|
||||
@@ -266,7 +260,7 @@ def _run_post_setup(post_setup_key: str):
|
||||
uv_bin = shutil.which("uv")
|
||||
if uv_bin:
|
||||
result = subprocess.run(
|
||||
[uv_bin, "pip", "install", "--python", sys.executable, "-e", str(tinker_dir)],
|
||||
[uv_bin, "pip", "install", "-e", str(tinker_dir)],
|
||||
capture_output=True, text=True
|
||||
)
|
||||
else:
|
||||
|
||||
119
hermes_time.py
119
hermes_time.py
@@ -1,119 +0,0 @@
|
||||
"""
|
||||
Timezone-aware clock for Hermes.
|
||||
|
||||
Provides a single ``now()`` helper that returns a timezone-aware datetime
|
||||
based on the user's configured IANA timezone (e.g. ``Asia/Kolkata``).
|
||||
|
||||
Resolution order:
|
||||
1. ``HERMES_TIMEZONE`` environment variable
|
||||
2. ``timezone`` key in ``~/.hermes/config.yaml``
|
||||
3. Falls back to the server's local time (``datetime.now().astimezone()``)
|
||||
|
||||
Invalid timezone values log a warning and fall back safely — Hermes never
|
||||
crashes due to a bad timezone string.
|
||||
"""
|
||||
|
||||
import logging
|
||||
import os
|
||||
from datetime import datetime, timezone as _tz
|
||||
from pathlib import Path
|
||||
from typing import Optional
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
try:
|
||||
from zoneinfo import ZoneInfo
|
||||
except ImportError:
|
||||
# Python 3.8 fallback (shouldn't be needed — Hermes requires 3.9+)
|
||||
from backports.zoneinfo import ZoneInfo # type: ignore[no-redef]
|
||||
|
||||
# Cached state — resolved once, reused on every call.
|
||||
# Call reset_cache() to force re-resolution (e.g. after config changes).
|
||||
_cached_tz: Optional[ZoneInfo] = None
|
||||
_cached_tz_name: Optional[str] = None
|
||||
_cache_resolved: bool = False
|
||||
|
||||
|
||||
def _resolve_timezone_name() -> str:
|
||||
"""Read the configured IANA timezone string (or empty string).
|
||||
|
||||
This does file I/O when falling through to config.yaml, so callers
|
||||
should cache the result rather than calling on every ``now()``.
|
||||
"""
|
||||
# 1. Environment variable (highest priority — set by Supervisor, etc.)
|
||||
tz_env = os.getenv("HERMES_TIMEZONE", "").strip()
|
||||
if tz_env:
|
||||
return tz_env
|
||||
|
||||
# 2. config.yaml ``timezone`` key
|
||||
try:
|
||||
import yaml
|
||||
hermes_home = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
|
||||
config_path = hermes_home / "config.yaml"
|
||||
if config_path.exists():
|
||||
with open(config_path) as f:
|
||||
cfg = yaml.safe_load(f) or {}
|
||||
tz_cfg = cfg.get("timezone", "")
|
||||
if isinstance(tz_cfg, str) and tz_cfg.strip():
|
||||
return tz_cfg.strip()
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
return ""
|
||||
|
||||
|
||||
def _get_zoneinfo(name: str) -> Optional[ZoneInfo]:
|
||||
"""Validate and return a ZoneInfo, or None if invalid."""
|
||||
if not name:
|
||||
return None
|
||||
try:
|
||||
return ZoneInfo(name)
|
||||
except (KeyError, Exception) as exc:
|
||||
logger.warning(
|
||||
"Invalid timezone '%s': %s. Falling back to server local time.",
|
||||
name, exc,
|
||||
)
|
||||
return None
|
||||
|
||||
|
||||
def get_timezone() -> Optional[ZoneInfo]:
|
||||
"""Return the user's configured ZoneInfo, or None (meaning server-local).
|
||||
|
||||
Resolved once and cached. Call ``reset_cache()`` after config changes.
|
||||
"""
|
||||
global _cached_tz, _cached_tz_name, _cache_resolved
|
||||
if not _cache_resolved:
|
||||
_cached_tz_name = _resolve_timezone_name()
|
||||
_cached_tz = _get_zoneinfo(_cached_tz_name)
|
||||
_cache_resolved = True
|
||||
return _cached_tz
|
||||
|
||||
|
||||
def get_timezone_name() -> str:
|
||||
"""Return the IANA name of the configured timezone, or empty string."""
|
||||
global _cached_tz_name, _cache_resolved
|
||||
if not _cache_resolved:
|
||||
get_timezone() # populates cache
|
||||
return _cached_tz_name or ""
|
||||
|
||||
|
||||
def now() -> datetime:
|
||||
"""
|
||||
Return the current time as a timezone-aware datetime.
|
||||
|
||||
If a valid timezone is configured, returns wall-clock time in that zone.
|
||||
Otherwise returns the server's local time (via ``astimezone()``).
|
||||
"""
|
||||
tz = get_timezone()
|
||||
if tz is not None:
|
||||
return datetime.now(tz)
|
||||
# No timezone configured — use server-local (still tz-aware)
|
||||
return datetime.now().astimezone()
|
||||
|
||||
|
||||
def reset_cache() -> None:
|
||||
"""Clear the cached timezone. Used by tests and after config changes."""
|
||||
global _cached_tz, _cached_tz_name, _cache_resolved
|
||||
_cached_tz = None
|
||||
_cached_tz_name = None
|
||||
_cache_resolved = False
|
||||
@@ -50,7 +50,6 @@ pty = ["ptyprocess>=0.7.0"]
|
||||
honcho = ["honcho-ai>=2.0.1"]
|
||||
mcp = ["mcp>=1.2.0"]
|
||||
homeassistant = ["aiohttp>=3.9.0"]
|
||||
yc-bench = ["yc-bench @ git+https://github.com/collinear-ai/yc-bench.git"]
|
||||
all = [
|
||||
"hermes-agent[modal]",
|
||||
"hermes-agent[daytona]",
|
||||
|
||||
125
run_agent.py
125
run_agent.py
@@ -99,46 +99,6 @@ from agent.trajectory import (
|
||||
)
|
||||
|
||||
|
||||
class IterationBudget:
|
||||
"""Thread-safe shared iteration counter for parent and child agents.
|
||||
|
||||
Tracks total LLM-call iterations consumed across a parent agent and all
|
||||
its subagents. A single ``IterationBudget`` is created by the parent
|
||||
and passed to every child so they share the same cap.
|
||||
|
||||
``execute_code`` (programmatic tool calling) iterations are refunded via
|
||||
:meth:`refund` so they don't eat into the budget.
|
||||
"""
|
||||
|
||||
def __init__(self, max_total: int):
|
||||
self.max_total = max_total
|
||||
self._used = 0
|
||||
self._lock = threading.Lock()
|
||||
|
||||
def consume(self) -> bool:
|
||||
"""Try to consume one iteration. Returns True if allowed."""
|
||||
with self._lock:
|
||||
if self._used >= self.max_total:
|
||||
return False
|
||||
self._used += 1
|
||||
return True
|
||||
|
||||
def refund(self) -> None:
|
||||
"""Give back one iteration (e.g. for execute_code turns)."""
|
||||
with self._lock:
|
||||
if self._used > 0:
|
||||
self._used -= 1
|
||||
|
||||
@property
|
||||
def used(self) -> int:
|
||||
return self._used
|
||||
|
||||
@property
|
||||
def remaining(self) -> int:
|
||||
with self._lock:
|
||||
return max(0, self.max_total - self._used)
|
||||
|
||||
|
||||
class AIAgent:
|
||||
"""
|
||||
AI Agent with tool calling capabilities.
|
||||
@@ -154,7 +114,7 @@ class AIAgent:
|
||||
provider: str = None,
|
||||
api_mode: str = None,
|
||||
model: str = "anthropic/claude-opus-4.6", # OpenRouter format
|
||||
max_iterations: int = 90, # Default tool-calling iterations (shared with subagents)
|
||||
max_iterations: int = 60, # Default tool-calling iterations
|
||||
tool_delay: float = 1.0,
|
||||
enabled_toolsets: List[str] = None,
|
||||
disabled_toolsets: List[str] = None,
|
||||
@@ -182,7 +142,6 @@ class AIAgent:
|
||||
skip_memory: bool = False,
|
||||
session_db=None,
|
||||
honcho_session_key: str = None,
|
||||
iteration_budget: "IterationBudget" = None,
|
||||
):
|
||||
"""
|
||||
Initialize the AI Agent.
|
||||
@@ -193,7 +152,7 @@ class AIAgent:
|
||||
provider (str): Provider identifier (optional; used for telemetry/routing hints)
|
||||
api_mode (str): API mode override: "chat_completions" or "codex_responses"
|
||||
model (str): Model name to use (default: "anthropic/claude-opus-4.6")
|
||||
max_iterations (int): Maximum number of tool calling iterations (default: 90)
|
||||
max_iterations (int): Maximum number of tool calling iterations (default: 60)
|
||||
tool_delay (float): Delay between tool calls in seconds (default: 1.0)
|
||||
enabled_toolsets (List[str]): Only enable tools from these toolsets (optional)
|
||||
disabled_toolsets (List[str]): Disable tools from these toolsets (optional)
|
||||
@@ -227,9 +186,6 @@ class AIAgent:
|
||||
"""
|
||||
self.model = model
|
||||
self.max_iterations = max_iterations
|
||||
# Shared iteration budget — parent creates, children inherit.
|
||||
# Consumed by every LLM turn across parent + all subagents.
|
||||
self.iteration_budget = iteration_budget or IterationBudget(max_iterations)
|
||||
self.tool_delay = tool_delay
|
||||
self.save_trajectories = save_trajectories
|
||||
self.verbose_logging = verbose_logging
|
||||
@@ -1407,8 +1363,7 @@ class AIAgent:
|
||||
if context_files_prompt:
|
||||
prompt_parts.append(context_files_prompt)
|
||||
|
||||
from hermes_time import now as _hermes_now
|
||||
now = _hermes_now()
|
||||
now = datetime.now()
|
||||
prompt_parts.append(
|
||||
f"Conversation started: {now.strftime('%A, %B %d, %Y %I:%M %p')}"
|
||||
)
|
||||
@@ -2063,49 +2018,6 @@ class AIAgent:
|
||||
|
||||
return True
|
||||
|
||||
def _try_refresh_nous_client_credentials(self, *, force: bool = True) -> bool:
|
||||
if self.api_mode != "chat_completions" or self.provider != "nous":
|
||||
return False
|
||||
|
||||
try:
|
||||
from hermes_cli.auth import resolve_nous_runtime_credentials
|
||||
|
||||
creds = resolve_nous_runtime_credentials(
|
||||
min_key_ttl_seconds=max(60, int(os.getenv("HERMES_NOUS_MIN_KEY_TTL_SECONDS", "1800"))),
|
||||
timeout_seconds=float(os.getenv("HERMES_NOUS_TIMEOUT_SECONDS", "15")),
|
||||
force_mint=force,
|
||||
)
|
||||
except Exception as exc:
|
||||
logger.debug("Nous credential refresh failed: %s", exc)
|
||||
return False
|
||||
|
||||
api_key = creds.get("api_key")
|
||||
base_url = creds.get("base_url")
|
||||
if not isinstance(api_key, str) or not api_key.strip():
|
||||
return False
|
||||
if not isinstance(base_url, str) or not base_url.strip():
|
||||
return False
|
||||
|
||||
self.api_key = api_key.strip()
|
||||
self.base_url = base_url.strip().rstrip("/")
|
||||
self._client_kwargs["api_key"] = self.api_key
|
||||
self._client_kwargs["base_url"] = self.base_url
|
||||
# Nous requests should not inherit OpenRouter-only attribution headers.
|
||||
self._client_kwargs.pop("default_headers", None)
|
||||
|
||||
try:
|
||||
self.client.close()
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
try:
|
||||
self.client = OpenAI(**self._client_kwargs)
|
||||
except Exception as exc:
|
||||
logger.warning("Failed to rebuild OpenAI client after Nous refresh: %s", exc)
|
||||
return False
|
||||
|
||||
return True
|
||||
|
||||
def _interruptible_api_call(self, api_kwargs: dict):
|
||||
"""
|
||||
Run the API call in a background thread so the main conversation loop
|
||||
@@ -3018,7 +2930,7 @@ class AIAgent:
|
||||
# Clear any stale interrupt state at start
|
||||
self.clear_interrupt()
|
||||
|
||||
while api_call_count < self.max_iterations and self.iteration_budget.remaining > 0:
|
||||
while api_call_count < self.max_iterations:
|
||||
# Check for interrupt request (e.g., user sent new message)
|
||||
if self._interrupt_requested:
|
||||
interrupted = True
|
||||
@@ -3027,10 +2939,6 @@ class AIAgent:
|
||||
break
|
||||
|
||||
api_call_count += 1
|
||||
if not self.iteration_budget.consume():
|
||||
if not self.quiet_mode:
|
||||
print(f"\n⚠️ Session iteration budget exhausted ({self.iteration_budget.max_total} total across agent + subagents)")
|
||||
break
|
||||
|
||||
# Fire step_callback for gateway hooks (agent:step event)
|
||||
if self.step_callback is not None:
|
||||
@@ -3107,13 +3015,6 @@ class AIAgent:
|
||||
if self._use_prompt_caching:
|
||||
api_messages = apply_anthropic_cache_control(api_messages, cache_ttl=self._cache_ttl)
|
||||
|
||||
# Safety net: strip orphaned tool results / add stubs for missing
|
||||
# results before sending to the API. The compressor handles this
|
||||
# during compression, but orphans can also sneak in from session
|
||||
# loading or manual message manipulation.
|
||||
if hasattr(self, 'context_compressor') and self.context_compressor:
|
||||
api_messages = self.context_compressor._sanitize_tool_pairs(api_messages)
|
||||
|
||||
# Calculate approximate request size for logging
|
||||
total_chars = sum(len(str(msg)) for msg in api_messages)
|
||||
approx_tokens = total_chars // 4 # Rough estimate: 4 chars per token
|
||||
@@ -3143,7 +3044,6 @@ class AIAgent:
|
||||
retry_count = 0
|
||||
max_retries = 6 # Increased to allow longer backoff periods
|
||||
codex_auth_retry_attempted = False
|
||||
nous_auth_retry_attempted = False
|
||||
|
||||
finish_reason = "stop"
|
||||
|
||||
@@ -3393,16 +3293,6 @@ class AIAgent:
|
||||
if self._try_refresh_codex_client_credentials(force=True):
|
||||
print(f"{self.log_prefix}🔐 Codex auth refreshed after 401. Retrying request...")
|
||||
continue
|
||||
if (
|
||||
self.api_mode == "chat_completions"
|
||||
and self.provider == "nous"
|
||||
and status_code == 401
|
||||
and not nous_auth_retry_attempted
|
||||
):
|
||||
nous_auth_retry_attempted = True
|
||||
if self._try_refresh_nous_client_credentials(force=True):
|
||||
print(f"{self.log_prefix}🔐 Nous agent key refreshed after 401. Retrying request...")
|
||||
continue
|
||||
|
||||
retry_count += 1
|
||||
elapsed_time = time.time() - api_start_time
|
||||
@@ -3797,13 +3687,6 @@ class AIAgent:
|
||||
self._log_msg_to_db(assistant_msg)
|
||||
|
||||
self._execute_tool_calls(assistant_message, messages, effective_task_id)
|
||||
|
||||
# Refund the iteration if the ONLY tool(s) called were
|
||||
# execute_code (programmatic tool calling). These are
|
||||
# cheap RPC-style calls that shouldn't eat the budget.
|
||||
_tc_names = {tc.function.name for tc in assistant_message.tool_calls}
|
||||
if _tc_names == {"execute_code"}:
|
||||
self.iteration_budget.refund()
|
||||
|
||||
if self.compression_enabled and self.context_compressor.should_compress():
|
||||
messages, active_system_prompt = self._compress_context(
|
||||
|
||||
@@ -1,3 +0,0 @@
|
||||
---
|
||||
description: Apple/macOS-specific skills — iMessage, Reminders, Notes, FindMy, and macOS automation. These skills only load on macOS systems.
|
||||
---
|
||||
@@ -1,88 +0,0 @@
|
||||
---
|
||||
name: apple-notes
|
||||
description: Manage Apple Notes via the memo CLI on macOS (create, view, search, edit).
|
||||
version: 1.0.0
|
||||
author: Hermes Agent
|
||||
license: MIT
|
||||
platforms: [macos]
|
||||
metadata:
|
||||
hermes:
|
||||
tags: [Notes, Apple, macOS, note-taking]
|
||||
related_skills: [obsidian]
|
||||
---
|
||||
|
||||
# Apple Notes
|
||||
|
||||
Use `memo` to manage Apple Notes directly from the terminal. Notes sync across all Apple devices via iCloud.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- **macOS** with Notes.app
|
||||
- Install: `brew tap antoniorodr/memo && brew install antoniorodr/memo/memo`
|
||||
- Grant Automation access to Notes.app when prompted (System Settings → Privacy → Automation)
|
||||
|
||||
## When to Use
|
||||
|
||||
- User asks to create, view, or search Apple Notes
|
||||
- Saving information to Notes.app for cross-device access
|
||||
- Organizing notes into folders
|
||||
- Exporting notes to Markdown/HTML
|
||||
|
||||
## When NOT to Use
|
||||
|
||||
- Obsidian vault management → use the `obsidian` skill
|
||||
- Bear Notes → separate app (not supported here)
|
||||
- Quick agent-only notes → use the `memory` tool instead
|
||||
|
||||
## Quick Reference
|
||||
|
||||
### View Notes
|
||||
|
||||
```bash
|
||||
memo notes # List all notes
|
||||
memo notes -f "Folder Name" # Filter by folder
|
||||
memo notes -s "query" # Search notes (fuzzy)
|
||||
```
|
||||
|
||||
### Create Notes
|
||||
|
||||
```bash
|
||||
memo notes -a # Interactive editor
|
||||
memo notes -a "Note Title" # Quick add with title
|
||||
```
|
||||
|
||||
### Edit Notes
|
||||
|
||||
```bash
|
||||
memo notes -e # Interactive selection to edit
|
||||
```
|
||||
|
||||
### Delete Notes
|
||||
|
||||
```bash
|
||||
memo notes -d # Interactive selection to delete
|
||||
```
|
||||
|
||||
### Move Notes
|
||||
|
||||
```bash
|
||||
memo notes -m # Move note to folder (interactive)
|
||||
```
|
||||
|
||||
### Export Notes
|
||||
|
||||
```bash
|
||||
memo notes -ex # Export to HTML/Markdown
|
||||
```
|
||||
|
||||
## Limitations
|
||||
|
||||
- Cannot edit notes containing images or attachments
|
||||
- Interactive prompts require terminal access (use pty=true if needed)
|
||||
- macOS only — requires Apple Notes.app
|
||||
|
||||
## Rules
|
||||
|
||||
1. Prefer Apple Notes when user wants cross-device sync (iPhone/iPad/Mac)
|
||||
2. Use the `memory` tool for agent-internal notes that don't need to sync
|
||||
3. Use the `obsidian` skill for Markdown-native knowledge management
|
||||
@@ -1,96 +0,0 @@
|
||||
---
|
||||
name: apple-reminders
|
||||
description: Manage Apple Reminders via remindctl CLI (list, add, complete, delete).
|
||||
version: 1.0.0
|
||||
author: Hermes Agent
|
||||
license: MIT
|
||||
platforms: [macos]
|
||||
metadata:
|
||||
hermes:
|
||||
tags: [Reminders, tasks, todo, macOS, Apple]
|
||||
---
|
||||
|
||||
# Apple Reminders
|
||||
|
||||
Use `remindctl` to manage Apple Reminders directly from the terminal. Tasks sync across all Apple devices via iCloud.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- **macOS** with Reminders.app
|
||||
- Install: `brew install steipete/tap/remindctl`
|
||||
- Grant Reminders permission when prompted
|
||||
- Check: `remindctl status` / Request: `remindctl authorize`
|
||||
|
||||
## When to Use
|
||||
|
||||
- User mentions "reminder" or "Reminders app"
|
||||
- Creating personal to-dos with due dates that sync to iOS
|
||||
- Managing Apple Reminders lists
|
||||
- User wants tasks to appear on their iPhone/iPad
|
||||
|
||||
## When NOT to Use
|
||||
|
||||
- Scheduling agent alerts → use the cronjob tool instead
|
||||
- Calendar events → use Apple Calendar or Google Calendar
|
||||
- Project task management → use GitHub Issues, Notion, etc.
|
||||
- If user says "remind me" but means an agent alert → clarify first
|
||||
|
||||
## Quick Reference
|
||||
|
||||
### View Reminders
|
||||
|
||||
```bash
|
||||
remindctl # Today's reminders
|
||||
remindctl today # Today
|
||||
remindctl tomorrow # Tomorrow
|
||||
remindctl week # This week
|
||||
remindctl overdue # Past due
|
||||
remindctl all # Everything
|
||||
remindctl 2026-01-04 # Specific date
|
||||
```
|
||||
|
||||
### Manage Lists
|
||||
|
||||
```bash
|
||||
remindctl list # List all lists
|
||||
remindctl list Work # Show specific list
|
||||
remindctl list Projects --create # Create list
|
||||
remindctl list Work --delete # Delete list
|
||||
```
|
||||
|
||||
### Create Reminders
|
||||
|
||||
```bash
|
||||
remindctl add "Buy milk"
|
||||
remindctl add --title "Call mom" --list Personal --due tomorrow
|
||||
remindctl add --title "Meeting prep" --due "2026-02-15 09:00"
|
||||
```
|
||||
|
||||
### Complete / Delete
|
||||
|
||||
```bash
|
||||
remindctl complete 1 2 3 # Complete by ID
|
||||
remindctl delete 4A83 --force # Delete by ID
|
||||
```
|
||||
|
||||
### Output Formats
|
||||
|
||||
```bash
|
||||
remindctl today --json # JSON for scripting
|
||||
remindctl today --plain # TSV format
|
||||
remindctl today --quiet # Counts only
|
||||
```
|
||||
|
||||
## Date Formats
|
||||
|
||||
Accepted by `--due` and date filters:
|
||||
- `today`, `tomorrow`, `yesterday`
|
||||
- `YYYY-MM-DD`
|
||||
- `YYYY-MM-DD HH:mm`
|
||||
- ISO 8601 (`2026-01-04T12:34:56Z`)
|
||||
|
||||
## Rules
|
||||
|
||||
1. When user says "remind me", clarify: Apple Reminders (syncs to phone) vs agent cronjob alert
|
||||
2. Always confirm reminder content and due date before creating
|
||||
3. Use `--json` for programmatic parsing
|
||||
@@ -1,131 +0,0 @@
|
||||
---
|
||||
name: findmy
|
||||
description: Track Apple devices and AirTags via FindMy.app on macOS using AppleScript and screen capture.
|
||||
version: 1.0.0
|
||||
author: Hermes Agent
|
||||
license: MIT
|
||||
platforms: [macos]
|
||||
metadata:
|
||||
hermes:
|
||||
tags: [FindMy, AirTag, location, tracking, macOS, Apple]
|
||||
---
|
||||
|
||||
# Find My (Apple)
|
||||
|
||||
Track Apple devices and AirTags via the FindMy.app on macOS. Since Apple doesn't
|
||||
provide a CLI for FindMy, this skill uses AppleScript to open the app and
|
||||
screen capture to read device locations.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- **macOS** with Find My app and iCloud signed in
|
||||
- Devices/AirTags already registered in Find My
|
||||
- Screen Recording permission for terminal (System Settings → Privacy → Screen Recording)
|
||||
- **Optional but recommended**: Install `peekaboo` for better UI automation:
|
||||
`brew install steipete/tap/peekaboo`
|
||||
|
||||
## When to Use
|
||||
|
||||
- User asks "where is my [device/cat/keys/bag]?"
|
||||
- Tracking AirTag locations
|
||||
- Checking device locations (iPhone, iPad, Mac, AirPods)
|
||||
- Monitoring pet or item movement over time (AirTag patrol routes)
|
||||
|
||||
## Method 1: AppleScript + Screenshot (Basic)
|
||||
|
||||
### Open FindMy and Navigate
|
||||
|
||||
```bash
|
||||
# Open Find My app
|
||||
osascript -e 'tell application "FindMy" to activate'
|
||||
|
||||
# Wait for it to load
|
||||
sleep 3
|
||||
|
||||
# Take a screenshot of the Find My window
|
||||
screencapture -w -o /tmp/findmy.png
|
||||
```
|
||||
|
||||
Then use `vision_analyze` to read the screenshot:
|
||||
```
|
||||
vision_analyze(image_url="/tmp/findmy.png", question="What devices/items are shown and what are their locations?")
|
||||
```
|
||||
|
||||
### Switch Between Tabs
|
||||
|
||||
```bash
|
||||
# Switch to Devices tab
|
||||
osascript -e '
|
||||
tell application "System Events"
|
||||
tell process "FindMy"
|
||||
click button "Devices" of toolbar 1 of window 1
|
||||
end tell
|
||||
end tell'
|
||||
|
||||
# Switch to Items tab (AirTags)
|
||||
osascript -e '
|
||||
tell application "System Events"
|
||||
tell process "FindMy"
|
||||
click button "Items" of toolbar 1 of window 1
|
||||
end tell
|
||||
end tell'
|
||||
```
|
||||
|
||||
## Method 2: Peekaboo UI Automation (Recommended)
|
||||
|
||||
If `peekaboo` is installed, use it for more reliable UI interaction:
|
||||
|
||||
```bash
|
||||
# Open Find My
|
||||
osascript -e 'tell application "FindMy" to activate'
|
||||
sleep 3
|
||||
|
||||
# Capture and annotate the UI
|
||||
peekaboo see --app "FindMy" --annotate --path /tmp/findmy-ui.png
|
||||
|
||||
# Click on a specific device/item by element ID
|
||||
peekaboo click --on B3 --app "FindMy"
|
||||
|
||||
# Capture the detail view
|
||||
peekaboo image --app "FindMy" --path /tmp/findmy-detail.png
|
||||
```
|
||||
|
||||
Then analyze with vision:
|
||||
```
|
||||
vision_analyze(image_url="/tmp/findmy-detail.png", question="What is the location shown for this device/item? Include address and coordinates if visible.")
|
||||
```
|
||||
|
||||
## Workflow: Track AirTag Location Over Time
|
||||
|
||||
For monitoring an AirTag (e.g., tracking a cat's patrol route):
|
||||
|
||||
```bash
|
||||
# 1. Open FindMy to Items tab
|
||||
osascript -e 'tell application "FindMy" to activate'
|
||||
sleep 3
|
||||
|
||||
# 2. Click on the AirTag item (stay on page — AirTag only updates when page is open)
|
||||
|
||||
# 3. Periodically capture location
|
||||
while true; do
|
||||
screencapture -w -o /tmp/findmy-$(date +%H%M%S).png
|
||||
sleep 300 # Every 5 minutes
|
||||
done
|
||||
```
|
||||
|
||||
Analyze each screenshot with vision to extract coordinates, then compile a route.
|
||||
|
||||
## Limitations
|
||||
|
||||
- FindMy has **no CLI or API** — must use UI automation
|
||||
- AirTags only update location while the FindMy page is actively displayed
|
||||
- Location accuracy depends on nearby Apple devices in the FindMy network
|
||||
- Screen Recording permission required for screenshots
|
||||
- AppleScript UI automation may break across macOS versions
|
||||
|
||||
## Rules
|
||||
|
||||
1. Keep FindMy app in the foreground when tracking AirTags (updates stop when minimized)
|
||||
2. Use `vision_analyze` to read screenshot content — don't try to parse pixels
|
||||
3. For ongoing tracking, use a cronjob to periodically capture and log locations
|
||||
4. Respect privacy — only track devices/items the user owns
|
||||
@@ -1,100 +0,0 @@
|
||||
---
|
||||
name: imessage
|
||||
description: Send and receive iMessages/SMS via the imsg CLI on macOS.
|
||||
version: 1.0.0
|
||||
author: Hermes Agent
|
||||
license: MIT
|
||||
platforms: [macos]
|
||||
metadata:
|
||||
hermes:
|
||||
tags: [iMessage, SMS, messaging, macOS, Apple]
|
||||
---
|
||||
|
||||
# iMessage
|
||||
|
||||
Use `imsg` to read and send iMessage/SMS via macOS Messages.app.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- **macOS** with Messages.app signed in
|
||||
- Install: `brew install steipete/tap/imsg`
|
||||
- Grant Full Disk Access for terminal (System Settings → Privacy → Full Disk Access)
|
||||
- Grant Automation permission for Messages.app when prompted
|
||||
|
||||
## When to Use
|
||||
|
||||
- User asks to send an iMessage or text message
|
||||
- Reading iMessage conversation history
|
||||
- Checking recent Messages.app chats
|
||||
- Sending to phone numbers or Apple IDs
|
||||
|
||||
## When NOT to Use
|
||||
|
||||
- Telegram/Discord/Slack/WhatsApp messages → use the appropriate gateway channel
|
||||
- Group chat management (adding/removing members) → not supported
|
||||
- Bulk/mass messaging → always confirm with user first
|
||||
|
||||
## Quick Reference
|
||||
|
||||
### List Chats
|
||||
|
||||
```bash
|
||||
imsg chats --limit 10 --json
|
||||
```
|
||||
|
||||
### View History
|
||||
|
||||
```bash
|
||||
# By chat ID
|
||||
imsg history --chat-id 1 --limit 20 --json
|
||||
|
||||
# With attachments info
|
||||
imsg history --chat-id 1 --limit 20 --attachments --json
|
||||
```
|
||||
|
||||
### Send Messages
|
||||
|
||||
```bash
|
||||
# Text only
|
||||
imsg send --to "+14155551212" --text "Hello!"
|
||||
|
||||
# With attachment
|
||||
imsg send --to "+14155551212" --text "Check this out" --file /path/to/image.jpg
|
||||
|
||||
# Force iMessage or SMS
|
||||
imsg send --to "+14155551212" --text "Hi" --service imessage
|
||||
imsg send --to "+14155551212" --text "Hi" --service sms
|
||||
```
|
||||
|
||||
### Watch for New Messages
|
||||
|
||||
```bash
|
||||
imsg watch --chat-id 1 --attachments
|
||||
```
|
||||
|
||||
## Service Options
|
||||
|
||||
- `--service imessage` — Force iMessage (requires recipient has iMessage)
|
||||
- `--service sms` — Force SMS (green bubble)
|
||||
- `--service auto` — Let Messages.app decide (default)
|
||||
|
||||
## Rules
|
||||
|
||||
1. **Always confirm recipient and message content** before sending
|
||||
2. **Never send to unknown numbers** without explicit user approval
|
||||
3. **Verify file paths** exist before attaching
|
||||
4. **Don't spam** — rate-limit yourself
|
||||
|
||||
## Example Workflow
|
||||
|
||||
User: "Text mom that I'll be late"
|
||||
|
||||
```bash
|
||||
# 1. Find mom's chat
|
||||
imsg chats --limit 20 --json | jq '.[] | select(.displayName | contains("Mom"))'
|
||||
|
||||
# 2. Confirm with user: "Found Mom at +1555123456. Send 'I'll be late' via iMessage?"
|
||||
|
||||
# 3. Send after confirmation
|
||||
imsg send --to "+1555123456" --text "I'll be late"
|
||||
```
|
||||
@@ -151,10 +151,10 @@ class TestGetTextAuxiliaryClient:
|
||||
assert model is None
|
||||
|
||||
|
||||
class TestVisionClientFallback:
|
||||
"""Vision client uses the same full fallback chain as text."""
|
||||
class TestCodexNotInVisionClient:
|
||||
"""Codex fallback should NOT apply to vision tasks."""
|
||||
|
||||
def test_vision_returns_none_without_any_credentials(self):
|
||||
def test_vision_returns_none_without_openrouter_nous(self):
|
||||
with patch("agent.auxiliary_client._read_nous_auth", return_value=None):
|
||||
client, model = get_vision_auxiliary_client()
|
||||
assert client is None
|
||||
|
||||
@@ -165,52 +165,6 @@ class TestBuildSkillsSystemPrompt:
|
||||
# "search" should appear only once per category
|
||||
assert result.count("- search") == 1
|
||||
|
||||
def test_excludes_incompatible_platform_skills(self, monkeypatch, tmp_path):
|
||||
"""Skills with platforms: [macos] should not appear on Linux."""
|
||||
monkeypatch.setenv("HERMES_HOME", str(tmp_path))
|
||||
skills_dir = tmp_path / "skills" / "apple"
|
||||
skills_dir.mkdir(parents=True)
|
||||
|
||||
# macOS-only skill
|
||||
mac_skill = skills_dir / "imessage"
|
||||
mac_skill.mkdir()
|
||||
(mac_skill / "SKILL.md").write_text(
|
||||
"---\nname: imessage\ndescription: Send iMessages\nplatforms: [macos]\n---\n"
|
||||
)
|
||||
|
||||
# Universal skill
|
||||
uni_skill = skills_dir / "web-search"
|
||||
uni_skill.mkdir()
|
||||
(uni_skill / "SKILL.md").write_text(
|
||||
"---\nname: web-search\ndescription: Search the web\n---\n"
|
||||
)
|
||||
|
||||
from unittest.mock import patch
|
||||
with patch("tools.skills_tool.sys") as mock_sys:
|
||||
mock_sys.platform = "linux"
|
||||
result = build_skills_system_prompt()
|
||||
|
||||
assert "web-search" in result
|
||||
assert "imessage" not in result
|
||||
|
||||
def test_includes_matching_platform_skills(self, monkeypatch, tmp_path):
|
||||
"""Skills with platforms: [macos] should appear on macOS."""
|
||||
monkeypatch.setenv("HERMES_HOME", str(tmp_path))
|
||||
skills_dir = tmp_path / "skills" / "apple"
|
||||
mac_skill = skills_dir / "imessage"
|
||||
mac_skill.mkdir(parents=True)
|
||||
(mac_skill / "SKILL.md").write_text(
|
||||
"---\nname: imessage\ndescription: Send iMessages\nplatforms: [macos]\n---\n"
|
||||
)
|
||||
|
||||
from unittest.mock import patch
|
||||
with patch("tools.skills_tool.sys") as mock_sys:
|
||||
mock_sys.platform = "darwin"
|
||||
result = build_skills_system_prompt()
|
||||
|
||||
assert "imessage" in result
|
||||
assert "Send iMessages" in result
|
||||
|
||||
|
||||
# =========================================================================
|
||||
# Context files prompt builder
|
||||
|
||||
@@ -1,87 +0,0 @@
|
||||
"""Tests for agent/skill_commands.py — skill slash command scanning and platform filtering."""
|
||||
|
||||
from pathlib import Path
|
||||
from unittest.mock import patch
|
||||
|
||||
from agent.skill_commands import scan_skill_commands, build_skill_invocation_message
|
||||
|
||||
|
||||
def _make_skill(skills_dir, name, frontmatter_extra="", body="Do the thing.", category=None):
|
||||
"""Helper to create a minimal skill directory with SKILL.md."""
|
||||
if category:
|
||||
skill_dir = skills_dir / category / name
|
||||
else:
|
||||
skill_dir = skills_dir / name
|
||||
skill_dir.mkdir(parents=True, exist_ok=True)
|
||||
content = f"""\
|
||||
---
|
||||
name: {name}
|
||||
description: Description for {name}.
|
||||
{frontmatter_extra}---
|
||||
|
||||
# {name}
|
||||
|
||||
{body}
|
||||
"""
|
||||
(skill_dir / "SKILL.md").write_text(content)
|
||||
return skill_dir
|
||||
|
||||
|
||||
class TestScanSkillCommands:
|
||||
def test_finds_skills(self, tmp_path):
|
||||
with patch("tools.skills_tool.SKILLS_DIR", tmp_path):
|
||||
_make_skill(tmp_path, "my-skill")
|
||||
result = scan_skill_commands()
|
||||
assert "/my-skill" in result
|
||||
assert result["/my-skill"]["name"] == "my-skill"
|
||||
|
||||
def test_empty_dir(self, tmp_path):
|
||||
with patch("tools.skills_tool.SKILLS_DIR", tmp_path):
|
||||
result = scan_skill_commands()
|
||||
assert result == {}
|
||||
|
||||
def test_excludes_incompatible_platform(self, tmp_path):
|
||||
"""macOS-only skills should not register slash commands on Linux."""
|
||||
with patch("tools.skills_tool.SKILLS_DIR", tmp_path), \
|
||||
patch("tools.skills_tool.sys") as mock_sys:
|
||||
mock_sys.platform = "linux"
|
||||
_make_skill(tmp_path, "imessage", frontmatter_extra="platforms: [macos]\n")
|
||||
_make_skill(tmp_path, "web-search")
|
||||
result = scan_skill_commands()
|
||||
assert "/web-search" in result
|
||||
assert "/imessage" not in result
|
||||
|
||||
def test_includes_matching_platform(self, tmp_path):
|
||||
"""macOS-only skills should register slash commands on macOS."""
|
||||
with patch("tools.skills_tool.SKILLS_DIR", tmp_path), \
|
||||
patch("tools.skills_tool.sys") as mock_sys:
|
||||
mock_sys.platform = "darwin"
|
||||
_make_skill(tmp_path, "imessage", frontmatter_extra="platforms: [macos]\n")
|
||||
result = scan_skill_commands()
|
||||
assert "/imessage" in result
|
||||
|
||||
def test_universal_skill_on_any_platform(self, tmp_path):
|
||||
"""Skills without platforms field should register on any platform."""
|
||||
with patch("tools.skills_tool.SKILLS_DIR", tmp_path), \
|
||||
patch("tools.skills_tool.sys") as mock_sys:
|
||||
mock_sys.platform = "win32"
|
||||
_make_skill(tmp_path, "generic-tool")
|
||||
result = scan_skill_commands()
|
||||
assert "/generic-tool" in result
|
||||
|
||||
|
||||
class TestBuildSkillInvocationMessage:
|
||||
def test_builds_message(self, tmp_path):
|
||||
with patch("tools.skills_tool.SKILLS_DIR", tmp_path):
|
||||
_make_skill(tmp_path, "test-skill")
|
||||
scan_skill_commands()
|
||||
msg = build_skill_invocation_message("/test-skill", "do stuff")
|
||||
assert msg is not None
|
||||
assert "test-skill" in msg
|
||||
assert "do stuff" in msg
|
||||
|
||||
def test_returns_none_for_unknown(self, tmp_path):
|
||||
with patch("tools.skills_tool.SKILLS_DIR", tmp_path):
|
||||
scan_skill_commands()
|
||||
msg = build_skill_invocation_message("/nonexistent")
|
||||
assert msg is None
|
||||
@@ -75,9 +75,8 @@ class TestParseSchedule:
|
||||
run_at_str = result["run_at"]
|
||||
assert isinstance(run_at_str, str)
|
||||
run_at = datetime.fromisoformat(run_at_str)
|
||||
now = datetime.now().astimezone()
|
||||
assert run_at > now
|
||||
assert run_at < now + timedelta(minutes=31)
|
||||
assert run_at > datetime.now()
|
||||
assert run_at < datetime.now() + timedelta(minutes=31)
|
||||
|
||||
def test_every_becomes_interval(self):
|
||||
result = parse_schedule("every 2h")
|
||||
@@ -130,15 +129,15 @@ class TestComputeNextRun:
|
||||
result = compute_next_run(schedule)
|
||||
next_dt = datetime.fromisoformat(result)
|
||||
# Should be ~60 minutes from now
|
||||
assert next_dt > datetime.now().astimezone() + timedelta(minutes=59)
|
||||
assert next_dt > datetime.now() + timedelta(minutes=59)
|
||||
|
||||
def test_interval_subsequent_run(self):
|
||||
schedule = {"kind": "interval", "minutes": 30}
|
||||
last = datetime.now().astimezone().isoformat()
|
||||
last = datetime.now().isoformat()
|
||||
result = compute_next_run(schedule, last_run_at=last)
|
||||
next_dt = datetime.fromisoformat(result)
|
||||
# Should be ~30 minutes from last run
|
||||
assert next_dt > datetime.now().astimezone() + timedelta(minutes=29)
|
||||
assert next_dt > datetime.now() + timedelta(minutes=29)
|
||||
|
||||
def test_cron_returns_future(self):
|
||||
pytest.importorskip("croniter")
|
||||
@@ -148,7 +147,7 @@ class TestComputeNextRun:
|
||||
assert len(result) > 0
|
||||
next_dt = datetime.fromisoformat(result)
|
||||
assert isinstance(next_dt, datetime)
|
||||
assert next_dt > datetime.now().astimezone()
|
||||
assert next_dt > datetime.now()
|
||||
|
||||
def test_unknown_kind_returns_none(self):
|
||||
assert compute_next_run({"kind": "unknown"}) is None
|
||||
|
||||
@@ -1,342 +0,0 @@
|
||||
"""Tests for API-key provider support (z.ai/GLM, Kimi, MiniMax)."""
|
||||
|
||||
import os
|
||||
import sys
|
||||
import types
|
||||
|
||||
import pytest
|
||||
|
||||
# Ensure dotenv doesn't interfere
|
||||
if "dotenv" not in sys.modules:
|
||||
fake_dotenv = types.ModuleType("dotenv")
|
||||
fake_dotenv.load_dotenv = lambda *args, **kwargs: None
|
||||
sys.modules["dotenv"] = fake_dotenv
|
||||
|
||||
from hermes_cli.auth import (
|
||||
PROVIDER_REGISTRY,
|
||||
ProviderConfig,
|
||||
resolve_provider,
|
||||
get_api_key_provider_status,
|
||||
resolve_api_key_provider_credentials,
|
||||
get_auth_status,
|
||||
AuthError,
|
||||
)
|
||||
|
||||
|
||||
# =============================================================================
|
||||
# Provider Registry tests
|
||||
# =============================================================================
|
||||
|
||||
class TestProviderRegistry:
|
||||
"""Test that new providers are correctly registered."""
|
||||
|
||||
@pytest.mark.parametrize("provider_id,name,auth_type", [
|
||||
("zai", "Z.AI / GLM", "api_key"),
|
||||
("kimi-coding", "Kimi / Moonshot", "api_key"),
|
||||
("minimax", "MiniMax", "api_key"),
|
||||
("minimax-cn", "MiniMax (China)", "api_key"),
|
||||
])
|
||||
def test_provider_registered(self, provider_id, name, auth_type):
|
||||
assert provider_id in PROVIDER_REGISTRY
|
||||
pconfig = PROVIDER_REGISTRY[provider_id]
|
||||
assert pconfig.name == name
|
||||
assert pconfig.auth_type == auth_type
|
||||
assert pconfig.inference_base_url # must have a default base URL
|
||||
|
||||
def test_zai_env_vars(self):
|
||||
pconfig = PROVIDER_REGISTRY["zai"]
|
||||
assert pconfig.api_key_env_vars == ("GLM_API_KEY", "ZAI_API_KEY", "Z_AI_API_KEY")
|
||||
assert pconfig.base_url_env_var == "GLM_BASE_URL"
|
||||
|
||||
def test_kimi_env_vars(self):
|
||||
pconfig = PROVIDER_REGISTRY["kimi-coding"]
|
||||
assert pconfig.api_key_env_vars == ("KIMI_API_KEY",)
|
||||
assert pconfig.base_url_env_var == "KIMI_BASE_URL"
|
||||
|
||||
def test_minimax_env_vars(self):
|
||||
pconfig = PROVIDER_REGISTRY["minimax"]
|
||||
assert pconfig.api_key_env_vars == ("MINIMAX_API_KEY",)
|
||||
assert pconfig.base_url_env_var == "MINIMAX_BASE_URL"
|
||||
|
||||
def test_minimax_cn_env_vars(self):
|
||||
pconfig = PROVIDER_REGISTRY["minimax-cn"]
|
||||
assert pconfig.api_key_env_vars == ("MINIMAX_CN_API_KEY",)
|
||||
assert pconfig.base_url_env_var == "MINIMAX_CN_BASE_URL"
|
||||
|
||||
def test_base_urls(self):
|
||||
assert PROVIDER_REGISTRY["zai"].inference_base_url == "https://api.z.ai/api/paas/v4"
|
||||
assert PROVIDER_REGISTRY["kimi-coding"].inference_base_url == "https://api.moonshot.ai/v1"
|
||||
assert PROVIDER_REGISTRY["minimax"].inference_base_url == "https://api.minimax.io/v1"
|
||||
assert PROVIDER_REGISTRY["minimax-cn"].inference_base_url == "https://api.minimaxi.com/v1"
|
||||
|
||||
def test_oauth_providers_unchanged(self):
|
||||
"""Ensure we didn't break the existing OAuth providers."""
|
||||
assert "nous" in PROVIDER_REGISTRY
|
||||
assert PROVIDER_REGISTRY["nous"].auth_type == "oauth_device_code"
|
||||
assert "openai-codex" in PROVIDER_REGISTRY
|
||||
assert PROVIDER_REGISTRY["openai-codex"].auth_type == "oauth_external"
|
||||
|
||||
|
||||
# =============================================================================
|
||||
# Provider Resolution tests
|
||||
# =============================================================================
|
||||
|
||||
PROVIDER_ENV_VARS = (
|
||||
"OPENROUTER_API_KEY", "OPENAI_API_KEY", "ANTHROPIC_API_KEY",
|
||||
"GLM_API_KEY", "ZAI_API_KEY", "Z_AI_API_KEY",
|
||||
"KIMI_API_KEY", "MINIMAX_API_KEY", "MINIMAX_CN_API_KEY",
|
||||
"OPENAI_BASE_URL",
|
||||
)
|
||||
|
||||
|
||||
@pytest.fixture(autouse=True)
|
||||
def _clear_provider_env(monkeypatch):
|
||||
for key in PROVIDER_ENV_VARS:
|
||||
monkeypatch.delenv(key, raising=False)
|
||||
|
||||
|
||||
class TestResolveProvider:
|
||||
"""Test resolve_provider() with new providers."""
|
||||
|
||||
def test_explicit_zai(self):
|
||||
assert resolve_provider("zai") == "zai"
|
||||
|
||||
def test_explicit_kimi_coding(self):
|
||||
assert resolve_provider("kimi-coding") == "kimi-coding"
|
||||
|
||||
def test_explicit_minimax(self):
|
||||
assert resolve_provider("minimax") == "minimax"
|
||||
|
||||
def test_explicit_minimax_cn(self):
|
||||
assert resolve_provider("minimax-cn") == "minimax-cn"
|
||||
|
||||
def test_alias_glm(self):
|
||||
assert resolve_provider("glm") == "zai"
|
||||
|
||||
def test_alias_z_ai(self):
|
||||
assert resolve_provider("z-ai") == "zai"
|
||||
|
||||
def test_alias_zhipu(self):
|
||||
assert resolve_provider("zhipu") == "zai"
|
||||
|
||||
def test_alias_kimi(self):
|
||||
assert resolve_provider("kimi") == "kimi-coding"
|
||||
|
||||
def test_alias_moonshot(self):
|
||||
assert resolve_provider("moonshot") == "kimi-coding"
|
||||
|
||||
def test_alias_minimax_underscore(self):
|
||||
assert resolve_provider("minimax_cn") == "minimax-cn"
|
||||
|
||||
def test_alias_case_insensitive(self):
|
||||
assert resolve_provider("GLM") == "zai"
|
||||
assert resolve_provider("Z-AI") == "zai"
|
||||
assert resolve_provider("Kimi") == "kimi-coding"
|
||||
|
||||
def test_unknown_provider_raises(self):
|
||||
with pytest.raises(AuthError):
|
||||
resolve_provider("nonexistent-provider-xyz")
|
||||
|
||||
def test_auto_detects_glm_key(self, monkeypatch):
|
||||
monkeypatch.setenv("GLM_API_KEY", "test-glm-key")
|
||||
assert resolve_provider("auto") == "zai"
|
||||
|
||||
def test_auto_detects_zai_key(self, monkeypatch):
|
||||
monkeypatch.setenv("ZAI_API_KEY", "test-zai-key")
|
||||
assert resolve_provider("auto") == "zai"
|
||||
|
||||
def test_auto_detects_z_ai_key(self, monkeypatch):
|
||||
monkeypatch.setenv("Z_AI_API_KEY", "test-z-ai-key")
|
||||
assert resolve_provider("auto") == "zai"
|
||||
|
||||
def test_auto_detects_kimi_key(self, monkeypatch):
|
||||
monkeypatch.setenv("KIMI_API_KEY", "test-kimi-key")
|
||||
assert resolve_provider("auto") == "kimi-coding"
|
||||
|
||||
def test_auto_detects_minimax_key(self, monkeypatch):
|
||||
monkeypatch.setenv("MINIMAX_API_KEY", "test-mm-key")
|
||||
assert resolve_provider("auto") == "minimax"
|
||||
|
||||
def test_auto_detects_minimax_cn_key(self, monkeypatch):
|
||||
monkeypatch.setenv("MINIMAX_CN_API_KEY", "test-mm-cn-key")
|
||||
assert resolve_provider("auto") == "minimax-cn"
|
||||
|
||||
def test_openrouter_takes_priority_over_glm(self, monkeypatch):
|
||||
"""OpenRouter API key should win over GLM in auto-detection."""
|
||||
monkeypatch.setenv("OPENROUTER_API_KEY", "or-key")
|
||||
monkeypatch.setenv("GLM_API_KEY", "glm-key")
|
||||
assert resolve_provider("auto") == "openrouter"
|
||||
|
||||
|
||||
# =============================================================================
|
||||
# API Key Provider Status tests
|
||||
# =============================================================================
|
||||
|
||||
class TestApiKeyProviderStatus:
|
||||
|
||||
def test_unconfigured_provider(self):
|
||||
status = get_api_key_provider_status("zai")
|
||||
assert status["configured"] is False
|
||||
assert status["logged_in"] is False
|
||||
|
||||
def test_configured_provider(self, monkeypatch):
|
||||
monkeypatch.setenv("GLM_API_KEY", "test-key-123")
|
||||
status = get_api_key_provider_status("zai")
|
||||
assert status["configured"] is True
|
||||
assert status["logged_in"] is True
|
||||
assert status["key_source"] == "GLM_API_KEY"
|
||||
assert "z.ai" in status["base_url"].lower() or "api.z.ai" in status["base_url"]
|
||||
|
||||
def test_fallback_env_var(self, monkeypatch):
|
||||
"""ZAI_API_KEY should work when GLM_API_KEY is not set."""
|
||||
monkeypatch.setenv("ZAI_API_KEY", "zai-fallback-key")
|
||||
status = get_api_key_provider_status("zai")
|
||||
assert status["configured"] is True
|
||||
assert status["key_source"] == "ZAI_API_KEY"
|
||||
|
||||
def test_custom_base_url(self, monkeypatch):
|
||||
monkeypatch.setenv("KIMI_API_KEY", "kimi-key")
|
||||
monkeypatch.setenv("KIMI_BASE_URL", "https://custom.kimi.example/v1")
|
||||
status = get_api_key_provider_status("kimi-coding")
|
||||
assert status["base_url"] == "https://custom.kimi.example/v1"
|
||||
|
||||
def test_get_auth_status_dispatches_to_api_key(self, monkeypatch):
|
||||
monkeypatch.setenv("MINIMAX_API_KEY", "mm-key")
|
||||
status = get_auth_status("minimax")
|
||||
assert status["configured"] is True
|
||||
assert status["provider"] == "minimax"
|
||||
|
||||
def test_non_api_key_provider(self):
|
||||
status = get_api_key_provider_status("nous")
|
||||
assert status["configured"] is False
|
||||
|
||||
|
||||
# =============================================================================
|
||||
# Credential Resolution tests
|
||||
# =============================================================================
|
||||
|
||||
class TestResolveApiKeyProviderCredentials:
|
||||
|
||||
def test_resolve_zai_with_key(self, monkeypatch):
|
||||
monkeypatch.setenv("GLM_API_KEY", "glm-secret-key")
|
||||
creds = resolve_api_key_provider_credentials("zai")
|
||||
assert creds["provider"] == "zai"
|
||||
assert creds["api_key"] == "glm-secret-key"
|
||||
assert creds["base_url"] == "https://api.z.ai/api/paas/v4"
|
||||
assert creds["source"] == "GLM_API_KEY"
|
||||
|
||||
def test_resolve_kimi_with_key(self, monkeypatch):
|
||||
monkeypatch.setenv("KIMI_API_KEY", "kimi-secret-key")
|
||||
creds = resolve_api_key_provider_credentials("kimi-coding")
|
||||
assert creds["provider"] == "kimi-coding"
|
||||
assert creds["api_key"] == "kimi-secret-key"
|
||||
assert creds["base_url"] == "https://api.moonshot.ai/v1"
|
||||
|
||||
def test_resolve_minimax_with_key(self, monkeypatch):
|
||||
monkeypatch.setenv("MINIMAX_API_KEY", "mm-secret-key")
|
||||
creds = resolve_api_key_provider_credentials("minimax")
|
||||
assert creds["provider"] == "minimax"
|
||||
assert creds["api_key"] == "mm-secret-key"
|
||||
assert creds["base_url"] == "https://api.minimax.io/v1"
|
||||
|
||||
def test_resolve_minimax_cn_with_key(self, monkeypatch):
|
||||
monkeypatch.setenv("MINIMAX_CN_API_KEY", "mmcn-secret-key")
|
||||
creds = resolve_api_key_provider_credentials("minimax-cn")
|
||||
assert creds["provider"] == "minimax-cn"
|
||||
assert creds["api_key"] == "mmcn-secret-key"
|
||||
assert creds["base_url"] == "https://api.minimaxi.com/v1"
|
||||
|
||||
def test_resolve_with_custom_base_url(self, monkeypatch):
|
||||
monkeypatch.setenv("GLM_API_KEY", "glm-key")
|
||||
monkeypatch.setenv("GLM_BASE_URL", "https://custom.glm.example/v4")
|
||||
creds = resolve_api_key_provider_credentials("zai")
|
||||
assert creds["base_url"] == "https://custom.glm.example/v4"
|
||||
|
||||
def test_resolve_without_key_returns_empty(self):
|
||||
creds = resolve_api_key_provider_credentials("zai")
|
||||
assert creds["api_key"] == ""
|
||||
assert creds["source"] == "default"
|
||||
|
||||
def test_resolve_invalid_provider_raises(self):
|
||||
with pytest.raises(AuthError):
|
||||
resolve_api_key_provider_credentials("nous")
|
||||
|
||||
def test_glm_key_priority(self, monkeypatch):
|
||||
"""GLM_API_KEY takes priority over ZAI_API_KEY."""
|
||||
monkeypatch.setenv("GLM_API_KEY", "primary")
|
||||
monkeypatch.setenv("ZAI_API_KEY", "secondary")
|
||||
creds = resolve_api_key_provider_credentials("zai")
|
||||
assert creds["api_key"] == "primary"
|
||||
assert creds["source"] == "GLM_API_KEY"
|
||||
|
||||
def test_zai_key_fallback(self, monkeypatch):
|
||||
"""ZAI_API_KEY used when GLM_API_KEY not set."""
|
||||
monkeypatch.setenv("ZAI_API_KEY", "secondary")
|
||||
creds = resolve_api_key_provider_credentials("zai")
|
||||
assert creds["api_key"] == "secondary"
|
||||
assert creds["source"] == "ZAI_API_KEY"
|
||||
|
||||
|
||||
# =============================================================================
|
||||
# Runtime Provider Resolution tests
|
||||
# =============================================================================
|
||||
|
||||
class TestRuntimeProviderResolution:
|
||||
|
||||
def test_runtime_zai(self, monkeypatch):
|
||||
monkeypatch.setenv("GLM_API_KEY", "glm-key")
|
||||
from hermes_cli.runtime_provider import resolve_runtime_provider
|
||||
result = resolve_runtime_provider(requested="zai")
|
||||
assert result["provider"] == "zai"
|
||||
assert result["api_mode"] == "chat_completions"
|
||||
assert result["api_key"] == "glm-key"
|
||||
assert "z.ai" in result["base_url"] or "api.z.ai" in result["base_url"]
|
||||
|
||||
def test_runtime_kimi(self, monkeypatch):
|
||||
monkeypatch.setenv("KIMI_API_KEY", "kimi-key")
|
||||
from hermes_cli.runtime_provider import resolve_runtime_provider
|
||||
result = resolve_runtime_provider(requested="kimi-coding")
|
||||
assert result["provider"] == "kimi-coding"
|
||||
assert result["api_mode"] == "chat_completions"
|
||||
assert result["api_key"] == "kimi-key"
|
||||
|
||||
def test_runtime_minimax(self, monkeypatch):
|
||||
monkeypatch.setenv("MINIMAX_API_KEY", "mm-key")
|
||||
from hermes_cli.runtime_provider import resolve_runtime_provider
|
||||
result = resolve_runtime_provider(requested="minimax")
|
||||
assert result["provider"] == "minimax"
|
||||
assert result["api_key"] == "mm-key"
|
||||
|
||||
def test_runtime_auto_detects_api_key_provider(self, monkeypatch):
|
||||
monkeypatch.setenv("KIMI_API_KEY", "auto-kimi-key")
|
||||
from hermes_cli.runtime_provider import resolve_runtime_provider
|
||||
result = resolve_runtime_provider(requested="auto")
|
||||
assert result["provider"] == "kimi-coding"
|
||||
assert result["api_key"] == "auto-kimi-key"
|
||||
|
||||
|
||||
# =============================================================================
|
||||
# _has_any_provider_configured tests
|
||||
# =============================================================================
|
||||
|
||||
class TestHasAnyProviderConfigured:
|
||||
|
||||
def test_glm_key_counts(self, monkeypatch, tmp_path):
|
||||
from hermes_cli import config as config_module
|
||||
monkeypatch.setenv("GLM_API_KEY", "test-key")
|
||||
hermes_home = tmp_path / ".hermes"
|
||||
hermes_home.mkdir()
|
||||
monkeypatch.setattr(config_module, "get_env_path", lambda: hermes_home / ".env")
|
||||
monkeypatch.setattr(config_module, "get_hermes_home", lambda: hermes_home)
|
||||
from hermes_cli.main import _has_any_provider_configured
|
||||
assert _has_any_provider_configured() is True
|
||||
|
||||
def test_minimax_key_counts(self, monkeypatch, tmp_path):
|
||||
from hermes_cli import config as config_module
|
||||
monkeypatch.setenv("MINIMAX_API_KEY", "test-key")
|
||||
hermes_home = tmp_path / ".hermes"
|
||||
hermes_home.mkdir()
|
||||
monkeypatch.setattr(config_module, "get_env_path", lambda: hermes_home / ".env")
|
||||
monkeypatch.setattr(config_module, "get_hermes_home", lambda: hermes_home)
|
||||
from hermes_cli.main import _has_any_provider_configured
|
||||
assert _has_any_provider_configured() is True
|
||||
@@ -765,43 +765,6 @@ class TestRunConversation:
|
||||
assert result["completed"] is False
|
||||
assert result.get("partial") is True
|
||||
|
||||
def test_nous_401_refreshes_after_remint_and_retries(self, agent):
|
||||
self._setup_agent(agent)
|
||||
agent.provider = "nous"
|
||||
agent.api_mode = "chat_completions"
|
||||
|
||||
calls = {"api": 0, "refresh": 0}
|
||||
|
||||
class _UnauthorizedError(RuntimeError):
|
||||
def __init__(self):
|
||||
super().__init__("Error code: 401 - unauthorized")
|
||||
self.status_code = 401
|
||||
|
||||
def _fake_api_call(api_kwargs):
|
||||
calls["api"] += 1
|
||||
if calls["api"] == 1:
|
||||
raise _UnauthorizedError()
|
||||
return _mock_response(content="Recovered after remint", finish_reason="stop")
|
||||
|
||||
def _fake_refresh(*, force=True):
|
||||
calls["refresh"] += 1
|
||||
assert force is True
|
||||
return True
|
||||
|
||||
with (
|
||||
patch.object(agent, "_persist_session"),
|
||||
patch.object(agent, "_save_trajectory"),
|
||||
patch.object(agent, "_cleanup_task_resources"),
|
||||
patch.object(agent, "_interruptible_api_call", side_effect=_fake_api_call),
|
||||
patch.object(agent, "_try_refresh_nous_client_credentials", side_effect=_fake_refresh),
|
||||
):
|
||||
result = agent.run_conversation("hello")
|
||||
|
||||
assert calls["api"] == 2
|
||||
assert calls["refresh"] == 1
|
||||
assert result["completed"] is True
|
||||
assert result["final_response"] == "Recovered after remint"
|
||||
|
||||
def test_context_compression_triggered(self, agent):
|
||||
"""When compressor says should_compress, compression runs."""
|
||||
self._setup_agent(agent)
|
||||
@@ -975,50 +938,6 @@ class TestConversationHistoryNotMutated:
|
||||
# _max_tokens_param consistency
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
class TestNousCredentialRefresh:
|
||||
"""Verify Nous credential refresh rebuilds the runtime client."""
|
||||
|
||||
def test_try_refresh_nous_client_credentials_rebuilds_client(self, agent, monkeypatch):
|
||||
agent.provider = "nous"
|
||||
agent.api_mode = "chat_completions"
|
||||
|
||||
closed = {"value": False}
|
||||
rebuilt = {"kwargs": None}
|
||||
captured = {}
|
||||
|
||||
class _ExistingClient:
|
||||
def close(self):
|
||||
closed["value"] = True
|
||||
|
||||
class _RebuiltClient:
|
||||
pass
|
||||
|
||||
def _fake_resolve(**kwargs):
|
||||
captured.update(kwargs)
|
||||
return {
|
||||
"api_key": "new-nous-key",
|
||||
"base_url": "https://inference-api.nousresearch.com/v1",
|
||||
}
|
||||
|
||||
def _fake_openai(**kwargs):
|
||||
rebuilt["kwargs"] = kwargs
|
||||
return _RebuiltClient()
|
||||
|
||||
monkeypatch.setattr("hermes_cli.auth.resolve_nous_runtime_credentials", _fake_resolve)
|
||||
|
||||
agent.client = _ExistingClient()
|
||||
with patch("run_agent.OpenAI", side_effect=_fake_openai):
|
||||
ok = agent._try_refresh_nous_client_credentials(force=True)
|
||||
|
||||
assert ok is True
|
||||
assert closed["value"] is True
|
||||
assert captured["force_mint"] is True
|
||||
assert rebuilt["kwargs"]["api_key"] == "new-nous-key"
|
||||
assert rebuilt["kwargs"]["base_url"] == "https://inference-api.nousresearch.com/v1"
|
||||
assert "default_headers" not in rebuilt["kwargs"]
|
||||
assert isinstance(agent.client, _RebuiltClient)
|
||||
|
||||
|
||||
class TestMaxTokensParam:
|
||||
"""Verify _max_tokens_param returns the correct key for each provider."""
|
||||
|
||||
|
||||
@@ -1,269 +0,0 @@
|
||||
"""
|
||||
Tests for timezone support (hermes_time module + integration points).
|
||||
|
||||
Covers:
|
||||
- Valid timezone applies correctly
|
||||
- Invalid timezone falls back safely (no crash, warning logged)
|
||||
- execute_code child env receives TZ
|
||||
- Cron uses timezone-aware now()
|
||||
- Backward compatibility with naive timestamps
|
||||
"""
|
||||
|
||||
import os
|
||||
import logging
|
||||
import sys
|
||||
import pytest
|
||||
from datetime import datetime, timedelta, timezone
|
||||
from unittest.mock import patch, MagicMock
|
||||
from zoneinfo import ZoneInfo
|
||||
|
||||
import hermes_time
|
||||
|
||||
|
||||
# =========================================================================
|
||||
# hermes_time.now() — core helper
|
||||
# =========================================================================
|
||||
|
||||
class TestHermesTimeNow:
|
||||
"""Test the timezone-aware now() helper."""
|
||||
|
||||
def setup_method(self):
|
||||
hermes_time.reset_cache()
|
||||
|
||||
def teardown_method(self):
|
||||
hermes_time.reset_cache()
|
||||
os.environ.pop("HERMES_TIMEZONE", None)
|
||||
|
||||
def test_valid_timezone_applies(self):
|
||||
"""With a valid IANA timezone, now() returns time in that zone."""
|
||||
os.environ["HERMES_TIMEZONE"] = "Asia/Kolkata"
|
||||
result = hermes_time.now()
|
||||
assert result.tzinfo is not None
|
||||
# IST is UTC+5:30
|
||||
offset = result.utcoffset()
|
||||
assert offset == timedelta(hours=5, minutes=30)
|
||||
|
||||
def test_utc_timezone(self):
|
||||
"""UTC timezone works."""
|
||||
os.environ["HERMES_TIMEZONE"] = "UTC"
|
||||
result = hermes_time.now()
|
||||
assert result.utcoffset() == timedelta(0)
|
||||
|
||||
def test_us_eastern(self):
|
||||
"""US/Eastern timezone works (DST-aware zone)."""
|
||||
os.environ["HERMES_TIMEZONE"] = "America/New_York"
|
||||
result = hermes_time.now()
|
||||
assert result.tzinfo is not None
|
||||
# Offset is -5h or -4h depending on DST
|
||||
offset_hours = result.utcoffset().total_seconds() / 3600
|
||||
assert offset_hours in (-5, -4)
|
||||
|
||||
def test_invalid_timezone_falls_back(self, caplog):
|
||||
"""Invalid timezone logs warning and falls back to server-local."""
|
||||
os.environ["HERMES_TIMEZONE"] = "Mars/Olympus_Mons"
|
||||
with caplog.at_level(logging.WARNING, logger="hermes_time"):
|
||||
result = hermes_time.now()
|
||||
assert result.tzinfo is not None # Still tz-aware (server-local)
|
||||
assert "Invalid timezone" in caplog.text
|
||||
assert "Mars/Olympus_Mons" in caplog.text
|
||||
|
||||
def test_empty_timezone_uses_local(self):
|
||||
"""No timezone configured → server-local time (still tz-aware)."""
|
||||
os.environ.pop("HERMES_TIMEZONE", None)
|
||||
result = hermes_time.now()
|
||||
assert result.tzinfo is not None
|
||||
|
||||
def test_format_unchanged(self):
|
||||
"""Timestamp formatting matches original strftime pattern."""
|
||||
os.environ["HERMES_TIMEZONE"] = "Asia/Kolkata"
|
||||
result = hermes_time.now()
|
||||
formatted = result.strftime("%A, %B %d, %Y %I:%M %p")
|
||||
# Should produce something like "Monday, March 03, 2026 05:30 PM"
|
||||
assert len(formatted) > 10
|
||||
# No timezone abbreviation in the format (matching original behavior)
|
||||
assert "+" not in formatted
|
||||
|
||||
def test_cache_invalidation(self):
|
||||
"""Changing env var + reset_cache picks up new timezone."""
|
||||
os.environ["HERMES_TIMEZONE"] = "UTC"
|
||||
hermes_time.reset_cache()
|
||||
r1 = hermes_time.now()
|
||||
assert r1.utcoffset() == timedelta(0)
|
||||
|
||||
os.environ["HERMES_TIMEZONE"] = "Asia/Kolkata"
|
||||
hermes_time.reset_cache()
|
||||
r2 = hermes_time.now()
|
||||
assert r2.utcoffset() == timedelta(hours=5, minutes=30)
|
||||
|
||||
|
||||
class TestGetTimezone:
|
||||
"""Test get_timezone() and get_timezone_name()."""
|
||||
|
||||
def setup_method(self):
|
||||
hermes_time.reset_cache()
|
||||
|
||||
def teardown_method(self):
|
||||
hermes_time.reset_cache()
|
||||
os.environ.pop("HERMES_TIMEZONE", None)
|
||||
|
||||
def test_returns_zoneinfo_for_valid(self):
|
||||
os.environ["HERMES_TIMEZONE"] = "Europe/London"
|
||||
tz = hermes_time.get_timezone()
|
||||
assert isinstance(tz, ZoneInfo)
|
||||
assert str(tz) == "Europe/London"
|
||||
|
||||
def test_returns_none_for_empty(self):
|
||||
os.environ.pop("HERMES_TIMEZONE", None)
|
||||
tz = hermes_time.get_timezone()
|
||||
assert tz is None
|
||||
|
||||
def test_returns_none_for_invalid(self):
|
||||
os.environ["HERMES_TIMEZONE"] = "Not/A/Timezone"
|
||||
tz = hermes_time.get_timezone()
|
||||
assert tz is None
|
||||
|
||||
def test_get_timezone_name(self):
|
||||
os.environ["HERMES_TIMEZONE"] = "Asia/Tokyo"
|
||||
assert hermes_time.get_timezone_name() == "Asia/Tokyo"
|
||||
|
||||
|
||||
# =========================================================================
|
||||
# execute_code child env — TZ injection
|
||||
# =========================================================================
|
||||
|
||||
@pytest.mark.skipif(sys.platform == "win32", reason="UDS not available on Windows")
|
||||
class TestCodeExecutionTZ:
|
||||
"""Verify TZ env var is passed to sandboxed child process via real execute_code."""
|
||||
|
||||
@pytest.fixture(autouse=True)
|
||||
def _import_execute_code(self):
|
||||
"""Lazy-import execute_code to avoid pulling in firecrawl at collection time."""
|
||||
try:
|
||||
from tools.code_execution_tool import execute_code
|
||||
self._execute_code = execute_code
|
||||
except ImportError:
|
||||
pytest.skip("tools.code_execution_tool not importable (missing deps)")
|
||||
|
||||
def teardown_method(self):
|
||||
os.environ.pop("HERMES_TIMEZONE", None)
|
||||
|
||||
def _mock_handle(self, function_name, function_args, task_id=None, user_task=None):
|
||||
import json as _json
|
||||
return _json.dumps({"error": f"unexpected tool call: {function_name}"})
|
||||
|
||||
def test_tz_injected_when_configured(self):
|
||||
"""When HERMES_TIMEZONE is set, child process sees TZ env var."""
|
||||
import json as _json
|
||||
os.environ["HERMES_TIMEZONE"] = "Asia/Kolkata"
|
||||
|
||||
with patch("model_tools.handle_function_call", side_effect=self._mock_handle):
|
||||
result = _json.loads(self._execute_code(
|
||||
code='import os; print(os.environ.get("TZ", "NOT_SET"))',
|
||||
task_id="tz-test",
|
||||
enabled_tools=[],
|
||||
))
|
||||
assert result["status"] == "success"
|
||||
assert "Asia/Kolkata" in result["output"]
|
||||
|
||||
def test_tz_not_injected_when_empty(self):
|
||||
"""When HERMES_TIMEZONE is not set, child process has no TZ."""
|
||||
import json as _json
|
||||
os.environ.pop("HERMES_TIMEZONE", None)
|
||||
|
||||
with patch("model_tools.handle_function_call", side_effect=self._mock_handle):
|
||||
result = _json.loads(self._execute_code(
|
||||
code='import os; print(os.environ.get("TZ", "NOT_SET"))',
|
||||
task_id="tz-test-empty",
|
||||
enabled_tools=[],
|
||||
))
|
||||
assert result["status"] == "success"
|
||||
assert "NOT_SET" in result["output"]
|
||||
|
||||
def test_hermes_timezone_not_leaked_to_child(self):
|
||||
"""HERMES_TIMEZONE itself must NOT appear in child env (only TZ)."""
|
||||
import json as _json
|
||||
os.environ["HERMES_TIMEZONE"] = "Asia/Kolkata"
|
||||
|
||||
with patch("model_tools.handle_function_call", side_effect=self._mock_handle):
|
||||
result = _json.loads(self._execute_code(
|
||||
code='import os; print(os.environ.get("HERMES_TIMEZONE", "NOT_SET"))',
|
||||
task_id="tz-leak-test",
|
||||
enabled_tools=[],
|
||||
))
|
||||
assert result["status"] == "success"
|
||||
assert "NOT_SET" in result["output"]
|
||||
|
||||
|
||||
# =========================================================================
|
||||
# Cron timezone-aware scheduling
|
||||
# =========================================================================
|
||||
|
||||
class TestCronTimezone:
|
||||
"""Verify cron paths use timezone-aware now()."""
|
||||
|
||||
def setup_method(self):
|
||||
hermes_time.reset_cache()
|
||||
|
||||
def teardown_method(self):
|
||||
hermes_time.reset_cache()
|
||||
os.environ.pop("HERMES_TIMEZONE", None)
|
||||
|
||||
def test_parse_schedule_duration_uses_tz_aware_now(self):
|
||||
"""parse_schedule('30m') should produce a tz-aware run_at."""
|
||||
os.environ["HERMES_TIMEZONE"] = "Asia/Kolkata"
|
||||
from cron.jobs import parse_schedule
|
||||
result = parse_schedule("30m")
|
||||
run_at = datetime.fromisoformat(result["run_at"])
|
||||
# The stored timestamp should be tz-aware
|
||||
assert run_at.tzinfo is not None
|
||||
|
||||
def test_compute_next_run_tz_aware(self):
|
||||
"""compute_next_run returns tz-aware timestamps."""
|
||||
os.environ["HERMES_TIMEZONE"] = "Asia/Kolkata"
|
||||
from cron.jobs import compute_next_run
|
||||
schedule = {"kind": "interval", "minutes": 60}
|
||||
result = compute_next_run(schedule)
|
||||
next_dt = datetime.fromisoformat(result)
|
||||
assert next_dt.tzinfo is not None
|
||||
|
||||
def test_get_due_jobs_handles_naive_timestamps(self, tmp_path, monkeypatch):
|
||||
"""Backward compat: naive timestamps from before tz support don't crash."""
|
||||
import cron.jobs as jobs_module
|
||||
monkeypatch.setattr(jobs_module, "CRON_DIR", tmp_path / "cron")
|
||||
monkeypatch.setattr(jobs_module, "JOBS_FILE", tmp_path / "cron" / "jobs.json")
|
||||
monkeypatch.setattr(jobs_module, "OUTPUT_DIR", tmp_path / "cron" / "output")
|
||||
|
||||
os.environ["HERMES_TIMEZONE"] = "Asia/Kolkata"
|
||||
hermes_time.reset_cache()
|
||||
|
||||
# Create a job with a NAIVE past timestamp (simulating pre-tz data)
|
||||
from cron.jobs import create_job, load_jobs, save_jobs, get_due_jobs
|
||||
job = create_job(prompt="Test job", schedule="every 1h")
|
||||
jobs = load_jobs()
|
||||
# Force a naive (no timezone) past timestamp
|
||||
naive_past = (datetime.now() - timedelta(minutes=5)).isoformat()
|
||||
jobs[0]["next_run_at"] = naive_past
|
||||
save_jobs(jobs)
|
||||
|
||||
# Should not crash — _ensure_aware handles the naive timestamp
|
||||
due = get_due_jobs()
|
||||
assert len(due) == 1
|
||||
|
||||
def test_create_job_stores_tz_aware_timestamps(self, tmp_path, monkeypatch):
|
||||
"""New jobs store timezone-aware created_at and next_run_at."""
|
||||
import cron.jobs as jobs_module
|
||||
monkeypatch.setattr(jobs_module, "CRON_DIR", tmp_path / "cron")
|
||||
monkeypatch.setattr(jobs_module, "JOBS_FILE", tmp_path / "cron" / "jobs.json")
|
||||
monkeypatch.setattr(jobs_module, "OUTPUT_DIR", tmp_path / "cron" / "output")
|
||||
|
||||
os.environ["HERMES_TIMEZONE"] = "US/Eastern"
|
||||
hermes_time.reset_cache()
|
||||
|
||||
from cron.jobs import create_job
|
||||
job = create_job(prompt="TZ test", schedule="every 2h")
|
||||
|
||||
created = datetime.fromisoformat(job["created_at"])
|
||||
assert created.tzinfo is not None
|
||||
|
||||
next_run = datetime.fromisoformat(job["next_run_at"])
|
||||
assert next_run.tzinfo is not None
|
||||
@@ -11,7 +11,6 @@ from tools.skills_tool import (
|
||||
_estimate_tokens,
|
||||
_find_all_skills,
|
||||
_load_category_description,
|
||||
skill_matches_platform,
|
||||
skills_list,
|
||||
skills_categories,
|
||||
skill_view,
|
||||
@@ -333,134 +332,3 @@ class TestSkillsCategories:
|
||||
result = json.loads(raw)
|
||||
assert result["success"] is True
|
||||
assert result["categories"] == []
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# skill_matches_platform
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestSkillMatchesPlatform:
|
||||
"""Tests for the platforms frontmatter field filtering."""
|
||||
|
||||
def test_no_platforms_field_matches_everything(self):
|
||||
"""Skills without a platforms field should load on any OS."""
|
||||
assert skill_matches_platform({}) is True
|
||||
assert skill_matches_platform({"name": "foo"}) is True
|
||||
|
||||
def test_empty_platforms_matches_everything(self):
|
||||
"""Empty platforms list should load on any OS."""
|
||||
assert skill_matches_platform({"platforms": []}) is True
|
||||
assert skill_matches_platform({"platforms": None}) is True
|
||||
|
||||
def test_macos_on_darwin(self):
|
||||
with patch("tools.skills_tool.sys") as mock_sys:
|
||||
mock_sys.platform = "darwin"
|
||||
assert skill_matches_platform({"platforms": ["macos"]}) is True
|
||||
|
||||
def test_macos_on_linux(self):
|
||||
with patch("tools.skills_tool.sys") as mock_sys:
|
||||
mock_sys.platform = "linux"
|
||||
assert skill_matches_platform({"platforms": ["macos"]}) is False
|
||||
|
||||
def test_linux_on_linux(self):
|
||||
with patch("tools.skills_tool.sys") as mock_sys:
|
||||
mock_sys.platform = "linux"
|
||||
assert skill_matches_platform({"platforms": ["linux"]}) is True
|
||||
|
||||
def test_linux_on_darwin(self):
|
||||
with patch("tools.skills_tool.sys") as mock_sys:
|
||||
mock_sys.platform = "darwin"
|
||||
assert skill_matches_platform({"platforms": ["linux"]}) is False
|
||||
|
||||
def test_windows_on_win32(self):
|
||||
with patch("tools.skills_tool.sys") as mock_sys:
|
||||
mock_sys.platform = "win32"
|
||||
assert skill_matches_platform({"platforms": ["windows"]}) is True
|
||||
|
||||
def test_windows_on_linux(self):
|
||||
with patch("tools.skills_tool.sys") as mock_sys:
|
||||
mock_sys.platform = "linux"
|
||||
assert skill_matches_platform({"platforms": ["windows"]}) is False
|
||||
|
||||
def test_multi_platform_match(self):
|
||||
"""Skills listing multiple platforms should match any of them."""
|
||||
with patch("tools.skills_tool.sys") as mock_sys:
|
||||
mock_sys.platform = "darwin"
|
||||
assert skill_matches_platform({"platforms": ["macos", "linux"]}) is True
|
||||
mock_sys.platform = "linux"
|
||||
assert skill_matches_platform({"platforms": ["macos", "linux"]}) is True
|
||||
mock_sys.platform = "win32"
|
||||
assert skill_matches_platform({"platforms": ["macos", "linux"]}) is False
|
||||
|
||||
def test_string_instead_of_list(self):
|
||||
"""A single string value should be treated as a one-element list."""
|
||||
with patch("tools.skills_tool.sys") as mock_sys:
|
||||
mock_sys.platform = "darwin"
|
||||
assert skill_matches_platform({"platforms": "macos"}) is True
|
||||
mock_sys.platform = "linux"
|
||||
assert skill_matches_platform({"platforms": "macos"}) is False
|
||||
|
||||
def test_case_insensitive(self):
|
||||
with patch("tools.skills_tool.sys") as mock_sys:
|
||||
mock_sys.platform = "darwin"
|
||||
assert skill_matches_platform({"platforms": ["MacOS"]}) is True
|
||||
assert skill_matches_platform({"platforms": ["MACOS"]}) is True
|
||||
|
||||
def test_unknown_platform_no_match(self):
|
||||
with patch("tools.skills_tool.sys") as mock_sys:
|
||||
mock_sys.platform = "linux"
|
||||
assert skill_matches_platform({"platforms": ["freebsd"]}) is False
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# _find_all_skills — platform filtering integration
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestFindAllSkillsPlatformFiltering:
|
||||
"""Test that _find_all_skills respects the platforms field."""
|
||||
|
||||
def test_excludes_incompatible_platform(self, tmp_path):
|
||||
with patch("tools.skills_tool.SKILLS_DIR", tmp_path), \
|
||||
patch("tools.skills_tool.sys") as mock_sys:
|
||||
mock_sys.platform = "linux"
|
||||
_make_skill(tmp_path, "universal-skill")
|
||||
_make_skill(tmp_path, "mac-only", frontmatter_extra="platforms: [macos]\n")
|
||||
skills = _find_all_skills()
|
||||
names = {s["name"] for s in skills}
|
||||
assert "universal-skill" in names
|
||||
assert "mac-only" not in names
|
||||
|
||||
def test_includes_matching_platform(self, tmp_path):
|
||||
with patch("tools.skills_tool.SKILLS_DIR", tmp_path), \
|
||||
patch("tools.skills_tool.sys") as mock_sys:
|
||||
mock_sys.platform = "darwin"
|
||||
_make_skill(tmp_path, "mac-only", frontmatter_extra="platforms: [macos]\n")
|
||||
skills = _find_all_skills()
|
||||
names = {s["name"] for s in skills}
|
||||
assert "mac-only" in names
|
||||
|
||||
def test_no_platforms_always_included(self, tmp_path):
|
||||
"""Skills without platforms field should appear on any platform."""
|
||||
with patch("tools.skills_tool.SKILLS_DIR", tmp_path), \
|
||||
patch("tools.skills_tool.sys") as mock_sys:
|
||||
mock_sys.platform = "win32"
|
||||
_make_skill(tmp_path, "generic-skill")
|
||||
skills = _find_all_skills()
|
||||
assert len(skills) == 1
|
||||
assert skills[0]["name"] == "generic-skill"
|
||||
|
||||
def test_multi_platform_skill(self, tmp_path):
|
||||
with patch("tools.skills_tool.SKILLS_DIR", tmp_path), \
|
||||
patch("tools.skills_tool.sys") as mock_sys:
|
||||
_make_skill(tmp_path, "cross-plat", frontmatter_extra="platforms: [macos, linux]\n")
|
||||
mock_sys.platform = "darwin"
|
||||
skills_darwin = _find_all_skills()
|
||||
mock_sys.platform = "linux"
|
||||
skills_linux = _find_all_skills()
|
||||
mock_sys.platform = "win32"
|
||||
skills_win = _find_all_skills()
|
||||
assert len(skills_darwin) == 1
|
||||
assert len(skills_linux) == 1
|
||||
assert len(skills_win) == 0
|
||||
|
||||
@@ -2,23 +2,17 @@
|
||||
"""
|
||||
Browser Tool Module
|
||||
|
||||
This module provides browser automation tools using agent-browser CLI. It
|
||||
supports two backends — **Browserbase** (cloud) and **local Chromium** — with
|
||||
identical agent-facing behaviour. The backend is auto-detected: if
|
||||
``BROWSERBASE_API_KEY`` is set the cloud service is used; otherwise a local
|
||||
headless Chromium instance is launched automatically.
|
||||
This module provides browser automation tools using agent-browser CLI with
|
||||
Browserbase cloud execution. It enables AI agents to navigate websites,
|
||||
interact with page elements, and extract information in a text-based format.
|
||||
|
||||
The tool uses agent-browser's accessibility tree (ariaSnapshot) for text-based
|
||||
page representation, making it ideal for LLM agents without vision capabilities.
|
||||
|
||||
Features:
|
||||
- **Local mode** (default): zero-cost headless Chromium via agent-browser.
|
||||
Works on Linux servers without a display. One-time setup:
|
||||
``agent-browser install`` (downloads Chromium) or
|
||||
``agent-browser install --with-deps`` (also installs system libraries for
|
||||
Debian/Ubuntu/Docker).
|
||||
- **Cloud mode**: Browserbase cloud execution with stealth features, proxies,
|
||||
and CAPTCHA solving. Activated when BROWSERBASE_API_KEY is set.
|
||||
- Cloud browser execution via Browserbase (no local browser needed)
|
||||
- Basic Stealth Mode always active (random fingerprints, CAPTCHA solving)
|
||||
- Proxies enabled by default for better CAPTCHA solving and anti-bot avoidance
|
||||
- Session isolation per task ID
|
||||
- Text-based page snapshots using accessibility tree
|
||||
- Element interaction via ref selectors (@e1, @e2, etc.)
|
||||
@@ -26,8 +20,8 @@ Features:
|
||||
- Automatic cleanup of browser sessions
|
||||
|
||||
Environment Variables:
|
||||
- BROWSERBASE_API_KEY: API key for Browserbase (enables cloud mode)
|
||||
- BROWSERBASE_PROJECT_ID: Project ID for Browserbase (required for cloud mode)
|
||||
- BROWSERBASE_API_KEY: API key for Browserbase (required)
|
||||
- BROWSERBASE_PROJECT_ID: Project ID for Browserbase (required)
|
||||
- BROWSERBASE_PROXIES: Enable/disable residential proxies (default: "true")
|
||||
- BROWSERBASE_ADVANCED_STEALTH: Enable advanced stealth mode with custom Chromium,
|
||||
requires Scale Plan (default: "false")
|
||||
@@ -63,7 +57,7 @@ import time
|
||||
import requests
|
||||
from typing import Dict, Any, Optional, List
|
||||
from pathlib import Path
|
||||
from agent.auxiliary_client import get_vision_auxiliary_client, get_text_auxiliary_client
|
||||
from agent.auxiliary_client import get_vision_auxiliary_client
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
@@ -80,43 +74,12 @@ DEFAULT_SESSION_TIMEOUT = 300
|
||||
# Max tokens for snapshot content before summarization
|
||||
SNAPSHOT_SUMMARIZE_THRESHOLD = 8000
|
||||
|
||||
# Vision client — for browser_vision (screenshot analysis)
|
||||
_aux_vision_client, _DEFAULT_VISION_MODEL = get_vision_auxiliary_client()
|
||||
|
||||
# Text client — for page snapshot summarization (same config as web_extract)
|
||||
_aux_text_client, _DEFAULT_TEXT_MODEL = get_text_auxiliary_client("web_extract")
|
||||
|
||||
# Module-level alias for availability checks
|
||||
EXTRACTION_MODEL = _DEFAULT_TEXT_MODEL or _DEFAULT_VISION_MODEL
|
||||
|
||||
|
||||
def _get_vision_model() -> str:
|
||||
"""Model for browser_vision (screenshot analysis — multimodal)."""
|
||||
return (os.getenv("AUXILIARY_VISION_MODEL", "").strip()
|
||||
or _DEFAULT_VISION_MODEL
|
||||
or "google/gemini-3-flash-preview")
|
||||
|
||||
|
||||
def _get_extraction_model() -> str:
|
||||
"""Model for page snapshot text summarization — same as web_extract."""
|
||||
return (os.getenv("AUXILIARY_WEB_EXTRACT_MODEL", "").strip()
|
||||
or _DEFAULT_TEXT_MODEL
|
||||
or "google/gemini-3-flash-preview")
|
||||
|
||||
|
||||
def _is_local_mode() -> bool:
|
||||
"""Return True when no Browserbase credentials are configured.
|
||||
|
||||
In local mode the browser tools launch a headless Chromium instance via
|
||||
``agent-browser --session`` instead of connecting to a remote Browserbase
|
||||
session via ``--cdp``.
|
||||
"""
|
||||
return not (os.environ.get("BROWSERBASE_API_KEY") and os.environ.get("BROWSERBASE_PROJECT_ID"))
|
||||
|
||||
# Resolve vision auxiliary client for extraction/vision tasks
|
||||
_aux_vision_client, EXTRACTION_MODEL = get_vision_auxiliary_client()
|
||||
|
||||
# Track active sessions per task
|
||||
# Stores: session_name (always), bb_session_id + cdp_url (cloud mode only)
|
||||
_active_sessions: Dict[str, Dict[str, str]] = {} # task_id -> {session_name, ...}
|
||||
# Now stores tuple of (session_name, browserbase_session_id, cdp_url)
|
||||
_active_sessions: Dict[str, Dict[str, str]] = {} # task_id -> {session_name, bb_session_id, cdp_url}
|
||||
|
||||
# Flag to track if cleanup has been done
|
||||
_cleanup_done = False
|
||||
@@ -157,56 +120,35 @@ def _emergency_cleanup_all_sessions():
|
||||
logger.info("Emergency cleanup: closing %s active session(s)...", len(_active_sessions))
|
||||
|
||||
try:
|
||||
if _is_local_mode():
|
||||
# Local mode: just close agent-browser sessions via CLI
|
||||
for task_id, session_info in list(_active_sessions.items()):
|
||||
session_name = session_info.get("session_name")
|
||||
if session_name:
|
||||
try:
|
||||
browser_cmd = _find_agent_browser()
|
||||
task_socket_dir = os.path.join(
|
||||
tempfile.gettempdir(),
|
||||
f"agent-browser-{session_name}"
|
||||
)
|
||||
env = {**os.environ, "AGENT_BROWSER_SOCKET_DIR": task_socket_dir}
|
||||
subprocess.run(
|
||||
browser_cmd.split() + ["--session", session_name, "--json", "close"],
|
||||
capture_output=True, timeout=5, env=env,
|
||||
)
|
||||
logger.info("Closed local session %s", session_name)
|
||||
except Exception as e:
|
||||
logger.debug("Error closing local session %s: %s", session_name, e)
|
||||
else:
|
||||
# Cloud mode: release Browserbase sessions via API
|
||||
api_key = os.environ.get("BROWSERBASE_API_KEY")
|
||||
project_id = os.environ.get("BROWSERBASE_PROJECT_ID")
|
||||
|
||||
if not api_key or not project_id:
|
||||
logger.warning("Cannot cleanup - missing BROWSERBASE credentials")
|
||||
return
|
||||
|
||||
for task_id, session_info in list(_active_sessions.items()):
|
||||
bb_session_id = session_info.get("bb_session_id")
|
||||
if bb_session_id:
|
||||
try:
|
||||
response = requests.post(
|
||||
f"https://api.browserbase.com/v1/sessions/{bb_session_id}",
|
||||
headers={
|
||||
"X-BB-API-Key": api_key,
|
||||
"Content-Type": "application/json"
|
||||
},
|
||||
json={
|
||||
"projectId": project_id,
|
||||
"status": "REQUEST_RELEASE"
|
||||
},
|
||||
timeout=5 # Short timeout for cleanup
|
||||
)
|
||||
if response.status_code in (200, 201, 204):
|
||||
logger.info("Closed session %s", bb_session_id)
|
||||
else:
|
||||
logger.warning("Failed to close session %s: HTTP %s", bb_session_id, response.status_code)
|
||||
except Exception as e:
|
||||
logger.error("Error closing session %s: %s", bb_session_id, e)
|
||||
api_key = os.environ.get("BROWSERBASE_API_KEY")
|
||||
project_id = os.environ.get("BROWSERBASE_PROJECT_ID")
|
||||
|
||||
if not api_key or not project_id:
|
||||
logger.warning("Cannot cleanup - missing BROWSERBASE credentials")
|
||||
return
|
||||
|
||||
for task_id, session_info in list(_active_sessions.items()):
|
||||
bb_session_id = session_info.get("bb_session_id")
|
||||
if bb_session_id:
|
||||
try:
|
||||
response = requests.post(
|
||||
f"https://api.browserbase.com/v1/sessions/{bb_session_id}",
|
||||
headers={
|
||||
"X-BB-API-Key": api_key,
|
||||
"Content-Type": "application/json"
|
||||
},
|
||||
json={
|
||||
"projectId": project_id,
|
||||
"status": "REQUEST_RELEASE"
|
||||
},
|
||||
timeout=5 # Short timeout for cleanup
|
||||
)
|
||||
if response.status_code in (200, 201, 204):
|
||||
logger.info("Closed session %s", bb_session_id)
|
||||
else:
|
||||
logger.warning("Failed to close session %s: HTTP %s", bb_session_id, response.status_code)
|
||||
except Exception as e:
|
||||
logger.error("Error closing session %s: %s", bb_session_id, e)
|
||||
|
||||
_active_sessions.clear()
|
||||
except Exception as e:
|
||||
@@ -242,7 +184,7 @@ def _cleanup_inactive_browser_sessions():
|
||||
|
||||
This function is called periodically by the background cleanup thread to
|
||||
automatically close sessions that haven't been used recently, preventing
|
||||
orphaned sessions (local or Browserbase) from accumulating.
|
||||
orphaned Browserbase sessions from accumulating.
|
||||
"""
|
||||
current_time = time.time()
|
||||
sessions_to_cleanup = []
|
||||
@@ -618,29 +560,11 @@ def _create_browserbase_session(task_id: str) -> Dict[str, str]:
|
||||
}
|
||||
|
||||
|
||||
def _create_local_session(task_id: str) -> Dict[str, str]:
|
||||
"""Create a lightweight local browser session (no cloud API call).
|
||||
|
||||
Returns the same dict shape as ``_create_browserbase_session`` so the rest
|
||||
of the code can treat both modes uniformly.
|
||||
"""
|
||||
import uuid
|
||||
session_name = f"hermes_{task_id}_{uuid.uuid4().hex[:8]}"
|
||||
logger.info("Created local browser session %s", session_name)
|
||||
return {
|
||||
"session_name": session_name,
|
||||
"bb_session_id": None, # Not applicable in local mode
|
||||
"cdp_url": None, # Not applicable in local mode
|
||||
"features": {"local": True},
|
||||
}
|
||||
|
||||
|
||||
def _get_session_info(task_id: Optional[str] = None) -> Dict[str, str]:
|
||||
"""
|
||||
Get or create session info for the given task.
|
||||
|
||||
In cloud mode, creates a Browserbase session with proxies enabled.
|
||||
In local mode, generates a session name for agent-browser --session.
|
||||
Creates a Browserbase session with proxies enabled if one doesn't exist.
|
||||
Also starts the inactivity cleanup thread and updates activity tracking.
|
||||
Thread-safe: multiple subagents can call this concurrently.
|
||||
|
||||
@@ -648,7 +572,7 @@ def _get_session_info(task_id: Optional[str] = None) -> Dict[str, str]:
|
||||
task_id: Unique identifier for the task
|
||||
|
||||
Returns:
|
||||
Dict with session_name (always), bb_session_id + cdp_url (cloud only)
|
||||
Dict with session_name, bb_session_id, and cdp_url
|
||||
"""
|
||||
if task_id is None:
|
||||
task_id = "default"
|
||||
@@ -664,11 +588,8 @@ def _get_session_info(task_id: Optional[str] = None) -> Dict[str, str]:
|
||||
if task_id in _active_sessions:
|
||||
return _active_sessions[task_id]
|
||||
|
||||
# Create session outside the lock (network call in cloud mode)
|
||||
if _is_local_mode():
|
||||
session_info = _create_local_session(task_id)
|
||||
else:
|
||||
session_info = _create_browserbase_session(task_id)
|
||||
# Create session outside the lock (network call - don't hold lock during I/O)
|
||||
session_info = _create_browserbase_session(task_id)
|
||||
|
||||
with _cleanup_lock:
|
||||
_active_sessions[task_id] = session_info
|
||||
@@ -787,20 +708,12 @@ def _run_browser_command(
|
||||
except Exception as e:
|
||||
return {"success": False, "error": f"Failed to create browser session: {str(e)}"}
|
||||
|
||||
# Build the command with the appropriate backend flag.
|
||||
# Cloud mode: --cdp <websocket_url> connects to Browserbase.
|
||||
# Local mode: --session <name> launches a local headless Chromium.
|
||||
# The rest of the command (--json, command, args) is identical.
|
||||
if session_info.get("cdp_url"):
|
||||
# Cloud mode — connect to remote Browserbase browser via CDP
|
||||
# IMPORTANT: Do NOT use --session with --cdp. In agent-browser >=0.13,
|
||||
# --session creates a local browser instance and silently ignores --cdp.
|
||||
backend_args = ["--cdp", session_info["cdp_url"]]
|
||||
else:
|
||||
# Local mode — launch a headless Chromium instance
|
||||
backend_args = ["--session", session_info["session_name"]]
|
||||
|
||||
cmd_parts = browser_cmd.split() + backend_args + [
|
||||
# Connect via CDP to our pre-created Browserbase session.
|
||||
# IMPORTANT: Do NOT use --session with --cdp. In agent-browser >=0.13,
|
||||
# --session creates a local browser instance and silently ignores --cdp.
|
||||
# Per-task isolation is handled by AGENT_BROWSER_SOCKET_DIR instead.
|
||||
cmd_parts = browser_cmd.split() + [
|
||||
"--cdp", session_info["cdp_url"],
|
||||
"--json",
|
||||
command
|
||||
] + args
|
||||
@@ -870,9 +783,9 @@ def _extract_relevant_content(
|
||||
) -> str:
|
||||
"""Use LLM to extract relevant content from a snapshot based on the user's task.
|
||||
|
||||
Falls back to simple truncation when no auxiliary text model is configured.
|
||||
Falls back to simple truncation when no auxiliary vision model is configured.
|
||||
"""
|
||||
if _aux_text_client is None:
|
||||
if _aux_vision_client is None or EXTRACTION_MODEL is None:
|
||||
return _truncate_snapshot(snapshot_text)
|
||||
|
||||
if user_task:
|
||||
@@ -900,8 +813,8 @@ def _extract_relevant_content(
|
||||
|
||||
try:
|
||||
from agent.auxiliary_client import auxiliary_max_tokens_param
|
||||
response = _aux_text_client.chat.completions.create(
|
||||
model=_get_extraction_model(),
|
||||
response = _aux_vision_client.chat.completions.create(
|
||||
model=EXTRACTION_MODEL,
|
||||
messages=[{"role": "user", "content": extraction_prompt}],
|
||||
**auxiliary_max_tokens_param(4000),
|
||||
temperature=0.1,
|
||||
@@ -1218,13 +1131,12 @@ def browser_close(task_id: Optional[str] = None) -> str:
|
||||
effective_task_id = task_id or "default"
|
||||
result = _run_browser_command(effective_task_id, "close", [])
|
||||
|
||||
# Close the backend session (Browserbase API in cloud mode, nothing extra in local mode)
|
||||
# Close the BrowserBase session via API
|
||||
session_key = task_id if task_id and task_id in _active_sessions else "default"
|
||||
if session_key in _active_sessions:
|
||||
session_info = _active_sessions[session_key]
|
||||
bb_session_id = session_info.get("bb_session_id")
|
||||
if bb_session_id:
|
||||
# Cloud mode: release the Browserbase session via API
|
||||
try:
|
||||
config = _get_browserbase_config()
|
||||
_close_browserbase_session(bb_session_id, config["api_key"], config["project_id"])
|
||||
@@ -1324,7 +1236,7 @@ def browser_vision(question: str, task_id: Optional[str] = None) -> str:
|
||||
effective_task_id = task_id or "default"
|
||||
|
||||
# Check auxiliary vision client
|
||||
if _aux_vision_client is None or _DEFAULT_VISION_MODEL is None:
|
||||
if _aux_vision_client is None or EXTRACTION_MODEL is None:
|
||||
return json.dumps({
|
||||
"success": False,
|
||||
"error": "Browser vision unavailable: no auxiliary vision model configured. "
|
||||
@@ -1374,7 +1286,7 @@ def browser_vision(question: str, task_id: Optional[str] = None) -> str:
|
||||
# Use the sync auxiliary vision client directly
|
||||
from agent.auxiliary_client import auxiliary_max_tokens_param
|
||||
response = _aux_vision_client.chat.completions.create(
|
||||
model=_get_vision_model(),
|
||||
model=EXTRACTION_MODEL,
|
||||
messages=[
|
||||
{
|
||||
"role": "user",
|
||||
@@ -1492,15 +1404,14 @@ def cleanup_browser(task_id: Optional[str] = None) -> None:
|
||||
_active_sessions.pop(task_id, None)
|
||||
_session_last_activity.pop(task_id, None)
|
||||
|
||||
# Cloud mode: close the Browserbase session via API
|
||||
if bb_session_id and not _is_local_mode():
|
||||
try:
|
||||
config = _get_browserbase_config()
|
||||
success = _close_browserbase_session(bb_session_id, config["api_key"], config["project_id"])
|
||||
if not success:
|
||||
logger.warning("Could not close BrowserBase session %s", bb_session_id)
|
||||
except Exception as e:
|
||||
logger.error("Exception during BrowserBase session close: %s", e)
|
||||
# Close the Browserbase session immediately via API
|
||||
try:
|
||||
config = _get_browserbase_config()
|
||||
success = _close_browserbase_session(bb_session_id, config["api_key"], config["project_id"])
|
||||
if not success:
|
||||
logger.warning("Could not close BrowserBase session %s", bb_session_id)
|
||||
except Exception as e:
|
||||
logger.error("Exception during BrowserBase session close: %s", e)
|
||||
|
||||
# Kill the daemon process and clean up socket directory
|
||||
session_name = session_info.get("session_name", "")
|
||||
@@ -1553,31 +1464,24 @@ def get_active_browser_sessions() -> Dict[str, Dict[str, str]]:
|
||||
def check_browser_requirements() -> bool:
|
||||
"""
|
||||
Check if browser tool requirements are met.
|
||||
|
||||
In **local mode** (no Browserbase credentials): only the ``agent-browser``
|
||||
CLI must be findable.
|
||||
|
||||
In **cloud mode** (BROWSERBASE_API_KEY set): the CLI *and* both
|
||||
``BROWSERBASE_API_KEY`` / ``BROWSERBASE_PROJECT_ID`` must be present.
|
||||
|
||||
Returns:
|
||||
True if all requirements are met, False otherwise
|
||||
"""
|
||||
# The agent-browser CLI is always required
|
||||
# Check for Browserbase credentials
|
||||
api_key = os.environ.get("BROWSERBASE_API_KEY")
|
||||
project_id = os.environ.get("BROWSERBASE_PROJECT_ID")
|
||||
|
||||
if not api_key or not project_id:
|
||||
return False
|
||||
|
||||
# Check for agent-browser CLI
|
||||
try:
|
||||
_find_agent_browser()
|
||||
return True
|
||||
except FileNotFoundError:
|
||||
return False
|
||||
|
||||
# In cloud mode, also require Browserbase credentials
|
||||
if not _is_local_mode():
|
||||
api_key = os.environ.get("BROWSERBASE_API_KEY")
|
||||
project_id = os.environ.get("BROWSERBASE_PROJECT_ID")
|
||||
if not api_key or not project_id:
|
||||
return False
|
||||
|
||||
return True
|
||||
|
||||
|
||||
# ============================================================================
|
||||
# Module Test
|
||||
@@ -1589,26 +1493,20 @@ if __name__ == "__main__":
|
||||
"""
|
||||
print("🌐 Browser Tool Module")
|
||||
print("=" * 40)
|
||||
|
||||
mode = "local" if _is_local_mode() else "cloud (Browserbase)"
|
||||
print(f" Mode: {mode}")
|
||||
|
||||
# Check requirements
|
||||
if check_browser_requirements():
|
||||
print("✅ All requirements met")
|
||||
else:
|
||||
print("❌ Missing requirements:")
|
||||
if not os.environ.get("BROWSERBASE_API_KEY"):
|
||||
print(" - BROWSERBASE_API_KEY not set")
|
||||
if not os.environ.get("BROWSERBASE_PROJECT_ID"):
|
||||
print(" - BROWSERBASE_PROJECT_ID not set")
|
||||
try:
|
||||
_find_agent_browser()
|
||||
except FileNotFoundError:
|
||||
print(" - agent-browser CLI not found")
|
||||
print(" Install: npm install -g agent-browser && agent-browser install --with-deps")
|
||||
if not _is_local_mode():
|
||||
if not os.environ.get("BROWSERBASE_API_KEY"):
|
||||
print(" - BROWSERBASE_API_KEY not set (required for cloud mode)")
|
||||
if not os.environ.get("BROWSERBASE_PROJECT_ID"):
|
||||
print(" - BROWSERBASE_PROJECT_ID not set (required for cloud mode)")
|
||||
print(" Tip: unset BROWSERBASE_API_KEY to use free local mode instead")
|
||||
|
||||
print("\n📋 Available Browser Tools:")
|
||||
for schema in BROWSER_TOOL_SCHEMAS:
|
||||
@@ -1633,6 +1531,7 @@ registry.register(
|
||||
schema=_BROWSER_SCHEMA_MAP["browser_navigate"],
|
||||
handler=lambda args, **kw: browser_navigate(url=args.get("url", ""), task_id=kw.get("task_id")),
|
||||
check_fn=check_browser_requirements,
|
||||
requires_env=["BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID"],
|
||||
)
|
||||
registry.register(
|
||||
name="browser_snapshot",
|
||||
@@ -1641,6 +1540,7 @@ registry.register(
|
||||
handler=lambda args, **kw: browser_snapshot(
|
||||
full=args.get("full", False), task_id=kw.get("task_id"), user_task=kw.get("user_task")),
|
||||
check_fn=check_browser_requirements,
|
||||
requires_env=["BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID"],
|
||||
)
|
||||
registry.register(
|
||||
name="browser_click",
|
||||
@@ -1648,6 +1548,7 @@ registry.register(
|
||||
schema=_BROWSER_SCHEMA_MAP["browser_click"],
|
||||
handler=lambda args, **kw: browser_click(**args, task_id=kw.get("task_id")),
|
||||
check_fn=check_browser_requirements,
|
||||
requires_env=["BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID"],
|
||||
)
|
||||
registry.register(
|
||||
name="browser_type",
|
||||
@@ -1655,6 +1556,7 @@ registry.register(
|
||||
schema=_BROWSER_SCHEMA_MAP["browser_type"],
|
||||
handler=lambda args, **kw: browser_type(**args, task_id=kw.get("task_id")),
|
||||
check_fn=check_browser_requirements,
|
||||
requires_env=["BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID"],
|
||||
)
|
||||
registry.register(
|
||||
name="browser_scroll",
|
||||
@@ -1662,6 +1564,7 @@ registry.register(
|
||||
schema=_BROWSER_SCHEMA_MAP["browser_scroll"],
|
||||
handler=lambda args, **kw: browser_scroll(**args, task_id=kw.get("task_id")),
|
||||
check_fn=check_browser_requirements,
|
||||
requires_env=["BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID"],
|
||||
)
|
||||
registry.register(
|
||||
name="browser_back",
|
||||
@@ -1669,6 +1572,7 @@ registry.register(
|
||||
schema=_BROWSER_SCHEMA_MAP["browser_back"],
|
||||
handler=lambda args, **kw: browser_back(task_id=kw.get("task_id")),
|
||||
check_fn=check_browser_requirements,
|
||||
requires_env=["BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID"],
|
||||
)
|
||||
registry.register(
|
||||
name="browser_press",
|
||||
@@ -1676,6 +1580,7 @@ registry.register(
|
||||
schema=_BROWSER_SCHEMA_MAP["browser_press"],
|
||||
handler=lambda args, **kw: browser_press(key=args.get("key", ""), task_id=kw.get("task_id")),
|
||||
check_fn=check_browser_requirements,
|
||||
requires_env=["BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID"],
|
||||
)
|
||||
registry.register(
|
||||
name="browser_close",
|
||||
@@ -1683,6 +1588,7 @@ registry.register(
|
||||
schema=_BROWSER_SCHEMA_MAP["browser_close"],
|
||||
handler=lambda args, **kw: browser_close(task_id=kw.get("task_id")),
|
||||
check_fn=check_browser_requirements,
|
||||
requires_env=["BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID"],
|
||||
)
|
||||
registry.register(
|
||||
name="browser_get_images",
|
||||
@@ -1690,6 +1596,7 @@ registry.register(
|
||||
schema=_BROWSER_SCHEMA_MAP["browser_get_images"],
|
||||
handler=lambda args, **kw: browser_get_images(task_id=kw.get("task_id")),
|
||||
check_fn=check_browser_requirements,
|
||||
requires_env=["BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID"],
|
||||
)
|
||||
registry.register(
|
||||
name="browser_vision",
|
||||
@@ -1697,4 +1604,5 @@ registry.register(
|
||||
schema=_BROWSER_SCHEMA_MAP["browser_vision"],
|
||||
handler=lambda args, **kw: browser_vision(question=args.get("question", ""), task_id=kw.get("task_id")),
|
||||
check_fn=check_browser_requirements,
|
||||
requires_env=["BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID"],
|
||||
)
|
||||
|
||||
@@ -435,11 +435,6 @@ def execute_code(
|
||||
child_env[k] = v
|
||||
child_env["HERMES_RPC_SOCKET"] = sock_path
|
||||
child_env["PYTHONDONTWRITEBYTECODE"] = "1"
|
||||
# Inject user's configured timezone so datetime.now() in sandboxed
|
||||
# code reflects the correct wall-clock time.
|
||||
_tz_name = os.getenv("HERMES_TIMEZONE", "").strip()
|
||||
if _tz_name:
|
||||
child_env["TZ"] = _tz_name
|
||||
|
||||
proc = subprocess.Popen(
|
||||
[sys.executable, "script.py"],
|
||||
|
||||
@@ -194,10 +194,6 @@ def _run_single_child(
|
||||
# Build progress callback to relay tool calls to parent display
|
||||
child_progress_cb = _build_child_progress_callback(task_index, parent_agent, task_count)
|
||||
|
||||
# Share the parent's iteration budget so subagent tool calls
|
||||
# count toward the session-wide limit.
|
||||
shared_budget = getattr(parent_agent, "iteration_budget", None)
|
||||
|
||||
child = AIAgent(
|
||||
base_url=parent_agent.base_url,
|
||||
api_key=parent_api_key,
|
||||
@@ -219,7 +215,6 @@ def _run_single_child(
|
||||
providers_order=parent_agent.providers_order,
|
||||
provider_sort=parent_agent.provider_sort,
|
||||
tool_progress_callback=child_progress_cb,
|
||||
iteration_budget=shared_budget,
|
||||
)
|
||||
|
||||
# Set delegation depth so children can't spawn grandchildren
|
||||
|
||||
@@ -993,7 +993,7 @@ async def rl_list_runs() -> str:
|
||||
TEST_MODELS = [
|
||||
{"id": "qwen/qwen3-8b", "name": "Qwen3 8B", "scale": "small"},
|
||||
{"id": "z-ai/glm-4.7-flash", "name": "GLM-4.7 Flash", "scale": "medium"},
|
||||
{"id": "minimax/minimax-m2.5", "name": "MiniMax M2.5", "scale": "large"},
|
||||
{"id": "minimax/minimax-m2.1", "name": "MiniMax M2.1", "scale": "large"},
|
||||
]
|
||||
|
||||
# Default test parameters - quick but representative
|
||||
@@ -1353,7 +1353,7 @@ RL_CHECK_STATUS_SCHEMA = {"name": "rl_check_status", "description": "Get status
|
||||
RL_STOP_TRAINING_SCHEMA = {"name": "rl_stop_training", "description": "Stop a running training job. Use if metrics look bad, training is stagnant, or you want to try different settings.", "parameters": {"type": "object", "properties": {"run_id": {"type": "string", "description": "The run ID to stop"}}, "required": ["run_id"]}}
|
||||
RL_GET_RESULTS_SCHEMA = {"name": "rl_get_results", "description": "Get final results and metrics for a completed training run. Returns final metrics and path to trained weights.", "parameters": {"type": "object", "properties": {"run_id": {"type": "string", "description": "The run ID to get results for"}}, "required": ["run_id"]}}
|
||||
RL_LIST_RUNS_SCHEMA = {"name": "rl_list_runs", "description": "List all training runs (active and completed) with their status.", "parameters": {"type": "object", "properties": {}, "required": []}}
|
||||
RL_TEST_INFERENCE_SCHEMA = {"name": "rl_test_inference", "description": "Quick inference test for any environment. Runs a few steps of inference + scoring using OpenRouter. Default: 3 steps x 16 completions = 48 rollouts per model, testing 3 models = 144 total. Tests environment loading, prompt construction, inference parsing, and verifier logic. Use BEFORE training to catch issues.", "parameters": {"type": "object", "properties": {"num_steps": {"type": "integer", "description": "Number of steps to run (default: 3, recommended max for testing)", "default": 3}, "group_size": {"type": "integer", "description": "Completions per step (default: 16, like training)", "default": 16}, "models": {"type": "array", "items": {"type": "string"}, "description": "Optional list of OpenRouter model IDs. Default: qwen/qwen3-8b, z-ai/glm-4.7-flash, minimax/minimax-m2.5"}}, "required": []}}
|
||||
RL_TEST_INFERENCE_SCHEMA = {"name": "rl_test_inference", "description": "Quick inference test for any environment. Runs a few steps of inference + scoring using OpenRouter. Default: 3 steps x 16 completions = 48 rollouts per model, testing 3 models = 144 total. Tests environment loading, prompt construction, inference parsing, and verifier logic. Use BEFORE training to catch issues.", "parameters": {"type": "object", "properties": {"num_steps": {"type": "integer", "description": "Number of steps to run (default: 3, recommended max for testing)", "default": 3}, "group_size": {"type": "integer", "description": "Completions per step (default: 16, like training)", "default": 16}, "models": {"type": "array", "items": {"type": "string"}, "description": "Optional list of OpenRouter model IDs. Default: qwen/qwen3-8b, z-ai/glm-4.7-flash, minimax/minimax-m2.1"}}, "required": []}}
|
||||
|
||||
_rl_env = ["TINKER_API_KEY", "WANDB_API_KEY"]
|
||||
|
||||
|
||||
@@ -31,9 +31,6 @@ SKILL.md Format (YAML Frontmatter, agentskills.io compatible):
|
||||
description: Brief description # Required, max 1024 chars
|
||||
version: 1.0.0 # Optional
|
||||
license: MIT # Optional (agentskills.io)
|
||||
platforms: [macos] # Optional — restrict to specific OS platforms
|
||||
# Valid: macos, linux, windows
|
||||
# Omit to load on all platforms (default)
|
||||
compatibility: Requires X # Optional (agentskills.io)
|
||||
metadata: # Optional, arbitrary key-value (agentskills.io)
|
||||
hermes:
|
||||
@@ -65,7 +62,6 @@ Usage:
|
||||
import json
|
||||
import os
|
||||
import re
|
||||
import sys
|
||||
from pathlib import Path
|
||||
from typing import Dict, Any, List, Optional, Tuple
|
||||
|
||||
@@ -82,41 +78,6 @@ SKILLS_DIR = HERMES_HOME / "skills"
|
||||
MAX_NAME_LENGTH = 64
|
||||
MAX_DESCRIPTION_LENGTH = 1024
|
||||
|
||||
# Platform identifiers for the 'platforms' frontmatter field.
|
||||
# Maps user-friendly names to sys.platform prefixes.
|
||||
_PLATFORM_MAP = {
|
||||
"macos": "darwin",
|
||||
"linux": "linux",
|
||||
"windows": "win32",
|
||||
}
|
||||
|
||||
|
||||
def skill_matches_platform(frontmatter: Dict[str, Any]) -> bool:
|
||||
"""Check if a skill is compatible with the current OS platform.
|
||||
|
||||
Skills declare platform requirements via a top-level ``platforms`` list
|
||||
in their YAML frontmatter::
|
||||
|
||||
platforms: [macos] # macOS only
|
||||
platforms: [macos, linux] # macOS and Linux
|
||||
|
||||
Valid values: ``macos``, ``linux``, ``windows``.
|
||||
|
||||
If the field is absent or empty the skill is compatible with **all**
|
||||
platforms (backward-compatible default).
|
||||
"""
|
||||
platforms = frontmatter.get("platforms")
|
||||
if not platforms:
|
||||
return True # No restriction → loads everywhere
|
||||
if not isinstance(platforms, list):
|
||||
platforms = [platforms]
|
||||
current = sys.platform
|
||||
for p in platforms:
|
||||
mapped = _PLATFORM_MAP.get(str(p).lower().strip(), str(p).lower().strip())
|
||||
if current.startswith(mapped):
|
||||
return True
|
||||
return False
|
||||
|
||||
|
||||
def check_skills_requirements() -> bool:
|
||||
"""Skills are always available -- the directory is created on first use if needed."""
|
||||
@@ -243,10 +204,6 @@ def _find_all_skills() -> List[Dict[str, Any]]:
|
||||
try:
|
||||
content = skill_md.read_text(encoding='utf-8')
|
||||
frontmatter, body = _parse_frontmatter(content)
|
||||
|
||||
# Skip skills incompatible with the current OS platform
|
||||
if not skill_matches_platform(frontmatter):
|
||||
continue
|
||||
|
||||
name = frontmatter.get('name', skill_dir.name)[:MAX_NAME_LENGTH]
|
||||
|
||||
|
||||
@@ -468,9 +468,7 @@ def _handle_vision_analyze(args, **kw):
|
||||
image_url = args.get("image_url", "")
|
||||
question = args.get("question", "")
|
||||
full_prompt = f"Fully describe and explain everything about this image, then answer the following question:\n\n{question}"
|
||||
model = (os.getenv("AUXILIARY_VISION_MODEL", "").strip()
|
||||
or DEFAULT_VISION_MODEL
|
||||
or "google/gemini-3-flash-preview")
|
||||
model = DEFAULT_VISION_MODEL or "google/gemini-3-flash-preview"
|
||||
return vision_analyze_tool(image_url, full_prompt, model)
|
||||
|
||||
|
||||
|
||||
@@ -85,13 +85,7 @@ DEFAULT_MIN_LENGTH_FOR_SUMMARIZATION = 5000
|
||||
|
||||
# Resolve async auxiliary client at module level.
|
||||
# Handles Codex Responses API adapter transparently.
|
||||
_aux_async_client, _DEFAULT_SUMMARIZER_MODEL = get_async_text_auxiliary_client("web_extract")
|
||||
|
||||
# Allow per-task override via config.yaml auxiliary.web_extract_model
|
||||
DEFAULT_SUMMARIZER_MODEL = (
|
||||
os.getenv("AUXILIARY_WEB_EXTRACT_MODEL", "").strip()
|
||||
or _DEFAULT_SUMMARIZER_MODEL
|
||||
)
|
||||
_aux_async_client, DEFAULT_SUMMARIZER_MODEL = get_async_text_auxiliary_client()
|
||||
|
||||
_debug = DebugSession("web_tools", env_var="WEB_TOOLS_DEBUG")
|
||||
|
||||
|
||||
@@ -50,9 +50,6 @@ description: Brief description (shown in skill search results)
|
||||
version: 1.0.0
|
||||
author: Your Name
|
||||
license: MIT
|
||||
platforms: [macos, linux] # Optional — restrict to specific OS platforms
|
||||
# Valid: macos, linux, windows
|
||||
# Omit to load on all platforms (default)
|
||||
metadata:
|
||||
hermes:
|
||||
tags: [Category, Subcategory, Keywords]
|
||||
@@ -79,20 +76,6 @@ Known failure modes and how to handle them.
|
||||
How the agent confirms it worked.
|
||||
```
|
||||
|
||||
### Platform-Specific Skills
|
||||
|
||||
Skills can restrict themselves to specific operating systems using the `platforms` field:
|
||||
|
||||
```yaml
|
||||
platforms: [macos] # macOS only (e.g., iMessage, Apple Reminders)
|
||||
platforms: [macos, linux] # macOS and Linux
|
||||
platforms: [windows] # Windows only
|
||||
```
|
||||
|
||||
When set, the skill is automatically hidden from the system prompt, `skills_list()`, and slash commands on incompatible platforms. If omitted or empty, the skill loads on all platforms (backward compatible).
|
||||
|
||||
See `skills/apple/` for examples of macOS-only skills.
|
||||
|
||||
## Skill Guidelines
|
||||
|
||||
### No External Dependencies
|
||||
|
||||
@@ -1,502 +0,0 @@
|
||||
---
|
||||
sidebar_position: 5
|
||||
title: "Environments, Benchmarks & Data Generation"
|
||||
description: "Building RL training environments, running evaluation benchmarks, and generating SFT data with the Hermes-Agent Atropos integration"
|
||||
---
|
||||
|
||||
# Environments, Benchmarks & Data Generation
|
||||
|
||||
Hermes Agent includes a full environment framework that connects its tool-calling capabilities to the [Atropos](https://github.com/NousResearch/atropos) RL training framework. This enables three workflows:
|
||||
|
||||
1. **RL Training** — Train language models on multi-turn agentic tasks with GRPO
|
||||
2. **Benchmarks** — Evaluate models on standardised agentic benchmarks
|
||||
3. **Data Generation** — Generate SFT training data from agent rollouts
|
||||
|
||||
All three share the same core: an **environment** class that defines tasks, runs an agent loop, and scores the output.
|
||||
|
||||
:::tip Quick Links
|
||||
- **Want to run benchmarks?** Jump to [Available Benchmarks](#available-benchmarks)
|
||||
- **Want to train with RL?** See [RL Training Tools](/user-guide/features/rl-training) for the agent-driven interface, or [Running Environments](#running-environments) for manual execution
|
||||
- **Want to create a new environment?** See [Creating Environments](#creating-environments)
|
||||
:::
|
||||
|
||||
## Architecture
|
||||
|
||||
The environment system is built on a three-layer inheritance chain:
|
||||
|
||||
```
|
||||
Atropos Framework
|
||||
┌───────────────────────┐
|
||||
│ BaseEnv │ (atroposlib)
|
||||
│ - Server management │
|
||||
│ - Worker scheduling │
|
||||
│ - Wandb logging │
|
||||
│ - CLI (serve/process/ │
|
||||
│ evaluate) │
|
||||
└───────────┬───────────┘
|
||||
│ inherits
|
||||
┌───────────┴───────────┐
|
||||
│ HermesAgentBaseEnv │ environments/hermes_base_env.py
|
||||
│ - Terminal backend │
|
||||
│ - Tool resolution │
|
||||
│ - Agent loop engine │
|
||||
│ - ToolContext │
|
||||
└───────────┬───────────┘
|
||||
│ inherits
|
||||
┌─────────────────────┼─────────────────────┐
|
||||
│ │ │
|
||||
TerminalTestEnv HermesSweEnv TerminalBench2EvalEnv
|
||||
(stack testing) (SWE training) (benchmark eval)
|
||||
│
|
||||
┌────────┼────────┐
|
||||
│ │
|
||||
TBLiteEvalEnv YCBenchEvalEnv
|
||||
(fast benchmark) (long-horizon)
|
||||
```
|
||||
|
||||
### BaseEnv (Atropos)
|
||||
|
||||
The foundation from `atroposlib`. Provides:
|
||||
- **Server management** — connects to OpenAI-compatible APIs (VLLM, SGLang, OpenRouter)
|
||||
- **Worker scheduling** — parallel rollout coordination
|
||||
- **Wandb integration** — metrics logging and rollout visualisation
|
||||
- **CLI interface** — three subcommands: `serve`, `process`, `evaluate`
|
||||
- **Eval logging** — `evaluate_log()` saves results to JSON + JSONL
|
||||
|
||||
### HermesAgentBaseEnv
|
||||
|
||||
The hermes-agent layer (`environments/hermes_base_env.py`). Adds:
|
||||
- **Terminal backend configuration** — sets `TERMINAL_ENV` for sandboxed execution (local, Docker, Modal, Daytona, SSH, Singularity)
|
||||
- **Tool resolution** — `_resolve_tools_for_group()` calls hermes-agent's `get_tool_definitions()` to get the right tool schemas based on enabled/disabled toolsets
|
||||
- **Agent loop integration** — `collect_trajectory()` runs `HermesAgentLoop` and scores the result
|
||||
- **Two-phase operation** — Phase 1 (OpenAI server) for eval/SFT, Phase 2 (VLLM ManagedServer) for full RL with logprobs
|
||||
- **Async safety patches** — monkey-patches Modal backend to work inside Atropos's event loop
|
||||
|
||||
### Concrete Environments
|
||||
|
||||
Your environment inherits from `HermesAgentBaseEnv` and implements five methods:
|
||||
|
||||
| Method | Purpose |
|
||||
|--------|---------|
|
||||
| `setup()` | Load dataset, initialise state |
|
||||
| `get_next_item()` | Return the next item for rollout |
|
||||
| `format_prompt(item)` | Convert an item into the user message |
|
||||
| `compute_reward(item, result, ctx)` | Score the rollout (0.0–1.0) |
|
||||
| `evaluate()` | Periodic evaluation logic |
|
||||
|
||||
## Core Components
|
||||
|
||||
### Agent Loop
|
||||
|
||||
`HermesAgentLoop` (`environments/agent_loop.py`) is the reusable multi-turn agent engine. It runs the same tool-calling pattern as hermes-agent's main loop:
|
||||
|
||||
1. Send messages + tool schemas to the API via `server.chat_completion()`
|
||||
2. If the response contains `tool_calls`, dispatch each via `handle_function_call()`
|
||||
3. Append tool results to the conversation, go back to step 1
|
||||
4. If no `tool_calls`, the agent is done
|
||||
|
||||
Tool calls execute in a thread pool (`ThreadPoolExecutor(128)`) so that async backends (Modal, Docker) don't deadlock inside Atropos's event loop.
|
||||
|
||||
Returns an `AgentResult`:
|
||||
|
||||
```python
|
||||
@dataclass
|
||||
class AgentResult:
|
||||
messages: List[Dict[str, Any]] # Full conversation history
|
||||
turns_used: int # Number of LLM calls made
|
||||
finished_naturally: bool # True if model stopped on its own
|
||||
reasoning_per_turn: List[Optional[str]] # Extracted reasoning content
|
||||
tool_errors: List[ToolError] # Errors encountered during tool dispatch
|
||||
managed_state: Optional[Dict] # VLLM ManagedServer state (Phase 2)
|
||||
```
|
||||
|
||||
### Tool Context
|
||||
|
||||
`ToolContext` (`environments/tool_context.py`) gives reward functions direct access to the **same sandbox** the model used during its rollout. The `task_id` scoping means all state (files, processes, browser tabs) is preserved.
|
||||
|
||||
```python
|
||||
async def compute_reward(self, item, result, ctx: ToolContext):
|
||||
# Run tests in the model's terminal sandbox
|
||||
test = ctx.terminal("pytest -v")
|
||||
if test["exit_code"] == 0:
|
||||
return 1.0
|
||||
|
||||
# Check if a file was created
|
||||
content = ctx.read_file("/workspace/solution.py")
|
||||
if content.get("content"):
|
||||
return 0.5
|
||||
|
||||
# Download files for local verification
|
||||
ctx.download_file("/remote/output.bin", "/local/output.bin")
|
||||
return 0.0
|
||||
```
|
||||
|
||||
Available methods:
|
||||
|
||||
| Category | Methods |
|
||||
|----------|---------|
|
||||
| **Terminal** | `terminal(command, timeout)` |
|
||||
| **Files** | `read_file(path)`, `write_file(path, content)`, `search(query, path)` |
|
||||
| **Transfers** | `upload_file()`, `upload_dir()`, `download_file()`, `download_dir()` |
|
||||
| **Web** | `web_search(query)`, `web_extract(urls)` |
|
||||
| **Browser** | `browser_navigate(url)`, `browser_snapshot()` |
|
||||
| **Generic** | `call_tool(name, args)` — escape hatch for any hermes-agent tool |
|
||||
| **Cleanup** | `cleanup()` — release all resources |
|
||||
|
||||
### Tool Call Parsers
|
||||
|
||||
For **Phase 2** (VLLM ManagedServer), the server returns raw text without structured tool calls. Client-side parsers in `environments/tool_call_parsers/` extract `tool_calls` from raw output:
|
||||
|
||||
```python
|
||||
from environments.tool_call_parsers import get_parser
|
||||
|
||||
parser = get_parser("hermes") # or "mistral", "llama3_json", "qwen", "deepseek_v3", etc.
|
||||
content, tool_calls = parser.parse(raw_model_output)
|
||||
```
|
||||
|
||||
Available parsers: `hermes`, `mistral`, `llama3_json`, `qwen`, `qwen3_coder`, `deepseek_v3`, `deepseek_v3_1`, `kimi_k2`, `longcat`, `glm45`, `glm47`.
|
||||
|
||||
In Phase 1 (OpenAI server type), parsers are not needed — the server handles tool call parsing natively.
|
||||
|
||||
## Available Benchmarks
|
||||
|
||||
### TerminalBench2
|
||||
|
||||
**89 challenging terminal tasks** with per-task Docker sandbox environments.
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| **What it tests** | Single-task coding/sysadmin ability |
|
||||
| **Scoring** | Binary pass/fail (test suite verification) |
|
||||
| **Sandbox** | Modal cloud sandboxes (per-task Docker images) |
|
||||
| **Tools** | `terminal` + `file` |
|
||||
| **Tasks** | 89 tasks across multiple categories |
|
||||
| **Cost** | ~$50–200 for full eval (parallel execution) |
|
||||
| **Time** | ~2–4 hours |
|
||||
|
||||
```bash
|
||||
python environments/benchmarks/terminalbench_2/terminalbench2_env.py evaluate \
|
||||
--config environments/benchmarks/terminalbench_2/default.yaml
|
||||
|
||||
# Run specific tasks
|
||||
python environments/benchmarks/terminalbench_2/terminalbench2_env.py evaluate \
|
||||
--config environments/benchmarks/terminalbench_2/default.yaml \
|
||||
--env.task_filter fix-git,git-multibranch
|
||||
```
|
||||
|
||||
Dataset: [NousResearch/terminal-bench-2](https://huggingface.co/datasets/NousResearch/terminal-bench-2) on HuggingFace.
|
||||
|
||||
### TBLite (OpenThoughts Terminal Bench Lite)
|
||||
|
||||
**100 difficulty-calibrated tasks** — a faster proxy for TerminalBench2.
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| **What it tests** | Same as TB2 (coding/sysadmin), calibrated difficulty tiers |
|
||||
| **Scoring** | Binary pass/fail |
|
||||
| **Sandbox** | Modal cloud sandboxes |
|
||||
| **Tools** | `terminal` + `file` |
|
||||
| **Tasks** | 100 tasks: Easy (40), Medium (26), Hard (26), Extreme (8) |
|
||||
| **Correlation** | r=0.911 with full TB2 |
|
||||
| **Speed** | 2.6–8× faster than TB2 |
|
||||
|
||||
```bash
|
||||
python environments/benchmarks/tblite/tblite_env.py evaluate \
|
||||
--config environments/benchmarks/tblite/default.yaml
|
||||
```
|
||||
|
||||
TBLite is a thin subclass of TerminalBench2 — only the dataset and timeouts differ. Created by the OpenThoughts Agent team (Snorkel AI + Bespoke Labs). Dataset: [NousResearch/openthoughts-tblite](https://huggingface.co/datasets/NousResearch/openthoughts-tblite).
|
||||
|
||||
### YC-Bench
|
||||
|
||||
**Long-horizon strategic benchmark** — the agent plays CEO of an AI startup.
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| **What it tests** | Multi-turn strategic coherence over hundreds of turns |
|
||||
| **Scoring** | Composite: `0.5 × survival + 0.5 × normalised_funds` |
|
||||
| **Sandbox** | Local terminal (no Modal needed) |
|
||||
| **Tools** | `terminal` only |
|
||||
| **Runs** | 9 default (3 presets × 3 seeds), sequential |
|
||||
| **Cost** | ~$50–200 for full eval |
|
||||
| **Time** | ~3–6 hours |
|
||||
|
||||
```bash
|
||||
# Install yc-bench (optional dependency)
|
||||
pip install "hermes-agent[yc-bench]"
|
||||
|
||||
# Run evaluation
|
||||
bash environments/benchmarks/yc_bench/run_eval.sh
|
||||
|
||||
# Or directly
|
||||
python environments/benchmarks/yc_bench/yc_bench_env.py evaluate \
|
||||
--config environments/benchmarks/yc_bench/default.yaml
|
||||
|
||||
# Quick single-preset test
|
||||
python environments/benchmarks/yc_bench/yc_bench_env.py evaluate \
|
||||
--config environments/benchmarks/yc_bench/default.yaml \
|
||||
--env.presets '["fast_test"]' --env.seeds '[1]'
|
||||
```
|
||||
|
||||
YC-Bench uses [collinear-ai/yc-bench](https://github.com/collinear-ai/yc-bench) — a deterministic simulation with 4 skill domains (research, inference, data_environment, training), prestige system, employee management, and financial pressure. Unlike TB2's per-task binary scoring, YC-Bench measures whether an agent can maintain coherent strategy over hundreds of compounding decisions.
|
||||
|
||||
## Training Environments
|
||||
|
||||
### TerminalTestEnv
|
||||
|
||||
A minimal self-contained environment with inline tasks (no external dataset). Used for **validating the full stack** end-to-end. Each task asks the model to create a file at a known path; the verifier checks the content.
|
||||
|
||||
```bash
|
||||
# Process mode (saves rollouts to JSONL, no training server needed)
|
||||
python environments/terminal_test_env/terminal_test_env.py process \
|
||||
--env.data_path_to_save_groups terminal_test_output.jsonl
|
||||
|
||||
# Serve mode (connects to Atropos API for RL training)
|
||||
python environments/terminal_test_env/terminal_test_env.py serve
|
||||
```
|
||||
|
||||
### HermesSweEnv
|
||||
|
||||
SWE-bench style training environment. The model gets a coding task, uses terminal + file + web tools to solve it, and the reward function runs tests in the same Modal sandbox.
|
||||
|
||||
```bash
|
||||
python environments/hermes_swe_env/hermes_swe_env.py serve \
|
||||
--openai.model_name YourModel \
|
||||
--env.dataset_name bigcode/humanevalpack \
|
||||
--env.terminal_backend modal
|
||||
```
|
||||
|
||||
## Running Environments
|
||||
|
||||
Every environment is a standalone Python script with three CLI subcommands:
|
||||
|
||||
### `evaluate` — Run a benchmark
|
||||
|
||||
For eval-only environments (benchmarks). Runs all items, computes metrics, logs to wandb.
|
||||
|
||||
```bash
|
||||
python environments/benchmarks/tblite/tblite_env.py evaluate \
|
||||
--config environments/benchmarks/tblite/default.yaml \
|
||||
--openai.model_name anthropic/claude-sonnet-4.6
|
||||
```
|
||||
|
||||
No training server or `run-api` needed. The environment handles everything.
|
||||
|
||||
### `process` — Generate SFT data
|
||||
|
||||
Runs rollouts and saves scored trajectories to JSONL. Useful for generating training data without a full RL loop.
|
||||
|
||||
```bash
|
||||
python environments/terminal_test_env/terminal_test_env.py process \
|
||||
--env.data_path_to_save_groups output.jsonl \
|
||||
--openai.model_name anthropic/claude-sonnet-4.6
|
||||
```
|
||||
|
||||
Output format: each line is a scored trajectory with the full conversation history, reward, and metadata.
|
||||
|
||||
### `serve` — Connect to Atropos for RL training
|
||||
|
||||
Connects the environment to a running Atropos API server (`run-api`). Used during live RL training.
|
||||
|
||||
```bash
|
||||
# Terminal 1: Start the Atropos API
|
||||
run-api
|
||||
|
||||
# Terminal 2: Start the environment
|
||||
python environments/hermes_swe_env/hermes_swe_env.py serve \
|
||||
--openai.model_name YourModel
|
||||
```
|
||||
|
||||
The environment receives items from Atropos, runs agent rollouts, computes rewards, and sends scored trajectories back for training.
|
||||
|
||||
## Two-Phase Operation
|
||||
|
||||
### Phase 1: OpenAI Server (Eval / SFT)
|
||||
|
||||
Uses `server.chat_completion()` with `tools=` parameter. The server (VLLM, SGLang, OpenRouter, OpenAI) handles tool call parsing natively. Returns `ChatCompletion` objects with structured `tool_calls`.
|
||||
|
||||
- **Use for**: evaluation, SFT data generation, benchmarks, testing
|
||||
- **Placeholder tokens** are created for the Atropos pipeline (since real token IDs aren't available from the OpenAI API)
|
||||
|
||||
### Phase 2: VLLM ManagedServer (Full RL)
|
||||
|
||||
Uses ManagedServer for exact token IDs + logprobs via `/generate`. A client-side [tool call parser](#tool-call-parsers) reconstructs structured `tool_calls` from raw output.
|
||||
|
||||
- **Use for**: full RL training with GRPO/PPO
|
||||
- **Real tokens**, masks, and logprobs flow through the pipeline
|
||||
- Set `tool_call_parser` in config to match your model's format (e.g., `"hermes"`, `"qwen"`, `"mistral"`)
|
||||
|
||||
## Creating Environments
|
||||
|
||||
### Training Environment
|
||||
|
||||
```python
|
||||
from environments.hermes_base_env import HermesAgentBaseEnv, HermesAgentEnvConfig
|
||||
from atroposlib.envs.server_handling.server_manager import APIServerConfig
|
||||
|
||||
class MyEnvConfig(HermesAgentEnvConfig):
|
||||
my_custom_field: str = "default_value"
|
||||
|
||||
class MyEnv(HermesAgentBaseEnv):
|
||||
name = "my-env"
|
||||
env_config_cls = MyEnvConfig
|
||||
|
||||
@classmethod
|
||||
def config_init(cls):
|
||||
env_config = MyEnvConfig(
|
||||
enabled_toolsets=["terminal", "file"],
|
||||
terminal_backend="modal",
|
||||
max_agent_turns=30,
|
||||
)
|
||||
server_configs = [APIServerConfig(
|
||||
base_url="https://openrouter.ai/api/v1",
|
||||
model_name="anthropic/claude-sonnet-4.6",
|
||||
server_type="openai",
|
||||
)]
|
||||
return env_config, server_configs
|
||||
|
||||
async def setup(self):
|
||||
from datasets import load_dataset
|
||||
self.dataset = list(load_dataset("my-dataset", split="train"))
|
||||
self.iter = 0
|
||||
|
||||
async def get_next_item(self):
|
||||
item = self.dataset[self.iter % len(self.dataset)]
|
||||
self.iter += 1
|
||||
return item
|
||||
|
||||
def format_prompt(self, item):
|
||||
return item["instruction"]
|
||||
|
||||
async def compute_reward(self, item, result, ctx):
|
||||
# ctx gives full tool access to the rollout's sandbox
|
||||
test = ctx.terminal("pytest -v")
|
||||
return 1.0 if test["exit_code"] == 0 else 0.0
|
||||
|
||||
async def evaluate(self, *args, **kwargs):
|
||||
# Periodic evaluation during training
|
||||
pass
|
||||
|
||||
if __name__ == "__main__":
|
||||
MyEnv.cli()
|
||||
```
|
||||
|
||||
### Eval-Only Benchmark
|
||||
|
||||
For benchmarks, follow the pattern used by TerminalBench2, TBLite, and YC-Bench:
|
||||
|
||||
1. **Create under** `environments/benchmarks/your-benchmark/`
|
||||
2. **Set eval-only config**: `eval_handling=STOP_TRAIN`, `steps_per_eval=1`, `total_steps=1`
|
||||
3. **Stub training methods**: `collect_trajectories()` returns `(None, [])`, `score()` returns `None`
|
||||
4. **Implement** `rollout_and_score_eval(eval_item)` — the per-item agent loop + scoring
|
||||
5. **Implement** `evaluate()` — orchestrates all runs, computes aggregate metrics
|
||||
6. **Add streaming JSONL** for crash-safe result persistence
|
||||
7. **Add cleanup**: `KeyboardInterrupt` handling, `cleanup_all_environments()`, `_tool_executor.shutdown()`
|
||||
8. **Run with** `evaluate` subcommand
|
||||
|
||||
See `environments/benchmarks/yc_bench/yc_bench_env.py` for a clean, well-documented reference implementation.
|
||||
|
||||
## Configuration Reference
|
||||
|
||||
### HermesAgentEnvConfig Fields
|
||||
|
||||
| Field | Type | Default | Description |
|
||||
|-------|------|---------|-------------|
|
||||
| `enabled_toolsets` | `List[str]` | `None` (all) | Which hermes toolsets to enable |
|
||||
| `disabled_toolsets` | `List[str]` | `None` | Toolsets to filter out |
|
||||
| `distribution` | `str` | `None` | Probabilistic toolset distribution name |
|
||||
| `max_agent_turns` | `int` | `30` | Max LLM calls per rollout |
|
||||
| `agent_temperature` | `float` | `1.0` | Sampling temperature |
|
||||
| `system_prompt` | `str` | `None` | System message for the agent |
|
||||
| `terminal_backend` | `str` | `"local"` | `local`, `docker`, `modal`, `daytona`, `ssh`, `singularity` |
|
||||
| `terminal_timeout` | `int` | `120` | Seconds per terminal command |
|
||||
| `terminal_lifetime` | `int` | `3600` | Max sandbox lifetime |
|
||||
| `dataset_name` | `str` | `None` | HuggingFace dataset identifier |
|
||||
| `tool_pool_size` | `int` | `128` | Thread pool size for tool execution |
|
||||
| `tool_call_parser` | `str` | `"hermes"` | Parser for Phase 2 raw output |
|
||||
| `extra_body` | `Dict` | `None` | Extra params for OpenAI API (e.g., OpenRouter provider prefs) |
|
||||
| `eval_handling` | `Enum` | `STOP_TRAIN` | `STOP_TRAIN`, `LIMIT_TRAIN`, `NONE` |
|
||||
|
||||
### YAML Configuration
|
||||
|
||||
Environments can be configured via YAML files passed with `--config`:
|
||||
|
||||
```yaml
|
||||
env:
|
||||
enabled_toolsets: ["terminal", "file"]
|
||||
max_agent_turns: 60
|
||||
max_token_length: 32000
|
||||
agent_temperature: 0.8
|
||||
terminal_backend: "modal"
|
||||
terminal_timeout: 300
|
||||
dataset_name: "NousResearch/terminal-bench-2"
|
||||
tokenizer_name: "NousResearch/Hermes-3-Llama-3.1-8B"
|
||||
use_wandb: true
|
||||
wandb_name: "my-benchmark"
|
||||
|
||||
openai:
|
||||
base_url: "https://openrouter.ai/api/v1"
|
||||
model_name: "anthropic/claude-sonnet-4.6"
|
||||
server_type: "openai"
|
||||
health_check: false
|
||||
```
|
||||
|
||||
YAML values override `config_init()` defaults. CLI arguments override YAML values:
|
||||
|
||||
```bash
|
||||
python my_env.py evaluate \
|
||||
--config my_config.yaml \
|
||||
--openai.model_name anthropic/claude-opus-4.6 # overrides YAML
|
||||
```
|
||||
|
||||
## Prerequisites
|
||||
|
||||
### For all environments
|
||||
|
||||
- Python >= 3.11
|
||||
- `atroposlib`: `pip install git+https://github.com/NousResearch/atropos.git`
|
||||
- An LLM API key (OpenRouter, OpenAI, or self-hosted VLLM/SGLang)
|
||||
|
||||
### For Modal-sandboxed benchmarks (TB2, TBLite)
|
||||
|
||||
- [Modal](https://modal.com) account and CLI: `pip install "hermes-agent[modal]"`
|
||||
- `MODAL_TOKEN_ID` and `MODAL_TOKEN_SECRET` environment variables
|
||||
|
||||
### For YC-Bench
|
||||
|
||||
- `pip install "hermes-agent[yc-bench]"` (installs the yc-bench CLI + SQLAlchemy)
|
||||
- No Modal needed — runs with local terminal backend
|
||||
|
||||
### For RL training
|
||||
|
||||
- `TINKER_API_KEY` — API key for the [Tinker](https://tinker.computer) training service
|
||||
- `WANDB_API_KEY` — for Weights & Biases metrics tracking
|
||||
- The `tinker-atropos` submodule (at `tinker-atropos/` in the repo)
|
||||
|
||||
See [RL Training](/user-guide/features/rl-training) for the agent-driven RL workflow.
|
||||
|
||||
## Directory Structure
|
||||
|
||||
```
|
||||
environments/
|
||||
├── hermes_base_env.py # Abstract base class (HermesAgentBaseEnv)
|
||||
├── agent_loop.py # Multi-turn agent engine (HermesAgentLoop)
|
||||
├── tool_context.py # Per-rollout tool access for reward functions
|
||||
├── patches.py # Async-safety patches for Modal backend
|
||||
│
|
||||
├── tool_call_parsers/ # Phase 2 client-side parsers
|
||||
│ ├── hermes_parser.py # Hermes/ChatML <tool_call> format
|
||||
│ ├── mistral_parser.py # Mistral [TOOL_CALLS] format
|
||||
│ ├── llama_parser.py # Llama 3 JSON tool calling
|
||||
│ ├── qwen_parser.py # Qwen format
|
||||
│ ├── deepseek_v3_parser.py # DeepSeek V3 format
|
||||
│ └── ... # + kimi_k2, longcat, glm45/47, etc.
|
||||
│
|
||||
├── terminal_test_env/ # Stack validation (inline tasks)
|
||||
├── hermes_swe_env/ # SWE-bench training environment
|
||||
│
|
||||
└── benchmarks/ # Evaluation benchmarks
|
||||
├── terminalbench_2/ # 89 terminal tasks, Modal sandboxes
|
||||
├── tblite/ # 100 calibrated tasks (fast TB2 proxy)
|
||||
└── yc_bench/ # Long-horizon strategic benchmark
|
||||
```
|
||||
@@ -19,7 +19,7 @@ These are commands you run from your shell.
|
||||
| `hermes chat --continue` / `-c` | Resume the most recent session |
|
||||
| `hermes chat --resume <id>` / `-r <id>` | Resume a specific session |
|
||||
| `hermes chat --model <name>` | Use a specific model |
|
||||
| `hermes chat --provider <name>` | Force a provider (`nous`, `openrouter`, `zai`, `kimi-coding`, `minimax`, `minimax-cn`) |
|
||||
| `hermes chat --provider <name>` | Force a provider (`nous`, `openrouter`) |
|
||||
| `hermes chat --toolsets "web,terminal"` / `-t` | Use specific toolsets |
|
||||
| `hermes chat --verbose` | Enable verbose/debug output |
|
||||
|
||||
|
||||
@@ -15,14 +15,6 @@ All variables go in `~/.hermes/.env`. You can also set them with `hermes config
|
||||
| `OPENROUTER_API_KEY` | OpenRouter API key (recommended for flexibility) |
|
||||
| `OPENAI_API_KEY` | API key for custom OpenAI-compatible endpoints (used with `OPENAI_BASE_URL`) |
|
||||
| `OPENAI_BASE_URL` | Base URL for custom endpoint (VLLM, SGLang, etc.) |
|
||||
| `GLM_API_KEY` | z.ai / ZhipuAI GLM API key ([z.ai](https://z.ai)) |
|
||||
| `GLM_BASE_URL` | Override z.ai base URL (default: `https://api.z.ai/api/paas/v4`) |
|
||||
| `KIMI_API_KEY` | Kimi / Moonshot AI API key ([moonshot.ai](https://platform.moonshot.ai)) |
|
||||
| `KIMI_BASE_URL` | Override Kimi base URL (default: `https://api.moonshot.ai/v1`) |
|
||||
| `MINIMAX_API_KEY` | MiniMax API key — global endpoint ([minimax.io](https://www.minimax.io)) |
|
||||
| `MINIMAX_BASE_URL` | Override MiniMax base URL (default: `https://api.minimax.io/v1`) |
|
||||
| `MINIMAX_CN_API_KEY` | MiniMax API key — China endpoint ([minimaxi.com](https://www.minimaxi.com)) |
|
||||
| `MINIMAX_CN_BASE_URL` | Override MiniMax China base URL (default: `https://api.minimaxi.com/v1`) |
|
||||
| `HERMES_MODEL` | Preferred model name (checked before `LLM_MODEL`, used by gateway) |
|
||||
| `LLM_MODEL` | Default model name (fallback when not set in config.yaml) |
|
||||
| `VOICE_TOOLS_OPENAI_KEY` | OpenAI key for TTS and voice transcription (separate from custom endpoint) |
|
||||
@@ -32,7 +24,7 @@ All variables go in `~/.hermes/.env`. You can also set them with `hermes config
|
||||
|
||||
| Variable | Description |
|
||||
|----------|-------------|
|
||||
| `HERMES_INFERENCE_PROVIDER` | Override provider selection: `auto`, `openrouter`, `nous`, `zai`, `kimi-coding`, `minimax`, `minimax-cn` (default: `auto`) |
|
||||
| `HERMES_INFERENCE_PROVIDER` | Override provider selection: `auto`, `openrouter`, `nous` (default: `auto`) |
|
||||
| `HERMES_PORTAL_BASE_URL` | Override Nous Portal URL (for development/testing) |
|
||||
| `NOUS_INFERENCE_BASE_URL` | Override Nous inference API URL |
|
||||
| `HERMES_NOUS_MIN_KEY_TTL_SECONDS` | Min agent key TTL before re-mint (default: 1800 = 30min) |
|
||||
|
||||
@@ -64,10 +64,6 @@ You need at least one way to connect to an LLM. Use `hermes model` to switch pro
|
||||
| **Nous Portal** | `hermes model` (OAuth, subscription-based) |
|
||||
| **OpenAI Codex** | `hermes model` (ChatGPT OAuth, uses Codex models) |
|
||||
| **OpenRouter** | `OPENROUTER_API_KEY` in `~/.hermes/.env` |
|
||||
| **z.ai / GLM** | `GLM_API_KEY` in `~/.hermes/.env` (provider: `zai`) |
|
||||
| **Kimi / Moonshot** | `KIMI_API_KEY` in `~/.hermes/.env` (provider: `kimi-coding`) |
|
||||
| **MiniMax** | `MINIMAX_API_KEY` in `~/.hermes/.env` (provider: `minimax`) |
|
||||
| **MiniMax China** | `MINIMAX_CN_API_KEY` in `~/.hermes/.env` (provider: `minimax-cn`) |
|
||||
| **Custom Endpoint** | `OPENAI_BASE_URL` + `OPENAI_API_KEY` in `~/.hermes/.env` |
|
||||
|
||||
:::info Codex Note
|
||||
@@ -78,37 +74,6 @@ The OpenAI Codex provider authenticates via device code (open a URL, enter a cod
|
||||
Even when using Nous Portal, Codex, or a custom endpoint, some tools (vision, web summarization, MoA) use OpenRouter independently. An `OPENROUTER_API_KEY` enables these tools.
|
||||
:::
|
||||
|
||||
### First-Class Chinese AI Providers
|
||||
|
||||
These providers have built-in support with dedicated provider IDs. Set the API key and use `--provider` to select:
|
||||
|
||||
```bash
|
||||
# z.ai / ZhipuAI GLM
|
||||
hermes chat --provider zai --model glm-4-plus
|
||||
# Requires: GLM_API_KEY in ~/.hermes/.env
|
||||
|
||||
# Kimi / Moonshot AI
|
||||
hermes chat --provider kimi-coding --model moonshot-v1-auto
|
||||
# Requires: KIMI_API_KEY in ~/.hermes/.env
|
||||
|
||||
# MiniMax (global endpoint)
|
||||
hermes chat --provider minimax --model MiniMax-Text-01
|
||||
# Requires: MINIMAX_API_KEY in ~/.hermes/.env
|
||||
|
||||
# MiniMax (China endpoint)
|
||||
hermes chat --provider minimax-cn --model MiniMax-Text-01
|
||||
# Requires: MINIMAX_CN_API_KEY in ~/.hermes/.env
|
||||
```
|
||||
|
||||
Or set the provider permanently in `config.yaml`:
|
||||
```yaml
|
||||
model:
|
||||
provider: "zai" # or: kimi-coding, minimax, minimax-cn
|
||||
default: "glm-4-plus"
|
||||
```
|
||||
|
||||
Base URLs can be overridden with `GLM_BASE_URL`, `KIMI_BASE_URL`, `MINIMAX_BASE_URL`, or `MINIMAX_CN_BASE_URL` environment variables.
|
||||
|
||||
## Custom & Self-Hosted LLM Providers
|
||||
|
||||
Hermes Agent works with **any OpenAI-compatible API endpoint**. If a server implements `/v1/chat/completions`, you can point Hermes at it. This means you can use local models, GPU inference servers, multi-provider routers, or any third-party API.
|
||||
@@ -325,7 +290,6 @@ LLM_MODEL=meta-llama/Llama-3.1-70B-Instruct-Turbo
|
||||
| **Cost optimization** | ClawRouter or OpenRouter with `sort: "price"` |
|
||||
| **Maximum privacy** | Ollama, vLLM, or llama.cpp (fully local) |
|
||||
| **Enterprise / Azure** | Azure OpenAI with custom endpoint |
|
||||
| **Chinese AI models** | z.ai (GLM), Kimi/Moonshot, or MiniMax (first-class providers) |
|
||||
|
||||
:::tip
|
||||
You can switch between providers at any time with `hermes model` — no restart required. Your conversation history, memory, and skills carry over regardless of which provider you use.
|
||||
|
||||
@@ -147,7 +147,7 @@ Default configuration:
|
||||
- Tests 3 models at different scales for robustness:
|
||||
- `qwen/qwen3-8b` (small)
|
||||
- `z-ai/glm-4.7-flash` (medium)
|
||||
- `minimax/minimax-m2.5` (large)
|
||||
- `minimax/minimax-m2.1` (large)
|
||||
- Total: ~144 rollouts
|
||||
|
||||
This validates:
|
||||
|
||||
@@ -50,7 +50,6 @@ The agent only loads the full skill content when it actually needs it.
|
||||
name: my-skill
|
||||
description: Brief description of what this skill does
|
||||
version: 1.0.0
|
||||
platforms: [macos, linux] # Optional — restrict to specific OS platforms
|
||||
metadata:
|
||||
hermes:
|
||||
tags: [python, automation]
|
||||
@@ -73,23 +72,6 @@ Trigger conditions for this skill.
|
||||
How to confirm it worked.
|
||||
```
|
||||
|
||||
### Platform-Specific Skills
|
||||
|
||||
Skills can restrict themselves to specific operating systems using the `platforms` field:
|
||||
|
||||
| Value | Matches |
|
||||
|-------|---------|
|
||||
| `macos` | macOS (Darwin) |
|
||||
| `linux` | Linux |
|
||||
| `windows` | Windows |
|
||||
|
||||
```yaml
|
||||
platforms: [macos] # macOS only (e.g., iMessage, Apple Reminders, FindMy)
|
||||
platforms: [macos, linux] # macOS and Linux
|
||||
```
|
||||
|
||||
When set, the skill is automatically hidden from the system prompt, `skills_list()`, and slash commands on incompatible platforms. If omitted, the skill loads on all platforms.
|
||||
|
||||
## Skill Directory Structure
|
||||
|
||||
```
|
||||
|
||||
@@ -64,7 +64,6 @@ const sidebars: SidebarsConfig = {
|
||||
label: 'Developer Guide',
|
||||
items: [
|
||||
'developer-guide/architecture',
|
||||
'developer-guide/environments',
|
||||
'developer-guide/adding-tools',
|
||||
'developer-guide/creating-skills',
|
||||
'developer-guide/contributing',
|
||||
|
||||
Reference in New Issue
Block a user