Compare commits
8 Commits
rl-capabil
...
atropos-he
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
d999d9876d | ||
|
|
a8809bbd3e | ||
|
|
a478e44585 | ||
|
|
c0494b3558 | ||
|
|
07b615e96e | ||
|
|
ac79725923 | ||
|
|
8dd38318fc | ||
|
|
8380895ae3 |
4
.gitignore
vendored
4
.gitignore
vendored
@@ -39,6 +39,10 @@ agent-browser/
|
||||
*.pem
|
||||
privvy*
|
||||
images/
|
||||
__pycache__/
|
||||
hermes_agent.egg-info/
|
||||
wandb/
|
||||
testlogs
|
||||
|
||||
# CLI config (may contain sensitive SSH paths)
|
||||
cli-config.yaml
|
||||
|
||||
443
README.md
443
README.md
@@ -15,11 +15,13 @@ irm https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/ins
|
||||
```
|
||||
|
||||
The installer will:
|
||||
- Clone to `~/.hermes-agent` (with submodules: mini-swe-agent, tinker-atropos)
|
||||
- Create a virtual environment
|
||||
- Install all dependencies
|
||||
- Install [uv](https://docs.astral.sh/uv/) (fast Python package manager) if not present
|
||||
- Install Python 3.11 via uv if not already available (no sudo needed)
|
||||
- Clone to `~/.hermes/hermes-agent` (with submodules: mini-swe-agent, tinker-atropos)
|
||||
- Create a virtual environment with Python 3.11
|
||||
- Install all dependencies and submodule packages
|
||||
- Symlink `hermes` into `~/.local/bin` so it works globally (no venv activation needed)
|
||||
- Run the interactive setup wizard
|
||||
- Add `hermes` to your PATH
|
||||
|
||||
After installation, reload your shell and run:
|
||||
```bash
|
||||
@@ -64,13 +66,13 @@ You need at least one LLM provider:
|
||||
| Provider | Get Key | Env Variable |
|
||||
|----------|---------|--------------|
|
||||
| **OpenRouter** (recommended) | [openrouter.ai/keys](https://openrouter.ai/keys) | `OPENROUTER_API_KEY` |
|
||||
| Anthropic | [console.anthropic.com](https://console.anthropic.com/) | `ANTHROPIC_API_KEY` |
|
||||
| OpenAI | [platform.openai.com](https://platform.openai.com/api-keys) | `OPENAI_API_KEY` |
|
||||
|
||||
|
||||
### Optional API Keys
|
||||
|
||||
| Feature | Provider | Env Variable |
|
||||
|---------|----------|--------------|
|
||||
| Custom OpenAI Endpoint (OAI or VLLM/SGLANG) | [platform.openai.com](https://platform.openai.com/api-keys) | `OPENAI_API_KEY` |
|
||||
| Web scraping | [Firecrawl](https://firecrawl.dev/) | `FIRECRAWL_API_KEY` |
|
||||
| Browser automation | [Browserbase](https://browserbase.com/) | `BROWSERBASE_API_KEY`, `BROWSERBASE_PROJECT_ID` |
|
||||
| Image generation | [FAL](https://fal.ai/) | `FAL_KEY` |
|
||||
@@ -179,8 +181,8 @@ hermes config set terminal.singularity_image ~/python.sif
|
||||
|
||||
**Modal** (serverless cloud):
|
||||
```bash
|
||||
pip install modal boto3
|
||||
modal setup # Authenticate
|
||||
uv pip install "swe-rex[modal]" # Installs swe-rex + modal + boto3
|
||||
modal setup # Authenticate with Modal
|
||||
hermes config set terminal.backend modal
|
||||
```
|
||||
|
||||
@@ -275,16 +277,19 @@ See [docs/messaging.md](docs/messaging.md) for WhatsApp and advanced setup.
|
||||
|
||||
Train language models with reinforcement learning using the Tinker API and Atropos framework.
|
||||
|
||||
> **Note:** RL training tools require **Python 3.11+** (the upstream `tinker` package has this requirement). On Python 3.10, the RL toolset will be automatically disabled — all other features work fine.
|
||||
|
||||
#### Requirements
|
||||
|
||||
1. **API Keys:** Add to `~/.hermes/.env`:
|
||||
1. **Python 3.11+** (check with `python3 --version`)
|
||||
2. **API Keys:** Add to `~/.hermes/.env`:
|
||||
```bash
|
||||
TINKER_API_KEY=your-tinker-key # Get from https://tinker-console.thinkingmachines.ai/keys
|
||||
WANDB_API_KEY=your-wandb-key # Get from https://wandb.ai/authorize
|
||||
OPENROUTER_API_KEY=your-key # Optional: for rl_test_inference
|
||||
```
|
||||
|
||||
2. **That's it!** tinker-atropos is included as a submodule - no separate installation needed.
|
||||
3. **That's it!** tinker-atropos is included as a submodule — the installer handles it automatically.
|
||||
|
||||
#### Using RL Tools
|
||||
|
||||
@@ -320,6 +325,94 @@ For extended RL workflows with longer timeouts:
|
||||
python rl_cli.py --model "anthropic/claude-sonnet-4-20250514"
|
||||
```
|
||||
|
||||
### 🧪 Atropos RL Environments
|
||||
|
||||
Hermes-Agent integrates with the [Atropos](https://github.com/NousResearch/atropos) RL framework through a layered environment system. This allows training models with reinforcement learning on agentic tasks using hermes-agent's tools.
|
||||
|
||||
#### Architecture
|
||||
|
||||
The integration has three layers:
|
||||
|
||||
| Layer | File | Purpose |
|
||||
|-------|------|---------|
|
||||
| **Agent Loop** | `environments/agent_loop.py` | Reusable multi-turn tool-calling engine (standard OpenAI spec) |
|
||||
| **Base Environment** | `environments/hermes_base_env.py` | Abstract Atropos `BaseEnv` subclass with toolset resolution, ToolContext, scoring |
|
||||
| **Concrete Envs** | `environments/terminal_test_env.py`, `environments/hermes_swe_env.py` | Task-specific environments |
|
||||
|
||||
#### Two-Phase Operation
|
||||
|
||||
- **Phase 1 (OpenAI server type)**: Works with any OpenAI-compatible endpoint (VLLM, SGLang, OpenRouter, OpenAI API). The server handles tool call parsing natively. Good for **SFT data generation**, **verifier testing**, and **evaluation**.
|
||||
- **Phase 2 (VLLM server type)**: Uses ManagedServer for exact token IDs + logprobs via `/generate`. Client-side tool call parser registry reconstructs structured `tool_calls` from raw output. Required for **full RL training**.
|
||||
|
||||
#### Quick Start
|
||||
|
||||
```bash
|
||||
# 1. Launch VLLM with tool parser
|
||||
vllm serve YourModel --tool-parser hermes
|
||||
|
||||
# 2. Start the Atropos API server
|
||||
run-api
|
||||
|
||||
# 3. Run an environment
|
||||
python environments/terminal_test_env.py serve \
|
||||
--openai.base_url http://localhost:8000/v1 \
|
||||
--openai.model_name YourModel \
|
||||
--openai.server_type openai
|
||||
```
|
||||
|
||||
#### ToolContext (Reward Functions)
|
||||
|
||||
Reward functions receive a `ToolContext` with unrestricted access to all hermes-agent tools, scoped to the rollout's sandbox:
|
||||
|
||||
```python
|
||||
async def compute_reward(self, item, result, ctx: ToolContext) -> float:
|
||||
# Run tests in the model's terminal sandbox
|
||||
test = ctx.terminal("pytest -v")
|
||||
if test["exit_code"] == 0:
|
||||
return 1.0
|
||||
# Or check a file, search the web, navigate a browser...
|
||||
return 0.0
|
||||
```
|
||||
|
||||
#### Creating Custom Environments
|
||||
|
||||
Subclass `HermesAgentBaseEnv` and implement 5 methods:
|
||||
|
||||
```python
|
||||
from environments.hermes_base_env import HermesAgentBaseEnv
|
||||
|
||||
class MyEnv(HermesAgentBaseEnv):
|
||||
name = "my-env"
|
||||
async def setup(self): ... # Load data
|
||||
async def get_next_item(self): ... # Return next item
|
||||
def format_prompt(self, item): ... # Item -> prompt string
|
||||
async def compute_reward(self, item, result, ctx): ... # Score with ToolContext
|
||||
async def evaluate(self, *args, **kwargs): ... # Periodic eval
|
||||
|
||||
if __name__ == "__main__":
|
||||
MyEnv.cli()
|
||||
```
|
||||
|
||||
#### Toolset Distributions
|
||||
|
||||
Configure which tools are available per group, either explicitly or probabilistically:
|
||||
|
||||
```bash
|
||||
# Explicit toolsets
|
||||
--env.enabled_toolsets '["terminal","file","web"]'
|
||||
|
||||
# Probabilistic distribution (sampled per group)
|
||||
--env.distribution development
|
||||
```
|
||||
|
||||
#### Tool Call Parsers (Phase 2)
|
||||
|
||||
For VLLM server type, a parser registry extracts structured `tool_calls` from raw model output. Supported parsers: `hermes`, `mistral`, `llama3_json`, `qwen`, `deepseek_v3`, `deepseek_v3_1`, `kimi_k2`, `longcat`, `glm45`, `glm47`, `qwen3_coder`.
|
||||
|
||||
```bash
|
||||
--env.tool_call_parser hermes # Match your VLLM --tool-parser flag
|
||||
```
|
||||
|
||||
### ⏰ Scheduled Tasks (Cron)
|
||||
|
||||
Schedule tasks to run automatically:
|
||||
@@ -425,26 +518,332 @@ skills/
|
||||
|
||||
## Manual Installation
|
||||
|
||||
If you prefer not to use the installer:
|
||||
If you prefer full control over the installation process (or the quick-install script doesn't suit your environment), follow these steps to set everything up by hand.
|
||||
|
||||
### Prerequisites
|
||||
|
||||
| Requirement | Minimum Version | Check Command | Notes |
|
||||
|-------------|----------------|---------------|-------|
|
||||
| **Git** | Any recent | `git --version` | Required |
|
||||
| **Node.js** | 18+ | `node --version` | Optional — needed for browser automation tools |
|
||||
| **ripgrep** | Any | `rg --version` | Optional — faster file search in terminal tool (falls back to grep) |
|
||||
|
||||
> **Note:** Python and pip are **not** prerequisites. The installer uses [uv](https://docs.astral.sh/uv/) to provision Python 3.11 automatically (no sudo needed). If you already have Python 3.11+ installed, uv will use it.
|
||||
|
||||
<details>
|
||||
<summary><strong>Installing prerequisites by platform</strong></summary>
|
||||
|
||||
**Ubuntu / Debian:**
|
||||
```bash
|
||||
sudo apt update && sudo apt install git
|
||||
# Optional:
|
||||
sudo apt install ripgrep nodejs npm
|
||||
```
|
||||
|
||||
**macOS (Homebrew):**
|
||||
```bash
|
||||
brew install git
|
||||
# Optional:
|
||||
brew install ripgrep node
|
||||
```
|
||||
|
||||
**Windows (WSL recommended):**
|
||||
Use the [Windows Subsystem for Linux](https://learn.microsoft.com/en-us/windows/wsl/install) and follow the Ubuntu instructions above. Alternatively, use the PowerShell quick-install script at the top of this README.
|
||||
|
||||
</details>
|
||||
|
||||
---
|
||||
|
||||
### Step 1: Clone the Repository
|
||||
|
||||
Clone with `--recurse-submodules` to pull the required submodules ([mini-swe-agent](https://github.com/SWE-agent/mini-swe-agent) for the terminal tool backend and [tinker-atropos](https://github.com/nousresearch/tinker-atropos) for RL training):
|
||||
|
||||
```bash
|
||||
# Clone the repository (with submodules)
|
||||
git clone --recurse-submodules https://github.com/NousResearch/hermes-agent.git
|
||||
cd hermes-agent
|
||||
```
|
||||
|
||||
If you already cloned without `--recurse-submodules`, initialize them manually:
|
||||
```bash
|
||||
git submodule update --init --recursive
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Step 2: Install uv & Create Virtual Environment
|
||||
|
||||
[uv](https://docs.astral.sh/uv/) is a fast Python package manager that can also provision Python itself. Install it and create the venv in one go:
|
||||
|
||||
```bash
|
||||
# Install uv (if not already installed)
|
||||
curl -LsSf https://astral.sh/uv/install.sh | sh
|
||||
|
||||
# Create venv with Python 3.11 (uv downloads it if not present — no sudo needed)
|
||||
uv venv venv --python 3.11
|
||||
```
|
||||
|
||||
> **Tip:** You do **not** need to activate the venv to use `hermes`. The entry point has a hardcoded shebang pointing to the venv Python, so it works globally once symlinked (see Step 8). For installing packages, uv can target the venv directly via `VIRTUAL_ENV`.
|
||||
|
||||
---
|
||||
|
||||
### Step 3: Install Python Dependencies
|
||||
|
||||
Install the main package in editable mode with all optional extras (messaging, cron, CLI menus, modal):
|
||||
|
||||
```bash
|
||||
# Tell uv which venv to install into
|
||||
export VIRTUAL_ENV="$(pwd)/venv"
|
||||
|
||||
# Install with all extras
|
||||
uv pip install -e ".[all]"
|
||||
```
|
||||
|
||||
If you only want the core agent (no Telegram/Discord/cron support):
|
||||
```bash
|
||||
uv pip install -e "."
|
||||
```
|
||||
|
||||
<details>
|
||||
<summary><strong>Optional extras breakdown</strong></summary>
|
||||
|
||||
| Extra | What it adds | Install command |
|
||||
|-------|-------------|-----------------|
|
||||
| `all` | Everything below | `uv pip install -e ".[all]"` |
|
||||
| `messaging` | Telegram & Discord gateway | `uv pip install -e ".[messaging]"` |
|
||||
| `cron` | Cron expression parsing for scheduled tasks | `uv pip install -e ".[cron]"` |
|
||||
| `cli` | Terminal menu UI for setup wizard | `uv pip install -e ".[cli]"` |
|
||||
| `modal` | Modal cloud execution backend (swe-rex + modal + boto3) | `uv pip install -e ".[modal]"` |
|
||||
| `dev` | pytest & test utilities | `uv pip install -e ".[dev]"` |
|
||||
|
||||
You can combine extras: `uv pip install -e ".[messaging,cron]"`
|
||||
|
||||
</details>
|
||||
|
||||
---
|
||||
|
||||
### Step 4: Install Submodule Packages
|
||||
|
||||
These are local packages checked out as Git submodules. Install them in editable mode:
|
||||
|
||||
```bash
|
||||
# Terminal tool backend (required for the terminal/command-execution tool)
|
||||
uv pip install -e "./mini-swe-agent"
|
||||
|
||||
# RL training backend
|
||||
uv pip install -e "./tinker-atropos"
|
||||
```
|
||||
|
||||
Both are optional — if you skip them, the corresponding toolsets simply won't be available.
|
||||
|
||||
---
|
||||
|
||||
### Step 5: Install Node.js Dependencies (Optional)
|
||||
|
||||
Only needed if you plan to use the **browser automation** toolset (Browserbase-powered):
|
||||
|
||||
```bash
|
||||
npm install
|
||||
```
|
||||
|
||||
This installs the `agent-browser` package defined in `package.json`. Skip this step if you don't need browser tools.
|
||||
|
||||
---
|
||||
|
||||
### Step 6: Create the Configuration Directory
|
||||
|
||||
Hermes stores all user configuration in `~/.hermes/`:
|
||||
|
||||
```bash
|
||||
# Create the directory structure
|
||||
mkdir -p ~/.hermes/{cron,sessions,logs}
|
||||
|
||||
# Copy the example config file
|
||||
cp cli-config.yaml.example ~/.hermes/config.yaml
|
||||
|
||||
# Create an empty .env file for API keys
|
||||
touch ~/.hermes/.env
|
||||
```
|
||||
|
||||
Your `~/.hermes/` directory should now look like:
|
||||
```
|
||||
~/.hermes/
|
||||
├── config.yaml # Agent settings (model, terminal, toolsets, compression, etc.)
|
||||
├── .env # API keys and secrets (one per line: KEY=value)
|
||||
├── cron/ # Scheduled job data
|
||||
├── sessions/ # Messaging gateway sessions
|
||||
└── logs/ # Conversation logs
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Step 7: Add Your API Keys
|
||||
|
||||
Open `~/.hermes/.env` in your editor and add at minimum an LLM provider key:
|
||||
|
||||
```bash
|
||||
# Required — at least one LLM provider:
|
||||
OPENROUTER_API_KEY=sk-or-v1-your-key-here
|
||||
|
||||
# Optional — enable additional tools:
|
||||
FIRECRAWL_API_KEY=fc-your-key # Web search & scraping
|
||||
BROWSERBASE_API_KEY=bb-your-key # Browser automation
|
||||
BROWSERBASE_PROJECT_ID=your-project-id # Browser automation
|
||||
FAL_KEY=your-fal-key # Image generation (FLUX)
|
||||
TINKER_API_KEY=your-tinker-key # RL training
|
||||
WANDB_API_KEY=your-wandb-key # RL training metrics
|
||||
|
||||
# Optional — messaging gateway:
|
||||
TELEGRAM_BOT_TOKEN=123456:ABC-DEF # From @BotFather
|
||||
TELEGRAM_ALLOWED_USERS=your-user-id # Comma-separated
|
||||
DISCORD_BOT_TOKEN=MTIz... # From Developer Portal
|
||||
DISCORD_ALLOWED_USERS=your-user-id # Comma-separated
|
||||
```
|
||||
|
||||
Or set them one at a time via the CLI:
|
||||
```bash
|
||||
hermes config set OPENROUTER_API_KEY sk-or-v1-your-key-here
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Step 8: Add `hermes` to Your PATH
|
||||
|
||||
The `hermes` entry point at `venv/bin/hermes` has a hardcoded shebang pointing to the venv's Python, so it works **without activating the venv**. The recommended approach is a symlink into `~/.local/bin` (most distributions already have this on PATH):
|
||||
|
||||
```bash
|
||||
mkdir -p ~/.local/bin
|
||||
ln -sf "$(pwd)/venv/bin/hermes" ~/.local/bin/hermes
|
||||
```
|
||||
|
||||
If `~/.local/bin` isn't on your PATH yet, add it:
|
||||
|
||||
**Bash** (`~/.bashrc`):
|
||||
```bash
|
||||
echo '' >> ~/.bashrc
|
||||
echo '# Hermes Agent' >> ~/.bashrc
|
||||
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
|
||||
source ~/.bashrc
|
||||
```
|
||||
|
||||
**Zsh** (`~/.zshrc`):
|
||||
```bash
|
||||
echo '' >> ~/.zshrc
|
||||
echo '# Hermes Agent' >> ~/.zshrc
|
||||
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.zshrc
|
||||
source ~/.zshrc
|
||||
```
|
||||
|
||||
**Fish** (`~/.config/fish/config.fish`):
|
||||
```fish
|
||||
fish_add_path $HOME/.local/bin
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Step 9: Run the Setup Wizard (Optional)
|
||||
|
||||
The interactive setup wizard walks you through configuring your API keys and preferences:
|
||||
|
||||
```bash
|
||||
hermes setup
|
||||
```
|
||||
|
||||
This is optional if you already configured `~/.hermes/.env` and `~/.hermes/config.yaml` manually in the steps above.
|
||||
|
||||
---
|
||||
|
||||
### Step 10: Verify the Installation
|
||||
|
||||
```bash
|
||||
# Check that the command is available
|
||||
hermes version
|
||||
|
||||
# Run diagnostics to verify everything is working
|
||||
hermes doctor
|
||||
|
||||
# Check your configuration
|
||||
hermes status
|
||||
|
||||
# Test with a quick query
|
||||
hermes chat -q "Hello! What tools do you have available?"
|
||||
```
|
||||
|
||||
If `hermes doctor` reports issues, it will tell you exactly what's missing and how to fix it.
|
||||
|
||||
---
|
||||
|
||||
### Quick-Reference: Manual Install (Condensed)
|
||||
|
||||
For those who just want the commands without the explanations:
|
||||
|
||||
```bash
|
||||
# Install uv (if not already installed)
|
||||
curl -LsSf https://astral.sh/uv/install.sh | sh
|
||||
|
||||
# Clone & enter
|
||||
git clone --recurse-submodules https://github.com/NousResearch/hermes-agent.git
|
||||
cd hermes-agent
|
||||
|
||||
# Run setup script
|
||||
./setup-hermes.sh
|
||||
# Create venv with Python 3.11 (uv downloads it if needed)
|
||||
uv venv venv --python 3.11
|
||||
export VIRTUAL_ENV="$(pwd)/venv"
|
||||
|
||||
# Or manually:
|
||||
python3 -m venv venv
|
||||
source venv/bin/activate
|
||||
pip install -e ".[all]"
|
||||
# Install everything
|
||||
uv pip install -e ".[all]"
|
||||
uv pip install -e "./mini-swe-agent"
|
||||
uv pip install -e "./tinker-atropos"
|
||||
npm install # optional, for browser tools
|
||||
|
||||
# Install submodules (required for terminal and RL tools)
|
||||
pip install -e "./mini-swe-agent" # Terminal tool backend
|
||||
pip install -e "./tinker-atropos" # RL training backend
|
||||
# Configure
|
||||
mkdir -p ~/.hermes/{cron,sessions,logs}
|
||||
cp cli-config.yaml.example ~/.hermes/config.yaml
|
||||
touch ~/.hermes/.env
|
||||
echo 'OPENROUTER_API_KEY=sk-or-v1-your-key' >> ~/.hermes/.env
|
||||
|
||||
hermes setup
|
||||
# Make hermes available globally (no venv activation needed)
|
||||
mkdir -p ~/.local/bin
|
||||
ln -sf "$(pwd)/venv/bin/hermes" ~/.local/bin/hermes
|
||||
|
||||
# Verify
|
||||
hermes doctor
|
||||
hermes
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Updating a Manual Installation
|
||||
|
||||
To update an existing manual install to the latest version:
|
||||
|
||||
```bash
|
||||
cd /path/to/hermes-agent
|
||||
export VIRTUAL_ENV="$(pwd)/venv"
|
||||
|
||||
# Pull latest code and submodules
|
||||
git pull origin main
|
||||
git submodule update --init --recursive
|
||||
|
||||
# Reinstall (picks up new dependencies)
|
||||
uv pip install -e ".[all]"
|
||||
uv pip install -e "./mini-swe-agent"
|
||||
uv pip install -e "./tinker-atropos"
|
||||
|
||||
# Check for new config options added since your last update
|
||||
hermes config check
|
||||
hermes config migrate # Interactively add any missing options
|
||||
```
|
||||
|
||||
### Uninstalling a Manual Installation
|
||||
|
||||
```bash
|
||||
# Remove the hermes symlink
|
||||
rm -f ~/.local/bin/hermes
|
||||
|
||||
# Remove the cloned repository
|
||||
rm -rf /path/to/hermes-agent
|
||||
|
||||
# Remove user configuration (optional — keep if you plan to reinstall)
|
||||
rm -rf ~/.hermes
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
Binary file not shown.
Binary file not shown.
28
environments/__init__.py
Normal file
28
environments/__init__.py
Normal file
@@ -0,0 +1,28 @@
|
||||
"""
|
||||
Hermes-Agent Atropos Environments
|
||||
|
||||
Provides a layered integration between hermes-agent's tool-calling capabilities
|
||||
and the Atropos RL training framework.
|
||||
|
||||
Layers:
|
||||
- agent_loop: Reusable multi-turn agent loop with standard OpenAI-spec tool calling
|
||||
- tool_context: Per-rollout tool access handle for reward/verification functions
|
||||
- hermes_base_env: Abstract base environment (BaseEnv subclass) for Atropos
|
||||
- tool_call_parsers: Client-side tool call parser registry for Phase 2 (VLLM /generate)
|
||||
|
||||
Concrete environments:
|
||||
- terminal_test_env: Simple file-creation tasks for testing the stack
|
||||
- hermes_swe_env: SWE-bench style tasks with Modal sandboxes
|
||||
"""
|
||||
|
||||
from environments.agent_loop import AgentResult, HermesAgentLoop
|
||||
from environments.tool_context import ToolContext
|
||||
from environments.hermes_base_env import HermesAgentBaseEnv, HermesAgentEnvConfig
|
||||
|
||||
__all__ = [
|
||||
"AgentResult",
|
||||
"HermesAgentLoop",
|
||||
"ToolContext",
|
||||
"HermesAgentBaseEnv",
|
||||
"HermesAgentEnvConfig",
|
||||
]
|
||||
372
environments/agent_loop.py
Normal file
372
environments/agent_loop.py
Normal file
@@ -0,0 +1,372 @@
|
||||
"""
|
||||
HermesAgentLoop -- Reusable Multi-Turn Agent Engine
|
||||
|
||||
Runs the hermes-agent tool-calling loop using standard OpenAI-spec tool calling.
|
||||
Works with any server that returns ChatCompletion objects with tool_calls:
|
||||
- Phase 1: OpenAI server type (VLLM, SGLang, OpenRouter, OpenAI API)
|
||||
- Phase 2: ManagedServer with client-side tool call parser
|
||||
|
||||
The loop passes tools= and checks response.choices[0].message.tool_calls,
|
||||
identical to hermes-agent's run_agent.py. Tool execution is dispatched via
|
||||
handle_function_call() from model_tools.py.
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import concurrent.futures
|
||||
import json
|
||||
import logging
|
||||
import uuid
|
||||
from dataclasses import dataclass, field
|
||||
from typing import Any, Dict, List, Optional, Set
|
||||
|
||||
from model_tools import handle_function_call
|
||||
|
||||
# Thread pool for running sync tool calls that internally use asyncio.run()
|
||||
# (e.g., mini-swe-agent's modal/docker backends). Running them in a separate
|
||||
# thread gives them a clean event loop so they don't deadlock inside Atropos's loop.
|
||||
_tool_executor = concurrent.futures.ThreadPoolExecutor(max_workers=8)
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
@dataclass
|
||||
class ToolError:
|
||||
"""Record of a tool execution error during the agent loop."""
|
||||
|
||||
turn: int # Which turn the error occurred on
|
||||
tool_name: str # Which tool was called
|
||||
arguments: str # The arguments passed (truncated)
|
||||
error: str # The error message
|
||||
tool_result: str # The raw result returned to the model
|
||||
|
||||
|
||||
@dataclass
|
||||
class AgentResult:
|
||||
"""Result of running the agent loop."""
|
||||
|
||||
# Full conversation history in OpenAI message format
|
||||
messages: List[Dict[str, Any]]
|
||||
# ManagedServer.get_state() if available (Phase 2), None otherwise
|
||||
managed_state: Optional[Dict[str, Any]] = None
|
||||
# How many LLM calls were made
|
||||
turns_used: int = 0
|
||||
# True if model stopped calling tools naturally (vs hitting max_turns)
|
||||
finished_naturally: bool = False
|
||||
# Extracted reasoning content per turn (from PR #297 helpers)
|
||||
reasoning_per_turn: List[Optional[str]] = field(default_factory=list)
|
||||
# Tool errors encountered during the loop
|
||||
tool_errors: List[ToolError] = field(default_factory=list)
|
||||
|
||||
|
||||
def _extract_reasoning_from_message(message) -> Optional[str]:
|
||||
"""
|
||||
Extract reasoning content from a ChatCompletion message.
|
||||
|
||||
Handles multiple provider formats:
|
||||
1. message.reasoning_content field (some providers)
|
||||
2. message.reasoning field (some providers)
|
||||
3. message.reasoning_details[].text (OpenRouter style)
|
||||
|
||||
Note: <think> block extraction from content is NOT done here -- that's
|
||||
handled by the response already in Phase 1 (server does it) or by
|
||||
ManagedServer's patch in Phase 2.
|
||||
|
||||
Args:
|
||||
message: The assistant message from ChatCompletion response
|
||||
|
||||
Returns:
|
||||
Extracted reasoning text, or None if not found
|
||||
"""
|
||||
# Check reasoning_content field (common across providers)
|
||||
if hasattr(message, "reasoning_content") and message.reasoning_content:
|
||||
return message.reasoning_content
|
||||
|
||||
# Check reasoning field
|
||||
if hasattr(message, "reasoning") and message.reasoning:
|
||||
return message.reasoning
|
||||
|
||||
# Check reasoning_details (OpenRouter style)
|
||||
if hasattr(message, "reasoning_details") and message.reasoning_details:
|
||||
for detail in message.reasoning_details:
|
||||
if hasattr(detail, "text") and detail.text:
|
||||
return detail.text
|
||||
if isinstance(detail, dict) and detail.get("text"):
|
||||
return detail["text"]
|
||||
|
||||
return None
|
||||
|
||||
|
||||
class HermesAgentLoop:
|
||||
"""
|
||||
Runs hermes-agent's tool-calling loop using standard OpenAI-spec tool calling.
|
||||
|
||||
Same pattern as run_agent.py:
|
||||
- Pass tools= to the API
|
||||
- Check response.choices[0].message.tool_calls
|
||||
- Dispatch via handle_function_call()
|
||||
|
||||
Works identically with any server type -- OpenAI, VLLM, SGLang, OpenRouter,
|
||||
or ManagedServer with a parser. The server determines how tool_calls get
|
||||
populated on the response.
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
server,
|
||||
tool_schemas: List[Dict[str, Any]],
|
||||
valid_tool_names: Set[str],
|
||||
max_turns: int = 30,
|
||||
task_id: Optional[str] = None,
|
||||
temperature: float = 1.0,
|
||||
max_tokens: Optional[int] = None,
|
||||
):
|
||||
"""
|
||||
Initialize the agent loop.
|
||||
|
||||
Args:
|
||||
server: Server object with chat_completion() method (OpenAIServer,
|
||||
ManagedServer, ServerManager, etc.)
|
||||
tool_schemas: OpenAI-format tool definitions from get_tool_definitions()
|
||||
valid_tool_names: Set of tool names the model is allowed to call
|
||||
max_turns: Maximum number of LLM calls before stopping
|
||||
task_id: Unique ID for terminal/browser session isolation
|
||||
temperature: Sampling temperature for generation
|
||||
max_tokens: Max tokens per generation (None for server default)
|
||||
"""
|
||||
self.server = server
|
||||
self.tool_schemas = tool_schemas
|
||||
self.valid_tool_names = valid_tool_names
|
||||
self.max_turns = max_turns
|
||||
self.task_id = task_id or str(uuid.uuid4())
|
||||
self.temperature = temperature
|
||||
self.max_tokens = max_tokens
|
||||
|
||||
async def run(self, messages: List[Dict[str, Any]]) -> AgentResult:
|
||||
"""
|
||||
Execute the full agent loop using standard OpenAI tool calling.
|
||||
|
||||
Args:
|
||||
messages: Initial conversation messages (system + user).
|
||||
Modified in-place as the conversation progresses.
|
||||
|
||||
Returns:
|
||||
AgentResult with full conversation history, managed state, and metadata
|
||||
"""
|
||||
reasoning_per_turn = []
|
||||
tool_errors: List[ToolError] = []
|
||||
|
||||
for turn in range(self.max_turns):
|
||||
# Build the chat_completion kwargs
|
||||
chat_kwargs = {
|
||||
"messages": messages,
|
||||
"n": 1,
|
||||
"temperature": self.temperature,
|
||||
}
|
||||
|
||||
# Only pass tools if we have them
|
||||
if self.tool_schemas:
|
||||
chat_kwargs["tools"] = self.tool_schemas
|
||||
|
||||
# Only pass max_tokens if explicitly set
|
||||
if self.max_tokens is not None:
|
||||
chat_kwargs["max_tokens"] = self.max_tokens
|
||||
|
||||
# Make the API call -- standard OpenAI spec
|
||||
try:
|
||||
response = await self.server.chat_completion(**chat_kwargs)
|
||||
except Exception as e:
|
||||
logger.error("API call failed on turn %d: %s", turn + 1, e)
|
||||
return AgentResult(
|
||||
messages=messages,
|
||||
managed_state=self._get_managed_state(),
|
||||
turns_used=turn + 1,
|
||||
finished_naturally=False,
|
||||
reasoning_per_turn=reasoning_per_turn,
|
||||
tool_errors=tool_errors,
|
||||
)
|
||||
|
||||
if not response or not response.choices:
|
||||
logger.warning("Empty response on turn %d", turn + 1)
|
||||
return AgentResult(
|
||||
messages=messages,
|
||||
managed_state=self._get_managed_state(),
|
||||
turns_used=turn + 1,
|
||||
finished_naturally=False,
|
||||
reasoning_per_turn=reasoning_per_turn,
|
||||
tool_errors=tool_errors,
|
||||
)
|
||||
|
||||
assistant_msg = response.choices[0].message
|
||||
|
||||
# Extract reasoning content from the response (all provider formats)
|
||||
reasoning = _extract_reasoning_from_message(assistant_msg)
|
||||
reasoning_per_turn.append(reasoning)
|
||||
|
||||
# Check for tool calls -- standard OpenAI spec
|
||||
if assistant_msg.tool_calls:
|
||||
# Build the assistant message dict for conversation history
|
||||
msg_dict: Dict[str, Any] = {
|
||||
"role": "assistant",
|
||||
"content": assistant_msg.content or "",
|
||||
"tool_calls": [
|
||||
{
|
||||
"id": tc.id,
|
||||
"type": "function",
|
||||
"function": {
|
||||
"name": tc.function.name,
|
||||
"arguments": tc.function.arguments,
|
||||
},
|
||||
}
|
||||
for tc in assistant_msg.tool_calls
|
||||
],
|
||||
}
|
||||
|
||||
# Preserve reasoning_content for multi-turn chat template handling
|
||||
# (e.g., Kimi-K2's template renders <think> blocks differently
|
||||
# for history vs. the latest turn based on this field)
|
||||
if reasoning:
|
||||
msg_dict["reasoning_content"] = reasoning
|
||||
|
||||
messages.append(msg_dict)
|
||||
|
||||
# Execute each tool call via hermes-agent's dispatch
|
||||
for tc in assistant_msg.tool_calls:
|
||||
tool_name = tc.function.name
|
||||
tool_args_raw = tc.function.arguments
|
||||
|
||||
# Validate tool name
|
||||
if tool_name not in self.valid_tool_names:
|
||||
tool_result = json.dumps(
|
||||
{
|
||||
"error": f"Unknown tool '{tool_name}'. "
|
||||
f"Available tools: {sorted(self.valid_tool_names)}"
|
||||
}
|
||||
)
|
||||
tool_errors.append(ToolError(
|
||||
turn=turn + 1, tool_name=tool_name,
|
||||
arguments=tool_args_raw[:200],
|
||||
error=f"Unknown tool '{tool_name}'",
|
||||
tool_result=tool_result,
|
||||
))
|
||||
logger.warning(
|
||||
"Model called unknown tool '%s' on turn %d",
|
||||
tool_name, turn + 1,
|
||||
)
|
||||
else:
|
||||
# Parse arguments and dispatch
|
||||
try:
|
||||
args = json.loads(tool_args_raw)
|
||||
except json.JSONDecodeError:
|
||||
args = {}
|
||||
logger.warning(
|
||||
"Invalid JSON in tool call arguments for '%s': %s",
|
||||
tool_name, tool_args_raw[:200],
|
||||
)
|
||||
|
||||
try:
|
||||
if tool_name == "terminal":
|
||||
import os
|
||||
backend = os.getenv("TERMINAL_ENV", "local")
|
||||
cmd_preview = args.get("command", "")[:80]
|
||||
print(f" 🖥️ [{backend}] $ {cmd_preview}")
|
||||
|
||||
# Run tool calls in a thread pool so backends that use
|
||||
# asyncio.run() internally (modal, docker) get a clean
|
||||
# event loop instead of deadlocking inside Atropos's loop.
|
||||
loop = asyncio.get_event_loop()
|
||||
tool_result = await loop.run_in_executor(
|
||||
_tool_executor,
|
||||
lambda: handle_function_call(
|
||||
tool_name, args, task_id=self.task_id
|
||||
),
|
||||
)
|
||||
except Exception as e:
|
||||
tool_result = json.dumps(
|
||||
{"error": f"Tool execution failed: {type(e).__name__}: {str(e)}"}
|
||||
)
|
||||
tool_errors.append(ToolError(
|
||||
turn=turn + 1, tool_name=tool_name,
|
||||
arguments=tool_args_raw[:200],
|
||||
error=f"{type(e).__name__}: {str(e)}",
|
||||
tool_result=tool_result,
|
||||
))
|
||||
logger.error(
|
||||
"Tool '%s' execution failed on turn %d: %s",
|
||||
tool_name, turn + 1, e,
|
||||
)
|
||||
|
||||
# Also check if the tool returned an error in its JSON result
|
||||
try:
|
||||
result_data = json.loads(tool_result)
|
||||
if isinstance(result_data, dict):
|
||||
err = result_data.get("error")
|
||||
exit_code = result_data.get("exit_code")
|
||||
if err and exit_code and exit_code < 0:
|
||||
tool_errors.append(ToolError(
|
||||
turn=turn + 1, tool_name=tool_name,
|
||||
arguments=tool_args_raw[:200],
|
||||
error=str(err),
|
||||
tool_result=tool_result[:500],
|
||||
))
|
||||
except (json.JSONDecodeError, TypeError):
|
||||
pass
|
||||
|
||||
# Add tool response to conversation
|
||||
messages.append(
|
||||
{
|
||||
"role": "tool",
|
||||
"tool_call_id": tc.id,
|
||||
"content": tool_result,
|
||||
}
|
||||
)
|
||||
|
||||
logger.debug(
|
||||
"Turn %d: %d tool calls executed",
|
||||
turn + 1,
|
||||
len(assistant_msg.tool_calls),
|
||||
)
|
||||
|
||||
else:
|
||||
# No tool calls -- model is done
|
||||
msg_dict = {
|
||||
"role": "assistant",
|
||||
"content": assistant_msg.content or "",
|
||||
}
|
||||
if reasoning:
|
||||
msg_dict["reasoning_content"] = reasoning
|
||||
messages.append(msg_dict)
|
||||
|
||||
logger.debug(
|
||||
"Turn %d: model finished naturally (no tool calls)", turn + 1
|
||||
)
|
||||
|
||||
return AgentResult(
|
||||
messages=messages,
|
||||
managed_state=self._get_managed_state(),
|
||||
turns_used=turn + 1,
|
||||
finished_naturally=True,
|
||||
reasoning_per_turn=reasoning_per_turn,
|
||||
tool_errors=tool_errors,
|
||||
)
|
||||
|
||||
# Hit max turns without the model stopping
|
||||
logger.info("Agent hit max_turns (%d) without finishing", self.max_turns)
|
||||
return AgentResult(
|
||||
messages=messages,
|
||||
managed_state=self._get_managed_state(),
|
||||
turns_used=self.max_turns,
|
||||
finished_naturally=False,
|
||||
reasoning_per_turn=reasoning_per_turn,
|
||||
tool_errors=tool_errors,
|
||||
)
|
||||
|
||||
def _get_managed_state(self) -> Optional[Dict[str, Any]]:
|
||||
"""
|
||||
Get ManagedServer state if the server supports it.
|
||||
|
||||
Returns state dict with SequenceNodes containing tokens/logprobs/masks,
|
||||
or None if the server doesn't support get_state() (e.g., regular OpenAI server).
|
||||
"""
|
||||
if hasattr(self.server, "get_state"):
|
||||
return self.server.get_state()
|
||||
return None
|
||||
33
environments/configs/swe_default.yaml
Normal file
33
environments/configs/swe_default.yaml
Normal file
@@ -0,0 +1,33 @@
|
||||
# SWE Environment -- Default Configuration
|
||||
#
|
||||
# SWE-bench style tasks with Modal sandboxes for cloud isolation.
|
||||
# Uses terminal + file + web toolsets.
|
||||
#
|
||||
# Usage:
|
||||
# python environments/hermes_swe_env.py serve --config environments/configs/swe_default.yaml
|
||||
|
||||
env:
|
||||
enabled_toolsets: ["terminal", "file", "web"]
|
||||
max_agent_turns: 30
|
||||
max_token_length: 4096
|
||||
group_size: 4
|
||||
terminal_backend: "modal"
|
||||
tool_call_parser: "hermes"
|
||||
tokenizer_name: "NousResearch/DeepHermes-3-Llama-3-3B-Preview"
|
||||
dataset_name: "bigcode/humanevalpack"
|
||||
dataset_split: "test"
|
||||
prompt_field: "prompt"
|
||||
steps_per_eval: 50
|
||||
total_steps: 500
|
||||
use_wandb: true
|
||||
wandb_name: "hermes-swe"
|
||||
system_prompt: >
|
||||
You are a skilled software engineer. You have access to a terminal,
|
||||
file tools, and web search. Use these tools to complete the coding task.
|
||||
Write clean, working code and verify it runs correctly before finishing.
|
||||
|
||||
openai:
|
||||
base_url: "http://localhost:8000/v1"
|
||||
model_name: "NousResearch/DeepHermes-3-Llama-3-3B-Preview"
|
||||
server_type: "openai"
|
||||
api_key: ""
|
||||
35
environments/configs/terminal_test_default.yaml
Normal file
35
environments/configs/terminal_test_default.yaml
Normal file
@@ -0,0 +1,35 @@
|
||||
# Terminal Test Environment -- Default Configuration
|
||||
#
|
||||
# Simple file-creation tasks for validating the full Atropos + hermes-agent stack.
|
||||
# Uses Modal terminal backend and OpenRouter (Claude) for inference.
|
||||
# API keys loaded from ~/hermes-agent/.env
|
||||
#
|
||||
# Usage:
|
||||
# run-api
|
||||
# python environments/terminal_test_env.py serve
|
||||
# # Or with config file:
|
||||
# python environments/terminal_test_env.py serve --config environments/configs/terminal_test_default.yaml
|
||||
|
||||
env:
|
||||
enabled_toolsets: ["terminal", "file"]
|
||||
max_agent_turns: 10
|
||||
max_token_length: 2048
|
||||
group_size: 3
|
||||
total_steps: 3
|
||||
steps_per_eval: 3
|
||||
terminal_backend: "modal"
|
||||
tool_call_parser: "hermes"
|
||||
tokenizer_name: "NousResearch/DeepHermes-3-Llama-3-3B-Preview"
|
||||
ensure_scores_are_not_same: false
|
||||
use_wandb: false
|
||||
system_prompt: >
|
||||
You are a helpful assistant with access to a terminal and file tools.
|
||||
Complete the user's request by using the available tools.
|
||||
Be precise and follow instructions exactly.
|
||||
|
||||
openai:
|
||||
base_url: "https://openrouter.ai/api/v1"
|
||||
model_name: "anthropic/claude-opus-4.6"
|
||||
server_type: "openai"
|
||||
health_check: false
|
||||
# api_key loaded from OPENROUTER_API_KEY in .env
|
||||
615
environments/hermes_base_env.py
Normal file
615
environments/hermes_base_env.py
Normal file
@@ -0,0 +1,615 @@
|
||||
"""
|
||||
HermesAgentBaseEnv -- Abstract Base Environment for Hermes-Agent + Atropos
|
||||
|
||||
Provides the Atropos integration plumbing that all hermes-agent environments share:
|
||||
- Two-mode operation (OpenAI server for Phase 1, VLLM ManagedServer for Phase 2)
|
||||
- Per-group toolset/distribution resolution
|
||||
- Agent loop orchestration via HermesAgentLoop
|
||||
- ToolContext creation for reward functions
|
||||
- ScoredDataGroup construction from ManagedServer state
|
||||
|
||||
Subclasses only need to implement:
|
||||
setup() -- Load dataset, initialize state
|
||||
get_next_item() -- Return the next item from the dataset
|
||||
format_prompt() -- Convert a dataset item into the user message
|
||||
compute_reward() -- Score the rollout (has full ToolContext access)
|
||||
evaluate() -- Periodic evaluation
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
import sys
|
||||
import uuid
|
||||
from abc import abstractmethod
|
||||
from pathlib import Path
|
||||
from typing import Any, Dict, List, Optional, Set, Tuple, Union
|
||||
|
||||
# Ensure the hermes-agent repo root is on sys.path so that imports like
|
||||
# `from model_tools import ...` and `from environments.X import ...` work
|
||||
# regardless of where the script is invoked from.
|
||||
_repo_root = Path(__file__).resolve().parent.parent
|
||||
if str(_repo_root) not in sys.path:
|
||||
sys.path.insert(0, str(_repo_root))
|
||||
|
||||
from dotenv import load_dotenv
|
||||
from pydantic import Field
|
||||
|
||||
# Load API keys from hermes-agent/.env so all environments can access them
|
||||
_env_path = _repo_root / ".env"
|
||||
if _env_path.exists():
|
||||
load_dotenv(dotenv_path=_env_path)
|
||||
|
||||
# Apply monkey patches for async-safe tool operation inside Atropos's event loop.
|
||||
# This patches SwerexModalEnvironment to use a background thread instead of
|
||||
# asyncio.run(), which would deadlock inside Atropos. Safe for normal CLI too.
|
||||
from environments.patches import apply_patches
|
||||
apply_patches()
|
||||
|
||||
from atroposlib.envs.base import (
|
||||
BaseEnv,
|
||||
BaseEnvConfig,
|
||||
ScoredDataGroup,
|
||||
ScoredDataItem,
|
||||
)
|
||||
from atroposlib.envs.server_handling.server_manager import (
|
||||
APIServerConfig,
|
||||
ServerBaseline,
|
||||
ServerManager,
|
||||
)
|
||||
from atroposlib.type_definitions import Item
|
||||
|
||||
from environments.agent_loop import AgentResult, HermesAgentLoop
|
||||
from environments.tool_context import ToolContext
|
||||
|
||||
# Import hermes-agent toolset infrastructure
|
||||
from model_tools import get_tool_definitions
|
||||
from toolset_distributions import sample_toolsets_from_distribution
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class HermesAgentEnvConfig(BaseEnvConfig):
|
||||
"""
|
||||
Configuration for hermes-agent Atropos environments.
|
||||
|
||||
Extends BaseEnvConfig with agent-specific settings for toolsets,
|
||||
terminal backend, dataset loading, and tool call parsing.
|
||||
"""
|
||||
|
||||
# --- Toolset configuration ---
|
||||
# Mutually exclusive: use either enabled_toolsets OR distribution
|
||||
enabled_toolsets: Optional[List[str]] = Field(
|
||||
default=None,
|
||||
description="Explicit list of hermes toolsets to enable (e.g., ['terminal', 'file', 'web']). "
|
||||
"If None and distribution is also None, all available toolsets are enabled.",
|
||||
)
|
||||
disabled_toolsets: Optional[List[str]] = Field(
|
||||
default=None,
|
||||
description="Toolsets to disable. Applied as a filter on top of enabled_toolsets or distribution.",
|
||||
)
|
||||
distribution: Optional[str] = Field(
|
||||
default=None,
|
||||
description="Name of a toolset distribution from toolset_distributions.py "
|
||||
"(e.g., 'development', 'terminal_tasks'). Sampled once per group. "
|
||||
"Mutually exclusive with enabled_toolsets.",
|
||||
)
|
||||
|
||||
# --- Agent loop configuration ---
|
||||
max_agent_turns: int = Field(
|
||||
default=30,
|
||||
description="Maximum number of LLM calls (tool-calling iterations) per rollout.",
|
||||
)
|
||||
system_prompt: Optional[str] = Field(
|
||||
default=None,
|
||||
description="System prompt for the agent. Tools are handled via the tools= parameter, "
|
||||
"not embedded in the prompt text.",
|
||||
)
|
||||
agent_temperature: float = Field(
|
||||
default=1.0,
|
||||
description="Sampling temperature for agent generation during rollouts.",
|
||||
)
|
||||
|
||||
# --- Terminal backend ---
|
||||
terminal_backend: str = Field(
|
||||
default="local",
|
||||
description="Terminal backend: 'local', 'docker', 'modal', 'ssh', 'singularity'. "
|
||||
"Modal recommended for production RL (cloud isolation per rollout).",
|
||||
)
|
||||
|
||||
# --- Dataset ---
|
||||
dataset_name: Optional[str] = Field(
|
||||
default=None,
|
||||
description="HuggingFace dataset name. Optional if tasks are defined inline.",
|
||||
)
|
||||
dataset_split: str = Field(
|
||||
default="train",
|
||||
description="Dataset split to use.",
|
||||
)
|
||||
prompt_field: str = Field(
|
||||
default="prompt",
|
||||
description="Which field in the dataset contains the prompt.",
|
||||
)
|
||||
|
||||
# --- Phase 2: Tool call parsing ---
|
||||
tool_call_parser: str = Field(
|
||||
default="hermes",
|
||||
description="Tool call parser name for Phase 2 (VLLM server type). "
|
||||
"Ignored in Phase 1 (OpenAI server type where VLLM parses natively). "
|
||||
"Options: hermes, mistral, llama3_json, qwen, deepseek_v3, etc.",
|
||||
)
|
||||
|
||||
|
||||
class HermesAgentBaseEnv(BaseEnv):
|
||||
"""
|
||||
Abstract base environment for hermes-agent Atropos integration.
|
||||
|
||||
Handles two modes of operation:
|
||||
- Phase 1 (OpenAI server type): Uses server.chat_completion() directly.
|
||||
The server (VLLM, SGLang, OpenRouter, OpenAI) handles tool call parsing
|
||||
and reasoning extraction natively. DummyManagedServer provides placeholder
|
||||
tokens. Good for SFT data gen, verifier testing, evaluation.
|
||||
|
||||
- Phase 2 (VLLM server type): Uses ManagedServer for exact token IDs + logprobs
|
||||
via /generate. Client-side tool call parser reconstructs structured tool_calls
|
||||
from raw output. Full RL training capability.
|
||||
|
||||
Subclasses must implement:
|
||||
setup() -- Load dataset, initialize state
|
||||
get_next_item() -- Return the next item to roll out
|
||||
format_prompt() -- Convert a dataset item into the user message string
|
||||
compute_reward() -- Score the rollout using ToolContext
|
||||
evaluate() -- Periodic evaluation
|
||||
"""
|
||||
|
||||
name: Optional[str] = "hermes-agent"
|
||||
env_config_cls = HermesAgentEnvConfig
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
config: HermesAgentEnvConfig,
|
||||
server_configs: Union[ServerBaseline, List[APIServerConfig]],
|
||||
slurm=False,
|
||||
testing=False,
|
||||
):
|
||||
super().__init__(config, server_configs, slurm, testing)
|
||||
|
||||
# Set terminal backend environment variable so hermes tools pick it up
|
||||
if config.terminal_backend:
|
||||
os.environ["TERMINAL_ENV"] = config.terminal_backend
|
||||
print(f"🖥️ Terminal backend: {config.terminal_backend}")
|
||||
|
||||
# Current group's resolved tools (set in collect_trajectories)
|
||||
self._current_group_tools: Optional[Tuple[List[Dict], Set[str]]] = None
|
||||
|
||||
# Tool error tracking for wandb logging
|
||||
self._tool_error_buffer: List[Dict[str, Any]] = []
|
||||
|
||||
# =========================================================================
|
||||
# Toolset resolution (per-group)
|
||||
# =========================================================================
|
||||
|
||||
def _resolve_tools_for_group(self) -> Tuple[List[Dict[str, Any]], Set[str]]:
|
||||
"""
|
||||
Resolve toolsets for a group. Called once in collect_trajectories(),
|
||||
then shared by all collect_trajectory() calls in the group.
|
||||
|
||||
If distribution is set, samples probabilistically.
|
||||
If enabled_toolsets is set, uses that explicit list.
|
||||
disabled_toolsets is applied as a filter on top.
|
||||
|
||||
Returns:
|
||||
(tool_schemas, valid_tool_names) tuple
|
||||
"""
|
||||
config = self.config
|
||||
|
||||
if config.distribution:
|
||||
group_toolsets = sample_toolsets_from_distribution(config.distribution)
|
||||
logger.info("Sampled toolsets from '%s': %s", config.distribution, group_toolsets)
|
||||
else:
|
||||
group_toolsets = config.enabled_toolsets # None means "all available"
|
||||
|
||||
tools = get_tool_definitions(
|
||||
enabled_toolsets=group_toolsets,
|
||||
disabled_toolsets=config.disabled_toolsets,
|
||||
quiet_mode=True,
|
||||
)
|
||||
|
||||
valid_names = {t["function"]["name"] for t in tools} if tools else set()
|
||||
logger.info("Resolved %d tools for group: %s", len(valid_names), sorted(valid_names))
|
||||
return tools, valid_names
|
||||
|
||||
# =========================================================================
|
||||
# Server mode detection
|
||||
# =========================================================================
|
||||
|
||||
def _use_managed_server(self) -> bool:
|
||||
"""
|
||||
Determine if we should use ManagedServer (Phase 2) or direct server (Phase 1).
|
||||
|
||||
Phase 2 (ManagedServer) is used when the server type is 'vllm' or 'sglang',
|
||||
which go through the /generate endpoint for exact token tracking.
|
||||
|
||||
Phase 1 (direct server) is used for 'openai' server type, which uses
|
||||
/v1/chat/completions with native tool call parsing.
|
||||
"""
|
||||
if not self.server.servers:
|
||||
return False
|
||||
|
||||
server = self.server.servers[0]
|
||||
# If the server is an OpenAI server (not VLLM/SGLang), use direct mode
|
||||
from atroposlib.envs.server_handling.openai_server import OpenAIServer
|
||||
return not isinstance(server, OpenAIServer)
|
||||
|
||||
# =========================================================================
|
||||
# Core Atropos integration
|
||||
# =========================================================================
|
||||
|
||||
async def collect_trajectories(
|
||||
self, item: Item
|
||||
) -> Tuple[
|
||||
Union[Optional[ScoredDataGroup], List[Optional[ScoredDataGroup]]],
|
||||
List[Item],
|
||||
]:
|
||||
"""
|
||||
Override collect_trajectories to resolve toolsets once per group,
|
||||
then delegate to the standard group-level collection.
|
||||
|
||||
The default BaseEnv.collect_trajectories() calls collect_trajectory()
|
||||
group_size times in parallel. We resolve tools once here and store
|
||||
them for all those calls to use.
|
||||
"""
|
||||
# Resolve toolsets for this group (shared by all rollouts in the group)
|
||||
self._current_group_tools = self._resolve_tools_for_group()
|
||||
|
||||
# Delegate to the default implementation which calls collect_trajectory()
|
||||
# group_size times via asyncio.gather
|
||||
return await super().collect_trajectories(item)
|
||||
|
||||
# =========================================================================
|
||||
# Wandb rollout display -- format trajectories nicely
|
||||
# =========================================================================
|
||||
|
||||
@staticmethod
|
||||
def _format_trajectory_for_display(messages: List[Dict[str, Any]]) -> str:
|
||||
"""
|
||||
Format a conversation's messages into a readable trajectory string
|
||||
for wandb rollout tables. Shows tool calls, tool results, and reasoning
|
||||
in a structured way instead of raw token decoding.
|
||||
"""
|
||||
parts = []
|
||||
for msg in messages:
|
||||
role = msg.get("role", "unknown")
|
||||
content = msg.get("content", "")
|
||||
|
||||
if role == "system":
|
||||
parts.append(f"[SYSTEM]\n{content}")
|
||||
|
||||
elif role == "user":
|
||||
parts.append(f"[USER]\n{content}")
|
||||
|
||||
elif role == "assistant":
|
||||
# Show reasoning if present
|
||||
reasoning = msg.get("reasoning_content", "")
|
||||
if reasoning:
|
||||
# Truncate long reasoning for display
|
||||
if len(reasoning) > 300:
|
||||
reasoning = reasoning[:300] + "..."
|
||||
parts.append(f"[ASSISTANT thinking]\n{reasoning}")
|
||||
|
||||
# Show content
|
||||
if content:
|
||||
parts.append(f"[ASSISTANT]\n{content}")
|
||||
|
||||
# Show tool calls
|
||||
tool_calls = msg.get("tool_calls", [])
|
||||
for tc in tool_calls:
|
||||
func = tc.get("function", {})
|
||||
name = func.get("name", "?")
|
||||
args = func.get("arguments", "{}")
|
||||
# Truncate long arguments for display
|
||||
if len(args) > 200:
|
||||
args = args[:200] + "..."
|
||||
parts.append(f"[TOOL CALL] {name}({args})")
|
||||
|
||||
elif role == "tool":
|
||||
tool_id = msg.get("tool_call_id", "")
|
||||
result = content
|
||||
# Truncate long tool results for display
|
||||
if len(result) > 500:
|
||||
result = result[:500] + "..."
|
||||
parts.append(f"[TOOL RESULT] {result}")
|
||||
|
||||
return "\n\n".join(parts)
|
||||
|
||||
async def add_rollouts_for_wandb(
|
||||
self,
|
||||
scored_data,
|
||||
item=None,
|
||||
):
|
||||
"""
|
||||
Override to show formatted trajectories with tool calls visible,
|
||||
instead of raw token decoding which loses all structure.
|
||||
"""
|
||||
num_keep = self.config.num_rollouts_per_group_for_logging
|
||||
if num_keep == -1:
|
||||
num_keep = self.config.group_size
|
||||
|
||||
group = []
|
||||
for i in range(min(num_keep, len(scored_data.get("scores", [])))):
|
||||
score = scored_data["scores"][i]
|
||||
|
||||
# Use messages if available for rich display
|
||||
messages = None
|
||||
if scored_data.get("messages") and i < len(scored_data["messages"]):
|
||||
messages = scored_data["messages"][i]
|
||||
|
||||
if messages:
|
||||
text = self._format_trajectory_for_display(messages)
|
||||
elif scored_data.get("tokens") and i < len(scored_data["tokens"]):
|
||||
text = self.tokenizer.decode(scored_data["tokens"][i])
|
||||
else:
|
||||
text = "(no data)"
|
||||
|
||||
group.append((text, score))
|
||||
|
||||
self.rollouts_for_wandb.append(group)
|
||||
if len(self.rollouts_for_wandb) > self.config.num_rollouts_to_keep:
|
||||
self.rollouts_for_wandb.pop(0)
|
||||
|
||||
async def wandb_log(self, wandb_metrics: Optional[Dict] = None):
|
||||
"""Log base metrics including tool errors to wandb."""
|
||||
if wandb_metrics is None:
|
||||
wandb_metrics = {}
|
||||
|
||||
# Log tool error stats
|
||||
if self._tool_error_buffer:
|
||||
wandb_metrics["train/tool_errors_count"] = len(self._tool_error_buffer)
|
||||
|
||||
# Log error details as a summary string (tables can crash wandb on tmp cleanup)
|
||||
error_summaries = []
|
||||
for err in self._tool_error_buffer:
|
||||
error_summaries.append(
|
||||
f"[turn {err['turn']}] {err['tool']}({err['args'][:80]}) -> {err['error'][:150]}"
|
||||
)
|
||||
wandb_metrics["train/tool_error_details"] = "\n".join(error_summaries)
|
||||
|
||||
# Also print to stdout for immediate visibility
|
||||
for summary in error_summaries:
|
||||
print(f" Tool Error: {summary}")
|
||||
|
||||
self._tool_error_buffer = []
|
||||
else:
|
||||
wandb_metrics["train/tool_errors_count"] = 0
|
||||
|
||||
await super().wandb_log(wandb_metrics)
|
||||
|
||||
async def collect_trajectory(
|
||||
self, item: Item
|
||||
) -> Tuple[Optional[Union[ScoredDataItem, Any]], List[Item]]:
|
||||
"""
|
||||
Run a single rollout: agent loop + reward computation.
|
||||
|
||||
This is called group_size times in parallel by collect_trajectories().
|
||||
Each call gets its own task_id for terminal/browser session isolation.
|
||||
"""
|
||||
task_id = str(uuid.uuid4())
|
||||
|
||||
# Get group-level tools (resolved once in collect_trajectories)
|
||||
if self._current_group_tools is None:
|
||||
# Fallback: resolve per-trajectory if called outside collect_trajectories
|
||||
tools, valid_names = self._resolve_tools_for_group()
|
||||
else:
|
||||
tools, valid_names = self._current_group_tools
|
||||
|
||||
# Build initial messages
|
||||
messages: List[Dict[str, Any]] = []
|
||||
if self.config.system_prompt:
|
||||
messages.append({"role": "system", "content": self.config.system_prompt})
|
||||
messages.append({"role": "user", "content": self.format_prompt(item)})
|
||||
|
||||
# Run the agent loop
|
||||
result: AgentResult
|
||||
if self._use_managed_server():
|
||||
# Phase 2: ManagedServer with parser -- exact tokens + logprobs
|
||||
# Load the tool call parser from registry based on config
|
||||
from environments.tool_call_parsers import get_parser
|
||||
try:
|
||||
tc_parser = get_parser(self.config.tool_call_parser)
|
||||
except KeyError:
|
||||
logger.warning(
|
||||
"Tool call parser '%s' not found, falling back to 'hermes'",
|
||||
self.config.tool_call_parser,
|
||||
)
|
||||
tc_parser = get_parser("hermes")
|
||||
|
||||
try:
|
||||
async with self.server.managed_server(
|
||||
tokenizer=self.tokenizer,
|
||||
tool_call_parser=tc_parser,
|
||||
) as managed:
|
||||
agent = HermesAgentLoop(
|
||||
server=managed,
|
||||
tool_schemas=tools,
|
||||
valid_tool_names=valid_names,
|
||||
max_turns=self.config.max_agent_turns,
|
||||
task_id=task_id,
|
||||
temperature=self.config.agent_temperature,
|
||||
max_tokens=self.config.max_token_length,
|
||||
)
|
||||
result = await agent.run(messages)
|
||||
except NotImplementedError:
|
||||
# DummyManagedServer not allowed -- fall back to Phase 1
|
||||
logger.warning(
|
||||
"ManagedServer not available (OpenAI server?). "
|
||||
"Falling back to direct server mode."
|
||||
)
|
||||
agent = HermesAgentLoop(
|
||||
server=self.server,
|
||||
tool_schemas=tools,
|
||||
valid_tool_names=valid_names,
|
||||
max_turns=self.config.max_agent_turns,
|
||||
task_id=task_id,
|
||||
temperature=self.config.agent_temperature,
|
||||
max_tokens=self.config.max_token_length,
|
||||
)
|
||||
result = await agent.run(messages)
|
||||
else:
|
||||
# Phase 1: OpenAI server -- native tool_calls, placeholder tokens
|
||||
agent = HermesAgentLoop(
|
||||
server=self.server,
|
||||
tool_schemas=tools,
|
||||
valid_tool_names=valid_names,
|
||||
max_turns=self.config.max_agent_turns,
|
||||
task_id=task_id,
|
||||
temperature=self.config.agent_temperature,
|
||||
max_tokens=self.config.max_token_length,
|
||||
)
|
||||
result = await agent.run(messages)
|
||||
|
||||
# Skip reward computation if the agent loop produced no meaningful work
|
||||
# (e.g., API call failed on turn 1). No point spinning up a Modal sandbox
|
||||
# just to verify files that were never created.
|
||||
only_system_and_user = all(
|
||||
msg.get("role") in ("system", "user") for msg in result.messages
|
||||
)
|
||||
if result.turns_used == 0 or only_system_and_user:
|
||||
logger.warning(
|
||||
"Agent loop produced no output (turns=%d, msgs=%d). Skipping reward.",
|
||||
result.turns_used, len(result.messages),
|
||||
)
|
||||
reward = 0.0
|
||||
else:
|
||||
# Compute reward using ToolContext (gives verifier full tool access)
|
||||
ctx = ToolContext(task_id)
|
||||
try:
|
||||
reward = await self.compute_reward(item, result, ctx)
|
||||
except Exception as e:
|
||||
logger.error("compute_reward failed: %s", e)
|
||||
reward = 0.0
|
||||
finally:
|
||||
ctx.cleanup()
|
||||
|
||||
# Track tool errors for wandb logging
|
||||
if result.tool_errors:
|
||||
for err in result.tool_errors:
|
||||
self._tool_error_buffer.append({
|
||||
"turn": err.turn,
|
||||
"tool": err.tool_name,
|
||||
"args": err.arguments[:150],
|
||||
"error": err.error[:300],
|
||||
"result": err.tool_result[:300],
|
||||
})
|
||||
|
||||
# Build ScoredDataItem from ManagedServer state
|
||||
# Phase 2: real tokens/masks/logprobs from SequenceNodes
|
||||
# Phase 1: placeholder tokens (still need a valid ScoredDataItem for the pipeline)
|
||||
nodes = (result.managed_state or {}).get("nodes", [])
|
||||
|
||||
if nodes:
|
||||
# Phase 2 (or DummyManagedServer): use actual node data
|
||||
node = nodes[-1] # Final sequence node = full trajectory
|
||||
scored_item: Dict[str, Any] = {
|
||||
"tokens": node.tokens,
|
||||
"masks": node.masked_tokens,
|
||||
"scores": reward,
|
||||
}
|
||||
|
||||
# Include logprobs if available (Phase 2)
|
||||
if hasattr(node, "logprobs") and node.logprobs:
|
||||
scored_item["advantages"] = None # Computed by trainer
|
||||
scored_item["ref_logprobs"] = None
|
||||
else:
|
||||
# Phase 1 with no managed state: create placeholder tokens
|
||||
# so the data pipeline doesn't break. These are NOT suitable
|
||||
# for training but allow process mode (SFT data gen) to work.
|
||||
# Tokenize the full conversation to get approximate tokens.
|
||||
full_text = "\n".join(
|
||||
msg.get("content", "") for msg in result.messages if msg.get("content")
|
||||
)
|
||||
if self.tokenizer:
|
||||
tokens = self.tokenizer.encode(full_text, add_special_tokens=True)
|
||||
else:
|
||||
tokens = list(range(min(len(full_text) // 4, 128)))
|
||||
|
||||
scored_item = {
|
||||
"tokens": tokens,
|
||||
"masks": [-100] + tokens[1:], # Mask first token as prompt
|
||||
"scores": reward,
|
||||
}
|
||||
|
||||
# Always include messages for wandb rollout display and data logging
|
||||
scored_item["messages"] = result.messages
|
||||
|
||||
return scored_item, []
|
||||
|
||||
# =========================================================================
|
||||
# Abstract methods -- subclasses must implement
|
||||
# =========================================================================
|
||||
|
||||
@abstractmethod
|
||||
async def setup(self):
|
||||
"""
|
||||
Load dataset, initialize state.
|
||||
|
||||
Called once when the environment starts. Typical implementation:
|
||||
self.dataset = load_dataset(self.config.dataset_name, split=self.config.dataset_split)
|
||||
self.iter = 0
|
||||
"""
|
||||
raise NotImplementedError
|
||||
|
||||
@abstractmethod
|
||||
async def get_next_item(self) -> Item:
|
||||
"""
|
||||
Return the next item from the dataset for rollout.
|
||||
|
||||
Called by the base env's main loop to get items for workers.
|
||||
Should cycle through the dataset.
|
||||
"""
|
||||
raise NotImplementedError
|
||||
|
||||
@abstractmethod
|
||||
def format_prompt(self, item: Item) -> str:
|
||||
"""
|
||||
Convert a dataset item into the user message for the agent.
|
||||
|
||||
Args:
|
||||
item: Dataset item (dict, tuple, etc.)
|
||||
|
||||
Returns:
|
||||
The prompt string to send to the agent
|
||||
"""
|
||||
raise NotImplementedError
|
||||
|
||||
@abstractmethod
|
||||
async def compute_reward(
|
||||
self, item: Item, result: AgentResult, ctx: ToolContext
|
||||
) -> float:
|
||||
"""
|
||||
Score the rollout. Has full access to:
|
||||
- item: the original dataset item (ground truth, test commands, etc.)
|
||||
- result: AgentResult with full messages, turn count, reasoning, etc.
|
||||
- ctx: ToolContext -- call ANY hermes-agent tool (terminal, file, web,
|
||||
browser, vision...) scoped to this rollout's sandbox. Nothing
|
||||
is off-limits.
|
||||
|
||||
Args:
|
||||
item: The dataset item that was rolled out
|
||||
result: The agent's rollout result
|
||||
ctx: ToolContext with full tool access for verification
|
||||
|
||||
Returns:
|
||||
Reward float (typically 0.0 to 1.0, but any float is valid)
|
||||
"""
|
||||
raise NotImplementedError
|
||||
|
||||
@abstractmethod
|
||||
async def evaluate(self, *args, **kwargs):
|
||||
"""
|
||||
Periodic evaluation. Called every steps_per_eval steps.
|
||||
|
||||
Typical implementation runs the agent on a held-out eval set
|
||||
and logs metrics via wandb/evaluate_log.
|
||||
"""
|
||||
raise NotImplementedError
|
||||
229
environments/hermes_swe_env.py
Normal file
229
environments/hermes_swe_env.py
Normal file
@@ -0,0 +1,229 @@
|
||||
"""
|
||||
HermesSweEnv -- SWE-Bench Style Environment with Modal Sandboxes
|
||||
|
||||
A concrete environment for software engineering tasks where the model writes code
|
||||
and the reward function runs tests to verify correctness. Uses Modal terminal
|
||||
backend for cloud-isolated sandboxes per rollout.
|
||||
|
||||
The reward function uses ToolContext.terminal() to run test commands in the same
|
||||
Modal sandbox the model used during its agentic loop. All filesystem state from
|
||||
the model's tool calls is preserved for verification.
|
||||
|
||||
Usage:
|
||||
# Phase 1: OpenAI server type
|
||||
vllm serve YourModel --tool-parser hermes
|
||||
run-api
|
||||
python environments/hermes_swe_env.py serve \\
|
||||
--openai.base_url http://localhost:8000/v1 \\
|
||||
--openai.model_name YourModel \\
|
||||
--openai.server_type openai \\
|
||||
--env.dataset_name bigcode/humanevalpack \\
|
||||
--env.terminal_backend modal
|
||||
|
||||
# Phase 2: VLLM server type (full RL training)
|
||||
python environments/hermes_swe_env.py serve \\
|
||||
--openai.base_url http://localhost:8000/v1 \\
|
||||
--openai.model_name YourModel \\
|
||||
--openai.server_type vllm \\
|
||||
--env.tool_call_parser hermes \\
|
||||
--env.terminal_backend modal
|
||||
"""
|
||||
|
||||
import logging
|
||||
import sys
|
||||
import time
|
||||
from pathlib import Path
|
||||
from typing import Any, Dict, List, Optional, Tuple, Union
|
||||
|
||||
# Ensure repo root is on sys.path for imports
|
||||
_repo_root = Path(__file__).resolve().parent.parent
|
||||
if str(_repo_root) not in sys.path:
|
||||
sys.path.insert(0, str(_repo_root))
|
||||
|
||||
from datasets import load_dataset
|
||||
|
||||
from atroposlib.envs.base import ScoredDataGroup
|
||||
from atroposlib.envs.server_handling.server_manager import APIServerConfig
|
||||
from atroposlib.type_definitions import Item
|
||||
|
||||
from environments.agent_loop import AgentResult
|
||||
from environments.hermes_base_env import HermesAgentBaseEnv, HermesAgentEnvConfig
|
||||
from environments.tool_context import ToolContext
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class HermesSweEnvConfig(HermesAgentEnvConfig):
|
||||
"""Config with defaults for SWE-bench style tasks."""
|
||||
|
||||
pass # Inherits all fields, overrides defaults in config_init
|
||||
|
||||
|
||||
class HermesSweEnv(HermesAgentBaseEnv):
|
||||
"""
|
||||
SWE-bench style environment using Modal terminal backend.
|
||||
|
||||
The model gets a coding task, uses terminal + file + web tools to solve it,
|
||||
and the reward function runs tests in the same Modal sandbox to verify.
|
||||
|
||||
Subclass this for specific SWE datasets (HumanEval, SWE-bench, etc.)
|
||||
and customize format_prompt() and compute_reward() as needed.
|
||||
"""
|
||||
|
||||
name = "hermes-swe"
|
||||
env_config_cls = HermesSweEnvConfig
|
||||
|
||||
@classmethod
|
||||
def config_init(cls) -> Tuple[HermesSweEnvConfig, List[APIServerConfig]]:
|
||||
"""
|
||||
Default configuration for the SWE environment.
|
||||
|
||||
Uses Modal terminal backend for cloud isolation and terminal + file + web toolsets.
|
||||
"""
|
||||
env_config = HermesSweEnvConfig(
|
||||
# Toolsets: terminal for running code, file for reading/writing, web for docs
|
||||
enabled_toolsets=["terminal", "file", "web"],
|
||||
disabled_toolsets=None,
|
||||
distribution=None,
|
||||
# Agent settings -- SWE tasks need more turns
|
||||
max_agent_turns=30,
|
||||
max_token_length=4096,
|
||||
agent_temperature=1.0,
|
||||
system_prompt=(
|
||||
"You are a skilled software engineer. You have access to a terminal, "
|
||||
"file tools, and web search. Use these tools to complete the coding task. "
|
||||
"Write clean, working code and verify it runs correctly before finishing."
|
||||
),
|
||||
# Modal backend for cloud-isolated sandboxes
|
||||
terminal_backend="modal",
|
||||
# Dataset -- override via CLI for your specific SWE dataset
|
||||
dataset_name="bigcode/humanevalpack",
|
||||
dataset_split="test",
|
||||
prompt_field="prompt",
|
||||
# Atropos settings
|
||||
group_size=4,
|
||||
tokenizer_name="NousResearch/DeepHermes-3-Llama-3-3B-Preview",
|
||||
tool_call_parser="hermes",
|
||||
steps_per_eval=50,
|
||||
total_steps=500,
|
||||
use_wandb=True,
|
||||
wandb_name="hermes-swe",
|
||||
)
|
||||
|
||||
server_configs = [
|
||||
APIServerConfig(
|
||||
base_url="http://localhost:8000/v1",
|
||||
model_name="NousResearch/DeepHermes-3-Llama-3-3B-Preview",
|
||||
server_type="openai", # Phase 1; switch to "vllm" for Phase 2
|
||||
api_key="",
|
||||
)
|
||||
]
|
||||
|
||||
return env_config, server_configs
|
||||
|
||||
async def setup(self):
|
||||
"""Load the SWE dataset."""
|
||||
if self.config.dataset_name:
|
||||
self.dataset = load_dataset(
|
||||
self.config.dataset_name, split=self.config.dataset_split
|
||||
)
|
||||
else:
|
||||
# Placeholder if no dataset specified
|
||||
self.dataset = []
|
||||
self.iter = 0
|
||||
self.reward_buffer: List[float] = []
|
||||
|
||||
async def get_next_item(self) -> Dict[str, Any]:
|
||||
"""Cycle through the SWE dataset."""
|
||||
if not self.dataset:
|
||||
raise ValueError("No dataset loaded. Set dataset_name in config.")
|
||||
item = self.dataset[self.iter % len(self.dataset)]
|
||||
self.iter += 1
|
||||
return item
|
||||
|
||||
def format_prompt(self, item: Dict[str, Any]) -> str:
|
||||
"""
|
||||
Format the SWE task prompt.
|
||||
|
||||
Override this in subclasses for different dataset formats.
|
||||
Default assumes the dataset has a 'prompt' field and optionally a 'test' field.
|
||||
"""
|
||||
prompt = item.get(self.config.prompt_field, "")
|
||||
|
||||
# If the dataset has test information, include it in the prompt
|
||||
test_info = item.get("test", item.get("test_code", item.get("tests", "")))
|
||||
if test_info:
|
||||
prompt += f"\n\nTests to pass:\n{test_info}"
|
||||
|
||||
return prompt
|
||||
|
||||
async def compute_reward(
|
||||
self, item: Dict[str, Any], result: AgentResult, ctx: ToolContext
|
||||
) -> float:
|
||||
"""
|
||||
Score by running tests in the model's Modal sandbox.
|
||||
|
||||
Default implementation:
|
||||
- If the dataset item has a 'test' or 'test_code' field, run it
|
||||
- Check exit code: 0 = pass, non-zero = fail
|
||||
- Partial credit for file creation
|
||||
|
||||
Override this in subclasses for more sophisticated reward logic.
|
||||
"""
|
||||
# Find the test command from the dataset item
|
||||
test_code = item.get("test", item.get("test_code", item.get("tests", "")))
|
||||
|
||||
if test_code:
|
||||
# Run the test in the model's sandbox
|
||||
test_result = ctx.terminal(
|
||||
f'cd /workspace && python3 -c "{test_code}"', timeout=60
|
||||
)
|
||||
|
||||
if test_result["exit_code"] == 0:
|
||||
self.reward_buffer.append(1.0)
|
||||
return 1.0
|
||||
|
||||
# Partial credit: check if the model created any Python files
|
||||
file_check = ctx.terminal("find /workspace -name '*.py' -newer /tmp/.start_marker 2>/dev/null | head -5")
|
||||
if file_check["exit_code"] == 0 and file_check.get("output", "").strip():
|
||||
self.reward_buffer.append(0.1)
|
||||
return 0.1
|
||||
|
||||
self.reward_buffer.append(0.0)
|
||||
return 0.0
|
||||
|
||||
async def evaluate(self, *args, **kwargs):
|
||||
"""
|
||||
Run evaluation on a held-out set.
|
||||
|
||||
Override for dataset-specific evaluation logic.
|
||||
"""
|
||||
start_time = time.time()
|
||||
end_time = time.time()
|
||||
|
||||
eval_metrics = {"eval/placeholder": 0.0}
|
||||
await self.evaluate_log(
|
||||
metrics=eval_metrics,
|
||||
start_time=start_time,
|
||||
end_time=end_time,
|
||||
)
|
||||
|
||||
async def wandb_log(self, wandb_metrics: Optional[Dict] = None):
|
||||
"""Log SWE-specific metrics."""
|
||||
if wandb_metrics is None:
|
||||
wandb_metrics = {}
|
||||
|
||||
if self.reward_buffer:
|
||||
wandb_metrics["train/avg_reward"] = sum(self.reward_buffer) / len(
|
||||
self.reward_buffer
|
||||
)
|
||||
wandb_metrics["train/pass_rate"] = sum(
|
||||
1 for r in self.reward_buffer if r == 1.0
|
||||
) / len(self.reward_buffer)
|
||||
self.reward_buffer = []
|
||||
|
||||
await super().wandb_log(wandb_metrics)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
HermesSweEnv.cli()
|
||||
188
environments/patches.py
Normal file
188
environments/patches.py
Normal file
@@ -0,0 +1,188 @@
|
||||
"""
|
||||
Monkey patches for making hermes-agent tools work inside async frameworks (Atropos).
|
||||
|
||||
Problem:
|
||||
Some tools use asyncio.run() internally (e.g., mini-swe-agent's Modal backend,
|
||||
web_extract). This crashes when called from inside Atropos's event loop because
|
||||
asyncio.run() can't be nested.
|
||||
|
||||
Solution:
|
||||
Replace the problematic methods with versions that use a dedicated background
|
||||
thread with its own event loop. The calling code sees the same sync interface --
|
||||
call a function, get a result -- but internally the async work happens on a
|
||||
separate thread that doesn't conflict with Atropos's loop.
|
||||
|
||||
These patches are safe for normal CLI use too: when there's no running event
|
||||
loop, the behavior is identical (the background thread approach works regardless).
|
||||
|
||||
What gets patched:
|
||||
- SwerexModalEnvironment.__init__ -- creates Modal deployment on a background thread
|
||||
- SwerexModalEnvironment.execute -- runs commands on the same background thread
|
||||
- SwerexModalEnvironment.stop -- stops deployment on the background thread
|
||||
|
||||
Usage:
|
||||
Call apply_patches() once at import time (done automatically by hermes_base_env.py).
|
||||
This is idempotent -- calling it multiple times is safe.
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import logging
|
||||
import threading
|
||||
from typing import Any
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
_patches_applied = False
|
||||
|
||||
|
||||
class _AsyncWorker:
|
||||
"""
|
||||
A dedicated background thread with its own event loop.
|
||||
|
||||
Allows sync code to submit async coroutines and block for results,
|
||||
even when called from inside another running event loop. Used to
|
||||
bridge sync tool interfaces with async backends (Modal, SWE-ReX).
|
||||
"""
|
||||
|
||||
def __init__(self):
|
||||
self._loop: asyncio.AbstractEventLoop = None
|
||||
self._thread: threading.Thread = None
|
||||
self._started = threading.Event()
|
||||
|
||||
def start(self):
|
||||
"""Start the background event loop thread."""
|
||||
self._thread = threading.Thread(target=self._run_loop, daemon=True)
|
||||
self._thread.start()
|
||||
self._started.wait(timeout=30)
|
||||
|
||||
def _run_loop(self):
|
||||
"""Background thread entry point -- runs the event loop forever."""
|
||||
self._loop = asyncio.new_event_loop()
|
||||
asyncio.set_event_loop(self._loop)
|
||||
self._started.set()
|
||||
self._loop.run_forever()
|
||||
|
||||
def run_coroutine(self, coro, timeout=600):
|
||||
"""
|
||||
Submit a coroutine to the background loop and block until it completes.
|
||||
|
||||
Safe to call from any thread, including threads that already have
|
||||
a running event loop.
|
||||
"""
|
||||
if self._loop is None or self._loop.is_closed():
|
||||
raise RuntimeError("AsyncWorker loop is not running")
|
||||
future = asyncio.run_coroutine_threadsafe(coro, self._loop)
|
||||
return future.result(timeout=timeout)
|
||||
|
||||
def stop(self):
|
||||
"""Stop the background event loop and join the thread."""
|
||||
if self._loop and self._loop.is_running():
|
||||
self._loop.call_soon_threadsafe(self._loop.stop)
|
||||
if self._thread:
|
||||
self._thread.join(timeout=10)
|
||||
|
||||
|
||||
def _patch_swerex_modal():
|
||||
"""
|
||||
Monkey patch SwerexModalEnvironment to use a background thread event loop
|
||||
instead of asyncio.run(). This makes it safe to call from inside Atropos's
|
||||
async event loop.
|
||||
|
||||
The patched methods have the exact same interface and behavior -- the only
|
||||
difference is HOW the async work is executed internally.
|
||||
"""
|
||||
try:
|
||||
from minisweagent.environments.extra.swerex_modal import (
|
||||
SwerexModalEnvironment,
|
||||
SwerexModalEnvironmentConfig,
|
||||
)
|
||||
from swerex.deployment.modal import ModalDeployment
|
||||
from swerex.runtime.abstract import Command as RexCommand
|
||||
except ImportError:
|
||||
# mini-swe-agent or swe-rex not installed -- nothing to patch
|
||||
logger.debug("mini-swe-agent Modal backend not available, skipping patch")
|
||||
return
|
||||
|
||||
# Save original methods so we can refer to config handling
|
||||
_original_init = SwerexModalEnvironment.__init__
|
||||
|
||||
def _patched_init(self, **kwargs):
|
||||
"""Patched __init__: creates Modal deployment on a background thread."""
|
||||
self.config = SwerexModalEnvironmentConfig(**kwargs)
|
||||
|
||||
# Start a dedicated event loop thread for all Modal async operations
|
||||
self._worker = _AsyncWorker()
|
||||
self._worker.start()
|
||||
|
||||
# Create AND start the deployment entirely on the worker's loop/thread
|
||||
# so all gRPC channels and async state are bound to that loop
|
||||
async def _create_and_start():
|
||||
deployment = ModalDeployment(
|
||||
image=self.config.image,
|
||||
startup_timeout=self.config.startup_timeout,
|
||||
runtime_timeout=self.config.runtime_timeout,
|
||||
deployment_timeout=self.config.deployment_timeout,
|
||||
install_pipx=self.config.install_pipx,
|
||||
modal_sandbox_kwargs=self.config.modal_sandbox_kwargs,
|
||||
)
|
||||
await deployment.start()
|
||||
return deployment
|
||||
|
||||
self.deployment = self._worker.run_coroutine(_create_and_start())
|
||||
|
||||
def _patched_execute(self, command: str, cwd: str = "", *, timeout: int | None = None) -> dict[str, Any]:
|
||||
"""Patched execute: runs commands on the background thread's loop."""
|
||||
async def _do_execute():
|
||||
return await self.deployment.runtime.execute(
|
||||
RexCommand(
|
||||
command=command,
|
||||
shell=True,
|
||||
check=False,
|
||||
cwd=cwd or self.config.cwd,
|
||||
timeout=timeout or self.config.timeout,
|
||||
merge_output_streams=True,
|
||||
env=self.config.env if self.config.env else None,
|
||||
)
|
||||
)
|
||||
|
||||
output = self._worker.run_coroutine(_do_execute())
|
||||
return {
|
||||
"output": output.stdout,
|
||||
"returncode": output.exit_code,
|
||||
}
|
||||
|
||||
def _patched_stop(self):
|
||||
"""Patched stop: stops deployment on the background thread, then stops the thread."""
|
||||
try:
|
||||
self._worker.run_coroutine(
|
||||
asyncio.wait_for(self.deployment.stop(), timeout=10),
|
||||
timeout=15,
|
||||
)
|
||||
except Exception:
|
||||
pass
|
||||
finally:
|
||||
self._worker.stop()
|
||||
|
||||
# Apply the patches
|
||||
SwerexModalEnvironment.__init__ = _patched_init
|
||||
SwerexModalEnvironment.execute = _patched_execute
|
||||
SwerexModalEnvironment.stop = _patched_stop
|
||||
|
||||
logger.debug("Patched SwerexModalEnvironment for async-safe operation")
|
||||
|
||||
|
||||
def apply_patches():
|
||||
"""
|
||||
Apply all monkey patches needed for Atropos compatibility.
|
||||
|
||||
Safe to call multiple times -- patches are only applied once.
|
||||
Safe for normal CLI use -- patched code works identically when
|
||||
there is no running event loop.
|
||||
"""
|
||||
global _patches_applied
|
||||
if _patches_applied:
|
||||
return
|
||||
|
||||
_patch_swerex_modal()
|
||||
|
||||
_patches_applied = True
|
||||
292
environments/terminal_test_env.py
Normal file
292
environments/terminal_test_env.py
Normal file
@@ -0,0 +1,292 @@
|
||||
"""
|
||||
TerminalTestEnv -- Simple Test Environment for Validating the Stack
|
||||
|
||||
A self-contained environment with inline tasks (no external dataset needed).
|
||||
Each task asks the model to create a file at a known path with specific content.
|
||||
The reward verifier cats the file and checks if the content matches.
|
||||
|
||||
Enables only terminal + file toolsets. Uses Modal terminal backend with
|
||||
OpenRouter (Claude) by default.
|
||||
|
||||
Training tasks (3):
|
||||
1. Create ~/greeting.txt with "Hello from Hermes Agent"
|
||||
2. Create ~/count.txt with numbers 1-5, one per line
|
||||
3. Create ~/answer.txt with the result of 123 + 456
|
||||
|
||||
Eval task (1):
|
||||
1. Create ~/result.txt with the result of 6 * 7
|
||||
|
||||
Usage:
|
||||
# Start Atropos API server
|
||||
run-api
|
||||
|
||||
# Run environment (uses OpenRouter + Modal by default)
|
||||
python environments/terminal_test_env.py serve
|
||||
|
||||
# Process mode (no run-api needed, saves to JSONL)
|
||||
python environments/terminal_test_env.py process \\
|
||||
--env.data_path_to_save_groups terminal_test_output.jsonl
|
||||
"""
|
||||
|
||||
import logging
|
||||
import os
|
||||
import sys
|
||||
import time
|
||||
from pathlib import Path
|
||||
from typing import Any, Dict, List, Optional, Tuple, Union
|
||||
|
||||
# Ensure repo root is on sys.path for imports
|
||||
_repo_root = Path(__file__).resolve().parent.parent
|
||||
if str(_repo_root) not in sys.path:
|
||||
sys.path.insert(0, str(_repo_root))
|
||||
|
||||
from atroposlib.envs.base import ScoredDataGroup
|
||||
from atroposlib.envs.server_handling.server_manager import APIServerConfig
|
||||
from atroposlib.type_definitions import Item
|
||||
|
||||
from environments.agent_loop import AgentResult
|
||||
from environments.hermes_base_env import HermesAgentBaseEnv, HermesAgentEnvConfig
|
||||
from environments.tool_context import ToolContext
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
# =============================================================================
|
||||
# Inline task definitions -- no external dataset needed
|
||||
# =============================================================================
|
||||
|
||||
TRAIN_TASKS = [
|
||||
{
|
||||
"prompt": "Create a file at ~/greeting.txt containing exactly the text: Hello from Hermes Agent",
|
||||
"verify_path": "~/greeting.txt",
|
||||
"expected_content": "Hello from Hermes Agent",
|
||||
},
|
||||
{
|
||||
"prompt": "Create a file at ~/count.txt containing the numbers 1 through 5, one per line",
|
||||
"verify_path": "~/count.txt",
|
||||
"expected_content": "1\n2\n3\n4\n5",
|
||||
},
|
||||
{
|
||||
"prompt": "Create a file at ~/answer.txt containing the result of 123 + 456",
|
||||
"verify_path": "~/answer.txt",
|
||||
"expected_content": "579",
|
||||
},
|
||||
]
|
||||
|
||||
EVAL_TASKS = [
|
||||
{
|
||||
"prompt": "Create a file at ~/result.txt containing the result of 6 * 7",
|
||||
"verify_path": "~/result.txt",
|
||||
"expected_content": "42",
|
||||
},
|
||||
]
|
||||
|
||||
|
||||
class TerminalTestEnvConfig(HermesAgentEnvConfig):
|
||||
"""Config with defaults suitable for terminal testing."""
|
||||
|
||||
pass # Inherits all fields, overrides defaults in config_init
|
||||
|
||||
|
||||
class TerminalTestEnv(HermesAgentBaseEnv):
|
||||
"""
|
||||
Simple test environment with inline file-creation tasks.
|
||||
|
||||
All tasks follow the same pattern: "create a file at ~/X.txt with content Y".
|
||||
The verifier runs `cat ~/X.txt` in the rollout's terminal and checks the output
|
||||
against the expected string. Same verifier logic for all tasks.
|
||||
|
||||
This environment is designed to validate the full stack end-to-end:
|
||||
- Agent loop executes tool calls (terminal/file)
|
||||
- ToolContext provides terminal access to the reward function
|
||||
- Reward function verifies file content via cat
|
||||
- Scored data flows through the Atropos pipeline
|
||||
"""
|
||||
|
||||
name = "terminal-test"
|
||||
env_config_cls = TerminalTestEnvConfig
|
||||
|
||||
@classmethod
|
||||
def config_init(cls) -> Tuple[TerminalTestEnvConfig, List[APIServerConfig]]:
|
||||
"""
|
||||
Default configuration for the terminal test environment.
|
||||
|
||||
Uses Modal terminal backend for cloud isolation and OpenRouter with
|
||||
Claude for inference. API keys loaded from ~/hermes-agent/.env.
|
||||
"""
|
||||
env_config = TerminalTestEnvConfig(
|
||||
# Terminal + file tools only
|
||||
enabled_toolsets=["terminal", "file"],
|
||||
disabled_toolsets=None,
|
||||
distribution=None,
|
||||
# Agent settings
|
||||
max_agent_turns=10, # Simple tasks, don't need many turns
|
||||
max_token_length=16000,
|
||||
agent_temperature=1.0,
|
||||
system_prompt=(
|
||||
"You are a helpful assistant with access to a terminal and file tools. "
|
||||
"Complete the user's request by using the available tools. "
|
||||
"Be precise and follow instructions exactly."
|
||||
),
|
||||
# Modal terminal backend for cloud-isolated sandboxes per rollout
|
||||
terminal_backend="modal",
|
||||
# Atropos settings
|
||||
group_size=3, # 3 rollouts per group
|
||||
tokenizer_name="NousResearch/q-30b-t-h45-e1",
|
||||
tool_call_parser="hermes",
|
||||
steps_per_eval=3, # Eval after all 3 steps
|
||||
total_steps=3, # 3 groups total (1 group per step)
|
||||
use_wandb=True,
|
||||
wandb_name="terminal-test",
|
||||
ensure_scores_are_not_same=False, # Allow all-same scores for simple tasks
|
||||
# No external dataset
|
||||
dataset_name=None,
|
||||
)
|
||||
|
||||
# OpenRouter with Claude -- API key loaded from .env (OPENROUTER_API_KEY)
|
||||
server_configs = [
|
||||
APIServerConfig(
|
||||
base_url="https://openrouter.ai/api/v1",
|
||||
model_name="anthropic/claude-opus-4.6",
|
||||
server_type="openai",
|
||||
api_key=os.getenv("OPENROUTER_API_KEY", ""),
|
||||
health_check=False, # OpenRouter doesn't have a /health endpoint
|
||||
)
|
||||
]
|
||||
|
||||
return env_config, server_configs
|
||||
|
||||
async def setup(self):
|
||||
"""Initialize inline task lists."""
|
||||
self.train_tasks = list(TRAIN_TASKS)
|
||||
self.eval_tasks = list(EVAL_TASKS)
|
||||
self.iter = 0
|
||||
# Track reward stats for wandb logging
|
||||
self.reward_buffer: List[float] = []
|
||||
|
||||
async def get_next_item(self) -> Dict[str, str]:
|
||||
"""Cycle through training tasks."""
|
||||
item = self.train_tasks[self.iter % len(self.train_tasks)]
|
||||
self.iter += 1
|
||||
return item
|
||||
|
||||
def format_prompt(self, item: Dict[str, str]) -> str:
|
||||
"""The prompt is directly in the task item."""
|
||||
return item["prompt"]
|
||||
|
||||
async def compute_reward(
|
||||
self, item: Dict[str, str], result: AgentResult, ctx: ToolContext
|
||||
) -> float:
|
||||
"""
|
||||
Verify by cat-ing the expected file path and checking content matches.
|
||||
Same verifier for all tasks -- they all write a file at a known path.
|
||||
|
||||
Scoring:
|
||||
1.0 = exact match
|
||||
0.5 = expected content is present but has extra stuff
|
||||
0.0 = file doesn't exist or content doesn't match
|
||||
"""
|
||||
verify_result = ctx.terminal(f"cat {item['verify_path']}")
|
||||
|
||||
# File doesn't exist or can't be read
|
||||
if verify_result["exit_code"] != 0:
|
||||
self.reward_buffer.append(0.0)
|
||||
return 0.0
|
||||
|
||||
actual = verify_result.get("output", "").strip()
|
||||
expected = item["expected_content"].strip()
|
||||
|
||||
# Exact match
|
||||
if actual == expected:
|
||||
self.reward_buffer.append(1.0)
|
||||
return 1.0
|
||||
|
||||
# Partial credit: expected content is present but has extra stuff
|
||||
if expected in actual:
|
||||
self.reward_buffer.append(0.5)
|
||||
return 0.5
|
||||
|
||||
self.reward_buffer.append(0.0)
|
||||
return 0.0
|
||||
|
||||
async def evaluate(self, *args, **kwargs):
|
||||
"""
|
||||
Run eval tasks using the agent loop and verify results.
|
||||
Logs accuracy metrics.
|
||||
"""
|
||||
start_time = time.time()
|
||||
correct = 0
|
||||
total = len(self.eval_tasks)
|
||||
samples = []
|
||||
|
||||
for eval_item in self.eval_tasks:
|
||||
try:
|
||||
# For eval, we do a simple single-turn completion (not full agent loop)
|
||||
# to keep eval fast. The agent loop is tested via training.
|
||||
completion = await self.server.chat_completion(
|
||||
messages=[
|
||||
{"role": "system", "content": self.config.system_prompt or ""},
|
||||
{"role": "user", "content": eval_item["prompt"]},
|
||||
],
|
||||
n=1,
|
||||
max_tokens=self.config.max_token_length,
|
||||
temperature=0.0,
|
||||
split="eval",
|
||||
)
|
||||
|
||||
response_content = (
|
||||
completion.choices[0].message.content if completion.choices else ""
|
||||
)
|
||||
|
||||
samples.append(
|
||||
{
|
||||
"prompt": eval_item["prompt"],
|
||||
"response": response_content,
|
||||
"expected": eval_item["expected_content"],
|
||||
}
|
||||
)
|
||||
|
||||
except Exception as e:
|
||||
logger.error("Eval failed for item: %s", e)
|
||||
samples.append(
|
||||
{
|
||||
"prompt": eval_item["prompt"],
|
||||
"response": f"ERROR: {e}",
|
||||
"expected": eval_item["expected_content"],
|
||||
}
|
||||
)
|
||||
|
||||
end_time = time.time()
|
||||
|
||||
eval_metrics = {
|
||||
"eval/num_samples": total,
|
||||
}
|
||||
|
||||
await self.evaluate_log(
|
||||
metrics=eval_metrics,
|
||||
samples=samples,
|
||||
start_time=start_time,
|
||||
end_time=end_time,
|
||||
)
|
||||
|
||||
async def wandb_log(self, wandb_metrics: Optional[Dict] = None):
|
||||
"""Log training metrics including reward stats and accuracy."""
|
||||
if wandb_metrics is None:
|
||||
wandb_metrics = {}
|
||||
|
||||
if self.reward_buffer:
|
||||
total = len(self.reward_buffer)
|
||||
correct = sum(1 for r in self.reward_buffer if r == 1.0)
|
||||
partial = sum(1 for r in self.reward_buffer if r == 0.5)
|
||||
|
||||
wandb_metrics["train/avg_reward"] = sum(self.reward_buffer) / total
|
||||
wandb_metrics["train/accuracy"] = correct / total
|
||||
wandb_metrics["train/partial_match_rate"] = partial / total
|
||||
wandb_metrics["train/total_rollouts"] = total
|
||||
self.reward_buffer = []
|
||||
|
||||
await super().wandb_log(wandb_metrics)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
TerminalTestEnv.cli()
|
||||
120
environments/tool_call_parsers/__init__.py
Normal file
120
environments/tool_call_parsers/__init__.py
Normal file
@@ -0,0 +1,120 @@
|
||||
"""
|
||||
Tool Call Parser Registry
|
||||
|
||||
Client-side parsers that extract structured tool_calls from raw model output text.
|
||||
Used in Phase 2 (VLLM server type) where ManagedServer's /generate endpoint returns
|
||||
raw text without tool call parsing.
|
||||
|
||||
Each parser is a standalone reimplementation of the corresponding VLLM parser's
|
||||
non-streaming extract_tool_calls() logic. No VLLM dependency -- only standard library
|
||||
(re, json, uuid) and openai types.
|
||||
|
||||
Usage:
|
||||
from environments.tool_call_parsers import get_parser
|
||||
|
||||
parser = get_parser("hermes")
|
||||
content, tool_calls = parser.parse(raw_model_output)
|
||||
# content = text with tool call markup stripped
|
||||
# tool_calls = list of ChatCompletionMessageToolCall objects, or None
|
||||
"""
|
||||
|
||||
import logging
|
||||
from abc import ABC, abstractmethod
|
||||
from typing import Dict, List, Optional, Tuple, Type
|
||||
|
||||
from openai.types.chat.chat_completion_message_tool_call import (
|
||||
ChatCompletionMessageToolCall,
|
||||
)
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Type alias for parser return value
|
||||
ParseResult = Tuple[Optional[str], Optional[List[ChatCompletionMessageToolCall]]]
|
||||
|
||||
|
||||
class ToolCallParser(ABC):
|
||||
"""
|
||||
Base class for tool call parsers.
|
||||
|
||||
Each parser knows how to extract structured tool_calls from a specific
|
||||
model family's raw output text format.
|
||||
"""
|
||||
|
||||
@abstractmethod
|
||||
def parse(self, text: str) -> ParseResult:
|
||||
"""
|
||||
Parse raw model output text for tool calls.
|
||||
|
||||
Args:
|
||||
text: Raw decoded text from the model's completion
|
||||
|
||||
Returns:
|
||||
Tuple of (content, tool_calls) where:
|
||||
- content: text with tool call markup stripped (the message 'content' field),
|
||||
or None if the entire output was tool calls
|
||||
- tool_calls: list of ChatCompletionMessageToolCall objects,
|
||||
or None if no tool calls were found
|
||||
"""
|
||||
raise NotImplementedError
|
||||
|
||||
|
||||
# Global parser registry: name -> parser class
|
||||
PARSER_REGISTRY: Dict[str, Type[ToolCallParser]] = {}
|
||||
|
||||
|
||||
def register_parser(name: str):
|
||||
"""
|
||||
Decorator to register a parser class under a given name.
|
||||
|
||||
Usage:
|
||||
@register_parser("hermes")
|
||||
class HermesToolCallParser(ToolCallParser):
|
||||
...
|
||||
"""
|
||||
|
||||
def decorator(cls: Type[ToolCallParser]) -> Type[ToolCallParser]:
|
||||
PARSER_REGISTRY[name] = cls
|
||||
return cls
|
||||
|
||||
return decorator
|
||||
|
||||
|
||||
def get_parser(name: str) -> ToolCallParser:
|
||||
"""
|
||||
Get a parser instance by name.
|
||||
|
||||
Args:
|
||||
name: Parser name (e.g., "hermes", "mistral", "llama3_json")
|
||||
|
||||
Returns:
|
||||
Instantiated parser
|
||||
|
||||
Raises:
|
||||
KeyError: If parser name is not found in registry
|
||||
"""
|
||||
if name not in PARSER_REGISTRY:
|
||||
available = sorted(PARSER_REGISTRY.keys())
|
||||
raise KeyError(
|
||||
f"Tool call parser '{name}' not found. Available parsers: {available}"
|
||||
)
|
||||
return PARSER_REGISTRY[name]()
|
||||
|
||||
|
||||
def list_parsers() -> List[str]:
|
||||
"""Return sorted list of registered parser names."""
|
||||
return sorted(PARSER_REGISTRY.keys())
|
||||
|
||||
|
||||
# Import all parser modules to trigger registration via @register_parser decorators
|
||||
# Each module registers itself when imported
|
||||
from environments.tool_call_parsers.hermes_parser import HermesToolCallParser # noqa: E402, F401
|
||||
from environments.tool_call_parsers.longcat_parser import LongcatToolCallParser # noqa: E402, F401
|
||||
from environments.tool_call_parsers.mistral_parser import MistralToolCallParser # noqa: E402, F401
|
||||
from environments.tool_call_parsers.llama_parser import LlamaToolCallParser # noqa: E402, F401
|
||||
from environments.tool_call_parsers.qwen_parser import QwenToolCallParser # noqa: E402, F401
|
||||
from environments.tool_call_parsers.deepseek_v3_parser import DeepSeekV3ToolCallParser # noqa: E402, F401
|
||||
from environments.tool_call_parsers.deepseek_v3_1_parser import DeepSeekV31ToolCallParser # noqa: E402, F401
|
||||
from environments.tool_call_parsers.kimi_k2_parser import KimiK2ToolCallParser # noqa: E402, F401
|
||||
from environments.tool_call_parsers.glm45_parser import Glm45ToolCallParser # noqa: E402, F401
|
||||
from environments.tool_call_parsers.glm47_parser import Glm47ToolCallParser # noqa: E402, F401
|
||||
from environments.tool_call_parsers.qwen3_coder_parser import Qwen3CoderToolCallParser # noqa: E402, F401
|
||||
71
environments/tool_call_parsers/deepseek_v3_1_parser.py
Normal file
71
environments/tool_call_parsers/deepseek_v3_1_parser.py
Normal file
@@ -0,0 +1,71 @@
|
||||
"""
|
||||
DeepSeek V3.1 tool call parser.
|
||||
|
||||
Similar to V3 but with a slightly different format:
|
||||
<|tool▁call▁begin|>function_name<|tool▁sep|>arguments<|tool▁call▁end|>
|
||||
|
||||
Note: V3 has type+name before the separator, V3.1 has name before and args after.
|
||||
|
||||
Based on VLLM's DeepSeekV31ToolParser.extract_tool_calls()
|
||||
"""
|
||||
|
||||
import re
|
||||
import uuid
|
||||
from typing import List, Optional
|
||||
|
||||
from openai.types.chat.chat_completion_message_tool_call import (
|
||||
ChatCompletionMessageToolCall,
|
||||
Function,
|
||||
)
|
||||
|
||||
from environments.tool_call_parsers import ParseResult, ToolCallParser, register_parser
|
||||
|
||||
|
||||
@register_parser("deepseek_v3_1")
|
||||
@register_parser("deepseek_v31")
|
||||
class DeepSeekV31ToolCallParser(ToolCallParser):
|
||||
"""
|
||||
Parser for DeepSeek V3.1 tool calls.
|
||||
|
||||
Slightly different regex than V3: function_name comes before the separator,
|
||||
arguments come after (no type field, no json code block wrapper).
|
||||
"""
|
||||
|
||||
START_TOKEN = "<|tool▁calls▁begin|>"
|
||||
|
||||
# Regex captures: function_name, function_arguments
|
||||
PATTERN = re.compile(
|
||||
r"<|tool▁call▁begin|>(?P<function_name>.*?)<|tool▁sep|>(?P<function_arguments>.*?)<|tool▁call▁end|>"
|
||||
)
|
||||
|
||||
def parse(self, text: str) -> ParseResult:
|
||||
if self.START_TOKEN not in text:
|
||||
return text, None
|
||||
|
||||
try:
|
||||
matches = self.PATTERN.findall(text)
|
||||
if not matches:
|
||||
return text, None
|
||||
|
||||
tool_calls: List[ChatCompletionMessageToolCall] = []
|
||||
for match in matches:
|
||||
func_name, func_args = match
|
||||
tool_calls.append(
|
||||
ChatCompletionMessageToolCall(
|
||||
id=f"call_{uuid.uuid4().hex[:8]}",
|
||||
type="function",
|
||||
function=Function(
|
||||
name=func_name.strip(),
|
||||
arguments=func_args.strip(),
|
||||
),
|
||||
)
|
||||
)
|
||||
|
||||
if not tool_calls:
|
||||
return text, None
|
||||
|
||||
content = text[: text.find(self.START_TOKEN)].strip()
|
||||
return content if content else None, tool_calls
|
||||
|
||||
except Exception:
|
||||
return text, None
|
||||
75
environments/tool_call_parsers/deepseek_v3_parser.py
Normal file
75
environments/tool_call_parsers/deepseek_v3_parser.py
Normal file
@@ -0,0 +1,75 @@
|
||||
"""
|
||||
DeepSeek V3 tool call parser.
|
||||
|
||||
Format uses special unicode tokens:
|
||||
<|tool▁calls▁begin|>
|
||||
<|tool▁call▁begin|>type<|tool▁sep|>function_name
|
||||
```json
|
||||
{"arg": "value"}
|
||||
```
|
||||
<|tool▁call▁end|>
|
||||
<|tool▁calls▁end|>
|
||||
|
||||
Based on VLLM's DeepSeekV3ToolParser.extract_tool_calls()
|
||||
"""
|
||||
|
||||
import re
|
||||
import uuid
|
||||
from typing import List, Optional
|
||||
|
||||
from openai.types.chat.chat_completion_message_tool_call import (
|
||||
ChatCompletionMessageToolCall,
|
||||
Function,
|
||||
)
|
||||
|
||||
from environments.tool_call_parsers import ParseResult, ToolCallParser, register_parser
|
||||
|
||||
|
||||
@register_parser("deepseek_v3")
|
||||
class DeepSeekV3ToolCallParser(ToolCallParser):
|
||||
"""
|
||||
Parser for DeepSeek V3 tool calls.
|
||||
|
||||
Uses special unicode tokens with fullwidth angle brackets and block elements.
|
||||
Extracts type, function name, and JSON arguments from the structured format.
|
||||
"""
|
||||
|
||||
START_TOKEN = "<|tool▁calls▁begin|>"
|
||||
|
||||
# Regex captures: type, function_name, function_arguments
|
||||
PATTERN = re.compile(
|
||||
r"<|tool▁call▁begin|>(?P<type>.*)<|tool▁sep|>(?P<function_name>.*)\n```json\n(?P<function_arguments>.*)\n```<|tool▁call▁end|>"
|
||||
)
|
||||
|
||||
def parse(self, text: str) -> ParseResult:
|
||||
if self.START_TOKEN not in text:
|
||||
return text, None
|
||||
|
||||
try:
|
||||
matches = self.PATTERN.findall(text)
|
||||
if not matches:
|
||||
return text, None
|
||||
|
||||
tool_calls: List[ChatCompletionMessageToolCall] = []
|
||||
for match in matches:
|
||||
tc_type, func_name, func_args = match
|
||||
tool_calls.append(
|
||||
ChatCompletionMessageToolCall(
|
||||
id=f"call_{uuid.uuid4().hex[:8]}",
|
||||
type="function",
|
||||
function=Function(
|
||||
name=func_name.strip(),
|
||||
arguments=func_args.strip(),
|
||||
),
|
||||
)
|
||||
)
|
||||
|
||||
if not tool_calls:
|
||||
return text, None
|
||||
|
||||
# Content is everything before the tool calls section
|
||||
content = text[: text.find(self.START_TOKEN)].strip()
|
||||
return content if content else None, tool_calls
|
||||
|
||||
except Exception:
|
||||
return text, None
|
||||
109
environments/tool_call_parsers/glm45_parser.py
Normal file
109
environments/tool_call_parsers/glm45_parser.py
Normal file
@@ -0,0 +1,109 @@
|
||||
"""
|
||||
GLM 4.5 (GLM-4-MoE) tool call parser.
|
||||
|
||||
Format uses custom arg_key/arg_value tags rather than standard JSON:
|
||||
<tool_call>function_name
|
||||
<arg_key>param1</arg_key><arg_value>value1</arg_value>
|
||||
<arg_key>param2</arg_key><arg_value>value2</arg_value>
|
||||
</tool_call>
|
||||
|
||||
Values are deserialized using json.loads -> ast.literal_eval -> raw string fallback.
|
||||
|
||||
Based on VLLM's Glm4MoeModelToolParser.extract_tool_calls()
|
||||
"""
|
||||
|
||||
import ast
|
||||
import json
|
||||
import re
|
||||
import uuid
|
||||
from typing import Any, Dict, List, Optional
|
||||
|
||||
from openai.types.chat.chat_completion_message_tool_call import (
|
||||
ChatCompletionMessageToolCall,
|
||||
Function,
|
||||
)
|
||||
|
||||
from environments.tool_call_parsers import ParseResult, ToolCallParser, register_parser
|
||||
|
||||
|
||||
def _deserialize_value(value: str) -> Any:
|
||||
"""
|
||||
Try to deserialize a string value to its native Python type.
|
||||
Attempts json.loads, then ast.literal_eval, then returns raw string.
|
||||
"""
|
||||
try:
|
||||
return json.loads(value)
|
||||
except (json.JSONDecodeError, TypeError):
|
||||
pass
|
||||
|
||||
try:
|
||||
return ast.literal_eval(value)
|
||||
except (ValueError, SyntaxError, TypeError):
|
||||
pass
|
||||
|
||||
return value
|
||||
|
||||
|
||||
@register_parser("glm45")
|
||||
class Glm45ToolCallParser(ToolCallParser):
|
||||
"""
|
||||
Parser for GLM 4.5 (GLM-4-MoE) tool calls.
|
||||
|
||||
Uses <tool_call>...</tool_call> tags with <arg_key>/<arg_value> pairs
|
||||
instead of standard JSON arguments.
|
||||
"""
|
||||
|
||||
FUNC_CALL_REGEX = re.compile(r"<tool_call>.*?</tool_call>", re.DOTALL)
|
||||
FUNC_DETAIL_REGEX = re.compile(r"<tool_call>([^\n]*)\n(.*)</tool_call>", re.DOTALL)
|
||||
FUNC_ARG_REGEX = re.compile(
|
||||
r"<arg_key>(.*?)</arg_key>\s*<arg_value>(.*?)</arg_value>", re.DOTALL
|
||||
)
|
||||
|
||||
START_TOKEN = "<tool_call>"
|
||||
|
||||
def parse(self, text: str) -> ParseResult:
|
||||
if self.START_TOKEN not in text:
|
||||
return text, None
|
||||
|
||||
try:
|
||||
matched_calls = self.FUNC_CALL_REGEX.findall(text)
|
||||
if not matched_calls:
|
||||
return text, None
|
||||
|
||||
tool_calls: List[ChatCompletionMessageToolCall] = []
|
||||
|
||||
for match in matched_calls:
|
||||
detail = self.FUNC_DETAIL_REGEX.search(match)
|
||||
if not detail:
|
||||
continue
|
||||
|
||||
func_name = detail.group(1).strip()
|
||||
func_args_raw = detail.group(2)
|
||||
|
||||
# Parse arg_key/arg_value pairs
|
||||
pairs = self.FUNC_ARG_REGEX.findall(func_args_raw) if func_args_raw else []
|
||||
arg_dict: Dict[str, Any] = {}
|
||||
for key, value in pairs:
|
||||
arg_key = key.strip()
|
||||
arg_val = _deserialize_value(value.strip())
|
||||
arg_dict[arg_key] = arg_val
|
||||
|
||||
tool_calls.append(
|
||||
ChatCompletionMessageToolCall(
|
||||
id=f"call_{uuid.uuid4().hex[:8]}",
|
||||
type="function",
|
||||
function=Function(
|
||||
name=func_name,
|
||||
arguments=json.dumps(arg_dict, ensure_ascii=False),
|
||||
),
|
||||
)
|
||||
)
|
||||
|
||||
if not tool_calls:
|
||||
return text, None
|
||||
|
||||
content = text[: text.find(self.START_TOKEN)].strip()
|
||||
return content if content else None, tool_calls
|
||||
|
||||
except Exception:
|
||||
return text, None
|
||||
35
environments/tool_call_parsers/glm47_parser.py
Normal file
35
environments/tool_call_parsers/glm47_parser.py
Normal file
@@ -0,0 +1,35 @@
|
||||
"""
|
||||
GLM 4.7 tool call parser.
|
||||
|
||||
Same as GLM 4.5 but with slightly different regex patterns.
|
||||
The tool_call tags may wrap differently and arg parsing handles
|
||||
newlines between key/value pairs.
|
||||
|
||||
Based on VLLM's Glm47MoeModelToolParser (extends Glm4MoeModelToolParser).
|
||||
"""
|
||||
|
||||
import re
|
||||
|
||||
from environments.tool_call_parsers import ParseResult, register_parser
|
||||
from environments.tool_call_parsers.glm45_parser import Glm45ToolCallParser
|
||||
|
||||
|
||||
@register_parser("glm47")
|
||||
class Glm47ToolCallParser(Glm45ToolCallParser):
|
||||
"""
|
||||
Parser for GLM 4.7 tool calls.
|
||||
Extends GLM 4.5 with updated regex patterns.
|
||||
"""
|
||||
|
||||
def __init__(self):
|
||||
super().__init__()
|
||||
# GLM 4.7 uses a slightly different detail regex that includes
|
||||
# the <tool_call> wrapper and optional arg_key content
|
||||
self.FUNC_DETAIL_REGEX = re.compile(
|
||||
r"<tool_call>(.*?)(<arg_key>.*?)?</tool_call>", re.DOTALL
|
||||
)
|
||||
# GLM 4.7 handles newlines between arg_key and arg_value tags
|
||||
self.FUNC_ARG_REGEX = re.compile(
|
||||
r"<arg_key>(.*?)</arg_key>(?:\\n|\s)*<arg_value>(.*?)</arg_value>",
|
||||
re.DOTALL,
|
||||
)
|
||||
73
environments/tool_call_parsers/hermes_parser.py
Normal file
73
environments/tool_call_parsers/hermes_parser.py
Normal file
@@ -0,0 +1,73 @@
|
||||
"""
|
||||
Hermes tool call parser.
|
||||
|
||||
Format: <tool_call>{"name": "func", "arguments": {...}}</tool_call>
|
||||
Based on VLLM's Hermes2ProToolParser.extract_tool_calls()
|
||||
"""
|
||||
|
||||
import json
|
||||
import re
|
||||
import uuid
|
||||
from typing import List, Optional, Tuple
|
||||
|
||||
from openai.types.chat.chat_completion_message_tool_call import (
|
||||
ChatCompletionMessageToolCall,
|
||||
Function,
|
||||
)
|
||||
|
||||
from environments.tool_call_parsers import ParseResult, ToolCallParser, register_parser
|
||||
|
||||
|
||||
@register_parser("hermes")
|
||||
class HermesToolCallParser(ToolCallParser):
|
||||
"""
|
||||
Parser for Hermes-format tool calls.
|
||||
|
||||
Matches <tool_call>...</tool_call> tags containing JSON with "name" and "arguments".
|
||||
Also handles unclosed <tool_call> at end-of-string (truncated generation).
|
||||
"""
|
||||
|
||||
# Matches both closed and unclosed tool_call tags
|
||||
PATTERN = re.compile(
|
||||
r"<tool_call>\s*(.*?)\s*</tool_call>|<tool_call>\s*(.*)", re.DOTALL
|
||||
)
|
||||
|
||||
def parse(self, text: str) -> ParseResult:
|
||||
if "<tool_call>" not in text:
|
||||
return text, None
|
||||
|
||||
try:
|
||||
matches = self.PATTERN.findall(text)
|
||||
if not matches:
|
||||
return text, None
|
||||
|
||||
tool_calls: List[ChatCompletionMessageToolCall] = []
|
||||
for match in matches:
|
||||
# match is a tuple: (closed_content, unclosed_content)
|
||||
raw_json = match[0] if match[0] else match[1]
|
||||
if not raw_json.strip():
|
||||
continue
|
||||
|
||||
tc_data = json.loads(raw_json)
|
||||
tool_calls.append(
|
||||
ChatCompletionMessageToolCall(
|
||||
id=f"call_{uuid.uuid4().hex[:8]}",
|
||||
type="function",
|
||||
function=Function(
|
||||
name=tc_data["name"],
|
||||
arguments=json.dumps(
|
||||
tc_data.get("arguments", {}), ensure_ascii=False
|
||||
),
|
||||
),
|
||||
)
|
||||
)
|
||||
|
||||
if not tool_calls:
|
||||
return text, None
|
||||
|
||||
# Content is everything before the first <tool_call> tag
|
||||
content = text[: text.find("<tool_call>")].strip()
|
||||
return content if content else None, tool_calls
|
||||
|
||||
except Exception:
|
||||
return text, None
|
||||
93
environments/tool_call_parsers/kimi_k2_parser.py
Normal file
93
environments/tool_call_parsers/kimi_k2_parser.py
Normal file
@@ -0,0 +1,93 @@
|
||||
"""
|
||||
Kimi K2 tool call parser.
|
||||
|
||||
Format:
|
||||
<|tool_calls_section_begin|>
|
||||
<|tool_call_begin|>function_id:0<|tool_call_argument_begin|>{"arg": "val"}<|tool_call_end|>
|
||||
<|tool_calls_section_end|>
|
||||
|
||||
The function_id format is typically "functions.func_name:index" or "func_name:index".
|
||||
|
||||
Based on VLLM's KimiK2ToolParser.extract_tool_calls()
|
||||
"""
|
||||
|
||||
import re
|
||||
import uuid
|
||||
from typing import List, Optional
|
||||
|
||||
from openai.types.chat.chat_completion_message_tool_call import (
|
||||
ChatCompletionMessageToolCall,
|
||||
Function,
|
||||
)
|
||||
|
||||
from environments.tool_call_parsers import ParseResult, ToolCallParser, register_parser
|
||||
|
||||
|
||||
@register_parser("kimi_k2")
|
||||
class KimiK2ToolCallParser(ToolCallParser):
|
||||
"""
|
||||
Parser for Kimi K2 tool calls.
|
||||
|
||||
Uses section begin/end tokens wrapping individual tool call begin/end tokens.
|
||||
The tool_call_id contains the function name (after last dot, before colon).
|
||||
"""
|
||||
|
||||
# Support both singular and plural variants
|
||||
START_TOKENS = [
|
||||
"<|tool_calls_section_begin|>",
|
||||
"<|tool_call_section_begin|>",
|
||||
]
|
||||
|
||||
# Regex captures: tool_call_id (e.g., "functions.get_weather:0"), function_arguments
|
||||
PATTERN = re.compile(
|
||||
r"<\|tool_call_begin\|>\s*(?P<tool_call_id>[^<]+:\d+)\s*"
|
||||
r"<\|tool_call_argument_begin\|>\s*"
|
||||
r"(?P<function_arguments>(?:(?!<\|tool_call_begin\|>).)*?)\s*"
|
||||
r"<\|tool_call_end\|>",
|
||||
re.DOTALL,
|
||||
)
|
||||
|
||||
def parse(self, text: str) -> ParseResult:
|
||||
# Check for any variant of the start token
|
||||
has_start = any(token in text for token in self.START_TOKENS)
|
||||
if not has_start:
|
||||
return text, None
|
||||
|
||||
try:
|
||||
matches = self.PATTERN.findall(text)
|
||||
if not matches:
|
||||
return text, None
|
||||
|
||||
tool_calls: List[ChatCompletionMessageToolCall] = []
|
||||
for match in matches:
|
||||
function_id, function_args = match
|
||||
|
||||
# Extract function name from ID format: "functions.get_weather:0" -> "get_weather"
|
||||
function_name = function_id.split(":")[0].split(".")[-1]
|
||||
|
||||
tool_calls.append(
|
||||
ChatCompletionMessageToolCall(
|
||||
id=function_id, # Preserve the original ID format
|
||||
type="function",
|
||||
function=Function(
|
||||
name=function_name,
|
||||
arguments=function_args.strip(),
|
||||
),
|
||||
)
|
||||
)
|
||||
|
||||
if not tool_calls:
|
||||
return text, None
|
||||
|
||||
# Content is everything before the tool calls section
|
||||
earliest_start = len(text)
|
||||
for token in self.START_TOKENS:
|
||||
idx = text.find(token)
|
||||
if idx >= 0 and idx < earliest_start:
|
||||
earliest_start = idx
|
||||
|
||||
content = text[:earliest_start].strip()
|
||||
return content if content else None, tool_calls
|
||||
|
||||
except Exception:
|
||||
return text, None
|
||||
96
environments/tool_call_parsers/llama_parser.py
Normal file
96
environments/tool_call_parsers/llama_parser.py
Normal file
@@ -0,0 +1,96 @@
|
||||
"""
|
||||
Llama 3.x / 4 tool call parser.
|
||||
|
||||
Format: The model outputs JSON objects with "name" and "arguments" (or "parameters") keys.
|
||||
May be preceded by <|python_tag|> token. Supports multiple JSON objects separated
|
||||
by content or semicolons.
|
||||
|
||||
Based on VLLM's Llama3JsonToolParser.extract_tool_calls()
|
||||
"""
|
||||
|
||||
import json
|
||||
import re
|
||||
import uuid
|
||||
from typing import List, Optional
|
||||
|
||||
from openai.types.chat.chat_completion_message_tool_call import (
|
||||
ChatCompletionMessageToolCall,
|
||||
Function,
|
||||
)
|
||||
|
||||
from environments.tool_call_parsers import ParseResult, ToolCallParser, register_parser
|
||||
|
||||
|
||||
@register_parser("llama3_json")
|
||||
@register_parser("llama4_json")
|
||||
class LlamaToolCallParser(ToolCallParser):
|
||||
"""
|
||||
Parser for Llama 3.x and 4 JSON-format tool calls.
|
||||
|
||||
Finds JSON objects containing "name" + ("arguments" or "parameters") keys.
|
||||
Uses Python's json.JSONDecoder.raw_decode for robust extraction of
|
||||
JSON objects from mixed text.
|
||||
"""
|
||||
|
||||
BOT_TOKEN = "<|python_tag|>"
|
||||
|
||||
# Regex to find the start of potential JSON objects
|
||||
JSON_START = re.compile(r"\{")
|
||||
|
||||
def parse(self, text: str) -> ParseResult:
|
||||
# Quick check: need either the bot token or a JSON brace
|
||||
if self.BOT_TOKEN not in text and "{" not in text:
|
||||
return text, None
|
||||
|
||||
try:
|
||||
decoder = json.JSONDecoder()
|
||||
tool_calls: List[ChatCompletionMessageToolCall] = []
|
||||
end_index = -1 # Track where the last parsed JSON ended
|
||||
|
||||
for match in self.JSON_START.finditer(text):
|
||||
start = match.start()
|
||||
# Skip if this brace is inside a previously parsed JSON object
|
||||
if start <= end_index:
|
||||
continue
|
||||
|
||||
try:
|
||||
obj, json_end = decoder.raw_decode(text[start:])
|
||||
end_index = start + json_end
|
||||
|
||||
# Must have "name" and either "arguments" or "parameters"
|
||||
name = obj.get("name")
|
||||
args = obj.get("arguments", obj.get("parameters"))
|
||||
|
||||
if not name or args is None:
|
||||
continue
|
||||
|
||||
# Normalize arguments to JSON string
|
||||
if isinstance(args, dict):
|
||||
args = json.dumps(args, ensure_ascii=False)
|
||||
elif not isinstance(args, str):
|
||||
args = json.dumps(args, ensure_ascii=False)
|
||||
|
||||
tool_calls.append(
|
||||
ChatCompletionMessageToolCall(
|
||||
id=f"call_{uuid.uuid4().hex[:8]}",
|
||||
type="function",
|
||||
function=Function(name=name, arguments=args),
|
||||
)
|
||||
)
|
||||
except (json.JSONDecodeError, KeyError, ValueError):
|
||||
continue
|
||||
|
||||
if not tool_calls:
|
||||
return text, None
|
||||
|
||||
# Content is everything before the first tool call JSON
|
||||
# Find where the first tool call starts in the text
|
||||
first_tc_start = text.find("{")
|
||||
if self.BOT_TOKEN in text:
|
||||
first_tc_start = text.find(self.BOT_TOKEN)
|
||||
content = text[:first_tc_start].strip() if first_tc_start > 0 else None
|
||||
|
||||
return content, tool_calls
|
||||
|
||||
except Exception:
|
||||
return text, None
|
||||
69
environments/tool_call_parsers/longcat_parser.py
Normal file
69
environments/tool_call_parsers/longcat_parser.py
Normal file
@@ -0,0 +1,69 @@
|
||||
"""
|
||||
Longcat Flash Chat tool call parser.
|
||||
|
||||
Same as Hermes but uses <longcat_tool_call> tags instead of <tool_call>.
|
||||
Based on VLLM's LongcatFlashToolParser (extends Hermes2ProToolParser).
|
||||
"""
|
||||
|
||||
import json
|
||||
import re
|
||||
import uuid
|
||||
from typing import List, Optional
|
||||
|
||||
from openai.types.chat.chat_completion_message_tool_call import (
|
||||
ChatCompletionMessageToolCall,
|
||||
Function,
|
||||
)
|
||||
|
||||
from environments.tool_call_parsers import ParseResult, ToolCallParser, register_parser
|
||||
|
||||
|
||||
@register_parser("longcat")
|
||||
class LongcatToolCallParser(ToolCallParser):
|
||||
"""
|
||||
Parser for Longcat Flash Chat tool calls.
|
||||
Identical logic to Hermes, just different tag names.
|
||||
"""
|
||||
|
||||
PATTERN = re.compile(
|
||||
r"<longcat_tool_call>\s*(.*?)\s*</longcat_tool_call>|<longcat_tool_call>\s*(.*)",
|
||||
re.DOTALL,
|
||||
)
|
||||
|
||||
def parse(self, text: str) -> ParseResult:
|
||||
if "<longcat_tool_call>" not in text:
|
||||
return text, None
|
||||
|
||||
try:
|
||||
matches = self.PATTERN.findall(text)
|
||||
if not matches:
|
||||
return text, None
|
||||
|
||||
tool_calls: List[ChatCompletionMessageToolCall] = []
|
||||
for match in matches:
|
||||
raw_json = match[0] if match[0] else match[1]
|
||||
if not raw_json.strip():
|
||||
continue
|
||||
|
||||
tc_data = json.loads(raw_json)
|
||||
tool_calls.append(
|
||||
ChatCompletionMessageToolCall(
|
||||
id=f"call_{uuid.uuid4().hex[:8]}",
|
||||
type="function",
|
||||
function=Function(
|
||||
name=tc_data["name"],
|
||||
arguments=json.dumps(
|
||||
tc_data.get("arguments", {}), ensure_ascii=False
|
||||
),
|
||||
),
|
||||
)
|
||||
)
|
||||
|
||||
if not tool_calls:
|
||||
return text, None
|
||||
|
||||
content = text[: text.find("<longcat_tool_call>")].strip()
|
||||
return content if content else None, tool_calls
|
||||
|
||||
except Exception:
|
||||
return text, None
|
||||
130
environments/tool_call_parsers/mistral_parser.py
Normal file
130
environments/tool_call_parsers/mistral_parser.py
Normal file
@@ -0,0 +1,130 @@
|
||||
"""
|
||||
Mistral tool call parser.
|
||||
|
||||
Supports two formats depending on tokenizer version:
|
||||
- Pre-v11: content[TOOL_CALLS] [{"name": ..., "arguments": {...}}, ...]
|
||||
- v11+: content[TOOL_CALLS]tool_name1{"arg": "val"}[TOOL_CALLS]tool_name2{"arg": "val"}
|
||||
|
||||
Based on VLLM's MistralToolParser.extract_tool_calls()
|
||||
The [TOOL_CALLS] token is the bot_token used by Mistral models.
|
||||
"""
|
||||
|
||||
import json
|
||||
import re
|
||||
import uuid
|
||||
from typing import List, Optional
|
||||
|
||||
from openai.types.chat.chat_completion_message_tool_call import (
|
||||
ChatCompletionMessageToolCall,
|
||||
Function,
|
||||
)
|
||||
|
||||
from environments.tool_call_parsers import ParseResult, ToolCallParser, register_parser
|
||||
|
||||
|
||||
def _generate_mistral_id() -> str:
|
||||
"""Mistral tool call IDs are 9-char alphanumeric strings."""
|
||||
import random
|
||||
import string
|
||||
|
||||
return "".join(random.choices(string.ascii_letters + string.digits, k=9))
|
||||
|
||||
|
||||
@register_parser("mistral")
|
||||
class MistralToolCallParser(ToolCallParser):
|
||||
"""
|
||||
Parser for Mistral-format tool calls.
|
||||
|
||||
Detects format by checking if the content after [TOOL_CALLS] starts with '['
|
||||
(pre-v11 JSON array) or with a tool name (v11+ format).
|
||||
"""
|
||||
|
||||
# The [TOOL_CALLS] token -- may appear as different strings depending on tokenizer
|
||||
BOT_TOKEN = "[TOOL_CALLS]"
|
||||
|
||||
# Fallback regex for pre-v11 format when JSON parsing fails
|
||||
TOOL_CALL_REGEX = re.compile(r"\[?\s*(\{.*?\})\s*\]?", re.DOTALL)
|
||||
|
||||
def parse(self, text: str) -> ParseResult:
|
||||
if self.BOT_TOKEN not in text:
|
||||
return text, None
|
||||
|
||||
try:
|
||||
parts = text.split(self.BOT_TOKEN)
|
||||
content = parts[0].strip()
|
||||
raw_tool_calls = parts[1:]
|
||||
|
||||
# Detect format: if the first raw part starts with '[', it's pre-v11
|
||||
first_raw = raw_tool_calls[0].strip() if raw_tool_calls else ""
|
||||
is_pre_v11 = first_raw.startswith("[") or first_raw.startswith("{")
|
||||
|
||||
tool_calls: List[ChatCompletionMessageToolCall] = []
|
||||
|
||||
if not is_pre_v11:
|
||||
# v11+ format: [TOOL_CALLS]tool_name{args}[TOOL_CALLS]tool_name2{args2}
|
||||
for raw in raw_tool_calls:
|
||||
raw = raw.strip()
|
||||
if not raw or "{" not in raw:
|
||||
continue
|
||||
|
||||
brace_idx = raw.find("{")
|
||||
tool_name = raw[:brace_idx].strip()
|
||||
args_str = raw[brace_idx:]
|
||||
|
||||
tool_calls.append(
|
||||
ChatCompletionMessageToolCall(
|
||||
id=_generate_mistral_id(),
|
||||
type="function",
|
||||
function=Function(name=tool_name, arguments=args_str),
|
||||
)
|
||||
)
|
||||
else:
|
||||
# Pre-v11 format: [TOOL_CALLS] [{"name": ..., "arguments": {...}}]
|
||||
try:
|
||||
parsed = json.loads(first_raw)
|
||||
if isinstance(parsed, dict):
|
||||
parsed = [parsed]
|
||||
|
||||
for tc in parsed:
|
||||
args = tc.get("arguments", {})
|
||||
if isinstance(args, dict):
|
||||
args = json.dumps(args, ensure_ascii=False)
|
||||
|
||||
tool_calls.append(
|
||||
ChatCompletionMessageToolCall(
|
||||
id=_generate_mistral_id(),
|
||||
type="function",
|
||||
function=Function(
|
||||
name=tc["name"], arguments=args
|
||||
),
|
||||
)
|
||||
)
|
||||
except json.JSONDecodeError:
|
||||
# Fallback regex extraction
|
||||
match = self.TOOL_CALL_REGEX.findall(first_raw)
|
||||
if match:
|
||||
for raw_json in match:
|
||||
try:
|
||||
tc = json.loads(raw_json)
|
||||
args = tc.get("arguments", {})
|
||||
if isinstance(args, dict):
|
||||
args = json.dumps(args, ensure_ascii=False)
|
||||
tool_calls.append(
|
||||
ChatCompletionMessageToolCall(
|
||||
id=_generate_mistral_id(),
|
||||
type="function",
|
||||
function=Function(
|
||||
name=tc["name"], arguments=args
|
||||
),
|
||||
)
|
||||
)
|
||||
except (json.JSONDecodeError, KeyError):
|
||||
continue
|
||||
|
||||
if not tool_calls:
|
||||
return text, None
|
||||
|
||||
return content if content else None, tool_calls
|
||||
|
||||
except Exception:
|
||||
return text, None
|
||||
163
environments/tool_call_parsers/qwen3_coder_parser.py
Normal file
163
environments/tool_call_parsers/qwen3_coder_parser.py
Normal file
@@ -0,0 +1,163 @@
|
||||
"""
|
||||
Qwen3-Coder tool call parser.
|
||||
|
||||
Format uses XML-style nested tags:
|
||||
<tool_call>
|
||||
<function=function_name>
|
||||
<parameter=param_name>value</parameter>
|
||||
<parameter=param_name2>value2</parameter>
|
||||
</function>
|
||||
</tool_call>
|
||||
|
||||
Parameters are extracted from <parameter=name>value</parameter> tags and
|
||||
type-converted using the schema if available, otherwise treated as strings.
|
||||
|
||||
Based on VLLM's Qwen3CoderToolParser.extract_tool_calls()
|
||||
"""
|
||||
|
||||
import ast
|
||||
import json
|
||||
import re
|
||||
import uuid
|
||||
from typing import Any, Dict, List, Optional
|
||||
|
||||
from openai.types.chat.chat_completion_message_tool_call import (
|
||||
ChatCompletionMessageToolCall,
|
||||
Function,
|
||||
)
|
||||
|
||||
from environments.tool_call_parsers import ParseResult, ToolCallParser, register_parser
|
||||
|
||||
|
||||
def _try_convert_value(value: str) -> Any:
|
||||
"""
|
||||
Try to convert a parameter value string to a native Python type.
|
||||
Handles null, numbers, booleans, JSON objects/arrays, and falls back to string.
|
||||
"""
|
||||
stripped = value.strip()
|
||||
|
||||
# Handle null
|
||||
if stripped.lower() == "null":
|
||||
return None
|
||||
|
||||
# Try JSON first (handles objects, arrays, strings, numbers, booleans)
|
||||
try:
|
||||
return json.loads(stripped)
|
||||
except (json.JSONDecodeError, TypeError):
|
||||
pass
|
||||
|
||||
# Try Python literal eval (handles tuples, etc.)
|
||||
try:
|
||||
return ast.literal_eval(stripped)
|
||||
except (ValueError, SyntaxError, TypeError):
|
||||
pass
|
||||
|
||||
# Return as string
|
||||
return stripped
|
||||
|
||||
|
||||
@register_parser("qwen3_coder")
|
||||
class Qwen3CoderToolCallParser(ToolCallParser):
|
||||
"""
|
||||
Parser for Qwen3-Coder XML-format tool calls.
|
||||
|
||||
Uses nested XML tags: <tool_call><function=name><parameter=key>val</parameter></function></tool_call>
|
||||
"""
|
||||
|
||||
START_TOKEN = "<tool_call>"
|
||||
FUNCTION_PREFIX = "<function="
|
||||
|
||||
# Find complete tool_call blocks (or unclosed at end)
|
||||
TOOL_CALL_REGEX = re.compile(
|
||||
r"<tool_call>(.*?)</tool_call>|<tool_call>(.*?)$", re.DOTALL
|
||||
)
|
||||
|
||||
# Find function blocks within a tool_call
|
||||
FUNCTION_REGEX = re.compile(
|
||||
r"<function=(.*?)</function>|<function=(.*)$", re.DOTALL
|
||||
)
|
||||
|
||||
# Find parameter blocks within a function
|
||||
PARAMETER_REGEX = re.compile(
|
||||
r"<parameter=(.*?)(?:</parameter>|(?=<parameter=)|(?=</function>)|$)",
|
||||
re.DOTALL,
|
||||
)
|
||||
|
||||
def _parse_function_call(self, function_str: str) -> Optional[ChatCompletionMessageToolCall]:
|
||||
"""Parse a single <function=name>...</function> block into a ToolCall."""
|
||||
try:
|
||||
# Extract function name: everything before the first '>'
|
||||
gt_idx = function_str.index(">")
|
||||
func_name = function_str[:gt_idx].strip()
|
||||
params_str = function_str[gt_idx + 1:]
|
||||
|
||||
# Extract parameters
|
||||
param_dict: Dict[str, Any] = {}
|
||||
for match_text in self.PARAMETER_REGEX.findall(params_str):
|
||||
if ">" not in match_text:
|
||||
continue
|
||||
eq_idx = match_text.index(">")
|
||||
param_name = match_text[:eq_idx].strip()
|
||||
param_value = match_text[eq_idx + 1:]
|
||||
|
||||
# Clean up whitespace
|
||||
if param_value.startswith("\n"):
|
||||
param_value = param_value[1:]
|
||||
if param_value.endswith("\n"):
|
||||
param_value = param_value[:-1]
|
||||
|
||||
param_dict[param_name] = _try_convert_value(param_value)
|
||||
|
||||
return ChatCompletionMessageToolCall(
|
||||
id=f"call_{uuid.uuid4().hex[:24]}",
|
||||
type="function",
|
||||
function=Function(
|
||||
name=func_name,
|
||||
arguments=json.dumps(param_dict, ensure_ascii=False),
|
||||
),
|
||||
)
|
||||
except (ValueError, IndexError):
|
||||
return None
|
||||
|
||||
def parse(self, text: str) -> ParseResult:
|
||||
if self.FUNCTION_PREFIX not in text:
|
||||
return text, None
|
||||
|
||||
try:
|
||||
# Find all tool_call blocks
|
||||
tc_matches = self.TOOL_CALL_REGEX.findall(text)
|
||||
raw_blocks = [m[0] if m[0] else m[1] for m in tc_matches]
|
||||
|
||||
# Fallback: if no tool_call tags, try the whole text
|
||||
if not raw_blocks:
|
||||
raw_blocks = [text]
|
||||
|
||||
# Find function blocks within each tool_call
|
||||
function_strs: List[str] = []
|
||||
for block in raw_blocks:
|
||||
func_matches = self.FUNCTION_REGEX.findall(block)
|
||||
function_strs.extend(m[0] if m[0] else m[1] for m in func_matches)
|
||||
|
||||
if not function_strs:
|
||||
return text, None
|
||||
|
||||
# Parse each function call
|
||||
tool_calls: List[ChatCompletionMessageToolCall] = []
|
||||
for func_str in function_strs:
|
||||
tc = self._parse_function_call(func_str)
|
||||
if tc is not None:
|
||||
tool_calls.append(tc)
|
||||
|
||||
if not tool_calls:
|
||||
return text, None
|
||||
|
||||
# Content before tool calls
|
||||
first_tc = text.find(self.START_TOKEN)
|
||||
if first_tc < 0:
|
||||
first_tc = text.find(self.FUNCTION_PREFIX)
|
||||
content = text[:first_tc].strip() if first_tc > 0 else None
|
||||
|
||||
return content, tool_calls
|
||||
|
||||
except Exception:
|
||||
return text, None
|
||||
19
environments/tool_call_parsers/qwen_parser.py
Normal file
19
environments/tool_call_parsers/qwen_parser.py
Normal file
@@ -0,0 +1,19 @@
|
||||
"""
|
||||
Qwen 2.5 tool call parser.
|
||||
|
||||
Uses the same <tool_call> format as Hermes.
|
||||
Registered as a separate parser name for clarity when using --tool-parser=qwen.
|
||||
"""
|
||||
|
||||
from environments.tool_call_parsers import register_parser
|
||||
from environments.tool_call_parsers.hermes_parser import HermesToolCallParser
|
||||
|
||||
|
||||
@register_parser("qwen")
|
||||
class QwenToolCallParser(HermesToolCallParser):
|
||||
"""
|
||||
Parser for Qwen 2.5 tool calls.
|
||||
Same <tool_call>{"name": ..., "arguments": ...}</tool_call> format as Hermes.
|
||||
"""
|
||||
|
||||
pass # Identical format -- inherits everything from Hermes
|
||||
289
environments/tool_context.py
Normal file
289
environments/tool_context.py
Normal file
@@ -0,0 +1,289 @@
|
||||
"""
|
||||
ToolContext -- Unrestricted Tool Access for Reward Functions
|
||||
|
||||
A per-rollout handle that gives reward/verification functions direct access to
|
||||
ALL hermes-agent tools, scoped to the rollout's task_id. The same task_id means
|
||||
the terminal/browser session is the SAME one the model used during its rollout --
|
||||
all state (files, processes, browser tabs) is preserved.
|
||||
|
||||
The verifier author decides which tools to use. Nothing is hardcoded or gated.
|
||||
|
||||
Example usage in a compute_reward():
|
||||
async def compute_reward(self, item, result, ctx):
|
||||
# Run tests in the model's terminal sandbox
|
||||
test = ctx.terminal("pytest -v")
|
||||
if test["exit_code"] == 0:
|
||||
return 1.0
|
||||
|
||||
# Check if a file was created
|
||||
content = ctx.read_file("/workspace/solution.py")
|
||||
if content.get("content"):
|
||||
return 0.5
|
||||
|
||||
return 0.0
|
||||
"""
|
||||
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
from typing import Any, Dict, List, Optional
|
||||
|
||||
import asyncio
|
||||
import concurrent.futures
|
||||
|
||||
from model_tools import handle_function_call
|
||||
from tools.terminal_tool import cleanup_vm
|
||||
from tools.browser_tool import cleanup_browser
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Thread pool for running sync tool calls that internally use asyncio.run()
|
||||
_tool_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)
|
||||
|
||||
|
||||
def _run_tool_in_thread(tool_name: str, arguments: Dict[str, Any], task_id: str) -> str:
|
||||
"""
|
||||
Run a tool call in a thread pool executor so backends that use asyncio.run()
|
||||
internally (modal, docker) get a clean event loop.
|
||||
|
||||
If we're already in an async context, uses run_in_executor.
|
||||
If not (e.g., called from sync code), runs directly.
|
||||
"""
|
||||
try:
|
||||
loop = asyncio.get_running_loop()
|
||||
# We're in an async context -- need to run in thread
|
||||
import concurrent.futures
|
||||
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
|
||||
future = pool.submit(
|
||||
handle_function_call, tool_name, arguments, task_id
|
||||
)
|
||||
return future.result(timeout=300)
|
||||
except RuntimeError:
|
||||
# No running event loop -- safe to call directly
|
||||
return handle_function_call(tool_name, arguments, task_id)
|
||||
|
||||
|
||||
class ToolContext:
|
||||
"""
|
||||
Open-ended access to all hermes-agent tools for a specific rollout.
|
||||
|
||||
Passed to compute_reward() so verifiers can use any tool they need:
|
||||
terminal commands, file reads/writes, web searches, browser automation, etc.
|
||||
All calls share the rollout's task_id for session isolation.
|
||||
"""
|
||||
|
||||
def __init__(self, task_id: str):
|
||||
self.task_id = task_id
|
||||
|
||||
# -------------------------------------------------------------------------
|
||||
# Terminal tools
|
||||
# -------------------------------------------------------------------------
|
||||
|
||||
def terminal(self, command: str, timeout: int = 180) -> Dict[str, Any]:
|
||||
"""
|
||||
Run a command in the rollout's terminal session.
|
||||
|
||||
Args:
|
||||
command: Shell command to execute
|
||||
timeout: Command timeout in seconds
|
||||
|
||||
Returns:
|
||||
Dict with 'exit_code' (int) and 'output' (str)
|
||||
"""
|
||||
import os
|
||||
backend = os.getenv("TERMINAL_ENV", "local")
|
||||
logger.debug("ToolContext.terminal [%s backend] task=%s: %s", backend, self.task_id[:8], command[:100])
|
||||
|
||||
# Run in thread pool so modal/docker backends' asyncio.run() doesn't deadlock
|
||||
result = _run_tool_in_thread(
|
||||
"terminal",
|
||||
{"command": command, "timeout": timeout},
|
||||
self.task_id,
|
||||
)
|
||||
try:
|
||||
return json.loads(result)
|
||||
except json.JSONDecodeError:
|
||||
return {"exit_code": -1, "output": result}
|
||||
|
||||
# -------------------------------------------------------------------------
|
||||
# File tools
|
||||
# -------------------------------------------------------------------------
|
||||
|
||||
def read_file(self, path: str) -> Dict[str, Any]:
|
||||
"""
|
||||
Read a file from the rollout's filesystem.
|
||||
|
||||
Args:
|
||||
path: File path to read
|
||||
|
||||
Returns:
|
||||
Dict with file content or error
|
||||
"""
|
||||
result = handle_function_call(
|
||||
"read_file", {"path": path}, task_id=self.task_id
|
||||
)
|
||||
try:
|
||||
return json.loads(result)
|
||||
except json.JSONDecodeError:
|
||||
return {"error": result}
|
||||
|
||||
def write_file(self, path: str, content: str) -> Dict[str, Any]:
|
||||
"""
|
||||
Write a file in the rollout's filesystem.
|
||||
|
||||
Args:
|
||||
path: File path to write
|
||||
content: Content to write
|
||||
|
||||
Returns:
|
||||
Dict with success status or error
|
||||
"""
|
||||
result = handle_function_call(
|
||||
"write_file", {"path": path, "content": content}, task_id=self.task_id
|
||||
)
|
||||
try:
|
||||
return json.loads(result)
|
||||
except json.JSONDecodeError:
|
||||
return {"error": result}
|
||||
|
||||
def search(self, query: str, path: str = ".") -> Dict[str, Any]:
|
||||
"""
|
||||
Search for text in the rollout's filesystem.
|
||||
|
||||
Args:
|
||||
query: Search query
|
||||
path: Directory to search in
|
||||
|
||||
Returns:
|
||||
Dict with search results
|
||||
"""
|
||||
result = handle_function_call(
|
||||
"search", {"query": query, "path": path}, task_id=self.task_id
|
||||
)
|
||||
try:
|
||||
return json.loads(result)
|
||||
except json.JSONDecodeError:
|
||||
return {"error": result}
|
||||
|
||||
# -------------------------------------------------------------------------
|
||||
# Web tools
|
||||
# -------------------------------------------------------------------------
|
||||
|
||||
def web_search(self, query: str) -> Dict[str, Any]:
|
||||
"""
|
||||
Search the web.
|
||||
|
||||
Args:
|
||||
query: Search query
|
||||
|
||||
Returns:
|
||||
Dict with search results
|
||||
"""
|
||||
result = handle_function_call("web_search", {"query": query})
|
||||
try:
|
||||
return json.loads(result)
|
||||
except json.JSONDecodeError:
|
||||
return {"error": result}
|
||||
|
||||
def web_extract(self, urls: List[str]) -> Dict[str, Any]:
|
||||
"""
|
||||
Extract content from URLs.
|
||||
|
||||
Args:
|
||||
urls: List of URLs to extract content from
|
||||
|
||||
Returns:
|
||||
Dict with extracted content
|
||||
"""
|
||||
result = handle_function_call("web_extract", {"urls": urls})
|
||||
try:
|
||||
return json.loads(result)
|
||||
except json.JSONDecodeError:
|
||||
return {"error": result}
|
||||
|
||||
# -------------------------------------------------------------------------
|
||||
# Browser tools
|
||||
# -------------------------------------------------------------------------
|
||||
|
||||
def browser_navigate(self, url: str) -> Dict[str, Any]:
|
||||
"""
|
||||
Navigate the rollout's browser session to a URL.
|
||||
|
||||
Args:
|
||||
url: URL to navigate to
|
||||
|
||||
Returns:
|
||||
Dict with page snapshot or error
|
||||
"""
|
||||
result = handle_function_call(
|
||||
"browser_navigate", {"url": url}, task_id=self.task_id
|
||||
)
|
||||
try:
|
||||
return json.loads(result)
|
||||
except json.JSONDecodeError:
|
||||
return {"error": result}
|
||||
|
||||
def browser_snapshot(self) -> Dict[str, Any]:
|
||||
"""
|
||||
Take a snapshot of the current browser page.
|
||||
|
||||
Returns:
|
||||
Dict with page content/accessibility snapshot
|
||||
"""
|
||||
result = handle_function_call(
|
||||
"browser_snapshot", {}, task_id=self.task_id
|
||||
)
|
||||
try:
|
||||
return json.loads(result)
|
||||
except json.JSONDecodeError:
|
||||
return {"error": result}
|
||||
|
||||
# -------------------------------------------------------------------------
|
||||
# Generic tool access
|
||||
# -------------------------------------------------------------------------
|
||||
|
||||
def call_tool(self, tool_name: str, arguments: Dict[str, Any]) -> str:
|
||||
"""
|
||||
Call any hermes-agent tool by name.
|
||||
|
||||
This is the generic escape hatch -- if a tool doesn't have a convenience
|
||||
wrapper above, you can call it directly here.
|
||||
|
||||
Args:
|
||||
tool_name: Name of the tool (e.g., "vision_analyze", "skills_list")
|
||||
arguments: Dict of arguments for the tool
|
||||
|
||||
Returns:
|
||||
Raw JSON string result from the tool
|
||||
"""
|
||||
return _run_tool_in_thread(tool_name, arguments, self.task_id)
|
||||
|
||||
# -------------------------------------------------------------------------
|
||||
# Cleanup
|
||||
# -------------------------------------------------------------------------
|
||||
|
||||
def cleanup(self):
|
||||
"""
|
||||
Release all resources (terminal VMs, browser sessions) for this rollout.
|
||||
|
||||
Called automatically by the base environment via try/finally after
|
||||
compute_reward() completes. You generally don't need to call this yourself.
|
||||
"""
|
||||
try:
|
||||
cleanup_vm(self.task_id)
|
||||
except Exception as e:
|
||||
logger.debug("VM cleanup for task %s: %s", self.task_id, e)
|
||||
|
||||
# Suppress browser_tool's noisy debug prints during cleanup.
|
||||
# The cleanup still runs (safe), it just doesn't spam the console.
|
||||
_prev_quiet = os.environ.get("HERMES_QUIET")
|
||||
os.environ["HERMES_QUIET"] = "1"
|
||||
try:
|
||||
cleanup_browser(self.task_id)
|
||||
except Exception as e:
|
||||
logger.debug("Browser cleanup for task %s: %s", self.task_id, e)
|
||||
finally:
|
||||
if _prev_quiet is None:
|
||||
os.environ.pop("HERMES_QUIET", None)
|
||||
else:
|
||||
os.environ["HERMES_QUIET"] = _prev_quiet
|
||||
@@ -1,70 +0,0 @@
|
||||
---
|
||||
name: example-skill
|
||||
description: An example skill demonstrating the skill file format and structure
|
||||
---
|
||||
|
||||
# Example Skill
|
||||
|
||||
This is an example skill file that demonstrates how to create skills for the Hermes Agent.
|
||||
|
||||
## Skill File Format
|
||||
|
||||
Skills are markdown files with YAML frontmatter at the top:
|
||||
|
||||
```yaml
|
||||
---
|
||||
name: your-skill-name
|
||||
description: A brief one-line description of what this skill does
|
||||
---
|
||||
```
|
||||
|
||||
The frontmatter fields:
|
||||
- **name**: The identifier used to reference this skill (lowercase, hyphens for spaces)
|
||||
- **description**: A brief description shown when listing skills (keep under 200 chars)
|
||||
|
||||
## Writing Effective Skills
|
||||
|
||||
### 1. Be Specific and Actionable
|
||||
|
||||
Good skills provide clear, actionable instructions:
|
||||
|
||||
```
|
||||
When reviewing code:
|
||||
1. Check for security vulnerabilities first
|
||||
2. Verify error handling is comprehensive
|
||||
3. Ensure tests cover edge cases
|
||||
```
|
||||
|
||||
### 2. Include Examples
|
||||
|
||||
Show concrete examples of what you want:
|
||||
|
||||
```python
|
||||
# Good: Descriptive variable names
|
||||
user_authentication_token = get_token()
|
||||
|
||||
# Bad: Cryptic abbreviations
|
||||
uat = gt()
|
||||
```
|
||||
|
||||
### 3. Define When to Use
|
||||
|
||||
Help the agent understand when this skill applies:
|
||||
|
||||
> Use this skill when: reviewing pull requests, auditing security, or checking code quality.
|
||||
|
||||
## Skill Categories
|
||||
|
||||
Consider organizing skills by purpose:
|
||||
|
||||
- **Conventions**: Coding standards, API patterns, naming rules
|
||||
- **Workflows**: Step-by-step processes for deployments, reviews, releases
|
||||
- **Knowledge**: Domain-specific information, system architecture, gotchas
|
||||
- **Templates**: Boilerplate for common tasks, response formats
|
||||
|
||||
## Tips
|
||||
|
||||
1. Keep the description concise - it's shown in the skills list
|
||||
2. Use headers to organize longer skills
|
||||
3. Include code examples where helpful
|
||||
4. Reference other skills if they're related
|
||||
@@ -1,868 +0,0 @@
|
||||
Metadata-Version: 2.4
|
||||
Name: hermes-agent
|
||||
Version: 0.1.0
|
||||
Summary: AI agent with advanced tool-calling and toolsets
|
||||
Author: Nous Research
|
||||
License: MIT
|
||||
Requires-Python: >=3.10
|
||||
Description-Content-Type: text/markdown
|
||||
Requires-Dist: openai
|
||||
Requires-Dist: python-dotenv
|
||||
Requires-Dist: fire
|
||||
Requires-Dist: httpx
|
||||
Requires-Dist: rich
|
||||
Requires-Dist: tenacity
|
||||
Requires-Dist: pyyaml
|
||||
Requires-Dist: requests
|
||||
Requires-Dist: jinja2
|
||||
Requires-Dist: pydantic>=2.0
|
||||
Requires-Dist: firecrawl-py
|
||||
Requires-Dist: fal-client
|
||||
Requires-Dist: litellm>=1.75.5
|
||||
Requires-Dist: typer
|
||||
Requires-Dist: platformdirs
|
||||
Provides-Extra: modal
|
||||
Requires-Dist: modal; extra == "modal"
|
||||
Requires-Dist: boto3; extra == "modal"
|
||||
Provides-Extra: dev
|
||||
Requires-Dist: pytest; extra == "dev"
|
||||
Requires-Dist: pytest-asyncio; extra == "dev"
|
||||
Provides-Extra: messaging
|
||||
Requires-Dist: python-telegram-bot>=20.0; extra == "messaging"
|
||||
Requires-Dist: discord.py>=2.0; extra == "messaging"
|
||||
Provides-Extra: cron
|
||||
Requires-Dist: croniter; extra == "cron"
|
||||
Provides-Extra: all
|
||||
Requires-Dist: croniter; extra == "all"
|
||||
Requires-Dist: python-telegram-bot>=20.0; extra == "all"
|
||||
Requires-Dist: discord.py>=2.0; extra == "all"
|
||||
|
||||
# Hermes Agent
|
||||
|
||||
An AI agent with advanced tool-calling capabilities, featuring a flexible toolsets system for organizing and managing tools.
|
||||
|
||||
## Features
|
||||
|
||||
- **Interactive CLI**: Beautiful terminal interface with animated feedback, personalities, and session management
|
||||
- **Messaging Gateway**: Connect to Telegram, Discord, and WhatsApp for conversational AI anywhere
|
||||
- **Web Tools**: Search, extract content, and crawl websites
|
||||
- **Terminal Tools**: Execute commands via local, Docker, Singularity, Modal, or SSH backends
|
||||
- **Browser Tools**: Automate web browsers to navigate, click, type, and extract content
|
||||
- **Vision Tools**: Analyze images from URLs
|
||||
- **Reasoning Tools**: Advanced multi-model reasoning (Mixture of Agents)
|
||||
- **Creative Tools**: Generate images from text prompts
|
||||
- **Skills Tools**: On-demand knowledge documents with progressive disclosure
|
||||
- **Toolsets System**: Organize tools into logical groups for different scenarios
|
||||
- **Scheduled Tasks**: Cron jobs for automated agent tasks with delivery to platforms
|
||||
- **Context Compression**: Automatic summarization when approaching context limits
|
||||
- **Batch Processing**: Process datasets in parallel with checkpointing and statistics tracking
|
||||
- **Ephemeral System Prompts**: Guide model behavior without polluting training datasets
|
||||
|
||||
## Installation
|
||||
|
||||
### Quick Install (Recommended)
|
||||
|
||||
**Linux/macOS:**
|
||||
```bash
|
||||
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
|
||||
```
|
||||
|
||||
**Windows (PowerShell):**
|
||||
```powershell
|
||||
irm https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.ps1 | iex
|
||||
```
|
||||
|
||||
This installer will:
|
||||
- Clone the repository to `~/.hermes-agent`
|
||||
- Create a virtual environment and install dependencies
|
||||
- Set up the `hermes` command in your PATH
|
||||
- Run an interactive setup wizard to configure API keys
|
||||
|
||||
### Manual Installation
|
||||
|
||||
If you prefer to install manually:
|
||||
|
||||
```bash
|
||||
# Clone with submodules
|
||||
git clone --recurse-submodules https://github.com/NousResearch/Hermes-Agent.git
|
||||
cd Hermes-Agent
|
||||
|
||||
# Run the setup script
|
||||
./setup-hermes.sh
|
||||
```
|
||||
|
||||
Or step-by-step:
|
||||
|
||||
```bash
|
||||
# Create and activate virtual environment
|
||||
python3 -m venv venv
|
||||
source venv/bin/activate # Windows: venv\Scripts\activate
|
||||
|
||||
# Install in editable mode with all extras
|
||||
pip install -e ".[all]"
|
||||
|
||||
# Or install dependencies manually
|
||||
pip install -r requirements.txt
|
||||
pip install -e ./mini-swe-agent
|
||||
|
||||
# Copy and configure environment
|
||||
cp .env.example .env
|
||||
# Edit .env with your API keys
|
||||
|
||||
# Run the setup wizard
|
||||
hermes setup
|
||||
```
|
||||
|
||||
## Quick Start
|
||||
|
||||
Once installed, the `hermes` command is your main entry point:
|
||||
|
||||
```bash
|
||||
hermes # Interactive chat (default)
|
||||
hermes chat # Same as above
|
||||
hermes chat -q "Hello" # Single query, then exit
|
||||
hermes setup # Configure API keys and settings
|
||||
hermes status # Show configuration status
|
||||
hermes doctor # Diagnose issues
|
||||
hermes gateway # Start messaging gateway (Telegram/Discord/WhatsApp)
|
||||
hermes cron daemon # Run cron job scheduler
|
||||
hermes version # Show version info
|
||||
```
|
||||
|
||||
**Legacy `./hermes` script:**
|
||||
```bash
|
||||
# The old CLI script still works:
|
||||
./hermes
|
||||
|
||||
# Or with options:
|
||||
./hermes --model "anthropic/claude-sonnet-4" --toolsets "web,terminal"
|
||||
```
|
||||
|
||||
The CLI provides:
|
||||
- Animated spinners during thinking and tool execution
|
||||
- Kawaii-style feedback messages
|
||||
- `/commands` for configuration, history, and session management
|
||||
- Customizable personalities (`/personality kawaii`, `/personality pirate`, etc.)
|
||||
- Persistent configuration via `cli-config.yaml`
|
||||
|
||||
## Configuration
|
||||
|
||||
### Environment Variables
|
||||
```bash
|
||||
# Copy the example environment file
|
||||
cp .env.example .env
|
||||
|
||||
# Edit .env and add your API keys
|
||||
nano .env # or use your preferred editor
|
||||
```
|
||||
|
||||
**Required API Keys:**
|
||||
- `OPENROUTER_API_KEY` - LLM access via OpenRouter (get at: https://openrouter.ai/keys)
|
||||
- `FIRECRAWL_API_KEY` - Web tools (get at: https://firecrawl.dev/)
|
||||
- `NOUS_API_KEY` - Vision & reasoning tools (get at: https://inference-api.nousresearch.com/)
|
||||
- `FAL_KEY` - Image generation (get at: https://fal.ai/)
|
||||
|
||||
**Optional API Keys (for specific features):**
|
||||
- `BROWSERBASE_API_KEY` - Browser automation (get at: https://browserbase.com/)
|
||||
- `BROWSERBASE_PROJECT_ID` - From Browserbase dashboard
|
||||
- `MORPH_API_KEY` - For legacy Hecate terminal backend (get at: https://morph.so/)
|
||||
|
||||
### 4. Configure Terminal Backend
|
||||
|
||||
The terminal tool uses **mini-swe-agent** environments. Configure in `.env` or `cli-config.yaml`:
|
||||
|
||||
```bash
|
||||
# Backend: "local", "docker", "singularity", "modal", or "ssh"
|
||||
TERMINAL_ENV=local # Default: runs on host machine (no isolation)
|
||||
TERMINAL_ENV=ssh # Remote execution via SSH (agent code stays local)
|
||||
TERMINAL_ENV=singularity # Recommended for HPC: Apptainer/Singularity containers
|
||||
TERMINAL_ENV=docker # Isolated Docker containers
|
||||
TERMINAL_ENV=modal # Cloud execution via Modal
|
||||
|
||||
# Container image (for docker/singularity/modal backends)
|
||||
TERMINAL_DOCKER_IMAGE=python:3.11-slim
|
||||
TERMINAL_SINGULARITY_IMAGE=docker://python:3.11-slim
|
||||
TERMINAL_TIMEOUT=60
|
||||
|
||||
# SSH backend (for ssh)
|
||||
TERMINAL_SSH_HOST=my-server.example.com
|
||||
TERMINAL_SSH_USER=myuser
|
||||
TERMINAL_SSH_KEY=~/.ssh/id_rsa # Optional, uses ssh-agent if not set
|
||||
```
|
||||
|
||||
**Backend Requirements:**
|
||||
- **local**: No extra setup (runs directly on your machine, no isolation)
|
||||
- **ssh**: SSH access to remote machine (great for sandboxing - agent can't touch its own code)
|
||||
- **singularity**: Requires Apptainer or Singularity installed (common on HPC clusters, no root needed)
|
||||
- **docker**: Requires Docker installed and user in `docker` group
|
||||
- **modal**: Requires Modal account (see setup below)
|
||||
|
||||
### Singularity/Apptainer Setup (Recommended for HPC)
|
||||
|
||||
Singularity/Apptainer provides rootless container execution, ideal for HPC clusters:
|
||||
|
||||
```bash
|
||||
# 1. Verify Apptainer is installed
|
||||
apptainer --version # or: singularity --version
|
||||
|
||||
# 2. Set up cache directories (important for parallel workers)
|
||||
# Use /scratch if available (HPC), otherwise /tmp
|
||||
export APPTAINER_CACHEDIR=/scratch/$USER/.apptainer
|
||||
export APPTAINER_TMPDIR=/scratch/$USER/.apptainer/tmp
|
||||
mkdir -p "$APPTAINER_CACHEDIR" "$APPTAINER_TMPDIR"
|
||||
|
||||
# 3. Pre-build SIF image (recommended for parallel batch processing)
|
||||
# This avoids race conditions when multiple workers start simultaneously
|
||||
apptainer build $APPTAINER_CACHEDIR/python-nodejs.sif docker://nikolaik/python-nodejs:python3.11-nodejs20
|
||||
|
||||
# 4. Configure .env to use the local SIF
|
||||
TERMINAL_ENV=singularity
|
||||
TERMINAL_SINGULARITY_IMAGE=/scratch/$USER/.apptainer/python-nodejs.sif
|
||||
```
|
||||
|
||||
**Tip:** The batch scripts in `configs/` automatically handle SIF pre-building if `/scratch` is available.
|
||||
|
||||
### Modal Cloud Backend Setup
|
||||
|
||||
[Modal](https://modal.com) provides serverless cloud compute for running sandboxed environments at scale.
|
||||
|
||||
```bash
|
||||
# 1. Install Modal and dependencies
|
||||
pip install modal boto3
|
||||
|
||||
# 2. Authenticate with Modal (opens browser)
|
||||
modal setup
|
||||
|
||||
# 3. Set terminal backend to modal in .env
|
||||
TERMINAL_ENV=modal
|
||||
```
|
||||
|
||||
Modal uses CLI-based authentication (stored in `~/.modal/`), so no API key is needed in `.env`. After running `modal setup`, commands will automatically execute in Modal's cloud sandboxes.
|
||||
|
||||
### Browser Tools Setup
|
||||
|
||||
Browser tools enable the agent to navigate websites, fill forms, click buttons, and extract content. They use [agent-browser](https://github.com/vercel-labs/agent-browser) CLI with [Browserbase](https://browserbase.com) cloud execution.
|
||||
|
||||
```bash
|
||||
# 1. Install Node.js (if not already installed)
|
||||
# Use nvm (recommended) or your package manager
|
||||
|
||||
# 2. Install agent-browser CLI (choose one option):
|
||||
npm install -g agent-browser # Option A: Global install (recommended)
|
||||
npm install # Option B: Local install (uses npx fallback)
|
||||
|
||||
# 3. Get Browserbase credentials
|
||||
# Sign up at https://browserbase.com/ and get your:
|
||||
# - API Key (from Settings → API Keys)
|
||||
# - Project ID (from your project dashboard)
|
||||
|
||||
# 4. Add to your .env file:
|
||||
BROWSERBASE_API_KEY=your_api_key_here
|
||||
BROWSERBASE_PROJECT_ID=your_project_id_here
|
||||
```
|
||||
|
||||
**Available Browser Tools:**
|
||||
|
||||
| Tool | Description |
|
||||
|------|-------------|
|
||||
| `browser_navigate` | Navigate to a URL |
|
||||
| `browser_snapshot` | Get text-based page snapshot with element refs |
|
||||
| `browser_click` | Click an element by ref (e.g., `@e5`) |
|
||||
| `browser_type` | Type text into an input field |
|
||||
| `browser_scroll` | Scroll up or down |
|
||||
| `browser_back` | Go back in browser history |
|
||||
| `browser_press` | Press a keyboard key (Enter, Tab, etc.) |
|
||||
| `browser_close` | Close the browser session |
|
||||
| `browser_get_images` | Get list of images on the page |
|
||||
|
||||
**Example Usage:**
|
||||
```bash
|
||||
# Use browser tools with web search and vision
|
||||
python run_agent.py \
|
||||
--query "Go to amazon.com and find the price of the latest Kindle" \
|
||||
--enabled_toolsets=browser,web,vision
|
||||
|
||||
# Use browser-focused distribution
|
||||
python batch_runner.py \
|
||||
--dataset_file=browser_tasks.jsonl \
|
||||
--distribution=browser_use \
|
||||
--run_name=browser_run
|
||||
```
|
||||
|
||||
See `.env.example` for all available configuration options including debug settings.
|
||||
|
||||
### Skills Tools
|
||||
|
||||
Skills are on-demand knowledge documents the agent can load when needed. They follow a **progressive disclosure** pattern to minimize token usage:
|
||||
|
||||
```
|
||||
skills/
|
||||
├── mlops/ # Category folder
|
||||
│ ├── axolotl/ # Skill folder
|
||||
│ │ ├── SKILL.md # Main instructions (required)
|
||||
│ │ ├── references/ # Additional docs, API specs
|
||||
│ │ └── templates/ # Output formats, configs
|
||||
│ └── vllm/
|
||||
│ └── SKILL.md
|
||||
```
|
||||
|
||||
**Available Skills Tools:**
|
||||
|
||||
| Tool | Description |
|
||||
|------|-------------|
|
||||
| `skills_categories` | List available skill categories (~50 tokens) |
|
||||
| `skills_list` | List skills with name + description (~3k tokens for 40 skills) |
|
||||
| `skill_view` | Load full skill content, tags, and linked files |
|
||||
|
||||
**Example Usage:**
|
||||
```bash
|
||||
# Use skills tools
|
||||
python run_agent.py \
|
||||
--query "What skills do you have for fine-tuning? Show me the axolotl skill." \
|
||||
--enabled_toolsets=skills
|
||||
```
|
||||
|
||||
**Creating Skills:**
|
||||
|
||||
Skills use YAML frontmatter for metadata:
|
||||
```yaml
|
||||
---
|
||||
name: my-skill
|
||||
description: Brief description shown in skills_list
|
||||
tags: [tag1, tag2]
|
||||
related_skills: [other-skill]
|
||||
version: 1.0.0
|
||||
---
|
||||
# Skill Content
|
||||
|
||||
Instructions, examples, and guidelines here...
|
||||
```
|
||||
|
||||
Skills can include:
|
||||
- `references/` - Additional documentation, API specs, examples
|
||||
- `templates/` - Output formats, config files, boilerplate code
|
||||
- `scripts/` - Executable helpers (Python, shell scripts)
|
||||
|
||||
## Session Logging
|
||||
|
||||
Every conversation is automatically logged to `logs/` for debugging and inspection:
|
||||
|
||||
```
|
||||
logs/
|
||||
├── session_20260201_143052_a1b2c3.json
|
||||
├── session_20260201_150217_d4e5f6.json
|
||||
└── ...
|
||||
```
|
||||
|
||||
**Log Format:**
|
||||
```json
|
||||
{
|
||||
"session_id": "20260201_143052_a1b2c3",
|
||||
"model": "anthropic/claude-sonnet-4",
|
||||
"session_start": "2026-02-01T14:30:52.123456",
|
||||
"last_updated": "2026-02-01T14:35:12.789012",
|
||||
"message_count": 8,
|
||||
"conversations": [
|
||||
{"from": "system", "value": "..."},
|
||||
{"from": "human", "value": "..."},
|
||||
{"from": "gpt", "value": "..."},
|
||||
{"from": "tool", "value": "..."}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
- **Automatic**: Logs are created and updated automatically after each conversation turn
|
||||
- **Session ID in Banner**: The CLI displays the session ID in the welcome banner
|
||||
- **Trajectory Format**: Uses the same format as batch processing for consistency
|
||||
- **Git Ignored**: `logs/` is in `.gitignore` so logs aren't committed
|
||||
|
||||
## Context Compression
|
||||
|
||||
Long conversations can exceed the model's context limit. Hermes Agent automatically compresses context when approaching the limit:
|
||||
|
||||
**How it works:**
|
||||
1. Tracks actual token usage from API responses (`usage.prompt_tokens`)
|
||||
2. When tokens reach 85% of model's context limit, triggers compression
|
||||
3. Protects first 3 turns (system prompt, initial request, first response)
|
||||
4. Protects last 4 turns (recent context is most relevant)
|
||||
5. Summarizes middle turns using a fast/cheap model (Gemini Flash)
|
||||
6. Inserts summary as a user message, conversation continues seamlessly
|
||||
|
||||
**Configuration (`cli-config.yaml`):**
|
||||
```yaml
|
||||
compression:
|
||||
enabled: true # Enable auto-compression (default)
|
||||
threshold: 0.85 # Compress at 85% of context limit
|
||||
summary_model: "google/gemini-2.0-flash-001"
|
||||
```
|
||||
|
||||
**Or via environment variables:**
|
||||
```bash
|
||||
CONTEXT_COMPRESSION_ENABLED=true
|
||||
CONTEXT_COMPRESSION_THRESHOLD=0.85
|
||||
CONTEXT_COMPRESSION_MODEL=google/gemini-2.0-flash-001
|
||||
```
|
||||
|
||||
**When compression triggers, you'll see:**
|
||||
```
|
||||
📦 Context compression triggered (170,000 tokens ≥ 170,000 threshold)
|
||||
📊 Model context limit: 200,000 tokens (85% = 170,000)
|
||||
🗜️ Summarizing turns 4-15 (12 turns)
|
||||
✅ Compressed: 20 → 9 messages (~45,000 tokens saved)
|
||||
```
|
||||
|
||||
## Scheduled Tasks (Cron Jobs)
|
||||
|
||||
Hermes Agent can schedule automated tasks to run in the future - either one-time reminders or recurring jobs.
|
||||
|
||||
### CLI Commands
|
||||
|
||||
```bash
|
||||
# List scheduled jobs
|
||||
/cron
|
||||
|
||||
# Add a one-shot reminder (runs once in 30 minutes)
|
||||
/cron add 30m Remind me to check the build status
|
||||
|
||||
# Add a recurring job (every 2 hours)
|
||||
/cron add "every 2h" Check server status at 192.168.1.100 and report any issues
|
||||
|
||||
# Add a cron expression (daily at 9am)
|
||||
/cron add "0 9 * * *" Generate a morning briefing summarizing GitHub notifications
|
||||
|
||||
# Remove a job
|
||||
/cron remove abc123def456
|
||||
```
|
||||
|
||||
### Agent Self-Scheduling
|
||||
|
||||
The agent can also schedule its own follow-up tasks using tools:
|
||||
|
||||
```python
|
||||
# Available when using hermes-cli toolset (default for CLI)
|
||||
schedule_cronjob(prompt="...", schedule="30m", repeat=1) # One-shot
|
||||
schedule_cronjob(prompt="...", schedule="every 2h") # Recurring
|
||||
list_cronjobs() # View all jobs
|
||||
remove_cronjob(job_id="...") # Cancel a job
|
||||
```
|
||||
|
||||
**⚠️ Important:** Cronjobs run in **isolated sessions with NO prior context**. The prompt must be completely self-contained with all necessary information (file paths, URLs, server addresses, etc.). The future agent will not remember anything from the current conversation.
|
||||
|
||||
### Schedule Formats
|
||||
|
||||
| Format | Example | Description |
|
||||
|--------|---------|-------------|
|
||||
| Duration | `30m`, `2h`, `1d` | One-shot delay from now |
|
||||
| Interval | `every 30m`, `every 2h` | Recurring at fixed intervals |
|
||||
| Cron | `0 9 * * *` | Cron expression (requires `croniter`) |
|
||||
| Timestamp | `2026-02-03T14:00` | One-shot at specific time |
|
||||
|
||||
### Repeat Options
|
||||
|
||||
| repeat | Behavior |
|
||||
|--------|----------|
|
||||
| (omitted) | One-shot schedules run once; intervals/cron run forever |
|
||||
| `1` | Run once then auto-delete |
|
||||
| `N` | Run N times then auto-delete |
|
||||
|
||||
### Running the Cron Daemon
|
||||
|
||||
Jobs are stored in `~/.hermes/cron/jobs.json` and executed by a scheduler:
|
||||
|
||||
```bash
|
||||
# Option 1: Built-in daemon (checks every 60 seconds)
|
||||
python cli.py --cron-daemon
|
||||
|
||||
# Option 2: System cron integration (run once per minute)
|
||||
# Add to crontab: crontab -e
|
||||
*/1 * * * * cd ~/hermes-agent && python cli.py --cron-tick-once >> ~/.hermes/cron/cron.log 2>&1
|
||||
```
|
||||
|
||||
### Job Output
|
||||
|
||||
Job outputs are saved to `~/.hermes/cron/output/{job_id}/{timestamp}.md` for review.
|
||||
|
||||
## Messaging Gateway (Telegram, Discord, WhatsApp)
|
||||
|
||||
Connect Hermes Agent to messaging platforms so you can chat from anywhere.
|
||||
|
||||
### Quick Start
|
||||
|
||||
```bash
|
||||
# 1. Add your bot token to .env
|
||||
echo 'TELEGRAM_BOT_TOKEN="your_token"' >> .env
|
||||
|
||||
# 2. Test the gateway (foreground)
|
||||
./scripts/hermes-gateway run
|
||||
|
||||
# 3. Install as a background service
|
||||
./scripts/hermes-gateway install
|
||||
|
||||
# 4. Manage the service
|
||||
./scripts/hermes-gateway start # Start
|
||||
./scripts/hermes-gateway stop # Stop
|
||||
./scripts/hermes-gateway status # Check status
|
||||
```
|
||||
|
||||
### Supported Platforms
|
||||
|
||||
| Platform | Setup | Toolset |
|
||||
|----------|-------|---------|
|
||||
| Telegram | Bot via @BotFather | `hermes-telegram` |
|
||||
| Discord | Bot via Developer Portal | `hermes-discord` |
|
||||
| WhatsApp | Node.js bridge | `hermes-whatsapp` |
|
||||
|
||||
### Session Management
|
||||
|
||||
- Sessions persist across messages (agent remembers context)
|
||||
- Reset policies: daily (4am), idle (2 hours), or both
|
||||
- Manual reset: send `/new` or `/reset`
|
||||
|
||||
### Cron Job Delivery
|
||||
|
||||
Schedule tasks that deliver to specific platforms:
|
||||
|
||||
```python
|
||||
schedule_cronjob(
|
||||
prompt="Check server status...",
|
||||
schedule="every 1h",
|
||||
deliver="telegram" # or "origin", "discord", etc.
|
||||
)
|
||||
```
|
||||
|
||||
### CLI Commands
|
||||
|
||||
| Command | Description |
|
||||
|---------|-------------|
|
||||
| `/platforms` | Show gateway configuration status |
|
||||
| `--gateway` | Start the gateway (CLI flag) |
|
||||
|
||||
See [docs/messaging.md](docs/messaging.md) for full setup instructions.
|
||||
|
||||
## Interactive CLI
|
||||
|
||||
The CLI provides a rich interactive experience for working with the agent.
|
||||
|
||||
### Running the CLI
|
||||
|
||||
```bash
|
||||
# Basic usage
|
||||
./hermes
|
||||
|
||||
# With specific model
|
||||
./hermes --model "anthropic/claude-sonnet-4"
|
||||
|
||||
# With specific toolsets
|
||||
./hermes --toolsets "web,terminal,skills"
|
||||
```
|
||||
|
||||
### CLI Commands
|
||||
|
||||
| Command | Description |
|
||||
|---------|-------------|
|
||||
| `/help` | Show available commands |
|
||||
| `/tools` | List available tools by toolset |
|
||||
| `/toolsets` | List available toolsets |
|
||||
| `/model [name]` | Show or change the current model |
|
||||
| `/prompt [text]` | View/set custom system prompt |
|
||||
| `/personality [name]` | Set a predefined personality |
|
||||
| `/clear` | Clear screen and reset conversation |
|
||||
| `/reset` | Reset conversation only |
|
||||
| `/history` | Show conversation history |
|
||||
| `/save` | Save current conversation to file |
|
||||
| `/config` | Show current configuration |
|
||||
| `/cron` | Manage scheduled tasks (list, add, remove) |
|
||||
| `/platforms` | Show gateway/messaging platform status |
|
||||
| `/quit` | Exit the CLI |
|
||||
|
||||
### Configuration
|
||||
|
||||
Copy `cli-config.yaml.example` to `cli-config.yaml` and customize:
|
||||
|
||||
```yaml
|
||||
# Model settings
|
||||
model:
|
||||
default: "anthropic/claude-sonnet-4"
|
||||
|
||||
# Terminal backend (local, docker, singularity, modal, or ssh)
|
||||
terminal:
|
||||
env_type: "local"
|
||||
cwd: "." # Use current directory
|
||||
|
||||
# Or use SSH for remote execution (keeps agent code isolated)
|
||||
# terminal:
|
||||
# env_type: "ssh"
|
||||
# ssh_host: "my-server.example.com"
|
||||
# ssh_user: "myuser"
|
||||
# ssh_key: "~/.ssh/id_rsa"
|
||||
# cwd: "/home/myuser/project"
|
||||
|
||||
# Enable specific toolsets
|
||||
toolsets:
|
||||
- all # or: web, terminal, browser, vision, etc.
|
||||
|
||||
# Custom personalities (use with /personality command)
|
||||
agent:
|
||||
personalities:
|
||||
helpful: "You are a helpful assistant."
|
||||
kawaii: "You are a kawaii assistant! Use cute expressions..."
|
||||
```
|
||||
|
||||
### Personalities
|
||||
|
||||
Built-in personalities available via `/personality`:
|
||||
- `helpful`, `concise`, `technical`, `creative`, `teacher`
|
||||
- `kawaii`, `catgirl`, `pirate`, `shakespeare`, `surfer`
|
||||
- `noir`, `uwu`, `philosopher`, `hype`
|
||||
|
||||
## Toolsets System
|
||||
|
||||
The agent uses a toolsets system for organizing and managing tools. All tools must be part of a toolset to be accessible - individual tool selection is not supported. This ensures consistent and logical grouping of capabilities.
|
||||
|
||||
### Key Concepts
|
||||
|
||||
- **Toolsets**: Logical groups of tools for specific use cases (e.g., "research", "development", "debugging")
|
||||
- **Composition**: Toolsets can include other toolsets for powerful combinations
|
||||
- **Custom Toolsets**: Create your own toolsets at runtime or by editing `toolsets.py`
|
||||
- **Toolset-Only Access**: Tools are only accessible through toolsets, not individually
|
||||
|
||||
### Available Toolsets
|
||||
|
||||
See `toolsets.py` for the complete list of predefined toolsets including:
|
||||
- Basic toolsets (web, terminal, vision, creative, reasoning)
|
||||
- Composite toolsets (research, development, analysis, etc.)
|
||||
- Scenario-specific toolsets (debugging, documentation, API testing, etc.)
|
||||
- Special toolsets (safe mode without terminal, minimal, offline)
|
||||
|
||||
### Using Toolsets
|
||||
|
||||
```bash
|
||||
# Use a predefined toolset
|
||||
python run_agent.py --enabled_toolsets=research --query "Find latest AI papers"
|
||||
|
||||
# Combine multiple toolsets
|
||||
python run_agent.py --enabled_toolsets=web,vision --query "Analyze this website"
|
||||
|
||||
# Enable all toolsets explicitly (same as omitting the flag)
|
||||
python run_agent.py --enabled_toolsets=all --query "Do web research and run commands if helpful"
|
||||
|
||||
# Safe mode (no terminal access)
|
||||
python run_agent.py --enabled_toolsets=safe --query "Help without running commands"
|
||||
|
||||
# List all available toolsets and tools
|
||||
python run_agent.py --list_tools
|
||||
```
|
||||
|
||||
See `toolsets.py` for the complete list of available toolsets and how to create custom ones.
|
||||
|
||||
## Basic Usage
|
||||
|
||||
### Default (all tools enabled)
|
||||
```bash
|
||||
# Uses OpenRouter by default - just set OPENROUTER_API_KEY in .env
|
||||
python run_agent.py \
|
||||
--query "search up the latest docs on jit in python 3.13 and write me basic example that's not in their docs. profile its perf" \
|
||||
--max_turns 20 \
|
||||
--model anthropic/claude-sonnet-4-20250514
|
||||
```
|
||||
|
||||
### With specific toolset
|
||||
```bash
|
||||
python run_agent.py \
|
||||
--query "Debug this Python error" \
|
||||
--enabled_toolsets=debugging \
|
||||
--model anthropic/claude-sonnet-4-20250514
|
||||
```
|
||||
|
||||
### Python API
|
||||
```python
|
||||
from run_agent import AIAgent
|
||||
|
||||
# Uses OpenRouter by default (reads OPENROUTER_API_KEY from .env)
|
||||
agent = AIAgent(
|
||||
model="anthropic/claude-sonnet-4-20250514",
|
||||
enabled_toolsets=["research"]
|
||||
)
|
||||
response = agent.chat("Find information about quantum computing")
|
||||
|
||||
# Create custom toolset at runtime
|
||||
from toolsets import create_custom_toolset
|
||||
|
||||
create_custom_toolset(
|
||||
name="my_tools",
|
||||
description="My custom toolkit",
|
||||
tools=["web_search"],
|
||||
includes=["terminal", "vision"]
|
||||
)
|
||||
|
||||
agent = AIAgent(enabled_toolsets=["my_tools"])
|
||||
```
|
||||
|
||||
## Batch Processing
|
||||
|
||||
Process multiple prompts from a dataset in parallel with automatic checkpointing and statistics tracking:
|
||||
|
||||
```bash
|
||||
# Basic batch processing
|
||||
python batch_runner.py \
|
||||
--dataset_file=prompts.jsonl \
|
||||
--batch_size=20 \
|
||||
--run_name=my_run
|
||||
|
||||
# With specific distribution
|
||||
python batch_runner.py \
|
||||
--dataset_file=prompts.jsonl \
|
||||
--batch_size=20 \
|
||||
--run_name=image_run \
|
||||
--distribution=image_gen \
|
||||
--num_workers=4
|
||||
```
|
||||
|
||||
**Key Features:**
|
||||
- Parallel processing with configurable workers
|
||||
- Toolset distributions for varied data generation
|
||||
- Automatic checkpointing and resume capability
|
||||
- Combined output in `data/<run_name>/trajectories.jsonl`
|
||||
- Tool usage statistics and success rates
|
||||
|
||||
Use `--list_distributions` to see available toolset distributions for varied data generation.
|
||||
|
||||
### Trajectory Compression
|
||||
|
||||
Post-process trajectories to fit within token budgets for training:
|
||||
|
||||
```bash
|
||||
# Compress a directory of JSONL files
|
||||
python trajectory_compressor.py --input=data/my_run
|
||||
|
||||
# Compress a single JSONL file
|
||||
python trajectory_compressor.py --input=data/trajectories.jsonl
|
||||
|
||||
# Compress a 15% sample (useful for creating smaller training sets)
|
||||
python trajectory_compressor.py --input=data/trajectories.jsonl --sample_percent=15
|
||||
|
||||
# Custom output and token target
|
||||
python trajectory_compressor.py \
|
||||
--input=data/trajectories.jsonl \
|
||||
--output=data/compressed.jsonl \
|
||||
--target_max_tokens=16000
|
||||
```
|
||||
|
||||
**Features:**
|
||||
- Protects first turns (system, human, first GPT response, first tool call)
|
||||
- Protects last N turns (configurable)
|
||||
- Summarizes middle turns using LLM to fit target token budget
|
||||
- Supports both directory and single file input
|
||||
- Optional random sampling with `--sample_percent`
|
||||
- Configurable via `configs/trajectory_compression.yaml`
|
||||
|
||||
### Ephemeral System Prompts
|
||||
|
||||
The ephemeral system prompt feature allows you to guide the model's behavior during batch processing **without** saving that prompt to the training dataset trajectories. This is useful for:
|
||||
|
||||
- Guiding model behavior during data collection
|
||||
- Adding task-specific instructions
|
||||
- Keeping saved trajectories clean and focused on tool-calling format
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
python batch_runner.py \
|
||||
--dataset_file=prompts.jsonl \
|
||||
--batch_size=10 \
|
||||
--run_name=my_run \
|
||||
--ephemeral_system_prompt="You are a helpful assistant focused on image generation."
|
||||
```
|
||||
|
||||
The ephemeral prompt will influence the model's behavior during execution, but **only the standard tool-calling system prompt** will be saved in the trajectory files.
|
||||
|
||||
The ephemeral prompt influences model behavior during execution, but **only the standard tool-calling system prompt** is saved in trajectory files.
|
||||
|
||||
## Command Line Arguments
|
||||
|
||||
**Single Agent (`run_agent.py`):**
|
||||
- `--query`: The question or task for the agent
|
||||
- `--model`: Model to use (default: claude-opus-4-20250514)
|
||||
- `--api_key`: API key for authentication
|
||||
- `--base_url`: API endpoint URL
|
||||
- `--max_turns`: Maximum number of tool-calling iterations
|
||||
- `--enabled_toolsets`: Comma-separated list of toolsets to enable. Use `all` (or `*`) to enable everything. If omitted, all toolsets are enabled by default.
|
||||
- `--disabled_toolsets`: Comma-separated list of toolsets to disable
|
||||
- `--list_tools`: List all available toolsets and tools
|
||||
- `--save_trajectories`: Save conversation trajectories to JSONL files
|
||||
|
||||
**Batch Processing (`batch_runner.py`):**
|
||||
- `--dataset_file`: Path to JSONL file with prompts
|
||||
- `--batch_size`: Number of prompts per batch
|
||||
- `--run_name`: Name for this run (for output/checkpointing)
|
||||
- `--distribution`: Toolset distribution to use (default: "default")
|
||||
- `--num_workers`: Number of parallel workers (default: 4)
|
||||
- `--resume`: Resume from checkpoint if interrupted
|
||||
- `--ephemeral_system_prompt`: System prompt used during execution but NOT saved to trajectories
|
||||
- `--list_distributions`: List available toolset distributions
|
||||
|
||||
## Environment Variables
|
||||
|
||||
All environment variables can be configured in the `.env` file (copy from `.env.example`).
|
||||
|
||||
**LLM Provider (OpenRouter):**
|
||||
- `OPENROUTER_API_KEY`: Primary LLM access via OpenRouter (supports Claude, GPT-4, Gemini, etc.)
|
||||
- `LLM_MODEL`: Default model (e.g., `anthropic/claude-sonnet-4`, `openai/gpt-4o`)
|
||||
|
||||
**Tool API Keys:**
|
||||
- `FIRECRAWL_API_KEY`: Web tools (search, extract, crawl)
|
||||
- `NOUS_API_KEY`: Vision and reasoning tools
|
||||
- `FAL_KEY`: Image generation tools
|
||||
|
||||
**Terminal Tool Configuration (mini-swe-agent backend):**
|
||||
- `TERMINAL_ENV`: Backend type - `local`, `docker`, `singularity`, `modal`, or `ssh` (default: `local`)
|
||||
- `TERMINAL_DOCKER_IMAGE`: Docker image for docker backend (default: `python:3.11-slim`)
|
||||
- `TERMINAL_SINGULARITY_IMAGE`: Singularity/Apptainer image (can be `docker://...` URL or local `.sif` path)
|
||||
- `TERMINAL_TIMEOUT`: Command timeout in seconds (default: `60`)
|
||||
- `TERMINAL_LIFETIME_SECONDS`: Cleanup inactive environments after this time (default: `300`)
|
||||
- `TERMINAL_CWD`: Working directory inside containers (default: `/tmp`)
|
||||
- `TERMINAL_SCRATCH_DIR`: Custom scratch directory for sandbox storage (optional, auto-detects `/scratch`)
|
||||
- `SUDO_PASSWORD`: Enable sudo commands by piping password via `sudo -S` (works with all backends)
|
||||
- If unset in CLI mode, you'll be prompted interactively when sudo is needed (45s timeout)
|
||||
|
||||
**SSH Backend Configuration (for remote execution):**
|
||||
- `TERMINAL_SSH_HOST`: Remote server hostname or IP
|
||||
- `TERMINAL_SSH_USER`: SSH username
|
||||
- `TERMINAL_SSH_PORT`: SSH port (default: `22`)
|
||||
- `TERMINAL_SSH_KEY`: Path to SSH private key (optional, uses ssh-agent if not set)
|
||||
|
||||
**Context Compression (auto-shrinks long conversations):**
|
||||
- `CONTEXT_COMPRESSION_ENABLED`: Enable auto-compression (default: `true`)
|
||||
- `CONTEXT_COMPRESSION_THRESHOLD`: Compress at this % of context limit (default: `0.85`)
|
||||
- `CONTEXT_COMPRESSION_MODEL`: Model for generating summaries (default: `google/gemini-2.0-flash-001`)
|
||||
|
||||
**Browser Tool Configuration (agent-browser + Browserbase):**
|
||||
- `BROWSERBASE_API_KEY`: Browserbase API key for cloud browser execution
|
||||
- `BROWSERBASE_PROJECT_ID`: Browserbase project ID
|
||||
- `BROWSER_SESSION_TIMEOUT`: Session timeout in seconds (default: `300`)
|
||||
|
||||
**Legacy Hecate Terminal Backend (optional):**
|
||||
- `MORPH_API_KEY`: For Hecate/MorphCloud terminal backend
|
||||
- `HECATE_VM_LIFETIME_SECONDS`: VM lifetime (default: 300)
|
||||
- `HECATE_DEFAULT_SNAPSHOT_ID`: Default snapshot (default: snapshot_p5294qxt)
|
||||
|
||||
**Debug Options:**
|
||||
- `WEB_TOOLS_DEBUG`, `VISION_TOOLS_DEBUG`, `MOA_TOOLS_DEBUG`, `IMAGE_TOOLS_DEBUG`: Enable debug logging
|
||||
|
||||
## Key Files
|
||||
|
||||
| File | Purpose |
|
||||
|------|---------|
|
||||
| `hermes` | CLI launcher script (run with `./hermes`) |
|
||||
| `cli.py` | Interactive CLI implementation |
|
||||
| `cli-config.yaml` | CLI configuration (copy from `.example`) |
|
||||
| `run_agent.py` | Main agent runner - single query execution |
|
||||
| `batch_runner.py` | Parallel batch processing with checkpointing |
|
||||
| `model_tools.py` | Core tool definitions and handlers |
|
||||
| `toolsets.py` | Toolset definitions and composition |
|
||||
| `toolset_distributions.py` | Probability distributions for data generation |
|
||||
| `trajectory_compressor.py` | Post-process trajectories for training |
|
||||
| `tools/` | Individual tool implementations |
|
||||
| `tools/skills_tool.py` | Skills system with progressive disclosure |
|
||||
| `skills/` | On-demand knowledge documents |
|
||||
| `docs/` | Documentation |
|
||||
| `configs/` | Example batch run scripts |
|
||||
@@ -1,47 +0,0 @@
|
||||
README.md
|
||||
batch_runner.py
|
||||
cli.py
|
||||
model_tools.py
|
||||
pyproject.toml
|
||||
run_agent.py
|
||||
toolset_distributions.py
|
||||
toolsets.py
|
||||
trajectory_compressor.py
|
||||
cron/__init__.py
|
||||
cron/jobs.py
|
||||
cron/scheduler.py
|
||||
gateway/__init__.py
|
||||
gateway/config.py
|
||||
gateway/delivery.py
|
||||
gateway/run.py
|
||||
gateway/session.py
|
||||
hermes_agent.egg-info/PKG-INFO
|
||||
hermes_agent.egg-info/SOURCES.txt
|
||||
hermes_agent.egg-info/dependency_links.txt
|
||||
hermes_agent.egg-info/entry_points.txt
|
||||
hermes_agent.egg-info/requires.txt
|
||||
hermes_agent.egg-info/top_level.txt
|
||||
hermes_cli/__init__.py
|
||||
hermes_cli/cron.py
|
||||
hermes_cli/doctor.py
|
||||
hermes_cli/gateway.py
|
||||
hermes_cli/main.py
|
||||
hermes_cli/setup.py
|
||||
hermes_cli/status.py
|
||||
tests/test_batch_runner.py
|
||||
tests/test_checkpoint_resumption.py
|
||||
tests/test_modal_terminal.py
|
||||
tests/test_nous_api_limits.py
|
||||
tests/test_nous_api_pattern.py
|
||||
tests/test_temperature_fix.py
|
||||
tests/test_web_tools.py
|
||||
tools/__init__.py
|
||||
tools/browser_tool.py
|
||||
tools/cronjob_tools.py
|
||||
tools/image_generation_tool.py
|
||||
tools/mixture_of_agents_tool.py
|
||||
tools/skills_tool.py
|
||||
tools/terminal_hecate.py
|
||||
tools/terminal_tool.py
|
||||
tools/vision_tools.py
|
||||
tools/web_tools.py
|
||||
@@ -1 +0,0 @@
|
||||
|
||||
@@ -1,3 +0,0 @@
|
||||
[console_scripts]
|
||||
hermes = hermes_cli.main:main
|
||||
hermes-agent = run_agent:main
|
||||
@@ -1,35 +0,0 @@
|
||||
openai
|
||||
python-dotenv
|
||||
fire
|
||||
httpx
|
||||
rich
|
||||
tenacity
|
||||
pyyaml
|
||||
requests
|
||||
jinja2
|
||||
pydantic>=2.0
|
||||
firecrawl-py
|
||||
fal-client
|
||||
litellm>=1.75.5
|
||||
typer
|
||||
platformdirs
|
||||
|
||||
[all]
|
||||
croniter
|
||||
python-telegram-bot>=20.0
|
||||
discord.py>=2.0
|
||||
|
||||
[cron]
|
||||
croniter
|
||||
|
||||
[dev]
|
||||
pytest
|
||||
pytest-asyncio
|
||||
|
||||
[messaging]
|
||||
python-telegram-bot>=20.0
|
||||
discord.py>=2.0
|
||||
|
||||
[modal]
|
||||
modal
|
||||
boto3
|
||||
@@ -1,11 +0,0 @@
|
||||
batch_runner
|
||||
cli
|
||||
cron
|
||||
gateway
|
||||
hermes_cli
|
||||
model_tools
|
||||
run_agent
|
||||
tools
|
||||
toolset_distributions
|
||||
toolsets
|
||||
trajectory_compressor
|
||||
@@ -58,8 +58,11 @@ def run_doctor(args):
|
||||
print(color("◆ Python Environment", Colors.CYAN, Colors.BOLD))
|
||||
|
||||
py_version = sys.version_info
|
||||
if py_version >= (3, 10):
|
||||
if py_version >= (3, 11):
|
||||
check_ok(f"Python {py_version.major}.{py_version.minor}.{py_version.micro}")
|
||||
elif py_version >= (3, 10):
|
||||
check_ok(f"Python {py_version.major}.{py_version.minor}.{py_version.micro}")
|
||||
check_warn("Python 3.11+ recommended for RL Training tools (tinker requires >= 3.11)")
|
||||
elif py_version >= (3, 8):
|
||||
check_warn(f"Python {py_version.major}.{py_version.minor}.{py_version.micro}", "(3.10+ recommended)")
|
||||
else:
|
||||
@@ -100,7 +103,7 @@ def run_doctor(args):
|
||||
check_ok(name)
|
||||
except ImportError:
|
||||
check_fail(name, "(missing)")
|
||||
issues.append(f"Install {name}: pip install {module}")
|
||||
issues.append(f"Install {name}: uv pip install {module}")
|
||||
|
||||
for module, name in optional_packages:
|
||||
try:
|
||||
@@ -263,6 +266,39 @@ def run_doctor(args):
|
||||
except Exception as e:
|
||||
check_warn("Anthropic API", f"({e})")
|
||||
|
||||
# =========================================================================
|
||||
# Check: Submodules
|
||||
# =========================================================================
|
||||
print()
|
||||
print(color("◆ Submodules", Colors.CYAN, Colors.BOLD))
|
||||
|
||||
# mini-swe-agent (terminal tool backend)
|
||||
mini_swe_dir = PROJECT_ROOT / "mini-swe-agent"
|
||||
if mini_swe_dir.exists() and (mini_swe_dir / "pyproject.toml").exists():
|
||||
try:
|
||||
__import__("minisweagent")
|
||||
check_ok("mini-swe-agent", "(terminal backend)")
|
||||
except ImportError:
|
||||
check_warn("mini-swe-agent found but not installed", "(run: uv pip install -e ./mini-swe-agent)")
|
||||
issues.append("Install mini-swe-agent: uv pip install -e ./mini-swe-agent")
|
||||
else:
|
||||
check_warn("mini-swe-agent not found", "(run: git submodule update --init --recursive)")
|
||||
|
||||
# tinker-atropos (RL training backend)
|
||||
tinker_dir = PROJECT_ROOT / "tinker-atropos"
|
||||
if tinker_dir.exists() and (tinker_dir / "pyproject.toml").exists():
|
||||
if py_version >= (3, 11):
|
||||
try:
|
||||
__import__("tinker_atropos")
|
||||
check_ok("tinker-atropos", "(RL training backend)")
|
||||
except ImportError:
|
||||
check_warn("tinker-atropos found but not installed", "(run: uv pip install -e ./tinker-atropos)")
|
||||
issues.append("Install tinker-atropos: uv pip install -e ./tinker-atropos")
|
||||
else:
|
||||
check_warn("tinker-atropos requires Python 3.11+", f"(current: {py_version.major}.{py_version.minor})")
|
||||
else:
|
||||
check_warn("tinker-atropos not found", "(run: git submodule update --init --recursive)")
|
||||
|
||||
# =========================================================================
|
||||
# Check: Tool Availability
|
||||
# =========================================================================
|
||||
|
||||
@@ -119,6 +119,7 @@ def cmd_uninstall(args):
|
||||
def cmd_update(args):
|
||||
"""Update Hermes Agent to the latest version."""
|
||||
import subprocess
|
||||
import shutil
|
||||
|
||||
print("🦋 Updating Hermes Agent...")
|
||||
print()
|
||||
@@ -163,13 +164,21 @@ def cmd_update(args):
|
||||
print("→ Pulling updates...")
|
||||
subprocess.run(["git", "pull", "origin", branch], cwd=PROJECT_ROOT, check=True)
|
||||
|
||||
# Reinstall Python dependencies
|
||||
# Reinstall Python dependencies (prefer uv for speed, fall back to pip)
|
||||
print("→ Updating Python dependencies...")
|
||||
venv_pip = PROJECT_ROOT / "venv" / "bin" / "pip"
|
||||
if venv_pip.exists():
|
||||
subprocess.run([str(venv_pip), "install", "-e", ".", "--quiet"], cwd=PROJECT_ROOT, check=True)
|
||||
uv_bin = shutil.which("uv")
|
||||
if uv_bin:
|
||||
subprocess.run(
|
||||
[uv_bin, "pip", "install", "-e", ".", "--quiet"],
|
||||
cwd=PROJECT_ROOT, check=True,
|
||||
env={**os.environ, "VIRTUAL_ENV": str(PROJECT_ROOT / "venv")}
|
||||
)
|
||||
else:
|
||||
subprocess.run(["pip", "install", "-e", ".", "--quiet"], cwd=PROJECT_ROOT, check=True)
|
||||
venv_pip = PROJECT_ROOT / "venv" / "bin" / "pip"
|
||||
if venv_pip.exists():
|
||||
subprocess.run([str(venv_pip), "install", "-e", ".", "--quiet"], cwd=PROJECT_ROOT, check=True)
|
||||
else:
|
||||
subprocess.run(["pip", "install", "-e", ".", "--quiet"], cwd=PROJECT_ROOT, check=True)
|
||||
|
||||
# Check for Node.js deps
|
||||
if (PROJECT_ROOT / "package.json").exists():
|
||||
|
||||
@@ -652,6 +652,32 @@ def run_setup_wizard(args):
|
||||
print_info("Modal Cloud Configuration:")
|
||||
print_info("Get credentials at: https://modal.com/settings")
|
||||
|
||||
# Check if swe-rex[modal] is installed, install if missing
|
||||
try:
|
||||
from swerex.deployment.modal import ModalDeployment
|
||||
print_info("swe-rex[modal] package: installed ✓")
|
||||
except ImportError:
|
||||
print_info("Installing required package: swe-rex[modal]...")
|
||||
import subprocess
|
||||
import shutil
|
||||
# Prefer uv for speed, fall back to pip
|
||||
uv_bin = shutil.which("uv")
|
||||
if uv_bin:
|
||||
result = subprocess.run(
|
||||
[uv_bin, "pip", "install", "swe-rex[modal]>=1.4.0"],
|
||||
capture_output=True, text=True
|
||||
)
|
||||
else:
|
||||
result = subprocess.run(
|
||||
[sys.executable, "-m", "pip", "install", "swe-rex[modal]>=1.4.0"],
|
||||
capture_output=True, text=True
|
||||
)
|
||||
if result.returncode == 0:
|
||||
print_success("swe-rex[modal] installed (includes modal + boto3)")
|
||||
else:
|
||||
print_warning("Failed to install swe-rex[modal] — install manually:")
|
||||
print_info(' uv pip install "swe-rex[modal]>=1.4.0"')
|
||||
|
||||
# Always show current status and allow reconfiguration
|
||||
current_token = get_env_value('MODAL_TOKEN_ID')
|
||||
if current_token:
|
||||
@@ -917,6 +943,24 @@ def run_setup_wizard(args):
|
||||
save_env_value("BROWSERBASE_API_KEY", api_key)
|
||||
if project_id:
|
||||
save_env_value("BROWSERBASE_PROJECT_ID", project_id)
|
||||
|
||||
# Check if Node.js dependencies are installed (required for browser tools)
|
||||
import shutil
|
||||
node_modules = PROJECT_ROOT / "node_modules" / "agent-browser"
|
||||
if not node_modules.exists() and shutil.which("npm"):
|
||||
print_info(" Installing Node.js dependencies for browser tools...")
|
||||
import subprocess
|
||||
result = subprocess.run(
|
||||
["npm", "install", "--silent"],
|
||||
capture_output=True, text=True, cwd=str(PROJECT_ROOT)
|
||||
)
|
||||
if result.returncode == 0:
|
||||
print_success(" Node.js dependencies installed")
|
||||
else:
|
||||
print_warning(" npm install failed — run manually: cd ~/.hermes/hermes-agent && npm install")
|
||||
elif not node_modules.exists():
|
||||
print_warning(" Node.js not found — browser tools require: npm install (in the hermes-agent directory)")
|
||||
|
||||
print_success(" Configured ✓")
|
||||
print()
|
||||
|
||||
@@ -950,6 +994,11 @@ def run_setup_wizard(args):
|
||||
tinker_configured = get_env_value('TINKER_API_KEY')
|
||||
wandb_configured = get_env_value('WANDB_API_KEY')
|
||||
|
||||
# Check Python version requirement upfront
|
||||
rl_python_ok = sys.version_info >= (3, 11)
|
||||
if not rl_python_ok:
|
||||
print_warning(f" Requires Python 3.11+ (current: {sys.version_info.major}.{sys.version_info.minor})")
|
||||
|
||||
if tinker_configured and wandb_configured:
|
||||
print_success(" Status: Configured ✓")
|
||||
if prompt_yes_no(" Update RL training credentials?", False):
|
||||
@@ -969,18 +1018,55 @@ def run_setup_wizard(args):
|
||||
print_warning(" Status: Not configured (tools will be disabled)")
|
||||
|
||||
if prompt_yes_no(" Set up RL Training?", False):
|
||||
print_info(" Get Tinker key at: https://tinker-console.thinkingmachines.ai/keys")
|
||||
print_info(" Get WandB key at: https://wandb.ai/authorize")
|
||||
api_key = prompt(" Tinker API key", password=True)
|
||||
if api_key:
|
||||
save_env_value("TINKER_API_KEY", api_key)
|
||||
wandb_key = prompt(" WandB API key", password=True)
|
||||
if wandb_key:
|
||||
save_env_value("WANDB_API_KEY", wandb_key)
|
||||
if api_key and wandb_key:
|
||||
print_success(" Configured ✓")
|
||||
# Check Python version before proceeding
|
||||
if not rl_python_ok:
|
||||
print_error(f" Python 3.11+ required (current: {sys.version_info.major}.{sys.version_info.minor})")
|
||||
print_info(" Upgrade Python and reinstall to enable RL training tools")
|
||||
else:
|
||||
print_warning(" Partially configured (both keys required)")
|
||||
print_info(" Get Tinker key at: https://tinker-console.thinkingmachines.ai/keys")
|
||||
print_info(" Get WandB key at: https://wandb.ai/authorize")
|
||||
api_key = prompt(" Tinker API key", password=True)
|
||||
if api_key:
|
||||
save_env_value("TINKER_API_KEY", api_key)
|
||||
wandb_key = prompt(" WandB API key", password=True)
|
||||
if wandb_key:
|
||||
save_env_value("WANDB_API_KEY", wandb_key)
|
||||
|
||||
# Check if tinker-atropos submodule is installed
|
||||
try:
|
||||
__import__("tinker_atropos")
|
||||
except ImportError:
|
||||
tinker_dir = PROJECT_ROOT / "tinker-atropos"
|
||||
if tinker_dir.exists() and (tinker_dir / "pyproject.toml").exists():
|
||||
print_info(" Installing tinker-atropos submodule...")
|
||||
import subprocess
|
||||
import shutil
|
||||
# Prefer uv for speed, fall back to pip
|
||||
uv_bin = shutil.which("uv")
|
||||
if uv_bin:
|
||||
result = subprocess.run(
|
||||
[uv_bin, "pip", "install", "-e", str(tinker_dir)],
|
||||
capture_output=True, text=True
|
||||
)
|
||||
else:
|
||||
result = subprocess.run(
|
||||
[sys.executable, "-m", "pip", "install", "-e", str(tinker_dir)],
|
||||
capture_output=True, text=True
|
||||
)
|
||||
if result.returncode == 0:
|
||||
print_success(" tinker-atropos installed")
|
||||
else:
|
||||
print_warning(" tinker-atropos install failed — run manually:")
|
||||
print_info(' uv pip install -e "./tinker-atropos"')
|
||||
else:
|
||||
print_warning(" tinker-atropos submodule not found — run:")
|
||||
print_info(" git submodule update --init --recursive")
|
||||
print_info(' uv pip install -e "./tinker-atropos"')
|
||||
|
||||
if api_key and wandb_key:
|
||||
print_success(" Configured ✓")
|
||||
else:
|
||||
print_warning(" Partially configured (both keys required)")
|
||||
|
||||
# =========================================================================
|
||||
# Save config and show summary
|
||||
|
||||
@@ -1191,8 +1191,19 @@ def handle_web_function_call(function_name: str, function_args: Dict[str, Any])
|
||||
urls = function_args.get("urls", [])
|
||||
# Limit URLs to prevent abuse
|
||||
urls = urls[:5] if isinstance(urls, list) else []
|
||||
# Run async function in event loop
|
||||
return asyncio.run(web_extract_tool(urls, "markdown"))
|
||||
# Run async function -- use existing loop if available (Atropos),
|
||||
# otherwise create one (normal CLI)
|
||||
try:
|
||||
loop = asyncio.get_running_loop()
|
||||
# Already in an async context (Atropos) -- run in a thread
|
||||
import concurrent.futures
|
||||
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
|
||||
return pool.submit(
|
||||
lambda: asyncio.run(web_extract_tool(urls, "markdown"))
|
||||
).result(timeout=120)
|
||||
except RuntimeError:
|
||||
# No running loop (normal CLI) -- use asyncio.run directly
|
||||
return asyncio.run(web_extract_tool(urls, "markdown"))
|
||||
|
||||
else:
|
||||
return json.dumps({"error": f"Unknown web function: {function_name}"}, ensure_ascii=False)
|
||||
|
||||
@@ -22,6 +22,8 @@ dependencies = [
|
||||
"requests",
|
||||
"jinja2",
|
||||
"pydantic>=2.0",
|
||||
# Interactive CLI (prompt_toolkit is used directly by cli.py)
|
||||
"prompt_toolkit",
|
||||
# Tools
|
||||
"firecrawl-py",
|
||||
"fal-client",
|
||||
@@ -32,12 +34,18 @@ dependencies = [
|
||||
]
|
||||
|
||||
[project.optional-dependencies]
|
||||
modal = ["modal", "boto3"]
|
||||
modal = ["swe-rex[modal]>=1.4.0"]
|
||||
dev = ["pytest", "pytest-asyncio"]
|
||||
messaging = ["python-telegram-bot>=20.0", "discord.py>=2.0"]
|
||||
messaging = ["python-telegram-bot>=20.0", "discord.py>=2.0", "aiohttp>=3.9.0"]
|
||||
cron = ["croniter"]
|
||||
cli = ["simple-term-menu"]
|
||||
all = ["croniter", "python-telegram-bot>=20.0", "discord.py>=2.0", "simple-term-menu"]
|
||||
all = [
|
||||
"hermes-agent[modal]",
|
||||
"hermes-agent[messaging]",
|
||||
"hermes-agent[cron]",
|
||||
"hermes-agent[cli]",
|
||||
"hermes-agent[dev]",
|
||||
]
|
||||
|
||||
[project.scripts]
|
||||
hermes = "hermes_cli.main:main"
|
||||
|
||||
@@ -6,6 +6,10 @@ httpx
|
||||
rich
|
||||
tenacity
|
||||
prompt_toolkit
|
||||
pyyaml
|
||||
requests
|
||||
jinja2
|
||||
pydantic>=2.0
|
||||
|
||||
# Web tools
|
||||
firecrawl-py
|
||||
@@ -15,10 +19,6 @@ fal-client
|
||||
|
||||
# mini-swe-agent dependencies (for terminal tool)
|
||||
# Note: Install mini-swe-agent itself with: pip install -e ./mini-swe-agent
|
||||
pyyaml
|
||||
requests
|
||||
jinja2
|
||||
pydantic>=2.0
|
||||
litellm>=1.75.5
|
||||
typer
|
||||
platformdirs
|
||||
@@ -27,18 +27,17 @@ platformdirs
|
||||
# Requires Docker installed and user in 'docker' group
|
||||
|
||||
# Optional: For Modal backend (cloud execution)
|
||||
# modal
|
||||
# boto3
|
||||
# swe-rex[modal]>=1.4.0 # Includes modal + boto3 + swe-rex runtime
|
||||
|
||||
# Optional: For cron expression parsing (cronjob scheduling)
|
||||
croniter
|
||||
|
||||
# Optional: For messaging platform integrations (gateway)
|
||||
# Telegram: pip install python-telegram-bot
|
||||
# Telegram
|
||||
python-telegram-bot>=20.0
|
||||
|
||||
# Discord: pip install discord.py
|
||||
# Discord
|
||||
discord.py>=2.0
|
||||
|
||||
# WhatsApp: Requires Node.js bridge (see docs/messaging.md)
|
||||
# aiohttp # For WhatsApp bridge communication
|
||||
# WhatsApp bridge communication + general async HTTP (used by gateway)
|
||||
aiohttp>=3.9.0
|
||||
|
||||
@@ -2,6 +2,7 @@
|
||||
# Hermes Agent Installer for Windows
|
||||
# ============================================================================
|
||||
# Installation script for Windows (PowerShell).
|
||||
# Uses uv for fast Python provisioning and package management.
|
||||
#
|
||||
# Usage:
|
||||
# irm https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.ps1 | iex
|
||||
@@ -27,6 +28,7 @@ $ErrorActionPreference = "Stop"
|
||||
|
||||
$RepoUrlSsh = "git@github.com:NousResearch/hermes-agent.git"
|
||||
$RepoUrlHttps = "https://github.com/NousResearch/hermes-agent.git"
|
||||
$PythonVersion = "3.11"
|
||||
|
||||
# ============================================================================
|
||||
# Helper functions
|
||||
@@ -52,12 +54,12 @@ function Write-Success {
|
||||
Write-Host "✓ $Message" -ForegroundColor Green
|
||||
}
|
||||
|
||||
function Write-Warning {
|
||||
function Write-Warn {
|
||||
param([string]$Message)
|
||||
Write-Host "⚠ $Message" -ForegroundColor Yellow
|
||||
}
|
||||
|
||||
function Write-Error {
|
||||
function Write-Err {
|
||||
param([string]$Message)
|
||||
Write-Host "✗ $Message" -ForegroundColor Red
|
||||
}
|
||||
@@ -66,33 +68,93 @@ function Write-Error {
|
||||
# Dependency checks
|
||||
# ============================================================================
|
||||
|
||||
function Test-Python {
|
||||
Write-Info "Checking Python..."
|
||||
function Install-Uv {
|
||||
Write-Info "Checking for uv package manager..."
|
||||
|
||||
# Try different python commands
|
||||
$pythonCmds = @("python3", "python", "py -3")
|
||||
# Check if uv is already available
|
||||
if (Get-Command uv -ErrorAction SilentlyContinue) {
|
||||
$version = uv --version
|
||||
$script:UvCmd = "uv"
|
||||
Write-Success "uv found ($version)"
|
||||
return $true
|
||||
}
|
||||
|
||||
foreach ($cmd in $pythonCmds) {
|
||||
try {
|
||||
$version = & $cmd.Split()[0] $cmd.Split()[1..99] -c "import sys; print(f'{sys.version_info.major}.{sys.version_info.minor}')" 2>$null
|
||||
if ($version) {
|
||||
$major, $minor = $version.Split('.')
|
||||
if ([int]$major -ge 3 -and [int]$minor -ge 10) {
|
||||
$script:PythonCmd = $cmd
|
||||
Write-Success "Python $version found"
|
||||
return $true
|
||||
}
|
||||
}
|
||||
} catch {
|
||||
# Try next command
|
||||
# Check common install locations
|
||||
$uvPaths = @(
|
||||
"$env:USERPROFILE\.local\bin\uv.exe",
|
||||
"$env:USERPROFILE\.cargo\bin\uv.exe"
|
||||
)
|
||||
foreach ($uvPath in $uvPaths) {
|
||||
if (Test-Path $uvPath) {
|
||||
$script:UvCmd = $uvPath
|
||||
$version = & $uvPath --version
|
||||
Write-Success "uv found at $uvPath ($version)"
|
||||
return $true
|
||||
}
|
||||
}
|
||||
|
||||
Write-Error "Python 3.10+ not found"
|
||||
Write-Info "Please install Python 3.10 or newer from:"
|
||||
Write-Info " https://www.python.org/downloads/"
|
||||
Write-Info ""
|
||||
Write-Info "Make sure to check 'Add Python to PATH' during installation"
|
||||
# Install uv
|
||||
Write-Info "Installing uv (fast Python package manager)..."
|
||||
try {
|
||||
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex" 2>&1 | Out-Null
|
||||
|
||||
# Find the installed binary
|
||||
$uvExe = "$env:USERPROFILE\.local\bin\uv.exe"
|
||||
if (-not (Test-Path $uvExe)) {
|
||||
$uvExe = "$env:USERPROFILE\.cargo\bin\uv.exe"
|
||||
}
|
||||
if (-not (Test-Path $uvExe)) {
|
||||
# Refresh PATH and try again
|
||||
$env:Path = [Environment]::GetEnvironmentVariable("Path", "User") + ";" + [Environment]::GetEnvironmentVariable("Path", "Machine")
|
||||
if (Get-Command uv -ErrorAction SilentlyContinue) {
|
||||
$uvExe = (Get-Command uv).Source
|
||||
}
|
||||
}
|
||||
|
||||
if (Test-Path $uvExe) {
|
||||
$script:UvCmd = $uvExe
|
||||
$version = & $uvExe --version
|
||||
Write-Success "uv installed ($version)"
|
||||
return $true
|
||||
}
|
||||
|
||||
Write-Err "uv installed but not found on PATH"
|
||||
Write-Info "Try restarting your terminal and re-running"
|
||||
return $false
|
||||
} catch {
|
||||
Write-Err "Failed to install uv"
|
||||
Write-Info "Install manually: https://docs.astral.sh/uv/getting-started/installation/"
|
||||
return $false
|
||||
}
|
||||
}
|
||||
|
||||
function Test-Python {
|
||||
Write-Info "Checking Python $PythonVersion..."
|
||||
|
||||
# Let uv find or install Python
|
||||
try {
|
||||
$pythonPath = & $UvCmd python find $PythonVersion 2>$null
|
||||
if ($pythonPath) {
|
||||
$ver = & $pythonPath --version 2>$null
|
||||
Write-Success "Python found: $ver"
|
||||
return $true
|
||||
}
|
||||
} catch { }
|
||||
|
||||
# Python not found — use uv to install it (no admin needed!)
|
||||
Write-Info "Python $PythonVersion not found, installing via uv..."
|
||||
try {
|
||||
& $UvCmd python install $PythonVersion 2>&1 | Out-Null
|
||||
$pythonPath = & $UvCmd python find $PythonVersion 2>$null
|
||||
if ($pythonPath) {
|
||||
$ver = & $pythonPath --version 2>$null
|
||||
Write-Success "Python installed: $ver"
|
||||
return $true
|
||||
}
|
||||
} catch { }
|
||||
|
||||
Write-Err "Failed to install Python $PythonVersion"
|
||||
Write-Info "Install Python $PythonVersion manually, then re-run this script"
|
||||
return $false
|
||||
}
|
||||
|
||||
@@ -105,7 +167,7 @@ function Test-Git {
|
||||
return $true
|
||||
}
|
||||
|
||||
Write-Error "Git not found"
|
||||
Write-Err "Git not found"
|
||||
Write-Info "Please install Git from:"
|
||||
Write-Info " https://git-scm.com/download/win"
|
||||
return $false
|
||||
@@ -121,7 +183,7 @@ function Test-Node {
|
||||
return $true
|
||||
}
|
||||
|
||||
Write-Warning "Node.js not found (browser tools will be limited)"
|
||||
Write-Warn "Node.js not found (browser tools will be limited)"
|
||||
Write-Info "To install Node.js (optional):"
|
||||
Write-Info " https://nodejs.org/en/download/"
|
||||
$script:HasNode = $false
|
||||
@@ -138,7 +200,7 @@ function Test-Ripgrep {
|
||||
return $true
|
||||
}
|
||||
|
||||
Write-Warning "ripgrep not found (file search will use findstr fallback)"
|
||||
Write-Warn "ripgrep not found (file search will use findstr fallback)"
|
||||
|
||||
# Check what package managers are available
|
||||
$hasWinget = Get-Command winget -ErrorAction SilentlyContinue
|
||||
@@ -185,7 +247,7 @@ function Test-Ripgrep {
|
||||
} catch { }
|
||||
}
|
||||
|
||||
Write-Warning "Auto-install failed. You can install manually:"
|
||||
Write-Warn "Auto-install failed. You can install manually:"
|
||||
} else {
|
||||
Write-Info "Skipping ripgrep installation. To install manually:"
|
||||
}
|
||||
@@ -216,13 +278,12 @@ function Install-Repository {
|
||||
git pull origin $Branch
|
||||
Pop-Location
|
||||
} else {
|
||||
Write-Error "Directory exists but is not a git repository: $InstallDir"
|
||||
Write-Err "Directory exists but is not a git repository: $InstallDir"
|
||||
Write-Info "Remove it or choose a different directory with -InstallDir"
|
||||
exit 1
|
||||
}
|
||||
} else {
|
||||
# Try SSH first (for private repo access), fall back to HTTPS
|
||||
# Use --recurse-submodules to also clone mini-swe-agent and tinker-atropos
|
||||
Write-Info "Trying SSH clone..."
|
||||
$sshResult = git clone --branch $Branch --recurse-submodules $RepoUrlSsh $InstallDir 2>&1
|
||||
|
||||
@@ -235,7 +296,7 @@ function Install-Repository {
|
||||
if ($LASTEXITCODE -eq 0) {
|
||||
Write-Success "Cloned via HTTPS"
|
||||
} else {
|
||||
Write-Error "Failed to clone repository"
|
||||
Write-Err "Failed to clone repository"
|
||||
Write-Info "For private repo access, ensure your SSH key is added to GitHub:"
|
||||
Write-Info " ssh-add ~/.ssh/id_rsa"
|
||||
Write-Info " ssh -T git@github.com # Test connection"
|
||||
@@ -244,7 +305,7 @@ function Install-Repository {
|
||||
}
|
||||
}
|
||||
|
||||
# Ensure submodules are initialized and updated (for existing installs or if --recurse failed)
|
||||
# Ensure submodules are initialized and updated
|
||||
Write-Info "Initializing submodules (mini-swe-agent, tinker-atropos)..."
|
||||
Push-Location $InstallDir
|
||||
git submodule update --init --recursive
|
||||
@@ -260,23 +321,21 @@ function Install-Venv {
|
||||
return
|
||||
}
|
||||
|
||||
Write-Info "Creating virtual environment..."
|
||||
Write-Info "Creating virtual environment with Python $PythonVersion..."
|
||||
|
||||
Push-Location $InstallDir
|
||||
|
||||
if (-not (Test-Path "venv")) {
|
||||
& $PythonCmd -m venv venv
|
||||
if (Test-Path "venv") {
|
||||
Write-Info "Virtual environment already exists, recreating..."
|
||||
Remove-Item -Recurse -Force "venv"
|
||||
}
|
||||
|
||||
# Activate
|
||||
& .\venv\Scripts\Activate.ps1
|
||||
|
||||
# Upgrade pip
|
||||
pip install --upgrade pip wheel setuptools | Out-Null
|
||||
# uv creates the venv and pins the Python version in one step
|
||||
& $UvCmd venv venv --python $PythonVersion
|
||||
|
||||
Pop-Location
|
||||
|
||||
Write-Success "Virtual environment ready"
|
||||
Write-Success "Virtual environment ready (Python $PythonVersion)"
|
||||
}
|
||||
|
||||
function Install-Dependencies {
|
||||
@@ -285,14 +344,15 @@ function Install-Dependencies {
|
||||
Push-Location $InstallDir
|
||||
|
||||
if (-not $NoVenv) {
|
||||
& .\venv\Scripts\Activate.ps1
|
||||
# Tell uv to install into our venv (no activation needed)
|
||||
$env:VIRTUAL_ENV = "$InstallDir\venv"
|
||||
}
|
||||
|
||||
# Install main package
|
||||
# Install main package with all extras
|
||||
try {
|
||||
pip install -e ".[all]" 2>&1 | Out-Null
|
||||
& $UvCmd pip install -e ".[all]" 2>&1 | Out-Null
|
||||
} catch {
|
||||
pip install -e "." | Out-Null
|
||||
& $UvCmd pip install -e "." | Out-Null
|
||||
}
|
||||
|
||||
Write-Success "Main package installed"
|
||||
@@ -301,25 +361,25 @@ function Install-Dependencies {
|
||||
Write-Info "Installing mini-swe-agent (terminal tool backend)..."
|
||||
if (Test-Path "mini-swe-agent\pyproject.toml") {
|
||||
try {
|
||||
pip install -e ".\mini-swe-agent" 2>&1 | Out-Null
|
||||
& $UvCmd pip install -e ".\mini-swe-agent" 2>&1 | Out-Null
|
||||
Write-Success "mini-swe-agent installed"
|
||||
} catch {
|
||||
Write-Warning "mini-swe-agent install failed (terminal tools may not work)"
|
||||
Write-Warn "mini-swe-agent install failed (terminal tools may not work)"
|
||||
}
|
||||
} else {
|
||||
Write-Warning "mini-swe-agent not found (run: git submodule update --init)"
|
||||
Write-Warn "mini-swe-agent not found (run: git submodule update --init)"
|
||||
}
|
||||
|
||||
Write-Info "Installing tinker-atropos (RL training backend)..."
|
||||
if (Test-Path "tinker-atropos\pyproject.toml") {
|
||||
try {
|
||||
pip install -e ".\tinker-atropos" 2>&1 | Out-Null
|
||||
& $UvCmd pip install -e ".\tinker-atropos" 2>&1 | Out-Null
|
||||
Write-Success "tinker-atropos installed"
|
||||
} catch {
|
||||
Write-Warning "tinker-atropos install failed (RL tools may not work)"
|
||||
Write-Warn "tinker-atropos install failed (RL tools may not work)"
|
||||
}
|
||||
} else {
|
||||
Write-Warning "tinker-atropos not found (run: git submodule update --init)"
|
||||
Write-Warn "tinker-atropos not found (run: git submodule update --init)"
|
||||
}
|
||||
|
||||
Pop-Location
|
||||
@@ -328,41 +388,44 @@ function Install-Dependencies {
|
||||
}
|
||||
|
||||
function Set-PathVariable {
|
||||
Write-Info "Setting up PATH..."
|
||||
Write-Info "Setting up hermes command..."
|
||||
|
||||
if ($NoVenv) {
|
||||
$binDir = "$InstallDir"
|
||||
$hermesBin = "$InstallDir"
|
||||
} else {
|
||||
$binDir = "$InstallDir\venv\Scripts"
|
||||
$hermesBin = "$InstallDir\venv\Scripts"
|
||||
}
|
||||
|
||||
# Add to user PATH
|
||||
# Add the venv Scripts dir to user PATH so hermes is globally available
|
||||
# On Windows, the hermes.exe in venv\Scripts\ has the venv Python baked in
|
||||
$currentPath = [Environment]::GetEnvironmentVariable("Path", "User")
|
||||
|
||||
if ($currentPath -notlike "*$binDir*") {
|
||||
if ($currentPath -notlike "*$hermesBin*") {
|
||||
[Environment]::SetEnvironmentVariable(
|
||||
"Path",
|
||||
"$binDir;$currentPath",
|
||||
"$hermesBin;$currentPath",
|
||||
"User"
|
||||
)
|
||||
Write-Success "Added to user PATH"
|
||||
Write-Success "Added to user PATH: $hermesBin"
|
||||
} else {
|
||||
Write-Info "PATH already configured"
|
||||
}
|
||||
|
||||
# Update current session
|
||||
$env:Path = "$binDir;$env:Path"
|
||||
$env:Path = "$hermesBin;$env:Path"
|
||||
|
||||
Write-Success "hermes command ready"
|
||||
}
|
||||
|
||||
function Copy-ConfigTemplates {
|
||||
Write-Info "Setting up configuration files..."
|
||||
|
||||
# Create ~/.hermes directory structure (config at top level, code in subdir)
|
||||
# Create ~/.hermes directory structure
|
||||
New-Item -ItemType Directory -Force -Path "$HermesHome\cron" | Out-Null
|
||||
New-Item -ItemType Directory -Force -Path "$HermesHome\sessions" | Out-Null
|
||||
New-Item -ItemType Directory -Force -Path "$HermesHome\logs" | Out-Null
|
||||
|
||||
# Create .env at ~/.hermes/.env (top level, easy to find)
|
||||
# Create .env
|
||||
$envPath = "$HermesHome\.env"
|
||||
if (-not (Test-Path $envPath)) {
|
||||
$examplePath = "$InstallDir\.env.example"
|
||||
@@ -370,7 +433,6 @@ function Copy-ConfigTemplates {
|
||||
Copy-Item $examplePath $envPath
|
||||
Write-Success "Created ~/.hermes/.env from template"
|
||||
} else {
|
||||
# Create empty .env if no example exists
|
||||
New-Item -ItemType File -Force -Path $envPath | Out-Null
|
||||
Write-Success "Created ~/.hermes/.env"
|
||||
}
|
||||
@@ -378,7 +440,7 @@ function Copy-ConfigTemplates {
|
||||
Write-Info "~/.hermes/.env already exists, keeping it"
|
||||
}
|
||||
|
||||
# Create config.yaml at ~/.hermes/config.yaml (top level, easy to find)
|
||||
# Create config.yaml
|
||||
$configPath = "$HermesHome\config.yaml"
|
||||
if (-not (Test-Path $configPath)) {
|
||||
$examplePath = "$InstallDir\cli-config.yaml.example"
|
||||
@@ -407,7 +469,7 @@ function Install-NodeDeps {
|
||||
npm install --silent 2>&1 | Out-Null
|
||||
Write-Success "Node.js dependencies installed"
|
||||
} catch {
|
||||
Write-Warning "npm install failed (browser tools may not work)"
|
||||
Write-Warn "npm install failed (browser tools may not work)"
|
||||
}
|
||||
}
|
||||
|
||||
@@ -426,12 +488,13 @@ function Invoke-SetupWizard {
|
||||
|
||||
Push-Location $InstallDir
|
||||
|
||||
# Run hermes setup using the venv Python directly (no activation needed)
|
||||
if (-not $NoVenv) {
|
||||
& .\venv\Scripts\Activate.ps1
|
||||
& ".\venv\Scripts\python.exe" -m hermes_cli.main setup
|
||||
} else {
|
||||
python -m hermes_cli.main setup
|
||||
}
|
||||
|
||||
python -m hermes_cli.main setup
|
||||
|
||||
Pop-Location
|
||||
}
|
||||
|
||||
@@ -478,7 +541,6 @@ function Write-Completion {
|
||||
Write-Host "⚡ Restart your terminal for PATH changes to take effect" -ForegroundColor Yellow
|
||||
Write-Host ""
|
||||
|
||||
# Show notes about optional tools
|
||||
if (-not $HasNode) {
|
||||
Write-Host "Note: Node.js was not found. Browser automation tools" -ForegroundColor Yellow
|
||||
Write-Host "will have limited functionality." -ForegroundColor Yellow
|
||||
@@ -500,6 +562,7 @@ function Write-Completion {
|
||||
function Main {
|
||||
Write-Banner
|
||||
|
||||
if (-not (Install-Uv)) { exit 1 }
|
||||
if (-not (Test-Python)) { exit 1 }
|
||||
if (-not (Test-Git)) { exit 1 }
|
||||
Test-Node # Optional, doesn't fail
|
||||
|
||||
@@ -3,6 +3,7 @@
|
||||
# Hermes Agent Installer
|
||||
# ============================================================================
|
||||
# Installation script for Linux and macOS.
|
||||
# Uses uv for fast Python provisioning and package management.
|
||||
#
|
||||
# Usage:
|
||||
# curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
|
||||
@@ -29,7 +30,7 @@ REPO_URL_SSH="git@github.com:NousResearch/hermes-agent.git"
|
||||
REPO_URL_HTTPS="https://github.com/NousResearch/hermes-agent.git"
|
||||
HERMES_HOME="$HOME/.hermes"
|
||||
INSTALL_DIR="${HERMES_INSTALL_DIR:-$HERMES_HOME/hermes-agent}"
|
||||
PYTHON_MIN_VERSION="3.10"
|
||||
PYTHON_VERSION="3.11"
|
||||
|
||||
# Options
|
||||
USE_VENV=true
|
||||
@@ -64,7 +65,7 @@ while [[ $# -gt 0 ]]; do
|
||||
echo " --no-venv Don't create virtual environment"
|
||||
echo " --skip-setup Skip interactive setup wizard"
|
||||
echo " --branch NAME Git branch to install (default: main)"
|
||||
echo " --dir PATH Installation directory (default: ~/.hermes-agent)"
|
||||
echo " --dir PATH Installation directory (default: ~/.hermes/hermes-agent)"
|
||||
echo " -h, --help Show this help"
|
||||
exit 0
|
||||
;;
|
||||
@@ -146,50 +147,80 @@ detect_os() {
|
||||
# Dependency checks
|
||||
# ============================================================================
|
||||
|
||||
check_python() {
|
||||
log_info "Checking Python..."
|
||||
install_uv() {
|
||||
log_info "Checking for uv package manager..."
|
||||
|
||||
# Try different python commands
|
||||
for cmd in python3.12 python3.11 python3.10 python3 python; do
|
||||
if command -v $cmd &> /dev/null; then
|
||||
PYTHON_CMD=$cmd
|
||||
PYTHON_VERSION=$($cmd -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')
|
||||
|
||||
# Check version
|
||||
if python3 -c "import sys; exit(0 if sys.version_info >= (3, 10) else 1)" 2>/dev/null; then
|
||||
log_success "Python $PYTHON_VERSION found"
|
||||
return 0
|
||||
fi
|
||||
# Check common locations for uv
|
||||
if command -v uv &> /dev/null; then
|
||||
UV_CMD="uv"
|
||||
UV_VERSION=$($UV_CMD --version 2>/dev/null)
|
||||
log_success "uv found ($UV_VERSION)"
|
||||
return 0
|
||||
fi
|
||||
|
||||
# Check ~/.local/bin (default uv install location) even if not on PATH yet
|
||||
if [ -x "$HOME/.local/bin/uv" ]; then
|
||||
UV_CMD="$HOME/.local/bin/uv"
|
||||
UV_VERSION=$($UV_CMD --version 2>/dev/null)
|
||||
log_success "uv found at ~/.local/bin ($UV_VERSION)"
|
||||
return 0
|
||||
fi
|
||||
|
||||
# Check ~/.cargo/bin (alternative uv install location)
|
||||
if [ -x "$HOME/.cargo/bin/uv" ]; then
|
||||
UV_CMD="$HOME/.cargo/bin/uv"
|
||||
UV_VERSION=$($UV_CMD --version 2>/dev/null)
|
||||
log_success "uv found at ~/.cargo/bin ($UV_VERSION)"
|
||||
return 0
|
||||
fi
|
||||
|
||||
# Install uv
|
||||
log_info "Installing uv (fast Python package manager)..."
|
||||
if curl -LsSf https://astral.sh/uv/install.sh | sh 2>/dev/null; then
|
||||
# uv installs to ~/.local/bin by default
|
||||
if [ -x "$HOME/.local/bin/uv" ]; then
|
||||
UV_CMD="$HOME/.local/bin/uv"
|
||||
elif [ -x "$HOME/.cargo/bin/uv" ]; then
|
||||
UV_CMD="$HOME/.cargo/bin/uv"
|
||||
elif command -v uv &> /dev/null; then
|
||||
UV_CMD="uv"
|
||||
else
|
||||
log_error "uv installed but not found on PATH"
|
||||
log_info "Try adding ~/.local/bin to your PATH and re-running"
|
||||
exit 1
|
||||
fi
|
||||
done
|
||||
UV_VERSION=$($UV_CMD --version 2>/dev/null)
|
||||
log_success "uv installed ($UV_VERSION)"
|
||||
else
|
||||
log_error "Failed to install uv"
|
||||
log_info "Install manually: https://docs.astral.sh/uv/getting-started/installation/"
|
||||
exit 1
|
||||
fi
|
||||
}
|
||||
|
||||
check_python() {
|
||||
log_info "Checking Python $PYTHON_VERSION..."
|
||||
|
||||
log_error "Python 3.10+ not found"
|
||||
log_info "Please install Python 3.10 or newer:"
|
||||
# Let uv handle Python — it can download and manage Python versions
|
||||
# First check if a suitable Python is already available
|
||||
if $UV_CMD python find "$PYTHON_VERSION" &> /dev/null; then
|
||||
PYTHON_PATH=$($UV_CMD python find "$PYTHON_VERSION")
|
||||
PYTHON_FOUND_VERSION=$($PYTHON_PATH --version 2>/dev/null)
|
||||
log_success "Python found: $PYTHON_FOUND_VERSION"
|
||||
return 0
|
||||
fi
|
||||
|
||||
case "$OS" in
|
||||
linux)
|
||||
case "$DISTRO" in
|
||||
ubuntu|debian)
|
||||
log_info " sudo apt update && sudo apt install python3.11 python3.11-venv"
|
||||
;;
|
||||
fedora)
|
||||
log_info " sudo dnf install python3.11"
|
||||
;;
|
||||
arch)
|
||||
log_info " sudo pacman -S python"
|
||||
;;
|
||||
*)
|
||||
log_info " Use your package manager to install Python 3.10+"
|
||||
;;
|
||||
esac
|
||||
;;
|
||||
macos)
|
||||
log_info " brew install python@3.11"
|
||||
log_info " Or download from https://www.python.org/downloads/"
|
||||
;;
|
||||
esac
|
||||
|
||||
exit 1
|
||||
# Python not found — use uv to install it (no sudo needed!)
|
||||
log_info "Python $PYTHON_VERSION not found, installing via uv..."
|
||||
if $UV_CMD python install "$PYTHON_VERSION"; then
|
||||
PYTHON_PATH=$($UV_CMD python find "$PYTHON_VERSION")
|
||||
PYTHON_FOUND_VERSION=$($PYTHON_PATH --version 2>/dev/null)
|
||||
log_success "Python installed: $PYTHON_FOUND_VERSION"
|
||||
else
|
||||
log_error "Failed to install Python $PYTHON_VERSION"
|
||||
log_info "Install Python $PYTHON_VERSION manually, then re-run this script"
|
||||
exit 1
|
||||
fi
|
||||
}
|
||||
|
||||
check_git() {
|
||||
@@ -294,7 +325,6 @@ check_ripgrep() {
|
||||
# Check if we can use sudo
|
||||
CAN_SUDO=false
|
||||
if command -v sudo &> /dev/null; then
|
||||
# Check if user has sudo access (without actually running sudo)
|
||||
if sudo -n true 2>/dev/null || sudo -v 2>/dev/null; then
|
||||
CAN_SUDO=true
|
||||
fi
|
||||
@@ -328,7 +358,6 @@ check_ripgrep() {
|
||||
esac
|
||||
else
|
||||
log_warn "sudo not available - cannot auto-install system packages"
|
||||
# Try cargo as fallback if available
|
||||
if command -v cargo &> /dev/null; then
|
||||
log_info "Trying cargo install (no sudo required)..."
|
||||
if cargo install ripgrep 2>/dev/null; then
|
||||
@@ -371,7 +400,6 @@ check_ripgrep() {
|
||||
log_info " https://github.com/BurntSushi/ripgrep#installation"
|
||||
;;
|
||||
esac
|
||||
# Show cargo alternative for users without sudo
|
||||
if command -v cargo &> /dev/null; then
|
||||
log_info " Or without sudo: cargo install ripgrep"
|
||||
fi
|
||||
@@ -440,39 +468,36 @@ setup_venv() {
|
||||
return 0
|
||||
fi
|
||||
|
||||
log_info "Creating virtual environment..."
|
||||
log_info "Creating virtual environment with Python $PYTHON_VERSION..."
|
||||
|
||||
if [ -d "venv" ]; then
|
||||
log_info "Virtual environment already exists"
|
||||
else
|
||||
$PYTHON_CMD -m venv venv
|
||||
log_info "Virtual environment already exists, recreating..."
|
||||
rm -rf venv
|
||||
fi
|
||||
|
||||
# Activate
|
||||
source venv/bin/activate
|
||||
# uv creates the venv and pins the Python version in one step
|
||||
$UV_CMD venv venv --python "$PYTHON_VERSION"
|
||||
|
||||
# Upgrade pip
|
||||
pip install --upgrade pip wheel setuptools > /dev/null
|
||||
|
||||
log_success "Virtual environment ready"
|
||||
log_success "Virtual environment ready (Python $PYTHON_VERSION)"
|
||||
}
|
||||
|
||||
install_deps() {
|
||||
log_info "Installing dependencies..."
|
||||
|
||||
if [ "$USE_VENV" = true ]; then
|
||||
source venv/bin/activate
|
||||
# Tell uv to install into our venv (no need to activate)
|
||||
export VIRTUAL_ENV="$INSTALL_DIR/venv"
|
||||
fi
|
||||
|
||||
# Install the main package in editable mode with all extras
|
||||
pip install -e ".[all]" > /dev/null 2>&1 || pip install -e "." > /dev/null
|
||||
$UV_CMD pip install -e ".[all]" || $UV_CMD pip install -e "."
|
||||
|
||||
log_success "Main package installed"
|
||||
|
||||
# Install submodules
|
||||
log_info "Installing mini-swe-agent (terminal tool backend)..."
|
||||
if [ -d "mini-swe-agent" ] && [ -f "mini-swe-agent/pyproject.toml" ]; then
|
||||
pip install -e "./mini-swe-agent" > /dev/null 2>&1 || log_warn "mini-swe-agent install failed (terminal tools may not work)"
|
||||
$UV_CMD pip install -e "./mini-swe-agent" || log_warn "mini-swe-agent install failed (terminal tools may not work)"
|
||||
log_success "mini-swe-agent installed"
|
||||
else
|
||||
log_warn "mini-swe-agent not found (run: git submodule update --init)"
|
||||
@@ -480,7 +505,7 @@ install_deps() {
|
||||
|
||||
log_info "Installing tinker-atropos (RL training backend)..."
|
||||
if [ -d "tinker-atropos" ] && [ -f "tinker-atropos/pyproject.toml" ]; then
|
||||
pip install -e "./tinker-atropos" > /dev/null 2>&1 || log_warn "tinker-atropos install failed (RL tools may not work)"
|
||||
$UV_CMD pip install -e "./tinker-atropos" || log_warn "tinker-atropos install failed (RL tools may not work)"
|
||||
log_success "tinker-atropos installed"
|
||||
else
|
||||
log_warn "tinker-atropos not found (run: git submodule update --init)"
|
||||
@@ -490,53 +515,56 @@ install_deps() {
|
||||
}
|
||||
|
||||
setup_path() {
|
||||
log_info "Setting up PATH..."
|
||||
log_info "Setting up hermes command..."
|
||||
|
||||
# Determine the bin directory
|
||||
if [ "$USE_VENV" = true ]; then
|
||||
BIN_DIR="$INSTALL_DIR/venv/bin"
|
||||
HERMES_BIN="$INSTALL_DIR/venv/bin/hermes"
|
||||
else
|
||||
BIN_DIR="$HOME/.local/bin"
|
||||
mkdir -p "$BIN_DIR"
|
||||
HERMES_BIN="$(which hermes 2>/dev/null || echo "")"
|
||||
if [ -z "$HERMES_BIN" ]; then
|
||||
log_warn "hermes not found on PATH after install"
|
||||
return 0
|
||||
fi
|
||||
fi
|
||||
|
||||
# Create symlink in ~/.local/bin (standard user binary location, usually on PATH)
|
||||
mkdir -p "$HOME/.local/bin"
|
||||
ln -sf "$HERMES_BIN" "$HOME/.local/bin/hermes"
|
||||
log_success "Symlinked hermes → ~/.local/bin/hermes"
|
||||
|
||||
# Check if ~/.local/bin is on PATH; if not, add it to shell config
|
||||
if ! echo "$PATH" | tr ':' '\n' | grep -q "^$HOME/.local/bin$"; then
|
||||
SHELL_CONFIG=""
|
||||
if [ -n "$BASH_VERSION" ]; then
|
||||
if [ -f "$HOME/.bashrc" ]; then
|
||||
SHELL_CONFIG="$HOME/.bashrc"
|
||||
elif [ -f "$HOME/.bash_profile" ]; then
|
||||
SHELL_CONFIG="$HOME/.bash_profile"
|
||||
fi
|
||||
elif [ -n "$ZSH_VERSION" ] || [ -f "$HOME/.zshrc" ]; then
|
||||
SHELL_CONFIG="$HOME/.zshrc"
|
||||
fi
|
||||
|
||||
# Create a wrapper script
|
||||
cat > "$BIN_DIR/hermes" << EOF
|
||||
#!/bin/bash
|
||||
cd "$INSTALL_DIR"
|
||||
exec python -m hermes_cli.main "\$@"
|
||||
EOF
|
||||
chmod +x "$BIN_DIR/hermes"
|
||||
fi
|
||||
|
||||
# Add to PATH in shell config
|
||||
SHELL_CONFIG=""
|
||||
if [ -n "$BASH_VERSION" ]; then
|
||||
if [ -f "$HOME/.bashrc" ]; then
|
||||
SHELL_CONFIG="$HOME/.bashrc"
|
||||
elif [ -f "$HOME/.bash_profile" ]; then
|
||||
SHELL_CONFIG="$HOME/.bash_profile"
|
||||
PATH_LINE='export PATH="$HOME/.local/bin:$PATH"'
|
||||
|
||||
if [ -n "$SHELL_CONFIG" ]; then
|
||||
if ! grep -q '\.local/bin' "$SHELL_CONFIG" 2>/dev/null; then
|
||||
echo "" >> "$SHELL_CONFIG"
|
||||
echo "# Hermes Agent — ensure ~/.local/bin is on PATH" >> "$SHELL_CONFIG"
|
||||
echo "$PATH_LINE" >> "$SHELL_CONFIG"
|
||||
log_success "Added ~/.local/bin to PATH in $SHELL_CONFIG"
|
||||
else
|
||||
log_info "~/.local/bin already referenced in $SHELL_CONFIG"
|
||||
fi
|
||||
fi
|
||||
elif [ -n "$ZSH_VERSION" ] || [ -f "$HOME/.zshrc" ]; then
|
||||
SHELL_CONFIG="$HOME/.zshrc"
|
||||
else
|
||||
log_info "~/.local/bin already on PATH"
|
||||
fi
|
||||
|
||||
PATH_LINE="export PATH=\"$BIN_DIR:\$PATH\""
|
||||
# Export for current session so hermes works immediately
|
||||
export PATH="$HOME/.local/bin:$PATH"
|
||||
|
||||
if [ -n "$SHELL_CONFIG" ]; then
|
||||
if ! grep -q "hermes-agent" "$SHELL_CONFIG" 2>/dev/null; then
|
||||
echo "" >> "$SHELL_CONFIG"
|
||||
echo "# Hermes Agent" >> "$SHELL_CONFIG"
|
||||
echo "$PATH_LINE" >> "$SHELL_CONFIG"
|
||||
log_success "Added to $SHELL_CONFIG"
|
||||
else
|
||||
log_info "PATH already configured in $SHELL_CONFIG"
|
||||
fi
|
||||
fi
|
||||
|
||||
# Also export for current session
|
||||
export PATH="$BIN_DIR:$PATH"
|
||||
|
||||
log_success "PATH configured"
|
||||
log_success "hermes command ready"
|
||||
}
|
||||
|
||||
copy_config_templates() {
|
||||
@@ -553,7 +581,6 @@ copy_config_templates() {
|
||||
cp "$INSTALL_DIR/.env.example" "$HERMES_HOME/.env"
|
||||
log_success "Created ~/.hermes/.env from template"
|
||||
else
|
||||
# Create empty .env if no example exists
|
||||
touch "$HERMES_HOME/.env"
|
||||
log_success "Created ~/.hermes/.env"
|
||||
fi
|
||||
@@ -601,12 +628,14 @@ run_setup_wizard() {
|
||||
log_info "Starting setup wizard..."
|
||||
echo ""
|
||||
|
||||
if [ "$USE_VENV" = true ]; then
|
||||
source "$INSTALL_DIR/venv/bin/activate"
|
||||
fi
|
||||
|
||||
cd "$INSTALL_DIR"
|
||||
python -m hermes_cli.main setup
|
||||
|
||||
# Run hermes setup using the venv Python directly (no activation needed)
|
||||
if [ "$USE_VENV" = true ]; then
|
||||
"$INSTALL_DIR/venv/bin/python" -m hermes_cli.main setup
|
||||
else
|
||||
python -m hermes_cli.main setup
|
||||
fi
|
||||
}
|
||||
|
||||
print_success() {
|
||||
@@ -673,6 +702,7 @@ main() {
|
||||
print_banner
|
||||
|
||||
detect_os
|
||||
install_uv
|
||||
check_python
|
||||
check_git
|
||||
check_node
|
||||
|
||||
152
setup-hermes.sh
152
setup-hermes.sh
@@ -3,16 +3,18 @@
|
||||
# Hermes Agent Setup Script
|
||||
# ============================================================================
|
||||
# Quick setup for developers who cloned the repo manually.
|
||||
# Uses uv for fast Python provisioning and package management.
|
||||
#
|
||||
# Usage:
|
||||
# ./setup-hermes.sh
|
||||
#
|
||||
# This script:
|
||||
# 1. Creates a virtual environment (if not exists)
|
||||
# 2. Installs dependencies
|
||||
# 3. Creates .env from template (if not exists)
|
||||
# 4. Installs the 'hermes' CLI command
|
||||
# 5. Runs the setup wizard (optional)
|
||||
# 1. Installs uv if not present
|
||||
# 2. Creates a virtual environment with Python 3.11 via uv
|
||||
# 3. Installs all dependencies (main package + submodules)
|
||||
# 4. Creates .env from template (if not exists)
|
||||
# 5. Symlinks the 'hermes' CLI command into ~/.local/bin
|
||||
# 6. Runs the setup wizard (optional)
|
||||
# ============================================================================
|
||||
|
||||
set -e
|
||||
@@ -21,38 +23,75 @@ set -e
|
||||
GREEN='\033[0;32m'
|
||||
YELLOW='\033[0;33m'
|
||||
CYAN='\033[0;36m'
|
||||
RED='\033[0;31m'
|
||||
NC='\033[0m'
|
||||
|
||||
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
cd "$SCRIPT_DIR"
|
||||
|
||||
PYTHON_VERSION="3.11"
|
||||
|
||||
echo ""
|
||||
echo -e "${CYAN}🦋 Hermes Agent Setup${NC}"
|
||||
echo ""
|
||||
|
||||
# ============================================================================
|
||||
# Python check
|
||||
# Install / locate uv
|
||||
# ============================================================================
|
||||
|
||||
echo -e "${CYAN}→${NC} Checking Python..."
|
||||
echo -e "${CYAN}→${NC} Checking for uv..."
|
||||
|
||||
PYTHON_CMD=""
|
||||
for cmd in python3.12 python3.11 python3.10 python3 python; do
|
||||
if command -v $cmd &> /dev/null; then
|
||||
if $cmd -c "import sys; exit(0 if sys.version_info >= (3, 10) else 1)" 2>/dev/null; then
|
||||
PYTHON_CMD=$cmd
|
||||
break
|
||||
fi
|
||||
fi
|
||||
done
|
||||
|
||||
if [ -z "$PYTHON_CMD" ]; then
|
||||
echo -e "${YELLOW}✗${NC} Python 3.10+ required"
|
||||
exit 1
|
||||
UV_CMD=""
|
||||
if command -v uv &> /dev/null; then
|
||||
UV_CMD="uv"
|
||||
elif [ -x "$HOME/.local/bin/uv" ]; then
|
||||
UV_CMD="$HOME/.local/bin/uv"
|
||||
elif [ -x "$HOME/.cargo/bin/uv" ]; then
|
||||
UV_CMD="$HOME/.cargo/bin/uv"
|
||||
fi
|
||||
|
||||
PYTHON_VERSION=$($PYTHON_CMD -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')
|
||||
echo -e "${GREEN}✓${NC} Python $PYTHON_VERSION found"
|
||||
if [ -n "$UV_CMD" ]; then
|
||||
UV_VERSION=$($UV_CMD --version 2>/dev/null)
|
||||
echo -e "${GREEN}✓${NC} uv found ($UV_VERSION)"
|
||||
else
|
||||
echo -e "${CYAN}→${NC} Installing uv..."
|
||||
if curl -LsSf https://astral.sh/uv/install.sh | sh 2>/dev/null; then
|
||||
if [ -x "$HOME/.local/bin/uv" ]; then
|
||||
UV_CMD="$HOME/.local/bin/uv"
|
||||
elif [ -x "$HOME/.cargo/bin/uv" ]; then
|
||||
UV_CMD="$HOME/.cargo/bin/uv"
|
||||
fi
|
||||
|
||||
if [ -n "$UV_CMD" ]; then
|
||||
UV_VERSION=$($UV_CMD --version 2>/dev/null)
|
||||
echo -e "${GREEN}✓${NC} uv installed ($UV_VERSION)"
|
||||
else
|
||||
echo -e "${RED}✗${NC} uv installed but not found. Add ~/.local/bin to PATH and retry."
|
||||
exit 1
|
||||
fi
|
||||
else
|
||||
echo -e "${RED}✗${NC} Failed to install uv. Visit https://docs.astral.sh/uv/"
|
||||
exit 1
|
||||
fi
|
||||
fi
|
||||
|
||||
# ============================================================================
|
||||
# Python check (uv can provision it automatically)
|
||||
# ============================================================================
|
||||
|
||||
echo -e "${CYAN}→${NC} Checking Python $PYTHON_VERSION..."
|
||||
|
||||
if $UV_CMD python find "$PYTHON_VERSION" &> /dev/null; then
|
||||
PYTHON_PATH=$($UV_CMD python find "$PYTHON_VERSION")
|
||||
PYTHON_FOUND_VERSION=$($PYTHON_PATH --version 2>/dev/null)
|
||||
echo -e "${GREEN}✓${NC} $PYTHON_FOUND_VERSION found"
|
||||
else
|
||||
echo -e "${CYAN}→${NC} Python $PYTHON_VERSION not found, installing via uv..."
|
||||
$UV_CMD python install "$PYTHON_VERSION"
|
||||
PYTHON_PATH=$($UV_CMD python find "$PYTHON_VERSION")
|
||||
PYTHON_FOUND_VERSION=$($PYTHON_PATH --version 2>/dev/null)
|
||||
echo -e "${GREEN}✓${NC} $PYTHON_FOUND_VERSION installed"
|
||||
fi
|
||||
|
||||
# ============================================================================
|
||||
# Virtual environment
|
||||
@@ -60,15 +99,16 @@ echo -e "${GREEN}✓${NC} Python $PYTHON_VERSION found"
|
||||
|
||||
echo -e "${CYAN}→${NC} Setting up virtual environment..."
|
||||
|
||||
if [ ! -d "venv" ]; then
|
||||
$PYTHON_CMD -m venv venv
|
||||
echo -e "${GREEN}✓${NC} Created venv"
|
||||
else
|
||||
echo -e "${GREEN}✓${NC} venv exists"
|
||||
if [ -d "venv" ]; then
|
||||
echo -e "${CYAN}→${NC} Removing old venv..."
|
||||
rm -rf venv
|
||||
fi
|
||||
|
||||
source venv/bin/activate
|
||||
pip install --upgrade pip wheel setuptools > /dev/null
|
||||
$UV_CMD venv venv --python "$PYTHON_VERSION"
|
||||
echo -e "${GREEN}✓${NC} venv created (Python $PYTHON_VERSION)"
|
||||
|
||||
# Tell uv to install into this venv (no activation needed for uv)
|
||||
export VIRTUAL_ENV="$SCRIPT_DIR/venv"
|
||||
|
||||
# ============================================================================
|
||||
# Dependencies
|
||||
@@ -76,10 +116,34 @@ pip install --upgrade pip wheel setuptools > /dev/null
|
||||
|
||||
echo -e "${CYAN}→${NC} Installing dependencies..."
|
||||
|
||||
pip install -e ".[all]" > /dev/null 2>&1 || pip install -e "." > /dev/null
|
||||
$UV_CMD pip install -e ".[all]" || $UV_CMD pip install -e "."
|
||||
|
||||
echo -e "${GREEN}✓${NC} Dependencies installed"
|
||||
|
||||
# ============================================================================
|
||||
# Submodules (terminal backend + RL training)
|
||||
# ============================================================================
|
||||
|
||||
echo -e "${CYAN}→${NC} Installing submodules..."
|
||||
|
||||
# mini-swe-agent (terminal tool backend)
|
||||
if [ -d "mini-swe-agent" ] && [ -f "mini-swe-agent/pyproject.toml" ]; then
|
||||
$UV_CMD pip install -e "./mini-swe-agent" && \
|
||||
echo -e "${GREEN}✓${NC} mini-swe-agent installed" || \
|
||||
echo -e "${YELLOW}⚠${NC} mini-swe-agent install failed (terminal tools may not work)"
|
||||
else
|
||||
echo -e "${YELLOW}⚠${NC} mini-swe-agent not found (run: git submodule update --init --recursive)"
|
||||
fi
|
||||
|
||||
# tinker-atropos (RL training backend)
|
||||
if [ -d "tinker-atropos" ] && [ -f "tinker-atropos/pyproject.toml" ]; then
|
||||
$UV_CMD pip install -e "./tinker-atropos" && \
|
||||
echo -e "${GREEN}✓${NC} tinker-atropos installed" || \
|
||||
echo -e "${YELLOW}⚠${NC} tinker-atropos install failed (RL tools may not work)"
|
||||
else
|
||||
echo -e "${YELLOW}⚠${NC} tinker-atropos not found (run: git submodule update --init --recursive)"
|
||||
fi
|
||||
|
||||
# ============================================================================
|
||||
# Optional: ripgrep (for faster file search)
|
||||
# ============================================================================
|
||||
@@ -141,14 +205,17 @@ else
|
||||
fi
|
||||
|
||||
# ============================================================================
|
||||
# PATH setup
|
||||
# PATH setup — symlink hermes into ~/.local/bin
|
||||
# ============================================================================
|
||||
|
||||
echo -e "${CYAN}→${NC} Setting up hermes command..."
|
||||
|
||||
BIN_DIR="$SCRIPT_DIR/venv/bin"
|
||||
HERMES_BIN="$SCRIPT_DIR/venv/bin/hermes"
|
||||
mkdir -p "$HOME/.local/bin"
|
||||
ln -sf "$HERMES_BIN" "$HOME/.local/bin/hermes"
|
||||
echo -e "${GREEN}✓${NC} Symlinked hermes → ~/.local/bin/hermes"
|
||||
|
||||
# Add to shell config if not already there
|
||||
# Ensure ~/.local/bin is on PATH in shell config
|
||||
SHELL_CONFIG=""
|
||||
if [ -f "$HOME/.zshrc" ]; then
|
||||
SHELL_CONFIG="$HOME/.zshrc"
|
||||
@@ -159,13 +226,17 @@ elif [ -f "$HOME/.bash_profile" ]; then
|
||||
fi
|
||||
|
||||
if [ -n "$SHELL_CONFIG" ]; then
|
||||
if ! grep -q "hermes-agent" "$SHELL_CONFIG" 2>/dev/null; then
|
||||
echo "" >> "$SHELL_CONFIG"
|
||||
echo "# Hermes Agent" >> "$SHELL_CONFIG"
|
||||
echo "export PATH=\"$BIN_DIR:\$PATH\"" >> "$SHELL_CONFIG"
|
||||
echo -e "${GREEN}✓${NC} Added to $SHELL_CONFIG"
|
||||
if ! echo "$PATH" | tr ':' '\n' | grep -q "^$HOME/.local/bin$"; then
|
||||
if ! grep -q '\.local/bin' "$SHELL_CONFIG" 2>/dev/null; then
|
||||
echo "" >> "$SHELL_CONFIG"
|
||||
echo "# Hermes Agent — ensure ~/.local/bin is on PATH" >> "$SHELL_CONFIG"
|
||||
echo 'export PATH="$HOME/.local/bin:$PATH"' >> "$SHELL_CONFIG"
|
||||
echo -e "${GREEN}✓${NC} Added ~/.local/bin to PATH in $SHELL_CONFIG"
|
||||
else
|
||||
echo -e "${GREEN}✓${NC} ~/.local/bin already in $SHELL_CONFIG"
|
||||
fi
|
||||
else
|
||||
echo -e "${GREEN}✓${NC} PATH already in $SHELL_CONFIG"
|
||||
echo -e "${GREEN}✓${NC} ~/.local/bin already on PATH"
|
||||
fi
|
||||
fi
|
||||
|
||||
@@ -199,5 +270,6 @@ read -p "Would you like to run the setup wizard now? [Y/n] " -n 1 -r
|
||||
echo
|
||||
if [[ $REPLY =~ ^[Yy]$ ]] || [[ -z $REPLY ]]; then
|
||||
echo ""
|
||||
python -m hermes_cli.main setup
|
||||
# Run directly with venv Python (no activation needed)
|
||||
"$SCRIPT_DIR/venv/bin/python" -m hermes_cli.main setup
|
||||
fi
|
||||
|
||||
@@ -2,6 +2,7 @@
|
||||
"""File Tools Module - LLM agent file manipulation tools."""
|
||||
|
||||
import json
|
||||
import os
|
||||
import threading
|
||||
from typing import Optional
|
||||
from tools.file_operations import ShellFileOperations
|
||||
@@ -11,23 +12,85 @@ _file_ops_cache: dict = {}
|
||||
|
||||
|
||||
def _get_file_ops(task_id: str = "default") -> ShellFileOperations:
|
||||
"""Get or create ShellFileOperations for a terminal environment."""
|
||||
from tools.terminal_tool import _active_environments, _env_lock, _LocalEnvironment
|
||||
"""Get or create ShellFileOperations for a terminal environment.
|
||||
|
||||
Respects the TERMINAL_ENV setting -- if the task_id doesn't have an
|
||||
environment yet, creates one using the configured backend (local, docker,
|
||||
modal, etc.) rather than always defaulting to local.
|
||||
"""
|
||||
from tools.terminal_tool import (
|
||||
_active_environments, _env_lock, _create_environment,
|
||||
_get_env_config, _last_activity, _start_cleanup_thread,
|
||||
_check_disk_usage_warning,
|
||||
)
|
||||
import time
|
||||
|
||||
# Fast path: check cache without heavy locks
|
||||
with _file_ops_lock:
|
||||
if task_id in _file_ops_cache:
|
||||
return _file_ops_cache[task_id]
|
||||
|
||||
# Check if we need to create a new environment
|
||||
needs_creation = False
|
||||
with _env_lock:
|
||||
if task_id not in _active_environments:
|
||||
needs_creation = True
|
||||
|
||||
# Create environment OUTSIDE locks so we don't block other rollouts
|
||||
# during slow Modal/Docker startup (~10s)
|
||||
if needs_creation:
|
||||
config = _get_env_config()
|
||||
env_type = config["env_type"]
|
||||
|
||||
if env_type == "docker":
|
||||
image = config["docker_image"]
|
||||
elif env_type == "singularity":
|
||||
image = config["singularity_image"]
|
||||
elif env_type == "modal":
|
||||
image = config["modal_image"]
|
||||
else:
|
||||
image = ""
|
||||
|
||||
cwd = config["cwd"]
|
||||
_check_disk_usage_warning()
|
||||
if not os.getenv("HERMES_QUIET"):
|
||||
print(f"[FileTools] Creating new {env_type} environment for task {task_id[:8]}...", flush=True)
|
||||
|
||||
new_env = _create_environment(
|
||||
env_type=env_type,
|
||||
image=image,
|
||||
cwd=cwd,
|
||||
timeout=config["timeout"],
|
||||
)
|
||||
|
||||
# Store under lock (brief) -- do NOT call _start_cleanup_thread inside
|
||||
# the lock because it also acquires _env_lock (non-reentrant = deadlock)
|
||||
created = False
|
||||
with _env_lock:
|
||||
if task_id not in _active_environments:
|
||||
import os
|
||||
env = _LocalEnvironment(cwd=os.getcwd(), timeout=60)
|
||||
_active_environments[task_id] = env
|
||||
terminal_env = _active_environments[task_id]
|
||||
_active_environments[task_id] = new_env
|
||||
created = True
|
||||
else:
|
||||
try:
|
||||
if hasattr(new_env, 'stop'):
|
||||
new_env.stop()
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
file_ops = ShellFileOperations(terminal_env)
|
||||
if created:
|
||||
_start_cleanup_thread()
|
||||
if not os.getenv("HERMES_QUIET"):
|
||||
print(f"[FileTools] {env_type} environment ready for task {task_id[:8]}", flush=True)
|
||||
|
||||
# Now get the environment and build file_ops
|
||||
with _env_lock:
|
||||
_last_activity[task_id] = time.time()
|
||||
terminal_env = _active_environments[task_id]
|
||||
|
||||
file_ops = ShellFileOperations(terminal_env)
|
||||
with _file_ops_lock:
|
||||
_file_ops_cache[task_id] = file_ops
|
||||
return file_ops
|
||||
return file_ops
|
||||
|
||||
|
||||
def clear_file_ops_cache(task_id: str = None):
|
||||
@@ -56,6 +119,7 @@ def write_file_tool(path: str, content: str, task_id: str = "default") -> str:
|
||||
result = file_ops.write_file(path, content)
|
||||
return json.dumps(result.to_dict(), ensure_ascii=False)
|
||||
except Exception as e:
|
||||
print(f"[FileTools] write_file error: {type(e).__name__}: {e}", flush=True)
|
||||
return json.dumps({"error": str(e)}, ensure_ascii=False)
|
||||
|
||||
|
||||
|
||||
@@ -1300,10 +1300,26 @@ async def rl_test_inference(
|
||||
# Requirements Check
|
||||
# ============================================================================
|
||||
|
||||
def check_rl_python_version() -> bool:
|
||||
"""
|
||||
Check if Python version meets the minimum for RL tools.
|
||||
|
||||
tinker-atropos depends on the 'tinker' package which requires Python >= 3.11.
|
||||
"""
|
||||
return sys.version_info >= (3, 11)
|
||||
|
||||
|
||||
def check_rl_api_keys() -> bool:
|
||||
"""
|
||||
Check if required API keys are available.
|
||||
Check if required API keys and Python version are available.
|
||||
|
||||
RL training requires:
|
||||
- Python >= 3.11 (tinker package requirement)
|
||||
- TINKER_API_KEY for the Tinker training API
|
||||
- WANDB_API_KEY for Weights & Biases metrics
|
||||
"""
|
||||
if not check_rl_python_version():
|
||||
return False
|
||||
tinker_key = os.getenv("TINKER_API_KEY")
|
||||
wandb_key = os.getenv("WANDB_API_KEY")
|
||||
return bool(tinker_key) and bool(wandb_key)
|
||||
@@ -1311,9 +1327,11 @@ def check_rl_api_keys() -> bool:
|
||||
|
||||
def get_missing_keys() -> List[str]:
|
||||
"""
|
||||
Get list of missing required API keys.
|
||||
Get list of missing requirements for RL tools (API keys and Python version).
|
||||
"""
|
||||
missing = []
|
||||
if not check_rl_python_version():
|
||||
missing.append(f"Python >= 3.11 (current: {sys.version_info.major}.{sys.version_info.minor})")
|
||||
if not os.getenv("TINKER_API_KEY"):
|
||||
missing.append("TINKER_API_KEY")
|
||||
if not os.getenv("WANDB_API_KEY"):
|
||||
|
||||
@@ -1347,40 +1347,61 @@ def terminal_tool(
|
||||
_start_cleanup_thread()
|
||||
|
||||
# Get or create environment
|
||||
# Check under lock, but create OUTSIDE lock so we don't block
|
||||
# other concurrent rollouts during slow Modal/Docker startup
|
||||
needs_creation = False
|
||||
with _env_lock:
|
||||
if effective_task_id not in _active_environments:
|
||||
# Check disk usage before creating new environment
|
||||
_check_disk_usage_warning()
|
||||
|
||||
try:
|
||||
# Build SSH config if using SSH environment
|
||||
ssh_config = None
|
||||
if env_type == "ssh":
|
||||
ssh_config = {
|
||||
"host": config.get("ssh_host", ""),
|
||||
"user": config.get("ssh_user", ""),
|
||||
"port": config.get("ssh_port", 22),
|
||||
"key": config.get("ssh_key", ""),
|
||||
}
|
||||
|
||||
_active_environments[effective_task_id] = _create_environment(
|
||||
env_type=env_type,
|
||||
image=image,
|
||||
cwd=cwd,
|
||||
timeout=effective_timeout,
|
||||
ssh_config=ssh_config
|
||||
)
|
||||
except ImportError as e:
|
||||
return json.dumps({
|
||||
"output": "",
|
||||
"exit_code": -1,
|
||||
"error": f"Terminal tool disabled: mini-swe-agent not available ({e})",
|
||||
"status": "disabled"
|
||||
}, ensure_ascii=False)
|
||||
needs_creation = True
|
||||
else:
|
||||
_last_activity[effective_task_id] = time.time()
|
||||
env = _active_environments[effective_task_id]
|
||||
|
||||
# Update last activity time
|
||||
_last_activity[effective_task_id] = time.time()
|
||||
env = _active_environments[effective_task_id]
|
||||
if needs_creation:
|
||||
_check_disk_usage_warning()
|
||||
if not os.getenv("HERMES_QUIET"):
|
||||
print(f"[Terminal] Creating new {env_type} environment for task {effective_task_id[:8]}...", flush=True)
|
||||
try:
|
||||
ssh_config = None
|
||||
if env_type == "ssh":
|
||||
ssh_config = {
|
||||
"host": config.get("ssh_host", ""),
|
||||
"user": config.get("ssh_user", ""),
|
||||
"port": config.get("ssh_port", 22),
|
||||
"key": config.get("ssh_key", ""),
|
||||
}
|
||||
|
||||
new_env = _create_environment(
|
||||
env_type=env_type,
|
||||
image=image,
|
||||
cwd=cwd,
|
||||
timeout=effective_timeout,
|
||||
ssh_config=ssh_config
|
||||
)
|
||||
except ImportError as e:
|
||||
return json.dumps({
|
||||
"output": "",
|
||||
"exit_code": -1,
|
||||
"error": f"Terminal tool disabled: mini-swe-agent not available ({e})",
|
||||
"status": "disabled"
|
||||
}, ensure_ascii=False)
|
||||
|
||||
# Store under lock (brief)
|
||||
with _env_lock:
|
||||
if effective_task_id not in _active_environments:
|
||||
_active_environments[effective_task_id] = new_env
|
||||
else:
|
||||
# Another thread created it while we were building -- clean up ours
|
||||
try:
|
||||
if hasattr(new_env, 'stop'):
|
||||
new_env.stop()
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
_last_activity[effective_task_id] = time.time()
|
||||
env = _active_environments[effective_task_id]
|
||||
if not os.getenv("HERMES_QUIET"):
|
||||
print(f"[Terminal] {env_type} environment ready for task {effective_task_id[:8]}", flush=True)
|
||||
|
||||
# Check for dangerous commands (only for local/ssh in interactive modes)
|
||||
# Skip check if force=True (user has confirmed they want to run it)
|
||||
@@ -1435,13 +1456,20 @@ def terminal_tool(
|
||||
retry_count += 1
|
||||
wait_time = 2 ** retry_count
|
||||
print(f"⚠️ Terminal: execution error, retrying in {wait_time}s (attempt {retry_count}/{max_retries})")
|
||||
print(f" Command: {command[:200]}")
|
||||
print(f" Error: {type(e).__name__}: {e}")
|
||||
print(f" Task ID: {effective_task_id}, Backend: {env_type}")
|
||||
time.sleep(wait_time)
|
||||
continue
|
||||
|
||||
print(f"❌ Terminal: execution failed after {max_retries} retries")
|
||||
print(f" Command: {command[:200]}")
|
||||
print(f" Error: {type(e).__name__}: {e}")
|
||||
print(f" Task ID: {effective_task_id}, Backend: {env_type}")
|
||||
return json.dumps({
|
||||
"output": "",
|
||||
"exit_code": -1,
|
||||
"error": f"Command execution failed: {str(e)}"
|
||||
"error": f"Command execution failed: {type(e).__name__}: {str(e)}"
|
||||
}, ensure_ascii=False)
|
||||
|
||||
# Got a result
|
||||
|
||||
Reference in New Issue
Block a user